
I Built an AI Food Photography App in a Weekend

DishGenie, the Nano Banana Hackathon, and Why the Gap Between a Phone Photo and a Professional Image Is a Product Opportunity Hiding in Plain Sight


It started with a coffee and an ugly photo of a beautiful plate. Forty-eight hours later, I had a working prototype that turns any dish photo into social-ready content — and a deeper appreciation for why this problem is harder, more valuable, and more interesting than it looks.

The Coffee Shop Moment

The best product ideas don't come from brainstorming. They come from paying attention.

I was sitting in a nearby restaurant, drinking coffee, watching the owner photograph the day's special for Instagram. The dish was gorgeous — careful plating, vibrant colors, the kind of food that makes you stop scrolling. The photo, shot quickly on a phone under fluorescent lighting against a cluttered table, looked like evidence at a health inspection.

The gap between what the dish looked like in person and what it looked like on screen was enormous. And I immediately started thinking: this is the problem. Not the cooking. Not the plating. Not even the social media strategy. It's the last mile — the visual translation from reality to content.

Professional food photography solves this. A dedicated shoot with a food stylist, photographer, proper lighting, and post-production can cost $500 to $2,500 per session — often $2,500 to $7,500 all-in when you add food stylists and studio rental. That works for a chain restaurant refreshing their national menu. It doesn't work for a local restaurant owner who changes their specials daily and needs content in minutes, not weeks.

That friction stayed with me. And when Google DeepMind announced the Nano Banana Hackathon — a 48-hour challenge to build with Gemini 2.5 Flash Image Preview, with $400,000 in prizes — I had my idea.

Meet DishGenie

DishGenie is an AI tool that turns a simple phone photo of any dish into a gallery of professional, social-ready images — and short videos.

The interaction is deliberately simple. Upload a photo or take one with your camera. The AI analyzes the dish, identifies key visual elements, and generates six styled variations: different lighting, backgrounds, angles, and presentation styles. Each image is optimized for social media dimensions. From the same screen, you can download individual images, share directly to Instagram, TikTok, or Facebook, or generate a short video reel from the gallery.
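The generation step above can be sketched as a prompt-building pass: one template, six preset styles, one prompt per styled variation. The style names and template wording here are illustrative assumptions, not the prototype's actual prompts.

```python
# Hypothetical sketch of DishGenie's six-variation step. The styles and
# template are illustrative; the real prompts are not published.

STYLES = [
    ("bright daylight", "soft window light, white marble surface"),
    ("moody editorial", "dark slate background, dramatic side lighting"),
    ("rustic", "weathered wood table, linen napkin, warm tones"),
    ("overhead flat-lay", "top-down angle, minimal props, even lighting"),
    ("close-up macro", "shallow depth of field, texture emphasis"),
    ("cafe lifestyle", "blurred cafe background, natural morning light"),
]

def build_style_prompts(dish_description: str) -> list[str]:
    """Turn one dish description into six editing prompts, one per style."""
    return [
        f"Professional food photo of {dish_description}: "
        f"{style} style, {details}. Keep the dish itself unchanged; "
        f"only adjust lighting, background, and composition."
        for style, details in STYLES
    ]

prompts = build_style_prompts("a bowl of ramen with a soft-boiled egg")
print(len(prompts))  # 6
```

Each prompt would then be sent, together with the original photo, to the image model; the "keep the dish unchanged" constraint foreshadows the authenticity decision discussed later.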

The prototype was built in a weekend. The product questions it surfaced will take much longer to answer.

The Hackathon: 48 Hours with Nano Banana

The Nano Banana Hackathon ran from September 6 to 8, 2025, hosted by Google DeepMind on Kaggle in collaboration with Fal AI and ElevenLabs. 832 projects were submitted. 50 winners shared the prize pool. The model at the center — Gemini 2.5 Flash Image Preview, codenamed "Nano Banana" — represented a meaningful step forward in AI image generation: fast generation (1-2 seconds per image), strong character consistency, and natural language editing that lets you describe changes conversationally rather than tweaking parameters.

For a hackathon focused on consumer applications, the judging criteria were telling: Innovation and Wow Factor (40%), Technical Execution (30%), Impact and Utility (20%), and Presentation Quality (10%). That weighting tells you what Google wanted to see — not technically impressive demos, but products that feel like they should exist.

Here's the stack I used to build DishGenie:

  • Ideation and validation: Gemini Apps for research and pressure-testing the concept against real-world examples. Not just "is this technically possible?" but "do restaurant owners, food bloggers, and home cooks actually struggle with this?"

  • Product definition: A manually written PRD, refined with an LLM. The discipline of writing the product requirements document by hand first — before touching any model — was critical. It forced me to define the user, the workflow, the constraints, and the success criteria before the excitement of generation took over.

  • Prototyping and UX: Google AI Studio for rapid prompt and interface prototyping. The speed of iteration here was the difference between building one thing in 48 hours versus building the right thing.

  • Infrastructure: Google Cloud, deployed directly from AI Studio. You still need a GCP project, but the friction between prototype and deployment has genuinely collapsed.

  • Image generation: Gemini 2.5 Flash "Nano Banana" for all food image transformations. The model's strength with food photography was notable — it understood lighting, plating composition, and the kinds of backgrounds that make food content work on social media.

  • Video: Veo 3 for generating short reels from the styled images. The jump from static image to short-form video is where content becomes truly social-native.

  • Voiceover: ElevenLabs, using my own voice clone. The hackathon provided credits, and the integration was clean — narration generated in seconds, tone-matched to the content.

  • Avatar: Originally planned with Fal AI (Sync-Lipsync 2), but time constraints pushed me to D-ID for the digital avatar. This is the reality of hackathons — you plan one architecture and ship another.

The complete build — from concept to working, deployed prototype — took a weekend. That's the sentence worth pausing on. Not because the technology is trivial, but because the toolchain has matured to the point where a single PM with an idea, a clear product spec, and access to the right APIs can produce something functional in hours.

Why This Problem Is Bigger Than It Looks

The restaurant owner's Instagram post was my personal trigger. But the market underneath this friction is massive.

The creator economy was valued at $205.25 billion in 2024 and is projected to reach $1.35 trillion by 2033, growing at a 23.3% CAGR, according to Grand View Research. Within that economy, the photography and videography segment accounts for the largest creative service revenue share. The visual content engine powering all of it — Instagram, TikTok, YouTube Shorts, Facebook — runs on a simple truth: great visuals drive discovery, and discovery drives revenue.

The IAB's 2025 Creator Economy Ad Spend & Strategy Report quantifies the advertiser side: U.S. creator ad spend hit $29.5 billion in 2024 and is projected to reach $37 billion in 2025, growing at 4x the rate of the overall media industry. Nearly half of all ad spenders now consider creators a "must buy" channel, ranking just behind paid search and social media. Retail brands alone are projected to spend $12.3 billion on creator ads in 2025.

And here's the number that connects the macro trend to the micro problem DishGenie addresses: 74% of people use social media to decide where to eat. For Gen Z specifically, 84% actively try social media food trends, and 70% identify TikTok as their most valuable platform for food recommendations. Gen Z represents 40% of global consumers.

The chain is clear: great food photos drive social engagement, social engagement drives discovery, discovery drives foot traffic and orders. The business that creates those photos faster, cheaper, and more consistently than the current alternatives captures a meaningful piece of that value chain.

What Building DishGenie Taught Me About AI Image Products

The hackathon was 48 hours. The product lessons are worth unpacking at length, because they apply to anyone building consumer AI applications — not just food photography tools.

The PRD Matters More Than the Prompt

My most important decision was writing the product requirements document before generating a single image. In a 48-hour hackathon, spending the first few hours on a text document feels counterintuitive. Every instinct screams "start building." But the PRD defined constraints that prevented me from building the wrong thing.

Specifically, it forced answers to: Who is the primary user? (Restaurant owner posting daily, not a food blogger planning a campaign.) What's the core workflow? (Photo in, six styled images out, one tap to share.) What does "good enough" look like? (Emotionally appetizing and platform-ready, not pixel-perfect studio quality.) What's the first "no"? (No multi-dish sessions, no menu-wide consistency, no print-ready output — those are V2 features.)

Without the PRD, I would have spent 48 hours chasing capabilities rather than shipping a product.

Food Is Technically Harder Than It Looks

AI image models handle most subjects reasonably well. Food has specific challenges that aren't obvious until you start building.

  • Texture and sheen. Food photography depends on micro-details: the glisten of a sauce, the crust on bread, steam rising from a bowl. Current-generation models can produce these details, but inconsistently. A slightly wrong sheen turns appetizing into artificial. The difference between "this looks delicious" and "this looks like plastic" is a few pixels of specular highlight.

  • Color accuracy matters more. In fashion photography, shifting a color temperature slightly is a creative choice. In food photography, turning a steak slightly gray or a salad slightly yellow triggers an immediate "something's wrong" response. Human perception of food color is evolutionary — we're wired to detect spoilage. AI models don't have that calibration by default.

  • Composition is cultural. The way food is plated, propped, and photographed varies by cuisine, by platform, and by audience. A Japanese bento has different visual grammar than an Italian pasta dish. An overhead flat-lay works for bowls and pizzas; a 45-degree angle works for burgers and tall desserts. The AI needs to understand these conventions or the output feels off even when technically competent.

  • The "uncanny valley" of food. There's a specific failure mode where AI-generated food looks almost right but triggers discomfort — impossible physics of liquid, garnishes that float, ingredients that don't belong to the dish. Unlike other image categories where imperfection reads as "artistic," imperfect food images read as "unappetizing." The tolerance for error is lower than in almost any other visual category.
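The plating-and-angle conventions above can be encoded as a simple lookup. The shape categories and the mapping are a toy illustration of the heuristics described in the text, not a real model of composition.

```python
# Toy encoding of composition conventions: which camera angle tends to
# suit which dish shape. Categories and defaults are illustrative
# assumptions, not DishGenie's actual logic.

ANGLE_BY_SHAPE = {
    "flat": "overhead flat-lay",         # bowls, pizzas
    "tall": "45-degree three-quarter",   # burgers, layered desserts
    "glass": "straight-on eye level",    # drinks, parfaits (assumed)
}

def suggest_angle(dish_shape: str) -> str:
    """Fall back to the versatile 45-degree angle for unknown shapes."""
    return ANGLE_BY_SHAPE.get(dish_shape, "45-degree three-quarter")

print(suggest_angle("flat"))  # overhead flat-lay
```

A production tool would infer the shape category from the photo itself; the point here is only that composition rules are explicit, learnable conventions rather than free-form taste.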

Speed > Perfection for This User

The restaurant owner posting daily specials doesn't need a 30-second render that's 95% photorealistic. They need a 3-second result that's 80% of professional quality. That's a deliberate design choice: optimize for time-to-post, not pixel perfection.

This insight shaped every DishGenie decision — from the number of generated styles (six is enough to choose from without overwhelming), to the output format (social-ready dimensions by default), to the sharing flow (one tap to Instagram, not export-then-upload).

The premium photography market and the "good enough for social" market are different products with different users, different price points, and different quality thresholds. Trying to serve both from the same tool is a product mistake.

Video Is the Real Unlock

Static images are table stakes. The moment I integrated Veo 3 for short reel generation, the product shifted from "nice image tool" to "content creation engine." A 15-second reel of a dish — with slow zoom, gentle motion, and music — generates dramatically more engagement than a static post. Instagram's algorithm preferences confirm this: Reels reach more new users than any other content format.

The jump from image to video is technically more complex but commercially more valuable. This is where the next generation of food content tools will compete — not on image quality, but on how effortlessly they produce platform-native video content.

The Upsell Is Obvious (and That's Fine)

DishGenie's free tier generates six images from one photo. The upsell — "More Styles" — scales to hundreds of style variations. This freemium model works because the value proof happens immediately. You upload a photo, you see six styled results in seconds, and you're already imagining what 50 more styles would look like.

The key is that the free tier must be genuinely useful on its own. Six styled images are enough for a daily social media post. The upsell serves a different use case: building a content library, preparing for a menu refresh, or creating platform-specific versions at scale.

The Harder Product Questions

Building DishGenie surfaced questions I couldn't answer in 48 hours — but that any serious product in this space needs to address.

Authenticity and Trust

If a restaurant generates AI-enhanced images of their dishes, how different can those images be from reality before they cross a line? This isn't hypothetical. Research published in Food Quality and Preference examined whether AI-generated food images could substitute for real photographs in digital menus. The findings raised important questions about consumer expectations and the potential for AI to create unrealistic visual standards.

The regulatory direction we're seeing in real estate photography (California's AB 723 requiring disclosure of digitally altered listing images) may eventually reach food marketing. Delivery platforms like Uber Eats and DoorDash already have policies about image accuracy — food photos must represent what customers will actually receive. An AI tool that makes a dish look dramatically better than it is creates a trust problem that manifests as one-star reviews.

The product decision: DishGenie should enhance presentation (lighting, background, composition) without altering the food itself. The dish should look like its best self, not like a different dish.

Platform-Specific Optimization

Instagram, TikTok, Facebook, Pinterest, Google Business Profile, Uber Eats, DoorDash, and a restaurant's own website each have different optimal image dimensions, aspect ratios, and visual conventions. A tool that generates one beautiful image still leaves the creator with a resize-and-adapt workflow for every platform.

The real product opportunity is platform-aware generation: produce the right image in the right format for the right channel from a single upload. Not six versions of the same style — six versions optimized for six different destinations.
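A minimal sketch of what platform-aware generation could start from: a table of render targets per destination, resolved from a single upload. The dimensions are common published conventions (4:5 Instagram feed, 9:16 vertical video), not a DishGenie spec.

```python
# Illustrative per-platform render targets. Dimensions follow widely
# documented platform conventions; the dict itself is an assumption.

PLATFORM_SPECS = {
    "instagram_feed": {"size": (1080, 1350), "ratio": "4:5"},
    "instagram_reel": {"size": (1080, 1920), "ratio": "9:16"},
    "tiktok":         {"size": (1080, 1920), "ratio": "9:16"},
    "facebook_feed":  {"size": (1200, 630),  "ratio": "1.91:1"},
    "pinterest":      {"size": (1000, 1500), "ratio": "2:3"},
    "web_hero":       {"size": (1920, 1080), "ratio": "16:9"},
}

def output_plan(platforms: list[str]) -> list[tuple[str, int, int]]:
    """Resolve one upload into per-platform (name, width, height) targets."""
    plan = []
    for name in platforms:
        spec = PLATFORM_SPECS[name]  # KeyError means unsupported platform
        width, height = spec["size"]
        plan.append((name, width, height))
    return plan

plan = output_plan(["instagram_feed", "tiktok"])
```

The interesting product work is not the resize itself but generating a composition that suits each target ratio, so a vertical TikTok frame is framed for vertical, not cropped from a square.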

Consistency Across a Menu

A restaurant doesn't need one beautiful photo. It needs 40 beautiful photos that look like they were shot in the same session, with the same aesthetic, by the same photographer. Current per-image AI tools don't maintain visual consistency across multiple dishes. Each generation is independent — different backgrounds, different lighting angles, different color temperatures.

Solving menu-wide consistency is the same multi-image coherence problem that virtual staging tools face with multi-room apartments. It's solvable but requires deliberate architectural choices: style embeddings that persist across sessions, reference images that anchor the visual language, and quality checks that flag inconsistencies before they ship.
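One lightweight version of that anchoring idea, sketched in Python: fix the style language once per session and reuse it verbatim in every dish's prompt. The StyleSession class and its fields are hypothetical, a minimal stand-in for persistent style embeddings.

```python
# Hypothetical sketch of session-level style anchoring: one shared
# visual language, reused for every dish in a menu shoot.

from dataclasses import dataclass, field

@dataclass
class StyleSession:
    """Carries one visual language across a whole menu's generations."""
    lighting: str = "soft diffused daylight from the left"
    surface: str = "light oak table"
    palette: str = "warm neutrals, muted saturation"
    dishes: list[str] = field(default_factory=list)

    def prompt_for(self, dish: str) -> str:
        self.dishes.append(dish)
        return (
            f"Photo of {dish}. Match the session style exactly: "
            f"{self.lighting}; {self.surface}; {self.palette}. "
            f"Same camera height and color temperature as previous shots."
        )

session = StyleSession()
p1 = session.prompt_for("margherita pizza")
p2 = session.prompt_for("tiramisu")
```

Prompt text alone will not guarantee coherence; a real implementation would also pass a reference image and run an automated consistency check, but the session object is where that state would live.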

The Economics: Why This Market Is Accelerating

The math on AI food photography vs. traditional alternatives is stark.

  • Professional food photography: $500-$2,500 per session for a handful of finished images; $2,500-$7,500 all-in with food stylists and studio rental. Turnaround: days to weeks.

  • Freelance photo editors on platforms like Fiverr: $5-$100 per image. Faster, but still manual, and quality varies wildly. As one analysis noted, editing can polish a decent photo but can't rescue a bad one.

  • Stock photography: $2-$20 per image — but it's not your actual food. Using stock images to represent your menu is the visual equivalent of describing your food with someone else's words.

  • AI tools like DishGenie: $15/month or less, with per-image costs under $1. Turnaround: seconds. Uses your actual dishes.

The cost compression from professional photography to AI-assisted content is roughly 97-99%, with turnaround compressed from days to seconds. For the 74% of restaurants that rate Instagram as very or extremely important to their marketing, and the growing number that need daily content for Reels and TikTok, the economic case is overwhelming.
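The compression figure follows directly from the numbers quoted above, comparing a $15/month subscription against a single $500-$2,500 session:

```python
# Sanity check on the 97-99% cost-compression claim, using the
# per-session figures quoted in this article.

ai_monthly = 15
session_low, session_high = 500, 2500

compression_low = 1 - ai_monthly / session_low    # vs. a cheap shoot
compression_high = 1 - ai_monthly / session_high  # vs. a typical shoot

print(f"{compression_low:.0%} to {compression_high:.1%}")  # 97% to 99.4%
```

And that comparison is conservative: one subscription month covers an unbounded number of dishes, while a session covers a handful of finished images.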

The creator economy's growth makes this even more compelling. If individual content creators hold 58.7% of revenue share in the creator economy, and photography/videography is the largest creative service segment, then tools that democratize professional visual quality for individual creators are sitting at the intersection of the largest market segment and the highest-growth channel.

What I'd Build Next

If DishGenie were a real product roadmap rather than a hackathon project, here's where I'd go.

  • V1 (Hackathon scope): Upload → 6 styled images → download/share → optional reel generation. Done. Shipped.

  • V2 (Content engine): Platform-specific output (Instagram post, TikTok vertical, Uber Eats listing, Google Business Profile). Batch processing for full menus. Style consistency across multiple dishes.

  • V3 (Marketing suite): Automated A/B testing of food images — generate 10 versions, post them to different platforms, measure which drives more engagement and clicks. Integration with social media scheduling tools. Analytics on which visual styles perform best for different cuisines and platforms.

  • V4 (Commerce layer): Direct integration with delivery platforms. Automated menu photo updates when dishes change. Dynamic content generation tied to inventory — if a restaurant's special today is grilled salmon, generate fresh social content from a single photo taken at prep time.

Each version increases the surface area from content creation to content distribution to content performance — moving up the value chain from "make my photo look better" to "help me sell more food."

What This Weekend Reinforced About Building

I come back to the same pattern that's been running through every project I've built recently. The distance from observation to prototype has collapsed. Not for everything — complex systems, regulated products, and enterprise integrations still require proper teams and proper timelines. But for the broad category of "I noticed a friction point and want to see if there's a product in it" — the cycle time is now measured in hours.

The Nano Banana Hackathon formalized this into a challenge: 48 hours, one model, build something people would actually use. 832 teams submitted projects. The winning projects weren't the most technically sophisticated — they were the ones that identified a real human need and built the fastest path from that need to a working solution.

DishGenie started with a coffee, a restaurant owner, and a bad photo of a beautiful plate. It ended with a prototype that works, a market opportunity that's bigger than I expected, and a set of product questions I'm still thinking about.

That's the job. Pay attention. Notice the friction. Check whether it's real. Build something. Learn what you didn't know. Decide whether to keep going.

And if you're a food blogger, home cook, or restaurant owner: what style would you add next?
