AI Creative May 9, 2026

Last updated May 30, 2026

How to Prompt GPT Image 2 for Photorealistic Ad Avatars (5 Templates)

The short version: GPT Image 2 produces shippable photoreal ad avatars when you prompt it like a director, not a search engine. Five elements matter: specific character, specific environment, specific camera setup, specific emotional beat, explicit photorealism markers. The five templates below are the ones we run every week. Copy them, swap the variables, ship.

Most prompts written for GPT Image 2 are search queries dressed up as instructions. “Korean woman, 28, in a kitchen, golden hour” reads like an image-search box, and the output reads like one too: technically correct, visually generic. The model fills in every missing detail with its median guess for each word.

Specificity is the unlock. When the prompt is specific enough, the model has fewer choices to make on your behalf, and the output starts looking like a real photograph instead of a stock illustration.

This post is the five templates we keep coming back to, plus the structure underneath them so you can write your own.

What is the best prompt structure for GPT Image 2 in 2026?

Five elements. Every photoreal ad avatar prompt we ship includes all five. According to OpenAI’s official documentation, GPT Image 2 is a state-of-the-art model for fast, high-quality image generation and editing. The quality you actually get back, though, comes down to how much specificity you put in.

Character. Age, ethnicity, build, hair, expression. One distinguishing feature (mole, scar pattern, particular jaw, etc.) for consistency across the sprint.
Environment. Location, time of day, light source, secondary objects in frame.
Camera setup. Focal length, distance, angle. “Shot on 50mm at chest height” reads more photographic than a prompt with no camera direction at all. According to Wikipedia’s reference on portrait photography, classic portrait lenses fall between 75 and 135 mm. You do not need to know the theory to use it. You just need to tell the model which lens it is pretending to be.
Emotional beat. What is the character feeling, and what micro-expression communicates it. “Happy” gets you nothing. “The small smile of someone who just realized they got away with something” gets you a face.
Photorealism markers. Skin texture, lens characteristics, lighting fall-off, depth of field. Without these, GPT Image 2 trends toward a slightly idealized look the audience reads as AI.

Skip any one of these and the model fills in with its defaults. Defaults are what generic AI output looks like.
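One way to make the five-element rule hard to skip is to build prompts from a small structure that refuses to render when an element is blank. The sketch below is a minimal illustration, not our production tooling: `PromptSpec` and `build_prompt` are hypothetical names, and the joining format is an assumption.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PromptSpec:
    """The five elements listed above. Field names are illustrative."""
    character: str
    environment: str
    camera: str
    emotional_beat: str
    photorealism: str


def build_prompt(spec: PromptSpec) -> str:
    """Join the five elements into one prompt; fail if any element is blank."""
    missing = [f.name for f in fields(spec) if not getattr(spec, f.name).strip()]
    if missing:
        raise ValueError(f"missing prompt elements: {', '.join(missing)}")
    # Strip trailing periods so the joined sentences are punctuated once.
    parts = [getattr(spec, f.name).strip().rstrip(".") for f in fields(spec)]
    return "Photorealistic portrait of " + ". ".join(parts) + "."


if __name__ == "__main__":
    example = PromptSpec(
        character="a Korean woman, 28, small mole on the right cheekbone",
        environment="Seated on a commuter train in soft natural morning light",
        camera="Shot on 35mm prime lens at chest height, three-quarter angle",
        emotional_beat="Slightly weary expression, looking away from the camera",
        photorealism="Visible skin texture, fine grain, shallow depth of field",
    )
    print(build_prompt(example))
```

Leaving any field empty raises a `ValueError` that names the missing element, so the gap is caught before the model fills it with a default.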

The five templates at a glance

Template	Best for	What never changes
Bedroom Confessional	Late-night intimacy, parasocial pull	Single warm bedside lamp, framing from above, iPhone 15 Pro reference
Train Commuter	Universal weariness across markets	35mm prime, three-quarter angle, window-behind-shoulder composition
Window Selfie at Golden Hour	Loneliness and aspiration for lifestyle apps	Golden hour rim light at 45 degrees, iPhone front camera
Group Chat Reaction	"Screenshot to your group chat" products	Cross-legged posture, phone-below-chin, involuntary laugh beat
Desk / Workspace Story	Productivity, creator economy, automation pitches	Warm tungsten desk lamp, over-the-shoulder angle, two-screens visual

Template 1: Bedroom Confessional (3am, intimate, vulnerable)

Use when: the angle is late-night intimacy, embarrassed self-awareness, or anything that needs parasocial pull. Works best for journaling apps, mental-wellness products, dating apps, and DTC categories where the customer talks about the product like a friend told them.

Photorealistic portrait of [character description], lying in bed at 3am
under a single soft warm bedside lamp. Cream sweater, no makeup, hair
slightly messy. Expression: caught between embarrassment and amusement,
hands raised toward face as if to cover a quiet laugh. Shot from above
on iPhone 15 Pro, slight motion blur, shallow depth of field. Realistic
skin texture with visible pores and subtle imperfections. Warm amber
highlights, dark cool shadow tones. Vertical 9:16. No filters, no AI
gloss. Should look like a candid selfie taken by a friend.

What we swap per variant: character description, the specific micro-expression, the prop (sweater vs t-shirt vs hoodie), the time on the nightstand clock if visible.

What we never change: the lighting source (one warm bedside lamp), the framing (above), the iPhone 15 Pro reference. These hold the photorealism.
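The swap-versus-never-change split can be sketched as a template with named placeholders plus a check that the locked phrases survive any edits to the template text. This is a rough sketch under assumptions: the placeholder names (`character`, `wardrobe`, `expression`) and the `render` helper are hypothetical, and turning the sweater and the expression into placeholders is our reading of the swap list above.

```python
# Template 1 with the swappable parts turned into named placeholders.
TEMPLATE = (
    "Photorealistic portrait of {character}, lying in bed at 3am under a "
    "single soft warm bedside lamp. {wardrobe}, no makeup, hair slightly "
    "messy. Expression: {expression}. Shot from above on iPhone 15 Pro, "
    "slight motion blur, shallow depth of field. Realistic skin texture with "
    "visible pores and subtle imperfections. Warm amber highlights, dark cool "
    "shadow tones. Vertical 9:16. No filters, no AI gloss. Should look like a "
    "candid selfie taken by a friend."
)

# Phrases that carry the lighting, framing, and camera reference.
LOCKED = ("single soft warm bedside lamp", "Shot from above", "iPhone 15 Pro")


def render(template: str, locked: tuple, **swaps: str) -> str:
    """Fill the placeholders, then confirm no locked phrase was edited out."""
    prompt = template.format(**swaps)  # raises KeyError if a swap is missing
    lost = [phrase for phrase in locked if phrase not in prompt]
    if lost:
        raise ValueError(f"locked phrases missing from prompt: {lost}")
    return prompt


if __name__ == "__main__":
    print(render(
        TEMPLATE,
        LOCKED,
        character="a Korean woman, 28, small mole on the right cheekbone",
        wardrobe="Gray hoodie",
        expression="caught between embarrassment and amusement",
    ))
```

The check only matters when someone edits the template itself during a sprint; it flags the edit that quietly removed the lamp or the camera reference.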

Template 2: Train Commuter (public transit, weary, public-facing private moment)

Use when: the angle is universal weariness, the in-between moments of a workday, or any “this is your life and you know it” hook. Works across markets because every major city has commuter trains, so the same template re-renders in Tokyo, Seoul, London, and Mumbai without losing the read.

This template produced the salaryman shot from our recent campaign. It is one of our highest-converting templates because the setting reads as universal regardless of the character’s nationality.

Photorealistic portrait of [character description], seated on a public
commuter train at [time of day]. Suit and tie, neat hair, slightly
weary expression looking away from the camera. Window behind shoulder
showing motion blur of a passing station platform in soft natural
light. Shot on 35mm prime lens at chest height, three-quarter angle.
Visible skin texture, fine grain, shallow depth of field on the
seatback behind. Color palette: muted grays, soft blues, warm skin
tones. Vertical 9:16. Should look like a candid documentary frame, not
a posed portrait.

What we swap: character (we have run this in JP, KR, UK, NYC, Mumbai versions), train type (subway vs commuter rail vs intercity), time of day, expression specificity.

What we never change: the 35mm prime, the three-quarter angle, the window-behind-shoulder composition. These three together are what make the shot read as journalistic.
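Re-rendering one template across markets is a cross product of the swappable parts with the fixed parts held constant. The sketch below assumes an abridged version of Template 2, with a hypothetical `{train}` placeholder standing in for the train type; `build_variants` is an illustrative name, not part of any library.

```python
from itertools import product

# Abridged Template 2; only the swappable parts are placeholders.
TEMPLATE = (
    "Photorealistic portrait of {character}, seated on a {train} at "
    "{time_of_day}. Suit and tie, neat hair, slightly weary expression "
    "looking away from the camera. Window behind shoulder showing motion "
    "blur of a passing station platform in soft natural light. Shot on 35mm "
    "prime lens at chest height, three-quarter angle. Vertical 9:16."
)

# The three elements this template never changes.
FIXED = ("35mm prime", "three-quarter angle", "Window behind shoulder")


def build_variants(characters, trains, times):
    """Return one prompt per (character, train, time of day) combination."""
    return [
        TEMPLATE.format(character=c, train=t, time_of_day=h)
        for c, t, h in product(characters, trains, times)
    ]


if __name__ == "__main__":
    prompts = build_variants(
        ["a Japanese man, 34", "a British woman, 41"],
        ["subway", "commuter rail", "intercity train"],
        ["7am", "late evening"],
    )
    print(len(prompts), "variants")
    print(prompts[0])
```

Two characters, three train types, and two times of day yield twelve prompts, each still carrying the lens, angle, and window composition.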

Template 3: Window Selfie at Golden Hour

Use when: the angle is loneliness, longing, or aspiration without being on-the-nose about it. Works for lifestyle apps, travel, slow-living DTC, and anything where the product is sold against a feeling rather than a feature.

Most of our highest-emotion creative comes out of variations on this one, particularly the campaigns where the brief is “the feeling we are selling, not the product.”

Photorealistic selfie-style portrait of [character description], leaning
against a window during golden hour. City skyline blurred behind through
the window. Cream knit sweater, slightly tousled hair. One hand cradling
chin. Expression: contemplative, slightly melancholy, half-aware of the
camera. Warm amber light from the window catches the side of the face
and creates a rim. Shot on iPhone front camera at arm's length.
Realistic skin texture, slight bloom on the highlights, shallow depth
of field on the background. Color palette: warm gold, soft cream, deep
shadow. Vertical 9:16.

What we swap: character, city skyline (or substitute kitchen, balcony, cafe window), the chin-cradle gesture (we sometimes swap for hand-on-cheek or chin-on-fist), the season cues in the wardrobe.

What we never change: the golden hour light source position (45 degrees behind the character, hitting the cheekbone), the iPhone front camera reference, the rim light callout.

Template 4: Group Chat Reaction

Use when: the value prop boils down to “you would screenshot this to your group chat.” Works for content products, novelty SaaS, dating, consumer fintech with a wow-moment, and anything where word-of-mouth is the growth loop.

The trick with this one is the implied second person. The character is reacting to someone the viewer cannot see. That absence pulls the viewer into the slot where the friend would be sitting. Get the laugh beat right and the ad does the heavy lifting on its own.

Photorealistic portrait of [character description] sitting cross-legged
on a couch in a small bright apartment, holding a phone in both hands
just below the chin. Expression: a quiet involuntary laugh as if
reacting to something a friend just said. One hand partially covering
the mouth. Warm sunlight from a nearby window throws a soft highlight
on the cheek. Background slightly blurred showing potted plants and a
beige wall. Shot on iPhone selfie camera, slight upward angle.
Realistic skin texture, no makeup gloss, candid feel. Color palette:
warm cream, soft green, peach skin highlights. Vertical 9:16.

What we swap: character, the laugh-trigger (we sometimes write the prompt with the implied conversation cue), the wardrobe, the apartment specifics (kitchen vs bedroom vs balcony).

What we never change: the cross-legged on-couch posture, the phone-below-chin holding position, the involuntary-laugh emotional beat. This combo reads more candid than any “smiling at phone” generic prompt.

Template 5: Desk / Workspace Story

Use when: the angle is a before/after, a two-timelines comparison, or “look how much easier your work could be.” Works for productivity SaaS, automation tools, creator-economy products, and B2B with a tangible workflow story.

The two-screens visual is doing the narrative work here. One screen is the pain (cluttered timeline, twelve tabs). The other is the relief (clean dashboard, one number). The expression bridges them: weary on one side of the face, lifted on the other. That contrast is what makes the ad legible in two seconds.

Photorealistic portrait of [character description] at a small home
office desk, late evening, single warm desk lamp casting most of the
light. Two laptop screens visible at angle, one showing a complex edit
timeline (suggest, do not detail), the other showing a clean dashboard.
Expression: split between weariness on the left side of frame and
slight relief on the right. Coffee mug, notebook, headphones partially
in frame. Shot on 50mm at slight angle behind the shoulder. Realistic
skin texture, warm tungsten light tones, slight grain. Color palette:
amber, dark wood, deep shadow with warm highlight on the face. Vertical
9:16 for paid social, or 4:5 for feed.

What we swap: character, the contrast on the two screens (this template is good for any “before vs after using our product” beat), the wardrobe, the desk objects.

What we never change: the warm tungsten desk-lamp lighting, the over-the-shoulder camera angle, the two-screens visual.

How to maintain character consistency across a sprint

The biggest unlock once the templates are working: holding the same character across 10 to 20 different prompts.

Write the character description once. Save it at the top of your prompt file. Reuse it verbatim in every prompt in the sprint. Include one or two distinguishing features (a small mole on the right cheek, a specific hair texture, a particular eyebrow shape) and repeat them every time.

A character description that holds across a sprint looks like this:

CHARACTER (use verbatim in every prompt this sprint):
Korean woman, age 28. Long dark hair, side-parted, slightly wavy.
Almond eyes. Small mole on the right cheekbone. Slim build. Warm
medium-toned skin. Subtle natural makeup. Expression usually quiet and
introspective unless directed otherwise.

Reusing this exact paragraph at the top of every prompt in the sprint produces visibly the same person across 12 variants in roughly 90% of generations. The other 10% you re-roll. Cheap to do.
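The verbatim rule is easy to enforce in a script: keep the character block in one constant, prepend it to every scene, and flag any prompt where it was edited. The sketch below is illustrative only; the helper names are hypothetical, and the re-roll estimate assumes each generation matches independently at the stated rate, which is a simplification.

```python
CHARACTER = (
    "Korean woman, age 28. Long dark hair, side-parted, slightly wavy. "
    "Almond eyes. Small mole on the right cheekbone. Slim build. Warm "
    "medium-toned skin. Subtle natural makeup. Expression usually quiet and "
    "introspective unless directed otherwise."
)


def with_character(scene: str, character: str = CHARACTER) -> str:
    """Prepend the character block, unchanged, to a scene description."""
    return f"{character}\n\n{scene.strip()}"


def drifted(prompts: list[str], character: str = CHARACTER) -> list[int]:
    """Indexes of prompts that do not start with the exact character block."""
    return [i for i, p in enumerate(prompts) if not p.startswith(character)]


def expected_generations(variants: int, match_rate: float) -> float:
    """Average generations needed if each attempt matches independently."""
    if not 0 < match_rate <= 1:
        raise ValueError("match_rate must be in (0, 1]")
    return variants / match_rate


if __name__ == "__main__":
    scenes = ["Seated on a commuter train at 7am.", "At a desk, late evening."]
    prompts = [with_character(s) for s in scenes]
    print(drifted(prompts))                 # empty list when nothing was edited
    print(expected_generations(12, 0.9))    # about 13.3 under the assumption
```

Under that independence assumption, a 12-variant sprint at a 90% match rate costs a little over 13 generations on average, which is why re-rolling is cheap.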

Common mistakes

Asking for “natural” without anchoring it to a specific aesthetic. “Natural” is a cliche, and the model can interpret it in many ways. Anchor it with a real reference (iPhone front camera, 35mm film, documentary photography).
Over-describing the wardrobe. More than two sentences on clothing pulls the model toward fashion-photography aesthetics. Keep wardrobe to one short clause.
Asking for “no AI look” in plain language. Does not work. The model does not know what AI looks like. Specify what you want instead (skin texture, depth of field, lens type).
Skipping the camera setup. Without a focal length and angle, the model defaults to a generic head-on portrait that reads as posed.
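The four mistakes above can be caught with a rough pre-flight check before a prompt is sent. The sketch below is a heuristic under stated assumptions: the word lists, the regular expressions, and the `lint_prompt` name are all hypothetical, and keyword matching will miss phrasing it has not been told about.

```python
import re

# Hypothetical word lists; tune them to your own prompt library.
AESTHETIC_ANCHORS = ("iphone", "35mm", "50mm", "85mm", "documentary")
WARDROBE_WORDS = re.compile(r"\b(sweater|hoodie|shirt|suit|dress|jacket|coat)\b")
FOCAL_LENGTH = re.compile(r"\b\d{2,3}\s?mm\b")


def lint_prompt(prompt: str) -> list[str]:
    """Return a warning for each common mistake the heuristics detect."""
    text = prompt.lower()
    warnings = []

    # Mistake 1: "natural" with no concrete aesthetic reference.
    if re.search(r"\bnatural\b", text) and not any(a in text for a in AESTHETIC_ANCHORS):
        warnings.append("'natural' is not anchored to a real reference")

    # Mistake 2: wardrobe spread over more than one sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    if sum(1 for s in sentences if WARDROBE_WORDS.search(s)) > 1:
        warnings.append("wardrobe described in more than one sentence")

    # Mistake 3: asking for "no AI look" in plain language.
    if "no ai look" in text:
        warnings.append("'no AI look' requested; specify texture and lens instead")

    # Mistake 4: no focal length and no phone-camera reference.
    if not FOCAL_LENGTH.search(text) and "iphone" not in text:
        warnings.append("no camera setup (focal length or camera reference)")

    return warnings


if __name__ == "__main__":
    for warning in lint_prompt("Natural portrait of a woman in a kitchen. No AI look."):
        print("warning:", warning)
```

A clean result does not mean the prompt is good, only that it avoids these four patterns.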

Where to take this next

These five templates cover roughly 80% of what we ship for paid social. The other 20% are bespoke prompts written from scratch for specific briefs.

If you want to see the full workflow these templates plug into (creative direction, image generation, animation through Seedance 2.0, batching through Lovart), our AI performance creative case study walks through the campaign these prompts produced. Or if you want this running for your account, our AI performance creative service is the front door.

Alex Montas Hernandez

Founder

Previously led growth at TubeBuddy (acquired by BENlabs), scaled Bloomberg's first DTC subscription, and drove measurable growth for brands like Verizon, Samsung, and Intel.

Frequently Asked Questions

What is the best prompt structure for GPT Image 2?

The strongest prompt structure for GPT Image 2 in 2026 has five elements: a specific character description (age, ethnicity, build, expression), a specific environment (location, time of day, lighting source), a specific camera setup (focal length, distance, angle), a specific emotional beat, and explicit photorealism markers (skin texture, depth of field, no AI gloss). Generic prompts produce generic output. Specific prompts produce shippable creative.

How do you make GPT Image 2 generate photorealistic faces?

Three things matter most. First, specify skin texture explicitly (the model defaults to slightly smoothed faces). Second, specify camera and focal length (a 50mm or 85mm prime lens reads more photographic than the default). Third, specify lighting source and time of day. Without these, GPT Image 2 trends toward a slightly idealized stock-photography look that the audience reads as AI.

Can GPT Image 2 generate consistent characters across multiple shots?

Yes, with discipline. Write the character description once, save it as a reusable block, paste it verbatim at the top of every prompt in the sprint, and include a short anchor phrase about distinguishing features (a small mole, a specific hair pattern, a particular jaw shape). Across a 12-variant sprint, character drift usually shows up in roughly 10% of generations with this approach, and those get re-rolled.
