The short version: GPT Image 2 produces shippable photoreal ad avatars when you prompt it like a director, not a search engine. Five elements matter: specific character, specific environment, specific camera setup, specific emotional beat, explicit photorealism markers. The five templates below are the ones we run every week. Copy them, swap the variables, ship.
Most prompts written for GPT Image 2 are search queries dressed up as instructions. “Korean woman, 28, in a kitchen, golden hour” reads like an image-search box, and the output reads like one too: technically correct, visually generic. The model fills each missing detail with its median guess for that word.
Specificity is the unlock. When the prompt is specific enough, the model has fewer choices to make on your behalf, and the output starts looking like a real photograph instead of a stock illustration.
This post is the five templates we keep coming back to, plus the structure underneath them so you can write your own.
What is the best prompt structure for GPT Image 2 in 2026?
Five elements. Every photoreal ad avatar prompt we ship includes all five. According to OpenAI’s official documentation, GPT Image 2 is a state-of-the-art model for fast, high-quality image generation and editing — but the quality you actually get back is determined by how much specificity you put in.
- Character. Age, ethnicity, build, hair, expression. One distinguishing feature (mole, scar pattern, particular jaw, etc.) for consistency across the sprint.
- Environment. Location, time of day, light source, secondary objects in frame.
- Camera setup. Focal length, distance, angle. “Shot on 50mm at chest height” reads more photographic than no camera direction at all. According to Wikipedia’s reference on portrait photography, classic portrait lenses fall between 75 and 135 mm, but you do not need to know the theory to use it. You need the model to know which lens it is pretending to be.
- Emotional beat. What is the character feeling, and what micro-expression communicates it. Not “happy.” Instead: “the small smile of someone who just realized they got away with something.”
- Photorealism markers. Skin texture, lens characteristics, lighting fall-off, depth of field. Without these, GPT Image 2 trends toward a slightly idealized look the audience reads as AI.
Skip any one of these and the model fills in with its defaults. Defaults are what generic AI output looks like.
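The five-element structure above can be enforced mechanically before a prompt ever reaches the model. The following is a minimal sketch, not a definitive implementation: the `AvatarPrompt` class, its field names, and the sentence order in `render` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class AvatarPrompt:
    """The five elements from the list above. Field names are illustrative."""
    character: str
    environment: str
    camera: str
    emotional_beat: str
    realism_markers: str

    def missing(self) -> list:
        # An empty field is a spot where the model would fall back to its defaults.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        gaps = self.missing()
        if gaps:
            raise ValueError(f"prompt is missing: {', '.join(gaps)}")
        parts = [
            f"Photorealistic portrait of {self.character}",
            self.environment,
            f"Expression: {self.emotional_beat}",
            self.camera,
            self.realism_markers,
        ]
        # Emit each element as its own sentence with exactly one trailing period.
        return " ".join(p.strip().rstrip(".") + "." for p in parts)


prompt = AvatarPrompt(
    character="a Korean woman, age 28, small mole on the right cheekbone",
    environment="Seated on a commuter train in early evening, window behind shoulder",
    camera="Shot on 35mm prime lens at chest height, three-quarter angle",
    emotional_beat="slightly weary, looking away from the camera",
    realism_markers="Visible skin texture, fine grain, shallow depth of field",
)
print(prompt.render())
```

Raising an error on an empty field is a deliberate choice here: a prompt with a silent gap still generates an image, so the gap is easy to miss until the output looks generic.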
The five templates at a glance
| Template | Best for | What never changes |
|---|---|---|
| Bedroom Confessional | Late-night intimacy, parasocial pull | Single warm bedside lamp, framing from above, iPhone 15 Pro reference |
| Train Commuter | Universal weariness across markets | 35mm prime, three-quarter angle, window-behind-shoulder composition |
| Window Selfie at Golden Hour | Loneliness and aspiration for lifestyle apps | Golden hour rim light at 45 degrees, iPhone front camera |
| Group Chat Reaction | "Screenshot to your group chat" products | Cross-legged posture, phone-below-chin, involuntary laugh beat |
| Desk / Workspace Story | Productivity, creator economy, automation pitches | Warm tungsten desk lamp, over-the-shoulder angle, two-screens visual |
Template 1: Bedroom Confessional (3am, intimate, vulnerable)
The hook this template plays into: late-night intimacy, embarrassed self-awareness, audience parasocial pull.
Photorealistic portrait of [character description], lying in bed at 3am
under a single soft warm bedside lamp. Cream sweater, no makeup, hair
slightly messy. Expression: caught between embarrassment and amusement,
hands raised toward face as if to cover a quiet laugh. Shot from above
on iPhone 15 Pro, slight motion blur, shallow depth of field. Realistic
skin texture with visible pores and subtle imperfections. Warm amber
highlights, dark cool shadow tones. Vertical 9:16. No filters, no AI
gloss. Should look like a candid selfie taken by a friend.
What we swap per variant: character description, the specific micro-expression, the prop (sweater vs t-shirt vs hoodie), the time on the nightstand clock if visible.
What we never change: the lighting source (one warm bedside lamp), the framing (above), the iPhone 15 Pro reference. These hold the photorealism.
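The swap / never-change split for this template maps naturally onto a format string: swappable parts become named fields and everything else stays literal. Here is a rough sketch under that assumption. The `variants` helper, the field names, and the second wardrobe and expression values are hypothetical, and the nightstand-clock swap is left out for brevity.

```python
from itertools import product

# Template 1 with the swappable parts turned into named fields.
# Everything outside the braces is locked: lamp, framing, iPhone 15 Pro reference.
BEDROOM_CONFESSIONAL = (
    "Photorealistic portrait of {character}, lying in bed at 3am "
    "under a single soft warm bedside lamp. {wardrobe}, no makeup, hair "
    "slightly messy. Expression: {expression}. Shot from above "
    "on iPhone 15 Pro, slight motion blur, shallow depth of field. Realistic "
    "skin texture with visible pores and subtle imperfections. Warm amber "
    "highlights, dark cool shadow tones. Vertical 9:16. No filters, no AI "
    "gloss. Should look like a candid selfie taken by a friend."
)


def variants(template: str, swaps: dict) -> list:
    """Return one prompt per combination of the swappable values."""
    keys = list(swaps)
    return [
        template.format(**dict(zip(keys, combo)))
        for combo in product(*(swaps[k] for k in keys))
    ]


prompts = variants(
    BEDROOM_CONFESSIONAL,
    {
        "character": ["a Korean woman, age 28"],  # placeholder description
        "wardrobe": ["Cream sweater", "Gray hoodie"],  # second value is hypothetical
        "expression": [
            "caught between embarrassment and amusement, hands raised "
            "toward face as if to cover a quiet laugh",
            "eyes closed, trying not to smile",  # hypothetical second beat
        ],
    },
)
print(len(prompts))  # 1 * 2 * 2 = 4 prompts
```

Because the locked phrases are never exposed as fields, a variant cannot accidentally drop the lamp, the overhead framing, or the device reference.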
Template 2: Train Commuter (public transit, weary, public-facing private moment)
This template produced the salaryman shot from our recent campaign. It is one of our highest-converting templates because the setting reads as universal regardless of the character’s nationality.
Photorealistic portrait of [character description], seated on a public
commuter train at [time of day]. Suit and tie, neat hair, slightly
weary expression looking away from the camera. Window behind shoulder
showing motion blur of a passing station platform in soft natural
light. Shot on 35mm prime lens at chest height, three-quarter angle.
Visible skin texture, fine grain, slightly shallow depth of field on the
seatback behind. Color palette: muted grays, soft blues, warm skin
tones. Vertical 9:16. Should look like a candid documentary frame, not
a posed portrait.
What we swap: character (we have run this in JP, KR, UK, NYC, Mumbai versions), train type (subway vs commuter rail vs intercity), time of day, expression specificity.
What we never change: the 35mm prime, the three-quarter angle, the window-behind-shoulder composition. These three together are what make the shot read as journalistic.
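When variants are edited by hand, the locked elements are the easiest thing to lose. A small check like the one below can flag a variant that no longer contains them. This is a sketch only: the function name is hypothetical, and the phrase list assumes your prompt keeps the exact wording used in the template above.

```python
# Phrases Template 2 treats as fixed, copied from the template wording.
# Adjust the list if your own version phrases them differently.
TRAIN_COMMUTER_LOCKED = (
    "35mm prime",
    "three-quarter angle",
    "window behind shoulder",
)


def missing_locked_phrases(prompt: str, locked=TRAIN_COMMUTER_LOCKED) -> list:
    """Return the locked phrases that no longer appear in the prompt."""
    # Collapse whitespace so hard-wrapped prompts still match, and ignore case.
    normalized = " ".join(prompt.lower().split())
    return [phrase for phrase in locked if phrase.lower() not in normalized]


edited = (
    "Photorealistic portrait of a nurse in scrubs, seated on a subway car\n"
    "at 6am. Window behind shoulder showing motion blur. Shot on 50mm at\n"
    "chest height, three-quarter angle."
)
print(missing_locked_phrases(edited))  # ['35mm prime']
```

A plain substring match is crude, but it is enough to catch the common case where someone swaps the lens or rewrites the composition while changing the character.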
Template 3: Window Selfie at Golden Hour
This is the loneliness-and-aspiration template. Most of our highest-emotion creative for lifestyle apps comes out of variations of this one.
Photorealistic selfie-style portrait of [character description], leaning
against a window during golden hour. City skyline blurred behind through
the window. Cream knit sweater, slightly tousled hair. One hand cradling
chin. Expression: contemplative, slightly melancholy, half-aware of the
camera. Warm amber light from the window catches the side of the face
and creates a rim. Shot on iPhone front camera at arm's length.
Realistic skin texture, slight bloom on the highlights, shallow depth
of field on the background. Color palette: warm gold, soft cream, deep
shadow. Vertical 9:16.
What we swap: character, city skyline (or substitute kitchen, balcony, cafe window), the chin-cradle gesture (we sometimes swap for hand-on-cheek or chin-on-fist), the season cues in the wardrobe.
What we never change: the golden hour light source position (45 degrees behind the character, hitting the cheekbone), the iPhone front camera reference, the rim light callout.
Template 4: Group Chat Reaction
The single-person reaction template. Used heavily for any product whose value prop involves “a thing you would screenshot to your group chat.”
Photorealistic portrait of [character description] sitting cross-legged
on a couch in a small bright apartment, holding a phone in both hands
just below the chin. Expression: a quiet involuntary laugh as if
reacting to something a friend just said. One hand partially covering
the mouth. Warm sunlight from a nearby window throws a soft highlight
on the cheek. Background slightly blurred showing potted plants and a
beige wall. Shot on iPhone selfie camera, slight upward angle.
Realistic skin texture, no makeup gloss, candid feel. Color palette:
warm cream, soft green, peach skin highlights. Vertical 9:16.
What we swap: character, the laugh-trigger (we sometimes write the implied conversation cue into the prompt), the wardrobe, the apartment specifics (kitchen vs bedroom vs balcony).
What we never change: the cross-legged on-couch posture, the phone-below-chin holding position, the involuntary-laugh emotional beat. This combo reads more candid than any “smiling at phone” generic prompt.
Template 5: Desk / Workspace Story
This template is the “before/after” or “two timelines” workhorse. Used for productivity, creator economy, automation pitches.
Photorealistic portrait of [character description] at a small home
office desk, late evening, single warm desk lamp casting most of the
light. Two laptop screens visible at angle, one showing a complex edit
timeline (suggest, do not detail), the other showing a clean dashboard.
Expression: split between weariness on the left side of frame and
slight relief on the right. Coffee mug, notebook, headphones partially
in frame. Shot on 50mm at slight angle behind the shoulder. Realistic
skin texture, warm tungsten light tones, slight grain. Color palette:
amber, dark wood, deep shadow with warm highlight on the face. Vertical
9:16 for paid social, or 4:5 for feed.
What we swap: character, the contrast on the two screens (this template is good for any “before vs after using our product” beat), the wardrobe, the desk objects.
What we never change: the warm tungsten desk-lamp lighting, the over-the-shoulder camera angle, the two-screens visual.
How to maintain character consistency across a sprint
The biggest unlock once the templates are working: holding the same character across 10 to 20 different prompts.
Write the character description once. Save it at the top of your prompt file. Reuse it verbatim in every prompt in the sprint. Include one or two distinguishing features (a small mole on the right cheek, a specific hair texture, a particular eyebrow shape) and repeat them every time.
A character description that holds across a sprint looks like this:
CHARACTER (use verbatim in every prompt this sprint):
Korean woman, age 28. Long dark hair, side-parted, slightly wavy.
Almond eyes. Small mole on the right cheekbone. Slim build. Warm
medium-toned skin. Subtle natural makeup. Expression usually quiet and
introspective unless directed otherwise.
Reusing this exact paragraph at the top of every prompt in the sprint produces visibly the same person across 12 variants in roughly 90% of generations. The other 10% you re-roll. Cheap to do.
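Verbatim reuse is easy to break by accident, since one paraphrased description is enough to shift the face. The sketch below stores the paragraph once, puts it at the top of each scene prompt, and flags any prompt where it has been altered. The helper names and the `CHARACTER:` header layout are assumptions for illustration, not a required format.

```python
# The description from above, stored once and never retyped.
CHARACTER = (
    "Korean woman, age 28. Long dark hair, side-parted, slightly wavy. "
    "Almond eyes. Small mole on the right cheekbone. Slim build. Warm "
    "medium-toned skin. Subtle natural makeup. Expression usually quiet and "
    "introspective unless directed otherwise."
)


def with_character(scene_prompt: str, character: str = CHARACTER) -> str:
    """Put the saved description, unchanged, at the top of a scene prompt."""
    return f"CHARACTER:\n{character}\n\n{scene_prompt.strip()}"


def drifted(prompts: list, character: str = CHARACTER) -> list:
    """Return indexes of prompts that do not contain the description verbatim."""
    return [i for i, p in enumerate(prompts) if character not in p]


sprint = [
    with_character("Seated on a commuter train at dusk, looking away from the camera."),
    with_character("Leaning against a window during golden hour."),
    # A hand-edited prompt where someone paraphrased the description:
    "CHARACTER:\nKorean woman, late 20s, wavy dark hair.\n\nAt a home office desk, late evening.",
]
print(drifted(sprint))  # [2]
```

Running the check over the whole prompt file before a batch is cheaper than discovering the drift after the images come back.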
Common mistakes
- Asking for “natural” without anchoring it to a specific aesthetic. “Natural” is a cliche that the model can interpret in many ways. Anchor with a real reference (iPhone front camera, 35mm film, documentary photography).
- Over-describing the wardrobe. More than two sentences on clothing pulls the model toward fashion-photography aesthetics. Keep wardrobe to one short clause.
- Asking for “no AI look” in plain language. Does not work. The model does not know what AI looks like. Specify what you want instead (skin texture, depth of field, lens type).
- Skipping the camera setup. Without a focal length and angle, the model defaults to a generic head-on portrait that reads as posed.
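Three of the mistakes above are detectable with simple text checks. The following is a rough lint sketch, assuming keyword matching is good enough for a first pass. The anchor list, the phrase list, and the `lint_prompt` name are hypothetical, and the wardrobe-length check is omitted because it needs more than string matching.

```python
import re

# Anchors drawn from the examples in the list above; extend with your own references.
AESTHETIC_ANCHORS = ("iphone", "35mm", "50mm", "documentary")
NEGATIVE_AI_PHRASES = ("no ai look", "not ai", "doesn't look ai")
FOCAL_LENGTH = re.compile(r"\b\d{2,3}\s?mm\b", re.IGNORECASE)


def lint_prompt(prompt: str) -> list:
    """Return a warning for each common mistake detected in the prompt."""
    text = " ".join(prompt.lower().split())
    warnings = []
    if "natural" in text and not any(a in text for a in AESTHETIC_ANCHORS):
        warnings.append("'natural' used without a concrete aesthetic anchor")
    if any(p in text for p in NEGATIVE_AI_PHRASES):
        warnings.append("asks for 'no AI look' instead of naming what you want")
    if not FOCAL_LENGTH.search(text) and "iphone" not in text:
        warnings.append("no camera setup: add a focal length or device and an angle")
    return warnings


for warning in lint_prompt("Korean woman, 28, in a kitchen, natural look, no AI look"):
    print(warning)
```

Keyword checks like these will miss rephrasings, so treat a clean result as a floor, not as proof that the prompt is specific enough.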
Where to take this next
These five templates cover roughly 80% of what we ship for paid social. The other 20% are bespoke prompts written from scratch for specific briefs.
If you want to see the full workflow these templates plug into (creative direction, image generation, animation through Seedance 2.0, batching through Lovart), our AI performance creative case study walks through the campaign these prompts produced. Or if you want this running for your account, our AI performance creative service is the front door.