Why AI edits the parts of the image you did not ask it to

2026-05-13

You generate an image. The face is perfect. Exactly the expression you wanted, first try. The rest is a disaster: mangled hands, mushy background, bad lighting. So you type an edit and roll again.

The hands are fixed. The face is gone.

You still have roll one - it’s sitting in your history. But you can’t pin down the one part you loved while fixing the rest. Every attempt to repair the hands rolls a new face. So you sit there for forty more rolls, hoping lightning strikes twice in the same frame. It never does.

History is not control

The image didn’t vanish. The generations are still there. You can scroll back to roll one anytime.

But you can’t hold it still. You ask for one specific fix, and the model scrambles the whole board. You want different lighting; the composition wanders off. You want longer hair; the face mutates. You want a new background; the subject loses the pose you spent six rolls setting up.

There’s no reliable “…and keep everything else exactly the same.” You aren’t editing. You’re asking for a new image and praying the good parts come back.

Why it drifts

A prompt isn’t an edit. It’s a set of pressures - identity, pose, lighting, style, composition - pulling the pixels out of static.

When you change the prompt, you change the forces shaping the entire image. Even a tiny text change pushes the math down a different path. The person shifts, the expression slips, the pose loosens.

Locking the seed doesn’t save you. The seed fixes the starting static, not the destination. Same seed plus a changed prompt still wanders, because you’re steering somewhere else.
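A toy sketch of that point, in Python. This is not how a real diffusion sampler works; it assumes a hypothetical generate function in which the seed only picks the starting noise and the prompt text decides where every pixel gets pulled. The names prompt_target and generate, and all the numbers, are made up for illustration.

```python
import random

def prompt_target(prompt, size):
    # Assumption: the whole prompt string shapes the pull on every pixel.
    rng = random.Random(prompt)
    return [rng.random() for _ in range(size)]

def generate(seed, prompt, size=8, steps=30, rate=0.3):
    # The seed fixes only the starting static.
    rng = random.Random(seed)
    pixels = [rng.random() for _ in range(size)]
    target = prompt_target(prompt, size)
    # Each step moves every pixel a little toward the prompt's target.
    for _ in range(steps):
        pixels = [p + rate * (t - p) for p, t in zip(pixels, target)]
    return pixels

first = generate(7, "woman, soft smile, sunset")
repeat = generate(7, "woman, soft smile, sunset")
edited = generate(7, "woman, soft smile, sunset, fixed hands")

# Same seed and same prompt reproduce the image exactly.
print("identical rerun:", first == repeat)

# Same seed with an edited prompt lands somewhere else.
moved = sum(1 for a, b in zip(first, edited) if abs(a - b) > 1e-6)
print(moved, "of", len(first), "pixels moved")
```

The toy exaggerates how completely the target changes, but the shape of the problem is the same: the seed is the starting point, and the prompt is the destination.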

The image didn’t drift because the AI forgot. It drifted because you changed the target.

The overcooking problem

Drift is only half the battle. Repeated edits don’t just accumulate noise. They accumulate the model’s biases.

Every pass gives the model another excuse to decide what the image should look like. And it loves polish.

One edit might improve things. Ten edits push those tendencies to the extreme: hyper-contrast, blown-out saturation, plastic skin, aggressively centered compositions. Generically perfect garbage.

That’s overcooking: not random degradation, but the model’s normalization applied again and again.

Erased by the average

Models hate tension. They prefer the dead center of the bell curve. A slightly off-center subject creeps toward the middle. A real asymmetry gets “fixed.” A weird crop becomes a standard portrait.

The model mistakes your intent for an error.

This is brutal on delicate details. A soft expression, a strange-but-working hand pose, a fragile lighting effect - the model won’t see them as choices. It sees them as problems to be solved, crushing them into something predictable.

Prompts amplify whatever they name

Every edit changes what the model pays attention to.

Say the original image had a woman, freckles, a soft smile, sunset lighting, and an old camera. Then you ask: “fix the hands.” Now hands are salient, and everything else is comparatively less protected. Nothing in that instruction says the freckles must stay identical. Nothing says the smile is more important than the correction.

Semantic prompts can also over-complete. Ask for rain and the model may not only add water droplets. It may add the whole statistical package of “rain scene”: umbrellas, puddles, reflections, dark clouds, wet jackets, people looking upward. Ask for a teapot twice and the model may decide teapots are the point of the image now.

The model normalizes its own normalization

The feedback loop is the important part.

Each generation starts from an image that has already been nudged toward the model’s preferred look. Then the next pass nudges that version again. Then the next pass nudges the nudged version.

It is like saving a JPEG over itself, except the artifact is not blocky compression. The artifact is the model’s accumulated statistical prior.
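The loop can be sketched with a few numbers. This is a simplified illustration, not a measurement of any real model: it assumes each pass pulls every feature a fixed fraction of the way toward the model's preferred value. The function edit_pass, the pull strength, and the feature values are all hypothetical.

```python
def edit_pass(image, prior, pull=0.15):
    # Assumption: one edit moves each feature 15% of the way to the prior.
    return [x + pull * (p - x) for x, p in zip(image, prior)]

def distance(image, prior):
    # Largest remaining deviation from the model's preferred look.
    return max(abs(x - p) for x, p in zip(image, prior))

# Hypothetical features: off-center framing, strong asymmetry, neutral tone.
image = [0.20, 0.95, 0.50]
prior = [0.50, 0.50, 0.50]

before = distance(image, prior)
for _ in range(10):
    image = edit_pass(image, prior)
after = distance(image, prior)

print("deviation before:", round(before, 3))
print("deviation after 10 passes:", round(after, 3))
```

No single pass looks destructive in this sketch. The damage comes from compounding: every pass removes a slice of whatever made the image different from the average.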

Every generation is another vote for the model’s opinion of what your image should have been.

Modern models do local edits now

Instruct-edit models like GPT-Image-2, Gemini, Flux, Qwen, SeeDream, and Grok are built to change one thing and leave the rest. They drift far less than the prompt-only tools of a few years ago. That progress is real.

Preservation is still imperfect, though, and the burden is still on you to know which mode you’re in: full regeneration, image-to-image variation, masked edit, reference-guided edit, or multi-turn conversational edit. Each one decides how much of the image the model is allowed to reinterpret. Ask for “warmer lighting” in the wrong mode and identity, pose, and composition can still shift underneath you. The frustration isn’t that editing is impossible now; it’s that you can’t always tell how much the model is about to reinvent.

What’s missing is control

Not creativity. Not “unlocking your inner artist.” Not vibes.

Control. The plain kind. This part stays; that part changes. A win stays a win instead of going back into the roll.

Painters get this, but not by magic - they get it through layers, masks, selections, and undo. They do repaint and scrape back; the difference is they choose exactly what survives. Prompt-only generation takes that choice away and hands the whole frame back to the dice every time.

The fix isn’t asking the model to remember what worked. It’s pinning what worked into pixels and letting the model touch only the parts you point it at. Once the face, hands, pose, or composition are locked into layers, the model no longer has to infer them from the prompt. It only has to harmonize what is already there.

Preserve the win the way mature tools always have: isolate it, protect it on a layer, and let the AI fix only the joins.
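In pixel terms, that protection is a masked composite. Here is a minimal sketch, assuming flat lists of pixel values and a hypothetical mask where 0.0 means locked and 1.0 means the model may repaint. Real tools do this per channel on full images and usually feather the mask edges, but the arithmetic is the same idea.

```python
def composite(original, generated, mask):
    # Locked pixels (mask 0.0) come from the original.
    # Editable pixels (mask 1.0) come from the new generation.
    return [m * g + (1.0 - m) * o
            for o, g, m in zip(original, generated, mask)]

# Hypothetical four-pixel image: first two are the face, last two the hands.
original = [0.8, 0.7, 0.2, 0.1]
generated = [0.3, 0.4, 0.6, 0.9]   # a new roll that changed everything
mask = [0.0, 0.0, 1.0, 1.0]        # lock the face, open the hands

result = composite(original, generated, mask)

print("face kept:", result[:2] == original[:2])
print("hands replaced:", result[2:] == generated[2:])
```

The model can roll whatever it likes for the face. The mask decides that none of it reaches the final image.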

The companion piece - Fix part of an AI image without regenerating it - shows the actual move.