Defining when AI product images are trusted to ship

Company

IKEA Digital Team

Year

2026

Type

AI Product · Trust Boundary Design

Role

Product Intern / AIGC Quality & Scenario Analysis

The work was not about making AI images look impressive. It was about defining the quality boundary at which generated assets become safe enough for e-commerce workflows, so that every asset is shipped, reviewed, or rejected under clear rules.

01 — Unsafe To Ship

AI images can look realistic and still be unsafe to ship.

I evaluated 93 product replacement tests and 150+ AIGC outputs, turning AI image review from a question of “does it look good?” into reusable quality criteria for shipping decisions.

A generated scene may look polished while quietly changing the product’s color, proportion, material, structure, or usage context. In e-commerce, that is not just a visual flaw. It is a product-truth risk.

93 product replacement tests

150+ AIGC outputs reviewed

3 shipping states defined

Risk Comparison: Original SKU → AI Scene → Risk Annotation

[Figure: the original SKU image, the AI-generated scene, and the review annotation, shown in sequence. Annotations: color drift, shape changed, material mismatch, review required.]

The problem is not that AI images look bad. It is that they can quietly stop representing the same SKU.

02 — Failure Pattern Board

The real risk is subtle rewriting of product truth.

The riskiest outputs were often not obviously broken images. They were the images that looked acceptable at first glance while product information had already changed.

So I stopped treating these failures as visual defects and started treating them as product-truth risks: color drift, material mismatch, structural repainting, scale distortion, lighting inconsistency, and unstable edge blending.

Failure Pattern Board: Six failure types worth reviewing

Color Drift: Product color changed

Scale Drift: Product proportion shifted

Structural Distortion: Geometry no longer matches SKU

Material Mismatch: Surface texture was rewritten

Lighting Inconsistency: Scene lighting breaks realism

Edge Blending Issue: Product boundary looks unstable

Each card isolates one failure type so review language becomes reusable instead of subjective.

03 — Production Paths

I was not comparing model capability. I was comparing production paths.

Comparing Prompt + LLM, LoRA + LLM, and Depth / 3D + LLM as isolated technical stacks was not useful. The product question was which path could actually enter a real content production workflow.

What mattered was not technical novelty, but whether each path was better suited for inspiration, controlled replacement, or future high-fidelity production.

AI Path Matrix: Three paths, three product decisions

Prompt + LLM

Best For

Inspiration / scene exploration

Main Risk

Product repainting, hallucination, scale drift

Product Decision: Reference Only

LoRA + LLM

Best For

Controlled replacement within similar categories

Main Risk

Depends on source quality, samples, and scene similarity

Product Decision: Human Review

Depth / 3D + LLM

Best For

High-fidelity product image production

Main Risk

Higher cost, asset dependency, and process complexity

Product Decision: Future Investment

This matrix expresses product trade-offs, not technical evangelism.
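As a rough illustration, the matrix above can be kept as plain data so that the product decision for each path is looked up rather than re-argued. This is a minimal sketch: the `ProductionPath` type and the `paths_with_decision` helper are hypothetical names introduced here, and only the cell values come from the matrix itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductionPath:
    """One row of the path matrix (hypothetical structure)."""
    name: str
    best_for: str
    main_risk: str
    decision: str


# Cell values copied from the matrix above.
PATHS = (
    ProductionPath(
        name="Prompt + LLM",
        best_for="Inspiration / scene exploration",
        main_risk="Product repainting, hallucination, scale drift",
        decision="Reference Only",
    ),
    ProductionPath(
        name="LoRA + LLM",
        best_for="Controlled replacement within similar categories",
        main_risk="Depends on source quality, samples, and scene similarity",
        decision="Human Review",
    ),
    ProductionPath(
        name="Depth / 3D + LLM",
        best_for="High-fidelity product image production",
        main_risk="Higher cost, asset dependency, and process complexity",
        decision="Future Investment",
    ),
)


def paths_with_decision(decision: str) -> list[str]:
    """Return the names of all paths that carry the given product decision."""
    return [p.name for p in PATHS if p.decision == decision]
```

For example, `paths_with_decision("Human Review")` returns the single path whose output needs a reviewer before it can be used.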

04 — Quality Criteria

Turning “it feels wrong” into reusable quality criteria.

The hardest part of image review is that everyone can say an image feels off, but it is much harder to explain why it cannot be used.

I broke subjective review into reusable dimensions: color, proportion, structure, material, lighting, edge blending, and e-commerce usability.

Quality Checklist: From taste-based review to review language

AI Generated Image: Review Surface

Color Fidelity: Pass / Risk / Fail

Size Accuracy: Pass / Risk / Fail

Structure Consistency: Pass / Risk / Fail

Material Match: Pass / Risk / Fail

Lighting Naturalness: Pass / Risk / Fail

Edge Blending: Pass / Risk / Fail

E-commerce Usability: Pass / Risk / Fail

The point was not to build a scoring system, but to establish a stable review vocabulary.
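One way to make that vocabulary concrete is to record each review as a fixed set of dimensions with a Pass / Risk / Fail verdict and an optional note, with no numeric score. The sketch below is illustrative only: the `ImageReview` record, its method names, and the snake_case dimension keys are assumptions, not the format the team actually used.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    PASS = "Pass"
    RISK = "Risk"
    FAIL = "Fail"


# The seven review dimensions from the checklist above (key names are assumed).
DIMENSIONS = (
    "color_fidelity",
    "size_accuracy",
    "structure_consistency",
    "material_match",
    "lighting_naturalness",
    "edge_blending",
    "ecommerce_usability",
)


@dataclass
class ImageReview:
    """Verdicts for one generated image (hypothetical record)."""
    image_id: str
    verdicts: dict[str, Verdict] = field(default_factory=dict)
    notes: dict[str, str] = field(default_factory=dict)

    def record(self, dimension: str, verdict: Verdict, note: str = "") -> None:
        # Reject free-form labels so reviewers stay inside the shared vocabulary.
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown review dimension: {dimension}")
        self.verdicts[dimension] = verdict
        if note:
            self.notes[dimension] = note

    def missing(self) -> list[str]:
        """Dimensions that have not been reviewed yet, in checklist order."""
        return [d for d in DIMENSIONS if d not in self.verdicts]

    def flagged(self) -> dict[str, Verdict]:
        """Dimensions marked Risk or Fail, in checklist order."""
        return {
            d: self.verdicts[d]
            for d in DIMENSIONS
            if self.verdicts.get(d) in (Verdict.RISK, Verdict.FAIL)
        }
```

A reviewer would call `record("color_fidelity", Verdict.RISK, "reads warmer than the SKU swatch")`, and `flagged()` then lists exactly the dimensions that need explaining, which is the part "it feels wrong" never captured.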

05 — Trust Boundary

The key judgment: Ship, Human Review, or Reference Only.

I grouped generated outputs into three states: ready to ship, requiring human review, or reference-only. This was the product boundary that turned AI from a generation tool into a controlled workflow.

Trust Boundary: Three routing decisions for every generated output

Ship

Stable color · Stable structure · Accurate proportion

Human Review

Minor color risk · Edge uncertainty · Scene relevance risk

Reference Only

Product repainted · Structure changed · Misleading SKU

This is the strongest product-principle graphic on the page.
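The three states can be read as a routing rule over per-dimension verdicts. The sketch below assumes one possible rule: any Fail sends the image to Reference Only, any Risk sends it to Human Review, and only an all-Pass review ships. That mapping is an assumption made for illustration, as are the `Route` and `route` names; the actual boundary was defined through the review criteria described above, not by this function.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "Pass"
    RISK = "Risk"
    FAIL = "Fail"


class Route(Enum):
    SHIP = "Ship"
    HUMAN_REVIEW = "Human Review"
    REFERENCE_ONLY = "Reference Only"


def route(verdicts: dict[str, Verdict]) -> Route:
    """Map per-dimension verdicts to one routing state (assumed rule)."""
    if not verdicts:
        # Nothing has been reviewed yet, so the image is never shipped by default.
        return Route.HUMAN_REVIEW
    values = list(verdicts.values())
    if any(v is Verdict.FAIL for v in values):
        # Assumption: a Fail means the image no longer represents the SKU.
        return Route.REFERENCE_ONLY
    if any(v is Verdict.RISK for v in values):
        return Route.HUMAN_REVIEW
    return Route.SHIP
```

Checking for Fail before Risk keeps the rule conservative: an image with both a minor color risk and a changed structure is routed by its worst finding.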

06 — Reflection

An AI PM’s value is not in believing what the model can do. It is in defining when the model should not be trusted.

For AIGC content production, model capability only becomes product value when it enters a workflow that is reviewable, explainable, and accountable.

The core of the workflow is not reducing how many images humans have to inspect. It is helping the team know which images can be shipped, which need review, and which must never enter production.

AI capability becomes product capability only after its trust boundary is explicitly designed.
