02

"AI can generate nice images" ≠ "AI images can go live as product content"

Company: IKEA Digital Team
Year: 2026
Type: AI Product · Quality Boundary Definition
Role: Product Intern / AIGC Quality & Scenario Analysis

IKEA's Content Genie explored 3 AI paths for product image generation. I tested 93 replacement cases to define the exact boundary where AI results are stable enough to ship — turning a tech capability into a productized feature with clear constraints.

"AI can generate nice images" ≠ "AI images can go live as product content"

IKEA China's content team produces PDP detail images, lifestyle inspiration scenes, and localized marketing assets for 3,000+ product ranges every year. The traditional path is photoshoots: studio + photographer + props + post-production = thousands of RMB per set, weeks of lead time.

When AI image generation matured, the team began exploring three technical paths: Prompt + LLM (natural language generation), LoRA + LLM (fine-tuned model for product replacement), and Depth Image + 3D + LLM (depth map with 3D model fusion). Demos looked great — but between a demo and "can replace photoshoots for e-commerce pages" lies an entire productization engineering challenge.

The core tension: AI generation is unstable. The Prompt path may repaint the image or hallucinate; LoRA only works for same-category, same-size items; the 3D path is accurate but slow (2 min vs 30 sec). The product question isn't "can AI generate"; it's "under what conditions can the output be used directly."

01

Size is the key constraint

Technical spikes showed that the closer the bounding-box dimensions of the original and replacement products (≥95% similar), the higher the replacement success rate. A size mismatch causes the AI to misplace products, which makes size the hard boundary for productization.

02

Difference actually helps

The greater the difference in color, material, and form between the replacement and the original product, the easier it is for the AI to identify the product and generate quality results. "Similar size + different appearance" is the optimal input combination.

03

Massive cost at stake

PDP 5.0 alone (140 ranges) already saves 4.2M RMB in shooting costs via AI generation. VSPR + Bundle represent 7.8M RMB in incremental sales potential. The ROI bottleneck isn't tech capability — it's quality stability.

93 test cases later — I mapped where AI generation is stable enough to ship

As a product intern on the Content Genie team, I worked on the most critical gap between "tech validation" and "product launch": generation quality testing and scenario boundary definition for AIGC product images.

My core task: systematically testing 93 product replacement cases for the 95% similarity feature. Each case includes original scene images and AI-generated replacement results. My evaluation dimensions: product size accuracy, color fidelity, lighting naturalness, edge blending quality, and whether the output meets "ready for e-commerce display" standards.

This wasn't about glancing and saying "looks good" — it was about defining the product's usability boundary: which categories, size ratios, and scene complexities produce stable results; which need human adjustment; which the current tech path can't handle yet.
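One way to record a case against the five dimensions above is sketched here. The 1-to-5 scale, the two thresholds, and the names `CaseScore` and `verdict` are all hypothetical; the text names the dimensions and the three outcomes but not the scoring scheme that was actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseScore:
    """One reviewed case; the four graded fields use a hypothetical 1-5 scale."""
    size_accuracy: int
    color_fidelity: int
    lighting_naturalness: int
    edge_blending: int
    display_ready: bool  # reviewer's call: ready for e-commerce display?

    def lowest(self) -> int:
        """Weakest graded dimension of this case."""
        return min(self.size_accuracy, self.color_fidelity,
                   self.lighting_naturalness, self.edge_blending)


def verdict(score: CaseScore, ship_min: int = 4, adjust_min: int = 3) -> str:
    """Map a scored case to one of the three outcomes named in the text."""
    if score.display_ready and score.lowest() >= ship_min:
        return "stable"
    if score.lowest() >= adjust_min:
        return "needs_human_adjustment"
    return "not_supported"


print(verdict(CaseScore(5, 4, 4, 5, True)))    # stable
print(verdict(CaseScore(5, 4, 3, 5, False)))   # needs_human_adjustment
print(verdict(CaseScore(2, 5, 5, 5, True)))    # not_supported
```

The sketch uses the weakest dimension rather than an average, so a single bad dimension (for example, a wrong product size) cannot be masked by good scores elsewhere. That is a design assumption, not a documented rule.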

Quality analysis across 150+ cases

Beyond the 93 replacement tests, I analyzed ~150 AIGC cases across task types — color changes (PDP 5.0), prop-in replacement, background extension — forming a structured quality map by category × task type × failure mode.
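A quality map of this shape can be sketched as a nested counter. The function names and the sample rows are illustrative assumptions, not the team's actual data or tooling.

```python
from collections import Counter, defaultdict


def build_quality_map(cases):
    """Group cases by (category, task_type) and count failure modes.

    Each case is a (category, task_type, failure_mode) tuple; failure_mode is
    None when the case passed.
    """
    quality_map = defaultdict(Counter)
    for category, task_type, failure_mode in cases:
        quality_map[(category, task_type)][failure_mode or "pass"] += 1
    return quality_map


def pass_rate(quality_map, category, task_type):
    """Share of passing cases in one cell, or None if the cell is empty."""
    cell = quality_map.get((category, task_type))
    if not cell:
        return None
    return cell["pass"] / sum(cell.values())


# Made-up rows for illustration only.
sample = [
    ("chair", "replacement", None),
    ("chair", "replacement", "edge_blending"),
    ("chair", "replacement", "edge_blending"),
    ("chair", "replacement", None),
    ("cabinet", "color_change", None),
]
qmap = build_quality_map(sample)
print(pass_rate(qmap, "chair", "replacement"))  # 0.5
```

Keying each cell by category and task type keeps the failure-mode counts next to the pass count, so a weak cell can be traced back to its dominant failure mode.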

Independent PRD from pain point discovery

During testing I identified a high-frequency bottleneck: Tmall hero image resizing. I independently authored the PRD for batch-cropping 24,000 product images across 6 channel specs, cutting ~2,400 person-days of manual work down to hours of automated processing.
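The core of such a batch crop is computing, for each channel spec, the largest centered box with the target aspect ratio. The sketch below assumes centered cropping and uses three placeholder specs; the six real channel specs are not listed in the text, and `center_crop_box`, `crop_plan`, and `CHANNEL_SPECS` are hypothetical names.

```python
def center_crop_box(width: int, height: int,
                    target_w: int, target_h: int) -> tuple[int, int, int, int]:
    """Largest centered box (left, top, right, bottom) with ratio target_w:target_h."""
    if min(width, height, target_w, target_h) <= 0:
        raise ValueError("all dimensions must be positive")
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if width * target_h > height * target_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_w, crop_h = height * target_w // target_h, height
    else:
        # Source is taller (or equal): keep full width, trim top and bottom.
        crop_w, crop_h = width, width * target_h // target_w
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


# Placeholder aspect ratios; the real channel specs are not given in the text.
CHANNEL_SPECS = {
    "square_1x1": (1, 1),
    "portrait_3x4": (3, 4),
    "landscape_16x9": (16, 9),
}


def crop_plan(width: int, height: int, specs=CHANNEL_SPECS):
    """One crop box per channel spec for a single source image."""
    return {name: center_crop_box(width, height, w, h)
            for name, (w, h) in specs.items()}


print(crop_plan(2000, 1500)["square_1x1"])  # (250, 0, 1750, 1500)
```

The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so the plan could be handed to an imaging step directly. A production version would likely need subject-aware rather than purely centered cropping.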

Three AI paths tested — only one is production-ready for product replacement

The team explored three technical approaches to AI content generation. My testing work directly served the product decision of which path to ship first and under what constraints:

Path A — Prompt + LLM

Natural language instructions like "change the food in the pot to fried rice." Works for inspiration and ideation, but may repaint the image, produce hallucinations, and deliver unstable results. Cannot do precise product replacement.

Path B — LoRA + LLM (my focus)

A fine-tuned model trained on official IKEA product images. It replaces same-category, same-size items reliably (this is the 95% similarity replacement) but fails when the target item has significantly different dimensions.

Path C — Depth + 3D + LLM

Rebuilds scene in 3D with depth info, merges 3D models precisely. Highest accuracy, solves occlusion — but 4× slower (2 min vs 30 sec), more steps, depends on 3D model quality. Future direction.

“My testing confirmed the product decision: Path B (LoRA) ships first, constrained to 95% similar-size same-category products. Path C becomes the FY26 H2 exploration. This isn't tech selection — it's defining under what conditions AI results can be trusted.”
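The Path B constraint (same category, 95% similar size) can be expressed as a filter over a product catalog. The text does not say how the similarity score is computed, so the formula below (the worst per-axis ratio of the shorter to the longer bounding-box dimension) is an assumption, as are the `Product` type, the function names, and the made-up catalog rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    product_id: str                       # made-up IDs in the example below
    category: str                         # same-category check uses exact match
    dims_cm: tuple[float, float, float]   # bounding box: width, depth, height


def size_similarity(a: Product, b: Product) -> float:
    """Worst per-axis ratio (shorter / longer), in (0, 1].

    Assumed definition; the source only states the 95% threshold.
    """
    if any(d <= 0 for d in a.dims_cm + b.dims_cm):
        raise ValueError("dimensions must be positive")
    return min(min(x, y) / max(x, y) for x, y in zip(a.dims_cm, b.dims_cm))


def eligible_replacements(product_id: str, catalog: list[Product],
                          threshold: float = 0.95) -> list[str]:
    """IDs of same-category products that meet the size threshold, best first."""
    by_id = {p.product_id: p for p in catalog}
    original = by_id[product_id]  # raises KeyError for an unknown ID
    scored = [
        (size_similarity(original, p), p.product_id)
        for p in catalog
        if p.product_id != product_id and p.category == original.category
    ]
    scored.sort(reverse=True)
    return [pid for score, pid in scored if score >= threshold]


catalog = [
    Product("P-001", "chair", (45.0, 50.0, 90.0)),
    Product("P-002", "chair", (46.0, 50.0, 90.0)),   # 45/46 is about 0.978
    Product("P-003", "chair", (60.0, 50.0, 90.0)),   # 45/60 = 0.75
    Product("P-004", "table", (45.0, 50.0, 90.0)),   # wrong category
]
print(eligible_replacements("P-001", catalog))  # ['P-002']
```

Taking the worst axis instead of a volume ratio means a product that matches in width and depth but is much taller is still rejected, which fits the observation that size mismatch leads to misplaced products.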

Content Space four-quadrant problem definition — Content Accessibility, Creation, Effectiveness, Distribution

The Content Space capability landscape. AIGC sits within Content Creation — but its quality determines whether generated assets can enter the distribution pipeline or stay as mere references.

Content Space landscape — from source to distribution, AI enters at every layer

Content Space isn't just one product — it's an ecosystem with four capability layers. Understanding where AI fits in each layer was essential for prioritizing what to build and what to test.

Content Accessibility

440K+ global images, 16K+ local assets centralized in one space. AI-powered tagging completed 4M tags in 3 months, saving 35,000 working hours. Natural language search already live.

Content Creation (my focus)

AIGC generation (replace items, change colors, add props, image-to-video) + Digital Templates (36,000+ content pieces batch-produced per tertial). This is where 95% similarity replacement sits, and where my 93 test cases lived.

Content Effectiveness

Personalized content enabler + performance dashboards + insight analysis. AI analyzes KOS posts for trending topics, popular products, and keywords — feeding back into what content to generate next.

Content Distribution

Automated distribution to Tmall, JD, Red, TikTok, WeChat, APP, SMS. The batch-crop PRD I authored addresses this layer — one image becomes six channel-ready formats automatically.

“Generation quality determines whether AIGC content can enter the distribution pipeline — or stays as mere "inspiration references." My testing defined that boundary: at what quality threshold can we confidently push AI-generated images to live e-commerce channels.”

Content Space Landscape — full ecosystem from Source to Capabilities to Integration

The Content Space ecosystem: Sources → Capabilities (Accessibility / Creation / Effectiveness) → Distribution to omni-channels. AI-generated content must pass quality gates before entering distribution.

Five use cases ranked — by quality stability and business value

Based on team testing results and the FY26 roadmap, Content Genie's AIGC capabilities are prioritized from "single deterministic operations" (color change) to "constrained replacements" (95% similarity) to "multi-step compositions" (VSPR) to "end-to-end orchestration" (Campaign Studio). Each step's input-output certainty decreases, so the productization order follows accordingly.

01

PDP 5.0 Color Change

Already live. Most deterministic (white-background → recolor → white-background). Quality auto-checkable. 140 ranges covered, 4.2M RMB shooting cost saved. Cost: 0.5 RMB/image.

02

95% Similarity Replacement

Development complete, in testing (my work). Constrained to same-category + 95% bounding-box match. LoRA model produces stable quality. User inputs product ID → system auto-matches eligible replacements.

03

VSPR Inspiration Images

100 sets PAX/BILLY/BESTA completed. Flow: design trending combination → white-background rendering → AI replaces product in scene → publish to channels. Est. 3.8M RMB incremental sales.

04

Image to Video

In preparation for testing. Generate 3–8 sec product intro videos from static images. Risk: motion naturalness, product structure stability. Current: AI livestream cutting already produces 10,000+ videos/year.

05

Campaign AI Studio (Vision)

AI Agent orchestrates: brief → audience targeting → product selection → content generation → landing page → distribution. Most complex, most dependencies. Positioned as long-term north star after scenarios 1–4 are proven.

AI scene editing capability — removing objects from lifestyle scenes with natural language instructions

Content Genie in action: AI-powered scene editing with structured inputs. The channel team uses it at ~15 min/image; HFRD checks quality at ~3 min/image.

Why "95% similar size" is a product decision, not a tech metric

"95% similarity" looks like a technical parameter. It's actually a product boundary definition that means different things to different stakeholders:

For users

"You input a product ID, and the system auto-matches eligible replacement candidates — not any product you want, only those the system can generate stably." The constraint IS the UX.

For business

"Not every replacement request can be AI-generated — only same-category items with similar bounding-box dimensions. This is our current capability boundary, and we're transparent about it."

For engineering

"LoRA training sets are grouped by HFB category + size. Each group has its own quality threshold. The multi-model routing (Gemini for add/remove, Qwen FC for IKEA replacement, dedicated upscale service) reflects task-specific optimization."

“The key to productizing AI isn't "make AI do more" — it's clearly telling users what AI can and can't do, then guaranteeing quality within that boundary. My 93 test cases defined exactly where that line sits.”
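The engineering view above (task-specific model routing plus a quality threshold per training group) could be captured in a small configuration like the one below. The backend strings are labels taken from the text, not real API identifiers, and the threshold values, size bands, and all names are hypothetical.

```python
from enum import Enum


class Task(Enum):
    ADD_OBJECT = "add"
    REMOVE_OBJECT = "remove"
    REPLACE_PRODUCT = "replace"
    UPSCALE = "upscale"


# Task-to-backend mapping as described in the text. The strings are labels,
# not real API identifiers.
ROUTES = {
    Task.ADD_OBJECT: "Gemini",
    Task.REMOVE_OBJECT: "Gemini",
    Task.REPLACE_PRODUCT: "Qwen FC",
    Task.UPSCALE: "upscale service",
}

# Hypothetical per-group thresholds keyed by (HFB category, size band).
GROUP_THRESHOLDS = {
    ("chairs", "small"): 0.90,
    ("cabinets", "large"): 0.95,
}
DEFAULT_THRESHOLD = 0.95


def route(task: Task) -> str:
    """Return the backend label responsible for a task type."""
    return ROUTES[task]


def meets_group_threshold(category: str, size_band: str, quality_score: float) -> bool:
    """Check a quality score against its group's threshold (default if unlisted)."""
    threshold = GROUP_THRESHOLDS.get((category, size_band), DEFAULT_THRESHOLD)
    return quality_score >= threshold


print(route(Task.REPLACE_PRODUCT))                      # Qwen FC
print(meets_group_threshold("chairs", "small", 0.92))   # True
print(meets_group_threshold("tables", "medium", 0.92))  # False
```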

Natural language search — AI-powered content discovery within Content Space

A parallel capability already live: natural language asset search. Teams find existing assets before generating new ones — reducing unnecessary generation and its associated quality risk.

Test results → shipping criteria → 4.2M RMB saved

My work wasn't academic research — it directly fed into product shipping decisions and business value realization.

Testing supported launch decisions

My 93-case analysis directly informed which categories to open first (chairs, tables, cabinets), which need more training data (complex assembled furniture), and which scenes to exclude (high-density occlusion). This became the shipping criteria.

Business value already realized

PDP 5.0 (the most mature color-change capability) covers 140 ranges, saves 4.2M RMB, and produces 370+ sets / 1,000+ images per year at 0.5 RMB/image. 95% replacement, as the Tier 2 capability, will further expand the range of products whose images AI can produce.

Product thinking, not tech thinking

My work helped establish the principle: "it's not model capability that decides launch — it's business quality standards." This is exactly the Content Genie team's positioning: 'Win reputation, create real and solid value result under certain business cases.'

AI product = define the boundary, not push the model

01

Product boundary > tech capability

At IKEA, the key to AI productization isn't "what the model can do" — it's "under what constraints can results be trusted." The 95% number comes from hundreds of validated cases, not guesswork.

02

Testing is product definition

My 93 test cases weren't QA bug-hunting — they were answering: "under what conditions can we tell users this feature is usable?" That's one of the core jobs of a product manager.

03

Cost structure drives priority

AI costs 0.5 RMB/image versus thousands of RMB/set for photoshoots. But this cost advantage only holds when AI quality reaches "directly usable"; otherwise it's back to shooting. Finding that exact quality threshold is the PM's job.
