02 "AI can generate nice images" ≠ "AI images can go live as product content"
IKEA's Content Genie explored 3 AI paths for product image generation. I tested 93 replacement cases to define the exact boundary where AI results are stable enough to ship — turning a tech capability into a productized feature with clear constraints.
01 — Problem Statement
"AI can generate nice images" ≠ "AI images can go live as product content"
IKEA China's content team produces PDP detail images, lifestyle inspiration scenes, and localized marketing assets for 3,000+ product ranges every year. The traditional path is photoshoots: studio + photographer + props + post-production = thousands of RMB per set, weeks of lead time.
When AI image generation matured, the team began exploring three technical paths: Prompt + LLM (natural language generation), LoRA + LLM (fine-tuned model for product replacement), and Depth Image + 3D + LLM (depth map with 3D model fusion). Demos looked great — but between a demo and "can replace photoshoots for e-commerce pages" lies an entire productization engineering challenge.
The core tension: AI generation is unstable. The Prompt path may repaint the image or hallucinate; the LoRA path only works for same-category, same-size items; the 3D path is accurate but slow (2 min vs 30 sec). The product question isn't "can AI generate" but "under what conditions can the output be used directly."
01
Size is the key constraint
Technical spikes showed that the closer the bounding-box dimensions of the replacement and original products (≥95% similar), the higher the replacement success rate. A size mismatch causes the AI to misplace the product, which makes size the hard boundary for productization.
02
Difference actually helps
The greater the color, material, and form difference between the replacement and the original product, the easier it is for the AI to identify the item to replace and generate quality results. "Similar size + different appearance" is the optimal input combination.
03
Massive cost at stake
PDP 5.0 alone (140 ranges) already saves 4.2M RMB in shooting costs via AI generation. VSPR + Bundle represent 7.8M RMB in incremental sales potential. The ROI bottleneck isn't tech capability — it's quality stability.
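The source does not define how the 95% size figure is computed. As a minimal sketch, assuming it means every bounding-box axis of the candidate is within 95% of the original, an eligibility check might look like the following. `BoundingBox`, `size_similarity`, and `is_eligible` are hypothetical names, and the worst-axis ratio is one possible definition, not the team's confirmed metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned product bounding box (hypothetical type), in centimetres."""
    width: float
    depth: float
    height: float


def size_similarity(a: BoundingBox, b: BoundingBox) -> float:
    """Return the worst per-axis ratio (smaller / larger), a value in (0, 1].

    Assumption: similarity is judged on the least similar axis, so one badly
    mismatched dimension is enough to disqualify a candidate.
    """
    pairs = [(a.width, b.width), (a.depth, b.depth), (a.height, b.height)]
    if any(x <= 0 or y <= 0 for x, y in pairs):
        raise ValueError("all dimensions must be positive")
    return min(min(x, y) / max(x, y) for x, y in pairs)


def is_eligible(original: BoundingBox, candidate: BoundingBox,
                threshold: float = 0.95) -> bool:
    """True when the candidate is at least `threshold` similar in size."""
    return size_similarity(original, candidate) >= threshold


if __name__ == "__main__":
    original = BoundingBox(width=80, depth=40, height=100)
    print(is_eligible(original, BoundingBox(78, 40, 100)))  # close in size
    print(is_eligible(original, BoundingBox(60, 40, 100)))  # width differs a lot
```

Taking the minimum across axes is a deliberately strict choice: averaging the axes would let a large mismatch in one dimension hide behind two good ones.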
02 — My Role
93 test cases later — I mapped where AI generation is stable enough to ship
As a product intern on the Content Genie team, I worked on the most critical gap between "tech validation" and "product launch": generation quality testing and scenario boundary definition for AIGC product images.
My core task: systematically testing 93 product replacement cases for the 95% similarity feature. Each case includes original scene images and AI-generated replacement results. My evaluation dimensions: product size accuracy, color fidelity, lighting naturalness, edge blending quality, and whether the output meets "ready for e-commerce display" standards.
This wasn't about glancing and saying "looks good" — it was about defining the product's usability boundary: which categories, size ratios, and scene complexities produce stable results; which need human adjustment; which the current tech path can't handle yet.
Quality analysis across 150+ cases
Beyond the 93 replacement tests, I analyzed ~150 AIGC cases across task types — color changes (PDP 5.0), prop-in replacement, background extension — forming a structured quality map by category × task type × failure mode.
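One way to picture a quality map keyed by category × task type × failure mode is a simple counter over labeled test cases. This is a sketch only: `CaseResult`, `build_quality_map`, and `pass_rate` are hypothetical names, and the sample rows are made-up illustrations, not results from the actual testing.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class CaseResult(NamedTuple):
    """One reviewed AIGC case (hypothetical record layout)."""
    category: str                 # e.g. "chair"
    task_type: str                # e.g. "replacement", "color_change"
    failure_mode: Optional[str]   # None means the case passed review


def build_quality_map(results: Iterable[CaseResult]) -> Counter:
    """Count cases per (category, task_type, failure_mode or 'pass')."""
    counts: Counter = Counter()
    for r in results:
        counts[(r.category, r.task_type, r.failure_mode or "pass")] += 1
    return counts


def pass_rate(qmap: Counter, category: str, task_type: str) -> Optional[float]:
    """Share of passing cases for one cell, or None if the cell is empty."""
    total = sum(n for (c, t, _), n in qmap.items()
                if c == category and t == task_type)
    if total == 0:
        return None
    return qmap[(category, task_type, "pass")] / total


if __name__ == "__main__":
    # Illustrative data only.
    sample = [
        CaseResult("chair", "replacement", None),
        CaseResult("chair", "replacement", None),
        CaseResult("chair", "replacement", "edge_blending"),
        CaseResult("cabinet", "color_change", "color_fidelity"),
    ]
    qmap = build_quality_map(sample)
    print(pass_rate(qmap, "chair", "replacement"))
```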
Independent PRD from pain point discovery
During testing I identified a high-frequency bottleneck: Tmall hero image resizing. I independently authored the PRD for batch-cropping 24,000 product images across 6 channel specs — reducing ~2,400 person-days of manual work to automated hours.
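The core of a batch-crop step can be sketched as computing, for each channel spec, the largest centered region with the target aspect ratio. The source mentions six channel specs but does not list them, so the entries in `CHANNEL_SPECS` below are placeholders, and `center_crop_box` and `plan_crops` are hypothetical helpers rather than the PRD's actual design.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower), in pixels

# Placeholder specs for illustration; the real six channel sizes are not
# given in the source.
CHANNEL_SPECS: Dict[str, Tuple[int, int]] = {
    "square_1x1": (800, 800),
    "portrait_3x4": (750, 1000),
}


def center_crop_box(src_w: int, src_h: int, target_w: int, target_h: int) -> Box:
    """Largest centered box inside the source with the target aspect ratio."""
    if min(src_w, src_h, target_w, target_h) <= 0:
        raise ValueError("all sizes must be positive")
    # Compare aspect ratios by cross-multiplying to stay in integer arithmetic.
    if src_w * target_h > src_h * target_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * target_w // target_h
    else:
        # Source is taller (or equal): keep full width, trim top and bottom.
        crop_w = src_w
        crop_h = src_w * target_h // target_w
    left = (src_w - crop_w) // 2
    top = (src_h - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


def plan_crops(src_w: int, src_h: int,
               specs: Dict[str, Tuple[int, int]] = CHANNEL_SPECS) -> Dict[str, Box]:
    """Compute one crop box per channel spec for a single source image."""
    return {name: center_crop_box(src_w, src_h, w, h)
            for name, (w, h) in specs.items()}


if __name__ == "__main__":
    print(plan_crops(1200, 800))
```

The box tuple follows the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so each crop could then be resized to the spec's pixel size. A plain center crop can cut off an off-center product; a real pipeline would likely need subject-aware cropping, which this sketch does not attempt.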
03 — Research Process
Three AI paths tested — only one is production-ready for product replacement
The team explored three technical approaches to AI content generation. My testing work directly served the product decision of which path to ship first and under what constraints:
Path A — Prompt + LLM
Natural language instructions like "change the food in the pot to fried rice." Works for inspiration and ideation, but may repaint the image, produce hallucinations, and deliver unstable results. Cannot do precise product replacement.
Path B — LoRA + LLM (my focus)
A fine-tuned model trained on official IKEA product images. Replaces same-category, same-size items reliably; this is the 95% similarity replacement. Fails when the target item has significantly different dimensions.
Path C — Depth + 3D + LLM
Rebuilds the scene in 3D with depth information and merges 3D models precisely. Highest accuracy and solves occlusion, but 4× slower (2 min vs 30 sec), with more steps and a dependency on 3D model quality. Future direction.
“My testing confirmed the product decision: Path B (LoRA) ships first, constrained to 95% similar-size same-category products. Path C becomes the FY26 H2 exploration. This isn't tech selection — it's defining under what conditions AI results can be trusted.”

The Content Space capability landscape. AIGC sits within Content Creation — but its quality determines whether generated assets can enter the distribution pipeline or stay as mere references.
04 — Framework
Content Space landscape — from source to distribution, AI enters at every layer
Content Space isn't just one product — it's an ecosystem with four capability layers. Understanding where AI fits in each layer was essential for prioritizing what to build and what to test.
Content Accessibility
440K+ global images, 16K+ local assets centralized in one space. AI-powered tagging completed 4M tags in 3 months, saving 35,000 working hours. Natural language search already live.
Content Creation (my focus)
AIGC generation (replace items, change colors, add props, image-to-video) plus Digital Templates (36,000+ content pieces batch-produced per tertial). This is where 95% similarity replacement sits, and where my 93 test cases lived.
Content Effectiveness
Personalized content enabler + performance dashboards + insight analysis. AI analyzes KOS posts for trending topics, popular products, and keywords — feeding back into what content to generate next.
Content Distribution
Automated distribution to Tmall, JD, Red, TikTok, WeChat, APP, SMS. The batch-crop PRD I authored addresses this layer — one image becomes six channel-ready formats automatically.
“Generation quality determines whether AIGC content can enter the distribution pipeline — or stays as mere "inspiration references." My testing defined that boundary: at what quality threshold can we confidently push AI-generated images to live e-commerce channels.”

The Content Space ecosystem: Sources → Capabilities (Accessibility / Creation / Effectiveness) → Distribution to omni-channels. AI-generated content must pass quality gates before entering distribution.
05 — Prioritization
Five use cases ranked — by quality stability and business value
Based on team testing results and the FY26 roadmap, Content Genie's AIGC capabilities are prioritized from "single deterministic operations" (color change) to "constrained replacements" (95% similarity) to "multi-step compositions" (VSPR) to "end-to-end orchestration" (Campaign Studio). Input-output certainty decreases at each step, and the productization order follows that gradient.
01
PDP 5.0 Color Change
Already live. Most deterministic (white-background → recolor → white-background). Quality auto-checkable. 140 ranges covered, 4.2M RMB shooting cost saved. Cost: 0.5 RMB/image.
02
95% Similarity Replacement
Development complete, in testing (my work). Constrained to same-category + 95% bounding-box match. LoRA model produces stable quality. User inputs product ID → system auto-matches eligible replacements.
03
VSPR Inspiration Images
100 sets completed for PAX/BILLY/BESTA. Flow: design trending combination → white-background rendering → AI replaces product in scene → publish to channels. Est. 3.8M RMB incremental sales.
04
Image to Video
In preparation for testing. Generate 3–8 sec product intro videos from static images. Risk: motion naturalness, product structure stability. Current: AI livestream cutting already produces 10,000+ videos/year.
05
Campaign AI Studio (Vision)
AI Agent orchestrates: brief → audience targeting → product selection → content generation → landing page → distribution. Most complex, most dependencies. Positioned as long-term north star after scenarios 1–4 are proven.

Content Genie in action: AI-powered scene editing with structured inputs. The channel team uses it at ~15 min/image; HFRD checks quality at ~3 min/image.
06 — Key Insight
Why "95% similar size" is a product decision, not a tech metric
"95% similarity" looks like a technical parameter. It's actually a product boundary definition that means different things to different stakeholders:
For users
"You input a product ID, and the system auto-matches eligible replacement candidates — not any product you want, only those the system can generate stably." The constraint IS the UX.
For business
"Not every replacement request can be AI-generated — only same-category items with similar bounding-box dimensions. This is our current capability boundary, and we're transparent about it."
For engineering
"LoRA training sets are grouped by HFB category + size. Each group has its own quality threshold. The multi-model routing (Gemini for add/remove, Qwen FC for IKEA replacement, dedicated upscale service) reflects task-specific optimization."
“The key to productizing AI isn't "make AI do more" — it's clearly telling users what AI can and can't do, then guaranteeing quality within that boundary. My 93 test cases defined exactly where that line sits.”
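The multi-model routing mentioned under "For engineering" can be pictured as a small dispatch table from task type to backend. This is a sketch under assumptions: the three task-to-model pairings come from the text above, but the `Task` enum, the backend identifier strings, and the `route` function are hypothetical and do not reflect the team's actual service names or APIs.

```python
from enum import Enum


class Task(Enum):
    """Task types named in the text (enum values are illustrative)."""
    ADD_REMOVE = "add_remove"
    IKEA_REPLACEMENT = "ikea_replacement"
    UPSCALE = "upscale"


# Pairings as described in the source; the right-hand strings are
# placeholder identifiers, not real endpoints.
ROUTES = {
    Task.ADD_REMOVE: "gemini",
    Task.IKEA_REPLACEMENT: "qwen-fc",
    Task.UPSCALE: "upscale-service",
}


def route(task_name: str) -> str:
    """Map a task name to its backend; raises ValueError for unknown tasks."""
    task = Task(task_name)  # Enum lookup by value raises ValueError if unknown
    return ROUTES[task]


if __name__ == "__main__":
    print(route("ikea_replacement"))
```

Rejecting unknown task names outright, rather than falling back to a default model, mirrors the product stance in this section: the system only accepts requests it can handle within a known quality boundary.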

A parallel capability already live: natural language asset search. Teams find existing assets before generating new ones — reducing unnecessary generation and its associated quality risk.
07 — Impact
Test results → shipping criteria → 4.2M RMB saved
My work wasn't academic research — it directly fed into product shipping decisions and business value realization.
Testing supported launch decisions
My 93-case analysis directly informed which categories to open first (chairs, tables, cabinets), which need more training data (complex assembled furniture), and which scenes to exclude (high-density occlusion). This became the shipping criteria.
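These criteria can be read as a small decision rule. The sketch below encodes only the three outcomes named above; the function name, the category labels, and the treatment of categories the text does not mention (returned as "unlisted") are assumptions for illustration.

```python
# Category labels are illustrative spellings of the groups named in the text.
OPEN_FIRST = {"chair", "table", "cabinet"}
NEEDS_MORE_DATA = {"complex_assembled_furniture"}


def launch_decision(category: str, dense_occlusion: bool) -> str:
    """Return 'exclude', 'open', 'needs_training_data', or 'unlisted'."""
    # Scene-level exclusion is checked first: a high-density occlusion scene
    # is out of scope regardless of product category.
    if dense_occlusion:
        return "exclude"
    if category in OPEN_FIRST:
        return "open"
    if category in NEEDS_MORE_DATA:
        return "needs_training_data"
    # Assumption: anything the source does not mention is left undecided.
    return "unlisted"


if __name__ == "__main__":
    print(launch_decision("chair", dense_occlusion=False))
    print(launch_decision("chair", dense_occlusion=True))
```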
Business value already realized
PDP 5.0 (the most mature color-change capability) covers 140 ranges, saves 4.2M RMB, and produces 370+ sets / 1,000+ images per year at 0.5 RMB/image. As the Tier 2 capability, 95% replacement will further expand the range of products that AI can produce.
Product thinking, not tech thinking
My work helped establish the principle: "it's not model capability that decides launch — it's business quality standards." This is exactly the Content Genie team's positioning: 'Win reputation, create real and solid value result under certain business cases.'
08 — Reflection