
AI, Design, and the Power of Open Models

The a16z Show | June 15, 2026 | 42m 56s

Mohammad Norouzi, CEO of Ideogram, discusses the release of the company's first open-weight image generation model (9.3B parameters). He explains why Ideogram chose to release open weights, how JSON prompting enables precise design control, and why the team focuses on taste, typography, and editable design for professional creative workflows. The conversation also covers technical innovations in training, enterprise customization, and the future of agentic creative tools.

Summary

In this a16z podcast episode, Yoko Li and Justine Moore interview Mohammad Norouzi, founder and CEO of Ideogram, about the company's first open-weight image generation model. Norouzi explains that the decision to go open-weight was strategic: rather than competing solely on scale with giants like Google, Ideogram chose to focus on model innovation and on partnerships across the stack, including inference providers, chip makers, and enterprise clients who want on-premises hosting or fine-tuning.

A central technical innovation discussed is JSON prompting, in which an image is described in a structured format, often running to thousands of words, that details every element, its position and bounding box, and the overall layout. This intermediate representation lets a language model handle creative expansion while the image diffusion model focuses on rendering. Norouzi acknowledges that the community initially struggled with the format because simple or non-JSON prompts triggered safety blocks, but argues that the structured approach unlocks the precise design control that professional use cases require. He also hints that future releases may move toward HTML-like representations, since large language models are already trained on HTML.
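To make the idea of a structured prompt concrete, here is a minimal Python sketch of what a JSON prompt with per-element bounding boxes might look like. The schema (`ImageSpec`, `Element`, and fields such as `kind`, `bbox`, and `text`) is hypothetical and invented for illustration; Ideogram's actual JSON format is not described in this summary. Bounding boxes are assumed to be normalized `(x0, y0, x1, y1)` coordinates.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Element:
    """One visual element in a hypothetical structured image prompt."""
    kind: str                                # e.g. "headline", "photo", "logo"
    description: str                         # free-text detail for the renderer
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]
    text: str | None = None                  # exact string for typographic elements


@dataclass
class ImageSpec:
    """Hypothetical top-level spec: canvas, global style, and element list."""
    canvas: dict
    style: str
    elements: list[Element] = field(default_factory=list)

    def validate(self) -> None:
        # Reject boxes that are inverted or fall outside the normalized canvas.
        for el in self.elements:
            x0, y0, x1, y1 = el.bbox
            if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
                raise ValueError(f"invalid bbox for {el.kind!r}: {el.bbox}")

    def to_json(self) -> str:
        self.validate()
        return json.dumps(asdict(self), indent=2)


spec = ImageSpec(
    canvas={"width": 1024, "height": 1024},
    style="flat vector poster, warm palette",
    elements=[
        Element("headline", "bold sans-serif title", (0.1, 0.05, 0.9, 0.2), text="SUMMER SALE"),
        Element("photo", "a glass of lemonade on a wooden table", (0.2, 0.3, 0.8, 0.9)),
    ],
)
print(spec.to_json())
```

Keeping exact strings in a dedicated `text` field, separate from descriptive prose, is one way such a format could distinguish what must be rendered verbatim from what the model is free to interpret.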

The model's strength in text rendering traces back to Ideogram's founding differentiation: three years ago, the team noticed that accurate text in images was a major gap (competitors such as DALL-E 2 famously garbled text), and typography became a core part of the brand's identity. Although the model has only 9.3 billion parameters, compared with roughly 80 billion for prior state-of-the-art models, it achieves competitive text accuracy through careful data curation, a detailed image-to-text-to-image training pipeline built on vision-language models, and rigorous internal evaluation that prioritizes taste over generic leaderboard metrics.

Norouzi emphasizes taste as a core design goal. The model intentionally avoids the homogenized aesthetic that heavy reinforcement learning tends to produce and instead generates diverse styles, which he sees as a competitive differentiator, particularly for artists and brands that need distinctive visual output. Customization ranges from as few as 15 images in Ideogram's consumer product ($60/month) up to full enterprise fine-tuning, with annotation teams helping to define brand DNA, mascots, and keywords.

The conversation also covers the roadmap: editable text and layout control (not yet released at the time of recording), editing models that use the same JSON prompting approach, and agentic workflows via MCP (Model Context Protocol) and the API. Norouzi sees the composability of JSON and images as foundational to agentic creative pipelines, in which agents explore thousands of design variations before a human selects a direction to refine in a UI. He contrasts image model customization with language model customization, arguing that visual brand identity is far more diverse and distinctive than written communication, which makes fine-tuning more critical in the image domain.
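The exploration loop described here can be sketched as a combinatorial sweep over a structured spec. In this minimal Python sketch, the spec schema, the design axes, and the random `shortlist` step (a stand-in for whatever ranking an agent would actually use) are all assumptions; the call that would render each spec through an image API or MCP tool is deliberately omitted rather than invented.

```python
import copy
import itertools
import json
import random

# Hypothetical base spec an agent might start from (schema invented for illustration).
BASE_SPEC = {
    "style": "minimal poster",
    "palette": "monochrome",
    "elements": [
        {"kind": "headline", "text": "OPEN HOUSE", "font": "serif", "bbox": [0.1, 0.05, 0.9, 0.2]},
        {"kind": "photo", "description": "sunlit modern living room", "bbox": [0.1, 0.25, 0.9, 0.95]},
    ],
}

# Hypothetical design axes to sweep; each added axis multiplies the count.
AXES = {
    "style": ["minimal poster", "retro print", "bold editorial"],
    "font": ["serif", "sans-serif", "slab serif"],
    "palette": ["monochrome", "warm", "pastel"],
}


def variants(base: dict, axes: dict) -> list[dict]:
    """Return one deep-copied spec per combination of axis values."""
    out = []
    for style, font, palette in itertools.product(axes["style"], axes["font"], axes["palette"]):
        spec = copy.deepcopy(base)  # never mutate the shared base spec
        spec["style"] = style
        spec["palette"] = palette
        for el in spec["elements"]:
            if el["kind"] == "headline":
                el["font"] = font
        out.append(spec)
    return out


def shortlist(specs: list[dict], k: int, seed: int = 0) -> list[dict]:
    """Sample k candidates for human review (a stand-in for real scoring)."""
    rng = random.Random(seed)
    return rng.sample(specs, min(k, len(specs)))


candidates = variants(BASE_SPEC, AXES)
print(len(candidates))  # → 27 (3 styles x 3 fonts x 3 palettes)
for spec in shortlist(candidates, k=4):
    print(json.dumps({"style": spec["style"], "palette": spec["palette"]}))
```

Three axes with three values each yield 27 specs; a few more axes quickly reach the thousands of variations Norouzi describes, which is why some narrowing step before human review is needed.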

About this episode

Yoko Li and Justine Moore speak with Ideogram founder and CEO Mohammad Norouzi about image generation models, design workflows, and the evolving relationship between AI and creative work. The conversation covers Ideogram's decision to release an open-weight model, the challenges of generating text and layouts within images, and why controllability has become an increasingly important area of research. They discuss prompting, customization, editing, and the tradeoffs between general-purpose models and systems optimized for specific creative tasks. Along the way, Norouzi shares his views on open-source AI, design tools, agentic workflows, and how image generation models may evolve as creators and enterprises seek greater control over their outputs.

Key Insights

Norouzi argues that Ideogram's open-weight release is primarily a partnership strategy: releasing weights lets the company work with inference providers, chip makers, and enterprises that need on-prem hosting, rather than competing solely on compute scale against companies like Google.
Norouzi claims the 9.3B-parameter model reaches near-SOTA performance not through scaling but through innovation in data pipelines, specifically by using AI to generate detailed image-to-text descriptions that include bounding-box and element information, then training image generation on those descriptions.
Norouzi argues that JSON prompting is not meant for end users; it serves as an intermediate representation between a language model's creative expansion and the diffusion model's rendering. He adds that the major labs (OpenAI, Google) perform similar prompt expansion but do not expose it to users.
Norouzi says the team deliberately avoided heavy reinforcement learning, which he argues makes frontier models converge on homogenized aesthetics that dominate leaderboards but lack stylistic diversity; Ideogram prioritized taste and style variation over benchmark scores.
Norouzi contends that customization is far more urgent for image models than for language models, because people immediately distinguish brands visually but cannot easily tell their written communications apart, which makes fine-tuning more commercially critical for image models.
Norouzi argues that editing and fine-tuning are complementary rather than competing: editing enables fast, iterative workflows without training, while fine-tuning provides deeper adherence to complex characters or styles that are too nuanced to capture through reference images alone.
Norouzi suggests that the logical endpoint of JSON prompting, specifying every detail of an image, approaches pixel-level specification. In practice, language models handle discrete tokens well but struggle with continuous, high-dimensional outputs, which keeps the representation in natural language or HTML-like formats.
Norouzi states that enterprise customers repeatedly reported that generic image models failed to meet their design standards or brand guidelines, but after Ideogram trained custom models for them, clients described the model as understanding their "brand DNA", which he takes as validation of the commercial demand for specialized fine-tuning.
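Norouzi's point about HTML-like formats can be illustrated by rendering an element list as absolutely positioned HTML, a format language models have seen at scale. This minimal Python sketch assumes each element carries normalized `bbox` coordinates plus optional `text` or `description` fields; the mapping is a hypothetical illustration, not a format Ideogram has announced.

```python
from html import escape


def spec_to_html(width: int, height: int, elements: list[dict]) -> str:
    """Render elements with normalized (x0, y0, x1, y1) boxes as positioned divs."""
    lines = [f'<div class="canvas" style="position:relative;width:{width}px;height:{height}px">']
    for el in elements:
        x0, y0, x1, y1 = el["bbox"]
        # Convert normalized coordinates to whole pixels on the target canvas.
        box = (
            f"position:absolute;left:{round(x0 * width)}px;top:{round(y0 * height)}px;"
            f"width:{round((x1 - x0) * width)}px;height:{round((y1 - y0) * height)}px"
        )
        # Exact text takes priority; otherwise fall back to the description.
        content = escape(el.get("text") or el.get("description", ""))
        lines.append(f'  <div data-kind="{escape(el["kind"])}" style="{box}">{content}</div>')
    lines.append("</div>")
    return "\n".join(lines)


print(spec_to_html(1000, 500, [
    {"kind": "headline", "text": "Tom & Jerry", "bbox": [0.1, 0.1, 0.9, 0.3]},
    {"kind": "photo", "description": "cartoon cat chasing a mouse", "bbox": [0.1, 0.35, 0.9, 0.95]},
]))
```

Unlike raw pixels, every value here is a short discrete token sequence, which matches the constraint described above: layout stays explicit while the rendering itself is left to the image model.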

Topics

Open-weight image model release
JSON prompting as intermediate representation
Typography and text rendering accuracy
Model size efficiency (9.3B vs. 80B parameters)
Enterprise customization and brand fine-tuning
Taste as a design goal
Agentic creative workflows
Editable design and layout control

Transcript

It's not about how good a model is in the general sense. It's about how good is this model for my use case. For a lot of design and marketing use cases, we need editable design, not a single flat image. It's super impressive, honestly, reaching the level of things like Nano Banana or GPT Image with an open-source model. Why did you think that was important? We really want our models to have taste. Every artist, they can really customize this model to the nuances of their style, the texture of their canvas, and really get 2K output and hopefully make that part of their workflow. One thing we were always wondering is that this release open source…
