Inception Launches Diffusion-Based AI Model to Challenge LLMs
A new company, Inception, based in Palo Alto and founded by Stanford computer science professor Stefano Ermon, has announced the development of a novel AI model based on “diffusion” technology. This new model is being called a diffusion-based large language model, or “DLM.” The emergence of Inception adds to the growing landscape of generative AI, which currently revolves around two primary model types: large language models (LLMs) and diffusion models.
LLMs are the workhorses of text generation, the technology behind human-like written output. Diffusion models, by contrast, power popular art and media generators like Midjourney and OpenAI’s Sora and excel at creating images, video, and audio. Inception’s DLM attempts to unify these typically separate types of AI models, offering the capabilities of conventional LLMs, including generating code and answering questions, but with a critical advantage: significantly faster performance and reduced computing costs, according to the company.
Ermon told TechCrunch that he has spent considerable time in his Stanford lab investigating how to apply diffusion models to text, driven by the observation that traditional LLMs are slow by design. With an LLM, he explained, “you cannot generate the second word until you’ve generated the first one, and you cannot generate the third one until you generate the first two.” This sequential process inherently limits speed.
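To illustrate the bottleneck Ermon is describing, here is a minimal Python sketch of autoregressive decoding; the model and sampling functions are hypothetical placeholders for the general pattern, not Inception’s or any vendor’s actual code.

```python
# Illustrative sketch of autoregressive (LLM-style) decoding.
# `model` and `sample` are hypothetical placeholders.

def generate_autoregressive(model, prompt_tokens, max_new_tokens):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # Each step depends on every token generated so far,
        # so the steps must run one after another.
        next_token = sample(model(tokens))
        tokens.append(next_token)
    return tokens
```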
Ermon sought a way to apply a diffusion approach to text instead. Rather than generating sequentially, diffusion models start with a rough representation of the data they are generating, whether an image or a block of text, and then refine it all at once. Ermon hypothesized that diffusion models could make it possible to generate and modify large blocks of text in parallel.
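For contrast, a diffusion-style generation loop might look like the following equally hedged sketch; the denoiser and helper functions here are placeholders for the general idea, not Inception’s method.

```python
# Illustrative sketch of diffusion-style text generation.
# `random_tokens` and `refine` are hypothetical placeholders.

def generate_diffusion(denoiser, length, num_steps):
    # Start from a rough draft, e.g., random or masked tokens.
    tokens = random_tokens(length)
    for step in range(num_steps):
        # Each pass updates every position at once, so the work
        # within a step can be parallelized across the sequence.
        tokens = refine(denoiser, tokens, step)
    return tokens
```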
After years of dedicated research, Ermon and a student achieved a significant breakthrough, which they detailed in a research paper published last year. Recognizing the potential of this advancement, Ermon established Inception last summer, bringing in two former students, UCLA professor Aditya Grover and Cornell professor Volodymyr Kuleshov, to co-lead the company.
While Ermon declined to disclose the specific funding Inception has received, TechCrunch understands that the Mayfield Fund has invested in the company. Inception has already secured several customers, including unnamed Fortune 100 companies, by addressing their critical need for reduced AI latency and increased speed, Ermon stated.
“What we found is that our models can leverage the GPUs much more efficiently,” Ermon said, speaking of the computer chips typically used to run models in production. “I think this is a big deal. This is going to change the way people build language models.”
Inception offers an API, along with on-premises and edge-device deployment options and support for model fine-tuning. It also offers a suite of out-of-the-box DLMs for a variety of use cases. The company claims its DLMs can run up to 10 times faster than conventional LLMs at a tenth of the cost.
“Our ‘small’ coding model is as good as [OpenAI’s] GPT-4o mini while more than 10 times as fast,” a company spokesperson told TechCrunch. “Our ‘mini’ model outperforms small open-source models like [Meta’s] Llama 3.1 8B and achieves more than 1,000 tokens per second.”
In industry terminology, “tokens” are the units of raw data a model processes. One thousand tokens per second is an impressive speed indeed, assuming Inception’s claims hold up.
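For a rough sense of scale, a back-of-the-envelope calculation translates that rate into user-facing latency; the 500-token response length below is an assumed, illustrative figure, not one the company provided.

```python
# Back-of-the-envelope latency from Inception's quoted throughput.
throughput = 1_000      # tokens per second (company's claim)
response_length = 500   # tokens (assumed, illustrative)

latency = response_length / throughput
print(f"~{latency:.1f} seconds for a {response_length}-token response")
# -> ~0.5 seconds for a 500-token response
```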