Diffusion LLMs: A Promising Alternative in Generative AI
Generative AI is rapidly evolving, and a new contender, diffusion LLMs (dLLMs), is generating considerable excitement. This approach offers a potential shift from the more established autoregressive LLMs.
The conventional method for creating generative AI and large language models (LLMs) relies on autoregressive techniques. This system predicts the next word in a sequence, building responses word by word. However, diffusion LLMs present a different paradigm, which may offer some distinct advantages.
How Autoregressive LLMs Work
To understand diffusion LLMs, it’s important to first grasp how generative AI typically functions. Autoregressive algorithms predict the next word to appear in a sentence. Inside the AI, words are represented numerically as tokens. When you enter a prompt, your words are converted into tokens, and the model repeatedly predicts the next token from the tokens generated so far, appending one token at a time. The resulting tokens are then converted back into words for display.
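The token-by-token loop can be sketched in a few lines. This is a toy illustration only: the `next_token` lookup table below is a hypothetical stand-in for a trained neural network's next-token predictor, and the token names are invented for the example.

```python
def next_token(tokens):
    # Hypothetical predictor: a real LLM replaces this lookup with a
    # neural network that scores every possible next token.
    continuations = {
        ("why", "is", "the", "sky"): "blue",
        ("why", "is", "the", "sky", "blue"): "<eos>",
    }
    return continuations.get(tuple(tokens), "<eos>")

def generate(prompt_tokens, max_steps=10):
    tokens = list(prompt_tokens)
    for _ in range(max_steps):
        tok = next_token(tokens)  # one token per step: the autoregressive loop
        if tok == "<eos>":        # stop when the model predicts end-of-sequence
            break
        tokens.append(tok)
    return tokens

print(generate(["why", "is", "the", "sky"]))
# -> ['why', 'is', 'the', 'sky', 'blue']
```

The key point is the loop structure: each prediction depends on everything generated before it, which is why output arrives one token at a time.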
Diffusion in Image and Video Generation
Diffusion techniques are already widely used to generate images and videos, so most people have seen their results. The core concept resembles sculpting: the AI starts with a block of pure noise (static) and systematically removes what doesn’t belong until the desired image or video emerges. Whereas a painter adds brushstrokes to a blank canvas, diffusion works like the sculptor, chipping away the unnecessary pieces to form the final product.
Training an AI to generate an image of a cat starts with existing pictures or renderings of cats. Noise is added to each image, and the AI is trained to remove that static: fed the noisy version alongside the clean one, the model learns to strip away the noise and arrive back at the pristine version of the cat.
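A minimal sketch of the forward (noising) step, assuming a toy "image" of a few pixel intensities; real diffusion models apply noise over many scheduled steps to full tensors, but the principle is the same:

```python
import random

def add_noise(pixels, noise_level, rng):
    # Forward process: corrupt each pixel value with Gaussian noise.
    return [p + rng.gauss(0, noise_level) for p in pixels]

rng = random.Random(0)
clean = [0.2, 0.8, 0.5, 0.1]  # toy "image": four pixel intensities
noisy = add_noise(clean, noise_level=0.5, rng=rng)
# A training pair is (noisy, clean): the model sees `noisy` and learns
# to predict the noise that was added, recovering `clean`.
```

Generation then runs this in reverse: start from pure noise and repeatedly apply the learned denoiser.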
Text Generation via Diffusion
The same approach applied to images and videos can also generate text. Instead of starting with a blank canvas, the AI begins with garbled, noise-filled text and is trained to transform it into coherent text. The process mirrors image generation: start with a static-filled frame and remove noise until the desired result is revealed.
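For text, "noise" is often modeled by corrupting tokens. A minimal sketch, assuming a masked-token style of corruption (one common choice; the `<noise>` marker and sentence are invented for illustration):

```python
import random

def corrupt(tokens, mask_rate, rng):
    # Forward process for text: randomly replace tokens with a noise
    # marker, analogous to adding static to an image.
    return [t if rng.random() > mask_rate else "<noise>" for t in tokens]

rng = random.Random(42)
clean = "the sky is blue because sunlight scatters".split()
noisy = corrupt(clean, mask_rate=0.5, rng=rng)
# Training pair: the model sees `noisy` and is optimized to reproduce `clean`.
```

Training on many such (noisy, clean) pairs teaches the model to restore coherent text from corrupted input.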
An Illustrative Example: “Why is the sky blue?”
Consider the question, “Why is the sky blue?” In an autoregressive LLM, this query would be converted into tokens (“why,” “is,” “the,” “sky,” “blue”), and the AI would assemble a response based on patterns learned from training data. This process mirrors adding brushstrokes, one at a time, to a blank canvas. A diffusion LLM, by contrast, starts from garbled text conditioned on the prompt and refines it by removing the noise over several passes. For example:
- Initial noisy text: “skbl isz blu soshie rdackis flousy bof nofair soleish pur sang otto movei angok dorf sulu blsk”
- First pass: “Sky is blue soshie rdackis light flousy air molecules pur and blue light movei angok the most.”
- Second pass: “Sky is blue because rdackis light scatters off air molecules pur and blue light scatters angok the most.”
- Final pass: “The sky is blue because sunlight scatters off air molecules, and blue light scatters the most.”
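The passes above can be sketched as a loop of refinement steps. This is a toy illustration: `TARGET` plays the role of the trained model's prediction, and `reveal=4` is an arbitrary per-pass budget chosen for the example, not a real dLLM parameter.

```python
# Reverse process for text: repeatedly refine a fully noisy sequence.
TARGET = "the sky is blue because sunlight scatters off air molecules".split()

def denoise_pass(tokens, reveal):
    # One refinement pass: fix up to `reveal` noisy positions at once.
    # Positions are refined in parallel, not strictly left to right.
    out, fixed = list(tokens), 0
    for i, t in enumerate(out):
        if t == "<noise>" and fixed < reveal:
            out[i], fixed = TARGET[i], fixed + 1
    return out

seq = ["<noise>"] * len(TARGET)   # start from pure noise
for _ in range(3):                # a few passes, like the example above
    seq = denoise_pass(seq, reveal=4)

print(" ".join(seq))
# -> the sky is blue because sunlight scatters off air molecules
```

Note that the whole sequence improves on every pass, which is what distinguishes this from the one-token-at-a-time autoregressive loop.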
Advantages of Diffusion LLMs
Diffusion LLMs offer some clear benefits. They have the potential for faster response times because generation can happen in parallel rather than token by token. Because the model refines the entire sequence at once, coherence across long passages of text may also improve. Some also assert that diffusion LLMs will end up being more “creative” than autoregression-based generative AI.
Drawbacks and Considerations
Diffusion LLMs, while promising, have drawbacks. They may be less interpretable than autoregressive models. Potential mode collapse is another area of concern with these systems, and the initial data-training costs are potentially higher.
Conclusion: A Promising Frontier
The emergence of dLLMs signifies a valuable innovation in generative AI, and as the field continues to evolve, they could change how we build and use these systems. The industry is eager to test and develop diffusion LLMs further, though, as with any new technology, there are potential drawbacks and open questions. As Albert Einstein noted, “To raise new questions, new possibilities, to regard old problems from a new angle, requires creative imagination and marks real advance in science.”