Pruna AI Launches Open-Source Framework for AI Model Optimization
European startup Pruna AI is releasing its AI model optimization framework to the open-source community this Thursday. The framework applies compression methods such as caching, pruning, quantization, and distillation to make AI models more efficient.
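To make one of these methods concrete, here is a minimal sketch of symmetric int8 quantization: weights are mapped to 8-bit integers plus a scale factor, shrinking storage roughly fourfold versus 32-bit floats. The function names and scheme are illustrative, not Pruna AI's implementation:

```python
def quantize_int8(weights):
    """Map float weights to 8-bit integers plus a single scale factor.

    Illustrative sketch only, not Pruna AI's code: symmetric
    quantization, where the largest absolute weight maps to 127.
    """
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Rounding means each restored weight can differ from the original
# by at most half of one quantization step (scale / 2).
```

The trade-off each compression method makes is the same in spirit: a small, measurable loss in fidelity in exchange for a smaller or faster model.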
“We also standardize saving and loading the compressed models, applying combinations of these compression methods, and also evaluating your compressed model after you compress it,” explained John Rachwan, co-founder and CTO of Pruna AI, in an interview with TechCrunch. The framework assesses the quality impact and performance benefits after compression.
Rachwan went on to explain, “If I were to use a metaphor, we are similar to how Hugging Face standardized transformers and diffusers — how to call them, how to save them, load them, etc. We are doing the same, but for efficiency methods.”
A Comprehensive Approach to AI Model Optimization
Major AI labs already rely on a variety of compression techniques. OpenAI, for example, uses distillation to create faster versions of its flagship models; the method likely played a role in GPT-4 Turbo, a faster version of GPT-4. Similarly, Black Forest Labs' Flux.1-schnell image generation model is a distilled version of the company's larger Flux.1 model.
Distillation is a technique where knowledge is extracted from a large AI model, the “teacher,” to train a smaller “student” model. Developers feed requests to the teacher model, record its outputs, and assess their accuracy against a dataset. These outputs then train the student model to mimic the teacher’s behavior.
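The teacher-student loop described above can be sketched in miniature. Everything here, the one-parameter models, the loss, and the learning rate, is a toy illustration of the idea, not any lab's actual pipeline:

```python
import math

# Toy distillation sketch: a small "student" model is trained to mimic
# the soft outputs of a larger "teacher" model. All names and numbers
# are illustrative.

def teacher(x):
    """Stand-in for a large model: returns P(label = 1 | x)."""
    return 1 / (1 + math.exp(-(3.0 * x - 1.0)))

def student(x, w, b):
    """A smaller model with its own learnable parameters."""
    return 1 / (1 + math.exp(-(w * x + b)))

def distill(inputs, lr=0.5, steps=2000):
    """Fit the student to the teacher's soft targets with gradient
    descent on binary cross-entropy."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        for x in inputs:
            target = teacher(x)   # soft label recorded from the teacher
            pred = student(x, w, b)
            grad = pred - target  # d(BCE) / d(logit)
            w -= lr * grad * x
            b -= lr * grad
    return w, b

inputs = [-2.0, -1.0, 0.0, 1.0, 2.0]
w, b = distill(inputs)
# After training, the student's outputs track the teacher's closely.
max_gap = max(abs(student(x, w, b) - teacher(x)) for x in inputs)
```

In practice the teacher and student are full neural networks and the targets are probability distributions over many classes, but the mechanism, training the small model on the large model's outputs rather than on raw labels, is the same.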
According to Rachwan, “For big companies, what they usually do is that they build this stuff in-house. And what you can find in the open source world is usually based on single methods. For example, let’s say one quantization method for LLMs, or one caching method for diffusion models. But you cannot find a tool that aggregates all of them, makes them all easy to use and combine together. And this is the big value that Pruna is bringing right now.”

While the framework supports diverse model types, Pruna AI currently focuses on image and video generation models. Notable users include Scenario and PhotoRoom. Besides the open-source edition, Pruna AI offers an enterprise version with advanced features, including an optimization agent.
Rachwan added, “The most exciting feature that we are releasing soon will be a compression agent. Basically, you give it your model, you say: ‘I want more speed but don’t drop my accuracy by more than 2%.’ And then, the agent will just do its magic. It will find the best combination for you, return it for you. You don’t have to do anything as a developer.”
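One way to picture what such an agent might do internally: enumerate combinations of compression methods, estimate the speedup and accuracy cost of each, and keep the fastest combination that stays within the accuracy budget. The method names and numbers below are invented for illustration; Pruna AI has not published how its agent works:

```python
from itertools import combinations

# Hypothetical search sketch, not Pruna AI's agent. The speedup and
# accuracy-drop figures are made up for illustration.
METHODS = {
    "quantization": {"speedup": 1.8, "accuracy_drop": 1.2},
    "pruning":      {"speedup": 1.4, "accuracy_drop": 0.9},
    "caching":      {"speedup": 1.5, "accuracy_drop": 0.3},
}

def best_combo(max_accuracy_drop=2.0):
    """Return the fastest method combination whose estimated total
    accuracy drop stays within the budget (assuming, for simplicity,
    that speedups multiply and accuracy drops add)."""
    best, best_speedup = (), 1.0
    for r in range(1, len(METHODS) + 1):
        for combo in combinations(METHODS, r):
            speedup = 1.0
            drop = 0.0
            for m in combo:
                speedup *= METHODS[m]["speedup"]
                drop += METHODS[m]["accuracy_drop"]
            if drop <= max_accuracy_drop and speedup > best_speedup:
                best, best_speedup = combo, speedup
    return best, best_speedup
```

A real agent would measure speedup and accuracy empirically per model rather than from a fixed table, but the shape of the problem, a constrained search over method combinations, is what the quote describes.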
Pruna AI’s pro version charges by the hour, comparable to renting a GPU in the cloud, according to Rachwan. Because optimized models are cheaper to run at inference time, the savings can be substantial for companies whose AI infrastructure depends on them. Using its compression framework, Pruna AI has made a Llama model eight times smaller without significant accuracy loss.
Pruna AI pitches its compression framework as an investment that pays for itself through lower inference costs. The company recently closed a $6.5 million seed round backed by EQT Ventures, Daphni, Motier Ventures, and Kima Ventures.