Decrypting the Latest Developments in the GenAI World: Tiny Language Models, Mistral NeMo, GPT-4o mini, and Llama 3.1
This week has been quite hectic with model releases. Here is what you need to know:
- Hugging Face relaunched the race for the tiniest models with SmolLM.
- Mistral unveiled its latest state-of-the-art 12B model, born from its partnership with Nvidia: the competition for the best small models has a new challenger.
- OpenAI released GPT-4o mini, which will make you forget GPT-3.5, with performance above Anthropic's Claude 3 Haiku and Google's Gemini 1.5 Flash.
- Meta released a series of three models, each state of the art in its category: the Llama 3.1 family, topped by a 405B model that is one of the biggest foundation models ever released.
A Glimpse into the World of Tiny Models: Hugging Face's SmolLM
Last week, the French-American AI company brought back the race for tiny language models with three models from a suite it calls SmolLM:
- one with 135 million parameters
- another with 360 million parameters
- and a last one with 1.7 billion parameters [3]
Each of these models is at the top of its category, relaunching a race that very few other actors had tackled (the only other recent releases being Alibaba's Qwen models and Google's Gemini Nano models) [1, 2].
While these models may not yet be up to most tasks (the most common open-source models range from 7B to 12B parameters, and frontier performance only starts to be reached around 70B), this race remains important.
The field is very promising: a model with the power of an 8B model that fits on small, everyday machines without issues would be a very interesting prospect for any company.
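For readers who want a feel for this size class, here is a minimal sketch of running the smallest SmolLM checkpoint on a plain CPU with the transformers library (the checkpoint name below reflects the Hugging Face hub at the time of writing and may change):

```python
# Minimal sketch: running SmolLM-135M locally on CPU with transformers.
# Assumes `pip install transformers torch`; the checkpoint name is the one
# published on the Hugging Face hub at the time of writing.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-135M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)  # ~135M params, fits on CPU

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The whole point of this size class is that the snippet above needs no GPU at all.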
Mistral’s latest flagship model: NeMo, made with Nvidia
On July 18th, Mistral published its newest model: Mistral NeMo [4]. Mistral NeMo is a 12B-parameter model that takes only text as input but comes with a few peculiarities for its size.
First, this model was designed for long prompts, with a native context window of 128k tokens. This means it can use the information of much larger prompts than Llama 3 8B and Gemma 2 9B, whose context windows are 8k tokens, making it a much better model for complex use cases that require a lot of contextual information. This is especially relevant considering that the model beats those two models on almost all NLP-related benchmarks.
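In practice, working with such a window mostly means checking that a prompt actually fits in it. Here is a minimal sketch using the model's tokenizer (assuming the hub repository name mistralai/Mistral-Nemo-Instruct-2407 is still current):

```python
# Sketch: verifying that a long prompt fits in Mistral NeMo's 128k-token window.
# Assumes `pip install transformers`; the repository name is an assumption
# based on the Hugging Face hub at the time of writing.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")

long_document = "..."  # e.g. a full contract or a long conversation history
n_tokens = len(tokenizer.encode(long_document))
print(f"{n_tokens} tokens:", "fits" if n_tokens <= 128_000 else "exceeds the window")
```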
This model also offers impressive multilingual capabilities; it performs very well on several multilingual benchmarks and is said to cover English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. Its size and multilingual reach will make it particularly useful for companies that aim to operate worldwide.
The other characteristic is its quantization-aware training, which means the model's weights can be cast to 8 bits with very little degradation of the model's performance. Putting the weights into 8 bits halves their memory footprint compared to 16-bit weights, making the model much easier to store and run locally.
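To make the idea concrete, here is a toy sketch of what casting weights to 8 bits looks like (plain symmetric int8 quantization; this illustrates the principle, not necessarily the exact 8-bit scheme Mistral used):

```python
# Toy sketch of symmetric int8 weight quantization (illustrative only;
# not necessarily the exact 8-bit recipe used for Mistral NeMo).
import torch

def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0           # map the largest weight to +/-127
    q = torch.round(w / scale).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale      # recover approximate weights

w = torch.randn(4096, 4096)                 # a dummy weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than fp32 (2x smaller than fp16), at the cost
# of a small reconstruction error:
print("mean absolute error:", (w - w_hat).abs().mean().item())
```

Quantization-aware training goes one step further: the model learns its weights while anticipating this rounding, which is why the degradation stays so low.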
Through this partnership with Nvidia, Mistral produced a strong model that re-established it as one of the go-to providers of small language models (the 7B to 14B parameter range), cementing its status as one of the big foundation model companies.
Such models are valuable for companies, as they are flexible enough to be hosted locally on virtual machines while keeping high performance. The model is released under the Apache 2.0 license, which enables anyone to use it for both research and commercial purposes.
OpenAI’s latest model: GPT-4o mini
With greater performance than GPT-3.5 at lower cost (60% cheaper), GPT-4o mini [5] may break the greatest barrier to the adoption of GenAI in the industry: cost.
Its lower cost makes it particularly useful for tasks requiring several API calls or integrating high volumes of contextual data, such as a complete codebase or long conversation histories. Developers will thus be able to build applications that integrate large amounts of data or perform multiple actions against external systems.
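As a quick illustration, here is a minimal sketch of a single call to the model through OpenAI's official Python SDK (assuming an OPENAI_API_KEY environment variable is set; a real application would repeat or chain such calls):

```python
# Minimal sketch: one GPT-4o mini call via the official OpenAI Python SDK.
# Assumes `pip install openai` and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize why cheaper models cut API costs."},
    ],
)
print(response.choices[0].message.content)
```

The cheaper each such call is, the more of them an application can afford to chain, which is exactly what agent-like workflows need.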
Its excellence in mathematical reasoning and coding tasks will accelerate the development of applications requiring those skills by making them accessible to a wider audience. In fact, it is the best model in its category of small frontier models, as it is cheaper and stronger than both Claude 3 Haiku and Gemini 1.5 Flash in most categories.
The second biggest barrier to the adoption of GenAI in the industry is probably the risk that comes with it, and this version of GPT is on its way to breaking that one too. As they did for GPT-4o, OpenAI had experts in social psychology and misinformation test the model [5]. In addition, the model can resist jailbreaks, prompt injections, and system prompt extractions, making it more reliable and safer for applications at scale.
As of today, GPT-4o mini processes both image and text inputs. OpenAI announced that it will be able to process audio and video as well, which will make it particularly useful for the analysis of databases containing multimodal content. With this release, OpenAI takes a strong stance and asserts itself as the current de facto company for mid-size frontier models. That spot could change hands when Anthropic decides to release Claude 3.5 Haiku, so be on the lookout for the news!
Meta reveals a revolutionary suite of open-source models, of which Llama 3.1 405B is the flagship
The Llama 3.1 suite is a series of three models, each state of the art in its category: an 8B, a 70B, and a 405B parameter model [6].
All of them have a context length of 128k tokens, making them particularly useful for analyzing long documents without loss of context. They are all already available on Hugging Face for use.
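As an example, here is a minimal sketch of loading the 8B instruct variant from the hub (the repository name below matches the hub at the time of writing, and access requires accepting Meta's license on the model page):

```python
# Minimal sketch: loading Llama 3.1 8B Instruct from the Hugging Face hub.
# Assumes `pip install transformers torch accelerate`, a GPU with enough
# memory, and that Meta's license has been accepted on the model page.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,  # half precision halves the memory footprint
    device_map="auto",           # spread the weights across available devices
)

messages = [{"role": "user", "content": "Summarize Llama 3.1 in one sentence."}]
print(generator(messages, max_new_tokens=64)[0]["generated_text"])
```

The 70B and 405B variants load the same way, given (much) more hardware.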
Through this innovative suite, Meta aims to redefine the future of AI, with itself at the center. We will focus here on what makes Llama 3.1 405B a significant milestone in AI development and explore the groundbreaking features and advancements that set it apart from the rest.
All three models are the best in their categories:
- Llama 3.1 8B rivals Mistral NeMo while being smaller, which makes it easier to store and run.
- Llama 3.1 70B is miles better than GPT-3.5 on all textual benchmarks, and rivals even GPT-4o mini on most of them.
- Llama 3.1 405B is state of the art on some benchmarks, and on the level of GPT-4o and Claude 3.5 Sonnet on most of them.
Focus: Llama 3.1 405B
If you’re not into the technical details, feel free to skip ahead to the next section!
Technical Features and Major Advancements
Llama 3.1 405B sets itself apart with its remarkable technical capabilities. Featuring an impressive 405 billion parameters, this model surpasses all its predecessors in the Llama 3 series.
Key functionalities include seamless integration with various ecosystems, particularly Meta AI, enhancing synergy with Meta's tools and platforms.
Additional major technical features include strong multilingual capacity and performance on par with the best of the best. It is worth noting that the model is text-only. However, Meta also experimented with multimodality for the 70B and 405B models, producing state-of-the-art versions for both video and audio, which it has heavily documented and may release in the near future. For the time being, Meta has declared that it won't release its multimodal models in the EU due to unpredictable regulation.
Training the 405B model required substantial resources: vast amounts of diverse data, advanced training methods, and extensive computation over several months (30 million GPU hours on a cluster of up to 16 thousand GPUs, which works out to roughly 1,900 hours, or about two and a half months, of continuous training). The smaller models in the Llama 3 series required only a fraction of this, and the gap underscores the significant advancements in data processing and model training that Meta has achieved with this new release.
On top of that, Llama 3.1 405B is already accessible through numerous platform providers: Groq, Azure, Bedrock, Nvidia… and the blazingly fast inference speeds offered there open the door to almost immediate interaction.
Comparison with Competitors
When pitted against other state-of-the-art models like GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro, Meta's Llama 3.1 405B holds its own.
Performance benchmarks reveal that Llama 3.1 405B is at the level of all of them. Despite its significant parameter count, the gains in cost and control are substantial, as this is the first open-source model to reach the performance of closed-source frontier models. This competitive advantage places Llama 3.1 405B at the forefront of AI advancements.
This comparison makes it evident that Llama 3.1 405B not only competes but excels, making a strong case for its leading position in the AI landscape.
Redefining Use Cases
For data scientists working in the industry, Llama 3.1 405B opens new possibilities. Potential use cases include frontier-level processing of sensitive data at fast inference speeds.
This model can also unlock new industry use cases, as the 70B version, or even the 405B itself, can be stored locally (provided the resources) for use. This signifies a change in how data scientists can leverage advanced AI models to solve complex problems more efficiently and effectively.
All models are available for commercial use under a specific Llama 3.1 license; the only restriction is that services or products with more than 700 million monthly active users must request a dedicated license from Meta.
The way forward
The introduction of Llama 3.1 405B marks a significant milestone in the landscape of large language models (LLMs). Its potential impact on the future of Generative AI is immense, promising more powerful and versatile tools to tackle complex challenges.
As language model capabilities continue to evolve, innovations like Llama 3.1 405B will be at the forefront of the upcoming digital transformation, redefining the limits of what we can achieve with artificial intelligence. This new model not only advances the field of AI but also sets a new standard for future developments in machine learning and data science.
The importance of handing such a massively strong model to the open-source community is that it will allow companies that can afford it to run their own local GPT-like model.
However, it is worth mentioning that Llama 3.1 405B is a heavy model. It requires considerable storage and compute capacity, which will make it very costly to host and run. At Sia Partners, we believe the future norm will split companies into three groups:
- Companies that will use existing, high-performing, private on-cloud solutions. These are the companies that will put the emphasis on speed and flexibility.
- Companies that will use a mix of on-cloud models and locally stored specialist models: the orchestrator is an on-cloud model, without access to the company's sensitive data, calling local specialist operators, which perform the operations (see the sketch after this list).
- Companies that will use fully locally stored models, with one big open-source model as the orchestrator and smaller specialist models as the operators (same sketch, fully in-house). These are the companies that will put the emphasis on keeping their models in-house for greater control over them.
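Below is a hypothetical sketch of the orchestrator/operator pattern behind the last two setups. All names (Task, orchestrate, run_specialist) are illustrative, not an actual implementation; in the hybrid setup the routing decision would come from a cloud model call, while the specialists run fully in-house:

```python
# Hypothetical sketch of the orchestrator/operator pattern described above.
# The orchestrator only ever sees the task description, never the sensitive
# payload; specialists run in-house and do the actual work on private data.
from dataclasses import dataclass

@dataclass
class Task:
    description: str   # safe to share with an on-cloud orchestrator
    payload: str       # sensitive data, stays with the local specialists

def orchestrate(task: Task) -> str:
    """Pick a specialist from the description alone. A real system would
    make an LLM call here (on-cloud in setup two, local in setup three)."""
    if "contract" in task.description.lower():
        return "legal-specialist"
    if "invoice" in task.description.lower():
        return "finance-specialist"
    return "general-specialist"

def run_specialist(name: str, payload: str) -> str:
    """Stand-in for a locally hosted specialist model, e.g. a fine-tuned
    Llama 3.1 8B served in-house."""
    return f"[{name}] processed {len(payload)} characters of private data"

task = Task(description="Summarize this contract", payload="<confidential text>")
specialist = orchestrate(task)                   # routing, no sensitive data
print(run_specialist(specialist, task.payload))  # the work happens locally
```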
The first kind of company already exists, and integrations are already under way for several use cases exploiting on-cloud models.
The second kind of company has begun to emerge in recent months. We believe these companies will produce the most convincing results in their adoption of GenAI.
The last kind of company has yet to emerge, helped by the latest release of the Llama 3.1 herd. These companies will follow the same direction as the second category, with complete control over their data.
References
[1] : Qwen Team, Alibaba Group (2024), Qwen2 Technical Report
[2] : Google (2024), Gemini: A Family of Highly Capable Multimodal Models
[3] : Hugging Face (2024), SmolLM
[4] : Mistral AI & Nvidia (2024), Mistral NeMo
[5] : OpenAI (2024), GPT-4o mini: advancing cost-efficient intelligence
[6] : Meta (2024), Introducing Llama 3.1: Our most capable models to date
Authors : Axel DARMOUNI, Romain BIGNOTTI, Jean-Sébastien ABESSOUGUIE BAYIHA, Raphaël TEBOUL, Vincent HAGUET