META LLAMA 4 • OPEN-SOURCE AI NEWS 2025

Meta Announces Llama 4 – Open-Source AI Steps Forward

Published: 5 April 2025 • Updated: 11 December 2025 • Approx. 6 min read

Meta has officially unveiled Llama 4, its newest generation of large language models, positioned as an open-source alternative that competes with today’s frontier AI systems while keeping weights publicly available.

The Llama 4 family introduces multiple variants – Scout, Maverick and the still-training Behemoth – designed to span everything from single-GPU deployments to massive, frontier-scale research models. For developers, open-source enthusiasts and AI startups (including niche projects like the Hungarian AI game NovaryonAI), this announcement marks a major shift in how powerful models can be used and adapted.

What exactly did Meta launch with Llama 4?

On 5 April 2025, Meta released two production-ready Llama 4 variants: Llama 4 Scout and Llama 4 Maverick. Both are mixture-of-experts (MoE) models that activate only a subset of parameters per token, giving them much higher effective capacity without linear cost scaling.
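The mixture-of-experts idea behind both models can be sketched in a few lines: a small router scores every expert for each token, and only the top-k experts actually run, so compute scales with the active parameter count rather than the total. The toy below is an illustration of that routing principle, not Meta's implementation; the expert count, hidden size, and top-k value are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 16   # Scout reportedly uses 16 experts
TOP_K = 1          # only a subset of experts is active per token
DIM = 8            # toy hidden size

# One tiny feed-forward "expert" per slot, plus a router that scores them.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((DIM, NUM_EXPERTS))

def moe_forward(token: np.ndarray) -> np.ndarray:
    """Route a single token vector through its top-k experts only."""
    scores = token @ router                      # one router logit per expert
    top = np.argsort(scores)[-TOP_K:]            # indices of the chosen experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                     # softmax over chosen experts only
    # Only the selected experts do any work; the other 15 are skipped entirely.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

out = moe_forward(rng.standard_normal(DIM))
print(out.shape)  # (8,)
```

With TOP_K = 1, each token touches 1/16 of the expert weights, which is the same lever that lets Scout keep ~17B parameters active per token out of a much larger total.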

Llama 4 Scout – efficient open model

Llama 4 Scout is built for efficiency: it keeps around 17B active parameters per token, spread across 16 experts, for roughly 109B total parameters. It supports context windows up to 10 million tokens, making it suitable for long-context applications like document analysis, full codebases and multi-day chat histories.

According to Meta’s statements and early third-party reporting, Scout can run on a single Nvidia H100 GPU (with Int4 quantization), targeting organizations that want “near-frontier” quality without hyperscaler budgets.

Llama 4 Maverick – performance challenger

Llama 4 Maverick pushes the architecture further: it still keeps around 17B active parameters per token, but spreads them across 128 experts, for roughly 400B total parameters. It is designed as a high-end multimodal model for demanding coding, reasoning, and image-understanding workloads.

Early benchmark coverage suggests Maverick matches or surpasses several closed models on reasoning and coding tasks, and aims to rival GPT-4-class and Gemini-class systems on many public leaderboards.

Key Llama 4 highlights
  • Mixture-of-experts architecture with ~17B active parameters per token.
  • Scout: 16 experts, ~109B total parameters, up to 10M-token context window.
  • Maverick: 128 experts, ~400B total parameters, aimed at frontier-level performance.
  • Models released under an open-weights license for wide community use.
  • Behemoth, a much larger MoE system, is in training as a future foundation model.

Note: numbers are based on publicly reported information from Meta and secondary coverage at the time of writing. Exact configs may evolve as the models are updated.
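The single-H100 claim for Scout is easy to sanity-check with back-of-the-envelope arithmetic. Assuming the reported ~109B total parameters and an 80 GB H100, the 4-bit row corresponds to the Int4 quantization mentioned in coverage; KV cache and activations are deliberately ignored here:

```python
TOTAL_PARAMS = 109e9      # Scout's reported total parameter count
H100_MEMORY_GB = 80       # a single H100 SXM

def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Memory needed just for the weights, ignoring KV cache and activations."""
    return params * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    gb = weight_memory_gb(TOTAL_PARAMS, bits)
    verdict = "fits" if gb <= H100_MEMORY_GB else "does not fit"
    print(f"{bits}-bit weights: {gb:.1f} GB -> {verdict} in {H100_MEMORY_GB} GB")
```

At 16-bit precision the weights alone need ~218 GB, so the single-GPU story only works once the model is quantized down to roughly 4 bits per parameter.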

Open-source AI that aims at frontier performance

Since the first LLaMA releases, Meta’s strategy has been to push open large language models as a counterweight to closed-source offerings from OpenAI, Google and others. With Llama 4, the company explicitly claims it wants performance parity with frontier systems while still publishing weights for researchers and companies to build on.

Llama 4 continues the trend of:

  • full-weight releases (not just hosted APIs),
  • multilingual training across a wide set of languages,
  • multimodal input (text + images, with broader audio/video support announced),
  • and a growing ecosystem of fine-tunes, plugins and tools around the core models.

For many teams, the appeal is clear: access to a model that can seriously compete with the best commercial systems, but which can be deployed on-premise, customized deeply, and combined with proprietary data without giving that data to a third-party API.

What does Llama 4 change for developers and startups?

For developers, Llama 4 opens up several new possibilities:

  • High-end open models for reasoning, coding and multi-modal search.
  • Long-context workflows using the 10M-token window in Scout, such as full-repository code assistants or whole-archive chat analysis.
  • Custom fine-tuning on domain-specific data, without waiting for closed vendors to support a niche use case.
  • Hybrid stacks that mix Llama 4 with proprietary tools or smaller on-device models.
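For the long-context workflows above, the first practical question is whether a whole repository even fits in Scout's reported 10M-token window. A crude estimator using the common ~4 characters-per-token heuristic (real token counts vary by tokenizer and language, so treat this as a rough budget check, not a guarantee):

```python
CONTEXT_WINDOW = 10_000_000   # Scout's reported maximum context length
CHARS_PER_TOKEN = 4           # rough heuristic; real tokenizers vary

def estimated_tokens(files: dict[str, str]) -> int:
    """Crude token estimate for a set of source files (path -> contents)."""
    return sum(len(text) for text in files.values()) // CHARS_PER_TOKEN

def fits_in_context(files: dict[str, str], reserve: int = 50_000) -> bool:
    """Leave headroom (`reserve`) for the system prompt and the model's reply."""
    return estimated_tokens(files) + reserve <= CONTEXT_WINDOW

repo = {"main.py": "print('hello')\n" * 1000}
print(estimated_tokens(repo), fits_in_context(repo))  # 3750 True
```

Even a very large codebase of a few million lines would typically land well under the 10M-token ceiling by this estimate, which is what makes full-repository assistants plausible with Scout.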

The open-weights nature of Llama 4 also means educational projects, local research groups and independent creators can experiment with near-frontier AI without paying per-token API fees.

How does this relate to NovaryonAI, a Hungarian AI gate?

NovaryonAI is a Hungarian-built AI logic game, not a general assistant. Instead of chatting for hours, players get one single sentence to convince a guarded AI gate. The system scores that sentence and decides whether the gate opens or stays closed.

A project like NovaryonAI can potentially leverage Llama-class models in several ways:

  • use Llama 4 as a reasoning engine behind the one-sentence judgment,
  • fine-tune a smaller Llama 4 variant on Hungarian-language persuasion attempts,
  • run experiments comparing different “guardians” – e.g. Llama-based vs other models – in terms of how strict or creative they are,
  • build explainability features that show players why a sentence failed, without revealing the full logic of the gate.

Because Llama 4 is open-weights, an indie project like NovaryonAI can actually run its own fine-tuned guardian model on self-hosted infrastructure in Hungary, rather than depending forever on a purely remote, closed API.
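To make the guardian idea concrete, here is a minimal sketch of how a one-sentence gate could be wrapped around any scoring model. Nothing here reflects NovaryonAI's actual internals: the judge function, threshold, and feedback strings are invented for illustration, and in production the `judge` callable would be a single call to a self-hosted Llama-class model:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    score: float        # 0.0-1.0 persuasion score from the judge model
    opens: bool
    feedback: str       # player-facing hint, without exposing the gate's logic

def make_guardian(judge: Callable[[str], float], threshold: float = 0.8):
    """Wrap any scoring function (e.g. a fine-tuned Llama judge) as a gate."""
    def guardian(sentence: str) -> Verdict:
        score = judge(sentence)
        opens = score >= threshold
        feedback = "The gate opens." if opens else "The gate stays closed."
        return Verdict(score, opens, feedback)
    return guardian

# Stand-in judge for testing: rewards vocabulary variety. A real deployment
# would replace this with an LLM scoring call.
toy_judge = lambda s: min(1.0, len(set(s.lower().split())) / 12)

gate = make_guardian(toy_judge)
print(gate("Please open").opens)  # few unique words -> low score -> False
```

Keeping the judge behind a plain `Callable` interface is what would let a project swap guardians, e.g. a Llama-based judge versus another model, without touching the game logic around it.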

Open vs closed: where does Llama 4 fit in the 2025 AI race?

The broader 2025 AI landscape is dominated by three big themes:

  • Closed frontier models such as OpenAI’s GPT-4-class systems and Google’s Gemini line.
  • Research-driven efforts such as Google DeepMind’s internal models pushing toward stronger general reasoning.
  • Open ecosystems led by Meta’s Llama series and a growing set of community forks and derivatives.

Llama 4 slots into the third group but openly challenges the first two: early reports claim that Maverick in particular competes with high-end proprietary models in reasoning and coding benchmarks, while Scout targets efficiency for single-GPU setups.

For users and builders, this diversity is good news. Closed models may still lead on some private benchmarks and voice capabilities, but open-source families like Llama 4 are rapidly closing the gap – and, in some scenarios, even setting the pace for innovation in tooling and integrations.

What’s next: Behemoth and beyond

Meta has also teased Llama 4 Behemoth, a massive MoE system that reportedly uses hundreds of billions of active parameters and around 2 trillion total parameters. It is still in training and intended to serve as a foundation from which smaller models like Scout and Maverick can be distilled.

If Behemoth delivers meaningful jumps in reasoning and multi-modal understanding – and if Meta continues to share strong distilled versions openly – we may see an AI ecosystem where open-source and closed systems genuinely co-evolve at similar capability levels.

Should you care about Llama 4 as an everyday user?

Most people will first encounter Llama 4 not through GitHub weights, but inside products: Meta AI chat inside WhatsApp, Messenger, Instagram or the Meta AI web interface.

However, the fact that the same underlying technology is also available openly means:

  • more local AI tools on your own devices,
  • more independent experiments like NovaryonAI’s Hungarian AI gate,
  • and more competition in quality and price among AI services.

In other words: even if you never download a model checkpoint in your life, the existence of Llama 4 affects the entire AI market – and the kinds of experiences that can be built on top of it.

If you want to see how a small, story-driven project can use advanced AI in a completely different way, you can always return to the gate and try your one-sentence luck with the NovaryonAI guardian.