
Six major AI paradigm shifts in 2025: From RLVR training and Vibe Coding to Nano Banana

2025/12/22 17:24

Author: Andrej Karpathy

Compiled by: Tim, PANews

2025 will be a year of rapid development and great uncertainty for large language models, and we have achieved fruitful results. Below are some "paradigm shifts" that I personally find noteworthy and somewhat surprising, changes that have altered the landscape and impressed me, at least conceptually.

1. Reinforcement Learning Based on Verifiable Rewards (RLVR)

In early 2025, the LLM production stacks at all the major AI labs roughly took the following form:

  • Pre-training (GPT-2/3, 2020);
  • Supervised fine-tuning (InstructGPT, 2022);
  • Reinforcement learning from human feedback (RLHF, 2022).

For a long time, this was the stable, mature stack for training production-grade large language models. In 2025, reinforcement learning from verifiable rewards (RLVR) became the core new addition to it. By training LLMs in environments with automatically verifiable rewards (such as math and programming problems), the models spontaneously develop strategies that resemble human "reasoning": they learn to break problem-solving into intermediate computational steps, and through iterated trial and error they master multiple solution strategies, including backtracking (see the examples in the DeepSeek-R1 paper). These strategies were hard to obtain in the previous stack because the optimal reasoning paths and backtracking behavior are not explicitly demonstrated anywhere for the LLM to imitate; they have to be discovered through reward optimization.

Unlike supervised fine-tuning and RLHF (both relatively short stages with modest compute cost), RLVR involves long-running optimization against an objective, non-gameable reward function. It has proven to be a profitable way to spend compute: running RLVR delivers significant capability gains per dollar, and it now absorbs a large share of compute that would previously have gone to pre-training. Much of the progress in LLM capability in 2025 therefore came from the major labs soaking up the enormous compute demand of this new stage. Overall, model sizes stayed roughly flat while reinforcement learning training was extended dramatically. Another novelty of this stage is that it gives us an entirely new axis of control (with its own scaling law): model capability as a function of test-time compute, tuned by generating longer reasoning trajectories, i.e., more "thinking time." OpenAI's o1 (released at the end of 2024) was the first demonstration of an RLVR-trained model, and the release of o3 (early 2025) marked a clear inflection point with a visibly significant leap in capability.
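The core training signal is simple enough to sketch. Below is a toy, illustrative REINFORCE-style loop where the only supervision is an automatically verifiable check; the "policy" here is just a weighted choice over candidate strategies, my own stand-in for an LLM, not any lab's actual stack:

```python
import math
import random

def verifier(problem, answer):
    # Verifiable reward: no human judge, just an exact check of the result.
    return 1.0 if answer == eval(problem) else 0.0

def train_rlvr(problems, strategies, steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0] * len(strategies)  # policy parameters (strategy preferences)
    for _ in range(steps):
        problem = rng.choice(problems)
        weights = [math.exp(p) for p in prefs]
        i = rng.choices(range(len(strategies)), weights=weights)[0]
        reward = verifier(problem, strategies[i](problem))
        total = sum(weights)
        # REINFORCE-style update: reinforce whichever strategy earned reward.
        for j in range(len(prefs)):
            grad = (1.0 if j == i else 0.0) - weights[j] / total
            prefs[j] += lr * reward * grad
    return prefs

problems = ["2+3", "7*6", "10-4"]
strategies = [
    lambda p: eval(p),  # a strategy that actually solves the problem
    lambda p: 0,        # a degenerate strategy that always answers 0
]
prefs = train_rlvr(problems, strategies)
assert prefs[0] > prefs[1]  # reward optimization discovers the solving strategy
```

The point of the sketch is that nothing tells the policy *how* to solve the problem; it only gets a gradable outcome, and good strategies are reinforced wherever they stumble onto reward.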

2. Summoning Ghosts, Not Breeding Animals: Jagged Intelligence

2025 was the first year I (and, I believe, much of the industry) began to understand the "shape" of LLM intelligence more intuitively. We are not evolving and breeding animals; we are summoning ghosts. The entire LLM stack (neural architecture, training data, training algorithms, and especially optimization objectives) is fundamentally different from biology, so it is not surprising that we are getting entities occupying a very different region of intelligence space, and it is inappropriate to reason about them in animal terms. Seen through the lens of the supervision signal, human neural networks were optimized for tribal survival in the jungle, while LLM neural networks are optimized to imitate human text, earn rewards on math problems, and win human approval in chat arenas. Wherever a domain is verifiable enough to support RLVR, LLM capability in that domain "spikes upward," producing an interesting, jagged capability profile overall. The same model can be both an erudite genius and a confused elementary-school student, one that might leak your data under a little adversarial pressure.

Human intelligence: blue; AI intelligence: red. I like this version of the meme (sorry, I can't find the original post on Twitter) because it points out that human intelligence has its own unique jagged profile too.

Relatedly, in 2025 I developed a general indifference to, and distrust of, benchmarks. The core issue is that benchmarks are themselves verifiable environments, which makes them highly susceptible to RLVR and to its weaker cousin, synthetic data generation. In the usual score-maximization process, LLM teams inevitably construct training environments near the benchmarks in embedding space and patch over the capability jaggedness in exactly those regions. "Training on the test set," in this soft sense, has become the new normal.

So what if a model sweeps every benchmark but still falls short of general intelligence?

3. Cursor: A new layer for LLM applications

What impressed me most about Cursor (besides its rapid rise this year) is that it convincingly revealed a new layer of "LLM apps," as people started talking about "Cursor for X." As I emphasized in my Y Combinator talk this year, the core of an LLM app like Cursor is integrating and orchestrating LLM calls for a specific vertical domain:

  • They handle the "context engineering";
  • Under the hood, they orchestrate many LLM calls into increasingly elaborate directed acyclic graphs, striking a fine balance between performance and cost;
  • They provide application-specific GUIs for the human in the loop;
  • And they expose an "autonomy slider."
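The orchestration point can be made concrete with a minimal sketch: LLM calls arranged as a directed acyclic graph, with a per-node model choice trading cost against quality. Here `llm_call` is a stub standing in for a real model API, and the node names are my own illustration:

```python
from graphlib import TopologicalSorter

def llm_call(model, prompt):
    # Stub: a real app would call a model API here and return its completion.
    return f"[{model}] {prompt[:40]}"

# Each node: (model to use, upstream dependencies, prompt template).
# A cheap model handles retrieval/summarization; a strong one writes the answer.
dag = {
    "retrieve":  ("cheap-model",  [],                        "Find docs for: {q}"),
    "summarize": ("cheap-model",  ["retrieve"],              "Summarize: {retrieve}"),
    "answer":    ("strong-model", ["retrieve", "summarize"], "Answer {q} using {summarize}"),
}

def run(dag, q):
    results = {"q": q}
    # Execute nodes in dependency order so each template's inputs exist.
    deps = {name: spec[1] for name, spec in dag.items()}
    for node in TopologicalSorter(deps).static_order():
        model, _, template = dag[node]
        results[node] = llm_call(model, template.format(**results))
    return results["answer"]

print(run(dag, "why is the sky blue?"))
```

The design choice worth noticing is the split itself: the expensive model is reserved for the final node, while the fan-in structure of the graph is exactly the "performance vs. cost" balance described above.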

Throughout 2025 there was extensive discussion about how much room this emerging application layer has. Will the LLM platforms themselves swallow all the applications, or is there lasting space for LLM apps? My personal prediction is that LLM platforms will converge on producing "generalist university graduates," while LLM apps will be responsible for organizing and refining those graduates, supplying the private data, sensors, actuators, and feedback loops that turn them into "professional teams" deployable in specific verticals.

4. Claude Code: AI running locally

Claude Code was the first convincing demonstration of the LLM-agent form factor: tool use interleaved with reasoning in a loop, enabling more persistent and complex problem-solving. What impressed me most, though, is that it runs on the user's own computer, deeply integrated with the user's private environment, data, and context. I believe OpenAI's bet here was somewhat misjudged: they focused their coding assistant and agent work on cloud deployment (containerized environments orchestrated through ChatGPT) rather than on localhost. Cloud-hosted fleets of agents may well be the "ultimate form" of this technology, but we are in a transitional period of uneven capability and slower-than-hoped progress, and in that regime, agents running directly on the local machine, collaborating closely with developers and their specific work environments, is the more sensible path. Claude Code got this priority right and packaged it in a concise, elegant, highly appealing command-line tool, reshaping how AI presents itself. It is no longer a website you visit, like Google, but a tiny sprite or ghost that "lives" on your computer, a genuinely new paradigm for interacting with AI.
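The "tool use interleaved with reasoning in a loop" form factor can be sketched in a few lines. This is an illustrative toy, not Claude Code's actual protocol: `fake_model` stands in for the LLM's policy, and the single `shell` tool is what gives the agent access to the local environment the section emphasizes:

```python
import subprocess

TOOLS = {
    # Tools act on the user's local machine, not a remote sandbox.
    "shell": lambda arg: subprocess.run(
        arg, shell=True, capture_output=True, text=True
    ).stdout,
}

def fake_model(history):
    # Stub policy: first inspect the environment, then answer.
    # A real agent would call an LLM with the full history here.
    if not any(step[0] == "tool" for step in history):
        return ("tool", "shell", "echo hello-from-local-env")
    return ("answer", "done: " + history[-1][2].strip())

def agent_loop(task, model, max_steps=8):
    history = [("task", task, "")]
    for _ in range(max_steps):
        action = model(history)
        if action[0] == "answer":
            return action[1]          # the agent decides it is finished
        _, tool_name, tool_arg = action
        observation = TOOLS[tool_name](tool_arg)
        history.append(("tool", tool_name, observation))  # feed result back
    return "max steps reached"

print(agent_loop("inspect this machine", fake_model))
```

The loop is the whole idea: each tool result is appended to the history the model sees next, so the agent can pursue a goal across many steps instead of answering in one shot.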

5. Vibe Coding: programming for everyone

In 2025, AI crossed a critical capability threshold: it became possible to build remarkable programs from English descriptions alone, without even understanding the underlying code. Amusingly, I coined the term "vibe coding" in an offhand tweet, never imagining it would take on its current life. In the vibe-coding paradigm, programming is no longer confined to highly trained professionals; it is something everyone can take part in. In that sense it is another instance of the phenomenon I described in my post on how LLMs flip the usual pattern of technology diffusion: in stark contrast to every prior technology, ordinary people benefit from LLMs more than professionals, corporations, and governments do. But vibe coding does not just open programming to ordinary people; it also lets professional developers write software that "would never have been built otherwise." While developing nanochat, I vibe-coded a custom, efficient BPE tokenizer in Rust without relying on existing libraries or learning Rust in depth. This year I also vibe-coded quick prototypes just to test whether ideas were feasible, and even wrote entire one-off applications just to pin down a single bug, because code has suddenly become free, ephemeral, malleable, and disposable. Vibe coding will reshape the software development ecosystem and redraw the boundaries of what counts as a programming career.
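For readers unfamiliar with the tokenizer mentioned above: BPE training boils down to one loop that repeatedly merges the most frequent adjacent symbol pair. A toy Python sketch of that core step (the real nanochat tokenizer is in Rust and far more efficient; this is only the textbook algorithm):

```python
from collections import Counter

def bpe_train(text, num_merges):
    tokens = list(text)  # start from individual characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))  # count adjacent pairs
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        merged, i = [], 0
        while i < len(tokens):  # greedily replace every occurrence of `best`
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

tokens, merges = bpe_train("aaabdaaabac", 2)
# Two merges on the classic example: first 'a'+'a' -> 'aa', then 'aa'+'a' -> 'aaa'.
```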

6. Nano Banana: LLM graphical interface

Google's Gemini Nano Banana is one of the most striking paradigm shifts of 2025. In my view, LLMs are the next major computing paradigm after the computers of the 1970s and 80s, so we should expect analogous innovations, for analogous underlying reasons, echoing the evolution of personal computing, microcontrollers, and even the internet. This is especially true at the level of human-computer interaction: today's "chat" with an LLM resembles typing commands into a terminal in the 1980s. Text is the most primitive data representation for computers (and for LLMs), but it is not what humans prefer, especially as output. People actually dislike reading text; it is slow and laborious. We prefer to take in information visually and spatially, which is why the GUI was invented in classical computing. By the same logic, LLMs should communicate with us in the forms humans prefer: images, infographics, slides, whiteboards, animations, videos, web apps, and other media. Early versions of this already exist as "visual decoration" of text: emojis and Markdown (headings, bold, lists, tables, and other typographic elements). But who will build the GUI of the LLM? From this angle, Nano Banana is an early prototype of that future. Notably, its breakthrough is not just image generation per se, but the capabilities that emerge from text generation, image generation, and world knowledge being interwoven in the same model weights.

