
Six major AI paradigm shifts in 2025: From RLVR training and Vibe Coding to Nano Banana

2025/12/22 17:24

Author: Andrej Karpathy

Compiled by: Tim, PANews

2025 will be a year of rapid development and great uncertainty for large language models, and we have achieved fruitful results. Below are some "paradigm shifts" that I personally find noteworthy and somewhat surprising, changes that have altered the landscape and impressed me, at least conceptually.

1. Reinforcement Learning Based on Verifiable Rewards (RLVR)

In early 2025, the LLM production stacks at all the major AI labs roughly took the following form:

  • Pre-training (GPT-2/3, 2020);
  • Supervised fine-tuning (InstructGPT, 2022);
  • Reinforcement learning from human feedback (RLHF, 2022).

For a long time, this was the stable, mature stack for training production-grade large language models. In 2025, reinforcement learning from verifiable rewards (RLVR) became the core new addition to it. By training LLMs in environments with automatically verifiable rewards (such as math and programming problems), the models spontaneously develop strategies that resemble human "reasoning": they learn to break problem-solving into intermediate computational steps, and through iterated trial and error they master multiple solution strategies, including backtracking (see the examples in the DeepSeek-R1 paper). These strategies were hard to obtain in the previous stack because the optimal reasoning paths and backtracking behavior are not explicitly demonstrated anywhere for the LLM to imitate; they have to be discovered through reward optimization.

Unlike supervised fine-tuning and RLHF (both relatively short stages with modest compute cost), RLVR involves long-running optimization against an objective, non-gameable reward function. It has proven to be a profitable way to spend compute: running RLVR delivers significant capability gains per dollar, and it now absorbs a large share of compute that would previously have gone to pre-training. Much of the progress in LLM capability in 2025 therefore came from the major labs soaking up the enormous compute demand of this new stage. Overall, model sizes stayed roughly flat while reinforcement learning training was extended dramatically. Another novelty of this stage is that it gives us an entirely new axis of control (with its own scaling law): model capability as a function of test-time compute, tuned by generating longer reasoning trajectories, i.e., more "thinking time." OpenAI's o1 (released at the end of 2024) was the first demonstration of an RLVR-trained model, and the release of o3 (early 2025) marked a clear inflection point with a visibly significant leap in capability.
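The core training signal is simple enough to sketch. Below is a toy, illustrative REINFORCE-style loop where the only supervision is an automatically verifiable check; the "policy" here is just a weighted choice over candidate strategies, my own stand-in for an LLM, not any lab's actual stack:

```python
import math
import random

def verifier(problem, answer):
    # Verifiable reward: no human judge, just an exact check of the result.
    return 1.0 if answer == eval(problem) else 0.0

def train_rlvr(problems, strategies, steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0] * len(strategies)  # policy parameters (strategy preferences)
    for _ in range(steps):
        problem = rng.choice(problems)
        weights = [math.exp(p) for p in prefs]
        i = rng.choices(range(len(strategies)), weights=weights)[0]
        reward = verifier(problem, strategies[i](problem))
        total = sum(weights)
        # REINFORCE-style update: reinforce whichever strategy earned reward.
        for j in range(len(prefs)):
            grad = (1.0 if j == i else 0.0) - weights[j] / total
            prefs[j] += lr * reward * grad
    return prefs

problems = ["2+3", "7*6", "10-4"]
strategies = [
    lambda p: eval(p),  # a strategy that actually solves the problem
    lambda p: 0,        # a degenerate strategy that always answers 0
]
prefs = train_rlvr(problems, strategies)
assert prefs[0] > prefs[1]  # reward optimization discovers the solving strategy
```

The point of the sketch is that nothing tells the policy *how* to solve the problem; it only gets a gradable outcome, and good strategies are reinforced wherever they stumble onto reward.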

2. Summoning Ghosts, Not Breeding Animals: Jagged Intelligence

2025 was the first year I (and, I believe, much of the industry) began to understand the "shape" of LLM intelligence more intuitively. We are not evolving and breeding animals; we are summoning ghosts. The entire LLM stack (neural architecture, training data, training algorithms, and especially optimization objectives) is fundamentally different from biology, so it is not surprising that we are getting entities occupying a very different region of intelligence space, and it is inappropriate to reason about them in animal terms. Seen through the lens of the supervision signal, human neural networks were optimized for tribal survival in the jungle, while LLM neural networks are optimized to imitate human text, earn rewards on math problems, and win human approval in chat arenas. Wherever a domain is verifiable enough to support RLVR, LLM capability in that domain "spikes upward," producing an interesting, jagged capability profile overall. The same model can be both an erudite genius and a confused elementary-school student, one that might leak your data under a little adversarial pressure.

Human intelligence: blue; AI intelligence: red. I like this version of the meme (sorry, I can't find the original post on Twitter) because it points out that human intelligence has its own unique jagged profile too.

Relatedly, in 2025 I developed a general indifference to, and distrust of, benchmarks. The core issue is that benchmarks are themselves verifiable environments, which makes them highly susceptible to RLVR and to its weaker cousin, synthetic data generation. In the usual score-maximization process, LLM teams inevitably construct training environments near the benchmarks in embedding space and patch over the capability jaggedness in exactly those regions. "Training on the test set," in this soft sense, has become the new normal.

So what if a model sweeps every benchmark but still falls short of general intelligence?

3. Cursor: A new layer for LLM applications

What impressed me most about Cursor (besides its rapid rise this year) is that it convincingly revealed a new layer of "LLM apps," as people started talking about "Cursor for X." As I emphasized in my Y Combinator talk this year, the core of an LLM app like Cursor is integrating and orchestrating LLM calls for a specific vertical domain:

  • They handle the "context engineering";
  • Under the hood, they orchestrate many LLM calls into increasingly elaborate directed acyclic graphs, striking a fine balance between performance and cost;
  • They provide application-specific GUIs for the human in the loop;
  • And they expose an "autonomy slider."
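The orchestration point can be made concrete with a minimal sketch: LLM calls arranged as a directed acyclic graph, with a per-node model choice trading cost against quality. Here `llm_call` is a stub standing in for a real model API, and the node names are my own illustration:

```python
from graphlib import TopologicalSorter

def llm_call(model, prompt):
    # Stub: a real app would call a model API here and return its completion.
    return f"[{model}] {prompt[:40]}"

# Each node: (model to use, upstream dependencies, prompt template).
# A cheap model handles retrieval/summarization; a strong one writes the answer.
dag = {
    "retrieve":  ("cheap-model",  [],                        "Find docs for: {q}"),
    "summarize": ("cheap-model",  ["retrieve"],              "Summarize: {retrieve}"),
    "answer":    ("strong-model", ["retrieve", "summarize"], "Answer {q} using {summarize}"),
}

def run(dag, q):
    results = {"q": q}
    # Execute nodes in dependency order so each template's inputs exist.
    deps = {name: spec[1] for name, spec in dag.items()}
    for node in TopologicalSorter(deps).static_order():
        model, _, template = dag[node]
        results[node] = llm_call(model, template.format(**results))
    return results["answer"]

print(run(dag, "why is the sky blue?"))
```

The design choice worth noticing is the split itself: the expensive model is reserved for the final node, while the fan-in structure of the graph is exactly the "performance vs. cost" balance described above.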

Throughout 2025 there was extensive discussion about how much room this emerging application layer has. Will the LLM platforms themselves swallow all the applications, or is there lasting space for LLM apps? My personal prediction is that LLM platforms will converge on producing "generalist university graduates," while LLM apps will be responsible for organizing and refining those graduates, supplying the private data, sensors, actuators, and feedback loops that turn them into "professional teams" deployable in specific verticals.

4. Claude Code: AI running locally

Claude Code was the first convincing demonstration of the LLM-agent form factor: tool use interleaved with reasoning in a loop, enabling more persistent and complex problem-solving. What impressed me most, though, is that it runs on the user's own computer, deeply integrated with the user's private environment, data, and context. I believe OpenAI's bet here was somewhat misjudged: they focused their coding assistant and agent work on cloud deployment (containerized environments orchestrated through ChatGPT) rather than on localhost. Cloud-hosted fleets of agents may well be the "ultimate form" of this technology, but we are in a transitional period of uneven capability and slower-than-hoped progress, and in that regime, agents running directly on the local machine, collaborating closely with developers and their specific work environments, is the more sensible path. Claude Code got this priority right and packaged it in a concise, elegant, highly appealing command-line tool, reshaping how AI presents itself. It is no longer a website you visit, like Google, but a tiny sprite or ghost that "lives" on your computer, a genuinely new paradigm for interacting with AI.
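The "tool use interleaved with reasoning in a loop" form factor can be sketched in a few lines. This is an illustrative toy, not Claude Code's actual protocol: `fake_model` stands in for the LLM's policy, and the single `shell` tool is what gives the agent access to the local environment the section emphasizes:

```python
import subprocess

TOOLS = {
    # Tools act on the user's local machine, not a remote sandbox.
    "shell": lambda arg: subprocess.run(
        arg, shell=True, capture_output=True, text=True
    ).stdout,
}

def fake_model(history):
    # Stub policy: first inspect the environment, then answer.
    # A real agent would call an LLM with the full history here.
    if not any(step[0] == "tool" for step in history):
        return ("tool", "shell", "echo hello-from-local-env")
    return ("answer", "done: " + history[-1][2].strip())

def agent_loop(task, model, max_steps=8):
    history = [("task", task, "")]
    for _ in range(max_steps):
        action = model(history)
        if action[0] == "answer":
            return action[1]          # the agent decides it is finished
        _, tool_name, tool_arg = action
        observation = TOOLS[tool_name](tool_arg)
        history.append(("tool", tool_name, observation))  # feed result back
    return "max steps reached"

print(agent_loop("inspect this machine", fake_model))
```

The loop is the whole idea: each tool result is appended to the history the model sees next, so the agent can pursue a goal across many steps instead of answering in one shot.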

5. Vibe Coding: programming for everyone

In 2025, AI crossed a critical capability threshold: it became possible to build remarkable programs from English descriptions alone, without even understanding the underlying code. Amusingly, I coined the term "vibe coding" in an offhand tweet, never imagining it would take on its current life. In the vibe-coding paradigm, programming is no longer confined to highly trained professionals; it is something everyone can take part in. In that sense it is another instance of the phenomenon I described in my post on how LLMs flip the usual pattern of technology diffusion: in stark contrast to every prior technology, ordinary people benefit from LLMs more than professionals, corporations, and governments do. But vibe coding does not just open programming to ordinary people; it also lets professional developers write software that "would never have been built otherwise." While developing nanochat, I vibe-coded a custom, efficient BPE tokenizer in Rust without relying on existing libraries or learning Rust in depth. This year I also vibe-coded quick prototypes just to test whether ideas were feasible, and even wrote entire one-off applications just to pin down a single bug, because code has suddenly become free, ephemeral, malleable, and disposable. Vibe coding will reshape the software development ecosystem and redraw the boundaries of what counts as a programming career.
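For readers unfamiliar with the tokenizer mentioned above: BPE training boils down to one loop that repeatedly merges the most frequent adjacent symbol pair. A toy Python sketch of that core step (the real nanochat tokenizer is in Rust and far more efficient; this is only the textbook algorithm):

```python
from collections import Counter

def bpe_train(text, num_merges):
    tokens = list(text)  # start from individual characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))  # count adjacent pairs
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        merged, i = [], 0
        while i < len(tokens):  # greedily replace every occurrence of `best`
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

tokens, merges = bpe_train("aaabdaaabac", 2)
# Two merges on the classic example: first 'a'+'a' -> 'aa', then 'aa'+'a' -> 'aaa'.
```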

6. Nano Banana: LLM graphical interface

Google's Gemini Nano Banana is one of the most striking paradigm shifts of 2025. In my view, LLMs are the next major computing paradigm after the computers of the 1970s and 80s, so we should expect analogous innovations, for analogous underlying reasons, echoing the evolution of personal computing, microcontrollers, and even the internet. This is especially true at the level of human-computer interaction: today's "chat" with an LLM resembles typing commands into a terminal in the 1980s. Text is the most primitive data representation for computers (and for LLMs), but it is not what humans prefer, especially as output. People actually dislike reading text; it is slow and laborious. We prefer to take in information visually and spatially, which is why the GUI was invented in classical computing. By the same logic, LLMs should communicate with us in the forms humans prefer: images, infographics, slides, whiteboards, animations, videos, web apps, and other media. Early versions of this already exist as "visual decoration" of text: emojis and Markdown (headings, bold, lists, tables, and other typographic elements). But who will build the GUI of the LLM? From this angle, Nano Banana is an early prototype of that future. Notably, its breakthrough is not just image generation per se, but the capabilities that emerge from text generation, image generation, and world knowledge being interwoven in the same model weights.

