On February 27th, PANews reported that while the industry eagerly awaits the next-generation flagship model DeepSeek V4, the DeepSeek team has quietly released a new academic paper. The paper introduces DualPath, an innovative inference system optimized for large language model (LLM) inference performance under agent workloads. By introducing a "dual-path read" mechanism for the KV cache (the attention key-value cache, which functions much like a memory cache), the system redistributes load between storage and network, raising offline inference throughput by up to 1.87x and the average number of agent runs per second in online serving by 1.96x.

The paper's introduction notes that large models are rapidly evolving from single-turn chatbots and standalone reasoning models into agent systems capable of autonomous planning, tool invocation, and solving practical tasks through multi-turn interaction. This shift in application paradigm is driving a significant change in LLM inference workloads: from traditional human-model interaction to human-model-environment interaction, with interaction rounds reaching dozens or even hundreds.


