Ray Serve Upgrade Delivers 88% Lower Latency for AI Inference at Scale

2026/03/25 00:58
3 min read
Jessie A Ellis Mar 24, 2026 16:58

Anyscale announces major Ray Serve optimizations with HAProxy and gRPC, achieving 11.1x throughput gains for LLM inference workloads on enterprise deployments.

Anyscale has shipped substantial performance upgrades to Ray Serve that cut P99 latency by up to 88% and boost throughput by up to 11.1x for large language model inference workloads. The improvements, available in Ray 2.55+, address scaling bottlenecks that have plagued enterprise AI deployments running latency-sensitive applications.

The upgrades center on two architectural changes: HAProxy integration for ingress traffic and direct gRPC communication between deployment replicas. Both bypass Python-based components that previously created chokepoints under heavy load.

What the Numbers Show

In benchmark testing of a deep learning recommendation model pipeline, the optimized configuration pushed throughput from 490 to 1,573 queries per second while cutting P99 latency by 75%. At 400 concurrent users, the performance gap widened dramatically as Ray Serve's default Python proxy saturated while HAProxy continued scaling.

For LLM inference specifically, the results proved even more striking. Running GPT-class models on H100 GPUs at 256 concurrent users per replica, throughput scaled linearly with replica count when using HAProxy, something the default configuration couldn't achieve as the Python proxy process hit its ceiling.

Streaming workloads saw 8.9x throughput improvements, while unary request patterns hit the full 11.1x gain.
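To put the headline figures in relation to each other, the multipliers implied by the numbers above work out as follows (this is simple arithmetic on the cited benchmarks, not additional data):

```python
# DLRM pipeline: throughput rose from 490 to 1,573 queries per second.
dlrm_speedup = 1573 / 490
print(f"DLRM throughput gain: {dlrm_speedup:.1f}x")  # ~3.2x

# An 88% P99 latency reduction means latency fell to 12% of baseline,
# i.e. roughly an 8.3x reduction factor.
latency_factor = 1 / (1 - 0.88)
print(f"P99 latency reduction factor: {latency_factor:.1f}x")  # ~8.3x
```

Note that the 11.1x headline applies to the LLM unary-request case; the recommendation-model pipeline gain works out to roughly 3.2x.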

Technical Architecture Shift

The core problem: Ray Serve's default proxy runs on Python's asyncio, which struggles at high concurrency. HAProxy, written in C and battle-tested across production systems globally, handles the same traffic with significantly less overhead.
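The article does not reproduce the configuration Anyscale uses. Purely as an illustration of the ingress pattern described, a minimal `haproxy.cfg` that load-balances HTTP traffic across two Serve proxy endpoints might look like the sketch below; the addresses, ports, and timeouts are placeholders, not values from the source.

```
global
    maxconn 50000

defaults
    mode http
    timeout connect 5s
    timeout client  60s
    timeout server  60s

frontend serve_ingress
    bind *:80
    default_backend serve_replicas

backend serve_replicas
    balance roundrobin
    server node1 10.0.0.1:8000 check
    server node2 10.0.0.2:8000 check
```

Because HAProxy is an event-driven C process, a configuration like this keeps per-connection overhead low at concurrency levels where an asyncio event loop in a single Python process would saturate.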

The second optimization targets inter-deployment communication. Previously, when one deployment called another, Ray Serve routed everything through Ray Core's actor task system—useful for complex orchestration but overkill for simple request-response patterns. The new gRPC option establishes direct channels between replica actors, serializing with protobuf instead of going through Ray's object store.

Benchmarks show gRPC alone delivers 1.5x throughput improvement for unary calls and 2.4x for streaming at equivalent latency targets.

Enterprise Implications

These aren't academic improvements. Companies running recommendation systems, real-time fraud detection, or customer-facing LLM applications have consistently hit Ray Serve's scaling limits. The partnership with Google Kubernetes Engine that drove these optimizations suggests enterprise demand was substantial enough to prioritize the work.

A single environment variable, RAY_SERVE_USE_GRPC_BY_DEFAULT, enables the gRPC transport. HAProxy activation requires cluster-level configuration but integrates with existing Kubernetes deployments.
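In practice, opting in would look something like the following sketch; the flag name is as reported above, while the application entrypoint (`app:entrypoint`) is a placeholder, so check the Ray 2.55+ release notes for the exact details before relying on this.

```
# Enable direct gRPC transport between Serve replicas (Ray 2.55+).
export RAY_SERVE_USE_GRPC_BY_DEFAULT=1

# Then start the Serve application as usual, e.g.:
serve run app:entrypoint
```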

Anyscale is working toward making both optimizations the default for all inter-deployment communication, with an RFC currently under discussion. For teams already running Ray Serve in production, the upgrade path is straightforward: update to Ray 2.55+ and flip the appropriate flags.

The benchmark code is publicly available on GitHub for teams wanting to validate performance gains against their specific workloads before deploying.
