Ray Serve Upgrade Delivers 88% Lower Latency for AI Inference at Scale

2026/03/25 00:58
Jessie A Ellis Mar 24, 2026 16:58

Anyscale announces major Ray Serve optimizations with HAProxy and gRPC, achieving 11.1x throughput gains for LLM inference workloads on enterprise deployments.

Anyscale has shipped substantial performance upgrades to Ray Serve that cut P99 latency by up to 88% and raise throughput by up to 11.1x for large language model inference workloads. The improvements, available in Ray 2.55+, address scaling bottlenecks that have affected enterprise AI deployments running latency-sensitive applications.

The upgrades center on two architectural changes: HAProxy integration for ingress traffic and direct gRPC communication between deployment replicas. Both bypass Python-based components that previously created chokepoints under heavy load.

What the Numbers Show

In benchmark testing of a deep learning recommendation model pipeline, the optimized configuration pushed throughput from 490 to 1,573 queries per second while cutting P99 latency by 75%. At 400 concurrent users, the performance gap widened dramatically as Ray Serve's default Python proxy saturated while HAProxy continued scaling.
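
As a quick check on the figures above, the throughput gain and the remaining tail latency can be worked out directly. The sketch below uses only the numbers quoted in this article; the helper names are illustrative and are not part of Ray Serve.

```python
def speedup(before_qps: float, after_qps: float) -> float:
    """Ratio of optimized throughput to baseline throughput."""
    return after_qps / before_qps


def remaining_fraction(reduction_pct: float) -> float:
    """Fraction of the original latency left after a percentage cut."""
    return 1.0 - reduction_pct / 100.0


# Figures quoted for the recommendation model pipeline.
dlrm_speedup = speedup(490, 1573)
dlrm_p99_left = remaining_fraction(75)

print(f"DLRM throughput gain: {dlrm_speedup:.2f}x")
print(f"DLRM P99 latency remaining: {dlrm_p99_left:.0%} of baseline")
```

Put differently, the recommendation pipeline gained roughly 3.2x in throughput, and its P99 latency fell to a quarter of the baseline.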

For LLM inference specifically, the results proved even more striking. Running GPT-class models on H100 GPUs at 256 concurrent users per replica, throughput scaled linearly with replica count when using HAProxy. The default configuration could not match this, because its Python proxy process hit its ceiling.
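
The saturation behavior described above can be pictured with a toy capacity model: end-to-end throughput is capped by whichever is smaller, the proxy's ceiling or the combined capacity of the replicas. The numbers below are hypothetical and chosen only to show the shape of the curve; they are not Anyscale's measurements, and the function is not a Ray Serve API.

```python
def cluster_throughput(
    replicas: int, per_replica_qps: float, proxy_ceiling_qps: float
) -> float:
    """Throughput is limited by the slower of the proxy and the replica pool."""
    return min(replicas * per_replica_qps, proxy_ceiling_qps)


# Hypothetical figures for illustration only.
PER_REPLICA_QPS = 100.0
LOW_CEILING = 250.0       # stands in for a proxy that saturates early
HIGH_CEILING = 10_000.0   # stands in for a proxy with ample headroom

for n in (1, 2, 4, 8):
    capped = cluster_throughput(n, PER_REPLICA_QPS, LOW_CEILING)
    scaled = cluster_throughput(n, PER_REPLICA_QPS, HIGH_CEILING)
    print(f"{n} replicas: low-ceiling proxy {capped:.0f} qps, "
          f"high-ceiling proxy {scaled:.0f} qps")
```

Under this model, adding replicas behind a saturated proxy yields no further gain, while a proxy with headroom lets throughput grow in step with replica count.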

Streaming workloads saw 8.9x throughput improvements, while unary request patterns hit the full 11.1x gain.

Technical Architecture Shift

The core problem: Ray Serve's default proxy runs on Python's asyncio, which struggles at high concurrency. HAProxy, written in C and battle-tested across production systems globally, handles the same traffic with significantly less overhead.

The second optimization targets inter-deployment communication. Previously, when one deployment called another, Ray Serve routed everything through Ray Core's actor task system—useful for complex orchestration but overkill for simple request-response patterns. The new gRPC option establishes direct channels between replica actors, serializing with protobuf instead of going through Ray's object store.

Benchmarks show gRPC alone delivers 1.5x throughput improvement for unary calls and 2.4x for streaming at equivalent latency targets.

Enterprise Implications

These aren't academic improvements. Companies running recommendation systems, real-time fraud detection, or customer-facing LLM applications have consistently hit Ray Serve's scaling limits. The partnership with Google Kubernetes Engine that drove these optimizations suggests enterprise demand was substantial enough to prioritize the work.

A single environment variable—RAY_SERVE_USE_GRPC_BY_DEFAULT—enables the gRPC transport. HAProxy activation requires cluster-level configuration but integrates with existing Kubernetes deployments.
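
A minimal sketch of setting that flag from a Python launcher follows. The variable name comes from the announcement; the value "1" and the requirement that it be set before the Serve processes start are assumptions, so check the Ray 2.55+ documentation for the accepted values. The helper name is hypothetical.

```python
import os
from typing import Dict, Optional


def grpc_enabled_env(base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the gRPC transport flag set."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: "1" is an accepted truthy value for this flag.
    env["RAY_SERVE_USE_GRPC_BY_DEFAULT"] = "1"
    return env


env = grpc_enabled_env({"PATH": "/usr/bin"})
print(env["RAY_SERVE_USE_GRPC_BY_DEFAULT"])
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when launching the serving process, which keeps the change scoped to that process instead of the whole shell session.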

Anyscale is working toward making both optimizations the default, with an RFC currently under discussion. For teams already running Ray Serve in production, the upgrade path is straightforward: update to Ray 2.55+ and enable the appropriate flags.

The benchmark code is publicly available on GitHub for teams wanting to validate performance gains against their specific workloads before deploying.
