Alibaba's new paper: GPU resource savings reach 82%

 9:42am, 25 October 2025

Alibaba Cloud, a subsidiary of Chinese technology giant Alibaba, recently published a paper titled "Aegaeon: Effective GPU Pooling for Concurrent LLM Serving on the Market". The paper introduces Aegaeon, a GPU resource pooling system designed to tackle the problem of wasted GPU capacity in large language model (LLM) inference services.

The system allows up to ten models to share a single Nvidia H20 GPU, greatly improving hardware utilization. During a three-month beta test, the number of GPUs required was reduced from 1,192 to 213, a saving of 82%, while effective output (goodput) increased by 1.5 to 9 times. Aegaeon has been deployed on Alibaba Cloud's AI platform Bailian, where it reduces hardware procurement costs and improves service efficiency.
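
The 82% figure follows directly from the two GPU counts quoted above. As a quick sanity check, the arithmetic can be sketched as below; the variable names are illustrative only and do not come from the paper.

```python
# GPU counts quoted in the article for the three-month beta test.
# Variable names are hypothetical, chosen for this sketch.
gpus_before = 1192
gpus_after = 213

# Absolute and relative reduction in GPU count.
gpus_saved = gpus_before - gpus_after
savings_pct = 100 * gpus_saved / gpus_before

# How many GPUs of the old deployment each remaining GPU replaces.
consolidation = gpus_before / gpus_after

print(f"GPUs saved: {gpus_saved}")
print(f"Savings: {savings_pct:.1f}%")
print(f"Consolidation ratio: {consolidation:.1f}x")
```

In other words, each GPU in the pooled deployment does the work that previously took between five and six GPUs.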

Alibaba Cloud pointed out that although there are more than one million AI models on the market, most traffic is concentrated on a few of them. Because GPU memory capacity is limited, each GPU can usually serve only two or three models, which leaves many GPUs idle on rarely used models and wastes a large amount of capacity. Aegaeon not only eases demand for GPU hardware, but could also help relieve the tight chip supply caused by US sanctions.

The paper has been accepted at this year's 31st ACM SIGOPS Symposium on Operating Systems Principles (SOSP), a top academic conference. The work has both forward-looking and practical value for systems software and large AI models. Alibaba Cloud CEO Wu Yongming said the company will continue to upgrade its full-stack AI infrastructure and strive to become the world's leading full-stack artificial intelligence provider, ushering in the era of artificial superintelligence.

Industry commentators noted that Aegaeon is an important improvement in Alibaba Cloud's GPU utilization, but that other large cloud providers also practice similar resource sharing, so it may not be a revolutionary breakthrough. Nonetheless, as AI models diversify and demand for industry-specific models grows, efficient GPU resource management will become key for cloud providers competing for the market.
