Svmuu News: Zhipu AI has launched the GLM-5.1 High-Speed API for select enterprise customers. The model's output speed reaches 400 tokens/s, setting a new global record for end-to-end speed among official large-model interfaces.
It is understood that the high-speed version retains the capabilities of the original flagship model and is driven by a high-performance inference engine jointly developed by the Zhipu AI and TileRT teams. By restructuring the GPU runtime scheduling mechanism, the engine statically organizes the model into a persistent Engine Kernel that stays resident on the GPU, reducing the kernel-launch and memory read/write latency incurred in traditional inference.
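As a rough illustration of why kernel-launch overhead matters at this speed, the sketch below works out the per-token time budget implied by 400 tokens/s and compares it with a launch-overhead estimate. The kernel count and per-launch cost are hypothetical placeholders, not figures published by Zhipu AI or TileRT, and the function names are invented for this example.

```python
def per_token_budget_ms(tokens_per_second: float) -> float:
    """Time available to produce one token, in milliseconds."""
    return 1000.0 / tokens_per_second


def launch_overhead_ms(kernels_per_token: int, launch_cost_us: float) -> float:
    """Total launch overhead per token if every kernel is launched separately."""
    return kernels_per_token * launch_cost_us / 1000.0


# 400 tokens/s is the figure reported in the article.
budget_ms = per_token_budget_ms(400)

# ASSUMPTION: 500 kernel launches per token at 3 microseconds each.
# Both numbers are illustrative only and are not taken from the source.
overhead_ms = launch_overhead_ms(kernels_per_token=500, launch_cost_us=3.0)

# Fraction of the per-token budget spent on launches under these assumptions.
overhead_share = overhead_ms / budget_ms

print(f"budget per token: {budget_ms:.2f} ms")
print(f"launch overhead:  {overhead_ms:.2f} ms ({overhead_share:.0%} of budget)")
```

Under these assumed numbers, per-kernel launches would consume more than half of the time available for each token, which is the kind of cost a single resident kernel is meant to avoid.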
In multi-GPU scenarios, TileRT further specializes the GPU nodes within an 8-card NVL topology into different functional workers, improving the efficiency of attention-layer computation and inter-card communication.
The high-speed service is currently available to select enterprise customers on Zhipu AI's MaaS platform. Going forward, optimization work will continue on FP8 inference and ultra-long context capabilities, supporting low-latency scenarios such as AI programming, real-time interaction, and real-time voice.
Disclaimer: All content on this platform is sourced from the internet and is provided for informational purposes only. None of the content represents the views of this site, nor does it constitute investment advice. Please exercise caution when investing.
Zhipu has released the GLM-5.1 High-Speed Edition API, with an output speed of up to 400 tokens per second