CVE-2025-62426

Description

vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, the /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat template. With the right chat_template_kwargs parameters, it is possible to block processing of the API server for long periods of time, delaying all other requests. This issue has been patched in version 0.11.1.

CVSS breakdown

CVSS 3.1

Attack Vector

Network

Attack Complexity

Low

Privileges Required

Low

User Interaction

None

Scope

Unchanged

Confidentiality

None

Integrity

None

Availability

High

Affected products

vllm-project / vllm>= 0.5.5, < 0.11.1 – >= 0.5.5, < 0.11.1

References

VENDOR_ADVISORYhttps://github.com/vllm-project/vllm/security/advisories/GHSA-69j4-grxj-j64p
PATCHhttps://github.com/vllm-project/vllm/pull/27205
PATCHhttps://github.com/vllm-project/vllm/commit/3ada34f9cb4d1af763fdfa3b481862a93eb6bd2b
MISChttps://github.com/vllm-project/vllm/blob/2a6dc67eb520ddb9c4138d8b35ed6fe6226997fb/vllm/entrypoints/chat_utils.py#L1602-L1610
MISChttps://github.com/vllm-project/vllm/blob/2a6dc67eb520ddb9c4138d8b35ed6fe6226997fb/vllm/entrypoints/openai/serving_engine.py#L809-L814