Description
vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.23.1rc0, ll temperature validation gates use comparison operators (<, >), which silently evaluate to False for NaN and for positive Infinity in Python's IEEE 754 float semantics. Both values pass every guard and propagate to GPU sampling kernels, where they produce undefined behavior or CUDA errors that can crash the inference worker. This vulnerability is fixed in 0.23.1rc0.
CVSS breakdown
CVSS 4.0
Attack Vector
Network
Attack Complexity
Low
Attack Requirements
None
Privileges Required
None
User Interaction
None
Confidentiality (Vulnerable System)
None
Integrity (Vulnerable System)
None
Availability (Vulnerable System)
Low
Confidentiality (Subsequent System)
None
Integrity (Subsequent System)
None
Availability (Subsequent System)
None
Affected products
- vllm-project / vllm< 0.23.1rc0 – < 0.23.1rc0