CVE-2026-5757, nicknamed "Bleeding Llama," is a critical, currently unpatched heap out-of-bounds read vulnerability in Ollama — the widely-used open-source platform for running large language models (LLMs) locally on developer and enterprise machines. An unauthenticated remote attacker can exploit the flaw to read arbitrary contents from the server's heap memory (the region of a process's memory used for dynamic allocation — where runtime data like API keys, user sessions, and in-flight model inputs are temporarily stored) and silently exfiltrate that data to an attacker-controlled server using Ollama's own registry API. There is no patch available as of this writing, and approximately 300,000 Ollama instances are exposed on the public internet.
Security researcher Jeremy Brown discovered CVE-2026-5757 using AI-assisted vulnerability research techniques. The CERT Coordination Center (CERT/CC, the federally funded research and development center at Carnegie Mellon University that coordinates disclosure of software vulnerabilities) has been unable to reach Ollama's vendor to coordinate a fix, leaving the vulnerability open.
CVE-2026-5757: Technical Details
The vulnerability lives in Ollama's GGUF quantization engine. GGUF (GPT-Generated Unified Format — a binary format for storing and distributing large language model weights, widely used with open-source models like Llama, Mistral, and Phi) model files contain a header section that declares tensor metadata: the names, data types, and element counts of the model's numerical weight arrays. Ollama's quantization engine (the component that converts model weights between different precision formats to reduce memory usage) trusts the element counts declared in the GGUF file header without verifying them against the actual size of the data provided in the file body.
An attacker can craft a malicious GGUF file that declares a tensor element count far larger than the actual data in the file. When Ollama's quantization engine processes this file, it reads beyond the end of the legitimate tensor data into adjacent heap memory — a classic out-of-bounds read. That out-of-bounds heap data does not simply cause a crash; it is processed and written into a newly created model layer stored on the server's filesystem.
The attacker then uses Ollama's built-in registry API (the interface that allows users to push and pull model files to and from registries, similar to how Docker images are pushed to a container registry) to push this new "model layer" — which now contains a portion of the server's heap memory — to an attacker-controlled registry server. The exfiltration happens over a legitimate application protocol, using Ollama's own functionality, making it difficult to distinguish from normal model management traffic.
The practical impact is significant: whatever was in the Ollama server's heap memory at the time of the request can be captured. Depending on the server's workload, that memory may contain:
- API keys and authentication tokens for third-party services called by the LLM or its tools
- Private user prompts and conversation data being processed in memory
- Proprietary model inputs or outputs
- Internal configuration data, database connection strings, or environment variables loaded by the process
The attack requires no authentication and no user interaction. An attacker with a network path to the Ollama API port (default: TCP 11434) can trigger the vulnerability by uploading the crafted GGUF file and initiating the quantization process through the API.
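A quick way to check whether a given host exposes the API at all is to query Ollama's unauthenticated `/api/version` endpoint (the target host below is a placeholder; substitute the address you want to test):

```shell
# Probe a host for an exposed Ollama API. A JSON "version" response means
# the API is reachable without authentication; otherwise report unreachable.
HOST="${1:-127.0.0.1}"
curl -s --max-time 5 "http://${HOST}:11434/api/version" \
  || echo "Ollama API not reachable on ${HOST}:11434"
```

Any JSON response here means an unauthenticated client can reach the API, and therefore the vulnerable upload/quantization path.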
The CERT/CC advisory is listed as VU#518910. SecurityWeek has named the vulnerability "Bleeding Llama," referencing the memory-reading technique that mirrors the Heartbleed vulnerability in OpenSSL (which allowed attackers to read server memory by exploiting a missing bounds check in the TLS heartbeat extension — a flaw disclosed in 2014, at a time when OpenSSL secured roughly two-thirds of the web's servers).
Exploitation Status and Threat Landscape
Active exploitation of CVE-2026-5757 has been observed. Honeypot data indicates that scanning for exposed Ollama instances with model upload functionality enabled began in late April 2026, consistent with the timing of the vulnerability's disclosure. There is no CISA KEV listing at this time, and no specific threat actor group has been publicly attributed to the active exploitation activity.
The lack of a vendor patch makes the threat landscape unusually straightforward: there is no "unpatched systems" population to contrast with a patched one — all Ollama deployments running with model upload enabled are currently vulnerable. The 300,000 figure for internet-exposed Ollama instances comes from internet-wide scanning data. Many of these instances are developer workstations, research environments, and enterprise AI deployment nodes that expose the API port without authentication (Ollama does not enable authentication by default in its standard configuration).
The mitigations described below are the only available controls until a vendor patch is released. In MITRE ATT&CK terms, the activity maps most closely to T1005 (Data from Local System) for the heap read, combined with T1041 (Exfiltration Over C2 Channel) for the outbound transfer using the Ollama registry API as the exfiltration pathway.
Who Is Affected
Any deployment of Ollama that:
- Exposes the Ollama API (default port TCP 11434) to a network — including directly to the internet or to untrusted internal segments
- Has model upload functionality enabled (enabled by default)
…is potentially vulnerable. This includes:
- Developer workstations running Ollama locally with the API bound to all interfaces
- Enterprise Ollama servers used for shared LLM inference
- Cloud-deployed Ollama instances (AWS, GCP, Azure, bare-metal) where the API port is reachable
- Self-hosted AI assistants and RAG (Retrieval-Augmented Generation — an architecture where an LLM is connected to an external knowledge base to answer questions about proprietary data) pipelines built on top of Ollama
Deployments where the Ollama API is bound exclusively to localhost (127.0.0.1) and not exposed to any network interface are not directly exploitable from outside the host, though an attacker with local access could still exploit the vulnerability.
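To confirm which interface a local instance is actually bound to, inspect the listening socket (a sketch using `ss`; a local address of `0.0.0.0` or `*` means all interfaces, while `127.0.0.1` means loopback only):

```shell
# Print the local address:port the Ollama API is listening on.
# ss -ltn lists listening TCP sockets; field 4 is the local address.
ss -ltn | awk '$4 ~ /:11434$/ {print $4}'
```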
What You Should Do Right Now
- Restrict network access to the Ollama API port immediately. If the Ollama API (TCP 11434) is exposed to the internet or to untrusted network segments, block or restrict access at the firewall or network layer. Ollama should only be accessible from trusted hosts or networks.
```bash
# Block external access to the Ollama port with iptables
iptables -A INPUT -p tcp --dport 11434 ! -s 127.0.0.1 -j DROP
```
- Disable model upload functionality if not required. The exploitation path requires the ability to upload a GGUF file via the API. If your deployment only uses model downloads (pull) and inference, disable or restrict the push/upload endpoint through an API gateway or reverse proxy.
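Once the push endpoint is denied at a reverse proxy or gateway, you can verify from an external host that uploads are rejected (the hostname below is a placeholder; a `403` or `404` status means the endpoint is blocked, and `000` means no connection was possible at all):

```shell
# Check whether the model push endpoint is reachable from this host.
# Replace HOST with your Ollama server's address.
HOST="${1:-ollama.example.internal}"
curl -s -o /dev/null -w "%{http_code}\n" --max-time 5 \
  -X POST "http://${HOST}:11434/api/push" \
  -H "Content-Type: application/json" \
  -d '{"model":"test/probe:latest"}'
```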
- Bind Ollama to localhost only. If your Ollama instance is running as a local service with no need for remote access, ensure it listens only on the loopback interface:
```bash
# In your Ollama service configuration or environment
OLLAMA_HOST=127.0.0.1 ollama serve
```
Edit your systemd service file or startup script to enforce this binding.
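For systemd-managed installs, one way to make the loopback binding persistent is a drop-in override (a sketch assuming the stock `ollama.service` unit name; adjust if your distribution packages it differently):

```shell
# Create a systemd drop-in that pins Ollama to the loopback interface
sudo mkdir -p /etc/systemd/system/ollama.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_HOST=127.0.0.1"
EOF
sudo systemctl daemon-reload
sudo systemctl restart ollama
```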
- Monitor Ollama API logs for anomalous model upload activity. Look for unexpected `POST /api/push` or model upload requests from unfamiliar hosts:
```bash
journalctl -u ollama --since "2026-04-25" | grep -i "push\|upload\|model"
```
- Rotate any secrets that may have been in memory. If your Ollama server processes requests that involve API keys, authentication tokens, or other secrets (for example, if your LLM uses tools that make API calls), rotate those credentials now. Treat the server's memory as potentially compromised if it was reachable from an untrusted network while running the vulnerable version.
- Monitor the Ollama GitHub repository and CERT/CC VU#518910 for patch availability. When a fix is released, apply it immediately. Subscribe to Ollama's release notifications to ensure you are alerted without delay.
Background: Understanding the Risk
The Bleeding Llama vulnerability emerges at a moment when Ollama has become the de facto standard for self-hosted LLM deployment. Its ease of use — a single command to pull and run open-source models like Meta's Llama series, Mistral, Google's Gemma, and dozens of others — has driven rapid adoption across the full spectrum of users, from individual developers to enterprise teams building internal AI tools.
That same ease of use has created a large population of deployments that may not have been configured with security in mind. Ollama's default configuration binds its API to all network interfaces (0.0.0.0), not just localhost, and offers no built-in authentication mechanism. This means that in its out-of-the-box state, any host on the same network — or any host on the internet if a firewall exception exists — can interact with the Ollama API as if it were a fully authorized user.
Security research on AI infrastructure has consistently found that the rapid deployment cadence of LLM tooling outpaces security hardening. A scan of over one million exposed AI services in 2026 found widespread misconfigurations across Ollama, Jupyter notebooks, and other LLM infrastructure. The Bleeding Llama vulnerability adds a critical code-level flaw on top of the configuration-level exposure that research has documented.
The GGUF format itself introduces a novel attack surface that has not historically been scrutinized to the same degree as image formats (which have a long history of parser vulnerabilities) or document formats. As GGUF becomes the standard distribution format for open-source LLMs, it is likely that additional parser vulnerabilities in GGUF implementations will be discovered. This pattern — a new format trusted by widely-deployed software, which accepts user-supplied files without strict validation — is a recurring source of critical vulnerabilities across security history.
The Ollama maintainers' failure to respond to CERT/CC's disclosure attempts is a procedural concern that affects the entire user community: without a vendor patch path, the only available mitigations are defensive configuration changes rather than a code-level fix. Organizations and developers running Ollama should treat this as an ongoing exposure until an official patch is confirmed.
Conclusion
CVE-2026-5757 (Bleeding Llama) is a critical, unpatched heap memory exfiltration vulnerability in Ollama affecting an estimated 300,000 internet-exposed deployments. An unauthenticated attacker can use a crafted GGUF file to read the server's heap memory and exfiltrate API keys, private data, and other sensitive contents using Ollama's own registry API. Until a patch is available, restrict network access to the Ollama API port, disable model upload functionality, and bind the service to localhost. Treat any externally accessible Ollama deployment as potentially compromised and rotate associated secrets.
For any queries, contact us at contact@cipherssecurity.com

