By Lyrie Threat Intelligence · May 12, 2026

Bleeding Llama: 300K Ollama Servers Exposed to Memory Leaks — CVE-2026-7482

TL;DR

A critical out-of-bounds read vulnerability in Ollama (CVE-2026-7482, CVSS 9.1) allows unauthenticated remote attackers to leak entire process memory from exposed AI inference servers. With 300K+ Ollama instances globally, this exposes API keys, system prompts, LLM conversation data, and proprietary model weights—especially dangerous for autonomous agents integrated with AI platforms.

What Happened

On May 10, 2026, researchers at Cyera disclosed a heap out-of-bounds read vulnerability in Ollama, the popular open-source framework for running large language models (LLMs) locally. The flaw, codenamed "Bleeding Llama," affects all versions before 0.17.1 and has been assigned CVE-2026-7482 with a critical CVSS score of 9.1.

The vulnerability resides in Ollama's GGUF (GPT-Generated Unified Format) model loader, specifically in the /api/create endpoint. An attacker can craft a malicious GGUF file whose declared tensor offsets and sizes exceed the file's actual length, triggering an out-of-bounds heap read during model quantization. The flaw stems from use of Go's unsafe package during model creation without bounds validation, which bypasses the language's memory safety guarantees.
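
For intuition, the sketch below shows the tensor metadata a GGUF header declares, loosely following the public GGUF specification; the type and field names are illustrative rather than Ollama's actual code. Every field is attacker-controlled, and pre-patch nothing tied the declared dimensions or offset back to the file's real length.

    // Illustrative sketch of GGUF tensor metadata, loosely following the
    // public GGUF spec. Names are hypothetical, not Ollama's actual types.
    package ggufsketch

    type tensorInfo struct {
        Name   string   // tensor identifier
        NDims  uint32   // number of dimensions
        Dims   []uint64 // per-dimension sizes; inflating these inflates the
                        // computed tensor byte size past the file's real length
        Type   uint32   // ggml quantization type
        Offset uint64   // byte offset into the tensor data region; also
                        // attacker-controlled and, pre-patch, unvalidated
    }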

Ollama boasts over 171,000 GitHub stars and has been downloaded millions of times. It's become a standard platform for developers, researchers, and enterprises deploying local AI inference—making this vulnerability a supply-chain risk for any organization relying on it.

Technical Details

The Attack Chain (3 Steps)

1. Craft a malicious GGUF file with an inflated tensor shape and submit it to an exposed Ollama server via HTTP POST to /api/create (a delivery sketch follows this list).

2. Trigger the out-of-bounds read during model creation, which leaks arbitrary heap memory into the model artifact.

3. Exfiltrate the leaked data by uploading the resulting model through /api/push to an attacker-controlled registry.
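
The sketch below illustrates the delivery in step 1. The target host is hypothetical, the request fields follow Ollama's documented create API but vary across versions (treat them as assumptions), and construction of the malicious GGUF itself is deliberately omitted.

    // Hypothetical delivery of step 1: submitting a crafted model definition
    // to an exposed instance. Host and request fields are assumptions; the
    // malicious GGUF construction is deliberately omitted.
    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
    )

    func main() {
        target := "http://victim.example:11434" // hypothetical exposed server
        payload, err := json.Marshal(map[string]string{
            "model":     "leak-probe",
            "modelfile": "FROM ./crafted.gguf", // GGUF with inflated tensor metadata
        })
        if err != nil {
            panic(err)
        }
        resp, err := http.Post(target+"/api/create", "application/json", bytes.NewReader(payload))
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        fmt.Println("create status:", resp.Status)
        // Steps 2 and 3: the out-of-bounds read happens server-side during
        // quantization, and the poisoned model artifact is then exfiltrated
        // via /api/push to an attacker-controlled registry.
    }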

What Gets Leaked

The leaked process memory can contain:

  • Environment variables (often containing secrets)
  • API keys (OpenAI, Anthropic, cloud provider credentials)
  • System prompts (proprietary LLM instructions and jailbreak defenses)
  • Conversation data from concurrent users
  • Model weights or partial model parameters
  • Database credentials if Ollama is integrated with backend services

The Memory Safety Bug

The vulnerable code in fs/ggml/gguf.go and server/quantization.go uses Go's unsafe package to manage tensor data. When a crafted file declares a tensor at offset X with size Y, but the file is actually shorter, the code reads past the allocated buffer boundary without bounds checking. This is a classic out-of-bounds read leading to information disclosure.
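
In simplified form, the bug class looks like the sketch below; the function names are hypothetical and this is not the project's actual code, only the pattern of an unchecked unsafe read placed next to its bounds-checked fix.

    // Simplified illustration of the bug class; hypothetical names, not the
    // project's actual code.
    package ggufsketch

    import (
        "fmt"
        "unsafe"
    )

    // vulnerableRead trusts the header-declared offset and size, so a region
    // extending past the file buffer returns adjacent heap memory.
    func vulnerableRead(file []byte, offset, size uint64) []byte {
        base := unsafe.Pointer(unsafe.SliceData(file))
        return unsafe.Slice((*byte)(unsafe.Add(base, offset)), size)
    }

    // boundedRead validates the declared region against the real file length
    // (including integer overflow of offset+size) before slicing.
    func boundedRead(file []byte, offset, size uint64) ([]byte, error) {
        end := offset + size
        if end < offset || end > uint64(len(file)) {
            return nil, fmt.Errorf("tensor region %d..%d exceeds file size %d", offset, end, len(file))
        }
        return file[offset:end], nil
    }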

Lyrie Assessment

Why CISOs Should Care

1. Autonomous Agent Exposure: Any AI agent (like Claude Code, custom orchestrators, or Lyrie itself if Ollama-integrated) running on a network-accessible Ollama instance becomes a high-value target. Attackers can steal the agent's API keys, prompts, and internal tool definitions.

2. Supply-Chain Cascade: Organizations running Ollama don't just lose their own data—they compromise upstream APIs and services. A stolen OpenAI/Anthropic key from an Ollama heap leak can be leveraged for further attacks across your SaaS stack.

3. LLM Jailbreak & Defense Evasion: System prompts, safety guidelines, and custom instructions are often stored in Ollama process memory. Leaking these reveals your defenses and allows adversaries to craft better jailbreak attacks on future interactions.

4. 300K Server Attack Surface: With 300,000+ Ollama instances globally (many exposed to the internet for convenience), this is one of the largest AI infrastructure vulnerabilities ever disclosed. Mass exploitation is feasible.

5. Autonomous Defense Blind Spot: Many organizations deploying Lyrie or competing autonomous defense tools haven't hardened their local LLM infrastructure. Ollama often gets deployed with default settings—no authentication, no network isolation, no monitoring.

The Autonomous Defense Angle

Attackers no longer need to compromise your firewall or cloud APIs. They can exploit Ollama instances, steal API keys to your cloud defenses, and then use those credentials to disable alerts or exfiltrate more data. This is a direct path to evading autonomous security systems.

Recommended Actions

Immediate (Today):

  • Audit all Ollama deployments: ps aux | grep ollama and check network bindings (an exposure-probe sketch follows this list).
  • Isolate any internet-exposed Ollama instances behind a firewall immediately.
  • Rotate all API keys and credentials that may have been in Ollama process memory (especially OpenAI, Anthropic, cloud provider tokens).
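
For the audit step, a minimal Go probe can flag exposed instances: Ollama's documented /api/tags endpoint lists installed models without credentials, so an unauthenticated 200 response on port 11434 is a reliable exposure signal. The host list below is a placeholder for your own inventory.

    // Quick exposure probe: an unauthenticated 200 from /api/tags means the
    // instance is reachable without credentials.
    package main

    import (
        "fmt"
        "net/http"
        "time"
    )

    func main() {
        hosts := []string{"10.0.0.5", "10.0.0.6"} // replace with your inventory
        client := &http.Client{Timeout: 3 * time.Second}
        for _, h := range hosts {
            resp, err := client.Get(fmt.Sprintf("http://%s:11434/api/tags", h))
            if err != nil {
                fmt.Printf("%s: unreachable (%v)\n", h, err)
                continue
            }
            resp.Body.Close()
            if resp.StatusCode == http.StatusOK {
                fmt.Printf("%s: EXPOSED - unauthenticated Ollama API\n", h)
            } else {
                fmt.Printf("%s: responded %s\n", h, resp.Status)
            }
        }
    }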

Short-term (This Week):

  • Update to Ollama 0.17.1 or later.
  • Deploy an authentication proxy or API gateway in front of all Ollama instances; the native REST API has no authentication (a minimal proxy sketch follows this list).
  • Enable network access logging and alert on /api/create requests with large tensor shapes.
  • Restrict Ollama to localhost (127.0.0.1:11434) unless absolutely required for distributed inference.
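
As a sketch of the proxy recommendation, Go's standard library is enough for a minimal version, assuming Ollama itself stays bound to 127.0.0.1:11434 behind it. The OLLAMA_PROXY_TOKEN variable and listen port are illustrative; a production gateway would add TLS, rate limiting, and proper credential management.

    // Minimal authenticating reverse proxy in front of a loopback-only Ollama.
    // Token handling and port are illustrative, not a production gateway.
    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
        "os"
    )

    func main() {
        upstream, err := url.Parse("http://127.0.0.1:11434") // Ollama stays loopback-only
        if err != nil {
            log.Fatal(err)
        }
        proxy := httputil.NewSingleHostReverseProxy(upstream)
        token := os.Getenv("OLLAMA_PROXY_TOKEN") // hypothetical shared secret

        handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if token == "" || r.Header.Get("Authorization") != "Bearer "+token {
                http.Error(w, "unauthorized", http.StatusUnauthorized)
                return
            }
            if r.URL.Path == "/api/create" {
                // Flag model-creation traffic (step 1 of the attack chain) for review.
                log.Printf("audit: /api/create from %s", r.RemoteAddr)
            }
            proxy.ServeHTTP(w, r)
        })
        log.Fatal(http.ListenAndServe(":8443", handler))
    }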

Long-term:

  • Implement secrets rotation policies for all AI inference infrastructure.
  • Segregate Ollama instances in a dedicated network segment with egress filtering.
  • Monitor for indicators of compromise: outbound uploads to unknown registries, unusual heap memory access patterns.
  • Evaluate air-gapping Ollama if it contains proprietary models or sensitive prompts.

Related Vulnerabilities in Ollama

Concurrently, researchers at Striga disclosed two unpatched flaws in Ollama's Windows auto-update mechanism (CVE-2026-42248, CVE-2026-42249) that can chain into persistent code execution at login. These remain unfixed in versions 0.12.10–0.22.0 and highlight systemic security gaps in Ollama's architecture.

Sources

1. Cyera Research: Bleeding Llama – Critical Memory Leak in Ollama

2. The Hacker News: Ollama Out-of-Bounds Read Vulnerability

3. CVE-2026-7482 – CVE Details

4. JazzCyberShield: Bleeding Llama – 300K Servers at Risk

5. Striga Research: Ollama Windows RCE via Auto-Update


Lyrie.ai Cyber Research Division

Lyrie Verdict

Lyrie's autonomous defense layer flags this class of exposure the moment it surfaces — no signature update required.