The proliferation of AI Agents is creating a “Vulnerability Gold Rush.” While developers are racing to ship features using LangChain, LiteLLM, and the new Claude/OpenAI SDKs, the underlying libraries are evolving so fast that security patches are frequently entangled with massive breaking changes.
For an organization running dozens of agents, this isn’t just a maintenance headache — it’s a critical security gap. Here is a technical deep dive into why manual patching is failing and how automated remediation is the only path forward.
The 200-Hour CVE: Why Your AI Agents Are Bankrupting Your Sprint Cycles
The software supply chain for AI is fundamentally different from traditional web development. In traditional stacks, a library might see a major breaking change once every few years. In the AI world, breaking changes are a weekly event.
1. The Numbers: CVEs vs. Release Velocity
Data from late 2025 and early 2026 shows a staggering trend. Popular AI libraries are being hit with high-severity CVEs (such as Remote Code Execution and Secret Extraction) while simultaneously releasing hundreds of versions.
The Maintenance Trap: LiteLLM has released over 1,000 versions in a relatively short lifespan. If your security team flags a vulnerability in version 1.44.5, but the fix is in 1.81.3, you aren't just changing a version number; you are likely traversing several breaking changes in how callbacks or model parameters are handled.
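Even triaging that gap can be scripted. A minimal sketch (the version numbers are the illustrative ones from above; real-world version strings may need a full PEP 440 parser rather than this simplified tuple comparison):

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Parse a dotted version string like '1.44.5' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))


def needs_upgrade(installed: str, first_fixed: str) -> bool:
    """True when the installed version predates the first patched release."""
    return parse_version(installed) < parse_version(first_fixed)


print(needs_upgrade("1.44.5", "1.81.3"))  # True: dozens of releases behind the fix
```

Run against a lockfile, a check like this tells you *that* you must jump versions; it says nothing about the breaking changes hiding in between, which is where the real cost lives.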
2. The Scaling Crisis in Large Orgs
When a large organization has 50+ AI Agents in production, a single “Critical” CVE in LangChain doesn’t just mean one PR. It means:
- Finding every instance where the vulnerable `loads()` or `BaseAgent` is used.
- Testing if the new `allowed_objects` security parameter breaks existing tool-calling logic.
- Refactoring deprecated import paths (e.g., moving from `langchain.agents` to `langchain_classic`).
3. Real-World Case Study: The “Secret Stealing” Remediation
A real example we handled at Backline involved CVE-2025-68664. It is a serialization injection issue in `dumps()`/`dumpd()` where attacker-controlled metadata could persist and later be interpreted during `load()`/`loads()`.
The fix required upgrading to LangChain 0.3.81 or 1.2.5, which introduced a breaking change: `loads()` now defaults to a restrictive allowlist, custom objects often require explicit configuration, and `secrets_from_env` now defaults to `False`.
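The mechanics of an allowlist gate are easy to internalize with a toy model. A minimal, LangChain-free sketch (the class names and payload shape are illustrative, not the library's actual wire format):

```python
import json


class Tool:
    def __init__(self, name: str):
        self.name = name


def safe_loads(payload: str, allowed_objects: list[type]):
    """Reconstruct an object only when its class is explicitly allow-listed."""
    allowed = {cls.__name__: cls for cls in allowed_objects}
    data = json.loads(payload)
    if data["class"] not in allowed:
        raise ValueError(f"refusing non-allow-listed class: {data['class']}")
    return allowed[data["class"]](**data["kwargs"])


tool = safe_loads('{"class": "Tool", "kwargs": {"name": "search"}}', [Tool])
print(tool.name)  # search

try:
    safe_loads('{"class": "OSCommand", "kwargs": {"cmd": "env"}}', [Tool])
except ValueError as err:
    print(err)  # refusing non-allow-listed class: OSCommand
```

The breaking-change pain follows directly from this design: every call site must enumerate exactly the classes it legitimately needs, and anything missing from that list fails loudly at runtime.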
The Git Diff: Manual vs. Backline Auto-Remediation
Here is how the code looked before and after Backline’s remediation agent stepped in.
- Before (Vulnerable – LangChain < 0.3.81)
```python
from langchain.load import dumps, loads
from langchain.agents import AgentExecutor
from langchain.tools import Tool


class AgentStateManager:
    """Manages agent state persistence for multi-turn conversations."""

    def __init__(self, redis_client):
        self.redis = redis_client

    def save_agent_state(self, session_id: str, agent: AgentExecutor):
        # Pre-patch risk:
        # dumps()/dumpd() did not properly escape dictionaries containing
        # the reserved 'lc' key used for LangChain object metadata.
        # Attacker-controlled data could survive serialization as a
        # pseudo-LangChain object.
        serialized = dumps(agent)
        self.redis.set(f"agent:{session_id}", serialized)

    def restore_agent_state(self, session_id: str) -> AgentExecutor:
        serialized = self.redis.get(f"agent:{session_id}")
        # Risky pattern:
        # loads() may deserialize attacker-influenced structures if
        # unsafe data was serialized earlier.
        # Older versions also defaulted secrets_from_env=True.
        return loads(serialized)

    def load_tool_config(self, config_json: str) -> list[Tool]:
        # Deserializing user-controlled JSON without restrictions
        # increases exposure to object-injection issues.
        return loads(config_json)
```
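To see why the reserved `'lc'` key matters, here is a toy, LangChain-free illustration (not the real wire format): a deserializer that uses a reserved key to mark "trusted framework object" cannot distinguish its own objects from attacker data that happens to contain the same key.

```python
import json

RESERVED_KEY = "lc"  # toy marker for "this is one of the framework's own objects"


def toy_loads(serialized: str) -> dict:
    data = json.loads(serialized)
    if isinstance(data, dict) and data.get(RESERVED_KEY) == 1:
        # Trusted path: treated as a framework object and "instantiated".
        return {"instantiated": data["type"], "payload": data["payload"]}
    return {"plain_data": data}


# Attacker-controlled *data* that merely contains the reserved key survives a
# round trip and comes back looking like a framework object:
attacker_value = {"lc": 1, "type": "SecretLoader", "payload": {"env": "OPENAI_API_KEY"}}
print(toy_loads(json.dumps(attacker_value))["instantiated"])  # SecretLoader
```

The patched `dumps()` closes this class of bug by escaping the reserved key on the way in, so plain data can no longer masquerade as an object on the way out.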
- After (Fixed – LangChain >= 0.3.81)
```python
from langchain.load import dumps, loads
from langchain.agents import AgentExecutor
from langchain.tools import Tool
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory

# Explicit allow-list of deserializable classes
ALLOWED_OBJECTS = [
    AgentExecutor,
    Tool,
    ChatOpenAI,
    ConversationBufferMemory,
]


class AgentStateManager:
    """Manages agent state persistence for multi-turn conversations."""

    def __init__(self, redis_client):
        self.redis = redis_client

    def save_agent_state(self, session_id: str, agent: AgentExecutor):
        # dumps() now properly escapes reserved keys
        serialized = dumps(agent)
        self.redis.set(f"agent:{session_id}", serialized)

    def restore_agent_state(self, session_id: str) -> AgentExecutor:
        serialized = self.redis.get(f"agent:{session_id}")
        # Hardened deserialization:
        # - allowed_objects restricts what can be instantiated
        # - secrets_from_env now defaults to False (explicit here)
        return loads(
            serialized,
            allowed_objects=ALLOWED_OBJECTS,
            secrets_from_env=False,
        )

    def load_tool_config(self, config_json: str) -> list[Tool]:
        # Narrow allow-list for user-provided configs
        return loads(
            config_json,
            allowed_objects=[Tool],
            secrets_from_env=False,
        )
```
Why This Doesn’t Scale: The “Manual Tax” on Innovation
The diff shown above seems simple for a single file, but in a production environment with dozens of agents, it becomes a logistical nightmare. The challenge at scale is that every `loads()` call across 50+ agents needs a surgical approach.
When a vulnerability like that hits, a manual response requires:
- Audit: You have to find every `loads()` invocation. In a medium-sized codebase, a simple `grep` can easily return 200+ occurrences across various utility folders and agent logic.
- Classify: Not every call is the same. You must determine which LangChain types (e.g., `ChatMessage`, `SystemMessage`, `CustomToolOutput`) each specific call site legitimately needs to support.
- Refactor: You must add the `allowed_objects` allow-list unique to each call site. A "blanket" allow-list defeats the security purpose, while one that is too restrictive breaks production.
- Test: You have to verify that no legitimate deserialization is now blocked. One missed class name in that list, and your agent crashes mid-conversation for a user.
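The test step itself can be mechanized: record representative serialized payloads per call site as fixtures, then check each site's allow-list against them after the upgrade. A minimal sketch (the fixture shape is an assumption for illustration, not a LangChain format):

```python
def blocked_classes(fixtures: list[dict], allowed: set[str]) -> list[str]:
    """Classes the recorded payloads need but the allow-list would reject."""
    needed = {fx["class"] for fx in fixtures}
    return sorted(needed - allowed)


# Payload shapes recorded from production traffic for one call site (illustrative):
fixtures = [
    {"class": "Tool", "kwargs": {"name": "search"}},
    {"class": "ChatOpenAI", "kwargs": {"model": "gpt-4o"}},
    {"class": "ConversationBufferMemory", "kwargs": {}},
]

print(blocked_classes(fixtures, {"Tool", "ChatOpenAI"}))
# ['ConversationBufferMemory'] -> this allow-list would crash in production
print(blocked_classes(fixtures, {"Tool", "ChatOpenAI", "ConversationBufferMemory"}))
# [] -> safe to ship
```

A check like this turns "did we miss a class?" from a production incident into a failing CI assertion, but it still has to be wired up once per call site, which is exactly the per-site labor the next section quantifies.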
Why Manual Fixing Won’t Hold
If a developer spends just 4 hours per agent refactoring these “security-driven breaking changes,” an organization with 50 agents loses 200 engineering hours for a single CVE. By the time those PRs are merged, three new CVEs have likely been issued for other parts of the stack.
The Conclusion is Clear: We are using AI to write code faster than ever. We cannot expect humans to secure that code at the same speed. To secure the AI Agent revolution, we must use Security Agents that understand code semantics.
At Backline, we automate this “Research -> Refactor -> Test” loop. Our agents don’t just find the vulnerability; they understand the context of your specific loads() calls and perform the surgical refactoring required. This keeps your agents secure without your developers becoming full-time librarians for SDK changelogs.