An agent that forgets its instructions should not delete your inbox

Prompt-only safety rules can fail for email agents and other long-running LLM workflows. A model can summarize away instructions, misread constraints, or follow new content that changes the task. AgentTrust ID keeps session mode, scope, and approval state outside the model context and checks them before a tool call reaches Gmail, Slack, a database, or an MCP server.

TechCrunch covered a claimed OpenClaw inbox incident in February 2026 and noted that it could not independently verify the report. Business Insider also covered the claim. The lesson is narrower than the headline: a delete rule belongs in server-side policy, not model memory.

Context compaction

Context compaction shortens older conversation state so an agent can keep working inside a fixed context window. The summary may keep the goal and drop a safety condition.

That matters when the safety condition says "ask before deleting." If the rule lives only in the prompt, the model has to remember it every time it selects a tool.
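
A toy sketch makes the failure concrete. Real compaction is done by a model, not by string slicing, so the compact function below is a hypothetical stand-in that only shows the shape of the problem: the goal survives and the condition does not.

```python
# Toy illustration only. A real agent summarizes with a model; this
# hypothetical compact() just keeps the first sentence of the oldest message.
def compact(messages, keep_last=2):
    """Replace older messages with a one-line summary of the original goal."""
    if len(messages) <= keep_last:
        return list(messages)
    older, recent = messages[:-keep_last], messages[-keep_last:]
    # Only the first sentence of the first message makes it into the summary.
    summary = "Summary: " + older[0].split(".")[0] + "."
    return [summary] + recent


history = [
    "Clean up my inbox. Ask before deleting anything.",
    "Found 240 newsletters.",
    "Found 35 receipts.",
    "Archived 12 duplicates.",
]

compacted = compact(history)
rule_survived = any("Ask before deleting" in m for m in compacted)

print(compacted)
print("rule survived:", rule_survived)
```

After compaction the context still says to clean up the inbox, but nothing in it mentions asking first.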

How the failure happens

An inbox agent often needs several tools:

gmail.search to find messages
gmail.read to inspect messages
gmail.archive to move messages
gmail.bulk_delete to remove messages

Those tools do not carry the same risk. Search and read are review actions. Archive changes state. Bulk delete can destroy data.
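
One way to write that distinction down is a small lookup table. This is a minimal sketch, not the AgentTrust ID data model: the Effect enum, the TOOL_EFFECTS table, and the choice to treat unknown tools as destructive are all assumptions made for illustration.

```python
from enum import Enum


class Effect(Enum):
    READ = "read"
    MUTATING = "mutating"
    DESTRUCTIVE = "destructive"


# Hypothetical classification of the four tools listed above.
TOOL_EFFECTS = {
    "gmail.search": Effect.READ,
    "gmail.read": Effect.READ,
    "gmail.archive": Effect.MUTATING,
    "gmail.bulk_delete": Effect.DESTRUCTIVE,
}


def classify(tool_name):
    # Assumption: a tool nobody classified is treated as the riskiest class.
    return TOOL_EFFECTS.get(tool_name, Effect.DESTRUCTIVE)


def needs_approval(tool_name):
    # Only read-classified tools run without an approval step.
    return classify(tool_name) is not Effect.READ


for name in ("gmail.read", "gmail.archive", "gmail.bulk_delete", "gmail.unknown"):
    print(name, classify(name).value, needs_approval(name))
```

Defaulting unknown tools to the strictest class means a newly added tool is blocked until someone classifies it.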

A long task can start as review and drift into cleanup. The model may still produce a plausible explanation for each step, so its reasoning is not a safeguard. The wrapper around the agent's tool calls needs a separate authorization check before the delete tool runs.

Python example

This example assumes the MCP server is registered with default_mode="read_only" and includes gmail.read and gmail.bulk_delete in its scope ceiling. The session allows read actions immediately. A destructive action returns an elevation request before execution.

from agenttrustid import AgentTrustClient

client = AgentTrustClient.from_env()

agent = client.agents.create(
    name="inbox-triage",
    framework="custom",
    capabilities=["gmail.search", "gmail.read", "gmail.bulk_delete"],
)

# Use the server ID returned by MCP server registration.
server_id = "mcp-server-id-from-dashboard"
session = client.sessions.init_session(
    agent_id=agent.id,
    server_id=server_id,
)

# Read-classified action: expected to pass in a read-only session.
read = client.actions.check(
    agent_id=agent.id,
    session_id=session.session_id,
    tool_name="gmail.read",
    tool_input_summary="Read messages selected for inbox review.",
    action_effect="read",
)

# Destructive action: expected to be held for approval, not executed.
delete = client.actions.check(
    agent_id=agent.id,
    session_id=session.session_id,
    tool_name="gmail.bulk_delete",
    tool_input_summary="Delete messages selected by the model.",
    action_effect="destructive",
)

assert read.allowed is True
assert delete.allowed is False

# Hand the approval ID to an operator instead of calling the mailbox API.
if delete.elevation_required:
    print(f"Approval required before delete: {delete.approval_id}")

The wrapper should stop when allowed is false. It should send the approval request to an operator path, not call the mailbox API and hope the model was right.
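
A wrapper along those lines might look like the sketch below. It does not use the SDK: Decision is a hypothetical stand-in that mirrors the three fields read in the example above (allowed, elevation_required, approval_id), and run_tool, ToolBlocked, and the notify_operator callback are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    # Hypothetical stand-in for the result of an authorization check.
    allowed: bool
    elevation_required: bool = False
    approval_id: Optional[str] = None


class ToolBlocked(Exception):
    """Raised when a tool call is stopped before it reaches the mailbox."""


def run_tool(decision, call_tool, notify_operator):
    if decision.allowed:
        return call_tool()
    # Not allowed: route the approval request to a human, never call the tool.
    if decision.elevation_required and decision.approval_id:
        notify_operator(decision.approval_id)
    raise ToolBlocked("tool call blocked by policy")


operator_queue = []
mailbox_calls = []

try:
    run_tool(
        Decision(allowed=False, elevation_required=True, approval_id="appr-123"),
        call_tool=lambda: mailbox_calls.append("bulk_delete"),
        notify_operator=operator_queue.append,
    )
except ToolBlocked:
    blocked = True
else:
    blocked = False

read_result = run_tool(Decision(allowed=True), lambda: "3 messages", operator_queue.append)
print(blocked, operator_queue, mailbox_calls, read_result)
```

Raising an exception on the blocked path makes it hard for calling code to ignore the decision and continue to the mailbox call.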

Business impact

Email agents should not rely on prompt memory for deletion rules. A deletion mistake can remove customer messages, audit evidence, purchase records, or legal notices.

The first cost is recovery work. The lasting cost is trust. People stop using the agent when they believe a cleanup task can become data loss.

Prevention

Put review work in a read-only session. Classify state-changing tools as mutating or destructive. Require approval before the tool runs.

Limit the approval. A five-minute grant for gmail.bulk_delete should not grant every write tool in the session.
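
A narrow grant can be modeled as a tool name plus an expiry time. The Grant class and both helper functions below are hypothetical, a sketch of the idea and not how AgentTrust ID represents approvals. Times are passed in as plain numbers so the behavior is easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    tool_name: str
    expires_at: float  # seconds, on the same clock as the "now" arguments


def issue_grant(tool_name, now, ttl_seconds=300):
    # Five minutes by default, matching the example in the text.
    return Grant(tool_name=tool_name, expires_at=now + ttl_seconds)


def grant_covers(grant, tool_name, now):
    # The grant applies to one named tool and only until it expires.
    return grant.tool_name == tool_name and now < grant.expires_at


grant = issue_grant("gmail.bulk_delete", now=1000.0)

print(grant_covers(grant, "gmail.bulk_delete", now=1100.0))  # same tool, in window
print(grant_covers(grant, "gmail.archive", now=1100.0))      # different write tool
print(grant_covers(grant, "gmail.bulk_delete", now=1300.0))  # window has closed
```

Because the grant names a single tool, approving one delete does not open the door to archive or any other write tool in the session.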

Keep the scope ceiling honest. If an inbox review agent should never delete, leave gmail.bulk_delete out of the server scope.

Solving this with AgentTrust ID

AgentTrust ID stores the session mode, scope ceiling, and approval state outside the model context. A read-only session allows read-classified actions through the Guardian path. Mutating and destructive actions return elevation_required with an approval_id.

Approved elevation is time-boxed and per action. The code caps elevation at five minutes and checks that the requested action fits the session scope ceiling. Admin-classified actions cannot be elevated through this path.
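
Those three rules can be expressed as a short check that runs before an approval request is even shown to an operator. This is an illustrative sketch under stated assumptions, not the service's implementation: elevation_window, MAX_ELEVATION_SECONDS, and the string labels for action classes are hypothetical names.

```python
MAX_ELEVATION_SECONDS = 300  # five-minute cap described above


def elevation_window(tool_name, effect, requested_seconds, scope_ceiling):
    """Return (seconds, reason). Zero seconds means the request is refused."""
    if effect == "admin":
        return 0, "admin actions cannot be elevated"
    if tool_name not in scope_ceiling:
        return 0, "tool is outside the session scope ceiling"
    # Longer requests are trimmed to the cap, not rejected.
    return min(requested_seconds, MAX_ELEVATION_SECONDS), "ok"


ceiling = {"gmail.search", "gmail.read", "gmail.bulk_delete"}

print(elevation_window("gmail.bulk_delete", "destructive", 3600, ceiling))
print(elevation_window("gmail.archive", "mutating", 60, ceiling))
print(elevation_window("gmail.settings", "admin", 60, ceiling))
```

A tool left out of the scope ceiling is refused here regardless of who approves, which is why trimming the ceiling is the stronger control.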

Context compaction can change what the model remembers. It does not change the session mode, the server tools, or the approval record.

To add approval checks around agent tools, start with the SDK guide. To talk through read-only sessions for your workflow, join the waitlist.