AI agents are starting to operate real infrastructure. They can delete servers, change permissions, and move data across cloud environments. As these systems become more autonomous, mistakes or unexpected behavior can cause immediate operational problems.
Most current safeguards rely on prompts, internal model reasoning, or simple permission controls. These approaches can fail if an agent hallucinates, is misconfigured, or is manipulated through prompt injection. They also struggle to detect situations where several individually safe actions combine into a dangerous sequence.
Agent Sentinel explores a different approach. It introduces an external safety gateway that sits between AI agents and infrastructure APIs. Instead of relying on the agent to regulate itself, every proposed action is evaluated before execution.
The system converts human safety instructions into structured policies and evaluates agent actions against those policies. It also uses AI-assisted risk scoring and sequence analysis to detect potentially dangerous patterns of behavior.
Based on this evaluation, the gateway returns one of three decisions: allow the action, block it, or require human approval.
The goal of this project is to build and document a working prototype showing how AI-driven infrastructure automation can be made safer using an out-of-band safety boundary.
Prototype Repo: https://github.com/indranimaz23-oss/agent-sentinel
This project will build a working prototype of Agent Sentinel, an out-of-band safety gateway that evaluates infrastructure actions proposed by AI agents before they are executed.
The prototype will focus on three core capabilities.
Operators often describe safety rules in natural language. For example: “Do not delete production storage unless emergency approval is provided.”
Agent Sentinel converts these instructions into structured policies that can be evaluated automatically against agent actions.
This creates a clear bridge between human operational intent and machine-enforced safety rules.
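As a rough illustration of what a compiled policy might look like, here is a minimal Python sketch. The field names, the `Policy` dataclass, and the `matches` helper are illustrative assumptions, not the project's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Policy:
    """A structured safety rule compiled from a natural-language instruction.
    All field names here are hypothetical, for illustration only."""
    action: str          # e.g. "delete"
    resource_type: str   # e.g. "storage"
    environment: str     # e.g. "production"
    effect: str          # "block" or "require_approval"
    exceptions: list = field(default_factory=list)  # conditions that lift the rule

# "Do not delete production storage unless emergency approval is provided"
# might compile to:
policy = Policy(
    action="delete",
    resource_type="storage",
    environment="production",
    effect="block",
    exceptions=["emergency_approval"],
)

def matches(policy: Policy, proposed_action: dict) -> bool:
    """Check whether a proposed agent action falls under this policy."""
    return (
        proposed_action["action"] == policy.action
        and proposed_action["resource_type"] == policy.resource_type
        and proposed_action["environment"] == policy.environment
    )
```

The key design point is that the compiled rule is plain structured data, so it can be evaluated deterministically at the gateway without consulting the agent.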
When an AI agent proposes an infrastructure action, the gateway evaluates the request using multiple signals:
structured policy rules
contextual metadata (environment, resource type, action type)
AI-assisted risk scoring for potentially destructive actions
This risk modeling layer helps detect situations where an action may technically be allowed but operationally dangerous.
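One simple way to combine these signals is a weighted score over action type and environment. The weights below are placeholder assumptions for the sketch; a real engine would tune them empirically and fold in the AI-assisted scoring as an additional signal:

```python
# Hypothetical base risk per action type (0..1); illustrative values only.
DESTRUCTIVE_ACTIONS = {"delete": 0.9, "modify_permissions": 0.6, "move_data": 0.5}

# Hypothetical environment multipliers; production weighs heaviest.
ENVIRONMENT_WEIGHT = {"production": 1.0, "staging": 0.5, "dev": 0.2}

def risk_score(proposed_action: dict) -> float:
    """Combine action type and environment into a rough 0..1 risk estimate."""
    base = DESTRUCTIVE_ACTIONS.get(proposed_action["action"], 0.1)
    env = ENVIRONMENT_WEIGHT.get(proposed_action["environment"], 0.3)
    return min(1.0, base * env)
```

For example, deleting a production resource would score far higher than the same deletion in a dev environment, even when neither is explicitly forbidden by a policy.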
Many infrastructure failures occur when several individually safe actions are combined in sequence. The prototype will implement sequence-aware analysis that tracks agent activity across multiple steps and evaluates whether a sequence of actions creates elevated risk.
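A minimal version of such sequence tracking could keep a sliding window of recent actions and check it against known risky patterns. The pattern below (permission change, then data movement, then deletion) is a hypothetical example, not a catalogued signature:

```python
from collections import deque

# Hypothetical risky pattern: individually benign steps that together
# resemble data exfiltration followed by cleanup.
RISKY_SEQUENCES = [
    ("modify_permissions", "move_data", "delete"),
]

class SequenceMonitor:
    """Track recent agent actions and flag known risky multi-step patterns."""

    def __init__(self, window: int = 5):
        self.history = deque(maxlen=window)

    def observe(self, action_type: str) -> bool:
        """Record one action; return True if a risky sequence just completed."""
        self.history.append(action_type)
        h = list(self.history)
        for pattern in RISKY_SEQUENCES:
            n = len(pattern)
            # The pattern fires when it appears as the most recent n actions.
            if tuple(h[-n:]) == pattern:
                return True
        return False
```

A production version would likely need fuzzier matching (interleaved benign actions, per-resource tracking), but the windowed-history structure is the core idea.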
After evaluation, the gateway returns one of three outcomes:
Allow the action
Block the action
Require human approval
This allows automation to proceed when safe while preventing destructive or uncertain operations.
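The three outcomes above can be sketched as a small decision function. The 0.7 approval threshold and the argument names are assumptions chosen for illustration:

```python
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    REQUIRE_APPROVAL = "require_approval"

# Hypothetical threshold above which a human must sign off.
APPROVAL_THRESHOLD = 0.7

def decide(risk: float, policy_violation: bool, approval_granted: bool = False) -> Decision:
    """Map a risk score and policy check onto one of the three gateway outcomes."""
    if policy_violation and not approval_granted:
        return Decision.BLOCK
    if risk >= APPROVAL_THRESHOLD:
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW
```

Because the decision is computed outside the agent, a hallucinated or injected instruction cannot talk its way past the gateway; at worst it triggers a block or an approval request.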
The prototype will be implemented as a lightweight interception service that sits between AI agents and cloud infrastructure APIs, such as those for resource management, identity permissions, and data storage operations.
A testing environment will simulate agent-generated infrastructure actions so the system can evaluate realistic scenarios such as resource deletion, permission changes, and data movement.
The result of this project will be a working prototype and documented architecture demonstrating how external safety boundaries can make AI-driven infrastructure automation significantly safer.
This funding will support the continued development and validation of the Agent Sentinel prototype.
The primary use of the funding will be engineering development and system experimentation. The project will expand the current prototype by implementing three core components: a policy compilation layer that converts human safety instructions into machine-readable rules, a risk evaluation engine for infrastructure actions proposed by AI agents, and a sequence-aware analysis module that detects risky patterns across multiple agent actions.
A portion of the funding will also support infrastructure experiments where simulated AI agents propose actions such as deleting resources, modifying permissions, or moving data across cloud environments. These experiments will allow the system to evaluate actions, log decisions, and refine risk scoring and policy evaluation mechanisms.
Another goal of this work is to document the architecture and release a clear technical description of the safety gateway model. The intention is to demonstrate how an external interception layer can provide practical safeguards for AI agents operating real infrastructure.
Cloud infrastructure used for testing is partially supported through credits from AWS Activate. This allows the grant funding to focus primarily on engineering development and prototype validation.
Indrani Mazumdar - Founder & Lead Developer
I am the founder and lead developer of Agent Sentinel. I’m an AI architect and machine learning engineer with a career spent building and securing large-scale systems in operational environments.
Early in my career at Verizon, I worked in cybersecurity building autonomous threat-hunting models to catch anomalous patterns in network activity. That experience taught me that in security-critical environments, you cannot rely on "best-case" behavior; you have to build systems that assume things will go wrong.
More recently, my work has focused on the practical deployment of generative AI, specifically retrieval-augmented generation (RAG) and enterprise-grade guardrails. I’ve seen firsthand how unpredictable AI can be when it interacts with real-world operational data.
I'm building Agent Sentinel because I’ve spent years fixing "broken" infrastructure manually in enterprise environments. I know exactly how a single misconfigured script or a misunderstood command can take down a production system. As we give AI agents the keys to our cloud infrastructure, we are introducing a risk profile that current tools aren't ready for. I’d rather build the "emergency brake" now than wait to see it fail in production.
I’ve already begun early prototype development and infrastructure experimentation, supported in part through credits from AWS Activate. I am currently an adjunct researcher, and my goal with Agent Sentinel is to bridge the gap between AI safety research and the practical reality of cloud engineering.
As the project progresses, the goal is to expand collaboration with researchers and engineers interested in practical AI safety systems for autonomous infrastructure operations.
One possible challenge is that accurately modeling the risk of infrastructure actions may be more difficult than anticipated. AI agents can generate a wide range of actions across many environments, and defining policies that generalize well across these situations may require more iteration and experimentation than expected.
Another potential difficulty is sequence analysis. While individual actions may appear safe, identifying risky multi-step patterns requires collecting sufficient examples of agent behavior and refining the logic used to detect dangerous sequences.
There is also a possibility that existing cloud permission systems already provide enough safeguards for some environments, reducing the perceived need for an additional safety gateway.
If the project does not fully succeed in building a robust interception system, the work will still produce useful outcomes. The prototype and documentation will help clarify where current safety mechanisms for AI agents are insufficient and what types of guardrails are most effective.
Even a partial result would contribute to the broader understanding of how autonomous systems should interact with critical infrastructure and where additional safety controls are required.
No external funding has been raised for this project in the last 12 months. Early prototype development has been supported through cloud infrastructure credits provided by AWS Activate.