Test AI products before attackers do: prompt attacks, tool abuse, data leakage, unsafe output, guardrail bypass, multi-agent workflows, and runtime policy enforcement.
AI security
Sentry runtime guardrail
Policy checks for prompts, responses, tools, HTML, secrets, PII, and unsafe actions.
Findings, reports, dashboards, exports, integrations, and retests all read from the same normalized record.
Pencheff favors repeatable checks, then uses AI for triage, enrichment, orchestration, and remediation where it adds signal.
Coverage
What does Sentry runtime guardrail test?
- Policy checks for prompts, responses, tools, HTML, secrets, PII, and unsafe actions.
- This page is part of AI Security under Guardrails.
- It links back into the broader red team models, agents, tools, and guardrails experience.
- OWASP LLM Top 10 coverage for prompt injection, sensitive information disclosure, supply chain, data leakage, plugins, agency, overreliance, and model theft.
- Jailbreak strategies, roleplay, encoding, payload splitting, multilingual variants, custom datasets, and judge-backed scoring.
- Agentic tests for tool authorization, memory poisoning, context exfiltration, planner hijacking, and unsafe side effects.
- Sentry runtime guardrails, HTTP sidecars, LiteLLM plugins, MCP middleware, PII, secrets, unsafe HTML, and tool authorization checks.
- AI governance mapping to OWASP LLM, MITRE ATLAS, NIST AI RMF, EU AI Act, ISO/IEC 42001, GDPR, and SOC 2.
Execution
How does Pencheff run this?
- Register an LLM endpoint, chatbot, model gateway, MCP host, or agent workflow.
- Choose built-in categories, datasets, guardrails, custom prompts, and optional judge settings.
- Run adversarial campaigns across prompt, tool, memory, retrieval, output, and policy paths.
- Classify failures by category, strategy, severity, transcript, token cost, and guardrail recommendation.
- Turn passing and failing prompts into regression suites for releases and model upgrades.
Evidence
What evidence does this produce?
- Prompt, response, tool call, policy decision, transcript, category, strategy, judge result, and confidence.
- Recommended guardrails with exact unsafe behavior, enforcement point, and regression prompt.
- Token usage, model/provider metadata, retry behavior, and cost-oriented observability.
- Governance mappings for AI risk, safety, privacy, and compliance programs.
Controls
How is this kept safe to run?
- Tests can be run through HTTP, chat-completions, LiteLLM, MCP, or custom adapters.
- Guardrail recommendations stay tied to the scan that exposed the failure.
- Agentic testing focuses on authorization, context boundaries, and side-effect control.
- Runtime policy checks can be placed before prompts, after responses, or around tools.
Documentation
Read the full reference.
References
Authoritative sources
FAQ
Common questions
- What is a runtime AI guardrail?
- A runtime guardrail is an inline security layer that inspects every prompt and model response in real time — blocking prompt injections, policy violations, PII leakage, and toxic output before they affect the user or the application's downstream actions.
- How does Pencheff Sentry work as a guardrail?
- Sentry sits between your application and the LLM as a proxy or sidecar. Every incoming prompt and outgoing response passes through a configurable detector chain — checking for injection patterns, forbidden topics, PII, excessive permissions, and policy violations — before the content reaches its destination.
- Does the Sentry guardrail add latency to LLM responses?
- Sentry's detector chain is optimised for sub-10ms overhead on most checks. Expensive multi-classifier evaluations can be run asynchronously in monitoring mode, so you can observe policy violations without adding synchronous latency to the user experience.
- How does Sentry integrate into an existing LLM application?
- Sentry integrates as an OpenAI-compatible proxy (point your SDK at the Sentry endpoint), a LiteLLM plugin, or a sidecar container. No application code changes are required for basic guardrail enforcement.
Related