Pencheff

AI security · AI Security

AI Security overview

Test AI products before attackers do: prompt attacks, tool abuse, data leakage, unsafe output, guardrail bypass, multi-agent workflows, and runtime policy enforcement.

AI security coverage tests LLM endpoints, chatbots, RAG workflows, tool-calling agents, memory, connectors, runtime guardrails, and policy controls against realistic adversarial prompts and workflows.

Surface routerunified
8coverage areas
5operator steps
4evidence fields
Coverage8
Execution5
Evidence4
Controls4
URLRepoAPIAICloud

One queue routes every target kind into a single normalized findings and evidence model.

ScopeLLM and agentic systems
SectionAI Security
MethodDeterministic-first
OutputUnified evidence
ProfileAI security
01

Coverage

What does AI Security overview test?

  • Red team models, agents, tools, and guardrails
  • Test AI products before attackers do: prompt attacks, tool abuse, data leakage, unsafe output, guardrail bypass, multi-agent workflows, and runtime policy enforcement.
  • Dropdown section: LLM and agentic systems.
  • OWASP LLM Top 10 coverage for prompt injection, sensitive information disclosure, supply chain, data leakage, plugins, agency, overreliance, and model theft.
  • Jailbreak strategies, roleplay, encoding, payload splitting, multilingual variants, custom datasets, and judge-backed scoring.
  • Agentic tests for tool authorization, memory poisoning, context exfiltration, planner hijacking, and unsafe side effects.
  • Sentry runtime guardrails, HTTP sidecars, LiteLLM plugins, MCP middleware, PII, secrets, unsafe HTML, and tool authorization checks.
  • AI governance mapping to OWASP LLM, MITRE ATLAS, NIST AI RMF, EU AI Act, ISO/IEC 42001, GDPR, and SOC 2.
02

Execution

How does Pencheff run this?

  • Register an LLM endpoint, chatbot, model gateway, MCP host, or agent workflow.
  • Choose built-in categories, datasets, guardrails, custom prompts, and optional judge settings.
  • Run adversarial campaigns across prompt, tool, memory, retrieval, output, and policy paths.
  • Classify failures by category, strategy, severity, transcript, token cost, and guardrail recommendation.
  • Turn passing and failing prompts into regression suites for releases and model upgrades.
03

Evidence

What evidence does this produce?

  • Prompt, response, tool call, policy decision, transcript, category, strategy, judge result, and confidence.
  • Recommended guardrails with exact unsafe behavior, enforcement point, and regression prompt.
  • Token usage, model/provider metadata, retry behavior, and cost-oriented observability.
  • Governance mappings for AI risk, safety, privacy, and compliance programs.
04

Controls

How is this kept safe to run?

  • Tests can be run through HTTP, chat-completions, LiteLLM, MCP, or custom adapters.
  • Guardrail recommendations stay tied to the scan that exposed the failure.
  • Agentic testing focuses on authorization, context boundaries, and side-effect control.
  • Runtime policy checks can be placed before prompts, after responses, or around tools.

Documentation

Read the full reference.

?

FAQ

Common questions

What is LLM red teaming?
LLM red teaming is adversarial testing of large language model applications — probing for prompt injection, jailbreaks, data exfiltration, insecure output handling, and other vulnerabilities listed in the OWASP LLM Top 10.
Does Pencheff cover the OWASP LLM Top 10?
Yes. Pencheff maps all AI security findings to the OWASP LLM Top 10 (2025) and MITRE ATLAS, producing audit-ready evidence for each category tested.
Can Pencheff test agentic AI systems?
Yes. Pencheff assesses agentic AI workflows — including tool-calling agents, multi-step pipelines, and AI assistants — for prompt injection, privilege escalation, and unintended action execution.
How is AI security testing different from DAST?
DAST targets web applications with injection and access-control probes. AI security testing targets LLM inference endpoints with adversarial prompts, jailbreak attempts, and input/output validation checks specific to generative model behaviour.