Run web, API, code, dependency, cloud, AI, and internal-network assessments from one queue with unified findings, evidence, remediation, and audit output.
AI security
LLM red team
OWASP LLM Top 10 attack modules with jailbreak corpora, judges, and token accounting.
Findings, reports, dashboards, exports, integrations, and retests all read from the same normalized record.
Pencheff favors repeatable checks, then uses AI for triage, enrichment, orchestration, and remediation where it adds signal.
Coverage
What does LLM red team test?
- OWASP LLM Top 10 attack modules with jailbreak corpora, judges, and token accounting.
- This page is part of Platform under AI Security.
- It links back into the broader a complete adversarial security platform experience.
- OWASP LLM Top 10 coverage for prompt injection, sensitive information disclosure, supply chain, data leakage, plugins, agency, overreliance, and model theft.
- Jailbreak strategies, roleplay, encoding, payload splitting, multilingual variants, custom datasets, and judge-backed scoring.
- Agentic tests for tool authorization, memory poisoning, context exfiltration, planner hijacking, and unsafe side effects.
- Sentry runtime guardrails, HTTP sidecars, LiteLLM plugins, MCP middleware, PII, secrets, unsafe HTML, and tool authorization checks.
- AI governance mapping to OWASP LLM, MITRE ATLAS, NIST AI RMF, EU AI Act, ISO/IEC 42001, GDPR, and SOC 2.
Execution
How does Pencheff run this?
- Register an LLM endpoint, chatbot, model gateway, MCP host, or agent workflow.
- Choose built-in categories, datasets, guardrails, custom prompts, and optional judge settings.
- Run adversarial campaigns across prompt, tool, memory, retrieval, output, and policy paths.
- Classify failures by category, strategy, severity, transcript, token cost, and guardrail recommendation.
- Turn passing and failing prompts into regression suites for releases and model upgrades.
Evidence
What evidence does this produce?
- Prompt, response, tool call, policy decision, transcript, category, strategy, judge result, and confidence.
- Recommended guardrails with exact unsafe behavior, enforcement point, and regression prompt.
- Token usage, model/provider metadata, retry behavior, and cost-oriented observability.
- Governance mappings for AI risk, safety, privacy, and compliance programs.
Controls
How is this kept safe to run?
- Tests can be run through HTTP, chat-completions, LiteLLM, MCP, or custom adapters.
- Guardrail recommendations stay tied to the scan that exposed the failure.
- Agentic testing focuses on authorization, context boundaries, and side-effect control.
- Runtime policy checks can be placed before prompts, after responses, or around tools.
Documentation
Read the full reference.
References
Authoritative sources
FAQ
Common questions
- What is an LLM red team assessment?
- An LLM red team assessment systematically probes a large language model application for security vulnerabilities — including prompt injection, jailbreaks, data extraction, insecure output handling, and supply-chain risks — using adversarial attack strategies aligned with OWASP LLM Top 10.
- What attack strategies does Pencheff use for LLM red teaming?
- Pencheff uses multi-turn Crescendo attacks, PAIR (Prompt Automatic Iterative Refinement), TAP, GOAT, Hydra, and attacker-LLM synthesis — automatically generating and iterating adversarial prompts across thousands of turns to find exploitable model behaviours.
- Which LLM providers and deployment modes does Pencheff support?
- Pencheff supports OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure OpenAI, Mistral, and any OpenAI-compatible endpoint. It connects via direct API, proxy, or custom HTTP transport with configurable rate limits and cost ceilings.
- How does Pencheff grade LLM security findings?
- Each test turn is graded by an independent LLM-as-judge that evaluates whether the model's response constitutes a security failure. Results are classified by OWASP LLM Top 10 category and severity, with full prompt/response evidence included in the report.
Related