# Context Engineering Arsenal

Your AI agents don't have a capability problem. They have a memory problem — and this is the system that fixes it.
```
┌─[ THE PROBLEM ]──────────────────────────────────────────┐
│                                                          │
│ Sessions reset. Decisions vanish. Context rots.          │
│                                                          │
│ You re-explain the same thing every session. Your        │
│ agent confidently does the wrong thing — because it      │
│ lost the context that would have stopped it.             │
│                                                          │
│ 76% of loaded context gets ignored. (CL-Bench)           │
│ You're not under-prompting. You're over-loading.         │
│                                                          │
└──────────────────────────────────────────────────────────┘
```
```
┌─[ WHAT CHANGES ]─────────────────────────────────────────┐
│                                                          │
│ A complete context engineering operating system.         │
│ Research-backed. Battle-tested on 150+ agent systems.    │
│ Every framework traces to a published source.            │
│                                                          │
└──────────────────────────────────────────────────────────┘
```
## By the Numbers
| Category | Count | Details |
|---|---|---|
| Agents | 13 | Specialized roles from orchestration to threat analysis |
| Skills | 10 | Standalone commands you can run immediately |
| Workflows | 6 | Multi-agent pipelines for complex operations |
| Frameworks | 50+ | Embedded decision models, rubrics, and protocols |
| Research Sources | 14 | Peer-reviewed papers, production case studies, expert analysis |
| Anti-Patterns | 33 | Named failure modes with detection and remediation |
| Rubric Criteria | 30 | Profile-adaptive health scoring (Solo: 7, Medium: 22, Enterprise: 30) |
| CELF Layers | 8 | L0 Constitution through L7 Delegation |
| Epistemic Tiers | 6 | AXIOM through SPECULATION classification |
| Context Pathologies | 4 | Research-validated LLM failure modes |
| Compression Methods | 3 | Ordered by information loss |
| Templates | 5 | Ready-to-use CLAUDE.md, BRAIN.yaml, STATE.yaml |
| Audit Scripts | 2 | Python-based diagnostics you can run standalone |
## Quick Start

```bash
# Install
squads install context-engineering

# Your first command — run a health check
ce audit

# Scaffold a new project's context architecture
ce scaffold

# Design a payload for an LLM task
ce payload "Analyze legal contracts for hidden risks"

# Plan a sprint's context strategy
ce sprint-start
```
## Onboarding Gradient
| Level | What To Try | Agents Involved |
|---|---|---|
| Simple | ce audit — instant health diagnostic | ContextAuditor |
| Medium | ce scaffold — build your project's context architecture | ContextChief + LayerArchitect |
| Advanced | ce sprint-start — full sprint lifecycle with state persistence | StateArchitect + TokenArchitect + ContextAuditor |
## How To Use — Manual

```
┌─[ CONTEXT SCENARIOS ]────────────────────────────────────┐
│                                                          │
│ Start from the problem, not the feature.                 │
│                                                          │
└──────────────────────────────────────────────────────────┘
```
| Situation | Run This | What Happens |
|---|---|---|
| "My agent keeps forgetting decisions" | ce audit + ce scaffold | StateArchitect builds persistence layer |
| "I'm starting a new project" | ce scaffold | LayerArchitect sets up 8-layer CELF structure |
| "My context window is bloating" | *smart-compact | Compression Trilogy: Offload > Truncate > Summarize |
| "I need to feed a long document to an agent" | ce ingest | ETLEngineer runs 5-stage pipeline with So-What Gate |
| "How healthy is my project's context?" | *five-vitals | 5 structural signals, scored 0-10, graded A-F |
| "I want to design a prompt for a specific task" | ce payload | PayloadForge applies 7 cognitive strategies |
| "I don't know where to start" | *wizard | 5-question guided builder. No experience needed |
| "Something feels wrong but I can't name it" | *diagnose | Layer-by-layer CELF scan, scores 0-3 per layer |
| "I'm designing a multi-agent system" | ce blueprint | AgentDesigner + ThreatSentinel design + stress-test |
| "Is my documentation still accurate?" | *doc-rot | Finds decay. Wrong docs are worse than no docs |
| "How many tokens does my boot cost?" | ce boot-audit | Maps everything loading before your first prompt |
| "Which model should handle this task?" | ce route | Cognitive Scoring Matrix routes to cheapest sufficient model |
| "Where are my tokens being wasted?" | ce profile | 80/20 analysis: find the 20% consuming 80% of budget |
| "Should this file load every turn?" | *optimize-injection | Break-even analysis: persistent vs on-demand |
| "Best way to compress this context?" | *bench | Side-by-side: summarize vs truncate vs distill |
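The 80/20 analysis behind `ce profile` can be sketched in a few lines. This is a minimal illustration, not the squad's implementation: the sample file contents and the rough 4-characters-per-token heuristic are assumptions.

```python
# Sketch: find the smallest set of files consuming ~80% of the token budget.
# Assumes ~4 characters per token — a rough heuristic, not a real tokenizer.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def top_consumers(files: dict[str, str], threshold: float = 0.8) -> list[str]:
    """Return the heaviest files until `threshold` of total cost is covered."""
    costs = {name: estimate_tokens(body) for name, body in files.items()}
    total = sum(costs.values())
    heavy, running = [], 0
    for name, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        heavy.append(name)
        running += cost
        if running / total >= threshold:
            break
    return heavy

# Illustrative project: one file dominates the budget.
files = {
    "CLAUDE.md": "x" * 8000,
    "STATE.yaml": "x" * 1200,
    "notes.md": "x" * 400,
}
print(top_consumers(files))  # → ['CLAUDE.md']
```

One oversized file covering 80% of the budget on its own is exactly the situation `ce profile` is built to surface.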
## Agents

```
┌─[ THE SQUAD ]────────────────────────────────────────────┐
│                                                          │
│ 13 specialists. Each one does one thing well.            │
│ Depth = frameworks, techniques, and rubrics embedded.    │
│                                                          │
└──────────────────────────────────────────────────────────┘
```
| Agent | Role | Key Capability | Depth |
|---|---|---|---|
| ContextChief | Orchestrator | Routes requests, coordinates workflows, health dashboard | Standard |
| PayloadForge | Payload Designer | 7 cognitive strategies, wizard mode, cultural adaptation | Deep |
| LayerArchitect | Architecture Specialist | 8-layer CELF framework, scaffolding, epistemic classification | Deep |
| ContextAuditor | Health Auditor | 30-criterion rubric, 6 diagnostic skills, boot audit, forensic analysis | Deep |
| StateArchitect | State Engineer | Sprint lifecycles, decision preservation, smart compression | Standard |
| TokenArchitect | Token Economist | Cognitive scoring matrix, cost projection, burn rate tracking | Deep |
| ThreatSentinel | Threat Analyst | 4 pathologies, 5 threats, surgical context repair | Standard |
| ETLEngineer | Ingestion Specialist | 5-stage pipeline, semantic chunking, So-What Gate | Standard |
| AgentDesigner | System Architect | Multi-agent patterns, delegation packages, quality checklist | Standard |
| MetaAgent | Evolution Engine | Self-audit, knowledge curation, calibration, evolution planning | Standard |
| TokenProfiler | Cost Profiler | Per-file token cost, 80/20 analysis, ROI per line | Standard |
| BootAuditor | Boot Inspector | Boot loading map, traffic-light scoring, profile targets | Standard |
| ContextRouter | Routing Strategist | Cognitive Scoring Matrix (extended), cascade design, cost calculators | Deep |
Depth key: Standard = focused competency with clear boundaries. Deep = multiple embedded frameworks, rubrics, or decision matrices that compound during execution.
<details>
<summary><strong>Skills (10)</strong></summary>

| Skill | Command | What It Does |
|---|---|---|
| Five Vitals | *five-vitals | 5 structural health signals, scored 0-10, graded A-F |
| Doc Rot | *doc-rot | Finds documentation decay for deletion (wrong docs > no docs) |
| Epistemic Audit | *epistemic-audit | Validates epistemic coherence (AXIOM through SPECULATION) |
| Diagnose | *diagnose | Layer-by-layer CELF health scan, scores 0-3 per layer |
| Validate | *validate | 30-criterion pass/fail rubric, profile-adaptive (Solo: 7, Medium: 22, Enterprise: 30) |
| Smart Compact | *smart-compact | Compression Trilogy: Offload > Truncate > Summarize |
| Context Surgeon | *context-surgeon | Surgical repair of detected context pathologies |
| Wizard | *wizard | 5-question guided payload builder for beginners |
| Injection Optimizer | *optimize-injection | Persistent vs on-demand loading strategy with MVC formula |
| Compression Bench | *bench | Benchmark compression methods with quality preservation scoring |
</details>
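The persistent-vs-on-demand decision behind `*optimize-injection` can be sketched with a toy cost model. The linear model and the per-load overhead constant below are assumptions for illustration, not the skill's actual MVC formula:

```python
# Sketch: break-even between persistent injection (pay every turn)
# and on-demand loading (pay only on turns that need the file).

def persistent_cost(tokens: int, turns: int) -> int:
    return tokens * turns  # loaded on every turn, no exceptions

def on_demand_cost(tokens: int, turns: int, hit_rate: float,
                   overhead: int = 50) -> float:
    # Loaded only when needed, plus a small per-load retrieval overhead
    # (the 50-token overhead is an illustrative assumption).
    loads = turns * hit_rate
    return loads * (tokens + overhead)

def should_persist(tokens: int, turns: int, hit_rate: float) -> bool:
    return persistent_cost(tokens, turns) <= on_demand_cost(tokens, turns, hit_rate)

# A 2,000-token file needed on only 30% of turns: cheaper on demand.
print(should_persist(2000, turns=100, hit_rate=0.3))  # → False
```

The intuition the skill formalizes: the higher the hit rate, the more persistent loading pays off; low-hit-rate files belong behind an on-demand fetch.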
<details>
<summary><strong>Workflows (6)</strong></summary>

| Workflow | Solves | Agents | Pipeline |
|---|---|---|---|
| full-payload | "I need to give an agent the right context for a task" | Chief + PayloadForge + ThreatSentinel | Brief > Strategy Selection > Payload Assembly > Threat Scan |
| project-scaffold | "I'm starting from zero and need structure" | Chief + LayerArchitect + ContextAuditor | Survey > CELF Mapping > Scaffold > Validation |
| full-audit | "Something feels off and I need a diagnosis" | ContextAuditor + ThreatSentinel | 5 Diagnostics > Pathology Scan > Prioritized Report |
| sprint-lifecycle | "I need context management across a multi-day sprint" | StateArchitect + TokenArchitect + ContextAuditor | Inject > Plan > Execute > Persist > Compact > Validate |
| dense-ingestion | "I have a long document or transcript to process" | ETLEngineer + ContextAuditor | Extract > Transform > Load > So-What Gate > Quality Check |
| agent-blueprint | "I'm designing a new agent or multi-agent system" | Chief + AgentDesigner + ThreatSentinel | Requirements > Pattern Match > Design > Stress Test |
</details>
<details>
<summary><strong>Key Frameworks (10 embedded)</strong></summary>

| Framework | What It Solves |
|---|---|
| 8-Layer CELF | Where does each piece of context live? (L0 Constitution through L7 Delegation) |
| Epistemic Classification | How reliable is this information? (AXIOM > FACT > EVIDENCE > HEURISTIC > INFERENCE > SPECULATION) |
| 7 Cognitive Strategies | Which approach for this LLM task? (Zero-Shot through Multi-Agent) |
| 4 Context Pathologies | What's going wrong? (Poisoning, Distraction, Confusion, Clash) |
| Compression Trilogy | How to shrink context safely? (Offload > Truncate > Summarize) |
| Cognitive Scoring Matrix | Which model for this task? (2-axis scoring: Cognition x Consequence) |
| MVC Formula | What context does this agent need? (Essential / Helpful / Noise) |
| 33 Anti-Patterns | What mistakes to avoid? (Context Dump through Compress-and-Forget) |
| 30-Criterion Rubric | How healthy is this project? (Profile-adaptive: Solo 7, Medium 22, Enterprise 30 criteria) |
| 5 Structural Vitals | Is the system architecturally sound? (Tone, Coherence, Density, Clarity, Hygiene) |
</details>
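The two-axis routing idea behind the Cognitive Scoring Matrix can be sketched as follows. The tier names and the 1–3 scales here are illustrative assumptions; the embedded matrix is richer than this:

```python
# Sketch: route by Cognition x Consequence to the cheapest sufficient tier.
# Tier names are placeholders, not real model identifiers.

TIERS = ["small", "medium", "frontier"]

def route(cognition: int, consequence: int) -> str:
    """Score both axes 1-3; the harder axis dominates the routing decision."""
    score = max(cognition, consequence)
    return TIERS[score - 1]

print(route(cognition=1, consequence=1))  # → small    (e.g. reformat a list)
print(route(cognition=3, consequence=2))  # → frontier (e.g. contract analysis)
```

Taking the max of the two axes encodes the core routing rule: a low-cognition task with high consequences still deserves a stronger model.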
<details>
<summary><strong>Templates (5 included)</strong></summary>

Ready-to-use templates for common context artifacts:
- CLAUDE.md — Solo (~50 lines), Medium (~120 lines), Enterprise (~180 lines)
- BRAIN.yaml — Minimal, Standard, Full (knowledge graph entry points)
- STATE.yaml — Minimal, Standard (with DECISIONS.md template)
</details>
<details>
<summary><strong>Scripts (2 Python)</strong></summary>

```bash
python scripts/diagnose.py /path/to/project   # Layer health scanner
python scripts/validate.py /path/to/project   # Profile-adaptive rubric (7-30 criteria)
```
</details>
<details>
<summary><strong>Vocabulary</strong></summary>

| Term | Meaning |
|---|---|
| CELF | Context Engineering Layered Framework — 8-layer hierarchy (L0-L7) for organizing AI context |
| MVC | Minimum Viable Context — the least context needed for an agent to perform a task well |
| Epistemic Status | How reliable a piece of information is: AXIOM > FACT > EVIDENCE > HEURISTIC > INFERENCE > SPECULATION |
| Context Pathology | A research-validated failure mode of how LLMs process context (Poisoning, Distraction, Confusion, Clash) |
| Compression Trilogy | Three-step protocol for reducing context: Offload (zero loss) > Truncate (low loss) > Summarize (last resort) |
| So-What Gate | Quality filter: every extracted item must answer "So what?", "What action?", "Who does it?" or get discarded |
| CL-Bench | Research benchmark showing 76% of loaded context is ignored by models |
| ACE Cycle | Adaptive Context Evolution — Generate > Reflect > Curate loop for maintaining context quality |
| MemGPT | Research architecture for tiered AI memory: Working Memory > Short-term > Long-term |
| Sprint Blueprint | A plan defining what context loads when, token budget allocation, and execution sequence |
| Prompt Delta | Post-execution improvement artifact: what to observe, hypotheses, modifications for next iteration |
| Cognitive Scoring Matrix | 2-axis framework (Cognition x Consequence) for routing tasks to the right model tier |
</details>
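The Compression Trilogy's ordering can be sketched as a single function that only escalates to the next, lossier step when the cheaper one fails. The item shape and the placeholder summary step are illustrative assumptions, not the `*smart-compact` implementation:

```python
# Sketch of the Compression Trilogy: Offload (zero loss) > Truncate
# (low loss) > Summarize (last resort).

def compress(items: list[dict], budget: int) -> list[dict]:
    def cost(xs):
        return sum(x["tokens"] for x in xs)

    # 1. Offload: move cold reference material out of the window (zero loss).
    kept = [x for x in items if x["hot"]]
    if cost(kept) <= budget:
        return kept

    # 2. Truncate: drop the oldest hot items first (low loss).
    kept.sort(key=lambda x: x["age"])          # newest first (smaller age)
    while len(kept) > 1 and cost(kept) > budget:
        kept.pop()                             # discard the oldest

    # 3. Summarize: only if truncation still overshoots (lossy).
    if cost(kept) > budget:
        kept = [{"hot": True, "age": 0, "tokens": budget,
                 "text": "summary of remaining context"}]
    return kept

items = [
    {"hot": False, "age": 9, "tokens": 500, "text": "old reference doc"},
    {"hot": True,  "age": 5, "tokens": 400, "text": "earlier discussion"},
    {"hot": True,  "age": 2, "tokens": 300, "text": "current decision log"},
]
print([x["text"] for x in compress(items, budget=400)])  # → ['current decision log']
```

Note the ordering is the whole point: offloading the cold reference doc costs nothing, so it always runs before anything destructive.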
## Research Foundation
This squad distills intelligence from 14 sources. Not a reading list — each source contributed specific, testable frameworks that are embedded in the agents.
| Source | Contribution |
|---|---|
| Andrej Karpathy | Core framing: context engineering as a discipline, not prompt engineering |
| Gemini 2.5 Pathology Study | 4 validated context failure modes (Poisoning, Distraction, Confusion, Clash) |
| CL-Bench | Quantified the problem: 76% of loaded context gets ignored by models |
| ACE Framework | Adaptive Context Evolution cycle: +17.1% completion rate, -86.9% latency |
| MemGPT | Tiered memory architecture: Working > Short-term > Long-term |
| FrugalGPT | Cost optimization patterns: up to 98% cost reduction via cascading |
| RouteLLM (ICLR 2025) | Model routing heuristics: route to cheapest sufficient model |
| Structured Distillation | 11x compression ratio while preserving semantic fidelity |
| Lost in the Middle | U-curve attention pattern: models ignore information in the middle of context |
| Multi-model synthesis | Cross-validated across Claude, GPT, Gemini, Grok, DeepSeek, Perplexity |
| Expert transcripts | Agent context engineering patterns from YouTube deep-dives |
| Production systems (150+ agents) | Battle-tested patterns from multi-agent orchestration at scale |
| Epistemic classification research | 6-tier reliability framework for information provenance |
| Token economics literature | Cost modeling, burn rate tracking, ROI-per-token analysis |
## Methodology

How this squad was built — not copy-pasted from prompts, but forged through a 5-stage pipeline:

```
RESEARCH ──> EXTRACT ──> MODEL ──> VALIDATE ────> SHIP
14 sources   Frameworks  Agent     Battle-tested  Production
analyzed     isolated    design +  on 150+ agent  artifact
cross-ref'd  and named   workflow  systems        with audit
                         wiring                   scripts
```
- RESEARCH — 14 sources analyzed. Cross-referenced across 6 LLM providers to eliminate single-source bias.
- EXTRACT — Every actionable framework isolated, named, and given clear boundaries. Theory discarded.
- MODEL — Frameworks assigned to specialist agents. Workflows wired for multi-agent coordination.
- VALIDATE — Tested on production systems running 150+ agents. Failure modes cataloged as anti-patterns.
- SHIP — Packaged with audit scripts, templates, onboarding gradient, and vocabulary. Ready to install and run.
## Who This Is For
You build agents that run across multiple sessions. Workflows, autonomous pipelines, AI assistants with ongoing responsibilities. You've hit the wall where your agent stops being reliable the moment a session ends or a context gets compacted.
You don't want another prompt trick. You want the underlying structure that makes context stick: state management, decision persistence, memory protocols, token discipline.
## Who This Is NOT For
You write one-shot prompts. Your use case starts and ends in a single conversation. You want a template library to copy-paste. If your agent doesn't need to remember anything across sessions, there's nothing here you need.
## The Five Laws
- Clarity Over Complexity — Simple instructions beat sophisticated ambiguous ones
- Information Density, Not Volume — 10 perfect chunks > 1000 mediocre ones
- Context Has Cost — Every token loaded is unavailable for reasoning
- Iterative Refinement — Evidence-based optimization beats intuition
- Robustness Through Diversity — Multiple complementary approaches > single method
## What Makes This Different
- System, not snippets. 13 agents that coordinate through 6 workflows. Not a folder of prompts.
- Research-backed. Every framework traces to a published source. 14 total. Zero invented heuristics.
- Failure modes are named. 4 pathologies, 33 anti-patterns, all cataloged. You diagnose, not guess.
- Profile-adaptive. Solo dev? 7-criterion rubric. Enterprise team? 30 criteria. Same squad, different depth.
- Compression is a protocol. Offload > Truncate > Summarize. Three steps, ordered by information loss.
- Tested on 150+ agent systems. Built from production, not theory.
## Value Equation
The cost of bad context is invisible until it isn't.
| Scenario | Cost |
|---|---|
| Agent hallucinates due to stale context, you debug for 2 hours | ~$300-600 in time |
| Re-explaining project context every session, 15 min/workday | ~60 hours/year wasted |
| Wrong model routing on 100 daily tasks at $0.50 overspend each | ~$18,000/year leaked |
| Building these frameworks yourself from the 14 sources | 40-80 hours of research + prompt engineering |
| Hiring a context engineer | $150-300/hr |
This squad: $49.90. One-time. Every project from now on.
The break-even is your first ce audit. One diagnostic run will surface context waste you didn't know existed. The token savings from a single *smart-compact pass typically exceed the purchase price within a week.
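The routing-leak figure above is plain arithmetic over the table's own estimates:

```python
# Reproducing the table's routing-leak estimate. The inputs are the
# table's stated assumptions, not measured values.
daily_tasks = 100
overspend_per_task = 0.50            # dollars lost per mis-routed task
yearly_leak = daily_tasks * overspend_per_task * 365
print(f"${yearly_leak:,.0f}/year")   # → $18,250/year
```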
## Evolution Path

```
v1.0 Foundation     ██████████████████████░░░░  SHIPPED
     Context architecture, auditing, frameworks,
     state management, 19 anti-patterns catalog.
     10 agents. 8 skills. 6 workflows.

v2.0 Performance    ██████████████████████████  NOW
     Token profiling, boot audit, model routing,
     injection optimization, compression benchmarks.
     13 agents. 10 skills. 6 workflows.

v3.0 Orchestration  ░░░░░░░░░░░░░░░░░░░░░░░░░░  NEXT
     Multi-agent delegation packages with scoped
     context payloads. Pipeline routing across model
     tiers. Context leak detection between agents.
     Sprint token budgets with real-time burn tracking.
     Production telemetry: measure what matters,
     kill what doesn't. The performance engine
     becomes self-correcting.
```
· · · · · · · · · · ·
Forged by l0z4n0 | squads.sh
First forged: 2026-03-17 | v2.0: 2026-03-17
<!-- context is fuel. this squad is the refinery. -->