🧪 Testing & Simulation

Test your agents with realistic scenarios, security audits, and sandbox simulation before deploying to production.

Testing Types

🎯 Scenario Testing

Run your agent against specific test scenarios with expected outcomes

📦 Sandbox Simulation

Execute your agent in an isolated environment with mock data

🔒 Security Audit

Check for vulnerabilities, PII exposure, and security issues

⚡ Performance Testing

Measure execution time, memory usage, and resource consumption

Running Tests

1 Basic Test Command

User: @agentos test

Runs the default test suite against your currently open agent code.

2 Test with Scenario

User: @agentos test this agent with high-load scenario

AgentOS: 🧪 Running Test: high-load scenario

Scenario Configuration:
- Concurrent requests: 100
- Duration: 60 seconds
- Data volume: 10,000 records

Test Results:
✅ All 100 concurrent requests completed
✅ Average response time: 245ms
✅ No policy violations detected
✅ Memory usage stayed within limits (85MB peak)

Performance Metrics:
┌─────────────────┬───────────┐
│ Metric          │ Value     │
├─────────────────┼───────────┤
│ Requests/sec    │ 156       │
│ Avg latency     │ 245ms     │
│ P95 latency     │ 412ms     │
│ Error rate      │ 0%        │
│ Peak memory     │ 85MB      │
└─────────────────┴───────────┘

✅ Test Passed - Agent is ready for deployment
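The metrics in the table above can be reproduced from raw latency samples. The following is a minimal sketch of that arithmetic, not AgentOS's actual implementation: `fake_agent_call`, `summarize`, and `run_load` are hypothetical names, and the nearest-rank method for P95 is an assumption, since the page does not say how percentiles are computed.

```python
import asyncio
import time


async def fake_agent_call(i: int) -> None:
    """Hypothetical stand-in for one agent request; swap in a real call."""
    await asyncio.sleep(0.001 * (i % 5))  # simulate variable work


def summarize(latencies_ms, errors, wall_seconds):
    """Turn raw samples into the metrics shown in the results table."""
    total = len(latencies_ms) + errors
    ordered = sorted(latencies_ms)
    if ordered:
        # Nearest-rank P95: smallest sample with at least 95% at or below it.
        rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
        p95 = ordered[rank - 1]
        avg = sum(ordered) / len(ordered)
    else:
        p95 = avg = None
    return {
        "requests_per_sec": total / wall_seconds if wall_seconds > 0 else 0.0,
        "avg_latency_ms": avg,
        "p95_latency_ms": p95,
        "error_rate": errors / total if total else 0.0,
    }


async def run_load(concurrency: int):
    """Fire `concurrency` requests at once and collect per-request latency."""
    latencies = []
    errors = 0

    async def one(i: int) -> None:
        nonlocal errors
        start = time.perf_counter()
        try:
            await fake_agent_call(i)
            latencies.append((time.perf_counter() - start) * 1000)
        except Exception:
            errors += 1  # count failures instead of aborting the run

    wall_start = time.perf_counter()
    await asyncio.gather(*(one(i) for i in range(concurrency)))
    return summarize(latencies, errors, time.perf_counter() - wall_start)


if __name__ == "__main__":
    print(asyncio.run(run_load(100)))
```

Comparing P95 with the average is useful because a healthy average can hide a slow tail of requests.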

Scenario Testing

Test your agent against specific scenarios:

User: @agentos test this agent with malformed-input scenario

✅ Test Passed: Malformed Input Handling

The agent correctly rejected invalid JSON input with a proper error message. It did not crash or behave unexpectedly.
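To make an agent pass this kind of check, its entry point needs to reject bad input with a clear error rather than raise. The sketch below illustrates the idea under stated assumptions: `handle_input`, the `status`/`error` response shape, and the list of malformed cases are all hypothetical and are not the inputs AgentOS itself sends.

```python
import json


def handle_input(raw: str) -> dict:
    """Hypothetical agent entry point: parse a JSON request and validate it."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Reject with a descriptive message instead of crashing.
        return {"status": 400, "error": f"Invalid JSON: {exc.msg}"}
    if not isinstance(payload, dict):
        return {"status": 400, "error": "Request body must be a JSON object"}
    return {"status": 200, "data": payload}


# Illustrative malformed inputs: truncated, empty, non-JSON, wrong type, bad value.
MALFORMED_CASES = ['{"id": 1', "", "not json", "[1, 2, 3]", '{"id": }']


def run_malformed_input_checks() -> list:
    """Return the inputs that were NOT rejected with a 400 and an error message."""
    failures = []
    for raw in MALFORMED_CASES:
        result = handle_input(raw)
        if result["status"] != 400 or not result.get("error"):
            failures.append(raw)
    return failures
```

An empty list from `run_malformed_input_checks()` means every malformed case was rejected cleanly.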

Available Test Scenarios

high-load - Simulates 100 concurrent requests
malformed-input - Tests input validation
api-failure - Simulates external API failures
rate-limiting - Tests rate limit handling
data-edge-cases - Tests handling of edge-case data
timeout - Tests timeout handling
auth-failure - Tests authentication failure handling
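As an illustration of what a scenario such as api-failure exercises, the sketch below pairs a stub that fails a fixed number of times with a simple retry loop. How AgentOS injects failures is not described here, so `FlakyAPI`, `fetch_with_retry`, and the three-attempt limit are assumptions made for the example.

```python
class FlakyAPI:
    """Hypothetical stub that fails a fixed number of times, then succeeds."""

    def __init__(self, failures_before_success: int):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def get_data(self) -> dict:
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("simulated upstream failure")
        return {"records": 50}


def fetch_with_retry(api: FlakyAPI, max_attempts: int = 3) -> dict:
    """Retry the call up to max_attempts times, then report failure cleanly."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return {"ok": True, "attempts": attempt, "data": api.get_data()}
        except ConnectionError as exc:
            last_error = str(exc)  # remember why the last attempt failed
    return {"ok": False, "attempts": max_attempts, "error": last_error}
```

An agent that returns a structured failure after exhausting its retries, instead of raising, is easier to assert against in a scenario test.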

Sandbox Simulation

Run your agent in an isolated environment with mock data:

User: @agentos simulate this agent

AgentOS: 📦 Starting Sandbox Simulation

Sandbox Environment:
- Isolated container: sandbox-7f3a9b
- Mock API endpoints: Active
- Test database: Seeded with 1,000 records

Simulation Log:
[00:00.000] Agent initialized
[00:00.125] Connected to mock API
[00:00.250] Fetched 50 records from database
[00:00.380] Processing record batch #1
[00:01.200] Completed processing 50 records
[00:01.250] Posted results to mock webhook
[00:01.300] Agent completed successfully

Actions Taken:
- 1 API call (GET /data)
- 50 database reads
- 1 webhook POST

Policy Checks:
✅ No unauthorized API calls
✅ Data stayed within sandbox
✅ No PII leaked to logs

✅ Simulation Successful

💡 Tip: Sandbox simulation uses mock endpoints that mimic real APIs. Your actual credentials and production data are never used during simulation.
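The same pattern is easy to approximate in a local unit test: give the agent a fake environment, record every action it takes, and compare the record against an allow-list. This is only a sketch of the idea, not how the AgentOS sandbox is built. The `Sandbox` class, the `agent` function, and the `/webhook` path are hypothetical; `GET /data` is taken from the log above.

```python
from dataclasses import dataclass, field


@dataclass
class Sandbox:
    """Hypothetical mock environment that records every action the agent takes."""
    allowed_calls: set
    records: list
    actions: list = field(default_factory=list)

    def api_get(self, path: str) -> list:
        self.actions.append(("GET", path))
        return self.records  # seeded mock data, never production data

    def webhook_post(self, path: str, body: dict) -> int:
        self.actions.append(("POST", path))
        return 200  # the mock webhook always accepts the payload

    def unauthorized_calls(self) -> list:
        """Policy check: every recorded action must be on the allow-list."""
        return [a for a in self.actions if a not in self.allowed_calls]


def agent(env: Sandbox) -> int:
    """Hypothetical agent: fetch records, aggregate them, post the result."""
    records = env.api_get("/data")
    total = sum(r["value"] for r in records)
    env.webhook_post("/webhook", {"total": total})
    return total
```

Because the agent only talks to `env`, swapping the sandbox for a real client later does not require changing the agent code.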

Security Audit

Run a comprehensive security audit on your agent:

User: @agentos security

AgentOS: 🔒 Security Audit Results

Scan Completed: 2.3 seconds

Vulnerability Check:
✅ No hardcoded credentials found
✅ No SQL injection vulnerabilities
✅ No command injection risks
⚠️ 1 potential XSS vector (line 45)

PII Detection:
📧 Email patterns: 2 instances
   - Line 23: user_email variable
   - Line 67: email in log statement
💳 Credit card patterns: 0 instances
🔢 SSN patterns: 0 instances

Secret Scanning:
✅ No API keys in code
✅ No passwords in code
✅ No tokens in code

Security Score: 85/100

Recommendations:
1. Line 45: Sanitize user input before rendering
2. Line 67: Remove email from log statements
3. Consider adding rate limiting to prevent abuse

⚠️ Important: Security audits should be run before every deployment. Agents with critical vulnerabilities will be blocked from deployment.
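The PII findings above (an email variable and an email inside a log statement) can be approximated with a line-by-line pattern scan. The sketch below is a deliberately simple heuristic and an assumption about the general approach, not the audit's real detection logic. The regular expressions and the `scan_for_email_pii` helper are hypothetical and will miss many cases.

```python
import re

# Literal addresses such as alice@example.com.
EMAIL_LITERAL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Identifiers whose name suggests they hold an email address.
EMAIL_IDENT = re.compile(r"\w*email\w*", re.IGNORECASE)
# Common Python logging or print calls.
LOG_CALL = re.compile(r"\b(?:print|logger\.\w+|logging\.\w+)\s*\(")


def scan_for_email_pii(source: str) -> list:
    """Return (line_number, finding) pairs for lines that reference emails."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        if not (EMAIL_LITERAL.search(line) or EMAIL_IDENT.search(line)):
            continue
        # An email that reaches a log call is the more serious finding.
        if LOG_CALL.search(line):
            findings.append((line_no, "email in log statement"))
        else:
            findings.append((line_no, "email reference"))
    return findings
```

A scan like this is cheap enough to run in a pre-commit hook, but it should complement the full audit rather than replace it.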

Test Result Cards

Understanding test results:

✅ Test Passed

All assertions passed. Agent behaved as expected.

❌ Test Failed

Expected status 200, got 500. Check error handling for API failures.

⚠️ Test Warning

Test passed, but response time exceeded the threshold (expected <200ms, got 450ms).
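The three result cards map onto a simple decision rule: a wrong status fails the test, a correct status with slow latency produces a warning, and everything else passes. The sketch below expresses that rule. The `Outcome` enum and `classify` function are hypothetical names, and the 200ms default is taken from the warning example above rather than from a documented setting.

```python
from enum import Enum


class Outcome(Enum):
    PASSED = "passed"
    FAILED = "failed"
    WARNING = "warning"


def classify(expected_status: int, actual_status: int,
             latency_ms: float, latency_threshold_ms: float = 200.0) -> Outcome:
    """Map one test run onto the three result cards."""
    if actual_status != expected_status:
        return Outcome.FAILED  # an assertion failed, e.g. expected 200, got 500
    if latency_ms >= latency_threshold_ms:
        return Outcome.WARNING  # correct result, but slower than the threshold
    return Outcome.PASSED
```

For example, `classify(200, 200, 450)` yields `Outcome.WARNING`, matching the warning card.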

Continuous Testing

Set up automatic testing for your agent with a GitHub Actions workflow that runs on every push and pull request:

# .github/workflows/agent-tests.yml
name: Agent Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run AgentOS Tests
        uses: agentos/test-action@v1
        with:
          scenarios: high-load,malformed-input
          security-audit: true
          compliance-check: gdpr
