🚧 Coming Soon: This extension is under development. Star the repo to get notified!
🧪

Testing & Simulation

Test your agents with realistic scenarios, security audits, and sandbox simulation before deploying to production.

Testing Types

🎯 Scenario Testing

Run your agent against specific test scenarios with expected outcomes

📦 Sandbox Simulation

Execute agent in an isolated environment with mock data

🔒 Security Audit

Check for vulnerabilities, PII exposure, and security issues

⚡ Performance Testing

Measure execution time, memory usage, and resource consumption
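To make the performance measurements concrete, here is a minimal sketch of timing a single run and recording its peak memory with the Python standard library. It is not how AgentOS measures performance internally; `measure` and `toy_agent` are hypothetical names used only for illustration, and `tracemalloc` reports only memory allocated through Python, not the whole process.

```python
import time
import tracemalloc
from typing import Any, Callable, Tuple


def measure(fn: Callable[[], Any]) -> Tuple[Any, float, int]:
    """Run fn once; return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn()
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def toy_agent() -> int:
    # Hypothetical stand-in for an agent run: build a list and sum it.
    return sum([i * i for i in range(10_000)])


result, elapsed, peak = measure(toy_agent)
print(f"result={result} elapsed={elapsed * 1000:.2f}ms peak={peak / 1024:.1f}KiB")
```

A single run is noisy; repeating the measurement and comparing the median is usually more reliable than trusting one number.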

Running Tests

1. Basic Test Command

User: @agentos test

Runs the default test suite against your currently open agent code.

2. Test with Scenario

User: @agentos test this agent with high-load scenario
AgentOS: 🧪 Running Test: high-load scenario

Scenario Configuration:
- Concurrent requests: 100
- Duration: 60 seconds
- Data volume: 10,000 records

Test Results:
✅ All 100 concurrent requests completed
✅ Average response time: 245ms
✅ No policy violations detected
✅ Memory usage stayed within limits (85MB peak)

Performance Metrics:
┌─────────────────┬───────────┐
│ Metric          │ Value     │
├─────────────────┼───────────┤
│ Requests/sec    │ 156       │
│ Avg latency     │ 245ms     │
│ P95 latency     │ 412ms     │
│ Error rate      │ 0%        │
│ Peak memory     │ 85MB      │
└─────────────────┴───────────┘

✅ Test Passed - Agent is ready for deployment
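If you want to see where numbers like requests/sec, average latency, P95 latency, and error rate come from, the sketch below computes them for a batch of concurrent calls. It assumes a hypothetical async `fake_agent` that sleeps for about 10 ms; `timed_call`, `summarize`, and `run_load` are illustrative names, not part of AgentOS. P95 uses the nearest-rank method, which is one of several common definitions.

```python
import asyncio
import statistics
import time
from typing import Awaitable, Callable, Dict, List, Tuple

Handler = Callable[[int], Awaitable[None]]


async def timed_call(handler: Handler, request_id: int) -> Tuple[float, bool]:
    """Await one request; return (latency in seconds, whether it succeeded)."""
    start = time.perf_counter()
    try:
        await handler(request_id)
        ok = True
    except Exception:
        ok = False
    return time.perf_counter() - start, ok


def summarize(latencies: List[float], errors: int, wall_seconds: float) -> Dict[str, float]:
    """Aggregate raw timings into the metrics shown in the results table."""
    ordered = sorted(latencies)
    # Nearest-rank P95: rank = ceil(0.95 * n), done in integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return {
        "requests_per_sec": len(ordered) / wall_seconds,
        "avg_latency_ms": statistics.mean(ordered) * 1000,
        "p95_latency_ms": ordered[rank - 1] * 1000,
        "error_rate": errors / len(ordered),
    }


async def run_load(handler: Handler, concurrent: int) -> Dict[str, float]:
    """Fire `concurrent` requests at once and summarize the outcome."""
    wall_start = time.perf_counter()
    results = await asyncio.gather(*(timed_call(handler, i) for i in range(concurrent)))
    wall = time.perf_counter() - wall_start
    latencies = [latency for latency, _ in results]
    errors = sum(1 for _, ok in results if not ok)
    return summarize(latencies, errors, wall)


async def fake_agent(request_id: int) -> None:
    # Hypothetical agent call: pretend each request takes about 10 ms.
    await asyncio.sleep(0.01)


if __name__ == "__main__":
    print(asyncio.run(run_load(fake_agent, concurrent=100)))
```

Swapping `fake_agent` for a coroutine that calls your real agent gives a rough local approximation of the high-load scenario, though a single burst is not the same as sustained load over 60 seconds.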

Scenario Testing

Test your agent against specific scenarios:

User: @agentos test this agent with malformed-input scenario
✅ Test Passed: Malformed Input Handling

The agent correctly rejected invalid JSON input with a clear error message. No crash or unexpected behavior occurred.
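As an illustration of the behavior this scenario looks for, here is a minimal sketch of an input handler that rejects invalid JSON with an error message instead of crashing. The `handle_request` function and its response shape are assumptions made for this example, not an AgentOS API.

```python
import json
from typing import Any, Dict


def handle_request(raw: str) -> Dict[str, Any]:
    """Parse a raw JSON request; return an error response on bad input."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Report where parsing failed rather than letting the exception escape.
        return {
            "status": 400,
            "error": f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}",
        }
    if not isinstance(payload, dict):
        # Valid JSON, but not the object shape this hypothetical agent expects.
        return {"status": 400, "error": "Expected a JSON object"}
    return {"status": 200, "result": payload}


print(handle_request('{"task": "summarize"'))   # truncated input
print(handle_request('{"task": "summarize"}'))  # well-formed input
```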

Available Test Scenarios

  • high-load - Simulates 100 concurrent requests
  • malformed-input - Tests input validation
  • api-failure - Simulates external API failures
  • rate-limiting - Tests rate limit handling
  • data-edge-cases - Tests with edge case data
  • timeout - Tests timeout handling
  • auth-failure - Tests authentication failure handling
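To give a feel for what a scenario such as api-failure exercises, the sketch below injects failures with a fake API and checks that a retry loop recovers. `FlakyApi` and `fetch_with_retry` are hypothetical names for this example; how AgentOS actually simulates failures is not specified here.

```python
from typing import List, Optional


class FlakyApi:
    """Fake external API that raises for the first `failures` calls."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def get_data(self) -> List[int]:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated API failure")
        return [1, 2, 3]


def fetch_with_retry(api: FlakyApi, max_attempts: int = 3) -> List[int]:
    """Call the API up to max_attempts times, then give up with a clear error."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return api.get_data()
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError(f"API unavailable after {max_attempts} attempts") from last_error


api = FlakyApi(failures=2)
print(fetch_with_retry(api), "after", api.calls, "calls")
```

A production retry loop would normally add a delay with backoff between attempts; it is left out here to keep the example short.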

Sandbox Simulation

Run your agent in an isolated environment with mock data:

User: @agentos simulate this agent
AgentOS: 📦 Starting Sandbox Simulation

Sandbox Environment:
- Isolated container: sandbox-7f3a9b
- Mock API endpoints: Active
- Test database: Seeded with 1,000 records

Simulation Log:
[00:00.000] Agent initialized
[00:00.125] Connected to mock API
[00:00.250] Fetched 50 records from database
[00:00.380] Processing record batch #1
[00:01.200] Completed processing 50 records
[00:01.250] Posted results to mock webhook
[00:01.300] Agent completed successfully

Actions Taken:
- 1 API call (GET /data)
- 50 database reads
- 1 webhook POST

Policy Checks:
✅ No unauthorized API calls
✅ Data stayed within sandbox
✅ No PII leaked to logs

✅ Simulation Successful

💡 Tip: Sandbox simulation uses mock endpoints that mimic real APIs. Your actual credentials and production data are never used during simulation.
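The idea of mock endpoints plus a policy check can be approximated locally with `unittest.mock`. In this sketch, the agent, the endpoint paths, and the `ALLOWED_CALLS` allowlist are all assumptions for illustration; the real sandbox runs in an isolated container and is not reproduced here.

```python
from typing import Any, Dict, List, Optional
from unittest.mock import Mock

# Hypothetical policy: the only calls this agent is allowed to make.
ALLOWED_CALLS = {("GET", "/data"), ("POST", "/webhook")}


def run_agent(http: Any) -> int:
    """Toy agent: fetch records, total them, post the result to a webhook."""
    records: List[Dict[str, int]] = http.request("GET", "/data")
    total = sum(record["value"] for record in records)
    http.request("POST", "/webhook", {"total": total})
    return total


def mock_request(method: str, path: str, body: Optional[dict] = None) -> Any:
    # Mock endpoint: serve 50 fake records, acknowledge everything else.
    if (method, path) == ("GET", "/data"):
        return [{"value": i} for i in range(50)]
    return {"ok": True}


http = Mock()
http.request.side_effect = mock_request

total = run_agent(http)

# Reconstruct the "Actions Taken" list from the recorded calls.
actions = [(call.args[0], call.args[1]) for call in http.request.call_args_list]
unauthorized = [action for action in actions if action not in ALLOWED_CALLS]

print("total:", total)
print("actions:", actions)
print("unauthorized:", unauthorized)
```

Because the agent only ever sees the mock, no real credentials or production data are involved, which mirrors the guarantee described in the tip above.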

Security Audit

Run a comprehensive security audit on your agent:

User: @agentos security
AgentOS: 🔒 Security Audit Results

Scan Completed: 2.3 seconds

Vulnerability Check:
✅ No hardcoded credentials found
✅ No SQL injection vulnerabilities
✅ No command injection risks
⚠️ 1 potential XSS vector (line 45)

PII Detection:
📧 Email patterns: 2 instances
   - Line 23: user_email variable
   - Line 67: email in log statement
💳 Credit card patterns: 0 instances
🔢 SSN patterns: 0 instances

Secret Scanning:
✅ No API keys in code
✅ No passwords in code
✅ No tokens in code

Security Score: 85/100

Recommendations:
1. Line 45: Sanitize user input before rendering
2. Line 67: Remove email from log statements
3. Consider adding rate limiting to prevent abuse

⚠️ Important: Security audits should be run before every deployment. Agents with critical vulnerabilities will be blocked from deployment.
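For intuition about how line-level findings like the ones above can be produced, here is a deliberately simple pattern scan. The two regular expressions and the `scan` function are assumptions for this example and are far cruder than a real audit, which would need many more patterns and some way to suppress false positives.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Hypothetical patterns: identifiers mentioning "email", and string literals
# assigned to names that look like credentials.
PATTERNS: Dict[str, Pattern[str]] = {
    "email": re.compile(r"\b\w*email\w*\b", re.IGNORECASE),
    "hardcoded secret": re.compile(
        r"\b(api_key|password|token)\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE
    ),
}


def scan(source: str) -> List[Tuple[int, str]]:
    """Return (line number, finding label) for every pattern hit."""
    findings: List[Tuple[int, str]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


SAMPLE = '''user_email = request["email"]
api_key = "sk-test-123"
print("done")
'''

for lineno, label in scan(SAMPLE):
    print(f"Line {lineno}: {label}")
```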

Test Result Cards

Understanding test results:

✅ Test Passed

All assertions passed. Agent behaved as expected.

❌ Test Failed

Expected status 200, got 500. Check error handling for API failures.

⚠️ Test Warning

Test passed, but response time exceeded the threshold (expected <200ms, got 450ms).
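The three result cards can be read as a small decision rule: failed assertions produce a failure, passing assertions with slow responses produce a warning, and everything else passes. The sketch below encodes that reading; the `Status` enum, the `classify` function, and the 200 ms default are assumptions drawn from the examples above rather than documented AgentOS behavior.

```python
from enum import Enum


class Status(Enum):
    PASSED = "Test Passed"
    FAILED = "Test Failed"
    WARNING = "Test Warning"


def classify(assertions_passed: bool, latency_ms: float, threshold_ms: float = 200.0) -> Status:
    """Map raw test outcomes to one of the three result cards."""
    if not assertions_passed:
        return Status.FAILED
    if latency_ms >= threshold_ms:
        # Functionally correct, but slower than the expected <threshold_ms.
        return Status.WARNING
    return Status.PASSED


print(classify(True, 150.0).value)
print(classify(True, 450.0).value)
print(classify(False, 150.0).value)
```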

Continuous Testing

Set up automatic testing for your agent:

# .github/workflows/agent-tests.yml
name: Agent Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run AgentOS Tests
        uses: agentos/test-action@v1
        with:
          scenarios: high-load,malformed-input
          security-audit: true
          compliance-check: gdpr