Testing & Simulation
Test your agents with realistic scenarios, security audits, and sandbox simulation before deploying to production.
Testing Types
🎯 Scenario Testing
Run your agent against specific test scenarios with expected outcomes
📦 Sandbox Simulation
Execute agent in an isolated environment with mock data
🔒 Security Audit
Check for vulnerabilities, PII exposure, and security issues
⚡ Performance Testing
Measure execution time, memory usage, and resource consumption
Running Tests
1. Basic Test Command
Runs the default test suite against your currently open agent code.
2. Test with Scenario
AgentOS: 🧪 Running Test: high-load scenario
Scenario Configuration:
- Concurrent requests: 100
- Duration: 60 seconds
- Data volume: 10,000 records
Test Results:
✅ All 100 concurrent requests completed
✅ Average response time: 245ms
✅ No policy violations detected
✅ Memory usage stayed within limits (85MB peak)
Performance Metrics:
┌──────────────────┬───────────┐
│ Metric           │ Value     │
├──────────────────┼───────────┤
│ Requests/sec     │ 156       │
│ Avg latency      │ 245ms     │
│ P95 latency      │ 412ms     │
│ Error rate       │ 0%        │
│ Peak memory      │ 85MB      │
└──────────────────┴───────────┘
✅ Test Passed - Agent is ready for deployment
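Under the hood, a high-load run amounts to firing many concurrent requests at the agent and aggregating latency statistics. If you want to reproduce a similar check outside AgentOS, a minimal Python sketch might look like the following (the endpoint URL and the assumption that the agent is reachable over HTTP are illustrative, not part of AgentOS):

import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests  # assumption: the agent under test is reachable over HTTP

AGENT_URL = "http://localhost:8080/run"  # hypothetical local endpoint

def call_agent(i: int) -> float:
    """Send one request to the agent and return its latency in milliseconds."""
    start = time.perf_counter()
    response = requests.post(AGENT_URL, json={"record_id": i}, timeout=10)
    response.raise_for_status()
    return (time.perf_counter() - start) * 1000

# Fire 100 concurrent requests, mirroring the scenario configuration shown above.
with ThreadPoolExecutor(max_workers=100) as pool:
    latencies = sorted(pool.map(call_agent, range(100)))

print(f"Avg latency: {statistics.mean(latencies):.0f}ms")
print(f"P95 latency: {latencies[94]:.0f}ms")  # 95th of 100 sorted samples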
Scenario Testing
Test your agent against specific scenarios:
Agent correctly rejected invalid JSON input with proper error message. No crash or unexpected behavior.
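The malformed-input scenario looks for exactly this behavior: bad input should produce a structured error, not an unhandled exception. A minimal sketch of such a guard in the agent's own code (the handler name and error shape are illustrative assumptions):

import json

def handle_request(raw_payload: str) -> dict:
    """Parse incoming JSON and fail gracefully on malformed input."""
    try:
        payload = json.loads(raw_payload)
    except json.JSONDecodeError as exc:
        # Return a structured error instead of letting the exception crash the agent.
        return {"status": "error", "code": 400, "message": f"Invalid JSON: {exc.msg}"}
    if not isinstance(payload, dict):
        return {"status": "error", "code": 400, "message": "Expected a JSON object"}
    return {"status": "ok", "data": payload}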
Available Test Scenarios
- high-load - Simulates 100 concurrent requests
- malformed-input - Tests input validation
- api-failure - Simulates external API failures
- rate-limiting - Tests rate limit handling (see the retry sketch after this list)
- data-edge-cases - Tests with edge case data
- timeout - Tests timeout handling
- auth-failure - Tests authentication failure handling
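The api-failure, rate-limiting, and timeout scenarios all probe the same agent-side behavior: external calls should be bounded by a timeout and retried with backoff rather than failing on the first error. A minimal sketch of that pattern (illustrative Python; the retry budget and the use of the requests library are assumptions, not AgentOS requirements):

import random
import time

import requests

def call_with_retries(url: str, payload: dict, attempts: int = 3, timeout: float = 5.0) -> dict:
    """Call an external API with a per-request timeout and exponential backoff."""
    for attempt in range(attempts):
        try:
            response = requests.post(url, json=payload, timeout=timeout)
            if response.status_code == 429:  # rate limited: back off and retry
                raise requests.HTTPError("rate limited", response=response)
            response.raise_for_status()
            return response.json()
        except (requests.Timeout, requests.ConnectionError, requests.HTTPError):
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            time.sleep((2 ** attempt) + random.random())  # backoff with jitter
    raise RuntimeError("unreachable")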
Sandbox Simulation
Run your agent in an isolated environment with mock data:
AgentOS: 📦 Starting Sandbox Simulation
Sandbox Environment:
- Isolated container: sandbox-7f3a9b
- Mock API endpoints: Active
- Test database: Seeded with 1,000 records
Simulation Log:
[00:00.000] Agent initialized
[00:00.125] Connected to mock API
[00:00.250] Fetched 50 records from database
[00:00.380] Processing record batch #1
[00:01.200] Completed processing 50 records
[00:01.250] Posted results to mock webhook
[00:01.300] Agent completed successfully
Actions Taken:
- 1 API call (GET /data)
- 50 database reads
- 1 webhook POST
Policy Checks:
✅ No unauthorized API calls
✅ Data stayed within sandbox
✅ No PII leaked to logs
✅ Simulation Successful
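You can approximate the same isolation locally by pointing the agent at stubbed dependencies instead of real services. A minimal sketch using Python's standard-library mocking (my_agent, fetch_records, and post_results are hypothetical stand-ins for your agent's module and I/O functions):

from unittest import mock

import my_agent  # hypothetical module containing the agent's entry point

# Stub out external I/O so the run stays inside the "sandbox".
fake_records = [{"id": i, "value": f"record-{i}"} for i in range(50)]

with mock.patch.object(my_agent, "fetch_records", return_value=fake_records), \
     mock.patch.object(my_agent, "post_results") as fake_webhook:
    result = my_agent.run()

# Verify the agent only touched the mocked endpoints.
fake_webhook.assert_called_once()
print("Processed", len(fake_records), "records; result:", result)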
Security Audit
Run a comprehensive security audit on your agent:
AgentOS: 🔒 Security Audit Results
Scan Completed: 2.3 seconds
Vulnerability Check:
✅ No hardcoded credentials found
✅ No SQL injection vulnerabilities
✅ No command injection risks
⚠️ 1 potential XSS vector (line 45)
PII Detection:
📧 Email patterns: 2 instances
- Line 23: user_email variable
- Line 67: email in log statement
💳 Credit card patterns: 0 instances
🔢 SSN patterns: 0 instances
Secret Scanning:
✅ No API keys in code
✅ No passwords in code
✅ No tokens in code
Security Score: 85/100
Recommendations:
1. Line 45: Sanitize user input before rendering
2. Line 67: Remove email from log statements
3. Consider adding rate limiting to prevent abuse
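The first two recommendations come down to escaping anything user-controlled before it reaches HTML and redacting PII before it reaches the logs. A sketch of both fixes (function names are illustrative; apply the same idea wherever your agent renders output or logs events):

import html
import logging
import re

logger = logging.getLogger("agent")

EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def render_comment(user_input: str) -> str:
    """Escape user-controlled text before rendering it (addresses the XSS finding)."""
    return f"<p>{html.escape(user_input)}</p>"

def log_event(message: str) -> None:
    """Redact email addresses before logging (addresses the PII finding)."""
    logger.info(EMAIL_PATTERN.sub("[redacted-email]", message))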
Test Result Cards
Understanding test results:
- ✅ Passed: All assertions passed. Agent behaved as expected.
- ❌ Failed: Expected status 200, got 500. Check error handling for API failures.
- ⚠️ Warning: Test passed but performance was below threshold (expected <200ms, got 450ms).
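If you compute your own metrics, the same three-way outcome can be derived from a simple threshold check, for example (illustrative Python; the 200ms budget mirrors the warning card above):

LATENCY_BUDGET_MS = 200  # expected threshold from the warning card above

def classify_result(assertions_passed: bool, avg_latency_ms: float) -> str:
    """Map functional and performance outcomes onto pass/warn/fail."""
    if not assertions_passed:
        return "failed"
    if avg_latency_ms > LATENCY_BUDGET_MS:
        return "warning"  # correct behavior, but slower than the budget
    return "passed"

print(classify_result(True, 450))  # -> "warning", matching the card above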
Continuous Testing
Set up automatic testing for your agent:
# .github/workflows/agent-tests.yml
name: Agent Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run AgentOS Tests
        uses: agentos/test-action@v1
        with:
          scenarios: high-load,malformed-input
          security-audit: true
          compliance-check: gdpr