IMPORTANT: When writing tests, you MUST follow the guidelines in this document. These patterns ensure consistency, maintainability, and proper test coverage across the SDK.
This document contains comprehensive testing guidelines for the Strands TypeScript SDK. For general development guidance, see AGENTS.md.
All test fixtures are located in `src/__fixtures__/`. Use these helpers to reduce boilerplate and ensure consistency.
| Fixture | File | When to Use | Details |
|---|---|---|---|
| `MockMessageModel` | `mock-message-model.ts` | Agent loop tests - specify content blocks, auto-generates stream events | Model Fixtures |
| `TestModelProvider` | `model-test-helpers.ts` | Low-level model tests - precise control over individual `ModelStreamEvent` sequences | Model Fixtures |
| `collectIterator()` | `model-test-helpers.ts` | Collect all items from any async iterable into an array | Model Fixtures |
| `collectGenerator()` | `model-test-helpers.ts` | Collect yielded items AND final return value from async generators | Model Fixtures |
| `MockHookProvider` | `mock-hook-provider.ts` | Record and verify hook invocations during agent execution | Hook Fixtures |
| `createMockTool()` | `tool-helpers.ts` | Create mock tools with custom result behavior | Tool Fixtures |
| `createRandomTool()` | `tool-helpers.ts` | Create minimal mock tools when execution doesn't matter | Tool Fixtures |
| `createMockContext()` | `tool-helpers.ts` | Create mock `ToolContext` for testing tool implementations directly | Tool Fixtures |
| `createMockAgent()` | `agent-helpers.ts` | Create minimal mock Agent with messages and state | Agent Fixtures |
| `expectAgentResult()` | `agent-helpers.ts` | Assert on `AgentResult` with expected stop reason, message text, cycle count, and traces | Agent Fixtures |
| `isNode` / `isBrowser` | `environment.ts` | Environment detection for conditional test execution | Environment Fixtures |
| `MockSpan` | `mock-span.ts` | Mock OTEL Span that records all setAttribute/addEvent/end calls for assertion | Telemetry Fixtures |
| `eventAttr()` | `mock-span.ts` | Extract a string attribute from a mock span event | Telemetry Fixtures |
| `MockMeter` | `mock-meter.ts` | Mock OTEL Meter that records all counter/histogram instrument calls for assertion | Telemetry Fixtures |
| `expectLoopMetrics()` | `metrics-helpers.ts` | Assert on `AgentMetrics` with expected cycle count, tool names, and optional token usage | Metrics Fixtures |
| `findMetricValue()` | `metrics-helpers.ts` | Find the latest data point value for a named OTEL metric from ResourceMetrics | Metrics Fixtures |
Rule: Unit test files are co-located with source files, grouped in a directory named `__tests__`:

```
src/subdir/
├── agent.ts          # Source file
├── model.ts          # Source file
└── __tests__/
    ├── agent.test.ts # Tests for agent.ts
    └── model.test.ts # Tests for model.ts
```
Rule: Integration tests are separate in `tests_integ/`:

```
tests_integ/
├── api.test.ts         # Tests public API
└── environment.test.ts # Tests environment compatibility
```
File naming determines which environment(s) tests run in:

- `*.test.ts` — runs in both Node.js and browser environments
- `*.test.node.ts` — runs only in the Node.js environment
- `*.test.browser.ts` — runs only in the browser environment
This naming convention applies to both unit tests (`src/**/__tests__/`) and integration tests (`tests_integ/`).
Examples:

```
src/module/__tests__/
├── module.test.ts         # Runs in Node.js AND browser
├── module.test.node.ts    # Runs in Node.js only
└── module.test.browser.ts # Runs in browser only
```
Use environment-specific test files when tests depend on platform-specific features like filesystem access, environment variables, or browser APIs.
Follow this nested describe pattern for consistency:

```typescript
import { describe, it, expect } from 'vitest'
import { functionName } from '../module'
describe('functionName', () => {
describe('when called with valid input', () => {
it('returns expected result', () => {
const result = functionName('input')
expect(result).toBe('expected')
})
})
describe('when called with edge case', () => {
it('handles gracefully', () => {
const result = functionName('')
expect(result).toBeDefined()
})
})
})
```

```typescript
import { describe, it, expect } from 'vitest'
import { ClassName } from '../module'
describe('ClassName', () => {
describe('methodName', () => {
it('returns expected result', () => {
const instance = new ClassName()
const result = instance.methodName()
expect(result).toBe('expected')
})
it('handles error case', () => {
const instance = new ClassName()
expect(() => instance.methodName()).toThrow()
})
})
describe('anotherMethod', () => {
it('performs expected action', () => {
// Test implementation
})
})
})
```

- Top-level `describe` uses the function/class name
- Nested `describe` blocks group related test scenarios
- Use descriptive test names without "should" prefix
- Group tests by functionality or scenario

```typescript
// Good: Clear, specific test
describe('calculateTotal', () => {
describe('when given valid numbers', () => {
it('returns the sum', () => {
expect(calculateTotal([1, 2, 3])).toBe(6)
})
})
describe('when given empty array', () => {
it('returns zero', () => {
expect(calculateTotal([])).toBe(0)
})
})
})
// Bad: Vague, unclear test
describe('calculateTotal', () => {
it('works', () => {
expect(calculateTotal([1, 2, 3])).toBeTruthy()
})
})
```

Rule: When test setup cost exceeds test logic cost, you MUST batch related assertions into a single test.
You MUST batch when:
- Setup complexity > test logic complexity
- Multiple assertions verify the same object state
- Related behaviors share expensive context
You SHOULD keep separate tests for:
- Distinct behaviors or execution paths
- Error conditions
- Different input scenarios
Bad - Redundant setup:

```typescript
it('has correct tool name', () => {
const tool = createComplexTool({
/* expensive setup */
})
expect(tool.toolName).toBe('testTool')
})
it('has correct description', () => {
const tool = createComplexTool({
/* same expensive setup */
})
expect(tool.description).toBe('Test description')
})
```

Good - Batched properties:

```typescript
it('creates tool with correct properties', () => {
const tool = createComplexTool({
/* setup once */
})
expect(tool.toolName).toBe('testTool')
expect(tool.description).toBe('Test description')
expect(tool.toolSpec.name).toBe('testTool')
})
```

Prefer testing entire objects at once instead of individual properties for better readability and test coverage.

```typescript
// ✅ Good: Verify entire object at once
it('returns expected user object', () => {
const user = getUser('123')
expect(user).toEqual({
id: '123',
name: 'John Doe',
email: 'john@example.com',
isActive: true,
})
})
// ✅ Good: Verify entire array of objects
it('yields expected stream events', async () => {
const events = await collectIterator(stream)
expect(events).toEqual([
{ type: 'streamEvent', data: 'Starting...' },
{ type: 'streamEvent', data: 'Processing...' },
{ type: 'streamEvent', data: 'Complete!' },
])
})
// ❌ Bad: Testing individual properties
it('returns expected user object', () => {
const user = getUser('123')
expect(user).toBeDefined()
expect(user.id).toBe('123')
expect(user.name).toBe('John Doe')
expect(user.email).toBe('john@example.com')
expect(user.isActive).toBe(true)
})
// ❌ Bad: Testing array elements individually in a loop
it('yields expected stream events', async () => {
const events = await collectIterator(stream)
for (const event of events) {
expect(event.type).toBe('streamEvent')
expect(event).toHaveProperty('data')
}
})
```

Benefits of testing entire objects:
- More concise: Single assertion instead of multiple
- Better test coverage: Catches unexpected additional or missing properties
- More readable: Clear expectation of the entire structure
- Easier to maintain: Changes to the object require updating one place
Matcher guidelines:

- Always use `toEqual()` for object and array comparisons
- Use `toBe()` only for primitive values and reference equality
- When testing error objects, verify the entire structure including message and type
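For example, when a call should reject, capture the error and assert its whole structure in one assertion. This is a sketch only; `agent.invoke` and the `ModelError` field values are illustrative, not actual SDK behavior:

```typescript
// Capture the rejection, then assert the full error shape at once.
const error = await agent.invoke('Hi').catch((e: unknown) => e)
expect(error).toMatchObject({ name: 'ModelError', message: 'Model failed' })
```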
Testing Approach:
- You MUST write tests for implementations (functions, classes, methods)
- You SHOULD NOT write tests for interfaces since TypeScript compiler already enforces type correctness
- You SHOULD write Vitest type tests (`*.test-d.ts`) for complex types to ensure backwards compatibility
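A type test might look like the following minimal sketch; the `AgentOptions` and `Tool` imports are hypothetical placeholders, not actual SDK paths:

```typescript
// agent-options.test-d.ts - compile-time assertions run via Vitest's typecheck mode.
import { expectTypeOf } from 'vitest'
import type { AgentOptions } from '../agent' // hypothetical type and path
import type { Tool } from '../tool' // hypothetical type and path

// Breaking changes to these public shapes fail the type test.
expectTypeOf<AgentOptions>().toHaveProperty('model')
expectTypeOf<AgentOptions['tools']>().toEqualTypeOf<Tool[] | undefined>()
```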
Example Implementation Test:

```typescript
describe('BedrockModel', () => {
it('streams messages correctly', async () => {
const provider = new BedrockModel(config)
const stream = provider.stream(messages)
for await (const event of stream) {
if (event.type === 'modelMessageStartEvent') {
expect(event.role).toBe('assistant')
}
}
})
})
```

- Minimum: 80% coverage required (enforced by Vitest)
- Target: Aim for high coverage on critical paths
- Exclusions: Test files, config files, generated code
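For reference, a threshold like this is typically declared in Vitest's coverage options. The following is an illustrative sketch; the SDK's actual config and exclude list may differ:

```typescript
// Illustrative vitest.config.ts excerpt - thresholds fail the run when coverage drops below 80%.
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    coverage: {
      thresholds: { lines: 80, functions: 80, branches: 80, statements: 80 },
      exclude: ['**/__tests__/**', '**/*.config.*'], // example exclusions only
    },
  },
})
```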
When to use each test provider:
- `MockMessageModel`: For agent loop tests and high-level flows - focused on content blocks
- `TestModelProvider`: For low-level event streaming tests where you need precise control over individual events (see the sketch below)
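A low-level `TestModelProvider` test might look like the sketch below. The constructor shape and event names here are assumptions; check `model-test-helpers.ts` for the real API:

```typescript
import { TestModelProvider, collectIterator } from '../__fixtures__/model-test-helpers'

// Hypothetical sketch: assume the provider replays an explicit ModelStreamEvent sequence.
const provider = new TestModelProvider([
  { type: 'modelMessageStartEvent', role: 'assistant' },
  { type: 'modelContentBlockDeltaEvent', delta: { text: 'Hi' } },
  { type: 'modelMessageStopEvent', stopReason: 'endTurn' },
])

const events = await collectIterator(provider.stream([]))
expect(events).toHaveLength(3)
```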
For tests focused on messages, you SHOULD use `MockMessageModel` with a content-focused API that eliminates boilerplate:

```typescript
import { MockMessageModel } from '../__fixtures__/mock-message-model'
// ✅ RECOMMENDED - Single content block (most common)
const provider = new MockMessageModel().addTurn({ type: 'textBlock', text: 'Hello' })
// ✅ RECOMMENDED - Array of content blocks
const provider = new MockMessageModel().addTurn([
{ type: 'textBlock', text: 'Let me help' },
{ type: 'toolUseBlock', name: 'calc', toolUseId: 'id-1', input: {} },
])
// ✅ RECOMMENDED - Multi-turn with builder pattern
const provider = new MockMessageModel()
.addTurn({ type: 'toolUseBlock', name: 'calc', toolUseId: 'id-1', input: {} }) // Auto-derives 'toolUse'
.addTurn({ type: 'textBlock', text: 'The answer is 42' }) // Auto-derives 'endTurn'
// ✅ OPTIONAL - Explicit stopReason when needed
const provider = new MockMessageModel().addTurn({ type: 'textBlock', text: 'Partial response' }, { stopReason: 'maxTokens' })
// ✅ OPTIONAL - Token usage metadata (emits modelMetadataEvent after message stop)
const provider = new MockMessageModel()
.addTurn({ type: 'toolUseBlock', name: 'calc', toolUseId: 'id-1', input: {} }, {
usage: { inputTokens: 100, outputTokens: 50, totalTokens: 150 },
})
.addTurn({ type: 'textBlock', text: 'Done' }, {
usage: { inputTokens: 200, outputTokens: 30, totalTokens: 230 },
})
// ✅ OPTIONAL - Error handling
const provider = new MockMessageModel()
.addTurn({ type: 'textBlock', text: 'Success' })
  .addTurn(new Error('Model failed'))
```

When testing hook behavior, you MUST use `agent.hooks.addCallback()` for registering single callbacks when `agent.hooks` is available. Do NOT create inline `HookProvider` objects — this is an anti-pattern for single callbacks.

```typescript
// ✅ CORRECT - Use agent.hooks.addCallback() for single callbacks
const agent = new Agent({ model, tools: [tool] })
agent.hooks.addCallback(BeforeToolCallEvent, (event: BeforeToolCallEvent) => {
event.toolUse = {
...event.toolUse,
input: { value: 42 },
}
})
// ✅ CORRECT - Use MockHookProvider to record and verify hook invocations
const hookProvider = new MockHookProvider()
const agent = new Agent({ model, hooks: [hookProvider] })
await agent.invoke('Hi')
expect(hookProvider.invocations).toContainEqual(new BeforeInvocationEvent({ agent }))
// ❌ WRONG - Do NOT create inline HookProvider objects
const switchToolHook = {
registerCallbacks: (registry: HookRegistry) => {
registry.addCallback(BeforeToolCallEvent, (event: BeforeToolCallEvent) => {
if (event.toolUse.name === 'tool1') {
event.tool = tool2
}
})
},
}
```

When to use each approach:
- `agent.hooks.addCallback()` - For adding a single callback to verify hook behavior (e.g., modifying tool input, switching tools)
- `MockHookProvider` - For recording and verifying hook lifecycle behavior and that specific hook events fired during execution
All test fixtures are located in `src/__fixtures__/`. Use these helpers to reduce boilerplate and ensure consistency.
- `MockMessageModel` - Content-focused model for agent loop tests. Use `addTurn()` with content blocks.
- `TestModelProvider` - Low-level model for precise control over `ModelStreamEvent` sequences.
- `collectIterator(stream)` - Collects all items from an async iterable into an array.
- `collectGenerator(generator)` - Collects yielded items and final return value from an async generator (see the sketch below).
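`collectGenerator()` has no example elsewhere in this document, so here is a hedged sketch; the `{ items, result }` return shape is an assumption, so verify it against `model-test-helpers.ts`:

```typescript
import { collectGenerator } from '../__fixtures__/model-test-helpers'

// Hypothetical sketch: collect both the yielded items and the generator's return value.
async function* countdown() {
  yield 3
  yield 2
  yield 1
  return 'liftoff'
}

const { items, result } = await collectGenerator(countdown())
expect(items).toEqual([3, 2, 1])
expect(result).toBe('liftoff')
```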
```typescript
// MockMessageModel for agent tests
const model = new MockMessageModel()
.addTurn({ type: 'toolUseBlock', name: 'calc', toolUseId: 'id-1', input: {} })
.addTurn({ type: 'textBlock', text: 'Done' })
// collectIterator for stream results
const events = await collectIterator(agent.stream('Hi'))
```

- `MockHookProvider` - Records all hook invocations for verification. Pass to `Agent({ hooks: [provider] })`.
- Use `{ includeModelEvents: false }` to exclude model streaming and result events from recordings.
- Access `provider.invocations` to verify hook events fired.

```typescript
// Record and verify hook invocations
const hookProvider = new MockHookProvider({ includeModelEvents: false })
const agent = new Agent({ model, hooks: [hookProvider] })
await agent.invoke('Hi')
expect(hookProvider.invocations[0]).toEqual(new BeforeInvocationEvent({ agent }))
```

- `createMockTool(name, resultFn)` - Creates a mock tool with custom result behavior.
- `createRandomTool(name?)` - Creates a minimal mock tool (use when tool execution doesn't matter).
- `createMockContext(toolUse, agentState?)` - Creates a mock `ToolContext` for testing tool implementations directly (a sketch follows the example below).

```typescript
// Mock tool with custom result
const tool = createMockTool(
'calculator',
() => new ToolResultBlock({ toolUseId: 'id', status: 'success', content: [new TextBlock('42')] })
)
// Minimal tool when execution doesn't matter
const tool = createRandomTool('myTool')
```
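`createMockContext()` targets tool implementations rather than agents. A hedged sketch follows; the `toolUse` shape and the tool's invocation signature are assumptions, so check `tool-helpers.ts`:

```typescript
import { createMockContext } from '../__fixtures__/tool-helpers'

// Hypothetical sketch: exercise a tool implementation directly, without an agent loop.
const context = createMockContext({ name: 'calculator', toolUseId: 'id-1', input: { value: 2 } })
const result = await calculatorTool.invoke(context) // calculatorTool and invoke() are assumptions
expect(result.status).toBe('success')
```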
When to use fixtures vs `FunctionTool` directly:

Use `createMockTool()` or `createRandomTool()` when tools are incidental to the test. Use `FunctionTool` or `tool()` directly only when testing tool-specific behavior.

```typescript
// ✅ Use fixtures when testing agent/hook behavior
const tool = createMockTool('testTool', () => ({
type: 'toolResultBlock',
toolUseId: 'tool-1',
status: 'success' as const,
content: [new TextBlock('Success')],
}))
const agent = new Agent({ model, tools: [tool] })
// ❌ Don't use FunctionTool when tool behavior is irrelevant to the test
const tool = new FunctionTool({ name: 'testTool', description: '...', inputSchema: {...}, callback: ... })
```

- `createMockAgent(data?)` - Creates a minimal mock Agent with messages and state. Use for testing components that need an Agent reference without full agent behavior.

```typescript
const agent = createMockAgent({
messages: [new Message({ role: 'user', content: [new TextBlock('Hi')] })],
state: { key: 'value' },
})
```

- `expectAgentResult(options)` - Creates an asymmetric matcher that validates `AgentResult` structure and values. Reduces deeply nested assertions by providing a clean, readable matcher that combines stop reason, message text, metrics, and traces validation.

```typescript
import { expectAgentResult } from '../__fixtures__/agent-helpers'
// ✅ RECOMMENDED - Clean, readable assertion
expect(result).toEqual(
expectAgentResult({
stopReason: 'endTurn',
messageText: 'Hello, world!',
cycleCount: 1,
traceCount: 1,
})
)
// ✅ With tools and detailed metrics
expect(result).toEqual(
expectAgentResult({
stopReason: 'endTurn',
messageText: 'The answer is 42',
cycleCount: 2,
toolNames: ['calculator'],
traceCount: 2,
usage: { inputTokens: 300, outputTokens: 80, totalTokens: 380 },
})
)
// ✅ For detailed trace structure validation, follow up with specific assertions
expect(result).toEqual(
expectAgentResult({
stopReason: 'endTurn',
messageText: 'Done',
cycleCount: 2,
toolNames: ['calc'],
})
)
// Verify detailed trace structure
expect(result.traces).toEqual([
expect.objectContaining({
name: 'Cycle 1',
children: expect.arrayContaining([
expect.objectContaining({ name: 'stream_messages' }),
expect.objectContaining({ name: 'Tool: calc' }),
]),
}),
expect.objectContaining({
name: 'Cycle 2',
children: expect.arrayContaining([expect.objectContaining({ name: 'stream_messages' })]),
}),
])
// ❌ AVOID - Deeply nested, hard to read
expect(result).toEqual(
expect.objectContaining({
stopReason: 'endTurn',
lastMessage: expect.objectContaining({
role: 'assistant',
content: expect.arrayContaining([expect.objectContaining({ type: 'textBlock', text: 'Hello' })]),
}),
metrics: expectLoopMetrics({ cycleCount: 1 }),
traces: expect.arrayContaining([expect.objectContaining({ name: 'Cycle 1' })]),
})
)
```

Options:
- `stopReason` (required) - Expected stop reason (`'endTurn'`, `'toolUse'`, `'maxTokens'`)
- `messageText` (optional) - Expected text content in the last assistant message's TextBlock. When omitted, only validates that the message exists with role `'assistant'`
- `cycleCount` (required) - Expected number of agent loop cycles
- `traceCount` (optional) - Expected exact number of traces. When omitted, validates that at least one trace exists
- `toolNames` (optional) - Expected tool names that were invoked
- `usage` (optional) - Expected token usage. When omitted, validates shape with `expect.any(Number)`
- `isNode` - Boolean that detects if running in a Node.js environment.
- `isBrowser` - Boolean that detects if running in a browser environment.
Use these for conditional test execution when tests depend on environment-specific features.

```typescript
import { isNode } from '../__fixtures__/environment'
// Skip tests that require Node.js features in browser
describe.skipIf(!isNode)('Node.js specific features', () => {
it('uses environment variables', () => {
expect(process.env.NODE_ENV).toBeDefined()
})
})
```

- `MockSpan` - Implements the OTEL `Span` interface and records all calls (`setAttribute`, `addEvent`, `setStatus`, `end`, `recordException`) for assertion. Use with `vi.mock('@opentelemetry/api')` to intercept tracer span creation.
- Access `mockSpan.calls.setAttribute` etc. to verify recorded calls.
- Use `mockSpan.getAttributeValue(key)` to look up a specific attribute.
- Use `mockSpan.getEvents(name)` to filter events by name.
- `eventAttr(event, key)` - Extracts a string attribute from a mock span event's attributes map.
- `MockMeter` - Implements the OTEL `Meter` interface and records all instrument data points. Use with `vi.spyOn(otelMetrics, 'getMeter').mockReturnValue(mockMeter)` to intercept meter creation.
- Use `mockMeter.getCounter(name)` to retrieve a counter by metric name.
- Use `mockMeter.getHistogram(name)` to retrieve a histogram by metric name.
- Counters and histograms expose `.dataPoints` (an array of `{ value, attributes }`) and `.sum` (the total of all values).

```typescript
import { MockSpan, eventAttr } from '../__fixtures__/mock-span'
// Mock the OTEL API and inject MockSpan
const mockSpan = new MockSpan()
const mockStartSpan = vi.fn().mockReturnValue(mockSpan)
vi.mocked(trace.getTracer).mockReturnValue({ startSpan: mockStartSpan, startActiveSpan: vi.fn() })
// Assert on span attributes and events
expect(mockSpan.getAttributeValue('gen_ai.agent.name')).toBe('test-agent')
expect(mockSpan.getEvents('gen_ai.user.message')).toHaveLength(1)
expect(eventAttr(mockSpan.getEvents('gen_ai.choice')[0]!, 'finish_reason')).toBe('end_turn')
```

```typescript
import { MockMeter } from '../__fixtures__/mock-meter'
// Mock the OTEL API and inject MockMeter
const mockMeter = new MockMeter()
vi.spyOn(otelMetrics, 'getMeter').mockReturnValue(mockMeter)
const m = new Meter()
m.startNewInvocation()
// Assert on collected metric values
expect(mockMeter.getCounter('gen_ai.agent.invocation.count')?.sum).toBe(1)
expect(mockMeter.getHistogram('gen_ai.agent.cycle.duration')?.sum).toBe(2000)
expect(mockMeter.getCounter('gen_ai.agent.tool.call.count')?.dataPoints).toStrictEqual([
{ value: 1, attributes: { 'gen_ai.tool.name': 'search' } },
])
```

- `expectLoopMetrics({ cycleCount, toolNames?, usage? })` - Creates an asymmetric matcher that validates `AgentMetrics` structure and values. When `usage` is provided, asserts exact token counts. When omitted, falls back to shape-level assertions with `expect.any(Number)`.
- `findMetricValue(resourceMetrics, metricName)` - Flattens the OTEL ResourceMetrics → ScopeMetrics → MetricData hierarchy and returns the value of the last data point for the matching metric name. Returns `undefined` if not found.

```typescript
import { expectLoopMetrics } from '../__fixtures__/metrics-helpers'
// Shape-level assertion (no concrete token counts)
expect(result).toEqual(
new AgentResult({
stopReason: 'endTurn',
lastMessage: expect.objectContaining({ role: 'assistant' }),
metrics: expectLoopMetrics({ cycleCount: 1 }),
})
)
// With tool names
expect(result).toEqual(
new AgentResult({
stopReason: 'endTurn',
lastMessage: expect.objectContaining({ role: 'assistant' }),
metrics: expectLoopMetrics({ cycleCount: 2, toolNames: ['calc'] }),
})
)
// With concrete token usage (pair with MockMessageModel usage param)
expect(result).toEqual(
new AgentResult({
stopReason: 'endTurn',
lastMessage: expect.objectContaining({ role: 'assistant' }),
metrics: expectLoopMetrics({
cycleCount: 2,
toolNames: ['calc'],
usage: { inputTokens: 300, outputTokens: 80, totalTokens: 380 },
}),
})
)
```

```typescript
import { findMetricValue } from '../__fixtures__/metrics-helpers'
// Find a counter value from OTEL InMemoryMetricExporter output
const cycleCount = findMetricValue(metricExporter.getMetrics(), 'gen_ai.agent.cycle.count')
expect(cycleCount).toBeGreaterThanOrEqual(1)
// Check a histogram was emitted
const duration = findMetricValue(metrics, 'gen_ai.agent.cycle.duration')
expect(duration).toBeDefined()
```

The SDK is designed to work seamlessly in both Node.js and browser environments. Our test suite validates this by running tests in both environments using Vitest's browser mode with Playwright.
The test suite is organized into three projects:
- unit-node (green): Unit tests running in Node.js environment
- unit-browser (cyan): Same unit tests running in Chromium browser
- integ (magenta): Integration tests running in Node.js
- You MUST write tests that are environment-agnostic unless they depend on Node.js features like the filesystem or environment variables
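For orientation, the three projects might be wired up roughly as below. This is an illustrative sketch only; the actual globs and browser options live in the repo's vitest config and may differ:

```typescript
// Illustrative sketch of a three-project Vitest setup; all globs and options are assumptions.
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    projects: [
      { test: { name: 'unit-node', environment: 'node', include: ['src/**/*.test.ts', 'src/**/*.test.node.ts'] } },
      {
        test: {
          name: 'unit-browser',
          include: ['src/**/*.test.ts', 'src/**/*.test.browser.ts'],
          browser: { enabled: true, provider: 'playwright', instances: [{ browser: 'chromium' }] },
        },
      },
      { test: { name: 'integ', include: ['tests_integ/**/*.test.ts'] } },
    ],
  },
})
```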
Some tests require Node.js-specific features (like `process.env` or the AWS SDK) and should be skipped in browser environments:

```typescript
import { describe, it, expect } from 'vitest'
import { isNode } from '../__fixtures__/environment'
// Tests will run in Node.js, skip in browser
describe.skipIf(!isNode)('Node.js specific features', () => {
it('uses environment variables', () => {
// This test accesses process.env
expect(process.env.NODE_ENV).toBeDefined()
})
})
```

When stubbing environment variables with `vi.stubEnv()`, you do not need to wrap calls in `if (isNode)` conditions. Vitest handles this automatically across environments, and the vitest config has `unstubEnvs: true`, which restores env vars after each test.

```typescript
// ✅ CORRECT - No condition needed
beforeEach(() => {
vi.stubEnv('API_KEY', 'test-key')
})
// ❌ WRONG - Unnecessary condition
beforeEach(() => {
if (isNode) {
vi.stubEnv('API_KEY', 'test-key')
}
})
```

Similarly, you do not need to call `vi.unstubAllEnvs()` in `afterEach`, since the vitest config handles this automatically.
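For reference, that behavior comes from a single config flag (shape per the Vitest docs):

```typescript
// vitest.config.ts excerpt: automatically restore stubbed env vars after each test.
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: { unstubEnvs: true },
})
```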
```bash
npm test # Run unit tests in Node.js
npm run test:browser # Run unit tests in browser (Chromium via Playwright)
npm run test:all # Run all tests in all environments
npm run test:integ # Run integration tests
npm run test:coverage # Run tests with coverage report
```

For detailed command usage, see CONTRIBUTING.md - Testing Instructions.
- Do the tests use relevant helpers from `src/__fixtures__/` as noted in the "Test Fixtures Quick Reference" table above?
- Are recurring code or patterns extracted to functions for better usability/readability?
- Are tests focused on verifying one or two things only?
- Are tests written concisely enough that the bulk of each test is important to the test instead of boilerplate code?
- Are tests asserting on the entire object instead of specific fields?