# Agent Steps Guide
Invoke AI agents (Claude, Codex, Gemini, OpenCode, OpenAI-Compatible) in your workflows with structured prompts and response parsing.
## Overview
Agent steps allow you to integrate AI CLI tools into AWF workflows. Instead of writing shell commands, you define prompts as templates that are interpolated with workflow context and executed through provider-specific CLIs.
Benefits:
- Non-interactive: Suitable for CI/CD automation
- Stateless: Multi-turn conversations via state passing between steps
- Structured output: Automatic JSON parsing and token tracking
- Template interpolation: Access workflow context in prompts
## Basic Usage

```yaml
states:
  initial: analyze

  analyze:
    type: agent
    provider: claude
    prompt: "Analyze this code: {{.inputs.code}}"
    on_success: done

  done:
    type: terminal
```

```bash
awf run workflow --input code="$(cat main.py)"
```

## Supported Providers
### Claude (Anthropic)

Requires the `claude` CLI tool installed.
```yaml
analyze:
  type: agent
  provider: claude
  prompt: "Code review: {{.inputs.file_content}}"
  options:
    model: claude-sonnet-4-20250514
  timeout: 120
  on_success: next
```

**Provider-Specific Options:**

- `model`: Claude model identifier (alias like `sonnet` or full name like `claude-sonnet-4-20250514`)
### Codex (OpenAI)

Requires the `codex` CLI tool installed.
```yaml
generate:
  type: agent
  provider: codex
  prompt: "Generate a function to: {{.inputs.requirement}}"
  timeout: 60
  on_success: next
```

**Provider-Specific Options:**

- `model`: Codex model identifier
- `language`: Target programming language
- `quiet`: Suppress progress output (boolean)
### Gemini (Google)

Requires the `gemini` CLI tool installed.
```yaml
summarize:
  type: agent
  provider: gemini
  prompt: "Summarize: {{.inputs.text}}"
  options:
    model: gemini-pro
  timeout: 60
  on_success: next
```

### OpenCode
Requires the `opencode` CLI tool installed.

```yaml
refactor:
  type: agent
  provider: opencode
  prompt: "Refactor this code for readability: {{.inputs.code}}"
  timeout: 120
  on_success: next
```

### OpenAI-Compatible Provider

For any backend that implements the Chat Completions API (OpenAI, Ollama, vLLM, Groq, LM Studio, etc.), use the `openai_compatible` provider. Unlike CLI-based providers, this one sends HTTP requests directly — no CLI tool installation is required.
```yaml
analyze:
  type: agent
  provider: openai_compatible
  prompt: "Analyze: {{.inputs.data}}"
  options:
    base_url: http://localhost:11434/v1
    model: llama3
    api_key: "{{.env.OPENAI_API_KEY}}"
  timeout: 60
  on_success: next
```

**Required Options:**

- `base_url`: Root URL of the API (e.g., `http://localhost:11434/v1`). The provider appends `/chat/completions` automatically.
- `model`: Model identifier (e.g., `llama3`, `gpt-4o`, `mixtral`)

**Optional Options:**

- `api_key`: API key for authentication. Falls back to the `OPENAI_API_KEY` environment variable if not set. Omit for local endpoints that don't require auth (e.g., Ollama).
- `temperature`: Creativity level (0-2)
- `max_completion_tokens`: Maximum response tokens (preferred)
- `max_tokens`: Maximum response tokens (deprecated; use `max_completion_tokens`)
- `top_p`: Nucleus sampling threshold
- `system_prompt`: System message prepended to the conversation (used in `mode: conversation`)
**Token Tracking:** Unlike CLI-based providers, which estimate tokens from output length, `openai_compatible` reports actual token usage from the API response.
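Because the counts come from the API itself, a follow-up step can log them for billing. A minimal sketch, assuming a local Ollama endpoint and the `llama3` model:

```yaml
summarize:
  type: agent
  provider: openai_compatible
  prompt: "Summarize: {{.inputs.text}}"
  options:
    base_url: http://localhost:11434/v1   # assumed local Ollama endpoint
    model: llama3
  on_success: report_usage

report_usage:
  type: step
  # TokensUsed reflects the API's usage field, not an output-length estimate
  command: echo "Actual tokens used: {{.states.summarize.TokensUsed}}"
  on_success: done
```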
**Example backends:**

- Ollama: `base_url: http://localhost:11434/v1`, `model: llama3`
- OpenAI: `base_url: https://api.openai.com/v1`, `model: gpt-4o`
- Groq: `base_url: https://api.groq.com/openai/v1`, `model: mixtral-8x7b-32768`
- vLLM: `base_url: http://localhost:8000/v1`, `model: your-model`
## Prompt Templates
Prompts support full variable interpolation with access to workflow context:
```yaml
review:
  type: agent
  provider: claude
  prompt: |
    Review this code file:

    Path: {{.inputs.file_path}}
    Language: {{.inputs.language}}

    File content:
    {{.inputs.file_content}}

    Focus on:
    - Performance issues
    - Security vulnerabilities
    - Code style violations
  on_success: generate_report
```

**Available Variables:**

- `{{.inputs.*}}` - Workflow input values
- `{{.states.step_name.Output}}` - Previous step raw output
- `{{.states.step_name.Response}}` - Previous step parsed JSON (heuristic)
- `{{.states.step_name.JSON}}` - Parsed JSON from `output_format: json` (explicit)
- `{{.env.VAR_NAME}}` - Environment variables
- `{{.workflow.id}}` - Workflow execution ID
- `{{.workflow.name}}` - Workflow name
See Variable Interpolation Reference for complete details.
## External Prompt Files

Instead of inlining prompts in YAML, you can load prompts from external Markdown files using the `prompt_file` field:
```yaml
analyze:
  type: agent
  provider: claude
  prompt_file: prompts/code_review.md
  timeout: 120
  on_success: done
```

**File: `prompts/code_review.md`**

```markdown
# Code Review Instructions

Analyze the following file for:
- Performance issues
- Security vulnerabilities
- Code style violations

## File Path
{{.inputs.file_path}}

## File Content
{{.inputs.file_content}}

## Language
{{.inputs.language}}
```

**Features:**

- Full Template Interpolation — Same variable access as inline prompts
- Helper Functions — String manipulation directly in templates
- Path Resolution — Relative paths resolve to the workflow directory
- XDG Directory Support — Access system directories via `{{.awf.*}}`
### Mutual Exclusivity

You cannot specify both `prompt` and `prompt_file` on the same agent step:
```yaml
# ❌ Invalid: both prompt and prompt_file
step:
  type: agent
  provider: claude
  prompt: "Do this"
  prompt_file: "prompts/template.md"  # ERROR: only one allowed

# ✅ Valid: prompt only
step:
  type: agent
  provider: claude
  prompt: "Do this"

# ✅ Valid: prompt_file only
step:
  type: agent
  provider: claude
  prompt_file: "prompts/template.md"
```

### Path Resolution

Paths can be:

**Relative to workflow directory:**

```yaml
prompt_file: prompts/analyze.md  # Resolves to <workflow_dir>/prompts/analyze.md
```

**Absolute paths:**

```yaml
prompt_file: /home/user/my-prompts/template.md
```

**Home directory expansion:**

```yaml
prompt_file: ~/my-prompts/template.md  # Expands to user's home directory
```

**XDG prompts directory with local override** — via template interpolation with local-before-global resolution:

```yaml
prompt_file: "{{.awf.prompts_dir}}/analyze.md"
# Checks in order:
# 1. <workflow_dir>/prompts/analyze.md   (local override)
# 2. ~/.config/awf/prompts/analyze.md    (global fallback)
```
### Local-Before-Global Resolution

When using `{{.awf.prompts_dir}}` in `prompt_file`, AWF prioritizes local project files over global ones:

- If a local file exists at `<workflow_dir>/prompts/<suffix>` → use it
- If the local file is missing → fall back to the global `~/.config/awf/prompts/<suffix>`
This enables shared prompts at the global level while allowing projects to override them locally:
```yaml
# Workflow at: ~/myproject/.awf/workflows/review.yaml
analyze:
  type: agent
  provider: claude
  prompt_file: "{{.awf.prompts_dir}}/code_review.md"
  on_success: done
```

Resolution order:

1. Check `~/myproject/.awf/workflows/prompts/code_review.md` (local override)
2. Check `~/.config/awf/prompts/code_review.md` (global shared)
## Template Helper Functions
When interpolating prompt templates, four helper functions are available:
### split

Split a string into an array:

```markdown
## Selected Agents
{{range split .states.select_agents.Output ","}}
- {{trimSpace .}}
{{end}}
```

### join

Join an array into a string:

```markdown
Skills to use: {{join .states.available_skills.Output ", "}}
```

### readFile

Inline file contents (with a 1MB size limit):

```markdown
## Specification
{{readFile .states.get_spec.Output}}
```

### trimSpace

Remove leading/trailing whitespace:

```markdown
Result: {{trimSpace .states.process.Output}}
```

## Example: Multi-File Workflow

**Workflow: `code-review.yaml`**
```yaml
name: code-review
version: "1.0.0"

inputs:
  - name: file_path
    type: string
    required: true
    validation:
      file_exists: true
  - name: focus_areas
    type: string

states:
  initial: read_file

  read_file:
    type: step
    command: cat "{{.inputs.file_path}}"
    on_success: analyze

  analyze:
    type: agent
    provider: claude
    prompt_file: prompts/code_review.md
    timeout: 120
    on_success: done

  done:
    type: terminal
```

**Template: `prompts/code_review.md`**
```markdown
# Code Review

File: `{{.inputs.file_path}}`

Focus on:
{{.inputs.focus_areas}}

## Code to Review
{{.states.read_file.Output}}

Provide:
1. Issues found
2. Suggested fixes
3. Overall assessment
```

**Run:**

```bash
awf run code-review --input file_path=main.py --input focus_areas="Performance and security"
```

## Capturing Responses
Agent responses are automatically captured in the execution state:
| Field | Type | Description |
|---|---|---|
| `{{.states.step_name.Output}}` | string | Raw response text (or cleaned text if `output_format` is set) |
| `{{.states.step_name.Response}}` | object | Parsed JSON response (automatic heuristic) |
| `{{.states.step_name.JSON}}` | object | Parsed JSON from `output_format: json` (explicit, see Output Formatting) |
| `{{.states.step_name.TokensUsed}}` | int | Tokens consumed by this step |
| `{{.states.step_name.ExitCode}}` | int | 0 for success, non-zero for failure |
### Accessing Raw Output
```yaml
report_results:
  type: step
  command: echo "Agent said: {{.states.analyze.Output}}"
  on_success: done
```

### Parsing JSON Responses

If an agent returns valid JSON, it's automatically parsed:

```yaml
# Agent returns: {"issues": ["bug1", "bug2"], "severity": "high"}
process_response:
  type: step
  command: echo "Found {{.states.analyze.Response.issues}} issues"
  on_success: done
```

## Output Formatting

When an agent wraps its output in markdown code fences (common with many LLMs), use `output_format` to automatically strip the fences and optionally validate the content:
```yaml
analyze:
  type: agent
  provider: claude
  prompt: "Return JSON analysis"
  output_format: json
  on_success: process
```

### Available Formats

#### `json` Format

Strips markdown code fences and validates the output as valid JSON. The parsed JSON is accessible via `{{.states.step_name.JSON}}`:
```yaml
analyze:
  type: agent
  provider: claude
  prompt: |
    Analyze the code and return results as JSON:
    {
      "issues": [<list of issues>],
      "severity": "high|medium|low"
    }
  output_format: json
  on_success: process_results

process_results:
  type: step
  command: echo "Severity: {{.states.analyze.JSON.severity}}"
  on_success: done
```

**Behavior:**

- Strips outermost markdown code fences (e.g., ```` ```json … ``` ````)
- Validates the stripped content as valid JSON
- Stores parsed JSON in `{{.states.step_name.JSON}}`
- If validation fails, the step fails with a descriptive error
- Works with both objects and arrays
Example agent output:
The analysis shows the following:
```json
{"issues": ["buffer overflow", "memory leak"], "severity": "high"}
**After processing:**
- `{{.states.analyze.Output}}` = `{"issues": ["buffer overflow", "memory leak"], "severity": "high"}`
- `{{.states.analyze.JSON.issues}}` = `["buffer overflow", "memory leak"]`
- `{{.states.analyze.JSON.severity}}` = `"high"`
#### `text` Format
Strips markdown code fences without JSON validation. Useful for code or plain text output:
```yaml
generate_code:
  type: agent
  provider: claude
  prompt: "Generate a Python function to..."
  output_format: text
  on_success: save_code

save_code:
  type: step
  command: echo "{{.states.generate_code.Output}}" > generated.py
  on_success: done
```

**Behavior:**

- Strips outermost markdown code fences (e.g., ```` ```python … ``` ````)
- Returns the clean text in `{{.states.step_name.Output}}`
- Does not populate `{{.states.step_name.JSON}}`
**Example agent output:**

````text
Here's the function:

```python
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
```
````

**After processing:**
- `{{.states.generate_code.Output}}` = `def fibonacci(n):\n if n <= 1:\n return n\n return fibonacci(n-1) + fibonacci(n-2)`
#### No Format (Default)
Omit `output_format` for backward compatibility. Raw agent output is stored unchanged:
```yaml
analyze:
  type: agent
  provider: claude
  prompt: "Analyze this code"
  on_success: next
```

### Error Handling
When `output_format: json` is specified but the output is invalid JSON:
```yaml
analyze:
  type: agent
  provider: claude
  prompt: "Return valid JSON"
  output_format: json
  timeout: 60
  on_failure: handle_json_error

handle_json_error:
  type: step
  command: echo "JSON parsing failed"
  on_success: done
```

**Error message includes:**

- A clear indication of the JSON validation failure
- The first 200 characters of the malformed output (for debugging)
- Suggestions on how to fix the issue
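Because a model can occasionally emit malformed JSON, `output_format: json` also pairs well with a `retry` block (the same syntax used in the Error Handling section), so the step re-asks before falling through to `on_failure`. A minimal sketch:

```yaml
analyze:
  type: agent
  provider: claude
  prompt: "Return valid JSON"
  output_format: json
  timeout: 60
  retry:
    max_attempts: 3          # re-run the prompt if JSON validation fails
    backoff: exponential
    initial_delay: 2s
  on_failure: handle_json_error
```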
## Multi-Turn Conversations
There are two approaches for multi-turn conversations:
### Chaining Steps (Manual State Passing)
For simple multi-turn workflows, chain agent steps with state passing:
```yaml
name: code-review-conversation
version: "1.0.0"

inputs:
  - name: code
    type: string
    required: true

states:
  initial: initial_review

  initial_review:
    type: agent
    provider: claude
    prompt: |
      Review this code for issues:
      {{.inputs.code}}
    on_success: ask_about_performance

  ask_about_performance:
    type: agent
    provider: claude
    prompt: |
      Based on your previous analysis:
      {{.states.initial_review.Output}}

      Can you elaborate on performance concerns?
    on_success: suggest_improvements

  suggest_improvements:
    type: agent
    provider: claude
    prompt: |
      Based on the previous discussion, suggest 3 specific improvements to:
      {{.inputs.code}}
    on_success: done

  done:
    type: terminal
```

Each step can reference previous agent outputs and build on the conversation without maintaining session state.
### Conversation Mode (Built-In Multi-Turn)

For iterative refinement within a single step, use `mode: conversation` with automatic context window management:
```yaml
refine_code:
  type: agent
  provider: claude
  mode: conversation
  system_prompt: "You are a code reviewer. Iterate until code is approved."
  initial_prompt: |
    Review this code:
    {{.inputs.code}}
  conversation:
    max_turns: 10
    max_context_tokens: 100000
    stop_condition: "response contains 'APPROVED'"
  on_success: done
```

**Key differences:**
- Automatic turn management — No need to manually chain steps
- Context window handling — Old turns are automatically truncated as the token limit is approached
- Stop conditions — Exit the conversation early when a specific condition is met
- Single step — Simpler workflows for iterative refinement
See Conversation Mode Guide for detailed documentation, examples, and best practices.
## Error Handling
Agent steps follow standard error handling:
```yaml
analyze:
  type: agent
  provider: claude
  prompt: "Review: {{.inputs.code}}"
  timeout: 120
  on_success: success_path
  on_failure: error_path
  retry:
    max_attempts: 3
    backoff: exponential
    initial_delay: 2s

success_path:
  type: terminal

error_path:
  type: terminal
  status: failure
```

You can also use the inline error shorthand to avoid defining separate terminal states:
```yaml
analyze:
  type: agent
  provider: claude
  prompt: "Review: {{.inputs.code}}"
  timeout: 120
  on_success: done
  on_failure: {message: "Agent analysis failed", status: 3}

done:
  type: terminal
```

See Workflow Syntax — Inline Error Shorthand for full details.
### Common Error Scenarios
| Error | Cause | Solution |
|---|---|---|
| Provider not found | CLI tool not installed | Install the required CLI (see the provider's install docs) |
| Timeout | Agent response took too long | Increase `timeout` or reduce prompt complexity |
| Invalid provider | Unsupported provider | Use `claude`, `codex`, `gemini`, `opencode`, or `openai_compatible` |
| Command failed | Provider CLI returned an error | Check provider configuration and logs |
### Debugging
Use `--dry-run` to preview resolved prompts without execution:

```bash
awf run workflow --dry-run
# Shows: [DRY RUN] Agent: claude
#        Prompt: <resolved prompt text>
```

## Parallel Agent Execution
Run multiple agents concurrently:
```yaml
parallel_analysis:
  type: parallel
  parallel:
    - claude_review
    - codex_suggest
  strategy: all_succeed
  on_success: aggregate

claude_review:
  type: agent
  provider: claude
  prompt: "Analyze for security: {{.inputs.code}}"

codex_suggest:
  type: agent
  provider: codex
  prompt: "Optimize performance: {{.inputs.code}}"

aggregate:
  type: step
  command: echo "Claude: {{.states.claude_review.Output}}\nCodex: {{.states.codex_suggest.Output}}"
  on_success: done
```

## Token Tracking
Agent steps report token usage, which is useful for cost tracking:
```yaml
analyze:
  type: agent
  provider: claude
  prompt: "Review: {{.inputs.code}}"
  options:
    model: claude-sonnet-4-20250514
  on_success: log_tokens

log_tokens:
  type: step
  command: echo "Tokens used: {{.states.analyze.TokensUsed}}"
  on_success: done
```

**Note:** All agent providers (Claude, Gemini, Codex) report token usage in the `TokensUsed` field; CLI-based providers estimate it from output length, while `openai_compatible` reports actual usage from the API response.
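The token count can also feed a rough cost estimate in a follow-up step. A sketch, assuming a placeholder rate of $3 per million tokens (substitute your provider's actual pricing):

```yaml
estimate_cost:
  type: step
  # the $3-per-million rate below is illustrative only
  command: echo "{{.states.analyze.TokensUsed}}" | awk '{printf "Approx. cost: $%.4f\n", $1 * 3 / 1000000}'
  on_success: done
```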
## Best Practices
### 1. Keep Prompts Focused
Long, complex prompts may hit token limits or timeout. Break into multiple steps:
```yaml
# ❌ Too much
ask_everything:
  type: agent
  provider: claude
  prompt: |
    Review code for security, performance, style, and suggest
    improvements, then estimate refactoring effort...

# ✅ Better
security_review:
  type: agent
  provider: claude
  prompt: "Security review: {{.inputs.code}}"
  on_success: performance_review

performance_review:
  type: agent
  provider: claude
  prompt: |
    After this security review:
    {{.states.security_review.Output}}

    Now analyze performance: {{.inputs.code}}
  on_success: done
```

### 2. Use Consistent Formatting
Request structured output when relevant:
```yaml
analyze:
  type: agent
  provider: claude
  prompt: |
    Analyze code and respond in JSON format:
    {
      "issues": [...],
      "severity": "high|medium|low",
      "estimate_hours": number
    }

    Code: {{.inputs.code}}
  on_success: process_response
```

### 3. Add Timeouts
Always set reasonable timeouts:
```yaml
analyze:
  type: agent
  provider: claude
  prompt: "Review: {{.inputs.code}}"
  timeout: 120  # 2 minutes
  on_success: next
```

### 4. Test with Dry-Run

Preview prompts before running:

```bash
awf run my-workflow --dry-run --input file=/path/to/file
```

### 5. Handle Missing Providers
Test that required providers are installed:
```yaml
states:
  initial: check_claude

  check_claude:
    type: step
    command: which claude
    on_success: analyze
    on_failure: install_claude

  analyze:
    type: agent
    provider: claude
    prompt: "Review: {{.inputs.code}}"
    on_success: done

  install_claude:
    type: terminal
    status: failure

  done:
    type: terminal
```

## See Also
- Workflow Syntax Reference - Complete agent step options
- Template Variables - Available interpolation variables
- Examples - More workflow examples