# Agent Steps Guide
Invoke AI agents (Claude, Codex, Gemini, OpenCode, OpenAI-Compatible) in your workflows with structured prompts and response parsing.
## Overview
Agent steps allow you to integrate AI CLI tools into AWF workflows. Instead of writing shell commands, you define prompts as templates that are interpolated with workflow context and executed through provider-specific CLIs.
Benefits:
- Non-interactive: Suitable for CI/CD automation
- Stateless: Multi-turn conversations via state passing between steps
- Structured output: Automatic JSON parsing and token tracking
- Template interpolation: Access workflow context in prompts
## Basic Usage

```yaml
states:
  initial: analyze

  analyze:
    type: agent
    provider: claude
    prompt: "Analyze this code: {{.inputs.code}}"
    on_success: done

  done:
    type: terminal
```

```bash
awf run workflow --input code="$(cat main.py)"
```

## Supported Providers
### Claude (Anthropic)

Requires the `claude` CLI tool installed.
```yaml
analyze:
  type: agent
  provider: claude
  prompt: "Code review: {{.inputs.file_content}}"
  options:
    model: claude-sonnet-4-20250514
  timeout: 120
  on_success: next
```

**Provider-Specific Options:**

- `model`: Claude model identifier (alias like `sonnet` or full name like `claude-sonnet-4-20250514`)
### Codex (OpenAI)

Requires the `codex` CLI tool installed.
```yaml
generate:
  type: agent
  provider: codex
  prompt: "Generate a function to: {{.inputs.requirement}}"
  timeout: 60
  on_success: next
```

**Provider-Specific Options:**

- `model`: Codex model identifier
- `language`: Target programming language
- `quiet`: Suppress progress output (boolean)
### Gemini (Google)

Requires the `gemini` CLI tool installed.
```yaml
summarize:
  type: agent
  provider: gemini
  prompt: "Summarize: {{.inputs.text}}"
  options:
    model: gemini-pro
  timeout: 60
  on_success: next
```

### OpenCode
Requires the `opencode` CLI tool installed.

```yaml
refactor:
  type: agent
  provider: opencode
  prompt: "Refactor this code for readability: {{.inputs.code}}"
  timeout: 120
  on_success: next
```

### OpenAI-Compatible Provider

For any backend that implements the Chat Completions API (OpenAI, Ollama, vLLM, Groq, LM Studio, etc.), use the `openai_compatible` provider. Unlike CLI-based providers, this one sends HTTP requests directly — no CLI tool installation is required.
```yaml
analyze:
  type: agent
  provider: openai_compatible
  prompt: "Analyze: {{.inputs.data}}"
  options:
    base_url: http://localhost:11434/v1
    model: llama3
    api_key: "{{.env.OPENAI_API_KEY}}"
  timeout: 60
  on_success: next
```

**Required Options:**

- `base_url`: Root URL of the API (e.g., `http://localhost:11434/v1`). The provider appends `/chat/completions` automatically.
- `model`: Model identifier (e.g., `llama3`, `gpt-4o`, `mixtral`)

**Optional Options:**

- `api_key`: API key for authentication. Falls back to the `OPENAI_API_KEY` environment variable if not set. Omit for local endpoints that don't require auth (e.g., Ollama).
- `temperature`: Creativity level (0-2)
- `max_completion_tokens`: Maximum response tokens (preferred)
- `max_tokens`: Maximum response tokens (deprecated; use `max_completion_tokens`)
- `top_p`: Nucleus sampling threshold
- `system_prompt`: System message prepended to the conversation (used in `mode: conversation`)
**Token Tracking:** Unlike CLI-based providers, which estimate tokens from output length, `openai_compatible` reports actual token usage from the API response.
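Because the counts come from the API itself, a follow-up step can log them for billing. A minimal sketch, assuming a local Ollama endpoint and the `llama3` model:

```yaml
summarize:
  type: agent
  provider: openai_compatible
  prompt: "Summarize: {{.inputs.text}}"
  options:
    base_url: http://localhost:11434/v1   # assumed local Ollama endpoint
    model: llama3
  on_success: report_usage

report_usage:
  type: step
  # TokensUsed reflects the API's usage field, not an output-length estimate
  command: echo "Actual tokens used: {{.states.summarize.TokensUsed}}"
  on_success: done
```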
**Example backends:**

- Ollama: `base_url: http://localhost:11434/v1`, `model: llama3`
- OpenAI: `base_url: https://api.openai.com/v1`, `model: gpt-4o`
- Groq: `base_url: https://api.groq.com/openai/v1`, `model: mixtral-8x7b-32768`
- vLLM: `base_url: http://localhost:8000/v1`, `model: your-model`
## Prompt Templates
Prompts support full variable interpolation with access to workflow context:
```yaml
review:
  type: agent
  provider: claude
  prompt: |
    Review this code file:

    Path: {{.inputs.file_path}}
    Language: {{.inputs.language}}

    File content:
    {{.inputs.file_content}}

    Focus on:
    - Performance issues
    - Security vulnerabilities
    - Code style violations
  on_success: generate_report
```

**Available Variables:**

- `{{.inputs.*}}` - Workflow input values
- `{{.states.step_name.Output}}` - Previous step raw output
- `{{.states.step_name.Response}}` - Previous step parsed JSON (heuristic)
- `{{.states.step_name.JSON}}` - Parsed JSON from `output_format: json` (explicit)
- `{{.env.VAR_NAME}}` - Environment variables
- `{{.workflow.id}}` - Workflow execution ID
- `{{.workflow.name}}` - Workflow name
See Variable Interpolation Reference for complete details.
## External Prompt Files

Instead of inlining prompts in YAML, you can load prompts from external Markdown files using the `prompt_file` field:
```yaml
analyze:
  type: agent
  provider: claude
  prompt_file: prompts/code_review.md
  timeout: 120
  on_success: done
```

**File: `prompts/code_review.md`**

```markdown
# Code Review Instructions

Analyze the following file for:
- Performance issues
- Security vulnerabilities
- Code style violations

## File Path
{{.inputs.file_path}}

## File Content
{{.inputs.file_content}}

## Language
{{.inputs.language}}
```

**Features:**

- Full Template Interpolation — Same variable access as inline prompts
- Helper Functions — String manipulation directly in templates
- Path Resolution — Relative paths resolve to the workflow directory
- XDG Directory Support — Access system directories via `{{.awf.*}}`
### Mutual Exclusivity

You cannot specify both `prompt` and `prompt_file` on the same agent step:
```yaml
# ❌ Invalid: both prompt and prompt_file
step:
  type: agent
  provider: claude
  prompt: "Do this"
  prompt_file: "prompts/template.md"  # ERROR: only one allowed

# ✅ Valid: prompt only
step:
  type: agent
  provider: claude
  prompt: "Do this"

# ✅ Valid: prompt_file only
step:
  type: agent
  provider: claude
  prompt_file: "prompts/template.md"
```

### Path Resolution

Paths can be:

**Relative to workflow directory:**

```yaml
prompt_file: prompts/analyze.md  # Resolves to <workflow_dir>/prompts/analyze.md
```

**Absolute paths:**

```yaml
prompt_file: /home/user/my-prompts/template.md
```

**Home directory expansion:**

```yaml
prompt_file: ~/my-prompts/template.md  # Expands to user's home directory
```

**XDG prompts directory with local override** — via template interpolation with local-before-global resolution:

```yaml
prompt_file: "{{.awf.prompts_dir}}/analyze.md"
# Checks in order:
# 1. <workflow_dir>/prompts/analyze.md   (local override)
# 2. ~/.config/awf/prompts/analyze.md    (global fallback)
```
### Local-Before-Global Resolution

When using `{{.awf.prompts_dir}}` in `prompt_file`, AWF prioritizes local project files over global ones:

- If a local file exists at `<workflow_dir>/prompts/<suffix>` → use it
- If the local file is missing → fall back to the global `~/.config/awf/prompts/<suffix>`
This enables shared prompts at the global level while allowing projects to override them locally:
```yaml
# Workflow at: ~/myproject/.awf/workflows/review.yaml
analyze:
  type: agent
  provider: claude
  prompt_file: "{{.awf.prompts_dir}}/code_review.md"
  on_success: done
```

Resolution order:

1. Check `~/myproject/.awf/workflows/prompts/code_review.md` (local override)
2. Check `~/.config/awf/prompts/code_review.md` (global shared)
## Template Helper Functions
When interpolating prompt templates, four helper functions are available:
### split

Split a string into an array:

```markdown
## Selected Agents
{{range split .states.select_agents.Output ","}}
- {{trimSpace .}}
{{end}}
```

### join

Join an array into a string:

```markdown
Skills to use: {{join .states.available_skills.Output ", "}}
```

### readFile

Inline file contents (with a 1MB size limit):

```markdown
## Specification
{{readFile .states.get_spec.Output}}
```

### trimSpace

Remove leading/trailing whitespace:

```markdown
Result: {{trimSpace .states.process.Output}}
```

## Example: Multi-File Workflow

**Workflow: `code-review.yaml`**
```yaml
name: code-review
version: "1.0.0"

inputs:
  - name: file_path
    type: string
    required: true
    validation:
      file_exists: true
  - name: focus_areas
    type: string

states:
  initial: read_file

  read_file:
    type: step
    command: cat "{{.inputs.file_path}}"
    on_success: analyze

  analyze:
    type: agent
    provider: claude
    prompt_file: prompts/code_review.md
    timeout: 120
    on_success: done

  done:
    type: terminal
```

**Template: `prompts/code_review.md`**
```markdown
# Code Review

File: `{{.inputs.file_path}}`

Focus on:
{{.inputs.focus_areas}}

## Code to Review
{{.states.read_file.Output}}

Provide:
1. Issues found
2. Suggested fixes
3. Overall assessment
```

**Run:**

```bash
awf run code-review --input file_path=main.py --input focus_areas="Performance and security"
```

## Capturing Responses
Agent responses are automatically captured in the execution state:
| Field | Type | Description |
|---|---|---|
| `{{.states.step_name.Output}}` | string | Raw response text (or cleaned text if `output_format` is set) |
| `{{.states.step_name.Response}}` | object | Parsed JSON response (automatic heuristic) |
| `{{.states.step_name.JSON}}` | object | Parsed JSON from `output_format: json` (explicit, see Output Formatting) |
| `{{.states.step_name.TokensUsed}}` | int | Tokens consumed by this step |
| `{{.states.step_name.ExitCode}}` | int | 0 for success, non-zero for failure |
### Accessing Raw Output
```yaml
report_results:
  type: step
  command: echo "Agent said: {{.states.analyze.Output}}"
  on_success: done
```

### Parsing JSON Responses

If an agent returns valid JSON, it's automatically parsed:

```yaml
# Agent returns: {"issues": ["bug1", "bug2"], "severity": "high"}
process_response:
  type: step
  command: echo "Found {{.states.analyze.Response.issues}} issues"
  on_success: done
```

## Output Formatting

When an agent wraps its output in markdown code fences (common with many LLMs), use `output_format` to automatically strip the fences and optionally validate the content:
```yaml
analyze:
  type: agent
  provider: claude
  prompt: "Return JSON analysis"
  output_format: json
  on_success: process
```

### Available Formats

#### `json` Format

Strips markdown code fences and validates the output as valid JSON. The parsed JSON is accessible via `{{.states.step_name.JSON}}`:
```yaml
analyze:
  type: agent
  provider: claude
  prompt: |
    Analyze the code and return results as JSON:
    {
      "issues": [<list of issues>],
      "severity": "high|medium|low"
    }
  output_format: json
  on_success: process_results

process_results:
  type: step
  command: echo "Severity: {{.states.analyze.JSON.severity}}"
  on_success: done
```

**Behavior:**

- Strips outermost markdown code fences (e.g., ```` ```json … ``` ````)
- Validates the stripped content as valid JSON
- Stores parsed JSON in `{{.states.step_name.JSON}}`
- If validation fails, the step fails with a descriptive error
- Works with both objects and arrays
Example agent output:
The analysis shows the following:
```json
{"issues": ["buffer overflow", "memory leak"], "severity": "high"}
**After processing:**
- `{{.states.analyze.Output}}` = `{"issues": ["buffer overflow", "memory leak"], "severity": "high"}`
- `{{.states.analyze.JSON.issues}}` = `["buffer overflow", "memory leak"]`
- `{{.states.analyze.JSON.severity}}` = `"high"`
#### `text` Format
Strips markdown code fences without JSON validation. Useful for code or plain text output:
```yaml
generate_code:
  type: agent
  provider: claude
  prompt: "Generate a Python function to..."
  output_format: text
  on_success: save_code

save_code:
  type: step
  command: echo "{{.states.generate_code.Output}}" > generated.py
  on_success: done
```

**Behavior:**

- Strips outermost markdown code fences (e.g., ```` ```python … ``` ````)
- Returns the clean text in `{{.states.step_name.Output}}`
- Does not populate `{{.states.step_name.JSON}}`
**Example agent output:**

````text
Here's the function:

```python
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
```
````

**After processing:**
- `{{.states.generate_code.Output}}` = `def fibonacci(n):\n if n <= 1:\n return n\n return fibonacci(n-1) + fibonacci(n-2)`
#### No Format (Default)
Omit `output_format` for backward compatibility. Raw agent output is stored unchanged:
```yaml
analyze:
  type: agent
  provider: claude
  prompt: "Analyze this code"
  on_success: next
```

### Error Handling
When `output_format: json` is specified but the output is invalid JSON:
```yaml
analyze:
  type: agent
  provider: claude
  prompt: "Return valid JSON"
  output_format: json
  timeout: 60
  on_failure: handle_json_error

handle_json_error:
  type: step
  command: echo "JSON parsing failed"
  on_success: done
```

**Error message includes:**

- A clear indication of the JSON validation failure
- The first 200 characters of the malformed output (for debugging)
- Suggestions on how to fix the issue
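Because a model can occasionally emit malformed JSON, `output_format: json` also pairs well with a `retry` block (the same syntax used in the Error Handling section), so the step re-asks before falling through to `on_failure`. A minimal sketch:

```yaml
analyze:
  type: agent
  provider: claude
  prompt: "Return valid JSON"
  output_format: json
  timeout: 60
  retry:
    max_attempts: 3          # re-run the prompt if JSON validation fails
    backoff: exponential
    initial_delay: 2s
  on_failure: handle_json_error
```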
## Multi-Turn Conversations
There are two approaches for multi-turn conversations:
### Chaining Steps (Manual State Passing)
For simple multi-turn workflows, chain agent steps with state passing:
```yaml
name: code-review-conversation
version: "1.0.0"

inputs:
  - name: code
    type: string
    required: true

states:
  initial: initial_review

  initial_review:
    type: agent
    provider: claude
    prompt: |
      Review this code for issues:
      {{.inputs.code}}
    on_success: ask_about_performance

  ask_about_performance:
    type: agent
    provider: claude
    prompt: |
      Based on your previous analysis:
      {{.states.initial_review.Output}}

      Can you elaborate on performance concerns?
    on_success: suggest_improvements

  suggest_improvements:
    type: agent
    provider: claude
    prompt: |
      Based on the previous discussion, suggest 3 specific improvements to:
      {{.inputs.code}}
    on_success: done

  done:
    type: terminal
```

Each step can reference previous agent outputs and build on the conversation without maintaining session state.
### Conversation Mode (Built-In Multi-Turn)

For iterative refinement within a single step, use `mode: conversation` with automatic context window management:
```yaml
refine_code:
  type: agent
  provider: claude
  mode: conversation
  system_prompt: "You are a code reviewer. Iterate until code is approved."
  initial_prompt: |
    Review this code:
    {{.inputs.code}}
  conversation:
    max_turns: 10
    max_context_tokens: 100000
    stop_condition: "response contains 'APPROVED'"
  on_success: done
```

**Key differences:**
- Automatic turn management — No need to manually chain steps
- Context window handling — Old turns are automatically truncated as the token limit is approached
- Stop conditions — Exit the conversation early when a specific condition is met
- Single step — Simpler workflows for iterative refinement
See Conversation Mode Guide for detailed documentation, examples, and best practices.
## Error Handling
Agent steps follow standard error handling:
```yaml
analyze:
  type: agent
  provider: claude
  prompt: "Review: {{.inputs.code}}"
  timeout: 120
  on_success: success_path
  on_failure: error_path
  retry:
    max_attempts: 3
    backoff: exponential
    initial_delay: 2s

success_path:
  type: terminal

error_path:
  type: terminal
  status: failure
```

You can also use the inline error shorthand to avoid defining separate terminal states:
```yaml
analyze:
  type: agent
  provider: claude
  prompt: "Review: {{.inputs.code}}"
  timeout: 120
  on_success: done
  on_failure: {message: "Agent analysis failed", status: 3}

done:
  type: terminal
```

See Workflow Syntax — Inline Error Shorthand for full details.
### Common Error Scenarios
| Error | Cause | Solution |
|---|---|---|
| Provider not found | CLI tool not installed | Install the required CLI (see the provider's install docs) |
| Timeout | Agent response took too long | Increase `timeout` or reduce prompt complexity |
| Invalid provider | Unsupported provider | Use `claude`, `codex`, `gemini`, `opencode`, or `openai_compatible` |
| Command failed | Provider CLI returned an error | Check provider configuration and logs |
### Debugging
Use `--dry-run` to preview resolved prompts without execution:

```bash
awf run workflow --dry-run
# Shows: [DRY RUN] Agent: claude
#        Prompt: <resolved prompt text>
```

## Parallel Agent Execution
Run multiple agents concurrently:
```yaml
parallel_analysis:
  type: parallel
  parallel:
    - claude_review
    - codex_suggest
  strategy: all_succeed
  on_success: aggregate

claude_review:
  type: agent
  provider: claude
  prompt: "Analyze for security: {{.inputs.code}}"

codex_suggest:
  type: agent
  provider: codex
  prompt: "Optimize performance: {{.inputs.code}}"

aggregate:
  type: step
  command: echo "Claude: {{.states.claude_review.Output}}\nCodex: {{.states.codex_suggest.Output}}"
  on_success: done
```

## Token Tracking
Agent steps report token usage, which is useful for cost tracking:
```yaml
analyze:
  type: agent
  provider: claude
  prompt: "Review: {{.inputs.code}}"
  options:
    model: claude-sonnet-4-20250514
  on_success: log_tokens

log_tokens:
  type: step
  command: echo "Tokens used: {{.states.analyze.TokensUsed}}"
  on_success: done
```

**Note:** All agent providers (Claude, Gemini, Codex) report token usage in the `TokensUsed` field; CLI-based providers estimate it from output length, while `openai_compatible` reports actual usage from the API response.
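The token count can also feed a rough cost estimate in a follow-up step. A sketch, assuming a placeholder rate of $3 per million tokens (substitute your provider's actual pricing):

```yaml
estimate_cost:
  type: step
  # the $3-per-million rate below is illustrative only
  command: echo "{{.states.analyze.TokensUsed}}" | awk '{printf "Approx. cost: $%.4f\n", $1 * 3 / 1000000}'
  on_success: done
```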
## Best Practices
### 1. Keep Prompts Focused
Long, complex prompts may hit token limits or timeout. Break into multiple steps:
```yaml
# ❌ Too much
ask_everything:
  type: agent
  provider: claude
  prompt: |
    Review code for security, performance, style, and suggest
    improvements, then estimate refactoring effort...

# ✅ Better
security_review:
  type: agent
  provider: claude
  prompt: "Security review: {{.inputs.code}}"
  on_success: performance_review

performance_review:
  type: agent
  provider: claude
  prompt: |
    After this security review:
    {{.states.security_review.Output}}

    Now analyze performance: {{.inputs.code}}
  on_success: done
```

### 2. Use Consistent Formatting
Request structured output when relevant:
```yaml
analyze:
  type: agent
  provider: claude
  prompt: |
    Analyze code and respond in JSON format:
    {
      "issues": [...],
      "severity": "high|medium|low",
      "estimate_hours": number
    }

    Code: {{.inputs.code}}
  on_success: process_response
```

### 3. Add Timeouts
Always set reasonable timeouts:
```yaml
analyze:
  type: agent
  provider: claude
  prompt: "Review: {{.inputs.code}}"
  timeout: 120  # 2 minutes
  on_success: next
```

### 4. Test with Dry-Run

Preview prompts before running:

```bash
awf run my-workflow --dry-run --input file=/path/to/file
```

### 5. Handle Missing Providers
Test that required providers are installed:
```yaml
states:
  initial: check_claude

  check_claude:
    type: step
    command: which claude
    on_success: analyze
    on_failure: install_claude

  analyze:
    type: agent
    provider: claude
    prompt: "Review: {{.inputs.code}}"
    on_success: done

  install_claude:
    type: terminal
    status: failure

  done:
    type: terminal
```

## See Also
- Workflow Syntax Reference - Complete agent step options
- Template Variables - Available interpolation variables
- Examples - More workflow examples