Status: Accepted
Date: 2025-12-01
Supersedes: N/A
Superseded by: N/A

Context

AWF persists workflow state to JSON files during execution. Workflows can run for minutes and may be interrupted by signals or crashes, and their state files may be accessed concurrently. A partial write would corrupt the state file, making workflow resumption impossible.

Candidates

| Option | Pros | Cons |
|---|---|---|
| Atomic write (temp + rename) | Corruption-proof, OS-guaranteed atomicity on same filesystem | Requires same-filesystem temp, slightly more code |
| Direct write with fsync | Simpler code | Partial writes on crash, no protection against concurrent access |
| SQLite WAL | ACID transactions, concurrent reads | CGO dependency (already present), heavier for simple state |

Decision

Use temp file + rename pattern for all state file writes:

  1. Write to a unique temp file (PID + timestamp suffix) in the same directory as the target
  2. Sync the temp file to disk (fsync)
  3. Rename atomically to the target path
  4. Use file locking around the write for concurrent access protection

Rules:

  • All file writes in infrastructure layer use this pattern
  • Temp file names include PID and timestamp for uniqueness
  • Same-directory temp files to guarantee same-filesystem rename
  • File locking via flock for concurrent JSONStore access

Consequences

What becomes easier:

  • Workflow resume after a crash always finds a complete state file, either the previous version or the new one
  • Concurrent awf status reads never see partial state
  • No corruption recovery code needed

What becomes harder:

  • Slightly more complex write path
  • Must ensure temp files are cleaned up on error paths

Constitution Compliance

| Principle | Status | Justification |
|---|---|---|
| Security First | Compliant | Prevents data corruption, ensures integrity |
| Go Idioms | Compliant | Uses os.Rename which is atomic on POSIX |
| Error Taxonomy | Compliant | Write failures map to exit code 4 (system error) |