docs(prd): add PRD for DAG-based concurrent execution by osterman · Pull Request #2194 · cloudposse/atmos

osterman · 2026-03-14T05:04:42Z

what

Added a comprehensive Product Requirements Document (PRD) for implementing DAG-based concurrent execution in Atmos. The PRD proposes a ready-queue scheduler that runs components of all types (Terraform, Packer, Ansible, custom registry) concurrently while respecting the dependency graph and keeping safe defaults: execution stays sequential unless parallelism is enabled via --max-concurrency.

why

Currently Atmos executes components sequentially even when they have no dependencies and could safely run in parallel. For large deployments with dozens or hundreds of components, this serialization is the dominant bottleneck. The PRD establishes architectural principles, justifies ready-queue scheduling through industry research (Terragrunt, Make, Ninja, Bazel, Buck2, and 10+ other tools all use this pattern), and provides a phased rollout plan. The document also addresses critical concerns: output isolation under concurrency via stream injection, integration with legacy built-in component types without requiring migration, and configuration of concurrency defaults through atmos.yaml.

references

Prerequisite: PR #2193, "feat: Add structured component dependencies with cross-type and file monitoring" (dependencies.components format with cross-type dependencies)
Related: PR #1516, "feat: implement dependency order execution for terraform --all flag" (pkg/dependency/ graph package)
Related: PR #2159, "docs: add proposal for concurrent component provisioning" (alternative proposal for concurrent provisioning)
Related PRD: docs/prd/terraform-dependency-order.md

Summary by CodeRabbit

Documentation
- Added a PRD describing DAG-aware concurrent execution with a ready-queue scheduler, phased rollout, and configurable max-concurrency (default 1).
- Describes per-node streaming and logging, output labeling, JSON summaries, and visualization guidance for debugging DAGs.
- Includes examples, integration points for mixed component types, and operational/rollout considerations.

Implement a ready-queue scheduler for component-type-agnostic concurrent execution with bounded concurrency, proper output isolation, and safe defaults.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

github-actions · 2026-03-14T05:05:42Z

Dependency Review

✅ No vulnerabilities or license issues found.

Scanned Files

None

coderabbitai · 2026-03-14T05:22:18Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9a6e0250-d29d-481b-99a1-d9d6ca6dc73d

📥 Commits

Reviewing files that changed from the base of the PR and between 16c4179 and 1adabc7.

📒 Files selected for processing (1)

docs/prd/dag-concurrent-execution.md

📝 Walkthrough

Adds a new PRD describing a DAG-aware concurrent executor: a ready-queue scheduler with a bounded worker pool, a phased rollout, integration points for replacing sequential execution, stream-injectable per-node outputs, and configuration defaults (max_concurrency defaults to 1).

Changes

| Cohort / File(s) | Summary |
|---|---|
| DAG Concurrency PRD `docs/prd/dag-concurrent-execution.md` | Adds a new PRD specifying a ready-queue DAG scheduler architecture (pkg/scheduler/), separation from dependency graph data (pkg/dependency/), phased implementation plan (4 phases), per-node stream injection and logging, concurrency primitives (errgroup, semaphores), CLI/env/config max_concurrency, and integration notes for cross-type components and legacy built-ins. |

Sequence Diagram(s)

sequenceDiagram
    participant CLI
    participant Scheduler
    participant DepGraph as DependencyGraph
    participant WorkerPool
    participant Subprocess

    CLI->>DepGraph: load graph
    CLI->>Scheduler: start execution (max_concurrency)
    Scheduler->>DepGraph: request ready nodes
    DepGraph-->>Scheduler: ready node(s)
    Scheduler->>WorkerPool: dispatch node job
    WorkerPool->>Subprocess: run node (stream-injected stdout/stderr)
    Subprocess-->>WorkerPool: complete (exit + outputs)
    WorkerPool-->>Scheduler: node finished (result, logs)
    Scheduler->>DepGraph: mark node complete, request new ready nodes
    DepGraph-->>Scheduler: new ready node(s) or done
    Scheduler->>CLI: emit JSON summary / per-node logs when complete

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

no-release

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly and directly reflects the main change: adding a PRD document about DAG-based concurrent execution, which matches the changeset perfectly.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/prd/dag-concurrent-execution.md`:
- Around lines 273-319: Fix the Run signature and loop. Change
Scheduler.Run(ctx context.Context) to return error (or a *Result that wraps the
g.Wait() error) so that returning g.Wait() type-checks. Also avoid deadlock by
making every channel operation respect context cancellation: when receiving the
next node from ready, use a select with case <-ctx.Done() that returns
ctx.Err()/g.Wait(); when enqueuing dependents inside the g.Go closure (the
loop that updates inDegree and does ready <- s.graph.GetNode(dep)), use a
ctx-aware send (select with case ready <- node and case <-ctx.Done()) so that
failFast cancellation does not leave goroutines blocked. Apply these changes in
Scheduler.Run, around the ready channel receive and the dependents enqueue logic
inside the g.Go closure, and ensure completed/total bookkeeping is still guarded
by mu.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7f195206-7caf-465b-9592-b65508e033f5

📥 Commits

Reviewing files that changed from the base of the PR and between 76a269b and 16c4179.

📒 Files selected for processing (1)

docs/prd/dag-concurrent-execution.md

codecov · 2026-03-14T05:31:19Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.29%. Comparing base (bbac3f8) to head (1adabc7).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2194      +/-   ##
==========================================
- Coverage   77.29%   77.29%   -0.01%     
==========================================
  Files         960      960              
  Lines       91088    91088              
==========================================
- Hits        70410    70404       -6     
- Misses      16593    16602       +9     
+ Partials     4085     4082       -3

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `77.29% <ø> (-0.01%)` | ⬇️ |

Flags with carried forward coverage won't be shown.
See 4 files with indirect coverage changes.

Address CodeRabbit review: fix Run() to return *Result (not error from g.Wait()), add ctx.Done() select to prevent deadlock on fail-fast cancellation, and use context-aware channel sends in dependent enqueuing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs(prd): add comprehensive PRD for DAG-based concurrent execution

16c4179

osterman requested a review from a team as a code owner March 14, 2026 05:04

github-actions bot added the size/m Medium size PR label Mar 14, 2026

osterman added the no-release Do not create a new release (wait for additional code changes) label Mar 14, 2026

coderabbitai bot requested changes Mar 14, 2026

coderabbitai bot approved these changes Mar 14, 2026

osterman wants to merge 2 commits into main from osterman/dag-scheduler-prd
