Conversation

@josephjclark (Collaborator) commented Feb 1, 2026

Following #4333 and my final comment, I really don't think version history should be saved to the Workflow table. It's too hard to maintain there, and we can't run compaction properly.

This PR forces the provisioner and sandbox merge to look up the actual version history from the table.

Also, for good measure, since all version histories changed recently, I've added a migration to remove all existing history entries: a hard reset on all of them. The CLI won't mind - it's incompatible with those versions anyway.
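Concretely, the lookup now reads from the workflow_versions table. A minimal sketch of the idea (not the actual implementation; history_for/1 and latest_hash/1 are the real entry points, but the module path and column names here are assumptions):

  defmodule Lightning.Workflows.WorkflowVersions do
    import Ecto.Query
    alias Lightning.Repo

    # Sketch: read history from the workflow_versions table instead of
    # a version_history column on workflows. Column names are assumptions.
    def history_for(%{id: workflow_id}) do
      from(v in "workflow_versions",
        where: v.workflow_id == type(^workflow_id, :binary_id),
        order_by: [asc: v.inserted_at],
        select: v.hash
      )
      |> Repo.all()
    end

    # An empty history yields nil here, which callers must handle.
    def latest_hash(workflow), do: workflow |> history_for() |> List.last()
  end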

Tests are AI-generated and I haven't even looked at them yet - I'll take a closer look before opening this up for review.

Changelog deliberately omitted

AI Usage

Please disclose whether you've used AI anywhere in this PR (it's cool, we just
want to know!):

  • I have used Claude Code
  • I have used another model
  • I have not used AI

You can read more details in our
Responsible AI Policy

Pre-submission checklist

  • I have performed an AI review of my code (we recommend using /review
    with Claude Code)
  • I have implemented and tested all related authorization policies.
    (e.g., :owner, :admin, :editor, :viewer)
  • I have updated the changelog.
  • I have ticked a box in "AI usage" in this PR

@github-project-automation github-project-automation bot moved this to New Issues in v2 Feb 1, 2026
@josephjclark (Collaborator Author) commented:
I'm a bit worried about some of the diffs in this PR. I'll have to review more closely with a clear head tomorrow.

@josephjclark josephjclark force-pushed the update-workflow-history branch from e964a9a to 6b267cb on February 9, 2026 15:49
@@ -0,0 +1,33 @@
defmodule Lightning.Repo.Migrations.RemoveVersionHistoryFromWorkflows do
@josephjclark (Collaborator Author) commented:
@midigofrank if I wanted to basically reset all the versions in history, just to clear the slate, then I guess you wouldn't do it as a migration?

Maybe we should drop this. I was going to do it just as a precaution - but if it's too complex we should probably skip it.

I suppose the other way to look at it is a job that removes all versions timestamped before a certain date

A Contributor replied:
I agree we don't need to reset the version history here. The code works without it: history_for/1 and latest_hash/1 now query the table directly and handle empty results fine. But I think we still need this migration for the column drop and constraint removal; since the version_history field was removed from the schema, the column should go too. So I'd suggest keeping this migration but removing these two lines:

  # Clean slate: remove all workflow_versions rows since they were out of sync
  execute "DELETE FROM workflow_versions"

If we want a clean slate later, a separate mix task with a date cutoff sounds like the right approach.
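For what it's worth, a sketch of the trimmed migration (the constraint name and the column's original type are placeholders; match whatever the earlier migration actually created):

  defmodule Lightning.Repo.Migrations.RemoveVersionHistoryFromWorkflows do
    use Ecto.Migration

    def up do
      # Placeholder constraint name: drop whatever constraint the original
      # version_history migration added.
      execute "ALTER TABLE workflows DROP CONSTRAINT IF EXISTS workflows_version_history_check"

      alter table(:workflows) do
        remove :version_history
      end
    end

    def down do
      alter table(:workflows) do
        # Type is an assumption; restore the column as it was before.
        add :version_history, {:array, :string}, default: [], null: false
      end
    end
  end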

@josephjclark josephjclark marked this pull request as ready for review February 10, 2026 14:06
midigofrank and others added 4 commits February 10, 2026 15:25
* Refactor CircleCI to build-then-fan-out pattern

Previously, all 6 CI jobs ran the full build independently in parallel,
causing cache race conditions and redundant compilation that contributed
to flaky tests.

Now uses a single compile job that persists to workspace, with 6
lightweight check jobs that attach the pre-built workspace:

- compile: Full build, deps, PLT - runs once
- check_formatting, check_credo, check_dialyzer, check_sobelow: Fast checks
- test_elixir, test_javascript: Test suites

Benefits:
- Eliminates cache contention between parallel jobs
- Single compilation instead of 6x redundant builds
- Faster feedback on simple failures
- More reliable and debuggable CI runs

Also adds reusable executors and commands for cleaner config.

* Update changelog for CircleCI refactor

Add entry for PR #4378

* Fix workspace persist path error

Remove ~/.mix from persist_to_workspace paths - it's outside the
workspace root and already handled by the deps cache.

* Fix git safe.directory error during checkout

Add git safe.directory config before checkout to avoid "dubious
ownership" error when running as root in the lightning user's home.

* Install Hex/Rebar in downstream jobs

~/.mix isn't part of the workspace, so downstream jobs need to install
Hex and Rebar. This is a fast operation that doesn't require compilation.

* Add Node.js to test_elixir job

Lightning.AdaptorService calls npm during startup, so Node must be
available for Elixir tests to run.

* Fix ownership after cache restore

Cache is restored as root, so _build and deps directories need chown
before lightning user can compile.

* Add libsodium to test_elixir job

The enacl NIF requires libsodium at runtime.

* Fix cache path for Hex/Rebar

~/.mix expands to /root/.mix but Hex is installed to /home/lightning/.mix.
Use explicit path and bump cache version to v7 for clean slate.

* Fix ownership of .mix directory after cache restore

Cache restores /home/lightning/.mix as root. Need to chown all of
/home/lightning, not just the project directory.

* Fix rebar install and consolidate lint checks into single job

- Fix rebar3 installing for root instead of lightning user, which was
  causing all Erlang deps to recompile on every CI run
- Combine formatting, credo, and sobelow into single lint job to reduce
  environment spin-up overhead (~30s per job saved)
- Use when: always to run all lint checks even if one fails, with a
  sentinel file to track failures

* Use date-based cache keys instead of version numbers

* Move changelog entry
* fix .gitignore to preserve test fixture adaptor_registry_cache.json

The blanket ignore rule for adaptor_registry_cache.json was causing the
test fixture copy to be excluded by tools that respect .gitignore (e.g.
rsync), leading to test failures.

* Fix cascade delete race condition causing StaleEntryError on edge retargeting

When a job is deleted and its edge retargeted in the same save, PostgreSQL's
ON DELETE CASCADE removes the edge before Ecto can update it, causing a
StaleEntryError.

Change edge FK constraints from CASCADE to partial SET NULL (PG 15+, see
the sketch after this commit list) so deleted jobs nullify edge references
instead of cascading. Add orphan tracking to both save paths (save_workflow
and provisioner) that precisely captures which edges were affected, then
cleans up only those edges after the save completes - preserving legitimate
NULL target edges.

* Update provisioner test for partial SET NULL FK behavior

The test previously expected StaleEntryError when deleting a job and its
edge in the same import. With partial SET NULL FKs, the operation now
succeeds correctly - the FK nullifies the edge reference instead of
cascade-deleting it, allowing Ecto to process the edge deletion.

* More tests

* Clean up provisioner_test.exs

* Handle source_job edge orphaning when jobs are deleted

Previously only target_job_id references were tracked and cleaned up
when jobs were deleted. Edges where the deleted job was the source
(source_job_id) were nullified but not tracked for cleanup, leaving
orphaned edges with no source. This matters when replacing the first
job in a chain — both the trigger edge target and the downstream edge
source need handling.

- Track source-orphaned edge IDs alongside target-orphaned in both
  save_workflow and provisioner paths
- Broaden cleanup to delete edges missing either target or source
  (without trigger fallback)
- Restore two unrelated tests removed from provisioner "new project"
  block (collections, trigger edge enabled)
- Add 6 new tests covering source-job deletion and retargeting
  scenarios across both save paths

* Add changelog entry for edge retargeting StaleEntryError fix
* reuse existing version_chip component

* update changelog
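For reference, a sketch of the partial SET NULL change described in the cascade-fix commits above. This uses the PostgreSQL 15+ column-list syntax; the table, column, and constraint names are assumptions, and the composite reference requires a unique index on jobs (workflow_id, id):

  defmodule Lightning.Repo.Migrations.EdgesPartialSetNullSketch do
    use Ecto.Migration

    # Sketch only: real table/column/constraint names may differ.
    def up do
      execute """
      ALTER TABLE workflow_edges
        DROP CONSTRAINT IF EXISTS workflow_edges_target_job_id_fkey,
        ADD CONSTRAINT workflow_edges_target_job_id_fkey
          FOREIGN KEY (workflow_id, target_job_id)
          REFERENCES jobs (workflow_id, id)
          ON DELETE SET NULL (target_job_id)
      """
    end

    def down do
      execute """
      ALTER TABLE workflow_edges
        DROP CONSTRAINT IF EXISTS workflow_edges_target_job_id_fkey,
        ADD CONSTRAINT workflow_edges_target_job_id_fkey
          FOREIGN KEY (target_job_id) REFERENCES jobs (id)
          ON DELETE CASCADE
      """
    end
  end

On delete of a job, only target_job_id is set to NULL while workflow_id stays intact, so the save path can find and clean up precisely the orphaned edges afterwards.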
@elias-ba (Contributor) left a comment:
This is looking good, Joe, nicely done 👏🏽. I've added a comment with my thoughts on the migration file.

The provisioning API still returns version_history (via
WorkflowVersions.history_for), but the field was removed from the
Workflow schema so it can no longer be extracted via Map.take on
the struct. Use the query function instead.
@elias-ba (Contributor) commented:
Hey @josephjclark, I saw a few Elixir tests failing and wanted to help fix them. I already pushed a commit to fix the pull test in cli_deploy_test.exs (the expected state was missing version_history).

But there's one remaining failure I think is important for you to be aware of. The provisioning API still returns version_history via WorkflowVersions.history_for at provisioning_json.ex:82-84. The CLI round-trips it back during deploy, and the server rejects it with: "extraneous parameters: version_history"

Since the field was removed from the Workflow schema, the server no longer accepts it. Two options:

  1. Stop returning version_history from the provisioning API (remove lines 82-87 in provisioning_json.ex)
  2. Keep returning it but strip it from incoming deploy payloads (see the sketch below)

What do you think?
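For option 2, the strip could be a small helper in the incoming-params path. A sketch only: the function name is hypothetical, and it assumes workflows arrive as a list of maps in the payload:

  # Hypothetical helper: drop the read-only field from each workflow in an
  # incoming deploy payload before it is cast.
  defp strip_version_history(%{"workflows" => workflows} = params)
       when is_list(workflows) do
    %{params | "workflows" => Enum.map(workflows, &Map.delete(&1, "version_history"))}
  end

  defp strip_version_history(params), do: params

Option 1 is simpler, though, if nothing downstream still reads version_history from the provisioning payload.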
