Update workflow history #4364
base: main
Conversation
priv/repo/migrations/20260201163115_remove_version_history_from_workflows.exs
I'm a bit worried about some of the diffs in this PR. I'll have to review more closely with a clear head tomorrow.
Since the algorithm has changed, we may as well clean the slate.
Force-pushed from e964a9a to 6b267cb
@@ -0,0 +1,33 @@
defmodule Lightning.Repo.Migrations.RemoveVersionHistoryFromWorkflows do
@midigofrank if I wanted to reset all the versions in history, just to clear the slate, then I guess you wouldn't do it as a migration?
Maybe we should drop this. I was going to do it just as a precaution - but if it's too complex we should probably skip it.
I suppose the other way to look at it is as a job that removes all versions timestamped before a certain date.
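Something like this sketch, maybe - the module name, task name, and the inserted_at column are all illustrative guesses, not anything that exists in the repo:

```elixir
# Illustrative only: deletes workflow_versions rows inserted before a cutoff date.
defmodule Mix.Tasks.Lightning.PruneWorkflowVersions do
  use Mix.Task

  import Ecto.Query

  @shortdoc "Deletes workflow_versions rows older than a cutoff date"

  @impl Mix.Task
  def run([date]) do
    # Boot the app so Repo is started
    Mix.Task.run("app.start")

    {:ok, cutoff} = NaiveDateTime.from_iso8601(date <> " 00:00:00")

    # Schemaless delete; assumes the table has a standard inserted_at column
    {count, _} =
      Lightning.Repo.delete_all(
        from(v in "workflow_versions",
          where: v.inserted_at < type(^cutoff, :naive_datetime)
        )
      )

    Mix.shell().info("Deleted #{count} workflow version(s)")
  end
end
```

Invoked as e.g. `mix lightning.prune_workflow_versions 2026-02-01`.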
I agree we don't need to reset the version history here. The code works without it: history_for/1 and latest_hash/1 now query the table directly and handle empty results fine. But I think we still need this migration for the column drop and constraint removal - since the version_history field is removed from the schema, the column should go too. So I'd suggest keeping this migration but removing these two lines:
```elixir
# Clean slate: remove all workflow_versions rows since they were out of sync
execute "DELETE FROM workflow_versions"
```

If we want a clean slate later, a separate mix task with a date cutoff sounds like the right approach.
* Refactor CircleCI to build-then-fan-out pattern

  Previously, all 6 CI jobs ran the full build independently in parallel, causing cache race conditions and redundant compilation that contributed to flaky tests. Now uses a single compile job that persists to workspace, with 6 lightweight check jobs that attach the pre-built workspace:

  - compile: full build, deps, PLT - runs once
  - check_formatting, check_credo, check_dialyzer, check_sobelow: fast checks
  - test_elixir, test_javascript: test suites

  Benefits:

  - Eliminates cache contention between parallel jobs
  - Single compilation instead of 6x redundant builds
  - Faster feedback on simple failures
  - More reliable and debuggable CI runs

  Also adds reusable executors and commands for cleaner config.

* Update changelog for CircleCI refactor

  Add entry for PR #4378.

* Fix workspace persist path error

  Remove ~/.mix from persist_to_workspace paths - it's outside the workspace root and already handled by the deps cache.

* Fix git safe.directory error during checkout

  Add git safe.directory config before checkout to avoid the "dubious ownership" error when running as root in the lightning user's home.

* Install Hex/Rebar in downstream jobs

  ~/.mix isn't part of the workspace, so downstream jobs need to install Hex and Rebar. This is a fast operation that doesn't require compilation.

* Add Node.js to test_elixir job

  Lightning.AdaptorService calls npm during startup, so Node must be available for Elixir tests to run.

* Fix ownership after cache restore

  The cache is restored as root, so the _build and deps directories need a chown before the lightning user can compile.

* Add libsodium to test_elixir job

  The enacl NIF requires libsodium at runtime.

* Fix cache path for Hex/Rebar

  ~/.mix expands to /root/.mix but Hex is installed to /home/lightning/.mix. Use the explicit path and bump the cache version to v7 for a clean slate.

* Fix ownership of .mix directory after cache restore

  The cache restores /home/lightning/.mix as root, so all of /home/lightning needs a chown, not just the project directory.

* Fix rebar install and consolidate lint checks into single job

  - Fix rebar3 installing for root instead of the lightning user, which was causing all Erlang deps to recompile on every CI run
  - Combine formatting, credo, and sobelow into a single lint job to reduce environment spin-up overhead (~30s per job saved)
  - Use when: always to run all lint checks even if one fails, with a sentinel file to track failures

* Use date-based cache keys instead of version numbers

* Move changelog entry
* fix .gitignore to preserve test fixture adaptor_registry_cache.json

  The blanket ignore rule for adaptor_registry_cache.json was causing the test fixture copy to be excluded by tools that respect .gitignore (e.g. rsync), leading to test failures.

* Fix cascade delete race condition causing StaleEntryError on edge retargeting

  When a job is deleted and its edge retargeted in the same save, PostgreSQL's ON DELETE CASCADE removes the edge before Ecto can update it, causing a StaleEntryError. Change edge FK constraints from CASCADE to partial SET NULL (PG 15+) so deleted jobs nullify edge references instead of cascading. Add orphan tracking to both save paths (save_workflow and provisioner) that precisely captures which edges were affected, then cleans up only those edges after the save completes - preserving legitimate NULL target edges.

* Update provisioner test for partial SET NULL FK behavior

  The test previously expected StaleEntryError when deleting a job and its edge in the same import. With partial SET NULL FKs, the operation now succeeds correctly - the FK nullifies the edge reference instead of cascade-deleting it, allowing Ecto to process the edge deletion.

* More tests

* Clean up provisioner_test.exs

* Handle source_job edge orphaning when jobs are deleted

  Previously only target_job_id references were tracked and cleaned up when jobs were deleted. Edges where the deleted job was the source (source_job_id) were nullified but not tracked for cleanup, leaving orphaned edges with no source. This matters when replacing the first job in a chain - both the trigger edge target and the downstream edge source need handling.

  - Track source-orphaned edge IDs alongside target-orphaned in both save_workflow and provisioner paths
  - Broaden cleanup to delete edges missing either target or source (without trigger fallback)
  - Restore two unrelated tests removed from the provisioner "new project" block (collections, trigger edge enabled)
  - Add 6 new tests covering source-job deletion and retargeting scenarios across both save paths

* Add changelog entry for edge retargeting StaleEntryError fix
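For reference, the partial SET NULL change described above might be expressed in a migration roughly like this - the table and constraint names here are assumptions, and the column list on ON DELETE SET NULL requires PostgreSQL 15+ (the same shape would apply to source_job_id):

```elixir
defmodule Lightning.Repo.Migrations.EdgeJobFksSetNull do
  use Ecto.Migration

  def up do
    # PG 15+: the column list limits SET NULL to the referencing column,
    # so deleting a job nullifies the edge instead of cascade-deleting it.
    execute """
    ALTER TABLE workflow_edges
      DROP CONSTRAINT workflow_edges_target_job_id_fkey,
      ADD CONSTRAINT workflow_edges_target_job_id_fkey
        FOREIGN KEY (target_job_id) REFERENCES jobs(id)
        ON DELETE SET NULL (target_job_id)
    """
  end

  def down do
    # Restore the original cascading behaviour.
    execute """
    ALTER TABLE workflow_edges
      DROP CONSTRAINT workflow_edges_target_job_id_fkey,
      ADD CONSTRAINT workflow_edges_target_job_id_fkey
        FOREIGN KEY (target_job_id) REFERENCES jobs(id)
        ON DELETE CASCADE
    """
  end
end
```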
* reuse existing version_chip component
* update changelog
elias-ba left a comment
This is looking good Joe, nicely done 👏🏽. I've added a comment with my thoughts about the migration file.
The provisioning API still returns version_history (via WorkflowVersions.history_for), but the field was removed from the Workflow schema so it can no longer be extracted via Map.take on the struct. Use the query function instead.
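A sketch of the shape of that fix - the surrounding function and field list are hypothetical, and the module path is assumed; only WorkflowVersions.history_for/1 comes from the PR:

```elixir
# Hypothetical call site: the real one lives wherever the provisioning
# payload is built. The point is to source version_history from the
# workflow_versions table rather than the dropped struct field.
alias Lightning.Workflows.WorkflowVersions

defp workflow_attrs(workflow) do
  workflow
  |> Map.take([:id, :name])
  |> Map.put(:version_history, WorkflowVersions.history_for(workflow))
end
```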
Hey @josephjclark, I saw a few Elixir tests failing and wanted to help fix them. I already pushed a commit to fix the pull test in cli_deploy_test.exs (the expected state was missing version_history). But there's one remaining failure I think is important for you to be aware of. The provisioning API still returns version_history. Since the field was removed from the Workflow schema, the server no longer accepts it. Two options:
What do you think?
Following #4333 and my final comment, I really don't think version history should be saved to the Workflow table. It's too hard to maintain there, and we can't run compaction properly.
This PR forces the provisioner and sandbox merge to look up the actual version history from the table.
Also, for good measure, since all version histories changed recently, I've added a migration to remove all existing history entries - a hard reset on all of them. The CLI won't mind; it's incompatible with those versions anyway.
Tests are AI-generated and I haven't even looked yet - I'll take a closer look before opening this up for review.
Changelog deliberately omitted.
AI Usage
Please disclose whether you've used AI anywhere in this PR (it's cool, we just want to know!). You can read more details in our Responsible AI Policy.