Skip to content

Add FORM-based source provider#379

Draft
barnaliy wants to merge 7 commits intomainfrom
barnali/provider-based-on-form
Draft

Add FORM-based source provider#379
barnaliy wants to merge 7 commits intomainfrom
barnali/provider-based-on-form

Conversation

@barnaliy
Copy link
Contributor

@barnaliy barnaliy commented Mar 3, 2026

Adds form_source.cpp implementing a FORM-based provider that retrieves data products from FORM storage into Phlex.
This is the source-side complement to form_module.cpp.

@barnaliy
Copy link
Contributor Author

barnaliy commented Mar 3, 2026

@phlexbot format

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

No automatic header-guards fixes were necessary.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

No automatic cmake-format fixes were necessary.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

No automatic markdownlint fixes were necessary.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

Automatic clang-format fixes pushed (commit 388095a).
⚠️ Note: Some issues may require manual review and fixing.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

No automatic jsonnetfmt fixes were necessary.

@barnaliy barnaliy requested review from gemmeren, knoepfel and pcanal March 3, 2026 16:00
@codecov
Copy link

codecov bot commented Mar 3, 2026

Codecov Report

❌ Patch coverage is 0% with 29 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
form/form_source.cpp 0.00% 29 Missing ⚠️

❌ Your patch check has failed because the patch coverage (0.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

@@            Coverage Diff             @@
##             main     #379      +/-   ##
==========================================
- Coverage   85.36%   82.98%   -2.39%     
==========================================
  Files         122      127       +5     
  Lines        2433     3320     +887     
  Branches      389      587     +198     
==========================================
+ Hits         2077     2755     +678     
- Misses        233      354     +121     
- Partials      123      211      +88     
Flag Coverage Δ
unittests 82.98% <0.00%> (-2.39%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
form/form_source.cpp 0.00% <0.00%> (ø)

... and 48 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0435987...3433b24. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@barnaliy barnaliy requested a review from aolivier23 March 3, 2026 16:17
@gemmeren gemmeren requested a review from wwuoneway March 3, 2026 16:29
// product_query fields must be compile-time _id literals, not runtime strings.
// To add new products, add new s.provide() blocks below following the same pattern.

s.provide("provide_i",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this be needed for every data product we read from input? If so, we may want to think about a way to shorten or auto-generate these functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already noted in the FIXME comment — it's a known limitation of Prototype 0.1 that will be addressed as the type list grows.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too I wouldn't mind a git issue

@barnaliy
Copy link
Contributor Author

barnaliy commented Mar 3, 2026

@phlexbot format

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

No automatic header-guards fixes were necessary.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

No automatic jsonnetfmt fixes were necessary.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

No automatic markdownlint fixes were necessary.

Copy link
Contributor

@wwuoneway wwuoneway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a structural concern regarding the separation between form_module and form_source, as both currently communicate directly with Phlex. form_source appears to focus on reading input files, while form_module likely handles the integration with Phlex. Should form_source be a standalone module that communicates directly with Phlex, or would it be more appropriate as an internal component used by form_module?

form::experimental::config::output_item_config input_cfg;
form::experimental::config::tech_setting_config tech_cfg;
input_cfg.addItem("i", input_file, technology);
input_cfg.addItem("j", input_file, technology);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of debating what to hardcode ( here, placeholder i and j, or meaningful names), it would be preferable to make the code automatically detect available items from the input file. This would eliminate hardcoding entirely and make the module truly reusable and scalable. A real input file may contain many possible items, and production code should ideally adapt dynamically instead assuming a fixed set. I would suggest moving such hardcoded i and j to test code, while the production code should be intelligent enough to discover available items automatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current approach is simple and explicit for Prototype 0.1. Auto-discovery is the right long-term goal, but there are two concrete blockers that go beyond the scope of this PR:

  1. product_query fields require compile-time _id literals and do not accept runtime strings, making it impossible to loop over a dynamically discovered list of products
  2. FORM has no enumeration API to discover available items from an input file

I propose keeping the current approach for Prototype 0.1 and tracking auto-discovery as a follow-up issue once these blockers are resolved.

}
return *static_cast<int const*>(pb.data);
})
.output_product(product_query{.creator = "input"_id, .layer = "event"_id, .suffix = "j"_id});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two provider lambdas for i and j are almost identical, with only the product name string differing. Consider extracting the common logic into a factory function or template to reduce code duplication and improve maintainability. This would also make it easier to add more products in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duplication is not an oversight.
The lambda body duplication is minimal and intentional for clarity in Prototype 0.1
The output_product() call cannot be generalized due to product_query compile-time _id literal constraint — already noted in FIXME.

@barnaliy
Copy link
Contributor Author

barnaliy commented Mar 5, 2026

I have a structural concern regarding the separation between form_module and form_source, as both currently communicate directly with Phlex. form_source appears to focus on reading input files, while form_module likely handles the integration with Phlex. Should form_source be a standalone module that communicates directly with Phlex, or would it be more appropriate as an internal component used by form_module?

The strong reason to keep them separate is that Phlex itself makes this distinction at the framework level — form_module uses PHLEX_REGISTER_ALGORITHMS while form_source uses PHLEX_REGISTER_PROVIDERS. These are fundamentally different constructs with different roles in the framework. Merging them would fight the framework design.

I prefer to keep form_module and form_source separate. That said, what do you think @pcanal and @knoepfel ?

@barnaliy
Copy link
Contributor Author

barnaliy commented Mar 6, 2026

Updated form_source.cpp to address a few concerns (@gemmeren @pcanal @knoepfel @wwuoneway @aolivier23 ):

  • Product list now driven by config (products: ['i', 'j']) — consistent with form_module.cpp
  • Uses identifier(runtime_string) from phlex::experimental for runtime-driven product registration

This makes form_source fully config-driven and scalable.

@barnaliy
Copy link
Contributor Author

barnaliy commented Mar 6, 2026

@phlexbot format

@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

Format Fixes Applied

✅ clang-format fixes pushed (commit 456191d)
✅ cmake-format fixes pushed (commit 0f63a64)
✅ YAML formatter fixes pushed (commit e04c746)

⚠️ Note: Some issues may require manual review and fixing.

@barnaliy
Copy link
Contributor Author

barnaliy commented Mar 6, 2026

The YAML file changes visible in this PR are from the @phlexbot format auto-formatting commit — not part of this PR's changes. The only intentional changes are in form/form_source.cpp and form/CMakeLists.txt.

Appears like @phlexbot format is auto-formatting the yaml files.

@barnaliy
Copy link
Contributor Author

barnaliy commented Mar 6, 2026

The YAML file changes visible in this PR are from the @phlexbot format auto-formatting commit — not part of this PR's changes. The only intentional changes are in form/form_source.cpp and form/CMakeLists.txt.

Appears like @phlexbot format is auto-formatting the yaml files.

Following up on my earlier concern: Chris confirmed that @phlexbot format had issues when this PR was first submitted, which were later fixed. Running the format step again this morning correctly updated the files that needed formatting (including the YAML files), which is why the PR now shows additional changes.

@wwuoneway
Copy link
Contributor

If the current version passes all the checks, I'm fine with going with it for now and following up on the known issues in a later update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants