Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add late model extension to RNTupleMerger #16080

Merged
merged 9 commits into from
Sep 20, 2024

Conversation

silverweed
Copy link
Contributor

@silverweed silverweed commented Jul 22, 2024

This Pull request:

refactors RNTupleMerger to properly support late model extension.
The "Union" merging mode is added, allowing the merger to late-model-extend the destination to include all the fields of the input ntuples (instead of ignoring unknown fields / complaining when models don't match).
Likewise, the "Strict" merging mode is added that checks that all inputs have the exact same structure.
By default, the old behavior (named "Filter") is used.

To better compare the RNTuples structures, the merging logic is now more properly considering the fields of each input, rather than just the columns as it was previously. This also allows for more descriptive messages to the user if some mismatch is found.

NOTE: the RNTupleMerger::Merge function now returns a RResult instead of throwing exceptions on error.

TODO

  • the new merging modes should be exposed to hadd

Checklist:

  • tested changes locally
  • updated the docs (if necessary)

@silverweed silverweed self-assigned this Jul 22, 2024
@silverweed silverweed requested a review from jblomer as a code owner July 22, 2024 12:00
@silverweed silverweed marked this pull request as draft July 22, 2024 12:00
Copy link

github-actions bot commented Jul 22, 2024

Test Results

    13 files      13 suites   3d 2h 17m 12s ⏱️
 3 029 tests  3 029 ✅ 0 💤 0 ❌
33 856 runs  33 856 ✅ 0 💤 0 ❌

Results for commit edc5d49.

♻️ This comment has been updated with latest results.

@silverweed silverweed force-pushed the ntuple_merge_lmext branch 3 times, most recently from 39ee934 to 3ea010a Compare August 6, 2024 07:36
@silverweed silverweed force-pushed the ntuple_merge_lmext branch 3 times, most recently from e3fa48f to e4f81d5 Compare August 16, 2024 11:51
@silverweed silverweed marked this pull request as ready for review August 22, 2024 11:27
@silverweed silverweed force-pushed the ntuple_merge_lmext branch 2 times, most recently from 51eb509 to 9b203c5 Compare August 22, 2024 11:35
Copy link
Contributor

@jblomer jblomer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great improvement! Some comments.

tree/ntuple/v7/inc/ROOT/RNTupleDescriptor.hxx Outdated Show resolved Hide resolved
tree/ntuple/v7/inc/ROOT/RPageStorage.hxx Outdated Show resolved Hide resolved
tree/ntuple/v7/inc/ROOT/RNTupleMerger.hxx Outdated Show resolved Hide resolved
tree/ntuple/v7/src/RNTupleMerger.cxx Outdated Show resolved Hide resolved
tree/ntuple/v7/src/RNTupleMerger.cxx Outdated Show resolved Hide resolved
tree/ntuple/v7/src/RNTupleMerger.cxx Outdated Show resolved Hide resolved
tree/ntuple/v7/src/RNTupleMerger.cxx Outdated Show resolved Hide resolved
tree/ntuple/v7/src/RNTupleMerger.cxx Show resolved Hide resolved
tree/ntuple/v7/src/RNTupleMerger.cxx Outdated Show resolved Hide resolved
tree/ntuple/v7/test/ntuple_merger.cxx Outdated Show resolved Hide resolved
[ntuple] extract ExtendOutputModel from Merge
... rather than saving one per source.
- grouped most data that is commonly passed to merge helper functions
  into a struct to reduce the number of function params
- added a page allocator to the merger
- removed friendship with RPageStorage, created a public static version of SealPage()
- other minor cleanups
Copy link
Contributor

@jblomer jblomer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

struct RColumnInfo {
// This column name is built as a dot-separated concatenation of the ancestry of
// the columns' parent fields' names plus the index of the column itself.
// e.g. "Muon.pt.x._0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do this in a follow-up PR: I think the representation index needs to be part of the name, too.

@silverweed silverweed merged commit 79e97d8 into root-project:master Sep 20, 2024
17 of 18 checks passed
@silverweed silverweed deleted the ntuple_merge_lmext branch September 20, 2024 07:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants