
[ntuple] Split RNTupleProcessor into RNTupleComposer and RNTupleProcessor#22615

Open
enirolf wants to merge 3 commits into root-project:master from enirolf:ntuple-composer

@enirolf enirolf commented Jun 15, 2026

Contributor

This PR splits the existing RNTupleProcessor into two separate interfaces: the RNTupleComposer, for creating compositions (chains and joins) of RNTuples and loading single entries from them, and the RNTupleProcessor, which is now simply an iteration interface over RNTupleComposer objects.

Rationale

During the implementation of the RNTupleProcessor-based RDataSource, I realized that in that case the RNTupleProcessor itself no longer does any processing, since RDF already takes care of it. From this point of view, I believe it makes sense to separate the orchestration of RNTuple compositions from the actual processing thereof, which is what this PR addresses.

Functionally, the changes are minimal. The bulk of the RNTupleProcessor class is moved to a new class, the RNTupleComposer, with only the iterator remaining. The RNTupleProcessor is now created by passing a reference to an RNTupleComposer.

The RNTupleComposer can now also serve as the backend for other data loading interfaces, most notably RDataSource, as mentioned previously. In a similar vein, this could open up the possibility of merging the RNTupleProcessor into the RNTupleReader, but this requires further investigation.
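To illustrate the separation of concerns described above, here is a minimal sketch that does not depend on ROOT. `ToyComposer`, `ToyProcessor`, `RequestPt` and `SumPt` are hypothetical stand-ins invented for this illustration, not the classes touched by this PR; the inputs are plain in-memory vectors instead of RNTuples. The composer owns the chain and loads single entries, while the processor only iterates and delegates the loading.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for the composer (hypothetical, not the ROOT class):
// it owns a chain of inputs and loads one entry at a time.
class ToyComposer {
public:
   explicit ToyComposer(std::vector<std::vector<float>> chain)
      : fChain(std::move(chain)), fPt(std::make_shared<float>(0.f))
   {
   }

   // Number of entries summed over all chained inputs.
   std::size_t GetNEntries() const
   {
      std::size_t n = 0;
      for (const auto &part : fChain)
         n += part.size();
      return n;
   }

   // Pointer to the value that LoadEntry() overwrites.
   std::shared_ptr<float> RequestPt() const { return fPt; }

   // Load the entry with the given global index by walking the chain.
   void LoadEntry(std::size_t globalIdx)
   {
      for (const auto &part : fChain) {
         if (globalIdx < part.size()) {
            *fPt = part[globalIdx];
            return;
         }
         globalIdx -= part.size();
      }
      assert(false && "entry index out of range");
   }

private:
   std::vector<std::vector<float>> fChain;
   std::shared_ptr<float> fPt;
};

// Toy stand-in for the processor: iteration only, loading is delegated.
class ToyProcessor {
public:
   class Iterator {
   public:
      Iterator(ToyProcessor &proc, std::size_t idx) : fProc(&proc), fIdx(idx) {}
      // Dereferencing loads the current entry and yields its index.
      std::size_t operator*() const
      {
         fProc->fComposer->LoadEntry(fIdx);
         ++fProc->fNProcessed;
         return fIdx;
      }
      Iterator &operator++()
      {
         ++fIdx;
         return *this;
      }
      bool operator!=(const Iterator &other) const { return fIdx != other.fIdx; }

   private:
      ToyProcessor *fProc;
      std::size_t fIdx;
   };

   explicit ToyProcessor(ToyComposer &composer) : fComposer(&composer) {}

   Iterator begin() { return Iterator(*this, 0); }
   Iterator end() { return Iterator(*this, fComposer->GetNEntries()); }
   std::size_t GetNEntriesProcessed() const { return fNProcessed; }

private:
   ToyComposer *fComposer;
   std::size_t fNProcessed = 0;
};

// Sum all pt values of the chain through the iteration interface.
float SumPt(ToyComposer &composer)
{
   auto pt = composer.RequestPt();
   ToyProcessor processor(composer);
   float sum = 0.f;
   for (const auto idx : processor) {
      (void)idx; // the index is not needed here, only the loaded value
      sum += *pt;
   }
   return sum;
}
```

In this toy version, any other consumer (standing in for something like a data source) could call `LoadEntry` on the composer directly without going through the processor, which is the point of the split.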

Old interface

```cpp
std::vector<RNTupleOpenSpec> ntuples = {{"ntuple1", "ntuple1.root"}, {"ntuple2", "ntuple2.root"}};
auto processor = RNTupleProcessor::CreateChain(ntuples);

auto pt = processor->RequestField<float>("pt");

for (const auto idx : *processor) {
   std::cout << "event = " << idx << ", pt = " << *pt << std::endl;
}

std::cout << "processed " << processor->GetNEntriesProcessed() << " events" << std::endl;
```

New interface

```cpp
std::vector<RNTupleOpenSpec> ntuples = {{"ntuple1", "ntuple1.root"}, {"ntuple2", "ntuple2.root"}};
auto composer = RNTupleComposer::CreateChain(ntuples);

auto pt = composer->RequestField<float>("pt");

RNTupleProcessor processor(*composer);

for (const auto idx : processor) {
   std::cout << "event = " << idx << ", pt = " << *pt << std::endl;
}

std::cout << "processed " << processor.GetNEntriesProcessed() << " events" << std::endl;
```

@enirolf enirolf requested review from hahnjo, pcanal and vepadulano June 15, 2026 13:29
@enirolf enirolf self-assigned this Jun 15, 2026
@enirolf enirolf marked this pull request as draft June 15, 2026 13:29
github-actions Bot commented Jun 15, 2026


Test Results

    19 files      19 suites   2d 22h 24m 11s ⏱️
 3 846 tests  3 844 ✅ 0 💤  2 ❌
66 560 runs  66 546 ✅ 0 💤 14 ❌

For more details on these failures, see this check.

Results for commit ddaf87f.

♻️ This comment has been updated with latest results.

enirolf added 3 commits June 16, 2026 09:32
The composition of RNTuples can also serve as the backend for other
data loading interfaces, most notably `RDataSource`, which is fully
separate from the processing interface offered by `RNTupleProcessor`.
It therefore makes sense to split both components into their own class.
...to reflect the separation between `RNTupleProcessor` and
  `RNTupleComposer`.
@enirolf enirolf marked this pull request as ready for review June 16, 2026 07:38
@silverweed

Contributor

I see the value in splitting up the functionality into two interfaces; that said, on the usability side I have two questions:

  • should we try to name them in a way that makes it obvious they are closely related? (For usability reasons, so the user doesn't have to keep in mind two different names but just two variations of the same one - a bit like RHistEngine/RHist);
  • should we also provide a shorthand to create the Processor directly, that uses the Composer internally?
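For the second point, a rough sketch of what such a shorthand might look like, again with toy classes rather than the real ones. `MiniComposer`, `MiniProcessor`, the static `CreateChain` on the processor, `GetComposer`, `ForEachEntry` and `SumPtShorthand` are all assumptions made up for this illustration; nothing here is taken from the PR. The idea is that the processor can either borrow a composer owned by the caller or create and own one internally.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy composer (hypothetical): flattens a chain and loads single entries.
class MiniComposer {
public:
   explicit MiniComposer(const std::vector<std::vector<float>> &chain)
      : fPt(std::make_shared<float>(0.f))
   {
      for (const auto &part : chain)
         fEntries.insert(fEntries.end(), part.begin(), part.end());
   }
   std::size_t GetNEntries() const { return fEntries.size(); }
   std::shared_ptr<float> RequestPt() const { return fPt; }
   void LoadEntry(std::size_t idx) { *fPt = fEntries.at(idx); }

private:
   std::vector<float> fEntries;
   std::shared_ptr<float> fPt;
};

// Toy processor (hypothetical) with both construction paths.
class MiniProcessor {
public:
   // Explicit form: the caller owns the composer.
   explicit MiniProcessor(MiniComposer &composer) : fComposer(&composer) {}

   // Shorthand form: the composer is created and owned internally.
   static MiniProcessor CreateChain(const std::vector<std::vector<float>> &chain)
   {
      auto owned = std::make_unique<MiniComposer>(chain);
      MiniProcessor processor(*owned);
      // Moving the unique_ptr keeps the composer at the same address,
      // so fComposer stays valid.
      processor.fOwned = std::move(owned);
      return processor;
   }

   MiniComposer &GetComposer() { return *fComposer; }
   std::size_t GetNEntriesProcessed() const { return fNProcessed; }

   // Iteration is reduced to a callback to keep the sketch short.
   template <typename F>
   void ForEachEntry(F &&fn)
   {
      for (std::size_t i = 0; i < fComposer->GetNEntries(); ++i) {
         fComposer->LoadEntry(i);
         ++fNProcessed;
         fn(i);
      }
   }

private:
   std::unique_ptr<MiniComposer> fOwned; // empty unless created via the shorthand
   MiniComposer *fComposer;
   std::size_t fNProcessed = 0;
};

// Usage of the shorthand: no composer object is visible to the caller.
float SumPtShorthand(const std::vector<std::vector<float>> &chain)
{
   auto processor = MiniProcessor::CreateChain(chain);
   auto pt = processor.GetComposer().RequestPt();
   float sum = 0.f;
   processor.ForEachEntry([&](std::size_t) { sum += *pt; });
   return sum;
}
```

The trade-off in this sketch is that the shorthand still needs some accessor to reach the composer for requesting fields, so it shortens construction but does not hide the second type entirely.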
