perf: speed up dependency registration in Bom.validate()#1007
perf: speed up dependency registration in Bom.validate()#1007inspired-geek wants to merge 2 commits into
Bom.validate()#1007Conversation
Bom.validate() ensured every component/service had a Dependency entry by calling register_dependency(), which finds existing entries via a linear scan over the dependency collection. Called once per component, this made validation -- and therefore JSON/XML serialization, which always validates -- O(n^2), stalling for minutes on BOMs with thousands of components. Resolve "already registered" through a set of refs instead, keeping the loop linear. Observable output is unchanged; a regression test covers it. Serializing an 8000-component BOM drops from ~6.7s to ~0.6s. Fixes CycloneDX#1006 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Alexey Ivanov <lexa.ivanov@gmail.com>
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Duplication | 0 |
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
| def test_regression_issue_1006(self) -> None: | ||
| """regression test for issue #1006 | ||
|
|
||
| ``Bom.validate()`` must register a Dependency entry for the metadata |
There was a problem hiding this comment.
i dont see how the test proves the expected behavior - a speed improvement. what am i missing?
could you please elaborate?
Bom.validate()
Adds test_regression_issue_1006_scales_linearly: it counts BomRef equality comparisons performed during Bom.validate() for n and 2n components and asserts the count grows ~linearly, not quadratically -- a CI-stable complexity guard that does not rely on wall-clock timing. It fails on the previous O(n^2) scan (~22k -> ~84k comparisons, ratio ~3.9) and passes on the indexed implementation (~1.6k -> ~3.7k, ratio ~2.2). Addresses review feedback on CycloneDX#1007. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Alexey Ivanov <lexa.ivanov@gmail.com>
|
@jkowalleck good question — you're right: To actually guard the complexity, I've pushed
So it fails on the regression and passes on the fix. Happy to reformulate (e.g. a different counter or threshold) if you'd prefer. |
Description
Bom.validate()ensured every component/service had aDependencyentry by callingregister_dependency()once per component.register_dependency()locates an existing entry with a linearnext(filter(...))scan over the dependency collection, so the registration loop is O(n²). Because the JSON/XML outputters always callvalidate()during serialization, serializing large BOMs (thousands of components) stalls for minutes.This resolves "already registered" via a
setof refs, keeping the loop linear. Observable output is unchanged;test_regression_issue_1006covers correctness.Benchmark — serializing a BOM of N library components (
output_as_string, JSON v1.6):Scaling goes from ~quadratic (≈4× per 2×N) to linear (≈2× per 2×N).
Resolves or fixes issue: #1006
AI Tool Disclosure
validate()→register_dependencylinear scan) and the set-based fix were produced with assistance; I reviewed the change and ran the benchmarks, regression test, and linters (isort/flake8/mypy) to verify it.Affirmation