perf: speed up dependency registration in Bom.validate()#1007

Open
inspired-geek wants to merge 2 commits into CycloneDX:main from inspired-geek:fix/1006-quadratic-dependency-registration

@inspired-geek

Description

Bom.validate() ensured every component/service had a Dependency entry by calling register_dependency() once per component. register_dependency() locates an existing entry with a linear next(filter(...)) scan over the dependency collection, so the registration loop is O(n²). Because the JSON/XML outputters always call validate() during serialization, serializing large BOMs (thousands of components) stalls for minutes.

This resolves "already registered" via a set of refs, keeping the loop linear. Observable output is unchanged; test_regression_issue_1006 covers correctness.
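
The difference between the two registration strategies can be sketched with a self-contained toy model. The names `Dep`, `register_linear`, `ensure_all_quadratic`, and `ensure_all_indexed` are hypothetical stand-ins, not the library's API; the sketch only assumes that each dependency entry is identified by a hashable ref.

```python
from dataclasses import dataclass, field


@dataclass
class Dep:
    """Toy stand-in for a dependency entry, identified by its ref."""
    ref: str
    depends_on: list = field(default_factory=list)


def register_linear(deps: list, ref: str) -> None:
    """Add an entry for ref unless one exists; the lookup scans the list."""
    existing = next(filter(lambda d: d.ref == ref, deps), None)
    if existing is None:
        deps.append(Dep(ref))


def ensure_all_quadratic(deps: list, refs: list) -> None:
    # One linear scan per ref: O(n) work repeated n times.
    for ref in refs:
        register_linear(deps, ref)


def ensure_all_indexed(deps: list, refs: list) -> None:
    # Build the set of known refs once, then test membership in O(1).
    known = {d.ref for d in deps}
    for ref in refs:
        if ref not in known:
            deps.append(Dep(ref))
            known.add(ref)


refs = [f"component-{i}" for i in range(500)]
slow = [Dep("component-0")]
fast = [Dep("component-0")]
ensure_all_quadratic(slow, refs)
ensure_all_indexed(fast, refs)

# Both strategies should register the same entries in the same order.
same_result = [d.ref for d in slow] == [d.ref for d in fast]
```

In this toy model the two functions produce identical lists, which mirrors the claim that observable output is unchanged; only the lookup cost differs.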

Benchmark — serializing a BOM of N library components (output_as_string, JSON v1.6):

| N | before | after |
|---|---|---|
| 1000 | 0.14s | 0.05s |
| 2000 | 0.48s | 0.13s |
| 4000 | 1.75s | 0.26s |
| 8000 | 6.70s | 0.63s |

Scaling goes from ~quadratic (≈4× per 2×N) to linear (≈2× per 2×N).
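
As a rough check of that scaling claim, the per-doubling ratios and an approximate growth exponent can be computed directly from the timings in the benchmark table. The helper names `doubling_ratios` and `growth_exponent` are illustrative, and the exponent is a crude two-point estimate rather than a proper fit.

```python
import math

# Timings copied from the benchmark table above (seconds).
sizes = [1000, 2000, 4000, 8000]
before = [0.14, 0.48, 1.75, 6.70]
after = [0.05, 0.13, 0.26, 0.63]


def doubling_ratios(times):
    """Ratio of each timing to the previous one (N doubles between rows)."""
    return [later / earlier for earlier, later in zip(times, times[1:])]


def growth_exponent(ns, times):
    """Estimate k in time ~ N**k from the first and last measurements."""
    return math.log(times[-1] / times[0]) / math.log(ns[-1] / ns[0])


ratios_before = doubling_ratios(before)
ratios_after = doubling_ratios(after)
k_before = growth_exponent(sizes, before)
k_after = growth_exponent(sizes, after)

print([round(r, 2) for r in ratios_before])
print([round(r, 2) for r in ratios_after])
print(round(k_before, 2), round(k_after, 2))
```

The "before" ratios fall between roughly 3.4 and 3.8 per doubling and the "after" ratios between 2.0 and 2.6, giving exponents of about 1.9 and 1.2 respectively.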

Resolves or fixes issue: #1006

AI Tool Disclosure

  • My contribution includes AI-generated content, as disclosed below:
    • AI Tools: Claude Code
    • LLMs and versions: Claude Opus 4.8
    • Prompts: Investigate slow SBOM serialization on large BOMs, identify the O(n²) hotspot, and propose a minimal fix with a regression test. The diagnosis (validate() → register_dependency linear scan) and the set-based fix were produced with assistance; I reviewed the change and ran the benchmarks, regression test, and linters (isort/flake8/mypy) to verify it.

Affirmation

Bom.validate() ensured every component/service had a Dependency entry by
calling register_dependency(), which finds existing entries via a linear
scan over the dependency collection. Called once per component, this made
validation -- and therefore JSON/XML serialization, which always
validates -- O(n^2), stalling for minutes on BOMs with thousands of
components.

Resolve "already registered" through a set of refs instead, keeping the
loop linear. Observable output is unchanged; a regression test covers it.

Serializing an 8000-component BOM drops from ~6.7s to ~0.6s.

Fixes CycloneDX#1006

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Alexey Ivanov <lexa.ivanov@gmail.com>
@inspired-geek inspired-geek requested a review from a team as a code owner June 23, 2026 19:59
@codacy-production

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

🟢 Metrics 0 duplication

Metric Results
Duplication 0

Comment thread tests/test_model_bom.py
def test_regression_issue_1006(self) -> None:
"""regression test for issue #1006

``Bom.validate()`` must register a Dependency entry for the metadata

@jkowalleck jkowalleck Jun 24, 2026

Member

I don't see how the test proves the expected behavior, namely a speed improvement. What am I missing?
Could you please elaborate?

@jkowalleck jkowalleck changed the title from "Fix O(n^2) dependency registration in Bom.validate()" to "perf: speed up dependency registration in Bom.validate()" Jun 24, 2026
@jkowalleck jkowalleck self-requested a review June 24, 2026 09:11
Adds test_regression_issue_1006_scales_linearly: it counts BomRef equality
comparisons performed during Bom.validate() for n and 2n components and
asserts the count grows ~linearly, not quadratically -- a CI-stable
complexity guard that does not rely on wall-clock timing. It fails on the
previous O(n^2) scan (~22k -> ~84k comparisons, ratio ~3.9) and passes on
the indexed implementation (~1.6k -> ~3.7k, ratio ~2.2).

Addresses review feedback on CycloneDX#1007.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Alexey Ivanov <lexa.ivanov@gmail.com>
@inspired-geek

Author

@jkowalleck good question — you're right: test_regression_issue_1006 only guards correctness (that the change doesn't alter which dependencies end up registered). The speedup itself is shown by the benchmark in the PR description, and I deliberately avoided a wall-clock assertion since timing tests are flaky in CI.

To actually guard the complexity, I've pushed test_regression_issue_1006_scales_linearly: it counts BomRef equality comparisons performed during validate() for n and 2n components and asserts the count grows ~linearly rather than ~quadratically — no timing involved, so it's CI-stable.

  • previous O(n²) scan: ~22k → ~84k comparisons (ratio ~3.9) → test fails
  • indexed implementation: ~1.6k → ~3.7k comparisons (ratio ~2.2) → test passes

So it fails on the regression and passes on the fix. Happy to reformulate (e.g. a different counter or threshold) if you'd prefer.
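
The idea of guarding complexity by counting comparisons can be illustrated with a standalone sketch. Everything here is hypothetical (`CountingRef`, both `register_all_*` functions, and the threshold of 3.0); it does not reproduce the actual test or the library's `BomRef`, and the counts it yields differ from the figures quoted above.

```python
class CountingRef:
    """Hypothetical ref type that counts equality comparisons."""
    eq_calls = 0

    def __init__(self, value: str) -> None:
        self.value = value

    def __eq__(self, other: object) -> bool:
        CountingRef.eq_calls += 1
        return isinstance(other, CountingRef) and self.value == other.value

    def __hash__(self) -> int:
        return hash(self.value)


def register_all_scan(refs):
    """Linear scan per ref: about n*(n-1)/2 comparisons for n unique refs."""
    registered = []
    for ref in refs:
        if not any(ref == r for r in registered):
            registered.append(ref)
    return registered


def register_all_indexed(refs):
    """Set membership per ref: __eq__ runs only when hashes match."""
    seen = set()
    registered = []
    for ref in refs:
        if ref not in seen:
            seen.add(ref)
            registered.append(ref)
    return registered


def count_comparisons(register, n):
    """Reset the counter, register n unique refs, return the __eq__ count."""
    CountingRef.eq_calls = 0
    register([CountingRef(f"c{i}") for i in range(n)])
    return CountingRef.eq_calls


def scales_linearly(register, n=200, threshold=3.0):
    """True if doubling n at most roughly doubles the comparison count."""
    small = count_comparisons(register, n)
    large = count_comparisons(register, 2 * n)
    # max(small, n) avoids a zero baseline for the set-based variant.
    return large <= threshold * max(small, n)


scan_ok = scales_linearly(register_all_scan)
indexed_ok = scales_linearly(register_all_indexed)
```

Because the check depends only on operation counts, it gives the same verdict on a slow or loaded CI runner, which is the property the timing-free test relies on.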

Development

Successfully merging this pull request may close these issues.

[PERF] Quadratic (O(N^2)) serialization time for large BOMs - Bom.validate() → register_dependency() linear scan

2 participants