Build and test with CUDA 13.3.0#1409
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughThis PR updates CI/workflows and dependency/environment configs to CUDA 13.3, introduces cuSPARSE 12.7/12.8+ runtime-dispatch for SpMVOp symbols (dlsym + runtime probe), and adjusts lexicographic-search sentinel initializers. ChangesCUDA 13.3 Version Upgrade
cuSPARSE SpMVOp runtime dispatch
Routing lexicographic sentinel fix
🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
This project needs some nontrivial C++ changes to support CUDA 13.3. I'll push those shortly. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/src/pdlp/cusparse_view.cu`:
- Around line 1431-1443: is_cusparse_runtime_spmvop_supported() currently only
checks cusparseSpMVOp and can enable SpMVOp plans even when other required
symbols are missing; update this function to probe the full set of entry points
before returning true by checking
dynamic_load_runtime::function<...>("symbol").has_value() for each required
symbol (at least cusparseSpMVOp_bufferSize, cusparseSpMVOp_createDescr,
cusparseSpMVOp_createPlan, cusparseSpMVOp_destroyDescr,
cusparseSpMVOp_destroyPlan in addition to cusparseSpMVOp), and only return true
if all probes succeed; also ensure callers/wrappers referenced
(cusparse_spmvop_buffer_size,
cusparse_spmvop_descr_wrapper_t::dlsym_create/dlsym_destroy,
cusparse_spmvop_plan_wrapper_t::dlsym_create/dlsym_destroy) will not dereference
functions unless has_value() was confirmed by this probe.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5c5edb31-52bb-4a82-ae54-e282591b0a7d
📒 Files selected for processing (4)
cpp/src/pdlp/cusparse_view.cucpp/src/pdlp/cusparse_view.hppcpp/src/routing/ges/lexicographic_search/brute_force_lexico.cucpp/src/routing/ges/lexicographic_search/lexicographic_search.cu
| #if CUOPT_CUSPARSE_VER_12_7_UP | ||
| #if !CUOPT_CUSPARSE_VER_12_8_UP | ||
| // Headers older than cuSPARSE 12.8 (CUDA Toolkit 13.3) cannot name the newer SpMVOp | ||
| // descriptor types, so do not call the older 12.7 signature against a 12.8+ runtime. | ||
| if (is_cusparse_runtime_12_8_or_newer()) { return false; } | ||
| #endif // !CUOPT_CUSPARSE_VER_12_8_UP | ||
| // Probe the runtime to ensure cusparseSpMVOp is supported. | ||
| static const bool supported = | ||
| dynamic_load_runtime::function<cusparseSpMVOp_sig>("cusparseSpMVOp").has_value(); | ||
| return supported; | ||
| #else | ||
| return false; | ||
| #endif | ||
| #endif // CUOPT_CUSPARSE_VER_12_7_UP |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate occurrences
rg -n "is_cusparse_runtime_spmvop_supported|create_spmv_op_plans|cusparseSpMVOp_bufferSize|cusparseSpMVOp_createDescr|cusparseSpMVOp_createPlan|cusparseSpMVOp_destroyDescr|cusparseSpMVOp_destroyPlan|dlsym_create" cpp/src/pdlp/cusparse_view.cu
# Show surrounding code for the probe and gating
sed -n '1400,1520p' cpp/src/pdlp/cusparse_view.cu
# Show where create_spmv_op_plans uses the SpMVOp helpers
rg -n "create_spmv_op_plans" -n cpp/src/pdlp/cusparse_view.cu
python3 - <<'PY'
import re
path="cpp/src/pdlp/cusparse_view.cu"
with open(path,'r',encoding='utf-8') as f:
lines=f.readlines()
for i,l in enumerate(lines,1):
if "create_spmv_op_plans" in l:
start=max(1,i-40); end=min(len(lines), i+140)
print("\n--- around line",i,"---")
print("".join(f"{j:5d}: {lines[j-1]}" for j in range(start,end+1)))
PYRepository: NVIDIA/cuopt
Length of output: 15226
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate occurrences
rg -n "is_cusparse_runtime_spmvop_supported|create_spmv_op_plans|cusparseSpMVOp_bufferSize|cusparseSpMVOp_createDescr|cusparseSpMVOp_createPlan|cusparseSpMVOp_destroyDescr|cusparseSpMVOp_destroyPlan|dlsym_create" cpp/src/pdlp/cusparse_view.cu
# Show surrounding code for the probe and gating
sed -n '1400,1520p' cpp/src/pdlp/cusparse_view.cu
# Show where create_spmv_op_plans uses the SpMVOp helpers
rg -n "create_spmv_op_plans" -n cpp/src/pdlp/cusparse_view.cu
python3 - <<'PY'
import re
path="cpp/src/pdlp/cusparse_view.cu"
with open(path,'r',encoding='utf-8') as f:
lines=f.readlines()
for i,l in enumerate(lines,1):
if "create_spmv_op_plans" in l:
start=max(1,i-40); end=min(len(lines), i+140)
print("\n--- around line",i,"---")
for j in range(start,end+1):
print(f"{j:5d}: {lines[j-1]}", end='')
PYRepository: NVIDIA/cuopt
Length of output: 15225
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "is_cusparse_runtime_spmvop_supported|create_spmv_op_plans|cusparseSpMVOp_bufferSize|cusparseSpMVOp_createDescr|cusparseSpMVOp_createPlan|cusparseSpMVOp_destroyDescr|cusparseSpMVOp_destroyPlan|dlsym_create|has_value\(\)" cpp/src/pdlp/cusparse_view.cu | head -n 200
sed -n '1410,1495p' cpp/src/pdlp/cusparse_view.cu
# narrow: extract blocks defining cusparse_spmvop_* wrappers and their dlsym_create/destroy
rg -n "cusparse_spmvop|spmvop_plan_wrapper_t|spmvop_descr_wrapper_t|spmvop_buffer_size" cpp/src/pdlp/cusparse_view.cu | head -n 200
# show a bit more around those wrappers if found
python3 - <<'PY'
import re
path="cpp/src/pdlp/cusparse_view.cu"
with open(path,'r',encoding='utf-8') as f:
lines=f.readlines()
need_terms=[
"cusparseSpMVOp_bufferSize",
"cusparseSpMVOp_createDescr",
"cusparseSpMVOp_createPlan",
"cusparseSpMVOp_destroyDescr",
"cusparseSpMVOp_destroyPlan"
]
hits=[]
for i,l in enumerate(lines,1):
if any(t in l for t in need_terms) and "function<" in l:
hits.append(i)
print("function<...> hits:", hits[:40], "count", len(hits))
for i in hits[:20]:
start=max(1,i-15); end=min(len(lines), i+25)
print("\n--- around",i,"---")
for j in range(start,end+1):
print(f"{j:5d}: {lines[j-1]}", end='')
PYRepository: NVIDIA/cuopt
Length of output: 23898
Probe the full SpMVOp symbol set (not just cusparseSpMVOp) before enabling SpMVOp plans
is_cusparse_runtime_spmvop_supported() only checks cusparseSpMVOp, but the SpMVOp plan setup path calls wrappers that dynamically load additional entry points and then immediately dereference (*fn) without has_value() guards (e.g., in cusparse_spmvop_buffer_size, cusparse_spmvop_descr_wrapper_t::dlsym_create/dlsym_destroy, and cusparse_spmvop_plan_wrapper_t::dlsym_create/dlsym_destroy). If any of these symbols are missing due to header/runtime skew, this can crash instead of cleanly falling back.
At minimum, also probe these before returning true:
cusparseSpMVOp_bufferSizecusparseSpMVOp_createDescrcusparseSpMVOp_createPlancusparseSpMVOp_destroyDescrcusparseSpMVOp_destroyPlan
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/src/pdlp/cusparse_view.cu` around lines 1431 - 1443,
is_cusparse_runtime_spmvop_supported() currently only checks cusparseSpMVOp and
can enable SpMVOp plans even when other required symbols are missing; update
this function to probe the full set of entry points before returning true by
checking dynamic_load_runtime::function<...>("symbol").has_value() for each
required symbol (at least cusparseSpMVOp_bufferSize, cusparseSpMVOp_createDescr,
cusparseSpMVOp_createPlan, cusparseSpMVOp_destroyDescr,
cusparseSpMVOp_destroyPlan in addition to cusparseSpMVOp), and only return true
if all probes succeed; also ensure callers/wrappers referenced
(cusparse_spmvop_buffer_size,
cusparse_spmvop_descr_wrapper_t::dlsym_create/dlsym_destroy,
cusparse_spmvop_plan_wrapper_t::dlsym_create/dlsym_destroy) will not dereference
functions unless has_value() was confirmed by this probe.
| #if !CUOPT_CUSPARSE_VER_12_8_UP | ||
| // Headers older than cuSPARSE 12.8 (CUDA Toolkit 13.3) cannot name the newer SpMVOp | ||
| // descriptor types, so do not call the older 12.7 signature against a 12.8+ runtime. | ||
| if (is_cusparse_runtime_12_8_or_newer()) { return false; } |
There was a problem hiding this comment.
Do I understand correctly that this function will return false if compiled with CUDA 13.2 and run on 13.3? And true if compiled with 13.3 and run on 13.2?
There was a problem hiding this comment.
That's my understanding as well. Should we log here that SpMVOp is disabled to avoid performance surprises?
There was a problem hiding this comment.
I think we won't distribute binaries compiled on cuda 13.2 so it seems like a rare case to hit.
There was a problem hiding this comment.
Do I understand correctly that this function will return false if compiled with CUDA 13.2 and run on 13.3? And true if compiled with 13.3 and run on 13.2?
Yes, that is correct.
In theory since we are using dlsym I think we could leave it all up to runtime information, and let builds from 13.2 use features from 13.3 (since we have already hardcoded the signature of the newer library...) but that felt a bit strange to me and I didn't want to propose that as a tested, supported case.
Co-authored-by: Miles Lubin <miles.lubin@gmail.com>
|
What do we think about simply removing support for SpMVOp in cusparse 12.7 and only support it in 12.8 ? The material implications this have will be that the ~15% speedup will be lost on cuda 13.2 but gained back on cuda 13.3 This would make this versionning juggle inexistent and less error prone. |
IMO it's worth a bit of logic for the 15% speedup on cuda 13.2 runtimes. |
|
It seems very dangerous and crumblesome, only for an edge case that might never happen. Our customers that need the fastest, bleeding edge Cuopt will be using 13.3 anyways, right ? |
Some NVIDIA environments including brev still have containers with 12.9, so you may be assuming too much :) |
|
Then this also means having a CI ensuring 13.2 does not break, in addition to 13.3. |
|
I don't think CI coverage is necessary but some manual testing is called for. @bdice is there any way we could test running the 13.3 compiled version on a 13.2 runtime? If not now then blocking before this code is released. |
|
/merge |
|
@mlubin Thanks for the review and help with merging. If you ever do need explicit jobs for 13.2 to be added to your CI matrix, you can do something like this: https://gist.github.com/bdice/ea5b000be16ac4a8b3e19d984048840f |
Contributes to rapidsai/build-planning#286
Notes for Reviewers
This switches GitHub Actions workflows to the
cuda-13.3.0branch from here: rapidsai/shared-workflows#574A future round of PRs will revert that back to
main, once all of RAPIDS is migrated.