Add multi GPU CI job for libcu++ #9435
No actionable comments were generated in the recent review. 🎉
Overview

This PR adds a 2-GPU CI job.

Changes

| Layer / File(s) | Summary |
|---|---|
| GPU configuration schema with name and runner fields<br>`ci/matrix.yaml` | GPU definitions now include `name` and `runner` fields alongside `sm`; a new `h100_2gpu` model and updated `rtxpro6000` runner labels are added. |
| GPU validation and build-time job naming/runner generation<br>`build-workflow.py` | `get_gpu` validates that GPU definitions include the required `name`, `runner`, and `sm` fields; `generate_dispatch_job_name` appends `gpu["name"]` to job display names; `generate_dispatch_job_runner` uses `gpu["runner"]` instead of `gpu["id"]-latest-1`. |
| Multi-GPU Docker device selection at runtime<br>`action.yml` | GPU device selection logic now detects multi-GPU runner labels via regex and requests `--gpus all` for counts > 1; otherwise it keeps the prior `device=${NVIDIA_VISIBLE_DEVICES:-}` behavior. |
| Multi-GPU test matrix entry for `h100_2gpu`<br>`ci/matrix.yaml` | New `pull_request` matrix job entry for `libcudacxx` targeting the `h100_2gpu` GPU configuration. |
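
The device-selection rule summarized above lives in shell inside `action.yml`; the same decision can be sketched in Python for illustration. This is a minimal sketch, not the PR's code: the function name `docker_gpu_args`, the regex, and the example labels are assumptions (the sketch guesses that a runner label ends in `-<gpu count>`, by analogy with the older `-latest-1` suffix).

```python
import re

# Hypothetical pattern: assumes the runner label ends in "-<gpu count>",
# e.g. "...-latest-2". The regex used in the real action.yml may differ.
_GPU_COUNT_RE = re.compile(r"-(\d+)$")


def docker_gpu_args(runner_label: str, visible_devices: str = "") -> list[str]:
    """Return the `docker run` GPU arguments for a given runner label.

    `visible_devices` stands in for the NVIDIA_VISIBLE_DEVICES environment
    variable, defaulting to an empty string as in `${NVIDIA_VISIBLE_DEVICES:-}`.
    """
    match = _GPU_COUNT_RE.search(runner_label)
    # Labels without a recognizable count are treated as single-GPU.
    gpu_count = int(match.group(1)) if match else 1
    if gpu_count > 1:
        # Multi-GPU runner: expose every GPU on the host to the container.
        return ["--gpus", "all"]
    # Single-GPU runner: keep the prior per-device behavior.
    return ["--gpus", f"device={visible_devices}"]
```

With these assumptions, a label ending in `-2` yields `--gpus all`, while a label ending in `-1` (or with no count at all) falls back to the `device=` form.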
😬 CI Workflow Results

🟥 Finished in 1h 44m: Pass: 99%/505 | Total: 4d 12h | Max: 54m 27s | Hits: 98%/655116
We could use multi-GPU CI jobs to test how cccl-rt interacts with multiple GPUs. This PR adds a 2-GPU job to matrix.yaml.
Supporting the 2-GPU runners required changing how a GPU name is translated into a runner label.
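
As a rough illustration of the name-to-label change, the sketch below looks up a GPU definition, checks that it carries `name`, `runner`, and `sm`, and reads the runner label straight from the definition instead of deriving it from the GPU id. Everything here is hypothetical: the `GPUS` dictionary stands in for the GPU section of `ci/matrix.yaml`, the field values are placeholders, and `lookup_gpu`, `runner_for`, and `display_name` only loosely mirror what `build-workflow.py` is described as doing.

```python
REQUIRED_GPU_FIELDS = ("name", "runner", "sm")

# Hypothetical stand-in for the GPU definitions in ci/matrix.yaml.
# All values are placeholders, not the labels used by the real CI.
GPUS = {
    "h100_2gpu": {
        "name": "H100 (2 GPU)",
        "runner": "example-h100-runner-2",
        "sm": "90",
    },
}


def lookup_gpu(gpu_id: str, gpus: dict = GPUS) -> dict:
    """Return a copy of the GPU definition, validating required fields."""
    if gpu_id not in gpus:
        raise KeyError(f"Unknown GPU: {gpu_id}")
    gpu = dict(gpus[gpu_id])
    missing = [field for field in REQUIRED_GPU_FIELDS if field not in gpu]
    if missing:
        raise ValueError(f"GPU '{gpu_id}' is missing fields: {', '.join(missing)}")
    gpu["id"] = gpu_id
    return gpu


def runner_for(gpu: dict) -> str:
    """Use the explicit runner label rather than building one from the id."""
    return gpu["runner"]


def display_name(base_name: str, gpu: dict) -> str:
    """Append the human-readable GPU name to a job's display name."""
    return f"{base_name} {gpu['name']}"
```

Storing the label explicitly means a multi-GPU runner does not have to fit a fixed `<id>-latest-1` naming scheme.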