Add multi GPU CI job for libcu++ by pciolkosz · Pull Request #9435 · NVIDIA/cccl

pciolkosz · 2026-06-12T21:43:49Z

We could use multi-GPU CI jobs to test the interactions of cccl-rt with multiple GPUs. This PR adds a 2-GPU job to `ci/matrix.yaml`.

This also required updating how GPU names are translated into runner labels, so that the 2-GPU runners are supported.

coderabbitai · 2026-06-12T21:49:19Z

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between 9e1588a and 5b44eda.

📒 Files selected for processing (3)

.github/actions/workflow-build/build-workflow.py
.github/actions/workflow-run-job-linux/action.yml
ci/matrix.yaml

Note: CodeRabbit is enabled on this repository as a convenience for maintainers
and contributors. Use your best judgment when considering its review comments and
suggestions — a suggested change may be inadequate, unnecessary, or safe to ignore.
Contributors are not expected to address every comment. Human reviews are what
ultimately matter for merging.

Overview

This PR adds a 2-GPU CI job to ci/matrix.yaml for the libcudacxx project to test interactions with multiple GPUs. It also updates the GPU configuration handling in build and run workflow files to properly support multi-GPU runners.

Changes

`.github/actions/workflow-build/build-workflow.py`

- Enhanced `get_gpu` to validate that each GPU definition includes the required fields (`name`, `runner`, `sm`), raising an exception when any are missing
- Updated `generate_dispatch_job_name` to append `", "` followed by `gpu["name"]` to GPU job labels, instead of building uppercase strings from the raw GPU tags
- Modified `generate_dispatch_job_runner` to use `gpu["runner"]` directly (plus a `-testing` suffix when applicable) instead of `gpu["id"]` with a hardcoded `-latest-1` suffix
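The three `build-workflow.py` changes can be sketched roughly as follows. This is a minimal illustration, not the PR's code: only the function names and the `name`/`runner`/`sm` fields come from the summary above, while the signatures, the `testing` flag, the `GPUS` table layout, the display names, and the example runner labels are hypothetical.

```python
# Hypothetical sketch of the build-workflow.py changes summarized above.
# Signatures, display names, runner labels, and the `testing` flag are assumptions.

REQUIRED_GPU_FIELDS = ("name", "runner", "sm")

# Assumed shape of the parsed `gpus` section of ci/matrix.yaml.
GPUS = {
    "h100": {"name": "H100", "runner": "linux-amd64-gpu-h100-latest-1", "sm": 90},
    "h100_2gpu": {"name": "2x H100", "runner": "linux-amd64-gpu-h100-latest-2", "sm": 90},
}


def get_gpu(gpu_id, gpus=GPUS):
    """Look up a GPU definition and raise if it is unknown or lacks a required field."""
    if gpu_id not in gpus:
        raise ValueError(f"Unknown GPU '{gpu_id}'")
    gpu = gpus[gpu_id]
    missing = [field for field in REQUIRED_GPU_FIELDS if field not in gpu]
    if missing:
        raise ValueError(f"GPU '{gpu_id}' is missing required fields: {', '.join(missing)}")
    return gpu


def generate_dispatch_job_name(base_name, gpu_id=None):
    """Append ", <display name>" to the job label when the job targets a GPU."""
    if gpu_id is None:
        return base_name
    return base_name + ", " + get_gpu(gpu_id)["name"]


def generate_dispatch_job_runner(gpu_id, testing=False):
    """Use the runner label from the matrix as-is, adding -testing when requested."""
    runner = get_gpu(gpu_id)["runner"]
    return runner + "-testing" if testing else runner
```

Under these assumed definitions, `generate_dispatch_job_name("Test libcudacxx", "h100_2gpu")` yields `"Test libcudacxx, 2x H100"`. Reading the label straight from the matrix means a new runner type only needs a matrix entry, not a change to the suffix logic.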

`.github/actions/workflow-run-job-linux/action.yml`

- Updated the Docker GPU access logic on `*-gpu-*` runners to handle multi-GPU setups
- For multi-GPU runners (GPU count > 1), requests all GPUs via `--gpus all`
- For single-GPU runners, continues to use `--gpus "device=${NVIDIA_VISIBLE_DEVICES:-}"`
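The runtime decision above lives in shell inside `action.yml`; the Python sketch below only mirrors the described logic. It assumes (not confirmed by the PR) that the GPU count is encoded as `-latest-<N>` in the runner label, following the old `-latest-1` suffix. The function name `docker_gpu_args` and the example labels are hypothetical.

```python
import re
from typing import List, Optional

# Assumption: GPU runner labels contain "-latest-<gpu count>", e.g. "...-latest-2".
GPU_COUNT_RE = re.compile(r"-latest-(\d+)")


def docker_gpu_args(runner_label: str, visible_devices: Optional[str]) -> List[str]:
    """Return the docker `--gpus` arguments for a runner, mirroring the described logic."""
    if "-gpu-" not in runner_label:
        return []  # not a *-gpu-* runner: request no GPU access
    match = GPU_COUNT_RE.search(runner_label)
    gpu_count = int(match.group(1)) if match else 1
    if gpu_count > 1:
        # Multi-GPU runner: expose every GPU on the host to the container.
        return ["--gpus", "all"]
    # Single-GPU runner: previous behavior. An unset variable becomes an empty
    # string, matching the shell expansion ${NVIDIA_VISIBLE_DEVICES:-}.
    return ["--gpus", f"device={visible_devices or ''}"]
```

A caller would pass `os.environ.get("NVIDIA_VISIBLE_DEVICES")` as the second argument. Requesting `all` on multi-GPU runners avoids having to enumerate device indices for each host.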

`ci/matrix.yaml`

- Added a new multi-GPU CI job for libcudacxx: a test job targeting `gpu: h100_2gpu` with `sm: gpu`
- Restructured the GPU configuration section to include a display name and a specific runner label for each GPU entry
- Added the `h100_2gpu` GPU model with its runner label
- Updated the `rtxpro6000` entry to include a runner label
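The new job pairs `gpu: h100_2gpu` with `sm: gpu`. Assuming `sm: gpu` is a placeholder meaning "compile for the SM of the GPU this job runs on" (our reading; the PR does not spell it out), the lookup against the `gpus` table could be sketched as below. The dict layout, runner label, display name, and `resolve_sm` helper are hypothetical.

```python
# Hypothetical sketch: resolving a matrix job's `sm: gpu` against the `gpus` table.
# Assumption: "gpu" is a placeholder for the target GPU's SM value.

GPUS = {
    "h100_2gpu": {"name": "2x H100", "runner": "linux-amd64-gpu-h100-latest-2", "sm": 90},
}

# Assumed parsed form of the new libcudacxx entry (other matrix keys omitted).
JOB = {"project": "libcudacxx", "gpu": "h100_2gpu", "sm": "gpu"}


def resolve_sm(job, gpus=GPUS):
    """Replace the `gpu` placeholder with the target GPU's SM; pass other values through."""
    sm = job.get("sm")
    if sm == "gpu":
        return gpus[job["gpu"]]["sm"]
    return sm
```

Keeping `sm` next to `name` and `runner` in each GPU entry lets one table drive the job label, the runner choice, and the build architecture.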

Walkthrough

The PR updates GPU configuration and handling across CI workflow generation and execution. GPU definitions in the matrix gain structured name and runner fields; build script validation enforces these required fields and uses them for job naming and runner selection; runtime logic detects multi-GPU runners and conditionally selects Docker GPU device access; a new h100_2gpu test entry exercises the updated configuration.

Changes

GPU configuration and multi-GPU execution

| Layer / File(s) | Summary |
|---|---|
| GPU configuration schema with `name` and `runner` fields (`ci/matrix.yaml`) | GPU definitions now include `name` and `runner` fields alongside `sm`; a new `h100_2gpu` model is added and the `rtxpro6000` entry gains a runner label. |
| GPU validation and build-time job naming/runner generation (`build-workflow.py`) | `get_gpu` validates that GPU definitions include the required `name`, `runner`, and `sm` fields; `generate_dispatch_job_name` appends `gpu["name"]` to job display names; `generate_dispatch_job_runner` uses `gpu["runner"]` instead of `gpu["id"]` with a `-latest-1` suffix. |
| Multi-GPU Docker device selection at runtime (`action.yml`) | GPU device selection now detects multi-GPU runner labels via a regex and requests `--gpus all` when the GPU count is greater than 1; otherwise it keeps the previous `device=${NVIDIA_VISIBLE_DEVICES:-}` behavior. |
| Multi-GPU test matrix entry for `h100_2gpu` (`ci/matrix.yaml`) | New `pull_request` matrix job entry for libcudacxx targeting the `h100_2gpu` GPU configuration. |


github-actions · 2026-06-12T23:31:21Z

😬 CI Workflow Results

🟥 Finished in 1h 44m: Pass: 99%/505 | Total: 4d 12h | Max: 54m 27s | Hits: 98%/655116


Add multi GPU CI job for libcu++

5b44eda

pciolkosz requested a review from a team as a code owner June 12, 2026 21:43

pciolkosz requested a review from jrhemstad June 12, 2026 21:43

github-project-automation Bot added this to CCCL Jun 12, 2026

github-project-automation Bot moved this to Todo in CCCL Jun 12, 2026

cccl-authenticator-app Bot moved this from Todo to In Review in CCCL Jun 12, 2026


pciolkosz wants to merge 1 commit into NVIDIA:main from pciolkosz:multi_gpu_CI_jobs_for_libcudacxx
