Skip to content

grpc fix a race condition with log streaming#1426

Merged
rapids-bot[bot] merged 1 commit into
NVIDIA:mainfrom
tmckayus:fix/grpc-streamlogs-race
Jun 12, 2026
Merged

grpc fix a race condition with log streaming#1426
rapids-bot[bot] merged 1 commit into
NVIDIA:mainfrom
tmckayus:fix/grpc-streamlogs-race

Conversation

@tmckayus

Copy link
Copy Markdown
Contributor

For very quickly solved problems, the log file might not exist before the job is marked complete, or the client might start streaming before the file exists. Create the log at submission time to avoid this.

For very quickly solved problems, the log file might not
exist before the job is marked complete, or the client
might start streaming before the file exists. Create the log
at submission time to avoid this.
@tmckayus tmckayus requested a review from a team as a code owner June 11, 2026 22:41
@tmckayus tmckayus added bug Something isn't working non-breaking Introduces a non-breaking change labels Jun 11, 2026
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ae8cbf35-a6cc-4333-be05-04c5e33777c7

📥 Commits

Reviewing files that changed from the base of the PR and between 49286e7 and 418f80c.

📒 Files selected for processing (3)
  • cpp/src/grpc/server/grpc_job_management.cpp
  • cpp/src/grpc/server/grpc_server_types.hpp
  • cpp/src/grpc/server/grpc_service_impl.cpp

📝 Walkthrough

Walkthrough

This PR ensures job log files exist at submission time and refactors log streaming to reliably detect file availability. After job ID generation, both submission paths create an empty per-job log file. The StreamLogs implementation is refactored to check file existence via stat(), poll until the job completes when missing, and centralize message construction through a helper lambda for consistent streaming behavior.

Changes

Log File Creation and Streaming

Layer / File(s) Summary
Job log file creation at submission
cpp/src/grpc/server/grpc_job_management.cpp, cpp/src/grpc/server/grpc_server_types.hpp
After job ID generation in both submit_job_async and submit_chunked_job_async, create_job_log_file(job_id) is called to create and truncate the per-job log file. The helper function ensures the log directory exists via ensure_log_dir_exists and opens the file with std::ofstream, making it available for streaming from submission time onward.
StreamLogs refactoring for robust file handling
cpp/src/grpc/server/grpc_service_impl.cpp
StreamLogs now performs an immediate stat() check to determine log file presence. A helper lambda centralizes LogMessage construction and writer->Write() calls. When the file is missing, the code polls job status until the job leaves QUEUED/PROCESSING states before emitting a completion sentinel. The tail loop and EOF handling use the lambda consistently with explicit job_complete flags, and the final completion sentinel uses computed current_offset semantics.

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'grpc fix a race condition with log streaming' clearly and concisely describes the main change: fixing a race condition in gRPC log streaming by creating log files at job submission time.
Description check ✅ Passed The description is directly related to the changeset, explaining the race condition problem (log file not existing before job completion or before client starts streaming) and the solution (creating the log at submission time).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@tmckayus

Copy link
Copy Markdown
Contributor Author

/merge

@rapids-bot rapids-bot Bot merged commit 1ec39b3 into NVIDIA:main Jun 12, 2026
100 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants