
[fix](mc) fix memory leak and optimize large data write for MaxCompute connector (#61245) #64449

Merged
morningman merged 1 commit into apache:branch-4.0 from hubgeter:pick_40_mc_some_fix
Jun 12, 2026

Conversation

@hubgeter hubgeter commented Jun 12, 2026

Contributor

pick #61245
Fix:

  • Fix potential memory leak in MaxComputeJniScanner by closing currentSplitReader in close().

Optimization:

  • mc.max_field_size_bytes: max field size in bytes for write session (default: 8MB)

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…e connector (apache#61245)

Fix:
- Fix potential memory leak in MaxComputeJniScanner by closing
  currentSplitReader in close().
- Fix potential memory leak in MaxComputeJniWriter by restructuring
  close() with try-finally to ensure allocator is always closed even
  when batchWriter.commit() throws. Also close VectorSchemaRoot after
  each batch write.
- Fix maxWriteBatchRows parameter key mismatch between BE
  ("max_write_batch_rows") and JNI ("mc.max_write_batch_rows"),
  which caused user-customized values to be silently ignored.
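
The `close()` restructuring described for MaxComputeJniWriter can be illustrated with a minimal sketch. It is not the connector's code: `FakeBatchWriter`, `FakeAllocator`, and `closeWriter` are hypothetical stand-ins, and the only point shown is that a `finally` block releases the allocator even when `commit()` throws.

```java
/** Illustrative sketch only; all types here are hypothetical stand-ins. */
final class WriterCloseSketch {

    /** Stand-in for a batch writer whose commit() may fail. */
    static final class FakeBatchWriter {
        private final boolean failOnCommit;

        FakeBatchWriter(boolean failOnCommit) {
            this.failOnCommit = failOnCommit;
        }

        void commit() {
            if (failOnCommit) {
                throw new IllegalStateException("commit failed");
            }
        }
    }

    /** Stand-in for an allocator that must always be released. */
    static final class FakeAllocator implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Commits the writer, then releases the allocator whether or not commit() threw. */
    static void closeWriter(FakeBatchWriter writer, FakeAllocator allocator) {
        try {
            writer.commit();
        } finally {
            // Runs on both the success path and the exception path.
            allocator.close();
        }
    }
}
```

The exception from `commit()` still propagates to the caller; the `finally` block only guarantees that the allocator is released first.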

Optimization:
- Split large Arrow batches into smaller chunks (controlled by
  mc.max_write_batch_rows, default 4096) to avoid HTTP 413 Request
  Entity Too Large errors from MaxCompute Storage API.
- Skip unnecessary SORT node for static partition INSERT, since all
  data goes to a single known partition and no dynamic routing is
  needed.
- Enable ZSTD compression for Arrow data transfer to reduce network
  bandwidth.
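
The batch splitting described above comes down to cutting a batch of N rows into consecutive row ranges of at most `mc.max_write_batch_rows` rows each. A minimal sketch of that arithmetic follows; `BatchSplitSketch` and `chunkRanges` are hypothetical names, and how the PR actually slices the Arrow data is not shown in this conversation. Each range is returned as `{startRow, rowCount}`, so a caller could hand one range at a time to whatever slicing mechanism it uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; the names here are hypothetical. */
final class BatchSplitSketch {

    /**
     * Splits totalRows into consecutive ranges of at most maxBatchRows rows.
     * Each element is {startRow, rowCount}.
     */
    static List<int[]> chunkRanges(int totalRows, int maxBatchRows) {
        if (maxBatchRows <= 0) {
            throw new IllegalArgumentException("maxBatchRows must be positive");
        }
        List<int[]> ranges = new ArrayList<>();
        int start = 0;
        while (start < totalRows) {
            // The last chunk may be shorter than maxBatchRows.
            int length = Math.min(maxBatchRows, totalRows - start);
            ranges.add(new int[] {start, length});
            start += length;
        }
        return ranges;
    }
}
```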

New catalog properties:
- mc.max_write_batch_rows: max rows per Arrow batch for write
  (default: 4096)
- mc.max_field_size_bytes: max field size in bytes for write session
  (default: 8MB)
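
A minimal sketch of how the two properties and their defaults might be read, which also shows why the key mismatch mentioned under "Fix" went unnoticed. `McWritePropsSketch` and its methods are hypothetical; the sketch assumes the reader looks up the `mc.`-prefixed key and that 8MB means 8 * 1024 * 1024 bytes, neither of which is confirmed by this conversation.

```java
import java.util.Map;

/** Illustrative sketch only; class and method names are hypothetical. */
final class McWritePropsSketch {
    static final String MAX_WRITE_BATCH_ROWS = "mc.max_write_batch_rows";
    static final String MAX_FIELD_SIZE_BYTES = "mc.max_field_size_bytes";

    static final int DEFAULT_MAX_WRITE_BATCH_ROWS = 4096;
    // Assumption: "8MB" is interpreted as 8 MiB.
    static final long DEFAULT_MAX_FIELD_SIZE_BYTES = 8L * 1024 * 1024;

    /** Returns the configured row limit, or the default when the key is absent. */
    static int maxWriteBatchRows(Map<String, String> params) {
        String value = params.get(MAX_WRITE_BATCH_ROWS);
        return value == null ? DEFAULT_MAX_WRITE_BATCH_ROWS : Integer.parseInt(value.trim());
    }

    /** Returns the configured field size limit, or the default when the key is absent. */
    static long maxFieldSizeBytes(Map<String, String> params) {
        String value = params.get(MAX_FIELD_SIZE_BYTES);
        return value == null ? DEFAULT_MAX_FIELD_SIZE_BYTES : Long.parseLong(value.trim());
    }
}
```

If one side sends the value under `max_write_batch_rows` while the other looks up `mc.max_write_batch_rows`, the lookup returns `null` and the default of 4096 is used without any error, which matches the "silently ignored" behavior the commit message describes.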

Co-authored-by: daidai <changyuwei@selectdb.com>
@hubgeter hubgeter requested a review from morningman as a code owner June 12, 2026 07:06
@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hubgeter

Contributor Author

run buildall

@morningman morningman merged commit 6f5ef0a into apache:branch-4.0 Jun 12, 2026
29 of 33 checks passed
3 participants