[draft](catalog) Master catalog spi 07 paimon #64445

Open
morningman wants to merge 54 commits into apache:master from morningman:master-catalog-spi-07-paimon

@morningman

Contributor

only for testing

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman

Contributor Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 29675 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit af2037cf13b39b5877fdca1ad3e11c9a4873724f, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17747	4362	4360	4360
q2	q3	10778	1470	816	816
q4	4678	493	352	352
q5	7559	893	578	578
q6	192	185	143	143
q7	784	846	624	624
q8	9330	1635	1563	1563
q9	5895	4494	4478	4478
q10	6758	1849	1525	1525
q11	435	271	247	247
q12	637	433	305	305
q13	18104	3558	2700	2700
q14	280	276	254	254
q15	q16	840	784	713	713
q17	947	930	1007	930
q18	7269	5843	5641	5641
q19	1293	1360	1081	1081
q20	525	417	277	277
q21	6360	2949	2740	2740
q22	471	396	348	348
Total cold run time: 100882 ms
Total hot run time: 29675 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5517	5156	5201	5156
q2	q3	5064	5493	4791	4791
q4	2413	2453	1590	1590
q5	5027	5061	4961	4961
q6	259	190	137	137
q7	2006	1911	1700	1700
q8	2712	2336	2376	2336
q9	8242	7977	7652	7652
q10	4863	4835	4363	4363
q11	591	431	392	392
q12	810	802	593	593
q13	3025	3470	2811	2811
q14	287	288	273	273
q15	q16	721	727	638	638
q17	1306	1295	1291	1291
q18	7738	7152	6983	6983
q19	1162	1075	1109	1075
q20	2277	2276	1994	1994
q21	5598	4973	4829	4829
q22	552	496	433	433
Total cold run time: 60170 ms
Total hot run time: 53998 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 168026 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit af2037cf13b39b5877fdca1ad3e11c9a4873724f, data reload: false

query5	4318	625	462	462
query6	473	210	178	178
query7	4855	499	312	312
query8	360	217	201	201
query9	8775	4061	4023	4023
query10	442	310	256	256
query11	5903	2378	2210	2210
query12	158	107	114	107
query13	1266	605	437	437
query14	6397	5413	5048	5048
query14_1	4421	4402	4380	4380
query15	205	199	176	176
query16	973	436	459	436
query17	1134	702	588	588
query18	2500	474	338	338
query19	201	186	155	155
query20	110	110	107	107
query21	220	137	148	137
query22	13729	13662	13379	13379
query23	17338	16518	16153	16153
query23_1	16307	16181	16432	16181
query24	7609	1762	1312	1312
query24_1	1322	1290	1317	1290
query25	554	426	359	359
query26	1288	307	163	163
query27	2723	567	332	332
query28	4458	2050	2019	2019
query29	1025	594	468	468
query30	306	243	205	205
query31	1111	1065	951	951
query32	103	58	57	57
query33	516	303	242	242
query34	1196	1186	674	674
query35	736	758	679	679
query36	1405	1392	1233	1233
query37	153	103	91	91
query38	3209	3147	3051	3051
query39	938	908	892	892
query39_1	882	871	870	870
query40	217	119	97	97
query41	63	63	61	61
query42	94	94	91	91
query43	323	327	279	279
query44	
query45	196	183	177	177
query46	1027	1212	735	735
query47	2289	2341	2262	2262
query48	404	395	300	300
query49	608	450	349	349
query50	956	354	247	247
query51	4303	4261	4258	4258
query52	90	87	75	75
query53	233	263	188	188
query54	270	207	234	207
query55	78	73	68	68
query56	246	219	214	214
query57	1414	1406	1290	1290
query58	233	208	203	203
query59	1547	1670	1417	1417
query60	304	248	242	242
query61	175	175	171	171
query62	708	665	578	578
query63	231	189	197	189
query64	2592	815	653	653
query65	
query66	1791	468	354	354
query67	29719	29045	29616	29045
query68	
query69	428	310	269	269
query70	964	970	948	948
query71	313	226	218	218
query72	2883	2125	2330	2125
query73	812	789	435	435
query74	5148	4936	4748	4748
query75	2634	2564	2219	2219
query76	2345	1158	804	804
query77	350	376	278	278
query78	12374	12338	11926	11926
query79	1398	993	771	771
query80	708	473	374	374
query81	476	282	240	240
query82	553	155	120	120
query83	343	279	240	240
query84	
query85	864	495	444	444
query86	414	298	283	283
query87	3390	3338	3225	3225
query88	3622	2720	2713	2713
query89	429	376	327	327
query90	1813	180	177	177
query91	167	154	131	131
query92	62	59	58	58
query93	1421	1380	871	871
query94	607	343	324	324
query95	683	487	335	335
query96	1113	795	358	358
query97	2675	2673	2538	2538
query98	210	212	206	206
query99	1141	1177	1016	1016
Total cold run time: 250501 ms
Total hot run time: 168026 ms

@hello-stephen

Contributor

FE UT Coverage Report

Increment line coverage 62.91% (687/1092) 🎉
Increment coverage report
Complete coverage report

morningman and others added 17 commits June 12, 2026 22:20
This multi-month refactor needs persistent state for progress, decisions,
risks, and cross-session agent handoff. Establishes a file-based tracking
system including dashboard, ADR decision log, deviation log, risk register,
per-stage task files, per-connector tracking, and an agent collaboration
playbook covering context budget / subagent usage / handoff norms. Closes
18 design decisions (D-001..D-018) and registers 14 risks (R-001..R-014).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…T27) (apache#63582)

## Summary

Lands the P0 SPI baseline for the catalog-SPI migration (master plan
§3.1 / RFC §2.1), with zero impact on the already-migrated JDBC + ES
connectors.

- **Batch 0** (commits 1-2): SPI types + fe-core bridges —
`ConnectorMetaInvalidator`, `ConnectorTransaction`,
`ConnectorMvccSnapshot`, `ExternalMetaCacheInvalidator`,
`ConnectorMvccSnapshotAdapter`, `PluginDrivenTransactionManager`
generalization.
- **Batch 1** (commit 3): DDL + Partition SPI —
`ConnectorCreateTableRequest` + 4 spec POJOs, 4 new defaults on
`ConnectorTableOps`, 3 new fields on `ConnectorPartitionInfo`, fe-core
converter, `PluginDrivenExternalCatalog.createTable` routing.
- **Batch 2** (commit 4): Import-gate + unit tests —
`tools/check-connector-imports.sh` wired through exec-maven-plugin;
`FakeConnectorPlugin` covering every default fall-through; routing tests
for the invalidator; converter tests for all 4 partition styles + 2
bucket flavors.

## Commits

- `[feat](connector) add P0 batch 0 SPI baseline: MetaInvalidator /
Transaction / MvccSnapshot` (T03-T08)
- `[feat](connector) wire P0 batch 0 SPI into fe-core` (T09-T12)
- `[feat](connector) add P0 batch 1 SPI: CreateTableRequest +
listPartitions` (T13-T20)
- `[feat](connector) add P0 batch 2 gate + unit tests` (T21-T23,
T26-T27)

## Test plan

- [x] `mvn -pl
fe-connector/fe-connector-api,fe-connector/fe-connector-spi -am compile`
— SPI modules compile
- [x] `mvn -pl fe-core -am compile -Dmaven.build.cache.enabled=false` —
fe-core compile
- [x] `mvn -pl fe-core checkstyle:check` — 0 violations
- [x] `mvn -pl fe-connector validate` — import gate runs and passes
(baseline clean)
- [x] `mvn -pl fe-core -am test
-Dtest='FakeConnectorPluginTest,ExternalMetaCacheInvalidatorTest,CreateTableInfoToConnectorRequestConverterTest,ConnectorPluginManagerTest,ConnectorSessionImplTest'`
— 39/39 green
- [x] `mvn -pl
fe-connector/fe-connector-jdbc,fe-connector/fe-connector-es -am compile`
— downstream connectors compile unchanged
- [ ] JDBC regression-test suite (T24) — to be exercised by this PR's CI
pipeline
- [ ] ES regression-test suite (T25) — to be exercised by this PR's CI
pipeline

## Tracking

Full plan, decisions, and risk log live under `plan-doc/` in the repo
(introduced by 6315983, already on the base branch). Per-task
status: `plan-doc/tasks/P0-spi-foundation.md`.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pache#63641)

## Summary

P1 batch A — close out scan-node SPI consolidation while keeping
migration-period fallbacks in place. Three surgical changes route
`PluginDrivenExternalTable` first in the nereids translator hot paths so
already-migrated SPI connectors (JDBC, ES) take the SPI route, while the
existing `instanceof XExternalTable` chains remain as fallbacks for
connectors still pending migration (P3–P7).

- **T3** — `PhysicalPlanTranslator.visitPhysicalFileScan`: move the
existing `PluginDrivenExternalTable` branch from position 8 to position
1; the 7 connector-specific branches (HMS / Iceberg / Paimon / Trino /
MaxCompute / LakeSoul / RemoteDoris) stay in place as migration-period
fallbacks
- **T4** — `PhysicalPlanTranslator.visitPhysicalHudiScan`: add a
`PluginDrivenExternalTable` branch routed to
`PluginDrivenScanNode.create(...)`, threading `tableSnapshot` +
`scanParams` through `FileQueryScanNode` setters; `incrementalRelation`
flagged as a P3 Hudi SPI extension TODO. The new branch is unreachable
today (`PhysicalHudiScan` is only built for `HMSExternalTable +
DLAType.HUDI`), so this is groundwork for P3 with zero current-day
runtime impact
- **T5** — `LogicalFileScan`: in `computeOutput()`, add a
`PluginDrivenExternalTable` branch calling new helper
`computePluginDrivenOutput()` — same shape as `computeIcebergOutput`,
using `getFullSchema()` + virtualColumns; in
`supportPruneNestedColumn()`, add an explicit `PluginDrivenExternalTable
→ false` branch. Both behaviorally equivalent for JDBC/ES today since
they have no hidden cols and no virtualColumns

P1 batch B (T1 — delete 13 legacy `Jdbc*Client` + `JdbcFieldSchema`) is
deferred to P8 because the 3 fe-core callers —
`PostgresResourceValidator`, `StreamingJobUtils`,
`CdcStreamTableValuedFunction` — are live CDC streaming code that
requires SPI extension for `getPrimaryKeys` / `getColumnsFromJdbc` /
`listTables`, which is out of P1 surgical scope.

Background and tracking docs live in `plan-doc/` (Master Plan §3.2 P1,
tasks/P1-scan-node-cleanup.md, decisions log).

## Test plan

- [x] `mvn -pl fe-core -am compile -Dmaven.build.cache.enabled=false` →
BUILD SUCCESS
- [x] `mvn -pl fe-core checkstyle:check` → 0 violations
- [x] JDBC + ES regression-test passing — baseline established in P0 /
PR apache#63582
- [ ] PR CI green on this PR
- [ ] Manual scan-node smoke for an SPI connector — JDBC `SELECT *`
should fall into the new `PluginDrivenExternalTable` branch first

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…apache#64096)

### What problem does this PR solve?

Related PR: apache#63582 (P0 — SPI baseline), apache#63641 (P1 — nereids
plugin-driven routing)

Problem Summary:

This is **P2** of the catalog SPI migration and targets the
`branch-catalog-spi` feature branch (continuing P0 apache#63582 and P1
apache#63641). It fully migrates `trino-connector` off the legacy in-tree
`fe-core/datasource/trinoconnector/` implementation and onto the
connector SPI module `fe-connector-trino`, making `trino-connector` the
first connector to complete the SPI consumption playbook that later
connectors will reuse as a template.

All five batches land together so there is no intermediate state where a
newly-created trino catalog cannot be serialized.

**Batch A — complete the SPI surface (`fe-connector-trino` only, no
fe-core changes)**
- `TrinoConnectorProvider.validateProperties`: enforce the required
`trino.connector.name` property at `CREATE CATALOG` time (ported from
the legacy `checkProperties`).
- `TrinoDorisConnector.preCreateValidation`: call `ensureInitialized()`
so plugin loading + connector-factory resolution happen at catalog
creation instead of being deferred to the first `SELECT`.
- `TrinoConnectorDorisMetadata.applyFilter` / `applyProjection`: bridge
Trino native filter/projection pushdown, reusing
`TrinoPredicateConverter` to translate a Doris `ConnectorExpression`
into a Trino `TupleDomain`. `remainingFilter` is conservatively returned
as the original expression to match legacy behavior (conjuncts are not
stripped; BE re-evaluates them).
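
As an illustration of the fast-fail check described in Batch A, here is a minimal sketch of a `validateProperties`-style guard. The class name and the exception type are assumptions made for illustration, not the actual `TrinoConnectorProvider` code; only the `trino.connector.name` key comes from this PR.

```java
import java.util.Map;

// Hypothetical stand-in for the provider-side validation; not the real Doris class.
class TrinoPropertyValidationSketch {
    static final String CONNECTOR_NAME_KEY = "trino.connector.name";

    // Rejects a catalog definition whose connector name is missing or blank,
    // so the failure surfaces at CREATE CATALOG time instead of at first SELECT.
    static void validateProperties(Map<String, String> props) {
        String name = props.get(CONNECTOR_NAME_KEY);
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException(
                    "Required property '" + CONNECTOR_NAME_KEY + "' is missing or empty");
        }
    }
}
```

A map containing a non-empty `trino.connector.name` passes silently; an empty map or a blank value throws.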

**Batch B — fe-core bridge for image compatibility**
- `GsonUtils`: atomically replace the three legacy `registerSubtype`
entries (`TrinoConnectorExternalCatalog` / `Database` / `Table`) with
`registerCompatibleSubtype` redirects onto the `PluginDrivenExternal*`
hierarchy. This must be atomic — `RuntimeTypeAdapterFactory` rejects
duplicate labels, so keeping both bindings would throw at static init.
Mirrors what ES/JDBC already did.
- `PluginDrivenExternalCatalog.gsonPostProcess`: extract a
`legacyLogTypeToCatalogType()` helper that maps `Type.TRINO_CONNECTOR` →
`"trino-connector"`; the generic `name().toLowerCase()` would otherwise
produce the wrong `"trino_connector"` (underscore) that `CatalogFactory`
does not recognize.
- `PluginDrivenExternalTable.getEngine()` / `getEngineTableTypeName()`:
add `trino-connector` branches that preserve the legacy engine-name /
table-type display across `SHOW TABLE STATUS` and `information_schema`.
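
The underscore/hyphen mismatch behind `legacyLogTypeToCatalogType()` is easy to reproduce in isolation. The sketch below uses a stand-in `Type` enum with only a few values (the real one is `InitCatalogLog.Type`) and shows one shape such a helper might take; it is an illustration, not the actual fe-core implementation.

```java
import java.util.Locale;

// Hypothetical sketch; the enum is a reduced stand-in for InitCatalogLog.Type.
class LegacyLogTypeSketch {
    enum Type { JDBC, ES, TRINO_CONNECTOR }

    // Maps a legacy log type to the catalog type string used at catalog creation.
    static String legacyLogTypeToCatalogType(Type type) {
        if (type == Type.TRINO_CONNECTOR) {
            // Generic lowercasing would yield "trino_connector" (underscore),
            // which is not the registered catalog type.
            return "trino-connector";
        }
        return type.name().toLowerCase(Locale.ROOT);
    }
}
```

Only the one type whose enum name and catalog type differ needs a special case; every other value keeps the generic lowercase mapping.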

**Batch C — flip the switch**
- Add `"trino-connector"` to `CatalogFactory.SPI_READY_TYPES` so catalog
creation routes through the SPI path.

**Batch D — remove legacy code**
- Drop the `instanceof TrinoConnectorExternalTable` scan branch in
`PhysicalPlanTranslator` (the `PluginDrivenExternalTable` SPI branch
already handles it).
- Drop `case "trino-connector"` in `CatalogFactory`.
- Delete `fe-core/datasource/trinoconnector/` (10 files) and the
now-dead legacy `TrinoConnectorPredicateTest`.
- Route the `TRINO_CONNECTOR` db-build case in `ExternalCatalog` to
`PluginDrivenExternalDatabase` (mirrors the migrated JDBC case).
- **Retained for image compatibility**: the
`InitCatalogLog.Type.TRINO_CONNECTOR` and
`TableType.TRINO_CONNECTOR_EXTERNAL_TABLE` enums, the GsonUtils
redirects, and the `MetastoreProperties` trino-connector entry.

**Batch E — tests + tracking docs**
- 29 JUnit 5 unit tests over the plugin-free converters:
- `TrinoPredicateConverterTest` — `ConnectorExpression` pushdown trees →
Trino `TupleDomain` (EQ / range / NE / IN / IS [NOT] NULL / AND / OR,
Slice encoding), plus graceful degradation to `TupleDomain.all()` on
null/unsupported input.
- `TrinoTypeMappingTest` — Trino SPI type → Doris `ConnectorType`
(scalars, decimal precision/scale, timestamp precision clamp,
array/map/struct, unsupported-type failure).
- `TrinoConnectorProviderTest` — `validateProperties` fast-fails when
`trino.connector.name` is missing/empty.
- No Trino plugin/cluster required; plugin-dependent paths remain
covered by the existing `external_table_p0/p2` `trino_connector`
regression suites.
- Sync the migration tracking docs under `plan-doc/` (already carried on
this feature branch since P0).

**Net effect**: 28 files, +1025 / −2681 (~1656 LOC net removed). Old FE
images holding legacy trino catalogs / databases / tables deserialize
onto the `PluginDrivenExternal*` hierarchy through the GsonUtils
string-name redirect, with engine-name display preserved.

**Deferred (follow-ups, not in this PR)**:
- `trino_connector_migration_compat` regression test (old-image
deserialization) — requires a running cluster + Trino plugin + docker,
unavailable in this dev environment; tracked as a CI/cluster follow-up.
- The plugin-install documentation update lives in the `doris-website`
repo and is handled separately.

### Release note

None

### Check List (For Author)

- Test
    - [x] Unit Test: 29 new tests in `fe-connector-trino` (predicate
      converter / type mapping / property validation).
    - [ ] Regression test: existing `trino_connector` suites cover plugin
      paths; the new old-image compat regression is deferred to a CI/cluster
      follow-up.
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason

- Behavior changed:
    - [x] No. Internal routing moves from the legacy fe-core path to the SPI
      path; image compatibility, engine-name display, and pushdown semantics
      all mirror the legacy behavior. All batches land together, so there is
      no serialization-gap window.

- Does this need documentation?
    - [x] Yes. The trino-connector plugin-install doc update is a follow-up
      in the `doris-website` repo.

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label
…tch design (hybrid, T02-T08) (apache#64143)

## Proposed changes

testing with apache#64146

P3 of the catalog-SPI migration (base: `branch-catalog-spi`). Migrates
the **hudi** connector following the **hybrid** strategy (D-019): harden
the dormant HMS-over-SPI hudi connector to correctness parity, build a
test baseline, and write the per-table dispatch design — **all behind
the closed gate** (`SPI_READY_TYPES` unchanged).

> ⚠️ **No user-visible behavior change.** The SPI hudi path stays
dormant (gate closed); hudi queries continue to use the legacy
`HMSExternalTable.dlaType=HUDI` path. This PR removes correctness
blockers ahead of the live cutover (deferred to P7 / batch E).

### What's included

**Correctness fixes (hardening dormant code, behind gate):**
- **T02** — fix hudi JNI `column_types` double bug: emit full Hive type
strings (was Doris bare type names, losing precision/scale/subtypes) and
send `column_names`/`column_types`/`delta_logs` as typed lists
end-to-end (was comma join/split, which shattered `decimal(10,2)` /
`struct<...>`). Matches the BE `hudi_jni_reader.cpp` contract (names `,`
/ types `#` / delta `,`).
- **T04** — fail loud on time-travel / incremental read in the SPI
`visitPhysicalHudiScan` branch (was silently returning the latest
snapshot / silently full-scanning).
- **T05** — real EQ/IN partition pruning in
`HudiConnectorMetadata.applyFilter` (was a placeholder that ignored
predicates and unconditionally switched the partition source from
Hudi-metadata to HMS); faithfully mirrors
`HiveConnectorMetadata.applyFilter`.
- **T07** — column-name casing fix in `avroSchemaToColumns` (top-level
lowercase, mirroring legacy `HMSExternalTable`).
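
The T02 failure mode can be shown with plain string handling. The sketch below (class and method names are hypothetical) joins a list of Hive type strings with a delimiter and splits it back: a comma delimiter collides with the commas inside `decimal(10,2)` and `struct<...>`, while `#`, the types delimiter named in the BE contract above, round-trips cleanly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the join/split round trip; not the connector's code.
class ColumnTypesDelimiterSketch {
    static String join(List<String> parts, String delimiter) {
        return String.join(delimiter, parts);
    }

    // Splits on the literal delimiter (quoted so it is not treated as a regex).
    static List<String> split(String joined, String delimiter) {
        return Arrays.asList(joined.split(Pattern.quote(delimiter)));
    }
}
```

With the types `int`, `decimal(10,2)` and `struct<a:int,b:string>`, a comma round trip yields five fragments instead of three, whereas a `#` round trip returns the original list. Passing typed lists end-to-end, as T02 does, avoids the delimiter question until the final hand-off to BE.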

**Test baseline (all three connector modules started P3 with 0 tests):**
- `fe-connector-hudi` (33): type-mapping / schema-parity (COW/MOR
golden) / table-type / partition-pruning / scan-range.
- `fe-connector-hms` (12): shared Hive-type-string parser tests.
- `fe-connector-hive` (14): file-format / partition-pruning (mirrors
T05).
- COW/MOR schema is **type-agnostic** (golden parity vs legacy
`initHudiSchema`); table type only affects scan planning.

**Decisions / design (code-grounded, design-only):**
- **T03** — defer `schema_id`/`history_schema_info` field-id evolution
to batch E (DV-006; not a model-agnostic SPI fix).
- **T06** — keep MVCC/snapshot SPI defaults (opt-out) + document
(DV-007).
- **T08** — `tableFormatType` dispatch design memo + **D-020**: single
`hms` catalog per-table routing via a new backward-compatible
`ConnectorMetadata.getScanPlanProvider(handle)` (per-table provider
seam); refines D-005. The keystone gap is split into M1 (identity
consumption, fe-core reads `tableFormatType` as an opaque string) and M2
(scan routing).

### Deferred to batch E / P7 (not in this PR)
Gate flip (`SPI_READY_TYPES += hms/hudi`), fe-core `tableFormatType`
consumption (M1+M2 implementation), live cutover, delete legacy
`datasource/hudi/`, full incremental/time-travel/MVCC, Iceberg-on-hms
via SPI (needs P6 `IcebergScanPlanProvider`), cluster/runtime
validation.

### Verification
Per task tracking, each code batch landed with: per-module compile +
checkstyle 0 (incl. test sources) + connector import-gate pass + new
unit tests green. The two most recent commits are docs-only
(`plan-doc/`); the code is unchanged since the last green batch. Gate
stays closed → the dormant SPI path is unreachable at runtime → zero
live-path risk. CI re-verifies.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…core + make fe-core odps-free (T07-T09) (apache#64300)

Follow-up to apache#64253 (the MaxCompute catalog-SPI cutover). After the
cutover a `max_compute` catalog deserializes to
`PluginDrivenExternalCatalog` and no legacy `MaxComputeExternal*` object
is ever instantiated, so the legacy MaxCompute subsystem in fe-core is
dead code. This removes it and makes fe-core's dependency tree fully
odps-free.

**1. Remove legacy subsystem** (`7a4db351100`)
- Delete 20 fe-core files: `datasource/maxcompute/*` (incl.
`MCTransaction`, `MaxComputeScanNode`/`Split`), the MaxCompute
sink/insert/txn plumbing, and 2 legacy-only tests.
- Clean ~21 reverse-reference sites (imports + dead
`instanceof`/visitor/rule branches), keeping every
`PluginDriven`/connector sibling branch and the image/replay keep-set
(GsonUtils compat strings;
`TableType`/`TransactionType`/`TableFormatType`/`InitCatalogLog.Type`
`MAX_COMPUTE` enums; block-id thrift).
- Rewire 3 tests; e.g. `FrontendServiceImplTest`'s block-id RPC test now
mocks the generic `Transaction` SPI, since `getMaxComputeBlockIdRange`
reads the PluginDriven connector transaction.

**2. Make fe-core odps-free** (`409300a75b8`)
- Drop the two odps deps from `fe-core/pom.xml`.
- Move `MCUtils` from fe-common into
`be-java-extensions/max-compute-connector` (its only consumer after the
removal); keep `MCProperties` (odps-free constants) in fe-common.
- Drop `odps-sdk-core` from fe-common — it was also leaking
netty/protobuf transitively to fe-common's own
`DorisHttpException`/`GsonUtilsBase`, so declare `netty-all` +
`protobuf-java` directly (proper dependency hygiene).

**3. Doc-sync** (`f8c305765e8`) — plan-doc
PROGRESS/HANDOFF/deviations/design tracking notes.

- `mvn -pl :fe-core -am test-compile` (main+test) passes; checkstyle 0
violations; connector import-gate passes.
- `grep -rn com.aliyun.odps fe/fe-core/src` → empty.
- `mvn -pl :fe-core dependency:tree | grep odps` → empty (no odps,
direct or transitive).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
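This session covers research and design only. A 14-agent code-grounded recon
plus a cross-cutting adversarial re-review covers the legacy-framework
implementation of paimon's 5 functional areas (normal read / system tables /
procedure / DDL / mtmv), maps it onto the new catalog SPI, and aligns it with
the maxcompute connector's interface conventions.

Added:
- research/p5-paimon-migration-recon.md: legacy implementation of the 5 areas +
  E1–E10 SPI status + cross-cutting risks + 11 MC-consistency conventions +
  test baseline
- tasks/P5-paimon-migration.md: old→new mapping + 30 TODOs in batches B0–B9 +
  batch dependency graph + acceptance criteria

Decisions signed off by the user:
- D-037 (P5-D1): flavor = single Catalog + createCatalog flavor switch
  (consistent with MC; no backend modules are built, because the 5 backend
  modules are empty shells)
- D-038 (P5-D2): the MTMV/MVCC bridge is implemented within P5 (fe-core
  PaimonPluginDrivenExternalTable); the gate flip is gated on it, and
  regressing to silently reading latest is forbidden

Three prior assumptions falsified: the backend modules are empty shells (the
connector goes through a single-Catalog stub) / FE dispatch is already partly
pre-wired (what remains is the connector's listPartitions) / Base64 is not a
blocker (BE has an STD fallback). The procedure area has zero migratable code
and is doc-only.

Doc sync: connectors/paimon.md (fixes 3 stale statements), decisions-log.md
(+D-037/D-038, 36→38), PROGRESS.md (header/§1/§2/§3/§4/§6/§7), HANDOFF.md
(overwritten, no collapsed history kept).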

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
T01: extract PaimonCatalogOps injection seam (5 read methods, B0 read-only)
over the paimon SDK Catalog; refactor PaimonConnectorMetadata to inject it
(6 call sites migrated, read path byte-for-byte unchanged); build the first
fe-connector-paimon test module (no-mockito recording fake, mirroring MC's
McStructureHelper): 9 metadata UTs pinning the databaseExists try/catch and
the getColumnHandles reload-fallback, FakePaimonTable (fail-loud on non-read
methods), and an env-gated live connectivity smoke.

T02: R-007 paimon.version 3-way pin invariant comment (FE connector + BE
paimon-scanner + preload-extensions already aligned at 1.3.1 via the single
fe/pom.xml property); offline FE->BE serialized-Table round-trip smoke (real
FileSystemCatalog -> connector encode -> BE-mirrored URL-first/STD-fallback
decode, asserts rowType/partition/primary keys); parity-baseline doc
inventorying the 41 existing regression suites as the after-cutover parity
gate plus the real connector-side gaps and the live-e2e hard gate.
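
The "URL-first/STD-fallback" decode mentioned in T02 can be sketched with the JDK's `java.util.Base64`. This is a minimal illustration under the assumption that the payload is either URL-safe or standard Base64; the class and method names are hypothetical and this is not the BE decoder or the smoke test itself.

```java
import java.util.Base64;

// Hypothetical sketch of a URL-first decode with a standard-alphabet fallback.
class Base64FallbackSketch {
    static byte[] decodeUrlFirst(String encoded) {
        try {
            // URL-safe alphabet uses '-' and '_' in place of '+' and '/'.
            return Base64.getUrlDecoder().decode(encoded);
        } catch (IllegalArgumentException e) {
            // Input contained characters outside the URL-safe alphabet
            // (for example '+' or '/'): retry with the standard alphabet.
            return Base64.getDecoder().decode(encoded);
        }
    }
}
```

Input made only of characters common to both alphabets decodes to the same bytes through either decoder, so trying the URL-safe decoder first does not change the result for such payloads.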

Connector module: Tests run: 12, Failures: 0, Errors: 0, Skipped: 1 (the
skip is the env-gated live test); checkstyle 0; import-gate clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Single-Catalog flavor switch on paimon.catalog.type for all five flavors
(filesystem/hms/rest/jdbc/dlf), mirroring the legacy fe-core flavor
properties without importing fe-core/fe-common.

- New PaimonCatalogFactory: pure validate() + buildCatalogOptions()
  (paimon.catalog.type -> paimon `metastore` opt, per-flavor options,
  paimon.* passthrough excl storage prefixes) + buildHadoopConfiguration /
  buildHmsHiveConf / buildDlfHiveConf + requireOssStorageForDlf.
- PaimonConnector: thread ConnectorContext; createCatalog wires all 5
  flavors live (filesystem/jdbc with Hadoop Configuration, rest
  Options-only, hms/dlf with HiveConf), each wrapped in
  context.executeAuthenticated (Kerberos seam). JDBC DriverShim ported with
  driver-url resolution via getEnvironment() (replaces forbidden JdbcResource).
- PaimonConnectorProperties: all flavor key constants (multi-alias String[]).
- PaimonConnectorProvider: validateProperties override -> factory.validate.
- pom: add paimon-hive-connector-3.1 + hadoop-common + hive-common
  (hive-common over hive-catalog-shade to avoid the fastutil conflict).
- 31 new no-mockito unit tests (PaimonCatalogFactoryTest); module 43/0/0/1,
  checkstyle 0, import-gate clean.

hms/dlf live connection is gated on B7 cutover + live-e2e: the Thrift
metastore client is host-provided (not bundled) with a child-first
Configuration/HiveConf cross-loader hazard to verify; jdbc driver_url FE
security allow-list + external hive-site.xml file load are deferred. All
documented in code NOTEs and plan-doc. rest also requires warehouse
(legacy parity).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Connector-side only; no fe-core / fe-connector-api / fe-connector-spi changes.
B2 and B3 were both uncommitted and are entangled in the same files
(PaimonConnectorMetadata, PaimonCatalogOps, PaimonConnector,
RecordingPaimonCatalogOps), so they are committed together.

B2 normal-read (T06-T10):
- T06 PaimonScanPlanProvider transient-Table reload fallback (planScan +
  getScanNodeProperties both guarded)
- T07 PaimonPredicateConverter parity-correct TZ (NTZ keeps UTC, LTZ not
  pushed) + supportsCastPredicatePushdown=false
- T08 listPartitionNames/listPartitions/listPartitionValues (legacy
  display-name parity) + seam listPartitions(Identifier)
- T09 doc-only pure-predicate pruning; T10 cache deferred to B8

B3 DDL metadata (T11-T15):
- T11 PaimonTypeMapping.toPaimonType (Doris->paimon, byte-parity with legacy
  DorisToPaimonTypeVisitor; narrow gap preserved)
- T12 PaimonSchemaBuilder (ConnectorCreateTableRequest -> paimon Schema)
- T13 createTable/dropTable + seam DDL methods + ConnectorContext threaded
  (D7=B: each DDL op wrapped in executeAuthenticated; read path un-wrapped)
- T14 supportsCreateDatabase/createDatabase (HMS-props gate) +
  dropDatabase(force) (enumerate-loop + native cascade)
- T15 offline UTs (no-mockito; WHY+MUTATION)

Verified: fe-connector-paimon Tests run: 96, Failures: 0, Errors: 0,
Skipped: 1 (live); checkstyle 0; connector import-gate 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Port paimon system tables and MVCC snapshots onto the plugin connector SPI.

- T16: greenfield E7 SPI on ConnectorTableOps — listSupportedSysTables +
  getSysTableHandle (default no-ops; MC/jdbc/es/trino unaffected).
- T17: PaimonConnectorMetadata implements E7 — names from
  SystemTableLoader.SYSTEM_TABLES; sys table loaded via the existing
  getTable seam with a 4-arg Identifier(db,table,"main",sysName); sys
  handle carries sysTableName + forceJni (binlog/audit_log); shared
  PaimonTableResolver gives metadata + scan one sys-aware reload rule.
- T18: generic fe-core glue — PluginDrivenExternalTable centralizes handle
  acquisition into resolveConnectorTableHandle and delegates
  getSupportedSysTables to the connector; new PluginDrivenSysExternalTable
  (reports PLUGIN_EXTERNAL_TABLE) + PluginDrivenSysTable reuse the live
  SysTableResolver/NativeSysTable machinery (reusable by future connectors).
- T19: forceJni gate so binlog/audit_log go JNI not native; buildTableDescriptor
  -> HIVE_TABLE (also fixes a latent normal-table SCHEMA_TABLE descriptor gap,
  DV-024); PluginDrivenScanNode fail-loud guard rejects scan-params/time-travel
  on system tables.
- T20: first E5 MVCC consumer — beginQuerySnapshot/getSnapshotAt/getSnapshotById
  (empty table -> -1; sys handle -> empty) + SUPPORTS_MVCC_SNAPSHOT/TIME_TRAVEL
  capabilities. Inert until B5 wires the fe-core MvccTable consumer.

Decisions: D-039 (E7 reuses the live SysTable machinery; RFC §10's
$-suffix-via-getTableHandle design was never implemented and is superseded,
DV-023). Deviations: DV-023, DV-024.

Verification: import-gate 0; connector 124 tests pass (1 live skipped);
fe-core PluginDriven*Test 100 pass; checkstyle 0; no cutover/B5 leakage
(paimon not in SPI_READY_TYPES; PluginDrivenExternalTable still not an MvccTable).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ridge + time-travel + procedure doc no-op

B5a (MTMV/MVCC bridge): source-agnostic PluginDrivenMvccExternalTable (MTMVRelatedTableIf+MTMVBaseTableIf+MvccTable, D-042) wiring the B4-inert E5 snapshot SPI; PluginDrivenMvccSnapshot; list-partitions-at-snapshot.
B5b (time-travel): scan-pin + AS-OF + tag + branch + @incr across connector (ConnectorTimeTravelSpec, PaimonIncrementalScanParams) and fe-core; holistic review fixes RD-1 (partitioned time-travel empty-universe scan-all guard in PluginDrivenScanNode) + RD-2 (@incr lists-latest partitions/schema).
B6/T26: procedure doc no-op — zero migratable code; closed-form reject verified (ExecuteActionFactory:59-62 / CallFunc:42-43).
All inert/gated until B7 cutover (paimon NOT yet in SPI_READY_TYPES). Excludes regression-conf.groovy (secrets) + scratch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eview fixes

Combines all previously-uncommitted P5 paimon work into one commit (per request).

8 fullpath-review fixes (BLOCKERs + key MAJORs) — connector + SPI + fe-core bridge:
- FIX-STORAGE-CREDS: applyStorageConfig translates canonical s3.*/oss.*/AWS_* ->
  fs.s3a./fs.oss. (+DLF region->OSS endpoint)
- FIX-NATIVE-PARTVAL: per-type serializePartitionValue + session TZ (LTZ only);
  binary/varbinary drops the partition map (no [B@hash garbage)
- FIX-TZ-ALIAS: full legacy ZoneId.SHORT_IDS + 4 Doris overrides alias map
  (CST/PST/EST now resolve for FOR TIME AS OF datetime strings)
- FIX-TABLE-STATS: getTableStatistics override + PaimonCatalogOps.rowCount seam
  (normal AND system tables, via the sys-aware resolveTable)
- FIX-CPP-READER: honor enable_paimon_cpp_reader -> native DataSplit.serialize so
  BE's PaimonCppReader can decode the split
- FIX-READ-NOTNULL: mapFields forces read-path columns nullable (legacy parity)
- FIX-HMS-CONFRES: new ConnectorContext.loadHiveConfResources hook + 2-arg
  buildHmsHiveConf file-base merge (external hive-site.xml reaches the metastore)
- FIX-REST-VENDED: new ConnectorContext.vendStorageCredentials hook + scan-props
  vended AWS_* overlay (REST per-table tokens reach BE)
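
As a rough illustration of the FIX-TZ-ALIAS idea above, the sketch below layers an override map on top of the JDK's ZoneId.SHORT_IDS and resolves through ZoneId.of(String, Map). The class name and the single CST override are assumptions for illustration only; the commit's actual four overrides are not listed here.

```java
import java.time.ZoneId;
import java.util.HashMap;
import java.util.Map;

final class TimeZoneAliasSketch {
    // Start from the JDK's legacy short-ID table, then layer overrides on top.
    private static final Map<String, String> ALIASES = new HashMap<>(ZoneId.SHORT_IDS);

    static {
        // Hypothetical override (assumption): the real override list is not shown in the commit.
        ALIASES.put("CST", "Asia/Shanghai");
    }

    // Resolves an alias if one is registered; otherwise parses the ID as-is.
    static ZoneId resolve(String id) {
        return ZoneId.of(id, ALIASES);
    }
}
```

With this shape, an ID that is not in the alias map falls through to the normal ZoneId parser, so full region names keep working unchanged.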

Also carries the previously-uncommitted B7 core cutover + D-045/D-046 restores.

Tests: fe-connector-paimon 213 pass / 0 fail / 1 skip (live-gated); fe-core compiles +
DefaultConnectorContextVendTest 2/0. Each fix's root-cause/patch/UT and impl-time
corrections are in plan-doc/tasks/designs/P5-fix-<id>-design.md.

Excluded from this commit: regression-test/conf/regression-conf.groovy (plaintext Aliyun
keys, pending scrub) and scratch dirs (.audit-scratch/, conf.cmy/, META-INF/, *.bak).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…canonical scheme

Root cause: the paimon connector sent native ORC/Parquet data-file paths and
deletion-vector (DV) paths to BE un-normalized. The paimon SDK emits
warehouse-native schemes (oss://, cos://, obs://, s3a://, or the OSS
bucket.endpoint authority form); BE's scheme-dispatched S3 file factory only
recognizes s3://. On S3-compatible (non-AWS) warehouses this breaks native reads
outright (B-7DF, data file) and silently drops the DV so DELETEd rows reappear
(B-7DV, merge-on-read corruption). Legacy PaimonScanNode normalized both via the
2-arg LocationPath.of; the cutover dropped it. The two paths reach BE via
different mechanisms (data-file through PluginDrivenSplit's single-arg
LocationPath.of -> FileQueryScanNode:568; DV baked into thrift by the connector's
populateRangeParams), so a fe-core-bridge-only fix cannot reach the DV path.

Solution: new ConnectorContext.normalizeStorageUri SPI hook (identity default,
mirroring vendStorageCredentials), implemented in DefaultConnectorContext via the
engine's 2-arg normalizing LocationPath.of with the catalog's static storage map
(threaded via a new lazy supplier + 4-arg ctor; PluginDrivenExternalCatalog wires
it). The connector routes BOTH the data-file and DV paths through it inside the
extracted, unit-testable buildNativeRange. JNI path untouched (carries its own
FileIO). Fail-loud on un-normalizable paths (legacy parity). Static-vs-vended map
scope noted in DV-025 (the pure-vended edge belongs to credential fixes #2/#3).
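
A minimal sketch of the scheme rewrite described above, assuming a purely string-level normalization. The real hook goes through LocationPath.of with the catalog's storage map; the class below is hypothetical, does not handle the OSS bucket.endpoint authority form, and treats any non-object-store scheme as un-normalizable. Returning null/blank input unchanged is also an assumption.

```java
import java.util.Locale;
import java.util.Set;

final class StorageUriNormalizeSketch {
    // Schemes the sketch treats as S3-compatible (taken from the commit text, plus s3 itself).
    private static final Set<String> S3_COMPATIBLE = Set.of("s3", "s3a", "oss", "cos", "obs");

    static String normalize(String uri) {
        if (uri == null || uri.isBlank()) {
            return uri; // assumption: nothing to normalize
        }
        int sep = uri.indexOf("://");
        String scheme = sep > 0 ? uri.substring(0, sep).toLowerCase(Locale.ROOT) : "";
        if (!S3_COMPATIBLE.contains(scheme)) {
            // Fail loud rather than ship a path BE cannot dispatch.
            throw new IllegalArgumentException("Cannot normalize storage URI: " + uri);
        }
        // Keep authority and path, replace only the scheme.
        return "s3" + uri.substring(sep);
    }
}
```

Both the data-file path and the DV path would pass through the same function, which is the point of routing them through one hook.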

Tests: fe-core DefaultConnectorContextNormalizeUriTest (oss->s3, s3 idempotent,
null/blank, empty-map fail-loud); connector PaimonScanPlanProviderTest x3 (both
paths normalized + call count, DV-less, no-context raw). paimon module 216/0/0,
fe-core targeted green, checkstyle 0, import-gate clean. Live OSS+DV e2e CI-gated
(not run). SPI RFC section 21 (E13), deviations DV-025.

Also includes the round-2 review report + task list this fix derives from.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mark FIX-URI-NORMALIZE complete (commit 20b19d1) in the task list and update
HANDOFF: #1 summary + verification, next session starts at #2 (reuse the
normalizeStorageUri BE-scan-prop normalization seam), and the standing reminders
(regression-conf.groovy still holds a plaintext key -> path-whitelist only; P2
apache#8/apache#9 need user scope decision first).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…canonical AWS_*

Finding B-9 (BLOCKER, rereview2). The paimon connector copied static
catalog-level storage credentials/config verbatim into the BE scan-node
properties: PaimonScanPlanProvider.getScanNodeProperties iterated the raw
catalog properties and emitted location.<rawkey> for any s3./oss./cos./obs./
hadoop./fs./dfs./hive. prefix; the fe-core bridge only strips the location.
prefix. BE's native (FILE_S3) reader understands ONLY AWS_ACCESS_KEY/
AWS_SECRET_KEY/AWS_ENDPOINT/AWS_REGION/AWS_TOKEN, so static s3.access_key/
oss.access_key on a private bucket reached BE unintelligible -> no usable
credentials -> 403. This is the third credential seam (static->BE-scan),
missed by both the prior round and the 8 fixes (review §9.3); the catalog-
FileIO seam (FIX-STORAGE-CREDS) and the vended seam (FIX-REST-VENDED) were
already closed.

Root cause: legacy PaimonScanNode.getLocationProperties returns only
CredentialUtils.getBackendPropertiesFromStorageMap(storagePropertiesMap) (the
canonical AWS_*/hadoop/dfs map). The cutover replaced that single normalized
call with a raw prefix-copy loop; the connector cannot import fe-core's
StorageProperties so it had no access to the normalization.

Solution (D-048, user-signed full legacy-parity scope): new no-op-default SPI
ConnectorContext.getBackendStorageProperties(); DefaultConnectorContext returns
getBackendPropertiesFromStorageMap over the storagePropertiesSupplier already
wired in FIX-URI-NORMALIZE (no ctor change, CredentialUtils already imported).
The connector replaces its raw prefix-copy loop with a context-gated overlay of
that map; the vended overlay stays after it (vended wins on collision, legacy
precedence). Object-store creds -> AWS_*; HDFS -> canonical hadoop/dfs
(preserves user overrides + adds the legacy defaults, folding in the §211
MINOR); drops the non-parity hive.* passthrough. Investigated the
AWS_CREDENTIALS_PROVIDER_TYPE=ANONYMOUS two-step edge and confirmed via BE
s3_util.cpp (both providers prefer explicit ak/sk over cred_provider_type) that
it is harmless — no regression. Connector import-gate stays clean.
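
The translate-then-overlay order above can be sketched roughly as follows. This is not CredentialUtils.getBackendPropertiesFromStorageMap; the class, the helper, and the raw alias keys other than s3.access_key/oss.access_key are assumptions chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BackendStoragePropsSketch {
    // Translates raw catalog aliases to canonical AWS_* keys, then overlays vended props.
    static Map<String, String> toBackendProps(Map<String, String> catalogProps,
                                              Map<String, String> vended) {
        Map<String, String> out = new LinkedHashMap<>();
        copyFirst(catalogProps, out, "AWS_ACCESS_KEY", "s3.access_key", "oss.access_key");
        copyFirst(catalogProps, out, "AWS_SECRET_KEY", "s3.secret_key", "oss.secret_key");
        copyFirst(catalogProps, out, "AWS_ENDPOINT", "s3.endpoint", "oss.endpoint");
        copyFirst(catalogProps, out, "AWS_REGION", "s3.region", "oss.region");
        // Vended credentials are applied last, so they win on key collision.
        out.putAll(vended);
        return out;
    }

    // Copies the first non-blank alias value under the canonical key; raw aliases are never shipped.
    private static void copyFirst(Map<String, String> src, Map<String, String> dst,
                                  String canonical, String... aliases) {
        for (String alias : aliases) {
            String value = src.get(alias);
            if (value != null && !value.isBlank()) {
                dst.put(canonical, value);
                return;
            }
        }
    }
}
```

The raw prefix-copy loop that this replaces would have emitted oss.access_key verbatim, which the BE reader cannot interpret.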

Tests: fe-core DefaultConnectorContextBackendStoragePropsTest (OSS static creds
-> AWS_*, raw alias absent; no-supplier -> empty); connector
PaimonScanPlanProviderTest (+getScanNodePropertiesNormalizesStaticCreds raw
alias not shipped; modified vended-overlay collision to canonical keys; renamed
no-context test -> emits no storage props). Fail-before/pass-after proven by
reverting the connector change (2/3 go red). Module 217/0/0 (1 CI-gated skip),
checkstyle clean, import-gate clean. Live private-bucket native-read e2e is
CI-gated (not run). SPI RFC §22 (E14).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
morningman and others added 12 commits June 12, 2026 22:23
…tion value to NULL (P4)

Root cause: PaimonScanRange.populateRangeParams routed paimon partition values
through ConnectorPartitionValues.normalize, which applies Hive-directory
null-sentinel coercion (a value of "\N" or "__HIVE_DEFAULT_PARTITION__" -> isNull).
That coercion is correct for hudi (path-encoded partitions) but wrong for paimon:
paimon partition values are TYPED — serializePartitionValue returns Java-null for a
genuine null and the literal toString() otherwise — so a null is never a directory
sentinel, and the coercion only ever bites a genuine literal value. A string
partition column literally holding "\N" (which paimon does NOT reserve) or
"__HIVE_DEFAULT_PARTITION__" was materialized as SQL NULL instead of the literal on
the native ORC/Parquet read, diverging from legacy PaimonScanNode.setScanParams
(source/PaimonScanNode.java:323-326) and yielding wrong rows for WHERE col='\N' /
col IS NULL. The dominant genuine-NULL case is unaffected (both sides set isNull=true
and BE ignores the rendered value string when is_null==true,
partition_column_filler.h:40-44).

Fix (1 file): derive isNull from the Java null ONLY (render genuine null as "",
legacy-exact); drop the unused ConnectorPartitionValues import. ConnectorPartitionValues
itself is left untouched — hudi (HudiScanRange.java:226) legitimately needs the
Hive-directory coercion. The residual scan-vs-prune skew for a literal
"__HIVE_DEFAULT_PARTITION__" value lives in the generic fe-core prune bridge
(TablePartitionValues), is pre-existing and unchanged by this fix, and is logged as a
deviation.
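
The rule in the fix (null-ness comes from the Java null only, never from a sentinel string) can be sketched as below. The class and field names are hypothetical; the expected pairs follow the test description in this commit message.

```java
final class PartitionValueSketch {
    final boolean isNull;
    final String rendered;

    private PartitionValueSketch(boolean isNull, String rendered) {
        this.isNull = isNull;
        this.rendered = rendered;
    }

    // 'serialized' is the typed partition value already rendered to a string, or null.
    static PartitionValueSketch of(String serialized) {
        if (serialized == null) {
            // Genuine null: flag it and render an empty string.
            return new PartitionValueSketch(true, "");
        }
        // Any literal, including "\N" or "__HIVE_DEFAULT_PARTITION__", is kept verbatim.
        return new PartitionValueSketch(false, serialized);
    }
}
```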

Tests: new PaimonScanRangePartitionNullTest pins genuine-null -> (isNull=true, "");
literal "\N" -> (isNull=false, "\N"); literal "__HIVE_DEFAULT_PARTITION__" ->
(isNull=false, verbatim); ordinary -> kept. Fail-before (re-inlined coercion) reds the
literal + render rows; pass-after green. Full module 261/0/0 (1 CI-gated live skip),
checkstyle 0, import-gate clean. Adversarial review (5 angles) SAFE_TO_COMMIT: total
convergence of all 3 range builders on populateRangeParams; no query goes correct->wrong.
No BE/SPI change; native partition materialization otherwise covered by the CI-gated
legacy paimon partition regression.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…035])

Records the P4 cleanup pass disposition (P0–P4 now all clear):
- FIX-VARCHAR-BOUNDARY (N10.1) `bcee91dcb52` + FIX-PARTITION-NULL-SENTINEL
  `4b2c2190dc2` landed as independent fix commits.
- 15 items accepted as deviations (M5.1 transient-only + 14
  display/perf/text/inert/connector-more-correct/false-premise) → [DV-035].
- D-057 logs the user-signed scope; DV-035 the accepted batch.
- task-list §P4 marked done; HANDOFF rolled to next session (B8 legacy
  deletion or cross-connector follow-up batch).

Read-only adversarial recon `wf_6884d37b-8ef` re-verified all ~17 review §5/§7
items against current code; the sentinel ACCEPT verdict was refuted by a
prune-path skeptic (converted to FIX) and M5.1's "cheap fallback" premise was
refuted at impl level (confirmed ACCEPT).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… injection)

Next session = a third independent adversarial review of every paimon
connector functional path (basic read, @incr, time travel, branch/tag,
sys-tables, metadata cache, deletion vectors, multi-metastore, multi-storage,
Parquet/ORC native read, type mapping, and a legacy-logic/fallback sweep),
checking design + implementation delivery and diffing each path against the
legacy datasource/paimon/* reference (kept in-tree for side-by-side).

Hard constraint per user: do NOT inject accumulated development priors during
the find-and-judge phase — reviewers judge from current code + legacy only;
decisions-log / deviations-log / prior review reports / catalog-spi-p5-* memory
are consulted ONLY in a final reconciliation phase and must not suppress a
finding. B8 legacy deletion deferred until after this review.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rows on URI normalize (P9-1, BLOCKER)

Root cause: native ORC/Parquet reads on a Paimon REST catalog over object
storage (oss/cos/obs/s3a) threw during FE planning —
"StoragePropertiesException: No storage properties found for schema: oss".
PaimonScanPlanProvider.normalizeUri routed both the data-file path and the
deletion-vector path through ConnectorContext.normalizeStorageUri, which
normalizes via the catalog's STATIC storage map. That map is empty by design
for REST catalogs (vended creds are per-table/dynamic;
CatalogProperty.initStorageProperties seeds an empty map when vended creds are
enabled), so LocationPath.of(uri, {}) found no scheme entry and threw.
shouldUseNativeReader has no flavor gate, so every REST native read hit it;
the only escape was SET force_jni_scanner=true. DV-025 deferred this exact
corner to FIX-STATIC-CREDS-BE / FIX-REST-VENDED, but those fixed credential
down-flow to BE, not normalizeStorageUri — the deferral was never closed.

Legacy parity: PaimonScanNode.doInitialize computes a vended-overlay storage
map once (VendedCredentialsFactory.getStoragePropertiesMapWithVendedCredentials
— vended REPLACES the empty static map for REST) and uses it for
LocationPath.of at both the data-file (:443) and DV (:296) sites.

Solution: route the per-table vended token into native URI normalization,
replicating legacy precedence.
- SPI: add default overload ConnectorContext.normalizeStorageUri(uri, token)
  that ignores the token and delegates to the 1-arg form, so every non-paimon
  connector is unaffected.
- fe-core DefaultConnectorContext: extract the vended-typed-map build (filter
  cloud props -> StorageProperties.createAll -> index by Type) into a shared
  buildVendedStorageMap (single source of truth with vendStorageCredentials, no
  drift). The 2-arg override normalizes against the vended map when present and
  falls back to the static map otherwise (legacy "vended replaces static"); the
  1-arg form delegates with a null token (byte-identical to prior behavior).
  vendStorageCredentials keeps an outer try so its fail-soft boundary is
  preserved across the refactor.
- connector PaimonScanPlanProvider: extract the vended token ONCE per planScan
  (validToken() may refresh) and thread it through buildNativeRanges/
  buildNativeRange to both normalize sites. Empty for non-REST (FileIO gate) and
  offline -> folds to the static path, so non-REST reads are byte-unchanged.
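
A rough sketch of the two mechanisms in the bullets above: a default overload that ignores the token, and the "vended replaces static" map choice. The interface, the token type (a plain Map here), and the helper name are assumptions, not the actual SPI signatures.

```java
import java.util.Map;

interface UriNormalizerSketch {
    String normalizeStorageUri(String uri);

    // Default overload: connectors that never vend credentials are unaffected.
    default String normalizeStorageUri(String uri, Map<String, String> vendedToken) {
        return normalizeStorageUri(uri);
    }
}

final class StorageMapChoiceSketch {
    // Picks the vended map when it carries entries, otherwise falls back to the static map.
    static <K, V> Map<K, V> effectiveMap(Map<K, V> staticMap, Map<K, V> vendedMap) {
        return (vendedMap != null && !vendedMap.isEmpty()) ? vendedMap : staticMap;
    }
}
```

An engine-side implementation would override the two-argument form and normalize against effectiveMap(...), so an empty static map on a REST catalog no longer forces a failure.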

Tests:
- fe-core DefaultConnectorContextNormalizeUriTest (+3): vended-REST normalize
  under an empty static map (the gap that hid the bug twice); fail-loud when the
  token is also empty (proves the fix is the token, not a swallow); static-map
  path unaffected by an empty token.
- connector PaimonScanPlanProviderTest (+1, 5 call sites updated): the per-table
  vended token is threaded verbatim to BOTH the data-file and DV normalize calls
  (RecordingConnectorContext now captures the 2-arg token).
- The positive RESTTokenFileIO token-extraction path needs a live REST stack and
  remains E2E-gated (enablePaimonTest=false), not run here.
Verified: connector 42/0/0; fe-core NormalizeUri 7/0, Vend 2/0, BackendStorageProps 2/0;
checkstyle 0 across spi/paimon/fe-core; connector import-gate clean.
Design + adversarial red-team: plan-doc/FIX-REST-VENDED-URI-NORMALIZE-design.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…stead of real format (P7-1, MAJOR)

Root cause: PaimonScanPlanProvider.buildJniScanRange and buildCountRange
hardcoded .fileFormat("jni") on PaimonScanRange.Builder. The real
defaultFileFormat (= table.options().getOrDefault(file.format,"parquet"),
computed in planScanInternal) was passed into buildJniScanRange and IGNORED,
and was not passed into buildCountRange at all. PaimonScanRange
.populateRangeParams then emitted fileDesc.file_format="jni". BE
paimon_cpp_reader.cpp backfills paimon FILE_FORMAT/MANIFEST_FORMAT from this
field (only when unset/empty, guarded !file_format.empty()) to avoid defaulting
manifest.format=avro — with the invalid "jni" it injects MANIFEST_FORMAT=jni
(and FILE_FORMAT=jni when unset) and the manifest read breaks.

Key mechanism: the JNI formatType routing is gated by the paimon.split property
(PaimonScanRange.populateRangeParams), NOT by the fileFormat string (that string
drives formatType only on the native branch, where it is already real). So
emitting the real orc/parquet leaves JNI routing intact and only corrects the
inner fileDesc.file_format BE consumes — matching legacy
PaimonScanNode.setPaimonParams, which sets setFormatType(FORMAT_JNI) AND
setFileFormat(getFileFormat(...)) = the real data-file format.

Solution (connector-only, no BE change):
- buildJniScanRange: .fileFormat("jni") -> .fileFormat(defaultFileFormat) (the
  already-passed, previously-ignored parameter). Covers the non-DataSplit
  metadata-split call and the DataSplit JNI call.
- buildCountRange: add a defaultFileFormat parameter, use it, and thread it from
  the call site in planScanInternal.
- PaimonScanRange.Builder default: "jni" -> "" (every production caller sets the
  format explicitly; empty is the safe default — BE skips its format backfill on
  empty rather than ever injecting an invalid value).

Tests: PaimonScanPlanProviderTest (+1) jniAndCountRangesCarryRealFileFormatNotJni
— a real FileSystemCatalog PK table created with explicit file.format=orc (so
the asserted value is the table option, distinct from the parquet fallback):
force_jni_scanner=true scan -> every JNI data range carries "orc" (not "jni");
count-pushdown scan -> the collapsed count range carries "orc". Reverting either
method to "jni", or dropping the threaded defaultFileFormat, turns the assertion red.
Verified: connector 262/0/1skip (PaimonScanPlanProviderTest 43/0); checkstyle 0;
import-gate clean. Design: plan-doc/FIX-JNI-FILE-FORMAT-design.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…AJOR) done, next FIX-3

- FIX-1 FIX-REST-VENDED-URI-NORMALIZE committed c376aba
- FIX-2 FIX-JNI-FILE-FORMAT committed 2e845e8
- HANDOFF now points the next session at FIX-3 (FIX-INCR-SCAN-RESET) → FIX-4 (FE-config parity)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ode reset (P2-1, MAJOR)

Root cause: PaimonIncrementalScanParams.validate stripped legacy's defensive null
reset of scan.snapshot-id/scan.mode (PaimonScanNode:842-846), justified by a wrong
"a fresh per-query Table can't inherit scan.*" rationale. A base table that PERSISTS
scan.snapshot-id/scan.mode (legal & mutable via ALTER TABLE SET / TBLPROPERTIES /
table-default.*) carries it on every fresh load. Without the reset, resolveScanTable's
Table.copy merges the stale scan.snapshot-id with incremental-between and paimon 1.3.1
either THROWS ("[incremental-between] must be null when you set [scan.snapshot-id,
scan.tag-name]") or silently downgrades the @incr read to FROM_SNAPSHOT at the stale id
(wrong rows). The connector dropped exactly the safeguard legacy relied on.

Solution (Option 2; design red-team wf_ffd11631-ed2, DESIGN-SOUND): keep validate()
emitting only the non-null incremental-between* keys so the shared ConnectorMvccSnapshot
SPI / handle stay null-free, and reapply the two null resets at the single Table.copy
chokepoint via new PaimonIncrementalScanParams.applyResetsIfIncremental(scanOptions),
called in PaimonScanPlanProvider.resolveScanTable. paimon copyInternal consumes a null
value as options.remove(k), clearing the stale pin. The one edit covers BOTH callers
(native/JNI scan planScanInternal + JNI serialized-table getScanNodeProperties). Gated
on incremental-between / incremental-between-timestamp presence, so a genuine
scan.snapshot-id / scan.tag-name pin passes through unchanged (no false positive). Strict
legacy parity: resets scan.snapshot-id + scan.mode only. Corrected the now-refuted
"byte-parity on a freshly-loaded base" rationale in the affected javadoc/comments.
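
The gated reset can be sketched as below, assuming scan options travel as a plain String map. The class is hypothetical and only mirrors the behavior described above: null resets for scan.snapshot-id and scan.mode are added only when an incremental-between key is present.

```java
import java.util.HashMap;
import java.util.Map;

final class IncrementalResetSketch {
    static Map<String, String> applyResetsIfIncremental(Map<String, String> scanOptions) {
        if (scanOptions == null || scanOptions.isEmpty()) {
            return scanOptions; // no-op for empty/null
        }
        boolean incremental = scanOptions.containsKey("incremental-between")
                || scanOptions.containsKey("incremental-between-timestamp");
        if (!incremental) {
            return scanOptions; // a genuine snapshot/tag pin passes through unchanged
        }
        // HashMap permits null values; a null value is meant to clear a persisted option on copy.
        Map<String, String> out = new HashMap<>(scanOptions);
        out.put("scan.snapshot-id", null);
        out.put("scan.mode", null);
        return out;
    }
}
```

Keeping the nulls out of validate() and adding them only at the copy site is what lets the shared snapshot handle stay null-free.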

Tests: PaimonIncrementalScanParamsTest +4 (helper seeds the null resets for snapshot and
timestamp windows; passes non-incremental pins through unchanged; no-op for empty/null)
and reworded the keep-null-free validate() test; PaimonScanPlanProviderTest +1 real-table
(FileSystemCatalog over a persisted scan.snapshot-id), proven fail-before (paimon throws)
/ pass-after; PaimonConnectorMetadataMvccTest WHY-comment reworded (assertions unchanged).
Connector suites 20/44/37 green; checkstyle 0; import-gate clean. Connector-only — no SPI,
no BE change. Live @incr-over-persisted-scan.snapshot-id E2E is CI-gated (enablePaimonTest
=false), noted as gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
FIX-3 FIX-INCR-SCAN-RESET committed f08bc22. Adds FIX-INCR-SCAN-RESET-summary.md,
marks FIX-3 done in the task-list, rolls HANDOFF to FIX-4 (FIX-FECONF-STORAGE-PARITY).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l legacy parity (P8-1..4, P9-2/3)

Root cause: the connector cannot import fe-core, so PaimonCatalogFactory rebuilds the FE-side Hadoop
Configuration/HiveConf from raw props with literal key logic. That reconstruction was incomplete vs the
legacy *Properties classes, so paimon catalogs on several storage backends failed FE-side catalog/metadata
access (the live FileSystemCatalog/HiveCatalog/JdbcCatalog could not resolve the storage FileIO).

Solution (connector-only; no fe-core/SPI/BE change):
- Extract a shared applyS3aBaseConfig helper (port of AbstractS3CompatibleProperties.appendS3HdfsProperties)
  taking caller-resolved creds AND the 4 tuning values, so each scheme passes its OWN aliases/defaults.
- 4a OSS: derive fs.oss.endpoint from region when blank (oss-<region>[-internal].aliyuncs.com, default
  -internal, publicAccess from dlf.access.public/dlf.catalog.accessPublic), MOVED from the DLF-local block
  into the shared OSS block (so filesystem+hms flavors get it too); also emit the S3A base for OSS.
  Removed the now-dead DLF-local derivation block.
- 4b S3: emit fs.s3a.path.style.access + connection.maximum/request.timeout/timeout. Tuning defaults are
  per-backend: S3=50/3000/1000 (incl AWS_* alias twins), OSS/COS/OBS=100/10000/10000 (a single shared
  default would silently mis-tune AWS S3).
- 4c COS/OBS: new applyCanonicalCosConfig/ObsConfig. Detection mirrors legacy guessIsMe (endpoint/warehouse
  PATTERN: myqcloud.com / myhuaweicloud.com) OR a cos./obs.-prefixed key, NOT scheme-key-only (a cosn://
  catalog configured with only s3.endpoint=cos...myqcloud.com would be missed otherwise). Each emits the
  S3A base (cosn/obs FS impl is S3AFileSystem, which reads fs.s3a.*) THEN the unconditional fs.cosn.* /
  fs.obs.* keys; OBS prefers the native OBSFileSystem when classpath-available.
- S3 endpoint-from-region (user-approved, same defect class as the OSS P8-1 fix): region-only AWS S3 derives
  https://s3.<region>.amazonaws.com.
- 4d HMS username: resolve hadoop.username from firstNonBlank(hive.metastore.username, hadoop.username)
  (alias priority), run AFTER the storage overlay so the raw hadoop.* passthrough cannot clobber it.
- 4e (folded in, pre-existing MAJOR found in impl review): the kerberos block forced
  hadoop.security.authentication=kerberos before applyStorageConfig, so a kerberized-HMS + simple-HDFS
  catalog had it clobbered back to simple by the raw hadoop.* passthrough (auth=simple but sasl=true ->
  broken GSSAPI). Relocated the kerberos block to run AFTER the overlay, mirroring legacy
  initHadoopAuthenticator-last ordering.
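
The endpoint-from-region rules and the alias-priority lookup above reduce to small string helpers. A minimal sketch under the patterns quoted in this message; the class and method names are hypothetical and the real code also handles the dlf.* public-access flags.

```java
final class EndpointFromRegionSketch {
    // oss-<region>[-internal].aliyuncs.com; internal is the default per the commit text.
    static String ossEndpoint(String region, boolean publicAccess) {
        return "oss-" + region + (publicAccess ? "" : "-internal") + ".aliyuncs.com";
    }

    // Region-only AWS S3: https://s3.<region>.amazonaws.com
    static String s3Endpoint(String region) {
        return "https://s3." + region + ".amazonaws.com";
    }

    // Returns the first candidate that is neither null nor blank, or null if none qualifies.
    static String firstNonBlank(String... candidates) {
        for (String candidate : candidates) {
            if (candidate != null && !candidate.isBlank()) {
                return candidate;
            }
        }
        return null;
    }
}
```

For the HMS username, the call would look like firstNonBlank(props.get("hive.metastore.username"), props.get("hadoop.username")), evaluated after the storage overlay so a raw passthrough cannot overwrite the result.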

Design red-team (wf_a6385c61-669, 5 skeptics + completeness critic) caught the divergent tuning defaults,
the endpoint-pattern detection gap, and the unconditional fs.cosn.*/fs.obs.* requirement before coding;
impl verification (wf_f90260cb-5e6) confirmed byte-for-byte legacy key/alias/default fidelity and found 4e.

Tests: PaimonCatalogFactoryTest +15 (S3 endpoint-from-region, S3 50/3000/1000 tuning, path-style, OSS
endpoint-from-region filesystem+hms, OSS S3A base, COS keys + pattern-detect + unconditional region, OBS
keys + pattern-detect, no-COS/OBS-for-plain-S3, HMS username alias + priority, kerberos-survives-simple-HDFS).
The priority + kerberos tests are RED on the pre-move ordering. Verified: connector 56/0/0 +
full module green; checkstyle 0; import-gate clean. Live e2e (paimon_base_filesystem/dlf/hms suites) CI-gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ne; all 4 round-3 fixes complete, next B8

- Mark FIX-4 done (commit f0210b5) in task-list-P5-rereview3-fixes.md; record the beyond-literal-scope
  items (user-approved S3 endpoint-from-region, per-backend tuning defaults, endpoint-pattern detection,
  unconditional fs.cosn.*/fs.obs.*, folded-in 4e kerberos-ordering MAJOR) and the known out-of-scope residual.
- Add FIX-FECONF-STORAGE-PARITY-summary.md.
- Roll HANDOFF: all 4 user-approved round-3 fixes (FIX-1..FIX-4) complete; next session = B8 legacy deletion
  (paimon/* + *Properties dead residue, now that FIX-4 no longer needs them as a literal-port reference)
  + round-3 follow-ups (D-057 re-scope, accepted-deviation sign-off, uncheckedFallbacks), gated on an
  AskUserQuestion scope check since B8 is a large change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…oop FS closure

Root cause: the Paimon connector plugin runs under a child-first ClassLoader with
org.apache.hadoop NOT parent-first, and bundled hadoop-common/hadoop-client-api but
NOT hadoop-aws. So FileSystem/SecurityUtil loaded child-first while S3AFileSystem
resolved from the parent 'app' loader -> cross-loader ClassCastException
('S3AFileSystem cannot be cast to FileSystem') and a permanent SecurityUtil.<clinit>
poison ('Could not initialize class ...SecurityUtil', 'DNSDomainNameResolver not
DomainNameResolver', 'ServiceConfigurationError: NullScanFileSystem not a subtype'),
cascading to 'Unknown database X'. ~39 of 42 external-regression suites failed on the
af2037 TeamCity run; not fixed by any later commit.

Solution (self-contained plugin — aligns with fe-core dropping hadoop/hive-catalog-shade
after full connector migration; does NOT lean on the parent):
- pom: add hadoop-aws (the only missing FS impl, S3AFileSystem; DistributedFileSystem
  already comes from the transitive hadoop-client-api). hive-common stays bundled.
- PaimonCatalogFactory.buildHadoopConfiguration: conf.setClassLoader(plugin loader) so
  Configuration.getClass("fs.<scheme>.impl") resolves the FS impl from the plugin loader.
- PaimonConnector.createCatalogFromContext (single chokepoint for all flavors): pin the
  thread-context classloader to the plugin loader around catalog creation so the
  FileSystem ServiceLoader and SecurityUtil static init resolve from the child. Mirrors
  JdbcConnectorClient / ThriftHmsClient.
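
Pinning the thread-context classloader around a call usually follows a set/try/finally-restore shape. A generic sketch, assuming a hypothetical helper name; the actual change lives in PaimonConnector.createCatalogFromContext.

```java
import java.util.function.Supplier;

final class ContextClassLoaderSketch {
    // Runs 'action' with 'loader' as the thread-context classloader, then restores the previous one.
    static <T> T callWith(ClassLoader loader, Supplier<T> action) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return action.get();
        } finally {
            // Always restore, even if catalog creation throws.
            current.setContextClassLoader(previous);
        }
    }
}
```

ServiceLoader lookups and static initializers that consult the context classloader then resolve from the plugin loader for the duration of the call only.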

Tests: connector build SUCCESS + all connector UTs 0 fail/0 error; plugin lib/ now
contains hadoop-aws/S3AFileSystem; checkstyle + connector import-gate clean. The full
runtime proof is the docker external paimon suite (CI-gated, enablePaimonTest) — not run
locally. See plan-doc/FIX-PAIMON-HADOOP-CLASSLOADER-{design,summary}.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ROPERTIES to paimon

Root cause: branch commit 98a73bf (D-046 paimon parity) added LOCATION+PROPERTIES
emission to the SHARED PLUGIN_EXTERNAL_TABLE branch of Env.getDdlStmt, gated only on
!properties.isEmpty(). JDBC/ES/Trino catalogs are plugin-driven with non-empty
getTableProperties() (connection props incl. credentials), so SHOW CREATE TABLE on a JDBC
external table emitted LOCATION '' + PROPERTIES("password"=...) instead of the legacy
comment-only ENGINE=JDBC_EXTERNAL_TABLE; — a correctness regression
(test_nereids_refresh_catalog) and a JDBC credential leak. Still present on HEAD.

Solution: gate the LOCATION+PROPERTIES emission additionally on
TableType.PAIMON_EXTERNAL_TABLE.name().equals(getEngineTableTypeName()) — only the paimon
engine type (the sole plugin-driven connector whose legacy DDL carried LOCATION/PROPERTIES)
renders them. JDBC/ES/Trino/MaxCompute revert to comment-only; the credential leak is
closed. Did NOT rebaseline the .out (would entrench the leaked-credential output).

Tests: fe-core compile SUCCESS + checkstyle clean; adversarial static review SOUND (paimon
incl. sys-table unwrap still renders LOCATION/PROPERTIES; jdbc/es/trino/maxcompute match
committed comment-only .out; getTableProperties has no other DDL consumer). e2e:
external_table_p0/nereids_commands/test_nereids_refresh_catalog (CI external pipeline). See
plan-doc/FIX-SHOWCREATE-PLUGIN-PROPS-{design,summary}.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@morningman morningman force-pushed the master-catalog-spi-07-paimon branch from af2037c to f7114a2 Compare June 12, 2026 14:23
@morningman

Contributor Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 29192 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f7114a2836e0657e3eee70b4a9e30bf651a6354f, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17699	4052	3991	3991
q2	q3	10784	1392	806	806
q4	4678	468	339	339
q5	7529	851	579	579
q6	184	171	136	136
q7	770	846	612	612
q8	10226	1537	1610	1537
q9	7135	4482	4493	4482
q10	6768	1895	1528	1528
q11	437	268	258	258
q12	666	429	287	287
q13	18210	3293	2766	2766
q14	259	260	242	242
q15	q16	829	777	706	706
q17	975	990	943	943
q18	6919	5775	5617	5617
q19	1164	1323	1082	1082
q20	516	400	266	266
q21	5960	2696	2684	2684
q22	462	367	331	331
Total cold run time: 102170 ms
Total hot run time: 29192 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4862	4618	4668	4618
q2	q3	4904	5193	4651	4651
q4	2111	2193	1379	1379
q5	4861	4761	4717	4717
q6	234	178	127	127
q7	1853	1721	1560	1560
q8	2475	1987	1934	1934
q9	7308	7358	7379	7358
q10	4731	4684	4215	4215
q11	535	385	358	358
q12	724	735	520	520
q13	2979	3403	2796	2796
q14	272	272	241	241
q15	q16	677	694	618	618
q17	1278	1253	1249	1249
q18	7311	6844	6818	6818
q19	1099	1076	1065	1065
q20	2213	2224	1952	1952
q21	5203	4573	4428	4428
q22	529	465	403	403
Total cold run time: 56159 ms
Total hot run time: 51007 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 167905 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f7114a2836e0657e3eee70b4a9e30bf651a6354f, data reload: false

query5	4339	623	471	471
query6	436	188	171	171
query7	4829	528	297	297
query8	359	212	193	193
query9	8733	4105	4093	4093
query10	429	309	248	248
query11	5998	2395	2134	2134
query12	156	104	97	97
query13	1287	602	416	416
query14	6404	5441	5095	5095
query14_1	4436	4438	4411	4411
query15	207	201	176	176
query16	1043	437	430	430
query17	1125	707	588	588
query18	2576	483	350	350
query19	199	191	144	144
query20	116	106	108	106
query21	216	137	118	118
query22	13659	13616	13365	13365
query23	17372	16497	16163	16163
query23_1	16226	16187	16297	16187
query24	7648	1770	1308	1308
query24_1	1314	1292	1329	1292
query25	544	429	360	360
query26	1340	318	160	160
query27	2631	567	336	336
query28	4451	2013	2022	2013
query29	1041	601	462	462
query30	307	242	193	193
query31	1117	1067	952	952
query32	106	61	59	59
query33	505	303	255	255
query34	1158	1151	669	669
query35	765	773	666	666
query36	1384	1369	1274	1274
query37	152	100	85	85
query38	3185	3148	3045	3045
query39	919	915	916	915
query39_1	874	869	902	869
query40	216	119	96	96
query41	63	60	61	60
query42	92	93	93	93
query43	330	320	282	282
query44	
query45	191	180	175	175
query46	1123	1213	748	748
query47	2333	2367	2214	2214
query48	366	430	276	276
query49	626	456	341	341
query50	1084	345	260	260
query51	4277	4324	4265	4265
query52	84	86	75	75
query53	234	256	187	187
query54	266	214	191	191
query55	78	80	68	68
query56	229	242	208	208
query57	1405	1383	1301	1301
query58	235	200	197	197
query59	1582	1644	1462	1462
query60	273	228	221	221
query61	151	152	147	147
query62	692	650	577	577
query63	239	183	192	183
query64	2506	750	594	594
query65	
query66	1792	457	336	336
query67	29701	29720	29509	29509
query68	
query69	432	307	260	260
query70	982	955	970	955
query71	310	221	209	209
query72	2847	2842	2411	2411
query73	839	797	447	447
query74	5116	4976	4747	4747
query75	2634	2579	2218	2218
query76	2327	1148	787	787
query77	352	369	275	275
query78	12306	12402	11764	11764
query79	1258	1018	748	748
query80	499	461	371	371
query81	449	277	238	238
query82	231	166	118	118
query83	264	265	241	241
query84	
query85	810	494	414	414
query86	341	294	284	284
query87	3382	3315	3249	3249
query88	3583	2726	2677	2677
query89	402	381	343	343
query90	2157	177	167	167
query91	168	163	132	132
query92	61	59	55	55
query93	1436	1485	952	952
query94	538	346	316	316
query95	677	447	328	328
query96	1056	840	354	354
query97	2685	2688	2560	2560
query98	209	208	204	204
query99	1138	1180	1025	1025
Total cold run time: 249686 ms
Total hot run time: 167905 ms

@hello-stephen

Contributor

FE Regression Coverage Report

Increment line coverage 39.47% (435/1102) 🎉
Increment coverage report
Complete coverage report

morningman and others added 9 commits June 13, 2026 06:05
…ma-cache (CI 968828)

Root cause: PluginDrivenSysExternalTable did not override getSchemaCacheValue(), so it
inherited ExternalTable.getSchemaCacheValue() which routes through ExternalCatalog.getSchema()
and re-resolves the table by name in the db map. A transient system table (e.g. tbl$snapshots /
tbl$manifests) is never registered in that map, so the lookup failed with "failed to load schema
cache value for: ...$snapshots". Regression from the paimon SPI migration; legacy
PaimonSysExternalTable avoided it by overriding getSchemaCacheValue()/initSchema() to compute on
the transient instance.

Solution: override getSchemaCacheValue() (and initSchema(SchemaCacheKey)) to compute the schema
directly via the inherited PluginDrivenExternalTable.initSchema() (which honors this class's
resolveConnectorTableHandle that threads the sys-table handle), memoized with double-checked
locking, mirroring legacy PaimonSysExternalTable.
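
The memoization described above can be sketched as follows. This is a minimal illustration, not the Doris code: `TransientSchemaHolder` and its `Supplier`-based loader are hypothetical stand-ins for the transient system table and its inherited `initSchema()`, and the schema is modelled as a plain list of column names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for a transient system table; not the Doris class. */
final class TransientSchemaHolder {
    // Assumed loader: computes the schema on this instance instead of
    // looking the table up by name in a catalog-level map.
    private final Supplier<List<String>> schemaLoader;

    // volatile so that a fully built value is safely visible to other threads.
    private volatile List<String> cachedSchema;

    TransientSchemaHolder(Supplier<List<String>> schemaLoader) {
        this.schemaLoader = schemaLoader;
    }

    List<String> getSchema() {
        List<String> local = cachedSchema;      // first check, without the lock
        if (local == null) {
            synchronized (this) {
                local = cachedSchema;           // second check, under the lock
                if (local == null) {
                    local = schemaLoader.get(); // runs at most once per instance
                    cachedSchema = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        TransientSchemaHolder table = new TransientSchemaHolder(
                () -> Arrays.asList("snapshot_id", "commit_time"));
        System.out.println(table.getSchema());
        System.out.println(table.getSchema() == table.getSchema());
    }
}
```

Because the value lives on the instance, no name-based lookup is needed, which is the property the fix relies on for tables that are never registered in the db map.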

Tests: covered by existing e2e suites paimon_system_table ($manifests), paimon_time_travel
($snapshots), test_paimon_system_table_auth (re-run in CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…68828)

Root cause: PaimonConnectorMetadata.mapFields built ConnectorColumn via the 5-arg ctor, which
defaults isKey=false; ConnectorColumnConverter propagates it, so DESC showed Key=false for every
paimon column. Legacy PaimonExternalTable/PaimonSysExternalTable always set Column isKey=true (3rd
positional arg) for every column, so the .out files expect Key=true. Caused test_paimon_schema_change,
test_paimon_char_varchar_type, test_paimon_timestamp_with_time_zone DESC diffs.

Solution: pass isKey=true via the 6-arg ConnectorColumn ctor in mapFields (single chokepoint for
latest + at-snapshot + system-table schema paths; toSchemaCacheValue preserves isKey on remap).

Tests: extended PaimonConnectorMetadataTest.getTableSchemaForcesColumnsNullableForLegacyParity to
pin isKey=true for both a PK and a non-PK column.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… split (CI 968828)

Root cause: the paimon (and hudi) plugin-zip bundled org.apache.thrift:libthrift and loaded
org.apache.thrift.* child-first (not in the connector parent-first allowlist), while fe-thrift is
provided so org.apache.doris.thrift.TFileScanRangeParams resolves parent-first and implements the
PARENT's TBase. PaimonScanPlanProvider.encodeSchemaEvolution()'s TSerializer.serialize(carrier)
then mixes a child TSerializer with a parent-TBase carrier -> IncompatibleClassChangeError. Being an
Error (not Exception), it escaped catch(Exception) and the connection handler, killing the mysql
session. This was the dominant CI failure (~19 tests: 2 ANALYZE, the family-D connection drops, and
the predict/timestamp_tz/sql_block_rule explain failures).

Solution:
- Exclude org.apache.doris:fe-thrift + org.apache.thrift:libthrift from the paimon and hudi
  plugin-zip assemblies, so org.apache.thrift.* resolves from the single parent fe-core copy that
  also owns org.apache.doris.thrift.* (matches the es/jdbc/hive/maxcompute assemblies).
- Defense-in-depth: broaden encodeSchemaEvolution's catch to Exception | LinkageError so any future
  linkage error surfaces as a clean per-query failure instead of an uncaught Error that kills the
  whole connection (this is what turned ~5 real failures into ~19 collateral ones).
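
Why the broader catch matters can be shown with a small sketch. The names here (`LinkageSafeEncoder`, `Encoder`, `QueryFailedException`) are hypothetical and only model the serialization step; the point is that `IncompatibleClassChangeError` is a `LinkageError`, so `catch (Exception e)` alone never sees it.

```java
/** Hypothetical sketch; models only the catch clause, not the Doris scan provider. */
final class LinkageSafeEncoder {

    /** Assumed stand-in for the thrift serialization step. */
    interface Encoder {
        String encode() throws Exception;
    }

    /** Unchecked failure meant to fail one query rather than the whole session. */
    static final class QueryFailedException extends RuntimeException {
        QueryFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static String encodeOrFailQuery(Encoder encoder) {
        try {
            return encoder.encode();
        } catch (Exception | LinkageError e) {
            // LinkageError covers IncompatibleClassChangeError, NoClassDefFoundError, etc.
            throw new QueryFailedException("schema evolution encoding failed: " + e, e);
        }
    }

    public static void main(String[] args) {
        try {
            encodeOrFailQuery(() -> {
                throw new IncompatibleClassChangeError("serializer and carrier from different loaders");
            });
        } catch (QueryFailedException e) {
            System.out.println("query failed cleanly: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```

Other `Error` subclasses such as `OutOfMemoryError` are deliberately left uncaught in this sketch, since only linkage problems are the target.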

Verified: rebuilt paimon and hudi plugin zips no longer contain libthrift/fe-thrift.
Tests: e2e re-run in CI (the native-path paimon suites).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ilter scans (CI 968828)

Root cause: on the SPI plugin scan path, PaimonScanPlanProvider.getScanNodeProperties emitted the
paimon.predicate property only when filter.isPresent() && !predicates.isEmpty(), and
populateScanLevelParams set the thrift field only when non-null. So a paimon read with no
pushed-down filter (e.g. force_jni_scanner=true `select *`) omitted paimon_predicate entirely; BE
then omitted the JNI key, and PaimonJniScanner.getPredicates() called PaimonUtils.deserialize(null)
-> NPE "encodedStr is null". Legacy PaimonScanNode.createScanRangeLocations always serialized the
(possibly empty) predicate list, so the field was always present. Caused test_paimon_catalog_varbinary,
paimon_tb_mix_format, paimon_partition_legacy, paimon_timestamp_types, test_paimon_partition_table.

Solution:
- getScanNodeProperties always serializes the predicate list (empty list -> non-null base64 string)
  and emits paimon.predicate unconditionally, restoring the legacy invariant.
- BE backstop: PaimonJniScanner.getPredicates() treats a null paimon_predicate param as "no filter"
  (returns emptyList) so the JNI reader never NPEs on a missing param.
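
Both halves of this fix can be sketched together. This is an illustration under stated assumptions: `PredicateCodec` is a hypothetical helper, predicates are modelled as strings, and plain Java serialization plus Base64 stands in for whatever encoding the real scanner uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Collections;
import java.util.List;

/** Hypothetical codec; not the Paimon or Doris implementation. */
final class PredicateCodec {

    /** Producer side: always returns a non-null string, even for an empty list. */
    static String encode(List<String> predicates) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(predicates));
        } catch (IOException e) {
            throw new IllegalStateException("failed to encode predicates", e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Consumer side: a missing parameter means "no filter" instead of an NPE. */
    @SuppressWarnings("unchecked")
    static List<String> decode(String encoded) {
        if (encoded == null) {
            return Collections.emptyList();
        }
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (List<String>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("failed to decode predicates", e);
        }
    }

    public static void main(String[] args) {
        String encoded = encode(Collections.emptyList());
        System.out.println(decode(encoded).isEmpty());
        System.out.println(decode(null).isEmpty());
    }
}
```

Either half alone would avoid the NPE; keeping both means the reader does not depend on every producer honoring the invariant.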

Tests: PaimonScanPlanProviderTest.getScanNodePropertiesAlwaysEmitsPredicateForNoFilterScan pins that
a no-filter scan emits paimon.predicate and it deserializes to an empty list.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
8-family root-cause analysis (adversarially verified) of the 37 external-regression failures.
7 in-scope paimon-SPI regressions + 2 out-of-scope (hive CTAS stale test; BE shutdown ASAN race).
RC-1/2/6/7 fixed (contained); RC-3/4/5 deferred to the docker-gated self-contained-classloader batch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…imon plugin (CI 968828)

Root cause: the connector sets fs.oss.impl=com.aliyun.jindodata.oss.JindoOssFileSystem, but that impl
ships only in the thirdparty jindofs jars (packaged by post-build.sh into fe/lib/jindofs, not a maven
artifact). The paimon plugin runs child-first, so JindoOssFileSystem resolves from the parent and
cannot be cast to the plugin's child-loaded org.apache.hadoop.fs.FileSystem -> "JindoOssFileSystem
cannot be cast to FileSystem" -> "Unknown database" on first OSS listing (paimon_base_filesystem,
test_paimon_deletion_vector_oss). The maven route is unbuildable (jindo-sdk/jindo-core are bound to an
undeclared jindodata repo -> "present but unavailable"; runtime jindofs is 6.10.4, not in maven).

Solution: after deploying the connector plugins, copy the jindofs jars (already placed in fe/lib/jindofs
by post-build.sh) into the paimon plugin lib so JindoOssFileSystem loads child-first alongside the
plugin's own hadoop FileSystem. Naturally gated (no-op unless --jindofs/DISABLE_BUILD_JINDOFS=OFF).

CAVEAT (docker-gated, enablePaimonTest=true): jindo-core ships a native lib that binds to one
classloader per JVM, so this is safe only while no concurrent non-paimon path loads jindo from
fe/lib/jindofs in the same FE process. This must be confirmed by the docker paimon suite.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…on plugin (CI 968828)

Root cause: the prior fix (FIX-PAIMON-HADOOP-CLASSLOADER) bundled hadoop-aws into the plugin
(S3AFileSystem child-first) but NOT the AWS SDK v2 (hadoop-aws declares it as software.amazon.awssdk:bundle,
which fe/pom.xml excludes). So the plugin's S3AInternalAuditConstants.<clinit> registered an
ExecutionAttribute against the single PARENT-loaded sdk-core static, colliding with fe-core's S3A in
ExecutionAttribute.ensureUnique() -> ExceptionInInitializerError that permanently poisoned S3A for the
whole FE JVM (test_iceberg_jdbc_catalog/statistics/case_sensibility, test_paimon_statistics).

Solution: bundle the AWS SDK v2 (software.amazon.awssdk:s3 + apache-client, BOM-managed 2.29.52) into the
plugin child-first, so the plugin's S3A registers against its OWN ExecutionAttribute static. s3's compile
closure brings sdk-core (ExecutionAttribute); apache-client is explicit (hadoop-aws wires ApacheHttpClient).
software.amazon.awssdk stays child-first (not parent-first); the separate child SDK copy is the point.

Verified: rebuilt plugin zip bundles lib/sdk-core-2.29.52.jar containing
software/amazon/awssdk/core/interceptor/ExecutionAttribute.class. Runtime S3A read + assumed-role/STS
docker-gated (enablePaimonTest=true).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… client (CI 968828)

Root cause: paimon-hive-connector's RetryingMetaStoreClientFactory probes getProxy(HiveConf,...) via
reflection, but RetryingMetaStoreClient/HiveMetaHookLoader resolved from the parent hive-catalog-shade-3.1.1
whose getProxy overloads use the PARENT's Configuration/HiveConf Class objects -> exact Class-identity
mismatch across loaders -> all probes NoSuchMethodException -> "Failed to create the desired metastore
client" (test_create_paimon_table). The metastore itself is reachable.

Solution: bundle org.apache.hive:hive-metastore:2.3.7 (RetryingMetaStoreClient/HiveMetaStoreClient/
HiveMetaHookLoader + metastore api) child-first so its getProxy(HiveConf,...) overloads compile against the
SAME child-bundled hive-common-2.3.9 HiveConf the connector builds. 2.3.7 pairs with hive-common 2.3.9
(API-stable HiveConf) and is fastutil-CLEAN, so unlike hive-catalog-shade it does not reintroduce the
fastutil collision. libfb303 rides transitively; server-side datanucleus/derby/hbase/tephra, the stale
hadoop-2.7.2 trio + guava, and libthrift are excluded (libthrift stays parent-first like the other
connectors).

Verified: rebuilt plugin zip bundles lib/hive-metastore-2.3.7.jar (RetryingMetaStoreClient with 5
getProxy(HiveConf) overloads) + libfb303; 0 fastutil entries; no hadoop-2.7.2 leak. The thrift
0.9.3-vs-host-0.16.0 wire skew and the DLF ProxyMetaStoreClient path are docker-gated (enablePaimonTest=true).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
RC-3 AWS SDK (b5205c4), RC-5 HMS client (7841830), RC-4 jindo via build.sh (e881247).
Runtime behavior gated on the docker paimon suite (enablePaimonTest=true).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@morningman

Contributor Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 28510 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d7d63df5179caab15b965147774e3593c576585e, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17772	4054	4027	4027
q2	q3	10737	1387	814	814
q4	4684	477	335	335
q5	7537	886	578	578
q6	179	165	134	134
q7	776	846	601	601
q8	9494	1553	1523	1523
q9	6506	4412	4434	4412
q10	6819	1806	1519	1519
q11	439	271	245	245
q12	633	424	294	294
q13	18151	3616	2743	2743
q14	264	251	234	234
q15	q16	816	763	699	699
q17	921	908	962	908
q18	6931	5677	5475	5475
q19	1533	1173	1037	1037
q20	516	413	261	261
q21	5855	2615	2361	2361
q22	428	355	310	310
Total cold run time: 100991 ms
Total hot run time: 28510 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4383	4237	4264	4237
q2	q3	4469	4944	4334	4334
q4	2053	2165	1338	1338
q5	4477	4279	4258	4258
q6	225	172	125	125
q7	1693	1624	1727	1624
q8	2597	2154	2137	2137
q9	7849	7968	7903	7903
q10	4774	4724	4276	4276
q11	573	443	392	392
q12	778	749	529	529
q13	3425	3492	2929	2929
q14	300	297	299	297
q15	q16	721	787	628	628
q17	1341	1296	1333	1296
q18	8474	7420	7172	7172
q19	1135	1113	1060	1060
q20	2208	2224	1928	1928
q21	5279	4562	4453	4453
q22	519	454	429	429
Total cold run time: 57273 ms
Total hot run time: 51345 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 168080 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d7d63df5179caab15b965147774e3593c576585e, data reload: false

query5	4323	614	499	499
query6	433	186	170	170
query7	5205	594	311	311
query8	364	214	207	207
query9	8749	4096	4082	4082
query10	457	327	269	269
query11	5948	2357	2126	2126
query12	154	101	97	97
query13	1263	615	463	463
query14	6384	5432	5111	5111
query14_1	4442	4427	4410	4410
query15	208	198	184	184
query16	993	457	436	436
query17	1138	704	594	594
query18	2445	491	360	360
query19	210	187	143	143
query20	117	111	107	107
query21	218	139	116	116
query22	13600	13585	13397	13397
query23	17408	16495	16194	16194
query23_1	16341	16251	16228	16228
query24	7489	1784	1302	1302
query24_1	1328	1297	1282	1282
query25	541	444	374	374
query26	1307	290	161	161
query27	2715	572	320	320
query28	4419	1990	1995	1990
query29	1052	603	486	486
query30	304	237	194	194
query31	1120	1048	956	956
query32	102	58	56	56
query33	518	301	242	242
query34	1183	1168	652	652
query35	751	794	676	676
query36	1369	1389	1255	1255
query37	153	102	90	90
query38	3199	3116	3088	3088
query39	934	936	890	890
query39_1	874	886	864	864
query40	224	116	98	98
query41	61	59	61	59
query42	94	93	91	91
query43	314	318	275	275
query44	
query45	202	189	181	181
query46	1071	1205	721	721
query47	2329	2350	2255	2255
query48	363	416	286	286
query49	625	454	345	345
query50	1038	377	262	262
query51	4397	4307	4272	4272
query52	85	92	75	75
query53	238	269	193	193
query54	267	221	199	199
query55	77	73	68	68
query56	234	225	202	202
query57	1443	1395	1315	1315
query58	237	205	210	205
query59	1550	1618	1408	1408
query60	277	241	251	241
query61	146	139	147	139
query62	699	648	570	570
query63	224	182	182	182
query64	2519	757	595	595
query65	
query66	1804	441	345	345
query67	29920	29598	28860	28860
query68	
query69	424	302	268	268
query70	916	951	933	933
query71	298	217	209	209
query72	3005	2600	2357	2357
query73	881	799	442	442
query74	5134	4956	4780	4780
query75	2643	2552	2217	2217
query76	2303	1146	802	802
query77	343	359	294	294
query78	12255	12374	12004	12004
query79	1458	1031	732	732
query80	562	460	372	372
query81	454	276	245	245
query82	588	160	116	116
query83	353	275	246	246
query84	
query85	834	496	414	414
query86	366	302	279	279
query87	3413	3346	3186	3186
query88	3669	2759	2724	2724
query89	427	389	324	324
query90	1875	185	183	183
query91	173	168	133	133
query92	60	61	58	58
query93	1483	1509	841	841
query94	538	348	319	319
query95	680	375	443	375
query96	1050	807	336	336
query97	2683	2703	2564	2564
query98	211	204	202	202
query99	1141	1169	1044	1044
Total cold run time: 250642 ms
Total hot run time: 168080 ms

@hello-stephen

Contributor

FE UT Coverage Report

Increment line coverage 62.40% (692/1109) 🎉
Increment coverage report
Complete coverage report

