[draft](catalog) Master catalog spi 07 paimon#64445
Open
morningman wants to merge 54 commits into
run buildall
TPC-H: Total hot run time: 29675 ms
TPC-DS: Total hot run time: 168026 ms
FE UT Coverage Report: Increment line coverage
This multi-month refactor needs persistent state for progress, decisions, risks, and cross-session agent handoff. Establishes a file-based tracking system including dashboard, ADR decision log, deviation log, risk register, per-stage task files, per-connector tracking, and an agent collaboration playbook covering context budget / subagent usage / handoff norms. Closes 18 design decisions (D-001..D-018) and registers 14 risks (R-001..R-014). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…T27) (apache#63582)

## Summary

Lands the P0 SPI baseline for the catalog-SPI migration (master plan §3.1 / RFC §2.1), with zero impact on the already-migrated JDBC + ES connectors.

- **Batch 0** (commits 1-2): SPI types + fe-core bridges — `ConnectorMetaInvalidator`, `ConnectorTransaction`, `ConnectorMvccSnapshot`, `ExternalMetaCacheInvalidator`, `ConnectorMvccSnapshotAdapter`, `PluginDrivenTransactionManager` generalization.
- **Batch 1** (commit 3): DDL + Partition SPI — `ConnectorCreateTableRequest` + 4 spec POJOs, 4 new defaults on `ConnectorTableOps`, 3 new fields on `ConnectorPartitionInfo`, fe-core converter, `PluginDrivenExternalCatalog.createTable` routing.
- **Batch 2** (commit 4): Import-gate + unit tests — `tools/check-connector-imports.sh` wired through exec-maven-plugin; `FakeConnectorPlugin` covering every default fall-through; routing tests for the invalidator; converter tests for all 4 partition styles + 2 bucket flavors.

## Commits

- `[feat](connector) add P0 batch 0 SPI baseline: MetaInvalidator / Transaction / MvccSnapshot` (T03-T08)
- `[feat](connector) wire P0 batch 0 SPI into fe-core` (T09-T12)
- `[feat](connector) add P0 batch 1 SPI: CreateTableRequest + listPartitions` (T13-T20)
- `[feat](connector) add P0 batch 2 gate + unit tests` (T21-T23, T26-T27)

## Test plan

- [x] `mvn -pl fe-connector/fe-connector-api,fe-connector/fe-connector-spi -am compile` — SPI modules compile
- [x] `mvn -pl fe-core -am compile -Dmaven.build.cache.enabled=false` — fe-core compile
- [x] `mvn -pl fe-core checkstyle:check` — 0 violations
- [x] `mvn -pl fe-connector validate` — import gate runs and passes (baseline clean)
- [x] `mvn -pl fe-core -am test -Dtest='FakeConnectorPluginTest,ExternalMetaCacheInvalidatorTest,CreateTableInfoToConnectorRequestConverterTest,ConnectorPluginManagerTest,ConnectorSessionImplTest'` — 39/39 green
- [x] `mvn -pl fe-connector/fe-connector-jdbc,fe-connector/fe-connector-es -am compile` — downstream connectors compile unchanged
- [ ] JDBC regression-test suite (T24) — to be exercised by this PR's CI pipeline
- [ ] ES regression-test suite (T25) — to be exercised by this PR's CI pipeline

## Tracking

Full plan, decisions, and risk log live under `plan-doc/` in the repo (introduced by 6315983, already on the base branch). Per-task status: `plan-doc/tasks/P0-spi-foundation.md`.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pache#63641)

## Summary

P1 batch A — close out scan-node SPI consolidation while keeping migration-period fallbacks in place. Three surgical changes route `PluginDrivenExternalTable` first in the nereids translator hot paths so already-migrated SPI connectors (JDBC, ES) take the SPI route, while the existing `instanceof XExternalTable` chains remain as fallbacks for connectors still pending migration (P3–P7).

- **T3** — `PhysicalPlanTranslator.visitPhysicalFileScan`: move the existing `PluginDrivenExternalTable` branch from position 8 to position 1; the 7 connector-specific branches (HMS / Iceberg / Paimon / Trino / MaxCompute / LakeSoul / RemoteDoris) stay in place as migration-period fallbacks
- **T4** — `PhysicalPlanTranslator.visitPhysicalHudiScan`: add a `PluginDrivenExternalTable` branch routed to `PluginDrivenScanNode.create(...)`, threading `tableSnapshot` + `scanParams` through `FileQueryScanNode` setters; `incrementalRelation` flagged as a P3 Hudi SPI extension TODO. The new branch is unreachable today (`PhysicalHudiScan` is only built for `HMSExternalTable + DLAType.HUDI`), so this is groundwork for P3 with zero current-day runtime impact
- **T5** — `LogicalFileScan`: in `computeOutput()`, add a `PluginDrivenExternalTable` branch calling new helper `computePluginDrivenOutput()` — same shape as `computeIcebergOutput`, using `getFullSchema()` + virtualColumns; in `supportPruneNestedColumn()`, add an explicit `PluginDrivenExternalTable → false` branch. Both behaviorally equivalent for JDBC/ES today since they have no hidden cols and no virtualColumns

P1 batch B (T1 — delete 13 legacy `Jdbc*Client` + `JdbcFieldSchema`) is deferred to P8 because the 3 fe-core callers — `PostgresResourceValidator`, `StreamingJobUtils`, `CdcStreamTableValuedFunction` — are live CDC streaming code that requires SPI extension for `getPrimaryKeys` / `getColumnsFromJdbc` / `listTables`, which is out of P1 surgical scope.

Background and tracking docs live in `plan-doc/` (Master Plan §3.2 P1, tasks/P1-scan-node-cleanup.md, decisions log).

## Test plan

- [x] `mvn -pl fe-core -am compile -Dmaven.build.cache.enabled=false` → BUILD SUCCESS
- [x] `mvn -pl fe-core checkstyle:check` → 0 violations
- [x] JDBC + ES regression-test passing — baseline established in P0 / PR apache#63582
- [ ] PR CI green on this PR
- [ ] Manual scan-node smoke for an SPI connector — JDBC `SELECT *` should fall into the new `PluginDrivenExternalTable` branch first

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…apache#64096) ### What problem does this PR solve? Related PR: apache#63582 (P0 — SPI baseline), apache#63641 (P1 — nereids plugin-driven routing) Problem Summary: This is **P2** of the catalog SPI migration and targets the `branch-catalog-spi` feature branch (continuing P0 apache#63582 and P1 apache#63641). It fully migrates `trino-connector` off the legacy in-tree `fe-core/datasource/trinoconnector/` implementation and onto the connector SPI module `fe-connector-trino`, making `trino-connector` the first connector to complete the SPI consumption playbook that later connectors will reuse as a template. All five batches land together so there is no intermediate state where a newly-created trino catalog cannot be serialized. **Batch A — complete the SPI surface (`fe-connector-trino` only, no fe-core changes)** - `TrinoConnectorProvider.validateProperties`: enforce the required `trino.connector.name` property at `CREATE CATALOG` time (ported from the legacy `checkProperties`). - `TrinoDorisConnector.preCreateValidation`: call `ensureInitialized()` so plugin loading + connector-factory resolution happen at catalog creation instead of being deferred to the first `SELECT`. - `TrinoConnectorDorisMetadata.applyFilter` / `applyProjection`: bridge Trino native filter/projection pushdown, reusing `TrinoPredicateConverter` to translate a Doris `ConnectorExpression` into a Trino `TupleDomain`. `remainingFilter` is conservatively returned as the original expression to match legacy behavior (conjuncts are not stripped; BE re-evaluates them). **Batch B — fe-core bridge for image compatibility** - `GsonUtils`: atomically replace the three legacy `registerSubtype` entries (`TrinoConnectorExternalCatalog` / `Database` / `Table`) with `registerCompatibleSubtype` redirects onto the `PluginDrivenExternal*` hierarchy. This must be atomic — `RuntimeTypeAdapterFactory` rejects duplicate labels, so keeping both bindings would throw at static init. Mirrors what ES/JDBC already did. 
- `PluginDrivenExternalCatalog.gsonPostProcess`: extract a `legacyLogTypeToCatalogType()` helper that maps `Type.TRINO_CONNECTOR` → `"trino-connector"`; the generic `name().toLowerCase()` would otherwise produce the wrong `"trino_connector"` (underscore) that `CatalogFactory` does not recognize. - `PluginDrivenExternalTable.getEngine()` / `getEngineTableTypeName()`: add `trino-connector` branches that preserve the legacy engine-name / table-type display across `SHOW TABLE STATUS` and `information_schema`. **Batch C — flip the switch** - Add `"trino-connector"` to `CatalogFactory.SPI_READY_TYPES` so catalog creation routes through the SPI path. **Batch D — remove legacy code** - Drop the `instanceof TrinoConnectorExternalTable` scan branch in `PhysicalPlanTranslator` (the `PluginDrivenExternalTable` SPI branch already handles it). - Drop `case "trino-connector"` in `CatalogFactory`. - Delete `fe-core/datasource/trinoconnector/` (10 files) and the now-dead legacy `TrinoConnectorPredicateTest`. - Route the `TRINO_CONNECTOR` db-build case in `ExternalCatalog` to `PluginDrivenExternalDatabase` (mirrors the migrated JDBC case). - **Retained for image compatibility**: the `InitCatalogLog.Type.TRINO_CONNECTOR` and `TableType.TRINO_CONNECTOR_EXTERNAL_TABLE` enums, the GsonUtils redirects, and the `MetastoreProperties` trino-connector entry. **Batch E — tests + tracking docs** - 29 JUnit 5 unit tests over the plugin-free converters: - `TrinoPredicateConverterTest` — `ConnectorExpression` pushdown trees → Trino `TupleDomain` (EQ / range / NE / IN / IS [NOT] NULL / AND / OR, Slice encoding), plus graceful degradation to `TupleDomain.all()` on null/unsupported input. - `TrinoTypeMappingTest` — Trino SPI type → Doris `ConnectorType` (scalars, decimal precision/scale, timestamp precision clamp, array/map/struct, unsupported-type failure). - `TrinoConnectorProviderTest` — `validateProperties` fast-fails when `trino.connector.name` is missing/empty. 
- No Trino plugin/cluster required; plugin-dependent paths remain covered by the existing `external_table_p0/p2` `trino_connector` regression suites. - Sync the migration tracking docs under `plan-doc/` (already carried on this feature branch since P0). **Net effect**: 28 files, +1025 / −2681 (~1656 LOC net removed). Old FE images holding legacy trino catalogs / databases / tables deserialize onto the `PluginDrivenExternal*` hierarchy through the GsonUtils string-name redirect, with engine-name display preserved. **Deferred (follow-ups, not in this PR)**: - `trino_connector_migration_compat` regression test (old-image deserialization) — requires a running cluster + Trino plugin + docker, unavailable in this dev environment; tracked as a CI/cluster follow-up. - The plugin-install documentation update lives in the `doris-website` repo and is handled separately. ### Release note None ### Check List (For Author) - Test - [x] Unit Test — 29 new tests in `fe-connector-trino` (predicate converter / type mapping / property validation). - [ ] Regression test — existing `trino_connector` suites cover plugin paths; the new old-image compat regression is deferred to a CI/cluster follow-up. - [ ] Manual test (add detailed scripts or steps below) - [ ] No need to test or manual test. Explain why: - [ ] This is a refactor/code format and no logic has been changed. - [ ] Previous test can cover this change. - [ ] No code files have been changed. - [ ] Other reason - Behavior changed: - [x] No. Internal routing moves from the legacy fe-core path to the SPI path; image compatibility, engine-name display, and pushdown semantics all mirror the legacy behavior. All batches land together, so there is no serialization-gap window. - Does this need documentation? - [x] Yes. The trino-connector plugin-install doc update is a follow-up in the `doris-website` repo. 
### Check List (For Reviewer who merge this PR) - [ ] Confirm the release note - [ ] Confirm test cases - [ ] Confirm document - [ ] Add branch pick label
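The `legacyLogTypeToCatalogType()` point in Batch B is easy to see in isolation. Below is a minimal sketch, not the fe-core code: `LegacyLogType` and `LegacyTypeNames` are hypothetical stand-ins for `InitCatalogLog.Type` and the helper's host class, and only the `TRINO_CONNECTOR` special case is taken from the description above.

```java
import java.util.Locale;

/** Hypothetical stand-in for the legacy log-type enum; only the constant names matter here. */
enum LegacyLogType { JDBC, ES, TRINO_CONNECTOR }

final class LegacyTypeNames {
    private LegacyTypeNames() {
    }

    /** Generic mapping: lower-cases the enum name, so the underscore survives. */
    static String naive(LegacyLogType type) {
        return type.name().toLowerCase(Locale.ROOT);
    }

    /** Special-cases the one type whose catalog-type string is hyphenated. */
    static String legacyLogTypeToCatalogType(LegacyLogType type) {
        if (type == LegacyLogType.TRINO_CONNECTOR) {
            return "trino-connector";
        }
        return naive(type);
    }
}
```

With the generic mapping, `TRINO_CONNECTOR` becomes `trino_connector`, which is the string the description says `CatalogFactory` does not recognize; the helper returns `trino-connector` and leaves every other type on the generic path.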
…tch design (hybrid, T02-T08) (apache#64143) ## Proposed changes testing with apache#64146 P3 of the catalog-SPI migration (base: `branch-catalog-spi`). Migrates the **hudi** connector following the **hybrid** strategy (D-019): harden the dormant HMS-over-SPI hudi connector to correctness parity, build a test baseline, and write the per-table dispatch design — **all behind the closed gate** (`SPI_READY_TYPES` unchanged). >⚠️ **No user-visible behavior change.** The SPI hudi path stays dormant (gate closed); hudi queries continue to use the legacy `HMSExternalTable.dlaType=HUDI` path. This PR removes correctness blockers ahead of the live cutover (deferred to P7 / batch E). ### What's included **Correctness fixes (hardening dormant code, behind gate):** - **T02** — fix hudi JNI `column_types` double bug: emit full Hive type strings (was Doris bare type names, losing precision/scale/subtypes) and send `column_names`/`column_types`/`delta_logs` as typed lists end-to-end (was comma join/split, which shattered `decimal(10,2)` / `struct<...>`). Matches the BE `hudi_jni_reader.cpp` contract (names `,` / types `#` / delta `,`). - **T04** — fail loud on time-travel / incremental read in the SPI `visitPhysicalHudiScan` branch (was silently returning the latest snapshot / silently full-scanning). - **T05** — real EQ/IN partition pruning in `HudiConnectorMetadata.applyFilter` (was a placeholder that ignored predicates and unconditionally switched the partition source from Hudi-metadata to HMS); faithfully mirrors `HiveConnectorMetadata.applyFilter`. - **T07** — column-name casing fix in `avroSchemaToColumns` (top-level lowercase, mirroring legacy `HMSExternalTable`). **Test baseline (all three connector modules started P3 with 0 tests):** - `fe-connector-hudi` (33): type-mapping / schema-parity (COW/MOR golden) / table-type / partition-pruning / scan-range. - `fe-connector-hms` (12): shared Hive-type-string parser tests. 
- `fe-connector-hive` (14): file-format / partition-pruning (mirrors T05). - COW/MOR schema is **type-agnostic** (golden parity vs legacy `initHudiSchema`); table type only affects scan planning. **Decisions / design (code-grounded, design-only):** - **T03** — defer `schema_id`/`history_schema_info` field-id evolution to batch E (DV-006; not a model-agnostic SPI fix). - **T06** — keep MVCC/snapshot SPI defaults (opt-out) + document (DV-007). - **T08** — `tableFormatType` dispatch design memo + **D-020**: single `hms` catalog per-table routing via a new backward-compatible `ConnectorMetadata.getScanPlanProvider(handle)` (per-table provider seam); refines D-005. The keystone gap is split into M1 (identity consumption, fe-core reads `tableFormatType` as an opaque string) and M2 (scan routing). ### Deferred to batch E / P7 (not in this PR) Gate flip (`SPI_READY_TYPES += hms/hudi`), fe-core `tableFormatType` consumption (M1+M2 implementation), live cutover, delete legacy `datasource/hudi/`, full incremental/time-travel/MVCC, Iceberg-on-hms via SPI (needs P6 `IcebergScanPlanProvider`), cluster/runtime validation. ### Verification Per task tracking, each code batch landed with: per-module compile + checkstyle 0 (incl. test sources) + connector import-gate pass + new unit tests green. The two most recent commits are docs-only (`plan-doc/`); the code is unchanged since the last green batch. Gate stays closed → the dormant SPI path is unreachable at runtime → zero live-path risk. CI re-verifies. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
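The T02 separator bug can be reproduced without any Hudi code. The sketch below is an illustration under assumptions: `ColumnTypeLists` and its methods are hypothetical names, and the only facts taken from the description are that type strings were previously comma-joined and split, and that the BE contract separates types with `#`.

```java
import java.util.Arrays;
import java.util.List;

final class ColumnTypeLists {
    private ColumnTypeLists() {
    }

    /** Old behaviour: join and split on ',' which also occurs inside type strings. */
    static List<String> commaRoundTrip(List<String> types) {
        return Arrays.asList(String.join(",", types).split(","));
    }

    /** Joins type strings with '#', a character that does not occur in these type strings. */
    static String joinTypes(List<String> types) {
        return String.join("#", types);
    }

    /** Inverse of joinTypes for non-empty lists. */
    static List<String> splitTypes(String joined) {
        return Arrays.asList(joined.split("#"));
    }
}
```

Round-tripping `int`, `decimal(10,2)`, `struct<a:int,b:string>` through the comma path yields five fragments instead of three types, while the `#` path returns the original list. Passing typed lists end-to-end, as the fix does, avoids the join/split step altogether.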
…core + make fe-core odps-free (T07-T09) (apache#64300) Follow-up to apache#64253 (the MaxCompute catalog-SPI cutover). After the cutover a `max_compute` catalog deserializes to `PluginDrivenExternalCatalog` and no legacy `MaxComputeExternal*` object is ever instantiated, so the legacy MaxCompute subsystem in fe-core is dead code. This removes it and makes fe-core's dependency tree fully odps-free. **1. Remove legacy subsystem** (`7a4db351100`) - Delete 20 fe-core files: `datasource/maxcompute/*` (incl. `MCTransaction`, `MaxComputeScanNode`/`Split`), the MaxCompute sink/insert/txn plumbing, and 2 legacy-only tests. - Clean ~21 reverse-reference sites (imports + dead `instanceof`/visitor/rule branches), keeping every `PluginDriven`/connector sibling branch and the image/replay keep-set (GsonUtils compat strings; `TableType`/`TransactionType`/`TableFormatType`/`InitCatalogLog.Type` `MAX_COMPUTE` enums; block-id thrift). - Rewire 3 tests; e.g. `FrontendServiceImplTest`'s block-id RPC test now mocks the generic `Transaction` SPI, since `getMaxComputeBlockIdRange` reads the PluginDriven connector transaction. **2. Make fe-core odps-free** (`409300a75b8`) - Drop the two odps deps from `fe-core/pom.xml`. - Move `MCUtils` from fe-common into `be-java-extensions/max-compute-connector` (its only consumer after the removal); keep `MCProperties` (odps-free constants) in fe-common. - Drop `odps-sdk-core` from fe-common — it was also leaking netty/protobuf transitively to fe-common's own `DorisHttpException`/`GsonUtilsBase`, so declare `netty-all` + `protobuf-java` directly (proper dependency hygiene). **3. Doc-sync** (`f8c305765e8`) — plan-doc PROGRESS/HANDOFF/deviations/design tracking notes. - `mvn -pl :fe-core -am test-compile` (main+test) passes; checkstyle 0 violations; connector import-gate passes. - `grep -rn com.aliyun.odps fe/fe-core/src` → empty. - `mvn -pl :fe-core dependency:tree | grep odps` → empty (no odps, direct or transitive). 
🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
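This session covers research and design only. A 14-agent code-grounded recon plus a cross-cutting adversarial re-review covers the legacy-framework implementation of paimon's five functional areas (normal read / system tables / procedure / DDL / MTMV), maps it onto the new catalog SPI, and aligns it with the MaxCompute connector's interface conventions.

Added:
- research/p5-paimon-migration-recon.md: legacy implementation of the 5 areas + E1–E10 SPI status + cross-cutting risks + 11 MC-consistency conventions + test baseline
- tasks/P5-paimon-migration.md: old→new mapping + 30 TODOs in batches B0–B9 + batch dependency graph + acceptance criteria

User-signed decisions:
- D-037 (P5-D1): flavor = single Catalog + createCatalog flavor switch (consistent with MC; no backend modules are created, since the 5 backend modules are empty shells)
- D-038 (P5-D2): the MTMV/MVCC bridge is implemented within P5 (fe-core PaimonPluginDrivenExternalTable); the gate flip is gated on it, and a regression to silently reading latest is forbidden

Three priors falsified: the backend modules are empty shells (the connector goes through a single-Catalog stub) / FE dispatch is already partially pre-wired (what remains is the connector's listPartitions) / Base64 is not a blocker (BE has a STD fallback). The procedure area has zero migratable code and is doc-only.

Doc sync: connectors/paimon.md (fix 3 stale statements), decisions-log.md (+D-037/D-038, 36→38), PROGRESS.md (header/§1/§2/§3/§4/§6/§7), HANDOFF.md (overwritten, no collapsed history kept).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>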
T01: extract PaimonCatalogOps injection seam (5 read methods, B0 read-only) over the paimon SDK Catalog; refactor PaimonConnectorMetadata to inject it (6 call sites migrated, read path byte-for-byte unchanged); build the first fe-connector-paimon test module (no-mockito recording fake, mirroring MC's McStructureHelper): 9 metadata UTs pinning the databaseExists try/catch and the getColumnHandles reload-fallback, FakePaimonTable (fail-loud on non-read methods), and an env-gated live connectivity smoke. T02: R-007 paimon.version 3-way pin invariant comment (FE connector + BE paimon-scanner + preload-extensions already aligned at 1.3.1 via the single fe/pom.xml property); offline FE->BE serialized-Table round-trip smoke (real FileSystemCatalog -> connector encode -> BE-mirrored URL-first/STD-fallback decode, asserts rowType/partition/primary keys); parity-baseline doc inventorying the 41 existing regression suites as the after-cutover parity gate plus the real connector-side gaps and the live-e2e hard gate. Connector module: Tests run: 12, Failures: 0, Errors: 0, Skipped: 1 (the skip is the env-gated live test); checkstyle 0; import-gate clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Single-Catalog flavor switch on paimon.catalog.type for all five flavors (filesystem/hms/rest/jdbc/dlf), mirroring the legacy fe-core flavor properties without importing fe-core/fe-common. - New PaimonCatalogFactory: pure validate() + buildCatalogOptions() (paimon.catalog.type -> paimon `metastore` opt, per-flavor options, paimon.* passthrough excl storage prefixes) + buildHadoopConfiguration / buildHmsHiveConf / buildDlfHiveConf + requireOssStorageForDlf. - PaimonConnector: thread ConnectorContext; createCatalog wires all 5 flavors live (filesystem/jdbc with Hadoop Configuration, rest Options-only, hms/dlf with HiveConf), each wrapped in context.executeAuthenticated (Kerberos seam). JDBC DriverShim ported with driver-url resolution via getEnvironment() (replaces forbidden JdbcResource). - PaimonConnectorProperties: all flavor key constants (multi-alias String[]). - PaimonConnectorProvider: validateProperties override -> factory.validate. - pom: add paimon-hive-connector-3.1 + hadoop-common + hive-common (hive-common over hive-catalog-shade to avoid the fastutil conflict). - 31 new no-mockito unit tests (PaimonCatalogFactoryTest); module 43/0/0/1, checkstyle 0, import-gate clean. hms/dlf live connection is gated on B7 cutover + live-e2e: the Thrift metastore client is host-provided (not bundled) with a child-first Configuration/HiveConf cross-loader hazard to verify; jdbc driver_url FE security allow-list + external hive-site.xml file load are deferred. All documented in code NOTEs and plan-doc. rest also requires warehouse (legacy parity). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Connector-side only; no fe-core / fe-connector-api / fe-connector-spi changes. B2 and B3 were both uncommitted and are entangled in the same files (PaimonConnectorMetadata, PaimonCatalogOps, PaimonConnector, RecordingPaimonCatalogOps), so they are committed together. B2 normal-read (T06-T10): - T06 PaimonScanPlanProvider transient-Table reload fallback (planScan + getScanNodeProperties both guarded) - T07 PaimonPredicateConverter parity-correct TZ (NTZ keeps UTC, LTZ not pushed) + supportsCastPredicatePushdown=false - T08 listPartitionNames/listPartitions/listPartitionValues (legacy display-name parity) + seam listPartitions(Identifier) - T09 doc-only pure-predicate pruning; T10 cache deferred to B8 B3 DDL metadata (T11-T15): - T11 PaimonTypeMapping.toPaimonType (Doris->paimon, byte-parity with legacy DorisToPaimonTypeVisitor; narrow gap preserved) - T12 PaimonSchemaBuilder (ConnectorCreateTableRequest -> paimon Schema) - T13 createTable/dropTable + seam DDL methods + ConnectorContext threaded (D7=B: each DDL op wrapped in executeAuthenticated; read path un-wrapped) - T14 supportsCreateDatabase/createDatabase (HMS-props gate) + dropDatabase(force) (enumerate-loop + native cascade) - T15 offline UTs (no-mockito; WHY+MUTATION) Verified: fe-connector-paimon Tests run: 96, Failures: 0, Errors: 0, Skipped: 1 (live); checkstyle 0; connector import-gate 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Port paimon system tables and MVCC snapshots onto the plugin connector SPI. - T16: greenfield E7 SPI on ConnectorTableOps — listSupportedSysTables + getSysTableHandle (default no-ops; MC/jdbc/es/trino unaffected). - T17: PaimonConnectorMetadata implements E7 — names from SystemTableLoader.SYSTEM_TABLES; sys table loaded via the existing getTable seam with a 4-arg Identifier(db,table,"main",sysName); sys handle carries sysTableName + forceJni (binlog/audit_log); shared PaimonTableResolver gives metadata + scan one sys-aware reload rule. - T18: generic fe-core glue — PluginDrivenExternalTable centralizes handle acquisition into resolveConnectorTableHandle and delegates getSupportedSysTables to the connector; new PluginDrivenSysExternalTable (reports PLUGIN_EXTERNAL_TABLE) + PluginDrivenSysTable reuse the live SysTableResolver/NativeSysTable machinery (reusable by future connectors). - T19: forceJni gate so binlog/audit_log go JNI not native; buildTableDescriptor -> HIVE_TABLE (also fixes a latent normal-table SCHEMA_TABLE descriptor gap, DV-024); PluginDrivenScanNode fail-loud guard rejects scan-params/time-travel on system tables. - T20: first E5 MVCC consumer — beginQuerySnapshot/getSnapshotAt/getSnapshotById (empty table -> -1; sys handle -> empty) + SUPPORTS_MVCC_SNAPSHOT/TIME_TRAVEL capabilities. Inert until B5 wires the fe-core MvccTable consumer. Decisions: D-039 (E7 reuses the live SysTable machinery; RFC §10's $-suffix-via-getTableHandle design was never implemented and is superseded, DV-023). Deviations: DV-023, DV-024. Verification: import-gate 0; connector 124 tests pass (1 live skipped); fe-core PluginDriven*Test 100 pass; checkstyle 0; no cutover/B5 leakage (paimon not in SPI_READY_TYPES; PluginDrivenExternalTable still not an MvccTable). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
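The "default no-ops; MC/jdbc/es/trino unaffected" property of T16 comes from Java interface default methods. A minimal sketch follows, assuming simplified signatures: `TableOpsSketch`, `SysTableHandleSketch`, `PlainOps` and `SysAwareOps` are hypothetical, the real `ConnectorTableOps` methods take different arguments, and the `snapshots` entry is just an example system-table name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical handle; field names follow the commit text, the class is illustrative. */
final class SysTableHandleSketch {
    final String sysTableName;
    final boolean forceJni;

    SysTableHandleSketch(String sysTableName, boolean forceJni) {
        this.sysTableName = sysTableName;
        this.forceJni = forceJni;
    }
}

/** Illustrative SPI slice: both methods default to "no system tables". */
interface TableOpsSketch {
    default List<String> listSupportedSysTables() {
        return Collections.emptyList();
    }

    default Optional<SysTableHandleSketch> getSysTableHandle(String sysTableName) {
        return Optional.empty();
    }
}

/** A connector that predates the new methods inherits both defaults unchanged. */
final class PlainOps implements TableOpsSketch {
}

/** A connector that opts in overrides both methods. */
final class SysAwareOps implements TableOpsSketch {
    private static final List<String> SUPPORTED = Arrays.asList("snapshots", "binlog", "audit_log");
    private static final List<String> JNI_ONLY = Arrays.asList("binlog", "audit_log");

    @Override
    public List<String> listSupportedSysTables() {
        return SUPPORTED;
    }

    @Override
    public Optional<SysTableHandleSketch> getSysTableHandle(String sysTableName) {
        if (!SUPPORTED.contains(sysTableName)) {
            return Optional.empty();
        }
        // binlog / audit_log are flagged so the scan goes through JNI rather than the native reader.
        return Optional.of(new SysTableHandleSketch(sysTableName, JNI_ONLY.contains(sysTableName)));
    }
}
```

`PlainOps` needs no source change when the interface grows, which is the sense in which the other connectors are unaffected.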
…ridge + time-travel + procedure doc no-op B5a (MTMV/MVCC bridge): source-agnostic PluginDrivenMvccExternalTable (MTMVRelatedTableIf+MTMVBaseTableIf+MvccTable, D-042) wiring the B4-inert E5 snapshot SPI; PluginDrivenMvccSnapshot; list-partitions-at-snapshot. B5b (time-travel): scan-pin + AS-OF + tag + branch + @incr across connector (ConnectorTimeTravelSpec, PaimonIncrementalScanParams) and fe-core; holistic review fixes RD-1 (partitioned time-travel empty-universe scan-all guard in PluginDrivenScanNode) + RD-2 (@incr lists-latest partitions/schema). B6/T26: procedure doc no-op — zero migratable code; closed-form reject verified (ExecuteActionFactory:59-62 / CallFunc:42-43). All inert/gated until B7 cutover (paimon NOT yet in SPI_READY_TYPES). Excludes regression-conf.groovy (secrets) + scratch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eview fixes Combines all previously-uncommitted P5 paimon work into one commit (per request). 8 fullpath-review fixes (BLOCKERs + key MAJORs) — connector + SPI + fe-core bridge: - FIX-STORAGE-CREDS: applyStorageConfig translates canonical s3.*/oss.*/AWS_* -> fs.s3a./fs.oss. (+DLF region->OSS endpoint) - FIX-NATIVE-PARTVAL: per-type serializePartitionValue + session TZ (LTZ only); binary/varbinary drops the partition map (no [B@hash garbage) - FIX-TZ-ALIAS: full legacy ZoneId.SHORT_IDS + 4 Doris overrides alias map (CST/PST/EST now resolve for FOR TIME AS OF datetime strings) - FIX-TABLE-STATS: getTableStatistics override + PaimonCatalogOps.rowCount seam (normal AND system tables, via the sys-aware resolveTable) - FIX-CPP-READER: honor enable_paimon_cpp_reader -> native DataSplit.serialize so BE's PaimonCppReader can decode the split - FIX-READ-NOTNULL: mapFields forces read-path columns nullable (legacy parity) - FIX-HMS-CONFRES: new ConnectorContext.loadHiveConfResources hook + 2-arg buildHmsHiveConf file-base merge (external hive-site.xml reaches the metastore) - FIX-REST-VENDED: new ConnectorContext.vendStorageCredentials hook + scan-props vended AWS_* overlay (REST per-table tokens reach BE) Also carries the previously-uncommitted B7 core cutover + D-045/D-046 restores. Tests: fe-connector-paimon 213 pass / 0 fail / 1 skip (live-gated); fe-core compiles + DefaultConnectorContextVendTest 2/0. Each fix's root-cause/patch/UT and impl-time corrections are in plan-doc/tasks/designs/P5-fix-<id>-design.md. Excluded from this commit: regression-test/conf/regression-conf.groovy (plaintext Aliyun keys, pending scrub) and scratch dirs (.audit-scratch/, conf.cmy/, META-INF/, *.bak). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
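FIX-TZ-ALIAS relies on a standard JDK facility: `ZoneId.of(String, Map)` resolves an id through a caller-supplied alias map, and `ZoneId.SHORT_IDS` is the JDK's built-in map of three-letter ids. The sketch below shows the layering only; `TimeZoneAliases` is a hypothetical class, and the single `CST` override is an assumption standing in for the four Doris overrides, which the commit message does not list.

```java
import java.time.ZoneId;
import java.util.HashMap;
import java.util.Map;

final class TimeZoneAliases {
    /** JDK short ids first, then overrides on top (later puts win). */
    private static final Map<String, String> ALIASES = new HashMap<>(ZoneId.SHORT_IDS);

    static {
        // Assumed override for illustration only; the real override set lives in the connector.
        ALIASES.put("CST", "Asia/Shanghai");
    }

    private TimeZoneAliases() {
    }

    /** Resolves an alias if one exists, otherwise treats the input as a regular zone id. */
    static ZoneId resolve(String zoneId) {
        return ZoneId.of(zoneId, ALIASES);
    }
}
```

`resolve("PST")` goes through the JDK table, `resolve("CST")` picks up the override, and a regular id such as `UTC` or `Asia/Tokyo` passes through untouched because `ZoneId.of` falls back to the raw id when no alias matches.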
…canonical scheme Root cause: the paimon connector sent native ORC/Parquet data-file paths and deletion-vector (DV) paths to BE un-normalized. The paimon SDK emits warehouse-native schemes (oss://, cos://, obs://, s3a://, or the OSS bucket.endpoint authority form); BE's scheme-dispatched S3 file factory only recognizes s3://. On S3-compatible (non-AWS) warehouses this breaks native reads outright (B-7DF, data file) and silently drops the DV so DELETEd rows reappear (B-7DV, merge-on-read corruption). Legacy PaimonScanNode normalized both via the 2-arg LocationPath.of; the cutover dropped it. The two paths reach BE via different mechanisms (data-file through PluginDrivenSplit's single-arg LocationPath.of -> FileQueryScanNode:568; DV baked into thrift by the connector's populateRangeParams), so a fe-core-bridge-only fix cannot reach the DV path. Solution: new ConnectorContext.normalizeStorageUri SPI hook (identity default, mirroring vendStorageCredentials), implemented in DefaultConnectorContext via the engine's 2-arg normalizing LocationPath.of with the catalog's static storage map (threaded via a new lazy supplier + 4-arg ctor; PluginDrivenExternalCatalog wires it). The connector routes BOTH the data-file and DV paths through it inside the extracted, unit-testable buildNativeRange. JNI path untouched (carries its own FileIO). Fail-loud on un-normalizable paths (legacy parity). Static-vs-vended map scope noted in DV-025 (the pure-vended edge belongs to credential fixes #2/#3). Tests: fe-core DefaultConnectorContextNormalizeUriTest (oss->s3, s3 idempotent, null/blank, empty-map fail-loud); connector PaimonScanPlanProviderTest x3 (both paths normalized + call count, DV-less, no-context raw). paimon module 216/0/0, fe-core targeted green, checkstyle 0, import-gate clean. Live OSS+DV e2e CI-gated (not run). SPI RFC section 21 (E13), deviations DV-025. Also includes the round-2 review report + task list this fix derives from. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
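The core of the scheme normalization can be sketched in a few lines. This is a deliberately reduced illustration, not `DefaultConnectorContext.normalizeStorageUri`: `StorageUriSketch` is hypothetical, it ignores the catalog's storage map and the OSS `bucket.endpoint` authority form, and passing null/blank input through unchanged is an assumption about edge-case behaviour.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class StorageUriSketch {
    /** Schemes named in the commit message, plus the canonical one. */
    private static final Set<String> S3_COMPATIBLE =
            new HashSet<>(Arrays.asList("s3", "s3a", "oss", "cos", "obs"));

    private StorageUriSketch() {
    }

    /** Rewrites a warehouse-native scheme to s3://; fails loudly on anything it cannot normalize. */
    static String normalize(String uri) {
        if (uri == null || uri.trim().isEmpty()) {
            return uri; // assumption: nothing to normalize
        }
        int idx = uri.indexOf("://");
        if (idx <= 0) {
            throw new IllegalArgumentException("Not a URI with a scheme: " + uri);
        }
        String scheme = uri.substring(0, idx).toLowerCase(Locale.ROOT);
        if (!S3_COMPATIBLE.contains(scheme)) {
            throw new IllegalArgumentException("Cannot normalize scheme: " + scheme);
        }
        return "s3" + uri.substring(idx);
    }
}
```

Because the deletion-vector path is baked into thrift by the connector while the data-file path travels through the fe-core split, both call sites would have to invoke a function like this one; normalizing in only one place leaves the other path on its native scheme.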
Mark FIX-URI-NORMALIZE complete (commit 20b19d1) in the task list and update HANDOFF: #1 summary + verification, next session starts at #2 (reuse the normalizeStorageUri BE-scan-prop normalization seam), and the standing reminders (regression-conf.groovy still holds a plaintext key -> path-whitelist only; P2 apache#8/apache#9 need user scope decision first). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…canonical AWS_* Finding B-9 (BLOCKER, rereview2). The paimon connector copied static catalog-level storage credentials/config verbatim into the BE scan-node properties: PaimonScanPlanProvider.getScanNodeProperties iterated the raw catalog properties and emitted location.<rawkey> for any s3./oss./cos./obs./ hadoop./fs./dfs./hive. prefix; the fe-core bridge only strips the location. prefix. BE's native (FILE_S3) reader understands ONLY AWS_ACCESS_KEY/ AWS_SECRET_KEY/AWS_ENDPOINT/AWS_REGION/AWS_TOKEN, so static s3.access_key/ oss.access_key on a private bucket reached BE unintelligible -> no usable credentials -> 403. This is the third credential seam (static->BE-scan), missed by both the prior round and the 8 fixes (review §9.3); the catalog- FileIO seam (FIX-STORAGE-CREDS) and the vended seam (FIX-REST-VENDED) were already closed. Root cause: legacy PaimonScanNode.getLocationProperties returns only CredentialUtils.getBackendPropertiesFromStorageMap(storagePropertiesMap) (the canonical AWS_*/hadoop/dfs map). The cutover replaced that single normalized call with a raw prefix-copy loop; the connector cannot import fe-core's StorageProperties so it had no access to the normalization. Solution (D-048, user-signed full legacy-parity scope): new no-op-default SPI ConnectorContext.getBackendStorageProperties(); DefaultConnectorContext returns getBackendPropertiesFromStorageMap over the storagePropertiesSupplier already wired in FIX-URI-NORMALIZE (no ctor change, CredentialUtils already imported). The connector replaces its raw prefix-copy loop with a context-gated overlay of that map; the vended overlay stays after it (vended wins on collision, legacy precedence). Object-store creds -> AWS_*; HDFS -> canonical hadoop/dfs (preserves user overrides + adds the legacy defaults, folding in the §211 MINOR); drops the non-parity hive.* passthrough. 
Investigated the AWS_CREDENTIALS_PROVIDER_TYPE=ANONYMOUS two-step edge and confirmed via BE s3_util.cpp (both providers prefer explicit ak/sk over cred_provider_type) that it is harmless — no regression. Connector import-gate stays clean. Tests: fe-core DefaultConnectorContextBackendStoragePropsTest (OSS static creds -> AWS_*, raw alias absent; no-supplier -> empty); connector PaimonScanPlanProviderTest (+getScanNodePropertiesNormalizesStaticCreds raw alias not shipped; modified vended-overlay collision to canonical keys; renamed no-context test -> emits no storage props). Fail-before/pass-after proven by reverting the connector change (2/3 go red). Module 217/0/0 (1 CI-gated skip), checkstyle clean, import-gate clean. Live private-bucket native-read e2e is CI-gated (not run). SPI RFC §22 (E14). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
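The overlay order described above (canonical static properties first, vended properties on top) is sketched below. Everything here is illustrative: `ScanStoragePropsSketch` is hypothetical, and the suffix-based key mapping is a toy stand-in for `CredentialUtils.getBackendPropertiesFromStorageMap`, whose actual rules are not reproduced. Only the target key names `AWS_ACCESS_KEY` / `AWS_SECRET_KEY` / `AWS_ENDPOINT` / `AWS_REGION` / `AWS_TOKEN` come from the commit message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ScanStoragePropsSketch {
    private ScanStoragePropsSketch() {
    }

    /** Toy canonicalization (assumption): map "<prefix>.<suffix>" aliases to AWS_* keys, drop the rest. */
    static Map<String, String> toBackendProps(Map<String, String> catalogProps) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : catalogProps.entrySet()) {
            String key = e.getKey();
            if (key.endsWith(".access_key")) {
                out.put("AWS_ACCESS_KEY", e.getValue());
            } else if (key.endsWith(".secret_key")) {
                out.put("AWS_SECRET_KEY", e.getValue());
            } else if (key.endsWith(".endpoint")) {
                out.put("AWS_ENDPOINT", e.getValue());
            } else if (key.endsWith(".region")) {
                out.put("AWS_REGION", e.getValue());
            }
            // Raw aliases are never copied through verbatim.
        }
        return out;
    }

    /** Static canonical props first, vended props second, so vended wins on key collision. */
    static Map<String, String> buildScanProps(Map<String, String> catalogProps,
                                              Map<String, String> vendedProps) {
        Map<String, String> out = toBackendProps(catalogProps);
        out.putAll(vendedProps);
        return out;
    }
}
```

The important property is that no raw `oss.*` or `s3.*` key appears in the result, and that a vended `AWS_ACCESS_KEY` replaces the static one while non-colliding static keys are kept.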
…tion value to NULL (P4)
Root cause: PaimonScanRange.populateRangeParams routed paimon partition values through ConnectorPartitionValues.normalize, which applies Hive-directory null-sentinel coercion (a value of "\N" or "__HIVE_DEFAULT_PARTITION__" -> isNull). That coercion is correct for hudi (path-encoded partitions) but wrong for paimon: paimon partition values are TYPED (serializePartitionValue returns Java-null for a genuine null and the literal toString() otherwise), so a null is never a directory sentinel, and the coercion only ever bites a genuine literal value.
A string partition column literally holding "\N" (which paimon does NOT reserve) or "__HIVE_DEFAULT_PARTITION__" was materialized as SQL NULL instead of the literal on the native ORC/Parquet read, diverging from legacy PaimonScanNode.setScanParams (source/PaimonScanNode.java:323-326) and yielding wrong rows for WHERE col='\N' / col IS NULL. The dominant genuine-NULL case is unaffected (both sides set isNull=true and BE ignores the rendered value string when is_null==true, partition_column_filler.h:40-44).
Fix (1 file): derive isNull from the Java null ONLY (render genuine null as "", legacy-exact); drop the unused ConnectorPartitionValues import. ConnectorPartitionValues itself is left untouched: hudi (HudiScanRange.java:226) legitimately needs the Hive-directory coercion. The residual scan-vs-prune skew for a literal "__HIVE_DEFAULT_PARTITION__" value lives in the generic fe-core prune bridge (TablePartitionValues), is pre-existing and unchanged by this fix, and is logged as a deviation.
Tests: new PaimonScanRangePartitionNullTest pins genuine-null -> (isNull=true, ""); literal "\N" -> (isNull=false, "\N"); literal "__HIVE_DEFAULT_PARTITION__" -> (isNull=false, verbatim); ordinary -> kept. Fail-before (re-inlined coercion) reds the literal + render rows; pass-after green.
Verified: full module 261/0/0 (1 CI-gated live skip), checkstyle 0, import-gate clean.
Adversarial review (5 angles) SAFE_TO_COMMIT: total convergence of all 3 range builders on populateRangeParams; no query goes correct->wrong. No BE/SPI change; native partition materialization otherwise covered by the CI-gated legacy paimon partition regression.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
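A minimal sketch of the "derive isNull from the Java null only" rule follows. `PartitionValueRender` and `render` are hypothetical names introduced for illustration; the real logic lives in PaimonScanRange.populateRangeParams and works on thrift structures rather than a map entry.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical stand-in for the per-partition-value render step.
final class PartitionValueRender {
    private PartitionValueRender() {}

    /** Returns (isNull, renderedValue) for one already-serialized partition value. */
    static Map.Entry<Boolean, String> render(String serialized) {
        // Only a Java null means SQL NULL. Strings that merely look like Hive
        // sentinels ("\N", "__HIVE_DEFAULT_PARTITION__") stay literal values.
        boolean isNull = serialized == null;
        // A genuine null is rendered as the empty string.
        return new SimpleImmutableEntry<>(isNull, isNull ? "" : serialized);
    }
}
```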
…035])
Records the P4 cleanup pass disposition (P0–P4 now all clear):
- FIX-VARCHAR-BOUNDARY (N10.1) `bcee91dcb52` + FIX-PARTITION-NULL-SENTINEL `4b2c2190dc2` landed as independent fix commits.
- 15 items accepted as deviations (M5.1 transient-only + 14 display/perf/text/inert/connector-more-correct/false-premise) → [DV-035].
- D-057 logs the user-signed scope; DV-035 the accepted batch.
- task-list §P4 marked done; HANDOFF rolled to next session (B8 legacy deletion or cross-connector follow-up batch).
Read-only adversarial recon `wf_6884d37b-8ef` re-verified all ~17 review §5/§7 items against current code; the sentinel ACCEPT verdict was refuted by a prune-path skeptic (converted to FIX) and M5.1's "cheap fallback" premise was refuted at impl level (confirmed ACCEPT).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… injection) Next session = a third independent adversarial review of every paimon connector functional path (basic read, @incr, time travel, branch/tag, sys-tables, metadata cache, deletion vectors, multi-metastore, multi-storage, Parquet/ORC native read, type mapping, and a legacy-logic/fallback sweep), checking design + implementation delivery and diffing each path against the legacy datasource/paimon/* reference (kept in-tree for side-by-side). Hard constraint per user: do NOT inject accumulated development priors during the find-and-judge phase — reviewers judge from current code + legacy only; decisions-log / deviations-log / prior review reports / catalog-spi-p5-* memory are consulted ONLY in a final reconciliation phase and must not suppress a finding. B8 legacy deletion deferred until after this review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rows on URI normalize (P9-1, BLOCKER)
Root cause: native ORC/Parquet reads on a Paimon REST catalog over object
storage (oss/cos/obs/s3a) threw during FE planning —
"StoragePropertiesException: No storage properties found for schema: oss".
PaimonScanPlanProvider.normalizeUri routed both the data-file path and the
deletion-vector path through ConnectorContext.normalizeStorageUri, which
normalizes via the catalog's STATIC storage map. That map is empty by design
for REST catalogs (vended creds are per-table/dynamic;
CatalogProperty.initStorageProperties seeds an empty map when vended creds are
enabled), so LocationPath.of(uri, {}) found no scheme entry and threw.
shouldUseNativeReader has no flavor gate, so every REST native read hit it;
the only escape was SET force_jni_scanner=true. DV-025 deferred this exact
corner to FIX-STATIC-CREDS-BE / FIX-REST-VENDED, but those fixed credential
down-flow to BE, not normalizeStorageUri — the deferral was never closed.
Legacy parity: PaimonScanNode.doInitialize computes a vended-overlay storage
map once (VendedCredentialsFactory.getStoragePropertiesMapWithVendedCredentials
— vended REPLACES the empty static map for REST) and uses it for
LocationPath.of at both the data-file (:443) and DV (:296) sites.
Solution: route the per-table vended token into native URI normalization,
replicating legacy precedence.
- SPI: add default overload ConnectorContext.normalizeStorageUri(uri, token)
that ignores the token and delegates to the 1-arg form, so every non-paimon
connector is unaffected.
- fe-core DefaultConnectorContext: extract the vended-typed-map build (filter
cloud props -> StorageProperties.createAll -> index by Type) into a shared
buildVendedStorageMap (single source of truth with vendStorageCredentials, no
drift). The 2-arg override normalizes against the vended map when present and
falls back to the static map otherwise (legacy "vended replaces static"); the
1-arg form delegates with a null token (byte-identical to prior behavior).
vendStorageCredentials keeps an outer try so its fail-soft boundary is
preserved across the refactor.
- connector PaimonScanPlanProvider: extract the vended token ONCE per planScan
(validToken() may refresh) and thread it through buildNativeRanges/
buildNativeRange to both normalize sites. Empty for non-REST (FileIO gate) and
offline -> folds to the static path, so non-REST reads are byte-unchanged.
Tests:
- fe-core DefaultConnectorContextNormalizeUriTest (+3): vended-REST normalize
under an empty static map (the gap that hid the bug twice); fail-loud when the
token is also empty (proves the fix is the token, not a swallow); static-map
path unaffected by an empty token.
- connector PaimonScanPlanProviderTest (+1, 5 call sites updated): the per-table
vended token is threaded verbatim to BOTH the data-file and DV normalize calls
(RecordingConnectorContext now captures the 2-arg token).
- The positive RESTTokenFileIO token-extraction path needs a live REST stack and
remains E2E-gated (enablePaimonTest=false), not run here.
Verified: connector 42/0/0; fe-core NormalizeUri 7/0, Vend 2/0, BackendStorageProps 2/0;
checkstyle 0 across spi/paimon/fe-core; connector import-gate clean.
Design + adversarial red-team: plan-doc/FIX-REST-VENDED-URI-NORMALIZE-design.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
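The default-overload approach in this commit can be sketched with a toy model. Everything here is an assumption made for illustration: the interface name `UriNormalizingContext`, the `Map`-typed token, and the "scheme to canonical scheme" maps stand in for the real ConnectorContext, vended token, and StorageProperties types.

```java
import java.util.Map;

// Hypothetical, simplified SPI shape; the real ConnectorContext has more methods.
interface UriNormalizingContext {
    String normalizeStorageUri(String uri);

    // Default overload: ignores the token and delegates to the 1-arg form,
    // so connectors that never pass a token keep their existing behavior.
    default String normalizeStorageUri(String uri, Map<String, String> vendedSchemes) {
        return normalizeStorageUri(uri);
    }
}

// Toy implementation: a map from URI scheme to canonical scheme stands in for
// the typed storage map.
final class VendedAwareContext implements UriNormalizingContext {
    private final Map<String, String> staticSchemes;

    VendedAwareContext(Map<String, String> staticSchemes) {
        this.staticSchemes = staticSchemes;
    }

    @Override
    public String normalizeStorageUri(String uri) {
        // The 1-arg form delegates with a null token.
        return normalizeStorageUri(uri, null);
    }

    @Override
    public String normalizeStorageUri(String uri, Map<String, String> vendedSchemes) {
        // "Vended replaces static": use the vended map when it is present.
        boolean hasVended = vendedSchemes != null && !vendedSchemes.isEmpty();
        Map<String, String> effective = hasVended ? vendedSchemes : staticSchemes;
        int idx = uri.indexOf("://");
        if (idx < 0) {
            throw new IllegalArgumentException("not a URI: " + uri);
        }
        String scheme = uri.substring(0, idx);
        String canonical = effective.get(scheme);
        if (canonical == null) {
            // Fail loud when neither map knows the scheme.
            throw new IllegalStateException("No storage properties found for scheme: " + scheme);
        }
        return canonical + uri.substring(idx);
    }
}
```

With an empty static map (the REST-catalog case), the 2-arg call succeeds only when a non-empty token is supplied, which mirrors the fail-loud test described in the commit.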
…stead of real format (P7-1, MAJOR)
Root cause: PaimonScanPlanProvider.buildJniScanRange and buildCountRange
hardcoded .fileFormat("jni") on PaimonScanRange.Builder. The real
defaultFileFormat (= table.options().getOrDefault(file.format,"parquet"),
computed in planScanInternal) was passed into buildJniScanRange and IGNORED,
and was not passed into buildCountRange at all. PaimonScanRange
.populateRangeParams then emitted fileDesc.file_format="jni". BE
paimon_cpp_reader.cpp backfills paimon FILE_FORMAT/MANIFEST_FORMAT from this
field (only when unset/empty, guarded !file_format.empty()) to avoid defaulting
manifest.format=avro — with the invalid "jni" it injects MANIFEST_FORMAT=jni
(and FILE_FORMAT=jni when unset) and the manifest read breaks.
Key mechanism: the JNI formatType routing is gated by the paimon.split property
(PaimonScanRange.populateRangeParams), NOT by the fileFormat string (that string
drives formatType only on the native branch, where it is already real). So
emitting the real orc/parquet leaves JNI routing intact and only corrects the
inner fileDesc.file_format BE consumes — matching legacy
PaimonScanNode.setPaimonParams, which sets setFormatType(FORMAT_JNI) AND
setFileFormat(getFileFormat(...)) = the real data-file format.
Solution (connector-only, no BE change):
- buildJniScanRange: .fileFormat("jni") -> .fileFormat(defaultFileFormat) (the
already-passed, previously-ignored parameter). Covers the non-DataSplit
metadata-split call and the DataSplit JNI call.
- buildCountRange: add a defaultFileFormat parameter, use it, and thread it from
the call site in planScanInternal.
- PaimonScanRange.Builder default: "jni" -> "" (every production caller sets the
format explicitly; empty is the safe default — BE skips its format backfill on
empty rather than ever injecting an invalid value).
Tests: PaimonScanPlanProviderTest (+1) jniAndCountRangesCarryRealFileFormatNotJni
— a real FileSystemCatalog PK table created with explicit file.format=orc (so
the asserted value is the table option, distinct from the parquet fallback):
force_jni_scanner=true scan -> every JNI data range carries "orc" (not "jni");
count-pushdown scan -> the collapsed count range carries "orc". Reverting either
method to "jni", or dropping the threaded defaultFileFormat, turns the assertion red.
Verified: connector 262/0/1skip (PaimonScanPlanProviderTest 43/0); checkstyle 0;
import-gate clean. Design: plan-doc/FIX-JNI-FILE-FORMAT-design.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
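The key mechanism above (reader routing decided by the presence of a paimon split, while the file-format field always carries the real data-file format) can be illustrated as follows. `RangeDescriptor` and its fields are hypothetical; the actual code fills thrift structures in PaimonScanRange.populateRangeParams.

```java
import java.util.Map;

// Hypothetical value object for illustration only.
final class RangeDescriptor {
    final String formatType; // which reader BE routes to: "jni" or the native format
    final String fileFormat; // inner format field: always the real data-file format

    private RangeDescriptor(String formatType, String fileFormat) {
        this.formatType = formatType;
        this.fileFormat = fileFormat;
    }

    static RangeDescriptor of(Map<String, String> tableOptions, boolean hasPaimonSplit) {
        // Real format from the table options, falling back to parquet.
        String real = tableOptions.getOrDefault("file.format", "parquet");
        // Routing is gated by the split, never by the file-format string.
        String formatType = hasPaimonSplit ? "jni" : real;
        return new RangeDescriptor(formatType, real);
    }
}
```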
…ode reset (P2-1, MAJOR)
Root cause: PaimonIncrementalScanParams.validate stripped legacy's defensive null
reset of scan.snapshot-id/scan.mode (PaimonScanNode:842-846), justified by a wrong
"a fresh per-query Table can't inherit scan.*" rationale. A base table that PERSISTS
scan.snapshot-id/scan.mode (legal & mutable via ALTER TABLE SET / TBLPROPERTIES /
table-default.*) carries it on every fresh load. Without the reset, resolveScanTable's
Table.copy merges the stale scan.snapshot-id with incremental-between and paimon 1.3.1
either THROWS ("[incremental-between] must be null when you set [scan.snapshot-id,
scan.tag-name]") or silently downgrades the @incr read to FROM_SNAPSHOT at the stale id
(wrong rows). The connector dropped exactly the safeguard legacy relied on.
Solution (Option 2; design red-team wf_ffd11631-ed2, DESIGN-SOUND): keep validate()
emitting only the non-null incremental-between* keys so the shared ConnectorMvccSnapshot
SPI / handle stay null-free, and reapply the two null resets at the single Table.copy
chokepoint via new PaimonIncrementalScanParams.applyResetsIfIncremental(scanOptions),
called in PaimonScanPlanProvider.resolveScanTable. paimon copyInternal consumes a null
value as options.remove(k), clearing the stale pin. The one edit covers BOTH callers
(native/JNI scan planScanInternal + JNI serialized-table getScanNodeProperties). Gated
on incremental-between / incremental-between-timestamp presence, so a genuine
scan.snapshot-id / scan.tag-name pin passes through unchanged (no false positive). Strict
legacy parity: resets scan.snapshot-id + scan.mode only. Corrected the now-refuted
"byte-parity on a freshly-loaded base" rationale in the affected javadoc/comments.
Tests: PaimonIncrementalScanParamsTest +4 (helper seeds the null resets for snapshot and
timestamp windows; passes non-incremental pins through unchanged; no-op for empty/null)
and reworded the keep-null-free validate() test; PaimonScanPlanProviderTest +1 real-table
(FileSystemCatalog over a persisted scan.snapshot-id), proven fail-before (paimon throws)
/ pass-after; PaimonConnectorMetadataMvccTest WHY-comment reworded (assertions unchanged).
Connector suites 20/44/37 green; checkstyle 0; import-gate clean. Connector-only — no SPI,
no BE change. Live @incr-over-persisted-scan.snapshot-id E2E is CI-gated (enablePaimonTest
=false), noted as gated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
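A rough sketch of the gated reset helper follows. The option keys come from the commit message; the class name `IncrementalResets`, the exact signature, and the copy-on-write behavior are assumptions, not the real PaimonIncrementalScanParams code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the real helper is PaimonIncrementalScanParams.applyResetsIfIncremental.
final class IncrementalResets {
    private IncrementalResets() {}

    static Map<String, String> applyResetsIfIncremental(Map<String, String> scanOptions) {
        if (scanOptions == null || scanOptions.isEmpty()) {
            return scanOptions; // no-op for empty/null
        }
        boolean incremental = scanOptions.containsKey("incremental-between")
                || scanOptions.containsKey("incremental-between-timestamp");
        if (!incremental) {
            // A genuine scan.snapshot-id / scan.tag-name pin passes through unchanged.
            return scanOptions;
        }
        // HashMap permits null values; per the commit message, a null value is
        // consumed as "remove this key" when the table is copied with these options.
        Map<String, String> out = new HashMap<>(scanOptions);
        out.put("scan.snapshot-id", null);
        out.put("scan.mode", null);
        return out;
    }
}
```

Returning a copy keeps the caller's map null-free, which matches the stated goal of keeping the shared snapshot handle free of null values.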
FIX-3 FIX-INCR-SCAN-RESET committed f08bc22. Adds FIX-INCR-SCAN-RESET-summary.md, marks FIX-3 done in the task-list, rolls HANDOFF to FIX-4 (FIX-FECONF-STORAGE-PARITY). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l legacy parity (P8-1..4, P9-2/3)
Root cause: the connector cannot import fe-core, so PaimonCatalogFactory rebuilds the FE-side Hadoop Configuration/HiveConf from raw props with literal key logic. That reconstruction was incomplete vs the legacy *Properties classes, so paimon catalogs on several storage backends failed FE-side catalog/metadata access (the live FileSystemCatalog/HiveCatalog/JdbcCatalog could not resolve the storage FileIO).
Solution (connector-only; no fe-core/SPI/BE change):
- Extract a shared applyS3aBaseConfig helper (port of AbstractS3CompatibleProperties.appendS3HdfsProperties) taking caller-resolved creds AND the 4 tuning values, so each scheme passes its OWN aliases/defaults.
- 4a OSS: derive fs.oss.endpoint from region when blank (oss-<region>[-internal].aliyuncs.com, default -internal, publicAccess from dlf.access.public/dlf.catalog.accessPublic), MOVED from the DLF-local block into the shared OSS block (so filesystem+hms flavors get it too); also emit the S3A base for OSS. Removed the now-dead DLF-local derivation block.
- 4b S3: emit fs.s3a.path.style.access + connection.maximum/request.timeout/timeout. Tuning defaults are per-backend: S3=50/3000/1000 (incl AWS_* alias twins), OSS/COS/OBS=100/10000/10000 (a single shared default would silently mis-tune AWS S3).
- 4c COS/OBS: new applyCanonicalCosConfig/ObsConfig. Detection mirrors legacy guessIsMe (endpoint/warehouse PATTERN: myqcloud.com / myhuaweicloud.com) OR a cos./obs.-prefixed key, NOT scheme-key-only (a cosn:// catalog configured with only s3.endpoint=cos...myqcloud.com would be missed otherwise). Each emits the S3A base (cosn/obs FS impl is S3AFileSystem, which reads fs.s3a.*) THEN the unconditional fs.cosn.* / fs.obs.* keys; OBS prefers the native OBSFileSystem when classpath-available.
- S3 endpoint-from-region (user-approved, same defect class as the OSS P8-1 fix): region-only AWS S3 derives https://s3.<region>.amazonaws.com.
- 4d HMS username: resolve hadoop.username from firstNonBlank(hive.metastore.username, hadoop.username) (alias priority), run AFTER the storage overlay so the raw hadoop.* passthrough cannot clobber it.
- 4e (folded in, pre-existing MAJOR found in impl review): the kerberos block forced hadoop.security.authentication=kerberos before applyStorageConfig, so a kerberized-HMS + simple-HDFS catalog had it clobbered back to simple by the raw hadoop.* passthrough (auth=simple but sasl=true -> broken GSSAPI). Relocated the kerberos block to run AFTER the overlay, mirroring legacy initHadoopAuthenticator-last ordering.
Design red-team (wf_a6385c61-669, 5 skeptics + completeness critic) caught the divergent tuning defaults, the endpoint-pattern detection gap, and the unconditional fs.cosn.*/fs.obs.* requirement before coding; impl verification (wf_f90260cb-5e6) confirmed byte-for-byte legacy key/alias/default fidelity and found 4e.
Tests: PaimonCatalogFactoryTest +15 (S3 endpoint-from-region, S3 50/3000/1000 tuning, path-style, OSS endpoint-from-region filesystem+hms, OSS S3A base, COS keys + pattern-detect + unconditional region, OBS keys + pattern-detect, no-COS/OBS-for-plain-S3, HMS username alias + priority, kerberos-survives-simple-HDFS). The priority + kerberos tests are RED on the pre-move ordering.
Verified: connector 56/0/0 + full module green; checkstyle 0; import-gate clean. Live e2e (paimon_base_filesystem/dlf/hms suites) CI-gated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
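The endpoint-from-region derivations and the alias-priority lookup named in this commit can be sketched as plain string helpers. The endpoint patterns are taken from the commit message; the class `EndpointDerivation` and its method names are hypothetical, and the real PaimonCatalogFactory writes the results into a Hadoop Configuration.

```java
// Hypothetical helpers for illustration; not the real PaimonCatalogFactory code.
final class EndpointDerivation {
    private EndpointDerivation() {}

    /** oss-<region>[-internal].aliyuncs.com; the internal endpoint is the default. */
    static String ossEndpoint(String region, boolean publicAccess) {
        return "oss-" + region + (publicAccess ? "" : "-internal") + ".aliyuncs.com";
    }

    /** Region-only AWS S3: https://s3.<region>.amazonaws.com */
    static String s3Endpoint(String region) {
        return "https://s3." + region + ".amazonaws.com";
    }

    /** First candidate that is neither null nor whitespace-only, else null. */
    static String firstNonBlank(String... candidates) {
        for (String c : candidates) {
            if (c != null && !c.trim().isEmpty()) {
                return c;
            }
        }
        return null;
    }
}
```

For the HMS username case, calling `firstNonBlank(hiveMetastoreUsername, hadoopUsername)` gives the alias priority described in 4d: the hive.metastore.username value wins whenever it is set.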
…ne; all 4 round-3 fixes complete, next B8
- Mark FIX-4 done (commit f0210b5) in task-list-P5-rereview3-fixes.md; record the beyond-literal-scope items (user-approved S3 endpoint-from-region, per-backend tuning defaults, endpoint-pattern detection, unconditional fs.cosn.*/fs.obs.*, folded-in 4e kerberos-ordering MAJOR) and the known out-of-scope residual.
- Add FIX-FECONF-STORAGE-PARITY-summary.md.
- Roll HANDOFF: all 4 user-approved round-3 fixes (FIX-1..FIX-4) complete; next session = B8 legacy deletion (paimon/* + *Properties dead residue, now that FIX-4 no longer needs them as a literal-port reference) + round-3 follow-ups (D-057 re-scope, accepted-deviation sign-off, uncheckedFallbacks), gated on an AskUserQuestion scope check since B8 is a large change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…oop FS closure
Root cause: the Paimon connector plugin runs under a child-first ClassLoader with
org.apache.hadoop NOT parent-first, and bundled hadoop-common/hadoop-client-api but
NOT hadoop-aws. So FileSystem/SecurityUtil loaded child-first while S3AFileSystem
resolved from the parent 'app' loader -> cross-loader ClassCastException
('S3AFileSystem cannot be cast to FileSystem') and a permanent SecurityUtil.<clinit>
poison ('Could not initialize class ...SecurityUtil', 'DNSDomainNameResolver not
DomainNameResolver', 'ServiceConfigurationError: NullScanFileSystem not a subtype'),
cascading to 'Unknown database X'. ~39 of 42 external-regression suites failed on the
af2037 TeamCity run; not fixed by any later commit.
Solution (self-contained plugin — aligns with fe-core dropping hadoop/hive-catalog-shade
after full connector migration; does NOT lean on the parent):
- pom: add hadoop-aws (the only missing FS impl, S3AFileSystem; DistributedFileSystem
already comes from the transitive hadoop-client-api). hive-common stays bundled.
- PaimonCatalogFactory.buildHadoopConfiguration: conf.setClassLoader(plugin loader) so
Configuration.getClass("fs.<scheme>.impl") resolves the FS impl from the plugin loader.
- PaimonConnector.createCatalogFromContext (single chokepoint for all flavors): pin the
thread-context classloader to the plugin loader around catalog creation so the
FileSystem ServiceLoader and SecurityUtil static init resolve from the child. Mirrors
JdbcConnectorClient / ThriftHmsClient.
Tests: connector build SUCCESS + all connector UTs 0 fail/0 error; plugin lib/ now
contains hadoop-aws/S3AFileSystem; checkstyle + connector import-gate clean. The full
runtime proof is the docker external paimon suite (CI-gated, enablePaimonTest) — not run
locally. See plan-doc/FIX-PAIMON-HADOOP-CLASSLOADER-{design,summary}.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
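Pinning the thread-context classloader around catalog creation follows a standard save/set/restore pattern, sketched below. `ContextClassLoaderPin` and `callWith` are hypothetical names; only `Thread.getContextClassLoader` and `Thread.setContextClassLoader` are real JDK calls.

```java
import java.util.function.Supplier;

// Hypothetical utility; the commit applies this pattern inside
// PaimonConnector.createCatalogFromContext.
final class ContextClassLoaderPin {
    private ContextClassLoaderPin() {}

    /** Runs the action with the given loader as thread-context classloader, then restores it. */
    static <T> T callWith(ClassLoader pluginLoader, Supplier<T> action) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(pluginLoader);
        try {
            // ServiceLoader lookups and static initializers triggered here see
            // the plugin loader as the context classloader.
            return action.get();
        } finally {
            // Always restore, even if the action throws.
            current.setContextClassLoader(previous);
        }
    }
}
```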
…ROPERTIES to paimon
Root cause: branch commit 98a73bf (D-046 paimon parity) added LOCATION+PROPERTIES emission to the SHARED PLUGIN_EXTERNAL_TABLE branch of Env.getDdlStmt, gated only on !properties.isEmpty(). JDBC/ES/Trino catalogs are plugin-driven with non-empty getTableProperties() (connection props incl. credentials), so SHOW CREATE TABLE on a JDBC external table emitted LOCATION '' + PROPERTIES("password"=...) instead of the legacy comment-only ENGINE=JDBC_EXTERNAL_TABLE;, which is both a correctness regression (test_nereids_refresh_catalog) and a JDBC credential leak. Still present on HEAD.
Solution: gate the LOCATION+PROPERTIES emission additionally on TableType.PAIMON_EXTERNAL_TABLE.name().equals(getEngineTableTypeName()): only the paimon engine type (the sole plugin-driven connector whose legacy DDL carried LOCATION/PROPERTIES) renders them. JDBC/ES/Trino/MaxCompute revert to comment-only; the credential leak is closed. Did NOT rebaseline the .out (would entrench the leaked-credential output).
Tests: fe-core compile SUCCESS + checkstyle clean; adversarial static review SOUND (paimon incl. sys-table unwrap still renders LOCATION/PROPERTIES; jdbc/es/trino/maxcompute match committed comment-only .out; getTableProperties has no other DDL consumer). e2e: external_table_p0/nereids_commands/test_nereids_refresh_catalog (CI external pipeline). See plan-doc/FIX-SHOWCREATE-PLUGIN-PROPS-{design,summary}.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
af2037c to
f7114a2
Compare
Contributor
Author
|
run buildall |
Contributor
TPC-H: Total hot run time: 29192 ms |
Contributor
TPC-DS: Total hot run time: 167905 ms |
Contributor
FE Regression Coverage ReportIncrement line coverage |
…ma-cache (CI 968828)
Root cause: PluginDrivenSysExternalTable did not override getSchemaCacheValue(), so it inherited ExternalTable.getSchemaCacheValue() which routes through ExternalCatalog.getSchema() and re-resolves the table by name in the db map. A transient system table (e.g. tbl$snapshots / tbl$manifests) is never registered in that map, so the lookup failed with "failed to load schema cache value for: ...$snapshots". Regression from the paimon SPI migration; legacy PaimonSysExternalTable avoided it by overriding getSchemaCacheValue()/initSchema() to compute on the transient instance.
Solution: override getSchemaCacheValue() (and initSchema(SchemaCacheKey)) to compute the schema directly via the inherited PluginDrivenExternalTable.initSchema() (which honors this class's resolveConnectorTableHandle that threads the sys-table handle), memoized with double-checked locking, mirroring legacy PaimonSysExternalTable.
Tests: covered by existing e2e suites paimon_system_table ($manifests), paimon_time_travel ($snapshots), test_paimon_system_table_auth (re-run in CI).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
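Memoization with double-checked locking, as mentioned in the solution, generally looks like the sketch below. `MemoizedValue` is a hypothetical generic holder, not the actual PluginDrivenSysExternalTable code, and it assumes the computed value is never null.

```java
import java.util.function.Supplier;

// Hypothetical generic holder illustrating double-checked locking.
final class MemoizedValue<T> {
    private final Supplier<T> compute;
    // volatile is required so a fully constructed value is safely published.
    private volatile T value;

    MemoizedValue(Supplier<T> compute) {
        this.compute = compute;
    }

    T get() {
        T local = value;              // first check, without locking
        if (local == null) {
            synchronized (this) {
                local = value;        // second check, under the lock
                if (local == null) {
                    local = compute.get();
                    value = local;
                }
            }
        }
        return local;
    }
}
```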
…68828)
Root cause: PaimonConnectorMetadata.mapFields built ConnectorColumn via the 5-arg ctor, which defaults isKey=false; ConnectorColumnConverter propagates it, so DESC showed Key=false for every paimon column. Legacy PaimonExternalTable/PaimonSysExternalTable always set Column isKey=true (3rd positional arg) for every column, so the .out files expect Key=true. Caused test_paimon_schema_change, test_paimon_char_varchar_type, test_paimon_timestamp_with_time_zone DESC diffs.
Solution: pass isKey=true via the 6-arg ConnectorColumn ctor in mapFields (single chokepoint for latest + at-snapshot + system-table schema paths; toSchemaCacheValue preserves isKey on remap).
Tests: extended PaimonConnectorMetadataTest.getTableSchemaForcesColumnsNullableForLegacyParity to pin isKey=true for both a PK and a non-PK column.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… split (CI 968828)
Root cause: the paimon (and hudi) plugin-zip bundled org.apache.thrift:libthrift and loaded org.apache.thrift.* child-first (not in the connector parent-first allowlist), while fe-thrift is provided so org.apache.doris.thrift.TFileScanRangeParams resolves parent-first and implements the PARENT's TBase. PaimonScanPlanProvider.encodeSchemaEvolution()'s TSerializer.serialize(carrier) then mixes a child TSerializer with a parent-TBase carrier -> IncompatibleClassChangeError. Being an Error (not Exception), it escaped catch(Exception) and the connection handler, killing the mysql session. This was the dominant CI failure (~19 tests: 2 ANALYZE, the family-D connection drops, and the predict/timestamp_tz/sql_block_rule explain failures).
Solution:
- Exclude org.apache.doris:fe-thrift + org.apache.thrift:libthrift from the paimon and hudi plugin-zip assemblies, so org.apache.thrift.* resolves from the single parent fe-core copy that also owns org.apache.doris.thrift.* (matches the es/jdbc/hive/maxcompute assemblies).
- Defense-in-depth: broaden encodeSchemaEvolution's catch to Exception | LinkageError so any future linkage error surfaces as a clean per-query failure instead of an uncaught Error that kills the whole connection (this is what turned ~5 real failures into ~19 collateral ones).
Verified: rebuilt paimon and hudi plugin zips no longer contain libthrift/fe-thrift.
Tests: e2e re-run in CI (the native-path paimon suites).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
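The defense-in-depth catch can be sketched as follows. `LinkageSafeEncoder` and the wrapping exception type are assumptions for illustration; the point is only that a multi-catch of `Exception | LinkageError` also intercepts `IncompatibleClassChangeError`, which a plain `catch (Exception e)` would let escape.

```java
import java.util.concurrent.Callable;

// Hypothetical wrapper; the real change is inside encodeSchemaEvolution().
final class LinkageSafeEncoder {
    private LinkageSafeEncoder() {}

    static String encode(Callable<String> serializer) {
        try {
            return serializer.call();
        } catch (Exception | LinkageError e) {
            // LinkageError is an Error, not an Exception, so it must be named
            // explicitly to turn it into an ordinary per-query failure.
            throw new IllegalStateException("schema evolution encoding failed", e);
        }
    }
}
```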
…ilter scans (CI 968828)
Root cause: on the SPI plugin scan path, PaimonScanPlanProvider.getScanNodeProperties emitted the paimon.predicate property only when filter.isPresent() && !predicates.isEmpty(), and populateScanLevelParams set the thrift field only when non-null. So a paimon read with no pushed-down filter (e.g. force_jni_scanner=true `select *`) omitted paimon_predicate entirely; BE then omitted the JNI key, and PaimonJniScanner.getPredicates() called PaimonUtils.deserialize(null) -> NPE "encodedStr is null". Legacy PaimonScanNode.createScanRangeLocations always serialized the (possibly empty) predicate list, so the field was always present. Caused test_paimon_catalog_varbinary, paimon_tb_mix_format, paimon_partition_legacy, paimon_timestamp_types, test_paimon_partition_table.
Solution:
- getScanNodeProperties always serializes the predicate list (empty list -> non-null base64 string) and emits paimon.predicate unconditionally, restoring the legacy invariant.
- BE backstop: PaimonJniScanner.getPredicates() treats a null paimon_predicate param as "no filter" (returns emptyList) so the JNI reader never NPEs on a missing param.
Tests: PaimonScanPlanProviderTest.getScanNodePropertiesAlwaysEmitsPredicateForNoFilterScan pins that a no-filter scan emits paimon.predicate and it deserializes to an empty list.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
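Both halves of this fix (always emit an encoded list, and treat a missing value as "no filter") can be illustrated with a toy codec. `PredicateCodec` is hypothetical and uses plain Java serialization of strings plus Base64; the real code serializes paimon predicate objects through its own utility.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Collections;
import java.util.List;

// Hypothetical toy codec; strings stand in for paimon Predicate objects.
final class PredicateCodec {
    private PredicateCodec() {}

    /** Producer side: an empty list still yields a non-null encoded string. */
    static String encode(List<String> predicates) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(predicates));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Consumer-side backstop: a missing (null) parameter means "no filter". */
    @SuppressWarnings("unchecked")
    static List<String> decode(String encoded) {
        if (encoded == null) {
            return Collections.emptyList();
        }
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (List<String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```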
8-family root-cause analysis (adversarially verified) of the 37 external-regression failures. 7 in-scope paimon-SPI regressions + 2 out-of-scope (hive CTAS stale test; BE shutdown ASAN race). RC-1/2/6/7 fixed (contained); RC-3/4/5 deferred to the docker-gated self-contained-classloader batch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…imon plugin (CI 968828)
Root cause: the connector sets fs.oss.impl=com.aliyun.jindodata.oss.JindoOssFileSystem, but that impl ships only in the thirdparty jindofs jars (packaged by post-build.sh into fe/lib/jindofs, not a maven artifact). The paimon plugin runs child-first, so JindoOssFileSystem resolves from the parent and cannot be cast to the plugin's child-loaded org.apache.hadoop.fs.FileSystem -> "JindoOssFileSystem cannot be cast to FileSystem" -> "Unknown database" on first OSS listing (paimon_base_filesystem, test_paimon_deletion_vector_oss). The maven route is unbuildable (jindo-sdk/jindo-core are bound to an undeclared jindodata repo -> "present but unavailable"; runtime jindofs is 6.10.4, not in maven).
Solution: after deploying the connector plugins, copy the jindofs jars (already placed in fe/lib/jindofs by post-build.sh) into the paimon plugin lib so JindoOssFileSystem loads child-first alongside the plugin's own hadoop FileSystem. Naturally gated (no-op unless --jindofs/DISABLE_BUILD_JINDOFS=OFF).
CAVEAT (docker-gated, enablePaimonTest=true): jindo-core ships a native lib that binds to one classloader per JVM, so this is safe only while no concurrent non-paimon path loads jindo from fe/lib/jindofs in the same FE process; this must be confirmed by the docker paimon suite.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…on plugin (CI 968828)
Root cause: the prior fix (FIX-PAIMON-HADOOP-CLASSLOADER) bundled hadoop-aws into the plugin (S3AFileSystem child-first) but NOT the AWS SDK v2 (hadoop-aws declares it as software.amazon.awssdk:bundle, which fe/pom.xml excludes). So the plugin's S3AInternalAuditConstants.<clinit> registered an ExecutionAttribute against the single PARENT-loaded sdk-core static, colliding with fe-core's S3A in ExecutionAttribute.ensureUnique() -> ExceptionInInitializerError that permanently poisoned S3A for the whole FE JVM (test_iceberg_jdbc_catalog/statistics/case_sensibility, test_paimon_statistics).
Solution: bundle the AWS SDK v2 (software.amazon.awssdk:s3 + apache-client, BOM-managed 2.29.52) into the plugin child-first, so the plugin's S3A registers against its OWN ExecutionAttribute static. s3's compile closure brings sdk-core (ExecutionAttribute); apache-client is explicit (hadoop-aws wires ApacheHttpClient). software.amazon.awssdk stays child-first (not parent-first): the separate child SDK copy is the point.
Verified: rebuilt plugin zip bundles lib/sdk-core-2.29.52.jar containing software/amazon/awssdk/core/interceptor/ExecutionAttribute.class. Runtime S3A read + assumed-role/STS docker-gated (enablePaimonTest=true).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… client (CI 968828)
Root cause: paimon-hive-connector's RetryingMetaStoreClientFactory probes getProxy(HiveConf,...) via reflection, but RetryingMetaStoreClient/HiveMetaHookLoader resolved from the parent hive-catalog-shade-3.1.1 whose getProxy overloads use the PARENT's Configuration/HiveConf Class objects -> exact Class-identity mismatch across loaders -> all probes NoSuchMethodException -> "Failed to create the desired metastore client" (test_create_paimon_table). The metastore itself is reachable.
Solution: bundle org.apache.hive:hive-metastore:2.3.7 (RetryingMetaStoreClient/HiveMetaStoreClient/HiveMetaHookLoader + metastore api) child-first so its getProxy(HiveConf,...) overloads compile against the SAME child-bundled hive-common-2.3.9 HiveConf the connector builds. 2.3.7 pairs with hive-common 2.3.9 (API-stable HiveConf) and is fastutil-CLEAN, so unlike hive-catalog-shade it does not reintroduce the fastutil collision. libfb303 rides transitively; server-side datanucleus/derby/hbase/tephra, the stale hadoop-2.7.2 trio + guava, and libthrift are excluded (libthrift stays parent-first like the other connectors).
Verified: rebuilt plugin zip bundles lib/hive-metastore-2.3.7.jar (RetryingMetaStoreClient with 5 getProxy(HiveConf) overloads) + libfb303; 0 fastutil entries; no hadoop-2.7.2 leak. The thrift 0.9.3-vs-host-0.16.0 wire skew and the DLF ProxyMetaStoreClient path are docker-gated (enablePaimonTest=true).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
Author
|
run buildall |
Contributor
TPC-H: Total hot run time: 28510 ms |
Contributor
TPC-DS: Total hot run time: 168080 ms |
Contributor
FE UT Coverage ReportIncrement line coverage |
only for testing