issue #7427 : Dimension Lookup/Update regression and bug fix #7429
Open
mattcasters wants to merge 1 commit into
Summary
Issue #7427 addresses regressions and related failures in the Dimension Lookup / Update transform after the validation work landed in PR #7408 (commit 5ca8eb1eb6). That change added extensive field and metadata validation but inadvertently broke a previously valid configuration: lookup-only mode with an empty version field.
This branch also improves preload-cache behaviour: it replaces the cryptic "Comparison method violates its general contract" error with actionable validation messages, adds an option to skip zero-length validity rows during preload, fixes a preload date-type mismatch, and adds integration tests under integration-tests/database/.
Root cause analysis
1. Version field regression (primary regression from #7408)
Commit 5ca8eb1eb6 introduced validateFieldsAndMetadata() in DimensionLookup.java. When the version field is empty, it always threw an error. That check is appropriate for update mode (the SCD insert/update algorithm needs a version column), but lookup-only mode (update = N) has never required a version field. Lookup-only pipelines with an empty <version/> element therefore started failing at init.
The same unconditional requirement existed in DimensionLookupMeta.checkReturns(), which is used by the transform verification UI.
Fix: Gate the version-field requirement behind isUpdate() in both places:
- DimensionLookup.validateFieldsAndMetadata()
- DimensionLookupMeta.checkReturns()
2. Preload cache sort comparator error (pre-existing, surfaced by testing)
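As a minimal illustration of the comparator problem this item covers, the sketch below uses a hypothetical lookup-style comparator, not Hop's actual DimensionCache.compare(). The Row class, the field names and LOOKUP_STYLE are assumptions made for the example. It treats rows with the same key and overlapping [from, to) ranges as equal, which is convenient for a binary search but is not a consistent ordering.

```java
import java.util.Comparator;

public class RangeComparatorDemo {
  /** Hypothetical stand-in for a cached dimension row; validity is [from, to). */
  public static final class Row {
    public final String key;
    public final int from;
    public final int to;

    public Row(String key, int from, int to) {
      this.key = key;
      this.from = from;
      this.to = to;
    }
  }

  /** Lookup-style comparison: same key and overlapping ranges compare as equal. */
  public static final Comparator<Row> LOOKUP_STYLE = (a, b) -> {
    int byKey = a.key.compareTo(b.key);
    if (byKey != 0) {
      return byKey;
    }
    if (a.to <= b.from) {
      return -1; // a ends before b starts
    }
    if (b.to <= a.from) {
      return 1; // a starts after b ends
    }
    return 0; // ranges overlap, so the rows are treated as a match
  };

  public static void main(String[] args) {
    Row a = new Row("K1", 0, 10);
    Row b = new Row("K1", 5, 15);
    Row c = new Row("K1", 12, 20);

    // a matches b and b matches c, yet a sorts strictly before c.
    System.out.println(LOOKUP_STYLE.compare(a, b)); // 0
    System.out.println(LOOKUP_STYLE.compare(b, c)); // 0
    System.out.println(LOOKUP_STYLE.compare(a, c)); // -1
  }
}
```

The Comparator contract requires that when compare(x, y) is 0, x and y compare the same way against every other element. Here a and b compare as equal but disagree about c, so a sort over such rows is allowed to fail with an IllegalArgumentException.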
DimensionCache.compare() is designed for binary-search lookup, not for Collections.sort(). When two rows share the same natural key and overlapping [date_from, date_to) ranges, the comparator can return 0 for non-equal rows, violating the Comparator contract and producing the "Comparison method violates its general contract" error. This is especially common when upstream flattening produces zero-length validity rows (date_from == date_to).
Fix: Before sortRows(), call the new DimensionCache.validateRowsForSort(), which checks for overlapping ranges and invalid date ranges (from >= to, including zero-length).
Errors are reported with natural key and date-range context (capped at 3 messages). HopTransformException is rethrown directly from preloadCache() so messages are not wrapped in a generic HopException.
3. Preload lookup date type mismatch (found during integration testing)
When comparing the lookup date against preloaded rows, a java.util.Date (the pipeline execution date) was placed directly into a row whose metadata expects java.sql.Timestamp (PostgreSQL preload). This caused a type-mismatch failure.
Fix: Convert the lookup date through the preload row's date value meta before cache lookup.
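The type mismatch can be illustrated with the JDK alone. The sketch below does not use Hop's value-meta conversion, which is what the PR relies on; toTimestamp is a hypothetical helper that only shows the idea of normalising the lookup date to the type the preloaded rows carry before comparing.

```java
import java.sql.Timestamp;
import java.util.Date;

public class LookupDateConversion {
  /** Hypothetical helper: bring a lookup date to the Timestamp type used by preloaded rows. */
  public static Timestamp toTimestamp(Date date) {
    if (date == null) {
      return null;
    }
    if (date instanceof Timestamp) {
      return (Timestamp) date; // already the expected type; keeps nanosecond precision
    }
    return new Timestamp(date.getTime()); // same instant, millisecond precision
  }

  public static void main(String[] args) {
    Date executionDate = new Date(1_583_020_800_000L);

    // A plain java.util.Date is not a Timestamp, so code that expects one cannot use it as-is.
    Object cell = executionDate;
    System.out.println(cell instanceof Timestamp); // false

    Timestamp converted = toTimestamp(executionDate);
    System.out.println(converted.getTime() == executionDate.getTime()); // true
  }
}
```

Converting once, before the cache lookup, keeps every value in the comparison row on the same Java type.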
Code changes
Transform plugin (plugins/transforms/dimensionlookup)
- DimensionLookup.java: isUpdate(); preload validation call; zero-length filter; date conversion fix; HopTransformException passthrough from preload; improved error logging (stack trace on init failure)
- DimensionLookupMeta.java: ignoreZeroLengthValidity property (ignore_zero_length_validity, default false); version check gated on isUpdate() in checkReturns()
- DimensionCache.java: validateRowsForSort(), isZeroLengthValidity(), excludeZeroLengthValidityRows(), overlap/invalid-range helpers
- DimensionLookupDialog.java
- messages_en_US.properties
- DimensionCacheTest.java
Integration tests (integration-tests/database/)
Wired into main-0012-dimension-lookup-update.hwf after the existing 0012-5:
preload_cache=Y, cache_size=0. Asserts regression fix.
Supporting files (new):
- 0012-dimension-lookup-no-version.hpl: includes a downstream Verify (Dummy) transform for unit-test row capture
- 0012-dimension-lookup-zero-length-validity.hpl
Unit-test design note: The Hop unit-test framework captures input rows to the golden-data transform (rowReadEvent), not transform output. Golden data must therefore be attached to a downstream transform (e.g. Dummy Verify) whose input row meta already contains the expected lookup results. This follows the same pattern as 0023-db-procedure.
New option: ignore zero-length validity
Property: ignore_zero_length_validity (default false)
When enabled with preload cache in lookup-only mode, rows where date_from == date_to are excluded before cache validation and sorting. This is useful when upstream flattening produces point-in-time records that cannot participate in range-based lookup but should not block the pipeline.
Log messages (detailed/basic) report how many rows were skipped; an empty cache after filtering logs a basic warning.
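The filter-then-sort order described above can be sketched as follows. This is an illustration under assumptions, not the code in DimensionCache: the Row class, withoutZeroLength and SORT_SAFE are hypothetical names, and dates are reduced to plain long values. The comparator orders by natural key and then by date_from, as the PR describes for the sort-safe comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ZeroLengthValidityFilter {
  /** Hypothetical stand-in for a preloaded row; validity is [from, to). */
  public static final class Row {
    public final String key;
    public final long from;
    public final long to;

    public Row(String key, long from, long to) {
      this.key = key;
      this.from = from;
      this.to = to;
    }
  }

  /** A row is zero-length when its validity range starts and ends at the same instant. */
  public static boolean isZeroLength(Row row) {
    return row.from == row.to;
  }

  /** Returns a new list without zero-length rows; the input list is left untouched. */
  public static List<Row> withoutZeroLength(List<Row> rows) {
    List<Row> kept = new ArrayList<>();
    for (Row row : rows) {
      if (!isZeroLength(row)) {
        kept.add(row);
      }
    }
    return kept;
  }

  /** Orders by natural key, then by range start, which is a consistent ordering. */
  public static final Comparator<Row> SORT_SAFE =
      Comparator.comparing((Row r) -> r.key).thenComparingLong(r -> r.from);

  public static void main(String[] args) {
    List<Row> rows = new ArrayList<>();
    rows.add(new Row("K2", 0, 10));
    rows.add(new Row("K1", 10, 20));
    rows.add(new Row("K1", 5, 5)); // zero-length validity row
    rows.add(new Row("K1", 0, 10));

    List<Row> kept = withoutZeroLength(rows);
    kept.sort(SORT_SAFE);

    System.out.println((rows.size() - kept.size()) + " row(s) skipped");
    for (Row row : kept) {
      System.out.println(row.key + " [" + row.from + ", " + row.to + ")");
    }
  }
}
```

Filtering before validation means point-in-time rows never reach the range checks, so they cannot trigger an invalid-range error when the option is enabled.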
Testing
Unit tests
./mvnw test -pl plugins/transforms/dimensionlookup
All DimensionCacheTest cases pass, including new validation and zero-length filtering tests.
Integration tests
Verified on a clean Docker run (container exit code 0):
1/2 resolved to dimension_id 1/2, values alpha/beta
Natural key [K1] has an invalid date range [2020/03/01, 2020/03/01) (date-from must be before date-to)
result=[true]
Build note for reviewers: A full reactor build (-am) may fail on unrelated modules (e.g. corrupted hop-action-ftp checkstyle XML). Building the plugin and client directly works.
The Docker test script unzips assemblies/client/target/hop-client-*.zip before building the image; ensure the client assembly includes the rebuilt dimensionlookup JAR.
What reviewers should focus on
- validateRowsForSort() uses a sort-safe comparator (natural key, then date_from); the existing compare() for binary search is unchanged. Confirm this separation is intentional and sufficient.
- Confirm that preload failures surface HopTransformException messages directly to the user, not generic wrappers.
Out of scope / known limitations
- The DimensionCache.compare() contract issue for Collections.sort() is not fixed by changing the comparator; invalid/overlapping data is rejected upfront instead.
- 0029-sql-file-output (cannot create output folder): not introduced by this work.
File change summary