acc: Prevent lifecycle-started-terraform-error from leaking a started warehouse#5584
Open
chrisst wants to merge 2 commits into
… warehouse

This test deploys a bundle with lifecycle.started: true on the terraform engine and expects plan/deploy to fail with a validation error. The script had no teardown: it relied entirely on the deploy failing before resource creation. If that error path regresses (or the failure moves to mid-apply, after the warehouse is created), every run leaks a started Medium SQL warehouse with a 2h auto-stop into the shared cloud test workspace. Leaked warehouses like this contributed to a GCP local-SSD quota exhaustion that took down CI.

Defuse the test in two ways:
- Add a cleanup trap that always runs 'bundle destroy --auto-approve' on exit (the idiom used by the sibling lifecycle-started tests), so any accidentally created warehouse is torn down even when the script fails. bundle destroy does not run PreDeployChecks, so it is not blocked by the same lifecycle.started validation error.
- Shrink the warehouse to 2X-Small with auto_stop_mins: 10 so that even a transient leak is as cheap and short-lived as possible. The test only exercises the validation error path, so the warehouse size is irrelevant to its intent.

Expected output regenerated with -update (error location moved to line 17; destroy reports no active deployment in the happy path).

Co-authored-by: Isaac
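The cleanup-trap idiom described in this commit message can be illustrated with a minimal, self-contained sketch. It is not the test's actual script: fake_cli, CALLS_LOG, and run_test_script are hypothetical names introduced here, and fake_cli only records the arguments it receives instead of calling the real CLI. The sketch assumes a deploy that fails and shows that the EXIT trap still issues the destroy call.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real CLI: it records each invocation and
# simulates 'bundle deploy' failing. It does not talk to any workspace.
CALLS_LOG="$(mktemp)"

fake_cli() {
    echo "$*" >> "$CALLS_LOG"
    if [ "$1 $2" = "bundle deploy" ]; then
        return 1
    fi
    return 0
}

# The body runs in a subshell so the EXIT trap fires when the "script" ends.
run_test_script() (
    cleanup() {
        # Always attempt teardown, whether or not anything was created.
        fake_cli bundle destroy --auto-approve
    }
    trap cleanup EXIT

    # The step under test; in this sketch it always fails.
    fake_cli bundle deploy || exit 1
    echo "deploy unexpectedly succeeded"
)

status=0
run_test_script || status=$?

echo "script exit status: $status"
cat "$CALLS_LOG"
```

Because the trap is registered before the deploy step, the destroy call is recorded even though the script exits with a non-zero status, which is the property the commit relies on.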
Match the comment-free cleanup idiom of the sibling lifecycle-started tests; regenerate expected output for the shifted error location.

Co-authored-by: Isaac
Integration test report
Commit: 03b39b1
54 interesting tests: 31 FAIL, 15 SKIP, 7 KNOWN, 1 flaky
Top 27 slowest tests (at least 2 minutes):
What
Makes acceptance/bundle/resources/sql_warehouses/lifecycle-started-terraform-error leak-proof:
- Adds the trap cleanup EXIT / bundle destroy --auto-approve idiom used by its sibling tests, so any accidental partial deploy is torn down even when the script fails (verified that bundle destroy is not blocked by the same pre-deploy validation the test exercises).
- Shrinks the warehouse to 2X-Small with auto_stop_mins: 10 as defense-in-depth (size is irrelevant to the validation under test).

Why
The test verifies that lifecycle.started fails with a clear error on the terraform engine. Today that error fires in PreDeployChecks (before anything is created), but the script had zero teardown: any regression that lets the deploy proceed past validation would leak a started Medium warehouse into the shared cloud test workspace on every run, while the test still "fails loudly." Leaked started warehouses exhausted the GCP local-SSD quota in the shared test project on 2026-06-11/12 and blocked terraform-provider CI for ~2 days (ref ES-1974228).

Tests
Local = true: ran against the local testserver; expected output regenerated (error location shifted by the added config lines, destroy step appended) and the full sql_warehouses subtree passes cleanly.

This pull request and its description were written by Isaac.