feat(api): make orphan-task recovery configurable and drop the Jira idempotency table#11472
AdriiiPRodri wants to merge 6 commits into
…a from orphan-task recovery
🔒 osv-scanner: 2 finding(s) in
| Severity | ID | Package | Version | Summary |
|---|---|---|---|---|
| 🟠 HIGH (8.8) | GHSA-897w-fcg9-f6xj | PyPI/dulwich | 0.23.0 | Dulwich has an arbitrary file write via NTFS-hostile tree entries on Windows |
| 🟠 HIGH (7.4) | PYSEC-2026-179 | PyPI/pyjwt | 2.12.1 | (no summary) |
To accept a finding, add an [[IgnoredVulns]] entry to osv-scanner.toml at the repo root with a reason and ignoreUntil.
🔒 Container Security Scan
Image:
📊 Vulnerability Summary
15 package(s) affected
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #11472 +/- ##
========================================
Coverage 93.96% 93.97%
========================================
Files 242 240 -2
Lines 35619 35407 -212
========================================
- Hits 33471 33273 -198
+ Misses 2148 2134 -14
Pull request overview
This PR makes orphaned-task recovery operationally controllable via Django settings (master opt-in plus per-task-group toggles) and removes the unreleased Jira idempotency mechanism (JiraIssueDispatch) along with its migration and related cleanup logic.
Changes:
- Add master + per-group feature flags for orphan-task recovery and refactor the allowlist into grouped `RECOVERY_TASK_GROUPS`.
- Remove the Jira idempotency dispatch table/model/migration and revert Jira send behavior accordingly (and ensure Jira tasks are no longer auto re-enqueued).
- Update tests, docs, and changelog entries to match the new recovery semantics and the Jira de-dup removal.
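The grouped allowlist from the change list above could look roughly like the sketch below. The group names and task names are taken from the PR description; the `group_enabled` parameter is hypothetical (the real helper presumably reads Django settings), and the body is an assumption about how "union only the enabled groups" might be written, not the code in this PR.

```python
from typing import Mapping

# Task names per group, copied from the PR description.
RECOVERY_TASK_GROUPS: dict[str, frozenset[str]] = {
    "summaries": frozenset({
        "scan-summary",
        "scan-compliance-overviews",
        "scan-provider-compliance-scores",
        "scan-daily-severity",
        "scan-finding-group-summaries",
        "scan-reset-ephemeral-resources",
    }),
    "deletions": frozenset({"provider-deletion", "tenant-deletion"}),
}


def reenqueueable_tasks(group_enabled: Mapping[str, bool]) -> frozenset[str]:
    """Union the task names of every group whose flag is on.

    `group_enabled` is a hypothetical stand-in for the per-group settings.
    A group missing from the mapping counts as enabled, matching the
    documented per-group default of true.
    """
    enabled: frozenset[str] = frozenset()
    for group, tasks in RECOVERY_TASK_GROUPS.items():
        if group_enabled.get(group, True):
            enabled |= tasks
    return enabled
```

With this shape, `reenqueueable_tasks({"summaries": False})` would leave only the two deletion tasks, and `integration-jira` never appears because it belongs to no group.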
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| api/src/backend/tasks/tests/test_orphan_recovery.py | Adds coverage for master/per-group recovery flags and updates Jira recovery expectations. |
| api/src/backend/tasks/tests/test_integrations.py | Removes Jira dispatch-based idempotency assertions and updates expected return shape. |
| api/src/backend/tasks/tests/test_deletion.py | Removes provider-deletion cleanup test for the deleted Jira dispatch table. |
| api/src/backend/tasks/jobs/orphan_recovery.py | Introduces grouped re-enqueue allowlist + master flag gate in reconcile_orphans(). |
| api/src/backend/tasks/jobs/integrations.py | Removes Jira dispatch reservation logic and simplifies Jira send loop/return payload. |
| api/src/backend/tasks/jobs/deletion.py | Drops Jira dispatch cleanup step from provider deletion workflow. |
| api/src/backend/config/django/base.py | Adds TASK_RECOVERY_* settings sourced from DJANGO_TASK_RECOVERY_* env vars. |
| api/src/backend/api/models.py | Deletes the JiraIssueDispatch model. |
| api/src/backend/api/migrations/0096_jiraissuedispatch.py | Removes the migration that created the Jira dispatch table. |
| api/docs/orphan-task-recovery.md | Updates operational docs for recovery flags and removes Jira idempotency claims. |
| api/CHANGELOG.md | Removes the Jira idempotency entry tied to the now-removed dispatch table. |
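The per-task outcome that the PR description spells out (skipped, marked terminal, or re-enqueued) can be summarized as a small decision function. This is a sketch under stated assumptions: `Action`, `SKIPPED_TASKS`, and `decide` are hypothetical names, and the real `reconcile_orphans()` may be structured quite differently. Only the skip set and the three outcomes come from the description.

```python
from enum import Enum


class Action(Enum):
    SKIP = "skip"                    # not detected at all
    MARK_TERMINAL = "mark-terminal"  # stuck "in progress" state cleared, not re-run
    REENQUEUE = "re-enqueue"         # task is run again


# Tasks the PR description lists as skipped entirely.
SKIPPED_TASKS = frozenset({
    "scan-perform",
    "scan-perform-scheduled",
    "attack-paths-scan-perform",
    "attack-paths-cleanup-stale-scans",
    "reconcile-orphan-tasks",
})


def decide(task_name: str, master_enabled: bool, reenqueueable: frozenset) -> Action:
    """Classify one orphaned task (hypothetical helper)."""
    # Master flag off: the whole task-recovery sweep is skipped.
    if not master_enabled or task_name in SKIPPED_TASKS:
        return Action.SKIP
    # Only tasks in an enabled group are run again.
    if task_name in reenqueueable:
        return Action.REENQUEUE
    # Everything else is detected and marked terminal only.
    return Action.MARK_TERMINAL
```

For example, with only `scan-summary` re-enqueueable, `integration-jira` would be marked terminal and `scan-perform` would be skipped.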
…orphan-task recovery
…livering scan-perform/integration-jira on crash
Context
This builds on the orphan-task recovery added in #11416, which re-enqueues background tasks whose worker died mid-run (deploy, OOM, eviction). That work shipped without an operational kill switch or any per-area control, relied on a dedicated `jira_issue_dispatches` table for Jira de-duplication, and made scan re-runs idempotent. This PR makes recovery toggleable per task group (opt-in), drops the Jira idempotency table, and removes the scan re-run idempotency so scans are no longer auto-recovered. The whole #11416 feature is still unreleased, so removing the migrations does not affect any released schema.
Description
Feature flags for orphan-task recovery. Recovery is gated by Django settings (environment variables). The master switch is OFF by default, so recovery is opt-in; the per-group flags default to enabled, so once the master is on every group recovers unless explicitly turned off.
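The default semantics described above (master off, groups on) can be sketched with plain environment parsing. This is an illustration only: `env_bool` and `load_recovery_flags` are hypothetical helpers, the accepted truthy strings are an assumption, and the setting names are inferred from the `TASK_RECOVERY_*` / `DJANGO_TASK_RECOVERY_*` naming mentioned in the review summary. The project's settings module may well use a different env helper.

```python
import os
from typing import Mapping

# Strings treated as "true"; this set is an assumption, not the project's rule.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_bool(name: str, default: bool, environ: Mapping[str, str] = os.environ) -> bool:
    """Read a boolean flag, falling back to `default` when the variable is unset."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


def load_recovery_flags(environ: Mapping[str, str] = os.environ) -> dict[str, bool]:
    """Resolve the three recovery flags with the defaults from the PR description."""
    return {
        # Master switch: off by default, so recovery is opt-in.
        "TASK_RECOVERY_ENABLED": env_bool("DJANGO_TASK_RECOVERY_ENABLED", False, environ),
        # Per-group flags: on by default, only consulted once the master is on.
        "TASK_RECOVERY_SUMMARIES_ENABLED": env_bool(
            "DJANGO_TASK_RECOVERY_SUMMARIES_ENABLED", True, environ
        ),
        "TASK_RECOVERY_DELETIONS_ENABLED": env_bool(
            "DJANGO_TASK_RECOVERY_DELETIONS_ENABLED", True, environ
        ),
    }
```

Passing the environment as a parameter keeps the sketch testable without touching the real process environment.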
| Environment variable | Default | Tasks in the group |
|---|---|---|
| `DJANGO_TASK_RECOVERY_ENABLED` | `false` | (master switch) |
| `DJANGO_TASK_RECOVERY_SUMMARIES_ENABLED` | `true` | `scan-summary`, `scan-compliance-overviews`, `scan-provider-compliance-scores`, `scan-daily-severity`, `scan-finding-group-summaries`, `scan-reset-ephemeral-resources` |
| `DJANGO_TASK_RECOVERY_DELETIONS_ENABLED` | `true` | `provider-deletion`, `tenant-deletion` |
The flat reenqueueable allowlist is replaced by `RECOVERY_TASK_GROUPS` (summaries, deletions) plus a `reenqueueable_tasks()` helper that unions only the enabled groups. A task in a disabled group is still detected and marked terminal (clearing the stuck "in progress" state), but it is not re-enqueued. With the master flag off, the task-recovery sweep is skipped entirely; the attack-paths stale cleanup, a separate concern, keeps running.
Scans excluded from recovery. The scan re-run idempotency added in #11416 is removed (the pre-run `_clear_scan_rerun_state` delete and the compliance summary/requirement re-deletes), `Scan.recovery_count` and its migration are dropped, and `scan-perform`/`scan-perform-scheduled` are moved into the watchdog's skip set. An orphaned scan is now left untouched (not detected, marked, or re-enqueued), reverting scans to their pre-#11416 behavior, because re-running a scan is not safe to do automatically.
Remove the Jira idempotency dispatch table. The `JiraIssueDispatch` model and its `0096_jiraissuedispatch` migration are removed, `send_findings_to_jira` is reverted to its pre-#11416 form, and `integration-jira` is dropped from the reenqueueable allowlist because, without the dedup table, re-running it would create duplicate Jira issues. The dispatch cleanup step is also removed from provider deletion.
Tasks never re-enqueued. Only the two groups above are ever re-enqueued. Every other task is detected and marked terminal (so it stops showing as "in progress"), but never re-run, for one of three reasons:
- `integration-jira` (duplicate Jira issues), `integration-s3` (upload rebuilt from worker-local files that do not survive the crash), `integration-security-hub` (pushes findings to AWS), `scan-report`, `scan-compliance-reports` (generate/compress/upload output files from worker-local tmp).
- `integration-check`, `integration-connection-check`, `provider-connection-check`, `lighthouse-connection-check`, `lighthouse-provider-connection-check`, `lighthouse-provider-models-refresh`.
- `backfill-compliance-summaries`, `backfill-daily-severity-summaries`, `backfill-finding-group-summaries`, `backfill-provider-compliance-scores`, `backfill-scan-resource-summaries`, `scan-attack-surface-overviews`, `scan-category-summaries`, `scan-resource-group-summaries`, `reaggregate-all-finding-group-summaries`, `findings-mute-historical`.
Some tasks are skipped entirely (not even detected): `scan-perform` and `scan-perform-scheduled` (not auto-recovered), `attack-paths-scan-perform` (handled by its own stale cleanup, which drops the temporary Neo4j database), and `attack-paths-cleanup-stale-scans` and `reconcile-orphan-tasks` (they re-run on their own schedule).
Migration cleanup
This PR deletes two unreleased migrations: `0094_scan_recovery_count` (added `Scan.recovery_count`) and `0096_jiraissuedispatch` (created `jira_issue_dispatches`). Neither shipped in a release, so no released schema is affected. If an environment already applied either one while tracking `master`, the column/table and their migration records are left behind on pull and must be dropped manually.
Steps to review
- `DJANGO_TASK_RECOVERY_ENABLED` defaults to `false`, so the sweep does nothing until you set it to `true`.
- Set a per-group flag to `false` (for example `DJANGO_TASK_RECOVERY_SUMMARIES_ENABLED=false`) to exclude that group; its orphaned tasks are then marked terminal instead of re-enqueued.
- An orphaned `scan-perform`/`scan-perform-scheduled` task is ignored (not detected, marked, or re-enqueued).
- `python manage.py makemigrations --check` reports no changes.
- Run `pytest tasks/tests/test_orphan_recovery.py` (master/per-group flags and the scan skip) and `pytest tasks/tests/test_integrations.py -k Jira` (Jira send reverted).
Checklist
API
License
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.