feat: add Jira ticket-context support#2420
Conversation
The review tool's ticket-analysis flow (require_ticket_analysis_review) only fetched GitHub issues and linked Azure Boards work items. Teams that track work in Jira got no ticket context. This adds provider-agnostic Jira ticket lookup. The lookup uses get_user_description() and get_pr_branch() plus the PR title, which every git provider implements, so it works on GitHub, GitLab, Bitbucket, and Azure DevOps. It is a no-op when [jira] is not configured. - find_jira_tickets(): extract Jira keys from the PR title, description and branch name. Matching is case-insensitive and keys are normalized to upper case, so lowercased branch names (e.g. bugfix/abc-123-x) are detected. First-seen order is preserved so the MAX_TICKETS cap is deterministic. - _get_jira_client() / extract_jira_tickets(): fetch each key via the atlassian-python-api Jira client (already a dependency, used by the Bitbucket providers). Supports Jira Cloud (email + token) and Server/Data Center (PAT). The REST API version is pinned to v2 (JIRA_API_VERSION): v2 returns description and custom fields as plain strings, while v3 returns ADF JSON dicts that would need separate parsing. - add_jira_tickets(): provider-agnostic step in extract_tickets() that appends Jira tickets to tickets_content, de-duplicated by URL. - jira_requirements_field maps an instance-specific custom field (e.g. customfield_10127) to the ticket "requirements" section. Both the body and the requirements field are truncated to MAX_TICKET_CHARACTERS. - Add a [jira] config section and secrets-template entry using the key names from the existing docs. Credentials are read from settings/env, not committed. - Add unit tests for key extraction, ticket fetching (auth modes, capping, skip-on-error, truncation), client construction, and the provider-agnostic append. An autouse fixture restores the JIRA.* settings between tests. - Document Jira support in the fetching-ticket-context docs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Review Summary by QodoAdd Jira ticket-context support to ticket analysis flow
WalkthroughsDescription• Add Jira ticket-context support to PR analysis flow • Extract Jira keys from PR title, description, branch name • Fetch ticket details via Jira REST API v2 client • Support Cloud (email + token) and Server/Data Center (PAT) auth • Map custom fields to ticket requirements section • Provider-agnostic implementation works across all git providers • Comprehensive unit tests for key extraction and ticket fetching Diagramflowchart LR
PR["PR Title/Description/Branch"] -->|find_jira_tickets| Keys["Jira Keys<br/>ABC-123"]
Keys -->|_get_jira_client| Client["Jira Client<br/>Cloud/Server"]
Client -->|extract_jira_tickets| Tickets["Ticket Objects<br/>title, body, requirements"]
Tickets -->|add_jira_tickets| Content["tickets_content<br/>de-duplicated"]
Content -->|extract_tickets| Review["Ticket Analysis<br/>Review"]
File Changes1. pr_agent/tools/ticket_pr_compliance_check.py
|
Code Review by Qodo
1. Nondeterministic github_tickets truncation
|
| try: | ||
| if api_email: | ||
| return Jira(url=base_url.rstrip("/"), username=api_email, password=api_token, api_version=JIRA_API_VERSION) | ||
| # No email/username: treat the token as a Server/Data Center PAT. | ||
| return Jira(url=base_url.rstrip("/"), token=api_token, api_version=JIRA_API_VERSION) | ||
| except Exception as e: | ||
| get_logger().error(f"Failed to initialize Jira client: {e}", | ||
| artifact={"traceback": traceback.format_exc()}) | ||
| return None |
There was a problem hiding this comment.
1. _get_jira_client() catches broad exception 📘 Rule violation ☼ Reliability
New Jira provider-initialization and ticket-fetch logic uses broad except Exception as e and continues/returns None, which can silently mask unexpected failures and make misconfiguration harder to detect. Compliance requires narrow, explicit exception handling (or immediate re-raise with preserved context) in parsing/credential/provider-init paths.
Agent Prompt
## Issue description
Jira client initialization and issue fetching catch broad `Exception` and then return `None`/`continue`, which can hide unexpected failures.
## Issue Context
This code is in provider-initialization / credential-handling paths, where compliance requires narrow exception types or immediate re-raise while preserving context.
## Fix Focus Areas
- pr_agent/tools/ticket_pr_compliance_check.py[70-78]
- pr_agent/tools/ticket_pr_compliance_check.py[103-107]
- pr_agent/tools/ticket_pr_compliance_check.py[168-171]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
There was a problem hiding this comment.
Expanding on why I am not narrowing the exception handling here.
_get_jira_client()catches broadException📘 Rule violation ☼ Reliability
New Jira provider-initialization and ticket-fetch logic uses broadexcept Exception as eand continues/returnsNone... Compliance requires narrow, explicit exception handling (or immediate re-raise with preserved context) in parsing/credential/provider-init paths.
Two reasons, one about consistency and one about what the library actually raises.
1. Broad except Exception is the established convention in this exact file. On main, before this PR, ticket_pr_compliance_check.py already has 7 broad except Exception as e: handlers (lines 62, 140, 167, 170, 178, 212, 219), and the same "log and continue / return None" shape is already the idiom in extract_tickets():
- GitHub path (main L138-143):
try: issue_main = ...get_issue(...) except Exception as e: get_logger().error(...); continue - Azure DevOps path (main L212-217):
except Exception as e: get_logger().error("Error processing Azure DevOps ticket: ...") - top-level wrapper (main L219):
except Exception as e: get_logger().error("Error extracting tickets ...")
The new Jira handlers deliberately match that shape. Narrowing only the new code would make this file inconsistent with its own neighbours (and with ~312 except Exception across ~50 files under pr_agent/).
2. The library does not give a clean typed contract to narrow to on these paths. Jira.issue() in atlassian-python-api (3.41.4) is a bare return self.get(...) with no custom exceptions; it relies on AtlassianRestAPI.raise_for_status(), which raises requests.HTTPError (e.g. HTTPError("Unauthorized (401)")), not the typed ApiError / ApiNotFoundError / ApiPermissionError classes. Those typed classes exist but are only raised by a few specific helpers, not the generic .get() path. So the most precise correct catch would be (requests.HTTPError, requests.RequestException) — and misconfiguration can still surface other ways — which is why the broad catch (with {e} logged) is the safer choice here.
If tighter exception handling is wanted, I would suggest it as a separate, file-wide cleanup so the convention stays uniform, rather than singling out the new Jira code. Happy to do that if you would like.
extract_tickets() capped provider-native tickets to MAX_TICKETS, then add_jira_tickets() appended up to MAX_TICKETS more, so a PR could carry up to 2 * MAX_TICKETS ticket bodies into the review prompt. Treat MAX_TICKETS as the combined per-PR cap: provider-native tickets already in tickets_content count against it, and Jira tickets are appended only until the total reaches MAX_TICKETS, keeping existing tickets first. Skip the Jira lookup entirely (no client init) when the cap is already met. Co-Authored-By: Claude
extract_jira_tickets() built the Jira client before checking whether the text contained any Jira keys. Since add_jira_tickets() runs for every provider, a configured-but-unreferenced Jira would construct a client (and log an init failure if misconfigured) on every keyless PR. Extract keys first and return early when there are none, so the client is only built when there is something to fetch. Co-Authored-By: Claude
|
Code review by qodo was updated up to the latest commit d33bc57 |
d33bc57 to
cb9ee43
Compare
|
Thanks for the review. On finding #5 (nondeterministic GitHub PR-description ticket cap): this is a real bug, but it is in the pre-existing For the other findings: #2 (combined |
_get_jira_client() returned None silently whenever base_url + api_token were not both set. If a user set some [jira] values but not the required pair, the misconfiguration was invisible and Jira lookup just never ran. Log a warning naming the missing keys when Jira is partially configured, while staying silent when nothing is set (Jira simply not in use). Co-Authored-By: Claude
|
Follow-up on finding #1 (broad The Jira code deliberately matches the existing convention in this exact file. On
Our Happy to tighten exception handling here if you would like, but I would suggest doing it as a separate, file-wide cleanup rather than singling out the new code, so the convention stays uniform. Let me know your preference. |
Hoist the base-url normalization into a local so the two Jira(...) calls
fit comfortably within the 120-char line limit, and drop the duplicated
.rstrip("/").
Co-Authored-By: Claude
|
Code review by qodo was updated up to the latest commit 1bc29cd |
| # Provider-agnostic Jira lookup. Tickets are often referenced by key in the PR | ||
| # title, description or branch name rather than via a provider-native link, so | ||
| # this runs for every provider and is a no-op when Jira is not configured. | ||
| add_jira_tickets(git_provider, tickets_content) | ||
|
|
||
| return tickets_content |
There was a problem hiding this comment.
1. Blocking jira i/o in async 🐞 Bug ➹ Performance
extract_tickets() is async but calls add_jira_tickets()/extract_jira_tickets() which perform synchronous Jira HTTP requests (jira_client.issue) inline. This blocks the event loop during review execution and can reduce FastAPI throughput / cause latency spikes under concurrent reviews.
Agent Prompt
### Issue description
`extract_tickets()` runs in an async pipeline but performs blocking Jira network calls via `atlassian-python-api` (e.g., `jira_client.issue(...)`) directly on the event loop.
### Issue Context
This function is awaited as part of the PR review flow, so blocking I/O here can stall handling of other concurrent requests/tasks.
### Fix Focus Areas
- pr_agent/tools/ticket_pr_compliance_check.py[94-157]
- pr_agent/tools/ticket_pr_compliance_check.py[273-385]
- pr_agent/tools/pr_reviewer.py[120-141]
### What to change
- Offload Jira network work from the event loop:
- Option A (minimal): in `extract_tickets()`, call Jira extraction via `await asyncio.to_thread(add_jira_tickets, git_provider, tickets_content)` (or `asyncio.get_running_loop().run_in_executor(...)`).
- Option B (more invasive): make `add_jira_tickets()` / `extract_jira_tickets()` async and use an async HTTP client.
- Ensure the returned/updated `tickets_content` is preserved (if using `to_thread`, use the returned list).
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
There was a problem hiding this comment.
Accurate that jira_client.issue() is synchronous, but I would not address it in this PR — the new Jira call follows the pre-existing pattern of this function rather than introducing a new one.
Verified against main (before this PR):
extract_tickets()is already declaredasync def(ticket_pr_compliance_check.py L108).- It already performs synchronous, blocking network calls inline:
- GitHub path:
git_provider.repo_obj.get_issue(...)(L139, L156) andgit_provider.fetch_sub_issues(...)(L152) — PyGithub, blocking. - Azure path:
git_provider.get_linked_work_items()(L194) — blocking.
- GitHub path:
- Those provider methods are plain
def(synchronous): e.g.def get_linked_work_itemsanddef fetch_sub_issues, notasync def.
So the event loop is already blocked on synchronous ticket I/O for the existing GitHub and Azure providers; the Jira call added here is consistent with that. Wrapping only the Jira call in asyncio.to_thread would not actually fix the concern (the GitHub/Azure calls in the same function would still block) and would make the new code stylistically inconsistent with its neighbours.
A proper fix — offloading all the ticket I/O in extract_tickets() (GitHub, Azure, and Jira together), or making the providers async — is a refactor of pre-existing code that is out of scope for adding Jira support. Happy to open a separate PR for that if it is wanted, but I would prefer not to single out the Jira call here.
|
On finding:
Worth clarifying that this PR did not introduce these limits — it codified pre-existing ones as named constants. On
This PR lifts those scattered literals into module-level On making them configurable: I agree these two are legitimate tunables, and exposing them is already raised as open question #3 (
|
|
On finding:
I do not think this is a bug — token-without-email is a documented, first-class auth mode, not a Cloud misconfiguration. Per Atlassian's official docs on Using Personal Access Tokens (Jira Data Center / Server):
So Adding a "Cloud requires email" guard would actually break that documented PAT setup. There is also no reliable way to distinguish "Cloud user who forgot their email" from "Server PAT user" up front — both present as On the "late, per-ticket auth failure" concern: a genuinely bad credential is not silent — the per-ticket fetch is wrapped and logs |
|
Follow-up on finding #6, to be precise about it. To be fair to the finding: Atlassian's Cloud basic-auth docs do require The problem is the proposed guard can't be implemented without breaking valid setups. Server / Data Center users authenticating with a PAT do not supply an email at all —
The credentials carry no signal to tell those two apart, so any "Cloud requires email" guard would false-positive against every valid PAT user. That is why I would not add it. On the "late, per-ticket auth failure" concern: the failure is not silent. |
The Azure DevOps branch of extract_tickets() truncated the work-item body to MAX_TICKET_CHARACTERS but passed acceptance_criteria into "requirements" unbounded, so a large field could inject an oversized blob into the review prompt. This mirrors the truncation the Jira path in this PR already applies. Cap requirements to MAX_TICKET_CHARACTERS and guard against non-string values, matching the body handling. Co-Authored-By: Claude
|
Code review by qodo was updated up to the latest commit 8d6bac2 |
|
On finding:
Fixed in 8d6bac2. One clarification: the Azure |
Building the Jira client from a free-form jira_base_url let repo-controlled configuration (e.g. pyproject.toml [tool.pr-agent]) redirect the authenticated request, and the API token with it, to an arbitrary host. PR-Agent merges repo config into the same settings object that supplies the base URL, so an untrusted PR could point Jira at an attacker server and exfiltrate the token. Replace jira_base_url with jira_site (the "<site>" in https://<site>.atlassian.net) and build the base URL from it. jira_site is validated as a single DNS label, so it cannot contain the characters (. / : @ # ? etc.) needed to escape *.atlassian.net. The destination is therefore fixed by construction regardless of what untrusted config supplies. This scopes the feature to Jira Cloud. Jira Server / Data Center, which needs a free-form host, is not supported for now; it can return once base-URL trust is handled (e.g. a secrets-only source). Cloud authenticates with email + API token, so the partial-config warning now requires site + email + token. - Add JIRA_SITE_PATTERN + _jira_cloud_base_url(); validate and build the URL. - Cloud-only auth in _get_jira_client() (drop the Server/DC PAT branch). - Update [jira] config + secrets template: jira_site instead of jira_base_url. - Rewrite the docs Jira section Cloud-only, noting Server/DC is not yet supported and why. - Add TestJiraSiteInjection covering breakout attempts (query/fragment/path/port/ userinfo/encoded-dot/whitespace), asserting they are rejected and no client is built. Co-Authored-By: Claude
|
Code review by qodo was updated up to the latest commit 9a39046 |
| if len(github_tickets) > MAX_TICKETS: | ||
| get_logger().info(f"Too many tickets found in PR description: {len(github_tickets)}") | ||
| # Limit the number of tickets to 3 | ||
| github_tickets = set(list(github_tickets)[:3]) | ||
| # Limit the number of tickets | ||
| github_tickets = set(list(github_tickets)[:MAX_TICKETS]) |
There was a problem hiding this comment.
1. Nondeterministic github_tickets truncation 📘 Rule violation ☼ Reliability
When too many GitHub issue links are found, the code enforces the MAX_TICKETS cap by converting an unordered set to a list and slicing, making which tickets are kept nondeterministic across runs. This can lead to inconsistent related ticket context (and thus inconsistent compliance analysis inputs) and flaky behavior dependent on hash iteration order.
Agent Prompt
## Issue description
`extract_ticket_links_from_pr_description()` gathers GitHub issue URLs into an unordered `set` and then truncates by converting that set to a list and slicing (e.g., `set(list(github_tickets)[:MAX_TICKETS])` / `list(github_tickets)[:MAX_TICKETS]`). Because set iteration order is not stable, the specific subset of tickets that survives the `MAX_TICKETS` cap can vary between runs, leading to inconsistent downstream `related_tickets` context and ticket compliance analysis inputs.
## Issue Context
Compliance requires deterministic ordering when truncating collections. In this PR, Jira key extraction intentionally preserves first-seen order for determinism, but GitHub issue extraction still truncates arbitrarily; additionally, `extract_tickets()` merges ticket lists and slices again based on list order, so nondeterminism in GitHub ticket ordering can affect which tickets end up in the final context.
## Fix Focus Areas
- pr_agent/tools/ticket_pr_compliance_check.py[233-261]
- pr_agent/tools/ticket_pr_compliance_check.py[263-300]
- pr_agent/tools/ticket_pr_compliance_check.py[307-326]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
There was a problem hiding this comment.
On finding:
- Nondeterministic
github_ticketstruncation 📘 Rule violation ☼ Reliability
This is a real bug, but it is pre-existing in extract_ticket_links_from_pr_description() and unrelated to Jira. On main, that function collects GitHub issue URLs into a set and caps via set(list(github_tickets)[:3]), which has no defined order. This PR only renamed the literal cap ([:3] → [:MAX_TICKETS]) when introducing the shared constant; it did not introduce, and does not change, the set-based nondeterminism.
To keep this PR scoped to Jira support, I have split the fix out separately:
- Issue: Nondeterministic ticket selection when a PR description references more than 3 issues #2421
- Fix PR: fix: make GitHub PR-description ticket selection deterministic #2422 (tracks first-seen order while de-duplicating, then slices the ordered list; includes a regression test verified to fail against the set-based code on every hash seed)
Happy to rebase this PR on top of that once it merges if you would prefer the constant rename not to touch that line here in the meantime.
Reframe the "why Cloud only" note around the design (URL derived from a validated site name, always an Atlassian host) rather than detailing a configuration-injection scenario, which overstated this feature's role. Co-Authored-By: Claude
|
Code review by qodo was updated up to the latest commit 926e33f |
|
@naorpeled Here is the initial implementation of Jira (Cloud) integration. I would focus first on the PR's description, particularly the open questions, because they affect what else, if anything, belongs in this PR's scope. |
|
@dellch thanks for this! |
|
|
||
| @pytest.mark.parametrize("bad_site", INJECTION_ATTEMPTS) | ||
| def test_injection_attempt_rejected(self, bad_site): | ||
| import pr_agent.tools.ticket_pr_compliance_check as m |
| def test_valid_site_only_ever_targets_atlassian_net(self): | ||
| """Any accepted site name resolves to a host under .atlassian.net.""" | ||
| from urllib.parse import urlparse | ||
| import pr_agent.tools.ticket_pr_compliance_check as m |
Summary
Adds Jira ticket-context support to the review tool's ticket-analysis flow (
require_ticket_analysis_review). Previously only GitHub issues and Azure Boards work items were fetched; teams tracking work in Jira got no ticket context.The lookup is provider-agnostic — it only uses
get_user_description()andget_pr_branch()(plus the PR title), which every git provider implements — so Jira context works on GitHub, GitLab, Bitbucket, and Azure DevOps. It is a no-op when[jira]is not configured.Jira Cloud only. The base URL is built from a validated site name (
jira_site→https://<site>.atlassian.net) rather than taken as a free-form URL. This keeps the design simple — config is a site name, not a URL — and is safe by construction: the destination is always*.atlassian.net, so configuration can't point the authenticated request at an arbitrary host. Jira Server / Data Center needs a free-form host, so it isn't supported here yet; it can be added once base-URL handling for the self-hosted case is settled. Happy to revisit the shape if you'd prefer to accept a full URL for Cloud too.This follows the discussion in #2162, where @naorpeled gave the go-ahead to add Jira support as a focused first PR, deferring
/similar_issueand the broader issue-provider abstraction to later work.What it does
find_jira_tickets(): extract Jira keys from the PR title, description and branch name (case-insensitive, normalized to upper case sobugfix/abc-123-xis detected). First-seen order is preserved so theMAX_TICKETScap is deterministic._jira_cloud_base_url(): build the base URL fromjira_site, validated againstJIRA_SITE_PATTERN(a single DNS label) so it cannot contain the characters (. / : @ # ?etc.) needed to escape*.atlassian.net. ReturnsNone(and warns) on a missing/invalid site._get_jira_client()/extract_jira_tickets(): fetch each key via theatlassian-python-apiJira client (already a dependency; used by the Bitbucket providers). Jira Cloud auth (email + API token). The REST API version is pinned to v2 (JIRA_API_VERSION): v2 returns description and custom fields as plain strings, while v3 returns ADF JSON dicts that would need separate parsing.add_jira_tickets(): provider-agnostic step inextract_tickets()that appends Jira tickets totickets_content, de-duplicated by URL, respecting the overallMAX_TICKETScap.jira_requirements_field: maps an instance-specific custom field (e.g.customfield_10127) to the ticket "requirements" section (acceptance criteria). Both the body and the requirements field are truncated toMAX_TICKET_CHARACTERS.[jira]config section (jira_site,jira_api_email,jira_api_token,jira_requirements_field) and a secrets-template entry. Credentials are read from settings/env, not committed.Verified end-to-end against a live Jira Cloud instance.
Open questions
I am leaving these for maintainer input rather than choosing the API shape in this PR:
Config key naming. For
jira_api_token/jira_api_emailthis PR keeps the names the docs already publish. Notejira_base_urlis replaced byjira_site(the site name, not a URL — see the Jira Cloud note above), so the documented contract already changes here. Because the keys sit under the[jira]table, settings resolve toJIRA.JIRA_API_TOKENand the env override isjira__jira_api_token— the repeatedjira_is redundant. Options: keep thejira_-prefixed names for continuity with the existing docs, or drop the prefix (api_token,site, etc., like[bitbucket_server]) for clarity. Happy to go either way.A dedicated issue-provider config section. Rather than a
[jira]section, would you prefer something like[related_issues]withissue_provider = "jira"? That gives a natural home for future, provider-neutral options (see below) instead of accreting keys under[jira]. This would change the configuration shape, so worth deciding before it ships.Relatedly — should more than one issue provider be supported at once? Today
extract_tickets()is additive: it runs the provider-native branch (GitHub Issues / Azure Work Items) and thenadd_jira_tickets(), so a PR referencing both a native issue and a Jira key fetches from both (now bounded by the sharedMAX_TICKETScap). If that is not desired, anissue_providersarray (e.g.["jira"]for one system, or["jira", "github"]for both in priority order) would let a repo express "one tracker only" or "both, in this order," defaulting to today's additive behavior when unset. This is an API-shape decision, hence flagged here rather than implemented.Scope of this PR vs. follow-ups. This PR is intentionally minimal (fetch + map + tests). Should any of the following be in this PR, or separate follow-ups?
jira_project_keys— Jira-specific: restrict key matching to configured project prefixes (e.g.DVT,ABC), reducing false positives likeutf-8/sha-1. Doesn't apply to GitHub (numeric#123refs) or Azure DevOps (native linked work items, no text extraction), so it's named to make the binding explicit. (This differs from Jira issue provider support (minimal) #2162, wherevalid_project_keysfilters JQL search results; here it would filter keys parsed from PR text before fetch.)max_tickets_to_fetchand similar tunables (currently the fixedMAX_TICKETS = 3, matching the existing GitHub path).Issue/IssueProviderabstraction from Jira issue provider support (minimal) #2162 — left out here as it may be premature abstraction with only one consumer.False-positive handling. Key extraction is heuristic; non-existent keys 404 and are skipped, and candidates are capped at 3 — the same trade-off the existing GitHub
#123extraction already makes. Is the current fetch-and-skip behavior acceptable for the first PR, or should strict project-key filtering (option 3'sjira_project_keys) be required before merge?Out of scope
/similar_issue, embedding client, vector-DB integration, and the issue-provider abstraction layer (all part of #2162) are intentionally excluded.Test plan
pytest tests/unittest/test_jira_ticket_extraction.py(49 tests), includingTestJiraSiteInjectioncovering site-name breakout attempts (query / fragment / path / port / userinfo / encoded-dot / whitespace), asserting they are rejected and no client is builttests/unittest/test_extract_issue_from_branch.pyCo-Authored-By: Claude