Summary
Stabilizes iOS runner sessions around navigation, stale runners, and broken XCTest accessibility trees.
The PR now does five concrete things:
- Resolved iOS/macOS element activation uses coordinate-first taps.
- Ready runner sessions are checked with a bounded `uptime` preflight instead of a legacy recent-success cache.
- AX-fatal iOS runner responses immediately invalidate the runner session, so the next command starts from a fresh runner.
- AX-unavailable snapshot guidance clearly routes agents through a plain screenshot plus coordinate recovery before retrying AX on the next screen.
- The provider-backed iOS lifecycle scenario now covers synthesized coordinate taps from `press @ref`.
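To make coordinate-first activation concrete, here is a minimal TypeScript sketch of the resolution step. An `@ref` from an earlier snapshot is mapped to the center of its recorded frame, and that point is what gets tapped. The `SnapshotNode`/`Frame` shapes and `resolveTapPoint` are hypothetical illustrations, not the PR's actual types or API.

```typescript
// Hypothetical snapshot shapes, for illustration only.
interface Frame {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface SnapshotNode {
  ref: string; // e.g. "@e2", as listed by an earlier snapshot
  frame: Frame;
}

interface TapPoint {
  x: number;
  y: number;
}

// Map an @ref to the center of its recorded frame so the tap can be sent as a
// coordinate event instead of re-resolving the element through AX queries.
function resolveTapPoint(nodes: SnapshotNode[], ref: string): TapPoint {
  const node = nodes.find((n) => n.ref === ref);
  if (!node) {
    throw new Error(`Unknown ref ${ref}; take a fresh snapshot and retry.`);
  }
  const { x, y, width, height } = node.frame;
  if (width <= 0 || height <= 0) {
    throw new Error(`Ref ${ref} has an empty frame; cannot tap by coordinate.`);
  }
  return { x: x + width / 2, y: y + height / 2 };
}

const nodes: SnapshotNode[] = [
  { ref: "@e1", frame: { x: 0, y: 0, width: 390, height: 44 } },
  { ref: "@e2", frame: { x: 10, y: 20, width: 100, height: 40 } },
];

console.log(resolveTapPoint(nodes, "@e2")); // { x: 60, y: 40 }
```

This sketch does not model whether or when the real runner falls back to element-based activation.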
For broken-AX apps such as Bluesky, sparse snapshot fallbacks are no longer cached as healthy runner state. Mutating commands that record XCTest failures now return `XCTEST_RECORDED_FAILURE` instead of reporting success, and iOS coordinate-only taps use the lower-level synthesized event path, so screenshot plus x/y recovery works even when the app's AX queries are poisoned. The review follow-up also makes the collapsed-tab fallback use the AX-aware query invalidation path and reports synthesized-tap fallback diagnostics when the runner falls back to XCTest coordinates.
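The following is a minimal sketch of the two result-handling rules above. It assumes a hypothetical `RunnerReply` shape with `recordedFailures` and `runnerFatal` fields, and the PR's actual payload may differ. A mutating command that otherwise succeeded is downgraded to `XCTEST_RECORDED_FAILURE`, and a `runnerFatal` snapshot is never cached as healthy state.

```typescript
// Hypothetical runner reply shape; field names are assumptions.
interface RunnerReply {
  ok: boolean;
  code?: string;
  message?: string;
  recordedFailures?: string[]; // XCTest failures recorded while the command ran
  runnerFatal?: boolean; // set when AX was unavailable and the payload is a sparse fallback
}

type CommandResult =
  | { ok: true }
  | { ok: false; code: string; message: string };

// A mutating command only reports success if XCTest recorded no failures.
function finalizeMutatingReply(reply: RunnerReply): CommandResult {
  if (!reply.ok) {
    return {
      ok: false,
      code: reply.code ?? "RUNNER_ERROR", // placeholder code for this sketch
      message: reply.message ?? "Runner command failed.",
    };
  }
  const failures = reply.recordedFailures ?? [];
  if (failures.length > 0) {
    return {
      ok: false,
      code: "XCTEST_RECORDED_FAILURE",
      message: `XCTest recorded ${failures.length} failure(s): ${failures[0]}`,
    };
  }
  return { ok: true };
}

// Sparse fallback snapshots are returned to the caller but never cached as healthy state.
function shouldCacheSnapshot(reply: RunnerReply): boolean {
  return reply.ok && reply.runnerFatal !== true;
}
```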
Adds ADR 0005 plus internal docs and `AGENTS.md` guidance for resolved element activation, runner lifecycle behavior, AX-unavailable target invalidation, concise recoverable error messages, and checking daemon logs for unexpectedly slow interactions. The PR currently touches 31 files, focused on iOS runner lifecycle/transport, Swift runner interaction/snapshot handling, targeted tests, command guidance, and internal docs.
Validation
CI failure follow-up passed locally: `pnpm test:integration:provider -- test/integration/provider-scenarios/ios-lifecycle.test.ts` first reproduced the failed provider scenario, then passed 19 files / 37 tests after the transcript was updated. The same fix also unblocks coverage: a sandbox-free `pnpm test:coverage` run passed 249 files / 2148 tests with 83.66% statements and 85.72% lines, above the configured thresholds. An earlier sandboxed coverage attempt failed on local permissions (`listen EPERM`), so the sandbox-free run is the comparable result.
Focused regression suites passed: `pnpm exec vitest run src/platforms/ios/__tests__/runner-session.test.ts src/platforms/ios/__tests__/runner-client.test.ts` passed 77 tests, `pnpm exec vitest run src/platforms/ios/__tests__/index.test.ts --testNamePattern "iosRunnerOverrides"` passed 9 relevant tests with 70 skipped, and `pnpm exec vitest run src/utils/__tests__/output.test.ts src/utils/__tests__/args.test.ts` passed 147 tests. Swift review-follow-up coverage was added in the runner source for synthesized tap fallback diagnostics, `XCTEST_RECORDED_FAILURE` response wrapping, and sparse `runnerFatal` AX snapshot payloads.
Build and static checks passed: `pnpm build`, `pnpm check:quick`, `pnpm build:xcuitest`, `pnpm format`, `pnpm check:fallow --base origin/main`, and `git diff --check`. `pnpm format` reformatted unrelated existing files; that formatter churn was removed from the final diff. The SkillGym case `ios-ax-unavailable-screenshot-coordinate-recovery` was added, but `pnpm test:skillgym:case ios-ax-unavailable-screenshot-coordinate-recovery` could not run in this sandbox because SkillGym requires external Codex/Claude runners and network access.
Dogfooded on React Navigation (`org.reactnavigation.playground`): repeated snapshots stayed fast after the earlier runner lifecycle fix, and the original long outlier was confirmed to be stale-runner recovery rather than snapshot traversal.
Dogfooded twice on Bluesky (`xyz.blueskyweb.app`) from `/Users/thymikee/Developer/bluesky` with Metro on port 8081. The final run reproduced the known Home-screen AX failure (`kAXErrorIllegalArgument`), returned a sparse 1-node snapshot in 4.67 s, and `daemon.log` showed `AGENT_DEVICE_RUNNER_TARGET_CACHE_INVALIDATE reason=ax_snapshot_unavailable`, `ios_runner_session_invalidated`, and `** BUILD INTERRUPTED **`. A follow-up coordinate tap on Search used a fresh runner and returned in 4.49 s. The screenshot `/private/tmp/agent-device-bluesky-final/after-search.png` showed Explore/Search, and the next snapshot was healthy at 177 total nodes / 43 visible in 0.81 s. The manual session, Metro process tree, stale xcodebuild runner, and daemon metadata were cleaned up.
Updated the PR with an explicit `uptime` fast path in the Swift runner. The readiness probe is now answered directly by the HTTP listener, before command journaling, the serial command queue, app activation, and main-thread XCTest dispatch. It still uses the existing command protocol, so we avoid adding a second health endpoint while removing the XCTest/app costs from the preflight.

Dogfood on React Navigation Example after rebuilding the local runner cache:
- startup read-only snapshot skipped preflight as expected
- ready-session preflights measured 2 ms, 3 ms, 4 ms, and 3 ms
- successful Navigate to Details tap: preflight 4 ms, wall 0.71 s
- post-tap snapshot: preflight 3 ms, `snapshot_capture` 120 ms, wall 0.19 s
- daemon log had no `TEST EXECUTE FAILED`, runner restart, or snapshot matching failure lines for this session

Also added a runner journal regression test asserting that `uptime` is not accepted into the command journal, and documented the probe contract in ADR 0005.
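The listener-level fast path can be sketched in TypeScript, although the real implementation lives in the Swift runner. The `RunnerListener` class, the `{ command: ... }` request shape, and the `uptimeMs` field are assumptions for illustration. What the sketch shows is the ordering: the `uptime` probe is answered before anything is journaled or queued.

```typescript
// Hypothetical request shape: the probe reuses the normal command protocol.
interface RunnerRequest {
  command: string;
  [key: string]: unknown;
}

type ListenerReply =
  | { kind: "immediate"; body: { ok: true; uptimeMs: number } }
  | { kind: "queued"; position: number };

class RunnerListener {
  readonly journal: RunnerRequest[] = [];
  private readonly queue: RunnerRequest[] = [];
  private readonly now: () => number;
  private readonly startedAt: number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
    this.startedAt = now();
  }

  handle(rawBody: string): ListenerReply {
    const request = JSON.parse(rawBody) as RunnerRequest;
    // Fast path: answer the readiness probe before journaling, queueing,
    // app activation, or any main-thread XCTest dispatch.
    if (request.command === "uptime") {
      return { kind: "immediate", body: { ok: true, uptimeMs: this.now() - this.startedAt } };
    }
    // Normal path: record the command, then hand it to the serial queue.
    this.journal.push(request);
    this.queue.push(request);
    return { kind: "queued", position: this.queue.length };
  }
}
```

Injecting the clock keeps the sketch deterministic. The same approach lets a journal regression test assert that the probe never reaches the journal.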
Cleanup pass done and pushed. The only additional code change is in runner-session: read-only and mutating commands now share one post-preflight send helper, and the readiness decision derives read-only/probe status internally instead of receiving an options object. I also scanned for stale freshness-cache branches and skipped-preflight recovery metadata and found none remaining.
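The shape of that cleanup can be sketched as follows. The sketch assumes a synchronous stand-in transport and a hypothetical `fresh`/`ready`/`invalidated` session status. The real client is asynchronous and bounds the `uptime` probe with a timeout, which this sketch omits. `needsPreflight`, `sendAfterPreflight`, and the read-only command set are illustrative names, not the PR's code. The skip rule for a just-started runner mirrors the dogfood note that the startup read-only snapshot skipped preflight.

```typescript
type SessionStatus = "fresh" | "ready" | "invalidated";

interface RunnerSession {
  status: SessionStatus;
}

interface Reply {
  ok: boolean;
}

// Synchronous stand-in for the runner client.
type Transport = (command: string) => Reply;

const PROBE_COMMAND = "uptime";
// Illustrative read-only set; the real classification may differ.
const READ_ONLY_COMMANDS = new Set<string>(["snapshot", "screenshot"]);

// Callers pass only the command; probe and read-only status are derived here.
function needsPreflight(session: RunnerSession, command: string): boolean {
  const isProbe = command === PROBE_COMMAND;
  const isReadOnly = READ_ONLY_COMMANDS.has(command);
  if (isProbe) return false; // never probe the probe
  if (session.status === "fresh" && isReadOnly) return false; // just-started runner
  return true;
}

// One post-preflight send helper shared by read-only and mutating commands.
function sendAfterPreflight(session: RunnerSession, command: string, transport: Transport): Reply {
  if (needsPreflight(session, command) && !transport(PROBE_COMMAND).ok) {
    session.status = "invalidated"; // caller should start a fresh runner
    return { ok: false };
  }
  const reply = transport(command);
  if (reply.ok) session.status = "ready";
  return reply;
}
```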
Validation after cleanup: the focused iOS runner Vitest suite, `pnpm check:quick`, `git diff --check`, and `pnpm build:xcuitest` all passed.