fix(opal-server): propagate scope data config to scoped clients#918
Open
Kyzgor wants to merge 1 commit into
In scopes mode, a scoped agent that connects before its scope is created never receives the scope's data-source configuration. On scope create/update the server publishes a policy sync trigger but no data update, and the client fetches its data config only on (re)connect, so the agent loads policy but none of the scope's data and every authorization decision that depends on it fails (permitio#779). The not-yet-synced data route also falls back to the server-global OPAL_DATA_CONFIG_SOURCES, whose bare `policy_data` topic is disjoint from a scoped agent's `{scope_id}:data:{topic}` subscriptions, so the client discards every entry.

Fix, server-side only:
- put_scope publishes the scope's data entries as a DataUpdate (via the existing DataUpdatePublisher) when the data config is new or changed, so already-connected scoped clients receive it.
- Entry topics served and published are namespaced `{scope_id}:data:{topic}` (on a deep copy; the persisted scope keeps the authored form). Bare/default topics get the scope prefix; already-namespaced topics pass through. This also closes a permanent variant where a default-topic entry was discarded on every base-fetch.
- The absent-scope branch of get_scope_data_config returns an empty DataSourceConfig(entries=[]) instead of the server-global default; the external_source_url redirect branch is unchanged.

Adds a 10-test regression suite under packages/opal-server/opal_server/scopes/tests/ (the scopes module had no tests before).

Closes permitio#779
Fixes Issue
Closes #779
Changes proposed
In scopes mode (`OPAL_SCOPES=1`), a scoped agent that connects before its scope is created now receives the scope's data-source configuration.
Two paths led to the reported symptom. On scope create/update the server published a policy sync trigger but no data update, and the client fetches its data config only on (re)connect, so an agent connected before its scope existed loaded policy but none of the scope's data, and every authorization decision that depends on that data failed. Separately, while a scope is not yet synced, `GET /scopes/{scope_id}/data` fell back to the server-global `OPAL_DATA_CONFIG_SOURCES`, whose bare `policy_data` topic is disjoint from a scoped agent's `{scope_id}:data:{topic}` subscriptions, so the client discarded every entry.

The fix is server-side only (no client, schema, or protocol change), in `packages/opal-server/opal_server/scopes/api.py`:

- `PUT /scopes`. After the scope is persisted, `put_scope` publishes its `data.entries` as a `DataUpdate` via the same `DataUpdatePublisher` the data-update endpoint uses, so agents that connected before the scope existed receive it. It fires only when the data config is new or changed (`existing_scope.data != scope_in.data`), mirroring the policy path's notify-on-change.
- Entry topics are namespaced `{scope_id}:data:{topic}` at both the publish and the serve paths, via a small helper (`normalize_data_topics_for_scope`). Bare or default topics get the scope prefix; already-namespaced authored topics pass through unchanged. This also closes a permanent variant of the bug: an entry authored with the schema-default `policy_data` topic was discarded by the scoped agent on every base-fetch, independent of the create race. Normalization runs on a deep copy, so the persisted scope keeps the authored form.
- The `except ScopeNotFoundError` branch of `get_scope_data_config` returns an empty `DataSourceConfig(entries=[])` instead of the server-global default; the `external_source_url` redirect branch is unchanged.
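For readers who want the namespacing and notify-on-change rules in concrete form, here is a minimal sketch. It is not the code in this PR: `Entry` and `Config` are hypothetical stand-ins for OPAL's entry and `DataSourceConfig` models, `namespace_topics` and `data_changed` are illustrative names, and treating "already namespaced" as "starts with `{scope_id}:data:`" is an assumption about how the real helper decides.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import List, Optional

DEFAULT_TOPIC = "policy_data"  # schema-default topic named in the description


@dataclass
class Entry:
    """Hypothetical stand-in for one data-source entry."""
    url: str
    topics: List[str] = field(default_factory=lambda: [DEFAULT_TOPIC])


@dataclass
class Config:
    """Hypothetical stand-in for DataSourceConfig."""
    entries: List[Entry] = field(default_factory=list)


def namespace_topics(scope_id: str, config: Config) -> Config:
    """Return a deep copy whose topics all carry the scope prefix."""
    prefix = f"{scope_id}:data:"
    scoped = deepcopy(config)  # the authored config is left untouched
    for entry in scoped.entries:
        entry.topics = [
            # Assumption: a topic that already starts with the prefix passes through.
            topic if topic.startswith(prefix) else prefix + topic
            for topic in entry.topics
        ]
    return scoped


def data_changed(existing: Optional[Config], incoming: Config) -> bool:
    """True when the data config is new or differs from the stored one."""
    return existing is None or existing != incoming
```

Under these assumptions, a default-topic entry in scope `documents` comes out as `documents:data:policy_data`, an entry already authored as `documents:data:data` is unchanged, and the input object still holds the authored topics afterwards.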
Adds `packages/opal-server/opal_server/scopes/tests/test_scope_data_config_propagation.py` (10 tests); the `scopes/` module had no tests before.
Check List (Check all the applicable boxes)
Screenshots
N/A — server behavior change; before/after is below.
Note to reviewers
Reproduce with a scopes-mode server and a scoped client (`OPAL_SCOPE_ID=documents`, `OPAL_DATA_TOPICS=data`) that connects before the scope exists, then `PUT /scopes` for `documents` with a `data.entries[]` config topic'd `documents:data:data` and a rego-only policy source. Before the fix, the server logs `Requested scope documents not found, returning OPAL_DATA_CONFIG_SOURCES`, serves the bare-`policy_data` default the scoped client discards, and the scope's data never reaches OPA; after, the client receives the namespaced update, writes the entries to OPA, and they persist. The new suite is 10 tests, all green; reverting only this fix makes 7 of the 10 fail, and all 10 pass once it's restored. The existing data-updater, server-to-client integration, and data-update-publisher tests stay green, and `flake8` (the CI selection) is clean on the changed file.
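The topic mismatch behind the "before" behavior can be modeled in a few lines. This is a simplified sketch, not the client's actual filtering code: `scoped_subscriptions` and `accepted_entries` are hypothetical names, entries are plain dicts, and "keep an entry when any of its topics is subscribed" is an assumed matching rule.

```python
from typing import Iterable, List, Set


def scoped_subscriptions(scope_id: str, data_topics: Iterable[str]) -> Set[str]:
    """Topics a scoped client listens on: {scope_id}:data:{topic}."""
    return {f"{scope_id}:data:{topic}" for topic in data_topics}


def accepted_entries(entries: List[dict], subscriptions: Set[str]) -> List[dict]:
    """Keep only entries that share at least one topic with the subscriptions."""
    return [entry for entry in entries if subscriptions.intersection(entry["topics"])]


# Client from the reproduction: OPAL_SCOPE_ID=documents, OPAL_DATA_TOPICS=data
subs = scoped_subscriptions("documents", ["data"])

# Server-global default entry (bare topic) versus a namespaced scope entry.
global_default = [{"url": "https://example.com/global", "topics": ["policy_data"]}]
scope_entry = [{"url": "https://example.com/docs", "topics": ["documents:data:data"]}]

print(accepted_entries(global_default, subs))  # bare topic: nothing is kept
print(accepted_entries(scope_entry, subs))     # namespaced topic: entry is kept
```

In this model the bare `policy_data` entry never intersects the `documents:data:data` subscription, which matches the discard described above.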
On the absent-scope fallback I return HTTP 200 with an empty config rather than 404, because `DataUpdater.get_policy_data_config` returns only on 200 and raises (uncaught at connect) on anything else, so a 404 would break currently deployed clients. Happy to switch to a 404/not-ready contract instead given a coordinated client/server version bump.
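The trade-off can be illustrated with a toy model. None of this is OPAL code: `SCOPES`, `serve_scope_data`, and `client_fetch` are hypothetical, and the client stub only mirrors the "return on 200, raise otherwise" behavior described above.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory scope store: scope_id -> list of data entries.
SCOPES: Dict[str, List[dict]] = {}


def serve_scope_data(scope_id: str) -> Tuple[int, dict]:
    """Server side: an absent scope yields 200 with an empty config, not 404."""
    entries = SCOPES.get(scope_id, [])
    return 200, {"entries": entries}


def client_fetch(status: int, body: dict) -> dict:
    """Client side: return the body on 200 and raise on any other status."""
    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    return body


status, body = serve_scope_data("documents")  # the scope does not exist yet
config = client_fetch(status, body)           # succeeds with an empty config
```

With a 404 in place of the empty 200, `client_fetch` would raise, which in this model stands in for the connect-time failure on deployed clients.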
One known limitation, not introduced here: entries delivered by this push are fetched once and don't begin `periodic_update_interval` polling until the client's next reconnect, since the client schedules polling only in its on-connect base fetch. That still beats today's behavior (nothing propagates) and matches every other pushed `DataUpdate`; I can follow up if you'd like periodic entries to self-schedule on push.

Deliberately out of scope, to keep this to one concern: a policy repo that ships a root `data.json` (the bundle import writes it to OPA's root document and can overwrite delivered data, a pre-existing, OPAL-wide interaction; use a rego-only source to reproduce #779 cleanly); the policy route's own not-ready handling and any scope create/sync ordering redesign; and a couple of related issues in adjacent scope code paths that I'll file separately. The docs note for the changed fallback and the namespacing contract is intentionally deferred to a follow-up.