
Support Provider Configuration via Helm Values / Config File #1886

@sjoerdvanBommel

Description


Problem Statement

To use providers in sandboxes, they must be pre-registered using the openshell provider create CLI command:

openshell provider create --name my-provider --type generic \
  --credential API_KEY="${API_KEY}" \
  --credential TOKEN="${TOKEN}"

In Kubernetes deployments using the Helm chart, this typically requires a separate Job or script that runs after the gateway is up, which is inconvenient and imperative.

Proposed Design

Allow providers to be configured declaratively via a config file or Helm values:

openshell:
  providers:
    - name: cimpress-bedrock
      type: generic
      credentials:
        - key: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: openshell-providers
              key: openai-api-key
....

The gateway would automatically register these providers on startup using the configured credentials.

Alternatives Considered

  • Separate init container
  • Lifecycle hooks (postStart): Would require upstream chart to expose lifecycle hook configuration, which currently isn't available.

Agent Investigation

Problem Statement

OpenShell providers must currently be registered via openshell provider create CLI commands, which requires post-install Kubernetes Jobs that run after gateway deployment. This adds operational complexity: operators must create separate Jobs with CLI installation, manage timing dependencies with gateway readiness, and script provider registration commands. For multi-tenant Kubernetes deployments using GitOps workflows, this imperative approach doesn't align with declarative configuration patterns used throughout the stack.

This feature request proposes adding declarative provider configuration via Helm chart values and gateway.toml, allowing providers to be defined alongside other gateway configuration and automatically synced at startup. Credentials would be sourced from Kubernetes Secrets, matching standard Kubernetes patterns for secret management.

Technical Context

The current provider management system stores providers in the gateway's database (PostgreSQL or SQLite) and manages them through a gRPC API. The CLI (openshell provider create) parses credentials, calls the gRPC endpoint, and the gateway validates and persists the provider record. Providers include metadata (name, type), credentials (map of key-value pairs stored encrypted), and configuration options. A background refresh worker mints short-lived tokens for OAuth2/JWT flows.

The gateway.toml config file is already parsed at startup using the serde-based ConfigFile struct, but providers are not part of the config schema. The gateway startup sequence connects to the database, initializes components, and resumes persisted sandboxes, but does not reconcile any declarative provider definitions.

Adding declarative configuration requires building a provider sync mechanism that runs during gateway startup, reads declared providers from gateway.toml (sourced from Helm-generated ConfigMaps), resolves credentials from Kubernetes Secrets, and ensures the providers exist in the database before sandboxes resume.

Affected Components

| Component | Key Files | Role |
|---|---|---|
| Gateway Config | crates/openshell-server/src/config_file.rs | Parse gateway.toml, including the new providers section |
| Gateway Startup | crates/openshell-server/src/lib.rs | Orchestrate provider sync during run_server() initialization |
| Provider Sync (new) | crates/openshell-server/src/provider_sync.rs | Reconcile declared providers with database state; resolve credentials from Secrets |
| Provider gRPC | crates/openshell-server/src/grpc/provider.rs | Reuse existing create_provider_record() validation and persistence logic |
| Helm Chart | deploy/helm/openshell/values.yaml, templates | Define providers values schema, generate ConfigMap, update RBAC for Secret reads |
| Gateway Config Docs | docs/reference/gateway-config.mdx | Document new [openshell.gateway.providers] TOML section and credential sourcing patterns |

Technical Investigation

Architecture Overview

Current Provider Storage:

  • Providers are stored in a database (PostgreSQL or SQLite) as protobuf objects in the objects table
  • Schema: object_type = "provider", name (unique), id (UUID), payload (protobuf binary), resource_version (for optimistic concurrency control)
  • Each provider has metadata, type, credentials (encrypted map<string,string>), config (map<string,string>), and credential expiry timestamps
  • Credentials are redacted in API responses and only exposed to sandboxes via GetInferenceBundleRequest or environment injection

Provider Creation Flow:

  1. CLI parses openshell provider create command → validates type, discovers credentials from flags or environment
  2. CLI calls gRPC CreateProviderRequest with credentials and config
  3. Gateway create_provider_record() validates the provider type, checks for name collisions, and persists to database with MustCreate CAS condition
  4. Background refresh worker mints short-lived tokens for OAuth2/JWT providers (e.g., Vertex AI)
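To make the MustCreate condition in step 3 concrete, here is a minimal in-memory model of the objects table keyed by (object_type, name), with resource_version used for optimistic concurrency. It is a sketch only: MemStore, Condition, and StoreError are hypothetical names, not the Store enum in persistence/mod.rs, and the real backends enforce these conditions inside PostgreSQL/SQLite rather than in a HashMap.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one row of the `objects` table.
#[derive(Debug, Clone, PartialEq)]
pub struct ObjectRow {
    pub id: String,
    pub payload: Vec<u8>,      // protobuf-encoded Provider in the real store
    pub resource_version: u64, // incremented on every successful write
}

/// Write preconditions (hypothetical names).
pub enum Condition {
    /// Succeed only if no row with this (object_type, name) exists.
    MustCreate,
    /// Succeed only if the stored resource_version equals the given one.
    MatchVersion(u64),
}

#[derive(Debug, PartialEq)]
pub enum StoreError {
    AlreadyExists,
    NotFound,
    VersionConflict { expected: u64, actual: u64 },
}

#[derive(Default)]
pub struct MemStore {
    rows: HashMap<(String, String), ObjectRow>, // keyed by (object_type, name)
}

impl MemStore {
    /// Compare-and-set write; returns the new resource_version.
    pub fn put(
        &mut self,
        object_type: &str,
        name: &str,
        id: &str,
        payload: Vec<u8>,
        cond: Condition,
    ) -> Result<u64, StoreError> {
        let key = (object_type.to_string(), name.to_string());
        let next_version = match (&cond, self.rows.get(&key)) {
            (Condition::MustCreate, Some(_)) => return Err(StoreError::AlreadyExists),
            (Condition::MustCreate, None) => 1,
            (Condition::MatchVersion(_), None) => return Err(StoreError::NotFound),
            (Condition::MatchVersion(expected), Some(row)) => {
                if row.resource_version != *expected {
                    return Err(StoreError::VersionConflict {
                        expected: *expected,
                        actual: row.resource_version,
                    });
                }
                row.resource_version + 1
            }
        };
        let row = ObjectRow { id: id.to_string(), payload, resource_version: next_version };
        self.rows.insert(key, row);
        Ok(next_version)
    }
}
```

A declarative sync that gets AlreadyExists for a name can treat it as "already registered", which is the create-only behavior this proposal relies on.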

Gateway Startup Sequence:

  1. Parse CLI args and load gateway.toml (config_file.rs)
  2. Connect to database (Store::connect())
  3. Initialize OIDC, sandbox index, compute drivers
  4. Build ServerState with store, compute, auth components
  5. Resume persisted sandboxes
  6. Spawn watchers and background workers including provider refresh
  7. Start gRPC/HTTP listeners

No existing reconciliation mechanism: The gateway does not currently check for declarative provider definitions at startup. Providers are only created via explicit gRPC API calls.

Code References

| Location | Description |
|---|---|
| crates/openshell-cli/src/main.rs:717-827 | CLI ProviderCommands::Create struct and argument parsing |
| crates/openshell-cli/src/run.rs:4476-4653 | provider_create() function: credential discovery, validation, gRPC call |
| crates/openshell-server/src/grpc/provider.rs:61-140 | create_provider_record() gRPC handler: validates type, persists to database |
| crates/openshell-server/src/persistence/mod.rs:114-199 | Store enum dispatching to PostgreSQL/SQLite backends, CAS operations |
| proto/datamodel.proto:32-44 | Provider protobuf message schema |
| crates/openshell-server/src/config_file.rs:38-156 | ConfigFile and GatewayFileSection structs for gateway.toml parsing |
| crates/openshell-server/src/lib.rs:194-351 | run_server() gateway startup orchestration; provider refresh worker spawned at line 351 |
| deploy/helm/openshell/values.yaml | Helm chart values schema (needs a providers array) |

Current Behavior

Providers are created imperatively via CLI commands. The CLI:

  1. Parses --credential KEY=VALUE or --credential KEY (discovers from environment) flags
  2. Validates provider type against supported types (generic, vertex-ai, etc.)
  3. Calls gRPC CreateProviderRequest with credentials map and config map
  4. Gateway validates and persists to database with MustCreate (fails if name already exists)
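The two --credential forms in step 1 can be sketched as a small parser. This is an illustration, not the provider_create() code in run.rs: parse_credentials is a hypothetical name, the environment lookup is passed in as a closure so it can be tested, and "later flags overwrite earlier ones" is a choice made for this sketch.

```rust
use std::collections::HashMap;

/// Turns repeated `--credential` values into a credentials map.
/// `KEY=VALUE` uses the inline value; a bare `KEY` is looked up via `lookup`
/// (std::env::var in a real CLI; injected here so it can be tested).
/// Later flags overwrite earlier ones with the same key in this sketch.
pub fn parse_credentials<F>(args: &[&str], lookup: F) -> Result<HashMap<String, String>, String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut out = HashMap::new();
    for &arg in args {
        let (key, value) = match arg.split_once('=') {
            // Split on the first '=', so values may themselves contain '='.
            Some((k, v)) => (k.to_string(), v.to_string()),
            None => {
                let v = lookup(arg)
                    .ok_or_else(|| format!("credential {arg} not set in the environment"))?;
                (arg.to_string(), v)
            }
        };
        if key.is_empty() {
            return Err(format!("empty credential key in {arg:?}"));
        }
        out.insert(key, value);
    }
    Ok(out)
}
```

In a real binary the lookup would be `|k: &str| std::env::var(k).ok()`.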

Kubernetes deployments require a post-install Job that:

  1. Installs openshell CLI
  2. Registers the gateway using bootstrap token
  3. Runs openshell provider create commands for each provider
  4. Manages timing/readiness checks to ensure gateway is available

What Would Need to Change

1. Config File Schema (config_file.rs)

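  • Add providers: Option<Vec<ProviderDeclaration>> to GatewayFileSection struct
  • Define new types. Each credential needs both the name it is exposed under (e.g. GITLAB_TOKEN) and where its value comes from:
    use serde::Deserialize;
    use std::collections::HashMap;
    #[derive(Debug, Clone, Deserialize)]
    pub struct ProviderDeclaration {
        pub name: String,
        #[serde(rename = "type")] // TOML and Helm values use `type`
        pub provider_type: String,
        pub credentials: Vec<CredentialDeclaration>,
        #[serde(default)] // config is optional
        pub config: HashMap<String, String>,
    }
    // `key`: credential name exposed to sandboxes; `value_from`: where its value comes from
    #[derive(Debug, Clone, Deserialize)]
    pub struct CredentialDeclaration { pub key: String, pub value_from: CredentialSource }
    // Deliberately no Literal variant: inline credential values fail to deserialize
    #[derive(Debug, Clone, Deserialize)]
    #[serde(rename_all = "snake_case")]
    pub enum CredentialSource {
        SecretKeyRef { secret_name: String, key: String },
        EnvVar { name: String },
    }
  • Validation: literal credential values are rejected because CredentialSource has no Literal variant, which keeps secret values out of gateway.toml and the ConfigMap it is rendered from

2. Provider Sync Module (new provider_sync.rs)

  • sync_declarative_providers(store: &Store, config_file: &ConfigFile, namespace: &str) -> Result<()>

    • Read config_file.gateway.providers
    • For each declared provider:
      • Resolve credentials from Kubernetes Secrets or environment variables
      • Check if provider exists in database (by name)
      • If missing: call create_provider_record() to persist
      • If exists: log warning (create-only mode, no updates)
    • Return Err on any failure (fail-fast: gateway does not start if provider sync fails)
  • resolve_credential_sources(credentials: &[CredentialDeclaration], namespace: &str) -> Result<HashMap<String, String>>, keyed by each declaration's key

    • For SecretKeyRef: use the kube client to GET /api/v1/namespaces/{namespace}/secrets/{secret_name} and extract data[key]; the API returns values base64-encoded, and k8s-openapi's ByteString decodes them during deserialization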
    • For EnvVar: read from std::env::var()
    • Fail if Secret does not exist, key is missing, or env var is unset
  • ensure_provider(store: &Store, decl: &ProviderDeclaration, credentials: HashMap<String, String>) -> Result<()>

    • Call existing create_provider_record() logic (reuse validation)
    • Handle MustCreate conflict error → log warning, skip (provider already exists)
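A minimal sketch of resolve_credential_sources follows. Each credential carries the name it is exposed under plus a source; the types are redefined here without serde so the block stands alone. The SecretReader trait is a hypothetical seam: a real implementation could wrap kube's Api::<Secret>::namespaced(client, namespace) and its get_opt call, whose Secret.data values are already decoded. Abstracting it this way also lets the unit tests use a mock instead of a live cluster.

```rust
use std::collections::HashMap;

/// Where a credential value comes from (redefined here without serde).
pub enum CredentialSource {
    SecretKeyRef { secret_name: String, key: String },
    EnvVar { name: String },
}

/// A credential's exposed name (e.g. GITLAB_TOKEN) plus its source.
pub struct CredentialDeclaration {
    pub key: String,
    pub value_from: CredentialSource,
}

/// Hypothetical seam over the Kubernetes API so the logic can be unit-tested.
pub trait SecretReader {
    /// Ok(None) means the Secret does not exist; values are already decoded bytes.
    fn secret_data(&self, namespace: &str, name: &str)
        -> Result<Option<HashMap<String, Vec<u8>>>, String>;
}

/// Resolves every credential, failing on the first missing Secret, key, or env var.
pub fn resolve_credential_sources(
    reader: &dyn SecretReader,
    env: &dyn Fn(&str) -> Option<String>,
    credentials: &[CredentialDeclaration],
    namespace: &str,
) -> Result<HashMap<String, String>, String> {
    let mut resolved = HashMap::new();
    for cred in credentials {
        let value = match &cred.value_from {
            CredentialSource::SecretKeyRef { secret_name, key } => {
                let data = reader
                    .secret_data(namespace, secret_name)
                    .map_err(|e| format!("failed to read Secret {namespace}/{secret_name}: {e}"))?
                    .ok_or_else(|| format!("Secret {namespace}/{secret_name} not found"))?;
                let bytes = data
                    .get(key)
                    .ok_or_else(|| format!("key {key} missing from Secret {namespace}/{secret_name}"))?;
                String::from_utf8(bytes.clone())
                    .map_err(|_| format!("Secret {namespace}/{secret_name} key {key} is not valid UTF-8"))?
            }
            CredentialSource::EnvVar { name } => env(name.as_str())
                .ok_or_else(|| format!("environment variable {name} is not set"))?,
        };
        resolved.insert(cred.key.clone(), value);
    }
    Ok(resolved)
}
```

Every error message names the namespace, Secret, and key involved, which gives operators the breadcrumbs that fail-fast startup depends on.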

3. Gateway Startup (lib.rs)

  • Insert provider sync call after database connection (around line 206, after Store::connect())
  • Must run before SandboxIndex::new() and sandbox resume to ensure providers exist before sandboxes try to use them
  • Pass config_file, store, and detected namespace (from K8s API or --namespace flag)
  • Fail gateway startup if sync_declarative_providers() returns Err
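The "detected namespace" could be resolved as sketched below: prefer an explicit --namespace flag, otherwise read the file Kubernetes mounts into pods at /var/run/secrets/kubernetes.io/serviceaccount/namespace when the ServiceAccount token is automounted. detect_namespace is a hypothetical helper, and the precedence order is an assumption.

```rust
use std::fs;
use std::path::Path;

/// Where Kubernetes mounts the pod's namespace alongside the ServiceAccount token
/// (only present when the token is automounted).
pub const SA_NAMESPACE_FILE: &str = "/var/run/secrets/kubernetes.io/serviceaccount/namespace";

/// Precedence in this sketch: a non-blank --namespace flag wins; otherwise the
/// mounted file; otherwise an error, since guessing could read the wrong Secrets.
pub fn detect_namespace(flag: Option<&str>, sa_file: &Path) -> Result<String, String> {
    if let Some(ns) = flag.map(str::trim).filter(|ns| !ns.is_empty()) {
        return Ok(ns.to_string());
    }
    match fs::read_to_string(sa_file) {
        Ok(contents) if !contents.trim().is_empty() => Ok(contents.trim().to_string()),
        Ok(_) => Err(format!("{} is empty", sa_file.display())),
        Err(e) => Err(format!("no --namespace given and cannot read {}: {e}", sa_file.display())),
    }
}
```

At startup this might be called as `detect_namespace(cli.namespace.as_deref(), Path::new(SA_NAMESPACE_FILE))`, where `cli.namespace` is a hypothetical field for the flag. It would only be called when providers are declared, so deployments without providers are unaffected.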

4. Kubernetes Client Integration

  • Provider sync module needs kube crate to read Secrets (already a dependency for K8s compute driver)
  • Reuse existing K8s client initialization from openshell-driver-kubernetes
  • Handle in-cluster auth (ServiceAccount token) vs out-of-cluster kubeconfig
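In practice kube-rs's Client::try_default() already infers its configuration from either a kubeconfig or the in-cluster environment, so the gateway probably does not need to reimplement this. For readers unfamiliar with the two modes, this std-only sketch shows the signals involved. Inside a pod, these are KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT plus the ServiceAccount token file. Outside the cluster, they are KUBECONFIG or ~/.kube/config. choose_kube_auth and KubeAuth are hypothetical, and checking in-cluster first is a choice for this sketch, not necessarily kube-rs's order.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
pub enum KubeAuth {
    /// Inside a pod: API server from the service env vars, ServiceAccount token from disk.
    InCluster { api_server: String, token_file: PathBuf },
    /// Outside the cluster: a kubeconfig file.
    Kubeconfig(PathBuf),
}

/// Decides how to authenticate. `env` stands in for std::env::var, `sa_dir`
/// for /var/run/secrets/kubernetes.io/serviceaccount, `home` for $HOME.
pub fn choose_kube_auth(
    env: &dyn Fn(&str) -> Option<String>,
    sa_dir: &Path,
    home: Option<&Path>,
) -> Result<KubeAuth, String> {
    let token_file = sa_dir.join("token");
    if let (Some(host), Some(port)) = (env("KUBERNETES_SERVICE_HOST"), env("KUBERNETES_SERVICE_PORT")) {
        if token_file.exists() {
            // IPv6 service hosts must be bracketed in a URL.
            let host = if host.contains(':') { format!("[{host}]") } else { host };
            return Ok(KubeAuth::InCluster { api_server: format!("https://{host}:{port}"), token_file });
        }
    }
    // KUBECONFIG may list several files; this sketch only uses the first.
    if let Some(value) = env("KUBECONFIG") {
        if let Some(first) = std::env::split_paths(&value).next() {
            if !first.as_os_str().is_empty() {
                return Ok(KubeAuth::Kubeconfig(first));
            }
        }
    }
    match home {
        Some(h) => Ok(KubeAuth::Kubeconfig(h.join(".kube").join("config"))),
        None => Err("no in-cluster environment and no kubeconfig location".to_string()),
    }
}
```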

5. Helm Chart Updates

  • values.yaml: Add providers: [] array schema
    providers:
      - name: gitlab
        type: generic
        credentials:
          - key: GITLAB_TOKEN
            valueFrom:
              secretKeyRef:
                name: openshell-providers
                key: gitlab-token
  • ConfigMap template: Render providers into gateway.toml as TOML array
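  • RBAC: Grant the gateway ServiceAccount's Role get on Secrets. Kubernetes RBAC rules cannot select objects by label, so scope the rule with resourceNames listing the Secrets referenced in providers; an openshell.io/provider-credentials: "true" label can still be required, but the gateway would have to check it itself before reading a Secret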
  • Example Secret manifest: Provide example Secret in chart README

Alternative Approaches Considered

1. --auto-providers flag (rejected)

  • Already exists but relies on environment variable credential discovery
  • Doesn't work for custom provider types with non-standard credential keys
  • Still requires passing all credentials as environment variables (no Secret sourcing)
  • Doesn't support GitOps declarative patterns

2. Simplified post-install Job (rejected)

  • Reduces complexity by running job only once (not on upgrades)
  • Still requires CLI installation, scripting, timing coordination
  • Not declarative, adds operational overhead

3. Init containers (rejected)

  • Tightly coupled to gateway pod lifecycle
  • Still requires CLI installation and scripting
  • Harder to debug than separate Jobs

4. Lifecycle hooks (postStart) (rejected)

  • Helm chart does not expose lifecycle hook configuration
  • Would require upstream chart changes
  • postStart hooks block pod readiness, delaying service availability

5. Full reconciliation with prune (rejected for initial version)

  • Declared providers become authoritative, CLI-created providers not in config are deleted
  • Dangerous: breaks existing workflows, destroys user-managed providers
  • Better to start with create-only mode and add prune as opt-in feature later

Patterns to Follow

1. Config File Parsing

  • Follow existing TOML deserialization patterns in config_file.rs
  • Use #[serde(default)] for optional fields
  • Add validation in ConfigFile::validate() method

2. Error Handling

  • Return anyhow::Result<()> for provider sync
  • Use context() to add breadcrumbs for debugging (e.g., "failed to read Secret {name}/{key}")
  • Fail fast: gateway does not start if provider sync fails (matches existing behavior for required config)

3. Kubernetes Client Usage

  • Reuse openshell-driver-kubernetes client setup
  • Handle both in-cluster and out-of-cluster auth
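  • Restrict Secret access with RBAC resourceNames; RBAC rules have no label selectors, so any label requirement must be enforced by the gateway itself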

4. Provider Creation

  • Reuse existing create_provider_record() validation logic (don't duplicate)
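  • Respect MustCreate CAS semantics: the store rejects a create when the name already exists, and the sync treats that conflict as "already registered" and skips it instead of failing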
  • Log provider creation events at INFO level for audit trail

5. Testing

  • Unit tests for credential resolution (resolve_credential_sources)
  • Integration tests for provider sync (in-memory SQLite database)
  • E2E test for Helm deployment with declarative providers (requires K8s cluster)

Proposed Approach

Add a provider sync mechanism that runs during gateway startup:

  1. Extend gateway.toml schema to include an optional providers array with name, type, credentials (sourced from Secrets or env vars), and config.

  2. Build provider sync module that reads declared providers from config, resolves credentials from Kubernetes Secrets using the kube client, and ensures each provider exists in the database by calling the existing create_provider_record() function.

  3. Integrate sync into gateway startup immediately after database connection and before sandbox resume, ensuring providers are available when sandboxes start.

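  4. Update Helm chart to expose providers in values.yaml, render them into a ConfigMap that mounts as gateway.toml, and update RBAC so the gateway can get only the Secrets those providers reference (Kubernetes RBAC cannot select Secrets by label, so this means listing them in resourceNames).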

  5. Use create-only mode for initial version: declared providers are created if missing but ignored if they already exist. CLI-created providers are unaffected. Log warnings for name collisions. This avoids destructive operations and maintains backwards compatibility.

  6. Fail fast on sync errors: if any provider fails to sync (Secret missing, invalid type, database error), the gateway does not start. This gives operators immediate feedback that configuration is broken.

Scope Assessment

  • Complexity: Medium
  • Confidence: High (clear path, well-understood provider system, existing patterns to follow)
  • Estimated files to change: 8-10 files (3 core Rust files, 4 Helm templates, 2 doc files)
  • Issue type: feat

Risks & Open Questions

Risks:

  • CWE-522 (Insufficiently Protected Credentials): HIGH — Declarative config encourages storing credentials in ConfigMaps. Mitigation: ONLY support valueFrom.secretKeyRef, reject literal credential values in TOML. Document that credentials MUST live in Secrets.
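  • CWE-269 (Improper Privilege Management): MEDIUM — Gateway ServiceAccount gains get permission on Secrets. Mitigation: Kubernetes RBAC cannot filter by label, so scope the Role with resourceNames to only the Secrets that declared providers reference; optionally also require operators to label provider Secrets (openshell.io/provider-credentials) and have the gateway refuse unlabeled ones.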
  • CWE-1188 (Insecure Default Configuration): MEDIUM — Default Helm chart might include example providers with placeholder credentials. Mitigation: Do NOT include any provider in default values.yaml. Document that providers: [] is the safe default.

Edge Cases:

  • Secret does not exist: Fail provider sync with clear error, prevent gateway startup. Operators must create Secret before deploying gateway.
  • Provider name collision: Declared provider name matches existing CLI-created provider. Log warning, skip creation (CLI takes precedence). Document this behavior.
  • Partial sync failure: One provider succeeds, another fails. Gateway fails to start (all-or-nothing). Clear error messages guide operators to fix the broken provider.
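With create-only writes, "all-or-nothing" only holds if every provider is validated and its credentials are resolved before anything is written. Otherwise, providers created before the failing one remain in the database. Below is a minimal sketch of that two-phase ordering, using hypothetical names (sync_providers, ProviderDecl, CreateOutcome), with closures standing in for credential resolution and create_provider_record(). Phase 1 collects every error, so operators see all broken providers at once. A database error during phase 2 can still leave a partial write unless the writes share a transaction. Re-running is safe because existing names are skipped.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical minimal view of a declared provider.
pub struct ProviderDecl {
    pub name: String,
    pub provider_type: String,
}

#[derive(Debug, PartialEq)]
pub enum CreateOutcome {
    Created,
    AlreadyExists, // MustCreate conflict: create-only mode skips it
}

#[derive(Debug, PartialEq)]
pub struct SyncReport {
    pub created: Vec<String>,
    pub skipped: Vec<String>,
}

/// Phase 1 validates and resolves every declaration and reports all errors
/// together; phase 2 writes only if phase 1 found none.
pub fn sync_providers(
    decls: &[ProviderDecl],
    supported_types: &[&str],
    resolve: &dyn Fn(&ProviderDecl) -> Result<HashMap<String, String>, String>,
    create: &mut dyn FnMut(&ProviderDecl, HashMap<String, String>) -> Result<CreateOutcome, String>,
) -> Result<SyncReport, Vec<String>> {
    let mut errors = Vec::new();
    let mut ready = Vec::new();
    let mut seen = HashSet::new();
    for d in decls {
        if !seen.insert(d.name.as_str()) {
            errors.push(format!("provider {}: declared more than once", d.name));
            continue;
        }
        if !supported_types.contains(&d.provider_type.as_str()) {
            errors.push(format!("provider {}: unsupported type {}", d.name, d.provider_type));
            continue;
        }
        match resolve(d) {
            Ok(creds) => ready.push((d, creds)),
            Err(e) => errors.push(format!("provider {}: {e}", d.name)),
        }
    }
    if !errors.is_empty() {
        return Err(errors); // nothing has been written yet
    }
    let mut report = SyncReport { created: Vec::new(), skipped: Vec::new() };
    for (d, creds) in ready {
        match create(d, creds) {
            Ok(CreateOutcome::Created) => report.created.push(d.name.clone()),
            Ok(CreateOutcome::AlreadyExists) => report.skipped.push(d.name.clone()),
            Err(e) => return Err(vec![format!("provider {}: {e}", d.name)]),
        }
    }
    Ok(report)
}
```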

Open Design Questions (need stakeholder input):

  1. Sync strategy: Create-only (recommended), full reconciliation, or label-based hybrid (managed-by: config)?

  2. Credential update policy: Should declarative config be allowed to update existing providers' credentials, or is create-only the permanent behavior?

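  3. RBAC scope: Should the gateway be able to read all Secrets in the namespace, only the Secrets named by declared providers (RBAC resourceNames), only labeled Secrets (openshell.io/provider-credentials: "true", which the gateway would have to enforce itself because RBAC cannot match labels), or also Secrets in other namespaces?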

  4. Failure mode: Fail gateway startup on any provider sync error (safe, recommended) or log error and continue with partial sync (forgiving, but leaves system in inconsistent state)?

  5. Helm integration depth: Should provider credentials live in a separate Secret created outside Helm (recommended, matches operator pattern), or should Helm chart accept credential values that it templates into the Secret (dangerous, credentials in values files)?

Test Considerations

Unit Tests:

  • resolve_credential_sources() with a mocked kube client: SecretKeyRef resolution, EnvVar resolution, and errors for missing Secrets, keys, or env vars
  • Provider declaration parsing: valid TOML → struct deserialization, invalid TOML → validation errors

Integration Tests:

  • Provider sync with in-memory SQLite database
  • Create-only behavior: declare provider, sync, declare same provider again, verify it's not duplicated
  • Name collision: CLI-created provider + declarative provider with same name → warning logged, no overwrite
  • Partial failure: multiple providers declared, one has invalid type → gateway fails to start with clear error

E2E Tests:

  • Helm deployment with declarative providers (requires K8s cluster)
  • Deploy gateway with providers in values.yaml, verify providers exist in database via openshell provider list
  • Create Secret with credentials, deploy gateway referencing Secret, verify provider credentials work in sandbox
  • RBAC test: deploy gateway without Secret read permission, verify sync fails with permission error

Test Patterns from Existing Code:

  • Provider tests in crates/openshell-server/src/grpc/provider.rs use TestFixture with in-memory store
  • E2E tests in .agents/skills/test-release-canary/ use openshell CLI for validation
  • Follow existing test file organization: unit tests in module files, integration tests in tests/ directory

Created by spike investigation. Review the proposed approach and design decisions, then use build-from-issue to plan and implement.

Checklist

  • I've reviewed existing issues and the architecture docs
  • This is a design proposal, not a "please build this" request
