Problem#
Workflows need to authenticate to external systems, most commonly to deploy builds and packages. Spindle secrets cover this use case but have several drawbacks.
Firstly and most importantly, storing long-lived secrets for authenticating to external systems carries a risk. Worst of all, mitigating this risk requires end-user action: a leaked secret can't be mitigated by any part of the Tangled infra without removing the secret and breaking pipelines. And that assumes the AppView, spindle, or Knot host even has the capacity or knowledge to mitigate such leaks.
Secondly, secrets require communication with the AppView which is a potential place for a secret to be leaked in the event of a bug.
Proposal#
We can solve these problems by using short-lived tokens via OIDC. Many services (e.g. cloud providers) already support OIDC for auth, as do many pipeline providers.
A workflow can declare that it requires a token, and the spindle will generate a short-lived token for the workflow to use. The token will be signed by the spindle and can be verified by the external service.
Example#
```yaml
# deploy.yaml
when:
  - event: ["push", "pull_request"]
    branch: ["master"]
dependencies:
  nixpkgs:
    - nodejs
steps:
  - name: build site
    command: |
      ./build.sh
  - name: deploy
    oidc_tokens:
      # The token will be available in the environment variable DEPLOY_TOKEN
      DEPLOY_TOKEN:
        # Optionally customize the audience of the token
        aud: "https://example.com/deploy"
    command: |
      ./deploy.sh token=$DEPLOY_TOKEN
```
Token generation and verification#
When a workflow declares it requires an OIDC token, the spindle will generate a token for the workflow before starting execution. The token will be signed by the spindle service and will include the necessary claims for OIDC.
The spindle will also expose a public JWK endpoint that external services can use to verify the token. This endpoint will be discoverable via the standard OIDC Discovery mechanism.
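As a rough sketch of what the discovery side could look like, the snippet below builds a minimal metadata document of the kind served at `/.well-known/openid-configuration`. The hostname, the `jwks_uri` path, and the `ES256` signing algorithm are assumptions made for illustration; the proposal does not fix any of them.

```python
import json


def discovery_url(issuer: str) -> str:
    """Well-known location where OIDC Discovery metadata is published."""
    return f"{issuer.rstrip('/')}/.well-known/openid-configuration"


def discovery_document(issuer: str) -> dict:
    """Build a minimal subset of OIDC Discovery metadata for a spindle."""
    issuer = issuer.rstrip("/")
    return {
        "issuer": issuer,
        # Hypothetical path; any URL works as long as it is listed here.
        "jwks_uri": f"{issuer}/.well-known/jwks.json",
        "response_types_supported": ["id_token"],
        "subject_types_supported": ["public"],
        # Assumed algorithm; the proposal does not pick one.
        "id_token_signing_alg_values_supported": ["ES256"],
        "claims_supported": ["iss", "sub", "aud", "exp", "nbf", "iat", "jti"],
    }


if __name__ == "__main__":
    # spindle.example.com is a placeholder hostname.
    print(discovery_url("https://spindle.example.com"))
    print(json.dumps(discovery_document("https://spindle.example.com"), indent=2))
```

An external service would fetch this document, follow `jwks_uri` to get the public keys, and use them to verify the token signature.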
Private key management#
The spindle will manage the private key used to sign the tokens. The key will be rotated periodically to ensure security.
I think we can use the existing secret store for this, or at least part of the implementation. Open to thoughts here as this part is still a bit fuzzy to me.
Security considerations#
The tokens will be short-lived and will only be valid for the duration of the workflow execution. This mitigates the risk of leaked secrets as the tokens will expire quickly. However, the spindle will need to ensure that the private key is not available to any workflow steps to prevent it from being leaked. I assume this is already possible with the secret store implementation, since it can already restrict access to secrets per repo. This secret would just be a special case: a secret that is only available to the spindle service itself.
Token customization#
The workflow can customize the audience of the token by specifying the aud field in the oidc_tokens section. If not specified, a default audience will be used.
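A minimal sketch of how the audience could be resolved per requested token, assuming the step has already been parsed from YAML into a plain dict. The function name and the default audience value are hypothetical.

```python
def resolve_audiences(step: dict, default_aud: str) -> dict:
    """Map each requested token's environment variable name to its audience."""
    requested = step.get("oidc_tokens") or {}
    resolved = {}
    for env_name, options in requested.items():
        # A bare `DEPLOY_TOKEN:` entry with no body parses as None in YAML.
        options = options or {}
        resolved[env_name] = options.get("aud", default_aud)
    return resolved


if __name__ == "__main__":
    step = {
        "name": "deploy",
        "oidc_tokens": {
            "DEPLOY_TOKEN": {"aud": "https://example.com/deploy"},
            "OTHER_TOKEN": None,  # no customization, falls back to the default
        },
    }
    # Placeholder default audience; the proposal defaults to the spindle service.
    print(resolve_audiences(step, "https://spindle.example.com"))
```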
In the future it could be extended to allow other claims to be customized but this needs to be carefully considered to avoid security issues. GitHub Actions and GitLab pipelines allow customization of the audience within the pipeline itself, which is a good model to follow. Other claims would probably require configuration outside of the pipeline itself to avoid security issues.
Token Structure#
The token will be a JWT with a standard structure. It will include the following claims:
- iss: Issuer, which will be the spindle service domain.
- sub: Subject, which will be the AT URI of the pipeline, e.g. at://{knot_did}/sh.tangled.pipeline/{workflow_id}.
- aud: Audience, which can be customized by the workflow or defaulted to the spindle service.
- exp
- nbf
- iat
- jti
We can bikeshed any additional claims that we might want to include and iterate over time. The above is the minimum required for OIDC tokens to be useful but we can add more claims such as git_sha or user_id to make it more useful for specific use cases.
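The claim set described above could be assembled roughly as follows. This is only a sketch of the payload, not the signing step; the function name, the example DID, and the workflow ID are placeholders, and setting `nbf` equal to `iat` is an assumption.

```python
import time
import uuid


def build_claims(issuer, knot_did, workflow_id, audience=None,
                 lifetime_s=300, now=None):
    """Assemble the JWT payload for one workflow step (unsigned)."""
    now = int(time.time()) if now is None else int(now)
    return {
        "iss": issuer,
        # AT URI of the pipeline, as proposed for the sub claim.
        "sub": f"at://{knot_did}/sh.tangled.pipeline/{workflow_id}",
        # Falls back to the spindle itself when the workflow sets no audience.
        "aud": audience or issuer,
        "iat": now,
        "nbf": now,  # assumption: usable immediately after issuance
        "exp": now + lifetime_s,
        "jti": str(uuid.uuid4()),  # unique ID per token
    }


if __name__ == "__main__":
    claims = build_claims(
        issuer="https://spindle.example.com",  # placeholder hostname
        knot_did="did:example:knot",           # placeholder DID
        workflow_id="abc123",                  # placeholder workflow ID
        audience="https://example.com/deploy",
    )
    print(claims)
```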
Use of AT URI for sub claim#
This part I'm not entirely sure about. The sub claim is supposed to be a unique identifier for the user or entity that the token represents. Using the AT URI is nice because we don't need to invent a new format and it ties the token to the specific workflow execution. However, it doesn't leave any room for additional information in the sub claim, such as a git ref. That information would need to be added as additional claims.
The reason I'm not sure about this is that external services might expect all of the required information to be in the sub claim and not allow authorization based on additional claims. This is something we can test with the services we want to support and iterate on.
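To make the concern concrete, here is a sketch of how an external service might authorize a token, assuming its policy is a set of exact-match claim requirements. The `git_ref` claim and the example values are hypothetical. A service that can only match on sub behaves like the first policy; one that supports additional claims can express the second.

```python
def is_authorized(claims: dict, policy: dict) -> bool:
    """Return True only if every claim named in the policy matches exactly."""
    return all(
        key in claims and claims[key] == expected
        for key, expected in policy.items()
    )


# Placeholder token payload; git_ref is a hypothetical additional claim.
claims = {
    "sub": "at://did:example:knot/sh.tangled.pipeline/abc123",
    "git_ref": "refs/heads/feature",
}

# A sub-only policy cannot tell branches apart.
sub_only_policy = {"sub": "at://did:example:knot/sh.tangled.pipeline/abc123"}

# A policy on additional claims can restrict deploys to master.
sub_and_ref_policy = dict(sub_only_policy, git_ref="refs/heads/master")

if __name__ == "__main__":
    print(is_authorized(claims, sub_only_policy))
    print(is_authorized(claims, sub_and_ref_policy))
```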
Token lifetime#
Tokens will have a lifetime of 5 minutes by default but this could be configurable in the future.
It's important to keep the lifetime short but still long enough to be useful. This is the main reason for the token to be scoped to a specific step, and not the entire pipeline. Without this the token may expire on a slow build step before it can be used for a subsequent deployment step, for example.
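The timing argument can be illustrated with a small sketch. It assumes a verifier that accepts a token when `nbf <= now < exp`, and a hypothetical 10-minute build step; the timestamps are arbitrary.

```python
DEFAULT_LIFETIME_S = 5 * 60  # 5-minute default from the proposal


def token_window(issued_at: int, lifetime_s: int = DEFAULT_LIFETIME_S) -> dict:
    """Time-related claims for a token issued at `issued_at`."""
    return {"iat": issued_at, "nbf": issued_at, "exp": issued_at + lifetime_s}


def is_time_valid(claims: dict, now: int, leeway_s: int = 0) -> bool:
    """Check the validity window, with optional clock-skew leeway."""
    return claims["nbf"] - leeway_s <= now < claims["exp"] + leeway_s


pipeline_start = 1_000
build_duration_s = 10 * 60  # hypothetical slow build step
deploy_start = pipeline_start + build_duration_s

# Token issued once for the whole pipeline: expired before deploy begins.
pipeline_scoped = token_window(pipeline_start)

# Token issued when the deploy step starts: still valid when it is used.
step_scoped = token_window(deploy_start)

if __name__ == "__main__":
    print(is_time_valid(pipeline_scoped, deploy_start))
    print(is_time_valid(step_scoped, deploy_start + 30))
```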
I've started sketching out an implementation and it all seems very straightforward except storing and retrieving the private keys.
Using the secret manager for this is tricky, especially the part where the secret needs to not be linked to any repo.
For JWK rotation, maybe OpenBao could handle this? It would still need an application-level implementation for SQLite enjoyers though.
Maybe this trickiness can be completely sidestepped by just holding the secrets in memory. I think the JWKs can actually be ephemeral: the tokens are so short-lived that it's probably OK for them to not be verifiable if the server crashes.
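A rough sketch of the in-memory idea, assuming keys are rotated on a timer and a retired key stays published for one token lifetime so that already-issued tokens remain verifiable. The class and method names are hypothetical, and real key generation is left to an injected callable (the `object` stand-in below is not a key).

```python
import secrets
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class KeyEntry:
    kid: str                      # key ID referenced from the JWT header
    key: Any                      # opaque key material, never persisted
    created_at: float
    retired_at: Optional[float] = None


class EphemeralKeyRing:
    """Holds signing keys in memory only; everything is lost on restart."""

    def __init__(self, generate_key: Callable[[], Any],
                 rotate_after_s: float = 3600,
                 token_lifetime_s: float = 300):
        self._generate_key = generate_key
        self._rotate_after_s = rotate_after_s
        self._token_lifetime_s = token_lifetime_s
        self._keys = []

    def signing_key(self, now: float) -> KeyEntry:
        """Return the current key, rotating first if it is too old."""
        current = self._keys[-1] if self._keys else None
        if current is None or now - current.created_at >= self._rotate_after_s:
            if current is not None:
                current.retired_at = now
            current = KeyEntry(secrets.token_hex(8), self._generate_key(), now)
            self._keys.append(current)
        return current

    def published_keys(self, now: float) -> list:
        """Keys to expose via the JWK endpoint; drops long-retired keys."""
        self._keys = [
            k for k in self._keys
            if k.retired_at is None or now - k.retired_at < self._token_lifetime_s
        ]
        return list(self._keys)


if __name__ == "__main__":
    ring = EphemeralKeyRing(generate_key=object)  # stand-in key generator
    first = ring.signing_key(now=0)
    second = ring.signing_key(now=3600)
    print(first.kid != second.kid, len(ring.published_keys(now=3700)))
```

With this shape, a crash simply means a fresh key ring on startup; any token signed just before the crash becomes unverifiable, which is the trade-off described above.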