proposal: ingest repo records #282

sh.tangled.repo records are among the last few record types that need atprotation (2-way sync between appview and PDS). this one is tricky because we want consistency between knot state, appview state and PDS state. the path forward is described below:

migration to did/rkey syntax universally: in several places we use did/repo-name as a globally unique identifier for a repository; we should migrate these to did/rkey:
- for the tangled appview: this means reworking our ACLs, routers and DB to use did/rkey or ATURI as a globally unique identifier for a repo
- for knots: this means changing paths on disk to be did/rkey, allowing git ops to host:did/rkey, and updating ACLs and XRPC endpoints
- for spindles: this means updating ACLs and XRPC endpoints
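
a minimal sketch of what a shared did/rkey identifier could look like, assuming a hypothetical `RepoID` type and `ParseRepoPath` helper (neither is taken from the tangled codebase). the AT-URI form follows the usual `at://<did>/<collection>/<rkey>` layout; the DID and rkey values are made up for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// RepoCollection is the record collection named in the proposal.
const RepoCollection = "sh.tangled.repo"

// RepoID identifies a repository by owner DID and record key.
// Hypothetical type for illustration only.
type RepoID struct {
	DID  string
	Rkey string
}

// Path returns the did/rkey form, usable in routes or on-disk layouts.
func (r RepoID) Path() string { return r.DID + "/" + r.Rkey }

// ATURI returns the at:// URI of the underlying repo record.
func (r RepoID) ATURI() string {
	return fmt.Sprintf("at://%s/%s/%s", r.DID, RepoCollection, r.Rkey)
}

// ParseRepoPath parses "did/rkey" into a RepoID and rejects anything
// that does not start with a DID or has an empty or nested rkey.
func ParseRepoPath(s string) (RepoID, error) {
	did, rkey, ok := strings.Cut(s, "/")
	if !ok || !strings.HasPrefix(did, "did:") || rkey == "" || strings.Contains(rkey, "/") {
		return RepoID{}, fmt.Errorf("invalid repo path %q: want did/rkey", s)
	}
	return RepoID{DID: did, Rkey: rkey}, nil
}

func main() {
	id, err := ParseRepoPath("did:plc:example123/3kabcxyz")
	if err != nil {
		panic(err)
	}
	fmt.Println(id.Path())
	fmt.Println(id.ATURI())
}
```

keeping one type like this across appview, knots and spindles would let ACLs, routers and XRPC handlers agree on a single canonical key.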
once this is done, we can define ingestion logic for all services in the network to pull sh.tangled.repo records
- for the tangled appview, this should create/delete/update a repo pointer record
- for knots: this should create-or-ignore/delete-or-ignore/update-or-ignore a repo on disk. migration of repos needs to be thought out here.
- for spindles: as above
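
the create-or-ignore / update-or-ignore / delete-or-ignore semantics above could be sketched roughly as follows. this is an in-memory stand-in, not the actual knot implementation: `Event`, `Store` and the `Name` field are assumptions made for illustration, and a real knot would touch the filesystem instead of a map:

```go
package main

import "fmt"

// Action is the kind of record event being ingested (hypothetical names).
type Action int

const (
	Create Action = iota
	Update
	Delete
)

// Event is a simplified sh.tangled.repo record event.
type Event struct {
	Action Action
	DID    string
	Rkey   string
	Name   string // repo name carried by the record (assumed field)
}

// Store maps did/rkey to repo name; it stands in for knot on-disk state.
type Store map[string]string

// Apply ingests one event idempotently and reports whether state changed.
func (s Store) Apply(e Event) bool {
	key := e.DID + "/" + e.Rkey
	cur, exists := s[key]
	switch e.Action {
	case Create:
		if exists {
			return false // create-or-ignore
		}
		s[key] = e.Name
		return true
	case Update:
		if !exists || cur == e.Name {
			return false // update-or-ignore
		}
		s[key] = e.Name
		return true
	case Delete:
		if !exists {
			return false // delete-or-ignore
		}
		delete(s, key)
		return true
	}
	return false
}

func main() {
	s := Store{}
	ev := Event{Action: Create, DID: "did:plc:example123", Rkey: "3kabcxyz", Name: "my-repo"}
	fmt.Println(s.Apply(ev)) // first delivery changes state
	fmt.Println(s.Apply(ev)) // replayed delivery is ignored
}
```

because every branch is a no-op when the state already matches, replaying the same events (for example during a backfill) leaves the store unchanged.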
define edge-case behaviors:
- when referring to repos by rkey, it is possible for clever users to create duplicate records with the same repo-name. appview routers must handle this ambiguity with a new interstitial page that offers a redirect, and knots must return an error upon push/pull on ambiguous git URLs
- the knot cli can introduce a command backfill or sync to bring knot state in sync with the rest of the world (repos, collaborators, pubkeys, ACLs etc.).
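
one way the duplicate-name edge case could be handled is a resolver that maps did/repo-name to an rkey and fails loudly when more than one record matches. this is a sketch under assumptions: `Index`, `Resolve` and `AmbiguousError` are hypothetical names. the appview could use the returned rkey list to build its interstitial page, while a knot would surface the error on push/pull:

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// ErrNotFound is returned when no record carries the requested name.
var ErrNotFound = errors.New("repo not found")

// AmbiguousError lists every rkey that shares the requested name.
type AmbiguousError struct {
	DID, Name string
	Rkeys     []string
}

func (e *AmbiguousError) Error() string {
	return fmt.Sprintf("%s/%s is ambiguous: %d records match", e.DID, e.Name, len(e.Rkeys))
}

// Index maps did -> rkey -> repo name (hypothetical in-memory view).
type Index map[string]map[string]string

// Resolve returns the single rkey for did/name, or an error when the
// name is unknown or shared by several records.
func (ix Index) Resolve(did, name string) (string, error) {
	var matches []string
	for rkey, n := range ix[did] {
		if n == name {
			matches = append(matches, rkey)
		}
	}
	sort.Strings(matches) // deterministic order for display
	switch len(matches) {
	case 0:
		return "", ErrNotFound
	case 1:
		return matches[0], nil
	default:
		return "", &AmbiguousError{DID: did, Name: name, Rkeys: matches}
	}
}

func main() {
	ix := Index{"did:plc:example123": {"rkey-a": "my-repo", "rkey-b": "my-repo", "rkey-c": "other"}}
	_, err := ix.Resolve("did:plc:example123", "my-repo")
	fmt.Println(err)
}
```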

Using did/rkey as the internal identifier is a good idea. It will allow repositories to be renamed, which is an improvement over the current scheme.

> once this is done, we can define ingestion logic for all services in the network to pull sh.tangled.repo records

Not sure I understood this correctly. You mean we will introduce migration logic on startup that converts existing did/repo-name data to did/rkey, right? That sounds reasonable. We can remove the extra migration logic once we're out of the alpha phase.

> it is possible for clever users to create duplicate records with the same repo-name

How about using the rkey itself as the repository name? I've seen that we already do this for default labels (e.g. the gfi label has rkey good-first-issue). I'm not sure whether this is OK under the atproto spec.
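
For reference, a rough sketch of checking whether a repo name could serve as a record key. It assumes the general record-key syntax from the atproto spec (1 to 512 characters from `A-Za-z0-9` and `.-_:~`, with `.` and `..` disallowed); whether the sh.tangled.repo lexicon would accept arbitrary keys rather than TIDs is a separate question this does not answer. `ValidRkey` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"regexp"
)

// rkeyRe encodes the assumed atproto record-key character set and length.
var rkeyRe = regexp.MustCompile(`^[A-Za-z0-9._:~-]{1,512}$`)

// ValidRkey reports whether s could be used as a record key under the
// assumed syntax. "." and ".." are rejected explicitly.
func ValidRkey(s string) bool {
	if s == "." || s == ".." {
		return false
	}
	return rkeyRe.MatchString(s)
}

func main() {
	for _, name := range []string{"good-first-issue", "my repo", ".."} {
		fmt.Printf("%q valid as rkey: %v\n", name, ValidRkey(name))
	}
}
```

Under these rules, repo names containing spaces or slashes would need to be rejected or normalized before they could double as rkeys.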

> the knot cli can introduce a command backfill or sync

Just to make sure: the current knot needs this anyway, right? I haven't seen any backfill logic for the public keys it stores.

Though I think we could skip this part and go straight to did-for-repo. Both are breaking changes, so it would be better to make them once, after proper discussion. did-for-repo can also atprotate the repository and even allow cross-user migrations without losing existing references: issues and PRs would migrate with the repository itself. Also, serving the repository identity from the knot could solve knot<->PDS syncing.