Monorepo for Tangled - https://tangled.org

proposal: generalized collection migration UI #299

open
opened by boltless.me

As an atproto service, we will face situations where all users need to migrate their existing records to a new collection or lexicon schema.

For example,

  • #294 will merge two different repo.*.comment collections into one feed.comment
  • There is an ongoing discussion about using DIDs for repositories, which might require all users to migrate their existing records.

To support this, it would be really convenient to have a generalized user data migration UI.

Implementation

This is similar to a DB migration, but in the opposite direction: the appview will have an atproto_need_migrations table holding user DIDs and migration names. When we change a lexicon NSID or schema in a backwards-incompatible way, we will add all authors of legacy records to the atproto_need_migrations table. When a listed user logs in, we will show a banner telling them that their records need to be migrated, which links to a dedicated migration page.

In most cases, we can just drop the legacy table, since the data is stored in atproto (in the users' own repos). We can refetch the records once the user migrates them.
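A minimal sketch of the table described above, using Python's sqlite3 for brevity. Only the table name atproto_need_migrations and its two pieces of data (user DID and migration name) come from the proposal; the column names, the legacy_comments stand-in table, the helper functions, and the "unify-comments" migration name are assumptions made for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Table from the proposal; the column names are an assumption.
db.execute("""
    CREATE TABLE atproto_need_migrations (
        did  TEXT NOT NULL,
        name TEXT NOT NULL,
        PRIMARY KEY (did, name)
    )
""")

# Hypothetical stand-in for an appview table of legacy records.
db.execute("CREATE TABLE legacy_comments (did TEXT NOT NULL, rkey TEXT NOT NULL)")
db.executemany(
    "INSERT INTO legacy_comments (did, rkey) VALUES (?, ?)",
    [
        ("did:example:alice", "a1"),
        ("did:example:alice", "a2"),
        ("did:example:bob", "b1"),
    ],
)


def enqueue_migration(db, name):
    """List every author of legacy records as needing migration `name`."""
    db.execute(
        "INSERT OR IGNORE INTO atproto_need_migrations (did, name) "
        "SELECT DISTINCT did, ? FROM legacy_comments",
        (name,),
    )


def pending_migrations(db, did):
    """Migrations to show in the banner when `did` logs in."""
    rows = db.execute(
        "SELECT name FROM atproto_need_migrations WHERE did = ? ORDER BY name",
        (did,),
    )
    return [row[0] for row in rows]


def mark_migrated(db, did, name):
    """Remove the entry once the user's records have been migrated."""
    db.execute(
        "DELETE FROM atproto_need_migrations WHERE did = ? AND name = ?",
        (did, name),
    )


enqueue_migration(db, "unify-comments")
```

On login, the appview would call something like pending_migrations and render the banner only when the list is non-empty. After enqueueing, the legacy table itself could be dropped, since the list of affected users is already captured.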

why not just automatically run the migration of records in the background upon login?

i personally do believe that migrating records is essential, but realistically, this will not happen for all users who have created tangled records. we should therefore announce a date on which we stabilize records; that would be the earliest date from which network-wide backfill is valid and supported

or write appview code to handle ingestion of backward incompatible changes:

  • for lexicon name changes, this is trivial: we just need to ingest records with both NSIDs
  • similarly with field names (support unmarshalling both new and old fields)
  • other records will be tricky, but if there exists a one-to-one mapping from the old model to the new one, we can write ingesters accordingly

this way backfill can be run from older dates.
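A rough sketch of what such a backward-compatible ingester could look like for the first two cases (renamed NSIDs and renamed fields). All NSIDs and field names below are hypothetical placeholders, not real Tangled lexicons, and the normalize helper is an assumption made for illustration.

```python
# Hypothetical NSIDs and field names, for illustration only.
NEW_NSID = "example.app.feed.comment"
LEGACY_NSIDS = {"example.app.issue.comment", "example.app.pull.comment"}

# Old field name -> new field name (assumed rename).
FIELD_RENAMES = {"text": "body"}


def normalize(collection, record):
    """Map a record from either the new or a legacy collection to the new shape.

    Returns None for collections this ingester does not handle.
    """
    if collection == NEW_NSID:
        return dict(record)
    if collection not in LEGACY_NSIDS:
        return None
    # Rename legacy fields; fields without a rename are copied unchanged.
    out = {FIELD_RENAMES.get(key, key): value for key, value in record.items()}
    out["$type"] = NEW_NSID
    return out
```

With this shape, the rest of the appview only ever sees the new model, and every additional backward-incompatible change adds another entry to the mapping tables or another branch in normalize.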

why not just automatically run the migration of records in the background?

Because that might take some time, fail, or require additional permission.

I think the first option is more realistic. Writing ingester/model logic for every backward-incompatible change will quickly become a huge mess. So we should announce a date for stabilized records at some point, and ideally that should happen only once.

For now, I want to unify the sh.tangled.*.comment records into sh.tangled.feed.comment. This will be done in the following steps:

  1. list all users with legacy records, and drop all legacy records from the appview
  2. when they log in, run the migration (it can run in the background for now, but I think a user-facing frontend would help with communication; users have a right to know what's happening with their PDS)
  3. in the future, drop the user list for this migration; this will abandon all non-migrated comments
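Steps 2 and 3 above could be sketched roughly as follows. FakeRepo is an in-memory stand-in for a user's PDS repo, and migrate_user, abandon_migration, and the pending set are hypothetical names; a real implementation would presumably go through the com.atproto.repo.listRecords, createRecord, and deleteRecord XRPC methods instead. Only the sh.tangled.*.comment pattern and the sh.tangled.feed.comment target come from the thread, and the sketch assumes the record body can be carried over unchanged, which may not hold for the real lexicons.

```python
from fnmatch import fnmatchcase

TARGET = "sh.tangled.feed.comment"
LEGACY_PATTERN = "sh.tangled.*.comment"


def is_legacy(collection):
    """True for collections matching the legacy pattern, excluding the target."""
    return collection != TARGET and fnmatchcase(collection, LEGACY_PATTERN)


class FakeRepo:
    """In-memory stand-in for a PDS repo: {collection: {rkey: record}}."""

    def __init__(self, records, fail=False):
        self.records = records
        self.fail = fail  # simulate a missing permission on write

    def put(self, collection, rkey, record):
        if self.fail:
            raise PermissionError("write not permitted")
        self.records.setdefault(collection, {})[rkey] = record

    def delete(self, collection, rkey):
        del self.records[collection][rkey]


def migrate_user(repo, pending, did, name):
    """Step 2: move a user's legacy comments into TARGET on login.

    On failure the user stays in `pending`, so the migration can be retried.
    Assumes rkeys do not collide across the legacy collections.
    """
    try:
        for collection in [c for c in repo.records if is_legacy(c)]:
            for rkey, record in list(repo.records[collection].items()):
                repo.put(TARGET, rkey, {**record, "$type": TARGET})
                repo.delete(collection, rkey)
    except PermissionError:
        return False
    pending.discard((did, name))
    return True


def abandon_migration(pending, name):
    """Step 3: drop the user list; non-migrated comments are abandoned."""
    for entry in [e for e in pending if e[1] == name]:
        pending.discard(entry)
```

Returning False instead of raising keeps the user in the pending list, which matches the concern that a migration might take time, fail, or need additional permission.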

@oppi.li does this sound reasonable?

Labels: none yet
Area: appview
Assignee: boltless.me
Participants: 2
AT URI: at://did:plc:xasnlahkri4ewmbuzly2rlc5/sh.tangled.repo.issue/3m5tix52bpr22