Constellation, Spacedust, Slingshot, UFOs: atproto crates and services for microcosm
# constellation 🌌

A global atproto backlink index ✨

- Self hostable: handles the full write throughput of the global atproto firehose on a raspberry pi 4b + single SSD
- Storage efficient: less than 2GB/day disk consumption indexing all references in all lexicons and all non-atproto URLs
- Handles record deletion, account de/re-activation, and account deletion, ensuring accurate link counts and respecting users' data choices
- Simple JSON API

All social interactions in atproto tend to be represented by links (or references) between PDS records. This index can answer questions like "how many likes does a bsky post have", "who follows an account", "what are all the comments on a [frontpage](https://frontpage.fyi/) post", and more.

- **status**: works! api is unstable and likely to change, and no known instances have a full network backfill yet.
- source: [./constellation/](./constellation/)
- public instance: [constellation.microcosm.blue](https://constellation.microcosm.blue/)

_note: the public instance currently runs on a little raspberry pi in my house, feel free to use it! it comes only with best-effort uptime, no commitment to not breaking the api for now, and possible rate-limiting. if you want to be nice you can put your project name and bsky username (or email) in your user-agent header for api requests._


## API endpoints

currently this is a bit out of date -- refer to the [api docs hosted by the app itself](https://constellation.microcosm.blue/) for now. they also let you try out live requests.

terms as used here:

- "URI": a URI, AT-URI, or DID.
- "JSON path": a dot-separated (and dot-prefixed, for now) path to a field in an atproto record. Arrays are noted by `[]` and cannot contain a specific index.

### `GET /links/count`

The number of backlinks to a URI from a specified collection + json path.

#### Required URL parameters

- `target` (required): the URI. must be URL-encoded.
  - example: `at%3A%2F%2Fdid%3Aplc%3A57vlzz2egy6eqr4nksacmbht%2Fapp.bsky.feed.post%2F3lg2pgq3gq22b`
- `collection` (required): the source NSID of referring documents to consider.
  - example: `app.bsky.feed.post`
- `path` (required): the JSON path in referring documents to consider.
  - example: `.subject.uri`

#### Response

A number (u64) in plain text format

#### cURL example: Get a count of all bluesky likes for a post

```bash
curl '<HOST>/links/count?target=at%3A%2F%2Fdid%3Aplc%3A57vlzz2egy6eqr4nksacmbht%2Fapp.bsky.feed.post%2F3lg2pgq3gq22b&collection=app.bsky.feed.like&path=.subject.uri'

40
```

### `GET /links/all/count`

The number of backlinks to a URI from any source collection or json path

#### Required URL parameters

- `target` (required): the URI. must be URL-encoded.
  - example: `did:plc:vc7f4oafdgxsihk4cry2xpze`

#### Response

A JSON object `{[NSID]: {[JSON path]: [N]}}`

#### cURL example: Get reference counts to a DID from any collection at any path

```bash
curl '<HOST>/links/all/count?target=did:plc:vc7f4oafdgxsihk4cry2xpze'

{
  "app.bsky.graph.block": { ".subject": 13 },
  "app.bsky.graph.follow": { ".subject": 159 },
  "app.bsky.feed.post": { ".facets[].features[].did": 16 },
  "app.bsky.graph.listitem": { ".subject": 6 },
  "app.bsky.graph.starterpack": {
    ".feeds[].creator.did": 1,
    ".feeds[].creator.labels[].src": 1
  }
}
```


## Contributions

### Licensing

Constellation's source code is currently available exclusively under the AGPL license (see [LICENSE](./LICENSE)).

In the future, its code MAY become available under the MIT and/or Apache2.0 licenses, at the sole discretion of the microcosm organization. Contributing implies acceptance of this possible future licensing change. The change has not happened yet and is not guaranteed.


some todos

- [x] find links and write them to rocksdb
- [x] handle account active status
- [x] handle account deletion
- [ ] handle account privacy setting? (is this a bsky-nsid-specific config and should that matter?)
  - instead of looking this up, should be able to listen for it to be published on the firehose.
  - this should _work_, but without backfill it won't be accurate. targeted backfill might be an option.
- [x] move ownership of canonical seq to an owned non-atomic
- [x] custom path for db storage
- [x] api server to look up backlink count
- [~] other useful endpoints for the api server
  - [x] show all nsid/path links to target
  - [x] get backlinking dids
  - [x] paging for all backlinking dids
  - [x] get count + most recent dids
  - [ ] get count with any dids from provided set
- [~] write this readme
- [?] fix it sometimes getting stuck
  - seems to unstick in my possibly-different repro (letting laptop fall asleep) after a bit.
  - [ ] add a detection for no new links coming in after some period
  - [x] add tcp connect, read, and write timeouts 🤞
- [x] handle jetstream restart: don't miss events (currently sketch: rewinds cursor by 1us so we will always double-count at least one event)
  - [x] especially: figure out what the risk is to rotating to another jetstream server in terms of gap/overlap from a different jetstream instance's cursor (follow up separately)
  - [x] jetstream: don't rotate servers, explicitly pass via cli
- [x] metrics!
  - [x] event ts lag
- [x] machine resource metrics
  - [x] disk consumption
  - [x] cpu usage
  - [x] mem usage
  - [x] network?
- [x] make all rocks apis return Result instead of unwrapping
- [~] handle all the unwraps
- [ ] deadletter queue of some kind for failed db writes
  - [ ] also for valid json that was rejected?
- [x] get it running on raspi
- [x] get an estimate of disk usage per day after a few days of running
  - very close to 1GB with data model before adding rkeys to linkers + fixing paths
- [x] make the did_init check only happen on test config (or remove it) (removed)
- [ ] actual error types (thiserror?) for lib-ish code
- [~] clean up the main readme
- [x] web server metrics
  - [x] origin and ua labels
- [ ] tokio metrics?
- [x] handle shutdown cleanly -- be nice to rocksdb
- [x] add user-agent to jetstream request
- [ ] wow the shutdown stuff i wrote is really bad and doesn't work a lot
- [x] serve html for browser requests
- [ ] add a health check endpoint
- [x] add seq numbers to metrics
- [ ] persist the jetstream server url, error if started with a different one (maybe with --switch-streams or something)
- [ ] put delete-account tasks into a separate (persisted?) task queue for the writer so it can work on them incrementally.
- [ ] jetstream: connect retry: only reset counter after some *time* has passed.
- [x] either count or estimate the total number of links added (distinct from link targets)
- [x] jetstream: don't crash on connection refused (retry * backoff)
- [x] allow cors requests (ie. atproto-browser. (but it's really meant for backends))
- [x] api: get distinct linking dids (https://bsky.app/profile/bnewbold.net/post/3lhhzejv7zc2h)
  - [x] endpoint for count
  - [x] endpoint for listing them
  - [x] add to exploratory /all endpoint
- [ ] nginx: support http2
- [x] nginx metrics
- [ ] add TimeoutLayer for axum
- [~] rocksdb metrics
  - [x] write ops (count? per actionable?)
  - [x] write time hist
  - [ ] read ops (api)
  - [ ] expose internal stats?
- [ ] figure out what's the right thing to do if merge op fails. happened on startup after an unclean reboot.
- [x] backups!
  - [x] manual backup on startup
  - [x] background task to create backups on an interval
- [ ] add a low-ulimit check on startup?

cache
- [ ] set api response headers
  - [ ] put "stale-while-revalidate" in Cache-Control w/ num seconds
  - [ ] put "stale-if-error" in Cache-Control w/ num seconds
  - [ ] set Expires or Cache-Control expires
  - [ ] add Accept to vary response
- [ ] cache vary: might need to take bsky account privacy setting into account (unless this ends up being in query)

data fixes
- [x] add rkey to linkers 🤦‍♀️
- [x] don't remove deleted links from the reverse records -- null them out. this will keep things stable for paging.
- [x] don't show deactivated accounts in link responses
- [ ] canonicalize handles to dids!
- [ ] links:
  - [~] pull `$type`/`type` from object children of arrays (distinguish replies, quotes, etc)
    - just $type to start
  - [ ] rewrite the entire "path" stuff
    - [ ] actually define the format (deal with in-band dots etc)
  - [x] ~_could_ throw cid neighbour into the target. probably should? but it's a lot of high volume uncompressible bytes~
    - and it could be looked up from the linker's doc
    - ^^ for now, look up from source doc to get cid. might revisit this later.
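
## Python example: building request URLs and totaling counts

The `GET /links/count` and `GET /links/all/count` endpoints documented under "API endpoints" need a URL-encoded `target`, which is easy to get wrong by hand. Below is a minimal sketch of one way to do it, not an official client. The helper names (`links_count_url`, `links_all_count_url`, `total_links`, `fetch_text`) are made up for illustration, and `HOST` assumes the public instance mentioned above. The api is unstable, so treat the parameter names as a snapshot of this readme.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the public instance named in this readme; swap in your own host.
HOST = "https://constellation.microcosm.blue"


def links_count_url(target, collection, path, host=HOST):
    """Build a GET /links/count URL (hypothetical helper).

    urlencode percent-encodes the target, so a raw AT-URI or DID can be passed.
    """
    query = urlencode({"target": target, "collection": collection, "path": path})
    return f"{host}/links/count?{query}"


def links_all_count_url(target, host=HOST):
    """Build a GET /links/all/count URL (hypothetical helper)."""
    return f"{host}/links/all/count?{urlencode({'target': target})}"


def total_links(all_counts):
    """Sum every N in a {NSID: {JSON path: N}} response body."""
    return sum(n for paths in all_counts.values() for n in paths.values())


def fetch_text(url, user_agent):
    """Fetch a URL and return the body as text.

    The user_agent is where a project name and contact can go,
    as the note about the public instance suggests.
    """
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

Usage might look like `int(fetch_text(links_count_url(post_uri, "app.bsky.feed.like", ".subject.uri"), "my-project (me@example.com)"))`, since `/links/count` responds with a plain-text number. For `/links/all/count`, parse the body with `json.loads` before passing it to `total_links`.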