# constellation 🌌

A global atproto backlink index ✨

- Self hostable: handles the full write throughput of the global atproto firehose on a raspberry pi 4b + single SSD
- Storage efficient: less than 2GB/day disk consumption indexing all references in all lexicons and all non-atproto URLs
- Handles record deletion, account de/re-activation, and account deletion, ensuring accurate link counts and respecting users' data choices
- Simple JSON API

All social interactions in atproto tend to be represented by links (or references) between PDS records. This index can answer questions like "how many likes does a bsky post have", "who follows an account", "what are all the comments on a [frontpage](https://frontpage.fyi/) post", and more.

- **status**: works! api is unstable and likely to change, and no known instances have a full network backfill yet.
- source: [./constellation/](./constellation/)
- public instance: [constellation.microcosm.blue](https://constellation.microcosm.blue/)

_note: the public instance currently runs on a little raspberry pi in my house, feel free to use it! it comes with only best-effort uptime, no commitment to not breaking the api for now, and possible rate-limiting. if you want to be nice you can put your project name and bsky username (or email) in your user-agent header for api requests._

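As a rough sketch of the user-agent courtesy mentioned in the note, a small shell wrapper can attach an identifying header to every request. The function name `constellation_get`, the variable `CONSTELLATION_UA`, and the project/contact values are placeholders invented for this example, not anything the service requires.

```bash
# Placeholder identity: swap in your own project name and bsky handle (or email).
CONSTELLATION_UA='my-project (@alice.example.com)'

# Hypothetical wrapper: runs curl with an identifying User-Agent header.
# Any extra arguments (URL, flags) are passed through unchanged.
constellation_get() {
  curl --silent --show-error --fail --user-agent "$CONSTELLATION_UA" "$@"
}
```

For example, `constellation_get 'https://constellation.microcosm.blue/links/all/count?target=did:plc:vc7f4oafdgxsihk4cry2xpze'` would send the request with that header.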

## API endpoints

this section is currently a bit out of date -- refer to the [api docs hosted by the app itself](https://constellation.microcosm.blue/) for now. they also let you try out live requests.

terms as used here:

- "URI": a URI, AT-URI, or DID.
- "JSON path": a dot-separated (and dot-prefixed, for now) path to a field in an atproto record. Arrays are noted by `[]` and cannot contain a specific index.

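To make the "JSON path" notation concrete, here is a sketch using a trimmed, made-up like record. For simple cases such as `.subject.uri`, the notation happens to also read as a valid `jq` filter, so `jq` can show which value a path points at. Both `jq` and the `values_at_path` helper are assumptions for illustration: neither is part of Constellation, and the record contents are placeholders.

```bash
# Trimmed, made-up app.bsky.feed.like record (DID, rkey and cid are placeholders).
like_record='{
  "$type": "app.bsky.feed.like",
  "subject": {
    "uri": "at://did:plc:example/app.bsky.feed.post/3kexamplerkey",
    "cid": "bafyexamplecid"
  }
}'

# Hypothetical helper: print every value found at a path, one per line.
# Reads a record on stdin; $1 is a path such as .subject.uri
values_at_path() {
  jq -r "$1"
}
```

Running `printf '%s' "$like_record" | values_at_path '.subject.uri'` prints the AT-URI of the liked post, which is the link this index would count as a backlink to that post.
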
### `GET /links/count`

The number of backlinks to a URI from a specified collection + json path.

#### Required URL parameters

- `target` (required): the URI. must be URL-encoded.
  - example: `at%3A%2F%2Fdid%3Aplc%3A57vlzz2egy6eqr4nksacmbht%2Fapp.bsky.feed.post%2F3lg2pgq3gq22b`
- `collection` (required): the source NSID of referring documents to consider.
  - example: `app.bsky.feed.post`
- `path` (required): the JSON path in referring documents to consider.
  - example: `.subject.uri`

#### Response

A number (u64) in plain text format

#### cURL example: Get a count of all bluesky likes for a post

```bash
curl '<HOST>/links/count?target=at%3A%2F%2Fdid%3Aplc%3A57vlzz2egy6eqr4nksacmbht%2Fapp.bsky.feed.post%2F3lg2pgq3gq22b&collection=app.bsky.feed.like&path=.subject.uri'

40
```

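Hand-encoding the `target` is error-prone, so one option is to let curl do it. The sketch below uses curl's `--get` and `--data-urlencode` options to build the same kind of query string as the cURL example; the function name `links_count` and the `CONSTELLATION_HOST` variable are made up for illustration, and the default host is simply the public instance.

```bash
# Hypothetical helper: print the backlink count for one target.
#   $1 = target URI (unencoded), $2 = source collection NSID, $3 = JSON path
links_count() {
  # --get turns the --data-urlencode pairs into a URL-encoded query string.
  curl --silent --show-error --fail --get \
    --data-urlencode "target=$1" \
    --data-urlencode "collection=$2" \
    --data-urlencode "path=$3" \
    "${CONSTELLATION_HOST:-https://constellation.microcosm.blue}/links/count"
}
```

Calling `links_count 'at://did:plc:57vlzz2egy6eqr4nksacmbht/app.bsky.feed.post/3lg2pgq3gq22b' app.bsky.feed.like .subject.uri` should print a plain number; the live count will likely differ from the `40` shown in the example.
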
### `GET /links/all/count`

The number of backlinks to a URI from any source collection or json path.

#### Required URL parameters

- `target` (required): the URI. must be URL-encoded.
  - example: `did:plc:vc7f4oafdgxsihk4cry2xpze`

#### Response

A JSON object `{[NSID]: {[JSON path]: [N]}}`

#### cURL example: Get reference counts to a DID from any collection at any path

```bash
curl '<HOST>/links/all/count?target=did:plc:vc7f4oafdgxsihk4cry2xpze'

{
  "app.bsky.graph.block": { ".subject": 13 },
  "app.bsky.graph.follow": { ".subject": 159 },
  "app.bsky.feed.post": { ".facets[].features[].did": 16 },
  "app.bsky.graph.listitem": { ".subject": 6 },
  "app.bsky.graph.starterpack": {
    ".feeds[].creator.did": 1,
    ".feeds[].creator.labels[].src": 1
  }
}
```

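Since the response is a nested object of counts, a little post-processing can answer questions like "how many references in total?". This sketch assumes `jq` is installed (it is not part of Constellation) and that the response has the `{[NSID]: {[JSON path]: [N]}}` shape described in this section; `total_links` and `links_per_collection` are hypothetical helper names.

```bash
# Sum every count in a /links/all/count response read from stdin.
# Prints 0 for an empty object.
total_links() {
  jq '[.[] | .[]] | add // 0'
}

# Collapse each collection's per-path counts into one number per NSID.
links_per_collection() {
  jq -c 'map_values([.[]] | add)'
}
```

Piping the example response through `total_links` would give 196 (13 + 159 + 16 + 6 + 1 + 1).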

## Contributions

### Licensing

Constellation's source code is currently available exclusively under the AGPL license (see [LICENSE](./LICENSE)).

In the future, its code MAY become available under the MIT and/or Apache 2.0 licenses, at the sole discretion of the microcosm organization. Contributing implies acceptance of this possible future licensing change. The change has not happened yet and is not guaranteed.


some todos

- [x] find links and write them to rocksdb
- [x] handle account active status
- [x] handle account deletion
- [ ] handle account privacy setting? (is this a bsky-nsid-specific config and should that matter?)
  - instead of looking this up, should be able to listen for it to be published on the firehose.
  - this should _work_, but without backfill it won't be accurate. targeted backfill might be an option.
- [x] move ownership of canonical seq to an owned non-atomic
- [x] custom path for db storage
- [x] api server to look up backlink count
- [~] other useful endpoints for the api server
  - [x] show all nsid/path links to target
  - [x] get backlinking dids
  - [x] paging for all backlinking dids
  - [x] get count + most recent dids
  - [ ] get count with any dids from provided set
- [~] write this readme
- [?] fix it sometimes getting stuck
  - seems to unstick in my possibly-different repro (letting laptop fall asleep) after a bit.
  - [ ] add a detection for no new links coming in after some period
  - [x] add tcp connect, read, and write timeouts 🤞
- [x] handle jetstream restart: don't miss events (currently sketch: rewinds cursor by 1us so we will always double-count at least one event)
  - [x] especially: figure out what the risk is to rotating to another jetstream server in terms of gap/overlap from a different jetstream instance's cursor (follow up separately)
  - [x] jetstream: don't rotate servers, explicitly pass via cli
- [x] metrics!
  - [x] event ts lag
- [x] machine resource metrics
  - [x] disk consumption
  - [x] cpu usage
  - [x] mem usage
  - [x] network?
- [x] make all rocks apis return Result instead of unwrapping
- [~] handle all the unwraps
- [ ] deadletter queue of some kind for failed db writes
  - [ ] also for valid json that was rejected?
- [x] get it running on raspi
- [x] get an estimate of disk usage per day after a few days of running
  - very close to 1GB with data model before adding rkeys to linkers + fixing paths
- [x] make the did_init check only happen on test config (or remove it) (removed)
- [ ] actual error types (thiserror?) for lib-ish code
- [~] clean up the main readme
- [x] web server metrics
  - [x] origin and ua labels
- [ ] tokio metrics?
- [x] handle shutdown cleanly -- be nice to rocksdb
- [x] add user-agent to jetstream request
- [ ] wow the shutdown stuff i wrote is really bad and doesn't work a lot
- [x] serve html for browser requests
- [ ] add a health check endpoint
- [x] add seq numbers to metrics
- [ ] persist the jetstream server url, error if started with a different one (maybe with --switch-streams or something)
- [ ] put delete-account tasks into a separate (persisted?) task queue for the writer so it can work on them incrementally.
- [ ] jetstream: connect retry: only reset counter after some *time* has passed.
- [x] either count or estimate the total number of links added (distinct from link targets)
- [x] jetstream: don't crash on connection refused (retry * backoff)
- [x] allow cors requests (ie. atproto-browser. (but it's really meant for backends))
- [x] api: get distinct linking dids (https://bsky.app/profile/bnewbold.net/post/3lhhzejv7zc2h)
  - [x] endpoint for count
  - [x] endpoint for listing them
  - [x] add to exploratory /all endpoint
- [ ] nginx: support http2
- [x] nginx metrics
- [ ] add TimeoutLayer for axum
- [~] rocksdb metrics
  - [x] write ops (count? per actionable?)
  - [x] write time hist
  - [ ] read ops (api)
  - [ ] expose internal stats?
- [ ] figure out what's the right thing to do if merge op fails. happened on startup after an unclean reboot.
- [x] backups!
  - [x] manual backup on startup
  - [x] background task to create backups on an interval
- [ ] add a low-ulimit check on startup?

cache
- [ ] set api response headers
  - [ ] put "stale-while-revalidate" in Cache-Control w/ num seconds
  - [ ] put "stale-if-error" in Cache-Control w/ num seconds
  - [ ] set Expires or Cache-Control expires
  - [ ] add Accept to vary response
- [ ] cache vary: might need to take bsky account privacy setting into account (unless this ends up being in query)

data fixes
- [x] add rkey to linkers 🤦‍♀️
- [x] don't remove deleted links from the reverse records -- null them out. this will keep things stable for paging.
- [x] don't show deactivated accounts in link responses
- [ ] canonicalize handles to dids!
- [ ] links:
  - [~] pull `$type`/`type` from object children of arrays (distinguish replies, quotes, etc)
  - just $type to start
  - [ ] rewrite the entire "path" stuff
  - [ ] actually define the format (deal with in-band dots etc)
  - [x] ~_could_ throw cid neighbour into the target. probably should? but it's a lot of high volume uncompressible bytes~
  - and it could be looked up from the linker's doc
  - ^^ for now, look up from source doc to get cid. might revisit this later.