CLI tool to sync your Markdown to Leaflet
leafletpub atproto cli markdown

some facets have incorrect offsets #5

closed
opened by mackuba.eu

Some facet offsets are off, likely because of non-ASCII characters, which encode as more than one byte in UTF-8. Facet offsets are counted in bytes, so they need to be taken from a byte array representation of the (UTF-8 encoded) string.
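A minimal TypeScript sketch of taking offsets from the byte array. It assumes the facet index is a `{ byteStart, byteEnd }` pair like `app.bsky.richtext.facet#byteSlice` in the AT Protocol (Leaflet's own facet lexicon may differ). `utf8Offset` and `toByteSlice` are hypothetical helper names, not part of the CLI:

```typescript
// Hypothetical facet index shape, modelled on app.bsky.richtext.facet#byteSlice:
// offsets into the UTF-8 encoding of the text, start inclusive, end exclusive.
interface ByteSlice {
  byteStart: number;
  byteEnd: number;
}

const encoder = new TextEncoder();

// UTF-8 byte length of text[0 .. utf16Index). Assumes utf16Index does not
// fall between the two halves of a surrogate pair.
function utf8Offset(text: string, utf16Index: number): number {
  return encoder.encode(text.slice(0, utf16Index)).length;
}

// Convert a [start, end) range of JS string indexes (UTF-16 code units, as
// returned by indexOf or .length) into a UTF-8 byte range for a facet.
function toByteSlice(text: string, start: number, end: number): ByteSlice {
  return { byteStart: utf8Offset(text, start), byteEnd: utf8Offset(text, end) };
}

// "\uD83D\uDE1B" is the 😛 emoji (U+1F61B) written as a surrogate pair.
const text = "it happens \uD83D\uDE1B, scan the single (time) index";
const start = text.indexOf("(time)");
console.log(start, toByteSlice(text, start, start + "(time)".length));
// 31 { byteStart: 33, byteEnd: 39 }
```

Encoding the prefix on every call costs O(n) per offset, which is fine for paragraph-sized blocks; for long documents, accumulating byte counts in a single pass avoids re-encoding.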

Example block:

There are two relevant indexes in that table: one on `(repo, time)` (repo = user's DID), and one on just `(time)`. Roughly speaking, for those users who follow e.g. 80 or 200 people, it makes more sense to scan the `(repo, time)` index those 80-200 times and collect the 100 most recent posts from all of those found, and for those who follow e.g. 9000 (yes, that happens 😛), it's faster to scan the single `(time)` index until you find 100 relevant posts. But I've been struggling to make Postgres always use the right index.

The first 3 code sections look OK, but the last one renders as "single (time)", probably because it comes after the "😛" emoji, which gives every character after it a byte offset different from its character offset.
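The mismatch is easiest to see one character at a time. This small TypeScript sketch (the `countUnits` helper is hypothetical) compares what JS `.length` reports against the UTF-8 byte count:

```typescript
// Count one character in both encodings.
function countUnits(ch: string): { utf16: number; utf8: number } {
  return {
    utf16: ch.length, // what JS .length reports: UTF-16 code units
    utf8: new TextEncoder().encode(ch).length, // what facet offsets count: UTF-8 bytes
  };
}

// a (U+0061), é (U+00E9), € (U+20AC), 😛 (U+1F61B as a surrogate pair)
for (const ch of ["a", "\u00e9", "\u20ac", "\uD83D\uDE1B"]) {
  const { utf16, utf8 } = countUnits(ch);
  const cp = ch.codePointAt(0)!.toString(16).toUpperCase();
  console.log(`U+${cp}: ${utf16} UTF-16 unit(s), ${utf8} UTF-8 byte(s)`);
}
// U+61: 1 UTF-16 unit(s), 1 UTF-8 byte(s)
// U+E9: 1 UTF-16 unit(s), 2 UTF-8 byte(s)
// U+20AC: 1 UTF-16 unit(s), 3 UTF-8 byte(s)
// U+1F61B: 2 UTF-16 unit(s), 4 UTF-8 byte(s)
```

Assuming the rest of the example block is plain ASCII, every offset after the emoji comes out 4 - 2 = 2 too small when taken from `.length`. That is why only the code span after it is affected.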

Oh yeah, true. I'm just using .length on the JS string, and from what I'm seeing that counts UTF-16 code units. I only do it in one place, and from what I saw the solution isn't that complicated, so it should be a small, quick fix. I'll fix it and release it when I can.
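One possible shape for such a fix, as a hypothetical sketch rather than the CLI's actual code: while turning single-backtick code spans into facets, keep a running UTF-8 byte count of the output text instead of using `.length`. `inlineCodeToFacets`, `CodeFacet` and the `kind` field are made-up names, and the sketch ignores escaped backticks and multi-backtick code spans:

```typescript
// Hypothetical shapes; the real Leaflet facet lexicon may differ.
interface CodeFacet {
  index: { byteStart: number; byteEnd: number }; // UTF-8 byte offsets
  kind: "code";
}

interface RichText {
  plaintext: string;
  facets: CodeFacet[];
}

const encoder = new TextEncoder();

// Strip single-backtick code spans from one line of Markdown and record each
// span as a facet, counting offsets in UTF-8 bytes of the output text.
function inlineCodeToFacets(line: string): RichText {
  const parts = line.split("`");
  const facets: CodeFacet[] = [];
  let plaintext = "";
  let bytePos = 0; // UTF-8 byte length of plaintext so far

  for (let i = 0; i < parts.length; i++) {
    const insideBackticks = i % 2 === 1;
    const closed = i < parts.length - 1; // a closing backtick follows this part
    // An odd part with no closing backtick came from a stray "`": keep it literally.
    const chunk = insideBackticks && !closed ? "`" + parts[i] : parts[i];
    const chunkBytes = encoder.encode(chunk).length;

    if (insideBackticks && closed) {
      facets.push({
        index: { byteStart: bytePos, byteEnd: bytePos + chunkBytes },
        kind: "code",
      });
    }
    plaintext += chunk;
    bytePos += chunkBytes;
  }
  return { plaintext, facets };
}

console.log(JSON.stringify(inlineCodeToFacets("that happens \uD83D\uDE1B, scan `(time)`")));
```

Because the byte position is built from the encoded chunks themselves, there is no UTF-16 to UTF-8 index conversion left to get wrong.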

Labels

None yet.

assignee

None yet.

Participants 2
AT URI
at://did:plc:oio4hkxaop4ao4wz2pp3f4cr/sh.tangled.repo.issue/3m5onrv26kc22