# cozy-ucosm

## gateway

- tailscale (exit node enabled) -> allow ipv4 and ipv6 forwarding
- caddy

```bash
apt install golang
go install github.com/caddyserver/xcaddy/cmd/xcaddy@latest
go/bin/xcaddy build \
  --with github.com/caddyserver/cache-handler \
  --with github.com/darkweak/storages/badger/caddy \
  --with github.com/mholt/caddy-ratelimit
# then https://caddyserver.com/docs/running#manual-installation
mkdir /var/cache/caddy-badger
chown -R caddy:caddy /var/cache/caddy-badger/
```

- `/etc/caddy/Caddyfile`

```
{
    cache {
        badger
        api {
            prometheus
        }
    }
}

links.bsky.bad-example.com {
    reverse_proxy link-aggregator:6789

    # named matcher (currently unused below); it's the user-agent that starts with Mozilla/5.0
    @browser `{header.User-Agent}.startsWith("Mozilla/5.0")`

    rate_limit {
        zone global_burst {
            key    {remote_host}
            events 10
            window 1s
        }
        zone global_general {
            key     {remote_host}
            events  100
            window  60s
            log_key true
        }
        zone website_harsh_limit {
            key {header.Origin}
            match {
                expression {header.User-Agent}.startsWith("Mozilla/5.0")
            }
            events  1000
            window  30s
            log_key true
        }
    }

    respond /souin-api/metrics "denied" 403 # does not work

    cache {
        ttl 3s
        stale 1h
        default_cache_control public, s-maxage=3
        badger {
            path /var/cache/caddy-badger/links
        }
    }
}

gateway:80 {
    metrics
    cache
}
```

well... the gateway fell over IMMEDIATELY with like 2 req/sec of deletions with that ^^ config. for now i removed everything except the reverse proxy config + normal caddy metrics, and it's running fine on vanilla caddy. i did try reducing the rate-limiting config to a single fixed-key global limit, but it still ate all the ram and died. maybe badger with the cache config was still a problem. maybe it would have been ok on a machine with more than 1GB of mem.

alternative proxies:

- nginx. i should probably just use this. acme-client is a piece of cake to set up, and i know how to configure it.
- haproxy. also kind of familiar; it's old and stable. no idea how it handles low mem (our 1GB) vs nginx.
- sozu. popular rust thing, fast. doesn't have rate-limiting or cache features?
- rpxy. like caddy (auto-tls) but in rust and actually fast? has an "experimental" cache feature, but the cache feature looks good.
- rama. build-your-own proxy. not sure that it has both cache and limiter in its standard features?
- pingora. build-your-own cloudflare, so, like, probably stable. has tools for cache and limiting. low mem...?
  - cache stuff in pingora seems a little... hit and miss (byeeeee). only a test impl of `Storage` for the main cache feature?
  - but the rate limiter has a guide: https://github.com/cloudflare/pingora/blob/main/docs/user_guide/rate_limiter.md

what i want is a low-resource reverse proxy with built-in rate-limiting and caching. but maybe the cache (and/or rate-limiting) could be external to the reverse proxy:

- varnish is a dedicated cache. has https://github.com/varnish/varnish-modules/blob/master/src/vmod_vsthrottle.vcc for throttling
- apache traffic server has experimental rate-limiting plugins
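whichever proxy wins, it's worth burst-testing before trusting it with real traffic -- enough load to confirm the limiter actually answers 429 and that memory stays flat (which would have caught the caddy config above before it hit production). a rough sketch, assuming the hostname from above and a burst size larger than the configured limit:

```bash
# fire 200 requests, 50 at a time, and tally the status codes;
# with a working rate limit this should print a mix of 200s and 429s
seq 1 200 | xargs -P 50 -I{} \
  curl -s -o /dev/null -w '%{http_code}\n' https://links.bsky.bad-example.com/ \
  | sort | uniq -c

# meanwhile, on the gateway: watch for rss creeping up under load
watch -n1 'free -m; ps -o rss=,cmd= -C caddy'
```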
- victoriametrics

```bash
curl -LO https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.109.1/victoria-metrics-linux-amd64-v1.109.1.tar.gz
tar xzf victoria-metrics-linux-amd64-v1.109.1.tar.gz
# and then https://docs.victoriametrics.com/quick-start/#starting-vm-single-from-a-binary
sudo mkdir /etc/victoria-metrics && sudo chown -R victoriametrics:victoriametrics /etc/victoria-metrics
```

- `/etc/victoria-metrics/prometheus.yml`

```yaml
global:
  scrape_interval: '15s'

scrape_configs:
  - job_name: 'link_aggregator'
    static_configs:
      - targets: ['link-aggregator:8765']
  # targets are host:port only; the path belongs in metrics_path
  - job_name: 'gateway:caddy'
    metrics_path: /metrics
    static_configs:
      - targets: ['gateway:80']
  - job_name: 'gateway:cache'
    metrics_path: /souin-api/metrics
    static_configs:
      - targets: ['gateway:80']
```

- `ExecStart` in `/etc/systemd/system/victoriametrics.service`:

```
ExecStart=/usr/local/bin/victoria-metrics-prod -storageDataPath=/var/lib/victoria-metrics -retentionPeriod=90d -selfScrapeInterval=1m -promscrape.config=/etc/victoria-metrics/prometheus.yml
```

- grafana: followed `https://grafana.com/docs/grafana/latest/setup-grafana/installation/debian/#install-grafana-on-debian-or-ubuntu` something something something, then

```
sudo grafana-cli --pluginUrl https://github.com/VictoriaMetrics/victoriametrics-datasource/releases/download/v0.11.1/victoriametrics-datasource-v0.11.1.zip plugins install victoriametrics
```

- raspi node_exporter

```bash
curl -LO https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-armv7.tar.gz
tar xzf node_exporter-1.8.2.linux-armv7.tar.gz
sudo cp node_exporter-1.8.2.linux-armv7/node_exporter /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false node_exporter
sudo nano /etc/systemd/system/node_exporter.service
# [Unit]
# Description=Node Exporter
# Wants=network-online.target
# After=network-online.target
#
# [Service]
# User=node_exporter
# Group=node_exporter
# Type=simple
# ExecStart=/usr/local/bin/node_exporter
# Restart=always
# RestartSec=3
#
# [Install]
# WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable node_exporter.service
sudo systemctl start node_exporter.service
```

todo: get raspi vcgencmd outputs into metrics (sketch below)
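for that vcgencmd todo: node_exporter's textfile collector scrapes `.prom` files out of a directory, so a tiny script on a cron/timer can bridge the gap. a minimal sketch -- the directory, script path, and metric name are all my own inventions, untested:

```bash
# node_exporter needs an extra flag in its ExecStart:
#   --collector.textfile.directory=/var/lib/node_exporter/textfile
sudo mkdir -p /var/lib/node_exporter/textfile

sudo tee /usr/local/bin/vcgencmd-metrics >/dev/null <<'EOF'
#!/bin/bash
# vcgencmd measure_temp prints something like: temp=48.3'C
temp=$(vcgencmd measure_temp | grep -o '[0-9.]\+')
out=/var/lib/node_exporter/textfile/rpi.prom
echo "rpi_temperature_celsius $temp" > "$out.$$"
mv "$out.$$" "$out" # atomic rename so node_exporter never reads a half-written file
EOF
sudo chmod +x /usr/local/bin/vcgencmd-metrics

# then run it every minute, e.g. from cron:
# * * * * * /usr/local/bin/vcgencmd-metrics
```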
- nginx on gateway

```nginx
# in http

##
# cozy cache
##
proxy_cache_path /var/cache/nginx keys_zone=cozy_zone:10m;

##
# cozy limit
##
limit_req_zone $binary_remote_addr zone=cozy_ip_limit:10m rate=50r/s;
limit_req_zone $server_name zone=cozy_global_limit:10m rate=1000r/s;

# in sites-available/constellation.microcosm.blue

upstream cozy_link_aggregator {
    server link-aggregator:6789;
    keepalive 16;
}

server {
    listen 8080;
    listen [::]:8080;
    server_name constellation.microcosm.blue;

    proxy_cache cozy_zone;
    proxy_cache_background_update on;
    proxy_cache_key "$scheme$proxy_host$uri$is_args$args$http_accept";
    proxy_cache_lock on; # make simultaneous requests for the same uri wait for it to appear in cache instead of hitting origin
    proxy_cache_lock_age 1s;
    proxy_cache_lock_timeout 2s;
    proxy_cache_valid 10s;     # default -- should be explicitly set in the response headers
    proxy_cache_valid any 15s; # non-200s default
    proxy_read_timeout 5s;
    proxy_send_timeout 15s;
    proxy_socket_keepalive on;

    limit_req zone=cozy_ip_limit nodelay burst=100;
    limit_req zone=cozy_global_limit;
    limit_req_status 429;

    location / {
        proxy_pass http://cozy_link_aggregator;
        include proxy_params;
        proxy_http_version 1.1;
        proxy_set_header Connection ""; # for keepalive
    }
}
```

also `systemctl edit nginx` and paste

```
[Service]
Restart=always
```

—https://serverfault.com/a/1003373

now making browsers redirect to the microcosm.blue url:

```nginx
[...]
    server_name links.bsky.bad-example.com;

    add_header Access-Control-Allow-Origin * always; # bit of a hack to have it here but nginx doesn't like it in the `if`

    if ($http_user_agent ~ ^Mozilla/) {
        # for now send *browsers* to the new location, hopefully without impacting api requests
        # (yeah we're doing a UA test here and content-negotiation in the app. whatever.)
        return 301 https://constellation.microcosm.blue$request_uri;
    }
[...]
```

- nginx metrics
  - download nginx-prometheus-exporter: https://github.com/nginx/nginx-prometheus-exporter/releases/download/v1.4.1/nginx-prometheus-exporter_1.4.1_linux_amd64.tar.gz
  - err, actually going to make mistakes and try with snap: `snap install nginx-prometheus-exporter`
  - so it got a binary for me but no systemd unit set up. boooo. `snap remove nginx-prometheus-exporter`
  - manual install after all:

```bash
curl -LO https://github.com/nginx/nginx-prometheus-exporter/releases/download/v1.4.1/nginx-prometheus-exporter_1.4.1_linux_amd64.tar.gz
tar xzf nginx-prometheus-exporter_1.4.1_linux_amd64.tar.gz
mv nginx-prometheus-exporter /usr/local/bin
useradd --no-create-home --shell /bin/false nginx-prometheus-exporter
nano /etc/systemd/system/nginx-prometheus-exporter.service
# [Unit]
# Description=NGINX Exporter
# Wants=network-online.target
# After=network-online.target
#
# [Service]
# User=nginx-prometheus-exporter
# Group=nginx-prometheus-exporter
# Type=simple
# ExecStart=/usr/local/bin/nginx-prometheus-exporter --nginx.scrape-uri=http://gateway:8080/stub_status --web.listen-address=gateway:9113
# Restart=always
# RestartSec=3
#
# [Install]
# WantedBy=multi-user.target
systemctl daemon-reload
systemctl start nginx-prometheus-exporter.service
systemctl enable nginx-prometheus-exporter.service
```

- nginx `/etc/nginx/sites-available/gateway-nginx-status`

```nginx
server {
    listen 8080;
    listen [::]:8080;
    server_name gateway;

    location /stub_status {
        stub_status;
    }

    location / {
        return 404;
    }
}
```

```bash
ln -s /etc/nginx/sites-available/gateway-nginx-status /etc/nginx/sites-enabled/
```
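quick sanity check that both halves are alive (from anywhere on the tailnet):

```bash
# raw nginx counters, plain text (active connections / accepts / handled / requests)
curl -s http://gateway:8080/stub_status

# the exporter's prometheus translation of the same counters
curl -s http://gateway:9113/metrics | grep '^nginx_'
```

then victoriametrics needs a scrape job pointed at `gateway:9113` to replace the old caddy ones.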
## bootes (pi5)

- mount sd card, touch `ssh` file, `echo "pi:$(echo raspberry | openssl passwd -6 -stdin)" > userconf.txt`
- raspi-config: enable pcie 3, set hostname, enable ssh
- put ssh key into `.ssh/authorized_keys`
- put `PasswordAuthentication no` in `/etc/ssh/sshd_config`
- `sudo apt update && sudo apt upgrade`
- `sudo apt install xfsprogs`
- `sudo mkfs.xfs -L c11n-kv /dev/nvme0n1`
- `sudo mount /dev/nvme0n1 /mnt`
- set up tailscale
- `sudo tailscale up`
- `git clone https://github.com/atcosm/links.git`
- tailscale: disable bootes key expiry
- rustup: `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`
- `cd links/constellation`
- `sudo apt install libssl-dev` needed
- `sudo apt install clang` needed for bindgen
- (in tmux) `cargo build --release`
- `mkdir ~/backup`
- `sudo mount.cifs "//truenas.local/folks data" /home/pi/backup -o user=phil,uid=pi`
- `sudo chown pi:pi /mnt/`
- `RUST_BACKTRACE=full cargo run --bin rocks-restore-from-backup --release -- --from-backup-dir "/home/pi/backup/constellation-index" --to-data-dir /mnt/constellation-index` etc
- follow the `raspi node_exporter` steps above
- configure victoriametrics to scrape the new pi
- configure ulimit before starting! `ulimit -n 16384`
- `RUST_BACKTRACE=full cargo run --release -- --backend rocks --data /mnt/constellation-index/ --jetstream us-east-2 --backup /home/pi/backup/constellation-index --backup-interval 6 --max-old-backups 20`
- add server to nginx gateway upstream: `server 100.123.79.12:6789; # bootes`
- stop backups from running on the older instance! `RUST_BACKTRACE=full cargo run --release -- --backend rocks --data /mnt/links-2.rocks/ --jetstream us-east-1`
- stop upstreaming requests to the older instance in nginx
- systemd unit for running: `sudo nano /etc/systemd/system/constellation.service`

```ini
[Unit]
Description=Constellation backlinks index
After=network.target

[Service]
User=pi
WorkingDirectory=/home/pi/links/constellation
ExecStart=/home/pi/links/target/release/main --backend rocks --data /mnt/constellation-index/ --jetstream us-east-2 --backup /home/pi/backup/constellation-index --backup-interval 6 --max-old-backups 20
LimitNOFILE=16384
Restart=always

[Install]
WantedBy=multi-user.target
```

- todo: overlayfs? would need to figure out builds/updates still; also i guess logs are currently written to sd? (oof)
- todo: cross-compile for raspi?

---

some todos

- [x] tailscale: exit node
- [!] link_aggregator: use exit node -> worked, but reverted for now: tailscale on the raspi was consuming ~50% cpu for the jetstream traffic. this might be near its max since it would have been catching up at the time (max jetstream throughput), but it feels a bit too much. we have to trust the jetstream server, and link_aggregator doesn't (yet) make any other external connections, so for now the raspi connects directly from my home again.
- [x] caddy: reverse proxy
  - [x] build with cache and rate-limit plugins
  - [x] configure systemd to keep it alive
  - [x] configure caddy cache
  - [x] configure caddy rate-limit
- [ ] configure ~caddy~ nginx to use a health check (once it's added)
- [ ] ~configure caddy to only expose cache metrics to tailnet :/~
- [x] make some grafana dashboards
- [ ] raspi: mount /dev/sda on boot (see the fstab sketch after this list)
- [ ] raspi: run link_aggregator via systemd so it starts on startup (and restarts?)
- [x] use nginx instead of caddy
  - [x] nginx: enable cache
  - [x] nginx: rate-limit
  - [ ] nginx: get metrics
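for the mount-on-boot todo, a sketch: since the fs got a label at mkfs time above, mounting by label sidesteps whether the disk enumerates as /dev/sda or /dev/nvme0n1 (untested; `nofail` keeps a missing disk from hanging boot):

```bash
# append an fstab entry for the xfs volume labelled c11n-kv (from the mkfs step)
echo 'LABEL=c11n-kv /mnt xfs defaults,nofail 0 0' | sudo tee -a /etc/fstab
sudo mount -a # confirm it mounts cleanly before betting a reboot on it
```

the constellation unit above could also take `RequiresMountsFor=/mnt` in `[Unit]` so it waits for the disk instead of starting against an empty mountpoint.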
---

nginx cors for constellation + small burst bump

```nginx
upstream cozy_constellation {
    server 100.123.79.12:6789; # bootes; ip so that we don't race on reboot with tailscale coming up, which nginx doesn't like
    keepalive 16;
}

server {
    server_name constellation.microcosm.blue;

    proxy_cache cozy_zone;
    proxy_cache_background_update on;
    proxy_cache_key "$scheme$proxy_host$uri$is_args$args$http_accept";
    proxy_cache_lock on; # make simultaneous requests for the same uri wait for it to appear in cache instead of hitting origin
    proxy_cache_lock_age 1s;
    proxy_cache_lock_timeout 2s;
    proxy_cache_valid 10s;    # default -- should be explicitly set in the response headers
    proxy_cache_valid any 2s; # non-200s default
    proxy_read_timeout 5s;
    proxy_send_timeout 15s;
    proxy_socket_keepalive on;

    # take over cors responsibility from upstream. `always` applies it to error responses.
    proxy_hide_header 'Access-Control-Allow-Origin';
    proxy_hide_header 'Access-Control-Allow-Methods';
    proxy_hide_header 'Access-Control-Allow-Headers';
    add_header 'Access-Control-Allow-Origin' '*' always;
    add_header 'Access-Control-Allow-Methods' 'GET' always;
    add_header 'Access-Control-Allow-Headers' '*' always;

    limit_req zone=cozy_ip_limit nodelay burst=150;
    limit_req zone=cozy_global_limit burst=1800;
    limit_req_status 429;

    location / {
        proxy_pass http://cozy_constellation;
        include proxy_params;
        proxy_http_version 1.1;
        proxy_set_header Connection ""; # for keepalive
    }

    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/constellation.microcosm.blue/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/constellation.microcosm.blue/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}

server {
    if ($host = constellation.microcosm.blue) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    server_name constellation.microcosm.blue;
    listen 80;
    return 404; # managed by Certbot
}
```

re-reading about `nodelay`, i should probably remove it -- nginx would then queue requests to upstream but still service them at the configured rate. it's fine for my internet since the global limit isn't nodelay, but it's probably less "fair" to clients if there's contention around the global limit (earlier requests would get all of theirs serviced before later ones can get into the queue). leaving it for now though.

### nginx logs to prom

```bash
curl -LO https://github.com/martin-helmich/prometheus-nginxlog-exporter/releases/download/v1.11.0/prometheus-nginxlog-exporter_1.11.0_linux_amd64.deb
apt install ./prometheus-nginxlog-exporter_1.11.0_linux_amd64.deb
systemctl enable prometheus-nginxlog-exporter.service
```

have it run as www-data (maybe not the best idea, but...): in `/usr/lib/systemd/system/prometheus-nginxlog-exporter.service`, set `User` under `[Service]` and remove the capability bounding:

```systemd
User=www-data
#CapabilityBoundingSet=
```

in `nginx.conf`, in `http`:

```nginx
log_format constellation_format "$remote_addr - $remote_user [$time_local] \"$request\" $status $body_bytes_sent \"$http_referer\" \"$http_user_agent\" \"$http_x_forwarded_for\"";
```

in `sites-available/constellation.microcosm.blue`, in `server`:

```nginx
# log format must match prometheus-nginx-log-exporter
access_log /var/log/nginx/constellation-access.log constellation_format;
```

config at `/etc/prometheus-nginxlog-exporter.hcl`, then

```bash
systemctl start prometheus-nginxlog-exporter.service
```
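the hcl itself isn't captured above; a minimal sketch of what it might look like, going from the exporter's README (unverified -- the `format` string has to match `constellation_format` character for character):

```hcl
listen {
  port = 4040
}

namespace "constellation" {
  source {
    files = ["/var/log/nginx/constellation-access.log"]
  }

  format = "$remote_addr - $remote_user [$time_local] \"$request\" $status $body_bytes_sent \"$http_referer\" \"$http_user_agent\" \"$http_x_forwarded_for\""
}
```

and then one more victoriametrics scrape job pointed at port 4040 on the gateway.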