Manage Atom feeds in a persistent git repository

Clone this repository

https://tangled.org/anil.recoil.org/thicket
git@git.recoil.org:anil.recoil.org/thicket

Thicket

A modern CLI tool for persisting Atom/RSS feeds in Git repositories, designed to enable distributed weblog comment structures.

Features

  • Feed Auto-Discovery: Automatically extracts user metadata from Atom/RSS feeds
  • Git Storage: Stores feed entries in a Git repository with full history
  • Duplicate Management: Manual curation of duplicate entries across feeds
  • Modern CLI: Built with Typer and Rich for beautiful terminal output
  • Comprehensive Parsing: Supports RSS 0.9x, RSS 1.0, RSS 2.0, and Atom feeds
  • Zulip Bot Integration: Automatically post new feed articles to Zulip chat
  • Cron-Friendly: Designed for scheduled execution

Installation

# Install from source
pip install -e .

# Or install with dev dependencies (quoted so shells such as zsh do not expand the brackets)
pip install -e ".[dev]"

Quick Start

  1. Initialize a new thicket repository:
thicket init ./my-feeds
  2. Add a user with their feed:
thicket add user "alice" --feed "https://alice.example.com/feed.xml"
  3. Sync feeds to download entries:
thicket sync --all
  4. List users, feeds, and entries:
thicket list users
thicket list feeds
thicket list entries

Commands

Initialize

thicket init <git-store-path> [--cache-dir <path>] [--config <config-file>]

Add Users and Feeds

# Add user with auto-discovery
thicket add user "username" --feed "https://example.com/feed.xml"

# Add user with manual metadata
thicket add user "username" \
  --feed "https://example.com/feed.xml" \
  --email "user@example.com" \
  --homepage "https://example.com" \
  --display-name "User Name"

# Add additional feed to existing user
thicket add feed "username" "https://example.com/other-feed.xml"

Sync Feeds

# Sync all users
thicket sync --all

# Sync specific user
thicket sync --user "username"

# Dry run (preview changes)
thicket sync --all --dry-run
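
For scheduled runs (for example from cron), the sync invocations above can be wrapped in a small Python helper. This is only a sketch: `build_sync_command` and `run_sync` are hypothetical names, not part of Thicket, and combining `--user` with `--dry-run` is an assumption, since only `--all --dry-run` is shown above.

```python
import subprocess
from typing import List, Optional


def build_sync_command(user: Optional[str] = None, dry_run: bool = False) -> List[str]:
    """Build the argument list for a `thicket sync` invocation.

    Uses only the flags documented above: --all, --user, and --dry-run.
    """
    cmd = ["thicket", "sync"]
    if user is None:
        cmd.append("--all")            # no user given: sync everyone
    else:
        cmd.extend(["--user", user])   # sync a single user
    if dry_run:
        cmd.append("--dry-run")        # preview changes only
    return cmd


def run_sync(user: Optional[str] = None, dry_run: bool = False) -> int:
    """Run the sync and return the process exit code (hypothetical wrapper)."""
    result = subprocess.run(build_sync_command(user, dry_run), check=False)
    return result.returncode
```

A scheduled job could call `run_sync()` and exit with the returned code so that the scheduler notices failures.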

List Information

# List all users
thicket list users

# List all feeds
thicket list feeds

# List feeds for specific user
thicket list feeds --user "username"

# List recent entries
thicket list entries --limit 20

# List entries for specific user
thicket list entries --user "username"

Manage Duplicates

# List duplicate mappings
thicket duplicates list

# Mark entries as duplicates
thicket duplicates add "https://example.com/dup" "https://example.com/canonical"

# Remove duplicate mapping
thicket duplicates remove "https://example.com/dup"
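
To illustrate how a consumer of these mappings might use them, here is a minimal sketch that resolves an entry URL to its canonical URL. It assumes the mappings can be loaded as a flat duplicate-to-canonical dictionary; the actual on-disk format of `duplicates.json` is not documented here, and `resolve_canonical` is a hypothetical helper, not a Thicket function.

```python
from typing import Dict


def resolve_canonical(url: str, duplicates: Dict[str, str]) -> str:
    """Follow duplicate -> canonical mappings until a canonical URL is reached.

    Assumes `duplicates` maps each duplicate URL to its canonical URL.
    Chains (a -> b -> c) are followed; a cycle stops the walk instead of
    looping forever.
    """
    seen = set()
    while url in duplicates and url not in seen:
        seen.add(url)
        url = duplicates[url]
    return url


mappings = {"https://example.com/dup": "https://example.com/canonical"}
canonical = resolve_canonical("https://example.com/dup", mappings)
```

URLs with no mapping are returned unchanged, so the helper can be applied to every entry without checking membership first.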

Zulip Bot Integration

# Test bot functionality
thicket bot test

# Show bot status
thicket bot status

# Run bot (requires configuration)
thicket bot run --config bot-config/zuliprc

Bot Setup:

  1. Create a Zulip bot in your organization
  2. Copy bot-config/zuliprc.template to bot-config/zuliprc
  3. Configure with your bot's credentials
  4. Run the bot and configure via Zulip chat:
    @thicket config path /path/to/thicket.yaml
    @thicket config stream general
    @thicket config topic "Feed Updates"
    

See docs/ZULIP_BOT.md for detailed setup instructions.

Configuration

Thicket uses a YAML configuration file (default: thicket.yaml):

git_store: ./feeds-repo
cache_dir: ~/.cache/thicket
users:
  - username: alice
    feeds:
      - https://alice.example.com/feed.xml
    email: alice@example.com
    homepage: https://alice.example.com
    display_name: Alice
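
The shape of this configuration can be sketched as typed Python objects. Thicket itself uses Pydantic for its data models; the sketch below swaps in standard-library dataclasses to stay dependency-free, and the class names (`UserConfig`, `ThicketConfig`), the helper `config_from_dict`, and the choice of which fields are optional are all assumptions. The input dictionary stands in for what a YAML parser would produce from the file above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class UserConfig:
    username: str
    feeds: List[str] = field(default_factory=list)
    email: Optional[str] = None          # assumed optional
    homepage: Optional[str] = None       # assumed optional
    display_name: Optional[str] = None   # assumed optional


@dataclass
class ThicketConfig:
    git_store: str
    cache_dir: str
    users: List[UserConfig] = field(default_factory=list)


def config_from_dict(raw: Dict[str, Any]) -> ThicketConfig:
    """Build a ThicketConfig from an already-parsed YAML mapping."""
    users = [UserConfig(**user) for user in raw.get("users", [])]
    return ThicketConfig(
        git_store=raw["git_store"],
        cache_dir=raw["cache_dir"],
        users=users,
    )


# Mirrors the example thicket.yaml shown above.
example = {
    "git_store": "./feeds-repo",
    "cache_dir": "~/.cache/thicket",
    "users": [
        {
            "username": "alice",
            "feeds": ["https://alice.example.com/feed.xml"],
            "email": "alice@example.com",
            "homepage": "https://alice.example.com",
            "display_name": "Alice",
        }
    ],
}
config = config_from_dict(example)
```

An unknown key in a user entry raises `TypeError` here, which gives a rough form of validation without any third-party library.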

Git Repository Structure

feeds-repo/
├── index.json              # User directory index
├── duplicates.json         # Duplicate entry mappings
├── alice/
│   ├── metadata.json       # User metadata
│   ├── entry_id_1.json     # Feed entries
│   └── entry_id_2.json
└── bob/
    └── ...
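
Because the store is plain JSON files in per-user directories, it can be read without Thicket. The following sketch walks the layout above using only the standard library. The function names are hypothetical, and it assumes every `*.json` file in a user directory other than `metadata.json` is a feed entry; the fields inside those files are not documented here, so they are returned as plain dictionaries.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def list_entries_by_user(repo: Path) -> Dict[str, List[str]]:
    """Map each user directory name to its sorted entry file names."""
    entries: Dict[str, List[str]] = {}
    for user_dir in sorted(repo.iterdir()):
        # Skip top-level files (index.json, duplicates.json) and hidden
        # directories such as .git.
        if not user_dir.is_dir() or user_dir.name.startswith("."):
            continue
        entries[user_dir.name] = sorted(
            f.name for f in user_dir.glob("*.json") if f.name != "metadata.json"
        )
    return entries


def load_user_metadata(repo: Path, username: str) -> Dict[str, Any]:
    """Load a user's metadata.json as a plain dictionary."""
    path = repo / username / "metadata.json"
    return json.loads(path.read_text(encoding="utf-8"))
```

For example, `list_entries_by_user(Path("./feeds-repo"))` would return a dictionary keyed by `alice`, `bob`, and so on.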

Development

Setup

# Install in development mode (quoted so shells such as zsh do not expand the brackets)
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check src/
black --check src/

# Run type checking
mypy src/

Architecture

  • CLI: Modern interface with Typer and Rich
  • Feed Processing: Universal parsing with feedparser
  • Git Storage: Structured storage with GitPython
  • Data Models: Pydantic for validation and serialization
  • Async HTTP: httpx for efficient feed fetching

Use Cases

  • Blog Aggregation: Collect and archive blog posts from multiple sources
  • Comment Networks: Enable distributed commenting systems
  • Feed Archival: Preserve feed history beyond typical feed depth limits
  • Content Curation: Manage and deduplicate content across feeds

License

MIT License - see LICENSE file for details.