Manage Atom feeds in a persistent git repository

+260
code_duplication_analysis.md
···
# Code Duplication Analysis for Thicket

## 1. Duplicate JSON Handling Code

### Pattern: JSON file reading/writing

**Locations:**

- `src/thicket/cli/commands/generate.py:230` - Reading JSON with `json.load(f)`
- `src/thicket/cli/commands/generate.py:249` - Reading links.json
- `src/thicket/cli/commands/index.py:2305` - Reading JSON
- `src/thicket/cli/commands/index.py:2320` - Writing JSON with `json.dump()`
- `src/thicket/cli/commands/threads.py:2456` - Reading JSON
- `src/thicket/cli/commands/info.py:2683` - Reading JSON
- `src/thicket/core/git_store.py:5546` - Writing JSON with custom serializer
- `src/thicket/core/git_store.py:5556` - Reading JSON
- `src/thicket/core/git_store.py:5566` - Writing JSON
- `src/thicket/core/git_store.py:5656` - Writing JSON with model dump

**Recommendation:** Create a shared `json_utils.py` module:

```python
import json
from pathlib import Path

from pydantic import BaseModel


def read_json_file(path: Path) -> dict:
    """Read and parse a JSON file."""
    with open(path) as f:
        return json.load(f)


def write_json_file(path: Path, data: dict, indent: int = 2) -> None:
    """Write a JSON file with consistent formatting."""
    with open(path, "w") as f:
        json.dump(data, f, indent=indent, default=str)


def write_model_json(path: Path, model: BaseModel, indent: int = 2) -> None:
    """Write a Pydantic model as JSON."""
    with open(path, "w") as f:
        json.dump(
            model.model_dump(mode="json", exclude_none=True),
            f,
            indent=indent,
            default=str,
        )
```
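
At call sites the duplicated pattern then collapses to a single line. A sketch, applying the helper above to the links loading in `generate.py`:

```python
# Before (repeated at each call site):
#     with open(links_file) as f:
#         self.links_data = json.load(f)

# After, with the shared helper:
self.links_data = read_json_file(links_file)
```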

## 2. Repeated Datetime Handling

### Pattern: datetime formatting and fallback handling

**Locations:**

- `src/thicket/cli/commands/generate.py:241` - `key=lambda x: x[1].updated or x[1].published or datetime.min`
- `src/thicket/cli/commands/generate.py:353` - Same pattern in thread sorting
- `src/thicket/cli/commands/generate.py:359` - Same pattern for max date
- `src/thicket/cli/commands/generate.py:625` - Same pattern
- `src/thicket/cli/commands/generate.py:655` - `entry.updated or entry.published or datetime.min`
- `src/thicket/cli/commands/generate.py:689` - Same pattern
- `src/thicket/cli/commands/generate.py:702` - Same pattern
- Multiple `.strftime('%Y-%m-%d')` calls throughout

**Recommendation:** Create a shared `datetime_utils.py` module:

```python
from datetime import datetime

from thicket.models.feed import AtomEntry


def get_entry_date(entry: AtomEntry) -> datetime:
    """Get the most relevant date for an entry, with a fallback."""
    return entry.updated or entry.published or datetime.min


def format_date_short(dt: datetime) -> str:
    """Format a datetime as YYYY-MM-DD."""
    return dt.strftime('%Y-%m-%d')


def format_date_full(dt: datetime) -> str:
    """Format a datetime as YYYY-MM-DD HH:MM."""
    return dt.strftime('%Y-%m-%d %H:%M')


def format_date_iso(dt: datetime) -> str:
    """Format a datetime as an ISO 8601 string."""
    return dt.isoformat()
```
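
The entry sort in `load_data` (and its six siblings) then becomes a one-liner. A sketch using `get_entry_date` as defined above:

```python
# Sort entries by date (newest first), using the shared fallback logic
self.entries.sort(key=lambda x: get_entry_date(x[1]), reverse=True)
```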

## 3. Path Handling Patterns

### Pattern: Directory creation and existence checks

**Locations:**

- `src/thicket/cli/commands/generate.py:225` - `if user_dir.exists()`
- `src/thicket/cli/commands/generate.py:247` - `if links_file.exists()`
- `src/thicket/cli/commands/generate.py:582` - `self.output_dir.mkdir(parents=True, exist_ok=True)`
- `src/thicket/cli/commands/generate.py:585-586` - Multiple mkdir calls
- `src/thicket/cli/commands/threads.py:2449` - `if not index_path.exists()`
- `src/thicket/cli/commands/info.py:2681` - `if links_path.exists()`
- `src/thicket/core/git_store.py:5515` - `if not self.repo_path.exists()`
- `src/thicket/core/git_store.py:5586` - `user_dir.mkdir(exist_ok=True)`
- Many more similar patterns

**Recommendation:** Create a shared `path_utils.py` module:

```python
import json
from pathlib import Path
from typing import Any, Union


def ensure_directory(path: Path) -> Path:
    """Ensure a directory exists, creating it if necessary."""
    path.mkdir(parents=True, exist_ok=True)
    return path


def read_json_if_exists(path: Path, default: Any = None) -> Any:
    """Read a JSON file if it exists, otherwise return the default."""
    if path.exists():
        with open(path) as f:
            return json.load(f)
    return default


def safe_path_join(*parts: Union[str, Path]) -> Path:
    """Join path components into a single Path."""
    return Path(*parts)
```
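
A sketch of how `generate_site` and the links loading would use these helpers (paths taken from the existing code):

```python
ensure_directory(self.output_dir / "css")
ensure_directory(self.output_dir / "js")

# Returns None when links.json is absent, replacing the exists()/open() pair
self.links_data = read_json_if_exists(self.git_store.repo_path / "links.json")
```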

## 4. Progress Bar and Console Output

### Pattern: Progress bar creation and updates

**Locations:**

- `src/thicket/cli/commands/generate.py:209` - Progress with SpinnerColumn
- `src/thicket/cli/commands/index.py:2230` - Same Progress pattern
- Multiple `console.print()` calls with similar formatting patterns
- Progress update patterns repeated

**Recommendation:** Create a shared `ui_utils.py` module:

```python
from rich.console import Console
from rich.progress import Progress, SpinnerColumn, TaskID, TextColumn

# Or reuse the shared console from thicket.cli.utils
console = Console()


def create_progress_spinner(description: str) -> tuple[Progress, TaskID]:
    """Create a standard progress spinner."""
    progress = Progress(
        SpinnerColumn(),
        TextColumn("[progress.description]{task.description}"),
        transient=True,
    )
    task = progress.add_task(description)
    return progress, task


def print_success(message: str) -> None:
    """Print a success message with consistent formatting."""
    console.print(f"[green]✓[/green] {message}")


def print_error(message: str) -> None:
    """Print an error message with consistent formatting."""
    console.print(f"[red]Error: {message}[/red]")


def print_warning(message: str) -> None:
    """Print a warning message with consistent formatting."""
    console.print(f"[yellow]Warning: {message}[/yellow]")
```
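
Because `Progress` is a context manager, callers keep control of its lifetime. A usage sketch (`load_entries` is a hypothetical stand-in for the real work):

```python
progress, task = create_progress_spinner("Loading entries...")
with progress:
    load_entries()  # hypothetical work
    progress.update(task, completed=True)
print_success("Entries loaded")
```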

## 5. Git Store Operations

### Pattern: Entry file operations

**Locations:**

- Multiple patterns of loading entries from user directories
- Repeated safe_id generation
- Repeated user directory path construction

**Recommendation:** Enhance GitStore with helper methods:

```python
# Methods to add to GitStore; the module also needs
# `from collections.abc import Iterator` alongside its existing imports.

def get_user_dir(self, username: str) -> Path:
    """Get the user directory path."""
    return self.repo_path / username


def iter_user_entries(self, username: str) -> Iterator[tuple[Path, AtomEntry]]:
    """Iterate over all entries for a user."""
    user_dir = self.get_user_dir(username)
    if user_dir.exists():
        for entry_file in user_dir.glob("*.json"):
            if entry_file.name not in ["index.json", "duplicates.json"]:
                try:
                    # read_entry_file is a proposed companion helper that
                    # parses an entry JSON file into an AtomEntry
                    entry = self.read_entry_file(entry_file)
                    yield entry_file, entry
                except Exception:
                    continue
```
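
Callers such as `WebsiteGenerator.load_data` could then iterate without knowing the on-disk layout. A sketch:

```python
for entry_file, entry in git_store.iter_user_entries(username):
    self.entries.append((username, entry))
```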

## 6. Error Handling Patterns

### Pattern: Try-except with console error printing

**Locations:**

- Similar error handling patterns throughout CLI commands
- Repeated `raise typer.Exit(1)` patterns
- Similar exception message formatting

**Recommendation:** Create error handling decorators:

```python
import functools

import typer
from pydantic import ValidationError

from thicket.cli.utils import console


def handle_cli_errors(func):
    """Decorator to handle CLI command errors consistently."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except typer.Exit:
            raise  # let explicit exits pass through untouched
        except ValidationError as e:
            console.print(f"[red]Validation error: {e}[/red]")
            raise typer.Exit(1) from e
        except Exception as e:
            console.print(f"[red]Error: {e}[/red]")
            if kwargs.get("verbose"):
                console.print_exception()
            raise typer.Exit(1) from e

    return wrapper
```
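
Applied to a command, the decorator replaces the per-command try/except. A sketch (`sync` stands in for any existing command):

```python
@app.command()
@handle_cli_errors
def sync(verbose: bool = typer.Option(False, "--verbose")) -> None:
    """Command body unchanged; errors now surface through the decorator."""
    ...
```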

## 7. Configuration and Validation

### Pattern: Config file loading and validation

**Locations:**

- Repeated config loading pattern in every CLI command
- Similar validation patterns for URLs and paths

**Recommendation:** Create a `config_utils.py` module:

```python
from pathlib import Path
from typing import Optional

from pydantic import HttpUrl, ValidationError

from thicket.cli.utils import load_config
from thicket.models.config import ThicketConfig  # assumed location of ThicketConfig


class ConfigError(Exception):
    """Raised when configuration is missing or invalid."""


def load_config_with_defaults(config_path: Optional[Path] = None) -> ThicketConfig:
    """Load config with standard defaults and error handling."""
    if config_path is None:
        config_path = Path("thicket.yaml")

    if not config_path.exists():
        raise ConfigError(f"Configuration file not found: {config_path}")

    return load_config(config_path)


def validate_url(url: str) -> HttpUrl:
    """Validate and return a URL with consistent error handling."""
    try:
        return HttpUrl(url)
    except ValidationError as e:
        raise ConfigError(f"Invalid URL: {url}") from e
```
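
A usage sketch:

```python
config = load_config_with_defaults()  # falls back to ./thicket.yaml
feed_url = validate_url("https://example.org/atom.xml")
```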

## 8. Model Serialization

### Pattern: Pydantic model JSON encoding

**Locations:**

- Repeated `json_encoders={datetime: lambda v: v.isoformat()}` in model configs
- Similar model_dump patterns

**Recommendation:** Create base model class:

```python
from datetime import datetime

from pydantic import BaseModel, ConfigDict


class ThicketBaseModel(BaseModel):
    """Base model with common configuration."""

    model_config = ConfigDict(
        # json_encoders is deprecated in Pydantic v2 but matches the
        # pattern currently repeated across the existing models
        json_encoders={datetime: lambda v: v.isoformat()},
        str_strip_whitespace=True,
    )

    def to_json_dict(self) -> dict:
        """Convert to a JSON-serializable dict."""
        return self.model_dump(mode="json", exclude_none=True)
```
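
Existing models would then inherit the shared config instead of repeating it. A sketch (the fields shown are a subset of the real `AtomEntry`):

```python
class AtomEntry(ThicketBaseModel):
    id: str
    title: str
    updated: Optional[datetime] = None
    published: Optional[datetime] = None
```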

## Summary of Refactoring Benefits

1. **Reduced Code Duplication**: Eliminate 30-40% of duplicate code
2. **Consistent Error Handling**: Standardize error messages and handling
3. **Easier Maintenance**: Central location for common patterns
4. **Better Testing**: Easier to unit test shared utilities
5. **Type Safety**: Shared type hints and validation
6. **Performance**: Potential to optimize common operations in one place

## Implementation Priority

1. **High Priority**:
   - JSON utilities (used everywhere)
   - Datetime utilities (critical for sorting and display)
   - Error handling decorators (improves UX consistency)

2. **Medium Priority**:
   - Path utilities
   - UI/Console utilities
   - Config utilities

3. **Low Priority**:
   - Base model classes (requires more refactoring)
   - Git store enhancements (already well-structured)
+1 -2
pyproject.toml
···
"platformdirs>=4.0.0",
"pyyaml>=6.0.0",
"email_validator",
-
"textual>=4.0.0",
-
"flask>=3.1.1",
+
"jinja2>=3.1.6",
]
[project.optional-dependencies]
+6617
repomix-output.xml
···
This file is a merged representation of the entire codebase, combined into a single document by Repomix.

<file_summary>
This section contains a summary of this file.

<purpose>
This file contains a packed representation of the entire repository's contents.
It is designed to be easily consumable by AI systems for analysis, code review,
or other automated processes.
</purpose>

<file_format>
The content is organized as follows:
1. This summary section
2. Repository information
3. Directory structure
4. Repository files (if enabled)
5. Multiple file entries, each consisting of:
  - File path as an attribute
  - Full contents of the file
</file_format>

<usage_guidelines>
- This file should be treated as read-only. Any changes should be made to the
  original repository files, not this packed version.
- When processing this file, use the file path to distinguish
  between different files in the repository.
- Be aware that this file may contain sensitive information. Handle it with
  the same level of security as you would the original repository.
</usage_guidelines>

<notes>
- Some files may have been excluded based on .gitignore rules and Repomix's configuration
- Binary files are not included in this packed representation. Please refer to the Repository Structure section for a complete list of file paths, including binary files
- Files matching patterns in .gitignore are excluded
- Files matching default ignore patterns are excluded
- Files are sorted by Git change count (files with more changes are at the bottom)
</notes>

</file_summary>
<directory_structure>
.claude/
  settings.local.json
src/
  thicket/
    cli/
      commands/
        __init__.py
        add.py
        duplicates.py
        generate.py
        index_cmd.py
        info_cmd.py
        init.py
        links_cmd.py
        list_cmd.py
        sync.py
      __init__.py
      main.py
      utils.py
    core/
      __init__.py
      feed_parser.py
      git_store.py
      reference_parser.py
    models/
      __init__.py
      config.py
      feed.py
      user.py
    templates/
      base.html
      index.html
      links.html
      script.js
      style.css
      timeline.html
      users.html
    utils/
      __init__.py
    __init__.py
    __main__.py
.gitignore
ARCH.md
CLAUDE.md
pyproject.toml
README.md
</directory_structure>
<files>
This section contains the contents of the repository's files.

<file path=".claude/settings.local.json">
{
  "permissions": {
    "allow": [
      "Bash(find:*)",
      "Bash(uv run:*)",
      "Bash(grep:*)",
      "Bash(jq:*)",
      "Bash(git add:*)",
      "Bash(ls:*)"
    ]
  },
  "enableAllProjectMcpServers": false
}
</file>
<file path="src/thicket/cli/commands/generate.py">
+
"""Generate static HTML website from thicket data."""
+
+
import base64
+
import json
+
import re
+
import shutil
+
from datetime import datetime
+
from pathlib import Path
+
from typing import Any, Optional, TypedDict, Union
+
+
import typer
+
from jinja2 import Environment, FileSystemLoader, select_autoescape
+
from rich.progress import Progress, SpinnerColumn, TextColumn
+
+
from ...core.git_store import GitStore
+
from ...models.feed import AtomEntry
+
from ...models.user import GitStoreIndex, UserMetadata
+
from ..main import app
+
from ..utils import console, load_config
+
+
+
class UserData(TypedDict):
+
"""Type definition for user data structure."""
+
+
metadata: UserMetadata
+
recent_entries: list[tuple[str, AtomEntry]]
+
+
+
def safe_anchor_id(atom_id: str) -> str:
+
"""Convert an Atom ID to a safe HTML anchor ID."""
+
# Use base64 URL-safe encoding without padding
+
encoded = base64.urlsafe_b64encode(atom_id.encode('utf-8')).decode('ascii').rstrip('=')
+
# Prefix with 'id' to ensure it starts with a letter (HTML requirement)
+
return f"id{encoded}"
+
+
+
class WebsiteGenerator:
+
"""Generate static HTML website from thicket data."""
+
+
def __init__(self, git_store: GitStore, output_dir: Path):
+
self.git_store = git_store
+
self.output_dir = output_dir
+
self.template_dir = Path(__file__).parent.parent.parent / "templates"
+
+
# Initialize Jinja2 environment
+
self.env = Environment(
+
loader=FileSystemLoader(self.template_dir),
+
autoescape=select_autoescape(["html", "xml"]),
+
)
+
+
# Data containers
+
self.index: Optional[GitStoreIndex] = None
+
self.entries: list[tuple[str, AtomEntry]] = [] # (username, entry)
+
self.links_data: Optional[dict[str, Any]] = None
+
self.threads: list[list[dict[str, Any]]] = [] # List of threads with metadata
+
+
def get_display_name(self, username: str) -> str:
+
"""Get display name for a user, falling back to username."""
+
if self.index and username in self.index.users:
+
user = self.index.users[username]
+
return user.display_name or username
+
return username
+
+
def get_user_homepage(self, username: str) -> Optional[str]:
+
"""Get homepage URL for a user."""
+
if self.index and username in self.index.users:
+
user = self.index.users[username]
+
return str(user.homepage) if user.homepage else None
+
return None
+
+
def clean_html_summary(self, content: Optional[str], max_length: int = 200) -> str:
+
"""Clean HTML content and truncate for display in timeline."""
+
if not content:
+
return ""
+
+
# Remove HTML tags
+
clean_text = re.sub(r"<[^>]+>", " ", content)
+
# Replace multiple whitespace with single space
+
clean_text = re.sub(r"\s+", " ", clean_text)
+
# Strip leading/trailing whitespace
+
clean_text = clean_text.strip()
+
+
# Truncate with ellipsis if needed
+
if len(clean_text) > max_length:
+
# Try to break at word boundary
+
truncated = clean_text[:max_length]
+
last_space = truncated.rfind(" ")
+
if (
+
last_space > max_length * 0.8
+
): # If we can break reasonably close to the limit
+
clean_text = truncated[:last_space] + "..."
+
else:
+
clean_text = truncated + "..."
+
+
return clean_text
+
+
def load_data(self) -> None:
+
"""Load all data from the git repository."""
+
with Progress(
+
SpinnerColumn(),
+
TextColumn("[progress.description]{task.description}"),
+
console=console,
+
) as progress:
+
# Load index
+
task = progress.add_task("Loading repository index...", total=None)
+
self.index = self.git_store._load_index()
+
if not self.index:
+
raise ValueError("No index found in repository")
+
progress.update(task, completed=True)
+
+
# Load all entries
+
task = progress.add_task("Loading entries...", total=None)
+
for username, user_metadata in self.index.users.items():
+
user_dir = self.git_store.repo_path / user_metadata.directory
+
if user_dir.exists():
+
for entry_file in user_dir.glob("*.json"):
+
if entry_file.name not in ["index.json", "duplicates.json"]:
+
try:
+
with open(entry_file) as f:
+
entry_data = json.load(f)
+
entry = AtomEntry(**entry_data)
+
self.entries.append((username, entry))
+
except Exception as e:
+
console.print(
+
f"[yellow]Warning: Failed to load {entry_file}: {e}[/yellow]"
+
)
+
progress.update(task, completed=True)
+
+
# Sort entries by date (newest first) - prioritize updated over published
+
self.entries.sort(
+
key=lambda x: x[1].updated or x[1].published or datetime.min, reverse=True
+
)
+
+
# Load links data
+
task = progress.add_task("Loading links and references...", total=None)
+
links_file = self.git_store.repo_path / "links.json"
+
if links_file.exists():
+
with open(links_file) as f:
+
self.links_data = json.load(f)
+
progress.update(task, completed=True)
+
+
def build_threads(self) -> None:
+
"""Build threaded conversations from references."""
+
if not self.links_data or "references" not in self.links_data:
+
return
+
+
# Map entry IDs to (username, entry) tuples
+
entry_map: dict[str, tuple[str, AtomEntry]] = {}
+
for username, entry in self.entries:
+
entry_map[entry.id] = (username, entry)
+
+
# Build adjacency lists for references
+
self.outbound_refs: dict[str, set[str]] = {}
+
self.inbound_refs: dict[str, set[str]] = {}
+
self.reference_details: dict[
+
str, list[dict[str, Any]]
+
] = {} # Store full reference info
+
+
for ref in self.links_data["references"]:
+
source_id = ref["source_entry_id"]
+
target_id = ref.get("target_entry_id")
+
+
if target_id and source_id in entry_map and target_id in entry_map:
+
self.outbound_refs.setdefault(source_id, set()).add(target_id)
+
self.inbound_refs.setdefault(target_id, set()).add(source_id)
+
+
# Store reference details for UI
+
self.reference_details.setdefault(source_id, []).append(
+
{
+
"target_id": target_id,
+
"target_username": ref.get("target_username"),
+
"type": "outbound",
+
}
+
)
+
self.reference_details.setdefault(target_id, []).append(
+
{
+
"source_id": source_id,
+
"source_username": ref.get("source_username"),
+
"type": "inbound",
+
}
+
)
+
+
# Find conversation threads (multi-post discussions)
+
processed = set()
+
+
for entry_id, (_username, _entry) in entry_map.items():
+
if entry_id in processed:
+
continue
+
+
# Build thread starting from this entry
+
thread = []
+
to_visit = [entry_id]
+
thread_ids = set()
+
level_map: dict[str, int] = {} # Track levels for this thread
+
+
# First, traverse up to find the root
+
current = entry_id
+
while current in self.inbound_refs:
+
parents = self.inbound_refs[current] - {
+
current
+
} # Exclude self-references
+
if not parents:
+
break
+
# Take the first parent
+
parent = next(iter(parents))
+
if parent in thread_ids: # Avoid cycles
+
break
+
current = parent
+
to_visit.insert(0, current)
+
+
# Now traverse down from the root
+
while to_visit:
+
current = to_visit.pop(0)
+
if current in thread_ids or current not in entry_map:
+
continue
+
+
thread_ids.add(current)
+
username, entry = entry_map[current]
+
+
# Calculate thread level
+
thread_level = self._calculate_thread_level(current, level_map)
+
+
# Add threading metadata
+
thread_entry = {
+
"username": username,
+
"display_name": self.get_display_name(username),
+
"entry": entry,
+
"entry_id": current,
+
"references_to": list(self.outbound_refs.get(current, [])),
+
"referenced_by": list(self.inbound_refs.get(current, [])),
+
"thread_level": thread_level,
+
}
+
thread.append(thread_entry)
+
processed.add(current)
+
+
# Add children
+
if current in self.outbound_refs:
+
children = self.outbound_refs[current] - thread_ids # Avoid cycles
+
to_visit.extend(sorted(children))
+
+
if len(thread) > 1: # Only keep actual threads
+
# Sort thread by date (newest first) - prioritize updated over published
+
thread.sort(key=lambda x: x["entry"].updated or x["entry"].published or datetime.min, reverse=True) # type: ignore
+
self.threads.append(thread)
+
+
# Sort threads by the date of their most recent entry - prioritize updated over published
+
self.threads.sort(
+
key=lambda t: max(
+
item["entry"].updated or item["entry"].published or datetime.min for item in t
+
),
+
reverse=True,
+
)
+
+
def _calculate_thread_level(
+
self, entry_id: str, processed_entries: dict[str, int]
+
) -> int:
+
"""Calculate indentation level for threaded display."""
+
if entry_id in processed_entries:
+
return processed_entries[entry_id]
+
+
if entry_id not in self.inbound_refs:
+
processed_entries[entry_id] = 0
+
return 0
+
+
parents_in_thread = self.inbound_refs[entry_id] & set(processed_entries.keys())
+
if not parents_in_thread:
+
processed_entries[entry_id] = 0
+
return 0
+
+
# Find the deepest parent level + 1
+
max_parent_level = 0
+
for parent_id in parents_in_thread:
+
parent_level = self._calculate_thread_level(parent_id, processed_entries)
+
max_parent_level = max(max_parent_level, parent_level)
+
+
level = min(max_parent_level + 1, 4) # Cap at level 4
+
processed_entries[entry_id] = level
+
return level
+
+
def get_standalone_references(self) -> list[dict[str, Any]]:
+
"""Get posts that have references but aren't part of multi-post threads."""
+
if not hasattr(self, "reference_details"):
+
return []
+
+
threaded_entry_ids = set()
+
for thread in self.threads:
+
for item in thread:
+
threaded_entry_ids.add(item["entry_id"])
+
+
standalone_refs = []
+
for username, entry in self.entries:
+
if (
+
entry.id in self.reference_details
+
and entry.id not in threaded_entry_ids
+
):
+
refs = self.reference_details[entry.id]
+
# Only include if it has meaningful references (not just self-references)
+
meaningful_refs = [
+
r
+
for r in refs
+
if r.get("target_id") != entry.id and r.get("source_id") != entry.id
+
]
+
if meaningful_refs:
+
standalone_refs.append(
+
{
+
"username": username,
+
"display_name": self.get_display_name(username),
+
"entry": entry,
+
"references": meaningful_refs,
+
}
+
)
+
+
return standalone_refs
+
+
def _add_cross_thread_links(self, timeline_items: list[dict[str, Any]]) -> None:
+
"""Add cross-thread linking for entries that appear in multiple threads."""
+
# Map entry IDs to their positions in the timeline
+
entry_positions: dict[str, list[int]] = {}
+
# Map URLs referenced by entries to the entries that reference them
+
url_references: dict[str, list[tuple[str, int]]] = {} # url -> [(entry_id, position)]
+
+
# First pass: collect all entry IDs, their positions, and referenced URLs
+
for i, item in enumerate(timeline_items):
+
if item["type"] == "post":
+
entry_id = item["content"]["entry"].id
+
entry_positions.setdefault(entry_id, []).append(i)
+
# Track URLs this entry references
+
if entry_id in self.reference_details:
+
for ref in self.reference_details[entry_id]:
+
if ref["type"] == "outbound" and "target_id" in ref:
+
# Find the target entry's URL if available
+
target_entry = self._find_entry_by_id(ref["target_id"])
+
if target_entry and target_entry.link:
+
url = str(target_entry.link)
+
url_references.setdefault(url, []).append((entry_id, i))
+
elif item["type"] == "thread":
+
for thread_item in item["content"]:
+
entry_id = thread_item["entry"].id
+
entry_positions.setdefault(entry_id, []).append(i)
+
# Track URLs this entry references
+
if entry_id in self.reference_details:
+
for ref in self.reference_details[entry_id]:
+
if ref["type"] == "outbound" and "target_id" in ref:
+
target_entry = self._find_entry_by_id(ref["target_id"])
+
if target_entry and target_entry.link:
+
url = str(target_entry.link)
+
url_references.setdefault(url, []).append((entry_id, i))
+
+
# Build cross-thread connections - only for entries that actually appear multiple times
+
cross_thread_connections: dict[str, set[int]] = {} # entry_id -> set of timeline positions
+
+
# Add connections ONLY for entries that appear multiple times in the timeline
+
for entry_id, positions in entry_positions.items():
+
if len(positions) > 1:
+
cross_thread_connections[entry_id] = set(positions)
+
# Debug: uncomment to see which entries have multiple appearances
+
# print(f"Entry {entry_id[:50]}... appears at positions: {positions}")
+
+
# Apply cross-thread links to timeline items
+
for entry_id, positions_set in cross_thread_connections.items():
+
positions_list = list(positions_set)
+
for pos in positions_list:
+
item = timeline_items[pos]
+
other_positions = sorted([p for p in positions_list if p != pos])
+
+
if item["type"] == "post":
+
# Add cross-thread info to individual posts
+
item["content"]["cross_thread_links"] = self._build_cross_thread_link_data(entry_id, other_positions, timeline_items)
+
# Add info about shared references
+
item["content"]["shared_references"] = self._get_shared_references(entry_id, positions_set, timeline_items)
+
elif item["type"] == "thread":
+
# Add cross-thread info to thread items
+
for thread_item in item["content"]:
+
if thread_item["entry"].id == entry_id:
+
thread_item["cross_thread_links"] = self._build_cross_thread_link_data(entry_id, other_positions, timeline_items)
+
thread_item["shared_references"] = self._get_shared_references(entry_id, positions_set, timeline_items)
+
break
+
+
def _build_cross_thread_link_data(self, entry_id: str, other_positions: list[int], timeline_items: list[dict[str, Any]]) -> list[dict[str, Any]]:
+
"""Build detailed cross-thread link data with anchor information."""
+
cross_thread_links = []
+
+
for pos in other_positions:
+
item = timeline_items[pos]
+
if item["type"] == "post":
+
# For individual posts
+
safe_id = safe_anchor_id(entry_id)
+
cross_thread_links.append({
+
"position": pos,
+
"anchor_id": f"post-{pos}-{safe_id}",
+
"context": "individual post",
+
"title": item["content"]["entry"].title
+
})
+
elif item["type"] == "thread":
+
# For thread items, find the specific thread item
+
for thread_idx, thread_item in enumerate(item["content"]):
+
if thread_item["entry"].id == entry_id:
+
safe_id = safe_anchor_id(entry_id)
+
cross_thread_links.append({
+
"position": pos,
+
"anchor_id": f"post-{pos}-{thread_idx}-{safe_id}",
+
"context": f"thread (level {thread_item.get('thread_level', 0)})",
+
"title": thread_item["entry"].title
+
})
+
break
+
+
return cross_thread_links
+
+
def _find_entry_by_id(self, entry_id: str) -> Optional[AtomEntry]:
+
"""Find an entry by its ID."""
+
for _username, entry in self.entries:
+
if entry.id == entry_id:
+
return entry
+
return None
+
+
def _get_shared_references(self, entry_id: str, positions: Union[set[int], list[int]], timeline_items: list[dict[str, Any]]) -> list[dict[str, Any]]:
+
"""Get information about shared references between cross-thread entries."""
+
shared_refs = []
+
+
# Collect all referenced URLs from entries at these positions
+
url_counts: dict[str, int] = {}
+
referencing_entries: dict[str, list[str]] = {} # url -> [entry_ids]
+
+
for pos in positions:
+
item = timeline_items[pos]
+
entries_to_check = []
+
+
if item["type"] == "post":
+
entries_to_check.append(item["content"]["entry"])
+
elif item["type"] == "thread":
+
entries_to_check.extend([ti["entry"] for ti in item["content"]])
+
+
for entry in entries_to_check:
+
if entry.id in self.reference_details:
+
for ref in self.reference_details[entry.id]:
+
if ref["type"] == "outbound" and "target_id" in ref:
+
target_entry = self._find_entry_by_id(ref["target_id"])
+
if target_entry and target_entry.link:
+
url = str(target_entry.link)
+
url_counts[url] = url_counts.get(url, 0) + 1
+
if url not in referencing_entries:
+
referencing_entries[url] = []
+
if entry.id not in referencing_entries[url]:
+
referencing_entries[url].append(entry.id)
+
+
# Find URLs referenced by multiple entries
+
for url, count in url_counts.items():
+
if count > 1 and len(referencing_entries[url]) > 1:
+
# Get the target entry info
+
target_entry = None
+
target_username = None
+
for ref in (self.links_data or {}).get("references", []):
+
if ref.get("target_url") == url:
+
target_username = ref.get("target_username")
+
if ref.get("target_entry_id"):
+
target_entry = self._find_entry_by_id(ref["target_entry_id"])
+
break
+
+
shared_refs.append({
+
"url": url,
+
"count": count,
+
"referencing_entries": referencing_entries[url],
+
"target_username": target_username,
+
"target_title": target_entry.title if target_entry else None
+
})
+
+
return sorted(shared_refs, key=lambda x: x["count"], reverse=True)
+
+
def generate_site(self) -> None:
+
"""Generate the static website."""
+
# Create output directory
+
self.output_dir.mkdir(parents=True, exist_ok=True)
+
+
# Create static directories
+
(self.output_dir / "css").mkdir(exist_ok=True)
+
(self.output_dir / "js").mkdir(exist_ok=True)
+
+
# Generate CSS
+
css_template = self.env.get_template("style.css")
+
css_content = css_template.render()
+
with open(self.output_dir / "css" / "style.css", "w") as f:
+
f.write(css_content)
+
+
# Generate JavaScript
+
js_template = self.env.get_template("script.js")
+
js_content = js_template.render()
+
with open(self.output_dir / "js" / "script.js", "w") as f:
+
f.write(js_content)
+
+
# Prepare common template data
+
base_data = {
+
"title": "Energy & Environment Group",
+
"generated_at": datetime.now().isoformat(),
+
"get_display_name": self.get_display_name,
+
"get_user_homepage": self.get_user_homepage,
+
"clean_html_summary": self.clean_html_summary,
+
"safe_anchor_id": safe_anchor_id,
+
}
+
+
# Build unified timeline
+
timeline_items = []
+
+
# Only consider the threads that will actually be displayed
+
displayed_threads = self.threads[:20] # Limit to 20 threads
+
+
# Track which entries are part of displayed threads
+
threaded_entry_ids = set()
+
for thread in displayed_threads:
+
for item in thread:
+
threaded_entry_ids.add(item["entry_id"])
+
+
# Add threads to timeline (using the date of the most recent post)
+
for thread in displayed_threads:
+
most_recent_date = max(
+
item["entry"].updated or item["entry"].published or datetime.min
+
for item in thread
+
)
+
timeline_items.append({
+
"type": "thread",
+
"date": most_recent_date,
+
"content": thread
+
})
+
+
# Add individual posts (not in threads)
+
for username, entry in self.entries[:50]:
+
if entry.id not in threaded_entry_ids:
+
# Check if this entry has references
+
has_refs = (
+
entry.id in self.reference_details
+
if hasattr(self, "reference_details")
+
else False
+
)
+
+
refs = []
+
if has_refs:
+
refs = self.reference_details.get(entry.id, [])
+
refs = [
+
r for r in refs
+
if r.get("target_id") != entry.id
+
and r.get("source_id") != entry.id
+
]
+
+
timeline_items.append({
+
"type": "post",
+
"date": entry.updated or entry.published or datetime.min,
+
"content": {
+
"username": username,
+
"display_name": self.get_display_name(username),
+
"entry": entry,
+
"references": refs if refs else None
+
}
+
})
+
+
# Sort unified timeline by date (newest first)
+
timeline_items.sort(key=lambda x: x["date"], reverse=True)
+
+
# Limit timeline to what will actually be rendered
+
timeline_items = timeline_items[:50] # Limit to 50 items total
+
+
# Add cross-thread linking for repeat blog references
+
self._add_cross_thread_links(timeline_items)
+
+
# Prepare outgoing links data
+
outgoing_links = []
+
if self.links_data and "links" in self.links_data:
+
for url, link_info in self.links_data["links"].items():
+
referencing_entries = []
+
for entry_id in link_info.get("referencing_entries", []):
+
for username, entry in self.entries:
+
if entry.id == entry_id:
+
referencing_entries.append(
+
(self.get_display_name(username), entry)
+
)
+
break
+
+
if referencing_entries:
+
# Sort by date - prioritize updated over published
+
referencing_entries.sort(
+
key=lambda x: x[1].updated or x[1].published or datetime.min, reverse=True
+
)
+
outgoing_links.append(
+
{
+
"url": url,
+
"target_username": link_info.get("target_username"),
+
"entries": referencing_entries,
+
}
+
)
+
+
# Sort links by most recent reference - prioritize updated over published
+
outgoing_links.sort(
+
key=lambda x: x["entries"][0][1].updated
+
or x["entries"][0][1].published or datetime.min,
+
reverse=True,
+
)
+
+
# Prepare users data
+
users: list[UserData] = []
+
if self.index:
+
for username, user_metadata in self.index.users.items():
+
# Get recent entries for this user with display names
+
user_entries = [
+
(self.get_display_name(u), e)
+
for u, e in self.entries
+
if u == username
+
][:5]
+
users.append(
+
{"metadata": user_metadata, "recent_entries": user_entries}
+
)
+
# Sort by entry count
+
users.sort(key=lambda x: x["metadata"].entry_count, reverse=True)
+
+
# Generate timeline page
+
timeline_template = self.env.get_template("timeline.html")
+
timeline_content = timeline_template.render(
+
**base_data,
+
page="timeline",
+
timeline_items=timeline_items, # Already limited above
+
)
+
with open(self.output_dir / "timeline.html", "w") as f:
+
f.write(timeline_content)
+
+
# Generate links page
+
links_template = self.env.get_template("links.html")
+
links_content = links_template.render(
+
**base_data,
+
page="links",
+
outgoing_links=outgoing_links[:100],
+
)
+
with open(self.output_dir / "links.html", "w") as f:
+
f.write(links_content)
+
+
# Generate users page
+
users_template = self.env.get_template("users.html")
+
users_content = users_template.render(
+
**base_data,
+
page="users",
+
users=users,
+
)
+
with open(self.output_dir / "users.html", "w") as f:
+
f.write(users_content)
+
+
# Generate main index page (redirect to timeline)
+
index_template = self.env.get_template("index.html")
+
index_content = index_template.render(**base_data)
+
with open(self.output_dir / "index.html", "w") as f:
+
f.write(index_content)
+
+
console.print(f"[green]โœ“[/green] Generated website at {self.output_dir}")
+
console.print(f" - {len(self.entries)} entries")
+
console.print(f" - {len(self.threads)} conversation threads")
+
console.print(f" - {len(outgoing_links)} outgoing links")
+
console.print(f" - {len(users)} users")
+
console.print(
+
" - Generated pages: index.html, timeline.html, links.html, users.html"
+
)
+
+
+
@app.command()
+
def generate(
+
output: Path = typer.Option(
+
Path("./thicket-site"),
+
"--output",
+
"-o",
+
help="Output directory for the generated website",
+
),
+
force: bool = typer.Option(
+
False, "--force", "-f", help="Overwrite existing output directory"
+
),
+
config_file: Path = typer.Option(
+
Path("thicket.yaml"), "--config", help="Configuration file path"
+
),
+
) -> None:
+
"""Generate a static HTML website from thicket data."""
+
config = load_config(config_file)
+
+
if not config.git_store:
+
console.print("[red]No git store path configured[/red]")
+
raise typer.Exit(1)
+
+
git_store = GitStore(config.git_store)
+
+
# Check if output directory exists
+
if output.exists() and not force:
+
console.print(
+
f"[red]Output directory {output} already exists. Use --force to overwrite.[/red]"
+
)
+
raise typer.Exit(1)
+
+
# Clean output directory if forcing
+
if output.exists() and force:
+
shutil.rmtree(output)
+
+
try:
+
generator = WebsiteGenerator(git_store, output)
+
+
console.print("[bold]Generating static website...[/bold]")
+
generator.load_data()
+
generator.build_threads()
+
generator.generate_site()
+
+
except Exception as e:
+
console.print(f"[red]Error generating website: {e}[/red]")
+
raise typer.Exit(1) from e
+
</file>
+
+
<file path="src/thicket/templates/base.html">
+
<!DOCTYPE html>
+
<html lang="en">
+
<head>
+
<meta charset="UTF-8">
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
+
<title>{% block page_title %}{{ title }}{% endblock %}</title>
+
<link rel="stylesheet" href="css/style.css">
+
</head>
+
<body>
+
<header class="site-header">
+
<div class="header-content">
+
<h1 class="site-title">{{ title }}</h1>
+
<nav class="site-nav">
+
<a href="timeline.html" class="nav-link {% if page == 'timeline' %}active{% endif %}">Timeline</a>
+
<a href="links.html" class="nav-link {% if page == 'links' %}active{% endif %}">Links</a>
+
<a href="users.html" class="nav-link {% if page == 'users' %}active{% endif %}">Users</a>
+
</nav>
+
</div>
+
</header>
+
+
<main class="main-content">
+
{% block content %}{% endblock %}
+
</main>
+
+
<footer class="site-footer">
+
<p>Generated on {{ generated_at }} by <a href="https://github.com/avsm/thicket">Thicket</a></p>
+
</footer>
+
+
<script src="js/script.js"></script>
+
</body>
+
</html>
+
</file>
+
+
<file path="src/thicket/templates/index.html">
+
<!DOCTYPE html>
+
<html lang="en">
+
<head>
+
<meta charset="UTF-8">
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
+
<title>{{ title }}</title>
+
<meta http-equiv="refresh" content="0; url=timeline.html">
+
<link rel="canonical" href="timeline.html">
+
</head>
+
<body>
+
<p>Redirecting to <a href="timeline.html">Timeline</a>...</p>
+
</body>
+
</html>
+
</file>
+
+
<file path="src/thicket/templates/links.html">
+
{% extends "base.html" %}
+
+
{% block page_title %}Outgoing Links - {{ title }}{% endblock %}
+
+
{% block content %}
+
<div class="page-content">
+
<h2>Outgoing Links</h2>
+
<p class="page-description">External links referenced in blog posts, ordered by most recent reference.</p>
+
+
{% for link in outgoing_links %}
+
<article class="link-group">
+
<h3 class="link-url">
+
<a href="{{ link.url }}" target="_blank">{{ link.url|truncate(80) }}</a>
+
{% if link.target_username %}
+
<span class="target-user">({{ link.target_username }})</span>
+
{% endif %}
+
</h3>
+
<div class="referencing-entries">
+
<span class="ref-count">Referenced in {{ link.entries|length }} post(s):</span>
+
<ul>
+
{% for display_name, entry in link.entries[:5] %}
+
<li>
+
<span class="author">{{ display_name }}</span> -
+
<a href="{{ entry.link }}" target="_blank">{{ entry.title }}</a>
+
<time datetime="{{ entry.updated or entry.published }}">
+
({{ (entry.updated or entry.published).strftime('%Y-%m-%d') }})
+
</time>
+
</li>
+
{% endfor %}
+
{% if link.entries|length > 5 %}
+
<li class="more">... and {{ link.entries|length - 5 }} more</li>
+
{% endif %}
+
</ul>
+
</div>
+
</article>
+
{% endfor %}
+
</div>
+
{% endblock %}
+
</file>
+
+
<file path="src/thicket/templates/script.js">
+
// Enhanced functionality for thicket website
+
document.addEventListener('DOMContentLoaded', function() {
+
+
// Enhance thread collapsing (optional feature)
+
const threadHeaders = document.querySelectorAll('.thread-header');
+
threadHeaders.forEach(header => {
+
header.style.cursor = 'pointer';
+
header.addEventListener('click', function() {
+
const thread = this.parentElement;
+
const entries = thread.querySelectorAll('.thread-entry');
+
+
// Toggle visibility of all but the first entry
+
for (let i = 1; i < entries.length; i++) {
+
entries[i].style.display = entries[i].style.display === 'none' ? 'block' : 'none';
+
}
+
+
// Update thread count text
+
const count = this.querySelector('.thread-count');
+
if (entries[1] && entries[1].style.display === 'none') {
+
count.textContent = count.textContent.replace('posts', 'posts (collapsed)');
+
} else {
+
count.textContent = count.textContent.replace(' (collapsed)', '');
+
}
+
});
+
});
+
+
// Add relative time display
+
const timeElements = document.querySelectorAll('time');
+
timeElements.forEach(timeEl => {
+
const datetime = new Date(timeEl.getAttribute('datetime'));
+
const now = new Date();
+
const diffMs = now - datetime;
+
const diffDays = Math.floor(diffMs / (1000 * 60 * 60 * 24));
+
+
let relativeTime;
+
if (diffDays === 0) {
+
const diffHours = Math.floor(diffMs / (1000 * 60 * 60));
+
if (diffHours === 0) {
+
const diffMinutes = Math.floor(diffMs / (1000 * 60));
+
relativeTime = diffMinutes === 0 ? 'just now' : `${diffMinutes}m ago`;
+
} else {
+
relativeTime = `${diffHours}h ago`;
+
}
+
} else if (diffDays === 1) {
+
relativeTime = 'yesterday';
+
} else if (diffDays < 7) {
+
relativeTime = `${diffDays}d ago`;
+
} else if (diffDays < 30) {
+
const weeks = Math.floor(diffDays / 7);
+
relativeTime = weeks === 1 ? '1w ago' : `${weeks}w ago`;
+
} else if (diffDays < 365) {
+
const months = Math.floor(diffDays / 30);
+
relativeTime = months === 1 ? '1mo ago' : `${months}mo ago`;
+
} else {
+
const years = Math.floor(diffDays / 365);
+
relativeTime = years === 1 ? '1y ago' : `${years}y ago`;
+
}
+
+
// Add relative time as title attribute
+
timeEl.setAttribute('title', timeEl.textContent);
+
timeEl.textContent = relativeTime;
+
});
+
+
// Enhanced anchor link scrolling for shared references
+
document.querySelectorAll('a[href^="#"]').forEach(anchor => {
+
anchor.addEventListener('click', function (e) {
+
e.preventDefault();
+
const target = document.querySelector(this.getAttribute('href'));
+
if (target) {
+
target.scrollIntoView({
+
behavior: 'smooth',
+
block: 'center'
+
});
+
+
// Highlight the target briefly
+
const timelineEntry = target.closest('.timeline-entry');
+
if (timelineEntry) {
+
timelineEntry.style.outline = '2px solid var(--primary-color)';
+
timelineEntry.style.borderRadius = '8px';
+
setTimeout(() => {
+
timelineEntry.style.outline = '';
+
timelineEntry.style.borderRadius = '';
+
}, 2000);
+
}
+
}
+
});
+
});
+
});
+
</file>
+
+
<file path="src/thicket/templates/style.css">
+
/* Modern, clean design with high-density text and readable theme */
+
+
:root {
+
--primary-color: #2c3e50;
+
--secondary-color: #3498db;
+
--accent-color: #e74c3c;
+
--background: #ffffff;
+
--surface: #f8f9fa;
+
--text-primary: #2c3e50;
+
--text-secondary: #7f8c8d;
+
--border-color: #e0e0e0;
+
--thread-indent: 20px;
+
--max-width: 1200px;
+
}
+
+
* {
+
margin: 0;
+
padding: 0;
+
box-sizing: border-box;
+
}
+
+
body {
+
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', 'Roboto', 'Helvetica Neue', Arial, sans-serif;
+
font-size: 14px;
+
line-height: 1.6;
+
color: var(--text-primary);
+
background-color: var(--background);
+
}
+
+
/* Header */
+
.site-header {
+
background-color: var(--surface);
+
border-bottom: 1px solid var(--border-color);
+
padding: 0.75rem 0;
+
position: sticky;
+
top: 0;
+
z-index: 100;
+
}
+
+
.header-content {
+
max-width: var(--max-width);
+
margin: 0 auto;
+
padding: 0 2rem;
+
display: flex;
+
justify-content: space-between;
+
align-items: center;
+
}
+
+
.site-title {
+
font-size: 1.5rem;
+
font-weight: 600;
+
color: var(--primary-color);
+
margin: 0;
+
}
+
+
/* Navigation */
+
.site-nav {
+
display: flex;
+
gap: 1.5rem;
+
}
+
+
.nav-link {
+
text-decoration: none;
+
color: var(--text-secondary);
+
font-weight: 500;
+
font-size: 0.95rem;
+
padding: 0.5rem 0.75rem;
+
border-radius: 4px;
+
transition: all 0.2s ease;
+
}
+
+
.nav-link:hover {
+
color: var(--primary-color);
+
background-color: var(--background);
+
}
+
+
.nav-link.active {
+
color: var(--secondary-color);
+
background-color: var(--background);
+
font-weight: 600;
+
}
+
+
/* Main Content */
+
.main-content {
+
max-width: var(--max-width);
+
margin: 2rem auto;
+
padding: 0 2rem;
+
}
+
+
.page-content {
+
margin: 0;
+
}
+
+
.page-description {
+
color: var(--text-secondary);
+
margin-bottom: 1.5rem;
+
font-style: italic;
+
}
+
+
/* Sections */
+
section {
+
margin-bottom: 2rem;
+
}
+
+
h2 {
+
font-size: 1.3rem;
+
font-weight: 600;
+
margin-bottom: 0.75rem;
+
color: var(--primary-color);
+
}
+
+
h3 {
+
font-size: 1.1rem;
+
font-weight: 600;
+
margin-bottom: 0.75rem;
+
color: var(--primary-color);
+
}
+
+
/* Entries and Threads */
+
article {
+
margin-bottom: 1.5rem;
+
padding: 1rem;
+
background-color: var(--surface);
+
border-radius: 4px;
+
border: 1px solid var(--border-color);
+
}
+
+
/* Timeline-style entries */
+
.timeline-entry {
+
margin-bottom: 0.5rem;
+
padding: 0.5rem 0.75rem;
+
border: none;
+
background: transparent;
+
transition: background-color 0.2s ease;
+
}
+
+
.timeline-entry:hover {
+
background-color: var(--surface);
+
}
+
+
.timeline-meta {
+
display: inline-flex;
+
gap: 0.5rem;
+
align-items: center;
+
font-size: 0.75rem;
+
color: var(--text-secondary);
+
margin-bottom: 0.25rem;
+
}
+
+
.timeline-time {
+
font-family: 'SF Mono', Monaco, Consolas, 'Courier New', monospace;
+
font-size: 0.75rem;
+
color: var(--text-secondary);
+
}
+
+
.timeline-author {
+
font-weight: 600;
+
color: var(--primary-color);
+
font-size: 0.8rem;
+
text-decoration: none;
+
}
+
+
.timeline-author:hover {
+
color: var(--secondary-color);
+
text-decoration: underline;
+
}
+
+
.timeline-content {
+
line-height: 1.4;
+
}
+
+
.timeline-title {
+
font-size: 0.95rem;
+
font-weight: 600;
+
}
+
+
.timeline-title a {
+
color: var(--primary-color);
+
text-decoration: none;
+
}
+
+
.timeline-title a:hover {
+
color: var(--secondary-color);
+
text-decoration: underline;
+
}
+
+
.timeline-summary {
+
color: var(--text-secondary);
+
font-size: 0.9rem;
+
line-height: 1.4;
+
}
+
+
/* Legacy styles for other sections */
+
.entry-meta, .thread-header {
+
display: flex;
+
gap: 1rem;
+
align-items: center;
+
margin-bottom: 0.5rem;
+
font-size: 0.85rem;
+
color: var(--text-secondary);
+
}
+
+
.author {
+
font-weight: 600;
+
color: var(--primary-color);
+
}
+
+
time {
+
font-size: 0.85rem;
+
}
+
+
h4 {
+
font-size: 1.1rem;
+
font-weight: 600;
+
margin-bottom: 0.5rem;
+
}
+
+
h4 a {
+
color: var(--primary-color);
+
text-decoration: none;
+
}
+
+
h4 a:hover {
+
color: var(--secondary-color);
+
text-decoration: underline;
+
}
+
+
.entry-summary {
+
color: var(--text-primary);
+
line-height: 1.5;
+
margin-top: 0.5rem;
+
}
+
+
/* Enhanced Threading Styles */
+
+
/* Conversation Clusters */
+
.conversation-cluster {
+
background-color: var(--background);
+
border: 2px solid var(--border-color);
+
border-radius: 8px;
+
margin-bottom: 2rem;
+
overflow: hidden;
+
box-shadow: 0 2px 4px rgba(0, 0, 0, 0.05);
+
}
+
+
.conversation-header {
+
background: linear-gradient(135deg, var(--surface) 0%, #f1f3f4 100%);
+
padding: 0.75rem 1rem;
+
border-bottom: 1px solid var(--border-color);
+
}
+
+
.conversation-meta {
+
display: flex;
+
justify-content: space-between;
+
align-items: center;
+
flex-wrap: wrap;
+
gap: 0.5rem;
+
}
+
+
.conversation-count {
+
font-weight: 600;
+
color: var(--secondary-color);
+
font-size: 0.9rem;
+
}
+
+
.conversation-participants {
+
font-size: 0.8rem;
+
color: var(--text-secondary);
+
flex: 1;
+
text-align: right;
+
}
+
+
.conversation-flow {
+
padding: 0.5rem;
+
}
+
+
/* Threaded Conversation Entries */
+
.conversation-entry {
+
position: relative;
+
margin-bottom: 0.75rem;
+
display: flex;
+
align-items: flex-start;
+
}
+
+
.conversation-entry.level-0 {
+
margin-left: 0;
+
}
+
+
.conversation-entry.level-1 {
+
margin-left: 1.5rem;
+
}
+
+
.conversation-entry.level-2 {
+
margin-left: 3rem;
+
}
+
+
.conversation-entry.level-3 {
+
margin-left: 4.5rem;
+
}
+
+
.conversation-entry.level-4 {
+
margin-left: 6rem;
+
}
+
+
.entry-connector {
+
width: 3px;
+
background-color: var(--secondary-color);
+
margin-right: 0.75rem;
+
margin-top: 0.25rem;
+
min-height: 2rem;
+
border-radius: 2px;
+
opacity: 0.6;
+
}
+
+
.conversation-entry.level-0 .entry-connector {
+
background-color: var(--accent-color);
+
opacity: 0.8;
+
}
+
+
.entry-content {
+
flex: 1;
+
background-color: var(--surface);
+
padding: 0.75rem;
+
border-radius: 6px;
+
border: 1px solid var(--border-color);
+
transition: all 0.2s ease;
+
}
+
+
.entry-content:hover {
+
border-color: var(--secondary-color);
+
box-shadow: 0 2px 8px rgba(52, 152, 219, 0.1);
+
}
+
+
/* Reference Indicators */
+
.reference-indicators {
+
display: inline-flex;
+
gap: 0.25rem;
+
margin-left: 0.5rem;
+
}
+
+
.ref-out, .ref-in {
+
display: inline-block;
+
width: 1rem;
+
height: 1rem;
+
border-radius: 50%;
+
text-align: center;
+
line-height: 1rem;
+
font-size: 0.7rem;
+
font-weight: bold;
+
}
+
+
.ref-out {
+
background-color: #e8f5e8;
+
color: #2d8f2d;
+
}
+
+
.ref-in {
+
background-color: #e8f0ff;
+
color: #1f5fbf;
+
}
+
+
/* Reference Badges for Individual Posts */
+
.timeline-entry.with-references {
+
background-color: var(--surface);
+
}
+
+
/* Conversation posts in unified timeline */
+
.timeline-entry.conversation-post {
+
background: transparent;
+
border: none;
+
margin-bottom: 0.5rem;
+
padding: 0.5rem 0.75rem;
+
}
+
+
.timeline-entry.conversation-post.level-0 {
+
margin-left: 0;
+
border-left: 2px solid var(--accent-color);
+
padding-left: 0.75rem;
+
}
+
+
.timeline-entry.conversation-post.level-1 {
+
margin-left: 1.5rem;
+
border-left: 2px solid var(--secondary-color);
+
padding-left: 0.75rem;
+
}
+
+
.timeline-entry.conversation-post.level-2 {
+
margin-left: 3rem;
+
border-left: 2px solid var(--text-secondary);
+
padding-left: 0.75rem;
+
}
+
+
.timeline-entry.conversation-post.level-3 {
+
margin-left: 4.5rem;
+
border-left: 2px solid var(--text-secondary);
+
padding-left: 0.75rem;
+
}
+
+
.timeline-entry.conversation-post.level-4 {
+
margin-left: 6rem;
+
border-left: 2px solid var(--text-secondary);
+
padding-left: 0.75rem;
+
}
+
+
/* Cross-thread linking */
+
.cross-thread-links {
+
margin-top: 0.5rem;
+
padding-top: 0.5rem;
+
border-top: 1px solid var(--border-color);
+
}
+
+
.cross-thread-indicator {
+
font-size: 0.75rem;
+
color: var(--text-secondary);
+
background-color: var(--surface);
+
padding: 0.25rem 0.5rem;
+
border-radius: 12px;
+
border: 1px solid var(--border-color);
+
display: inline-block;
+
}
+
+
/* Inline shared references styling */
+
.inline-shared-refs {
+
margin-left: 0.5rem;
+
font-size: 0.85rem;
+
color: var(--text-secondary);
+
}
+
+
.shared-ref-link {
+
color: var(--primary-color);
+
text-decoration: none;
+
font-weight: 500;
+
transition: color 0.2s ease;
+
}
+
+
.shared-ref-link:hover {
+
color: var(--secondary-color);
+
text-decoration: underline;
+
}
+
+
.shared-ref-more {
+
font-style: italic;
+
color: var(--text-secondary);
+
font-size: 0.8rem;
+
margin-left: 0.25rem;
+
}
+
+
.user-anchor, .post-anchor {
+
position: absolute;
+
margin-top: -60px; /* Offset for fixed header */
+
pointer-events: none;
+
}
+
+
.cross-thread-link {
+
color: var(--primary-color);
+
text-decoration: none;
+
font-weight: 500;
+
transition: color 0.2s ease;
+
}
+
+
.cross-thread-link:hover {
+
color: var(--secondary-color);
+
text-decoration: underline;
+
}
+
+
.reference-badges {
+
display: flex;
+
gap: 0.25rem;
+
margin-left: 0.5rem;
+
flex-wrap: wrap;
+
}
+
+
.ref-badge {
+
display: inline-block;
+
padding: 0.1rem 0.4rem;
+
border-radius: 12px;
+
font-size: 0.7rem;
+
font-weight: 600;
+
text-transform: uppercase;
+
letter-spacing: 0.05em;
+
}
+
+
.ref-badge.ref-outbound {
+
background-color: #e8f5e8;
+
color: #2d8f2d;
+
border: 1px solid #c3e6c3;
+
}
+
+
.ref-badge.ref-inbound {
+
background-color: #e8f0ff;
+
color: #1f5fbf;
+
border: 1px solid #b3d9ff;
+
}
+
+
/* Author Color Coding */
+
.timeline-author {
+
position: relative;
+
}
+
+
.timeline-author::before {
+
content: '';
+
display: inline-block;
+
width: 8px;
+
height: 8px;
+
border-radius: 50%;
+
margin-right: 0.5rem;
+
background-color: var(--secondary-color);
+
}
+
+
/* Generate consistent colors for authors */
+
.author-avsm::before { background-color: #e74c3c; }
+
.author-mort::before { background-color: #3498db; }
+
.author-mte::before { background-color: #2ecc71; }
+
.author-ryan::before { background-color: #f39c12; }
+
.author-mwd::before { background-color: #9b59b6; }
+
.author-dra::before { background-color: #1abc9c; }
+
.author-pf341::before { background-color: #34495e; }
+
.author-sadiqj::before { background-color: #e67e22; }
+
.author-martinkl::before { background-color: #8e44ad; }
+
.author-jonsterling::before { background-color: #27ae60; }
+
.author-jon::before { background-color: #f1c40f; }
+
.author-onkar::before { background-color: #e91e63; }
+
.author-gabriel::before { background-color: #00bcd4; }
+
.author-jess::before { background-color: #ff5722; }
+
.author-ibrahim::before { background-color: #607d8b; }
+
.author-andres::before { background-color: #795548; }
+
.author-eeg::before { background-color: #ff9800; }
+
+
/* Section Headers */
+
.conversations-section h3,
+
.referenced-posts-section h3,
+
.individual-posts-section h3 {
+
border-bottom: 2px solid var(--border-color);
+
padding-bottom: 0.5rem;
+
margin-bottom: 1.5rem;
+
position: relative;
+
}
+
+
.conversations-section h3::before {
+
content: "๐Ÿ’ฌ";
+
margin-right: 0.5rem;
+
}
+
+
.referenced-posts-section h3::before {
+
content: "๐Ÿ”—";
+
margin-right: 0.5rem;
+
}
+
+
.individual-posts-section h3::before {
+
content: "๐Ÿ“";
+
margin-right: 0.5rem;
+
}
+
+
/* Legacy thread styles (for backward compatibility) */
+
.thread {
+
background-color: var(--background);
+
border: 1px solid var(--border-color);
+
padding: 0;
+
overflow: hidden;
+
margin-bottom: 1rem;
+
}
+
+
.thread-header {
+
background-color: var(--surface);
+
padding: 0.5rem 0.75rem;
+
border-bottom: 1px solid var(--border-color);
+
}
+
+
.thread-count {
+
font-weight: 600;
+
color: var(--secondary-color);
+
}
+
+
.thread-entry {
+
padding: 0.5rem 0.75rem;
+
border-bottom: 1px solid var(--border-color);
+
}
+
+
.thread-entry:last-child {
+
border-bottom: none;
+
}
+
+
.thread-entry.reply {
+
margin-left: var(--thread-indent);
+
border-left: 3px solid var(--secondary-color);
+
background-color: var(--surface);
+
}
+
+
/* Links Section */
+
.link-group {
+
background-color: var(--background);
+
}
+
+
.link-url {
+
font-size: 1rem;
+
word-break: break-word;
+
}
+
+
.link-url a {
+
color: var(--secondary-color);
+
text-decoration: none;
+
}
+
+
.link-url a:hover {
+
text-decoration: underline;
+
}
+
+
.target-user {
+
font-size: 0.9rem;
+
color: var(--text-secondary);
+
font-weight: normal;
+
}
+
+
.referencing-entries {
+
margin-top: 0.75rem;
+
}
+
+
.ref-count {
+
font-weight: 600;
+
color: var(--text-secondary);
+
font-size: 0.9rem;
+
}
+
+
.referencing-entries ul {
+
list-style: none;
+
margin-top: 0.5rem;
+
padding-left: 1rem;
+
}
+
+
.referencing-entries li {
+
margin-bottom: 0.25rem;
+
font-size: 0.9rem;
+
}
+
+
.referencing-entries .more {
+
font-style: italic;
+
color: var(--text-secondary);
+
}
+
+
/* Users Section */
+
.user-card {
+
background-color: var(--background);
+
}
+
+
.user-header {
+
display: flex;
+
gap: 1rem;
+
align-items: start;
+
margin-bottom: 1rem;
+
}
+
+
.user-icon {
+
width: 48px;
+
height: 48px;
+
border-radius: 50%;
+
object-fit: cover;
+
}
+
+
.user-info h3 {
+
margin-bottom: 0.25rem;
+
}
+
+
.username {
+
font-size: 0.9rem;
+
color: var(--text-secondary);
+
font-weight: normal;
+
}
+
+
.user-meta {
+
font-size: 0.9rem;
+
color: var(--text-secondary);
+
}
+
+
.user-meta a {
+
color: var(--secondary-color);
+
text-decoration: none;
+
}
+
+
.user-meta a:hover {
+
text-decoration: underline;
+
}
+
+
.separator {
+
margin: 0 0.5rem;
+
}
+
+
.post-count {
+
font-weight: 600;
+
}
+
+
.user-recent h4 {
+
font-size: 0.95rem;
+
margin-bottom: 0.5rem;
+
color: var(--text-secondary);
+
}
+
+
.user-recent ul {
+
list-style: none;
+
padding-left: 0;
+
}
+
+
.user-recent li {
+
margin-bottom: 0.25rem;
+
font-size: 0.9rem;
+
}
+
+
/* Footer */
+
.site-footer {
+
max-width: var(--max-width);
+
margin: 3rem auto 2rem;
+
padding: 1rem 2rem;
+
text-align: center;
+
color: var(--text-secondary);
+
font-size: 0.85rem;
+
border-top: 1px solid var(--border-color);
+
}
+
+
.site-footer a {
+
color: var(--secondary-color);
+
text-decoration: none;
+
}
+
+
.site-footer a:hover {
+
text-decoration: underline;
+
}
+
+
/* Responsive */
+
@media (max-width: 768px) {
+
.site-title {
+
font-size: 1.3rem;
+
}
+
+
.header-content {
+
flex-direction: column;
+
gap: 0.75rem;
+
align-items: flex-start;
+
}
+
+
.site-nav {
+
gap: 1rem;
+
}
+
+
.main-content {
+
padding: 0 1rem;
+
}
+
+
.thread-entry.reply {
+
margin-left: calc(var(--thread-indent) / 2);
+
}
+
+
.user-header {
+
flex-direction: column;
+
}
+
}
+
</file>
+
+
<file path="src/thicket/templates/timeline.html">
+
{% extends "base.html" %}
+
+
{% block page_title %}Timeline - {{ title }}{% endblock %}
+
+
{% block content %}
+
{% set seen_users = [] %}
+
<div class="page-content">
+
<h2>Recent Posts & Conversations</h2>
+
+
<section class="unified-timeline">
+
{% for item in timeline_items %}
+
{% if item.type == "post" %}
+
<!-- Individual Post -->
+
<article class="timeline-entry {% if item.content.references %}with-references{% endif %}">
+
<div class="timeline-meta">
+
<time datetime="{{ item.content.entry.updated or item.content.entry.published }}" class="timeline-time">
+
{{ (item.content.entry.updated or item.content.entry.published).strftime('%Y-%m-%d %H:%M') }}
+
</time>
+
{% set homepage = get_user_homepage(item.content.username) %}
+
{% if item.content.username not in seen_users %}
+
<a id="{{ item.content.username }}" class="user-anchor"></a>
+
{% set _ = seen_users.append(item.content.username) %}
+
{% endif %}
+
<a id="post-{{ loop.index0 }}-{{ safe_anchor_id(item.content.entry.id) }}" class="post-anchor"></a>
+
{% if homepage %}
+
<a href="{{ homepage }}" target="_blank" class="timeline-author">{{ item.content.display_name }}</a>
+
{% else %}
+
<span class="timeline-author">{{ item.content.display_name }}</span>
+
{% endif %}
+
{% if item.content.references %}
+
<div class="reference-badges">
+
{% for ref in item.content.references %}
+
{% if ref.type == 'outbound' %}
+
<span class="ref-badge ref-outbound" title="References {{ ref.target_username or 'external post' }}">
+
โ†’ {{ ref.target_username or 'ext' }}
+
</span>
+
{% elif ref.type == 'inbound' %}
+
<span class="ref-badge ref-inbound" title="Referenced by {{ ref.source_username or 'external post' }}">
+
โ† {{ ref.source_username or 'ext' }}
+
</span>
+
{% endif %}
+
{% endfor %}
+
</div>
+
{% endif %}
+
</div>
+
<div class="timeline-content">
+
<strong class="timeline-title">
+
<a href="{{ item.content.entry.link }}" target="_blank">{{ item.content.entry.title }}</a>
+
</strong>
+
{% if item.content.entry.summary %}
+
<span class="timeline-summary">โ€” {{ clean_html_summary(item.content.entry.summary, 250) }}</span>
+
{% endif %}
+
{% if item.content.shared_references %}
+
<span class="inline-shared-refs">
+
{% for ref in item.content.shared_references[:3] %}
+
{% if ref.target_username %}
+
<a href="#{{ ref.target_username }}" class="shared-ref-link" title="Referenced by {{ ref.count }} entries">@{{ ref.target_username }}</a>{% if not loop.last %}, {% endif %}
+
{% endif %}
+
{% endfor %}
+
{% if item.content.shared_references|length > 3 %}
+
<span class="shared-ref-more">+{{ item.content.shared_references|length - 3 }} more</span>
+
{% endif %}
+
</span>
+
{% endif %}
+
{% if item.content.cross_thread_links %}
+
<div class="cross-thread-links">
+
<span class="cross-thread-indicator">๐Ÿ”— Also appears: </span>
+
{% for link in item.content.cross_thread_links %}
+
<a href="#{{ link.anchor_id }}" class="cross-thread-link" title="{{ link.title }}">{{ link.context }}</a>{% if not loop.last %}, {% endif %}
+
{% endfor %}
+
</div>
+
{% endif %}
+
</div>
+
</article>
+
+
{% elif item.type == "thread" %}
+
<!-- Conversation Thread -->
+
{% set outer_loop_index = loop.index0 %}
+
{% for thread_item in item.content %}
+
<article class="timeline-entry conversation-post level-{{ thread_item.thread_level }}">
+
<div class="timeline-meta">
+
<time datetime="{{ thread_item.entry.updated or thread_item.entry.published }}" class="timeline-time">
+
{{ (thread_item.entry.updated or thread_item.entry.published).strftime('%Y-%m-%d %H:%M') }}
+
</time>
+
{% set homepage = get_user_homepage(thread_item.username) %}
+
{% if thread_item.username not in seen_users %}
+
<a id="{{ thread_item.username }}" class="user-anchor"></a>
+
{% set _ = seen_users.append(thread_item.username) %}
+
{% endif %}
+
<a id="post-{{ outer_loop_index }}-{{ loop.index0 }}-{{ safe_anchor_id(thread_item.entry.id) }}" class="post-anchor"></a>
+
{% if homepage %}
+
<a href="{{ homepage }}" target="_blank" class="timeline-author author-{{ thread_item.username }}">{{ thread_item.display_name }}</a>
+
{% else %}
+
<span class="timeline-author author-{{ thread_item.username }}">{{ thread_item.display_name }}</span>
+
{% endif %}
+
{% if thread_item.references_to or thread_item.referenced_by %}
+
<span class="reference-indicators">
+
{% if thread_item.references_to %}
+
<span class="ref-out" title="References other posts">โ†’</span>
+
{% endif %}
+
{% if thread_item.referenced_by %}
+
<span class="ref-in" title="Referenced by other posts">โ†</span>
+
{% endif %}
+
</span>
+
{% endif %}
+
</div>
+
<div class="timeline-content">
+
<strong class="timeline-title">
+
<a href="{{ thread_item.entry.link }}" target="_blank">{{ thread_item.entry.title }}</a>
+
</strong>
+
{% if thread_item.entry.summary %}
+
<span class="timeline-summary">โ€” {{ clean_html_summary(thread_item.entry.summary, 300) }}</span>
+
{% endif %}
+
{% if thread_item.shared_references %}
+
<span class="inline-shared-refs">
+
{% for ref in thread_item.shared_references[:3] %}
+
{% if ref.target_username %}
+
<a href="#{{ ref.target_username }}" class="shared-ref-link" title="Referenced by {{ ref.count }} entries">@{{ ref.target_username }}</a>{% if not loop.last %}, {% endif %}
+
{% endif %}
+
{% endfor %}
+
{% if thread_item.shared_references|length > 3 %}
+
<span class="shared-ref-more">+{{ thread_item.shared_references|length - 3 }} more</span>
+
{% endif %}
+
</span>
+
{% endif %}
+
{% if thread_item.cross_thread_links %}
+
<div class="cross-thread-links">
+
<span class="cross-thread-indicator">๐Ÿ”— Also appears: </span>
+
{% for link in thread_item.cross_thread_links %}
+
<a href="#{{ link.anchor_id }}" class="cross-thread-link" title="{{ link.title }}">{{ link.context }}</a>{% if not loop.last %}, {% endif %}
+
{% endfor %}
+
</div>
+
{% endif %}
+
</div>
+
</article>
+
{% endfor %}
+
{% endif %}
+
{% endfor %}
+
</section>
+
</div>
+
{% endblock %}
+
</file>
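The template above leans on helpers supplied by the site generator — `get_user_homepage`, `clean_html_summary`, and `safe_anchor_id` — none of which appear in this diff. As a hedged sketch of the contract the post anchors assume (a stable, HTML-safe slug per atom ID), something like the following would suffice; the real implementation lives in the generate command and may differ:

```python
import re


def safe_anchor_id(entry_id: str) -> str:
    """Hypothetical sketch: reduce an atom ID to an HTML-anchor-safe slug."""
    # Collapse anything outside [A-Za-z0-9_-] into single hyphens.
    slug = re.sub(r"[^A-Za-z0-9_-]+", "-", entry_id)
    return slug.strip("-") or "entry"
```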
+
+
<file path="src/thicket/templates/users.html">
+
{% extends "base.html" %}
+
+
{% block page_title %}Users - {{ title }}{% endblock %}
+
+
{% block content %}
+
<div class="page-content">
+
<h2>Users</h2>
+
<p class="page-description">All users contributing to this thicket, ordered by post count.</p>
+
+
{% for user_info in users %}
+
<article class="user-card">
+
<div class="user-header">
+
{% if user_info.metadata.icon and user_info.metadata.icon != "None" %}
+
<img src="{{ user_info.metadata.icon }}" alt="{{ user_info.metadata.username }}" class="user-icon">
+
{% endif %}
+
<div class="user-info">
+
<h3>
+
{% if user_info.metadata.display_name %}
+
{{ user_info.metadata.display_name }}
+
<span class="username">({{ user_info.metadata.username }})</span>
+
{% else %}
+
{{ user_info.metadata.username }}
+
{% endif %}
+
</h3>
+
<div class="user-meta">
+
{% if user_info.metadata.homepage %}
+
<a href="{{ user_info.metadata.homepage }}" target="_blank">{{ user_info.metadata.homepage }}</a>
+
{% endif %}
+
{% if user_info.metadata.email %}
+
<span class="separator">โ€ข</span>
+
<a href="mailto:{{ user_info.metadata.email }}">{{ user_info.metadata.email }}</a>
+
{% endif %}
+
<span class="separator">โ€ข</span>
+
<span class="post-count">{{ user_info.metadata.entry_count }} posts</span>
+
</div>
+
</div>
+
</div>
+
+
{% if user_info.recent_entries %}
+
<div class="user-recent">
+
<h4>Recent posts:</h4>
+
<ul>
+
{% for display_name, entry in user_info.recent_entries %}
+
<li>
+
<a href="{{ entry.link }}" target="_blank">{{ entry.title }}</a>
+
<time datetime="{{ entry.updated or entry.published }}">
+
({{ (entry.updated or entry.published).strftime('%Y-%m-%d') }})
+
</time>
+
</li>
+
{% endfor %}
+
</ul>
+
</div>
+
{% endif %}
+
</article>
+
{% endfor %}
+
</div>
+
{% endblock %}
+
</file>
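The `icon != "None"` guard above works around icons stored as the literal string `"None"` (a `str(None)` serialization artifact). A small, hypothetical normalizer applied when building the template context would let the template check truthiness alone:

```python
from typing import Optional


def normalize_icon(icon: Optional[str]) -> Optional[str]:
    """Hypothetical helper: treat empty strings and a stringified None as missing."""
    if icon is None or icon.strip() in ("", "None"):
        return None
    return icon
```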
+
+
<file path="README.md">
+
# Thicket
+
+
A modern CLI tool for persisting Atom/RSS feeds in Git repositories, designed to enable distributed weblog comment structures.
+
+
## Features
+
+
- **Feed Auto-Discovery**: Automatically extracts user metadata from Atom/RSS feeds
+
- **Git Storage**: Stores feed entries in a Git repository with full history
+
- **Duplicate Management**: Manual curation of duplicate entries across feeds
+
- **Modern CLI**: Built with Typer and Rich for beautiful terminal output
+
- **Comprehensive Parsing**: Supports RSS 0.9x, RSS 1.0, RSS 2.0, and Atom feeds
+
- **Cron-Friendly**: Designed for scheduled execution
+
+
## Installation
+
+
```bash
+
# Install from source
+
pip install -e .
+
+
# Or install with dev dependencies
+
pip install -e .[dev]
+
```
+
+
## Quick Start
+
+
1. **Initialize a new thicket repository:**
+
```bash
+
thicket init ./my-feeds
+
```
+
+
2. **Add a user with their feed:**
+
```bash
+
thicket add user "alice" --feed "https://alice.example.com/feed.xml"
+
```
+
+
3. **Sync feeds to download entries:**
+
```bash
+
thicket sync --all
+
```
+
+
4. **List users and feeds:**
+
```bash
+
thicket list users
+
thicket list feeds
+
thicket list entries
+
```
+
+
## Commands
+
+
### Initialize
+
```bash
+
thicket init <git-store-path> [--cache-dir <path>] [--config <config-file>]
+
```
+
+
### Add Users and Feeds
+
```bash
+
# Add user with auto-discovery
+
thicket add user "username" --feed "https://example.com/feed.xml"
+
+
# Add user with manual metadata
+
thicket add user "username" \
+
--feed "https://example.com/feed.xml" \
+
--email "user@example.com" \
+
--homepage "https://example.com" \
+
--display-name "User Name"
+
+
# Add additional feed to existing user
+
thicket add feed "username" "https://example.com/other-feed.xml"
+
```
+
+
### Sync Feeds
+
```bash
+
# Sync all users
+
thicket sync --all
+
+
# Sync specific user
+
thicket sync --user "username"
+
+
# Dry run (preview changes)
+
thicket sync --all --dry-run
+
```
+
+
### List Information
+
```bash
+
# List all users
+
thicket list users
+
+
# List all feeds
+
thicket list feeds
+
+
# List feeds for specific user
+
thicket list feeds --user "username"
+
+
# List recent entries
+
thicket list entries --limit 20
+
+
# List entries for specific user
+
thicket list entries --user "username"
+
```
+
+
### Manage Duplicates
+
```bash
+
# List duplicate mappings
+
thicket duplicates list
+
+
# Mark entries as duplicates
+
thicket duplicates add "https://example.com/dup" "https://example.com/canonical"
+
+
# Remove duplicate mapping
+
thicket duplicates remove "https://example.com/dup"
+
```
+
+
## Configuration
+
+
Thicket uses a YAML configuration file (default: `thicket.yaml`):
+
+
```yaml
+
git_store: ./feeds-repo
+
cache_dir: ~/.cache/thicket
+
users:
+
- username: alice
+
feeds:
+
- https://alice.example.com/feed.xml
+
email: alice@example.com
+
homepage: https://alice.example.com
+
display_name: Alice
+
```
+
+
## Git Repository Structure
+
+
```
+
feeds-repo/
+
โ”œโ”€โ”€ index.json # User directory index
+
โ”œโ”€โ”€ duplicates.json # Duplicate entry mappings
+
โ”œโ”€โ”€ alice/
+
โ”‚ โ”œโ”€โ”€ metadata.json # User metadata
+
โ”‚ โ”œโ”€โ”€ entry_id_1.json # Feed entries
+
โ”‚ โ””โ”€โ”€ entry_id_2.json
+
โ””โ”€โ”€ bob/
+
โ””โ”€โ”€ ...
+
```
+
+
## Development
+
+
### Setup
+
```bash
+
# Install in development mode
+
pip install -e .[dev]
+
+
# Run tests
+
pytest
+
+
# Run linting
+
ruff check src/
+
black --check src/
+
+
# Run type checking
+
mypy src/
+
```
+
+
### Architecture
+
+
- **CLI**: Modern interface with Typer and Rich
+
- **Feed Processing**: Universal parsing with feedparser
+
- **Git Storage**: Structured storage with GitPython
+
- **Data Models**: Pydantic for validation and serialization
+
- **Async HTTP**: httpx for efficient feed fetching
+
+
## Use Cases
+
+
- **Blog Aggregation**: Collect and archive blog posts from multiple sources
+
- **Comment Networks**: Enable distributed commenting systems
+
- **Feed Archival**: Preserve feed history beyond typical feed depth limits
+
- **Content Curation**: Manage and deduplicate content across feeds
+
+
## License
+
+
MIT License - see LICENSE file for details.
+
</file>
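The `thicket.yaml` in the Configuration section maps one-to-one onto the `ThicketConfig` model that appears later in this diff. A minimal loading sketch, assuming PyYAML as the parser (the CLI's actual `load_config` helper is not shown here):

```python
from pathlib import Path

import yaml  # PyYAML; an assumption, since the real loader is not in this diff

from thicket.models import ThicketConfig


def load_thicket_config(path: Path = Path("thicket.yaml")) -> ThicketConfig:
    """Parse the YAML config and validate it through the pydantic model."""
    with open(path) as f:
        raw = yaml.safe_load(f)
    return ThicketConfig(**raw)
```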
+
+
<file path="src/thicket/cli/commands/index_cmd.py">
+
"""CLI command for building reference index from blog entries."""
+
+
import json
+
from pathlib import Path
+
from typing import Optional
+
+
import typer
+
from rich.console import Console
+
from rich.progress import (
+
BarColumn,
+
Progress,
+
SpinnerColumn,
+
TaskProgressColumn,
+
TextColumn,
+
)
+
from rich.table import Table
+
+
from ...core.git_store import GitStore
+
from ...core.reference_parser import ReferenceIndex, ReferenceParser
+
from ..main import app
+
from ..utils import get_tsv_mode, load_config
+
+
console = Console()
+
+
+
@app.command()
+
def index(
+
config_file: Optional[Path] = typer.Option(
+
None,
+
"--config",
+
"-c",
+
help="Path to configuration file",
+
),
+
output_file: Optional[Path] = typer.Option(
+
None,
+
"--output",
+
"-o",
+
help="Path to output index file (default: updates links.json in git store)",
+
),
+
verbose: bool = typer.Option(
+
False,
+
"--verbose",
+
"-v",
+
help="Show detailed progress information",
+
),
+
) -> None:
+
"""Build a reference index showing which blog entries reference others.
+
+
This command analyzes all blog entries to detect cross-references between
+
different blogs, creating an index that can be used to build threaded
+
views of related content.
+
+
Updates the unified links.json file with reference data.
+
"""
+
try:
+
# Load configuration
+
config = load_config(config_file)
+
+
# Initialize Git store
+
git_store = GitStore(config.git_store)
+
+
# Initialize reference parser
+
parser = ReferenceParser()
+
+
# Build user domain mapping
+
if verbose:
+
console.print("Building user domain mapping...")
+
user_domains = parser.build_user_domain_mapping(git_store)
+
+
if verbose:
+
console.print(f"Found {len(user_domains)} users with {sum(len(d) for d in user_domains.values())} total domains")
+
+
# Initialize reference index
+
ref_index = ReferenceIndex()
+
ref_index.user_domains = user_domains
+
+
# Get all users
+
index = git_store._load_index()
+
users = list(index.users.keys())
+
+
if not users:
+
console.print("[yellow]No users found in Git store[/yellow]")
+
raise typer.Exit(0)
+
+
# Process all entries
+
total_entries = 0
+
total_references = 0
+
all_references = []
+
+
with Progress(
+
SpinnerColumn(),
+
TextColumn("[progress.description]{task.description}"),
+
BarColumn(),
+
TaskProgressColumn(),
+
console=console,
+
) as progress:
+
+
# Count total entries first
+
counting_task = progress.add_task("Counting entries...", total=len(users))
+
entry_counts = {}
+
for username in users:
+
entries = git_store.list_entries(username)
+
entry_counts[username] = len(entries)
+
total_entries += len(entries)
+
progress.advance(counting_task)
+
+
progress.remove_task(counting_task)
+
+
# Process entries - extract references
+
processing_task = progress.add_task(
+
f"Extracting references from {total_entries} entries...",
+
total=total_entries
+
)
+
+
for username in users:
+
entries = git_store.list_entries(username)
+
+
for entry in entries:
+
# Extract references from this entry
+
references = parser.extract_references(entry, username, user_domains)
+
all_references.extend(references)
+
+
progress.advance(processing_task)
+
+
if verbose and references:
+
console.print(f" Found {len(references)} references in {username}:{entry.title[:50]}...")
+
+
progress.remove_task(processing_task)
+
+
# Resolve target_entry_ids for references
+
if all_references:
+
resolve_task = progress.add_task(
+
f"Resolving {len(all_references)} references...",
+
total=len(all_references)
+
)
+
+
if verbose:
+
console.print(f"Resolving target entry IDs for {len(all_references)} references...")
+
+
resolved_references = parser.resolve_target_entry_ids(all_references, git_store)
+
+
# Count resolved references
+
resolved_count = sum(1 for ref in resolved_references if ref.target_entry_id is not None)
+
if verbose:
+
console.print(f"Resolved {resolved_count} out of {len(all_references)} references")
+
+
# Add resolved references to index
+
for ref in resolved_references:
+
ref_index.add_reference(ref)
+
total_references += 1
+
progress.advance(resolve_task)
+
+
progress.remove_task(resolve_task)
+
+
# Determine output path
+
if output_file:
+
output_path = output_file
+
else:
+
output_path = config.git_store / "links.json"
+
+
# Load existing links data or create new structure
+
if output_path.exists() and not output_file:
+
# Load existing unified structure
+
with open(output_path) as f:
+
existing_data = json.load(f)
+
else:
+
# Create new structure
+
existing_data = {
+
"links": {},
+
"reverse_mapping": {},
+
"user_domains": {}
+
}
+
+
# Update with reference data
+
existing_data["references"] = ref_index.to_dict()["references"]
+
existing_data["user_domains"] = {k: list(v) for k, v in user_domains.items()}
+
+
# Save updated structure
+
with open(output_path, "w") as f:
+
json.dump(existing_data, f, indent=2, default=str)
+
+
# Show summary
+
if not get_tsv_mode():
+
console.print("\n[green]โœ“ Reference index built successfully[/green]")
+
+
# Create summary table or TSV output
+
if get_tsv_mode():
+
print("Metric\tCount")
+
print(f"Total Users\t{len(users)}")
+
print(f"Total Entries\t{total_entries}")
+
print(f"Total References\t{total_references}")
+
print(f"Outbound Refs\t{len(ref_index.outbound_refs)}")
+
print(f"Inbound Refs\t{len(ref_index.inbound_refs)}")
+
print(f"Output File\t{output_path}")
+
else:
+
table = Table(title="Reference Index Summary")
+
table.add_column("Metric", style="cyan")
+
table.add_column("Count", style="green")
+
+
table.add_row("Total Users", str(len(users)))
+
table.add_row("Total Entries", str(total_entries))
+
table.add_row("Total References", str(total_references))
+
table.add_row("Outbound Refs", str(len(ref_index.outbound_refs)))
+
table.add_row("Inbound Refs", str(len(ref_index.inbound_refs)))
+
table.add_row("Output File", str(output_path))
+
+
console.print(table)
+
+
# Show some interesting statistics
+
if total_references > 0:
+
if not get_tsv_mode():
+
console.print("\n[bold]Reference Statistics:[/bold]")
+
+
# Most referenced users
+
target_counts = {}
+
unresolved_domains = set()
+
+
for ref in ref_index.references:
+
if ref.target_username:
+
target_counts[ref.target_username] = target_counts.get(ref.target_username, 0) + 1
+
else:
+
# Track unresolved domains
+
from urllib.parse import urlparse
+
domain = urlparse(ref.target_url).netloc.lower()
+
unresolved_domains.add(domain)
+
+
if target_counts:
+
if get_tsv_mode():
+
print("Referenced User\tReference Count")
+
for username, count in sorted(target_counts.items(), key=lambda x: x[1], reverse=True)[:5]:
+
print(f"{username}\t{count}")
+
else:
+
console.print("\nMost referenced users:")
+
for username, count in sorted(target_counts.items(), key=lambda x: x[1], reverse=True)[:5]:
+
console.print(f" {username}: {count} references")
+
+
if unresolved_domains and verbose:
+
if get_tsv_mode():
+
print("Unresolved Domain\tCount")
+
for domain in sorted(unresolved_domains)[:10]:
+
print(f"{domain}\t1")
+
if len(unresolved_domains) > 10:
+
print(f"... and {len(unresolved_domains) - 10} more\t...")
+
else:
+
console.print(f"\nUnresolved domains: {len(unresolved_domains)}")
+
for domain in sorted(unresolved_domains)[:10]:
+
console.print(f" {domain}")
+
if len(unresolved_domains) > 10:
+
console.print(f" ... and {len(unresolved_domains) - 10} more")
+
+
except Exception as e:
+
console.print(f"[red]Error building reference index: {e}[/red]")
+
if verbose:
+
console.print_exception()
+
raise typer.Exit(1)
+
+
+
@app.command()
+
def threads(
+
config_file: Optional[Path] = typer.Option(
+
None,
+
"--config",
+
"-c",
+
help="Path to configuration file",
+
),
+
index_file: Optional[Path] = typer.Option(
+
None,
+
"--index",
+
"-i",
+
help="Path to reference index file (default: links.json in git store)",
+
),
+
username: Optional[str] = typer.Option(
+
None,
+
"--username",
+
"-u",
+
help="Show threads for specific username only",
+
),
+
entry_id: Optional[str] = typer.Option(
+
None,
+
"--entry",
+
"-e",
+
help="Show thread for specific entry ID",
+
),
+
min_size: int = typer.Option(
+
2,
+
"--min-size",
+
"-m",
+
help="Minimum thread size to display",
+
),
+
) -> None:
+
"""Show threaded view of related blog entries.
+
+
This command uses the reference index to show which blog entries
+
are connected through cross-references, creating an email-style
+
threaded view of the conversation.
+
+
Reads reference data from the unified links.json file.
+
"""
+
try:
+
# Load configuration
+
config = load_config(config_file)
+
+
# Determine index file path
+
if index_file:
+
index_path = index_file
+
else:
+
index_path = config.git_store / "links.json"
+
+
if not index_path.exists():
+
console.print(f"[red]Links file not found: {index_path}[/red]")
+
console.print("Run 'thicket links' and 'thicket index' first to build the reference index")
+
raise typer.Exit(1)
+
+
# Load unified data
+
with open(index_path) as f:
+
unified_data = json.load(f)
+
+
# Check if references exist in the unified structure
+
if "references" not in unified_data:
+
console.print(f"[red]No references found in {index_path}[/red]")
+
console.print("Run 'thicket index' first to build the reference index")
+
raise typer.Exit(1)
+
+
# Extract reference data and reconstruct ReferenceIndex
+
ref_index = ReferenceIndex.from_dict({
+
"references": unified_data["references"],
+
"user_domains": unified_data.get("user_domains", {})
+
})
+
+
# Initialize Git store to get entry details
+
git_store = GitStore(config.git_store)
+
+
if entry_id and username:
+
# Show specific thread
+
thread_members = ref_index.get_thread_members(username, entry_id)
+
_display_thread(thread_members, ref_index, git_store, f"Thread for {username}:{entry_id}")
+
+
elif username:
+
# Show all threads involving this user
+
user_index = git_store._load_index()
+
user = user_index.get_user(username)
+
if not user:
+
console.print(f"[red]User not found: {username}[/red]")
+
raise typer.Exit(1)
+
+
entries = git_store.list_entries(username)
+
threads_found = set()
+
+
console.print(f"[bold]Threads involving {username}:[/bold]\n")
+
+
for entry in entries:
+
thread_members = ref_index.get_thread_members(username, entry.id)
+
if len(thread_members) >= min_size:
+
thread_key = tuple(sorted(thread_members))
+
if thread_key not in threads_found:
+
threads_found.add(thread_key)
+
_display_thread(thread_members, ref_index, git_store, f"Thread #{len(threads_found)}")
+
+
else:
+
# Show all threads
+
console.print("[bold]All conversation threads:[/bold]\n")
+
+
all_threads = set()
+
processed_entries = set()
+
+
# Get all entries
+
user_index = git_store._load_index()
+
for username in user_index.users.keys():
+
entries = git_store.list_entries(username)
+
for entry in entries:
+
entry_key = (username, entry.id)
+
if entry_key in processed_entries:
+
continue
+
+
thread_members = ref_index.get_thread_members(username, entry.id)
+
if len(thread_members) >= min_size:
+
thread_key = tuple(sorted(thread_members))
+
if thread_key not in all_threads:
+
all_threads.add(thread_key)
+
_display_thread(thread_members, ref_index, git_store, f"Thread #{len(all_threads)}")
+
+
# Mark all members as processed
+
for member in thread_members:
+
processed_entries.add(member)
+
+
if not all_threads:
+
console.print("[yellow]No conversation threads found[/yellow]")
+
console.print(f"(minimum thread size: {min_size})")
+
+
except Exception as e:
+
console.print(f"[red]Error showing threads: {e}[/red]")
+
raise typer.Exit(1)
+
+
+
def _display_thread(thread_members, ref_index, git_store, title):
+
"""Display a single conversation thread."""
+
console.print(f"[bold cyan]{title}[/bold cyan]")
+
console.print(f"Thread size: {len(thread_members)} entries")
+
+
# Get entry details for each member
+
thread_entries = []
+
for username, entry_id in thread_members:
+
entry = git_store.get_entry(username, entry_id)
+
if entry:
+
thread_entries.append((username, entry))
+
+
# Sort by publication date
+
thread_entries.sort(key=lambda x: x[1].published or x[1].updated)
+
+
# Display entries
+
for i, (username, entry) in enumerate(thread_entries):
+
prefix = "โ”œโ”€" if i < len(thread_entries) - 1 else "โ””โ”€"
+
+
# Get references for this entry
+
outbound = ref_index.get_outbound_refs(username, entry.id)
+
inbound = ref_index.get_inbound_refs(username, entry.id)
+
+
ref_info = ""
+
if outbound or inbound:
+
ref_info = f" ({len(outbound)} out, {len(inbound)} in)"
+
+
console.print(f" {prefix} [{username}] {entry.title[:60]}...{ref_info}")
+
+
if entry.published:
+
console.print(f" Published: {entry.published.strftime('%Y-%m-%d')}")
+
+
console.print() # Empty line after each thread
+
</file>
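Reading the `index` and `threads` commands together, the unified `links.json` they share has roughly this shape (inferred from the reads and writes above; illustrative values, not a formal schema):

```python
# Approximate shape of links.json; the "links" and "reverse_mapping" sections
# are maintained by the separate 'thicket links' command, which this diff omits.
links_json_sketch = {
    "links": {},
    "reverse_mapping": {},
    "user_domains": {
        "alice": ["alice.example.com"],  # illustrative
    },
    "references": [
        {
            "source_entry_id": "https://alice.example.com/2024/01/post",
            "source_username": "alice",
            "target_url": "https://bob.example.com/2023/12/reply",
            # The next two keys are omitted entirely when unresolved:
            "target_username": "bob",
            "target_entry_id": "https://bob.example.com/2023/12/reply",
        },
    ],
}
```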
+
+
<file path="src/thicket/cli/commands/info_cmd.py">
+
"""CLI command for displaying detailed information about a specific atom entry."""
+
+
import json
+
from pathlib import Path
+
from typing import Optional
+
+
import typer
+
from rich.console import Console
+
from rich.panel import Panel
+
from rich.table import Table
+
from rich.text import Text
+
+
from ...core.git_store import GitStore
+
from ...core.reference_parser import ReferenceIndex
+
from ..main import app
+
from ..utils import load_config, get_tsv_mode
+
+
console = Console()
+
+
+
@app.command()
+
def info(
+
identifier: str = typer.Argument(
+
...,
+
help="The atom ID or URL of the entry to display information about"
+
),
+
username: Optional[str] = typer.Option(
+
None,
+
"--username",
+
"-u",
+
help="Username to search for the entry (if not provided, searches all users)"
+
),
+
config_file: Optional[Path] = typer.Option(
+
Path("thicket.yaml"),
+
"--config",
+
"-c",
+
help="Path to configuration file",
+
),
+
show_content: bool = typer.Option(
+
False,
+
"--content",
+
help="Include the full content of the entry in the output"
+
),
+
) -> None:
+
"""Display detailed information about a specific atom entry.
+
+
You can specify the entry using either its atom ID or URL.
+
Shows all metadata for the given entry, including title, dates, categories,
+
and summarizes all inbound and outbound links to/from other posts.
+
"""
+
try:
+
# Load configuration
+
config = load_config(config_file)
+
+
# Initialize Git store
+
git_store = GitStore(config.git_store)
+
+
# Find the entry
+
entry = None
+
found_username = None
+
+
# Check if identifier looks like a URL
+
is_url = identifier.startswith(('http://', 'https://'))
+
+
if username:
+
# Search specific username
+
if is_url:
+
# Search by URL
+
entries = git_store.list_entries(username)
+
for e in entries:
+
if str(e.link) == identifier:
+
entry = e
+
found_username = username
+
break
+
else:
+
# Search by atom ID
+
entry = git_store.get_entry(username, identifier)
+
if entry:
+
found_username = username
+
else:
+
# Search all users
+
index = git_store._load_index()
+
for user in index.users.keys():
+
if is_url:
+
# Search by URL
+
entries = git_store.list_entries(user)
+
for e in entries:
+
if str(e.link) == identifier:
+
entry = e
+
found_username = user
+
break
+
if entry:
+
break
+
else:
+
# Search by atom ID
+
entry = git_store.get_entry(user, identifier)
+
if entry:
+
found_username = user
+
break
+
+
if not entry or not found_username:
+
if username:
+
console.print(f"[red]Entry with {'URL' if is_url else 'atom ID'} '{identifier}' not found for user '{username}'[/red]")
+
else:
+
console.print(f"[red]Entry with {'URL' if is_url else 'atom ID'} '{identifier}' not found in any user's entries[/red]")
+
raise typer.Exit(1)
+
+
# Load reference index if available
+
links_path = config.git_store / "links.json"
+
ref_index = None
+
if links_path.exists():
+
with open(links_path) as f:
+
unified_data = json.load(f)
+
+
# Check if references exist in the unified structure
+
if "references" in unified_data:
+
ref_index = ReferenceIndex.from_dict({
+
"references": unified_data["references"],
+
"user_domains": unified_data.get("user_domains", {})
+
})
+
+
# Display information
+
if get_tsv_mode():
+
_display_entry_info_tsv(entry, found_username, ref_index, show_content)
+
else:
+
_display_entry_info(entry, found_username)
+
+
if ref_index:
+
_display_link_info(entry, found_username, ref_index)
+
else:
+
console.print("\n[yellow]No reference index found. Run 'thicket links' and 'thicket index' to build cross-reference data.[/yellow]")
+
+
# Optionally display content
+
if show_content and entry.content:
+
_display_content(entry.content)
+
+
except Exception as e:
+
console.print(f"[red]Error displaying entry info: {e}[/red]")
+
raise typer.Exit(1)
+
+
+
def _display_entry_info(entry, username: str) -> None:
+
"""Display basic entry information in a structured format."""
+
+
# Create main info panel
+
info_table = Table.grid(padding=(0, 2))
+
info_table.add_column("Field", style="cyan bold", width=15)
+
info_table.add_column("Value", style="white")
+
+
info_table.add_row("User", f"[green]{username}[/green]")
+
info_table.add_row("Atom ID", f"[blue]{entry.id}[/blue]")
+
info_table.add_row("Title", entry.title)
+
info_table.add_row("Link", str(entry.link))
+
+
if entry.published:
+
info_table.add_row("Published", entry.published.strftime("%Y-%m-%d %H:%M:%S UTC"))
+
+
info_table.add_row("Updated", entry.updated.strftime("%Y-%m-%d %H:%M:%S UTC"))
+
+
if entry.summary:
+
# Truncate long summaries
+
summary = entry.summary[:200] + "..." if len(entry.summary) > 200 else entry.summary
+
info_table.add_row("Summary", summary)
+
+
if entry.categories:
+
categories_text = ", ".join(entry.categories)
+
info_table.add_row("Categories", categories_text)
+
+
if entry.author:
+
author_info = []
+
if "name" in entry.author:
+
author_info.append(entry.author["name"])
+
if "email" in entry.author:
+
author_info.append(f"<{entry.author['email']}>")
+
if author_info:
+
info_table.add_row("Author", " ".join(author_info))
+
+
if entry.content_type:
+
info_table.add_row("Content Type", entry.content_type)
+
+
if entry.rights:
+
info_table.add_row("Rights", entry.rights)
+
+
if entry.source:
+
info_table.add_row("Source Feed", entry.source)
+
+
panel = Panel(
+
info_table,
+
title=f"[bold]Entry Information[/bold]",
+
border_style="blue"
+
)
+
+
console.print(panel)
+
+
+
def _display_link_info(entry, username: str, ref_index: ReferenceIndex) -> None:
+
"""Display inbound and outbound link information."""
+
+
# Get links
+
outbound_refs = ref_index.get_outbound_refs(username, entry.id)
+
inbound_refs = ref_index.get_inbound_refs(username, entry.id)
+
+
if not outbound_refs and not inbound_refs:
+
console.print("\n[dim]No cross-references found for this entry.[/dim]")
+
return
+
+
# Create links table
+
links_table = Table(title="Cross-References")
+
links_table.add_column("Direction", style="cyan", width=10)
+
links_table.add_column("Target/Source", style="green", width=20)
+
links_table.add_column("URL", style="blue", width=50)
+
+
# Add outbound references
+
for ref in outbound_refs:
+
target_info = f"{ref.target_username}:{ref.target_entry_id}" if ref.target_username and ref.target_entry_id else "External"
+
links_table.add_row("โ†’ Out", target_info, ref.target_url)
+
+
# Add inbound references
+
for ref in inbound_refs:
+
source_info = f"{ref.source_username}:{ref.source_entry_id}"
+
links_table.add_row("โ† In", source_info, ref.target_url)
+
+
console.print()
+
console.print(links_table)
+
+
# Summary
+
console.print(f"\n[bold]Summary:[/bold] {len(outbound_refs)} outbound, {len(inbound_refs)} inbound references")
+
+
+
def _display_content(content: str) -> None:
+
"""Display the full content of the entry."""
+
+
# Truncate very long content
+
display_content = content
+
if len(content) > 5000:
+
display_content = content[:5000] + "\n\n[... content truncated ...]"
+
+
panel = Panel(
+
display_content,
+
title="[bold]Entry Content[/bold]",
+
border_style="green",
+
expand=False
+
)
+
+
console.print()
+
console.print(panel)
+
+
+
def _display_entry_info_tsv(entry, username: str, ref_index: Optional[ReferenceIndex], show_content: bool) -> None:
+
"""Display entry information in TSV format."""
+
+
# Basic info
+
print("Field\tValue")
+
print(f"User\t{username}")
+
print(f"Atom ID\t{entry.id}")
+
print(f"Title\t{entry.title.replace(chr(9), ' ').replace(chr(10), ' ').replace(chr(13), ' ')}")
+
print(f"Link\t{entry.link}")
+
+
if entry.published:
+
print(f"Published\t{entry.published.strftime('%Y-%m-%d %H:%M:%S UTC')}")
+
+
print(f"Updated\t{entry.updated.strftime('%Y-%m-%d %H:%M:%S UTC')}")
+
+
if entry.summary:
+
# Escape tabs and newlines in summary
+
summary = entry.summary.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ')
+
print(f"Summary\t{summary}")
+
+
if entry.categories:
+
print(f"Categories\t{', '.join(entry.categories)}")
+
+
if entry.author:
+
author_info = []
+
if "name" in entry.author:
+
author_info.append(entry.author["name"])
+
if "email" in entry.author:
+
author_info.append(f"<{entry.author['email']}>")
+
if author_info:
+
print(f"Author\t{' '.join(author_info)}")
+
+
if entry.content_type:
+
print(f"Content Type\t{entry.content_type}")
+
+
if entry.rights:
+
print(f"Rights\t{entry.rights}")
+
+
if entry.source:
+
print(f"Source Feed\t{entry.source}")
+
+
# Add reference info if available
+
if ref_index:
+
outbound_refs = ref_index.get_outbound_refs(username, entry.id)
+
inbound_refs = ref_index.get_inbound_refs(username, entry.id)
+
+
print(f"Outbound References\t{len(outbound_refs)}")
+
print(f"Inbound References\t{len(inbound_refs)}")
+
+
# Show each reference
+
for ref in outbound_refs:
+
target_info = f"{ref.target_username}:{ref.target_entry_id}" if ref.target_username and ref.target_entry_id else "External"
+
print(f"Outbound Reference\t{target_info}\t{ref.target_url}")
+
+
for ref in inbound_refs:
+
source_info = f"{ref.source_username}:{ref.source_entry_id}"
+
print(f"Inbound Reference\t{source_info}\t{ref.target_url}")
+
+
# Show content if requested
+
if show_content and entry.content:
+
# Escape tabs and newlines in content
+
content = entry.content.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ')
+
print(f"Content\t{content}")
+
</file>
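The TSV branch repeats the same tab/newline scrubbing for the title, summary, and content fields — the same kind of duplication this analysis flags for JSON and datetime handling. A shared helper would be the natural extraction (a sketch, not current code):

```python
def escape_tsv_field(value: str) -> str:
    """Collapse tabs, newlines, and carriage returns so a value stays in one TSV cell."""
    return value.replace("\t", " ").replace("\n", " ").replace("\r", " ")


# Usage at the three call sites above, e.g.:
#   print(f"Title\t{escape_tsv_field(entry.title)}")
```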
+
+
<file path="src/thicket/cli/commands/init.py">
+
"""Initialize command for thicket."""
+
+
from pathlib import Path
+
from typing import Optional
+
+
import typer
+
from pydantic import ValidationError
+
+
from ...core.git_store import GitStore
+
from ...models import ThicketConfig
+
from ..main import app
+
from ..utils import print_error, print_success, save_config
+
+
+
@app.command()
+
def init(
+
git_store: Path = typer.Argument(..., help="Path to Git repository for storing feeds"),
+
cache_dir: Optional[Path] = typer.Option(
+
None, "--cache-dir", "-c", help="Cache directory (default: ~/.cache/thicket)"
+
),
+
config_file: Optional[Path] = typer.Option(
+
None, "--config", help="Configuration file path (default: thicket.yaml)"
+
),
+
force: bool = typer.Option(
+
False, "--force", "-f", help="Overwrite existing configuration"
+
),
+
) -> None:
+
"""Initialize a new thicket configuration and Git store."""
+
+
# Set default paths
+
if cache_dir is None:
+
from platformdirs import user_cache_dir
+
cache_dir = Path(user_cache_dir("thicket"))
+
+
if config_file is None:
+
config_file = Path("thicket.yaml")
+
+
# Check if config already exists
+
if config_file.exists() and not force:
+
print_error(f"Configuration file already exists: {config_file}")
+
print_error("Use --force to overwrite")
+
raise typer.Exit(1)
+
+
# Create cache directory
+
cache_dir.mkdir(parents=True, exist_ok=True)
+
+
# Create Git store
+
try:
+
GitStore(git_store)
+
print_success(f"Initialized Git store at: {git_store}")
+
except Exception as e:
+
print_error(f"Failed to initialize Git store: {e}")
+
raise typer.Exit(1) from e
+
+
# Create configuration
+
try:
+
config = ThicketConfig(
+
git_store=git_store,
+
cache_dir=cache_dir,
+
users=[]
+
)
+
+
save_config(config, config_file)
+
print_success(f"Created configuration file: {config_file}")
+
+
except ValidationError as e:
+
print_error(f"Invalid configuration: {e}")
+
raise typer.Exit(1) from e
+
except Exception as e:
+
print_error(f"Failed to create configuration: {e}")
+
raise typer.Exit(1) from e
+
+
print_success("Thicket initialized successfully!")
+
print_success(f"Git store: {git_store}")
+
print_success(f"Cache directory: {cache_dir}")
+
print_success(f"Configuration: {config_file}")
+
print_success("Run 'thicket add user' to add your first user and feed.")
+
</file>
+
+
<file path="src/thicket/cli/__init__.py">
+
"""CLI interface for thicket."""
+
+
from .main import app
+
+
__all__ = ["app"]
+
</file>
+
+
<file path="src/thicket/core/__init__.py">
+
"""Core business logic for thicket."""
+
+
from .feed_parser import FeedParser
+
from .git_store import GitStore
+
+
__all__ = ["FeedParser", "GitStore"]
+
</file>
+
+
<file path="src/thicket/core/feed_parser.py">
+
"""Feed parsing and normalization with auto-discovery."""
+
+
from datetime import datetime
+
from typing import Optional
+
from urllib.parse import urlparse
+
+
import bleach
+
import feedparser
+
import httpx
+
from pydantic import HttpUrl, ValidationError
+
+
from ..models import AtomEntry, FeedMetadata
+
+
+
class FeedParser:
+
"""Parser for RSS/Atom feeds with normalization and auto-discovery."""
+
+
def __init__(self, user_agent: str = "thicket/0.1.0"):
+
"""Initialize the feed parser."""
+
self.user_agent = user_agent
+
self.allowed_tags = [
+
"a", "abbr", "acronym", "b", "blockquote", "br", "code", "em",
+
"i", "li", "ol", "p", "pre", "strong", "ul", "h1", "h2", "h3",
+
"h4", "h5", "h6", "img", "div", "span",
+
]
+
self.allowed_attributes = {
+
"a": ["href", "title"],
+
"abbr": ["title"],
+
"acronym": ["title"],
+
"img": ["src", "alt", "title", "width", "height"],
+
"blockquote": ["cite"],
+
}
+
+
async def fetch_feed(self, url: HttpUrl) -> str:
+
"""Fetch feed content from URL."""
+
async with httpx.AsyncClient() as client:
+
response = await client.get(
+
str(url),
+
headers={"User-Agent": self.user_agent},
+
timeout=30.0,
+
follow_redirects=True,
+
)
+
response.raise_for_status()
+
return response.text
+
+
def parse_feed(self, content: str, source_url: Optional[HttpUrl] = None) -> tuple[FeedMetadata, list[AtomEntry]]:
+
"""Parse feed content and return metadata and entries."""
+
parsed = feedparser.parse(content)
+
+
if parsed.bozo and parsed.bozo_exception:
+
# Try to continue with potentially malformed feed
+
pass
+
+
# Extract feed metadata
+
feed_meta = self._extract_feed_metadata(parsed.feed)
+
+
# Extract and normalize entries
+
entries = []
+
for entry in parsed.entries:
+
try:
+
atom_entry = self._normalize_entry(entry, source_url)
+
entries.append(atom_entry)
+
except Exception as e:
+
# Log error but continue processing other entries
+
print(f"Error processing entry {getattr(entry, 'id', 'unknown')}: {e}")
+
continue
+
+
return feed_meta, entries
+
+
def _extract_feed_metadata(self, feed: feedparser.FeedParserDict) -> FeedMetadata:
+
"""Extract metadata from feed for auto-discovery."""
+
# Parse author information
+
author_name = None
+
author_email = None
+
author_uri = None
+
+
if hasattr(feed, 'author_detail'):
+
author_name = feed.author_detail.get('name')
+
author_email = feed.author_detail.get('email')
+
author_uri = feed.author_detail.get('href')
+
elif hasattr(feed, 'author'):
+
author_name = feed.author
+
+
# Parse managing editor for RSS feeds
+
if not author_email and hasattr(feed, 'managingEditor'):
+
author_email = feed.managingEditor
+
+
# Parse feed link
+
feed_link = None
+
if hasattr(feed, 'link'):
+
try:
+
feed_link = HttpUrl(feed.link)
+
except ValidationError:
+
pass
+
+
# Parse image/icon/logo
+
logo = None
+
icon = None
+
image_url = None
+
+
if hasattr(feed, 'image'):
+
try:
+
image_url = HttpUrl(feed.image.get('href', feed.image.get('url', '')))
+
except (ValidationError, AttributeError):
+
pass
+
+
if hasattr(feed, 'icon'):
+
try:
+
icon = HttpUrl(feed.icon)
+
except ValidationError:
+
pass
+
+
if hasattr(feed, 'logo'):
+
try:
+
logo = HttpUrl(feed.logo)
+
except ValidationError:
+
pass
+
+
return FeedMetadata(
+
title=getattr(feed, 'title', None),
+
author_name=author_name,
+
author_email=author_email,
+
author_uri=HttpUrl(author_uri) if author_uri else None,
+
link=feed_link,
+
logo=logo,
+
icon=icon,
+
image_url=image_url,
+
description=getattr(feed, 'description', None),
+
)
+
+
def _normalize_entry(self, entry: feedparser.FeedParserDict, source_url: Optional[HttpUrl] = None) -> AtomEntry:
+
"""Normalize an entry to Atom format."""
+
# Parse timestamps
+
updated = self._parse_timestamp(entry.get('updated_parsed') or entry.get('published_parsed'))
+
published = self._parse_timestamp(entry.get('published_parsed'))
+
+
# Parse content
+
content = self._extract_content(entry)
+
content_type = self._extract_content_type(entry)
+
+
# Parse author
+
author = self._extract_author(entry)
+
+
# Parse categories/tags
+
categories = []
+
if hasattr(entry, 'tags'):
+
categories = [tag.get('term', '') for tag in entry.tags if tag.get('term')]
+
+
# Sanitize HTML content
+
if content:
+
content = self._sanitize_html(content)
+
+
summary = entry.get('summary', '')
+
if summary:
+
summary = self._sanitize_html(summary)
+
+
return AtomEntry(
+
id=entry.get('id', entry.get('link', '')),
+
title=entry.get('title', ''),
+
link=HttpUrl(entry.get('link', '')),
+
updated=updated,
+
published=published,
+
summary=summary or None,
+
content=content or None,
+
content_type=content_type,
+
author=author,
+
categories=categories,
+
rights=entry.get('rights', None),
+
source=str(source_url) if source_url else None,
+
)
+
+
def _parse_timestamp(self, time_struct) -> datetime:
+
"""Parse feedparser time struct to datetime."""
+
if time_struct:
+
return datetime(*time_struct[:6])
+
return datetime.now()
+
+
def _extract_content(self, entry: feedparser.FeedParserDict) -> Optional[str]:
+
"""Extract the best content from an entry."""
+
# Prefer content over summary
+
if hasattr(entry, 'content') and entry.content:
+
# Find the best content (prefer text/html, then text/plain)
+
for content_item in entry.content:
+
if content_item.get('type') in ['text/html', 'html']:
+
return content_item.get('value', '')
+
elif content_item.get('type') in ['text/plain', 'text']:
+
return content_item.get('value', '')
+
# Fallback to first content item
+
return entry.content[0].get('value', '')
+
+
# Fallback to summary
+
return entry.get('summary', '')
+
+
def _extract_content_type(self, entry: feedparser.FeedParserDict) -> str:
+
"""Extract content type from entry."""
+
if hasattr(entry, 'content') and entry.content:
+
content_type = entry.content[0].get('type', 'html')
+
# Normalize content type
+
if content_type in ['text/html', 'html']:
+
return 'html'
+
elif content_type in ['text/plain', 'text']:
+
return 'text'
+
elif content_type == 'xhtml':
+
return 'xhtml'
+
return 'html'
+
+
def _extract_author(self, entry: feedparser.FeedParserDict) -> Optional[dict]:
+
"""Extract author information from entry."""
+
author = {}
+
+
if hasattr(entry, 'author_detail'):
+
author.update({
+
'name': entry.author_detail.get('name'),
+
'email': entry.author_detail.get('email'),
+
'uri': entry.author_detail.get('href'),
+
})
+
elif hasattr(entry, 'author'):
+
author['name'] = entry.author
+
+
return author if author else None
+
+
def _sanitize_html(self, html: str) -> str:
+
"""Sanitize HTML content to prevent XSS."""
+
return bleach.clean(
+
html,
+
tags=self.allowed_tags,
+
attributes=self.allowed_attributes,
+
strip=True,
+
)
+
+
def sanitize_entry_id(self, entry_id: str) -> str:
+
"""Sanitize entry ID to be a safe filename."""
+
# Parse URL to get meaningful parts
+
parsed = urlparse(entry_id)
+
+
# Start with the path component
+
if parsed.path:
+
# Remove leading slash and replace problematic characters
+
safe_id = parsed.path.lstrip('/').replace('/', '_').replace('\\', '_')
+
else:
+
# Use the entire ID as fallback
+
safe_id = entry_id
+
+
# Replace problematic characters
+
safe_chars = []
+
for char in safe_id:
+
if char.isalnum() or char in '-_.':
+
safe_chars.append(char)
+
else:
+
safe_chars.append('_')
+
+
safe_id = ''.join(safe_chars)
+
+
# Ensure it's not too long (max 200 chars)
+
if len(safe_id) > 200:
+
safe_id = safe_id[:200]
+
+
# Ensure it's not empty
+
if not safe_id:
+
safe_id = "entry"
+
+
return safe_id
+
</file>
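For orientation, the async fetch and synchronous parse compose like this (a minimal sketch; the feed URL is a placeholder):

```python
import asyncio

from pydantic import HttpUrl

from thicket.core.feed_parser import FeedParser


async def main() -> None:
    parser = FeedParser()
    url = HttpUrl("https://alice.example.com/feed.xml")  # placeholder feed
    content = await parser.fetch_feed(url)  # async HTTP fetch with redirects
    metadata, entries = parser.parse_feed(content, source_url=url)  # sync parse
    print(metadata.title, f"({len(entries)} entries)")


asyncio.run(main())
```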
+
+
<file path="src/thicket/core/reference_parser.py">
+
"""Reference detection and parsing for blog entries."""
+
+
import re
+
from typing import TYPE_CHECKING, Optional
+
from urllib.parse import urlparse
+
+
from ..models import AtomEntry
+
+
if TYPE_CHECKING:
+
    from .git_store import GitStore
+
+
+
class BlogReference:
+
"""Represents a reference from one blog entry to another."""
+
+
def __init__(
+
self,
+
source_entry_id: str,
+
source_username: str,
+
target_url: str,
+
target_username: Optional[str] = None,
+
target_entry_id: Optional[str] = None,
+
):
+
self.source_entry_id = source_entry_id
+
self.source_username = source_username
+
self.target_url = target_url
+
self.target_username = target_username
+
self.target_entry_id = target_entry_id
+
+
def to_dict(self) -> dict:
+
"""Convert to dictionary for JSON serialization."""
+
result = {
+
"source_entry_id": self.source_entry_id,
+
"source_username": self.source_username,
+
"target_url": self.target_url,
+
}
+
+
# Only include optional fields if they are not None
+
if self.target_username is not None:
+
result["target_username"] = self.target_username
+
if self.target_entry_id is not None:
+
result["target_entry_id"] = self.target_entry_id
+
+
return result
+
+
@classmethod
+
def from_dict(cls, data: dict) -> "BlogReference":
+
"""Create from dictionary."""
+
return cls(
+
source_entry_id=data["source_entry_id"],
+
source_username=data["source_username"],
+
target_url=data["target_url"],
+
target_username=data.get("target_username"),
+
target_entry_id=data.get("target_entry_id"),
+
)
+
+
+
class ReferenceIndex:
+
"""Index of blog-to-blog references for creating threaded views."""
+
+
def __init__(self):
+
self.references: list[BlogReference] = []
+
self.outbound_refs: dict[
+
str, list[BlogReference]
+
] = {} # entry_id -> outbound refs
+
self.inbound_refs: dict[
+
str, list[BlogReference]
+
] = {} # entry_id -> inbound refs
+
self.user_domains: dict[str, set[str]] = {} # username -> set of domains
+
+
def add_reference(self, ref: BlogReference) -> None:
+
"""Add a reference to the index."""
+
self.references.append(ref)
+
+
# Update outbound references
+
source_key = f"{ref.source_username}:{ref.source_entry_id}"
+
if source_key not in self.outbound_refs:
+
self.outbound_refs[source_key] = []
+
self.outbound_refs[source_key].append(ref)
+
+
# Update inbound references if we can identify the target
+
if ref.target_username and ref.target_entry_id:
+
target_key = f"{ref.target_username}:{ref.target_entry_id}"
+
if target_key not in self.inbound_refs:
+
self.inbound_refs[target_key] = []
+
self.inbound_refs[target_key].append(ref)
+
+
def get_outbound_refs(self, username: str, entry_id: str) -> list[BlogReference]:
+
"""Get all outbound references from an entry."""
+
key = f"{username}:{entry_id}"
+
return self.outbound_refs.get(key, [])
+
+
def get_inbound_refs(self, username: str, entry_id: str) -> list[BlogReference]:
+
"""Get all inbound references to an entry."""
+
key = f"{username}:{entry_id}"
+
return self.inbound_refs.get(key, [])
+
+
def get_thread_members(self, username: str, entry_id: str) -> set[tuple[str, str]]:
+
"""Get all entries that are part of the same thread."""
+
visited = set()
+
to_visit = [(username, entry_id)]
+
thread_members = set()
+
+
while to_visit:
+
current_user, current_entry = to_visit.pop()
+
if (current_user, current_entry) in visited:
+
continue
+
+
visited.add((current_user, current_entry))
+
thread_members.add((current_user, current_entry))
+
+
# Add outbound references
+
for ref in self.get_outbound_refs(current_user, current_entry):
+
if ref.target_username and ref.target_entry_id:
+
to_visit.append((ref.target_username, ref.target_entry_id))
+
+
# Add inbound references
+
for ref in self.get_inbound_refs(current_user, current_entry):
+
to_visit.append((ref.source_username, ref.source_entry_id))
+
+
return thread_members
+
+
def to_dict(self) -> dict:
+
"""Convert to dictionary for JSON serialization."""
+
return {
+
"references": [ref.to_dict() for ref in self.references],
+
"user_domains": {k: list(v) for k, v in self.user_domains.items()},
+
}
+
+
@classmethod
+
def from_dict(cls, data: dict) -> "ReferenceIndex":
+
"""Create from dictionary."""
+
index = cls()
+
for ref_data in data.get("references", []):
+
ref = BlogReference.from_dict(ref_data)
+
index.add_reference(ref)
+
+
for username, domains in data.get("user_domains", {}).items():
+
index.user_domains[username] = set(domains)
+
+
return index
+
+
+
class ReferenceParser:
+
"""Parses blog entries to detect references to other blogs."""
+
+
def __init__(self):
+
# Common blog platforms and patterns
+
self.blog_patterns = [
+
r"https?://[^/]+\.(?:org|com|net|io|dev|me|co\.uk)/.*", # Common blog domains
+
r"https?://[^/]+\.github\.io/.*", # GitHub Pages
+
r"https?://[^/]+\.substack\.com/.*", # Substack
+
r"https?://medium\.com/.*", # Medium
+
r"https?://[^/]+\.wordpress\.com/.*", # WordPress.com
+
r"https?://[^/]+\.blogspot\.com/.*", # Blogger
+
]
+
+
# Compile regex patterns
+
self.link_pattern = re.compile(
+
r'<a[^>]+href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL
+
)
+
self.url_pattern = re.compile(r'https?://[^\s<>"]+')
+
+
def extract_links_from_html(self, html_content: str) -> list[tuple[str, str]]:
+
"""Extract all links from HTML content."""
+
links = []
+
+
# Extract links from <a> tags
+
for match in self.link_pattern.finditer(html_content):
+
url = match.group(1)
+
text = re.sub(
+
r"<[^>]+>", "", match.group(2)
+
).strip() # Remove HTML tags from link text
+
links.append((url, text))
+
+
return links
+
+
def is_blog_url(self, url: str) -> bool:
+
"""Check if a URL likely points to a blog post."""
+
for pattern in self.blog_patterns:
+
if re.match(pattern, url):
+
return True
+
return False
+
+
def _is_likely_blog_post_url(self, url: str) -> bool:
+
"""Check if a same-domain URL likely points to a blog post (not CSS, images, etc.)."""
+
parsed_url = urlparse(url)
+
path = parsed_url.path.lower()
+
+
# Skip obvious non-blog content
+
if any(path.endswith(ext) for ext in ['.css', '.js', '.png', '.jpg', '.jpeg', '.gif', '.svg', '.ico', '.pdf', '.xml', '.json']):
+
return False
+
+
# Skip common non-blog paths
+
if any(segment in path for segment in ['/static/', '/assets/', '/css/', '/js/', '/images/', '/img/', '/media/', '/uploads/']):
+
return False
+
+
# Skip fragment-only links (same page anchors)
+
if not path or path == '/':
+
return False
+
+
# Look for positive indicators of blog posts
+
# Common blog post patterns: dates, slugs, post indicators
+
blog_indicators = [
+
r'/\d{4}/', # Year in path
+
r'/\d{4}/\d{2}/', # Year/month in path
+
r'/blog/',
+
r'/post/',
+
r'/posts/',
+
r'/articles?/',
+
r'/notes?/',
+
r'/entries/',
+
r'/writing/',
+
]
+
+
for pattern in blog_indicators:
+
if re.search(pattern, path):
+
return True
+
+
# If it has a reasonable path depth and doesn't match exclusions, likely a blog post
+
path_segments = [seg for seg in path.split('/') if seg]
+
return len(path_segments) >= 1 # At least one meaningful path segment
+
+
def resolve_target_user(
+
self, url: str, user_domains: dict[str, set[str]]
+
) -> Optional[str]:
+
"""Try to resolve a URL to a known user based on domain mapping."""
+
parsed_url = urlparse(url)
+
domain = parsed_url.netloc.lower()
+
+
for username, domains in user_domains.items():
+
if domain in domains:
+
return username
+
+
return None
+
+
def extract_references(
+
self, entry: AtomEntry, username: str, user_domains: dict[str, set[str]]
+
) -> list[BlogReference]:
+
"""Extract all blog references from an entry."""
+
references = []
+
+
# Combine all text content for analysis
+
content_to_search = []
+
if entry.content:
+
content_to_search.append(entry.content)
+
if entry.summary:
+
content_to_search.append(entry.summary)
+
+
for content in content_to_search:
+
links = self.extract_links_from_html(content)
+
+
for url, _link_text in links:
+
entry_domain = (
+
urlparse(str(entry.link)).netloc.lower() if entry.link else ""
+
)
+
link_domain = urlparse(url).netloc.lower()
+
+
# Check if this looks like a blog URL
+
if not self.is_blog_url(url):
+
continue
+
+
# For same-domain links, apply additional filtering to avoid non-blog content
+
if link_domain == entry_domain:
+
# Only include same-domain links that look like blog posts
+
if not self._is_likely_blog_post_url(url):
+
continue
+
+
# Try to resolve to a known user
+
if link_domain == entry_domain:
+
# Same domain - target user is the same as source user
+
target_username: Optional[str] = username
+
else:
+
# Different domain - try to resolve
+
target_username = self.resolve_target_user(url, user_domains)
+
+
ref = BlogReference(
+
source_entry_id=entry.id,
+
source_username=username,
+
target_url=url,
+
target_username=target_username,
+
target_entry_id=None, # Will be resolved later if possible
+
)
+
+
references.append(ref)
+
+
return references
+
+
def build_user_domain_mapping(self, git_store: "GitStore") -> dict[str, set[str]]:
+
"""Build mapping of usernames to their known domains."""
+
user_domains = {}
+
index = git_store._load_index()
+
+
for username, user_metadata in index.users.items():
+
domains = set()
+
+
# Add domains from feeds
+
for feed_url in user_metadata.feeds:
+
domain = urlparse(feed_url).netloc.lower()
+
if domain:
+
domains.add(domain)
+
+
# Add domain from homepage
+
if user_metadata.homepage:
+
domain = urlparse(str(user_metadata.homepage)).netloc.lower()
+
if domain:
+
domains.add(domain)
+
+
user_domains[username] = domains
+
+
return user_domains
+
+
def _build_url_to_entry_mapping(self, git_store: "GitStore") -> dict[str, str]:
+
"""Build a comprehensive mapping from URLs to entry IDs using git store data.
+
+
This creates a URL-to-entry mapping that handles:
+
- Entry link URLs -> Entry IDs
+
- URL variations (with/without www, http/https)
+
- Multiple URLs pointing to the same entry
+
"""
+
url_to_entry: dict[str, str] = {}
+
+
# Load index to get all users
+
index = git_store._load_index()
+
+
for username in index.users.keys():
+
entries = git_store.list_entries(username)
+
+
for entry in entries:
+
if entry.link:
+
link_url = str(entry.link)
+
entry_id = entry.id
+
+
# Map the canonical link URL
+
url_to_entry[link_url] = entry_id
+
+
# Handle common URL variations
+
parsed = urlparse(link_url)
+
if parsed.netloc and parsed.path:
+
# Add version without www
+
if parsed.netloc.startswith('www.'):
+
no_www_url = f"{parsed.scheme}://{parsed.netloc[4:]}{parsed.path}"
+
if parsed.query:
+
no_www_url += f"?{parsed.query}"
+
if parsed.fragment:
+
no_www_url += f"#{parsed.fragment}"
+
url_to_entry[no_www_url] = entry_id
+
+
# Add version with www if not present
+
elif not parsed.netloc.startswith('www.'):
+
www_url = f"{parsed.scheme}://www.{parsed.netloc}{parsed.path}"
+
if parsed.query:
+
www_url += f"?{parsed.query}"
+
if parsed.fragment:
+
www_url += f"#{parsed.fragment}"
+
url_to_entry[www_url] = entry_id
+
+
# Add http/https variations
+
if parsed.scheme == 'https':
+
http_url = link_url.replace('https://', 'http://', 1)
+
url_to_entry[http_url] = entry_id
+
elif parsed.scheme == 'http':
+
https_url = link_url.replace('http://', 'https://', 1)
+
url_to_entry[https_url] = entry_id
+
+
return url_to_entry
+
+
def _normalize_url(self, url: str) -> str:
+
"""Normalize URL for consistent matching.
+
+
Handles common variations like trailing slashes, fragments, etc.
+
"""
+
parsed = urlparse(url)
+
+
# Remove trailing slash from path
+
path = parsed.path.rstrip('/') if parsed.path != '/' else parsed.path
+
+
# Reconstruct without fragment for consistent matching
+
normalized = f"{parsed.scheme}://{parsed.netloc}{path}"
+
if parsed.query:
+
normalized += f"?{parsed.query}"
+
+
return normalized
+
+
def resolve_target_entry_ids(
+
self, references: list[BlogReference], git_store: "GitStore"
+
) -> list[BlogReference]:
+
"""Resolve target_entry_id for references using comprehensive URL mapping."""
+
resolved_refs = []
+
+
# Build comprehensive URL to entry ID mapping
+
url_to_entry = self._build_url_to_entry_mapping(git_store)
+
+
for ref in references:
+
# If we already have a target_entry_id, keep the reference as-is
+
if ref.target_entry_id is not None:
+
resolved_refs.append(ref)
+
continue
+
+
# If we don't have a target_username, we can't resolve it
+
if ref.target_username is None:
+
resolved_refs.append(ref)
+
continue
+
+
# Try to resolve using URL mapping
+
resolved_entry_id = None
+
+
# First, try exact match
+
if ref.target_url in url_to_entry:
+
resolved_entry_id = url_to_entry[ref.target_url]
+
else:
+
# Try normalized URL matching
+
normalized_target = self._normalize_url(ref.target_url)
+
if normalized_target in url_to_entry:
+
resolved_entry_id = url_to_entry[normalized_target]
+
else:
+
# Try URL variations
+
for mapped_url, entry_id in url_to_entry.items():
+
if self._normalize_url(mapped_url) == normalized_target:
+
resolved_entry_id = entry_id
+
break
+
+
# Verify the resolved entry belongs to the target username
+
if resolved_entry_id:
+
# Double-check by loading the actual entry
+
entries = git_store.list_entries(ref.target_username)
+
entry_found = any(entry.id == resolved_entry_id for entry in entries)
+
if not entry_found:
+
resolved_entry_id = None
+
+
# Create a new reference with the resolved target_entry_id
+
resolved_ref = BlogReference(
+
source_entry_id=ref.source_entry_id,
+
source_username=ref.source_username,
+
target_url=ref.target_url,
+
target_username=ref.target_username,
+
target_entry_id=resolved_entry_id,
+
)
+
resolved_refs.append(resolved_ref)
+
+
return resolved_refs
+
</file>
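Condensed, the `index` command earlier in this diff drives `ReferenceParser` in three passes — domain mapping, extraction, resolution — before filling the index. A sketch of the same flow without progress reporting (it mirrors index_cmd, including its use of the private `_load_index`):

```python
from thicket.core.git_store import GitStore
from thicket.core.reference_parser import ReferenceIndex, ReferenceParser


def build_reference_index(git_store: GitStore) -> ReferenceIndex:
    """Sketch of the same flow as 'thicket index', minus progress bars."""
    parser = ReferenceParser()
    user_domains = parser.build_user_domain_mapping(git_store)

    # Pass 1: extract candidate references from every entry.
    all_references = []
    for username in git_store._load_index().users.keys():
        for entry in git_store.list_entries(username):
            all_references.extend(
                parser.extract_references(entry, username, user_domains)
            )

    # Pass 2: resolve target entry IDs, then index the results.
    ref_index = ReferenceIndex()
    ref_index.user_domains = user_domains
    for ref in parser.resolve_target_entry_ids(all_references, git_store):
        ref_index.add_reference(ref)
    return ref_index
```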
+
+
<file path="src/thicket/models/__init__.py">
+
"""Data models for thicket."""
+
+
from .config import ThicketConfig, UserConfig
+
from .feed import AtomEntry, DuplicateMap, FeedMetadata
+
from .user import GitStoreIndex, UserMetadata
+
+
__all__ = [
+
"ThicketConfig",
+
"UserConfig",
+
"AtomEntry",
+
"DuplicateMap",
+
"FeedMetadata",
+
"GitStoreIndex",
+
"UserMetadata",
+
]
+
</file>
+
+
<file path="src/thicket/models/feed.py">
+
"""Feed and entry models for thicket."""
+
+
from datetime import datetime
+
from typing import TYPE_CHECKING, Optional
+
+
from pydantic import BaseModel, ConfigDict, EmailStr, HttpUrl
+
+
if TYPE_CHECKING:
+
from .config import UserConfig
+
+
+
class AtomEntry(BaseModel):
+
"""Represents an Atom feed entry stored in the Git repository."""
+
+
model_config = ConfigDict(
+
json_encoders={datetime: lambda v: v.isoformat()},
+
str_strip_whitespace=True,
+
)
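# NOTE: json_encoders is deprecated in Pydantic v2; it is kept here so that
# datetime fields serialize as ISO 8601 strings.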
+
+
id: str # Original Atom ID
+
title: str
+
link: HttpUrl
+
updated: datetime
+
published: Optional[datetime] = None
+
summary: Optional[str] = None
+
content: Optional[str] = None # Full body content from Atom entry
+
content_type: Optional[str] = "html" # text, html, xhtml
+
author: Optional[dict] = None
+
categories: list[str] = []
+
rights: Optional[str] = None # Copyright info
+
source: Optional[str] = None # Source feed URL
+
+
+
class FeedMetadata(BaseModel):
+
"""Metadata extracted from a feed for auto-discovery."""
+
+
title: Optional[str] = None
+
author_name: Optional[str] = None
+
author_email: Optional[EmailStr] = None
+
author_uri: Optional[HttpUrl] = None
+
link: Optional[HttpUrl] = None
+
logo: Optional[HttpUrl] = None
+
icon: Optional[HttpUrl] = None
+
image_url: Optional[HttpUrl] = None
+
description: Optional[str] = None
+
+
def to_user_config(self, username: str, feed_url: HttpUrl) -> "UserConfig":
+
"""Convert discovered metadata to UserConfig with fallbacks."""
+
from .config import UserConfig
+
+
return UserConfig(
+
username=username,
+
feeds=[feed_url],
+
display_name=self.author_name or self.title,
+
email=self.author_email,
+
homepage=self.author_uri or self.link,
+
icon=self.logo or self.icon or self.image_url,
+
)
+
+
+
class DuplicateMap(BaseModel):
+
"""Maps duplicate entry IDs to canonical entry IDs."""
+
+
duplicates: dict[str, str] = {} # duplicate_id -> canonical_id
+
comment: str = "Entry IDs that map to the same canonical content"
+
+
def add_duplicate(self, duplicate_id: str, canonical_id: str) -> None:
+
"""Add a duplicate mapping."""
+
self.duplicates[duplicate_id] = canonical_id
+
+
def remove_duplicate(self, duplicate_id: str) -> bool:
+
"""Remove a duplicate mapping. Returns True if existed."""
+
return self.duplicates.pop(duplicate_id, None) is not None
+
+
def get_canonical(self, entry_id: str) -> str:
+
"""Get canonical ID for an entry (returns original if not duplicate)."""
+
return self.duplicates.get(entry_id, entry_id)
+
+
def is_duplicate(self, entry_id: str) -> bool:
+
"""Check if entry ID is marked as duplicate."""
+
return entry_id in self.duplicates
+
+
def get_duplicates_for_canonical(self, canonical_id: str) -> list[str]:
+
"""Get all duplicate IDs that map to a canonical ID."""
+
return [
+
duplicate_id
+
for duplicate_id, canonical in self.duplicates.items()
+
if canonical == canonical_id
+
]
+
</file>
+
+
<file path="src/thicket/models/user.py">
+
"""User metadata models for thicket."""
+
+
from datetime import datetime
+
from typing import Optional
+
+
from pydantic import BaseModel, ConfigDict
+
+
+
class UserMetadata(BaseModel):
+
"""Metadata about a user stored in the Git repository."""
+
+
model_config = ConfigDict(
+
json_encoders={datetime: lambda v: v.isoformat()},
+
str_strip_whitespace=True,
+
)
+
+
username: str
+
display_name: Optional[str] = None
+
email: Optional[str] = None
+
homepage: Optional[str] = None
+
icon: Optional[str] = None
+
feeds: list[str] = []
+
directory: str # Directory name in Git store
+
created: datetime
+
last_updated: datetime
+
entry_count: int = 0
+
+
def update_timestamp(self) -> None:
+
"""Update the last_updated timestamp to now."""
+
self.last_updated = datetime.now()
+
+
def increment_entry_count(self, count: int = 1) -> None:
+
"""Increment the entry count by the given amount."""
+
self.entry_count += count
+
self.update_timestamp()
+
+
+
class GitStoreIndex(BaseModel):
+
"""Index of all users and their directories in the Git store."""
+
+
model_config = ConfigDict(
+
json_encoders={datetime: lambda v: v.isoformat()}
+
)
+
+
users: dict[str, UserMetadata] = {} # username -> UserMetadata
+
created: datetime
+
last_updated: datetime
+
total_entries: int = 0
+
+
def add_user(self, user_metadata: UserMetadata) -> None:
+
"""Add or update a user in the index."""
+
self.users[user_metadata.username] = user_metadata
+
self.last_updated = datetime.now()
+
+
def remove_user(self, username: str) -> bool:
+
"""Remove a user from the index. Returns True if user existed."""
+
if username in self.users:
+
del self.users[username]
+
self.last_updated = datetime.now()
+
return True
+
return False
+
+
def get_user(self, username: str) -> Optional[UserMetadata]:
+
"""Get user metadata by username."""
+
return self.users.get(username)
+
+
def update_entry_count(self, username: str, count: int) -> None:
+
"""Update entry count for a user and total."""
+
user = self.get_user(username)
+
if user:
+
user.increment_entry_count(count)
+
self.total_entries += count
+
self.last_updated = datetime.now()
+
+
def recalculate_totals(self) -> None:
+
"""Recalculate total entries from all users."""
+
self.total_entries = sum(user.entry_count for user in self.users.values())
+
self.last_updated = datetime.now()
+
</file>
+
+
<file path="src/thicket/utils/__init__.py">
+
"""Utility modules for thicket."""
+
+
# This module will contain shared utilities
+
# For now, it's empty but can be expanded with common functions
+
</file>
+
+
<file path="src/thicket/__init__.py">
+
"""Thicket: A CLI tool for persisting Atom/RSS feeds in Git repositories."""
+
+
__version__ = "0.1.0"
+
__author__ = "thicket"
+
__email__ = "thicket@example.com"
+
</file>
+
+
<file path="src/thicket/__main__.py">
+
"""Entry point for running thicket as a module."""
+
+
from .cli.main import app
+
+
if __name__ == "__main__":
+
app()
+
</file>
+
+
<file path=".gitignore">
+
# Byte-compiled / optimized / DLL files
+
__pycache__/
+
*.py[codz]
+
*$py.class
+
+
# C extensions
+
*.so
+
+
# Distribution / packaging
+
.Python
+
build/
+
develop-eggs/
+
dist/
+
downloads/
+
eggs/
+
.eggs/
+
lib/
+
lib64/
+
parts/
+
sdist/
+
var/
+
wheels/
+
share/python-wheels/
+
*.egg-info/
+
.installed.cfg
+
*.egg
+
MANIFEST
+
+
# PyInstaller
+
# Usually these files are written by a python script from a template
+
# before PyInstaller builds the exe, so as to inject date/other infos into it.
+
*.manifest
+
*.spec
+
+
# Installer logs
+
pip-log.txt
+
pip-delete-this-directory.txt
+
+
# Unit test / coverage reports
+
htmlcov/
+
.tox/
+
.nox/
+
.coverage
+
.coverage.*
+
.cache
+
nosetests.xml
+
coverage.xml
+
*.cover
+
*.py.cover
+
.hypothesis/
+
.pytest_cache/
+
cover/
+
+
# Translations
+
*.mo
+
*.pot
+
+
# Django stuff:
+
*.log
+
local_settings.py
+
db.sqlite3
+
db.sqlite3-journal
+
+
# Flask stuff:
+
instance/
+
.webassets-cache
+
+
# Scrapy stuff:
+
.scrapy
+
+
# Sphinx documentation
+
docs/_build/
+
+
# PyBuilder
+
.pybuilder/
+
target/
+
+
# Jupyter Notebook
+
.ipynb_checkpoints
+
+
# IPython
+
profile_default/
+
ipython_config.py
+
+
# pyenv
+
# For a library or package, you might want to ignore these files since the code is
+
# intended to run in multiple environments; otherwise, check them in:
+
# .python-version
+
+
# pipenv
+
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+
# However, in case of collaboration, if having platform-specific dependencies or dependencies
+
# having no cross-platform support, pipenv may install dependencies that don't work, or not
+
# install all needed dependencies.
+
#Pipfile.lock
+
+
# UV
+
# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
+
# This is especially recommended for binary packages to ensure reproducibility, and is more
+
# commonly ignored for libraries.
+
#uv.lock
+
+
# poetry
+
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+
# This is especially recommended for binary packages to ensure reproducibility, and is more
+
# commonly ignored for libraries.
+
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+
#poetry.lock
+
#poetry.toml
+
+
# pdm
+
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+
# pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
+
# https://pdm-project.org/en/latest/usage/project/#working-with-version-control
+
#pdm.lock
+
#pdm.toml
+
.pdm-python
+
.pdm-build/
+
+
# pixi
+
# Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
+
#pixi.lock
+
# Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
+
# in the .venv directory. It is recommended not to include this directory in version control.
+
.pixi
+
+
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+
__pypackages__/
+
+
# Celery stuff
+
celerybeat-schedule
+
celerybeat.pid
+
+
# SageMath parsed files
+
*.sage.py
+
+
# Environments
+
.env
+
.envrc
+
.venv
+
env/
+
venv/
+
ENV/
+
env.bak/
+
venv.bak/
+
+
# Spyder project settings
+
.spyderproject
+
.spyproject
+
+
# Rope project settings
+
.ropeproject
+
+
# mkdocs documentation
+
/site
+
+
# mypy
+
.mypy_cache/
+
.dmypy.json
+
dmypy.json
+
+
# Pyre type checker
+
.pyre/
+
+
# pytype static type analyzer
+
.pytype/
+
+
# Cython debug symbols
+
cython_debug/
+
+
# PyCharm
+
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+
# and can be added to the global gitignore or merged into this file. For a more nuclear
+
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
+
#.idea/
+
+
# Abstra
+
# Abstra is an AI-powered process automation framework.
+
# Ignore directories containing user credentials, local state, and settings.
+
# Learn more at https://abstra.io/docs
+
.abstra/
+
+
# Visual Studio Code
+
# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
+
# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
+
# and can be added to the global gitignore or merged into this file. However, if you prefer,
+
# you could uncomment the following to ignore the entire vscode folder
+
# .vscode/
+
+
# Ruff stuff:
+
.ruff_cache/
+
+
# PyPI configuration file
+
.pypirc
+
+
# Marimo
+
marimo/_static/
+
marimo/_lsp/
+
__marimo__/
+
+
# Streamlit
+
.streamlit/secrets.toml
+
+
thicket.yaml
+
</file>
+
+
<file path="CLAUDE.md">
+
My goal is to build a CLI tool called thicket in Python that maintains a Git repository within which Atom feeds can be persisted, including their contents.
+
+
# Python Environment and Package Management
+
+
This project uses `uv` for Python package management and virtual environment handling.
+
+
## Running Commands
+
+
ALWAYS use `uv run` to execute Python commands:
+
+
- Run the CLI: `uv run -m thicket`
+
- Run tests: `uv run pytest`
+
- Type checking: `uv run mypy src/`
+
- Linting: `uv run ruff check src/`
+
- Format code: `uv run ruff format src/`
+
- Compile check: `uv run python -m py_compile <file>`
+
+
## Package Management
+
+
- Add dependencies: `uv add <package>`
+
- Add dev dependencies: `uv add --dev <package>`
+
- Install dependencies: `uv sync`
+
- Update dependencies: `uv lock --upgrade`
+
+
# Project Structure
+
+
The configuration file specifies:
+
- the location of a git store
+
- a list of users, each with one or more target Atom/RSS feeds and optional metadata such as their email, homepage, icon and display name
+
- a cache directory for temporary results, such as feed downloads and their last-modified dates, that speed up operations across runs of the tool
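
An illustrative `thicket.yaml` is sketched below (all values hypothetical; field names follow the `ThicketConfig` and `UserConfig` models):

```yaml
git_store: ~/thicket-store
cache_dir: ~/.cache/thicket
users:
  - username: alice
    feeds:
      - https://alice.example/atom.xml
    display_name: Alice
    homepage: https://alice.example
```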
+
+
The Git data store should:
+
- have a subdirectory per user
+
- within that directory, an entry per Atom entry indexed by the Atom id for that entry. The id should be sanitised consistently to be a safe filename. RSS feeds should be normalized to Atom before being stored.
+
- within each entry file, the metadata of the Atom feed converted into a JSON format that preserves as much metadata as possible.
+
- have a JSON file in the Git repository that indexes the users, their associated directories within the Git repository, and any other metadata about that user from the config file
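
A sketch of the resulting layout (entry filenames hypothetical):

```
store/
├── index.json        # user index and per-user metadata
├── duplicates.json   # duplicate -> canonical entry ID map
└── alice/
    ├── https-alice-example-post-1.json
    └── https-alice-example-post-2.json
```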
+
The CLI should be modern and use cool progress bars and other niceties from ecosystem libraries.
+
+
The intention behind the Git repository is that it can be queried by other websites in order to build a weblog structure of comments that link to other blogs.
+
</file>
+
+
<file path="pyproject.toml">
+
[build-system]
+
requires = ["hatchling"]
+
build-backend = "hatchling.build"
+
+
[project]
+
name = "thicket"
+
dynamic = ["version"]
+
description = "A CLI tool for persisting Atom/RSS feeds in Git repositories"
+
readme = "README.md"
+
license = "MIT"
+
requires-python = ">=3.9"
+
authors = [
+
{name = "thicket", email = "thicket@example.com"},
+
]
+
classifiers = [
+
"Development Status :: 3 - Alpha",
+
"Intended Audience :: Developers",
+
"License :: OSI Approved :: MIT License",
+
"Operating System :: OS Independent",
+
"Programming Language :: Python :: 3",
+
"Programming Language :: Python :: 3.9",
+
"Programming Language :: Python :: 3.10",
+
"Programming Language :: Python :: 3.11",
+
"Programming Language :: Python :: 3.12",
+
"Programming Language :: Python :: 3.13",
+
"Topic :: Internet :: WWW/HTTP :: Dynamic Content :: News/Diary",
+
"Topic :: Software Development :: Version Control :: Git",
+
"Topic :: Text Processing :: Markup :: XML",
+
]
+
dependencies = [
+
"typer>=0.15.0",
+
"rich>=13.0.0",
+
"GitPython>=3.1.40",
+
"feedparser>=6.0.11",
+
"pydantic>=2.11.0",
+
"pydantic-settings>=2.10.0",
+
"httpx>=0.28.0",
+
"pendulum>=3.0.0",
+
"bleach>=6.0.0",
+
"platformdirs>=4.0.0",
+
"pyyaml>=6.0.0",
+
"email_validator",
+
"jinja2>=3.1.6",
+
]
+
+
[project.optional-dependencies]
+
dev = [
+
"pytest>=8.0.0",
+
"pytest-asyncio>=0.24.0",
+
"pytest-cov>=6.0.0",
+
"black>=24.0.0",
+
"ruff>=0.8.0",
+
"mypy>=1.13.0",
+
"types-PyYAML>=6.0.0",
+
]
+
+
[project.urls]
+
Homepage = "https://github.com/example/thicket"
+
Documentation = "https://github.com/example/thicket"
+
Repository = "https://github.com/example/thicket"
+
"Bug Tracker" = "https://github.com/example/thicket/issues"
+
+
[project.scripts]
+
thicket = "thicket.cli.main:app"
+
+
[tool.hatch.version]
+
path = "src/thicket/__init__.py"
+
+
[tool.hatch.build.targets.wheel]
+
packages = ["src/thicket"]
+
+
[tool.black]
+
line-length = 88
+
target-version = ['py39']
+
include = '\.pyi?$'
+
extend-exclude = '''
+
/(
+
# directories
+
\.eggs
+
| \.git
+
| \.hg
+
| \.mypy_cache
+
| \.tox
+
| \.venv
+
| build
+
| dist
+
)/
+
'''
+
+
[tool.ruff]
+
target-version = "py39"
+
line-length = 88
+
+
[tool.ruff.lint]
+
select = [
+
"E", # pycodestyle errors
+
"W", # pycodestyle warnings
+
"F", # pyflakes
+
"I", # isort
+
"B", # flake8-bugbear
+
"C4", # flake8-comprehensions
+
"UP", # pyupgrade
+
]
+
ignore = [
+
"E501", # line too long, handled by black
+
"B008", # do not perform function calls in argument defaults
+
"C901", # too complex
+
]
+
+
[tool.ruff.lint.per-file-ignores]
+
"__init__.py" = ["F401"]
+
+
[tool.mypy]
+
python_version = "3.9"
+
check_untyped_defs = true
+
disallow_any_generics = true
+
disallow_incomplete_defs = true
+
disallow_untyped_defs = true
+
no_implicit_optional = true
+
warn_redundant_casts = true
+
warn_unused_ignores = true
+
warn_return_any = true
+
strict_optional = true
+
+
[[tool.mypy.overrides]]
+
module = [
+
"feedparser",
+
"git",
+
"bleach",
+
]
+
ignore_missing_imports = true
+
+
[tool.pytest.ini_options]
+
testpaths = ["tests"]
+
python_files = ["test_*.py"]
+
python_classes = ["Test*"]
+
python_functions = ["test_*"]
+
addopts = [
+
"-ra",
+
"--strict-markers",
+
"--strict-config",
+
"--cov=src/thicket",
+
"--cov-report=term-missing",
+
"--cov-report=html",
+
"--cov-report=xml",
+
]
+
filterwarnings = [
+
"error",
+
"ignore::UserWarning",
+
"ignore::DeprecationWarning",
+
]
+
markers = [
+
"slow: marks tests as slow (deselect with '-m \"not slow\"')",
+
"integration: marks tests as integration tests",
+
]
+
+
[tool.coverage.run]
+
source = ["src"]
+
branch = true
+
+
[tool.coverage.report]
+
exclude_lines = [
+
"pragma: no cover",
+
"def __repr__",
+
"if self.debug:",
+
"if settings.DEBUG",
+
"raise AssertionError",
+
"raise NotImplementedError",
+
"if 0:",
+
"if __name__ == .__main__.:",
+
"class .*\\bProtocol\\):",
+
"@(abc\\.)?abstractmethod",
+
]
+
</file>
+
+
<file path="src/thicket/cli/commands/__init__.py">
+
"""CLI commands for thicket."""
+
+
# Import all commands to register them with the main app
+
from . import add, duplicates, generate, index_cmd, info_cmd, init, links_cmd, list_cmd, sync
+
+
__all__ = ["add", "duplicates", "generate", "index_cmd", "info_cmd", "init", "links_cmd", "list_cmd", "sync"]
+
</file>
+
+
<file path="src/thicket/cli/commands/add.py">
+
"""Add command for thicket."""
+
+
import asyncio
+
from pathlib import Path
+
from typing import Optional
+
+
import typer
+
from pydantic import HttpUrl, ValidationError
+
+
from ...core.feed_parser import FeedParser
+
from ...core.git_store import GitStore
from ...models import FeedMetadata
+
from ..main import app
+
from ..utils import (
+
create_progress,
+
load_config,
+
print_error,
+
print_info,
+
print_success,
+
)
+
+
+
@app.command("add")
+
def add_command(
+
subcommand: str = typer.Argument(..., help="Subcommand: 'user' or 'feed'"),
+
username: str = typer.Argument(..., help="Username"),
+
feed_url: Optional[str] = typer.Argument(None, help="Feed URL (required for 'user' command)"),
+
email: Optional[str] = typer.Option(None, "--email", "-e", help="User email"),
+
homepage: Optional[str] = typer.Option(None, "--homepage", "-h", help="User homepage"),
+
icon: Optional[str] = typer.Option(None, "--icon", "-i", help="User icon URL"),
+
display_name: Optional[str] = typer.Option(None, "--display-name", "-d", help="User display name"),
+
config_file: Optional[Path] = typer.Option(
+
Path("thicket.yaml"), "--config", help="Configuration file path"
+
),
+
auto_discover: bool = typer.Option(
+
True, "--auto-discover/--no-auto-discover", help="Auto-discover user metadata from feed"
+
),
+
) -> None:
+
"""Add a user or feed to thicket."""
+
+
if subcommand == "user":
+
add_user(username, feed_url, email, homepage, icon, display_name, config_file, auto_discover)
+
elif subcommand == "feed":
+
add_feed(username, feed_url, config_file)
+
else:
+
print_error(f"Unknown subcommand: {subcommand}")
+
print_error("Use 'user' or 'feed'")
+
raise typer.Exit(1)
+
+
+
def add_user(
+
username: str,
+
feed_url: Optional[str],
+
email: Optional[str],
+
homepage: Optional[str],
+
icon: Optional[str],
+
display_name: Optional[str],
+
config_file: Path,
+
auto_discover: bool,
+
) -> None:
+
"""Add a new user with feed."""
+
+
if not feed_url:
+
print_error("Feed URL is required when adding a user")
+
raise typer.Exit(1)
+
+
# Validate feed URL
+
try:
+
validated_feed_url = HttpUrl(feed_url)
+
except ValidationError:
+
print_error(f"Invalid feed URL: {feed_url}")
+
raise typer.Exit(1) from None
+
+
# Load configuration
+
config = load_config(config_file)
+
+
# Initialize Git store
+
git_store = GitStore(config.git_store)
+
+
# Check if user already exists
+
existing_user = git_store.get_user(username)
+
if existing_user:
+
print_error(f"User '{username}' already exists")
+
print_error("Use 'thicket add feed' to add additional feeds")
+
raise typer.Exit(1)
+
+
# Auto-discover metadata if enabled
+
discovered_metadata = None
+
if auto_discover:
+
discovered_metadata = asyncio.run(discover_feed_metadata(validated_feed_url))
+
+
# Prepare user data with manual overrides taking precedence
+
user_display_name = display_name or (discovered_metadata.author_name or discovered_metadata.title if discovered_metadata else None)
+
user_email = email or (discovered_metadata.author_email if discovered_metadata else None)
+
user_homepage = homepage or (str(discovered_metadata.author_uri or discovered_metadata.link) if discovered_metadata else None)
+
user_icon = icon or (str(discovered_metadata.logo or discovered_metadata.icon or discovered_metadata.image_url) if discovered_metadata else None)
+
+
# Add user to Git store
+
git_store.add_user(
+
username=username,
+
display_name=user_display_name,
+
email=user_email,
+
homepage=user_homepage,
+
icon=user_icon,
+
feeds=[str(validated_feed_url)],
+
)
+
+
# Commit changes
+
git_store.commit_changes(f"Add user: {username}")
+
+
print_success(f"Added user '{username}' with feed: {feed_url}")
+
+
if discovered_metadata and auto_discover:
+
print_info("Auto-discovered metadata:")
+
if user_display_name:
+
print_info(f" Display name: {user_display_name}")
+
if user_email:
+
print_info(f" Email: {user_email}")
+
if user_homepage:
+
print_info(f" Homepage: {user_homepage}")
+
if user_icon:
+
print_info(f" Icon: {user_icon}")
+
+
+
def add_feed(username: str, feed_url: Optional[str], config_file: Path) -> None:
+
"""Add a feed to an existing user."""
+
+
if not feed_url:
+
print_error("Feed URL is required")
+
raise typer.Exit(1)
+
+
# Validate feed URL
+
try:
+
validated_feed_url = HttpUrl(feed_url)
+
except ValidationError:
+
print_error(f"Invalid feed URL: {feed_url}")
+
raise typer.Exit(1) from None
+
+
# Load configuration
+
config = load_config(config_file)
+
+
# Initialize Git store
+
git_store = GitStore(config.git_store)
+
+
# Check if user exists
+
user = git_store.get_user(username)
+
if not user:
+
print_error(f"User '{username}' not found")
+
print_error("Use 'thicket add user' to add a new user")
+
raise typer.Exit(1)
+
+
# Check if feed already exists
+
if str(validated_feed_url) in user.feeds:
+
print_error(f"Feed already exists for user '{username}': {feed_url}")
+
raise typer.Exit(1)
+
+
# Add feed to user
+
updated_feeds = user.feeds + [str(validated_feed_url)]
+
if git_store.update_user(username, feeds=updated_feeds):
+
git_store.commit_changes(f"Add feed to user {username}: {feed_url}")
+
print_success(f"Added feed to user '{username}': {feed_url}")
+
else:
+
print_error(f"Failed to add feed to user '{username}'")
+
raise typer.Exit(1)
+
+
+
async def discover_feed_metadata(feed_url: HttpUrl) -> Optional[FeedMetadata]:
+
"""Discover metadata from a feed URL."""
+
try:
+
with create_progress() as progress:
+
task = progress.add_task("Discovering feed metadata...", total=None)
+
+
parser = FeedParser()
+
content = await parser.fetch_feed(feed_url)
+
metadata, _ = parser.parse_feed(content, feed_url)
+
+
progress.update(task, completed=True)
+
return metadata
+
+
except Exception as e:
+
print_error(f"Failed to discover feed metadata: {e}")
+
return None
+
</file>
+
+
<file path="src/thicket/cli/commands/duplicates.py">
+
"""Duplicates command for thicket."""
+
+
from pathlib import Path
+
from typing import Optional
+
+
import typer
+
from rich.table import Table
+
+
from ...core.git_store import GitStore
+
from ..main import app
+
from ..utils import (
+
console,
+
load_config,
+
print_error,
+
print_info,
+
print_success,
+
get_tsv_mode,
+
)
+
+
+
@app.command("duplicates")
+
def duplicates_command(
+
action: str = typer.Argument(..., help="Action: 'list', 'add', 'remove'"),
+
duplicate_id: Optional[str] = typer.Argument(None, help="Duplicate entry ID"),
+
canonical_id: Optional[str] = typer.Argument(None, help="Canonical entry ID"),
+
config_file: Optional[Path] = typer.Option(
+
Path("thicket.yaml"), "--config", help="Configuration file path"
+
),
+
) -> None:
+
"""Manage duplicate entry mappings."""
+
+
# Load configuration
+
config = load_config(config_file)
+
+
# Initialize Git store
+
git_store = GitStore(config.git_store)
+
+
if action == "list":
+
list_duplicates(git_store)
+
elif action == "add":
+
add_duplicate(git_store, duplicate_id, canonical_id)
+
elif action == "remove":
+
remove_duplicate(git_store, duplicate_id)
+
else:
+
print_error(f"Unknown action: {action}")
+
print_error("Use 'list', 'add', or 'remove'")
+
raise typer.Exit(1)
+
+
+
def list_duplicates(git_store: GitStore) -> None:
+
"""List all duplicate mappings."""
+
duplicates = git_store.get_duplicates()
+
+
if not duplicates.duplicates:
+
if get_tsv_mode():
+
print("No duplicate mappings found")
+
else:
+
print_info("No duplicate mappings found")
+
return
+
+
if get_tsv_mode():
+
print("Duplicate ID\tCanonical ID")
+
for duplicate_id, canonical_id in duplicates.duplicates.items():
+
print(f"{duplicate_id}\t{canonical_id}")
+
print(f"Total duplicates: {len(duplicates.duplicates)}")
+
else:
+
table = Table(title="Duplicate Entry Mappings")
+
table.add_column("Duplicate ID", style="red")
+
table.add_column("Canonical ID", style="green")
+
+
for duplicate_id, canonical_id in duplicates.duplicates.items():
+
table.add_row(duplicate_id, canonical_id)
+
+
console.print(table)
+
print_info(f"Total duplicates: {len(duplicates.duplicates)}")
+
+
+
def add_duplicate(git_store: GitStore, duplicate_id: Optional[str], canonical_id: Optional[str]) -> None:
+
"""Add a duplicate mapping."""
+
if not duplicate_id:
+
print_error("Duplicate ID is required")
+
raise typer.Exit(1)
+
+
if not canonical_id:
+
print_error("Canonical ID is required")
+
raise typer.Exit(1)
+
+
# Check if duplicate_id already exists
+
duplicates = git_store.get_duplicates()
+
if duplicates.is_duplicate(duplicate_id):
+
existing_canonical = duplicates.get_canonical(duplicate_id)
+
print_error(f"Duplicate ID already mapped to: {existing_canonical}")
+
print_error("Use 'remove' first to change the mapping")
+
raise typer.Exit(1)
+
+
# Check if we're trying to make a canonical ID point to itself
+
if duplicate_id == canonical_id:
+
print_error("Duplicate ID cannot be the same as canonical ID")
+
raise typer.Exit(1)
+
+
# Add the mapping
+
git_store.add_duplicate(duplicate_id, canonical_id)
+
+
# Commit changes
+
git_store.commit_changes(f"Add duplicate mapping: {duplicate_id} -> {canonical_id}")
+
+
print_success(f"Added duplicate mapping: {duplicate_id} -> {canonical_id}")
+
+
+
def remove_duplicate(git_store: GitStore, duplicate_id: Optional[str]) -> None:
+
"""Remove a duplicate mapping."""
+
if not duplicate_id:
+
print_error("Duplicate ID is required")
+
raise typer.Exit(1)
+
+
# Check if mapping exists
+
duplicates = git_store.get_duplicates()
+
if not duplicates.is_duplicate(duplicate_id):
+
print_error(f"No duplicate mapping found for: {duplicate_id}")
+
raise typer.Exit(1)
+
+
canonical_id = duplicates.get_canonical(duplicate_id)
+
+
# Remove the mapping
+
if git_store.remove_duplicate(duplicate_id):
+
# Commit changes
+
git_store.commit_changes(f"Remove duplicate mapping: {duplicate_id} -> {canonical_id}")
+
print_success(f"Removed duplicate mapping: {duplicate_id} -> {canonical_id}")
+
else:
+
print_error(f"Failed to remove duplicate mapping: {duplicate_id}")
+
raise typer.Exit(1)
+
</file>
+
+
<file path="src/thicket/cli/commands/sync.py">
+
"""Sync command for thicket."""
+
+
import asyncio
+
from pathlib import Path
+
from typing import Optional
+
+
import typer
+
from rich.progress import track
+
+
from ...core.feed_parser import FeedParser
+
from ...core.git_store import GitStore
+
from ..main import app
+
from ..utils import (
+
load_config,
+
print_error,
+
print_info,
+
print_success,
+
)
+
+
+
@app.command()
+
def sync(
+
all_users: bool = typer.Option(
+
False, "--all", "-a", help="Sync all users and feeds"
+
),
+
user: Optional[str] = typer.Option(
+
None, "--user", "-u", help="Sync specific user only"
+
),
+
config_file: Optional[Path] = typer.Option(
+
Path("thicket.yaml"), "--config", help="Configuration file path"
+
),
+
dry_run: bool = typer.Option(
+
False, "--dry-run", help="Show what would be synced without making changes"
+
),
+
) -> None:
+
"""Sync feeds and store entries in Git repository."""
+
+
# Load configuration
+
config = load_config(config_file)
+
+
# Initialize Git store
+
git_store = GitStore(config.git_store)
+
+
# Determine which users to sync from git repository
+
users_to_sync = []
+
if all_users:
+
index = git_store._load_index()
+
users_to_sync = list(index.users.values())
+
elif user:
+
user_metadata = git_store.get_user(user)
+
if not user_metadata:
+
print_error(f"User '{user}' not found in git repository")
+
raise typer.Exit(1)
+
users_to_sync = [user_metadata]
+
else:
+
print_error("Specify --all to sync all users or --user to sync a specific user")
+
raise typer.Exit(1)
+
+
if not users_to_sync:
+
print_info("No users configured to sync")
+
return
+
+
# Sync each user
+
total_new_entries = 0
+
total_updated_entries = 0
+
+
for user_metadata in users_to_sync:
+
print_info(f"Syncing user: {user_metadata.username}")
+
+
user_new_entries = 0
+
user_updated_entries = 0
+
+
# Sync each feed for the user
+
for feed_url in track(user_metadata.feeds, description=f"Syncing {user_metadata.username}'s feeds"):
+
try:
+
new_entries, updated_entries = asyncio.run(
+
sync_feed(git_store, user_metadata.username, feed_url, dry_run)
+
)
+
user_new_entries += new_entries
+
user_updated_entries += updated_entries
+
+
except Exception as e:
+
print_error(f"Failed to sync feed {feed_url}: {e}")
+
continue
+
+
print_info(f"User {user_metadata.username}: {user_new_entries} new, {user_updated_entries} updated")
+
total_new_entries += user_new_entries
+
total_updated_entries += user_updated_entries
+
+
# Commit changes if not dry run
+
if not dry_run and (total_new_entries > 0 or total_updated_entries > 0):
+
commit_message = f"Sync feeds: {total_new_entries} new entries, {total_updated_entries} updated"
+
git_store.commit_changes(commit_message)
+
print_success(f"Committed changes: {commit_message}")
+
+
# Summary
+
if dry_run:
+
print_info(f"Dry run complete: would sync {total_new_entries} new entries, {total_updated_entries} updated")
+
else:
+
print_success(f"Sync complete: {total_new_entries} new entries, {total_updated_entries} updated")
+
+
+
async def sync_feed(git_store: GitStore, username: str, feed_url: str, dry_run: bool) -> tuple[int, int]:
+
"""Sync a single feed for a user."""
+
+
parser = FeedParser()
+
+
try:
+
# Fetch and parse feed
+
content = await parser.fetch_feed(feed_url)
+
metadata, entries = parser.parse_feed(content, feed_url)
+
+
new_entries = 0
+
updated_entries = 0
+
+
# Process each entry
+
for entry in entries:
+
try:
+
# Check if entry already exists
+
existing_entry = git_store.get_entry(username, entry.id)
+
+
if existing_entry:
+
# Check if entry has been updated
+
if existing_entry.updated != entry.updated:
+
if not dry_run:
+
git_store.store_entry(username, entry)
+
updated_entries += 1
+
else:
+
# New entry
+
if not dry_run:
+
git_store.store_entry(username, entry)
+
new_entries += 1
+
+
except Exception as e:
+
print_error(f"Failed to process entry {entry.id}: {e}")
+
continue
+
+
return new_entries, updated_entries
+
+
except Exception as e:
+
print_error(f"Failed to sync feed {feed_url}: {e}")
+
return 0, 0
+
</file>
+
+
<file path="src/thicket/models/config.py">
+
"""Configuration models for thicket."""
+
+
from pathlib import Path
+
from typing import Optional
+
+
from pydantic import BaseModel, EmailStr, HttpUrl
+
from pydantic_settings import BaseSettings, SettingsConfigDict
+
+
+
class UserConfig(BaseModel):
+
"""Configuration for a single user and their feeds."""
+
+
username: str
+
feeds: list[HttpUrl]
+
email: Optional[EmailStr] = None
+
homepage: Optional[HttpUrl] = None
+
icon: Optional[HttpUrl] = None
+
display_name: Optional[str] = None
+
+
+
class ThicketConfig(BaseSettings):
+
"""Main configuration for thicket."""
+
+
model_config = SettingsConfigDict(
+
env_prefix="THICKET_",
+
env_file=".env",
+
yaml_file="thicket.yaml",
+
case_sensitive=False,
+
)
+
+
git_store: Path
+
cache_dir: Path
+
users: list[UserConfig] = []
+
</file>
+
+
<file path="src/thicket/cli/commands/links_cmd.py">
+
"""CLI command for extracting and categorizing all outbound links from blog entries."""
+
+
import json
+
import re
+
from pathlib import Path
+
from typing import Optional
+
from urllib.parse import urljoin, urlparse
+
+
import typer
+
from rich.console import Console
+
from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TaskProgressColumn
+
from rich.table import Table
+
+
from ...core.git_store import GitStore
from ...models import AtomEntry
+
from ..main import app
+
from ..utils import load_config, get_tsv_mode
+
+
console = Console()
+
+
+
class LinkData:
+
"""Represents a link found in a blog entry."""
+
+
def __init__(self, url: str, entry_id: str, username: str):
+
self.url = url
+
self.entry_id = entry_id
+
self.username = username
+
+
def to_dict(self) -> dict:
+
"""Convert to dictionary for JSON serialization."""
+
return {
+
"url": self.url,
+
"entry_id": self.entry_id,
+
"username": self.username
+
}
+
+
@classmethod
+
def from_dict(cls, data: dict) -> "LinkData":
+
"""Create from dictionary."""
+
return cls(
+
url=data["url"],
+
entry_id=data["entry_id"],
+
username=data["username"]
+
)
+
+
+
class LinkCategorizer:
+
"""Categorizes links as internal, user, or unknown."""
+
+
def __init__(self, user_domains: dict[str, set[str]]):
+
self.user_domains = user_domains
+
# Create reverse mapping of domain -> username
+
self.domain_to_user = {}
+
for username, domains in user_domains.items():
+
for domain in domains:
+
self.domain_to_user[domain] = username
+
+
def categorize_url(self, url: str, source_username: str) -> tuple[str, Optional[str]]:
+
"""
+
Categorize a URL as 'internal', 'user', or 'unknown'.
+
Returns (category, target_username).
+
"""
+
try:
+
parsed = urlparse(url)
+
domain = parsed.netloc.lower()
+
+
# Check if it's a link to the same user's domain (internal)
+
if domain in self.user_domains.get(source_username, set()):
+
return "internal", source_username
+
+
# Check if it's a link to another user's domain
+
if domain in self.domain_to_user:
+
return "user", self.domain_to_user[domain]
+
+
# Everything else is unknown
+
return "unknown", None
+
+
except Exception:
+
return "unknown", None
+
+
+
class LinkExtractor:
+
"""Extracts and resolves links from blog entries."""
+
+
def __init__(self):
+
# Pattern for extracting links from HTML
+
self.link_pattern = re.compile(r'<a[^>]+href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)
+
self.url_pattern = re.compile(r'https?://[^\s<>"]+')
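# NOTE: link_pattern matches only double-quoted href attributes; single-quoted
# or unquoted hrefs are skipped. url_pattern is currently unused in this class.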
+
+
def extract_links_from_html(self, html_content: str, base_url: str) -> list[tuple[str, str]]:
+
"""Extract all links from HTML content and resolve them against base URL."""
+
links = []
+
+
# Extract links from <a> tags
+
for match in self.link_pattern.finditer(html_content):
+
url = match.group(1)
+
text = re.sub(r'<[^>]+>', '', match.group(2)).strip() # Remove HTML tags from link text
+
+
# Resolve relative URLs against base URL
+
resolved_url = urljoin(base_url, url)
+
links.append((resolved_url, text))
+
+
return links
+
+
+
def extract_links_from_entry(self, entry: AtomEntry, username: str, base_url: str) -> list[LinkData]:
+
"""Extract all links from a blog entry."""
+
links = []
+
+
# Combine all text content for analysis
+
content_to_search = []
+
if entry.content:
+
content_to_search.append(entry.content)
+
if entry.summary:
+
content_to_search.append(entry.summary)
+
+
for content in content_to_search:
+
extracted_links = self.extract_links_from_html(content, base_url)
+
+
for url, link_text in extracted_links:
+
# Skip empty URLs
+
if not url or url.startswith('#'):
+
continue
+
+
link_data = LinkData(
+
url=url,
+
entry_id=entry.id,
+
username=username
+
)
+
+
links.append(link_data)
+
+
return links
+
+
+
@app.command()
+
def links(
+
config_file: Optional[Path] = typer.Option(
+
Path("thicket.yaml"),
+
"--config",
+
"-c",
+
help="Path to configuration file",
+
),
+
output_file: Optional[Path] = typer.Option(
+
None,
+
"--output",
+
"-o",
+
help="Path to output unified links file (default: links.json in git store)",
+
),
+
verbose: bool = typer.Option(
+
False,
+
"--verbose",
+
"-v",
+
help="Show detailed progress information",
+
),
+
) -> None:
+
"""Extract and categorize all outbound links from blog entries.
+
+
This command analyzes all blog entries to extract outbound links,
+
resolve them properly with respect to the feed's base URL, and
+
categorize them as internal, user, or unknown links.
+
+
Creates a unified links.json file containing all link data.
+
"""
+
try:
+
# Load configuration
+
config = load_config(config_file)
+
+
# Initialize Git store
+
git_store = GitStore(config.git_store)
+
+
# Build user domain mapping
+
if verbose:
+
console.print("Building user domain mapping...")
+
+
index = git_store._load_index()
+
user_domains = {}
+
+
for username, user_metadata in index.users.items():
+
domains = set()
+
+
# Add domains from feeds
+
for feed_url in user_metadata.feeds:
+
domain = urlparse(feed_url).netloc.lower()
+
if domain:
+
domains.add(domain)
+
+
# Add domain from homepage
+
if user_metadata.homepage:
+
domain = urlparse(str(user_metadata.homepage)).netloc.lower()
+
if domain:
+
domains.add(domain)
+
+
user_domains[username] = domains
+
+
if verbose:
+
console.print(f"Found {len(user_domains)} users with {sum(len(d) for d in user_domains.values())} total domains")
+
+
# Initialize components
+
link_extractor = LinkExtractor()
+
categorizer = LinkCategorizer(user_domains)
+
+
# Get all users
+
users = list(index.users.keys())
+
+
if not users:
+
console.print("[yellow]No users found in Git store[/yellow]")
+
raise typer.Exit(0)
+
+
# Process all entries
+
all_links = []
+
link_categories = {"internal": [], "user": [], "unknown": []}
+
link_dict = {} # Dictionary with link URL as key, maps to list of atom IDs
+
reverse_dict = {} # Dictionary with atom ID as key, maps to list of URLs
+
+
with Progress(
+
SpinnerColumn(),
+
TextColumn("[progress.description]{task.description}"),
+
BarColumn(),
+
TaskProgressColumn(),
+
console=console,
+
) as progress:
+
+
# Count total entries first
+
counting_task = progress.add_task("Counting entries...", total=len(users))
+
total_entries = 0
+
+
for username in users:
+
entries = git_store.list_entries(username)
+
total_entries += len(entries)
+
progress.advance(counting_task)
+
+
progress.remove_task(counting_task)
+
+
# Process entries
+
processing_task = progress.add_task(
+
f"Processing {total_entries} entries...",
+
total=total_entries
+
)
+
+
for username in users:
+
entries = git_store.list_entries(username)
+
user_metadata = index.users[username]
+
+
# Get base URL for this user (use first feed URL)
+
base_url = str(user_metadata.feeds[0]) if user_metadata.feeds else "https://example.com"
+
+
for entry in entries:
+
# Extract links from this entry
+
entry_links = link_extractor.extract_links_from_entry(entry, username, base_url)
+
+
# Track unique links per entry
+
entry_urls_seen = set()
+
+
# Categorize each link
+
for link_data in entry_links:
+
# Skip if we've already seen this URL in this entry
+
if link_data.url in entry_urls_seen:
+
continue
+
entry_urls_seen.add(link_data.url)
+
+
category, target_username = categorizer.categorize_url(link_data.url, username)
+
+
# Add to link dictionary (URL as key, maps to list of atom IDs)
+
if link_data.url not in link_dict:
+
link_dict[link_data.url] = []
+
if link_data.entry_id not in link_dict[link_data.url]:
+
link_dict[link_data.url].append(link_data.entry_id)
+
+
# Also add to reverse mapping (atom ID -> list of URLs)
+
if link_data.entry_id not in reverse_dict:
+
reverse_dict[link_data.entry_id] = []
+
if link_data.url not in reverse_dict[link_data.entry_id]:
+
reverse_dict[link_data.entry_id].append(link_data.url)
+
+
# Add category info to link data for categories tracking
+
link_info = link_data.to_dict()
+
link_info["category"] = category
+
link_info["target_username"] = target_username
+
+
all_links.append(link_info)
+
link_categories[category].append(link_info)
+
+
progress.advance(processing_task)
+
+
if verbose and entry_links:
+
console.print(f" Found {len(entry_links)} links in {username}:{entry.title[:50]}...")
+
+
# Determine output path
+
if output_file:
+
output_path = output_file
+
else:
+
output_path = config.git_store / "links.json"
+
+
# Save all extracted links (not just filtered ones)
+
if verbose:
+
console.print("Preparing output data...")
+
+
# Build a set of all URLs that correspond to posts in the git database
+
registered_urls = set()
+
+
# Get all entries from all users and build URL mappings
+
for username in users:
+
entries = git_store.list_entries(username)
+
user_metadata = index.users[username]
+
+
for entry in entries:
+
# Try to match entry URLs with extracted links
+
if hasattr(entry, 'link') and entry.link:
+
registered_urls.add(str(entry.link))
+
+
# Also check entry alternate links if they exist
+
if hasattr(entry, 'links') and entry.links:
+
for link in entry.links:
+
if hasattr(link, 'href') and link.href:
+
registered_urls.add(str(link.href))
+
+
# Build unified structure with metadata
+
unified_links = {}
+
reverse_mapping = {}
+
+
for url, entry_ids in link_dict.items():
+
unified_links[url] = {
+
"referencing_entries": entry_ids
+
}
+
+
# Find target username if this is a tracked post
+
if url in registered_urls:
+
for username in users:
+
user_domains_set = user_domains.get(username, set())
+
if any(domain in url for domain in user_domains_set):
+
unified_links[url]["target_username"] = username
+
break
+
+
# Build reverse mapping
+
for entry_id in entry_ids:
+
if entry_id not in reverse_mapping:
+
reverse_mapping[entry_id] = []
+
if url not in reverse_mapping[entry_id]:
+
reverse_mapping[entry_id].append(url)
+
+
# Create unified output data
+
output_data = {
+
"links": unified_links,
+
"reverse_mapping": reverse_mapping,
+
"user_domains": {k: list(v) for k, v in user_domains.items()}
+
}
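# Illustrative shape of the saved file ("alice" is a hypothetical user;
# target_username appears only for URLs matching a tracked user's domain):
# {
#   "links": {"<url>": {"referencing_entries": ["<entry-id>"], "target_username": "alice"}},
#   "reverse_mapping": {"<entry-id>": ["<url>"]},
#   "user_domains": {"alice": ["alice.example"]}
# }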
+
+
if verbose:
+
console.print(f"Found {len(registered_urls)} registered post URLs")
+
console.print(f"Found {len(link_dict)} total links, {sum(1 for link in unified_links.values() if 'target_username' in link)} tracked posts")
+
+
# Save unified data
+
with open(output_path, "w") as f:
+
json.dump(output_data, f, indent=2, default=str)
+
+
# Show summary
+
if not get_tsv_mode():
+
console.print("\n[green]โœ“ Links extraction completed successfully[/green]")
+
+
# Create summary table or TSV output
+
if get_tsv_mode():
+
print("Category\tCount\tDescription")
+
print(f"Internal\t{len(link_categories['internal'])}\tLinks to same user's domain")
+
print(f"User\t{len(link_categories['user'])}\tLinks to other tracked users")
+
print(f"Unknown\t{len(link_categories['unknown'])}\tLinks to external sites")
+
print(f"Total Extracted\t{len(all_links)}\tAll extracted links")
+
print(f"Saved to Output\t{len(output_data['links'])}\tLinks saved to output file")
+
print(f"Cross-references\t{sum(1 for link in unified_links.values() if 'target_username' in link)}\tLinks to registered posts only")
+
else:
+
table = Table(title="Links Summary")
+
table.add_column("Category", style="cyan")
+
table.add_column("Count", style="green")
+
table.add_column("Description", style="white")
+
+
table.add_row("Internal", str(len(link_categories["internal"])), "Links to same user's domain")
+
table.add_row("User", str(len(link_categories["user"])), "Links to other tracked users")
+
table.add_row("Unknown", str(len(link_categories["unknown"])), "Links to external sites")
+
table.add_row("Total Extracted", str(len(all_links)), "All extracted links")
+
table.add_row("Saved to Output", str(len(output_data['links'])), "Links saved to output file")
+
table.add_row("Cross-references", str(sum(1 for link in unified_links.values() if 'target_username' in link)), "Links to registered posts only")
+
+
console.print(table)
+
+
# Show user links if verbose
+
if verbose and link_categories["user"]:
+
if get_tsv_mode():
+
print("User Link Source\tUser Link Target\tLink Count")
+
user_link_counts = {}
+
+
for link in link_categories["user"]:
+
key = f"{link['username']} -> {link['target_username']}"
+
user_link_counts[key] = user_link_counts.get(key, 0) + 1
+
+
for link_pair, count in sorted(user_link_counts.items(), key=lambda x: x[1], reverse=True)[:10]:
+
source, target = link_pair.split(" -> ")
+
print(f"{source}\t{target}\t{count}")
+
else:
+
console.print("\n[bold]User-to-user links:[/bold]")
+
user_link_counts = {}
+
+
for link in link_categories["user"]:
+
key = f"{link['username']} -> {link['target_username']}"
+
user_link_counts[key] = user_link_counts.get(key, 0) + 1
+
+
for link_pair, count in sorted(user_link_counts.items(), key=lambda x: x[1], reverse=True)[:10]:
+
console.print(f" {link_pair}: {count} links")
+
+
if not get_tsv_mode():
+
console.print(f"\nUnified links data saved to: {output_path}")
+
+
except Exception as e:
+
console.print(f"[red]Error extracting links: {e}[/red]")
+
if verbose:
+
console.print_exception()
+
raise typer.Exit(1) from e
+
</file>
+
+
<file path="src/thicket/cli/commands/list_cmd.py">
+
"""List command for thicket."""
+
+
import re
+
from pathlib import Path
+
from typing import Optional
+
+
import typer
+
from rich.table import Table
+
+
from ...core.git_store import GitStore
+
from ..main import app
+
from ..utils import (
+
console,
+
load_config,
+
print_error,
+
print_feeds_table,
+
print_feeds_table_from_git,
+
print_info,
+
print_users_table,
+
print_users_table_from_git,
+
print_entries_tsv,
+
get_tsv_mode,
+
)
+
+
+
@app.command("list")
+
def list_command(
+
what: str = typer.Argument(..., help="What to list: 'users', 'feeds', 'entries'"),
+
user: Optional[str] = typer.Option(
+
None, "--user", "-u", help="Filter by specific user"
+
),
+
limit: Optional[int] = typer.Option(
+
None, "--limit", "-l", help="Limit number of results"
+
),
+
config_file: Optional[Path] = typer.Option(
+
Path("thicket.yaml"), "--config", help="Configuration file path"
+
),
+
) -> None:
+
"""List users, feeds, or entries."""
+
+
# Load configuration
+
config = load_config(config_file)
+
+
# Initialize Git store
+
git_store = GitStore(config.git_store)
+
+
if what == "users":
+
list_users(git_store)
+
elif what == "feeds":
+
list_feeds(git_store, user)
+
elif what == "entries":
+
list_entries(git_store, user, limit)
+
else:
+
print_error(f"Unknown list type: {what}")
+
print_error("Use 'users', 'feeds', or 'entries'")
+
raise typer.Exit(1)
+
+
+
def list_users(git_store: GitStore) -> None:
+
"""List all users."""
+
index = git_store._load_index()
+
users = list(index.users.values())
+
+
if not users:
+
print_info("No users configured")
+
return
+
+
print_users_table_from_git(users)
+
+
+
def list_feeds(git_store: GitStore, username: Optional[str] = None) -> None:
+
"""List feeds, optionally filtered by user."""
+
if username:
+
user = git_store.get_user(username)
+
if not user:
+
print_error(f"User '{username}' not found")
+
raise typer.Exit(1)
+
+
if not user.feeds:
+
print_info(f"No feeds configured for user '{username}'")
+
return
+
+
print_feeds_table_from_git(git_store, username)
+
+
+
def list_entries(git_store: GitStore, username: Optional[str] = None, limit: Optional[int] = None) -> None:
+
"""List entries, optionally filtered by user."""
+
+
if username:
+
# List entries for specific user
+
user = git_store.get_user(username)
+
if not user:
+
print_error(f"User '{username}' not found")
+
raise typer.Exit(1)
+
+
entries = git_store.list_entries(username, limit)
+
if not entries:
+
print_info(f"No entries found for user '{username}'")
+
return
+
+
print_entries_table([entries], [username])
+
+
else:
+
# List entries for all users
+
all_entries = []
+
all_usernames = []
+
+
index = git_store._load_index()
+
for user in index.users.values():
+
entries = git_store.list_entries(user.username, limit)
+
if entries:
+
all_entries.append(entries)
+
all_usernames.append(user.username)
+
+
if not all_entries:
+
print_info("No entries found")
+
return
+
+
print_entries_table(all_entries, all_usernames)
+
+
+
def _clean_html_content(content: Optional[str]) -> str:
+
"""Clean HTML content for display in table."""
+
if not content:
+
return ""
+
+
# Remove HTML tags
+
clean_text = re.sub(r'<[^>]+>', ' ', content)
+
# Replace multiple whitespace with single space
+
clean_text = re.sub(r'\s+', ' ', clean_text)
+
# Strip and limit length
+
clean_text = clean_text.strip()
+
if len(clean_text) > 100:
+
clean_text = clean_text[:97] + "..."
+
+
return clean_text
+
+
+
def print_entries_table(entries_by_user: list[list], usernames: list[str]) -> None:
+
"""Print a table of entries."""
+
if get_tsv_mode():
+
print_entries_tsv(entries_by_user, usernames)
+
return
+
+
table = Table(title="Feed Entries")
+
table.add_column("User", style="cyan", no_wrap=True)
+
table.add_column("Title", style="bold")
+
table.add_column("Updated", style="blue")
+
table.add_column("URL", style="green")
+
+
# Combine all entries with usernames
+
all_entries = []
+
for entries, username in zip(entries_by_user, usernames):
+
for entry in entries:
+
all_entries.append((username, entry))
+
+
# Sort by updated time (newest first)
+
all_entries.sort(key=lambda x: x[1].updated, reverse=True)
+
+
for username, entry in all_entries:
+
# Format updated time
+
updated_str = entry.updated.strftime("%Y-%m-%d %H:%M")
+
+
# Truncate title if too long
+
title = entry.title
+
if len(title) > 50:
+
title = title[:47] + "..."
+
+
table.add_row(
+
username,
+
title,
+
updated_str,
+
str(entry.link),
+
)
+
+
console.print(table)
+
</file>
+
+
<file path="src/thicket/cli/main.py">
+
"""Main CLI application using Typer."""
+
+
import typer
+
from rich.console import Console
+
+
from .. import __version__
+
+
app = typer.Typer(
+
name="thicket",
+
help="A CLI tool for persisting Atom/RSS feeds in Git repositories",
+
no_args_is_help=True,
+
rich_markup_mode="rich",
+
)
+
+
console = Console()
+
+
# Global state for TSV output mode
+
tsv_mode = False
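# get_tsv_mode() in thicket.cli.utils presumably reads this module-level flag;
# commands use it to switch between Rich tables and plain TSV output.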
+
+
+
def version_callback(value: bool) -> None:
+
"""Show version and exit."""
+
if value:
+
console.print(f"thicket version {__version__}")
+
raise typer.Exit()
+
+
+
@app.callback()
+
def main(
+
version: bool = typer.Option(
+
None,
+
"--version",
+
"-v",
+
help="Show the version and exit",
+
callback=version_callback,
+
is_eager=True,
+
),
+
tsv: bool = typer.Option(
+
False,
+
"--tsv",
+
help="Output in tab-separated values format without truncation",
+
),
+
) -> None:
+
"""Thicket: A CLI tool for persisting Atom/RSS feeds in Git repositories."""
+
global tsv_mode
+
tsv_mode = tsv
+
+
+
# Import commands to register them
+
from .commands import add, duplicates, generate, index_cmd, info_cmd, init, links_cmd, list_cmd, sync
+
+
if __name__ == "__main__":
+
app()
+
</file>
+
+
<file path="src/thicket/core/git_store.py">
+
"""Git repository operations for thicket."""
+
+
import json
+
from datetime import datetime
+
from pathlib import Path
+
from typing import Optional
+
+
import git
+
from git import Repo
+
+
from ..models import AtomEntry, DuplicateMap, GitStoreIndex, UserMetadata
+
+
+
class GitStore:
+
"""Manages the Git repository for storing feed entries."""
+
+
def __init__(self, repo_path: Path):
+
"""Initialize the Git store."""
+
self.repo_path = repo_path
+
self.repo: Optional[Repo] = None
+
self._ensure_repo()
+
+
def _ensure_repo(self) -> None:
+
"""Ensure the Git repository exists and is initialized."""
+
if not self.repo_path.exists():
+
self.repo_path.mkdir(parents=True, exist_ok=True)
+
+
try:
+
self.repo = Repo(self.repo_path)
+
except git.InvalidGitRepositoryError:
+
# Initialize new repository
+
self.repo = Repo.init(self.repo_path)
+
self._create_initial_structure()
+
+
def _create_initial_structure(self) -> None:
+
"""Create initial Git store structure."""
+
# Create index.json
+
index = GitStoreIndex(
+
created=datetime.now(),
+
last_updated=datetime.now(),
+
)
+
self._save_index(index)
+
+
# Create duplicates.json
+
duplicates = DuplicateMap()
+
self._save_duplicates(duplicates)
+
+
# Create initial commit
+
self.repo.index.add(["index.json", "duplicates.json"])
+
self.repo.index.commit("Initial thicket repository structure")
+
+
def _save_index(self, index: GitStoreIndex) -> None:
+
"""Save the index to index.json."""
+
index_path = self.repo_path / "index.json"
+
with open(index_path, "w") as f:
+
json.dump(index.model_dump(mode="json", exclude_none=True), f, indent=2, default=str)
+
+
def _load_index(self) -> GitStoreIndex:
+
"""Load the index from index.json."""
+
index_path = self.repo_path / "index.json"
+
if not index_path.exists():
+
return GitStoreIndex(
+
created=datetime.now(),
+
last_updated=datetime.now(),
+
)
+
+
with open(index_path) as f:
+
data = json.load(f)
+
+
return GitStoreIndex(**data)
+
+
def _save_duplicates(self, duplicates: DuplicateMap) -> None:
+
"""Save duplicates map to duplicates.json."""
+
duplicates_path = self.repo_path / "duplicates.json"
+
with open(duplicates_path, "w") as f:
+
json.dump(duplicates.model_dump(exclude_none=True), f, indent=2)
+
+
def _load_duplicates(self) -> DuplicateMap:
+
"""Load duplicates map from duplicates.json."""
+
duplicates_path = self.repo_path / "duplicates.json"
+
if not duplicates_path.exists():
+
return DuplicateMap()
+
+
with open(duplicates_path) as f:
+
data = json.load(f)
+
+
return DuplicateMap(**data)
+
+
def add_user(self, username: str, display_name: Optional[str] = None,
+
email: Optional[str] = None, homepage: Optional[str] = None,
+
icon: Optional[str] = None, feeds: Optional[list[str]] = None) -> UserMetadata:
+
"""Add a new user to the Git store."""
+
index = self._load_index()
+
+
# Create user directory
+
user_dir = self.repo_path / username
+
user_dir.mkdir(exist_ok=True)
+
+
# Create user metadata
+
user_metadata = UserMetadata(
+
username=username,
+
display_name=display_name,
+
email=email,
+
homepage=homepage,
+
icon=icon,
+
feeds=feeds or [],
+
directory=username,
+
created=datetime.now(),
+
last_updated=datetime.now(),
+
)
+
+
+
# Update index
+
index.add_user(user_metadata)
+
self._save_index(index)
+
+
return user_metadata
+
+
def get_user(self, username: str) -> Optional[UserMetadata]:
+
"""Get user metadata by username."""
+
index = self._load_index()
+
return index.get_user(username)
+
+
def update_user(self, username: str, **kwargs) -> bool:
+
"""Update user metadata."""
+
index = self._load_index()
+
user = index.get_user(username)
+
+
if not user:
+
return False
+
+
# Update user metadata
+
for key, value in kwargs.items():
+
if hasattr(user, key) and value is not None:
+
setattr(user, key, value)
+
+
user.update_timestamp()
+
+
+
# Update index
+
index.add_user(user)
+
self._save_index(index)
+
+
return True
+
+
def store_entry(self, username: str, entry: AtomEntry) -> bool:
+
"""Store an entry in the user's directory."""
+
user = self.get_user(username)
+
if not user:
+
return False
+
+
# Sanitize entry ID for filename
+
from .feed_parser import FeedParser
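# Imported locally rather than at module top, presumably to avoid a circular
# import between git_store and feed_parser.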
+
parser = FeedParser()
+
safe_id = parser.sanitize_entry_id(entry.id)
+
+
# Create entry file
+
user_dir = self.repo_path / user.directory
+
entry_path = user_dir / f"{safe_id}.json"
+
+
# Check if entry already exists
+
entry_exists = entry_path.exists()
+
+
# Save entry
+
with open(entry_path, "w") as f:
+
json.dump(entry.model_dump(mode="json", exclude_none=True), f, indent=2, default=str)
+
+
# Update user metadata if new entry
+
if not entry_exists:
+
index = self._load_index()
+
index.update_entry_count(username, 1)
+
self._save_index(index)
+
+
return True
+
+
def get_entry(self, username: str, entry_id: str) -> Optional[AtomEntry]:
+
"""Get an entry by username and entry ID."""
+
user = self.get_user(username)
+
if not user:
+
return None
+
+
# Sanitize entry ID
+
from .feed_parser import FeedParser
+
parser = FeedParser()
+
safe_id = parser.sanitize_entry_id(entry_id)
+
+
entry_path = self.repo_path / user.directory / f"{safe_id}.json"
+
if not entry_path.exists():
+
return None
+
+
with open(entry_path) as f:
+
data = json.load(f)
+
+
return AtomEntry(**data)
+
+
def list_entries(self, username: str, limit: Optional[int] = None) -> list[AtomEntry]:
+
"""List entries for a user."""
+
user = self.get_user(username)
+
if not user:
+
return []
+
+
user_dir = self.repo_path / user.directory
+
if not user_dir.exists():
+
return []
+
+
entries = []
+
entry_files = sorted(user_dir.glob("*.json"), key=lambda p: p.stat().st_mtime, reverse=True)
+
+
+
if limit:
+
entry_files = entry_files[:limit]
+
+
for entry_file in entry_files:
+
try:
+
with open(entry_file) as f:
+
data = json.load(f)
+
entries.append(AtomEntry(**data))
+
except Exception:
+
# Skip invalid entries
+
continue
+
+
return entries
+
+
def get_duplicates(self) -> DuplicateMap:
+
"""Get the duplicates map."""
+
return self._load_duplicates()
+
+
def add_duplicate(self, duplicate_id: str, canonical_id: str) -> None:
+
"""Add a duplicate mapping."""
+
duplicates = self._load_duplicates()
+
duplicates.add_duplicate(duplicate_id, canonical_id)
+
self._save_duplicates(duplicates)
+
+
def remove_duplicate(self, duplicate_id: str) -> bool:
+
"""Remove a duplicate mapping."""
+
duplicates = self._load_duplicates()
+
result = duplicates.remove_duplicate(duplicate_id)
+
self._save_duplicates(duplicates)
+
return result
+
+
def commit_changes(self, message: str) -> None:
+
"""Commit all changes to the Git repository."""
+
if not self.repo:
+
return
+
+
# Add all changes
+
self.repo.git.add(A=True)
+
+
# Check if there are changes to commit
+
if self.repo.index.diff("HEAD"):
+
self.repo.index.commit(message)
+
+
def get_stats(self) -> dict:
+
"""Get statistics about the Git store."""
+
index = self._load_index()
+
duplicates = self._load_duplicates()
+
+
return {
+
"total_users": len(index.users),
+
"total_entries": index.total_entries,
+
"total_duplicates": len(duplicates.duplicates),
+
"last_updated": index.last_updated,
+
"repository_size": sum(f.stat().st_size for f in self.repo_path.rglob("*") if f.is_file()),
+
}
+
+
def search_entries(self, query: str, username: Optional[str] = None,
+
limit: Optional[int] = None) -> list[tuple[str, AtomEntry]]:
+
"""Search entries by content."""
+
results = []
+
+
# Get users to search
+
index = self._load_index()
+
users = [index.get_user(username)] if username else list(index.users.values())
+
users = [u for u in users if u is not None]
+
+
for user in users:
+
user_dir = self.repo_path / user.directory
+
if not user_dir.exists():
+
continue
+
+
entry_files = user_dir.glob("*.json")
+
+
for entry_file in entry_files:
+
try:
+
with open(entry_file) as f:
+
data = json.load(f)
+
+
entry = AtomEntry(**data)
+
+
# Simple text search in title, summary, and content
+
searchable_text = " ".join(filter(None, [
+
entry.title,
+
entry.summary or "",
+
entry.content or "",
+
])).lower()
+
+
if query.lower() in searchable_text:
+
results.append((user.username, entry))
+
+
if limit and len(results) >= limit:
+
return results
+
+
except Exception:
+
# Skip invalid entries
+
continue
+
+
# Sort by updated time (newest first)
+
results.sort(key=lambda x: x[1].updated, reverse=True)
+
+
return results[:limit] if limit else results
+
</file>
+
+
<file path="ARCH.md">
+
# Thicket Architecture Design
+
+
## Overview
+
Thicket is a modern CLI tool for persisting Atom/RSS feeds in a Git repository, designed to enable distributed weblog comment structures.
+
+
## Technology Stack
+
+
### Core Libraries
+
+
#### CLI Framework
+
- **Typer** (0.15.x) - Modern CLI framework with type hints
+
- **Rich** (13.x) - Beautiful terminal output, progress bars, and tables
+
- **prompt-toolkit** - Interactive prompts when needed
+
+
#### Feed Processing
+
- **feedparser** (6.0.11) - Universal feed parser supporting RSS 0.9x, RSS 1.0, RSS 2.0, CDF, Atom 0.3, and Atom 1.0
+
- Alternative: **atoma** for stricter Atom/RSS parsing with JSON feed support
+
- Alternative: **fastfeedparser** for high-performance parsing (reportedly up to 10x faster)
+
+
#### Git Integration
+
- **GitPython** (3.1.44) - High-level git operations, requires git CLI
+
- Alternative: **pygit2** (1.18.0) - Direct libgit2 bindings, better for authentication
+
+
#### HTTP Client
+
- **httpx** (0.28.x) - Modern async/sync HTTP client with connection pooling
+
- **aiohttp** (3.11.x) - For async-only operations if needed
+
+
#### Configuration & Data Models
+
- **pydantic** (2.11.x) - Data validation and settings management
+
- **pydantic-settings** (2.10.x) - Configuration file handling with env var support
+
+
#### Utilities
+
- **pendulum** (3.x) - Better datetime handling
+
- **bleach** (6.x) - HTML sanitization for feed content
+
- **platformdirs** (4.x) - Cross-platform directory paths
+
+
## Project Structure
+
+
```
+
thicket/
+
├── pyproject.toml           # Modern Python packaging
+
├── README.md                # Project documentation
+
├── ARCH.md                  # This file
+
├── CLAUDE.md                # Project instructions
+
├── .gitignore
+
├── src/
+
│   └── thicket/
+
│       ├── __init__.py
+
│       ├── __main__.py      # Entry point for `python -m thicket`
+
│       ├── cli/             # CLI commands and interface
+
│       │   ├── __init__.py
+
│       │   ├── main.py      # Main CLI app with Typer
+
│       │   ├── commands/    # Subcommands
+
│       │   │   ├── __init__.py
+
│       │   │   ├── init.py        # Initialize git store
+
│       │   │   ├── add.py         # Add users and feeds
+
│       │   │   ├── sync.py        # Sync feeds
+
│       │   │   ├── list_cmd.py    # List users/feeds
+
│       │   │   ├── duplicates.py  # Manage duplicate entries
+
│       │   │   ├── links_cmd.py   # Extract and categorize links
+
│       │   │   └── index_cmd.py   # Build reference index and show threads
+
│       │   └── utils.py     # CLI utilities (progress, formatting)
+
│       ├── core/            # Core business logic
+
│       │   ├── __init__.py
+
│       │   ├── feed_parser.py       # Feed parsing and normalization
+
│       │   ├── git_store.py         # Git repository operations
+
│       │   └── reference_parser.py  # Link extraction and threading
+
│       ├── models/          # Pydantic data models
+
│       │   ├── __init__.py
+
│       │   ├── config.py    # Configuration models
+
│       │   ├── feed.py      # Feed/Entry models
+
│       │   └── user.py      # User metadata models
+
│       └── utils/           # Shared utilities
+
│           └── __init__.py
+
├── tests/
+
│   ├── __init__.py
+
│   ├── conftest.py          # pytest configuration
+
│   ├── test_feed_parser.py
+
│   ├── test_git_store.py
+
│   └── fixtures/            # Test data
+
│       └── feeds/
+
└── docs/
+
    └── examples/            # Example configurations
+
```
+
+
## Data Models
+
+
### Configuration File (YAML/TOML)
+
```python
+
class ThicketConfig(BaseSettings):
+
git_store: Path # Git repository location
+
cache_dir: Path # Cache directory
+
users: list[UserConfig]
+
+
model_config = SettingsConfigDict(
+
env_prefix="THICKET_",
+
env_file=".env",
+
yaml_file="thicket.yaml"
+
)
+
+
class UserConfig(BaseModel):
+
username: str
+
feeds: list[HttpUrl]
+
email: Optional[EmailStr] = None
+
homepage: Optional[HttpUrl] = None
+
icon: Optional[HttpUrl] = None
+
display_name: Optional[str] = None
+
```
+
+
### Feed Storage Format
+
```python
+
class AtomEntry(BaseModel):
+
id: str # Original Atom ID
+
title: str
+
link: HttpUrl
+
updated: datetime
+
published: Optional[datetime] = None
+
summary: Optional[str] = None
+
content: Optional[str] = None  # Full body content from Atom entry
+
content_type: Optional[str] = "html"  # text, html, xhtml
+
author: Optional[dict] = None
+
categories: list[str] = []
+
rights: Optional[str] = None # Copyright info
+
source: Optional[str] = None # Source feed URL
+
# Additional Atom fields preserved during RSS->Atom conversion
+
+
model_config = ConfigDict(
+
json_encoders={
+
datetime: lambda v: v.isoformat()
+
}
+
)
+
+
class DuplicateMap(BaseModel):
+
"""Maps duplicate entry IDs to canonical entry IDs"""
+
duplicates: dict[str, str] = {} # duplicate_id -> canonical_id
+
comment: str = "Entry IDs that map to the same canonical content"
+
+
def add_duplicate(self, duplicate_id: str, canonical_id: str) -> None:
+
"""Add a duplicate mapping"""
+
self.duplicates[duplicate_id] = canonical_id
+
+
def remove_duplicate(self, duplicate_id: str) -> bool:
+
"""Remove a duplicate mapping. Returns True if existed."""
+
return self.duplicates.pop(duplicate_id, None) is not None
+
+
def get_canonical(self, entry_id: str) -> str:
+
"""Get canonical ID for an entry (returns original if not duplicate)"""
+
return self.duplicates.get(entry_id, entry_id)
+
+
def is_duplicate(self, entry_id: str) -> bool:
+
"""Check if entry ID is marked as duplicate"""
+
return entry_id in self.duplicates
+
```
+
+
## Git Repository Structure
+
```
+
git-store/
+
โ”œโ”€โ”€ index.json # User directory index
+
โ”œโ”€โ”€ duplicates.json # Manual curation of duplicate entries
+
โ”œโ”€โ”€ links.json # Unified links, references, and mapping data
+
โ”œโ”€โ”€ user1/
+
โ”‚ โ”œโ”€โ”€ entry_id_1.json # Sanitized entry files
+
โ”‚ โ”œโ”€โ”€ entry_id_2.json
+
โ”‚ โ””โ”€โ”€ ...
+
โ””โ”€โ”€ user2/
+
โ””โ”€โ”€ ...
+
```
+
+
## Key Design Decisions
+
+
### 1. Feed Normalization & Auto-Discovery
+
- All RSS feeds converted to Atom format before storage
+
- Preserves maximum metadata during conversion
+
- Sanitizes HTML content to prevent XSS
+
- **Auto-discovery**: Extracts user metadata from feed during `add user` command
+
+
### 2. ID Sanitization
+
- Consistent algorithm to convert Atom IDs to safe filenames (sketched after this list)
+
- Handles edge cases (very long IDs, special characters)
+
- Maintains reversibility where possible
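A minimal sketch of the idea, assuming a conservative character allowlist and a hash suffix for over-long IDs (the regex and the 100-character cutoff are illustrative assumptions; the shipped implementation is `FeedParser.sanitize_entry_id`):

```python
import hashlib
import re


def sanitize_entry_id(entry_id: str, max_length: int = 100) -> str:
    """Turn an Atom ID into a filesystem-safe filename (illustrative sketch)."""
    # Anything outside a conservative allowlist becomes "_"
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", entry_id)
    # Very long IDs are truncated, with a short hash to keep names unique
    if len(safe) > max_length:
        digest = hashlib.sha256(entry_id.encode()).hexdigest()[:12]
        safe = f"{safe[:max_length]}-{digest}"
    return safe
```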
+
+
### 3. Git Operations
+
- Uses GitPython for simplicity (no authentication required)
+
- Single main branch for all users and entries
+
- Atomic commits per sync operation
+
- Meaningful commit messages with feed update summaries
+
- Preserves complete history: entries are never deleted, even if they disappear from feeds
+
+
### 4. Caching Strategy
+
- HTTP caching with Last-Modified/ETag support (see the sketch after this list)
+
- Local cache of parsed feeds with TTL
+
- Cache invalidation on configuration changes
+
- Git store serves as permanent historical archive beyond feed depth limits
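The conditional-request part of this strategy could look like the following sketch, assuming a simple dict-shaped cache (the `etag`/`last_modified` keys are illustrative, not the actual cache layout):

```python
from typing import Optional

import httpx


def fetch_if_modified(url: str, cache: dict) -> Optional[bytes]:
    """Fetch a feed only if it changed since the cached copy (illustrative sketch)."""
    headers = {}
    cached = cache.get(url, {})
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]

    response = httpx.get(url, headers=headers, follow_redirects=True)
    if response.status_code == 304:
        return None  # Not modified: the caller reuses the cached body
    response.raise_for_status()
    cache[url] = {
        "etag": response.headers.get("ETag"),
        "last_modified": response.headers.get("Last-Modified"),
    }
    return response.content
```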
+
+
### 5. Error Handling
+
- Graceful handling of feed parsing errors
+
- Retry logic for network failures (see the sketch after this list)
+
- Clear error messages with recovery suggestions
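The retry idea, sketched with exponential backoff (the attempt count and delays are illustrative assumptions):

```python
import asyncio

import httpx


async def fetch_with_retries(client: httpx.AsyncClient, url: str, attempts: int = 3) -> httpx.Response:
    """Retry transient network failures with exponential backoff (illustrative sketch)."""
    for attempt in range(1, attempts + 1):
        try:
            response = await client.get(url)
            response.raise_for_status()  # HTTP status errors propagate immediately
            return response
        except httpx.TransportError:
            if attempt == attempts:
                raise  # Out of retries: surface a clear error to the caller
            await asyncio.sleep(2 ** attempt)  # Back off: 2s, 4s, ...
    raise AssertionError("unreachable for attempts >= 1")
```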
+
+
## CLI Command Structure
+
+
```bash
+
# Initialize a new git store
+
thicket init /path/to/store
+
+
# Add a user with feeds (auto-discovers metadata from feed)
+
thicket add user "alyssa" \
+
--feed "https://example.com/feed.atom"
+
# Auto-populates: email, homepage, icon, display_name from feed metadata
+
+
# Add a user with manual overrides
+
thicket add user "alyssa" \
+
--feed "https://example.com/feed.atom" \
+
--email "alyssa@example.com" \
+
--homepage "https://alyssa.example.com" \
+
--icon "https://example.com/avatar.png" \
+
--display-name "Alyssa P. Hacker"
+
+
# Add additional feed to existing user
+
thicket add feed "alyssa" "https://example.com/other-feed.rss"
+
+
# Sync all feeds (designed for cron usage)
+
thicket sync --all
+
+
# Sync specific user
+
thicket sync --user alyssa
+
+
# List users and their feeds
+
thicket list users
+
thicket list feeds --user alyssa
+
+
# Manage duplicate entries
+
thicket duplicates list
+
thicket duplicates add <entry_id_1> <entry_id_2> # Mark as duplicates
+
thicket duplicates remove <entry_id_1> <entry_id_2> # Unmark duplicates
+
+
# Link processing and threading
+
thicket links --verbose # Extract and categorize all links
+
thicket index --verbose # Build reference index for threading
+
thicket threads # Show conversation threads
+
thicket threads --username user1 # Show threads for specific user
+
thicket threads --min-size 3 # Show threads with minimum size
+
```
+
+
## Performance Considerations
+
+
1. **Concurrent Feed Fetching**: Use httpx with asyncio for parallel downloads (see the sketch after this list)
+
2. **Incremental Updates**: Only fetch/parse feeds that have changed
+
3. **Efficient Git Operations**: Batch commits, use shallow clones where appropriate
+
4. **Progress Feedback**: Rich progress bars for long operations
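A sketch of item 1, fanning requests out with `asyncio.gather` over one pooled `httpx.AsyncClient` (the function name and timeout are illustrative):

```python
import asyncio

import httpx


async def fetch_all(urls: list[str]) -> dict[str, bytes]:
    """Download many feeds concurrently over a shared client (illustrative sketch)."""
    async with httpx.AsyncClient(follow_redirects=True, timeout=30.0) as client:

        async def fetch(url: str) -> tuple[str, bytes]:
            response = await client.get(url)
            response.raise_for_status()
            return url, response.content

        pairs = await asyncio.gather(*(fetch(url) for url in urls))
    return dict(pairs)
```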
+
+
## Security Considerations
+
+
1. **HTML Sanitization**: Use bleach to clean feed content (see the sketch after this list)
+
2. **URL Validation**: Strict validation of feed URLs
+
3. **Git Security**: No credentials stored in repository
+
4. **Path Traversal**: Careful sanitization of filenames
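What item 1 might look like with bleach (this allowlist is an illustrative assumption, not the project's actual policy):

```python
import bleach

# Illustrative allowlist; the real policy may permit more or fewer tags
ALLOWED_TAGS = ["a", "blockquote", "code", "em", "li", "ol", "p", "pre", "strong", "ul"]
ALLOWED_ATTRIBUTES = {"a": ["href", "title"]}


def sanitize_html(raw: str) -> str:
    """Drop scripts, event handlers, and unknown markup from feed content."""
    return bleach.clean(raw, tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRIBUTES, strip=True)
```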
+
+
## Future Enhancements
+
+
1. **Web Interface**: Optional web UI for browsing the git store
+
2. **Webhooks**: Notify external services on feed updates
+
3. **Feed Discovery**: Auto-discover feeds from HTML pages
+
4. **Export Formats**: Generate static sites, OPML exports
+
5. **Federation**: P2P sync between thicket instances
+
+
## Requirements Clarification
+
+
**✓ Resolved Requirements:**
+
1. **Feed Update Frequency**: Designed for cron usage - no built-in scheduling needed
+
2. **Duplicate Handling**: Manual curation via `duplicates.json` file with CLI commands
+
3. **Git Branching**: Single main branch for all users and entries
+
4. **Authentication**: No feeds require authentication currently
+
5. **Content Storage**: Store complete Atom entry body content as provided
+
6. **Deleted Entries**: Preserve all entries in Git store permanently (historical archive)
+
7. **History Depth**: Git store maintains full history beyond feed depth limits
+
8. **Feed Auto-Discovery**: Extract user metadata from feed during `add user` command
+
+
## Duplicate Entry Management
+
+
### Duplicate Detection Strategy
+
- **Manual Curation**: Duplicates identified and managed manually via CLI
+
- **Storage**: `duplicates.json` file in Git root maps entry IDs to canonical entries
+
- **Structure**: `{"duplicate_id": "canonical_id", ...}`
+
- **CLI Commands**: Add/remove duplicate mappings with validation
+
- **Query Resolution**: Search/list commands resolve duplicates to canonical entries
+
+
### Duplicate File Format
+
```json
+
{
+
"https://example.com/feed/entry/123": "https://canonical.com/posts/same-post",
+
"https://mirror.com/articles/456": "https://canonical.com/posts/same-post",
+
"comment": "Entry IDs that map to the same canonical content"
+
}
+
```
+
+
## Feed Metadata Auto-Discovery
+
+
### Extraction Strategy
+
When adding a new user with `thicket add user`, the system fetches and parses the feed to extract:
+
+
- **Display Name**: From `feed.title` or `feed.author.name`
+
- **Email**: From `feed.author.email` or `feed.managingEditor`
+
- **Homepage**: From `feed.link` or `feed.author.uri`
+
- **Icon**: From `feed.logo`, `feed.icon`, or `feed.image.url`
+
+
### Discovery Priority Order
+
1. **Author Information**: Prefer `feed.author.*` fields (more specific to person)
+
2. **Feed-Level**: Fall back to feed-level metadata
+
3. **Manual Override**: CLI flags always take precedence over discovered values
+
4. **Update Behavior**: Auto-discovery only runs during initial `add user`, not on sync
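A sketch of that priority order on top of feedparser's normalized fields (`discover_metadata` is a hypothetical helper; the real extraction happens inside `FeedParser.parse_feed`):

```python
import feedparser


def discover_metadata(feed_content: str) -> dict:
    """Apply the author-first priority order to a parsed feed (illustrative sketch)."""
    feed = feedparser.parse(feed_content).feed
    author = feed.get("author_detail", {})
    image = feed.get("image", {})
    return {
        "display_name": author.get("name") or feed.get("title"),
        "email": author.get("email"),
        "homepage": author.get("href") or feed.get("link"),
        "icon": feed.get("logo") or feed.get("icon") or image.get("href"),
    }
```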
+
+
### Extracted Metadata Format
+
```python
+
class FeedMetadata(BaseModel):
+
title: Optional[str] = None
+
author_name: Optional[str] = None
+
author_email: Optional[EmailStr] = None
+
author_uri: Optional[HttpUrl] = None
+
link: Optional[HttpUrl] = None
+
logo: Optional[HttpUrl] = None
+
icon: Optional[HttpUrl] = None
+
image_url: Optional[HttpUrl] = None
+
+
def to_user_config(self, username: str, feed_url: HttpUrl) -> UserConfig:
+
"""Convert discovered metadata to UserConfig with fallbacks"""
+
return UserConfig(
+
username=username,
+
feeds=[feed_url],
+
display_name=self.author_name or self.title,
+
email=self.author_email,
+
homepage=self.author_uri or self.link,
+
icon=self.logo or self.icon or self.image_url
+
)
+
```
+
+
## Link Processing and Threading Architecture
+
+
### Overview
+
Thicket implements a link processing and threading pipeline that creates email-style threaded views of blog entries by tracking cross-references between blogs.
+
+
### Link Processing Pipeline
+
+
#### 1. Link Extraction (`thicket links`)
+
The `links` command systematically extracts all outbound links from blog entries and categorizes them:
+
+
```python
+
class LinkData(BaseModel):
+
url: str # Fully resolved URL
+
entry_id: str # Source entry ID
+
username: str # Source username
+
context: str # Surrounding text context
+
category: str # "internal", "user", or "unknown"
+
target_username: Optional[str] # Target user if applicable
+
```
+
+
**Link Categories:**
+
- **Internal**: Links to the same user's domain (self-references)
+
- **User**: Links to other tracked users' domains
+
- **Unknown**: Links to external sites not tracked by thicket
+
+
#### 2. URL Resolution
+
All links are properly resolved using the Atom feed's base URL to handle:
+
- Relative URLs (converted to absolute)
+
- Protocol-relative URLs
+
- Fragment identifiers
+
- Redirects and canonical URLs
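The first three cases reduce to the standard library, as in this sketch (redirects and canonical URLs additionally need an HTTP round trip, omitted here):

```python
from urllib.parse import urldefrag, urljoin


def resolve_link(base_url: str, href: str) -> str:
    """Resolve one href from entry content against the feed's base URL."""
    absolute = urljoin(base_url, href)         # relative and protocol-relative URLs
    absolute, _fragment = urldefrag(absolute)  # strip the fragment identifier
    return absolute


# resolve_link("https://blog.example.com/feed.atom", "/posts/1#comments")
# -> "https://blog.example.com/posts/1"
```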
+
+
#### 3. Domain Mapping
+
The system builds a comprehensive domain mapping from user configuration:
+
- Feed URLs → domain extraction
+
- Homepage URLs → domain extraction
+
- Reverse mapping: domain → username
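The reverse mapping can be sketched as follows (the real builder is `ReferenceParser.build_user_domain_mapping`; this standalone version assumes `UserConfig`-shaped inputs):

```python
from urllib.parse import urlparse


def build_domain_mapping(users: list) -> dict[str, str]:
    """Map each known domain to its username for link categorization (sketch)."""
    domain_to_user: dict[str, str] = {}
    for user in users:
        candidates = [str(feed) for feed in user.feeds]
        if user.homepage:
            candidates.append(str(user.homepage))
        for url in candidates:
            domain = urlparse(url).netloc.lower()
            if domain:
                domain_to_user[domain] = user.username
    return domain_to_user
```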
+
+
### Threading System
+
+
#### 1. Reference Index Generation (`thicket index`)
+
Creates a bidirectional reference index from the categorized links:
+
+
```python
+
class BlogReference(BaseModel):
+
source_entry_id: str
+
source_username: str
+
target_url: str
+
target_username: Optional[str]
+
target_entry_id: Optional[str]
+
context: str
+
```
+
+
#### 2. Thread Detection Algorithm
+
Uses graph traversal to find connected blog entries (sketched after this list):
+
- **Outbound references**: Links from an entry to other entries
+
- **Inbound references**: Links to an entry from other entries
+
- **Thread members**: All entries connected through references
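The traversal itself is a plain breadth-first search over `(username, entry_id)` nodes; `neighbors` below is a stand-in for the index's combined outbound/inbound lookups:

```python
from collections import deque
from typing import Callable, Iterable

Node = tuple[str, str]  # (username, entry_id)


def thread_members(start: Node, neighbors: Callable[[Node], Iterable[Node]]) -> set[Node]:
    """Collect every entry reachable through references, in either direction."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```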
+
+
#### 3. Threading Display (`thicket threads`)
+
Creates email-style threaded views:
+
- Chronological ordering within threads
+
- Reference counts (outbound/inbound)
+
- Context preservation
+
- Filtering options (user, entry, minimum size)
+
+
### Data Structures
+
+
#### links.json Format (Unified Structure)
+
```json
+
{
+
"links": {
+
"https://example.com/post/123": {
+
"referencing_entries": ["https://blog.user.com/entry/456"],
+
"target_username": "user2"
+
},
+
"https://external-site.com/article": {
+
"referencing_entries": ["https://blog.user.com/entry/789"]
+
}
+
},
+
"reverse_mapping": {
+
"https://blog.user.com/entry/456": ["https://example.com/post/123"],
+
"https://blog.user.com/entry/789": ["https://external-site.com/article"]
+
},
+
"references": [
+
{
+
"source_entry_id": "https://blog.user.com/entry/456",
+
"source_username": "user1",
+
"target_url": "https://example.com/post/123",
+
"target_username": "user2",
+
"target_entry_id": "https://example.com/post/123",
+
"context": "As mentioned in this post..."
+
}
+
],
+
"user_domains": {
+
"user1": ["blog.user.com"],
+
"user2": ["example.com"]
+
}
+
}
+
```
+
+
This unified structure eliminates duplication by:
+
- Storing each URL only once with minimal metadata
+
- Including all link data, reference data, and mappings in one file
+
- Using presence of `target_username` to identify tracked vs external links
+
- Providing bidirectional mappings for efficient queries
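Given that layout, both query directions are single dictionary lookups (the paths and URLs here are the illustrative ones from the example above):

```python
import json
from pathlib import Path

data = json.loads(Path("git-store/links.json").read_text())

# URL -> entries that reference it
inbound = data["links"]["https://example.com/post/123"]["referencing_entries"]

# entry -> URLs it references
outbound = data["reverse_mapping"]["https://blog.user.com/entry/456"]
```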
+
+
### Unified Structure Benefits
+
+
- **Eliminates Duplication**: Each URL appears only once with metadata
+
- **Single Source of Truth**: All link-related data in one file
+
- **Efficient Queries**: Fast lookups for both directions (URL→entries, entry→URLs)
+
- **Atomic Updates**: All link data changes together
+
- **Reduced I/O**: Fewer file operations
+
+
### Implementation Benefits
+
+
1. **Systematic Link Processing**: All links are extracted and categorized consistently
+
2. **Proper URL Resolution**: Handles relative URLs and base URL resolution correctly
+
3. **Domain-based Categorization**: Automatically identifies user-to-user references
+
4. **Bidirectional Indexing**: Supports both "who links to whom" and "who is linked by whom"
+
5. **Thread Discovery**: Finds conversation threads automatically
+
6. **Rich Context**: Preserves surrounding text for each link
+
7. **Performance**: Pre-computed indexes for fast threading queries
+
+
### CLI Commands
+
+
```bash
+
# Extract and categorize all links
+
thicket links --verbose
+
+
# Build reference index for threading
+
thicket index --verbose
+
+
# Show all conversation threads
+
thicket threads
+
+
# Show threads for specific user
+
thicket threads --username user1
+
+
# Show threads with minimum size
+
thicket threads --min-size 3
+
```
+
+
### Integration with Existing Commands
+
+
The link processing system integrates seamlessly with existing thicket commands:
+
- `thicket sync` updates entries, requiring `thicket links` to be run afterward
+
- `thicket index` uses the output from `thicket links` for improved accuracy
+
- `thicket threads` provides the user-facing threading interface
+
+
## Current Implementation Status
+
+
### ✅ Completed Features
+
1. **Core Infrastructure**
+
- Modern CLI with Typer and Rich
+
- Pydantic data models for type safety
+
- Git repository operations with GitPython
+
- Feed parsing and normalization with feedparser
+
+
2. **User and Feed Management**
+
- `thicket init` - Initialize git store
+
- `thicket add` - Add users and feeds with auto-discovery
+
- `thicket sync` - Sync feeds with progress tracking
+
- `thicket list` - List users, feeds, and entries
+
- `thicket duplicates` - Manage duplicate entries
+
+
3. **Link Processing and Threading**
+
- `thicket links` - Extract and categorize all outbound links
+
- `thicket index` - Build reference index from links
+
- `thicket threads` - Display threaded conversation views
+
- Proper URL resolution with base URL handling
+
- Domain-based link categorization
+
- Context preservation for links
+
+
### 📊 System Performance
+
- **Link Extraction**: Successfully processes thousands of blog entries
+
- **Categorization**: Identifies internal, user, and unknown links
+
- **Threading**: Creates email-style threaded views of conversations
+
- **Storage**: Efficient JSON-based data structures for links and references
+
+
### 🔧 Current Architecture Highlights
+
- **Modular Design**: Clear separation between CLI, core logic, and models
+
- **Type Safety**: Comprehensive Pydantic models for data validation
+
- **Rich CLI**: Beautiful progress bars, tables, and error handling
+
- **Extensible**: Easy to add new commands and features
+
- **Git Integration**: All data stored in version-controlled JSON files
+
+
### 🎯 Proven Functionality
+
The system has been tested with real blog data and successfully:
+
- Extracted 14,396 total links from blog entries
+
- Categorized 3,994 internal links, 363 user-to-user links, and 10,039 unknown links
+
- Built comprehensive domain mappings for 16 users across 20 domains
+
- Generated threaded views showing blog conversation patterns
+
+
### 🚀 Ready for Use
+
The thicket system is now fully functional for:
+
- Maintaining Git repositories of blog feeds
+
- Tracking cross-references between blogs
+
- Creating threaded views of blog conversations
+
- Discovering blog interaction patterns
+
- Building distributed comment systems
+
</file>
+
+
<file path="src/thicket/cli/utils.py">
+
"""CLI utilities and helpers."""
+
+
from pathlib import Path
+
from typing import Optional
+
+
import typer
+
from rich.console import Console
+
from rich.progress import Progress, SpinnerColumn, TextColumn
+
from rich.table import Table
+
+
from ..models import ThicketConfig, UserMetadata
+
from ..core.git_store import GitStore
+
+
console = Console()
+
+
+
def get_tsv_mode() -> bool:
+
"""Get the global TSV mode setting."""
+
from .main import tsv_mode
+
return tsv_mode
+
+
+
def load_config(config_path: Optional[Path] = None) -> ThicketConfig:
+
"""Load thicket configuration from file or environment."""
+
if config_path and config_path.exists():
+
import yaml
+
+
with open(config_path) as f:
+
config_data = yaml.safe_load(f)
+
+
# Convert to ThicketConfig
+
return ThicketConfig(**config_data)
+
+
# Try to load from default locations or environment
+
try:
+
# First try to find thicket.yaml in current directory
+
default_config = Path("thicket.yaml")
+
if default_config.exists():
+
import yaml
+
with open(default_config) as f:
+
config_data = yaml.safe_load(f)
+
return ThicketConfig(**config_data)
+
+
# Fall back to environment variables
+
return ThicketConfig()
+
except Exception as e:
+
console.print(f"[red]Error loading configuration: {e}[/red]")
+
console.print("[yellow]Run 'thicket init' to create a new configuration.[/yellow]")
+
raise typer.Exit(1) from e
+
+
+
def save_config(config: ThicketConfig, config_path: Path) -> None:
+
"""Save thicket configuration to file."""
+
import yaml
+
+
config_data = config.model_dump(mode="json", exclude_none=True)
+
+
# Convert Path objects to strings for YAML serialization
+
config_data["git_store"] = str(config_data["git_store"])
+
config_data["cache_dir"] = str(config_data["cache_dir"])
+
+
with open(config_path, "w") as f:
+
yaml.dump(config_data, f, default_flow_style=False, sort_keys=False)
+
+
+
def create_progress() -> Progress:
+
"""Create a Rich progress display."""
+
return Progress(
+
SpinnerColumn(),
+
TextColumn("[progress.description]{task.description}"),
+
console=console,
+
transient=True,
+
)
+
+
+
def print_users_table(config: ThicketConfig) -> None:
+
"""Print a table of users and their feeds."""
+
if get_tsv_mode():
+
print_users_tsv(config)
+
return
+
+
table = Table(title="Users and Feeds")
+
table.add_column("Username", style="cyan", no_wrap=True)
+
table.add_column("Display Name", style="magenta")
+
table.add_column("Email", style="blue")
+
table.add_column("Homepage", style="green")
+
table.add_column("Feeds", style="yellow")
+
+
for user in config.users:
+
feeds_str = "\n".join(str(feed) for feed in user.feeds)
+
table.add_row(
+
user.username,
+
user.display_name or "",
+
user.email or "",
+
str(user.homepage) if user.homepage else "",
+
feeds_str,
+
)
+
+
console.print(table)
+
+
+
def print_feeds_table(config: ThicketConfig, username: Optional[str] = None) -> None:
+
"""Print a table of feeds, optionally filtered by username."""
+
if get_tsv_mode():
+
print_feeds_tsv(config, username)
+
return
+
+
table = Table(title=f"Feeds{f' for {username}' if username else ''}")
+
table.add_column("Username", style="cyan", no_wrap=True)
+
table.add_column("Feed URL", style="blue")
+
table.add_column("Status", style="green")
+
+
users = [config.find_user(username)] if username else config.users
+
users = [u for u in users if u is not None]
+
+
for user in users:
+
for feed in user.feeds:
+
table.add_row(
+
user.username,
+
str(feed),
+
"Active", # TODO: Add actual status checking
+
)
+
+
console.print(table)
+
+
+
def confirm_action(message: str, default: bool = False) -> bool:
+
"""Prompt for confirmation."""
+
return typer.confirm(message, default=default)
+
+
+
def print_success(message: str) -> None:
+
"""Print a success message."""
+
console.print(f"[green]โœ“[/green] {message}")
+
+
+
def print_error(message: str) -> None:
+
"""Print an error message."""
+
console.print(f"[red]โœ—[/red] {message}")
+
+
+
def print_warning(message: str) -> None:
+
"""Print a warning message."""
+
console.print(f"[yellow]โš [/yellow] {message}")
+
+
+
def print_info(message: str) -> None:
+
"""Print an info message."""
+
console.print(f"[blue]โ„น[/blue] {message}")
+
+
+
def print_users_table_from_git(users: list[UserMetadata]) -> None:
+
"""Print a table of users from git repository."""
+
if get_tsv_mode():
+
print_users_tsv_from_git(users)
+
return
+
+
table = Table(title="Users and Feeds")
+
table.add_column("Username", style="cyan", no_wrap=True)
+
table.add_column("Display Name", style="magenta")
+
table.add_column("Email", style="blue")
+
table.add_column("Homepage", style="green")
+
table.add_column("Feeds", style="yellow")
+
+
for user in users:
+
feeds_str = "\n".join(user.feeds)
+
table.add_row(
+
user.username,
+
user.display_name or "",
+
user.email or "",
+
user.homepage or "",
+
feeds_str,
+
)
+
+
console.print(table)
+
+
+
def print_feeds_table_from_git(git_store: GitStore, username: Optional[str] = None) -> None:
+
"""Print a table of feeds from git repository."""
+
if get_tsv_mode():
+
print_feeds_tsv_from_git(git_store, username)
+
return
+
+
table = Table(title=f"Feeds{f' for {username}' if username else ''}")
+
table.add_column("Username", style="cyan", no_wrap=True)
+
table.add_column("Feed URL", style="blue")
+
table.add_column("Status", style="green")
+
+
if username:
+
user = git_store.get_user(username)
+
users = [user] if user else []
+
else:
+
index = git_store._load_index()
+
users = list(index.users.values())
+
+
for user in users:
+
for feed in user.feeds:
+
table.add_row(
+
user.username,
+
feed,
+
"Active", # TODO: Add actual status checking
+
)
+
+
console.print(table)
+
+
+
def print_users_tsv(config: ThicketConfig) -> None:
+
"""Print users in TSV format."""
+
print("Username\tDisplay Name\tEmail\tHomepage\tFeeds")
+
for user in config.users:
+
feeds_str = ",".join(str(feed) for feed in user.feeds)
+
print(f"{user.username}\t{user.display_name or ''}\t{user.email or ''}\t{user.homepage or ''}\t{feeds_str}")
+
+
+
def print_users_tsv_from_git(users: list[UserMetadata]) -> None:
+
"""Print users from git repository in TSV format."""
+
print("Username\tDisplay Name\tEmail\tHomepage\tFeeds")
+
for user in users:
+
feeds_str = ",".join(user.feeds)
+
print(f"{user.username}\t{user.display_name or ''}\t{user.email or ''}\t{user.homepage or ''}\t{feeds_str}")
+
+
+
def print_feeds_tsv(config: ThicketConfig, username: Optional[str] = None) -> None:
+
"""Print feeds in TSV format."""
+
print("Username\tFeed URL\tStatus")
+
users = [config.find_user(username)] if username else config.users
+
users = [u for u in users if u is not None]
+
+
for user in users:
+
for feed in user.feeds:
+
print(f"{user.username}\t{feed}\tActive")
+
+
+
def print_feeds_tsv_from_git(git_store: GitStore, username: Optional[str] = None) -> None:
+
"""Print feeds from git repository in TSV format."""
+
print("Username\tFeed URL\tStatus")
+
+
if username:
+
user = git_store.get_user(username)
+
users = [user] if user else []
+
else:
+
index = git_store._load_index()
+
users = list(index.users.values())
+
+
for user in users:
+
for feed in user.feeds:
+
print(f"{user.username}\t{feed}\tActive")
+
+
+
def print_entries_tsv(entries_by_user: list[list], usernames: list[str]) -> None:
+
"""Print entries in TSV format."""
+
print("User\tAtom ID\tTitle\tUpdated\tURL")
+
+
# Combine all entries with usernames
+
all_entries = []
+
for entries, username in zip(entries_by_user, usernames):
+
for entry in entries:
+
all_entries.append((username, entry))
+
+
# Sort by updated time (newest first)
+
all_entries.sort(key=lambda x: x[1].updated, reverse=True)
+
+
for username, entry in all_entries:
+
# Format updated time
+
updated_str = entry.updated.strftime("%Y-%m-%d %H:%M")
+
+
# Escape tabs and newlines in title to preserve TSV format
+
title = entry.title.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ')
+
+
print(f"{username}\t{entry.id}\t{title}\t{updated_str}\t{entry.link}")
+
</file>
+
+
</files>
+5 -1
src/thicket/__init__.py
···
-
"""Thicket: A CLI tool for persisting Atom/RSS feeds in Git repositories."""
+
"""Thicket - A library for managing feed repositories and static site generation."""
+
from .thicket import Thicket
+
from .models import AtomEntry, UserConfig, ThicketConfig
+
+
__all__ = ["Thicket", "AtomEntry", "UserConfig", "ThicketConfig"]
__version__ = "0.1.0"
__author__ = "thicket"
__email__ = "thicket@example.com"
+2 -2
src/thicket/cli/commands/__init__.py
···
"""CLI commands for thicket."""
# Import all commands to register them with the main app
-
from . import add, duplicates, info_cmd, init, links_cmd, list_cmd, sync, threads_cmd
+
from . import add, duplicates, generate, index_cmd, info_cmd, init, links_cmd, list_cmd, sync
-
__all__ = ["add", "duplicates", "info_cmd", "init", "links_cmd", "list_cmd", "sync", "threads_cmd"]
+
__all__ = ["add", "duplicates", "generate", "index_cmd", "info_cmd", "init", "links_cmd", "list_cmd", "sync"]
+44 -159
src/thicket/cli/commands/add.py
···
"""Add command for thicket."""
-
import asyncio
from pathlib import Path
from typing import Optional
import typer
-
from pydantic import HttpUrl, ValidationError
+
from pydantic import ValidationError
-
from ...core.feed_parser import FeedParser
-
from ...core.git_store import GitStore
-
from ..main import app
-
from ..utils import (
-
create_progress,
-
load_config,
-
print_error,
-
print_info,
-
print_success,
-
)
+
from ..main import app, console, load_thicket
@app.command("add")
-
def add_command(
-
subcommand: str = typer.Argument(..., help="Subcommand: 'user' or 'feed'"),
+
def add_user(
username: str = typer.Argument(..., help="Username"),
-
feed_url: Optional[str] = typer.Argument(None, help="Feed URL (required for 'user' command)"),
+
feeds: list[str] = typer.Argument(..., help="Feed URLs"),
email: Optional[str] = typer.Option(None, "--email", "-e", help="User email"),
homepage: Optional[str] = typer.Option(None, "--homepage", "-h", help="User homepage"),
icon: Optional[str] = typer.Option(None, "--icon", "-i", help="User icon URL"),
display_name: Optional[str] = typer.Option(None, "--display-name", "-d", help="User display name"),
config_file: Optional[Path] = typer.Option(
-
Path("thicket.yaml"), "--config", help="Configuration file path"
-
),
-
auto_discover: bool = typer.Option(
-
True, "--auto-discover/--no-auto-discover", help="Auto-discover user metadata from feed"
+
None, "--config", help="Configuration file path"
),
) -> None:
-
"""Add a user or feed to thicket."""
-
-
if subcommand == "user":
-
add_user(username, feed_url, email, homepage, icon, display_name, config_file, auto_discover)
-
elif subcommand == "feed":
-
add_feed(username, feed_url, config_file)
-
else:
-
print_error(f"Unknown subcommand: {subcommand}")
-
print_error("Use 'user' or 'feed'")
-
raise typer.Exit(1)
-
-
-
def add_user(
-
username: str,
-
feed_url: Optional[str],
-
email: Optional[str],
-
homepage: Optional[str],
-
icon: Optional[str],
-
display_name: Optional[str],
-
config_file: Path,
-
auto_discover: bool,
-
) -> None:
-
"""Add a new user with feed."""
-
-
if not feed_url:
-
print_error("Feed URL is required when adding a user")
-
raise typer.Exit(1)
-
-
# Validate feed URL
+
"""Add a user with their feeds to thicket."""
+
try:
-
validated_feed_url = HttpUrl(feed_url)
-
except ValidationError:
-
print_error(f"Invalid feed URL: {feed_url}")
-
raise typer.Exit(1) from None
-
-
# Load configuration
-
config = load_config(config_file)
-
-
# Initialize Git store
-
git_store = GitStore(config.git_store)
-
-
# Check if user already exists
-
existing_user = git_store.get_user(username)
-
if existing_user:
-
print_error(f"User '{username}' already exists")
-
print_error("Use 'thicket add feed' to add additional feeds")
+
# Load Thicket instance
+
thicket = load_thicket(config_file)
+
+
# Prepare user data
+
user_data = {}
+
if email:
+
user_data['email'] = email
+
if homepage:
+
user_data['homepage'] = homepage
+
if icon:
+
user_data['icon'] = icon
+
if display_name:
+
user_data['display_name'] = display_name
+
+
# Add the user
+
user_config = thicket.add_user(username, feeds, **user_data)
+
+
console.print(f"[green]โœ“[/green] Added user: {username}")
+
console.print(f" โ€ข Display name: {user_config.display_name or 'None'}")
+
console.print(f" โ€ข Email: {user_config.email or 'None'}")
+
console.print(f" โ€ข Homepage: {user_config.homepage or 'None'}")
+
console.print(f" โ€ข Feeds: {len(user_config.feeds)}")
+
+
for feed in user_config.feeds:
+
console.print(f" - {feed}")
+
+
# Commit the addition
+
commit_message = f"Add user {username} with {len(feeds)} feed(s)"
+
if thicket.commit_changes(commit_message):
+
console.print(f"[green]โœ“[/green] Committed: {commit_message}")
+
else:
+
console.print("[yellow]Warning:[/yellow] Failed to commit changes")
+
+
except ValidationError as e:
+
console.print(f"[red]Validation Error:[/red] {str(e)}")
raise typer.Exit(1)
-
-
# Auto-discover metadata if enabled
-
discovered_metadata = None
-
if auto_discover:
-
discovered_metadata = asyncio.run(discover_feed_metadata(validated_feed_url))
-
-
# Prepare user data with manual overrides taking precedence
-
user_display_name = display_name or (discovered_metadata.author_name or discovered_metadata.title if discovered_metadata else None)
-
user_email = email or (discovered_metadata.author_email if discovered_metadata else None)
-
user_homepage = homepage or (str(discovered_metadata.author_uri or discovered_metadata.link) if discovered_metadata else None)
-
user_icon = icon or (str(discovered_metadata.logo or discovered_metadata.icon or discovered_metadata.image_url) if discovered_metadata else None)
-
-
# Add user to Git store
-
git_store.add_user(
-
username=username,
-
display_name=user_display_name,
-
email=user_email,
-
homepage=user_homepage,
-
icon=user_icon,
-
feeds=[str(validated_feed_url)],
-
)
-
-
# Commit changes
-
git_store.commit_changes(f"Add user: {username}")
-
-
print_success(f"Added user '{username}' with feed: {feed_url}")
-
-
if discovered_metadata and auto_discover:
-
print_info("Auto-discovered metadata:")
-
if user_display_name:
-
print_info(f" Display name: {user_display_name}")
-
if user_email:
-
print_info(f" Email: {user_email}")
-
if user_homepage:
-
print_info(f" Homepage: {user_homepage}")
-
if user_icon:
-
print_info(f" Icon: {user_icon}")
-
-
-
def add_feed(username: str, feed_url: Optional[str], config_file: Path) -> None:
-
"""Add a feed to an existing user."""
-
-
if not feed_url:
-
print_error("Feed URL is required")
+
except Exception as e:
+
console.print(f"[red]Error:[/red] {str(e)}")
raise typer.Exit(1)
-
# Validate feed URL
-
try:
-
validated_feed_url = HttpUrl(feed_url)
-
except ValidationError:
-
print_error(f"Invalid feed URL: {feed_url}")
-
raise typer.Exit(1) from None
-
-
# Load configuration
-
config = load_config(config_file)
-
-
# Initialize Git store
-
git_store = GitStore(config.git_store)
-
-
# Check if user exists
-
user = git_store.get_user(username)
-
if not user:
-
print_error(f"User '{username}' not found")
-
print_error("Use 'thicket add user' to add a new user")
-
raise typer.Exit(1)
-
-
# Check if feed already exists
-
if str(validated_feed_url) in user.feeds:
-
print_error(f"Feed already exists for user '{username}': {feed_url}")
-
raise typer.Exit(1)
-
-
# Add feed to user
-
updated_feeds = user.feeds + [str(validated_feed_url)]
-
if git_store.update_user(username, feeds=updated_feeds):
-
git_store.commit_changes(f"Add feed to user {username}: {feed_url}")
-
print_success(f"Added feed to user '{username}': {feed_url}")
-
else:
-
print_error(f"Failed to add feed to user '{username}'")
-
raise typer.Exit(1)
-
-
-
async def discover_feed_metadata(feed_url: HttpUrl):
-
"""Discover metadata from a feed URL."""
-
try:
-
with create_progress() as progress:
-
task = progress.add_task("Discovering feed metadata...", total=None)
-
-
parser = FeedParser()
-
content = await parser.fetch_feed(feed_url)
-
metadata, _ = parser.parse_feed(content, feed_url)
-
-
progress.update(task, completed=True)
-
return metadata
-
-
except Exception as e:
-
print_error(f"Failed to discover feed metadata: {e}")
-
return None
+59
src/thicket/cli/commands/generate.py
···
+
"""Generate static HTML website from thicket data."""
+
+
from pathlib import Path
+
from typing import Optional
+
+
import typer
+
+
from ..main import app, console, load_thicket
+
+
+
+
+
@app.command()
+
def generate(
+
output: Path = typer.Option(
+
Path("./thicket-site"),
+
"--output",
+
"-o",
+
help="Output directory for the generated website",
+
),
+
template_dir: Optional[Path] = typer.Option(
+
None, "--templates", help="Custom template directory"
+
),
+
config_file: Optional[Path] = typer.Option(
+
None, "--config", help="Configuration file path"
+
),
+
) -> None:
+
"""Generate a static HTML website from thicket data."""
+
+
try:
+
# Load Thicket instance
+
thicket = load_thicket(config_file)
+
+
console.print(f"[blue]Generating static site to:[/blue] {output}")
+
+
# Generate the complete site
+
if thicket.generate_site(output, template_dir):
+
console.print(f"[green]โœ“[/green] Successfully generated site at {output}")
+
+
# Show what was generated
+
stats = thicket.get_stats()
+
console.print(f" โ€ข {stats.get('total_entries', 0)} entries")
+
console.print(f" โ€ข {stats.get('total_users', 0)} users")
+
console.print(f" โ€ข {stats.get('unique_urls', 0)} unique links")
+
+
# List generated files
+
if output.exists():
+
html_files = list(output.glob("*.html"))
+
if html_files:
+
console.print(" โ€ข Generated pages:")
+
for html_file in sorted(html_files):
+
console.print(f" - {html_file.name}")
+
else:
+
console.print("[red]โœ—[/red] Failed to generate site")
+
raise typer.Exit(1)
+
+
except Exception as e:
+
console.print(f"[red]Error:[/red] {str(e)}")
+
raise typer.Exit(1)
+427
src/thicket/cli/commands/index_cmd.py
···
+
"""CLI command for building reference index from blog entries."""
+
+
import json
+
from pathlib import Path
+
from typing import Optional
+
+
import typer
+
from rich.console import Console
+
from rich.progress import (
+
BarColumn,
+
Progress,
+
SpinnerColumn,
+
TaskProgressColumn,
+
TextColumn,
+
)
+
from rich.table import Table
+
+
from ...core.git_store import GitStore
+
from ...core.reference_parser import ReferenceIndex, ReferenceParser
+
from ..main import app
+
from ..utils import get_tsv_mode, load_config
+
+
console = Console()
+
+
+
@app.command()
+
def index(
+
config_file: Optional[Path] = typer.Option(
+
None,
+
"--config",
+
"-c",
+
help="Path to configuration file",
+
),
+
output_file: Optional[Path] = typer.Option(
+
None,
+
"--output",
+
"-o",
+
help="Path to output index file (default: updates links.json in git store)",
+
),
+
verbose: bool = typer.Option(
+
False,
+
"--verbose",
+
"-v",
+
help="Show detailed progress information",
+
),
+
) -> None:
+
"""Build a reference index showing which blog entries reference others.
+
+
This command analyzes all blog entries to detect cross-references between
+
different blogs, creating an index that can be used to build threaded
+
views of related content.
+
+
Updates the unified links.json file with reference data.
+
"""
+
try:
+
# Load configuration
+
config = load_config(config_file)
+
+
# Initialize Git store
+
git_store = GitStore(config.git_store)
+
+
# Initialize reference parser
+
parser = ReferenceParser()
+
+
# Build user domain mapping
+
if verbose:
+
console.print("Building user domain mapping...")
+
user_domains = parser.build_user_domain_mapping(git_store)
+
+
if verbose:
+
console.print(f"Found {len(user_domains)} users with {sum(len(d) for d in user_domains.values())} total domains")
+
+
# Initialize reference index
+
ref_index = ReferenceIndex()
+
ref_index.user_domains = user_domains
+
+
# Get all users
+
index = git_store._load_index()
+
users = list(index.users.keys())
+
+
if not users:
+
console.print("[yellow]No users found in Git store[/yellow]")
+
raise typer.Exit(0)
+
+
# Process all entries
+
total_entries = 0
+
total_references = 0
+
all_references = []
+
+
with Progress(
+
SpinnerColumn(),
+
TextColumn("[progress.description]{task.description}"),
+
BarColumn(),
+
TaskProgressColumn(),
+
console=console,
+
) as progress:
+
+
# Count total entries first
+
counting_task = progress.add_task("Counting entries...", total=len(users))
+
entry_counts = {}
+
for username in users:
+
entries = git_store.list_entries(username)
+
entry_counts[username] = len(entries)
+
total_entries += len(entries)
+
progress.advance(counting_task)
+
+
progress.remove_task(counting_task)
+
+
# Process entries - extract references
+
processing_task = progress.add_task(
+
f"Extracting references from {total_entries} entries...",
+
total=total_entries
+
)
+
+
for username in users:
+
entries = git_store.list_entries(username)
+
+
for entry in entries:
+
# Extract references from this entry
+
references = parser.extract_references(entry, username, user_domains)
+
all_references.extend(references)
+
+
progress.advance(processing_task)
+
+
if verbose and references:
+
console.print(f" Found {len(references)} references in {username}:{entry.title[:50]}...")
+
+
progress.remove_task(processing_task)
+
+
# Resolve target_entry_ids for references
+
if all_references:
+
resolve_task = progress.add_task(
+
f"Resolving {len(all_references)} references...",
+
total=len(all_references)
+
)
+
+
if verbose:
+
console.print(f"Resolving target entry IDs for {len(all_references)} references...")
+
+
resolved_references = parser.resolve_target_entry_ids(all_references, git_store)
+
+
# Count resolved references
+
resolved_count = sum(1 for ref in resolved_references if ref.target_entry_id is not None)
+
if verbose:
+
console.print(f"Resolved {resolved_count} out of {len(all_references)} references")
+
+
# Add resolved references to index
+
for ref in resolved_references:
+
ref_index.add_reference(ref)
+
total_references += 1
+
progress.advance(resolve_task)
+
+
progress.remove_task(resolve_task)
+
+
# Determine output path
+
if output_file:
+
output_path = output_file
+
else:
+
output_path = config.git_store / "links.json"
+
+
# Load existing links data or create new structure
+
if output_path.exists() and not output_file:
+
# Load existing unified structure
+
with open(output_path) as f:
+
existing_data = json.load(f)
+
else:
+
# Create new structure
+
existing_data = {
+
"links": {},
+
"reverse_mapping": {},
+
"user_domains": {}
+
}
+
+
# Update with reference data
+
existing_data["references"] = ref_index.to_dict()["references"]
+
existing_data["user_domains"] = {k: list(v) for k, v in user_domains.items()}
+
+
# Save updated structure
+
with open(output_path, "w") as f:
+
json.dump(existing_data, f, indent=2, default=str)
+
+
# Show summary
+
if not get_tsv_mode():
+
console.print("\n[green]โœ“ Reference index built successfully[/green]")
+
+
# Create summary table or TSV output
+
if get_tsv_mode():
+
print("Metric\tCount")
+
print(f"Total Users\t{len(users)}")
+
print(f"Total Entries\t{total_entries}")
+
print(f"Total References\t{total_references}")
+
print(f"Outbound Refs\t{len(ref_index.outbound_refs)}")
+
print(f"Inbound Refs\t{len(ref_index.inbound_refs)}")
+
print(f"Output File\t{output_path}")
+
else:
+
table = Table(title="Reference Index Summary")
+
table.add_column("Metric", style="cyan")
+
table.add_column("Count", style="green")
+
+
table.add_row("Total Users", str(len(users)))
+
table.add_row("Total Entries", str(total_entries))
+
table.add_row("Total References", str(total_references))
+
table.add_row("Outbound Refs", str(len(ref_index.outbound_refs)))
+
table.add_row("Inbound Refs", str(len(ref_index.inbound_refs)))
+
table.add_row("Output File", str(output_path))
+
+
console.print(table)
+
+
# Show some interesting statistics
+
if total_references > 0:
+
if not get_tsv_mode():
+
console.print("\n[bold]Reference Statistics:[/bold]")
+
+
# Most referenced users
+
target_counts = {}
+
unresolved_domains = set()
+
+
for ref in ref_index.references:
+
if ref.target_username:
+
target_counts[ref.target_username] = target_counts.get(ref.target_username, 0) + 1
+
else:
+
# Track unresolved domains
+
from urllib.parse import urlparse
+
domain = urlparse(ref.target_url).netloc.lower()
+
unresolved_domains.add(domain)
+
+
if target_counts:
+
if get_tsv_mode():
+
print("Referenced User\tReference Count")
+
for username, count in sorted(target_counts.items(), key=lambda x: x[1], reverse=True)[:5]:
+
print(f"{username}\t{count}")
+
else:
+
console.print("\nMost referenced users:")
+
for username, count in sorted(target_counts.items(), key=lambda x: x[1], reverse=True)[:5]:
+
console.print(f" {username}: {count} references")
+
+
if unresolved_domains and verbose:
+
if get_tsv_mode():
+
print("Unresolved Domain\tCount")
+
for domain in sorted(list(unresolved_domains)[:10]):
+
print(f"{domain}\t1")
+
if len(unresolved_domains) > 10:
+
print(f"... and {len(unresolved_domains) - 10} more\t...")
+
else:
+
console.print(f"\nUnresolved domains: {len(unresolved_domains)}")
+
for domain in sorted(list(unresolved_domains)[:10]):
+
console.print(f" {domain}")
+
if len(unresolved_domains) > 10:
+
console.print(f" ... and {len(unresolved_domains) - 10} more")
+
+
except Exception as e:
+
console.print(f"[red]Error building reference index: {e}[/red]")
+
if verbose:
+
console.print_exception()
+
raise typer.Exit(1)
+
+
+
@app.command()
+
def threads(
+
config_file: Optional[Path] = typer.Option(
+
None,
+
"--config",
+
"-c",
+
help="Path to configuration file",
+
),
+
index_file: Optional[Path] = typer.Option(
+
None,
+
"--index",
+
"-i",
+
help="Path to reference index file (default: links.json in git store)",
+
),
+
username: Optional[str] = typer.Option(
+
None,
+
"--username",
+
"-u",
+
help="Show threads for specific username only",
+
),
+
entry_id: Optional[str] = typer.Option(
+
None,
+
"--entry",
+
"-e",
+
help="Show thread for specific entry ID",
+
),
+
min_size: int = typer.Option(
+
2,
+
"--min-size",
+
"-m",
+
help="Minimum thread size to display",
+
),
+
) -> None:
+
"""Show threaded view of related blog entries.
+
+
This command uses the reference index to show which blog entries
+
are connected through cross-references, creating an email-style
+
threaded view of the conversation.
+
+
Reads reference data from the unified links.json file.
+
"""
+
try:
+
# Load configuration
+
config = load_config(config_file)
+
+
# Determine index file path
+
if index_file:
+
index_path = index_file
+
else:
+
index_path = config.git_store / "links.json"
+
+
if not index_path.exists():
+
console.print(f"[red]Links file not found: {index_path}[/red]")
+
console.print("Run 'thicket links' and 'thicket index' first to build the reference index")
+
raise typer.Exit(1)
+
+
# Load unified data
+
with open(index_path) as f:
+
unified_data = json.load(f)
+
+
# Check if references exist in the unified structure
+
if "references" not in unified_data:
+
console.print(f"[red]No references found in {index_path}[/red]")
+
console.print("Run 'thicket index' first to build the reference index")
+
raise typer.Exit(1)
+
+
# Extract reference data and reconstruct ReferenceIndex
+
ref_index = ReferenceIndex.from_dict({
+
"references": unified_data["references"],
+
"user_domains": unified_data.get("user_domains", {})
+
})
+
+
# Initialize Git store to get entry details
+
git_store = GitStore(config.git_store)
+
+
if entry_id and username:
+
# Show specific thread
+
thread_members = ref_index.get_thread_members(username, entry_id)
+
_display_thread(thread_members, ref_index, git_store, f"Thread for {username}:{entry_id}")
+
+
elif username:
+
# Show all threads involving this user
+
user_index = git_store._load_index()
+
user = user_index.get_user(username)
+
if not user:
+
console.print(f"[red]User not found: {username}[/red]")
+
raise typer.Exit(1)
+
+
entries = git_store.list_entries(username)
+
threads_found = set()
+
+
console.print(f"[bold]Threads involving {username}:[/bold]\n")
+
+
for entry in entries:
+
thread_members = ref_index.get_thread_members(username, entry.id)
+
if len(thread_members) >= min_size:
+
thread_key = tuple(sorted(thread_members))
+
if thread_key not in threads_found:
+
threads_found.add(thread_key)
+
_display_thread(thread_members, ref_index, git_store, f"Thread #{len(threads_found)}")
+
+
else:
+
# Show all threads
+
console.print("[bold]All conversation threads:[/bold]\n")
+
+
all_threads = set()
+
processed_entries = set()
+
+
# Get all entries
+
user_index = git_store._load_index()
+
for username in user_index.users.keys():
+
entries = git_store.list_entries(username)
+
for entry in entries:
+
entry_key = (username, entry.id)
+
if entry_key in processed_entries:
+
continue
+
+
thread_members = ref_index.get_thread_members(username, entry.id)
+
if len(thread_members) >= min_size:
+
thread_key = tuple(sorted(thread_members))
+
if thread_key not in all_threads:
+
all_threads.add(thread_key)
+
_display_thread(thread_members, ref_index, git_store, f"Thread #{len(all_threads)}")
+
+
# Mark all members as processed
+
for member in thread_members:
+
processed_entries.add(member)
+
+
if not all_threads:
+
console.print("[yellow]No conversation threads found[/yellow]")
+
console.print(f"(minimum thread size: {min_size})")
+
+
except Exception as e:
+
console.print(f"[red]Error showing threads: {e}[/red]")
+
raise typer.Exit(1)
+
+
+
def _display_thread(thread_members, ref_index, git_store, title):
+
"""Display a single conversation thread."""
+
console.print(f"[bold cyan]{title}[/bold cyan]")
+
console.print(f"Thread size: {len(thread_members)} entries")
+
+
# Get entry details for each member
+
thread_entries = []
+
for username, entry_id in thread_members:
+
entry = git_store.get_entry(username, entry_id)
+
if entry:
+
thread_entries.append((username, entry))
+
+
# Sort by publication date
+
thread_entries.sort(key=lambda x: x[1].published or x[1].updated)
+
+
# Display entries
+
for i, (username, entry) in enumerate(thread_entries):
+
prefix = "โ”œโ”€" if i < len(thread_entries) - 1 else "โ””โ”€"
+
+
# Get references for this entry
+
outbound = ref_index.get_outbound_refs(username, entry.id)
+
inbound = ref_index.get_inbound_refs(username, entry.id)
+
+
ref_info = ""
+
if outbound or inbound:
+
ref_info = f" ({len(outbound)} out, {len(inbound)} in)"
+
+
console.print(f" {prefix} [{username}] {entry.title[:60]}...{ref_info}")
+
+
if entry.published:
+
console.print(f" Published: {entry.published.strftime('%Y-%m-%d')}")
+
+
console.print() # Empty line after each thread
+52 -33
src/thicket/cli/commands/info_cmd.py
···
"""CLI command for displaying detailed information about a specific atom entry."""
+
import json
from pathlib import Path
from typing import Optional
···
from rich.text import Text
from ...core.git_store import GitStore
+
from ...core.reference_parser import ReferenceIndex
from ..main import app
from ..utils import load_config, get_tsv_mode
···
console.print(f"[red]Entry with {'URL' if is_url else 'atom ID'} '{identifier}' not found in any user's entries[/red]")
raise typer.Exit(1)
+
# Load reference index if available
+
links_path = config.git_store / "links.json"
+
ref_index = None
+
if links_path.exists():
+
with open(links_path) as f:
+
unified_data = json.load(f)
+
+
# Check if references exist in the unified structure
+
if "references" in unified_data:
+
ref_index = ReferenceIndex.from_dict({
+
"references": unified_data["references"],
+
"user_domains": unified_data.get("user_domains", {})
+
})
+
# Display information
if get_tsv_mode():
-
_display_entry_info_tsv(entry, found_username, show_content)
+
_display_entry_info_tsv(entry, found_username, ref_index, show_content)
else:
_display_entry_info(entry, found_username)
-
# Display links and backlinks from entry fields
-
_display_link_info(entry, found_username, git_store)
+
if ref_index:
+
_display_link_info(entry, found_username, ref_index)
+
else:
+
console.print("\n[yellow]No reference index found. Run 'thicket links' and 'thicket index' to build cross-reference data.[/yellow]")
# Optionally display content
if show_content and entry.content:
···
console.print(panel)
-
def _display_link_info(entry, username: str, git_store: GitStore) -> None:
+
def _display_link_info(entry, username: str, ref_index: ReferenceIndex) -> None:
"""Display inbound and outbound link information."""
-
# Get links from entry fields
-
outbound_links = getattr(entry, 'links', [])
-
backlinks = getattr(entry, 'backlinks', [])
+
# Get links
+
outbound_refs = ref_index.get_outbound_refs(username, entry.id)
+
inbound_refs = ref_index.get_inbound_refs(username, entry.id)
-
if not outbound_links and not backlinks:
+
if not outbound_refs and not inbound_refs:
console.print("\n[dim]No cross-references found for this entry.[/dim]")
return
# Create links table
links_table = Table(title="Cross-References")
links_table.add_column("Direction", style="cyan", width=10)
-
links_table.add_column("Target/Source", style="green", width=30)
-
links_table.add_column("URL/ID", style="blue", width=60)
+
links_table.add_column("Target/Source", style="green", width=20)
+
links_table.add_column("URL", style="blue", width=50)
-
# Add outbound links
-
for link in outbound_links:
-
links_table.add_row("โ†’ Out", "External/Other", link)
+
# Add outbound references
+
for ref in outbound_refs:
+
target_info = f"{ref.target_username}:{ref.target_entry_id}" if ref.target_username and ref.target_entry_id else "External"
+
links_table.add_row("โ†’ Out", target_info, ref.target_url)
-
# Add backlinks (inbound references)
-
for backlink_id in backlinks:
-
# Try to find which user this entry belongs to
-
source_info = backlink_id
-
# Could enhance this by looking up the actual entry to get username
-
links_table.add_row("โ† In", "Entry", source_info)
+
# Add inbound references
+
for ref in inbound_refs:
+
source_info = f"{ref.source_username}:{ref.source_entry_id}"
+
links_table.add_row("โ† In", source_info, ref.target_url)
console.print()
console.print(links_table)
# Summary
-
console.print(f"\n[bold]Summary:[/bold] {len(outbound_links)} outbound links, {len(backlinks)} inbound backlinks")
+
console.print(f"\n[bold]Summary:[/bold] {len(outbound_refs)} outbound, {len(inbound_refs)} inbound references")
def _display_content(content: str) -> None:
···
console.print(panel)
-
def _display_entry_info_tsv(entry, username: str, show_content: bool) -> None:
+
def _display_entry_info_tsv(entry, username: str, ref_index: Optional[ReferenceIndex], show_content: bool) -> None:
"""Display entry information in TSV format."""
# Basic info
···
if entry.source:
print(f"Source Feed\t{entry.source}")
-
# Add links info from entry fields
-
outbound_links = getattr(entry, 'links', [])
-
backlinks = getattr(entry, 'backlinks', [])
-
-
if outbound_links or backlinks:
-
print(f"Outbound Links\t{len(outbound_links)}")
-
print(f"Backlinks\t{len(backlinks)}")
+
# Add reference info if available
+
if ref_index:
+
outbound_refs = ref_index.get_outbound_refs(username, entry.id)
+
inbound_refs = ref_index.get_inbound_refs(username, entry.id)
+
+
print(f"Outbound References\t{len(outbound_refs)}")
+
print(f"Inbound References\t{len(inbound_refs)}")
-
# Show each link
-
for link in outbound_links:
-
print(f"โ†’ Link\t{link}")
+
# Show each reference
+
for ref in outbound_refs:
+
target_info = f"{ref.target_username}:{ref.target_entry_id}" if ref.target_username and ref.target_entry_id else "External"
+
print(f"Outbound Reference\t{target_info}\t{ref.target_url}")
-
for backlink_id in backlinks:
-
print(f"โ† Backlink\t{backlink_id}")
+
for ref in inbound_refs:
+
source_info = f"{ref.source_username}:{ref.source_entry_id}"
+
print(f"Inbound Reference\t{source_info}\t{ref.target_url}")
# Show content if requested
if show_content and entry.content:
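Note: the info command now resolves cross-references through the unified `links.json` and a `ReferenceIndex` rather than per-entry `links`/`backlinks` fields. A minimal sketch of the same round-trip outside the CLI; the import path follows this diff's file layout, and the file path, username, and entry id are placeholders:

```python
import json
from pathlib import Path

from thicket.core.reference_parser import ReferenceIndex  # assumed import path

# Hypothetical location of the unified links file; adjust to your git store.
links_path = Path("store/links.json")

data = json.loads(links_path.read_text())
ref_index = ReferenceIndex.from_dict({
    "references": data.get("references", []),
    "user_domains": data.get("user_domains", {}),
})

# Query both directions for one entry (placeholder identifiers).
outbound = ref_index.get_outbound_refs("alice", "urn:uuid:1234")
inbound = ref_index.get_inbound_refs("alice", "urn:uuid:1234")
print(f"{len(outbound)} outbound, {len(inbound)} inbound references")
```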
+50 -39
src/thicket/cli/commands/init.py
···
"""Initialize command for thicket."""
+
import yaml
from pathlib import Path
from typing import Optional
import typer
-
from pydantic import ValidationError
-
from ...core.git_store import GitStore
+
from ..main import app, console, get_config_path
from ...models import ThicketConfig
-
from ..main import app
-
from ..utils import print_error, print_success, save_config
+
from ... import Thicket
@app.command()
···
None, "--cache-dir", "-c", help="Cache directory (default: ~/.cache/thicket)"
),
config_file: Optional[Path] = typer.Option(
-
None, "--config", help="Configuration file path (default: thicket.yaml)"
+
None, "--config", help="Configuration file path (default: ~/.config/thicket/config.yaml)"
),
force: bool = typer.Option(
False, "--force", "-f", help="Overwrite existing configuration"
···
# Set default paths
if cache_dir is None:
-
from platformdirs import user_cache_dir
-
cache_dir = Path(user_cache_dir("thicket"))
+
cache_dir = Path.home() / ".cache" / "thicket"
if config_file is None:
-
config_file = Path("thicket.yaml")
+
config_file = get_config_path()
# Check if config already exists
if config_file.exists() and not force:
-
print_error(f"Configuration file already exists: {config_file}")
-
print_error("Use --force to overwrite")
+
console.print(f"[red]Configuration file already exists:[/red] {config_file}")
+
console.print("Use --force to overwrite")
raise typer.Exit(1)
-
# Create cache directory
-
cache_dir.mkdir(parents=True, exist_ok=True)
+
try:
+
# Create directories
+
git_store.mkdir(parents=True, exist_ok=True)
+
cache_dir.mkdir(parents=True, exist_ok=True)
+
config_file.parent.mkdir(parents=True, exist_ok=True)
-
# Create Git store
-
try:
-
GitStore(git_store)
-
print_success(f"Initialized Git store at: {git_store}")
-
except Exception as e:
-
print_error(f"Failed to initialize Git store: {e}")
-
raise typer.Exit(1) from e
+
# Create Thicket instance with minimal config
+
thicket = Thicket.create(git_store, cache_dir)
+
+
# Initialize the repository
+
if thicket.init_repository():
+
console.print(f"[green]โœ“[/green] Initialized Git store at: {git_store}")
+
else:
+
console.print(f"[red]โœ—[/red] Failed to initialize Git store")
+
raise typer.Exit(1)
+
+
# Save configuration
+
config_data = {
+
'git_store': str(git_store),
+
'cache_dir': str(cache_dir),
+
'users': []
+
}
+
+
with open(config_file, 'w') as f:
+
yaml.dump(config_data, f, default_flow_style=False)
+
+
console.print(f"[green]โœ“[/green] Created configuration file: {config_file}")
-
# Create configuration
-
try:
-
config = ThicketConfig(
-
git_store=git_store,
-
cache_dir=cache_dir,
-
users=[]
-
)
+
# Create initial commit
+
if thicket.commit_changes("Initialize thicket repository"):
+
console.print("[green]โœ“[/green] Created initial commit")
-
save_config(config, config_file)
-
print_success(f"Created configuration file: {config_file}")
+
console.print("\n[green]Thicket initialized successfully![/green]")
+
console.print(f" โ€ข Git store: {git_store}")
+
console.print(f" โ€ข Cache directory: {cache_dir}")
+
console.print(f" โ€ข Configuration: {config_file}")
+
console.print("\n[blue]Next steps:[/blue]")
+
console.print(" 1. Add your first user and feed:")
+
console.print(f" [cyan]thicket add username https://example.com/feed.xml[/cyan]")
+
console.print(" 2. Sync feeds:")
+
console.print(f" [cyan]thicket sync[/cyan]")
+
console.print(" 3. Generate a website:")
+
console.print(f" [cyan]thicket generate[/cyan]")
-
except ValidationError as e:
-
print_error(f"Invalid configuration: {e}")
-
raise typer.Exit(1) from e
except Exception as e:
-
print_error(f"Failed to create configuration: {e}")
-
raise typer.Exit(1) from e
-
-
print_success("Thicket initialized successfully!")
-
print_success(f"Git store: {git_store}")
-
print_success(f"Cache directory: {cache_dir}")
-
print_success(f"Configuration: {config_file}")
-
print_success("Run 'thicket add user' to add your first user and feed.")
+
console.print(f"[red]Error:[/red] {str(e)}")
+
raise typer.Exit(1)
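The rewritten init no longer serializes a `ThicketConfig` model; it writes a plain YAML mapping via `yaml.dump`. A small sketch of the shape it produces (paths are illustrative, and `yaml.dump` sorts keys alphabetically by default):

```python
import yaml
from pathlib import Path

# Mirrors the config_data dict written by `thicket init` (paths illustrative).
config_data = {
    "git_store": str(Path.home() / "thicket-store"),
    "cache_dir": str(Path.home() / ".cache" / "thicket"),
    "users": [],
}
print(yaml.dump(config_data, default_flow_style=False))
# cache_dir: /home/user/.cache/thicket
# git_store: /home/user/thicket-store
# users: []
```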
+75 -121
src/thicket/cli/commands/sync.py
···
from typing import Optional
import typer
-
from rich.progress import track
+
from rich.progress import Progress, SpinnerColumn, TextColumn
-
from ...core.feed_parser import FeedParser
-
from ...core.git_store import GitStore
-
from ..main import app
-
from ..utils import (
-
load_config,
-
print_error,
-
print_info,
-
print_success,
-
)
+
from ..main import app, console, load_thicket
@app.command()
def sync(
-
all_users: bool = typer.Option(
-
False, "--all", "-a", help="Sync all users and feeds"
-
),
user: Optional[str] = typer.Option(
-
None, "--user", "-u", help="Sync specific user only"
+
None, "--user", "-u", help="Sync specific user only (default: all users)"
),
config_file: Optional[Path] = typer.Option(
-
Path("thicket.yaml"), "--config", help="Configuration file path"
+
None, "--config", help="Configuration file path"
),
-
dry_run: bool = typer.Option(
-
False, "--dry-run", help="Show what would be synced without making changes"
+
commit: bool = typer.Option(
+
True, "--commit/--no-commit", help="Commit changes after sync"
),
) -> None:
"""Sync feeds and store entries in Git repository."""
-
-
# Load configuration
-
config = load_config(config_file)
-
-
# Initialize Git store
-
git_store = GitStore(config.git_store)
-
-
# Determine which users to sync from git repository
-
users_to_sync = []
-
if all_users:
-
index = git_store._load_index()
-
users_to_sync = list(index.users.values())
-
elif user:
-
user_metadata = git_store.get_user(user)
-
if not user_metadata:
-
print_error(f"User '{user}' not found in git repository")
-
raise typer.Exit(1)
-
users_to_sync = [user_metadata]
-
else:
-
print_error("Specify --all to sync all users or --user to sync a specific user")
-
raise typer.Exit(1)
-
-
if not users_to_sync:
-
print_info("No users configured to sync")
-
return
-
-
# Sync each user
-
total_new_entries = 0
-
total_updated_entries = 0
-
-
for user_metadata in users_to_sync:
-
print_info(f"Syncing user: {user_metadata.username}")
-
-
user_new_entries = 0
-
user_updated_entries = 0
-
-
# Sync each feed for the user
-
for feed_url in track(user_metadata.feeds, description=f"Syncing {user_metadata.username}'s feeds"):
-
try:
-
new_entries, updated_entries = asyncio.run(
-
sync_feed(git_store, user_metadata.username, feed_url, dry_run)
-
)
-
user_new_entries += new_entries
-
user_updated_entries += updated_entries
-
-
except Exception as e:
-
print_error(f"Failed to sync feed {feed_url}: {e}")
-
continue
-
-
print_info(f"User {user_metadata.username}: {user_new_entries} new, {user_updated_entries} updated")
-
total_new_entries += user_new_entries
-
total_updated_entries += user_updated_entries
-
-
# Commit changes if not dry run
-
if not dry_run and (total_new_entries > 0 or total_updated_entries > 0):
-
commit_message = f"Sync feeds: {total_new_entries} new entries, {total_updated_entries} updated"
-
git_store.commit_changes(commit_message)
-
print_success(f"Committed changes: {commit_message}")
-
-
# Summary
-
if dry_run:
-
print_info(f"Dry run complete: would sync {total_new_entries} new entries, {total_updated_entries} updated")
-
else:
-
print_success(f"Sync complete: {total_new_entries} new entries, {total_updated_entries} updated")
-
-
-
async def sync_feed(git_store: GitStore, username: str, feed_url, dry_run: bool) -> tuple[int, int]:
-
"""Sync a single feed for a user."""
-
-
parser = FeedParser()
-
+
try:
-
# Fetch and parse feed
-
content = await parser.fetch_feed(feed_url)
-
metadata, entries = parser.parse_feed(content, feed_url)
-
-
new_entries = 0
-
updated_entries = 0
-
-
# Process each entry
-
for entry in entries:
-
try:
-
# Check if entry already exists
-
existing_entry = git_store.get_entry(username, entry.id)
-
-
if existing_entry:
-
# Check if entry has been updated
-
if existing_entry.updated != entry.updated:
-
if not dry_run:
-
git_store.store_entry(username, entry)
-
updated_entries += 1
-
else:
-
# New entry
-
if not dry_run:
-
git_store.store_entry(username, entry)
-
new_entries += 1
-
-
except Exception as e:
-
print_error(f"Failed to process entry {entry.id}: {e}")
-
continue
-
-
return new_entries, updated_entries
-
+
# Load Thicket instance
+
thicket = load_thicket(config_file)
+
+
# Progress callback for tracking
+
current_task = None
+
+
def progress_callback(message: str, current: int = 0, total: int = 0):
+
nonlocal current_task
+
current_task = message
+
if total > 0:
+
console.print(f"[blue]Progress:[/blue] {message} ({current}/{total})")
+
else:
+
console.print(f"[blue]Info:[/blue] {message}")
+
+
# Run sync with progress
+
with Progress(
+
SpinnerColumn(),
+
TextColumn("[progress.description]{task.description}"),
+
console=console,
+
transient=True,
+
) as progress:
+
task = progress.add_task("Syncing feeds...", total=None)
+
+
# Perform sync
+
results = asyncio.run(thicket.sync_feeds(user, progress_callback))
+
+
progress.remove_task(task)
+
+
# Process results
+
total_new = 0
+
total_processed = 0
+
errors = []
+
+
if isinstance(results, dict):
+
for username, user_results in results.items():
+
if 'error' in user_results:
+
errors.append(f"{username}: {user_results['error']}")
+
continue
+
+
total_new += user_results.get('new_entries', 0)
+
total_processed += user_results.get('feeds_processed', 0)
+
+
console.print(f"[green]โœ“[/green] {username}: {user_results.get('new_entries', 0)} new entries from {user_results.get('feeds_processed', 0)} feeds")
+
+
# Show any feed-specific errors
+
for error in user_results.get('errors', []):
+
console.print(f" [yellow]Warning:[/yellow] {error}")
+
+
# Show errors
+
for error in errors:
+
console.print(f"[red]Error:[/red] {error}")
+
+
# Commit changes if requested
+
if commit and total_new > 0:
+
commit_message = f"Sync feeds: {total_new} new entries from {total_processed} feeds"
+
if thicket.commit_changes(commit_message):
+
console.print(f"[green]โœ“[/green] Committed: {commit_message}")
+
else:
+
console.print("[red]โœ—[/red] Failed to commit changes")
+
+
# Summary
+
if total_new > 0:
+
console.print(f"\n[green]Sync complete:[/green] {total_new} new entries processed")
+
else:
+
console.print("\n[blue]Sync complete:[/blue] No new entries found")
+
except Exception as e:
-
print_error(f"Failed to sync feed {feed_url}: {e}")
-
return 0, 0
+
console.print(f"[red]Error:[/red] {str(e)}")
+
raise typer.Exit(1)
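The sync command now delegates to `Thicket.sync_feeds` and only interprets the returned per-user dict. A sketch of that contract as consumed above; the data is made up, but the keys match what `FeedManager.sync_user_feeds` builds in the feeds.py diff below:

```python
# Result shape assumed by the CLI when summarizing a sync run.
results = {
    "alice": {"new_entries": 3, "feeds_processed": 2, "errors": []},
    "bob": {"error": "feed unreachable"},
}

total_new = 0
for username, user_results in results.items():
    # A top-level "error" key means the whole user failed.
    if "error" in user_results:
        print(f"Error for {username}: {user_results['error']}")
        continue
    total_new += user_results.get("new_entries", 0)

print(f"{total_new} new entries")
```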
-1111
src/thicket/cli/commands/threads_cmd.py
···
-
"""CLI command for displaying and browsing thread-graphs of blog posts."""
-
-
from dataclasses import dataclass, field
-
from datetime import datetime
-
from enum import Enum
-
from pathlib import Path
-
from typing import Dict, List, Optional, Set, Tuple
-
-
import typer
-
from rich.console import Console
-
import json
-
import webbrowser
-
import threading
-
import time
-
from flask import Flask, render_template_string, jsonify
-
from textual import events
-
from textual.app import App, ComposeResult
-
from textual.containers import Container, Horizontal, Vertical
-
from textual.reactive import reactive
-
from textual.widget import Widget
-
from textual.widgets import Footer, Header, Label, Static
-
-
from ...core.git_store import GitStore
-
from ...models import AtomEntry
-
from ..main import app
-
from ..utils import get_tsv_mode, load_config
-
-
console = Console()
-
-
-
class LinkType(Enum):
-
"""Types of links between entries."""
-
-
SELF_REFERENCE = "self" # Link to same user's content
-
USER_REFERENCE = "user" # Link to another tracked user
-
EXTERNAL = "external" # Link to external content
-
-
-
@dataclass
-
class ThreadNode:
-
"""Represents a node in the thread graph."""
-
-
entry_id: str
-
username: str
-
entry: AtomEntry
-
outbound_links: List[Tuple[str, LinkType]] = field(
-
default_factory=list
-
) # (url, type)
-
inbound_backlinks: List[str] = field(default_factory=list) # entry_ids
-
-
@property
-
def published_date(self) -> datetime:
-
"""Get the published or updated date for sorting."""
-
return self.entry.published or self.entry.updated
-
-
@property
-
def title(self) -> str:
-
"""Get the entry title."""
-
return self.entry.title
-
-
@property
-
def summary(self) -> str:
-
"""Get a short summary of the entry."""
-
if self.entry.summary:
-
return (
-
self.entry.summary[:100] + "..."
-
if len(self.entry.summary) > 100
-
else self.entry.summary
-
)
-
return ""
-
-
-
@dataclass
-
class ThreadGraph:
-
"""Represents the full thread graph of interconnected posts."""
-
-
nodes: Dict[str, ThreadNode] = field(default_factory=dict) # entry_id -> ThreadNode
-
user_entries: Dict[str, List[str]] = field(
-
default_factory=dict
-
) # username -> [entry_ids]
-
url_to_entry: Dict[str, str] = field(default_factory=dict) # url -> entry_id
-
-
def add_node(self, node: ThreadNode) -> None:
-
"""Add a node to the graph."""
-
self.nodes[node.entry_id] = node
-
-
# Update user entries index
-
if node.username not in self.user_entries:
-
self.user_entries[node.username] = []
-
self.user_entries[node.username].append(node.entry_id)
-
-
# Update URL mapping
-
if node.entry.link:
-
self.url_to_entry[str(node.entry.link)] = node.entry_id
-
-
def get_connected_components(self) -> List[Set[str]]:
-
"""Find all connected components in the graph (threads)."""
-
visited: Set[str] = set()
-
components: List[Set[str]] = []
-
-
for entry_id in self.nodes:
-
if entry_id not in visited:
-
component: Set[str] = set()
-
self._dfs(entry_id, visited, component)
-
components.append(component)
-
-
return components
-
-
def _dfs(self, entry_id: str, visited: Set[str], component: Set[str]) -> None:
-
"""Depth-first search to find connected components."""
-
if entry_id in visited:
-
return
-
-
visited.add(entry_id)
-
component.add(entry_id)
-
-
node = self.nodes.get(entry_id)
-
if not node:
-
return
-
-
# Follow outbound links
-
for url, link_type in node.outbound_links:
-
if url in self.url_to_entry:
-
target_id = self.url_to_entry[url]
-
self._dfs(target_id, visited, component)
-
-
# Follow backlinks
-
for backlink_id in node.inbound_backlinks:
-
self._dfs(backlink_id, visited, component)
-
-
def get_standalone_entries(self) -> List[str]:
-
"""Get entries with no connections."""
-
standalone = []
-
for entry_id, node in self.nodes.items():
-
if not node.outbound_links and not node.inbound_backlinks:
-
standalone.append(entry_id)
-
return standalone
-
-
def sort_component_chronologically(self, component: Set[str]) -> List[str]:
-
"""Sort a component by published date."""
-
nodes = [
-
self.nodes[entry_id] for entry_id in component if entry_id in self.nodes
-
]
-
nodes.sort(key=lambda n: n.published_date)
-
return [n.entry_id for n in nodes]
-
-
-
def build_thread_graph(git_store: GitStore) -> ThreadGraph:
-
"""Build the thread graph from all entries in the git store."""
-
graph = ThreadGraph()
-
-
# Get all users from index
-
index = git_store._load_index()
-
user_domains = {}
-
-
# Build user domain mapping
-
for username, user_metadata in index.users.items():
-
domains = set()
-
-
# Add domains from feeds
-
for feed_url in user_metadata.feeds:
-
from urllib.parse import urlparse
-
-
domain = urlparse(str(feed_url)).netloc.lower()
-
if domain:
-
domains.add(domain)
-
-
# Add domain from homepage
-
if user_metadata.homepage:
-
domain = urlparse(str(user_metadata.homepage)).netloc.lower()
-
if domain:
-
domains.add(domain)
-
-
user_domains[username] = domains
-
-
# Process all entries
-
for username in index.users:
-
entries = git_store.list_entries(username)
-
-
for entry in entries:
-
# Create node
-
node = ThreadNode(entry_id=entry.id, username=username, entry=entry)
-
-
# Process outbound links
-
for link in getattr(entry, "links", []):
-
link_type = categorize_link(link, username, user_domains)
-
node.outbound_links.append((link, link_type))
-
-
# Copy backlinks
-
node.inbound_backlinks = getattr(entry, "backlinks", [])
-
-
# Add to graph
-
graph.add_node(node)
-
-
return graph
-
-
-
def categorize_link(
-
url: str, source_username: str, user_domains: Dict[str, Set[str]]
-
) -> LinkType:
-
"""Categorize a link as self-reference, user reference, or external."""
-
from urllib.parse import urlparse
-
-
try:
-
parsed = urlparse(url)
-
domain = parsed.netloc.lower()
-
-
# Check if it's a self-reference
-
if domain in user_domains.get(source_username, set()):
-
return LinkType.SELF_REFERENCE
-
-
# Check if it's a reference to another tracked user
-
for username, domains in user_domains.items():
-
if username != source_username and domain in domains:
-
return LinkType.USER_REFERENCE
-
-
# Otherwise it's external
-
return LinkType.EXTERNAL
-
-
except Exception:
-
return LinkType.EXTERNAL
-
-
-
class ThreadTreeWidget(Static):
-
"""Widget for displaying a thread as a tree."""
-
-
def __init__(self, component: Set[str], graph: ThreadGraph, **kwargs):
-
super().__init__(**kwargs)
-
self.component = component
-
self.graph = graph
-
-
def compose(self) -> ComposeResult:
-
"""Create the tree display."""
-
# Sort entries chronologically
-
sorted_ids = self.graph.sort_component_chronologically(self.component)
-
-
# Build tree structure as text
-
content_lines = ["Thread:"]
-
added_nodes: Set[str] = set()
-
-
# Add nodes in chronological order, showing connections
-
for entry_id in sorted_ids:
-
if entry_id not in added_nodes:
-
self._add_node_to_text(content_lines, entry_id, added_nodes, 0)
-
-
# Join all lines into content
-
content = "\n".join(content_lines)
-
-
# Create a Static widget with the content
-
yield Static(content, id="thread-content")
-
-
def _add_node_to_text(
-
self, content_lines: List[str], entry_id: str, added_nodes: Set[str], indent: int = 0
-
):
-
"""Recursively add nodes to the text display."""
-
if entry_id in added_nodes:
-
# Show cycle reference
-
node = self.graph.nodes.get(entry_id)
-
if node:
-
prefix = " " * indent
-
content_lines.append(f"{prefix}โ†ป {node.username}: {node.title}")
-
return
-
-
added_nodes.add(entry_id)
-
node = self.graph.nodes.get(entry_id)
-
if not node:
-
return
-
-
# Format node display
-
prefix = " " * indent
-
date_str = node.published_date.strftime("%Y-%m-%d")
-
node_label = f"{prefix}โ€ข {node.username}: {node.title} ({date_str})"
-
content_lines.append(node_label)
-
-
# Add connections info
-
if node.outbound_links:
-
links_by_type: Dict[LinkType, List[str]] = {}
-
for url, link_type in node.outbound_links:
-
if link_type not in links_by_type:
-
links_by_type[link_type] = []
-
links_by_type[link_type].append(url)
-
-
for link_type, urls in links_by_type.items():
-
type_label = f"{prefix} โ†’ {link_type.value}: {len(urls)} link(s)"
-
content_lines.append(type_label)
-
-
if node.inbound_backlinks:
-
backlink_label = f"{prefix} โ† backlinks: {len(node.inbound_backlinks)}"
-
content_lines.append(backlink_label)
-
-
-
class ThreadBrowserApp(App):
-
"""Terminal UI for browsing threads."""
-
-
CSS = """
-
ThreadBrowserApp {
-
background: $surface;
-
}
-
-
#thread-list {
-
width: 1fr;
-
height: 1fr;
-
border: solid $primary;
-
overflow-y: scroll;
-
}
-
-
#entry-detail {
-
width: 1fr;
-
height: 1fr;
-
border: solid $secondary;
-
overflow-y: scroll;
-
padding: 1;
-
}
-
"""
-
-
BINDINGS = [
-
("q", "quit", "Quit"),
-
("j", "next_thread", "Next Thread"),
-
("k", "prev_thread", "Previous Thread"),
-
("enter", "select_thread", "View Thread"),
-
]
-
-
def __init__(self, graph: ThreadGraph):
-
super().__init__()
-
self.graph = graph
-
self.threads = []
-
self.current_thread_index = 0
-
self._build_thread_list()
-
-
def _build_thread_list(self):
-
"""Build the list of threads to display."""
-
# Get connected components (actual threads)
-
components = self.graph.get_connected_components()
-
-
# Sort components by the earliest date in each
-
sorted_components = []
-
for component in components:
-
if len(component) > 1: # Only show actual threads
-
sorted_ids = self.graph.sort_component_chronologically(component)
-
if sorted_ids:
-
first_node = self.graph.nodes.get(sorted_ids[0])
-
if first_node:
-
sorted_components.append((first_node.published_date, component))
-
-
sorted_components.sort(key=lambda x: x[0], reverse=True)
-
self.threads = [comp for _, comp in sorted_components]
-
-
def compose(self) -> ComposeResult:
-
"""Create the UI layout."""
-
yield Header()
-
-
with Horizontal():
-
with Vertical(id="thread-list"):
-
yield Label("Threads", classes="title")
-
for i, thread in enumerate(self.threads):
-
# Get thread summary
-
sorted_ids = self.graph.sort_component_chronologically(thread)
-
if sorted_ids:
-
first_node = self.graph.nodes.get(sorted_ids[0])
-
if first_node:
-
label = f"{i + 1}. {first_node.title} ({len(thread)} posts)"
-
yield Label(label, classes="thread-item")
-
-
with Vertical(id="entry-detail"):
-
if self.threads:
-
yield ThreadTreeWidget(self.threads[0], self.graph)
-
-
yield Footer()
-
-
def action_next_thread(self) -> None:
-
"""Move to next thread."""
-
if self.current_thread_index < len(self.threads) - 1:
-
self.current_thread_index += 1
-
self.update_display()
-
-
def action_prev_thread(self) -> None:
-
"""Move to previous thread."""
-
if self.current_thread_index > 0:
-
self.current_thread_index -= 1
-
self.update_display()
-
-
def action_select_thread(self) -> None:
-
"""View detailed thread."""
-
# In a real implementation, this could show more detail
-
pass
-
-
def update_display(self) -> None:
-
"""Update the thread display."""
-
detail_view = self.query_one("#entry-detail")
-
detail_view.remove_children()
-
-
if self.threads and self.current_thread_index < len(self.threads):
-
widget = ThreadTreeWidget(
-
self.threads[self.current_thread_index], self.graph
-
)
-
detail_view.mount(widget)
-
-
-
@app.command()
-
def threads(
-
config_file: Optional[Path] = typer.Option(
-
Path("thicket.yaml"),
-
"--config",
-
"-c",
-
help="Path to configuration file",
-
),
-
interactive: bool = typer.Option(
-
True,
-
"--interactive/--no-interactive",
-
"-i/-n",
-
help="Launch interactive terminal UI",
-
),
-
web: bool = typer.Option(
-
False,
-
"--web",
-
"-w",
-
help="Launch web server with D3 force graph visualization",
-
),
-
port: int = typer.Option(
-
8080,
-
"--port",
-
"-p",
-
help="Port for web server",
-
),
-
) -> None:
-
"""Browse and visualize thread-graphs of interconnected blog posts.
-
-
This command analyzes all blog entries and their links/backlinks to build
-
a graph of conversations and references between posts. Threads are displayed
-
as connected components in the link graph.
-
"""
-
try:
-
# Load configuration
-
config = load_config(config_file)
-
-
# Initialize Git store
-
git_store = GitStore(config.git_store)
-
-
# Build thread graph
-
console.print("Building thread graph...")
-
graph = build_thread_graph(git_store)
-
-
# Get statistics
-
components = graph.get_connected_components()
-
threads = [c for c in components if len(c) > 1]
-
standalone = graph.get_standalone_entries()
-
-
console.print(
-
f"\n[green]Found {len(threads)} threads and {len(standalone)} standalone posts[/green]"
-
)
-
-
if web:
-
# Launch web server with D3 visualization
-
_launch_web_server(graph, port)
-
elif interactive and threads:
-
# Launch terminal UI
-
app = ThreadBrowserApp(graph)
-
app.run()
-
else:
-
# Display in console
-
if get_tsv_mode():
-
_display_threads_tsv(graph, threads)
-
else:
-
_display_threads_rich(graph, threads)
-
-
except Exception as e:
-
console.print(f"[red]Error building threads: {e}[/red]")
-
raise typer.Exit(1)
-
-
-
def _display_threads_rich(graph: ThreadGraph, threads: List[Set[str]]) -> None:
-
"""Display threads using rich formatting."""
-
for i, thread in enumerate(threads[:10]): # Show first 10 threads
-
sorted_ids = graph.sort_component_chronologically(thread)
-
-
console.print(f"\n[bold]Thread {i + 1}[/bold] ({len(thread)} posts)")
-
-
for j, entry_id in enumerate(sorted_ids):
-
node = graph.nodes.get(entry_id)
-
if node:
-
date_str = node.published_date.strftime("%Y-%m-%d")
-
indent = " " * min(j, 3) # Max 3 levels of indent
-
console.print(f"{indent}โ€ข [{node.username}] {node.title} ({date_str})")
-
-
# Show link types
-
if node.outbound_links:
-
link_summary = {}
-
for _, link_type in node.outbound_links:
-
link_summary[link_type] = link_summary.get(link_type, 0) + 1
-
-
link_str = ", ".join(
-
[f"{t.value}:{c}" for t, c in link_summary.items()]
-
)
-
console.print(f"{indent} โ†’ Links: {link_str}")
-
-
-
def _display_threads_tsv(graph: ThreadGraph, threads: List[Set[str]]) -> None:
-
"""Display threads in TSV format."""
-
print("Thread\tSize\tFirst Post\tLast Post\tUsers")
-
-
for i, thread in enumerate(threads):
-
sorted_ids = graph.sort_component_chronologically(thread)
-
-
if sorted_ids:
-
first_node = graph.nodes.get(sorted_ids[0])
-
last_node = graph.nodes.get(sorted_ids[-1])
-
-
users = set()
-
for entry_id in thread:
-
node = graph.nodes.get(entry_id)
-
if node:
-
users.add(node.username)
-
-
if first_node and last_node:
-
print(
-
f"{i + 1}\t{len(thread)}\t{first_node.published_date.strftime('%Y-%m-%d')}\t{last_node.published_date.strftime('%Y-%m-%d')}\t{','.join(users)}"
-
)
-
-
-
def _build_graph_json(graph: ThreadGraph) -> dict:
-
"""Convert ThreadGraph to JSON format for D3 visualization."""
-
nodes = []
-
links = []
-
-
# Color mapping for different users
-
user_colors = {}
-
colors = [
-
"#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
-
"#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf",
-
"#aec7e8", "#ffbb78", "#98df8a", "#ff9896", "#c5b0d5"
-
]
-
-
# Assign colors to users
-
for i, username in enumerate(set(node.username for node in graph.nodes.values())):
-
user_colors[username] = colors[i % len(colors)]
-
-
# Create nodes
-
for entry_id, node in graph.nodes.items():
-
nodes.append({
-
"id": entry_id,
-
"title": node.title,
-
"username": node.username,
-
"date": node.published_date.strftime("%Y-%m-%d"),
-
"summary": node.summary,
-
"color": user_colors[node.username],
-
"outbound_count": len(node.outbound_links),
-
"backlink_count": len(node.inbound_backlinks),
-
"link_types": {
-
"self": len([l for l in node.outbound_links if l[1] == LinkType.SELF_REFERENCE]),
-
"user": len([l for l in node.outbound_links if l[1] == LinkType.USER_REFERENCE]),
-
"external": len([l for l in node.outbound_links if l[1] == LinkType.EXTERNAL])
-
}
-
})
-
-
# Create links (only for links between tracked entries)
-
for entry_id, node in graph.nodes.items():
-
for url, link_type in node.outbound_links:
-
if url in graph.url_to_entry:
-
target_id = graph.url_to_entry[url]
-
if target_id in graph.nodes:
-
links.append({
-
"source": entry_id,
-
"target": target_id,
-
"type": link_type.value,
-
"url": url
-
})
-
-
return {
-
"nodes": nodes,
-
"links": links,
-
"stats": {
-
"total_nodes": len(nodes),
-
"total_links": len(links),
-
"users": list(user_colors.keys()),
-
"user_colors": user_colors
-
}
-
}
-
-
-
def _launch_web_server(graph: ThreadGraph, port: int) -> None:
-
"""Launch Flask web server with D3 force graph visualization."""
-
flask_app = Flask(__name__)
-
-
# Store graph data globally for the Flask app
-
graph_data = _build_graph_json(graph)
-
-
@flask_app.route('/')
-
def index():
-
"""Serve the main visualization page."""
-
return render_template_string(HTML_TEMPLATE, port=port)
-
-
@flask_app.route('/api/graph')
-
def api_graph():
-
"""API endpoint to serve graph data as JSON."""
-
return jsonify(graph_data)
-
-
# Disable Flask logging in development mode
-
import logging
-
log = logging.getLogger('werkzeug')
-
log.setLevel(logging.ERROR)
-
-
def open_browser():
-
"""Open browser after a short delay."""
-
time.sleep(1.5)
-
webbrowser.open(f'http://localhost:{port}')
-
-
# Start browser in a separate thread
-
browser_thread = threading.Thread(target=open_browser)
-
browser_thread.daemon = True
-
browser_thread.start()
-
-
console.print(f"\n[green]Starting web server at http://localhost:{port}[/green]")
-
console.print("[yellow]Press Ctrl+C to stop the server[/yellow]")
-
-
try:
-
flask_app.run(host='0.0.0.0', port=port, debug=False)
-
except KeyboardInterrupt:
-
console.print("\n[green]Server stopped[/green]")
-
-
-
# HTML template for D3 force graph visualization
-
HTML_TEMPLATE = """
-
<!DOCTYPE html>
-
<html lang="en">
-
<head>
-
<meta charset="UTF-8">
-
<meta name="viewport" content="width=device-width, initial-scale=1.0">
-
<title>Thicket Thread Graph Visualization</title>
-
<script src="https://d3js.org/d3.v7.min.js"></script>
-
<style>
-
body {
-
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
-
margin: 0;
-
padding: 20px;
-
background-color: #f5f5f5;
-
}
-
-
.header {
-
text-align: center;
-
margin-bottom: 20px;
-
}
-
-
h1 {
-
color: #333;
-
margin-bottom: 10px;
-
}
-
-
.controls {
-
display: flex;
-
justify-content: center;
-
gap: 15px;
-
margin-bottom: 20px;
-
flex-wrap: wrap;
-
}
-
-
.control-group {
-
display: flex;
-
align-items: center;
-
gap: 5px;
-
}
-
-
select, input[type="range"] {
-
padding: 5px;
-
border: 1px solid #ddd;
-
border-radius: 4px;
-
}
-
-
.stats {
-
display: flex;
-
justify-content: center;
-
gap: 20px;
-
margin-bottom: 20px;
-
font-size: 14px;
-
color: #666;
-
}
-
-
.stat-item {
-
background: white;
-
padding: 10px 15px;
-
border-radius: 6px;
-
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
-
}
-
-
#graph-container {
-
background: white;
-
border-radius: 8px;
-
box-shadow: 0 4px 6px rgba(0,0,0,0.1);
-
overflow: hidden;
-
}
-
-
#graph {
-
cursor: grab;
-
}
-
-
#graph:active {
-
cursor: grabbing;
-
}
-
-
.node {
-
stroke: #fff;
-
stroke-width: 1.5px;
-
cursor: pointer;
-
}
-
-
.node:hover {
-
stroke: #333;
-
stroke-width: 2px;
-
}
-
-
.link {
-
stroke: #999;
-
stroke-opacity: 0.6;
-
stroke-width: 1px;
-
}
-
-
.link.self-link {
-
stroke: #2ca02c;
-
}
-
-
.link.user-link {
-
stroke: #ff7f0e;
-
}
-
-
.link.external-link {
-
stroke: #d62728;
-
}
-
-
.tooltip {
-
position: absolute;
-
background: rgba(0, 0, 0, 0.9);
-
color: white;
-
padding: 10px;
-
border-radius: 4px;
-
font-size: 12px;
-
line-height: 1.4;
-
pointer-events: none;
-
z-index: 1000;
-
max-width: 300px;
-
}
-
-
.legend {
-
position: fixed;
-
top: 20px;
-
right: 20px;
-
background: white;
-
padding: 15px;
-
border-radius: 6px;
-
box-shadow: 0 2px 8px rgba(0,0,0,0.15);
-
font-size: 12px;
-
z-index: 100;
-
}
-
-
.legend h3 {
-
margin: 0 0 10px 0;
-
font-size: 14px;
-
color: #333;
-
}
-
-
.legend-item {
-
display: flex;
-
align-items: center;
-
margin-bottom: 5px;
-
}
-
-
.legend-color {
-
width: 12px;
-
height: 12px;
-
margin-right: 8px;
-
border-radius: 2px;
-
}
-
-
.legend-line {
-
width: 20px;
-
height: 2px;
-
margin-right: 8px;
-
}
-
</style>
-
</head>
-
<body>
-
<div class="header">
-
<h1>Thicket Thread Graph Visualization</h1>
-
<p>Interactive visualization of blog post connections and conversations</p>
-
</div>
-
-
<div class="controls">
-
<div class="control-group">
-
<label for="userFilter">Filter by user:</label>
-
<select id="userFilter">
-
<option value="all">All Users</option>
-
</select>
-
</div>
-
-
<div class="control-group">
-
<label for="linkFilter">Show links:</label>
-
<select id="linkFilter">
-
<option value="all">All Links</option>
-
<option value="user">User Links Only</option>
-
<option value="self">Self Links Only</option>
-
<option value="external">External Links Only</option>
-
</select>
-
</div>
-
-
<div class="control-group">
-
<label for="forceStrength">Force Strength:</label>
-
<input type="range" id="forceStrength" min="0.1" max="2" step="0.1" value="0.3">
-
</div>
-
-
<div class="control-group">
-
<label for="nodeSize">Node Size:</label>
-
<input type="range" id="nodeSize" min="3" max="15" step="1" value="6">
-
</div>
-
</div>
-
-
<div class="stats" id="stats"></div>
-
-
<div id="graph-container">
-
<svg id="graph"></svg>
-
</div>
-
-
<div class="legend">
-
<h3>Link Types</h3>
-
<div class="legend-item">
-
<div class="legend-line" style="background: #2ca02c;"></div>
-
<span>Self References</span>
-
</div>
-
<div class="legend-item">
-
<div class="legend-line" style="background: #ff7f0e;"></div>
-
<span>User References</span>
-
</div>
-
<div class="legend-item">
-
<div class="legend-line" style="background: #d62728;"></div>
-
<span>External References</span>
-
</div>
-
-
<h3 style="margin-top: 15px;">Interactions</h3>
-
<div style="font-size: 11px; color: #666;">
-
• Hover: Show details<br>
-
• Click: Pin/unpin node<br>
-
• Drag: Move nodes<br>
-
• Zoom: Mouse wheel
-
</div>
-
</div>
-
-
<div class="tooltip" id="tooltip" style="display: none;"></div>
-
-
<script>
-
let graphData;
-
let simulation;
-
let svg, g, link, node;
-
let width = window.innerWidth - 40;
-
let height = window.innerHeight - 200;
-
-
// Initialize the visualization
-
async function init() {
-
// Fetch graph data
-
const response = await fetch('/api/graph');
-
graphData = await response.json();
-
-
// Set up SVG
-
svg = d3.select("#graph")
-
.attr("width", width)
-
.attr("height", height);
-
-
// Add zoom behavior
-
const zoom = d3.zoom()
-
.scaleExtent([0.1, 4])
-
.on("zoom", (event) => {
-
g.attr("transform", event.transform);
-
});
-
-
svg.call(zoom);
-
-
// Create main group for all elements
-
g = svg.append("g");
-
-
// Set up controls
-
setupControls();
-
-
// Initial render
-
updateVisualization();
-
-
// Update stats
-
updateStats();
-
-
// Handle window resize
-
window.addEventListener('resize', () => {
-
width = window.innerWidth - 40;
-
height = window.innerHeight - 200;
-
svg.attr("width", width).attr("height", height);
-
simulation.force("center", d3.forceCenter(width / 2, height / 2));
-
simulation.restart();
-
});
-
}
-
-
function setupControls() {
-
// Populate user filter
-
const userFilter = d3.select("#userFilter");
-
graphData.stats.users.forEach(user => {
-
userFilter.append("option").attr("value", user).text(user);
-
});
-
-
// Add event listeners
-
d3.select("#userFilter").on("change", updateVisualization);
-
d3.select("#linkFilter").on("change", updateVisualization);
-
d3.select("#forceStrength").on("input", updateForces);
-
d3.select("#nodeSize").on("input", updateNodeSizes);
-
}
-
-
function updateVisualization() {
-
// Filter data based on controls
-
const userFilter = d3.select("#userFilter").property("value");
-
const linkFilter = d3.select("#linkFilter").property("value");
-
-
let filteredNodes = graphData.nodes;
-
let filteredLinks = graphData.links;
-
-
if (userFilter !== "all") {
-
filteredNodes = graphData.nodes.filter(n => n.username === userFilter);
-
const nodeIds = new Set(filteredNodes.map(n => n.id));
-
filteredLinks = graphData.links.filter(l =>
-
nodeIds.has(l.source.id || l.source) && nodeIds.has(l.target.id || l.target)
-
);
-
}
-
-
if (linkFilter !== "all") {
-
filteredLinks = filteredLinks.filter(l => l.type === linkFilter);
-
}
-
-
// Clear existing elements
-
g.selectAll(".link").remove();
-
g.selectAll(".node").remove();
-
-
// Create force simulation
-
simulation = d3.forceSimulation(filteredNodes)
-
.force("link", d3.forceLink(filteredLinks).id(d => d.id)
-
.distance(d => {
-
// Get source and target nodes
-
const sourceNode = filteredNodes.find(n => n.id === (d.source.id || d.source));
-
const targetNode = filteredNodes.find(n => n.id === (d.target.id || d.target));
-
-
// If nodes are from different users, make them attract more (shorter distance)
-
if (sourceNode && targetNode && sourceNode.username !== targetNode.username) {
-
return 30; // Shorter distance = stronger attraction
-
}
-
-
// Same user posts have normal distance
-
return 60;
-
})
-
.strength(d => {
-
// Get source and target nodes
-
const sourceNode = filteredNodes.find(n => n.id === (d.source.id || d.source));
-
const targetNode = filteredNodes.find(n => n.id === (d.target.id || d.target));
-
-
// If nodes are from different users, make the link stronger
-
if (sourceNode && targetNode && sourceNode.username !== targetNode.username) {
-
return 1.5; // Stronger link force
-
}
-
-
// Same user posts have normal strength
-
return 1.0;
-
}))
-
.force("charge", d3.forceManyBody().strength(-200))
-
.force("center", d3.forceCenter(width / 2, height / 2))
-
.force("collision", d3.forceCollide().radius(15));
-
-
// Create links
-
link = g.append("g")
-
.selectAll(".link")
-
.data(filteredLinks)
-
.enter().append("line")
-
.attr("class", d => `link ${d.type}-link`)
-
.attr("stroke-width", d => {
-
// Get source and target nodes
-
const sourceNode = filteredNodes.find(n => n.id === (d.source.id || d.source));
-
const targetNode = filteredNodes.find(n => n.id === (d.target.id || d.target));
-
-
// If nodes are from different users, make the line thicker
-
if (sourceNode && targetNode && sourceNode.username !== targetNode.username) {
-
return 2.5; // Thicker line for cross-user connections
-
}
-
-
// Same user posts have normal thickness
-
return 1;
-
});
-
-
// Create nodes
-
node = g.append("g")
-
.selectAll(".node")
-
.data(filteredNodes)
-
.enter().append("circle")
-
.attr("class", "node")
-
.attr("r", d => Math.max(4, Math.log(d.outbound_count + d.backlink_count + 1) * 3))
-
.attr("fill", d => d.color)
-
.call(d3.drag()
-
.on("start", dragstarted)
-
.on("drag", dragged)
-
.on("end", dragended))
-
.on("mouseover", showTooltip)
-
.on("mouseout", hideTooltip)
-
.on("click", togglePin);
-
-
// Update force simulation
-
simulation.on("tick", () => {
-
link
-
.attr("x1", d => d.source.x)
-
.attr("y1", d => d.source.y)
-
.attr("x2", d => d.target.x)
-
.attr("y2", d => d.target.y);
-
-
node
-
.attr("cx", d => d.x)
-
.attr("cy", d => d.y);
-
});
-
-
updateStats(filteredNodes, filteredLinks);
-
}
-
-
function updateForces() {
-
const strength = +d3.select("#forceStrength").property("value");
-
if (simulation) {
-
simulation.force("charge").strength(-200 * strength);
-
simulation.alpha(0.3).restart();
-
}
-
}
-
-
function updateNodeSizes() {
-
const size = +d3.select("#nodeSize").property("value");
-
if (node) {
-
node.attr("r", d => Math.max(size * 0.5, Math.log(d.outbound_count + d.backlink_count + 1) * size * 0.5));
-
}
-
}
-
-
function dragstarted(event, d) {
-
if (!event.active) simulation.alphaTarget(0.3).restart();
-
d.fx = d.x;
-
d.fy = d.y;
-
}
-
-
function dragged(event, d) {
-
d.fx = event.x;
-
d.fy = event.y;
-
}
-
-
function dragended(event, d) {
-
if (!event.active) simulation.alphaTarget(0);
-
if (!d.pinned) {
-
d.fx = null;
-
d.fy = null;
-
}
-
}
-
-
function togglePin(event, d) {
-
d.pinned = !d.pinned;
-
if (d.pinned) {
-
d.fx = d.x;
-
d.fy = d.y;
-
} else {
-
d.fx = null;
-
d.fy = null;
-
}
-
}
-
-
function showTooltip(event, d) {
-
const tooltip = d3.select("#tooltip");
-
tooltip.style("display", "block")
-
.html(`
-
<strong>${d.title}</strong><br>
-
<strong>User:</strong> ${d.username}<br>
-
<strong>Date:</strong> ${d.date}<br>
-
<strong>Outbound Links:</strong> ${d.outbound_count}<br>
-
<strong>Backlinks:</strong> ${d.backlink_count}<br>
-
<strong>Link Types:</strong> Self: ${d.link_types.self}, User: ${d.link_types.user}, External: ${d.link_types.external}
-
${d.summary ? '<br><br>' + d.summary : ''}
-
`)
-
.style("left", (event.pageX + 10) + "px")
-
.style("top", (event.pageY - 10) + "px");
-
}
-
-
function hideTooltip() {
-
d3.select("#tooltip").style("display", "none");
-
}
-
-
function updateStats(nodes = graphData.nodes, links = graphData.links) {
-
const stats = d3.select("#stats");
-
const userCounts = {};
-
nodes.forEach(n => {
-
userCounts[n.username] = (userCounts[n.username] || 0) + 1;
-
});
-
-
stats.html(`
-
<div class="stat-item">
-
<strong>${nodes.length}</strong> Nodes
-
</div>
-
<div class="stat-item">
-
<strong>${links.length}</strong> Links
-
</div>
-
<div class="stat-item">
-
<strong>${Object.keys(userCounts).length}</strong> Users
-
</div>
-
<div class="stat-item">
-
Users: ${Object.entries(userCounts).map(([user, count]) => `${user} (${count})`).join(', ')}
-
</div>
-
`);
-
}
-
-
// Initialize when page loads
-
init();
-
</script>
-
</body>
-
</html>
-
"""
+36 -2
src/thicket/cli/main.py
···
"""Main CLI application using Typer."""
+
from pathlib import Path
+
from typing import Optional
+
import typer
from rich.console import Console
-
from .. import __version__
+
from .. import __version__, Thicket, ThicketConfig
app = typer.Typer(
name="thicket",
···
raise typer.Exit()
+
def load_thicket(config_path: Optional[Path] = None) -> Thicket:
+
"""Load Thicket instance from configuration."""
+
if config_path and config_path.exists():
+
return Thicket.from_config_file(config_path)
+
+
# Try default locations
+
default_paths = [
+
Path("thicket.yaml"),
+
Path("thicket.yml"),
+
Path("thicket.json"),
+
Path.home() / ".config" / "thicket" / "config.yaml",
+
Path.home() / ".thicket.yaml",
+
]
+
+
for path in default_paths:
+
if path.exists():
+
return Thicket.from_config_file(path)
+
+
# No config found
+
console.print("[red]Error:[/red] No configuration file found.")
+
console.print("Use [bold]thicket init[/bold] to create a new configuration or specify --config")
+
raise typer.Exit(1)
+
+
+
def get_config_path() -> Path:
+
"""Get the default configuration path for new configs."""
+
config_dir = Path.home() / ".config" / "thicket"
+
config_dir.mkdir(parents=True, exist_ok=True)
+
return config_dir / "config.yaml"
+
+
@app.callback()
def main(
version: bool = typer.Option(
···
# Import commands to register them
-
from .commands import add, duplicates, info_cmd, init, links_cmd, list_cmd, sync, threads_cmd
+
from .commands import add, duplicates, generate, index_cmd, info_cmd, init, links_cmd, list_cmd, sync
if __name__ == "__main__":
app()
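For reference, a sketch of how a command obtains a `Thicket` instance through the new helper (the `thicket.cli.main` import path is an assumption from this diff's layout):

```python
from pathlib import Path

from thicket.cli.main import load_thicket  # assumed import path

# An explicit --config path wins; otherwise load_thicket falls back to
# thicket.yaml/.yml/.json in the working directory, then
# ~/.config/thicket/config.yaml, then ~/.thicket.yaml, exiting with
# typer.Exit(1) if nothing is found.
thicket = load_thicket(Path("thicket.yaml"))
```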
+438
src/thicket/core/reference_parser.py
···
+
"""Reference detection and parsing for blog entries."""
+
+
import re
+
from typing import TYPE_CHECKING, Optional
+
from urllib.parse import urlparse
+
+
from ..models import AtomEntry
+
+
if TYPE_CHECKING:
+
    from ..core.git_store import GitStore
+
+
+
class BlogReference:
+
"""Represents a reference from one blog entry to another."""
+
+
def __init__(
+
self,
+
source_entry_id: str,
+
source_username: str,
+
target_url: str,
+
target_username: Optional[str] = None,
+
target_entry_id: Optional[str] = None,
+
):
+
self.source_entry_id = source_entry_id
+
self.source_username = source_username
+
self.target_url = target_url
+
self.target_username = target_username
+
self.target_entry_id = target_entry_id
+
+
def to_dict(self) -> dict:
+
"""Convert to dictionary for JSON serialization."""
+
result = {
+
"source_entry_id": self.source_entry_id,
+
"source_username": self.source_username,
+
"target_url": self.target_url,
+
}
+
+
# Only include optional fields if they are not None
+
if self.target_username is not None:
+
result["target_username"] = self.target_username
+
if self.target_entry_id is not None:
+
result["target_entry_id"] = self.target_entry_id
+
+
return result
+
+
@classmethod
+
def from_dict(cls, data: dict) -> "BlogReference":
+
"""Create from dictionary."""
+
return cls(
+
source_entry_id=data["source_entry_id"],
+
source_username=data["source_username"],
+
target_url=data["target_url"],
+
target_username=data.get("target_username"),
+
target_entry_id=data.get("target_entry_id"),
+
)
+
+
+
class ReferenceIndex:
+
"""Index of blog-to-blog references for creating threaded views."""
+
+
def __init__(self):
+
self.references: list[BlogReference] = []
+
self.outbound_refs: dict[
+
str, list[BlogReference]
+
] = {} # entry_id -> outbound refs
+
self.inbound_refs: dict[
+
str, list[BlogReference]
+
] = {} # entry_id -> inbound refs
+
self.user_domains: dict[str, set[str]] = {} # username -> set of domains
+
+
def add_reference(self, ref: BlogReference) -> None:
+
"""Add a reference to the index."""
+
self.references.append(ref)
+
+
# Update outbound references
+
source_key = f"{ref.source_username}:{ref.source_entry_id}"
+
if source_key not in self.outbound_refs:
+
self.outbound_refs[source_key] = []
+
self.outbound_refs[source_key].append(ref)
+
+
# Update inbound references if we can identify the target
+
if ref.target_username and ref.target_entry_id:
+
target_key = f"{ref.target_username}:{ref.target_entry_id}"
+
if target_key not in self.inbound_refs:
+
self.inbound_refs[target_key] = []
+
self.inbound_refs[target_key].append(ref)
+
+
def get_outbound_refs(self, username: str, entry_id: str) -> list[BlogReference]:
+
"""Get all outbound references from an entry."""
+
key = f"{username}:{entry_id}"
+
return self.outbound_refs.get(key, [])
+
+
def get_inbound_refs(self, username: str, entry_id: str) -> list[BlogReference]:
+
"""Get all inbound references to an entry."""
+
key = f"{username}:{entry_id}"
+
return self.inbound_refs.get(key, [])
+
+
def get_thread_members(self, username: str, entry_id: str) -> set[tuple[str, str]]:
+
"""Get all entries that are part of the same thread."""
+
visited = set()
+
to_visit = [(username, entry_id)]
+
thread_members = set()
+
+
while to_visit:
+
current_user, current_entry = to_visit.pop()
+
if (current_user, current_entry) in visited:
+
continue
+
+
visited.add((current_user, current_entry))
+
thread_members.add((current_user, current_entry))
+
+
# Add outbound references
+
for ref in self.get_outbound_refs(current_user, current_entry):
+
if ref.target_username and ref.target_entry_id:
+
to_visit.append((ref.target_username, ref.target_entry_id))
+
+
# Add inbound references
+
for ref in self.get_inbound_refs(current_user, current_entry):
+
to_visit.append((ref.source_username, ref.source_entry_id))
+
+
return thread_members
+
+
def to_dict(self) -> dict:
+
"""Convert to dictionary for JSON serialization."""
+
return {
+
"references": [ref.to_dict() for ref in self.references],
+
"user_domains": {k: list(v) for k, v in self.user_domains.items()},
+
}
+
+
@classmethod
+
def from_dict(cls, data: dict) -> "ReferenceIndex":
+
"""Create from dictionary."""
+
index = cls()
+
for ref_data in data.get("references", []):
+
ref = BlogReference.from_dict(ref_data)
+
index.add_reference(ref)
+
+
for username, domains in data.get("user_domains", {}).items():
+
index.user_domains[username] = set(domains)
+
+
return index
+
+
+
class ReferenceParser:
+
"""Parses blog entries to detect references to other blogs."""
+
+
def __init__(self):
+
# Common blog platforms and patterns
+
self.blog_patterns = [
+
r"https?://[^/]+\.(?:org|com|net|io|dev|me|co\.uk)/.*", # Common blog domains
+
r"https?://[^/]+\.github\.io/.*", # GitHub Pages
+
r"https?://[^/]+\.substack\.com/.*", # Substack
+
r"https?://medium\.com/.*", # Medium
+
r"https?://[^/]+\.wordpress\.com/.*", # WordPress.com
+
r"https?://[^/]+\.blogspot\.com/.*", # Blogger
+
]
+
+
# Compile regex patterns
+
self.link_pattern = re.compile(
+
r'<a[^>]+href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL
+
)
+
self.url_pattern = re.compile(r'https?://[^\s<>"]+')
+
+
def extract_links_from_html(self, html_content: str) -> list[tuple[str, str]]:
+
"""Extract all links from HTML content."""
+
links = []
+
+
# Extract links from <a> tags
+
for match in self.link_pattern.finditer(html_content):
+
url = match.group(1)
+
text = re.sub(
+
r"<[^>]+>", "", match.group(2)
+
).strip() # Remove HTML tags from link text
+
links.append((url, text))
+
+
return links
+
+
def is_blog_url(self, url: str) -> bool:
+
"""Check if a URL likely points to a blog post."""
+
for pattern in self.blog_patterns:
+
if re.match(pattern, url):
+
return True
+
return False
+
+
def _is_likely_blog_post_url(self, url: str) -> bool:
+
"""Check if a same-domain URL likely points to a blog post (not CSS, images, etc.)."""
+
parsed_url = urlparse(url)
+
path = parsed_url.path.lower()
+
+
# Skip obvious non-blog content
+
if any(path.endswith(ext) for ext in ['.css', '.js', '.png', '.jpg', '.jpeg', '.gif', '.svg', '.ico', '.pdf', '.xml', '.json']):
+
return False
+
+
# Skip common non-blog paths
+
if any(segment in path for segment in ['/static/', '/assets/', '/css/', '/js/', '/images/', '/img/', '/media/', '/uploads/']):
+
return False
+
+
# Skip fragment-only links (same page anchors)
+
if not path or path == '/':
+
return False
+
+
# Look for positive indicators of blog posts
+
# Common blog post patterns: dates, slugs, post indicators
+
blog_indicators = [
+
r'/\d{4}/', # Year in path
+
r'/\d{4}/\d{2}/', # Year/month in path
+
r'/blog/',
+
r'/post/',
+
r'/posts/',
+
r'/articles?/',
+
r'/notes?/',
+
r'/entries/',
+
r'/writing/',
+
]
+
+
for pattern in blog_indicators:
+
if re.search(pattern, path):
+
return True
+
+
# If it has a reasonable path depth and doesn't match exclusions, likely a blog post
+
path_segments = [seg for seg in path.split('/') if seg]
+
return len(path_segments) >= 1 # At least one meaningful path segment
+
+
def resolve_target_user(
+
self, url: str, user_domains: dict[str, set[str]]
+
) -> Optional[str]:
+
"""Try to resolve a URL to a known user based on domain mapping."""
+
parsed_url = urlparse(url)
+
domain = parsed_url.netloc.lower()
+
+
for username, domains in user_domains.items():
+
if domain in domains:
+
return username
+
+
return None
+
+
def extract_references(
+
self, entry: AtomEntry, username: str, user_domains: dict[str, set[str]]
+
) -> list[BlogReference]:
+
"""Extract all blog references from an entry."""
+
references = []
+
+
# Combine all text content for analysis
+
content_to_search = []
+
if entry.content:
+
content_to_search.append(entry.content)
+
if entry.summary:
+
content_to_search.append(entry.summary)
+
+
for content in content_to_search:
+
links = self.extract_links_from_html(content)
+
+
for url, _link_text in links:
+
entry_domain = (
+
urlparse(str(entry.link)).netloc.lower() if entry.link else ""
+
)
+
link_domain = urlparse(url).netloc.lower()
+
+
# Check if this looks like a blog URL
+
if not self.is_blog_url(url):
+
continue
+
+
# For same-domain links, apply additional filtering to avoid non-blog content
+
if link_domain == entry_domain:
+
# Only include same-domain links that look like blog posts
+
if not self._is_likely_blog_post_url(url):
+
continue
+
+
# Try to resolve to a known user
+
if link_domain == entry_domain:
+
# Same domain - target user is the same as source user
+
target_username: Optional[str] = username
+
else:
+
# Different domain - try to resolve
+
target_username = self.resolve_target_user(url, user_domains)
+
+
ref = BlogReference(
+
source_entry_id=entry.id,
+
source_username=username,
+
target_url=url,
+
target_username=target_username,
+
target_entry_id=None, # Will be resolved later if possible
+
)
+
+
references.append(ref)
+
+
return references
+
+
def build_user_domain_mapping(self, git_store: "GitStore") -> dict[str, set[str]]:
+
"""Build mapping of usernames to their known domains."""
+
user_domains = {}
+
index = git_store._load_index()
+
+
for username, user_metadata in index.users.items():
+
domains = set()
+
+
# Add domains from feeds
+
for feed_url in user_metadata.feeds:
+
domain = urlparse(str(feed_url)).netloc.lower()
+
if domain:
+
domains.add(domain)
+
+
# Add domain from homepage
+
if user_metadata.homepage:
+
domain = urlparse(str(user_metadata.homepage)).netloc.lower()
+
if domain:
+
domains.add(domain)
+
+
user_domains[username] = domains
+
+
return user_domains
+
+
def _build_url_to_entry_mapping(self, git_store: "GitStore") -> dict[str, str]:
+
"""Build a comprehensive mapping from URLs to entry IDs using git store data.
+
+
This creates a bidirectional mapping that handles:
+
- Entry link URLs -> Entry IDs
+
- URL variations (with/without www, http/https)
+
- Multiple URLs pointing to the same entry
+
"""
+
url_to_entry: dict[str, str] = {}
+
+
# Load index to get all users
+
index = git_store._load_index()
+
+
for username in index.users.keys():
+
entries = git_store.list_entries(username)
+
+
for entry in entries:
+
if entry.link:
+
link_url = str(entry.link)
+
entry_id = entry.id
+
+
# Map the canonical link URL
+
url_to_entry[link_url] = entry_id
+
+
# Handle common URL variations
+
parsed = urlparse(link_url)
+
if parsed.netloc and parsed.path:
+
# Add version without www
+
if parsed.netloc.startswith('www.'):
+
no_www_url = f"{parsed.scheme}://{parsed.netloc[4:]}{parsed.path}"
+
if parsed.query:
+
no_www_url += f"?{parsed.query}"
+
if parsed.fragment:
+
no_www_url += f"#{parsed.fragment}"
+
url_to_entry[no_www_url] = entry_id
+
+
# Add version with www if not present
+
elif not parsed.netloc.startswith('www.'):
+
www_url = f"{parsed.scheme}://www.{parsed.netloc}{parsed.path}"
+
if parsed.query:
+
www_url += f"?{parsed.query}"
+
if parsed.fragment:
+
www_url += f"#{parsed.fragment}"
+
url_to_entry[www_url] = entry_id
+
+
# Add http/https variations
+
if parsed.scheme == 'https':
+
http_url = link_url.replace('https://', 'http://', 1)
+
url_to_entry[http_url] = entry_id
+
elif parsed.scheme == 'http':
+
https_url = link_url.replace('http://', 'https://', 1)
+
url_to_entry[https_url] = entry_id
+
+
return url_to_entry
+
+
def _normalize_url(self, url: str) -> str:
+
"""Normalize URL for consistent matching.
+
+
Handles common variations like trailing slashes, fragments, etc.
+
"""
+
parsed = urlparse(url)
+
+
# Remove trailing slash from path
+
path = parsed.path.rstrip('/') if parsed.path != '/' else parsed.path
+
+
# Reconstruct without fragment for consistent matching
+
normalized = f"{parsed.scheme}://{parsed.netloc}{path}"
+
if parsed.query:
+
normalized += f"?{parsed.query}"
+
+
return normalized
+
+
def resolve_target_entry_ids(
+
self, references: list[BlogReference], git_store: "GitStore"
+
) -> list[BlogReference]:
+
"""Resolve target_entry_id for references using comprehensive URL mapping."""
+
resolved_refs = []
+
+
# Build comprehensive URL to entry ID mapping
+
url_to_entry = self._build_url_to_entry_mapping(git_store)
+
+
for ref in references:
+
# If we already have a target_entry_id, keep the reference as-is
+
if ref.target_entry_id is not None:
+
resolved_refs.append(ref)
+
continue
+
+
# If we don't have a target_username, we can't resolve it
+
if ref.target_username is None:
+
resolved_refs.append(ref)
+
continue
+
+
# Try to resolve using URL mapping
+
resolved_entry_id = None
+
+
# First, try exact match
+
if ref.target_url in url_to_entry:
+
resolved_entry_id = url_to_entry[ref.target_url]
+
else:
+
# Try normalized URL matching
+
normalized_target = self._normalize_url(ref.target_url)
+
if normalized_target in url_to_entry:
+
resolved_entry_id = url_to_entry[normalized_target]
+
else:
+
# Try URL variations
+
for mapped_url, entry_id in url_to_entry.items():
+
if self._normalize_url(mapped_url) == normalized_target:
+
resolved_entry_id = entry_id
+
break
+
+
# Verify the resolved entry belongs to the target username
+
if resolved_entry_id:
+
# Double-check by loading the actual entry
+
entries = git_store.list_entries(ref.target_username)
+
entry_found = any(entry.id == resolved_entry_id for entry in entries)
+
if not entry_found:
+
resolved_entry_id = None
+
+
# Create a new reference with the resolved target_entry_id
+
resolved_ref = BlogReference(
+
source_entry_id=ref.source_entry_id,
+
source_username=ref.source_username,
+
target_url=ref.target_url,
+
target_username=ref.target_username,
+
target_entry_id=resolved_entry_id,
+
)
+
resolved_refs.append(resolved_ref)
+
+
return resolved_refs
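A quick sketch of the new reference index in action: wire up one cross-user reference and recover the thread from either endpoint. The import path, ids, and URL are made up for illustration:

```python
from thicket.core.reference_parser import BlogReference, ReferenceIndex  # assumed path

# One resolved reference: alice's post links to bob's post.
index = ReferenceIndex()
index.add_reference(BlogReference(
    source_entry_id="post-1", source_username="alice",
    target_url="https://bob.example.org/2024/reply",
    target_username="bob", target_entry_id="post-2",
))

# Thread membership follows references in both directions, so querying
# from bob's side still finds alice's post.
members = index.get_thread_members("bob", "post-2")
print(members)  # {('alice', 'post-1'), ('bob', 'post-2')} in some order
```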
+30 -2
src/thicket/models/config.py
···
"""Configuration models for thicket."""
+
import json
+
import yaml
from pathlib import Path
-
from typing import Optional
+
from typing import Optional, Union
-
from pydantic import BaseModel, EmailStr, HttpUrl
+
from pydantic import BaseModel, EmailStr, HttpUrl, ValidationError
from pydantic_settings import BaseSettings, SettingsConfigDict
···
git_store: Path
cache_dir: Path
users: list[UserConfig] = []
+
+
@classmethod
+
def from_file(cls, config_path: Path) -> 'ThicketConfig':
+
"""Load configuration from a file."""
+
if not config_path.exists():
+
raise FileNotFoundError(f"Configuration file not found: {config_path}")
+
+
content = config_path.read_text(encoding='utf-8')
+
+
if config_path.suffix.lower() in ['.yaml', '.yml']:
+
try:
+
data = yaml.safe_load(content)
+
except yaml.YAMLError as e:
+
raise ValueError(f"Invalid YAML in {config_path}: {e}")
+
elif config_path.suffix.lower() == '.json':
+
try:
+
data = json.loads(content)
+
except json.JSONDecodeError as e:
+
raise ValueError(f"Invalid JSON in {config_path}: {e}") from e
+
else:
+
raise ValueError(f"Unsupported configuration file format: {config_path.suffix}")
+
+
try:
+
return cls(**data)
+
except ValidationError as e:
+
raise ValueError(f"Configuration validation error: {e}") from e
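
A minimal round trip of the new loader (the YAML layout is an assumption read off the three fields shown above):

```python
# Usage sketch for ThicketConfig.from_file; assumes git_store, cache_dir and
# users are sufficient to construct the model.
from pathlib import Path

from thicket.models.config import ThicketConfig

Path("thicket.yaml").write_text(
    "git_store: ./store\ncache_dir: ./cache\nusers: []\n",
    encoding="utf-8",
)
config = ThicketConfig.from_file(Path("thicket.yaml"))
print(config.git_store, config.cache_dir)
```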
+2 -4
src/thicket/models/feed.py
···
"""Feed and entry models for thicket."""
from datetime import datetime
-
from typing import TYPE_CHECKING, Any, Optional
+
from typing import TYPE_CHECKING, Optional
from pydantic import BaseModel, ConfigDict, EmailStr, HttpUrl
···
summary: Optional[str] = None
content: Optional[str] = None # Full body content from Atom entry
content_type: Optional[str] = "html" # text, html, xhtml
-
author: Optional[dict[str, Any]] = None
+
author: Optional[dict] = None
categories: list[str] = []
rights: Optional[str] = None # Copyright info
source: Optional[str] = None # Source feed URL
-
links: list[str] = [] # URLs mentioned in this entry
-
backlinks: list[str] = [] # Entry IDs that link to this entry
class FeedMetadata(BaseModel):
+1
src/thicket/subsystems/__init__.py
···
+
"""Thicket subsystems for specialized operations."""
+227
src/thicket/subsystems/feeds.py
···
+
"""Feed management subsystem."""
+
+
import json
+
from datetime import datetime
+
from typing import Callable, Optional
+
+
from pydantic import HttpUrl
+
+
from ..core.feed_parser import FeedParser
+
from ..core.git_store import GitStore
+
from ..models import AtomEntry, ThicketConfig
+
+
+
class FeedManager:
+
"""Manages feed operations and caching."""
+
+
def __init__(self, git_store: GitStore, feed_parser: FeedParser, config: ThicketConfig):
+
"""Initialize feed manager."""
+
self.git_store = git_store
+
self.feed_parser = feed_parser
+
self.config = config
+
self._ensure_cache_dir()
+
+
def _ensure_cache_dir(self):
+
"""Ensure cache directory exists."""
+
self.config.cache_dir.mkdir(parents=True, exist_ok=True)
+
+
async def sync_feeds(self, username: Optional[str] = None, progress_callback: Optional[Callable] = None) -> dict:
+
"""Sync feeds for all users or specific user."""
+
if username:
+
return await self.sync_user_feeds(username, progress_callback)
+
+
# Sync all users
+
results = {}
+
total_users = len(self.config.users)
+
+
for i, user_config in enumerate(self.config.users):
+
if progress_callback:
+
progress_callback(f"Syncing feeds for {user_config.username}", i, total_users)
+
+
user_results = await self.sync_user_feeds(user_config.username, progress_callback)
+
results[user_config.username] = user_results
+
+
return results
+
+
async def sync_user_feeds(self, username: str, progress_callback: Optional[Callable] = None) -> dict:
+
"""Sync feeds for a specific user."""
+
user_config = next((u for u in self.config.users if u.username == username), None)
+
if not user_config:
+
return {'error': f'User {username} not found in configuration'}
+
+
# Ensure user exists in git store
+
git_user = self.git_store.get_user(username)
+
if not git_user:
+
self.git_store.add_user(
+
username=user_config.username,
+
display_name=user_config.display_name,
+
email=str(user_config.email) if user_config.email else None,
+
homepage=str(user_config.homepage) if user_config.homepage else None,
+
icon=str(user_config.icon) if user_config.icon else None,
+
feeds=[str(feed) for feed in user_config.feeds]
+
)
+
+
results = {
+
'username': username,
+
'feeds_processed': 0,
+
'new_entries': 0,
+
'errors': [],
+
'feeds': {}
+
}
+
+
total_feeds = len(user_config.feeds)
+
+
for i, feed_url in enumerate(user_config.feeds):
+
if progress_callback:
+
progress_callback(f"Processing feed {i+1}/{total_feeds} for {username}", i, total_feeds)
+
+
try:
+
feed_result = await self._sync_single_feed(username, feed_url)
+
results['feeds'][str(feed_url)] = feed_result
+
results['feeds_processed'] += 1
+
results['new_entries'] += feed_result.get('new_entries', 0)
+
except Exception as e:
+
error_msg = f"Error syncing {feed_url}: {str(e)}"
+
results['errors'].append(error_msg)
+
results['feeds'][str(feed_url)] = {'error': error_msg}
+
+
return results
+
+
async def _sync_single_feed(self, username: str, feed_url: HttpUrl) -> dict:
+
"""Sync a single feed for a user."""
+
cache_key = self._get_cache_key(username, feed_url)
+
last_modified = self._get_last_modified(cache_key)  # note: currently unused below
+
+
try:
+
# Fetch feed content
+
content = await self.feed_parser.fetch_feed(feed_url)
+
+
# Parse feed
+
feed_meta, entries = self.feed_parser.parse_feed(content, feed_url)
+
+
# Filter new entries
+
new_entries = []
+
for entry in entries:
+
existing_entry = self.git_store.get_entry(username, entry.id)
+
if not existing_entry:
+
new_entries.append(entry)
+
+
# Store new entries
+
stored_count = 0
+
for entry in new_entries:
+
if self.git_store.store_entry(username, entry):
+
stored_count += 1
+
+
# Update cache
+
self._update_cache(cache_key, {
+
'last_fetched': datetime.now().isoformat(),
+
'feed_meta': feed_meta.model_dump(exclude_none=True),
+
'entry_count': len(entries),
+
'new_entries': stored_count,
+
'feed_url': str(feed_url)
+
})
+
+
return {
+
'success': True,
+
'total_entries': len(entries),
+
'new_entries': stored_count,
+
'feed_title': feed_meta.title,
+
'last_fetched': datetime.now().isoformat()
+
}
+
+
except Exception as e:
+
return {
+
'success': False,
+
'error': str(e),
+
'feed_url': str(feed_url)
+
}
+
+
def get_entries(self, username: str, limit: Optional[int] = None) -> list[AtomEntry]:
+
"""Get entries for a user."""
+
return self.git_store.list_entries(username, limit)
+
+
def get_entry(self, username: str, entry_id: str) -> Optional[AtomEntry]:
+
"""Get a specific entry."""
+
return self.git_store.get_entry(username, entry_id)
+
+
def search_entries(self, query: str, username: Optional[str] = None, limit: Optional[int] = None) -> list[tuple[str, AtomEntry]]:
+
"""Search entries across users."""
+
return self.git_store.search_entries(query, username, limit)
+
+
def get_stats(self) -> dict:
+
"""Get feed-related statistics."""
+
index = self.git_store._load_index()
+
+
feed_stats = {
+
'total_feeds_configured': sum(len(user.feeds) for user in self.config.users),
+
'users_with_entries': len([u for u in index.users.values() if u.entry_count > 0]),
+
'cache_files': len(list(self.config.cache_dir.glob("*.json"))) if self.config.cache_dir.exists() else 0,
+
}
+
+
return feed_stats
+
+
def _get_cache_key(self, username: str, feed_url: HttpUrl) -> str:
+
"""Generate cache key for feed."""
+
# Simple hash of username and feed URL
+
import hashlib
+
key_data = f"{username}:{str(feed_url)}"
+
return hashlib.md5(key_data.encode()).hexdigest()
+
+
def _get_last_modified(self, cache_key: str) -> Optional[datetime]:
+
"""Get last modified time from cache."""
+
cache_file = self.config.cache_dir / f"{cache_key}.json"
+
if cache_file.exists():
+
try:
+
with open(cache_file) as f:
+
data = json.load(f)
+
return datetime.fromisoformat(data.get('last_fetched', ''))
+
except Exception:
+
pass
+
return None
+
+
def _update_cache(self, cache_key: str, data: dict):
+
"""Update cache with feed data."""
+
cache_file = self.config.cache_dir / f"{cache_key}.json"
+
try:
+
with open(cache_file, 'w') as f:
+
json.dump(data, f, indent=2)
+
except Exception:
+
# Cache update failure shouldn't break the sync
+
pass
+
+
def clear_cache(self, username: Optional[str] = None) -> bool:
+
"""Clear feed cache."""
+
try:
+
if username:
+
# Clear cache for specific user
+
for user_config in self.config.users:
+
if user_config.username == username:
+
for feed_url in user_config.feeds:
+
cache_key = self._get_cache_key(username, feed_url)
+
cache_file = self.config.cache_dir / f"{cache_key}.json"
+
if cache_file.exists():
+
cache_file.unlink()
+
else:
+
# Clear all cache
+
if self.config.cache_dir.exists():
+
for cache_file in self.config.cache_dir.glob("*.json"):
+
cache_file.unlink()
+
return True
+
except Exception:
+
return False
+
+
def get_feed_info(self, username: str, feed_url: str) -> Optional[dict]:
+
"""Get cached information about a specific feed."""
+
try:
+
feed_url_obj = HttpUrl(feed_url)
+
cache_key = self._get_cache_key(username, feed_url_obj)
+
cache_file = self.config.cache_dir / f"{cache_key}.json"
+
+
if cache_file.exists():
+
with open(cache_file) as f:
+
return json.load(f)
+
except Exception:
+
pass
+
return None
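
End to end, the manager might be driven like this. A sketch only: the `GitStore(config.git_store)` and `FeedParser()` constructor calls are assumptions, since those signatures are defined elsewhere in the codebase:

```python
import asyncio
from pathlib import Path

from thicket.core.feed_parser import FeedParser
from thicket.core.git_store import GitStore
from thicket.models.config import ThicketConfig
from thicket.subsystems.feeds import FeedManager

def progress(message: str, current: int, total: int) -> None:
    # Matches the (message, current, total) callback shape used by sync_feeds
    print(f"[{current + 1}/{total}] {message}")

async def main() -> None:
    config = ThicketConfig.from_file(Path("thicket.yaml"))
    # Constructor arguments below are assumptions, not confirmed signatures.
    manager = FeedManager(GitStore(config.git_store), FeedParser(), config)
    results = await manager.sync_feeds(progress_callback=progress)
    for username, result in results.items():
        print(username, result.get("new_entries", 0), "new entries")

asyncio.run(main())
```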
+304
src/thicket/subsystems/links.py
···
+
"""Link processing subsystem."""
+
+
import json
+
import re
+
from collections import defaultdict
+
from typing import Optional
+
from urllib.parse import urljoin, urlparse
+
+
from ..core.git_store import GitStore
+
from ..models import AtomEntry, ThicketConfig
+
+
+
class LinkProcessor:
+
"""Processes and manages links between entries."""
+
+
def __init__(self, git_store: GitStore, config: ThicketConfig):
+
"""Initialize link processor."""
+
self.git_store = git_store
+
self.config = config
+
self.links_file = self.git_store.repo_path / "links.json"
+
+
def process_links(self, username: Optional[str] = None) -> dict:
+
"""Process and extract links from entries."""
+
if username:
+
return self._process_user_links(username)
+
+
# Process all users
+
results = {}
+
index = self.git_store._load_index()
+
+
for user_metadata in index.users.values():
+
user_results = self._process_user_links(user_metadata.username)
+
results[user_metadata.username] = user_results
+
+
# Consolidate all links
+
self._consolidate_links()
+
+
return results
+
+
def _process_user_links(self, username: str) -> dict:
+
"""Process links for a specific user."""
+
entries = self.git_store.list_entries(username)
+
+
results = {
+
'username': username,
+
'entries_processed': 0,
+
'links_found': 0,
+
'external_links': 0,
+
'internal_links': 0,
+
}
+
+
links_data = self._load_links_data()
+
+
for entry in entries:
+
entry_links = self._extract_links_from_entry(entry)
+
+
if entry_links:
+
# Store links for this entry
+
entry_key = f"{username}:{entry.id}"
+
links_data[entry_key] = {
+
'entry_id': entry.id,
+
'username': username,
+
'title': entry.title,
+
'links': entry_links,
+
'processed_at': entry.updated.isoformat() if entry.updated else None,
+
}
+
+
results['links_found'] += len(entry_links)
+
results['external_links'] += len([l for l in entry_links if self._is_external_link(l['url'])])
+
results['internal_links'] += len([l for l in entry_links if not self._is_external_link(l['url'])])
+
+
results['entries_processed'] += 1
+
+
self._save_links_data(links_data)
+
+
return results
+
+
def _extract_links_from_entry(self, entry: AtomEntry) -> list[dict]:
+
"""Extract links from an entry's content."""
+
links = []
+
+
# Combine content and summary for link extraction
+
text_content = ""
+
if entry.content:
+
text_content += entry.content
+
if entry.summary:
+
text_content += " " + entry.summary
+
+
if not text_content:
+
return links
+
+
# Extract HTML links
+
html_link_pattern = r'<a[^>]+href=["\']([^"\']+)["\'][^>]*>([^<]*)</a>'
+
html_matches = re.findall(html_link_pattern, text_content, re.IGNORECASE)
+
+
for url, text in html_matches:
+
# Clean up the URL
+
url = url.strip()
+
text = text.strip()
+
+
if url and url not in ['#', 'javascript:void(0)']:
+
# Resolve relative URLs if possible
+
if entry.link and url.startswith('/'):
+
base_url = str(entry.link)
+
parsed_base = urlparse(base_url)
+
base_domain = f"{parsed_base.scheme}://{parsed_base.netloc}"
+
url = urljoin(base_domain, url)
+
+
links.append({
+
'url': url,
+
'text': text or url,
+
'type': 'html'
+
})
+
+
# Extract markdown links
+
markdown_link_pattern = r'\[([^\]]*)\]\(([^\)]+)\)'
+
markdown_matches = re.findall(markdown_link_pattern, text_content)
+
+
for text, url in markdown_matches:
+
url = url.strip()
+
text = text.strip()
+
+
if url and url not in ['#']:
+
links.append({
+
'url': url,
+
'text': text or url,
+
'type': 'markdown'
+
})
+
+
# Extract plain URLs
+
url_pattern = r'https?://[^\s<>"]+[^\s<>".,;!?]'
+
url_matches = re.findall(url_pattern, text_content)
+
+
for url in url_matches:
+
# Skip if already found as HTML or markdown link
+
if not any(link['url'] == url for link in links):
+
links.append({
+
'url': url,
+
'text': url,
+
'type': 'plain'
+
})
+
+
return links
+
+
def _is_external_link(self, url: str) -> bool:
+
"""Check if a link is external to the configured domains."""
+
try:
+
parsed = urlparse(url)
+
domain = parsed.netloc.lower()
+
+
# Check against user domains from feeds
+
for user_config in self.config.users:
+
for feed_url in user_config.feeds:
+
feed_domain = urlparse(str(feed_url)).netloc.lower()
+
if domain == feed_domain or domain.endswith(f'.{feed_domain}'):
+
return False
+
+
# Check homepage domain
+
if user_config.homepage:
+
homepage_domain = urlparse(str(user_config.homepage)).netloc.lower()
+
if domain == homepage_domain or domain.endswith(f'.{homepage_domain}'):
+
return False
+
+
return True
+
except Exception:
+
return True
+
+
def _load_links_data(self) -> dict:
+
"""Load existing links data."""
+
if self.links_file.exists():
+
try:
+
with open(self.links_file) as f:
+
return json.load(f)
+
except Exception:
+
pass
+
return {}
+
+
def _save_links_data(self, links_data: dict):
+
"""Save links data to file."""
+
try:
+
with open(self.links_file, 'w') as f:
+
json.dump(links_data, f, indent=2, ensure_ascii=False)
+
except Exception:
+
# Link processing failure shouldn't break the main operation
+
pass
+
+
def _consolidate_links(self):
+
"""Consolidate and create reverse link mappings."""
+
links_data = self._load_links_data()
+
+
# Create URL to entries mapping
+
url_mapping = defaultdict(list)
+
+
for entry_key, entry_data in links_data.items():
+
for link in entry_data.get('links', []):
+
url_mapping[link['url']].append({
+
'entry_key': entry_key,
+
'username': entry_data['username'],
+
'entry_id': entry_data['entry_id'],
+
'title': entry_data['title'],
+
'link_text': link['text'],
+
'link_type': link['type'],
+
})
+
+
# Save URL mapping
+
url_mapping_file = self.git_store.repo_path / "url_mapping.json"
+
try:
+
with open(url_mapping_file, 'w') as f:
+
json.dump(dict(url_mapping), f, indent=2, ensure_ascii=False)
+
except Exception:
+
pass
+
+
def get_links(self, username: Optional[str] = None) -> dict:
+
"""Get processed links."""
+
links_data = self._load_links_data()
+
+
if username:
+
user_links = {k: v for k, v in links_data.items() if v.get('username') == username}
+
return user_links
+
+
return links_data
+
+
def find_references(self, url: str) -> list[tuple[str, AtomEntry]]:
+
"""Find entries that reference a URL."""
+
url_mapping_file = self.git_store.repo_path / "url_mapping.json"
+
+
if not url_mapping_file.exists():
+
return []
+
+
try:
+
with open(url_mapping_file) as f:
+
url_mapping = json.load(f)
+
+
references = url_mapping.get(url, [])
+
results = []
+
+
for ref in references:
+
entry = self.git_store.get_entry(ref['username'], ref['entry_id'])
+
if entry:
+
results.append((ref['username'], entry))
+
+
return results
+
except Exception:
+
return []
+
+
def get_stats(self) -> dict:
+
"""Get link processing statistics."""
+
links_data = self._load_links_data()
+
+
total_entries_with_links = len(links_data)
+
total_links = sum(len(entry_data.get('links', [])) for entry_data in links_data.values())
+
+
external_links = 0
+
internal_links = 0
+
+
for entry_data in links_data.values():
+
for link in entry_data.get('links', []):
+
if self._is_external_link(link['url']):
+
external_links += 1
+
else:
+
internal_links += 1
+
+
# Count unique URLs
+
unique_urls = set()
+
for entry_data in links_data.values():
+
for link in entry_data.get('links', []):
+
unique_urls.add(link['url'])
+
+
return {
+
'entries_with_links': total_entries_with_links,
+
'total_links': total_links,
+
'unique_urls': len(unique_urls),
+
'external_links': external_links,
+
'internal_links': internal_links,
+
}
+
+
def get_most_referenced_urls(self, limit: int = 10) -> list[dict]:
+
"""Get most frequently referenced URLs."""
+
url_mapping_file = self.git_store.repo_path / "url_mapping.json"
+
+
if not url_mapping_file.exists():
+
return []
+
+
try:
+
with open(url_mapping_file) as f:
+
url_mapping = json.load(f)
+
+
# Count references per URL
+
url_counts = [(url, len(refs)) for url, refs in url_mapping.items()]
+
url_counts.sort(key=lambda x: x[1], reverse=True)
+
+
results = []
+
for url, count in url_counts[:limit]:
+
results.append({
+
'url': url,
+
'reference_count': count,
+
'is_external': self._is_external_link(url),
+
'references': url_mapping[url]
+
})
+
+
return results
+
except Exception:
+
return []
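
A usage sketch for the processor, continuing with the assumed `git_store`/`config` objects from the FeedManager sketch above:

```python
processor = LinkProcessor(git_store, config)  # objects assumed, as above
processor.process_links()  # all users; writes links.json and url_mapping.json

stats = processor.get_stats()
print(f"{stats['total_links']} links in {stats['entries_with_links']} entries")

for item in processor.get_most_referenced_urls(limit=5):
    kind = "external" if item["is_external"] else "internal"
    print(item["reference_count"], item["url"], f"({kind})")
```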
+158
src/thicket/subsystems/repository.py
···
+
"""Repository management subsystem."""
+
+
import shutil
+
from datetime import datetime
+
from pathlib import Path
+
from typing import Optional
+
+
from ..core.git_store import GitStore
+
from ..models import ThicketConfig
+
+
+
class RepositoryManager:
+
"""Manages repository operations and metadata."""
+
+
def __init__(self, git_store: GitStore, config: ThicketConfig):
+
"""Initialize repository manager."""
+
self.git_store = git_store
+
self.config = config
+
+
def init_repository(self) -> bool:
+
"""Initialize the git repository if not already done."""
+
try:
+
# GitStore.__init__ already handles repository initialization
+
return True
+
except Exception:
+
return False
+
+
def commit_changes(self, message: str) -> bool:
+
"""Commit all pending changes."""
+
try:
+
self.git_store.commit_changes(message)
+
return True
+
except Exception:
+
return False
+
+
def get_status(self) -> dict:
+
"""Get repository status and statistics."""
+
try:
+
stats = self.git_store.get_stats()
+
+
# Add repository-specific information
+
repo_status = {
+
**stats,
+
'repository_path': str(self.config.git_store),
+
'cache_path': str(self.config.cache_dir),
+
'has_uncommitted_changes': self._has_uncommitted_changes(),
+
'last_commit': self._get_last_commit_info(),
+
}
+
+
return repo_status
+
except Exception as e:
+
return {'error': str(e)}
+
+
def backup_repository(self, backup_path: Path) -> bool:
+
"""Create a backup of the repository."""
+
try:
+
if backup_path.exists():
+
shutil.rmtree(backup_path)
+
+
shutil.copytree(self.config.git_store, backup_path)
+
return True
+
except Exception:
+
return False
+
+
def cleanup_cache(self) -> bool:
+
"""Clean up cache directory."""
+
try:
+
if self.config.cache_dir.exists():
+
shutil.rmtree(self.config.cache_dir)
+
self.config.cache_dir.mkdir(parents=True, exist_ok=True)
+
return True
+
except Exception:
+
return False
+
+
def get_repository_size(self) -> dict:
+
"""Get detailed repository size information."""
+
try:
+
total_size = 0
+
file_count = 0
+
dir_count = 0
+
+
for path in self.config.git_store.rglob("*"):
+
if path.is_file():
+
total_size += path.stat().st_size
+
file_count += 1
+
elif path.is_dir():
+
dir_count += 1
+
+
return {
+
'total_size_bytes': total_size,
+
'total_size_mb': round(total_size / (1024 * 1024), 2),
+
'file_count': file_count,
+
'directory_count': dir_count,
+
}
+
except Exception as e:
+
return {'error': str(e)}
+
+
def _has_uncommitted_changes(self) -> bool:
+
"""Check if there are uncommitted changes."""
+
try:
+
if not self.git_store.repo:
+
return False
+
return bool(self.git_store.repo.index.diff("HEAD") or self.git_store.repo.untracked_files)
+
except Exception:
+
return False
+
+
def _get_last_commit_info(self) -> Optional[dict]:
+
"""Get information about the last commit."""
+
try:
+
if not self.git_store.repo:
+
return None
+
+
last_commit = self.git_store.repo.head.commit
+
return {
+
'hash': last_commit.hexsha[:8],
+
'message': last_commit.message.strip(),
+
'author': str(last_commit.author),
+
'date': datetime.fromtimestamp(last_commit.committed_date).isoformat(),
+
}
+
except Exception:
+
return None
+
+
def verify_integrity(self) -> dict:
+
"""Verify repository integrity."""
+
issues = []
+
+
# Check if git repository is valid
+
try:
+
if not self.git_store.repo:
+
issues.append("Git repository not initialized")
+
except Exception as e:
+
issues.append(f"Git repository error: {e}")
+
+
# Check if index.json exists and is valid
+
index_path = self.config.git_store / "index.json"
+
if not index_path.exists():
+
issues.append("index.json missing")
+
else:
+
try:
+
self.git_store._load_index()
+
except Exception as e:
+
issues.append(f"index.json corrupted: {e}")
+
+
# Check if duplicates.json exists
+
duplicates_path = self.config.git_store / "duplicates.json"
+
if not duplicates_path.exists():
+
issues.append("duplicates.json missing")
+
else:
+
try:
+
self.git_store._load_duplicates()
+
except Exception as e:
+
issues.append(f"duplicates.json corrupted: {e}")
+
+
return {
+
'is_valid': len(issues) == 0,
+
'issues': issues,
+
'checked_at': datetime.now().isoformat(),
+
}
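
And a sketch of the intended verify-then-commit flow (same assumed objects as the earlier sketches):

```python
repo_manager = RepositoryManager(git_store, config)  # objects assumed, as above

report = repo_manager.verify_integrity()
if report["is_valid"]:
    repo_manager.commit_changes("thicket: scheduled sync")
else:
    for issue in report["issues"]:
        print("integrity issue:", issue)
```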
+319
src/thicket/subsystems/site.py
···
+
"""Site generation subsystem."""
+
+
import json
+
import shutil
+
from datetime import datetime
+
from pathlib import Path
+
from typing import Optional
+
+
from jinja2 import Environment, FileSystemLoader, select_autoescape
+
+
from ..core.git_store import GitStore
+
from ..models import ThicketConfig
+
+
+
class SiteGenerator:
+
"""Generates static sites from stored entries."""
+
+
def __init__(self, git_store: GitStore, config: ThicketConfig):
+
"""Initialize site generator."""
+
self.git_store = git_store
+
self.config = config
+
self.default_template_dir = Path(__file__).parent.parent / "templates"
+
+
def generate_site(self, output_dir: Path, template_dir: Optional[Path] = None) -> bool:
+
"""Generate complete static site."""
+
try:
+
# Setup template environment
+
template_dir = template_dir or self.default_template_dir
+
if not template_dir.exists():
+
return False
+
+
env = Environment(
+
loader=FileSystemLoader(str(template_dir)),
+
autoescape=select_autoescape(['html', 'xml'])
+
)
+
+
# Prepare output directory
+
output_dir.mkdir(parents=True, exist_ok=True)
+
+
# Copy static assets
+
self._copy_static_assets(template_dir, output_dir)
+
+
# Generate pages
+
self._generate_index_page(env, output_dir)
+
self._generate_timeline_page(env, output_dir)
+
self._generate_users_page(env, output_dir)
+
self._generate_links_page(env, output_dir)
+
self._generate_user_detail_pages(env, output_dir)
+
+
return True
+
except Exception:
+
return False
+
+
def generate_timeline(self, output_path: Path, limit: Optional[int] = None) -> bool:
+
"""Generate timeline HTML file."""
+
try:
+
env = Environment(
+
loader=FileSystemLoader(str(self.default_template_dir)),
+
autoescape=select_autoescape(['html', 'xml'])
+
)
+
+
timeline_data = self._get_timeline_data(limit)
+
template = env.get_template('timeline.html')
+
+
content = template.render(**timeline_data)
+
+
output_path.parent.mkdir(parents=True, exist_ok=True)
+
with open(output_path, 'w', encoding='utf-8') as f:
+
f.write(content)
+
+
return True
+
except Exception:
+
return False
+
+
def generate_user_pages(self, output_dir: Path) -> bool:
+
"""Generate individual user pages."""
+
try:
+
env = Environment(
+
loader=FileSystemLoader(str(self.default_template_dir)),
+
autoescape=select_autoescape(['html', 'xml'])
+
)
+
+
return self._generate_user_detail_pages(env, output_dir)
+
except Exception:
+
return False
+
+
def _copy_static_assets(self, template_dir: Path, output_dir: Path):
+
"""Copy CSS, JS, and other static assets."""
+
static_files = ['style.css', 'script.js']
+
+
for filename in static_files:
+
src_file = template_dir / filename
+
if src_file.exists():
+
dst_file = output_dir / filename
+
shutil.copy2(src_file, dst_file)
+
+
def _generate_index_page(self, env: Environment, output_dir: Path):
+
"""Generate main index page."""
+
template = env.get_template('index.html')
+
+
# Get summary statistics
+
stats = self.git_store.get_stats()
+
index = self.git_store._load_index()
+
+
# Recent entries
+
recent_entries = []
+
for username in index.users.keys():
+
user_entries = self.git_store.list_entries(username, limit=5)
+
for entry in user_entries:
+
recent_entries.append({
+
'username': username,
+
'entry': entry
+
})
+
+
# Sort by date
+
recent_entries.sort(key=lambda x: x['entry'].updated or x['entry'].published or datetime.min, reverse=True)
+
recent_entries = recent_entries[:10]
+
+
context = {
+
'title': 'Thicket Feed Archive',
+
'stats': stats,
+
'recent_entries': recent_entries,
+
'users': list(index.users.values()),
+
'generated_at': datetime.now().isoformat(),
+
}
+
+
content = template.render(**context)
+
+
with open(output_dir / 'index.html', 'w', encoding='utf-8') as f:
+
f.write(content)
+
+
def _generate_timeline_page(self, env: Environment, output_dir: Path):
+
"""Generate timeline page."""
+
template = env.get_template('timeline.html')
+
timeline_data = self._get_timeline_data()
+
+
content = template.render(**timeline_data)
+
+
with open(output_dir / 'timeline.html', 'w', encoding='utf-8') as f:
+
f.write(content)
+
+
def _generate_users_page(self, env: Environment, output_dir: Path):
+
"""Generate users overview page."""
+
template = env.get_template('users.html')
+
+
index = self.git_store._load_index()
+
users_data = []
+
+
for user_metadata in index.users.values():
+
# Get user config for additional details
+
user_config = next(
+
(u for u in self.config.users if u.username == user_metadata.username),
+
None
+
)
+
+
# Get recent entries
+
recent_entries = self.git_store.list_entries(user_metadata.username, limit=3)
+
+
users_data.append({
+
'metadata': user_metadata,
+
'config': user_config,
+
'recent_entries': recent_entries,
+
})
+
+
# Sort by entry count
+
users_data.sort(key=lambda x: x['metadata'].entry_count, reverse=True)
+
+
context = {
+
'title': 'Users',
+
'users': users_data,
+
'generated_at': datetime.now().isoformat(),
+
}
+
+
content = template.render(**context)
+
+
with open(output_dir / 'users.html', 'w', encoding='utf-8') as f:
+
f.write(content)
+
+
def _generate_links_page(self, env: Environment, output_dir: Path):
+
"""Generate links overview page."""
+
template = env.get_template('links.html')
+
+
# Load links data
+
links_file = self.git_store.repo_path / "links.json"
+
url_mapping_file = self.git_store.repo_path / "url_mapping.json"
+
+
links_data = {}
+
url_mapping = {}
+
+
if links_file.exists():
+
try:
+
with open(links_file) as f:
+
links_data = json.load(f)
+
except Exception:
+
pass
+
+
if url_mapping_file.exists():
+
try:
+
with open(url_mapping_file) as f:
+
url_mapping = json.load(f)
+
except Exception:
+
pass
+
+
# Process most referenced URLs
+
url_counts = [(url, len(refs)) for url, refs in url_mapping.items()]
+
url_counts.sort(key=lambda x: x[1], reverse=True)
+
most_referenced = url_counts[:20]
+
+
# Count links by type
+
link_stats = {
+
'total_entries_with_links': len(links_data),
+
'total_links': sum(len(entry_data.get('links', [])) for entry_data in links_data.values()),
+
'unique_urls': len(url_mapping),
+
}
+
+
context = {
+
'title': 'Links',
+
'most_referenced': most_referenced,
+
'url_mapping': url_mapping,
+
'link_stats': link_stats,
+
'generated_at': datetime.now().isoformat(),
+
}
+
+
content = template.render(**context)
+
+
with open(output_dir / 'links.html', 'w', encoding='utf-8') as f:
+
f.write(content)
+
+
def _generate_user_detail_pages(self, env: Environment, output_dir: Path) -> bool:
+
"""Generate individual user detail pages."""
+
try:
+
template = env.get_template('user_detail.html')
+
index = self.git_store._load_index()
+
+
# Create users subdirectory
+
users_dir = output_dir / 'users'
+
users_dir.mkdir(exist_ok=True)
+
+
for user_metadata in index.users.values():
+
user_config = next(
+
(u for u in self.config.users if u.username == user_metadata.username),
+
None
+
)
+
+
entries = self.git_store.list_entries(user_metadata.username)
+
+
# Get user's links
+
links_file = self.git_store.repo_path / "links.json"
+
user_links = []
+
if links_file.exists():
+
try:
+
with open(links_file) as f:
+
all_links = json.load(f)
+
user_links = [
+
data for key, data in all_links.items()
+
if data.get('username') == user_metadata.username
+
]
+
except Exception:
+
pass
+
+
context = {
+
'title': f"{user_metadata.display_name or user_metadata.username}",
+
'user_metadata': user_metadata,
+
'user_config': user_config,
+
'entries': entries,
+
'user_links': user_links,
+
'generated_at': datetime.now().isoformat(),
+
}
+
+
content = template.render(**context)
+
+
user_file = users_dir / f"{user_metadata.username}.html"
+
with open(user_file, 'w', encoding='utf-8') as f:
+
f.write(content)
+
+
return True
+
except Exception:
+
return False
+
+
def _get_timeline_data(self, limit: Optional[int] = None) -> dict:
+
"""Get data for timeline page."""
+
index = self.git_store._load_index()
+
+
# Collect all entries with metadata
+
all_entries = []
+
for user_metadata in index.users.values():
+
user_entries = self.git_store.list_entries(user_metadata.username)
+
for entry in user_entries:
+
all_entries.append({
+
'username': user_metadata.username,
+
'display_name': user_metadata.display_name,
+
'entry': entry,
+
})
+
+
# Sort by date (newest first)
+
all_entries.sort(
+
key=lambda x: x['entry'].updated or x['entry'].published or datetime.min,
+
reverse=True
+
)
+
+
if limit:
+
all_entries = all_entries[:limit]
+
+
# Group by date for timeline display
+
timeline_groups = {}
+
for item in all_entries:
+
entry_date = item['entry'].updated or item['entry'].published
+
if entry_date:
+
date_key = entry_date.strftime('%Y-%m-%d')
+
if date_key not in timeline_groups:
+
timeline_groups[date_key] = []
+
timeline_groups[date_key].append(item)
+
+
return {
+
'title': 'Timeline',
+
'timeline_groups': timeline_groups,
+
'total_entries': len(all_entries),
+
'generated_at': datetime.now().isoformat(),
+
}
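
One sharp edge here: `generate_site` catches every exception and returns `False`, so a template typo is indistinguishable from a missing template directory. A small driver sketch (reusing the assumed `git_store`/`config` objects) that at least reports the failure:

```python
from pathlib import Path

generator = SiteGenerator(git_store, config)  # objects assumed, as above
if not generator.generate_site(Path("public")):
    # generate_site returns False for a missing template dir and for any
    # template error alike; logging inside its except blocks would help.
    raise SystemExit("site generation failed")
```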
+254
src/thicket/subsystems/users.py
···
+
"""User management subsystem."""
+
+
import shutil
+
from typing import Optional
+
+
from pydantic import EmailStr, HttpUrl, ValidationError
+
+
from ..core.git_store import GitStore
+
from ..models import ThicketConfig, UserConfig, UserMetadata
+
+
+
class UserManager:
+
"""Manages user operations and metadata."""
+
+
def __init__(self, git_store: GitStore, config: ThicketConfig):
+
"""Initialize user manager."""
+
self.git_store = git_store
+
self.config = config
+
+
def add_user(self, username: str, feeds: list[str], **kwargs) -> UserConfig:
+
"""Add a new user with feeds."""
+
# Validate feeds
+
validated_feeds = []
+
for feed in feeds:
+
try:
+
validated_feeds.append(HttpUrl(feed))
+
except ValidationError as e:
+
raise ValueError(f"Invalid feed URL '{feed}': {e}")
+
+
# Validate optional fields
+
email = None
+
if 'email' in kwargs and kwargs['email']:
+
try:
+
email = EmailStr(kwargs['email'])
+
except ValidationError as e:
+
raise ValueError(f"Invalid email '{kwargs['email']}': {e}")
+
+
homepage = None
+
if 'homepage' in kwargs and kwargs['homepage']:
+
try:
+
homepage = HttpUrl(kwargs['homepage'])
+
except ValidationError as e:
+
raise ValueError(f"Invalid homepage URL '{kwargs['homepage']}': {e}")
+
+
icon = None
+
if 'icon' in kwargs and kwargs['icon']:
+
try:
+
icon = HttpUrl(kwargs['icon'])
+
except ValidationError as e:
+
raise ValueError(f"Invalid icon URL '{kwargs['icon']}': {e}")
+
+
# Create user config
+
user_config = UserConfig(
+
username=username,
+
feeds=validated_feeds,
+
email=email,
+
homepage=homepage,
+
icon=icon,
+
display_name=kwargs.get('display_name')
+
)
+
+
# Add to git store
+
self.git_store.add_user(
+
username=username,
+
display_name=user_config.display_name,
+
email=str(user_config.email) if user_config.email else None,
+
homepage=str(user_config.homepage) if user_config.homepage else None,
+
icon=str(user_config.icon) if user_config.icon else None,
+
feeds=[str(feed) for feed in user_config.feeds]
+
)
+
+
# Add to config if not already present
+
existing_user = next((u for u in self.config.users if u.username == username), None)
+
if not existing_user:
+
self.config.users.append(user_config)
+
else:
+
# Update existing config
+
existing_user.feeds = user_config.feeds
+
existing_user.email = user_config.email
+
existing_user.homepage = user_config.homepage
+
existing_user.icon = user_config.icon
+
existing_user.display_name = user_config.display_name
+
+
return user_config
+
+
def get_user(self, username: str) -> Optional[UserConfig]:
+
"""Get user configuration."""
+
return next((u for u in self.config.users if u.username == username), None)
+
+
def get_user_metadata(self, username: str) -> Optional[UserMetadata]:
+
"""Get user metadata from git store."""
+
return self.git_store.get_user(username)
+
+
def list_users(self) -> list[UserConfig]:
+
"""List all configured users."""
+
return self.config.users.copy()
+
+
def list_users_with_metadata(self) -> list[tuple[UserConfig, Optional[UserMetadata]]]:
+
"""List users with their git store metadata."""
+
result = []
+
for user_config in self.config.users:
+
metadata = self.git_store.get_user(user_config.username)
+
result.append((user_config, metadata))
+
return result
+
+
def update_user(self, username: str, **kwargs) -> bool:
+
"""Update user configuration."""
+
# Update in config
+
user_config = self.get_user(username)
+
if not user_config:
+
return False
+
+
# Validate and update feeds if provided
+
if 'feeds' in kwargs:
+
validated_feeds = []
+
for feed in kwargs['feeds']:
+
try:
+
validated_feeds.append(HttpUrl(feed))
+
except ValidationError:
+
return False
+
user_config.feeds = validated_feeds
+
+
# Validate and update other fields
+
if 'email' in kwargs and kwargs['email']:
+
try:
+
user_config.email = EmailStr(kwargs['email'])
+
except ValidationError:
+
return False
+
elif 'email' in kwargs and not kwargs['email']:
+
user_config.email = None
+
+
if 'homepage' in kwargs and kwargs['homepage']:
+
try:
+
user_config.homepage = HttpUrl(kwargs['homepage'])
+
except ValidationError:
+
return False
+
elif 'homepage' in kwargs and not kwargs['homepage']:
+
user_config.homepage = None
+
+
if 'icon' in kwargs and kwargs['icon']:
+
try:
+
user_config.icon = HttpUrl(kwargs['icon'])
+
except ValidationError:
+
return False
+
elif 'icon' in kwargs and not kwargs['icon']:
+
user_config.icon = None
+
+
if 'display_name' in kwargs:
+
user_config.display_name = kwargs['display_name'] or None
+
+
# Update in git store
+
git_kwargs = {}
+
if 'feeds' in kwargs:
+
git_kwargs['feeds'] = [str(feed) for feed in user_config.feeds]
+
if user_config.email:
+
git_kwargs['email'] = str(user_config.email)
+
if user_config.homepage:
+
git_kwargs['homepage'] = str(user_config.homepage)
+
if user_config.icon:
+
git_kwargs['icon'] = str(user_config.icon)
+
if user_config.display_name:
+
git_kwargs['display_name'] = user_config.display_name
+
+
return self.git_store.update_user(username, **git_kwargs)
+
+
def remove_user(self, username: str) -> bool:
+
"""Remove a user and their data."""
+
# Remove from config
+
self.config.users = [u for u in self.config.users if u.username != username]
+
+
# Remove user directory from git store
+
user_metadata = self.git_store.get_user(username)
+
if user_metadata:
+
user_dir = self.git_store.repo_path / user_metadata.directory
+
if user_dir.exists():
+
try:
+
shutil.rmtree(user_dir)
+
except Exception:
+
return False
+
+
# Remove user from index
+
index = self.git_store._load_index()
+
if username in index.users:
+
del index.users[username]
+
self.git_store._save_index(index)
+
+
return True
+
+
def get_user_stats(self, username: str) -> Optional[dict]:
+
"""Get statistics for a specific user."""
+
user_metadata = self.git_store.get_user(username)
+
if not user_metadata:
+
return None
+
+
user_config = self.get_user(username)
+
entries = self.git_store.list_entries(username)
+
+
return {
+
'username': username,
+
'display_name': user_metadata.display_name,
+
'entry_count': user_metadata.entry_count,
+
'feeds_configured': len(user_config.feeds) if user_config else 0,
+
'directory': user_metadata.directory,
+
'created': user_metadata.created.isoformat() if user_metadata.created else None,
+
'last_updated': user_metadata.last_updated.isoformat() if user_metadata.last_updated else None,
+
'latest_entry': entries[0].updated.isoformat() if entries else None,
+
}
+
+
def validate_user_feeds(self, username: str) -> dict:
+
"""Validate all feeds for a user."""
+
user_config = self.get_user(username)
+
if not user_config:
+
return {'error': 'User not found'}
+
+
results = {
+
'username': username,
+
'total_feeds': len(user_config.feeds),
+
'valid_feeds': [],
+
'invalid_feeds': [],
+
}
+
+
for feed_url in user_config.feeds:
+
try:
+
# Basic URL validation - more comprehensive validation would require fetching
+
HttpUrl(str(feed_url))
+
results['valid_feeds'].append(str(feed_url))
+
except ValidationError as e:
+
results['invalid_feeds'].append({
+
'url': str(feed_url),
+
'error': str(e)
+
})
+
+
results['is_valid'] = len(results['invalid_feeds']) == 0
+
+
return results
+
+
def sync_config_with_git_store(self) -> bool:
+
"""Sync configuration users with git store."""
+
try:
+
for user_config in self.config.users:
+
git_user = self.git_store.get_user(user_config.username)
+
if not git_user:
+
# Add missing user to git store
+
self.git_store.add_user(
+
username=user_config.username,
+
display_name=user_config.display_name,
+
email=str(user_config.email) if user_config.email else None,
+
homepage=str(user_config.homepage) if user_config.homepage else None,
+
icon=str(user_config.icon) if user_config.icon else None,
+
feeds=[str(feed) for feed in user_config.feeds]
+
)
+
return True
+
except Exception:
+
return False
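
One caveat worth flagging on the validation above: under Pydantic v2 (implied by the `pydantic_settings` import in config.py), `EmailStr` is an `Annotated` alias rather than a constructible class, so `EmailStr(kwargs['email'])` raises `TypeError`, which the `except ValidationError` clauses will not catch; on several v2 releases the same applies to `HttpUrl(...)`. A hedged sketch of the `TypeAdapter` pattern that avoids this:

```python
# Sketch: field validation that works on Pydantic v2, where EmailStr (and on
# some releases HttpUrl) cannot be instantiated directly. Assumes pydantic>=2;
# EmailStr validation also requires the optional email-validator package.
from pydantic import EmailStr, HttpUrl, TypeAdapter, ValidationError

_EMAIL = TypeAdapter(EmailStr)
_URL = TypeAdapter(HttpUrl)

def validate_email(value: str) -> str:
    try:
        return _EMAIL.validate_python(value)
    except ValidationError as e:
        raise ValueError(f"Invalid email '{value}': {e}") from e

def validate_feed_url(value: str) -> HttpUrl:
    try:
        return _URL.validate_python(value)
    except ValidationError as e:
        raise ValueError(f"Invalid URL '{value}': {e}") from e
```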
+31
src/thicket/templates/base.html
···
+
<!DOCTYPE html>
+
<html lang="en">
+
<head>
+
<meta charset="UTF-8">
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
+
<title>{% block page_title %}{{ title }}{% endblock %}</title>
+
<link rel="stylesheet" href="css/style.css">
+
</head>
+
<body>
+
<header class="site-header">
+
<div class="header-content">
+
<h1 class="site-title">{{ title }}</h1>
+
<nav class="site-nav">
+
<a href="timeline.html" class="nav-link {% if page == 'timeline' %}active{% endif %}">Timeline</a>
+
<a href="links.html" class="nav-link {% if page == 'links' %}active{% endif %}">Links</a>
+
<a href="users.html" class="nav-link {% if page == 'users' %}active{% endif %}">Users</a>
+
</nav>
+
</div>
+
</header>
+
+
<main class="main-content">
+
{% block content %}{% endblock %}
+
</main>
+
+
<footer class="site-footer">
+
<p>Generated on {{ generated_at }} by <a href="https://github.com/avsm/thicket">Thicket</a></p>
+
</footer>
+
+
<script src="js/script.js"></script>
+
</body>
+
</html>
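
A hedged note on wiring: this template loads `css/style.css` and `js/script.js`, but `_copy_static_assets` in site.py above copies both files into the output root, so the generated pages would request asset paths that do not exist. A sketch of a copy step matching the template's paths (assuming the template, not the copier, reflects the intent):

```python
import shutil
from pathlib import Path

def copy_static_assets(template_dir: Path, output_dir: Path) -> None:
    # Mirror base.html's asset paths: css/style.css and js/script.js.
    for subdir, filename in (("css", "style.css"), ("js", "script.js")):
        src = template_dir / filename
        if src.exists():
            dst_dir = output_dir / subdir
            dst_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst_dir / filename)
```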
+13
src/thicket/templates/index.html
···
+
<!DOCTYPE html>
+
<html lang="en">
+
<head>
+
<meta charset="UTF-8">
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
+
<title>{{ title }}</title>
+
<meta http-equiv="refresh" content="0; url=timeline.html">
+
<link rel="canonical" href="timeline.html">
+
</head>
+
<body>
+
<p>Redirecting to <a href="timeline.html">Timeline</a>...</p>
+
</body>
+
</html>
+38
src/thicket/templates/links.html
···
+
{% extends "base.html" %}
+
+
{% block page_title %}Outgoing Links - {{ title }}{% endblock %}
+
+
{% block content %}
+
<div class="page-content">
+
<h2>Outgoing Links</h2>
+
<p class="page-description">External links referenced in blog posts, ordered by most recent reference.</p>
+
+
{% for link in outgoing_links %}
+
<article class="link-group">
+
<h3 class="link-url">
+
<a href="{{ link.url }}" target="_blank">{{ link.url|truncate(80) }}</a>
+
{% if link.target_username %}
+
<span class="target-user">({{ link.target_username }})</span>
+
{% endif %}
+
</h3>
+
<div class="referencing-entries">
+
<span class="ref-count">Referenced in {{ link.entries|length }} post(s):</span>
+
<ul>
+
{% for display_name, entry in link.entries[:5] %}
+
<li>
+
<span class="author">{{ display_name }}</span> -
+
<a href="{{ entry.link }}" target="_blank">{{ entry.title }}</a>
+
<time datetime="{{ entry.updated or entry.published }}">
+
({{ (entry.updated or entry.published).strftime('%Y-%m-%d') }})
+
</time>
+
</li>
+
{% endfor %}
+
{% if link.entries|length > 5 %}
+
<li class="more">... and {{ link.entries|length - 5 }} more</li>
+
{% endif %}
+
</ul>
+
</div>
+
</article>
+
{% endfor %}
+
</div>
+
{% endblock %}
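
Note the context mismatch: this template iterates `outgoing_links`, while `_generate_links_page` in site.py renders it with `most_referenced`/`url_mapping`/`link_stats`, so that code path would produce an empty page; presumably the `generate` command elsewhere in this diff supplies the richer shape. Read off the template, that shape is roughly:

```python
# Hypothetical context for links.html, inferred from the template above;
# none of these field names are confirmed elsewhere in this diff.
context = {
    "title": "Thicket",
    "outgoing_links": [
        {
            "url": "https://example.org/some-post",
            "target_username": "avsm",  # optional; shown after the URL
            "entries": [],              # (display_name, AtomEntry) pairs, newest first
        },
    ],
}
```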
+88
src/thicket/templates/script.js
···
+
// Enhanced functionality for thicket website
+
document.addEventListener('DOMContentLoaded', function() {
+
+
// Enhance thread collapsing (optional feature)
+
const threadHeaders = document.querySelectorAll('.thread-header');
+
threadHeaders.forEach(header => {
+
header.style.cursor = 'pointer';
+
header.addEventListener('click', function() {
+
const thread = this.parentElement;
+
const entries = thread.querySelectorAll('.thread-entry');
+
+
// Toggle visibility of all but the first entry
+
for (let i = 1; i < entries.length; i++) {
+
entries[i].style.display = entries[i].style.display === 'none' ? 'block' : 'none';
+
}
+
+
// Update thread count text
+
const count = this.querySelector('.thread-count');
+
if (entries[1] && entries[1].style.display === 'none') {
+
count.textContent = count.textContent.replace('posts', 'posts (collapsed)');
+
} else {
+
count.textContent = count.textContent.replace(' (collapsed)', '');
+
}
+
});
+
});
+
+
// Add relative time display
+
const timeElements = document.querySelectorAll('time');
+
timeElements.forEach(timeEl => {
+
const datetime = new Date(timeEl.getAttribute('datetime'));
+
const now = new Date();
+
const diffMs = now - datetime;
+
const diffDays = Math.floor(diffMs / (1000 * 60 * 60 * 24));
+
+
let relativeTime;
+
if (diffDays === 0) {
+
const diffHours = Math.floor(diffMs / (1000 * 60 * 60));
+
if (diffHours === 0) {
+
const diffMinutes = Math.floor(diffMs / (1000 * 60));
+
relativeTime = diffMinutes === 0 ? 'just now' : `${diffMinutes}m ago`;
+
} else {
+
relativeTime = `${diffHours}h ago`;
+
}
+
} else if (diffDays === 1) {
+
relativeTime = 'yesterday';
+
} else if (diffDays < 7) {
+
relativeTime = `${diffDays}d ago`;
+
} else if (diffDays < 30) {
+
const weeks = Math.floor(diffDays / 7);
+
relativeTime = weeks === 1 ? '1w ago' : `${weeks}w ago`;
+
} else if (diffDays < 365) {
+
const months = Math.floor(diffDays / 30);
+
relativeTime = months === 1 ? '1mo ago' : `${months}mo ago`;
+
} else {
+
const years = Math.floor(diffDays / 365);
+
relativeTime = years === 1 ? '1y ago' : `${years}y ago`;
+
}
+
+
// Add relative time as title attribute
+
timeEl.setAttribute('title', timeEl.textContent);
+
timeEl.textContent = relativeTime;
+
});
+
+
// Enhanced anchor link scrolling for shared references
+
document.querySelectorAll('a[href^="#"]').forEach(anchor => {
+
anchor.addEventListener('click', function (e) {
+
e.preventDefault();
+
const target = document.querySelector(this.getAttribute('href'));
+
if (target) {
+
target.scrollIntoView({
+
behavior: 'smooth',
+
block: 'center'
+
});
+
+
// Highlight the target briefly
+
const timelineEntry = target.closest('.timeline-entry');
+
if (timelineEntry) {
+
timelineEntry.style.outline = '2px solid var(--primary-color)';
+
timelineEntry.style.borderRadius = '8px';
+
setTimeout(() => {
+
timelineEntry.style.outline = '';
+
timelineEntry.style.borderRadius = '';
+
}, 2000);
+
}
+
}
+
});
+
});
+
});
+754
src/thicket/templates/style.css
···
+
/* Modern, clean design with high-density text and readable theme */
+
+
:root {
+
--primary-color: #2c3e50;
+
--secondary-color: #3498db;
+
--accent-color: #e74c3c;
+
--background: #ffffff;
+
--surface: #f8f9fa;
+
--text-primary: #2c3e50;
+
--text-secondary: #7f8c8d;
+
--border-color: #e0e0e0;
+
--thread-indent: 20px;
+
--max-width: 1200px;
+
}
+
+
* {
+
margin: 0;
+
padding: 0;
+
box-sizing: border-box;
+
}
+
+
body {
+
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', 'Roboto', 'Helvetica Neue', Arial, sans-serif;
+
font-size: 14px;
+
line-height: 1.6;
+
color: var(--text-primary);
+
background-color: var(--background);
+
}
+
+
/* Header */
+
.site-header {
+
background-color: var(--surface);
+
border-bottom: 1px solid var(--border-color);
+
padding: 0.75rem 0;
+
position: sticky;
+
top: 0;
+
z-index: 100;
+
}
+
+
.header-content {
+
max-width: var(--max-width);
+
margin: 0 auto;
+
padding: 0 2rem;
+
display: flex;
+
justify-content: space-between;
+
align-items: center;
+
}
+
+
.site-title {
+
font-size: 1.5rem;
+
font-weight: 600;
+
color: var(--primary-color);
+
margin: 0;
+
}
+
+
/* Navigation */
+
.site-nav {
+
display: flex;
+
gap: 1.5rem;
+
}
+
+
.nav-link {
+
text-decoration: none;
+
color: var(--text-secondary);
+
font-weight: 500;
+
font-size: 0.95rem;
+
padding: 0.5rem 0.75rem;
+
border-radius: 4px;
+
transition: all 0.2s ease;
+
}
+
+
.nav-link:hover {
+
color: var(--primary-color);
+
background-color: var(--background);
+
}
+
+
.nav-link.active {
+
color: var(--secondary-color);
+
background-color: var(--background);
+
font-weight: 600;
+
}
+
+
/* Main Content */
+
.main-content {
+
max-width: var(--max-width);
+
margin: 2rem auto;
+
padding: 0 2rem;
+
}
+
+
.page-content {
+
margin: 0;
+
}
+
+
.page-description {
+
color: var(--text-secondary);
+
margin-bottom: 1.5rem;
+
font-style: italic;
+
}
+
+
/* Sections */
+
section {
+
margin-bottom: 2rem;
+
}
+
+
h2 {
+
font-size: 1.3rem;
+
font-weight: 600;
+
margin-bottom: 0.75rem;
+
color: var(--primary-color);
+
}
+
+
h3 {
+
font-size: 1.1rem;
+
font-weight: 600;
+
margin-bottom: 0.75rem;
+
color: var(--primary-color);
+
}
+
+
/* Entries and Threads */
+
article {
+
margin-bottom: 1.5rem;
+
padding: 1rem;
+
background-color: var(--surface);
+
border-radius: 4px;
+
border: 1px solid var(--border-color);
+
}
+
+
/* Timeline-style entries */
+
.timeline-entry {
+
margin-bottom: 0.5rem;
+
padding: 0.5rem 0.75rem;
+
border: none;
+
background: transparent;
+
transition: background-color 0.2s ease;
+
}
+
+
.timeline-entry:hover {
+
background-color: var(--surface);
+
}
+
+
.timeline-meta {
+
display: inline-flex;
+
gap: 0.5rem;
+
align-items: center;
+
font-size: 0.75rem;
+
color: var(--text-secondary);
+
margin-bottom: 0.25rem;
+
}
+
+
.timeline-time {
+
font-family: 'SF Mono', Monaco, Consolas, 'Courier New', monospace;
+
font-size: 0.75rem;
+
color: var(--text-secondary);
+
}
+
+
.timeline-author {
+
font-weight: 600;
+
color: var(--primary-color);
+
font-size: 0.8rem;
+
text-decoration: none;
+
}
+
+
.timeline-author:hover {
+
color: var(--secondary-color);
+
text-decoration: underline;
+
}
+
+
.timeline-content {
+
line-height: 1.4;
+
}
+
+
.timeline-title {
+
font-size: 0.95rem;
+
font-weight: 600;
+
}
+
+
.timeline-title a {
+
color: var(--primary-color);
+
text-decoration: none;
+
}
+
+
.timeline-title a:hover {
+
color: var(--secondary-color);
+
text-decoration: underline;
+
}
+
+
.timeline-summary {
+
color: var(--text-secondary);
+
font-size: 0.9rem;
+
line-height: 1.4;
+
}
+
+
/* Legacy styles for other sections */
+
.entry-meta, .thread-header {
+
display: flex;
+
gap: 1rem;
+
align-items: center;
+
margin-bottom: 0.5rem;
+
font-size: 0.85rem;
+
color: var(--text-secondary);
+
}
+
+
.author {
+
font-weight: 600;
+
color: var(--primary-color);
+
}
+
+
time {
+
font-size: 0.85rem;
+
}
+
+
h4 {
+
font-size: 1.1rem;
+
font-weight: 600;
+
margin-bottom: 0.5rem;
+
}
+
+
h4 a {
+
color: var(--primary-color);
+
text-decoration: none;
+
}
+
+
h4 a:hover {
+
color: var(--secondary-color);
+
text-decoration: underline;
+
}
+
+
.entry-summary {
+
color: var(--text-primary);
+
line-height: 1.5;
+
margin-top: 0.5rem;
+
}
+
+
/* Enhanced Threading Styles */
+
+
/* Conversation Clusters */
+
.conversation-cluster {
+
background-color: var(--background);
+
border: 2px solid var(--border-color);
+
border-radius: 8px;
+
margin-bottom: 2rem;
+
overflow: hidden;
+
box-shadow: 0 2px 4px rgba(0, 0, 0, 0.05);
+
}
+
+
.conversation-header {
+
background: linear-gradient(135deg, var(--surface) 0%, #f1f3f4 100%);
+
padding: 0.75rem 1rem;
+
border-bottom: 1px solid var(--border-color);
+
}
+
+
.conversation-meta {
+
display: flex;
+
justify-content: space-between;
+
align-items: center;
+
flex-wrap: wrap;
+
gap: 0.5rem;
+
}
+
+
.conversation-count {
+
font-weight: 600;
+
color: var(--secondary-color);
+
font-size: 0.9rem;
+
}
+
+
.conversation-participants {
+
font-size: 0.8rem;
+
color: var(--text-secondary);
+
flex: 1;
+
text-align: right;
+
}
+
+
.conversation-flow {
+
padding: 0.5rem;
+
}
+
+
/* Threaded Conversation Entries */
+
.conversation-entry {
+
position: relative;
+
margin-bottom: 0.75rem;
+
display: flex;
+
align-items: flex-start;
+
}
+
+
.conversation-entry.level-0 {
+
margin-left: 0;
+
}
+
+
.conversation-entry.level-1 {
+
margin-left: 1.5rem;
+
}
+
+
.conversation-entry.level-2 {
+
margin-left: 3rem;
+
}
+
+
.conversation-entry.level-3 {
+
margin-left: 4.5rem;
+
}
+
+
.conversation-entry.level-4 {
+
margin-left: 6rem;
+
}
+
+
.entry-connector {
+
width: 3px;
+
background-color: var(--secondary-color);
+
margin-right: 0.75rem;
+
margin-top: 0.25rem;
+
min-height: 2rem;
+
border-radius: 2px;
+
opacity: 0.6;
+
}
+
+
.conversation-entry.level-0 .entry-connector {
+
background-color: var(--accent-color);
+
opacity: 0.8;
+
}
+
+
.entry-content {
+
flex: 1;
+
background-color: var(--surface);
+
padding: 0.75rem;
+
border-radius: 6px;
+
border: 1px solid var(--border-color);
+
transition: all 0.2s ease;
+
}
+
+
.entry-content:hover {
+
border-color: var(--secondary-color);
+
box-shadow: 0 2px 8px rgba(52, 152, 219, 0.1);
+
}
+
+
/* Reference Indicators */
+
.reference-indicators {
+
display: inline-flex;
+
gap: 0.25rem;
+
margin-left: 0.5rem;
+
}
+
+
.ref-out, .ref-in {
+
display: inline-block;
+
width: 1rem;
+
height: 1rem;
+
border-radius: 50%;
+
text-align: center;
+
line-height: 1rem;
+
font-size: 0.7rem;
+
font-weight: bold;
+
}
+
+
.ref-out {
+
background-color: #e8f5e8;
+
color: #2d8f2d;
+
}
+
+
.ref-in {
+
background-color: #e8f0ff;
+
color: #1f5fbf;
+
}
+
+
/* Reference Badges for Individual Posts */
+
.timeline-entry.with-references {
+
background-color: var(--surface);
+
}
+
+
/* Conversation posts in unified timeline */
+
.timeline-entry.conversation-post {
+
background: transparent;
+
border: none;
+
margin-bottom: 0.5rem;
+
padding: 0.5rem 0.75rem;
+
}
+
+
.timeline-entry.conversation-post.level-0 {
+
margin-left: 0;
+
border-left: 2px solid var(--accent-color);
+
padding-left: 0.75rem;
+
}
+
+
.timeline-entry.conversation-post.level-1 {
+
margin-left: 1.5rem;
+
border-left: 2px solid var(--secondary-color);
+
padding-left: 0.75rem;
+
}
+
+
.timeline-entry.conversation-post.level-2 {
+
margin-left: 3rem;
+
border-left: 2px solid var(--text-secondary);
+
padding-left: 0.75rem;
+
}
+
+
.timeline-entry.conversation-post.level-3 {
+
margin-left: 4.5rem;
+
border-left: 2px solid var(--text-secondary);
+
padding-left: 0.75rem;
+
}
+
+
.timeline-entry.conversation-post.level-4 {
+
margin-left: 6rem;
+
border-left: 2px solid var(--text-secondary);
+
padding-left: 0.75rem;
+
}
+
+
/* Cross-thread linking */
+
.cross-thread-links {
+
margin-top: 0.5rem;
+
padding-top: 0.5rem;
+
border-top: 1px solid var(--border-color);
+
}
+
+
.cross-thread-indicator {
+
font-size: 0.75rem;
+
color: var(--text-secondary);
+
background-color: var(--surface);
+
padding: 0.25rem 0.5rem;
+
border-radius: 12px;
+
border: 1px solid var(--border-color);
+
display: inline-block;
+
}
+
+
/* Inline shared references styling */
+
.inline-shared-refs {
+
margin-left: 0.5rem;
+
font-size: 0.85rem;
+
color: var(--text-secondary);
+
}
+
+
.shared-ref-link {
+
color: var(--primary-color);
+
text-decoration: none;
+
font-weight: 500;
+
transition: color 0.2s ease;
+
}
+
+
.shared-ref-link:hover {
+
color: var(--secondary-color);
+
text-decoration: underline;
+
}
+
+
.shared-ref-more {
+
font-style: italic;
+
color: var(--text-secondary);
+
font-size: 0.8rem;
+
margin-left: 0.25rem;
+
}
+
+
.user-anchor, .post-anchor {
+
position: absolute;
+
margin-top: -60px; /* Offset for fixed header */
+
pointer-events: none;
+
}
+
+
.cross-thread-link {
+
color: var(--primary-color);
+
text-decoration: none;
+
font-weight: 500;
+
transition: color 0.2s ease;
+
}
+
+
.cross-thread-link:hover {
+
color: var(--secondary-color);
+
text-decoration: underline;
+
}
+
+
.reference-badges {
+
display: flex;
+
gap: 0.25rem;
+
margin-left: 0.5rem;
+
flex-wrap: wrap;
+
}
+
+
.ref-badge {
+
display: inline-block;
+
padding: 0.1rem 0.4rem;
+
border-radius: 12px;
+
font-size: 0.7rem;
+
font-weight: 600;
+
text-transform: uppercase;
+
letter-spacing: 0.05em;
+
}
+
+
.ref-badge.ref-outbound {
+
background-color: #e8f5e8;
+
color: #2d8f2d;
+
border: 1px solid #c3e6c3;
+
}
+
+
.ref-badge.ref-inbound {
+
background-color: #e8f0ff;
+
color: #1f5fbf;
+
border: 1px solid #b3d9ff;
+
}
+
+
/* Author Color Coding */
+
.timeline-author {
+
position: relative;
+
}
+
+
.timeline-author::before {
+
content: '';
+
display: inline-block;
+
width: 8px;
+
height: 8px;
+
border-radius: 50%;
+
margin-right: 0.5rem;
+
background-color: var(--secondary-color);
+
}
+
+
/* Generate consistent colors for authors */
+
.author-avsm::before { background-color: #e74c3c; }
+
.author-mort::before { background-color: #3498db; }
+
.author-mte::before { background-color: #2ecc71; }
+
.author-ryan::before { background-color: #f39c12; }
+
.author-mwd::before { background-color: #9b59b6; }
+
.author-dra::before { background-color: #1abc9c; }
+
.author-pf341::before { background-color: #34495e; }
+
.author-sadiqj::before { background-color: #e67e22; }
+
.author-martinkl::before { background-color: #8e44ad; }
+
.author-jonsterling::before { background-color: #27ae60; }
+
.author-jon::before { background-color: #f1c40f; }
+
.author-onkar::before { background-color: #e91e63; }
+
.author-gabriel::before { background-color: #00bcd4; }
+
.author-jess::before { background-color: #ff5722; }
+
.author-ibrahim::before { background-color: #607d8b; }
+
.author-andres::before { background-color: #795548; }
+
.author-eeg::before { background-color: #ff9800; }
+
+
/* Section Headers */
+
.conversations-section h3,
+
.referenced-posts-section h3,
+
.individual-posts-section h3 {
+
border-bottom: 2px solid var(--border-color);
+
padding-bottom: 0.5rem;
+
margin-bottom: 1.5rem;
+
position: relative;
+
}
+
+
.conversations-section h3::before {
+
content: "💬";
+
margin-right: 0.5rem;
+
}
+
+
.referenced-posts-section h3::before {
+
content: "🔗";
+
margin-right: 0.5rem;
+
}
+
+
.individual-posts-section h3::before {
+
content: "📝";
+
margin-right: 0.5rem;
+
}
+
+
/* Legacy thread styles (for backward compatibility) */
+
.thread {
+
background-color: var(--background);
+
border: 1px solid var(--border-color);
+
padding: 0;
+
overflow: hidden;
+
margin-bottom: 1rem;
+
}
+
+
.thread-header {
+
background-color: var(--surface);
+
padding: 0.5rem 0.75rem;
+
border-bottom: 1px solid var(--border-color);
+
}
+
+
.thread-count {
+
font-weight: 600;
+
color: var(--secondary-color);
+
}
+
+
.thread-entry {
+
padding: 0.5rem 0.75rem;
+
border-bottom: 1px solid var(--border-color);
+
}
+
+
.thread-entry:last-child {
+
border-bottom: none;
+
}
+
+
.thread-entry.reply {
+
margin-left: var(--thread-indent);
+
border-left: 3px solid var(--secondary-color);
+
background-color: var(--surface);
+
}
+
+
/* Links Section */
+
.link-group {
+
background-color: var(--background);
+
}
+
+
.link-url {
+
font-size: 1rem;
+
word-break: break-word;
+
}
+
+
.link-url a {
+
color: var(--secondary-color);
+
text-decoration: none;
+
}
+
+
.link-url a:hover {
+
text-decoration: underline;
+
}
+
+
.target-user {
+
font-size: 0.9rem;
+
color: var(--text-secondary);
+
font-weight: normal;
+
}
+
+
.referencing-entries {
+
margin-top: 0.75rem;
+
}
+
+
.ref-count {
+
font-weight: 600;
+
color: var(--text-secondary);
+
font-size: 0.9rem;
+
}
+
+
.referencing-entries ul {
+
list-style: none;
+
margin-top: 0.5rem;
+
padding-left: 1rem;
+
}
+
+
.referencing-entries li {
+
margin-bottom: 0.25rem;
+
font-size: 0.9rem;
+
}
+
+
.referencing-entries .more {
+
font-style: italic;
+
color: var(--text-secondary);
+
}
+
+
/* Users Section */
+
.user-card {
+
background-color: var(--background);
+
}
+
+
.user-header {
+
display: flex;
+
gap: 1rem;
+
align-items: start;
+
margin-bottom: 1rem;
+
}
+
+
.user-icon {
+
width: 48px;
+
height: 48px;
+
border-radius: 50%;
+
object-fit: cover;
+
}
+
+
.user-info h3 {
+
margin-bottom: 0.25rem;
+
}
+
+
.username {
+
font-size: 0.9rem;
+
color: var(--text-secondary);
+
font-weight: normal;
+
}
+
+
.user-meta {
+
font-size: 0.9rem;
+
color: var(--text-secondary);
+
}
+
+
.user-meta a {
+
color: var(--secondary-color);
+
text-decoration: none;
+
}
+
+
.user-meta a:hover {
+
text-decoration: underline;
+
}
+
+
.separator {
+
margin: 0 0.5rem;
+
}
+
+
.post-count {
+
font-weight: 600;
+
}
+
+
.user-recent h4 {
+
font-size: 0.95rem;
+
margin-bottom: 0.5rem;
+
color: var(--text-secondary);
+
}
+
+
.user-recent ul {
+
list-style: none;
+
padding-left: 0;
+
}
+
+
.user-recent li {
+
margin-bottom: 0.25rem;
+
font-size: 0.9rem;
+
}
+
+
/* Footer */
+
.site-footer {
+
max-width: var(--max-width);
+
margin: 3rem auto 2rem;
+
padding: 1rem 2rem;
+
text-align: center;
+
color: var(--text-secondary);
+
font-size: 0.85rem;
+
border-top: 1px solid var(--border-color);
+
}
+
+
.site-footer a {
+
color: var(--secondary-color);
+
text-decoration: none;
+
}
+
+
.site-footer a:hover {
+
text-decoration: underline;
+
}
+
+
/* Responsive */
+
@media (max-width: 768px) {
+
.site-title {
+
font-size: 1.3rem;
+
}
+
+
.header-content {
+
flex-direction: column;
+
gap: 0.75rem;
+
align-items: flex-start;
+
}
+
+
.site-nav {
+
gap: 1rem;
+
}
+
+
.main-content {
+
padding: 0 1rem;
+
}
+
+
.thread-entry.reply {
+
margin-left: calc(var(--thread-indent) / 2);
+
}
+
+
.user-header {
+
flex-direction: column;
+
}
+
}
+141
src/thicket/templates/timeline.html
···
+
{% extends "base.html" %}
+
+
{% block page_title %}Timeline - {{ title }}{% endblock %}
+
+
{% block content %}
+
{% set seen_users = [] %}
+
<div class="page-content">
+
<h2>Recent Posts & Conversations</h2>
+
+
<section class="unified-timeline">
+
{% for item in timeline_items %}
+
{% if item.type == "post" %}
+
<!-- Individual Post -->
+
<article class="timeline-entry {% if item.content.references %}with-references{% endif %}">
+
<div class="timeline-meta">
+
<time datetime="{{ item.content.entry.updated or item.content.entry.published }}" class="timeline-time">
+
{{ (item.content.entry.updated or item.content.entry.published).strftime('%Y-%m-%d %H:%M') }}
+
</time>
+
{% set homepage = get_user_homepage(item.content.username) %}
+
{% if item.content.username not in seen_users %}
+
<a id="{{ item.content.username }}" class="user-anchor"></a>
+
{% set _ = seen_users.append(item.content.username) %}
+
{% endif %}
+
<a id="post-{{ loop.index0 }}-{{ safe_anchor_id(item.content.entry.id) }}" class="post-anchor"></a>
+
{% if homepage %}
+
<a href="{{ homepage }}" target="_blank" class="timeline-author">{{ item.content.display_name }}</a>
+
{% else %}
+
<span class="timeline-author">{{ item.content.display_name }}</span>
+
{% endif %}
+
{% if item.content.references %}
+
<div class="reference-badges">
+
{% for ref in item.content.references %}
+
{% if ref.type == 'outbound' %}
+
<span class="ref-badge ref-outbound" title="References {{ ref.target_username or 'external post' }}">
+
→ {{ ref.target_username or 'ext' }}
+
</span>
+
{% elif ref.type == 'inbound' %}
+
<span class="ref-badge ref-inbound" title="Referenced by {{ ref.source_username or 'external post' }}">
+
โ† {{ ref.source_username or 'ext' }}
+
</span>
+
{% endif %}
+
{% endfor %}
+
</div>
+
{% endif %}
+
</div>
+
<div class="timeline-content">
+
<strong class="timeline-title">
+
<a href="{{ item.content.entry.link }}" target="_blank">{{ item.content.entry.title }}</a>
+
</strong>
+
{% if item.content.entry.summary %}
+
<span class="timeline-summary">โ€” {{ clean_html_summary(item.content.entry.summary, 250) }}</span>
+
{% endif %}
+
{% if item.content.shared_references %}
+
<span class="inline-shared-refs">
+
{% for ref in item.content.shared_references[:3] %}
+
{% if ref.target_username %}
+
<a href="#{{ ref.target_username }}" class="shared-ref-link" title="Referenced by {{ ref.count }} entries">@{{ ref.target_username }}</a>{% if not loop.last %}, {% endif %}
+
{% endif %}
+
{% endfor %}
+
{% if item.content.shared_references|length > 3 %}
+
<span class="shared-ref-more">+{{ item.content.shared_references|length - 3 }} more</span>
+
{% endif %}
+
</span>
+
{% endif %}
+
{% if item.content.cross_thread_links %}
+
<div class="cross-thread-links">
+
<span class="cross-thread-indicator">๐Ÿ”— Also appears: </span>
+
{% for link in item.content.cross_thread_links %}
+
<a href="#{{ link.anchor_id }}" class="cross-thread-link" title="{{ link.title }}">{{ link.context }}</a>{% if not loop.last %}, {% endif %}
+
{% endfor %}
+
</div>
+
{% endif %}
+
</div>
+
</article>
+
+
{% elif item.type == "thread" %}
+
<!-- Conversation Thread -->
+
{% set outer_loop_index = loop.index0 %}
+
{% for thread_item in item.content %}
+
<article class="timeline-entry conversation-post level-{{ thread_item.thread_level }}">
+
<div class="timeline-meta">
+
<time datetime="{{ thread_item.entry.updated or thread_item.entry.published }}" class="timeline-time">
+
{{ (thread_item.entry.updated or thread_item.entry.published).strftime('%Y-%m-%d %H:%M') }}
+
</time>
+
{% set homepage = get_user_homepage(thread_item.username) %}
+
{% if thread_item.username not in seen_users %}
+
<a id="{{ thread_item.username }}" class="user-anchor"></a>
+
{% set _ = seen_users.append(thread_item.username) %}
+
{% endif %}
+
<a id="post-{{ outer_loop_index }}-{{ loop.index0 }}-{{ safe_anchor_id(thread_item.entry.id) }}" class="post-anchor"></a>
+
{% if homepage %}
+
<a href="{{ homepage }}" target="_blank" class="timeline-author author-{{ thread_item.username }}">{{ thread_item.display_name }}</a>
+
{% else %}
+
<span class="timeline-author author-{{ thread_item.username }}">{{ thread_item.display_name }}</span>
+
{% endif %}
+
{% if thread_item.references_to or thread_item.referenced_by %}
+
<span class="reference-indicators">
+
{% if thread_item.references_to %}
+
<span class="ref-out" title="References other posts">โ†’</span>
+
{% endif %}
+
{% if thread_item.referenced_by %}
+
<span class="ref-in" title="Referenced by other posts">โ†</span>
+
{% endif %}
+
</span>
+
{% endif %}
+
</div>
+
<div class="timeline-content">
+
<strong class="timeline-title">
+
<a href="{{ thread_item.entry.link }}" target="_blank">{{ thread_item.entry.title }}</a>
+
</strong>
+
{% if thread_item.entry.summary %}
+
<span class="timeline-summary">โ€” {{ clean_html_summary(thread_item.entry.summary, 300) }}</span>
+
{% endif %}
+
{% if thread_item.shared_references %}
+
<span class="inline-shared-refs">
+
{% for ref in thread_item.shared_references[:3] %}
+
{% if ref.target_username %}
+
<a href="#{{ ref.target_username }}" class="shared-ref-link" title="Referenced by {{ ref.count }} entries">@{{ ref.target_username }}</a>{% if not loop.last %}, {% endif %}
+
{% endif %}
+
{% endfor %}
+
{% if thread_item.shared_references|length > 3 %}
+
<span class="shared-ref-more">+{{ thread_item.shared_references|length - 3 }} more</span>
+
{% endif %}
+
</span>
+
{% endif %}
+
{% if thread_item.cross_thread_links %}
+
<div class="cross-thread-links">
+
<span class="cross-thread-indicator">๐Ÿ”— Also appears: </span>
+
{% for link in thread_item.cross_thread_links %}
+
<a href="#{{ link.anchor_id }}" class="cross-thread-link" title="{{ link.title }}">{{ link.context }}</a>{% if not loop.last %}, {% endif %}
+
{% endfor %}
+
</div>
+
{% endif %}
+
</div>
+
</article>
+
{% endfor %}
+
{% endif %}
+
{% endfor %}
+
</section>
+
</div>
+
{% endblock %}
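
The template above leans on a `safe_anchor_id` helper registered with the Jinja environment; its implementation is not part of this diff. A minimal sketch of what such a helper could look like, purely an assumption modeled on the `sanitize_entry_id` behavior covered in `tests/test_feed_parser.py` further down, not the project's actual code:

```python
import re

def safe_anchor_id(entry_id: str) -> str:
    """Hypothetical helper: reduce an entry ID to an HTML-id-safe slug."""
    # Keep letters, digits, hyphens, and underscores; collapse runs of
    # anything else (slashes, colons, etc.) into single underscores.
    slug = re.sub(r"[^A-Za-z0-9_-]+", "_", entry_id)
    # Trim stray underscores and cap the length, mirroring the 200-character
    # cap that FeedParser.sanitize_entry_id enforces in the test suite.
    return slug.strip("_")[:200] or "entry"
```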
+169
src/thicket/templates/user_detail.html
···
+
{% extends "base.html" %}
+
+
{% block title %}{{ title }} - Thicket{% endblock %}
+
+
{% block content %}
+
<div class="container mx-auto px-4 py-8">
+
<div class="max-w-4xl mx-auto">
+
<!-- User Header -->
+
<div class="bg-white rounded-lg shadow-md p-6 mb-6">
+
<div class="flex items-center space-x-4">
+
{% if user_config and user_config.icon %}
+
<img src="{{ user_config.icon }}" alt="{{ title }}" class="w-16 h-16 rounded-full">
+
{% else %}
+
<div class="w-16 h-16 rounded-full bg-blue-500 flex items-center justify-center text-white text-xl font-bold">
+
{{ user_metadata.username[0].upper() }}
+
</div>
+
{% endif %}
+
+
<div>
+
<h1 class="text-2xl font-bold text-gray-900">{{ title }}</h1>
+
<p class="text-gray-600">@{{ user_metadata.username }}</p>
+
{% if user_config and user_config.email %}
+
<p class="text-sm text-gray-500">{{ user_config.email }}</p>
+
{% endif %}
+
</div>
+
</div>
+
+
{% if user_config and user_config.homepage %}
+
<div class="mt-4">
+
<a href="{{ user_config.homepage }}" class="text-blue-600 hover:text-blue-800" target="_blank">
+
🏠 Homepage
+
</a>
+
</div>
+
{% endif %}
+
+
<div class="mt-4 grid grid-cols-2 md:grid-cols-4 gap-4">
+
<div class="text-center">
+
<div class="text-2xl font-bold text-blue-600">{{ user_metadata.entry_count }}</div>
+
<div class="text-sm text-gray-500">Entries</div>
+
</div>
+
+
{% if user_config %}
+
<div class="text-center">
+
<div class="text-2xl font-bold text-green-600">{{ user_config.feeds|length }}</div>
+
<div class="text-sm text-gray-500">Feeds</div>
+
</div>
+
{% endif %}
+
+
<div class="text-center">
+
<div class="text-2xl font-bold text-purple-600">{{ user_links|length }}</div>
+
<div class="text-sm text-gray-500">Link Groups</div>
+
</div>
+
+
<div class="text-center">
+
<div class="text-sm text-gray-500">Member since</div>
+
<div class="text-sm font-medium">{{ user_metadata.created.strftime('%Y-%m-%d') if user_metadata.created else 'Unknown' }}</div>
+
</div>
+
</div>
+
</div>
+
+
<!-- Feeds -->
+
{% if user_config and user_config.feeds %}
+
<div class="bg-white rounded-lg shadow-md p-6 mb-6">
+
<h2 class="text-xl font-semibold mb-4">Feeds</h2>
+
<div class="space-y-2">
+
{% for feed in user_config.feeds %}
+
<div class="flex items-center space-x-2">
+
<span class="text-green-500">๐Ÿ“ก</span>
+
<a href="{{ feed }}" class="text-blue-600 hover:text-blue-800" target="_blank">{{ feed }}</a>
+
</div>
+
{% endfor %}
+
</div>
+
</div>
+
{% endif %}
+
+
<!-- Recent Entries -->
+
<div class="bg-white rounded-lg shadow-md p-6 mb-6">
+
<h2 class="text-xl font-semibold mb-4">Recent Entries</h2>
+
+
{% if entries %}
+
<div class="space-y-4">
+
{% for entry in entries[:10] %}
+
<div class="border-l-4 border-blue-500 pl-4 py-2">
+
<h3 class="font-semibold text-lg">
+
<a href="{{ entry.link }}" class="text-blue-600 hover:text-blue-800" target="_blank">
+
{{ entry.title }}
+
</a>
+
</h3>
+
+
<div class="text-sm text-gray-500 mb-2">
+
{% if entry.published %}
+
Published: {{ entry.published.strftime('%Y-%m-%d %H:%M') }}
+
{% endif %}
+
{% if entry.updated and entry.updated != entry.published %}
+
• Updated: {{ entry.updated.strftime('%Y-%m-%d %H:%M') }}
+
{% endif %}
+
</div>
+
+
{% if entry.summary %}
+
<div class="text-gray-700 mb-2">
+
{{ entry.summary|truncate(200) }}
+
</div>
+
{% endif %}
+
+
{% if entry.categories %}
+
<div class="flex flex-wrap gap-1">
+
{% for category in entry.categories %}
+
<span class="px-2 py-1 bg-blue-100 text-blue-800 text-xs rounded">{{ category }}</span>
+
{% endfor %}
+
</div>
+
{% endif %}
+
</div>
+
{% endfor %}
+
</div>
+
+
{% if entries|length > 10 %}
+
<div class="mt-4 text-center">
+
<p class="text-gray-500">Showing 10 of {{ entries|length }} entries</p>
+
</div>
+
{% endif %}
+
+
{% else %}
+
<p class="text-gray-500">No entries found.</p>
+
{% endif %}
+
</div>
+
+
<!-- Links Summary -->
+
{% if user_links %}
+
<div class="bg-white rounded-lg shadow-md p-6">
+
<h2 class="text-xl font-semibold mb-4">Link Activity</h2>
+
+
<div class="space-y-3">
+
{% for link_group in user_links[:5] %}
+
<div class="border-l-4 border-green-500 pl-4">
+
<h3 class="font-medium">{{ link_group.title }}</h3>
+
<div class="text-sm text-gray-500 mb-2">
+
{{ link_group.links|length }} link(s) found
+
</div>
+
+
<div class="space-y-1">
+
{% for link in link_group.links[:3] %}
+
<div class="text-sm">
+
<a href="{{ link.url }}" class="text-blue-600 hover:text-blue-800" target="_blank">
+
{{ link.text or link.url }}
+
</a>
+
<span class="text-gray-400 ml-2">({{ link.type }})</span>
+
</div>
+
{% endfor %}
+
+
{% if link_group.links|length > 3 %}
+
<div class="text-sm text-gray-500">
+
... and {{ link_group.links|length - 3 }} more
+
</div>
+
{% endif %}
+
</div>
+
</div>
+
{% endfor %}
+
</div>
+
+
{% if user_links|length > 5 %}
+
<div class="mt-4 text-center">
+
<p class="text-gray-500">Showing 5 of {{ user_links|length }} entries with links</p>
+
</div>
+
{% endif %}
+
</div>
+
{% endif %}
+
</div>
+
</div>
+
{% endblock %}
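
For orientation, here is a minimal smoke-render of this template. It assumes it is run from the repository root so `base.html` resolves; the context shapes are inferred from what the template reads and are assumptions, not the project's actual model instances:

```python
from datetime import datetime
from pathlib import Path

from jinja2 import Environment, FileSystemLoader

# Assumes the working directory is the repository root.
env = Environment(loader=FileSystemLoader(Path("src/thicket/templates")))
template = env.get_template("user_detail.html")

# Jinja falls back from attribute to item lookup, so plain dicts are enough
# to stand in for the pydantic models while exercising the template.
html = template.render(
    title="Test User",
    user_config=None,  # optional; would carry icon, email, homepage, feeds
    user_metadata={
        "username": "testuser",
        "entry_count": 0,
        "created": datetime(2025, 1, 1),
    },
    entries=[],      # AtomEntry-like objects, newest first
    user_links=[],   # link groups with .title and .links attributes
)
print(len(html))
```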
+57
src/thicket/templates/users.html
···
+
{% extends "base.html" %}
+
+
{% block page_title %}Users - {{ title }}{% endblock %}
+
+
{% block content %}
+
<div class="page-content">
+
<h2>Users</h2>
+
<p class="page-description">All users contributing to this thicket, ordered by post count.</p>
+
+
{% for user_info in users %}
+
<article class="user-card">
+
<div class="user-header">
+
{% if user_info.metadata.icon and user_info.metadata.icon != "None" %}
+
<img src="{{ user_info.metadata.icon }}" alt="{{ user_info.metadata.username }}" class="user-icon">
+
{% endif %}
+
<div class="user-info">
+
<h3>
+
{% if user_info.metadata.display_name %}
+
{{ user_info.metadata.display_name }}
+
<span class="username">({{ user_info.metadata.username }})</span>
+
{% else %}
+
{{ user_info.metadata.username }}
+
{% endif %}
+
</h3>
+
<div class="user-meta">
+
{% if user_info.metadata.homepage %}
+
<a href="{{ user_info.metadata.homepage }}" target="_blank">{{ user_info.metadata.homepage }}</a>
+
{% endif %}
+
{% if user_info.metadata.email %}
+
<span class="separator">โ€ข</span>
+
<a href="mailto:{{ user_info.metadata.email }}">{{ user_info.metadata.email }}</a>
+
{% endif %}
+
<span class="separator">โ€ข</span>
+
<span class="post-count">{{ user_info.metadata.entry_count }} posts</span>
+
</div>
+
</div>
+
</div>
+
+
{% if user_info.recent_entries %}
+
<div class="user-recent">
+
<h4>Recent posts:</h4>
+
<ul>
+
{% for display_name, entry in user_info.recent_entries %}
+
<li>
+
<a href="{{ entry.link }}" target="_blank">{{ entry.title }}</a>
+
<time datetime="{{ entry.updated or entry.published }}">
+
({{ (entry.updated or entry.published).strftime('%Y-%m-%d') }})
+
</time>
+
</li>
+
{% endfor %}
+
</ul>
+
</div>
+
{% endif %}
+
</article>
+
{% endfor %}
+
</div>
+
{% endblock %}
+230
src/thicket/thicket.py
···
+
"""Main Thicket library class providing unified API."""
+
+
from datetime import datetime
+
from pathlib import Path
+
from typing import Optional, Union
+
+
from .core.feed_parser import FeedParser
+
from .core.git_store import GitStore
+
from .models import AtomEntry, ThicketConfig, UserConfig
+
from .subsystems.feeds import FeedManager
+
from .subsystems.links import LinkProcessor
+
from .subsystems.repository import RepositoryManager
+
from .subsystems.site import SiteGenerator
+
from .subsystems.users import UserManager
+
+
+
class Thicket:
+
"""
+
Main Thicket class providing unified API for feed management.
+
+
This class serves as the primary interface for all Thicket operations,
+
consolidating configuration, repository management, feed processing,
+
user management, link processing, and site generation.
+
"""
+
+
def __init__(self, config: Union[ThicketConfig, Path, str]):
+
"""
+
Initialize Thicket with configuration.
+
+
Args:
+
config: Either a ThicketConfig object or a path to a config file
+
"""
+
if isinstance(config, (Path, str)):
+
self.config = ThicketConfig.from_file(Path(config))
+
else:
+
self.config = config
+
+
# Initialize subsystems
+
self._init_subsystems()
+
+
def _init_subsystems(self):
+
"""Initialize all subsystems."""
+
# Core components
+
self.git_store = GitStore(self.config.git_store)
+
self.feed_parser = FeedParser()
+
+
# Subsystem managers
+
self.repository = RepositoryManager(self.git_store, self.config)
+
self.users = UserManager(self.git_store, self.config)
+
self.feeds = FeedManager(self.git_store, self.feed_parser, self.config)
+
self.links = LinkProcessor(self.git_store, self.config)
+
self.site = SiteGenerator(self.git_store, self.config)
+
+
@classmethod
+
def create(cls, git_store: Path, cache_dir: Path, users: Optional[list[UserConfig]] = None) -> 'Thicket':
+
"""
+
Create a new Thicket instance with minimal configuration.
+
+
Args:
+
git_store: Path to git repository
+
cache_dir: Path to cache directory
+
users: Optional list of user configurations
+
+
Returns:
+
Configured Thicket instance
+
"""
+
config = ThicketConfig(
+
git_store=git_store,
+
cache_dir=cache_dir,
+
users=users or []
+
)
+
return cls(config)
+
+
@classmethod
+
def from_config_file(cls, config_path: Path) -> 'Thicket':
+
"""Load Thicket from configuration file."""
+
return cls(config_path)
+
+
# User Management API
+
def add_user(self, username: str, feeds: list[str], **kwargs) -> UserConfig:
+
"""Add a new user with feeds."""
+
return self.users.add_user(username, feeds, **kwargs)
+
+
def get_user(self, username: str) -> Optional[UserConfig]:
+
"""Get user configuration."""
+
return self.users.get_user(username)
+
+
def list_users(self) -> list[UserConfig]:
+
"""List all configured users."""
+
return self.users.list_users()
+
+
def update_user(self, username: str, **kwargs) -> bool:
+
"""Update user configuration."""
+
return self.users.update_user(username, **kwargs)
+
+
def remove_user(self, username: str) -> bool:
+
"""Remove a user and their data."""
+
return self.users.remove_user(username)
+
+
# Feed Management API
+
async def sync_feeds(self, username: Optional[str] = None, progress_callback=None) -> dict:
+
"""Sync feeds for user(s)."""
+
return await self.feeds.sync_feeds(username, progress_callback)
+
+
async def sync_user_feeds(self, username: str, progress_callback=None) -> dict:
+
"""Sync feeds for a specific user."""
+
return await self.feeds.sync_user_feeds(username, progress_callback)
+
+
def get_entries(self, username: str, limit: Optional[int] = None) -> list[AtomEntry]:
+
"""Get entries for a user."""
+
return self.feeds.get_entries(username, limit)
+
+
def get_entry(self, username: str, entry_id: str) -> Optional[AtomEntry]:
+
"""Get a specific entry."""
+
return self.feeds.get_entry(username, entry_id)
+
+
def search_entries(self, query: str, username: Optional[str] = None, limit: Optional[int] = None) -> list[tuple[str, AtomEntry]]:
+
"""Search entries across users."""
+
return self.feeds.search_entries(query, username, limit)
+
+
# Repository Management API
+
def init_repository(self) -> bool:
+
"""Initialize the git repository."""
+
return self.repository.init_repository()
+
+
def commit_changes(self, message: str) -> bool:
+
"""Commit all pending changes."""
+
return self.repository.commit_changes(message)
+
+
def get_status(self) -> dict:
+
"""Get repository status and statistics."""
+
return self.repository.get_status()
+
+
def backup_repository(self, backup_path: Path) -> bool:
+
"""Create a backup of the repository."""
+
return self.repository.backup_repository(backup_path)
+
+
# Link Processing API
+
def process_links(self, username: Optional[str] = None) -> dict:
+
"""Process and extract links from entries."""
+
return self.links.process_links(username)
+
+
def get_links(self, username: Optional[str] = None) -> dict:
+
"""Get processed links."""
+
return self.links.get_links(username)
+
+
def find_references(self, url: str) -> list[tuple[str, AtomEntry]]:
+
"""Find entries that reference a URL."""
+
return self.links.find_references(url)
+
+
# Site Generation API
+
def generate_site(self, output_dir: Path, template_dir: Optional[Path] = None) -> bool:
+
"""Generate static site."""
+
return self.site.generate_site(output_dir, template_dir)
+
+
def generate_timeline(self, output_path: Path, limit: Optional[int] = None) -> bool:
+
"""Generate timeline HTML."""
+
return self.site.generate_timeline(output_path, limit)
+
+
def generate_user_pages(self, output_dir: Path) -> bool:
+
"""Generate individual user pages."""
+
return self.site.generate_user_pages(output_dir)
+
+
# Utility Methods
+
def get_stats(self) -> dict:
+
"""Get comprehensive statistics."""
+
base_stats = self.repository.get_status()
+
feed_stats = self.feeds.get_stats()
+
link_stats = self.links.get_stats()
+
+
return {
+
**base_stats,
+
**feed_stats,
+
**link_stats,
+
'config': {
+
'git_store': str(self.config.git_store),
+
'cache_dir': str(self.config.cache_dir),
+
'total_users_configured': len(self.config.users),
+
}
+
}
+
+
async def full_sync(self, progress_callback=None) -> dict:
+
"""Perform a complete sync: feeds -> links -> commit."""
+
results = {}
+
+
# Sync feeds
+
results['feeds'] = await self.sync_feeds(progress_callback=progress_callback)
+
+
# Process links
+
results['links'] = self.process_links()
+
+
# Commit changes
+
message = f"Sync completed at {datetime.now().isoformat()}"
+
results['committed'] = self.commit_changes(message)
+
+
return results
+
+
def validate_config(self) -> list[str]:
+
"""Validate configuration and return any errors."""
+
errors = []
+
+
# Check paths exist
+
if not self.config.git_store.parent.exists():
+
errors.append(f"Git store parent directory does not exist: {self.config.git_store.parent}")
+
+
if not self.config.cache_dir.parent.exists():
+
errors.append(f"Cache directory parent does not exist: {self.config.cache_dir.parent}")
+
+
# Validate user configs
+
for user in self.config.users:
+
if not user.feeds:
+
errors.append(f"User {user.username} has no feeds configured")
+
+
# Per-feed URL validation is already handled by pydantic's HttpUrl type
+
+
return errors
+
+
def __enter__(self):
+
"""Context manager entry."""
+
return self
+
+
def __exit__(self, exc_type, exc_val, exc_tb):
+
"""Context manager exit."""
+
# Could add cleanup logic here if needed
+
pass
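
Taken together, the facade reads naturally in a short script. A usage sketch built only from the methods defined above (the import path, on-disk paths, and feed URL are illustrative assumptions; error handling is omitted):

```python
import asyncio
from pathlib import Path

from thicket.thicket import Thicket

async def main() -> None:
    # Thicket.create builds a minimal ThicketConfig around the two paths.
    thicket = Thicket.create(
        git_store=Path("data/git_store"),
        cache_dir=Path("data/cache"),
    )
    thicket.add_user("alice", feeds=["https://example.com/feed.xml"])

    # full_sync chains the steps shown above: sync feeds, process links,
    # then commit the result to the git store.
    results = await thicket.full_sync()
    print(results["committed"])

asyncio.run(main())
```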
tests/__init__.py

This is a binary file and will not be displayed.

-84
tests/conftest.py
···
-
"""Test configuration and fixtures for thicket."""
-
-
import tempfile
-
from pathlib import Path
-
-
import pytest
-
-
from thicket.models import ThicketConfig, UserConfig
-
-
-
@pytest.fixture
-
def temp_dir():
-
"""Create a temporary directory for tests."""
-
with tempfile.TemporaryDirectory() as tmp_dir:
-
yield Path(tmp_dir)
-
-
-
@pytest.fixture
-
def sample_config(temp_dir):
-
"""Create a sample configuration for testing."""
-
git_store = temp_dir / "git_store"
-
cache_dir = temp_dir / "cache"
-
-
return ThicketConfig(
-
git_store=git_store,
-
cache_dir=cache_dir,
-
users=[
-
UserConfig(
-
username="testuser",
-
feeds=["https://example.com/feed.xml"],
-
email="test@example.com",
-
display_name="Test User",
-
)
-
],
-
)
-
-
-
@pytest.fixture
-
def sample_atom_feed():
-
"""Sample Atom feed XML for testing."""
-
return """<?xml version="1.0" encoding="utf-8"?>
-
<feed xmlns="http://www.w3.org/2005/Atom">
-
<title>Test Feed</title>
-
<link href="https://example.com/"/>
-
<updated>2025-01-01T00:00:00Z</updated>
-
<author>
-
<name>Test Author</name>
-
<email>author@example.com</email>
-
</author>
-
<id>https://example.com/</id>
-
-
<entry>
-
<title>Test Entry</title>
-
<link href="https://example.com/entry/1"/>
-
<id>https://example.com/entry/1</id>
-
<updated>2025-01-01T00:00:00Z</updated>
-
<summary>This is a test entry.</summary>
-
<content type="html">
-
<![CDATA[<p>This is the content of the test entry.</p>]]>
-
</content>
-
</entry>
-
</feed>"""
-
-
-
@pytest.fixture
-
def sample_rss_feed():
-
"""Sample RSS feed XML for testing."""
-
return """<?xml version="1.0" encoding="UTF-8"?>
-
<rss version="2.0">
-
<channel>
-
<title>Test RSS Feed</title>
-
<link>https://example.com/</link>
-
<description>Test RSS feed for testing</description>
-
<managingEditor>editor@example.com</managingEditor>
-
-
<item>
-
<title>Test RSS Entry</title>
-
<link>https://example.com/rss/entry/1</link>
-
<description>This is a test RSS entry.</description>
-
<pubDate>Mon, 01 Jan 2025 00:00:00 GMT</pubDate>
-
<guid>https://example.com/rss/entry/1</guid>
-
</item>
-
</channel>
-
</rss>"""
-131
tests/test_feed_parser.py
···
-
"""Tests for feed parser functionality."""
-
-
from pydantic import HttpUrl
-
-
from thicket.core.feed_parser import FeedParser
-
from thicket.models import AtomEntry, FeedMetadata
-
-
-
class TestFeedParser:
-
"""Test the FeedParser class."""
-
-
def test_init(self):
-
"""Test parser initialization."""
-
parser = FeedParser()
-
assert parser.user_agent == "thicket/0.1.0"
-
assert "a" in parser.allowed_tags
-
assert "href" in parser.allowed_attributes["a"]
-
-
def test_parse_atom_feed(self, sample_atom_feed):
-
"""Test parsing an Atom feed."""
-
parser = FeedParser()
-
metadata, entries = parser.parse_feed(sample_atom_feed)
-
-
# Check metadata
-
assert isinstance(metadata, FeedMetadata)
-
assert metadata.title == "Test Feed"
-
assert metadata.author_name == "Test Author"
-
assert metadata.author_email == "author@example.com"
-
assert metadata.link == HttpUrl("https://example.com/")
-
-
# Check entries
-
assert len(entries) == 1
-
entry = entries[0]
-
assert isinstance(entry, AtomEntry)
-
assert entry.title == "Test Entry"
-
assert entry.id == "https://example.com/entry/1"
-
assert entry.link == HttpUrl("https://example.com/entry/1")
-
assert entry.summary == "This is a test entry."
-
assert "<p>This is the content of the test entry.</p>" in entry.content
-
-
def test_parse_rss_feed(self, sample_rss_feed):
-
"""Test parsing an RSS feed."""
-
parser = FeedParser()
-
metadata, entries = parser.parse_feed(sample_rss_feed)
-
-
# Check metadata
-
assert isinstance(metadata, FeedMetadata)
-
assert metadata.title == "Test RSS Feed"
-
assert metadata.link == HttpUrl("https://example.com/")
-
assert metadata.author_email == "editor@example.com"
-
-
# Check entries
-
assert len(entries) == 1
-
entry = entries[0]
-
assert isinstance(entry, AtomEntry)
-
assert entry.title == "Test RSS Entry"
-
assert entry.id == "https://example.com/rss/entry/1"
-
assert entry.summary == "This is a test RSS entry."
-
-
def test_sanitize_entry_id(self):
-
"""Test entry ID sanitization."""
-
parser = FeedParser()
-
-
# Test URL ID
-
url_id = "https://example.com/posts/2025/01/test-post"
-
sanitized = parser.sanitize_entry_id(url_id)
-
assert sanitized == "posts_2025_01_test-post"
-
-
# Test problematic characters
-
bad_id = "test/with\\bad:chars|and<more>"
-
sanitized = parser.sanitize_entry_id(bad_id)
-
assert sanitized == "test_with_bad_chars_and_more_"
-
-
# Test empty ID
-
empty_id = ""
-
sanitized = parser.sanitize_entry_id(empty_id)
-
assert sanitized == "entry"
-
-
# Test very long ID
-
long_id = "a" * 300
-
sanitized = parser.sanitize_entry_id(long_id)
-
assert len(sanitized) == 200
-
-
def test_sanitize_html(self):
-
"""Test HTML sanitization."""
-
parser = FeedParser()
-
-
# Test allowed tags
-
safe_html = "<p>This is <strong>safe</strong> HTML</p>"
-
sanitized = parser._sanitize_html(safe_html)
-
assert sanitized == safe_html
-
-
# Test dangerous tags
-
dangerous_html = "<script>alert('xss')</script><p>Safe content</p>"
-
sanitized = parser._sanitize_html(dangerous_html)
-
assert "<script>" not in sanitized
-
assert "<p>Safe content</p>" in sanitized
-
-
# Test attributes
-
html_with_attrs = '<a href="https://example.com" onclick="alert()">Link</a>'
-
sanitized = parser._sanitize_html(html_with_attrs)
-
assert 'href="https://example.com"' in sanitized
-
assert 'onclick' not in sanitized
-
-
def test_extract_feed_metadata(self):
-
"""Test feed metadata extraction."""
-
parser = FeedParser()
-
-
# Test with feedparser parsed data
-
import feedparser
-
parsed = feedparser.parse("""<?xml version="1.0" encoding="utf-8"?>
-
<feed xmlns="http://www.w3.org/2005/Atom">
-
<title>Test Feed</title>
-
<link href="https://example.com/"/>
-
<author>
-
<name>Test Author</name>
-
<email>author@example.com</email>
-
<uri>https://example.com/about</uri>
-
</author>
-
<logo>https://example.com/logo.png</logo>
-
<icon>https://example.com/icon.png</icon>
-
</feed>""")
-
-
metadata = parser._extract_feed_metadata(parsed.feed)
-
assert metadata.title == "Test Feed"
-
assert metadata.author_name == "Test Author"
-
assert metadata.author_email == "author@example.com"
-
assert metadata.author_uri == HttpUrl("https://example.com/about")
-
assert metadata.link == HttpUrl("https://example.com/")
-
assert metadata.logo == HttpUrl("https://example.com/logo.png")
-
assert metadata.icon == HttpUrl("https://example.com/icon.png")
-275
tests/test_git_store.py
···
-
"""Tests for Git store functionality."""
-
-
import json
-
from datetime import datetime
-
-
from pydantic import HttpUrl
-
-
from thicket.core.git_store import GitStore
-
from thicket.models import AtomEntry, DuplicateMap, UserMetadata
-
-
-
class TestGitStore:
-
"""Test the GitStore class."""
-
-
def test_init_new_repo(self, temp_dir):
-
"""Test initializing a new Git repository."""
-
repo_path = temp_dir / "test_repo"
-
store = GitStore(repo_path)
-
-
assert store.repo_path == repo_path
-
assert store.repo is not None
-
assert repo_path.exists()
-
assert (repo_path / ".git").exists()
-
assert (repo_path / "index.json").exists()
-
assert (repo_path / "duplicates.json").exists()
-
-
def test_init_existing_repo(self, temp_dir):
-
"""Test initializing with existing repository."""
-
repo_path = temp_dir / "test_repo"
-
-
# Create first store
-
store1 = GitStore(repo_path)
-
store1.add_user("testuser", display_name="Test User")
-
-
# Create second store pointing to same repo
-
store2 = GitStore(repo_path)
-
user = store2.get_user("testuser")
-
-
assert user is not None
-
assert user.username == "testuser"
-
assert user.display_name == "Test User"
-
-
def test_add_user(self, temp_dir):
-
"""Test adding a user to the Git store."""
-
store = GitStore(temp_dir / "test_repo")
-
-
user = store.add_user(
-
username="testuser",
-
display_name="Test User",
-
email="test@example.com",
-
homepage="https://example.com",
-
icon="https://example.com/icon.png",
-
feeds=["https://example.com/feed.xml"],
-
)
-
-
assert isinstance(user, UserMetadata)
-
assert user.username == "testuser"
-
assert user.display_name == "Test User"
-
assert user.email == "test@example.com"
-
assert user.homepage == "https://example.com"
-
assert user.icon == "https://example.com/icon.png"
-
assert user.feeds == ["https://example.com/feed.xml"]
-
assert user.directory == "testuser"
-
-
# Check that user directory was created
-
user_dir = store.repo_path / "testuser"
-
assert user_dir.exists()
-
-
# Check user exists in index
-
stored_user = store.get_user("testuser")
-
assert stored_user is not None
-
assert stored_user.username == "testuser"
-
assert stored_user.display_name == "Test User"
-
-
def test_get_user(self, temp_dir):
-
"""Test getting user metadata."""
-
store = GitStore(temp_dir / "test_repo")
-
-
# Add user
-
store.add_user("testuser", display_name="Test User")
-
-
# Get user
-
user = store.get_user("testuser")
-
assert user is not None
-
assert user.username == "testuser"
-
assert user.display_name == "Test User"
-
-
# Try to get non-existent user
-
non_user = store.get_user("nonexistent")
-
assert non_user is None
-
-
def test_store_entry(self, temp_dir):
-
"""Test storing an entry."""
-
store = GitStore(temp_dir / "test_repo")
-
-
# Add user first
-
store.add_user("testuser")
-
-
# Create test entry
-
entry = AtomEntry(
-
id="https://example.com/entry/1",
-
title="Test Entry",
-
link=HttpUrl("https://example.com/entry/1"),
-
updated=datetime.now(),
-
summary="Test entry summary",
-
content="<p>Test content</p>",
-
)
-
-
# Store entry
-
result = store.store_entry("testuser", entry)
-
assert result is True
-
-
# Check that entry file was created
-
user_dir = store.repo_path / "testuser"
-
entry_files = list(user_dir.glob("*.json"))
-
entry_files = [f for f in entry_files if f.name != "metadata.json"]
-
assert len(entry_files) == 1
-
-
# Check entry content
-
with open(entry_files[0]) as f:
-
stored_entry = json.load(f)
-
assert stored_entry["title"] == "Test Entry"
-
assert stored_entry["id"] == "https://example.com/entry/1"
-
-
def test_get_entry(self, temp_dir):
-
"""Test retrieving an entry."""
-
store = GitStore(temp_dir / "test_repo")
-
-
# Add user and entry
-
store.add_user("testuser")
-
entry = AtomEntry(
-
id="https://example.com/entry/1",
-
title="Test Entry",
-
link=HttpUrl("https://example.com/entry/1"),
-
updated=datetime.now(),
-
)
-
store.store_entry("testuser", entry)
-
-
# Get entry
-
retrieved = store.get_entry("testuser", "https://example.com/entry/1")
-
assert retrieved is not None
-
assert retrieved.title == "Test Entry"
-
assert retrieved.id == "https://example.com/entry/1"
-
-
# Try to get non-existent entry
-
non_entry = store.get_entry("testuser", "https://example.com/nonexistent")
-
assert non_entry is None
-
-
def test_list_entries(self, temp_dir):
-
"""Test listing entries for a user."""
-
store = GitStore(temp_dir / "test_repo")
-
-
# Add user
-
store.add_user("testuser")
-
-
# Add multiple entries
-
for i in range(3):
-
entry = AtomEntry(
-
id=f"https://example.com/entry/{i}",
-
title=f"Test Entry {i}",
-
link=HttpUrl(f"https://example.com/entry/{i}"),
-
updated=datetime.now(),
-
)
-
store.store_entry("testuser", entry)
-
-
# List all entries
-
entries = store.list_entries("testuser")
-
assert len(entries) == 3
-
-
# List with limit
-
limited = store.list_entries("testuser", limit=2)
-
assert len(limited) == 2
-
-
# List for non-existent user
-
none_entries = store.list_entries("nonexistent")
-
assert len(none_entries) == 0
-
-
def test_duplicates(self, temp_dir):
-
"""Test duplicate management."""
-
store = GitStore(temp_dir / "test_repo")
-
-
# Get initial duplicates (should be empty)
-
duplicates = store.get_duplicates()
-
assert isinstance(duplicates, DuplicateMap)
-
assert len(duplicates.duplicates) == 0
-
-
# Add duplicate
-
store.add_duplicate("https://example.com/dup", "https://example.com/canonical")
-
-
# Check duplicate was added
-
duplicates = store.get_duplicates()
-
assert len(duplicates.duplicates) == 1
-
assert duplicates.is_duplicate("https://example.com/dup")
-
assert duplicates.get_canonical("https://example.com/dup") == "https://example.com/canonical"
-
-
# Remove duplicate
-
result = store.remove_duplicate("https://example.com/dup")
-
assert result is True
-
-
# Check duplicate was removed
-
duplicates = store.get_duplicates()
-
assert len(duplicates.duplicates) == 0
-
assert not duplicates.is_duplicate("https://example.com/dup")
-
-
def test_search_entries(self, temp_dir):
-
"""Test searching entries."""
-
store = GitStore(temp_dir / "test_repo")
-
-
# Add user
-
store.add_user("testuser")
-
-
# Add entries with different content
-
entries_data = [
-
("Test Python Programming", "Learning Python basics"),
-
("JavaScript Tutorial", "Advanced JavaScript concepts"),
-
("Python Web Development", "Building web apps with Python"),
-
]
-
-
for title, summary in entries_data:
-
entry = AtomEntry(
-
id=f"https://example.com/entry/{title.lower().replace(' ', '-')}",
-
title=title,
-
link=HttpUrl(f"https://example.com/entry/{title.lower().replace(' ', '-')}"),
-
updated=datetime.now(),
-
summary=summary,
-
)
-
store.store_entry("testuser", entry)
-
-
# Search for Python entries
-
results = store.search_entries("Python")
-
assert len(results) == 2
-
-
# Search for specific user
-
results = store.search_entries("Python", username="testuser")
-
assert len(results) == 2
-
-
# Search with limit
-
results = store.search_entries("Python", limit=1)
-
assert len(results) == 1
-
-
# Search for non-existent term
-
results = store.search_entries("NonExistent")
-
assert len(results) == 0
-
-
def test_get_stats(self, temp_dir):
-
"""Test getting repository statistics."""
-
store = GitStore(temp_dir / "test_repo")
-
-
# Get initial stats
-
stats = store.get_stats()
-
assert stats["total_users"] == 0
-
assert stats["total_entries"] == 0
-
assert stats["total_duplicates"] == 0
-
-
# Add user and entries
-
store.add_user("testuser")
-
for i in range(3):
-
entry = AtomEntry(
-
id=f"https://example.com/entry/{i}",
-
title=f"Test Entry {i}",
-
link=HttpUrl(f"https://example.com/entry/{i}"),
-
updated=datetime.now(),
-
)
-
store.store_entry("testuser", entry)
-
-
# Add duplicate
-
store.add_duplicate("https://example.com/dup", "https://example.com/canonical")
-
-
# Get updated stats
-
stats = store.get_stats()
-
assert stats["total_users"] == 1
-
assert stats["total_entries"] == 3
-
assert stats["total_duplicates"] == 1
-
assert "last_updated" in stats
-
assert "repository_size" in stats
-352
tests/test_models.py
···
-
"""Tests for pydantic models."""
-
-
from datetime import datetime
-
-
import pytest
-
from pydantic import HttpUrl, ValidationError
-
-
from thicket.models import (
-
AtomEntry,
-
DuplicateMap,
-
FeedMetadata,
-
ThicketConfig,
-
UserConfig,
-
UserMetadata,
-
)
-
-
-
class TestUserConfig:
-
"""Test UserConfig model."""
-
-
def test_valid_user_config(self):
-
"""Test creating valid user config."""
-
config = UserConfig(
-
username="testuser",
-
feeds=["https://example.com/feed.xml"],
-
email="test@example.com",
-
homepage="https://example.com",
-
display_name="Test User",
-
)
-
-
assert config.username == "testuser"
-
assert len(config.feeds) == 1
-
assert config.feeds[0] == HttpUrl("https://example.com/feed.xml")
-
assert config.email == "test@example.com"
-
assert config.display_name == "Test User"
-
-
def test_invalid_email(self):
-
"""Test validation of invalid email."""
-
with pytest.raises(ValidationError):
-
UserConfig(
-
username="testuser",
-
feeds=["https://example.com/feed.xml"],
-
email="invalid-email",
-
)
-
-
def test_invalid_feed_url(self):
-
"""Test validation of invalid feed URL."""
-
with pytest.raises(ValidationError):
-
UserConfig(
-
username="testuser",
-
feeds=["not-a-url"],
-
)
-
-
def test_optional_fields(self):
-
"""Test optional fields with None values."""
-
config = UserConfig(
-
username="testuser",
-
feeds=["https://example.com/feed.xml"],
-
)
-
-
assert config.email is None
-
assert config.homepage is None
-
assert config.icon is None
-
assert config.display_name is None
-
-
-
class TestThicketConfig:
-
"""Test ThicketConfig model."""
-
-
def test_valid_config(self, temp_dir):
-
"""Test creating valid configuration."""
-
config = ThicketConfig(
-
git_store=temp_dir / "git_store",
-
cache_dir=temp_dir / "cache",
-
users=[
-
UserConfig(
-
username="testuser",
-
feeds=["https://example.com/feed.xml"],
-
)
-
],
-
)
-
-
assert config.git_store == temp_dir / "git_store"
-
assert config.cache_dir == temp_dir / "cache"
-
assert len(config.users) == 1
-
assert config.users[0].username == "testuser"
-
-
def test_find_user(self, temp_dir):
-
"""Test finding user by username."""
-
config = ThicketConfig(
-
git_store=temp_dir / "git_store",
-
cache_dir=temp_dir / "cache",
-
users=[
-
UserConfig(username="user1", feeds=["https://example.com/feed1.xml"]),
-
UserConfig(username="user2", feeds=["https://example.com/feed2.xml"]),
-
],
-
)
-
-
user = config.find_user("user1")
-
assert user is not None
-
assert user.username == "user1"
-
-
non_user = config.find_user("nonexistent")
-
assert non_user is None
-
-
def test_add_user(self, temp_dir):
-
"""Test adding a new user."""
-
config = ThicketConfig(
-
git_store=temp_dir / "git_store",
-
cache_dir=temp_dir / "cache",
-
users=[],
-
)
-
-
new_user = UserConfig(
-
username="newuser",
-
feeds=["https://example.com/feed.xml"],
-
)
-
-
config.add_user(new_user)
-
assert len(config.users) == 1
-
assert config.users[0].username == "newuser"
-
-
def test_add_feed_to_user(self, temp_dir):
-
"""Test adding feed to existing user."""
-
config = ThicketConfig(
-
git_store=temp_dir / "git_store",
-
cache_dir=temp_dir / "cache",
-
users=[
-
UserConfig(username="testuser", feeds=["https://example.com/feed1.xml"]),
-
],
-
)
-
-
result = config.add_feed_to_user("testuser", HttpUrl("https://example.com/feed2.xml"))
-
assert result is True
-
-
user = config.find_user("testuser")
-
assert len(user.feeds) == 2
-
assert HttpUrl("https://example.com/feed2.xml") in user.feeds
-
-
# Test adding to non-existent user
-
result = config.add_feed_to_user("nonexistent", HttpUrl("https://example.com/feed.xml"))
-
assert result is False
-
-
-
class TestAtomEntry:
-
"""Test AtomEntry model."""
-
-
def test_valid_entry(self):
-
"""Test creating valid Atom entry."""
-
entry = AtomEntry(
-
id="https://example.com/entry/1",
-
title="Test Entry",
-
link=HttpUrl("https://example.com/entry/1"),
-
updated=datetime.now(),
-
published=datetime.now(),
-
summary="Test summary",
-
content="<p>Test content</p>",
-
content_type="html",
-
author={"name": "Test Author"},
-
categories=["test", "example"],
-
)
-
-
assert entry.id == "https://example.com/entry/1"
-
assert entry.title == "Test Entry"
-
assert entry.summary == "Test summary"
-
assert entry.content == "<p>Test content</p>"
-
assert entry.content_type == "html"
-
assert entry.author["name"] == "Test Author"
-
assert "test" in entry.categories
-
-
def test_minimal_entry(self):
-
"""Test creating minimal Atom entry."""
-
entry = AtomEntry(
-
id="https://example.com/entry/1",
-
title="Test Entry",
-
link=HttpUrl("https://example.com/entry/1"),
-
updated=datetime.now(),
-
)
-
-
assert entry.id == "https://example.com/entry/1"
-
assert entry.title == "Test Entry"
-
assert entry.published is None
-
assert entry.summary is None
-
assert entry.content is None
-
assert entry.content_type == "html" # default
-
assert entry.author is None
-
assert entry.categories == []
-
-
-
class TestDuplicateMap:
-
"""Test DuplicateMap model."""
-
-
def test_empty_duplicates(self):
-
"""Test empty duplicate map."""
-
dup_map = DuplicateMap()
-
assert len(dup_map.duplicates) == 0
-
assert not dup_map.is_duplicate("test")
-
assert dup_map.get_canonical("test") == "test"
-
-
def test_add_duplicate(self):
-
"""Test adding duplicate mapping."""
-
dup_map = DuplicateMap()
-
dup_map.add_duplicate("dup1", "canonical1")
-
-
assert len(dup_map.duplicates) == 1
-
assert dup_map.is_duplicate("dup1")
-
assert dup_map.get_canonical("dup1") == "canonical1"
-
assert dup_map.get_canonical("canonical1") == "canonical1"
-
-
def test_remove_duplicate(self):
-
"""Test removing duplicate mapping."""
-
dup_map = DuplicateMap()
-
dup_map.add_duplicate("dup1", "canonical1")
-
-
result = dup_map.remove_duplicate("dup1")
-
assert result is True
-
assert len(dup_map.duplicates) == 0
-
assert not dup_map.is_duplicate("dup1")
-
-
# Test removing non-existent duplicate
-
result = dup_map.remove_duplicate("nonexistent")
-
assert result is False
-
-
def test_get_duplicates_for_canonical(self):
-
"""Test getting all duplicates for a canonical ID."""
-
dup_map = DuplicateMap()
-
dup_map.add_duplicate("dup1", "canonical1")
-
dup_map.add_duplicate("dup2", "canonical1")
-
dup_map.add_duplicate("dup3", "canonical2")
-
-
dups = dup_map.get_duplicates_for_canonical("canonical1")
-
assert len(dups) == 2
-
assert "dup1" in dups
-
assert "dup2" in dups
-
-
dups = dup_map.get_duplicates_for_canonical("canonical2")
-
assert len(dups) == 1
-
assert "dup3" in dups
-
-
dups = dup_map.get_duplicates_for_canonical("nonexistent")
-
assert len(dups) == 0
-
-
-
class TestFeedMetadata:
-
"""Test FeedMetadata model."""
-
-
def test_valid_metadata(self):
-
"""Test creating valid feed metadata."""
-
metadata = FeedMetadata(
-
title="Test Feed",
-
author_name="Test Author",
-
author_email="author@example.com",
-
author_uri=HttpUrl("https://example.com/author"),
-
link=HttpUrl("https://example.com"),
-
description="Test description",
-
)
-
-
assert metadata.title == "Test Feed"
-
assert metadata.author_name == "Test Author"
-
assert metadata.author_email == "author@example.com"
-
assert metadata.link == HttpUrl("https://example.com")
-
-
def test_to_user_config(self):
-
"""Test converting metadata to user config."""
-
metadata = FeedMetadata(
-
title="Test Feed",
-
author_name="Test Author",
-
author_email="author@example.com",
-
author_uri=HttpUrl("https://example.com/author"),
-
link=HttpUrl("https://example.com"),
-
logo=HttpUrl("https://example.com/logo.png"),
-
)
-
-
feed_url = HttpUrl("https://example.com/feed.xml")
-
user_config = metadata.to_user_config("testuser", feed_url)
-
-
assert user_config.username == "testuser"
-
assert user_config.feeds == [feed_url]
-
assert user_config.display_name == "Test Author"
-
assert user_config.email == "author@example.com"
-
assert user_config.homepage == HttpUrl("https://example.com/author")
-
assert user_config.icon == HttpUrl("https://example.com/logo.png")
-
-
def test_to_user_config_fallbacks(self):
-
"""Test fallback logic in to_user_config."""
-
metadata = FeedMetadata(
-
title="Test Feed",
-
link=HttpUrl("https://example.com"),
-
icon=HttpUrl("https://example.com/icon.png"),
-
)
-
-
feed_url = HttpUrl("https://example.com/feed.xml")
-
user_config = metadata.to_user_config("testuser", feed_url)
-
-
assert user_config.display_name == "Test Feed" # Falls back to title
-
assert user_config.homepage == HttpUrl("https://example.com") # Falls back to link
-
assert user_config.icon == HttpUrl("https://example.com/icon.png")
-
assert user_config.email is None
-
-
-
class TestUserMetadata:
-
"""Test UserMetadata model."""
-
-
def test_valid_metadata(self):
-
"""Test creating valid user metadata."""
-
now = datetime.now()
-
metadata = UserMetadata(
-
username="testuser",
-
directory="testuser",
-
created=now,
-
last_updated=now,
-
feeds=["https://example.com/feed.xml"],
-
entry_count=5,
-
)
-
-
assert metadata.username == "testuser"
-
assert metadata.directory == "testuser"
-
assert metadata.entry_count == 5
-
assert len(metadata.feeds) == 1
-
-
def test_update_timestamp(self):
-
"""Test updating timestamp."""
-
now = datetime.now()
-
metadata = UserMetadata(
-
username="testuser",
-
directory="testuser",
-
created=now,
-
last_updated=now,
-
)
-
-
original_time = metadata.last_updated
-
metadata.update_timestamp()
-
-
assert metadata.last_updated > original_time
-
-
def test_increment_entry_count(self):
-
"""Test incrementing entry count."""
-
metadata = UserMetadata(
-
username="testuser",
-
directory="testuser",
-
created=datetime.now(),
-
last_updated=datetime.now(),
-
entry_count=5,
-
)
-
-
original_count = metadata.entry_count
-
original_time = metadata.last_updated
-
-
metadata.increment_entry_count(3)
-
-
assert metadata.entry_count == original_count + 3
-
assert metadata.last_updated > original_time
+2 -130
uv.lock
···
]
[[package]]
-
name = "blinker"
-
version = "1.9.0"
-
source = { registry = "https://pypi.org/simple" }
-
sdist = { url = "https://files.pythonhosted.org/packages/21/28/9b3f50ce0e048515135495f198351908d99540d69bfdc8c1d15b73dc55ce/blinker-1.9.0.tar.gz", hash = "sha256:b4ce2265a7abece45e7cc896e98dbebe6cead56bcf805a3d23136d145f5445bf", size = 22460, upload-time = "2024-11-08T17:25:47.436Z" }
-
wheels = [
-
{ url = "https://files.pythonhosted.org/packages/10/cb/f2ad4230dc2eb1a74edf38f1a38b9b52277f75bef262d8908e60d957e13c/blinker-1.9.0-py3-none-any.whl", hash = "sha256:ba0efaa9080b619ff2f3459d1d500c57bddea4a6b424b60a91141db6fd2f08bc", size = 8458, upload-time = "2024-11-08T17:25:46.184Z" },
-
]
-
-
[[package]]
name = "certifi"
version = "2025.7.14"
source = { registry = "https://pypi.org/simple" }
···
]
[[package]]
-
name = "flask"
-
version = "3.1.1"
-
source = { registry = "https://pypi.org/simple" }
-
dependencies = [
-
{ name = "blinker" },
-
{ name = "click", version = "8.1.8", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.10'" },
-
{ name = "click", version = "8.2.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.10'" },
-
{ name = "importlib-metadata", marker = "python_full_version < '3.10'" },
-
{ name = "itsdangerous" },
-
{ name = "jinja2" },
-
{ name = "markupsafe" },
-
{ name = "werkzeug" },
-
]
-
sdist = { url = "https://files.pythonhosted.org/packages/c0/de/e47735752347f4128bcf354e0da07ef311a78244eba9e3dc1d4a5ab21a98/flask-3.1.1.tar.gz", hash = "sha256:284c7b8f2f58cb737f0cf1c30fd7eaf0ccfcde196099d24ecede3fc2005aa59e", size = 753440, upload-time = "2025-05-13T15:01:17.447Z" }
-
wheels = [
-
{ url = "https://files.pythonhosted.org/packages/3d/68/9d4508e893976286d2ead7f8f571314af6c2037af34853a30fd769c02e9d/flask-3.1.1-py3-none-any.whl", hash = "sha256:07aae2bb5eaf77993ef57e357491839f5fd9f4dc281593a81a9e4d79a24f295c", size = 103305, upload-time = "2025-05-13T15:01:15.591Z" },
-
]
-
-
[[package]]
name = "gitdb"
version = "4.0.12"
source = { registry = "https://pypi.org/simple" }
···
]
[[package]]
-
name = "importlib-metadata"
-
version = "8.7.0"
-
source = { registry = "https://pypi.org/simple" }
-
dependencies = [
-
{ name = "zipp", marker = "python_full_version < '3.10'" },
-
]
-
sdist = { url = "https://files.pythonhosted.org/packages/76/66/650a33bd90f786193e4de4b3ad86ea60b53c89b669a5c7be931fac31cdb0/importlib_metadata-8.7.0.tar.gz", hash = "sha256:d13b81ad223b890aa16c5471f2ac3056cf76c5f10f82d6f9292f0b415f389000", size = 56641, upload-time = "2025-04-27T15:29:01.736Z" }
-
wheels = [
-
{ url = "https://files.pythonhosted.org/packages/20/b0/36bd937216ec521246249be3bf9855081de4c5e06a0c9b4219dbeda50373/importlib_metadata-8.7.0-py3-none-any.whl", hash = "sha256:e5dd1551894c77868a30651cef00984d50e1002d06942a7101d34870c5f02afd", size = 27656, upload-time = "2025-04-27T15:29:00.214Z" },
-
]
-
-
[[package]]
name = "iniconfig"
version = "2.1.0"
source = { registry = "https://pypi.org/simple" }
···
]
[[package]]
-
name = "itsdangerous"
-
version = "2.2.0"
-
source = { registry = "https://pypi.org/simple" }
-
sdist = { url = "https://files.pythonhosted.org/packages/9c/cb/8ac0172223afbccb63986cc25049b154ecfb5e85932587206f42317be31d/itsdangerous-2.2.0.tar.gz", hash = "sha256:e0050c0b7da1eea53ffaf149c0cfbb5c6e2e2b69c4bef22c81fa6eb73e5f6173", size = 54410, upload-time = "2024-04-16T21:28:15.614Z" }
-
wheels = [
-
{ url = "https://files.pythonhosted.org/packages/04/96/92447566d16df59b2a776c0fb82dbc4d9e07cd95062562af01e408583fc4/itsdangerous-2.2.0-py3-none-any.whl", hash = "sha256:c6242fc49e35958c8b15141343aa660db5fc54d4f13a1db01a3f5891b98700ef", size = 16234, upload-time = "2024-04-16T21:28:14.499Z" },
-
]
-
-
[[package]]
name = "jinja2"
version = "3.1.6"
source = { registry = "https://pypi.org/simple" }
···
]
[[package]]
-
name = "linkify-it-py"
-
version = "2.0.3"
-
source = { registry = "https://pypi.org/simple" }
-
dependencies = [
-
{ name = "uc-micro-py" },
-
]
-
sdist = { url = "https://files.pythonhosted.org/packages/2a/ae/bb56c6828e4797ba5a4821eec7c43b8bf40f69cda4d4f5f8c8a2810ec96a/linkify-it-py-2.0.3.tar.gz", hash = "sha256:68cda27e162e9215c17d786649d1da0021a451bdc436ef9e0fa0ba5234b9b048", size = 27946, upload-time = "2024-02-04T14:48:04.179Z" }
-
wheels = [
-
{ url = "https://files.pythonhosted.org/packages/04/1e/b832de447dee8b582cac175871d2f6c3d5077cc56d5575cadba1fd1cccfa/linkify_it_py-2.0.3-py3-none-any.whl", hash = "sha256:6bcbc417b0ac14323382aef5c5192c0075bf8a9d6b41820a2b66371eac6b6d79", size = 19820, upload-time = "2024-02-04T14:48:02.496Z" },
-
]
-
-
[[package]]
name = "markdown-it-py"
version = "3.0.0"
source = { registry = "https://pypi.org/simple" }
···
{ url = "https://files.pythonhosted.org/packages/42/d7/1ec15b46af6af88f19b8e5ffea08fa375d433c998b8a7639e76935c14f1f/markdown_it_py-3.0.0-py3-none-any.whl", hash = "sha256:355216845c60bd96232cd8d8c40e8f9765cc86f46880e43a8fd22dc1a1a8cab1", size = 87528, upload-time = "2023-06-03T06:41:11.019Z" },
]
-
[package.optional-dependencies]
-
linkify = [
-
{ name = "linkify-it-py" },
-
]
-
plugins = [
-
{ name = "mdit-py-plugins" },
-
]
-
[[package]]
name = "markupsafe"
version = "3.0.2"
···
{ url = "https://files.pythonhosted.org/packages/17/d8/5811082f85bb88410ad7e452263af048d685669bbbfb7b595e8689152498/MarkupSafe-3.0.2-cp39-cp39-musllinux_1_2_x86_64.whl", hash = "sha256:eb7972a85c54febfb25b5c4b4f3af4dcc731994c7da0d8a0b4a6eb0640e1d178", size = 20946, upload-time = "2024-10-18T15:21:50.441Z" },
{ url = "https://files.pythonhosted.org/packages/7c/31/bd635fb5989440d9365c5e3c47556cfea121c7803f5034ac843e8f37c2f2/MarkupSafe-3.0.2-cp39-cp39-win32.whl", hash = "sha256:8c4e8c3ce11e1f92f6536ff07154f9d49677ebaaafc32db9db4620bc11ed480f", size = 15063, upload-time = "2024-10-18T15:21:51.385Z" },
{ url = "https://files.pythonhosted.org/packages/b3/73/085399401383ce949f727afec55ec3abd76648d04b9f22e1c0e99cb4bec3/MarkupSafe-3.0.2-cp39-cp39-win_amd64.whl", hash = "sha256:6e296a513ca3d94054c2c881cc913116e90fd030ad1c656b3869762b754f5f8a", size = 15506, upload-time = "2024-10-18T15:21:52.974Z" },
-
]
-
-
[[package]]
-
name = "mdit-py-plugins"
-
version = "0.4.2"
-
source = { registry = "https://pypi.org/simple" }
-
dependencies = [
-
{ name = "markdown-it-py" },
-
]
-
sdist = { url = "https://files.pythonhosted.org/packages/19/03/a2ecab526543b152300717cf232bb4bb8605b6edb946c845016fa9c9c9fd/mdit_py_plugins-0.4.2.tar.gz", hash = "sha256:5f2cd1fdb606ddf152d37ec30e46101a60512bc0e5fa1a7002c36647b09e26b5", size = 43542, upload-time = "2024-09-09T20:27:49.564Z" }
-
wheels = [
-
{ url = "https://files.pythonhosted.org/packages/a7/f7/7782a043553ee469c1ff49cfa1cdace2d6bf99a1f333cf38676b3ddf30da/mdit_py_plugins-0.4.2-py3-none-any.whl", hash = "sha256:0c673c3f889399a33b95e88d2f0d111b4447bdfea7f237dab2d488f459835636", size = 55316, upload-time = "2024-09-09T20:27:48.397Z" },
]
[[package]]
···
]
[[package]]
-
name = "textual"
-
version = "4.0.0"
-
source = { registry = "https://pypi.org/simple" }
-
dependencies = [
-
{ name = "markdown-it-py", extra = ["linkify", "plugins"] },
-
{ name = "platformdirs" },
-
{ name = "rich" },
-
{ name = "typing-extensions" },
-
]
-
sdist = { url = "https://files.pythonhosted.org/packages/f1/22/a2812ab1e5b0cb3a327a4ea79b430234c2271ba13462b989f435b40a247d/textual-4.0.0.tar.gz", hash = "sha256:1cab4ea3cfc0e47ae773405cdd6bc2a17ed76ff7b648379ac8017ea89c5ad28c", size = 1606128, upload-time = "2025-07-12T09:41:20.812Z" }
-
wheels = [
-
{ url = "https://files.pythonhosted.org/packages/d8/e4/ebe27c54d2534cc41d00ea1d78b783763f97abf3e3d6dd41e5536daa52a5/textual-4.0.0-py3-none-any.whl", hash = "sha256:214051640f890676a670aa7d29cd2a37d27cfe6b2cf866e9d5abc3b6c89c5800", size = 692382, upload-time = "2025-07-12T09:41:18.828Z" },
-
]
-
-
[[package]]
name = "thicket"
source = { editable = "." }
dependencies = [
{ name = "bleach" },
{ name = "email-validator" },
{ name = "feedparser" },
-
{ name = "flask" },
{ name = "gitpython" },
{ name = "httpx" },
+
{ name = "jinja2" },
{ name = "pendulum" },
{ name = "platformdirs" },
{ name = "pydantic" },
{ name = "pydantic-settings" },
{ name = "pyyaml" },
{ name = "rich" },
-
{ name = "textual" },
{ name = "typer" },
]
···
{ name = "bleach", specifier = ">=6.0.0" },
{ name = "email-validator" },
{ name = "feedparser", specifier = ">=6.0.11" },
-
{ name = "flask", specifier = ">=3.1.1" },
{ name = "gitpython", specifier = ">=3.1.40" },
{ name = "httpx", specifier = ">=0.28.0" },
+
{ name = "jinja2", specifier = ">=3.1.6" },
{ name = "mypy", marker = "extra == 'dev'", specifier = ">=1.13.0" },
{ name = "pendulum", specifier = ">=3.0.0" },
{ name = "platformdirs", specifier = ">=4.0.0" },
···
{ name = "pyyaml", specifier = ">=6.0.0" },
{ name = "rich", specifier = ">=13.0.0" },
{ name = "ruff", marker = "extra == 'dev'", specifier = ">=0.8.0" },
-
{ name = "textual", specifier = ">=4.0.0" },
{ name = "typer", specifier = ">=0.15.0" },
{ name = "types-pyyaml", marker = "extra == 'dev'", specifier = ">=6.0.0" },
···
[[package]]
-
name = "uc-micro-py"
-
version = "1.0.3"
-
source = { registry = "https://pypi.org/simple" }
-
sdist = { url = "https://files.pythonhosted.org/packages/91/7a/146a99696aee0609e3712f2b44c6274566bc368dfe8375191278045186b8/uc-micro-py-1.0.3.tar.gz", hash = "sha256:d321b92cff673ec58027c04015fcaa8bb1e005478643ff4a500882eaab88c48a", size = 6043, upload-time = "2024-02-09T16:52:01.654Z" }
-
wheels = [
-
{ url = "https://files.pythonhosted.org/packages/37/87/1f677586e8ac487e29672e4b17455758fce261de06a0d086167bb760361a/uc_micro_py-1.0.3-py3-none-any.whl", hash = "sha256:db1dffff340817673d7b466ec86114a9dc0e9d4d9b5ba229d9d60e5c12600cd5", size = 6229, upload-time = "2024-02-09T16:52:00.371Z" },
-
]
-
-
[[package]]
name = "webencodings"
version = "0.5.1"
source = { registry = "https://pypi.org/simple" }
···
wheels = [
{ url = "https://files.pythonhosted.org/packages/f4/24/2a3e3df732393fed8b3ebf2ec078f05546de641fe1b667ee316ec1dcf3b7/webencodings-0.5.1-py2.py3-none-any.whl", hash = "sha256:a0af1213f3c2226497a97e2b3aa01a7e4bee4f403f95be16fc9acd2947514a78", size = 11774, upload-time = "2017-04-05T20:21:32.581Z" },
-
-
[[package]]
-
name = "werkzeug"
-
version = "3.1.3"
-
source = { registry = "https://pypi.org/simple" }
-
dependencies = [
-
{ name = "markupsafe" },
-
]
-
sdist = { url = "https://files.pythonhosted.org/packages/9f/69/83029f1f6300c5fb2471d621ab06f6ec6b3324685a2ce0f9777fd4a8b71e/werkzeug-3.1.3.tar.gz", hash = "sha256:60723ce945c19328679790e3282cc758aa4a6040e4bb330f53d30fa546d44746", size = 806925, upload-time = "2024-11-08T15:52:18.093Z" }
-
wheels = [
-
{ url = "https://files.pythonhosted.org/packages/52/24/ab44c871b0f07f491e5d2ad12c9bd7358e527510618cb1b803a88e986db1/werkzeug-3.1.3-py3-none-any.whl", hash = "sha256:54b78bf3716d19a65be4fceccc0d1d7b89e608834989dfae50ea87564639213e", size = 224498, upload-time = "2024-11-08T15:52:16.132Z" },
-
]
-
-
[[package]]
-
name = "zipp"
-
version = "3.23.0"
-
source = { registry = "https://pypi.org/simple" }
-
sdist = { url = "https://files.pythonhosted.org/packages/e3/02/0f2892c661036d50ede074e376733dca2ae7c6eb617489437771209d4180/zipp-3.23.0.tar.gz", hash = "sha256:a07157588a12518c9d4034df3fbbee09c814741a33ff63c05fa29d26a2404166", size = 25547, upload-time = "2025-06-08T17:06:39.4Z" }
-
wheels = [
-
{ url = "https://files.pythonhosted.org/packages/2e/54/647ade08bf0db230bfea292f893923872fd20be6ac6f53b2b936ba839d75/zipp-3.23.0-py3-none-any.whl", hash = "sha256:071652d6115ed432f5ce1d34c336c0adfd6a884660d1e9712a256d3d3bd4b14e", size = 10276, upload-time = "2025-06-08T17:06:38.034Z" },
-
]