Manage Atom feeds in a persistent git repository


code_duplication_analysis.md (−260 lines, file deleted)
···
# Code Duplication Analysis for Thicket

## 1. Duplicate JSON Handling Code

### Pattern: JSON file reading/writing

**Locations:**

- `src/thicket/cli/commands/generate.py:230` - Reading JSON with `json.load(f)`
- `src/thicket/cli/commands/generate.py:249` - Reading links.json
- `src/thicket/cli/commands/index.py:2305` - Reading JSON
- `src/thicket/cli/commands/index.py:2320` - Writing JSON with `json.dump()`
- `src/thicket/cli/commands/threads.py:2456` - Reading JSON
- `src/thicket/cli/commands/info.py:2683` - Reading JSON
- `src/thicket/core/git_store.py:5546` - Writing JSON with custom serializer
- `src/thicket/core/git_store.py:5556` - Reading JSON
- `src/thicket/core/git_store.py:5566` - Writing JSON
- `src/thicket/core/git_store.py:5656` - Writing JSON with model dump

**Recommendation:** Create a shared `json_utils.py` module:

```python
import json
from pathlib import Path

from pydantic import BaseModel


def read_json_file(path: Path) -> dict:
    """Read and parse a JSON file."""
    with open(path) as f:
        return json.load(f)


def write_json_file(path: Path, data: dict, indent: int = 2) -> None:
    """Write a JSON file with consistent formatting."""
    with open(path, "w") as f:
        json.dump(data, f, indent=indent, default=str)


def write_model_json(path: Path, model: BaseModel, indent: int = 2) -> None:
    """Write a Pydantic model as JSON."""
    with open(path, "w") as f:
        json.dump(
            model.model_dump(mode="json", exclude_none=True),
            f,
            indent=indent,
            default=str,
        )
```

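Call sites then shrink to one line each. A sketch of the `links.json` handling from `generate.py`, assuming the module lands under `src/thicket/utils/` (the import path is an assumption, not existing code):

```python
from pathlib import Path

# Hypothetical home for the shared helpers proposed above.
from thicket.utils.json_utils import read_json_file, write_json_file

links_file = Path("links.json")
links_data = read_json_file(links_file)   # replaces open() + json.load()
write_json_file(links_file, links_data)   # replaces open("w") + json.dump()
```
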
## 2. Repeated Datetime Handling

### Pattern: datetime formatting and fallback handling

**Locations:**

- `src/thicket/cli/commands/generate.py:241` - `key=lambda x: x[1].updated or x[1].published or datetime.min`
- `src/thicket/cli/commands/generate.py:353` - Same pattern in thread sorting
- `src/thicket/cli/commands/generate.py:359` - Same pattern for max date
- `src/thicket/cli/commands/generate.py:625` - Same pattern
- `src/thicket/cli/commands/generate.py:655` - `entry.updated or entry.published or datetime.min`
- `src/thicket/cli/commands/generate.py:689` - Same pattern
- `src/thicket/cli/commands/generate.py:702` - Same pattern
- Multiple `.strftime('%Y-%m-%d')` calls throughout

**Recommendation:** Create a shared `datetime_utils.py` module:

```python
from datetime import datetime

from thicket.models.feed import AtomEntry


def get_entry_date(entry: AtomEntry) -> datetime:
    """Get the most relevant date for an entry, with a fallback."""
    return entry.updated or entry.published or datetime.min


def format_date_short(dt: datetime) -> str:
    """Format a datetime as YYYY-MM-DD."""
    return dt.strftime('%Y-%m-%d')


def format_date_full(dt: datetime) -> str:
    """Format a datetime as YYYY-MM-DD HH:MM."""
    return dt.strftime('%Y-%m-%d %H:%M')


def format_date_iso(dt: datetime) -> str:
    """Format a datetime as an ISO 8601 string."""
    return dt.isoformat()
```

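The entry sort in `load_data` then reads as below. One caveat worth noting: if feeds ever carry timezone-aware datetimes, comparing against the naive `datetime.min` sentinel raises `TypeError`, so `datetime.min.replace(tzinfo=timezone.utc)` may be the safer fallback.

```python
from thicket.utils.datetime_utils import get_entry_date  # hypothetical module path

# entries is a list of (username, AtomEntry) tuples, as in generate.py;
# sort newest-first, preferring updated over published.
entries.sort(key=lambda x: get_entry_date(x[1]), reverse=True)
```
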
## 3. Path Handling Patterns

### Pattern: Directory creation and existence checks

**Locations:**

- `src/thicket/cli/commands/generate.py:225` - `if user_dir.exists()`
- `src/thicket/cli/commands/generate.py:247` - `if links_file.exists()`
- `src/thicket/cli/commands/generate.py:582` - `self.output_dir.mkdir(parents=True, exist_ok=True)`
- `src/thicket/cli/commands/generate.py:585-586` - Multiple mkdir calls
- `src/thicket/cli/commands/threads.py:2449` - `if not index_path.exists()`
- `src/thicket/cli/commands/info.py:2681` - `if links_path.exists()`
- `src/thicket/core/git_store.py:5515` - `if not self.repo_path.exists()`
- `src/thicket/core/git_store.py:5586` - `user_dir.mkdir(exist_ok=True)`
- Many more similar patterns

**Recommendation:** Create a shared `path_utils.py` module:

```python
import json
from pathlib import Path
from typing import Any, Union


def ensure_directory(path: Path) -> Path:
    """Ensure a directory exists, creating it if necessary."""
    path.mkdir(parents=True, exist_ok=True)
    return path


def read_json_if_exists(path: Path, default: Any = None) -> Any:
    """Read a JSON file if it exists, otherwise return the default."""
    if path.exists():
        with open(path) as f:
            return json.load(f)
    return default


def safe_path_join(*parts: Union[str, Path]) -> Path:
    """Join path components in one place.

    Note: this centralizes joining but does not sanitize components
    (it will not reject ".." traversal, for example).
    """
    return Path(*parts)
```

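The exists-then-read pattern collapses to a single default-returning call; a sketch against the `links.json` lookup in `info.py`:

```python
from thicket.utils.path_utils import read_json_if_exists  # hypothetical module path

# Before:
#     if links_path.exists():
#         with open(links_path) as f:
#             links_data = json.load(f)
#     else:
#         links_data = {}
# After:
links_data = read_json_if_exists(links_path, default={})
```
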
## 4. Progress Bar and Console Output

### Pattern: Progress bar creation and updates

**Locations:**

- `src/thicket/cli/commands/generate.py:209` - Progress with SpinnerColumn
- `src/thicket/cli/commands/index.py:2230` - Same Progress pattern
- Multiple `console.print()` calls with similar formatting patterns
- Progress update patterns repeated

**Recommendation:** Create a shared `ui_utils.py` module:

```python
from rich.console import Console
from rich.progress import Progress, SpinnerColumn, TaskID, TextColumn

console = Console()


def create_progress_spinner(description: str) -> tuple[Progress, TaskID]:
    """Create a standard progress spinner."""
    progress = Progress(
        SpinnerColumn(),
        TextColumn("[progress.description]{task.description}"),
        console=console,
        transient=True,
    )
    task = progress.add_task(description)
    return progress, task


def print_success(message: str) -> None:
    """Print a success message with consistent formatting."""
    console.print(f"[green]✓[/green] {message}")


def print_error(message: str) -> None:
    """Print an error message with consistent formatting."""
    console.print(f"[red]Error: {message}[/red]")


def print_warning(message: str) -> None:
    """Print a warning message with consistent formatting."""
    console.print(f"[yellow]Warning: {message}[/yellow]")
```

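Since rich's `Progress` is a context manager, call sites keep their `with` shape; a sketch assuming the helpers above (`load_index` is a hypothetical stand-in for the real work):

```python
progress, task = create_progress_spinner("Loading repository index...")
with progress:                        # starts and stops the spinner
    index = load_index()
    progress.update(task, completed=True)
print_success("Index loaded")
```
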
## 5. Git Store Operations

### Pattern: Entry file operations

**Locations:**

- Multiple patterns of loading entries from user directories
- Repeated safe_id generation
- Repeated user directory path construction

**Recommendation:** Enhance GitStore with helper methods:

```python
# Helper methods to add to the GitStore class (imports as in git_store.py):

def get_user_dir(self, username: str) -> Path:
    """Get the user directory path."""
    return self.repo_path / username


def iter_user_entries(self, username: str) -> Iterator[tuple[Path, AtomEntry]]:
    """Iterate over all entries for a user, skipping unparseable files."""
    user_dir = self.get_user_dir(username)
    if user_dir.exists():
        for entry_file in user_dir.glob("*.json"):
            if entry_file.name not in ["index.json", "duplicates.json"]:
                try:
                    entry = self.read_entry_file(entry_file)
                    yield entry_file, entry
                except Exception:
                    continue
```

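With that in place, the per-user loading loop in `generate.py` reduces to a few lines. A sketch (`GitStore(path)` matches how `generate.py` constructs the store; `read_entry_file` is the helper named in the method above, not yet existing code):

```python
store = GitStore(Path("./feeds-repo"))
for entry_file, entry in store.iter_user_entries("avsm"):
    print(entry_file.name, entry.title)
```
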
## 6. Error Handling Patterns

### Pattern: Try-except with console error printing

**Locations:**

- Similar error handling patterns throughout CLI commands
- Repeated `raise typer.Exit(1)` patterns
- Similar exception message formatting

**Recommendation:** Create error handling decorators:

```python
import functools

import typer
from pydantic import ValidationError

from thicket.cli.utils import console


def handle_cli_errors(func):
    """Decorator to handle CLI command errors consistently."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except typer.Exit:
            # Let deliberate exits pass through: typer.Exit is an Exception
            # subclass, so the bare handler below would otherwise swallow it.
            raise
        except ValidationError as e:
            console.print(f"[red]Validation error: {e}[/red]")
            raise typer.Exit(1) from e
        except Exception as e:
            console.print(f"[red]Error: {e}[/red]")
            if kwargs.get("verbose"):
                console.print_exception()
            raise typer.Exit(1) from e

    return wrapper
```

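The decorator goes beneath `@app.command()` so Typer registers the wrapped function; `functools.wraps` preserves the signature metadata Typer needs to build the CLI. A sketch (`sync` stands in for any existing command):

```python
@app.command()
@handle_cli_errors
def sync(verbose: bool = typer.Option(False, "--verbose")) -> None:
    """Sync all configured feeds."""
    ...
```
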
## 7. Configuration and Validation

### Pattern: Config file loading and validation

**Locations:**

- Repeated config loading pattern in every CLI command
- Similar validation patterns for URLs and paths

**Recommendation:** Create a `config_utils.py` module:

```python
from pathlib import Path
from typing import Optional

from pydantic import HttpUrl, ValidationError

from thicket.cli.utils import load_config
from thicket.models.config import ThicketConfig


class ConfigError(Exception):
    """Raised when configuration is missing or invalid."""


def load_config_with_defaults(config_path: Optional[Path] = None) -> ThicketConfig:
    """Load config with standard defaults and error handling."""
    if config_path is None:
        config_path = Path("thicket.yaml")

    if not config_path.exists():
        raise ConfigError(f"Configuration file not found: {config_path}")

    return load_config(config_path)


def validate_url(url: str) -> HttpUrl:
    """Validate and return a URL with consistent error handling."""
    try:
        return HttpUrl(url)
    except ValidationError as e:
        raise ConfigError(f"Invalid URL: {url}") from e
```

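A command body then needs only (a sketch, assuming the module above):

```python
config = load_config_with_defaults()   # falls back to ./thicket.yaml
feed_url = validate_url("https://example.org/atom.xml")
```
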
## 8. Model Serialization

### Pattern: Pydantic model JSON encoding

**Locations:**

- Repeated `json_encoders={datetime: lambda v: v.isoformat()}` in model configs
- Similar model_dump patterns

**Recommendation:** Create a base model class:

```python
from datetime import datetime

from pydantic import BaseModel, ConfigDict


class ThicketBaseModel(BaseModel):
    """Base model with common configuration."""

    # Note: json_encoders is deprecated in Pydantic v2, and
    # model_dump(mode="json") already renders datetimes as ISO strings;
    # it is kept here only for any remaining .json() call sites.
    model_config = ConfigDict(
        json_encoders={datetime: lambda v: v.isoformat()},
        str_strip_whitespace=True,
    )

    def to_json_dict(self) -> dict:
        """Convert to a JSON-serializable dict."""
        return self.model_dump(mode="json", exclude_none=True)
```

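Models then opt in by inheritance; a sketch with a trimmed-down entry model (the fields are illustrative, not the real `AtomEntry`):

```python
from typing import Optional


class AtomEntrySketch(ThicketBaseModel):
    id: str
    title: str
    updated: Optional[datetime] = None


entry = AtomEntrySketch(id="urn:x", title="  Hello  ")
print(entry.title)           # "Hello" - str_strip_whitespace applied
print(entry.to_json_dict())  # {'id': 'urn:x', 'title': 'Hello'} - None excluded
```
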
## Summary of Refactoring Benefits

1. **Reduced Code Duplication**: Eliminate an estimated 30-40% of duplicated code
2. **Consistent Error Handling**: Standardize error messages and handling
3. **Easier Maintenance**: One central location for each common pattern
4. **Better Testing**: Shared utilities are easier to unit test
5. **Type Safety**: Shared type hints and validation
6. **Performance**: Common operations can be optimized in one place

## Implementation Priority

1. **High Priority**:
   - JSON utilities (used everywhere)
   - Datetime utilities (critical for sorting and display)
   - Error handling decorators (improves UX consistency)

2. **Medium Priority**:
   - Path utilities
   - UI/Console utilities
   - Config utilities

3. **Low Priority**:
   - Base model classes (requires more refactoring)
   - Git store enhancements (already well-structured)
···
pyproject.toml (+5 −5)
···
     "platformdirs>=4.0.0",
     "pyyaml>=6.0.0",
     "email_validator",
-    "jinja2>=3.1.6",
 ]
 [project.optional-dependencies]
···
     "-ra",
     "--strict-markers",
     "--strict-config",
-    "--cov=src/thicket",
-    "--cov-report=term-missing",
-    "--cov-report=html",
-    "--cov-report=xml",
 ]
 filterwarnings = [
     "error",
···
     "class .*\\bProtocol\\):",
     "@(abc\\.)?abstractmethod",
 ]
+
+[dependency-groups]
+dev = [
+    "pytest>=8.4.1",
+]
-6617
repomix-output.xml
···
-
This file is a merged representation of the entire codebase, combined into a single document by Repomix.
-
-
<file_summary>
-
This section contains a summary of this file.
-
-
<purpose>
-
This file contains a packed representation of the entire repository's contents.
-
It is designed to be easily consumable by AI systems for analysis, code review,
-
or other automated processes.
-
</purpose>
-
-
<file_format>
-
The content is organized as follows:
-
1. This summary section
-
2. Repository information
-
3. Directory structure
-
4. Repository files (if enabled)
-
5. Multiple file entries, each consisting of:
-
- File path as an attribute
-
- Full contents of the file
-
</file_format>
-
-
<usage_guidelines>
-
- This file should be treated as read-only. Any changes should be made to the
-
original repository files, not this packed version.
-
- When processing this file, use the file path to distinguish
-
between different files in the repository.
-
- Be aware that this file may contain sensitive information. Handle it with
-
the same level of security as you would the original repository.
-
</usage_guidelines>
-
-
<notes>
-
- Some files may have been excluded based on .gitignore rules and Repomix's configuration
-
- Binary files are not included in this packed representation. Please refer to the Repository Structure section for a complete list of file paths, including binary files
-
- Files matching patterns in .gitignore are excluded
-
- Files matching default ignore patterns are excluded
-
- Files are sorted by Git change count (files with more changes are at the bottom)
-
</notes>
-
-
</file_summary>
-
-
<directory_structure>
-
.claude/
-
settings.local.json
-
src/
-
thicket/
-
cli/
-
commands/
-
__init__.py
-
add.py
-
duplicates.py
-
generate.py
-
index_cmd.py
-
info_cmd.py
-
init.py
-
links_cmd.py
-
list_cmd.py
-
sync.py
-
__init__.py
-
main.py
-
utils.py
-
core/
-
__init__.py
-
feed_parser.py
-
git_store.py
-
reference_parser.py
-
models/
-
__init__.py
-
config.py
-
feed.py
-
user.py
-
templates/
-
base.html
-
index.html
-
links.html
-
script.js
-
style.css
-
timeline.html
-
users.html
-
utils/
-
__init__.py
-
__init__.py
-
__main__.py
-
.gitignore
-
ARCH.md
-
CLAUDE.md
-
pyproject.toml
-
README.md
-
</directory_structure>
-
-
<files>
-
This section contains the contents of the repository's files.
-
-
<file path=".claude/settings.local.json">
-
{
-
"permissions": {
-
"allow": [
-
"Bash(find:*)",
-
"Bash(uv run:*)",
-
"Bash(grep:*)",
-
"Bash(jq:*)",
-
"Bash(git add:*)",
-
"Bash(ls:*)"
-
]
-
},
-
"enableAllProjectMcpServers": false
-
}
-
</file>
-
-
<file path="src/thicket/cli/commands/generate.py">
-
"""Generate static HTML website from thicket data."""
-
-
import base64
-
import json
-
import re
-
import shutil
-
from datetime import datetime
-
from pathlib import Path
-
from typing import Any, Optional, TypedDict, Union
-
-
import typer
-
from jinja2 import Environment, FileSystemLoader, select_autoescape
-
from rich.progress import Progress, SpinnerColumn, TextColumn
-
-
from ...core.git_store import GitStore
-
from ...models.feed import AtomEntry
-
from ...models.user import GitStoreIndex, UserMetadata
-
from ..main import app
-
from ..utils import console, load_config
-
-
-
class UserData(TypedDict):
-
"""Type definition for user data structure."""
-
-
metadata: UserMetadata
-
recent_entries: list[tuple[str, AtomEntry]]
-
-
-
def safe_anchor_id(atom_id: str) -> str:
-
"""Convert an Atom ID to a safe HTML anchor ID."""
-
# Use base64 URL-safe encoding without padding
-
encoded = base64.urlsafe_b64encode(atom_id.encode('utf-8')).decode('ascii').rstrip('=')
-
# Prefix with 'id' to ensure it starts with a letter (HTML requirement)
-
return f"id{encoded}"
-
-
-
class WebsiteGenerator:
-
"""Generate static HTML website from thicket data."""
-
-
def __init__(self, git_store: GitStore, output_dir: Path):
-
self.git_store = git_store
-
self.output_dir = output_dir
-
self.template_dir = Path(__file__).parent.parent.parent / "templates"
-
-
# Initialize Jinja2 environment
-
self.env = Environment(
-
loader=FileSystemLoader(self.template_dir),
-
autoescape=select_autoescape(["html", "xml"]),
-
)
-
-
# Data containers
-
self.index: Optional[GitStoreIndex] = None
-
self.entries: list[tuple[str, AtomEntry]] = [] # (username, entry)
-
self.links_data: Optional[dict[str, Any]] = None
-
self.threads: list[list[dict[str, Any]]] = [] # List of threads with metadata
-
-
def get_display_name(self, username: str) -> str:
-
"""Get display name for a user, falling back to username."""
-
if self.index and username in self.index.users:
-
user = self.index.users[username]
-
return user.display_name or username
-
return username
-
-
def get_user_homepage(self, username: str) -> Optional[str]:
-
"""Get homepage URL for a user."""
-
if self.index and username in self.index.users:
-
user = self.index.users[username]
-
return str(user.homepage) if user.homepage else None
-
return None
-
-
def clean_html_summary(self, content: Optional[str], max_length: int = 200) -> str:
-
"""Clean HTML content and truncate for display in timeline."""
-
if not content:
-
return ""
-
-
# Remove HTML tags
-
clean_text = re.sub(r"<[^>]+>", " ", content)
-
# Replace multiple whitespace with single space
-
clean_text = re.sub(r"\s+", " ", clean_text)
-
# Strip leading/trailing whitespace
-
clean_text = clean_text.strip()
-
-
# Truncate with ellipsis if needed
-
if len(clean_text) > max_length:
-
# Try to break at word boundary
-
truncated = clean_text[:max_length]
-
last_space = truncated.rfind(" ")
-
if (
-
last_space > max_length * 0.8
-
): # If we can break reasonably close to the limit
-
clean_text = truncated[:last_space] + "..."
-
else:
-
clean_text = truncated + "..."
-
-
return clean_text
-
-
def load_data(self) -> None:
-
"""Load all data from the git repository."""
-
with Progress(
-
SpinnerColumn(),
-
TextColumn("[progress.description]{task.description}"),
-
console=console,
-
) as progress:
-
# Load index
-
task = progress.add_task("Loading repository index...", total=None)
-
self.index = self.git_store._load_index()
-
if not self.index:
-
raise ValueError("No index found in repository")
-
progress.update(task, completed=True)
-
-
# Load all entries
-
task = progress.add_task("Loading entries...", total=None)
-
for username, user_metadata in self.index.users.items():
-
user_dir = self.git_store.repo_path / user_metadata.directory
-
if user_dir.exists():
-
for entry_file in user_dir.glob("*.json"):
-
if entry_file.name not in ["index.json", "duplicates.json"]:
-
try:
-
with open(entry_file) as f:
-
entry_data = json.load(f)
-
entry = AtomEntry(**entry_data)
-
self.entries.append((username, entry))
-
except Exception as e:
-
console.print(
-
f"[yellow]Warning: Failed to load {entry_file}: {e}[/yellow]"
-
)
-
progress.update(task, completed=True)
-
-
# Sort entries by date (newest first) - prioritize updated over published
-
self.entries.sort(
-
key=lambda x: x[1].updated or x[1].published or datetime.min, reverse=True
-
)
-
-
# Load links data
-
task = progress.add_task("Loading links and references...", total=None)
-
links_file = self.git_store.repo_path / "links.json"
-
if links_file.exists():
-
with open(links_file) as f:
-
self.links_data = json.load(f)
-
progress.update(task, completed=True)
-
-
def build_threads(self) -> None:
-
"""Build threaded conversations from references."""
-
if not self.links_data or "references" not in self.links_data:
-
return
-
-
# Map entry IDs to (username, entry) tuples
-
entry_map: dict[str, tuple[str, AtomEntry]] = {}
-
for username, entry in self.entries:
-
entry_map[entry.id] = (username, entry)
-
-
# Build adjacency lists for references
-
self.outbound_refs: dict[str, set[str]] = {}
-
self.inbound_refs: dict[str, set[str]] = {}
-
self.reference_details: dict[
-
str, list[dict[str, Any]]
-
] = {} # Store full reference info
-
-
for ref in self.links_data["references"]:
-
source_id = ref["source_entry_id"]
-
target_id = ref.get("target_entry_id")
-
-
if target_id and source_id in entry_map and target_id in entry_map:
-
self.outbound_refs.setdefault(source_id, set()).add(target_id)
-
self.inbound_refs.setdefault(target_id, set()).add(source_id)
-
-
# Store reference details for UI
-
self.reference_details.setdefault(source_id, []).append(
-
{
-
"target_id": target_id,
-
"target_username": ref.get("target_username"),
-
"type": "outbound",
-
}
-
)
-
self.reference_details.setdefault(target_id, []).append(
-
{
-
"source_id": source_id,
-
"source_username": ref.get("source_username"),
-
"type": "inbound",
-
}
-
)
-
-
# Find conversation threads (multi-post discussions)
-
processed = set()
-
-
for entry_id, (_username, _entry) in entry_map.items():
-
if entry_id in processed:
-
continue
-
-
# Build thread starting from this entry
-
thread = []
-
to_visit = [entry_id]
-
thread_ids = set()
-
level_map: dict[str, int] = {} # Track levels for this thread
-
-
# First, traverse up to find the root
-
current = entry_id
-
while current in self.inbound_refs:
-
parents = self.inbound_refs[current] - {
-
current
-
} # Exclude self-references
-
if not parents:
-
break
-
# Take the first parent
-
parent = next(iter(parents))
-
if parent in thread_ids: # Avoid cycles
-
break
-
current = parent
-
to_visit.insert(0, current)
-
-
# Now traverse down from the root
-
while to_visit:
-
current = to_visit.pop(0)
-
if current in thread_ids or current not in entry_map:
-
continue
-
-
thread_ids.add(current)
-
username, entry = entry_map[current]
-
-
# Calculate thread level
-
thread_level = self._calculate_thread_level(current, level_map)
-
-
# Add threading metadata
-
thread_entry = {
-
"username": username,
-
"display_name": self.get_display_name(username),
-
"entry": entry,
-
"entry_id": current,
-
"references_to": list(self.outbound_refs.get(current, [])),
-
"referenced_by": list(self.inbound_refs.get(current, [])),
-
"thread_level": thread_level,
-
}
-
thread.append(thread_entry)
-
processed.add(current)
-
-
# Add children
-
if current in self.outbound_refs:
-
children = self.outbound_refs[current] - thread_ids # Avoid cycles
-
to_visit.extend(sorted(children))
-
-
if len(thread) > 1: # Only keep actual threads
-
# Sort thread by date (newest first) - prioritize updated over published
-
thread.sort(key=lambda x: x["entry"].updated or x["entry"].published or datetime.min, reverse=True) # type: ignore
-
self.threads.append(thread)
-
-
# Sort threads by the date of their most recent entry - prioritize updated over published
-
self.threads.sort(
-
key=lambda t: max(
-
item["entry"].updated or item["entry"].published or datetime.min for item in t
-
),
-
reverse=True,
-
)
-
-
def _calculate_thread_level(
-
self, entry_id: str, processed_entries: dict[str, int]
-
) -> int:
-
"""Calculate indentation level for threaded display."""
-
if entry_id in processed_entries:
-
return processed_entries[entry_id]
-
-
if entry_id not in self.inbound_refs:
-
processed_entries[entry_id] = 0
-
return 0
-
-
parents_in_thread = self.inbound_refs[entry_id] & set(processed_entries.keys())
-
if not parents_in_thread:
-
processed_entries[entry_id] = 0
-
return 0
-
-
# Find the deepest parent level + 1
-
max_parent_level = 0
-
for parent_id in parents_in_thread:
-
parent_level = self._calculate_thread_level(parent_id, processed_entries)
-
max_parent_level = max(max_parent_level, parent_level)
-
-
level = min(max_parent_level + 1, 4) # Cap at level 4
-
processed_entries[entry_id] = level
-
return level
-
-
def get_standalone_references(self) -> list[dict[str, Any]]:
-
"""Get posts that have references but aren't part of multi-post threads."""
-
if not hasattr(self, "reference_details"):
-
return []
-
-
threaded_entry_ids = set()
-
for thread in self.threads:
-
for item in thread:
-
threaded_entry_ids.add(item["entry_id"])
-
-
standalone_refs = []
-
for username, entry in self.entries:
-
if (
-
entry.id in self.reference_details
-
and entry.id not in threaded_entry_ids
-
):
-
refs = self.reference_details[entry.id]
-
# Only include if it has meaningful references (not just self-references)
-
meaningful_refs = [
-
r
-
for r in refs
-
if r.get("target_id") != entry.id and r.get("source_id") != entry.id
-
]
-
if meaningful_refs:
-
standalone_refs.append(
-
{
-
"username": username,
-
"display_name": self.get_display_name(username),
-
"entry": entry,
-
"references": meaningful_refs,
-
}
-
)
-
-
return standalone_refs
-
-
def _add_cross_thread_links(self, timeline_items: list[dict[str, Any]]) -> None:
-
"""Add cross-thread linking for entries that appear in multiple threads."""
-
# Map entry IDs to their positions in the timeline
-
entry_positions: dict[str, list[int]] = {}
-
# Map URLs referenced by entries to the entries that reference them
-
url_references: dict[str, list[tuple[str, int]]] = {} # url -> [(entry_id, position)]
-
-
# First pass: collect all entry IDs, their positions, and referenced URLs
-
for i, item in enumerate(timeline_items):
-
if item["type"] == "post":
-
entry_id = item["content"]["entry"].id
-
entry_positions.setdefault(entry_id, []).append(i)
-
# Track URLs this entry references
-
if entry_id in self.reference_details:
-
for ref in self.reference_details[entry_id]:
-
if ref["type"] == "outbound" and "target_id" in ref:
-
# Find the target entry's URL if available
-
target_entry = self._find_entry_by_id(ref["target_id"])
-
if target_entry and target_entry.link:
-
url = str(target_entry.link)
-
url_references.setdefault(url, []).append((entry_id, i))
-
elif item["type"] == "thread":
-
for thread_item in item["content"]:
-
entry_id = thread_item["entry"].id
-
entry_positions.setdefault(entry_id, []).append(i)
-
# Track URLs this entry references
-
if entry_id in self.reference_details:
-
for ref in self.reference_details[entry_id]:
-
if ref["type"] == "outbound" and "target_id" in ref:
-
target_entry = self._find_entry_by_id(ref["target_id"])
-
if target_entry and target_entry.link:
-
url = str(target_entry.link)
-
url_references.setdefault(url, []).append((entry_id, i))
-
-
# Build cross-thread connections - only for entries that actually appear multiple times
-
cross_thread_connections: dict[str, set[int]] = {} # entry_id -> set of timeline positions
-
-
# Add connections ONLY for entries that appear multiple times in the timeline
-
for entry_id, positions in entry_positions.items():
-
if len(positions) > 1:
-
cross_thread_connections[entry_id] = set(positions)
-
# Debug: uncomment to see which entries have multiple appearances
-
# print(f"Entry {entry_id[:50]}... appears at positions: {positions}")
-
-
# Apply cross-thread links to timeline items
-
for entry_id, positions_set in cross_thread_connections.items():
-
positions_list = list(positions_set)
-
for pos in positions_list:
-
item = timeline_items[pos]
-
other_positions = sorted([p for p in positions_list if p != pos])
-
-
if item["type"] == "post":
-
# Add cross-thread info to individual posts
-
item["content"]["cross_thread_links"] = self._build_cross_thread_link_data(entry_id, other_positions, timeline_items)
-
# Add info about shared references
-
item["content"]["shared_references"] = self._get_shared_references(entry_id, positions_set, timeline_items)
-
elif item["type"] == "thread":
-
# Add cross-thread info to thread items
-
for thread_item in item["content"]:
-
if thread_item["entry"].id == entry_id:
-
thread_item["cross_thread_links"] = self._build_cross_thread_link_data(entry_id, other_positions, timeline_items)
-
thread_item["shared_references"] = self._get_shared_references(entry_id, positions_set, timeline_items)
-
break
-
-
def _build_cross_thread_link_data(self, entry_id: str, other_positions: list[int], timeline_items: list[dict[str, Any]]) -> list[dict[str, Any]]:
-
"""Build detailed cross-thread link data with anchor information."""
-
cross_thread_links = []
-
-
for pos in other_positions:
-
item = timeline_items[pos]
-
if item["type"] == "post":
-
# For individual posts
-
safe_id = safe_anchor_id(entry_id)
-
cross_thread_links.append({
-
"position": pos,
-
"anchor_id": f"post-{pos}-{safe_id}",
-
"context": "individual post",
-
"title": item["content"]["entry"].title
-
})
-
elif item["type"] == "thread":
-
# For thread items, find the specific thread item
-
for thread_idx, thread_item in enumerate(item["content"]):
-
if thread_item["entry"].id == entry_id:
-
safe_id = safe_anchor_id(entry_id)
-
cross_thread_links.append({
-
"position": pos,
-
"anchor_id": f"post-{pos}-{thread_idx}-{safe_id}",
-
"context": f"thread (level {thread_item.get('thread_level', 0)})",
-
"title": thread_item["entry"].title
-
})
-
break
-
-
return cross_thread_links
-
-
def _find_entry_by_id(self, entry_id: str) -> Optional[AtomEntry]:
-
"""Find an entry by its ID."""
-
for _username, entry in self.entries:
-
if entry.id == entry_id:
-
return entry
-
return None
-
-
def _get_shared_references(self, entry_id: str, positions: Union[set[int], list[int]], timeline_items: list[dict[str, Any]]) -> list[dict[str, Any]]:
-
"""Get information about shared references between cross-thread entries."""
-
shared_refs = []
-
-
# Collect all referenced URLs from entries at these positions
-
url_counts: dict[str, int] = {}
-
referencing_entries: dict[str, list[str]] = {} # url -> [entry_ids]
-
-
for pos in positions:
-
item = timeline_items[pos]
-
entries_to_check = []
-
-
if item["type"] == "post":
-
entries_to_check.append(item["content"]["entry"])
-
elif item["type"] == "thread":
-
entries_to_check.extend([ti["entry"] for ti in item["content"]])
-
-
for entry in entries_to_check:
-
if entry.id in self.reference_details:
-
for ref in self.reference_details[entry.id]:
-
if ref["type"] == "outbound" and "target_id" in ref:
-
target_entry = self._find_entry_by_id(ref["target_id"])
-
if target_entry and target_entry.link:
-
url = str(target_entry.link)
-
url_counts[url] = url_counts.get(url, 0) + 1
-
if url not in referencing_entries:
-
referencing_entries[url] = []
-
if entry.id not in referencing_entries[url]:
-
referencing_entries[url].append(entry.id)
-
-
# Find URLs referenced by multiple entries
-
for url, count in url_counts.items():
-
if count > 1 and len(referencing_entries[url]) > 1:
-
# Get the target entry info
-
target_entry = None
-
target_username = None
-
for ref in (self.links_data or {}).get("references", []):
-
if ref.get("target_url") == url:
-
target_username = ref.get("target_username")
-
if ref.get("target_entry_id"):
-
target_entry = self._find_entry_by_id(ref["target_entry_id"])
-
break
-
-
shared_refs.append({
-
"url": url,
-
"count": count,
-
"referencing_entries": referencing_entries[url],
-
"target_username": target_username,
-
"target_title": target_entry.title if target_entry else None
-
})
-
-
return sorted(shared_refs, key=lambda x: x["count"], reverse=True)
-
-
def generate_site(self) -> None:
-
"""Generate the static website."""
-
# Create output directory
-
self.output_dir.mkdir(parents=True, exist_ok=True)
-
-
# Create static directories
-
(self.output_dir / "css").mkdir(exist_ok=True)
-
(self.output_dir / "js").mkdir(exist_ok=True)
-
-
# Generate CSS
-
css_template = self.env.get_template("style.css")
-
css_content = css_template.render()
-
with open(self.output_dir / "css" / "style.css", "w") as f:
-
f.write(css_content)
-
-
# Generate JavaScript
-
js_template = self.env.get_template("script.js")
-
js_content = js_template.render()
-
with open(self.output_dir / "js" / "script.js", "w") as f:
-
f.write(js_content)
-
-
# Prepare common template data
-
base_data = {
-
"title": "Energy & Environment Group",
-
"generated_at": datetime.now().isoformat(),
-
"get_display_name": self.get_display_name,
-
"get_user_homepage": self.get_user_homepage,
-
"clean_html_summary": self.clean_html_summary,
-
"safe_anchor_id": safe_anchor_id,
-
}
-
-
# Build unified timeline
-
timeline_items = []
-
-
# Only consider the threads that will actually be displayed
-
displayed_threads = self.threads[:20] # Limit to 20 threads
-
-
# Track which entries are part of displayed threads
-
threaded_entry_ids = set()
-
for thread in displayed_threads:
-
for item in thread:
-
threaded_entry_ids.add(item["entry_id"])
-
-
# Add threads to timeline (using the date of the most recent post)
-
for thread in displayed_threads:
-
most_recent_date = max(
-
item["entry"].updated or item["entry"].published or datetime.min
-
for item in thread
-
)
-
timeline_items.append({
-
"type": "thread",
-
"date": most_recent_date,
-
"content": thread
-
})
-
-
# Add individual posts (not in threads)
-
for username, entry in self.entries[:50]:
-
if entry.id not in threaded_entry_ids:
-
# Check if this entry has references
-
has_refs = (
-
entry.id in self.reference_details
-
if hasattr(self, "reference_details")
-
else False
-
)
-
-
refs = []
-
if has_refs:
-
refs = self.reference_details.get(entry.id, [])
-
refs = [
-
r for r in refs
-
if r.get("target_id") != entry.id
-
and r.get("source_id") != entry.id
-
]
-
-
timeline_items.append({
-
"type": "post",
-
"date": entry.updated or entry.published or datetime.min,
-
"content": {
-
"username": username,
-
"display_name": self.get_display_name(username),
-
"entry": entry,
-
"references": refs if refs else None
-
}
-
})
-
-
# Sort unified timeline by date (newest first)
-
timeline_items.sort(key=lambda x: x["date"], reverse=True)
-
-
# Limit timeline to what will actually be rendered
-
timeline_items = timeline_items[:50] # Limit to 50 items total
-
-
# Add cross-thread linking for repeat blog references
-
self._add_cross_thread_links(timeline_items)
-
-
# Prepare outgoing links data
-
outgoing_links = []
-
if self.links_data and "links" in self.links_data:
-
for url, link_info in self.links_data["links"].items():
-
referencing_entries = []
-
for entry_id in link_info.get("referencing_entries", []):
-
for username, entry in self.entries:
-
if entry.id == entry_id:
-
referencing_entries.append(
-
(self.get_display_name(username), entry)
-
)
-
break
-
-
if referencing_entries:
-
# Sort by date - prioritize updated over published
-
referencing_entries.sort(
-
key=lambda x: x[1].updated or x[1].published or datetime.min, reverse=True
-
)
-
outgoing_links.append(
-
{
-
"url": url,
-
"target_username": link_info.get("target_username"),
-
"entries": referencing_entries,
-
}
-
)
-
-
# Sort links by most recent reference - prioritize updated over published
-
outgoing_links.sort(
-
key=lambda x: x["entries"][0][1].updated
-
or x["entries"][0][1].published or datetime.min,
-
reverse=True,
-
)
-
-
# Prepare users data
-
users: list[UserData] = []
-
if self.index:
-
for username, user_metadata in self.index.users.items():
-
# Get recent entries for this user with display names
-
user_entries = [
-
(self.get_display_name(u), e)
-
for u, e in self.entries
-
if u == username
-
][:5]
-
users.append(
-
{"metadata": user_metadata, "recent_entries": user_entries}
-
)
-
# Sort by entry count
-
users.sort(key=lambda x: x["metadata"].entry_count, reverse=True)
-
-
# Generate timeline page
-
timeline_template = self.env.get_template("timeline.html")
-
timeline_content = timeline_template.render(
-
**base_data,
-
page="timeline",
-
timeline_items=timeline_items, # Already limited above
-
)
-
with open(self.output_dir / "timeline.html", "w") as f:
-
f.write(timeline_content)
-
-
# Generate links page
-
links_template = self.env.get_template("links.html")
-
links_content = links_template.render(
-
**base_data,
-
page="links",
-
outgoing_links=outgoing_links[:100],
-
)
-
with open(self.output_dir / "links.html", "w") as f:
-
f.write(links_content)
-
-
# Generate users page
-
users_template = self.env.get_template("users.html")
-
users_content = users_template.render(
-
**base_data,
-
page="users",
-
users=users,
-
)
-
with open(self.output_dir / "users.html", "w") as f:
-
f.write(users_content)
-
-
# Generate main index page (redirect to timeline)
-
index_template = self.env.get_template("index.html")
-
index_content = index_template.render(**base_data)
-
with open(self.output_dir / "index.html", "w") as f:
-
f.write(index_content)
-
-
console.print(f"[green]โœ“[/green] Generated website at {self.output_dir}")
-
console.print(f" - {len(self.entries)} entries")
-
console.print(f" - {len(self.threads)} conversation threads")
-
console.print(f" - {len(outgoing_links)} outgoing links")
-
console.print(f" - {len(users)} users")
-
console.print(
-
" - Generated pages: index.html, timeline.html, links.html, users.html"
-
)
-
-
-
@app.command()
-
def generate(
-
output: Path = typer.Option(
-
Path("./thicket-site"),
-
"--output",
-
"-o",
-
help="Output directory for the generated website",
-
),
-
force: bool = typer.Option(
-
False, "--force", "-f", help="Overwrite existing output directory"
-
),
-
config_file: Path = typer.Option(
-
Path("thicket.yaml"), "--config", help="Configuration file path"
-
),
-
) -> None:
-
"""Generate a static HTML website from thicket data."""
-
config = load_config(config_file)
-
-
if not config.git_store:
-
console.print("[red]No git store path configured[/red]")
-
raise typer.Exit(1)
-
-
git_store = GitStore(config.git_store)
-
-
# Check if output directory exists
-
if output.exists() and not force:
-
console.print(
-
f"[red]Output directory {output} already exists. Use --force to overwrite.[/red]"
-
)
-
raise typer.Exit(1)
-
-
# Clean output directory if forcing
-
if output.exists() and force:
-
shutil.rmtree(output)
-
-
try:
-
generator = WebsiteGenerator(git_store, output)
-
-
console.print("[bold]Generating static website...[/bold]")
-
generator.load_data()
-
generator.build_threads()
-
generator.generate_site()
-
-
except Exception as e:
-
console.print(f"[red]Error generating website: {e}[/red]")
-
raise typer.Exit(1) from e
-
</file>
-
-
<file path="src/thicket/templates/base.html">
-
<!DOCTYPE html>
-
<html lang="en">
-
<head>
-
<meta charset="UTF-8">
-
<meta name="viewport" content="width=device-width, initial-scale=1.0">
-
<title>{% block page_title %}{{ title }}{% endblock %}</title>
-
<link rel="stylesheet" href="css/style.css">
-
</head>
-
<body>
-
<header class="site-header">
-
<div class="header-content">
-
<h1 class="site-title">{{ title }}</h1>
-
<nav class="site-nav">
-
<a href="timeline.html" class="nav-link {% if page == 'timeline' %}active{% endif %}">Timeline</a>
-
<a href="links.html" class="nav-link {% if page == 'links' %}active{% endif %}">Links</a>
-
<a href="users.html" class="nav-link {% if page == 'users' %}active{% endif %}">Users</a>
-
</nav>
-
</div>
-
</header>
-
-
<main class="main-content">
-
{% block content %}{% endblock %}
-
</main>
-
-
<footer class="site-footer">
-
<p>Generated on {{ generated_at }} by <a href="https://github.com/avsm/thicket">Thicket</a></p>
-
</footer>
-
-
<script src="js/script.js"></script>
-
</body>
-
</html>
-
</file>
-
-
<file path="src/thicket/templates/index.html">
-
<!DOCTYPE html>
-
<html lang="en">
-
<head>
-
<meta charset="UTF-8">
-
<meta name="viewport" content="width=device-width, initial-scale=1.0">
-
<title>{{ title }}</title>
-
<meta http-equiv="refresh" content="0; url=timeline.html">
-
<link rel="canonical" href="timeline.html">
-
</head>
-
<body>
-
<p>Redirecting to <a href="timeline.html">Timeline</a>...</p>
-
</body>
-
</html>
-
</file>
-
-
<file path="src/thicket/templates/links.html">
-
{% extends "base.html" %}
-
-
{% block page_title %}Outgoing Links - {{ title }}{% endblock %}
-
-
{% block content %}
-
<div class="page-content">
-
<h2>Outgoing Links</h2>
-
<p class="page-description">External links referenced in blog posts, ordered by most recent reference.</p>
-
-
{% for link in outgoing_links %}
-
<article class="link-group">
-
<h3 class="link-url">
-
<a href="{{ link.url }}" target="_blank">{{ link.url|truncate(80) }}</a>
-
{% if link.target_username %}
-
<span class="target-user">({{ link.target_username }})</span>
-
{% endif %}
-
</h3>
-
<div class="referencing-entries">
-
<span class="ref-count">Referenced in {{ link.entries|length }} post(s):</span>
-
<ul>
-
{% for display_name, entry in link.entries[:5] %}
-
<li>
-
<span class="author">{{ display_name }}</span> -
-
<a href="{{ entry.link }}" target="_blank">{{ entry.title }}</a>
-
<time datetime="{{ entry.updated or entry.published }}">
-
({{ (entry.updated or entry.published).strftime('%Y-%m-%d') }})
-
</time>
-
</li>
-
{% endfor %}
-
{% if link.entries|length > 5 %}
-
<li class="more">... and {{ link.entries|length - 5 }} more</li>
-
{% endif %}
-
</ul>
-
</div>
-
</article>
-
{% endfor %}
-
</div>
-
{% endblock %}
-
</file>
-
-
<file path="src/thicket/templates/script.js">
-
// Enhanced functionality for thicket website
-
document.addEventListener('DOMContentLoaded', function() {
-
-
// Enhance thread collapsing (optional feature)
-
const threadHeaders = document.querySelectorAll('.thread-header');
-
threadHeaders.forEach(header => {
-
header.style.cursor = 'pointer';
-
header.addEventListener('click', function() {
-
const thread = this.parentElement;
-
const entries = thread.querySelectorAll('.thread-entry');
-
-
// Toggle visibility of all but the first entry
-
for (let i = 1; i < entries.length; i++) {
-
entries[i].style.display = entries[i].style.display === 'none' ? 'block' : 'none';
-
}
-
-
// Update thread count text
-
const count = this.querySelector('.thread-count');
-
if (entries[1] && entries[1].style.display === 'none') {
-
count.textContent = count.textContent.replace('posts', 'posts (collapsed)');
-
} else {
-
count.textContent = count.textContent.replace(' (collapsed)', '');
-
}
-
});
-
});
-
-
// Add relative time display
-
const timeElements = document.querySelectorAll('time');
-
timeElements.forEach(timeEl => {
-
const datetime = new Date(timeEl.getAttribute('datetime'));
-
const now = new Date();
-
const diffMs = now - datetime;
-
const diffDays = Math.floor(diffMs / (1000 * 60 * 60 * 24));
-
-
let relativeTime;
-
if (diffDays === 0) {
-
const diffHours = Math.floor(diffMs / (1000 * 60 * 60));
-
if (diffHours === 0) {
-
const diffMinutes = Math.floor(diffMs / (1000 * 60));
-
relativeTime = diffMinutes === 0 ? 'just now' : `${diffMinutes}m ago`;
-
} else {
-
relativeTime = `${diffHours}h ago`;
-
}
-
} else if (diffDays === 1) {
-
relativeTime = 'yesterday';
-
} else if (diffDays < 7) {
-
relativeTime = `${diffDays}d ago`;
-
} else if (diffDays < 30) {
-
const weeks = Math.floor(diffDays / 7);
-
relativeTime = weeks === 1 ? '1w ago' : `${weeks}w ago`;
-
} else if (diffDays < 365) {
-
const months = Math.floor(diffDays / 30);
-
relativeTime = months === 1 ? '1mo ago' : `${months}mo ago`;
-
} else {
-
const years = Math.floor(diffDays / 365);
-
relativeTime = years === 1 ? '1y ago' : `${years}y ago`;
-
}
-
-
// Add relative time as title attribute
-
timeEl.setAttribute('title', timeEl.textContent);
-
timeEl.textContent = relativeTime;
-
});
-
-
// Enhanced anchor link scrolling for shared references
-
document.querySelectorAll('a[href^="#"]').forEach(anchor => {
-
anchor.addEventListener('click', function (e) {
-
e.preventDefault();
-
const target = document.querySelector(this.getAttribute('href'));
-
if (target) {
-
target.scrollIntoView({
-
behavior: 'smooth',
-
block: 'center'
-
});
-
-
// Highlight the target briefly
-
const timelineEntry = target.closest('.timeline-entry');
-
if (timelineEntry) {
-
timelineEntry.style.outline = '2px solid var(--primary-color)';
-
timelineEntry.style.borderRadius = '8px';
-
setTimeout(() => {
-
timelineEntry.style.outline = '';
-
timelineEntry.style.borderRadius = '';
-
}, 2000);
-
}
-
}
-
});
-
});
-
});
-
</file>
-
-
<file path="src/thicket/templates/style.css">
-
/* Modern, clean design with high-density text and readable theme */
-
-
:root {
-
--primary-color: #2c3e50;
-
--secondary-color: #3498db;
-
--accent-color: #e74c3c;
-
--background: #ffffff;
-
--surface: #f8f9fa;
-
--text-primary: #2c3e50;
-
--text-secondary: #7f8c8d;
-
--border-color: #e0e0e0;
-
--thread-indent: 20px;
-
--max-width: 1200px;
-
}
-
-
* {
-
margin: 0;
-
padding: 0;
-
box-sizing: border-box;
-
}
-
-
body {
-
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', 'Roboto', 'Helvetica Neue', Arial, sans-serif;
-
font-size: 14px;
-
line-height: 1.6;
-
color: var(--text-primary);
-
background-color: var(--background);
-
}
-
-
/* Header */
-
.site-header {
-
background-color: var(--surface);
-
border-bottom: 1px solid var(--border-color);
-
padding: 0.75rem 0;
-
position: sticky;
-
top: 0;
-
z-index: 100;
-
}
-
-
.header-content {
-
max-width: var(--max-width);
-
margin: 0 auto;
-
padding: 0 2rem;
-
display: flex;
-
justify-content: space-between;
-
align-items: center;
-
}
-
-
.site-title {
-
font-size: 1.5rem;
-
font-weight: 600;
-
color: var(--primary-color);
-
margin: 0;
-
}
-
-
/* Navigation */
-
.site-nav {
-
display: flex;
-
gap: 1.5rem;
-
}
-
-
.nav-link {
-
text-decoration: none;
-
color: var(--text-secondary);
-
font-weight: 500;
-
font-size: 0.95rem;
-
padding: 0.5rem 0.75rem;
-
border-radius: 4px;
-
transition: all 0.2s ease;
-
}
-
-
.nav-link:hover {
-
color: var(--primary-color);
-
background-color: var(--background);
-
}
-
-
.nav-link.active {
-
color: var(--secondary-color);
-
background-color: var(--background);
-
font-weight: 600;
-
}
-
-
/* Main Content */
-
.main-content {
-
max-width: var(--max-width);
-
margin: 2rem auto;
-
padding: 0 2rem;
-
}
-
-
.page-content {
-
margin: 0;
-
}
-
-
.page-description {
-
color: var(--text-secondary);
-
margin-bottom: 1.5rem;
-
font-style: italic;
-
}
-
-
/* Sections */
-
section {
-
margin-bottom: 2rem;
-
}
-
-
h2 {
-
font-size: 1.3rem;
-
font-weight: 600;
-
margin-bottom: 0.75rem;
-
color: var(--primary-color);
-
}
-
-
h3 {
-
font-size: 1.1rem;
-
font-weight: 600;
-
margin-bottom: 0.75rem;
-
color: var(--primary-color);
-
}
-
-
/* Entries and Threads */
-
article {
-
margin-bottom: 1.5rem;
-
padding: 1rem;
-
background-color: var(--surface);
-
border-radius: 4px;
-
border: 1px solid var(--border-color);
-
}
-
-
/* Timeline-style entries */
-
.timeline-entry {
-
margin-bottom: 0.5rem;
-
padding: 0.5rem 0.75rem;
-
border: none;
-
background: transparent;
-
transition: background-color 0.2s ease;
-
}
-
-
.timeline-entry:hover {
-
background-color: var(--surface);
-
}
-
-
.timeline-meta {
-
display: inline-flex;
-
gap: 0.5rem;
-
align-items: center;
-
font-size: 0.75rem;
-
color: var(--text-secondary);
-
margin-bottom: 0.25rem;
-
}
-
-
.timeline-time {
-
font-family: 'SF Mono', Monaco, Consolas, 'Courier New', monospace;
-
font-size: 0.75rem;
-
color: var(--text-secondary);
-
}
-
-
.timeline-author {
-
font-weight: 600;
-
color: var(--primary-color);
-
font-size: 0.8rem;
-
text-decoration: none;
-
}
-
-
.timeline-author:hover {
-
color: var(--secondary-color);
-
text-decoration: underline;
-
}
-
-
.timeline-content {
-
line-height: 1.4;
-
}
-
-
.timeline-title {
-
font-size: 0.95rem;
-
font-weight: 600;
-
}
-
-
.timeline-title a {
-
color: var(--primary-color);
-
text-decoration: none;
-
}
-
-
.timeline-title a:hover {
-
color: var(--secondary-color);
-
text-decoration: underline;
-
}
-
-
.timeline-summary {
-
color: var(--text-secondary);
-
font-size: 0.9rem;
-
line-height: 1.4;
-
}
-
-
/* Legacy styles for other sections */
-
.entry-meta, .thread-header {
-
display: flex;
-
gap: 1rem;
-
align-items: center;
-
margin-bottom: 0.5rem;
-
font-size: 0.85rem;
-
color: var(--text-secondary);
-
}
-
-
.author {
-
font-weight: 600;
-
color: var(--primary-color);
-
}
-
-
time {
-
font-size: 0.85rem;
-
}
-
-
h4 {
-
font-size: 1.1rem;
-
font-weight: 600;
-
margin-bottom: 0.5rem;
-
}
-
-
h4 a {
-
color: var(--primary-color);
-
text-decoration: none;
-
}
-
-
h4 a:hover {
-
color: var(--secondary-color);
-
text-decoration: underline;
-
}
-
-
.entry-summary {
-
color: var(--text-primary);
-
line-height: 1.5;
-
margin-top: 0.5rem;
-
}
-
-
/* Enhanced Threading Styles */
-
-
/* Conversation Clusters */
-
.conversation-cluster {
-
background-color: var(--background);
-
border: 2px solid var(--border-color);
-
border-radius: 8px;
-
margin-bottom: 2rem;
-
overflow: hidden;
-
box-shadow: 0 2px 4px rgba(0, 0, 0, 0.05);
-
}
-
-
.conversation-header {
-
background: linear-gradient(135deg, var(--surface) 0%, #f1f3f4 100%);
-
padding: 0.75rem 1rem;
-
border-bottom: 1px solid var(--border-color);
-
}
-
-
.conversation-meta {
-
display: flex;
-
justify-content: space-between;
-
align-items: center;
-
flex-wrap: wrap;
-
gap: 0.5rem;
-
}
-
-
.conversation-count {
-
font-weight: 600;
-
color: var(--secondary-color);
-
font-size: 0.9rem;
-
}
-
-
.conversation-participants {
-
font-size: 0.8rem;
-
color: var(--text-secondary);
-
flex: 1;
-
text-align: right;
-
}
-
-
.conversation-flow {
-
padding: 0.5rem;
-
}
-
-
/* Threaded Conversation Entries */
-
.conversation-entry {
-
position: relative;
-
margin-bottom: 0.75rem;
-
display: flex;
-
align-items: flex-start;
-
}
-
-
.conversation-entry.level-0 {
-
margin-left: 0;
-
}
-
-
.conversation-entry.level-1 {
-
margin-left: 1.5rem;
-
}
-
-
.conversation-entry.level-2 {
-
margin-left: 3rem;
-
}
-
-
.conversation-entry.level-3 {
-
margin-left: 4.5rem;
-
}
-
-
.conversation-entry.level-4 {
-
margin-left: 6rem;
-
}
-
-
.entry-connector {
-
width: 3px;
-
background-color: var(--secondary-color);
-
margin-right: 0.75rem;
-
margin-top: 0.25rem;
-
min-height: 2rem;
-
border-radius: 2px;
-
opacity: 0.6;
-
}
-
-
.conversation-entry.level-0 .entry-connector {
-
background-color: var(--accent-color);
-
opacity: 0.8;
-
}
-
-
.entry-content {
-
flex: 1;
-
background-color: var(--surface);
-
padding: 0.75rem;
-
border-radius: 6px;
-
border: 1px solid var(--border-color);
-
transition: all 0.2s ease;
-
}
-
-
.entry-content:hover {
-
border-color: var(--secondary-color);
-
box-shadow: 0 2px 8px rgba(52, 152, 219, 0.1);
-
}
-
-
/* Reference Indicators */
-
.reference-indicators {
-
display: inline-flex;
-
gap: 0.25rem;
-
margin-left: 0.5rem;
-
}
-
-
.ref-out, .ref-in {
-
display: inline-block;
-
width: 1rem;
-
height: 1rem;
-
border-radius: 50%;
-
text-align: center;
-
line-height: 1rem;
-
font-size: 0.7rem;
-
font-weight: bold;
-
}
-
-
.ref-out {
-
background-color: #e8f5e8;
-
color: #2d8f2d;
-
}
-
-
.ref-in {
-
background-color: #e8f0ff;
-
color: #1f5fbf;
-
}
-
-
/* Reference Badges for Individual Posts */
-
.timeline-entry.with-references {
-
background-color: var(--surface);
-
}
-
-
/* Conversation posts in unified timeline */
-
.timeline-entry.conversation-post {
-
background: transparent;
-
border: none;
-
margin-bottom: 0.5rem;
-
padding: 0.5rem 0.75rem;
-
}
-
-
.timeline-entry.conversation-post.level-0 {
-
margin-left: 0;
-
border-left: 2px solid var(--accent-color);
-
padding-left: 0.75rem;
-
}
-
-
.timeline-entry.conversation-post.level-1 {
-
margin-left: 1.5rem;
-
border-left: 2px solid var(--secondary-color);
-
padding-left: 0.75rem;
-
}
-
-
.timeline-entry.conversation-post.level-2 {
-
margin-left: 3rem;
-
border-left: 2px solid var(--text-secondary);
-
padding-left: 0.75rem;
-
}
-
-
.timeline-entry.conversation-post.level-3 {
-
margin-left: 4.5rem;
-
border-left: 2px solid var(--text-secondary);
-
padding-left: 0.75rem;
-
}
-
-
.timeline-entry.conversation-post.level-4 {
-
margin-left: 6rem;
-
border-left: 2px solid var(--text-secondary);
-
padding-left: 0.75rem;
-
}
-
-
/* Cross-thread linking */
-
.cross-thread-links {
-
margin-top: 0.5rem;
-
padding-top: 0.5rem;
-
border-top: 1px solid var(--border-color);
-
}
-
-
.cross-thread-indicator {
-
font-size: 0.75rem;
-
color: var(--text-secondary);
-
background-color: var(--surface);
-
padding: 0.25rem 0.5rem;
-
border-radius: 12px;
-
border: 1px solid var(--border-color);
-
display: inline-block;
-
}
-
-
/* Inline shared references styling */
-
.inline-shared-refs {
-
margin-left: 0.5rem;
-
font-size: 0.85rem;
-
color: var(--text-secondary);
-
}
-
-
.shared-ref-link {
-
color: var(--primary-color);
-
text-decoration: none;
-
font-weight: 500;
-
transition: color 0.2s ease;
-
}
-
-
.shared-ref-link:hover {
-
color: var(--secondary-color);
-
text-decoration: underline;
-
}
-
-
.shared-ref-more {
-
font-style: italic;
-
color: var(--text-secondary);
-
font-size: 0.8rem;
-
margin-left: 0.25rem;
-
}
-
-
.user-anchor, .post-anchor {
-
position: absolute;
-
margin-top: -60px; /* Offset for fixed header */
-
pointer-events: none;
-
}
-
-
.cross-thread-link {
-
color: var(--primary-color);
-
text-decoration: none;
-
font-weight: 500;
-
transition: color 0.2s ease;
-
}
-
-
.cross-thread-link:hover {
-
color: var(--secondary-color);
-
text-decoration: underline;
-
}
-
-
.reference-badges {
-
display: flex;
-
gap: 0.25rem;
-
margin-left: 0.5rem;
-
flex-wrap: wrap;
-
}
-
-
.ref-badge {
-
display: inline-block;
-
padding: 0.1rem 0.4rem;
-
border-radius: 12px;
-
font-size: 0.7rem;
-
font-weight: 600;
-
text-transform: uppercase;
-
letter-spacing: 0.05em;
-
}
-
-
.ref-badge.ref-outbound {
-
background-color: #e8f5e8;
-
color: #2d8f2d;
-
border: 1px solid #c3e6c3;
-
}
-
-
.ref-badge.ref-inbound {
-
background-color: #e8f0ff;
-
color: #1f5fbf;
-
border: 1px solid #b3d9ff;
-
}
-
-
/* Author Color Coding */
-
.timeline-author {
-
position: relative;
-
}
-
-
.timeline-author::before {
-
content: '';
-
display: inline-block;
-
width: 8px;
-
height: 8px;
-
border-radius: 50%;
-
margin-right: 0.5rem;
-
background-color: var(--secondary-color);
-
}
-
-
/* Generate consistent colors for authors */
-
.author-avsm::before { background-color: #e74c3c; }
-
.author-mort::before { background-color: #3498db; }
-
.author-mte::before { background-color: #2ecc71; }
-
.author-ryan::before { background-color: #f39c12; }
-
.author-mwd::before { background-color: #9b59b6; }
-
.author-dra::before { background-color: #1abc9c; }
-
.author-pf341::before { background-color: #34495e; }
-
.author-sadiqj::before { background-color: #e67e22; }
-
.author-martinkl::before { background-color: #8e44ad; }
-
.author-jonsterling::before { background-color: #27ae60; }
-
.author-jon::before { background-color: #f1c40f; }
-
.author-onkar::before { background-color: #e91e63; }
-
.author-gabriel::before { background-color: #00bcd4; }
-
.author-jess::before { background-color: #ff5722; }
-
.author-ibrahim::before { background-color: #607d8b; }
-
.author-andres::before { background-color: #795548; }
-
.author-eeg::before { background-color: #ff9800; }
-
-
/* Section Headers */
-
.conversations-section h3,
-
.referenced-posts-section h3,
-
.individual-posts-section h3 {
-
border-bottom: 2px solid var(--border-color);
-
padding-bottom: 0.5rem;
-
margin-bottom: 1.5rem;
-
position: relative;
-
}
-
-
.conversations-section h3::before {
-
content: "๐Ÿ’ฌ";
-
margin-right: 0.5rem;
-
}
-
-
.referenced-posts-section h3::before {
-
content: "๐Ÿ”—";
-
margin-right: 0.5rem;
-
}
-
-
.individual-posts-section h3::before {
-
content: "๐Ÿ“";
-
margin-right: 0.5rem;
-
}
-
-
/* Legacy thread styles (for backward compatibility) */
-
.thread {
-
background-color: var(--background);
-
border: 1px solid var(--border-color);
-
padding: 0;
-
overflow: hidden;
-
margin-bottom: 1rem;
-
}
-
-
.thread-header {
-
background-color: var(--surface);
-
padding: 0.5rem 0.75rem;
-
border-bottom: 1px solid var(--border-color);
-
}
-
-
.thread-count {
-
font-weight: 600;
-
color: var(--secondary-color);
-
}
-
-
.thread-entry {
-
padding: 0.5rem 0.75rem;
-
border-bottom: 1px solid var(--border-color);
-
}
-
-
.thread-entry:last-child {
-
border-bottom: none;
-
}
-
-
.thread-entry.reply {
-
margin-left: var(--thread-indent);
-
border-left: 3px solid var(--secondary-color);
-
background-color: var(--surface);
-
}
-
-
/* Links Section */
-
.link-group {
-
background-color: var(--background);
-
}
-
-
.link-url {
-
font-size: 1rem;
-
word-break: break-word;
-
}
-
-
.link-url a {
-
color: var(--secondary-color);
-
text-decoration: none;
-
}
-
-
.link-url a:hover {
-
text-decoration: underline;
-
}
-
-
.target-user {
-
font-size: 0.9rem;
-
color: var(--text-secondary);
-
font-weight: normal;
-
}
-
-
.referencing-entries {
-
margin-top: 0.75rem;
-
}
-
-
.ref-count {
-
font-weight: 600;
-
color: var(--text-secondary);
-
font-size: 0.9rem;
-
}
-
-
.referencing-entries ul {
-
list-style: none;
-
margin-top: 0.5rem;
-
padding-left: 1rem;
-
}
-
-
.referencing-entries li {
-
margin-bottom: 0.25rem;
-
font-size: 0.9rem;
-
}
-
-
.referencing-entries .more {
-
font-style: italic;
-
color: var(--text-secondary);
-
}
-
-
/* Users Section */
-
.user-card {
-
background-color: var(--background);
-
}
-
-
.user-header {
-
display: flex;
-
gap: 1rem;
-
align-items: start;
-
margin-bottom: 1rem;
-
}
-
-
.user-icon {
-
width: 48px;
-
height: 48px;
-
border-radius: 50%;
-
object-fit: cover;
-
}
-
-
.user-info h3 {
-
margin-bottom: 0.25rem;
-
}
-
-
.username {
-
font-size: 0.9rem;
-
color: var(--text-secondary);
-
font-weight: normal;
-
}
-
-
.user-meta {
-
font-size: 0.9rem;
-
color: var(--text-secondary);
-
}
-
-
.user-meta a {
-
color: var(--secondary-color);
-
text-decoration: none;
-
}
-
-
.user-meta a:hover {
-
text-decoration: underline;
-
}
-
-
.separator {
-
margin: 0 0.5rem;
-
}
-
-
.post-count {
-
font-weight: 600;
-
}
-
-
.user-recent h4 {
-
font-size: 0.95rem;
-
margin-bottom: 0.5rem;
-
color: var(--text-secondary);
-
}
-
-
.user-recent ul {
-
list-style: none;
-
padding-left: 0;
-
}
-
-
.user-recent li {
-
margin-bottom: 0.25rem;
-
font-size: 0.9rem;
-
}
-
-
/* Footer */
-
.site-footer {
-
max-width: var(--max-width);
-
margin: 3rem auto 2rem;
-
padding: 1rem 2rem;
-
text-align: center;
-
color: var(--text-secondary);
-
font-size: 0.85rem;
-
border-top: 1px solid var(--border-color);
-
}
-
-
.site-footer a {
-
color: var(--secondary-color);
-
text-decoration: none;
-
}
-
-
.site-footer a:hover {
-
text-decoration: underline;
-
}
-
-
/* Responsive */
-
@media (max-width: 768px) {
-
.site-title {
-
font-size: 1.3rem;
-
}
-
-
.header-content {
-
flex-direction: column;
-
gap: 0.75rem;
-
align-items: flex-start;
-
}
-
-
.site-nav {
-
gap: 1rem;
-
}
-
-
.main-content {
-
padding: 0 1rem;
-
}
-
-
.thread-entry.reply {
-
margin-left: calc(var(--thread-indent) / 2);
-
}
-
-
.user-header {
-
flex-direction: column;
-
}
-
}
-
</file>
-
-
<file path="src/thicket/templates/timeline.html">
-
{% extends "base.html" %}
-
-
{% block page_title %}Timeline - {{ title }}{% endblock %}
-
-
{% block content %}
-
{% set seen_users = [] %}
-
<div class="page-content">
-
<h2>Recent Posts & Conversations</h2>
-
-
<section class="unified-timeline">
-
{% for item in timeline_items %}
-
{% if item.type == "post" %}
-
<!-- Individual Post -->
-
<article class="timeline-entry {% if item.content.references %}with-references{% endif %}">
-
<div class="timeline-meta">
-
<time datetime="{{ item.content.entry.updated or item.content.entry.published }}" class="timeline-time">
-
{{ (item.content.entry.updated or item.content.entry.published).strftime('%Y-%m-%d %H:%M') }}
-
</time>
-
{% set homepage = get_user_homepage(item.content.username) %}
-
{% if item.content.username not in seen_users %}
-
<a id="{{ item.content.username }}" class="user-anchor"></a>
-
{% set _ = seen_users.append(item.content.username) %}
-
{% endif %}
-
<a id="post-{{ loop.index0 }}-{{ safe_anchor_id(item.content.entry.id) }}" class="post-anchor"></a>
-
{% if homepage %}
-
<a href="{{ homepage }}" target="_blank" class="timeline-author">{{ item.content.display_name }}</a>
-
{% else %}
-
<span class="timeline-author">{{ item.content.display_name }}</span>
-
{% endif %}
-
{% if item.content.references %}
-
<div class="reference-badges">
-
{% for ref in item.content.references %}
-
{% if ref.type == 'outbound' %}
-
<span class="ref-badge ref-outbound" title="References {{ ref.target_username or 'external post' }}">
-
→ {{ ref.target_username or 'ext' }}
-
</span>
-
{% elif ref.type == 'inbound' %}
-
<span class="ref-badge ref-inbound" title="Referenced by {{ ref.source_username or 'external post' }}">
-
โ† {{ ref.source_username or 'ext' }}
-
</span>
-
{% endif %}
-
{% endfor %}
-
</div>
-
{% endif %}
-
</div>
-
<div class="timeline-content">
-
<strong class="timeline-title">
-
<a href="{{ item.content.entry.link }}" target="_blank">{{ item.content.entry.title }}</a>
-
</strong>
-
{% if item.content.entry.summary %}
-
<span class="timeline-summary">โ€” {{ clean_html_summary(item.content.entry.summary, 250) }}</span>
-
{% endif %}
-
{% if item.content.shared_references %}
-
<span class="inline-shared-refs">
-
{% for ref in item.content.shared_references[:3] %}
-
{% if ref.target_username %}
-
<a href="#{{ ref.target_username }}" class="shared-ref-link" title="Referenced by {{ ref.count }} entries">@{{ ref.target_username }}</a>{% if not loop.last %}, {% endif %}
-
{% endif %}
-
{% endfor %}
-
{% if item.content.shared_references|length > 3 %}
-
<span class="shared-ref-more">+{{ item.content.shared_references|length - 3 }} more</span>
-
{% endif %}
-
</span>
-
{% endif %}
-
{% if item.content.cross_thread_links %}
-
<div class="cross-thread-links">
-
<span class="cross-thread-indicator">๐Ÿ”— Also appears: </span>
-
{% for link in item.content.cross_thread_links %}
-
<a href="#{{ link.anchor_id }}" class="cross-thread-link" title="{{ link.title }}">{{ link.context }}</a>{% if not loop.last %}, {% endif %}
-
{% endfor %}
-
</div>
-
{% endif %}
-
</div>
-
</article>
-
-
{% elif item.type == "thread" %}
-
<!-- Conversation Thread -->
-
{% set outer_loop_index = loop.index0 %}
-
{% for thread_item in item.content %}
-
<article class="timeline-entry conversation-post level-{{ thread_item.thread_level }}">
-
<div class="timeline-meta">
-
<time datetime="{{ thread_item.entry.updated or thread_item.entry.published }}" class="timeline-time">
-
{{ (thread_item.entry.updated or thread_item.entry.published).strftime('%Y-%m-%d %H:%M') }}
-
</time>
-
{% set homepage = get_user_homepage(thread_item.username) %}
-
{% if thread_item.username not in seen_users %}
-
<a id="{{ thread_item.username }}" class="user-anchor"></a>
-
{% set _ = seen_users.append(thread_item.username) %}
-
{% endif %}
-
<a id="post-{{ outer_loop_index }}-{{ loop.index0 }}-{{ safe_anchor_id(thread_item.entry.id) }}" class="post-anchor"></a>
-
{% if homepage %}
-
<a href="{{ homepage }}" target="_blank" class="timeline-author author-{{ thread_item.username }}">{{ thread_item.display_name }}</a>
-
{% else %}
-
<span class="timeline-author author-{{ thread_item.username }}">{{ thread_item.display_name }}</span>
-
{% endif %}
-
{% if thread_item.references_to or thread_item.referenced_by %}
-
<span class="reference-indicators">
-
{% if thread_item.references_to %}
-
<span class="ref-out" title="References other posts">โ†’</span>
-
{% endif %}
-
{% if thread_item.referenced_by %}
-
<span class="ref-in" title="Referenced by other posts">โ†</span>
-
{% endif %}
-
</span>
-
{% endif %}
-
</div>
-
<div class="timeline-content">
-
<strong class="timeline-title">
-
<a href="{{ thread_item.entry.link }}" target="_blank">{{ thread_item.entry.title }}</a>
-
</strong>
-
{% if thread_item.entry.summary %}
-
<span class="timeline-summary">โ€” {{ clean_html_summary(thread_item.entry.summary, 300) }}</span>
-
{% endif %}
-
{% if thread_item.shared_references %}
-
<span class="inline-shared-refs">
-
{% for ref in thread_item.shared_references[:3] %}
-
{% if ref.target_username %}
-
<a href="#{{ ref.target_username }}" class="shared-ref-link" title="Referenced by {{ ref.count }} entries">@{{ ref.target_username }}</a>{% if not loop.last %}, {% endif %}
-
{% endif %}
-
{% endfor %}
-
{% if thread_item.shared_references|length > 3 %}
-
<span class="shared-ref-more">+{{ thread_item.shared_references|length - 3 }} more</span>
-
{% endif %}
-
</span>
-
{% endif %}
-
{% if thread_item.cross_thread_links %}
-
<div class="cross-thread-links">
-
<span class="cross-thread-indicator">๐Ÿ”— Also appears: </span>
-
{% for link in thread_item.cross_thread_links %}
-
<a href="#{{ link.anchor_id }}" class="cross-thread-link" title="{{ link.title }}">{{ link.context }}</a>{% if not loop.last %}, {% endif %}
-
{% endfor %}
-
</div>
-
{% endif %}
-
</div>
-
</article>
-
{% endfor %}
-
{% endif %}
-
{% endfor %}
-
</section>
-
</div>
-
{% endblock %}
-
</file>
-
-
<file path="src/thicket/templates/users.html">
-
{% extends "base.html" %}
-
-
{% block page_title %}Users - {{ title }}{% endblock %}
-
-
{% block content %}
-
<div class="page-content">
-
<h2>Users</h2>
-
<p class="page-description">All users contributing to this thicket, ordered by post count.</p>
-
-
{% for user_info in users %}
-
<article class="user-card">
-
<div class="user-header">
-
{% if user_info.metadata.icon and user_info.metadata.icon != "None" %}
-
<img src="{{ user_info.metadata.icon }}" alt="{{ user_info.metadata.username }}" class="user-icon">
-
{% endif %}
-
<div class="user-info">
-
<h3>
-
{% if user_info.metadata.display_name %}
-
{{ user_info.metadata.display_name }}
-
<span class="username">({{ user_info.metadata.username }})</span>
-
{% else %}
-
{{ user_info.metadata.username }}
-
{% endif %}
-
</h3>
-
<div class="user-meta">
-
{% if user_info.metadata.homepage %}
-
<a href="{{ user_info.metadata.homepage }}" target="_blank">{{ user_info.metadata.homepage }}</a>
-
{% endif %}
-
{% if user_info.metadata.email %}
-
<span class="separator">โ€ข</span>
-
<a href="mailto:{{ user_info.metadata.email }}">{{ user_info.metadata.email }}</a>
-
{% endif %}
-
<span class="separator">โ€ข</span>
-
<span class="post-count">{{ user_info.metadata.entry_count }} posts</span>
-
</div>
-
</div>
-
</div>
-
-
{% if user_info.recent_entries %}
-
<div class="user-recent">
-
<h4>Recent posts:</h4>
-
<ul>
-
{% for display_name, entry in user_info.recent_entries %}
-
<li>
-
<a href="{{ entry.link }}" target="_blank">{{ entry.title }}</a>
-
<time datetime="{{ entry.updated or entry.published }}">
-
({{ (entry.updated or entry.published).strftime('%Y-%m-%d') }})
-
</time>
-
</li>
-
{% endfor %}
-
</ul>
-
</div>
-
{% endif %}
-
</article>
-
{% endfor %}
-
</div>
-
{% endblock %}
-
</file>
-
-
<file path="README.md">
-
# Thicket
-
-
A modern CLI tool for persisting Atom/RSS feeds in Git repositories, designed to enable distributed weblog comment structures.
-
-
## Features
-
-
- **Feed Auto-Discovery**: Automatically extracts user metadata from Atom/RSS feeds
-
- **Git Storage**: Stores feed entries in a Git repository with full history
-
- **Duplicate Management**: Manual curation of duplicate entries across feeds
-
- **Modern CLI**: Built with Typer and Rich for beautiful terminal output
-
- **Comprehensive Parsing**: Supports RSS 0.9x, RSS 1.0, RSS 2.0, and Atom feeds
-
- **Cron-Friendly**: Designed for scheduled execution
-
-
## Installation
-
-
```bash
-
# Install from source
-
pip install -e .
-
-
# Or install with dev dependencies
-
pip install -e ".[dev]"
-
```
-
-
## Quick Start
-
-
1. **Initialize a new thicket repository:**
-
```bash
-
thicket init ./my-feeds
-
```
-
-
2. **Add a user with their feed:**
-
```bash
-
thicket add user "alice" --feed "https://alice.example.com/feed.xml"
-
```
-
-
3. **Sync feeds to download entries:**
-
```bash
-
thicket sync --all
-
```
-
-
4. **List users and feeds:**
-
```bash
-
thicket list users
-
thicket list feeds
-
thicket list entries
-
```
-
-
## Commands
-
-
### Initialize
-
```bash
-
thicket init <git-store-path> [--cache-dir <path>] [--config <config-file>]
-
```
-
-
### Add Users and Feeds
-
```bash
-
# Add user with auto-discovery
-
thicket add user "username" --feed "https://example.com/feed.xml"
-
-
# Add user with manual metadata
-
thicket add user "username" \
-
--feed "https://example.com/feed.xml" \
-
--email "user@example.com" \
-
--homepage "https://example.com" \
-
--display-name "User Name"
-
-
# Add additional feed to existing user
-
thicket add feed "username" "https://example.com/other-feed.xml"
-
```
-
-
### Sync Feeds
-
```bash
-
# Sync all users
-
thicket sync --all
-
-
# Sync specific user
-
thicket sync --user "username"
-
-
# Dry run (preview changes)
-
thicket sync --all --dry-run
-
```
-
-
### List Information
-
```bash
-
# List all users
-
thicket list users
-
-
# List all feeds
-
thicket list feeds
-
-
# List feeds for specific user
-
thicket list feeds --user "username"
-
-
# List recent entries
-
thicket list entries --limit 20
-
-
# List entries for specific user
-
thicket list entries --user "username"
-
```
-
-
### Manage Duplicates
-
```bash
-
# List duplicate mappings
-
thicket duplicates list
-
-
# Mark entries as duplicates
-
thicket duplicates add "https://example.com/dup" "https://example.com/canonical"
-
-
# Remove duplicate mapping
-
thicket duplicates remove "https://example.com/dup"
-
```
-
-
## Configuration
-
-
Thicket uses a YAML configuration file (default: `thicket.yaml`):
-
-
```yaml
-
git_store: ./feeds-repo
-
cache_dir: ~/.cache/thicket
-
users:
-
- username: alice
-
feeds:
-
- https://alice.example.com/feed.xml
-
email: alice@example.com
-
homepage: https://alice.example.com
-
display_name: Alice
-
```
-
-
## Git Repository Structure
-
-
```
-
feeds-repo/
-
├── index.json          # User directory index
-
├── duplicates.json     # Duplicate entry mappings
-
├── alice/
-
│   ├── metadata.json   # User metadata
-
│   ├── entry_id_1.json # Feed entries
-
│   └── entry_id_2.json
-
└── bob/
-
    └── ...
-
```
-
-
## Development
-
-
### Setup
-
```bash
-
# Install in development mode
-
pip install -e ".[dev]"
-
-
# Run tests
-
pytest
-
-
# Run linting
-
ruff check src/
-
black --check src/
-
-
# Run type checking
-
mypy src/
-
```
-
-
### Architecture
-
-
- **CLI**: Modern interface with Typer and Rich
-
- **Feed Processing**: Universal parsing with feedparser
-
- **Git Storage**: Structured storage with GitPython
-
- **Data Models**: Pydantic for validation and serialization
-
- **Async HTTP**: httpx for efficient feed fetching
-
-
## Use Cases
-
-
- **Blog Aggregation**: Collect and archive blog posts from multiple sources
-
- **Comment Networks**: Enable distributed commenting systems
-
- **Feed Archival**: Preserve feed history beyond typical feed depth limits
-
- **Content Curation**: Manage and deduplicate content across feeds
-
-
## License
-
-
MIT License - see LICENSE file for details.
-
</file>
-
-
<file path="src/thicket/cli/commands/index_cmd.py">
-
"""CLI command for building reference index from blog entries."""
-
-
import json
-
from pathlib import Path
-
from typing import Optional
-
-
import typer
-
from rich.console import Console
-
from rich.progress import (
-
BarColumn,
-
Progress,
-
SpinnerColumn,
-
TaskProgressColumn,
-
TextColumn,
-
)
-
from rich.table import Table
-
-
from ...core.git_store import GitStore
-
from ...core.reference_parser import ReferenceIndex, ReferenceParser
-
from ..main import app
-
from ..utils import get_tsv_mode, load_config
-
-
console = Console()
-
-
-
@app.command()
-
def index(
-
config_file: Optional[Path] = typer.Option(
-
None,
-
"--config",
-
"-c",
-
help="Path to configuration file",
-
),
-
output_file: Optional[Path] = typer.Option(
-
None,
-
"--output",
-
"-o",
-
help="Path to output index file (default: updates links.json in git store)",
-
),
-
verbose: bool = typer.Option(
-
False,
-
"--verbose",
-
"-v",
-
help="Show detailed progress information",
-
),
-
) -> None:
-
"""Build a reference index showing which blog entries reference others.
-
-
This command analyzes all blog entries to detect cross-references between
-
different blogs, creating an index that can be used to build threaded
-
views of related content.
-
-
Updates the unified links.json file with reference data.
-
"""
-
try:
-
# Load configuration
-
config = load_config(config_file)
-
-
# Initialize Git store
-
git_store = GitStore(config.git_store)
-
-
# Initialize reference parser
-
parser = ReferenceParser()
-
-
# Build user domain mapping
-
if verbose:
-
console.print("Building user domain mapping...")
-
user_domains = parser.build_user_domain_mapping(git_store)
-
-
if verbose:
-
console.print(f"Found {len(user_domains)} users with {sum(len(d) for d in user_domains.values())} total domains")
-
-
# Initialize reference index
-
ref_index = ReferenceIndex()
-
ref_index.user_domains = user_domains
-
-
# Get all users
-
index = git_store._load_index()
-
users = list(index.users.keys())
-
-
if not users:
-
console.print("[yellow]No users found in Git store[/yellow]")
-
raise typer.Exit(0)
-
-
# Process all entries
-
total_entries = 0
-
total_references = 0
-
all_references = []
-
-
with Progress(
-
SpinnerColumn(),
-
TextColumn("[progress.description]{task.description}"),
-
BarColumn(),
-
TaskProgressColumn(),
-
console=console,
-
) as progress:
-
-
# Count total entries first
-
counting_task = progress.add_task("Counting entries...", total=len(users))
-
entry_counts = {}
-
for username in users:
-
entries = git_store.list_entries(username)
-
entry_counts[username] = len(entries)
-
total_entries += len(entries)
-
progress.advance(counting_task)
-
-
progress.remove_task(counting_task)
-
-
# Process entries - extract references
-
processing_task = progress.add_task(
-
f"Extracting references from {total_entries} entries...",
-
total=total_entries
-
)
-
-
for username in users:
-
entries = git_store.list_entries(username)
-
-
for entry in entries:
-
# Extract references from this entry
-
references = parser.extract_references(entry, username, user_domains)
-
all_references.extend(references)
-
-
progress.advance(processing_task)
-
-
if verbose and references:
-
console.print(f" Found {len(references)} references in {username}:{entry.title[:50]}...")
-
-
progress.remove_task(processing_task)
-
-
# Resolve target_entry_ids for references
-
if all_references:
-
resolve_task = progress.add_task(
-
f"Resolving {len(all_references)} references...",
-
total=len(all_references)
-
)
-
-
if verbose:
-
console.print(f"Resolving target entry IDs for {len(all_references)} references...")
-
-
resolved_references = parser.resolve_target_entry_ids(all_references, git_store)
-
-
# Count resolved references
-
resolved_count = sum(1 for ref in resolved_references if ref.target_entry_id is not None)
-
if verbose:
-
console.print(f"Resolved {resolved_count} out of {len(all_references)} references")
-
-
# Add resolved references to index
-
for ref in resolved_references:
-
ref_index.add_reference(ref)
-
total_references += 1
-
progress.advance(resolve_task)
-
-
progress.remove_task(resolve_task)
-
-
# Determine output path
-
if output_file:
-
output_path = output_file
-
else:
-
output_path = config.git_store / "links.json"
-
-
# Load existing links data or create new structure
-
if output_path.exists() and not output_file:
-
# Load existing unified structure
-
with open(output_path) as f:
-
existing_data = json.load(f)
-
else:
-
# Create new structure
-
existing_data = {
-
"links": {},
-
"reverse_mapping": {},
-
"user_domains": {}
-
}
-
-
# Update with reference data
-
existing_data["references"] = ref_index.to_dict()["references"]
-
existing_data["user_domains"] = {k: list(v) for k, v in user_domains.items()}
-
-
# Save updated structure
-
with open(output_path, "w") as f:
-
json.dump(existing_data, f, indent=2, default=str)
-
-
# Show summary
-
if not get_tsv_mode():
-
console.print("\n[green]โœ“ Reference index built successfully[/green]")
-
-
# Create summary table or TSV output
-
if get_tsv_mode():
-
print("Metric\tCount")
-
print(f"Total Users\t{len(users)}")
-
print(f"Total Entries\t{total_entries}")
-
print(f"Total References\t{total_references}")
-
print(f"Outbound Refs\t{len(ref_index.outbound_refs)}")
-
print(f"Inbound Refs\t{len(ref_index.inbound_refs)}")
-
print(f"Output File\t{output_path}")
-
else:
-
table = Table(title="Reference Index Summary")
-
table.add_column("Metric", style="cyan")
-
table.add_column("Count", style="green")
-
-
table.add_row("Total Users", str(len(users)))
-
table.add_row("Total Entries", str(total_entries))
-
table.add_row("Total References", str(total_references))
-
table.add_row("Outbound Refs", str(len(ref_index.outbound_refs)))
-
table.add_row("Inbound Refs", str(len(ref_index.inbound_refs)))
-
table.add_row("Output File", str(output_path))
-
-
console.print(table)
-
-
# Show some interesting statistics
-
if total_references > 0:
-
if not get_tsv_mode():
-
console.print("\n[bold]Reference Statistics:[/bold]")
-
-
# Most referenced users
-
target_counts = {}
-
unresolved_domains = set()
-
-
for ref in ref_index.references:
-
if ref.target_username:
-
target_counts[ref.target_username] = target_counts.get(ref.target_username, 0) + 1
-
else:
-
# Track unresolved domains
-
from urllib.parse import urlparse
-
domain = urlparse(ref.target_url).netloc.lower()
-
unresolved_domains.add(domain)
-
-
if target_counts:
-
if get_tsv_mode():
-
print("Referenced User\tReference Count")
-
for username, count in sorted(target_counts.items(), key=lambda x: x[1], reverse=True)[:5]:
-
print(f"{username}\t{count}")
-
else:
-
console.print("\nMost referenced users:")
-
for username, count in sorted(target_counts.items(), key=lambda x: x[1], reverse=True)[:5]:
-
console.print(f" {username}: {count} references")
-
-
if unresolved_domains and verbose:
-
if get_tsv_mode():
-
print("Unresolved Domain\tCount")
-
for domain in sorted(unresolved_domains)[:10]:
-
print(f"{domain}\t1")
-
if len(unresolved_domains) > 10:
-
print(f"... and {len(unresolved_domains) - 10} more\t...")
-
else:
-
console.print(f"\nUnresolved domains: {len(unresolved_domains)}")
-
for domain in sorted(unresolved_domains)[:10]:
-
console.print(f" {domain}")
-
if len(unresolved_domains) > 10:
-
console.print(f" ... and {len(unresolved_domains) - 10} more")
-
-
except Exception as e:
-
console.print(f"[red]Error building reference index: {e}[/red]")
-
if verbose:
-
console.print_exception()
-
raise typer.Exit(1)
-
-
-
@app.command()
-
def threads(
-
config_file: Optional[Path] = typer.Option(
-
None,
-
"--config",
-
"-c",
-
help="Path to configuration file",
-
),
-
index_file: Optional[Path] = typer.Option(
-
None,
-
"--index",
-
"-i",
-
help="Path to reference index file (default: links.json in git store)",
-
),
-
username: Optional[str] = typer.Option(
-
None,
-
"--username",
-
"-u",
-
help="Show threads for specific username only",
-
),
-
entry_id: Optional[str] = typer.Option(
-
None,
-
"--entry",
-
"-e",
-
help="Show thread for specific entry ID",
-
),
-
min_size: int = typer.Option(
-
2,
-
"--min-size",
-
"-m",
-
help="Minimum thread size to display",
-
),
-
) -> None:
-
"""Show threaded view of related blog entries.
-
-
This command uses the reference index to show which blog entries
-
are connected through cross-references, creating an email-style
-
threaded view of the conversation.
-
-
Reads reference data from the unified links.json file.
-
"""
-
try:
-
# Load configuration
-
config = load_config(config_file)
-
-
# Determine index file path
-
if index_file:
-
index_path = index_file
-
else:
-
index_path = config.git_store / "links.json"
-
-
if not index_path.exists():
-
console.print(f"[red]Links file not found: {index_path}[/red]")
-
console.print("Run 'thicket links' and 'thicket index' first to build the reference index")
-
raise typer.Exit(1)
-
-
# Load unified data
-
with open(index_path) as f:
-
unified_data = json.load(f)
-
-
# Check if references exist in the unified structure
-
if "references" not in unified_data:
-
console.print(f"[red]No references found in {index_path}[/red]")
-
console.print("Run 'thicket index' first to build the reference index")
-
raise typer.Exit(1)
-
-
# Extract reference data and reconstruct ReferenceIndex
-
ref_index = ReferenceIndex.from_dict({
-
"references": unified_data["references"],
-
"user_domains": unified_data.get("user_domains", {})
-
})
-
-
# Initialize Git store to get entry details
-
git_store = GitStore(config.git_store)
-
-
if entry_id and username:
-
# Show specific thread
-
thread_members = ref_index.get_thread_members(username, entry_id)
-
_display_thread(thread_members, ref_index, git_store, f"Thread for {username}:{entry_id}")
-
-
elif username:
-
# Show all threads involving this user
-
user_index = git_store._load_index()
-
user = user_index.get_user(username)
-
if not user:
-
console.print(f"[red]User not found: {username}[/red]")
-
raise typer.Exit(1)
-
-
entries = git_store.list_entries(username)
-
threads_found = set()
-
-
console.print(f"[bold]Threads involving {username}:[/bold]\n")
-
-
for entry in entries:
-
thread_members = ref_index.get_thread_members(username, entry.id)
-
if len(thread_members) >= min_size:
-
thread_key = tuple(sorted(thread_members))
-
if thread_key not in threads_found:
-
threads_found.add(thread_key)
-
_display_thread(thread_members, ref_index, git_store, f"Thread #{len(threads_found)}")
-
-
else:
-
# Show all threads
-
console.print("[bold]All conversation threads:[/bold]\n")
-
-
all_threads = set()
-
processed_entries = set()
-
-
# Get all entries
-
user_index = git_store._load_index()
-
for username in user_index.users.keys():
-
entries = git_store.list_entries(username)
-
for entry in entries:
-
entry_key = (username, entry.id)
-
if entry_key in processed_entries:
-
continue
-
-
thread_members = ref_index.get_thread_members(username, entry.id)
-
if len(thread_members) >= min_size:
-
thread_key = tuple(sorted(thread_members))
-
if thread_key not in all_threads:
-
all_threads.add(thread_key)
-
_display_thread(thread_members, ref_index, git_store, f"Thread #{len(all_threads)}")
-
-
# Mark all members as processed
-
for member in thread_members:
-
processed_entries.add(member)
-
-
if not all_threads:
-
console.print("[yellow]No conversation threads found[/yellow]")
-
console.print(f"(minimum thread size: {min_size})")
-
-
except Exception as e:
-
console.print(f"[red]Error showing threads: {e}[/red]")
-
raise typer.Exit(1)
-
-
-
def _display_thread(thread_members, ref_index, git_store, title):
-
"""Display a single conversation thread."""
-
console.print(f"[bold cyan]{title}[/bold cyan]")
-
console.print(f"Thread size: {len(thread_members)} entries")
-
-
# Get entry details for each member
-
thread_entries = []
-
for username, entry_id in thread_members:
-
entry = git_store.get_entry(username, entry_id)
-
if entry:
-
thread_entries.append((username, entry))
-
-
# Sort by publication date
-
thread_entries.sort(key=lambda x: x[1].published or x[1].updated)
-
-
# Display entries
-
for i, (username, entry) in enumerate(thread_entries):
-
prefix = "โ”œโ”€" if i < len(thread_entries) - 1 else "โ””โ”€"
-
-
# Get references for this entry
-
outbound = ref_index.get_outbound_refs(username, entry.id)
-
inbound = ref_index.get_inbound_refs(username, entry.id)
-
-
ref_info = ""
-
if outbound or inbound:
-
ref_info = f" ({len(outbound)} out, {len(inbound)} in)"
-
-
console.print(f" {prefix} [{username}] {entry.title[:60]}...{ref_info}")
-
-
if entry.published:
-
console.print(f" Published: {entry.published.strftime('%Y-%m-%d')}")
-
-
console.print() # Empty line after each thread
-
</file>
-
-
<file path="src/thicket/cli/commands/info_cmd.py">
-
"""CLI command for displaying detailed information about a specific atom entry."""
-
-
import json
-
from pathlib import Path
-
from typing import Optional
-
-
import typer
-
from rich.console import Console
-
from rich.panel import Panel
-
from rich.table import Table
-
-
from ...core.git_store import GitStore
-
from ...core.reference_parser import ReferenceIndex
-
from ..main import app
-
from ..utils import load_config, get_tsv_mode
-
-
console = Console()
-
-
-
@app.command()
-
def info(
-
identifier: str = typer.Argument(
-
...,
-
help="The atom ID or URL of the entry to display information about"
-
),
-
username: Optional[str] = typer.Option(
-
None,
-
"--username",
-
"-u",
-
help="Username to search for the entry (if not provided, searches all users)"
-
),
-
config_file: Optional[Path] = typer.Option(
-
Path("thicket.yaml"),
-
"--config",
-
"-c",
-
help="Path to configuration file",
-
),
-
show_content: bool = typer.Option(
-
False,
-
"--content",
-
help="Include the full content of the entry in the output"
-
),
-
) -> None:
-
"""Display detailed information about a specific atom entry.
-
-
You can specify the entry using either its atom ID or URL.
-
Shows all metadata for the given entry, including title, dates, categories,
-
and summarizes all inbound and outbound links to/from other posts.
-
"""
-
try:
-
# Load configuration
-
config = load_config(config_file)
-
-
# Initialize Git store
-
git_store = GitStore(config.git_store)
-
-
# Find the entry
-
entry = None
-
found_username = None
-
-
# Check if identifier looks like a URL
-
is_url = identifier.startswith(('http://', 'https://'))
-
-
if username:
-
# Search specific username
-
if is_url:
-
# Search by URL
-
entries = git_store.list_entries(username)
-
for e in entries:
-
if str(e.link) == identifier:
-
entry = e
-
found_username = username
-
break
-
else:
-
# Search by atom ID
-
entry = git_store.get_entry(username, identifier)
-
if entry:
-
found_username = username
-
else:
-
# Search all users
-
index = git_store._load_index()
-
for user in index.users.keys():
-
if is_url:
-
# Search by URL
-
entries = git_store.list_entries(user)
-
for e in entries:
-
if str(e.link) == identifier:
-
entry = e
-
found_username = user
-
break
-
if entry:
-
break
-
else:
-
# Search by atom ID
-
entry = git_store.get_entry(user, identifier)
-
if entry:
-
found_username = user
-
break
-
-
if not entry or not found_username:
-
if username:
-
console.print(f"[red]Entry with {'URL' if is_url else 'atom ID'} '{identifier}' not found for user '{username}'[/red]")
-
else:
-
console.print(f"[red]Entry with {'URL' if is_url else 'atom ID'} '{identifier}' not found in any user's entries[/red]")
-
raise typer.Exit(1)
-
-
# Load reference index if available
-
links_path = config.git_store / "links.json"
-
ref_index = None
-
if links_path.exists():
-
with open(links_path) as f:
-
unified_data = json.load(f)
-
-
# Check if references exist in the unified structure
-
if "references" in unified_data:
-
ref_index = ReferenceIndex.from_dict({
-
"references": unified_data["references"],
-
"user_domains": unified_data.get("user_domains", {})
-
})
-
-
# Display information
-
if get_tsv_mode():
-
_display_entry_info_tsv(entry, found_username, ref_index, show_content)
-
else:
-
_display_entry_info(entry, found_username)
-
-
if ref_index:
-
_display_link_info(entry, found_username, ref_index)
-
else:
-
console.print("\n[yellow]No reference index found. Run 'thicket links' and 'thicket index' to build cross-reference data.[/yellow]")
-
-
# Optionally display content
-
if show_content and entry.content:
-
_display_content(entry.content)
-
-
except Exception as e:
-
console.print(f"[red]Error displaying entry info: {e}[/red]")
-
raise typer.Exit(1)
-
-
-
def _display_entry_info(entry, username: str) -> None:
-
"""Display basic entry information in a structured format."""
-
-
# Create main info panel
-
info_table = Table.grid(padding=(0, 2))
-
info_table.add_column("Field", style="cyan bold", width=15)
-
info_table.add_column("Value", style="white")
-
-
info_table.add_row("User", f"[green]{username}[/green]")
-
info_table.add_row("Atom ID", f"[blue]{entry.id}[/blue]")
-
info_table.add_row("Title", entry.title)
-
info_table.add_row("Link", str(entry.link))
-
-
if entry.published:
-
info_table.add_row("Published", entry.published.strftime("%Y-%m-%d %H:%M:%S UTC"))
-
-
info_table.add_row("Updated", entry.updated.strftime("%Y-%m-%d %H:%M:%S UTC"))
-
-
if entry.summary:
-
# Truncate long summaries
-
summary = entry.summary[:200] + "..." if len(entry.summary) > 200 else entry.summary
-
info_table.add_row("Summary", summary)
-
-
if entry.categories:
-
categories_text = ", ".join(entry.categories)
-
info_table.add_row("Categories", categories_text)
-
-
if entry.author:
-
author_info = []
-
if "name" in entry.author:
-
author_info.append(entry.author["name"])
-
if "email" in entry.author:
-
author_info.append(f"<{entry.author['email']}>")
-
if author_info:
-
info_table.add_row("Author", " ".join(author_info))
-
-
if entry.content_type:
-
info_table.add_row("Content Type", entry.content_type)
-
-
if entry.rights:
-
info_table.add_row("Rights", entry.rights)
-
-
if entry.source:
-
info_table.add_row("Source Feed", entry.source)
-
-
panel = Panel(
-
info_table,
-
title=f"[bold]Entry Information[/bold]",
-
border_style="blue"
-
)
-
-
console.print(panel)
-
-
-
def _display_link_info(entry, username: str, ref_index: ReferenceIndex) -> None:
-
"""Display inbound and outbound link information."""
-
-
# Get links
-
outbound_refs = ref_index.get_outbound_refs(username, entry.id)
-
inbound_refs = ref_index.get_inbound_refs(username, entry.id)
-
-
if not outbound_refs and not inbound_refs:
-
console.print("\n[dim]No cross-references found for this entry.[/dim]")
-
return
-
-
# Create links table
-
links_table = Table(title="Cross-References")
-
links_table.add_column("Direction", style="cyan", width=10)
-
links_table.add_column("Target/Source", style="green", width=20)
-
links_table.add_column("URL", style="blue", width=50)
-
-
# Add outbound references
-
for ref in outbound_refs:
-
target_info = f"{ref.target_username}:{ref.target_entry_id}" if ref.target_username and ref.target_entry_id else "External"
-
links_table.add_row("โ†’ Out", target_info, ref.target_url)
-
-
# Add inbound references
-
for ref in inbound_refs:
-
source_info = f"{ref.source_username}:{ref.source_entry_id}"
-
links_table.add_row("โ† In", source_info, ref.target_url)
-
-
console.print()
-
console.print(links_table)
-
-
# Summary
-
console.print(f"\n[bold]Summary:[/bold] {len(outbound_refs)} outbound, {len(inbound_refs)} inbound references")
-
-
-
def _display_content(content: str) -> None:
-
"""Display the full content of the entry."""
-
-
# Truncate very long content
-
display_content = content
-
if len(content) > 5000:
-
display_content = content[:5000] + "\n\n[... content truncated ...]"
-
-
panel = Panel(
-
display_content,
-
title="[bold]Entry Content[/bold]",
-
border_style="green",
-
expand=False
-
)
-
-
console.print()
-
console.print(panel)
-
-
-
def _display_entry_info_tsv(entry, username: str, ref_index: Optional[ReferenceIndex], show_content: bool) -> None:
-
"""Display entry information in TSV format."""
-
-
# Basic info
-
print("Field\tValue")
-
print(f"User\t{username}")
-
print(f"Atom ID\t{entry.id}")
-
print(f"Title\t{entry.title.replace(chr(9), ' ').replace(chr(10), ' ').replace(chr(13), ' ')}")
-
print(f"Link\t{entry.link}")
-
-
if entry.published:
-
print(f"Published\t{entry.published.strftime('%Y-%m-%d %H:%M:%S UTC')}")
-
-
print(f"Updated\t{entry.updated.strftime('%Y-%m-%d %H:%M:%S UTC')}")
-
-
if entry.summary:
-
# Escape tabs and newlines in summary
-
summary = entry.summary.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ')
-
print(f"Summary\t{summary}")
-
-
if entry.categories:
-
print(f"Categories\t{', '.join(entry.categories)}")
-
-
if entry.author:
-
author_info = []
-
if "name" in entry.author:
-
author_info.append(entry.author["name"])
-
if "email" in entry.author:
-
author_info.append(f"<{entry.author['email']}>")
-
if author_info:
-
print(f"Author\t{' '.join(author_info)}")
-
-
if entry.content_type:
-
print(f"Content Type\t{entry.content_type}")
-
-
if entry.rights:
-
print(f"Rights\t{entry.rights}")
-
-
if entry.source:
-
print(f"Source Feed\t{entry.source}")
-
-
# Add reference info if available
-
if ref_index:
-
outbound_refs = ref_index.get_outbound_refs(username, entry.id)
-
inbound_refs = ref_index.get_inbound_refs(username, entry.id)
-
-
print(f"Outbound References\t{len(outbound_refs)}")
-
print(f"Inbound References\t{len(inbound_refs)}")
-
-
# Show each reference
-
for ref in outbound_refs:
-
target_info = f"{ref.target_username}:{ref.target_entry_id}" if ref.target_username and ref.target_entry_id else "External"
-
print(f"Outbound Reference\t{target_info}\t{ref.target_url}")
-
-
for ref in inbound_refs:
-
source_info = f"{ref.source_username}:{ref.source_entry_id}"
-
print(f"Inbound Reference\t{source_info}\t{ref.target_url}")
-
-
# Show content if requested
-
if show_content and entry.content:
-
# Escape tabs and newlines in content
-
content = entry.content.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ')
-
print(f"Content\t{content}")
-
</file>
-
-
<file path="src/thicket/cli/commands/init.py">
-
"""Initialize command for thicket."""
-
-
from pathlib import Path
-
from typing import Optional
-
-
import typer
-
from pydantic import ValidationError
-
-
from ...core.git_store import GitStore
-
from ...models import ThicketConfig
-
from ..main import app
-
from ..utils import print_error, print_success, save_config
-
-
-
@app.command()
-
def init(
-
git_store: Path = typer.Argument(..., help="Path to Git repository for storing feeds"),
-
cache_dir: Optional[Path] = typer.Option(
-
None, "--cache-dir", "-c", help="Cache directory (default: ~/.cache/thicket)"
-
),
-
config_file: Optional[Path] = typer.Option(
-
None, "--config", help="Configuration file path (default: thicket.yaml)"
-
),
-
force: bool = typer.Option(
-
False, "--force", "-f", help="Overwrite existing configuration"
-
),
-
) -> None:
-
"""Initialize a new thicket configuration and Git store."""
-
-
# Set default paths
-
if cache_dir is None:
-
from platformdirs import user_cache_dir
-
cache_dir = Path(user_cache_dir("thicket"))
-
-
if config_file is None:
-
config_file = Path("thicket.yaml")
-
-
# Check if config already exists
-
if config_file.exists() and not force:
-
print_error(f"Configuration file already exists: {config_file}")
-
print_error("Use --force to overwrite")
-
raise typer.Exit(1)
-
-
# Create cache directory
-
cache_dir.mkdir(parents=True, exist_ok=True)
-
-
# Create Git store
-
try:
-
GitStore(git_store)
-
print_success(f"Initialized Git store at: {git_store}")
-
except Exception as e:
-
print_error(f"Failed to initialize Git store: {e}")
-
raise typer.Exit(1) from e
-
-
# Create configuration
-
try:
-
config = ThicketConfig(
-
git_store=git_store,
-
cache_dir=cache_dir,
-
users=[]
-
)
-
-
save_config(config, config_file)
-
print_success(f"Created configuration file: {config_file}")
-
-
except ValidationError as e:
-
print_error(f"Invalid configuration: {e}")
-
raise typer.Exit(1) from e
-
except Exception as e:
-
print_error(f"Failed to create configuration: {e}")
-
raise typer.Exit(1) from e
-
-
print_success("Thicket initialized successfully!")
-
print_success(f"Git store: {git_store}")
-
print_success(f"Cache directory: {cache_dir}")
-
print_success(f"Configuration: {config_file}")
-
print_success("Run 'thicket add user' to add your first user and feed.")
-
</file>
-
-
<file path="src/thicket/cli/__init__.py">
-
"""CLI interface for thicket."""
-
-
from .main import app
-
-
__all__ = ["app"]
-
</file>
-
-
<file path="src/thicket/core/__init__.py">
-
"""Core business logic for thicket."""
-
-
from .feed_parser import FeedParser
-
from .git_store import GitStore
-
-
__all__ = ["FeedParser", "GitStore"]
-
</file>
-
-
<file path="src/thicket/core/feed_parser.py">
-
"""Feed parsing and normalization with auto-discovery."""
-
-
from datetime import datetime
-
from typing import Optional
-
from urllib.parse import urlparse
-
-
import bleach
-
import feedparser
-
import httpx
-
from pydantic import HttpUrl, ValidationError
-
-
from ..models import AtomEntry, FeedMetadata
-
-
-
class FeedParser:
-
"""Parser for RSS/Atom feeds with normalization and auto-discovery."""
-
-
def __init__(self, user_agent: str = "thicket/0.1.0"):
-
"""Initialize the feed parser."""
-
self.user_agent = user_agent
-
self.allowed_tags = [
-
"a", "abbr", "acronym", "b", "blockquote", "br", "code", "em",
-
"i", "li", "ol", "p", "pre", "strong", "ul", "h1", "h2", "h3",
-
"h4", "h5", "h6", "img", "div", "span",
-
]
-
self.allowed_attributes = {
-
"a": ["href", "title"],
-
"abbr": ["title"],
-
"acronym": ["title"],
-
"img": ["src", "alt", "title", "width", "height"],
-
"blockquote": ["cite"],
-
}
-
-
async def fetch_feed(self, url: HttpUrl) -> str:
-
"""Fetch feed content from URL."""
-
async with httpx.AsyncClient() as client:
-
response = await client.get(
-
str(url),
-
headers={"User-Agent": self.user_agent},
-
timeout=30.0,
-
follow_redirects=True,
-
)
-
response.raise_for_status()
-
return response.text
-
-
def parse_feed(self, content: str, source_url: Optional[HttpUrl] = None) -> tuple[FeedMetadata, list[AtomEntry]]:
-
"""Parse feed content and return metadata and entries."""
-
parsed = feedparser.parse(content)
-
-
if parsed.bozo and parsed.bozo_exception:
-
# Try to continue with potentially malformed feed
-
pass
-
-
# Extract feed metadata
-
feed_meta = self._extract_feed_metadata(parsed.feed)
-
-
# Extract and normalize entries
-
entries = []
-
for entry in parsed.entries:
-
try:
-
atom_entry = self._normalize_entry(entry, source_url)
-
entries.append(atom_entry)
-
except Exception as e:
-
# Log error but continue processing other entries
-
print(f"Error processing entry {getattr(entry, 'id', 'unknown')}: {e}")
-
continue
-
-
return feed_meta, entries
-
-
def _extract_feed_metadata(self, feed: feedparser.FeedParserDict) -> FeedMetadata:
-
"""Extract metadata from feed for auto-discovery."""
-
# Parse author information
-
author_name = None
-
author_email = None
-
author_uri = None
-
-
if hasattr(feed, 'author_detail'):
-
author_name = feed.author_detail.get('name')
-
author_email = feed.author_detail.get('email')
-
author_uri = feed.author_detail.get('href')
-
elif hasattr(feed, 'author'):
-
author_name = feed.author
-
-
# Parse managing editor for RSS feeds
-
if not author_email and hasattr(feed, 'managingEditor'):
-
author_email = feed.managingEditor
-
-
# Parse feed link
-
feed_link = None
-
if hasattr(feed, 'link'):
-
try:
-
feed_link = HttpUrl(feed.link)
-
except ValidationError:
-
pass
-
-
# Parse image/icon/logo
-
logo = None
-
icon = None
-
image_url = None
-
-
if hasattr(feed, 'image'):
-
try:
-
image_url = HttpUrl(feed.image.get('href', feed.image.get('url', '')))
-
except (ValidationError, AttributeError):
-
pass
-
-
if hasattr(feed, 'icon'):
-
try:
-
icon = HttpUrl(feed.icon)
-
except ValidationError:
-
pass
-
-
if hasattr(feed, 'logo'):
-
try:
-
logo = HttpUrl(feed.logo)
-
except ValidationError:
-
pass
-
-
return FeedMetadata(
-
title=getattr(feed, 'title', None),
-
author_name=author_name,
-
author_email=author_email,
-
author_uri=HttpUrl(author_uri) if author_uri else None,
-
link=feed_link,
-
logo=logo,
-
icon=icon,
-
image_url=image_url,
-
description=getattr(feed, 'description', None),
-
)
-
-
def _normalize_entry(self, entry: feedparser.FeedParserDict, source_url: Optional[HttpUrl] = None) -> AtomEntry:
-
"""Normalize an entry to Atom format."""
-
# Parse timestamps
-
updated = self._parse_timestamp(entry.get('updated_parsed') or entry.get('published_parsed'))
-
published = self._parse_timestamp(entry.get('published_parsed'))
-
-
# Parse content
-
content = self._extract_content(entry)
-
content_type = self._extract_content_type(entry)
-
-
# Parse author
-
author = self._extract_author(entry)
-
-
# Parse categories/tags
-
categories = []
-
if hasattr(entry, 'tags'):
-
categories = [tag.get('term', '') for tag in entry.tags if tag.get('term')]
-
-
# Sanitize HTML content
-
if content:
-
content = self._sanitize_html(content)
-
-
summary = entry.get('summary', '')
-
if summary:
-
summary = self._sanitize_html(summary)
-
-
return AtomEntry(
-
id=entry.get('id', entry.get('link', '')),
-
title=entry.get('title', ''),
-
link=HttpUrl(entry.get('link', '')),
-
updated=updated,
-
published=published,
-
summary=summary or None,
-
content=content or None,
-
content_type=content_type,
-
author=author,
-
categories=categories,
-
rights=entry.get('rights', None),
-
source=str(source_url) if source_url else None,
-
)
-
-
def _parse_timestamp(self, time_struct) -> datetime:
-
"""Parse feedparser time struct to datetime."""
-
if time_struct:
-
return datetime(*time_struct[:6])
-
return datetime.now()
-
-
def _extract_content(self, entry: feedparser.FeedParserDict) -> Optional[str]:
-
"""Extract the best content from an entry."""
-
# Prefer content over summary
-
if hasattr(entry, 'content') and entry.content:
-
# Find the best content (prefer text/html, then text/plain)
-
for content_item in entry.content:
-
if content_item.get('type') in ['text/html', 'html']:
-
return content_item.get('value', '')
-
elif content_item.get('type') in ['text/plain', 'text']:
-
return content_item.get('value', '')
-
# Fallback to first content item
-
return entry.content[0].get('value', '')
-
-
# Fallback to summary
-
return entry.get('summary', '')
-
-
def _extract_content_type(self, entry: feedparser.FeedParserDict) -> str:
-
"""Extract content type from entry."""
-
if hasattr(entry, 'content') and entry.content:
-
content_type = entry.content[0].get('type', 'html')
-
# Normalize content type
-
if content_type in ['text/html', 'html']:
-
return 'html'
-
elif content_type in ['text/plain', 'text']:
-
return 'text'
-
elif content_type == 'xhtml':
-
return 'xhtml'
-
return 'html'
-
-
def _extract_author(self, entry: feedparser.FeedParserDict) -> Optional[dict]:
-
"""Extract author information from entry."""
-
author = {}
-
-
if hasattr(entry, 'author_detail'):
-
author.update({
-
'name': entry.author_detail.get('name'),
-
'email': entry.author_detail.get('email'),
-
'uri': entry.author_detail.get('href'),
-
})
-
elif hasattr(entry, 'author'):
-
author['name'] = entry.author
-
-
return author if author else None
-
-
def _sanitize_html(self, html: str) -> str:
-
"""Sanitize HTML content to prevent XSS."""
-
return bleach.clean(
-
html,
-
tags=self.allowed_tags,
-
attributes=self.allowed_attributes,
-
strip=True,
-
)
-
-
def sanitize_entry_id(self, entry_id: str) -> str:
-
"""Sanitize entry ID to be a safe filename."""
-
# Parse URL to get meaningful parts
-
parsed = urlparse(entry_id)
-
-
# Start with the path component
-
if parsed.path:
-
# Remove leading slash and replace problematic characters
-
safe_id = parsed.path.lstrip('/').replace('/', '_').replace('\\', '_')
-
else:
-
# Use the entire ID as fallback
-
safe_id = entry_id
-
-
# Replace problematic characters
-
safe_chars = []
-
for char in safe_id:
-
if char.isalnum() or char in '-_.':
-
safe_chars.append(char)
-
else:
-
safe_chars.append('_')
-
-
safe_id = ''.join(safe_chars)
-
-
# Ensure it's not too long (max 200 chars)
-
if len(safe_id) > 200:
-
safe_id = safe_id[:200]
-
-
# Ensure it's not empty
-
if not safe_id:
-
safe_id = "entry"
-
-
return safe_id
-
</file>
-
-
<file path="src/thicket/core/reference_parser.py">
-
"""Reference detection and parsing for blog entries."""
-
-
import re
-
from typing import Optional
-
from urllib.parse import urlparse
-
-
from ..models import AtomEntry
-
-
-
class BlogReference:
-
"""Represents a reference from one blog entry to another."""
-
-
def __init__(
-
self,
-
source_entry_id: str,
-
source_username: str,
-
target_url: str,
-
target_username: Optional[str] = None,
-
target_entry_id: Optional[str] = None,
-
):
-
self.source_entry_id = source_entry_id
-
self.source_username = source_username
-
self.target_url = target_url
-
self.target_username = target_username
-
self.target_entry_id = target_entry_id
-
-
def to_dict(self) -> dict:
-
"""Convert to dictionary for JSON serialization."""
-
result = {
-
"source_entry_id": self.source_entry_id,
-
"source_username": self.source_username,
-
"target_url": self.target_url,
-
}
-
-
# Only include optional fields if they are not None
-
if self.target_username is not None:
-
result["target_username"] = self.target_username
-
if self.target_entry_id is not None:
-
result["target_entry_id"] = self.target_entry_id
-
-
return result
-
-
@classmethod
-
def from_dict(cls, data: dict) -> "BlogReference":
-
"""Create from dictionary."""
-
return cls(
-
source_entry_id=data["source_entry_id"],
-
source_username=data["source_username"],
-
target_url=data["target_url"],
-
target_username=data.get("target_username"),
-
target_entry_id=data.get("target_entry_id"),
-
)
-
-
-
class ReferenceIndex:
-
"""Index of blog-to-blog references for creating threaded views."""
-
-
def __init__(self):
-
self.references: list[BlogReference] = []
-
self.outbound_refs: dict[
-
str, list[BlogReference]
-
] = {} # entry_id -> outbound refs
-
self.inbound_refs: dict[
-
str, list[BlogReference]
-
] = {} # entry_id -> inbound refs
-
self.user_domains: dict[str, set[str]] = {} # username -> set of domains
-
-
def add_reference(self, ref: BlogReference) -> None:
-
"""Add a reference to the index."""
-
self.references.append(ref)
-
-
# Update outbound references
-
source_key = f"{ref.source_username}:{ref.source_entry_id}"
-
if source_key not in self.outbound_refs:
-
self.outbound_refs[source_key] = []
-
self.outbound_refs[source_key].append(ref)
-
-
# Update inbound references if we can identify the target
-
if ref.target_username and ref.target_entry_id:
-
target_key = f"{ref.target_username}:{ref.target_entry_id}"
-
if target_key not in self.inbound_refs:
-
self.inbound_refs[target_key] = []
-
self.inbound_refs[target_key].append(ref)
-
-
def get_outbound_refs(self, username: str, entry_id: str) -> list[BlogReference]:
-
"""Get all outbound references from an entry."""
-
key = f"{username}:{entry_id}"
-
return self.outbound_refs.get(key, [])
-
-
def get_inbound_refs(self, username: str, entry_id: str) -> list[BlogReference]:
-
"""Get all inbound references to an entry."""
-
key = f"{username}:{entry_id}"
-
return self.inbound_refs.get(key, [])
-
-
def get_thread_members(self, username: str, entry_id: str) -> set[tuple[str, str]]:
-
"""Get all entries that are part of the same thread."""
-
visited = set()
-
to_visit = [(username, entry_id)]
-
thread_members = set()
-
-
while to_visit:
-
current_user, current_entry = to_visit.pop()
-
if (current_user, current_entry) in visited:
-
continue
-
-
visited.add((current_user, current_entry))
-
thread_members.add((current_user, current_entry))
-
-
# Add outbound references
-
for ref in self.get_outbound_refs(current_user, current_entry):
-
if ref.target_username and ref.target_entry_id:
-
to_visit.append((ref.target_username, ref.target_entry_id))
-
-
# Add inbound references
-
for ref in self.get_inbound_refs(current_user, current_entry):
-
to_visit.append((ref.source_username, ref.source_entry_id))
-
-
return thread_members
-
-
def to_dict(self) -> dict:
-
"""Convert to dictionary for JSON serialization."""
-
return {
-
"references": [ref.to_dict() for ref in self.references],
-
"user_domains": {k: list(v) for k, v in self.user_domains.items()},
-
}
-
-
@classmethod
-
def from_dict(cls, data: dict) -> "ReferenceIndex":
-
"""Create from dictionary."""
-
index = cls()
-
for ref_data in data.get("references", []):
-
ref = BlogReference.from_dict(ref_data)
-
index.add_reference(ref)
-
-
for username, domains in data.get("user_domains", {}).items():
-
index.user_domains[username] = set(domains)
-
-
return index
-
-
-
class ReferenceParser:
-
"""Parses blog entries to detect references to other blogs."""
-
-
def __init__(self):
-
# Common blog platforms and patterns
-
self.blog_patterns = [
-
r"https?://[^/]+\.(?:org|com|net|io|dev|me|co\.uk)/.*", # Common blog domains
-
r"https?://[^/]+\.github\.io/.*", # GitHub Pages
-
r"https?://[^/]+\.substack\.com/.*", # Substack
-
r"https?://medium\.com/.*", # Medium
-
r"https?://[^/]+\.wordpress\.com/.*", # WordPress.com
-
r"https?://[^/]+\.blogspot\.com/.*", # Blogger
-
]
-
-
# Compile regex patterns
-
self.link_pattern = re.compile(
-
r'<a[^>]+href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL
-
)
-
self.url_pattern = re.compile(r'https?://[^\s<>"]+')
-
-
def extract_links_from_html(self, html_content: str) -> list[tuple[str, str]]:
-
"""Extract all links from HTML content."""
-
links = []
-
-
# Extract links from <a> tags
-
for match in self.link_pattern.finditer(html_content):
-
url = match.group(1)
-
text = re.sub(
-
r"<[^>]+>", "", match.group(2)
-
).strip() # Remove HTML tags from link text
-
links.append((url, text))
-
-
return links
-
-
def is_blog_url(self, url: str) -> bool:
-
"""Check if a URL likely points to a blog post."""
-
for pattern in self.blog_patterns:
-
if re.match(pattern, url):
-
return True
-
return False
-
-
def _is_likely_blog_post_url(self, url: str) -> bool:
-
"""Check if a same-domain URL likely points to a blog post (not CSS, images, etc.)."""
-
parsed_url = urlparse(url)
-
path = parsed_url.path.lower()
-
-
# Skip obvious non-blog content
-
if any(path.endswith(ext) for ext in ['.css', '.js', '.png', '.jpg', '.jpeg', '.gif', '.svg', '.ico', '.pdf', '.xml', '.json']):
-
return False
-
-
# Skip common non-blog paths
-
if any(segment in path for segment in ['/static/', '/assets/', '/css/', '/js/', '/images/', '/img/', '/media/', '/uploads/']):
-
return False
-
-
# Skip fragment-only links (same page anchors)
-
if not path or path == '/':
-
return False
-
-
# Look for positive indicators of blog posts
-
# Common blog post patterns: dates, slugs, post indicators
-
blog_indicators = [
-
r'/\d{4}/', # Year in path
-
r'/\d{4}/\d{2}/', # Year/month in path
-
r'/blog/',
-
r'/post/',
-
r'/posts/',
-
r'/articles?/',
-
r'/notes?/',
-
r'/entries/',
-
r'/writing/',
-
]
-
-
for pattern in blog_indicators:
-
if re.search(pattern, path):
-
return True
-
-
# If it has a reasonable path depth and doesn't match exclusions, likely a blog post
-
path_segments = [seg for seg in path.split('/') if seg]
-
return len(path_segments) >= 1 # At least one meaningful path segment
-
-
def resolve_target_user(
-
self, url: str, user_domains: dict[str, set[str]]
-
) -> Optional[str]:
-
"""Try to resolve a URL to a known user based on domain mapping."""
-
parsed_url = urlparse(url)
-
domain = parsed_url.netloc.lower()
-
-
for username, domains in user_domains.items():
-
if domain in domains:
-
return username
-
-
return None
-
-
def extract_references(
-
self, entry: AtomEntry, username: str, user_domains: dict[str, set[str]]
-
) -> list[BlogReference]:
-
"""Extract all blog references from an entry."""
-
references = []
-
-
# Combine all text content for analysis
-
content_to_search = []
-
if entry.content:
-
content_to_search.append(entry.content)
-
if entry.summary:
-
content_to_search.append(entry.summary)
-
-
for content in content_to_search:
-
links = self.extract_links_from_html(content)
-
-
for url, _link_text in links:
-
entry_domain = (
-
urlparse(str(entry.link)).netloc.lower() if entry.link else ""
-
)
-
link_domain = urlparse(url).netloc.lower()
-
-
# Check if this looks like a blog URL
-
if not self.is_blog_url(url):
-
continue
-
-
# For same-domain links, apply additional filtering to avoid non-blog content
-
if link_domain == entry_domain:
-
# Only include same-domain links that look like blog posts
-
if not self._is_likely_blog_post_url(url):
-
continue
-
-
# Try to resolve to a known user
-
if link_domain == entry_domain:
-
# Same domain - target user is the same as source user
-
target_username: Optional[str] = username
-
else:
-
# Different domain - try to resolve
-
target_username = self.resolve_target_user(url, user_domains)
-
-
ref = BlogReference(
-
source_entry_id=entry.id,
-
source_username=username,
-
target_url=url,
-
target_username=target_username,
-
target_entry_id=None, # Will be resolved later if possible
-
)
-
-
references.append(ref)
-
-
return references
-
-
def build_user_domain_mapping(self, git_store: "GitStore") -> dict[str, set[str]]:
-
"""Build mapping of usernames to their known domains."""
-
user_domains = {}
-
index = git_store._load_index()
-
-
for username, user_metadata in index.users.items():
-
domains = set()
-
-
# Add domains from feeds
-
for feed_url in user_metadata.feeds:
-
domain = urlparse(feed_url).netloc.lower()
-
if domain:
-
domains.add(domain)
-
-
# Add domain from homepage
-
if user_metadata.homepage:
-
domain = urlparse(str(user_metadata.homepage)).netloc.lower()
-
if domain:
-
domains.add(domain)
-
-
user_domains[username] = domains
-
-
return user_domains
-
-
def _build_url_to_entry_mapping(self, git_store: "GitStore") -> dict[str, str]:
-
"""Build a comprehensive mapping from URLs to entry IDs using git store data.
-
-
This creates a bidirectional mapping that handles:
-
- Entry link URLs -> Entry IDs
-
- URL variations (with/without www, http/https)
-
- Multiple URLs pointing to the same entry
-
"""
-
url_to_entry: dict[str, str] = {}
-
-
# Load index to get all users
-
index = git_store._load_index()
-
-
for username in index.users.keys():
-
entries = git_store.list_entries(username)
-
-
for entry in entries:
-
if entry.link:
-
link_url = str(entry.link)
-
entry_id = entry.id
-
-
# Map the canonical link URL
-
url_to_entry[link_url] = entry_id
-
-
# Handle common URL variations
-
parsed = urlparse(link_url)
-
if parsed.netloc and parsed.path:
-
# Add version without www
-
if parsed.netloc.startswith('www.'):
-
no_www_url = f"{parsed.scheme}://{parsed.netloc[4:]}{parsed.path}"
-
if parsed.query:
-
no_www_url += f"?{parsed.query}"
-
if parsed.fragment:
-
no_www_url += f"#{parsed.fragment}"
-
url_to_entry[no_www_url] = entry_id
-
-
# Add version with www if not present
-
elif not parsed.netloc.startswith('www.'):
-
www_url = f"{parsed.scheme}://www.{parsed.netloc}{parsed.path}"
-
if parsed.query:
-
www_url += f"?{parsed.query}"
-
if parsed.fragment:
-
www_url += f"#{parsed.fragment}"
-
url_to_entry[www_url] = entry_id
-
-
# Add http/https variations
-
if parsed.scheme == 'https':
-
http_url = link_url.replace('https://', 'http://', 1)
-
url_to_entry[http_url] = entry_id
-
elif parsed.scheme == 'http':
-
https_url = link_url.replace('http://', 'https://', 1)
-
url_to_entry[https_url] = entry_id
-
-
return url_to_entry
-
-
def _normalize_url(self, url: str) -> str:
-
"""Normalize URL for consistent matching.
-
-
Handles common variations like trailing slashes, fragments, etc.
-
"""
-
parsed = urlparse(url)
-
-
# Remove trailing slash from path
-
path = parsed.path.rstrip('/') if parsed.path != '/' else parsed.path
-
-
# Reconstruct without fragment for consistent matching
-
normalized = f"{parsed.scheme}://{parsed.netloc}{path}"
-
if parsed.query:
-
normalized += f"?{parsed.query}"
-
-
return normalized
-
-
def resolve_target_entry_ids(
-
self, references: list[BlogReference], git_store: "GitStore"
-
) -> list[BlogReference]:
-
"""Resolve target_entry_id for references using comprehensive URL mapping."""
-
resolved_refs = []
-
-
# Build comprehensive URL to entry ID mapping
-
url_to_entry = self._build_url_to_entry_mapping(git_store)
-
-
for ref in references:
-
# If we already have a target_entry_id, keep the reference as-is
-
if ref.target_entry_id is not None:
-
resolved_refs.append(ref)
-
continue
-
-
# If we don't have a target_username, we can't resolve it
-
if ref.target_username is None:
-
resolved_refs.append(ref)
-
continue
-
-
# Try to resolve using URL mapping
-
resolved_entry_id = None
-
-
# First, try exact match
-
if ref.target_url in url_to_entry:
-
resolved_entry_id = url_to_entry[ref.target_url]
-
else:
-
# Try normalized URL matching
-
normalized_target = self._normalize_url(ref.target_url)
-
if normalized_target in url_to_entry:
-
resolved_entry_id = url_to_entry[normalized_target]
-
else:
-
# Try URL variations
-
for mapped_url, entry_id in url_to_entry.items():
-
if self._normalize_url(mapped_url) == normalized_target:
-
resolved_entry_id = entry_id
-
break
-
-
# Verify the resolved entry belongs to the target username
-
if resolved_entry_id:
-
# Double-check by loading the actual entry
-
entries = git_store.list_entries(ref.target_username)
-
entry_found = any(entry.id == resolved_entry_id for entry in entries)
-
if not entry_found:
-
resolved_entry_id = None
-
-
# Create a new reference with the resolved target_entry_id
-
resolved_ref = BlogReference(
-
source_entry_id=ref.source_entry_id,
-
source_username=ref.source_username,
-
target_url=ref.target_url,
-
target_username=ref.target_username,
-
target_entry_id=resolved_entry_id,
-
)
-
resolved_refs.append(resolved_ref)
-
-
return resolved_refs
-
</file>
-
-
<file path="src/thicket/models/__init__.py">
-
"""Data models for thicket."""
-
-
from .config import ThicketConfig, UserConfig
-
from .feed import AtomEntry, DuplicateMap, FeedMetadata
-
from .user import GitStoreIndex, UserMetadata
-
-
__all__ = [
-
"ThicketConfig",
-
"UserConfig",
-
"AtomEntry",
-
"DuplicateMap",
-
"FeedMetadata",
-
"GitStoreIndex",
-
"UserMetadata",
-
]
-
</file>
-
-
<file path="src/thicket/models/feed.py">
-
"""Feed and entry models for thicket."""
-
-
from datetime import datetime
-
from typing import TYPE_CHECKING, Optional
-
-
from pydantic import BaseModel, ConfigDict, EmailStr, HttpUrl
-
-
if TYPE_CHECKING:
-
from .config import UserConfig
-
-
-
class AtomEntry(BaseModel):
-
"""Represents an Atom feed entry stored in the Git repository."""
-
-
model_config = ConfigDict(
-
json_encoders={datetime: lambda v: v.isoformat()},
-
str_strip_whitespace=True,
-
)
-
-
id: str # Original Atom ID
-
title: str
-
link: HttpUrl
-
updated: datetime
-
published: Optional[datetime] = None
-
summary: Optional[str] = None
-
content: Optional[str] = None # Full body content from Atom entry
-
content_type: Optional[str] = "html" # text, html, xhtml
-
author: Optional[dict] = None
-
categories: list[str] = []
-
rights: Optional[str] = None # Copyright info
-
source: Optional[str] = None # Source feed URL
-
-
-
class FeedMetadata(BaseModel):
-
"""Metadata extracted from a feed for auto-discovery."""
-
-
title: Optional[str] = None
-
author_name: Optional[str] = None
-
author_email: Optional[EmailStr] = None
-
author_uri: Optional[HttpUrl] = None
-
link: Optional[HttpUrl] = None
-
logo: Optional[HttpUrl] = None
-
icon: Optional[HttpUrl] = None
-
image_url: Optional[HttpUrl] = None
-
description: Optional[str] = None
-
-
def to_user_config(self, username: str, feed_url: HttpUrl) -> "UserConfig":
-
"""Convert discovered metadata to UserConfig with fallbacks."""
-
from .config import UserConfig
-
-
return UserConfig(
-
username=username,
-
feeds=[feed_url],
-
display_name=self.author_name or self.title,
-
email=self.author_email,
-
homepage=self.author_uri or self.link,
-
icon=self.logo or self.icon or self.image_url,
-
)
-
-
-
class DuplicateMap(BaseModel):
-
"""Maps duplicate entry IDs to canonical entry IDs."""
-
-
duplicates: dict[str, str] = {} # duplicate_id -> canonical_id
-
comment: str = "Entry IDs that map to the same canonical content"
-
-
def add_duplicate(self, duplicate_id: str, canonical_id: str) -> None:
-
"""Add a duplicate mapping."""
-
self.duplicates[duplicate_id] = canonical_id
-
-
def remove_duplicate(self, duplicate_id: str) -> bool:
-
"""Remove a duplicate mapping. Returns True if existed."""
-
return self.duplicates.pop(duplicate_id, None) is not None
-
-
def get_canonical(self, entry_id: str) -> str:
-
"""Get canonical ID for an entry (returns original if not duplicate)."""
-
return self.duplicates.get(entry_id, entry_id)
-
-
def is_duplicate(self, entry_id: str) -> bool:
-
"""Check if entry ID is marked as duplicate."""
-
return entry_id in self.duplicates
-
-
def get_duplicates_for_canonical(self, canonical_id: str) -> list[str]:
-
"""Get all duplicate IDs that map to a canonical ID."""
-
return [
-
duplicate_id
-
for duplicate_id, canonical in self.duplicates.items()
-
if canonical == canonical_id
-
]
-
</file>
-
-
<file path="src/thicket/models/user.py">
-
"""User metadata models for thicket."""
-
-
from datetime import datetime
-
from typing import Optional
-
-
from pydantic import BaseModel, ConfigDict
-
-
-
class UserMetadata(BaseModel):
-
"""Metadata about a user stored in the Git repository."""
-
-
model_config = ConfigDict(
-
json_encoders={datetime: lambda v: v.isoformat()},
-
str_strip_whitespace=True,
-
)
-
-
username: str
-
display_name: Optional[str] = None
-
email: Optional[str] = None
-
homepage: Optional[str] = None
-
icon: Optional[str] = None
-
feeds: list[str] = []
-
directory: str # Directory name in Git store
-
created: datetime
-
last_updated: datetime
-
entry_count: int = 0
-
-
def update_timestamp(self) -> None:
-
"""Update the last_updated timestamp to now."""
-
self.last_updated = datetime.now()
-
-
def increment_entry_count(self, count: int = 1) -> None:
-
"""Increment the entry count by the given amount."""
-
self.entry_count += count
-
self.update_timestamp()
-
-
-
class GitStoreIndex(BaseModel):
-
"""Index of all users and their directories in the Git store."""
-
-
model_config = ConfigDict(
-
json_encoders={datetime: lambda v: v.isoformat()}
-
)
-
-
users: dict[str, UserMetadata] = {} # username -> UserMetadata
-
created: datetime
-
last_updated: datetime
-
total_entries: int = 0
-
-
def add_user(self, user_metadata: UserMetadata) -> None:
-
"""Add or update a user in the index."""
-
self.users[user_metadata.username] = user_metadata
-
self.last_updated = datetime.now()
-
-
def remove_user(self, username: str) -> bool:
-
"""Remove a user from the index. Returns True if user existed."""
-
if username in self.users:
-
del self.users[username]
-
self.last_updated = datetime.now()
-
return True
-
return False
-
-
def get_user(self, username: str) -> Optional[UserMetadata]:
-
"""Get user metadata by username."""
-
return self.users.get(username)
-
-
def update_entry_count(self, username: str, count: int) -> None:
-
"""Update entry count for a user and total."""
-
user = self.get_user(username)
-
if user:
-
user.increment_entry_count(count)
-
self.total_entries += count
-
self.last_updated = datetime.now()
-
-
def recalculate_totals(self) -> None:
-
"""Recalculate total entries from all users."""
-
self.total_entries = sum(user.entry_count for user in self.users.values())
-
self.last_updated = datetime.now()
-
</file>
-
-
<file path="src/thicket/utils/__init__.py">
-
"""Utility modules for thicket."""
-
-
# This module will contain shared utilities
-
# For now, it's empty but can be expanded with common functions
-
</file>
-
-
<file path="src/thicket/__init__.py">
-
"""Thicket: A CLI tool for persisting Atom/RSS feeds in Git repositories."""
-
-
__version__ = "0.1.0"
-
__author__ = "thicket"
-
__email__ = "thicket@example.com"
-
</file>
-
-
<file path="src/thicket/__main__.py">
-
"""Entry point for running thicket as a module."""
-
-
from .cli.main import app
-
-
if __name__ == "__main__":
-
app()
-
</file>
-
-
<file path=".gitignore">
-
# Byte-compiled / optimized / DLL files
-
__pycache__/
-
*.py[codz]
-
*$py.class
-
-
# C extensions
-
*.so
-
-
# Distribution / packaging
-
.Python
-
build/
-
develop-eggs/
-
dist/
-
downloads/
-
eggs/
-
.eggs/
-
lib/
-
lib64/
-
parts/
-
sdist/
-
var/
-
wheels/
-
share/python-wheels/
-
*.egg-info/
-
.installed.cfg
-
*.egg
-
MANIFEST
-
-
# PyInstaller
-
# Usually these files are written by a python script from a template
-
# before PyInstaller builds the exe, so as to inject date/other infos into it.
-
*.manifest
-
*.spec
-
-
# Installer logs
-
pip-log.txt
-
pip-delete-this-directory.txt
-
-
# Unit test / coverage reports
-
htmlcov/
-
.tox/
-
.nox/
-
.coverage
-
.coverage.*
-
.cache
-
nosetests.xml
-
coverage.xml
-
*.cover
-
*.py.cover
-
.hypothesis/
-
.pytest_cache/
-
cover/
-
-
# Translations
-
*.mo
-
*.pot
-
-
# Django stuff:
-
*.log
-
local_settings.py
-
db.sqlite3
-
db.sqlite3-journal
-
-
# Flask stuff:
-
instance/
-
.webassets-cache
-
-
# Scrapy stuff:
-
.scrapy
-
-
# Sphinx documentation
-
docs/_build/
-
-
# PyBuilder
-
.pybuilder/
-
target/
-
-
# Jupyter Notebook
-
.ipynb_checkpoints
-
-
# IPython
-
profile_default/
-
ipython_config.py
-
-
# pyenv
-
# For a library or package, you might want to ignore these files since the code is
-
# intended to run in multiple environments; otherwise, check them in:
-
# .python-version
-
-
# pipenv
-
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
-
# However, in case of collaboration, if having platform-specific dependencies or dependencies
-
# having no cross-platform support, pipenv may install dependencies that don't work, or not
-
# install all needed dependencies.
-
#Pipfile.lock
-
-
# UV
-
# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
-
# This is especially recommended for binary packages to ensure reproducibility, and is more
-
# commonly ignored for libraries.
-
#uv.lock
-
-
# poetry
-
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
-
# This is especially recommended for binary packages to ensure reproducibility, and is more
-
# commonly ignored for libraries.
-
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
-
#poetry.lock
-
#poetry.toml
-
-
# pdm
-
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
-
# pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
-
# https://pdm-project.org/en/latest/usage/project/#working-with-version-control
-
#pdm.lock
-
#pdm.toml
-
.pdm-python
-
.pdm-build/
-
-
# pixi
-
# Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
-
#pixi.lock
-
# Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
-
# in the .venv directory. It is recommended not to include this directory in version control.
-
.pixi
-
-
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
-
__pypackages__/
-
-
# Celery stuff
-
celerybeat-schedule
-
celerybeat.pid
-
-
# SageMath parsed files
-
*.sage.py
-
-
# Environments
-
.env
-
.envrc
-
.venv
-
env/
-
venv/
-
ENV/
-
env.bak/
-
venv.bak/
-
-
# Spyder project settings
-
.spyderproject
-
.spyproject
-
-
# Rope project settings
-
.ropeproject
-
-
# mkdocs documentation
-
/site
-
-
# mypy
-
.mypy_cache/
-
.dmypy.json
-
dmypy.json
-
-
# Pyre type checker
-
.pyre/
-
-
# pytype static type analyzer
-
.pytype/
-
-
# Cython debug symbols
-
cython_debug/
-
-
# PyCharm
-
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
-
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
-
# and can be added to the global gitignore or merged into this file. For a more nuclear
-
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
-
#.idea/
-
-
# Abstra
-
# Abstra is an AI-powered process automation framework.
-
# Ignore directories containing user credentials, local state, and settings.
-
# Learn more at https://abstra.io/docs
-
.abstra/
-
-
# Visual Studio Code
-
# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
-
# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
-
# and can be added to the global gitignore or merged into this file. However, if you prefer,
-
# you could uncomment the following to ignore the entire vscode folder
-
# .vscode/
-
-
# Ruff stuff:
-
.ruff_cache/
-
-
# PyPI configuration file
-
.pypirc
-
-
# Marimo
-
marimo/_static/
-
marimo/_lsp/
-
__marimo__/
-
-
# Streamlit
-
.streamlit/secrets.toml
-
-
thicket.yaml
-
</file>
-
-
<file path="CLAUDE.md">
-
My goal is to build a CLI tool called thicket in Python that maintains a Git repository within which Atom feeds can be persisted, including their contents.
-
-
# Python Environment and Package Management
-
-
This project uses `uv` for Python package management and virtual environment handling.
-
-
## Running Commands
-
-
ALWAYS use `uv run` to execute Python commands:
-
-
- Run the CLI: `uv run -m thicket`
-
- Run tests: `uv run pytest`
-
- Type checking: `uv run mypy src/`
-
- Linting: `uv run ruff check src/`
-
- Format code: `uv run ruff format src/`
-
- Compile check: `uv run python -m py_compile <file>`
-
-
## Package Management
-
-
- Add dependencies: `uv add <package>`
-
- Add dev dependencies: `uv add --dev <package>`
-
- Install dependencies: `uv sync`
-
- Update dependencies: `uv lock --upgrade`
-
-
# Project Structure
-
-
The configuration file specifies:
-
- the location of a git store
-
- a list of usernames and target Atom/RSS feed(s) and optional metadata about the username such as their email, homepage, icon and display name
-
- a cache directory to store temporary results such as feed downloads and their last modification date that speed up operations across runs of the tool
-
-
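For example, a minimal `thicket.yaml` might look like this (an illustrative sketch; the field names follow the `ThicketConfig` and `UserConfig` models, and the values are hypothetical):

```yaml
git_store: ./store   # location of the Git data store
cache_dir: ./cache   # cache for feed downloads and last-modified data
users:
  - username: alice
    display_name: Alice Example
    homepage: https://alice.example.com
    feeds:
      - https://alice.example.com/atom.xml
```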
The Git data store should:
-
- have a subdirectory per user
-
- within that directory, an entry file per Atom entry, indexed by the Atom id for that entry. The id should be sanitized consistently into a safe filename. RSS feeds should be normalized to Atom before storing.
-
- within each entry file, the metadata of the Atom feed converted into a JSON format that preserves as much metadata as possible.
-
- have a JSON file in the Git repository that indexes the users, their associated directories within the Git repository, and any other metadata about that user from the config file
-
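Concretely, the resulting layout looks roughly like this (a sketch based on the GitStore implementation; the username and entry filename are hypothetical):

```
store/
├── index.json        # indexes users, directories, and entry counts
├── duplicates.json   # maps duplicate entry IDs to canonical ones
└── alice/
    └── <sanitized-atom-id>.json   # one JSON file per Atom entry
```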
The CLI should be modern and use cool progress bars and other niceties from ecosystem libraries.
-
-
The intention behind the Git repository is that it can be queried by other websites in order to build a weblog structure of comments that link to other blogs.
-
</file>
-
-
<file path="pyproject.toml">
-
[build-system]
-
requires = ["hatchling"]
-
build-backend = "hatchling.build"
-
-
[project]
-
name = "thicket"
-
dynamic = ["version"]
-
description = "A CLI tool for persisting Atom/RSS feeds in Git repositories"
-
readme = "README.md"
-
license = "MIT"
-
requires-python = ">=3.9"
-
authors = [
-
{name = "thicket", email = "thicket@example.com"},
-
]
-
classifiers = [
-
"Development Status :: 3 - Alpha",
-
"Intended Audience :: Developers",
-
"License :: OSI Approved :: MIT License",
-
"Operating System :: OS Independent",
-
"Programming Language :: Python :: 3",
-
"Programming Language :: Python :: 3.9",
-
"Programming Language :: Python :: 3.10",
-
"Programming Language :: Python :: 3.11",
-
"Programming Language :: Python :: 3.12",
-
"Programming Language :: Python :: 3.13",
-
"Topic :: Internet :: WWW/HTTP :: Dynamic Content :: News/Diary",
-
"Topic :: Software Development :: Version Control :: Git",
-
"Topic :: Text Processing :: Markup :: XML",
-
]
-
dependencies = [
-
"typer>=0.15.0",
-
"rich>=13.0.0",
-
"GitPython>=3.1.40",
-
"feedparser>=6.0.11",
-
"pydantic>=2.11.0",
-
"pydantic-settings>=2.10.0",
-
"httpx>=0.28.0",
-
"pendulum>=3.0.0",
-
"bleach>=6.0.0",
-
"platformdirs>=4.0.0",
-
"pyyaml>=6.0.0",
-
"email_validator",
-
"jinja2>=3.1.6",
-
]
-
-
[project.optional-dependencies]
-
dev = [
-
"pytest>=8.0.0",
-
"pytest-asyncio>=0.24.0",
-
"pytest-cov>=6.0.0",
-
"black>=24.0.0",
-
"ruff>=0.8.0",
-
"mypy>=1.13.0",
-
"types-PyYAML>=6.0.0",
-
]
-
-
[project.urls]
-
Homepage = "https://github.com/example/thicket"
-
Documentation = "https://github.com/example/thicket"
-
Repository = "https://github.com/example/thicket"
-
"Bug Tracker" = "https://github.com/example/thicket/issues"
-
-
[project.scripts]
-
thicket = "thicket.cli.main:app"
-
-
[tool.hatch.version]
-
path = "src/thicket/__init__.py"
-
-
[tool.hatch.build.targets.wheel]
-
packages = ["src/thicket"]
-
-
[tool.black]
-
line-length = 88
-
target-version = ['py39']
-
include = '\.pyi?$'
-
extend-exclude = '''
-
/(
-
# directories
-
\.eggs
-
| \.git
-
| \.hg
-
| \.mypy_cache
-
| \.tox
-
| \.venv
-
| build
-
| dist
-
)/
-
'''
-
-
[tool.ruff]
-
target-version = "py39"
-
line-length = 88
-
-
[tool.ruff.lint]
-
select = [
-
"E", # pycodestyle errors
-
"W", # pycodestyle warnings
-
"F", # pyflakes
-
"I", # isort
-
"B", # flake8-bugbear
-
"C4", # flake8-comprehensions
-
"UP", # pyupgrade
-
]
-
ignore = [
-
"E501", # line too long, handled by black
-
"B008", # do not perform function calls in argument defaults
-
"C901", # too complex
-
]
-
-
[tool.ruff.lint.per-file-ignores]
-
"__init__.py" = ["F401"]
-
-
[tool.mypy]
-
python_version = "3.9"
-
check_untyped_defs = true
-
disallow_any_generics = true
-
disallow_incomplete_defs = true
-
disallow_untyped_defs = true
-
no_implicit_optional = true
-
warn_redundant_casts = true
-
warn_unused_ignores = true
-
warn_return_any = true
-
strict_optional = true
-
-
[[tool.mypy.overrides]]
-
module = [
-
"feedparser",
-
"git",
-
"bleach",
-
]
-
ignore_missing_imports = true
-
-
[tool.pytest.ini_options]
-
testpaths = ["tests"]
-
python_files = ["test_*.py"]
-
python_classes = ["Test*"]
-
python_functions = ["test_*"]
-
addopts = [
-
"-ra",
-
"--strict-markers",
-
"--strict-config",
-
"--cov=src/thicket",
-
"--cov-report=term-missing",
-
"--cov-report=html",
-
"--cov-report=xml",
-
]
-
filterwarnings = [
-
"error",
-
"ignore::UserWarning",
-
"ignore::DeprecationWarning",
-
]
-
markers = [
-
"slow: marks tests as slow (deselect with '-m \"not slow\"')",
-
"integration: marks tests as integration tests",
-
]
-
-
[tool.coverage.run]
-
source = ["src"]
-
branch = true
-
-
[tool.coverage.report]
-
exclude_lines = [
-
"pragma: no cover",
-
"def __repr__",
-
"if self.debug:",
-
"if settings.DEBUG",
-
"raise AssertionError",
-
"raise NotImplementedError",
-
"if 0:",
-
"if __name__ == .__main__.:",
-
"class .*\\bProtocol\\):",
-
"@(abc\\.)?abstractmethod",
-
]
-
</file>
-
-
<file path="src/thicket/cli/commands/__init__.py">
-
"""CLI commands for thicket."""
-
-
# Import all commands to register them with the main app
-
from . import add, duplicates, generate, index_cmd, info_cmd, init, links_cmd, list_cmd, sync
-
-
__all__ = ["add", "duplicates", "generate", "index_cmd", "info_cmd", "init", "links_cmd", "list_cmd", "sync"]
-
</file>
-
-
<file path="src/thicket/cli/commands/add.py">
-
"""Add command for thicket."""
-
-
import asyncio
-
from pathlib import Path
-
from typing import Optional
-
-
import typer
-
from pydantic import HttpUrl, ValidationError
-
-
from ...core.feed_parser import FeedParser
-
from ...core.git_store import GitStore
from ...models import FeedMetadata
-
from ..main import app
-
from ..utils import (
-
create_progress,
-
load_config,
-
print_error,
-
print_info,
-
print_success,
-
)
-
-
-
@app.command("add")
-
def add_command(
-
subcommand: str = typer.Argument(..., help="Subcommand: 'user' or 'feed'"),
-
username: str = typer.Argument(..., help="Username"),
-
feed_url: Optional[str] = typer.Argument(None, help="Feed URL (required for 'user' command)"),
-
email: Optional[str] = typer.Option(None, "--email", "-e", help="User email"),
-
homepage: Optional[str] = typer.Option(None, "--homepage", "-h", help="User homepage"),
-
icon: Optional[str] = typer.Option(None, "--icon", "-i", help="User icon URL"),
-
display_name: Optional[str] = typer.Option(None, "--display-name", "-d", help="User display name"),
-
config_file: Optional[Path] = typer.Option(
-
Path("thicket.yaml"), "--config", help="Configuration file path"
-
),
-
auto_discover: bool = typer.Option(
-
True, "--auto-discover/--no-auto-discover", help="Auto-discover user metadata from feed"
-
),
-
) -> None:
-
"""Add a user or feed to thicket."""
-
-
if subcommand == "user":
-
add_user(username, feed_url, email, homepage, icon, display_name, config_file, auto_discover)
-
elif subcommand == "feed":
-
add_feed(username, feed_url, config_file)
-
else:
-
print_error(f"Unknown subcommand: {subcommand}")
-
print_error("Use 'user' or 'feed'")
-
raise typer.Exit(1)
-
-
-
def add_user(
-
username: str,
-
feed_url: Optional[str],
-
email: Optional[str],
-
homepage: Optional[str],
-
icon: Optional[str],
-
display_name: Optional[str],
-
config_file: Path,
-
auto_discover: bool,
-
) -> None:
-
"""Add a new user with feed."""
-
-
if not feed_url:
-
print_error("Feed URL is required when adding a user")
-
raise typer.Exit(1)
-
-
# Validate feed URL
-
try:
-
validated_feed_url = HttpUrl(feed_url)
-
except ValidationError:
-
print_error(f"Invalid feed URL: {feed_url}")
-
raise typer.Exit(1) from None
-
-
# Load configuration
-
config = load_config(config_file)
-
-
# Initialize Git store
-
git_store = GitStore(config.git_store)
-
-
# Check if user already exists
-
existing_user = git_store.get_user(username)
-
if existing_user:
-
print_error(f"User '{username}' already exists")
-
print_error("Use 'thicket add feed' to add additional feeds")
-
raise typer.Exit(1)
-
-
# Auto-discover metadata if enabled
-
discovered_metadata = None
-
if auto_discover:
-
discovered_metadata = asyncio.run(discover_feed_metadata(validated_feed_url))
-
-
# Prepare user data with manual overrides taking precedence
-
user_display_name = display_name or (discovered_metadata.author_name or discovered_metadata.title if discovered_metadata else None)
-
user_email = email or (discovered_metadata.author_email if discovered_metadata else None)
-
homepage_source = (discovered_metadata.author_uri or discovered_metadata.link) if discovered_metadata else None
user_homepage = homepage or (str(homepage_source) if homepage_source else None)  # avoid stringifying None
-
icon_source = (discovered_metadata.logo or discovered_metadata.icon or discovered_metadata.image_url) if discovered_metadata else None
user_icon = icon or (str(icon_source) if icon_source else None)  # avoid stringifying None
-
-
# Add user to Git store
-
git_store.add_user(
-
username=username,
-
display_name=user_display_name,
-
email=user_email,
-
homepage=user_homepage,
-
icon=user_icon,
-
feeds=[str(validated_feed_url)],
-
)
-
-
# Commit changes
-
git_store.commit_changes(f"Add user: {username}")
-
-
print_success(f"Added user '{username}' with feed: {feed_url}")
-
-
if discovered_metadata and auto_discover:
-
print_info("Auto-discovered metadata:")
-
if user_display_name:
-
print_info(f" Display name: {user_display_name}")
-
if user_email:
-
print_info(f" Email: {user_email}")
-
if user_homepage:
-
print_info(f" Homepage: {user_homepage}")
-
if user_icon:
-
print_info(f" Icon: {user_icon}")
-
-
-
def add_feed(username: str, feed_url: Optional[str], config_file: Path) -> None:
-
"""Add a feed to an existing user."""
-
-
if not feed_url:
-
print_error("Feed URL is required")
-
raise typer.Exit(1)
-
-
# Validate feed URL
-
try:
-
validated_feed_url = HttpUrl(feed_url)
-
except ValidationError:
-
print_error(f"Invalid feed URL: {feed_url}")
-
raise typer.Exit(1) from None
-
-
# Load configuration
-
config = load_config(config_file)
-
-
# Initialize Git store
-
git_store = GitStore(config.git_store)
-
-
# Check if user exists
-
user = git_store.get_user(username)
-
if not user:
-
print_error(f"User '{username}' not found")
-
print_error("Use 'thicket add user' to add a new user")
-
raise typer.Exit(1)
-
-
# Check if feed already exists
-
if str(validated_feed_url) in user.feeds:
-
print_error(f"Feed already exists for user '{username}': {feed_url}")
-
raise typer.Exit(1)
-
-
# Add feed to user
-
updated_feeds = user.feeds + [str(validated_feed_url)]
-
if git_store.update_user(username, feeds=updated_feeds):
-
git_store.commit_changes(f"Add feed to user {username}: {feed_url}")
-
print_success(f"Added feed to user '{username}': {feed_url}")
-
else:
-
print_error(f"Failed to add feed to user '{username}'")
-
raise typer.Exit(1)
-
-
-
async def discover_feed_metadata(feed_url: HttpUrl) -> Optional[FeedMetadata]:
-
"""Discover metadata from a feed URL."""
-
try:
-
with create_progress() as progress:
-
task = progress.add_task("Discovering feed metadata...", total=None)
-
-
parser = FeedParser()
-
content = await parser.fetch_feed(feed_url)
-
metadata, _ = parser.parse_feed(content, feed_url)
-
-
progress.update(task, completed=True)
-
return metadata
-
-
except Exception as e:
-
print_error(f"Failed to discover feed metadata: {e}")
-
return None
-
</file>
-
-
<file path="src/thicket/cli/commands/duplicates.py">
-
"""Duplicates command for thicket."""
-
-
from pathlib import Path
-
from typing import Optional
-
-
import typer
-
from rich.table import Table
-
-
from ...core.git_store import GitStore
-
from ..main import app
-
from ..utils import (
-
console,
-
load_config,
-
print_error,
-
print_info,
-
print_success,
-
get_tsv_mode,
-
)
-
-
-
@app.command("duplicates")
-
def duplicates_command(
-
action: str = typer.Argument(..., help="Action: 'list', 'add', 'remove'"),
-
duplicate_id: Optional[str] = typer.Argument(None, help="Duplicate entry ID"),
-
canonical_id: Optional[str] = typer.Argument(None, help="Canonical entry ID"),
-
config_file: Optional[Path] = typer.Option(
-
Path("thicket.yaml"), "--config", help="Configuration file path"
-
),
-
) -> None:
-
"""Manage duplicate entry mappings."""
-
-
# Load configuration
-
config = load_config(config_file)
-
-
# Initialize Git store
-
git_store = GitStore(config.git_store)
-
-
if action == "list":
-
list_duplicates(git_store)
-
elif action == "add":
-
add_duplicate(git_store, duplicate_id, canonical_id)
-
elif action == "remove":
-
remove_duplicate(git_store, duplicate_id)
-
else:
-
print_error(f"Unknown action: {action}")
-
print_error("Use 'list', 'add', or 'remove'")
-
raise typer.Exit(1)
-
-
-
def list_duplicates(git_store: GitStore) -> None:
-
"""List all duplicate mappings."""
-
duplicates = git_store.get_duplicates()
-
-
if not duplicates.duplicates:
-
if get_tsv_mode():
-
print("No duplicate mappings found")
-
else:
-
print_info("No duplicate mappings found")
-
return
-
-
if get_tsv_mode():
-
print("Duplicate ID\tCanonical ID")
-
for duplicate_id, canonical_id in duplicates.duplicates.items():
-
print(f"{duplicate_id}\t{canonical_id}")
-
print(f"Total duplicates: {len(duplicates.duplicates)}")
-
else:
-
table = Table(title="Duplicate Entry Mappings")
-
table.add_column("Duplicate ID", style="red")
-
table.add_column("Canonical ID", style="green")
-
-
for duplicate_id, canonical_id in duplicates.duplicates.items():
-
table.add_row(duplicate_id, canonical_id)
-
-
console.print(table)
-
print_info(f"Total duplicates: {len(duplicates.duplicates)}")
-
-
-
def add_duplicate(git_store: GitStore, duplicate_id: Optional[str], canonical_id: Optional[str]) -> None:
-
"""Add a duplicate mapping."""
-
if not duplicate_id:
-
print_error("Duplicate ID is required")
-
raise typer.Exit(1)
-
-
if not canonical_id:
-
print_error("Canonical ID is required")
-
raise typer.Exit(1)
-
-
# Check if duplicate_id already exists
-
duplicates = git_store.get_duplicates()
-
if duplicates.is_duplicate(duplicate_id):
-
existing_canonical = duplicates.get_canonical(duplicate_id)
-
print_error(f"Duplicate ID already mapped to: {existing_canonical}")
-
print_error("Use 'remove' first to change the mapping")
-
raise typer.Exit(1)
-
-
# Check if we're trying to make a canonical ID point to itself
-
if duplicate_id == canonical_id:
-
print_error("Duplicate ID cannot be the same as canonical ID")
-
raise typer.Exit(1)
-
-
# Add the mapping
-
git_store.add_duplicate(duplicate_id, canonical_id)
-
-
# Commit changes
-
git_store.commit_changes(f"Add duplicate mapping: {duplicate_id} -> {canonical_id}")
-
-
print_success(f"Added duplicate mapping: {duplicate_id} -> {canonical_id}")
-
-
-
def remove_duplicate(git_store: GitStore, duplicate_id: Optional[str]) -> None:
-
"""Remove a duplicate mapping."""
-
if not duplicate_id:
-
print_error("Duplicate ID is required")
-
raise typer.Exit(1)
-
-
# Check if mapping exists
-
duplicates = git_store.get_duplicates()
-
if not duplicates.is_duplicate(duplicate_id):
-
print_error(f"No duplicate mapping found for: {duplicate_id}")
-
raise typer.Exit(1)
-
-
canonical_id = duplicates.get_canonical(duplicate_id)
-
-
# Remove the mapping
-
if git_store.remove_duplicate(duplicate_id):
-
# Commit changes
-
git_store.commit_changes(f"Remove duplicate mapping: {duplicate_id} -> {canonical_id}")
-
print_success(f"Removed duplicate mapping: {duplicate_id} -> {canonical_id}")
-
else:
-
print_error(f"Failed to remove duplicate mapping: {duplicate_id}")
-
raise typer.Exit(1)
-
</file>
-
-
<file path="src/thicket/cli/commands/sync.py">
-
"""Sync command for thicket."""
-
-
import asyncio
-
from pathlib import Path
-
from typing import Optional
-
-
import typer
-
from rich.progress import track
-
-
from ...core.feed_parser import FeedParser
-
from ...core.git_store import GitStore
-
from ..main import app
-
from ..utils import (
-
load_config,
-
print_error,
-
print_info,
-
print_success,
-
)
-
-
-
@app.command()
-
def sync(
-
all_users: bool = typer.Option(
-
False, "--all", "-a", help="Sync all users and feeds"
-
),
-
user: Optional[str] = typer.Option(
-
None, "--user", "-u", help="Sync specific user only"
-
),
-
config_file: Optional[Path] = typer.Option(
-
Path("thicket.yaml"), "--config", help="Configuration file path"
-
),
-
dry_run: bool = typer.Option(
-
False, "--dry-run", help="Show what would be synced without making changes"
-
),
-
) -> None:
-
"""Sync feeds and store entries in Git repository."""
-
-
# Load configuration
-
config = load_config(config_file)
-
-
# Initialize Git store
-
git_store = GitStore(config.git_store)
-
-
# Determine which users to sync from git repository
-
users_to_sync = []
-
if all_users:
-
index = git_store._load_index()
-
users_to_sync = list(index.users.values())
-
elif user:
-
user_metadata = git_store.get_user(user)
-
if not user_metadata:
-
print_error(f"User '{user}' not found in git repository")
-
raise typer.Exit(1)
-
users_to_sync = [user_metadata]
-
else:
-
print_error("Specify --all to sync all users or --user to sync a specific user")
-
raise typer.Exit(1)
-
-
if not users_to_sync:
-
print_info("No users configured to sync")
-
return
-
-
# Sync each user
-
total_new_entries = 0
-
total_updated_entries = 0
-
-
for user_metadata in users_to_sync:
-
print_info(f"Syncing user: {user_metadata.username}")
-
-
user_new_entries = 0
-
user_updated_entries = 0
-
-
# Sync each feed for the user
-
for feed_url in track(user_metadata.feeds, description=f"Syncing {user_metadata.username}'s feeds"):
-
try:
-
new_entries, updated_entries = asyncio.run(
-
sync_feed(git_store, user_metadata.username, feed_url, dry_run)
-
)
-
user_new_entries += new_entries
-
user_updated_entries += updated_entries
-
-
except Exception as e:
-
print_error(f"Failed to sync feed {feed_url}: {e}")
-
continue
-
-
print_info(f"User {user_metadata.username}: {user_new_entries} new, {user_updated_entries} updated")
-
total_new_entries += user_new_entries
-
total_updated_entries += user_updated_entries
-
-
# Commit changes if not dry run
-
if not dry_run and (total_new_entries > 0 or total_updated_entries > 0):
-
commit_message = f"Sync feeds: {total_new_entries} new entries, {total_updated_entries} updated"
-
git_store.commit_changes(commit_message)
-
print_success(f"Committed changes: {commit_message}")
-
-
# Summary
-
if dry_run:
-
print_info(f"Dry run complete: would sync {total_new_entries} new entries, {total_updated_entries} updated")
-
else:
-
print_success(f"Sync complete: {total_new_entries} new entries, {total_updated_entries} updated")
-
-
-
async def sync_feed(git_store: GitStore, username: str, feed_url: str, dry_run: bool) -> tuple[int, int]:
-
"""Sync a single feed for a user."""
-
-
parser = FeedParser()
-
-
try:
-
# Fetch and parse feed
-
content = await parser.fetch_feed(feed_url)
-
metadata, entries = parser.parse_feed(content, feed_url)
-
-
new_entries = 0
-
updated_entries = 0
-
-
# Process each entry
-
for entry in entries:
-
try:
-
# Check if entry already exists
-
existing_entry = git_store.get_entry(username, entry.id)
-
-
if existing_entry:
-
# Check if entry has been updated
-
if existing_entry.updated != entry.updated:
-
if not dry_run:
-
git_store.store_entry(username, entry)
-
updated_entries += 1
-
else:
-
# New entry
-
if not dry_run:
-
git_store.store_entry(username, entry)
-
new_entries += 1
-
-
except Exception as e:
-
print_error(f"Failed to process entry {entry.id}: {e}")
-
continue
-
-
return new_entries, updated_entries
-
-
except Exception as e:
-
print_error(f"Failed to sync feed {feed_url}: {e}")
-
return 0, 0
-
</file>
-
-
<file path="src/thicket/models/config.py">
-
"""Configuration models for thicket."""
-
-
from pathlib import Path
-
from typing import Optional
-
-
from pydantic import BaseModel, EmailStr, HttpUrl
-
from pydantic_settings import BaseSettings, SettingsConfigDict
-
-
-
class UserConfig(BaseModel):
-
"""Configuration for a single user and their feeds."""
-
-
username: str
-
feeds: list[HttpUrl]
-
email: Optional[EmailStr] = None
-
homepage: Optional[HttpUrl] = None
-
icon: Optional[HttpUrl] = None
-
display_name: Optional[str] = None
-
-
-
class ThicketConfig(BaseSettings):
-
"""Main configuration for thicket."""
-
-
model_config = SettingsConfigDict(
-
env_prefix="THICKET_",
-
env_file=".env",
-
yaml_file="thicket.yaml",
-
case_sensitive=False,
-
)
-
-
git_store: Path
-
cache_dir: Path
-
users: list[UserConfig] = []
-
</file>
-
-
<file path="src/thicket/cli/commands/links_cmd.py">
-
"""CLI command for extracting and categorizing all outbound links from blog entries."""
-
-
import json
-
import re
-
from pathlib import Path
-
from typing import Dict, List, Optional, Set
-
from urllib.parse import urljoin, urlparse
-
-
import typer
-
from rich.console import Console
-
from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TaskProgressColumn
-
from rich.table import Table
-
-
from ...core.git_store import GitStore
-
from ..main import app
-
from ..utils import load_config, get_tsv_mode
-
-
console = Console()
-
-
-
class LinkData:
-
"""Represents a link found in a blog entry."""
-
-
def __init__(self, url: str, entry_id: str, username: str):
-
self.url = url
-
self.entry_id = entry_id
-
self.username = username
-
-
def to_dict(self) -> dict:
-
"""Convert to dictionary for JSON serialization."""
-
return {
-
"url": self.url,
-
"entry_id": self.entry_id,
-
"username": self.username
-
}
-
-
@classmethod
-
def from_dict(cls, data: dict) -> "LinkData":
-
"""Create from dictionary."""
-
return cls(
-
url=data["url"],
-
entry_id=data["entry_id"],
-
username=data["username"]
-
)
-
-
-
class LinkCategorizer:
-
"""Categorizes links as internal, user, or unknown."""
-
-
def __init__(self, user_domains: Dict[str, Set[str]]):
-
self.user_domains = user_domains
-
# Create reverse mapping of domain -> username
-
self.domain_to_user = {}
-
for username, domains in user_domains.items():
-
for domain in domains:
-
self.domain_to_user[domain] = username
-
-
def categorize_url(self, url: str, source_username: str) -> tuple[str, Optional[str]]:
-
"""
-
Categorize a URL as 'internal', 'user', or 'unknown'.
-
Returns (category, target_username).
-
"""
-
try:
-
parsed = urlparse(url)
-
domain = parsed.netloc.lower()
-
-
# Check if it's a link to the same user's domain (internal)
-
if domain in self.user_domains.get(source_username, set()):
-
return "internal", source_username
-
-
# Check if it's a link to another user's domain
-
if domain in self.domain_to_user:
-
return "user", self.domain_to_user[domain]
-
-
# Everything else is unknown
-
return "unknown", None
-
-
except Exception:
-
return "unknown", None
-
-
-
class LinkExtractor:
-
"""Extracts and resolves links from blog entries."""
-
-
def __init__(self):
-
# Pattern for extracting links from HTML
-
self.link_pattern = re.compile(r'<a[^>]+href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)
-
self.url_pattern = re.compile(r'https?://[^\s<>"]+')
-
-
def extract_links_from_html(self, html_content: str, base_url: str) -> List[tuple[str, str]]:
-
"""Extract all links from HTML content and resolve them against base URL."""
-
links = []
-
-
# Extract links from <a> tags
-
for match in self.link_pattern.finditer(html_content):
-
url = match.group(1)
-
text = re.sub(r'<[^>]+>', '', match.group(2)).strip() # Remove HTML tags from link text
-
-
# Resolve relative URLs against base URL
-
resolved_url = urljoin(base_url, url)
-
links.append((resolved_url, text))
-
-
return links
-
-
-
def extract_links_from_entry(self, entry, username: str, base_url: str) -> List[LinkData]:
-
"""Extract all links from a blog entry."""
-
links = []
-
-
# Combine all text content for analysis
-
content_to_search = []
-
if entry.content:
-
content_to_search.append(entry.content)
-
if entry.summary:
-
content_to_search.append(entry.summary)
-
-
for content in content_to_search:
-
extracted_links = self.extract_links_from_html(content, base_url)
-
-
for url, link_text in extracted_links:
-
# Skip empty URLs
-
if not url or url.startswith('#'):
-
continue
-
-
link_data = LinkData(
-
url=url,
-
entry_id=entry.id,
-
username=username
-
)
-
-
links.append(link_data)
-
-
return links
-
-
-
@app.command()
-
def links(
-
config_file: Optional[Path] = typer.Option(
-
Path("thicket.yaml"),
-
"--config",
-
"-c",
-
help="Path to configuration file",
-
),
-
output_file: Optional[Path] = typer.Option(
-
None,
-
"--output",
-
"-o",
-
help="Path to output unified links file (default: links.json in git store)",
-
),
-
verbose: bool = typer.Option(
-
False,
-
"--verbose",
-
"-v",
-
help="Show detailed progress information",
-
),
-
) -> None:
-
"""Extract and categorize all outbound links from blog entries.
-
-
This command analyzes all blog entries to extract outbound links,
-
resolve them properly with respect to the feed's base URL, and
-
categorize them as internal, user, or unknown links.
-
-
Creates a unified links.json file containing all link data.
-
"""
-
try:
-
# Load configuration
-
config = load_config(config_file)
-
-
# Initialize Git store
-
git_store = GitStore(config.git_store)
-
-
# Build user domain mapping
-
if verbose:
-
console.print("Building user domain mapping...")
-
-
index = git_store._load_index()
-
user_domains = {}
-
-
for username, user_metadata in index.users.items():
-
domains = set()
-
-
# Add domains from feeds
-
for feed_url in user_metadata.feeds:
-
domain = urlparse(feed_url).netloc.lower()
-
if domain:
-
domains.add(domain)
-
-
# Add domain from homepage
-
if user_metadata.homepage:
-
domain = urlparse(str(user_metadata.homepage)).netloc.lower()
-
if domain:
-
domains.add(domain)
-
-
user_domains[username] = domains
-
-
if verbose:
-
console.print(f"Found {len(user_domains)} users with {sum(len(d) for d in user_domains.values())} total domains")
-
-
# Initialize components
-
link_extractor = LinkExtractor()
-
categorizer = LinkCategorizer(user_domains)
-
-
# Get all users
-
users = list(index.users.keys())
-
-
if not users:
-
console.print("[yellow]No users found in Git store[/yellow]")
-
raise typer.Exit(0)
-
-
# Process all entries
-
all_links = []
-
link_categories = {"internal": [], "user": [], "unknown": []}
-
link_dict = {} # Dictionary with link URL as key, maps to list of atom IDs
-
reverse_dict = {} # Dictionary with atom ID as key, maps to list of URLs
-
-
with Progress(
-
SpinnerColumn(),
-
TextColumn("[progress.description]{task.description}"),
-
BarColumn(),
-
TaskProgressColumn(),
-
console=console,
-
) as progress:
-
-
# Count total entries first
-
counting_task = progress.add_task("Counting entries...", total=len(users))
-
total_entries = 0
-
-
for username in users:
-
entries = git_store.list_entries(username)
-
total_entries += len(entries)
-
progress.advance(counting_task)
-
-
progress.remove_task(counting_task)
-
-
# Process entries
-
processing_task = progress.add_task(
-
f"Processing {total_entries} entries...",
-
total=total_entries
-
)
-
-
for username in users:
-
entries = git_store.list_entries(username)
-
user_metadata = index.users[username]
-
-
# Get base URL for this user (use first feed URL)
-
base_url = str(user_metadata.feeds[0]) if user_metadata.feeds else "https://example.com"
-
-
for entry in entries:
-
# Extract links from this entry
-
entry_links = link_extractor.extract_links_from_entry(entry, username, base_url)
-
-
# Track unique links per entry
-
entry_urls_seen = set()
-
-
# Categorize each link
-
for link_data in entry_links:
-
# Skip if we've already seen this URL in this entry
-
if link_data.url in entry_urls_seen:
-
continue
-
entry_urls_seen.add(link_data.url)
-
-
category, target_username = categorizer.categorize_url(link_data.url, username)
-
-
# Add to link dictionary (URL as key, maps to list of atom IDs)
-
if link_data.url not in link_dict:
-
link_dict[link_data.url] = []
-
if link_data.entry_id not in link_dict[link_data.url]:
-
link_dict[link_data.url].append(link_data.entry_id)
-
-
# Also add to reverse mapping (atom ID -> list of URLs)
-
if link_data.entry_id not in reverse_dict:
-
reverse_dict[link_data.entry_id] = []
-
if link_data.url not in reverse_dict[link_data.entry_id]:
-
reverse_dict[link_data.entry_id].append(link_data.url)
-
-
# Add category info to link data for categories tracking
-
link_info = link_data.to_dict()
-
link_info["category"] = category
-
link_info["target_username"] = target_username
-
-
all_links.append(link_info)
-
link_categories[category].append(link_info)
-
-
progress.advance(processing_task)
-
-
if verbose and entry_links:
-
console.print(f" Found {len(entry_links)} links in {username}:{entry.title[:50]}...")
-
-
# Determine output path
-
if output_file:
-
output_path = output_file
-
else:
-
output_path = config.git_store / "links.json"
-
-
# Save all extracted links (not just filtered ones)
-
if verbose:
-
console.print("Preparing output data...")
-
-
# Build a set of all URLs that correspond to posts in the git database
-
registered_urls = set()
-
-
# Get all entries from all users and build URL mappings
-
for username in users:
-
entries = git_store.list_entries(username)
-
user_metadata = index.users[username]
-
-
for entry in entries:
-
# Try to match entry URLs with extracted links
-
if hasattr(entry, 'link') and entry.link:
-
registered_urls.add(str(entry.link))
-
-
# Also check entry alternate links if they exist
-
if hasattr(entry, 'links') and entry.links:
-
for link in entry.links:
-
if hasattr(link, 'href') and link.href:
-
registered_urls.add(str(link.href))
-
-
# Build unified structure with metadata
-
unified_links = {}
-
reverse_mapping = {}
-
-
for url, entry_ids in link_dict.items():
-
unified_links[url] = {
-
"referencing_entries": entry_ids
-
}
-
-
# Find target username if this is a tracked post
-
if url in registered_urls:
-
for username in users:
-
user_domains_set = {domain for domain in user_domains.get(username, [])}
-
if any(domain in url for domain in user_domains_set):
-
unified_links[url]["target_username"] = username
-
break
-
-
# Build reverse mapping
-
for entry_id in entry_ids:
-
if entry_id not in reverse_mapping:
-
reverse_mapping[entry_id] = []
-
if url not in reverse_mapping[entry_id]:
-
reverse_mapping[entry_id].append(url)
-
-
# Create unified output data
-
output_data = {
-
"links": unified_links,
-
"reverse_mapping": reverse_mapping,
-
"user_domains": {k: list(v) for k, v in user_domains.items()}
-
}
-
-
if verbose:
-
console.print(f"Found {len(registered_urls)} registered post URLs")
-
console.print(f"Found {len(link_dict)} total links, {sum(1 for link in unified_links.values() if 'target_username' in link)} tracked posts")
-
-
# Save unified data
-
with open(output_path, "w") as f:
-
json.dump(output_data, f, indent=2, default=str)
-
-
# Show summary
-
if not get_tsv_mode():
-
console.print("\n[green]โœ“ Links extraction completed successfully[/green]")
-
-
# Create summary table or TSV output
-
if get_tsv_mode():
-
print("Category\tCount\tDescription")
-
print(f"Internal\t{len(link_categories['internal'])}\tLinks to same user's domain")
-
print(f"User\t{len(link_categories['user'])}\tLinks to other tracked users")
-
print(f"Unknown\t{len(link_categories['unknown'])}\tLinks to external sites")
-
print(f"Total Extracted\t{len(all_links)}\tAll extracted links")
-
print(f"Saved to Output\t{len(output_data['links'])}\tLinks saved to output file")
-
print(f"Cross-references\t{sum(1 for link in unified_links.values() if 'target_username' in link)}\tLinks to registered posts only")
-
else:
-
table = Table(title="Links Summary")
-
table.add_column("Category", style="cyan")
-
table.add_column("Count", style="green")
-
table.add_column("Description", style="white")
-
-
table.add_row("Internal", str(len(link_categories["internal"])), "Links to same user's domain")
-
table.add_row("User", str(len(link_categories["user"])), "Links to other tracked users")
-
table.add_row("Unknown", str(len(link_categories["unknown"])), "Links to external sites")
-
table.add_row("Total Extracted", str(len(all_links)), "All extracted links")
-
table.add_row("Saved to Output", str(len(output_data['links'])), "Links saved to output file")
-
table.add_row("Cross-references", str(sum(1 for link in unified_links.values() if 'target_username' in link)), "Links to registered posts only")
-
-
console.print(table)
-
-
# Show user links if verbose
-
if verbose and link_categories["user"]:
-
if get_tsv_mode():
-
print("User Link Source\tUser Link Target\tLink Count")
-
user_link_counts = {}
-
-
for link in link_categories["user"]:
-
key = f"{link['username']} -> {link['target_username']}"
-
user_link_counts[key] = user_link_counts.get(key, 0) + 1
-
-
for link_pair, count in sorted(user_link_counts.items(), key=lambda x: x[1], reverse=True)[:10]:
-
source, target = link_pair.split(" -> ")
-
print(f"{source}\t{target}\t{count}")
-
else:
-
console.print("\n[bold]User-to-user links:[/bold]")
-
user_link_counts = {}
-
-
for link in link_categories["user"]:
-
key = f"{link['username']} -> {link['target_username']}"
-
user_link_counts[key] = user_link_counts.get(key, 0) + 1
-
-
for link_pair, count in sorted(user_link_counts.items(), key=lambda x: x[1], reverse=True)[:10]:
-
console.print(f" {link_pair}: {count} links")
-
-
if not get_tsv_mode():
-
console.print(f"\nUnified links data saved to: {output_path}")
-
-
except Exception as e:
-
console.print(f"[red]Error extracting links: {e}[/red]")
-
if verbose:
-
console.print_exception()
-
raise typer.Exit(1)
-
</file>
-
-
<file path="src/thicket/cli/commands/list_cmd.py">
-
"""List command for thicket."""
-
-
import re
-
from pathlib import Path
-
from typing import Optional
-
-
import typer
-
from rich.table import Table
-
-
from ...core.git_store import GitStore
-
from ..main import app
-
from ..utils import (
-
console,
-
load_config,
-
print_error,
-
print_feeds_table,
-
print_feeds_table_from_git,
-
print_info,
-
print_users_table,
-
print_users_table_from_git,
-
print_entries_tsv,
-
get_tsv_mode,
-
)
-
-
-
@app.command("list")
-
def list_command(
-
what: str = typer.Argument(..., help="What to list: 'users', 'feeds', 'entries'"),
-
user: Optional[str] = typer.Option(
-
None, "--user", "-u", help="Filter by specific user"
-
),
-
limit: Optional[int] = typer.Option(
-
None, "--limit", "-l", help="Limit number of results"
-
),
-
config_file: Optional[Path] = typer.Option(
-
Path("thicket.yaml"), "--config", help="Configuration file path"
-
),
-
) -> None:
-
"""List users, feeds, or entries."""
-
-
# Load configuration
-
config = load_config(config_file)
-
-
# Initialize Git store
-
git_store = GitStore(config.git_store)
-
-
if what == "users":
-
list_users(git_store)
-
elif what == "feeds":
-
list_feeds(git_store, user)
-
elif what == "entries":
-
list_entries(git_store, user, limit)
-
else:
-
print_error(f"Unknown list type: {what}")
-
print_error("Use 'users', 'feeds', or 'entries'")
-
raise typer.Exit(1)
-
-
-
def list_users(git_store: GitStore) -> None:
-
"""List all users."""
-
index = git_store._load_index()
-
users = list(index.users.values())
-
-
if not users:
-
print_info("No users configured")
-
return
-
-
print_users_table_from_git(users)
-
-
-
def list_feeds(git_store: GitStore, username: Optional[str] = None) -> None:
-
"""List feeds, optionally filtered by user."""
-
if username:
-
user = git_store.get_user(username)
-
if not user:
-
print_error(f"User '{username}' not found")
-
raise typer.Exit(1)
-
-
if not user.feeds:
-
print_info(f"No feeds configured for user '{username}'")
-
return
-
-
print_feeds_table_from_git(git_store, username)
-
-
-
def list_entries(git_store: GitStore, username: Optional[str] = None, limit: Optional[int] = None) -> None:
-
"""List entries, optionally filtered by user."""
-
-
if username:
-
# List entries for specific user
-
user = git_store.get_user(username)
-
if not user:
-
print_error(f"User '{username}' not found")
-
raise typer.Exit(1)
-
-
entries = git_store.list_entries(username, limit)
-
if not entries:
-
print_info(f"No entries found for user '{username}'")
-
return
-
-
print_entries_table([entries], [username])
-
-
else:
-
# List entries for all users
-
all_entries = []
-
all_usernames = []
-
-
index = git_store._load_index()
-
for user in index.users.values():
-
entries = git_store.list_entries(user.username, limit)
-
if entries:
-
all_entries.append(entries)
-
all_usernames.append(user.username)
-
-
if not all_entries:
-
print_info("No entries found")
-
return
-
-
print_entries_table(all_entries, all_usernames)
-
-
-
def _clean_html_content(content: Optional[str]) -> str:
-
"""Clean HTML content for display in table."""
-
if not content:
-
return ""
-
-
# Remove HTML tags
-
clean_text = re.sub(r'<[^>]+>', ' ', content)
-
# Replace multiple whitespace with single space
-
clean_text = re.sub(r'\s+', ' ', clean_text)
-
# Strip and limit length
-
clean_text = clean_text.strip()
-
if len(clean_text) > 100:
-
clean_text = clean_text[:97] + "..."
-
-
return clean_text
-
-
-
def print_entries_table(entries_by_user: list[list], usernames: list[str]) -> None:
-
"""Print a table of entries."""
-
if get_tsv_mode():
-
print_entries_tsv(entries_by_user, usernames)
-
return
-
-
table = Table(title="Feed Entries")
-
table.add_column("User", style="cyan", no_wrap=True)
-
table.add_column("Title", style="bold")
-
table.add_column("Updated", style="blue")
-
table.add_column("URL", style="green")
-
-
# Combine all entries with usernames
-
all_entries = []
-
for entries, username in zip(entries_by_user, usernames):
-
for entry in entries:
-
all_entries.append((username, entry))
-
-
# Sort by updated time (newest first)
-
all_entries.sort(key=lambda x: x[1].updated, reverse=True)
-
-
for username, entry in all_entries:
-
# Format updated time
-
updated_str = entry.updated.strftime("%Y-%m-%d %H:%M")
-
-
# Truncate title if too long
-
title = entry.title
-
if len(title) > 50:
-
title = title[:47] + "..."
-
-
table.add_row(
-
username,
-
title,
-
updated_str,
-
str(entry.link),
-
)
-
-
console.print(table)
-
</file>
-
-
<file path="src/thicket/cli/main.py">
-
"""Main CLI application using Typer."""
-
-
import typer
-
from rich.console import Console
-
-
from .. import __version__
-
-
app = typer.Typer(
-
name="thicket",
-
help="A CLI tool for persisting Atom/RSS feeds in Git repositories",
-
no_args_is_help=True,
-
rich_markup_mode="rich",
-
)
-
-
console = Console()
-
-
# Global state for TSV output mode
-
tsv_mode = False
-
-
-
def version_callback(value: bool) -> None:
-
"""Show version and exit."""
-
if value:
-
console.print(f"thicket version {__version__}")
-
raise typer.Exit()
-
-
-
@app.callback()
-
def main(
-
version: bool = typer.Option(
-
None,
-
"--version",
-
"-v",
-
help="Show the version and exit",
-
callback=version_callback,
-
is_eager=True,
-
),
-
tsv: bool = typer.Option(
-
False,
-
"--tsv",
-
help="Output in tab-separated values format without truncation",
-
),
-
) -> None:
-
"""Thicket: A CLI tool for persisting Atom/RSS feeds in Git repositories."""
-
global tsv_mode
-
tsv_mode = tsv
-
-
-
# Import commands to register them
-
from .commands import add, duplicates, generate, index_cmd, info_cmd, init, links_cmd, list_cmd, sync
-
-
if __name__ == "__main__":
-
app()
-
</file>
-
-
<file path="src/thicket/core/git_store.py">
-
"""Git repository operations for thicket."""
-
-
import json
-
from datetime import datetime
-
from pathlib import Path
-
from typing import Optional
-
-
import git
-
from git import Repo
-
-
from ..models import AtomEntry, DuplicateMap, GitStoreIndex, UserMetadata
-
-
-
class GitStore:
-
"""Manages the Git repository for storing feed entries."""
-
-
def __init__(self, repo_path: Path):
-
"""Initialize the Git store."""
-
self.repo_path = repo_path
-
self.repo: Optional[Repo] = None
-
self._ensure_repo()
-
-
def _ensure_repo(self) -> None:
-
"""Ensure the Git repository exists and is initialized."""
-
if not self.repo_path.exists():
-
self.repo_path.mkdir(parents=True, exist_ok=True)
-
-
try:
-
self.repo = Repo(self.repo_path)
-
except git.InvalidGitRepositoryError:
-
# Initialize new repository
-
self.repo = Repo.init(self.repo_path)
-
self._create_initial_structure()
-
-
def _create_initial_structure(self) -> None:
-
"""Create initial Git store structure."""
-
# Create index.json
-
index = GitStoreIndex(
-
created=datetime.now(),
-
last_updated=datetime.now(),
-
)
-
self._save_index(index)
-
-
# Create duplicates.json
-
duplicates = DuplicateMap()
-
self._save_duplicates(duplicates)
-
-
# Create initial commit
-
self.repo.index.add(["index.json", "duplicates.json"])
-
self.repo.index.commit("Initial thicket repository structure")
-
-
def _save_index(self, index: GitStoreIndex) -> None:
-
"""Save the index to index.json."""
-
index_path = self.repo_path / "index.json"
-
with open(index_path, "w") as f:
-
json.dump(index.model_dump(mode="json", exclude_none=True), f, indent=2, default=str)
-
-
def _load_index(self) -> GitStoreIndex:
-
"""Load the index from index.json."""
-
index_path = self.repo_path / "index.json"
-
if not index_path.exists():
-
return GitStoreIndex(
-
created=datetime.now(),
-
last_updated=datetime.now(),
-
)
-
-
with open(index_path) as f:
-
data = json.load(f)
-
-
return GitStoreIndex(**data)
-
-
def _save_duplicates(self, duplicates: DuplicateMap) -> None:
-
"""Save duplicates map to duplicates.json."""
-
duplicates_path = self.repo_path / "duplicates.json"
-
        with open(duplicates_path, "w") as f:
            json.dump(duplicates.model_dump(exclude_none=True), f, indent=2)

    def _load_duplicates(self) -> DuplicateMap:
        """Load duplicates map from duplicates.json."""
        duplicates_path = self.repo_path / "duplicates.json"
        if not duplicates_path.exists():
            return DuplicateMap()

        with open(duplicates_path) as f:
            data = json.load(f)

        return DuplicateMap(**data)

    def add_user(self, username: str, display_name: Optional[str] = None,
                 email: Optional[str] = None, homepage: Optional[str] = None,
                 icon: Optional[str] = None, feeds: Optional[list[str]] = None) -> UserMetadata:
        """Add a new user to the Git store."""
        index = self._load_index()

        # Create user directory
        user_dir = self.repo_path / username
        user_dir.mkdir(exist_ok=True)

        # Create user metadata
        user_metadata = UserMetadata(
            username=username,
            display_name=display_name,
            email=email,
            homepage=homepage,
            icon=icon,
            feeds=feeds or [],
            directory=username,
            created=datetime.now(),
            last_updated=datetime.now(),
        )

        # Update index
        index.add_user(user_metadata)
        self._save_index(index)

        return user_metadata

    def get_user(self, username: str) -> Optional[UserMetadata]:
        """Get user metadata by username."""
        index = self._load_index()
        return index.get_user(username)

    def update_user(self, username: str, **kwargs) -> bool:
        """Update user metadata."""
        index = self._load_index()
        user = index.get_user(username)

        if not user:
            return False

        # Update user metadata
        for key, value in kwargs.items():
            if hasattr(user, key) and value is not None:
                setattr(user, key, value)

        user.update_timestamp()

        # Update index
        index.add_user(user)
        self._save_index(index)

        return True

    def store_entry(self, username: str, entry: AtomEntry) -> bool:
        """Store an entry in the user's directory."""
        user = self.get_user(username)
        if not user:
            return False

        # Sanitize entry ID for filename
        from .feed_parser import FeedParser
        parser = FeedParser()
        safe_id = parser.sanitize_entry_id(entry.id)

        # Create entry file
        user_dir = self.repo_path / user.directory
        entry_path = user_dir / f"{safe_id}.json"

        # Check if entry already exists
        entry_exists = entry_path.exists()

        # Save entry
        with open(entry_path, "w") as f:
            json.dump(entry.model_dump(mode="json", exclude_none=True), f, indent=2, default=str)

        # Update user metadata if new entry
        if not entry_exists:
            index = self._load_index()
            index.update_entry_count(username, 1)
            self._save_index(index)

        return True

    def get_entry(self, username: str, entry_id: str) -> Optional[AtomEntry]:
        """Get an entry by username and entry ID."""
        user = self.get_user(username)
        if not user:
            return None

        # Sanitize entry ID
        from .feed_parser import FeedParser
        parser = FeedParser()
        safe_id = parser.sanitize_entry_id(entry_id)

        entry_path = self.repo_path / user.directory / f"{safe_id}.json"
        if not entry_path.exists():
            return None

        with open(entry_path) as f:
            data = json.load(f)

        return AtomEntry(**data)

    def list_entries(self, username: str, limit: Optional[int] = None) -> list[AtomEntry]:
        """List entries for a user."""
        user = self.get_user(username)
        if not user:
            return []

        user_dir = self.repo_path / user.directory
        if not user_dir.exists():
            return []

        entries = []
        entry_files = sorted(user_dir.glob("*.json"), key=lambda p: p.stat().st_mtime, reverse=True)

        if limit:
            entry_files = entry_files[:limit]

        for entry_file in entry_files:
            try:
                with open(entry_file) as f:
                    data = json.load(f)
                entries.append(AtomEntry(**data))
            except Exception:
                # Skip invalid entries
                continue

        return entries

    def get_duplicates(self) -> DuplicateMap:
        """Get the duplicates map."""
        return self._load_duplicates()

    def add_duplicate(self, duplicate_id: str, canonical_id: str) -> None:
        """Add a duplicate mapping."""
        duplicates = self._load_duplicates()
        duplicates.add_duplicate(duplicate_id, canonical_id)
        self._save_duplicates(duplicates)

    def remove_duplicate(self, duplicate_id: str) -> bool:
        """Remove a duplicate mapping."""
        duplicates = self._load_duplicates()
        result = duplicates.remove_duplicate(duplicate_id)
        self._save_duplicates(duplicates)
        return result

    def commit_changes(self, message: str) -> None:
        """Commit all changes to the Git repository."""
        if not self.repo:
            return

        # Add all changes
        self.repo.git.add(A=True)

        # Check if there are changes to commit
        if self.repo.index.diff("HEAD"):
            self.repo.index.commit(message)

    def get_stats(self) -> dict:
        """Get statistics about the Git store."""
        index = self._load_index()
        duplicates = self._load_duplicates()

        return {
            "total_users": len(index.users),
            "total_entries": index.total_entries,
            "total_duplicates": len(duplicates.duplicates),
            "last_updated": index.last_updated,
            "repository_size": sum(f.stat().st_size for f in self.repo_path.rglob("*") if f.is_file()),
        }

    def search_entries(self, query: str, username: Optional[str] = None,
                       limit: Optional[int] = None) -> list[tuple[str, AtomEntry]]:
        """Search entries by content."""
        results = []

        # Get users to search
        index = self._load_index()
        users = [index.get_user(username)] if username else list(index.users.values())
        users = [u for u in users if u is not None]

        for user in users:
            user_dir = self.repo_path / user.directory
            if not user_dir.exists():
                continue

            entry_files = user_dir.glob("*.json")

            for entry_file in entry_files:
                try:
                    with open(entry_file) as f:
                        data = json.load(f)

                    entry = AtomEntry(**data)

                    # Simple text search in title, summary, and content
                    searchable_text = " ".join(filter(None, [
                        entry.title,
                        entry.summary or "",
                        entry.content or "",
                    ])).lower()

                    if query.lower() in searchable_text:
                        results.append((user.username, entry))

                    if limit and len(results) >= limit:
                        return results

                except Exception:
                    # Skip invalid entries
                    continue

        # Sort by updated time (newest first)
        results.sort(key=lambda x: x[1].updated, reverse=True)

        return results[:limit] if limit else results
</file>
<file path="ARCH.md">
# Thicket Architecture Design

## Overview
Thicket is a modern CLI tool for persisting Atom/RSS feeds in a Git repository, designed to enable distributed weblog comment structures.

## Technology Stack

### Core Libraries

#### CLI Framework
- **Typer** (0.15.x) - Modern CLI framework with type hints
- **Rich** (13.x) - Beautiful terminal output, progress bars, and tables
- **prompt-toolkit** - Interactive prompts when needed

#### Feed Processing
- **feedparser** (6.0.11) - Universal feed parser supporting RSS 0.9x, RSS 1.0, RSS 2.0, CDF, Atom 0.3, and Atom 1.0
- Alternative: **atoma** for stricter Atom/RSS parsing with JSON feed support
- Alternative: **fastfeedparser** for high-performance parsing (roughly 10x faster)

#### Git Integration
- **GitPython** (3.1.44) - High-level git operations; requires the git CLI
- Alternative: **pygit2** (1.18.0) - Direct libgit2 bindings, better for authentication

#### HTTP Client
- **httpx** (0.28.x) - Modern async/sync HTTP client with connection pooling
- **aiohttp** (3.11.x) - For async-only operations if needed

#### Configuration & Data Models
- **pydantic** (2.11.x) - Data validation and settings management
- **pydantic-settings** (2.10.x) - Configuration file handling with env var support

#### Utilities
- **pendulum** (3.x) - Better datetime handling
- **bleach** (6.x) - HTML sanitization for feed content
- **platformdirs** (4.x) - Cross-platform directory paths

## Project Structure

```
thicket/
├── pyproject.toml      # Modern Python packaging
├── README.md           # Project documentation
├── ARCH.md             # This file
├── CLAUDE.md           # Project instructions
├── .gitignore
├── src/
│   └── thicket/
│       ├── __init__.py
│       ├── __main__.py         # Entry point for `python -m thicket`
│       ├── cli/                # CLI commands and interface
│       │   ├── __init__.py
│       │   ├── main.py         # Main CLI app with Typer
│       │   ├── commands/       # Subcommands
│       │   │   ├── __init__.py
│       │   │   ├── init.py         # Initialize git store
│       │   │   ├── add.py          # Add users and feeds
│       │   │   ├── sync.py         # Sync feeds
│       │   │   ├── list_cmd.py     # List users/feeds
│       │   │   ├── duplicates.py   # Manage duplicate entries
│       │   │   ├── links_cmd.py    # Extract and categorize links
│       │   │   └── index_cmd.py    # Build reference index and show threads
│       │   └── utils.py        # CLI utilities (progress, formatting)
│       ├── core/               # Core business logic
│       │   ├── __init__.py
│       │   ├── feed_parser.py          # Feed parsing and normalization
│       │   ├── git_store.py            # Git repository operations
│       │   └── reference_parser.py     # Link extraction and threading
│       ├── models/             # Pydantic data models
│       │   ├── __init__.py
│       │   ├── config.py       # Configuration models
│       │   ├── feed.py         # Feed/Entry models
│       │   └── user.py         # User metadata models
│       └── utils/              # Shared utilities
│           └── __init__.py
├── tests/
│   ├── __init__.py
│   ├── conftest.py             # pytest configuration
│   ├── test_feed_parser.py
│   ├── test_git_store.py
│   └── fixtures/               # Test data
│       └── feeds/
└── docs/
    └── examples/               # Example configurations
```

## Data Models

### Configuration File (YAML/TOML)
```python
class ThicketConfig(BaseSettings):
    git_store: Path   # Git repository location
    cache_dir: Path   # Cache directory
    users: list[UserConfig]

    model_config = SettingsConfigDict(
        env_prefix="THICKET_",
        env_file=".env",
        yaml_file="thicket.yaml"
    )

class UserConfig(BaseModel):
    username: str
    feeds: list[HttpUrl]
    email: Optional[EmailStr] = None
    homepage: Optional[HttpUrl] = None
    icon: Optional[HttpUrl] = None
    display_name: Optional[str] = None
```

### Feed Storage Format
```python
class AtomEntry(BaseModel):
    id: str                   # Original Atom ID
    title: str
    link: HttpUrl
    updated: datetime
    published: Optional[datetime]
    summary: Optional[str]
    content: Optional[str]    # Full body content from Atom entry
    content_type: Optional[str] = "html"   # text, html, xhtml
    author: Optional[dict]
    categories: list[str] = []
    rights: Optional[str] = None    # Copyright info
    source: Optional[str] = None    # Source feed URL
    # Additional Atom fields preserved during RSS->Atom conversion

    model_config = ConfigDict(
        json_encoders={
            datetime: lambda v: v.isoformat()
        }
    )

class DuplicateMap(BaseModel):
    """Maps duplicate entry IDs to canonical entry IDs"""
    duplicates: dict[str, str] = {}   # duplicate_id -> canonical_id
    comment: str = "Entry IDs that map to the same canonical content"

    def add_duplicate(self, duplicate_id: str, canonical_id: str) -> None:
        """Add a duplicate mapping"""
        self.duplicates[duplicate_id] = canonical_id

    def remove_duplicate(self, duplicate_id: str) -> bool:
        """Remove a duplicate mapping. Returns True if it existed."""
        return self.duplicates.pop(duplicate_id, None) is not None

    def get_canonical(self, entry_id: str) -> str:
        """Get canonical ID for an entry (returns original if not duplicate)"""
        return self.duplicates.get(entry_id, entry_id)

    def is_duplicate(self, entry_id: str) -> bool:
        """Check if entry ID is marked as duplicate"""
        return entry_id in self.duplicates
```

## Git Repository Structure
```
git-store/
├── index.json        # User directory index
├── duplicates.json   # Manual curation of duplicate entries
├── links.json        # Unified links, references, and mapping data
├── user1/
│   ├── entry_id_1.json   # Sanitized entry files
│   ├── entry_id_2.json
│   └── ...
└── user2/
    └── ...
```

## Key Design Decisions

### 1. Feed Normalization & Auto-Discovery
- All RSS feeds are converted to Atom format before storage
- Preserves maximum metadata during conversion
- Sanitizes HTML content to prevent XSS
- **Auto-discovery**: Extracts user metadata from the feed during the `add user` command

### 2. ID Sanitization
- Consistent algorithm to convert Atom IDs to safe filenames (a sketch follows this list)
- Handles edge cases (very long IDs, special characters)
- Maintains reversibility where possible
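A minimal sketch of such a sanitizer, assuming URL-shaped Atom IDs; the actual algorithm lives in `feed_parser.py` and may differ:

```python
import hashlib
import re


def sanitize_entry_id(entry_id: str, max_length: int = 100) -> str:
    """Turn an Atom ID (usually a URL) into a safe, stable filename stem."""
    # Replace everything outside a conservative character set
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", entry_id)
    # Very long IDs are truncated and disambiguated with a content hash
    if len(safe) > max_length:
        digest = hashlib.sha256(entry_id.encode("utf-8")).hexdigest()[:12]
        safe = f"{safe[:max_length]}_{digest}"
    return safe


# sanitize_entry_id("https://example.com/posts/1") -> "https___example.com_posts_1"
```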
### 3. Git Operations
- Uses GitPython for simplicity (no authentication required)
- Single main branch for all users and entries
- Atomic commits per sync operation
- Meaningful commit messages with feed update summaries
- Preserves complete history: never deletes entries, even if they disappear from feeds

### 4. Caching Strategy
- HTTP caching with Last-Modified/ETag support (see the sketch below)
- Local cache of parsed feeds with TTL
- Cache invalidation on configuration changes
- Git store serves as a permanent historical archive beyond feed depth limits
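A conditional-request sketch using httpx; the cache-record shape here is an assumption for illustration, not thicket's actual cache format:

```python
import httpx

# Validators remembered from the previous fetch (illustrative record shape)
cached = {"etag": '"abc123"', "last_modified": "Wed, 01 Jan 2025 00:00:00 GMT"}

headers = {}
if cached.get("etag"):
    headers["If-None-Match"] = cached["etag"]
if cached.get("last_modified"):
    headers["If-Modified-Since"] = cached["last_modified"]

with httpx.Client(follow_redirects=True) as client:
    response = client.get("https://example.com/feed.atom", headers=headers)

if response.status_code == 304:
    pass  # Feed unchanged: skip parsing entirely
else:
    # Remember the new validators for the next sync
    cached["etag"] = response.headers.get("ETag")
    cached["last_modified"] = response.headers.get("Last-Modified")
    body = response.text  # hand off to the feed parser
```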
### 5. Error Handling
- Graceful handling of feed parsing errors
- Retry logic for network failures
- Clear error messages with recovery suggestions

## CLI Command Structure

```bash
# Initialize a new git store
thicket init /path/to/store

# Add a user with feeds (auto-discovers metadata from feed)
thicket add user "alyssa" \
    --feed "https://example.com/feed.atom"
# Auto-populates: email, homepage, icon, display_name from feed metadata

# Add a user with manual overrides
thicket add user "alyssa" \
    --feed "https://example.com/feed.atom" \
    --email "alyssa@example.com" \
    --homepage "https://alyssa.example.com" \
    --icon "https://example.com/avatar.png" \
    --display-name "Alyssa P. Hacker"

# Add an additional feed to an existing user
thicket add feed "alyssa" "https://example.com/other-feed.rss"

# Sync all feeds (designed for cron usage)
thicket sync --all

# Sync a specific user
thicket sync --user alyssa

# List users and their feeds
thicket list users
thicket list feeds --user alyssa

# Manage duplicate entries
thicket duplicates list
thicket duplicates add <entry_id_1> <entry_id_2>     # Mark as duplicates
thicket duplicates remove <entry_id_1> <entry_id_2>  # Unmark duplicates

# Link processing and threading
thicket links --verbose           # Extract and categorize all links
thicket index --verbose           # Build reference index for threading
thicket threads                   # Show conversation threads
thicket threads --username user1  # Show threads for a specific user
thicket threads --min-size 3      # Show threads with a minimum size
```

## Performance Considerations

1. **Concurrent Feed Fetching**: Use httpx with asyncio for parallel downloads (see the sketch below)
2. **Incremental Updates**: Only fetch/parse feeds that have changed
3. **Efficient Git Operations**: Batch commits, use shallow clones where appropriate
4. **Progress Feedback**: Rich progress bars for long operations
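A minimal sketch of parallel fetching with httpx and asyncio; illustrative only, since the real sync command layers caching and retries on top:

```python
import asyncio

import httpx


async def fetch_all(feed_urls: list[str]) -> dict[str, str]:
    """Fetch several feeds in parallel over a shared connection pool."""
    async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:

        async def fetch(url: str) -> tuple[str, str]:
            response = await client.get(url)
            response.raise_for_status()
            return url, response.text

        results = await asyncio.gather(
            *(fetch(url) for url in feed_urls), return_exceptions=True
        )

    bodies: dict[str, str] = {}
    for item in results:
        if isinstance(item, BaseException):
            continue  # a failed feed is simply retried on the next sync
        url, body = item
        bodies[url] = body
    return bodies


# bodies = asyncio.run(fetch_all(["https://example.com/feed.atom"]))
```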
## Security Considerations

1. **HTML Sanitization**: Use bleach to clean feed content (see the sketch below)
2. **URL Validation**: Strict validation of feed URLs
3. **Git Security**: No credentials stored in the repository
4. **Path Traversal**: Careful sanitization of filenames
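A sanitization sketch with bleach; the allowed tag/attribute whitelist here is an assumption, not the shipped configuration:

```python
import bleach

# A conservative whitelist (an assumption, not thicket's actual policy)
ALLOWED_TAGS = {"a", "p", "em", "strong", "code", "pre", "blockquote", "ul", "ol", "li"}
ALLOWED_ATTRIBUTES = {"a": ["href", "title"]}

dirty = '<p onclick="evil()">Hello</p><script>alert(1)</script>'
clean = bleach.clean(dirty, tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRIBUTES, strip=True)
# Event handlers and disallowed tags such as <script> are stripped
```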
## Future Enhancements

1. **Web Interface**: Optional web UI for browsing the git store
2. **Webhooks**: Notify external services on feed updates
3. **Feed Discovery**: Auto-discover feeds from HTML pages
4. **Export Formats**: Generate static sites, OPML exports
5. **Federation**: P2P sync between thicket instances

## Requirements Clarification

**✓ Resolved Requirements:**
1. **Feed Update Frequency**: Designed for cron usage - no built-in scheduling needed
2. **Duplicate Handling**: Manual curation via the `duplicates.json` file with CLI commands
3. **Git Branching**: Single main branch for all users and entries
4. **Authentication**: No feeds require authentication currently
5. **Content Storage**: Store complete Atom entry body content as provided
6. **Deleted Entries**: Preserve all entries in the Git store permanently (historical archive)
7. **History Depth**: Git store maintains full history beyond feed depth limits
8. **Feed Auto-Discovery**: Extract user metadata from the feed during the `add user` command

## Duplicate Entry Management

### Duplicate Detection Strategy
- **Manual Curation**: Duplicates are identified and managed manually via the CLI
- **Storage**: The `duplicates.json` file in the Git root maps entry IDs to canonical entries
- **Structure**: `{"duplicate_id": "canonical_id", ...}`
- **CLI Commands**: Add/remove duplicate mappings with validation
- **Query Resolution**: Search/list commands resolve duplicates to canonical entries

### Duplicate File Format
```json
{
  "https://example.com/feed/entry/123": "https://canonical.com/posts/same-post",
  "https://mirror.com/articles/456": "https://canonical.com/posts/same-post",
  "comment": "Entry IDs that map to the same canonical content"
}
```
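A resolution sketch against this flat file format; the helper is hypothetical, since the CLI resolves duplicates through the `DuplicateMap` model instead:

```python
import json
from pathlib import Path

raw = json.loads(Path("git-store/duplicates.json").read_text())
mapping = {k: v for k, v in raw.items() if k != "comment"}


def canonical(entry_id: str) -> str:
    """Resolve an entry ID to its canonical form (identity if not a duplicate)."""
    return mapping.get(entry_id, entry_id)


canonical("https://mirror.com/articles/456")  # -> "https://canonical.com/posts/same-post"
canonical("https://unique.example.com/post")  # -> unchanged
```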
## Feed Metadata Auto-Discovery

### Extraction Strategy
When adding a new user with `thicket add user`, the system fetches and parses the feed to extract (a sketch of this step follows the list):

- **Display Name**: From `feed.title` or `feed.author.name`
- **Email**: From `feed.author.email` or `feed.managingEditor`
- **Homepage**: From `feed.link` or `feed.author.uri`
- **Icon**: From `feed.logo`, `feed.icon`, or `feed.image.url`
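A discovery sketch using feedparser's normalized keys; the exact field mapping inside thicket's `FeedParser` may differ:

```python
import feedparser

parsed = feedparser.parse("https://example.com/feed.atom")
feed = parsed.feed

# feedparser normalizes author data into author_detail (name/email/href)
author = feed.get("author_detail", {})
discovered = {
    "title": feed.get("title"),
    "author_name": author.get("name"),
    "author_email": author.get("email"),
    "author_uri": author.get("href"),
    "link": feed.get("link"),
    "logo": feed.get("logo"),
    "icon": feed.get("icon"),
    "image_url": feed.get("image", {}).get("href"),
}
```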
### Discovery Priority Order
1. **Author Information**: Prefer `feed.author.*` fields (more specific to the person)
2. **Feed-Level**: Fall back to feed-level metadata
3. **Manual Override**: CLI flags always take precedence over discovered values
4. **Update Behavior**: Auto-discovery only runs during the initial `add user`, not on sync

### Extracted Metadata Format
```python
class FeedMetadata(BaseModel):
    title: Optional[str] = None
    author_name: Optional[str] = None
    author_email: Optional[EmailStr] = None
    author_uri: Optional[HttpUrl] = None
    link: Optional[HttpUrl] = None
    logo: Optional[HttpUrl] = None
    icon: Optional[HttpUrl] = None
    image_url: Optional[HttpUrl] = None

    def to_user_config(self, username: str, feed_url: HttpUrl) -> UserConfig:
        """Convert discovered metadata to UserConfig with fallbacks"""
        return UserConfig(
            username=username,
            feeds=[feed_url],
            display_name=self.author_name or self.title,
            email=self.author_email,
            homepage=self.author_uri or self.link,
            icon=self.logo or self.icon or self.image_url
        )
```

## Link Processing and Threading Architecture

### Overview
The thicket system implements a link processing and threading system that creates email-style threaded views of blog entries by tracking cross-references between different blogs.

### Link Processing Pipeline

#### 1. Link Extraction (`thicket links`)
The `links` command systematically extracts all outbound links from blog entries and categorizes them:

```python
class LinkData(BaseModel):
    url: str                        # Fully resolved URL
    entry_id: str                   # Source entry ID
    username: str                   # Source username
    context: str                    # Surrounding text context
    category: str                   # "internal", "user", or "unknown"
    target_username: Optional[str]  # Target user if applicable
```

**Link Categories** (a categorization sketch follows the list):
- **Internal**: Links to the same user's domain (self-references)
- **User**: Links to other tracked users' domains
- **Unknown**: Links to external sites not tracked by thicket
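A categorization sketch under these rules; illustrative only, since the shipped `ReferenceParser` logic may differ:

```python
from urllib.parse import urlparse


def categorize_link(url: str, source_username: str,
                    domain_to_user: dict[str, str]) -> tuple[str, str | None]:
    """Return (category, target_username) for an outbound link."""
    domain = urlparse(url).netloc.lower()
    target = domain_to_user.get(domain)
    if target == source_username:
        return "internal", target
    if target is not None:
        return "user", target
    return "unknown", None


# categorize_link("https://example.com/post/1", "user1", {"example.com": "user2"})
# -> ("user", "user2")
```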
#### 2. URL Resolution
All links are resolved against the Atom feed's base URL (see the `urljoin` examples below) to handle:
- Relative URLs (converted to absolute)
- Protocol-relative URLs
- Fragment identifiers
- Redirects and canonical URLs
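The standard-library `urljoin` covers the relative, protocol-relative, and fragment cases:

```python
from urllib.parse import urljoin

base = "https://blog.user.com/2024/post.html"  # the entry's base URL from the feed

urljoin(base, "other-post.html")      # -> "https://blog.user.com/2024/other-post.html"
urljoin(base, "/images/a.png")        # -> "https://blog.user.com/images/a.png"
urljoin(base, "//cdn.example.com/x")  # -> "https://cdn.example.com/x"
urljoin(base, "#section")             # -> "https://blog.user.com/2024/post.html#section"
```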
#### 3. Domain Mapping
The system builds a comprehensive domain mapping from user configuration (sketched below):
- Feed URLs → domain extraction
- Homepage URLs → domain extraction
- Reverse mapping: domain → username
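A mapping sketch in which plain dicts stand in for the real `UserConfig` models:

```python
from urllib.parse import urlparse


def build_domain_mapping(users: dict[str, dict]) -> dict[str, str]:
    """Map each domain seen in a user's feed/homepage URLs back to that user."""
    domain_to_user: dict[str, str] = {}
    for username, cfg in users.items():
        urls = list(cfg.get("feeds", []))
        if cfg.get("homepage"):
            urls.append(cfg["homepage"])
        for url in urls:
            domain = urlparse(url).netloc.lower()
            if domain:
                domain_to_user[domain] = username
    return domain_to_user


# build_domain_mapping({"alyssa": {"feeds": ["https://alyssa.example.com/feed.atom"]}})
# -> {"alyssa.example.com": "alyssa"}
```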
### Threading System

#### 1. Reference Index Generation (`thicket index`)
Creates a bidirectional reference index from the categorized links:

```python
class BlogReference(BaseModel):
    source_entry_id: str
    source_username: str
    target_url: str
    target_username: Optional[str]
    target_entry_id: Optional[str]
    context: str
```

#### 2. Thread Detection Algorithm
Uses graph traversal to find connected blog entries (a breadth-first sketch follows the list):
- **Outbound references**: Links from an entry to other entries
- **Inbound references**: Links to an entry from other entries
- **Thread members**: All entries connected through references
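A breadth-first sketch of the traversal; function and parameter names here are illustrative, not the shipped API:

```python
from collections import deque

Node = tuple[str, str]  # (username, entry_id)


def thread_members(start: Node,
                   outbound: dict[Node, set[Node]],
                   inbound: dict[Node, set[Node]]) -> set[Node]:
    """Collect every entry reachable from `start` through references
    in either direction, i.e. the thread containing `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in outbound.get(node, set()) | inbound.get(node, set()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen
```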
#### 3. Threading Display (`thicket threads`)
Creates email-style threaded views:
- Chronological ordering within threads
- Reference counts (outbound/inbound)
- Context preservation
- Filtering options (user, entry, minimum size)

### Data Structures

#### links.json Format (Unified Structure)
```json
{
  "links": {
    "https://example.com/post/123": {
      "referencing_entries": ["https://blog.user.com/entry/456"],
      "target_username": "user2"
    },
    "https://external-site.com/article": {
      "referencing_entries": ["https://blog.user.com/entry/789"]
    }
  },
  "reverse_mapping": {
    "https://blog.user.com/entry/456": ["https://example.com/post/123"],
    "https://blog.user.com/entry/789": ["https://external-site.com/article"]
  },
  "references": [
    {
      "source_entry_id": "https://blog.user.com/entry/456",
      "source_username": "user1",
      "target_url": "https://example.com/post/123",
      "target_username": "user2",
      "target_entry_id": "https://example.com/post/123",
      "context": "As mentioned in this post..."
    }
  ],
  "user_domains": {
    "user1": ["blog.user.com"],
    "user2": ["example.com"]
  }
}
```

This unified structure eliminates duplication by (query examples follow the list):
- Storing each URL only once with minimal metadata
- Including all link data, reference data, and mappings in one file
- Using the presence of `target_username` to distinguish tracked from external links
- Providing bidirectional mappings for efficient queries
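Example queries against this structure (the paths and IDs are illustrative):

```python
import json
from pathlib import Path

data = json.loads(Path("git-store/links.json").read_text())

url = "https://example.com/post/123"
entry_id = "https://blog.user.com/entry/456"

# Who links to this URL?
inbound = data["links"].get(url, {}).get("referencing_entries", [])

# What does this entry link to?
outbound = data["reverse_mapping"].get(entry_id, [])

# Is the URL owned by a tracked user? (absent -> external)
owner = data["links"].get(url, {}).get("target_username")
```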
### Unified Structure Benefits

- **Eliminates Duplication**: Each URL appears only once with its metadata
- **Single Source of Truth**: All link-related data lives in one file
- **Efficient Queries**: Fast lookups in both directions (URL→entries, entry→URLs)
- **Atomic Updates**: All link data changes together
- **Reduced I/O**: Fewer file operations

### Implementation Benefits

1. **Systematic Link Processing**: All links are extracted and categorized consistently
2. **Proper URL Resolution**: Handles relative URLs and base URL resolution correctly
3. **Domain-based Categorization**: Automatically identifies user-to-user references
4. **Bidirectional Indexing**: Supports both "who links to whom" and "who is linked by whom"
5. **Thread Discovery**: Finds conversation threads automatically
6. **Rich Context**: Preserves surrounding text for each link
7. **Performance**: Pre-computed indexes for fast threading queries

### CLI Commands

```bash
# Extract and categorize all links
thicket links --verbose

# Build reference index for threading
thicket index --verbose

# Show all conversation threads
thicket threads

# Show threads for a specific user
thicket threads --username user1

# Show threads with a minimum size
thicket threads --min-size 3
```

### Integration with Existing Commands

The link processing system integrates with the existing thicket commands:
- `thicket sync` updates entries, so `thicket links` should be run afterward
- `thicket index` uses the output from `thicket links` for improved accuracy
- `thicket threads` provides the user-facing threading interface

## Current Implementation Status

### ✅ Completed Features
1. **Core Infrastructure**
   - Modern CLI with Typer and Rich
   - Pydantic data models for type safety
   - Git repository operations with GitPython
   - Feed parsing and normalization with feedparser

2. **User and Feed Management**
   - `thicket init` - Initialize git store
   - `thicket add` - Add users and feeds with auto-discovery
   - `thicket sync` - Sync feeds with progress tracking
   - `thicket list` - List users, feeds, and entries
   - `thicket duplicates` - Manage duplicate entries

3. **Link Processing and Threading**
   - `thicket links` - Extract and categorize all outbound links
   - `thicket index` - Build reference index from links
   - `thicket threads` - Display threaded conversation views
   - Proper URL resolution with base URL handling
   - Domain-based link categorization
   - Context preservation for links

### 📊 System Performance
- **Link Extraction**: Successfully processes thousands of blog entries
- **Categorization**: Identifies internal, user, and unknown links
- **Threading**: Creates email-style threaded views of conversations
- **Storage**: Efficient JSON-based data structures for links and references

### 🔧 Current Architecture Highlights
- **Modular Design**: Clear separation between CLI, core logic, and models
- **Type Safety**: Comprehensive Pydantic models for data validation
- **Rich CLI**: Beautiful progress bars, tables, and error handling
- **Extensible**: Easy to add new commands and features
- **Git Integration**: All data stored in version-controlled JSON files

### 🎯 Proven Functionality
The system has been tested with real blog data and successfully:
- Extracted 14,396 total links from blog entries
- Categorized 3,994 internal links, 363 user-to-user links, and 10,039 unknown links
- Built comprehensive domain mappings for 16 users across 20 domains
- Generated threaded views showing blog conversation patterns

### 🚀 Ready for Use
The thicket system is now fully functional for:
- Maintaining Git repositories of blog feeds
- Tracking cross-references between blogs
- Creating threaded views of blog conversations
- Discovering blog interaction patterns
- Building distributed comment systems
</file>
<file path="src/thicket/cli/utils.py">
"""CLI utilities and helpers."""

from pathlib import Path
from typing import Optional

import typer
from rich.console import Console
from rich.progress import Progress, SpinnerColumn, TextColumn
from rich.table import Table

from ..models import ThicketConfig, UserMetadata
from ..core.git_store import GitStore

console = Console()


def get_tsv_mode() -> bool:
    """Get the global TSV mode setting."""
    from .main import tsv_mode
    return tsv_mode


def load_config(config_path: Optional[Path] = None) -> ThicketConfig:
    """Load thicket configuration from file or environment."""
    if config_path and config_path.exists():
        import yaml

        with open(config_path) as f:
            config_data = yaml.safe_load(f)

        # Convert to ThicketConfig
        return ThicketConfig(**config_data)

    # Try to load from default locations or environment
    try:
        # First try to find thicket.yaml in the current directory
        default_config = Path("thicket.yaml")
        if default_config.exists():
            import yaml
            with open(default_config) as f:
                config_data = yaml.safe_load(f)
            return ThicketConfig(**config_data)

        # Fall back to environment variables
        return ThicketConfig()
    except Exception as e:
        console.print(f"[red]Error loading configuration: {e}[/red]")
        console.print("[yellow]Run 'thicket init' to create a new configuration.[/yellow]")
        raise typer.Exit(1) from e


def save_config(config: ThicketConfig, config_path: Path) -> None:
    """Save thicket configuration to file."""
    import yaml

    config_data = config.model_dump(mode="json", exclude_none=True)

    # Convert Path objects to strings for YAML serialization
    config_data["git_store"] = str(config_data["git_store"])
    config_data["cache_dir"] = str(config_data["cache_dir"])

    with open(config_path, "w") as f:
        yaml.dump(config_data, f, default_flow_style=False, sort_keys=False)


def create_progress() -> Progress:
    """Create a Rich progress display."""
    return Progress(
        SpinnerColumn(),
        TextColumn("[progress.description]{task.description}"),
        console=console,
        transient=True,
    )


def print_users_table(config: ThicketConfig) -> None:
    """Print a table of users and their feeds."""
    if get_tsv_mode():
        print_users_tsv(config)
        return

    table = Table(title="Users and Feeds")
    table.add_column("Username", style="cyan", no_wrap=True)
    table.add_column("Display Name", style="magenta")
    table.add_column("Email", style="blue")
    table.add_column("Homepage", style="green")
    table.add_column("Feeds", style="yellow")

    for user in config.users:
        feeds_str = "\n".join(str(feed) for feed in user.feeds)
        table.add_row(
            user.username,
            user.display_name or "",
            user.email or "",
            str(user.homepage) if user.homepage else "",
            feeds_str,
        )

    console.print(table)


def print_feeds_table(config: ThicketConfig, username: Optional[str] = None) -> None:
    """Print a table of feeds, optionally filtered by username."""
    if get_tsv_mode():
        print_feeds_tsv(config, username)
        return

    table = Table(title=f"Feeds{f' for {username}' if username else ''}")
    table.add_column("Username", style="cyan", no_wrap=True)
    table.add_column("Feed URL", style="blue")
    table.add_column("Status", style="green")

    users = [config.find_user(username)] if username else config.users
    users = [u for u in users if u is not None]

    for user in users:
        for feed in user.feeds:
            table.add_row(
                user.username,
                str(feed),
                "Active",  # TODO: Add actual status checking
            )

    console.print(table)


def confirm_action(message: str, default: bool = False) -> bool:
    """Prompt for confirmation."""
    return typer.confirm(message, default=default)


def print_success(message: str) -> None:
    """Print a success message."""
    console.print(f"[green]✓[/green] {message}")


def print_error(message: str) -> None:
    """Print an error message."""
    console.print(f"[red]✗[/red] {message}")


def print_warning(message: str) -> None:
    """Print a warning message."""
    console.print(f"[yellow]⚠[/yellow] {message}")


def print_info(message: str) -> None:
    """Print an info message."""
    console.print(f"[blue]ℹ[/blue] {message}")


def print_users_table_from_git(users: list[UserMetadata]) -> None:
    """Print a table of users from the git repository."""
    if get_tsv_mode():
        print_users_tsv_from_git(users)
        return

    table = Table(title="Users and Feeds")
    table.add_column("Username", style="cyan", no_wrap=True)
    table.add_column("Display Name", style="magenta")
    table.add_column("Email", style="blue")
    table.add_column("Homepage", style="green")
    table.add_column("Feeds", style="yellow")

    for user in users:
        feeds_str = "\n".join(user.feeds)
        table.add_row(
            user.username,
            user.display_name or "",
            user.email or "",
            user.homepage or "",
            feeds_str,
        )

    console.print(table)


def print_feeds_table_from_git(git_store: GitStore, username: Optional[str] = None) -> None:
    """Print a table of feeds from the git repository."""
    if get_tsv_mode():
        print_feeds_tsv_from_git(git_store, username)
        return

    table = Table(title=f"Feeds{f' for {username}' if username else ''}")
    table.add_column("Username", style="cyan", no_wrap=True)
    table.add_column("Feed URL", style="blue")
    table.add_column("Status", style="green")

    if username:
        user = git_store.get_user(username)
        users = [user] if user else []
    else:
        index = git_store._load_index()
        users = list(index.users.values())

    for user in users:
        for feed in user.feeds:
            table.add_row(
                user.username,
                feed,
                "Active",  # TODO: Add actual status checking
            )

    console.print(table)


def print_users_tsv(config: ThicketConfig) -> None:
    """Print users in TSV format."""
    print("Username\tDisplay Name\tEmail\tHomepage\tFeeds")
    for user in config.users:
        feeds_str = ",".join(str(feed) for feed in user.feeds)
        print(f"{user.username}\t{user.display_name or ''}\t{user.email or ''}\t{user.homepage or ''}\t{feeds_str}")


def print_users_tsv_from_git(users: list[UserMetadata]) -> None:
    """Print users from the git repository in TSV format."""
    print("Username\tDisplay Name\tEmail\tHomepage\tFeeds")
    for user in users:
        feeds_str = ",".join(user.feeds)
        print(f"{user.username}\t{user.display_name or ''}\t{user.email or ''}\t{user.homepage or ''}\t{feeds_str}")


def print_feeds_tsv(config: ThicketConfig, username: Optional[str] = None) -> None:
    """Print feeds in TSV format."""
    print("Username\tFeed URL\tStatus")
    users = [config.find_user(username)] if username else config.users
    users = [u for u in users if u is not None]

    for user in users:
        for feed in user.feeds:
            print(f"{user.username}\t{feed}\tActive")


def print_feeds_tsv_from_git(git_store: GitStore, username: Optional[str] = None) -> None:
    """Print feeds from the git repository in TSV format."""
    print("Username\tFeed URL\tStatus")

    if username:
        user = git_store.get_user(username)
        users = [user] if user else []
    else:
        index = git_store._load_index()
        users = list(index.users.values())

    for user in users:
        for feed in user.feeds:
            print(f"{user.username}\t{feed}\tActive")


def print_entries_tsv(entries_by_user: list[list], usernames: list[str]) -> None:
    """Print entries in TSV format."""
    print("User\tAtom ID\tTitle\tUpdated\tURL")

    # Combine all entries with usernames
    all_entries = []
    for entries, username in zip(entries_by_user, usernames):
        for entry in entries:
            all_entries.append((username, entry))

    # Sort by updated time (newest first)
    all_entries.sort(key=lambda x: x[1].updated, reverse=True)

    for username, entry in all_entries:
        # Format updated time
        updated_str = entry.updated.strftime("%Y-%m-%d %H:%M")

        # Escape tabs and newlines in title to preserve TSV format
        title = entry.title.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ')

        print(f"{username}\t{entry.id}\t{title}\t{updated_str}\t{entry.link}")
</file>

</files>
···
+1 -5
src/thicket/__init__.py
···
-"""Thicket - A library for managing feed repositories and static site generation."""
-
-from .thicket import Thicket
-from .models import AtomEntry, UserConfig, ThicketConfig
-__all__ = ["Thicket", "AtomEntry", "UserConfig", "ThicketConfig"]
 __version__ = "0.1.0"
 __author__ = "thicket"
 __email__ = "thicket@example.com"
···
+"""Thicket: A CLI tool for persisting Atom/RSS feeds in Git repositories."""
 __version__ = "0.1.0"
 __author__ = "thicket"
 __email__ = "thicket@example.com"
+2 -2
src/thicket/cli/commands/__init__.py
···
 """CLI commands for thicket."""
 # Import all commands to register them with the main app
-from . import add, duplicates, generate, index_cmd, info_cmd, init, links_cmd, list_cmd, sync
-__all__ = ["add", "duplicates", "generate", "index_cmd", "info_cmd", "init", "links_cmd", "list_cmd", "sync"]
···
 """CLI commands for thicket."""
 # Import all commands to register them with the main app
+from . import add, duplicates, info_cmd, init, list_cmd, sync
+__all__ = ["add", "duplicates", "info_cmd", "init", "list_cmd", "sync"]
+196 -46
src/thicket/cli/commands/add.py
···
 """Add command for thicket."""
 from pathlib import Path
 from typing import Optional
 import typer
-from pydantic import ValidationError
-from ..main import app, console, load_thicket
 @app.command("add")
-def add_user(
     username: str = typer.Argument(..., help="Username"),
-    feeds: list[str] = typer.Argument(..., help="Feed URLs"),
     email: Optional[str] = typer.Option(None, "--email", "-e", help="User email"),
-    homepage: Optional[str] = typer.Option(None, "--homepage", "-h", help="User homepage"),
     icon: Optional[str] = typer.Option(None, "--icon", "-i", help="User icon URL"),
-    display_name: Optional[str] = typer.Option(None, "--display-name", "-d", help="User display name"),
     config_file: Optional[Path] = typer.Option(
-        None, "--config", help="Configuration file path"
     ),
 ) -> None:
-    """Add a user with their feeds to thicket."""
-    try:
-        # Load Thicket instance
-        thicket = load_thicket(config_file)
-
-        # Prepare user data
-        user_data = {}
-        if email:
-            user_data['email'] = email
-        if homepage:
-            user_data['homepage'] = homepage
-        if icon:
-            user_data['icon'] = icon
-        if display_name:
-            user_data['display_name'] = display_name
-
-        # Add the user
-        user_config = thicket.add_user(username, feeds, **user_data)
-
-        console.print(f"[green]✓[/green] Added user: {username}")
-        console.print(f" • Display name: {user_config.display_name or 'None'}")
-        console.print(f" • Email: {user_config.email or 'None'}")
-        console.print(f" • Homepage: {user_config.homepage or 'None'}")
-        console.print(f" • Feeds: {len(user_config.feeds)}")
-        for feed in user_config.feeds:
-            console.print(f" - {feed}")
-
-        # Commit the addition
-        commit_message = f"Add user {username} with {len(feeds)} feed(s)"
-        if thicket.commit_changes(commit_message):
-            console.print(f"[green]✓[/green] Committed: {commit_message}")
-        else:
-            console.print("[yellow]Warning:[/yellow] Failed to commit changes")
-    except ValidationError as e:
-        console.print(f"[red]Validation Error:[/red] {str(e)}")
         raise typer.Exit(1)
-    except Exception as e:
-        console.print(f"[red]Error:[/red] {str(e)}")
         raise typer.Exit(1)
···
 """Add command for thicket."""
+import asyncio
 from pathlib import Path
 from typing import Optional
 import typer
+from pydantic import HttpUrl, ValidationError
+from ...core.feed_parser import FeedParser
+from ...core.git_store import GitStore
+from ..main import app
+from ..utils import (
+    create_progress,
+    load_config,
+    print_error,
+    print_info,
+    print_success,
+)
 @app.command("add")
+def add_command(
+    subcommand: str = typer.Argument(..., help="Subcommand: 'user' or 'feed'"),
     username: str = typer.Argument(..., help="Username"),
+    feed_url: Optional[str] = typer.Argument(
+        None, help="Feed URL (required for 'user' command)"
+    ),
     email: Optional[str] = typer.Option(None, "--email", "-e", help="User email"),
+    homepage: Optional[str] = typer.Option(
+        None, "--homepage", "-h", help="User homepage"
+    ),
     icon: Optional[str] = typer.Option(None, "--icon", "-i", help="User icon URL"),
+    display_name: Optional[str] = typer.Option(
+        None, "--display-name", "-d", help="User display name"
+    ),
     config_file: Optional[Path] = typer.Option(
+        Path("thicket.yaml"), "--config", help="Configuration file path"
+    ),
+    auto_discover: bool = typer.Option(
+        True,
+        "--auto-discover/--no-auto-discover",
+        help="Auto-discover user metadata from feed",
     ),
 ) -> None:
+    """Add a user or feed to thicket."""
+
+    if subcommand == "user":
+        add_user(
+            username,
+            feed_url,
+            email,
+            homepage,
+            icon,
+            display_name,
+            config_file,
+            auto_discover,
+        )
+    elif subcommand == "feed":
+        add_feed(username, feed_url, config_file)
+    else:
+        print_error(f"Unknown subcommand: {subcommand}")
+        print_error("Use 'user' or 'feed'")
+        raise typer.Exit(1)
+
+
+def add_user(
+    username: str,
+    feed_url: Optional[str],
+    email: Optional[str],
+    homepage: Optional[str],
+    icon: Optional[str],
+    display_name: Optional[str],
+    config_file: Path,
+    auto_discover: bool,
+) -> None:
+    """Add a new user with feed."""
+
+    if not feed_url:
+        print_error("Feed URL is required when adding a user")
+        raise typer.Exit(1)
+
+    # Validate feed URL
     try:
+        validated_feed_url = HttpUrl(feed_url)
+    except ValidationError:
+        print_error(f"Invalid feed URL: {feed_url}")
+        raise typer.Exit(1) from None
+
+    # Load configuration
+    config = load_config(config_file)
+
+    # Initialize Git store
+    git_store = GitStore(config.git_store)
+
+    # Check if user already exists
+    existing_user = git_store.get_user(username)
+    if existing_user:
+        print_error(f"User '{username}' already exists")
+        print_error("Use 'thicket add feed' to add additional feeds")
         raise typer.Exit(1)
+
+    # Auto-discover metadata if enabled
+    discovered_metadata = None
+    if auto_discover:
+        discovered_metadata = asyncio.run(discover_feed_metadata(validated_feed_url))
+
+    # Prepare user data with manual overrides taking precedence
+    user_display_name = display_name or (
+        discovered_metadata.author_name or discovered_metadata.title
+        if discovered_metadata
+        else None
+    )
+    user_email = email or (
+        discovered_metadata.author_email if discovered_metadata else None
+    )
+    user_homepage = homepage or (
+        str(discovered_metadata.author_uri or discovered_metadata.link)
+        if discovered_metadata
+        else None
+    )
+    user_icon = icon or (
+        str(
+            discovered_metadata.logo
+            or discovered_metadata.icon
+            or discovered_metadata.image_url
+        )
+        if discovered_metadata
+        else None
+    )
+
+    # Add user to Git store
+    git_store.add_user(
+        username=username,
+        display_name=user_display_name,
+        email=user_email,
+        homepage=user_homepage,
+        icon=user_icon,
+        feeds=[str(validated_feed_url)],
+    )
+
+    # Commit changes
+    git_store.commit_changes(f"Add user: {username}")
+
+    print_success(f"Added user '{username}' with feed: {feed_url}")
+
+    if discovered_metadata and auto_discover:
+        print_info("Auto-discovered metadata:")
+        if user_display_name:
+            print_info(f" Display name: {user_display_name}")
+        if user_email:
+            print_info(f" Email: {user_email}")
+        if user_homepage:
+            print_info(f" Homepage: {user_homepage}")
+        if user_icon:
+            print_info(f" Icon: {user_icon}")
+
+
+def add_feed(username: str, feed_url: Optional[str], config_file: Path) -> None:
+    """Add a feed to an existing user."""
+
+    if not feed_url:
+        print_error("Feed URL is required")
         raise typer.Exit(1)
+    # Validate feed URL
+    try:
+        validated_feed_url = HttpUrl(feed_url)
+    except ValidationError:
+        print_error(f"Invalid feed URL: {feed_url}")
+        raise typer.Exit(1) from None
+
+    # Load configuration
+    config = load_config(config_file)
+
+    # Initialize Git store
+    git_store = GitStore(config.git_store)
+
+    # Check if user exists
+    user = git_store.get_user(username)
+    if not user:
+        print_error(f"User '{username}' not found")
+        print_error("Use 'thicket add user' to add a new user")
+        raise typer.Exit(1)
+
+    # Check if feed already exists
+    if str(validated_feed_url) in user.feeds:
+        print_error(f"Feed already exists for user '{username}': {feed_url}")
+        raise typer.Exit(1)
+
+    # Add feed to user
+    updated_feeds = user.feeds + [str(validated_feed_url)]
+    if git_store.update_user(username, feeds=updated_feeds):
+        git_store.commit_changes(f"Add feed to user {username}: {feed_url}")
+        print_success(f"Added feed to user '{username}': {feed_url}")
+    else:
+        print_error(f"Failed to add feed to user '{username}'")
+        raise typer.Exit(1)
+
+
+async def discover_feed_metadata(feed_url: HttpUrl):
+    """Discover metadata from a feed URL."""
+    try:
+        with create_progress() as progress:
+            task = progress.add_task("Discovering feed metadata...", total=None)
+
+            parser = FeedParser()
+            content = await parser.fetch_feed(feed_url)
+            metadata, _ = parser.parse_feed(content, feed_url)
+
+            progress.update(task, completed=True)
+            return metadata
+
+    except Exception as e:
+        print_error(f"Failed to discover feed metadata: {e}")
+        return None
+7 -3
src/thicket/cli/commands/duplicates.py
···
 from ..main import app
 from ..utils import (
     console,
     load_config,
     print_error,
     print_info,
     print_success,
-    get_tsv_mode,
 )
···
     print_info(f"Total duplicates: {len(duplicates.duplicates)}")
-def add_duplicate(git_store: GitStore, duplicate_id: Optional[str], canonical_id: Optional[str]) -> None:
     """Add a duplicate mapping."""
     if not duplicate_id:
         print_error("Duplicate ID is required")
···
     # Remove the mapping
     if git_store.remove_duplicate(duplicate_id):
         # Commit changes
-        git_store.commit_changes(f"Remove duplicate mapping: {duplicate_id} -> {canonical_id}")
         print_success(f"Removed duplicate mapping: {duplicate_id} -> {canonical_id}")
     else:
         print_error(f"Failed to remove duplicate mapping: {duplicate_id}")
···
 from ..main import app
 from ..utils import (
     console,
+    get_tsv_mode,
     load_config,
     print_error,
     print_info,
     print_success,
 )
···
     print_info(f"Total duplicates: {len(duplicates.duplicates)}")
+def add_duplicate(
+    git_store: GitStore, duplicate_id: Optional[str], canonical_id: Optional[str]
+) -> None:
     """Add a duplicate mapping."""
     if not duplicate_id:
         print_error("Duplicate ID is required")
···
     # Remove the mapping
     if git_store.remove_duplicate(duplicate_id):
         # Commit changes
+        git_store.commit_changes(
+            f"Remove duplicate mapping: {duplicate_id} -> {canonical_id}"
+        )
         print_success(f"Removed duplicate mapping: {duplicate_id} -> {canonical_id}")
     else:
         print_error(f"Failed to remove duplicate mapping: {duplicate_id}")
-59
src/thicket/cli/commands/generate.py
···
"""Generate static HTML website from thicket data."""

from pathlib import Path
from typing import Optional

import typer

from ..main import app, console, load_thicket


@app.command()
def generate(
    output: Path = typer.Option(
        Path("./thicket-site"),
        "--output",
        "-o",
        help="Output directory for the generated website",
    ),
    template_dir: Optional[Path] = typer.Option(
        None, "--templates", help="Custom template directory"
    ),
    config_file: Optional[Path] = typer.Option(
        None, "--config", help="Configuration file path"
    ),
) -> None:
    """Generate a static HTML website from thicket data."""

    try:
        # Load Thicket instance
        thicket = load_thicket(config_file)

        console.print(f"[blue]Generating static site to:[/blue] {output}")

        # Generate the complete site
        if thicket.generate_site(output, template_dir):
            console.print(f"[green]✓[/green] Successfully generated site at {output}")

            # Show what was generated
            stats = thicket.get_stats()
            console.print(f" • {stats.get('total_entries', 0)} entries")
            console.print(f" • {stats.get('total_users', 0)} users")
            console.print(f" • {stats.get('unique_urls', 0)} unique links")

            # List generated files
            if output.exists():
                html_files = list(output.glob("*.html"))
                if html_files:
                    console.print(" • Generated pages:")
                    for html_file in sorted(html_files):
                        console.print(f" - {html_file.name}")
        else:
            console.print("[red]✗[/red] Failed to generate site")
            raise typer.Exit(1)

    except Exception as e:
        console.print(f"[red]Error:[/red] {str(e)}")
        raise typer.Exit(1)
···
-427
src/thicket/cli/commands/index_cmd.py
···
"""CLI command for building reference index from blog entries."""

import json
from pathlib import Path
from typing import Optional

import typer
from rich.console import Console
from rich.progress import (
    BarColumn,
    Progress,
    SpinnerColumn,
    TaskProgressColumn,
    TextColumn,
)
from rich.table import Table

from ...core.git_store import GitStore
from ...core.reference_parser import ReferenceIndex, ReferenceParser
from ..main import app
from ..utils import get_tsv_mode, load_config

console = Console()


@app.command()
def index(
    config_file: Optional[Path] = typer.Option(
        None,
        "--config",
        "-c",
        help="Path to configuration file",
    ),
    output_file: Optional[Path] = typer.Option(
        None,
        "--output",
        "-o",
        help="Path to output index file (default: updates links.json in git store)",
    ),
    verbose: bool = typer.Option(
        False,
        "--verbose",
        "-v",
        help="Show detailed progress information",
    ),
) -> None:
    """Build a reference index showing which blog entries reference others.

    This command analyzes all blog entries to detect cross-references between
    different blogs, creating an index that can be used to build threaded
    views of related content.

    Updates the unified links.json file with reference data.
    """
    try:
        # Load configuration
        config = load_config(config_file)

        # Initialize Git store
        git_store = GitStore(config.git_store)

        # Initialize reference parser
        parser = ReferenceParser()

        # Build user domain mapping
        if verbose:
            console.print("Building user domain mapping...")
        user_domains = parser.build_user_domain_mapping(git_store)

        if verbose:
            console.print(f"Found {len(user_domains)} users with {sum(len(d) for d in user_domains.values())} total domains")

        # Initialize reference index
        ref_index = ReferenceIndex()
        ref_index.user_domains = user_domains

        # Get all users
        index = git_store._load_index()
        users = list(index.users.keys())

        if not users:
            console.print("[yellow]No users found in Git store[/yellow]")
            raise typer.Exit(0)

        # Process all entries
        total_entries = 0
        total_references = 0
        all_references = []

        with Progress(
            SpinnerColumn(),
            TextColumn("[progress.description]{task.description}"),
            BarColumn(),
            TaskProgressColumn(),
            console=console,
        ) as progress:

            # Count total entries first
            counting_task = progress.add_task("Counting entries...", total=len(users))
            entry_counts = {}
            for username in users:
                entries = git_store.list_entries(username)
                entry_counts[username] = len(entries)
                total_entries += len(entries)
                progress.advance(counting_task)

            progress.remove_task(counting_task)

            # Process entries - extract references
            processing_task = progress.add_task(
                f"Extracting references from {total_entries} entries...",
                total=total_entries
            )

            for username in users:
                entries = git_store.list_entries(username)

                for entry in entries:
                    # Extract references from this entry
                    references = parser.extract_references(entry, username, user_domains)
                    all_references.extend(references)

                    progress.advance(processing_task)

                    if verbose and references:
                        console.print(f" Found {len(references)} references in {username}:{entry.title[:50]}...")

            progress.remove_task(processing_task)

            # Resolve target_entry_ids for references
            if all_references:
                resolve_task = progress.add_task(
                    f"Resolving {len(all_references)} references...",
                    total=len(all_references)
                )

                if verbose:
                    console.print(f"Resolving target entry IDs for {len(all_references)} references...")

                resolved_references = parser.resolve_target_entry_ids(all_references, git_store)

                # Count resolved references
                resolved_count = sum(1 for ref in resolved_references if ref.target_entry_id is not None)
                if verbose:
                    console.print(f"Resolved {resolved_count} out of {len(all_references)} references")

                # Add resolved references to index
                for ref in resolved_references:
                    ref_index.add_reference(ref)
                    total_references += 1
                    progress.advance(resolve_task)

                progress.remove_task(resolve_task)

        # Determine output path
        if output_file:
            output_path = output_file
        else:
            output_path = config.git_store / "links.json"

        # Load existing links data or create new structure
        if output_path.exists() and not output_file:
            # Load existing unified structure
            with open(output_path) as f:
                existing_data = json.load(f)
        else:
            # Create new structure
            existing_data = {
                "links": {},
                "reverse_mapping": {},
                "user_domains": {}
            }

        # Update with reference data
        existing_data["references"] = ref_index.to_dict()["references"]
        existing_data["user_domains"] = {k: list(v) for k, v in user_domains.items()}

        # Save updated structure
        with open(output_path, "w") as f:
            json.dump(existing_data, f, indent=2, default=str)

        # Show summary
        if not get_tsv_mode():
            console.print("\n[green]✓ Reference index built successfully[/green]")

        # Create summary table or TSV output
        if get_tsv_mode():
            print("Metric\tCount")
            print(f"Total Users\t{len(users)}")
            print(f"Total Entries\t{total_entries}")
            print(f"Total References\t{total_references}")
            print(f"Outbound Refs\t{len(ref_index.outbound_refs)}")
            print(f"Inbound Refs\t{len(ref_index.inbound_refs)}")
            print(f"Output File\t{output_path}")
        else:
            table = Table(title="Reference Index Summary")
            table.add_column("Metric", style="cyan")
            table.add_column("Count", style="green")

            table.add_row("Total Users", str(len(users)))
            table.add_row("Total Entries", str(total_entries))
            table.add_row("Total References", str(total_references))
            table.add_row("Outbound Refs", str(len(ref_index.outbound_refs)))
            table.add_row("Inbound Refs", str(len(ref_index.inbound_refs)))
            table.add_row("Output File", str(output_path))

            console.print(table)

        # Show some interesting statistics
        if total_references > 0:
            if not get_tsv_mode():
                console.print("\n[bold]Reference Statistics:[/bold]")

            # Most referenced users
            target_counts = {}
            unresolved_domains = set()

            for ref in ref_index.references:
                if ref.target_username:
                    target_counts[ref.target_username] = target_counts.get(ref.target_username, 0) + 1
                else:
                    # Track unresolved domains
                    from urllib.parse import urlparse
                    domain = urlparse(ref.target_url).netloc.lower()
                    unresolved_domains.add(domain)

            if target_counts:
                if get_tsv_mode():
                    print("Referenced User\tReference Count")
                    for username, count in sorted(target_counts.items(), key=lambda x: x[1], reverse=True)[:5]:
                        print(f"{username}\t{count}")
                else:
                    console.print("\nMost referenced users:")
                    for username, count in sorted(target_counts.items(), key=lambda x: x[1], reverse=True)[:5]:
                        console.print(f" {username}: {count} references")

            if unresolved_domains and verbose:
                if get_tsv_mode():
                    print("Unresolved Domain\tCount")
                    for domain in sorted(list(unresolved_domains)[:10]):
                        print(f"{domain}\t1")
                    if len(unresolved_domains) > 10:
                        print(f"... and {len(unresolved_domains) - 10} more\t...")
                else:
                    console.print(f"\nUnresolved domains: {len(unresolved_domains)}")
                    for domain in sorted(list(unresolved_domains)[:10]):
                        console.print(f" {domain}")
                    if len(unresolved_domains) > 10:
                        console.print(f" ... and {len(unresolved_domains) - 10} more")

    except Exception as e:
        console.print(f"[red]Error building reference index: {e}[/red]")
        if verbose:
            console.print_exception()
        raise typer.Exit(1)


@app.command()
def threads(
    config_file: Optional[Path] = typer.Option(
        None,
        "--config",
        "-c",
        help="Path to configuration file",
    ),
    index_file: Optional[Path] = typer.Option(
        None,
        "--index",
        "-i",
        help="Path to reference index file (default: links.json in git store)",
    ),
    username: Optional[str] = typer.Option(
        None,
        "--username",
        "-u",
        help="Show threads for specific username only",
    ),
    entry_id: Optional[str] = typer.Option(
        None,
        "--entry",
        "-e",
        help="Show thread for specific entry ID",
    ),
    min_size: int = typer.Option(
        2,
        "--min-size",
        "-m",
        help="Minimum thread size to display",
    ),
) -> None:
    """Show threaded view of related blog entries.

    This command uses the reference index to show which blog entries
    are connected through cross-references, creating an email-style
    threaded view of the conversation.

    Reads reference data from the unified links.json file.
    """
    try:
        # Load configuration
        config = load_config(config_file)

        # Determine index file path
        if index_file:
            index_path = index_file
        else:
            index_path = config.git_store / "links.json"

        if not index_path.exists():
            console.print(f"[red]Links file not found: {index_path}[/red]")
            console.print("Run 'thicket links' and 'thicket index' first to build the reference index")
            raise typer.Exit(1)

        # Load unified data
        with open(index_path) as f:
            unified_data = json.load(f)

        # Check if references exist in the unified structure
        if "references" not in unified_data:
            console.print(f"[red]No references found in {index_path}[/red]")
            console.print("Run 'thicket index' first to build the reference index")
            raise typer.Exit(1)

        # Extract reference data and reconstruct ReferenceIndex
        ref_index = ReferenceIndex.from_dict({
            "references": unified_data["references"],
            "user_domains": unified_data.get("user_domains", {})
        })

        # Initialize Git store to get entry details
        git_store = GitStore(config.git_store)

        if entry_id and username:
            # Show specific thread
            thread_members = ref_index.get_thread_members(username, entry_id)
            _display_thread(thread_members, ref_index, git_store, f"Thread for {username}:{entry_id}")

        elif username:
            # Show all threads involving this user
            user_index = git_store._load_index()
            user = user_index.get_user(username)
            if not user:
                console.print(f"[red]User not found: {username}[/red]")
                raise typer.Exit(1)

            entries = git_store.list_entries(username)
            threads_found = set()

            console.print(f"[bold]Threads involving {username}:[/bold]\n")

            for entry in entries:
                thread_members = ref_index.get_thread_members(username, entry.id)
                if len(thread_members) >= min_size:
                    thread_key = tuple(sorted(thread_members))
                    if thread_key not in threads_found:
                        threads_found.add(thread_key)
                        _display_thread(thread_members, ref_index, git_store, f"Thread #{len(threads_found)}")

        else:
            # Show all threads
            console.print("[bold]All conversation threads:[/bold]\n")

            all_threads = set()
            processed_entries = set()

            # Get all entries
            user_index = git_store._load_index()
            for username in user_index.users.keys():
                entries = git_store.list_entries(username)
                for entry in entries:
                    entry_key = (username, entry.id)
                    if entry_key in processed_entries:
                        continue

                    thread_members = ref_index.get_thread_members(username, entry.id)
                    if len(thread_members) >= min_size:
                        thread_key = tuple(sorted(thread_members))
                        if thread_key not in all_threads:
                            all_threads.add(thread_key)
                            _display_thread(thread_members, ref_index, git_store, f"Thread #{len(all_threads)}")

                            # Mark all members as processed
                            for member in thread_members:
                                processed_entries.add(member)

            if not all_threads:
                console.print("[yellow]No conversation threads found[/yellow]")
                console.print(f"(minimum thread size: {min_size})")

    except Exception as e:
        console.print(f"[red]Error showing threads: {e}[/red]")
        raise typer.Exit(1)


def _display_thread(thread_members, ref_index, git_store, title):
    """Display a single conversation thread."""
    console.print(f"[bold cyan]{title}[/bold cyan]")
    console.print(f"Thread size: {len(thread_members)} entries")

    # Get entry details for each member
    thread_entries = []
    for username, entry_id in thread_members:
        entry = git_store.get_entry(username, entry_id)
        if entry:
            thread_entries.append((username, entry))

    # Sort by publication date
    thread_entries.sort(key=lambda x: x[1].published or x[1].updated)

    # Display entries
    for i, (username, entry) in enumerate(thread_entries):
        prefix = "├─" if i < len(thread_entries) - 1 else "└─"

        # Get references for this entry
        outbound = ref_index.get_outbound_refs(username, entry.id)
        inbound = ref_index.get_inbound_refs(username, entry.id)

        ref_info = ""
        if outbound or inbound:
            ref_info = f" ({len(outbound)} out, {len(inbound)} in)"

        console.print(f" {prefix} [{username}] {entry.title[:60]}...{ref_info}")

        if entry.published:
            console.print(f" Published: {entry.published.strftime('%Y-%m-%d')}")

    console.print()  # Empty line after each thread
···
+105 -118
src/thicket/cli/commands/info_cmd.py
···
 """CLI command for displaying detailed information about a specific atom entry."""
-import json
 from pathlib import Path
 from typing import Optional
···
 from rich.console import Console
 from rich.panel import Panel
 from rich.table import Table
-from rich.text import Text
 from ...core.git_store import GitStore
-from ...core.reference_parser import ReferenceIndex
 from ..main import app
-from ..utils import load_config, get_tsv_mode
 console = Console()
···
 @app.command()
 def info(
     identifier: str = typer.Argument(
-        ...,
-        help="The atom ID or URL of the entry to display information about"
     ),
     username: Optional[str] = typer.Option(
         None,
         "--username",
         "-u",
-        help="Username to search for the entry (if not provided, searches all users)"
     ),
     config_file: Optional[Path] = typer.Option(
         Path("thicket.yaml"),
···
         help="Path to configuration file",
     ),
     show_content: bool = typer.Option(
-        False,
-        "--content",
-        help="Include the full content of the entry in the output"
     ),
 ) -> None:
     """Display detailed information about a specific atom entry.
-    You can specify the entry using either its atom ID or URL.
     Shows all metadata for the given entry, including title, dates, categories,
     and summarizes all inbound and outbound links to/from other posts.
···
     try:
         # Load configuration
         config = load_config(config_file)
-        # Initialize Git store
         git_store = GitStore(config.git_store)
-        # Find the entry
         entry = None
         found_username = None
-        # Check if identifier looks like a URL
-        is_url = identifier.startswith(('http://', 'https://'))
-        if username:
             # Search specific username
             if is_url:
···
         if entry:
             found_username = user
             break
-        if not entry or not found_username:
             if username:
-                console.print(f"[red]Entry with {'URL' if is_url else 'atom ID'} '{identifier}' not found for user '{username}'[/red]")
             else:
-                console.print(f"[red]Entry with {'URL' if is_url else 'atom ID'} '{identifier}' not found in any user's entries[/red]")
             raise typer.Exit(1)
-
-        # Load reference index if available
-        links_path = config.git_store / "links.json"
-        ref_index = None
-        if links_path.exists():
-            with open(links_path) as f:
-                unified_data = json.load(f)
-
-            # Check if references exist in the unified structure
-            if "references" in unified_data:
-                ref_index = ReferenceIndex.from_dict({
-                    "references": unified_data["references"],
-                    "user_domains": unified_data.get("user_domains", {})
-                })
-        # Display information
         if get_tsv_mode():
-            _display_entry_info_tsv(entry, found_username, ref_index, show_content)
         else:
             _display_entry_info(entry, found_username)
-
-        if ref_index:
-            _display_link_info(entry, found_username, ref_index)
-        else:
-            console.print("\n[yellow]No reference index found. Run 'thicket links' and 'thicket index' to build cross-reference data.[/yellow]")
-        # Optionally display content
         if show_content and entry.content:
             _display_content(entry.content)
-
     except Exception as e:
         console.print(f"[red]Error displaying entry info: {e}[/red]")
         raise typer.Exit(1)
···
 def _display_entry_info(entry, username: str) -> None:
     """Display basic entry information in a structured format."""
-    # Create main info panel
     info_table = Table.grid(padding=(0, 2))
     info_table.add_column("Field", style="cyan bold", width=15)
     info_table.add_column("Value", style="white")
-
     info_table.add_row("User", f"[green]{username}[/green]")
     info_table.add_row("Atom ID", f"[blue]{entry.id}[/blue]")
     info_table.add_row("Title", entry.title)
     info_table.add_row("Link", str(entry.link))
-
     if entry.published:
-        info_table.add_row("Published", entry.published.strftime("%Y-%m-%d %H:%M:%S UTC"))
-    info_table.add_row("Updated", entry.updated.strftime("%Y-%m-%d %H:%M:%S UTC"))
-
     if entry.summary:
         # Truncate long summaries
-        summary = entry.summary[:200] + "..." if len(entry.summary) > 200 else entry.summary
         info_table.add_row("Summary", summary)
-
     if entry.categories:
         categories_text = ", ".join(entry.categories)
         info_table.add_row("Categories", categories_text)
-
     if entry.author:
         author_info = []
         if "name" in entry.author:
···
         author_info.append(f"<{entry.author['email']}>")
         if author_info:
             info_table.add_row("Author", " ".join(author_info))
-
     if entry.content_type:
         info_table.add_row("Content Type", entry.content_type)
-
     if entry.rights:
         info_table.add_row("Rights", entry.rights)
-
     if entry.source:
         info_table.add_row("Source Feed", entry.source)
-
     panel = Panel(
-        info_table,
-        title=f"[bold]Entry Information[/bold]",
-        border_style="blue"
     )
-
     console.print(panel)
-
 def _display_link_info(entry, username: str, ref_index: ReferenceIndex) -> None:
     """Display inbound and outbound link information."""
-
-    # Get links
-    outbound_refs = ref_index.get_outbound_refs(username, entry.id)
-    inbound_refs = ref_index.get_inbound_refs(username, entry.id)
-
-    if not outbound_refs and not inbound_refs:
         console.print("\n[dim]No cross-references found for this entry.[/dim]")
return
-
# Create links table
links_table = Table(title="Cross-References")
links_table.add_column("Direction", style="cyan", width=10)
-
links_table.add_column("Target/Source", style="green", width=20)
-
links_table.add_column("URL", style="blue", width=50)
-
-
# Add outbound references
-
for ref in outbound_refs:
-
target_info = f"{ref.target_username}:{ref.target_entry_id}" if ref.target_username and ref.target_entry_id else "External"
-
links_table.add_row("→ Out", target_info, ref.target_url)
-
-
# Add inbound references
-
for ref in inbound_refs:
-
source_info = f"{ref.source_username}:{ref.source_entry_id}"
-
links_table.add_row("← In", source_info, ref.target_url)
-
console.print()
console.print(links_table)
-
# Summary
-
console.print(f"\n[bold]Summary:[/bold] {len(outbound_refs)} outbound, {len(inbound_refs)} inbound references")
def _display_content(content: str) -> None:
"""Display the full content of the entry."""
-
# Truncate very long content
display_content = content
if len(content) > 5000:
display_content = content[:5000] + "\n\n[... content truncated ...]"
-
panel = Panel(
display_content,
title="[bold]Entry Content[/bold]",
border_style="green",
-
expand=False
)
-
console.print()
console.print(panel)
-
def _display_entry_info_tsv(entry, username: str, ref_index: Optional[ReferenceIndex], show_content: bool) -> None:
"""Display entry information in TSV format."""
-
# Basic info
print("Field\tValue")
print(f"User\t{username}")
print(f"Atom ID\t{entry.id}")
-
print(f"Title\t{entry.title.replace(chr(9), ' ').replace(chr(10), ' ').replace(chr(13), ' ')}")
print(f"Link\t{entry.link}")
-
if entry.published:
print(f"Published\t{entry.published.strftime('%Y-%m-%d %H:%M:%S UTC')}")
-
print(f"Updated\t{entry.updated.strftime('%Y-%m-%d %H:%M:%S UTC')}")
-
if entry.summary:
# Escape tabs and newlines in summary
-
summary = entry.summary.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ')
print(f"Summary\t{summary}")
-
if entry.categories:
print(f"Categories\t{', '.join(entry.categories)}")
-
if entry.author:
author_info = []
if "name" in entry.author:
···
author_info.append(f"<{entry.author['email']}>")
if author_info:
print(f"Author\t{' '.join(author_info)}")
-
if entry.content_type:
print(f"Content Type\t{entry.content_type}")
-
if entry.rights:
print(f"Rights\t{entry.rights}")
-
if entry.source:
print(f"Source Feed\t{entry.source}")
-
-
# Add reference info if available
-
if ref_index:
-
outbound_refs = ref_index.get_outbound_refs(username, entry.id)
-
inbound_refs = ref_index.get_inbound_refs(username, entry.id)
-
-
print(f"Outbound References\t{len(outbound_refs)}")
-
print(f"Inbound References\t{len(inbound_refs)}")
-
-
# Show each reference
-
for ref in outbound_refs:
-
target_info = f"{ref.target_username}:{ref.target_entry_id}" if ref.target_username and ref.target_entry_id else "External"
-
print(f"Outbound Reference\t{target_info}\t{ref.target_url}")
-
-
for ref in inbound_refs:
-
source_info = f"{ref.source_username}:{ref.source_entry_id}"
-
print(f"Inbound Reference\t{source_info}\t{ref.target_url}")
-
# Show content if requested
if show_content and entry.content:
# Escape tabs and newlines in content
-
content = entry.content.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ')
-
print(f"Content\t{content}")
···
"""CLI command for displaying detailed information about a specific atom entry."""
from pathlib import Path
from typing import Optional
···
from rich.console import Console
from rich.panel import Panel
from rich.table import Table
from ...core.git_store import GitStore
from ..main import app
+
from ..utils import get_tsv_mode, load_config
console = Console()
···
@app.command()
def info(
identifier: str = typer.Argument(
+
..., help="The atom ID or URL of the entry to display information about"
),
username: Optional[str] = typer.Option(
None,
"--username",
"-u",
+
help="Username to search for the entry (if not provided, searches all users)",
),
config_file: Optional[Path] = typer.Option(
Path("thicket.yaml"),
···
help="Path to configuration file",
),
show_content: bool = typer.Option(
+
False, "--content", help="Include the full content of the entry in the output"
),
) -> None:
"""Display detailed information about a specific atom entry.
+
You can specify the entry using either its atom ID or URL.
Shows all metadata for the given entry, including title, dates, categories,
and summarizes all inbound and outbound links to/from other posts.
···
try:
# Load configuration
config = load_config(config_file)
+
# Initialize Git store
git_store = GitStore(config.git_store)
+
# Find the entry
entry = None
found_username = None
+
# Check if identifier looks like a URL
+
is_url = identifier.startswith(("http://", "https://"))
+
if username:
# Search specific username
if is_url:
···
if entry:
found_username = user
break
+
if not entry or not found_username:
if username:
+
console.print(
+
f"[red]Entry with {'URL' if is_url else 'atom ID'} '{identifier}' not found for user '{username}'[/red]"
+
)
else:
+
console.print(
+
f"[red]Entry with {'URL' if is_url else 'atom ID'} '{identifier}' not found in any user's entries[/red]"
+
)
raise typer.Exit(1)
+
# Display information
if get_tsv_mode():
+
_display_entry_info_tsv(entry, found_username, show_content)
else:
_display_entry_info(entry, found_username)
+
+
# Display links and backlinks from entry fields
+
_display_link_info(entry, found_username, git_store)
+
# Optionally display content
if show_content and entry.content:
_display_content(entry.content)
+
except Exception as e:
console.print(f"[red]Error displaying entry info: {e}[/red]")
raise typer.Exit(1)
···
def _display_entry_info(entry, username: str) -> None:
"""Display basic entry information in a structured format."""
+
# Create main info panel
info_table = Table.grid(padding=(0, 2))
info_table.add_column("Field", style="cyan bold", width=15)
info_table.add_column("Value", style="white")
+
info_table.add_row("User", f"[green]{username}[/green]")
info_table.add_row("Atom ID", f"[blue]{entry.id}[/blue]")
info_table.add_row("Title", entry.title)
info_table.add_row("Link", str(entry.link))
+
if entry.published:
+
info_table.add_row(
+
"Published", entry.published.strftime("%Y-%m-%d %H:%M:%S UTC")
+
)
+
info_table.add_row("Updated", entry.updated.strftime("%Y-%m-%d %H:%M:%S UTC"))
+
if entry.summary:
# Truncate long summaries
+
summary = (
+
entry.summary[:200] + "..." if len(entry.summary) > 200 else entry.summary
+
)
info_table.add_row("Summary", summary)
+
if entry.categories:
categories_text = ", ".join(entry.categories)
info_table.add_row("Categories", categories_text)
+
if entry.author:
author_info = []
if "name" in entry.author:
···
author_info.append(f"<{entry.author['email']}>")
if author_info:
info_table.add_row("Author", " ".join(author_info))
+
if entry.content_type:
info_table.add_row("Content Type", entry.content_type)
+
if entry.rights:
info_table.add_row("Rights", entry.rights)
+
if entry.source:
info_table.add_row("Source Feed", entry.source)
+
panel = Panel(
+
info_table, title="[bold]Entry Information[/bold]", border_style="blue"
)
+
console.print(panel)
+
def _display_link_info(entry, username: str, git_store: GitStore) -> None:
"""Display inbound and outbound link information."""
+
+
# Get links from entry fields
+
outbound_links = getattr(entry, "links", [])
+
backlinks = getattr(entry, "backlinks", [])
+
+
if not outbound_links and not backlinks:
console.print("\n[dim]No cross-references found for this entry.[/dim]")
return
+
# Create links table
links_table = Table(title="Cross-References")
links_table.add_column("Direction", style="cyan", width=10)
+
links_table.add_column("Target/Source", style="green", width=30)
+
links_table.add_column("URL/ID", style="blue", width=60)
+
+
# Add outbound links
+
for link in outbound_links:
+
links_table.add_row("→ Out", "External/Other", link)
+
+
# Add backlinks (inbound references)
+
for backlink_id in backlinks:
+
# Try to find which user this entry belongs to
+
source_info = backlink_id
+
# Could enhance this by looking up the actual entry to get username
+
links_table.add_row("← In", "Entry", source_info)
+
console.print()
console.print(links_table)
+
# Summary
+
console.print(
+
f"\n[bold]Summary:[/bold] {len(outbound_links)} outbound links, {len(backlinks)} inbound backlinks"
+
)
def _display_content(content: str) -> None:
"""Display the full content of the entry."""
+
# Truncate very long content
display_content = content
if len(content) > 5000:
display_content = content[:5000] + "\n\n[... content truncated ...]"
+
panel = Panel(
display_content,
title="[bold]Entry Content[/bold]",
border_style="green",
+
expand=False,
)
+
console.print()
console.print(panel)
+
def _display_entry_info_tsv(entry, username: str, show_content: bool) -> None:
"""Display entry information in TSV format."""
+
# Basic info
print("Field\tValue")
print(f"User\t{username}")
print(f"Atom ID\t{entry.id}")
+
print(
+
f"Title\t{entry.title.replace(chr(9), ' ').replace(chr(10), ' ').replace(chr(13), ' ')}"
+
)
print(f"Link\t{entry.link}")
+
if entry.published:
print(f"Published\t{entry.published.strftime('%Y-%m-%d %H:%M:%S UTC')}")
+
print(f"Updated\t{entry.updated.strftime('%Y-%m-%d %H:%M:%S UTC')}")
+
if entry.summary:
# Escape tabs and newlines in summary
+
summary = entry.summary.replace("\t", " ").replace("\n", " ").replace("\r", " ")
print(f"Summary\t{summary}")
+
if entry.categories:
print(f"Categories\t{', '.join(entry.categories)}")
+
if entry.author:
author_info = []
if "name" in entry.author:
···
author_info.append(f"<{entry.author['email']}>")
if author_info:
print(f"Author\t{' '.join(author_info)}")
+
if entry.content_type:
print(f"Content Type\t{entry.content_type}")
+
if entry.rights:
print(f"Rights\t{entry.rights}")
+
if entry.source:
print(f"Source Feed\t{entry.source}")
+
+
# Add links info from entry fields
+
outbound_links = getattr(entry, "links", [])
+
backlinks = getattr(entry, "backlinks", [])
+
+
if outbound_links or backlinks:
+
print(f"Outbound Links\t{len(outbound_links)}")
+
print(f"Backlinks\t{len(backlinks)}")
+
+
# Show each link
+
for link in outbound_links:
+
print(f"→ Link\t{link}")
+
+
for backlink_id in backlinks:
+
print(f"← Backlink\t{backlink_id}")
+
# Show content if requested
if show_content and entry.content:
# Escape tabs and newlines in content
+
content = entry.content.replace("\t", " ").replace("\n", " ").replace("\r", " ")
+
print(f"Content\t{content}")
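The rewrite above drops the ReferenceIndex lookup in favour of per-entry `links`/`backlinks` fields. A hedged sketch of that defensive-read pattern (SimpleNamespace stands in for AtomEntry; the field names are assumed from the diff, not confirmed elsewhere):

```python
from types import SimpleNamespace

def summarize_refs(entry) -> str:
    # getattr with a default keeps entries stored before the
    # links/backlinks fields existed working unchanged
    outbound = getattr(entry, "links", [])
    backlinks = getattr(entry, "backlinks", [])
    return f"{len(outbound)} outbound links, {len(backlinks)} inbound backlinks"

entry = SimpleNamespace(links=["https://example.com/post"], backlinks=[])
print(summarize_refs(entry))  # -> 1 outbound links, 0 inbound backlinks
```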
+39 -51
src/thicket/cli/commands/init.py
···
"""Initialize command for thicket."""
-
import yaml
from pathlib import Path
from typing import Optional
import typer
-
from ..main import app, console, get_config_path
from ...models import ThicketConfig
-
from ... import Thicket
@app.command()
def init(
-
git_store: Path = typer.Argument(..., help="Path to Git repository for storing feeds"),
cache_dir: Optional[Path] = typer.Option(
None, "--cache-dir", "-c", help="Cache directory (default: ~/.cache/thicket)"
),
config_file: Optional[Path] = typer.Option(
-
None, "--config", help="Configuration file path (default: ~/.config/thicket/config.yaml)"
),
force: bool = typer.Option(
False, "--force", "-f", help="Overwrite existing configuration"
···
# Set default paths
if cache_dir is None:
-
cache_dir = Path.home() / ".cache" / "thicket"
if config_file is None:
-
config_file = get_config_path()
# Check if config already exists
if config_file.exists() and not force:
-
console.print(f"[red]Configuration file already exists:[/red] {config_file}")
-
console.print("Use --force to overwrite")
raise typer.Exit(1)
-
try:
-
# Create directories
-
git_store.mkdir(parents=True, exist_ok=True)
-
cache_dir.mkdir(parents=True, exist_ok=True)
-
config_file.parent.mkdir(parents=True, exist_ok=True)
-
-
# Create Thicket instance with minimal config
-
thicket = Thicket.create(git_store, cache_dir)
-
-
# Initialize the repository
-
if thicket.init_repository():
-
console.print(f"[green]✓[/green] Initialized Git store at: {git_store}")
-
else:
-
console.print(f"[red]✗[/red] Failed to initialize Git store")
-
raise typer.Exit(1)
-
# Save configuration
-
config_data = {
-
'git_store': str(git_store),
-
'cache_dir': str(cache_dir),
-
'users': []
-
}
-
-
with open(config_file, 'w') as f:
-
yaml.dump(config_data, f, default_flow_style=False)
-
-
console.print(f"[green]✓[/green] Created configuration file: {config_file}")
-
# Create initial commit
-
if thicket.commit_changes("Initialize thicket repository"):
-
console.print("[green]✓[/green] Created initial commit")
-
console.print("\n[green]Thicket initialized successfully![/green]")
-
console.print(f" • Git store: {git_store}")
-
console.print(f" • Cache directory: {cache_dir}")
-
console.print(f" • Configuration: {config_file}")
-
console.print("\n[blue]Next steps:[/blue]")
-
console.print(" 1. Add your first user and feed:")
-
console.print(f" [cyan]thicket add username https://example.com/feed.xml[/cyan]")
-
console.print(" 2. Sync feeds:")
-
console.print(f" [cyan]thicket sync[/cyan]")
-
console.print(" 3. Generate a website:")
-
console.print(f" [cyan]thicket generate[/cyan]")
except Exception as e:
-
console.print(f"[red]Error:[/red] {str(e)}")
-
raise typer.Exit(1)
···
"""Initialize command for thicket."""
from pathlib import Path
from typing import Optional
import typer
+
from pydantic import ValidationError
+
from ...core.git_store import GitStore
from ...models import ThicketConfig
+
from ..main import app
+
from ..utils import print_error, print_success, save_config
@app.command()
def init(
+
git_store: Path = typer.Argument(
+
..., help="Path to Git repository for storing feeds"
+
),
cache_dir: Optional[Path] = typer.Option(
None, "--cache-dir", "-c", help="Cache directory (default: ~/.cache/thicket)"
),
config_file: Optional[Path] = typer.Option(
+
None, "--config", help="Configuration file path (default: thicket.yaml)"
),
force: bool = typer.Option(
False, "--force", "-f", help="Overwrite existing configuration"
···
# Set default paths
if cache_dir is None:
+
from platformdirs import user_cache_dir
+
+
cache_dir = Path(user_cache_dir("thicket"))
if config_file is None:
+
config_file = Path("thicket.yaml")
# Check if config already exists
if config_file.exists() and not force:
+
print_error(f"Configuration file already exists: {config_file}")
+
print_error("Use --force to overwrite")
raise typer.Exit(1)
+
# Create cache directory
+
cache_dir.mkdir(parents=True, exist_ok=True)
+
# Create Git store
+
try:
+
GitStore(git_store)
+
print_success(f"Initialized Git store at: {git_store}")
+
except Exception as e:
+
print_error(f"Failed to initialize Git store: {e}")
+
raise typer.Exit(1) from e
+
# Create configuration
+
try:
+
config = ThicketConfig(git_store=git_store, cache_dir=cache_dir, users=[])
+
save_config(config, config_file)
+
print_success(f"Created configuration file: {config_file}")
+
except ValidationError as e:
+
print_error(f"Invalid configuration: {e}")
+
raise typer.Exit(1) from e
except Exception as e:
+
print_error(f"Failed to create configuration: {e}")
+
raise typer.Exit(1) from e
+
+
print_success("Thicket initialized successfully!")
+
print_success(f"Git store: {git_store}")
+
print_success(f"Cache directory: {cache_dir}")
+
print_success(f"Configuration: {config_file}")
+
print_success("Run 'thicket add user' to add your first user and feed.")
-416
src/thicket/cli/commands/links_cmd.py
···
-
"""CLI command for extracting and categorizing all outbound links from blog entries."""
-
-
import json
-
import re
-
from pathlib import Path
-
from typing import Dict, List, Optional, Set
-
from urllib.parse import urljoin, urlparse
-
-
import typer
-
from rich.console import Console
-
from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TaskProgressColumn
-
from rich.table import Table
-
-
from ...core.git_store import GitStore
-
from ..main import app
-
from ..utils import load_config, get_tsv_mode
-
-
console = Console()
-
-
-
class LinkData:
-
"""Represents a link found in a blog entry."""
-
-
def __init__(self, url: str, entry_id: str, username: str):
-
self.url = url
-
self.entry_id = entry_id
-
self.username = username
-
-
def to_dict(self) -> dict:
-
"""Convert to dictionary for JSON serialization."""
-
return {
-
"url": self.url,
-
"entry_id": self.entry_id,
-
"username": self.username
-
}
-
-
@classmethod
-
def from_dict(cls, data: dict) -> "LinkData":
-
"""Create from dictionary."""
-
return cls(
-
url=data["url"],
-
entry_id=data["entry_id"],
-
username=data["username"]
-
)
-
-
-
class LinkCategorizer:
-
"""Categorizes links as internal, user, or unknown."""
-
-
def __init__(self, user_domains: Dict[str, Set[str]]):
-
self.user_domains = user_domains
-
# Create reverse mapping of domain -> username
-
self.domain_to_user = {}
-
for username, domains in user_domains.items():
-
for domain in domains:
-
self.domain_to_user[domain] = username
-
-
def categorize_url(self, url: str, source_username: str) -> tuple[str, Optional[str]]:
-
"""
-
Categorize a URL as 'internal', 'user', or 'unknown'.
-
Returns (category, target_username).
-
"""
-
try:
-
parsed = urlparse(url)
-
domain = parsed.netloc.lower()
-
-
# Check if it's a link to the same user's domain (internal)
-
if domain in self.user_domains.get(source_username, set()):
-
return "internal", source_username
-
-
# Check if it's a link to another user's domain
-
if domain in self.domain_to_user:
-
return "user", self.domain_to_user[domain]
-
-
# Everything else is unknown
-
return "unknown", None
-
-
except Exception:
-
return "unknown", None
-
-
-
class LinkExtractor:
-
"""Extracts and resolves links from blog entries."""
-
-
def __init__(self):
-
# Pattern for extracting links from HTML
-
self.link_pattern = re.compile(r'<a[^>]+href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)
-
self.url_pattern = re.compile(r'https?://[^\s<>"]+')
-
-
def extract_links_from_html(self, html_content: str, base_url: str) -> List[tuple[str, str]]:
-
"""Extract all links from HTML content and resolve them against base URL."""
-
links = []
-
-
# Extract links from <a> tags
-
for match in self.link_pattern.finditer(html_content):
-
url = match.group(1)
-
text = re.sub(r'<[^>]+>', '', match.group(2)).strip() # Remove HTML tags from link text
-
-
# Resolve relative URLs against base URL
-
resolved_url = urljoin(base_url, url)
-
links.append((resolved_url, text))
-
-
return links
-
-
-
def extract_links_from_entry(self, entry, username: str, base_url: str) -> List[LinkData]:
-
"""Extract all links from a blog entry."""
-
links = []
-
-
# Combine all text content for analysis
-
content_to_search = []
-
if entry.content:
-
content_to_search.append(entry.content)
-
if entry.summary:
-
content_to_search.append(entry.summary)
-
-
for content in content_to_search:
-
extracted_links = self.extract_links_from_html(content, base_url)
-
-
for url, link_text in extracted_links:
-
# Skip empty URLs
-
if not url or url.startswith('#'):
-
continue
-
-
link_data = LinkData(
-
url=url,
-
entry_id=entry.id,
-
username=username
-
)
-
-
links.append(link_data)
-
-
return links
-
-
-
@app.command()
-
def links(
-
config_file: Optional[Path] = typer.Option(
-
Path("thicket.yaml"),
-
"--config",
-
"-c",
-
help="Path to configuration file",
-
),
-
output_file: Optional[Path] = typer.Option(
-
None,
-
"--output",
-
"-o",
-
help="Path to output unified links file (default: links.json in git store)",
-
),
-
verbose: bool = typer.Option(
-
False,
-
"--verbose",
-
"-v",
-
help="Show detailed progress information",
-
),
-
) -> None:
-
"""Extract and categorize all outbound links from blog entries.
-
-
This command analyzes all blog entries to extract outbound links,
-
resolve them properly with respect to the feed's base URL, and
-
categorize them as internal, user, or unknown links.
-
-
Creates a unified links.json file containing all link data.
-
"""
-
try:
-
# Load configuration
-
config = load_config(config_file)
-
-
# Initialize Git store
-
git_store = GitStore(config.git_store)
-
-
# Build user domain mapping
-
if verbose:
-
console.print("Building user domain mapping...")
-
-
index = git_store._load_index()
-
user_domains = {}
-
-
for username, user_metadata in index.users.items():
-
domains = set()
-
-
# Add domains from feeds
-
for feed_url in user_metadata.feeds:
-
domain = urlparse(feed_url).netloc.lower()
-
if domain:
-
domains.add(domain)
-
-
# Add domain from homepage
-
if user_metadata.homepage:
-
domain = urlparse(str(user_metadata.homepage)).netloc.lower()
-
if domain:
-
domains.add(domain)
-
-
user_domains[username] = domains
-
-
if verbose:
-
console.print(f"Found {len(user_domains)} users with {sum(len(d) for d in user_domains.values())} total domains")
-
-
# Initialize components
-
link_extractor = LinkExtractor()
-
categorizer = LinkCategorizer(user_domains)
-
-
# Get all users
-
users = list(index.users.keys())
-
-
if not users:
-
console.print("[yellow]No users found in Git store[/yellow]")
-
raise typer.Exit(0)
-
-
# Process all entries
-
all_links = []
-
link_categories = {"internal": [], "user": [], "unknown": []}
-
link_dict = {} # Dictionary with link URL as key, maps to list of atom IDs
-
reverse_dict = {} # Dictionary with atom ID as key, maps to list of URLs
-
-
with Progress(
-
SpinnerColumn(),
-
TextColumn("[progress.description]{task.description}"),
-
BarColumn(),
-
TaskProgressColumn(),
-
console=console,
-
) as progress:
-
-
# Count total entries first
-
counting_task = progress.add_task("Counting entries...", total=len(users))
-
total_entries = 0
-
-
for username in users:
-
entries = git_store.list_entries(username)
-
total_entries += len(entries)
-
progress.advance(counting_task)
-
-
progress.remove_task(counting_task)
-
-
# Process entries
-
processing_task = progress.add_task(
-
f"Processing {total_entries} entries...",
-
total=total_entries
-
)
-
-
for username in users:
-
entries = git_store.list_entries(username)
-
user_metadata = index.users[username]
-
-
# Get base URL for this user (use first feed URL)
-
base_url = str(user_metadata.feeds[0]) if user_metadata.feeds else "https://example.com"
-
-
for entry in entries:
-
# Extract links from this entry
-
entry_links = link_extractor.extract_links_from_entry(entry, username, base_url)
-
-
# Track unique links per entry
-
entry_urls_seen = set()
-
-
# Categorize each link
-
for link_data in entry_links:
-
# Skip if we've already seen this URL in this entry
-
if link_data.url in entry_urls_seen:
-
continue
-
entry_urls_seen.add(link_data.url)
-
-
category, target_username = categorizer.categorize_url(link_data.url, username)
-
-
# Add to link dictionary (URL as key, maps to list of atom IDs)
-
if link_data.url not in link_dict:
-
link_dict[link_data.url] = []
-
if link_data.entry_id not in link_dict[link_data.url]:
-
link_dict[link_data.url].append(link_data.entry_id)
-
-
# Also add to reverse mapping (atom ID -> list of URLs)
-
if link_data.entry_id not in reverse_dict:
-
reverse_dict[link_data.entry_id] = []
-
if link_data.url not in reverse_dict[link_data.entry_id]:
-
reverse_dict[link_data.entry_id].append(link_data.url)
-
-
# Add category info to link data for categories tracking
-
link_info = link_data.to_dict()
-
link_info["category"] = category
-
link_info["target_username"] = target_username
-
-
all_links.append(link_info)
-
link_categories[category].append(link_info)
-
-
progress.advance(processing_task)
-
-
if verbose and entry_links:
-
console.print(f" Found {len(entry_links)} links in {username}:{entry.title[:50]}...")
-
-
# Determine output path
-
if output_file:
-
output_path = output_file
-
else:
-
output_path = config.git_store / "links.json"
-
-
# Save all extracted links (not just filtered ones)
-
if verbose:
-
console.print("Preparing output data...")
-
-
# Build a set of all URLs that correspond to posts in the git database
-
registered_urls = set()
-
-
# Get all entries from all users and build URL mappings
-
for username in users:
-
entries = git_store.list_entries(username)
-
user_metadata = index.users[username]
-
-
for entry in entries:
-
# Try to match entry URLs with extracted links
-
if hasattr(entry, 'link') and entry.link:
-
registered_urls.add(str(entry.link))
-
-
# Also check entry alternate links if they exist
-
if hasattr(entry, 'links') and entry.links:
-
for link in entry.links:
-
if hasattr(link, 'href') and link.href:
-
registered_urls.add(str(link.href))
-
-
# Build unified structure with metadata
-
unified_links = {}
-
reverse_mapping = {}
-
-
for url, entry_ids in link_dict.items():
-
unified_links[url] = {
-
"referencing_entries": entry_ids
-
}
-
-
# Find target username if this is a tracked post
-
if url in registered_urls:
-
for username in users:
-
user_domains_set = {domain for domain in user_domains.get(username, [])}
-
if any(domain in url for domain in user_domains_set):
-
unified_links[url]["target_username"] = username
-
break
-
-
# Build reverse mapping
-
for entry_id in entry_ids:
-
if entry_id not in reverse_mapping:
-
reverse_mapping[entry_id] = []
-
if url not in reverse_mapping[entry_id]:
-
reverse_mapping[entry_id].append(url)
-
-
# Create unified output data
-
output_data = {
-
"links": unified_links,
-
"reverse_mapping": reverse_mapping,
-
"user_domains": {k: list(v) for k, v in user_domains.items()}
-
}
-
-
if verbose:
-
console.print(f"Found {len(registered_urls)} registered post URLs")
-
console.print(f"Found {len(link_dict)} total links, {sum(1 for link in unified_links.values() if 'target_username' in link)} tracked posts")
-
-
# Save unified data
-
with open(output_path, "w") as f:
-
json.dump(output_data, f, indent=2, default=str)
-
-
# Show summary
-
if not get_tsv_mode():
-
console.print("\n[green]✓ Links extraction completed successfully[/green]")
-
-
# Create summary table or TSV output
-
if get_tsv_mode():
-
print("Category\tCount\tDescription")
-
print(f"Internal\t{len(link_categories['internal'])}\tLinks to same user's domain")
-
print(f"User\t{len(link_categories['user'])}\tLinks to other tracked users")
-
print(f"Unknown\t{len(link_categories['unknown'])}\tLinks to external sites")
-
print(f"Total Extracted\t{len(all_links)}\tAll extracted links")
-
print(f"Saved to Output\t{len(output_data['links'])}\tLinks saved to output file")
-
print(f"Cross-references\t{sum(1 for link in unified_links.values() if 'target_username' in link)}\tLinks to registered posts only")
-
else:
-
table = Table(title="Links Summary")
-
table.add_column("Category", style="cyan")
-
table.add_column("Count", style="green")
-
table.add_column("Description", style="white")
-
-
table.add_row("Internal", str(len(link_categories["internal"])), "Links to same user's domain")
-
table.add_row("User", str(len(link_categories["user"])), "Links to other tracked users")
-
table.add_row("Unknown", str(len(link_categories["unknown"])), "Links to external sites")
-
table.add_row("Total Extracted", str(len(all_links)), "All extracted links")
-
table.add_row("Saved to Output", str(len(output_data['links'])), "Links saved to output file")
-
table.add_row("Cross-references", str(sum(1 for link in unified_links.values() if 'target_username' in link)), "Links to registered posts only")
-
-
console.print(table)
-
-
# Show user links if verbose
-
if verbose and link_categories["user"]:
-
if get_tsv_mode():
-
print("User Link Source\tUser Link Target\tLink Count")
-
user_link_counts = {}
-
-
for link in link_categories["user"]:
-
key = f"{link['username']} -> {link['target_username']}"
-
user_link_counts[key] = user_link_counts.get(key, 0) + 1
-
-
for link_pair, count in sorted(user_link_counts.items(), key=lambda x: x[1], reverse=True)[:10]:
-
source, target = link_pair.split(" -> ")
-
print(f"{source}\t{target}\t{count}")
-
else:
-
console.print("\n[bold]User-to-user links:[/bold]")
-
user_link_counts = {}
-
-
for link in link_categories["user"]:
-
key = f"{link['username']} -> {link['target_username']}"
-
user_link_counts[key] = user_link_counts.get(key, 0) + 1
-
-
for link_pair, count in sorted(user_link_counts.items(), key=lambda x: x[1], reverse=True)[:10]:
-
console.print(f" {link_pair}: {count} links")
-
-
if not get_tsv_mode():
-
console.print(f"\nUnified links data saved to: {output_path}")
-
-
except Exception as e:
-
console.print(f"[red]Error extracting links: {e}[/red]")
-
if verbose:
-
console.print_exception()
-
raise typer.Exit(1)
···
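Although links_cmd.py is deleted wholesale, its domain-based categorisation is worth recording. A self-contained sketch of the LinkCategorizer logic shown above:

```python
# Sketch of the removed LinkCategorizer: build a reverse domain->user
# map, then classify each URL as internal / user / unknown.
from typing import Optional
from urllib.parse import urlparse

def categorize(
    url: str, source: str, user_domains: dict[str, set[str]]
) -> tuple[str, Optional[str]]:
    domain_to_user = {d: u for u, ds in user_domains.items() for d in ds}
    domain = urlparse(url).netloc.lower()
    if domain in user_domains.get(source, set()):
        return "internal", source       # link to the author's own domain
    if domain in domain_to_user:
        return "user", domain_to_user[domain]  # link to another tracked user
    return "unknown", None              # external site

domains = {"alice": {"alice.example"}, "bob": {"bob.example"}}
print(categorize("https://bob.example/post", "alice", domains))  # ('user', 'bob')
```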
+11 -11
src/thicket/cli/commands/list_cmd.py
···
from ..main import app
from ..utils import (
console,
load_config,
print_error,
-
print_feeds_table,
print_feeds_table_from_git,
print_info,
-
print_users_table,
print_users_table_from_git,
-
print_entries_tsv,
-
get_tsv_mode,
)
···
"""List all users."""
index = git_store._load_index()
users = list(index.users.values())
-
if not users:
print_info("No users configured")
return
···
print_feeds_table_from_git(git_store, username)
-
def list_entries(git_store: GitStore, username: Optional[str] = None, limit: Optional[int] = None) -> None:
"""List entries, optionally filtered by user."""
if username:
···
"""Clean HTML content for display in table."""
if not content:
return ""
-
# Remove HTML tags
-
clean_text = re.sub(r'<[^>]+>', ' ', content)
# Replace multiple whitespace with single space
-
clean_text = re.sub(r'\s+', ' ', clean_text)
# Strip and limit length
clean_text = clean_text.strip()
if len(clean_text) > 100:
clean_text = clean_text[:97] + "..."
-
return clean_text
···
if get_tsv_mode():
print_entries_tsv(entries_by_user, usernames)
return
-
table = Table(title="Feed Entries")
table.add_column("User", style="cyan", no_wrap=True)
table.add_column("Title", style="bold")
···
from ..main import app
from ..utils import (
console,
+
get_tsv_mode,
load_config,
+
print_entries_tsv,
print_error,
print_feeds_table_from_git,
print_info,
print_users_table_from_git,
)
···
"""List all users."""
index = git_store._load_index()
users = list(index.users.values())
+
if not users:
print_info("No users configured")
return
···
print_feeds_table_from_git(git_store, username)
+
def list_entries(
+
git_store: GitStore, username: Optional[str] = None, limit: Optional[int] = None
+
) -> None:
"""List entries, optionally filtered by user."""
if username:
···
"""Clean HTML content for display in table."""
if not content:
return ""
+
# Remove HTML tags
+
clean_text = re.sub(r"<[^>]+>", " ", content)
# Replace multiple whitespace with single space
+
clean_text = re.sub(r"\s+", " ", clean_text)
# Strip and limit length
clean_text = clean_text.strip()
if len(clean_text) > 100:
clean_text = clean_text[:97] + "..."
+
return clean_text
···
if get_tsv_mode():
print_entries_tsv(entries_by_user, usernames)
return
+
table = Table(title="Feed Entries")
table.add_column("User", style="cyan", no_wrap=True)
table.add_column("Title", style="bold")
+131 -75
src/thicket/cli/commands/sync.py
···
from typing import Optional
import typer
-
from rich.progress import Progress, SpinnerColumn, TextColumn
-
from ..main import app, console, load_thicket
@app.command()
def sync(
user: Optional[str] = typer.Option(
-
None, "--user", "-u", help="Sync specific user only (default: all users)"
),
config_file: Optional[Path] = typer.Option(
-
None, "--config", help="Configuration file path"
),
-
commit: bool = typer.Option(
-
True, "--commit/--no-commit", help="Commit changes after sync"
),
) -> None:
"""Sync feeds and store entries in Git repository."""
-
try:
-
# Load Thicket instance
-
thicket = load_thicket(config_file)
-
-
# Progress callback for tracking
-
current_task = None
-
-
def progress_callback(message: str, current: int = 0, total: int = 0):
-
nonlocal current_task
-
current_task = message
-
if total > 0:
-
console.print(f"[blue]Progress:[/blue] {message} ({current}/{total})")
-
else:
-
console.print(f"[blue]Info:[/blue] {message}")
-
-
# Run sync with progress
-
with Progress(
-
SpinnerColumn(),
-
TextColumn("[progress.description]{task.description}"),
-
console=console,
-
transient=True,
-
) as progress:
-
task = progress.add_task("Syncing feeds...", total=None)
-
-
# Perform sync
-
results = asyncio.run(thicket.sync_feeds(user, progress_callback))
-
-
progress.remove_task(task)
-
-
# Process results
-
total_new = 0
-
total_processed = 0
-
errors = []
-
-
if isinstance(results, dict):
-
for username, user_results in results.items():
-
if 'error' in user_results:
-
errors.append(f"{username}: {user_results['error']}")
-
continue
-
-
total_new += user_results.get('new_entries', 0)
-
total_processed += user_results.get('feeds_processed', 0)
-
-
console.print(f"[green]✓[/green] {username}: {user_results.get('new_entries', 0)} new entries from {user_results.get('feeds_processed', 0)} feeds")
-
-
# Show any feed-specific errors
-
for error in user_results.get('errors', []):
-
console.print(f" [yellow]Warning:[/yellow] {error}")
-
-
# Show errors
-
for error in errors:
-
console.print(f"[red]Error:[/red] {error}")
-
-
# Commit changes if requested
-
if commit and total_new > 0:
-
commit_message = f"Sync feeds: {total_new} new entries from {total_processed} feeds"
-
if thicket.commit_changes(commit_message):
-
console.print(f"[green]โœ“[/green] Committed: {commit_message}")
-
else:
-
console.print("[red]โœ—[/red] Failed to commit changes")
-
-
# Summary
-
if total_new > 0:
-
console.print(f"\n[green]Sync complete:[/green] {total_new} new entries processed")
-
else:
-
console.print("\n[blue]Sync complete:[/blue] No new entries found")
-
except Exception as e:
-
console.print(f"[red]Error:[/red] {str(e)}")
-
raise typer.Exit(1)
···
from typing import Optional
import typer
+
from rich.progress import track
+
from ...core.feed_parser import FeedParser
+
from ...core.git_store import GitStore
+
from ..main import app
+
from ..utils import (
+
load_config,
+
print_error,
+
print_info,
+
print_success,
+
)
@app.command()
def sync(
+
all_users: bool = typer.Option(
+
False, "--all", "-a", help="Sync all users and feeds"
+
),
user: Optional[str] = typer.Option(
+
None, "--user", "-u", help="Sync specific user only"
),
config_file: Optional[Path] = typer.Option(
+
Path("thicket.yaml"), "--config", help="Configuration file path"
),
+
dry_run: bool = typer.Option(
+
False, "--dry-run", help="Show what would be synced without making changes"
),
) -> None:
"""Sync feeds and store entries in Git repository."""
+
+
# Load configuration
+
config = load_config(config_file)
+
+
# Initialize Git store
+
git_store = GitStore(config.git_store)
+
+
# Determine which users to sync from git repository
+
users_to_sync = []
+
if all_users:
+
index = git_store._load_index()
+
users_to_sync = list(index.users.values())
+
elif user:
+
user_metadata = git_store.get_user(user)
+
if not user_metadata:
+
print_error(f"User '{user}' not found in git repository")
+
raise typer.Exit(1)
+
users_to_sync = [user_metadata]
+
else:
+
print_error("Specify --all to sync all users or --user to sync a specific user")
+
raise typer.Exit(1)
+
+
if not users_to_sync:
+
print_info("No users configured to sync")
+
return
+
+
# Sync each user
+
total_new_entries = 0
+
total_updated_entries = 0
+
+
for user_metadata in users_to_sync:
+
print_info(f"Syncing user: {user_metadata.username}")
+
+
user_new_entries = 0
+
user_updated_entries = 0
+
+
# Sync each feed for the user
+
for feed_url in track(
+
user_metadata.feeds, description=f"Syncing {user_metadata.username}'s feeds"
+
):
+
try:
+
new_entries, updated_entries = asyncio.run(
+
sync_feed(git_store, user_metadata.username, feed_url, dry_run)
+
)
+
user_new_entries += new_entries
+
user_updated_entries += updated_entries
+
+
except Exception as e:
+
print_error(f"Failed to sync feed {feed_url}: {e}")
+
continue
+
+
print_info(
+
f"User {user_metadata.username}: {user_new_entries} new, {user_updated_entries} updated"
+
)
+
total_new_entries += user_new_entries
+
total_updated_entries += user_updated_entries
+
+
# Commit changes if not dry run
+
if not dry_run and (total_new_entries > 0 or total_updated_entries > 0):
+
commit_message = f"Sync feeds: {total_new_entries} new entries, {total_updated_entries} updated"
+
git_store.commit_changes(commit_message)
+
print_success(f"Committed changes: {commit_message}")
+
+
# Summary
+
if dry_run:
+
print_info(
+
f"Dry run complete: would sync {total_new_entries} new entries, {total_updated_entries} updated"
+
)
+
else:
+
print_success(
+
f"Sync complete: {total_new_entries} new entries, {total_updated_entries} updated"
+
)
+
+
+
async def sync_feed(
+
git_store: GitStore, username: str, feed_url, dry_run: bool
+
) -> tuple[int, int]:
+
"""Sync a single feed for a user."""
+
+
parser = FeedParser()
+
try:
+
# Fetch and parse feed
+
content = await parser.fetch_feed(feed_url)
+
metadata, entries = parser.parse_feed(content, feed_url)
+
+
new_entries = 0
+
updated_entries = 0
+
+
# Process each entry
+
for entry in entries:
+
try:
+
# Check if entry already exists
+
existing_entry = git_store.get_entry(username, entry.id)
+
+
if existing_entry:
+
# Check if entry has been updated
+
if existing_entry.updated != entry.updated:
+
if not dry_run:
+
git_store.store_entry(username, entry)
+
updated_entries += 1
+
else:
+
# New entry
+
if not dry_run:
+
git_store.store_entry(username, entry)
+
new_entries += 1
+
+
except Exception as e:
+
print_error(f"Failed to process entry {entry.id}: {e}")
+
continue
+
+
return new_entries, updated_entries
+
except Exception as e:
+
print_error(f"Failed to sync feed {feed_url}: {e}")
+
return 0, 0
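A usage sketch for the new `sync_feed` coroutine above, assuming a GitStore built from the loaded configuration (the feed URL and username are placeholders):

```python
import asyncio

async def demo(git_store, feed_url: str) -> None:
    # dry_run=True reports counts without writing to the git store
    new, updated = await sync_feed(git_store, "alice", feed_url, dry_run=True)
    print(f"would store {new} new and {updated} updated entries")

# asyncio.run(demo(git_store, "https://alice.example/feed.xml"))
```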
+2 -36
src/thicket/cli/main.py
···
"""Main CLI application using Typer."""
-
from pathlib import Path
-
from typing import Optional
-
import typer
from rich.console import Console
-
from .. import __version__, Thicket, ThicketConfig
app = typer.Typer(
name="thicket",
···
raise typer.Exit()
-
def load_thicket(config_path: Optional[Path] = None) -> Thicket:
-
"""Load Thicket instance from configuration."""
-
if config_path and config_path.exists():
-
return Thicket.from_config_file(config_path)
-
-
# Try default locations
-
default_paths = [
-
Path("thicket.yaml"),
-
Path("thicket.yml"),
-
Path("thicket.json"),
-
Path.home() / ".config" / "thicket" / "config.yaml",
-
Path.home() / ".thicket.yaml",
-
]
-
-
for path in default_paths:
-
if path.exists():
-
return Thicket.from_config_file(path)
-
-
# No config found
-
console.print("[red]Error:[/red] No configuration file found.")
-
console.print("Use [bold]thicket init[/bold] to create a new configuration or specify --config")
-
raise typer.Exit(1)
-
-
-
def get_config_path() -> Path:
-
"""Get the default configuration path for new configs."""
-
config_dir = Path.home() / ".config" / "thicket"
-
config_dir.mkdir(parents=True, exist_ok=True)
-
return config_dir / "config.yaml"
-
-
@app.callback()
def main(
version: bool = typer.Option(
···
# Import commands to register them
-
from .commands import add, duplicates, generate, index_cmd, info_cmd, init, links_cmd, list_cmd, sync
if __name__ == "__main__":
app()
···
"""Main CLI application using Typer."""
import typer
from rich.console import Console
+
from .. import __version__
app = typer.Typer(
name="thicket",
···
raise typer.Exit()
@app.callback()
def main(
version: bool = typer.Option(
···
# Import commands to register them
+
from .commands import add, duplicates, info_cmd, init, list_cmd, sync # noqa: F401
if __name__ == "__main__":
app()
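The slimmed-down main.py relies on import side effects for command registration: importing a command module executes its `@app.command()` decorators, which is why the otherwise-unused import is kept under `# noqa: F401`. A minimal sketch of the pattern:

```python
import typer

app = typer.Typer()

# Importing this module anywhere runs the decorator below,
# registering `hello` on `app` as a side effect.
@app.command()
def hello(name: str = "world") -> None:
    print(f"hello {name}")

if __name__ == "__main__":
    app()
```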
+32 -20
src/thicket/cli/utils.py
···
from rich.progress import Progress, SpinnerColumn, TextColumn
from rich.table import Table
-
from ..models import ThicketConfig, UserMetadata
from ..core.git_store import GitStore
console = Console()
···
def get_tsv_mode() -> bool:
"""Get the global TSV mode setting."""
from .main import tsv_mode
return tsv_mode
···
default_config = Path("thicket.yaml")
if default_config.exists():
import yaml
with open(default_config) as f:
config_data = yaml.safe_load(f)
return ThicketConfig(**config_data)
-
# Fall back to environment variables
return ThicketConfig()
except Exception as e:
console.print(f"[red]Error loading configuration: {e}[/red]")
-
console.print("[yellow]Run 'thicket init' to create a new configuration.[/yellow]")
raise typer.Exit(1) from e
···
if get_tsv_mode():
print_users_tsv(config)
return
-
table = Table(title="Users and Feeds")
table.add_column("Username", style="cyan", no_wrap=True)
table.add_column("Display Name", style="magenta")
···
if get_tsv_mode():
print_feeds_tsv(config, username)
return
-
table = Table(title=f"Feeds{f' for {username}' if username else ''}")
table.add_column("Username", style="cyan", no_wrap=True)
table.add_column("Feed URL", style="blue")
···
if get_tsv_mode():
print_users_tsv_from_git(users)
return
-
table = Table(title="Users and Feeds")
table.add_column("Username", style="cyan", no_wrap=True)
table.add_column("Display Name", style="magenta")
···
console.print(table)
-
def print_feeds_table_from_git(git_store: GitStore, username: Optional[str] = None) -> None:
"""Print a table of feeds from git repository."""
if get_tsv_mode():
print_feeds_tsv_from_git(git_store, username)
return
-
table = Table(title=f"Feeds{f' for {username}' if username else ''}")
table.add_column("Username", style="cyan", no_wrap=True)
table.add_column("Feed URL", style="blue")
···
print("Username\tDisplay Name\tEmail\tHomepage\tFeeds")
for user in config.users:
feeds_str = ",".join(str(feed) for feed in user.feeds)
-
print(f"{user.username}\t{user.display_name or ''}\t{user.email or ''}\t{user.homepage or ''}\t{feeds_str}")
def print_users_tsv_from_git(users: list[UserMetadata]) -> None:
···
print("Username\tDisplay Name\tEmail\tHomepage\tFeeds")
for user in users:
feeds_str = ",".join(user.feeds)
-
print(f"{user.username}\t{user.display_name or ''}\t{user.email or ''}\t{user.homepage or ''}\t{feeds_str}")
def print_feeds_tsv(config: ThicketConfig, username: Optional[str] = None) -> None:
···
print("Username\tFeed URL\tStatus")
users = [config.find_user(username)] if username else config.users
users = [u for u in users if u is not None]
-
for user in users:
for feed in user.feeds:
print(f"{user.username}\t{feed}\tActive")
-
def print_feeds_tsv_from_git(git_store: GitStore, username: Optional[str] = None) -> None:
"""Print feeds from git repository in TSV format."""
print("Username\tFeed URL\tStatus")
-
if username:
user = git_store.get_user(username)
users = [user] if user else []
else:
index = git_store._load_index()
users = list(index.users.values())
-
for user in users:
for feed in user.feeds:
print(f"{user.username}\t{feed}\tActive")
···
def print_entries_tsv(entries_by_user: list[list], usernames: list[str]) -> None:
"""Print entries in TSV format."""
print("User\tAtom ID\tTitle\tUpdated\tURL")
-
# Combine all entries with usernames
all_entries = []
for entries, username in zip(entries_by_user, usernames):
for entry in entries:
all_entries.append((username, entry))
-
# Sort by updated time (newest first)
all_entries.sort(key=lambda x: x[1].updated, reverse=True)
-
for username, entry in all_entries:
# Format updated time
updated_str = entry.updated.strftime("%Y-%m-%d %H:%M")
-
# Escape tabs and newlines in title to preserve TSV format
-
title = entry.title.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ')
-
print(f"{username}\t{entry.id}\t{title}\t{updated_str}\t{entry.link}")
···
from rich.progress import Progress, SpinnerColumn, TextColumn
from rich.table import Table
from ..core.git_store import GitStore
+
from ..models import ThicketConfig, UserMetadata
console = Console()
···
def get_tsv_mode() -> bool:
"""Get the global TSV mode setting."""
from .main import tsv_mode
+
return tsv_mode
···
default_config = Path("thicket.yaml")
if default_config.exists():
import yaml
+
with open(default_config) as f:
config_data = yaml.safe_load(f)
return ThicketConfig(**config_data)
+
# Fall back to environment variables
return ThicketConfig()
except Exception as e:
console.print(f"[red]Error loading configuration: {e}[/red]")
+
console.print(
+
"[yellow]Run 'thicket init' to create a new configuration.[/yellow]"
+
)
raise typer.Exit(1) from e
···
if get_tsv_mode():
print_users_tsv(config)
return
+
table = Table(title="Users and Feeds")
table.add_column("Username", style="cyan", no_wrap=True)
table.add_column("Display Name", style="magenta")
···
if get_tsv_mode():
print_feeds_tsv(config, username)
return
+
table = Table(title=f"Feeds{f' for {username}' if username else ''}")
table.add_column("Username", style="cyan", no_wrap=True)
table.add_column("Feed URL", style="blue")
···
if get_tsv_mode():
print_users_tsv_from_git(users)
return
+
table = Table(title="Users and Feeds")
table.add_column("Username", style="cyan", no_wrap=True)
table.add_column("Display Name", style="magenta")
···
console.print(table)
+
def print_feeds_table_from_git(
+
git_store: GitStore, username: Optional[str] = None
+
) -> None:
"""Print a table of feeds from git repository."""
if get_tsv_mode():
print_feeds_tsv_from_git(git_store, username)
return
+
table = Table(title=f"Feeds{f' for {username}' if username else ''}")
table.add_column("Username", style="cyan", no_wrap=True)
table.add_column("Feed URL", style="blue")
···
print("Username\tDisplay Name\tEmail\tHomepage\tFeeds")
for user in config.users:
feeds_str = ",".join(str(feed) for feed in user.feeds)
+
print(
+
f"{user.username}\t{user.display_name or ''}\t{user.email or ''}\t{user.homepage or ''}\t{feeds_str}"
+
)
def print_users_tsv_from_git(users: list[UserMetadata]) -> None:
···
print("Username\tDisplay Name\tEmail\tHomepage\tFeeds")
for user in users:
feeds_str = ",".join(user.feeds)
+
print(
+
f"{user.username}\t{user.display_name or ''}\t{user.email or ''}\t{user.homepage or ''}\t{feeds_str}"
+
)
def print_feeds_tsv(config: ThicketConfig, username: Optional[str] = None) -> None:
···
print("Username\tFeed URL\tStatus")
users = [config.find_user(username)] if username else config.users
users = [u for u in users if u is not None]
+
for user in users:
for feed in user.feeds:
print(f"{user.username}\t{feed}\tActive")
+
def print_feeds_tsv_from_git(
+
git_store: GitStore, username: Optional[str] = None
+
) -> None:
"""Print feeds from git repository in TSV format."""
print("Username\tFeed URL\tStatus")
+
if username:
user = git_store.get_user(username)
users = [user] if user else []
else:
index = git_store._load_index()
users = list(index.users.values())
+
for user in users:
for feed in user.feeds:
print(f"{user.username}\t{feed}\tActive")
···
def print_entries_tsv(entries_by_user: list[list], usernames: list[str]) -> None:
"""Print entries in TSV format."""
print("User\tAtom ID\tTitle\tUpdated\tURL")
+
# Combine all entries with usernames
all_entries = []
for entries, username in zip(entries_by_user, usernames):
for entry in entries:
all_entries.append((username, entry))
+
# Sort by updated time (newest first)
all_entries.sort(key=lambda x: x[1].updated, reverse=True)
+
for username, entry in all_entries:
# Format updated time
updated_str = entry.updated.strftime("%Y-%m-%d %H:%M")
+
# Escape tabs and newlines in title to preserve TSV format
+
title = entry.title.replace("\t", " ").replace("\n", " ").replace("\r", " ")
+
print(f"{username}\t{entry.id}\t{title}\t{updated_str}\t{entry.link}")
+84 -55
src/thicket/core/feed_parser.py
···
"""Initialize the feed parser."""
self.user_agent = user_agent
self.allowed_tags = [
-
"a", "abbr", "acronym", "b", "blockquote", "br", "code", "em",
-
"i", "li", "ol", "p", "pre", "strong", "ul", "h1", "h2", "h3",
-
"h4", "h5", "h6", "img", "div", "span",
]
self.allowed_attributes = {
"a": ["href", "title"],
···
response.raise_for_status()
return response.text
-
def parse_feed(self, content: str, source_url: Optional[HttpUrl] = None) -> tuple[FeedMetadata, list[AtomEntry]]:
"""Parse feed content and return metadata and entries."""
parsed = feedparser.parse(content)
···
author_email = None
author_uri = None
-
if hasattr(feed, 'author_detail'):
-
author_name = feed.author_detail.get('name')
-
author_email = feed.author_detail.get('email')
-
author_uri = feed.author_detail.get('href')
-
elif hasattr(feed, 'author'):
author_name = feed.author
# Parse managing editor for RSS feeds
-
if not author_email and hasattr(feed, 'managingEditor'):
author_email = feed.managingEditor
# Parse feed link
feed_link = None
-
if hasattr(feed, 'link'):
try:
feed_link = HttpUrl(feed.link)
except ValidationError:
···
icon = None
image_url = None
-
if hasattr(feed, 'image'):
try:
-
image_url = HttpUrl(feed.image.get('href', feed.image.get('url', '')))
except (ValidationError, AttributeError):
pass
-
if hasattr(feed, 'icon'):
try:
icon = HttpUrl(feed.icon)
except ValidationError:
pass
-
if hasattr(feed, 'logo'):
try:
logo = HttpUrl(feed.logo)
except ValidationError:
pass
return FeedMetadata(
-
title=getattr(feed, 'title', None),
author_name=author_name,
author_email=author_email,
author_uri=HttpUrl(author_uri) if author_uri else None,
···
logo=logo,
icon=icon,
image_url=image_url,
-
description=getattr(feed, 'description', None),
)
-
def _normalize_entry(self, entry: feedparser.FeedParserDict, source_url: Optional[HttpUrl] = None) -> AtomEntry:
"""Normalize an entry to Atom format."""
# Parse timestamps
-
updated = self._parse_timestamp(entry.get('updated_parsed') or entry.get('published_parsed'))
-
published = self._parse_timestamp(entry.get('published_parsed'))
# Parse content
content = self._extract_content(entry)
···
# Parse categories/tags
categories = []
-
if hasattr(entry, 'tags'):
-
categories = [tag.get('term', '') for tag in entry.tags if tag.get('term')]
# Sanitize HTML content
if content:
content = self._sanitize_html(content)
-
summary = entry.get('summary', '')
if summary:
summary = self._sanitize_html(summary)
return AtomEntry(
-
id=entry.get('id', entry.get('link', '')),
-
title=entry.get('title', ''),
-
link=HttpUrl(entry.get('link', '')),
updated=updated,
published=published,
summary=summary or None,
···
content_type=content_type,
author=author,
categories=categories,
-
rights=entry.get('rights', None),
source=str(source_url) if source_url else None,
)
···
def _extract_content(self, entry: feedparser.FeedParserDict) -> Optional[str]:
"""Extract the best content from an entry."""
# Prefer content over summary
-
if hasattr(entry, 'content') and entry.content:
# Find the best content (prefer text/html, then text/plain)
for content_item in entry.content:
-
if content_item.get('type') in ['text/html', 'html']:
-
return content_item.get('value', '')
-
elif content_item.get('type') in ['text/plain', 'text']:
-
return content_item.get('value', '')
# Fallback to first content item
-
return entry.content[0].get('value', '')
# Fallback to summary
-
return entry.get('summary', '')
def _extract_content_type(self, entry: feedparser.FeedParserDict) -> str:
"""Extract content type from entry."""
-
if hasattr(entry, 'content') and entry.content:
-
content_type = entry.content[0].get('type', 'html')
# Normalize content type
-
if content_type in ['text/html', 'html']:
-
return 'html'
-
elif content_type in ['text/plain', 'text']:
-
return 'text'
-
elif content_type == 'xhtml':
-
return 'xhtml'
-
return 'html'
def _extract_author(self, entry: feedparser.FeedParserDict) -> Optional[dict]:
"""Extract author information from entry."""
author = {}
-
if hasattr(entry, 'author_detail'):
-
author.update({
-
'name': entry.author_detail.get('name'),
-
'email': entry.author_detail.get('email'),
-
'uri': entry.author_detail.get('href'),
-
})
-
elif hasattr(entry, 'author'):
-
author['name'] = entry.author
return author if author else None
···
# Start with the path component
if parsed.path:
# Remove leading slash and replace problematic characters
-
safe_id = parsed.path.lstrip('/').replace('/', '_').replace('\\', '_')
else:
# Use the entire ID as fallback
safe_id = entry_id
···
# Replace problematic characters
safe_chars = []
for char in safe_id:
-
if char.isalnum() or char in '-_.':
safe_chars.append(char)
else:
-
safe_chars.append('_')
-
safe_id = ''.join(safe_chars)
# Ensure it's not too long (max 200 chars)
if len(safe_id) > 200:
···
"""Initialize the feed parser."""
self.user_agent = user_agent
self.allowed_tags = [
+
"a",
+
"abbr",
+
"acronym",
+
"b",
+
"blockquote",
+
"br",
+
"code",
+
"em",
+
"i",
+
"li",
+
"ol",
+
"p",
+
"pre",
+
"strong",
+
"ul",
+
"h1",
+
"h2",
+
"h3",
+
"h4",
+
"h5",
+
"h6",
+
"img",
+
"div",
+
"span",
]
self.allowed_attributes = {
"a": ["href", "title"],
···
response.raise_for_status()
return response.text
+
def parse_feed(
+
self, content: str, source_url: Optional[HttpUrl] = None
+
) -> tuple[FeedMetadata, list[AtomEntry]]:
"""Parse feed content and return metadata and entries."""
parsed = feedparser.parse(content)
···
author_email = None
author_uri = None
+
if hasattr(feed, "author_detail"):
+
author_name = feed.author_detail.get("name")
+
author_email = feed.author_detail.get("email")
+
author_uri = feed.author_detail.get("href")
+
elif hasattr(feed, "author"):
author_name = feed.author
# Parse managing editor for RSS feeds
+
if not author_email and hasattr(feed, "managingEditor"):
author_email = feed.managingEditor
# Parse feed link
feed_link = None
+
if hasattr(feed, "link"):
try:
feed_link = HttpUrl(feed.link)
except ValidationError:
···
icon = None
image_url = None
+
if hasattr(feed, "image"):
try:
+
image_url = HttpUrl(feed.image.get("href", feed.image.get("url", "")))
except (ValidationError, AttributeError):
pass
+
if hasattr(feed, "icon"):
try:
icon = HttpUrl(feed.icon)
except ValidationError:
pass
+
if hasattr(feed, "logo"):
try:
logo = HttpUrl(feed.logo)
except ValidationError:
pass
return FeedMetadata(
+
title=getattr(feed, "title", None),
author_name=author_name,
author_email=author_email,
author_uri=HttpUrl(author_uri) if author_uri else None,
···
logo=logo,
icon=icon,
image_url=image_url,
+
description=getattr(feed, "description", None),
)
+
def _normalize_entry(
+
self, entry: feedparser.FeedParserDict, source_url: Optional[HttpUrl] = None
+
) -> AtomEntry:
"""Normalize an entry to Atom format."""
# Parse timestamps
+
updated = self._parse_timestamp(
+
entry.get("updated_parsed") or entry.get("published_parsed")
+
)
+
published = self._parse_timestamp(entry.get("published_parsed"))
# Parse content
content = self._extract_content(entry)
···
# Parse categories/tags
categories = []
+
if hasattr(entry, "tags"):
+
categories = [tag.get("term", "") for tag in entry.tags if tag.get("term")]
# Sanitize HTML content
if content:
content = self._sanitize_html(content)
+
summary = entry.get("summary", "")
if summary:
summary = self._sanitize_html(summary)
return AtomEntry(
+
id=entry.get("id", entry.get("link", "")),
+
title=entry.get("title", ""),
+
link=HttpUrl(entry.get("link", "")),
updated=updated,
published=published,
summary=summary or None,
···
content_type=content_type,
author=author,
categories=categories,
+
rights=entry.get("rights", None),
source=str(source_url) if source_url else None,
)
···
def _extract_content(self, entry: feedparser.FeedParserDict) -> Optional[str]:
"""Extract the best content from an entry."""
# Prefer content over summary
+
if hasattr(entry, "content") and entry.content:
# Return the first content item that is HTML or plain text
for content_item in entry.content:
+
if content_item.get("type") in ["text/html", "html"]:
+
return content_item.get("value", "")
+
elif content_item.get("type") in ["text/plain", "text"]:
+
return content_item.get("value", "")
# Fallback to first content item
+
return entry.content[0].get("value", "")
# Fallback to summary
+
return entry.get("summary", "")
def _extract_content_type(self, entry: feedparser.FeedParserDict) -> str:
"""Extract content type from entry."""
+
if hasattr(entry, "content") and entry.content:
+
content_type = entry.content[0].get("type", "html")
# Normalize content type
+
if content_type in ["text/html", "html"]:
+
return "html"
+
elif content_type in ["text/plain", "text"]:
+
return "text"
+
elif content_type == "xhtml":
+
return "xhtml"
+
return "html"
def _extract_author(self, entry: feedparser.FeedParserDict) -> Optional[dict]:
"""Extract author information from entry."""
author = {}
+
if hasattr(entry, "author_detail"):
+
author.update(
+
{
+
"name": entry.author_detail.get("name"),
+
"email": entry.author_detail.get("email"),
+
"uri": entry.author_detail.get("href"),
+
}
+
)
+
elif hasattr(entry, "author"):
+
author["name"] = entry.author
return author if author else None
···
# Start with the path component
if parsed.path:
# Remove leading slash and replace problematic characters
+
safe_id = parsed.path.lstrip("/").replace("/", "_").replace("\\", "_")
else:
# Use the entire ID as fallback
safe_id = entry_id
···
# Replace problematic characters
safe_chars = []
for char in safe_id:
+
if char.isalnum() or char in "-_.":
safe_chars.append(char)
else:
+
safe_chars.append("_")
+
safe_id = "".join(safe_chars)
# Ensure it's not too long (max 200 chars)
if len(safe_id) > 200:
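The reformatting above leaves the parser's public surface unchanged. A minimal usage sketch, assuming the signatures shown in this diff (`FeedParser()` with a default user agent, `parse_feed` returning `(FeedMetadata, list[AtomEntry])`, and `sanitize_entry_id` for filesystem-safe names); the sample feed is hypothetical:

```python
from pydantic import HttpUrl

from thicket.core.feed_parser import FeedParser

ATOM = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <entry>
    <id>https://example.org/posts/hello</id>
    <title>Hello</title>
    <link href="https://example.org/posts/hello"/>
    <updated>2024-01-01T00:00:00Z</updated>
    <content type="html">&lt;p&gt;Hi&lt;/p&gt;</content>
  </entry>
</feed>"""

parser = FeedParser()
meta, entries = parser.parse_feed(ATOM, source_url=HttpUrl("https://example.org/feed.xml"))
for entry in entries:
    # sanitize_entry_id reduces the Atom id to a filesystem-safe name,
    # as used by the git store when writing per-entry JSON files
    print(parser.sanitize_entry_id(entry.id), entry.content_type)
```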
+45 -18
src/thicket/core/git_store.py
···
"""Save the index to index.json."""
index_path = self.repo_path / "index.json"
with open(index_path, "w") as f:
-
json.dump(index.model_dump(mode="json", exclude_none=True), f, indent=2, default=str)
def _load_index(self) -> GitStoreIndex:
"""Load the index from index.json."""
···
return DuplicateMap(**data)
-
def add_user(self, username: str, display_name: Optional[str] = None,
-
email: Optional[str] = None, homepage: Optional[str] = None,
-
icon: Optional[str] = None, feeds: Optional[list[str]] = None) -> UserMetadata:
"""Add a new user to the Git store."""
index = self._load_index()
···
created=datetime.now(),
last_updated=datetime.now(),
)
-
# Update index
index.add_user(user_metadata)
···
user.update_timestamp()
-
# Update index
index.add_user(user)
self._save_index(index)
···
# Sanitize entry ID for filename
from .feed_parser import FeedParser
parser = FeedParser()
safe_id = parser.sanitize_entry_id(entry.id)
···
# Save entry
with open(entry_path, "w") as f:
-
json.dump(entry.model_dump(mode="json", exclude_none=True), f, indent=2, default=str)
# Update user metadata if new entry
if not entry_exists:
···
# Sanitize entry ID
from .feed_parser import FeedParser
parser = FeedParser()
safe_id = parser.sanitize_entry_id(entry_id)
···
return AtomEntry(**data)
-
def list_entries(self, username: str, limit: Optional[int] = None) -> list[AtomEntry]:
"""List entries for a user."""
user = self.get_user(username)
if not user:
···
return []
entries = []
-
entry_files = sorted(user_dir.glob("*.json"), key=lambda p: p.stat().st_mtime, reverse=True)
-
if limit:
entry_files = entry_files[:limit]
···
"total_entries": index.total_entries,
"total_duplicates": len(duplicates.duplicates),
"last_updated": index.last_updated,
-
"repository_size": sum(f.stat().st_size for f in self.repo_path.rglob("*") if f.is_file()),
}
-
def search_entries(self, query: str, username: Optional[str] = None,
-
limit: Optional[int] = None) -> list[tuple[str, AtomEntry]]:
"""Search entries by content."""
results = []
···
entry = AtomEntry(**data)
# Simple text search in title, summary, and content
-
searchable_text = " ".join(filter(None, [
-
entry.title,
-
entry.summary or "",
-
entry.content or "",
-
])).lower()
if query.lower() in searchable_text:
results.append((user.username, entry))
···
"""Save the index to index.json."""
index_path = self.repo_path / "index.json"
with open(index_path, "w") as f:
+
json.dump(
+
index.model_dump(mode="json", exclude_none=True),
+
f,
+
indent=2,
+
default=str,
+
)
def _load_index(self) -> GitStoreIndex:
"""Load the index from index.json."""
···
return DuplicateMap(**data)
+
def add_user(
+
self,
+
username: str,
+
display_name: Optional[str] = None,
+
email: Optional[str] = None,
+
homepage: Optional[str] = None,
+
icon: Optional[str] = None,
+
feeds: Optional[list[str]] = None,
+
) -> UserMetadata:
"""Add a new user to the Git store."""
index = self._load_index()
···
created=datetime.now(),
last_updated=datetime.now(),
)
# Update index
index.add_user(user_metadata)
···
user.update_timestamp()
# Update index
index.add_user(user)
self._save_index(index)
···
# Sanitize entry ID for filename
from .feed_parser import FeedParser
+
parser = FeedParser()
safe_id = parser.sanitize_entry_id(entry.id)
···
# Save entry
with open(entry_path, "w") as f:
+
json.dump(
+
entry.model_dump(mode="json", exclude_none=True),
+
f,
+
indent=2,
+
default=str,
+
)
# Update user metadata if new entry
if not entry_exists:
···
# Sanitize entry ID
from .feed_parser import FeedParser
+
parser = FeedParser()
safe_id = parser.sanitize_entry_id(entry_id)
···
return AtomEntry(**data)
+
def list_entries(
+
self, username: str, limit: Optional[int] = None
+
) -> list[AtomEntry]:
"""List entries for a user."""
user = self.get_user(username)
if not user:
···
return []
entries = []
+
entry_files = sorted(
+
user_dir.glob("*.json"), key=lambda p: p.stat().st_mtime, reverse=True
+
)
if limit:
entry_files = entry_files[:limit]
···
"total_entries": index.total_entries,
"total_duplicates": len(duplicates.duplicates),
"last_updated": index.last_updated,
+
"repository_size": sum(
+
f.stat().st_size for f in self.repo_path.rglob("*") if f.is_file()
+
),
}
+
def search_entries(
+
self, query: str, username: Optional[str] = None, limit: Optional[int] = None
+
) -> list[tuple[str, AtomEntry]]:
"""Search entries by content."""
results = []
···
entry = AtomEntry(**data)
# Simple text search in title, summary, and content
+
searchable_text = " ".join(
+
filter(
+
None,
+
[
+
entry.title,
+
entry.summary or "",
+
entry.content or "",
+
],
+
)
+
).lower()
if query.lower() in searchable_text:
results.append((user.username, entry))
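The reformatted query helpers are unchanged in behaviour. A short sketch of the read side; the `GitStore` constructor argument (a repository path) is an assumption, since it is not shown in this diff:

```python
from pathlib import Path

from thicket.core.git_store import GitStore

store = GitStore(Path("feeds-repo"))  # constructor argument is assumed

# list_entries returns newest-first, per the st_mtime sort above
for entry in store.list_entries("alice", limit=5):
    print(entry.title)

# search_entries does a case-insensitive substring match over
# title, summary, and content
for username, entry in store.search_entries("ocaml", limit=10):
    print(username, entry.title)
```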
-438
src/thicket/core/reference_parser.py
···
-
"""Reference detection and parsing for blog entries."""
-
-
import re
-
from typing import Optional
-
from urllib.parse import urlparse
-
-
from ..models import AtomEntry
-
-
-
class BlogReference:
-
"""Represents a reference from one blog entry to another."""
-
-
def __init__(
-
self,
-
source_entry_id: str,
-
source_username: str,
-
target_url: str,
-
target_username: Optional[str] = None,
-
target_entry_id: Optional[str] = None,
-
):
-
self.source_entry_id = source_entry_id
-
self.source_username = source_username
-
self.target_url = target_url
-
self.target_username = target_username
-
self.target_entry_id = target_entry_id
-
-
def to_dict(self) -> dict:
-
"""Convert to dictionary for JSON serialization."""
-
result = {
-
"source_entry_id": self.source_entry_id,
-
"source_username": self.source_username,
-
"target_url": self.target_url,
-
}
-
-
# Only include optional fields if they are not None
-
if self.target_username is not None:
-
result["target_username"] = self.target_username
-
if self.target_entry_id is not None:
-
result["target_entry_id"] = self.target_entry_id
-
-
return result
-
-
@classmethod
-
def from_dict(cls, data: dict) -> "BlogReference":
-
"""Create from dictionary."""
-
return cls(
-
source_entry_id=data["source_entry_id"],
-
source_username=data["source_username"],
-
target_url=data["target_url"],
-
target_username=data.get("target_username"),
-
target_entry_id=data.get("target_entry_id"),
-
)
-
-
-
class ReferenceIndex:
-
"""Index of blog-to-blog references for creating threaded views."""
-
-
def __init__(self):
-
self.references: list[BlogReference] = []
-
self.outbound_refs: dict[
-
str, list[BlogReference]
-
] = {} # entry_id -> outbound refs
-
self.inbound_refs: dict[
-
str, list[BlogReference]
-
] = {} # entry_id -> inbound refs
-
self.user_domains: dict[str, set[str]] = {} # username -> set of domains
-
-
def add_reference(self, ref: BlogReference) -> None:
-
"""Add a reference to the index."""
-
self.references.append(ref)
-
-
# Update outbound references
-
source_key = f"{ref.source_username}:{ref.source_entry_id}"
-
if source_key not in self.outbound_refs:
-
self.outbound_refs[source_key] = []
-
self.outbound_refs[source_key].append(ref)
-
-
# Update inbound references if we can identify the target
-
if ref.target_username and ref.target_entry_id:
-
target_key = f"{ref.target_username}:{ref.target_entry_id}"
-
if target_key not in self.inbound_refs:
-
self.inbound_refs[target_key] = []
-
self.inbound_refs[target_key].append(ref)
-
-
def get_outbound_refs(self, username: str, entry_id: str) -> list[BlogReference]:
-
"""Get all outbound references from an entry."""
-
key = f"{username}:{entry_id}"
-
return self.outbound_refs.get(key, [])
-
-
def get_inbound_refs(self, username: str, entry_id: str) -> list[BlogReference]:
-
"""Get all inbound references to an entry."""
-
key = f"{username}:{entry_id}"
-
return self.inbound_refs.get(key, [])
-
-
def get_thread_members(self, username: str, entry_id: str) -> set[tuple[str, str]]:
-
"""Get all entries that are part of the same thread."""
-
visited = set()
-
to_visit = [(username, entry_id)]
-
thread_members = set()
-
-
while to_visit:
-
current_user, current_entry = to_visit.pop()
-
if (current_user, current_entry) in visited:
-
continue
-
-
visited.add((current_user, current_entry))
-
thread_members.add((current_user, current_entry))
-
-
# Add outbound references
-
for ref in self.get_outbound_refs(current_user, current_entry):
-
if ref.target_username and ref.target_entry_id:
-
to_visit.append((ref.target_username, ref.target_entry_id))
-
-
# Add inbound references
-
for ref in self.get_inbound_refs(current_user, current_entry):
-
to_visit.append((ref.source_username, ref.source_entry_id))
-
-
return thread_members
-
-
def to_dict(self) -> dict:
-
"""Convert to dictionary for JSON serialization."""
-
return {
-
"references": [ref.to_dict() for ref in self.references],
-
"user_domains": {k: list(v) for k, v in self.user_domains.items()},
-
}
-
-
@classmethod
-
def from_dict(cls, data: dict) -> "ReferenceIndex":
-
"""Create from dictionary."""
-
index = cls()
-
for ref_data in data.get("references", []):
-
ref = BlogReference.from_dict(ref_data)
-
index.add_reference(ref)
-
-
for username, domains in data.get("user_domains", {}).items():
-
index.user_domains[username] = set(domains)
-
-
return index
-
-
-
class ReferenceParser:
-
"""Parses blog entries to detect references to other blogs."""
-
-
def __init__(self):
-
# Common blog platforms and patterns
-
self.blog_patterns = [
-
r"https?://[^/]+\.(?:org|com|net|io|dev|me|co\.uk)/.*", # Common blog domains
-
r"https?://[^/]+\.github\.io/.*", # GitHub Pages
-
r"https?://[^/]+\.substack\.com/.*", # Substack
-
r"https?://medium\.com/.*", # Medium
-
r"https?://[^/]+\.wordpress\.com/.*", # WordPress.com
-
r"https?://[^/]+\.blogspot\.com/.*", # Blogger
-
]
-
-
# Compile regex patterns
-
self.link_pattern = re.compile(
-
r'<a[^>]+href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL
-
)
-
self.url_pattern = re.compile(r'https?://[^\s<>"]+')
-
-
def extract_links_from_html(self, html_content: str) -> list[tuple[str, str]]:
-
"""Extract all links from HTML content."""
-
links = []
-
-
# Extract links from <a> tags
-
for match in self.link_pattern.finditer(html_content):
-
url = match.group(1)
-
text = re.sub(
-
r"<[^>]+>", "", match.group(2)
-
).strip() # Remove HTML tags from link text
-
links.append((url, text))
-
-
return links
-
-
def is_blog_url(self, url: str) -> bool:
-
"""Check if a URL likely points to a blog post."""
-
for pattern in self.blog_patterns:
-
if re.match(pattern, url):
-
return True
-
return False
-
-
def _is_likely_blog_post_url(self, url: str) -> bool:
-
"""Check if a same-domain URL likely points to a blog post (not CSS, images, etc.)."""
-
parsed_url = urlparse(url)
-
path = parsed_url.path.lower()
-
-
# Skip obvious non-blog content
-
if any(path.endswith(ext) for ext in ['.css', '.js', '.png', '.jpg', '.jpeg', '.gif', '.svg', '.ico', '.pdf', '.xml', '.json']):
-
return False
-
-
# Skip common non-blog paths
-
if any(segment in path for segment in ['/static/', '/assets/', '/css/', '/js/', '/images/', '/img/', '/media/', '/uploads/']):
-
return False
-
-
# Skip fragment-only links (same page anchors)
-
if not path or path == '/':
-
return False
-
-
# Look for positive indicators of blog posts
-
# Common blog post patterns: dates, slugs, post indicators
-
blog_indicators = [
-
r'/\d{4}/', # Year in path
-
r'/\d{4}/\d{2}/', # Year/month in path
-
r'/blog/',
-
r'/post/',
-
r'/posts/',
-
r'/articles?/',
-
r'/notes?/',
-
r'/entries/',
-
r'/writing/',
-
]
-
-
for pattern in blog_indicators:
-
if re.search(pattern, path):
-
return True
-
-
# If it has a reasonable path depth and doesn't match exclusions, likely a blog post
-
path_segments = [seg for seg in path.split('/') if seg]
-
return len(path_segments) >= 1 # At least one meaningful path segment
-
-
def resolve_target_user(
-
self, url: str, user_domains: dict[str, set[str]]
-
) -> Optional[str]:
-
"""Try to resolve a URL to a known user based on domain mapping."""
-
parsed_url = urlparse(url)
-
domain = parsed_url.netloc.lower()
-
-
for username, domains in user_domains.items():
-
if domain in domains:
-
return username
-
-
return None
-
-
def extract_references(
-
self, entry: AtomEntry, username: str, user_domains: dict[str, set[str]]
-
) -> list[BlogReference]:
-
"""Extract all blog references from an entry."""
-
references = []
-
-
# Combine all text content for analysis
-
content_to_search = []
-
if entry.content:
-
content_to_search.append(entry.content)
-
if entry.summary:
-
content_to_search.append(entry.summary)
-
-
for content in content_to_search:
-
links = self.extract_links_from_html(content)
-
-
for url, _link_text in links:
-
entry_domain = (
-
urlparse(str(entry.link)).netloc.lower() if entry.link else ""
-
)
-
link_domain = urlparse(url).netloc.lower()
-
-
# Check if this looks like a blog URL
-
if not self.is_blog_url(url):
-
continue
-
-
# For same-domain links, apply additional filtering to avoid non-blog content
-
if link_domain == entry_domain:
-
# Only include same-domain links that look like blog posts
-
if not self._is_likely_blog_post_url(url):
-
continue
-
-
# Try to resolve to a known user
-
if link_domain == entry_domain:
-
# Same domain - target user is the same as source user
-
target_username: Optional[str] = username
-
else:
-
# Different domain - try to resolve
-
target_username = self.resolve_target_user(url, user_domains)
-
-
ref = BlogReference(
-
source_entry_id=entry.id,
-
source_username=username,
-
target_url=url,
-
target_username=target_username,
-
target_entry_id=None, # Will be resolved later if possible
-
)
-
-
references.append(ref)
-
-
return references
-
-
def build_user_domain_mapping(self, git_store: "GitStore") -> dict[str, set[str]]:
-
"""Build mapping of usernames to their known domains."""
-
user_domains = {}
-
index = git_store._load_index()
-
-
for username, user_metadata in index.users.items():
-
domains = set()
-
-
# Add domains from feeds
-
for feed_url in user_metadata.feeds:
-
domain = urlparse(feed_url).netloc.lower()
-
if domain:
-
domains.add(domain)
-
-
# Add domain from homepage
-
if user_metadata.homepage:
-
domain = urlparse(str(user_metadata.homepage)).netloc.lower()
-
if domain:
-
domains.add(domain)
-
-
user_domains[username] = domains
-
-
return user_domains
-
-
def _build_url_to_entry_mapping(self, git_store: "GitStore") -> dict[str, str]:
-
"""Build a comprehensive mapping from URLs to entry IDs using git store data.
-
-
This creates a many-to-one mapping from URL variants to entry IDs that handles:
-
- Entry link URLs -> Entry IDs
-
- URL variations (with/without www, http/https)
-
- Multiple URLs pointing to the same entry
-
"""
-
url_to_entry: dict[str, str] = {}
-
-
# Load index to get all users
-
index = git_store._load_index()
-
-
for username in index.users.keys():
-
entries = git_store.list_entries(username)
-
-
for entry in entries:
-
if entry.link:
-
link_url = str(entry.link)
-
entry_id = entry.id
-
-
# Map the canonical link URL
-
url_to_entry[link_url] = entry_id
-
-
# Handle common URL variations
-
parsed = urlparse(link_url)
-
if parsed.netloc and parsed.path:
-
# Add version without www
-
if parsed.netloc.startswith('www.'):
-
no_www_url = f"{parsed.scheme}://{parsed.netloc[4:]}{parsed.path}"
-
if parsed.query:
-
no_www_url += f"?{parsed.query}"
-
if parsed.fragment:
-
no_www_url += f"#{parsed.fragment}"
-
url_to_entry[no_www_url] = entry_id
-
-
# Add version with www if not present
-
elif not parsed.netloc.startswith('www.'):
-
www_url = f"{parsed.scheme}://www.{parsed.netloc}{parsed.path}"
-
if parsed.query:
-
www_url += f"?{parsed.query}"
-
if parsed.fragment:
-
www_url += f"#{parsed.fragment}"
-
url_to_entry[www_url] = entry_id
-
-
# Add http/https variations
-
if parsed.scheme == 'https':
-
http_url = link_url.replace('https://', 'http://', 1)
-
url_to_entry[http_url] = entry_id
-
elif parsed.scheme == 'http':
-
https_url = link_url.replace('http://', 'https://', 1)
-
url_to_entry[https_url] = entry_id
-
-
return url_to_entry
-
-
def _normalize_url(self, url: str) -> str:
-
"""Normalize URL for consistent matching.
-
-
Handles common variations like trailing slashes, fragments, etc.
-
"""
-
parsed = urlparse(url)
-
-
# Remove trailing slash from path
-
path = parsed.path.rstrip('/') if parsed.path != '/' else parsed.path
-
-
# Reconstruct without fragment for consistent matching
-
normalized = f"{parsed.scheme}://{parsed.netloc}{path}"
-
if parsed.query:
-
normalized += f"?{parsed.query}"
-
-
return normalized
-
-
def resolve_target_entry_ids(
-
self, references: list[BlogReference], git_store: "GitStore"
-
) -> list[BlogReference]:
-
"""Resolve target_entry_id for references using comprehensive URL mapping."""
-
resolved_refs = []
-
-
# Build comprehensive URL to entry ID mapping
-
url_to_entry = self._build_url_to_entry_mapping(git_store)
-
-
for ref in references:
-
# If we already have a target_entry_id, keep the reference as-is
-
if ref.target_entry_id is not None:
-
resolved_refs.append(ref)
-
continue
-
-
# If we don't have a target_username, we can't resolve it
-
if ref.target_username is None:
-
resolved_refs.append(ref)
-
continue
-
-
# Try to resolve using URL mapping
-
resolved_entry_id = None
-
-
# First, try exact match
-
if ref.target_url in url_to_entry:
-
resolved_entry_id = url_to_entry[ref.target_url]
-
else:
-
# Try normalized URL matching
-
normalized_target = self._normalize_url(ref.target_url)
-
if normalized_target in url_to_entry:
-
resolved_entry_id = url_to_entry[normalized_target]
-
else:
-
# Try URL variations
-
for mapped_url, entry_id in url_to_entry.items():
-
if self._normalize_url(mapped_url) == normalized_target:
-
resolved_entry_id = entry_id
-
break
-
-
# Verify the resolved entry belongs to the target username
-
if resolved_entry_id:
-
# Double-check by loading the actual entry
-
entries = git_store.list_entries(ref.target_username)
-
entry_found = any(entry.id == resolved_entry_id for entry in entries)
-
if not entry_found:
-
resolved_entry_id = None
-
-
# Create a new reference with the resolved target_entry_id
-
resolved_ref = BlogReference(
-
source_entry_id=ref.source_entry_id,
-
source_username=ref.source_username,
-
target_url=ref.target_url,
-
target_username=ref.target_username,
-
target_entry_id=resolved_entry_id,
-
)
-
resolved_refs.append(resolved_ref)
-
-
return resolved_refs
···
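For reference, the deleted `ReferenceIndex.get_thread_members` computed thread membership by walking references in both directions; the same traversal, reduced to a self-contained sketch over a toy graph keyed by `"username:entry_id"` strings as in the removed code:

```python
from collections import deque

outbound = {"alice:1": ["bob:7"]}   # entry -> entries it links to
inbound = {"alice:1": ["carol:3"]}  # entry -> entries linking to it

def thread_members(start: str) -> set[str]:
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        # follow references in both directions, as the removed
        # get_thread_members did
        queue.extend(outbound.get(node, []))
        queue.extend(inbound.get(node, []))
    return seen

assert thread_members("alice:1") == {"alice:1", "bob:7", "carol:3"}
```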
+25 -29
src/thicket/models/config.py
···
"""Configuration models for thicket."""
-
import json
-
import yaml
from pathlib import Path
-
from typing import Optional, Union
-
from pydantic import BaseModel, EmailStr, HttpUrl, ValidationError
from pydantic_settings import BaseSettings, SettingsConfigDict
···
cache_dir: Path
users: list[UserConfig] = []
-
@classmethod
-
def from_file(cls, config_path: Path) -> 'ThicketConfig':
-
"""Load configuration from a file."""
-
if not config_path.exists():
-
raise FileNotFoundError(f"Configuration file not found: {config_path}")
-
-
content = config_path.read_text(encoding='utf-8')
-
-
if config_path.suffix.lower() in ['.yaml', '.yml']:
-
try:
-
data = yaml.safe_load(content)
-
except yaml.YAMLError as e:
-
raise ValueError(f"Invalid YAML in {config_path}: {e}")
-
elif config_path.suffix.lower() == '.json':
-
try:
-
data = json.loads(content)
-
except json.JSONDecodeError as e:
-
raise ValueError(f"Invalid JSON in {config_path}: {e}")
-
else:
-
raise ValueError(f"Unsupported configuration file format: {config_path.suffix}")
-
-
try:
-
return cls(**data)
-
except ValidationError as e:
-
raise ValueError(f"Configuration validation error: {e}")
···
"""Configuration models for thicket."""
from pathlib import Path
+
from typing import Optional
+
from pydantic import BaseModel, EmailStr, HttpUrl
from pydantic_settings import BaseSettings, SettingsConfigDict
···
cache_dir: Path
users: list[UserConfig] = []
+
def find_user(self, username: str) -> Optional[UserConfig]:
+
"""Find a user by username."""
+
for user in self.users:
+
if user.username == username:
+
return user
+
return None
+
+
def add_user(self, user: UserConfig) -> bool:
+
"""Add a user to the configuration. Returns True if added, False if already exists."""
+
if self.find_user(user.username) is not None:
+
return False
+
self.users.append(user)
+
return True
+
+
def add_feed_to_user(self, username: str, feed_url: HttpUrl) -> bool:
+
"""Add a feed to an existing user. Returns True if added, False if user not found or feed already exists."""
+
user = self.find_user(username)
+
if user is None:
+
return False
+
if feed_url in user.feeds:
+
return False
+
user.feeds.append(feed_url)
+
return True
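A small sketch of the three new helpers, assuming `ThicketConfig` can be constructed directly with its `git_store` and `cache_dir` paths and that `UserConfig` requires only `username` and `feeds` (neither model's full field list is visible in this diff):

```python
from pathlib import Path

from pydantic import HttpUrl

from thicket.models.config import ThicketConfig, UserConfig

config = ThicketConfig(git_store=Path("repo"), cache_dir=Path("cache"))

alice = UserConfig(username="alice", feeds=[HttpUrl("https://alice.example/atom.xml")])
assert config.add_user(alice)      # True: new username
assert not config.add_user(alice)  # False: duplicate usernames are rejected

assert config.add_feed_to_user("alice", HttpUrl("https://alice.example/rss.xml"))
assert not config.add_feed_to_user("bob", HttpUrl("https://bob.example/atom.xml"))  # unknown user
```

Returning `False` rather than raising keeps callers' control flow simple, at the cost of them having to check the result.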
+2 -2
src/thicket/models/feed.py
···
"""Feed and entry models for thicket."""
from datetime import datetime
-
from typing import TYPE_CHECKING, Optional
from pydantic import BaseModel, ConfigDict, EmailStr, HttpUrl
···
summary: Optional[str] = None
content: Optional[str] = None # Full body content from Atom entry
content_type: Optional[str] = "html" # text, html, xhtml
-
author: Optional[dict] = None
categories: list[str] = []
rights: Optional[str] = None # Copyright info
source: Optional[str] = None # Source feed URL
···
"""Feed and entry models for thicket."""
from datetime import datetime
+
from typing import TYPE_CHECKING, Any, Optional
from pydantic import BaseModel, ConfigDict, EmailStr, HttpUrl
···
summary: Optional[str] = None
content: Optional[str] = None # Full body content from Atom entry
content_type: Optional[str] = "html" # text, html, xhtml
+
author: Optional[dict[str, Any]] = None
categories: list[str] = []
rights: Optional[str] = None # Copyright info
source: Optional[str] = None # Source feed URL
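The tightened `dict[str, Any]` annotation matches the `name`/`email`/`uri` dict produced by `_extract_author` above. A sketch, assuming `id`, `title`, `link`, and `updated` are the entry's required fields (per the normalization code earlier in this diff):

```python
from datetime import datetime

from pydantic import HttpUrl

from thicket.models.feed import AtomEntry

entry = AtomEntry(
    id="https://example.org/posts/hello",
    title="Hello",
    link=HttpUrl("https://example.org/posts/hello"),
    updated=datetime(2024, 1, 1),
    published=None,
    author={"name": "Alice", "email": None, "uri": "https://alice.example"},
)
print(entry.author["name"])  # values may be None, hence Any
```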
+1 -3
src/thicket/models/user.py
···
class GitStoreIndex(BaseModel):
"""Index of all users and their directories in the Git store."""
-
model_config = ConfigDict(
-
json_encoders={datetime: lambda v: v.isoformat()}
-
)
users: dict[str, UserMetadata] = {} # username -> UserMetadata
created: datetime
···
class GitStoreIndex(BaseModel):
"""Index of all users and their directories in the Git store."""
+
model_config = ConfigDict(json_encoders={datetime: lambda v: v.isoformat()})
users: dict[str, UserMetadata] = {} # username -> UserMetadata
created: datetime
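Worth noting while this line is being touched: `json_encoders` is deprecated in pydantic v2, and datetimes already serialize to ISO 8601 in JSON mode, so the encoder is effectively redundant. If explicit control were wanted, a field serializer would be the supported route; a sketch, not part of this diff:

```python
from datetime import datetime

from pydantic import BaseModel, field_serializer

class IndexSketch(BaseModel):
    created: datetime

    @field_serializer("created")
    def _serialize_created(self, value: datetime) -> str:
        return value.isoformat()

print(IndexSketch(created=datetime(2024, 1, 1)).model_dump_json())
# {"created":"2024-01-01T00:00:00"}
```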
-1
src/thicket/subsystems/__init__.py
···
-
"""Thicket subsystems for specialized operations."""
···
-227
src/thicket/subsystems/feeds.py
···
-
"""Feed management subsystem."""
-
-
import asyncio
-
import json
-
from datetime import datetime
-
from pathlib import Path
-
from typing import Callable, Optional
-
-
from pydantic import HttpUrl
-
-
from ..core.feed_parser import FeedParser
-
from ..core.git_store import GitStore
-
from ..models import AtomEntry, ThicketConfig
-
-
-
class FeedManager:
-
"""Manages feed operations and caching."""
-
-
def __init__(self, git_store: GitStore, feed_parser: FeedParser, config: ThicketConfig):
-
"""Initialize feed manager."""
-
self.git_store = git_store
-
self.feed_parser = feed_parser
-
self.config = config
-
self._ensure_cache_dir()
-
-
def _ensure_cache_dir(self):
-
"""Ensure cache directory exists."""
-
self.config.cache_dir.mkdir(parents=True, exist_ok=True)
-
-
async def sync_feeds(self, username: Optional[str] = None, progress_callback: Optional[Callable] = None) -> dict:
-
"""Sync feeds for all users or specific user."""
-
if username:
-
return await self.sync_user_feeds(username, progress_callback)
-
-
# Sync all users
-
results = {}
-
total_users = len(self.config.users)
-
-
for i, user_config in enumerate(self.config.users):
-
if progress_callback:
-
progress_callback(f"Syncing feeds for {user_config.username}", i, total_users)
-
-
user_results = await self.sync_user_feeds(user_config.username, progress_callback)
-
results[user_config.username] = user_results
-
-
return results
-
-
async def sync_user_feeds(self, username: str, progress_callback: Optional[Callable] = None) -> dict:
-
"""Sync feeds for a specific user."""
-
user_config = next((u for u in self.config.users if u.username == username), None)
-
if not user_config:
-
return {'error': f'User {username} not found in configuration'}
-
-
# Ensure user exists in git store
-
git_user = self.git_store.get_user(username)
-
if not git_user:
-
self.git_store.add_user(
-
username=user_config.username,
-
display_name=user_config.display_name,
-
email=str(user_config.email) if user_config.email else None,
-
homepage=str(user_config.homepage) if user_config.homepage else None,
-
icon=str(user_config.icon) if user_config.icon else None,
-
feeds=[str(feed) for feed in user_config.feeds]
-
)
-
-
results = {
-
'username': username,
-
'feeds_processed': 0,
-
'new_entries': 0,
-
'errors': [],
-
'feeds': {}
-
}
-
-
total_feeds = len(user_config.feeds)
-
-
for i, feed_url in enumerate(user_config.feeds):
-
if progress_callback:
-
progress_callback(f"Processing feed {i+1}/{total_feeds} for {username}", i, total_feeds)
-
-
try:
-
feed_result = await self._sync_single_feed(username, feed_url)
-
results['feeds'][str(feed_url)] = feed_result
-
results['feeds_processed'] += 1
-
results['new_entries'] += feed_result.get('new_entries', 0)
-
except Exception as e:
-
error_msg = f"Error syncing {feed_url}: {str(e)}"
-
results['errors'].append(error_msg)
-
results['feeds'][str(feed_url)] = {'error': error_msg}
-
-
return results
-
-
async def _sync_single_feed(self, username: str, feed_url: HttpUrl) -> dict:
-
"""Sync a single feed for a user."""
-
cache_key = self._get_cache_key(username, feed_url)
-
last_modified = self._get_last_modified(cache_key)
-
-
try:
-
# Fetch feed content
-
content = await self.feed_parser.fetch_feed(feed_url)
-
-
# Parse feed
-
feed_meta, entries = self.feed_parser.parse_feed(content, feed_url)
-
-
# Filter new entries
-
new_entries = []
-
for entry in entries:
-
existing_entry = self.git_store.get_entry(username, entry.id)
-
if not existing_entry:
-
new_entries.append(entry)
-
-
# Store new entries
-
stored_count = 0
-
for entry in new_entries:
-
if self.git_store.store_entry(username, entry):
-
stored_count += 1
-
-
# Update cache
-
self._update_cache(cache_key, {
-
'last_fetched': datetime.now().isoformat(),
-
'feed_meta': feed_meta.model_dump(exclude_none=True),
-
'entry_count': len(entries),
-
'new_entries': stored_count,
-
'feed_url': str(feed_url)
-
})
-
-
return {
-
'success': True,
-
'total_entries': len(entries),
-
'new_entries': stored_count,
-
'feed_title': feed_meta.title,
-
'last_fetched': datetime.now().isoformat()
-
}
-
-
except Exception as e:
-
return {
-
'success': False,
-
'error': str(e),
-
'feed_url': str(feed_url)
-
}
-
-
def get_entries(self, username: str, limit: Optional[int] = None) -> list[AtomEntry]:
-
"""Get entries for a user."""
-
return self.git_store.list_entries(username, limit)
-
-
def get_entry(self, username: str, entry_id: str) -> Optional[AtomEntry]:
-
"""Get a specific entry."""
-
return self.git_store.get_entry(username, entry_id)
-
-
def search_entries(self, query: str, username: Optional[str] = None, limit: Optional[int] = None) -> list[tuple[str, AtomEntry]]:
-
"""Search entries across users."""
-
return self.git_store.search_entries(query, username, limit)
-
-
def get_stats(self) -> dict:
-
"""Get feed-related statistics."""
-
index = self.git_store._load_index()
-
-
feed_stats = {
-
'total_feeds_configured': sum(len(user.feeds) for user in self.config.users),
-
'users_with_entries': len([u for u in index.users.values() if u.entry_count > 0]),
-
'cache_files': len(list(self.config.cache_dir.glob("*.json"))) if self.config.cache_dir.exists() else 0,
-
}
-
-
return feed_stats
-
-
def _get_cache_key(self, username: str, feed_url: HttpUrl) -> str:
-
"""Generate cache key for feed."""
-
# Simple hash of username and feed URL
-
import hashlib
-
key_data = f"{username}:{str(feed_url)}"
-
return hashlib.md5(key_data.encode()).hexdigest()
-
-
def _get_last_modified(self, cache_key: str) -> Optional[datetime]:
-
"""Get last modified time from cache."""
-
cache_file = self.config.cache_dir / f"{cache_key}.json"
-
if cache_file.exists():
-
try:
-
with open(cache_file) as f:
-
data = json.load(f)
-
return datetime.fromisoformat(data.get('last_fetched', ''))
-
except Exception:
-
pass
-
return None
-
-
def _update_cache(self, cache_key: str, data: dict):
-
"""Update cache with feed data."""
-
cache_file = self.config.cache_dir / f"{cache_key}.json"
-
try:
-
with open(cache_file, 'w') as f:
-
json.dump(data, f, indent=2)
-
except Exception:
-
# Cache update failure shouldn't break the sync
-
pass
-
-
def clear_cache(self, username: Optional[str] = None) -> bool:
-
"""Clear feed cache."""
-
try:
-
if username:
-
# Clear cache for specific user
-
for user_config in self.config.users:
-
if user_config.username == username:
-
for feed_url in user_config.feeds:
-
cache_key = self._get_cache_key(username, feed_url)
-
cache_file = self.config.cache_dir / f"{cache_key}.json"
-
if cache_file.exists():
-
cache_file.unlink()
-
else:
-
# Clear all cache
-
if self.config.cache_dir.exists():
-
for cache_file in self.config.cache_dir.glob("*.json"):
-
cache_file.unlink()
-
return True
-
except Exception:
-
return False
-
-
def get_feed_info(self, username: str, feed_url: str) -> Optional[dict]:
-
"""Get cached information about a specific feed."""
-
try:
-
feed_url_obj = HttpUrl(feed_url)
-
cache_key = self._get_cache_key(username, feed_url_obj)
-
cache_file = self.config.cache_dir / f"{cache_key}.json"
-
-
if cache_file.exists():
-
with open(cache_file) as f:
-
return json.load(f)
-
except Exception:
-
pass
-
return None
···
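The removed `FeedManager` keyed its per-feed cache files on an MD5 of `"username:feed_url"`; reproduced standalone for anyone migrating cached data:

```python
import hashlib

def cache_key(username: str, feed_url: str) -> str:
    # matches the removed _get_cache_key: md5 over "username:feed_url"
    return hashlib.md5(f"{username}:{feed_url}".encode()).hexdigest()

print(cache_key("alice", "https://alice.example/atom.xml") + ".json")
```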
-304
src/thicket/subsystems/links.py
···
-
"""Link processing subsystem."""
-
-
import json
-
import re
-
from collections import defaultdict
-
from pathlib import Path
-
from typing import Optional
-
from urllib.parse import urljoin, urlparse
-
-
from ..core.git_store import GitStore
-
from ..models import AtomEntry, ThicketConfig
-
-
-
class LinkProcessor:
-
"""Processes and manages links between entries."""
-
-
def __init__(self, git_store: GitStore, config: ThicketConfig):
-
"""Initialize link processor."""
-
self.git_store = git_store
-
self.config = config
-
self.links_file = self.git_store.repo_path / "links.json"
-
-
def process_links(self, username: Optional[str] = None) -> dict:
-
"""Process and extract links from entries."""
-
if username:
-
return self._process_user_links(username)
-
-
# Process all users
-
results = {}
-
index = self.git_store._load_index()
-
-
for user_metadata in index.users.values():
-
user_results = self._process_user_links(user_metadata.username)
-
results[user_metadata.username] = user_results
-
-
# Consolidate all links
-
self._consolidate_links()
-
-
return results
-
-
def _process_user_links(self, username: str) -> dict:
-
"""Process links for a specific user."""
-
entries = self.git_store.list_entries(username)
-
-
results = {
-
'username': username,
-
'entries_processed': 0,
-
'links_found': 0,
-
'external_links': 0,
-
'internal_links': 0,
-
}
-
-
links_data = self._load_links_data()
-
-
for entry in entries:
-
entry_links = self._extract_links_from_entry(entry)
-
-
if entry_links:
-
# Store links for this entry
-
entry_key = f"{username}:{entry.id}"
-
links_data[entry_key] = {
-
'entry_id': entry.id,
-
'username': username,
-
'title': entry.title,
-
'links': entry_links,
-
'processed_at': entry.updated.isoformat() if entry.updated else None,
-
}
-
-
results['links_found'] += len(entry_links)
-
results['external_links'] += len([l for l in entry_links if self._is_external_link(l['url'])])
-
results['internal_links'] += len([l for l in entry_links if not self._is_external_link(l['url'])])
-
-
results['entries_processed'] += 1
-
-
self._save_links_data(links_data)
-
-
return results
-
-
def _extract_links_from_entry(self, entry: AtomEntry) -> list[dict]:
-
"""Extract links from an entry's content."""
-
links = []
-
-
# Combine content and summary for link extraction
-
text_content = ""
-
if entry.content:
-
text_content += entry.content
-
if entry.summary:
-
text_content += " " + entry.summary
-
-
if not text_content:
-
return links
-
-
# Extract HTML links
-
html_link_pattern = r'<a[^>]+href=["\']([^"\']+)["\'][^>]*>([^<]*)</a>'
-
html_matches = re.findall(html_link_pattern, text_content, re.IGNORECASE)
-
-
for url, text in html_matches:
-
# Clean up the URL
-
url = url.strip()
-
text = text.strip()
-
-
if url and url not in ['#', 'javascript:void(0)']:
-
# Resolve relative URLs if possible
-
if entry.link and url.startswith('/'):
-
base_url = str(entry.link)
-
parsed_base = urlparse(base_url)
-
base_domain = f"{parsed_base.scheme}://{parsed_base.netloc}"
-
url = urljoin(base_domain, url)
-
-
links.append({
-
'url': url,
-
'text': text or url,
-
'type': 'html'
-
})
-
-
# Extract markdown links
-
markdown_link_pattern = r'\[([^\]]*)\]\(([^\)]+)\)'
-
markdown_matches = re.findall(markdown_link_pattern, text_content)
-
-
for text, url in markdown_matches:
-
url = url.strip()
-
text = text.strip()
-
-
if url and url not in ['#']:
-
links.append({
-
'url': url,
-
'text': text or url,
-
'type': 'markdown'
-
})
-
-
# Extract plain URLs
-
url_pattern = r'https?://[^\s<>"]+[^\s<>".,;!?]'
-
url_matches = re.findall(url_pattern, text_content)
-
-
for url in url_matches:
-
# Skip if already found as HTML or markdown link
-
if not any(link['url'] == url for link in links):
-
links.append({
-
'url': url,
-
'text': url,
-
'type': 'plain'
-
})
-
-
return links
-
-
def _is_external_link(self, url: str) -> bool:
-
"""Check if a link is external to the configured domains."""
-
try:
-
parsed = urlparse(url)
-
domain = parsed.netloc.lower()
-
-
# Check against user domains from feeds
-
for user_config in self.config.users:
-
for feed_url in user_config.feeds:
-
feed_domain = urlparse(str(feed_url)).netloc.lower()
-
if domain == feed_domain or domain.endswith(f'.{feed_domain}'):
-
return False
-
-
# Check homepage domain
-
if user_config.homepage:
-
homepage_domain = urlparse(str(user_config.homepage)).netloc.lower()
-
if domain == homepage_domain or domain.endswith(f'.{homepage_domain}'):
-
return False
-
-
return True
-
except Exception:
-
return True
-
-
def _load_links_data(self) -> dict:
-
"""Load existing links data."""
-
if self.links_file.exists():
-
try:
-
with open(self.links_file) as f:
-
return json.load(f)
-
except Exception:
-
pass
-
return {}
-
-
def _save_links_data(self, links_data: dict):
-
"""Save links data to file."""
-
try:
-
with open(self.links_file, 'w') as f:
-
json.dump(links_data, f, indent=2, ensure_ascii=False)
-
except Exception:
-
# Link processing failure shouldn't break the main operation
-
pass
-
-
def _consolidate_links(self):
-
"""Consolidate and create reverse link mappings."""
-
links_data = self._load_links_data()
-
-
# Create URL to entries mapping
-
url_mapping = defaultdict(list)
-
-
for entry_key, entry_data in links_data.items():
-
for link in entry_data.get('links', []):
-
url_mapping[link['url']].append({
-
'entry_key': entry_key,
-
'username': entry_data['username'],
-
'entry_id': entry_data['entry_id'],
-
'title': entry_data['title'],
-
'link_text': link['text'],
-
'link_type': link['type'],
-
})
-
-
# Save URL mapping
-
url_mapping_file = self.git_store.repo_path / "url_mapping.json"
-
try:
-
with open(url_mapping_file, 'w') as f:
-
json.dump(dict(url_mapping), f, indent=2, ensure_ascii=False)
-
except Exception:
-
pass
-
-
def get_links(self, username: Optional[str] = None) -> dict:
-
"""Get processed links."""
-
links_data = self._load_links_data()
-
-
if username:
-
user_links = {k: v for k, v in links_data.items() if v.get('username') == username}
-
return user_links
-
-
return links_data
-
-
def find_references(self, url: str) -> list[tuple[str, AtomEntry]]:
-
"""Find entries that reference a URL."""
-
url_mapping_file = self.git_store.repo_path / "url_mapping.json"
-
-
if not url_mapping_file.exists():
-
return []
-
-
try:
-
with open(url_mapping_file) as f:
-
url_mapping = json.load(f)
-
-
references = url_mapping.get(url, [])
-
results = []
-
-
for ref in references:
-
entry = self.git_store.get_entry(ref['username'], ref['entry_id'])
-
if entry:
-
results.append((ref['username'], entry))
-
-
return results
-
except Exception:
-
return []
-
-
def get_stats(self) -> dict:
-
"""Get link processing statistics."""
-
links_data = self._load_links_data()
-
-
total_entries_with_links = len(links_data)
-
total_links = sum(len(entry_data.get('links', [])) for entry_data in links_data.values())
-
-
external_links = 0
-
internal_links = 0
-
-
for entry_data in links_data.values():
-
for link in entry_data.get('links', []):
-
if self._is_external_link(link['url']):
-
external_links += 1
-
else:
-
internal_links += 1
-
-
# Count unique URLs
-
unique_urls = set()
-
for entry_data in links_data.values():
-
for link in entry_data.get('links', []):
-
unique_urls.add(link['url'])
-
-
return {
-
'entries_with_links': total_entries_with_links,
-
'total_links': total_links,
-
'unique_urls': len(unique_urls),
-
'external_links': external_links,
-
'internal_links': internal_links,
-
}
-
-
def get_most_referenced_urls(self, limit: int = 10) -> list[dict]:
-
"""Get most frequently referenced URLs."""
-
url_mapping_file = self.git_store.repo_path / "url_mapping.json"
-
-
if not url_mapping_file.exists():
-
return []
-
-
try:
-
with open(url_mapping_file) as f:
-
url_mapping = json.load(f)
-
-
# Count references per URL
-
url_counts = [(url, len(refs)) for url, refs in url_mapping.items()]
-
url_counts.sort(key=lambda x: x[1], reverse=True)
-
-
results = []
-
for url, count in url_counts[:limit]:
-
results.append({
-
'url': url,
-
'reference_count': count,
-
'is_external': self._is_external_link(url),
-
'references': url_mapping[url]
-
})
-
-
return results
-
except Exception:
-
return []
···
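The removed `LinkProcessor` pulled links out of mixed HTML/markdown text with three regexes; condensed into a self-contained sketch. Note the trailing `)` the plain-URL pattern keeps, which the per-URL de-duplication above did not catch:

```python
import re

text = '<a href="https://a.example/x">x</a> [y](https://b.example/y) see https://c.example/z.'

html_links = re.findall(r'<a[^>]+href=["\']([^"\']+)["\'][^>]*>([^<]*)</a>', text, re.IGNORECASE)
md_links = re.findall(r'\[([^\]]*)\]\(([^\)]+)\)', text)
plain_urls = re.findall(r'https?://[^\s<>"]+[^\s<>".,;!?]', text)

print(html_links)  # [('https://a.example/x', 'x')]
print(md_links)    # [('y', 'https://b.example/y')]
print(plain_urls)  # ['https://a.example/x', 'https://b.example/y)', 'https://c.example/z']
```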
-158
src/thicket/subsystems/repository.py
···
-
"""Repository management subsystem."""
-
-
import shutil
-
from datetime import datetime
-
from pathlib import Path
-
from typing import Optional
-
-
from ..core.git_store import GitStore
-
from ..models import ThicketConfig
-
-
-
class RepositoryManager:
-
"""Manages repository operations and metadata."""
-
-
def __init__(self, git_store: GitStore, config: ThicketConfig):
-
"""Initialize repository manager."""
-
self.git_store = git_store
-
self.config = config
-
-
def init_repository(self) -> bool:
-
"""Initialize the git repository if not already done."""
-
try:
-
# GitStore.__init__ already handles repository initialization
-
return True
-
except Exception:
-
return False
-
-
def commit_changes(self, message: str) -> bool:
-
"""Commit all pending changes."""
-
try:
-
self.git_store.commit_changes(message)
-
return True
-
except Exception:
-
return False
-
-
def get_status(self) -> dict:
-
"""Get repository status and statistics."""
-
try:
-
stats = self.git_store.get_stats()
-
-
# Add repository-specific information
-
repo_status = {
-
**stats,
-
'repository_path': str(self.config.git_store),
-
'cache_path': str(self.config.cache_dir),
-
'has_uncommitted_changes': self._has_uncommitted_changes(),
-
'last_commit': self._get_last_commit_info(),
-
}
-
-
return repo_status
-
except Exception as e:
-
return {'error': str(e)}
-
-
def backup_repository(self, backup_path: Path) -> bool:
-
"""Create a backup of the repository."""
-
try:
-
if backup_path.exists():
-
shutil.rmtree(backup_path)
-
-
shutil.copytree(self.config.git_store, backup_path)
-
return True
-
except Exception:
-
return False
-
-
def cleanup_cache(self) -> bool:
-
"""Clean up cache directory."""
-
try:
-
if self.config.cache_dir.exists():
-
shutil.rmtree(self.config.cache_dir)
-
self.config.cache_dir.mkdir(parents=True, exist_ok=True)
-
return True
-
except Exception:
-
return False
-
-
def get_repository_size(self) -> dict:
-
"""Get detailed repository size information."""
-
try:
-
total_size = 0
-
file_count = 0
-
dir_count = 0
-
-
for path in self.config.git_store.rglob("*"):
-
if path.is_file():
-
total_size += path.stat().st_size
-
file_count += 1
-
elif path.is_dir():
-
dir_count += 1
-
-
return {
-
'total_size_bytes': total_size,
-
'total_size_mb': round(total_size / (1024 * 1024), 2),
-
'file_count': file_count,
-
'directory_count': dir_count,
-
}
-
except Exception as e:
-
return {'error': str(e)}
-
-
def _has_uncommitted_changes(self) -> bool:
-
"""Check if there are uncommitted changes."""
-
try:
-
if not self.git_store.repo:
-
return False
-
return bool(self.git_store.repo.index.diff("HEAD") or self.git_store.repo.untracked_files)
-
except Exception:
-
return False
-
-
def _get_last_commit_info(self) -> Optional[dict]:
-
"""Get information about the last commit."""
-
try:
-
if not self.git_store.repo:
-
return None
-
-
last_commit = self.git_store.repo.head.commit
-
return {
-
'hash': last_commit.hexsha[:8],
-
'message': last_commit.message.strip(),
-
'author': str(last_commit.author),
-
'date': datetime.fromtimestamp(last_commit.committed_date).isoformat(),
-
}
-
except Exception:
-
return None
-
-
def verify_integrity(self) -> dict:
-
"""Verify repository integrity."""
-
issues = []
-
-
# Check if git repository is valid
-
try:
-
if not self.git_store.repo:
-
issues.append("Git repository not initialized")
-
except Exception as e:
-
issues.append(f"Git repository error: {e}")
-
-
# Check if index.json exists and is valid
-
index_path = self.config.git_store / "index.json"
-
if not index_path.exists():
-
issues.append("index.json missing")
-
else:
-
try:
-
self.git_store._load_index()
-
except Exception as e:
-
issues.append(f"index.json corrupted: {e}")
-
-
# Check if duplicates.json exists
-
duplicates_path = self.config.git_store / "duplicates.json"
-
if not duplicates_path.exists():
-
issues.append("duplicates.json missing")
-
else:
-
try:
-
self.git_store._load_duplicates()
-
except Exception as e:
-
issues.append(f"duplicates.json corrupted: {e}")
-
-
return {
-
'is_valid': len(issues) == 0,
-
'issues': issues,
-
'checked_at': datetime.now().isoformat(),
-
}
···
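The removed status check leaned on GitPython; the same test in isolation (repository path hypothetical):

```python
from git import Repo

repo = Repo("feeds-repo")
dirty = bool(repo.index.diff("HEAD") or repo.untracked_files)
print("uncommitted changes:", dirty)
```

GitPython's `repo.is_dirty(untracked_files=True)` expresses the same check more directly.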
-319
src/thicket/subsystems/site.py
···
-
"""Site generation subsystem."""
-
-
import json
-
import shutil
-
from datetime import datetime
-
from pathlib import Path
-
from typing import Optional
-
-
from jinja2 import Environment, FileSystemLoader, select_autoescape
-
-
from ..core.git_store import GitStore
-
from ..models import ThicketConfig
-
-
-
class SiteGenerator:
-
"""Generates static sites from stored entries."""
-
-
def __init__(self, git_store: GitStore, config: ThicketConfig):
-
"""Initialize site generator."""
-
self.git_store = git_store
-
self.config = config
-
self.default_template_dir = Path(__file__).parent.parent / "templates"
-
-
def generate_site(self, output_dir: Path, template_dir: Optional[Path] = None) -> bool:
-
"""Generate complete static site."""
-
try:
-
# Setup template environment
-
template_dir = template_dir or self.default_template_dir
-
if not template_dir.exists():
-
return False
-
-
env = Environment(
-
loader=FileSystemLoader(str(template_dir)),
-
autoescape=select_autoescape(['html', 'xml'])
-
)
-
-
# Prepare output directory
-
output_dir.mkdir(parents=True, exist_ok=True)
-
-
# Copy static assets
-
self._copy_static_assets(template_dir, output_dir)
-
-
# Generate pages
-
self._generate_index_page(env, output_dir)
-
self._generate_timeline_page(env, output_dir)
-
self._generate_users_page(env, output_dir)
-
self._generate_links_page(env, output_dir)
-
self._generate_user_detail_pages(env, output_dir)
-
-
return True
-
except Exception:
-
return False
-
-
def generate_timeline(self, output_path: Path, limit: Optional[int] = None) -> bool:
-
"""Generate timeline HTML file."""
-
try:
-
env = Environment(
-
loader=FileSystemLoader(str(self.default_template_dir)),
-
autoescape=select_autoescape(['html', 'xml'])
-
)
-
-
timeline_data = self._get_timeline_data(limit)
-
template = env.get_template('timeline.html')
-
-
content = template.render(**timeline_data)
-
-
output_path.parent.mkdir(parents=True, exist_ok=True)
-
with open(output_path, 'w', encoding='utf-8') as f:
-
f.write(content)
-
-
return True
-
except Exception:
-
return False
-
-
def generate_user_pages(self, output_dir: Path) -> bool:
-
"""Generate individual user pages."""
-
try:
-
env = Environment(
-
loader=FileSystemLoader(str(self.default_template_dir)),
-
autoescape=select_autoescape(['html', 'xml'])
-
)
-
-
return self._generate_user_detail_pages(env, output_dir)
-
except Exception:
-
return False
-
-
def _copy_static_assets(self, template_dir: Path, output_dir: Path):
-
"""Copy CSS, JS, and other static assets."""
-
static_files = ['style.css', 'script.js']
-
-
for filename in static_files:
-
src_file = template_dir / filename
-
if src_file.exists():
-
dst_file = output_dir / filename
-
shutil.copy2(src_file, dst_file)
-
-
def _generate_index_page(self, env: Environment, output_dir: Path):
-
"""Generate main index page."""
-
template = env.get_template('index.html')
-
-
# Get summary statistics
-
stats = self.git_store.get_stats()
-
index = self.git_store._load_index()
-
-
# Recent entries
-
recent_entries = []
-
for username in index.users.keys():
-
user_entries = self.git_store.list_entries(username, limit=5)
-
for entry in user_entries:
-
recent_entries.append({
-
'username': username,
-
'entry': entry
-
})
-
-
# Sort by date
-
recent_entries.sort(key=lambda x: x['entry'].updated or x['entry'].published or datetime.min, reverse=True)
-
recent_entries = recent_entries[:10]
-
-
context = {
-
'title': 'Thicket Feed Archive',
-
'stats': stats,
-
'recent_entries': recent_entries,
-
'users': list(index.users.values()),
-
'generated_at': datetime.now().isoformat(),
-
}
-
-
content = template.render(**context)
-
-
with open(output_dir / 'index.html', 'w', encoding='utf-8') as f:
-
f.write(content)
-
-
def _generate_timeline_page(self, env: Environment, output_dir: Path):
-
"""Generate timeline page."""
-
template = env.get_template('timeline.html')
-
timeline_data = self._get_timeline_data()
-
-
content = template.render(**timeline_data)
-
-
with open(output_dir / 'timeline.html', 'w', encoding='utf-8') as f:
-
f.write(content)
-
-
def _generate_users_page(self, env: Environment, output_dir: Path):
-
"""Generate users overview page."""
-
template = env.get_template('users.html')
-
-
index = self.git_store._load_index()
-
users_data = []
-
-
for user_metadata in index.users.values():
-
# Get user config for additional details
-
user_config = next(
-
(u for u in self.config.users if u.username == user_metadata.username),
-
None
-
)
-
-
# Get recent entries
-
recent_entries = self.git_store.list_entries(user_metadata.username, limit=3)
-
-
users_data.append({
-
'metadata': user_metadata,
-
'config': user_config,
-
'recent_entries': recent_entries,
-
})
-
-
# Sort by entry count
-
users_data.sort(key=lambda x: x['metadata'].entry_count, reverse=True)
-
-
context = {
-
'title': 'Users',
-
'users': users_data,
-
'generated_at': datetime.now().isoformat(),
-
}
-
-
content = template.render(**context)
-
-
with open(output_dir / 'users.html', 'w', encoding='utf-8') as f:
-
f.write(content)
-
-
def _generate_links_page(self, env: Environment, output_dir: Path):
-
"""Generate links overview page."""
-
template = env.get_template('links.html')
-
-
# Load links data
-
links_file = self.git_store.repo_path / "links.json"
-
url_mapping_file = self.git_store.repo_path / "url_mapping.json"
-
-
links_data = {}
-
url_mapping = {}
-
-
if links_file.exists():
-
try:
-
with open(links_file) as f:
-
links_data = json.load(f)
-
except Exception:
-
pass
-
-
if url_mapping_file.exists():
-
try:
-
with open(url_mapping_file) as f:
-
url_mapping = json.load(f)
-
except Exception:
-
pass
-
-
# Process most referenced URLs
-
url_counts = [(url, len(refs)) for url, refs in url_mapping.items()]
-
url_counts.sort(key=lambda x: x[1], reverse=True)
-
most_referenced = url_counts[:20]
-
-
# Count links by type
-
link_stats = {
-
'total_entries_with_links': len(links_data),
-
'total_links': sum(len(entry_data.get('links', [])) for entry_data in links_data.values()),
-
'unique_urls': len(url_mapping),
-
}
-
-
context = {
-
'title': 'Links',
-
'most_referenced': most_referenced,
-
'url_mapping': url_mapping,
-
'link_stats': link_stats,
-
'generated_at': datetime.now().isoformat(),
-
}
-
-
content = template.render(**context)
-
-
with open(output_dir / 'links.html', 'w', encoding='utf-8') as f:
-
f.write(content)
-
-
def _generate_user_detail_pages(self, env: Environment, output_dir: Path) -> bool:
-
"""Generate individual user detail pages."""
-
try:
-
template = env.get_template('user_detail.html')
-
index = self.git_store._load_index()
-
-
# Create users subdirectory
-
users_dir = output_dir / 'users'
-
users_dir.mkdir(exist_ok=True)
-
-
for user_metadata in index.users.values():
-
user_config = next(
-
(u for u in self.config.users if u.username == user_metadata.username),
-
None
-
)
-
-
entries = self.git_store.list_entries(user_metadata.username)
-
-
# Get user's links
-
links_file = self.git_store.repo_path / "links.json"
-
user_links = []
-
if links_file.exists():
-
try:
-
with open(links_file) as f:
-
all_links = json.load(f)
-
user_links = [
-
data for key, data in all_links.items()
-
if data.get('username') == user_metadata.username
-
]
-
except Exception:
-
pass
-
-
context = {
-
'title': f"{user_metadata.display_name or user_metadata.username}",
-
'user_metadata': user_metadata,
-
'user_config': user_config,
-
'entries': entries,
-
'user_links': user_links,
-
'generated_at': datetime.now().isoformat(),
-
}
-
-
content = template.render(**context)
-
-
user_file = users_dir / f"{user_metadata.username}.html"
-
with open(user_file, 'w', encoding='utf-8') as f:
-
f.write(content)
-
-
return True
-
except Exception:
-
return False
-
-
def _get_timeline_data(self, limit: Optional[int] = None) -> dict:
-
"""Get data for timeline page."""
-
index = self.git_store._load_index()
-
-
# Collect all entries with metadata
-
all_entries = []
-
for user_metadata in index.users.values():
-
user_entries = self.git_store.list_entries(user_metadata.username)
-
for entry in user_entries:
-
all_entries.append({
-
'username': user_metadata.username,
-
'display_name': user_metadata.display_name,
-
'entry': entry,
-
})
-
-
# Sort by date (newest first)
-
all_entries.sort(
-
key=lambda x: x['entry'].updated or x['entry'].published or datetime.min,
-
reverse=True
-
)
-
-
if limit:
-
all_entries = all_entries[:limit]
-
-
# Group by date for timeline display
-
timeline_groups = {}
-
for item in all_entries:
-
entry_date = item['entry'].updated or item['entry'].published
-
if entry_date:
-
date_key = entry_date.strftime('%Y-%m-%d')
-
if date_key not in timeline_groups:
-
timeline_groups[date_key] = []
-
timeline_groups[date_key].append(item)
-
-
return {
-
'title': 'Timeline',
-
'timeline_groups': timeline_groups,
-
'total_entries': len(all_entries),
-
'generated_at': datetime.now().isoformat(),
-
}
···
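The removed timeline builder sorted entries newest-first and then bucketed them by `YYYY-MM-DD`; the grouping step in isolation:

```python
from collections import defaultdict
from datetime import datetime

entries = [
    ("alice", datetime(2024, 1, 1, 9)),
    ("bob", datetime(2024, 1, 1, 17)),
    ("alice", datetime(2024, 1, 2)),
]

groups: dict[str, list[str]] = defaultdict(list)
for user, when in sorted(entries, key=lambda e: e[1], reverse=True):
    groups[when.strftime("%Y-%m-%d")].append(user)

print(dict(groups))  # {'2024-01-02': ['alice'], '2024-01-01': ['bob', 'alice']}
```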
-254
src/thicket/subsystems/users.py
···
-
"""User management subsystem."""
-
-
import shutil
-
from typing import Optional
-
-
from pydantic import EmailStr, HttpUrl, ValidationError
-
-
from ..core.git_store import GitStore
-
from ..models import ThicketConfig, UserConfig, UserMetadata
-
-
-
class UserManager:
-
"""Manages user operations and metadata."""
-
-
def __init__(self, git_store: GitStore, config: ThicketConfig):
-
"""Initialize user manager."""
-
self.git_store = git_store
-
self.config = config
-
-
def add_user(self, username: str, feeds: list[str], **kwargs) -> UserConfig:
-
"""Add a new user with feeds."""
-
# Validate feeds
-
validated_feeds = []
-
for feed in feeds:
-
try:
-
validated_feeds.append(HttpUrl(feed))
-
except ValidationError as e:
-
raise ValueError(f"Invalid feed URL '{feed}': {e}")
-
-
# Validate optional fields
-
email = None
-
if 'email' in kwargs and kwargs['email']:
-
try:
-
email = EmailStr(kwargs['email'])
-
except ValidationError as e:
-
raise ValueError(f"Invalid email '{kwargs['email']}': {e}")
-
-
homepage = None
-
if 'homepage' in kwargs and kwargs['homepage']:
-
try:
-
homepage = HttpUrl(kwargs['homepage'])
-
except ValidationError as e:
-
raise ValueError(f"Invalid homepage URL '{kwargs['homepage']}': {e}")
-
-
icon = None
-
if 'icon' in kwargs and kwargs['icon']:
-
try:
-
icon = HttpUrl(kwargs['icon'])
-
except ValidationError as e:
-
raise ValueError(f"Invalid icon URL '{kwargs['icon']}': {e}")
-
-
# Create user config
-
user_config = UserConfig(
-
username=username,
-
feeds=validated_feeds,
-
email=email,
-
homepage=homepage,
-
icon=icon,
-
display_name=kwargs.get('display_name')
-
)
-
-
# Add to git store
-
self.git_store.add_user(
-
username=username,
-
display_name=user_config.display_name,
-
email=str(user_config.email) if user_config.email else None,
-
homepage=str(user_config.homepage) if user_config.homepage else None,
-
icon=str(user_config.icon) if user_config.icon else None,
-
feeds=[str(feed) for feed in user_config.feeds]
-
)
-
-
# Add to config if not already present
-
existing_user = next((u for u in self.config.users if u.username == username), None)
-
if not existing_user:
-
self.config.users.append(user_config)
-
else:
-
# Update existing config
-
existing_user.feeds = user_config.feeds
-
existing_user.email = user_config.email
-
existing_user.homepage = user_config.homepage
-
existing_user.icon = user_config.icon
-
existing_user.display_name = user_config.display_name
-
-
return user_config
-
-
def get_user(self, username: str) -> Optional[UserConfig]:
-
"""Get user configuration."""
-
return next((u for u in self.config.users if u.username == username), None)
-
-
def get_user_metadata(self, username: str) -> Optional[UserMetadata]:
-
"""Get user metadata from git store."""
-
return self.git_store.get_user(username)
-
-
def list_users(self) -> list[UserConfig]:
-
"""List all configured users."""
-
return self.config.users.copy()
-
-
def list_users_with_metadata(self) -> list[tuple[UserConfig, Optional[UserMetadata]]]:
-
"""List users with their git store metadata."""
-
result = []
-
for user_config in self.config.users:
-
metadata = self.git_store.get_user(user_config.username)
-
result.append((user_config, metadata))
-
return result
-
-
def update_user(self, username: str, **kwargs) -> bool:
-
"""Update user configuration."""
-
# Update in config
-
user_config = self.get_user(username)
-
if not user_config:
-
return False
-
-
# Validate and update feeds if provided
-
if 'feeds' in kwargs:
-
validated_feeds = []
-
for feed in kwargs['feeds']:
-
try:
-
validated_feeds.append(HttpUrl(feed))
-
except ValidationError:
-
return False
-
user_config.feeds = validated_feeds
-
-
# Validate and update other fields
-
if 'email' in kwargs and kwargs['email']:
-
try:
-
user_config.email = EmailStr(kwargs['email'])
-
except ValidationError:
-
return False
-
elif 'email' in kwargs and not kwargs['email']:
-
user_config.email = None
-
-
if 'homepage' in kwargs and kwargs['homepage']:
-
try:
-
user_config.homepage = HttpUrl(kwargs['homepage'])
-
except ValidationError:
-
return False
-
elif 'homepage' in kwargs and not kwargs['homepage']:
-
user_config.homepage = None
-
-
if 'icon' in kwargs and kwargs['icon']:
-
try:
-
user_config.icon = HttpUrl(kwargs['icon'])
-
except ValidationError:
-
return False
-
elif 'icon' in kwargs and not kwargs['icon']:
-
user_config.icon = None
-
-
if 'display_name' in kwargs:
-
user_config.display_name = kwargs['display_name'] or None
-
-
# Update in git store
-
git_kwargs = {}
-
if 'feeds' in kwargs:
-
git_kwargs['feeds'] = [str(feed) for feed in user_config.feeds]
-
if user_config.email:
-
git_kwargs['email'] = str(user_config.email)
-
if user_config.homepage:
-
git_kwargs['homepage'] = str(user_config.homepage)
-
if user_config.icon:
-
git_kwargs['icon'] = str(user_config.icon)
-
if user_config.display_name:
-
git_kwargs['display_name'] = user_config.display_name
-
-
return self.git_store.update_user(username, **git_kwargs)
-
-
def remove_user(self, username: str) -> bool:
-
"""Remove a user and their data."""
-
# Remove from config
-
self.config.users = [u for u in self.config.users if u.username != username]
-
-
# Remove user directory from git store
-
user_metadata = self.git_store.get_user(username)
-
if user_metadata:
-
user_dir = self.git_store.repo_path / user_metadata.directory
-
if user_dir.exists():
-
try:
-
shutil.rmtree(user_dir)
-
except Exception:
-
return False
-
-
# Remove user from index
-
index = self.git_store._load_index()
-
if username in index.users:
-
del index.users[username]
-
self.git_store._save_index(index)
-
-
return True
-
-
def get_user_stats(self, username: str) -> Optional[dict]:
-
"""Get statistics for a specific user."""
-
user_metadata = self.git_store.get_user(username)
-
if not user_metadata:
-
return None
-
-
user_config = self.get_user(username)
-
entries = self.git_store.list_entries(username)
-
-
return {
-
'username': username,
-
'display_name': user_metadata.display_name,
-
'entry_count': user_metadata.entry_count,
-
'feeds_configured': len(user_config.feeds) if user_config else 0,
-
'directory': user_metadata.directory,
-
'created': user_metadata.created.isoformat() if user_metadata.created else None,
-
'last_updated': user_metadata.last_updated.isoformat() if user_metadata.last_updated else None,
-
'latest_entry': entries[0].updated.isoformat() if entries else None,
-
}
-
-
def validate_user_feeds(self, username: str) -> dict:
-
"""Validate all feeds for a user."""
-
user_config = self.get_user(username)
-
if not user_config:
-
return {'error': 'User not found'}
-
-
results = {
-
'username': username,
-
'total_feeds': len(user_config.feeds),
-
'valid_feeds': [],
-
'invalid_feeds': [],
-
}
-
-
for feed_url in user_config.feeds:
-
try:
-
# Basic URL validation - more comprehensive validation would require fetching
-
HttpUrl(str(feed_url))
-
results['valid_feeds'].append(str(feed_url))
-
except ValidationError as e:
-
results['invalid_feeds'].append({
-
'url': str(feed_url),
-
'error': str(e)
-
})
-
-
results['is_valid'] = len(results['invalid_feeds']) == 0
-
-
return results
-
-
def sync_config_with_git_store(self) -> bool:
-
"""Sync configuration users with git store."""
-
try:
-
for user_config in self.config.users:
-
git_user = self.git_store.get_user(user_config.username)
-
if not git_user:
-
# Add missing user to git store
-
self.git_store.add_user(
-
username=user_config.username,
-
display_name=user_config.display_name,
-
email=str(user_config.email) if user_config.email else None,
-
homepage=str(user_config.homepage) if user_config.homepage else None,
-
icon=str(user_config.icon) if user_config.icon else None,
-
feeds=[str(feed) for feed in user_config.feeds]
-
)
-
return True
-
except Exception:
-
return False
···
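One subtlety in the removed `UserManager`: whether bare `EmailStr(...)` and `HttpUrl(...)` calls raise `ValidationError` depends on the Pydantic version, so the `try`/`except` blocks around them may never fire. Under Pydantic v2, `TypeAdapter` validates explicitly; a hedged sketch of centralizing those checks (function names are hypothetical):

```python
from typing import Optional

from pydantic import EmailStr, HttpUrl, TypeAdapter, ValidationError

_URL = TypeAdapter(HttpUrl)
_EMAIL = TypeAdapter(EmailStr)  # requires the email-validator extra

def validate_feed_urls(feeds: list[str]) -> list[HttpUrl]:
    """Validate feed URLs up front, mirroring UserManager.add_user."""
    validated = []
    for feed in feeds:
        try:
            validated.append(_URL.validate_python(feed))
        except ValidationError as e:
            raise ValueError(f"Invalid feed URL '{feed}': {e}") from e
    return validated

def validate_optional_email(value: Optional[str]) -> Optional[EmailStr]:
    """Validate an optional email field the same way."""
    if not value:
        return None
    try:
        return _EMAIL.validate_python(value)
    except ValidationError as e:
        raise ValueError(f"Invalid email '{value}': {e}") from e
```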
-31
src/thicket/templates/base.html
···
-
<!DOCTYPE html>
-
<html lang="en">
-
<head>
-
<meta charset="UTF-8">
-
<meta name="viewport" content="width=device-width, initial-scale=1.0">
-
<title>{% block page_title %}{{ title }}{% endblock %}</title>
-
<link rel="stylesheet" href="css/style.css">
-
</head>
-
<body>
-
<header class="site-header">
-
<div class="header-content">
-
<h1 class="site-title">{{ title }}</h1>
-
<nav class="site-nav">
-
<a href="timeline.html" class="nav-link {% if page == 'timeline' %}active{% endif %}">Timeline</a>
-
<a href="links.html" class="nav-link {% if page == 'links' %}active{% endif %}">Links</a>
-
<a href="users.html" class="nav-link {% if page == 'users' %}active{% endif %}">Users</a>
-
</nav>
-
</div>
-
</header>
-
-
<main class="main-content">
-
{% block content %}{% endblock %}
-
</main>
-
-
<footer class="site-footer">
-
<p>Generated on {{ generated_at }} by <a href="https://github.com/avsm/thicket">Thicket</a></p>
-
</footer>
-
-
<script src="js/script.js"></script>
-
</body>
-
</html>
···
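`base.html` expects `title`, `page`, and `generated_at` in its render context, and child templates override the `page_title` and `content` blocks. A minimal Jinja2 rendering sketch, assuming templates are loaded straight from `src/thicket/templates` (the actual `SiteGenerator` wiring is not shown in this diff):

```python
from datetime import datetime
from pathlib import Path

from jinja2 import Environment, FileSystemLoader, select_autoescape

def render_page(template_name: str, output_path: Path, **context) -> None:
    """Render one site template to an output file."""
    env = Environment(
        loader=FileSystemLoader("src/thicket/templates"),  # assumed location
        autoescape=select_autoescape(["html"]),
    )
    template = env.get_template(template_name)
    context.setdefault("generated_at", datetime.now().strftime("%Y-%m-%d %H:%M"))
    output_path.write_text(template.render(**context), encoding="utf-8")

# e.g. render_page("users.html", Path("site/users.html"),
#                  title="Thicket", page="users", users=[])
```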
-13
src/thicket/templates/index.html
···
-
<!DOCTYPE html>
-
<html lang="en">
-
<head>
-
<meta charset="UTF-8">
-
<meta name="viewport" content="width=device-width, initial-scale=1.0">
-
<title>{{ title }}</title>
-
<meta http-equiv="refresh" content="0; url=timeline.html">
-
<link rel="canonical" href="timeline.html">
-
</head>
-
<body>
-
<p>Redirecting to <a href="timeline.html">Timeline</a>...</p>
-
</body>
-
</html>
···
-38
src/thicket/templates/links.html
···
-
{% extends "base.html" %}
-
-
{% block page_title %}Outgoing Links - {{ title }}{% endblock %}
-
-
{% block content %}
-
<div class="page-content">
-
<h2>Outgoing Links</h2>
-
<p class="page-description">External links referenced in blog posts, ordered by most recent reference.</p>
-
-
{% for link in outgoing_links %}
-
<article class="link-group">
-
<h3 class="link-url">
-
<a href="{{ link.url }}" target="_blank">{{ link.url|truncate(80) }}</a>
-
{% if link.target_username %}
-
<span class="target-user">({{ link.target_username }})</span>
-
{% endif %}
-
</h3>
-
<div class="referencing-entries">
-
<span class="ref-count">Referenced in {{ link.entries|length }} post(s):</span>
-
<ul>
-
{% for display_name, entry in link.entries[:5] %}
-
<li>
-
<span class="author">{{ display_name }}</span> -
-
<a href="{{ entry.link }}" target="_blank">{{ entry.title }}</a>
-
<time datetime="{{ entry.updated or entry.published }}">
-
({{ (entry.updated or entry.published).strftime('%Y-%m-%d') }})
-
</time>
-
</li>
-
{% endfor %}
-
{% if link.entries|length > 5 %}
-
<li class="more">... and {{ link.entries|length - 5 }} more</li>
-
{% endif %}
-
</ul>
-
</div>
-
</article>
-
{% endfor %}
-
</div>
-
{% endblock %}
···
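`links.html` iterates `outgoing_links`, where each group carries `url`, an optional `target_username`, and `entries` as `(display_name, entry)` pairs, newest reference first. A hypothetical context builder matching that shape:

```python
from datetime import datetime

from thicket.models import AtomEntry

def build_links_context(link_index: dict[str, list[tuple[str, AtomEntry]]]) -> list[dict]:
    """Shape the outgoing_links list the template expects, most recent reference first."""
    def latest(pair: tuple[str, AtomEntry]) -> datetime:
        return pair[1].updated or pair[1].published or datetime.min

    groups = []
    for url, refs in link_index.items():
        refs = sorted(refs, key=latest, reverse=True)
        groups.append({"url": url, "target_username": None, "entries": refs})
    # Order groups by their most recent referencing entry.
    groups.sort(key=lambda g: latest(g["entries"][0]), reverse=True)
    return groups
```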
-88
src/thicket/templates/script.js
···
-
// Enhanced functionality for thicket website
-
document.addEventListener('DOMContentLoaded', function() {
-
-
// Collapsible threads (optional feature)
-
const threadHeaders = document.querySelectorAll('.thread-header');
-
threadHeaders.forEach(header => {
-
header.style.cursor = 'pointer';
-
header.addEventListener('click', function() {
-
const thread = this.parentElement;
-
const entries = thread.querySelectorAll('.thread-entry');
-
-
// Toggle visibility of all but the first entry
-
for (let i = 1; i < entries.length; i++) {
-
entries[i].style.display = entries[i].style.display === 'none' ? 'block' : 'none';
-
}
-
-
// Update thread count text
-
const count = this.querySelector('.thread-count');
-
if (entries[1] && entries[1].style.display === 'none') {
-
count.textContent = count.textContent.replace('posts', 'posts (collapsed)');
-
} else {
-
count.textContent = count.textContent.replace(' (collapsed)', '');
-
}
-
});
-
});
-
-
// Add relative time display
-
const timeElements = document.querySelectorAll('time');
-
timeElements.forEach(timeEl => {
-
const datetime = new Date(timeEl.getAttribute('datetime'));
-
const now = new Date();
-
const diffMs = now - datetime;
-
const diffDays = Math.floor(diffMs / (1000 * 60 * 60 * 24));
-
-
let relativeTime;
-
if (diffDays === 0) {
-
const diffHours = Math.floor(diffMs / (1000 * 60 * 60));
-
if (diffHours === 0) {
-
const diffMinutes = Math.floor(diffMs / (1000 * 60));
-
relativeTime = diffMinutes === 0 ? 'just now' : `${diffMinutes}m ago`;
-
} else {
-
relativeTime = `${diffHours}h ago`;
-
}
-
} else if (diffDays === 1) {
-
relativeTime = 'yesterday';
-
} else if (diffDays < 7) {
-
relativeTime = `${diffDays}d ago`;
-
} else if (diffDays < 30) {
-
const weeks = Math.floor(diffDays / 7);
-
relativeTime = weeks === 1 ? '1w ago' : `${weeks}w ago`;
-
} else if (diffDays < 365) {
-
const months = Math.floor(diffDays / 30);
-
relativeTime = months === 1 ? '1mo ago' : `${months}mo ago`;
-
} else {
-
const years = Math.floor(diffDays / 365);
-
relativeTime = years === 1 ? '1y ago' : `${years}y ago`;
-
}
-
-
// Keep the absolute time as a tooltip and show the relative time inline
-
timeEl.setAttribute('title', timeEl.textContent);
-
timeEl.textContent = relativeTime;
-
});
-
-
// Enhanced anchor link scrolling for shared references
-
document.querySelectorAll('a[href^="#"]').forEach(anchor => {
-
anchor.addEventListener('click', function (e) {
-
e.preventDefault();
-
const target = document.querySelector(this.getAttribute('href'));
-
if (target) {
-
target.scrollIntoView({
-
behavior: 'smooth',
-
block: 'center'
-
});
-
-
// Highlight the target briefly
-
const timelineEntry = target.closest('.timeline-entry');
-
if (timelineEntry) {
-
timelineEntry.style.outline = '2px solid var(--primary-color)';
-
timelineEntry.style.borderRadius = '8px';
-
setTimeout(() => {
-
timelineEntry.style.outline = '';
-
timelineEntry.style.borderRadius = '';
-
}, 2000);
-
}
-
}
-
});
-
});
-
});
···
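The relative-time bucketing above is pure arithmetic, so it ports directly if the same labels are ever needed server-side or under test; a Python sketch with the same thresholds:

```python
from datetime import datetime
from typing import Optional

def relative_time(dt: datetime, now: Optional[datetime] = None) -> str:
    """Mirror the script.js buckets: minutes, hours, days, weeks, months, years."""
    now = now or datetime.now()
    delta = now - dt
    days = delta.days
    if days == 0:
        hours = delta.seconds // 3600
        if hours == 0:
            minutes = delta.seconds // 60
            return "just now" if minutes == 0 else f"{minutes}m ago"
        return f"{hours}h ago"
    if days == 1:
        return "yesterday"
    if days < 7:
        return f"{days}d ago"
    if days < 30:
        weeks = days // 7
        return "1w ago" if weeks == 1 else f"{weeks}w ago"
    if days < 365:
        months = days // 30
        return "1mo ago" if months == 1 else f"{months}mo ago"
    years = days // 365
    return "1y ago" if years == 1 else f"{years}y ago"
```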
-754
src/thicket/templates/style.css
···
-
/* Modern, clean design with high-density text and readable theme */
-
-
:root {
-
--primary-color: #2c3e50;
-
--secondary-color: #3498db;
-
--accent-color: #e74c3c;
-
--background: #ffffff;
-
--surface: #f8f9fa;
-
--text-primary: #2c3e50;
-
--text-secondary: #7f8c8d;
-
--border-color: #e0e0e0;
-
--thread-indent: 20px;
-
--max-width: 1200px;
-
}
-
-
* {
-
margin: 0;
-
padding: 0;
-
box-sizing: border-box;
-
}
-
-
body {
-
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', 'Roboto', 'Helvetica Neue', Arial, sans-serif;
-
font-size: 14px;
-
line-height: 1.6;
-
color: var(--text-primary);
-
background-color: var(--background);
-
}
-
-
/* Header */
-
.site-header {
-
background-color: var(--surface);
-
border-bottom: 1px solid var(--border-color);
-
padding: 0.75rem 0;
-
position: sticky;
-
top: 0;
-
z-index: 100;
-
}
-
-
.header-content {
-
max-width: var(--max-width);
-
margin: 0 auto;
-
padding: 0 2rem;
-
display: flex;
-
justify-content: space-between;
-
align-items: center;
-
}
-
-
.site-title {
-
font-size: 1.5rem;
-
font-weight: 600;
-
color: var(--primary-color);
-
margin: 0;
-
}
-
-
/* Navigation */
-
.site-nav {
-
display: flex;
-
gap: 1.5rem;
-
}
-
-
.nav-link {
-
text-decoration: none;
-
color: var(--text-secondary);
-
font-weight: 500;
-
font-size: 0.95rem;
-
padding: 0.5rem 0.75rem;
-
border-radius: 4px;
-
transition: all 0.2s ease;
-
}
-
-
.nav-link:hover {
-
color: var(--primary-color);
-
background-color: var(--background);
-
}
-
-
.nav-link.active {
-
color: var(--secondary-color);
-
background-color: var(--background);
-
font-weight: 600;
-
}
-
-
/* Main Content */
-
.main-content {
-
max-width: var(--max-width);
-
margin: 2rem auto;
-
padding: 0 2rem;
-
}
-
-
.page-content {
-
margin: 0;
-
}
-
-
.page-description {
-
color: var(--text-secondary);
-
margin-bottom: 1.5rem;
-
font-style: italic;
-
}
-
-
/* Sections */
-
section {
-
margin-bottom: 2rem;
-
}
-
-
h2 {
-
font-size: 1.3rem;
-
font-weight: 600;
-
margin-bottom: 0.75rem;
-
color: var(--primary-color);
-
}
-
-
h3 {
-
font-size: 1.1rem;
-
font-weight: 600;
-
margin-bottom: 0.75rem;
-
color: var(--primary-color);
-
}
-
-
/* Entries and Threads */
-
article {
-
margin-bottom: 1.5rem;
-
padding: 1rem;
-
background-color: var(--surface);
-
border-radius: 4px;
-
border: 1px solid var(--border-color);
-
}
-
-
/* Timeline-style entries */
-
.timeline-entry {
-
margin-bottom: 0.5rem;
-
padding: 0.5rem 0.75rem;
-
border: none;
-
background: transparent;
-
transition: background-color 0.2s ease;
-
}
-
-
.timeline-entry:hover {
-
background-color: var(--surface);
-
}
-
-
.timeline-meta {
-
display: inline-flex;
-
gap: 0.5rem;
-
align-items: center;
-
font-size: 0.75rem;
-
color: var(--text-secondary);
-
margin-bottom: 0.25rem;
-
}
-
-
.timeline-time {
-
font-family: 'SF Mono', Monaco, Consolas, 'Courier New', monospace;
-
font-size: 0.75rem;
-
color: var(--text-secondary);
-
}
-
-
.timeline-author {
-
font-weight: 600;
-
color: var(--primary-color);
-
font-size: 0.8rem;
-
text-decoration: none;
-
}
-
-
.timeline-author:hover {
-
color: var(--secondary-color);
-
text-decoration: underline;
-
}
-
-
.timeline-content {
-
line-height: 1.4;
-
}
-
-
.timeline-title {
-
font-size: 0.95rem;
-
font-weight: 600;
-
}
-
-
.timeline-title a {
-
color: var(--primary-color);
-
text-decoration: none;
-
}
-
-
.timeline-title a:hover {
-
color: var(--secondary-color);
-
text-decoration: underline;
-
}
-
-
.timeline-summary {
-
color: var(--text-secondary);
-
font-size: 0.9rem;
-
line-height: 1.4;
-
}
-
-
/* Legacy styles for other sections */
-
.entry-meta, .thread-header {
-
display: flex;
-
gap: 1rem;
-
align-items: center;
-
margin-bottom: 0.5rem;
-
font-size: 0.85rem;
-
color: var(--text-secondary);
-
}
-
-
.author {
-
font-weight: 600;
-
color: var(--primary-color);
-
}
-
-
time {
-
font-size: 0.85rem;
-
}
-
-
h4 {
-
font-size: 1.1rem;
-
font-weight: 600;
-
margin-bottom: 0.5rem;
-
}
-
-
h4 a {
-
color: var(--primary-color);
-
text-decoration: none;
-
}
-
-
h4 a:hover {
-
color: var(--secondary-color);
-
text-decoration: underline;
-
}
-
-
.entry-summary {
-
color: var(--text-primary);
-
line-height: 1.5;
-
margin-top: 0.5rem;
-
}
-
-
/* Enhanced Threading Styles */
-
-
/* Conversation Clusters */
-
.conversation-cluster {
-
background-color: var(--background);
-
border: 2px solid var(--border-color);
-
border-radius: 8px;
-
margin-bottom: 2rem;
-
overflow: hidden;
-
box-shadow: 0 2px 4px rgba(0, 0, 0, 0.05);
-
}
-
-
.conversation-header {
-
background: linear-gradient(135deg, var(--surface) 0%, #f1f3f4 100%);
-
padding: 0.75rem 1rem;
-
border-bottom: 1px solid var(--border-color);
-
}
-
-
.conversation-meta {
-
display: flex;
-
justify-content: space-between;
-
align-items: center;
-
flex-wrap: wrap;
-
gap: 0.5rem;
-
}
-
-
.conversation-count {
-
font-weight: 600;
-
color: var(--secondary-color);
-
font-size: 0.9rem;
-
}
-
-
.conversation-participants {
-
font-size: 0.8rem;
-
color: var(--text-secondary);
-
flex: 1;
-
text-align: right;
-
}
-
-
.conversation-flow {
-
padding: 0.5rem;
-
}
-
-
/* Threaded Conversation Entries */
-
.conversation-entry {
-
position: relative;
-
margin-bottom: 0.75rem;
-
display: flex;
-
align-items: flex-start;
-
}
-
-
.conversation-entry.level-0 {
-
margin-left: 0;
-
}
-
-
.conversation-entry.level-1 {
-
margin-left: 1.5rem;
-
}
-
-
.conversation-entry.level-2 {
-
margin-left: 3rem;
-
}
-
-
.conversation-entry.level-3 {
-
margin-left: 4.5rem;
-
}
-
-
.conversation-entry.level-4 {
-
margin-left: 6rem;
-
}
-
-
.entry-connector {
-
width: 3px;
-
background-color: var(--secondary-color);
-
margin-right: 0.75rem;
-
margin-top: 0.25rem;
-
min-height: 2rem;
-
border-radius: 2px;
-
opacity: 0.6;
-
}
-
-
.conversation-entry.level-0 .entry-connector {
-
background-color: var(--accent-color);
-
opacity: 0.8;
-
}
-
-
.entry-content {
-
flex: 1;
-
background-color: var(--surface);
-
padding: 0.75rem;
-
border-radius: 6px;
-
border: 1px solid var(--border-color);
-
transition: all 0.2s ease;
-
}
-
-
.entry-content:hover {
-
border-color: var(--secondary-color);
-
box-shadow: 0 2px 8px rgba(52, 152, 219, 0.1);
-
}
-
-
/* Reference Indicators */
-
.reference-indicators {
-
display: inline-flex;
-
gap: 0.25rem;
-
margin-left: 0.5rem;
-
}
-
-
.ref-out, .ref-in {
-
display: inline-block;
-
width: 1rem;
-
height: 1rem;
-
border-radius: 50%;
-
text-align: center;
-
line-height: 1rem;
-
font-size: 0.7rem;
-
font-weight: bold;
-
}
-
-
.ref-out {
-
background-color: #e8f5e8;
-
color: #2d8f2d;
-
}
-
-
.ref-in {
-
background-color: #e8f0ff;
-
color: #1f5fbf;
-
}
-
-
/* Reference Badges for Individual Posts */
-
.timeline-entry.with-references {
-
background-color: var(--surface);
-
}
-
-
/* Conversation posts in unified timeline */
-
.timeline-entry.conversation-post {
-
background: transparent;
-
border: none;
-
margin-bottom: 0.5rem;
-
padding: 0.5rem 0.75rem;
-
}
-
-
.timeline-entry.conversation-post.level-0 {
-
margin-left: 0;
-
border-left: 2px solid var(--accent-color);
-
padding-left: 0.75rem;
-
}
-
-
.timeline-entry.conversation-post.level-1 {
-
margin-left: 1.5rem;
-
border-left: 2px solid var(--secondary-color);
-
padding-left: 0.75rem;
-
}
-
-
.timeline-entry.conversation-post.level-2 {
-
margin-left: 3rem;
-
border-left: 2px solid var(--text-secondary);
-
padding-left: 0.75rem;
-
}
-
-
.timeline-entry.conversation-post.level-3 {
-
margin-left: 4.5rem;
-
border-left: 2px solid var(--text-secondary);
-
padding-left: 0.75rem;
-
}
-
-
.timeline-entry.conversation-post.level-4 {
-
margin-left: 6rem;
-
border-left: 2px solid var(--text-secondary);
-
padding-left: 0.75rem;
-
}
-
-
/* Cross-thread linking */
-
.cross-thread-links {
-
margin-top: 0.5rem;
-
padding-top: 0.5rem;
-
border-top: 1px solid var(--border-color);
-
}
-
-
.cross-thread-indicator {
-
font-size: 0.75rem;
-
color: var(--text-secondary);
-
background-color: var(--surface);
-
padding: 0.25rem 0.5rem;
-
border-radius: 12px;
-
border: 1px solid var(--border-color);
-
display: inline-block;
-
}
-
-
/* Inline shared references styling */
-
.inline-shared-refs {
-
margin-left: 0.5rem;
-
font-size: 0.85rem;
-
color: var(--text-secondary);
-
}
-
-
.shared-ref-link {
-
color: var(--primary-color);
-
text-decoration: none;
-
font-weight: 500;
-
transition: color 0.2s ease;
-
}
-
-
.shared-ref-link:hover {
-
color: var(--secondary-color);
-
text-decoration: underline;
-
}
-
-
.shared-ref-more {
-
font-style: italic;
-
color: var(--text-secondary);
-
font-size: 0.8rem;
-
margin-left: 0.25rem;
-
}
-
-
.user-anchor, .post-anchor {
-
position: absolute;
-
margin-top: -60px; /* Offset for fixed header */
-
pointer-events: none;
-
}
-
-
.cross-thread-link {
-
color: var(--primary-color);
-
text-decoration: none;
-
font-weight: 500;
-
transition: color 0.2s ease;
-
}
-
-
.cross-thread-link:hover {
-
color: var(--secondary-color);
-
text-decoration: underline;
-
}
-
-
.reference-badges {
-
display: flex;
-
gap: 0.25rem;
-
margin-left: 0.5rem;
-
flex-wrap: wrap;
-
}
-
-
.ref-badge {
-
display: inline-block;
-
padding: 0.1rem 0.4rem;
-
border-radius: 12px;
-
font-size: 0.7rem;
-
font-weight: 600;
-
text-transform: uppercase;
-
letter-spacing: 0.05em;
-
}
-
-
.ref-badge.ref-outbound {
-
background-color: #e8f5e8;
-
color: #2d8f2d;
-
border: 1px solid #c3e6c3;
-
}
-
-
.ref-badge.ref-inbound {
-
background-color: #e8f0ff;
-
color: #1f5fbf;
-
border: 1px solid #b3d9ff;
-
}
-
-
/* Author Color Coding */
-
.timeline-author {
-
position: relative;
-
}
-
-
.timeline-author::before {
-
content: '';
-
display: inline-block;
-
width: 8px;
-
height: 8px;
-
border-radius: 50%;
-
margin-right: 0.5rem;
-
background-color: var(--secondary-color);
-
}
-
-
/* Generate consistent colors for authors */
-
.author-avsm::before { background-color: #e74c3c; }
-
.author-mort::before { background-color: #3498db; }
-
.author-mte::before { background-color: #2ecc71; }
-
.author-ryan::before { background-color: #f39c12; }
-
.author-mwd::before { background-color: #9b59b6; }
-
.author-dra::before { background-color: #1abc9c; }
-
.author-pf341::before { background-color: #34495e; }
-
.author-sadiqj::before { background-color: #e67e22; }
-
.author-martinkl::before { background-color: #8e44ad; }
-
.author-jonsterling::before { background-color: #27ae60; }
-
.author-jon::before { background-color: #f1c40f; }
-
.author-onkar::before { background-color: #e91e63; }
-
.author-gabriel::before { background-color: #00bcd4; }
-
.author-jess::before { background-color: #ff5722; }
-
.author-ibrahim::before { background-color: #607d8b; }
-
.author-andres::before { background-color: #795548; }
-
.author-eeg::before { background-color: #ff9800; }
-
-
/* Section Headers */
-
.conversations-section h3,
-
.referenced-posts-section h3,
-
.individual-posts-section h3 {
-
border-bottom: 2px solid var(--border-color);
-
padding-bottom: 0.5rem;
-
margin-bottom: 1.5rem;
-
position: relative;
-
}
-
-
.conversations-section h3::before {
-
content: "๐Ÿ’ฌ";
-
margin-right: 0.5rem;
-
}
-
-
.referenced-posts-section h3::before {
-
content: "๐Ÿ”—";
-
margin-right: 0.5rem;
-
}
-
-
.individual-posts-section h3::before {
-
content: "๐Ÿ“";
-
margin-right: 0.5rem;
-
}
-
-
/* Legacy thread styles (for backward compatibility) */
-
.thread {
-
background-color: var(--background);
-
border: 1px solid var(--border-color);
-
padding: 0;
-
overflow: hidden;
-
margin-bottom: 1rem;
-
}
-
-
.thread-header {
-
background-color: var(--surface);
-
padding: 0.5rem 0.75rem;
-
border-bottom: 1px solid var(--border-color);
-
}
-
-
.thread-count {
-
font-weight: 600;
-
color: var(--secondary-color);
-
}
-
-
.thread-entry {
-
padding: 0.5rem 0.75rem;
-
border-bottom: 1px solid var(--border-color);
-
}
-
-
.thread-entry:last-child {
-
border-bottom: none;
-
}
-
-
.thread-entry.reply {
-
margin-left: var(--thread-indent);
-
border-left: 3px solid var(--secondary-color);
-
background-color: var(--surface);
-
}
-
-
/* Links Section */
-
.link-group {
-
background-color: var(--background);
-
}
-
-
.link-url {
-
font-size: 1rem;
-
word-break: break-word;
-
}
-
-
.link-url a {
-
color: var(--secondary-color);
-
text-decoration: none;
-
}
-
-
.link-url a:hover {
-
text-decoration: underline;
-
}
-
-
.target-user {
-
font-size: 0.9rem;
-
color: var(--text-secondary);
-
font-weight: normal;
-
}
-
-
.referencing-entries {
-
margin-top: 0.75rem;
-
}
-
-
.ref-count {
-
font-weight: 600;
-
color: var(--text-secondary);
-
font-size: 0.9rem;
-
}
-
-
.referencing-entries ul {
-
list-style: none;
-
margin-top: 0.5rem;
-
padding-left: 1rem;
-
}
-
-
.referencing-entries li {
-
margin-bottom: 0.25rem;
-
font-size: 0.9rem;
-
}
-
-
.referencing-entries .more {
-
font-style: italic;
-
color: var(--text-secondary);
-
}
-
-
/* Users Section */
-
.user-card {
-
background-color: var(--background);
-
}
-
-
.user-header {
-
display: flex;
-
gap: 1rem;
-
align-items: start;
-
margin-bottom: 1rem;
-
}
-
-
.user-icon {
-
width: 48px;
-
height: 48px;
-
border-radius: 50%;
-
object-fit: cover;
-
}
-
-
.user-info h3 {
-
margin-bottom: 0.25rem;
-
}
-
-
.username {
-
font-size: 0.9rem;
-
color: var(--text-secondary);
-
font-weight: normal;
-
}
-
-
.user-meta {
-
font-size: 0.9rem;
-
color: var(--text-secondary);
-
}
-
-
.user-meta a {
-
color: var(--secondary-color);
-
text-decoration: none;
-
}
-
-
.user-meta a:hover {
-
text-decoration: underline;
-
}
-
-
.separator {
-
margin: 0 0.5rem;
-
}
-
-
.post-count {
-
font-weight: 600;
-
}
-
-
.user-recent h4 {
-
font-size: 0.95rem;
-
margin-bottom: 0.5rem;
-
color: var(--text-secondary);
-
}
-
-
.user-recent ul {
-
list-style: none;
-
padding-left: 0;
-
}
-
-
.user-recent li {
-
margin-bottom: 0.25rem;
-
font-size: 0.9rem;
-
}
-
-
/* Footer */
-
.site-footer {
-
max-width: var(--max-width);
-
margin: 3rem auto 2rem;
-
padding: 1rem 2rem;
-
text-align: center;
-
color: var(--text-secondary);
-
font-size: 0.85rem;
-
border-top: 1px solid var(--border-color);
-
}
-
-
.site-footer a {
-
color: var(--secondary-color);
-
text-decoration: none;
-
}
-
-
.site-footer a:hover {
-
text-decoration: underline;
-
}
-
-
/* Responsive */
-
@media (max-width: 768px) {
-
.site-title {
-
font-size: 1.3rem;
-
}
-
-
.header-content {
-
flex-direction: column;
-
gap: 0.75rem;
-
align-items: flex-start;
-
}
-
-
.site-nav {
-
gap: 1rem;
-
}
-
-
.main-content {
-
padding: 0 1rem;
-
}
-
-
.thread-entry.reply {
-
margin-left: calc(var(--thread-indent) / 2);
-
}
-
-
.user-header {
-
flex-direction: column;
-
}
-
}
···
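The per-author `::before` colors above are hard-coded per username, so any new contributor silently falls back to the default dot. One alternative (a sketch, not what this stylesheet does) is to emit the rules at site-generation time from a stable hash of each username:

```python
import hashlib

PALETTE = [
    "#e74c3c", "#3498db", "#2ecc71", "#f39c12", "#9b59b6",
    "#1abc9c", "#34495e", "#e67e22", "#8e44ad", "#27ae60",
]

def author_color_css(usernames: list[str]) -> str:
    """Emit one .author-<name>::before rule per user, colored by a stable hash."""
    rules = []
    for name in usernames:
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        color = PALETTE[digest[0] % len(PALETTE)]
        rules.append(f".author-{name}::before {{ background-color: {color}; }}")
    return "\n".join(rules)
```

Hashing keeps a user's color stable across rebuilds without maintaining the class list by hand.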
-141
src/thicket/templates/timeline.html
···
-
{% extends "base.html" %}
-
-
{% block page_title %}Timeline - {{ title }}{% endblock %}
-
-
{% block content %}
-
{% set seen_users = [] %}
-
<div class="page-content">
-
<h2>Recent Posts & Conversations</h2>
-
-
<section class="unified-timeline">
-
{% for item in timeline_items %}
-
{% if item.type == "post" %}
-
<!-- Individual Post -->
-
<article class="timeline-entry {% if item.content.references %}with-references{% endif %}">
-
<div class="timeline-meta">
-
<time datetime="{{ item.content.entry.updated or item.content.entry.published }}" class="timeline-time">
-
{{ (item.content.entry.updated or item.content.entry.published).strftime('%Y-%m-%d %H:%M') }}
-
</time>
-
{% set homepage = get_user_homepage(item.content.username) %}
-
{% if item.content.username not in seen_users %}
-
<a id="{{ item.content.username }}" class="user-anchor"></a>
-
{% set _ = seen_users.append(item.content.username) %}
-
{% endif %}
-
<a id="post-{{ loop.index0 }}-{{ safe_anchor_id(item.content.entry.id) }}" class="post-anchor"></a>
-
{% if homepage %}
-
<a href="{{ homepage }}" target="_blank" class="timeline-author">{{ item.content.display_name }}</a>
-
{% else %}
-
<span class="timeline-author">{{ item.content.display_name }}</span>
-
{% endif %}
-
{% if item.content.references %}
-
<div class="reference-badges">
-
{% for ref in item.content.references %}
-
{% if ref.type == 'outbound' %}
-
<span class="ref-badge ref-outbound" title="References {{ ref.target_username or 'external post' }}">
-
→ {{ ref.target_username or 'ext' }}
-
</span>
-
{% elif ref.type == 'inbound' %}
-
<span class="ref-badge ref-inbound" title="Referenced by {{ ref.source_username or 'external post' }}">
-
โ† {{ ref.source_username or 'ext' }}
-
</span>
-
{% endif %}
-
{% endfor %}
-
</div>
-
{% endif %}
-
</div>
-
<div class="timeline-content">
-
<strong class="timeline-title">
-
<a href="{{ item.content.entry.link }}" target="_blank">{{ item.content.entry.title }}</a>
-
</strong>
-
{% if item.content.entry.summary %}
-
<span class="timeline-summary">โ€” {{ clean_html_summary(item.content.entry.summary, 250) }}</span>
-
{% endif %}
-
{% if item.content.shared_references %}
-
<span class="inline-shared-refs">
-
{% for ref in item.content.shared_references[:3] %}
-
{% if ref.target_username %}
-
<a href="#{{ ref.target_username }}" class="shared-ref-link" title="Referenced by {{ ref.count }} entries">@{{ ref.target_username }}</a>{% if not loop.last %}, {% endif %}
-
{% endif %}
-
{% endfor %}
-
{% if item.content.shared_references|length > 3 %}
-
<span class="shared-ref-more">+{{ item.content.shared_references|length - 3 }} more</span>
-
{% endif %}
-
</span>
-
{% endif %}
-
{% if item.content.cross_thread_links %}
-
<div class="cross-thread-links">
-
<span class="cross-thread-indicator">๐Ÿ”— Also appears: </span>
-
{% for link in item.content.cross_thread_links %}
-
<a href="#{{ link.anchor_id }}" class="cross-thread-link" title="{{ link.title }}">{{ link.context }}</a>{% if not loop.last %}, {% endif %}
-
{% endfor %}
-
</div>
-
{% endif %}
-
</div>
-
</article>
-
-
{% elif item.type == "thread" %}
-
<!-- Conversation Thread -->
-
{% set outer_loop_index = loop.index0 %}
-
{% for thread_item in item.content %}
-
<article class="timeline-entry conversation-post level-{{ thread_item.thread_level }}">
-
<div class="timeline-meta">
-
<time datetime="{{ thread_item.entry.updated or thread_item.entry.published }}" class="timeline-time">
-
{{ (thread_item.entry.updated or thread_item.entry.published).strftime('%Y-%m-%d %H:%M') }}
-
</time>
-
{% set homepage = get_user_homepage(thread_item.username) %}
-
{% if thread_item.username not in seen_users %}
-
<a id="{{ thread_item.username }}" class="user-anchor"></a>
-
{% set _ = seen_users.append(thread_item.username) %}
-
{% endif %}
-
<a id="post-{{ outer_loop_index }}-{{ loop.index0 }}-{{ safe_anchor_id(thread_item.entry.id) }}" class="post-anchor"></a>
-
{% if homepage %}
-
<a href="{{ homepage }}" target="_blank" class="timeline-author author-{{ thread_item.username }}">{{ thread_item.display_name }}</a>
-
{% else %}
-
<span class="timeline-author author-{{ thread_item.username }}">{{ thread_item.display_name }}</span>
-
{% endif %}
-
{% if thread_item.references_to or thread_item.referenced_by %}
-
<span class="reference-indicators">
-
{% if thread_item.references_to %}
-
<span class="ref-out" title="References other posts">โ†’</span>
-
{% endif %}
-
{% if thread_item.referenced_by %}
-
<span class="ref-in" title="Referenced by other posts">โ†</span>
-
{% endif %}
-
</span>
-
{% endif %}
-
</div>
-
<div class="timeline-content">
-
<strong class="timeline-title">
-
<a href="{{ thread_item.entry.link }}" target="_blank">{{ thread_item.entry.title }}</a>
-
</strong>
-
{% if thread_item.entry.summary %}
-
<span class="timeline-summary">โ€” {{ clean_html_summary(thread_item.entry.summary, 300) }}</span>
-
{% endif %}
-
{% if thread_item.shared_references %}
-
<span class="inline-shared-refs">
-
{% for ref in thread_item.shared_references[:3] %}
-
{% if ref.target_username %}
-
<a href="#{{ ref.target_username }}" class="shared-ref-link" title="Referenced by {{ ref.count }} entries">@{{ ref.target_username }}</a>{% if not loop.last %}, {% endif %}
-
{% endif %}
-
{% endfor %}
-
{% if thread_item.shared_references|length > 3 %}
-
<span class="shared-ref-more">+{{ thread_item.shared_references|length - 3 }} more</span>
-
{% endif %}
-
</span>
-
{% endif %}
-
{% if thread_item.cross_thread_links %}
-
<div class="cross-thread-links">
-
<span class="cross-thread-indicator">๐Ÿ”— Also appears: </span>
-
{% for link in thread_item.cross_thread_links %}
-
<a href="#{{ link.anchor_id }}" class="cross-thread-link" title="{{ link.title }}">{{ link.context }}</a>{% if not loop.last %}, {% endif %}
-
{% endfor %}
-
</div>
-
{% endif %}
-
</div>
-
</article>
-
{% endfor %}
-
{% endif %}
-
{% endfor %}
-
</section>
-
</div>
-
{% endblock %}
···
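`timeline.html` dispatches on `item.type`: a `"post"` item wraps a single `content` mapping, while a `"thread"` item carries a list of entries whose `thread_level` drives the indent classes. A sketch of builders for that shape, with keys inferred from the template rather than taken from generator code:

```python
from thicket.models import AtomEntry

def post_item(username: str, display_name: str, entry: AtomEntry) -> dict:
    """Shape of a 'post' timeline item as the template reads it."""
    return {
        "type": "post",
        "content": {
            "username": username,
            "display_name": display_name,
            "entry": entry,
            "references": [],          # badge dicts, e.g. {'type': 'outbound', 'target_username': ...}
            "shared_references": [],   # inline @user links: {'target_username': ..., 'count': ...}
            "cross_thread_links": [],  # {'anchor_id': ..., 'title': ..., 'context': ...}
        },
    }

def thread_item(entries: list[dict]) -> dict:
    """Shape of a 'thread' item; each dict carries thread_level, username,
    display_name, entry, plus the optional reference lists."""
    return {"type": "thread", "content": entries}
```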
-169
src/thicket/templates/user_detail.html
···
-
{% extends "base.html" %}
-
-
{% block page_title %}{{ title }} - Thicket{% endblock %}
-
-
{% block content %}
-
<div class="container mx-auto px-4 py-8">
-
<div class="max-w-4xl mx-auto">
-
<!-- User Header -->
-
<div class="bg-white rounded-lg shadow-md p-6 mb-6">
-
<div class="flex items-center space-x-4">
-
{% if user_config and user_config.icon %}
-
<img src="{{ user_config.icon }}" alt="{{ title }}" class="w-16 h-16 rounded-full">
-
{% else %}
-
<div class="w-16 h-16 rounded-full bg-blue-500 flex items-center justify-center text-white text-xl font-bold">
-
{{ user_metadata.username[0].upper() }}
-
</div>
-
{% endif %}
-
-
<div>
-
<h1 class="text-2xl font-bold text-gray-900">{{ title }}</h1>
-
<p class="text-gray-600">@{{ user_metadata.username }}</p>
-
{% if user_config and user_config.email %}
-
<p class="text-sm text-gray-500">{{ user_config.email }}</p>
-
{% endif %}
-
</div>
-
</div>
-
-
{% if user_config and user_config.homepage %}
-
<div class="mt-4">
-
<a href="{{ user_config.homepage }}" class="text-blue-600 hover:text-blue-800" target="_blank">
-
🏠 Homepage
-
</a>
-
</div>
-
{% endif %}
-
-
<div class="mt-4 grid grid-cols-2 md:grid-cols-4 gap-4">
-
<div class="text-center">
-
<div class="text-2xl font-bold text-blue-600">{{ user_metadata.entry_count }}</div>
-
<div class="text-sm text-gray-500">Entries</div>
-
</div>
-
-
{% if user_config %}
-
<div class="text-center">
-
<div class="text-2xl font-bold text-green-600">{{ user_config.feeds|length }}</div>
-
<div class="text-sm text-gray-500">Feeds</div>
-
</div>
-
{% endif %}
-
-
<div class="text-center">
-
<div class="text-2xl font-bold text-purple-600">{{ user_links|length }}</div>
-
<div class="text-sm text-gray-500">Link Groups</div>
-
</div>
-
-
<div class="text-center">
-
<div class="text-sm text-gray-500">Member since</div>
-
<div class="text-sm font-medium">{{ user_metadata.created.strftime('%Y-%m-%d') if user_metadata.created else 'Unknown' }}</div>
-
</div>
-
</div>
-
</div>
-
-
<!-- Feeds -->
-
{% if user_config and user_config.feeds %}
-
<div class="bg-white rounded-lg shadow-md p-6 mb-6">
-
<h2 class="text-xl font-semibold mb-4">Feeds</h2>
-
<div class="space-y-2">
-
{% for feed in user_config.feeds %}
-
<div class="flex items-center space-x-2">
-
<span class="text-green-500">๐Ÿ“ก</span>
-
<a href="{{ feed }}" class="text-blue-600 hover:text-blue-800" target="_blank">{{ feed }}</a>
-
</div>
-
{% endfor %}
-
</div>
-
</div>
-
{% endif %}
-
-
<!-- Recent Entries -->
-
<div class="bg-white rounded-lg shadow-md p-6 mb-6">
-
<h2 class="text-xl font-semibold mb-4">Recent Entries</h2>
-
-
{% if entries %}
-
<div class="space-y-4">
-
{% for entry in entries[:10] %}
-
<div class="border-l-4 border-blue-500 pl-4 py-2">
-
<h3 class="font-semibold text-lg">
-
<a href="{{ entry.link }}" class="text-blue-600 hover:text-blue-800" target="_blank">
-
{{ entry.title }}
-
</a>
-
</h3>
-
-
<div class="text-sm text-gray-500 mb-2">
-
{% if entry.published %}
-
Published: {{ entry.published.strftime('%Y-%m-%d %H:%M') }}
-
{% endif %}
-
{% if entry.updated and entry.updated != entry.published %}
-
• Updated: {{ entry.updated.strftime('%Y-%m-%d %H:%M') }}
-
{% endif %}
-
</div>
-
-
{% if entry.summary %}
-
<div class="text-gray-700 mb-2">
-
{{ entry.summary|truncate(200) }}
-
</div>
-
{% endif %}
-
-
{% if entry.categories %}
-
<div class="flex flex-wrap gap-1">
-
{% for category in entry.categories %}
-
<span class="px-2 py-1 bg-blue-100 text-blue-800 text-xs rounded">{{ category }}</span>
-
{% endfor %}
-
</div>
-
{% endif %}
-
</div>
-
{% endfor %}
-
</div>
-
-
{% if entries|length > 10 %}
-
<div class="mt-4 text-center">
-
<p class="text-gray-500">Showing 10 of {{ entries|length }} entries</p>
-
</div>
-
{% endif %}
-
-
{% else %}
-
<p class="text-gray-500">No entries found.</p>
-
{% endif %}
-
</div>
-
-
<!-- Links Summary -->
-
{% if user_links %}
-
<div class="bg-white rounded-lg shadow-md p-6">
-
<h2 class="text-xl font-semibold mb-4">Link Activity</h2>
-
-
<div class="space-y-3">
-
{% for link_group in user_links[:5] %}
-
<div class="border-l-4 border-green-500 pl-4">
-
<h3 class="font-medium">{{ link_group.title }}</h3>
-
<div class="text-sm text-gray-500 mb-2">
-
{{ link_group.links|length }} link(s) found
-
</div>
-
-
<div class="space-y-1">
-
{% for link in link_group.links[:3] %}
-
<div class="text-sm">
-
<a href="{{ link.url }}" class="text-blue-600 hover:text-blue-800" target="_blank">
-
{{ link.text or link.url }}
-
</a>
-
<span class="text-gray-400 ml-2">({{ link.type }})</span>
-
</div>
-
{% endfor %}
-
-
{% if link_group.links|length > 3 %}
-
<div class="text-sm text-gray-500">
-
... and {{ link_group.links|length - 3 }} more
-
</div>
-
{% endif %}
-
</div>
-
</div>
-
{% endfor %}
-
</div>
-
-
{% if user_links|length > 5 %}
-
<div class="mt-4 text-center">
-
<p class="text-gray-500">Showing 5 of {{ user_links|length }} entries with links</p>
-
</div>
-
{% endif %}
-
</div>
-
{% endif %}
-
</div>
-
</div>
-
{% endblock %}
···
-57
src/thicket/templates/users.html
···
-
{% extends "base.html" %}
-
-
{% block page_title %}Users - {{ title }}{% endblock %}
-
-
{% block content %}
-
<div class="page-content">
-
<h2>Users</h2>
-
<p class="page-description">All users contributing to this thicket, ordered by post count.</p>
-
-
{% for user_info in users %}
-
<article class="user-card">
-
<div class="user-header">
-
{% if user_info.metadata.icon and user_info.metadata.icon != "None" %}
-
<img src="{{ user_info.metadata.icon }}" alt="{{ user_info.metadata.username }}" class="user-icon">
-
{% endif %}
-
<div class="user-info">
-
<h3>
-
{% if user_info.metadata.display_name %}
-
{{ user_info.metadata.display_name }}
-
<span class="username">({{ user_info.metadata.username }})</span>
-
{% else %}
-
{{ user_info.metadata.username }}
-
{% endif %}
-
</h3>
-
<div class="user-meta">
-
{% if user_info.metadata.homepage %}
-
<a href="{{ user_info.metadata.homepage }}" target="_blank">{{ user_info.metadata.homepage }}</a>
-
{% endif %}
-
{% if user_info.metadata.email %}
-
<span class="separator">โ€ข</span>
-
<a href="mailto:{{ user_info.metadata.email }}">{{ user_info.metadata.email }}</a>
-
{% endif %}
-
<span class="separator">โ€ข</span>
-
<span class="post-count">{{ user_info.metadata.entry_count }} posts</span>
-
</div>
-
</div>
-
</div>
-
-
{% if user_info.recent_entries %}
-
<div class="user-recent">
-
<h4>Recent posts:</h4>
-
<ul>
-
{% for display_name, entry in user_info.recent_entries %}
-
<li>
-
<a href="{{ entry.link }}" target="_blank">{{ entry.title }}</a>
-
<time datetime="{{ entry.updated or entry.published }}">
-
({{ (entry.updated or entry.published).strftime('%Y-%m-%d') }})
-
</time>
-
</li>
-
{% endfor %}
-
</ul>
-
</div>
-
{% endif %}
-
</article>
-
{% endfor %}
-
</div>
-
{% endblock %}
···
-230
src/thicket/thicket.py
···
-
"""Main Thicket library class providing unified API."""
-
-
import asyncio
-
from datetime import datetime
-
from pathlib import Path
-
from typing import Optional, Union
-
-
from pydantic import HttpUrl
-
-
from .core.feed_parser import FeedParser
-
from .core.git_store import GitStore
-
from .models import AtomEntry, ThicketConfig, UserConfig
-
from .subsystems.feeds import FeedManager
-
from .subsystems.links import LinkProcessor
-
from .subsystems.repository import RepositoryManager
-
from .subsystems.site import SiteGenerator
-
from .subsystems.users import UserManager
-
-
-
class Thicket:
-
"""
-
Main Thicket class providing unified API for feed management.
-
-
This class serves as the primary interface for all Thicket operations,
-
consolidating configuration, repository management, feed processing,
-
user management, link processing, and site generation.
-
"""
-
-
def __init__(self, config: Union[ThicketConfig, Path, str]):
-
"""
-
Initialize Thicket with configuration.
-
-
Args:
-
config: Either a ThicketConfig object or a path to a config file
-
"""
-
if isinstance(config, (Path, str)):
-
self.config = ThicketConfig.from_file(Path(config))
-
else:
-
self.config = config
-
-
# Initialize subsystems
-
self._init_subsystems()
-
-
def _init_subsystems(self):
-
"""Initialize all subsystems."""
-
# Core components
-
self.git_store = GitStore(self.config.git_store)
-
self.feed_parser = FeedParser()
-
-
# Subsystem managers
-
self.repository = RepositoryManager(self.git_store, self.config)
-
self.users = UserManager(self.git_store, self.config)
-
self.feeds = FeedManager(self.git_store, self.feed_parser, self.config)
-
self.links = LinkProcessor(self.git_store, self.config)
-
self.site = SiteGenerator(self.git_store, self.config)
-
-
@classmethod
-
def create(cls, git_store: Path, cache_dir: Path, users: Optional[list[UserConfig]] = None) -> 'Thicket':
-
"""
-
Create a new Thicket instance with minimal configuration.
-
-
Args:
-
git_store: Path to git repository
-
cache_dir: Path to cache directory
-
users: Optional list of user configurations
-
-
Returns:
-
Configured Thicket instance
-
"""
-
config = ThicketConfig(
-
git_store=git_store,
-
cache_dir=cache_dir,
-
users=users or []
-
)
-
return cls(config)
-
-
@classmethod
-
def from_config_file(cls, config_path: Path) -> 'Thicket':
-
"""Load Thicket from configuration file."""
-
return cls(config_path)
-
-
# User Management API
-
def add_user(self, username: str, feeds: list[str], **kwargs) -> UserConfig:
-
"""Add a new user with feeds."""
-
return self.users.add_user(username, feeds, **kwargs)
-
-
def get_user(self, username: str) -> Optional[UserConfig]:
-
"""Get user configuration."""
-
return self.users.get_user(username)
-
-
def list_users(self) -> list[UserConfig]:
-
"""List all configured users."""
-
return self.users.list_users()
-
-
def update_user(self, username: str, **kwargs) -> bool:
-
"""Update user configuration."""
-
return self.users.update_user(username, **kwargs)
-
-
def remove_user(self, username: str) -> bool:
-
"""Remove a user and their data."""
-
return self.users.remove_user(username)
-
-
# Feed Management API
-
async def sync_feeds(self, username: Optional[str] = None, progress_callback=None) -> dict:
-
"""Sync feeds for user(s)."""
-
return await self.feeds.sync_feeds(username, progress_callback)
-
-
async def sync_user_feeds(self, username: str, progress_callback=None) -> dict:
-
"""Sync feeds for a specific user."""
-
return await self.feeds.sync_user_feeds(username, progress_callback)
-
-
def get_entries(self, username: str, limit: Optional[int] = None) -> list[AtomEntry]:
-
"""Get entries for a user."""
-
return self.feeds.get_entries(username, limit)
-
-
def get_entry(self, username: str, entry_id: str) -> Optional[AtomEntry]:
-
"""Get a specific entry."""
-
return self.feeds.get_entry(username, entry_id)
-
-
def search_entries(self, query: str, username: Optional[str] = None, limit: Optional[int] = None) -> list[tuple[str, AtomEntry]]:
-
"""Search entries across users."""
-
return self.feeds.search_entries(query, username, limit)
-
-
# Repository Management API
-
def init_repository(self) -> bool:
-
"""Initialize the git repository."""
-
return self.repository.init_repository()
-
-
def commit_changes(self, message: str) -> bool:
-
"""Commit all pending changes."""
-
return self.repository.commit_changes(message)
-
-
def get_status(self) -> dict:
-
"""Get repository status and statistics."""
-
return self.repository.get_status()
-
-
def backup_repository(self, backup_path: Path) -> bool:
-
"""Create a backup of the repository."""
-
return self.repository.backup_repository(backup_path)
-
-
# Link Processing API
-
def process_links(self, username: Optional[str] = None) -> dict:
-
"""Process and extract links from entries."""
-
return self.links.process_links(username)
-
-
def get_links(self, username: Optional[str] = None) -> dict:
-
"""Get processed links."""
-
return self.links.get_links(username)
-
-
def find_references(self, url: str) -> list[tuple[str, AtomEntry]]:
-
"""Find entries that reference a URL."""
-
return self.links.find_references(url)
-
-
# Site Generation API
-
def generate_site(self, output_dir: Path, template_dir: Optional[Path] = None) -> bool:
-
"""Generate static site."""
-
return self.site.generate_site(output_dir, template_dir)
-
-
def generate_timeline(self, output_path: Path, limit: Optional[int] = None) -> bool:
-
"""Generate timeline HTML."""
-
return self.site.generate_timeline(output_path, limit)
-
-
def generate_user_pages(self, output_dir: Path) -> bool:
-
"""Generate individual user pages."""
-
return self.site.generate_user_pages(output_dir)
-
-
# Utility Methods
-
def get_stats(self) -> dict:
-
"""Get comprehensive statistics."""
-
base_stats = self.repository.get_status()
-
feed_stats = self.feeds.get_stats()
-
link_stats = self.links.get_stats()
-
-
return {
-
**base_stats,
-
**feed_stats,
-
**link_stats,
-
'config': {
-
'git_store': str(self.config.git_store),
-
'cache_dir': str(self.config.cache_dir),
-
'total_users_configured': len(self.config.users),
-
}
-
}
-
-
async def full_sync(self, progress_callback=None) -> dict:
-
"""Perform a complete sync: feeds -> links -> commit."""
-
results = {}
-
-
# Sync feeds
-
results['feeds'] = await self.sync_feeds(progress_callback=progress_callback)
-
-
# Process links
-
results['links'] = self.process_links()
-
-
# Commit changes
-
message = f"Sync completed at {datetime.now().isoformat()}"
-
results['committed'] = self.commit_changes(message)
-
-
return results
-
-
def validate_config(self) -> list[str]:
-
"""Validate configuration and return any errors."""
-
errors = []
-
-
# Check paths exist
-
if not self.config.git_store.parent.exists():
-
errors.append(f"Git store parent directory does not exist: {self.config.git_store.parent}")
-
-
if not self.config.cache_dir.parent.exists():
-
errors.append(f"Cache directory parent does not exist: {self.config.cache_dir.parent}")
-
-
# Validate user configs
-
for user in self.config.users:
-
if not user.feeds:
-
errors.append(f"User {user.username} has no feeds configured")
-
-
for feed_url in user.feeds:
-
# Basic URL validation is handled by pydantic
-
pass
-
-
return errors
-
-
def __enter__(self):
-
"""Context manager entry."""
-
return self
-
-
def __exit__(self, exc_type, exc_val, exc_tb):
-
"""Context manager exit."""
-
# Could add cleanup logic here if needed
-
pass
···
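For reference, a minimal end-to-end use of the facade as the removed module defined it; the import path and feed URL are assumptions:

```python
import asyncio
from pathlib import Path

from thicket.thicket import Thicket  # import path assumed from the src/ layout

def main() -> None:
    thicket = Thicket.create(git_store=Path("store"), cache_dir=Path("cache"))
    thicket.init_repository()
    thicket.add_user("alice", feeds=["https://example.com/feed.xml"])  # placeholder feed

    # feeds -> links -> commit, as full_sync chains them.
    results = asyncio.run(thicket.full_sync())
    print(results)

if __name__ == "__main__":
    main()
```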
tests/__init__.py

This is a binary file and will not be displayed.

+84
tests/conftest.py
···
···
+
"""Test configuration and fixtures for thicket."""
+
+
import tempfile
+
from pathlib import Path
+
+
import pytest
+
+
from thicket.models import ThicketConfig, UserConfig
+
+
+
@pytest.fixture
+
def temp_dir():
+
"""Create a temporary directory for tests."""
+
with tempfile.TemporaryDirectory() as tmp_dir:
+
yield Path(tmp_dir)
+
+
+
@pytest.fixture
+
def sample_config(temp_dir):
+
"""Create a sample configuration for testing."""
+
git_store = temp_dir / "git_store"
+
cache_dir = temp_dir / "cache"
+
+
return ThicketConfig(
+
git_store=git_store,
+
cache_dir=cache_dir,
+
users=[
+
UserConfig(
+
username="testuser",
+
feeds=["https://example.com/feed.xml"],
+
email="test@example.com",
+
display_name="Test User",
+
)
+
],
+
)
+
+
+
@pytest.fixture
+
def sample_atom_feed():
+
"""Sample Atom feed XML for testing."""
+
return """<?xml version="1.0" encoding="utf-8"?>
+
<feed xmlns="http://www.w3.org/2005/Atom">
+
<title>Test Feed</title>
+
<link href="https://example.com/"/>
+
<updated>2025-01-01T00:00:00Z</updated>
+
<author>
+
<name>Test Author</name>
+
<email>author@example.com</email>
+
</author>
+
<id>https://example.com/</id>
+
+
<entry>
+
<title>Test Entry</title>
+
<link href="https://example.com/entry/1"/>
+
<id>https://example.com/entry/1</id>
+
<updated>2025-01-01T00:00:00Z</updated>
+
<summary>This is a test entry.</summary>
+
<content type="html">
+
<![CDATA[<p>This is the content of the test entry.</p>]]>
+
</content>
+
</entry>
+
</feed>"""
+
+
+
@pytest.fixture
+
def sample_rss_feed():
+
"""Sample RSS feed XML for testing."""
+
return """<?xml version="1.0" encoding="UTF-8"?>
+
<rss version="2.0">
+
<channel>
+
<title>Test RSS Feed</title>
+
<link>https://example.com/</link>
+
<description>Test RSS feed for testing</description>
+
<managingEditor>editor@example.com</managingEditor>
+
+
<item>
+
<title>Test RSS Entry</title>
+
<link>https://example.com/rss/entry/1</link>
+
<description>This is a test RSS entry.</description>
+
<pubDate>Mon, 01 Jan 2025 00:00:00 GMT</pubDate>
+
<guid>https://example.com/rss/entry/1</guid>
+
</item>
+
</channel>
+
</rss>"""
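pytest injects these fixtures by name; a hypothetical test consuming them, mirroring the assertions in `tests/test_git_store.py` later in this diff:

```python
from thicket.core.git_store import GitStore

def test_store_initializes_under_temp_dir(temp_dir, sample_config):
    # temp_dir and sample_config resolve automatically from conftest.py.
    GitStore(sample_config.git_store)
    assert sample_config.git_store.exists()
    assert (sample_config.git_store / ".git").exists()
```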
+131
tests/test_feed_parser.py
···
···
+
"""Tests for feed parser functionality."""
+
+
from pydantic import HttpUrl
+
+
from thicket.core.feed_parser import FeedParser
+
from thicket.models import AtomEntry, FeedMetadata
+
+
+
class TestFeedParser:
+
"""Test the FeedParser class."""
+
+
def test_init(self):
+
"""Test parser initialization."""
+
parser = FeedParser()
+
assert parser.user_agent == "thicket/0.1.0"
+
assert "a" in parser.allowed_tags
+
assert "href" in parser.allowed_attributes["a"]
+
+
def test_parse_atom_feed(self, sample_atom_feed):
+
"""Test parsing an Atom feed."""
+
parser = FeedParser()
+
metadata, entries = parser.parse_feed(sample_atom_feed)
+
+
# Check metadata
+
assert isinstance(metadata, FeedMetadata)
+
assert metadata.title == "Test Feed"
+
assert metadata.author_name == "Test Author"
+
assert metadata.author_email == "author@example.com"
+
assert metadata.link == HttpUrl("https://example.com/")
+
+
# Check entries
+
assert len(entries) == 1
+
entry = entries[0]
+
assert isinstance(entry, AtomEntry)
+
assert entry.title == "Test Entry"
+
assert entry.id == "https://example.com/entry/1"
+
assert entry.link == HttpUrl("https://example.com/entry/1")
+
assert entry.summary == "This is a test entry."
+
assert "<p>This is the content of the test entry.</p>" in entry.content
+
+
def test_parse_rss_feed(self, sample_rss_feed):
+
"""Test parsing an RSS feed."""
+
parser = FeedParser()
+
metadata, entries = parser.parse_feed(sample_rss_feed)
+
+
# Check metadata
+
assert isinstance(metadata, FeedMetadata)
+
assert metadata.title == "Test RSS Feed"
+
assert metadata.link == HttpUrl("https://example.com/")
+
assert metadata.author_email == "editor@example.com"
+
+
# Check entries
+
assert len(entries) == 1
+
entry = entries[0]
+
assert isinstance(entry, AtomEntry)
+
assert entry.title == "Test RSS Entry"
+
assert entry.id == "https://example.com/rss/entry/1"
+
assert entry.summary == "This is a test RSS entry."
+
+
def test_sanitize_entry_id(self):
+
"""Test entry ID sanitization."""
+
parser = FeedParser()
+
+
# Test URL ID
+
url_id = "https://example.com/posts/2025/01/test-post"
+
sanitized = parser.sanitize_entry_id(url_id)
+
assert sanitized == "posts_2025_01_test-post"
+
+
# Test problematic characters
+
bad_id = "test/with\\bad:chars|and<more>"
+
sanitized = parser.sanitize_entry_id(bad_id)
+
assert sanitized == "test_with_bad_chars_and_more_"
+
+
# Test empty ID
+
empty_id = ""
+
sanitized = parser.sanitize_entry_id(empty_id)
+
assert sanitized == "entry"
+
+
# Test very long ID
+
long_id = "a" * 300
+
sanitized = parser.sanitize_entry_id(long_id)
+
assert len(sanitized) == 200
+
+
def test_sanitize_html(self):
+
"""Test HTML sanitization."""
+
parser = FeedParser()
+
+
# Test allowed tags
+
safe_html = "<p>This is <strong>safe</strong> HTML</p>"
+
sanitized = parser._sanitize_html(safe_html)
+
assert sanitized == safe_html
+
+
# Test dangerous tags
+
dangerous_html = "<script>alert('xss')</script><p>Safe content</p>"
+
sanitized = parser._sanitize_html(dangerous_html)
+
assert "<script>" not in sanitized
+
assert "<p>Safe content</p>" in sanitized
+
+
# Test attributes
+
html_with_attrs = '<a href="https://example.com" onclick="alert()">Link</a>'
+
sanitized = parser._sanitize_html(html_with_attrs)
+
assert 'href="https://example.com"' in sanitized
+
assert 'onclick' not in sanitized
+
+
def test_extract_feed_metadata(self):
+
"""Test feed metadata extraction."""
+
parser = FeedParser()
+
+
# Test with feedparser parsed data
+
import feedparser
+
parsed = feedparser.parse("""<?xml version="1.0" encoding="utf-8"?>
+
<feed xmlns="http://www.w3.org/2005/Atom">
+
<title>Test Feed</title>
+
<link href="https://example.com/"/>
+
<author>
+
<name>Test Author</name>
+
<email>author@example.com</email>
+
<uri>https://example.com/about</uri>
+
</author>
+
<logo>https://example.com/logo.png</logo>
+
<icon>https://example.com/icon.png</icon>
+
</feed>""")
+
+
metadata = parser._extract_feed_metadata(parsed.feed)
+
assert metadata.title == "Test Feed"
+
assert metadata.author_name == "Test Author"
+
assert metadata.author_email == "author@example.com"
+
assert metadata.author_uri == HttpUrl("https://example.com/about")
+
assert metadata.link == HttpUrl("https://example.com/")
+
assert metadata.logo == HttpUrl("https://example.com/logo.png")
+
assert metadata.icon == HttpUrl("https://example.com/icon.png")
+275
tests/test_git_store.py
···
···
+
"""Tests for Git store functionality."""
+
+
import json
+
from datetime import datetime
+
+
from pydantic import HttpUrl
+
+
from thicket.core.git_store import GitStore
+
from thicket.models import AtomEntry, DuplicateMap, UserMetadata
+
+
+
class TestGitStore:
+
"""Test the GitStore class."""
+
+
def test_init_new_repo(self, temp_dir):
+
"""Test initializing a new Git repository."""
+
repo_path = temp_dir / "test_repo"
+
store = GitStore(repo_path)
+
+
assert store.repo_path == repo_path
+
assert store.repo is not None
+
assert repo_path.exists()
+
assert (repo_path / ".git").exists()
+
assert (repo_path / "index.json").exists()
+
assert (repo_path / "duplicates.json").exists()
+
+
def test_init_existing_repo(self, temp_dir):
+
"""Test initializing with existing repository."""
+
repo_path = temp_dir / "test_repo"
+
+
# Create first store
+
store1 = GitStore(repo_path)
+
store1.add_user("testuser", display_name="Test User")
+
+
# Create second store pointing to same repo
+
store2 = GitStore(repo_path)
+
user = store2.get_user("testuser")
+
+
assert user is not None
+
assert user.username == "testuser"
+
assert user.display_name == "Test User"
+
+
def test_add_user(self, temp_dir):
+
"""Test adding a user to the Git store."""
+
store = GitStore(temp_dir / "test_repo")
+
+
user = store.add_user(
+
+            username="testuser",
+            display_name="Test User",
+            email="test@example.com",
+            homepage="https://example.com",
+            icon="https://example.com/icon.png",
+            feeds=["https://example.com/feed.xml"],
+        )
+
+        assert isinstance(user, UserMetadata)
+        assert user.username == "testuser"
+        assert user.display_name == "Test User"
+        assert user.email == "test@example.com"
+        assert user.homepage == "https://example.com"
+        assert user.icon == "https://example.com/icon.png"
+        assert user.feeds == ["https://example.com/feed.xml"]
+        assert user.directory == "testuser"
+
+        # Check that user directory was created
+        user_dir = store.repo_path / "testuser"
+        assert user_dir.exists()
+
+        # Check user exists in index
+        stored_user = store.get_user("testuser")
+        assert stored_user is not None
+        assert stored_user.username == "testuser"
+        assert stored_user.display_name == "Test User"
+
+    def test_get_user(self, temp_dir):
+        """Test getting user metadata."""
+        store = GitStore(temp_dir / "test_repo")
+
+        # Add user
+        store.add_user("testuser", display_name="Test User")
+
+        # Get user
+        user = store.get_user("testuser")
+        assert user is not None
+        assert user.username == "testuser"
+        assert user.display_name == "Test User"
+
+        # Try to get non-existent user
+        non_user = store.get_user("nonexistent")
+        assert non_user is None
+
+    def test_store_entry(self, temp_dir):
+        """Test storing an entry."""
+        store = GitStore(temp_dir / "test_repo")
+
+        # Add user first
+        store.add_user("testuser")
+
+        # Create test entry
+        entry = AtomEntry(
+            id="https://example.com/entry/1",
+            title="Test Entry",
+            link=HttpUrl("https://example.com/entry/1"),
+            updated=datetime.now(),
+            summary="Test entry summary",
+            content="<p>Test content</p>",
+        )
+
+        # Store entry
+        result = store.store_entry("testuser", entry)
+        assert result is True
+
+        # Check that entry file was created
+        user_dir = store.repo_path / "testuser"
+        entry_files = list(user_dir.glob("*.json"))
+        entry_files = [f for f in entry_files if f.name != "metadata.json"]
+        assert len(entry_files) == 1
+
+        # Check entry content
+        with open(entry_files[0]) as f:
+            stored_entry = json.load(f)
+        assert stored_entry["title"] == "Test Entry"
+        assert stored_entry["id"] == "https://example.com/entry/1"
+
+    def test_get_entry(self, temp_dir):
+        """Test retrieving an entry."""
+        store = GitStore(temp_dir / "test_repo")
+
+        # Add user and entry
+        store.add_user("testuser")
+        entry = AtomEntry(
+            id="https://example.com/entry/1",
+            title="Test Entry",
+            link=HttpUrl("https://example.com/entry/1"),
+            updated=datetime.now(),
+        )
+        store.store_entry("testuser", entry)
+
+        # Get entry
+        retrieved = store.get_entry("testuser", "https://example.com/entry/1")
+        assert retrieved is not None
+        assert retrieved.title == "Test Entry"
+        assert retrieved.id == "https://example.com/entry/1"
+
+        # Try to get non-existent entry
+        non_entry = store.get_entry("testuser", "https://example.com/nonexistent")
+        assert non_entry is None
+
+    def test_list_entries(self, temp_dir):
+        """Test listing entries for a user."""
+        store = GitStore(temp_dir / "test_repo")
+
+        # Add user
+        store.add_user("testuser")
+
+        # Add multiple entries
+        for i in range(3):
+            entry = AtomEntry(
+                id=f"https://example.com/entry/{i}",
+                title=f"Test Entry {i}",
+                link=HttpUrl(f"https://example.com/entry/{i}"),
+                updated=datetime.now(),
+            )
+            store.store_entry("testuser", entry)
+
+        # List all entries
+        entries = store.list_entries("testuser")
+        assert len(entries) == 3
+
+        # List with limit
+        limited = store.list_entries("testuser", limit=2)
+        assert len(limited) == 2
+
+        # List for non-existent user
+        none_entries = store.list_entries("nonexistent")
+        assert len(none_entries) == 0
+
+    def test_duplicates(self, temp_dir):
+        """Test duplicate management."""
+        store = GitStore(temp_dir / "test_repo")
+
+        # Get initial duplicates (should be empty)
+        duplicates = store.get_duplicates()
+        assert isinstance(duplicates, DuplicateMap)
+        assert len(duplicates.duplicates) == 0
+
+        # Add duplicate
+        store.add_duplicate("https://example.com/dup", "https://example.com/canonical")
+
+        # Check duplicate was added
+        duplicates = store.get_duplicates()
+        assert len(duplicates.duplicates) == 1
+        assert duplicates.is_duplicate("https://example.com/dup")
+        assert duplicates.get_canonical("https://example.com/dup") == "https://example.com/canonical"
+
+        # Remove duplicate
+        result = store.remove_duplicate("https://example.com/dup")
+        assert result is True
+
+        # Check duplicate was removed
+        duplicates = store.get_duplicates()
+        assert len(duplicates.duplicates) == 0
+        assert not duplicates.is_duplicate("https://example.com/dup")
+
+    def test_search_entries(self, temp_dir):
+        """Test searching entries."""
+        store = GitStore(temp_dir / "test_repo")
+
+        # Add user
+        store.add_user("testuser")
+
+        # Add entries with different content
+        entries_data = [
+            ("Test Python Programming", "Learning Python basics"),
+            ("JavaScript Tutorial", "Advanced JavaScript concepts"),
+            ("Python Web Development", "Building web apps with Python"),
+        ]
+
+        for title, summary in entries_data:
+            entry = AtomEntry(
+                id=f"https://example.com/entry/{title.lower().replace(' ', '-')}",
+                title=title,
+                link=HttpUrl(f"https://example.com/entry/{title.lower().replace(' ', '-')}"),
+                updated=datetime.now(),
+                summary=summary,
+            )
+            store.store_entry("testuser", entry)
+
+        # Search for Python entries
+        results = store.search_entries("Python")
+        assert len(results) == 2
+
+        # Search for specific user
+        results = store.search_entries("Python", username="testuser")
+        assert len(results) == 2
+
+        # Search with limit
+        results = store.search_entries("Python", limit=1)
+        assert len(results) == 1
+
+        # Search for non-existent term
+        results = store.search_entries("NonExistent")
+        assert len(results) == 0
+
+    def test_get_stats(self, temp_dir):
+        """Test getting repository statistics."""
+        store = GitStore(temp_dir / "test_repo")
+
+        # Get initial stats
+        stats = store.get_stats()
+        assert stats["total_users"] == 0
+        assert stats["total_entries"] == 0
+        assert stats["total_duplicates"] == 0
+
+        # Add user and entries
+        store.add_user("testuser")
+        for i in range(3):
+            entry = AtomEntry(
+                id=f"https://example.com/entry/{i}",
+                title=f"Test Entry {i}",
+                link=HttpUrl(f"https://example.com/entry/{i}"),
+                updated=datetime.now(),
+            )
+            store.store_entry("testuser", entry)
+
+        # Add duplicate
+        store.add_duplicate("https://example.com/dup", "https://example.com/canonical")
+
+        # Get updated stats
+        stats = store.get_stats()
+        assert stats["total_users"] == 1
+        assert stats["total_entries"] == 3
+        assert stats["total_duplicates"] == 1
+        assert "last_updated" in stats
+        assert "repository_size" in stats
+352
tests/test_models.py
···
+"""Tests for pydantic models."""
+
+from datetime import datetime
+
+import pytest
+from pydantic import HttpUrl, ValidationError
+
+from thicket.models import (
+    AtomEntry,
+    DuplicateMap,
+    FeedMetadata,
+    ThicketConfig,
+    UserConfig,
+    UserMetadata,
+)
+
+
+class TestUserConfig:
+    """Test UserConfig model."""
+
+    def test_valid_user_config(self):
+        """Test creating valid user config."""
+        config = UserConfig(
+            username="testuser",
+            feeds=["https://example.com/feed.xml"],
+            email="test@example.com",
+            homepage="https://example.com",
+            display_name="Test User",
+        )
+
+        assert config.username == "testuser"
+        assert len(config.feeds) == 1
+        assert config.feeds[0] == HttpUrl("https://example.com/feed.xml")
+        assert config.email == "test@example.com"
+        assert config.display_name == "Test User"
+
+    def test_invalid_email(self):
+        """Test validation of invalid email."""
+        with pytest.raises(ValidationError):
+            UserConfig(
+                username="testuser",
+                feeds=["https://example.com/feed.xml"],
+                email="invalid-email",
+            )
+
+    def test_invalid_feed_url(self):
+        """Test validation of invalid feed URL."""
+        with pytest.raises(ValidationError):
+            UserConfig(
+                username="testuser",
+                feeds=["not-a-url"],
+            )
+
+    def test_optional_fields(self):
+        """Test optional fields with None values."""
+        config = UserConfig(
+            username="testuser",
+            feeds=["https://example.com/feed.xml"],
+        )
+
+        assert config.email is None
+        assert config.homepage is None
+        assert config.icon is None
+        assert config.display_name is None
+
+
+class TestThicketConfig:
+    """Test ThicketConfig model."""
+
+    def test_valid_config(self, temp_dir):
+        """Test creating valid configuration."""
+        config = ThicketConfig(
+            git_store=temp_dir / "git_store",
+            cache_dir=temp_dir / "cache",
+            users=[
+                UserConfig(
+                    username="testuser",
+                    feeds=["https://example.com/feed.xml"],
+                )
+            ],
+        )
+
+        assert config.git_store == temp_dir / "git_store"
+        assert config.cache_dir == temp_dir / "cache"
+        assert len(config.users) == 1
+        assert config.users[0].username == "testuser"
+
+    def test_find_user(self, temp_dir):
+        """Test finding user by username."""
+        config = ThicketConfig(
+            git_store=temp_dir / "git_store",
+            cache_dir=temp_dir / "cache",
+            users=[
+                UserConfig(username="user1", feeds=["https://example.com/feed1.xml"]),
+                UserConfig(username="user2", feeds=["https://example.com/feed2.xml"]),
+            ],
+        )
+
+        user = config.find_user("user1")
+        assert user is not None
+        assert user.username == "user1"
+
+        non_user = config.find_user("nonexistent")
+        assert non_user is None
+
+    def test_add_user(self, temp_dir):
+        """Test adding a new user."""
+        config = ThicketConfig(
+            git_store=temp_dir / "git_store",
+            cache_dir=temp_dir / "cache",
+            users=[],
+        )
+
+        new_user = UserConfig(
+            username="newuser",
+            feeds=["https://example.com/feed.xml"],
+        )
+
+        config.add_user(new_user)
+        assert len(config.users) == 1
+        assert config.users[0].username == "newuser"
+
+    def test_add_feed_to_user(self, temp_dir):
+        """Test adding feed to existing user."""
+        config = ThicketConfig(
+            git_store=temp_dir / "git_store",
+            cache_dir=temp_dir / "cache",
+            users=[
+                UserConfig(username="testuser", feeds=["https://example.com/feed1.xml"]),
+            ],
+        )
+
+        result = config.add_feed_to_user("testuser", HttpUrl("https://example.com/feed2.xml"))
+        assert result is True
+
+        user = config.find_user("testuser")
+        assert len(user.feeds) == 2
+        assert HttpUrl("https://example.com/feed2.xml") in user.feeds
+
+        # Test adding to non-existent user
+        result = config.add_feed_to_user("nonexistent", HttpUrl("https://example.com/feed.xml"))
+        assert result is False
+
+
+class TestAtomEntry:
+    """Test AtomEntry model."""
+
+    def test_valid_entry(self):
+        """Test creating valid Atom entry."""
+        entry = AtomEntry(
+            id="https://example.com/entry/1",
+            title="Test Entry",
+            link=HttpUrl("https://example.com/entry/1"),
+            updated=datetime.now(),
+            published=datetime.now(),
+            summary="Test summary",
+            content="<p>Test content</p>",
+            content_type="html",
+            author={"name": "Test Author"},
+            categories=["test", "example"],
+        )
+
+        assert entry.id == "https://example.com/entry/1"
+        assert entry.title == "Test Entry"
+        assert entry.summary == "Test summary"
+        assert entry.content == "<p>Test content</p>"
+        assert entry.content_type == "html"
+        assert entry.author["name"] == "Test Author"
+        assert "test" in entry.categories
+
+    def test_minimal_entry(self):
+        """Test creating minimal Atom entry."""
+        entry = AtomEntry(
+            id="https://example.com/entry/1",
+            title="Test Entry",
+            link=HttpUrl("https://example.com/entry/1"),
+            updated=datetime.now(),
+        )
+
+        assert entry.id == "https://example.com/entry/1"
+        assert entry.title == "Test Entry"
+        assert entry.published is None
+        assert entry.summary is None
+        assert entry.content is None
+        assert entry.content_type == "html"  # default
+        assert entry.author is None
+        assert entry.categories == []
+
+
+class TestDuplicateMap:
+    """Test DuplicateMap model."""
+
+    def test_empty_duplicates(self):
+        """Test empty duplicate map."""
+        dup_map = DuplicateMap()
+        assert len(dup_map.duplicates) == 0
+        assert not dup_map.is_duplicate("test")
+        assert dup_map.get_canonical("test") == "test"
+
+    def test_add_duplicate(self):
+        """Test adding duplicate mapping."""
+        dup_map = DuplicateMap()
+        dup_map.add_duplicate("dup1", "canonical1")
+
+        assert len(dup_map.duplicates) == 1
+        assert dup_map.is_duplicate("dup1")
+        assert dup_map.get_canonical("dup1") == "canonical1"
+        assert dup_map.get_canonical("canonical1") == "canonical1"
+
+    def test_remove_duplicate(self):
+        """Test removing duplicate mapping."""
+        dup_map = DuplicateMap()
+        dup_map.add_duplicate("dup1", "canonical1")
+
+        result = dup_map.remove_duplicate("dup1")
+        assert result is True
+        assert len(dup_map.duplicates) == 0
+        assert not dup_map.is_duplicate("dup1")
+
+        # Test removing non-existent duplicate
+        result = dup_map.remove_duplicate("nonexistent")
+        assert result is False
+
+    def test_get_duplicates_for_canonical(self):
+        """Test getting all duplicates for a canonical ID."""
+        dup_map = DuplicateMap()
+        dup_map.add_duplicate("dup1", "canonical1")
+        dup_map.add_duplicate("dup2", "canonical1")
+        dup_map.add_duplicate("dup3", "canonical2")
+
+        dups = dup_map.get_duplicates_for_canonical("canonical1")
+        assert len(dups) == 2
+        assert "dup1" in dups
+        assert "dup2" in dups
+
+        dups = dup_map.get_duplicates_for_canonical("canonical2")
+        assert len(dups) == 1
+        assert "dup3" in dups
+
+        dups = dup_map.get_duplicates_for_canonical("nonexistent")
+        assert len(dups) == 0
+
+
+class TestFeedMetadata:
+    """Test FeedMetadata model."""
+
+    def test_valid_metadata(self):
+        """Test creating valid feed metadata."""
+        metadata = FeedMetadata(
+            title="Test Feed",
+            author_name="Test Author",
+            author_email="author@example.com",
+            author_uri=HttpUrl("https://example.com/author"),
+            link=HttpUrl("https://example.com"),
+            description="Test description",
+        )
+
+        assert metadata.title == "Test Feed"
+        assert metadata.author_name == "Test Author"
+        assert metadata.author_email == "author@example.com"
+        assert metadata.link == HttpUrl("https://example.com")
+
+    def test_to_user_config(self):
+        """Test converting metadata to user config."""
+        metadata = FeedMetadata(
+            title="Test Feed",
+            author_name="Test Author",
+            author_email="author@example.com",
+            author_uri=HttpUrl("https://example.com/author"),
+            link=HttpUrl("https://example.com"),
+            logo=HttpUrl("https://example.com/logo.png"),
+        )
+
+        feed_url = HttpUrl("https://example.com/feed.xml")
+        user_config = metadata.to_user_config("testuser", feed_url)
+
+        assert user_config.username == "testuser"
+        assert user_config.feeds == [feed_url]
+        assert user_config.display_name == "Test Author"
+        assert user_config.email == "author@example.com"
+        assert user_config.homepage == HttpUrl("https://example.com/author")
+        assert user_config.icon == HttpUrl("https://example.com/logo.png")
+
+    def test_to_user_config_fallbacks(self):
+        """Test fallback logic in to_user_config."""
+        metadata = FeedMetadata(
+            title="Test Feed",
+            link=HttpUrl("https://example.com"),
+            icon=HttpUrl("https://example.com/icon.png"),
+        )
+
+        feed_url = HttpUrl("https://example.com/feed.xml")
+        user_config = metadata.to_user_config("testuser", feed_url)
+
+        assert user_config.display_name == "Test Feed"  # Falls back to title
+        assert user_config.homepage == HttpUrl("https://example.com")  # Falls back to link
+        assert user_config.icon == HttpUrl("https://example.com/icon.png")
+        assert user_config.email is None
+
+
+class TestUserMetadata:
+    """Test UserMetadata model."""
+
+    def test_valid_metadata(self):
+        """Test creating valid user metadata."""
+        now = datetime.now()
+        metadata = UserMetadata(
+            username="testuser",
+            directory="testuser",
+            created=now,
+            last_updated=now,
+            feeds=["https://example.com/feed.xml"],
+            entry_count=5,
+        )
+
+        assert metadata.username == "testuser"
+        assert metadata.directory == "testuser"
+        assert metadata.entry_count == 5
+        assert len(metadata.feeds) == 1
+
+    def test_update_timestamp(self):
+        """Test updating timestamp."""
+        now = datetime.now()
+        metadata = UserMetadata(
+            username="testuser",
+            directory="testuser",
+            created=now,
+            last_updated=now,
+        )
+
+        original_time = metadata.last_updated
+        metadata.update_timestamp()
+
+        assert metadata.last_updated > original_time
+
+    def test_increment_entry_count(self):
+        """Test incrementing entry count."""
+        metadata = UserMetadata(
+            username="testuser",
+            directory="testuser",
+            created=datetime.now(),
+            last_updated=datetime.now(),
+            entry_count=5,
+        )
+
+        original_count = metadata.entry_count
+        original_time = metadata.last_updated
+
+        metadata.increment_entry_count(3)
+
+        assert metadata.entry_count == original_count + 3
+        assert metadata.last_updated > original_time
+9 -83
uv.lock
···
version = 1
-revision = 2
+revision = 3
requires-python = ">=3.9"
resolution-markers = [
    "python_full_version >= '3.10'",
···
]
[[package]]
-name = "jinja2"
-version = "3.1.6"
-source = { registry = "https://pypi.org/simple" }
-dependencies = [
-    { name = "markupsafe" },
-]
-sdist = { url = "https://files.pythonhosted.org/packages/df/bf/f7da0350254c0ed7c72f3e33cef02e048281fec7ecec5f032d4aac52226b/jinja2-3.1.6.tar.gz", hash = "sha256:0137fb05990d35f1275a587e9aee6d56da821fc83491a0fb838183be43f66d6d", size = 245115, upload-time = "2025-03-05T20:05:02.478Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/62/a1/3d680cbfd5f4b8f15abc1d571870c5fc3e594bb582bc3b64ea099db13e56/jinja2-3.1.6-py3-none-any.whl", hash = "sha256:85ece4451f492d0c13c5dd7c13a64681a86afae63a5f347908daf103ce6d2f67", size = 134899, upload-time = "2025-03-05T20:05:00.369Z" },
-]
-
-[[package]]
name = "markdown-it-py"
version = "3.0.0"
source = { registry = "https://pypi.org/simple" }
···
sdist = { url = "https://files.pythonhosted.org/packages/38/71/3b932df36c1a044d397a1f92d1cf91ee0a503d91e470cbd670aa66b07ed0/markdown-it-py-3.0.0.tar.gz", hash = "sha256:e3f60a94fa066dc52ec76661e37c851cb232d92f9886b15cb560aaada2df8feb", size = 74596, upload-time = "2023-06-03T06:41:14.443Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/42/d7/1ec15b46af6af88f19b8e5ffea08fa375d433c998b8a7639e76935c14f1f/markdown_it_py-3.0.0-py3-none-any.whl", hash = "sha256:355216845c60bd96232cd8d8c40e8f9765cc86f46880e43a8fd22dc1a1a8cab1", size = 87528, upload-time = "2023-06-03T06:41:11.019Z" },
-]
-
-[[package]]
-name = "markupsafe"
-version = "3.0.2"
-source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/b2/97/5d42485e71dfc078108a86d6de8fa46db44a1a9295e89c5d6d4a06e23a62/markupsafe-3.0.2.tar.gz", hash = "sha256:ee55d3edf80167e48ea11a923c7386f4669df67d7994554387f84e7d8b0a2bf0", size = 20537, upload-time = "2024-10-18T15:21:54.129Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/04/90/d08277ce111dd22f77149fd1a5d4653eeb3b3eaacbdfcbae5afb2600eebd/MarkupSafe-3.0.2-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:7e94c425039cde14257288fd61dcfb01963e658efbc0ff54f5306b06054700f8", size = 14357, upload-time = "2024-10-18T15:20:51.44Z" },
-    { url = "https://files.pythonhosted.org/packages/04/e1/6e2194baeae0bca1fae6629dc0cbbb968d4d941469cbab11a3872edff374/MarkupSafe-3.0.2-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:9e2d922824181480953426608b81967de705c3cef4d1af983af849d7bd619158", size = 12393, upload-time = "2024-10-18T15:20:52.426Z" },
-    { url = "https://files.pythonhosted.org/packages/1d/69/35fa85a8ece0a437493dc61ce0bb6d459dcba482c34197e3efc829aa357f/MarkupSafe-3.0.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:38a9ef736c01fccdd6600705b09dc574584b89bea478200c5fbf112a6b0d5579", size = 21732, upload-time = "2024-10-18T15:20:53.578Z" },
-    { url = "https://files.pythonhosted.org/packages/22/35/137da042dfb4720b638d2937c38a9c2df83fe32d20e8c8f3185dbfef05f7/MarkupSafe-3.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:bbcb445fa71794da8f178f0f6d66789a28d7319071af7a496d4d507ed566270d", size = 20866, upload-time = "2024-10-18T15:20:55.06Z" },
-    { url = "https://files.pythonhosted.org/packages/29/28/6d029a903727a1b62edb51863232152fd335d602def598dade38996887f0/MarkupSafe-3.0.2-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:57cb5a3cf367aeb1d316576250f65edec5bb3be939e9247ae594b4bcbc317dfb", size = 20964, upload-time = "2024-10-18T15:20:55.906Z" },
-    { url = "https://files.pythonhosted.org/packages/cc/cd/07438f95f83e8bc028279909d9c9bd39e24149b0d60053a97b2bc4f8aa51/MarkupSafe-3.0.2-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:3809ede931876f5b2ec92eef964286840ed3540dadf803dd570c3b7e13141a3b", size = 21977, upload-time = "2024-10-18T15:20:57.189Z" },
-    { url = "https://files.pythonhosted.org/packages/29/01/84b57395b4cc062f9c4c55ce0df7d3108ca32397299d9df00fedd9117d3d/MarkupSafe-3.0.2-cp310-cp310-musllinux_1_2_i686.whl", hash = "sha256:e07c3764494e3776c602c1e78e298937c3315ccc9043ead7e685b7f2b8d47b3c", size = 21366, upload-time = "2024-10-18T15:20:58.235Z" },
-    { url = "https://files.pythonhosted.org/packages/bd/6e/61ebf08d8940553afff20d1fb1ba7294b6f8d279df9fd0c0db911b4bbcfd/MarkupSafe-3.0.2-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:b424c77b206d63d500bcb69fa55ed8d0e6a3774056bdc4839fc9298a7edca171", size = 21091, upload-time = "2024-10-18T15:20:59.235Z" },
-    { url = "https://files.pythonhosted.org/packages/11/23/ffbf53694e8c94ebd1e7e491de185124277964344733c45481f32ede2499/MarkupSafe-3.0.2-cp310-cp310-win32.whl", hash = "sha256:fcabf5ff6eea076f859677f5f0b6b5c1a51e70a376b0579e0eadef8db48c6b50", size = 15065, upload-time = "2024-10-18T15:21:00.307Z" },
-    { url = "https://files.pythonhosted.org/packages/44/06/e7175d06dd6e9172d4a69a72592cb3f7a996a9c396eee29082826449bbc3/MarkupSafe-3.0.2-cp310-cp310-win_amd64.whl", hash = "sha256:6af100e168aa82a50e186c82875a5893c5597a0c1ccdb0d8b40240b1f28b969a", size = 15514, upload-time = "2024-10-18T15:21:01.122Z" },
-    { url = "https://files.pythonhosted.org/packages/6b/28/bbf83e3f76936960b850435576dd5e67034e200469571be53f69174a2dfd/MarkupSafe-3.0.2-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:9025b4018f3a1314059769c7bf15441064b2207cb3f065e6ea1e7359cb46db9d", size = 14353, upload-time = "2024-10-18T15:21:02.187Z" },
-    { url = "https://files.pythonhosted.org/packages/6c/30/316d194b093cde57d448a4c3209f22e3046c5bb2fb0820b118292b334be7/MarkupSafe-3.0.2-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:93335ca3812df2f366e80509ae119189886b0f3c2b81325d39efdb84a1e2ae93", size = 12392, upload-time = "2024-10-18T15:21:02.941Z" },
-    { url = "https://files.pythonhosted.org/packages/f2/96/9cdafba8445d3a53cae530aaf83c38ec64c4d5427d975c974084af5bc5d2/MarkupSafe-3.0.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:2cb8438c3cbb25e220c2ab33bb226559e7afb3baec11c4f218ffa7308603c832", size = 23984, upload-time = "2024-10-18T15:21:03.953Z" },
-    { url = "https://files.pythonhosted.org/packages/f1/a4/aefb044a2cd8d7334c8a47d3fb2c9f328ac48cb349468cc31c20b539305f/MarkupSafe-3.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:a123e330ef0853c6e822384873bef7507557d8e4a082961e1defa947aa59ba84", size = 23120, upload-time = "2024-10-18T15:21:06.495Z" },
-    { url = "https://files.pythonhosted.org/packages/8d/21/5e4851379f88f3fad1de30361db501300d4f07bcad047d3cb0449fc51f8c/MarkupSafe-3.0.2-cp311-cp311-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:1e084f686b92e5b83186b07e8a17fc09e38fff551f3602b249881fec658d3eca", size = 23032, upload-time = "2024-10-18T15:21:07.295Z" },
-    { url = "https://files.pythonhosted.org/packages/00/7b/e92c64e079b2d0d7ddf69899c98842f3f9a60a1ae72657c89ce2655c999d/MarkupSafe-3.0.2-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:d8213e09c917a951de9d09ecee036d5c7d36cb6cb7dbaece4c71a60d79fb9798", size = 24057, upload-time = "2024-10-18T15:21:08.073Z" },
-    { url = "https://files.pythonhosted.org/packages/f9/ac/46f960ca323037caa0a10662ef97d0a4728e890334fc156b9f9e52bcc4ca/MarkupSafe-3.0.2-cp311-cp311-musllinux_1_2_i686.whl", hash = "sha256:5b02fb34468b6aaa40dfc198d813a641e3a63b98c2b05a16b9f80b7ec314185e", size = 23359, upload-time = "2024-10-18T15:21:09.318Z" },
-    { url = "https://files.pythonhosted.org/packages/69/84/83439e16197337b8b14b6a5b9c2105fff81d42c2a7c5b58ac7b62ee2c3b1/MarkupSafe-3.0.2-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:0bff5e0ae4ef2e1ae4fdf2dfd5b76c75e5c2fa4132d05fc1b0dabcd20c7e28c4", size = 23306, upload-time = "2024-10-18T15:21:10.185Z" },
-    { url = "https://files.pythonhosted.org/packages/9a/34/a15aa69f01e2181ed8d2b685c0d2f6655d5cca2c4db0ddea775e631918cd/MarkupSafe-3.0.2-cp311-cp311-win32.whl", hash = "sha256:6c89876f41da747c8d3677a2b540fb32ef5715f97b66eeb0c6b66f5e3ef6f59d", size = 15094, upload-time = "2024-10-18T15:21:11.005Z" },
-    { url = "https://files.pythonhosted.org/packages/da/b8/3a3bd761922d416f3dc5d00bfbed11f66b1ab89a0c2b6e887240a30b0f6b/MarkupSafe-3.0.2-cp311-cp311-win_amd64.whl", hash = "sha256:70a87b411535ccad5ef2f1df5136506a10775d267e197e4cf531ced10537bd6b", size = 15521, upload-time = "2024-10-18T15:21:12.911Z" },
-    { url = "https://files.pythonhosted.org/packages/22/09/d1f21434c97fc42f09d290cbb6350d44eb12f09cc62c9476effdb33a18aa/MarkupSafe-3.0.2-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:9778bd8ab0a994ebf6f84c2b949e65736d5575320a17ae8984a77fab08db94cf", size = 14274, upload-time = "2024-10-18T15:21:13.777Z" },
-    { url = "https://files.pythonhosted.org/packages/6b/b0/18f76bba336fa5aecf79d45dcd6c806c280ec44538b3c13671d49099fdd0/MarkupSafe-3.0.2-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:846ade7b71e3536c4e56b386c2a47adf5741d2d8b94ec9dc3e92e5e1ee1e2225", size = 12348, upload-time = "2024-10-18T15:21:14.822Z" },
-    { url = "https://files.pythonhosted.org/packages/e0/25/dd5c0f6ac1311e9b40f4af06c78efde0f3b5cbf02502f8ef9501294c425b/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1c99d261bd2d5f6b59325c92c73df481e05e57f19837bdca8413b9eac4bd8028", size = 24149, upload-time = "2024-10-18T15:21:15.642Z" },
-    { url = "https://files.pythonhosted.org/packages/f3/f0/89e7aadfb3749d0f52234a0c8c7867877876e0a20b60e2188e9850794c17/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:e17c96c14e19278594aa4841ec148115f9c7615a47382ecb6b82bd8fea3ab0c8", size = 23118, upload-time = "2024-10-18T15:21:17.133Z" },
-    { url = "https://files.pythonhosted.org/packages/d5/da/f2eeb64c723f5e3777bc081da884b414671982008c47dcc1873d81f625b6/MarkupSafe-3.0.2-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:88416bd1e65dcea10bc7569faacb2c20ce071dd1f87539ca2ab364bf6231393c", size = 22993, upload-time = "2024-10-18T15:21:18.064Z" },
-    { url = "https://files.pythonhosted.org/packages/da/0e/1f32af846df486dce7c227fe0f2398dc7e2e51d4a370508281f3c1c5cddc/MarkupSafe-3.0.2-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:2181e67807fc2fa785d0592dc2d6206c019b9502410671cc905d132a92866557", size = 24178, upload-time = "2024-10-18T15:21:18.859Z" },
-    { url = "https://files.pythonhosted.org/packages/c4/f6/bb3ca0532de8086cbff5f06d137064c8410d10779c4c127e0e47d17c0b71/MarkupSafe-3.0.2-cp312-cp312-musllinux_1_2_i686.whl", hash = "sha256:52305740fe773d09cffb16f8ed0427942901f00adedac82ec8b67752f58a1b22", size = 23319, upload-time = "2024-10-18T15:21:19.671Z" },
-    { url = "https://files.pythonhosted.org/packages/a2/82/8be4c96ffee03c5b4a034e60a31294daf481e12c7c43ab8e34a1453ee48b/MarkupSafe-3.0.2-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:ad10d3ded218f1039f11a75f8091880239651b52e9bb592ca27de44eed242a48", size = 23352, upload-time = "2024-10-18T15:21:20.971Z" },
-    { url = "https://files.pythonhosted.org/packages/51/ae/97827349d3fcffee7e184bdf7f41cd6b88d9919c80f0263ba7acd1bbcb18/MarkupSafe-3.0.2-cp312-cp312-win32.whl", hash = "sha256:0f4ca02bea9a23221c0182836703cbf8930c5e9454bacce27e767509fa286a30", size = 15097, upload-time = "2024-10-18T15:21:22.646Z" },
-    { url = "https://files.pythonhosted.org/packages/c1/80/a61f99dc3a936413c3ee4e1eecac96c0da5ed07ad56fd975f1a9da5bc630/MarkupSafe-3.0.2-cp312-cp312-win_amd64.whl", hash = "sha256:8e06879fc22a25ca47312fbe7c8264eb0b662f6db27cb2d3bbbc74b1df4b9b87", size = 15601, upload-time = "2024-10-18T15:21:23.499Z" },
-    { url = "https://files.pythonhosted.org/packages/83/0e/67eb10a7ecc77a0c2bbe2b0235765b98d164d81600746914bebada795e97/MarkupSafe-3.0.2-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:ba9527cdd4c926ed0760bc301f6728ef34d841f405abf9d4f959c478421e4efd", size = 14274, upload-time = "2024-10-18T15:21:24.577Z" },
-    { url = "https://files.pythonhosted.org/packages/2b/6d/9409f3684d3335375d04e5f05744dfe7e9f120062c9857df4ab490a1031a/MarkupSafe-3.0.2-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:f8b3d067f2e40fe93e1ccdd6b2e1d16c43140e76f02fb1319a05cf2b79d99430", size = 12352, upload-time = "2024-10-18T15:21:25.382Z" },
-    { url = "https://files.pythonhosted.org/packages/d2/f5/6eadfcd3885ea85fe2a7c128315cc1bb7241e1987443d78c8fe712d03091/MarkupSafe-3.0.2-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:569511d3b58c8791ab4c2e1285575265991e6d8f8700c7be0e88f86cb0672094", size = 24122, upload-time = "2024-10-18T15:21:26.199Z" },
-    { url = "https://files.pythonhosted.org/packages/0c/91/96cf928db8236f1bfab6ce15ad070dfdd02ed88261c2afafd4b43575e9e9/MarkupSafe-3.0.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:15ab75ef81add55874e7ab7055e9c397312385bd9ced94920f2802310c930396", size = 23085, upload-time = "2024-10-18T15:21:27.029Z" },
-    { url = "https://files.pythonhosted.org/packages/c2/cf/c9d56af24d56ea04daae7ac0940232d31d5a8354f2b457c6d856b2057d69/MarkupSafe-3.0.2-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:f3818cb119498c0678015754eba762e0d61e5b52d34c8b13d770f0719f7b1d79", size = 22978, upload-time = "2024-10-18T15:21:27.846Z" },
-    { url = "https://files.pythonhosted.org/packages/2a/9f/8619835cd6a711d6272d62abb78c033bda638fdc54c4e7f4272cf1c0962b/MarkupSafe-3.0.2-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:cdb82a876c47801bb54a690c5ae105a46b392ac6099881cdfb9f6e95e4014c6a", size = 24208, upload-time = "2024-10-18T15:21:28.744Z" },
-    { url = "https://files.pythonhosted.org/packages/f9/bf/176950a1792b2cd2102b8ffeb5133e1ed984547b75db47c25a67d3359f77/MarkupSafe-3.0.2-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:cabc348d87e913db6ab4aa100f01b08f481097838bdddf7c7a84b7575b7309ca", size = 23357, upload-time = "2024-10-18T15:21:29.545Z" },
-    { url = "https://files.pythonhosted.org/packages/ce/4f/9a02c1d335caabe5c4efb90e1b6e8ee944aa245c1aaaab8e8a618987d816/MarkupSafe-3.0.2-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:444dcda765c8a838eaae23112db52f1efaf750daddb2d9ca300bcae1039adc5c", size = 23344, upload-time = "2024-10-18T15:21:30.366Z" },
-    { url = "https://files.pythonhosted.org/packages/ee/55/c271b57db36f748f0e04a759ace9f8f759ccf22b4960c270c78a394f58be/MarkupSafe-3.0.2-cp313-cp313-win32.whl", hash = "sha256:bcf3e58998965654fdaff38e58584d8937aa3096ab5354d493c77d1fdd66d7a1", size = 15101, upload-time = "2024-10-18T15:21:31.207Z" },
-    { url = "https://files.pythonhosted.org/packages/29/88/07df22d2dd4df40aba9f3e402e6dc1b8ee86297dddbad4872bd5e7b0094f/MarkupSafe-3.0.2-cp313-cp313-win_amd64.whl", hash = "sha256:e6a2a455bd412959b57a172ce6328d2dd1f01cb2135efda2e4576e8a23fa3b0f", size = 15603, upload-time = "2024-10-18T15:21:32.032Z" },
-    { url = "https://files.pythonhosted.org/packages/62/6a/8b89d24db2d32d433dffcd6a8779159da109842434f1dd2f6e71f32f738c/MarkupSafe-3.0.2-cp313-cp313t-macosx_10_13_universal2.whl", hash = "sha256:b5a6b3ada725cea8a5e634536b1b01c30bcdcd7f9c6fff4151548d5bf6b3a36c", size = 14510, upload-time = "2024-10-18T15:21:33.625Z" },
-    { url = "https://files.pythonhosted.org/packages/7a/06/a10f955f70a2e5a9bf78d11a161029d278eeacbd35ef806c3fd17b13060d/MarkupSafe-3.0.2-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:a904af0a6162c73e3edcb969eeeb53a63ceeb5d8cf642fade7d39e7963a22ddb", size = 12486, upload-time = "2024-10-18T15:21:34.611Z" },
-    { url = "https://files.pythonhosted.org/packages/34/cf/65d4a571869a1a9078198ca28f39fba5fbb910f952f9dbc5220afff9f5e6/MarkupSafe-3.0.2-cp313-cp313t-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:4aa4e5faecf353ed117801a068ebab7b7e09ffb6e1d5e412dc852e0da018126c", size = 25480, upload-time = "2024-10-18T15:21:35.398Z" },
-    { url = "https://files.pythonhosted.org/packages/0c/e3/90e9651924c430b885468b56b3d597cabf6d72be4b24a0acd1fa0e12af67/MarkupSafe-3.0.2-cp313-cp313t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c0ef13eaeee5b615fb07c9a7dadb38eac06a0608b41570d8ade51c56539e509d", size = 23914, upload-time = "2024-10-18T15:21:36.231Z" },
-    { url = "https://files.pythonhosted.org/packages/66/8c/6c7cf61f95d63bb866db39085150df1f2a5bd3335298f14a66b48e92659c/MarkupSafe-3.0.2-cp313-cp313t-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:d16a81a06776313e817c951135cf7340a3e91e8c1ff2fac444cfd75fffa04afe", size = 23796, upload-time = "2024-10-18T15:21:37.073Z" },
-    { url = "https://files.pythonhosted.org/packages/bb/35/cbe9238ec3f47ac9a7c8b3df7a808e7cb50fe149dc7039f5f454b3fba218/MarkupSafe-3.0.2-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:6381026f158fdb7c72a168278597a5e3a5222e83ea18f543112b2662a9b699c5", size = 25473, upload-time = "2024-10-18T15:21:37.932Z" },
-    { url = "https://files.pythonhosted.org/packages/e6/32/7621a4382488aa283cc05e8984a9c219abad3bca087be9ec77e89939ded9/MarkupSafe-3.0.2-cp313-cp313t-musllinux_1_2_i686.whl", hash = "sha256:3d79d162e7be8f996986c064d1c7c817f6df3a77fe3d6859f6f9e7be4b8c213a", size = 24114, upload-time = "2024-10-18T15:21:39.799Z" },
-    { url = "https://files.pythonhosted.org/packages/0d/80/0985960e4b89922cb5a0bac0ed39c5b96cbc1a536a99f30e8c220a996ed9/MarkupSafe-3.0.2-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:131a3c7689c85f5ad20f9f6fb1b866f402c445b220c19fe4308c0b147ccd2ad9", size = 24098, upload-time = "2024-10-18T15:21:40.813Z" },
-    { url = "https://files.pythonhosted.org/packages/82/78/fedb03c7d5380df2427038ec8d973587e90561b2d90cd472ce9254cf348b/MarkupSafe-3.0.2-cp313-cp313t-win32.whl", hash = "sha256:ba8062ed2cf21c07a9e295d5b8a2a5ce678b913b45fdf68c32d95d6c1291e0b6", size = 15208, upload-time = "2024-10-18T15:21:41.814Z" },
-    { url = "https://files.pythonhosted.org/packages/4f/65/6079a46068dfceaeabb5dcad6d674f5f5c61a6fa5673746f42a9f4c233b3/MarkupSafe-3.0.2-cp313-cp313t-win_amd64.whl", hash = "sha256:e444a31f8db13eb18ada366ab3cf45fd4b31e4db1236a4448f68778c1d1a5a2f", size = 15739, upload-time = "2024-10-18T15:21:42.784Z" },
-    { url = "https://files.pythonhosted.org/packages/a7/ea/9b1530c3fdeeca613faeb0fb5cbcf2389d816072fab72a71b45749ef6062/MarkupSafe-3.0.2-cp39-cp39-macosx_10_9_universal2.whl", hash = "sha256:eaa0a10b7f72326f1372a713e73c3f739b524b3af41feb43e4921cb529f5929a", size = 14344, upload-time = "2024-10-18T15:21:43.721Z" },
-    { url = "https://files.pythonhosted.org/packages/4b/c2/fbdbfe48848e7112ab05e627e718e854d20192b674952d9042ebd8c9e5de/MarkupSafe-3.0.2-cp39-cp39-macosx_11_0_arm64.whl", hash = "sha256:48032821bbdf20f5799ff537c7ac3d1fba0ba032cfc06194faffa8cda8b560ff", size = 12389, upload-time = "2024-10-18T15:21:44.666Z" },
-    { url = "https://files.pythonhosted.org/packages/f0/25/7a7c6e4dbd4f867d95d94ca15449e91e52856f6ed1905d58ef1de5e211d0/MarkupSafe-3.0.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:1a9d3f5f0901fdec14d8d2f66ef7d035f2157240a433441719ac9a3fba440b13", size = 21607, upload-time = "2024-10-18T15:21:45.452Z" },
-    { url = "https://files.pythonhosted.org/packages/53/8f/f339c98a178f3c1e545622206b40986a4c3307fe39f70ccd3d9df9a9e425/MarkupSafe-3.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:88b49a3b9ff31e19998750c38e030fc7bb937398b1f78cfa599aaef92d693144", size = 20728, upload-time = "2024-10-18T15:21:46.295Z" },
-    { url = "https://files.pythonhosted.org/packages/1a/03/8496a1a78308456dbd50b23a385c69b41f2e9661c67ea1329849a598a8f9/MarkupSafe-3.0.2-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:cfad01eed2c2e0c01fd0ecd2ef42c492f7f93902e39a42fc9ee1692961443a29", size = 20826, upload-time = "2024-10-18T15:21:47.134Z" },
-    { url = "https://files.pythonhosted.org/packages/e6/cf/0a490a4bd363048c3022f2f475c8c05582179bb179defcee4766fb3dcc18/MarkupSafe-3.0.2-cp39-cp39-musllinux_1_2_aarch64.whl", hash = "sha256:1225beacc926f536dc82e45f8a4d68502949dc67eea90eab715dea3a21c1b5f0", size = 21843, upload-time = "2024-10-18T15:21:48.334Z" },
-    { url = "https://files.pythonhosted.org/packages/19/a3/34187a78613920dfd3cdf68ef6ce5e99c4f3417f035694074beb8848cd77/MarkupSafe-3.0.2-cp39-cp39-musllinux_1_2_i686.whl", hash = "sha256:3169b1eefae027567d1ce6ee7cae382c57fe26e82775f460f0b2778beaad66c0", size = 21219, upload-time = "2024-10-18T15:21:49.587Z" },
-    { url = "https://files.pythonhosted.org/packages/17/d8/5811082f85bb88410ad7e452263af048d685669bbbfb7b595e8689152498/MarkupSafe-3.0.2-cp39-cp39-musllinux_1_2_x86_64.whl", hash = "sha256:eb7972a85c54febfb25b5c4b4f3af4dcc731994c7da0d8a0b4a6eb0640e1d178", size = 20946, upload-time = "2024-10-18T15:21:50.441Z" },
-    { url = "https://files.pythonhosted.org/packages/7c/31/bd635fb5989440d9365c5e3c47556cfea121c7803f5034ac843e8f37c2f2/MarkupSafe-3.0.2-cp39-cp39-win32.whl", hash = "sha256:8c4e8c3ce11e1f92f6536ff07154f9d49677ebaaafc32db9db4620bc11ed480f", size = 15063, upload-time = "2024-10-18T15:21:51.385Z" },
-    { url = "https://files.pythonhosted.org/packages/b3/73/085399401383ce949f727afec55ec3abd76648d04b9f22e1c0e99cb4bec3/MarkupSafe-3.0.2-cp39-cp39-win_amd64.whl", hash = "sha256:6e296a513ca3d94054c2c881cc913116e90fd030ad1c656b3869762b754f5f8a", size = 15506, upload-time = "2024-10-18T15:21:52.974Z" },
]
[[package]]
···
{ name = "feedparser" },
{ name = "gitpython" },
{ name = "httpx" },
-
{ name = "jinja2" },
{ name = "pendulum" },
{ name = "platformdirs" },
{ name = "pydantic" },
···
{ name = "types-pyyaml" },
]
[package.metadata]
requires-dist = [
    { name = "black", marker = "extra == 'dev'", specifier = ">=24.0.0" },
···
    { name = "feedparser", specifier = ">=6.0.11" },
    { name = "gitpython", specifier = ">=3.1.40" },
    { name = "httpx", specifier = ">=0.28.0" },
-    { name = "jinja2", specifier = ">=3.1.6" },
    { name = "mypy", marker = "extra == 'dev'", specifier = ">=1.13.0" },
    { name = "pendulum", specifier = ">=3.0.0" },
    { name = "platformdirs", specifier = ">=4.0.0" },
···
    { name = "types-pyyaml", marker = "extra == 'dev'", specifier = ">=6.0.0" },
]
provides-extras = ["dev"]
+
+[package.metadata.requires-dev]
+dev = [{ name = "pytest", specifier = ">=8.4.1" }]
[[package]]
name = "tomli"
···