Manage Atom feeds in a persistent git repository

Code Duplication Analysis for Thicket

1. Duplicate JSON Handling Code

Pattern: JSON file reading/writing

Locations:

  • src/thicket/cli/commands/generate.py:230 - Reading JSON with json.load(f)
  • src/thicket/cli/commands/generate.py:249 - Reading links.json
  • src/thicket/cli/commands/index.py:2305 - Reading JSON
  • src/thicket/cli/commands/index.py:2320 - Writing JSON with json.dump()
  • src/thicket/cli/commands/threads.py:2456 - Reading JSON
  • src/thicket/cli/commands/info.py:2683 - Reading JSON
  • src/thicket/core/git_store.py:5546 - Writing JSON with custom serializer
  • src/thicket/core/git_store.py:5556 - Reading JSON
  • src/thicket/core/git_store.py:5566 - Writing JSON
  • src/thicket/core/git_store.py:5656 - Writing JSON with model dump

Recommendation: Create a shared json_utils.py module:

import json
from pathlib import Path
from typing import Any

from pydantic import BaseModel


def read_json_file(path: Path) -> Any:
    """Read a JSON file; raises FileNotFoundError or json.JSONDecodeError on failure."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def write_json_file(path: Path, data: Any, indent: int = 2) -> None:
    """Write JSON with consistent formatting; non-JSON types fall back to str()."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=indent, default=str)


def write_model_json(path: Path, model: BaseModel, indent: int = 2) -> None:
    """Write a Pydantic model as JSON, omitting None fields."""
    # mode="json" already yields JSON-safe values (datetimes become ISO strings).
    write_json_file(path, model.model_dump(mode="json", exclude_none=True), indent)
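
One detail of the default=str fallback is worth seeing in a round trip: str() on a datetime uses a space separator rather than the "T" that isoformat() produces. The sketch below is a minimal, standard-library-only illustration; the roundtrip helper and the sample payload are hypothetical and not taken from the thicket code base.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def write_json_file(path: Path, data, indent: int = 2) -> None:
    """Same shape as the helper proposed above."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=indent, default=str)


def read_json_file(path: Path):
    """Read a JSON file and return the parsed value."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def roundtrip(data):
    """Hypothetical helper: write data to a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "sample.json"
        write_json_file(target, data)
        return read_json_file(target)


# Hypothetical payload; the datetime is serialized through default=str.
result = roundtrip({"title": "hello", "updated": datetime(2024, 1, 2, 3, 4, 5)})
# result["updated"] is "2024-01-02 03:04:05" (space separator, no "T")
```

If ISO 8601 output matters for downstream consumers, a default hook that calls isoformat() for datetimes may be a better fit than plain str.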

2. Repeated Datetime Handling

Pattern: datetime formatting and fallback handling

Locations:

  • src/thicket/cli/commands/generate.py:241 - key=lambda x: x[1].updated or x[1].published or datetime.min
  • src/thicket/cli/commands/generate.py:353 - Same pattern in thread sorting
  • src/thicket/cli/commands/generate.py:359 - Same pattern for max date
  • src/thicket/cli/commands/generate.py:625 - Same pattern
  • src/thicket/cli/commands/generate.py:655 - entry.updated or entry.published or datetime.min
  • src/thicket/cli/commands/generate.py:689 - Same pattern
  • src/thicket/cli/commands/generate.py:702 - Same pattern
  • Multiple .strftime('%Y-%m-%d') calls throughout

Recommendation: Create a shared datetime_utils.py module:

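from datetime import datetime

# AtomEntry is the project's entry model; import it from wherever it is defined.


def get_entry_date(entry: "AtomEntry") -> datetime:
    """Get the most relevant date for an entry with fallback."""
    # datetime.min is naive: if entries carry timezone-aware dates, use
    # datetime.min.replace(tzinfo=timezone.utc) so comparisons do not raise TypeError.
    return entry.updated or entry.published or datetime.min


def format_date_short(dt: datetime) -> str:
    """Format datetime as YYYY-MM-DD."""
    return dt.strftime('%Y-%m-%d')


def format_date_full(dt: datetime) -> str:
    """Format datetime as YYYY-MM-DD HH:MM."""
    return dt.strftime('%Y-%m-%d %H:%M')


def format_date_iso(dt: datetime) -> str:
    """Format datetime as ISO string."""
    return dt.isoformat()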
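
One caveat with the datetime.min fallback: it is a naive datetime, and Python refuses to order a naive datetime against a timezone-aware one. The following sketch assumes entries carry timezone-aware dates (an assumption, since the analysis does not say) and uses a hypothetical Entry dataclass as a stand-in for AtomEntry.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Timezone-aware stand-in for datetime.min, so it can be compared with aware dates.
MIN_AWARE = datetime.min.replace(tzinfo=timezone.utc)


@dataclass
class Entry:
    """Hypothetical stand-in for AtomEntry with only the fields used here."""
    id: str
    updated: Optional[datetime] = None
    published: Optional[datetime] = None


def get_entry_date(entry: Entry) -> datetime:
    """Prefer updated, then published, then the aware minimum."""
    return entry.updated or entry.published or MIN_AWARE


entries = [
    Entry("a", published=datetime(2024, 1, 5, tzinfo=timezone.utc)),
    Entry("b"),  # no dates at all, so it sorts last when newest comes first
    Entry(
        "c",
        updated=datetime(2024, 3, 1, tzinfo=timezone.utc),
        published=datetime(2023, 12, 1, tzinfo=timezone.utc),
    ),
]

newest_first = sorted(entries, key=get_entry_date, reverse=True)
order = [e.id for e in newest_first]  # ["c", "a", "b"]
```

If the stored dates turn out to be naive, plain datetime.min is the matching fallback; the point is only that the fallback and the real dates need the same awareness.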

3. Path Handling Patterns

Pattern: Directory creation and existence checks

Locations:

  • src/thicket/cli/commands/generate.py:225 - if user_dir.exists()
  • src/thicket/cli/commands/generate.py:247 - if links_file.exists()
  • src/thicket/cli/commands/generate.py:582 - self.output_dir.mkdir(parents=True, exist_ok=True)
  • src/thicket/cli/commands/generate.py:585-586 - Multiple mkdir calls
  • src/thicket/cli/commands/threads.py:2449 - if not index_path.exists()
  • src/thicket/cli/commands/info.py:2681 - if links_path.exists()
  • src/thicket/core/git_store.py:5515 - if not self.repo_path.exists()
  • src/thicket/core/git_store.py:5586 - user_dir.mkdir(exist_ok=True)
  • Many more similar patterns

Recommendation: Create a shared path_utils.py module:

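import json
from pathlib import Path
from typing import Any, Union


def ensure_directory(path: Path) -> Path:
    """Ensure directory exists, creating parents if necessary."""
    path.mkdir(parents=True, exist_ok=True)
    return path


def read_json_if_exists(path: Path, default: Any = None) -> Any:
    """Read JSON file if it exists, otherwise return default."""
    # Catching FileNotFoundError avoids a race between an exists() check and open().
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def safe_path_join(*parts: Union[str, Path]) -> Path:
    """Join path components (no traversal or absolute-path checks are performed)."""
    return Path(*parts)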
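
Path(*parts) on its own does not stop a component such as ".." or an absolute path from escaping the intended directory. If "safe" is meant in that sense, a containment check along these lines could be layered on top. This is a sketch only: join_within is a hypothetical name, and the "repo" base directory is illustrative.

```python
from pathlib import Path
from typing import Union


def join_within(base: Path, *parts: Union[str, Path]) -> Path:
    """Join parts under base and reject results that escape it.

    Hypothetical helper; raises ValueError on traversal or absolute parts.
    """
    base_resolved = base.resolve()
    candidate = base_resolved.joinpath(*parts).resolve()
    # relative_to raises ValueError when candidate is not inside base_resolved.
    candidate.relative_to(base_resolved)
    return candidate


base = Path("repo")  # illustrative base directory; it does not need to exist
inside = join_within(base, "alice", "entry.json")
```

Whether the extra resolve() calls are worth it depends on where the path components come from; for values derived from untrusted feed data the check is cheap insurance.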

4. Progress Bar and Console Output

Pattern: Progress bar creation and updates

Locations:

  • src/thicket/cli/commands/generate.py:209 - Progress with SpinnerColumn
  • src/thicket/cli/commands/index.py:2230 - Same Progress pattern
  • Multiple console.print() calls with similar formatting patterns
  • Progress update patterns repeated

Recommendation: Create a shared ui_utils.py module:

from rich.console import Console
from rich.progress import Progress, SpinnerColumn, TaskID, TextColumn

console = Console()  # reuse the project's existing console instance if there is one


def create_progress_spinner(description: str) -> tuple[Progress, TaskID]:
    """Create a standard progress spinner; the caller starts it, e.g. `with progress:`."""
    progress = Progress(
        SpinnerColumn(),
        TextColumn("[progress.description]{task.description}"),
        transient=True,
    )
    task = progress.add_task(description)
    return progress, task


def print_success(message: str) -> None:
    """Print success message with consistent formatting."""
    console.print(f"[green]✓[/green] {message}")


def print_error(message: str) -> None:
    """Print error message with consistent formatting."""
    console.print(f"[red]Error: {message}[/red]")


def print_warning(message: str) -> None:
    """Print warning message with consistent formatting."""
    console.print(f"[yellow]Warning: {message}[/yellow]")

5. Git Store Operations

Pattern: Entry file operations

Locations:

  • Multiple patterns of loading entries from user directories
  • Repeated safe_id generation
  • Repeated user directory path construction

Recommendation: Enhance GitStore with helper methods:

def get_user_dir(self, username: str) -> Path:
    """Get user directory path."""
    return self.repo_path / username

def iter_user_entries(self, username: str) -> Iterator[tuple[Path, AtomEntry]]:
    """Iterate over all entries for a user, skipping index files and unreadable entries."""
    user_dir = self.get_user_dir(username)
    if not user_dir.exists():
        return
    for entry_file in user_dir.glob("*.json"):
        # index.json and duplicates.json are metadata, not entries.
        if entry_file.name in ("index.json", "duplicates.json"):
            continue
        try:
            entry = self.read_entry_file(entry_file)
        except Exception:
            # Skip files that fail to load; consider logging them so bad data is visible.
            continue
        yield entry_file, entry
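
The iteration pattern can be tried outside GitStore with only the standard library. The sketch below is not the GitStore implementation: iter_entry_files is a hypothetical free function, plain dicts stand in for AtomEntry, and the "alice" directory and file names are made up for the demonstration. It also narrows the exception handling to I/O and parse errors rather than catching everything.

```python
import json
import tempfile
from collections.abc import Iterator
from pathlib import Path

SKIP = {"index.json", "duplicates.json"}  # metadata files named in the snippet above


def iter_entry_files(user_dir: Path) -> Iterator[tuple[Path, dict]]:
    """Yield (path, parsed JSON) for each readable entry file in user_dir."""
    if not user_dir.is_dir():
        return
    for entry_file in sorted(user_dir.glob("*.json")):
        if entry_file.name in SKIP:
            continue
        try:
            data = json.loads(entry_file.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
            continue
        yield entry_file, data


with tempfile.TemporaryDirectory() as tmp:
    user_dir = Path(tmp) / "alice"  # hypothetical username
    user_dir.mkdir()
    (user_dir / "index.json").write_text("{}", encoding="utf-8")
    (user_dir / "post-1.json").write_text('{"id": "post-1"}', encoding="utf-8")
    (user_dir / "broken.json").write_text("{not json", encoding="utf-8")
    found = [path.name for path, _ in iter_entry_files(user_dir)]
    missing = list(iter_entry_files(Path(tmp) / "nobody"))
```

Sorting the glob results is a design choice made here for deterministic output; the original snippet iterates in filesystem order.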

6. Error Handling Patterns

Pattern: Try-except with console error printing

Locations:

  • Similar error handling patterns throughout CLI commands
  • Repeated raise typer.Exit(1) patterns
  • Similar exception message formatting

Recommendation: Create error handling decorators:

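import functools

import typer
from pydantic import ValidationError  # assumed to be Pydantic's ValidationError
from rich.console import Console

console = Console()


def handle_cli_errors(func):
    """Decorator to handle CLI command errors consistently."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except (typer.Exit, typer.Abort):
            # Both subclass RuntimeError; re-raise so deliberate exits
            # (and their exit codes) are not reported as errors below.
            raise
        except ValidationError as e:
            console.print(f"[red]Validation error: {e}[/red]")
            raise typer.Exit(1) from e
        except Exception as e:
            console.print(f"[red]Error: {e}[/red]")
            if kwargs.get("verbose"):
                console.print_exception()
            raise typer.Exit(1) from e
    return wrapper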
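
A catch-all except Exception in a decorator like this needs to let deliberate exits pass through, because typer.Exit derives from RuntimeError and would otherwise be reported as an error with its exit code replaced by 1. The sketch below shows the pass-through using only the standard library; CliExit is a hypothetical stand-in for typer.Exit, and print stands in for console.print.

```python
import functools


class CliExit(RuntimeError):
    """Stand-in for typer.Exit, which is likewise a RuntimeError subclass."""

    def __init__(self, code: int = 0) -> None:
        super().__init__(code)
        self.exit_code = code


def handle_cli_errors(func):
    """Convert unexpected errors to exit code 1, but let deliberate exits through."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except CliExit:
            raise  # keep the exit code chosen by the command
        except Exception as e:
            print(f"Error: {e}")
            raise CliExit(1) from e
    return wrapper


@handle_cli_errors
def exits_with_code():
    raise CliExit(3)  # a deliberate exit with a specific code


@handle_cli_errors
def fails():
    raise ValueError("bad input")  # an unexpected error


def exit_code_of(command) -> int:
    """Run a command and report the exit code it would produce."""
    try:
        command()
    except CliExit as e:
        return e.exit_code
    return 0
```

Without the first except clause, exits_with_code would print a spurious "Error: 3" and exit with code 1 instead of 3.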

7. Configuration and Validation

Pattern: Config file loading and validation

Locations:

  • Repeated config loading pattern in every CLI command
  • Similar validation patterns for URLs and paths

Recommendation: Create a config_utils.py module:

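from pathlib import Path
from typing import Optional

from pydantic import HttpUrl, TypeAdapter, ValidationError

# ThicketConfig, ConfigError, and load_config are assumed to come from the project's config module.

_http_url = TypeAdapter(HttpUrl)


def load_config_with_defaults(config_path: Optional[Path] = None) -> ThicketConfig:
    """Load config with standard defaults and error handling."""
    if config_path is None:
        config_path = Path("thicket.yaml")

    if not config_path.exists():
        raise ConfigError(f"Configuration file not found: {config_path}")

    return load_config(config_path)


def validate_url(url: str) -> HttpUrl:
    """Validate and return URL with consistent error handling."""
    try:
        # TypeAdapter applies HttpUrl's constraints; direct HttpUrl(url)
        # construction behaves differently across Pydantic v2 releases.
        return _http_url.validate_python(url)
    except ValidationError as e:
        raise ConfigError(f"Invalid URL: {url}") from e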

8. Model Serialization

Pattern: Pydantic model JSON encoding

Locations:

  • Repeated json_encoders={datetime: lambda v: v.isoformat()} in model configs
  • Similar model_dump patterns

Recommendation: Create base model class:

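from datetime import datetime

from pydantic import BaseModel, ConfigDict


class ThicketBaseModel(BaseModel):
    """Base model with common configuration."""
    # json_encoders is deprecated in Pydantic v2; it is kept here to match the
    # existing models, and @field_serializer is the suggested replacement.
    model_config = ConfigDict(
        json_encoders={datetime: lambda v: v.isoformat()},
        str_strip_whitespace=True,
    )

    def to_json_dict(self) -> dict:
        """Convert to JSON-serializable dict."""
        return self.model_dump(mode="json", exclude_none=True)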

Summary of Refactoring Benefits

  1. Reduced Code Duplication: Eliminate an estimated 30-40% of the duplicated code
  2. Consistent Error Handling: Standardize error messages and handling
  3. Easier Maintenance: Central location for common patterns
  4. Better Testing: Easier to unit test shared utilities
  5. Type Safety: Shared type hints and validation
  6. Performance: Potential to optimize common operations in one place

Implementation Priority

  1. High Priority:

    • JSON utilities (used everywhere)
    • Datetime utilities (critical for sorting and display)
    • Error handling decorators (improves UX consistency)
  2. Medium Priority:

    • Path utilities
    • UI/Console utilities
    • Config utilities
  3. Low Priority:

    • Base model classes (requires more refactoring)
    • Git store enhancements (already well-structured)