Manage Atom feeds in a persistent git repository
# Code Duplication Analysis for Thicket

## 1. Duplicate JSON Handling Code

### Pattern: JSON file reading/writing
**Locations:**
- `src/thicket/cli/commands/generate.py:230` - Reading JSON with `json.load(f)`
- `src/thicket/cli/commands/generate.py:249` - Reading links.json
- `src/thicket/cli/commands/index.py:2305` - Reading JSON
- `src/thicket/cli/commands/index.py:2320` - Writing JSON with `json.dump()`
- `src/thicket/cli/commands/threads.py:2456` - Reading JSON
- `src/thicket/cli/commands/info.py:2683` - Reading JSON
- `src/thicket/core/git_store.py:5546` - Writing JSON with custom serializer
- `src/thicket/core/git_store.py:5556` - Reading JSON
- `src/thicket/core/git_store.py:5566` - Writing JSON
- `src/thicket/core/git_store.py:5656` - Writing JSON with model dump

**Recommendation:** Create a shared `json_utils.py` module:
```python
import json
from pathlib import Path

from pydantic import BaseModel

def read_json_file(path: Path) -> dict:
    """Read a JSON file; I/O and parse errors propagate to the caller."""
    with open(path) as f:
        return json.load(f)

def write_json_file(path: Path, data: dict, indent: int = 2) -> None:
    """Write JSON file with consistent formatting."""
    with open(path, "w") as f:
        json.dump(data, f, indent=indent, default=str)

def write_model_json(path: Path, model: BaseModel, indent: int = 2) -> None:
    """Write Pydantic model as JSON."""
    with open(path, "w") as f:
        json.dump(model.model_dump(mode="json", exclude_none=True), f, indent=indent, default=str)
```

## 2. Repeated Datetime Handling

### Pattern: datetime formatting and fallback handling
**Locations:**
- `src/thicket/cli/commands/generate.py:241` - `key=lambda x: x[1].updated or x[1].published or datetime.min`
- `src/thicket/cli/commands/generate.py:353` - Same pattern in thread sorting
- `src/thicket/cli/commands/generate.py:359` - Same pattern for max date
- `src/thicket/cli/commands/generate.py:625` - Same pattern
- `src/thicket/cli/commands/generate.py:655` - `entry.updated or entry.published or datetime.min`
- `src/thicket/cli/commands/generate.py:689` - Same pattern
- `src/thicket/cli/commands/generate.py:702` - Same pattern
- Multiple `.strftime('%Y-%m-%d')` calls throughout

**Recommendation:** Create a shared `datetime_utils.py` module:
```python
from datetime import datetime

# AtomEntry is Thicket's existing entry model.

def get_entry_date(entry: AtomEntry) -> datetime:
    """Get the most relevant date for an entry with fallback."""
    return entry.updated or entry.published or datetime.min

def format_date_short(dt: datetime) -> str:
    """Format datetime as YYYY-MM-DD."""
    return dt.strftime('%Y-%m-%d')

def format_date_full(dt: datetime) -> str:
    """Format datetime as YYYY-MM-DD HH:MM."""
    return dt.strftime('%Y-%m-%d %H:%M')

def format_date_iso(dt: datetime) -> str:
    """Format datetime as ISO string."""
    return dt.isoformat()
```

## 3. Path Handling Patterns

### Pattern: Directory creation and existence checks
**Locations:**
- `src/thicket/cli/commands/generate.py:225` - `if user_dir.exists()`
- `src/thicket/cli/commands/generate.py:247` - `if links_file.exists()`
- `src/thicket/cli/commands/generate.py:582` - `self.output_dir.mkdir(parents=True, exist_ok=True)`
- `src/thicket/cli/commands/generate.py:585-586` - Multiple mkdir calls
- `src/thicket/cli/commands/threads.py:2449` - `if not index_path.exists()`
- `src/thicket/cli/commands/info.py:2681` - `if links_path.exists()`
- `src/thicket/core/git_store.py:5515` - `if not self.repo_path.exists()`
- `src/thicket/core/git_store.py:5586` - `user_dir.mkdir(exist_ok=True)`
- Many more similar patterns

**Recommendation:** Create a shared `path_utils.py` module:
```python
import json
from pathlib import Path
from typing import Any, Union

def ensure_directory(path: Path) -> Path:
    """Ensure directory exists, creating if necessary."""
    path.mkdir(parents=True, exist_ok=True)
    return path

def read_json_if_exists(path: Path, default: Any = None) -> Any:
    """Read JSON file if it exists, otherwise return default."""
    if path.exists():
        with open(path) as f:
            return json.load(f)
    return default

def safe_path_join(*parts: Union[str, Path]) -> Path:
    """Safely join path components."""
    return Path(*parts)
```

## 4. Progress Bar and Console Output

### Pattern: Progress bar creation and updates
**Locations:**
- `src/thicket/cli/commands/generate.py:209` - Progress with SpinnerColumn
- `src/thicket/cli/commands/index.py:2230` - Same Progress pattern
- Multiple `console.print()` calls with similar formatting patterns
- Progress update patterns repeated

**Recommendation:** Create a shared `ui_utils.py` module:
```python
from rich.console import Console
from rich.progress import Progress, SpinnerColumn, TaskID, TextColumn

console = Console()

def create_progress_spinner(description: str) -> tuple[Progress, TaskID]:
    """Create a standard progress spinner."""
    progress = Progress(
        SpinnerColumn(),
        TextColumn("[progress.description]{task.description}"),
        transient=True,
    )
    task = progress.add_task(description)
    return progress, task

def print_success(message: str) -> None:
    """Print success message with consistent formatting."""
    console.print(f"[green]✓[/green] {message}")

def print_error(message: str) -> None:
    """Print error message with consistent formatting."""
    console.print(f"[red]Error: {message}[/red]")

def print_warning(message: str) -> None:
    """Print warning message with consistent formatting."""
    console.print(f"[yellow]Warning: {message}[/yellow]")
```

## 5. Git Store Operations

### Pattern: Entry file operations
**Locations:**
- Multiple patterns of loading entries from user directories
- Repeated safe_id generation
- Repeated user directory path construction

**Recommendation:** Enhance GitStore with helper methods:
```python
from pathlib import Path
from typing import Iterator

# Methods to add to the existing GitStore class.

def get_user_dir(self, username: str) -> Path:
    """Get user directory path."""
    return self.repo_path / username

def iter_user_entries(self, username: str) -> Iterator[tuple[Path, AtomEntry]]:
    """Iterate over all entries for a user."""
    user_dir = self.get_user_dir(username)
    if user_dir.exists():
        for entry_file in user_dir.glob("*.json"):
            if entry_file.name not in ["index.json", "duplicates.json"]:
                try:
                    entry = self.read_entry_file(entry_file)
                    yield entry_file, entry
                except Exception:
                    # Skip entries that fail to load.
                    continue
```

## 6. Error Handling Patterns

### Pattern: Try-except with console error printing
**Locations:**
- Similar error handling patterns throughout CLI commands
- Repeated `raise typer.Exit(1)` patterns
- Similar exception message formatting

**Recommendation:** Create error handling decorators:
```python
import functools

import typer
from pydantic import ValidationError
from rich.console import Console

console = Console()

def handle_cli_errors(func):
    """Decorator to handle CLI command errors consistently."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except typer.Exit:
            # typer.Exit subclasses RuntimeError, so re-raise it before the
            # generic handler can turn an intentional exit into an error.
            raise
        except ValidationError as e:
            console.print(f"[red]Validation error: {e}[/red]")
            raise typer.Exit(1)
        except Exception as e:
            console.print(f"[red]Error: {e}[/red]")
            if kwargs.get('verbose'):
                console.print_exception()
            raise typer.Exit(1)
    return wrapper
```

## 7. Configuration and Validation

### Pattern: Config file loading and validation
**Locations:**
- Repeated config loading pattern in every CLI command
- Similar validation patterns for URLs and paths

**Recommendation:** Create a `config_utils.py` module:
```python
from pathlib import Path
from typing import Optional

from pydantic import HttpUrl, ValidationError

# ThicketConfig, ConfigError and load_config are Thicket's own names.

def load_config_with_defaults(config_path: Optional[Path] = None) -> ThicketConfig:
    """Load config with standard defaults and error handling."""
    if config_path is None:
        config_path = Path("thicket.yaml")

    if not config_path.exists():
        raise ConfigError(f"Configuration file not found: {config_path}")

    return load_config(config_path)

def validate_url(url: str) -> HttpUrl:
    """Validate and return URL with consistent error handling."""
    try:
        return HttpUrl(url)
    except ValidationError as e:
        raise ConfigError(f"Invalid URL: {url}") from e
```

## 8. Model Serialization

### Pattern: Pydantic model JSON encoding
**Locations:**
- Repeated `json_encoders={datetime: lambda v: v.isoformat()}` in model configs
- Similar model_dump patterns

**Recommendation:** Create base model class:
```python
from datetime import datetime

from pydantic import BaseModel, ConfigDict

class ThicketBaseModel(BaseModel):
    """Base model with common configuration."""
    model_config = ConfigDict(
        # json_encoders is deprecated in Pydantic v2; it is kept here only to
        # mirror the existing model configs.
        json_encoders={datetime: lambda v: v.isoformat()},
        str_strip_whitespace=True,
    )

    def to_json_dict(self) -> dict:
        """Convert to JSON-serializable dict."""
        return self.model_dump(mode="json", exclude_none=True)
```

## Summary of Refactoring Benefits

1. **Reduced Code Duplication**: Eliminate 30-40% of duplicate code
2. **Consistent Error Handling**: Standardize error messages and handling
3. **Easier Maintenance**: Central location for common patterns
4. **Better Testing**: Easier to unit test shared utilities
5. **Type Safety**: Shared type hints and validation
6. **Performance**: Potential to optimize common operations in one place

## Implementation Priority

1. **High Priority**:
   - JSON utilities (used everywhere)
   - Datetime utilities (critical for sorting and display)
   - Error handling decorators (improves UX consistency)

2. **Medium Priority**:
   - Path utilities
   - UI/Console utilities
   - Config utilities

3. **Low Priority**:
   - Base model classes (requires more refactoring)
   - Git store enhancements (already well-structured)
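
As a rough illustration of how the two highest-priority helpers (JSON and datetime utilities) could work together, here is a minimal, self-contained sketch. The `Entry` dataclass is a hypothetical stand-in for Thicket's `AtomEntry`, and the `index.json` payload is invented for the example. It is not taken from the Thicket codebase.

```python
import json
import tempfile
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import Any, Optional


def write_json_file(path: Path, data: Any, indent: int = 2) -> None:
    """Write JSON; values json cannot encode (e.g. datetime) fall back to str()."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=indent, default=str)


def read_json_file(path: Path) -> Any:
    """Read and parse a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


@dataclass
class Entry:
    """Hypothetical stand-in for AtomEntry; only the date fields matter here."""
    title: str
    updated: Optional[datetime] = None
    published: Optional[datetime] = None


def get_entry_date(entry: Entry) -> datetime:
    """Prefer `updated`, then `published`, then the earliest representable date."""
    return entry.updated or entry.published or datetime.min


entries = [
    Entry("no dates"),
    Entry("published only", published=datetime(2024, 1, 5)),
    Entry("updated", updated=datetime(2024, 3, 1), published=datetime(2023, 12, 1)),
]

# One shared key function replaces the repeated inline lambda.
newest_first = sorted(entries, key=get_entry_date, reverse=True)
titles = [e.title for e in newest_first]

# Round-trip through the shared JSON helpers in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "index.json"
    write_json_file(path, {"titles": titles, "generated": datetime(2024, 3, 2)})
    loaded = read_json_file(path)
```

One caveat if the real entries carry timezone-aware datetimes: `datetime.min` is naive, and Python raises `TypeError` when comparing naive and aware values, so a shared `get_entry_date` would be the natural single place to normalize them.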