Manage Atom feeds in a persistent git repository
# Code Duplication Analysis for Thicket

## 1. Duplicate JSON Handling Code

### Pattern: JSON file reading/writing
**Locations:**
- `src/thicket/cli/commands/generate.py:230` - Reading JSON with `json.load(f)`
- `src/thicket/cli/commands/generate.py:249` - Reading `links.json`
- `src/thicket/cli/commands/index.py:2305` - Reading JSON
- `src/thicket/cli/commands/index.py:2320` - Writing JSON with `json.dump()`
- `src/thicket/cli/commands/threads.py:2456` - Reading JSON
- `src/thicket/cli/commands/info.py:2683` - Reading JSON
- `src/thicket/core/git_store.py:5546` - Writing JSON with a custom serializer
- `src/thicket/core/git_store.py:5556` - Reading JSON
- `src/thicket/core/git_store.py:5566` - Writing JSON
- `src/thicket/core/git_store.py:5656` - Writing JSON with `model_dump()`

**Recommendation:** Create a shared `json_utils.py` module:
```python
import json
from pathlib import Path
from typing import Any

from pydantic import BaseModel


def read_json_file(path: Path) -> Any:
    """Read and parse a JSON file.

    FileNotFoundError and json.JSONDecodeError propagate, so every caller
    sees the same exceptions and can handle them in one place.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def write_json_file(path: Path, data: Any, indent: int = 2) -> None:
    """Write JSON with consistent formatting; unknown types fall back to str()."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=indent, default=str)


def write_model_json(path: Path, model: BaseModel, indent: int = 2) -> None:
    """Write a Pydantic model as JSON (mode="json" already converts datetimes)."""
    data = model.model_dump(mode="json", exclude_none=True)
    write_json_file(path, data, indent=indent)
```

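`write_json_file` passes `default=str`, so a `datetime` inside plain data becomes `str(dt)`, which puts a space between date and time, while `write_model_json` goes through Pydantic's `mode="json"`, which writes ISO 8601 with a `T`. The store could therefore end up holding two date formats. A minimal sketch of a shared fallback serializer, assuming ISO 8601 is the format to standardize on; `json_default` is a hypothetical helper name:

```python
import json
from datetime import date, datetime
from typing import Any


def json_default(value: Any) -> Any:
    """Fallback serializer for json.dump: ISO 8601 for dates, str() otherwise."""
    # datetime is a subclass of date, so this one check covers both types
    if isinstance(value, date):
        return value.isoformat()
    return str(value)


stamp = datetime(2024, 1, 2, 3, 4, 5)
print(json.dumps({"updated": stamp}, default=str))
# {"updated": "2024-01-02 03:04:05"}
print(json.dumps({"updated": stamp}, default=json_default))
# {"updated": "2024-01-02T03:04:05"}
```

Passing `default=json_default` instead of `default=str` in `write_json_file` would make both writers agree on dates while keeping the `str()` fallback for other types such as `Path`.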
## 2. Repeated Datetime Handling

### Pattern: Datetime formatting and fallback handling
**Locations:**
- `src/thicket/cli/commands/generate.py:241` - `key=lambda x: x[1].updated or x[1].published or datetime.min`
- `src/thicket/cli/commands/generate.py:353` - Same pattern in thread sorting
- `src/thicket/cli/commands/generate.py:359` - Same pattern for the maximum date
- `src/thicket/cli/commands/generate.py:625` - Same pattern
- `src/thicket/cli/commands/generate.py:655` - `entry.updated or entry.published or datetime.min`
- `src/thicket/cli/commands/generate.py:689` - Same pattern
- `src/thicket/cli/commands/generate.py:702` - Same pattern
- Multiple `.strftime('%Y-%m-%d')` calls throughout

**Recommendation:** Create a shared `datetime_utils.py` module:
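```python
from __future__ import annotations  # AtomEntry below is only used as an annotation

from datetime import datetime

# AtomEntry is Thicket's entry model; import it from wherever the project defines it.


def get_entry_date(entry: AtomEntry) -> datetime:
    """Get the most relevant date for an entry: updated, then published."""
    # datetime.min is naive. If entry dates are timezone-aware, use an aware
    # sentinel such as datetime.min.replace(tzinfo=timezone.utc) instead:
    # ordering naive against aware datetimes raises TypeError.
    return entry.updated or entry.published or datetime.min


def format_date_short(dt: datetime) -> str:
    """Format datetime as YYYY-MM-DD."""
    return dt.strftime("%Y-%m-%d")


def format_date_full(dt: datetime) -> str:
    """Format datetime as YYYY-MM-DD HH:MM."""
    return dt.strftime("%Y-%m-%d %H:%M")


def format_date_iso(dt: datetime) -> str:
    """Format datetime as an ISO 8601 string."""
    return dt.isoformat()
```
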
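The fallback matters most when sorting. A small sketch of newest-first ordering, assuming entry dates are timezone-aware UTC values; `EntryStub` is a hypothetical stand-in for `AtomEntry` with only the two date fields, and `OLDEST` is an illustrative aware replacement for `datetime.min`:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class EntryStub:
    """Hypothetical stand-in for AtomEntry with only the date fields."""

    title: str
    updated: Optional[datetime] = None
    published: Optional[datetime] = None


# Aware replacement for the naive datetime.min sentinel.
OLDEST = datetime.min.replace(tzinfo=timezone.utc)


def get_entry_date(entry: EntryStub) -> datetime:
    return entry.updated or entry.published or OLDEST


entries = [
    EntryStub("older", published=datetime(2024, 3, 1, tzinfo=timezone.utc)),
    EntryStub("newer", updated=datetime(2024, 5, 1, tzinfo=timezone.utc)),
    EntryStub("undated"),
]
newest_first = sorted(entries, key=get_entry_date, reverse=True)
print([e.title for e in newest_first])  # ['newer', 'older', 'undated']

# Mixing the naive sentinel with aware dates fails at comparison time.
naive_sentinel_fails = False
try:
    sorted([datetime.min, datetime(2024, 1, 1, tzinfo=timezone.utc)])
except TypeError:
    naive_sentinel_fails = True
```

Undated entries sink to the end instead of crashing the sort, which is the behavior the repeated `or datetime.min` expressions aim for.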
## 3. Path Handling Patterns

### Pattern: Directory creation and existence checks
**Locations:**
- `src/thicket/cli/commands/generate.py:225` - `if user_dir.exists()`
- `src/thicket/cli/commands/generate.py:247` - `if links_file.exists()`
- `src/thicket/cli/commands/generate.py:582` - `self.output_dir.mkdir(parents=True, exist_ok=True)`
- `src/thicket/cli/commands/generate.py:585-586` - Multiple `mkdir` calls
- `src/thicket/cli/commands/threads.py:2449` - `if not index_path.exists()`
- `src/thicket/cli/commands/info.py:2681` - `if links_path.exists()`
- `src/thicket/core/git_store.py:5515` - `if not self.repo_path.exists()`
- `src/thicket/core/git_store.py:5586` - `user_dir.mkdir(exist_ok=True)`
- Many more similar patterns

**Recommendation:** Create a shared `path_utils.py` module:
```python
import json
from pathlib import Path
from typing import Any, Union


def ensure_directory(path: Path) -> Path:
    """Ensure directory exists, creating parents if necessary."""
    path.mkdir(parents=True, exist_ok=True)
    return path


def read_json_if_exists(path: Path, default: Any = None) -> Any:
    """Read JSON file if it exists, otherwise return default."""
    if path.exists():
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    return default


def safe_path_join(*parts: Union[str, Path]) -> Path:
    """Join path components into a Path.

    Note: this does not sanitize ".." or absolute components.
    """
    return Path(*parts)
```

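`read_json_if_exists` checks for the file and then opens it, which costs an extra filesystem call and leaves a small window in which the file can disappear between the two steps. An EAFP ("easier to ask forgiveness than permission") sketch of the same helper; `read_json_or_default` is a hypothetical name and the `links.json` contents are made up:

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def read_json_or_default(path: Path, default: Any = None) -> Any:
    """Return parsed JSON from path, or default if the file does not exist."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        # Only a missing file maps to the default; invalid JSON or permission
        # errors still propagate to the caller.
        return default


with tempfile.TemporaryDirectory() as tmp:
    links_file = Path(tmp) / "links.json"
    missing = read_json_or_default(links_file, default={})
    links_file.write_text('{"count": 1}', encoding="utf-8")
    present = read_json_or_default(links_file, default={})

print(missing, present)  # {} {'count': 1}
```

Narrowing the `except` to `FileNotFoundError` keeps corrupt files visible instead of silently treating them as absent.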
## 4. Progress Bar and Console Output

### Pattern: Progress bar creation and updates
**Locations:**
- `src/thicket/cli/commands/generate.py:209` - `Progress` with `SpinnerColumn`
- `src/thicket/cli/commands/index.py:2230` - Same `Progress` pattern
- Multiple `console.print()` calls with similar formatting
- Repeated progress-update patterns

**Recommendation:** Create a shared `ui_utils.py` module:
```python
from rich.console import Console
from rich.markup import escape
from rich.progress import Progress, SpinnerColumn, TaskID, TextColumn

console = Console()  # or the project's existing shared console


def create_progress_spinner(description: str) -> tuple[Progress, TaskID]:
    """Create a standard transient spinner; run it with `with progress:`."""
    progress = Progress(
        SpinnerColumn(),
        TextColumn("[progress.description]{task.description}"),
        transient=True,
    )
    task = progress.add_task(description)
    return progress, task


# escape() keeps square brackets in messages from being parsed as Rich markup.
def print_success(message: str) -> None:
    """Print success message with consistent formatting."""
    console.print(f"[green]✓[/green] {escape(message)}")


def print_error(message: str) -> None:
    """Print error message with consistent formatting."""
    console.print(f"[red]Error: {escape(message)}[/red]")


def print_warning(message: str) -> None:
    """Print warning message with consistent formatting."""
    console.print(f"[yellow]Warning: {escape(message)}[/yellow]")
```

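Interpolating arbitrary text into Rich markup means a message that happens to contain square brackets, such as a feed title, can be swallowed as a style tag. A small sketch that records console output to show the difference; the message text is made up, while `Console(record=True)`, `export_text()` and `rich.markup.escape` are standard Rich APIs:

```python
from rich.console import Console
from rich.markup import escape

# record=True keeps a copy of everything printed so it can be exported.
console = Console(record=True, width=100)

message = "unexpected token [bold] in feed title"

console.print(f"[red]Error: {message}[/red]")
unescaped = console.export_text()  # export_text() also clears the record buffer

console.print(f"[red]Error: {escape(message)}[/red]")
escaped = console.export_text()

print(repr(unescaped))  # "[bold]" was consumed as a style tag
print(repr(escaped))    # "[bold]" survives as literal text
```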
## 5. Git Store Operations

### Pattern: Entry file operations
**Locations:**
- Multiple patterns of loading entries from user directories
- Repeated `safe_id` generation
- Repeated user directory path construction

**Recommendation:** Enhance `GitStore` with helper methods:
```python
from collections.abc import Iterator
from pathlib import Path

# Proposed GitStore methods (class wrapper omitted). `AtomEntry` is Thicket's
# entry model; `read_entry_file` is assumed to parse one entry file.


def get_user_dir(self, username: str) -> Path:
    """Get user directory path."""
    return self.repo_path / username


def iter_user_entries(self, username: str) -> Iterator[tuple[Path, "AtomEntry"]]:
    """Iterate over all entries for a user, skipping metadata files."""
    user_dir = self.get_user_dir(username)
    if not user_dir.exists():
        return
    for entry_file in user_dir.glob("*.json"):
        if entry_file.name in {"index.json", "duplicates.json"}:
            continue  # per-user metadata, not entries
        try:
            entry = self.read_entry_file(entry_file)
        except Exception:
            continue  # skip unreadable entries (consider logging them)
        yield entry_file, entry
```

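Swallowing every exception with `continue` hides corrupt entry files. A standalone sketch of the same iteration that logs what it skips and narrows the `except`; `iter_entry_files`, `read_entry` and `METADATA_FILES` are hypothetical names, and a plain `json.loads` reader stands in for `read_entry_file`:

```python
import json
import logging
import tempfile
from collections.abc import Callable, Iterator
from pathlib import Path
from typing import Any

logger = logging.getLogger(__name__)

# Files that live next to entry files but are not entries.
METADATA_FILES = frozenset({"index.json", "duplicates.json"})


def iter_entry_files(
    user_dir: Path, read_entry: Callable[[Path], Any]
) -> Iterator[tuple[Path, Any]]:
    """Yield (path, entry) pairs, logging instead of silently skipping bad files."""
    if not user_dir.is_dir():
        return
    for entry_file in sorted(user_dir.glob("*.json")):
        if entry_file.name in METADATA_FILES:
            continue
        try:
            entry = read_entry(entry_file)
        except (OSError, ValueError) as exc:
            logger.warning("Skipping unreadable entry %s: %s", entry_file, exc)
            continue
        yield entry_file, entry


with tempfile.TemporaryDirectory() as tmp:
    user_dir = Path(tmp)
    (user_dir / "post-1.json").write_text('{"title": "Hello"}')
    (user_dir / "index.json").write_text("{}")
    (user_dir / "broken.json").write_text("{not json")
    reader = lambda p: json.loads(p.read_text())
    found = [(p.name, e) for p, e in iter_entry_files(user_dir, reader)]

print(found)  # [('post-1.json', {'title': 'Hello'})]
```

Catching `OSError` and `ValueError` covers unreadable files, invalid JSON (`json.JSONDecodeError` is a `ValueError`) and, with Pydantic v2, `ValidationError`, which also subclasses `ValueError`.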
## 6. Error Handling Patterns

### Pattern: Try-except with console error printing
**Locations:**
- Similar error handling patterns throughout CLI commands
- Repeated `raise typer.Exit(1)` patterns
- Similar exception message formatting

**Recommendation:** Create error handling decorators:
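```python
import functools

import typer
from pydantic import ValidationError
from rich.console import Console

console = Console()  # or the project's existing shared console


def handle_cli_errors(func):
    """Decorator to handle CLI command errors consistently."""
    @functools.wraps(func)  # preserves the signature Typer inspects for options
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except typer.Exit:
            # typer.Exit subclasses RuntimeError; re-raise it so deliberate
            # exits keep their own code instead of becoming "Error: ..."
            raise
        except ValidationError as e:
            console.print(f"[red]Validation error: {e}[/red]")
            raise typer.Exit(1) from e
        except Exception as e:
            console.print(f"[red]Error: {e}[/red]")
            if kwargs.get("verbose"):
                console.print_exception()
            raise typer.Exit(1) from e
    return wrapper
```
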
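Because `typer.Exit` is Click's `Exit` exception, which subclasses `RuntimeError`, a bare `except Exception` in such a decorator would also catch deliberate exits. A sketch of exercising the pattern with Typer's `CliRunner`, using a simplified copy of the decorator (plain `typer.echo` instead of a Rich console) and a hypothetical `sync` command with a `--fail` flag:

```python
import functools

import typer
from typer.testing import CliRunner


def handle_cli_errors(func):
    """Simplified decorator: turn unexpected errors into exit code 1."""

    @functools.wraps(func)  # keeps the signature Typer reads options from
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except typer.Exit:
            raise  # deliberate exits keep their own code
        except Exception as e:
            typer.echo(f"Error: {e}")
            raise typer.Exit(1) from e

    return wrapper


app = typer.Typer()


@app.command()
@handle_cli_errors
def sync(fail: bool = False) -> None:
    """Hypothetical command that exists only to exercise the decorator."""
    if fail:
        raise ValueError("feed unreachable")
    typer.echo("synced")


runner = CliRunner()
ok = runner.invoke(app, [])
failed = runner.invoke(app, ["--fail"])
print(ok.exit_code, failed.exit_code)  # 0 1
```

Testing through `CliRunner` checks both the printed message and the exit code, which is what users of the CLI actually observe.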
## 7. Configuration and Validation

### Pattern: Config file loading and validation
**Locations:**
- Repeated config loading pattern in every CLI command
- Similar validation patterns for URLs and paths

**Recommendation:** Create a `config_utils.py` module:
```python
from __future__ import annotations

from pathlib import Path
from typing import Optional

from pydantic import HttpUrl, ValidationError

# ThicketConfig, ConfigError and load_config are assumed to be Thicket's
# existing config class, exception and loader; import them from the project.


def load_config_with_defaults(config_path: Optional[Path] = None) -> ThicketConfig:
    """Load config with standard defaults and error handling."""
    if config_path is None:
        config_path = Path("thicket.yaml")  # resolved against the current directory

    if not config_path.exists():
        raise ConfigError(f"Configuration file not found: {config_path}")

    return load_config(config_path)


def validate_url(url: str) -> HttpUrl:
    """Validate and return URL with consistent error handling."""
    try:
        return HttpUrl(url)
    except ValidationError as e:
        raise ConfigError(f"Invalid URL: {url}") from e
```

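Whether `HttpUrl(url)` validates on construction depends on the Pydantic version (earlier 2.x releases defined `HttpUrl` as an `Annotated` alias rather than a class). A version-independent sketch for Pydantic v2 goes through `TypeAdapter`; `ValueError` stands in for Thicket's `ConfigError` to keep the example self-contained:

```python
from pydantic import HttpUrl, TypeAdapter, ValidationError

# Build the adapter once and reuse it for every call.
_HTTP_URL = TypeAdapter(HttpUrl)


def validate_url(url: str) -> HttpUrl:
    """Validate an http(s) URL, re-raising failures as one error type."""
    try:
        return _HTTP_URL.validate_python(url)
    except ValidationError as e:
        # ValueError stands in for Thicket's ConfigError in this sketch.
        raise ValueError(f"Invalid URL: {url}") from e


feed = validate_url("https://example.com/feed.xml")
print(feed.scheme, feed.host)  # https example.com

rejected = []
for candidate in ("ftp://example.com/feed.xml", "not a url"):
    try:
        validate_url(candidate)
    except ValueError:
        rejected.append(candidate)
print(rejected)  # ['ftp://example.com/feed.xml', 'not a url']
```

`HttpUrl` only accepts the `http` and `https` schemes, so the `ftp://` URL is rejected along with the malformed string.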
## 8. Model Serialization

### Pattern: Pydantic model JSON encoding
**Locations:**
- Repeated `json_encoders={datetime: lambda v: v.isoformat()}` in model configs
- Similar `model_dump` patterns

**Recommendation:** Create a base model class:
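```python
from pydantic import BaseModel, ConfigDict


class ThicketBaseModel(BaseModel):
    """Base model with common configuration."""

    # json_encoders is deprecated in Pydantic v2, and model_dump(mode="json")
    # already serializes datetimes as ISO 8601 strings. The exact text can
    # differ slightly from datetime.isoformat() (e.g. for UTC offsets); add a
    # field_serializer if that exact output must be preserved.
    model_config = ConfigDict(str_strip_whitespace=True)

    def to_json_dict(self) -> dict:
        """Convert to JSON-serializable dict."""
        return self.model_dump(mode="json", exclude_none=True)
```
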
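A subclass picks up the shared behavior without restating any encoder configuration. A small sketch assuming Pydantic v2; `EntrySketch` is a hypothetical model, not Thicket's `AtomEntry`. It shows whitespace stripping, `None` fields dropped by `exclude_none`, and a naive `datetime` serialized as an ISO 8601 string by `mode="json"`:

```python
from datetime import datetime
from typing import Optional

from pydantic import BaseModel, ConfigDict


class ThicketBaseModel(BaseModel):
    """Minimal base class for the demo."""

    model_config = ConfigDict(str_strip_whitespace=True)

    def to_json_dict(self) -> dict:
        return self.model_dump(mode="json", exclude_none=True)


class EntrySketch(ThicketBaseModel):
    """Hypothetical model for illustration; not Thicket's AtomEntry."""

    title: str
    updated: datetime
    summary: Optional[str] = None


entry = EntrySketch(title="  Hello  ", updated=datetime(2024, 1, 2, 3, 4, 5))
print(entry.to_json_dict())
# {'title': 'Hello', 'updated': '2024-01-02T03:04:05'}
```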
## Summary of Refactoring Benefits

1. **Reduced Code Duplication**: Eliminate an estimated 30-40% of duplicated code
2. **Consistent Error Handling**: Standardize error messages and handling
3. **Easier Maintenance**: One central location for common patterns
4. **Better Testing**: Shared utilities are easier to unit test
5. **Type Safety**: Shared type hints and validation
6. **Performance**: Common operations can be optimized in one place

## Implementation Priority

1. **High Priority**:
   - JSON utilities (used everywhere)
   - Datetime utilities (critical for sorting and display)
   - Error handling decorators (improves UX consistency)

2. **Medium Priority**:
   - Path utilities
   - UI/Console utilities
   - Config utilities

3. **Low Priority**:
   - Base model classes (requires more refactoring)
   - Git store enhancements (already well-structured)