# Email Filter Improvement Workflow

This document describes the streamlined process for improving the email classifier.

## Quick Start

```bash
# 1. Export new emails from Gmail (run in Apps Script)
# Uses: export-from-label.gscript

# 2. Label the exported emails interactively
bun label.ts new-emails.json

# 3. Import labeled emails and see results
bun import-labeled.ts new-emails-labeled.json

# 4. If there are failures, update classifier.ts

# 5. Test and regenerate
bun test
bun run evaluate.ts
bun run generate-gscript.ts
```

## Detailed Workflow

### Step 1: Export Emails from Gmail

In Google Apps Script, run the export script:

```javascript
// Run this in Apps Script console
exportEmailsToDrive()
```

This exports all emails with the `College/Auto` label to a JSON file in your Google Drive.

Download the file to your project directory.

### Step 2: Label Emails Interactively

Use the interactive labeling tool:

```bash
bun label.ts college_emails_export_2025-12-07.json
```

For each email, you'll be prompted:
- `y` - Email is relevant (should go to inbox)
- `n` - Email is not relevant (should be filtered)
- `s` - Skip this email
- `q` - Quit and save labeled emails so far

When marking an email as relevant or not relevant, provide a short reason, such as:
- "password reset"
- "marketing spam"
- "scholarship awarded"
- "unsolicited outreach"

By default, the tool saves the labeled emails to `college_emails_export_2025-12-07-labeled.json`.

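The default output name above follows from the input name. As a minimal sketch, assuming the tool simply inserts `-labeled` before the `.json` extension (the helper name `defaultLabeledPath` is hypothetical and not taken from `label.ts`):

```typescript
// Hypothetical helper: derive the default output path by inserting
// "-labeled" before the ".json" extension. Paths without a ".json"
// suffix just get "-labeled.json" appended.
function defaultLabeledPath(inputPath: string): string {
  const ext = ".json";
  return inputPath.endsWith(ext)
    ? inputPath.slice(0, -ext.length) + "-labeled.json"
    : inputPath + "-labeled.json";
}

console.log(defaultLabeledPath("college_emails_export_2025-12-07.json"));
// prints "college_emails_export_2025-12-07-labeled.json"
```
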
### Step 3: Import and Evaluate

Import the labeled emails into the main dataset:

```bash
bun import-labeled.ts college_emails_export_2025-12-07-labeled.json
```

This will:
1. Merge new labeled emails into `college_emails_export_2025-12-05_labeled.json`
2. Check for duplicates (by `thread_id`)
3. Run the classifier on the new emails
4. Report any failures

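The merge and duplicate check can be pictured roughly as follows. This is a sketch under assumptions, not the actual `import-labeled.ts` logic: only `thread_id` comes from the description above, while the `relevant` and `reason` fields and the `mergeByThreadId` name are hypothetical.

```typescript
// Assumed record shape; only thread_id is named in the workflow description.
interface LabeledEmail {
  thread_id: string;
  relevant: boolean; // hypothetical field
  reason: string; // hypothetical field
}

// Merge incoming records into the existing dataset, skipping any record
// whose thread_id is already present (including repeats within `incoming`).
function mergeByThreadId(
  existing: LabeledEmail[],
  incoming: LabeledEmail[],
): { merged: LabeledEmail[]; added: LabeledEmail[]; duplicates: number } {
  const seen = new Set(existing.map((e) => e.thread_id));
  const added: LabeledEmail[] = [];
  let duplicates = 0;
  for (const email of incoming) {
    if (seen.has(email.thread_id)) {
      duplicates++;
      continue;
    }
    seen.add(email.thread_id);
    added.push(email);
  }
  return { merged: [...existing, ...added], added, duplicates };
}

const dataset: LabeledEmail[] = [
  { thread_id: "t1", relevant: true, reason: "password reset" },
];
const batch: LabeledEmail[] = [
  { thread_id: "t1", relevant: true, reason: "password reset" },
  { thread_id: "t2", relevant: false, reason: "marketing spam" },
];
const result = mergeByThreadId(dataset, batch);
console.log(result.merged.length, result.duplicates); // prints "2 1"
```

Returning the `added` records separately makes it easy to run the classifier on only the new emails, as step 3 of the list describes.
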
### Step 4: Fix Failures

If there are classification failures, update `classifier.ts`:

**For false positives** (the classifier marked an email relevant when it is not):
- Add exclusion patterns to existing rules
- Add new patterns to `checkIrrelevant()`

**For false negatives** (the classifier marked an email not relevant when it is):
- Add new patterns to the appropriate check function
- Ensure patterns are specific enough

Example:
```typescript
// False positive: "I'm eager to consider you" triggered accepted_student
// Fix: Add exclusion in checkAccepted()
// The `i` flag keeps the match working even if `combined` is not lowercased.
if (/\bi'?m\s+eager\s+to\s+consider\s+you\b/i.test(combined)) {
  return null; // Not actually accepted
}
```

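One way to keep a fix from regressing is to pin each past failure as a test case. The sketch below is illustrative only: this `checkAccepted` body, its positive pattern, and the `regressionCases` list are assumptions made for the example, not the real rules in `classifier.ts`.

```typescript
// Hypothetical check function illustrating the exclusion-first structure.
function checkAccepted(combined: string): string | null {
  // Exclusions run first so look-alike marketing phrases bail out early.
  const exclusions = [/\bi'?m\s+eager\s+to\s+consider\s+you\b/i];
  if (exclusions.some((re) => re.test(combined))) return null;

  // Illustrative positive pattern only; the real rules live in classifier.ts.
  return /\bcongratulations\b.*\baccepted\b/i.test(combined)
    ? "accepted_student"
    : null;
}

// Each past failure becomes a pinned case with its expected result.
const regressionCases: Array<{ text: string; expected: string | null }> = [
  { text: "Congratulations, you have been accepted!", expected: "accepted_student" },
  {
    text: "Congratulations on your progress. I'm eager to consider you alongside our accepted students.",
    expected: null,
  },
];

const failures = regressionCases.filter(
  (c) => checkAccepted(c.text) !== c.expected,
);
console.log(`${failures.length} regression failure(s)`);
// prints "0 regression failure(s)"
```
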
### Step 5: Test and Deploy

```bash
# Run unit tests
bun test

# Run full evaluation on all labeled emails
bun run evaluate.ts college_emails_export_2025-12-05_labeled.json

# Generate updated GScript
bun run generate-gscript.ts

# Copy filter-hybrid.gscript to Apps Script and deploy
```

## File Structure

- `export-from-label.gscript` - Apps Script to export emails from Gmail
- `label.ts` - Interactive CLI for labeling emails
- `import-labeled.ts` - Import labeled emails and evaluate
- `classifier.ts` - TypeScript classifier (source of truth)
- `generate-gscript.ts` - Generate GScript from TypeScript
- `filter-hybrid.gscript` - Generated GScript for Gmail
- `college_emails_export_*_labeled.json` - Main labeled dataset

## Tips

### Labeling Best Practices

- **Be consistent** - Use similar reasons for similar emails
- **Be specific** - Name the actual category, e.g. "marketing spam" vs "scholarship not awarded"
- **Label in batches** - Do 10-20 at a time to stay focused
- **When in doubt** - Mark as not relevant (safer to filter)

### Common Email Categories

**Relevant** (should go to inbox):
- Password resets / security alerts
- Application confirmations
- Enrollment confirmations
- Scholarships actually awarded
- Financial aid offers ready
- Dual enrollment course info
- Accepted student portal access

**Not Relevant** (should be filtered):
- Marketing / newsletters
- Unsolicited outreach
- Application reminders
- Scholarship eligibility (not awarded)
- FAFSA reminders
- Campus events / tours
- Deadline extensions

### Classifier Pattern Tips

1. **Test patterns broadly** - Use `combined` (subject + body) for most checks
2. **Add exclusions** - Marketing often uses similar words to real notifications
3. **Be specific** - "admission decision ready" vs "you will receive an admission decision"
4. **Check order matters** - More specific checks should come before general ones

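To illustrate why check order matters, here is a small sketch in which the first non-null result wins. The `Check` type, the two check functions, and the category names `decision_ready` and `decision_reminder` are hypothetical; the real classifier may be organized differently.

```typescript
// A check returns a category name, or null when it does not apply.
type Check = (combined: string) => string | null;

// Specific: a decision is available now.
const checkDecisionReady: Check = (c) =>
  /\badmission decision (is )?ready\b/i.test(c) ? "decision_ready" : null;

// General: any mention of an admission decision counts as a reminder.
const checkDecisionMention: Check = (c) =>
  /\badmission decision\b/i.test(c) ? "decision_reminder" : null;

// Run the checks in order; the first non-null result wins.
function classify(combined: string, checks: Check[]): string | null {
  for (const check of checks) {
    const result = check(combined);
    if (result !== null) return result;
  }
  return null;
}

const subject = "Your admission decision is ready";
console.log(classify(subject, [checkDecisionReady, checkDecisionMention]));
// prints "decision_ready"
console.log(classify(subject, [checkDecisionMention, checkDecisionReady]));
// prints "decision_reminder" (the general check shadows the specific one)
```
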
## Maintenance

### Regular Tasks

1. **Weekly**: Export new emails and label them
2. **Monthly**: Review classifier accuracy
3. **As needed**: Update patterns for new spam types

### Monitoring

Check accuracy metrics after each import:
- **Target accuracy**: >95%
- **Target precision**: >90% (low false positives)
- **Target recall**: >95% (low false negatives)

If metrics drop below targets, review recent failures and update patterns.
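
For reference, the three metrics can be computed from labeled outcomes as in the sketch below, treating "relevant" as the positive class. The `Outcome` shape and the function names are assumptions for illustration; `evaluate.ts` may compute and report these differently.

```typescript
interface Outcome {
  expected: boolean; // human label: true = relevant
  predicted: boolean; // classifier output: true = relevant
}

interface Metrics {
  accuracy: number;
  precision: number;
  recall: number;
}

function computeMetrics(outcomes: Outcome[]): Metrics {
  let tp = 0, fp = 0, tn = 0, fn = 0;
  for (const { expected, predicted } of outcomes) {
    if (predicted && expected) tp++;
    else if (predicted && !expected) fp++; // false positive
    else if (!predicted && expected) fn++; // false negative
    else tn++;
  }
  // Report 0 when a denominator is empty instead of dividing by zero.
  const ratio = (n: number, d: number) => (d === 0 ? 0 : n / d);
  return {
    accuracy: ratio(tp + tn, outcomes.length),
    precision: ratio(tp, tp + fp),
    recall: ratio(tp, tp + fn),
  };
}

// Targets taken from the list above.
function meetsTargets(m: Metrics): boolean {
  return m.accuracy > 0.95 && m.precision > 0.9 && m.recall > 0.95;
}
```

Calling `meetsTargets(computeMetrics(outcomes))` after each import gives a single pass/fail signal for deciding whether the patterns need another review.
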