# Email Filter Improvement Workflow

This document describes the streamlined process for improving the email classifier.

## Quick Start

```bash
# 1. Export new emails from Gmail (run in Apps Script)
#    Uses: export-from-label.gscript

# 2. Label the exported emails interactively
bun label.ts new-emails.json

# 3. Import labeled emails and see results
bun import-labeled.ts new-emails-labeled.json

# 4. If there are failures, update classifier.ts

# 5. Test and regenerate
bun test
bun run evaluate.ts
bun run generate-gscript.ts
```

## Detailed Workflow

### Step 1: Export Emails from Gmail

In Google Apps Script, run the export script:

```javascript
// Run this in Apps Script console
exportEmailsToDrive()
```

This exports all emails with the `College/Auto` label to a JSON file in your Google Drive.

Download the file to your project directory.

### Step 2: Label Emails Interactively

Use the interactive labeling tool:

```bash
bun label.ts college_emails_export_2025-12-07.json
```

For each email, you'll be prompted:
- `y` - Email is relevant (should go to inbox)
- `n` - Email is not relevant (should be filtered)
- `s` - Skip this email
- `q` - Quit and save labeled emails so far

When marking as relevant/not relevant, provide a short reason like:
- "password reset"
- "marketing spam"
- "scholarship awarded"
- "unsolicited outreach"

The tool saves to `college_emails_export_2025-12-07-labeled.json` by default.

### Step 3: Import and Evaluate

Import the labeled emails into the main dataset:

```bash
bun import-labeled.ts college_emails_export_2025-12-07-labeled.json
```

This will:
1. Merge new labeled emails into `college_emails_export_2025-12-05_labeled.json`
2. Check for duplicates (by thread_id)
3. Run the classifier on the new emails
4. Report any failures

### Step 4: Fix Failures

If there are classification failures, update `classifier.ts`:

**For false positives** (classifier said relevant when it's not):
- Add exclusion patterns to existing rules
- Add new patterns to `checkIrrelevant()`

**For false negatives** (classifier said not relevant when it is):
- Add new patterns to the appropriate check function
- Ensure patterns are specific enough

Example:
```typescript
// False positive: "I'm eager to consider you" triggered accepted_student
// Fix: Add exclusion in checkAccepted()
if (/\bi'?m\s+eager\s+to\s+consider\s+you\b/.test(combined)) {
  return null; // Not actually accepted
}
```

### Step 5: Test and Deploy

```bash
# Run unit tests
bun test

# Run full evaluation on all labeled emails
bun run evaluate.ts college_emails_export_2025-12-05_labeled.json

# Generate updated GScript
bun run generate-gscript.ts

# Copy filter-hybrid.gscript to Apps Script and deploy
```

## File Structure

- `export-from-label.gscript` - Apps Script to export emails from Gmail
- `label.ts` - Interactive CLI for labeling emails
- `import-labeled.ts` - Import labeled emails and evaluate
- `classifier.ts` - TypeScript classifier (source of truth)
- `generate-gscript.ts` - Generate GScript from TypeScript
- `filter-hybrid.gscript` - Generated GScript for Gmail
- `college_emails_export_*_labeled.json` - Main labeled dataset

## Tips

### Labeling Best Practices

- **Be consistent** - Use similar reasons for similar emails
- **Be specific** - "marketing spam" vs "scholarship not awarded"
- **Label in batches** - Do 10-20 at a time to stay focused
- **When in doubt** - Mark as not relevant (safer to filter)

### Common Email Categories

**Relevant** (should go to inbox):
- Password resets / security alerts
- Application confirmations
- Enrollment confirmations
- Scholarships actually awarded
- Financial aid offers ready
- Dual enrollment course info
- Accepted student portal access

**Not Relevant** (should be filtered):
- Marketing / newsletters
- Unsolicited outreach
- Application reminders
- Scholarship eligibility (not awarded)
- FAFSA reminders
- Campus events / tours
- Deadline extensions

### Classifier Pattern Tips

1. **Test patterns broadly** - Use `combined` (subject + body) for most checks
2. **Add exclusions** - Marketing often uses similar words to real notifications
3. **Be specific** - "admission decision ready" vs "you will receive an admission decision"
4. **Check order matters** - More specific checks should come before general ones

## Maintenance

### Regular Tasks

1. **Weekly**: Export new emails and label them
2. **Monthly**: Review classifier accuracy
3. **As needed**: Update patterns for new spam types

### Monitoring

Check accuracy metrics after each import:
- **Target accuracy**: >95%
- **Target precision**: >90% (low false positives)
- **Target recall**: >95% (low false negatives)

If metrics drop below targets, review recent failures and update patterns.
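
For reference, the three monitoring targets can be checked with a few lines of TypeScript. This is a minimal sketch, not the logic in `evaluate.ts` or `import-labeled.ts`: the `LabeledResult` shape and the `computeMetrics` and `missedTargets` helpers are hypothetical names introduced for illustration, and "relevant" is assumed to be the positive class.

```typescript
// Hypothetical result shape; the real dataset fields may differ.
interface LabeledResult {
  expected: boolean;  // human label: true = relevant
  predicted: boolean; // classifier output: true = relevant
}

interface Metrics {
  accuracy: number;
  precision: number;
  recall: number;
}

// Count the confusion-matrix cells, then derive the three ratios.
function computeMetrics(results: LabeledResult[]): Metrics {
  let tp = 0, fp = 0, tn = 0, fn = 0;
  for (const r of results) {
    if (r.predicted && r.expected) tp++;
    else if (r.predicted && !r.expected) fp++; // false positive
    else if (!r.predicted && r.expected) fn++; // false negative
    else tn++;
  }
  // An undefined ratio (zero denominator) is reported as 0 so it is flagged.
  const ratio = (num: number, den: number) => (den === 0 ? 0 : num / den);
  return {
    accuracy: ratio(tp + tn, results.length),
    precision: ratio(tp, tp + fp),
    recall: ratio(tp, tp + fn),
  };
}

// Targets from the Monitoring list: each metric must be strictly above its value.
const TARGETS: Metrics = { accuracy: 0.95, precision: 0.9, recall: 0.95 };

// Return the names of the metrics that do not exceed their target.
function missedTargets(m: Metrics): (keyof Metrics)[] {
  return (Object.keys(TARGETS) as (keyof Metrics)[]).filter(
    (k) => m[k] <= TARGETS[k],
  );
}

// Usage: one result of each kind gives 0.5 for every metric, so all three are missed.
const sample: LabeledResult[] = [
  { expected: true, predicted: true },
  { expected: true, predicted: false },
  { expected: false, predicted: true },
  { expected: false, predicted: false },
];
console.log(computeMetrics(sample), missedTargets(sample.length ? computeMetrics(sample) : TARGETS));
```

Precision falls when false positives rise and recall falls when false negatives rise, so the name returned by `missedTargets` suggests which kind of failure to review first in Step 4.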