# Email Filter Improvement Workflow
This document describes the streamlined process for improving the email classifier.
## Quick Start

```bash
# 1. Export new emails from Gmail (run in Apps Script)
#    Uses: export-from-label.gscript

# 2. Label the exported emails interactively
bun label.ts new-emails.json

# 3. Import labeled emails and see results
bun import-labeled.ts new-emails-labeled.json

# 4. If there are failures, update classifier.ts

# 5. Test and regenerate
bun test
bun run evaluate.ts
bun run generate-gscript.ts
```
## Detailed Workflow
### Step 1: Export Emails from Gmail

In Google Apps Script, run the export script:

```javascript
// Run this in the Apps Script console
exportEmailsToDrive()
```
This exports all emails with the `College/Auto` label to a JSON file in your Google Drive.
Download the file to your project directory.
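For orientation, here is a minimal sketch of what the export could look like on the Apps Script side (the real logic lives in `export-from-label.gscript`; the label query and the exported field names such as `thread_id`, `subject`, and `body` are assumptions based on the rest of this workflow):

```typescript
// Sketch only -- see export-from-label.gscript for the actual implementation.
// GmailApp and DriveApp are Apps Script globals; declared here so the sketch type-checks.
declare const GmailApp: any;
declare const DriveApp: any;

function exportEmailsToDrive(): void {
  const threads = GmailApp.search("label:College/Auto"); // assumed label query
  const emails = threads.map((thread: any) => {
    const msg = thread.getMessages()[0];
    return {
      thread_id: thread.getId(), // used later for de-duplication on import
      from: msg.getFrom(),
      subject: msg.getSubject(),
      body: msg.getPlainBody(),
      date: msg.getDate().toISOString(),
    };
  });
  // Write the export to Drive; download it into the project directory afterwards.
  DriveApp.createFile("college_emails_export.json", JSON.stringify(emails, null, 2));
}
```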
### Step 2: Label Emails Interactively

Use the interactive labeling tool:

```bash
bun label.ts college_emails_export_2025-12-07.json
```
For each email, you'll be prompted:
- `y` - Email is relevant (should go to inbox)
- `n` - Email is not relevant (should be filtered)
- `s` - Skip this email
- `q` - Quit and save labeled emails so far
When marking as relevant/not relevant, provide a short reason like:
- "password reset"
- "marketing spam"
- "scholarship awarded"
- "unsolicited outreach"
The tool saves to `college_emails_export_2025-12-07-labeled.json` by default.
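The labeled file is plain JSON. A record shape along these lines is enough to reason about the rest of the pipeline (only `thread_id` is confirmed elsewhere in this document; the other field names are assumptions):

```typescript
// Assumed shape of one entry in a *-labeled.json file (sketch, not the canonical schema).
interface LabeledEmail {
  thread_id: string; // Gmail thread ID, used for de-duplication on import
  subject: string;
  body: string;
  relevant: boolean; // true = should go to inbox, false = should be filtered
  reason: string;    // short justification, e.g. "password reset" or "marketing spam"
}

const example: LabeledEmail = {
  thread_id: "1933f0a2b4c5d6e7",
  subject: "Reset your application portal password",
  body: "We received a request to reset your password...",
  relevant: true,
  reason: "password reset",
};
```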
### Step 3: Import and Evaluate

Import the labeled emails into the main dataset:

```bash
bun import-labeled.ts college_emails_export_2025-12-07-labeled.json
```
This will:

- Merge the new labeled emails into `college_emails_export_2025-12-05_labeled.json`
- Check for duplicates by `thread_id` (see the sketch below)
- Run the classifier on the new emails
- Report any failures
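Conceptually, the merge and duplicate check boil down to something like this (a sketch using the `LabeledEmail` shape above; `import-labeled.ts` is the source of truth):

```typescript
// Sketch of the merge + de-duplication step (assumed, not verbatim from import-labeled.ts).
function mergeLabeled(existing: LabeledEmail[], incoming: LabeledEmail[]): LabeledEmail[] {
  const seen = new Set(existing.map((e) => e.thread_id));
  const merged = [...existing];
  for (const email of incoming) {
    if (seen.has(email.thread_id)) continue; // duplicate thread, skip it
    seen.add(email.thread_id);
    merged.push(email);
  }
  return merged;
}
```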
### Step 4: Fix Failures

If there are classification failures, update `classifier.ts`:
For false positives (classifier said relevant when it's not):

- Add exclusion patterns to existing rules
- Add new patterns to `checkIrrelevant()`
For false negatives (classifier said not relevant when it is):
- Add new patterns to the appropriate check function
- Ensure patterns are specific enough
Example:

```typescript
// False positive: "I'm eager to consider you" triggered accepted_student
// Fix: Add exclusion in checkAccepted()
if (/\bi'?m\s+eager\s+to\s+consider\s+you\b/.test(combined)) {
  return null; // Not actually accepted
}
```
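A false-negative fix is the mirror image: add a pattern to the check that should have matched. For instance (hypothetical pattern, category name, and check function; `classifier.ts` may organize these differently):

```typescript
// False negative: "your financial aid offer is ready" was not caught
// Fix (sketch): add a pattern to the appropriate check function
if (/\bfinancial\s+aid\s+offer\s+is\s+(?:ready|available)\b/.test(combined)) {
  return "financial_aid_ready"; // assumed return convention, mirroring accepted_student above
}
```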
### Step 5: Test and Deploy

```bash
# Run unit tests
bun test

# Run full evaluation on all labeled emails
bun run evaluate.ts college_emails_export_2025-12-05_labeled.json

# Generate updated GScript
bun run generate-gscript.ts

# Copy filter-hybrid.gscript to Apps Script and deploy
```
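As a rough idea of what a regression test for a fixed pattern can look like (this assumes `classifier.ts` exports a `classify()` function returning a category or `null`; adapt it to the actual export):

```typescript
// classifier.test.ts -- sketch of a regression test for the false positive fixed above.
import { expect, test } from "bun:test";
// `classify` and its input shape are assumptions; use whatever classifier.ts actually exposes.
import { classify } from "./classifier";

test("'eager to consider you' marketing is not treated as an acceptance", () => {
  const result = classify({
    subject: "We're eager to consider you!",
    body: "I'm eager to consider you for our fall program. Apply today!",
  });
  expect(result).toBeNull();
});
```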
## File Structure

- `export-from-label.gscript` - Apps Script to export emails from Gmail
- `label.ts` - Interactive CLI for labeling emails
- `import-labeled.ts` - Import labeled emails and evaluate
- `classifier.ts` - TypeScript classifier (source of truth)
- `generate-gscript.ts` - Generate GScript from TypeScript
- `filter-hybrid.gscript` - Generated GScript for Gmail
- `college_emails_export_*_labeled.json` - Main labeled dataset
## Tips

### Labeling Best Practices
- Be consistent - Use similar reasons for similar emails
- Be specific - "marketing spam" vs "scholarship not awarded"
- Label in batches - Do 10-20 at a time to stay focused
- When in doubt - Mark as not relevant (safer to filter)
### Common Email Categories
Relevant (should go to inbox):
- Password resets / security alerts
- Application confirmations
- Enrollment confirmations
- Scholarships actually awarded
- Financial aid offers ready
- Dual enrollment course info
- Accepted student portal access
Not Relevant (should be filtered):
- Marketing / newsletters
- Unsolicited outreach
- Application reminders
- Scholarship eligibility (not awarded)
- FAFSA reminders
- Campus events / tours
- Deadline extensions
### Classifier Pattern Tips

- Test patterns broadly - Use `combined` (subject + body) for most checks
- Add exclusions - Marketing often uses similar words to real notifications
- Be specific - "admission decision ready" vs "you will receive an admission decision"
- Check order matters - More specific checks should come before general ones (see the sketch below)
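The ordering point is the easiest one to get wrong. A sketch of the intended structure (the helper names other than `checkAccepted()` and `checkIrrelevant()` are made up, and the patterns are placeholders):

```typescript
type Check = (combined: string) => string | null;

// Stand-ins for the real check functions in classifier.ts.
const checkAccepted: Check = (c) =>
  /\byou(?:'ve| have) been accepted\b/.test(c) ? "accepted_student" : null;
const checkScholarshipAwarded: Check = (c) =>
  /\bscholarship\b.*\bawarded\b/.test(c) ? "scholarship_awarded" : null;
const checkIrrelevant: Check = (c) =>
  /\bunsubscribe\b/.test(c) ? "marketing" : null;

// Order matters: specific checks run first so a genuine notification that also
// contains generic footer language (e.g. an unsubscribe link) is not swallowed
// by the broad irrelevance check.
const checks: Check[] = [checkAccepted, checkScholarshipAwarded, checkIrrelevant];

function classifyEmail(subject: string, body: string): string | null {
  const combined = `${subject} ${body}`.toLowerCase();
  for (const check of checks) {
    const category = check(combined);
    if (category) return category; // first match wins
  }
  return null;
}
```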
## Maintenance

### Regular Tasks
- Weekly: Export new emails and label them
- Monthly: Review classifier accuracy
- As needed: Update patterns for new spam types
### Monitoring
Check accuracy metrics after each import:
- Target accuracy: >95%
- Target precision: >90% (low false positives)
- Target recall: >95% (low false negatives)
If metrics drop below targets, review recent failures and update patterns.
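To make those definitions concrete, here is a small illustrative helper (it treats "relevant" as the positive class, matching the false positive/negative wording in Step 4; it is not the code in `evaluate.ts`):

```typescript
// Sketch: accuracy / precision / recall over labeled results,
// where "positive" means the email is relevant (should stay in the inbox).
function metrics(results: { relevant: boolean; predictedRelevant: boolean }[]) {
  let tp = 0, fp = 0, tn = 0, fn = 0;
  for (const r of results) {
    if (r.predictedRelevant && r.relevant) tp++;
    else if (r.predictedRelevant && !r.relevant) fp++;
    else if (!r.predictedRelevant && !r.relevant) tn++;
    else fn++;
  }
  return {
    accuracy: (tp + tn) / results.length, // target > 0.95
    precision: tp / (tp + fp || 1),       // target > 0.90 (few false positives)
    recall: tp / (tp + fn || 1),          // target > 0.95 (few false negatives)
  };
}
```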