Pre-filtering Guide
Pre-filtering automatically identifies invalid or problematic responses before they reach the AI classifiers. This saves credits, improves agreement rates, and produces cleaner data.
Why Pre-filter Responses?
Survey open-ends often contain responses that shouldn't be classified:
- Empty responses: Blank fields or whitespace only
- Keyboard spam: "asdfasdf" or "zzzzzzz"
- Placeholder text: "N/A", "none", "."
- Too short: Single words that lack context
Sending these to AI classifiers wastes credits and introduces noise. Pre-filtering catches them automatically and marks them as "Rejected" with a specific reason.
Rejected responses don't affect agreement: Pre-filtered responses are excluded from agreement calculations and Cohen's Kappa, so they don't artificially inflate or deflate your reliability metrics.
Available Filters
Empty Responses
Catches completely empty responses or those containing only whitespace.
- Patterns detected: Empty strings, whitespace only, null values
- Default: Enabled
- Recommendation: Always keep enabled
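The product's internal implementation isn't documented, but the empty-response check can be sketched in a few lines; `is_empty` is an illustrative name, not the actual API:

```python
def is_empty(response):
    """Reject responses that are missing, empty, or whitespace-only."""
    return response is None or response.strip() == ""
```

Any response that survives this check contains at least one non-whitespace character.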
Too Short
Rejects responses shorter than a specified character count (excluding whitespace).
- Default threshold: 3 characters
- Configurable: Yes, when starting a coding run
- Examples rejected: "ok", "no", "ya"
Be careful with the threshold: Setting it too high may reject valid brief responses. A threshold of 3-10 characters catches most garbage while preserving legitimate short answers.
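Counting characters "excluding whitespace" means a response of spaced-out single letters is still caught. A minimal sketch of this check, with the guide's default threshold of 3 (`is_too_short` is an illustrative name):

```python
def is_too_short(response, threshold=3):
    """Reject responses whose non-whitespace character count
    falls below the threshold (default 3, per the guide)."""
    char_count = len("".join(response.split()))  # strip all whitespace, then count
    return char_count < threshold
```

With the default threshold, "ok", "no", and "ya" (2 characters each) are rejected, while "yes!" (4 characters) passes.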
Keyboard Spam
Detects patterns suggesting random keyboard entry rather than genuine responses.
- Patterns detected:
- Repeated characters: "aaaaaaa", "111111"
- Keyboard patterns: "asdf", "qwerty", "zxcv"
- Repeated sequences: "abcabcabc"
- Default: Enabled
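The three pattern families above can be approximated with regular expressions and substring checks. This is a hedged sketch of one possible heuristic, not the filter's actual rules:

```python
import re

# Common keyboard-row runs (illustrative, not an exhaustive list)
KEYBOARD_RUNS = ("asdf", "qwerty", "zxcv")

def is_keyboard_spam(response):
    """Heuristic detection of random keyboard entry."""
    text = response.lower().strip()
    # Repeated single character: "aaaaaaa", "111111"
    if re.fullmatch(r"(.)\1{3,}", text):
        return True
    # Keyboard rows typed in order: "asdfasdf"
    if any(run in text for run in KEYBOARD_RUNS):
        return True
    # Short sequence repeated back-to-back: "abcabcabc"
    if re.fullmatch(r"(.{1,4})\1{2,}", text):
        return True
    return False
```

Heuristics like these trade precision for recall, which is why reviewing rejected responses for false positives (see Best Practices below) matters.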
Repetitive Text
Catches responses that repeat the same word or phrase multiple times.
- Patterns detected:
- Word repetition: "good good good good"
- Phrase loops: "I like it I like it I like it"
- Default: Enabled
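Both pattern types reduce to the same idea: the response is one word or phrase repeated back-to-back. A sketch of how such a check might work (the real filter's logic may differ):

```python
def is_repetitive(response, min_repeats=3):
    """Flag responses that consist of one word or phrase looped."""
    words = response.lower().split()
    if not words:
        return False
    # Word repetition: "good good good good"
    if len(set(words)) == 1 and len(words) >= min_repeats:
        return True
    # Phrase loop: the whole response is a phrase repeated N times
    for size in range(1, len(words) // min_repeats + 1):
        repeats, remainder = divmod(len(words), size)
        if remainder == 0 and words == words[:size] * repeats:
            return True
    return False
```

"I like it I like it I like it" is a three-word phrase looped three times and gets flagged; "I really really liked it" repeats a word only twice and passes.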
Placeholder Responses
Identifies common placeholder or non-answer text that respondents use to skip questions.
- Patterns detected:
- Explicit non-answers: "N/A", "n/a", "NA", "none", "nothing"
- Single punctuation: ".", "-", "–", "—"
- Dismissive responses: "no comment", "no opinion", "idk"
- Test entries: "test", "testing", "xxx"
- Default: Enabled
Placeholder vs N/A category: The placeholder filter catches responses where the person didn't attempt to answer. This is different from the implicit N/A category, which is for genuine responses that don't fit your defined categories. Both can be useful together.
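Unlike the pattern-based filters, placeholder detection is essentially a case-insensitive lookup against a list of known non-answers. A sketch using the examples above (the product's actual list is likely longer):

```python
# Known non-answers, compared after lowercasing and trimming (illustrative subset)
PLACEHOLDERS = {
    "n/a", "na", "none", "nothing",
    "no comment", "no opinion", "idk",
    "test", "testing", "xxx",
    ".", "-", "–", "—",
}

def is_placeholder(response):
    """Reject exact matches against the non-answer list."""
    return response.strip().lower() in PLACEHOLDERS
```

Note that the match is exact: "None of the staff helped me" is a genuine answer and passes, even though it starts with "none".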
Filter Configuration
When starting a new coding run, you can customize which filters are active:
Expanding Filter Options
Click "Pre-filtering Options" in the Start Coding Run dialog to see and configure individual filters. The collapsed view shows how many filters are currently enabled.
Per-Run Settings
Filter settings are saved per coding run, so you can use different configurations for different datasets or columns. This is useful when:
- Different columns have different expected response patterns
- You want to compare results with different filter settings
- A specific dataset has unusual characteristics
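Conceptually, a per-run filter configuration is a small settings object saved alongside the run. The field names below are illustrative only, not the product's actual schema:

```python
# Hypothetical per-run filter settings (names are illustrative)
run_filters = {
    "empty": True,
    "too_short": {"enabled": True, "threshold": 3},
    "keyboard_spam": True,
    "repetitive": True,
    "placeholder": True,
}
```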
Reviewing Rejected Responses
Rejected responses are still accessible in your results. You can:
- Filter the results table to show only rejected responses
- See the rejection reason for each response
- Export all records (including rejected) using the Statistical export format
Impact on Statistics
Pre-filtering affects your reported statistics in specific ways:
| Metric | Impact |
|---|---|
| Total Responses | Includes all responses (rejected + classified) |
| Agreement Rate | Calculated on classified responses only |
| Cohen's Kappa | Calculated on classified responses only |
| Credit Usage | Only charged for classified responses |
No credits for rejected responses: Pre-filtered responses don't count against your credit balance since they're never sent to the AI classifiers.
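The table's key point is that reliability metrics are computed over classified responses only. A sketch of an agreement calculation that honors this rule, assuming each record carries a status and the labels assigned by two classifiers (the record shape is illustrative):

```python
def agreement_rate(records):
    """Percent agreement over classified responses only.

    Rejected rows are skipped entirely, so pre-filtering cannot
    inflate or deflate the metric. Each record is assumed to look like
    {"status": "classified" | "rejected", "labels": ["Price", "Price"]}.
    """
    classified = [r for r in records if r["status"] == "classified"]
    if not classified:
        return None  # nothing to measure
    agreed = sum(1 for r in classified if len(set(r["labels"])) == 1)
    return agreed / len(classified)
```

With one agreeing pair, one disagreeing pair, and one rejected row, the rate is 0.5: the rejected row changes neither the numerator nor the denominator.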
Best Practices
Start with Defaults
The default filter settings work well for most survey data. Only adjust if you notice:
- Valid responses being incorrectly rejected
- Invalid responses making it through to classification
Review a Sample First
Before running on your full dataset, consider:
- Running on a small sample (50-100 responses)
- Reviewing rejected responses for false positives
- Adjusting filter settings if needed
- Running on the full dataset
Document Your Settings
For research requiring methodological transparency, note which filters were enabled. This information is included in the Statistical export format.
Related: Learn how rejected responses affect agreement calculations and which export format includes them.