Pre-filtering Guide

Pre-filtering automatically identifies invalid or problematic responses before they reach the AI classifiers. This saves credits, improves agreement rates, and produces cleaner data.

Why Pre-filter Responses?

Survey open-ends often contain responses that shouldn't be classified:

  • Empty responses: Blank fields or whitespace only
  • Keyboard spam: "asdfasdf" or "zzzzzzz"
  • Placeholder text: "N/A", "none", "."
  • Too short: Single words that lack context

Sending these to AI classifiers wastes credits and introduces noise. Pre-filtering catches them automatically and marks them as "Rejected" with a specific reason.

Rejected responses don't affect agreement: Pre-filtered responses are excluded from agreement calculations and Cohen's Kappa, so they don't artificially inflate or deflate your reliability metrics.

Available Filters

Empty Responses

Catches completely empty responses or those containing only whitespace.

  • Patterns detected: Empty strings, whitespace only, null values
  • Default: Enabled
  • Recommendation: Always keep enabled
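A check like this can be sketched in a few lines of Python (a hypothetical helper, not the product's actual implementation):

```python
def is_empty(response):
    """Reject None, empty strings, and whitespace-only responses."""
    return response is None or response.strip() == ""

assert is_empty(None)
assert is_empty("   \t\n")
assert not is_empty("Great service")
```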

Too Short

Rejects responses shorter than a specified character count (excluding whitespace).

  • Default threshold: 3 characters
  • Configurable: Yes, when starting a coding run
  • Examples rejected: "ok", "no", "ya"

Be careful with the threshold: Setting it too high may reject valid brief responses. A threshold of 3-10 characters catches most garbage while preserving legitimate short answers.
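A sketch of the length check, counting only non-whitespace characters (hypothetical helper name and default, mirroring the 3-character default above):

```python
def is_too_short(response, min_chars=3):
    """Reject responses whose non-whitespace length is below the threshold."""
    stripped = "".join(response.split())  # drop ALL whitespace before counting
    return len(stripped) < min_chars

assert is_too_short("ok")                 # 2 characters
assert is_too_short("a  b", min_chars=3)  # counts as "ab", 2 characters
assert not is_too_short("yes")            # exactly 3 characters passes
```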

Keyboard Spam

Detects patterns suggesting random keyboard entry rather than genuine responses.

  • Patterns detected:
    • Repeated characters: "aaaaaaa", "111111"
    • Keyboard patterns: "asdf", "qwerty", "zxcv"
    • Repeated sequences: "abcabcabc"
  • Default: Enabled
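The three pattern families above can be approximated with simple regular expressions. This is an illustrative sketch under assumed thresholds, not the product's actual detector:

```python
import re

KEYBOARD_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

def is_keyboard_spam(response, run_length=4):
    text = response.lower().strip()
    # Repeated single character: "aaaaaaa", "111111"
    if re.fullmatch(r"(.)\1+", text):
        return True
    # Runs typed along one keyboard row: "asdf", "qwerty", "zxcv"
    if any(len(text) >= run_length and text in row for row in KEYBOARD_ROWS):
        return True
    # A short sequence repeated three or more times: "abcabcabc"
    if re.fullmatch(r"(.{1,4})\1{2,}", text):
        return True
    return False
```

Real detectors typically also handle mixed patterns ("asdfjkl;") and locale-specific keyboard layouts; the sketch covers only the listed examples.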

Repetitive Text

Catches responses that repeat the same word or phrase multiple times.

  • Patterns detected:
    • Word repetition: "good good good good"
    • Phrase loops: "I like it I like it I like it"
  • Default: Enabled
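One way to detect both cases is to test whether the whole response is a short phrase repeated end to end (hypothetical helper; the repeat threshold is an assumption):

```python
def is_repetitive(response, min_repeats=3):
    """Flag responses that are one word or phrase repeated min_repeats+ times."""
    words = response.lower().split()
    # Try every candidate phrase length up to half the response
    for size in range(1, len(words) // 2 + 1):
        phrase = words[:size]
        repeats = len(words) // size
        if len(words) % size == 0 and words == phrase * repeats:
            if repeats >= min_repeats:
                return True
    return False

assert is_repetitive("good good good good")              # word repetition
assert is_repetitive("I like it I like it I like it")    # phrase loop
assert not is_repetitive("the food was good")
```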

Placeholder Responses

Identifies common placeholder or non-answer text that respondents use to skip questions.

  • Patterns detected:
    • Explicit non-answers: "N/A", "n/a", "NA", "none", "nothing"
    • Single punctuation: ".", "-", "–", "—"
    • Dismissive responses: "no comment", "no opinion", "idk"
    • Test entries: "test", "testing", "xxx"
  • Default: Enabled
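Placeholder detection is essentially a normalized set lookup. A minimal sketch using only the examples listed above (a real filter would carry a longer list):

```python
PLACEHOLDERS = {"n/a", "na", "none", "nothing",
                "no comment", "no opinion", "idk",
                "test", "testing", "xxx"}
PUNCTUATION_ONLY = {".", "-", "–", "—"}

def is_placeholder(response):
    """Match known non-answers after trimming and lowercasing."""
    text = response.strip().lower()
    return text in PLACEHOLDERS or text in PUNCTUATION_ONLY

assert is_placeholder("N/A")
assert is_placeholder(".")
assert not is_placeholder("Nothing beats their coffee")  # genuine answer
```

Note the last case: matching the whole normalized response, rather than searching for substrings, keeps genuine answers that merely contain a placeholder word.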

Placeholder vs N/A category: The placeholder filter catches responses where the person didn't attempt to answer. This is different from the implicit N/A category, which is for genuine responses that don't fit your defined categories. Both can be useful together.

Filter Configuration

When starting a new coding run, you can customize which filters are active:

Expanding Filter Options

Click "Pre-filtering Options" in the Start Coding Run dialog to see and configure individual filters. The collapsed view shows how many filters are currently enabled.

Per-Run Settings

Filter settings are saved per coding run, so you can use different configurations for different datasets or columns. This is useful when:

  • Different columns have different expected response patterns
  • You want to compare results with different filter settings
  • A specific dataset has unusual characteristics

Reviewing Rejected Responses

Rejected responses are still accessible in your results. You can:

  • Filter the results table to show only rejected responses
  • See the rejection reason for each response
  • Export all records (including rejected) using Statistical format

Impact on Statistics

Pre-filtering affects your reported statistics in specific ways:

  • Total Responses: Includes all responses (rejected + classified)
  • Agreement Rate: Calculated on classified responses only
  • Cohen's Kappa: Calculated on classified responses only
  • Credit Usage: Only charged for classified responses

No credits for rejected responses: Pre-filtered responses don't count against your credit balance since they're never sent to the AI classifiers.
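The split between totals and metrics can be illustrated with a toy calculation (hypothetical data; the record layout is an assumption, not the export schema):

```python
# Each record: (status, coder_a_label, coder_b_label)
results = [
    ("classified", "Price", "Price"),
    ("classified", "Taste", "Service"),
    ("classified", "Taste", "Taste"),
    ("rejected", None, None),   # pre-filtered; excluded from metrics
    ("rejected", None, None),
]

total = len(results)  # total responses include rejected ones
classified = [r for r in results if r[0] == "classified"]
# Agreement rate is computed over classified responses only
agreement = sum(a == b for _, a, b in classified) / len(classified)

print(total)      # 5
print(agreement)  # 2 of 3 classified responses agree
```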

Best Practices

Start with Defaults

The default filter settings work well for most survey data. Only adjust if you notice:

  • Valid responses being incorrectly rejected
  • Invalid responses making it through to classification

Review a Sample First

Before running on your full dataset, consider:

  1. Running on a small sample (50-100 responses)
  2. Reviewing rejected responses for false positives
  3. Adjusting filter settings if needed
  4. Running on the full dataset

Document Your Settings

For research requiring methodological transparency, note which filters were enabled. This information is included in the Statistical export format.


Related: Learn how rejected responses affect agreement calculations and which export format includes them.