Auto-Suggest Coding Guide
Two independent AI models analyze your responses separately, then a third merges their findings semantically — producing a draft codebook with confidence ratings in minutes instead of days.
New to qualcode.ai? Auto-suggest is optional. You can always create coding guides manually or use existing ones. Auto-suggest eliminates the blank-page problem: when category discovery is the hardest part, it delivers a structured, confidence-rated starting codebook in minutes.
Already have a codebook? If you already have your categories in an Excel or CSV file, see Import & Export instead — auto-suggest is for discovering categories from your data, not for ingesting an existing list.
How It Works
Auto-suggest uses two independent AI raters plus a third merge pass to analyze your survey responses and identify common themes:
- You provide your data: Upload a CSV or Excel file with open-ended survey responses
- Two AIs analyze independently: OpenAI GPT-5.2 and Anthropic Claude Opus 4.5 each analyze a random sample of your responses and suggest categories
- Semantic merge: A third AI pass reconciles the two suggestion lists by matching categories on meaning, not just name. Categories identified by both AIs are flagged as high-confidence; categories from only one AI are flagged as low-confidence
- You review and refine: Rename suggestions, edit descriptions, and delete or restore categories before creating the guide. Deeper editing can continue on the resulting coding guide later
- Create your guide: Apply the suggestions to create a new coding guide, ready for coding runs
Why Two AIs?
Using two independent AI models mirrors the inter-rater reliability approach used in traditional qualitative research. When two human coders independently analyze data and agree on themes, those themes are more likely to be meaningful and valid.
The same principle applies here:
- Categories both AIs identified are marked "High" confidence - they represent clear, robust themes in your data
- Categories only one AI found are marked "Low" confidence - they may represent valid but less obvious themes, or false positives
Different perspectives: OpenAI and Anthropic models were trained on different data with different approaches. This genuine independence makes their agreement more meaningful than if we simply ran the same model twice.
Why a Third AI Pass?
The third pass is not a third rater for confidence scoring. It is a semantic merge step that reconciles overlapping category ideas by meaning, so you do not have to manually normalize near-duplicates before editing the draft codebook.
- Less cleanup: Similar categories from the two independent raters are merged before you review them
- Better starting structure: You see a cleaner draft codebook, not two disconnected suggestion lists
- Clear provenance: Confidence still reflects whether the first two independent models agreed
Getting Started
To use auto-suggest, you need a data file with survey responses already uploaded to your project.
- Navigate to your project and open your data file
- Click Suggest Coding Guide (sparkle icon) in the data file header
- Select the column containing your open-ended responses
- Select Quick analysis mode (Thorough is shown as a coming-soon option)
- Adjust the sample size if needed
- Click Generate Suggestions to start the analysis
Premium AI Models
Auto-suggest always uses our most capable AI models to ensure high-quality category suggestions:
- OpenAI GPT-5.2 with extended reasoning capabilities
- Anthropic Claude Opus 4.5 with extended thinking
These premium models provide deeper analysis and more nuanced category identification than the standard models used for coding runs.
Choosing an Analysis Mode
Quick mode is available now. Thorough mode is planned for a later release and may appear in the interface as a disabled option labeled Soon.
| Mode | Availability | Description | Best For | Cost |
|---|---|---|---|---|
| Quick | Available | Direct dual-rater analysis with high reasoning effort | Most use cases, exploratory analysis | 1.0x (base cost) |
| Thorough | Coming soon | Multi-step iterative analysis with deeper reasoning | Complex topics, nuanced themes, final codebook development | Planned: 2.0x |
Note: Quick mode already uses extended reasoning/thinking. Thorough mode is not yet live; once available, it will spend more credits because it performs a deeper multi-step pass over the sampled responses.
Sample Size
You can adjust how many responses are analyzed, from 10 to 1,000. The default of 300 works well for most datasets. Larger samples capture more themes but cost more credits.
Responses are randomly selected from your data file. Random selection keeps the sample from being biased toward responses at the beginning or end of the file, so it is more likely to reflect the full dataset.
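The sampling step described above can be sketched in a few lines of Python. This is a minimal illustration, not qualcode.ai's actual implementation: the function name `draw_sample`, the `seed` parameter, the skipping of blank responses, and the capping of the sample at the number of available responses are all assumptions made for the example. Only the 10 to 1,000 range and the default of 300 come from this guide.

```python
import random

MIN_SAMPLE = 10        # lower bound stated in this guide
MAX_SAMPLE = 1000      # upper bound stated in this guide
DEFAULT_SAMPLE = 300   # default stated in this guide


def draw_sample(responses, sample_size=DEFAULT_SAMPLE, seed=None):
    """Return a uniform random sample of responses, without replacement.

    Hypothetical helper: blank handling and capping are assumptions.
    """
    if not MIN_SAMPLE <= sample_size <= MAX_SAMPLE:
        raise ValueError(f"sample_size must be between {MIN_SAMPLE} and {MAX_SAMPLE}")

    # Assumption: empty or whitespace-only cells are not worth analyzing.
    usable = [r for r in responses if r and r.strip()]

    # Assumption: if the file has fewer usable responses than requested,
    # analyze all of them instead of failing.
    k = min(sample_size, len(usable))

    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(usable, k)
```

Because every usable response has the same chance of being drawn, the position of a response in the file has no effect on whether it is sampled.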
Understanding Results
After analysis completes, you will see a list of suggested categories. Each category includes:
- Category name: A short, descriptive label for the theme
- Description: A definition explaining what belongs in this category
- Example responses: Sample responses from your data that fit this category
- Confidence badge: Whether both AIs agreed (High) or only one suggested it (Low)
- Source: Which AI(s) identified this category
Confidence Levels
The confidence level indicates how robustly a category was identified:
| Confidence | Meaning | Recommendation |
|---|---|---|
| High | Both OpenAI and Anthropic independently identified this theme | Strong candidate - likely a real pattern in your data |
| Low | Only one AI identified this theme | Review carefully - may be valid but less obvious, or could be a false positive |
Provenance Badges
Each category shows where it came from:
- Both: Category identified by both AIs (shown with a dual-user icon)
- OpenAI only: Category identified only by the OpenAI model
- Anthropic only: Category identified only by the Anthropic model
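The relationship between provenance and confidence is a simple rule, sketched below in Python. The function name `label_category` and the dictionary keys are hypothetical and chosen only for illustration; the labels themselves (High, Low, Both, OpenAI only, Anthropic only) are the ones described above.

```python
def label_category(found_by_openai: bool, found_by_anthropic: bool) -> dict:
    """Derive the confidence badge and provenance badge for one merged category.

    Hypothetical helper illustrating the rule in this guide: agreement
    between the two independent models is what produces High confidence.
    """
    if found_by_openai and found_by_anthropic:
        return {"confidence": "High", "provenance": "Both"}
    if found_by_openai:
        return {"confidence": "Low", "provenance": "OpenAI only"}
    if found_by_anthropic:
        return {"confidence": "Low", "provenance": "Anthropic only"}
    # A merged category always originates from at least one model.
    raise ValueError("a category must come from at least one model")
```

Note that the merge pass does not appear as an input: confidence depends only on whether the first two models agreed.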
Editing Suggestions
The suggested categories are a starting point. You should review and refine them before creating your coding guide, then continue editing the guide itself if you want to reshape the codebook further.
Renaming Categories
Click on a category name to edit it. You can also modify the description to better match your research questions.
Combining Categories
If two suggested categories are similar or overlapping, you can remove one before guide creation and fold its wording into the category you keep. Once the guide exists, you can use the normal coding guide editor to add, rename, rewrite, split, or consolidate categories as your review develops.
Deleting Categories
Remove categories that do not fit your research needs. You can delete:
- Overly broad categories that would catch too many responses
- Categories outside the scope of your research question
- Low-confidence categories you do not find useful
Reviewing Examples
Each category comes with example responses from your data. Review them as evidence for whether the suggestion makes sense; after creating the guide, you can add or edit training examples in the Coding Guides section.
Creating the Coding Guide
Once you are satisfied with your categories:
- Review all categories one final time
- Enter a name for your new coding guide
- Choose whether to enable multi-label coding (if responses can belong to multiple categories)
- Click Create Coding Guide
Your new guide will be created with:
- All categories you selected
- Descriptions for each category
- Training examples from the AI-identified responses
You can then use this guide immediately for coding runs, or continue editing it in the Coding Guides section.
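As a rough mental model, the guide created in this step can be pictured as the structure below. This is a sketch under stated assumptions, not the product's data model: the class names, field names, the `deleted` flag, and the `create_guide` function are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    description: str
    training_examples: list[str] = field(default_factory=list)


@dataclass
class CodingGuide:
    name: str
    multi_label: bool  # True if a response may belong to several categories
    categories: list[Category]


def create_guide(name, suggestions, multi_label=False):
    """Build a guide from reviewed suggestions (hypothetical helper).

    Each suggestion is assumed to be a dict with "name", "description",
    optional "examples", and an optional "deleted" flag set during review.
    """
    if not name.strip():
        raise ValueError("the coding guide needs a name")

    # Only categories that survived review are carried into the guide.
    kept = [s for s in suggestions if not s.get("deleted", False)]

    return CodingGuide(
        name=name.strip(),
        multi_label=multi_label,
        categories=[
            Category(s["name"], s["description"], list(s.get("examples", [])))
            for s in kept
        ],
    )
```

The point of the sketch is that deleted suggestions are dropped, while each kept category carries its description and example responses into the guide as training examples.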
Best Practices
Before Running Auto-Suggest
- Use representative data: The AI analyzes a sample of your responses. Make sure your data file represents the full range of responses you expect.
- Larger samples are better: More responses give the AI more patterns to identify. Aim for at least 50 to 100 responses.
- Consider your research question: Have a clear idea of what you are looking for - it helps you evaluate whether the suggestions are useful.
Reviewing Suggestions
- Trust high-confidence categories: When both AIs agree, the theme is likely real and meaningful.
- Scrutinize low-confidence categories: These may be valid but need human judgment to confirm.
- Look for missing themes: AI suggestions are a starting point, not exhaustive. Add categories manually after creating the guide if important themes are missing.
- Combine similar categories: The semantic merge combines similar categories from the two AIs, but some overlap may remain. Remove duplicates before guide creation or consolidate them in the guide editor afterward.
After Creating the Guide
- Run a test coding: Try the guide on a small subset of your data to see if categories work as expected.
- Add training examples: If certain categories have low accuracy, add more training examples from your reconciled results.
- Iterate: Coding guides improve over time as you add training data and refine categories.
Academic Validity
A common concern: Is AI-generated categorization academically valid? Here is why qualcode.ai's approach meets academic standards:
The Dual-Rater Principle
Inter-rater reliability is a cornerstone of qualitative research. When multiple coders independently analyze the same data and agree, this provides evidence that the coding scheme is reliable and not just one person's interpretation.
qualcode.ai applies this same principle using two genuinely independent AI systems:
- OpenAI's models (trained by OpenAI)
- Anthropic's models (trained by Anthropic)
These systems were developed separately, with different training data and design philosophies. When they independently identify the same themes, this provides meaningful validation, similar to two human coders agreeing.
Human Review Required
Auto-suggest is a starting point, not a final answer. The workflow requires human review:
- Researchers decide which suggested categories to keep, modify, or delete
- Category names and descriptions are edited by humans
- The final coding guide is a human-curated product, informed by AI suggestions
Transparent Provenance
Unlike black-box AI tools, qualcode.ai shows you exactly where each suggestion came from:
- Which AI(s) identified each category
- Confidence levels based on inter-AI agreement
- Example responses that support each category
This transparency allows researchers to make informed decisions and report their methodology accurately.
Suggested Methods Section Text
When using auto-suggest, you might describe your methodology as:
"Coding categories were developed using an AI-assisted inductive approach via qualcode.ai's auto-suggest feature. Two independent large language models (OpenAI GPT-5.2 and Anthropic Claude Opus 4.5) analyzed a random sample of [N] responses to identify emergent themes. A semantic merge step reconciled categories by meaning across both models. Categories identified by both models were flagged as high-confidence; categories identified by only one model were flagged as low-confidence. All suggested categories were reviewed and refined by the research team before finalizing the coding guide."
Need more detailed templates? See our Citing qualcode.ai documentation for complete methods section templates, AI transparency statements, and supplementary materials checklists.
Frequently Asked Questions
How many responses does auto-suggest analyze?
Auto-suggest analyzes a random sample of your responses (configurable from 10 to 1,000, with a default of 300). Random selection makes the sample more likely to reflect the full range of themes in your data. Larger samples may identify more themes but cost more credits.
What if important categories are missing?
AI suggestions are a starting point, not a complete solution. If you know certain themes should be present, create the suggested guide and add the missing categories in the normal coding guide editor.
Can I run auto-suggest multiple times?
Yes. If you have updated your data or want a fresh pass, you can run auto-suggest again. Each run produces fresh suggestions that you can review independently.
Does auto-suggest cost credits?
Yes. Auto-suggest uses credits because it runs two high-capability models plus a merge step on your data. The cost depends on sample size and analysis mode: credits = max(5, floor((5 + 0.08 × sample_size) × mode_factor)). Quick uses a mode factor of 1.0, so the default sample size of 300 responses costs floor(5 + 0.08 × 300) = floor(29) = 29 credits. Thorough is planned at 2.0x but is not live yet. The exact cost is shown before you confirm.
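To estimate costs for other sample sizes, the formula in this answer can be written as a short Python function. The function name `auto_suggest_cost` and the mode keys are hypothetical; the formula, the 10 to 1,000 sample range, and the mode factors (1.0 for Quick, a planned 2.0 for Thorough) are taken from this guide. Treat the result as an estimate, since the cost shown in the interface before you confirm is authoritative.

```python
import math

# Mode factors from this guide. Thorough is planned and not yet live,
# so its factor may change before release.
MODE_FACTORS = {"quick": 1.0, "thorough": 2.0}


def auto_suggest_cost(sample_size: int, mode: str = "quick") -> int:
    """Estimate credits: max(5, floor((5 + 0.08 * sample_size) * mode_factor))."""
    if not 10 <= sample_size <= 1000:
        raise ValueError("sample_size must be between 10 and 1000")
    factor = MODE_FACTORS[mode]
    return max(5, math.floor((5 + 0.08 * sample_size) * factor))


print(auto_suggest_cost(300))   # 29, the default Quick run
print(auto_suggest_cost(10))    # 5, the minimum charge
print(auto_suggest_cost(1000))  # 85, the largest Quick sample
```

The fixed term of 5 and the floor of 5 credits mean that very small samples cost nearly as much as moderately sized ones, so reducing the sample below the default saves relatively little.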
What if the two AIs disagree on everything?
Low agreement between AIs can indicate that your data has diverse or complex themes. In this case:
- Review the low-confidence suggestions - many may still be valid
- Use a larger sample if you have enough responses available
- Use the suggestions as inspiration and create your own categories manually
Related Documentation
- Citing qualcode.ai - Methods section templates for auto-suggested categories
- Dual-Rater Methodology - How two AI raters improve reliability
- Coding Guide Best Practices - Design effective categories
- Training Versions - Improve accuracy with examples
- Agreement Metrics - Understanding Kappa and Alpha