Training Data Versions
Training data helps the AI models understand your specific coding scheme better. This guide explains how to manage different versions of your training data.
Training data is optional: You can run coding jobs without any training examples. Start in zero-shot mode (no examples), then add training data if you notice consistent misclassifications.
What is Training Data?
Training data consists of example responses paired with their correct categories. When you provide training examples, they're included in the prompt to help the AI models understand nuances in your coding scheme.
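This guide does not describe the exact prompt format qualcode.ai uses. The sketch below assumes a simple layout to show the general idea: the coding scheme comes first, then any training examples, then the response to code. With an empty example list it reduces to a zero-shot prompt. `TrainingExample` and `build_prompt` are hypothetical names for illustration only.

```python
# Hypothetical sketch: not qualcode.ai's actual prompt template.
from dataclasses import dataclass


@dataclass
class TrainingExample:
    response_text: str
    categories: list[str]  # one entry for single-label, several for multi-label


def build_prompt(scheme: dict[str, str], examples: list[TrainingExample], response: str) -> str:
    """Assemble a classification prompt; with no examples it is zero-shot."""
    lines = ["Assign categories to the response using this coding scheme:"]
    for name, description in scheme.items():
        lines.append(f"- {name}: {description}")
    if examples:  # few-shot: include each training example with its correct categories
        lines += ["", "Examples:"]
        for ex in examples:
            lines.append(f'Response: "{ex.response_text}"')
            lines.append(f"Categories: {'|'.join(ex.categories)}")
    lines += ["", f'Response: "{response}"', "Categories:"]
    return "\n".join(lines)


scheme = {"positive": "Favourable comment", "negative": "Unfavourable comment"}
zero_shot = build_prompt(scheme, [], "Delivery was on time")
few_shot = build_prompt(
    scheme, [TrainingExample("Too expensive for what you get", ["negative"])], "Delivery was on time"
)
```

The only difference between the two prompts is the "Examples:" section, which is why adding training data changes model behaviour without changing the coding scheme itself.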
Training data is particularly useful when:
- Your categories have subtle distinctions that names and descriptions alone don't capture
- You have domain-specific terminology or jargon
- Previous coding runs showed consistent misclassifications
- You want to establish a consistent coding style across multiple runs
Version Management
qualcode.ai uses a versioning system for training data. This allows you to:
- Track how your training data evolves over time
- Experiment with different training approaches
- Maintain a stable version for production while testing improvements
- Revert to previous versions if needed
Creating a New Version
There are three ways to create a new training version:
| Method | When to Use |
|---|---|
| Start Empty | When you want to build training data from scratch via reconciliation |
| Copy Existing | When you want to modify an existing version without affecting it |
| Upload CSV | When you have existing coded data from another source |
The Active Version
Each coding guide has one active version at a time. The active version is special because:
- All new coding runs use the active version's training examples
- Reconciled disagreements are added to the active version
- It represents your "production" training data
Best practice: Create a copy of your active version before making major changes. This lets you experiment while keeping a stable version available.
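To make the rules above concrete, here is a hypothetical data model (not qualcode.ai's internal API; all class and method names are assumptions). It encodes three behaviours from this section: a copied version does not affect its source, a guide has exactly one active version, and reconciled disagreements and new runs both use the active version.

```python
# Hypothetical model for illustration only; names are not qualcode.ai's API.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrainingVersion:
    name: str
    # Each example is (response_text, categories), with categories pipe-separated.
    examples: list = field(default_factory=list)


@dataclass
class CodingGuide:
    versions: dict = field(default_factory=dict)
    active: Optional[str] = None  # name of the single active version

    def start_empty(self, name: str) -> TrainingVersion:
        self.versions[name] = TrainingVersion(name)
        return self.versions[name]

    def copy_existing(self, source: str, name: str) -> TrainingVersion:
        # Copy the list so edits to the new version leave the source untouched.
        self.versions[name] = TrainingVersion(name, list(self.versions[source].examples))
        return self.versions[name]

    def set_active(self, name: str) -> None:
        if name not in self.versions:
            raise KeyError(name)
        self.active = name  # activating one version replaces the previous active one

    def add_reconciled(self, response_text: str, categories: str) -> None:
        # Reconciled disagreements always land in the active version.
        if self.active is None:
            raise ValueError("no active version")
        self.versions[self.active].examples.append((response_text, categories))

    def examples_for_new_run(self) -> list:
        # New coding runs read the active version (an empty list means zero-shot).
        return list(self.versions[self.active].examples) if self.active else []


guide = CodingGuide()
guide.start_empty("v1")
guide.set_active("v1")
guide.add_reconciled("Delivery was on time", "neutral")
guide.copy_existing("v1", "v2-experiment")
guide.versions["v2-experiment"].examples.append(("Too expensive for what you get", "negative"))
```

After this sequence the experimental copy holds two examples, while new runs still see only the one example in `v1`, because `v1` remains active.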
CSV Format
When uploading training data, your CSV file must have these columns:
| Column | Required | Description |
|---|---|---|
| response_text | Yes | The text content of the response |
| categories | Yes | The assigned category (or categories, pipe-separated for multi-label) |
Single-Label Example
response_text,categories
"The product quality was excellent",positive
"Too expensive for what you get",negative
"Delivery was on time",neutral
Multi-Label Example
For multi-label guides, separate multiple categories with a pipe (|):
response_text,categories
"Great product but expensive",positive|price_concern
"Fast shipping and good quality",positive|delivery
"Would not recommend",negative
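If you generate the file from a script, Python's standard `csv` module takes care of quoting, for example when a response contains a comma. This is a minimal sketch that assumes the uploader accepts standard CSV quoting. Note that `csv` quotes only fields that need it, so the output omits the quotes shown in the example above. Both forms are valid CSV.

```python
import csv
import io

rows = [
    ("Great product but expensive", ["positive", "price_concern"]),
    ("Fast shipping and good quality", ["positive", "delivery"]),
    ("Would not recommend", ["negative"]),
]

buffer = io.StringIO()  # use open("training.csv", "w", newline="") to write a file instead
writer = csv.writer(buffer)
writer.writerow(["response_text", "categories"])
for text, cats in rows:
    writer.writerow([text, "|".join(cats)])  # multi-label: join categories with a pipe

print(buffer.getvalue())
```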
Category names must match: The category names in your CSV must exactly match the category names in your coding guide (case-sensitive).
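Before uploading, you can check a file against the rules in this section: both required columns are present, every row has text and at least one category, multiple categories appear only in multi-label guides, and every category matches the guide exactly, including case. The function below is a hypothetical pre-upload check, not part of qualcode.ai. The application's own validation may differ.

```python
import csv
import io

REQUIRED_COLUMNS = {"response_text", "categories"}


def validate_training_csv(csv_text: str, guide_categories: set, multi_label: bool) -> list:
    """Return a list of problems; an empty list means the file passed these checks."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing column(s): {', '.join(sorted(missing))}"]

    problems = []
    for row_no, row in enumerate(reader, start=1):  # counts data rows, header excluded
        text = (row["response_text"] or "").strip()
        labels = [c for c in (row["categories"] or "").split("|") if c]
        if not text:
            problems.append(f"data row {row_no}: empty response_text")
        if not labels:
            problems.append(f"data row {row_no}: no category")
        if len(labels) > 1 and not multi_label:
            problems.append(f"data row {row_no}: multiple categories in a single-label guide")
        for label in labels:
            if label not in guide_categories:  # exact, case-sensitive comparison
                problems.append(f"data row {row_no}: unknown category '{label}'")
    return problems


sample = 'response_text,categories\n"Great product but expensive",positive|Price_concern\n'
print(validate_training_csv(sample, {"positive", "negative", "price_concern"}, multi_label=True))
# ["data row 1: unknown category 'Price_concern'"]
```

Here `Price_concern` is rejected only because of its capital letter, which is the kind of mismatch the case-sensitivity rule catches.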
Building Training Data from Reconciliation
The most effective way to build training data is through the reconciliation workflow:
- Run a coding job without training data (zero-shot)
- Review disagreements, where the two AI raters assigned different categories
- Reconcile by selecting the correct category for each disagreement
- Add to training data: your reconciled responses become training examples
- Repeat: run again with the updated training data and check whether accuracy improves
This creates an active learning loop. The responses you review are the ones the two raters disagreed on, so each round adds examples aimed at the cases the models find hardest to code.
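One round of this loop can be sketched as follows. The data shapes are assumptions for illustration: each rater's output is a dict from response text to a list of categories, and your reconciliation decisions are a dict of the same shape. `find_disagreements` and `reconcile` are hypothetical helpers, not qualcode.ai functions. Comparing category sets rather than lists means a multi-label response counts as agreed even if the raters listed its categories in a different order.

```python
# Hypothetical sketch of one reconciliation round; data shapes are assumptions.

def find_disagreements(rater_a: dict, rater_b: dict) -> list:
    """Response texts where the two AI raters assigned different category sets."""
    return [text for text in rater_a
            if text in rater_b and set(rater_a[text]) != set(rater_b[text])]


def reconcile(disagreements: list, decisions: dict, training: list) -> list:
    """Append each reconciled response to the training examples and return them."""
    for text in disagreements:
        if text in decisions:  # skip responses you have not reconciled yet
            training.append((text, decisions[text]))
    return training


rater_a = {"Delivery was on time": ["neutral"], "Great product but expensive": ["positive"]}
rater_b = {"Delivery was on time": ["neutral"],
           "Great product but expensive": ["positive", "price_concern"]}

todo = find_disagreements(rater_a, rater_b)  # ["Great product but expensive"]
training = reconcile(todo, {"Great product but expensive": ["positive", "price_concern"]}, [])
# The next coding run would include `training` in its prompt, closing the loop.
```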
Best Practices
Quality Over Quantity
- 10-20 high-quality examples per category are often sufficient
- Include edge cases and ambiguous examples
- Ensure examples are representative of your actual data
Balanced Representation
- Include examples from all categories
- Don't over-represent common categories
- Include examples of what doesn't belong in each category
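A quick count per category shows whether a version is balanced. This hypothetical helper assumes examples are stored as `(response_text, categories)` pairs with pipe-separated categories, as in the CSV format. Its default thresholds reuse the 10-20 per category guideline from Quality Over Quantity as a rough flag, not a hard limit. Whether you have enough negative examples for each category still needs a manual review.

```python
from collections import Counter


def category_balance(examples: list, categories: list, low: int = 10, high: int = 20) -> dict:
    """Flag categories that are missing, thin, or possibly over-represented."""
    counts = Counter(label for _, cats in examples for label in cats.split("|") if label)
    report = {}
    for name in categories:
        n = counts.get(name, 0)
        if n == 0:
            report[name] = "missing"
        elif n < low:
            report[name] = f"only {n} example(s)"
        elif n > high:
            report[name] = f"{n} examples; check it isn't over-represented"
    return report


examples = [("The product quality was excellent", "positive"),
            ("Great product but expensive", "positive|price_concern")]
print(category_balance(examples, ["positive", "negative", "price_concern"]))
# {'positive': 'only 2 example(s)', 'negative': 'missing', 'price_concern': 'only 1 example(s)'}
```

Categories within the thresholds are left out of the report, so an empty dict means nothing was flagged.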
Iterative Improvement
- Start with a small set of examples
- Review coding run results to identify problem areas
- Add targeted examples to address specific issues
- Create new versions for significant changes
Downloading Training Data
You can download any training version as a CSV file. This is useful for:
- Backing up your training data
- Reviewing examples outside the application
- Sharing training data with colleagues
- Importing into another coding guide
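Downloaded files also make it easy to see how one version differs from another. This sketch assumes the download uses the same `response_text,categories` columns as the upload format. It treats each version as a set of examples, so duplicate rows within one file collapse into a single entry. A relabeled response appears once as removed (old categories) and once as added (new categories). The helper names are hypothetical.

```python
import csv
import io


def load_examples(csv_text: str) -> set:
    """Read a downloaded training CSV into a set of (response_text, categories) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["response_text"], row["categories"]) for row in reader}


def diff_versions(old_csv: str, new_csv: str) -> tuple:
    """Return (added, removed) examples between two downloaded versions."""
    old, new = load_examples(old_csv), load_examples(new_csv)
    return sorted(new - old), sorted(old - new)


v1 = 'response_text,categories\n"Delivery was on time",neutral\n"Would not recommend",negative\n'
v2 = 'response_text,categories\n"Delivery was on time",positive\n"Would not recommend",negative\n'
added, removed = diff_versions(v1, v2)
# added == [("Delivery was on time", "positive")]
# removed == [("Delivery was on time", "neutral")]
```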
Related Topics
- Key Concepts - Overview of training data in context
- Coding Guide Best Practices - Design effective categories
- Agreement Calculation - Understand reliability metrics