Training Data Versions

Training data is how qualcode.ai learns your coding style. Each reconciled disagreement becomes a training example that sharpens both AI raters in subsequent runs — a self-learning loop that starts from zero and improves with every coding cycle.

Training data is optional: You can run coding jobs without any training examples. Start with zero-shot mode, then add training data if you notice consistent misclassifications.

Looking to import a coding guide? Training versions cover uploading training examples (response → category pairs). To import a full coding guide (the categories themselves), see Import & Export coding guides.

What is Training Data?

Training data consists of example responses paired with their correct categories. When you provide training examples, they're included in the prompt to help the AI models understand nuances in your coding scheme.
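How qualcode.ai assembles its prompts internally is not described on this page, so the following is only a minimal sketch of the general idea: examples are placed in the prompt ahead of the response to be coded. The `TrainingExample` class, the `build_prompt` function, and the prompt wording are hypothetical and are not part of the product.

```python
from dataclasses import dataclass


@dataclass
class TrainingExample:
    response_text: str
    categories: list[str]


def build_prompt(guide: dict[str, str], examples: list[TrainingExample], response: str) -> str:
    """Assemble a prompt: category definitions, optional worked examples, then the new response."""
    lines = ["Categories:"]
    for name, description in guide.items():
        lines.append(f"- {name}: {description}")
    if examples:  # with no examples this section is omitted (zero-shot)
        lines += ["", "Examples:"]
        for example in examples:
            labels = "|".join(example.categories)
            lines.append(f'Response: "{example.response_text}" -> {labels}')
    lines += ["", f'Response to code: "{response}"']
    return "\n".join(lines)
```

With an empty example list the prompt contains only the category definitions, which corresponds to zero-shot mode.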

Training data is particularly useful when:

Your categories have subtle distinctions that names and descriptions alone don't capture
You have domain-specific terminology or jargon
Previous coding runs showed consistent misclassifications
You want to establish a consistent coding style across multiple runs

Version Management

qualcode.ai uses a versioning system for training data. This allows you to:

Track how your training data evolves over time
Experiment with different training approaches
Maintain a stable version for production while testing improvements
Revert to previous versions if needed

Creating a New Version

There are three ways to create a new training version:

Method	When to Use
Start Empty	When you want to build training data from scratch via reconciliation
Copy Existing	When you want to modify an existing version without affecting it
Upload CSV	When you have existing coded data from another source

The Active Version

Each coding guide has one active version at a time. The active version is special because:

All new coding runs use the active version's training examples
Reconciled disagreements are added to the active version
It represents your "production" training data

Best practice: Create a copy of your active version before making major changes. This lets you experiment while keeping a stable version available.
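To make the copy-then-experiment pattern concrete, here is a small in-memory model of versions. It is an illustration only: `TrainingVersion`, `CodingGuide`, `copy_version`, and `activate` are hypothetical names, not a qualcode.ai API.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrainingVersion:
    name: str
    # Each example is a (response_text, categories) pair.
    examples: list[tuple[str, str]] = field(default_factory=list)


@dataclass
class CodingGuide:
    versions: dict[str, TrainingVersion] = field(default_factory=dict)
    active: Optional[str] = None  # exactly one version is active at a time

    def copy_version(self, source: str, name: str) -> TrainingVersion:
        """Create an independent copy so edits never touch the source version."""
        clone = TrainingVersion(name, copy.deepcopy(self.versions[source].examples))
        self.versions[name] = clone
        return clone

    def activate(self, name: str) -> None:
        """Make an existing version the one used by new coding runs."""
        if name not in self.versions:
            raise KeyError(name)
        self.active = name
```

In this model, appending examples to a copy leaves the active version unchanged until you explicitly activate the copy.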

CSV Format

When uploading training data, your CSV file must have these columns:

Column	Required	Description
`response_text`	Yes	The text content of the response
`categories`	Yes	The assigned category (or categories, pipe-separated for multi-label)
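If your coded data lives in another tool, a file with these two columns can be generated with Python's standard `csv` module, which handles quoting of commas and quotation marks inside responses. This is a sketch; the `to_training_csv` helper and its input shape are assumptions made for illustration.

```python
import csv
import io


def to_training_csv(rows: list[tuple[str, list[str]]]) -> str:
    """Serialize (response_text, [categories]) pairs into the two-column CSV layout."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(["response_text", "categories"])
    for text, labels in rows:
        # Multiple categories are joined with a pipe, as the table above describes.
        writer.writerow([text, "|".join(labels)])
    return buffer.getvalue()
```

Letting the `csv` module do the quoting avoids broken rows when a response itself contains a comma.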

Single-Label Example

response_text,categories
"The product quality was excellent",positive
"Too expensive for what you get",negative
"Delivery was on time",neutral

Multi-Label Example

For multi-label guides, separate multiple categories with a pipe (|):

response_text,categories
"Great product but expensive",positive|price_concern
"Fast shipping and good quality",positive|delivery
"Would not recommend",negative

Category names must match: The category names in your CSV must exactly match the category names in your coding guide (case-sensitive).
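A quick local check before uploading can catch missing columns and mismatched category names. The sketch below assumes you supply the category names from your coding guide yourself; `validate_training_csv` is a hypothetical helper, and whether the application tolerates whitespace around the pipe is not stated here, so the check does not strip it.

```python
import csv
import io

REQUIRED_COLUMNS = {"response_text", "categories"}


def validate_training_csv(csv_text: str, guide_categories: set[str]) -> list[str]:
    """Return a list of problems; an empty list means no issues were found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing column(s): {', '.join(sorted(missing))}"]

    problems = []
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        if not (row["response_text"] or "").strip():
            problems.append(f"row {row_number}: empty response_text")
        labels = [label for label in (row["categories"] or "").split("|") if label]
        if not labels:
            problems.append(f"row {row_number}: no categories")
        for label in labels:
            if label not in guide_categories:  # exact, case-sensitive comparison
                problems.append(f"row {row_number}: unknown category {label!r}")
    return problems
```

For example, a row labeled `Negative` is reported as unknown when the guide defines `negative`.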

Building Training Data from Reconciliation

The most effective way to build training data is through the reconciliation workflow:

Run a coding job without training data (zero-shot)
Review disagreements where the two AI raters assigned different categories
Reconcile by selecting the correct category for each disagreement
Add to training data - your reconciled responses become training examples
Re-run and measure the difference - run the coding job again with the updated training data and compare the results with the previous run

This creates a self-learning loop: every reconciled disagreement becomes a training example for both raters. The system gets measurably sharper with each cycle — start with zero training data and build precision through use.
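The core of the loop can be sketched in a few lines: find the responses where the two raters differ, apply a human decision to each, and keep the result as new training rows. The `CodedResponse` class and both functions are hypothetical and single-label for simplicity; in the product this happens through the reconciliation interface.

```python
from dataclasses import dataclass


@dataclass
class CodedResponse:
    response_text: str
    rater_a: str  # category assigned by the first AI rater
    rater_b: str  # category assigned by the second AI rater


def find_disagreements(coded: list[CodedResponse]) -> list[CodedResponse]:
    """Keep only the responses where the two raters assigned different categories."""
    return [item for item in coded if item.rater_a != item.rater_b]


def reconcile(disagreements: list[CodedResponse], decisions: dict[str, str]) -> list[tuple[str, str]]:
    """Turn human decisions (response_text -> category) into training rows."""
    return [
        (item.response_text, decisions[item.response_text])
        for item in disagreements
        if item.response_text in decisions  # undecided items are skipped
    ]
```

The rows returned by `reconcile` have the same shape as the CSV columns: a response and its correct category.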

Best Practices

Quality Over Quantity

10-20 high-quality examples per category is often sufficient
Include edge cases and ambiguous examples
Ensure examples are representative of your actual data

Balanced Representation

Include examples from all categories
Don't over-represent common categories
Include examples of what doesn't belong in each category
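One way to check balance is to count examples per category in a downloaded CSV and flag the categories with too few. This is a sketch with hypothetical helper names; the default threshold of 10 follows the lower end of the 10-20 guideline above.

```python
from collections import Counter


def category_counts(rows: list[tuple[str, str]], guide_categories: set[str]) -> Counter:
    """Count examples per category; categories with no examples appear with a count of 0."""
    counts = Counter({name: 0 for name in guide_categories})
    for _, categories in rows:
        # A multi-label row contributes one count to each of its categories.
        counts.update(label for label in categories.split("|") if label)
    return counts


def underrepresented(counts: Counter, minimum: int = 10) -> list[str]:
    """List categories with fewer than `minimum` examples, in alphabetical order."""
    return sorted(name for name, n in counts.items() if n < minimum)
```

Categories returned by `underrepresented` are good candidates for targeted examples in your next version.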

Iterative Improvement

Start with a small set of examples
Review coding run results to identify problem areas
Add targeted examples to address specific issues
Create new versions for significant changes

Downloading Training Data

You can download any training version as a CSV file. This is useful for:

Backing up your training data
Reviewing examples outside the application
Sharing training data with colleagues
Importing into another coding guide
