Training Data Versions

Training data helps the AI models understand your specific coding scheme better. This guide explains how to manage different versions of your training data.

Training data is optional: You can run coding jobs without any training examples. Start with zero-shot mode, then add training data if you notice consistent misclassifications.

What is Training Data?

Training data consists of example responses paired with their correct categories. When you provide training examples, they're included in the prompt to help the AI models understand nuances in your coding scheme.
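
As a rough illustration of how examples can be included in a prompt, here is a minimal Python sketch. The prompt layout and the build_prompt function are hypothetical, written only for this guide; they are not the prompt qualcode.ai actually sends.

```python
from typing import List, Tuple


def build_prompt(categories: List[str],
                 examples: List[Tuple[str, str]],
                 response: str) -> str:
    """Assemble a classification prompt (hypothetical layout)."""
    lines = ["Assign the response to one of these categories: "
             + ", ".join(categories), ""]
    # Each training example is shown as a response with its correct category.
    for text, category in examples:
        lines.append(f"Response: {text}")
        lines.append(f"Category: {category}")
        lines.append("")
    # The response to be coded comes last, with the category left open.
    lines.append(f"Response: {response}")
    lines.append("Category:")
    return "\n".join(lines)


# Zero-shot: no training examples, only the category names.
zero_shot = build_prompt(["positive", "negative"], [], "Arrived broken")

# With training data: the same prompt plus one worked example.
few_shot = build_prompt(
    ["positive", "negative"],
    [("Too expensive for what you get", "negative")],
    "Arrived broken",
)
```

The only difference between the two prompts is the block of worked examples, which is why a small number of well-chosen examples can change how borderline responses are coded.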

Training data is particularly useful when:

  • Your categories have subtle distinctions that names and descriptions alone don't capture
  • You have domain-specific terminology or jargon
  • Previous coding runs showed consistent misclassifications
  • You want to establish a consistent coding style across multiple runs

Version Management

qualcode.ai uses a versioning system for training data. This allows you to:

  • Track how your training data evolves over time
  • Experiment with different training approaches
  • Maintain a stable version for production while testing improvements
  • Revert to previous versions if needed

Creating a New Version

There are three ways to create a new training version:

Method          When to Use
Start Empty     When you want to build training data from scratch via reconciliation
Copy Existing   When you want to modify an existing version without affecting it
Upload CSV      When you have existing coded data from another source

The Active Version

Each coding guide has one active version at a time. The active version is special because:

  • All new coding runs use the active version's training examples
  • Reconciled disagreements are added to the active version
  • It represents your "production" training data

Best practice: Create a copy of your active version before making major changes. This lets you experiment while keeping a stable version available.
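
The copy-before-change practice can be pictured with a small Python model. This is only a conceptual sketch: the CodingGuide class, its fields, and the version names are hypothetical and do not describe how qualcode.ai stores versions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CodingGuide:
    """Toy model: named training versions, exactly one of them active."""
    # version name -> list of (response_text, categories) examples
    versions: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)
    active: str = ""

    def copy_version(self, source: str, new_name: str) -> None:
        """Create an independent copy; edits to it leave the source untouched."""
        if new_name in self.versions:
            raise ValueError(f"version already exists: {new_name}")
        self.versions[new_name] = list(self.versions[source])

    def set_active(self, name: str) -> None:
        """Switch which version new coding runs would use."""
        if name not in self.versions:
            raise KeyError(name)
        self.active = name


guide = CodingGuide(
    versions={"v1": [("Would not recommend", "negative")]},
    active="v1",
)

# Experiment on a copy while "v1" stays active and unchanged.
guide.copy_version("v1", "v2-experiment")
guide.versions["v2-experiment"].append(("Delivery was on time", "neutral"))
```

In this model, "v1" remains the stable version until set_active is called with the experimental version's name.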

CSV Format

When uploading training data, your CSV file must have these columns:

Column          Required   Description
response_text   Yes        The text content of the response
categories      Yes        The assigned category (or categories, pipe-separated for multi-label)
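
If your coded data lives in another tool, a file with these two columns can be generated with Python's standard csv module. The sketch below assumes the data is already available as (text, list of categories) pairs; the write_training_csv helper is a hypothetical name, not part of qualcode.ai.

```python
import csv
import io


def write_training_csv(rows, out):
    """Write (response_text, [categories]) pairs in the two-column format."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["response_text", "categories"])
    for text, categories in rows:
        # Multiple categories are joined with a pipe, as the format requires.
        writer.writerow([text, "|".join(categories)])


buffer = io.StringIO()
write_training_csv(
    [
        ("Great product, but expensive", ["positive", "price_concern"]),
        ("Would not recommend", ["negative"]),
    ],
    buffer,
)
csv_text = buffer.getvalue()
```

Using csv.writer rather than manual string concatenation means responses containing commas or quotation marks are quoted correctly.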

Single-Label Example

response_text,categories
"The product quality was excellent",positive
"Too expensive for what you get",negative
"Delivery was on time",neutral

Multi-Label Example

For multi-label guides, separate multiple categories with a pipe (|):

response_text,categories
"Great product but expensive",positive|price_concern
"Fast shipping and good quality",positive|delivery
"Would not recommend",negative

Category names must match: The category names in your CSV must exactly match the category names in your coding guide (case-sensitive).
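
Because the match is exact and case-sensitive, it can be worth checking a file locally before uploading it. The following Python sketch is one way to do that; validate_training_csv and its messages are hypothetical and are not qualcode.ai's own validation.

```python
import csv
import io

REQUIRED_COLUMNS = ["response_text", "categories"]


def validate_training_csv(csv_text, guide_categories):
    """Return a list of problems found; an empty list means none were found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    allowed = set(guide_categories)
    problems = []
    # Row numbers count data rows only; the header is not counted.
    for row_no, row in enumerate(reader, start=1):
        if not (row["response_text"] or "").strip():
            problems.append(f"row {row_no}: empty response_text")
        labels = [c for c in (row["categories"] or "").split("|") if c]
        if not labels:
            problems.append(f"row {row_no}: no category given")
        for label in labels:
            # Exact comparison: "Negative" does not match "negative".
            if label not in allowed:
                problems.append(f"row {row_no}: unknown category '{label}'")
    return problems


sample = (
    "response_text,categories\n"
    '"Great product but expensive",positive|price_concern\n'
    '"Would not recommend",Negative\n'
)
problems = validate_training_csv(
    sample, ["positive", "negative", "price_concern"]
)
```

Here the second data row is reported because "Negative" differs in case from the guide's "negative".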

Building Training Data from Reconciliation

The most effective way to build training data is through the reconciliation workflow:

  1. Run a coding job without training data (zero-shot)
  2. Review disagreements where the two AI raters assigned different categories
  3. Reconcile by selecting the correct category for each disagreement
  4. Add to training data: your reconciled responses become training examples
  5. Repeat: run again with the training data and check whether accuracy improves

This creates an active learning loop: each round of reconciliation adds examples to the prompt, so later coding runs benefit from your feedback.
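
The core of the reconciliation workflow can be sketched in a few lines of Python. The dictionaries, response IDs, and function names below are hypothetical stand-ins for what the application does through its interface.

```python
from typing import Dict, List, Tuple


def find_disagreements(rater_a: Dict[str, str],
                       rater_b: Dict[str, str]) -> List[str]:
    """Return IDs of responses that the two raters coded differently."""
    return [rid for rid in rater_a
            if rid in rater_b and rater_a[rid] != rater_b[rid]]


def to_training_rows(responses: Dict[str, str],
                     decisions: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn reconciled decisions into (response_text, category) examples."""
    return [(responses[rid], category) for rid, category in decisions.items()]


responses = {
    "r1": "The product quality was excellent",
    "r2": "Delivery was on time",
    "r3": "Too expensive for what you get",
}
rater_a = {"r1": "positive", "r2": "neutral", "r3": "negative"}
rater_b = {"r1": "positive", "r2": "positive", "r3": "negative"}

# Step 2: only r2 needs review, because the raters agree elsewhere.
disagreements = find_disagreements(rater_a, rater_b)

# Step 3: a human picks the correct category for each disagreement.
decisions = {"r2": "neutral"}

# Step 4: the reconciled responses become new training examples.
new_examples = to_training_rows(responses, decisions)
```

Only the responses the raters disagreed on reach the reviewer, so the examples that get added tend to be the ambiguous cases where guidance helps most.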

Best Practices

Quality Over Quantity

  • 10-20 high-quality examples per category is often sufficient
  • Include edge cases and ambiguous examples
  • Ensure examples are representative of your actual data

Balanced Representation

  • Include examples from all categories
  • Don't over-represent common categories
  • Include examples of what doesn't belong in each category
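
To check the per-category guideline and the balance of a training version, you can count examples per category in a downloaded or draft CSV. This Python sketch assumes the rows have already been read as (response_text, categories) pairs; both function names and the threshold parameter are hypothetical.

```python
def examples_per_category(rows, guide_categories):
    """Count training examples per category, including categories with none."""
    counts = {category: 0 for category in guide_categories}
    for _text, categories_field in rows:
        # A multi-label row counts once toward each of its categories.
        for label in categories_field.split("|"):
            if label:
                counts[label] = counts.get(label, 0) + 1
    return counts


def flag_thin_categories(counts, minimum=10):
    """Return categories with fewer than `minimum` examples, sorted by name."""
    return sorted(c for c, n in counts.items() if n < minimum)


rows = [
    ("Great product but expensive", "positive|price_concern"),
    ("Fast shipping and good quality", "positive|delivery"),
    ("Would not recommend", "negative"),
]
counts = examples_per_category(
    rows, ["positive", "negative", "price_concern", "delivery"]
)
thin = flag_thin_categories(counts, minimum=2)
```

Starting every guide category at zero makes categories with no examples at all show up in the result, which a plain tally of the CSV would miss.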

Iterative Improvement

  • Start with a small set of examples
  • Review coding run results to identify problem areas
  • Add targeted examples to address specific issues
  • Create new versions for significant changes

Downloading Training Data

You can download any training version as a CSV file. This is useful for:

  • Backing up your training data
  • Reviewing examples outside the application
  • Sharing training data with colleagues
  • Importing into another coding guide
