Citing qualcode.ai
When using qualcode.ai in academic research, proper citation helps readers understand your methodology and enables reproducibility. This page provides citation formats and template language for your publications.
Software Citation
Cite qualcode.ai as software in your references section using these formats:
APA 7th Edition
qualcode.ai. (2026). qualcode.ai: AI-powered dual-rater qualitative coding (Version 1.0) [Computer software]. https://qualcode.ai
BibTeX
@software{qualcode2026,
title = {qualcode.ai: AI-powered dual-rater qualitative coding},
author = {{qualcode.ai}},
year = {2026},
version = {1.0},
url = {https://qualcode.ai},
note = {Accessed: YYYY-MM-DD}
}
Chicago (Author-Date)
qualcode.ai. 2026. qualcode.ai: AI-powered Dual-Rater Qualitative Coding. Version 1.0. https://qualcode.ai.
Vancouver
qualcode.ai. qualcode.ai: AI-powered dual-rater qualitative coding [Internet]. Version 1.0. 2026 [cited YYYY Mon DD]. Available from: https://qualcode.ai
Version and access date: Include the version number and the date you accessed the service. This helps with reproducibility as the platform evolves.
Methods Section Language
Use this template text in your methods section, adapting the bracketed values to your study:
Finding your model names: Replace bracketed placeholders like [OpenAI model name] with the actual model names from your coding run summary. Each coding run records which models were used.
Standard Template
"Open-ended survey responses (N = [total responses]) were coded using qualcode.ai (https://qualcode.ai), an AI-assisted qualitative coding platform. The system employs a dual-rater methodology where two independent large language models ([OpenAI model name] and [Anthropic model name]) code each response, enabling calculation of inter-rater reliability metrics. Models were configured with temperature 0.0, following provider recommendations for classification tasks (OpenAI, 2024; Anthropic, 2024).
Categories were defined by the research team based on [theoretical framework / prior research / inductive analysis]. [N] training examples were provided to guide the AI classification. Inter-rater agreement between the two AI raters was [interpretation] (Cohen's κ = [value]; Krippendorff's α = [value]; percent agreement = [value]%).
Disagreements (n = [count], [percentage]%) were reviewed and reconciled by [author initials / "two independent researchers"]. Final category assignments were used for subsequent analysis."
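The agreement figures in the template can be reproduced from the two raters' label lists. The sketch below uses pure Python and hypothetical category codes (the variable names and labels are illustrative, not a qualcode.ai export format); qualcode.ai reports these metrics for you, so this is only useful as an independent check.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of items where the two raters assigned the same category."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e), where p_e is the agreement
    expected by chance given each rater's marginal category distribution."""
    n = len(a)
    p_o = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb.get(k, 0) for k in ca) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes from the two AI raters for six responses:
openai_codes    = ["barriers", "benefits", "barriers", "cues", "other", "cues"]
anthropic_codes = ["barriers", "benefits", "severity", "cues", "other", "cues"]

print(round(percent_agreement(openai_codes, anthropic_codes), 3))  # 0.833
print(round(cohens_kappa(openai_codes, anthropic_codes), 3))       # 0.786
```

Krippendorff's alpha handles missing data and non-nominal scales and is more involved to compute by hand; use the platform's reported value or an established implementation rather than rolling your own.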
Zero-Shot Template (No Training Data)
If you did not provide training examples:
"Responses were coded using qualcode.ai in zero-shot mode, where category assignments were based solely on the category names and descriptions provided by the research team. No training examples were used. This approach tests whether the categories are sufficiently clear for unambiguous classification."
With Reconciliation Details
If you want to describe your reconciliation process:
"Disagreements were resolved through [independent review by two coders / discussion until consensus / majority vote among three reviewers]. Reconciled classifications were [used as additional training data for a second coding pass / retained as final codes]."
Auto-Suggested Coding Guides
If you used qualcode.ai's auto-suggest feature to generate your coding guide, describe the process accurately. The methodology involves dual AI category extraction followed by semantic merging and human refinement.
Methodology Overview
The auto-suggest feature uses a dual-rater approach to category generation:
- Dual Independent Analysis: Two large language models (OpenAI GPT-5.2 with extended reasoning and Anthropic Claude Opus 4.5 with extended thinking) independently analyze a random sample of responses to identify emergent categories.
- Semantic Merge: Categories identified by both models are flagged as high-confidence. A third AI pass performs semantic reconciliation to merge similar categories across raters based on meaning, not just name matching.
- Human Refinement: Researchers review all suggested categories, with the option to rename, merge, split, add, or remove categories. Final category names, descriptions, and inclusion/exclusion criteria are determined by the research team.
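Conceptually, the high/low-confidence split in step 2 amounts to a set intersection over the two models' category proposals after semantic merging. A minimal illustrative sketch with made-up category names (this is not the platform's actual implementation, which matches on meaning rather than exact strings):

```python
# Category names each model proposed, after semantic reconciliation has
# mapped near-duplicates to a shared name (names here are hypothetical).
rater_a = {"workload", "isolation", "flexibility", "work-life boundaries"}
rater_b = {"workload", "flexibility", "commuting", "work-life boundaries"}

high_confidence = rater_a & rater_b  # proposed by both models
low_confidence  = rater_a ^ rater_b  # proposed by only one model

print(sorted(high_confidence))  # ['flexibility', 'work-life boundaries', 'workload']
print(sorted(low_confidence))   # ['commuting', 'isolation']
```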
Template for Auto-Suggested Categories
"Coding categories were developed using an AI-assisted inductive approach via qualcode.ai's auto-suggest feature. Two independent large language models (OpenAI GPT-5.2 and Anthropic Claude Opus 4.5) analyzed a random sample of [N] responses to identify emergent themes. Categories identified by both models were flagged as high-confidence; categories identified by only one model were flagged as low-confidence.
The AI-suggested categories were reviewed and refined by the research team. [Describe modifications: e.g., "Three categories were merged due to conceptual overlap, and two low-confidence categories were removed as not relevant to the research questions."] The final coding guide consisted of [N] categories with researcher-authored descriptions and inclusion/exclusion criteria.
This coding guide was then applied using qualcode.ai's dual-rater classification, where [OpenAI model name] and [Anthropic model name] independently coded all [N] responses..."
Template for Minimal Modification
If you accepted the AI suggestions with minimal changes:
"Initial coding categories were generated using qualcode.ai's auto-suggest feature, which employs dual independent AI analysis (OpenAI GPT-5.2 and Anthropic Claude Opus 4.5) to identify themes from a sample of [N] responses. Categories agreed upon by both models were retained with minor editorial refinements to category names and descriptions. The research team validated that all categories were conceptually distinct and relevant to the research questions before proceeding with classification."
Template for Substantial Refinement
If you substantially modified the AI suggestions:
"AI-suggested categories served as a starting point for coding guide development. Two independent large language models identified [N] initial categories from a sample of responses. The research team substantially revised these suggestions based on [theoretical framework / domain expertise / prior literature], resulting in a final coding guide of [N] categories that [describe how they differ: e.g., "better aligned with established constructs in the literature" or "reflected more granular distinctions relevant to our research questions"]."
Transparency principle: Clearly distinguish between AI-suggested categories and researcher-defined categories. If you used auto-suggest as a starting point but made substantial changes, describe both the AI contribution and your modifications. Reviewers appreciate transparency about where ideas originated.
AI Transparency Statement
Many journals now require disclosure of AI use in research. Use these templates for transparency statements:
Standard Disclosure (Researcher-Defined Categories)
"This research used AI-assisted coding via qualcode.ai. The AI models ([OpenAI model name] and [Anthropic model name]) performed initial coding of open-ended survey responses based on researcher-defined categories. All AI-generated codes were subject to inter-rater reliability assessment, and disagreements were resolved by human researchers.
Survey response data was processed via OpenAI and Anthropic enterprise APIs under data processing agreements. No participant data was used for AI model training. All processing complied with [GDPR / institutional ethics requirements]."
Disclosure with Auto-Suggested Categories
"This research used AI-assisted coding via qualcode.ai. Coding categories were initially generated using dual AI analysis (OpenAI GPT-5.2 and Anthropic Claude Opus 4.5), then reviewed and refined by the research team. Classification was performed by two independent AI models ([OpenAI model name] and [Anthropic model name]) with inter-rater reliability assessment. All AI-suggested categories and AI-generated codes were subject to human review.
Survey response data was processed via OpenAI and Anthropic enterprise APIs under data processing agreements. No participant data was used for AI model training. All processing complied with [GDPR / institutional ethics requirements]."
For Ethics Committees / IRB
When describing AI use in ethics applications:
"Open-ended response data will be processed using qualcode.ai, an AI-assisted coding platform. Data is transmitted to OpenAI and Anthropic APIs for classification under enterprise data processing agreements that prohibit use of data for model training. Data is processed in EU data centers (Frankfurt) and deleted within 30 days. No identifying information is required for the coding process; responses can be anonymized before upload if required."
Reporting Quality Tiers
If you used non-default model settings, report the quality tier:
Models listed below are current as of January 2026. Check your coding run summary for actual models used.
| Tier | Models Used | Report As |
|---|---|---|
| Budget | GPT-4o-mini, Claude Haiku 4.5 | "...using budget-tier models (GPT-4o-mini, Claude Haiku 4.5)..." |
| Standard | GPT-4o-mini, Claude Haiku 4.5 | "...using standard-tier models (GPT-4o-mini, Claude Haiku 4.5)..." |
| Quality | GPT-4o, Claude Sonnet 4 | "...using quality-tier models (GPT-4o, Claude Sonnet 4)..." |
| Premium | GPT-4o, Claude Opus 4 | "...using premium-tier models (GPT-4o, Claude Opus 4)..." |
Example Full Methods Section
Here are complete examples for hypothetical studies:
Example: Researcher-Defined Categories
Qualitative Coding of Open-Ended Responses
Participants' responses to the open-ended question "What factors influenced your decision?" (N = 1,247) were coded using qualcode.ai (Version 1.0; https://qualcode.ai), an AI-assisted qualitative coding platform employing dual-rater methodology.
A coding scheme with eight categories was developed based on the Health Belief Model (Rosenstock, 1974): perceived susceptibility, perceived severity, perceived benefits, perceived barriers, cues to action, self-efficacy, knowledge, and other. Category descriptions and five training examples per category were provided to the system.
Two independent large language models ([OpenAI model name] and [Anthropic model name]) coded each response. Inter-rater agreement was substantial (Cohen's κ = 0.74; Krippendorff's α = 0.76; percent agreement = 82.3%). Disagreements (n = 221, 17.7%) were reviewed by two authors (AB, CD) who reached consensus on final classifications.
Response data was processed in compliance with GDPR via EU data centers. No data was used for AI model training per enterprise API agreements.
Example: Auto-Suggested Categories with Refinement
Qualitative Coding of Open-Ended Responses
Participants' responses to the open-ended question "How has remote work affected your wellbeing?" (N = 2,156) were coded using qualcode.ai (Version 1.0; https://qualcode.ai), an AI-assisted qualitative coding platform.
Coding categories were developed using an AI-assisted inductive approach. Two independent large language models (OpenAI GPT-5.2 and Anthropic Claude Opus 4.5) analyzed a random sample of 300 responses to identify emergent themes. Categories identified by both models were flagged as high-confidence. The dual-AI analysis initially suggested 12 categories; 8 were identified by both models (high-confidence) and 4 by only one model (low-confidence).
The research team reviewed all suggestions, merging two overlapping categories ("work-life balance" and "boundary management" into "work-life boundaries") and removing one low-confidence category ("commuting") as tangential to the research questions. The final coding guide consisted of 10 categories with researcher-authored descriptions.
Classification was performed using qualcode.ai's dual-rater methodology, where two independent models ([OpenAI model name] and [Anthropic model name]) coded each response. Inter-rater agreement was substantial (Cohen's κ = 0.71; Krippendorff's α = 0.73; percent agreement = 79.8%). Disagreements (n = 436, 20.2%) were reviewed and reconciled by two authors (EF, GH).
Response data was processed in compliance with GDPR via EU data centers. No data was used for AI model training per enterprise API agreements.
Supplementary Materials
Consider including these in supplementary materials for full transparency:
- Coding guide: Full category names, descriptions, and examples
- Agreement details: Per-category Kappa and Alpha values, confusion matrix
- Reconciliation log: Summary of disagreement types and resolution rationale
- Model settings: Quality tier, temperature (0.0), confidence threshold, any pre-filtering applied
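If you export the per-response codes and want to build the confusion matrix for supplementary materials yourself, a minimal sketch (the two parallel label lists and their contents are hypothetical, not a prescribed export format):

```python
from collections import Counter

def confusion_matrix(a, b):
    """Cross-tabulate rater A's codes (rows) against rater B's (columns).
    Diagonal cells are agreements; off-diagonal cells show which category
    pairs the raters confuse."""
    labels = sorted(set(a) | set(b))
    counts = Counter(zip(a, b))
    return labels, [[counts.get((ra, rb), 0) for rb in labels] for ra in labels]

openai_codes    = ["barriers", "benefits", "barriers", "cues"]
anthropic_codes = ["barriers", "benefits", "severity", "cues"]

labels, matrix = confusion_matrix(openai_codes, anthropic_codes)
for label, row in zip(labels, matrix):
    print(f"{label:>10}", row)
```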
If you used auto-suggested categories, also consider including:
- Original AI suggestions: The unmodified categories suggested by each AI model
- Confidence levels: Which categories were high-confidence (both AIs) vs. low-confidence (one AI)
- Modification log: What changes were made during human refinement (merges, deletions, additions, renames)
- Sample size: How many responses were analyzed for category generation
Citing Reliability Metrics
When discussing your reliability metrics, cite the original methodological references:
Cohen's Kappa
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37-46.
Kappa Interpretation Scale
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159-174.
Krippendorff's Alpha
Krippendorff, K. (2004). Content analysis: An introduction to its methodology (2nd ed.). Sage Publications.
Temperature Settings for LLM Classification
Renze, M., & Guven, E. (2024). The effect of sampling temperature on problem solving in large language models. Findings of EMNLP 2024, 7346-7356. https://aclanthology.org/2024.findings-emnlp.432/
Check journal requirements: Some journals have specific AI disclosure requirements. Check the author guidelines before submission to ensure your methodology description meets their standards.
Related: Learn more about the Dual-Rater Methodology, see how Agreement Metrics are calculated, or read about Auto-Suggested Coding Guides.