Training a large language model to code qualitative research data: Results from discussions of ethical issues

Simmonds, David; Haines, Russell

Volume 18

V18 N4 Pages 46-55, Dec 2025
David Simmonds, Auburn University - Montgomery, Montgomery, AL, USA
Russell Haines, Appalachian State University, Boone, NC, USA

Abstract: Comment coding is an important part of qualitative research, but it is a labor-intensive process. Furthermore, researchers need to assess whether comments were coded accurately and reliably. Here, we present a process for arranging the original comments and using them to train a Google BERT large language model (LLM) that was able to code comments with 87.9% reliability. Future researchers can extend this process to code comments made in less-structured research settings, or potentially to have the LLM create the comment groupings automatically.


Recommended Citation: Simmonds, D., & Haines, R. P. (2025). Training a large language model to code qualitative research data: Results from discussions of ethical issues. Journal of Information Systems Applied Research and Analytics, 18(4), pp. 46-55. https://doi.org/10.62273/OTJZ7714
