Catherine Terhaar, MS, LCGC (she/her) ; Diana Tully, MS, LCGC (she/her); Tessa Niemchak, MS, CGC, CCGC (she/her)
The article below reflects the personal opinions of the author(s) and does not reflect the views or opinions of the Perspectives editors or committee, or the National Society of Genetic Counselors (NSGC).
Artificial intelligence (AI) — we can’t escape it. It’s everywhere we turn, people can’t stop talking about it and some even believe it will take over the world. We’re betting it will change the world, so the real question is: what can AI do to make our lives as genetic counselors better? In an effort to leverage this technology, we attempted to use AI to interpret the National Comprehensive Cancer Network (NCCN) guidelines.
The latest wave of AI, particularly large language models (LLMs), has demonstrated unprecedented advancements in natural language processing and machine learning. LLMs have the potential to analyze vast datasets, understand language in context and generate meaningful conversations.
One potential application of LLMs is interpreting the NCCN guidelines for hereditary cancer testing. These guidelines are complex, requiring a deep understanding of family structure, cancer histology and nuances embedded within multiple footnotes and qualifiers. The interpretation of these guidelines is critical to ensure that patients receive appropriate genetic testing and to facilitate reimbursement by payors. However, many individuals responsible for applying these guidelines lack the necessary genetic or clinical expertise, leading to inconsistent or incorrect application. Our group investigated the accuracy of LLMs in applying NCCN criteria to hypothetical clinical scenarios.
Experience Using ChatGPT (round one)
ChatGPT, developed by OpenAI, was selected as the initial target for this investigation. To start, ChatGPT (version GPT-3.5) was asked if a patient meets NCCN testing criteria based on a hypothetical scenario. This approach was unsuccessful for a few reasons. ChatGPT’s knowledge is limited by a training cutoff, which at the time of our research was January 2022; the NCCN guidelines are updated once or twice a year, so ChatGPT was working from outdated guidelines. Additionally, the output repeatedly stated that ChatGPT is not a doctor or genetic counselor and was unable to assess a patient’s personal/family history. While this is an accurate and responsible answer, it was not helpful for interpreting guidelines. Another limitation was that ChatGPT provided different answers to the same hypothetical scenarios, which was problematic for an application where consistency and reliability are essential.
Experience Using ChatGPT (round two)
After discovering these limitations, we brainstormed ways to use ChatGPT’s adaptability to overcome these setbacks. This time, a user fed prompts to ChatGPT instructing it to create a list of criteria identical to the current NCCN guidelines¹. After creating the list of criteria, ChatGPT was asked to analyze a given family history to determine if a patient meets criteria. The hope was that we would overcome the barriers identified in round one. We were successful: the guidelines were up to date and ChatGPT performed like a chatbot. ChatGPT demonstrated its ability to learn the guidelines and responded positively to feedback including a request to provide a yes/no answer as to whether a patient met criteria.
This approach took time and had its own limitations. The homemade chatbot was not simple to create and required many reminders to ensure accurate output. ChatGPT consistently struggled with the difference between first- and second-degree relatives. The ChatGPT-turned-chatbot was also not scalable: a second user input identical prompts to create a list of criteria on a separate ChatGPT account, but the output was less accurate for this second user. Both users asked whether a patient met NCCN criteria based on 13 hypothetical family history scenarios. The first user’s ChatGPT responded correctly to 10/13 (77%); the second user’s responded correctly to only 6/13 (46%). This approach could not be reliably scaled, and it required constant accuracy checks by a user with deep knowledge of the guidelines.
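The scoring itself was the simple part of this exercise. The sketch below illustrates how yes/no responses from two chatbot sessions can be tallied against expert determinations; the answers shown are made-up placeholders for illustration, not our actual scenario set.

```python
# Illustrative sketch of scoring a chatbot's yes/no criteria judgments
# against expert-determined answers. All answers below are placeholders.

def accuracy(model_answers, expert_answers):
    """Fraction of scenarios where the model matches the expert call."""
    assert len(model_answers) == len(expert_answers)
    correct = sum(m == e for m, e in zip(model_answers, expert_answers))
    return correct / len(expert_answers)

# Expert determinations for 13 hypothetical scenarios (True = meets criteria)
expert = [True, False, True, True, False, True, False,
          True, True, False, True, False, True]
# Hypothetical responses from two separately prompted chatbot sessions
user_one = [True, False, True, True, False, True, True,
            True, True, False, False, False, False]
user_two = [True, True, True, False, False, True, True,
            False, False, False, False, False, False]

print(f"User 1: {accuracy(user_one, expert):.0%}")  # prints "User 1: 77%"
print(f"User 2: {accuracy(user_two, expert):.0%}")  # prints "User 2: 46%"
```

Framing the evaluation this way also makes the reproducibility problem concrete: identical prompts, different sessions, different scores.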
Experience Using ClaudeAI
Claude, developed by Anthropic, has a capability that was of particular interest to our team: users can upload documents to teach it. The team had high expectations as a snippet from the NCCN guidelines was uploaded to the AI assistant. Claude handled the first hypothetical family history scenarios easily, providing accurate responses, including why a specific scenario did or did not meet criteria. However, as the scenarios grew more complex, Claude made errors. It had difficulty with Ashkenazi Jewish ancestry, triple-negative breast cancer diagnoses and families that included three close relatives with a breast cancer diagnosis. Then Claude made an error that seemed impossible for AI technology — it stated that 51 years old was less than 50 years old. LLMs are known to hallucinate, but our team was surprised that a numerical hallucination like this was possible². Given the importance of age at diagnosis for accurate interpretation of NCCN guidelines, these hallucinations make this application unworkable.
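It is worth noting that the comparison Claude fumbled is exactly the kind of check that ordinary code gets right every time, which hints at hybrid workflows where a model extracts facts from a history while deterministic code applies the numeric cutoffs. As a purely illustrative sketch (the function name and the "at or under 50" convention are our own assumptions, not NCCN language):

```python
# Illustrative only: the numeric comparison that tripped up the LLM is
# trivial to make deterministic in code. In a hybrid workflow, a model
# could extract structured facts while plain code applies age cutoffs.

def meets_age_cutoff(age_at_diagnosis: int, cutoff: int = 50) -> bool:
    """Return True if the diagnosis age is at or below the cutoff.

    The cutoff of 50 mirrors the age threshold discussed above; the
    "at or below" convention is an assumption made for illustration.
    """
    return age_at_diagnosis <= cutoff

print(meets_age_cutoff(51))  # prints False — 51 is not <= 50
print(meets_age_cutoff(49))  # prints True
```

A check like this never hallucinates; the open question is whether the facts feeding it can be extracted reliably.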
Conclusions
While our study found that large language models were not able to accurately interpret the NCCN guidelines for hereditary cancer testing, their potential remains undeniable. Though they may not yet handle complex clinical decision-making, LLMs excel in tasks such as generating clear, compelling text — like assisting with the transformation of this piece into an op-ed format. This highlights an important takeaway: genetic counselors should not view AI as a replacement for their expertise, but as a valuable tool that can enhance their practice in complementary ways. By learning to interact with and leverage LLMs effectively, genetic counselors can focus more on patient care while relying on AI for tasks like documentation, communication and possibly, in the future, more accurate guideline interpretation.
Disclosure: ChatGPT was used to generate ideas and text for portions of this article.
1. National Comprehensive Cancer Network®. NCCN Clinical Practice Guidelines in Oncology. Genetic/Familial High-Risk Assessment: Breast, Ovarian, and Pancreatic. V3.2024. http://www.NCCN.org
2. Ji, Z., Lee, N., Frieske, R., et al. (2023). Survey of Hallucination in Natural Language Generation. ACM Computing Surveys, 55(12), Article 248: 1-38.
Catherine Terhaar, MS, LCGC (she/her) graduated from the Arcadia genetic counseling program in 2009 and currently lives in Grand Rapids, Michigan. She has clinical experience in pediatric genetics and has provided lab support for women’s health and oncology genetics. Terhaar has a special interest in research, genetics education and supporting genetic counseling students. Terhaar is an employee of Quest Diagnostics; however, this work was completed in the author’s capacity as an independent scholar.
Diana Tully, MS, LCGC (she/her) graduated from the Icahn School of Medicine at Mount Sinai Genetic Counseling Training Program in 2004 and currently lives in Ocala, Florida. Her primary area of expertise is oncology. She has held clinical positions in academic, for-profit and not-for-profit hospitals and has also worked in industry providing lab support and sales support. She is passionate about personalized medicine and genetics education. Tully is an employee of and holds stock in Quest Diagnostics; however, this work was completed in the author’s capacity as an independent scholar.
Tessa Niemchak, MS, CGC, CCGC (she/her) graduated from the Wayne State University School of Medicine genetic counseling program in 2012 and currently lives in Ayr, Ontario, Canada. She has clinical experience in hereditary oncology and has laboratory GC experience in hereditary oncology, women’s health, pediatrics and pharmacogenetics. Niemchak has a specific interest in research topics focused on laboratory genetic medicine and its impact on patient care. Niemchak is an employee of Quest Diagnostics; however, this work was completed in the author’s capacity as an independent scholar.