Readability, quality, and reliability of AI-generated information on myofascial pain syndrome: A comparative analysis of ChatGPT, Gemini, and Perplexity

Published on June 5, 2026

PLoS One. 2026 Jun 4;21(6):e0350402. doi: 10.1371/journal.pone.0350402. eCollection 2026.

ABSTRACT

Patients seeking information about myofascial pain syndrome (MPS), which affects a large segment of the population, are increasingly turning to AI-based chatbots as an alternative to traditional sources. However, both the medical accuracy of the content these platforms provide and its conformity with the "grade 6 reading level" standard, which determines whether patients can understand it, remain uncertain. This study aims to address this gap in the literature by systematically comparing MPS content generated by different AI models using readability indices and reliability and quality metrics. The 18 most relevant keywords, selected from 25 keywords identified via Google Trends data, were submitted as queries to ChatGPT (GPT-5.2), Gemini 3 Flash, and Perplexity (Sonar-4 Large). The readability of the generated responses was analyzed using six indices (FRES, FKGL, GFOG, CLI, ARI, SMOG), while content quality (GQS and EQIP scales) and reliability (DISCERN and JAMA scales) were assessed by two independent observers. The responses generated by all AI models examined were statistically significantly more complex than the recommended 6th-grade reading level (p < 0.001). In inter-model comparisons, ChatGPT produced the most readable responses (lowest linguistic difficulty), while Perplexity scored significantly higher than both ChatGPT and Gemini on the content quality and reliability metrics (JAMA, DISCERN, GQS, EQIP) (p < 0.05). Correlation analysis revealed a strong positive relationship between the quality and reliability parameters.

AI platforms show high potential for producing medical information. However, language that exceeds a sixth-grade reading level, along with the reliability limitations of current models, prevents them from replacing professional medical consultation. Perplexity was superior in terms of academic quality, while ChatGPT was superior in terms of readability. Nevertheless, positioning these systems as complementary "secondary consultation mechanisms" that support physician oversight in clinical decision-making is critically important for patient safety.
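For readers unfamiliar with the readability indices named above, the following is a minimal sketch of how two of them, FRES (Flesch Reading Ease Score) and FKGL (Flesch-Kincaid Grade Level), can be computed from sentence, word, and syllable counts using their standard published formulas. It is not the tooling used in the study, which the abstract does not describe. The function names and the vowel-group syllable counter are illustrative assumptions; the syllable heuristic in particular is only approximate.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as vowel groups (heuristic, not exact)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in '-le' endings such as "table".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_counts(text: str) -> tuple[int, int, int]:
    """Return (sentences, words, syllables) for a plain-English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def fres(sentences: int, words: int, syllables: int) -> float:
    """Flesch Reading Ease Score: higher values mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def fkgl(sentences: int, words: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    sample = "Muscle knots can hurt. Rest and gentle stretching may help."
    counts = text_counts(sample)
    print("FRES:", round(fres(*counts), 2))
    print("FKGL:", round(fkgl(*counts), 2))
```

Under this sketch, a response would meet the sixth-grade target when its FKGL is about 6 or lower; the other four indices (GFOG, CLI, ARI, SMOG) use different inputs, such as character counts or counts of polysyllabic words, and are not shown.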

PMID:42241388 | DOI:10.1371/journal.pone.0350402