ChatGPT is nearly capable of passing the US Medical Licensing Exam

A surgical robot at work at Glasgow Royal Infirmary.

Image: Jane Barlow-Pool/Getty Images

Before long, it’ll be easier to list the tasks ChatGPT can’t complete than the ones it can. We have already shared reports about ChatGPT passing law school and business school exams, and now a new study suggests that the AI chatbot can also score at or near the passing mark on the United States Medical Licensing Exam (USMLE), though its score isn’t especially impressive.

Researchers from healthcare startup Ansible Health shared the results of their study in the journal PLOS Digital Health on February 9. They found that ChatGPT was able to score “at or around the approximately 60 percent passing threshold” for the licensing exam.

As the USMLE website explains, the USMLE is a three-step exam that physicians are required to take for medical licensure in the US. In addition to testing the skills and medical knowledge of prospective physicians, the test also assesses their values and attitudes.

After eliminating image-based questions, the researchers fed ChatGPT 350 of the 376 publicly available questions from the June 2022 USMLE sample exam release. Across the three exams, ChatGPT scored between 52.4% and 75%. In most years, the passing threshold is around 60%. ChatGPT also outscored PubMedGPT, a model trained exclusively on biomedical literature, which scored 50.8%.
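For readers who want to check the arithmetic, here is a minimal Python sketch using only the figures reported above. The constant and function names are illustrative, and the 60% threshold is the approximate value cited in this article, not an official cutoff.

```python
# Figures as reported in this article; all names here are illustrative.
TOTAL_QUESTIONS = 376      # questions in the June 2022 set
QUESTIONS_USED = 350       # questions left after image-based items were removed
PASSING_THRESHOLD = 60.0   # approximate passing mark in percent (assumption: "around 60%")
CHATGPT_LOW = 52.4         # lowest reported ChatGPT score, in percent
CHATGPT_HIGH = 75.0        # highest reported ChatGPT score, in percent
PUBMEDGPT_SCORE = 50.8     # reported PubMedGPT score, in percent


def share_retained(used: int, total: int) -> float:
    """Return the percentage of questions kept after filtering, to one decimal."""
    return round(100 * used / total, 1)


def clears_threshold(score: float, threshold: float = PASSING_THRESHOLD) -> bool:
    """Return True if a percentage score meets or exceeds the threshold."""
    return score >= threshold


if __name__ == "__main__":
    print("Share of questions used (%):", share_retained(QUESTIONS_USED, TOTAL_QUESTIONS))
    print("Lowest ChatGPT score clears threshold:", clears_threshold(CHATGPT_LOW))
    print("Highest ChatGPT score clears threshold:", clears_threshold(CHATGPT_HIGH))
    print("PubMedGPT score clears threshold:", clears_threshold(PUBMEDGPT_SCORE))
```

The reported range straddles the threshold, which is why the result is best described as "at or around" passing, not as a clear pass.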

The authors say: “Reaching the passing score for this notoriously difficult expert exam, and doing so without any human reinforcement, marks a notable milestone in clinical AI maturation.”

Shortly after the study was published, the Federation of State Medical Boards and National Board of Medical Examiners, both USMLE co-sponsors, shared a statement of their own. They note that two recent studies used test prep material and practice questions as opposed to actual USMLE exam questions. As such, ChatGPT’s achievement comes with an asterisk:

“…it’s important to note that the practice questions used by ChatGPT are not representative of the entire depth and breadth of USMLE exam content as experienced by examinees. For example, certain question types were not included in the studies, such as those using pictures, heart sounds, and computer-based clinical skill simulations. This means that other critical test constructs are not being represented in their entirety in the studies.”

“Although there is insufficient evidence to support the current claims that AI can pass the USMLE Step exams, we would not be surprised to see AI models improve their performance dramatically as the technology evolves,” the groups added. “If utilized correctly, these tools can have a positive impact on how assessments are built and how students learn.”

Jacob Siegal Associate Editor

Jacob Siegal is Associate Editor at BGR, having joined the news team in 2013. He has over a decade of professional writing and editing experience, and helps to lead our technology and entertainment product launch and movie release coverage.