Artificial Intelligence and Medical Ethics: Where Models Stumble

Researchers have found that even the most advanced large language models (LLMs), such as ChatGPT, can make significant errors in complex medical ethics scenarios. The study, published in npj Digital Medicine, highlights the need for careful human oversight when AI is used in healthcare settings.

The research team from the Icahn School of Medicine at Mount Sinai, in collaboration with the Rabin Medical Center in Israel, made small changes to well-known ethical dilemmas to examine how LLMs handle them. For example, in the classic "Surgeon's Dilemma," a boy is injured and the surgeon exclaims, "I cannot operate on this boy – he is my son!" AI models often incorrectly answered that the surgeon was the boy's mother, even when the modified scenario stated explicitly that the surgeon was his father.

Similarly, in a scenario modified so that the parents consent to a blood transfusion for their child, some models still recommended overriding a parental refusal that no longer existed. The findings indicate that LLMs may fall back on familiar patterns and biases and overlook crucial details.

The researchers emphasize that AI can be useful as a complement to clinical expertise, but it should not replace human judgment, especially in high-risk decisions. Human oversight is essential for handling situations that require ethical sensitivity and emotional intelligence.