A revised version (V2) of the arXiv survey paper on LLMs in medicine, originally released last November, has been uploaded.
Title: A Survey of Large Language Models in Medicine: Principles, Applications, and Challenges
Summary:
Large language models (LLMs) such as ChatGPT have received significant attention for their impressive ability to understand and generate human language. Accordingly, applying LLMs in medicine to assist physicians and patient care has emerged as a promising research direction in both artificial intelligence and clinical medicine. Reflecting this trend, this survey provides a comprehensive overview of the principles, applications, and challenges of LLMs in medicine. Specifically, it addresses the following questions: 1) How can medical LLMs be built? 2) How do medical LLMs perform on downstream tasks? 3) How can medical LLMs be utilized in real-world clinical practice? 4) What problems arise when using medical LLMs? 5) How can we better build and utilize medical LLMs? In doing so, this survey aims to provide insights into the opportunities and challenges of LLMs in medicine and to serve as a valuable resource for building practical and effective medical LLMs.
arXiv: https://arxiv.org/abs/2311.05112

It looks like a regulatory framework for the use of LLMs in healthcare needs to be created sooner rather than later. Currently, no LLM has been approved by regulators for treatment or diagnosis in healthcare settings, yet papers like this keep coming out.
This is a paper published in JAMA comparing AI and clinician performance in estimating the probability of a diagnosis before and after testing; the LLM was more accurate than human clinicians at estimating post-test probability after a negative test result in all five cases.
Title: Artificial Intelligence vs Clinician Performance in Estimating Probabilities of Diagnoses Before and After Testing
Summary:
For diagnosis, it is necessary to consider the possibility of various diseases based on a patient's symptoms and to update those possibilities based on the results of diagnostic tests. However, when combining given statistics with realistic patient scenarios, doctors often perform poorly at estimating the probability of disease before and after testing [1]. Large language models (LLMs) appear to have a good grasp of clinical reasoning, as they can convincingly solve difficult diagnostic cases, pass licensing exams, and communicate empathetically with patients [2-4]. This diagnostic study compared the performance of the AI chatbot GPT-4 (OpenAI) with that of human clinicians from a large survey.
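The pre-test/post-test probability updating the study refers to is classically computed with Bayes' theorem via likelihood ratios. A minimal sketch of that arithmetic (the function name and all numbers are illustrative, not taken from the paper):

```python
def post_test_probability(pre_test_p, sensitivity, specificity, positive):
    """Update a pre-test disease probability given a test result,
    using likelihood ratios derived from Bayes' theorem."""
    pre_odds = pre_test_p / (1 - pre_test_p)
    if positive:
        lr = sensitivity / (1 - specificity)   # positive likelihood ratio
    else:
        lr = (1 - sensitivity) / specificity   # negative likelihood ratio
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Illustrative numbers: 30% pre-test probability,
# a test with 90% sensitivity and 80% specificity.
p_pos = post_test_probability(0.30, 0.90, 0.80, positive=True)
p_neg = post_test_probability(0.30, 0.90, 0.80, positive=False)
print(f"after positive test: {p_pos:.2f}")  # probability rises (~0.66)
print(f"after negative test: {p_neg:.2f}")  # probability falls (~0.05)
```

This is exactly the kind of structured inference that clinicians in the cited survey struggled with when the numbers were embedded in realistic vignettes.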
Discussion:
The LLM was more accurate than human clinicians at estimating post-test probabilities after a negative test result in all five cases. It did not perform as well after a positive test result. For the case framed as a urinary tract infection (UTI) in the problem stem that was actually asymptomatic bacteriuria, the LLM's estimates were worse than the human estimates: some clinicians recognized the discrepancy, but the model did not, and most likely produced estimates assuming the UTI diagnosis was accurate. With the exception of the fifth test case, the AI correctly solved the underlying statistical inference problems when posed formally; the range of probability outputs the LLM gave for the clinical vignettes seems to stem from its stochastic nature.
A limitation of this work is that it used a simple input-output prompting strategy; other prompting approaches may yield better results and are worth studying. The cases were also simplified so that they had clear reference standards. Future studies should investigate LLM performance on more complex cases.
It is not clear why the LLM's performance is less robust for post-test probabilities after positive results. However, even if imperfect, the probabilistic recommendations of an LLM could improve human diagnostic performance through collective intelligence, especially if AI diagnostic aids can combine probabilistic, narrative, and heuristic approaches to diagnosis.
Paper: https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2812737

With the final agreement on the EU's AI regulation, the AI Act, many regulatory changes for high-risk AI systems are expected.
In healthcare, AI-based medical devices classified as Class IIa, IIb, or III will be categorized as high-risk AI systems and will have to undergo third-party conformity assessment, so the importance of testing standards such as ISO/IEC TS 29119-11 and IEC 63521 is likely to be emphasized.
Starting with the EU, other countries will likely follow with similar regulatory legislation, so I expect to feel a greater burden as an editor and maker of the relevant standards.
Announcement: https://www.consilium.europa.eu/en/press/press-releases/2023/12/09/artificial-intelligence-act-council-and-parliament-strike-a-deal-on-the-first-worldwide-rules-for-ai/
#EU #AIAct #AIRegulation

Analyses of Google's staged Gemini demo video keep coming up. Google was so eager to win that it made the big mistake of exaggerating.
In the video, when the presenter asks, "Is this the right order?" in a real-time voice conversation, Gemini replies, "No, the right order is the Sun, Earth, Saturn."
In reality, the prompt was text accompanying a picture: "Is this the right order? Consider the distance from the sun and explain your reasoning."
Moreover, the video was not recorded in real time, and no such voice interface or interaction is actually provided.
There is also the bitter observation that ChatGPT answers this correctly even when asked without any contextual information.

The IEC 63203-402-2 standard, which was proposed by Korea and which I have been working on since 2019 as co-editor with Veronica of the CTA in the U.S., has passed the FDIS vote, the final review stage, and will be published as an international standard by December or January next year.
Only one modification, a typo correction submitted by Korea, was made during the vote. Thanks to this, it could be finalized quickly.
In the future, if a step-counting function is provided on any wearable device, including mobile devices, its accuracy can be measured and reported using the IEC 63203-402-2 test method.
I hope it will become the most widely used international standard in the wearable field.
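As a rough illustration of the kind of accuracy reporting such a test method enables, here is a minimal sketch of comparing a device's step count against a reference count. The error metric and numbers are my own illustrative choices, not taken from IEC 63203-402-2:

```python
def step_count_error_pct(measured_steps, reference_steps):
    """Percent error of a wearable's step count against a reference count
    (e.g., a manually counted or video-verified walk)."""
    return abs(measured_steps - reference_steps) / reference_steps * 100

# Illustrative: a device reports 1940 steps over a 2000-step reference walk.
error = step_count_error_pct(1940, 2000)
print(f"step-count error: {error:.1f}%")  # prints "step-count error: 3.0%"
```

The actual standard defines the test procedure and conditions; this only shows the basic arithmetic of turning a measured count into a reportable accuracy figure.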