A paper has been posted on arXiv testing how much prompt engineering can improve GPT-4V for medical purposes. It runs 71 pages of test cases and results, though it feels a bit padded.
Title: ENHANCING MEDICAL TASK PERFORMANCE IN GPT-4V: A COMPREHENSIVE STUDY ON PROMPT ENGINEERING STRATEGIES
Summary:
GPT-4V(ision), OpenAI’s latest large vision-language model (LVLM), has attracted significant interest due to its potential in healthcare. However, recent studies and internal reviews have shown that it underperforms on specialized medical tasks. In this paper, we specifically explore the limitations of GPT-4V in processing complex image data such as endoscopy, CT, and MRI. We used open-source datasets to assess its baseline competencies and identify areas needing improvement. Through iterative testing, we refined the model’s prompts, which significantly improved interpretation accuracy and relevance in medical imaging. From a comprehensive evaluation, we derived ten effective prompt engineering techniques, each of which strengthened GPT-4V’s medical insights. These systematic improvements allow GPT-4V to produce more reliable, accurate, and clinically valuable output and improve its usability in critical healthcare settings. The findings offer those applying AI in healthcare clear, actionable guidance for realizing GPT-4V’s diagnostic potential.
In our study, the endoscopy group of the test set used data from the Kvasir-SEG [8] and m2caiSeg [9] datasets, along with data we collected from cervical examination procedures. For CT images, the AutoPET [10], TotalSegmentator [11], and AbdomenCT-1K [12] datasets supplemented data from the relevant literature. MRI data included the BraTS 2021 [13], ATLAS v2.0, and AMOS 2021 [14] datasets. It is important to acknowledge that this dataset collection is not exhaustive and does not represent all medical scenarios. Nevertheless, it provides a suitable basis for testing the prompt tips. Future work will aim to enrich the research by expanding this dataset.
10 Prompt Tips Derived (a minimal prompting sketch follows the list)
- Concise language is more effective than complex description; short, task-related details work best in image analysis.
- Explicitly stating the task can make image analysis more effective.
- Implementing step-by-step guidance through multi-turn dialogue lets GPT-4V handle complex tasks more efficiently by breaking them into simpler subtasks.
- Do not reveal the target at the start of a multi-turn dialogue.
- Describing the target’s appearance or characteristics greatly improves performance.
- To avoid confusing the model, appearance descriptions should not conflict with the image annotations.
- Clarifying the contextual relationship between successive images improves GPT-4V’s analysis accuracy and detail recognition.
- Concatenating multiple images and providing their sequence order improves task-processing efficiency compared with inputting multiple images simultaneously.
- Providing temporal patient data creates opportunities for comparative analysis, which particularly strengthens the evaluation of lesions or conditions.
- Focusing GPT-4V on the regions of interest yields more targeted and relevant results.
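Below is a minimal sketch of how several of these tips combine in practice, assuming the OpenAI Python SDK; the model id, image URL, and prompts are illustrative placeholders, not the paper’s actual prompts.

```python
# Minimal sketch, not the paper's method: a multi-turn dialogue applying
# concise task framing, delayed target disclosure, appearance description,
# and region-of-interest focus. Model id, URL, and prompts are placeholders.
from openai import OpenAI

client = OpenAI()
history = []

def ask(user_content):
    """Send one user turn and keep the full history, so later turns
    build on earlier ones (tip: step-by-step multi-turn dialogue)."""
    history.append({"role": "user", "content": user_content})
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",  # placeholder vision-capable model id
        messages=history,
        max_tokens=512,
    )
    reply = resp.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

ct_slice = {"type": "image_url",
            "image_url": {"url": "https://example.com/ct_slice.png"}}  # placeholder

# Turn 1: concise, task-related prompt that does not yet reveal the
# diagnostic target.
ask([{"type": "text",
      "text": "This is an axial abdominal CT slice. Briefly describe the "
              "organs visible in it."},
     ct_slice])

# Turn 2: narrow to a region of interest and give an appearance
# description consistent with the image.
print(ask("Now focus on the liver. A lesion, if present, would appear as "
          "a well-circumscribed hypodense area. Do you see such an area?"))
```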
arXiv: https://arxiv.org/abs/2312.04344
Browse: https://browse.arxiv.org/pdf/2312.04344.pdf
PDF: https://arxiv.org/pdf/2312.04344.pdf
arXiv-vanity: https://www.arxiv-vanity.com/papers/2312.04344
Paper page: https://huggingface.co/papers/2312.04344
Papers with code: https://paperswithcode.com/paper/enhancing-medical-task-performance-in-gpt-4v

There’s Pixel 8 Pro news featuring Gemini Nano; I don’t think it has been covered well in the feed, so I’m sharing it.
“Pixel 8 Pro, the first smartphone with AI built in, is now running Gemini Nano, with more AI updates coming to the Pixel portfolio. Gemini Nano now powers on-device generative AI capabilities on the Pixel 8 Pro.”
This may be the first time we can see what applications become possible when a phone ships with an LLM on board…
Google introduces two use cases on the Pixel 8 Pro. Once the actual product is out and applications start to develop, more use-case ideas will emerge, and we’ll see whether an on-device LLM can really be useful…
In any case, Google has claimed the title of the first phone to ship with an on-device LLM.
- Recorder Summary
Gemini Nano, the most efficient model built for on-device experiences, now powers summaries in the Pixel 8 Pro’s Recorder app. Get summaries of recorded conversations, interviews, presentations, and more, even without a network connection.
- Gboard’s Smart Reply
On the Pixel 8 Pro, Gemini Nano now supports Smart Reply in Gboard, Google’s keyboard app, as a developer preview. The on-device AI model, available now in WhatsApp and coming to more apps next year, saves time by suggesting high-quality responses that are aware of the conversation.
Early next year, Assistant with Bard will also come to Pixel devices, effectively upgrading some areas to smarter, much more capable models behind a voice-enabled interface…
“Gemini Ultra outperforms GPT-4 on Massive Multitask Language Understanding (MMLU) when 32 samples are combined with chain-of-thought.”
There are criticisms of the claim that Gemini Ultra outperformed GPT-4. Under the standard few-shot setting with five examples, Gemini Ultra actually scores lower than GPT-4; the headline result comes from a different evaluation method, CoT@32. In other words, for each question, chain-of-thought reasoning is sampled 32 times and the results are combined to produce the reported answer.
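To make the difference concrete, here is a minimal sketch of CoT@32-style aggregation; the `sample_answer` and `greedy_answer` helpers are hypothetical stand-ins for one sampled chain-of-thought completion and one temperature-0 completion, and the consensus fallback only approximates Gemini’s uncertainty-routed procedure.

```python
# Minimal sketch of CoT@32-style aggregation (an approximation of Gemini's
# "uncertainty-routed chain-of-thought", not its exact procedure).
from collections import Counter

def cot_at_k(question, sample_answer, greedy_answer, k=32, threshold=0.5):
    """sample_answer(q) -> final answer of one sampled chain-of-thought run;
    greedy_answer(q) -> the temperature-0 answer. Both are hypothetical."""
    # Draw k chain-of-thought samples and keep only their final answers.
    answers = [sample_answer(question) for _ in range(k)]
    best, votes = Counter(answers).most_common(1)[0]
    # If enough samples agree, report the majority answer; otherwise fall
    # back to the greedy answer rather than trusting a weak consensus.
    return best if votes / k >= threshold else greedy_answer(question)
```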
I think we’ll only know whether this is a marketing victory or a technical one once the actual product ships and receives user evaluation.
Find out more in the paper – https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf
Blog: https://blog.google/products/pixel/pixel-feature-drop-december-2023/
There is a lot of controversy in the U.S. over regulations that would require an “AI Nutrition Facts” label, modeled on food nutrition labels (YouTube: https://www.youtube.com/watch?v=IjenqKORUSM).
The fundamental purpose is to increase choice and transparency when using AI. The discussion seems to have started when HHS proposed a rule in June requiring specific information about the algorithm and system to be reflected in the EHR when AI is used.
Since then, the Biden administration’s AI executive order at the end of October appears to have expanded the push for transparency, leading to discussions about regulations mandating AI Nutrition labels.
In fact, similar proposals were discussed in a paper published in npj Digital Medicine in 2020, which suggested providing accurate information about ML models through model facts labels. At the time, I agreed on the need to provide and disclose this information, but there were so many open questions about concrete labeling methods and their practical utility that the discussion did not spread.
Related efforts include reporting guidelines such as CONSORT-AI, SPIRIT-AI, STARD-AI, and DECIDE-AI, which aim to share medical-AI research processes and findings in a standardized way.
The questions are: 1) what can be meaningfully standardized and put on such a label; 2) how meaningful can it remain in a field where technology develops at the speed of light; and 3) how helpful can it be to consumers without burdening developers? Sooo-Yong Shin Kyuhwan Jung Chungkeun Lee Kim Hui-young Jung-Woo Ha
Above all, for proper labeling to be possible, the evaluation process itself must be standardized, the evaluation results must be recorded in a standardized form, and a basis for consistent evaluation and verification of those results must be established, so that everyone can label in the same language.
In that sense, I think it is another important area to consider as an extended series in the IEC 63521 standard that began development.
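As a purely illustrative sketch, here is one way a standardized, machine-readable evaluation record behind such a label might look; every field name is hypothetical and is not taken from any proposed regulation or from IEC 63521.

```python
# Hypothetical schema sketch for a machine-readable evaluation record.
# Field names are illustrative only, not from any standard.
from dataclasses import dataclass, field

@dataclass
class EvaluationRecord:
    model_name: str
    model_version: str
    intended_use: str            # the clinical task the model is meant for
    evaluation_dataset: str      # what the reported metrics were measured on
    metrics: dict = field(default_factory=dict)      # e.g. {"AUROC": 0.91}
    known_limitations: list = field(default_factory=list)
    evaluation_date: str = ""    # ISO 8601, so records stay comparable

record = EvaluationRecord(
    model_name="example-lesion-detector",  # hypothetical model
    model_version="1.2.0",
    intended_use="Flag suspected liver lesions on abdominal CT",
    evaluation_dataset="held-out multi-site test set",
    metrics={"AUROC": 0.91, "sensitivity": 0.88},
    known_limitations=["not validated on pediatric scans"],
    evaluation_date="2023-12-01",
)
```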