Benchmarking Generative AI: A Call for Establishing a Comprehensive Framework and a Generative AIQ Test

Malik Sallam
Roaa Khalil
Mohammed Sallam

Abstract

The introduction and rapid evolution of generative artificial intelligence (genAI) models necessitate a refined understanding of the concept of “intelligence”. GenAI tools are known for their capability to produce complex, creative, and contextually relevant output. Nevertheless, the deployment of genAI models in healthcare should be accompanied by appropriate and rigorous performance evaluation tools. In this rapid communication, we emphasize the urgent need to develop a “Generative AIQ Test” as a novel, tailored tool for comprehensive benchmarking of genAI models against multiple human-like intelligence attributes. A preliminary framework is proposed in this communication. This framework incorporates several performance metrics, including accuracy, diversity, novelty, and consistency. These metrics are considered critical in the evaluation of genAI models that might be utilized to generate diagnostic recommendations, treatment plans, and patient interaction suggestions. This communication also highlights the importance of orchestrated collaboration to construct robust and well-annotated benchmarking datasets that capture the complexity of diverse medical scenarios and patient demographics. The suggested approach aims to ensure that genAI models are effective, equitable, and transparent. To maximize the potential of genAI models in healthcare, it is important to establish rigorous, dynamic standards for their benchmarking. Consequently, this approach can help improve clinical decision-making and patient care while enhancing the reliability of genAI applications in healthcare.

Article Details

How to Cite
Sallam, M., Khalil, R., & Sallam, M. (2024). Benchmarking Generative AI: A Call for Establishing a Comprehensive Framework and a Generative AIQ Test. Mesopotamian Journal of Artificial Intelligence in Healthcare, 2024, 69–75. https://doi.org/10.58496/MJAIH/2024/010
Section
Articles