Among 4,000 generated dermatology images, only about one in 10 depicted dark skin

Researchers found substantial deficiencies in the diversity and accuracy of artificial intelligence (AI)-generated dermatological images, according to a recent study in the Journal of the European Academy of Dermatology and Venereology (JEADV). The use of AI could worsen cognitive bias and health inequity, the authors reported.

A group of international researchers looking at AI from the perspective of aesthetic dermatology reported much the same in a review published in 2024 in the Journal of Cosmetic Dermatology.

While AI’s role in the aesthetic specialty holds promise, challenges and limitations exist.

Data gaps could be to blame

‘One major limitation is the lack of comprehensive datasets that reflect diversity in the patient population. Most AI models are trained on datasets that may not adequately represent different ages, skin types, and ethnicities, leading to potential biases and less accurate diagnoses for certain groups,’ the authors wrote. 

Lucie Joerg, BA, first author on the JEADV study, agreed, noting that generative AI images lack inclusive skin-tone diversity, likely because the training datasets underrepresent non-Caucasian skin.

‘Our study demonstrated that among 4,000 generated dermatology images, only 10.2% depicted dark skin, and three of four models significantly underrepresented skin of colour. The deficient skin-tone representation in AI-generated images risks amplifying algorithmic bias and widening existing health disparities among already underserved patient populations,’ Ms. Joerg said.

The study’s senior author, Jared Jagdeo, MD, MS, associate professor of dermatology and director of the Center for Photomedicine at SUNY Downstate Health Sciences University, Brooklyn, New York, USA, told PRIME Journal that he commonly searches Google Images in practice for examples of skin conditions to share with patients. However, he’s well aware of AI’s continued shortcomings.

He and colleagues published a study in the Journal of the American Academy of Dermatology (JAAD) in 2022 demonstrating that Google search results are deficient in images that depict skin conditions in skin of colour.

‘Based upon our current research … AI is not yet ready for widespread adoption and implementation to generate images that are reflective and representative of skin conditions that are illustrative of all skin types. We look forward to the likely near future when AI is advanced enough to depict all skin tones,’ Dr. Jagdeo said.

The recent study

Evaluating 20 common skin conditions, Dr. Jagdeo and coauthors prompted AI models (Adobe Firefly, ChatGPT-4o, Midjourney and Stable Diffusion) with ‘Generate a photo of a person with [skin condition].’ They examined the resulting 4,000 images for skin tone representation from June to July 2024. 

Nearly 90% of the images depicted light skin. Darker skin types, or Fitzpatrick types V-VI, represented a significantly smaller proportion of images compared to US Census demographics. While Adobe Firefly demonstrated the highest alignment with US demographic data, they found that ChatGPT-4o, Midjourney and Stable Diffusion notably underrepresented dark skin. 

Raters in the study identified only 15% of images, across all platforms, as depicting the intended condition.

‘Adobe Firefly had the lowest accuracy (0.94%), while ChatGPT-4o, Midjourney and Stable Diffusion demonstrated higher but still suboptimal accuracy (22%, 12.2%, and 22.5%, respectively),’ according to the abstract. 

Limitations and next steps

Ironically, the authors’ use of the well-known Fitzpatrick skin classification scale, which dermatologists use to categorise how skin reacts to sun exposure, was a study limitation, according to JEADV study author Margaret Kabakova, BA. 

‘The Fitzpatrick scale [is] a relatively subjective approach given that AI images don’t indicate the UV sensitivity of their outputs. We also used US Census ethnicity data as a stand-in for skin phototype due to the lack of international, standardised skin-tone demographic data,’ Ms. Kabakova said.

Safer, more equitable use of AI in dermatology comes down to three basics, according to Ms. Joerg: ‘building expert-curated image libraries that reflect all skin tones and conditions, reporting stratified AI output results by skin tone and diagnosis, and continued monitoring after deployment so gaps in underrepresented groups are identified and corrected.’ 
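As a rough illustration of what ‘reporting stratified AI output results by skin tone and diagnosis’ might look like in practice, the minimal Python sketch below tallies rater accuracy separately for each combination of skin-tone group and condition. It is not the study authors’ method: the record layout, the function name stratified_accuracy and every row of data are hypothetical and serve only to show the idea.

```python
from collections import defaultdict

# Hypothetical rater records: (skin_tone_group, condition, rated_correct).
# These rows are invented for illustration and are NOT data from the study.
records = [
    ("light", "psoriasis", True),
    ("light", "psoriasis", False),
    ("light", "vitiligo", True),
    ("dark", "psoriasis", False),
    ("dark", "vitiligo", True),
    ("dark", "vitiligo", False),
]


def stratified_accuracy(rows):
    """Return {(skin_tone_group, condition): (n_images, fraction_correct)}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for tone, condition, is_correct in rows:
        key = (tone, condition)
        totals[key] += 1
        correct[key] += int(is_correct)
    return {key: (totals[key], correct[key] / totals[key]) for key in totals}


report = stratified_accuracy(records)

# Print one line per stratum so gaps in any group are visible at a glance.
for (tone, condition), (n, acc) in sorted(report.items()):
    print(f"{tone:<6} {condition:<10} n={n} accuracy={acc:.0%}")
```

Reporting the image count alongside each accuracy figure matters because a stratum with very few images, as may happen for darker skin tones, can look better or worse than it really is.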

Authors of the Journal of Cosmetic Dermatology review suggest these prerequisites should be met to realise AI’s ability to accurately quantify patients’ skin aesthetic issues: standardising aesthetic evaluations to facilitate consistent, reliable AI assessments; collecting a wide range of data that reflects diverse ages and ethnicities and includes results from multiple instruments or evaluators; making data and AI models as public and accessible as possible; and educating and guiding practitioners and others to ensure effective AI use in aesthetic dermatology.

‘International cooperation is crucial to building these prerequisites,’ they wrote.