Synthetic Health Records Generation: A Scoping Review of Deep Learning Models

Introduction

The application of digital technologies, such as artificial intelligence, to improve medical management and healthcare delivery is known as digital medicine. A significant challenge in this field is the need for large datasets to train deep learning (DL) models, which is often problematic due to patient data privacy concerns.

Challenges in Medical Data Collection

Medical data, stored as Electronic Health Records (EHRs), contains sensitive patient information. Regulations like the European Union’s artificial intelligence regulation aim to reduce data abuse and ensure governance over developers. These restrictions make it difficult to prepare large, fair datasets for training DL models.

Synthetic Health Records (SHRs)

A practical solution to this challenge is creating synthetic copies of clinical datasets, known as Synthetic Health Records (SHRs). SHRs resemble EHR data characteristics while keeping subjects unidentifiable. They can be used for research purposes, training, and validation of DL models.

Review Methodology

This scoping review followed the PRISMA-ScR guidelines and searched PubMed, Web of Science, and Scopus for relevant publications between 2018 and 2023. The review included 52 publications that addressed machine learning topics for EHR generation.

Findings

The review found that:

Generative Adversarial Networks (GANs) were predominantly used for generating medical time series data.
Large Language Models (LLMs) were widely used for generating synthetic medical texts.
Probabilistic models were mainly used for longitudinal data.
GAN-based models suffered from mode collapse and required preliminary experiments to identify optimal hyperparameters.
Diffusion models showed promising results in synthesizing time series data but faced challenges like computational costs and interoperability difficulties.

Challenges and Future Directions

Key challenges include:

Lack of generic methods and metrics for comparing generative model performance.
Inadequate evaluation of re-identification risks in SHRs.
Limited representation of non-acute medical conditions and diverse demographic groups in public datasets.
Need for integrating domain-specific expertise from physicians into the learning process.

Conclusion

Generating SHRs is a practical strategy for addressing data scarcity and privacy concerns in digital medicine. The field is evolving, with a shift from GAN-based methods to statistical models like graph neural networks and diffusion models. Future research should focus on developing standardized evaluation metrics and improving the utility and fidelity of SHRs.

What's Hot

WM Technology Updates Stockholders on Non-Binding Proposal from Co-Founders

Access Restricted: Website Unavailable in Your Location

Best TV Deals in Amazon Prime Day 2025 Sale

WM Technology Updates Stockholders on Non-Binding Proposal from Co-Founders

Access Restricted: Website Unavailable in Your Location

Best TV Deals in Amazon Prime Day 2025 Sale

Tech in Asia Organization Profile

Restaurant Tech Startup Owner.com Hits $1 Billion Valuation

The Hidden Opportunity in AI: Energy Infrastructure

WM Technology Updates Stockholders on Non-Binding Proposal from Co-Founders

Access Restricted: Website Unavailable in Your Location

Best TV Deals in Amazon Prime Day 2025 Sale

Tech in Asia Organization Profile

Our Picks

WM Technology Updates Stockholders on Non-Binding Proposal from Co-Founders

Access Restricted: Website Unavailable in Your Location

Best TV Deals in Amazon Prime Day 2025 Sale

Subscribe to Updates

What's Hot

Synthetic Health Records Generation: A Scoping Review of Deep Learning Models

Introduction

Challenges in Medical Data Collection

Synthetic Health Records (SHRs)

Review Methodology

Findings

Challenges and Future Directions

Conclusion

Related Posts