SEOYUNJE/Lung-Image-Analysis


Lung-Image-Analysis

📌 Chest X-Ray Lung Disease Classification with Grad-CAM Localization

=> This project classifies lung diseases from chest X-ray images and highlights the affected regions with Grad-CAM (Gradient-weighted Class Activation Mapping).
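As a rough illustration of how a Grad-CAM heatmap is formed, the sketch below applies the standard Grad-CAM computation to feature maps and gradients that are already available as nested lists. It is not this repository's implementation: the function name `grad_cam` and the toy numbers are assumptions made for the example, and a real pipeline would take the activations and gradients from the last convolutional layer of the trained network.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap.

    activations, gradients: lists of K feature maps, each an H x W nested list.
    gradients holds d(class score)/d(activation) for the class of interest.
    """
    # Channel weights: average each gradient map over all spatial positions.
    weights = []
    for grad_map in gradients:
        values = [g for row in grad_map for g in row]
        weights.append(sum(values) / len(values))

    height, width = len(activations[0]), len(activations[0][0])
    # Weighted sum of the feature maps, followed by ReLU.
    cam = [
        [max(0.0, sum(w * act[i][j] for w, act in zip(weights, activations)))
         for j in range(width)]
        for i in range(height)
    ]

    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy example with 2 channels of size 2x2 (made-up numbers).
acts = [[[1.0, 0.0], [0.0, 2.0]],
        [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]],       # channel weight +1.0
         [[-1.0, -1.0], [-1.0, -1.0]]]   # channel weight -1.0
heatmap = grad_cam(acts, grads)
```

In practice the heatmap is upsampled to the input resolution and overlaid on the X-ray, so that regions with values near 1 mark the areas that most increased the predicted class score.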

Overview

Detecting and classifying diseases from chest X-ray (CXR) images is a crucial task in medical diagnostics, enabling timely and accurate identification of thoracic conditions. This project uses deep learning to classify nine thoracic diseases, plus a "No Finding" class, from CXR images.

View Position

image

Data Description

📌 NIH Dataset

Original Data

=> NIH Chest X-rays

Citations

 - Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. IEEE CVPR 2017 (ChestX-ray8_Hospital-Scale_Chest_CVPR_2017_paper.pdf)

 - NIH News release: NIH Clinical Center provides one of the largest publicly available chest x-ray datasets to scientific community

 - Original source files and documents: https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/36938765345

Custom Data

=> NIH Big Dataset

Modifications to original data

  • We keep only the following labels: No Finding, Edema, Atelectasis, Pneumonia, Effusion, Pneumothorax, Emphysema, Consolidation, Fibrosis, and Cardiomegaly (the last appears in the label counts of the custom data).

  • To reduce class imbalance, we sampled only 10,000 "No Finding" cases.

  • In the NIH dataset, the labels were text-mined from radiology reports and sometimes indicate diseases incorrectly. Therefore, for any patient with multiple disease labels, images labeled "No Finding" are removed.

  • Because lung structure differs between children and adults, images of patients under 16 years old were excluded.
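The filtering rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' script: the record fields (`image`, `patient_id`, `age`, `labels`), the function name `filter_records`, and the reading of "multiple diseases" as "two or more distinct disease labels across a patient's images" are all hypothetical. It also assumes that an image carrying any label outside the kept set is dropped entirely.

```python
import random
from collections import defaultdict

KEPT_LABELS = {
    "No Finding", "Edema", "Atelectasis", "Pneumonia", "Effusion",
    "Pneumothorax", "Emphysema", "Consolidation", "Fibrosis",
    "Cardiomegaly",  # taken from the label count table of the custom data
}


def filter_records(records, max_no_finding=10_000, min_age=16, seed=0):
    """Apply the label, age, and "No Finding" rules to a list of dict records."""
    # Keep patients aged min_age or older whose labels are all in KEPT_LABELS.
    kept = [
        r for r in records
        if r["age"] >= min_age and set(r["labels"]) <= KEPT_LABELS
    ]

    # Collect the distinct diseases seen across each patient's images.
    diseases = defaultdict(set)
    for r in kept:
        diseases[r["patient_id"]].update(
            label for label in r["labels"] if label != "No Finding"
        )

    # Drop "No Finding" images of patients with more than one disease.
    kept = [
        r for r in kept
        if not (r["labels"] == ["No Finding"]
                and len(diseases[r["patient_id"]]) > 1)
    ]

    # Cap the number of "No Finding" images by seeded random sampling.
    normal = [r for r in kept if r["labels"] == ["No Finding"]]
    if len(normal) > max_no_finding:
        chosen = random.Random(seed).sample(normal, max_no_finding)
        chosen_images = {r["image"] for r in chosen}
        kept = [
            r for r in kept
            if r["labels"] != ["No Finding"] or r["image"] in chosen_images
        ]
    return kept


# Made-up records for illustration only.
records = [
    {"image": "a.png", "patient_id": 1, "age": 40, "labels": ["No Finding"]},
    {"image": "b.png", "patient_id": 1, "age": 40, "labels": ["Edema", "Effusion"]},
    {"image": "c.png", "patient_id": 2, "age": 12, "labels": ["Pneumonia"]},
    {"image": "d.png", "patient_id": 3, "age": 55, "labels": ["Hernia"]},
    {"image": "e.png", "patient_id": 4, "age": 30, "labels": ["No Finding"]},
    {"image": "f.png", "patient_id": 5, "age": 61, "labels": ["No Finding"]},
]
result = filter_records(records, max_no_finding=1)
```

With these toy records, `a.png` is dropped because patient 1 has two diseases, `c.png` because of age, `d.png` because of an unselected label, and one of `e.png` or `f.png` is dropped by the cap.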

Introduction to the custom data

  • The custom dataset has 10 labels: 9 diseases plus "No Finding"

  • It contains 29,884 images (format: PNG, 1024x1024)

  • 11,016 images are AP view

  • 18,868 images are PA view

  • The images come from 14,939 patient IDs

    Finding Labels    Count
    No Finding         9526
    Edema              1131
    Atelectasis        7028
    Pneumonia           690
    Effusion           7320
    Pneumothorax       3501
    Emphysema          1667
    Consolidation      2543
    Fibrosis           1022
    Cardiomegaly       1904

📌 PC Dataset

Original Data

=> PadChest Dataset

License

 Creative Commons Attribution-ShareAlike 4.0 International License

Custom Data

=> PC Dataset

Modifications to original data

  • We keep only the following labels: No Finding, Edema, Atelectasis, Pneumonia, Effusion, Pneumothorax, Emphysema, Consolidation, Fibrosis, and Cardiomegaly (the last appears in the label counts of the custom data).

  • To reduce class imbalance, we sampled only 10,000 "No Finding" cases.

  • In the PadChest dataset, the labels generated by the RNN sometimes indicate diseases incorrectly. Therefore, for any patient with multiple disease labels, images labeled "No Finding" are removed.

  • Because lung structure differs between children and adults, images of patients under 16 years old were excluded.

Introduction to the custom data

  • The custom dataset has 10 labels: 9 diseases plus "No Finding"

  • It contains 29,835 images (format: PNG, 224x224)

  • 1,711 images are AP view

  • 28,124 images are PA view

  • The images come from 23,305 patient IDs

    Finding Labels    Count
    No Finding         9925
    Edema               458
    Atelectasis        4420
    Pneumonia          3625
    Effusion           4165
    Pneumothorax        249
    Emphysema           962
    Consolidation      1032
    Fibrosis            683
    Cardiomegaly       8722

image

The hierarchical relationships between lung diseases

=> The image below is a diagram of hierarchical medical data.

hier

image

image

Data Preprocessing

image

Data Augmentation

image

Split Train/Test dataset

Fixed, separate sets of cases for training and testing ensure that all researchers evaluate on the same cases. The test set should contain cases of varying difficulty so that a method is tested thoroughly. The data were split into a training set and a test set based on the lung disease labels.

The training set contains 47,775 images (80% of the total).

The test set contains 11,944 images (20% of the total).
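One way to obtain an 80/20 split that preserves the label proportions is a stratified split, sketched below. This is an assumption about what "based on the lung disease labels" means, not the authors' procedure: the helper `stratified_split`, the fixed seed, and the single-label toy samples are hypothetical. Because real images can carry several labels and one patient can have several images, an actual split may need a multi-label or patient-level strategy instead.

```python
import random
from collections import defaultdict


def stratified_split(items, label_of, test_fraction=0.2, seed=0):
    """Split items into (train, test), keeping each label's share roughly equal."""
    # Group the items by their label.
    by_label = defaultdict(list)
    for item in items:
        by_label[label_of(item)].append(item)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Send the requested fraction of each label group to the test set.
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Made-up samples: 25 "Effusion" and 75 "No Finding" images.
samples = [
    (f"img_{i}.png", "Effusion" if i % 4 == 0 else "No Finding")
    for i in range(100)
]
train, test = stratified_split(samples, label_of=lambda s: s[1])
```

Fixing the seed keeps the split identical across runs, which matches the stated goal that everyone trains and tests on the same cases.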

image

Hybrid Model

image

Hierarchical model

image
