Chest X-Ray Lung Disease Classification with Grad-CAM Localization
=> This task involves classifying lung diseases using chest X-ray images and highlighting the affected regions using Grad-CAM (Gradient-weighted Class Activation Mapping).
The detection and classification of diseases from Chest X-ray (CXR) images is a crucial task in medical diagnostics, enabling timely and accurate identification of various thoracic conditions. This project aims to leverage advanced machine learning and deep learning techniques to detect and classify distinct diseases from CXR images.
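As a rough illustration of the Grad-CAM step mentioned above, the sketch below combines feature maps and their gradients into a heatmap. It assumes the activations and gradients have already been extracted from the last convolutional layer of a trained classifier for the class of interest. The function name `grad_cam` and the nested-list layout are hypothetical choices for this example, not part of this project's code. The final scaling to [0, 1] is a common convenience for overlaying the map on the X-ray, not part of the Grad-CAM definition itself.

```python
from typing import List

FeatureMap = List[List[float]]  # one H x W channel


def grad_cam(activations: List[FeatureMap], gradients: List[FeatureMap]) -> FeatureMap:
    """Combine per-channel activations and class-score gradients into a heatmap."""
    height, width = len(activations[0]), len(activations[0][0])

    # Channel weights: global average of each channel's gradient map.
    weights = [
        sum(sum(row) for row in grad) / (height * width)
        for grad in gradients
    ]

    # Weighted sum over channels, followed by ReLU to keep positive evidence only.
    heatmap = [
        [
            max(0.0, sum(w * act[i][j] for w, act in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Optional scaling to [0, 1] so the map can be overlaid on the image.
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[value / peak for value in row] for row in heatmap]
    return heatmap


# Toy example: two 2x2 channels, one with positive and one with negative gradients.
toy_activations = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
toy_gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(toy_activations, toy_gradients))  # [[0.5, 0.0], [0.0, 1.0]]
```

In practice the heatmap would be upsampled to the input resolution before being drawn over the chest X-ray.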
Original Data
- Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. IEEE CVPR 2017, ChestX-ray8_Hospital-Scale_Chest_CVPR_2017_paper.pdf
- NIH News release: NIH Clinical Center provides one of the largest publicly available chest x-ray datasets to scientific community
- Original source files and documents: https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/36938765345
Custom Data
- Only the following labels were selected: No Finding, Edema, Atelectasis, Pneumonia, Effusion, Pneumothorax, Emphysema, Consolidation, Fibrosis
- To address data imbalance, only 10,000 cases of "No Finding" were selected.
- In the given NIH dataset, the labels generated by the RNN sometimes incorrectly indicate diseases. Therefore, if a patient has multiple diseases, any instance labeled as "No Finding" is removed.
- Because lung structure differs between children and adults, images of children and adolescents under the age of 16 were not included.
- There are only 10 diseases in the custom dataset.
- There are 29,884 images (format: png, 1024x1024) in the custom dataset.
- There are 11,016 AP-view images in the custom dataset.
- There are 18,868 PA-view images in the custom dataset.
- There are 14,939 patient IDs in the custom dataset.

| Finding Labels | Count |
| --- | --- |
| No Finding | 9526 |
| Edema | 1131 |
| Atelectasis | 7028 |
| Pneumonia | 690 |
| Effusion | 7320 |
| Pneumothorax | 3501 |
| Emphysema | 1667 |
| Consolidation | 2543 |
| Fibrosis | 1022 |
| Cardiomegaly | 1904 |
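The selection rules above could be sketched roughly as follows. This is one possible reading, not the authors' actual preprocessing script. It assumes each record is a dict with the keys `Finding Labels` (pipe-separated), `Patient Age`, and `Patient ID`, takes the kept labels from the count table, drops any image carrying a label outside that set, and interprets the "No Finding" rule as removing "No Finding" images of patients who also have disease-labeled images. The function name `filter_records` and the random sampling of "No Finding" cases are assumptions for illustration.

```python
import random
from typing import Dict, List

# Labels taken from the count table of the custom dataset.
KEPT_LABELS = {
    "No Finding", "Edema", "Atelectasis", "Pneumonia", "Effusion",
    "Pneumothorax", "Emphysema", "Consolidation", "Fibrosis", "Cardiomegaly",
}


def filter_records(records: List[Dict], max_no_finding: int = 10_000,
                   min_age: int = 16, seed: int = 0) -> List[Dict]:
    """Apply the label, age, and "No Finding" rules to a list of image records."""
    kept = []
    for rec in records:
        labels = set(rec["Finding Labels"].split("|"))
        if rec["Patient Age"] < min_age:
            continue  # exclude patients under 16
        if not labels <= KEPT_LABELS:
            continue  # assumption: drop images with any label outside the kept set
        kept.append(rec)

    disease_rows = [r for r in kept if r["Finding Labels"] != "No Finding"]
    diseased_patients = {r["Patient ID"] for r in disease_rows}

    # Assumption: "No Finding" images of patients with disease images are removed.
    normal_rows = [
        r for r in kept
        if r["Finding Labels"] == "No Finding"
        and r["Patient ID"] not in diseased_patients
    ]

    # Cap the "No Finding" class to reduce imbalance (sampling method is assumed).
    rng = random.Random(seed)
    rng.shuffle(normal_rows)
    return disease_rows + normal_rows[:max_no_finding]
```

A fixed seed keeps the sampled "No Finding" subset reproducible between runs.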
Original Data
Creative Commons Attribution-ShareAlike 4.0 International License
Custom Data
=> PC Dataset
- Only the following labels were selected: No Finding, Edema, Atelectasis, Pneumonia, Effusion, Pneumothorax, Emphysema, Consolidation, Fibrosis
- To address data imbalance, only 10,000 cases of "No Finding" were selected.
- In the given NIH dataset, the labels generated by the RNN sometimes incorrectly indicate diseases. Therefore, if a patient has multiple diseases, any instance labeled as "No Finding" is removed.
- Because lung structure differs between children and adults, images of children and adolescents under the age of 16 were not included.
- There are only 10 diseases in the custom dataset.
- There are 29,835 images (format: png, 224x224) in the custom dataset.
- There are 1,711 AP-view images in the custom dataset.
- There are 28,124 PA-view images in the custom dataset.
- There are 23,305 patient IDs in the custom dataset.

| Finding Labels | Count |
| --- | --- |
| No Finding | 9925 |
| Edema | 458 |
| Atelectasis | 4420 |
| Pneumonia | 3625 |
| Effusion | 4165 |
| Pneumothorax | 249 |
| Emphysema | 962 |
| Consolidation | 1032 |
| Fibrosis | 683 |
| Cardiomegaly | 8722 |
=> The image below is a diagram of hierarchical medical data.
Separate sets of cases for training and testing algorithms are important for ensuring that all researchers use the same cases for these tasks. In particular, the test set should contain cases of varying difficulty so that a method is tested thoroughly. The data were split into a training set and a testing set based on the lung disease labels.
There are 47,775 images in the train set (80% of the total).
There are 11,944 images in the test set (20% of the total).
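The two split sizes add up to 47,775 + 11,944 = 59,719 images, which matches the sum of the two custom datasets (29,884 + 29,835). A label-based 80/20 split could look roughly like the sketch below. It reads "based on the lung diseases" as a stratified split, which is an assumption. It also assumes one label per image and does not group images by patient, so it ignores the multi-label and patient-overlap handling a real pipeline might need. The function name `stratified_split` is hypothetical.

```python
import random
from collections import defaultdict
from typing import List, Tuple

Item = Tuple[str, str]  # (image_id, label)


def stratified_split(items: List[Item], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[Item], List[Item]]:
    """Split items so that each label contributes about test_fraction to the test set."""
    by_label = defaultdict(list)
    for image_id, label in items:
        by_label[label].append(image_id)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):  # sorted for a reproducible order
        ids = by_label[label]
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test += [(image_id, label) for image_id in ids[:n_test]]
        train += [(image_id, label) for image_id in ids[n_test:]]
    return train, test
```

Splitting within each label keeps rare classes such as Pneumothorax or Edema represented in both sets instead of leaving their share to chance.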








