
UDIP-FA: Unsupervised Deep Representation Learning of Fractional Anisotropy Maps


This repository contains the complete analysis pipeline for the study "Unveiling genetic architecture of white matter microstructure through unsupervised deep representation learning of fractional anisotropy maps".

Figure 1

📋 Table of Contents

  • 🔬 Overview
  • 🛠 Installation
  • 📁 Repository Structure
  • 🧠 UDIP-FA Model Usage
  • 🧬 GWAS & Post-Analysis
  • 🔄 Reproducibility
  • 📚 Citation
  • 💬 Contact

🔬 Overview

This study introduces UDIP-FA (Unsupervised Deep Image Phenotyping of Fractional Anisotropy), a novel deep learning approach for analyzing white matter microstructure in brain imaging data. The pipeline includes:

  • Deep representation learning of FA maps using customized 3D AutoEncoders.
  • Genome-wide association studies (GWAS) on learned endophenotypes.
  • Polygenic risk score (PRS) associations with brain disorders.
  • Network-based drug targeting analysis.

🛠 Installation

Prerequisites

  • Python 3.8 or higher
  • R 4.0 or higher
  • Git

Python Dependencies

We recommend using a virtual environment (conda or venv).

# Create and activate environment
conda create -n udip-fa python=3.8
conda activate udip-fa

# Install dependencies from requirements.txt
pip install -r requirements.txt

Note: Ensure you have a compatible PyTorch version for your CUDA driver installed.

R Dependencies

install.packages(c("ggplot2", "dplyr", "tidyr", "data.table", 
                   "ComplexHeatmap", "circlize", "RColorBrewer",
                   "cowplot", "ggpubr", "pheatmap"))

# Bioconductor packages
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("clusterProfiler", "org.Hs.eg.db", "DOSE"))

πŸ“ Repository Structure

UDIP-FA/
├── Model/                     # Deep Learning Model & Scripts
│   ├── model.py               # AutoEncoder Architecture (PyTorch)
│   ├── dataset.py             # Dataset Loading Logic
│   ├── Train.py               # Training Script (PyTorch Lightning)
│   ├── inference.py           # Inference Script for generating embeddings
│   └── model_compare.py       # Analysis & Visualization scripts
├── FA_GWAS_all.ipynb          # Main GWAS Analysis Notebook
├── FA_all.R                   # Post-GWAS Analysis (R)
├── FA_network_drug_analysis.R # Network & Drug Analysis (R)
├── requirements.txt           # Python Project Dependencies
└── README.md                  # Project Documentation

🧠 UDIP-FA Model Usage

The deep learning model is located in the Model/ directory.

Data Preparation

Input data should be affine-registered MRI images in NIfTI format. Prepare a CSV file containing the paths to your images under a column named mri_names (or specify a different column name during inference).
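
Building such a manifest can be done with the standard library alone. The directory layout and `.nii.gz` suffix below are illustrative, not requirements of the pipeline:

```python
import csv
from pathlib import Path

def write_manifest(image_dir, out_csv, column="mri_names"):
    """Write a one-column CSV listing NIfTI files found under image_dir."""
    paths = sorted(str(p) for p in Path(image_dir).glob("*.nii.gz"))
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([column])           # header row the scripts look up
        writer.writerows([p] for p in paths)
    return len(paths)
```

The header must match the column name you pass via --modality_col (training) or the column expected by inference.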

Training

To train the AutoEncoder from scratch:

python Model/Train.py \
  --train_csv /path/to/train.csv \
  --val_csv /path/to/val.csv \
  --modality_col T1_unbiased_linear \
  --output_dir ./runs/udip_fa \
  --batch_size 9 \
  --max_epochs 60 \
  --gpus 0

For multi-GPU training, pass multiple device ids, for example --gpus 0 1 2 3. To run on CPU, pass --gpus without any ids.

Common arguments:

  • --train_csv: CSV file for training samples.
  • --val_csv: CSV file for validation samples.
  • --modality_col: Column containing image paths.
  • --output_dir: Directory for checkpoints and logs.
  • --learning_rate: Learning rate for the optimizer.
  • --seed: Random seed used by PyTorch Lightning.

Inference

To generate latent representations (endophenotypes) from a trained model:

python Model/inference.py --input_csv /path/to/data.csv \
                          --checkpoint /path/to/model.ckpt \
                          --output_dir /path/to/results

Common Arguments:

  • --input_csv: Path to CSV file with image paths.
  • --checkpoint: Path to the .ckpt model file.
  • --output_dir: Folder to save the output pickle files.
  • --device: cuda:0 or cpu.
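
The internal structure of the output pickles is not documented here; assuming each file stores a mapping from image identifier to embedding vector, collecting them might look like this (the `{id: vector}` layout is a hypothetical example):

```python
import pickle
from pathlib import Path

def load_embeddings(results_dir):
    """Merge every pickle in results_dir into one {id: vector} dict.

    Assumes each pickle holds a dict mapping an image/subject identifier
    to its latent vector; adjust to the actual structure of your files.
    """
    merged = {}
    for pkl in sorted(Path(results_dir).glob("*.pkl")):
        with open(pkl, "rb") as f:
            merged.update(pickle.load(f))
    return merged
```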

Analysis

To analyze significant SNPs and feature correlations:

python Model/model_compare.py

This script includes functions to:

  1. Plot significant SNPs across different thresholds.
  2. Compute and visualize pairwise correlations (CCA, Pearson) between feature sets.
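
As an illustration of the pairwise-correlation step, a generic Pearson correlation between two feature sets can be sketched in plain Python (this is not the project's own implementation, which likely uses NumPy/SciPy):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length feature vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cross_correlations(feats_a, feats_b):
    """Matrix of pairwise correlations between two lists of feature vectors."""
    return [[pearson(a, b) for b in feats_b] for a in feats_a]
```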

🧬 GWAS & Post-Analysis

The repository includes comprehensive scripts for the genetic analysis stages:

FA_GWAS_all.ipynb

This Jupyter notebook serves as the main entry point for the genetic analysis, covering:

  • UDIP-FA feature association analyses: Correlating deep learning features with genetic variants.
  • Polygenic Risk Score (PRS) associations: Investigating links between learned features and brain disorders.
  • Model Explainability: Interpretability assessments of the autoencoder features.
  • Comparative Analysis: Benchmarking against previous white matter studies.
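
A toy illustration of one step the association analyses rely on, the conventional genome-wide significance filter (p < 5×10⁻⁸) applied to summary-statistic rows; the column names here are hypothetical:

```python
GENOME_WIDE_SIG = 5e-8  # conventional genome-wide significance threshold

def significant_hits(rows, p_col="p", threshold=GENOME_WIDE_SIG):
    """Return summary-statistic rows whose p-value passes the threshold."""
    return [r for r in rows if float(r[p_col]) < threshold]
```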

FA_all.R

R script dedicated to post-GWAS statistical processing:

  • Result Aggregation: Filtering and summarizing GWAS statistics.
  • Figure Generation: Producing publication-ready plots (Manhattan plots, QQ plots).
  • Meta-analysis: Effect size calculations and statistical validation.

Run as a standalone script from the repository root:

Rscript FA_all.R

The script validates required packages and catches missing input files early, but it still expects the project-specific intermediate result files referenced in the analysis sections to exist.

FA_network_drug_analysis.R

Advanced network analysis for biological insights:

  • Gene-Drug Interaction: Constructing networks to identify potential drug targets.
  • Therapeutic Targets: Highlighting genes actionable by existing drugs.
  • Mechanism of Action: Pathway analysis to understand underlying biological mechanisms.

Run the default workflow:

Rscript FA_network_drug_analysis.R

The network script exposes a single run_fa_network_drug_analysis() entry point and groups file paths in a default config object for easier review and reuse.

🔄 Reproducibility

Pre-trained Models

The pretrained model can be accessed at this Google Drive Link.

Random Seeds

  • Training: python Model/Train.py --seed 42 ...
  • Python inference scripts use deterministic file ordering from the input CSV.
  • R: set.seed(42)
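
For downstream Python analysis outside the Lightning scripts, the same seed can be applied explicitly. A minimal stdlib sketch (the project's scripts may additionally seed NumPy and PyTorch):

```python
import random

def seeded_shuffle(items, seed=42):
    """Deterministically shuffle a copy of items: same seed, same order."""
    rng = random.Random(seed)
    out = list(items)
    rng.shuffle(out)
    return out
```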

Practical Notes

  • Input MRI volumes are z-scored using non-zero voxels only; empty or zero-variance images are handled safely.
  • Training and inference both validate required input columns and file paths before running.
  • Checkpoints, TensorBoard logs, and CSV logs are written under the directory passed to --output_dir.
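
The non-zero-voxel z-scoring described above can be sketched as follows (a plain-Python illustration of the idea; the actual code operates on NumPy/PyTorch volumes):

```python
from math import sqrt

def zscore_nonzero(voxels):
    """Z-score a flattened volume using only its non-zero voxels.

    Zero (background) voxels stay zero; empty or zero-variance
    volumes are returned unchanged rather than dividing by zero.
    """
    nz = [v for v in voxels if v != 0]
    if not nz:
        return list(voxels)          # empty image: nothing to normalize
    mean = sum(nz) / len(nz)
    var = sum((v - mean) ** 2 for v in nz) / len(nz)
    if var == 0:
        return list(voxels)          # zero-variance image: leave as-is
    std = sqrt(var)
    return [0 if v == 0 else (v - mean) / std for v in voxels]
```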

📚 Citation

If you use this code in your research, please cite:

@article{zhao2025udip,
  title={Unveiling genetic architecture of white matter microstructure through unsupervised deep representation learning of fractional anisotropy maps},
  author={Zhao, Xingzhong and Xie, Ziqian and He, Wei and Fornage, Myriam and Zhi, Degui},
  journal={medRxiv},
  year={2025},
  doi={10.1101/2025.07.04.25330856}
}

💬 Contact


Keywords: white matter, fractional anisotropy, deep learning, GWAS, neuroimaging, brain imaging, genetics, biomarker
