This repository contains the complete analysis pipeline for the study "Unveiling genetic architecture of white matter microstructure through unsupervised deep representation learning of fractional anisotropy maps".
- Overview
- Installation
- Repository Structure
- UDIP-FA Model Usage
- GWAS & Post-Analysis
- Reproducibility
- Citation
- Contact
This study introduces UDIP-FA (Unsupervised Deep Image Phenotyping of Fractional Anisotropy), a novel deep learning approach for analyzing white matter microstructure in brain imaging data. The pipeline includes:
- Deep representation learning of FA maps using customized 3D AutoEncoders.
- Genome-wide association studies (GWAS) on learned endophenotypes.
- Polygenic risk score (PRS) associations with brain disorders.
- Network-based drug targeting analysis.
- Python 3.8 or higher
- R 4.0 or higher
- Git
We recommend using a virtual environment (conda or venv).
# Create and activate environment
conda create -n udip-fa python=3.8
conda activate udip-fa
# Install dependencies from requirements.txt
pip install -r requirements.txt
Note: Ensure you have a compatible PyTorch version for your CUDA driver installed.
# CRAN packages
install.packages(c("ggplot2", "dplyr", "tidyr", "data.table",
                   "circlize", "RColorBrewer",
                   "cowplot", "ggpubr", "pheatmap"))
# Bioconductor packages (ComplexHeatmap is distributed through Bioconductor, not CRAN)
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("ComplexHeatmap", "clusterProfiler", "org.Hs.eg.db", "DOSE"))
UDIP-FA/
├── Model/                          # Deep Learning Model & Scripts
│   ├── model.py                    # AutoEncoder Architecture (PyTorch)
│   ├── dataset.py                  # Dataset Loading Logic
│   ├── Train.py                    # Training Script (PyTorch Lightning)
│   ├── inference.py                # Inference Script for generating embeddings
│   └── model_compare.py            # Analysis & Visualization scripts
├── FA_GWAS_all.ipynb               # Main GWAS Analysis Notebook
├── FA_all.R                        # Post-GWAS Analysis (R)
├── FA_network_drug_analysis.R      # Network & Drug Analysis (R)
├── requirements.txt                # Python Project Dependencies
└── README.md                       # Project Documentation
The deep learning model is located in the Model/ directory.
Input data should be affine-registered MRI images in NIfTI format.
Prepare a CSV file containing the paths to your images under a column named mri_names (or specify your column name during inference).
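One way to prepare such a CSV is sketched below. It is a minimal example, not part of the repository: the helper name `write_image_csv` and the directory layout are assumptions, and only the `mri_names` column name comes from the text above.

```python
import csv
from pathlib import Path


def write_image_csv(image_dir, csv_path, column="mri_names"):
    """Write a one-column CSV listing every NIfTI file found under image_dir.

    Hypothetical helper: the function name and directory layout are
    illustrative; only the default column name comes from this README.
    """
    image_dir = Path(image_dir)
    # Accept compressed and uncompressed NIfTI; sort for a stable row order.
    paths = sorted(
        p for p in image_dir.rglob("*")
        if p.is_file() and p.name.endswith((".nii", ".nii.gz"))
    )
    if not paths:
        raise FileNotFoundError(f"No NIfTI files found under {image_dir}")
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([column])  # header row
        for path in paths:
            writer.writerow([str(path.resolve())])  # absolute paths
    return len(paths)
```

Calling `write_image_csv("/path/to/images", "/path/to/data.csv")` returns the number of images listed. Absolute paths are written so the CSV stays valid regardless of the working directory the scripts are launched from.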
To train the AutoEncoder from scratch:
python Model/Train.py \
--train_csv /path/to/train.csv \
--val_csv /path/to/val.csv \
--modality_col T1_unbiased_linear \
--output_dir ./runs/udip_fa \
--batch_size 9 \
--max_epochs 60 \
--gpus 0
For multi-GPU training, pass multiple device ids, for example --gpus 0 1 2 3.
To run on CPU, pass --gpus without any ids.
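The `--gpus` behaviour described above (several ids, one id, or no ids for CPU) can be illustrated with a small `argparse` sketch. This is an assumption about how such a flag might be parsed, not the code in `Train.py`; the function name `parse_gpus` and the returned dictionary, which follows the `accelerator`/`devices` keyword style of recent PyTorch Lightning `Trainer` versions, are illustrative.

```python
import argparse


def parse_gpus(argv):
    """Parse a --gpus flag that accepts zero or more integer device ids.

    Hypothetical sketch: Train.py may implement this differently.
    """
    parser = argparse.ArgumentParser()
    # nargs="*" lets the flag appear with no ids, treated here as a CPU run.
    parser.add_argument("--gpus", type=int, nargs="*", default=[])
    args = parser.parse_args(argv)
    if args.gpus:
        return {"accelerator": "gpu", "devices": args.gpus}
    return {"accelerator": "cpu", "devices": 1}


print(parse_gpus(["--gpus", "0", "1", "2", "3"]))
print(parse_gpus(["--gpus"]))
```

In this sketch, omitting the flag entirely also falls back to CPU; whether the training script does the same should be checked against its own argument definitions.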
Common arguments:
- --train_csv: CSV file for training samples.
- --val_csv: CSV file for validation samples.
- --modality_col: Column containing image paths.
- --output_dir: Directory for checkpoints and logs.
- --learning_rate: Learning rate for the optimizer.
- --seed: Random seed used by PyTorch Lightning.
To generate latent representations (endophenotypes) from a trained model:
python Model/inference.py --input_csv /path/to/data.csv \
--checkpoint /path/to/model.ckpt \
--output_dir /path/to/results
Common arguments:
- --input_csv: Path to CSV file with image paths.
- --checkpoint: Path to the .ckpt model file.
- --output_dir: Folder to save the output pickle files.
- --device: cuda:0 or cpu.
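Before launching a long inference run, it can help to check that the input CSV has the expected column and that every listed image exists. The following is a minimal sketch of such a check; the helper name `check_input_csv` is hypothetical and is not taken from `inference.py`.

```python
import csv
from pathlib import Path


def check_input_csv(csv_path, column="mri_names"):
    """Return the image paths listed in `column`, raising if anything is missing.

    Hypothetical pre-flight check, independent of the repository's own validation.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"Column {column!r} not found in {csv_path}")
        # Short rows yield None for missing fields; normalize them to "".
        paths = [(row[column] or "").strip() for row in reader]
    missing = [p for p in paths if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(
            f"{len(missing)} listed file(s) do not exist, e.g. {missing[0]!r}"
        )
    return paths
```

The returned list preserves the row order of the CSV, so it can be matched one-to-one against per-image outputs.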
For performing analysis on significant SNPs and feature correlations:
python Model/model_compare.py
This script includes functions to:
- Plot significant SNPs across different thresholds.
- Compute and visualize pairwise correlations (CCA, Pearson) between feature sets.
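As an illustration of the Pearson part of the comparison above, the sketch below computes the correlation between every feature in one set and every feature in another. It is not the implementation in `model_compare.py`; the function name `cross_pearson` is hypothetical, and CCA is not covered.

```python
import numpy as np


def cross_pearson(a, b):
    """Pearson correlation between every column of `a` and every column of `b`.

    a: (n_samples, p) array, b: (n_samples, q) array -> (p, q) matrix.
    Hypothetical helper; constant columns produce NaN entries.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape[0] != b.shape[0]:
        raise ValueError("Both feature sets must describe the same samples")
    # Standardize each column (population std), then average the products.
    a_z = (a - a.mean(axis=0)) / a.std(axis=0)
    b_z = (b - b.mean(axis=0)) / b.std(axis=0)
    return a_z.T @ b_z / a.shape[0]
```

The resulting matrix can be passed directly to a heatmap function. Rows of the two inputs must refer to the same subjects in the same order.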
The repository includes comprehensive scripts for the genetic analysis stages:
The FA_GWAS_all.ipynb Jupyter notebook serves as the main entry point for the genetic analysis, covering:
- UDIP-FA feature association analyses: Correlating deep learning features with genetic variants.
- Polygenic Risk Score (PRS) associations: Investigating links between learned features and brain disorders.
- Model Explainability: Interpretability assessments of the autoencoder features.
- Comparative Analysis: Benchmarking against previous white matter studies.
FA_all.R is an R script dedicated to post-GWAS statistical processing:
- Result Aggregation: Filtering and summarizing GWAS statistics.
- Figure Generation: Producing publication-ready plots (Manhattan plots, QQ plots).
- Meta-analysis: Effect size calculations and statistical validation.
Run as a standalone script from the repository root:
Rscript FA_all.R
The script validates required packages and checks for missing input files up front, but it still expects the project-specific intermediate result files referenced in the analysis sections to exist.
FA_network_drug_analysis.R performs network analysis for biological insights:
- Gene-Drug Interaction: Constructing networks to identify potential drug targets.
- Therapeutic Targets: Highlighting genes actionable by existing drugs.
- Mechanism of Action: Pathway analysis to understand underlying biological mechanisms.
Run the default workflow:
Rscript FA_network_drug_analysis.R
The network script exposes a single run_fa_network_drug_analysis() entry point and groups file paths in a default config object for easier review and reuse.
The pretrained model can be accessed at this Google Drive Link.
- Training: python Model/Train.py --seed 42 ...
- Python inference scripts use deterministic file ordering from the input CSV.
- R: set.seed(42)
- Input MRI volumes are z-scored using non-zero voxels only; empty or zero-variance images are handled safely.
- Training and inference both validate required input columns and file paths before running.
- Checkpoints, TensorBoard logs, and CSV logs are written under the directory passed to --output_dir.
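The z-scoring over non-zero voxels mentioned in the list above can be sketched as follows. This is an illustrative version, not the code in `dataset.py`: the function name `zscore_nonzero`, the `eps` threshold, and the choice to return an all-zero volume for empty or zero-variance inputs are assumptions about what "handled safely" could mean.

```python
import numpy as np


def zscore_nonzero(volume, eps=1e-8):
    """Z-score a volume using statistics from its non-zero voxels only.

    Hypothetical sketch; background (zero) voxels are left at zero.
    """
    volume = np.asarray(volume, dtype=np.float32)
    mask = volume != 0
    out = np.zeros_like(volume)
    if not mask.any():
        return out  # empty image: nothing to normalize
    values = volume[mask]
    std = values.std()
    if std < eps:
        return out  # zero-variance foreground: avoid dividing by ~0
    out[mask] = (values - values.mean()) / std
    return out
```

Restricting the statistics to non-zero voxels keeps the large zero-valued background outside the brain from dominating the mean and standard deviation.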
If you use this code in your research, please cite:
@article{zhao2025udip,
title={Unveiling genetic architecture of white matter microstructure through unsupervised deep representation learning of fractional anisotropy maps},
author={Zhao, Xingzhong and Xie, Ziqian and He, Wei and Fornage, Myriam and Zhi, Degui},
journal={medRxiv},
year={2025},
doi={10.1101/2025.07.04.25330856}
}
- Xingzhong Zhao - [xingzhong.zhao@uth.tmc.edu]
Keywords: white matter, fractional anisotropy, deep learning, GWAS, neuroimaging, brain imaging, genetics, biomarker
