UDIP-FA: Unsupervised Deep Representation Learning of Fractional Anisotropy Maps

This repository contains the complete analysis pipeline for the study "Unveiling genetic architecture of white matter microstructure through unsupervised deep representation learning of fractional anisotropy maps".

📋 Table of Contents

Overview
Installation
Repository Structure
UDIP-FA Model Usage
GWAS & Post-Analysis
Reproducibility
Citation
Contact

🔬 Overview

This study introduces UDIP-FA (Unsupervised Deep Image Phenotyping of Fractional Anisotropy), a novel deep learning approach for analyzing white matter microstructure in brain imaging data. The pipeline includes:

Deep representation learning of FA maps using customized 3D AutoEncoders.
Genome-wide association studies (GWAS) on learned endophenotypes.
Polygenic risk score (PRS) associations with brain disorders.
Network-based drug targeting analysis.

🛠 Installation

Prerequisites

Python 3.8 or higher
R 4.0 or higher
Git

Python Dependencies

We recommend using a virtual environment (conda or venv).

# Create and activate environment
conda create -n udip-fa python=3.8
conda activate udip-fa

# Install dependencies from requirements.txt
pip install -r requirements.txt

Note: Make sure the installed PyTorch build is compatible with your CUDA driver.
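A quick way to check the installed build against the local driver is a short script like the one below. It is a minimal sketch: the helper name describe_torch_setup is hypothetical and not part of this repository, and it only reports what PyTorch itself exposes (torch.__version__, torch.version.cuda, and torch.cuda.is_available()).

```python
def describe_torch_setup():
    """Return a one-line summary of the local PyTorch/CUDA setup."""
    try:
        import torch  # imported lazily so the check also works before installation
    except ImportError:
        return "PyTorch is not installed"
    cuda_build = torch.version.cuda  # None for CPU-only builds
    available = torch.cuda.is_available()
    return f"torch {torch.__version__}, CUDA build {cuda_build}, GPU available: {available}"


print(describe_torch_setup())
```

If the summary reports a CUDA build but no available GPU, the driver and the PyTorch build are a likely mismatch.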

R Dependencies

# CRAN packages
install.packages(c("ggplot2", "dplyr", "tidyr", "data.table",
                   "circlize", "RColorBrewer",
                   "cowplot", "ggpubr", "pheatmap"))

# Bioconductor packages (ComplexHeatmap is distributed via Bioconductor, not CRAN)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("ComplexHeatmap", "clusterProfiler", "org.Hs.eg.db", "DOSE"))

📁 Repository Structure

UDIP-FA/
├── Model/                     # Deep Learning Model & Scripts
│   ├── model.py               # AutoEncoder Architecture (PyTorch)
│   ├── dataset.py             # Dataset Loading Logic
│   ├── Train.py               # Training Script (PyTorch Lightning)
│   ├── inference.py           # Inference Script for generating embeddings
│   └── model_compare.py       # Analysis & Visualization scripts
├── FA_GWAS_all.ipynb          # Main GWAS Analysis Notebook
├── FA_all.R                   # Post-GWAS Analysis (R)
├── FA_network_drug_analysis.R # Network & Drug Analysis (R)
├── requirements.txt           # Python Project Dependencies
└── README.md                  # Project Documentation

🧠 UDIP-FA Model Usage

The deep learning model is located in the Model/ directory.

Data Preparation

Input data should be affine-registered MRI images in NIfTI format. Prepare a CSV file that lists the image paths in a column named mri_names (or specify your own column name during inference).
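As an illustration of the expected CSV layout, the sketch below collects NIfTI files from a directory and writes their paths to a single column. The function name write_image_csv and the directory layout are assumptions made for this example; only the mri_names column name comes from the description above.

```python
import csv
from pathlib import Path


def write_image_csv(image_dir, csv_path, column="mri_names"):
    """Write one row per NIfTI file found under image_dir; return the row count."""
    root = Path(image_dir)
    # Sort so that the row order is reproducible across runs.
    paths = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.name.endswith((".nii", ".nii.gz"))
    )
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([column])  # header row with the column name
        for p in paths:
            writer.writerow([str(p.resolve())])
    return len(paths)
```

For example, write_image_csv("/path/to/images", "/path/to/data.csv") would produce a file that can be passed to the scripts described in the following sections.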

Training

To train the AutoEncoder from scratch:

python Model/Train.py \
  --train_csv /path/to/train.csv \
  --val_csv /path/to/val.csv \
  --modality_col T1_unbiased_linear \
  --output_dir ./runs/udip_fa \
  --batch_size 9 \
  --max_epochs 60 \
  --gpus 0

For multi-GPU training, pass multiple device ids, for example --gpus 0 1 2 3. To run on CPU, pass --gpus without any ids.

Common arguments:

--train_csv: CSV file for training samples.
--val_csv: CSV file for validation samples.
--modality_col: Column containing image paths.
--output_dir: Directory for checkpoints and logs.
--learning_rate: Learning rate for the optimizer.
--seed: Random seed used by PyTorch Lightning.
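To make the flag semantics concrete, the sketch below shows one way a command line with these options could be parsed using argparse. It is not the parser in Train.py, which may differ: the function names build_parser and device_summary are hypothetical, and every default value (in particular the learning rate) is an assumption.

```python
import argparse


def build_parser():
    """Parser mirroring the training flags documented above (defaults are guesses)."""
    parser = argparse.ArgumentParser(description="Sketch of a Train.py-style CLI")
    parser.add_argument("--train_csv", required=True)
    parser.add_argument("--val_csv", required=True)
    parser.add_argument("--modality_col", default="mri_names")
    parser.add_argument("--output_dir", default="./runs/udip_fa")
    parser.add_argument("--batch_size", type=int, default=9)
    parser.add_argument("--max_epochs", type=int, default=60)
    parser.add_argument("--learning_rate", type=float, default=1e-3)  # assumed default
    parser.add_argument("--seed", type=int, default=42)
    # nargs="*" accepts zero or more ids, so a bare "--gpus" yields an empty list.
    parser.add_argument("--gpus", type=int, nargs="*", default=[0])
    return parser


def device_summary(gpu_ids):
    """Describe the device selection implied by the --gpus value."""
    if not gpu_ids:
        return "cpu"
    return "gpu:" + ",".join(str(i) for i in gpu_ids)
```

With this layout, --gpus 0 1 2 3 parses to a list of four device ids, while --gpus with no ids parses to an empty list, which the sketch interprets as CPU execution.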

Inference

To generate latent representations (endophenotypes) from a trained model:

python Model/inference.py --input_csv /path/to/data.csv \
                          --checkpoint /path/to/model.ckpt \
                          --output_dir /path/to/results

Common arguments:

--input_csv: Path to CSV file with image paths.
--checkpoint: Path to the .ckpt model file.
--output_dir: Folder to save the output pickle files.
--device: cuda:0 or cpu.
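The internal structure of the output pickle files is not described here, so the sketch below only loads them and reports the type of each stored object. The helper names load_pickles and summarize are hypothetical, and the .pkl / .pickle file extensions are an assumption.

```python
import pickle
from pathlib import Path


def load_pickles(output_dir):
    """Load every *.pkl / *.pickle file in output_dir into a dict keyed by file name."""
    results = {}
    for path in sorted(Path(output_dir).iterdir()):
        if path.suffix in {".pkl", ".pickle"}:
            with open(path, "rb") as handle:
                # Only unpickle files you produced yourself; pickle can execute code.
                results[path.name] = pickle.load(handle)
    return results


def summarize(results):
    """Map each file name to the type name of the object it contains."""
    return {name: type(obj).__name__ for name, obj in results.items()}
```

Calling summarize(load_pickles("/path/to/results")) gives a quick overview of what inference wrote before the embeddings are passed on to downstream analysis.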

Analysis

To analyze significant SNPs and feature correlations:

python Model/model_compare.py

This script includes functions to:

Plot significant SNPs across different thresholds.
Compute and visualize pairwise correlations (CCA, Pearson) between feature sets.
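The Pearson part of the pairwise comparison can be sketched with NumPy as follows. This is an illustrative implementation, not the code in model_compare.py: the function name pairwise_pearson is hypothetical, CCA is not covered, and columns with zero variance would produce NaN values.

```python
import numpy as np


def pairwise_pearson(a, b):
    """Return the (p, q) matrix of Pearson correlations between columns of a and b.

    a has shape (n_samples, p) and b has shape (n_samples, q).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape[0] != b.shape[0]:
        raise ValueError("both feature sets must have the same number of samples")
    # Center each column, then scale to unit norm so the dot product is the correlation.
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)
    a_n = a_c / np.linalg.norm(a_c, axis=0)
    b_n = b_c / np.linalg.norm(b_c, axis=0)
    return a_n.T @ b_n
```

Entry (i, j) of the result is the correlation between feature i of the first set and feature j of the second, which is the quantity a correlation heatmap would display.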

🧬 GWAS & Post-Analysis

The repository includes comprehensive scripts for the genetic analysis stages:

`FA_GWAS_all.ipynb`

This Jupyter notebook serves as the main entry point for the genetic analysis, covering:

UDIP-FA feature association analyses: Correlating deep learning features with genetic variants.
Polygenic Risk Score (PRS) associations: Investigating links between learned features and brain disorders.
Model Explainability: Interpretability assessments of the autoencoder features.
Comparative Analysis: Benchmarking against previous white matter studies.

`FA_all.R`

R script dedicated to post-GWAS statistical processing:

Result Aggregation: Filtering and summarizing GWAS statistics.
Figure Generation: Producing publication-ready plots (Manhattan plots, QQ plots).
Meta-analysis: Effect size calculations and statistical validation.

Run as a standalone script from the repository root:

Rscript FA_all.R

The script validates required packages and checks for missing input files up front, but it still expects the project-specific intermediate result files referenced in its analysis sections to exist.
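For readers who want to see what the filtering and QQ-plot steps involve, the sketch below shows the underlying arithmetic in Python. It does not reproduce FA_all.R: the function names and the "snp" and "p" field names are hypothetical, and 5e-8 is used only because it is the conventional genome-wide significance threshold. P-values must be strictly positive.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs for a QQ plot, sorted ascending."""
    observed = sorted(-math.log10(p) for p in p_values)
    n = len(observed)
    # Under the null, the i-th smallest p-value is expected near (i - 0.5) / n.
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


def genome_wide_hits(rows, threshold=5e-8):
    """Keep summary-statistic rows whose p-value is below the threshold."""
    return [row for row in rows if row["p"] < threshold]
```

Points that rise above the diagonal of the QQ plot at the upper end indicate more small p-values than expected under the null.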

`FA_network_drug_analysis.R`

Advanced network analysis for biological insights:

Gene-Drug Interaction: Constructing networks to identify potential drug targets.
Therapeutic Targets: Highlighting genes actionable by existing drugs.
Mechanism of Action: Pathway analysis to understand underlying biological mechanisms.

Run the default workflow:

Rscript FA_network_drug_analysis.R

The network script exposes a single run_fa_network_drug_analysis() entry point and groups file paths in a default config object for easier review and reuse.

🔄 Reproducibility

Pre-trained Models

The pre-trained model can be accessed at this Google Drive link.

Random Seeds

Training: python Model/Train.py --seed 42 ...
Python inference scripts use deterministic file ordering from the input CSV.
R: set.seed(42)
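For custom Python scripts built on top of this pipeline, seeding can be sketched as below. The helper name seed_everything_sketch is hypothetical, and seeding NumPy and PyTorch alongside the standard library is an assumption about what such scripts use. PyTorch Lightning offers its own seed_everything utility; whether Train.py relies on it is not confirmed here.

```python
import random


def seed_everything_sketch(seed=42):
    """Seed the stdlib RNG and, if installed, NumPy and PyTorch; return what was seeded."""
    seeded = ["random"]
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
        seeded.append("numpy")
    except ImportError:
        pass  # NumPy is optional for this sketch
    try:
        import torch
        torch.manual_seed(seed)
        seeded.append("torch")
    except ImportError:
        pass  # PyTorch is optional for this sketch
    return seeded
```

Seeding fixes the random number streams only; GPU kernels can still introduce small run-to-run differences.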

Practical Notes

Input MRI volumes are z-scored using non-zero voxels only; empty or zero-variance images are handled safely.
Training and inference both validate required input columns and file paths before running.
Checkpoints, TensorBoard logs, and CSV logs are written under the directory passed to --output_dir.
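The normalization described in the first note can be sketched as follows. This is an illustration rather than the code in dataset.py: the function name zscore_nonzero and the eps tolerance are hypothetical, and returning an all-zero volume for empty or zero-variance input is one assumed reading of "handled safely".

```python
import numpy as np


def zscore_nonzero(volume, eps=1e-8):
    """Z-score a volume using only its non-zero voxels; background stays at zero."""
    volume = np.asarray(volume, dtype=np.float32)
    mask = volume != 0
    out = np.zeros_like(volume)
    if not mask.any():
        return out  # empty image: nothing to normalize
    values = volume[mask]
    std = values.std()
    if std < eps:
        return out  # zero-variance image: avoid dividing by ~0
    out[mask] = (values - values.mean()) / std
    return out
```

Restricting the statistics to non-zero voxels keeps the large zero background of a registered brain volume from dominating the mean and standard deviation.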

📚 Citation

If you use this code in your research, please cite:

@article{zhao2025udip,
  title={Unveiling genetic architecture of white matter microstructure through unsupervised deep representation learning of fractional anisotropy maps},
  author={Zhao, Xingzhong and Xie, Ziqian and He, Wei and Fornage, Myriam and Zhi, Degui},
  journal={medRxiv},
  year={2025},
  doi={10.1101/2025.07.04.25330856}
}

💬 Contact

Xingzhong Zhao - xingzhong.zhao@uth.tmc.edu

Keywords: white matter, fractional anisotropy, deep learning, GWAS, neuroimaging, brain imaging, genetics, biomarker
