This is the official codebase of the paper "Drug repurposing for rare diseases: A novel knowledge graph explainable approach to identify drug candidates"
All datasets used here are open-source knowledge graphs (KGs) and can be found in the data folder:
Hetionet: https://doi.org/10.7554/eLife.26726
BioKG: https://doi.org/10.1145/3340531.3412776
Oregano: https://doi.org/10.1038/s41597-023-02757-0
If you want to use our work with other KGs, your data file should be a .tsv file with one triple per row: subject \t relation \t object
To process the KG and create the training data for the random forest classifier, use the command:
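A minimal sketch for checking a custom KG file before running the pipeline, using only Python's standard library. It assumes the file has no header row and exactly three tab-separated fields per line; `parse_triples` and `load_kg` are hypothetical helpers, not part of this repository.

```python
import csv
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def parse_triples(lines: Iterable[str]) -> List[Triple]:
    """Parse headerless TSV lines of the form subject<TAB>relation<TAB>object."""
    triples: List[Triple] = []
    # QUOTE_NONE: treat quote characters inside entity names literally
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for lineno, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {lineno}: expected 3 tab-separated fields, got {len(row)}")
        subject, relation, obj = (field.strip() for field in row)
        triples.append((subject, relation, obj))
    return triples


def load_kg(path: str) -> List[Triple]:
    """Read and validate a KG file in the subject/relation/object .tsv layout."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_triples(f)
```

A file that fails here (for example, a row with only two fields) will raise a `ValueError` pointing at the offending line, which is usually easier to debug than an error deep inside feature creation.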
python3 creates_features.py -c config/<dataset>/create_features.yaml
You can also write your own .yaml config file with the following fields:
- kg_path: str, path to your KG
- save_path: str, path for saving the result
- link_name: str, name of the relation (link) you want to predict
- n_cpus: int, number of CPUs to use
- num_neg: int, number of negative triples to generate for each positive triple
- feature_type: one of ["pr", "1h", "2h"]; "pr" uses PageRank node selection, "1h" uses only common 1-hop neighbors, "2h" uses common 2-hop neighbors
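To give an intuition for the "1h"/"2h" feature types and for num_neg, here is a small self-contained sketch of common-neighbor lookup and negative sampling on a toy triple list. It is an illustration under our own assumptions (edges treated as undirected; negatives built by replacing the object with another entity already seen as an object of link_name), not the implementation in creates_features.py. The PageRank-based "pr" selection is not shown, and all function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def build_neighbors(triples: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Map each entity to the set of entities it shares an edge with (direction ignored)."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for subject, _, obj in triples:
        neighbors[subject].add(obj)
        neighbors[obj].add(subject)
    return neighbors


def within_hops(neighbors: Dict[str, Set[str]], node: str, hops: int) -> Set[str]:
    """Entities reachable from `node` in at most `hops` steps (1 or 2), excluding `node`."""
    reached = set(neighbors.get(node, set()))
    if hops == 2:
        for n in list(reached):
            reached |= neighbors.get(n, set())
    reached.discard(node)
    return reached


def common_neighbors(neighbors: Dict[str, Set[str]], head: str, tail: str, hops: int = 1) -> Set[str]:
    """Entities within `hops` of both head and tail ("1h" -> hops=1, "2h" -> hops=2)."""
    if hops not in (1, 2):
        raise ValueError("hops must be 1 or 2")
    shared = within_hops(neighbors, head, hops) & within_hops(neighbors, tail, hops)
    return shared - {head, tail}


def sample_negatives(triples: Iterable[Triple], link_name: str, num_neg: int, seed: int = 0) -> List[Triple]:
    """Corrupt the object of each positive `link_name` triple up to `num_neg` times,
    skipping corruptions that are themselves known positives."""
    rng = random.Random(seed)
    positives = {(s, o) for s, r, o in triples if r == link_name}
    objects = sorted({o for _, o in positives})  # e.g. all diseases seen with the link
    negatives: List[Triple] = []
    for s, _ in sorted(positives):
        options = [c for c in objects if (s, c) not in positives]
        for c in rng.sample(options, min(num_neg, len(options))):
            negatives.append((s, link_name, c))
    return negatives
```

Drawing corrupted objects only from entities already seen with link_name keeps negatives type-consistent (for example drug/disease pairs) without needing an explicit type schema; the trade-off is that entities never observed as objects of that link can never appear in a negative.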
To train the random forest classifier, use:
python3 random_forest.py -c config/<dataset>/random_forest.yaml
You can also write your own .yaml config file with the following fields:
- data_path: str, path to your processed data
- save_path: str, path to save the results
- model_save_path: str, path to save the model
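If you run the pipeline on several KGs, it can help to generate both config files programmatically. The sketch below does so with the standard library only, assuming that data_path in random_forest.yaml should point to the save_path produced by creates_features.py; the directory layout ("features", "rf_results", "rf_model") and the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Union

Value = Union[str, int]
FEATURE_TYPES = {"pr", "1h", "2h"}


def to_yaml(config: Dict[str, Value]) -> str:
    """Serialize a flat dict of str/int values as YAML; JSON-quoted strings are valid YAML scalars."""
    lines = []
    for key, value in config.items():
        rendered = str(value) if isinstance(value, int) else json.dumps(value)
        lines.append(f"{key}: {rendered}")
    return "\n".join(lines) + "\n"


def make_configs(kg_path: str, link_name: str, work_dir: str, feature_type: str = "pr",
                 num_neg: int = 1, n_cpus: int = 4) -> Dict[str, Dict[str, Value]]:
    """Build matching configs for creates_features.py and random_forest.py."""
    if feature_type not in FEATURE_TYPES:
        raise ValueError(f"feature_type must be one of {sorted(FEATURE_TYPES)}")
    work = Path(work_dir)
    features_path = str(work / "features")  # hypothetical layout
    return {
        "create_features.yaml": {
            "kg_path": kg_path, "save_path": features_path, "link_name": link_name,
            "n_cpus": n_cpus, "num_neg": num_neg, "feature_type": feature_type,
        },
        "random_forest.yaml": {
            "data_path": features_path,  # assumed: the output of creates_features.py
            "save_path": str(work / "rf_results"),
            "model_save_path": str(work / "rf_model"),
        },
    }


def write_configs(configs: Dict[str, Dict[str, Value]], out_dir: str) -> None:
    """Write each config as <out_dir>/<file name>."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, config in configs.items():
        (out / name).write_text(to_yaml(config), encoding="utf-8")
```

For example, `write_configs(make_configs("data/my_kg.tsv", "treats", "runs/my_kg"), "config/my_kg")` writes both files (the link name "treats" is only a placeholder), after which you would run `python3 creates_features.py -c config/my_kg/create_features.yaml`.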
You can find all our results on drug repurposing for amyotrophic lateral sclerosis (ALS) here: https://sites.google.com/view/alsdrugrepurposing
If you use this work, please cite our paper: