Overview

GDAP (Gene-Disease Association Prediction) is a machine learning pipeline that predicts novel gene-disease associations using graph embeddings and network analysis.

Purpose

The primary goal of GDAP is to identify potential gene-disease relationships that are not readily apparent from traditional experimental methods. Such predictions are particularly valuable for:

Drug Discovery: Identifying new therapeutic targets
Disease Research: Understanding genetic mechanisms of diseases
Personalized Medicine: Supporting precision medicine approaches

How It Works

GDAP operates through a five-step pipeline:

Data Collection: Gathers disease-gene associations from the Open Targets Platform and protein-protein interactions from the STRING database
Graph Construction: Builds a bipartite graph connecting diseases to genes with weighted edges
Embedding Generation: Creates node representations using graph embedding algorithms
Feature Extraction: Extracts edge features for machine learning
Model Training: Trains binary classifiers to predict novel associations
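
The graph construction, embedding, and feature extraction steps can be illustrated with a minimal sketch. This is not GDAP's actual code: the association scores are made-up toy values, the function names (build_bipartite_graph, degree_embedding, edge_features) are hypothetical, and the Hadamard product is just one common way to combine two node embeddings into an edge feature vector. It assumes networkx and numpy are installed.

```python
import networkx as nx
import numpy as np

# Toy disease-gene associations: (disease, gene, score).
# The scores are invented for illustration, not taken from Open Targets.
associations = [
    ("disease:A", "gene:G1", 0.9),
    ("disease:A", "gene:G2", 0.4),
    ("disease:B", "gene:G2", 0.7),
    ("disease:B", "gene:G3", 0.6),
]


def build_bipartite_graph(assocs):
    """Build a weighted bipartite graph linking diseases to genes."""
    graph = nx.Graph()
    for disease, gene, score in assocs:
        graph.add_node(disease, bipartite=0)
        graph.add_node(gene, bipartite=1)
        graph.add_edge(disease, gene, weight=score)
    return graph


def degree_embedding(graph):
    """Embed each node as [degree, weighted degree]."""
    return {
        node: np.array(
            [graph.degree(node), graph.degree(node, weight="weight")],
            dtype=float,
        )
        for node in graph.nodes
    }


def edge_features(embeddings, disease, gene):
    """Combine two node embeddings element-wise (Hadamard product)."""
    return embeddings[disease] * embeddings[gene]


graph = build_bipartite_graph(associations)
embeddings = degree_embedding(graph)

# Feature vector for a candidate pair that is not yet an edge in the graph.
features = edge_features(embeddings, "disease:A", "gene:G3")
print(features)
```

Feature vectors like this one, computed for known associations (positives) and for sampled non-associated pairs (negatives), would then serve as the input to a binary classifier.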

Key Components

Data Sources: The Open Targets Platform (disease-gene data) and the STRING database (protein interactions)
Embedding Algorithms: Node2Vec, ProNE, GGVec, and degree-based embeddings
Machine Learning Models: Random Forest, SVM, Logistic Regression, Gradient Boosting
Web Interface: Streamlit app for interactive exploration and visualization
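
As a rough illustration of how the four listed model families could be compared on edge features, the sketch below trains each one with scikit-learn and reports ROC AUC on a held-out split. Everything here is an assumption made for the example: the features are synthetic random data rather than real embeddings, the hyperparameters are mostly scikit-learn defaults, and the choice of ROC AUC as the metric is not taken from GDAP itself.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for edge features: positives and negatives are drawn
# from two shifted Gaussians so the classes are clearly separable.
rng = np.random.default_rng(0)
n_per_class = 200
x_pos = rng.normal(loc=1.0, scale=1.0, size=(n_per_class, 8))
x_neg = rng.normal(loc=-1.0, scale=1.0, size=(n_per_class, 8))
X = np.vstack([x_pos, x_neg])
y = np.array([1] * n_per_class + [0] * n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One instance per model family named in the component list.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Probability of the positive class (an association exists).
    probabilities = model.predict_proba(X_test)[:, 1]
    scores[name] = roc_auc_score(y_test, probabilities)

for name, auc in scores.items():
    print(f"{name}: ROC AUC = {auc:.3f}")
```

On real data the ranking of these models would depend on the embedding algorithm and on how negative examples are sampled, so a comparison like this is best repeated for each embedding.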
