Network Analysis

Introduction

In this notebook, we explore the creation and analysis of disease-gene and protein-protein interaction networks using the NetworkX library. We will fetch relevant datasets, construct directed and undirected graphs, and visualize disease-specific networks, with a focus on breast cancer and other diseases. By the end, we will have a comprehensive understanding of network creation and analysis methods applicable to various diseases from open target platform.

Fetching Datasets
- Open Target Dataset (Breast Cancer)
- STRING Database
Graph Creation and Visualization with NetworkX
- Tutorial: Directed and Undirected Graphs
  - Creating Directed Graphs
  - Creating Undirected Graphs
- Detecting Communities in Graphs
Disease-Specific Graph Construction
- Building Breast Cancer Interaction Networks
Integrating Graphs and Class Definitions
- Generating Graphs for Infectious Diseases
- Generating Graphs for Alzheimer’s Disease

We need to get the Open Targets disease data and then get the STRING Database for Protein-Protein Interaction (PPI)

We can either download the data manually, but we’ll use the GraphQL API and STRING API as they’re the fastest options to retrieve them

import requests
import json
from pprint import pprint
import pandas as pd
import networkx as nx
from networkx.algorithms import community
import matplotlib.pyplot as plt

Fetching Datasets

Open Target Dataset

Query Sourced from here and here

query = """
  query DiseaseAssociationsQuery($efoId: String!){
  disease(efoId: $efoId){
    id
    name
    associatedTargets{
      count
      rows{
        target{
          id
          approvedSymbol
        }
        score
        datasourceScores {
          id
          score
        }
      }
    }
  }
}
"""

def fetch_graphql_data(disease_id):
    """
    Fetch data from a GraphQL API endpoint.
    """
    # Set variables
    variables = {"efoId": disease_id}

    # Set base URL of GraphQL API endpoint
    base_url = "https://api.platform.opentargets.org/api/v4/graphql"

    # Perform POST request and check status code of response
    response = requests.post(base_url, json={"query": query, "variables": variables})
    if response.status_code == 200:
        print("request was successful\n")

    # transform API response from JSON into Python dict
    api_res = json.loads(response.text)
    pprint(api_res)

    return api_res

Let’s fetch the Breast Cancer dataset. The EFO ID for Breast Cancer is MONDO_0007254

disease_id = "MONDO_0007254" # you can find it at the top left of the open targets platform
api_res = fetch_graphql_data(disease_id)

request was successful

{'data': {'disease': {'associatedTargets': {'count': 11908,
                                            'rows': [{'datasourceScores': [{'id': 'uniprot_variants',
                                                                            'score': 0.9967809897717167},
                                                                           {'id': 'gene_burden',
                                                                            'score': 0.981427193451389},
                                                                           {'id': 'genomics_england',
                                                                            'score': 0.9779019396789099},
                                                                           {'id': 'eva',
                                                                            'score': 0.9699325219179143},
                                                                           {'id': 'eva_somatic',
                                                                            'score': 0.9488937677821665},
                                                                           {'id': 'cancer_gene_census',
                                                                            'score': 0.865017942479752},
                                                                           {'id': 'uniprot_literature',
                                                                            'score': 0.8274613634158176},
                                                                           {'id': 'clingen',
                                                                            'score': 0.607930797611621},
                                                                           {'id': 'orphanet',
                                                                            'score': 0.607930797611621},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.8979147935416834},
                                                                           {'id': 'cancer_biomarkers',
                                                                            'score': 0.865457038266544},
                                                                           {'id': 'intogen',
                                                                            'score': 0.336354528190285},
                                                                           {'id': 'ot_genetics_portal',
                                                                            'score': 0.3249228882691419},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.9050938838031505},
                                                                           {'id': 'impc',
                                                                            'score': 0.4329809623969801}],
                                                      'score': 0.9246154877175718,
                                                      'target': {'approvedSymbol': 'BRCA2',
                                                                 'id': 'ENSG00000139618'}},
                                                     {'datasourceScores': [{'id': 'uniprot_variants',
                                                                            'score': 0.9967809897717167},
                                                                           {'id': 'eva',
                                                                            'score': 0.9699027100268988},
                                                                           {'id': 'gene_burden',
                                                                            'score': 0.9615415295186884},
                                                                           {'id': 'eva_somatic',
                                                                            'score': 0.9551136912636755},
                                                                           {'id': 'genomics_england',
                                                                            'score': 0.9471757543886921},
                                                                           {'id': 'cancer_gene_census',
                                                                            'score': 0.8655308214319849},
                                                                           {'id': 'uniprot_literature',
                                                                            'score': 0.8274613634158176},
                                                                           {'id': 'orphanet',
                                                                            'score': 0.607930797611621},
                                                                           {'id': 'clingen',
                                                                            'score': 0.607930797611621},
                                                                           {'id': 'intogen',
                                                                            'score': 0.5077719549344466},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.9626521671438224},
                                                                           {'id': 'cancer_biomarkers',
                                                                            'score': 0.865457038266544},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.9967806662471856},
                                                                           {'id': 'impc',
                                                                            'score': 0.4942722213432534}],
                                                      'score': 0.9212307544907994,
                                                      'target': {'approvedSymbol': 'BRCA1',
                                                                 'id': 'ENSG00000012048'}},
                                                     {'datasourceScores': [{'id': 'cancer_gene_census',
                                                                            'score': 0.9632140851345072},
                                                                           {'id': 'uniprot_variants',
                                                                            'score': 0.9549947196932794},
                                                                           {'id': 'intogen',
                                                                            'score': 0.9509202575344607},
                                                                           {'id': 'chembl',
                                                                            'score': 0.9278946328772129},
                                                                           {'id': 'eva_somatic',
                                                                            'score': 0.8007968431539079},
                                                                           {'id': 'eva',
                                                                            'score': 0.7730988405360585},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.9865331671779851},
                                                                           {'id': 'cancer_biomarkers',
                                                                            'score': 0.9190679877429972},
                                                                           {'id': 'crispr',
                                                                            'score': 0.45050869819016554},
                                                                           {'id': 'progeny',
                                                                            'score': 0.5807359874260895},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.988642821472146},
                                                                           {'id': 'clingen',
                                                                            'score': 0.006079307976116211}],
                                                      'score': 0.8793777133901898,
                                                      'target': {'approvedSymbol': 'PIK3CA',
                                                                 'id': 'ENSG00000121879'}},
                                                     {'datasourceScores': [{'id': 'eva',
                                                                            'score': 0.9685600208209743},
                                                                           {'id': 'gene_burden',
                                                                            'score': 0.9518332216356467},
                                                                           {'id': 'genomics_england',
                                                                            'score': 0.9190679877429972},
                                                                           {'id': 'cancer_gene_census',
                                                                            'score': 0.8412836902101531},
                                                                           {'id': 'uniprot_literature',
                                                                            'score': 0.8274613634158176},
                                                                           {'id': 'clingen',
                                                                            'score': 0.607930797611621},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.8493282943298344},
                                                                           {'id': 'ot_genetics_portal',
                                                                            'score': 0.29079990469976646},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.8813600716365745}],
                                                      'score': 0.8672675131240247,
                                                      'target': {'approvedSymbol': 'PALB2',
                                                                 'id': 'ENSG00000083093'}},
                                                     {'datasourceScores': [{'id': 'eva',
                                                                            'score': 0.9493992387005677},
                                                                           {'id': 'gene_burden',
                                                                            'score': 0.9329896927619304},
                                                                           {'id': 'uniprot_variants',
                                                                            'score': 0.8897742701710087},
                                                                           {'id': 'genomics_england',
                                                                            'score': 0.8586178167934132},
                                                                           {'id': 'uniprot_literature',
                                                                            'score': 0.8274613634158176},
                                                                           {'id': 'cancer_gene_census',
                                                                            'score': 0.6893009908072479},
                                                                           {'id': 'clingen',
                                                                            'score': 0.607930797611621},
                                                                           {'id': 'eva_somatic',
                                                                            'score': 0.5015429080295873},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.9475493046633926},
                                                                           {'id': 'ot_genetics_portal',
                                                                            'score': 0.34792928780386473},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.9684002733734489},
                                                                           {'id': 'intogen',
                                                                            'score': 0.15198269940290526},
                                                                           {'id': 'chembl',
                                                                            'score': 0.12158615952232422}],
                                                      'score': 0.8634848741187544,
                                                      'target': {'approvedSymbol': 'CHEK2',
                                                                 'id': 'ENSG00000183765'}},
                                                     {'datasourceScores': [{'id': 'intogen',
                                                                            'score': 0.9564265224205744},
                                                                           {'id': 'cancer_gene_census',
                                                                            'score': 0.9531985691504341},
                                                                           {'id': 'eva',
                                                                            'score': 0.9109719579436069},
                                                                           {'id': 'eva_somatic',
                                                                            'score': 0.8820907288419858},
                                                                           {'id': 'genomics_england',
                                                                            'score': 0.7599134970145264},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.9856315360164882},
                                                                           {'id': 'progeny',
                                                                            'score': 0.607930797611621},
                                                                           {'id': 'cancer_biomarkers',
                                                                            'score': 0.607930797611621},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.9925763625311573},
                                                                           {'id': 'impc',
                                                                            'score': 0.7762948222349234},
                                                                           {'id': 'chembl',
                                                                            'score': 0.14353921610274387}],
                                                      'score': 0.8579753568583963,
                                                      'target': {'approvedSymbol': 'TP53',
                                                                 'id': 'ENSG00000141510'}},
                                                     {'datasourceScores': [{'id': 'chembl',
                                                                            'score': 0.9954471784803182},
                                                                           {'id': 'cancer_gene_census',
                                                                            'score': 0.9226214066290276},
                                                                           {'id': 'ot_genetics_portal',
                                                                            'score': 0.8077072080945117},
                                                                           {'id': 'intogen',
                                                                            'score': 0.7545563275197559},
                                                                           {'id': 'cancer_biomarkers',
                                                                            'score': 0.9607983220922475},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.8292004600368877},
                                                                           {'id': 'crispr',
                                                                            'score': 0.4028554954816608},
                                                                           {'id': 'eva_somatic',
                                                                            'score': 0.3039653988058105},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.9975790652001129},
                                                                           {'id': 'progeny',
                                                                            'score': 0.3039653988058105},
                                                                           {'id': 'impc',
                                                                            'score': 0.2783107191466001},
                                                                           {'id': 'eva',
                                                                            'score': 0.015198269940290528}],
                                                      'score': 0.8579482544926643,
                                                      'target': {'approvedSymbol': 'ESR1',
                                                                 'id': 'ENSG00000091831'}},
                                                     {'datasourceScores': [{'id': 'chembl',
                                                                            'score': 0.9944690246399933},
                                                                           {'id': 'cancer_gene_census',
                                                                            'score': 0.9276008908162294},
                                                                           {'id': 'intogen',
                                                                            'score': 0.8005621897249015},
                                                                           {'id': 'cancer_biomarkers',
                                                                            'score': 0.9857024850308149},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.9742143913335374},
                                                                           {'id': 'crispr',
                                                                            'score': 0.39637088004277693},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.9992089407387643},
                                                                           {'id': 'eva_somatic',
                                                                            'score': 0}],
                                                      'score': 0.8393681058533538,
                                                      'target': {'approvedSymbol': 'ERBB2',
                                                                 'id': 'ENSG00000141736'}},
                                                     {'datasourceScores': [{'id': 'cancer_gene_census',
                                                                            'score': 0.9199005839308011},
                                                                           {'id': 'intogen',
                                                                            'score': 0.9147281211319899},
                                                                           {'id': 'reactome',
                                                                            'score': 0.865457038266544},
                                                                           {'id': 'uniprot_literature',
                                                                            'score': 0.8274613634158176},
                                                                           {'id': 'chembl',
                                                                            'score': 0.6500589899523559},
                                                                           {'id': 'eva_somatic',
                                                                            'score': 0.547137717850459},
                                                                           {'id': 'eva',
                                                                            'score': 0.5073689115066987},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.9802615355154657},
                                                                           {'id': 'crispr',
                                                                            'score': 0.3275531137531414},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.9827086927319217},
                                                                           {'id': 'ot_genetics_portal',
                                                                            'score': 0.07916000824617708}],
                                                      'score': 0.8282070633417251,
                                                      'target': {'approvedSymbol': 'AKT1',
                                                                 'id': 'ENSG00000142208'}},
                                                     {'datasourceScores': [{'id': 'cancer_gene_census',
                                                                            'score': 0.9470005150013617},
                                                                           {'id': 'intogen',
                                                                            'score': 0.9334985585431808},
                                                                           {'id': 'eva',
                                                                            'score': 0.9310298173424281},
                                                                           {'id': 'eva_somatic',
                                                                            'score': 0.6839221473130737},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.9390736181431255},
                                                                           {'id': 'cancer_biomarkers',
                                                                            'score': 0.607930797611621},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.9705590991710802}],
                                                      'score': 0.8254207565099935,
                                                      'target': {'approvedSymbol': 'CDH1',
                                                                 'id': 'ENSG00000039068'}},
                                                     {'datasourceScores': [{'id': 'eva',
                                                                            'score': 0.9493909648924126},
                                                                           {'id': 'genomics_england',
                                                                            'score': 0.9066612367713315},
                                                                           {'id': 'cancer_gene_census',
                                                                            'score': 0.8920325429621649},
                                                                           {'id': 'uniprot_variants',
                                                                            'score': 0.8897742701710087},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.9326485111500797},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.6886823067572628},
                                                                           {'id': 'impc',
                                                                            'score': 0.258005830506372},
                                                                           {'id': 'clingen',
                                                                            'score': 0.006079307976116211}],
                                                      'score': 0.8233869133963113,
                                                      'target': {'approvedSymbol': 'BRIP1',
                                                                 'id': 'ENSG00000136492'}},
                                                     {'datasourceScores': [{'id': 'eva',
                                                                            'score': 0.9687974527484059},
                                                                           {'id': 'genomics_england',
                                                                            'score': 0.7599134970145264},
                                                                           {'id': 'cancer_gene_census',
                                                                            'score': 0.7076059334305238},
                                                                           {'id': 'clingen',
                                                                            'score': 0.607930797611621},
                                                                           {'id': 'eva_somatic',
                                                                            'score': 0.57753425773104},
                                                                           {'id': 'gene_burden',
                                                                            'score': 0.5370693399711839},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.9721288826333171},
                                                                           {'id': 'intogen',
                                                                            'score': 0.3260714893140308},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.9543985308894327},
                                                                           {'id': 'impc',
                                                                            'score': 0.6490645774656854},
                                                                           {'id': 'ot_genetics_portal',
                                                                            'score': 0.09417570446505588}],
                                                      'score': 0.8102877835482611,
                                                      'target': {'approvedSymbol': 'ATM',
                                                                 'id': 'ENSG00000149311'}},
                                                     {'datasourceScores': [{'id': 'intogen',
                                                                            'score': 0.9027520542982432},
                                                                           {'id': 'cancer_gene_census',
                                                                            'score': 0.866273997846718},
                                                                           {'id': 'eva',
                                                                            'score': 0.8570668490441947},
                                                                           {'id': 'genomics_england',
                                                                            'score': 0.607930797611621},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.9605410860851366},
                                                                           {'id': 'cancer_biomarkers',
                                                                            'score': 0.9190679877429972},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.9829189382758448},
                                                                           {'id': 'impc',
                                                                            'score': 0.7483599311963033}],
                                                      'score': 0.784760888471985,
                                                      'target': {'approvedSymbol': 'PTEN',
                                                                 'id': 'ENSG00000171862'}},
                                                     {'datasourceScores': [{'id': 'cancer_gene_census',
                                                                            'score': 0.9315693279673053},
                                                                           {'id': 'intogen',
                                                                            'score': 0.9007870162776002},
                                                                           {'id': 'ot_genetics_portal',
                                                                            'score': 0.8006700483488376},
                                                                           {'id': 'gene2phenotype',
                                                                            'score': 0.3039653988058105},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.38364098596622226}],
                                                      'score': 0.7707324681836835,
                                                      'target': {'approvedSymbol': 'MAP3K1',
                                                                 'id': 'ENSG00000095015'}},
                                                     {'datasourceScores': [{'id': 'eva',
                                                                            'score': 0.9462009100754374},
                                                                           {'id': 'genomics_england',
                                                                            'score': 0.8586178167934132},
                                                                           {'id': 'orphanet',
                                                                            'score': 0.607930797611621},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.9095618988667489},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.8203383684006365},
                                                                           {'id': 'clingen',
                                                                            'score': 0.006079307976116211}],
                                                      'score': 0.7681561976356213,
                                                      'target': {'approvedSymbol': 'RAD51C',
                                                                 'id': 'ENSG00000108384'}},
                                                     {'datasourceScores': [{'id': 'eva',
                                                                            'score': 0.9489953381878606},
                                                                           {'id': 'cancer_gene_census',
                                                                            'score': 0.7032174930743333},
                                                                           {'id': 'clingen',
                                                                            'score': 0.607930797611621},
                                                                           {'id': 'gene2phenotype',
                                                                            'score': 0.607930797611621},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.9432831222921076},
                                                                           {'id': 'gene_burden',
                                                                            'score': 0.19577539648350917},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.8922733459337661},
                                                                           {'id': 'impc',
                                                                            'score': 0.25824900282541663}],
                                                      'score': 0.7654432846523105,
                                                      'target': {'approvedSymbol': 'BARD1',
                                                                 'id': 'ENSG00000138376'}},
                                                     {'datasourceScores': [{'id': 'chembl',
                                                                            'score': 0.9863841154458036},
                                                                           {'id': 'cancer_gene_census',
                                                                            'score': 0.8699743018321302},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.8469682157917556},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.3227423563847339}],
                                                      'score': 0.7629323355279769,
                                                      'target': {'approvedSymbol': 'CDK6',
                                                                 'id': 'ENSG00000105810'}},
                                                     {'datasourceScores': [{'id': 'chembl',
                                                                            'score': 0.9782080820264845},
                                                                           {'id': 'cancer_gene_census',
                                                                            'score': 0.7032174930743333},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.9776035037816935},
                                                                           {'id': 'intogen',
                                                                            'score': 0.35037268153398754},
                                                                           {'id': 'crispr',
                                                                            'score': 0.2942385060440246},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.9922142106444732},
                                                                           {'id': 'eva',
                                                                            'score': 0.19453785523571876},
                                                                           {'id': 'progeny',
                                                                            'score': 0.3039653988058105},
                                                                           {'id': 'eva_somatic',
                                                                            'score': 0}],
                                                      'score': 0.7622532919844915,
                                                      'target': {'approvedSymbol': 'EGFR',
                                                                 'id': 'ENSG00000146648'}},
                                                     {'datasourceScores': [{'id': 'eva',
                                                                            'score': 0.9469210713496257},
                                                                           {'id': 'genomics_england',
                                                                            'score': 0.8274613634158176},
                                                                           {'id': 'orphanet',
                                                                            'score': 0.607930797611621},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.9089652610585701},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.41246601446483044},
                                                                           {'id': 'clingen',
                                                                            'score': 0.006079307976116211}],
                                                      'score': 0.7618637638136027,
                                                      'target': {'approvedSymbol': 'RAD51D',
                                                                 'id': 'ENSG00000185379'}},
                                                     {'datasourceScores': [{'id': 'chembl',
                                                                            'score': 0.9103120255511722},
                                                                           {'id': 'cancer_gene_census',
                                                                            'score': 0.8611084598274582},
                                                                           {'id': 'ot_genetics_portal',
                                                                            'score': 0.6494039087995811},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.9748162620857227},
                                                                           {'id': 'intogen',
                                                                            'score': 0.3091165783782697},
                                                                           {'id': 'cancer_biomarkers',
                                                                            'score': 0.607930797611621},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.7389422896398891}],
                                                      'score': 0.7611490409640307,
                                                      'target': {'approvedSymbol': 'ERBB4',
                                                                 'id': 'ENSG00000178568'}},
                                                     {'datasourceScores': [{'id': 'chembl',
                                                                            'score': 0.9769980215991244},
                                                                           {'id': 'cancer_gene_census',
                                                                            'score': 0.7001778390862753},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.8334267210607424},
                                                                           {'id': 'intogen',
                                                                            'score': 0.4115121385503686},
                                                                           {'id': 'eva',
                                                                            'score': 0.25963711147996316},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.4497215895787084},
                                                                           {'id': 'impc',
                                                                            'score': 0.379166438470368}],
                                                      'score': 0.7529192716595609,
                                                      'target': {'approvedSymbol': 'POLD1',
                                                                 'id': 'ENSG00000062822'}},
                                                     {'datasourceScores': [{'id': 'chembl',
                                                                            'score': 0.9863841154458036},
                                                                           {'id': 'cancer_gene_census',
                                                                            'score': 0.6227068933869035},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.9229772476990096},
                                                                           {'id': 'crispr',
                                                                            'score': 0.3981682274459157},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.9561795921185277}],
                                                      'score': 0.7452455272403391,
                                                      'target': {'approvedSymbol': 'CDK4',
                                                                 'id': 'ENSG00000135446'}},
                                                     {'datasourceScores': [{'id': 'cancer_gene_census',
                                                                            'score': 0.9226214066290276},
                                                                           {'id': 'intogen',
                                                                            'score': 0.8565451884252918},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.8472920286170607},
                                                                           {'id': 'crispr',
                                                                            'score': 0.3890757104714375},
                                                                           {'id': 'cancer_biomarkers',
                                                                            'score': 0.607930797611621},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.9712342504116543},
                                                                           {'id': 'eva_somatic',
                                                                            'score': 0}],
                                                      'score': 0.7451414332495789,
                                                      'target': {'approvedSymbol': 'GATA3',
                                                                 'id': 'ENSG00000107485'}},
                                                     {'datasourceScores': [{'id': 'cancer_gene_census',
                                                                            'score': 0.9184184282751694},
                                                                           {'id': 'intogen',
                                                                            'score': 0.809326307468332},
                                                                           {'id': 'ot_genetics_portal',
                                                                            'score': 0.6891228181288519},
                                                                           {'id': 'crispr',
                                                                            'score': 0.29667022923447106},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.777555491966681},
                                                                           {'id': 'impc',
                                                                            'score': 0.26779351634791904}],
                                                      'score': 0.7438454496469091,
                                                      'target': {'approvedSymbol': 'TBX3',
                                                                 'id': 'ENSG00000135111'}},
                                                     {'datasourceScores': [{'id': 'chembl',
                                                                            'score': 0.9769980215991244},
                                                                           {'id': 'cancer_gene_census',
                                                                            'score': 0.7050938226965915},
                                                                           {'id': 'slapenrich',
                                                                            'score': 0.8334267210607424},
                                                                           {'id': 'eva',
                                                                            'score': 0.2785700719367037},
                                                                           {'id': 'europepmc',
                                                                            'score': 0.19988561484069164}],
                                                      'score': 0.7408139382927514,
                                                      'target': {'approvedSymbol': 'POLE',
                                                                 'id': 'ENSG00000177084'}}]},
                      'id': 'MONDO_0007254',
                      'name': 'breast cancer'}}}

# extract relevant data
associated_targets = api_res['data']['disease']['associatedTargets']['rows']

# normalize the JSON data
open_target_df = pd.json_normalize(
    data=associated_targets,
    record_path='datasourceScores',      # the path to the records to flatten
    meta=['target', 'score'],            # the fields to extract from the parent record
    record_prefix='datasourceScores_',   # Prefix for fields from the records
    errors='ignore'
)

# further normalize the nested target column
target_df = pd.json_normalize(open_target_df['target'])
open_target_df = open_target_df.drop(columns=['target'])
open_target_df = pd.concat([open_target_df, target_df], axis=1)

open_target_df.head()

	datasourceScores_id	datasourceScores_score	score	id	approvedSymbol
0	uniprot_variants	0.996781	0.924615	ENSG00000139618	BRCA2
1	gene_burden	0.981427	0.924615	ENSG00000139618	BRCA2
2	genomics_england	0.977902	0.924615	ENSG00000139618	BRCA2
3	eva	0.969933	0.924615	ENSG00000139618	BRCA2
4	eva_somatic	0.948894	0.924615	ENSG00000139618	BRCA2

agg_df = open_target_df.groupby('id').agg({
    'approvedSymbol': 'first',
    'score': 'mean'
}).reset_index()

agg_df.head()

	id	approvedSymbol	score
0	ENSG00000012048	BRCA1	0.921231
1	ENSG00000039068	CDH1	0.825421
2	ENSG00000062822	POLD1	0.752919
3	ENSG00000083093	PALB2	0.867268
4	ENSG00000091831	ESR1	0.857948

# Aggregate by ID
agg_df = open_target_df.groupby('id').agg({
    'approvedSymbol': 'first',
    'score': 'mean',
    'datasourceScores_id': lambda x: list(x),    # aggregate into a list
    'datasourceScores_score': lambda x: list(x)  # aggregate into a list
}).reset_index()

agg_df.head()

	id	approvedSymbol	score	datasourceScores_id	datasourceScores_score
0	ENSG00000012048	BRCA1	0.921231	[uniprot_variants, eva, gene_burden, eva_somat...	[0.9967809897717167, 0.9699027100268988, 0.961...
1	ENSG00000039068	CDH1	0.825421	[cancer_gene_census, intogen, eva, eva_somatic...	[0.9470005150013617, 0.9334985585431808, 0.931...
2	ENSG00000062822	POLD1	0.752919	[chembl, cancer_gene_census, slapenrich, intog...	[0.9769980215991244, 0.7001778390862753, 0.833...
3	ENSG00000083093	PALB2	0.867268	[eva, gene_burden, genomics_england, cancer_ge...	[0.9685600208209743, 0.9518332216356467, 0.919...
4	ENSG00000091831	ESR1	0.857948	[chembl, cancer_gene_census, ot_genetics_porta...	[0.9954471784803182, 0.9226214066290276, 0.807...

# These are the gene symbols associated with breast cancer disease
gene_symbols = agg_df['approvedSymbol'].tolist()
gene_symbols

['BRCA1',
 'CDH1',
 'POLD1',
 'PALB2',
 'ESR1',
 'MAP3K1',
 'CDK6',
 'GATA3',
 'RAD51C',
 'PIK3CA',
 'TBX3',
 'CDK4',
 'BRIP1',
 'BARD1',
 'BRCA2',
 'TP53',
 'ERBB2',
 'AKT1',
 'EGFR',
 'ATM',
 'PTEN',
 'POLE',
 'ERBB4',
 'CHEK2',
 'RAD51D']

# and these are the unique features associated with the gene symbols
features = []
for feature_list in agg_df['datasourceScores_id'].tolist():
    for feature in feature_list:
        if feature not in features:
            features.append(feature)

features

['uniprot_variants',
 'eva',
 'gene_burden',
 'eva_somatic',
 'genomics_england',
 'cancer_gene_census',
 'uniprot_literature',
 'orphanet',
 'clingen',
 'intogen',
 'slapenrich',
 'cancer_biomarkers',
 'europepmc',
 'impc',
 'chembl',
 'ot_genetics_portal',
 'crispr',
 'progeny',
 'gene2phenotype',
 'reactome']

def get_feature_score(df, gene_symbol, feature_key):
    """
    Filters the DataFrame for a given gene symbol and 
    retrieves the score for the specified feature.
    """
    gene_df = df[df['approvedSymbol'] == gene_symbol]
    if gene_df.empty:
        raise ValueError(f"No features found for the gene symbol: {gene_symbol}")

    score = None
    
    for _, row in gene_df.iterrows():
        if feature_key in row['datasourceScores_id']:
            if isinstance(row['datasourceScores_score'], float):
                score = row['datasourceScores_score']
            else:
                index = row['datasourceScores_id'].index(feature_key)
                score = row['datasourceScores_score'][index]
            break

    if score is None:
        raise ValueError(f"Feature key '{feature_key}' not found for gene symbol: {gene_symbol}")
    
    print(f"{feature_key} score for {gene_symbol}: {score}")

    return score

# reference the (x1, y1) coordinates in the platform, as shown in the attached screenshot
score = get_feature_score(agg_df,"BRCA2", "ot_genetics_portal")

ot_genetics_portal score for BRCA2: 0.3249228882691419

screenshot

STRING Database

source code used can be found here Getting String Network Interactions

https://string-db.org/api/[output-format]/network?identifiers=[your_identifiers]&[optional_parameters]

For the latter ML network, we can use this all partners of protein set

def fetch_string_data(my_genes, caller_identity="app.name"):
    """
    Fetches data from the STRING API based on provided gene symbols
    """
    string_api_url = "https://version-11-5.string-db.org/api"
    output_format = "json"
    method = "network"

    ## Construct URL
    request_url = "/".join([string_api_url, output_format, method])

    ## Set parameters
    params = {
        "identifiers": "%0d".join(my_genes),  # your protein
        "species": 9606, # Human taxonomy, aka Homo sapiens
        "caller_identity": caller_identity  # your app name
    }

    ## Call STRING
    try:
        res = requests.post(request_url, data=params)
        res.raise_for_status()
        data = res.json()

        return data
    
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")

        return None
    
    except json.JSONDecodeError:
        print("Failed to decode JSON response")

        return None

# gene symbols are from our open targets
# print (gene_symbols)
my_genes = gene_symbols
data = fetch_string_data(my_genes)

STRING_df = pd.json_normalize(data, errors='ignore')
print(STRING_df.columns)
print(STRING_df.shape)

Index(['stringId_A', 'stringId_B', 'preferredName_A', 'preferredName_B',
       'ncbiTaxonId', 'score', 'nscore', 'fscore', 'pscore', 'ascore',
       'escore', 'dscore', 'tscore'],
      dtype='object')
(396, 13)

Field	Description
stringId_A	STRING identifier (protein A)
stringId_B	STRING identifier (protein B)
preferredName_A	common protein name (protein A)
preferredName_B	common protein name (protein B)
ncbiTaxonId	NCBI taxon identifier
score	combined score
nscore	gene neighborhood score
fscore	gene fusion score
pscore	phylogenetic profile score
ascore	coexpression score
escore	experimental score
dscore	database score
tscore	textmining score

STRING_df.head()

	stringId_A	stringId_B	preferredName_A	preferredName_B	ncbiTaxonId	score	ascore	escore	tscore
0	9606.ENSP00000257566	9606.ENSP00000269305	TBX3	TP53	9606	0.400	0.000	0.000	0.400
1	9606.ENSP00000257566	9606.ENSP00000269305	TBX3	TP53	9606	0.400	0.000	0.000	0.400
2	9606.ENSP00000257566	9606.ENSP00000261769	TBX3	CDH1	9606	0.425	0.062	0.000	0.413
3	9606.ENSP00000257566	9606.ENSP00000261769	TBX3	CDH1	9606	0.425	0.062	0.000	0.413
4	9606.ENSP00000257566	9606.ENSP00000382423	TBX3	MAP3K1	9606	0.468	0.000	0.065	0.455

STRING_df.dtypes

stringId_A          object
stringId_B          object
preferredName_A     object
preferredName_B     object
ncbiTaxonId         object
score              float64
nscore             float64
fscore               int64
pscore             float64
ascore             float64
escore             float64
dscore             float64
tscore             float64
dtype: object

df_filtered = STRING_df[STRING_df['escore'] > 0.4]
df_sorted = df_filtered.sort_values(by='escore', ascending=False)
df_sorted[['preferredName_A', 'preferredName_B', 'escore', 'score']].head()

	preferredName_A	preferredName_B	escore	score
91	BARD1	BRCA1	0.998	0.999
90	BARD1	BRCA1	0.998	0.999
252	ERBB2	EGFR	0.982	0.998
253	ERBB2	EGFR	0.982	0.998
120	PALB2	BRCA2	0.981	0.999

# fast overlap check for the BRCA1 gene
gene = 'BRCA1'

openTarget = open_target_df[open_target_df['approvedSymbol'] == gene]
openTarget.head()

	datasourceScores_id	datasourceScores_score	score	id	approvedSymbol
15	uniprot_variants	0.996781	0.921231	ENSG00000012048	BRCA1
16	eva	0.969903	0.921231	ENSG00000012048	BRCA1
17	gene_burden	0.961542	0.921231	ENSG00000012048	BRCA1
18	eva_somatic	0.955114	0.921231	ENSG00000012048	BRCA1
19	genomics_england	0.947176	0.921231	ENSG00000012048	BRCA1

StringDB = STRING_df[STRING_df['preferredName_A'] == gene]
StringDB.head()

	stringId_A	stringId_B	preferredName_A	preferredName_B	ncbiTaxonId	score	ascore	escore	dscore	tscore
392	9606.ENSP00000418960	9606.ENSP00000466399	BRCA1	RAD51D	9606	0.825	0.126	0.095	0.0	0.797
393	9606.ENSP00000418960	9606.ENSP00000466399	BRCA1	RAD51D	9606	0.825	0.126	0.095	0.0	0.797
394	9606.ENSP00000418960	9606.ENSP00000451828	BRCA1	AKT1	9606	0.987	0.000	0.835	0.8	0.657
395	9606.ENSP00000418960	9606.ENSP00000451828	BRCA1	AKT1	9606	0.987	0.000	0.835	0.8	0.657

StringDB2 = STRING_df[STRING_df['preferredName_B'] == gene]
StringDB2.head()

	stringId_A	stringId_B	preferredName_A	preferredName_B	ncbiTaxonId	score	ascore	escore	dscore	tscore
44	9606.ENSP00000257904	9606.ENSP00000418960	CDK4	BRCA1	9606	0.973	0.088	0.675	0.8	0.604
45	9606.ENSP00000257904	9606.ENSP00000418960	CDK4	BRCA1	9606	0.973	0.088	0.675	0.8	0.604
66	9606.ENSP00000259008	9606.ENSP00000418960	BRIP1	BRCA1	9606	0.999	0.212	0.981	0.8	0.989
67	9606.ENSP00000259008	9606.ENSP00000418960	BRIP1	BRCA1	9606	0.999	0.212	0.981	0.8	0.989
90	9606.ENSP00000260947	9606.ENSP00000418960	BARD1	BRCA1	9606	0.999	0.161	0.998	0.9	0.992

Graph Creation and Visualization with NetworkX

Creating Disease-Gene & Protein-Protein Interaction Network Graphs using Networkx

Tutorial: Directed and Undirected Graphs

in Networkx, there are two types of graphs, directed graphs have edges with a specific direction (from node u to node v), while undirected graphs treat edges as bidirectional

Creating Directed Graphs

def plot_DiGraph(
        df: pd.DataFrame,
        source_col: str,
        target_col: str,
        edge_attr_col: str,
        figsize: tuple = (10, 8)
) -> None:
    """
    Create and plot a directed graph from the given df
    """
    # create a directed graph from the DataFrame
    DiGraph = nx.from_pandas_edgelist(
        df,
        source=source_col,
        target=target_col,
        edge_attr=edge_attr_col,
        create_using=nx.DiGraph()  # notice the DiGraph class, aka DirectGraph
    )
    
    # plot the directed graph
    plt.figure(figsize=figsize)
    pos = nx.spring_layout(DiGraph)
    nx.draw(
        DiGraph,
        pos,
        with_labels=True,
        node_color='lightblue',
        edge_color='gray',
        node_size=3000,
        font_size=10,
        font_weight='bold',
        arrows=True
    )
    
    edge_labels = nx.get_edge_attributes(DiGraph, edge_attr_col)
    nx.draw_networkx_edge_labels(DiGraph, pos, edge_labels=edge_labels)
    plt.title(f'PPI Network ({DiGraph})')
    plt.show()
    return DiGraph

tut_STRING_df = STRING_df.copy()

# Select the first two gene symbols for the example
tut_gene = gene_symbols[0:2]
tut_gene

['BRCA1', 'CDH1']

# Filter the dataframe for interactions involving the selected genes
tut_STRING_df = tut_STRING_df[(tut_STRING_df['preferredName_A'].isin(tut_gene)) & 
                          (tut_STRING_df['preferredName_B'].isin(gene_symbols))]

tut_STRING_df.head()

	stringId_A	stringId_B	preferredName_A	preferredName_B	ncbiTaxonId	score	escore	tscore
122	9606.ENSP00000261769	9606.ENSP00000278616	CDH1	ATM	9606	0.459	0.000	0.459
123	9606.ENSP00000261769	9606.ENSP00000278616	CDH1	ATM	9606	0.459	0.000	0.459
124	9606.ENSP00000261769	9606.ENSP00000369497	CDH1	BRCA2	9606	0.470	0.000	0.470
125	9606.ENSP00000261769	9606.ENSP00000369497	CDH1	BRCA2	9606	0.470	0.000	0.470
126	9606.ENSP00000261769	9606.ENSP00000342235	CDH1	ERBB4	9606	0.489	0.059	0.479

In the STRING directed Graph the arrows indicate the direction of relationships from preferredName_A to preferredName_B. The weight represents as the score value for each edge

Tut_STRING_DiGraph = plot_DiGraph(tut_STRING_df,'preferredName_A', 'preferredName_B', 'score')

png

# similary we can see the directed graph in the open target dataset
tut_open_target_df = open_target_df.copy()
# Filter the df to include only rows with the a specific gene symbol
tut_open_target_df = tut_open_target_df[tut_open_target_df['approvedSymbol'] == 'BRCA1']
# Round the 'score' column to 2 decimal places
tut_open_target_df['datasourceScores_score'] = tut_open_target_df['datasourceScores_score'].round(2)

# plot the directed graph
Tut_open_target_DiGraph = plot_DiGraph(tut_open_target_df, 'datasourceScores_id', 'approvedSymbol', 'datasourceScores_score') 

png

Creating Undirected Graphs

In the undirected Graph, there are no arrows because edges are bidirectional. The weight (scores) remains unchanged, and edge direction is not considered

def plot_Graph(
        df: pd.DataFrame,
        source_col: str,
        target_col: str,
        edge_attr_col: str,
        figsize: tuple = (10, 8)
) -> None:
    """
    Create and plot a undirected graph from the given df
    """
    # create a undirected graph from the DataFrame
    G = nx.from_pandas_edgelist(
        df,
        source=source_col,
        target=target_col,
        edge_attr=edge_attr_col,
        create_using=nx.Graph()  # notice the undirect Graph
    )
    
    # plot the undirected graph
    plt.figure(figsize=figsize)
    pos = nx.spring_layout(G)
    nx.draw(
        G,
        pos,
        with_labels=True,
        node_color='lightgreen',
        edge_color='gray',
        node_size=3000,
        font_size=10,
        font_weight='bold',
        arrows=True
    )
    
    edge_labels = nx.get_edge_attributes(G, edge_attr_col)
    nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
    plt.title(f'PPI Network ({G})')
    plt.show()
    return G

Tut_STRING_Graph = plot_Graph(tut_STRING_df,'preferredName_A', 'preferredName_B', 'score')

png

Tut_open_target_Graph = plot_Graph(tut_open_target_df, 'datasourceScores_id', 'approvedSymbol', 'datasourceScores_score')

png

Which Graph we chose?

For a general PPI networks where interactions are typically mutual and direction is not that important, we use an undirected graph, as it’s the common choice for most PPI studies. However, if we ever need to direct these graphs, we now know where to look

Detecting Communities in Graphs

Community detection identifies groups of nodes in a graph that are more connected to each other than to the rest of the network
Each community is assigned a distinct color, so nodes in the same community (cluster) share the same color
Colors help to easily visualize and distinguish different communities (clusters) within the network

def plot_community_detection(graph, edge_attr_col: str, figsize: tuple = (10, 8))-> None:
    """
    plots graph with community detection
    """
    communities = community.greedy_modularity_communities(graph)

    # Color nodes by community
    colors = [0] * graph.number_of_nodes()
    for i, comm in enumerate(communities):
        for node in comm:
            colors[list(graph.nodes()).index(node)] = i

    # Plotting
    plt.figure(figsize=figsize)
    pos = nx.spring_layout(graph)
    nx.draw(
        graph,
        pos, with_labels=True,
        node_color=colors,
        cmap=plt.cm.jet, node_size=1000, edge_color='gray', arrows=True
    )
    edge_labels = nx.get_edge_attributes(graph, edge_attr_col)
    nx.draw_networkx_edge_labels(graph, pos, edge_labels=edge_labels)
    plt.title(f'PPI Network with Communities ({graph})')
    plt.show()

plot_community_detection(Tut_STRING_DiGraph, edge_attr_col ='score')
plot_community_detection(Tut_STRING_Graph, edge_attr_col = 'score')

png

Disease-Specific Graph Construction

Building Breast Cancer Interaction Networks

we already fetched the breast cancer dataset in this section

# breast cancer dataset
open_target_df

	datasourceScores_id	datasourceScores_score	score	id	approvedSymbol
0	uniprot_variants	0.996781	0.924615	ENSG00000139618	BRCA2
1	gene_burden	0.981427	0.924615	ENSG00000139618	BRCA2
2	genomics_england	0.977902	0.924615	ENSG00000139618	BRCA2
3	eva	0.969933	0.924615	ENSG00000139618	BRCA2
4	eva_somatic	0.948894	0.924615	ENSG00000139618	BRCA2
...	...	...	...	...	...
209	chembl	0.976998	0.740814	ENSG00000177084	POLE
210	cancer_gene_census	0.705094	0.740814	ENSG00000177084	POLE
211	slapenrich	0.833427	0.740814	ENSG00000177084	POLE
212	eva	0.278570	0.740814	ENSG00000177084	POLE
213	europepmc	0.199886	0.740814	ENSG00000177084	POLE

214 rows × 5 columns

open_target_df['datasourceScores_score'] = open_target_df['datasourceScores_score'].round(3)

STRING_df

	stringId_A	stringId_B	preferredName_A	preferredName_B	ncbiTaxonId	score	nscore	fscore	pscore	ascore	escore	dscore	tscore
0	9606.ENSP00000257566	9606.ENSP00000269305	TBX3	TP53	9606	0.400	0.0	0	0.0	0.000	0.000	0.0	0.400
1	9606.ENSP00000257566	9606.ENSP00000269305	TBX3	TP53	9606	0.400	0.0	0	0.0	0.000	0.000	0.0	0.400
2	9606.ENSP00000257566	9606.ENSP00000261769	TBX3	CDH1	9606	0.425	0.0	0	0.0	0.062	0.000	0.0	0.413
3	9606.ENSP00000257566	9606.ENSP00000261769	TBX3	CDH1	9606	0.425	0.0	0	0.0	0.062	0.000	0.0	0.413
4	9606.ENSP00000257566	9606.ENSP00000382423	TBX3	MAP3K1	9606	0.468	0.0	0	0.0	0.000	0.065	0.0	0.455
...	...	...	...	...	...	...	...	...	...	...	...	...	...
391	9606.ENSP00000406046	9606.ENSP00000418960	POLD1	BRCA1	9606	0.978	0.0	0	0.0	0.198	0.391	0.9	0.613
392	9606.ENSP00000418960	9606.ENSP00000466399	BRCA1	RAD51D	9606	0.825	0.0	0	0.0	0.126	0.095	0.0	0.797
393	9606.ENSP00000418960	9606.ENSP00000466399	BRCA1	RAD51D	9606	0.825	0.0	0	0.0	0.126	0.095	0.0	0.797
394	9606.ENSP00000418960	9606.ENSP00000451828	BRCA1	AKT1	9606	0.987	0.0	0	0.0	0.000	0.835	0.8	0.657
395	9606.ENSP00000418960	9606.ENSP00000451828	BRCA1	AKT1	9606	0.987	0.0	0	0.0	0.000	0.835	0.8	0.657

396 rows × 13 columns

Method 1: Using from_pandas_edgelist and compose

# create a graph for open target data
G = nx.from_pandas_edgelist(
    open_target_df, 
    source='datasourceScores_id', 
    target='approvedSymbol',
    edge_attr='datasourceScores_score',
    # edge_attr=['datasourceScores_score', 'score'], 
    create_using=nx.Graph()
)

for u, v, data in G.edges(data=True):
    data['weight'] = data.pop('datasourceScores_score')  # Assign weight
    data['type'] = 'disease-gene'

# Create a graph for the STRING dataframe
H = nx.from_pandas_edgelist(
    STRING_df, 
    source='preferredName_A', 
    target='preferredName_B', 
    edge_attr='score',
    # edge_attr=['score', 'nscore', 'fscore', 'pscore', 'ascore', 'escore', 'dscore', 'tscore'],
    create_using=nx.Graph()
)

for u, v, data in H.edges(data=True):
    data['weight'] = data.pop('score') # Assign weight
    data['type'] = 'protein-protein'

# Merge the two graphs
G_composed = nx.compose(G, H)

plt.figure(figsize=(30, 20))

pos = nx.spring_layout(G_composed, seed=42, k=0.7, iterations=90)  # Adjust 'k' and 'iterations' for better spacing

# Draw nodes with labels
nx.draw_networkx(G_composed, pos, with_labels=True, node_size=3000, font_size=10, font_weight='bold', edge_color='gray')
edge_labels = nx.get_edge_attributes(G_composed, 'weight')
nx.draw_networkx_edge_labels(G_composed, pos, edge_labels, font_color='red')

plt.axis('off')
plt.title(f"Breast Cancer Disease Associated Genes", fontsize=30, fontweight='bold')
plt.tight_layout()
# os.makedirs('assets', exist_ok=True)
# plt.savefig(f'assets/{disease_id}_{G_composed}.png')
plt.show()

png

these edge attributes are preserved, and they can be accessed later using nx.get_edge_attributes with the specific attribute name score or datasourceScores_score

Method 2: Manually Adding Edges with `add_edge` and compose

G2 = nx.Graph()
for _, row in open_target_df.iterrows():
    G2.add_edge(row['datasourceScores_id'],
               row['approvedSymbol'],
               weight=row['datasourceScores_score'],
               type='disease-gene')

H2 = nx.Graph()
for _, row in STRING_df.iterrows():
    H2.add_edge(row['preferredName_A'],
               row['preferredName_B'],
               weight=row['score'],
               type='protein-protein')

# Merge the two graphs
G_composed2 = nx.compose(G2, H2)

plt.figure(figsize=(30, 20))

pos = nx.spring_layout(G_composed2, seed=42, k=0.9, iterations=100)  # Adjust 'k' and 'iterations' for better spacing


# Draw nodes with labels
nx.draw_networkx(G_composed2, pos, with_labels=True, node_size=3000, font_size=10, font_weight='bold', edge_color='gray')

edge_labels = nx.get_edge_attributes(G_composed2, 'weight')
nx.draw_networkx_edge_labels(G_composed2, pos, edge_labels, font_color='red')

plt.axis('off')
plt.title(f"Breast Cancer Disease Associated Genes", fontsize=30, fontweight='bold')
plt.tight_layout()
# plt.savefig(f'assets/{disease_id}.png')
plt.show()

png

Comparing Node Connections Across Methods

def get_node_edges(graph, node, edge_type=None):
    """
    Returns edges connected to specified nodes in a graph
    optionally filtered by edge type
    """
    edge_types = ['disease-gene', 'protein-protein']
    if edge_type is not None and edge_type not in edge_types:
        raise ValueError(f"Invalid edge type. Must be one of {edge_types}.")
    edges = list(graph.edges(node, data=True))
    if edge_type:
        edges = [edge for edge in edges if edge[2].get('type') == edge_type]
    return edges

def edge_to_tuple(edge):
    """Convert edge to a tuple with dictionaries as frozensets"""
    u, v, attr = edge
    attr_frozenset = frozenset(attr.items())
    return (u, v, attr_frozenset)

node = 'ERBB4'
edges_composed = get_node_edges(G_composed, node, edge_type='protein-protein')
edges_composed2 = get_node_edges(G_composed2, node, edge_type='protein-protein')

print(f"Edges connected to {node} in {G_composed}:")
for edge in edges_composed[:5]:
    print(edge)

print(f"\nEdges connected to {node} in {G_composed2}:")
for edge in edges_composed2[:5]:
    print(edge)

edges_composed_set = set(edge_to_tuple(edge) for edge in edges_composed)
edges_composed2_set = set(edge_to_tuple(edge) for edge in edges_composed2)
if edges_composed_set == edges_composed2_set:
    print("\nThe sets of edges are identical.")

Edges connected to ERBB4 in Graph with 45 nodes and 412 edges:
('ERBB4', 'TP53', {'weight': 0.701, 'type': 'protein-protein'})
('ERBB4', 'CDH1', {'weight': 0.489, 'type': 'protein-protein'})
('ERBB4', 'ESR1', {'weight': 0.982, 'type': 'protein-protein'})
('ERBB4', 'ERBB2', {'weight': 0.987, 'type': 'protein-protein'})
('ERBB4', 'AKT1', {'weight': 0.423, 'type': 'protein-protein'})

Edges connected to ERBB4 in Graph with 45 nodes and 412 edges:
('ERBB4', 'TP53', {'weight': 0.701, 'type': 'protein-protein'})
('ERBB4', 'CDH1', {'weight': 0.489, 'type': 'protein-protein'})
('ERBB4', 'ESR1', {'weight': 0.982, 'type': 'protein-protein'})
('ERBB4', 'ERBB2', {'weight': 0.987, 'type': 'protein-protein'})
('ERBB4', 'AKT1', {'weight': 0.423, 'type': 'protein-protein'})

The sets of edges are identical.

Integrating Graphs and Class Definitions

import sys
sys.path.append('../src')
from graph_composer import GraphComposer

Generating Graphs for Infectious Diseases

The dataset for the Infectious Disease is identified by EFO_0005741

disease_id = "EFO_0005741"
disease_name="Infectious"
Infectious = GraphComposer(disease_id, disease_name)
Infectious.process_all(plot=True)

png

# Accessing the features, and genes
features = Infectious.features
df = Infectious.open_target_df
gene = Infectious.gene_symbols

score = get_feature_score(df, gene_symbol = 'PRNP', feature_key='orphanet')

orphanet score for PRNP: 0.89

merged_df = Infectious.get_merged_dataframe()
print(merged_df.describe)
print(merged_df.info())

<bound method NDFrame.describe of                source  target  weight             type
0    genomics_england    PRNP   0.978     disease-gene
1    genomics_england     OTC   0.919     disease-gene
2    genomics_england  RANBP2   0.304     disease-gene
3    genomics_england  NUP214   0.890     disease-gene
4    genomics_england    TP53   0.608     disease-gene
..                ...     ...     ...              ...
162              XPO1    EGFR   0.539  protein-protein
163              EGFR     B2M   0.452  protein-protein
164              EGFR     SYK   0.526  protein-protein
165              EGFR     MET   0.946  protein-protein
166               SYK     B2M   0.689  protein-protein

[167 rows x 4 columns]>
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 167 entries, 0 to 166
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   source  167 non-null    object 
 1   target  167 non-null    object 
 2   weight  167 non-null    float64
 3   type    167 non-null    object 
dtypes: float64(1), object(3)
memory usage: 5.3+ KB
None

merged_df.head()

	source	target	weight	type
0	genomics_england	PRNP	0.978	disease-gene
1	genomics_england	OTC	0.919	disease-gene
2	genomics_england	RANBP2	0.304	disease-gene
3	genomics_england	NUP214	0.890	disease-gene
4	genomics_england	TP53	0.608	disease-gene

Generating Graphs for Alzheimer’s Disease

The dataset for the Alzheimer Disease is identified by MONDO_0004975

disease_id = "MONDO_0004975"
Alzheimer = GraphComposer(disease_id, disease_name="Alzheimer")
Alzheimer.process_all(plot=True)

png

# Accessing the merged_df
merged_df2 = Alzheimer.get_merged_dataframe()

merged_df2.head()

	source	target	weight	type
0	uniprot_variants	PSEN1	0.994	disease-gene
1	uniprot_variants	APP	0.951	disease-gene
2	uniprot_variants	SORL1	0.414	disease-gene
3	uniprot_variants	ADAM10	0.760	disease-gene
4	PSEN1	genomics_england	0.955	disease-gene