Limitations
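GDAP's predictions are useful for prioritizing candidates, but they are constrained by the underlying data, the models, the available computing resources, and the biology they leave out. Keep the following limitations in mind when interpreting results.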
Data Limitations
Coverage
- Disease-specific: Predictions are limited to diseases with sufficient data in the Open Targets Platform
- Gene coverage: Only genes with protein interaction data in the STRING database are included
- Species-specific: Currently limited to Homo sapiens
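Before interpreting results, it can help to check how many genes of interest fall inside the covered set at all. The following is a minimal sketch, not part of GDAP: the function name `coverage_report` and the gene identifiers are hypothetical, and it assumes you have already loaded the network's gene identifiers into a Python collection.

```python
def coverage_report(candidate_genes, network_genes):
    """Split candidate genes into those present in the network and those missing."""
    candidates = set(candidate_genes)
    network = set(network_genes)
    covered = sorted(candidates & network)
    missing = sorted(candidates - network)
    # Guard against an empty candidate list to avoid division by zero
    fraction = len(covered) / len(candidates) if candidates else 0.0
    return {"covered": covered, "missing": missing, "fraction_covered": fraction}


# Toy identifiers for illustration only; real inputs would be gene or protein IDs
report = coverage_report(
    ["GENE_A", "GENE_B", "GENE_C"],
    ["GENE_A", "GENE_C", "GENE_D"],
)
print(report["missing"])
```

Genes listed under `missing` cannot receive a prediction, so their absence from the output says nothing about their relevance to the disease.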
Data Quality
- Experimental bias: Training data reflects historical research priorities
- Publication bias: Well-studied diseases have more data
- Annotation quality: Depends on accuracy of source databases
Model Limitations
Prediction Accuracy
- False positives: Some predicted associations may not be biologically relevant
- False negatives: True associations may be missed where the source data are sparse or absent
- Confidence scores: Should not be interpreted as absolute probabilities
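One way to see why a score is not a probability is to compare it with the fraction of high-scoring predictions that are actually supported by independent evidence. The sketch below is illustrative only: `precision_above`, the scores, and the `validated` set are hypothetical, and it assumes an independently validated gene set is available. Genes outside that set are merely unconfirmed, not proven negatives, so the result is a lower bound.

```python
def precision_above(scores, validated, threshold):
    """Fraction of predictions scoring at or above the threshold that are validated.

    Returns None when no prediction reaches the threshold.
    """
    selected = [gene for gene, score in scores.items() if score >= threshold]
    if not selected:
        return None
    hits = sum(1 for gene in selected if gene in validated)
    return hits / len(selected)


# Toy scores and validation set, for illustration only
scores = {"GENE_A": 0.95, "GENE_B": 0.91, "GENE_C": 0.90, "GENE_D": 0.40}
validated = {"GENE_A"}
observed = precision_above(scores, validated, threshold=0.9)
print(observed)
```

In this toy case three genes score 0.9 or higher but only one is validated, so the observed precision is well below the scores themselves. Treat scores as a ranking signal unless they have been calibrated.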
Generalization
- Training bias: Models trained on known associations may not generalize to novel relationships
- Disease specificity: A model trained on one disease may not transfer to others
- Temporal changes: Discoveries made after the training data were collected are not reflected in predictions
Technical Limitations
Computational Resources
- Memory requirements: Large datasets require significant RAM
- Processing time: Complex embeddings can take hours to generate
- Storage: Results can be several GB for large analyses
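A rough back-of-the-envelope estimate can show whether an analysis will fit in memory before it is launched. The sketch below is an assumption-laden lower bound, not a description of GDAP's internals: the function names are hypothetical, values are assumed to be stored as 4-byte floats, and real usage will be higher because of intermediate copies and library overhead.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def embedding_memory_gib(num_nodes, dimensions, bytes_per_value=4):
    """Memory for a dense (num_nodes x dimensions) embedding matrix, in GiB."""
    return num_nodes * dimensions * bytes_per_value / GIB


def dense_adjacency_gib(num_nodes, bytes_per_value=4):
    """Memory for a dense (num_nodes x num_nodes) adjacency matrix, in GiB."""
    return num_nodes * num_nodes * bytes_per_value / GIB


# Hypothetical graph size, chosen only to illustrate the arithmetic
nodes, dims = 32768, 128
print(f"embeddings: {embedding_memory_gib(nodes, dims):.4f} GiB")
print(f"dense adjacency: {dense_adjacency_gib(nodes):.1f} GiB")
```

The adjacency term grows with the square of the node count, which is why dense representations are usually the first thing to exceed memory on large networks; sparse storage avoids this when the graph has few edges per node.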
Scalability
- Graph size: Very large networks may exceed memory limits
- Embedding quality: Performance may degrade with extremely large graphs
- Model complexity: Some algorithms don’t scale well to massive datasets
Biological Limitations
Context Dependence
- Tissue specificity: Predictions do not account for tissue-specific expression
- Temporal dynamics: Predictions do not capture changes during development or disease progression
- Environmental factors: Predictions do not consider external influences on gene expression
Validation Requirements
- Experimental validation: All predictions require laboratory confirmation
- Biological context: Predictions must be interpreted in the context of the specific disease
- Clinical relevance: Not all associations are clinically meaningful
Best Practices
Mitigating Limitations
- Use multiple models: Compare predictions across different algorithms
- Validate predictions: Always confirm with experimental data
- Consider context: Interpret results in biological and clinical context
- Update regularly: Retrain models with new data
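Comparing predictions across algorithms can be as simple as keeping the genes that several models place near the top of their rankings. The following sketch shows one such consensus rule. It is not GDAP's method: the function names, model labels, and scores are hypothetical, and top-k voting is only one of several reasonable aggregation choices.

```python
from collections import Counter


def top_k(scores, k):
    """Return the set of the k highest-scoring genes for one model."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return {gene for gene, _ in ranked[:k]}


def consensus(model_scores, k, min_models):
    """Genes that appear in the top k of at least min_models models."""
    votes = Counter()
    for scores in model_scores.values():
        votes.update(top_k(scores, k))
    return sorted(gene for gene, count in votes.items() if count >= min_models)


# Toy scores from two hypothetical models
model_scores = {
    "model_a": {"GENE_A": 0.9, "GENE_B": 0.8, "GENE_C": 0.1},
    "model_b": {"GENE_A": 0.2, "GENE_B": 0.7, "GENE_C": 0.6},
}
shared = consensus(model_scores, k=2, min_models=2)
print(shared)
```

Voting on ranks rather than raw scores sidesteps the fact that different algorithms produce scores on different scales. Agreement between models reduces the risk of algorithm-specific artifacts, but it does not replace experimental validation, since all models share the same training data and its biases.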
Quality Control
- Check data sources: Verify the quality of disease and gene data before training
- Monitor performance: Track model accuracy across retraining runs
- Document assumptions: Clearly state limitations in reports