SMAD is an R package for the statistical analysis of Affinity Purification–Mass Spectrometry (AP-MS) data. It implements several widely used algorithms that compute confidence scores, helping researchers distinguish bona fide protein-protein interactions (PPIs) from non-specific background.
You can install the development version of SMAD from GitHub using devtools or remotes:
# Install devtools if not already installed
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
# Install SMAD
devtools::install_github("zqzneptune/SMAD")
# Load the package
library(SMAD)

To ensure compatibility with the SMAD scoring functions, your input data.frame should follow a specific structure. You can explore the built-in demo dataset:
data(TestDataInput)
head(TestDataInput)

| Column | Description |
|---|---|
| `idRun` | Unique identifier for a specific AP-MS run (replicate). |
| `idBait` | Unique identifier for the bait protein. |
| `idPrey` | Unique identifier for the prey protein. |
| `countPrey` | Spectral counts (or peptide counts) for the prey protein. |
| `lenPrey` | Protein sequence length of the prey (required for HGScore/NSAF). |

Note on Replicates: If you have biological or technical duplicates, ensure that each `idRun` is unique (e.g., append suffixes like `_A`, `_B`). The combination of `idRun` and `idBait` should uniquely identify a single purification.
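
The structure described above can be checked before scoring. The following is a minimal sketch in base R: `check_input()` is a hypothetical helper (it is not part of SMAD), the toy values are invented, and the rule that each `idRun` maps to exactly one bait is one reading of the note on replicates.

```r
# Toy input in the layout described above (values invented for illustration).
datInput <- data.frame(
  idRun     = c("R1_A", "R1_A", "R1_B", "R2_A"),
  idBait    = c("BaitX", "BaitX", "BaitX", "BaitY"),
  idPrey    = c("P1", "P2", "P1", "P3"),
  countPrey = c(12, 3, 9, 5),
  lenPrey   = c(350, 820, 350, 410),
  stringsAsFactors = FALSE
)

# check_input() is a hypothetical helper, NOT a SMAD function.
check_input <- function(dat, required) {
  # Stop if any required column is absent.
  missing_cols <- setdiff(required, names(dat))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Assumption: every idRun should belong to exactly one bait.
  if ("idBait" %in% names(dat)) {
    baits_per_run <- tapply(dat$idBait, dat$idRun,
                            function(x) length(unique(x)))
    ambiguous <- names(baits_per_run)[baits_per_run > 1]
    if (length(ambiguous) > 0) {
      stop("idRun shared by several baits: ",
           paste(ambiguous, collapse = ", "))
    }
  }
  invisible(TRUE)
}

ok <- check_input(
  datInput,
  c("idRun", "idBait", "idPrey", "countPrey", "lenPrey")
)
```

Passing only the columns a given scoring function needs (for example `c("idRun", "idPrey")`) keeps the same check usable for each algorithm.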
The Comparative Proteomic Analysis Software Suite (CompPASS) uses a "spoke model" to identify high-confidence interactions. It outputs several metrics, including Z-score, S-score, D-score, and WD-score. This implementation is optimized for performance based on the BioPlex pipeline.
- Required columns: `idRun`, `idBait`, `idPrey`, `countPrey`.
- Usage: `datScore <- CompPASS(datInput)`
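
To give a feel for the simplest of these metrics, the sketch below computes a per-prey Z-score on a toy bait-by-prey count matrix in base R. It is a simplified illustration under stated assumptions (counts summed per bait, unobserved bait-prey pairs set to zero, sample standard deviation), not the SMAD or BioPlex implementation, and the data are invented.

```r
# Toy data: spectral counts of preys in three bait purifications.
dat <- data.frame(
  idBait    = c("B1", "B1", "B2", "B2", "B3"),
  idPrey    = c("P1", "P2", "P1", "P3", "P1"),
  countPrey = c(10, 4, 2, 7, 3),
  stringsAsFactors = FALSE
)

# Bait-by-prey matrix of summed counts; unobserved pairs become 0.
mat <- with(dat, tapply(countPrey, list(idBait, idPrey), sum))
mat[is.na(mat)] <- 0

# Z-score per prey (column): centre on the prey's mean count across
# baits and divide by its standard deviation across baits.
z <- scale(mat, center = TRUE, scale = TRUE)

z["B1", "P1"]  # how far P1's count with bait B1 lies above its average
```

A prey seen at similar levels with every bait gets Z-scores near zero, whereas a prey enriched with one bait stands out for that bait only.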
The Dice coefficient measures the similarity between each pair of preys, based on the runs in which they are detected. It is particularly useful for identifying preys that frequently co-occur across different runs.
- Required columns: `idRun`, `idPrey`.
- Usage: `datScore <- DICE(datInput)`
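
As a worked illustration, the Dice coefficient of two preys A and B can be written as 2 × |runs with both| / (|runs with A| + |runs with B|). The base R sketch below applies that textbook formula to invented data; the helper `dice()` and the output layout are assumptions for illustration and may differ from what `DICE()` returns.

```r
# Toy data: which preys were detected in which runs.
dat <- data.frame(
  idRun  = c("R1", "R1", "R2", "R2", "R3", "R3", "R4"),
  idPrey = c("P1", "P2", "P1", "P2", "P1", "P3", "P2"),
  stringsAsFactors = FALSE
)

# Set of runs in which each prey was observed.
runs_by_prey <- lapply(split(dat$idRun, dat$idPrey), unique)

# dice() is a hypothetical helper implementing the textbook formula.
dice <- function(a, b) {
  ra <- runs_by_prey[[a]]
  rb <- runs_by_prey[[b]]
  2 * length(intersect(ra, rb)) / (length(ra) + length(rb))
}

# Score every unordered prey pair.
pairs <- t(combn(names(runs_by_prey), 2))
scores <- data.frame(
  preyA = pairs[, 1],
  preyB = pairs[, 2],
  dice  = mapply(dice, pairs[, 1], pairs[, 2], USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
scores
```

Here P1 and P2 share two of their three runs each, giving 2 × 2 / (3 + 3) ≈ 0.67, while P2 and P3 never co-occur and score 0.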
The Hart score is based on a hypergeometric distribution error model. It calculates the probability of finding a prey across different bait purifications by chance.
- Required columns: `idRun`, `idPrey`.
- Usage: `datScore <- Hart(datInput)`
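
The idea of a hypergeometric error model can be sketched with base R's `phyper()`. The example assumes the quantity of interest is the chance that two preys co-occur in at least `k` runs given how often each appears overall; the numbers are invented, and the exact counting and score transformation used by `Hart()` may differ.

```r
N <- 100  # total number of runs
m <- 10   # runs in which prey A was detected
n <- 8    # runs in which prey B was detected
k <- 4    # runs in which both preys were detected

# P(X >= k): probability of at least k co-occurrences by chance.
# phyper(q, m, n, k) takes: q successes, m "white" items, n "black"
# items, k draws; lower.tail = FALSE gives P(X > q), hence q = k - 1.
p_value <- phyper(k - 1, m, N - m, n, lower.tail = FALSE)

# A common way to turn a small p-value into a score (an assumption here).
score <- -log10(p_value)
```

The smaller the p-value, the less plausible it is that the observed co-occurrence arose by chance.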
HGScore extends the Hart hypergeometric model by incorporating NSAF (Normalized Spectral Abundance Factor), which accounts for protein length and abundance. It is based on a "matrix model."
- Required columns: `idRun`, `idPrey`, `countPrey`, `lenPrey`.
- Usage: `datScore <- HG(datInput)`
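
The NSAF normalization that motivates the `lenPrey` column can be sketched in base R. This uses the standard definition (spectral count divided by protein length, then normalized so the values sum to 1 within each run) on invented data; the columns `SAF` and `NSAF` are illustrative names, and how `HG()` uses NSAF internally is not shown here.

```r
# Toy data: two runs with counts and protein lengths.
dat <- data.frame(
  idRun     = c("R1", "R1", "R2", "R2"),
  idPrey    = c("P1", "P2", "P1", "P3"),
  countPrey = c(10, 30, 5, 5),
  lenPrey   = c(100, 600, 100, 50),
  stringsAsFactors = FALSE
)

# Spectral Abundance Factor: counts corrected for protein length.
dat$SAF <- dat$countPrey / dat$lenPrey

# NSAF: SAF divided by the total SAF of the same run.
dat$NSAF <- dat$SAF / ave(dat$SAF, dat$idRun, FUN = sum)

dat
```

In run R1, P2 has three times the spectral counts of P1 but is six times longer, so its NSAF ends up lower than that of P1.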
The PE score incorporates both spoke and matrix models to calculate an interaction score based on the frequency and exclusivity of prey identifications.
- Required columns: `idRun`, `idBait`, `idPrey`.
- Usage: `datScore <- PE(datInput)`
library(SMAD)
# Load example data
data(TestDataInput)
# Run CompPASS scoring
results <- CompPASS(TestDataInput)
# View top-scoring interactions based on WD-score
head(results[order(results$WD, decreasing = TRUE), ])

Released under the MIT License.
© Qingzhou Zhang
- CompPASS: Sowa et al., 2009 (Cell)
- BioPlex: Huttlin et al., 2015 (Cell)
- Hart Scoring: Hart et al., 2007 (BMC Bioinformatics)
- HGScore: Guruharsha et al., 2011 (Cell)
- DICE: Zhang et al., 2008 (Bioinformatics)
- PE Score: Collins et al., 2007 (MCP)