| Title: | 'C++' Implementations of Functional Enrichment Analysis |
|---|---|
| Description: | Fast implementations of functional enrichment analysis methods using 'C++' via 'Rcpp'. Currently provides Over-Representation Analysis (ORA), Gene Set Enrichment Analysis (GSEA), Weighted Enrichment Analysis for ORA and GSEA, Network-based Set Enrichment Analysis (NSEA), multi-layer network-based enrichment, and multi-omics integration workflows. Additional features include early fusion at the feature level, late fusion at the pathway level, multi-omics contribution tracing, topology-aware explanation helpers, Bayesian term selection, and extremely fast Random Walk with Restart (RWR) using 'RcppEigen'. The enrichment methods build on GSEA by Subramanian et al. (2005) <doi:10.1073/pnas.0506580102>, the multilevel strategy derived from 'fgsea' by Korotkevich et al. (2021) <doi:10.1101/060012>, and network-based enrichment ideas described by Glaab et al. (2012) <doi:10.1093/bioinformatics/bts389>. |
| Authors: | Guangchuang Yu [aut, cre] |
| Maintainer: | Guangchuang Yu <[email protected]> |
| License: | Artistic-2.0 |
| Version: | 0.2.0 |
| Built: | 2026-07-02 20:31:44 UTC |
| Source: | https://github.com/cran/enrichit |
Combine pathway-level enrichment results from multiple omics or independent analyses. P-values of identical pathways are merged using statistical methods (e.g., Brown's method).
aggregate_enrichment(res_list, method = c("brown", "fisher", "stouffer"), ...)
res_list |
A named list of enrichment result objects (e.g., |
method |
Character, aggregation method for p-values. One of "brown", "fisher", or "stouffer". |
... |
Additional arguments passed to |
An enrichResult object containing the aggregated p-values, FDR, and combined gene lists.
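The two simpler combination rules named above can be sketched in base R. This is an illustration of the standard Fisher and Stouffer formulas, not the package's own code; the helper names `fisher_combine()` and `stouffer_combine()` and the p-values are hypothetical. Brown's method extends Fisher's statistic to correlated p-values and is not shown here.

```r
# Base-R sketch of two p-value combination rules (hypothetical helpers).
fisher_combine <- function(p) {
  # Fisher: -2 * sum(log(p)) follows a chi-square distribution with 2k df
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

stouffer_combine <- function(p) {
  # Stouffer: the sum of z-scores divided by sqrt(k) is standard normal
  z <- qnorm(p, lower.tail = FALSE)
  pnorm(sum(z) / sqrt(length(z)), lower.tail = FALSE)
}

# Hypothetical p-values of one pathway in three omics layers
p_pathway <- c(rna = 0.01, protein = 0.04, metabolite = 0.20)
p_fisher <- fisher_combine(p_pathway)
p_stouffer <- stouffer_combine(p_pathway)
```

Both rules assume independent p-values, which is one reason a correlation-aware method may be preferable when omics layers share samples.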
Aggregate multi-omics or multi-source statistics into a unified object for downstream enrichment analysis.
aggregate_omics(x, method = c("fisher", "stouffer", "brown", "mean", "weighted_mean", "max_abs"), input = c("pvalue", "signed_score"), feature_type = "gene", conflict_policy = c("keep_all", "strict", "penalty"), ...)
x |
A list of named numeric vectors, a data.frame, or a matrix. Row names (or names for vectors) must represent feature IDs. |
method |
Character, aggregation method. One of "fisher", "stouffer", "brown", "mean", "weighted_mean", or "max_abs". |
input |
Character, input type. One of "pvalue" or "signed_score". |
feature_type |
Character, type of the features (e.g., "gene", "protein"). Default is "gene". |
conflict_policy |
Character, strategy to handle directional conflicts when input is "signed_score". One of "keep_all" (default, ignore conflicts), "strict" (set to NA if any signs conflict), or "penalty" (divide final score by 2 if signs conflict). |
... |
Additional arguments. |
An object of class omics_aggregated containing score, pvalue (if input is "pvalue"), input_type, feature_type, and feature_id.
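The three conflict policies described for `conflict_policy` can be illustrated with a small base-R sketch. It assumes mean aggregation of signed scores for a single feature; the helper `aggregate_signed()` and the example scores are hypothetical, and the package's internal handling may differ.

```r
# Illustrative only: mean aggregation of signed scores under each policy.
aggregate_signed <- function(scores,
                             policy = c("keep_all", "strict", "penalty")) {
  policy <- match.arg(policy)
  nonzero <- scores[!is.na(scores) & scores != 0]
  # A conflict means the sources disagree on the direction of change
  conflict <- any(nonzero > 0) && any(nonzero < 0)
  out <- mean(scores, na.rm = TRUE)
  if (conflict && policy == "strict") out <- NA_real_   # drop the feature
  if (conflict && policy == "penalty") out <- out / 2   # halve the score
  out
}

# Hypothetical signed scores of one gene in three omics layers
gene_scores <- c(rna = 2.0, protein = -1.0, phospho = 2.0)
keep <- aggregate_signed(gene_scores, "keep_all")
strict <- aggregate_signed(gene_scores, "strict")
penal <- aggregate_signed(gene_scores, "penalty")
```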
bayes_enrich() adds a model-based selection layer on top of ORA results.
It estimates the posterior probability that each candidate term is an
active biological program explaining the observed input genes.
bayes_enrich(x, candidate = c("top", "significant", "all"), n_terms = 200, by = "p.adjust", prior = 0.1, false_positive = 0.01, false_negative = 0.1, n_iter = 5000, burnin = 1000, thin = 1, posterior_cutoff = 0.5, seed = NULL, verbose = FALSE)
x |
An |
candidate |
Candidate terms to include. |
n_terms |
Maximum number of candidate terms when |
by |
Column used to order candidate terms. |
prior |
Prior probability that a term is active. |
false_positive |
Probability of observing a gene not covered by active terms. |
false_negative |
Probability of missing a gene covered by active terms. |
n_iter |
Total number of MCMC iterations. |
burnin |
Number of initial iterations discarded. |
thin |
Keep one sample every |
posterior_cutoff |
Terms with posterior greater than or equal to this value are marked active. |
seed |
Optional random seed. |
verbose |
Print sampler progress. |
The implementation uses a lightweight Metropolis-Hastings sampler over
binary latent term states. Given active terms, each gene is modeled as
observed with probability 1 - false_negative if covered by at least one
active term, and with probability false_positive otherwise. The prior
probability that a candidate term is active is prior.
This is intended as a result-compression and interpretation layer, not as a replacement for ORA p-values.
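The model described above can be sketched as a single-flip Metropolis-Hastings sampler in base R. This is a minimal illustration under stated assumptions, not the package's sampler: `log_posterior()`, `mh_terms()`, the membership matrix layout (genes in rows, terms in columns), and the toy data are all hypothetical.

```r
# Log posterior (up to a constant) of a binary term-state vector.
log_posterior <- function(state, membership, observed, prior, fp, fn) {
  # A gene is covered if at least one active term contains it
  covered <- as.vector(membership %*% state) > 0
  p_obs <- ifelse(covered, 1 - fn, fp)
  loglik <- sum(ifelse(observed, log(p_obs), log(1 - p_obs)))
  logprior <- sum(ifelse(state == 1, log(prior), log(1 - prior)))
  loglik + logprior
}

mh_terms <- function(membership, observed, prior = 0.1, fp = 0.01, fn = 0.1,
                     n_iter = 5000, burnin = 1000) {
  n_terms <- ncol(membership)
  state <- rep(0, n_terms)
  current <- log_posterior(state, membership, observed, prior, fp, fn)
  kept <- numeric(n_terms)
  for (i in seq_len(n_iter)) {
    # Symmetric proposal: flip the state of one randomly chosen term
    proposal <- state
    j <- sample.int(n_terms, 1)
    proposal[j] <- 1 - proposal[j]
    cand <- log_posterior(proposal, membership, observed, prior, fp, fn)
    if (log(runif(1)) < cand - current) {
      state <- proposal
      current <- cand
    }
    if (i > burnin) kept <- kept + state
  }
  # Posterior activity = fraction of retained samples in which a term is on
  setNames(kept / (n_iter - burnin), colnames(membership))
}

set.seed(1)
genes <- paste0("g", 1:40)
membership <- cbind(termA = as.numeric(genes %in% paste0("g", 1:10)),
                    termB = as.numeric(genes %in% paste0("g", 25:34)))
observed <- genes %in% paste0("g", 1:9)   # nine of termA's genes are observed
posterior <- mh_terms(membership, observed)
```

In this toy setup termA explains almost all observed genes, so its estimated posterior should be close to 1, while termB covers none of them and should stay close to 0.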
The input enrichResult object with additional columns in
@result: posterior, posterior_odds, bayes_rank,
bayes_active, bayes_covered_gene, and bayes_covered_count.
Return a data frame sorted by posterior probability from a result processed
by bayes_enrich(). This is a convenience wrapper around sorting
as.data.frame(x) by decreasing posterior.
bayes_summary(x, active = FALSE, n = Inf)
x |
An |
active |
Logical. If |
n |
Number of rows to return. Use |
A data frame ordered by decreasing posterior.
Compare merged enrichment results with single-omics enrichment results to classify the contribution pattern of each pathway.
classify_omics_pattern(merged_res, single_res, p_cutoff = 0.05, by = "p.adjust")
merged_res |
An |
single_res |
A named list of |
p_cutoff |
Numeric, the significance cutoff. Default is 0.05. |
by |
Character, the column to use for significance threshold. Default is "p.adjust". |
The merged_res object with an additional column Omics_Pattern in its result data.frame.
Collapse multi-layer diffusion scores
collapse_multilayer_scores(x, collapse = c("weighted_mean", "sum", "mean", "max_abs"), layer_weights = NULL, output_space = c("union", "gene"), mapping = NULL, target_layer = NULL)
x |
result from |
collapse |
one of "weighted_mean", "sum", "mean", or "max_abs". |
layer_weights |
optional named numeric vector used when
|
output_space |
one of "union" or "gene". |
mapping |
optional mapping data.frame with |
target_layer |
optional layer name to extract before collapsing. |
A multilayer_collapsed object with a score vector.
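The four collapse options can be illustrated on a small feature-by-layer score matrix. The sketch below is base R only and assumes every feature has a score in every layer; `collapse_layers()` and the example matrix are hypothetical and do not reflect how the package stores layer scores.

```r
# Collapse a feature-by-layer matrix into one score per feature.
collapse_layers <- function(score_mat,
                            collapse = c("weighted_mean", "sum", "mean", "max_abs"),
                            layer_weights = NULL) {
  collapse <- match.arg(collapse)
  if (is.null(layer_weights)) {
    layer_weights <- setNames(rep(1, ncol(score_mat)), colnames(score_mat))
  }
  # Align weights with the layer (column) order
  w <- layer_weights[colnames(score_mat)]
  switch(collapse,
    weighted_mean = drop(score_mat %*% w) / sum(w),
    sum = rowSums(score_mat),
    mean = rowMeans(score_mat),
    # Keep the signed value with the largest magnitude in each row
    max_abs = apply(score_mat, 1, function(v) v[which.max(abs(v))])
  )
}

score_mat <- rbind(g1 = c(rna = 0.2, protein = 0.6),
                   g2 = c(rna = -0.5, protein = 0.1))
wm <- collapse_layers(score_mat, "weighted_mean", c(rna = 1, protein = 3))
ma <- collapse_layers(score_mat, "max_abs")
```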
Class "compareClusterResult". This class represents the comparison result of gene clusters by GO categories at a specific level or by GO enrichment analysis.
compareClusterResult: cluster comparing result
geneClusters: a list of genes
fun: one of groupGO, enrichGO and enrichKEGG
gene2Symbol: gene ID to Symbol
keytype: Gene ID type
readable: logical flag of gene ID in symbol or not.
.call: function call
termsim: similarity between terms
method: method of calculating the similarity between nodes
dr: dimension reduction result
organism: organism
Guangchuang Yu https://yulab-smu.top
Common parameters for enrichit functions
geneList |
A named numeric vector of gene statistics (e.g., log fold change), ranked in descending order. |
gene_sets |
A named list of gene sets. Each element is a character vector of genes. |
nPerm |
Number of permutations for p-value calculation (default: 1000). |
exponent |
Weighting exponent for enrichment score (default: 1.0). |
minGSSize |
Minimum size of a gene set to be analyzed. |
maxGSSize |
Maximum size of a gene set to be analyzed. |
pvalueCutoff |
P-value cutoff. |
pAdjustMethod |
P-value adjustment method (e.g., "BH"). |
verbose |
Logical. Print progress messages. |
gson |
A GSON object containing gene set information. |
method |
Permutation method. |
adaptive |
Logical. Use adaptive permutation. |
minPerm |
Minimum permutations for adaptive mode. |
maxPerm |
Maximum permutations for adaptive mode. |
pvalThreshold |
P-value threshold for early stopping. |
Class "enrichResult". This class represents the result of enrichment analysis.
result: enrichment analysis
pvalueCutoff: pvalueCutoff
pAdjustMethod: pvalue adjust method
qvalueCutoff: qvalueCutoff
organism: only "human" supported
ontology: biological ontology
gene: Gene IDs
keytype: Gene ID type
universe: background genes
gene2Symbol: mapping gene to Symbol
geneSets: gene sets
readable: logical flag of gene ID in symbol or not.
termsim: similarity between terms
method: method of calculating the similarity between nodes
dr: dimension reduction result
Guangchuang Yu https://yulab-smu.top
mapping gene ID to gene Symbol
EXTID2NAME(OrgDb, geneID, keytype, toType = "SYMBOL")
OrgDb |
OrgDb |
geneID |
entrez gene ID |
keytype |
keytype |
toType |
ID type of the output |
gene symbol
Guangchuang Yu https://yulab-smu.top
Extract pathway subnetwork data from a mnseaResult
extract_mnsea_subnetwork(res, pathway_id = NULL, include_couplings = TRUE, include_isolated = TRUE)
res |
A |
pathway_id |
Optional pathway ID. If |
include_couplings |
Logical, whether to include inter-layer coupling
edges. Default is |
include_isolated |
Logical, whether to keep nodes without retained
edges. Default is |
A list with pathway, layer_contribution, nodes, and edges.
geneID generic
geneID(x)
x |
enrichResult object |
'geneID' returns the 'geneID' column of the enriched result, which can be converted to a data.frame via 'as.data.frame'.
## Not run:
data(geneList, package = "DOSE")
de <- names(geneList)[1:100]
x <- DOSE::enrichDO(de)
geneID(x)
## End(Not run)
geneInCategory generic
geneInCategory(x)
x |
enrichResult |
'geneInCategory' returns a list of genes, obtained by splitting the input gene vector into the enriched functional categories.
## Not run:
data(geneList, package = "DOSE")
de <- names(geneList)[1:100]
x <- DOSE::enrichDO(de)
geneInCategory(x)
## End(Not run)
Get cached contribution tables from a mnseaResult
get_mnsea_contribution(res, pathway_id = NULL, level = c("pathway", "feature"))
res |
A |
pathway_id |
Optional pathway ID. If |
level |
One of |
A data.frame containing cached contribution information.
Extract the original multi-omics statistics for genes in a specific enriched pathway.
get_omics_contribution(res, agg, pathway_id = NULL)
res |
An |
agg |
An |
pathway_id |
Character, the ID of the pathway to extract. If NULL, the top pathway is used. |
A data.frame containing the genes, their original omics statistics, the aggregated score, and whether they belong to the core enrichment.
Perform Gene Set Enrichment Analysis (GSEA) using a ranked gene list.
gsea(geneList, gene_sets, weight = NULL, minGSSize = 10, maxGSSize = 500, nPerm = 1000, exponent = 1, method = "multilevel", adaptive = FALSE, minPerm = 101, maxPerm = 1e+05, pvalThreshold = 0.1, eps = 1e-10, sampleSize = 101, seed = FALSE, nPermSimple = 1000, scoreType = "std", verbose = TRUE)
geneList |
A named numeric vector of gene statistics (e.g., log fold change), ranked in descending order. |
gene_sets |
A named list of gene sets. Each element is a character vector of genes. |
weight |
A named numeric vector of weights for genes. The names should match the names of geneList. If provided, the geneList will be multiplied by the weight and resorted before GSEA (default: NULL). |
minGSSize |
Minimum size of a gene set to be analyzed. |
maxGSSize |
Maximum size of a gene set to be analyzed. |
nPerm |
Number of permutations for p-value calculation (default: 1000). |
exponent |
Weighting exponent for enrichment score (default: 1.0). |
method |
Permutation method. |
adaptive |
Logical. Use adaptive permutation. |
minPerm |
Minimum permutations for adaptive mode. |
maxPerm |
Maximum permutations for adaptive mode. |
pvalThreshold |
P-value threshold for early stopping. |
eps |
Epsilon for multilevel methods (default: 1e-10). Sets the smallest p-value that can be estimated. |
sampleSize |
Sample size for multilevel methods (default: 101). |
seed |
Random seed for reproducibility (default: FALSE). If FALSE, a random seed is generated. |
nPermSimple |
Number of permutations for the simple method (default: 1000). |
scoreType |
Type of enrichment score calculation: "std", "pos", "neg" (default: "std"). |
verbose |
Logical. Print progress messages. |
A data.frame with columns:
ID: Gene set name
enrichmentScore: Enrichment Score
NES: Normalized Enrichment Score
pvalue: Empirical p-value from permutation test
setSize: Size of the gene set (number of genes found in geneList)
nPerm: (adaptive mode only) Actual number of permutations used
rank: Rank at which the maximum enrichment score is attained
leading_edge: Leading edge statistics (tags, list, signal)
core_enrichment: Genes in the leading edge, separated by '/'
# Example data
stats <- rnorm(1000)
names(stats) <- paste0("Gene", 1:1000)
stats <- sort(stats, decreasing = TRUE)
gs1 <- paste0("Gene", 1:50)
gs2 <- paste0("Gene", 500:550)
gene_sets <- list(Pathway1 = gs1, Pathway2 = gs2)
# Use default fixed permutation method
result <- gsea(geneList=stats, gene_sets=gene_sets, nPerm=100)
# Use adaptive permutation for more accurate p-values
result_adaptive <- gsea(geneList=stats, gene_sets=gene_sets, adaptive=TRUE)
generic function for gene set enrichment analysis
gsea_gson(geneList, gson, weight = NULL, nPerm = 1000, exponent = 1, minGSSize = 10, maxGSSize = 500, pvalueCutoff = 0.05, pAdjustMethod = "BH", method = "multilevel", adaptive = FALSE, minPerm = 101, maxPerm = 1e+05, pvalThreshold = 0.1, verbose = TRUE, ...)
geneList |
A named numeric vector of gene statistics (e.g., log fold change), ranked in descending order. |
gson |
A GSON object containing gene set information. |
weight |
A named numeric vector of weights for genes. |
nPerm |
Number of permutations for p-value calculation (default: 1000). |
exponent |
Weighting exponent for enrichment score (default: 1.0). |
minGSSize |
Minimum size of a gene set to be analyzed. |
maxGSSize |
Maximum size of a gene set to be analyzed. |
pvalueCutoff |
P-value cutoff. |
pAdjustMethod |
P-value adjustment method (e.g., "BH"). |
method |
Permutation method. |
adaptive |
Logical. Use adaptive permutation. |
minPerm |
Minimum permutations for adaptive mode. |
maxPerm |
Maximum permutations for adaptive mode. |
pvalThreshold |
P-value threshold for early stopping. |
verbose |
Logical. Print progress messages. |
... |
Additional parameters passed to gsea() |
gseaResult object
Guangchuang Yu
Class "gseaResult". This class represents the result of GSEA analysis.
result: GSEA analysis
organism: organism
setType: setType
geneSets: geneSets
geneList: order rank geneList
keytype: ID type of gene
permScores: permutation scores
params: parameters
gene2Symbol: gene ID to Symbol
readable: whether convert gene ID to symbol
dr: dimension reduction result
Guangchuang Yu https://yulab-smu.top
Calculate GSEA Running Enrichment Scores
gseaScores(geneList, geneSet, exponent = 1, fortify = FALSE)
geneList |
a named numeric vector of gene statistics (e.g., t-statistics or log-fold changes), sorted in decreasing order. |
geneSet |
a character vector of gene IDs belonging to the gene set. |
exponent |
a numeric value defining the weight of the running enrichment score. Default is 1. |
fortify |
logical. If TRUE, returns a data frame with columns |
If fortify = TRUE, a data frame containing the running enrichment scores and positions.
If fortify = FALSE, a numeric value representing the Enrichment Score (ES).
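The running enrichment score can be sketched in base R following the weighted Kolmogorov-Smirnov-like statistic of Subramanian et al. (2005). This is an illustration of the statistic, not the package's C++ routine; `running_es()` and the simulated data are hypothetical.

```r
# Running enrichment score along a ranked gene list (hypothetical helper).
running_es <- function(geneList, geneSet, exponent = 1) {
  hits <- names(geneList) %in% geneSet
  # Hits step up in proportion to |statistic|^exponent; misses step down evenly
  w <- abs(geneList)^exponent
  p_hit <- cumsum(ifelse(hits, w, 0)) / sum(w[hits])
  p_miss <- cumsum(!hits) / sum(!hits)
  running <- unname(p_hit - p_miss)
  # The ES is the running score with the largest absolute deviation from zero
  list(running = running, ES = running[which.max(abs(running))])
}

set.seed(1)
stats <- sort(rnorm(200), decreasing = TRUE)
names(stats) <- paste0("Gene", seq_along(stats))
top_set <- paste0("Gene", 1:20)   # gene set made of the 20 top-ranked genes
res <- running_es(stats, top_set)
```

Because the toy gene set sits entirely at the top of the ranking, the running score climbs to 1 before any miss is encountered and then decays back to 0 at the end of the list.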
Guangchuang Yu
filter enriched result by gene set size or gene count
gsfilter(x, by = "GSSize", min = NA, max = NA)
x |
instance of enrichResult or compareClusterResult |
by |
one of 'GSSize' or 'Count' |
min |
minimal size |
max |
maximal size |
updated object
Guangchuang Yu
Map protein-level or other feature-level statistics to a unified gene-level space.
harmonize_ids(x, mapping, from = "protein", to = "gene", collapse = c("max_abs", "mean", "min_p"))
x |
A structured result from |
mapping |
A data.frame with |
from |
Character, source feature type. Default is "protein". |
to |
Character, target feature type. Default is "gene". |
collapse |
Character, method to collapse multiple source IDs mapped to a single target ID. One of "max_abs", "mean", or "min_p". |
A harmonized omics_aggregated object.
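Collapsing several source IDs onto one target ID can be pictured with `tapply()`. The sketch below uses a hypothetical mapping table with columns `protein` and `gene` and hypothetical scores; the column names expected by `harmonize_ids()` are not assumed here. A "min_p" rule would analogously keep the smallest p-value per gene.

```r
# Hypothetical protein-level signed scores and protein-to-gene mapping
protein_score <- c(P1 = 1.5, P2 = -2.5, P3 = 0.4)
mapping <- data.frame(protein = c("P1", "P2", "P3"),
                      gene = c("GENE_A", "GENE_A", "GENE_B"))

# Keep the signed value with the largest magnitude
max_abs <- function(v) v[which.max(abs(v))]

scores <- unname(protein_score[mapping$protein])
gene_max_abs <- tapply(scores, mapping$gene, max_abs)
gene_mean <- tapply(scores, mapping$gene, mean)
```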
Multi-layer Network-based Gene Set Enrichment Analysis
mnsea(seed_list, networks, couplings, gene_sets, mode = c("evidence", "signed"), layer_weights = NULL, collapse = c("weighted_mean", "sum", "mean", "max_abs"), target_layer = NULL, output_space = c("union", "gene"), p = 0.5, interlayer_strength = 1, specific_weight = FALSE, minGSSize = 10, maxGSSize = 500, threshold = 1e-09, maxIter = 100, verbose = TRUE, ...)
seed_list |
named list of named numeric vectors, one per layer. |
networks |
named list of layer-specific networks. |
couplings |
data.frame of inter-layer edges. |
gene_sets |
list of gene sets. |
mode |
one of "evidence" or "signed". |
layer_weights |
optional named numeric vector. |
collapse |
one of "weighted_mean", "sum", "mean", or "max_abs". |
target_layer |
optional layer name to export scores from. |
output_space |
one of "union" or "gene". |
p |
restart probability. |
interlayer_strength |
global scaling factor for coupling edges. |
specific_weight |
logical. |
minGSSize |
minimal size of each gene set. |
maxGSSize |
maximal number of genes annotated to a gene set for it to be tested. |
threshold |
convergence threshold. |
maxIter |
maximal number of iterations. |
verbose |
logical. |
... |
additional arguments passed to |
A mnseaResult object.
Multi-layer NSEA using a GSON object
mnsea_gson(seed_list, networks, couplings, gson, mode = c("evidence", "signed"), layer_weights = NULL, collapse = c("weighted_mean", "sum", "mean", "max_abs"), target_layer = NULL, output_space = c("union", "gene"), p = 0.5, interlayer_strength = 1, specific_weight = FALSE, minGSSize = 10, maxGSSize = 500, threshold = 1e-09, maxIter = 100, verbose = TRUE, ...)
seed_list |
named list of named numeric vectors, one per layer. |
networks |
named list of layer-specific networks. |
couplings |
data.frame of inter-layer edges. |
gson |
a GSON object. |
mode |
one of "evidence" or "signed". |
layer_weights |
optional named numeric vector. |
collapse |
one of "weighted_mean", "sum", "mean", or "max_abs". |
target_layer |
optional layer name to export scores from. |
output_space |
one of "union" or "gene". |
p |
restart probability. |
interlayer_strength |
global scaling factor for coupling edges. |
specific_weight |
logical. |
minGSSize |
minimal size of each gene set. |
maxGSSize |
maximal number of genes annotated to a gene set for it to be tested. |
threshold |
convergence threshold. |
maxIter |
maximal number of iterations. |
verbose |
logical. |
... |
additional arguments passed to |
A mnseaResult object.
Class "mnseaResult". This class represents the result of multi-layer Network-based Set Enrichment Analysis.
result: enrichment analysis
organism: organism label for the enrichment result
setType: gene set collection type
geneSets: gene sets
geneList: order rank geneList
keytype: ID type of gene
permScores: permutation score matrix inherited from gseaResult
gene2Symbol: gene ID to symbol mapping
readable: logical flag of gene ID in symbol or not.
termsim: Calculation matrix of termsim.
method: Method of termsim.
params: parameters
dr: dimension reduction result
multilayer_network: prepared multi-layer network object.
layer_scores: list of layer-specific diffusion score vectors.
collapsed_scores: numeric vector used for downstream enrichment.
layer_weights: numeric vector of layer weights.
coupling_table: data.frame of inter-layer couplings.
mode: character, "evidence" or "signed".
iterations: integer, the actual number of iterations RWR took to converge.
restart_prob: numeric, the restart probability used in RWR.
collapse_method: character collapse method used on layer scores.
target_layer: optional layer name used for downstream export.
output_space: character output space of collapsed scores.
pathway_contribution: pathway-by-layer contribution table precomputed for explanation.
feature_contribution: feature-by-layer contribution table precomputed for explanation.
Guangchuang Yu https://yulab-smu.top
Network-based Gene Set Enrichment Analysis
nsea(geneList, network, gene_sets, mode = c("evidence", "signed"), p = 0.5, specific_weight = FALSE, minGSSize = 10, maxGSSize = 500, threshold = 1e-09, maxIter = 100, verbose = TRUE, ...)
geneList |
named numeric vector. In "evidence" mode, must be non-negative. In "signed" mode, can contain both positive and negative values. |
network |
edge list (data.frame) or sparse matrix. |
gene_sets |
list of gene sets. |
mode |
character, either "evidence" (default) or "signed". If "signed", the network propagation runs separately for positive and negative values. |
p |
restart probability for RWR (default is 0.5). |
specific_weight |
logical, whether to apply gene specificity weighting (TF-IDF style) based on gene frequencies in |
minGSSize |
minimal size of each gene set to be analyzed. Default is 10. |
maxGSSize |
maximal number of genes annotated to a gene set for it to be tested. Default is 500. |
threshold |
convergence threshold for RWR (default is 1e-9). |
maxIter |
maximal number of RWR iterations (default is 100). |
verbose |
logical, print messages. |
... |
other arguments passed to |
A nseaResult object of NSEA results.
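The roles of `p`, `threshold`, and `maxIter` can be seen in a plain Random Walk with Restart iteration. The sketch below uses a dense base-R matrix and covers "evidence" mode only (non-negative seeds); the package itself relies on 'RcppEigen', and the helper `rwr()`, the L1 convergence criterion, and the toy path graph are assumptions made for illustration.

```r
# Random Walk with Restart on a column-normalized transition matrix W.
rwr <- function(W, seeds, p = 0.5, threshold = 1e-9, maxIter = 100) {
  p0 <- seeds / sum(seeds)   # restart distribution
  pt <- p0
  for (iter in seq_len(maxIter)) {
    # Walk one step with probability 1 - p, jump back to the seeds with p
    nxt <- (1 - p) * as.vector(W %*% pt) + p * p0
    delta <- sum(abs(nxt - pt))   # L1 change, an assumed stopping criterion
    pt <- nxt
    if (delta < threshold) break
  }
  list(scores = setNames(pt, names(seeds)), iterations = iter)
}

# Toy path graph a - b - c - d
nodes <- c("a", "b", "c", "d")
A <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
A["a", "b"] <- A["b", "c"] <- A["c", "d"] <- 1
A <- A + t(A)
W <- sweep(A, 2, colSums(A), "/")   # each column sums to 1

res <- rwr(W, seeds = c(a = 1, b = 0, c = 0, d = 0))
```

With a column-stochastic `W` the scores remain a probability distribution, and they decay with distance from the seed node `a`.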
Network-based GSEA using a GSON object
nsea_gson(geneList, network, gson, mode = c("evidence", "signed"), p = 0.5, specific_weight = FALSE, minGSSize = 10, maxGSSize = 500, threshold = 1e-09, maxIter = 100, verbose = TRUE, ...)
geneList |
named numeric vector. In "evidence" mode, must be non-negative. In "signed" mode, can contain both positive and negative values. |
network |
edge list (data.frame) or sparse matrix. |
gson |
a GSON object. |
mode |
character, either "evidence" (default) or "signed". |
p |
restart probability for RWR (default is 0.5). |
specific_weight |
logical, whether to apply gene specificity weighting (TF-IDF style) based on gene frequencies in the GSON object. Default is FALSE. |
minGSSize |
minimal size of each gene set to be analyzed. Default is 10. |
maxGSSize |
maximal number of genes annotated to a gene set for it to be tested. Default is 500. |
threshold |
convergence threshold for RWR (default is 1e-9). |
maxIter |
maximal number of RWR iterations (default is 100). |
verbose |
logical, print messages. |
... |
other arguments passed to |
A nseaResult object.
Class "nseaResult". This class represents the result of Network-based Set Enrichment Analysis (NSEA).
result: enrichment analysis
organism: organism label for the enrichment result
setType: gene set collection type
geneSets: gene sets
geneList: order rank geneList
keytype: ID type of gene
permScores: permutation score matrix inherited from gseaResult
gene2Symbol: gene ID to symbol mapping
readable: logical flag of gene ID in symbol or not.
termsim: Calculation matrix of termsim.
method: Method of termsim.
params: parameters
dr: dimension reduction result
network: sparse matrix or data.frame representing the underlying network.
diffusion_scores: numeric vector of RWR diffusion scores for each node.
mode: character, "evidence" or "signed", describing the RWR propagation mode.
iterations: integer, the actual number of iterations RWR took to converge.
restart_prob: numeric, the restart probability used in RWR.
Guangchuang Yu https://yulab-smu.top
Perform over-representation analysis using the hypergeometric test (equivalent to a one-sided Fisher's exact test).
ora(gene, gene_sets, universe, weight = NULL)
gene |
Character vector of differentially expressed genes (or gene list of interest). |
gene_sets |
A named list of gene sets. Each element is a character vector of genes. |
universe |
Character vector of background genes (e.g., all genes in the platform). |
weight |
A named numeric vector of weights for background genes. If provided, Weighted ORA will be performed using Wallenius' noncentral hypergeometric distribution (requires 'BiasedUrn' package). The names should match the universe genes. |
A data.frame with columns:
GeneSet |
Gene set name |
SetSize |
Number of genes in the gene set (intersected with universe) |
DEInSet |
Number of differentially expressed genes in the gene set |
DESize |
Total number of differentially expressed genes in universe |
PValue |
Raw p-value from hypergeometric test |
# Example data
de_genes <- c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5")
all_genes <- paste0("Gene", 1:1000)
gs1 <- paste0("Gene", 1:50)
gs2 <- paste0("Gene", 51:150)
gs3 <- paste0("Gene", 151:300)
gene_sets <- list(Pathway1 = gs1, Pathway2 = gs2, Pathway3 = gs3)
result <- ora(gene=de_genes, gene_sets=gene_sets, universe=all_genes)
head(result)
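For reference, the unweighted hypergeometric p-value of a single gene set can be reproduced with base R. The sketch below reuses the toy data of the example above for Pathway1 and checks the result against a one-sided Fisher's exact test; the variable names are illustrative, and `ora()` itself is not called.

```r
de_genes <- paste0("Gene", 1:5)
universe <- paste0("Gene", 1:1000)
gene_set <- paste0("Gene", 1:50)

N <- length(universe)                        # background size
K <- length(intersect(gene_set, universe))   # gene set size within universe
n <- length(intersect(de_genes, universe))   # DE genes within universe
k <- length(intersect(de_genes, gene_set))   # DE genes inside the gene set

# P(X >= k) when drawing n genes from N, of which K belong to the set
p_hyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# Same test expressed as a 2 x 2 contingency table
tab <- matrix(c(k, K - k, n - k, N - K - n + k), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
```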
Internal method for enrichment analysis
ora_gson(gene, pvalueCutoff, pAdjustMethod = "BH", universe = NULL, weight = NULL, minGSSize = 10, maxGSSize = 500, qvalueCutoff = 0.2, gson)
gene |
a vector of entrez gene id. |
pvalueCutoff |
P-value cutoff. |
pAdjustMethod |
P-value adjustment method (e.g., "BH"). |
universe |
background genes, default is the intersection of the 'universe' with genes that have annotations.
Users can set |
weight |
A named numeric vector of weights for background genes. If provided, Weighted ORA will be performed. |
minGSSize |
Minimum size of a gene set to be analyzed. |
maxGSSize |
Maximum size of a gene set to be analyzed. |
qvalueCutoff |
cutoff of qvalue |
gson |
A GSON object containing gene set information. |
Enrichment is tested using the hypergeometric model.
An enrichResult instance.
Guangchuang Yu https://yulab-smu.top
Prepare multi-layer network for repeated propagation
prepare_multilayer_network(networks, couplings, directed = FALSE, intra_normalize = "column", inter_normalize = "column", interlayer_strength = 1, layer_order = names(networks))
networks |
named list of layer-specific networks. |
couplings |
data.frame of inter-layer edges with columns
|
directed |
logical, whether the multi-layer graph is directed. |
intra_normalize |
one of "column", "row", or "none". |
inter_normalize |
one of "column", "row", or "none". |
interlayer_strength |
numeric scalar used to scale all coupling edges. |
layer_order |
explicit layer order. Defaults to |
A multilayer_network object.
Prepare network for repeated NSEA runs
prepare_network(network, directed = FALSE, normalize = "column")
network |
edge list (data.frame with 2 or 3 columns) or sparse matrix. |
directed |
logical, whether the network is directed. Default is FALSE. |
normalize |
one of "column", "row", or "none". Default is "column". |
A sparse matrix (dgCMatrix) that has been properly formatted and normalized.
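What "column" normalization of an undirected edge list amounts to can be shown with a dense base-R matrix. The sketch assumes a three-column edge list whose column names (`from`, `to`, `weight`) are hypothetical and in which no edge is listed in both directions; the function itself returns a sparse dgCMatrix, which this illustration does not reproduce.

```r
# Hypothetical weighted edge list
edges <- data.frame(from = c("a", "b", "c"),
                    to = c("b", "c", "d"),
                    weight = c(1, 2, 1))

nodes <- sort(unique(c(edges$from, edges$to)))
A <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
A[cbind(edges$from, edges$to)] <- edges$weight
A <- A + t(A)   # undirected: mirror every edge

# Column normalization: divide each column by its sum (guarding empty columns)
cs <- colSums(A)
W <- sweep(A, 2, ifelse(cs > 0, cs, 1), "/")
```

After this step every non-empty column of `W` sums to 1, so multiplying `W` by a probability vector yields another probability vector.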
Propagate signals on a multi-layer network
propagate_multilayer(seed_list, network, mode = c("evidence", "signed"), p = 0.5, threshold = 1e-09, maxIter = 100, layer_weights = NULL, target_layer = NULL)
seed_list |
named list of named numeric vectors, one per layer. |
network |
a prepared |
mode |
one of "evidence" or "signed". |
p |
restart probability. |
threshold |
convergence threshold. |
maxIter |
maximum number of iterations. |
layer_weights |
optional named numeric vector of layer weights. |
target_layer |
optional layer name to focus on downstream. |
A multilayer_propagation object.
Convert continuous aggregated statistics into a discrete list of genes and a universe for Over-Representation Analysis.
select_features_for_ora(x, cutoff = 0.05, by = c("pvalue", "score"), ...)
x |
A structured result from |
cutoff |
Numeric, the threshold to apply. |
by |
Character, metric to apply the threshold on. One of "pvalue" or "score". |
... |
Additional arguments. |
A list containing gene (the selected feature IDs) and universe (all feature IDs).
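The shape of the returned list can be pictured with a short base-R sketch that thresholds aggregated p-values. The p-values are hypothetical, and whether the comparison is strict or inclusive at the cutoff is an assumption of this illustration.

```r
# Hypothetical aggregated p-values, named by feature ID
pvalue <- c(GeneA = 0.001, GeneB = 0.20, GeneC = 0.03, GeneD = 0.75)
cutoff <- 0.05

ora_input <- list(
  gene = names(pvalue)[pvalue < cutoff],   # selected features
  universe = names(pvalue)                 # all tested features as background
)
```

Using all tested features as the universe keeps the background restricted to features that could have been selected.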
mapping geneID to gene Symbol
setReadable(x, OrgDb, keyType = "auto", toType = "SYMBOL")
x |
enrichResult Object |
OrgDb |
OrgDb |
keyType |
keyType of gene |
toType |
ID type of the output |
enrichResult Object
Guangchuang Yu
show method for gseaResult instance
show method for nseaResult instance
show method for mnseaResult instance
show method for enrichResult instance
show(object)
object |
A |
message
Guangchuang Yu https://yulab-smu.top
summary method for gseaResult instance
summary method for enrichResult instance
summary(object, ...)
object |
A |
... |
additional parameter |
A data frame
Guangchuang Yu https://yulab-smu.top