In this vignette, we are going to see how to use MIRit for investigating the compromised miRNA-gene regulatory networks in thyroid cancer. In particular, an RNA-Seq experiment will be used as an example to demonstrate how to perform an integrative analysis with MIRit, including differential expression analysis, functional enrichment and characterization, correlation analysis and, lastly, the construction and visualization of the impaired miRNAs regulatory axes within biological pathways.
MIRit 1.0.0
MIRit
(miRNA integration tool) is an open source R package that aims to
facilitate the comprehension of microRNA (miRNA) biology through the integrative
analysis of gene and miRNA expression data deriving from different platforms,
including microarrays, RNA-Seq, miRNA-Seq, proteomics and single-cell
transcriptomics. Given their regulatory importance, a complete characterization
of miRNA dysregulations results crucial to explore the molecular networks that
may lead to the insurgence of complex diseases. Unfortunately, there are no
currently available options for thoroughly interpreting the biological
consequences of miRNA dysregulations, thus limiting the ability to identify
the affected pathways and reconstruct the compromised molecular networks. To
tackle this limitation, we developed MIRit, an all-in-one framework that
provides flexible and powerful methods for performing integrative miRNA-mRNA
multi-omic analyses from start to finish. In particular, MIRit includes multiple
modules that allow to perform:
If you use MIRit in published research, please cite the following paper:
Ronchi J and Foti M. ‘MIRit: an integrative R framework for the identification of impaired miRNA-mRNA regulatory networks in complex diseases’. bioRxiv (2023). doi:10.1101/2023.11.24.568528
This package internally uses different R/Bioconductor packages, remember to cite the appropriate publications.
Before starting, MIRit must be installed on your system. You can do this through Bioconductor.
## install MIRit from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("MIRit")
If needed, you could also install the development version of MIRit directly from GitHub:
## install the development version from GitHub
if (!requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
BiocManager::install("jacopo-ronchi/MIRit")
When MIRit is installed, we can load it through:
## load MIRit
library(MIRit)
To demonstrate the capabilities of MIRit we will use RNA-Seq data from Riesco-Eizaguirre et al. (2015). This experiment collected samples from 8 papillary thyroid carcinoma tumors and contralateral normal thyroid tissue from the same patients. These samples were profiled for gene expression through RNA-Seq, and for miRNA expression through miRNA-Seq. To provide easy access to the user, raw count matrices have been retrieved from GEO and included in this package.
To load example data, we can simply use the data()
function:
## load count matrix for genes
data(geneCounts, package = "MIRit")
## load count matrix for miRNAs
data(mirnaCounts, package = "MIRit")
When using MIRit, we must specify whether miRNA and gene expression values derive from the same individuals or not. As already mentioned, paired data are those where individuals used to measure gene expression are the same subjects used to measure miRNA expression. On the other hand, unpaired data are those where gene expression and miRNA expression derive from different cohorts of donors. Importantly, MIRit considers as paired samples those data sets where paired measurements are available for at least some samples.
In our case, miRNA and gene expression data originate from the same subjects, and therefore we will conduct a paired samples analysis.
As input data, MIRit requires miRNA and gene expression measurements as matrices with samples as columns, and genes/miRNAs as rows. Further, the row names of miRNA expression matrix should contain miRNA names according to miRBase nomenclature (e.g. hsa-miR-151a-5p, hsa-miR-21-5p), whereas for gene expression matrix, row names must contain gene symbols according to HGNC (e.g. TYK2, BDNF, NTRK2).
These matrices may handle different types of values deriving from multiple technologies, including microarrays, RNA-Seq and proteomics. The only requirement is that, for microarray studies, expression matrices must be normalized and log2 transformed. This can be achieved by applying the RMA algorithm implemented in the oligo (Carvalho and Irizarry 2010) package, or by applying other quantile normalization strategies. On the contrary, for RNA-Seq and miRNA-Seq experiments, the simple count matrix must be supplied.
Eventually, expression matrices required by MIRit should appear as those in
mirnaCounts
and geneCounts
, which are displayed in Tables
1 and 2.
PTC 1 | PTC 2 | PTC 3 | PTC 4 | PTC 5 | |
---|---|---|---|---|---|
A1BG | 7 | 6 | 9 | 12 | 7 |
A1BG-AS1 | 20 | 8 | 22 | 6 | 16 |
A1CF | 0 | 0 | 0 | 1 | 1 |
A2M | 9 | 11 | 9 | 37 | 18 |
A2M-AS1 | 1486 | 722 | 801 | 968 | 1787 |
PTC 1 | PTC 2 | PTC 3 | PTC 4 | PTC 5 | |
---|---|---|---|---|---|
hsa-let-7a-2-3p | 3 | 0 | 9 | 1 | 4 |
hsa-let-7a-3p | 472 | 82 | 228 | 122 | 313 |
hsa-let-7a-5p | 141101 | 45543 | 105598 | 45503 | 159598 |
hsa-let-7b-3p | 412 | 81 | 120 | 147 | 164 |
hsa-let-7b-5p | 16337 | 6586 | 8121 | 7993 | 16516 |
Once we have expression matrices in the proper format, we need to inform MIRit
about the samples in study and the biological conditions of interest. To do so,
we need to create a data.frame
that must contain:
primary
, specifying a unique identifier for each different
subject;mirnaCol
, matching the column name used for each sample in
the miRNA expression matrix;geneCol
, matching the column name used for each sample in
the gene expression matrix;Firstly, let’s take a look at the column names used for miRNA and gene expression matrices.
## print sample names in geneCounts
colnames(geneCounts)
#> [1] "PTC 1" "PTC 2" "PTC 3" "PTC 4" "PTC 5" "PTC 6" "PTC 7" "PTC 8" "NTH 1"
#> [10] "NTH 2" "NTH 3" "NTH 4" "NTH 5" "NTH 6" "NTH 7" "NTH 8"
## print sample names in mirnaCounts
colnames(mirnaCounts)
#> [1] "PTC 1" "PTC 2" "PTC 3" "PTC 4" "PTC 5" "PTC 6" "PTC 7" "PTC 8" "NTH 1"
#> [10] "NTH 2" "NTH 3" "NTH 4" "NTH 5" "NTH 6" "NTH 7" "NTH 8"
## check if samples in geneCounts are equal to those in mirnaCounts
identical(colnames(geneCounts), colnames(mirnaCounts))
#> [1] TRUE
In our case, we see that both expression matrices have the same column names,
and therefore mirnaCol
and geneCol
will be identical. However, note that is
not always the case, especially for unpaired data, where miRNA and gene
expression values derive from different subjects. In these cases, mirnaCol
and
geneCol
must map each column of miRNA and gene expression matrices to the
relative subjects indicated in the primary
column. Notably, for unpaired data,
NAs
can be used for missing entries in mirnaCol
/geneCol
.
That said, we can proceed to create the data.frame
with sample metadata as
follows.
## create a data.frame containing sample metadata
meta <- data.frame(primary = colnames(mirnaCounts),
mirnaCol = colnames(mirnaCounts),
geneCol = colnames(geneCounts),
disease = c(rep("PTC", 8), rep("NTH", 8)),
patient = c(rep(paste("Sample_", seq(8), sep = ""), 2)))
MirnaExperiment
objectAt this point, we need to create an object of class MirnaExperiment
, which is
the main class used in MIRit for integrative miRNA-mRNA analyses. In particular,
this class extends the MultiAssayExperiment
class from the homonym package
(Ramos et al. 2017) to store expression levels of both miRNAs and genes,
differential expression results, miRNA-target pairs and integrative miRNA-gene
analysis.
The easiest way to create a valid MirnaExperiment
object is to use the
MirnaExperiment()
function, which automatically handles the formatting of
input data and the creation of the object.
## create the MirnaExperiment object
experiment <- MirnaExperiment(mirnaExpr = mirnaCounts,
geneExpr = geneCounts,
samplesMetadata = meta,
pairedSamples = TRUE)
Now that the MirnaExperiment
object has been created, we can move to
differential expression analysis, which aims to define differentially expressed
features across biological conditions.
Firstly, before doing anything else, it is useful to explore expression
variability through dimensionality reduction techniques. This is useful because
it allows us to visualize the main drivers of expression differences. In this
regard, MIRit offers the plotDimensions()
function, which enables to visualize
both miRNA and gene expression in the multidimensional space (MDS plots).
Moreover, it is possible to color samples based on specific variables, hence
allowing to explore specific patterns between distinct biological groups.
In our example, let’s examine expression variability for both miRNAs and genes, and let’s color the samples based on “disease”, a variable included in the previously defined metadata.
geneMDS <- plotDimensions(experiment,
assay = "genes",
condition = "disease",
title = "MDS plot for genes")
mirnaMDS <- plotDimensions(experiment,
assay = "microRNA",
condition = "disease",
title = "MDS plot for miRNAs")
ggpubr::ggarrange(geneMDS, mirnaMDS,
nrow = 1, labels = "AUTO", common.legend = TRUE)
As we can see from Figures 1A and 1B, the samples are very well separated on the basis of disease condition, thus suggesting that this is the major factor that influences expression variability. This is exactly what we want, since we aim to evaluate the differences between cancer and normal tissue. If this wasn’t the case, we should identify the confounding variables and include them in the model used for differential expression analysis (see Section 3.2.2).
Now, we are ready to perform differential expression analysis. In this concern, MIRit provides different options based on the technology used. Indeed, when expression measurements derive from microarrays, MIRit calculates differentially expressed features through the pipeline implemented in the limma package. On the other hand, when expression values derive from RNA-Seq experiments, MIRit allows to choose between different approaches, including:
limma-voom
approach defined in the limma package.Moreover, MIRit gives the possibility of fully customizing the parameters used for differential expression analysis, thus allowing a finer control that makes it possible to adopt strategies that differ from the standard pipelines proposed in these packages. For additional information, see Section 3.2.4.
After identifying the variable of interest and the confounding factors, we must indicate the experimental model used for fitting expression values. Notably, MIRit will automatically take care of model fitting, so that we only need to indicate a formula with the appropriate variables.
In our case, we want to evaluate the differences between cancer and normal thyroid. Therefore, disease condition is our variable of interest. However, in this experimental design, each individual has been assayed twice, one time for cancer tissue, and one time for healthy contralateral thyroid. Thus, we also need to include the patient ID as a covariate in order to prevent the individual differences between subjects from confounding the differences due to disease.
## design the linear model for both genes and miRNAs
model <- ~ disease + patient
If other variables affecting miRNA and gene expression are observed, they should be included in this formula.
performMirnaDE()
and performGeneDE()
functionsOnce the linear model has been defined, we can perform differential expression
analysis through the performMirnaDE()
and performGeneDE()
functions, which
take as input a MirnaExperiment
object, and compute differential expression
for miRNAs and genes, respectively.
Additionally, we must define multiple arguments, namely:
group
, which corresponds to the name of the variable of interest, as
specified in Section 2.4. In our case, we are interested in studying the
differences between cancer tissue and normal tissue. Therefore our variable of
interest is “disease”.contrast
, which indicates the levels of the variable of interest to be
compared. In particular, this parameter takes as input a string where the
levels are separated by a dash, and where the second level corresponds to the
reference group. In our example, we want to compare samples from papillary
thyroid cancer (PTC) against normal thyroid tissue (NTH), thus we set contrast
to “PTC-NTH”.design
, which specifies the linear model with the variable of interest and
eventual covariates. To do so, we pass to this parameter the R formula that we
defined in Section 3.2.2.method
, which tells MIRit which pipeline we want to use for computing
differentially expressed features. As stated in Section 3.2.1, for
microarray studies the only option available is limma
, while for RNA-Seq
experiments, the user can choose between edgeR
, DESeq2
, and voom
(for
limma-voom). In our case we are going to perform differential expression
analysis through the Quasi-Likelihood pipeline implemented in the
edgeR package.Following our example, let’s calculate differentially expressed genes and differentially expressed miRNAs in thyroid cancer.
## perform differential expression for genes
experiment <- performGeneDE(experiment,
group = "disease",
contrast = "PTC-NTH",
design = model,
pCutoff = 0.01)
#> Performing differential expression analysis with edgeR...
#> Differential expression analysis reported 199 significant genes with p < 0.01 (correction: fdr). You can use the 'geneDE()' function to access results.
## perform differential expression for miRNAs
experiment <- performMirnaDE(experiment,
group = "disease",
contrast = "PTC-NTH",
design = model,
pCutoff = 0.01)
#> Performing differential expression analysis with edgeR...
#> Differential expression analysis reported 22 significant miRNAs with p < 0.01 (correction: fdr). You can use the 'mirnaDE()' function to access results.
If not specified, the performMirnaDE()
and performGeneDE()
functions will
define differentially expressed genes/miRNAs as those having an adjusted p-value
lower than 0.05. However, this behavior can be changed by tweaking the pCutoff
parameter, that specifies the statistical significance threshold; and the
pAdjustment
option, which specifies the approach used for multiple testing
correction (default is fdr
to use the Benjamini-Hochberg method). Optionally,
it is possible to set the logFC
parameter, which indicates the minimum log2
fold change that features must display for being considered as differentially
expressed. Please note that the simultaneous use of adjusted p-value and logFC
cutoffs is discouraged and not recommended.
In addition to the above mentioned settings, other options can be passed to
the performMirnaDE()
and performGeneDE()
functions. Specifically, depending
on the method adopted for differential expression analysis, the user can finely
control the arguments passed to each function involved in the pipeline. In
particular, the following advanced parameters can be set:
filterByExpr.args
,calcNormFactors.args
,estimateDisp.args
,glmQLFit.args
,glmQLFTest.args
,DESeq.args
,useVoomWithQualityWeights
,voom.args
,lmFit.args
,eBayes.args
,useArrayWeights
,useWsva
,arrayWeights.args
,wsva.args
.useDuplicateCorrelation
correlationBlockVariable
duplicateCorrelation.args
In detail, when using limma-voom strategy, the useVoomWithQualityWeights
parameter tells MIRit whether to use voomWithWualityWeights()
instead of the
standard voom()
function. In the same way, for microarray studies, the
useArrayWeights
specifies whether to consider array quality weights during
the limma
pipeline. Similarly, useWsva
can be set to TRUE to include a
weighted surrogate variable analysis for batch effect correction. Moreover,
useDuplicateCorrelation
can be set to TRUE if you want to consider the effect
of correlated samples through the duplicateCorrelation()
function in limma
.
In this concern, the correlationBlockVariable
specifies the blocking variable
to use. All the other parameters ending with “.args”, accept a list
object
with additional parameters to be passed to the relative functions. In this way,
the user has full control over the strategy used.
For a complete reference on the usage of these parameters, check the help page of these functions. Instead, for further instructions on how to use these tools, please refer to their original manuals, which represent exceptional resources for learning the basics of differential expression analysis:
Even though MIRit implements all the most commonly used strategies for
differential expression analyses, these methods may not be suitable for all kind
of experiments. For instance, expression data deriving from technologies
different from microarrays and RNA-Seq can’t be processed through
performGeneDE()
and performMirnaDE()
functions. Therefore, MIRit grants the
possibility to perform differential expression analysis with every method of
choice, and then add the results to an existing MirnaExperiment
object. This
is particularly valuable for proteomic studies, where different normalization
strategies are used. In this way, MIRit fully supports the use of proteomic data
for conducting miRNA integrative analyses.
To do so, we can make use of the addDifferentialExpression()
function, which
takes as input a MirnaExperiment
object, and a table containing the
differential expression results for all miRNAs/genes analyzed (not just for
statistically significant species). If we want to manually set differential
expression results for both miRNAs and genes, two different tables must be
supplied. These tables must include:
ID
, Symbol
, Gene_Symbol
,
Mirna
, mir
, Gene
, gene.symbol
, Gene.symbol
.logFC
,
log2FoldChange
, FC
, lFC
.AveExpr
,
baseMean
, logCPM
.P.Value
, pvalue
, PValue
, Pvalue
.adj.P.Val
, padj
, FDR
, fdr
, adj
, adj.p
, adjp
.Further, we must specify the cutoff levels used to consider miRNAs/genes as significantly differentially expressed.
Once differential expression analysis has been performed, we can use the
mirnaDE()
and geneDE()
functions to access a table with differentially
expressed features.
## access DE results for genes
deGenes <- geneDE(experiment)
## access DE results for miRNAs
deMirnas <- mirnaDE(experiment)
In addition to tables, we can also generate a graphical overview of
differential expression through volcano plots, which are extremely useful for
visualizing features changing across biological conditions. To produce volcano
plots, MIRit offers the plotVolcano()
function.
## create a volcano plot for genes
geneVolcano <- plotVolcano(experiment,
assay = "genes",
title = "Gene differential expression")
## create a volcano plot for miRNAs
mirnaVolcano <- plotVolcano(experiment,
assay = "microRNA",
title = "miRNA differential expression")
## plot graphs side by side
ggpubr::ggarrange(geneVolcano, mirnaVolcano,
nrow = 1, labels = "AUTO", common.legend = TRUE)
Finally, if we are interested in specific genes/miRNAs, MIRit implements the
plotDE()
function that allows to represent expression changes of specific
features as box plots, bar plots, or violin plots. In our example, we can use
this function to visualize expression changes of different genes involved in
the normal functioning of thyroid gland. Note that we use the linear = FALSE
option to plot data in log scale (useful when multiple genes have very different
expression levels).
## create a bar plot for all thyroid features
plotDE(experiment,
features = c("TG", "TPO", "DIO2", "PAX8"),
graph = "barplot", linear = FALSE)
After finding differentially expressed genes, we usually end up having long lists of features whose expression changes between conditions. However, this is usually not very informative, and we seek to understand which functions result impaired in our experiments. In this regard, different methods exist for determining which cellular processes are dysregulated in our samples.
MIRit supports different strategies for functional enrichment analysis of genes, including over-representation analysis (ORA), gene-set enrichment analysis (GSEA), and Correlation Adjusted MEan RAnk gene set test (CAMERA). In this way, the user can infer compromised biological functions according to the approach of choice.
Among these methods, the first one that has been developed is over-representation analysis (Boyle et al. 2004), often abbreviated as ORA. This analysis aims to define whether genes annotated to specific gene-sets (such as ontological terms or biological pathways) are differentially expressed more than would be expected by chance. To do this, a p-value is calculated by the hypergeometric distribution as in Equation (1).
\[\begin{equation} p = 1 - \sum_{i = 0}^{k - 1}{\frac{\binom{M}{i}\binom{N - M} {n - 1}}{\binom{N}{n}}} \tag{1} \end{equation}\]
Here, \(N\) is the total number of genes tested, \(M\) is the number of genes that are annotated to a particular gene-set, \(n\) is the number of differentially expressed genes, and \(k\) is the number of differentially expressed genes that are annotated to the gene set.
Additionally, another available approach is the gene set enrichment analysis (Subramanian et al. 2005), often refereed to with the acronym GSEA, which is suitable to find categories whose genes change in a small but coordinated way. The GSEA algorithm takes as input a list of genes ranked with a particular criterion, and then walks down the list to evaluate whether members of a given gene set are normally distributed or are mainly present at the top or at the bottom of the list. To check this out, the algorithm uses a running-sum that increases when finding a gene belonging to a given category, and decreases when a gene not contained in that specific set is found. The maximum distance from zero occurred in the running score is defined as the enrichment score (ES). To estimate the statistical significance of enrichment scores, a permutation test is performed by swapping gene labels annotated to a gene set.
Even though GSEA is arguably the most commonly used approach for functional enrichment, Wu and Smyth (2012) demonstrated that inter-gene correlations might affect its reliability. To overcome this issue, they developed the Correlation Adjusted MEan RAnk gene set test (CAMERA). The main advantage of this method is that it adjusts the gene set test statistic according to inter-gene correlations.
As described above, functional enrichment analysis relies on gene-sets, which
consist in collections of genes that are annotated to specific functions or
terms. Independently from the strategy used for the analysis, functional
enrichment methods need access to these properly curated collections of genes.
In the effort of providing access to a vast number of these resources, MIRit
uses the geneset package to support multiple databases,
including Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG),
MsigDB, WikiPathways, Reactome, Enrichr, Disease Ontology (DO), Network of
Cancer Genes (NCG), DisGeNET, and COVID-19. However, the majority of these
databases contain multiple categories. To see the complete list of available
gene-sets for each database refer to the documentation of the enrichGenes()
function.
The above-mentioned collections have their own lists of supported species.
To check the available species for a given database, MIRit provides a practical
helper function named supportedOrganisms()
. For example, to retrieve the
species supported by Reactome database, we can simply run the following piece
of code.
## list available species for Reactome database
supportedOrganisms("Reactome")
#> [1] "Bos taurus" "Caenorhabditis elegans"
#> [3] "Danio rerio" "Drosophila melanogaster"
#> [5] "Gallus gallus" "Homo sapiens"
#> [7] "Mus musculus" "Rattus norvegicus"
#> [9] "Saccharomyces cerevisiae" "Sus scrofa"
#> [11] "Xenopus tropicalis"
enrichGenes()
functionThe main function in MIRit for the functional enrichment analysis of genes is
enrichGenes()
, which requires as input:
MirnaExperiment
object that we get after running differential
expression analysis;method
, which specifies the desired approach among ORA
, GSEA
, and
CAMERA
;database
and category
, which define the gene-set that you want to use
(see Section 4.2 for a complete reference of available databases
and categories);organism
, which indicates the specie under investigation (defaults to
“Homo sapiens”);pCutoff
and pAdjustment
, which specify the cutoff for statistical
significance and the multiple testing correction, respectively.In our example, we are going to perform the enrichment analysis by using ORA with GO database (biological processes).
## perform over-representation analysis with GO
ora <- enrichGenes(experiment,
method = "ORA",
database = "GO",
organism = "Homo sapiens")
#> Since not specified, 'category' for GO database is set to bp (default).
#> Reading GO gene-sets from cache...
#> Performing the enrichment of upregulated genes...
#> Performing the enrichment of downregulated genes...
#> The enrichment of genes reported 8 significantly enriched terms for downregulated genes and 0 for upregulated genes.
MIRit performs ORA separately for upregulated and downregulated genes, as it
has been shown to be more powerful compared to enriching all DE genes
(Hong et al. 2014). Therefore, when we use ORA, enrichGenes()
returns a
list
containing two objects of class FunctionalEnrichment
, one storing
enrichment results for upregulated genes, and one for downregulated genes.
Before exploring the results of the analysis, we will also demonstrate the use of GSEA with the gene-sets provided by the KEGG pathway database.
## set seed for reproducible results
set.seed(1234)
## perform gene-set enrichment analysis with KEGG
gse <- enrichGenes(experiment,
method = "GSEA",
database = "KEGG",
organism = "Homo sapiens")
#> Since not specified, 'category' for KEGG database is set to pathway (default).
#> Reading KEGG gene-sets from cache...
#> Ranking genes based on signed.pval...
#> Performing gene-set enrichment analysis (GSEA)...
#> GSEA reported 3 significantly enriched terms.
In this case, the enrichGenes()
function returns a single object of class
FunctionalEnrichment
, containing GSEA results.
After running the enrichGenes()
function, we get FunctionalEnrichment
objects holding the results of enrichment analyses. To access the full table
containing significantly affected functions, we can use the
enrichmentResults()
function. In our case, we can check the results of GSEA
(Table 3).
pathway | pval | padj | log2err | ES | NES | size | leadingEdge |
---|---|---|---|---|---|---|---|
Thyroid hormone synthesis | 0 | 0.02 | 0.54 | -0.67 | -2.02 | 27 | SLC26A4,…. |
Protein processing in endoplasmic reticulum | 0 | 0.02 | 0.52 | -0.55 | -1.90 | 55 | BCL2, HY…. |
Coronavirus disease - COVID-19 | 0 | 0.05 | 0.48 | -0.45 | -1.67 | 85 | JUN, RPL…. |
Further, MIRit offers several options for the visualization of enrichment
analyses, including dot plots and bar plots. These plots are available for
every FunctionalEnrichment
object independently from the method used.
Following our example, we can visualize ORA results that we obtained in Section 4.4 through a simple dot plot.
## create a dot plot for ORA
enrichmentDotplot(ora$downregulated, title = "Depleted functions")
Additionally, MIRit provides specific visualization methods that are exclusive
for GSEA, including ridge plots and GSEA plots. For instance, after running GSEA
through enrichGenes()
, we can produce a GSEA-style plot through the
gseaPlot()
function. In our case, we are going to produce this plot for the
“Thyroid hormone synthesis” pathway.
## create a GSEA plot
gseaPlot(gse, "Thyroid hormone synthesis", rankingMetric = TRUE)
Interestingly, MIRit enables to explore the presence of disease-associated SNPs located in differentially expressed miRNAs. In this concern, SNPs occurring within miRNA loci may have important effects on the biological function of these transcripts, as they might alter their expression or the spectrum of targets. To verify the presence of disease-SNPs within miRNA loci, MIRit directly queries the NHGRI-EBI Catalog of published genome-wide association studies through the gwasrapidd package, and then retains only SNPs that affect DE-miRNA genes or their relative host genes (for intragenic miRNAs).
In our case, there are no SNPs associated with thyroid cancer that occur within differentially expressed miRNAs. Therefore, we will demonstrate the use of this function with SNPs associated with the response to antidepressant drugs.
Now, we can use the findMirnaSNPs()
function to identify the disease-related
SNPs that affect miRNA loci.
## detect SNPs occuring at DE-miRNAs loci
association <- findMirnaSNPs(experiment, "response to antidepressant")
#> Querying GWAS Catalog, this may take some time...
#> Finding genomic information of differentially expressed miRNAs...
#> After the analysis, 1 variants associated with response to antidepressant were found within differentially expressed miRNA genes
If any disease-related SNP is present within DE-miRNA loci, we can use the
mirVariantPlot()
function to graphically build a track plot that displays the
polymorphism along with the relevant genomic context, including the
corresponding miRNA locus.
## create a track plot to represent disease-SNPs at DE-miRNA loci
mirVariantPlot("rs2402960", association, showSequence = FALSE)
Finally, to review the literature supporting the association between SNPs and
specific traits, and possibly with differentially expressed miRNAs, MIRit
provides the getEvidence()
function, which returns a data.frame reporting
some details of the studies where the association is supported.
For example, we can see the list of experiments where the association between rs2402960 and the response to antidepressants was observed.
## retrieve the evidence supporting SNP-trait association
snpEvidence <- getEvidence(variant = "rs2402960",
diseaseEFO = "response to antidepressant")
#> Retrieving biomedical evidence for the association between response to antidepressant and rs2402960 variant...
#> 20 studies reporting this association were found!
## take a look at the evidence table
head(snpEvidence)
#> # A tibble: 6 × 7
#> study_id pubmed_id publication_date publication title author_fullname
#> <chr> <int> <date> <chr> <chr> <chr>
#> 1 GCST003813 27622933 2016-09-13 Transl Psychiatry Anal… Li QS
#> 2 GCST001641 22907730 2012-08-21 Pharmacogenomics J Phar… Ji Y
#> 3 GCST002046 23726668 2013-05-30 J Psychiatr Res A ge… Hunter AM
#> 4 GCST000470 19724244 2009-08-31 Pharmacogenet Gen… Geno… Laje G
#> 5 GCST000471 19736353 2009-09-01 Arch Gen Psychiat… A ge… Ising M
#> 6 GCST000509 19846067 2009-10-19 Biol Psychiatry A ge… Garriock HA
#> # ℹ 1 more variable: author_orcid <chr>
Before performing integrative miRNA-mRNA analyses, we need to identify the targets of differentially expressed miRNAs, so that we can test whether they really affect the levels of their targets or not.
Different resources have been developed over the years to predict and collect miRNA-target interactions, and we can categorize them in two main types:
The choice of which type of resources to use for identifying miRNA targets drastically influences the outcome of the analysis. In this regard, some researchers tend to give the priority to validated interactions, even though they are usually fewer than predicted ones. On the other hand, predicted pairs are much more numerous, but they exhibit a high number of false positive hits.
The downside of miRNA target prediction algorithms is also the scarce extend of overlap existing between different tools. To address this issue, several ensemble methods have been developed, trying to aggregate the predictions obtained by different algorithms. Initially, several researchers determined as significant miRNA-target pairs those predicted by more than one tool (intersection method). However, this method is not able to capture an important number of meaningful interactions. Alternatively, other strategies used to merge predictions from several algorithms (union method). Despite identifying more true relationships, the union method leads to a higher proportion of false discoveries. Therefore, other ensemble methods started using other statistics to rank miRNA-target predictions obtained by multiple algorithms. Among these newly developed ensemble methods, one of the best performing one is the microRNA Data Integration Portal (mirDIP) database, which aggregates miRNA target predictions from 24 different resources by using an integrated score inferred from different prediction metrics. In this way, mirDIP reports more accurate predictions compared to those of individual tools. For additional information on mirDIP database and its ranking metric refer to Tokar et al. (2018) and Hauschild et al. (2023).
getTargets()
Given the above, MIRit allows the prediction of miRNA-target interactions via
the mirDIP database (version 5.2). In addition, in order to raise the number
of true interactions, MIRit combines the miRNA-target pairs returned by mirDIP
with the experimentally validated interactions contained in miRTarBase
(version 9) (Huang et al. 2022). In practice, to identify miRNA targets,
MIRit implements the getTargets()
function. Specifically, this function
includes a parameter called score
that determines the degree of confidence
required for the targets predicted by mirDIP. The value of this parameter must
be one of Very High
, High
(default), Medium
, and Low
, which correspond
to ranks among top 1%, top 5%, top 1/3, and remaining predictions, respectively.
Moreover, the includeValidated
parameter tells MIRit whether to include
experimentally validated interactions deriving from miRTarBase. It is also
possible (with the evidence
parameter) to consider all interactions in
miRTarBase, or just limiting the retrieval to those interactions with strong
experimental evidence. Please note that mirDIP database is only available for
human miRNAs; thus, for species other than Homo sapiens, only validated
interactions contained in miRTarBase are used.
In our example, we are going to retrieve both predicted and validated interactions by using default settings.
## retrieve miRNA target genes
experiment <- getTargets(experiment)
#> Retrieving targets from mirDIP (this may take a while)...
#>
#> Loading miRTarBase from cache...
#> Merging predicted and validated results...
#> 7596 miRNA-target pairs have been identified for the 22 differentially expressed miRNAs.
After running this function, we obtain a MirnaExperiment
object containing
miRNA-target interactions in its targets
slot. The user can access a
data.frame
detailing these interactions through the mirnaTargets()
function.
Now that we have defined the targets of differentially expressed miRNAs, we can continue with the integrative analysis of miRNA and gene expression levels. The purpose of this analysis is to only consider miNA-target pairs where an inverse relationship is observed.
As already mentioned, MIRit can work with both paired and unpaired data by using different statistical approaches, including:
fry
function from
limma package.For unpaired data, only association tests and rotation gene-set tests are
available, whereas correlation analysis is the best performing strategy for
paired data. The integrative analysis, either performed through correlation,
association tests, or rotation gene-set tests, is implemented in the
mirnaIntegration()
function. When using the default option test = "auto"
,
MIRit automatically performs the appropriate test for paired and unpaired
samples. If only some samples of the dataset have paired measurements, a
correlation analysis will be carried out only for those subjects.
When both miRNA and gene expression measurements are available for the same samples, a correlation analysis is the recommended procedure. In statistics, correlation is a measure that expresses the extent to which two random variables are dependent. In our case, we want to assess whether a statistical relationship is present between the expression of a miRNA and the expression of its targets.
Several statistical coefficients can be used to weigh the degree of a
correlation. Among them, the most commonly used are
Pearson’s correlation coefficient \(r\), Spearman’s correlation coefficient
\(\rho\), and Kendall’s Tau-b correlation coefficient \(\tau_b\). Pearson’s \(r\) is
probably the most diffused for determining the association between miRNA and
gene expression. However, it assumes that the relationship between miRNA and
gene expression values is linear. This is typically not true for miRNAs, whose
interactions with their targets are characterized by imperfect complementarity.
Additionally, miRNAs can target multiple genes with different binding sites, and
this implies that a simple linear relationship may not be sufficient to properly
model the complexity of these interactions. In contrast, Spearman’s and
Kendall’s Tau-b correlation coefficients result more suitable for representing
the interplay between miRNAs and targets, because they are robust to non-linear
relationships and outliers. However, Kendall’s correlation just relies on the
number of concordant and discordant pairs, and is less sensitive then Spearman’s
correlation; so, when many ties are present or when the sample size is small, it
may have a lower detection power. This is the reason why
Spearman’s correlation coefficient is the default used in the
mirnaIntegration()
function. Moreover, since miRNAs mainly act as negative
regulators, only negatively correlated miRNA-target pairs are considered, and
statistical significance is estimated through a one-tailed t-test.
To sum up the steps that MIRit follows when evaluating the correlation between
miRNAs and genes, what the mirnaIntegration()
function does during a
correlation analysis is to:
getTargets()
function;In our thyroid cancer example, we want to find the miRNA-target pairs that exhibit a negative correlation with a Spearman’s coefficient lower than -0.5 and with an adjusted p-value less than 0.05.
## perform a correlation analysis
experiment <- mirnaIntegration(experiment, test = "correlation")
#> As specified by the user, a correlation will be used.
#> Performing Spearman's correlation analysis...
#> A statistically significant correlation between 106 miRNA-target pairs was found!
Please note that all the parameters used for the correlation analysis are
customizable. For instance, the user can change the significance threshold and
the multiple testing correction method by setting the pCutoff
and
pAdjustment
parameters, respectively. Further, it is also possible to change
the correlation coefficient used, by editing the corMethod
option, and the
minimum required value of the correlation coefficient, by changing the
corCutoff
setting.
Sometimes, when exploring expression variability through MDS plots, as we do
with the plotDimensions()
function, we notice the presence of batch effects
that prevent a clear separation of our biological groups. Batch effects
consist in unwanted sources of technical variation that confound expression
variability and limit downstream analyses. Since the reliability of biological
conclusions depends on the correlation between miRNAs and genes,
it is pivotal to ensure that expression measurements are scarcely affected by
technical artifacts. In this regard, if strong batch effects are noticed in the
data, MIRit provides the batchCorrection()
function, which removes batch
effects prior to correlation analysis. Please note that this procedure cannot be
used before differential expression testing, because for that purpose it is more appropriate to include batch variables in the linear model, as specified in
Section 3.2.2. For additional information, please refer to the manual of
the batchCorrection()
function.
Before moving to the identification of the altered miRNAs regulatory networks,
we can explore correlated miRNA-target pairs thanks to the integration()
function, which returns a data.frame
object with comprehensive details about
the computed interactions.
## extract correlation results
integrationResults <- integration(experiment)
## take a look at correlation results
head(integrationResults)
#> microRNA Target microRNA.Direction Corr.Coefficient
#> hsa.miR.138.1.3p hsa-miR-138-1-3p CCDC18 downregulated -0.6235294
#> hsa.miR.138.1.3p.1 hsa-miR-138-1-3p CCND2 downregulated -0.8941176
#> hsa.miR.138.1.3p.6 hsa-miR-138-1-3p LRP2BP downregulated -0.5529412
#> hsa.miR.138.1.3p.9 hsa-miR-138-1-3p TNNI1 downregulated -0.7764706
#> hsa.miR.138.5p hsa-miR-138-5p CLCN2 downregulated -0.7029412
#> hsa.miR.138.5p.3 hsa-miR-138-5p DTX4 downregulated -0.7441176
#> Corr.P.Value Corr.Adjusted.P.Val
#> hsa.miR.138.1.3p 4.927825e-03 1.448662e-02
#> hsa.miR.138.1.3p.1 1.505655e-06 9.184495e-05
#> hsa.miR.138.1.3p.6 1.315725e-02 3.309659e-02
#> hsa.miR.138.1.3p.9 2.021754e-04 1.710951e-03
#> hsa.miR.138.5p 1.193476e-03 4.773905e-03
#> hsa.miR.138.5p.3 4.741414e-04 2.754536e-03
Additionally, MIRit allows to graphically represent inverse correlations through
a scatter plot. To do so, we can use the plotCorrelation()
function to display
the correlation between specific miRNA-target pairs. For example, we can plot
the existing correlation between miR-146b-5p and DIO2, which is crucial for
thyroid hormone functioning. Furthermore, we can also show how the upregulation
of miR-146b-3p is associated with the downregulation of PAX8, which directly
induces TG transcription.
## plot the correlation between miR-146b-5p and DIO2
cor1 <- plotCorrelation(experiment,
mirna = "hsa-miR-146b-5p",
gene = "DIO2",
condition = "disease")
## plot the correlation between miR-146b-3p and PAX8
cor2 <- plotCorrelation(experiment,
mirna = "hsa-miR-146b-3p",
gene = "PAX8",
condition = "disease")
## plot graphs side by side
ggpubr::ggarrange(cor1, cor2, nrow = 1,
labels = "AUTO", common.legend = TRUE)
For unpaired data, we cannot directly quantify the influence of miRNA expression on the levels of their targets, because we do not have any sample correspondence between miRNA and gene measurements. However, one-sided association tests can be applied in these cases to evaluate if targets of downregulated miRNAs are statistically enriched in upregulated genes, and, conversely, if targets of upregulated miRNAs are statistically enriched in downregulated genes. In this regard, to estimate the effects of differentially expressed miRNAs on their targets, MIRit can use two different one-sided association tests, namely:
Both these tests consist in a statistical procedure that estimates the association between two dichotomous categorical variables. In our case, for each miRNA, we want to evaluate whether the proportion of targets within the differentially expressed genes significantly differs from the proportion of targets in non-differentially expressed genes. To do this, a 2x2 contingency table is built as shown in Table 4.
Target genes | Non target genes | Row total | |
---|---|---|---|
Differentially expressed | \(a\) | \(b\) | \(a + b\) |
Non differentially expressed | \(c\) | \(d\) | \(c + d\) |
Column total | \(a + c\) | \(b + d\) | \(a + b + c + d = n\) |
When the contingency table has been defined, Fisher’s exact test p-value can be calculated through Equation (2).
\[\begin{equation} p = \frac{(a + b)!\ (c + d)!\ (a + c)!\ (b + d)!}{a!\ b!\ c!\ d!\ n!} \tag{2} \end{equation}\]
Additionally, it is also possible to compute Fisher’s p-values with Lancaster’s mid-p adjustment, since it has been proven that it increases statistical power while retaining Type I error rates.
The major drawback of the Fisher’s exact test is that it consists in a conditional test that requires the sum of both rows and columns of a contingency table to be fixed. Notably, this is not true for genomic data because it is likely that different datasets may lead to a different number of DEGs. Therefore, the default behavior in MIRit is to use a variant of Barnard’s exact test, named Boschloo’s exact test, that is suitable when group sizes of contingency tables are variable. Moreover, it is possible to demonstrate that Boschloo’s exact test is uniformly more powerful compared to Fisher’s one. However, keep in mind that Boschloo’s test is much more computational intensive compared to Fisher’s exact test, and it may require some time, even though parallel computing is employed.
In MIRit, the mirnaIntegration()
function automatically performs association
tests for unpaired data when test = "auto"
. Moreover, the type of association
test to use can be specified through the associationMethod
parameter, which
can be set to:
fisher
, to perform a simple one-sided Fisher’s exact test;fisher-midp
, to perform a one-sided Fisher’s exact test with Lancaster’s
mid-p correction; andboschloo
, to perform a one-sided Boschloo’s exact test (default option).For example, we could use Fisher’s exact test with mid-p correction to evaluate the inverse association between miRNA and gene expression.
## perform a one-sided inverse association
exp.association <- mirnaIntegration(experiment,
test = "association",
associationMethod = "fisher-midp",
pCutoff = 0.2,
pAdjustment = "none")
#> As specified by the user, a association will be used.
#> Performing One-sided Fisher's exact test with Lancaster's mid-p correction...
#> A statistically significant association between 3 DE-miRNAs and 23 genes was found!
In the end, results can be accessed through the integration()
function in the
same way as we can do for correlation analyses.
Lastly, for unpaired data, the effect of DE-miRNAs on the expression of target
genes can be estimated through rotation gene-set tests. In this approach, we
want to evaluate for each miRNA whether its target genes tend to be
differentially expressed in the opposite direction. In particular, a fast
approximation to rotation gene-set testing called fry
, implemented in the
limma package, can be used to statistically quantify the
impact of miRNAs on expression changes of their targets.
To perform the integrative analysis through rotation gene-set tests, we must
simply set test = "fry"
when calling the mirnaIntegration()
function.
## perform the integrative analysis through 'fry' method
exp.fry <- mirnaIntegration(experiment,
test = "fry",
pAdjustment = "none")
#> As specified by the user, a fry will be used.
#> Performing miRNA-gene integrative analysis using 'fry' method...
#> A statistically significant association between 2 DE-miRNAs and 17 genes was found!
After finding influential miRNA-target pairs, we can try to identify the
consequences of miRNomic alterations through ORA. To do this, MIRit provides the
enrichTargets()
function, which automatically performs ORA for target genes
that result associated with differentially expressed miRNAs.
In our example, we are going to enrich the significantly anti-correlated targets that we have found in Section 7.1.2 through the Disease Ontology database.
## enrichment of integrated targets
oraTarg <- enrichTargets(experiment, database = "DO")
#> Reading DO gene-sets from cache...
#> Performing the enrichment of upregulated genes...
#> Performing the enrichment of downregulated genes...
#> The enrichment of genes reported 61 significantly enriched terms for downregulated genes and 0 for upregulated genes.
## show a dot plot for the enrichment of downregulated genes
enrichmentDotplot(oraTarg$downregulated,
showTerms = 7,
showTermsParam = "padj",
title = "Depleted diseases")
In Figure 8, we appreciate the depletion of diseases where thyroid gland is overly active, such as goiter and hyperthyroidism, therefore suggesting the involvement of miRNAs in thyroid malfunctioning.
Once the dysregulated miRNA-mRNA pairs have been identified, the typical goal is to infer altered cellular processes and networks. To do so, MIRit introduces a novel approach named Topology-Aware Integrative Pathway Analysis (TAIPA), which specifically focuses on detecting compromised molecular networks in miRNA-mRNA multi-omic analyses by considering the topology of biological pathways and miRNA-interactions interactions.
In this analysis, biological pathways are retrieved from a pathway database such as KEGG, and the interplay between miRNAs and genes is then added to the networks. Each network is defined as a graph \(G(V, E)\), where \(V\) represents nodes, and \(E\) represents the relationships between nodes. Then, nodes that are not significantly differentially expressed are assigned a weight \(w_i = 1\), whereas differentially expressed nodes are assigned a weight \(w_i = \left| \Delta E_i \right|\), where \(\Delta E_i\) is the linear fold change of the node. Moreover, to consider the biological interaction between two nodes, namely \(i\) and \(j\), we define an interaction parameter \(\beta_{i \rightarrow j} = 1\) for activation interactions and \(\beta_{i \rightarrow j} = -1\) for repression interactions. Subsequently, the concordance coefficient \(\gamma_{i \rightarrow j}\) is defined as in Equation (3):
\[\begin{equation} \gamma_{i \rightarrow j} = \begin{cases} \beta_{i \rightarrow j} &\text{if } sign(\Delta E_i) = sign(\Delta E_j) \\ - \beta_{i \rightarrow j} &\text{if } sign(\Delta E_i) \not= sign(\Delta E_j) \end{cases}\,. \tag{3} \end{equation}\]
Later in the process, a breadth-first search (BFS) algorithm is applied to topologically sort pathway nodes so that each individual node occurs after all its upstream nodes. Nodes within cycles are considered leaf nodes. At this point, a node score \(\phi\) is calculated for each pathway node \(i\) as in Equation (4):
\[\begin{equation} \phi_i = w_i + \sum_{j=1}^{U} \gamma_{i \rightarrow j} \cdot k_j\,, \tag{4} \end{equation}\]
where \(U\) represents the number of upstream nodes, \(\gamma_{i \rightarrow j}\) denotes the concordance coefficient, and \(k_j\) is a propagation factor defined as in Equation (5):
\[\begin{equation} k_j = \begin{cases} w_j &\text{if } \phi_j = 0 \\ \phi_j &\text{if } \phi_j \not = 0 \end{cases}\,. \tag{5} \end{equation}\]
Finally, the pathway score \(\Psi\) is calculated as in Equation (6):
\[\begin{equation} \Psi = \frac{1 - M}{N} \cdot \sum_{i=1}^{N} \phi_i\,, \tag{6} \end{equation}\]
where \(M\) represents the proportion of miRNAs in the pathway, and \(N\) represents the total number of nodes in the network. Then, to compute the statistical significance of each pathway score, a permutation procedure is applied. Later, both observed pathway scores and permuted scores are standardized by subtracting the mean score of the permuted sets \(\mu_{\Psi_P}\) and then dividing by the standard deviation of the permuted scores \(\sigma_{\Psi_P}\).
Finally, the p-value is defined based on the fraction of permutations that reported a higher normalized pathway score than the observed one. However, to prevent p-values equal to zero, we define p-values as in Equation (7):
\[\begin{equation} p = \frac{\sum_{n=1}^{N_p} \left[ \Psi_{P_N} \ge \Psi_N \right] + 1} {N_p + 1}\,. \tag{7} \end{equation}\]
In the end, either p-values are corrected for multiple testing through the max-T procedure (default option) which is particularly suited for permutation tests, or through the standard multiple testing approaches.
Before performing TAIPA, we need to create miRNA-augmented networks. To do so,
MIRit implements the preparePathways()
function, which automatically uses the
graphite R package to download biological networks from multiple
pathway databases, namely KEGG
, WikiPathways
and Reactome
. Then, each
pathway is converted to a graph
object and significant miRNA-mRNA pairs are
added to the network. Further, edge weights are included according to
interaction type. After running this function, we obtain a list
containing
all the miRNA-augmented networks as graph
objects.
In our example, we want to use the significant miRNA-target pairs that we identified in Section 7.1.2 to augment biological pathways retrieved from the KEGG database.
## create miRNA-augmented networks using KEGG pathways
networks <- preparePathways(experiment,
database = "KEGG",
organism = "Homo sapiens",
minPc = 20)
#> Reading KEGG pathways from cache...
#> Adding miRNA-gene interactions to biological pathways...
#> Loading required package: graph
#> Warning: 292 pathways have been ignored because they contain too few nodes with
#> gene expression measurement.
#> Performing topological sorting of pathway nodes...
After running this function, pathways with less than 20% of nodes with
expression measurements are removed. This option can be changed by specifying
the minPc
parameter (default is 10%).
Now, we are ready to perform TAIPA through the topologicalAnalysis()
function,
which is used to calculate pathway scores for all the augmented networks and
evaluate their statistical significance through permutation testing. For
demonstration purposes, we only considered a smaller subset of augmented
pathways.
## only consider a smaller set of augmented networks
networks <- networks[seq(15, 30)]
## set seed for reproducible results
set.seed(1234)
## perform TAIPA with 1000 permutations
taipa <- topologicalAnalysis(experiment,
pathways = networks,
nPerm = 1000)
#> Calculating pathway scores...
#> Generating random permutations...
#> Calculating p-values with 1000 permutations...
#> Correcting p-values through the max-T procedure...
#> The topologically-aware integrative pathway analysis reported 2 significantly altered pathways!
As a result of the analysis, an object of class IntegrativePathwayAnalysis
storing the results of TAIPA is returned. Notably, the user can change the
behavior of topologicalAnalysis()
in several ways. For example, the pCutoff
and pAdjustment
parameters can be used to change the significance threshold
and the multiple testing correction method, respectively. Moreover, the nPerm
parameter can be tweaked to change the number of permutations used for
evaluating statistical significance. In this regard, we recommend using at least
10000 permutations, with no less than 1000.
BiocParallel
For computational efficiency, pathway score computation has been implemented in
C++ language. Furthermore, since computing pathway score for 10000 networks for
each pathway is computationally intensive, parallel computing has been employed
to reduce running time. The user can modify the parallel computing behavior by
specifying the BPPARAM
parameter. See the help page of the
BiocParallel package for further details. Both the
preparePathways()
and the topologicalAnalysis()
functions accept the
BPPARAM
option.
After running the topologicalAnalysis()
function, we can inspect the
significantly perturbed pathways contained in the IntegrativePathwayAnalysis
object by using the integratedPathways()
function, which returns a
data.frame
reporting the results.
## extract the results of TAIPA
perturbedNetworks <- integratedPathways(taipa)
As with functional enrichment analyses, we can plot perturbed miRNA-mRNA
networks as dotplots. To do so, the integrationDotplot()
function can be used.
## produce a dotplot that shows the most affected networks
integrationDotplot(taipa)
Finally, MIRit provides the possibility of exploring the molecular
perturbations. In this concern, the visualizeNetworks()
function can be used
to visually reconstruct the compromised pathways along with expression changes
of both miRNAs and genes, so that users can easily interpret the functional
consequences of miRNA and gene dysregulations. For example, we can explore the
perturbed molecular events that are responsible for diminished production of
thyroid hormone in thyroid cancer.
## plot the impaired network responsible for reduced TG synthesis
visualizeNetwork(taipa, "Thyroid hormone synthesis")
#> R version 4.4.0 beta (2024-04-15 r86425)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] graph_1.82.0 MIRit_1.0.0
#> [3] MultiAssayExperiment_1.30.0 SummarizedExperiment_1.34.0
#> [5] Biobase_2.64.0 GenomicRanges_1.56.0
#> [7] GenomeInfoDb_1.40.0 IRanges_2.38.0
#> [9] S4Vectors_0.42.0 BiocGenerics_0.50.0
#> [11] MatrixGenerics_1.16.0 matrixStats_1.3.0
#> [13] BiocStyle_2.32.0
#>
#> loaded via a namespace (and not attached):
#> [1] splines_4.4.0 BiocIO_1.14.0 bitops_1.0-7
#> [4] urltools_1.7.3 filelock_1.0.3 tibble_3.2.1
#> [7] cellranger_1.1.0 triebeard_0.4.1 XML_3.99-0.16.1
#> [10] rpart_4.1.23 lifecycle_1.0.4 httr2_1.0.1
#> [13] rstatix_0.7.2 edgeR_4.2.0 lattice_0.22-6
#> [16] ensembldb_2.28.0 backports_1.4.1 magrittr_2.0.3
#> [19] limma_3.60.0 Hmisc_5.1-2 sass_0.4.9
#> [22] rmarkdown_2.26 jquerylib_0.1.4 yaml_2.3.8
#> [25] Gviz_1.48.0 cowplot_1.1.3 DBI_1.2.2
#> [28] RColorBrewer_1.1-3 lubridate_1.9.3 abind_1.4-5
#> [31] zlibbioc_1.50.0 quadprog_1.5-8 purrr_1.0.2
#> [34] AnnotationFilter_1.28.0 biovizBase_1.52.0 RCurl_1.98-1.14
#> [37] nnet_7.3-19 VariantAnnotation_1.50.0 rappdirs_0.3.3
#> [40] GenomeInfoDbData_1.2.12 codetools_0.2-20 DelayedArray_0.30.0
#> [43] xml2_1.3.6 tidyselect_1.2.1 UCSC.utils_1.0.0
#> [46] farver_2.1.1 BiocFileCache_2.12.0 base64enc_0.1-3
#> [49] GenomicAlignments_1.40.0 jsonlite_1.8.8 Formula_1.2-5
#> [52] tools_4.4.0 progress_1.2.3 Rcpp_1.0.12
#> [55] glue_1.7.0 gridExtra_2.3 SparseArray_1.4.0
#> [58] BiocBaseUtils_1.6.0 xfun_0.43 dplyr_1.1.4
#> [61] withr_3.0.0 BiocManager_1.30.22 fastmap_1.1.1
#> [64] latticeExtra_0.6-30 fansi_1.0.6 digest_0.6.35
#> [67] timechange_0.3.0 R6_2.5.1 mime_0.12
#> [70] colorspace_2.1-0 jpeg_0.1-10 dichromat_2.0-0.1
#> [73] biomaRt_2.60.0 RSQLite_2.3.6 utf8_1.2.4
#> [76] tidyr_1.3.1 generics_0.1.3 data.table_1.15.4
#> [79] rtracklayer_1.64.0 prettyunits_1.2.0 httr_1.4.7
#> [82] htmlwidgets_1.6.4 S4Arrays_1.4.0 pkgconfig_2.0.3
#> [85] gtable_0.3.5 blob_1.2.4 XVector_0.44.0
#> [88] htmltools_0.5.8.1 carData_3.0-5 geneset_0.2.7
#> [91] bookdown_0.39 fgsea_1.30.0 ProtGenerics_1.36.0
#> [94] scales_1.3.0 png_0.1-8 knitr_1.46
#> [97] rstudioapi_0.16.0 rjson_0.2.21 checkmate_2.3.1
#> [100] curl_5.2.1 cachem_1.0.8 stringr_1.5.1
#> [103] parallel_4.4.0 foreign_0.8-86 AnnotationDbi_1.66.0
#> [106] restfulr_0.0.15 pillar_1.9.0 grid_4.4.0
#> [109] vctrs_0.6.5 ggpubr_0.6.0 car_3.1-2
#> [112] dbplyr_2.5.0 cluster_2.1.6 htmlTable_2.4.2
#> [115] Rgraphviz_2.48.0 evaluate_0.23 GenomicFeatures_1.56.0
#> [118] cli_3.6.2 locfit_1.5-9.9 compiler_4.4.0
#> [121] Rsamtools_2.20.0 rlang_1.1.3 crayon_1.5.2
#> [124] ggsignif_0.6.4 labeling_0.4.3 interp_1.1-6
#> [127] plyr_1.8.9 stringi_1.8.3 deldir_2.0-4
#> [130] BiocParallel_1.38.0 assertthat_0.2.1 munsell_0.5.1
#> [133] Biostrings_2.72.0 lazyeval_0.2.2 Matrix_1.7-0
#> [136] BSgenome_1.72.0 hms_1.1.3 bit64_4.0.5
#> [139] ggplot2_3.5.1 KEGGREST_1.44.0 statmod_1.5.0
#> [142] highr_0.10 broom_1.0.5 memoise_2.0.1
#> [145] bslib_0.7.0 fastmatch_1.1-4 gwasrapidd_0.99.17
#> [148] bit_4.0.5 readxl_1.4.3 MonoPoly_0.3-10
Boyle, Elizabeth I., Shuai Weng, Jeremy Gollub, Heng Jin, David Botstein, J. Michael Cherry, and Gavin Sherlock. 2004. “GO::TermFinder—Open Source Software for Accessing Gene Ontology Information and Finding Significantly Enriched Gene Ontology Terms Associated with a List of Genes.” Bioinformatics 20 (18): 3710–5. https://doi.org/10.1093/bioinformatics/bth456.
Carvalho, Benilton S., and Rafael A. Irizarry. 2010. “A Framework for Oligonucleotide Microarray Preprocessing.” Bioinformatics 26 (19): 2363–7. https://doi.org/10.1093/bioinformatics/btq431.
Hauschild, Anne-Christin, Chiara Pastrello, Gitta Kirana Anindya Ekaputeri, Dylan Bethune-Waddell, Mark Abovsky, Zuhaib Ahmed, Max Kotlyar, Richard Lu, and Igor Jurisica. 2023. “MirDIP 5.2: Tissue Context Annotation and Novel microRNA Curation.” Nucleic Acids Research 51 (D1): D217–D225. https://doi.org/10.1093/nar/gkac1070.
Hong, Guini, Wenjing Zhang, Hongdong Li, Xiaopei Shen, and Zheng Guo. 2014. “Separate Enrichment Analysis of Pathways for up- and Downregulated Genes.” Journal of the Royal Society Interface 11 (92): 20130950. https://doi.org/10.1098/rsif.2013.0950.
Huang, Hsi-Yuan, Yang-Chi-Dung Lin, Shidong Cui, Yixian Huang, Yun Tang, Jiatong Xu, Jiayang Bao, et al. 2022. “miRTarBase Update 2022: An Informative Resource for Experimentally Validated miRNA–Target Interactions.” Nucleic Acids Research 50 (D1): D222–D230. https://doi.org/10.1093/nar/gkab1079.
Ramos, Marcel, Lucas Schiffer, Angela Re, Rimsha Azhar, Azfar Basunia, Carmen Rodriguez, Tiffany Chan, et al. 2017. “Software for the Integration of Multiomics Experiments in Bioconductor.” Cancer Research 77 (21): e39–e42. https://doi.org/10.1158/0008-5472.CAN-17-0344.
Riesco-Eizaguirre, Garcilaso, León Wert-Lamas, Javier Perales-Patón, Ana Sastre-Perona, Lara P. Fernández, and Pilar Santisteban. 2015. “The miR-146b-3p/PAX8/NIS Regulatory Circuit Modulates the Differentiation Phenotype and Function of Thyroid Cells During Carcinogenesis.” Cancer Research 75 (19): 4119–30. https://doi.org/10.1158/0008-5472.CAN-14-3547.
Subramanian, Aravind, Pablo Tamayo, Vamsi K. Mootha, Sayan Mukherjee, Benjamin L. Ebert, Michael A. Gillette, Amanda Paulovich, et al. 2005. “Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles.” Proceedings of the National Academy of Sciences 102 (43): 15545–50. https://doi.org/10.1073/pnas.0506580102.
Tokar, Tomas, Chiara Pastrello, Andrea E M Rossos, Mark Abovsky, Anne-Christin Hauschild, Mike Tsay, Richard Lu, and Igor Jurisica. 2018. “mirDIP 4.1—Integrative Database of Human microRNA Target Predictions.” Nucleic Acids Research 46 (D1): D360–D370. https://doi.org/10.1093/nar/gkx1144.
Wu, Di, and Gordon K. Smyth. 2012. “Camera: A Competitive Gene Set Test Accounting for Inter-Gene Correlation.” Nucleic Acids Research 40 (17): e133. https://doi.org/10.1093/nar/gks461.