1. Introduction
MicrobiotaProcess
is an R package for analysis, visualization and biomarker discovery of microbial datasets. It supports the import of microbiome census data, calculating alpha index and provides functions to visualize rarefaction curves. Moreover, it also supports visualizing the abundance of taxonomy of samples. And It also provides functions to perform the PCA
, PCoA
and hierarchical cluster analysis. In addition, MicrobiotaProcess
also provides a method for the biomarker discovery of metagenome or other datasets.
2. MicrobiotaProcess
profiling
2.1 import function
MicrobiotaProcess
has an feature to import phylogenetic sequencing data from dada2(Callahan et al. 2016) and qiime2(Bolyen et al. 2019) taxonomic clustering pipelines. The resulting object after import is phyloseq object(McMurdie and Holmes 2013)
# import data from dada2 pipeline.
seqtabfile <- system.file("extdata", "seqtab.nochim.rds", package="MicrobiotaProcess")
seqtab <- readRDS(seqtabfile)
taxafile <- system.file("extdata", "taxa_tab.rds",package="MicrobiotaProcess")
seqtab <- readRDS(seqtabfile)
taxa <- readRDS(taxafile)
# the seqtab and taxa are output of dada2
sampleda <- system.file("extdata", "mouse.time.dada2.txt", package="MicrobiotaProcess")
ps_dada2 <- import_dada2(seqtab=seqtab, taxatab=taxa, sampleda=sampleda)
ps_dada2
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 232 taxa and 19 samples ]
## sample_data() Sample Data: [ 19 samples by 2 sample variables ]
## tax_table() Taxonomy Table: [ 232 taxa by 7 taxonomic ranks ]
## refseq() DNAStringSet: [ 232 reference sequences ]
# import data from qiime2 pipeline
otuqzafile <- system.file("extdata", "table.qza", package="MicrobiotaProcess")
taxaqzafile <- system.file("extdata", "taxa.qza", package="MicrobiotaProcess")
mapfile <- system.file("extdata", "metadata_qza.txt", package="MicrobiotaProcess")
ps_qiime2 <- import_qiime2(otuqza=otuqzafile, taxaqza=taxaqzafile, mapfilename=mapfile)
ps_qiime2
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 138 taxa and 87 samples ]
## sample_data() Sample Data: [ 87 samples by 23 sample variables ]
## tax_table() Taxonomy Table: [ 138 taxa by 8 taxonomic ranks ]
## refseq() DNAStringSet: [ 138 reference sequences ]
2.2 Rarefaction visualization
Rarefaction, based on sampling technique, was used to compensate for the effect of sample size on the number of units observed in a sample(Siegel 2004). MicrobiotaProcess
provided ggrarecurve
to plot the curves, based on rrarefy
of vegan(Oksanen et al. 2019).
2.3 calculate alpha index and visualization
MicrobiotaProcess
provides get_alphaindex
to calculate alpha index and the ggbox
to visualize the result
## Observe Chao1 ACE Shannon Simpson J sample time
## F3D0 102 102.10000 102.57080 3.822501 0.9633082 0.8264916 F3D0 Early
## F3D1 99 99.14286 99.48498 3.991029 0.9709703 0.8685363 F3D1 Early
## F3D141 74 74.00000 74.00000 3.463816 0.9517886 0.8047779 F3D141 Late
## F3D142 48 48.00000 48.00000 3.116764 0.9386549 0.8051155 F3D142 Late
## F3D143 56 56.00000 56.00000 3.292717 0.9464422 0.8179949 F3D143 Late
## F3D144 47 47.00000 47.00000 2.989304 0.9306528 0.7764129 F3D144 Late
2.4 The visualization of taxonomy abundance
MicrobiotaProcess
presents the ggbartax
for the visualization of composition of microbial communities.
# relative abundance
otubar <- ggbartax(obj=ps_dada2) +
xlab(NULL) +
ylab("relative abundance (%)")
otubar
If you want to get the abundance of specific levels of class, You can use get_taxadf
then use ggbartax
to visualize.
phytax <- get_taxadf(obj=ps_dada2, taxlevel=2)
phybar <- ggbartax(obj=phytax) +
xlab(NULL) + ylab("relative abundance (%)")
phybar
Moreover, the abundance (count) of taxonomy also can be visualized by setting count to TRUE, and the facet of plot can be showed by setting the facetNames.
2.5 PCA and PCoA analysis
PCA
(Principal component analysis) and PCoA
(Principal Coordinate Analysis) are general statistical procedures to compare groups of samples. And PCoA
can based on the phylogenetic or count-based distance metrics, such as Bray-Curtis
, Jaccard
, Unweighted-UniFrac
and weighted-UniFrac
. MicrobiotaProcess
presents the get_pca
, get_pcoa
and ggordpoint
for the analysis.
pcares <- get_pca(obj=ps_dada2, method="hellinger")
# Visulizing the result
pcaplot <- ggordpoint(obj=pcares, biplot=TRUE, speciesannot=TRUE,
factorNames=c("time"), ellipse=TRUE) +
scale_colour_manual(values=c("#2874C5", "#EABF00")) +
scale_fill_manual(values=c("#2874C5", "#EABF00"))
pcaplot
pcoares <- get_pcoa(obj=ps_dada2, distmethod="euclidean", method="hellinger")
# Visualizing the result
pcoaplot <- ggordpoint(obj=pcoares, biplot=TRUE, speciesannot=TRUE,
factorNames=c("time"), ellipse=TRUE) +
scale_colour_manual(values=c("#2874C5", "#EABF00")) +
scale_fill_manual(values=c("#2874C5", "#EABF00"))
pcoaplot
2.6 Hierarchical cluster analysis
beta diversity
metrics can assess the differences between microbial communities. It can be visualized with PCA
or PCoA
, this can also be visualized with hierarchical clustering. MicrobiotaProcess
also implements the analysis based on ggtree(Yu et al. 2017).
hcsample <- get_clust(obj=ps_dada2, distmethod="euclidean",
method="hellinger", hclustmethod="average")
# rectangular layout
clustplot1 <- ggclust(obj=hcsample,
layout = "rectangular",
pointsize=1,
fontsize=0,
factorNames=c("time")) +
scale_color_manual(values=c("#2874C5", "#EABF00")) +
theme_tree2(legend.position="right",
plot.title = element_text(face="bold", lineheight=25,hjust=0.5))
clustplot1
3. Need helps?
If you have questions/issues, please visit github issue tracker.
4. Session information
Here is the output of sessionInfo() on the system on which this document was compiled:
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] MicrobiotaProcess_1.2.2 tidytree_0.3.3 treeio_1.14.3
## [4] ggtree_2.4.1 phyloseq_1.34.0 forcats_0.5.1
## [7] stringr_1.4.0 dplyr_1.0.5 purrr_0.3.4
## [10] readr_1.4.0 tidyr_1.1.3 tibble_3.1.0
## [13] tidyverse_1.3.1 DT_0.18 ggplot2_3.3.3
##
## loaded via a namespace (and not attached):
## [1] TH.data_1.0-10 colorspace_2.0-0 ggsignif_0.6.1
## [4] modeltools_0.2-23 ellipsis_0.3.1 XVector_0.30.0
## [7] fs_1.5.0 aplot_0.0.6 rstudioapi_0.13
## [10] farver_2.1.0 ggrepel_0.9.1 mvtnorm_1.1-1
## [13] fansi_0.4.2 lubridate_1.7.10 coin_1.4-1
## [16] xml2_1.3.2 codetools_0.2-18 splines_4.0.5
## [19] libcoin_1.0-8 knitr_1.32 ade4_1.7-16
## [22] jsonlite_1.7.2 broom_0.7.6 cluster_2.1.2
## [25] dbplyr_2.1.1 BiocManager_1.30.12 compiler_4.0.5
## [28] httr_1.4.2 rvcheck_0.1.8 backports_1.2.1
## [31] assertthat_0.2.1 Matrix_1.3-2 lazyeval_0.2.2
## [34] cli_2.4.0 htmltools_0.5.1.1 prettyunits_1.1.1
## [37] tools_4.0.5 igraph_1.2.6 gtable_0.3.0
## [40] glue_1.4.2 reshape2_1.4.4 Rcpp_1.0.6
## [43] Biobase_2.50.0 cellranger_1.1.0 jquerylib_0.1.3
## [46] vctrs_0.3.7 Biostrings_2.58.0 rhdf5filters_1.2.0
## [49] multtest_2.46.0 ape_5.4-1 nlme_3.1-152
## [52] iterators_1.0.13 xfun_0.22 ps_1.6.0
## [55] Rmisc_1.5 rvest_1.0.0 lifecycle_1.0.0
## [58] gtools_3.8.2 zoo_1.8-9 zlibbioc_1.36.0
## [61] MASS_7.3-53.1 scales_1.1.1 hms_1.0.0
## [64] sandwich_3.0-0 parallel_4.0.5 biomformat_1.18.0
## [67] rhdf5_2.34.0 yaml_2.2.1 gridExtra_2.3
## [70] sass_0.3.1 reshape_0.8.8 stringi_1.5.3
## [73] highr_0.9 S4Vectors_0.28.1 foreach_1.5.1
## [76] permute_0.9-5 ggstar_1.0.2 BiocGenerics_0.36.1
## [79] matrixStats_0.58.0 rlang_0.4.10 pkgconfig_2.0.3
## [82] evaluate_0.14 lattice_0.20-41 Rhdf5lib_1.12.1
## [85] labeling_0.4.2 patchwork_1.1.1 htmlwidgets_1.5.3
## [88] tidyselect_1.1.0 plyr_1.8.6 magrittr_2.0.1
## [91] R6_2.5.0 IRanges_2.24.1 generics_0.1.0
## [94] multcomp_1.4-16 DBI_1.1.1 pillar_1.6.0
## [97] haven_2.4.0 withr_2.4.1 mgcv_1.8-34
## [100] prettydoc_0.4.1 survival_3.2-10 modelr_0.1.8
## [103] crayon_1.4.1 utf8_1.2.1 rmarkdown_2.7
## [106] progress_1.2.2 grid_4.0.5 readxl_1.3.1
## [109] data.table_1.14.0 vegan_2.5-7 reprex_2.0.0
## [112] digest_0.6.27 stats4_4.0.5 munsell_0.5.0
## [115] bslib_0.2.4
5. References
Bolyen, Evan, Jai Ram Rideout, Matthew R Dillon, Nicholas A Bokulich, Christian C Abnet, Gabriel A Al-Ghalith, Harriet Alexander, et al. 2019. “Reproducible, Interactive, Scalable and Extensible Microbiome Data Science Using Qiime 2.” Nature Biotechnology 37 (8): 852–57. https://doi.org/10.1038/s41587-019-0209-9.
Callahan, Benjamin J, Paul J McMurdie, Michael J Rosen, Andrew W Han, Amy Jo A Johnson, and Susan P Holmes. 2016. “DADA2: High-Resolution Sample Inference from Illumina Amplicon Data.” Nature Methods 13 (7): 581. https://doi.org/10.1038/nmeth.3869.
McMurdie, Paul J., and Susan Holmes. 2013. “Phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data.” PLoS ONE 8 (4): e61217. https://doi.org/10.1371/journal.pone.0061217.
Oksanen, Jari, F. Guillaume Blanchet, Michael Friendly, Roeland Kindt, Pierre Legendre, Dan McGlinn, Peter R. Minchin, et al. 2019. “Vegan: Community Ecology Package.” https://CRAN.R-project.org/package=vegan.
Siegel, Andrew F. 2004. “Rarefaction Curves.” Encyclopedia of Statistical Sciences 10. https://doi.org/10.1002/0471667196.ess2195.pub2.
Yu, Guangchuang, David Smith, Huachen Zhu, Yi Guan, and Tommy Tsan-Yuk Lam. 2017. “Ggtree: An R Package for Visualization and Annotation of Phylogenetic Trees with Their Covariates and Other Associated Data.” Methods in Ecology and Evolution 8 (1): 28–36. https://doi.org/10.1111/2041-210X.12628.