MeSH (Medical Subject Headings) is the NLM (U.S. National Library of Medicine) controlled vocabulary used to manually index articles for MEDLINE/PubMed. MeSH is a comprehensive life science vocabulary. It has 19 categories, of which MeSH.db contains the following 16:
Abbreviation | Category |
---|---|
A | Anatomy |
B | Organisms |
C | Diseases |
D | Chemicals and Drugs |
E | Analytical, Diagnostic and Therapeutic Techniques and Equipment |
F | Psychiatry and Psychology |
G | Phenomena and Processes |
H | Disciplines and Occupations |
I | Anthropology, Education, Sociology and Social Phenomena |
J | Technology and Food and Beverages |
K | Humanities |
L | Information Science |
M | Persons |
N | Health Care |
V | Publication Type |
Z | Geographical Locations |
MeSH terms were associated with Entrez Gene IDs by three methods: gendoo, gene2pubmed and RBBH (Reciprocal BLAST Best Hit).
Method | How Entrez Gene IDs are mapped to MeSH IDs |
---|---|
Gendoo | Text-mining |
gene2pubmed | Manual curation by NCBI teams |
RBBH | Sequence homology with BLASTP search (E-value < 10^-50) |
meshes supports enrichment analysis (over-representation analysis and gene set enrichment analysis) of a gene list or a whole expression profile using MeSH annotation. The gendoo, gene2pubmed and RBBH data sources are all supported. Users can select the category of interest to test; all 16 categories are supported. The analysis supports the more than 70 species listed in the MeSHDb BiocView.
For algorithm details, please refer to the vignettes of the DOSE package [1].
library(meshes)
data(geneList, package="DOSE")
de <- names(geneList)[1:100]
x <- enrichMeSH(de, MeSHDb = "MeSH.Hsa.eg.db", database='gendoo', category = 'C')
head(x)
## ID Description GeneRatio BgRatio pvalue
## D043171 D043171 Chromosomal Instability 16/96 198/16528 2.794765e-14
## D000782 D000782 Aneuploidy 17/96 320/16528 3.866830e-12
## D042822 D042822 Genomic Instability 16/96 312/16528 3.007419e-11
## D012595 D012595 Scleroderma, Systemic 11/96 279/16528 6.449334e-07
## D009303 D009303 Nasopharyngeal Neoplasms 11/96 314/16528 2.049315e-06
## D019698 D019698 Hepatitis C, Chronic 11/96 317/16528 2.246856e-06
## p.adjust qvalue
## D043171 2.459394e-11 1.815127e-11
## D000782 1.701405e-09 1.255702e-09
## D042822 8.821761e-09 6.510798e-09
## D012595 1.418854e-04 1.047168e-04
## D009303 3.295389e-04 2.432123e-04
## D019698 3.295389e-04 2.432123e-04
## geneID
## D043171 4312/991/2305/1062/4605/10403/7153/55355/4751/4085/81620/332/7272/9212/1111/6790
## D000782 4312/55143/991/1062/7153/4751/79019/55839/890/983/4085/332/7272/9212/8208/1111/6790
## D042822 55143/991/1062/4605/7153/1381/9787/4751/10635/890/4085/81620/332/9212/1111/6790
## D012595 4312/6280/1062/4605/7153/3627/4283/6362/7850/3002/4321
## D009303 4312/7153/3627/6241/983/4085/5918/332/3002/4321/6790
## D019698 4312/3627/10563/6373/4283/983/6362/7850/332/3002/3620
## Count
## D043171 16
## D000782 17
## D042822 16
## D012595 11
## D009303 11
## D019698 11
In the over-representation analysis above, we used the gendoo data source and category C (Diseases).
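The p-value of a single term can be checked by hand. The sketch below assumes that the over-representation test is a one-sided hypergeometric test, and plugs in the GeneRatio (16/96) and BgRatio (198/16528) reported for the top term, D043171. The variable names are illustrative only and are not part of the meshes API.

```r
# Counts read off the first row of the enrichMeSH() output above (D043171).
k <- 16      # input genes annotated to the term (GeneRatio numerator)
n <- 96      # input genes with any annotation (GeneRatio denominator)
M <- 198     # background genes annotated to the term (BgRatio numerator)
N <- 16528   # all background genes (BgRatio denominator)

# One-sided hypergeometric test: P(X >= k) when drawing n genes out of N,
# of which M carry the annotation.
p_value <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
p_value
```

If the assumption holds, the result should be of the same order as the pvalue column for D043171. The p.adjust column then corrects such p-values for the number of terms tested.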
In the following example, we use the gene2pubmed data source and test category G (Phenomena and Processes) using GSEA.
y <- gseMeSH(geneList, MeSHDb = "MeSH.Hsa.eg.db", database = 'gene2pubmed', category = "G")
head(y)
## ID Description setSize enrichmentScore NES
## D009929 D009929 Organ Size 489 -0.3508778 -1.568321
## D009043 D009043 Motor Activity 424 -0.3321683 -1.474033
## D009119 D009119 Muscle Contraction 404 -0.3261438 -1.434361
## D050156 D050156 Adipogenesis 404 -0.3414986 -1.501890
## D001846 D001846 Bone Development 314 -0.3751600 -1.621060
## D006339 D006339 Heart Rate 321 -0.3697848 -1.601069
## pvalue p.adjust qvalues rank
## D009929 0.001215067 0.03467782 0.02522292 2309
## D009043 0.001246883 0.03467782 0.02522292 2126
## D009119 0.001290323 0.03467782 0.02522292 2781
## D050156 0.001290323 0.03467782 0.02522292 2207
## D001846 0.001298701 0.03467782 0.02522292 2100
## D006339 0.001300390 0.03467782 0.02522292 2405
## leading_edge
## D009929 tags=27%, list=18%, signal=23%
## D009043 tags=23%, list=17%, signal=20%
## D009119 tags=29%, list=22%, signal=23%
## D050156 tags=25%, list=18%, signal=21%
## D001846 tags=27%, list=17%, signal=23%
## D006339 tags=29%, list=19%, signal=24%
## core_enrichment
## D009929 154/9846/3315/6716/9732/5139/7337/5530/23001/4086/80114/6532/6416/1499/8945/7157/627/2252/22891/2908/8654/4088/27445/22846/4057/860/23286/268/2735/2104/23522/5480/51131/3082/10253/831/604/1028/182/7173/5624/8743/23047/596/9905/1548/2272/22829/948/27303/4314/196/6019/595/5021/7248/4212/2488/54820/5334/6403/2246/4803/866/5919/2308/79789/1907/7048/1831/4060/23387/2247/5468/8076/5793/3485/1733/3952/126/3778/79068/79633/6653/5244/4313/3625/10468/9201/1501/6720/2273/2099/3480/5764/6387/1471/1462/4016/2690/8817/6678/8821/5125/1191/5350/2162/5744/23541/185/367/4982/25802/4128/150/64084/3479/10451/9370/10699/125/4857/1308/2167/652/57502/4137/8614/5241
## D009043 1499/6453/8945/7157/627/408/2908/22881/27445/11132/2752/9445/2571/23621/3082/1291/2915/1543/7466/3240/3350/55304/181/3632/2169/27306/80169/9627/196/8678/8863/23284/81627/4692/5799/2259/3087/1278/1277/3953/4747/2247/6414/210/4744/5468/8835/89795/4023/8522/3485/3952/79068/8864/4313/2944/2273/2099/3480/8528/4908/56892/3339/5138/57161/4741/4306/6571/79750/4915/5744/2487/58503/347/6863/2952/5327/367/4982/4128/4059/3572/150/7060/9358/7166/3479/9254/5348/4129/9370/3708/1311/5105/4137/1408/5241
## D009119 23411/10724/4026/8502/1215/3672/7169/9455/10014/5742/10174/2150/5562/3611/4604/7070/4985/7139/3784/1760/3315/9732/72/5595/3092/6416/9759/270/6558/627/953/408/2908/7138/5563/6794/5564/3567/2104/845/3371/6548/831/3554/126393/7402/1129/7201/3350/5590/5592/7168/2149/4628/8082/5021/2318/844/79026/3790/2308/1907/253959/54795/4311/10580/1848/2281/10398/5166/50507/1012/6876/10203/83700/11167/2317/3952/3778/1009/5733/10468/3693/6253/9499/5159/3991/857/1909/6678/7041/32/8639/5350/3551/1264/2697/185/7043/3357/2205/253190/5327/25802/1634/3572/8490/3679/3479/5348/9370/9122/4629/652/7021/5241
## D050156 5595/8609/9563/27332/1499/79738/4837/7157/79960/5729/408/2908/4088/23741/6500/8038/4057/6649/5564/860/8648/10365/10253/54884/4602/8452/7474/6776/79875/596/25956/8644/80781/79923/1490/50486/7840/84162/6041/4692/2246/4208/11075/63924/5919/284119/2308/9411/54795/5950/79365/2247/5468/373/50507/6469/8553/4023/594/7350/81029/3952/79068/5733/4313/10468/10628/6720/2099/3480/11213/55893/290/6678/63895/4035/633/23414/8639/2162/165/3551/10788/185/3357/367/4982/3667/1634/4128/23024/3479/6424/9370/2167/652/8839/54829/2625/79689/10974
## D001846 8945/7157/57798/79048/627/6500/8038/860/2752/4882/3371/2915/63971/54455/3791/819/57045/596/2034/54808/80781/1280/64388/2261/4054/11059/3483/9900/26234/4734/9452/4208/4322/253461/1278/7048/51280/10903/7869/1277/3953/10516/10411/8835/79776/11167/2317/3485/3952/5274/54681/4488/10486/1009/2202/91851/2099/5764/23327/3339/8817/83716/6678/4915/633/658/54361/5744/165/5654/10631/3487/367/4982/3667/79971/1634/3479/114899/9370/652/8614/4969
## D006339 4985/7139/8929/3784/3375/154/1760/9781/5139/118/2702/6532/6416/2869/270/7157/627/2908/7138/5563/3643/1129/7779/947/1901/2034/4179/4804/64388/1621/4881/8863/5021/844/4212/11030/5797/6403/4803/84059/79789/5176/3953/5243/5468/1012/2868/5793/4023/7056/3952/5577/126/2946/3778/477/5733/4313/2944/9201/3075/9499/2273/2099/1471/857/775/5138/4306/4487/213/5350/5744/23245/2152/2697/2791/185/6863/2952/5327/80206/9607/3572/150/3479/2006/55259/9370/125/652/55351
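All six enrichmentScore values above are negative, which means that the members of those sets concentrate towards the bottom of the ranked gene list. The running-sum statistic behind such a score can be sketched as follows. This is a simplified illustration on a made-up ranked list and a hypothetical gene set, not the implementation used by gseMeSH, and it leaves out the permutation step that produces NES and the p-values.

```r
# Toy ranked list: names are gene IDs, values are the ranking statistic,
# sorted in decreasing order (the same shape as geneList above).
stats <- c(g1 = 3.2, g2 = 2.5, g3 = 1.1, g4 = 0.4,
           g5 = -0.3, g6 = -1.0, g7 = -1.8, g8 = -2.6)
gene_set <- c("g6", "g7", "g8")  # hypothetical gene set

enrichment_score <- function(stats, gene_set, exponent = 1) {
  in_set <- names(stats) %in% gene_set
  weights <- abs(stats)^exponent
  # Walk down the list: step up at set members (weighted by their statistic),
  # step down by a constant amount at every other gene.
  hit_step <- ifelse(in_set, weights / sum(weights[in_set]), 0)
  miss_step <- ifelse(in_set, 0, 1 / sum(!in_set))
  running <- cumsum(hit_step - miss_step)
  # The score is the largest deviation of the running sum from zero.
  running[which.max(abs(running))]
}

es <- enrichment_score(stats, gene_set)
es
```

Because the three set members sit at the very end of the toy list, the running sum only decreases until it reaches them, so the score is negative.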
Users can apply the visualization methods implemented in DOSE (barplot, dotplot, cnetplot, enrichMap, upsetplot and gseaplot) to these enrichment results. These plots make the enriched results much easier to interpret.
dotplot(x)
gseaplot(y, y[1,1], title=y[1,2])
meshes implements four IC-based methods (Resnik [2], Jiang [3], Lin [4] and Schlicker [5]) and one graph-structure-based method (Wang [6]). For algorithm details, please refer to the vignette of the GOSemSim package [7].
The meshSim function is designed to measure semantic similarity between two MeSH term vectors.
library(meshes)
## hsamd <- meshdata("MeSH.Hsa.eg.db", category='A', computeIC=TRUE, database="gendoo")
data(hsamd)
meshSim("D000009", "D009130", semData=hsamd, measure="Resnik")
## [1] 0.2910261
meshSim("D000009", "D009130", semData=hsamd, measure="Rel")
## [1] 0.521396
meshSim("D000009", "D009130", semData=hsamd, measure="Jiang")
## [1] 0.4914785
meshSim("D000009", "D009130", semData=hsamd, measure="Wang")
## [1] 0.5557103
meshSim(c("D001369", "D002462"), c("D017629", "D002890", "D008928"), semData=hsamd, measure="Wang")
## D017629 D002890 D008928
## D001369 0.2886598 0.1923711 0.2193326
## D002462 0.6521739 0.2381925 0.2809552
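To make the IC-based measures more concrete, here is a minimal sketch on a made-up five-term hierarchy. It uses the textbook definitions (information content as the negative log of a term's annotation probability, and the most informative common ancestor, MICA); the hierarchy, the probabilities and the function names are all assumptions for illustration. The values returned by meshSim will differ, since packages typically rescale the information content and use real annotation data.

```r
# Hypothetical toy hierarchy (child -> parents); these are not real MeSH terms.
parents <- list(root = character(0), A = "root", B = "root",
                A1 = "A", A2 = "A")
# Assumed probability that a gene is annotated to a term or its descendants.
p <- c(root = 1, A = 0.4, B = 0.6, A1 = 0.1, A2 = 0.3)
ic <- -log(p)  # information content

# A term together with all of its ancestors.
ancestors <- function(term) {
  found <- term
  frontier <- term
  while (length(frontier) > 0) {
    frontier <- unique(unlist(parents[frontier], use.names = FALSE))
    found <- union(found, frontier)
  }
  found
}

ic_similarity <- function(a, b) {
  common <- intersect(ancestors(a), ancestors(b))
  mica <- max(ic[common])  # IC of the most informative common ancestor
  total <- ic[[a]] + ic[[b]]
  lin <- if (total > 0) 2 * mica / total else 1
  c(Resnik = mica,
    Lin    = lin,
    Rel    = lin * (1 - exp(-mica)),        # exp(-mica) equals p(MICA)
    Jiang  = 1 - min(1, total - 2 * mica))  # distance clipped to [0, 1], then flipped
}

ic_similarity("A1", "A2")  # siblings: MICA is A
ic_similarity("A1", "B")   # only the root in common
```

Terms that share nothing but the root get a similarity of zero under all four measures, because the root carries no information.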
The geneSim function is designed to measure semantic similarity between two gene vectors.
geneSim("241", "251", semData=hsamd, measure="Wang", combine="BMA")
## [1] 0.487
geneSim(c("241", "251"), c("835", "5261","241", "994"), semData=hsamd, measure="Wang", combine="BMA")
## 835 5261 241 994
## 241 0.732 0.337 1.000 0.438
## 251 0.526 0.588 0.487 0.597
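Gene-level similarity is derived from term-level similarity: every pair of MeSH terms annotated to the two genes is scored, and the resulting matrix is collapsed into a single number according to combine. The sketch below assumes that "BMA" stands for the best-match average, computed as the mean of all row maxima and column maxima. The helper name is hypothetical, and the input matrix is simply the meshSim output shown earlier, reused as example data.

```r
# Term-by-term similarity matrix; the values are copied from the
# meshSim(..., measure = "Wang") output above and serve purely as example
# input (rows: terms of a hypothetical gene 1, columns: terms of gene 2).
scores <- matrix(c(0.2886598, 0.1923711, 0.2193326,
                   0.6521739, 0.2381925, 0.2809552),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("D001369", "D002462"),
                                 c("D017629", "D002890", "D008928")))

# Best-match average: each term is paired with its best match in the other
# set, and all of those best-match scores are averaged.
best_match_average <- function(scores) {
  row_best <- apply(scores, 1, max)
  col_best <- apply(scores, 2, max)
  sum(row_best, col_best) / (nrow(scores) + ncol(scores))
}

bma <- best_match_average(scores)
round(bma, 3)
```

Compared with taking the overall maximum or the plain mean of the matrix, this strategy rewards genes whose terms each have a close counterpart without being dominated by a single strong pair.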
Here is the output of sessionInfo() on the system on which this document was compiled:
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.6-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.6-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] meshes_1.4.0 DOSE_3.4.0 MeSH.db_1.9.0
## [4] MeSH.Hsa.eg.db_1.9.0 MeSHDbi_1.14.0 BiocGenerics_0.24.0
## [7] BiocStyle_2.6.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.13 plyr_1.8.4 compiler_3.4.2
## [4] tools_3.4.2 digest_0.6.12 bit_1.1-12
## [7] gtable_0.2.0 RSQLite_2.0 evaluate_0.10.1
## [10] memoise_1.1.0 tibble_1.3.4 pkgconfig_2.0.1
## [13] rlang_0.1.2 igraph_1.1.2 fastmatch_1.1-0
## [16] DBI_0.7 rvcheck_0.0.9 yaml_2.1.14
## [19] gridExtra_2.3 fgsea_1.4.0 stringr_1.2.0
## [22] knitr_1.17 S4Vectors_0.16.0 IRanges_2.12.0
## [25] grid_3.4.2 stats4_3.4.2 rprojroot_1.2
## [28] bit64_0.9-7 qvalue_2.10.0 data.table_1.10.4-3
## [31] Biobase_2.38.0 AnnotationDbi_1.40.0 BiocParallel_1.12.0
## [34] GOSemSim_2.4.0 rmarkdown_1.6 bookdown_0.5
## [37] reshape2_1.4.2 ggplot2_2.2.1 GO.db_3.4.2
## [40] DO.db_2.9 blob_1.1.0 magrittr_1.5
## [43] splines_3.4.2 scales_0.5.0 backports_1.1.1
## [46] htmltools_0.3.6 colorspace_1.3-2 labeling_0.3
## [49] stringi_1.1.5 lazyeval_0.2.1 munsell_0.4.3
1. Yu, G., Wang, L.-G., Yan, G.-R. & He, Q.-Y. DOSE: An R/Bioconductor package for disease ontology semantic and enrichment analysis. Bioinformatics 31, 608–609 (2015).
2. Resnik, P. Semantic similarity in a taxonomy: An Information-Based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research 11, 95–130 (1999).
3. Jiang, J. J. & Conrath, D. W. Semantic similarity based on corpus statistics and lexical taxonomy. Proceedings of 10th International Conference on Research In Computational Linguistics (1997). at <http://www.citebase.org/abstract?id=oai:arXiv.org:cmp-lg/9709008>
4. Lin, D. An Information-Theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning 296–304 (1998). doi:10.1.1.55.1832
5. Schlicker, A., Domingues, F. S., Rahnenfuhrer, J. & Lengauer, T. A new measure for functional similarity of gene products based on Gene Ontology. BMC Bioinformatics 7, 302 (2006).
6. Wang, J. Z., Du, Z., Payattakool, R., Yu, P. S. & Chen, C.-F. A new method to measure the semantic similarity of GO terms. Bioinformatics (Oxford, England) 23, 1274–81 (2007).
7. Yu, G. et al. GOSemSim: An R package for measuring semantic similarity among GO terms and gene products. Bioinformatics 26, 976–978 (2010).
8. Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: An R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology 16, 284–287 (2012).
9. Yu, G. & He, Q.-Y. ReactomePA: An R/Bioconductor package for Reactome pathway analysis and visualization. Mol. BioSyst. 12, 477–479 (2016).