signifinder vignette

Stefania Pirrotta (1,*) and Enrica Calura (1,**)

(1) Biology Department, University of Padova, Italy

*stefania.pirrotta@phd.unipd.it
**enrica.calura@unipd.it

7 May 2024

Abstract

signifinder offers a rapid way to apply and explore gene expression tumor signatures from the literature. It allows users to compute signature scores on their own dataset and supports the exploration of the scores by providing functions to visualize single or multiple signatures. Currently, signifinder contains more than 60 distinct signatures collected from the literature, relating to multiple tumors and multiple cancer processes.

Package

signifinder 1.6.0

1 Introduction

In cancer studies, many works propose transcriptional signatures as indicators of cancer processes, because they can reveal ongoing tumor activities and can be used for patient stratification. For these reasons, they are considered potentially useful to guide therapeutic decisions and to monitor interventions. Moreover, transcriptional signatures derived from RNA-seq experiments are also used to assess the complex relations between the tumor and its microenvironment. In recent years, new technologies for transcriptome profiling (single-cell RNA-seq and spatial transcriptomics) have highlighted the highly heterogeneous behaviour of this disease and, as a result, the need to dissect its complexity.

Each of these signatures has a specific gene set (and possibly a set of coefficients that weight the contribution of each gene) whose expression levels are combined into a single-sample score, and each signature defines its own method for computing that score. Despite much evidence that computational implementations improve data applicability and dissemination, the vast majority of signatures in the literature are not published along with computational code, and only a few of them have been implemented in software. Virtuous examples are the R package consensusOV, dedicated to the TCGA ovarian cancer signature, and the R package genefu, which hosts some of the most popular breast cancer signatures.

signifinder provides an easy and fast computation of several published signatures. First, users can browse all the signatures collected so far in the package, together with all the relevant information and a description of how to properly interpret the scores. Then, users can decide which signatures they want to compute on their dataset. To be easily integrated into expression data analysis pipelines, signifinder works with the Bioconductor data structures (SummarizedExperiment, SingleCellExperiment and SpatialExperiment).

In addition, several visualization functions are implemented to display the scores. These help in interpreting the results: users can not only browse single signatures independently but also compare them with each other.

2 Installation

To install this package:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("signifinder")

3 Criteria for signature inclusion

The criteria for the inclusion of the signatures are: (i) signatures should address cancer-related topics, and be developed and used on cancer samples; (ii) signatures must exclusively use transcriptomic data; (iii) the original paper must state the gene list used for the signature definition, where all genes have an official gene symbol (HUGO Gene Nomenclature Committee) or an unambiguous translation (genes without an official gene symbol are removed); (iv) the method to calculate the score must be unambiguously described. While it may never be possible to include all cancer signatures proposed in the literature, our package makes it easy to add new signatures (by us or by others via “pull requests”, see Adding new signatures).

4 How to use signifinder

4.1 Input expression data

The input expression dataset must contain normalized RNA-seq counts (or a normalized data matrix from microarrays) of bulk, single-cell or spatial transcriptomics data. It can be provided as a matrix, a data frame or a SummarizedExperiment (or, respectively, a SingleCellExperiment/SpatialExperiment) where rows correspond to genes and columns correspond to samples. In the latter case, the assay containing the normalized values should be named “norm_expr” (users can choose another name, but it must then be specified in the whichAssay argument). Regardless of the input class, the output is a SummarizedExperiment (SingleCellExperiment/SpatialExperiment) with the computed scores stored in the colData.

Gene IDs in the input data can be gene symbols, NCBI Entrez or Ensembl gene IDs. Users must specify which of the three identifiers they use (SYMBOL, ENTREZID or ENSEMBL) so that the package can convert the signature gene lists accordingly (nametype argument of the signature functions). When a signature is computed, a message reports the percentage of the original signature genes found in the input data. There is no minimum number of genes required for a signature to be computed, but a warning is given if fewer than 30% of the signature genes are found. After a signature has been calculated, it is possible to visually inspect the expression of its genes using geneHeatmapSignPlot (see Signature goodness).
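The gene-coverage message and the 30% warning can be illustrated with a small base R sketch. The helper signGenesCoverage() below is hypothetical (it is not exported by signifinder) and only mimics the behaviour described above on a plain matrix with gene symbols as row names; the package's own check may differ in detail, for example because gene IDs are converted first.

```r
# Hypothetical helper, NOT part of signifinder: report the percentage of
# signature genes found among the row names of an expression matrix and
# warn when it falls below a threshold (30% by default).
signGenesCoverage <- function(signGenes, exprMatrix, minPercent = 30) {
    signGenes <- unique(signGenes)
    found <- intersect(signGenes, rownames(exprMatrix))
    pct <- round(100 * length(found) / length(signGenes))
    message("signature is using ", pct, "% of signature genes")
    if (pct < minPercent) {
        warning("fewer than ", minPercent, "% of signature genes were found")
    }
    pct
}

# toy data: 4 genes (rows) x 3 samples (columns), values simulated
set.seed(1)
expr <- matrix(rpois(12, lambda = 10), nrow = 4,
               dimnames = list(c("VEGFA", "ADM", "NDRG1", "GAL"),
                               paste0("sample", 1:3)))

# 4 of these 5 VEGF_Hu genes are in the toy matrix, so 80% coverage
pct <- signGenesCoverage(c("VEGFA", "ADM", "NDRG1", "GAL", "DDIT4"), expr)
```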

Furthermore, the original works also specify the type of expression value (e.g. normalized values, TPM (transcripts per million), log(TPM), etc.) that should be used to compute the signature. Therefore, before computing a signature, the data may need to be converted as reported in the original work. When using signifinder, users must supply the input data as normalized counts (or normalized arrays) and, for the signatures that require it, a data transformation step is performed automatically. The transformed data matrix is included in the output as an additional assay, named after the conversion (i.e. “TPM”, “CPM” or “FPKM”). Alternatively, if the input is a SummarizedExperiment that already contains, in addition to the normalized counts, an assay of transformed data, this is used directly; to be recognized, such assays must be named “TPM”, “CPM” or “FPKM”. Finally, the included signatures were developed from either array or RNA-seq data, so it is crucial for users to specify the type of input data, “microarray” or “rnaseq” (inputType argument of the signature functions). In signifinder, signatures developed on microarray data can be applied to RNA-seq data but not vice versa, because of the required input type conversions.
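For reference, the sketch below writes out the standard definitions of CPM, FPKM and TPM in base R. It is not signifinder's internal code: the helper names and the gene lengths (in base pairs) are assumptions made for illustration, and in practice gene lengths come from an annotation source of your choice.

```r
# Standard count transformations written from their usual definitions.
# Helper names are hypothetical; genes are on rows, samples on columns.

# counts per million: divide each sample (column) by its library size
countsToCPM <- function(counts) {
    t(t(counts) / colSums(counts)) * 1e6
}

# fragments per kilobase per million: CPM divided by gene length in kb
countsToFPKM <- function(counts, geneLengthBp) {
    countsToCPM(counts) / (geneLengthBp / 1e3)
}

# transcripts per million: length-normalize first, then rescale each
# sample so that its values sum to one million
countsToTPM <- function(counts, geneLengthBp) {
    rate <- counts / (geneLengthBp / 1e3)   # reads per kilobase
    t(t(rate) / colSums(rate)) * 1e6
}

# toy counts for 3 genes x 2 samples and made-up gene lengths
counts <- matrix(c(10, 20, 70,
                   5, 5, 90), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
geneLengthBp <- c(1000, 2000, 500)

tpm <- countsToTPM(counts, geneLengthBp)
colSums(tpm)  # each sample sums to one million
```

Note the different order of operations: FPKM normalizes by library size before gene length, whereas TPM normalizes by gene length first, which is why TPM columns always sum to the same total.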

4.2 Computation of scores

In the following section, we use an example bulk expression dataset of ovarian cancer to show how to use signifinder with a standard workflow.

# loading packages
library(SummarizedExperiment)
library(signifinder)
library(dplyr)
data(ovse)
ovse

## class: SummarizedExperiment 
## dim: 3180 40 
## metadata(0):
## assays(4): norm_expr TPM CPM FPKM
## rownames(3180): ABL2 ACADM ... TMSB4Y USP9Y
## rowData names(0):
## colnames(40): sample1 sample2 ... sample39 sample40
## colData names(42): OV_subtype os ... APM_Wang ADO_Sidders

We can check all the signatures available in the package with the function availableSignatures.

availSigns <- availableSignatures()

The function returns a data frame with all the signatures included in the package and for each signature the following information:

signature: name of the signature
scoreLabel: column name(s) of scores added inside colData
functionName: name of the function to use to compute the signature
topic: general cancer topic
tumor: tumor type for which the signature was developed
tissue: tumor tissue for which the signature was developed
cellType: cell type for which the signature was developed
requiredInput: type of expression data (microarray and/or RNA-seq) with which the signature was developed
transformationStep: data transformation step performed inside the function starting from the user’s ‘normArray’ or ‘normCounts’ data
author: first author of the work in which the signature is described
reference: reference of the work
description: brief description of the signature and how to evaluate its score

	1
signature	EMT_Miow
scoreLabel	EMT_Miow_Epithelial, EMT_Miow_Mesenchymal
functionName	EMTSign
topic	epithelial to mesenchymal
tumor	ovarian cancer
tissue	ovary
cellType	bulk
requiredInput	microarray, rnaseq
transformationStep	normArray, normCounts
author	Miow
reference	Miow Q. et al. Oncogene (2015)
description	Double score obtained with ssGSEA to establish the epithelial- and the mesenchymal-like status in ovarian cancer patients.

We can also interrogate the table asking which signatures are available for a specific tissue (e.g. ovary).

ovary_signatures <- availableSignatures(tissue = "ovary", description = FALSE)

Table 1: Signatures developed for ovary collected in signifinder.
	signature	scoreLabel	functionName	topic	tumor	tissue	cellType	requiredInput	transformationStep	author	reference
1	EMT_Miow	EMT_Miow_Epithelial, EMT_Miow_Mesenchymal	EMTSign	epithelial to mesenchymal	ovarian cancer	ovary	bulk	microarray, rnaseq	normArray, normCounts	Miow	Miow Q. et al. Oncogene (2015)
5	Pyroptosis_Ye	Pyroptosis_Ye	pyroptosisSign	pyroptosis	ovarian cancer	ovary	bulk	rnaseq	FPKM	Ye	Ye Y. et al. Cell Death Discov. (2021)
9	Ferroptosis_Ye	Ferroptosis_Ye	ferroptosisSign	ferroptosis	ovarian cancer	ovary	bulk	microarray, rnaseq	normArray, FPKM	Ye	Ye Y. et al. Front. Mol. Biosci. (2021)
13	LipidMetabolism_Zheng	LipidMetabolism_Zheng	lipidMetabolismSign	metabolism	epithelial ovarian cancer	ovary	bulk	rnaseq	normCounts	Zheng	Zheng M. et al. Int. J. Mol. Sci. (2020)
15	ImmunoScore_Hao	ImmunoScore_Hao	immunoScoreSign	immune system	epithelial ovarian cancer	ovary	bulk	microarray, rnaseq	normArray, log2(FPKM+0.01)	Hao	Hao D. et al. Clin Cancer Res (2018)
17	ConsensusOV_Chen	ConsensusOV_Chen_IMR, ConsensusOV_Chen_DIF, ConsensusOV_Chen_PRO, ConsensusOV_Chen_MES	consensusOVSign	ovarian subtypes	high-grade serous ovarian carcinoma	ovary	bulk	microarray, rnaseq	normArray, normCounts	Chen	Chen G.M. et al. Clin Cancer Res (2018)
19	Matrisome_Yuzhalin	Matrisome_Yuzhalin	matrisomeSign	extracellular matrix	ovarian cystadenocarcinoma, gastric adenocarcinoma, colorectal adenocarcinoma, lung adenocarcinoma	ovary, lung, stomach, colon	bulk	microarray, rnaseq	normArray, normCounts	Yuzhalin	Yuzhalin A. et al. Br J Cancer (2018)
45	HRDS_Lu	HRDS_Lu	HRDSSign	chromosomal instability	ovarian cancer, breast cancer	ovary, breast	bulk	microarray, rnaseq	normArray, normCounts	Lu	Lu J. et al. J Mol Med (2014)
47	DNArep_Kang	DNArep_Kang	DNArepSign	chromosomal instability	serous ovarian cystadenocarcinoma	ovary	bulk	microarray, rnaseq	normArray, log2(normCount+1)	Kang	Kang J. et al. JNCI (2012)
48	IPSOV_Shen	IPSOV_Shen	IPSOVSign	immune system	ovarian cancer	ovary	bulk	microarray, rnaseq	normArray, log2(normCount+1)	Shen	Shen S. et al. EBiomed (2019)
60	LRRC15CAF_Dominguez	LRRC15CAF_Dominguez	LRRC15CAFSign	cancer associated fibroblasts	pancreatic adenocarcinoma, breast cancer, lung cancer, ovarian cancer, colon cancer, renal cancer, esophageal cancer, stomach adenocarcinoma, bladder cancer, head and neck squamous cell carcinoma	pancreas, breast, lung, ovary, colon, kidney, esophagus, stomach, bladder, head and neck	bulk	rnaseq	log2(normCounts+1)	Dominguez	Dominguez C.X. et al. Cancer Discovery (2020)
63	COXIS_Bonavita	COXIS_Bonavita	COXISSign	immune system	melanoma, bladder cancer, gastric cancer, clear cell renal cancer, ovarian cancer, cervical cancer, breast cancer (TNBC), lung cancer, head and neck squamous cell carcinoma	skin, bladder, stomach, kidney, ovary, cervix, breast, lung, head and neck	bulk	rnaseq	log2(normCounts+1)	Bonavita	Bonavita E. et al. Immunity (2020)

Once we have found a signature of interest, we can compute it with the corresponding function (indicated in the functionName field of the availableSignatures table). All signature functions require the expression data and the type of input data (inputType equal to “rnaseq” or “microarray”). The data are expected to be normalized expression values.

ovse <- ferroptosisSign(dataset = ovse, inputType = "rnaseq")

## ferroptosisSignYe is using 100% of signature genes

Some signatures are grouped in the same function by cancer topic, even if they deal with different cancer types and computation approaches. We can unambiguously select the one we are interested in by stating the first author of the signature (indicated in the author field of the availableSignatures table). For example, there are currently four different epithelial-to-mesenchymal transition (EMT) signatures implemented in the EMTSign function (“Miow”, “Mak”, “Cheng” and “Thompson”), and we can choose which one to compute with the author argument:

ovse <- EMTSign(dataset = ovse, inputType = "rnaseq", author = "Miow")

## EMTSignMiow is using 96% of epithelial signature genes

## EMTSignMiow is using 91% of mesenchymal signature genes

## Warning in .filterGenes(dataMatrix, removeConstant = removeConstant,
## removeNzConstant = removeNzConstant): 3 genes with constant values throughout
## the samples.

## [1] "Calculating ranks..."
## [1] "Calculating absolute values from ranks..."

In this way, “EMT_Miow” is computed. Regardless of the expression input type, the output of all signature functions is a SummarizedExperiment with the computed signature scores in the colData. Thus, the returned object can be passed as input to another signature function, which returns it with the new signature added to the colData.

We can also compute multiple signatures at once with the function multipleSign. The signatures to compute can be selected through the arguments tissue, tumor and/or topic. For example, below we compute all the available signatures for ovary and pan-tissue:

ovse <- multipleSign(dataset = ovse, inputType = "rnaseq",
                     tissue = c("ovary", "pan-tissue"))

## EMTSignMiow is using 96% of epithelial signature genes

## EMTSignMiow is using 91% of mesenchymal signature genes

## Warning in .filterGenes(dataMatrix, removeConstant = removeConstant,
## removeNzConstant = removeNzConstant): 3 genes with constant values throughout
## the samples.

## [1] "Calculating ranks..."
## [1] "Calculating absolute values from ranks..."

## EMTSignMak is using 96% of epithelial signature genes

## EMTSignMak is using 100% of mesenchymal signature genes

## pyroptosisSignYe is using 86% of signature genes

## ferroptosisSignYe is using 100% of signature genes

## lipidMetabolismSign is using 100% of signature genes

## hypoxiaSign is using 92% of signature genes

## immunoScoreSignHao is using 100% of signature genes

## immunoScoreSignRoh is using 100% of signature genes

## 'select()' returned 1:1 mapping between keys and columns

## Loading training data

## Training Random Forest...

## IPSSign is using 98% of signature genes

## matrisomeSign is using 100% of signature genes

## mitoticIndexSign is using 100% of signature genes

## ImmuneCytSignRooney is using 100% of signature genes

## IFNSign is using 100% of signature genes

## expandedImmuneSign is using 100% of signature genes

## TinflamSign is using 100% of signature genes

## CINSign is using 96% of signature genes

## CINSign is using 94% of signature genes

## cellCycleSignLundberg is using 93% of signature genes

## cellCycleSignDavoli is using 100% of signature genes

## ASCSign is using 92% of signature genes

## ImmuneCytSignDavoli is using 100% of signature genes

## ChemokineSign is using 100% of signature genes

## ECMSign is using 100% of up signature genes

## ECMSign is using 93% of down signature genes

## Warning in .filterGenes(dataMatrix, removeConstant = removeConstant,
## removeNzConstant = removeNzConstant): 3 genes with constant values throughout
## the samples.

## [1] "Calculating ranks..."
## [1] "Calculating absolute values from ranks..."

## HRDSSign is using 89% of signature genes

## VEGFSign is using 100% of signature genes

## DNArepSign is using 87% of signature genes

## IPSOVSign is using 100% of signature genes

## Warning in .filterAndMapGenesAndGeneSets(param, removeConstant = FALSE, : Some
## gene sets have size one. Consider setting 'minSize > 1'.

## [1] "Calculating ranks..."
## [1] "Calculating absolute values from ranks..."

## APMSign is using 100% of signature genes

## Warning in .filterGenes(dataMatrix, removeConstant = removeConstant,
## removeNzConstant = removeNzConstant): 3 genes with constant values throughout
## the samples.

## Warning in .filterGenes(dataMatrix, removeConstant = removeConstant,
## removeNzConstant = removeNzConstant): Genes with constant values are discarded.

## ADOSign is using 100% of signature genes

## Warning in .filterGenes(dataMatrix, removeConstant = removeConstant,
## removeNzConstant = removeNzConstant): 3 genes with constant values throughout
## the samples.
## Warning in .filterGenes(dataMatrix, removeConstant = removeConstant,
## removeNzConstant = removeNzConstant): Genes with constant values are discarded.

## LRRC15CAFSign is using 100% of signature genes

## Warning in scoreSingleSamples(gdb, datasetm, methods = "ewm"): 3 row(s) removed
## from expression object (y) due to 0sd

## COXISSign is using 83% of signature genes

Below, instead, we compute all the available signatures for ovary and pan-tissue that are related to immune system activity:

ovse <- multipleSign(dataset = ovse, inputType = "rnaseq",
                     tissue = c("ovary", "pan-tissue"), 
                     topic = "immune system")

## immunoScoreSignHao is using 100% of signature genes

## immunoScoreSignRoh is using 100% of signature genes

## IPSSign is using 98% of signature genes

## ImmuneCytSignRooney is using 100% of signature genes

## IFNSign is using 100% of signature genes

## expandedImmuneSign is using 100% of signature genes

## TinflamSign is using 100% of signature genes

## ImmuneCytSignDavoli is using 100% of signature genes

## ChemokineSign is using 100% of signature genes

## IPSOVSign is using 100% of signature genes

## Warning in .filterAndMapGenesAndGeneSets(param, removeConstant = FALSE, : Some
## gene sets have size one. Consider setting 'minSize > 1'.

## [1] "Calculating ranks..."
## [1] "Calculating absolute values from ranks..."

## APMSign is using 100% of signature genes

## Warning in .filterGenes(dataMatrix, removeConstant = removeConstant,
## removeNzConstant = removeNzConstant): 3 genes with constant values throughout
## the samples.

## Warning in .filterGenes(dataMatrix, removeConstant = removeConstant,
## removeNzConstant = removeNzConstant): Genes with constant values are discarded.

## COXISSign is using 83% of signature genes
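The output above lists only immune-system signatures, which is consistent with the tissue and topic conditions being applied together. The sketch below reproduces that kind of combined selection on a toy data frame copied from a few rows of the ovary table; it is only an illustration, multipleSign performs its own selection internally, and the helper hasAnyTissue() is hypothetical.

```r
# Toy copy of a few rows of the availableSignatures() table (values taken
# from the ovary table above); the tissue field is a comma-separated list.
signTable <- data.frame(
    signature = c("EMT_Miow", "ImmunoScore_Hao", "IPSOV_Shen",
                  "HRDS_Lu", "COXIS_Bonavita"),
    topic = c("epithelial to mesenchymal", "immune system", "immune system",
              "chromosomal instability", "immune system"),
    tissue = c("ovary", "ovary", "ovary", "ovary, breast",
               "skin, bladder, stomach, kidney, ovary, cervix, breast, lung, head and neck"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when any requested tissue appears in the list
hasAnyTissue <- function(tissueField, wanted) {
    vapply(strsplit(tissueField, ",\\s*"),
           function(x) any(x %in% wanted), logical(1))
}

# both conditions must hold: matching tissue AND immune system topic
keep <- hasAnyTissue(signTable$tissue, c("ovary", "pan-tissue")) &
    signTable$topic %in% "immune system"
signTable$signature[keep]
```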

Alternatively, we can state exactly the signature names using the whichSign argument.

ovse <- multipleSign(dataset = ovse, inputType = "rnaseq",
                     whichSign = c("EMT_Miow", "IPSOV_Shen"))

## EMTSignMiow is using 96% of epithelial signature genes

## EMTSignMiow is using 91% of mesenchymal signature genes

## Warning in .filterGenes(dataMatrix, removeConstant = removeConstant,
## removeNzConstant = removeNzConstant): 3 genes with constant values throughout
## the samples.

## [1] "Calculating ranks..."
## [1] "Calculating absolute values from ranks..."

## IPSOVSign is using 100% of signature genes

## Warning in .filterAndMapGenesAndGeneSets(param, removeConstant = FALSE, : Some
## gene sets have size one. Consider setting 'minSize > 1'.

## [1] "Calculating ranks..."
## [1] "Calculating absolute values from ranks..."

4.3 Signature goodness

When computing a signature on a dataset, we must always keep in mind that not all the signature genes may be present in the dataset. Also, these genes may have many zero values or other issues affecting how well a specific signature suits the dataset. We can inspect some technical parameters of a signature to evaluate its reliability for the analysed dataset. First, users can access the complete gene list of a signature with the function getSignGenes, which returns a data frame with “SYMBOL” in the first column. Some signatures also have additional columns: “coeff” for coefficients that weight the gene contributions, and “class” for a classification that divides the signature into two or more groups. A few signatures have other specific columns.

getSignGenes("VEGF_Hu")

##     SYMBOL
## 1    RRAGD
## 2    FABP5
## 3    UCHL1
## 4      GAL
## 5    PLOD1
## 6    DDIT4
## 7    VEGFA
## 8      ADM
## 9  ANGPTL4
## 10   NDRG1
## 11 SLC16A3
## 12  FLVCR2

getSignGenes("Pyroptosis_Ye")

##   SYMBOL  coeff
## 1   AIM2 -0.187
## 2  PLCG1  0.068
## 3  ELANE  0.097
## 4   PJVK -0.143
## 5  CASP3 -0.086
## 6  CASP6 -0.033
## 7  GSDMA  0.130

getSignGenes("EMT_Thompson")

##    SYMBOL       class
## 1    CDH1  epithelial
## 2    CDH3  epithelial
## 3   CLDN4  epithelial
## 4   EPCAM  epithelial
## 5    ST14  epithelial
## 6    MAL2  epithelial
## 7     VIM mesenchymal
## 8   SNAI2 mesenchymal
## 9    ZEB2 mesenchymal
## 10    FN1 mesenchymal
## 11   MMP2 mesenchymal
## 12   AGER mesenchymal
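Signatures that come with a “coeff” column are commonly scored as a weighted sum of the expression of their genes. The sketch below shows that idea with the Pyroptosis_Ye coefficients listed above and a made-up expression matrix. weightedSignScore() is a hypothetical helper; the exact formula and data scale (FPKM, according to the table above) used by pyroptosisSign follow the original publication and may differ from this plain linear combination.

```r
# Pyroptosis_Ye genes and coefficients, as returned by getSignGenes() above
signGenes <- data.frame(
    SYMBOL = c("AIM2", "PLCG1", "ELANE", "PJVK", "CASP3", "CASP6", "GSDMA"),
    coeff  = c(-0.187, 0.068, 0.097, -0.143, -0.086, -0.033, 0.130)
)

# Hypothetical helper: per-sample weighted sum of the signature genes that
# are present in the data (missing genes are simply dropped)
weightedSignScore <- function(exprMatrix, signGenes) {
    genes <- intersect(signGenes$SYMBOL, rownames(exprMatrix))
    coeff <- signGenes$coeff[match(genes, signGenes$SYMBOL)]
    # coeff has one value per row, so it is recycled down each column
    colSums(exprMatrix[genes, , drop = FALSE] * coeff)
}

# made-up expression: all genes at 1, GSDMA raised to 3 in sample2
expr <- matrix(1, nrow = 7, ncol = 2,
               dimnames = list(signGenes$SYMBOL, c("sample1", "sample2")))
expr["GSDMA", "sample2"] <- 3

scores <- weightedSignScore(expr, signGenes)
scores
```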

Second, the evaluationSignPlot function returns a multipanel plot that shows, for each signature: (i) a goodness value of the signature for the user's dataset, ranging from 0 (worst) to 100 (best), which combines the parameters shown in the other panels; (ii) the percentage of genes from the signature gene list that are actually available in the dataset; (iii) the percentage of zero values in the signature genes, for each sample; (iv) the correlation between signature scores and the sample total read counts; (v) the correlation between signature scores and the percentage of total zero values in each sample.

evaluationSignPlot(data = ovse)
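To make the quantities of panels (ii) to (v) more concrete, the sketch below computes rough analogues on a simulated count matrix in base R. The toy score (mean expression of the signature genes) and all variable names are assumptions; evaluationSignPlot's exact definitions, and how it combines them into the goodness value of panel (i), are not reproduced here.

```r
# simulated counts: 200 genes x 10 samples
set.seed(42)
expr <- matrix(rpois(200 * 10, lambda = 2), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("sample", 1:10)))

signGenes <- paste0("gene", c(1:15, 250))          # gene250 is absent
available <- intersect(signGenes, rownames(expr))
score <- colMeans(expr[available, ])               # toy per-sample score

# (ii) percentage of signature genes available in the dataset
pctGenes <- 100 * mean(signGenes %in% rownames(expr))

# (iii) percentage of zero values among the signature genes, per sample
pctZeroSign <- 100 * colMeans(expr[available, ] == 0)

# (iv) correlation between scores and total read counts per sample
corTotCounts <- cor(score, colSums(expr))

# (v) correlation between scores and the percentage of zeros per sample
corZeros <- cor(score, 100 * colMeans(expr == 0))
```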

Third, users may also be interested in visually exploring the expression values of the genes involved in a signature. In this case, we can use geneHeatmapSignPlot, which generates a heatmap of the expression values with genes on the rows and samples on the columns.

geneHeatmapSignPlot(data = ovse, whichSign = "LipidMetabolism_Zheng", 
                    logCount = TRUE)

Furthermore, the function is not restricted to a single signature: we can also plot the expression values of genes from multiple signatures and evaluate the intersections of their gene lists. Since each signature computes its score with its own method, when several signatures are plotted together the scores are transformed into z-scores, individually for each signature.

set.seed(21)
geneHeatmapSignPlot(data = ovse, whichSign = c("IFN_Ayers", "Tinflam_Ayers"), 
                    logCount = TRUE)

4.4 Visualization

4.4.1 Score distribution plot

Each computed signature can be explored with the oneSignPlot function, which visualizes both the scores and their density distribution.

oneSignPlot(data = ovse, whichSign = "Hypoxia_Buffa")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

4.4.2 Score correlation plot

To investigate the relations across multiple signatures, signifinder provides a function that shows their pairwise correlations (correlationSignPlot). The whichSign argument can be set to specify which signatures should be plotted; if it is not set, all signatures are used. Green-blue colors represent anticorrelations, while the orange-red scale is used for positive correlations. Signatures are then clustered to group together the most highly correlated ones.

sign_cor <- correlationSignPlot(data = ovse)

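# keep the signature pairs with a correlation above 0.95 (excluding the
# self-correlations equal to 1) and collect the names of those signatures
highest_correlated <- unique(unlist(
    sign_cor$data[(sign_cor$data$cor > 0.95 & sign_cor$data$cor < 1), c(1, 2)]))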
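The same selection of highly correlated signatures can be sketched in base R on a plain score matrix: compute the pairwise correlations, keep the pairs above 0.95 (excluding self-correlations) and cluster the signatures by correlation. The scores are simulated, and using 1 - correlation as the clustering distance is an assumption for illustration, not necessarily what correlationSignPlot uses.

```r
# simulated scores for 20 samples: signA and signB are near-copies
set.seed(7)
base <- rnorm(20)
scores <- cbind(
    signA = base + rnorm(20, sd = 0.05),
    signB = base + rnorm(20, sd = 0.05),
    signC = rnorm(20)
)

cm <- cor(scores)

# long table of each signature pair (upper triangle only)
idx <- which(upper.tri(cm), arr.ind = TRUE)
corPairs <- data.frame(sign1 = rownames(cm)[idx[, "row"]],
                       sign2 = colnames(cm)[idx[, "col"]],
                       cor = cm[idx],
                       stringsAsFactors = FALSE)

# names of signatures involved in at least one pair with cor > 0.95
highest <- unique(unlist(
    corPairs[corPairs$cor > 0.95 & corPairs$cor < 1, c("sign1", "sign2")]))

# hierarchical clustering with 1 - correlation as distance
hc <- hclust(as.dist(1 - cm))
hc$labels[hc$order]
```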

4.4.3 Score heatmap

We can compare scores across different signatures with the heatmapSignPlot function. Since each signature computes its score with its own method, when several signatures are plotted together the scores are transformed into z-scores, individually for each signature. The whichSign argument can be set to specify which signatures should be plotted; if it is not set, all signatures are used.

heatmapSignPlot(data = ovse)

heatmapSignPlot(data = ovse, whichSign = highest_correlated)
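The per-signature z-score transformation can be written with base R's scale(), which centres each column on its mean and divides it by its standard deviation. The sketch uses a made-up score table with samples on the rows and signatures on the columns; that heatmapSignPlot uses exactly this definition (with the sample standard deviation) is an assumption.

```r
# made-up scores for 5 samples from two signatures on very different scales
scores <- data.frame(
    sigLinear = c(-0.30, -0.10, 0.05, 0.20, 0.40),
    sigSsgsea = c(1200, 1500, 1800, 2100, 2600)
)

# z-score each signature (column) independently: (x - mean(x)) / sd(x)
zscores <- scale(as.matrix(scores))

# after scaling, every column has mean 0 and standard deviation 1
colMeans(zscores)
apply(zscores, 2, sd)
```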

Users may also be interested in seeing how all signatures behave when samples are ordered according to only one or a few of them. In this case, we can pass one or a few signatures to the clusterBySign argument, which will be used to cluster the samples. Furthermore, users can add external sample annotations to the plot or show the internal signature annotations (“signature”, “topic”, “tumor” or “tissue”).

set.seed(21)
heatmapSignPlot(data = ovse, whichSign = highest_correlated, 
                clusterBySign = paste0("ConsensusOV_Chen_", c("IMR","DIF","PRO","MES")),
                sampleAnnot = ovse$OV_subtype, signAnnot = "topic")

4.4.4 Survival plot

Using the function survivalSignPlot, we can test the association of a signature with survival. The function needs a data frame (or a matrix, as in the example below) with the patient survival data: survival times and event status, with sample names as row names. survivalSignPlot uses Kaplan-Meier curves to test whether patients with high or low values of the signature differ in survival time. Different cut points of the signature score can be set through the cutpoint argument to define the two patient groups.

mysurvData <- cbind(ovse$os, ovse$status)
rownames(mysurvData) <- rownames(colData(ovse))
head(mysurvData)

##         [,1] [,2]
## sample1   NA    0
## sample2 1720    1
## sample3  887    1
## sample4  547    1
## sample5  260    0
## sample6 1069    1

survivalSignPlot(data = ovse, survData = mysurvData, 
                 whichSign = "Pyroptosis_Ye", cutpoint = "optimal")

## Warning in geom_segment(aes(x = 0, y = max(y2), xend = max(x1), yend = max(y2)), : All aesthetics have length 1, but the data has 2 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
##   a single row.
## All aesthetics have length 1, but the data has 2 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
##   a single row.
## All aesthetics have length 1, but the data has 2 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
##   a single row.
## All aesthetics have length 1, but the data has 2 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
##   a single row.
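A minimal sketch of the same kind of analysis with the survival package: patients are split into “high” and “low” groups at a fixed cut point, Kaplan-Meier curves are fitted with survfit() and the two groups are compared with a log-rank test via survdiff(). The data are made up and the median cut point is an assumption; the “optimal” option of survivalSignPlot presumably derives the cut point from the data, which this sketch does not attempt.

```r
library(survival)  # recommended package shipped with standard R installations

# made-up data: survival time (days), event status (1 = event), score
survData <- data.frame(
    time   = c(1720, 887, 547, 260, 1069, 300, 1500, 650, 900, 400),
    status = c(1, 1, 1, 0, 1, 1, 0, 1, 0, 1),
    score  = c(-0.2, 0.1, 0.3, 0.5, -0.1, 0.6, -0.4, 0.2, -0.3, 0.4)
)

# split patients at the median score (a simple fixed cut point)
cutValue <- median(survData$score)
survData$group <- factor(ifelse(survData$score > cutValue, "high", "low"),
                         levels = c("low", "high"))

# Kaplan-Meier curves, one per group
km <- survfit(Surv(time, status) ~ group, data = survData)

# log-rank test between the two groups and its p-value
lr <- survdiff(Surv(time, status) ~ group, data = survData)
pval <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)
```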

4.4.5 Score ridgeline plot

Finally, we can draw ridgeline plots of one or multiple signatures, grouping samples by external annotations if needed. Since each signature computes its score with its own method, when several signatures are plotted together the scores are transformed into z-scores, individually for each signature.

ridgelineSignPlot(data = ovse, whichSign = highest_correlated)

## Picking joint bandwidth of 0.405

ridgelineSignPlot(data = ovse, whichSign = highest_correlated, 
                  groupByAnnot = ovse$OV_subtype)

## Picking joint bandwidth of 0.24

## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_density_ridges()`).

4.5 Other examples

Here, we present the results obtained with two other example datasets: one for single-cell transcriptomics and one for spatial transcriptomics.

4.5.1 Single-cell transcriptomics

We report here the results obtained using the single-cell transcriptomics dataset from a glioblastoma tissue of Darmanis et al. (GEO ID: GSE84465; Darmanis, S. et al. Single-Cell RNA-Seq Analysis of Infiltrating Neoplastic Cells at the Migrating Front of Human Glioblastoma. Cell Rep 21, 1399–1410 (2017)). We focused on the cells from patient BT_S2 that were labeled as immune cells, neoplastic cells or oligodendrocyte precursor cells (OPC), coming from both the core and the periphery of the tumor.

We computed all the signatures for “brain” and “pan-tissue” available in signifinder by running multipleSign with inputType = "rnaseq" and tissue = c("brain", "pan-tissue"). Then, we performed a t-SNE and plotted the signature scores. The resulting figures show the ridge plot and the t-SNE colored by some of the computed signatures, for all cells and separately for each cell type.

4.5.2 Spatial transcriptomics

We used the spatial transcriptomics dataset “Human Breast Cancer: Ductal Carcinoma In Situ, Invasive Carcinoma (FFPE)”, included in the 10x Genomics Visium Spatial Gene Expression data, from the 10x website (https://www.10xgenomics.com). The spots were annotated based on a manual annotation of the tissue area. We computed all the signatures for “breast” and “pan-tissue” cancers available in signifinder by running multipleSign with inputType = "rnaseq" and tissue = c("breast", "pan-tissue"). The resulting figures show the ridge plot and the spatial distribution of the scores obtained for the Hypoxia_Buffa signature.

5 Adding new signatures

Please contact us if you have a gene expression signature that you would like to see added to the signifinder package. You can email us (stefania.pirrotta@phd.unipd.it) or open an issue at https://github.com/CaluraLab/signifinder/issues. The more difficult or custom the implementation, the better, as its inclusion in this package will provide more value for other users in the R/Bioconductor community.

6 Session info

Here is the output of sessionInfo() on the system on which this document was compiled.

sessionInfo()

## R version 4.4.0 (2024-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.19-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] dplyr_1.1.4                 signifinder_1.6.0          
##  [3] SummarizedExperiment_1.34.0 Biobase_2.64.0             
##  [5] GenomicRanges_1.56.0        GenomeInfoDb_1.40.0        
##  [7] IRanges_2.38.0              S4Vectors_0.42.0           
##  [9] BiocGenerics_0.50.0         MatrixGenerics_1.16.0      
## [11] matrixStats_1.3.0           BiocStyle_2.32.0           
## 
## loaded via a namespace (and not attached):
##   [1] ggtext_0.1.2                            
##   [2] ProtGenerics_1.36.0                     
##   [3] GSVA_1.52.1                             
##   [4] bitops_1.0-7                            
##   [5] lubridate_1.9.3                         
##   [6] httr_1.4.7                              
##   [7] RColorBrewer_1.1-3                      
##   [8] doParallel_1.0.17                       
##   [9] tools_4.4.0                             
##  [10] backports_1.4.1                         
##  [11] utf8_1.2.4                              
##  [12] R6_2.5.1                                
##  [13] HDF5Array_1.32.0                        
##  [14] lazyeval_0.2.2                          
##  [15] mgcv_1.9-1                              
##  [16] rhdf5filters_1.16.0                     
##  [17] GetoptLong_1.0.5                        
##  [18] withr_3.0.0                             
##  [19] gridExtra_2.3                           
##  [20] cli_3.6.2                               
##  [21] Cairo_1.6-2                             
##  [22] exactRankTests_0.8-35                   
##  [23] labeling_0.4.3                          
##  [24] sass_0.4.9                              
##  [25] mvtnorm_1.2-4                           
##  [26] survMisc_0.5.6                          
##  [27] readr_2.1.5                             
##  [28] randomForest_4.7-1.1                    
##  [29] ggridges_0.5.6                          
##  [30] commonmark_1.9.1                        
##  [31] Rsamtools_2.20.0                        
##  [32] systemfonts_1.0.6                       
##  [33] svglite_2.1.3                           
##  [34] maps_3.4.2                              
##  [35] limma_3.60.0                            
##  [36] rstudioapi_0.16.0                       
##  [37] RSQLite_2.3.6                           
##  [38] generics_0.1.3                          
##  [39] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 
##  [40] shape_1.4.6.1                           
##  [41] BiocIO_1.14.0                           
##  [42] consensusOV_1.26.0                      
##  [43] car_3.1-2                               
##  [44] Matrix_1.7-0                            
##  [45] interp_1.1-6                            
##  [46] fansi_1.0.6                             
##  [47] abind_1.4-5                             
##  [48] lifecycle_1.0.4                         
##  [49] yaml_2.3.8                              
##  [50] edgeR_4.2.0                             
##  [51] carData_3.0-5                           
##  [52] rhdf5_2.48.0                            
##  [53] SparseArray_1.4.3                       
##  [54] grid_4.4.0                              
##  [55] blob_1.2.4                              
##  [56] crayon_1.5.2                            
##  [57] lattice_0.22-6                          
##  [58] beachmat_2.20.0                         
##  [59] cowplot_1.1.3                           
##  [60] GenomicFeatures_1.56.0                  
##  [61] annotate_1.82.0                         
##  [62] KEGGREST_1.44.0                         
##  [63] mapproj_1.2.11                          
##  [64] magick_2.8.3                            
##  [65] pillar_1.9.0                            
##  [66] knitr_1.46                              
##  [67] ComplexHeatmap_2.20.0                   
##  [68] rjson_0.2.21                            
##  [69] codetools_0.2-20                        
##  [70] glue_1.7.0                              
##  [71] data.table_1.15.4                       
##  [72] vctrs_0.6.5                             
##  [73] png_0.1-8                               
##  [74] gtable_0.3.5                            
##  [75] assertthat_0.2.1                        
##  [76] cachem_1.0.8                            
##  [77] xfun_0.43                               
##  [78] S4Arrays_1.4.0                          
##  [79] survival_3.6-4                          
##  [80] SingleCellExperiment_1.26.0             
##  [81] iterators_1.0.14                        
##  [82] tinytex_0.51                            
##  [83] KMsurv_0.1-5                            
##  [84] statmod_1.5.0                           
##  [85] nlme_3.1-164                            
##  [86] bit64_4.0.5                             
##  [87] openair_2.18-2                          
##  [88] bslib_0.7.0                             
##  [89] maxstat_0.7-25                          
##  [90] irlba_2.3.5.1                           
##  [91] colorspace_2.1-0                        
##  [92] DBI_1.2.2                               
##  [93] tidyselect_1.2.1                        
##  [94] bit_4.0.5                               
##  [95] compiler_4.4.0                          
##  [96] curl_5.2.1                              
##  [97] ontologyIndex_2.12                      
##  [98] graph_1.82.0                            
##  [99] xml2_1.3.6                              
## [100] DelayedArray_0.30.1                     
## [101] plotly_4.10.4                           
## [102] bookdown_0.39                           
## [103] rtracklayer_1.64.0                      
## [104] checkmate_2.3.1                         
## [105] scales_1.3.0                            
## [106] hexbin_1.28.3                           
## [107] stringr_1.5.1                           
## [108] SpatialExperiment_1.14.0                
## [109] digest_0.6.35                           
## [110] rmarkdown_2.26                          
## [111] sparrow_1.10.1                          
## [112] XVector_0.44.0                          
## [113] htmltools_0.5.8.1                       
## [114] pkgconfig_2.0.3                         
## [115] jpeg_0.1-10                             
## [116] DGEobj.utils_1.0.6                      
## [117] sparseMatrixStats_1.16.0                
## [118] highr_0.10                              
## [119] fastmap_1.1.1                           
## [120] ensembldb_2.28.0                        
## [121] rlang_1.1.3                             
## [122] GlobalOptions_0.1.2                     
## [123] htmlwidgets_1.6.4                       
## [124] UCSC.utils_1.0.0                        
## [125] DelayedMatrixStats_1.26.0               
## [126] farver_2.1.1                            
## [127] jquerylib_0.1.4                         
## [128] zoo_1.8-12                              
## [129] jsonlite_1.8.8                          
## [130] BiocParallel_1.38.0                     
## [131] BiocSingular_1.20.0                     
## [132] RCurl_1.98-1.14                         
## [133] magrittr_2.0.3                          
## [134] kableExtra_1.4.0                        
## [135] GenomeInfoDbData_1.2.12                 
## [136] patchwork_1.2.0                         
## [137] Rhdf5lib_1.26.0                         
## [138] munsell_0.5.1                           
## [139] Rcpp_1.0.12                             
## [140] babelgene_22.9                          
## [141] viridis_0.6.5                           
## [142] stringi_1.8.4                           
## [143] zlibbioc_1.50.0                         
## [144] MASS_7.3-60.2                           
## [145] plyr_1.8.9                              
## [146] org.Hs.eg.db_3.19.1                     
## [147] parallel_4.4.0                          
## [148] deldir_2.0-4                            
## [149] survminer_0.4.9                         
## [150] Biostrings_2.72.0                       
## [151] splines_4.4.0                           
## [152] gridtext_0.1.5                          
## [153] hms_1.1.3                               
## [154] circlize_0.4.16                         
## [155] locfit_1.5-9.9                          
## [156] ggpubr_0.6.0                            
## [157] markdown_1.12                           
## [158] ggsignif_0.6.4                          
## [159] ScaledMatrix_1.12.0                     
## [160] DGEobj_1.1.2                            
## [161] XML_3.99-0.16.1                         
## [162] evaluate_0.23                           
## [163] latticeExtra_0.6-30                     
## [164] BiocManager_1.30.23                     
## [165] tzdb_0.4.0                              
## [166] foreach_1.5.2                           
## [167] tidyr_1.3.1                             
## [168] purrr_1.0.2                             
## [169] km.ci_0.5-6                             
## [170] clue_0.3-65                             
## [171] ggplot2_3.5.1                           
## [172] rsvd_1.0.5                              
## [173] broom_1.0.5                             
## [174] xtable_1.8-4                            
## [175] restfulr_0.0.15                         
## [176] AnnotationFilter_1.28.0                 
## [177] rstatix_0.7.2                           
## [178] viridisLite_0.4.2                       
## [179] TxDb.Hsapiens.UCSC.hg38.knownGene_3.18.0
## [180] tibble_3.2.1                            
## [181] memoise_2.0.1                           
## [182] AnnotationDbi_1.66.0                    
## [183] GenomicAlignments_1.40.0                
## [184] cluster_2.1.6                           
## [185] timechange_0.3.0                        
## [186] BiocSet_1.18.0                          
## [187] GSEABase_1.66.0