1 Version info

R version: R version 4.0.3 (2020-10-10)
Bioconductor version: 3.12
Package version: 1.10.1

2 Abstract

srnadiff is an R package that finds differentially expressed regions from RNA-seq data at base-resolution level without relying on existing annotation. To do so, the package implements an identify-then-annotate methodology that combines two pipelines: detection of differentially expressed regions and quantification of differential expression.

3 Introduction

There is no established method for finding differentially expressed short RNAs. The most widely used method focuses on miRNAs, and only applies a standard RNA-Seq pipeline to these genes.

However, annotated tRF, siRNAs, piRNA, etc. are thus out of the scope of these analyses. Several ad hoc methods have been used, and this package implements a unifying method, finding differentially expressed genes or regions of any kind.

The srnadiff package implements two major methods to produce potential differentially expressed regions: the HMM and IR methods. Briefly, these methods identify contiguous base-pairs in the genome that present a differential expression signal; these are then grouped into genomic intervals called differentially expressed regions (DERs).

Once DERs are detected, the second step in an sRNA-diff approach is to quantify their statistical significance. To do so, reads (including fractions of reads) that overlap each expressed region are counted to arrive at a count matrix with one row per region and one column per sample. Then, this count matrix is analyzed using the standard workflow of DESeq2 for differential expression of RNA-seq data, assigning a p-value to each candidate DER.

The main functions for finding differentially expressed regions are srnadiffExp and srnadiff. The first one creates an S4 class providing the infrastructure (slots) to store the input data, method parameters, intermediate calculations and results of an sRNA-diff approach. The second one implements four methods to find candidate differentially expressed regions and quantifies the statistical significance of the regions found.

This vignette explains the basics of using srnadiff by showing an example, including advanced material for fine tuning some options. The vignette also includes a description of the methods behind the package.

3.1 Citing srnadiff

We hope that srnadiff will be useful for your research. Please use the following information to cite srnadiff and the overall approach when you publish results obtained using this package, as such citation is the main means by which the authors receive credit for their work. Thank you!

Zytnicki, M., and I. González. (2021). “srnadiff: Finding differentially expressed unannotated genomic regions from RNA-seq data.” R package version 1.10.1.

3.2 How to get help for srnadiff

Most questions about individual functions will hopefully be answered by the documentation. To get more information on any specific named function, for example srnadiff, you can bring up the documentation by typing at the R prompt:

help("srnadiff")

?srnadiff

The authors of srnadiff always appreciate receiving reports of bugs in the package functions or in the documentation. The same goes for well-considered suggestions for improvements. If you’ve run into a question that isn’t addressed by the documentation, or you’ve found a conflict between the documentation and what the software does, then there is an active community that can offer help. Send your questions or problems concerning srnadiff to the Bioconductor support site.

Please send requests for general assistance and advice to the support site, rather than to the individual authors. It is particularly critical that you provide a small reproducible example and your session information so package developers can track down the source of the error. Users posting to the support site for the first time will find it helpful to read the posting guide at the Bioconductor help page.

3.3 Quick start

A typical sRNA-diff session can be divided into three steps:

Data preparation: In this first step, a convenient R object of class srnadiffExp is created containing all the information required for the two remaining steps. The user needs to provide a vector with the full paths to the BAM files, a data.frame with sample and experimental design information and optionally annotated regions as a GRanges object.
Performing srnadiff: Using the object created in the first step, the user can run srnadiff to find potential DERs and quantify their statistical significance.
Visualization of the results: The DERs obtained in the second step are visualized by plotting the coverage information surrounding genomic regions.

A typical srnadiff session therefore chains three calls: srnadiffExp to build the object, srnadiff to detect and test the regions, and plotRegions to display them. Here we assume that bamFiles is a vector with the full paths to the BAM files and that the sample and experimental design information is stored in a data frame sampleInfo.

4 Using srnadiff

4.1 Installation

We assume that the user has the R program (see the R project) already installed.

The srnadiff package is available from the Bioconductor repository. To be able to install the package one needs first to install the core Bioconductor packages. If you have already installed Bioconductor packages on your system then you can skip the two lines below.

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")

Once the core Bioconductor packages are installed, you can install the srnadiff package by

BiocManager::install("srnadiff")

Load the srnadiff package in your R session:

library(srnadiff)

A list of all accessible vignettes and methods is available with the following command:

help.search("srnadiff")

4.2 Data overview

To help demonstrate the functionality of srnadiff, the package includes a dataset published by (Viollet et al. 2015).

Briefly, these data consist of three replicates of sRNA-Seq of SLK (human) cell lines, and three replicates of SLK cell lines infected with Kaposi’s sarcoma associated herpesvirus. The analysis shows that several loci are repressed in the infected cell lines, including the 14q32 miRNA cluster.

Raw data were downloaded from the GEO data set GSE62830. Adapters were removed with fastx_clipper, and reads were mapped to the human genome version GRCh38 with the aligner of (Salzberg and Langmead 2012).

This data is restricted to a small locus on chr14. It uses the whole genome annotation (with coding genes, etc.) and extracts miRNAs.

The file dataInfo.csv contains three columns for each BAM file:

the file name (FileName)
a human readable sample name (SampleName)
a condition, e.g. WT (Condition)

4.3 Data preparation: the `srnadiffExp` object

The first step in an sRNA-diff approach is to create an srnadiffExp object. srnadiffExp is an S4 class providing the infrastructure to store the input data, method parameters, intermediate calculations and results of an sRNA-diff approach. This object is also the input of the visualization function.

The object srnadiffExp will usually be represented in the code here as srnaExp. To build such an object the user needs the following:

paths to the BAM files: a vector with the full paths to the sample BAM files.
sample information: a data.frame with three columns labelled FileName, SampleName and Condition. The first column is the BAM file name (without extension), the second column the sample name, and the third column the condition to which the sample belongs. Each row describes one sample.
annotation: optionally annotation information. A GRanges object containing annotated regions.
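The inputs listed above can also be prepared by hand. The following is a minimal base R sketch, not part of the package: the directory bamDir is a hypothetical placeholder, the file names simply mirror the example data set, and the check on two conditions is an assumption of this sketch rather than a documented requirement.

```r
## Sample table with the three expected columns
sampleInfo <- data.frame(
    FileName   = c("SRR1634756", "SRR1634757", "SRR1634758",
                   "SRR1634759", "SRR1634760", "SRR1634761"),
    SampleName = c("ctr_1", "ctr_2", "ctr_3", "inf_1", "inf_2", "inf_3"),
    Condition  = rep(c("control", "infected"), each = 3),
    stringsAsFactors = FALSE
)

## Basic sanity checks before building the srnadiffExp object
requiredCols <- c("FileName", "SampleName", "Condition")
stopifnot(all(requiredCols %in% colnames(sampleInfo)))
stopifnot(!anyDuplicated(sampleInfo$SampleName))
## Assumption of this sketch: exactly two conditions are compared
stopifnot(length(unique(sampleInfo$Condition)) == 2)

## Hypothetical directory holding the BAM files
bamDir   <- "/path/to/bam"
bamFiles <- file.path(bamDir, paste0(sampleInfo$FileName, ".bam"))
bamFiles
```

The FileName column carries no extension, so the ".bam" suffix is appended when the paths are built.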

Here, we demonstrate how to construct a srnadiffExp object from (Viollet et al. 2015).

## Determine the path to data files
basedir <- system.file("extdata", package="srnadiff", mustWork=TRUE)

## Reads sample information file and creates a data frame from it
sampleInfo <- read.csv(file.path(basedir, "dataInfo.csv"))

## Vector with the full paths to the BAM files to use
## (built after sampleInfo, because it relies on sampleInfo$FileName)
bamFiles <- paste(file.path(basedir, sampleInfo$FileName), "bam", sep=".")

## Creates an srnadiffExp object
srnaExp <- srnadiffExp(bamFiles, sampleInfo)

Optionally, if annotation information is available as a GRanges object, annotReg say, then a srnadiffExp object can be created by

srnaExp <- srnadiffExp(bamFiles, sampleInfo, annotReg)

or by

srnaExp <- srnadiffExp(bamFiles, sampleInfo)
annotReg(srnaExp) <- annotReg

A summary of the srnaExp object can be seen by typing the object name at the R prompt

srnaExp

## Object of class srnadiffExp.
##  Sample information
##     FileName SampleName Condition
## 1 SRR1634756      ctr_1   control
## 2 SRR1634757      ctr_2   control
## 3 SRR1634758      ctr_3   control
## 4 SRR1634759      inf_1  infected
## 5 SRR1634760      inf_2  infected
## 6 SRR1634761      inf_3  infected

For your convenience and for illustrative purposes, an example of an srnadiffExp object can be loaded with a single command, so the script boils down to:

srnaExp <- srnadiffExample()

The srnadiffExp object in this example was constructed by:

basedir    <- system.file("extdata", package="srnadiff", mustWork=TRUE)
sampleInfo <- read.csv(file.path(basedir, "dataInfo.csv"))
gtfFile    <- file.path(basedir, "Homo_sapiens.GRCh38.76.gtf.gz")
annotReg   <- readAnnotation(gtfFile, feature="gene", source="miRNA")
bamFiles   <- paste(file.path(basedir, sampleInfo$FileName), "bam", sep=".")
srnaExp    <- srnadiffExp(bamFiles, sampleInfo, annotReg)

4.4 Read annotation

srnadiff offers the readAnnotation function for loading annotation data. It accepts annotation files in two formats: GTF and GFF. The specification of the GTF/GFF format can be found at the UCSC dedicated page.

readAnnotation reads and parses the content of GTF/GFF files and stores annotated genomic features (regions) in a GRanges object. It has three main arguments: the first indicates the path, URL or connection to the GTF/GFF annotation file. The second and third arguments, feature and source respectively, are character strings that specify the feature type and the source used to select the rows of the GTF/GFF annotation to import. feature and source can be NULL; in this case, no selection is performed and the whole content of the file is imported.

4.4.1 Extraction of putative regions using a GTF annotation file

This method simply provides the genomic regions corresponding to the annotation file that is optionally given by the user. It can be a set of known miRNAs, siRNAs, piRNAs, genes, or a combination thereof.

4.4.2 Whole genome file annotation

This GTF file can be found in the central repositories (NCBI, Ensembl) and contains all the annotation found in an organism (coding genes, transposable elements, etc.). The following function reads the annotation file and extracts the miRNAs. Annotation files may have different formats, but this command has been tested on several model organisms (including human) from Ensembl.

gtfFile  <- file.path(basedir, "Homo_sapiens.GRCh38.76.gtf.gz")
annotReg <- readAnnotation(gtfFile, feature="gene", source="miRNA")

4.4.3 Extraction of precursor miRNAs using a miRBase-formatted GFF file

miRBase (Kozomara and Griffiths-Jones 2014) is the central repository for miRNAs. If your organism is available, you can download their miRNA annotation in GFF3 format (check the “Browse” tab). The following code parses a GFF3 miRBase file, and extracts the precursor miRNAs.

gffFile  <- file.path(basedir, "mirbase21_GRCh38.gff3")
annotReg <- readAnnotation(gffFile, feature="miRNA_primary_transcript")

4.4.4 Extraction of mature miRNAs using a miRBase-formatted GFF file

In the previous example, the reads will be counted per pre-miRNA, and the 5’ and 3’ arms, the miRNA and the miRNA* will be merged in the same feature. If you want to separate the two, use:

gffFile  <- file.path(basedir, "mirbase21_GRCh38.gff3")
annotReg <- readAnnotation(gffFile, feature="miRNA")

4.4.5 Other format

When the previous functions do not work, you can use your own parser with:

annotation <- readAnnotation(gtfFile, source="miRNA", feature="gene")

The source parameter keeps the lines whose second field matches the given value (e.g. miRNA). The feature parameter keeps the lines whose third field matches the given value (e.g. gene). The name of each feature is given by the tag name (e.g. gene_name). source, feature and name can be NULL, in which case no selection is performed on source or feature. If name is NULL, a systematic name is given (annotation_N).
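To make the field-based selection concrete, here is a small base R sketch of the same idea. It does not call readAnnotation and is not the package's parser: the three GTF lines are made up, and the column names are only labels chosen for this example.

```r
## Three made-up GTF lines (tab-separated, nine fields)
gtfLines <- c(
    "14\tmiRNA\tgene\t100\t180\t.\t+\t.\tgene_id \"G1\"; gene_name \"MIR_A\";",
    "14\tprotein_coding\tgene\t500\t900\t.\t-\t.\tgene_id \"G2\"; gene_name \"GENE_B\";",
    "14\tmiRNA\texon\t100\t180\t.\t+\t.\tgene_id \"G1\"; gene_name \"MIR_A\";"
)
gtf <- read.delim(text = gtfLines, header = FALSE, quote = "",
                  stringsAsFactors = FALSE,
                  col.names = c("seqname", "source", "feature", "start", "end",
                                "score", "strand", "frame", "attributes"))

## Keep rows whose second field is "miRNA" and whose third field is "gene"
kept <- gtf[gtf$source == "miRNA" & gtf$feature == "gene", ]

## Use the value of the gene_name tag as the feature name
kept$name <- sub('.*gene_name "([^"]+)".*', "\\1", kept$attributes)
kept[, c("seqname", "start", "end", "strand", "name")]
```

Only the first line survives both filters: the second has a different source and the third a different feature type.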

4.5 Performing sRNA-diff

The main function for performing an sRNA-diff analysis is srnadiff, a wrapper that runs several key functions from this package. srnadiff implements four methods to produce potential DERs: the annotation, naive, hmm and IR methods (see below). Once potential DERs are detected, the second step in srnadiff is to quantify their statistical significance.

srnadiff has three main arguments. The first argument is an instance of class srnadiffExp. The second argument is a character vector that specifies the segmentation methods to use: one of annotation, naive, hmm, IR, or a combination thereof. The default, all, runs all methods. The third argument is a list of named components giving the method parameters to use. If missing, default parameter values are supplied. Details about the method parameters are given in the manual page of the parameters function and in the section Methods to produce differentially expressed regions.

We then perform an sRNA-diff analysis on the input data contained in srnaExp by

srnaExp <- srnadiff(srnaExp)

srnadiff returns an object of class srnadiffExp again containing additional slots for:

regions
parameters
countMatrix

5 Working with the `srnadiffExp` object

Once the srnadiffExp object is created the user can use the methods defined for this class to access the information encapsulated in the object.

For example, the sample information is accessed by

sampleInfo(srnaExp)

##     FileName SampleName Condition
## 1 SRR1634756      ctr_1   control
## 2 SRR1634757      ctr_2   control
## 3 SRR1634758      ctr_3   control
## 4 SRR1634759      inf_1  infected
## 5 SRR1634760      inf_2  infected
## 6 SRR1634761      inf_3  infected

The chromosome sizes are accessed by

chromosomeSizes(srnaExp)

##        14 
## 107043718

The list of parameters can be exported by the function parameters

parameters(srnaExp)

5.1 Extracting regions

The regions, with corresponding information provided by DESeq2 (mean expression, fold-change, p-value, adjusted p-value, etc.), can be extracted with this command:

regions <- regions(srnaExp, pvalue=0.5)

where pvalue is the (adjusted) p-value threshold. The output is a GenomicRanges object, and the information is accessible with the mcols() function.

5.2 Data visualization

An insightful way of looking at the results of srnadiff is to investigate how the coverage surrounding the detected regions is distributed along the genomic coordinates.

plotRegions provides a flexible genomic visualization framework by displaying tracks in the sense of the Gviz package. Given a region (or regions), four separate tracks are represented:

GenomeAxisTrack, a horizontal axis with genomic coordinate tickmarks for reference location to the displayed genomic regions;
GeneRegionTrack, if the annot argument is passed, a track displaying all gene and/or sRNA annotation information in a particular region;
AnnotationTrack, regions are plotted as simple boxes if no strand information is available, or as arrows to indicate their direction; and
DataTrack, plot the sample coverages surrounding the genomic regions.

The sample coverages can be plotted in various different forms as well as combinations thereof. Supported plotting types are:

p: simple dot plot;

l: lines plot;

b: combination of dot and lines plot;

a: lines plot of the sample-groups average (i.e., mean) values;

confint: confidence intervals for average values.

The default visualization for results from srnadiff is a lines plot of the sample-groups average.

plotRegions(srnaExp, regions(srnaExp)[1])

6 Methods behind srnadiff

6.1 Pre-processing data

As input, srnadiffExp expects BAM files as obtained, e.g., from RNA-Seq or another high-throughput sequencing experiment. Reading and processing of BAM files uses current Bioconductor infrastructure for processing sequencing reads: the Rsamtools, IRanges and GenomicRanges libraries. At this stage BAM files are summarized into base-resolution coverage and stored in a run-length encoding format in order to enhance computational performance. Run-length encoding is a compact way to store an atomic vector as a pair of vectors (value, length). It is based on the rle function from the base R package.
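As a quick illustration of run-length encoding, the following base R sketch encodes a made-up coverage vector with rle and restores it with inverse.rle. The coverage values are invented for the example and the variable names are not taken from the package.

```r
## Made-up per-base coverage for a 20-nt stretch
coverage <- c(0, 0, 0, 0, 0, 3, 3, 3, 7, 7, 7, 7, 3, 3, 0, 0, 0, 0, 0, 0)

## Run-length encoding: one (value, length) pair per run of equal values
covRle <- rle(coverage)
covRle$values
covRle$lengths

## The original vector is recovered without loss
identical(inverse.rle(covRle), coverage)
```

Twenty positions are stored as five runs here; the saving grows with the long zero-coverage stretches typical of sRNA-Seq data.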

As a second pre-processing step, srnadiffExp estimates the size factors (the effective library sizes) from the coverage data, such that count values in each sample coverage can be brought to a common scale by dividing by the corresponding size (normalization) factor. This step is also called normalization; its purpose is to render coverages (counts) from different samples, which may have been sequenced to different depths, comparable. Normalization factors are estimated using the median ratio method described by Equation 5 in (Anders and Huber 2010).
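The median ratio idea can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's code: the count matrix is made up, the helper medianRatioSizeFactors is a name introduced here, and the median is taken on the log scale (which matches the median of the ratios when the number of usable rows is odd).

```r
## Made-up coverage matrix: rows are positions, columns are samples.
## Sample s2 is exactly twice as deep as s1, and s3 three times.
counts <- cbind(s1 = c(10, 20, 5, 40, 8),
                s2 = c(20, 40, 10, 80, 16),
                s3 = c(30, 60, 15, 120, 24))

## Hypothetical helper implementing the median-of-ratios estimate
medianRatioSizeFactors <- function(counts) {
    ## Log of the geometric mean of each row across samples
    logGeoMeans <- rowMeans(log(counts))
    ## Rows containing a zero have a geometric mean of zero and are skipped
    usable <- is.finite(logGeoMeans)
    apply(counts, 2, function(k) {
        exp(median(log(k[usable]) - logGeoMeans[usable]))
    })
}

sizeFactors <- medianRatioSizeFactors(counts)
sizeFactors

## Dividing each column by its size factor brings samples to a common scale
normalized <- sweep(counts, 2, sizeFactors, "/")
normalized
```

In this toy example the three normalized columns are identical, because the samples differ only by sequencing depth.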

6.2 Methods to produce differentially expressed regions

6.2.1 HMM method: `hmm`

The first step in the HMM method is quantifying the evidence for differential expression at the base-resolution level. To do this, srnadiff uses the common approach in comparative analysis of transcriptomics data: it tests the null hypothesis that the logarithmic fold change between condition groups for a nucleotide expression is exactly zero.

The next step in the HMM approach enforces a smoothness assumption over the state of nucleotides: differential expression does not randomly switch along the chromosome, rather, continuous regions of RNA are either “differentially expressed” or “not”. This is captured with a hidden Markov model (HMM) with binary latent state corresponding to the true state of each nucleotide: differentially expressed or not differentially expressed.

The observations of the HMM are then the empirical p-values arising from the differential expression analysis corresponding to each nucleotide position. Modelling p-values directly enables us to define the emission of each state as follows: the differentially expressed state emits a p-value \(< t\) with probability \(p\), and the not differentially expressed state emits a p-value \(\geq t\) with probability \(p\) (each state emits the opposite kind of p-value with probability \(1-p\)), where \(t\) is a real number between 0 and 1.

The HMM approach needs emission, transition, and starting probability values. They can be tuned by the user according to the overall p-values from the differential analysis. We then run the Viterbi algorithm in order to find the most likely sequence of states from the HMM. This essentially segments the genome into regions, where a region is defined as a set of consecutive bases showing a common expression signature. A region of bases in the differentially expressed state is referred to as an expressed region (ER) and is given as output of the method.

To run the HMM approach, srnadiff first forms a large matrix, with rows corresponding to bases, columns corresponding to samples, and entries giving the coverage of a nucleotide in a particular sample. This count matrix is then analyzed as if it contained feature-level counts, using the feature-level RNA-seq differential expression analysis from DESeq2. In practice, the p-value is not computed for every nucleotide. Nucleotides for which the sum of the coverage across all samples is less than a threshold are given a p-value of 1, because these poorly expressed bases are unlikely to provide a differentially expressed sRNA.

The parameters for the HMM method are:

noDiffToDiff: Initial transition probability from the “not differentially expressed” state to the “differentially expressed” state.

diffToNoDiff: Initial transition probability from the “differentially expressed” state to the “not differentially expressed” state.

emission: The probability of emitting a p-value \(<t\) in the “differentially expressed” state, and a p-value \(\geq t\) in the “not differentially expressed” state.

emissionThreshold: The threshold \(t\) that separates the two kinds of emission.

These parameters can be changed using the assignment function parameters<-

parameters(srnaExp) <- list(noDiffToDiff=0.01, emissionThreshold=0.2)
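The decoding step can be sketched as a two-state Viterbi pass over per-base p-values. This is a simplified base R illustration, not the package's implementation: the function viterbiTwoState, the uniform starting probabilities, the p-values and the parameter values passed below are all assumptions made for the example; only the parameter names follow the list above.

```r
## States: 1 = not differentially expressed, 2 = differentially expressed
viterbiTwoState <- function(pvalues, noDiffToDiff, diffToNoDiff,
                            emission, emissionThreshold,
                            start = c(0.5, 0.5)) {
    n <- length(pvalues)
    stopifnot(n >= 1)
    ## Observed symbol: 1 = p-value >= t, 2 = p-value < t
    obs <- ifelse(pvalues < emissionThreshold, 2L, 1L)
    ## logTrans[from, to], on the log scale to avoid underflow
    logTrans <- log(matrix(c(1 - noDiffToDiff, noDiffToDiff,
                             diffToNoDiff, 1 - diffToNoDiff),
                           nrow = 2, byrow = TRUE))
    ## logEmis[state, symbol]: each state emits its own symbol
    ## with probability `emission`
    logEmis <- log(matrix(c(emission, 1 - emission,
                            1 - emission, emission),
                          nrow = 2, byrow = TRUE))
    score <- matrix(-Inf, nrow = 2, ncol = n)
    back  <- matrix(0L, nrow = 2, ncol = n)
    score[, 1] <- log(start) + logEmis[, obs[1]]
    ## Forward pass: best score of any path ending in state s at position i
    for (i in seq_len(n)[-1]) {
        for (s in 1:2) {
            cand <- score[, i - 1] + logTrans[, s]
            back[s, i]  <- which.max(cand)
            score[s, i] <- max(cand) + logEmis[s, obs[i]]
        }
    }
    ## Backtracking: follow the stored pointers from the best final state
    path <- integer(n)
    path[n] <- which.max(score[, n])
    for (i in rev(seq_len(n))[-1]) {
        path[i] <- back[path[i + 1], i + 1]
    }
    path == 2L
}

## Made-up p-values: a stretch of small values with one isolated large one
pvalues <- c(rep(0.8, 10), rep(0.01, 8), 0.6, rep(0.02, 6), rep(0.9, 10))
isDiff  <- viterbiTwoState(pvalues, noDiffToDiff = 0.05, diffToNoDiff = 0.05,
                           emission = 0.9, emissionThreshold = 0.2)

## Turn the decoded states into regions
runs   <- rle(isDiff)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1
regionsFound <- data.frame(start = starts[runs$values], end = ends[runs$values])
regionsFound
```

The isolated p-value of 0.6 at position 19 does not split the region: leaving and re-entering the differentially expressed state costs more than one unlikely emission, which is the smoothness assumption at work.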

6.2.2 IR method: `IR`

In this approach, for each base, the normalized coverage is averaged across all samples within each condition. This generates one vector of (normalized) mean coverage per condition. These two vectors are then used to compute per-nucleotide log-ratios (in absolute value) across the genome. The method then uses a sliding threshold h that runs across the log-ratio levels, identifying bases whose log-ratio is above h. Regions of contiguous bases passing this threshold are then analyzed using an adaptation of the Aumann and Lindell algorithm for the irreducibility property (Aumann and Lindell 2003).

The minimum sliding threshold, minLogFC, used in the IR method can be changed using the assignment function parameters<-

parameters(srnaExp) <- list(minLogFC=1)
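The first part of the IR method, the per-nucleotide log-ratio, can be sketched in base R as follows. The coverage values are made up, and both the base-2 logarithm and the pseudo-count of 1 are assumptions of this sketch; the irreducibility step itself is not reproduced here.

```r
## Made-up normalized coverage: rows are positions, columns are samples
normCov <- cbind(ctr_1 = c(2, 4, 40, 42, 3),
                 ctr_2 = c(4, 4, 44, 38, 5),
                 inf_1 = c(3, 5, 10, 9, 4),
                 inf_2 = c(3, 3, 10, 11, 4))
condition <- c("control", "control", "infected", "infected")

## Mean normalized coverage per condition, one value per position
meanCtr <- rowMeans(normCov[, condition == "control", drop = FALSE])
meanInf <- rowMeans(normCov[, condition == "infected", drop = FALSE])

## Absolute log2 ratio; the pseudo-count avoids division by zero
## (assumption of this sketch)
absLogRatio <- abs(log2((meanInf + 1) / (meanCtr + 1)))
absLogRatio

## Bases whose log-ratio exceeds a threshold h (h = 1 is an arbitrary choice)
h <- 1
which(absLogRatio > h)
```

Positions 3 and 4, where the control coverage is about four times the infected coverage, are the only ones above the threshold.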

6.2.3 Naive method: `naive`

This method is the simplest: given a fixed threshold h, contiguous bases whose log-ratio (in absolute value) passes this threshold are considered candidate differentially expressed regions.

The fixed threshold, cutoff, used in this method can be changed using the assignment function parameters<-

parameters(srnaExp) <- list(cutoff=1.5)
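The naive segmentation amounts to finding runs of consecutive bases above the cutoff. A minimal base R sketch, with made-up log-ratios and variable names that are not taken from the package:

```r
## Made-up absolute log-ratios along 12 consecutive bases
absLogRatio <- c(0.1, 0.3, 1.8, 2.1, 1.9, 0.2, 0.1, 1.7, 1.6, 0.4, 0.2, 0.1)
cutoff      <- 1.5

## Runs of consecutive bases above (TRUE) or below (FALSE) the cutoff
runs   <- rle(absLogRatio > cutoff)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

## Keep the runs above the cutoff as candidate regions
candidates <- data.frame(start = starts[runs$values],
                         end   = ends[runs$values])
candidates$width <- candidates$end - candidates$start + 1
candidates
```

Two candidates come out of this example, covering bases 3 to 5 and 8 to 9.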

6.3 Quantifying DERs

The result of the ER step is a list of genomic regions which were chosen with a specific use, to quantify their expression for subsequent testing for differential expression. The selected regions are then quantified using the summarizeOverlaps function of the GenomicAlignments package. Notice that a read can overlap two different regions (e.g. extracted from the HMM and the IR methods), and thus can be counted twice for the quantification. The result is ultimately a matrix with rows corresponding to ERs and columns corresponding to samples; entries of this matrix are the number of aligned reads from a particular sample that overlap a particular region. Then, this count matrix is analyzed using the standard DESeq2 workflow for differential expression of RNA-seq data, assigning a p-value to each DER.

6.4 General parameters

The three last strategies can be tuned by specifying:

the minimum and maximum regions sizes,
the minimum depth of the most expressed condition.

The default values can be changed using these functions:

parameters(srnaExp) <- list(minDepth=1)
parameters(srnaExp) <- list(minSize=15, maxSize=1000)
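The effect of the size bounds can be illustrated with a short base R sketch. The candidate regions are made up, and treating both bounds as inclusive is an assumption of this example rather than a statement about the package's behaviour.

```r
## Hypothetical candidate regions (coordinates are made up)
candidates <- data.frame(start = c(100, 400, 900),
                         end   = c(108, 425, 2500))
candidates$width <- candidates$end - candidates$start + 1

minSize <- 15
maxSize <- 1000

## Keep regions whose width lies within [minSize, maxSize]
kept <- candidates[candidates$width >= minSize &
                   candidates$width <= maxSize, ]
kept
```

The first region (9 nt) is too short and the third (1601 nt) too long, so only the 26-nt region is kept.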

6.5 Combination of strategies

6.5.1 Choice of the strategies

All the regions given by each strategy are then combined into a list of regions. To leave out some strategies, use the parameter segMethod of the function srnadiff.

srnaExp <- srnadiffExample()
srnaExp <- srnadiff(srnaExp, segMethod=c("hmm", "IR"))

6.5.2 Quantification of the features

The selected regions are then quantified using the summarizeOverlaps function of the GenomicAlignments package. Notice that a read can overlap two different regions (e.g. extracted from the naive and the IR methods), and thus can be counted twice for the quantification.

You can adjust the minimum number of overlapping nucleotides between a read and a region to declare a hit, using:

parameters(srnaExp) <- list(minOverlap=1000)

DESeq2 is then used to get the adjusted p-values of these regions.
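The counting step, including the double counting of reads that overlap two regions, can be sketched in base R. This is a simplification that does not call summarizeOverlaps: the reads, regions and the value of minOverlap are made up, and coordinates are assumed to be 1-based and inclusive.

```r
## Made-up reads (with their sample of origin) and regions
reads <- data.frame(start  = c(101, 105, 118, 130, 131, 200),
                    end    = c(122, 126, 139, 151, 152, 221),
                    sample = c("ctr_1", "inf_1", "ctr_1",
                               "inf_1", "inf_1", "ctr_1"))
regions <- data.frame(start = c(100, 125), end = c(128, 160),
                      row.names = c("regionA", "regionB"))
minOverlap <- 10

## Overlap width between every region (rows) and every read (columns)
ovWidth <- outer(seq_len(nrow(regions)), seq_len(nrow(reads)),
                 function(i, j) {
    pmax(0, pmin(regions$end[i], reads$end[j]) -
            pmax(regions$start[i], reads$start[j]) + 1)
})

## A read is a hit for a region when the overlap is at least minOverlap wide
hits <- ovWidth >= minOverlap

## Count hits per region (rows) and per sample (columns)
samples     <- factor(reads$sample)
countMatrix <- t(apply(hits, 1, function(h) tapply(h, samples, sum)))
rownames(countMatrix) <- rownames(regions)
countMatrix
```

The third read (118 to 139) overlaps both regions by at least 10 nt and is therefore counted in each of them, while the second read overlaps regionB by only 2 nt and is counted for regionA alone.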

7 Misc

7.1 Using several cores

The quantification and differential expression steps can be accelerated using several cores and the following command:

srnaExp <- setNThreads(srnaExp, nThreads=4)

7.2 Troubleshooting

While installing the package, if the compiler complains and says

#error This file requires compiler and library support for the ISO C++ 2011 standard.
This support is currently experimental, and must be enabled with the -std=c++11 or -std=gnu++11 compiler options.

Add this line

Sys.setenv("PKG_CXXFLAGS"="-std=c++11")

before installing the package.

8 Session information

devtools::session_info()

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.0.3 (2020-10-10)
##  os       Ubuntu 18.04.5 LTS          
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language (EN)                        
##  collate  C                           
##  ctype    en_US.UTF-8                 
##  tz       America/New_York            
##  date     2021-01-05                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package              * version  date       lib source        
##  annotate               1.68.0   2021-01-05 [2] Bioconductor  
##  AnnotationDbi          1.52.0   2021-01-05 [2] Bioconductor  
##  AnnotationFilter       1.14.0   2021-01-05 [2] Bioconductor  
##  askpass                1.1      2019-01-13 [2] CRAN (R 4.0.3)
##  assertthat             0.2.1    2019-03-21 [2] CRAN (R 4.0.3)
##  backports              1.2.1    2020-12-09 [2] CRAN (R 4.0.3)
##  base64enc              0.1-3    2015-07-28 [2] CRAN (R 4.0.3)
##  Biobase                2.50.0   2021-01-05 [2] Bioconductor  
##  BiocFileCache          1.14.0   2021-01-05 [2] Bioconductor  
##  BiocGenerics           0.36.0   2021-01-05 [2] Bioconductor  
##  BiocManager          * 1.30.10  2019-11-16 [2] CRAN (R 4.0.3)
##  BiocParallel           1.24.1   2021-01-05 [2] Bioconductor  
##  BiocStyle            * 2.18.1   2021-01-05 [2] Bioconductor  
##  biomaRt                2.46.0   2021-01-05 [2] Bioconductor  
##  Biostrings             2.58.0   2021-01-05 [2] Bioconductor  
##  biovizBase             1.38.0   2021-01-05 [2] Bioconductor  
##  bit                    4.0.4    2020-08-04 [2] CRAN (R 4.0.3)
##  bit64                  4.0.5    2020-08-30 [2] CRAN (R 4.0.3)
##  bitops                 1.0-6    2013-08-17 [2] CRAN (R 4.0.3)
##  blob                   1.2.1    2020-01-20 [2] CRAN (R 4.0.3)
##  bookdown               0.21     2020-10-13 [2] CRAN (R 4.0.3)
##  BSgenome               1.58.0   2021-01-05 [2] Bioconductor  
##  callr                  3.5.1    2020-10-13 [2] CRAN (R 4.0.3)
##  checkmate              2.0.0    2020-02-06 [2] CRAN (R 4.0.3)
##  cli                    2.2.0    2020-11-20 [2] CRAN (R 4.0.3)
##  cluster                2.1.0    2019-06-19 [2] CRAN (R 4.0.3)
##  colorspace             2.0-0    2020-11-11 [2] CRAN (R 4.0.3)
##  crayon                 1.3.4    2017-09-16 [2] CRAN (R 4.0.3)
##  curl                   4.3      2019-12-02 [2] CRAN (R 4.0.3)
##  data.table             1.13.6   2020-12-30 [2] CRAN (R 4.0.3)
##  DBI                    1.1.0    2019-12-15 [2] CRAN (R 4.0.3)
##  dbplyr                 2.0.0    2020-11-03 [2] CRAN (R 4.0.3)
##  DelayedArray           0.16.0   2021-01-05 [2] Bioconductor  
##  desc                   1.2.0    2018-05-01 [2] CRAN (R 4.0.3)
##  DESeq2                 1.30.0   2021-01-05 [2] Bioconductor  
##  devtools               2.3.2    2020-09-18 [2] CRAN (R 4.0.3)
##  dichromat              2.0-0    2013-01-24 [2] CRAN (R 4.0.3)
##  digest                 0.6.27   2020-10-24 [2] CRAN (R 4.0.3)
##  dplyr                  1.0.2    2020-08-18 [2] CRAN (R 4.0.3)
##  ellipsis               0.3.1    2020-05-15 [2] CRAN (R 4.0.3)
##  ensembldb              2.14.0   2021-01-05 [2] Bioconductor  
##  evaluate               0.14     2019-05-28 [2] CRAN (R 4.0.3)
##  fansi                  0.4.1    2020-01-08 [2] CRAN (R 4.0.3)
##  foreign                0.8-81   2020-12-22 [2] CRAN (R 4.0.3)
##  Formula                1.2-4    2020-10-16 [2] CRAN (R 4.0.3)
##  fs                     1.5.0    2020-07-31 [2] CRAN (R 4.0.3)
##  genefilter             1.72.0   2021-01-05 [2] Bioconductor  
##  geneplotter            1.68.0   2021-01-05 [2] Bioconductor  
##  generics               0.1.0    2020-10-31 [2] CRAN (R 4.0.3)
##  GenomeInfoDb           1.26.2   2021-01-05 [2] Bioconductor  
##  GenomeInfoDbData       1.2.4    2020-10-12 [2] Bioconductor  
##  GenomicAlignments      1.26.0   2021-01-05 [2] Bioconductor  
##  GenomicFeatures        1.42.1   2021-01-05 [2] Bioconductor  
##  GenomicRanges          1.42.0   2021-01-05 [2] Bioconductor  
##  ggplot2                3.3.3    2020-12-30 [2] CRAN (R 4.0.3)
##  glue                   1.4.2    2020-08-27 [2] CRAN (R 4.0.3)
##  gridExtra              2.3      2017-09-09 [2] CRAN (R 4.0.3)
##  gtable                 0.3.0    2019-03-25 [2] CRAN (R 4.0.3)
##  Gviz                   1.34.0   2021-01-05 [2] Bioconductor  
##  Hmisc                  4.4-2    2020-11-29 [2] CRAN (R 4.0.3)
##  hms                    0.5.3    2020-01-08 [2] CRAN (R 4.0.3)
##  htmlTable              2.1.0    2020-09-16 [2] CRAN (R 4.0.3)
##  htmltools              0.5.0    2020-06-16 [2] CRAN (R 4.0.3)
##  htmlwidgets            1.5.3    2020-12-10 [2] CRAN (R 4.0.3)
##  httr                   1.4.2    2020-07-20 [2] CRAN (R 4.0.3)
##  IRanges                2.24.1   2021-01-05 [2] Bioconductor  
##  jpeg                   0.1-8.1  2019-10-24 [2] CRAN (R 4.0.3)
##  knitr                * 1.30     2020-09-22 [2] CRAN (R 4.0.3)
##  lattice                0.20-41  2020-04-02 [2] CRAN (R 4.0.3)
##  latticeExtra           0.6-29   2019-12-19 [2] CRAN (R 4.0.3)
##  lazyeval               0.2.2    2019-03-15 [2] CRAN (R 4.0.3)
##  lifecycle              0.2.0    2020-03-06 [2] CRAN (R 4.0.3)
##  locfit                 1.5-9.4  2020-03-25 [2] CRAN (R 4.0.3)
##  magick                 2.5.2    2020-11-10 [2] CRAN (R 4.0.3)
##  magrittr               2.0.1    2020-11-17 [2] CRAN (R 4.0.3)
##  Matrix                 1.3-0    2020-12-22 [2] CRAN (R 4.0.3)
##  MatrixGenerics         1.2.0    2021-01-05 [2] Bioconductor  
##  matrixStats            0.57.0   2020-09-25 [2] CRAN (R 4.0.3)
##  memoise                1.1.0    2017-04-21 [2] CRAN (R 4.0.3)
##  munsell                0.5.0    2018-06-12 [2] CRAN (R 4.0.3)
##  nnet                   7.3-14   2020-04-26 [2] CRAN (R 4.0.3)
##  openssl                1.4.3    2020-09-18 [2] CRAN (R 4.0.3)
##  pillar                 1.4.7    2020-11-20 [2] CRAN (R 4.0.3)
##  pkgbuild               1.2.0    2020-12-15 [2] CRAN (R 4.0.3)
##  pkgconfig              2.0.3    2019-09-22 [2] CRAN (R 4.0.3)
##  pkgload                1.1.0    2020-05-29 [2] CRAN (R 4.0.3)
##  png                    0.1-7    2013-12-03 [2] CRAN (R 4.0.3)
##  prettyunits            1.1.1    2020-01-24 [2] CRAN (R 4.0.3)
##  processx               3.4.5    2020-11-30 [2] CRAN (R 4.0.3)
##  progress               1.2.2    2019-05-16 [2] CRAN (R 4.0.3)
##  ProtGenerics           1.22.0   2021-01-05 [2] Bioconductor  
##  ps                     1.5.0    2020-12-05 [2] CRAN (R 4.0.3)
##  purrr                  0.3.4    2020-04-17 [2] CRAN (R 4.0.3)
##  R6                     2.5.0    2020-10-28 [2] CRAN (R 4.0.3)
##  rappdirs               0.3.1    2016-03-28 [2] CRAN (R 4.0.3)
##  RColorBrewer           1.1-2    2014-12-07 [2] CRAN (R 4.0.3)
##  Rcpp                   1.0.5    2020-07-06 [2] CRAN (R 4.0.3)
##  RCurl                  1.98-1.2 2020-04-18 [2] CRAN (R 4.0.3)
##  remotes                2.2.0    2020-07-21 [2] CRAN (R 4.0.3)
##  rlang                  0.4.10   2020-12-30 [2] CRAN (R 4.0.3)
##  rmarkdown            * 2.6      2020-12-14 [2] CRAN (R 4.0.3)
##  rpart                  4.1-15   2019-04-12 [2] CRAN (R 4.0.3)
##  rprojroot              2.0.2    2020-11-15 [2] CRAN (R 4.0.3)
##  Rsamtools              2.6.0    2021-01-05 [2] Bioconductor  
##  RSQLite                2.2.1    2020-09-30 [2] CRAN (R 4.0.3)
##  rstudioapi             0.13     2020-11-12 [2] CRAN (R 4.0.3)
##  rtracklayer            1.50.0   2021-01-05 [2] Bioconductor  
##  S4Vectors              0.28.1   2021-01-05 [2] Bioconductor  
##  scales                 1.1.1    2020-05-11 [2] CRAN (R 4.0.3)
##  sessioninfo            1.1.1    2018-11-05 [2] CRAN (R 4.0.3)
##  srnadiff             * 1.10.1   2021-01-05 [1] Bioconductor  
##  stringi                1.5.3    2020-09-09 [2] CRAN (R 4.0.3)
##  stringr                1.4.0    2019-02-10 [2] CRAN (R 4.0.3)
##  SummarizedExperiment   1.20.0   2021-01-05 [2] Bioconductor  
##  survival               3.2-7    2020-09-28 [2] CRAN (R 4.0.3)
##  testthat               3.0.1    2020-12-17 [2] CRAN (R 4.0.3)
##  tibble                 3.0.4    2020-10-12 [2] CRAN (R 4.0.3)
##  tidyselect             1.1.0    2020-05-11 [2] CRAN (R 4.0.3)
##  usethis                2.0.0    2020-12-10 [2] CRAN (R 4.0.3)
##  VariantAnnotation      1.36.0   2021-01-05 [2] Bioconductor  
##  vctrs                  0.3.6    2020-12-17 [2] CRAN (R 4.0.3)
##  withr                  2.3.0    2020-09-22 [2] CRAN (R 4.0.3)
##  xfun                   0.19     2020-10-30 [2] CRAN (R 4.0.3)
##  XML                    3.99-0.5 2020-07-23 [2] CRAN (R 4.0.3)
##  xml2                   1.3.2    2020-04-23 [2] CRAN (R 4.0.3)
##  xtable                 1.8-4    2019-04-21 [2] CRAN (R 4.0.3)
##  XVector                0.30.0   2021-01-05 [2] Bioconductor  
##  yaml                   2.2.1    2020-02-01 [2] CRAN (R 4.0.3)
##  zlibbioc               1.36.0   2021-01-05 [2] Bioconductor  
## 
## [1] /tmp/Rtmp46jCrd/Rinst31282343969c
## [2] /home/biocbuild/bbs-3.12-bioc/R/library

References

Anders, S., and W. Huber. 2010. “Differential Expression Analysis for Sequence Count Data.” Genome Biology 11 (10): R106.

Aumann, Y., and Y. Lindell. 2003. “A statistical theory for quantitative association rules.” Journal of Intelligent Information Systems 20: 255–83.

Kozomara, A., and S. Griffiths-Jones. 2014. “miRBase: annotating high confidence microRNAs using deep sequencing data.” Nucleic Acids Research 42: D68–D73.

Salzberg, S., and B. Langmead. 2012. “Fast gapped-read alignment with Bowtie 2.” Nature Methods 9: 357–59.

Viollet, C., D. A. Davis, M. Reczko, J. M. Ziegelbauer, F. Pezzella, J. Ragoussis, and R. Yarchoan. 2015. “Next-generation sequencing analysis reveals differential expression profiles of miRNA-mRNA target pairs in KSHV-infected cells.” PLOS ONE 10: 1–23.

“Finding differentially expressed unannotated genomic regions from RNA-seq data with srnadiff”

5 January 2021

Package

1 Version info

2 Abstract

3 Introduction

3.1 Citing srnadiff

3.2 How to get help for srnadiff

3.3 Quick start

4 Using srnadiff

4.1 Installation

4.2 Data overview

4.3 Data preparation: the `srnadiffExp` object

4.4 Read annotation

4.4.1 Extraction of putative regions using a GTF annotation file

4.4.2 Whole genome file annotation

4.4.3 Extraction of precursor miRNAs using a miRBase-formatted GFF file

4.4.4 Extraction of mature miRNAs using a miRBase-formatted GFF file

4.4.5 Other format

4.5 Performing sRNA-diff

5 Working with the `srnadiffExp` object

5.1 Extracting regions

5.2 Data visualization

6 Methods behind srnadiff

6.1 Pre-processing data

6.2 Methods to produce differentially expressed regions

6.2.1 HMM method: `hmm`

6.2.2 IR method: `IR`

6.2.3 Naive method: `naive`

6.3 Quantifying DERs

6.4 General parameters

6.5 Combination of strategies

6.5.1 Choice of the strategies

6.5.2 Quantification of the features

7 Misc

7.1 Using several cores

7.2 Troubleshooting

8 Session information

References
