RGalaxy

Dan Tenenbaum

Galaxy is an open, web-based platform for data-intensive biomedical research. It provides an easy-to-use web interface and can expose bioinformatics workflows written in any programming language.

Normally, in order to expose new functionality (a tool in Galaxy parlance) in a Galaxy instance, you have to manually create an XML file with information about the function, and modify an additional XML file.

The RGalaxy package automates this process, pulling most of the necessary information from the function itself and its manual page (you provide the remaining information as arguments to the galaxy function).

A Simple Example

Let's say you want to create a Galaxy tool that adds two numbers.

First, load RGalaxy:

library(RGalaxy)

Then write a function like this:


addTwoNumbers <-
function(
        number1=GalaxyNumericParam(required=TRUE),
        number2=GalaxyNumericParam(required=TRUE),
        sum=GalaxyOutput("sum", "txt"))
{
    cat(number1 + number2, file=sum)
}

There are a few things to notice about this function:

t <- tempfile()
addTwoNumbers(2, 2, t)
readLines(t, warn=FALSE)
## [1] "4"

Documenting the Example

We're almost ready to tell Galaxy about our function, but first we need to document it with a manual page. RGalaxy will use information in this page to create the Galaxy tool, and the man page will also be useful to anyone who wants to run your function in R.

The man page might look like this:

\name{addTwoNumbers}

\alias{addTwoNumbers}
\title{Add two numbers}

\description{
    An example function that can be made into a Galaxy tool.
    Takes two numbers, adds them, and returns a file containing
    the result.
}

\usage{
addTwoNumbers(number1=GalaxyNumericParam(required=TRUE),
        number2=GalaxyNumericParam(required=TRUE),
        sum=GalaxyOutput("sum", "txt"))
}

\arguments{
  \item{number1}{
    The first number to add.
  }
  \item{number2}{
    The second number to add.
  }
  \item{sum}{
    Where the result of the addition should be written.
  }
}

\value{
  invisible(NULL)
}

\seealso{
    \code{\link{galaxy}}, 
    \code{\link{GalaxyConfig}}, \code{\link{GalaxyOutput}},
    \code{\link{addTwoNumbers}}

}

\examples{
t <- tempfile()
addTwoNumbers(2, 2, t)
readLines(t, warn=FALSE)
}

Some things to note about this man page:

Installing Galaxy

Before we can tell Galaxy about our function, we have to install Galaxy.

The Galaxy Installation page gives full instructions, but in a nutshell, you can install Galaxy as follows (you may need to install Mercurial, which provides the hg command):

hg clone https://bitbucket.org/galaxy/galaxy-dist/

The directory where you just installed Galaxy (a full path ending in galaxy-dist) is your “Galaxy Home” directory, represented by galaxyHome in the following code snippet.

Telling Galaxy about the function

Now we point Galaxy to the function we just wrote:

galaxy("addTwoNumbers",
    galaxyConfig=
      GalaxyConfig(galaxyHome, "mytool", "Test Section",
        "testSectionId")
    )
## Warning: Not enough information to create a functional test.
## Note: Did not find section 'Details' in man page.
## [1] "/tmp/Rtmp1TlDbP/Rbuild657262645e9b/RGalaxy/vignettes/tools/mytool/addTwoNumbers.xml"

Notice the warning about functional tests. We'll cover that later in the vignette.

The galaxy function notifies you that the details section of the man page is empty. It also returns the path to the XML tool wrapper it created.

Running the example function in Galaxy

To start Galaxy, open a command window and change to your Galaxy home directory (defined earlier). Then issue this command:

./run.sh –reload

If Galaxy is already running, you should stop it (with control-C) and restart it with the command above. Galaxy should always be restarted after running the galaxy function.

You can now access Galaxy at http://localhost:3000.

If you click on “Test Section” and then “Add Two Numbers”, you should see something like Figure 1.

The Add Two Numbers tool in Galaxy

Things to notice about this:

If we enter 10 and 5, then click “Execute”, Galaxy will run and when finished will show 'sum.txt' in the History Pane at the right. Clicking on it should show something like Figure 2. You can download the result or send it to another Galaxy tool.

Result of Add Two Numbers

Functional Testing

We just ran Galaxy and made sure our tool worked. It would be nice to automate this procedure so we can know that for inputs x and y, our tool will always produce output d.

With a couple of small additions, we can accomplish this. Our function will have a self-contained test.

Also, when submitting tools to the public Galaxy instance, functional tests like this are required.

Here is our addTwoNumbers function again, this time with a functional test:


addTwoNumbersWithTest <-
function(
        number1=GalaxyNumericParam(required=TRUE, testValues=5L),
        number2=GalaxyNumericParam(required=TRUE, testValues=5L),
        sum=GalaxyOutput("sum", "txt"))
{
    cat(number1 + number2, file=sum)
}

The only visible difference is that we've added a testValues argument to each input parameter. Another, subtler difference is that we have added a file in our package called inst/functionalTests/addTwoNumbersWithTest/sum, which contains the expected output of the function. By using this convention, we ensure RGalaxy can find the file.

Does the function pass its functional test?

runFunctionalTest(addTwoNumbersWithTest)
## [1] TRUE

Note that this just runs the function in R, it does not test it inside a running Galaxy. But because the functional test infrastructure is present in the XML file generated by RGalaxy, you can do that from your Galaxy home directory as follows:

./run_functional_tests.sh -id addTwoNumbersWithTest

The output of the test will be written to run_functional_tests.html.

Note that R doesn't always produce the same output each time, even though the files may look identical. The pdf function in particular may produce different files. You can use the png function as a workaround.

Should my function be in a package?

We've glossed over it so far, but the addTwoNumbers() function and its man page live a package (the RGalaxy package in this case). It is possible to expose in Galaxy a function that does not live in a package, but you have to provide a lot of extra information. We recommend that the functions you expose live in a package (and be exported in your NAMESPACE file).

Best practices

Using Rserve for better performance

Galaxy runs tools by invoking scripts in various languages at the command line. These scripts are generally self-contained. Sometimes it can take a long time for the script to load its dependencies. Sometimes this takes longer than the actual work that the script is supposed to do. We can stop waiting for the script to load its dependencies if the script does its work on a remote instance of R where the dependencies have already been loaded. We accomplish this using the Rserve package.

To use Rserve, create an Rserv.conf file that contains statements like this:

eval library(LongLoadingPackage1)
eval library(LongLoadingPackage2)

Replace the package names with the packages your function uses that take a long time to load.

Start Rserve as follows:

R CMD Rserve --vanilla --RS-conf Rserv.conf

Re-run Galaxy on your function, specifying that Rserv should be used:

galaxy("addTwoNumbersWithTest",
    galaxyConfig=
      GalaxyConfig(galaxyHome, "mytool", "Test Section",
        "testSectionId"),
    RserveConnection=RserveConnection()
    )

Install the RSclient package:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("RSclient", site_repository="http://www.rforge.net")

Restart Galaxy if it is already running. Your function should be much faster.

You can run Rserve on a different machine (and on a different port) by passing this information to the RserveConnection() function:

RserveConnection(host="mymachine", port=2012L)
## An object of class "RserveConnection"
## Slot "host":
## [1] "mymachine"
## 
## Slot "port":
## [1] 2012

Note that the other machine should have shared disk space with the machine where you are running Galaxy.

A practical example

Suppose you have some Affymetrix probe IDs and you want to look up the PFAM and SYMBOL names for them. It's quite easy to write a function to expose this in Galaxy:


probeLookup <-
function(
    probe_ids=GalaxyCharacterParam(
        required=TRUE,
        testValues="1002_f_at 1003_s_at"),
    outputfile=GalaxyOutput("probeLookup", "csv"))
{
    suppressPackageStartupMessages(requireNamespace("hgu95av2.db"))
    ids <- strsplit(probe_ids, " ")[[1]]
    results <- AnnotationDbi::select(
        hgu95av2.db::hgu95av2.db, keys=ids, columns=c("SYMBOL","PFAM"),
        keytype="PROBEID")
    write.csv(results, file=outputfile)
}

Behind the scenes, we've also written a man page for the function, and put a test fixture in our package (which can be found at inst/functionalTests/probeLookup/outputfile).

Let's run it and make sure it works:

runFunctionalTest(probeLookup)
## 
## 'select()' returned 1:1 mapping between keys and columns
## [1] TRUE