splatter 1.14.1
suppressPackageStartupMessages({
library("splatter")
library("scater")
library("VariantAnnotation")
library("ggplot2")
})
splatPop is an extension of the splat model that allows you to simulate single cell count data for an entire population of individuals. Like with splat, these simulations resemble real single-cell data because they use parameters estimated from empirical data. Provided with genotype information (VCF) for a population as input, splatPop simulates gene counts for multiple cells for all individuals in the population. Realistic population structure (the pattern of genetic relatedness between individuals in the population) in the simulations is achieved by modelling expression Quantitative Trait Loci (eQTL) effects, where the expression of a gene is associated with the genotype of the individual at a specific loci. Finally, splatPop allows for the simulation of complex datasets with cells from multiple groups (e.g. cell types), cells along differentiation trajectories, and cells from different batches.
The primary simulation function is splatPopSimulate
, which runs through the
two main phases:
splatPopSimulateMeans
: the simulation of means for all genes for all
individuals in the population.splatPopSimulateSC
: the simulation of single-cell counts for all cells for
all genes for all individuals.The second phase is essentially a wrapper around the original splatSimulate()
function, which is described in detail here. The figure below
describes the first phase. Input parameters that can be estimated from real data
have double borders and are shaded by the type of data used (blue = single-cell
counts, yellow = population scale bulk/sc-aggregated RNA-seq data, and green =
eQTL mapping results). The final output (red) is a matrix of means for each
gene and each individual that is used as input to the second phase.
To get started with splatPop, you need genotype information for the population
you want to simulate (i.e. a VCF). Genotype information should be provided as a
VariantAnnotation object.
A mock VariantAnnotation object can be produced using the mockVCF()
function.
Here we simulate single-cell RNA-sequencing counts for 100 random genes for 6
random samples:
vcf <- mockVCF(n.samples = 6)
sim <- splatPopSimulate(vcf = vcf, "nGenes" = 100)
#> Getting parameters...
#> Simulating gene means for population...
#> Simulating population single cell counts...
#> Sparsifying assays...
#> Automatically converting to sparse matrices, threshold = 0.95
#> Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
#> Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
#> Skipping 'BCV': estimated sparse size 1.5 * dense matrix
#> Skipping 'CellMeans': estimated sparse size 1.48 * dense matrix
#> Skipping 'TrueCounts': estimated sparse size 2.83 * dense matrix
#> Skipping 'counts': estimated sparse size 2.83 * dense matrix
#> Done!
sim <- logNormCounts(sim)
sim <- runPCA(sim, ncomponents = 10)
plotPCA(sim, colour_by = "Sample")
The parameters used in splatPop have sensible default values, but can also be
estimated from real data provided by the user. For example, gene mean and
variance levels are sampled from gamma distributions derived from real
population scale RNA-seq data and eQTL effect sizes from a gamma distribution
derived from real eQTL mapping results. The default parameters were derived
from GTEx data (v7, thyroid tissue). However, they can
also be estimated from user provided data using splatPopEstimate()
. You can
also provide splatPopEstimate()
with real single-cell RNA-sequencing data to
estimate single-cell parameters as in splatEstimate()
.
All parameters needed for splatPop simulations are stored in a
SplatPopParams
object. In addition to the compartments in SplatParams
described in detail in the Splat parameters vignette and
the parameters that are set manually (described below),
SplatPopParams
also contains the following parameters that can be estimated
from real data:
pop.mean.shape
- Shape parameter for mean expression from population
scale data.pop.mean.rate
- Rate parameter for mean expression from population
scale data.pop.cv.param
- Shape and rate parameters for the coefficient of
variation (cv) across individuals from the population scale data, binned by
mean expression.eqtl.ES.shape
- Shape parameter for eQTL effect sizes.eqtl.ES.rate
- Rate parameter for eQTL effect sizes.Let’s take a look at the default parameters…
params <- newSplatPopParams()
params
#> A Params object of class SplatPopParams
#> Parameters can be (estimable) or [not estimable], 'Default' or 'NOT DEFAULT'
#> Secondary parameters are usually set during simulation
#>
#> Global:
#> (Genes) (Cells) [Seed]
#> 10000 100 117638
#>
#> 41 additional parameters
#>
#> Batches:
#> [Batches] [Batch Cells] [Location] [Scale] [Remove]
#> 1 100 0.1 0.1 FALSE
#>
#> Mean:
#> (Rate) (Shape)
#> 0.3 0.6
#>
#> Library size:
#> (Location) (Scale) (Norm)
#> 11 0.2 FALSE
#>
#> Exprs outliers:
#> (Probability) (Location) (Scale)
#> 0.05 4 0.5
#>
#> Groups:
#> [Groups] [Group Probs]
#> 1 1
#>
#> Diff expr:
#> [Probability] [Down Prob] [Location] [Scale]
#> 0.1 0.5 0.1 0.4
#>
#> BCV:
#> (Common Disp) (DoF)
#> 0.1 60
#>
#> Dropout:
#> [Type] (Midpoint) (Shape)
#> none 0 -1
#>
#> Paths:
#> [From] [Steps] [Skew] [Non-linear] [Sigma Factor]
#> 0 100 0.5 0.1 0.8
#>
#> Population params:
#> (mean.shape) (mean.rate) [similarity.scale] [cv.bins]
#> 0.3395709 0.008309486 1 10
#>
#> (cv.params)
#> data.frame (10 x 4) with columns: start, end, shape, rate
#> start end shape rate
#> 1 0.000 0.476 11.636709 8.229737
#> 2 0.476 0.955 5.084263 3.236401
#> 3 0.955 1.860 3.161149 1.901426
#> 4 1.860 3.490 2.603407 1.615142
#> # ... with 6 more rows
#>
#> eQTL params:
#> [eqtl.n] [eqtl.dist] [eqtl.maf.min]
#> 1 1e+06 0.05
#> [eqtl.maf.max] [eqtl.group.specific] (eqtl.ES.shape)
#> 0.5 0.2 2.538049
#> (eqtl.ES.rate)
#> 5.962323
This tells us we have “a Params
object of class SplatPopParams
” and shows
the values of these parameters. As with SplatParams
, the parameters that can
be estimated by splatPopEstimate
are in parentheses, those that can’t be
estimated are in brackets, and those that have been changed from their default
are in ALL CAPS.
For example, we can estimate new parameter values from user provided data…
bulk.means <- mockBulkMatrix(n.genes=100, n.samples=100)
bulk.eqtl <- mockBulkeQTL(n.genes=100)
counts <- mockSCE()
params.est <- splatPopEstimate(means = bulk.means,
eqtl = bulk.eqtl,
counts = counts)
#> NOTE: Library sizes have been found to be normally distributed instead of log-normal. You may want to check this is correct.
params.est
#> A Params object of class SplatPopParams
#> Parameters can be (estimable) or [not estimable], 'Default' or 'NOT DEFAULT'
#> Secondary parameters are usually set during simulation
#>
#> Global:
#> (GENES) (CELLS) [Seed]
#> 2000 200 117638
#>
#> 41 additional parameters
#>
#> Batches:
#> [BATCHES] [BATCH CELLS] [Location] [Scale] [Remove]
#> 1 200 0.1 0.1 FALSE
#>
#> Mean:
#> (RATE) (SHAPE)
#> 0.00299908742852435 0.50442538306756
#>
#> Library size:
#> (LOCATION) (SCALE) (NORM)
#> 366788.04 10485.5999303044 TRUE
#>
#> Exprs outliers:
#> (PROBABILITY) (Location) (Scale)
#> 0 4 0.5
#>
#> Groups:
#> [Groups] [Group Probs]
#> 1 1
#>
#> Diff expr:
#> [Probability] [Down Prob] [Location] [Scale]
#> 0.1 0.5 0.1 0.4
#>
#> BCV:
#> (COMMON DISP) (DOF)
#> 0.742961631417432 3464.9110945956
#>
#> Dropout:
#> [Type] (MIDPOINT) (SHAPE)
#> none 2.70949043016295 -1.37242676395376
#>
#> Paths:
#> [From] [Steps] [Skew] [Non-linear] [Sigma Factor]
#> 0 100 0.5 0.1 0.8
#>
#> Population params:
#> (MEAN.SHAPE) (MEAN.RATE) [similarity.scale] [cv.bins]
#> 0.438442140538142 0.0075545657715684 1 10
#>
#> (CV.PARAMS)
#> data.frame (10 x 4) with columns: start, end, shape, rate
#> start end shape rate
#> shape 0.00 1.26 20.83335 25.39937
#> 2 1.26 2.11 86.00920 117.46876
#> 3 2.11 6.33 18.76159 22.41237
#> 4 6.33 10.70 14.60420 15.37294
#> # ... with 6 more rows
#>
#> eQTL params:
#> [eqtl.n] [eqtl.dist] [eqtl.maf.min]
#> 1 1e+06 0.05
#> [eqtl.maf.max] [eqtl.group.specific] (EQTL.ES.SHAPE)
#> 0.5 0.2 3.27497744363571
#> (EQTL.ES.RATE)
#> 8.33496846971044
Note that splatPopEstimate()
will only estimate new parameters if the data
required is provided. For example, if you want to simulate data using default
gene means and eQTL parameters, but from single-cell parameters estimated from
your own real single-cell counts data, you could run splatPopEstimate()
with
only the counts
argument provided.
The splatPopSimulate()
function runs both phases of splatPop, however we can
run these two phases separately to highlight their unique functions. The
first phase is run using splatPopSimulateMeans()
.
This function requires two pieces of input data: genotypes and genes. Mock
genotype and gene data can be provided using mockVCF()
and mockGFF()
,
respectively. These mock functions generate random SNP and gene annotation data
for chromosome 22. To simulate populations with realistic population structure,
the user should provide real (or simulated) genotypes as a VCF file read in as a
VariantAnnotation
object.
splatPop takes in information about what genes to simulate in three ways:
data.frame
object. splatPop will filter out all non-gene features (3rd column != gene).
This method uses real gene names and locations, but will randomly assign
expression values and eQTL effects to these genes.data.frame
object including
information about genes you want to simulate. This object must include the
gene’s name (geneID), chromosome (chromosome), and location
(geneMiddle). With just those columns, splatPop will function the same as
if a GFF was provided. However, you can also use this object to specify
other information. For example, if you provide a desired mean (meanSampled)
and variance (cvSampled) for each gene, splatPop will use these instead of
randomly sampled values. Finally, if you provide the type (eQTL.type, e.g.
NA or global), SNP identifier (eSNP.ID), and effect size
(eQTL.EffectSize), splatPop will simulate gene means with these eQTL
associations instead of generating eQTL associations randomly.mockGFF()
to
generate a random GFF file for a specified chromosome. This is the default
option if neither gff
or key
is provided.In addition to the parameters estimated from real data, the SplatPopParams
object also includes control parameters that must be set by the user. The
following SplatPopParams
control parameters can be changed using
setParams()
:
similarity.scale
- Scaling factor for the population variance (cv) rate
parameter. Increasing this scaling factor increases the similarity between
individuals.eqtl.n
- Number (>1) or percent (<=1) of genes to assign with eQTL
effects.eqtl.dist
- Maximum distance (bp) between the center of a gene and
possible eSNPs for that gene.eqtl.maf.min
- Minimum Minor Allele Frequency (MAF) of eSNPs.eqtl.maf.max
- Maximum MAF of eSNPs.eqtl.group.specific
- Percent of eQTL effects to make group specific.
The number of groups is specified using the “group.prob” parameter.nGroups
- Number of groups to simulate for each individual.group.prob
- Array of the proportion of cells that should be simulated
in each group.In addition to the group specific eQTL effects, each group will have group specific differential expression effects, which are not associated with a genetic variant). These parameters are estimated from real single-cell data as described in splatter.
The output of splatPopSimulateMeans()
is a list containing:
means
- a data.frame (or list of data.frames if nGroups
> 1) with
simulated mean gene expression value for each gene (row) and each sample
(column).key
- a data.frame listing for all simulated genes: the assigned mean
and variance (before and after quantile normalization), the assigned eSNP
and its effect size and type (global/group specific), and other group effects.Note that when splatPopSimulate()
is run, these to objects are contained in
the output SingleCellExperiment object (details below). Let’s look at a
snapshot of some simulated means and the corresponding key…
vcf <- mockVCF(n.samples = 6)
gff <- mockGFF(n.genes = 100)
sim.means <- splatPopSimulateMeans(vcf = vcf, gff = gff,
params = newSplatPopParams())
#> Simulating gene means for population...
round(sim.means$means[1:5, 1:6], digits = 2)
#> sample_1 sample_2 sample_3 sample_4 sample_5 sample_6
#> gene_001 0.82 1.25 0.75 1.11 3.31 0.90
#> gene_002 1.16 1.16 1.16 1.38 1.95 0.86
#> gene_003 6.01 0.00 7.96 0.00 8.88 4.21
#> gene_004 3.53 5.12 1.20 0.03 3.96 1.45
#> gene_005 4.63 3.24 5.11 4.43 4.43 3.24
print(sim.means$key[1:5, ], digits = 2)
#> geneID chromosome geneStart geneEnd geneMiddle meanSampled cvSampled
#> 1 gene_001 22 430514 432586 431550 30 0.841
#> 2 gene_002 22 1102267 1105337 1103802 8 0.029
#> 3 gene_003 22 1789000 1791487 1790243 127 0.805
#> 4 gene_004 22 3031738 3031552 3031831 46 0.587
#> 5 gene_005 22 7119924 7122070 7120997 96 0.173
#> eQTL.type eSNP.ID eSNP.chromosome eSNP.loc eSNP.MAF eQTL.EffectSize
#> 1 global snp_08925 22 1124844 0.083 0.49
#> 2 global snp_00651 22 1597464 0.333 0.17
#> 3 global snp_03246 22 1015271 0.250 -0.51
#> 4 global snp_02476 22 2805389 0.250 0.41
#> 5 global snp_06566 22 6124850 0.083 0.38
#> meanQuantileNorm cvQuantileNorm
#> 1 1.2 0.47
#> 2 1.1 0.17
#> 3 5.2 0.62
#> 4 2.0 0.66
#> 5 3.9 0.16
Replicate a simulation by providing a gene key
As described above, information about genes can also be provided in a data.frame
using the key
argument. If you provide splatPopSimulateMeans()
with the key
output from a previous run, it will generate a new population with the same
properties, essentially creating a replicate. Here is a snapshot of such a
replicate using the key simulated above:
sim.means.rep2 <- splatPopSimulateMeans(vcf = vcf, key=sim.means$key,
params = newSplatPopParams())
#> Simulating gene means for population...
round(sim.means.rep2$means[1:5, 1:6], digits = 2)
#> sample_1 sample_2 sample_3 sample_4 sample_5 sample_6
#> gene_001 1.99 1.76 0.04 1.69 2.95 4.16
#> gene_002 1.19 0.98 1.39 1.48 1.48 1.15
#> gene_003 3.48 3.58 7.33 0.00 6.03 8.73
#> gene_004 2.13 3.32 2.90 3.21 4.53 2.68
#> gene_005 5.36 3.64 6.32 3.48 4.16 2.78
Use real population-scale bulk expression data
An important step of splatPopSimulate()
is the quantile normalization of
simulated gene means for each sample to match a gamma distribution
estimated from real single-cell RNA-seq data using splatEstimate()
or
splatPopEstimate()
. This step ensures that even if bulk sequencing data are
used to estimate population parameters, the means output from
splatPopSimulateMeans()
will be distributed like a single-cell dataset.
If you already have bulk expression data for a population, you can use this
quantile normalization function directly on that data and use the output as
input to splatPopSimulateSC()
. Note that this will not simulate eQTL or group
effects, just simulate single-cell counts using the bulk means provided.
bulk.qnorm <- splatPopQuantNorm(newSplatPopParams(), bulk.means)
round(bulk.qnorm[1:5, 1:5], 3)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.692 1.503 1.764 1.244 0.070
#> [2,] 0.538 0.592 0.398 0.538 0.692
#> [3,] 1.152 1.832 1.559 3.336 2.438
#> [4,] 3.907 0.054 3.751 1.337 1.195
#> [5,] 3.069 0.054 2.957 5.872 0.070
Finally, single cell level data is simulated using splatPopSimulateSC()
.
Running this function on its own requires the SplatPopParams
object, and the
two outputs from splatPopSimulateMeans()
: the key and the simulated means
matrix (or list of matrices if nGroups > 1). The user can also provide
additional parameters for the single-cell simulation, for example how many
cells to simulate.
Looking at the output of splatPopSimulateSC()
we see that it is a single
SingleCellExperiment
object with a row for each feature (gene) and
a column for each cell. The simulated counts are accessed using counts
.
although it can also hold other expression measures such as FPKM or TPM.
Information about each cell (e.g. sample, group, batch) is held in the
colData
and information about each gene (e.g. location, eQTL effects, and
other data from the splatPop key) is held in the rowData
.
sim.sc <- splatPopSimulateSC(params=params,
key = sim.means$key,
sim.means=sim.means$means,
batchCells=50)
#> Simulating population single cell counts...
#> Sparsifying assays...
#> Automatically converting to sparse matrices, threshold = 0.95
#> Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
#> Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
#> Skipping 'BCV': estimated sparse size 1.5 * dense matrix
#> Skipping 'CellMeans': estimated sparse size 1.48 * dense matrix
#> Skipping 'TrueCounts': estimated sparse size 2.8 * dense matrix
#> Skipping 'counts': estimated sparse size 2.8 * dense matrix
#> Done!
sim.sc
#> class: SingleCellExperiment
#> dim: 100 300
#> metadata(2): Params Simulated_Means
#> assays(6): BatchCellMeans BaseCellMeans ... TrueCounts counts
#> rownames(100): gene_001 gene_002 ... gene_099 gene_100
#> rowData names(22): Row.names sample_1_BaseGeneMean ... meanQuantileNorm
#> cvQuantileNorm
#> colnames(300): Cell1 Cell2 ... Cell49 Cell50
#> colData names(4): Cell Batch ExpLibSize Sample
#> reducedDimNames(0):
#> altExpNames(0):
We can visualize these simulations using plotting functions from scater like plotPCA…
sim.sc <- logNormCounts(sim.sc)
sim.sc <- runPCA(sim.sc, ncomponents = 10)
plotPCA(sim.sc, colour_by = "Sample")
Using the same methods as splat, splatPop allows you to simulate single-cell
counts for a population with group (e.g. cell-types), batch, and path (e.g.
developmental series) effects. Group effects are simulated by
splatPopSimulateMeans()
and applied to the single cell simulations in
splatPopSimulateSC()
. Path and batch effects are simulated by
splatPopSimulateSC()
.
The population simulated above is an example of a dataset with a single cell type across many samples. However, splatPop also allows you to simulate population-scale data for a mixture of cell-types (i.e. groups).
Two types of group effects are included: group-eQTL and group-differential
expression (DE) effects. The number of groups to simulate is set using the
group.prob parameter in SplatPopParams
. The DE effects are implemented as
in the splat
simulation, with the user able to control splatPopParam
parameters including de.prob, de.downProb, de.facLoc, and de.facScale.
For group-specific eQTL, the proportion of eQTL to designate as group-specific
eQTL is set using eqtl.group.specific.
When used to simulate single-cell data with group-specific effects,
splatSimulatePop
also outputs:
colData
)
Group
- The group ID for each cell.params.group <- newSplatPopParams(nGenes = 50,
batchCells = 40,
group.prob = c(0.5, 0.5))
sim.sc.gr2 <- splatPopSimulate(vcf = vcf, params = params.group)
#> Getting parameters...
#> Simulating gene means for population...
#> Simulating sc counts for Group1...
#> Simulating sc counts for Group2...
#> Sparsifying assays...
#> Automatically converting to sparse matrices, threshold = 0.95
#> Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
#> Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
#> Skipping 'BCV': estimated sparse size 1.5 * dense matrix
#> Skipping 'CellMeans': estimated sparse size 1.47 * dense matrix
#> Skipping 'TrueCounts': estimated sparse size 2.84 * dense matrix
#> Skipping 'counts': estimated sparse size 2.84 * dense matrix
#> Done!
sim.sc.gr2 <- logNormCounts(sim.sc.gr2)
sim.sc.gr2 <- runPCA(sim.sc.gr2, ncomponents = 10)
plotPCA(sim.sc.gr2, colour_by = "Group", shape_by = "Sample")
From the PCA plot above you can see that in this simulation the sample effect outweighs the group effect. But we can tune these parameters to change the relative weight of these effects. First we can decrease the sample effect by increasing the similarity.scale parameter. And second we can increase the group effect by adjusting the eqtl.group.specific and de parameters:
params.group <- newSplatPopParams(batchCells = 40,
nGenes = 50,
similarity.scale = 6,
eqtl.group.specific = 0.6,
de.prob = 0.5,
de.facLoc = 0.5,
de.facScale = 0.4,
group.prob = c(0.5, 0.5))
sim.sc.gr2 <- splatPopSimulate(vcf = vcf, params = params.group)
#> Getting parameters...
#> Simulating gene means for population...
#> Simulating sc counts for Group1...
#> Simulating sc counts for Group2...
#> Sparsifying assays...
#> Automatically converting to sparse matrices, threshold = 0.95
#> Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
#> Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
#> Skipping 'BCV': estimated sparse size 1.5 * dense matrix
#> Skipping 'CellMeans': estimated sparse size 1.47 * dense matrix
#> Skipping 'TrueCounts': estimated sparse size 2.81 * dense matrix
#> Skipping 'counts': estimated sparse size 2.81 * dense matrix
#> Done!
sim.sc.gr2 <- logNormCounts(sim.sc.gr2)
sim.sc.gr2 <- runPCA(sim.sc.gr2, ncomponents = 10)
plotPCA(sim.sc.gr2, colour_by = "Group", shape_by = "Sample")
Like splat, splatPop also allows you to simulate single-cell data with path or
batch effects using the method
tag in splatSimulatePop
. Note that you can
also set method = group, but this is done automatically by setting the
group.prob parameter. For more information about these settings, see the
Splat parameters vignette.
params.batches <- newSplatPopParams(batchCells = c(20, 20),
nGenes = 50,
similarity.scale = 5,
batch.facLoc = 0.3,
batch.facScale = 0.3)
sim.pop.batches <- splatPopSimulate(vcf = vcf, params = params.batches)
#> Getting parameters...
#> Simulating gene means for population...
#> Simulating population single cell counts...
#> Sparsifying assays...
#> Automatically converting to sparse matrices, threshold = 0.95
#> Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
#> Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
#> Skipping 'BCV': estimated sparse size 1.5 * dense matrix
#> Skipping 'CellMeans': estimated sparse size 1.47 * dense matrix
#> Skipping 'TrueCounts': estimated sparse size 2.82 * dense matrix
#> Skipping 'counts': estimated sparse size 2.82 * dense matrix
#> Done!
sim.pop.batches <- logNormCounts(sim.pop.batches)
sim.pop.batches <- runPCA(sim.pop.batches, ncomponents = 10)
plotPCA(sim.pop.batches, colour_by = "Batch", shape_by = "Sample",
ncomponents = 5:6)
params.paths <- newSplatPopParams(batchCells = 40,
nGenes = 50,
similarity.scale = 6,
de.facLoc = 0.5,
de.facScale = 0.5,
de.prob = 0.5)
sim.pop.paths <- splatPopSimulate(vcf = vcf, params = params.paths,
method = "paths")
#> Getting parameters...
#> Simulating gene means for population...
#> Simulating population single cell counts...
#> Sparsifying assays...
#> Automatically converting to sparse matrices, threshold = 0.95
#> Skipping 'BatchCellMeans': estimated sparse size 1.5 * dense matrix
#> Skipping 'BaseCellMeans': estimated sparse size 1.5 * dense matrix
#> Skipping 'BCV': estimated sparse size 1.5 * dense matrix
#> Skipping 'CellMeans': estimated sparse size 1.47 * dense matrix
#> Skipping 'TrueCounts': estimated sparse size 2.86 * dense matrix
#> Skipping 'counts': estimated sparse size 2.86 * dense matrix
#> Done!
sim.pop.paths <- logNormCounts(sim.pop.paths)
sim.pop.paths <- runPCA(sim.pop.paths, ncomponents = 10)
plotPCA(sim.pop.paths, colour_by = "Step", shape_by = "Sample",
ncomponents = 5:6)