This is a packge which stores data used in R workshops sponsored by the Casualty Actuarial Society1. This short vignette will describe the various data sets and give examples of their use. As well, there is a short note about each of the packages which we suggest be installed.
library(raw)
Once the raw
package has been installed, any of the data sets may be accessed. To get a list of the data sets available in any package in R, one may call the data
function with the name of the package as an argument.
data(package = "raw")
To get help about any data set, one may use R’s help facility to display the documentation for that data set.
?COTOR2
There are four sets in all. Note that the numbering begins at 2. So far as I am aware, the data for the first COTOR challenge is not available from the CAS website. Each data set has a set of randomly generated non-life insurance claims.
This is data taken from Appendix A of the Basic Ratemaking study note by Werner and Modlin. This is a suite of data - six sets in all - pertaining to personal auto. More information may be found in the note itself.
data(PPA)
head(PPA_LossDevelopment)
This contains basic data
data("Hurricane")
hist(Hurricane$Wind, xlab = "Wind Speed (knots)", main = "")
This data represents ten complete years of Schedule P development from many NAIC reporting companies. The data was prepared by Glenn Meyers and Peng Shi and is available from the CAS.
When installed using the command below, raw
will also ensure that a useful suite of packages is installed.
install.packages("raw", dependencies = "Suggests")
This contains a varied set of useful actuarial tools, from loss distributions to credibility to compound loss models. This also contains the data from the “Loss Models” textbook by Klugman, Panjer and Wilmot. Read more here: https://cran.r-project.org/package=actuar.
ChainLadder supports the standard suite of loss reserving methods. A quick overview may be found here: https://github.com/mages/ChainLadder.
The mondate package enables one to calculate time differences in terms of months. Trust me, you need this.
library(mondate)
##
## Attaching package: 'mondate'
## The following object is masked from 'package:base':
##
## as.difftime
<- mondate("2010-03-31")
endOfQuarter ::add(endOfQuarter, 3, "months") mondate
## mondate: timeunits="months"
## [1] 2010-06-30
Contains functions for present value, annuities, internal rate of return and more. Website here: http://felixfan.github.io/FinCal/
At the time of writing, this package has been removed from the list of suggested packages that ships with raw
. This is due to a potential installation issue that could have affected its acceptance when submitting to CRAN. Despite that, I can strongly suggest that actuaries install this package on their own. To do so, simply type the command below.
install.packages("rstan")
The rstan
package is the R implementation of the Stan MCMC project. This is an incredibly powerful and easy to use Bayesian framework. No, really, it is. If you’re into Bayesian computation, this is essential. If you’re not a Bayesian, this package will make you one. There are piles of material on the Stan website https://mc-stan.org/users/interfaces/rstan. The tutorials are easy to follow and worth your time.
library(tidyverse)
Hadley Wickham is an incredibly prolific and popular programmer who has made some big contributions to the R landscape.
I use the dplyr
package on a daily basis. It has a simple, easy to understand grammar for data manipulation and summarization. Read the introduction here: https://github.com/tidyverse/dplyr.
The readr
package has a few nice improvements to the basic functions for reading in flat data files. Among other things, it makes it easier to specify column data types and will give useful warnings when it encounters problematic data cells. Read more here: https://github.com/tidyverse/readr
There are a number of packages to read and write Excel files (xlsx
and XLConnect
are two). However, most use a java virtual machine to access the files and can run into memory issues. readxl
doesn’t.
tidyr is a light but useful set of functions to manipulate data. You’ll most often see the spread
and gather
functions used to transform a data set between “long” and “wide” formats. There’s a short intro here: https://blog.rstudio.com/2014/07/22/introducing-tidyr/.
This is a very powerful, flexible graphing engine. Quite a few books have been written about it, which is testament to both its complexity and utility.
Lubridate has a wealth of functions for manipulating dates and time intervals.
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:mondate':
##
## day, month, quarter, year, ymd
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
<- mdy("2/16/1972")
myDate year(myDate) <- 2016
Scales has some functions to render numbers in pretty formats.
library(scales)
data("COTOR2")
head(COTOR2)
## [1] 1009.2364 121785.9300 913.5061 4513.6190 1010.8515 3227.8844
head(dollar(COTOR2))
## [1] "$1,009" "$121,786" "$914" "$4,514" "$1,011" "$3,228"
This package is likely not all that useful to beginners, but it does have one very useful function: install_github
. This will allow you to intall the development version of packages from the GitHub codesharing site. For example, the command below will ensure that you’re always up to date with this package.
::install_github("PirateGrunt/raw") devtools
The knitr package, written by Yihue Xie, enables a user to combine R code and documentation in a single file. That file may then be converted into a format like PDF, HTML or Word.
This package enhances knitr
to make the document conversion described above a bit easier. This is an incredibly powerful framework. Read more here: https://rmarkdown.rstudio.com/.
The CAS has neither produced nor endorsed the content of this package.↩︎