Add new expecteds and variances to an experience study

mutate_expecvar() takes a new expected rate for a decrement of interest and adds two columns: the corresponding expected decrements and the variance of those expected decrements. If the study dataset already contains expecteds and variances measures, the new columns are either added under a prefix or written over the existing ones.

Usage

mutate_expecvar(
  .data,
  new_expected_rates,
  new_expecvar_prefix = "auto",
  measure_sets = guess_measure_sets(.data),
  amount_scalar = NULL,
  .by = NULL,
  .keep = c("all", "used", "unused", "none"),
  .before = NULL,
  .after = NULL
)

Arguments

.data

A base::data.frame() that houses an experience study.

new_expected_rates

A numeric vector to use as the expected probability for the study's event of interest (e.g., policy lapse or insured death). This can be a column in the dataset or a new numeric vector of length 1 or nrow(.data).

new_expecvar_prefix

A string to distinguish the new expecteds and variances columns in the dataset. To overwrite existing expecteds and variances columns, use an argument value of NULL, character(), or ''. The default 'auto' will add a numeric prefix based on the previous names of expecteds/variances so that names will remain unique.

measure_sets

A (potentially named) list of measure sets. When chaining multiple expstudy functions, this only needs to be specified once, because the measure_sets are passed along as an attribute of the results.

amount_scalar

A numeric vector to use when determining amount-weighted expecteds and variances. The function treats the new expecteds/variances as amount-weighted when the corresponding actuals in the study have values greater than 1 (actuals that are not amount-weighted, i.e., counts, should only be 0 or 1).

.by

<tidy-select> Optionally, a selection of columns to group by for just this operation, functioning as an alternative to group_by(). For details and examples, see ?dplyr_by.

.keep

Control which columns from .data are retained in the output. Grouping columns and columns created by ... are always kept.

"all" retains all columns from .data. This is the default.
"used" retains only the columns used in ... to create new columns. This is useful for checking your work, as it displays inputs and outputs side-by-side.
"unused" retains only the columns not used in ... to create new columns. This is useful if you generate new columns, but no longer need the columns used to generate them.
"none" doesn't retain any extra columns from .data. Only the grouping variables and columns created by ... are kept.

.before, .after

<tidy-select> Optionally, control where new columns should appear (the default is to add to the right hand side). See relocate() for more details.

Value

An object of the same type as .data. The output has the following properties:

Columns from .data will be preserved according to the .keep argument.
Existing columns that are modified by ... will always be returned in their original location.
New columns created through ... will be placed according to the .before and .after arguments.
The number of rows is not affected.
Columns given the value NULL will be removed.
Groups will be recomputed if a grouping variable is mutated.
Data frame attributes are preserved.

Underlying Assumptions

This function was developed according to current industry practice relating to experience study calculations. Some of the assumptions incorporated are briefly outlined below.

The experience study data is at a seriatim level where repeated observations of multiple units can exist. For example, the study data can contain experience for multiple policies over multiple calendar or policy years.
Each decrement event can be described as a Bernoulli random variable with expected rate of decrement equal to $p$. Furthermore, combining multiple observation units with equal rates of decrement $p$ can be considered a Binomial random variable with $n$ equal to the number of observation units.
Decrements are considered to be uniform between observations.
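
In symbols, the Bernoulli and Binomial assumptions above give the standard moments below. The names $X_i$ and $N$ are notation introduced only for this illustration, and the Binomial result additionally assumes the $n$ observation units are independent.

$$
X_i \sim \mathrm{Bernoulli}(p): \quad \mathrm{E}[X_i] = p, \quad \mathrm{Var}(X_i) = p(1 - p)
$$

$$
N = \sum_{i=1}^{n} X_i \sim \mathrm{Binomial}(n, p): \quad \mathrm{E}[N] = np, \quad \mathrm{Var}(N) = np(1 - p)
$$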

With these assumptions, new expecteds that are not amount-weighted are calculated as the product of exposures and the expected decrement rate, and new variances are calculated as the product of those new expecteds and 1 minus those new expecteds. Amount-weighted expecteds and variances follow the same calculations and are additionally multiplied by the amount scalar and the amount scalar squared, respectively.
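
The arithmetic described above can be sketched by hand in base R. This is only an illustration of the formulas as stated in this section, not the package's internal code. The three rows of data and all variable names (exposure, expected_rate, face_amount, and the derived vectors) are hypothetical.

```r
# Hypothetical seriatim rows; these names and values are illustrative only
# and are not part of expstudy or its bundled datasets.
exposure      <- c(1, 0.5, 0.25)        # exposure count for each row
expected_rate <- c(0.02, 0.02, 0.10)    # new expected decrement rate
face_amount   <- c(5000, 10000, 2500)   # amount scalar

# Count basis: expecteds are exposure times rate, and variances are
# expecteds times (1 - expecteds), as described in the text above.
expected_cnt <- exposure * expected_rate
variance_cnt <- expected_cnt * (1 - expected_cnt)

# Amount basis: scale expecteds by the amount and variances by the
# amount squared.
expected_amt <- expected_cnt * face_amount
variance_amt <- variance_cnt * face_amount^2

data.frame(
  exposure, expected_rate, face_amount,
  expected_cnt, variance_cnt, expected_amt, variance_amt
)
```

Squaring the amount scalar in the variance follows from the usual scaling rule that multiplying a random variable by a constant multiplies its variance by the square of that constant.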

For a more detailed explanation of the methods used, please refer to the Society of Actuaries' publication on experience study calculations.

Naming convention

expstudy uses a naming convention where some functions are prefixed by the underlying dplyr verb. The purpose is to signal that the expstudy function returns a structure very similar to what the dplyr function would produce. Note that the intention here is not to replace all dplyr use cases but instead to add specific functionality that streamlines routine experience study analyses.

Examples

mortexp |>
  dplyr::mutate(
    NEW_EXPECTED_MORT_RT = runif(n = nrow(mortexp))
  ) |>
  mutate_expecvar(
    new_expected_rates = NEW_EXPECTED_MORT_RT,
    new_expecvar_prefix = 'ADJ_',
    amount_scalar = FACE_AMOUNT
  )
#> # A tibble: 176,096 × 28
#>    AS_OF_DATE POLICY_HOLDER GENDER SMOKING_STATUS UNDERWRITING_CLASS FACE_AMOUNT
#>  * <date>     <fct>         <fct>  <fct>          <fct>                    <dbl>
#>  1 1998-04-30 PH_0001       FEMALE NON-SMOKER     STANDARD                  5000
#>  2 1998-05-31 PH_0001       FEMALE NON-SMOKER     STANDARD                  5000
#>  3 1998-06-30 PH_0001       FEMALE NON-SMOKER     STANDARD                  5000
#>  4 1998-07-31 PH_0001       FEMALE NON-SMOKER     STANDARD                  5000
#>  5 1998-08-31 PH_0001       FEMALE NON-SMOKER     STANDARD                  5000
#>  6 1998-09-30 PH_0001       FEMALE NON-SMOKER     STANDARD                  5000
#>  7 1998-10-31 PH_0001       FEMALE NON-SMOKER     STANDARD                  5000
#>  8 1998-11-30 PH_0001       FEMALE NON-SMOKER     STANDARD                  5000
#>  9 1998-12-31 PH_0001       FEMALE NON-SMOKER     STANDARD                  5000
#> 10 1999-01-31 PH_0001       FEMALE NON-SMOKER     STANDARD                  5000
#> # ℹ 176,086 more rows
#> # ℹ 22 more variables: INSURED_DOB <date>, ISSUE_DATE <date>,
#> #   TERMINATION_DATE <date>, ISSUE_AGE <dbl>, ATTAINED_AGE <dbl>,
#> #   EXPECTED_MORTALITY_RT <dbl>, POLICY_DURATION_YR <dbl>,
#> #   POLICY_DURATION_MNTH <int>, POLICY_STATUS <fct>, MORT_EXPOSURE_CNT <dbl>,
#> #   MORT_EXPOSURE_AMT <dbl>, MORT_ACTUAL_CNT <dbl>, MORT_ACTUAL_AMT <dbl>,
#> #   MORT_EXPECTED_CNT <dbl>, MORT_EXPECTED_AMT <dbl>, …