---
title: "Creating a BDS Time-to-Event ADaM"
output:
  rmarkdown::html_vignette:
vignette: >
  %\VignetteIndexEntry{Creating a BDS Time-to-Event ADaM}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(admiraldev)
```

# Introduction

This article describes creating a BDS time-to-event ADaM.

The main part in programming a time-to-event dataset is the definition of the
events and censoring times. `{admiral}` supports single events like death or
composite events like disease progression or death. More than one source dataset
can be used for the definition of the event and censoring times.

**Note**: *All examples assume CDISC SDTM and/or ADaM format as input unless
otherwise specified.*

## Required Packages

The examples of this vignette require the following packages.

```{r, warning=FALSE, message=FALSE}
library(admiral)
library(dplyr, warn.conflicts = FALSE)
library(pharmaversesdtm)
```

```{r, warning=FALSE, message=FALSE, include=FALSE}
library(lubridate)
```

# Programming Workflow

* [Read in Data](#readdata)
* [Derive Parameters (`CNSR`, `ADT`, `STARTDT`)](#parameters)
* [Derive Analysis Value (`AVAL`)](#aval)
* [Derive Analysis Sequence Number (`ASEQ`)](#aseq)
* [Add ADSL Variables](#adslvars)
* [Add Labels and Attributes](#attributes)

## Read in Data {#readdata}

To start, all datasets needed for the creation of the time-to-event dataset
should be read into the environment. This will be a company specific process.

For example purpose, the ADSL dataset---which is included
in `{admiral}`---and the SDTM datasets from `{pharmaversesdtm}` are used.

```{r}
ae <- pharmaversesdtm::ae
adsl <- admiral::admiral_adsl

ae <- convert_blanks_to_na(ae)
```

```{r echo=FALSE}
ae <- filter(ae, USUBJID %in% c("01-701-1015", "01-701-1023", "01-703-1086", "01-703-1096", "01-707-1037", "01-716-1024"))
```

The following code creates a minimally viable ADAE dataset to be used throughout
the following examples.

```{r}
adae <- ae %>%
  left_join(adsl, by = c("STUDYID", "USUBJID")) %>%
  derive_vars_dt(
    new_vars_prefix = "AST",
    dtc = AESTDTC,
    highest_imputation = "M"
  ) %>%
  derive_vars_dt(
    new_vars_prefix = "AEN",
    dtc = AEENDTC,
    highest_imputation = "M",
    date_imputation = "last"
  ) %>%
  mutate(TRTEMFL = if_else(ASTDT >= TRTSDT &
    AENDT <= TRTEDT + days(30), "Y", NA_character_))
```

## Derive Parameters (`CNSR`, `ADT`, `STARTDT`) {#parameters}

To derive the parameter dependent variables like `CNSR`, `ADT`, `STARTDT`,
`EVNTDESC`, `SRCDOM`, `PARAMCD`, ... the `derive_param_tte()` function can be
used. It adds one parameter to the input dataset with one observation per
subject. Usually it is called several times.

For each subject it is determined if an event occurred. In the affirmative the
analysis date `ADT` is set to the earliest event date. If no event occurred, the
analysis date is set to the latest censoring date.

The events and censorings are defined by the `event_source()` and the
`censor_source()` class respectively. It defines

- which observations (`filter` parameter) of a source dataset (`dataset_name`
parameter) are potential events or censorings,
- the value of the `CNSR` variable (`censor` parameter), and
- which variable provides the date (`date` parameter).

The date can be provided as date (`*DT` variable) or datetime (`*DTM` variable).

CDISC strongly recommends `CNSR = 0` for events and positive integers for
censorings. `{admiral}` enforces this recommendation. Therefore the `censor`
parameter is available for `censor_source()` only. It is defaulted to `1`.

The `dataset_name` parameter expects a character value which is used as an
identifier. The actual data which is used for the derivation of the parameter is
provided via the `source_datasets` parameter of `derive_param_tte()`. It expects
a named list of datasets. The names correspond to the identifiers specified for
the `dataset_name` parameter. This allows to define events and censoring
independent of the data.

### Pre-Defined Time-to-Event Source Objects

The table below shows all pre-defined `tte_source` objects which should cover the most common use cases.

<details>
<summary>`tte_source` objects</summary>
```{r echo=FALSE}
# knitr::kable(admiral:::list_tte_source_objects())
library(reactable)
library(htmltools)
reactable(
  admiral:::list_tte_source_objects(),
  defaultColDef = colDef(
    header = function(value) {
      tags$code(value)
    },
    cell = function(value) {
      tags$code(value)
    }
  ),
  columns = list(
    object = colDef(
      header = function(value) "Object Name"
    ),
    dataset_name = colDef(
      width = 130
    ),
    date = colDef(
      width = 100
    ),
    censor = colDef(
      width = 80
    ),
    set_values_to = colDef(
      cell = function(value) {
        items <- stringr::str_split_1(as.character(value), "<br>")
        tagList(lapply(items, function(item) div(item)))
      }
    )
  ),
  filterable = TRUE,
  resizable = TRUE,
  defaultPageSize = 20
)
```
</details>

These pre-defined objects can be passed directly to `derive_param_tte()` to create a new time-to-event parameter.

```{r}
adtte <- derive_param_tte(
  dataset_adsl = adsl,
  start_date = TRTSDT,
  event_conditions = list(ae_ser_event),
  censor_conditions = list(lastalive_censor),
  source_datasets = list(adsl = adsl, adae = adae),
  set_values_to = exprs(PARAMCD = "TTAESER", PARAM = "Time to First Serious AE")
)
```
```{r, echo=FALSE}
dataset_vignette(
  adtte,
  display_vars = exprs(USUBJID, PARAMCD, PARAM, STARTDT, ADT, CNSR)
)
```


### Single Event

For example, the overall survival time could be defined from treatment start to death. Patients alive or lost to follow-up would be censored to the last alive date. The following call defines a death event based on ADSL variables.

```{r}
death <- event_source(
  dataset_name = "adsl",
  filter = DTHFL == "Y",
  date = DTHDT
)
```

A corresponding censoring based on the last known alive date can be defined by
the following call.

```{r}
lstalv <- censor_source(
  dataset_name = "adsl",
  date = LSTALVDT
)
```

The definitions can be passed to `derive_param_tte()` to create a new time-to-event parameter.

```{r}
adtte <- derive_param_tte(
  dataset_adsl = adsl,
  source_datasets = list(adsl = adsl),
  start_date = TRTSDT,
  event_conditions = list(death),
  censor_conditions = list(lstalv),
  set_values_to = exprs(PARAMCD = "OS", PARAM = "Overall Survival")
)
```
```{r, echo=FALSE}
dataset_vignette(
  adtte,
  display_vars = exprs(USUBJID, PARAMCD, PARAM, STARTDT, ADT, CNSR)
)
```

Note that in practice for efficacy parameters you might use randomization date as the time to event origin date.

### Add Additional Information for Events and Censoring (`EVNTDESC`, `SRCVAR`, ...)

To add additional information like event or censoring description (`EVNTDESC`)
or source variable (`SRCVAR`) the `set_values_to` parameter can be specified in
the event/censoring definition.

```{r}
# define death event #
death <- event_source(
  dataset_name = "adsl",
  filter = DTHFL == "Y",
  date = DTHDT,
  set_values_to = exprs(
    EVNTDESC = "DEATH",
    SRCDOM = "ADSL",
    SRCVAR = "DTHDT"
  )
)

# define censoring at last known alive date #
lstalv <- censor_source(
  dataset_name = "adsl",
  date = LSTALVDT,
  set_values_to = exprs(
    EVNTDESC = "LAST KNOWN ALIVE DATE",
    SRCDOM = "ADSL",
    SRCVAR = "LSTALVDT"
  )
)

# derive time-to-event parameter #
adtte <- derive_param_tte(
  dataset_adsl = adsl,
  source_datasets = list(adsl = adsl),
  event_conditions = list(death),
  censor_conditions = list(lstalv),
  set_values_to = exprs(PARAMCD = "OS", PARAM = "Overall Survival")
)
```

```{r, echo=FALSE}
dataset_vignette(
  adtte,
  display_vars = exprs(USUBJID, EVNTDESC, SRCDOM, SRCVAR, CNSR, ADT)
)
# save adtte and adsl for next section
adtte_bak <- adtte
adsl_bak <- adsl
```

### Handling Subjects Without Assessment

If a subject has no event and has no record meeting the censoring rule, it will
not be included in the output dataset. In order to have a record for this
subject in the output dataset, another `censoring_source()` object should be
created to specify how those patients will be censored. Therefore the `start`
censoring is defined below to achieve that subjects without data in `adrs` are
censored at the start date.

The ADaM IG requires that a computed date must be accompanied by imputation flags.
Thus, if the function detects a `*DTF` and/or `*TMF` variable corresponding
to `start_date` then `STARTDTF` and `STARTTMF` are set automatically to the values
of these variables. If a date variable from one of the event
or censoring source datasets is imputed, the imputation flag can be specified
for the `set_values_to` parameter in `event_source()` or `censor_source()` (see
definition of the `start` censoring below).

As the CDISC pilot does not contain a `RS` dataset, the following example for
progression free survival uses manually created datasets.

```
View(adsl)
```
```{r, echo=FALSE}
adsl <- tibble::tribble(
  ~USUBJID, ~DTHFL, ~DTHDT,            ~TRTSDT,           ~TRTSDTF,
  "01",     "Y",    ymd("2021-06-12"), ymd("2021-01-01"), "M",
  "02",     "N",    NA,                ymd("2021-02-03"), NA,
  "03",     "Y",    ymd("2021-08-21"), ymd("2021-08-10"), NA,
  "04",     "N",    NA,                ymd("2021-02-03"), NA,
  "05",     "N",    NA,                ymd("2021-04-01"), "D"
) %>%
  mutate(STUDYID = "AB42")

dataset_vignette(
  adsl,
  display_vars = exprs(USUBJID, DTHFL, DTHDT, TRTSDT, TRTSDTF)
)
```

```
View(adrs)
```
```{r, echo=FALSE}
adrs <- tibble::tribble(
  ~USUBJID, ~AVALC, ~ADT,              ~ASEQ,
  "01",     "SD",   ymd("2021-01-03"), 1,
  "01",     "PR",   ymd("2021-03-04"), 2,
  "01",     "PD",   ymd("2021-05-05"), 3,
  "02",     "PD",   ymd("2021-02-03"), 1,
  "04",     "SD",   ymd("2021-02-13"), 1,
  "04",     "PR",   ymd("2021-04-14"), 2,
  "04",     "CR",   ymd("2021-05-15"), 3
) %>%
  mutate(
    STUDYID = "AB42",
    PARAMCD = "OVR",
    PARAM = "Overall Response"
  ) %>%
  select(STUDYID, USUBJID, PARAMCD, PARAM, ADT, ASEQ, AVALC)

dataset_vignette(
  adrs,
  display_vars = exprs(USUBJID, AVALC, ADT, ASEQ, PARAMCD, PARAM)
)
```

An event for progression free survival occurs if

- progression of disease is observed or
- the subject dies.

Therefore two `event_source()` objects are defined:

- `pd` for progression of disease and
- `death` for death.

Some subjects may experience both events. In this case the first one is selected
by `derive_param_tte()`.

```{r}
# progressive disease event #
pd <- event_source(
  dataset_name = "adrs",
  filter = AVALC == "PD",
  date = ADT,
  set_values_to = exprs(
    EVNTDESC = "PD",
    SRCDOM = "ADRS",
    SRCVAR = "ADT",
    SRCSEQ = ASEQ
  )
)

# death event #
death <- event_source(
  dataset_name = "adsl",
  filter = DTHFL == "Y",
  date = DTHDT,
  set_values_to = exprs(
    EVNTDESC = "DEATH",
    SRCDOM = "ADSL",
    SRCVAR = "DTHDT"
  )
)
```

Subjects without event must be censored at the last tumor assessment. For the
censoring the `lastvisit` object is defined as _all_ tumor assessments. Please
note that it is not necessary to select the last one or exclude assessments
which resulted in progression of disease. This is handled within
`derive_param_tte()`.

```{r}
# last tumor assessment censoring (CNSR = 1 by default) #
lastvisit <- censor_source(
  dataset_name = "adrs",
  date = ADT,
  set_values_to = exprs(
    EVNTDESC = "LAST TUMOR ASSESSMENT",
    SRCDOM = "ADRS",
    SRCVAR = "ADT",
    SRCSEQ = ASEQ
  )
)
```

Patients without tumor assessment should be censored at the start date.
Therefore the `start` object is defined with the treatment start date as
censoring date. It is not necessary to exclude patient with tumor assessment in
the definition of `start` because `derive_param_tte()` selects the last date
across all `censor_source()` objects as censoring date.

```{r}
# start date censoring (for patients without tumor assessment) (CNSR = 2) #
start <- censor_source(
  dataset_name = "adsl",
  date = TRTSDT,
  censor = 2,
  set_values_to = exprs(
    EVNTDESC = "TREATMENT START",
    SRCDOM = "ADSL",
    SRCVAR = "TRTSDT",
    ADTF = TRTSDTF
  )
)

# derive time-to-event parameter #
adtte <- derive_param_tte(
  dataset_adsl = adsl,
  source_datasets = list(adsl = adsl, adrs = adrs),
  start_date = TRTSDT,
  event_conditions = list(pd, death),
  censor_conditions = list(lastvisit, start),
  set_values_to = exprs(PARAMCD = "PFS", PARAM = "Progression Free Survival")
)
```

```{r, echo=FALSE}
dataset_vignette(
  adtte %>%
    select(
      STUDYID, USUBJID, PARAMCD, PARAM, STARTDT, ADT, ADTF, CNSR,
      EVNTDESC, SRCDOM, SRCVAR, SRCSEQ
    ),
  display_vars = exprs(USUBJID, PARAMCD, STARTDT, ADT, ADTF, CNSR)
)
```

### Deriving a Series of Time-to-Event Parameters

If several similar time-to-event parameters need to be derived the
`call_derivation()` function is useful.

In the following example parameters for time to first AE, time to first serious
AE, and time to first related AE are derived. The censoring is the same for all
three. Only the definition of the event differs.

```{r echo=FALSE}
adtte <- adtte_bak
adsl <- adsl_bak
```
```{r}
# define censoring #
observation_end <- censor_source(
  dataset_name = "adsl",
  date = pmin(TRTEDT + days(30), EOSDT),
  censor = 1,
  set_values_to = exprs(
    EVNTDESC = "END OF TREATMENT",
    SRCDOM = "ADSL",
    SRCVAR = "TRTEDT"
  )
)

# define time to first AE #
tt_ae <- event_source(
  dataset_name = "ae",
  date = ASTDT,
  order = exprs(AESEQ),
  set_values_to = exprs(
    EVNTDESC = "ADVERSE EVENT",
    SRCDOM = "AE",
    SRCVAR = "AESTDTC",
    SRCSEQ = AESEQ
  )
)

# define time to first serious AE #
tt_ser_ae <- event_source(
  dataset_name = "ae",
  filter = AESER == "Y",
  date = ASTDT,
  order = exprs(AESEQ),
  set_values_to = exprs(
    EVNTDESC = "SERIOUS ADVERSE EVENT",
    SRCDOM = "AE",
    SRCVAR = "AESTDTC",
    SRCSEQ = AESEQ
  )
)

# define time to first related AE #
tt_rel_ae <- event_source(
  dataset_name = "ae",
  filter = AEREL %in% c("PROBABLE", "POSSIBLE", "REMOTE"),
  date = ASTDT,
  order = exprs(AESEQ),
  set_values_to = exprs(
    EVNTDESC = "RELATED ADVERSE EVENT",
    SRCDOM = "AE",
    SRCVAR = "AESTDTC",
    SRCSEQ = AESEQ
  )
)

# derive all three time to event parameters #
adaette <- call_derivation(
  derivation = derive_param_tte,
  variable_params = list(
    params(
      event_conditions = list(tt_ae),
      set_values_to = exprs(PARAMCD = "TTAE")
    ),
    params(
      event_conditions = list(tt_ser_ae),
      set_values_to = exprs(PARAMCD = "TTSERAE")
    ),
    params(
      event_conditions = list(tt_rel_ae),
      set_values_to = exprs(PARAMCD = "TTRELAE")
    )
  ),
  dataset_adsl = adsl,
  source_datasets = list(
    adsl = adsl,
    ae = filter(adae, TRTEMFL == "Y")
  ),
  censor_conditions = list(observation_end)
)
```

```{r, echo=FALSE}
adaette %>%
  select(STUDYID, USUBJID, PARAMCD, STARTDT, ADT, CNSR, EVNTDESC, SRCDOM, SRCVAR, SRCSEQ) %>%
  arrange(USUBJID, PARAMCD) %>%
  dataset_vignette(display_vars = exprs(USUBJID, PARAMCD, STARTDT, ADT, CNSR, EVNTDESC, SRCDOM, SRCVAR, SRCSEQ))
```

If time-to-event parameters need to be derived for each by group of a source
dataset, the `by_vars` parameter can be specified. See the `derive_param_tte()`
documentation for an example.

### Further Scenarios

For further scenarios on how to derive data for time-to-event analyses, see the
[Time-to-Event Analyses](tte_analyses.html) vignette.

## Derive Analysis Value (`AVAL`) {#aval}

The analysis value (`AVAL`) can be derived by calling `derive_vars_duration()`.

This example derives the time to event in days. Other units can be requested by
the specifying the `out_unit` parameter.

```{r echo=FALSE}
adtte <- adtte_bak
adsl <- adsl_bak
```
```{r eval=TRUE}
adtte <- derive_vars_duration(
  adtte,
  new_var = AVAL,
  start_date = STARTDT,
  end_date = ADT
)
```

```{r, echo=FALSE}
dataset_vignette(
  adtte
)
```

## Derive Analysis Sequence Number (`ASEQ`) {#aseq}

The `{admiral}` function `derive_var_obs_number()` can be used to derive `ASEQ`.
Note that creating `ASEQ` is not required for all ADaM datasets according to the ADaM IG,
and this is just for demonstration purpose.

```{r eval=TRUE}
# Calculate ASEQ (Optional Variable)
adtte <- derive_var_obs_number(
  adtte,
  by_vars = exprs(STUDYID, USUBJID),
  order = exprs(PARAMCD),
  check_type = "error"
)
```

```{r, echo=FALSE}
dataset_vignette(adtte)
```

## Add ADSL Variables {#adslvars}

Variables from ADSL which are required for time-to-event analyses, e.g.,
treatment variables or covariates can be added using `derive_vars_merged()`.

```{r eval=TRUE}
adtte <- derive_vars_merged(
  adtte,
  dataset_add = adsl,
  new_vars = exprs(ARMCD, ARM, ACTARMCD, ACTARM, AGE, SEX),
  by_vars = exprs(STUDYID, USUBJID)
)
```

```{r, echo=FALSE}
dataset_vignette(
  adtte,
  display_vars = exprs(USUBJID, PARAMCD, CNSR, AVAL, ARMCD, AGE, SEX)
)
```

```{r, results='asis', echo=FALSE}
admiral_labels_attrs_section()
```
