---
title: "Tabulation"
date: "2022-03-09"
output:
  rmarkdown::html_document:
    theme: "spacelab"
    highlight: "kate"
    toc: true
    toc_float: true
vignette: >
  %\VignetteIndexEntry{Tabulation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
editor_options:
  markdown:
    wrap: 72
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
## `tern` Tabulation
The `tern` R package provides functions to create common analyses from clinical trials in `R`.
The core functionality for tabulation is built on the more general purpose `rtables` package.
New users should first begin by reading the ["Introduction to tern"](https://insightsengineering.github.io/tern/main/articles/tern.html) and ["Introduction to `rtables`"](https://insightsengineering.github.io/rtables/main/articles/introduction.html) vignettes.
The packages used in this vignette are:
```{r, message=FALSE}
library(rtables)
library(tern)
library(dplyr)
```
The datasets used in this vignette are:
```{r, message=FALSE}
adsl <- ex_adsl
adae <- ex_adae
adrs <- ex_adrs
```
## `tern` Analyze Functions
Analyze functions are used in combination with the `rtables` layout functions, in the pipeline which creates the `rtables` table.
They apply some statistical logic to the layout of the `rtables` table.
The table layout is materialized with the `rtables::build_table` function and the data.
The `tern` analyze functions are wrappers around the `rtables::analyze` function; they offer various methods useful from the perspective of clinical trials and other statistical projects.
Examples of the `tern` analyze functions are `count_occurrences`, `summarize_ancova` or `analyze_vars`.
As there is no single prefix that identifies all `tern` analyze functions, it is recommended to consult [the tern website functions reference](https://insightsengineering.github.io/tern/main/reference/index.html).
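
As a minimal sketch of the split between describing a layout and materializing it, the chunk below uses only `rtables` and a made-up four-row data frame. The object `toy` and its values are hypothetical and chosen purely for illustration. The layout holds no data until `build_table()` is called.

```{r}
library(rtables)

# Hypothetical toy data, for illustration only.
toy <- data.frame(
  ARM = factor(c("A", "A", "B", "B")),
  AGE = c(30, 40, 50, 60)
)

# Describe the table: columns by arm, one analysis row with the mean of AGE.
lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze("AGE", afun = mean, format = "xx.x")

# Materialize the layout with data.
tbl <- build_table(lyt, df = toy)
tbl
```

The `tern` analyze functions slot into the same pipeline in place of the plain `analyze()` call.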
### Internals of `tern` Analyze Functions
**Please skip this subsection if you are not interested in the internals of `tern` analyze functions.**
Internally, `tern` analyze functions like `summarize_ancova` are mainly built as a chain of four elements:
```
h_ancova() -> tern:::s_ancova() -> tern:::a_ancova() -> summarize_ancova()
```
The descriptions for each function type:
- analysis helper functions `h_*`. These functions are useful to help define the analysis.
- statistics function `s_*`. Statistics functions should do the computation of the numbers that are tabulated later.
In order to separate computation from formatting, they should not take care of `rcell` type formatting themselves.
- formatted analysis functions `a_*`.
These have the same arguments as the corresponding statistics functions, and can be further customized by calling `rtables::make_afun()` on them.
They are used as `afun` in `rtables::analyze()`.
- **analyze functions `rtables::analyze(..., afun = make_afun(tern::a_*))`.
Analyze functions are used in combination with the `rtables` layout functions, in the pipeline which creates the table.
They are the last element of the chain.**
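
The division of labour between a statistics function and a formatted analysis function can be sketched with plain `rtables` building blocks. The functions `s_range_demo()` and `a_range_demo()` and the data frame `toy` below are hypothetical names invented for this illustration; they are not part of `tern` and only mimic its naming convention.

```{r}
library(rtables)

# Hypothetical statistics function: computes raw numbers, no formatting.
s_range_demo <- function(x) {
  list(
    n = sum(!is.na(x)),
    range = range(x, na.rm = TRUE)
  )
}

# Hypothetical formatted analysis function: attaches labels and formats
# to the raw numbers so that it can be used as `afun`.
a_range_demo <- function(x) {
  res <- s_range_demo(x)
  in_rows(
    "n" = rcell(res$n, format = "xx"),
    "Min - Max" = rcell(res$range, format = "xx.x - xx.x")
  )
}

# Hypothetical toy data, for illustration only.
toy <- data.frame(
  ARM = factor(c("A", "A", "B", "B")),
  AVAL = c(1.5, 2.5, 3, 5)
)

tbl <- basic_table() %>%
  split_cols_by("ARM") %>%
  analyze("AVAL", afun = a_range_demo) %>%
  build_table(toy)
tbl
```

Keeping the computation in the `s_*` layer means the raw numbers can be inspected or tested without going through any table formatting.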
We will use the native `rtables::analyze` function with the `tern` formatted analysis functions as the `afun` parameter.
```
l <- basic_table() %>%
  split_cols_by(var = "ARM") %>%
  split_rows_by(var = "AVISIT") %>%
  analyze(vars = "AVAL", afun = a_summary)

build_table(l, df = adrs)
```
The `rtables::make_afun` function is helpful for attaching custom formats, labels or indentation to a formatted analysis function.
```
afun <- make_afun(
  a_summary,
  .stats = NULL,
  .formats = c(median = "xx."),
  .labels = c(median = "My median"),
  .indent_mods = c(median = 1L)
)

l2 <- basic_table() %>%
  split_cols_by(var = "ARM") %>%
  split_rows_by(var = "AVISIT") %>%
  analyze(vars = "AVAL", afun = afun)

build_table(l2, df = adrs)
```
## Tabulation Examples
We are going to create 3 different tables using `tern` analyze functions and the `rtables` interface.
| Table | `tern` analyze functions |
|---------------------|:---------------------------------|
| **Demographic Table** | `analyze_vars()` and `summarize_num_patients()` |
| **Adverse event Table** | `count_occurrences()` |
| **Response Table** | `estimate_proportion()`, `estimate_proportion_diff()` and `test_proportion_diff()` |
### Demographic Table
Demographic tables provide a summary of the characteristics of patients enrolled in a clinical trial. Typically the table columns represent treatment arms and variables summarized in the table are demographic properties such as age, sex, race, etc.
In the example below the only function from `tern` is `analyze_vars()` and the remaining layout functions are from `rtables`.
```{r}
# Select variables to include in table.
vars <- c("AGE", "SEX")
var_labels <- c("Age (yr)", "Sex")

basic_table() %>%
  split_cols_by(var = "ARM") %>%
  add_overall_col("All Patients") %>%
  add_colcounts() %>%
  analyze_vars(
    vars = vars,
    var_labels = var_labels
  ) %>%
  build_table(adsl)
```
To change the display order of categorical variables in a table use factor variables and explicitly set the order of the levels. This is the case for the display order in columns and rows. Note that the `forcats` package has many useful functions to help with these types of data processing steps (not used below).
```{r}
# Reorder the levels in the ARM variable.
adsl$ARM <- factor(adsl$ARM, levels = c("B: Placebo", "A: Drug X", "C: Combination"))

# Reorder the levels in the SEX variable.
adsl$SEX <- factor(adsl$SEX, levels = c("M", "F", "U", "UNDIFFERENTIATED"))

basic_table() %>%
  split_cols_by(var = "ARM") %>%
  add_overall_col("All Patients") %>%
  add_colcounts() %>%
  analyze_vars(
    vars = vars,
    var_labels = var_labels
  ) %>%
  build_table(adsl)
```
The `tern` package includes many functions similar to `analyze_vars()`. These functions are called layout creating functions and are used in combination with other `rtables` layout functions just like in the examples above. Layout creating functions wrap calls to the `rtables` functions `analyze()`, `analyze_colvars()` and `summarize_row_groups()` and provide options for easy formatting and analysis modifications.
We can customize the display of the demographics table via the arguments of `analyze_vars()`. Most layout creating functions in `tern` include the standard arguments `.stats`, `.formats`, `.labels` and `.indent_mods`, which control which statistics are displayed and how the numbers are formatted. Refer to the package help with `help("analyze_vars")` or `?analyze_vars` to see the full set of options.
For this example we will change the default summary for numeric variables to include the number of records, and the mean and standard deviation (in a single statistic, i.e. within a single cell). For categorical variables we modify the summary to include the number of records and the counts of categories. We also modify the display format for the mean and standard deviation to print two decimal places instead of just one.
```{r}
# Select statistics and modify default formats.
basic_table() %>%
  split_cols_by(var = "ARM") %>%
  add_overall_col("All Patients") %>%
  add_colcounts() %>%
  analyze_vars(
    vars = vars,
    var_labels = var_labels,
    .stats = c("n", "mean_sd", "count"),
    .formats = c(mean_sd = "xx.xx (xx.xx)")
  ) %>%
  build_table(adsl)
```
One feature of a `layout` is that it can be used with different datasets to create different summaries. For example, here we can easily create the same summary of demographics for the Brazil and China subgroups, respectively:
```{r}
lyt <- basic_table() %>%
  split_cols_by(var = "ARM") %>%
  add_overall_col("All Patients") %>%
  add_colcounts() %>%
  analyze_vars(
    vars = vars,
    var_labels = var_labels
  )

build_table(lyt, df = adsl %>% dplyr::filter(COUNTRY == "BRA"))

build_table(lyt, df = adsl %>% dplyr::filter(COUNTRY == "CHN"))
```
### Adverse Event Table
The standard table of adverse events is a summary by system organ class and preferred term. For frequency counts by preferred term, if there are multiple occurrences of the same AE in an individual we count them only once.
To create this table we will need to use a combination of several layout creating functions in a tabulation pipeline.
We start by creating the high-level summary. The layout creating function in `tern` that can do this is `summarize_num_patients()`:
```{r}
basic_table() %>%
  split_cols_by(var = "ACTARM") %>%
  add_colcounts() %>%
  add_overall_col(label = "All Patients") %>%
  summarize_num_patients(
    var = "USUBJID",
    .stats = c("unique", "nonunique"),
    .labels = c(
      unique = "Total number of patients with at least one AE",
      nonunique = "Overall total number of events"
    )
  ) %>%
  build_table(
    df = adae,
    alt_counts_df = adsl
  )
```
Note that for this table, the denominator used for percentages and shown in the header of the table `(N = xx)` is defined based on the subject-level dataset `adsl`. This is done by using the `alt_counts_df` argument in `build_table()`, which provides an alternative data set for deriving the counts in the header. This is often required when we work with data sets that include multiple records per patient as `df`, such as `adae` here.
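
The reason a separate subject-level data set is needed can be seen in a small base R sketch. The data frames `toy_adsl` and `toy_adae` below are hypothetical and exist only to show how record counts and patient counts diverge when a patient contributes several records or none at all.

```{r}
# Hypothetical toy data: four enrolled patients, AE records for only two.
toy_adsl <- data.frame(
  USUBJID = c("S1", "S2", "S3", "S4"),
  ACTARM = c("A", "A", "B", "B")
)
toy_adae <- data.frame(
  USUBJID = c("S1", "S1", "S1", "S3"),
  ACTARM = c("A", "A", "A", "B")
)

# Record counts per arm in the event-level data (not a patient denominator).
n_records <- lengths(split(toy_adae$USUBJID, toy_adae$ACTARM))

# Patient counts per arm in the subject-level data (the intended denominator).
n_patients <- lengths(split(toy_adsl$USUBJID, toy_adsl$ACTARM))

rbind(n_records, n_patients)
```

In this toy case, counting rows of the event-level data would overstate the size of arm A and understate arm B, which is why the header counts should come from the subject-level data.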
#### Statistics Functions
Before building out the rest of the AE table it is helpful to introduce some more `tern` package design conventions. Each layout creating function in `tern` is a wrapper for a Statistics function. Statistics functions are the ones that do the actual computation of numbers in a table. These functions always return named lists whose elements are the statistics available to include in a layout via the `.stats` argument at the layout creating function level.
Statistics functions follow a naming convention to always begin with `s_*` and for ease of use are documented on the same page as their layout creating function counterpart. It is helpful to review a Statistics function to understand the logic used to calculate the numbers in a table and see what options may be available to modify the analysis.
For example, the Statistics function calculating the numbers in `summarize_num_patients()` is `s_num_patients()`. The result of this Statistics function is a list with the elements `unique`, `nonunique` and `unique_count`:
```{r}
s_num_patients(x = adae$USUBJID, labelstr = "", .N_col = nrow(adae))
```
From these results you can see that the `unique` and `nonunique` statistics are those displayed in the "All Patients" column in the initial AE table output above. Also you can see that these are raw numbers and are not formatted in any way. All formatting functionality is handled at the layout creating function level with the `.formats` argument.
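
To make the meaning of these statistics concrete, here is a rough base R sketch of the same kind of counts on hypothetical subject identifiers. It illustrates the idea only and is not the code that `s_num_patients()` runs; `ids`, `n_col` and `toy_stats` are made-up names, and the pairing of a count with a fraction in `unique` is an assumption about how that statistic is laid out.

```{r}
# Hypothetical identifiers: one entry per AE record.
ids <- c("S1", "S1", "S2", "S3", "S3", "S3")

# Hypothetical column count used as the denominator.
n_col <- 4

n_unique <- length(unique(ids))

toy_stats <- list(
  # Number of distinct patients and their share of the column count.
  unique = c(n_unique, n_unique / n_col),
  # Total number of records, counting repeats.
  nonunique = length(ids),
  # Number of distinct patients only.
  unique_count = n_unique
)
toy_stats
```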
Now that we know what types of statistics can be derived by `s_num_patients()`, we can try modifying the default layout returned by `summarize_num_patients()`. Instead of reporting the `unique` and `nonunique` statistics, we specify that the analysis should include only the `unique_count` statistic. The result will show only the counts of unique patients. Note we make this update in both the `.stats` and `.labels` arguments of `summarize_num_patients()`.
```{r}
basic_table() %>%
  split_cols_by(var = "ACTARM") %>%
  add_colcounts() %>%
  add_overall_col(label = "All Patients") %>%
  summarize_num_patients(
    var = "USUBJID",
    .stats = "unique_count",
    .labels = c(unique_count = "Total number of patients with at least one AE")
  ) %>%
  build_table(
    df = adae,
    alt_counts_df = adsl
  )
```
Let's now continue building on the layout for the adverse event table.
After we have the top-level summary, we can repeat the same summary at each system organ class level. To do this we split the analysis data with `split_rows_by()` before calling `summarize_num_patients()` again.
```{r}
basic_table() %>%
  split_cols_by(var = "ACTARM") %>%
  add_colcounts() %>%
  add_overall_col(label = "All Patients") %>%
  summarize_num_patients(
    var = "USUBJID",
    .stats = c("unique", "nonunique"),
    .labels = c(
      unique = "Total number of patients with at least one AE",
      nonunique = "Overall total number of events"
    )
  ) %>%
  split_rows_by(
    "AEBODSYS",
    child_labels = "visible",
    nested = FALSE,
    indent_mod = -1L,
    split_fun = drop_split_levels
  ) %>%
  summarize_num_patients(
    var = "USUBJID",
    .stats = c("unique", "nonunique"),
    .labels = c(
      unique = "Total number of patients with at least one AE",
      nonunique = "Overall total number of events"
    )
  ) %>%
  build_table(
    df = adae,
    alt_counts_df = adsl
  )
```
The table looks almost ready. For the final step, we need a layout creating function that can produce a count table of event frequencies. The layout creating function for this is `count_occurrences()`. Let's first try using this function in a simpler layout without row splits:
```{r}
basic_table() %>%
  split_cols_by(var = "ACTARM") %>%
  add_colcounts() %>%
  add_overall_col(label = "All Patients") %>%
  count_occurrences(vars = "AEDECOD") %>%
  build_table(
    df = adae,
    alt_counts_df = adsl
  )
```
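
The once-per-patient counting rule described at the start of this section can be sketched in base R. The data frame `toy_ae` is hypothetical, and the sketch only illustrates the counting rule; it makes no claim about how `count_occurrences()` is implemented.

```{r}
# Hypothetical toy AE records: patient S1 reports "Headache" twice.
toy_ae <- data.frame(
  USUBJID = c("S1", "S1", "S2", "S2", "S3"),
  AEDECOD = c("Headache", "Headache", "Headache", "Nausea", "Nausea")
)

# Keep one row per patient and preferred term.
once_per_patient <- unique(toy_ae[, c("USUBJID", "AEDECOD")])

# Count distinct patients per preferred term.
n_patients_by_term <- lengths(
  split(once_per_patient$USUBJID, once_per_patient$AEDECOD)
)
n_patients_by_term
```

Although "Headache" appears in three records, only two distinct patients report it, so the patient-level frequency is two.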
Putting everything together, the final AE table looks like this:
```{r}
basic_table() %>%
  split_cols_by(var = "ACTARM") %>%
  add_colcounts() %>%
  add_overall_col(label = "All Patients") %>%
  summarize_num_patients(
    var = "USUBJID",
    .stats = c("unique", "nonunique"),
    .labels = c(
      unique = "Total number of patients with at least one AE",
      nonunique = "Overall total number of events"
    )
  ) %>%
  split_rows_by(
    "AEBODSYS",
    child_labels = "visible",
    nested = FALSE,
    indent_mod = -1L,
    split_fun = drop_split_levels
  ) %>%
  summarize_num_patients(
    var = "USUBJID",
    .stats = c("unique", "nonunique"),
    .labels = c(
      unique = "Total number of patients with at least one AE",
      nonunique = "Overall total number of events"
    )
  ) %>%
  count_occurrences(vars = "AEDECOD") %>%
  build_table(
    df = adae,
    alt_counts_df = adsl
  )
```
### Response Table
A typical response table for a binary clinical trial endpoint may be composed of several different analyses:
* Proportion of responders in each treatment group
* Difference between proportion of responders in comparison groups vs. control group
* Chi-Square test for difference in response rates between comparison groups vs. control group
We can build a table layout like this by following the same approach we used for the AE table: each table section will be produced using a different layout creating function from `tern`.
First we start with some data preparation steps to set up the analysis dataset. We select the endpoint to analyze from `PARAMCD` and define the logical variable `is_rsp` which indicates whether a patient is classified as a responder or not.
```{r}
# Preprocessing to select an analysis endpoint.
anl <- adrs %>%
  dplyr::filter(PARAMCD == "BESRSPI") %>%
  dplyr::mutate(is_rsp = AVALC %in% c("CR", "PR"))
```
To create a summary of the proportion of responders in each treatment group, use the `estimate_proportion()` layout creating function:
```{r}
basic_table() %>%
  split_cols_by(var = "ARM") %>%
  add_colcounts() %>%
  estimate_proportion(
    vars = "is_rsp",
    table_names = "est_prop"
  ) %>%
  build_table(anl)
```
To specify which arm in the table should be used as the reference, use the argument `ref_group` from `split_cols_by()`. Below we change the reference arm to "B: Placebo" and so this arm is displayed as the first column:
```{r}
basic_table() %>%
  split_cols_by(var = "ARM", ref_group = "B: Placebo") %>%
  add_colcounts() %>%
  estimate_proportion(
    vars = "is_rsp"
  ) %>%
  build_table(anl)
```
To further customize the analysis, we can use the `method` and `conf_level` arguments to modify the type of confidence interval that is calculated:
```{r}
basic_table() %>%
  split_cols_by(var = "ARM", ref_group = "B: Placebo") %>%
  add_colcounts() %>%
  estimate_proportion(
    vars = "is_rsp",
    method = "clopper-pearson",
    conf_level = 0.9
  ) %>%
  build_table(anl)
```
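
For intuition about what a Clopper-Pearson interval at the 90% level looks like, the exact binomial interval can be computed in base R with `stats::binom.test()`. The counts below are hypothetical, and the sketch is not meant to reproduce the numbers in the table above or to describe how `tern` computes them.

```{r}
# Hypothetical counts: 60 responders out of 100 patients.
n_rsp <- 60
n_total <- 100

# binom.test() reports the exact (Clopper-Pearson) confidence interval.
ci <- stats::binom.test(n_rsp, n_total, conf.level = 0.9)$conf.int
ci
```

Lowering `conf_level` narrows the interval, while choosing a different `method` changes how the interval bounds are derived from the same counts.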
The next table section should summarize the difference in response rates between the reference arm and each comparison arm. Use the `estimate_proportion_diff()` layout creating function for this:
```{r}
basic_table() %>%
  split_cols_by(var = "ARM", ref_group = "B: Placebo") %>%
  add_colcounts() %>%
  estimate_proportion_diff(
    vars = "is_rsp",
    show_labels = "visible",
    var_labels = "Unstratified Analysis"
  ) %>%
  build_table(anl)
```
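
As a hand-computed illustration of a difference in response rates, the sketch below derives the difference and a plain Wald interval (without continuity correction) from hypothetical counts. It is a textbook calculation shown under these assumptions; it does not claim to match the default method of `estimate_proportion_diff()`.

```{r}
# Hypothetical responder counts in the reference and treatment arms.
x_ref <- 45
n_ref <- 100
x_trt <- 60
n_trt <- 100

p_ref <- x_ref / n_ref
p_trt <- x_trt / n_trt

# Difference in response rates: treatment minus reference.
p_diff <- p_trt - p_ref

# Wald standard error of the difference of two independent proportions.
se <- sqrt(p_ref * (1 - p_ref) / n_ref + p_trt * (1 - p_trt) / n_trt)

# Two-sided 95% Wald interval.
z <- stats::qnorm(0.975)
wald_ci <- c(lower = p_diff - z * se, upper = p_diff + z * se)

p_diff
wald_ci
```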
The final section needed to complete the table includes a statistical test for the difference in response rates. Use the `test_proportion_diff()` layout creating function for this:
```{r}
basic_table() %>%
  split_cols_by(var = "ARM", ref_group = "B: Placebo") %>%
  add_colcounts() %>%
  test_proportion_diff(vars = "is_rsp") %>%
  build_table(anl)
```
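
A Chi-Squared test for two response rates can also be run directly in base R, which is a convenient way to sanity-check a p-value. The counts below are hypothetical, and the call to `stats::prop.test()` without continuity correction is shown as a generic example rather than as a description of what `test_proportion_diff()` does internally.

```{r}
# Hypothetical responder counts: treatment arm first, reference arm second.
responders <- c(60, 45)
patients <- c(100, 100)

# Two-sample test of equal proportions without continuity correction.
res <- stats::prop.test(x = responders, n = patients, correct = FALSE)
res$p.value
```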
To customize the output, we use the `method` argument to select a Chi-Squared test with Schouten correction.
```{r}
basic_table() %>%
  split_cols_by(var = "ARM", ref_group = "B: Placebo") %>%
  add_colcounts() %>%
  test_proportion_diff(
    vars = "is_rsp",
    method = "schouten"
  ) %>%
  build_table(anl)
```
Now we can put all the table sections together in one layout pipeline. Note there is one more small change needed. Since the primary analysis variable in all table sections is the same (`is_rsp`), we need to give each sub-table a unique name. This is done by adding the `table_names` argument and providing unique names through that:
```{r}
basic_table() %>%
  split_cols_by(var = "ARM", ref_group = "B: Placebo") %>%
  add_colcounts() %>%
  estimate_proportion(
    vars = "is_rsp",
    method = "clopper-pearson",
    conf_level = 0.9,
    table_names = "est_prop"
  ) %>%
  estimate_proportion_diff(
    vars = "is_rsp",
    show_labels = "visible",
    var_labels = "Unstratified Analysis",
    table_names = "est_prop_diff"
  ) %>%
  test_proportion_diff(
    vars = "is_rsp",
    method = "schouten",
    table_names = "test_prop_diff"
  ) %>%
  build_table(anl)
```
## Summary
Tabulation with `tern` builds on top of the layout tabulation framework from `rtables`. Complex tables are built step by step in a pipeline by combining layout creating functions that perform a specific type of analysis.
The `tern` analyze functions introduced in this vignette are:
* `analyze_vars()`
* `summarize_num_patients()`
* `count_occurrences()`
* `estimate_proportion()`
* `estimate_proportion_diff()`
* `test_proportion_diff()`
Layout creating functions build a formatted `layout` by controlling features such as labels, numerical display formats and indentation. These functions are wrappers for the Statistics functions which calculate the raw summaries of each analysis. You can easily spot Statistics functions in the documentation because they always begin with the prefix `s_`. It can be helpful to inspect and run Statistics functions to understand ways an analysis can be customized.