Package 'ems'

Title:	Epimed Solutions Collection for Data Editing, Analysis, and Benchmark of Health Units
Description:	Collection of functions related to benchmark with prediction models for data analysis and editing of clinical and epidemiological data.
Authors:	Lunna Borges [aut, cre]
Maintainer:	Lunna Borges <[email protected]>
License:	GPL (>= 2)
Version:	1.3.11
Built:	2025-03-30 03:25:55 UTC
Source:	https://github.com/cran/ems

Help Index

German Breast Cancer Dataset
Calibration Curve
Collection of functions to check data quality in a dataset and remove not valid or extreme values.
Funnel plot for benchmarking health units
Data from ICU admissions.
Miscellaneous functions for data editing
Mortality Rate
Comparisson of the Standardized Resource Use (SRU)
Standardized Mortality Ratio (SMR)
Standardized Resource Use (SRU)
Survival models performance analysis
Tabulation of variables in a stack form

German Breast Cancer Dataset

Description

A dataset containing variables related with breast cancer from german population.

Usage

breastCancer
breastCancer

Format

A data frame with 686 rows and 14 variables

Details

age Patient age (years).
meno Menopausal status (0 = premeno, 1 = postmeno).
size Tumour size (mm).
gradd1 1 = tumour grade 2 or 3; 0 = grade 1.
gradd2 1 = tumour grade 3; 0 = grade 1 or 2.
nodes Number of positive lymph nodes.
enodes emp(-0.12 x nodes).
pgr Progesterone receptor status (fmol 1^-1).
er Oestrogen reeceotor status (fmol 1^-1).
hormon Tamoxifen treatment (0 = no; 1 = yes).
rectime Time (days) to death or cancer recurrence.
censrec Censoring (0 = censored; 1 = event).

Source

https://biostat.app.vumc.org/wiki/Main/DataSets

calcurve function returns a data.frame containing the number of patients, the observed mortality rate and the predicted mortality rate for each category of the predicted mortality rate. If any other acute physiology score is given, the function will also return the mortality rate predicted by this score for each category.

Usage

calcurve(
  deaths,
  pred,
  score = NULL,
  name_score = "Saps3",
  other_score = NULL,
  name_other_score = NULL,
  categories_option = c("predicted", "score", "patients"),
  table = FALSE,
  plot = TRUE,
  title_label = "Calibration Curve",
  y1axis_label = "Patients (n)",
  y2axis_label = "Mortality Rate (%)",
  score_color = c("#cac7cc", "#ffc341", "#33cca3"),
  bar_color = "#1f77b4",
  points = c(19, 18, 17),
  cultureCode = "en-US",
  legend_inset = -0.7
)

## S3 method for class 'calcurve'
print(x, ...)

## S3 method for class 'calcurve'
plot(
  x,
  ...,
  xlab = NULL,
  ylab2 = NULL,
  main = x$title_label,
  text = x$y2axis_label,
  ylab = x$y1axis_label,
  col = c(x$bar_color, x$score_color),
  pch = x$points,
  cultureCode = x$cultureCode,
  legend_inset = x$legend_inset
)
calcurve(
  deaths,
  pred,
  score = NULL,
  name_score = "Saps3",
  other_score = NULL,
  name_other_score = NULL,
  categories_option = c("predicted", "score", "patients"),
  table = FALSE,
  plot = TRUE,
  title_label = "Calibration Curve",
  y1axis_label = "Patients (n)",
  y2axis_label = "Mortality Rate (%)",
  score_color = c("#cac7cc", "#ffc341", "#33cca3"),
  bar_color = "#1f77b4",
  points = c(19, 18, 17),
  cultureCode = "en-US",
  legend_inset = -0.7
)

## S3 method for class 'calcurve'
print(x, ...)

## S3 method for class 'calcurve'
plot(
  x,
  ...,
  xlab = NULL,
  ylab2 = NULL,
  main = x$title_label,
  text = x$y2axis_label,
  ylab = x$y1axis_label,
  col = c(x$bar_color, x$score_color),
  pch = x$points,
  cultureCode = x$cultureCode,
  legend_inset = x$legend_inset
)

Arguments

`deaths`	a numerical vector that only contains 0 and 1, indicating whether the patient was alive or dead, respectively.
`pred`	a numerical vector that contains the mortality rate predicted by the main score, in percentage, for each patient.
`score`	a numerical vector that contains the main score punctuation for each patient, or NULL.
`name_score`	a character string which determines the name of the main score.
`other_score`	a list of numerical vectors, where each vector contains the mortality rate predicted by other score, in percentage, for each patient, or NULL (the default).
`name_other_score`	if other_score variable is different from NULL, this argument must be a vector with the name(s) of the score(s) given.
`categories_option`	a character string which determines if the categories will refer to the main score or to the predicted mortality rate. Accepted values are 'predicted' (the default), 'score' or 'patients'.
`table`	logical; if `TRUE` prints the `data.frame`.
`plot`	logical; if `TRUE` (the default) plots the categories chosen versus the mortality rates in the secondary vertical axis. The main vertical axis refers to the number of patients in each category, represented by the bars.
`title_label`	main title for `calcurve`.
`y1axis_label`, `y2axis_label`	labels of the main vertical axis and the secondary axis, respectively, for `calcurve`.
`score_color`	a vector with the colors to be used in the score traces for `calcurve`.
`bar_color`	color of the bars for `calcurve`.
`points`	a vector with markers types of the scores for `calcurve`.
`cultureCode`	a character specifying which language should be used for plot x axis title and legends. Possible values are 'pt-BR' and 'en-US'. The default is 'en-US'.
`legend_inset`	inset distance(s) from the margins as a fraction of the plot region when legend is placed by keyword. See `legend`.
`x`	an object of class 'calcurve'.
`...`	further arguments passed to `plot`.
`xlab`	label of the horizontal axis for `plot.calcurve`, defaults to `NULL`.
`ylab2`	label of the secondary vertical axis for `plot.calcurve`, defaults to `NULL`.
`main`	main title for `plot.calcurve`.
`text`	label of the secondary vertical axis for `plot.calcurve`.
`ylab`	label of the main vertical axis for `plot.calcurve`.
`col`	character vector with the colors of the bars and score traces, in this order, for `plot.calcurve`.
`pch`	a vector with markers types of the scores for `plot.curve`.

Details

If categories_option = 'score', the categories will refer to the deciles of the main score punctuation. If categories_option = 'predicted', the categories will refer to fixed intervals of the predicted mortality rate. If categories_option = 'patients' the categories will refer to the deciles of patients.

Author(s)

Camila Cardoso

Examples



# Loading the dataset
data(icu)

# Calibration Curve Plot
a <- calcurve(deaths = icu$UnitDischargeName,
pred = icu$Saps3DeathProbabilityStandardEquation,
score = icu$Saps3Points, name_score = 'Saps3',
categories_option = 'predicted', table = FALSE, plot = TRUE)


# Loading the dataset
data(icu)

# Calibration Curve Plot
a <- calcurve(deaths = icu$UnitDischargeName,
pred = icu$Saps3DeathProbabilityStandardEquation,
score = icu$Saps3Points, name_score = 'Saps3',
categories_option = 'predicted', table = FALSE, plot = TRUE)

Collection of functions to check data quality in a dataset and remove not valid or extreme values.

Description

These functions return the counts and fractions of expected values, unexpected values, missing values and not valid values. They are able to do it with factor variables, numeric variables and date variables. t_factor, t_num, and t_date do the job for a single variable and have simpler arguments, while factor.table, num.table, and date.table do the job for several variables at once. rm.unwanted cheks the factor and numeric variables and remove the not valid or extreme values. This approach is attractive before data imputation. They all return a data.frame.

t_factor and factor.table will try to get factor or character variables and check how much of its content match with the expectd. They will try to treat the levels or cells containing " " as NAs.

t_num will try to get a numeric variable (even if it is currently formated as character or factor) and check how much of its content is expected (match a desired range), unexpected, non-numeric values and missing vlaues. num.table does the same, but for two or more variables at once.

t_date will try to get a date variable (even if it is currently formated as character or factor) and check how much of its content is expected (match a desired range), unexpected, non-date values and missing vlaues. date.table does the same, but for two or more variables at once.

rm.unwanted will chek in data the variables specified in the limits object according to the limits specified for each variable. If there are levels considered not valid in a factor variable, these levels are deleted. For example, if Sex is expected to be "M" and "F", and there is also an "I" level in data, every "I" is replaced by NA. Similarly, misspelled levels will be understood as non-valid levels and coercerd to NA, with the exception of leading or trailing empty spaces and lower and upper cases diferences if try.keep = TRUE. If there is a continuous numeric variable and it is expected to have values ranging from 30 to 700, the values outside this range, i.e. higher then 700 or lower then 30, are replaced by NA. Non-numeric elements, i.e. non-valid elements that should be numeric, will also be coerced to NA. If a varible is specified in num.limits, then it will be returned as a numeric variable, even if it was formated as factor or character. If a variable is specified in limits, the returnig format will depend on the stringAsFactors argument, unless it is formated as logical. In this case it is skipped. The arguments limits and num.limits may be NULL, meaning that the factor-character variables or the numeric variables , respectively, will not be edited.

Usage

t_factor(
  data,
  variable,
  legal,
  var.labels = attr(data, "var.labels")[match(variable, names(data))],
  digits = 3
)

factor.table(
  data,
  limits,
  var.labels = attr(data, "var.labels")[match(unlist(sapply(seq_along(limits),
    function(i) limits[[i]][1])), names(data))],
  digits = 3
)

t_num(
  data,
  num.var,
  num.max = 100,
  num.min = 0,
  var.labels = attr(data, "var.labels")[match(num.var, names(data))],
  digits = 3
)

num.table(
  data,
  num.limits,
  var.labels = attr(data, "var.labels")[match(num.limits$num.var, names(data))],
  digits = 3
)

t_date(
  data,
  date.var,
  date.max = as.Date("2010-11-30"),
  date.min = as.Date("2010-01-31"),
  format.date = "auto",
  digits = 3,
  var.labels = attr(data, "var.labels")[match(date.var, names(data))]
)

date.table(
  data,
  date.limits,
  format.date = "auto",
  digits = 3,
  var.labels = attr(data, "var.labels")[match(date.limits$date.var, names(data))]
)

rm.unwanted(
  data,
  limits = NULL,
  num.limits = TRUE,
  try.keep = TRUE,
  stringAsFactors = TRUE
)
t_factor(
  data,
  variable,
  legal,
  var.labels = attr(data, "var.labels")[match(variable, names(data))],
  digits = 3
)

factor.table(
  data,
  limits,
  var.labels = attr(data, "var.labels")[match(unlist(sapply(seq_along(limits),
    function(i) limits[[i]][1])), names(data))],
  digits = 3
)

t_num(
  data,
  num.var,
  num.max = 100,
  num.min = 0,
  var.labels = attr(data, "var.labels")[match(num.var, names(data))],
  digits = 3
)

num.table(
  data,
  num.limits,
  var.labels = attr(data, "var.labels")[match(num.limits$num.var, names(data))],
  digits = 3
)

t_date(
  data,
  date.var,
  date.max = as.Date("2010-11-30"),
  date.min = as.Date("2010-01-31"),
  format.date = "auto",
  digits = 3,
  var.labels = attr(data, "var.labels")[match(date.var, names(data))]
)

date.table(
  data,
  date.limits,
  format.date = "auto",
  digits = 3,
  var.labels = attr(data, "var.labels")[match(date.limits$date.var, names(data))]
)

rm.unwanted(
  data,
  limits = NULL,
  num.limits = TRUE,
  try.keep = TRUE,
  stringAsFactors = TRUE
)

Arguments

`data`	A data.frame where variables will be tested.
`variable`	A character vector of length one, indicating the name of the variable in the dataset to be tested.
`legal`	A character vector representeing the expected levels of the tested variable.
`var.labels`	Variables labels for a nice output. Must be informed in the same order as variable argument. By default, it captures the labels stored in attr(data, "var.labels"), if any. If not informed, the function returns the variables names.
`digits`	Number of decimal places for rounding.
`limits`	a list of two or more lists, each containing the arguments variable name and legal levels (in this order), to check on the factor variables. In the case of `rm.unwanted`, if left NULL, it means no numeric variable will be checked. See examples.
`num.var`	A character vector indicating the name of a variable that should be numeric (although it can yet be formated as character or factor).
`num.max`, `num.min`	The maximal and minimal limits of acceptable range of a numeric variable.
`num.limits`	A data.frame with the following variables: num.var, num.max and num.min, representing the numeric variables names, maximal and minimal expected valid values. In the case of `rm.unwanted`, if left NULL, it means no numeric variable will be checked. See example.
`date.var`	A character vector indicating the name of a variable in data that should be a date (althoug it can yet be formated as character or factor).
`date.max`, `date.min`	The maximal and minimal limits of acceptable range of a date variable.
`format.date`	Default is "auto". If so, `t_date` will use `f.date` to detect the date format and format it as date. If not set to "auto", it should be a date format to be passed to `as.Date` format argument. If `format.date` is misspecified, then `t_date` and `date.table` will identify all dates as non-dates. For `date.table`, if it is set to 'auto' , it will use `f.date` to detect the date format and format it as date. If different from 'auto', one should specify the desired date formats in the date.limits data.frame. See example.
`date.limits`	A `data.frame` with the following variables: date.var, date.max, date.min, and (optionaly) format.date. These represent values of the arguments above. See example.
`try.keep`	Default is `TRUE`. If `TRUE`, `remove.unwanted` will first trim all empty spaces and transform all levels to lower case characters before comparing the found levels and expected levels of a character/factor variable. Therefore, found levels such as "yes " will be considered identical to the expected level "Yes", and will not be coerced to `NA`.
`stringAsFactors`	In `rm.unwanted`, if set to `TRUE`, the default value, variables in the limits argument that are character and numeric variables in data will be returned as factors. Logical variables are skipped. However, a variable will be returned as logical if it is originally a factor but its final levels are `TRUE` and `FALSE` and `stringAsFactors = FALSE`.

Author(s)

Lunna Borges & Pedro Brasil

Examples

# Simulating a dataset with 5 factor variables and assigning labels
y <- data.frame(Var1 = sample(c("Yes","No", "Ignored", "", "yes ", NA), 200, replace = TRUE),
                Var2 = sample(c("Death","Discharge", "", NA), 200, replace = TRUE),
                Var3 = sample(c(16:35, NA), 200, replace = TRUE),
                Var4 = sample(c(12:300, "Female", "", NA), 200, replace = TRUE),
                Var5 = sample(c(60:800), 200, replace = TRUE))
attr(y, "var.labels") <- c("Intervention use","Unit destination","BMI","Age","Cholesterol")
summary(y)

# Cheking the quality only the first variable
t_factor(y, "Var1", c("Yes","No","Ignored"))

# Checking two or more variables at once
factor.limits  = list(list("Var1",c("Yes","No")),
                      list("Var2",c("Death","Discharge")))
factor.table(y, limits = factor.limits)

# Checking only one variable that shohuld be numeric
t_num(y,"Var3", num.min = 17, num.max = 32)

# Making the limits data.frame
num.limits <- data.frame(num.var = c("Var3","Var4","Var5"),
              num.min = c(17,18,70), num.max = c(32,110,300))
num.limits

# Checking two or more numeric variables (or the ones that
#          should be as numeric) at once
num.table(y, num.limits)

# Removing the unwanted values (extremes or not valid).
y <- rm.unwanted(data = y, limits = factor.limits,
                           num.limits = num.limits)
summary(y)

rm(y, num.limits, factor.limits)
#'
# Loading a dataset and assigning labels
data(icu)
attr(icu, "var.labels")[match(c("UnitAdmissionDateTime","UnitDischargeDateTime",
   "HospitalAdmissionDate", "HospitalDischargeDate"), names(icu))] <-
   c("Unit admission","Unit discharge","Hospital admission","Hospital discharge")

# Checking only one variable that should be a date.
t_date(icu, "HospitalDischargeDate", date.max = as.Date("2013-10-30"),
                                     date.min = as.Date("2013-02-20"))

# Checking a date variable misspecifying the date format
# will cause the variable dates to be identified as non-date values.
t_date(data = icu, date.var = "HospitalDischargeDate",
                   date.max = as.Date("2013-10-30"),
                   date.min = as.Date("2013-02-20"),
                   format.date = "%d/%m/%Y")

# Making a limit data.frame assuming an 'auto' format.date
d.lim <- data.frame(date.var = c("UnitAdmissionDateTime","UnitDischargeDateTime",
                   "HospitalAdmissionDate","HospitalDischargeDate"),
                   date.min = rep(as.Date("2013-02-28"), 4),
                   date.max = rep(as.Date("2013-11-30"), 4))
d.lim

# Checking two or more date variables (or the ones that should be as date) at once
date.table(data = icu, date.limits = d.lim)

# Making a limit data.frame specifying format.date argument
# Here the the last 'format.date' is missspecified on purpose
# So, the last date will be identified as non-date values.
d.lim <- data.frame(date.var = c("UnitAdmissionDateTime","UnitDischargeDateTime",
         "HospitalAdmissionDate","HospitalDischargeDate"),
          date.min = rep(as.Date("2013-02-28"), 4),
          date.max = rep(as.Date("2013-11-30"), 4),
          format.date = c(rep("%Y/%m/%d",3), "%Y-%m-%d"))
d.lim

# Checking the quality of date variable with new limits.
# The 'format.date = ""' is required to force the function to look the format
# into the date.limits data.frame
date.table(data = icu, date.limits = d.lim, format.date = "")

rm(icu, d.lim)

# Simulating a dataset with 5 factor variables and assigning labels
y <- data.frame(Var1 = sample(c("Yes","No", "Ignored", "", "yes ", NA), 200, replace = TRUE),
                Var2 = sample(c("Death","Discharge", "", NA), 200, replace = TRUE),
                Var3 = sample(c(16:35, NA), 200, replace = TRUE),
                Var4 = sample(c(12:300, "Female", "", NA), 200, replace = TRUE),
                Var5 = sample(c(60:800), 200, replace = TRUE))
attr(y, "var.labels") <- c("Intervention use","Unit destination","BMI","Age","Cholesterol")
summary(y)

# Cheking the quality only the first variable
t_factor(y, "Var1", c("Yes","No","Ignored"))

# Checking two or more variables at once
factor.limits  = list(list("Var1",c("Yes","No")),
                      list("Var2",c("Death","Discharge")))
factor.table(y, limits = factor.limits)

# Checking only one variable that shohuld be numeric
t_num(y,"Var3", num.min = 17, num.max = 32)

# Making the limits data.frame
num.limits <- data.frame(num.var = c("Var3","Var4","Var5"),
              num.min = c(17,18,70), num.max = c(32,110,300))
num.limits

# Checking two or more numeric variables (or the ones that
#          should be as numeric) at once
num.table(y, num.limits)

# Removing the unwanted values (extremes or not valid).
y <- rm.unwanted(data = y, limits = factor.limits,
                           num.limits = num.limits)
summary(y)

rm(y, num.limits, factor.limits)
#'
# Loading a dataset and assigning labels
data(icu)
attr(icu, "var.labels")[match(c("UnitAdmissionDateTime","UnitDischargeDateTime",
   "HospitalAdmissionDate", "HospitalDischargeDate"), names(icu))] <-
   c("Unit admission","Unit discharge","Hospital admission","Hospital discharge")

# Checking only one variable that should be a date.
t_date(icu, "HospitalDischargeDate", date.max = as.Date("2013-10-30"),
                                     date.min = as.Date("2013-02-20"))

# Checking a date variable misspecifying the date format
# will cause the variable dates to be identified as non-date values.
t_date(data = icu, date.var = "HospitalDischargeDate",
                   date.max = as.Date("2013-10-30"),
                   date.min = as.Date("2013-02-20"),
                   format.date = "%d/%m/%Y")

# Making a limit data.frame assuming an 'auto' format.date
d.lim <- data.frame(date.var = c("UnitAdmissionDateTime","UnitDischargeDateTime",
                   "HospitalAdmissionDate","HospitalDischargeDate"),
                   date.min = rep(as.Date("2013-02-28"), 4),
                   date.max = rep(as.Date("2013-11-30"), 4))
d.lim

# Checking two or more date variables (or the ones that should be as date) at once
date.table(data = icu, date.limits = d.lim)

# Making a limit data.frame specifying format.date argument
# Here the the last 'format.date' is missspecified on purpose
# So, the last date will be identified as non-date values.
d.lim <- data.frame(date.var = c("UnitAdmissionDateTime","UnitDischargeDateTime",
         "HospitalAdmissionDate","HospitalDischargeDate"),
          date.min = rep(as.Date("2013-02-28"), 4),
          date.max = rep(as.Date("2013-11-30"), 4),
          format.date = c(rep("%Y/%m/%d",3), "%Y-%m-%d"))
d.lim

# Checking the quality of date variable with new limits.
# The 'format.date = ""' is required to force the function to look the format
# into the date.limits data.frame
date.table(data = icu, date.limits = d.lim, format.date = "")

rm(icu, d.lim)

Funnel plot for benchmarking health units

Description

Produces a variety of funnel plots comparing health units or ICUs (intensive care units) making easy to identify those units which deviate from the group. There is a function that calculates all the values required and returns the values for all units and the funnel, and there is a function that calls graphical parameters from the former values. The options of funnels available are the funnel for rate, for ratio of rates, for proportions, for difference of proportions and for ratio of proportions.

The funnel for rates are usually plots of either SMR or SRU at vertical axis. If the direct method is chosen, the horizontal axis will display the number of admissions. If the indirect method is chosen instead, the expected number of deaths will be displayed for SMR or the expected length of stay for SRU. As consequence of this differentiation, the interpretation regarding the clssification of the points displayed will be the same in every case.

The funnel for ratio of rates are usually plots of ratios of SMRs (or SRUs) within the same units. These two SMRs are, for example, from the same units in different time periods. Therefore, it expresses how the SMR changed over time. If the number of expected deaths is different in both periods, the plot will return at the horizontal axis a parametrization of the geometric mean of the expected number of deaths in both periods for each unit. If the number of expected deaths is identical in both periods, the plot will return at the horizontal axis the arithmetic mean of the observed number of deaths in both periods for each unit.

The funnel for proportions plots on the vertical axis the percentage of observed deaths of the units and on the horizontal axis the number (volume) of admissions. The funnel for ratio of proportions and for difference of proportions are usually used to express the fraction of deaths of the same units in different time period. Therefore, they express how the fraction of deaths changed over time in each unit. If one picks the difference of proportions, the horizontal axis will display a parametrization of the arithmetic mean of the number of admissions in both periods. If one picks the ratio of proportions, the horizontal axis will display a parametrization of the geometric mean of the number of admissions in both periods.

Usage

funnel(
  unit,
  y,
  n,
  n1,
  n2,
  o,
  o1,
  o2,
  e,
  e1,
  e2,
  lambda1 = sum(o1)/sum(n1),
  lambda2 = sum(o2)/sum(n2),
  pi1 = sum(o1)/sum(n1),
  pi2 = sum(o2)/sum(n2),
  y.type = c("SMR", "SRU"),
  p = c(0.95, 0.998),
  theta,
  method = c("normal", "exact"),
  direct = FALSE,
  myunits = rep(0, length(unit)),
  option = c("rate", "ratioRates", "prop", "diffProp", "ratioProp"),
  printUnits = TRUE,
  plot = TRUE,
  digits = 5,
  overdispersion = FALSE,
  ...
)

## S3 method for class 'funnel'
print(x, ...)

## S3 method for class 'funnel'
plot(
  x,
  ...,
  col = c("darkblue", "paleturquoise3", "gray26"),
  lwd = 2,
  lty = c(2, 6, 1),
  bty = "n",
  pch = 21,
  pt.col = "white",
  bg = "orange",
  pt.cex = 1.5,
  auto.legend = TRUE,
  text.cex = 0.7,
  text.pos = NULL,
  mypts.col = "darkblue",
  printUnits = x$printUnits,
  xlab = x$xlab,
  ylab = x$ylab,
  xlim = x$xlim,
  ylim = x$ylim
)

rateFunnel(
  unit,
  y,
  n,
  o,
  e,
  y.type,
  p,
  theta = 1,
  method = c("exact", "normal"),
  direct,
  ...,
  printUnits,
  auto.xlab = TRUE,
  xlab = c("Volume of cases", "Expected values"),
  ylab = y.type[1],
  xlim = c(0, max(rho)),
  ylim = c(min(lowerCI[[which(p == max(p))]]), max(upperCI[[which(p == max(p))]])),
  myunits,
  digits,
  overdispersion
)

changeRateFunnel(
  unit,
  n1,
  n2,
  o1,
  e1,
  o2,
  e2,
  lambda1,
  lambda2,
  y.type,
  p,
  ...,
  printUnits,
  auto.xlab = TRUE,
  xlab = c("Average observed count", "Expectation per period"),
  auto.ylab = TRUE,
  ylab = c(paste0(y.type[1], "'s Ratio"), paste0("Log(", y.type[1], "'s Ratio)")),
  ylim = c(max(lowerCI[[which(p == max(p))]]) - 1.5 * theta, min(upperCI[[which(p ==
    max(p))]]) + 1.5 * theta),
  xlim = c(0, max(rho)),
  myunits,
  digits,
  overdispersion
)

propFunnel(
  unit,
  o,
  n,
  theta,
  p,
  method = c("exact", "normal"),
  ...,
  printUnits,
  ylab = "%",
  xlab = "Volume",
  ylim = c(0, min(upperCI[[which(p == max(p))]]) + 2.5 * theta),
  xlim = c(0, max(n)),
  myunits,
  digits,
  overdispersion
)

changePropFunnel(
  unit,
  o1,
  o2,
  n1,
  n2,
  p,
  pi1,
  pi2,
  method = c("diff", "ratio"),
  ...,
  printUnits,
  xlab = "Sample size per period",
  auto.ylab = TRUE,
  ylab = c("Proportions difference", "Proportions ratio log"),
  ylim = c(max(lowerCI[[which(p == max(p))]]) - 6 * theta, min(upperCI[[which(p ==
    max(p))]]) + 6 * theta),
  xlim = c(0, max(rho)),
  myunits,
  digits,
  overdispersion
)
funnel(
  unit,
  y,
  n,
  n1,
  n2,
  o,
  o1,
  o2,
  e,
  e1,
  e2,
  lambda1 = sum(o1)/sum(n1),
  lambda2 = sum(o2)/sum(n2),
  pi1 = sum(o1)/sum(n1),
  pi2 = sum(o2)/sum(n2),
  y.type = c("SMR", "SRU"),
  p = c(0.95, 0.998),
  theta,
  method = c("normal", "exact"),
  direct = FALSE,
  myunits = rep(0, length(unit)),
  option = c("rate", "ratioRates", "prop", "diffProp", "ratioProp"),
  printUnits = TRUE,
  plot = TRUE,
  digits = 5,
  overdispersion = FALSE,
  ...
)

## S3 method for class 'funnel'
print(x, ...)

## S3 method for class 'funnel'
plot(
  x,
  ...,
  col = c("darkblue", "paleturquoise3", "gray26"),
  lwd = 2,
  lty = c(2, 6, 1),
  bty = "n",
  pch = 21,
  pt.col = "white",
  bg = "orange",
  pt.cex = 1.5,
  auto.legend = TRUE,
  text.cex = 0.7,
  text.pos = NULL,
  mypts.col = "darkblue",
  printUnits = x$printUnits,
  xlab = x$xlab,
  ylab = x$ylab,
  xlim = x$xlim,
  ylim = x$ylim
)

rateFunnel(
  unit,
  y,
  n,
  o,
  e,
  y.type,
  p,
  theta = 1,
  method = c("exact", "normal"),
  direct,
  ...,
  printUnits,
  auto.xlab = TRUE,
  xlab = c("Volume of cases", "Expected values"),
  ylab = y.type[1],
  xlim = c(0, max(rho)),
  ylim = c(min(lowerCI[[which(p == max(p))]]), max(upperCI[[which(p == max(p))]])),
  myunits,
  digits,
  overdispersion
)

changeRateFunnel(
  unit,
  n1,
  n2,
  o1,
  e1,
  o2,
  e2,
  lambda1,
  lambda2,
  y.type,
  p,
  ...,
  printUnits,
  auto.xlab = TRUE,
  xlab = c("Average observed count", "Expectation per period"),
  auto.ylab = TRUE,
  ylab = c(paste0(y.type[1], "'s Ratio"), paste0("Log(", y.type[1], "'s Ratio)")),
  ylim = c(max(lowerCI[[which(p == max(p))]]) - 1.5 * theta, min(upperCI[[which(p ==
    max(p))]]) + 1.5 * theta),
  xlim = c(0, max(rho)),
  myunits,
  digits,
  overdispersion
)

propFunnel(
  unit,
  o,
  n,
  theta,
  p,
  method = c("exact", "normal"),
  ...,
  printUnits,
  ylab = "%",
  xlab = "Volume",
  ylim = c(0, min(upperCI[[which(p == max(p))]]) + 2.5 * theta),
  xlim = c(0, max(n)),
  myunits,
  digits,
  overdispersion
)

changePropFunnel(
  unit,
  o1,
  o2,
  n1,
  n2,
  p,
  pi1,
  pi2,
  method = c("diff", "ratio"),
  ...,
  printUnits,
  xlab = "Sample size per period",
  auto.ylab = TRUE,
  ylab = c("Proportions difference", "Proportions ratio log"),
  ylim = c(max(lowerCI[[which(p == max(p))]]) - 6 * theta, min(upperCI[[which(p ==
    max(p))]]) + 6 * theta),
  xlim = c(0, max(rho)),
  myunits,
  digits,
  overdispersion
)

Arguments

`unit`	A factor vector representing the unit names.
`y`	A numeric vector representing the "Standardized rate" for each unit, usually the SMR (Standardized Mortality Ratio), or possibly the SRU (Standardized Resource Use), accordind to `y.type` . It's also called "indicator".
`n`	A numeric vector representing the case volume, or number of admissions, for each unit.
`n1`, `n2`	If one picks `option = "ratioRates"` or `option = "diffProp"` or `option = "ratioProp"`, then `n1` and `n2` are numeric vectors representing the total of admissions at 1st and 2nd periods, respectively.
`o`	A numeric vector representing the observed death. Acceptable values are 0 (absence) or 1 (presence).
`o1`, `o2`	If one picks `option = "ratioRates"` or `option = "diffProp"` or `option = "ratioProp"`, then `o1` and `o2` are numeric vectors representing the observed deaths at 1st and 2nd periods, respectively.
`e`	Used only when `option = "rate"` and `direct = FALSE`. This is a numeric vector representing the expected number of deaths.
`e1`, `e2`	If one picks `option = "ratioRates"`, `e1` and `e2` are numeric vectors representing the expected number of deaths at 1st and 2nd periods, respectively.
`lambda1`, `lambda2`	Values correponding to the rate at which a death occurs in the institutions at the 1st and 2nd periods, respectively. It is assumed that the parameters `o1` and `o2` are distributed as `oi` ~ Poisson(`lambdai`) when `option = "ratioRates"`. The default value for `lambdai` is `lambdai = sum(oi)/sum(ni)`, where ni is the value of the parameter `n1` or `n2` when i equals 1 or 2. `lambdai` is the estimate for the mean of the poisson distribution.
`pi1`, `pi2`	Values correponding to the probability for the occurrence of a death in the institutions at the 1st and 2nd periods, respectively. Its assumed that the paramenters `o1` and `o2` are distributed as `oi` ~ Bin(`pii,ni`) when `option = "diffProp"` or `option = "ratioProp"`. `ni` is the value of the parameter n1 or n2 when i equals 1 or 2. The default value for `pii` is `sum(oi)/sum(ni)`, the estimate for the mean of the poisson distribution.
`y.type`	A character vector representing the indicator type. It is used to name the vertical axis if `option = "rate"` or `option = "ratioRate"` and ignored otherwise. It usually is 'SMR' or 'SRU'.
`p`	A confidence level numeric vector. The function will return a confidence interval for each value in p. The default is 2 and 3 standard deviations (`p = c(.95, 998)`).
`theta`	The target value which specifies the desired expectation for institutions considered "in control". Used when `option = "prop"` or `option = "rate"`. Usually, this function internally estimates a theta to represent a central tendency of the group. However, one may want to set a pre-specified value for theta to indicate a "baseline" parameter for comparison (e.g 1 for `option = "rate"` or .20 for `option = "prop"`). If this is the case, the horizontal line representing theta may not be centralized in the funnel or may be even outside the funnel, making the plot look unusual.
`method`	There are two kinds of approximations for the CI, as mentioned in `direct` parameter. The one from the exact distribuition (binomial or poisson) and the one from the normal distribution. So, `method` is a character vector representing the kind of approximation desired, being `"exact"` (default) or `"normal"` the two options. It is used when `option = "rate"` or `option = "prop"`. The original report makes no formal comparison of which method is best, however it is mentioned that the funnels from different methods should look identical or very similar if all units have 100 or more observations. If any unit has less, the funnel from the normal approximation may mislead the interpretation. See details.
`direct`	Logical (default = `FALSE`); Used when `option = "rate"`. If `TRUE`, we assume the vector of rates "y" is being reported as a rate per (say) 1000 individuals, and that it has been transformed to a proportion between 0 and 1. The associated error (horizontal axis) will be measured accordingly to the size of the populations `n`. The CI - confidence interval - is calculated by a binomial distribution. If `FALSE`, the associated error will be measured accordingly to the expected number of deaths `e`. The CI is calculated by a poisson distribution instead. See details.
`myunits`	A numeric vector coded with 0 and 1 indicating which units one would like to benchmark among all units. These will be highlighted with dots of different collors in the plot.
`option`	A character specifying the type of funnel plot one wants to produce. It can assume `"rate"`, `"ratioRates"`, `"prop"`, `"diffProp"` or `"ratioProp"`. If `option = "rate"`, `funnel` plots a standardized rate y versus the expected number of deaths or case volume (number of unit admissions) for all units. If `option = "ratioRate"`, `funnel` can be used to compare units at two diferent periods. It plots a ratio of rates y versus a precision parameter rho. If `option = "prop"`, `funnel` plots a proportion y versus its case volume (number of admissions). If `option = "ratioProp"` or `option = "diffProp"`, `funnel` can be used to compare units at two diferent periods. It plots a ratio (or difference) of proportions y versus a precision parameter rho. See details.
`printUnits`	Logical (default = `TRUE`); If `TRUE`, the units are identified in the plot and printed in de console. The numbers plotted correspond to the row numbers printed in the console.
`plot`	Logical; If `TRUE` (default), the correspondent graphic is plotted with the standard options.
`digits`	Integer indicating the number of decimals to be used in the output.
`overdispersion`	Logical (default = FALSE); If TRUE, introduces an multiplicative over-dispersion factor phi that will inflate the CI null variance. See details.
`...`	Further arguments passed to `plot`.
`x`	An object of class 'funnel'.
`col`	A character vector representing the colors for the CI funnel lines. Must have same length of `p` + 1 with the target line color in the last position.
`lwd`	A positive number specifying the lines width. It's the same for all lines in the plot. See `par`.
`lty`	A numeric vector representing the CI lines types. See `par`.
`bty`	A character string which represents the type of `box` which is drawn around plots. See `par`.
`pch`	Either an integer or a single character specifying a symbol to be used as the default in plotting points. See `points` for possible values and their interpretation. Note that only integers and single-character strings can be set as a graphics parameter (and not NA nor NULL).
`pt.col`	A character specifying the points colors.
`bg`	A character specifying the color to be used for the points background when `pch = 21` (default). See `par`.
`pt.cex`	A numerical value giving the amount by which plotting points should be magnified relative to the default. See `par`.
`auto.legend`	Logical; If `TRUE` (default), prints a legend with default arguments.
`text.cex`	A numerical value giving the amount by which plotting text should be magnified relative to the default. See `par`.
`text.pos`	A position specifier for numbers that correspond to the units in the plot. Values of 1, 2, 3 and 4, respectively indicate positions below, to the left of, above and to the right of the points.
`mypts.col`	A character representing the color used to benchmark the units specified in `myunits`.
`xlab`, `ylab`	A title for the x and y axis. See `title`
`xlim`, `ylim`	Limits of horizontal and vertical axis. These limits are defined in the funnel plot and passed to `plot.funnel`. The user may redefine the limits in `plot.funnel`. Ultimately, these arguments are passed to `plot.default`.
`auto.xlab`, `auto.ylab`	Logical. If `TRUE`, one is not able to change x and y axis labels, respectively.

Details

For every possible value of option, if overdispersion = TRUE, the CI can be inflated by a overdispersion parameter phi. There is a test for overdispersion which inflates the funnel if it's necessary. An "Winsorized" over-dispersion parameter is estimated and is used to inflate the funnel limits if it is significantly greater than 1. The parameter phi is returned as an funnel object.
If option = "rate", funnel plots a standardized rate y versus the expected number of deaths or volume value for several units.

To choose the direct argument, one should pay attention if one wants to use a Direct or Indirect Standardized Rate. If direct, we assume the rate is reported as a rate per (say) 1000 individuals, then it is treated as a proportion. If indirect, we assume it is a cross-sectional data that leads to a standardized event ratio.

In many circumstances we can assume an exact or approximate normal distribution for the data. Using the method argument, one could choose between "exact" or "normal". For direct standardized rates, the exact distribuition is binomial and for indirect standardized rates, the exact distribuition is poisson. Assume rho is the precision parameter (volume, for direct rates; expected value, for indirect rates). The original report claims that, for rho > 100, the normal and exact curves almost coincide. So, one could perfectly use normal approximation if ones data parameter precision is greater than 100, in general.

The console warns if there are units with volume/expected value less than 100.

phi = (1/total) * sum((y - theta) ^ 2 * rho)/g(theta)

var(y|theta,rho) = (phi * g(theta))/rho
If option = "ratioRate", funnel can be used to compare units at two diferent periods. It plots a ratio of rates y versus a precision parameter rho.

Suppose we have two measures for each institution: O1; E1 in a baseline period and O2; E2 in a subsequent period, and we wish to assess the change in the underlying rate (SMR or SRU). We shall only consider the ratio of rates option. The exact method will automatically be applied if E1 = E2, and the indirect method, of normal approximations, otherwise. On this second method, for low (especially zero) counts the funnel function adds 0.5 to all parameters O and E in order to stabilize the estimates.

Y = (O1/E1)/(O2/E2) and the target theta = lambda2/lambda1.

When E1 = E2, y is plotted versus the average observed count (rho).

When E1 is different of E2, i.e., it is used normal approximation. It is convenient to work on a logarithmic scale so that log(theta) is a target for log(Y). Y is plotted versus a different rho depending on the chosen rate.
If option = "prop", funnel plots a proportion y versus its volume. It is used for cross-sectional data. Suppose in each institution that O events are observed out of a sample size of N:

The indicator is the observed proportion y = O/N

Assume N is the precision parameter (volume). Similarly to when option = "rate", for N > 100 the normal and exact curves almost coincide. So, one could perfectly use normal approximation on the parameter method if ones data parameter precision is greater than 100, in general.

phi = (1/total) * sum((y - theta) ^ 2 * N)/g(theta)

var(y|theta,N) = (phi * g(theta))/N
If option = "ratioProp" or option = "diffProp", funnel can be used to compare units at two diferent periods. It plots a ratio (or difference) of proportions y versus a precision parameter rho to assess the change in the underlying proportion from pi1 to pi2. Normal approximations are used throughout, and for low (especially zero) counts, the function adds 0.5 to all arguments r and 1 to all arguments n in order to stabilize the estimates.

In the case option = "diffProp", the indicator is Y = (O2/N2 - O1/N1) and theta = pi2 - pi1. If option = "ratioProp", the indicator is Y = (O2/N2)/(O1/N1) and theta = pi2/pi1. It is convenient to work on a logarithmic scale, so that log(theta) is a target for log(Y) in this case as well.

For these two parameter options, the precision parameter (plotted at horizontal axis) can be interpreted as approximately the sample size per period.

Value

A table with unit names, y, observed (Obs), expected (Exp) and admissions (N) for each unit, a binary column showing which units one would like to highlight in the plot (myunits) and final columns show which units are out of control.

References

Spiegelhalter, David J. "Funnel plots for comparing institutional performance." Statistics in medicine 24.8 (2005): 1185-1202.

Examples

# Loading data
data(icu)

# Some edition
icu$Saps3DeathProbabilityStandardEquation <- icu$Saps3DeathProbabilityStandardEquation / 100
icu <- icu[-which(icu$Unit == "F"),]
icu$myunits <- ifelse(icu$Unit == "A",1,0) #my units
icu <- droplevels(icu)

# Getting the cross-sectional arguments to use in funnel
x <- SMR.table(data = icu, group.var = "Unit",
              obs.var = "UnitDischargeName", pred.var = "Saps3DeathProbabilityStandardEquation")
myunit_names <- unique(icu$Unit[which(icu$myunits == 1)])
x$myunits <- ifelse(x$Levels %in% myunit_names, 1,0)

# Analysis of proportions
f1 <- funnel(unit = x$Levels[-1], o = x[-1,]$Observed, theta = x$Observed[1] / x$N[1],
            n = x[-1,]$N, method = "exact", myunits = x$myunits[-1], option = "prop", plot = FALSE)
f1
plot(f1, main = "Cross-sectional proportions")

# To analyze rates (SMR)
f2 <- funnel(unit = x$Levels[-1], y = x[-1,]$SMR, method = "exact", direct = TRUE,
            theta = x$SMR[1], e = x[-1,]$Expected, n = x[-1,]$N, o = x[-1,]$Observed,
            option = "rate", plot = FALSE)
f2
plot(f2, main = "Cross-sectional rate (SMR)")

# Creating a variable containing month information about each admission
icu$month <- as.numeric(format(as.Date(icu$UnitAdmissionDateTime),"%m"))

# First quarter
dt1 <- icu[which(icu$month %in% c(1,2,3)),]

# Second quarter
dt2 <- icu[which(icu$month %in% c(4,5,6)),]

# Getting the two period arguments to use in funnel
z <- SMR.table(data = dt1, group.var = "Unit", obs.var = "UnitDischargeName",
              pred.var = "Saps3DeathProbabilityStandardEquation")
w <- SMR.table(data = dt2, group.var = "Unit", obs.var = "UnitDischargeName",
              pred.var = "Saps3DeathProbabilityStandardEquation")

z$myunits <- ifelse(z$Levels %in% myunit_names, 1,0)
w$myunits <- ifelse(w$Levels %in% myunit_names, 1,0)
# To analyze periods using ratio rates with e1 = e1
f3 <- funnel(unit = z$Levels[-1], n1 = z$N[-1], o1 = z$Observed[-1],
            e1 = z$Expected[-1],
            n2 = w$N[-1], o2 = w$Observed[-1], e2 = z$Expected[-1],
            myunits = z$myunits[-1], option = "ratioRates", plot = FALSE)
f3
plot(f3, main = "Ratio of SMRs of periods with same expectation of death")


# To analyze periods using ratio rates with e1 =! e1
f4 <- funnel(unit <- z$Levels[-1], n1 = z$N[-1], o1 = z$Observed[-1],
            e1 = z$Expected[-1], n2 = w$N[-1], o2 = w$Observed[-1], e2 = w$Expected[-1],
            option = "ratioRates", plot = FALSE)
f4
plot(f4, main = "Ratio of SMRs of periods with different expectation of death",
    ylim = c(-1.5,1.5), xlim = c(0,200))

# To analyze periods by difference in proportions
f5 <- funnel(unit <- z$Levels[-1], n1 = z$N[-1], o1 = z$Observed[-1],
            n2 = w$N[-1], o2 = w$Observed[-1], option = "diffProp", plot = FALSE)
f5
plot(f5, main = "Difference in proportions of death for two periods")

# To analyze periods by ratio of proportions
f6 <- funnel(unit <- z$Levels[-1], n1 = z$N[-1], o1 = z$Observed[-1],
            n2 = w$N[-1], o2 = w$Observed[-1], option = "ratioProp", plot = FALSE)
f6
plot(f6, main = "Ratio of proportions of death for two periods")

rm(icu, x, z, w, dt1, dt2, unit, f1, f2, f3, f4, f5, f6)

# Loading data
data(icu)

# Some edition
icu$Saps3DeathProbabilityStandardEquation <- icu$Saps3DeathProbabilityStandardEquation / 100
icu <- icu[-which(icu$Unit == "F"),]
icu$myunits <- ifelse(icu$Unit == "A",1,0) #my units
icu <- droplevels(icu)

# Getting the cross-sectional arguments to use in funnel
x <- SMR.table(data = icu, group.var = "Unit",
              obs.var = "UnitDischargeName", pred.var = "Saps3DeathProbabilityStandardEquation")
myunit_names <- unique(icu$Unit[which(icu$myunits == 1)])
x$myunits <- ifelse(x$Levels %in% myunit_names, 1,0)

# Analysis of proportions
f1 <- funnel(unit = x$Levels[-1], o = x[-1,]$Observed, theta = x$Observed[1] / x$N[1],
            n = x[-1,]$N, method = "exact", myunits = x$myunits[-1], option = "prop", plot = FALSE)
f1
plot(f1, main = "Cross-sectional proportions")

# To analyze rates (SMR)
f2 <- funnel(unit = x$Levels[-1], y = x[-1,]$SMR, method = "exact", direct = TRUE,
            theta = x$SMR[1], e = x[-1,]$Expected, n = x[-1,]$N, o = x[-1,]$Observed,
            option = "rate", plot = FALSE)
f2
plot(f2, main = "Cross-sectional rate (SMR)")

# Creating a variable containing month information about each admission
icu$month <- as.numeric(format(as.Date(icu$UnitAdmissionDateTime),"%m"))

# First quarter
dt1 <- icu[which(icu$month %in% c(1,2,3)),]

# Second quarter
dt2 <- icu[which(icu$month %in% c(4,5,6)),]

# Getting the two period arguments to use in funnel
z <- SMR.table(data = dt1, group.var = "Unit", obs.var = "UnitDischargeName",
              pred.var = "Saps3DeathProbabilityStandardEquation")
w <- SMR.table(data = dt2, group.var = "Unit", obs.var = "UnitDischargeName",
              pred.var = "Saps3DeathProbabilityStandardEquation")

z$myunits <- ifelse(z$Levels %in% myunit_names, 1,0)
w$myunits <- ifelse(w$Levels %in% myunit_names, 1,0)
# To analyze periods using ratio rates with e1 = e1
f3 <- funnel(unit = z$Levels[-1], n1 = z$N[-1], o1 = z$Observed[-1],
            e1 = z$Expected[-1],
            n2 = w$N[-1], o2 = w$Observed[-1], e2 = z$Expected[-1],
            myunits = z$myunits[-1], option = "ratioRates", plot = FALSE)
f3
plot(f3, main = "Ratio of SMRs of periods with same expectation of death")


# To analyze periods using ratio rates with e1 =! e1
f4 <- funnel(unit <- z$Levels[-1], n1 = z$N[-1], o1 = z$Observed[-1],
            e1 = z$Expected[-1], n2 = w$N[-1], o2 = w$Observed[-1], e2 = w$Expected[-1],
            option = "ratioRates", plot = FALSE)
f4
plot(f4, main = "Ratio of SMRs of periods with different expectation of death",
    ylim = c(-1.5,1.5), xlim = c(0,200))

# To analyze periods by difference in proportions
f5 <- funnel(unit <- z$Levels[-1], n1 = z$N[-1], o1 = z$Observed[-1],
            n2 = w$N[-1], o2 = w$Observed[-1], option = "diffProp", plot = FALSE)
f5
plot(f5, main = "Difference in proportions of death for two periods")

# To analyze periods by ratio of proportions
f6 <- funnel(unit <- z$Levels[-1], n1 = z$N[-1], o1 = z$Observed[-1],
            n2 = w$N[-1], o2 = w$Observed[-1], option = "ratioProp", plot = FALSE)
f6
plot(f6, main = "Ratio of proportions of death for two periods")

rm(icu, x, z, w, dt1, dt2, unit, f1, f2, f3, f4, f5, f6)

Data from ICU admissions.

Description

A dataset containing selected data from some ICU (intensive care units) admissions and its outcomes at the year 2013 used in the ORCHESTRA study.

Usage

icu
icu

Format

A data frame with 13709 rows and 24 variables

Details

Unit The name of the ICU unit.
Age Patient age.
UnitAdmissionDateTime ICU unit admission date and time.
UnitDischargeDateTime ICU unit discharge date and time.
UnitDischargeName Unit admission outcome. Death = 1, or Discharge = 0.
UnitDestinationName ICU unit destination after discharge.
HospitalAdmissionDate Hospital admission date.
HospitalDischargeDate Hospital discharge date.
HospitalDischargeName Hospital admission outcome. Death = 1, Discharge = 0.
LengthHospitalStayPriorUnitAdmission Hospital length of stay before unit admission.
AdmissionSourceName The origin of the patient before ICU admission.
AdmissionTypeName_pri Admission as Clinical treatment (1) Elective surgery (2), or Urgent surgery (3).
AdmissionReasonName_pri Main diagnosis groups.
Vasopressors_D1 Vasopressors use at ICU first day admission? No = No, Yes = 1
IsMechanicalVentilation1h Required mechanichal ventilation at 1st hour of admission. No = 0, Yes = 1.
CharlsonComorbidityIndex Charlson comorbidity index.
Saps3Points SAPS 3 score
Saps3DeathProbabilityStandardEquation SAPS 3 estimated probability
SofaScore SOFA score.
los Unit length of stay, that is, the time difference between UnitAdmissionDateTime and UnitDischargeDateTime, in days.

Source

Organizational characteristics, outcomes, and resource use in 78 Brazilian intensive care units: the ORCHESTRA study. Intensive Care Med. 2015 Dec;41(12):2149-60.

Miscellaneous functions for data editing

Description

Collection of functions for data editing, usually used as lower levels for other functions.

f.num is a wrapper to format numeric variables that are stored as character or factor, simultaneously it will try to detect comma spearated and replace it by dots before formating the variable as numeric. Any non-numeric encoding will be coerced to NA.

f.date is a wrapper either to as.Date or strptime to format character or factor variables into dates. In Epimed Solutions database there are a few pre-specified formats that f.date will try to detect and return a formated date. f.date will try to dected if more than half of the elements in a vector have a pre-specified format. If so, the remaining will be coerced to NA if they have different format from the detected. See example.

remove.na identifies all the empty spaces, i.e. the " " cells, of character or factor variables in a data.frame and returns the same data.frame with these empty cells replaced, by default, by NAs. It does not matter the length of the empty spaces. Also, remove.na trims the leading and trailing empty spaces from all character and factor variables. It does not format the numeric variables. It may also return at the console a few information about the " " fields.

tab2tex removes the empty rows, and also tunrs the rownames of a table epiDisplay::tableStack into the first column, to make it easier to paste the table into a rtf or latex document without empty rows or rownames conflicts.

trunc_num truncates a numeric vector by replacing the values below the min value or above the max values by the min and max values respectively or optionall to NA. See example.

dummy.columns takes a data.frame with one column with concatatenated levels of a factor (or character) variable and return a data.frame with additional columns with zeros and ones (dummy values), which names are the factor levels of the original column. See example below. rm.dummy.columns is an internal function of dummy.columns that deletes the new dummy columns which have less then a specified minimum events.

funnelEstimate estimates funnel confidence intervals (CI) for binomial, poisson or normal distribution. Used inside funnel.

winsorising is an internal function that estimates a phi parameter after shirinking extreme z-scores. This parameter is used to inflate funnel CIs due overdispersion presence.

Usage

f.num(num.var)

f.date(date)

remove.na(data, replace = NA, console.output = TRUE)

tab2tex(x, nc = ncol(x))

trunc_num(x, min, max, toNA = FALSE)

dummy.columns(
  data,
  original.column,
  factors,
  scan.oc = FALSE,
  sep = ",",
  colnames.add = "Dummy.",
  min.events = NULL,
  rm.oc = FALSE,
  warn = FALSE,
  return.factor = TRUE
)

rm.dummy.columns(data, colnames, event = "1", min.events = 50, warn = FALSE)

funnelEstimate(
  y,
  range,
  u,
  totalAdmissions,
  totalObserved,
  p = 0.95,
  theta = 1,
  overdispersion = TRUE,
  dist = c("binomial", "normal", "poisson"),
  rho,
  gdetheta
)

winsorising(z_score, u)
f.num(num.var)

f.date(date)

remove.na(data, replace = NA, console.output = TRUE)

tab2tex(x, nc = ncol(x))

trunc_num(x, min, max, toNA = FALSE)

dummy.columns(
  data,
  original.column,
  factors,
  scan.oc = FALSE,
  sep = ",",
  colnames.add = "Dummy.",
  min.events = NULL,
  rm.oc = FALSE,
  warn = FALSE,
  return.factor = TRUE
)

rm.dummy.columns(data, colnames, event = "1", min.events = 50, warn = FALSE)

funnelEstimate(
  y,
  range,
  u,
  totalAdmissions,
  totalObserved,
  p = 0.95,
  theta = 1,
  overdispersion = TRUE,
  dist = c("binomial", "normal", "poisson"),
  rho,
  gdetheta
)

winsorising(z_score, u)

Arguments

`num.var`	A character, or factor variable to be formated as numeric.
`date`	A character or factor variable to be formated as date.
`data`	A `data.frame`.
`replace`	By default, NA. But could be any vector of length 1.
`console.output`	Logical. Print at the console a few informations about the " " fields?
`x`, `nc`	For `tab2tex` x is a object from epiDisplay::tableStack. nc is the number of the last column to keep in the table. If the table has 5 columns and nc = 3, then columns 4 and 5 are removed. For `trunc_num`, x is a numeric vector.
`min`, `max`	For `trunc_num`, min and max are the minimal and maximal numeric values where the numeric vector will be truncated.
`toNA`	For `trunc_num`, if FALSE any min and max are the minimal and maximal numeric values where the numeric vector will be truncated.
`original.column`	A character vector representing the name of the column the be transformed in dummy variables.
`factors`	A character vector to make new dummy columns and to match values in `original.column`. This is interesting if the user desires to make dummy only from a few factors in the originlal column. Ignored if scan.oc = TRUE
`scan.oc`	Default = FALSE, if TRUE, `dummy.columns` scans the specified `original.column` and uses all factors to generate dummy variables. It overrides the `factor` argument.
`sep`	A character of legth one that systematically split the factors in the original columns. It wil be passed to the `sep` argument in the `scan` function.
`colnames.add`	The default is '= "Dummy_"'. This is a character vector of length one to stick in the `colnames` of the dummy variables. For example, if the orginal column has A;B;C factor levels, the new dummy variables `colnames` would be "Dummy_A", "Dummy_B", and "Dummy_C"
`min.events`	Either `NULL` (default), or a numeric scalar. If any of the new variables have less events then specified in `min.events`, they will be deleted before returning the output data.
`rm.oc`	Default is `FALSE`. If `TRUE`, `dummy.columns` will delete the original column before returnig the final `data.frame`.
`warn`	Default is `FALSE`. If `TRUE`, `dummy.columns` will print at the console the deleted columns names.
`return.factor`	Default is `TRUE`. If `TRUE`, `dummy.columns` return factor columns with "0" and "1" levels, or numeric otherwise.
`colnames`	For `rm.dummy.columns` this is the names of the columns to be tested and deleted inside `dummy.columns`.
`event`	A character string to be detected as a event. In `rm.dummy.columns`, if the columns are coded as '0' and '1', the event is '1', if it is coded as logical, the events is 'TRUE'.
`y`	A numeric vector representing the "Standardized rate" for each unit, usually the SMR or possibly the SRU , accordind to `y.type`.
`range`	A numeric range representing for which values the funnel will be estimated. Usually the same variable in x axis (the precision parameter).
`u`	A number indicating the total amount of ICUs in the data.
`totalAdmissions`	The quantity of admissions in all units.
`totalObserved`	The quantity of observed death in all units.
`p`	A number between 0 and 1 indicating the confidence interval level for the funnel.
`theta`	Target value which specifies the desired expectation for institutions considered "in control".
`overdispersion`	Logical (default = FALSE); If TRUE, introduces an multiplicative over-dispersion factor phi that will inflate the CI null variance. See `funnel` details.
`dist`	A character specifying the distribution about the funnel control limits will be estimated. It can be "binomial" (default), "normal" or "poisson".
`rho`	A numeric vector representing the funnel precision parameter. It is calculated inside `funnel` and used to calculer `z_score`.
`gdetheta`	A numeric auxiliary numeric vector used to calculate `z_score` to be used to calculate estimate funnel control limits.
`z_score`	A numeric vector indicating the standardized Pearson residual or the "naive" Z-Score for each unit.

Author(s)

Lunna Borges & Pedro Brasil

Examples

# Formating character or factor variable that should be numeric variables
f.num(c("2,4000","10,0000","5.0400"))

# Simulating a dataset
y <- data.frame(v1 = sample(c(" F","M  ","   "), 10, replace = TRUE),
                v2 = sample(c(1:3,"     "), 10, replace = TRUE),
                v3 = sample(c("Alive","Dead",""), 10, replace = TRUE))
y

# Replacing the "" cells by NA
y <- remove.na(y)
y

rm(y)

# Formating dates
x <- f.date(c("28/02/2013","16/07/1998","31/03/2010"))
x
class(x)

# The first element (i.e., the different one) is coerced to NA
x <- f.date(c("2013-02-28 12:40","16/07/1998","31/03/2010"))
x
class(x)

# The last element (i.e. the different one) is coerced to NA
x <- f.date(c("2013-02-28 12:40","1998-07-16 18:50","31/03/2010"))
x
class(x)

# Truncating numeric vectors
trunc_num(1:12, min = 3, max = 10)

# Truncating numeric vectors but returning NAs instead
trunc_num(1:12, min = 3, max = 10, toNA = TRUE)

# Simulating a dataset for dummy.columns example

y <- data.frame(v1 = 1:20,
               v2 = sapply(1:20, function(i) toString(sample(c("Code1","Code2","Code3","Code4"),
                     size = sample(2:4, 1), replace = FALSE))))
y


# For a few of the codes in the original column
y <- dummy.columns(y, original.column = "v2", factor = c("Code2","Code3"))
y

# For all codes in the original column
y <- dummy.columns(y[, 1:2], original.column = "v2", scan.oc = TRUE)
y

# Funnel Estimate
data(icu)
icu

funnelEstimate(y = icu$Saps3DeathProbabilityStandardEquation,
               range = 1, u = length(unique(icu$Unit)),
               totalAdmissions = nrow(icu),
               totalObserved = sum(icu$UnitDischargeName),
               theta = mean(icu$Saps3DeathProbabilityStandardEquation),
               dist = 'normal', rho = 1, gdetheta = 1)

rm(y, icu)

# Formating character or factor variable that should be numeric variables
f.num(c("2,4000","10,0000","5.0400"))

# Simulating a dataset
y <- data.frame(v1 = sample(c(" F","M  ","   "), 10, replace = TRUE),
                v2 = sample(c(1:3,"     "), 10, replace = TRUE),
                v3 = sample(c("Alive","Dead",""), 10, replace = TRUE))
y

# Replacing the "" cells by NA
y <- remove.na(y)
y

rm(y)

# Formating dates
x <- f.date(c("28/02/2013","16/07/1998","31/03/2010"))
x
class(x)

# The first element (i.e., the different one) is coerced to NA
x <- f.date(c("2013-02-28 12:40","16/07/1998","31/03/2010"))
x
class(x)

# The last element (i.e. the different one) is coerced to NA
x <- f.date(c("2013-02-28 12:40","1998-07-16 18:50","31/03/2010"))
x
class(x)

# Truncating numeric vectors
trunc_num(1:12, min = 3, max = 10)

# Truncating numeric vectors but returning NAs instead
trunc_num(1:12, min = 3, max = 10, toNA = TRUE)

# Simulating a dataset for dummy.columns example

y <- data.frame(v1 = 1:20,
               v2 = sapply(1:20, function(i) toString(sample(c("Code1","Code2","Code3","Code4"),
                     size = sample(2:4, 1), replace = FALSE))))
y


# For a few of the codes in the original column
y <- dummy.columns(y, original.column = "v2", factor = c("Code2","Code3"))
y

# For all codes in the original column
y <- dummy.columns(y[, 1:2], original.column = "v2", scan.oc = TRUE)
y

# Funnel Estimate
data(icu)
icu

funnelEstimate(y = icu$Saps3DeathProbabilityStandardEquation,
               range = 1, u = length(unique(icu$Unit)),
               totalAdmissions = nrow(icu),
               totalObserved = sum(icu$UnitDischargeName),
               theta = mean(icu$Saps3DeathProbabilityStandardEquation),
               dist = 'normal', rho = 1, gdetheta = 1)

rm(y, icu)

Mortality Rate

Description

mortality_rate function returns a list with the mortality rate and the number of patients for each month or quarter of the year.

Usage

mortality_rate(
  deaths,
  period = NULL,
  isQuarter = FALSE,
  isYear = FALSE,
  option = c("both", "monthly", "quarterly", "annual"),
  periodName = NULL,
  default_tapply = NA
)
mortality_rate(
  deaths,
  period = NULL,
  isQuarter = FALSE,
  isYear = FALSE,
  option = c("both", "monthly", "quarterly", "annual"),
  periodName = NULL,
  default_tapply = NA
)

Arguments

`deaths`	a numerical vector that only contains 0 and 1, indicating whether the patient was alive or dead, respectively.
`period`	a numerical vector that contains the order of months when the patients were admitted to the hospital unit. If period variable is NULL (the default), the function will return a single mortality rate.
`isQuarter`	logical indicating whether the `period` refers to quarter or not. The default is `FALSE`.
`isYear`	logical indicating whether the `period` refers to years or not. The default is `FALSE`.
`option`	a character string which determines what the function mortality_rate returns. If the option is chosen to be 'both' (the default), the function will return a list containing monthly mortality rate, quarterly mortality rate, annual mortality rate and the number of patients in each month, quarter and year. If the option is 'monthly', only the monthly mortality rate and the number of patients in each month are returned. If the option is 'quarterly', only the quarterly mortality rate and the number of patients in each quarter are returned. If the option is 'annual', only the annual mortality rate and the number of patients in each year are returned.
`periodName`	a character vector that contains the name of months when the patients were admitted to de hospital unit. Used only if period is not NULL.
`default_tapply`	argument to set the `default` in `tapply` function when evaluating the mortality rate for each period. Can be equal to `0` or `NA` (the default).

Author(s)

Camila Cardoso <[email protected]> Lunna Borges <[email protected]>

Examples


# Loading the dataset
data(icu)

# Creating a vector of months
date <- as.Date(icu$UnitDischargeDateTime, tryFormats = '%d/%m/%Y')
months <- as.numeric(format(date, '%m'))

# Vector of deaths
deaths <- icu$UnitDischargeName

# Calculating monthly and quarterly mortality rate
mortality_rate(deaths = deaths, period = months, option = 'both')


# Loading the dataset
data(icu)

# Creating a vector of months
date <- as.Date(icu$UnitDischargeDateTime, tryFormats = '%d/%m/%Y')
months <- as.numeric(format(date, '%m'))

# Vector of deaths
deaths <- icu$UnitDischargeName

# Calculating monthly and quarterly mortality rate
mortality_rate(deaths = deaths, period = months, option = 'both')

Comparisson of the Standardized Resource Use (SRU)

Description

Compares ICU's (intensive care units) SRU with diferent severity classes or compares ICU's SRU at two diferents times. This comparison checks if the ICUs remains in the same quadrant after a time period, and highlights their rank changes over time.

plot.reclass Plots a SMR vs. SRU scatter plot with the ICUs which had their quadrant/rank classification changed.

print.reclass Prints a table with information about which ICUs changed from a classification to another.

Usage

reclass(
  x,
  y,
  same = TRUE,
  plot = FALSE,
  digits = 2,
  compare = c("SRU", "SMR", "BOTH"),
  decreasing = FALSE,
  complete.rank = TRUE
)

## S3 method for class 'reclass'
print(x, ...)

## S3 method for class 'reclass'
plot(
  x,
  ...,
  xlim_x = range(x$smr_x),
  ylim_x = range(x$sru_x),
  xlim_y = range(x$smr_y),
  ylim_y = range(x$sru_y),
  xlab = "SMR",
  ylab = "SRU",
  points.arg_x = list(pch = 21, col = "white", bg = "yellow", cex = 2),
  points.arg_y = list(pch = 21, col = "white", bg = "yellow", cex = 2),
  med.arg_x = list(col = "dodgerblue4", lwd = 2, lty = 1),
  med.arg_y = list(col = "dodgerblue4", lwd = 2, lty = 1),
  tert.arg_x = list(col = "darkorange2", lty = 2, lwd = 1),
  tert.arg_y = list(col = "darkorange2", lty = 2, lwd = 1),
  text.arg_x = list(labels = seq(1, nrow(x$tab)), cex = 0.6),
  text.arg_y = list(labels = seq(1, nrow(x$tab)), cex = 0.6),
  worse.arg_x = list(x = x$worse_x, pch = 21, col = "white", bg = "tomato", cex = 2),
  worse.arg_y = list(x = x$worse_y, pch = 21, col = "white", bg = "tomato", cex = 2),
  better.arg_x = list(x = x$better_x, pch = 21, col = "white", bg = "mediumseagreen",
    cex = 2),
  better.arg_y = list(x = x$better_y, pch = 21, col = "white", bg = "mediumseagreen",
    cex = 2),
  auto.legend = TRUE,
  leg.arg = list(x = "topleft", bty = "n", xpd = NA, inset = c(-1.8, -0.2), ncol = 1,
    horiz = F, pch = 19, cex = 0.8, pt.cex = 1.5),
  main.arg_x = list(main = "1st Stage"),
  main.arg_y = list(main = "2nd Stage")
)
reclass(
  x,
  y,
  same = TRUE,
  plot = FALSE,
  digits = 2,
  compare = c("SRU", "SMR", "BOTH"),
  decreasing = FALSE,
  complete.rank = TRUE
)

## S3 method for class 'reclass'
print(x, ...)

## S3 method for class 'reclass'
plot(
  x,
  ...,
  xlim_x = range(x$smr_x),
  ylim_x = range(x$sru_x),
  xlim_y = range(x$smr_y),
  ylim_y = range(x$sru_y),
  xlab = "SMR",
  ylab = "SRU",
  points.arg_x = list(pch = 21, col = "white", bg = "yellow", cex = 2),
  points.arg_y = list(pch = 21, col = "white", bg = "yellow", cex = 2),
  med.arg_x = list(col = "dodgerblue4", lwd = 2, lty = 1),
  med.arg_y = list(col = "dodgerblue4", lwd = 2, lty = 1),
  tert.arg_x = list(col = "darkorange2", lty = 2, lwd = 1),
  tert.arg_y = list(col = "darkorange2", lty = 2, lwd = 1),
  text.arg_x = list(labels = seq(1, nrow(x$tab)), cex = 0.6),
  text.arg_y = list(labels = seq(1, nrow(x$tab)), cex = 0.6),
  worse.arg_x = list(x = x$worse_x, pch = 21, col = "white", bg = "tomato", cex = 2),
  worse.arg_y = list(x = x$worse_y, pch = 21, col = "white", bg = "tomato", cex = 2),
  better.arg_x = list(x = x$better_x, pch = 21, col = "white", bg = "mediumseagreen",
    cex = 2),
  better.arg_y = list(x = x$better_y, pch = 21, col = "white", bg = "mediumseagreen",
    cex = 2),
  auto.legend = TRUE,
  leg.arg = list(x = "topleft", bty = "n", xpd = NA, inset = c(-1.8, -0.2), ncol = 1,
    horiz = F, pch = 19, cex = 0.8, pt.cex = 1.5),
  main.arg_x = list(main = "1st Stage"),
  main.arg_y = list(main = "2nd Stage")
)

Arguments

`x`, `y`	Objects of class 'SRU'. x is the SRU analsys from the 1st period (e.g. first trimester) and y from the 2nd period (e.g. second trimester). For `print.reclass` or `plot.reclass`, x is an object of class 'reclass'.
`same`	Logical; If `TRUE`, compare the same units, with the same severity classes at two consecutive time periods (default). If `same = TRUE` and the ICUs do not match exactly in 'x' and 'y', there is a warning and non matching units are discarded from the analysis. If `FALSE`, it compares the same units, with different severity classes within the same period. In this case, if the ICUs do not match exactly in 'x' and'y', the function will return an error.
`plot`	Logical. If `TRUE` (default), plots a SMR vs. SRU scatter plot highlighting the ICUs which had their classification changed.
`digits`	Integer indicating the number of decimal places to be used in the output.
`compare`	The way one prefers to benchmark the ICUs: by "SRU" (default), "SMR" or "BOTH". If "BOTH", the ICUs will be ranked by their SRU.
`decreasing`	Logical. Should the sort order of ICU's rank be increasing or decreasing?
`complete.rank`	Logical. If `TRUE` (default), returns all ICUs ranked. If `FALSE`, returns only ICUs whose efficiency classification ranked changed.
`...`	Arguments to be passed to methods (see `par`).
`xlim_x`, `ylim_x`	Limits for x and y axis for 1st stage plot for `plot.reclass`.
`xlim_y`, `ylim_y`	Limits for x and y axis for 2nd stage plot for `plot.reclass`.
`xlab`, `ylab`	Labels of x and y axis for `plot.reclass`.
`points.arg_x`, `points.arg_y`	List of arguments passed to `points` for plotting points correponding to units' SMR and SRU in 1st and 2nd stage plots for `plot.reclass`.
`med.arg_x`, `med.arg_y`	List of arguments passed to `abline` for plotting lines corresponding to SRU and SMR medians in 1st and 2nd stage plots for `plot.reclass`.
`tert.arg_x`, `tert.arg_y`	List of arguments passed to `abline` for plotting lines corresponding to SRU and SMR tertiles in 1st and 2nd stage plots for `plot.reclass`.
`text.arg_x`, `text.arg_y`	List of arguments passed to `text` for plotting units labels in 1st and 2nd stage plots for `plot.reclass`.
`worse.arg_x`, `worse.arg_y`	List of arguments passed to `points` for plotting points correponding to units which got their rank worse in 1st and 2nd stage plots for `plot.reclass`.
`better.arg_x`, `better.arg_y`	List of arguments passed to `points` for plotting points correponding to units which got their rank better in 1st and 2nd stage plots for `plot.reclass`.
`auto.legend`	Logical. If `TRUE`, it prints a legend with `leg.arg` arguments for `plot.reclass`.
`leg.arg`	List of arguments passed to `legend` for plotting legends corresponding to SRU and SMR medians and tertiles in 1st and 2nd stage plots for `plot.reclass`.
`main.arg_x`, `main.arg_y`	List of arguments passed to `plot` for the titles for the 1st and 2nd stage plots for `plot.reclass`.

Value

reclass retunrs a data.frame with the following columns:

Unit Names of the ICU.
Admission Number of admissions in each ICU.
From ICU's initial efficiency quadrant.
To ICU's final efficiency quadrant.
SRU.1st ICU's initial SRU estimate.
SRU.2nd ICU's final SRU estimate.
SMR.1st ICU's initial SMR estimate.
SMR.2nd ICU's final SMR estimate.
Rank1 ICU's initial SRU (or SMR) rank.
Rank2 ICU's final SRU (or SMR) rank.

plot.reclass returns a scatter plot with grpahical comparison of the two periods/stages with their respective medians and tertiles.

Author(s)

Lunna Borges and Pedro Brasil

Examples


data(icu)
# A little editing
icu$Saps3DeathProbabilityStandardEquation <- icu$Saps3DeathProbabilityStandardEquation / 100
icu <- icu[-which(icu$los < 0 ),]

# Subseting the data for the 1st quarter
x <- droplevels(icu[which(format(as.Date(icu$UnitAdmissionDate),"%m") %in% c("01","02","03")),])

# Subseting the data for the 2nd quarter
y <- droplevels(icu[which(format(as.Date(icu$UnitAdmissionDate),"%m") %in% c("04","05","06")),])

# Running the SRU analysis for both quarters
FirstQ <- SRU(prob = x$Saps3DeathProbabilityStandardEquation, death = x$UnitDischargeName,
unit = x$Unit, los = x$los, score = x$Saps3Points, originals = TRUE, type = 1, plot = FALSE)
FirstQ

SecondQ <- SRU(prob = y$Saps3DeathProbabilityStandardEquation, death = y$UnitDischargeName,
unit = y$Unit, los = y$los, score = y$Saps3Points, originals = TRUE, type = 1, plot = FALSE)
SecondQ

z <- reclass(x = FirstQ, y = SecondQ, same = TRUE)
z
plot(z)

rm(icu, x, y, FirstQ, SecondQ, z)

data(icu)
# A little editing
icu$Saps3DeathProbabilityStandardEquation <- icu$Saps3DeathProbabilityStandardEquation / 100
icu <- icu[-which(icu$los < 0 ),]

# Subseting the data for the 1st quarter
x <- droplevels(icu[which(format(as.Date(icu$UnitAdmissionDate),"%m") %in% c("01","02","03")),])

# Subseting the data for the 2nd quarter
y <- droplevels(icu[which(format(as.Date(icu$UnitAdmissionDate),"%m") %in% c("04","05","06")),])

# Running the SRU analysis for both quarters
FirstQ <- SRU(prob = x$Saps3DeathProbabilityStandardEquation, death = x$UnitDischargeName,
unit = x$Unit, los = x$los, score = x$Saps3Points, originals = TRUE, type = 1, plot = FALSE)
FirstQ

SecondQ <- SRU(prob = y$Saps3DeathProbabilityStandardEquation, death = y$UnitDischargeName,
unit = y$Unit, los = y$los, score = y$Saps3Points, originals = TRUE, type = 1, plot = FALSE)
SecondQ

z <- reclass(x = FirstQ, y = SecondQ, same = TRUE)
z
plot(z)

rm(icu, x, y, FirstQ, SecondQ, z)

Standardized Mortality Ratio (SMR)

Description

Calculates the standardized mortality ratio and its confidence interval. SMR, for a group, is defined as the ratio of the observed deaths in this group and the sum of the predicted individual probabilities of death by any model (expected deaths).

SMR.table estimates at once the overall SMR and the SMR across several groups, e.g. ICU units or clinical characteristics. The SMR.table can be ordered by the SMR estimate or its confidence intervals, facilitating the comparison of the units ranks.

forest.SMR shows the SMR.table output as a forest plot. The plot opens two windows and plot at the left side the values from the SMR.table and at the right side the points and lines graphically representing each SMR and its confidence interval.

Usage

SMR(
  obs.var,
  pred.var,
  digits = 5,
  ci.method = c("Hosmer", "Byar"),
  ci.level = 0.95
)

SMR.table(
  data,
  group.var,
  obs.var,
  pred.var,
  digits = 5,
  use.label = FALSE,
  var.labels = attr(data, "var.labels")[match(group.var, names(data))],
  ci.method = c("Hosmer", "Byar"),
  ci.level = 0.95,
  reorder = c("no", "SMR", "lower.Cl", "upper.Cl"),
  decreasing = FALSE
)

forest.SMR(
  x,
  mar1 = c(5.1, 1, 4.1, 1),
  mar.SMR = c(5.1, 7, 4.1, 1),
  overall.arg = list(x = 0.01, font = 2, las = 1, labels = var.labels[1], xpd = NA, adj
    = 0),
  NOE.overall.args = list(x = c(N.values.arg$x, O.values.arg$x, E.values.arg$x), font =
    2, las = 1, xpd = NA),
  var.labels.arg = list(x = 0.01, font = 2, las = 1, cex = 1, xpd = NA, adj = 0),
  cat.labels.arg = list(x = 0.1, font = 3, las = 1, cex = 0.95, col = gray(0.4), xpd =
    NA, adj = 0),
  N.values.arg = list(x = 0.5, col = gray(0.4), xpd = NA),
  O.values.arg = list(x = 0.675, col = gray(0.4), xpd = NA),
  E.values.arg = list(x = 0.85, col = gray(0.4), xpd = NA),
  NOE.head.arg = list(font = 2, labels = c("N", "O", "E"), xpd = NA),
  Overall.seg.arg = list(col = "navyblue", xpd = NA, lwd = 2),
  Overall.p.arg = list(pch = 23, cex = 2, col = "black", bg = gray(0.4), xpd = NA),
  Overall.est.arg = list(x = smr.xlim[1] - 0.06, las = 1, font = 2, xpd = NA, adj = 1),
  cat.seg.arg = list(col = "navyblue", xpd = NA, lwd = 2),
  cat.p.arg = list(pch = 22, cex = 1, col = "black", bg = gray(0.4), xpd = NA),
  cat.est.arg = list(x = smr.xlim[1] - 0.06, las = 1, col = gray(0.4), xpd = NA, adj =
    1),
  SMR.head.arg = list(smr.xlim[1] - 0.06, font = 2, labels = "SMR [95% CIs]", xpd =
    NA, adj = 1),
  smr.xlab = "Standardized Mortality Ratio",
  smr.xlim = "auto",
  grid = TRUE,
  digits = 3
)
SMR(
  obs.var,
  pred.var,
  digits = 5,
  ci.method = c("Hosmer", "Byar"),
  ci.level = 0.95
)

SMR.table(
  data,
  group.var,
  obs.var,
  pred.var,
  digits = 5,
  use.label = FALSE,
  var.labels = attr(data, "var.labels")[match(group.var, names(data))],
  ci.method = c("Hosmer", "Byar"),
  ci.level = 0.95,
  reorder = c("no", "SMR", "lower.Cl", "upper.Cl"),
  decreasing = FALSE
)

forest.SMR(
  x,
  mar1 = c(5.1, 1, 4.1, 1),
  mar.SMR = c(5.1, 7, 4.1, 1),
  overall.arg = list(x = 0.01, font = 2, las = 1, labels = var.labels[1], xpd = NA, adj
    = 0),
  NOE.overall.args = list(x = c(N.values.arg$x, O.values.arg$x, E.values.arg$x), font =
    2, las = 1, xpd = NA),
  var.labels.arg = list(x = 0.01, font = 2, las = 1, cex = 1, xpd = NA, adj = 0),
  cat.labels.arg = list(x = 0.1, font = 3, las = 1, cex = 0.95, col = gray(0.4), xpd =
    NA, adj = 0),
  N.values.arg = list(x = 0.5, col = gray(0.4), xpd = NA),
  O.values.arg = list(x = 0.675, col = gray(0.4), xpd = NA),
  E.values.arg = list(x = 0.85, col = gray(0.4), xpd = NA),
  NOE.head.arg = list(font = 2, labels = c("N", "O", "E"), xpd = NA),
  Overall.seg.arg = list(col = "navyblue", xpd = NA, lwd = 2),
  Overall.p.arg = list(pch = 23, cex = 2, col = "black", bg = gray(0.4), xpd = NA),
  Overall.est.arg = list(x = smr.xlim[1] - 0.06, las = 1, font = 2, xpd = NA, adj = 1),
  cat.seg.arg = list(col = "navyblue", xpd = NA, lwd = 2),
  cat.p.arg = list(pch = 22, cex = 1, col = "black", bg = gray(0.4), xpd = NA),
  cat.est.arg = list(x = smr.xlim[1] - 0.06, las = 1, col = gray(0.4), xpd = NA, adj =
    1),
  SMR.head.arg = list(smr.xlim[1] - 0.06, font = 2, labels = "SMR [95% CIs]", xpd =
    NA, adj = 1),
  smr.xlab = "Standardized Mortality Ratio",
  smr.xlim = "auto",
  grid = TRUE,
  digits = 3
)

Arguments

`obs.var`	Observed death. Accepted values are 0 (absence) or 1 (presence) in a vector. For `SMR.table` it must be a character indicating the name of the variable in the data.
`pred.var`	Death individual predictions (ranging from 0 to 1) in a vector. For `SMR.table` it must be a character indicating the name of the variable in the data.
`digits`	Number of digits for rounding the output.
`ci.method`	Method to estimate the confidence interval. "Hosmer" (default) or "Byar" are acceptable values.
`ci.level`	Level of the confidence interval. Default is 0.95.
`data`	For `SMR.table`, a dataset where pred.var, obs.var and group.var are in.
`group.var`	For `SMR.table`, this is a character vector indicating the name(s) of the variable(s) in the data that will form the groups where SMR will be calculated. The variables must be factors.
`use.label`	Logical. Default is FALSE. For `SMR.table` this option will replace the variables names by its labels in var.labels argument.
`var.labels`	A character vector with variables labels. The default is to replace the variable name by the label stored at attr(data, "var.labels"). But one may specify labels directly.
`reorder`	Default is "no". Possible values are: "no", "SMR","lower.Cl", and "upper.Cl". It will make the `SMR.table` to be ordered within each varibale by its original order, or by SMR order, or by lower.Cl order, or by upper.Cl.
`decreasing`	Logical. When 'reorderd' is TRUE, should the order be decreasing or incresing? See `order`
`x`	For the `forest.SMR` this is the output of `SMR.table`.
`mar1`, `mar.SMR`	Values to set the margins (mar parameter) of left and right windows. See `par`
`overall.arg`	A list of arguments passed to `text` for plotting the overall label. Internally, 'y' coordinate is replaced.
`NOE.overall.args`	A list of arguments passed to `text` for plotting the overall N (number of observations), O (observed deaths) and E (expected deaths). Internally, 'labels' and 'y' arguments are replaced.
`var.labels.arg`	A list of arguments passed to `text` for plotting the variables labels. Internally, 'y' coordinate is replaced.
`cat.labels.arg`	A list of arguments passed to `text` for plotting the categories labels. Internally, 'y' coordinate is replaced.
`N.values.arg`	A list of arguments passed to `text` for plotting the values of N (number of observations) of each subgroup. Internally, the arguments 'label' and 'y' coordinate are replaced.
`O.values.arg`	A list of arguments passed to `text` for plotting the values of Observed deaths of each subgroup. Internally, the arguments 'label' and 'y' coordinate are replaced.
`E.values.arg`	A list of arguments passed to `text` for plotting the values of Expected deaths of each subgroup. Internally, the arguments 'label' and 'y' coordinate are replaced.
`NOE.head.arg`	A list of arguments passed to `text` for plotting the labels of the columns N, E and O on the top of the graph. Internally, the 'x' and 'y' coordinates are replaced. The x coordinates are taken from the x in `N.values.arg`, `O.values.arg` and `E.values.arg`.
`Overall.seg.arg`	A list of arguments passed to `segments` for plotting the lines corresponding to overall SMR confidence intervals. Internally, 'x' and 'y' coordinates are replaced.
`Overall.p.arg`	A list of arguments passed to `points` for plotting the points corresponding to overall SMR. Internally, 'x' and 'y' coordinates are replaced.
`Overall.est.arg`	A list of arguments passed to `text` for plotting the overall SMR beside the graph. Internally, 'y' coordinate and 'label' argument are replaced.
`cat.seg.arg`	A list of arguments passed to `segments` for plotting the lines corresponding to SMR confidence intervals for all groups. Internally, 'x' and 'y' coordinates are replaced.
`cat.p.arg`	A list of arguments passed to `points` for plotting the points corresponding to all categoreis SMR. Internally, 'x' and 'y' coordinates are replaced.
`cat.est.arg`	A list of arguments passed to `text` for plotting the categories SMR beside the graph. Internally, 'y' coordinate and 'label' arguments are replaced.
`SMR.head.arg`	A list of arguments passed to `text` for plotting the label of the SMR column on the top of the graph. Internally, the 'y' coordinate is replaced.
`smr.xlab`	Label of the x axis. Default is "Standardized Mortality Ratio".
`smr.xlim`	Limits of x axis of the `forest.SMR` plot. Default is "auto", which internally will pick the highest values of all upper.Cl and the lowest lower.Cl. Besides "auto", only a vector of 2 numbers is valid, and will be passed to `plot.default`.
`grid`	Logical. If TRUE (default), it will draw a grid with the `grid` default arguments.

Value

If SMR, then:

N Number of subjects analyzed.
O Observed number of deaths.
E Expected number of deaths.
SMR Standardized mortality ratio.
lower.Cl lower confidence limit.
upper.Cl upper confidence limit.

If SMR.table, then a data.frame with the same information as above, and the additional information is returned: "Variables" (variables names), "Levels" (variables levels).

If forest.SMR, then a plot is returned.

Author(s)

Lunna Borges and Pedro Brasil

References

David W. Hosmer and Stanley Lemeshow. Confidence intervals estimates of an index of quality performance basend on logistic regression models. Statistics in Medicine , vol. 14, 2161-2172 (1995)

Examples

# Loading a example data
data(icu)

# Setting variable labels to data
attr(icu, "var.labels")[match(c("Unit", "IsMechanicalVentilation1h",
          "AdmissionTypeName_pri","Vasopressors_D1"), names(icu))] <-
  c("ICU unit","Mechanichal ventilation","Admission type","Vasopressors at admission")

# Some editing
icu$Saps3DeathProbabilityStandardEquation <- icu$Saps3DeathProbabilityStandardEquation /100
icu$IsMechanicalVentilation1h <- as.factor(ifelse(icu$IsMechanicalVentilation1h == 1, "Yes", "No"))
icu$AdmissionTypeName_pri <- as.factor(icu$AdmissionTypeName_pri)
levels(icu$AdmissionTypeName_pri) <- c("Clinical","Elective surgery", "Urgent surgery")
icu$Vasopressors_D1 <- as.factor(ifelse(icu$Vasopressors_D1 == 1, "Yes", "No"))

# The overall SMR for the whole sample
SMR(icu$UnitDischargeName, icu$Saps3DeathProbabilityStandardEquation)

# The overall SMR and for some subgroups
x <- SMR.table(data = icu, obs.var = "UnitDischargeName",
               pred.var = "Saps3DeathProbabilityStandardEquation",
               group.var = c( "IsMechanicalVentilation1h",
               "AdmissionTypeName_pri","Vasopressors_D1"),
               reorder = "no",
               decreasing = TRUE,
               use.label = TRUE)
x

# A forest plot for all groups SMR (resize the window may be required)
forest.SMR(x, digits = 2)

# The same thing but reordering the categories
x <- SMR.table(data = icu, obs.var = "UnitDischargeName",
               pred.var = "Saps3DeathProbabilityStandardEquation",
               group.var = c( "IsMechanicalVentilation1h",
               "AdmissionTypeName_pri", "Vasopressors_D1"),
               reorder = "SMR",
               decreasing = TRUE,
               use.label = TRUE)
forest.SMR(x, digits = 2)

# The overall SMR and for all Units
x <- SMR.table(data = icu, obs.var = "UnitDischargeName",
               pred.var = "Saps3DeathProbabilityStandardEquation",
               group.var = "Unit",
               reorder = "no",
               decreasing = TRUE,
               use.label = TRUE)
x

# A forest plot for all Units
forest.SMR(x, digits = 2)

# The same thing but reordering the categories
x <- SMR.table(data = icu, obs.var = "UnitDischargeName",
               pred.var = "Saps3DeathProbabilityStandardEquation",
               group.var = "Unit",
               reorder = "SMR",
               decreasing = TRUE,
               use.label = TRUE)
forest.SMR(x, digits = 2)

rm(x, icu)
# Loading a example data
data(icu)

# Setting variable labels to data
attr(icu, "var.labels")[match(c("Unit", "IsMechanicalVentilation1h",
          "AdmissionTypeName_pri","Vasopressors_D1"), names(icu))] <-
  c("ICU unit","Mechanichal ventilation","Admission type","Vasopressors at admission")

# Some editing
icu$Saps3DeathProbabilityStandardEquation <- icu$Saps3DeathProbabilityStandardEquation /100
icu$IsMechanicalVentilation1h <- as.factor(ifelse(icu$IsMechanicalVentilation1h == 1, "Yes", "No"))
icu$AdmissionTypeName_pri <- as.factor(icu$AdmissionTypeName_pri)
levels(icu$AdmissionTypeName_pri) <- c("Clinical","Elective surgery", "Urgent surgery")
icu$Vasopressors_D1 <- as.factor(ifelse(icu$Vasopressors_D1 == 1, "Yes", "No"))

# The overall SMR for the whole sample
SMR(icu$UnitDischargeName, icu$Saps3DeathProbabilityStandardEquation)

# The overall SMR and for some subgroups
x <- SMR.table(data = icu, obs.var = "UnitDischargeName",
               pred.var = "Saps3DeathProbabilityStandardEquation",
               group.var = c( "IsMechanicalVentilation1h",
               "AdmissionTypeName_pri","Vasopressors_D1"),
               reorder = "no",
               decreasing = TRUE,
               use.label = TRUE)
x

# A forest plot for all groups SMR (resize the window may be required)
forest.SMR(x, digits = 2)

# The same thing but reordering the categories
x <- SMR.table(data = icu, obs.var = "UnitDischargeName",
               pred.var = "Saps3DeathProbabilityStandardEquation",
               group.var = c( "IsMechanicalVentilation1h",
               "AdmissionTypeName_pri", "Vasopressors_D1"),
               reorder = "SMR",
               decreasing = TRUE,
               use.label = TRUE)
forest.SMR(x, digits = 2)

# The overall SMR and for all Units
x <- SMR.table(data = icu, obs.var = "UnitDischargeName",
               pred.var = "Saps3DeathProbabilityStandardEquation",
               group.var = "Unit",
               reorder = "no",
               decreasing = TRUE,
               use.label = TRUE)
x

# A forest plot for all Units
forest.SMR(x, digits = 2)

# The same thing but reordering the categories
x <- SMR.table(data = icu, obs.var = "UnitDischargeName",
               pred.var = "Saps3DeathProbabilityStandardEquation",
               group.var = "Unit",
               reorder = "SMR",
               decreasing = TRUE,
               use.label = TRUE)
forest.SMR(x, digits = 2)

rm(x, icu)

Standardized Resource Use (SRU)

Description

SRU calculates the standardized resource use for ICUs (Intensive Care Units) from information regarding admissions of individual patients. Resource use is represented by the patient's length of stay (LOS). Therefore the SRU for each unit is defined as the observed LOS divided by it's expected LOS. To estimate the expected LOS for each ICU one must define a severity score, here defined by the SAPS 3 score. In theory, the 'score' could be any score/probability that estimates death for each ICU admission.

The plot.SRU function will return a SMR versus SRU scatter plot with its medians and tertiles. Thus, it classifies each unit in the quadrants formed by these two medians as: most efficient (ME) which is the lower left quadrant (both SRU and SMR below their medians); least efficient (LE) is the upper right quadrant (both SRU and SMR above their medians); and least achieving (LA) - the lower right quadrant (SRU below and SMR above their medians); and over achieving (OA) - the upper left quadrant (SRU above and SMR below their medians).

print.SRU Prints a object of class 'SRU'.

cut_in is used to find limits to define severity classes which are used in SRU function. The severity classes are necessary to calculate the average of days to produce one survivor and consequently to estimate the expected LOS in each ICU. Its rationale is to find the limits for the severity classes that yeld a desired average of days to produce one survivor. At some point in time, we made a study to test if different arrangements of the severity classes would yeld different classifications in the efficiency quadrants. Despite the fact that this study did not show any difference from each approach, we left the function in the package. Therefore, any arbitrary severity classes should yeild the same results.

SRUcalc is a simpler function to estimte SRU and returns, for each unit, the SRU value, the observed and expected number of deaths, and the observed and expected LOS.

Usage

SRU(
  prob,
  death,
  unit,
  los,
  los.exp,
  class,
  score,
  plot = FALSE,
  type = 1,
  digits = 2,
  digits2 = 5,
  originals = FALSE,
  myunits = NULL
)

## S3 method for class 'SRU'
print(x, ...)

## S3 method for class 'SRU'
plot(
  x,
  ...,
  xlim = range(x$rates[, 2]),
  ylim = range(x$rates[, 1]),
  xlab = "SMR",
  ylab = "SRU",
  points.arg = list(pch = 21, col = "white", bg = "cadetblue3", cex = 1.5),
  med.arg = list(col = "dodgerblue4", lwd = 2, lty = 1),
  tert.arg = list(col = "darkorange2", lty = 2, lwd = 1),
  auto.legend = TRUE,
  leg.arg = list(x = "top", bty = "n", xpd = NA, inset = -0.2, ncol = 2),
  bty = "n",
  myunits = x$myunits,
  myunitspts.arg = list(pch = 21, col = "white", bg = "red", cex = 1.5),
  myunitstext.arg = list(pos = 1, font = 2, cex = 0.8)
)

cut_in(
  score,
  los,
  death,
  unit,
  days,
  min = 200,
  exc.ICU = TRUE,
  complete = FALSE,
  digits = 5
)

SRUcalc(prob, death, unit, los, score, digits = 2)
SRU(
  prob,
  death,
  unit,
  los,
  los.exp,
  class,
  score,
  plot = FALSE,
  type = 1,
  digits = 2,
  digits2 = 5,
  originals = FALSE,
  myunits = NULL
)

## S3 method for class 'SRU'
print(x, ...)

## S3 method for class 'SRU'
plot(
  x,
  ...,
  xlim = range(x$rates[, 2]),
  ylim = range(x$rates[, 1]),
  xlab = "SMR",
  ylab = "SRU",
  points.arg = list(pch = 21, col = "white", bg = "cadetblue3", cex = 1.5),
  med.arg = list(col = "dodgerblue4", lwd = 2, lty = 1),
  tert.arg = list(col = "darkorange2", lty = 2, lwd = 1),
  auto.legend = TRUE,
  leg.arg = list(x = "top", bty = "n", xpd = NA, inset = -0.2, ncol = 2),
  bty = "n",
  myunits = x$myunits,
  myunitspts.arg = list(pch = 21, col = "white", bg = "red", cex = 1.5),
  myunitstext.arg = list(pos = 1, font = 2, cex = 0.8)
)

cut_in(
  score,
  los,
  death,
  unit,
  days,
  min = 200,
  exc.ICU = TRUE,
  complete = FALSE,
  digits = 5
)

SRUcalc(prob, death, unit, los, score, digits = 2)

Arguments

`prob`	Individual predicted probability of death (ranging from 0 to 1) in a vector.
`death`	Observed death. Accepted values are 0 (absence) or 1 (presence) in a vector.
`unit`	A character or factor variable indicating the ICU where the patient is admitted.
`los`	A numeric variable indicating the observed length of stay for each patient.
`los.exp`	Estimated length of stay (LOS). This argument is optional and will be required only if `type = 2`. If the user has an alternative model to estimate the individual LOS, the predicted individual LOS should be passed to this argument. If this is the case, the predicted ICU LOS is estimated as the mean of the individual predictions of the LOS of these groups.
`class`	A factor variable indicating the class of severity score (e.g. SAPS 3). In the case of SAPS 3, this is a cut in the SAPS 3 score, grouping patients into severity classes. This will be required if the argument `original = FALSE` and NAs are not allowed; if `original = TRUE`, class is ignored.
`score`	A numeric vector with the Acute Physiology Score (SAPS) 3 score for each admission. The function will use this argument to know to which severity class each patient will be assigned to. It is used only when `originals = TRUE` and ignored otherwise. NAs are not allowed.
`plot`	Logical; If `TRUE`, plots a SMR versus SRU scatter plot.
`type`	A Way to calculate SRU. If `type = 1`, it does as the original article to estimate the ICU's expected LOS (default). First, it multiplies the overall average of days of each severity class by the number of survivors in the same severity class in that ICU. Than, it sums the expected LOS for each severity class in that ICU. If `type = 2`, the user must provide the `los.exp` (expected LOS) for each subject (i.e. from a prediction model), and the function will estimate the ICU's expected LOS as the mean of all individual LOS for patients in that ICU.
`digits`, `digits2`	Integer indicating the number of decimals to be used in the output.
`originals`	Logical; If `TRUE`, it uses the severity classes and average days as the original article and will override the `class` argument, if any. It requires the `score` argument and it must be the SAPS 3 score. We recommend not to set it `TRUE` unless you really know what you are doing. Even if one wishes to have severity classes identical to the original paper, it is better to set the severity classes before running the analysis. This way, the function will estimate the average days from the data instead of using the fixed average days from the original paper.
`myunits`	A character vector with the unit names which one would like to benchmark among all units. These units will be highlighted with dots of different collors in the plot. Default is `NULL`.
`x`	For `print.SRU` or `plot.SRU`, an object of class 'SRU'.
`...`	Arguments to be passed to `plot.default` or to `print`.
`xlim`, `ylim`	Limits of x and y axis for `plot.SRU`.
`xlab`, `ylab`	Labels of x and y axis for `plot.SRU`.
`points.arg`	List of arguments passed to `points` for plotting points correponding to ICU's SMR and SRU.
`med.arg`	List of arguments passed to `abline` for plotting lines corresponding to SRU's and SMR's medians.
`tert.arg`	List of arguments passed to `abline` for plotting lines corresponding to SRU's and SMR's tertiles.
`auto.legend`	Logical; If `TRUE`, prints a legend with parameters in `leg.arg` arguments.
`leg.arg`	List of arguments passed to `legend` for plotting legends in `plot.SRU`.
`bty`	A character string which determines the type of box that is drawn about plots. See `par`
`myunitspts.arg`	List of arguments passed to `points` for plotting points correponding to `myunits`'s SMR and SRU.
`myunitstext.arg`	List of arguments passed to `text` for labelling points correponding to `myunits`'s position.
`days`	For `cut_in`, this is a vector of days to get an avarage. See example.
`min`	For `cut_in`, this is the minimum desired quantity of patients in each severity class (default = 200) to estimate the average days.
`exc.ICU`	Logical; For `cut_in`, if `TRUE`, ICUs without surviving patients are ignored.
`complete`	Logical; For `cut_in`, if `TRUE`, shows additional information about severity classes.

Value

Two tables: one with information about severity classes and the respective quantities required to estimate the expected LOS, and another with information about ICUs classified as Most Efficient (ME) or Least Efficient (LE).

Sev Severity class.
Total Total of patients.
Surv Total of survivors.
Total.LOS Total length of stay (days).
AvDays Average days to produce a survivor.
N.Unit Quantity of ICUs.
N.Pat Quantity of patients.
SMR Standardized Mortality Ratio Mean (standard deviation).
SRU Standardized Resource Use Mean (standard deviation).

Most Efficient ICUs have SRU, SMR < median. Least Efficient ICUs have SRU, SMR > median.

cut_in returns a vector with the limits to cut the severity score.

SRUcalc returns a table with:

Unit ICUs names.
SMR or SRU Standardized Rate.
N Number of subjects analyzed.
Observed Observed number of deaths.
Expected Expected number of deaths.
LOS_esp Expected length of stay.

Author(s)

Lunna Borges and Pedro Brasil

References

Rothen HU, Stricker K, Einfalt J, Bauer P, Metnitz PGH, Moreno RP, Takala J (2007) Variability in outcome and resource use in intensive care units. Intensive Care Med 33:1329-1336

Examples


# Loading the dataset
data(icu)

# Removing data with inapropriate values and some editing
icu <- icu[-which(icu$los < 0 ),]
icu$Saps3DeathProbabilityStandardEquation <- icu$Saps3DeathProbabilityStandardEquation / 100

# Setting classes acording to limits of SAPS 3 score
days <- seq(1,100)
cut_lims <- cut_in(icu$Saps3Points, icu$los, icu$UnitDischargeName,
                   icu$Unit, days, exc.ICU = TRUE)
icu$class <- cut(icu$Saps3Points, breaks = cut_lims, include.lowest = TRUE)

# Estimating the SRU benchmarking myunit A and B
x <- SRU(prob = icu$Saps3DeathProbabilityStandardEquation,
death = icu$UnitDischargeName, unit = icu$Unit,
los = icu$los, score = icu$Saps3Points,
originals = TRUE, type = 1, plot = FALSE, myunits = c("A","B"))
x
plot(x)

# To see the units rankings and individual SMR and SRU, ordering by its SRU
x$rates[order(x$rates$sru),]

# SRU with diferent severity classes created by cut_in function
y <- SRU(prob = icu$Saps3DeathProbabilityStandardEquation,
death = icu$UnitDischargeName, unit = icu$Unit,
los = icu$los, score = icu$Saps3Points,
originals = FALSE, type = 1, plot = FALSE, class = icu$class)
y

# Using SRUcalc
SRUcalc(prob = icu$Saps3DeathProbabilityStandardEquation,
        death = icu$UnitDischargeName, unit = icu$Unit, los = icu$los,
        score = icu$Saps3Points)

rm(x, y, days, icu, cut_lims)

# Loading the dataset
data(icu)

# Removing data with inapropriate values and some editing
icu <- icu[-which(icu$los < 0 ),]
icu$Saps3DeathProbabilityStandardEquation <- icu$Saps3DeathProbabilityStandardEquation / 100

# Setting classes acording to limits of SAPS 3 score
days <- seq(1,100)
cut_lims <- cut_in(icu$Saps3Points, icu$los, icu$UnitDischargeName,
                   icu$Unit, days, exc.ICU = TRUE)
icu$class <- cut(icu$Saps3Points, breaks = cut_lims, include.lowest = TRUE)

# Estimating the SRU benchmarking myunit A and B
x <- SRU(prob = icu$Saps3DeathProbabilityStandardEquation,
death = icu$UnitDischargeName, unit = icu$Unit,
los = icu$los, score = icu$Saps3Points,
originals = TRUE, type = 1, plot = FALSE, myunits = c("A","B"))
x
plot(x)

# To see the units rankings and individual SMR and SRU, ordering by its SRU
x$rates[order(x$rates$sru),]

# SRU with diferent severity classes created by cut_in function
y <- SRU(prob = icu$Saps3DeathProbabilityStandardEquation,
death = icu$UnitDischargeName, unit = icu$Unit,
los = icu$los, score = icu$Saps3Points,
originals = FALSE, type = 1, plot = FALSE, class = icu$class)
y

# Using SRUcalc
SRUcalc(prob = icu$Saps3DeathProbabilityStandardEquation,
        death = icu$UnitDischargeName, unit = icu$Unit, los = icu$los,
        score = icu$Saps3Points)

rm(x, y, days, icu, cut_lims)

Survival models performance analysis

Description

Colection of functions for survival models performance analysis.

R2sh estimates a distance-based estimator of survival predictive accuracy proposed by Schemper and Henderson. It was inspirated in survAUC::schemper function, but receives the predicted values directly. Besides that, R2sh does bootstrap resampling and returns its confidence interval estimate.

R2pm calculates a estimator of survival predictive accuracy proposed by Kent & O'Quigley and its bootstrap confidence interval.

cal.Slope returns the calibration slope of a survival model and its bootstrap confidence interval.

Usage

R2sh(time, status, lin.pred, data, R)

R2pm(lin.pred, R)

cal.Slope(time, status, lin.pred, R)
R2sh(time, status, lin.pred, data, R)

R2pm(lin.pred, R)

cal.Slope(time, status, lin.pred, R)

Arguments

`time`	A vector of event times.
`status`	A indicator vector of event occurrence.
`lin.pred`	A vector of linear predictors of a survival model for each observation. (prognostic index)
`data`	A data.frame where to find column vectors.
`R`	The number of bootstrap replicates. Usually this will be a single positive integer. For importance resampling, some resamples may use one set of weights and others use a different set of weights. In this case R would be a vector of integers where each component gives the number of resamples from each of the rows of weights. To be passed to `boot`.

Value

R2sh returns a list with the following components:

D: The estimator of predictive accuracy obtained from the covariate-free null model.
Dx: The estimator of predictive accuracy obtained from the Cox model.
V: The estimator of relative gains in predictive accuracy.
Mhat: The absolute distance estimator obtained from the Cox model (evaluated at the event times of the test data).
Mhat.0: The absolute distance estimator obtained from the covariate-free null model (evaluated at the event times of the test data).
timep: The event times of the test data.
lower: V lower confidence limit.
upper: V upper confidence limit.
boot: An object of class "boot".
bootCI: Boot confidence intervals resampling.

R2pm returns a list with the following components:

r2: The estimator of predictive accuracy obtained from the Cox model.
lower: r2 lower confidence limit.
upper: r2 upper confidence limit.
boot: An object of class "boot".
bootCI: Boot confidence intervals resampling.

cal.Slope returns a list with the following components:

slope: The calibration slope measure of a survival model.
lower: slope lower confidence limit.
upper: slope upper confidence limit.
boot: An object of class "boot".
bootCI: Boot confidence intervals resampling.

Author(s)

Lunna Borges <[email protected]>

References

Schemper, M. and R. Henderson (2000). Predictive accuracy and explained variation in Cox regression. Biometrics 56, 249-255.

Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application, Chapter 5. Cambridge University Press.

DiCiccio, T.J. and Efron B. (1996) Bootstrap confidence intervals (with Discussion). Statistical Science, 11, 189-228.

Efron, B. (1987) Better bootstrap confidence intervals (with Discussion). Journal of the American Statistical Association, 82, 171-200.

Kent, John T., and J. O. H. N. O'QUIGLEY. "Measures of dependence for censored survival data." Biometrika 75.3 (1988): 525-534.

van Houwelingen, Hans C. "Validation, calibration, revision and combination of prognostic survival models." Statistics in medicine 19.24 (2000): 3401-3415.

Rahman, M. Shafiqur, et al. "Review and evaluation of performance measures for survival prediction models in external validation settings." BMC medical research methodology 17.1 (2017): 60.

Examples


#### Survival model ####

data(breastCancer)
class(breastCancer$gradd1) <- "character"
class(breastCancer$gradd2) <- "character"

traindata <- breastCancer[sample(nrow(breastCancer), nrow(breastCancer)*2/3),]
newdata <- breastCancer[-sample(nrow(breastCancer), nrow(breastCancer)*2/3),]
model <- rms::cph(survival::Surv(rectime, censrec) ~ rms::rcs(age,6) +
 rms::rcs(nodes,3) + rms::rcs(pgr,3) + gradd1 + gradd2 +
 hormon, data = traindata)

lp <- predict(model, newdata = newdata)

#### R2sh  example ####

R2sh(newdata$rectime, newdata$censrec, lp, data = newdata, R = 50)

#### R2pm example ####

R2pm(lp, R = 50)

#### cal.slope example ####

cal.Slope(newdata$rectime, newdata$censrec, lp, R = 50)

#### Survival model ####

data(breastCancer)
class(breastCancer$gradd1) <- "character"
class(breastCancer$gradd2) <- "character"

traindata <- breastCancer[sample(nrow(breastCancer), nrow(breastCancer)*2/3),]
newdata <- breastCancer[-sample(nrow(breastCancer), nrow(breastCancer)*2/3),]
model <- rms::cph(survival::Surv(rectime, censrec) ~ rms::rcs(age,6) +
 rms::rcs(nodes,3) + rms::rcs(pgr,3) + gradd1 + gradd2 +
 hormon, data = traindata)

lp <- predict(model, newdata = newdata)

#### R2sh  example ####

R2sh(newdata$rectime, newdata$censrec, lp, data = newdata, R = 50)

#### R2pm example ####

R2pm(lp, R = 50)

#### cal.slope example ####

cal.Slope(newdata$rectime, newdata$censrec, lp, R = 50)

Tabulation of variables in a stack form

Description

There are two functionalities: Tabulation of variables with the same possible range of distribution and stack into a new table with or without other descriptive statistics or to breakdown distribution of more than one row variables against a column variable

Usage

tableStack(
  vars,
  dataFrame,
  minlevel = "auto",
  maxlevel = "auto",
  count = TRUE,
  na.rm = FALSE,
  means = TRUE,
  medians = FALSE,
  sds = TRUE,
  decimal = 2,
  total = TRUE,
  var.labels = TRUE,
  var.labels.trunc = 150,
  reverse = FALSE,
  vars.to.reverse = NULL,
  by = NULL,
  vars.to.factor = NULL,
  iqr = "auto",
  prevalence = FALSE,
  percent = c("column", "row", "none"),
  frequency = TRUE,
  test = TRUE,
  name.test = TRUE,
  total.column = FALSE,
  simulate.p.value = FALSE,
  sample.size = TRUE,
  assumption.p.value = 0.01,
  NAcol = FALSE,
  NArow = FALSE,
  drplvls = FALSE
)
tableStack(
  vars,
  dataFrame,
  minlevel = "auto",
  maxlevel = "auto",
  count = TRUE,
  na.rm = FALSE,
  means = TRUE,
  medians = FALSE,
  sds = TRUE,
  decimal = 2,
  total = TRUE,
  var.labels = TRUE,
  var.labels.trunc = 150,
  reverse = FALSE,
  vars.to.reverse = NULL,
  by = NULL,
  vars.to.factor = NULL,
  iqr = "auto",
  prevalence = FALSE,
  percent = c("column", "row", "none"),
  frequency = TRUE,
  test = TRUE,
  name.test = TRUE,
  total.column = FALSE,
  simulate.p.value = FALSE,
  sample.size = TRUE,
  assumption.p.value = 0.01,
  NAcol = FALSE,
  NArow = FALSE,
  drplvls = FALSE
)

Arguments

`vars`	a vector of variables in the data frame. The imput may be given with or without quotes.
`dataFrame`	source data frame of the variables
`minlevel`	possible minimum value of items specified by user
`maxlevel`	possible maximum value of items specified by user
`count`	whether number of valid records for each item will be displayed
`na.rm`	whether missing value would be removed during calculation mean score of each person
`means`	whether means of all selected items will be displayed
`medians`	whether medians of all selected items will be displayed
`sds`	whether standard deviations of all selected items will be displayed
`decimal`	number of decimals displayed
`total`	display of means and standard deviations of total and average scores
`var.labels`	presence of descriptions of variables on the last column of output
`var.labels.trunc`	number of characters used for variable description
`reverse`	whether item(s) negatively correlated with other majority will be reversed
`vars.to.reverse`	variable(s) to reverse
`by`	a variable for column breakdown. If NONE is given, only the 'total column' will be displayed. More on Details.
`vars.to.factor`	variable(s) to be converted to factor for tabulaton
`iqr`	variable(s) to display median and inter-quartile range
`prevalence`	for logical or dichotomous variables, whether prevalence of the dichotomous row variable in each column subgroup will be displayed
`percent`	type of percentage displayed when the variable is categorical and for NArow when activated. Default is column
`frequency`	whether to display frequency in the cells when the variable is categorical and for NArow when activated
`test`	whether statistical test(s) will be computed
`name.test`	display name of the test and relevant degrees of freedom
`total.column`	whether to add 'total column' to the output or not
`simulate.p.value`	simulate P value for Fisher's exact test
`sample.size`	whether to display non-missing sample size of each column
`assumption.p.value`	level of Bartlett's test P value to judge whether the comparison and the test should be parametric
`NAcol`	whether to add 'NA column' to the output or not
`NArow`	whether to add 'NA rows' for each variable to the output or not
`drplvls`	whether to hide non used levels on factor and character variables or not

Details

This function is a clone of tableStack from the epiDisplay package. Comparing to the original, tt adds options to show the NA in the variables as categories, similar to the option useNA in the table function, and it also fix few bugs, such as showing the total.column without the need to test = TRUE, and to show or hide levels with zero counts without returning error.

This function simultaneously explores several variables with a fixed integer rating scale. For non-factor variables, the default values for tabulation are the minimum and the maximum of all variables but can be specified by the user.

When 'by' is omitted, all variables must be of the same class, and must be 'integer', 'factor' or 'logical. Some parameters are only used if by is omitted, others are only used if by is available. The by-omitted dependent variable are minlevel, maxlevel, count, na.rm, means, medians, sds, total, reverse, vars.to.reverse. The by-available dependent variables are iqr, prevalence, percent, frequency, test, name.test, total.column, simulate.p.value, sample.size, assumption.p.value, NArow, NAcol, drplvls. Unlike function 'alpha', the argument 'reverse' has a default value of FALSE. This argument is ignored if 'vars.to.reverse' is specified.

Options for 'reverse', 'vars.to.reverse' and statistics of 'means', 'medians', 'sds' and 'total' are available only if the items are not factor. To obtain statistics of factor items, users need to use 'unclassDataframe' to convert them into integer.

When the 'by' argument is given, 'reverse' and 'vars.to.reverse' do not apply, as mentioned before. Instead, columns of the 'by' variable will be formed. A table will be created against each selected variable. If the variable is a factor or coerced to factor with 'vars.to.factor', cross-tabulation will result with percents as specified, ie. "column", "row", or "none" (FALSE). For a dichotomous row variable, if set to 'TRUE', the prevalence of row variable in the form of a fraction is displayed in each subgroup column. For objects of class 'numeric' or 'integer', means with standard deviations will be displayed. For variables with residuals that are not normally distributed or where the variance of subgroups are significantly not normally distributed (using a significance level of 0.01), medians and inter-quartile ranges will be presented if the argument 'iqr' is set to "auto" (by default). Users may specify a subset of the selected variables (from the 'vars' argument) to be presented in such a form. Otherwise, the argument could be set as any other character string, except the variables names, to insist to present means and standard deviations.

When 'test = TRUE' (default), Pearson's chi-squared test (or a two-sided Fisher's exact test, if the sample size is small) will be carried out for a categorical variable or a factor. Parametric or non-parametric comparison and test will be carried out for a object of class 'numeric' or 'integer' (See 'iqr' and 'assumption.p.value' below). If the sample size of the numeric variable is too small in any group, the test is omitted and the problem reported.

For Fisher's exact test, the default method employs 'simulate.p.value = FALSE'. See further explanation in 'fisher.test' procedure. If the dataset is extraordinarily large, the option may be manually set to TRUE.

When 'by' is specified as a single character object (such as 'by="none"') or when 'by = NONE' there will be no column breakdown and all tests will be omitted. Only the total column is displayed. Only the 'total' column is shown.

If this 'total column' is to accompany the 'by' breakdown, the argument 'total.column=TRUE' should be specified. The 'sample.size' is TRUE by default. The total number of records for each group is displayed in the first row of the output. However, the variable in each row may have some missing records, the information on which is reported by NArow for each variable on 'vars' and by NAcol for the variable on 'by'.

By default, Epicalc sets 'var.labels=TRUE' in order to give nice output. However, 'var.labels=FALSE' can sometimes be more useful during data exploration. Variable numbers as well as variable names are displayed instead of variable labels. Names and numbers of abnormally distributed variables, especially factors with too many levels, can be easily identified for further relevelling or recoding.

The argument 'iqr' has a default value being "auto". Non-parametric comparison and test will be automatically chosen if Bartlett's test P value is below the 'assumption.p.value'.

The test can be forced to parametric by setting 'iqr=NULL' and to non-parametric by if iqr is set to the variable number of cont.var (See examples.).

Value

an object of class 'tableStack' and 'list' when by=NULL

`results`	an object of class 'noquote' which is used for print out
`items.reversed`	name(s) of variable(s) reversed
`total.score`	a vector from 'rowSums' of the columns of variables specified in 'vars'
`mean.score`	a vector from 'rowMeans' of the columns of variables specified in 'vars'
`mean.of.total.scores`	mean of total scores
`sd.of.total.scores`	standard deviation of total scores
`mean.of.average.scores`	mean of mean scores
`sd.of.average.scores`	standard deviation of mean scores

When 'by' is specified, an object of class 'tableStack' and 'table is returned.

Author(s)

Virasakdi Chongsuvivatwong <[email protected]>

Caio Ferreira <[email protected]>

Lunna Borges <[email protected]>

Pedro Brasil <[email protected]>

References

'table', 'tab1', 'summ', 'alpha', 'unclassDataframe'

Examples

set.seed(1)
data <- data.frame(sex = sample(c("M","F"), 50, rep = TRUE),
age = sample(c(NA,20:70), 50, rep = TRUE),
admissionType = sample(c(NA,"urgency", "clinical", "scheduled"), 50, rep = TRUE),
hospitalizationTime = sample(c(0:10), 50, rep = TRUE),
numberOfChildren = sample(c(NA,0:3), 50, rep = TRUE),
cancerInFamily = sample(c(NA,TRUE,FALSE), 50, rep = TRUE),
diabetesInFamily = sample(c(TRUE,FALSE), 50, rep = TRUE),
thrombosisInFamily = sample(c(TRUE,FALSE), 50, rep = TRUE),
mentaldiseasesInFamily = sample(c(TRUE,FALSE), 50, rep = TRUE),
cardiadicdiseaseInFamily = sample(c(NA,TRUE,FALSE), 50, rep = TRUE),
readmission = sample(c(NA,TRUE,FALSE), 50, rep = TRUE))

attach(data)
tableStack(cancerInFamily:cardiadicdiseaseInFamily, dataFrame = data)
detach(data)
tableStack(cancerInFamily:cardiadicdiseaseInFamily, data) # Default data frame is data
# "by" compares variables
tableStack(cancerInFamily:cardiadicdiseaseInFamily, data, by= readmission)
# "prevalence" returns the prevalence instead of the absolute values
tableStack(cancerInFamily:cardiadicdiseaseInFamily, data,
by= readmission, prevalence=TRUE)
# "percent" as FALSE hides the percentage in parenthesis
tableStack(cancerInFamily:cardiadicdiseaseInFamily, data,
by= readmission, percent=FALSE)
# "name.test" as FALSE hides the column that shows the tests names
tableStack(cancerInFamily:cardiadicdiseaseInFamily, data,
by= readmission, percent=FALSE, name.test=FALSE)
# "NAcol" displays a column of NA values on the variable on "by"
tableStack(cancerInFamily:cardiadicdiseaseInFamily, data,
by= readmission, NAcol = TRUE)
# "NArow" displays rows of NA values on the variables on "vars"
tableStack(cancerInFamily:cardiadicdiseaseInFamily, data,
by= readmission, NAcol = TRUE, NArow = TRUE)

# the specification of the vars may be done as the range
tableStack(vars=2:7, data, by=sex)
# "by" var may be specified as "none" and the selected vars will be crossed only against the total
tableStack(vars=2:7, data, by="none")
# by = NONE works just as by = "none"
tableStack(vars=2:7, data, by = NONE)
# total.column displays a column of totals in adition to the variable on by
tableStack(vars=2:7, data, by=sex, total.column=TRUE)

var.labels <- c("sex", "Type of admission for each patient",
"age", "Duration time in days of the patient's hospitalization",
"Number of children that the patient have",
"whether or not the patient has cancer in family",
"whether or not the patient has diabetes in family",
"whether or not the patient has thrombosis in family",
"whether or not the patient has mental diseases in family",
"whether or not the patient has cardiac diseases in family",
"whether or not the patient is on a relapse admission")
#setting the attribute var.labels
attr(data, "var.labels") <- var.labels
rm(var.labels)

# May need full screen of Rconsole
tableStack(vars=c(numberOfChildren,hospitalizationTime), data)
# Fits in with default R console screen
tableStack(vars=c(numberOfChildren,hospitalizationTime), data,
var.labels.trunc=35)
tableStack(vars=c(age,numberOfChildren,hospitalizationTime),
data, reverse=TRUE) -> a
a
## Components of 'a' have appropriate items reversed
a$mean.score -> mean.score
a$total.score -> total.score
data$mean.score <- mean.score
data$total.score <- total.score

# hiding the test column
tableStack(c(age, numberOfChildren,hospitalizationTime,
mean.score,total.score), data, by=sex, test=FALSE)
# variables specified on iqr will not display SD but IQR instead
tableStack(3:5, data, by=sex, iqr=hospitalizationTime)
## 'vars' can be mixture of variables of different classes
tableStack(3:5, data, by=admissionType,
iqr=c(hospitalizationTime, total.score))

data$highscore <- mean.score > 4
# a variable with some comparison may be created easily
tableStack(mean.score:highscore, data,
by=sex, iqr=total.score)

# the percentage information may be hidden
tableStack(vars=c(readmission,admissionType),
data, by=sex, percent="none")
# it may be shown the prevalende of the
# variable instead of the values themselves
tableStack(vars=c(readmission,admissionType), data,
by=sex, prevalence = TRUE)
# the name of the tests may be hidden
# while the test itself still shows
tableStack(vars=c(readmission,admissionType), data,
by=sex, name.test = FALSE)

## Variable in numeric or factor
# as continuous varaibles
tableStack(vars=3:5, data, by=sex)
# as factors
tableStack(vars=3:5, data, by=sex, vars.to.factor = 3:5)

## Using drplvls
# a dataframe will be created containing a factor with an unused level
bloodbank <- data.frame(AgeInDays =
    sample(0:15,200, replace = TRUE), Type =
    factor(sample(c("A","B","0"), 200, replace = TRUE),
      levels = c("A","B","AB","0")), Origin =
    sample(c("US","CA"), 200, replace = TRUE))

# by using drplvls the row of the unused fator is hidden
tableStack(vars = c(AgeInDays, Type),
bloodbank, by = Origin) #usual
tableStack(vars = c(AgeInDays, Type),
bloodbank, by = Origin,
drplvls = TRUE) # with drplvls

rm(total.score, mean.score, a, data, bloodbank)
set.seed(1)
data <- data.frame(sex = sample(c("M","F"), 50, rep = TRUE),
age = sample(c(NA,20:70), 50, rep = TRUE),
admissionType = sample(c(NA,"urgency", "clinical", "scheduled"), 50, rep = TRUE),
hospitalizationTime = sample(c(0:10), 50, rep = TRUE),
numberOfChildren = sample(c(NA,0:3), 50, rep = TRUE),
cancerInFamily = sample(c(NA,TRUE,FALSE), 50, rep = TRUE),
diabetesInFamily = sample(c(TRUE,FALSE), 50, rep = TRUE),
thrombosisInFamily = sample(c(TRUE,FALSE), 50, rep = TRUE),
mentaldiseasesInFamily = sample(c(TRUE,FALSE), 50, rep = TRUE),
cardiadicdiseaseInFamily = sample(c(NA,TRUE,FALSE), 50, rep = TRUE),
readmission = sample(c(NA,TRUE,FALSE), 50, rep = TRUE))

attach(data)
tableStack(cancerInFamily:cardiadicdiseaseInFamily, dataFrame = data)
detach(data)
tableStack(cancerInFamily:cardiadicdiseaseInFamily, data) # Default data frame is data
# "by" compares variables
tableStack(cancerInFamily:cardiadicdiseaseInFamily, data, by= readmission)
# "prevalence" returns the prevalence instead of the absolute values
tableStack(cancerInFamily:cardiadicdiseaseInFamily, data,
by= readmission, prevalence=TRUE)
# "percent" as FALSE hides the percentage in parenthesis
tableStack(cancerInFamily:cardiadicdiseaseInFamily, data,
by= readmission, percent=FALSE)
# "name.test" as FALSE hides the column that shows the tests names
tableStack(cancerInFamily:cardiadicdiseaseInFamily, data,
by= readmission, percent=FALSE, name.test=FALSE)
# "NAcol" displays a column of NA values on the variable on "by"
tableStack(cancerInFamily:cardiadicdiseaseInFamily, data,
by= readmission, NAcol = TRUE)
# "NArow" displays rows of NA values on the variables on "vars"
tableStack(cancerInFamily:cardiadicdiseaseInFamily, data,
by= readmission, NAcol = TRUE, NArow = TRUE)

# the specification of the vars may be done as the range
tableStack(vars=2:7, data, by=sex)
# "by" var may be specified as "none" and the selected vars will be crossed only against the total
tableStack(vars=2:7, data, by="none")
# by = NONE works just as by = "none"
tableStack(vars=2:7, data, by = NONE)
# total.column displays a column of totals in adition to the variable on by
tableStack(vars=2:7, data, by=sex, total.column=TRUE)

var.labels <- c("sex", "Type of admission for each patient",
"age", "Duration time in days of the patient's hospitalization",
"Number of children that the patient have",
"whether or not the patient has cancer in family",
"whether or not the patient has diabetes in family",
"whether or not the patient has thrombosis in family",
"whether or not the patient has mental diseases in family",
"whether or not the patient has cardiac diseases in family",
"whether or not the patient is on a relapse admission")
#setting the attribute var.labels
attr(data, "var.labels") <- var.labels
rm(var.labels)

# May need full screen of Rconsole
tableStack(vars=c(numberOfChildren,hospitalizationTime), data)
# Fits in with default R console screen
tableStack(vars=c(numberOfChildren,hospitalizationTime), data,
var.labels.trunc=35)
tableStack(vars=c(age,numberOfChildren,hospitalizationTime),
data, reverse=TRUE) -> a
a
## Components of 'a' have appropriate items reversed
a$mean.score -> mean.score
a$total.score -> total.score
data$mean.score <- mean.score
data$total.score <- total.score

# hiding the test column
tableStack(c(age, numberOfChildren,hospitalizationTime,
mean.score,total.score), data, by=sex, test=FALSE)
# variables specified on iqr will not display SD but IQR instead
tableStack(3:5, data, by=sex, iqr=hospitalizationTime)
## 'vars' can be mixture of variables of different classes
tableStack(3:5, data, by=admissionType,
iqr=c(hospitalizationTime, total.score))

data$highscore <- mean.score > 4
# a variable with some comparison may be created easily
tableStack(mean.score:highscore, data,
by=sex, iqr=total.score)

# the percentage information may be hidden
tableStack(vars=c(readmission,admissionType),
data, by=sex, percent="none")
# it may be shown the prevalende of the
# variable instead of the values themselves
tableStack(vars=c(readmission,admissionType), data,
by=sex, prevalence = TRUE)
# the name of the tests may be hidden
# while the test itself still shows
tableStack(vars=c(readmission,admissionType), data,
by=sex, name.test = FALSE)

## Variable in numeric or factor
# as continuous varaibles
tableStack(vars=3:5, data, by=sex)
# as factors
tableStack(vars=3:5, data, by=sex, vars.to.factor = 3:5)

## Using drplvls
# a dataframe will be created containing a factor with an unused level
bloodbank <- data.frame(AgeInDays =
    sample(0:15,200, replace = TRUE), Type =
    factor(sample(c("A","B","0"), 200, replace = TRUE),
      levels = c("A","B","AB","0")), Origin =
    sample(c("US","CA"), 200, replace = TRUE))

# by using drplvls the row of the unused fator is hidden
tableStack(vars = c(AgeInDays, Type),
bloodbank, by = Origin) #usual
tableStack(vars = c(AgeInDays, Type),
bloodbank, by = Origin,
drplvls = TRUE) # with drplvls

rm(total.score, mean.score, a, data, bloodbank)

Package 'ems'

Help Index

German Breast Cancer Dataset

Description

Usage

Format

Details

Source

Calibration Curve

Description

Usage

Arguments

Details

Author(s)

Examples

Collection of functions to check data quality in a dataset and remove not valid or extreme values.

Description

Usage

Arguments

Author(s)

See Also

Examples

Funnel plot for benchmarking health units

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Data from ICU admissions.

Description

Usage

Format

Details

Source

Miscellaneous functions for data editing

Description

Usage

Arguments

Author(s)

See Also

Examples

Mortality Rate

Description

Usage

Arguments

Author(s)

Examples

Comparisson of the Standardized Resource Use (SRU)

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Standardized Mortality Ratio (SMR)

Description

Usage

Arguments

Value

Author(s)

References

See Also

Examples

Standardized Resource Use (SRU)

Description

Usage

Arguments

Value

Author(s)

References

See Also

Examples

Survival models performance analysis

Description

Usage

Arguments