Title: | Epimed Solutions Collection for Data Editing, Analysis, and Benchmark of Health Units |
---|---|
Description: | Collection of functions related to benchmark with prediction models for data analysis and editing of clinical and epidemiological data. |
Authors: | Lunna Borges [aut, cre] |
Maintainer: | Lunna Borges <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.3.11 |
Built: | 2024-10-31 04:05:39 UTC |
Source: | https://github.com/cran/ems |
A dataset containing variables related with breast cancer from german population.
breastCancer
breastCancer
A data frame with 686 rows and 14 variables
age
Patient age (years).
meno
Menopausal status (0 = premeno, 1 = postmeno).
size
Tumour size (mm).
gradd1
1 = tumour grade 2 or 3; 0 = grade 1.
gradd2
1 = tumour grade 3; 0 = grade 1 or 2.
nodes
Number of positive lymph nodes.
enodes
emp(-0.12 x nodes
).
pgr
Progesterone receptor status (fmol 1^-1).
er
Oestrogen reeceotor status (fmol 1^-1).
hormon
Tamoxifen treatment (0 = no; 1 = yes).
rectime
Time (days) to death or cancer recurrence.
censrec
Censoring (0 = censored; 1 = event).
https://biostat.app.vumc.org/wiki/Main/DataSets
calcurve
function returns a data.frame containing the number of patients, the observed mortality rate and the predicted mortality rate for each category of the predicted mortality rate. If any other acute physiology score is given, the function will also return the mortality rate predicted by this score for each category.
calcurve( deaths, pred, score = NULL, name_score = "Saps3", other_score = NULL, name_other_score = NULL, categories_option = c("predicted", "score", "patients"), table = FALSE, plot = TRUE, title_label = "Calibration Curve", y1axis_label = "Patients (n)", y2axis_label = "Mortality Rate (%)", score_color = c("#cac7cc", "#ffc341", "#33cca3"), bar_color = "#1f77b4", points = c(19, 18, 17), cultureCode = "en-US", legend_inset = -0.7 ) ## S3 method for class 'calcurve' print(x, ...) ## S3 method for class 'calcurve' plot( x, ..., xlab = NULL, ylab2 = NULL, main = x$title_label, text = x$y2axis_label, ylab = x$y1axis_label, col = c(x$bar_color, x$score_color), pch = x$points, cultureCode = x$cultureCode, legend_inset = x$legend_inset )
calcurve( deaths, pred, score = NULL, name_score = "Saps3", other_score = NULL, name_other_score = NULL, categories_option = c("predicted", "score", "patients"), table = FALSE, plot = TRUE, title_label = "Calibration Curve", y1axis_label = "Patients (n)", y2axis_label = "Mortality Rate (%)", score_color = c("#cac7cc", "#ffc341", "#33cca3"), bar_color = "#1f77b4", points = c(19, 18, 17), cultureCode = "en-US", legend_inset = -0.7 ) ## S3 method for class 'calcurve' print(x, ...) ## S3 method for class 'calcurve' plot( x, ..., xlab = NULL, ylab2 = NULL, main = x$title_label, text = x$y2axis_label, ylab = x$y1axis_label, col = c(x$bar_color, x$score_color), pch = x$points, cultureCode = x$cultureCode, legend_inset = x$legend_inset )
deaths |
a numerical vector that only contains 0 and 1, indicating whether the patient was alive or dead, respectively. |
pred |
a numerical vector that contains the mortality rate predicted by the main score, in percentage, for each patient. |
score |
a numerical vector that contains the main score punctuation for each patient, or NULL. |
name_score |
a character string which determines the name of the main score. |
other_score |
a list of numerical vectors, where each vector contains the mortality rate predicted by other score, in percentage, for each patient, or NULL (the default). |
name_other_score |
if other_score variable is different from NULL, this argument must be a vector with the name(s) of the score(s) given. |
categories_option |
a character string which determines if the categories will refer to the main score or to the predicted mortality rate. Accepted values are 'predicted' (the default), 'score' or 'patients'. |
table |
logical; if |
plot |
logical; if |
title_label |
main title for |
y1axis_label , y2axis_label
|
labels of the main vertical axis and the secondary axis, respectively, for |
score_color |
a vector with the colors to be used in the score traces for |
bar_color |
color of the bars for |
points |
a vector with markers types of the scores for |
cultureCode |
a character specifying which language should be used for plot x axis title and legends. Possible values are 'pt-BR' and 'en-US'. The default is 'en-US'. |
legend_inset |
inset distance(s) from the margins as a fraction of the plot region when legend is placed by keyword. See |
x |
an object of class 'calcurve'. |
... |
further arguments passed to |
xlab |
label of the horizontal axis for |
ylab2 |
label of the secondary vertical axis for |
main |
main title for |
text |
label of the secondary vertical axis for |
ylab |
label of the main vertical axis for |
col |
character vector with the colors of the bars and score traces, in this order, for |
pch |
a vector with markers types of the scores for |
If categories_option = 'score'
, the categories will refer to the deciles of the main score punctuation. If categories_option = 'predicted'
, the categories will refer to fixed intervals of the predicted mortality rate. If categories_option = 'patients'
the categories will refer to the deciles of patients.
Camila Cardoso
# Loading the dataset data(icu) # Calibration Curve Plot a <- calcurve(deaths = icu$UnitDischargeName, pred = icu$Saps3DeathProbabilityStandardEquation, score = icu$Saps3Points, name_score = 'Saps3', categories_option = 'predicted', table = FALSE, plot = TRUE)
# Loading the dataset data(icu) # Calibration Curve Plot a <- calcurve(deaths = icu$UnitDischargeName, pred = icu$Saps3DeathProbabilityStandardEquation, score = icu$Saps3Points, name_score = 'Saps3', categories_option = 'predicted', table = FALSE, plot = TRUE)
These functions return the counts and fractions of expected values, unexpected values, missing values and not valid values. They are able to do it with factor variables, numeric variables and date variables. t_factor
, t_num
, and t_date
do the job for a single variable and have simpler arguments, while factor.table
, num.table
, and date.table
do the job for several variables at once. rm.unwanted
cheks the factor and numeric variables and remove the not valid or extreme values. This approach is attractive before data imputation. They all return a data.frame
.
t_factor
and factor.table
will try to get factor or character variables and check how much of its content match with the expectd. They will try to treat the levels or cells containing " " as NAs
.
t_num
will try to get a numeric variable (even if it is currently formated as character or factor) and check how much of its content is expected (match a desired range), unexpected, non-numeric values and missing vlaues. num.table
does the same, but for two or more variables at once.
t_date
will try to get a date variable (even if it is currently formated as character or factor) and check how much of its content is expected (match a desired range), unexpected, non-date values and missing vlaues. date.table
does the same, but for two or more variables at once.
rm.unwanted
will chek in data the variables specified in the limits object according to the limits specified for each variable. If there are levels considered not valid in a factor variable, these levels are deleted. For example, if Sex is expected to be "M" and "F", and there is also an "I" level in data, every "I" is replaced by NA
. Similarly, misspelled levels will be understood as non-valid levels and coercerd to NA
, with the exception of leading or trailing empty spaces and lower and upper cases diferences if try.keep = TRUE
. If there is a continuous numeric variable and it is expected to have values ranging from 30 to 700, the values outside this range, i.e. higher then 700 or lower then 30, are replaced by NA
. Non-numeric elements, i.e. non-valid elements that should be numeric, will also be coerced to NA
. If a varible is specified in num.limits
, then it will be returned as a numeric variable, even if it was formated as factor or character. If a variable is specified in limits, the returnig format will depend on the stringAsFactors
argument, unless it is formated as logical. In this case it is skipped. The arguments limits
and num.limits
may be NULL
, meaning that the factor-character variables or the numeric variables , respectively, will not be edited.
t_factor( data, variable, legal, var.labels = attr(data, "var.labels")[match(variable, names(data))], digits = 3 ) factor.table( data, limits, var.labels = attr(data, "var.labels")[match(unlist(sapply(seq_along(limits), function(i) limits[[i]][1])), names(data))], digits = 3 ) t_num( data, num.var, num.max = 100, num.min = 0, var.labels = attr(data, "var.labels")[match(num.var, names(data))], digits = 3 ) num.table( data, num.limits, var.labels = attr(data, "var.labels")[match(num.limits$num.var, names(data))], digits = 3 ) t_date( data, date.var, date.max = as.Date("2010-11-30"), date.min = as.Date("2010-01-31"), format.date = "auto", digits = 3, var.labels = attr(data, "var.labels")[match(date.var, names(data))] ) date.table( data, date.limits, format.date = "auto", digits = 3, var.labels = attr(data, "var.labels")[match(date.limits$date.var, names(data))] ) rm.unwanted( data, limits = NULL, num.limits = TRUE, try.keep = TRUE, stringAsFactors = TRUE )
t_factor( data, variable, legal, var.labels = attr(data, "var.labels")[match(variable, names(data))], digits = 3 ) factor.table( data, limits, var.labels = attr(data, "var.labels")[match(unlist(sapply(seq_along(limits), function(i) limits[[i]][1])), names(data))], digits = 3 ) t_num( data, num.var, num.max = 100, num.min = 0, var.labels = attr(data, "var.labels")[match(num.var, names(data))], digits = 3 ) num.table( data, num.limits, var.labels = attr(data, "var.labels")[match(num.limits$num.var, names(data))], digits = 3 ) t_date( data, date.var, date.max = as.Date("2010-11-30"), date.min = as.Date("2010-01-31"), format.date = "auto", digits = 3, var.labels = attr(data, "var.labels")[match(date.var, names(data))] ) date.table( data, date.limits, format.date = "auto", digits = 3, var.labels = attr(data, "var.labels")[match(date.limits$date.var, names(data))] ) rm.unwanted( data, limits = NULL, num.limits = TRUE, try.keep = TRUE, stringAsFactors = TRUE )
data |
A data.frame where variables will be tested. |
variable |
A character vector of length one, indicating the name of the variable in the dataset to be tested. |
legal |
A character vector representeing the expected levels of the tested variable. |
var.labels |
Variables labels for a nice output. Must be informed in the same order as variable argument. By default, it captures the labels stored in attr(data, "var.labels"), if any. If not informed, the function returns the variables names. |
digits |
Number of decimal places for rounding. |
limits |
a list of two or more lists, each containing the arguments variable name and legal levels (in this order), to check on the factor variables. In the case of |
num.var |
A character vector indicating the name of a variable that should be numeric (although it can yet be formated as character or factor). |
num.max , num.min
|
The maximal and minimal limits of acceptable range of a numeric variable. |
num.limits |
A data.frame with the following variables: num.var, num.max and num.min, representing the numeric variables names, maximal and minimal expected valid values. In the case of |
date.var |
A character vector indicating the name of a variable in data that should be a date (althoug it can yet be formated as character or factor). |
date.max , date.min
|
The maximal and minimal limits of acceptable range of a date variable. |
format.date |
Default is "auto". If so, |
date.limits |
A |
try.keep |
Default is |
stringAsFactors |
In |
Lunna Borges & Pedro Brasil
# Simulating a dataset with 5 factor variables and assigning labels y <- data.frame(Var1 = sample(c("Yes","No", "Ignored", "", "yes ", NA), 200, replace = TRUE), Var2 = sample(c("Death","Discharge", "", NA), 200, replace = TRUE), Var3 = sample(c(16:35, NA), 200, replace = TRUE), Var4 = sample(c(12:300, "Female", "", NA), 200, replace = TRUE), Var5 = sample(c(60:800), 200, replace = TRUE)) attr(y, "var.labels") <- c("Intervention use","Unit destination","BMI","Age","Cholesterol") summary(y) # Cheking the quality only the first variable t_factor(y, "Var1", c("Yes","No","Ignored")) # Checking two or more variables at once factor.limits = list(list("Var1",c("Yes","No")), list("Var2",c("Death","Discharge"))) factor.table(y, limits = factor.limits) # Checking only one variable that shohuld be numeric t_num(y,"Var3", num.min = 17, num.max = 32) # Making the limits data.frame num.limits <- data.frame(num.var = c("Var3","Var4","Var5"), num.min = c(17,18,70), num.max = c(32,110,300)) num.limits # Checking two or more numeric variables (or the ones that # should be as numeric) at once num.table(y, num.limits) # Removing the unwanted values (extremes or not valid). y <- rm.unwanted(data = y, limits = factor.limits, num.limits = num.limits) summary(y) rm(y, num.limits, factor.limits) #' # Loading a dataset and assigning labels data(icu) attr(icu, "var.labels")[match(c("UnitAdmissionDateTime","UnitDischargeDateTime", "HospitalAdmissionDate", "HospitalDischargeDate"), names(icu))] <- c("Unit admission","Unit discharge","Hospital admission","Hospital discharge") # Checking only one variable that should be a date. t_date(icu, "HospitalDischargeDate", date.max = as.Date("2013-10-30"), date.min = as.Date("2013-02-20")) # Checking a date variable misspecifying the date format # will cause the variable dates to be identified as non-date values. t_date(data = icu, date.var = "HospitalDischargeDate", date.max = as.Date("2013-10-30"), date.min = as.Date("2013-02-20"), format.date = "%d/%m/%Y") # Making a limit data.frame assuming an 'auto' format.date d.lim <- data.frame(date.var = c("UnitAdmissionDateTime","UnitDischargeDateTime", "HospitalAdmissionDate","HospitalDischargeDate"), date.min = rep(as.Date("2013-02-28"), 4), date.max = rep(as.Date("2013-11-30"), 4)) d.lim # Checking two or more date variables (or the ones that should be as date) at once date.table(data = icu, date.limits = d.lim) # Making a limit data.frame specifying format.date argument # Here the the last 'format.date' is missspecified on purpose # So, the last date will be identified as non-date values. d.lim <- data.frame(date.var = c("UnitAdmissionDateTime","UnitDischargeDateTime", "HospitalAdmissionDate","HospitalDischargeDate"), date.min = rep(as.Date("2013-02-28"), 4), date.max = rep(as.Date("2013-11-30"), 4), format.date = c(rep("%Y/%m/%d",3), "%Y-%m-%d")) d.lim # Checking the quality of date variable with new limits. # The 'format.date = ""' is required to force the function to look the format # into the date.limits data.frame date.table(data = icu, date.limits = d.lim, format.date = "") rm(icu, d.lim)
# Simulating a dataset with 5 factor variables and assigning labels y <- data.frame(Var1 = sample(c("Yes","No", "Ignored", "", "yes ", NA), 200, replace = TRUE), Var2 = sample(c("Death","Discharge", "", NA), 200, replace = TRUE), Var3 = sample(c(16:35, NA), 200, replace = TRUE), Var4 = sample(c(12:300, "Female", "", NA), 200, replace = TRUE), Var5 = sample(c(60:800), 200, replace = TRUE)) attr(y, "var.labels") <- c("Intervention use","Unit destination","BMI","Age","Cholesterol") summary(y) # Cheking the quality only the first variable t_factor(y, "Var1", c("Yes","No","Ignored")) # Checking two or more variables at once factor.limits = list(list("Var1",c("Yes","No")), list("Var2",c("Death","Discharge"))) factor.table(y, limits = factor.limits) # Checking only one variable that shohuld be numeric t_num(y,"Var3", num.min = 17, num.max = 32) # Making the limits data.frame num.limits <- data.frame(num.var = c("Var3","Var4","Var5"), num.min = c(17,18,70), num.max = c(32,110,300)) num.limits # Checking two or more numeric variables (or the ones that # should be as numeric) at once num.table(y, num.limits) # Removing the unwanted values (extremes or not valid). y <- rm.unwanted(data = y, limits = factor.limits, num.limits = num.limits) summary(y) rm(y, num.limits, factor.limits) #' # Loading a dataset and assigning labels data(icu) attr(icu, "var.labels")[match(c("UnitAdmissionDateTime","UnitDischargeDateTime", "HospitalAdmissionDate", "HospitalDischargeDate"), names(icu))] <- c("Unit admission","Unit discharge","Hospital admission","Hospital discharge") # Checking only one variable that should be a date. t_date(icu, "HospitalDischargeDate", date.max = as.Date("2013-10-30"), date.min = as.Date("2013-02-20")) # Checking a date variable misspecifying the date format # will cause the variable dates to be identified as non-date values. t_date(data = icu, date.var = "HospitalDischargeDate", date.max = as.Date("2013-10-30"), date.min = as.Date("2013-02-20"), format.date = "%d/%m/%Y") # Making a limit data.frame assuming an 'auto' format.date d.lim <- data.frame(date.var = c("UnitAdmissionDateTime","UnitDischargeDateTime", "HospitalAdmissionDate","HospitalDischargeDate"), date.min = rep(as.Date("2013-02-28"), 4), date.max = rep(as.Date("2013-11-30"), 4)) d.lim # Checking two or more date variables (or the ones that should be as date) at once date.table(data = icu, date.limits = d.lim) # Making a limit data.frame specifying format.date argument # Here the the last 'format.date' is missspecified on purpose # So, the last date will be identified as non-date values. d.lim <- data.frame(date.var = c("UnitAdmissionDateTime","UnitDischargeDateTime", "HospitalAdmissionDate","HospitalDischargeDate"), date.min = rep(as.Date("2013-02-28"), 4), date.max = rep(as.Date("2013-11-30"), 4), format.date = c(rep("%Y/%m/%d",3), "%Y-%m-%d")) d.lim # Checking the quality of date variable with new limits. # The 'format.date = ""' is required to force the function to look the format # into the date.limits data.frame date.table(data = icu, date.limits = d.lim, format.date = "") rm(icu, d.lim)
Produces a variety of funnel plots comparing health units or ICUs (intensive care units) making easy to identify those units which deviate from the group. There is a function that calculates all the values required and returns the values for all units and the funnel, and there is a function that calls graphical parameters from the former values. The options of funnels available are the funnel for rate, for ratio of rates, for proportions, for difference of proportions and for ratio of proportions.
The funnel for rates are usually plots of either SMR or SRU at vertical axis. If the direct method is chosen, the horizontal axis will display the number of admissions. If the indirect method is chosen instead, the expected number of deaths will be displayed for SMR or the expected length of stay for SRU. As consequence of this differentiation, the interpretation regarding the clssification of the points displayed will be the same in every case.
The funnel for ratio of rates are usually plots of ratios of SMRs (or SRUs) within the same units. These two SMRs are, for example, from the same units in different time periods. Therefore, it expresses how the SMR changed over time. If the number of expected deaths is different in both periods, the plot will return at the horizontal axis a parametrization of the geometric mean of the expected number of deaths in both periods for each unit. If the number of expected deaths is identical in both periods, the plot will return at the horizontal axis the arithmetic mean of the observed number of deaths in both periods for each unit.
The funnel for proportions plots on the vertical axis the percentage of observed deaths of the units and on the horizontal axis the number (volume) of admissions. The funnel for ratio of proportions and for difference of proportions are usually used to express the fraction of deaths of the same units in different time period. Therefore, they express how the fraction of deaths changed over time in each unit. If one picks the difference of proportions, the horizontal axis will display a parametrization of the arithmetic mean of the number of admissions in both periods. If one picks the ratio of proportions, the horizontal axis will display a parametrization of the geometric mean of the number of admissions in both periods.
funnel( unit, y, n, n1, n2, o, o1, o2, e, e1, e2, lambda1 = sum(o1)/sum(n1), lambda2 = sum(o2)/sum(n2), pi1 = sum(o1)/sum(n1), pi2 = sum(o2)/sum(n2), y.type = c("SMR", "SRU"), p = c(0.95, 0.998), theta, method = c("normal", "exact"), direct = FALSE, myunits = rep(0, length(unit)), option = c("rate", "ratioRates", "prop", "diffProp", "ratioProp"), printUnits = TRUE, plot = TRUE, digits = 5, overdispersion = FALSE, ... ) ## S3 method for class 'funnel' print(x, ...) ## S3 method for class 'funnel' plot( x, ..., col = c("darkblue", "paleturquoise3", "gray26"), lwd = 2, lty = c(2, 6, 1), bty = "n", pch = 21, pt.col = "white", bg = "orange", pt.cex = 1.5, auto.legend = TRUE, text.cex = 0.7, text.pos = NULL, mypts.col = "darkblue", printUnits = x$printUnits, xlab = x$xlab, ylab = x$ylab, xlim = x$xlim, ylim = x$ylim ) rateFunnel( unit, y, n, o, e, y.type, p, theta = 1, method = c("exact", "normal"), direct, ..., printUnits, auto.xlab = TRUE, xlab = c("Volume of cases", "Expected values"), ylab = y.type[1], xlim = c(0, max(rho)), ylim = c(min(lowerCI[[which(p == max(p))]]), max(upperCI[[which(p == max(p))]])), myunits, digits, overdispersion ) changeRateFunnel( unit, n1, n2, o1, e1, o2, e2, lambda1, lambda2, y.type, p, ..., printUnits, auto.xlab = TRUE, xlab = c("Average observed count", "Expectation per period"), auto.ylab = TRUE, ylab = c(paste0(y.type[1], "'s Ratio"), paste0("Log(", y.type[1], "'s Ratio)")), ylim = c(max(lowerCI[[which(p == max(p))]]) - 1.5 * theta, min(upperCI[[which(p == max(p))]]) + 1.5 * theta), xlim = c(0, max(rho)), myunits, digits, overdispersion ) propFunnel( unit, o, n, theta, p, method = c("exact", "normal"), ..., printUnits, ylab = "%", xlab = "Volume", ylim = c(0, min(upperCI[[which(p == max(p))]]) + 2.5 * theta), xlim = c(0, max(n)), myunits, digits, overdispersion ) changePropFunnel( unit, o1, o2, n1, n2, p, pi1, pi2, method = c("diff", "ratio"), ..., printUnits, xlab = "Sample size per period", auto.ylab = TRUE, ylab = c("Proportions difference", "Proportions ratio log"), ylim = c(max(lowerCI[[which(p == max(p))]]) - 6 * theta, min(upperCI[[which(p == max(p))]]) + 6 * theta), xlim = c(0, max(rho)), myunits, digits, overdispersion )
funnel( unit, y, n, n1, n2, o, o1, o2, e, e1, e2, lambda1 = sum(o1)/sum(n1), lambda2 = sum(o2)/sum(n2), pi1 = sum(o1)/sum(n1), pi2 = sum(o2)/sum(n2), y.type = c("SMR", "SRU"), p = c(0.95, 0.998), theta, method = c("normal", "exact"), direct = FALSE, myunits = rep(0, length(unit)), option = c("rate", "ratioRates", "prop", "diffProp", "ratioProp"), printUnits = TRUE, plot = TRUE, digits = 5, overdispersion = FALSE, ... ) ## S3 method for class 'funnel' print(x, ...) ## S3 method for class 'funnel' plot( x, ..., col = c("darkblue", "paleturquoise3", "gray26"), lwd = 2, lty = c(2, 6, 1), bty = "n", pch = 21, pt.col = "white", bg = "orange", pt.cex = 1.5, auto.legend = TRUE, text.cex = 0.7, text.pos = NULL, mypts.col = "darkblue", printUnits = x$printUnits, xlab = x$xlab, ylab = x$ylab, xlim = x$xlim, ylim = x$ylim ) rateFunnel( unit, y, n, o, e, y.type, p, theta = 1, method = c("exact", "normal"), direct, ..., printUnits, auto.xlab = TRUE, xlab = c("Volume of cases", "Expected values"), ylab = y.type[1], xlim = c(0, max(rho)), ylim = c(min(lowerCI[[which(p == max(p))]]), max(upperCI[[which(p == max(p))]])), myunits, digits, overdispersion ) changeRateFunnel( unit, n1, n2, o1, e1, o2, e2, lambda1, lambda2, y.type, p, ..., printUnits, auto.xlab = TRUE, xlab = c("Average observed count", "Expectation per period"), auto.ylab = TRUE, ylab = c(paste0(y.type[1], "'s Ratio"), paste0("Log(", y.type[1], "'s Ratio)")), ylim = c(max(lowerCI[[which(p == max(p))]]) - 1.5 * theta, min(upperCI[[which(p == max(p))]]) + 1.5 * theta), xlim = c(0, max(rho)), myunits, digits, overdispersion ) propFunnel( unit, o, n, theta, p, method = c("exact", "normal"), ..., printUnits, ylab = "%", xlab = "Volume", ylim = c(0, min(upperCI[[which(p == max(p))]]) + 2.5 * theta), xlim = c(0, max(n)), myunits, digits, overdispersion ) changePropFunnel( unit, o1, o2, n1, n2, p, pi1, pi2, method = c("diff", "ratio"), ..., printUnits, xlab = "Sample size per period", auto.ylab = TRUE, ylab = c("Proportions difference", "Proportions ratio log"), ylim = c(max(lowerCI[[which(p == max(p))]]) - 6 * theta, min(upperCI[[which(p == max(p))]]) + 6 * theta), xlim = c(0, max(rho)), myunits, digits, overdispersion )
unit |
A factor vector representing the unit names. |
y |
A numeric vector representing the "Standardized rate" for each unit, usually the SMR (Standardized Mortality Ratio), or possibly the SRU (Standardized Resource Use), accordind to |
n |
A numeric vector representing the case volume, or number of admissions, for each unit. |
n1 , n2
|
If one picks |
o |
A numeric vector representing the observed death. Acceptable values are 0 (absence) or 1 (presence). |
o1 , o2
|
If one picks |
e |
Used only when |
e1 , e2
|
If one picks |
lambda1 , lambda2
|
Values correponding to the rate at which a death occurs in the institutions at the 1st and 2nd periods, respectively. It is assumed that the parameters |
pi1 , pi2
|
Values correponding to the probability for the occurrence of a death in the institutions at the 1st and 2nd periods, respectively. Its assumed that the paramenters |
y.type |
A character vector representing the indicator type. It is used to name the vertical axis if |
p |
A confidence level numeric vector. The function will return a confidence interval for each value in p. The default is 2 and 3 standard deviations ( |
theta |
The target value which specifies the desired expectation for institutions considered "in control". Used when |
method |
There are two kinds of approximations for the CI, as mentioned in |
direct |
Logical (default = |
myunits |
A numeric vector coded with 0 and 1 indicating which units one would like to benchmark among all units. These will be highlighted with dots of different collors in the plot. |
option |
A character specifying the type of funnel plot one wants to produce. It can assume |
printUnits |
Logical (default = |
plot |
Logical; If |
digits |
Integer indicating the number of decimals to be used in the output. |
overdispersion |
Logical (default = FALSE); If TRUE, introduces an multiplicative over-dispersion factor phi that will inflate the CI null variance. See details. |
... |
Further arguments passed to |
x |
An object of class 'funnel'. |
col |
A character vector representing the colors for the CI funnel lines. Must have same length of |
lwd |
A positive number specifying the lines width. It's the same for all lines in the plot. See |
lty |
A numeric vector representing the CI lines types. See |
bty |
A character string which represents the type of |
pch |
Either an integer or a single character specifying a symbol to be used as the default in plotting points. See |
pt.col |
A character specifying the points colors. |
bg |
A character specifying the color to be used for the points background when |
pt.cex |
A numerical value giving the amount by which plotting points should be magnified relative to the default. See |
auto.legend |
Logical; If |
text.cex |
A numerical value giving the amount by which plotting text should be magnified relative to the default. See |
text.pos |
A position specifier for numbers that correspond to the units in the plot. Values of 1, 2, 3 and 4, respectively indicate positions below, to the left of, above and to the right of the points. |
mypts.col |
A character representing the color used to benchmark the units specified in |
xlab , ylab
|
A title for the x and y axis. See |
xlim , ylim
|
Limits of horizontal and vertical axis. These limits are defined in the funnel plot and passed to |
auto.xlab , auto.ylab
|
Logical. If |
For every possible value of option
, if overdispersion = TRUE
, the CI can be inflated by a overdispersion parameter phi. There is a test for overdispersion which inflates the funnel if it's necessary. An "Winsorized" over-dispersion parameter is estimated and is used to inflate the funnel limits if it is significantly greater than 1. The parameter phi is returned as an funnel object.
If option = "rate"
, funnel
plots a standardized rate y versus the expected number of deaths or volume value for several units.
To choose the direct
argument, one should pay attention if one wants to use a Direct or Indirect Standardized Rate. If direct, we assume the rate is reported as a rate per (say) 1000 individuals, then it is treated as a proportion. If indirect, we assume it is a cross-sectional data that leads to a standardized event ratio.
In many circumstances we can assume an exact or approximate normal distribution for the data. Using the method
argument, one could choose between "exact"
or "normal"
. For direct standardized rates, the exact distribuition is binomial and for indirect standardized rates, the exact distribuition is poisson. Assume rho is the precision parameter (volume, for direct rates; expected value, for indirect rates). The original report claims that, for rho > 100, the normal and exact curves almost coincide. So, one could perfectly use normal approximation if ones data parameter precision is greater than 100, in general.
The console warns if there are units with volume/expected value less than 100.
phi = (1/total) * sum((y - theta) ^ 2 * rho)/g(theta)
var(y|theta,rho) = (phi * g(theta))/rho
If option = "ratioRate"
, funnel
can be used to compare units at two diferent periods. It plots a ratio of rates y versus a precision parameter rho.
Suppose we have two measures for each institution: O1; E1 in a baseline period and O2; E2 in a subsequent period, and we wish to assess the change in the underlying rate (SMR or SRU). We shall only consider the ratio of rates option. The exact method will automatically be applied if E1 = E2, and the indirect method, of normal approximations, otherwise. On this second method, for low (especially zero) counts the funnel
function adds 0.5 to all parameters O and E in order to stabilize the estimates.
Y = (O1/E1)/(O2/E2) and the target theta = lambda2/lambda1.
When E1 = E2, y is plotted versus the average observed count (rho).
When E1 is different of E2, i.e., it is used normal approximation. It is convenient to work on a logarithmic scale so that log(theta) is a target for log(Y). Y is plotted versus a different rho depending on the chosen rate.
If option = "prop"
, funnel
plots a proportion y versus its volume.
It is used for cross-sectional data. Suppose in each institution that O events are observed out of a sample size of N:
The indicator is the observed proportion y = O/N
Assume N
is the precision parameter (volume). Similarly to when option = "rate"
, for N
> 100 the normal and exact curves almost coincide. So, one could perfectly use normal approximation on the parameter method
if ones data parameter precision is greater than 100, in general.
phi = (1/total) * sum((y - theta) ^ 2 * N)/g(theta)
var(y|theta,N) = (phi * g(theta))/N
If option = "ratioProp"
or option = "diffProp"
, funnel
can be used to compare units at two diferent periods. It plots a ratio (or difference) of proportions y versus a precision parameter rho to assess the change in the underlying proportion from pi1 to pi2. Normal approximations are used throughout, and for low (especially zero) counts, the function adds 0.5 to all arguments r and 1 to all arguments n in order to stabilize the estimates.
In the case option = "diffProp"
, the indicator is Y = (O2/N2 - O1/N1) and theta = pi2 - pi1. If option = "ratioProp"
, the indicator is Y = (O2/N2)/(O1/N1) and theta = pi2/pi1. It is convenient to work on a logarithmic scale, so that log(theta) is a target for log(Y) in this case as well.
For these two parameter options, the precision parameter (plotted at horizontal axis) can be interpreted as approximately the sample size per period.
A table with unit names, y, observed (Obs), expected (Exp) and admissions (N) for each unit, a binary column showing which units one would like to highlight in the plot (myunits) and final columns show which units are out of control.
Spiegelhalter, David J. "Funnel plots for comparing institutional performance." Statistics in medicine 24.8 (2005): 1185-1202.
# Loading data data(icu) # Some edition icu$Saps3DeathProbabilityStandardEquation <- icu$Saps3DeathProbabilityStandardEquation / 100 icu <- icu[-which(icu$Unit == "F"),] icu$myunits <- ifelse(icu$Unit == "A",1,0) #my units icu <- droplevels(icu) # Getting the cross-sectional arguments to use in funnel x <- SMR.table(data = icu, group.var = "Unit", obs.var = "UnitDischargeName", pred.var = "Saps3DeathProbabilityStandardEquation") myunit_names <- unique(icu$Unit[which(icu$myunits == 1)]) x$myunits <- ifelse(x$Levels %in% myunit_names, 1,0) # Analysis of proportions f1 <- funnel(unit = x$Levels[-1], o = x[-1,]$Observed, theta = x$Observed[1] / x$N[1], n = x[-1,]$N, method = "exact", myunits = x$myunits[-1], option = "prop", plot = FALSE) f1 plot(f1, main = "Cross-sectional proportions") # To analyze rates (SMR) f2 <- funnel(unit = x$Levels[-1], y = x[-1,]$SMR, method = "exact", direct = TRUE, theta = x$SMR[1], e = x[-1,]$Expected, n = x[-1,]$N, o = x[-1,]$Observed, option = "rate", plot = FALSE) f2 plot(f2, main = "Cross-sectional rate (SMR)") # Creating a variable containing month information about each admission icu$month <- as.numeric(format(as.Date(icu$UnitAdmissionDateTime),"%m")) # First quarter dt1 <- icu[which(icu$month %in% c(1,2,3)),] # Second quarter dt2 <- icu[which(icu$month %in% c(4,5,6)),] # Getting the two period arguments to use in funnel z <- SMR.table(data = dt1, group.var = "Unit", obs.var = "UnitDischargeName", pred.var = "Saps3DeathProbabilityStandardEquation") w <- SMR.table(data = dt2, group.var = "Unit", obs.var = "UnitDischargeName", pred.var = "Saps3DeathProbabilityStandardEquation") z$myunits <- ifelse(z$Levels %in% myunit_names, 1,0) w$myunits <- ifelse(w$Levels %in% myunit_names, 1,0) # To analyze periods using ratio rates with e1 = e1 f3 <- funnel(unit = z$Levels[-1], n1 = z$N[-1], o1 = z$Observed[-1], e1 = z$Expected[-1], n2 = w$N[-1], o2 = w$Observed[-1], e2 = z$Expected[-1], myunits = z$myunits[-1], option = "ratioRates", plot = FALSE) f3 plot(f3, main = "Ratio of SMRs of periods with same expectation of death") # To analyze periods using ratio rates with e1 =! e1 f4 <- funnel(unit <- z$Levels[-1], n1 = z$N[-1], o1 = z$Observed[-1], e1 = z$Expected[-1], n2 = w$N[-1], o2 = w$Observed[-1], e2 = w$Expected[-1], option = "ratioRates", plot = FALSE) f4 plot(f4, main = "Ratio of SMRs of periods with different expectation of death", ylim = c(-1.5,1.5), xlim = c(0,200)) # To analyze periods by difference in proportions f5 <- funnel(unit <- z$Levels[-1], n1 = z$N[-1], o1 = z$Observed[-1], n2 = w$N[-1], o2 = w$Observed[-1], option = "diffProp", plot = FALSE) f5 plot(f5, main = "Difference in proportions of death for two periods") # To analyze periods by ratio of proportions f6 <- funnel(unit <- z$Levels[-1], n1 = z$N[-1], o1 = z$Observed[-1], n2 = w$N[-1], o2 = w$Observed[-1], option = "ratioProp", plot = FALSE) f6 plot(f6, main = "Ratio of proportions of death for two periods") rm(icu, x, z, w, dt1, dt2, unit, f1, f2, f3, f4, f5, f6)
# Loading data data(icu) # Some edition icu$Saps3DeathProbabilityStandardEquation <- icu$Saps3DeathProbabilityStandardEquation / 100 icu <- icu[-which(icu$Unit == "F"),] icu$myunits <- ifelse(icu$Unit == "A",1,0) #my units icu <- droplevels(icu) # Getting the cross-sectional arguments to use in funnel x <- SMR.table(data = icu, group.var = "Unit", obs.var = "UnitDischargeName", pred.var = "Saps3DeathProbabilityStandardEquation") myunit_names <- unique(icu$Unit[which(icu$myunits == 1)]) x$myunits <- ifelse(x$Levels %in% myunit_names, 1,0) # Analysis of proportions f1 <- funnel(unit = x$Levels[-1], o = x[-1,]$Observed, theta = x$Observed[1] / x$N[1], n = x[-1,]$N, method = "exact", myunits = x$myunits[-1], option = "prop", plot = FALSE) f1 plot(f1, main = "Cross-sectional proportions") # To analyze rates (SMR) f2 <- funnel(unit = x$Levels[-1], y = x[-1,]$SMR, method = "exact", direct = TRUE, theta = x$SMR[1], e = x[-1,]$Expected, n = x[-1,]$N, o = x[-1,]$Observed, option = "rate", plot = FALSE) f2 plot(f2, main = "Cross-sectional rate (SMR)") # Creating a variable containing month information about each admission icu$month <- as.numeric(format(as.Date(icu$UnitAdmissionDateTime),"%m")) # First quarter dt1 <- icu[which(icu$month %in% c(1,2,3)),] # Second quarter dt2 <- icu[which(icu$month %in% c(4,5,6)),] # Getting the two period arguments to use in funnel z <- SMR.table(data = dt1, group.var = "Unit", obs.var = "UnitDischargeName", pred.var = "Saps3DeathProbabilityStandardEquation") w <- SMR.table(data = dt2, group.var = "Unit", obs.var = "UnitDischargeName", pred.var = "Saps3DeathProbabilityStandardEquation") z$myunits <- ifelse(z$Levels %in% myunit_names, 1,0) w$myunits <- ifelse(w$Levels %in% myunit_names, 1,0) # To analyze periods using ratio rates with e1 = e1 f3 <- funnel(unit = z$Levels[-1], n1 = z$N[-1], o1 = z$Observed[-1], e1 = z$Expected[-1], n2 = w$N[-1], o2 = w$Observed[-1], e2 = z$Expected[-1], myunits = z$myunits[-1], option = "ratioRates", plot = FALSE) f3 plot(f3, main = "Ratio of SMRs of periods with same expectation of death") # To analyze periods using ratio rates with e1 =! e1 f4 <- funnel(unit <- z$Levels[-1], n1 = z$N[-1], o1 = z$Observed[-1], e1 = z$Expected[-1], n2 = w$N[-1], o2 = w$Observed[-1], e2 = w$Expected[-1], option = "ratioRates", plot = FALSE) f4 plot(f4, main = "Ratio of SMRs of periods with different expectation of death", ylim = c(-1.5,1.5), xlim = c(0,200)) # To analyze periods by difference in proportions f5 <- funnel(unit <- z$Levels[-1], n1 = z$N[-1], o1 = z$Observed[-1], n2 = w$N[-1], o2 = w$Observed[-1], option = "diffProp", plot = FALSE) f5 plot(f5, main = "Difference in proportions of death for two periods") # To analyze periods by ratio of proportions f6 <- funnel(unit <- z$Levels[-1], n1 = z$N[-1], o1 = z$Observed[-1], n2 = w$N[-1], o2 = w$Observed[-1], option = "ratioProp", plot = FALSE) f6 plot(f6, main = "Ratio of proportions of death for two periods") rm(icu, x, z, w, dt1, dt2, unit, f1, f2, f3, f4, f5, f6)
A dataset containing selected data from some ICU (intensive care units) admissions and its outcomes at the year 2013 used in the ORCHESTRA study.
icu
icu
A data frame with 13709 rows and 24 variables
Unit
The name of the ICU unit.
Age
Patient age.
UnitAdmissionDateTime
ICU unit admission date and time.
UnitDischargeDateTime
ICU unit discharge date and time.
UnitDischargeName
Unit admission outcome. Death = 1, or Discharge = 0.
UnitDestinationName
ICU unit destination after discharge.
HospitalAdmissionDate
Hospital admission date.
HospitalDischargeDate
Hospital discharge date.
HospitalDischargeName
Hospital admission outcome. Death = 1, Discharge = 0.
LengthHospitalStayPriorUnitAdmission
Hospital length of stay before unit admission.
AdmissionSourceName
The origin of the patient before ICU admission.
AdmissionTypeName_pri
Admission as Clinical treatment (1) Elective surgery (2), or Urgent surgery (3).
AdmissionReasonName_pri
Main diagnosis groups.
Vasopressors_D1
Vasopressors use at ICU first day admission? No = No, Yes = 1
IsMechanicalVentilation1h
Required mechanichal ventilation at 1st hour of admission. No = 0, Yes = 1.
CharlsonComorbidityIndex
Charlson comorbidity index.
Saps3Points
SAPS 3 score
Saps3DeathProbabilityStandardEquation
SAPS 3 estimated probability
SofaScore
SOFA score.
los
Unit length of stay, that is, the time difference between UnitAdmissionDateTime
and UnitDischargeDateTime
, in days.
Organizational characteristics, outcomes, and resource use in 78 Brazilian intensive care units: the ORCHESTRA study. Intensive Care Med. 2015 Dec;41(12):2149-60.
Collection of functions for data editing, usually used as lower levels for other functions.
f.num
is a wrapper to format numeric variables that are stored as character or factor, simultaneously it will try to detect comma spearated and replace it by dots before formating the variable as numeric. Any non-numeric encoding will be coerced to NA.
f.date
is a wrapper either to as.Date
or strptime
to format character or factor variables into dates. In Epimed Solutions database there are a few pre-specified formats that f.date
will try to detect and return a formated date. f.date
will try to dected if more than half of the elements in a vector have a pre-specified format. If so, the remaining will be coerced to NA if they have different format from the detected. See example.
remove.na
identifies all the empty spaces, i.e. the " " cells, of character or factor variables in a data.frame and returns the same data.frame with these empty cells replaced, by default, by NAs. It does not matter the length of the empty spaces. Also, remove.na
trims the leading and trailing empty spaces from all character and factor variables. It does not format the numeric variables. It may also return at the console a few information about the " " fields.
tab2tex
removes the empty rows, and also tunrs the rownames of a table epiDisplay::tableStack into the first column, to make it easier to paste the table into a rtf or latex document without empty rows or rownames conflicts.
trunc_num
truncates a numeric vector by replacing the values below the min value or above the max values by the min and max values respectively or optionall to NA. See example.
dummy.columns
takes a data.frame
with one column with concatatenated levels of a factor (or character) variable and return a data.frame
with additional columns with zeros and ones (dummy values), which names are the factor levels of the original column. See example below. rm.dummy.columns
is an internal function of dummy.columns
that deletes the new dummy columns which have less then a specified minimum events.
funnelEstimate
estimates funnel confidence intervals (CI) for binomial, poisson or normal distribution. Used inside funnel
.
winsorising
is an internal function that estimates a phi parameter after shirinking extreme z-scores. This parameter is used to inflate funnel CIs due overdispersion presence.
f.num(num.var) f.date(date) remove.na(data, replace = NA, console.output = TRUE) tab2tex(x, nc = ncol(x)) trunc_num(x, min, max, toNA = FALSE) dummy.columns( data, original.column, factors, scan.oc = FALSE, sep = ",", colnames.add = "Dummy.", min.events = NULL, rm.oc = FALSE, warn = FALSE, return.factor = TRUE ) rm.dummy.columns(data, colnames, event = "1", min.events = 50, warn = FALSE) funnelEstimate( y, range, u, totalAdmissions, totalObserved, p = 0.95, theta = 1, overdispersion = TRUE, dist = c("binomial", "normal", "poisson"), rho, gdetheta ) winsorising(z_score, u)
f.num(num.var) f.date(date) remove.na(data, replace = NA, console.output = TRUE) tab2tex(x, nc = ncol(x)) trunc_num(x, min, max, toNA = FALSE) dummy.columns( data, original.column, factors, scan.oc = FALSE, sep = ",", colnames.add = "Dummy.", min.events = NULL, rm.oc = FALSE, warn = FALSE, return.factor = TRUE ) rm.dummy.columns(data, colnames, event = "1", min.events = 50, warn = FALSE) funnelEstimate( y, range, u, totalAdmissions, totalObserved, p = 0.95, theta = 1, overdispersion = TRUE, dist = c("binomial", "normal", "poisson"), rho, gdetheta ) winsorising(z_score, u)
num.var |
A character, or factor variable to be formated as numeric. |
date |
A character or factor variable to be formated as date. |
data |
A |
replace |
By default, NA. But could be any vector of length 1. |
console.output |
Logical. Print at the console a few informations about the " " fields? |
x , nc
|
For |
min , max
|
For |
toNA |
For |
original.column |
A character vector representing the name of the column the be transformed in dummy variables. |
factors |
A character vector to make new dummy columns and to match values in |
scan.oc |
Default = FALSE, if TRUE, |
sep |
A character of legth one that systematically split the factors in the original columns. It wil be passed to the |
colnames.add |
The default is '= "Dummy_"'. This is a character vector of length one to stick in the |
min.events |
Either |
rm.oc |
Default is |
warn |
Default is |
return.factor |
Default is |
colnames |
For |
event |
A character string to be detected as a event. In |
y |
A numeric vector representing the "Standardized rate" for each unit, usually the SMR or possibly the SRU , accordind to |
range |
A numeric range representing for which values the funnel will be estimated. Usually the same variable in x axis (the precision parameter). |
u |
A number indicating the total amount of ICUs in the data. |
totalAdmissions |
The quantity of admissions in all units. |
totalObserved |
The quantity of observed death in all units. |
p |
A number between 0 and 1 indicating the confidence interval level for the funnel. |
theta |
Target value which specifies the desired expectation for institutions considered "in control". |
overdispersion |
Logical (default = FALSE); If TRUE, introduces an multiplicative over-dispersion factor phi that will inflate the CI null variance. See |
dist |
A character specifying the distribution about the funnel control limits will be estimated. It can be "binomial" (default), "normal" or "poisson". |
rho |
A numeric vector representing the funnel precision parameter. It is calculated inside |
gdetheta |
A numeric auxiliary numeric vector used to calculate |
z_score |
A numeric vector indicating the standardized Pearson residual or the "naive" Z-Score for each unit. |
Lunna Borges & Pedro Brasil
# Formating character or factor variable that should be numeric variables f.num(c("2,4000","10,0000","5.0400")) # Simulating a dataset y <- data.frame(v1 = sample(c(" F","M "," "), 10, replace = TRUE), v2 = sample(c(1:3," "), 10, replace = TRUE), v3 = sample(c("Alive","Dead",""), 10, replace = TRUE)) y # Replacing the "" cells by NA y <- remove.na(y) y rm(y) # Formating dates x <- f.date(c("28/02/2013","16/07/1998","31/03/2010")) x class(x) # The first element (i.e., the different one) is coerced to NA x <- f.date(c("2013-02-28 12:40","16/07/1998","31/03/2010")) x class(x) # The last element (i.e. the different one) is coerced to NA x <- f.date(c("2013-02-28 12:40","1998-07-16 18:50","31/03/2010")) x class(x) # Truncating numeric vectors trunc_num(1:12, min = 3, max = 10) # Truncating numeric vectors but returning NAs instead trunc_num(1:12, min = 3, max = 10, toNA = TRUE) # Simulating a dataset for dummy.columns example y <- data.frame(v1 = 1:20, v2 = sapply(1:20, function(i) toString(sample(c("Code1","Code2","Code3","Code4"), size = sample(2:4, 1), replace = FALSE)))) y # For a few of the codes in the original column y <- dummy.columns(y, original.column = "v2", factor = c("Code2","Code3")) y # For all codes in the original column y <- dummy.columns(y[, 1:2], original.column = "v2", scan.oc = TRUE) y # Funnel Estimate data(icu) icu funnelEstimate(y = icu$Saps3DeathProbabilityStandardEquation, range = 1, u = length(unique(icu$Unit)), totalAdmissions = nrow(icu), totalObserved = sum(icu$UnitDischargeName), theta = mean(icu$Saps3DeathProbabilityStandardEquation), dist = 'normal', rho = 1, gdetheta = 1) rm(y, icu)
# Formating character or factor variable that should be numeric variables f.num(c("2,4000","10,0000","5.0400")) # Simulating a dataset y <- data.frame(v1 = sample(c(" F","M "," "), 10, replace = TRUE), v2 = sample(c(1:3," "), 10, replace = TRUE), v3 = sample(c("Alive","Dead",""), 10, replace = TRUE)) y # Replacing the "" cells by NA y <- remove.na(y) y rm(y) # Formating dates x <- f.date(c("28/02/2013","16/07/1998","31/03/2010")) x class(x) # The first element (i.e., the different one) is coerced to NA x <- f.date(c("2013-02-28 12:40","16/07/1998","31/03/2010")) x class(x) # The last element (i.e. the different one) is coerced to NA x <- f.date(c("2013-02-28 12:40","1998-07-16 18:50","31/03/2010")) x class(x) # Truncating numeric vectors trunc_num(1:12, min = 3, max = 10) # Truncating numeric vectors but returning NAs instead trunc_num(1:12, min = 3, max = 10, toNA = TRUE) # Simulating a dataset for dummy.columns example y <- data.frame(v1 = 1:20, v2 = sapply(1:20, function(i) toString(sample(c("Code1","Code2","Code3","Code4"), size = sample(2:4, 1), replace = FALSE)))) y # For a few of the codes in the original column y <- dummy.columns(y, original.column = "v2", factor = c("Code2","Code3")) y # For all codes in the original column y <- dummy.columns(y[, 1:2], original.column = "v2", scan.oc = TRUE) y # Funnel Estimate data(icu) icu funnelEstimate(y = icu$Saps3DeathProbabilityStandardEquation, range = 1, u = length(unique(icu$Unit)), totalAdmissions = nrow(icu), totalObserved = sum(icu$UnitDischargeName), theta = mean(icu$Saps3DeathProbabilityStandardEquation), dist = 'normal', rho = 1, gdetheta = 1) rm(y, icu)
mortality_rate
function returns a list with the mortality rate and the number of patients for each month or quarter of the year.
mortality_rate( deaths, period = NULL, isQuarter = FALSE, isYear = FALSE, option = c("both", "monthly", "quarterly", "annual"), periodName = NULL, default_tapply = NA )
mortality_rate( deaths, period = NULL, isQuarter = FALSE, isYear = FALSE, option = c("both", "monthly", "quarterly", "annual"), periodName = NULL, default_tapply = NA )
deaths |
a numerical vector that only contains 0 and 1, indicating whether the patient was alive or dead, respectively. |
period |
a numerical vector that contains the order of months when the patients were admitted to the hospital unit. If period variable is NULL (the default), the function will return a single mortality rate. |
isQuarter |
logical indicating whether the |
isYear |
logical indicating whether the |
option |
a character string which determines what the function mortality_rate returns. If the option is chosen to be 'both' (the default), the function will return a list containing monthly mortality rate, quarterly mortality rate, annual mortality rate and the number of patients in each month, quarter and year. If the option is 'monthly', only the monthly mortality rate and the number of patients in each month are returned. If the option is 'quarterly', only the quarterly mortality rate and the number of patients in each quarter are returned. If the option is 'annual', only the annual mortality rate and the number of patients in each year are returned. |
periodName |
a character vector that contains the name of months when the patients were admitted to de hospital unit. Used only if period is not NULL. |
default_tapply |
argument to set the |
Camila Cardoso <[email protected]> Lunna Borges <[email protected]>
# Loading the dataset data(icu) # Creating a vector of months date <- as.Date(icu$UnitDischargeDateTime, tryFormats = '%d/%m/%Y') months <- as.numeric(format(date, '%m')) # Vector of deaths deaths <- icu$UnitDischargeName # Calculating monthly and quarterly mortality rate mortality_rate(deaths = deaths, period = months, option = 'both')
# Loading the dataset data(icu) # Creating a vector of months date <- as.Date(icu$UnitDischargeDateTime, tryFormats = '%d/%m/%Y') months <- as.numeric(format(date, '%m')) # Vector of deaths deaths <- icu$UnitDischargeName # Calculating monthly and quarterly mortality rate mortality_rate(deaths = deaths, period = months, option = 'both')
Compares ICU's (intensive care units) SRU with diferent severity classes or compares ICU's SRU at two diferents times. This comparison checks if the ICUs remains in the same quadrant after a time period, and highlights their rank changes over time.
plot.reclass
Plots a SMR vs. SRU scatter plot with the ICUs which had their quadrant/rank classification changed.
print.reclass
Prints a table with information about which ICUs changed from a classification to another.
reclass( x, y, same = TRUE, plot = FALSE, digits = 2, compare = c("SRU", "SMR", "BOTH"), decreasing = FALSE, complete.rank = TRUE ) ## S3 method for class 'reclass' print(x, ...) ## S3 method for class 'reclass' plot( x, ..., xlim_x = range(x$smr_x), ylim_x = range(x$sru_x), xlim_y = range(x$smr_y), ylim_y = range(x$sru_y), xlab = "SMR", ylab = "SRU", points.arg_x = list(pch = 21, col = "white", bg = "yellow", cex = 2), points.arg_y = list(pch = 21, col = "white", bg = "yellow", cex = 2), med.arg_x = list(col = "dodgerblue4", lwd = 2, lty = 1), med.arg_y = list(col = "dodgerblue4", lwd = 2, lty = 1), tert.arg_x = list(col = "darkorange2", lty = 2, lwd = 1), tert.arg_y = list(col = "darkorange2", lty = 2, lwd = 1), text.arg_x = list(labels = seq(1, nrow(x$tab)), cex = 0.6), text.arg_y = list(labels = seq(1, nrow(x$tab)), cex = 0.6), worse.arg_x = list(x = x$worse_x, pch = 21, col = "white", bg = "tomato", cex = 2), worse.arg_y = list(x = x$worse_y, pch = 21, col = "white", bg = "tomato", cex = 2), better.arg_x = list(x = x$better_x, pch = 21, col = "white", bg = "mediumseagreen", cex = 2), better.arg_y = list(x = x$better_y, pch = 21, col = "white", bg = "mediumseagreen", cex = 2), auto.legend = TRUE, leg.arg = list(x = "topleft", bty = "n", xpd = NA, inset = c(-1.8, -0.2), ncol = 1, horiz = F, pch = 19, cex = 0.8, pt.cex = 1.5), main.arg_x = list(main = "1st Stage"), main.arg_y = list(main = "2nd Stage") )
reclass( x, y, same = TRUE, plot = FALSE, digits = 2, compare = c("SRU", "SMR", "BOTH"), decreasing = FALSE, complete.rank = TRUE ) ## S3 method for class 'reclass' print(x, ...) ## S3 method for class 'reclass' plot( x, ..., xlim_x = range(x$smr_x), ylim_x = range(x$sru_x), xlim_y = range(x$smr_y), ylim_y = range(x$sru_y), xlab = "SMR", ylab = "SRU", points.arg_x = list(pch = 21, col = "white", bg = "yellow", cex = 2), points.arg_y = list(pch = 21, col = "white", bg = "yellow", cex = 2), med.arg_x = list(col = "dodgerblue4", lwd = 2, lty = 1), med.arg_y = list(col = "dodgerblue4", lwd = 2, lty = 1), tert.arg_x = list(col = "darkorange2", lty = 2, lwd = 1), tert.arg_y = list(col = "darkorange2", lty = 2, lwd = 1), text.arg_x = list(labels = seq(1, nrow(x$tab)), cex = 0.6), text.arg_y = list(labels = seq(1, nrow(x$tab)), cex = 0.6), worse.arg_x = list(x = x$worse_x, pch = 21, col = "white", bg = "tomato", cex = 2), worse.arg_y = list(x = x$worse_y, pch = 21, col = "white", bg = "tomato", cex = 2), better.arg_x = list(x = x$better_x, pch = 21, col = "white", bg = "mediumseagreen", cex = 2), better.arg_y = list(x = x$better_y, pch = 21, col = "white", bg = "mediumseagreen", cex = 2), auto.legend = TRUE, leg.arg = list(x = "topleft", bty = "n", xpd = NA, inset = c(-1.8, -0.2), ncol = 1, horiz = F, pch = 19, cex = 0.8, pt.cex = 1.5), main.arg_x = list(main = "1st Stage"), main.arg_y = list(main = "2nd Stage") )
x , y
|
Objects of class 'SRU'. x is the SRU analsys from the 1st period (e.g. first trimester) and y from the 2nd period (e.g. second trimester). For |
same |
Logical; If |
plot |
Logical. If |
digits |
Integer indicating the number of decimal places to be used in the output. |
compare |
The way one prefers to benchmark the ICUs: by "SRU" (default), "SMR" or "BOTH". If "BOTH", the ICUs will be ranked by their SRU. |
decreasing |
Logical. Should the sort order of ICU's rank be increasing or decreasing? |
complete.rank |
Logical. If |
... |
Arguments to be passed to methods (see |
xlim_x , ylim_x
|
Limits for x and y axis for 1st stage plot for |
xlim_y , ylim_y
|
Limits for x and y axis for 2nd stage plot for |
xlab , ylab
|
Labels of x and y axis for |
points.arg_x , points.arg_y
|
List of arguments passed to |
med.arg_x , med.arg_y
|
List of arguments passed to |
tert.arg_x , tert.arg_y
|
List of arguments passed to |
text.arg_x , text.arg_y
|
List of arguments passed to |
worse.arg_x , worse.arg_y
|
List of arguments passed to |
better.arg_x , better.arg_y
|
List of arguments passed to |
auto.legend |
Logical. If |
leg.arg |
List of arguments passed to |
main.arg_x , main.arg_y
|
List of arguments passed to |
reclass
retunrs a data.frame with the following columns:
Unit
Names of the ICU.
Admission
Number of admissions in each ICU.
From
ICU's initial efficiency quadrant.
To
ICU's final efficiency quadrant.
SRU.1st
ICU's initial SRU estimate.
SRU.2nd
ICU's final SRU estimate.
SMR.1st
ICU's initial SMR estimate.
SMR.2nd
ICU's final SMR estimate.
Rank1
ICU's initial SRU (or SMR) rank.
Rank2
ICU's final SRU (or SMR) rank.
plot.reclass
returns a scatter plot with grpahical comparison of the two periods/stages with their respective medians and tertiles.
Lunna Borges and Pedro Brasil
data(icu) # A little editing icu$Saps3DeathProbabilityStandardEquation <- icu$Saps3DeathProbabilityStandardEquation / 100 icu <- icu[-which(icu$los < 0 ),] # Subseting the data for the 1st quarter x <- droplevels(icu[which(format(as.Date(icu$UnitAdmissionDate),"%m") %in% c("01","02","03")),]) # Subseting the data for the 2nd quarter y <- droplevels(icu[which(format(as.Date(icu$UnitAdmissionDate),"%m") %in% c("04","05","06")),]) # Running the SRU analysis for both quarters FirstQ <- SRU(prob = x$Saps3DeathProbabilityStandardEquation, death = x$UnitDischargeName, unit = x$Unit, los = x$los, score = x$Saps3Points, originals = TRUE, type = 1, plot = FALSE) FirstQ SecondQ <- SRU(prob = y$Saps3DeathProbabilityStandardEquation, death = y$UnitDischargeName, unit = y$Unit, los = y$los, score = y$Saps3Points, originals = TRUE, type = 1, plot = FALSE) SecondQ z <- reclass(x = FirstQ, y = SecondQ, same = TRUE) z plot(z) rm(icu, x, y, FirstQ, SecondQ, z)
data(icu) # A little editing icu$Saps3DeathProbabilityStandardEquation <- icu$Saps3DeathProbabilityStandardEquation / 100 icu <- icu[-which(icu$los < 0 ),] # Subseting the data for the 1st quarter x <- droplevels(icu[which(format(as.Date(icu$UnitAdmissionDate),"%m") %in% c("01","02","03")),]) # Subseting the data for the 2nd quarter y <- droplevels(icu[which(format(as.Date(icu$UnitAdmissionDate),"%m") %in% c("04","05","06")),]) # Running the SRU analysis for both quarters FirstQ <- SRU(prob = x$Saps3DeathProbabilityStandardEquation, death = x$UnitDischargeName, unit = x$Unit, los = x$los, score = x$Saps3Points, originals = TRUE, type = 1, plot = FALSE) FirstQ SecondQ <- SRU(prob = y$Saps3DeathProbabilityStandardEquation, death = y$UnitDischargeName, unit = y$Unit, los = y$los, score = y$Saps3Points, originals = TRUE, type = 1, plot = FALSE) SecondQ z <- reclass(x = FirstQ, y = SecondQ, same = TRUE) z plot(z) rm(icu, x, y, FirstQ, SecondQ, z)
Calculates the standardized mortality ratio and its confidence interval. SMR, for a group, is defined as the ratio of the observed deaths in this group and the sum of the predicted individual probabilities of death by any model (expected deaths).
SMR.table
estimates at once the overall SMR and the SMR across several groups, e.g. ICU units or clinical characteristics. The SMR.table
can be ordered by the SMR estimate or its confidence intervals, facilitating the comparison of the units ranks.
forest.SMR
shows the SMR.table
output as a forest plot. The plot opens two windows and plot at the left side the values from the SMR.table
and at the right side the points and lines graphically representing each SMR and its confidence interval.
SMR( obs.var, pred.var, digits = 5, ci.method = c("Hosmer", "Byar"), ci.level = 0.95 ) SMR.table( data, group.var, obs.var, pred.var, digits = 5, use.label = FALSE, var.labels = attr(data, "var.labels")[match(group.var, names(data))], ci.method = c("Hosmer", "Byar"), ci.level = 0.95, reorder = c("no", "SMR", "lower.Cl", "upper.Cl"), decreasing = FALSE ) forest.SMR( x, mar1 = c(5.1, 1, 4.1, 1), mar.SMR = c(5.1, 7, 4.1, 1), overall.arg = list(x = 0.01, font = 2, las = 1, labels = var.labels[1], xpd = NA, adj = 0), NOE.overall.args = list(x = c(N.values.arg$x, O.values.arg$x, E.values.arg$x), font = 2, las = 1, xpd = NA), var.labels.arg = list(x = 0.01, font = 2, las = 1, cex = 1, xpd = NA, adj = 0), cat.labels.arg = list(x = 0.1, font = 3, las = 1, cex = 0.95, col = gray(0.4), xpd = NA, adj = 0), N.values.arg = list(x = 0.5, col = gray(0.4), xpd = NA), O.values.arg = list(x = 0.675, col = gray(0.4), xpd = NA), E.values.arg = list(x = 0.85, col = gray(0.4), xpd = NA), NOE.head.arg = list(font = 2, labels = c("N", "O", "E"), xpd = NA), Overall.seg.arg = list(col = "navyblue", xpd = NA, lwd = 2), Overall.p.arg = list(pch = 23, cex = 2, col = "black", bg = gray(0.4), xpd = NA), Overall.est.arg = list(x = smr.xlim[1] - 0.06, las = 1, font = 2, xpd = NA, adj = 1), cat.seg.arg = list(col = "navyblue", xpd = NA, lwd = 2), cat.p.arg = list(pch = 22, cex = 1, col = "black", bg = gray(0.4), xpd = NA), cat.est.arg = list(x = smr.xlim[1] - 0.06, las = 1, col = gray(0.4), xpd = NA, adj = 1), SMR.head.arg = list(smr.xlim[1] - 0.06, font = 2, labels = "SMR [95% CIs]", xpd = NA, adj = 1), smr.xlab = "Standardized Mortality Ratio", smr.xlim = "auto", grid = TRUE, digits = 3 )
SMR( obs.var, pred.var, digits = 5, ci.method = c("Hosmer", "Byar"), ci.level = 0.95 ) SMR.table( data, group.var, obs.var, pred.var, digits = 5, use.label = FALSE, var.labels = attr(data, "var.labels")[match(group.var, names(data))], ci.method = c("Hosmer", "Byar"), ci.level = 0.95, reorder = c("no", "SMR", "lower.Cl", "upper.Cl"), decreasing = FALSE ) forest.SMR( x, mar1 = c(5.1, 1, 4.1, 1), mar.SMR = c(5.1, 7, 4.1, 1), overall.arg = list(x = 0.01, font = 2, las = 1, labels = var.labels[1], xpd = NA, adj = 0), NOE.overall.args = list(x = c(N.values.arg$x, O.values.arg$x, E.values.arg$x), font = 2, las = 1, xpd = NA), var.labels.arg = list(x = 0.01, font = 2, las = 1, cex = 1, xpd = NA, adj = 0), cat.labels.arg = list(x = 0.1, font = 3, las = 1, cex = 0.95, col = gray(0.4), xpd = NA, adj = 0), N.values.arg = list(x = 0.5, col = gray(0.4), xpd = NA), O.values.arg = list(x = 0.675, col = gray(0.4), xpd = NA), E.values.arg = list(x = 0.85, col = gray(0.4), xpd = NA), NOE.head.arg = list(font = 2, labels = c("N", "O", "E"), xpd = NA), Overall.seg.arg = list(col = "navyblue", xpd = NA, lwd = 2), Overall.p.arg = list(pch = 23, cex = 2, col = "black", bg = gray(0.4), xpd = NA), Overall.est.arg = list(x = smr.xlim[1] - 0.06, las = 1, font = 2, xpd = NA, adj = 1), cat.seg.arg = list(col = "navyblue", xpd = NA, lwd = 2), cat.p.arg = list(pch = 22, cex = 1, col = "black", bg = gray(0.4), xpd = NA), cat.est.arg = list(x = smr.xlim[1] - 0.06, las = 1, col = gray(0.4), xpd = NA, adj = 1), SMR.head.arg = list(smr.xlim[1] - 0.06, font = 2, labels = "SMR [95% CIs]", xpd = NA, adj = 1), smr.xlab = "Standardized Mortality Ratio", smr.xlim = "auto", grid = TRUE, digits = 3 )
obs.var |
Observed death. Accepted values are 0 (absence) or 1 (presence) in a vector. For |
pred.var |
Death individual predictions (ranging from 0 to 1) in a vector. For |
digits |
Number of digits for rounding the output. |
ci.method |
Method to estimate the confidence interval. "Hosmer" (default) or "Byar" are acceptable values. |
ci.level |
Level of the confidence interval. Default is 0.95. |
data |
For |
group.var |
For |
use.label |
Logical. Default is FALSE. For |
var.labels |
A character vector with variables labels. The default is to replace the variable name by the label stored at attr(data, "var.labels"). But one may specify labels directly. |
reorder |
Default is "no". Possible values are: "no", "SMR","lower.Cl", and "upper.Cl". It will make the |
decreasing |
Logical. When 'reorderd' is TRUE, should the order be decreasing or incresing? See |
x |
For the |
mar1 , mar.SMR
|
Values to set the margins (mar parameter) of left and right windows. See |
overall.arg |
A list of arguments passed to |
NOE.overall.args |
A list of arguments passed to |
var.labels.arg |
A list of arguments passed to |
cat.labels.arg |
A list of arguments passed to |
N.values.arg |
A list of arguments passed to |
O.values.arg |
A list of arguments passed to |
E.values.arg |
A list of arguments passed to |
NOE.head.arg |
A list of arguments passed to |
Overall.seg.arg |
A list of arguments passed to |
Overall.p.arg |
A list of arguments passed to |
Overall.est.arg |
A list of arguments passed to |
cat.seg.arg |
A list of arguments passed to |
cat.p.arg |
A list of arguments passed to |
cat.est.arg |
A list of arguments passed to |
SMR.head.arg |
A list of arguments passed to |
smr.xlab |
Label of the x axis. Default is "Standardized Mortality Ratio". |
smr.xlim |
Limits of x axis of the |
grid |
Logical. If TRUE (default), it will draw a grid with the |
If SMR, then:
N
Number of subjects analyzed.
O
Observed number of deaths.
E
Expected number of deaths.
SMR
Standardized mortality ratio.
lower.Cl
lower confidence limit.
upper.Cl
upper confidence limit.
If SMR.table, then a data.frame with the same information as above, and the additional information is returned: "Variables" (variables names), "Levels" (variables levels).
If forest.SMR, then a plot is returned.
Lunna Borges and Pedro Brasil
David W. Hosmer and Stanley Lemeshow. Confidence intervals estimates of an index of quality performance basend on logistic regression models. Statistics in Medicine , vol. 14, 2161-2172 (1995)
# Loading a example data data(icu) # Setting variable labels to data attr(icu, "var.labels")[match(c("Unit", "IsMechanicalVentilation1h", "AdmissionTypeName_pri","Vasopressors_D1"), names(icu))] <- c("ICU unit","Mechanichal ventilation","Admission type","Vasopressors at admission") # Some editing icu$Saps3DeathProbabilityStandardEquation <- icu$Saps3DeathProbabilityStandardEquation /100 icu$IsMechanicalVentilation1h <- as.factor(ifelse(icu$IsMechanicalVentilation1h == 1, "Yes", "No")) icu$AdmissionTypeName_pri <- as.factor(icu$AdmissionTypeName_pri) levels(icu$AdmissionTypeName_pri) <- c("Clinical","Elective surgery", "Urgent surgery") icu$Vasopressors_D1 <- as.factor(ifelse(icu$Vasopressors_D1 == 1, "Yes", "No")) # The overall SMR for the whole sample SMR(icu$UnitDischargeName, icu$Saps3DeathProbabilityStandardEquation) # The overall SMR and for some subgroups x <- SMR.table(data = icu, obs.var = "UnitDischargeName", pred.var = "Saps3DeathProbabilityStandardEquation", group.var = c( "IsMechanicalVentilation1h", "AdmissionTypeName_pri","Vasopressors_D1"), reorder = "no", decreasing = TRUE, use.label = TRUE) x # A forest plot for all groups SMR (resize the window may be required) forest.SMR(x, digits = 2) # The same thing but reordering the categories x <- SMR.table(data = icu, obs.var = "UnitDischargeName", pred.var = "Saps3DeathProbabilityStandardEquation", group.var = c( "IsMechanicalVentilation1h", "AdmissionTypeName_pri", "Vasopressors_D1"), reorder = "SMR", decreasing = TRUE, use.label = TRUE) forest.SMR(x, digits = 2) # The overall SMR and for all Units x <- SMR.table(data = icu, obs.var = "UnitDischargeName", pred.var = "Saps3DeathProbabilityStandardEquation", group.var = "Unit", reorder = "no", decreasing = TRUE, use.label = TRUE) x # A forest plot for all Units forest.SMR(x, digits = 2) # The same thing but reordering the categories x <- SMR.table(data = icu, obs.var = "UnitDischargeName", pred.var = "Saps3DeathProbabilityStandardEquation", group.var = "Unit", reorder = "SMR", decreasing = TRUE, use.label = TRUE) forest.SMR(x, digits = 2) rm(x, icu)
# Loading a example data data(icu) # Setting variable labels to data attr(icu, "var.labels")[match(c("Unit", "IsMechanicalVentilation1h", "AdmissionTypeName_pri","Vasopressors_D1"), names(icu))] <- c("ICU unit","Mechanichal ventilation","Admission type","Vasopressors at admission") # Some editing icu$Saps3DeathProbabilityStandardEquation <- icu$Saps3DeathProbabilityStandardEquation /100 icu$IsMechanicalVentilation1h <- as.factor(ifelse(icu$IsMechanicalVentilation1h == 1, "Yes", "No")) icu$AdmissionTypeName_pri <- as.factor(icu$AdmissionTypeName_pri) levels(icu$AdmissionTypeName_pri) <- c("Clinical","Elective surgery", "Urgent surgery") icu$Vasopressors_D1 <- as.factor(ifelse(icu$Vasopressors_D1 == 1, "Yes", "No")) # The overall SMR for the whole sample SMR(icu$UnitDischargeName, icu$Saps3DeathProbabilityStandardEquation) # The overall SMR and for some subgroups x <- SMR.table(data = icu, obs.var = "UnitDischargeName", pred.var = "Saps3DeathProbabilityStandardEquation", group.var = c( "IsMechanicalVentilation1h", "AdmissionTypeName_pri","Vasopressors_D1"), reorder = "no", decreasing = TRUE, use.label = TRUE) x # A forest plot for all groups SMR (resize the window may be required) forest.SMR(x, digits = 2) # The same thing but reordering the categories x <- SMR.table(data = icu, obs.var = "UnitDischargeName", pred.var = "Saps3DeathProbabilityStandardEquation", group.var = c( "IsMechanicalVentilation1h", "AdmissionTypeName_pri", "Vasopressors_D1"), reorder = "SMR", decreasing = TRUE, use.label = TRUE) forest.SMR(x, digits = 2) # The overall SMR and for all Units x <- SMR.table(data = icu, obs.var = "UnitDischargeName", pred.var = "Saps3DeathProbabilityStandardEquation", group.var = "Unit", reorder = "no", decreasing = TRUE, use.label = TRUE) x # A forest plot for all Units forest.SMR(x, digits = 2) # The same thing but reordering the categories x <- SMR.table(data = icu, obs.var = "UnitDischargeName", pred.var = "Saps3DeathProbabilityStandardEquation", group.var = "Unit", reorder = "SMR", decreasing = TRUE, use.label = TRUE) forest.SMR(x, digits = 2) rm(x, icu)
SRU
calculates the standardized resource use for ICUs (Intensive Care Units) from information regarding admissions of individual patients. Resource use is represented by the patient's length of stay (LOS). Therefore the SRU for each unit is defined as the observed LOS divided by it's expected LOS. To estimate the expected LOS for each ICU one must define a severity score, here defined by the SAPS 3 score. In theory, the 'score' could be any score/probability that estimates death for each ICU admission.
The plot.SRU
function will return a SMR
versus SRU scatter plot with its medians and tertiles. Thus, it classifies each unit in the quadrants formed by these two medians as: most efficient (ME) which is the lower left quadrant (both SRU and SMR below their medians); least efficient (LE) is the upper right quadrant (both SRU and SMR above their medians); and least achieving (LA) - the lower right quadrant (SRU below and SMR above their medians); and over achieving (OA) - the upper left quadrant (SRU above and SMR below their medians).
print.SRU
Prints a object of class 'SRU'.
cut_in
is used to find limits to define severity classes which are used in SRU
function. The severity classes are necessary to calculate the average of days to produce one survivor and consequently to estimate the expected LOS in each ICU. Its rationale is to find the limits for the severity classes that yeld a desired average of days to produce one survivor. At some point in time, we made a study to test if different arrangements of the severity classes would yeld different classifications in the efficiency quadrants. Despite the fact that this study did not show any difference from each approach, we left the function in the package. Therefore, any arbitrary severity classes should yeild the same results.
SRUcalc
is a simpler function to estimte SRU and returns, for each unit, the SRU value, the observed and expected number of deaths, and the observed and expected LOS.
SRU( prob, death, unit, los, los.exp, class, score, plot = FALSE, type = 1, digits = 2, digits2 = 5, originals = FALSE, myunits = NULL ) ## S3 method for class 'SRU' print(x, ...) ## S3 method for class 'SRU' plot( x, ..., xlim = range(x$rates[, 2]), ylim = range(x$rates[, 1]), xlab = "SMR", ylab = "SRU", points.arg = list(pch = 21, col = "white", bg = "cadetblue3", cex = 1.5), med.arg = list(col = "dodgerblue4", lwd = 2, lty = 1), tert.arg = list(col = "darkorange2", lty = 2, lwd = 1), auto.legend = TRUE, leg.arg = list(x = "top", bty = "n", xpd = NA, inset = -0.2, ncol = 2), bty = "n", myunits = x$myunits, myunitspts.arg = list(pch = 21, col = "white", bg = "red", cex = 1.5), myunitstext.arg = list(pos = 1, font = 2, cex = 0.8) ) cut_in( score, los, death, unit, days, min = 200, exc.ICU = TRUE, complete = FALSE, digits = 5 ) SRUcalc(prob, death, unit, los, score, digits = 2)
SRU( prob, death, unit, los, los.exp, class, score, plot = FALSE, type = 1, digits = 2, digits2 = 5, originals = FALSE, myunits = NULL ) ## S3 method for class 'SRU' print(x, ...) ## S3 method for class 'SRU' plot( x, ..., xlim = range(x$rates[, 2]), ylim = range(x$rates[, 1]), xlab = "SMR", ylab = "SRU", points.arg = list(pch = 21, col = "white", bg = "cadetblue3", cex = 1.5), med.arg = list(col = "dodgerblue4", lwd = 2, lty = 1), tert.arg = list(col = "darkorange2", lty = 2, lwd = 1), auto.legend = TRUE, leg.arg = list(x = "top", bty = "n", xpd = NA, inset = -0.2, ncol = 2), bty = "n", myunits = x$myunits, myunitspts.arg = list(pch = 21, col = "white", bg = "red", cex = 1.5), myunitstext.arg = list(pos = 1, font = 2, cex = 0.8) ) cut_in( score, los, death, unit, days, min = 200, exc.ICU = TRUE, complete = FALSE, digits = 5 ) SRUcalc(prob, death, unit, los, score, digits = 2)
prob |
Individual predicted probability of death (ranging from 0 to 1) in a vector. |
death |
Observed death. Accepted values are 0 (absence) or 1 (presence) in a vector. |
unit |
A character or factor variable indicating the ICU where the patient is admitted. |
los |
A numeric variable indicating the observed length of stay for each patient. |
los.exp |
Estimated length of stay (LOS). This argument is optional and will be required only if |
class |
A factor variable indicating the class of severity score (e.g. SAPS 3). In the case of SAPS 3, this is a cut in the SAPS 3 score, grouping patients into severity classes. This will be required if the argument |
score |
A numeric vector with the Acute Physiology Score (SAPS) 3 score for each admission. The function will use this argument to know to which severity class each patient will be assigned to. It is used only when |
plot |
Logical; If |
type |
A Way to calculate SRU. If |
digits , digits2
|
Integer indicating the number of decimals to be used in the output. |
originals |
Logical; If |
myunits |
A character vector with the unit names which one would like to benchmark among all units. These units will be highlighted with dots of different collors in the plot. Default is |
x |
For |
... |
Arguments to be passed to |
xlim , ylim
|
Limits of x and y axis for |
xlab , ylab
|
Labels of x and y axis for |
points.arg |
List of arguments passed to |
med.arg |
List of arguments passed to |
tert.arg |
List of arguments passed to |
auto.legend |
Logical; If |
leg.arg |
List of arguments passed to |
bty |
A character string which determines the type of box that is drawn about plots. See |
myunitspts.arg |
List of arguments passed to |
myunitstext.arg |
List of arguments passed to |
days |
For |
min |
For |
exc.ICU |
Logical; For |
complete |
Logical; For |
Two tables: one with information about severity classes and the respective quantities required to estimate the expected LOS, and another with information about ICUs classified as Most Efficient (ME) or Least Efficient (LE).
Sev
Severity class.
Total
Total of patients.
Surv
Total of survivors.
Total.LOS
Total length of stay (days).
AvDays
Average days to produce a survivor.
N.Unit
Quantity of ICUs.
N.Pat
Quantity of patients.
SMR
Standardized Mortality Ratio Mean (standard deviation).
SRU
Standardized Resource Use Mean (standard deviation).
Most Efficient ICUs have SRU, SMR < median. Least Efficient ICUs have SRU, SMR > median.
cut_in
returns a vector with the limits to cut the severity score.
SRUcalc
returns a table with:
Unit
ICUs names.
SMR or SRU
Standardized Rate.
N
Number of subjects analyzed.
Observed
Observed number of deaths.
Expected
Expected number of deaths.
LOS_esp
Expected length of stay.
Lunna Borges and Pedro Brasil
Rothen HU, Stricker K, Einfalt J, Bauer P, Metnitz PGH, Moreno RP, Takala J (2007) Variability in outcome and resource use in intensive care units. Intensive Care Med 33:1329-1336
# Loading the dataset data(icu) # Removing data with inapropriate values and some editing icu <- icu[-which(icu$los < 0 ),] icu$Saps3DeathProbabilityStandardEquation <- icu$Saps3DeathProbabilityStandardEquation / 100 # Setting classes acording to limits of SAPS 3 score days <- seq(1,100) cut_lims <- cut_in(icu$Saps3Points, icu$los, icu$UnitDischargeName, icu$Unit, days, exc.ICU = TRUE) icu$class <- cut(icu$Saps3Points, breaks = cut_lims, include.lowest = TRUE) # Estimating the SRU benchmarking myunit A and B x <- SRU(prob = icu$Saps3DeathProbabilityStandardEquation, death = icu$UnitDischargeName, unit = icu$Unit, los = icu$los, score = icu$Saps3Points, originals = TRUE, type = 1, plot = FALSE, myunits = c("A","B")) x plot(x) # To see the units rankings and individual SMR and SRU, ordering by its SRU x$rates[order(x$rates$sru),] # SRU with diferent severity classes created by cut_in function y <- SRU(prob = icu$Saps3DeathProbabilityStandardEquation, death = icu$UnitDischargeName, unit = icu$Unit, los = icu$los, score = icu$Saps3Points, originals = FALSE, type = 1, plot = FALSE, class = icu$class) y # Using SRUcalc SRUcalc(prob = icu$Saps3DeathProbabilityStandardEquation, death = icu$UnitDischargeName, unit = icu$Unit, los = icu$los, score = icu$Saps3Points) rm(x, y, days, icu, cut_lims)
# Loading the dataset data(icu) # Removing data with inapropriate values and some editing icu <- icu[-which(icu$los < 0 ),] icu$Saps3DeathProbabilityStandardEquation <- icu$Saps3DeathProbabilityStandardEquation / 100 # Setting classes acording to limits of SAPS 3 score days <- seq(1,100) cut_lims <- cut_in(icu$Saps3Points, icu$los, icu$UnitDischargeName, icu$Unit, days, exc.ICU = TRUE) icu$class <- cut(icu$Saps3Points, breaks = cut_lims, include.lowest = TRUE) # Estimating the SRU benchmarking myunit A and B x <- SRU(prob = icu$Saps3DeathProbabilityStandardEquation, death = icu$UnitDischargeName, unit = icu$Unit, los = icu$los, score = icu$Saps3Points, originals = TRUE, type = 1, plot = FALSE, myunits = c("A","B")) x plot(x) # To see the units rankings and individual SMR and SRU, ordering by its SRU x$rates[order(x$rates$sru),] # SRU with diferent severity classes created by cut_in function y <- SRU(prob = icu$Saps3DeathProbabilityStandardEquation, death = icu$UnitDischargeName, unit = icu$Unit, los = icu$los, score = icu$Saps3Points, originals = FALSE, type = 1, plot = FALSE, class = icu$class) y # Using SRUcalc SRUcalc(prob = icu$Saps3DeathProbabilityStandardEquation, death = icu$UnitDischargeName, unit = icu$Unit, los = icu$los, score = icu$Saps3Points) rm(x, y, days, icu, cut_lims)
Colection of functions for survival models performance analysis.
R2sh
estimates a distance-based estimator of survival predictive accuracy proposed by Schemper and Henderson. It was inspirated in survAUC::schemper function, but receives the predicted values directly. Besides that, R2sh
does bootstrap resampling and returns its confidence interval estimate.
R2pm
calculates a estimator of survival predictive accuracy proposed by Kent & O'Quigley and its bootstrap confidence interval.
cal.Slope
returns the calibration slope of a survival model and its bootstrap confidence interval.
R2sh(time, status, lin.pred, data, R) R2pm(lin.pred, R) cal.Slope(time, status, lin.pred, R)
R2sh(time, status, lin.pred, data, R) R2pm(lin.pred, R) cal.Slope(time, status, lin.pred, R)
time |
A vector of event times. |
status |
A indicator vector of event occurrence. |
lin.pred |
A vector of linear predictors of a survival model for each observation. (prognostic index) |
data |
A data.frame where to find column vectors. |
R |
The number of bootstrap replicates. Usually this will be a single positive integer. For importance resampling, some resamples may use one set of weights and others use a different set of weights. In this case R would be a vector of integers where each component gives the number of resamples from each of the rows of weights. To be passed to |
R2sh
returns a list with the following components:
D
: The estimator of predictive accuracy obtained from the covariate-free null model.
Dx
: The estimator of predictive accuracy obtained from the Cox model.
V
: The estimator of relative gains in predictive accuracy.
Mhat
: The absolute distance estimator obtained from the Cox model (evaluated at the event times of the test data).
Mhat.0
: The absolute distance estimator obtained from the covariate-free null model (evaluated at the event times of the test data).
timep
: The event times of the test data.
lower
: V lower confidence limit.
upper
: V upper confidence limit.
boot
: An object of class "boot
".
bootCI
: Boot confidence intervals resampling.
R2pm
returns a list with the following components:
r2
: The estimator of predictive accuracy obtained from the Cox model.
lower
: r2 lower confidence limit.
upper
: r2 upper confidence limit.
boot
: An object of class "boot
".
bootCI
: Boot confidence intervals resampling.
cal.Slope
returns a list with the following components:
slope
: The calibration slope measure of a survival model.
lower
: slope lower confidence limit.
upper
: slope upper confidence limit.
boot
: An object of class "boot
".
bootCI
: Boot confidence intervals resampling.
Lunna Borges <[email protected]>
Schemper, M. and R. Henderson (2000). Predictive accuracy and explained variation in Cox regression. Biometrics 56, 249-255.
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application, Chapter 5. Cambridge University Press.
DiCiccio, T.J. and Efron B. (1996) Bootstrap confidence intervals (with Discussion). Statistical Science, 11, 189-228.
Efron, B. (1987) Better bootstrap confidence intervals (with Discussion). Journal of the American Statistical Association, 82, 171-200.
Kent, John T., and J. O. H. N. O'QUIGLEY. "Measures of dependence for censored survival data." Biometrika 75.3 (1988): 525-534.
van Houwelingen, Hans C. "Validation, calibration, revision and combination of prognostic survival models." Statistics in medicine 19.24 (2000): 3401-3415.
Rahman, M. Shafiqur, et al. "Review and evaluation of performance measures for survival prediction models in external validation settings." BMC medical research methodology 17.1 (2017): 60.
#### Survival model #### data(breastCancer) class(breastCancer$gradd1) <- "character" class(breastCancer$gradd2) <- "character" traindata <- breastCancer[sample(nrow(breastCancer), nrow(breastCancer)*2/3),] newdata <- breastCancer[-sample(nrow(breastCancer), nrow(breastCancer)*2/3),] model <- rms::cph(survival::Surv(rectime, censrec) ~ rms::rcs(age,6) + rms::rcs(nodes,3) + rms::rcs(pgr,3) + gradd1 + gradd2 + hormon, data = traindata) lp <- predict(model, newdata = newdata) #### R2sh example #### R2sh(newdata$rectime, newdata$censrec, lp, data = newdata, R = 50) #### R2pm example #### R2pm(lp, R = 50) #### cal.slope example #### cal.Slope(newdata$rectime, newdata$censrec, lp, R = 50)
#### Survival model #### data(breastCancer) class(breastCancer$gradd1) <- "character" class(breastCancer$gradd2) <- "character" traindata <- breastCancer[sample(nrow(breastCancer), nrow(breastCancer)*2/3),] newdata <- breastCancer[-sample(nrow(breastCancer), nrow(breastCancer)*2/3),] model <- rms::cph(survival::Surv(rectime, censrec) ~ rms::rcs(age,6) + rms::rcs(nodes,3) + rms::rcs(pgr,3) + gradd1 + gradd2 + hormon, data = traindata) lp <- predict(model, newdata = newdata) #### R2sh example #### R2sh(newdata$rectime, newdata$censrec, lp, data = newdata, R = 50) #### R2pm example #### R2pm(lp, R = 50) #### cal.slope example #### cal.Slope(newdata$rectime, newdata$censrec, lp, R = 50)
There are two functionalities: Tabulation of variables with the same possible range of distribution and stack into a new table with or without other descriptive statistics or to breakdown distribution of more than one row variables against a column variable
tableStack( vars, dataFrame, minlevel = "auto", maxlevel = "auto", count = TRUE, na.rm = FALSE, means = TRUE, medians = FALSE, sds = TRUE, decimal = 2, total = TRUE, var.labels = TRUE, var.labels.trunc = 150, reverse = FALSE, vars.to.reverse = NULL, by = NULL, vars.to.factor = NULL, iqr = "auto", prevalence = FALSE, percent = c("column", "row", "none"), frequency = TRUE, test = TRUE, name.test = TRUE, total.column = FALSE, simulate.p.value = FALSE, sample.size = TRUE, assumption.p.value = 0.01, NAcol = FALSE, NArow = FALSE, drplvls = FALSE )
tableStack( vars, dataFrame, minlevel = "auto", maxlevel = "auto", count = TRUE, na.rm = FALSE, means = TRUE, medians = FALSE, sds = TRUE, decimal = 2, total = TRUE, var.labels = TRUE, var.labels.trunc = 150, reverse = FALSE, vars.to.reverse = NULL, by = NULL, vars.to.factor = NULL, iqr = "auto", prevalence = FALSE, percent = c("column", "row", "none"), frequency = TRUE, test = TRUE, name.test = TRUE, total.column = FALSE, simulate.p.value = FALSE, sample.size = TRUE, assumption.p.value = 0.01, NAcol = FALSE, NArow = FALSE, drplvls = FALSE )
vars |
a vector of variables in the data frame. The imput may be given with or without quotes. |
dataFrame |
source data frame of the variables |
minlevel |
possible minimum value of items specified by user |
maxlevel |
possible maximum value of items specified by user |
count |
whether number of valid records for each item will be displayed |
na.rm |
whether missing value would be removed during calculation mean score of each person |
means |
whether means of all selected items will be displayed |
medians |
whether medians of all selected items will be displayed |
sds |
whether standard deviations of all selected items will be displayed |
decimal |
number of decimals displayed |
total |
display of means and standard deviations of total and average scores |
var.labels |
presence of descriptions of variables on the last column of output |
var.labels.trunc |
number of characters used for variable description |
reverse |
whether item(s) negatively correlated with other majority will be reversed |
vars.to.reverse |
variable(s) to reverse |
by |
a variable for column breakdown. If NONE is given, only the 'total column' will be displayed. More on Details. |
vars.to.factor |
variable(s) to be converted to factor for tabulaton |
iqr |
variable(s) to display median and inter-quartile range |
prevalence |
for logical or dichotomous variables, whether prevalence of the dichotomous row variable in each column subgroup will be displayed |
percent |
type of percentage displayed when the variable is categorical and for NArow when activated. Default is column |
frequency |
whether to display frequency in the cells when the variable is categorical and for NArow when activated |
test |
whether statistical test(s) will be computed |
name.test |
display name of the test and relevant degrees of freedom |
total.column |
whether to add 'total column' to the output or not |
simulate.p.value |
simulate P value for Fisher's exact test |
sample.size |
whether to display non-missing sample size of each column |
assumption.p.value |
level of Bartlett's test P value to judge whether the comparison and the test should be parametric |
NAcol |
whether to add 'NA column' to the output or not |
NArow |
whether to add 'NA rows' for each variable to the output or not |
drplvls |
whether to hide non used levels on factor and character variables or not |
This function is a clone of tableStack
from the epiDisplay
package. Comparing to the original, tt adds options to show the NA in the variables as categories, similar to the option useNA
in the table
function, and it also fix few bugs, such as showing the total.column
without the need to test = TRUE
, and to show or hide levels with zero counts without returning error.
This function simultaneously explores several variables with a fixed integer rating scale. For non-factor variables, the default values for tabulation are the minimum and the maximum of all variables but can be specified by the user.
When 'by' is omitted, all variables must be of the same class, and must be 'integer', 'factor' or 'logical. Some parameters are only used if by is omitted, others are only used if by is available. The by-omitted dependent variable are minlevel, maxlevel, count, na.rm, means, medians, sds, total, reverse, vars.to.reverse. The by-available dependent variables are iqr, prevalence, percent, frequency, test, name.test, total.column, simulate.p.value, sample.size, assumption.p.value, NArow, NAcol, drplvls. Unlike function 'alpha', the argument 'reverse' has a default value of FALSE. This argument is ignored if 'vars.to.reverse' is specified.
Options for 'reverse', 'vars.to.reverse' and statistics of 'means', 'medians', 'sds' and 'total' are available only if the items are not factor. To obtain statistics of factor items, users need to use 'unclassDataframe' to convert them into integer.
When the 'by' argument is given, 'reverse' and 'vars.to.reverse' do not apply, as mentioned before. Instead, columns of the 'by' variable will be formed. A table will be created against each selected variable. If the variable is a factor or coerced to factor with 'vars.to.factor', cross-tabulation will result with percents as specified, ie. "column", "row", or "none" (FALSE). For a dichotomous row variable, if set to 'TRUE', the prevalence of row variable in the form of a fraction is displayed in each subgroup column. For objects of class 'numeric' or 'integer', means with standard deviations will be displayed. For variables with residuals that are not normally distributed or where the variance of subgroups are significantly not normally distributed (using a significance level of 0.01), medians and inter-quartile ranges will be presented if the argument 'iqr' is set to "auto" (by default). Users may specify a subset of the selected variables (from the 'vars' argument) to be presented in such a form. Otherwise, the argument could be set as any other character string, except the variables names, to insist to present means and standard deviations.
When 'test = TRUE' (default), Pearson's chi-squared test (or a two-sided Fisher's exact test, if the sample size is small) will be carried out for a categorical variable or a factor. Parametric or non-parametric comparison and test will be carried out for a object of class 'numeric' or 'integer' (See 'iqr' and 'assumption.p.value' below). If the sample size of the numeric variable is too small in any group, the test is omitted and the problem reported.
For Fisher's exact test, the default method employs 'simulate.p.value = FALSE'. See further explanation in 'fisher.test' procedure. If the dataset is extraordinarily large, the option may be manually set to TRUE.
When 'by' is specified as a single character object (such as 'by="none"') or when 'by = NONE' there will be no column breakdown and all tests will be omitted. Only the total column is displayed. Only the 'total' column is shown.
If this 'total column' is to accompany the 'by' breakdown, the argument 'total.column=TRUE' should be specified. The 'sample.size' is TRUE by default. The total number of records for each group is displayed in the first row of the output. However, the variable in each row may have some missing records, the information on which is reported by NArow for each variable on 'vars' and by NAcol for the variable on 'by'.
By default, Epicalc sets 'var.labels=TRUE' in order to give nice output. However, 'var.labels=FALSE' can sometimes be more useful during data exploration. Variable numbers as well as variable names are displayed instead of variable labels. Names and numbers of abnormally distributed variables, especially factors with too many levels, can be easily identified for further relevelling or recoding.
The argument 'iqr' has a default value being "auto". Non-parametric comparison and test will be automatically chosen if Bartlett's test P value is below the 'assumption.p.value'.
The test can be forced to parametric by setting 'iqr=NULL' and to non-parametric by if iqr is set to the variable number of cont.var (See examples.).
an object of class 'tableStack' and 'list' when by=NULL
results |
an object of class 'noquote' which is used for print out |
items.reversed |
name(s) of variable(s) reversed |
total.score |
a vector from 'rowSums' of the columns of variables specified in 'vars' |
mean.score |
a vector from 'rowMeans' of the columns of variables specified in 'vars' |
mean.of.total.scores |
mean of total scores |
sd.of.total.scores |
standard deviation of total scores |
mean.of.average.scores |
mean of mean scores |
sd.of.average.scores |
standard deviation of mean scores |
When 'by' is specified, an object of class 'tableStack' and 'table is returned.
Virasakdi Chongsuvivatwong <[email protected]>
Caio Ferreira <[email protected]>
Lunna Borges <[email protected]>
Pedro Brasil <[email protected]>
'table', 'tab1', 'summ', 'alpha', 'unclassDataframe'
set.seed(1) data <- data.frame(sex = sample(c("M","F"), 50, rep = TRUE), age = sample(c(NA,20:70), 50, rep = TRUE), admissionType = sample(c(NA,"urgency", "clinical", "scheduled"), 50, rep = TRUE), hospitalizationTime = sample(c(0:10), 50, rep = TRUE), numberOfChildren = sample(c(NA,0:3), 50, rep = TRUE), cancerInFamily = sample(c(NA,TRUE,FALSE), 50, rep = TRUE), diabetesInFamily = sample(c(TRUE,FALSE), 50, rep = TRUE), thrombosisInFamily = sample(c(TRUE,FALSE), 50, rep = TRUE), mentaldiseasesInFamily = sample(c(TRUE,FALSE), 50, rep = TRUE), cardiadicdiseaseInFamily = sample(c(NA,TRUE,FALSE), 50, rep = TRUE), readmission = sample(c(NA,TRUE,FALSE), 50, rep = TRUE)) attach(data) tableStack(cancerInFamily:cardiadicdiseaseInFamily, dataFrame = data) detach(data) tableStack(cancerInFamily:cardiadicdiseaseInFamily, data) # Default data frame is data # "by" compares variables tableStack(cancerInFamily:cardiadicdiseaseInFamily, data, by= readmission) # "prevalence" returns the prevalence instead of the absolute values tableStack(cancerInFamily:cardiadicdiseaseInFamily, data, by= readmission, prevalence=TRUE) # "percent" as FALSE hides the percentage in parenthesis tableStack(cancerInFamily:cardiadicdiseaseInFamily, data, by= readmission, percent=FALSE) # "name.test" as FALSE hides the column that shows the tests names tableStack(cancerInFamily:cardiadicdiseaseInFamily, data, by= readmission, percent=FALSE, name.test=FALSE) # "NAcol" displays a column of NA values on the variable on "by" tableStack(cancerInFamily:cardiadicdiseaseInFamily, data, by= readmission, NAcol = TRUE) # "NArow" displays rows of NA values on the variables on "vars" tableStack(cancerInFamily:cardiadicdiseaseInFamily, data, by= readmission, NAcol = TRUE, NArow = TRUE) # the specification of the vars may be done as the range tableStack(vars=2:7, data, by=sex) # "by" var may be specified as "none" and the selected vars will be crossed only against the total tableStack(vars=2:7, data, by="none") # by = NONE works just as by = "none" tableStack(vars=2:7, data, by = NONE) # total.column displays a column of totals in adition to the variable on by tableStack(vars=2:7, data, by=sex, total.column=TRUE) var.labels <- c("sex", "Type of admission for each patient", "age", "Duration time in days of the patient's hospitalization", "Number of children that the patient have", "whether or not the patient has cancer in family", "whether or not the patient has diabetes in family", "whether or not the patient has thrombosis in family", "whether or not the patient has mental diseases in family", "whether or not the patient has cardiac diseases in family", "whether or not the patient is on a relapse admission") #setting the attribute var.labels attr(data, "var.labels") <- var.labels rm(var.labels) # May need full screen of Rconsole tableStack(vars=c(numberOfChildren,hospitalizationTime), data) # Fits in with default R console screen tableStack(vars=c(numberOfChildren,hospitalizationTime), data, var.labels.trunc=35) tableStack(vars=c(age,numberOfChildren,hospitalizationTime), data, reverse=TRUE) -> a a ## Components of 'a' have appropriate items reversed a$mean.score -> mean.score a$total.score -> total.score data$mean.score <- mean.score data$total.score <- total.score # hiding the test column tableStack(c(age, numberOfChildren,hospitalizationTime, mean.score,total.score), data, by=sex, test=FALSE) # variables specified on iqr will not display SD but IQR instead tableStack(3:5, data, by=sex, iqr=hospitalizationTime) ## 'vars' can be mixture of variables of different classes tableStack(3:5, data, by=admissionType, iqr=c(hospitalizationTime, total.score)) data$highscore <- mean.score > 4 # a variable with some comparison may be created easily tableStack(mean.score:highscore, data, by=sex, iqr=total.score) # the percentage information may be hidden tableStack(vars=c(readmission,admissionType), data, by=sex, percent="none") # it may be shown the prevalende of the # variable instead of the values themselves tableStack(vars=c(readmission,admissionType), data, by=sex, prevalence = TRUE) # the name of the tests may be hidden # while the test itself still shows tableStack(vars=c(readmission,admissionType), data, by=sex, name.test = FALSE) ## Variable in numeric or factor # as continuous varaibles tableStack(vars=3:5, data, by=sex) # as factors tableStack(vars=3:5, data, by=sex, vars.to.factor = 3:5) ## Using drplvls # a dataframe will be created containing a factor with an unused level bloodbank <- data.frame(AgeInDays = sample(0:15,200, replace = TRUE), Type = factor(sample(c("A","B","0"), 200, replace = TRUE), levels = c("A","B","AB","0")), Origin = sample(c("US","CA"), 200, replace = TRUE)) # by using drplvls the row of the unused fator is hidden tableStack(vars = c(AgeInDays, Type), bloodbank, by = Origin) #usual tableStack(vars = c(AgeInDays, Type), bloodbank, by = Origin, drplvls = TRUE) # with drplvls rm(total.score, mean.score, a, data, bloodbank)
set.seed(1) data <- data.frame(sex = sample(c("M","F"), 50, rep = TRUE), age = sample(c(NA,20:70), 50, rep = TRUE), admissionType = sample(c(NA,"urgency", "clinical", "scheduled"), 50, rep = TRUE), hospitalizationTime = sample(c(0:10), 50, rep = TRUE), numberOfChildren = sample(c(NA,0:3), 50, rep = TRUE), cancerInFamily = sample(c(NA,TRUE,FALSE), 50, rep = TRUE), diabetesInFamily = sample(c(TRUE,FALSE), 50, rep = TRUE), thrombosisInFamily = sample(c(TRUE,FALSE), 50, rep = TRUE), mentaldiseasesInFamily = sample(c(TRUE,FALSE), 50, rep = TRUE), cardiadicdiseaseInFamily = sample(c(NA,TRUE,FALSE), 50, rep = TRUE), readmission = sample(c(NA,TRUE,FALSE), 50, rep = TRUE)) attach(data) tableStack(cancerInFamily:cardiadicdiseaseInFamily, dataFrame = data) detach(data) tableStack(cancerInFamily:cardiadicdiseaseInFamily, data) # Default data frame is data # "by" compares variables tableStack(cancerInFamily:cardiadicdiseaseInFamily, data, by= readmission) # "prevalence" returns the prevalence instead of the absolute values tableStack(cancerInFamily:cardiadicdiseaseInFamily, data, by= readmission, prevalence=TRUE) # "percent" as FALSE hides the percentage in parenthesis tableStack(cancerInFamily:cardiadicdiseaseInFamily, data, by= readmission, percent=FALSE) # "name.test" as FALSE hides the column that shows the tests names tableStack(cancerInFamily:cardiadicdiseaseInFamily, data, by= readmission, percent=FALSE, name.test=FALSE) # "NAcol" displays a column of NA values on the variable on "by" tableStack(cancerInFamily:cardiadicdiseaseInFamily, data, by= readmission, NAcol = TRUE) # "NArow" displays rows of NA values on the variables on "vars" tableStack(cancerInFamily:cardiadicdiseaseInFamily, data, by= readmission, NAcol = TRUE, NArow = TRUE) # the specification of the vars may be done as the range tableStack(vars=2:7, data, by=sex) # "by" var may be specified as "none" and the selected vars will be crossed only against the total tableStack(vars=2:7, data, by="none") # by = NONE works just as by = "none" tableStack(vars=2:7, data, by = NONE) # total.column displays a column of totals in adition to the variable on by tableStack(vars=2:7, data, by=sex, total.column=TRUE) var.labels <- c("sex", "Type of admission for each patient", "age", "Duration time in days of the patient's hospitalization", "Number of children that the patient have", "whether or not the patient has cancer in family", "whether or not the patient has diabetes in family", "whether or not the patient has thrombosis in family", "whether or not the patient has mental diseases in family", "whether or not the patient has cardiac diseases in family", "whether or not the patient is on a relapse admission") #setting the attribute var.labels attr(data, "var.labels") <- var.labels rm(var.labels) # May need full screen of Rconsole tableStack(vars=c(numberOfChildren,hospitalizationTime), data) # Fits in with default R console screen tableStack(vars=c(numberOfChildren,hospitalizationTime), data, var.labels.trunc=35) tableStack(vars=c(age,numberOfChildren,hospitalizationTime), data, reverse=TRUE) -> a a ## Components of 'a' have appropriate items reversed a$mean.score -> mean.score a$total.score -> total.score data$mean.score <- mean.score data$total.score <- total.score # hiding the test column tableStack(c(age, numberOfChildren,hospitalizationTime, mean.score,total.score), data, by=sex, test=FALSE) # variables specified on iqr will not display SD but IQR instead tableStack(3:5, data, by=sex, iqr=hospitalizationTime) ## 'vars' can be mixture of variables of different classes tableStack(3:5, data, by=admissionType, iqr=c(hospitalizationTime, total.score)) data$highscore <- mean.score > 4 # a variable with some comparison may be created easily tableStack(mean.score:highscore, data, by=sex, iqr=total.score) # the percentage information may be hidden tableStack(vars=c(readmission,admissionType), data, by=sex, percent="none") # it may be shown the prevalende of the # variable instead of the values themselves tableStack(vars=c(readmission,admissionType), data, by=sex, prevalence = TRUE) # the name of the tests may be hidden # while the test itself still shows tableStack(vars=c(readmission,admissionType), data, by=sex, name.test = FALSE) ## Variable in numeric or factor # as continuous varaibles tableStack(vars=3:5, data, by=sex) # as factors tableStack(vars=3:5, data, by=sex, vars.to.factor = 3:5) ## Using drplvls # a dataframe will be created containing a factor with an unused level bloodbank <- data.frame(AgeInDays = sample(0:15,200, replace = TRUE), Type = factor(sample(c("A","B","0"), 200, replace = TRUE), levels = c("A","B","AB","0")), Origin = sample(c("US","CA"), 200, replace = TRUE)) # by using drplvls the row of the unused fator is hidden tableStack(vars = c(AgeInDays, Type), bloodbank, by = Origin) #usual tableStack(vars = c(AgeInDays, Type), bloodbank, by = Origin, drplvls = TRUE) # with drplvls rm(total.score, mean.score, a, data, bloodbank)