The vtable
package serves the purpose of outputting automatic variable documentation that can be easily viewed while continuing to work with data.
vtable
contains four main functions: vtable()
(or vt()
), sumtable()
(or st()
), labeltable()
, and dftoHTML()
/dftoLaTeX()
. This vignette focuses on vtable()
.
vtable()
takes a dataset and outputs a formatted variable documentation file. This serves several purposes.
First, it allows for an easy generation of a variable documentation file, without requiring that one has already been created and made accessible through help(data)
, or dealing with creating and finding R help documentation files.
Second, it produces a list of variables (and, if provided, their labels) that can be easily viewed while working with the data, preventing repeated calls to head()
, and making it much easier to work with confusingly-named variables.
Third, the variable documentation file can be opened in a browser (with option out='browser'
, saving to file and opening directly, or by opening in the RStudio Viewer pane and clicking âShow in New Windowâ) where it can be easily searched with standard Find-in-Page functions like Ctrl/Cmd-F, allowsing you to search for the variable or variable label you want.
vtable()
function
vtable()
(or vt()
for short) syntax follows the following outline:
vtable(data,
out=NA,
file=NA,
labels=NA,
class=TRUE,
values=TRUE,
missing=FALSE,
index=FALSE,
factor.limit=5,
char.values=FALSE,
data.title=NA,
desc=NA,
note=NA,
anchor=NA,
col.width=NA,
col.align=NA,
align=NA,
note.align='l',
fit.page=NA,
summ=NA,
lush=FALSE,
opts=list())
The goal of vtable()
is to take a data set data
and output a usually-HTML (but data.frame
, kable
, csv
, and latex
options are there too) file with documentation concerning each of the variables in data
. There are several options as to what will be included in the documentation file, and each of these options are explained below. Throughout, the output will be built as kable
s since this is an RMarkdown document. However, generally you can leave out
at its default and it will publish an HTML table to Viewer (in RStudio) or the browser (otherwise). This will also include some additional information about your data that canât be demonstrated in this vignette:
data
The data
argument can take any data.frame
, data.table
, tibble
, or matrix
, as long as it has a valid set of variable names stored in the colnames()
attribute. The goal of vtable()
is to produce documentation of each of the variables in this data set and display that documentation, one variable per row on the output vtable
.
If data
has embedded variable or value labels, as the data set efc
does below, vtable()
will extract and use them automatically.
library(vtable)
#Example 1, using base data LifeCycleSavings
data(LifeCycleSavings)
vtable(LifeCycleSavings, out='kable')
LifeCycleSavings Name Class Values sr numeric Num: 0.6 to 21.1 pop15 numeric Num: 21.44 to 47.64 pop75 numeric Num: 0.56 to 4.7 dpi numeric Num: 88.94 to 4001.89 ddpi numeric Num: 0.22 to 16.71
#Example 2, using efc data with embedded variable labels
library(sjlabelled)
data(efc)
#Don't forget the handy shortcut vt()!
vt(efc)
efc Name Class Label Values c12hour numeric average number of hours of care per week Num: 4 to 168 e15relat numeric relationship to elder â1: spouse/partnerâ â2: childâ â3: siblingâ â4: daughter or son -in-lawâ â5: ancle/auntâ and 3 more e16sex numeric elderâs gender â1: maleâ â2: femaleâ e17age numeric elderâ age Num: 65 to 103 e42dep numeric elderâs dependency â1: independentâ â2: slightly dependentâ â3: moderately dependentâ â4: severely dependentâ c82cop1 numeric do you feel you cope well as caregiver? â1: neverâ â2: sometimesâ â3: oftenâ â4: alwaysâ c83cop2 numeric do you find caregiving too demanding? â1: Neverâ â2: Sometimesâ â3: Oftenâ â4: Alwaysâ c84cop3 numeric does caregiving cause difficulties in your relationship with your friends? â1: Neverâ â2: Sometimesâ â3: Oftenâ â4: Alwaysâ c85cop4 numeric does caregiving have negative effect on your physical health? â1: Neverâ â2: Sometimesâ â3: Oftenâ â4: Alwaysâ c86cop5 numeric does caregiving cause difficulties in your relationship with your family? â1: Neverâ â2: Sometimesâ â3: Oftenâ â4: Alwaysâ c87cop6 numeric does caregiving cause financial difficulties? â1: Neverâ â2: Sometimesâ â3: Oftenâ â4: Alwaysâ c88cop7 numeric do you feel trapped in your role as caregiver? â1: Neverâ â2: Sometimesâ â3: Oftenâ â4: Alwaysâ c89cop8 numeric do you feel supported by friends/neighbours? â1: neverâ â2: sometimesâ â3: oftenâ â4: alwaysâ c90cop9 numeric do you feel caregiving worthwhile? â1: neverâ â2: sometimesâ â3: oftenâ â4: alwaysâ c160age numeric carerâ age Num: 18 to 89 c161sex numeric carerâs gender â1: Maleâ â2: Femaleâ c172code numeric carerâs level of education â1: low level of educationâ â2: intermediate level of educationâ â3: high level of educationâ c175empl numeric are you currently employed? â0: noâ â1: yesâ barthtot numeric Total score BARTHEL INDEX Num: 0 to 100 neg_c_7 numeric Negative impact with 7 items Num: 7 to 28 pos_v_4 numeric Positive value with 4 items Num: 5 to 16 quol_5 numeric Quality of life 5 items Num: 0 to 25 resttotn numeric Job restrictions Num: 0 to 4 tot_sc_e numeric Services for elderly Num: 0 to 9 n4pstu numeric Care level â0: No Care Levelâ â1: Care Level 1â â2: Care Level 2â â3: Care Level 3â â4: Care Level 3+â nur_pst numeric Care level â1: Care Level 1â â2: Care Level 2â â3: Care Level 3/3+â out
The out
option determines what will be done with the resulting variable documentation file. There are several options for out
:
file
option, saves that to CSV. kable Returns a knitr::kable()
latex Returns a LaTeX table. latexpage Returns an independently-buildable LaTeX document.
By default, vtable
will select âviewerâ if running in RStudio, and âbrowserâ otherwise. If itâs being built in an RMarkdown document with knitr
, it will default to âkableâ. Note that an RMarkdown default to âkableâ will also include some nice formatting, where out = 'kable'
directly will give you a more basic kable
you can format yourself.
data(LifeCycleSavings)
vtable(LifeCycleSavings)
vtable(LifeCycleSavings,out='browser')
vtable(LifeCycleSavings,out='viewer')
htmlcode <- vtable(LifeCycleSavings,out='htmlreturn')
vartable <- vtable(LifeCycleSavings,out='return')
#I can easily \input this into my LaTeX doc:
vt(LifeCycleSavings,out='latex',file='mytable1.tex')
file
The file
argument will write the variable documentation file to an HTML or LaTeX file and save it. Will automatically append âhtmlâ or âtexâ filetype if the filename does not include a period.
data(LifeCycleSavings)
vt(LifeCycleSavings,file='lifecycle_variabledocumentation')
labels
The labels
argument will attach variable labels to the variables in data
. If variable labels are embedded in data
and those labels are what you want, the labels
argument is unnecessary. Set labels='omit'
if there are embedded labels but you do not want them in the table.
labels
can be used in any one of three formats.
labels
as a vector
labels
can be set to be a vector of equal length to the number of variables in data
, and in the same order. NA
values can be used for padding if some variables do not have labels.
#Note that LifeCycleSavings has five variables
data(LifeCycleSavings)
#These variable labels are taken from help(LifeCycleSavings)
labs <- c('numeric aggregate personal savings',
'numeric % of population under 15',
'numeric % of population over 75',
'numeric real per-capita disposable income',
'numeric % growth rate of dpi')
vtable(LifeCycleSavings,labels=labs)
LifeCycleSavings Name Class Label Values sr numeric numeric aggregate personal savings Num: 0.6 to 21.1 pop15 numeric numeric % of population under 15 Num: 21.44 to 47.64 pop75 numeric numeric % of population over 75 Num: 0.56 to 4.7 dpi numeric numeric real per-capita disposable income Num: 88.94 to 4001.89 ddpi numeric numeric % growth rate of dpi Num: 0.22 to 16.71
labs <- c('numeric aggregate personal savings',NA,NA,NA,NA)
vtable(LifeCycleSavings,labels=labs)
LifeCycleSavings Name Class Label Values sr numeric numeric aggregate personal savings Num: 0.6 to 21.1 pop15 numeric NA Num: 21.44 to 47.64 pop75 numeric NA Num: 0.56 to 4.7 dpi numeric NA Num: 88.94 to 4001.89 ddpi numeric NA Num: 0.22 to 16.71 labels
as a two-column data set
labels
can be set to a two-column data set (any type will do) where the first column has the variable names, and the second column has the labels. The column names donât matter.
This approach does not require that every variable name in data
has a matching label.
#Note that LifeCycleSavings has five variables
#with names 'sr', 'pop15', 'pop75', 'dpi', and 'ddpi'
data(LifeCycleSavings)
#These variable labels are taken from help(LifeCycleSavings)
labs <- data.frame(nonsensename1 = c('sr', 'pop15', 'pop75'),
nonsensename2 = c('numeric aggregate personal savings',
'numeric % of population under 15',
'numeric % of population over 75'))
vt(LifeCycleSavings,labels=labs)
LifeCycleSavings Name Class Label Values sr numeric numeric aggregate personal savings Num: 0.6 to 21.1 pop15 numeric numeric % of population under 15 Num: 21.44 to 47.64 pop75 numeric numeric % of population over 75 Num: 0.56 to 4.7 dpi numeric NA Num: 88.94 to 4001.89 ddpi numeric NA Num: 0.22 to 16.71 labels
as a one-row data set
labels
can be set to a one-row data set in which the column names are the variable names in data
and the first row is the variable names. The labels
argument can take any data type including data frame, data table, tibble, or matrix, as long as it has a valid set of variable names stored in the colnames()
attribute.
This approach does not require that every variable name in data
has a matching label.
#Note that LifeCycleSavings has five variables
#with names 'sr', 'pop15', 'pop75', 'dpi', and 'ddpi'
data(LifeCycleSavings)
#These variable labels are taken from help(LifeCycleSavings)
labs <- data.frame(sr = 'numeric aggregate personal savings',
pop15 = 'numeric % of population under 15',
pop75 = 'numeric % of population over 75')
vtable(LifeCycleSavings,labels=labs)
LifeCycleSavings Name Class Label Values sr numeric numeric aggregate personal savings Num: 0.6 to 21.1 pop15 numeric numeric % of population under 15 Num: 21.44 to 47.64 pop75 numeric numeric % of population over 75 Num: 0.56 to 4.7 dpi numeric NA Num: 88.94 to 4001.89 ddpi numeric NA Num: 0.22 to 16.71 class
The class
flag will either report or not report the class of each variable in the resulting variable table. By default this is set to TRUE
.
values
The values
flag will either report or not report the values that each variable takes. Numeric variables will report a range, logicals will report âTRUE FALSEâ, and factor variables will report the first factor.limit
(default 5) factors listed.
If the variable is numeric but has value labels applied by the sjlabelled
package, vtable()
will find them and report the numeric-label crosswalk.
data(LifeCycleSavings)
vtable(LifeCycleSavings,values=FALSE)
LifeCycleSavings Name Class sr numeric pop15 numeric pop75 numeric dpi numeric ddpi numeric LifeCycleSavings Name Class Values sr numeric Num: 0.6 to 21.1 pop15 numeric Num: 21.44 to 47.64 pop75 numeric Num: 0.56 to 4.7 dpi numeric Num: 88.94 to 4001.89 ddpi numeric Num: 0.22 to 16.71
#CO2 contains factor variables
data(CO2)
vtable(CO2)
CO2 Name Class Values Plant ordered âQn1â âQn2â âQn3â âQc1â âQc2â and 7 more Type factor âQuebecâ âMississippiâ Treatment factor ânonchilledâ âchilledâ conc numeric Num: 95 to 1000 uptake numeric Num: 7.7 to 45.5
#efc contains labeled values
#Note that the original value labels do not easily tell you what numerical
#value each label maps to, but vtable() does.
library(sjlabelled)
data(efc)
vtable(efc)
efc Name Class Label Values c12hour numeric average number of hours of care per week Num: 4 to 168 e15relat numeric relationship to elder â1: spouse/partnerâ â2: childâ â3: siblingâ â4: daughter or son -in-lawâ â5: ancle/auntâ and 3 more e16sex numeric elderâs gender â1: maleâ â2: femaleâ e17age numeric elderâ age Num: 65 to 103 e42dep numeric elderâs dependency â1: independentâ â2: slightly dependentâ â3: moderately dependentâ â4: severely dependentâ c82cop1 numeric do you feel you cope well as caregiver? â1: neverâ â2: sometimesâ â3: oftenâ â4: alwaysâ c83cop2 numeric do you find caregiving too demanding? â1: Neverâ â2: Sometimesâ â3: Oftenâ â4: Alwaysâ c84cop3 numeric does caregiving cause difficulties in your relationship with your friends? â1: Neverâ â2: Sometimesâ â3: Oftenâ â4: Alwaysâ c85cop4 numeric does caregiving have negative effect on your physical health? â1: Neverâ â2: Sometimesâ â3: Oftenâ â4: Alwaysâ c86cop5 numeric does caregiving cause difficulties in your relationship with your family? â1: Neverâ â2: Sometimesâ â3: Oftenâ â4: Alwaysâ c87cop6 numeric does caregiving cause financial difficulties? â1: Neverâ â2: Sometimesâ â3: Oftenâ â4: Alwaysâ c88cop7 numeric do you feel trapped in your role as caregiver? â1: Neverâ â2: Sometimesâ â3: Oftenâ â4: Alwaysâ c89cop8 numeric do you feel supported by friends/neighbours? â1: neverâ â2: sometimesâ â3: oftenâ â4: alwaysâ c90cop9 numeric do you feel caregiving worthwhile? â1: neverâ â2: sometimesâ â3: oftenâ â4: alwaysâ c160age numeric carerâ age Num: 18 to 89 c161sex numeric carerâs gender â1: Maleâ â2: Femaleâ c172code numeric carerâs level of education â1: low level of educationâ â2: intermediate level of educationâ â3: high level of educationâ c175empl numeric are you currently employed? â0: noâ â1: yesâ barthtot numeric Total score BARTHEL INDEX Num: 0 to 100 neg_c_7 numeric Negative impact with 7 items Num: 7 to 28 pos_v_4 numeric Positive value with 4 items Num: 5 to 16 quol_5 numeric Quality of life 5 items Num: 0 to 25 resttotn numeric Job restrictions Num: 0 to 4 tot_sc_e numeric Services for elderly Num: 0 to 9 n4pstu numeric Care level â0: No Care Levelâ â1: Care Level 1â â2: Care Level 2â â3: Care Level 3â â4: Care Level 3+â nur_pst numeric Care level â1: Care Level 1â â2: Care Level 2â â3: Care Level 3/3+â missing
The missing
flag, set to TRUE, will report the number of missing values in each variable. Defaults to FALSE.
index
The index
flag will either report or not report the index number of each variable. Defaults to FALSE.
factor.limit
If values
is set to TRUE
, then factor.limit
limits the number of factors displayed on the variable table. factor.limit
is by default 5, to cut down on clutter. The table will include the phrase âand moreâ to indicate that some factors have been cut off.
Setting factor.limit=0
will include all factors. If values=FALSE
, factor.limit
does nothing.
char.values
If values
is set to TRUE
, then char.values = TRUE
instructs vtable
to list the values that character variables take, as though they were factors. If you only want some of the character variables to have their values listed, use a character vector to indicate which variables.
data(USJudgeRatings)
USJudgeRatings$Judge <- row.names(USJudgeRatings)
USJudgeRatings$SecondCharacter <- 'Less Interesting'
USJudgeRatings$ThirdCharacter <- 'Less Interesting Still!'
#Show values for all character variables
vtable(USJudgeRatings,char.values=TRUE)
#Or just for a subset
vtable(USJudgeRatings,char.values=c('Judge','SecondCharacter'))
data.title
, desc
, note
, and anchor
data.title
will include a data title in the variable documentation file. If not set manually, this will default to the variable name for data
.
desc
will include a description of the data set in the variable documentation file. This will by default include information on the number of observations and the number of columns. To remove this, set desc='omit'
, or include any description and then include âomitâ as the last four characters.
note
will add a table note in the last row. note.align
determines its left/right/center alignment, but is only used with LaTeX (see below).
anchor
will add an anchor ID (<a name =
in HTML or \label{}
in LaTeX) to allow other parts of your document to link to it, if you are including your vtable
in a larger document.
data.title
and desc
will only show up in full-page vtable
s. That is, you wonât get them with out = 'return'
, out = 'csv'
, or out = 'latex'
(although out = 'latexpage'
works). note
and anchor
will only show up in formats that support multi-column cells and anchoring, so they wonât work with out = 'return'
or out = 'csv'
.
out = 'kable'
is a half-exception in that it will use data.title
as the caption for the kable
, and will use the note
as a footnote, but wonât use desc
or anchor
.
library(vtable)
data(LifeCycleSavings)
vtable(LifeCycleSavings)
vtable(LifeCycleSavings,data.title='Intercountry Life-Cycle Savings Data',
desc='omit')
vtable(LifeCycleSavings,data.title='Intercountry Life-Cycle Savings Data',
desc='Data on the savings ratio 1960â1970. omit')
vtable(LifeCycleSavings,data.title='Intercountry Life-Cycle Savings Data',
desc='Data on the savings ratio 1960â1970',
note='Data from Belsley, Kuh, and Welsch (1980)')
col.width
vtable()
will select default column widths for the variable table depending on which measures (name, class, label, values, summ)
are included. col.width
, as a vector of percentage column widths on the 0-100 scale, will override these defaults.
library(sjlabelled)
data(efc)
#The variable names in this data set are pretty short, and the value labels are
#a little cramped, so let's move that over.
vtable(efc,col.width=c(10,10,40,40))
col.align
col.align
can be used to adjust text alignment in HTML output. Set to âleftâ, ârightâ, or âcenterâ to align all columns, or give a vector of column alignments to do each column separately.
If you want to get tricky, you can add a semicolon afterwards and keep putting in whatever CSS attributes you want. They will be applied to the whole column.
This option is only for HTML output and will only work with out
values of âbrowserâ, âviewerâ, or âhtmlreturnâ.
library(sjlabelled)
data(efc)
vtable(efc,col.align = 'right')
align
, note.align
, and fit.page
These options are used only with LaTeX output (out
is âlatexâ or âlatexpageâ).
align
and note.align
are single strings used for alignment. align
will be used as column alignment in standard LaTeX syntax, for example âlcccâ for the left column left-aligned and the other three centered. note.align
is an alignment note specifically for any table notes set with note
(or significance stars), which enters as part of a \multicolumn
argument. These both accept âp{}â and other LaTeX column types.
Defaults to left-aligned âVariableâ columns and right-aligned everything else. If col.widths
is specified, align
defaults to âp{}â columns, with widths set by col.width
.
fit.page
can be used to ensure that the table is a certain width, and will be used as an entry to a \resizebox{}
. Set to \\textwidth
to set the table to text width, or .9\\textwidth
for 90% of the page, and so on, or any recognized width value in LaTeX.
For all of these, be sure to escape special characters, in particular backslashes.
library(sjlabelled)
data(efc)
vtable(efc,align = 'p{.3\\textwidth}cc', fit.page = '\\textwidth', out = 'latex')
summ
summ
will calculate summary statistics for all variables that report valid output on the given summary statistics functions. summ
is very flexible. It takes a character vector in which each element is of the form function(x)
, where function(x)
is any function that takes a vector and returns a single numeric value. For example, summ=c('mean(x)','median(x)','mean(log(x))')
would calculate the mean, median, and mean of the log for each variable.
summ
treats as special two vtable
functions: propNA(x)
and countNA(x)
, which give the proportion and count of NA values, and the count of non-NA values in the variable, respectively. These two functions are always reported first, and are the only functions that include NA values in their calculations.
library(sjlabelled)
data(efc)
vtable(efc,summ=c('mean(x)','countNA(x)'))
efc Name Class Label Values Summary c12hour numeric average number of hours of care per week Num: 4 to 168 countNA: 6, mean: 42.399 e15relat numeric relationship to elder â1: spouse/partnerâ â2: childâ â3: siblingâ â4: daughter or son -in-lawâ â5: ancle/auntâ and 3 more countNA: 7 e16sex numeric elderâs gender â1: maleâ â2: femaleâ countNA: 7 e17age numeric elderâ age Num: 65 to 103 countNA: 17, mean: 79.121 e42dep numeric elderâs dependency â1: independentâ â2: slightly dependentâ â3: moderately dependentâ â4: severely dependentâ countNA: 7 c82cop1 numeric do you feel you cope well as caregiver? â1: neverâ â2: sometimesâ â3: oftenâ â4: alwaysâ countNA: 7 c83cop2 numeric do you find caregiving too demanding? â1: Neverâ â2: Sometimesâ â3: Oftenâ â4: Alwaysâ countNA: 6 c84cop3 numeric does caregiving cause difficulties in your relationship with your friends? â1: Neverâ â2: Sometimesâ â3: Oftenâ â4: Alwaysâ countNA: 6 c85cop4 numeric does caregiving have negative effect on your physical health? â1: Neverâ â2: Sometimesâ â3: Oftenâ â4: Alwaysâ countNA: 10 c86cop5 numeric does caregiving cause difficulties in your relationship with your family? â1: Neverâ â2: Sometimesâ â3: Oftenâ â4: Alwaysâ countNA: 6 c87cop6 numeric does caregiving cause financial difficulties? â1: Neverâ â2: Sometimesâ â3: Oftenâ â4: Alwaysâ countNA: 8 c88cop7 numeric do you feel trapped in your role as caregiver? â1: Neverâ â2: Sometimesâ â3: Oftenâ â4: Alwaysâ countNA: 8 c89cop8 numeric do you feel supported by friends/neighbours? â1: neverâ â2: sometimesâ â3: oftenâ â4: alwaysâ countNA: 7 c90cop9 numeric do you feel caregiving worthwhile? â1: neverâ â2: sometimesâ â3: oftenâ â4: alwaysâ countNA: 20 c160age numeric carerâ age Num: 18 to 89 countNA: 7, mean: 53.463 c161sex numeric carerâs gender â1: Maleâ â2: Femaleâ countNA: 7 c172code numeric carerâs level of education â1: low level of educationâ â2: intermediate level of educationâ â3: high level of educationâ countNA: 66 c175empl numeric are you currently employed? â0: noâ â1: yesâ countNA: 6 barthtot numeric Total score BARTHEL INDEX Num: 0 to 100 countNA: 25, mean: 64.547 neg_c_7 numeric Negative impact with 7 items Num: 7 to 28 countNA: 16, mean: 11.85 pos_v_4 numeric Positive value with 4 items Num: 5 to 16 countNA: 27, mean: 12.477 quol_5 numeric Quality of life 5 items Num: 0 to 25 countNA: 11, mean: 14.369 resttotn numeric Job restrictions Num: 0 to 4 countNA: 0, mean: 0.329 tot_sc_e numeric Services for elderly Num: 0 to 9 countNA: 0, mean: 1.014 n4pstu numeric Care level â0: No Care Levelâ â1: Care Level 1â â2: Care Level 2â â3: Care Level 3â â4: Care Level 3+â countNA: 9 nur_pst numeric Care level â1: Care Level 1â â2: Care Level 2â â3: Care Level 3/3+â countNA: 419 lush
The default vtable
settings may not be to your liking, and in particular you may prefer more information. Setting lush = TRUE
is an easy way to get more information. It will force char.values
and missing
to TRUE
, and will also set a default summ
value of c('mean(x)', 'sd(x)', 'nuniq(x)')
.
opts
You can create a named list where the names are the above options and the values are the settings for those options, and input it into vtable
using opts=
. This is an easy way to set the same options for many vtable
s.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4