Title: | Stan Models for Item Response Theory |
---|---|
Description: | Provides convenience functions and pre-programmed Stan models related to item response theory. Its purpose is to make fitting common item response theory models using Stan easy. |
Authors: | Daniel C. Furr |
Maintainer: | Daniel C. Furr <[email protected]> |
License: | BSD_3_clause + file LICENSE |
Version: | 1.0.7 |
Built: | 2025-03-07 00:18:45 UTC |
Source: | https://github.com/danielcfurr/edstan |
edstan aims to make it easy to fit standard item response theory models using rstan.
A user will generally want to use the following functions, in order, to fit a model: irt_data to format the data, irt_stan to fit a model, and print_irt_stan to view some results. Additionally, labelled_integer is sometimes helpful for data formatting, and stan_columns_plot creates plots of convergence and other statistics by parameter vector. The package also includes six Stan models (see irt_stan for a list) and two example datasets (aggression and spelling).
It is expected that once users are comfortable fitting the predefined edstan models, they will write their own Stan models and fit them with stan, for which irt_stan is a wrapper.
Case studies for each of the edstan models have been published.
Rasch and two-parameter logistic models
http://mc-stan.org/documentation/case-studies/rasch_and_2pl.html
(Generalized) partial credit model
http://mc-stan.org/documentation/case-studies/pcm_and_gpcm.html
(Generalized) rating scale model
http://mc-stan.org/documentation/case-studies/rsm_and_grsm.html
Item response data regarding verbal aggression from 316 persons and 24 items. Participants were instructed to imagine four frustrating scenarios in which either another person or oneself is to blame. For each scenario, they responded "yes", "perhaps", or "no" regarding whether they would react by cursing, scolding, or shouting. They also responded whether they would want to engage in those three behaviors, resulting in a total of six items per scenario. An example item is, "A bus fails to stop for me. I would want to curse."
aggression
A long-form data.frame (one row per item response) with the following columns:
Integer person identifier.
Integer item identifier.
Original, polytomous response. 0 indicates "no", 1 "perhaps", and 2 "yes".
Dichotomized response. 0 indicates "no" and 1 indicates "perhaps" or "yes".
Brief description of the item.
Trait anger score for a person.
Indicator for whether person is male.
Indicator for whether item concerns actually doing the behavior instead of wanting to do it.
Indicator for whether item concerns another person being to blame instead of self to blame.
Indicator for whether item concerns scolding behavior instead of cursing or shouting.
Indicator for whether item concerns shouting behavior instead of cursing or scolding.
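The dichotomized response is a simple recoding of the polytomous one, and the 24 items come from crossing four scenarios with two modes (wanting versus doing) and three behaviors. A minimal base-R sketch of both ideas follows; the response vector is made up rather than taken from the dataset, and the design column names are illustrative assumptions, not the dataset's own column names.

```r
# Hypothetical polytomous responses: 0 = "no", positive codes = "perhaps" or "yes"
poly <- c(0, 1, 2, 0, 2, 1)

# Dichotomize: "no" stays 0, "perhaps" and "yes" both become 1
dich <- as.integer(poly > 0)
print(dich)

# Item design: 4 scenarios x 2 modes (want/do) x 3 behaviors = 24 items
# (illustrative column names, not those of the aggression data.frame)
design <- expand.grid(
  behavior = c("curse", "scold", "shout"),
  mode = c("want", "do"),
  scenario = 1:4
)
nrow(design)                 # 24 items in total
table(design$scenario)       # 6 items per scenario
```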
Vansteelandt, K. (2000). Formal models for contextualized personality psychology. Unpublished doctoral dissertation. K. U. Leuven, Belgium.
De Boeck, P. and Wilson, M. (2004). Explanatory Item Response Models. New York: Springer.
Create a Stan data list from an item response matrix or from long-form data.
irt_data(response_matrix = matrix(), y = integer(), ii = integer(), jj = integer(), covariates = data.frame(), formula = ~1)
response_matrix | An item response matrix. Columns represent items and rows represent persons. NA may be supplied for missing responses. The lowest score for each item should be 0, with the exception of rating scale models. |
y | A vector of scored responses for long-form data. The lowest score for each item should be 0, with the exception of rating scale models. NAs are not permitted, but missing responses may simply be omitted instead. Required if |
ii | A vector indexing the items in y. |
jj | A vector indexing the persons in y. |
covariates | An optional data frame containing (only) person covariates. It must contain one row per person or be of the same length as y. |
formula | An optional formula for the latent regression that is applied to covariates. |
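The formula argument turns person covariates into a latent-regression design matrix. The internals of irt_data are not shown here; as an assumption, the sketch below uses base R's model.matrix(), which is the standard way an R formula such as ~ 1 + male*anger is expanded into columns. The covariate values are made up.

```r
# Hypothetical person covariates (one row per person)
covariates <- data.frame(
  male = c(1, 0, 1, 0),
  anger = c(20, 18, 25, 22)
)

# Expand the formula: intercept, two main effects, and their interaction
W <- model.matrix(~ 1 + male * anger, data = covariates)
colnames(W)   # "(Intercept)" "male" "anger" "male:anger"
dim(W)        # one row per person, one column per regression coefficient
```

This also matches the spirit of the package's alternative example, in which a hand-built matrix W with an intercept column is passed together with formula = NULL.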
A data list suitable for irt_stan.
See labelled_integer for a means of creating appropriate inputs for ii and jj.
See irt_stan to fit a model to the data list.
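The relationship between the two input forms can be sketched in base R. The helper wide_to_long() below is hypothetical and not part of edstan; it stacks a persons-by-items response matrix into the y, ii and jj vectors that the long-form arguments expect, dropping missing responses, in line with the note that missing long-form responses may simply be omitted.

```r
# Hypothetical helper, not an edstan function: convert a persons x items
# response matrix into long-form vectors y (response), ii (item), jj (person)
wide_to_long <- function(response_matrix) {
  response_matrix <- as.matrix(response_matrix)
  # Column-major order: all persons for item 1, then all persons for item 2, ...
  ii <- as.vector(col(response_matrix))
  jj <- as.vector(row(response_matrix))
  y <- as.vector(response_matrix)
  keep <- !is.na(y)  # missing responses are omitted rather than kept as NA
  list(y = y[keep], ii = ii[keep], jj = jj[keep])
}

# Three persons (rows), two items (columns); person 3 skipped item 1
resp <- matrix(c(1, 0, NA, 0, 1, 1), nrow = 3)
long <- wide_to_long(resp)
str(long)
```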
# For a response matrix ("wide-form" data) with person covariates:
spelling_list <- irt_data(response_matrix = spelling[, 2:5],
                          covariates = spelling[, "male", drop = FALSE],
                          formula = ~ 1 + male)
# Alternatively, the same may be created by:
W <- cbind(intercept = 1, male = spelling[, "male"])
spelling_list <- irt_data(response_matrix = spelling[, 2:5],
                          covariates = W,
                          formula = NULL)
# For long-form data (one row per item-person pair):
agg_list_1 <- irt_data(y = aggression$poly,
                       ii = aggression$item,
                       jj = aggression$person)
# Add a latent regression and use labelled_integer() with the items
agg_list_2 <- irt_data(y = aggression$poly,
                       ii = labelled_integer(aggression$description),
                       jj = aggression$person,
                       covariates = aggression[, c("male", "anger")],
                       formula = ~ 1 + male*anger)
Estimate an item response model with Stan
irt_stan(data_list, model = "", ...)
data_list | A Stan data list created with irt_data. |
model | The file name for one of the provided .stan files, or alternatively, a user-created .stan file that accepts |
... | Additional options passed to stan. |
The following table lists the models included in edstan along with the associated .stan files. The file name is supplied as the model argument.
Model | File |
---|---|
Rasch | rasch_latent_reg.stan |
Partial credit | pcm_latent_reg.stan |
Rating scale | rsm_latent_reg.stan |
Two-parameter logistic | 2pl_latent_reg.stan |
Generalized partial credit | gpcm_latent_reg.stan |
Generalized rating scale | grsm_latent_reg.stan |
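For orientation, the models in the table differ in how they map a person ability theta and item parameters to response probabilities. The sketch below uses standard textbook parameterizations of the Rasch, 2PL, partial credit, rating scale, and generalized partial credit models; the exact parameterization inside each edstan .stan file may differ (print the files as in the examples to check), and all function names here are hypothetical.

```r
# Rasch: P(y = 1) = logistic(theta - beta)
p_rasch <- function(theta, beta) plogis(theta - beta)

# 2PL: adds an item discrimination alpha (one common parameterization)
p_2pl <- function(theta, alpha, beta) plogis(alpha * (theta - beta))

# Partial credit: P(y = k) is proportional to exp(sum_{s <= k} (theta - delta_s)),
# where the empty sum for k = 0 is 0. Returns probabilities for categories 0..K.
p_pcm <- function(theta, delta) {
  numer <- exp(cumsum(c(0, theta - delta)))
  numer / sum(numer)
}

# Rating scale: step difficulties are beta + kappa_s, with the thresholds
# kappa shared by all items
p_rsm <- function(theta, beta, kappa) p_pcm(theta, beta + kappa)

# Generalized partial credit: each (theta - delta_s) is scaled by alpha.
# A generalized rating scale item would use delta = beta + kappa here.
p_gpcm <- function(theta, alpha, delta) {
  numer <- exp(cumsum(c(0, alpha * (theta - delta))))
  numer / sum(numer)
}

p_rasch(0, 0)          # 0.5
p_pcm(0, c(-1, 1))     # probabilities for scores 0, 1, 2
```

Setting alpha = 1 recovers the non-generalized model, and a partial credit item with a single step reduces to the Rasch model.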
Three simplified models are also available: rasch_simple.stan, pcm_simple.stan, and rsm_simple.stan. These are, respectively, the Rasch, partial credit, and rating scale models without the latent regression. There is no reason to use these instead of the models listed above, because the above models allow for, rather than require, the inclusion of covariates for a latent regression. Instead, the purpose of the simplified models is to provide a straightforward starting point for researchers who wish to craft their own Stan models.
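The latent regression is what separates the *_latent_reg.stan files from the simplified ones: as it is usually formulated, each person's ability is normal with a mean given by that person's covariates, theta_j ~ Normal(w_j' lambda, sigma). The sketch below simulates Rasch data under that assumption. The values of lambda, sigma, and the item difficulties are arbitrary, and the formulation is the common textbook one rather than a transcription of the edstan files.

```r
set.seed(1)

n_persons <- 500
beta <- c(-1, 0, 1)       # illustrative item difficulties
lambda <- c(0.2, 0.5)     # latent regression: intercept and covariate effect
sigma <- 1                # residual SD of the abilities

# Person design matrix: intercept plus one binary covariate
male <- rbinom(n_persons, 1, 0.5)
W <- cbind(intercept = 1, male = male)

# Latent regression: theta_j ~ Normal(W[j, ] %*% lambda, sigma)
theta <- rnorm(n_persons, mean = drop(W %*% lambda), sd = sigma)

# Rasch responses: P(y_ij = 1) = logistic(theta_j - beta_i)
p <- plogis(outer(theta, beta, "-"))
y <- matrix(rbinom(length(p), 1, p), nrow = n_persons)

# The group difference in mean ability approximates lambda[2]
mean(theta[male == 1]) - mean(theta[male == 0])
```

With only an intercept column in W, every person shares the same ability mean, which is the situation the simplified models describe.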
A stanfit-class object.
See stan, for which this function is a wrapper, for additional options.
See irt_data and labelled_integer for functions that facilitate creating a suitable data_list.
See print_irt_stan and print.stanfit for ways of getting tables summarizing parameter posteriors.
# List the Stan models included in edstan
folder <- system.file("extdata", package = "edstan")
dir(folder, "\\.stan$")
# List the contents of one of the .stan files
rasch_file <- system.file("extdata/rasch_latent_reg.stan", package = "edstan")
cat(readLines(rasch_file), sep = "\n")
## Not run:
# Fit the Rasch and 2PL models on wide-form data with a latent regression
spelling_list <- irt_data(response_matrix = spelling[, 2:5],
                          covariates = spelling[, "male", drop = FALSE],
                          formula = ~ 1 + male)
rasch_fit <- irt_stan(spelling_list, iter = 300, chains = 4)
print_irt_stan(rasch_fit, spelling_list)
twopl_fit <- irt_stan(spelling_list, model = "2pl_latent_reg.stan",
                      iter = 300, chains = 4)
print_irt_stan(twopl_fit, spelling_list)
# Fit the rating scale and partial credit models without a latent regression
agg_list_1 <- irt_data(y = aggression$poly,
                       ii = labelled_integer(aggression$description),
                       jj = aggression$person)
fit_rsm <- irt_stan(agg_list_1, model = "rsm_latent_reg.stan",
                    iter = 300, chains = 4)
print_irt_stan(fit_rsm, agg_list_1)
fit_pcm <- irt_stan(agg_list_1, model = "pcm_latent_reg.stan",
                    iter = 300, chains = 4)
print_irt_stan(fit_pcm, agg_list_1)
# Fit the generalized rating scale and partial credit models including
# a latent regression
agg_list_2 <- irt_data(y = aggression$poly,
                       ii = labelled_integer(aggression$description),
                       jj = aggression$person,
                       covariates = aggression[, c("male", "anger")],
                       formula = ~ 1 + male*anger)
fit_grsm <- irt_stan(agg_list_2, model = "grsm_latent_reg.stan",
                     iter = 300, chains = 4)
print_irt_stan(fit_grsm, agg_list_2)
fit_gpcm <- irt_stan(agg_list_2, model = "gpcm_latent_reg.stan",
                     iter = 300, chains = 4)
print_irt_stan(fit_gpcm, agg_list_2)
## End(Not run)
Transform a vector into consecutive integers
labelled_integer(x = vector())
x |
A vector, which may be numeric, string, or factor. |
A vector of integers corresponding to entries in x. The lowest value will be 1, and the greatest value will equal the number of unique elements in x. The elements of the recoded vector are named according to the original values of x.
The result is suitable for the ii and jj arguments of irt_data.
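The recoding described above can be approximated in base R. The function below is a hypothetical stand-in, not edstan's implementation; in particular, it assumes the integers follow the sorted order of the unique values (or the level order for a factor), which may not match the order edstan uses.

```r
# Hypothetical stand-in for labelled_integer(): map values to 1..n_unique
# and name each element by its original value
labelled_integer_sketch <- function(x) {
  # factor() keeps only values that actually occur, sorted
  # (or in existing level order for a factor input)
  f <- factor(x)
  out <- as.integer(f)
  names(out) <- as.character(x)
  out
}

a <- labelled_integer_sketch(c("owl", "cat", "pony", "cat"))
a   # owl = 2, cat = 1, pony = 3, cat = 1

b <- labelled_integer_sketch(rep(c(22, 57, 13), times = 2))
b   # numeric values are sorted numerically: 13 -> 1, 22 -> 2, 57 -> 3
```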
x <- c("owl", "cat", "pony", "cat")
labelled_integer(x)
y <- as.factor(x)
labelled_integer(y)
z <- rep(c(22, 57, 13), times = 2)
labelled_integer(z)
View a table of selected parameter posteriors after using irt_stan
print_irt_stan(fit, data_list = NULL, ...)
fit | A stanfit-class object created by irt_stan. |
data_list | An optional Stan data list created with irt_data. |
... | Additional options passed to |
# Make a suitable data list:
spelling_list <- irt_data(response_matrix = spelling[, 2:5],
                          covariates = spelling[, "male", drop = FALSE],
                          formula = ~ 1 + male)
## Not run:
# Fit a latent regression 2PL
twopl_fit <- irt_stan(spelling_list, model = "2pl_latent_reg.stan",
                      iter = 300, chains = 4)
# Get a table of parameter posteriors
print_irt_stan(twopl_fit, spelling_list)
# Or
print_irt_stan(twopl_fit)
## End(Not run)
Item response data regarding student spelling performance on four words: infidelity, panoramic, succumb, and girder. The sample includes 284 male and 374 female undergraduate students from the University of Kansas. Each item was scored as either correct or incorrect.
spelling
A wide-form data.frame (one row per person) with the following columns:
Indicator for whether person is male.
Indicator for whether person spelled infidelity correctly.
Indicator for whether person spelled panoramic correctly.
Indicator for whether person spelled succumb correctly.
Indicator for whether person spelled girder correctly.
Thissen, D., Steinberg, L. and Wainer, H. (1993). Detection of Differential Item Functioning Using the Parameters of Item Response Models. In Differential Item Functioning, edited by P. Holland and H. Wainer, 67-114. Hillsdale, NJ: Lawrence Erlbaum.
View a plot of summary statistics after using irt_stan
stan_columns_plot(fit, stat = "Rhat", ...)
fit | A stanfit-class object created by irt_stan. |
stat | A string for the statistic from the |
... | Additional options (such as |
A ggplot object.
See stan_rhat, which provides a histogram of Rhat statistics.
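stan_columns_plot() plots statistics such as Rhat by parameter vector. To make concrete what Rhat measures, here is the classic Gelman-Rubin potential scale reduction factor computed from a matrix of draws (iterations by chains). This is a simplified, non-split version for illustration only; rstan reports a split-chain variant, so its values will not match this one exactly. The function name is hypothetical.

```r
# Classic (non-split) potential scale reduction factor.
# draws: matrix with one column per chain and one row per iteration
rhat_classic <- function(draws) {
  n <- nrow(draws)
  chain_means <- colMeans(draws)
  W <- mean(apply(draws, 2, var))   # mean within-chain variance
  B <- n * var(chain_means)         # between-chain variance
  var_hat <- (n - 1) / n * W + B / n
  sqrt(var_hat / W)
}

set.seed(42)
# Four well-mixed chains drawn from the same distribution: Rhat close to 1
mixed <- matrix(rnorm(1000 * 4), ncol = 4)
rhat_classic(mixed)

# Four chains stuck around different values: Rhat well above 1
stuck <- sweep(matrix(rnorm(1000 * 4), ncol = 4), 2, c(0, 2, 4, 6), "+")
rhat_classic(stuck)
```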
# Make a suitable data list:
spelling_list <- irt_data(response_matrix = spelling[, 2:5],
                          covariates = spelling[, "male", drop = FALSE],
                          formula = ~ 1 + male)
## Not run:
# Fit a latent regression 2PL
twopl_fit <- irt_stan(spelling_list, model = "2pl_latent_reg.stan",
                      iter = 300, chains = 4)
# Get a plot showing Rhat statistics
stan_columns_plot(twopl_fit)
# Get a plot showing number of effective draws
stan_columns_plot(twopl_fit, stat = "n_eff")
## End(Not run)