I-STUDIO is an automatic assessment dataset for both NLP and statistics education research. It consists of student responses to an instrument that assesses the statistical reasoning skills of college students: The Introductory Statistics Transfer of Understanding and Discernment Outcomes (I-STUDIO). The instrument comprises three scenarios, each followed by two short-answer question prompts. A detailed rubric was used for assessment, placing each answer on a three-way scale of {incorrect, partially correct, correct}. The rubric provides an intensional specification of the criteria for each correctness value, along with an extensional specification consisting of example student answers: for each question, a reference answer is given, together with two student answers, one correct (2) and one partially correct (1).

The data collection used a sample of 1,935 students (from colleges across the USA and elsewhere) who completed the I-STUDIO instrument, which included six open-ended questions with one or more parts each, while describing their thinking out loud. For the reliability study, the I-STUDIO investigator provided two sets of labels: his original labels from 2015 and a new labeling he applied in 2021. The other two raters were graduate students in statistics, trained by the investigator. The data is fully de-identified; the original study had IRB approval, and that IRB confirmed that the data can now be made public for research purposes in this de-identified form.

To assemble the final Col-STAT dataset for NLP research, we performed data cleaning to eliminate non-answers and responses that were unusually long (more than 125 word tokens). The cleaned data was then partitioned into training, validation, and test sets in the proportion 8:1:1.
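The cleaning step can be sketched as a simple length filter (a minimal sketch: whitespace tokenization and the treatment of empty strings as non-answers are assumptions here, not necessarily the authors' exact procedure):

```python
def clean_responses(responses):
    """Drop non-answers (empty after tokenization) and unusually
    long responses (more than 125 word tokens)."""
    cleaned = []
    for text in responses:
        tokens = text.split()  # assumed whitespace tokenization
        if not tokens:         # non-answer
            continue
        if len(tokens) > 125:  # unusually long
            continue
        cleaned.append(text)
    return cleaned
```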
The test set was initialized to contain all the responses that the human raters had graded, so that at test time, inter-rater agreement between model predictions and the reliable human labels can be computed. The remaining data was then randomly partitioned, resulting in the training (N=5,018), validation (N=627), and test (N=627) sets.
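The seeded split can be sketched as follows (a minimal sketch: the function name, the seed handling, and the assumption that the multiply-graded responses fit within the 10% test share are all hypothetical, not taken from the source):

```python
import random

def split_dataset(graded, ungraded, seed=0):
    """Seed the test set with all human-graded responses, then
    randomly partition the rest toward an overall 8:1:1 split.
    Assumes len(graded) does not exceed the 10% test share."""
    rng = random.Random(seed)
    rest = list(ungraded)
    rng.shuffle(rest)
    n_total = len(graded) + len(rest)
    n_test = n_total // 10
    n_val = n_total // 10
    # Top up the test set from the shuffled pool, then carve out
    # validation; everything left is training data.
    n_top_up = n_test - len(graded)
    test = list(graded) + rest[:n_top_up]
    val = rest[n_top_up:n_top_up + n_val]
    train = rest[n_top_up + n_val:]
    return train, val, test
```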