I-STUDIO is an automatic assessment dataset for both NLP and statistics education use. It is built around an instrument that assesses the statistical reasoning skills of college students, and it derives from an investigation of that instrument, the Introductory Statistics Transfer of Understanding and Discernment Outcomes (I-STUDIO) assessment. The instrument consists of three scenarios, each followed by two short-answer question prompts. A detailed rubric was used for assessment, placing each answer on a 3-way scale: incorrect, partially correct, or correct. The rubric provides an intensional specification of the criteria for each correctness value, along with an extensional specification consisting of example student answers. A reference answer is shown, along with two student answers: one that is correct (2) and one that is partially correct (1).

The data were collected from a sample of 1,935 students (from colleges across the USA and elsewhere) who completed the I-STUDIO instrument, which included 6 open-ended questions with one or more parts to each question, while describing their thinking out loud. For the reliability study, the I-STUDIO investigator provided two sets of labels: his original labels from 2015 and a new labeling he applied in 2021. The other two raters were graduate students in statistics, trained by the investigator.

The data is all de-identified. The original study had IRB approval, and that IRB confirmed that the data can now be made public for research purposes in this de-identified form.

To assemble the final Col-STAT dataset for NLP research, we performed data cleaning to eliminate non-answers and responses that were unusually long (greater than 125 word tokens). The cleaned data was then partitioned into training, validation, and test sets in the proportion 8:1:1.
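The cleaning and partitioning steps can be illustrated with a minimal sketch. This is not the authors' code. It assumes that each response is a dictionary with hypothetical fields `text` and `human_graded`, that a "non-answer" is an empty or whitespace-only response, and that word tokens are whitespace-separated. The test set is seeded with the human-graded responses, as the description goes on to explain, and then filled at random.

```python
import random

MAX_TOKENS = 125  # length cutoff stated in the dataset description


def clean(responses):
    """Keep responses with 1..MAX_TOKENS whitespace tokens.

    Treating empty text as the only kind of non-answer is an assumption.
    """
    return [r for r in responses if 0 < len(r["text"].split()) <= MAX_TOKENS]


def partition(responses, seed=0):
    """Split 8:1:1, starting the test set from the human-graded responses."""
    n_test = round(len(responses) * 0.1)
    n_val = round(len(responses) * 0.1)

    # Seed the test set with every response that human raters graded.
    test = [r for r in responses if r["human_graded"]]
    rest = [r for r in responses if not r["human_graded"]]

    # Shuffle the remainder reproducibly, then fill test, validation, train.
    rng = random.Random(seed)
    rng.shuffle(rest)
    n_fill = max(0, n_test - len(test))
    test = test + rest[:n_fill]
    val = rest[n_fill:n_fill + n_val]
    train = rest[n_fill + n_val:]
    return train, val, test
```

Under these assumptions, a cleaned pool of 6,272 responses (the sum of the reported set sizes) yields 5,018 training, 627 validation, and 627 test items, which is consistent with the sizes reported for the dataset.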
The test set was initialized to contain all the responses that human raters had graded, so that at test time, inter-rater agreement between model predictions and the reliable human labels can be computed. The remaining data was randomly partitioned to complete the split, resulting in training (N=5,018), validation (N=627), and test (N=627) sets.

The question_info.csv file contains detailed information for each question. It consists of the following columns:

* q_id: The unique identifier for each question.
* question_context: The context or background information provided for the question.
* question_text: The actual text of the question.
* reference_list: A list of 12 reference answers for each question. This list is divided into:
  - Four completely correct answers,
  - Four partially correct answers,
  - Four incorrect answers.
* correct_reference_list: A subset of the reference list containing the four completely correct reference answers (included in reference_list).
* partially_correct_reference_list: A subset of the reference list containing the four partially correct reference answers (included in reference_list).
* incorrect_reference_list: A subset of the reference list containing the four incorrect reference answers (included in reference_list).

Additionally, IStudioFinalQualtrics.pdf contains the details of the Qualtrics survey, while istudioRubricExcerpt.pdf provides the assessment rubric for each question in the I-STUDIO dataset.

The work related to the I-STUDIO dataset has been published in [1].

References

[1] Lloyd, Susan, Beckman, Matthew D., Pearl, Dennis K., Passonneau, Rebecca J., Li, Zhaohui, and Wang, Zekun. Foundations for AI-Assisted Formative Assessment Feedback for Short-Answer Tasks in Large-Enrollment Classes. Rosario, Argentina, September 2022.