Title: Reliable Rubric-Based Assessment of Physics Lab Reports: Data for Machine Learning
Authors: Passonneau, Rebecca J.; Li, Zhaohui; Atil, Berk; Koenig, Kathleen M.
Release Date: November 2022

Abstract: This dataset documents an assessment of de-identified lab reports written by students in a large-enrollment, inquiry-oriented introductory physics class. Students are asked to write four lab reports during the one-semester course. We conducted a post-hoc reliability study of the grades on two of the lab reports from several semesters. To achieve reliable assessments, we created assessment rubrics that are similar to, but much more detailed than, the rubrics that define the expectations for students. The dataset consists of plain-text versions of the discussion and conclusion sections of the submitted reports, the grades assigned by TAs, and the assessment scores provided by the reliable raters.

=====================================================================
Two Lab Reports from Multiple Years of a Physics Curriculum
---------------------------------------------------------------------
The lab reports come from two years of a large-enrollment college physics course that integrates scientific reasoning and science writing, with four increasingly difficult lab reports per semester.

Lab Reports
---------------------------------------------------------------------
The first two of the four labs given in a semester are used in this work. The first lab concerned the motion of a pendulum: students were required to analyze the effects of the length of the string, the mass of the bob at the end of the string, and the release angle. In the second lab, students studied the effect of the applied force on the acceleration of a system, that is, Newton's second law. Each original lab report consisted of an introduction, graphs and tables reporting the results of individual experiments, and a discussion and conclusion.
For this dataset, we extracted the discussion and conclusion sections from the submitted Word documents and converted equations into LaTeX expressions. In total, there are 1,078 Pendulum lab reports and 1,005 Newton lab reports. There are also two grading rubrics, one with 7 dimensions (Pendulum lab) and one with 8 dimensions (Newton lab). Each dimension was rated on a 6-point scale (0 to 5).

Annotation Process
---------------------------------------------------------------------
First, two very detailed grading rubrics were created to minimize subjectivity in grading. Four advanced undergraduate students were hired as raters; each had taken from one to three college-level physics courses. They were trained in practice phases using reserved reports. After each phase, inter-rater agreement was measured and the rubrics were revised to increase consistency.

Data Structure
---------------------------------------------------------------------
The data is separated into two folders by lab type: the "newton" folder contains the reports for the second lab, and the "pendulum" folder contains the reports for the first lab. Lab reports are in "txt" format and contain only text. The labels of the reports are provided in two CSV files (NewtonLab.csv and PendulumLab.csv).
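A minimal Python sketch of how one might load one lab's reports and labels and join them. It assumes (none of this is stated in the dataset description) that each report's file name without the ".txt" extension equals its ID value, that the CSV headers match the field names listed below exactly (e.g. "Dimension 1"), that the rubric scores are stored as whole numbers, and that the files are UTF-8 encoded. The helper names (load_reports, load_labels, pair_reports_with_labels) are hypothetical.

```python
import csv
from pathlib import Path

# Rubric dimensions per lab: the Pendulum (first) lab rubric has 7,
# the Newton (second) lab rubric has 8.
NUM_DIMENSIONS = {"pendulum": 7, "newton": 8}
LABEL_FILES = {"pendulum": "PendulumLab.csv", "newton": "NewtonLab.csv"}


def load_reports(root, lab):
    """Read every .txt report in <root>/<lab>/ into a dict keyed by file stem."""
    folder = Path(root) / lab
    return {
        path.stem: path.read_text(encoding="utf-8")
        for path in sorted(folder.glob("*.txt"))
    }


def load_labels(root, lab):
    """Read one lab's label CSV and parse the rubric scores as integers."""
    rows = []
    with open(Path(root) / LABEL_FILES[lab], newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            for d in range(1, NUM_DIMENSIONS[lab] + 1):
                key = f"Dimension {d}"
                score = int(row[key])
                # Each dimension is rated on a 6-point scale (0 to 5).
                if not 0 <= score <= 5:
                    raise ValueError(f"{row['ID']}: {key} = {score} is out of range")
                row[key] = score
            rows.append(row)
    return rows


def pair_reports_with_labels(root, lab):
    """Join label rows to report text (assumes file stem == ID field)."""
    reports = load_reports(root, lab)
    return [
        (row, reports[row["ID"]])
        for row in load_labels(root, lab)
        if row["ID"] in reports
    ]
```

Because a report can be assessed by several raters and may also carry a GT row, the join yields one (label row, text) pair per CSV row rather than one per report; filtering on the source or status fields is one way to keep a single label per report.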
The fields are:

  Field           Type     Description
  ID              string   identifier of the lab report
  major           string   major of the student
  Dimension 1     integer  score for the first dimension of the rubric
  Dimension 2     integer  score for the second dimension of the rubric
  Dimension 3     integer  score for the third dimension of the rubric
  Dimension 4     integer  score for the fourth dimension of the rubric
  Dimension 5     integer  score for the fifth dimension of the rubric
  Dimension 6     integer  score for the sixth dimension of the rubric
  Dimension 7     integer  score for the seventh dimension of the rubric
  Dimension 8     integer  score for the eighth dimension of the rubric
                           (present only for the second lab)
  source          string   identifier of the source of the assessment score:
                           one of the four original raters (AB, AW, CZ, V),
                           or a ground truth label (GT) based on the majority
                           among the N raters who assessed the same report
  status          string   whether a single rater assessed the report
                           (single), multiple raters did (common), or a
                           consensus was reached through group discussion
                           with researchers (discussion)
  assign          integer  indicator of which of N batches (assignments)
                           produced the assessment
  Report          string   lab type
  Semester        string   semester in which the lab was held
  TA              string   identifier of the teaching assistant (TA)
  Reported Score  integer  grade assigned by the TA
  Max Score       integer  maximum grade (from the TA) for this report
  RescaledTA      float    TA grade rescaled to [0,1]
  RescaledRater   float    rater grade rescaled to [0,1]

Authors
---------------------------------------------------------------------
Rebecca J. Passonneau, Professor
Department of Computer Science and Engineering, Pennsylvania State University, State College, PA, USA

Zhaohui Li, Ph.D. Student
Department of Computer Science and Engineering, Pennsylvania State University, State College, PA, USA

Berk Atil, Ph.D. Student
Department of Computer Science and Engineering, Pennsylvania State University, State College, PA, USA

Kathleen M. Koenig, Professor
Department of Physics, University of Cincinnati

Acknowledgments
---------------------------------------------------------------------