Knox Gifted Academy Absentee
11/16/2023

Using different variables, about how reliably can we predict test scores? For this project, the answer was around 40% reliable.

This project uses three different datasets to predict students' ScaleScore on the Tennessee Comprehensive Assessment Program (TCAP) exam. The data sources are NWEA MAP data, attendance data, and TCAP data from the years 22. NWEA MAP is a benchmarking test series that provides a snapshot of student progress through a grade level and is also used to predict achievement levels. The attendance data shows how many days a student attended school in a year and includes demographic fields that supply additional variables for multiple regression analysis. Finally, the TCAP data includes each student's ScaleScore. The data files are merged using the student's state ID as the primary key.

When cleaning the data, I did some of it manually in Excel by removing unnecessary columns (more than 100 across the original files) and creating a new column in the absentee data files for the number of days present. I did this to ensure that any identifying information was removed from the files and to get a more accurate picture of the relationship between attendance and Scale Scores. While best practice is to do all cleaning in code and to work with fewer, consistent files, I only needed one-off files for this project, so it was easier to clean them in Excel than to script everything in Python. This method may not be ideal for automation, but it suited the project's purposes, and I did the remaining cleaning in Python.

When I tried to merge the files together, I ran into some challenges because I overcomplicated the process. At first I thought an outer join would work; then I considered a left join. Both approaches resulted in a lot of NaN values and over 140k rows of data. I eventually realized that I only needed to merge the Attendance, MAP, and TCAP files together after concatenating them, because the year of the test was not important for the final result.

To prepare the data for analysis, I had to make some changes. First, I converted the Course and ContentAreaCode columns from strings into numeric data. Next, I removed rows for Science and Social Studies because my analysis was only interested in Math and English scores. After that, I realized that I didn't need both the Course and ContentAreaCode columns, so I deleted the latter. I also encoded Math scores as 1 and English scores as 2. However, the Course column showed almost no correlation with the target variable (Pearson r of about 0.014), so I ultimately excluded it from the analysis.

In the visualization stage of the project, I computed the Pearson correlation between each variable and the TCAP Scale Score, which is the target variable. Only a few variables showed any correlation with the Scale Score: TestDurationMinutes, TestRITScore, EnrolledGrade, ED, SWD, and n_days_present. I excluded TestDurationMinutes, as it was not logical to use that data from the MAP test for predicting the TCAP score. Ultimately, I focused on the RIT score, grade level, number of days present, economically disadvantaged (ED) status, and student with disabilities (SWD) status, since these showed the most significant correlations with the Scale Score. After selecting the variables, I created scatterplots to visualize the relationships in the data.
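The concat-then-merge approach described above can be sketched with pandas. The post does not include code, so the frame names, column names, and values below are illustrative stand-ins; in practice each frame would come from the cleaned CSV exports. The key idea is to stack each source's yearly files first (the test year is not needed), then inner-merge the three sources on the state ID so that NaN-heavy outer and left joins are avoided.

```python
import pandas as pd

# Toy stand-ins for the yearly files of each source (names hypothetical).
att_a = pd.DataFrame({"StateID": [1, 2], "n_days_present": [170, 155]})
att_b = pd.DataFrame({"StateID": [3], "n_days_present": [168]})

map_a = pd.DataFrame({"StateID": [1, 2], "TestRITScore": [210, 195]})
map_b = pd.DataFrame({"StateID": [3], "TestRITScore": [221]})

tcap_a = pd.DataFrame({"StateID": [1, 2], "ScaleScore": [340, 310]})
tcap_b = pd.DataFrame({"StateID": [3], "ScaleScore": [355]})

# Step 1: concatenate each source's yearly files -- the test year is
# dropped, so simple stacking is enough.
attendance = pd.concat([att_a, att_b], ignore_index=True)
map_scores = pd.concat([map_a, map_b], ignore_index=True)
tcap = pd.concat([tcap_a, tcap_b], ignore_index=True)

# Step 2: inner-merge on the state ID (the primary key), keeping only
# students present in all three sources -- no NaN rows are introduced.
merged = attendance.merge(map_scores, on="StateID").merge(tcap, on="StateID")
print(merged.shape)  # (3, 4)
```

An inner join (pandas' default for `merge`) is what keeps the row count down here: any student missing from one of the three sources is dropped instead of padded with NaN.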
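The row filtering and Course encoding could look like the following sketch. The label strings and the ContentAreaCode values are assumptions, since the post does not show the raw categories, but the transformation (keep Math/English, map Math to 1 and English to 2, drop the redundant column) matches the steps described.

```python
import pandas as pd

# Toy rows standing in for the merged dataset; label values are hypothetical.
df = pd.DataFrame({
    "Course": ["Math", "English", "Science", "Social Studies", "Math"],
    "ContentAreaCode": ["MA", "EN", "SC", "SS", "MA"],
    "ScaleScore": [340, 310, 330, 320, 355],
})

# Keep only the Math and English rows.
df = df[df["Course"].isin(["Math", "English"])].copy()

# Encode Math as 1 and English as 2, then drop the redundant column.
df["Course"] = df["Course"].map({"Math": 1, "English": 2})
df = df.drop(columns="ContentAreaCode")
print(df["Course"].tolist())  # [1, 2, 1]
```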
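The Pearson screening step can be reproduced in one line with pandas. The numbers below are illustrative only (the real correlations came from the merged student data), but the pattern is the same: correlate every candidate column against the ScaleScore target and keep the strongest.

```python
import pandas as pd

# Illustrative values only -- not the project's actual data.
df = pd.DataFrame({
    "ScaleScore":     [310, 325, 340, 355, 300],
    "TestRITScore":   [195, 204, 210, 221, 190],
    "n_days_present": [155, 160, 170, 168, 140],
    "EnrolledGrade":  [3, 4, 5, 5, 3],
})

# Pearson correlation of every candidate variable against the target.
corr = df.corr(method="pearson")["ScaleScore"].drop("ScaleScore")
print(corr.sort_values(ascending=False))

# With matplotlib installed, a scatterplot per kept variable shows the
# relationship, e.g. df.plot.scatter(x="TestRITScore", y="ScaleScore").
```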