Data Stewardship and Data Governance

Data SGP

Data is swiftly overtaking physical assets as the primary asset class at many organizations. As a result, keeping data reliable, secure, and accessible has become as important to a company's success as maintaining factory machinery was in the industrial era. Data stewards are the people responsible for data governance: they keep the large volumes of structured, semi-structured, and unstructured information stored across multiple databases, platforms, and spreadsheets under control and manageable.

The SGP (Sedimentary Geochemistry and Paleoenvironments) project is working toward its first goal: analyzing multi-proxy shale geochemical data spanning the Neoproterozoic through the Paleozoic. Considerable effort has gone into assembling or generating the required multi-dimensional data set (lithology, sulfide, and iron data) and the associated metadata for each geological period.

In the SGP package, the lower-level functions (such as studentGrowthPercentiles and studentGrowthProjections) require WIDE formatted data. The higher-level functions that wrap them work with LONG formatted data, and the sgpData_LONG version of the data is recommended for operational analyses. Working with long data is more straightforward than working with wide data, and more comprehensive documentation can be found in the SGP Data Analysis Vignette.

To compute student growth percentiles, which are used to compare or report a student's performance relative to academic peers, SGP analyses require longitudinal (time-dependent) student assessment data. Such data are often stored in WIDE format, where each row represents a single student and the columns hold that student's variables at different points in time. The SGP package provides example longitudinal student data in both wide format (sgpData) and long format (sgpData_LONG). In addition, the sgpData_INSTRUCTOR_NUMBER data set, an anonymized lookup table that associates a teacher with each student's test records, can also be used in SGP analyses.
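The difference between the two layouts can be sketched in Python. This is a minimal illustration under stated assumptions, not the SGP package's own code. The column names (ID, SS_2021 and so on, YEAR, SCALE_SCORE) and the scores are hypothetical, and the SGP package itself works in R on its own data sets.

```python
"""Reshape longitudinal score records between wide and long layouts (illustrative sketch)."""

# Hypothetical wide records: one row per student, one score column per year.
# The column names (ID, SS_2021, ...) are assumptions for illustration only.
wide_rows = [
    {"ID": "1001", "SS_2021": 480, "SS_2022": 512, "SS_2023": 530},
    {"ID": "1002", "SS_2021": 455, "SS_2022": None, "SS_2023": 470},
]


def wide_to_long(rows, id_col="ID", prefix="SS_"):
    """Return one record per student per year, skipping missing scores."""
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col.startswith(prefix) and value is not None:
                year = int(col[len(prefix):])
                long_rows.append({id_col: row[id_col], "YEAR": year, "SCALE_SCORE": value})
    return long_rows


def long_to_wide(records, id_col="ID", prefix="SS_"):
    """Rebuild one row per student; years with no record get no column."""
    wide = {}
    for rec in records:
        row = wide.setdefault(rec[id_col], {id_col: rec[id_col]})
        row[f"{prefix}{rec['YEAR']}"] = rec["SCALE_SCORE"]
    return list(wide.values())


long_rows = wide_to_long(wide_rows)

if __name__ == "__main__":
    for record in long_rows:
        print(record)
    print(long_to_wide(long_rows))
```

In long format, the student with a missing 2022 score simply has no 2022 record. This makes rows easy to filter or append without carrying empty columns.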

A single standardized test score describes a student's achievement at one point in time, typically relative to other students. An SGP instead describes growth: it compares a student's current score with the current scores of their academic peers, that is, students in the same grade and subject with a similar history of prior scores. Teachers and administrators use these growth percentiles to determine whether a student has grown more or less than their peers, what growth percentile the student has reached, and how much growth the student needs to reach an achievement target.
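The idea of comparing a student with academic peers can be illustrated with a deliberately simplified Python sketch. The SGP package estimates growth percentiles with quantile regression on students' prior scores. The sketch below swaps in a much cruder technique. It groups students into bands of similar prior score (BAND_WIDTH is a hypothetical parameter), then computes a mid-rank percentile of each student's current score within that band, clamped to the 1 to 99 range. The records are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records: (student_id, prior_score, current_score).
records = [
    ("A", 500, 520), ("B", 500, 540), ("C", 500, 505), ("D", 500, 530),
    ("H", 510, 515), ("E", 450, 470), ("F", 450, 490), ("G", 450, 480),
]

# Assumed width of the prior-score band that defines "academic peers" in this sketch.
BAND_WIDTH = 25


def peer_band(prior_score):
    """Map a prior score to a band; students in the same band are treated as peers."""
    return prior_score // BAND_WIDTH


def growth_percentiles(rows):
    """Return {student_id: percentile} using a mid-rank within each peer band.

    Simplified stand-in for illustration only; this is not the
    quantile-regression method used by the SGP package.
    """
    groups = defaultdict(list)
    for student_id, prior, current in rows:
        groups[peer_band(prior)].append((student_id, current))

    result = {}
    for peers in groups.values():
        n = len(peers)
        for student_id, current in peers:
            below = sum(1 for _, other in peers if other < current)
            ties = sum(1 for _, other in peers if other == current)  # includes the student
            pct = 100 * (below + 0.5 * ties) / n
            # Clamp to 1..99, the range in which growth percentiles are reported.
            result[student_id] = min(99, max(1, round(pct)))
    return result


if __name__ == "__main__":
    for student_id, pct in sorted(growth_percentiles(records).items()):
        print(student_id, pct)
```

Student H has a slightly higher prior score than students A through D but falls in the same band, so all five are compared with one another. Changing BAND_WIDTH changes who counts as a peer and therefore every percentile in the affected bands.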

To perform an SGP analysis, users need longitudinal student assessment data, typically obtained from their state's education agency. These data are usually stored in a database or data warehouse and may be accessed through web forms or APIs. SGP analyses are performed with the SGP package in R, an open-source programming language that is freely available for Windows, macOS, and Linux from CRAN (the Comprehensive R Archive Network). Because SGP analyses are fairly advanced, familiarity with R is required. The resources on the CRAN website and the SGP Data Analysis Vignette can help users who are new to the language. Once a user is comfortable with the R environment, SGP analyses are fairly straightforward and can be executed quickly.