Project Brief

Education Data Simulation Engine

A public-safe bootstrap layer for assessment analytics, LMS workflow prototyping, decision-support dashboards, and learning-systems development.

Overview

Public-safe simulation foundation

This project demonstrates how to build realistic education analytics infrastructure without publishing protected student data, raw LMS exports, real gradebooks, teacher names, section labels, or school-private records.

The generator creates a coherent simulated department system rather than isolated fake rows: students, teachers, courses, sections, enrollments, assessment scores, attendance behavior, Canvas-style course artifacts, SQL warehouse tables, and validation checks.

Current Artifacts

Canonical synthetic school state JSON
All-school math assessment gradebook CSV
Course, section, and enrollment exports
Canvas-style course profile JSON files
DuckDB SQL warehouse and star-schema marts
LMS-to-SQL roster reconciliation outputs
Supabase/Postgres hosted serving path documentation
Pipeline validation narrative with 20 / 20 checks passing
Generation and validation scripts
Aggregate grade-level calibration diagnostics

Statistical Design

What this project proves

Assessment simulation

The model separates the academic scores of students who were present from attendance and non-participation, so observed zeros are treated as administrative outcomes rather than as evidence of readiness.
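
The separation can be illustrated with a minimal sketch. Everything named here (`ScoreRow`, `simulate_row`, `mean_academic`, the 0 to 100 scale, the noise level of 8 points) is a hypothetical illustration, not the repository's actual generator. The idea is that an absent student gets a recorded zero for the gradebook but no academic score, so summaries of readiness never see that zero.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoreRow:
    student_id: str
    present: bool
    academic_score: Optional[float]  # None when the student did not take the assessment
    recorded_score: float            # what a gradebook export would show
    status: str                      # "scored" or "absent"


def simulate_row(student_id: str, readiness: float, p_present: float,
                 rng: random.Random, max_points: float = 100.0) -> ScoreRow:
    """Draw attendance first, then draw a score only for present students."""
    if rng.random() >= p_present:
        # Administrative zero: stored for the gradebook, excluded from readiness.
        return ScoreRow(student_id, False, None, 0.0, "absent")
    raw = rng.gauss(readiness, 8.0)  # assumed observation noise
    score = round(min(max_points, max(0.0, raw)), 1)
    return ScoreRow(student_id, True, score, score, "scored")


def mean_academic(rows: List[ScoreRow]) -> Optional[float]:
    """Average over present students only, ignoring administrative zeros."""
    scored = [r.academic_score for r in rows if r.academic_score is not None]
    return sum(scored) / len(scored) if scored else None
```

Averaging `recorded_score` over all rows would pull the mean down whenever anyone is absent; `mean_academic` avoids that by construction.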

Longitudinal readiness

Assignment 02 applies the reusable score engine with readiness updates, school-year growth, course and track context, teacher and section effects, regression to the mean, and observation noise.
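
One plausible way to combine these terms is sketched below. The additive form, the parameter names, and the default values are assumptions made for illustration; the repository's score engine may weight or order the components differently.

```python
import random


def update_readiness(prior: float, cohort_mean: float, growth: float,
                     course_effect: float = 0.0, section_effect: float = 0.0,
                     shrink: float = 0.2) -> float:
    """Move latent readiness forward by one assessment window.

    shrink is the assumed fraction of the gap to the cohort mean that closes
    each window (regression to the mean); the other terms are added on top.
    """
    pulled = prior + shrink * (cohort_mean - prior)
    return pulled + growth + course_effect + section_effect


def observe_score(readiness: float, rng: random.Random, noise_sd: float = 6.0,
                  lo: float = 0.0, hi: float = 100.0) -> float:
    """Turn latent readiness into an observed score with noise, clipped to the scale."""
    return min(hi, max(lo, readiness + rng.gauss(0.0, noise_sd)))
```

With `shrink=0.2`, a prior of 90 against a cohort mean of 70 moves 4 points toward the mean before growth is added, while a prior of 50 moves 4 points up. Keeping the noise in `observe_score` rather than in `update_readiness` means the latent trajectory stays smooth and only the observed scores are noisy.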

Validation boundary

The validator checks row counts, schema, enrollment consistency, score bounds, assignment population policy, Canvas-style profile coverage, and banned private/source strings.
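
A few of these checks can be sketched as plain predicates over rows. The column names (`student_id`, `section_id`, `assignment_id`, `score`), the 0 to 100 bound, and the function `run_checks` are hypothetical; they stand in for whatever schema the real validator enforces.

```python
from typing import Dict, List

REQUIRED_SCORE_COLUMNS = {"student_id", "section_id", "assignment_id", "score"}  # assumed schema


def run_checks(enrollments: List[dict], scores: List[dict],
               expected_score_rows: int, max_points: float = 100.0) -> Dict[str, bool]:
    """Return a mapping of check name to pass/fail for a small subset of checks."""
    enrolled = {(e["student_id"], e["section_id"]) for e in enrollments}
    return {
        # Row count matches what the generator was asked to produce.
        "row_count": len(scores) == expected_score_rows,
        # Every score row carries the required columns.
        "schema": all(REQUIRED_SCORE_COLUMNS.issubset(r.keys()) for r in scores),
        # Every score row points at a known (student, section) enrollment.
        "enrollment_consistency": all(
            (r.get("student_id"), r.get("section_id")) in enrolled for r in scores
        ),
        # Scores are numeric and stay on the assumed scale.
        "score_bounds": all(
            isinstance(r.get("score"), (int, float)) and 0 <= r["score"] <= max_points
            for r in scores
        ),
    }
```

Returning named booleans rather than raising on the first failure makes it straightforward to report a summary such as "N of M checks passing".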

SQL analytics layer

The DuckDB warehouse normalizes Canvas-like JSON into raw LMS tables, reconciles rosters against canonical enrollments, and exports star-schema facts and dimensions for downstream reporting.
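
The normalize-then-reconcile step might look roughly like the sketch below. To keep it runnable without extra dependencies it uses Python's standard-library `sqlite3` in place of DuckDB, and the JSON shape, table names, and status labels are all assumptions for illustration.

```python
import json
import sqlite3
from typing import List, Tuple


def reconcile(profile_json: str, canonical_rows: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Flatten a Canvas-like profile into a raw LMS table and compare it to canonical enrollments."""
    profile = json.loads(profile_json)
    # Normalize nested sections/students into flat (student_id, section_id) rows.
    lms_rows = [
        (student_id, section["section_id"])
        for section in profile["sections"]
        for student_id in section["students"]
    ]
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE raw_lms_enrollment (student_id TEXT, section_id TEXT)")
    con.execute("CREATE TABLE canonical_enrollment (student_id TEXT, section_id TEXT)")
    con.executemany("INSERT INTO raw_lms_enrollment VALUES (?, ?)", lms_rows)
    con.executemany("INSERT INTO canonical_enrollment VALUES (?, ?)", canonical_rows)
    # Two left joins cover both directions of mismatch.
    rows = con.execute(
        """
        SELECT c.student_id, c.section_id,
               CASE WHEN l.student_id IS NULL THEN 'missing_in_lms' ELSE 'matched' END AS status
        FROM canonical_enrollment c
        LEFT JOIN raw_lms_enrollment l
          ON l.student_id = c.student_id AND l.section_id = c.section_id
        UNION ALL
        SELECT l.student_id, l.section_id, 'extra_in_lms'
        FROM raw_lms_enrollment l
        LEFT JOIN canonical_enrollment c
          ON c.student_id = l.student_id AND c.section_id = l.section_id
        WHERE c.student_id IS NULL
        ORDER BY 1, 2
        """
    ).fetchall()
    con.close()
    return rows
```

A reconciliation output with only `matched` rows indicates the LMS-style artifacts and the canonical state agree; any other status points at the side that drifted.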

Hosted serving contract

The public build documents a Supabase/Postgres path for serving curated synthetic marts while preserving DuckDB as the reproducible local warehouse.

Relationship

Feeds the assessment portfolio

education-data-simulation-engine is the simulation, validation, and SQL warehouse foundation. assessment-intelligence is the analytics and reporting layer that consumes SQL-backed extracts for dashboards, diagnostics, reports, and decision-support workflows.

Current Scope

Seven-year horizon from 2025-2026 through 2031-2032
696 synthetic students ever enrolled across the horizon
287 active students per school year
5 synthetic teachers per school year
174 synthetic sections across the horizon
62 synthetic Canvas course JSON profiles
14 populated assessment assignment windows
4,018 long-form assessment score rows
20 warehouse validation summary checks passing
20 / 20 hosted-pipeline validation checks passing
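
The score-row count can be cross-checked against the other figures. The sketch below assumes one long-form row per active student per assessment window, which the list does not state explicitly; under that assumption the reported numbers agree.

```python
# Figures copied from the scope list above.
ACTIVE_STUDENTS_PER_YEAR = 287
SCHOOL_YEARS = 7
ASSESSMENT_WINDOWS = 14
REPORTED_SCORE_ROWS = 4018


def expected_score_rows(active_students: int, windows: int) -> int:
    """Rows expected if every active student gets one row per window (an assumption)."""
    return active_students * windows


# Average number of assessment windows per school year.
windows_per_year = ASSESSMENT_WINDOWS / SCHOOL_YEARS

consistent = expected_score_rows(ACTIVE_STUDENTS_PER_YEAR, ASSESSMENT_WINDOWS) == REPORTED_SCORE_ROWS
```

Because absent students still receive a row under this assumption, the count reflects the assignment population rather than the number of students who actually sat each assessment.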

Safety Boundary

Synthetic by design

Public artifacts may include fake identifiers, synthetic enrollments, synthetic assignment scores, generalized calibration parameters, and public-safe aggregate diagnostics. They must not include real students, rosters, LMS exports, private assessment artifacts, private teacher names, internal section labels, private paths, or credentials.
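
Part of this boundary can be enforced mechanically by scanning artifacts before publication. The patterns below are illustrative assumptions only; a real banned-string list would be project-specific and would likely include private names and labels that cannot appear in a public example.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for things that should never reach a public artifact.
BANNED_PATTERNS = {
    "email_address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "home_directory_path": re.compile(r"(/Users/|/home/|[A-Za-z]:\\Users\\)"),
    "credential_assignment": re.compile(
        r"(?i)\b(password|secret|api[_-]?key|token)\b\s*[:=]\s*\S+"
    ),
}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, pattern_name) for every banned pattern found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in BANNED_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

An empty result from `scan_text` is a necessary condition for publishing, not a sufficient one: pattern scans catch accidental leaks but cannot prove that a value is synthetic.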

Warehouse Outputs

What downstream tools can consume

Analytic marts

Exports include assessment facts, LMS enrollment facts, readiness, growth, missingness, roster reconciliation, teacher-section effects, and validation summaries.

Star schema

The SQL layer provides dimensions for students, courses, sections, teachers, and assignments, and fact tables for assessment scores and LMS enrollments.
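
A downstream query against such a schema typically joins a fact table to the dimensions it references. The sketch below uses standard-library `sqlite3` rather than DuckDB so that it runs without extra dependencies, and the table names, column names, and sample values are invented for illustration.

```python
import sqlite3


def build_demo_star() -> sqlite3.Connection:
    """Create a tiny in-memory star schema with two dimensions and one fact table."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, grade_level INTEGER);
        CREATE TABLE dim_assignment (assignment_key INTEGER PRIMARY KEY, assignment_name TEXT);
        CREATE TABLE fact_assessment_score (
            student_key INTEGER REFERENCES dim_student(student_key),
            assignment_key INTEGER REFERENCES dim_assignment(assignment_key),
            score REAL
        );
        INSERT INTO dim_student VALUES (1, 9), (2, 9), (3, 10);
        INSERT INTO dim_assignment VALUES (1, 'Assignment 01'), (2, 'Assignment 02');
        INSERT INTO fact_assessment_score VALUES
            (1, 1, 70.0), (2, 1, 80.0), (3, 1, 90.0),
            (1, 2, 74.0), (2, 2, 86.0), (3, 2, 92.0);
        """
    )
    return con


def mean_score_by_grade_and_assignment(con: sqlite3.Connection):
    """Aggregate the fact table after joining to both dimensions."""
    return con.execute(
        """
        SELECT s.grade_level, a.assignment_name, AVG(f.score)
        FROM fact_assessment_score f
        JOIN dim_student s ON s.student_key = f.student_key
        JOIN dim_assignment a ON a.assignment_key = f.assignment_key
        GROUP BY s.grade_level, a.assignment_name
        ORDER BY s.grade_level, a.assignment_name
        """
    ).fetchall()
```

Keeping descriptive attributes such as grade level in the dimensions lets the fact table stay narrow, so the same fact rows can be regrouped by any dimension attribute without being rewritten.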

Hosted path

The repo documents an optional Supabase/Postgres serving layer for public-safe synthetic analytics tables; the layer becomes usable once private credentials are supplied locally.