Project Brief

Education Data Simulation Engine

A public-safe bootstrap layer for assessment analytics, LMS workflow prototyping, decision-support dashboards, and learning-systems development.

Overview

Public-safe simulation foundation

This project demonstrates how to build realistic education analytics infrastructure without publishing protected student data, raw LMS exports, real gradebooks, teacher names, section labels, or school-private records.

The generator creates a coherent simulated department system rather than isolated fake rows: students, teachers, courses, sections, enrollments, assessment scores, attendance behavior, Canvas-style course artifacts, SQL warehouse tables, and validation checks.
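A minimal sketch of what "coherent rather than isolated rows" can mean in practice. It assumes a seeded generator that creates entities first and then links them only by IDs that already exist. The function `generate_department`, the `Section` type, and the ID formats are hypothetical illustrations, not the project's actual generator.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    section_id: str
    course_id: str
    teacher_id: str


def generate_department(seed: int, n_students: int, n_teachers: int,
                        courses: list[str], sections_per_course: int) -> dict:
    """Hypothetical generator: create entities first, then link them by ID."""
    rng = random.Random(seed)  # seeded so reruns reproduce the same state
    students = [f"S{i:04d}" for i in range(1, n_students + 1)]
    teachers = [f"T{i:02d}" for i in range(1, n_teachers + 1)]
    # Each section belongs to one course and is taught by an existing teacher.
    sections = [Section(f"{course}-{k:02d}", course, rng.choice(teachers))
                for course in courses for k in range(1, sections_per_course + 1)]
    # Each student gets exactly one section per course, drawn from that course.
    enrollments = []
    for student in students:
        for course in courses:
            options = [s for s in sections if s.course_id == course]
            enrollments.append((student, rng.choice(options).section_id))
    return {"students": students, "teachers": teachers,
            "sections": sections, "enrollments": enrollments}
```

Because enrollments are drawn only from generated sections, referential checks such as "every enrollment points at a real section" hold by construction, and the same seed reproduces the same department.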

Current Artifacts

  • Canonical synthetic school state JSON
  • All-school math assessment gradebook CSV
  • Course, section, and enrollment exports
  • Canvas-style course profile JSON files
  • DuckDB SQL warehouse and star-schema marts
  • LMS-to-SQL roster reconciliation outputs
  • Supabase/Postgres hosted serving path documentation
  • Pipeline validation narrative with 20 / 20 checks passing
  • Generation and validation scripts
  • Aggregate grade-level calibration diagnostics

Statistical Design

What this project demonstrates

Assessment simulation

The model separates academic scores earned by students who were present from attendance and non-participation, so zeros recorded for absence or non-participation are treated as administrative outcomes rather than as readiness evidence.
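The separation can be sketched as a status field carried on every score row, with readiness computed only from present-student rows and missingness reported as its own measure. The status codes and the `ScoreRow` fields are hypothetical; the project's actual encoding may differ.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class ScoreRow:
    student_id: str
    assignment: str
    status: str             # hypothetical codes: "present", "absent", "non_participation"
    gradebook_score: float  # what an LMS gradebook would show (0 for missing work)


def academic_score(row: ScoreRow) -> Optional[float]:
    """Return the score only when it reflects a present student's work."""
    return row.gradebook_score if row.status == "present" else None


def readiness_mean(rows: list[ScoreRow]) -> Optional[float]:
    # Administrative zeros are excluded rather than averaged in as evidence.
    scores = [s for s in (academic_score(r) for r in rows) if s is not None]
    return mean(scores) if scores else None


def missingness_rate(rows: list[ScoreRow]) -> float:
    # Non-present rows are tracked separately as an attendance/participation outcome.
    return sum(r.status != "present" for r in rows) / len(rows) if rows else 0.0
```

For scores of 80 (present), 0 (absent), and 70 (present), the readiness mean is 75.0 with a missingness rate of one third, whereas a naive gradebook average would report 50.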

Longitudinal readiness

Assignment 02 applies the reusable score engine with readiness updates, school-year growth, course and track context, teacher and section effects, regression to the mean, and observation noise.
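One common way to combine these ingredients is a latent-readiness model: each school year pulls readiness toward a population mean, adds growth and a random shock, and each assessment observes readiness plus course/track, teacher, and section effects plus measurement noise. This is a sketch of that general technique under assumed parameter names and placeholder values, not the project's calibrated score engine.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineParams:
    # All values are illustrative placeholders, not the project's calibration.
    population_mean: float = 70.0
    persistence: float = 0.8      # < 1 pulls readiness back toward the mean
    yearly_growth: float = 3.0
    innovation_sd: float = 4.0    # year-to-year readiness shock
    observation_sd: float = 6.0   # assessment-level measurement noise


def update_readiness(readiness: float, p: EngineParams, rng: random.Random) -> float:
    """One school-year step: regression to the mean, plus growth, plus a shock."""
    regressed = p.population_mean + p.persistence * (readiness - p.population_mean)
    return regressed + p.yearly_growth + rng.gauss(0.0, p.innovation_sd)


def observed_score(readiness: float, course_effect: float, teacher_effect: float,
                   section_effect: float, p: EngineParams, rng: random.Random) -> float:
    """Latent readiness plus context effects plus noise, clipped to 0-100."""
    raw = (readiness + course_effect + teacher_effect + section_effect
           + rng.gauss(0.0, p.observation_sd))
    return min(100.0, max(0.0, raw))
```

Setting both standard deviations to zero makes the deterministic part easy to check: a student at 90 moves to 89 and a student at 50 moves to 57, so extremes are pulled inward while everyone receives the growth term.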

Validation boundary

The validator checks row counts, schema, enrollment consistency, score bounds, assignment population policy, Canvas-style profile coverage, and banned private/source strings.
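A subset of these checks (row counts, schema, enrollment consistency, and score bounds) can be sketched as independent pass/fail flags. The required column names and the 0-100 score range are assumptions for illustration; population-policy and Canvas-coverage checks are omitted.

```python
REQUIRED_COLUMNS = {"student_id", "section_id", "assignment", "score"}  # hypothetical schema


def validate(score_rows: list[dict], enrollments: set[tuple[str, str]],
             expected_rows: int) -> dict[str, bool]:
    """Return a pass/fail flag per check, in the spirit of a validation summary."""
    results = {}
    results["row_count"] = len(score_rows) == expected_rows
    # Every row must carry every required column.
    results["schema"] = all(REQUIRED_COLUMNS.issubset(row) for row in score_rows)
    # Every score must belong to a (student, section) pair that is actually enrolled.
    results["enrollment_consistency"] = all(
        (row.get("student_id"), row.get("section_id")) in enrollments
        for row in score_rows)
    # Scores must be numeric and within the assumed 0-100 range.
    results["score_bounds"] = all(
        isinstance(row.get("score"), (int, float)) and 0 <= row["score"] <= 100
        for row in score_rows)
    return results
```

Keeping each check independent means a summary such as "20 / 20 passing" can be reported check by check rather than as one opaque boolean.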

SQL analytics layer

The DuckDB warehouse normalizes Canvas-like JSON into raw LMS tables, reconciles rosters against canonical enrollments, and exports star-schema facts and dimensions for downstream reporting.
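The normalize-then-reconcile step can be sketched as flattening nested course JSON into one row per (student, section) and comparing it with the canonical enrollment table in both directions. The sketch uses Python's built-in `sqlite3` only so it runs with the standard library (the project itself uses DuckDB); the JSON shape and table names are hypothetical, and real Canvas exports are structured differently.

```python
import json
import sqlite3

# Hypothetical Canvas-like payload; real Canvas exports differ.
COURSE_JSON = """{"course_id": "C1", "sections": [
  {"section_id": "C1-01", "students": ["S001", "S002"]},
  {"section_id": "C1-02", "students": ["S003", "S099"]}]}"""

# Anti-joins in both directions, since older SQLite builds lack FULL OUTER JOIN.
RECONCILE_SQL = """
SELECT c.student_id, c.section_id, 'missing_in_lms' AS issue
FROM canonical_enrollments c
LEFT JOIN raw_lms_enrollments l
  ON l.student_id = c.student_id AND l.section_id = c.section_id
WHERE l.student_id IS NULL
UNION ALL
SELECT l.student_id, l.section_id, 'missing_in_canonical'
FROM raw_lms_enrollments l
LEFT JOIN canonical_enrollments c
  ON c.student_id = l.student_id AND c.section_id = l.section_id
WHERE c.student_id IS NULL
ORDER BY 1, 2
"""


def reconcile(course_json: str, canonical: list[tuple[str, str]]) -> list[tuple]:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE raw_lms_enrollments (student_id TEXT, section_id TEXT)")
    con.execute("CREATE TABLE canonical_enrollments (student_id TEXT, section_id TEXT)")
    # Normalize the nested JSON into one raw row per (student, section).
    course = json.loads(course_json)
    lms_rows = [(sid, sec["section_id"])
                for sec in course["sections"] for sid in sec["students"]]
    con.executemany("INSERT INTO raw_lms_enrollments VALUES (?, ?)", lms_rows)
    con.executemany("INSERT INTO canonical_enrollments VALUES (?, ?)", canonical)
    rows = con.execute(RECONCILE_SQL).fetchall()
    con.close()
    return rows
```

With canonical enrollments for S001, S002, S003, and S004, this reports S004 as missing from the LMS and S099 as present in the LMS but absent from the canonical roster.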

Hosted serving contract

The public build documents a Supabase/Postgres path for serving curated synthetic marts while preserving DuckDB as the reproducible local warehouse.

Relationship

Feeds the assessment portfolio

education-data-simulation-engine is the simulation, validation, and SQL warehouse foundation. assessment-intelligence is the analytics and reporting layer that consumes SQL-backed extracts for dashboards, diagnostics, reports, and decision-support workflows.

Current Scope

  • Seven-year horizon from 2025-2026 through 2031-2032
 
  • 696 synthetic students ever enrolled across the horizon
  • 287 active students per school year
  • 5 synthetic teachers per school year
  • 174 synthetic sections across the horizon
  • 62 synthetic Canvas course JSON profiles
  • 14 populated assessment assignment windows
  • 4,018 long-form assessment score rows
  • 20 warehouse validation summary checks passing
  • 20 / 20 hosted-pipeline validation checks passing
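The score-row count is consistent with a simple product, assuming each of the 287 active students receives exactly one long-form row (including administrative zeros) in each of the 14 populated windows. This is an arithmetic check on the listed figures, not a documented rule of the generator:

```latex
\[
\text{score rows} = 287 \ \text{active students} \times 14 \ \text{windows} = 4{,}018
\]
```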

Safety Boundary

Synthetic by design

Public artifacts may include fake identifiers, synthetic enrollments, synthetic assignment scores, generalized calibration parameters, and public-safe aggregate diagnostics. They must not include real students, rosters, LMS exports, private assessment artifacts, private teacher names, internal section labels, private paths, or credentials.
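A banned-string scan over the public output directory can enforce this boundary mechanically. The patterns below are hypothetical examples (local home-directory paths, credential assignments, Postgres URLs with embedded passwords); a real deny-list would also hold the specific private names and section labels, and would itself live outside the public repo.

```python
import re
from pathlib import Path

# Hypothetical deny-list patterns; not the project's actual list.
BANNED_PATTERNS = {
    "private_path": re.compile(r"(?:/Users/|/home/|[A-Za-z]:\\)"),
    "credential": re.compile(r"(?i)(?:password|secret|api[_-]?key)\s*[:=]"),
    "postgres_url": re.compile(r"postgres(?:ql)?://\S+:\S+@"),
}


def scan_text(text: str) -> list[str]:
    """Return the names of banned patterns found in one artifact's text."""
    return [name for name, pattern in BANNED_PATTERNS.items() if pattern.search(text)]


def scan_artifacts(root: Path, suffixes=(".csv", ".json", ".md", ".sql")) -> dict[str, list[str]]:
    """Scan every matching file under root; return {path: [pattern names]} for hits."""
    hits = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            found = scan_text(path.read_text(encoding="utf-8", errors="replace"))
            if found:
                hits[str(path)] = found
    return hits
```

An empty result from `scan_artifacts` is the passing condition; any hit names both the file and the category of leak.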

Warehouse Outputs

What downstream tools can consume

Analytic marts

Exports include assessment facts, LMS enrollment facts, readiness, growth, missingness, roster reconciliation, teacher-section effects, and validation summaries.
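A missingness mart of this kind can be sketched as one aggregate over a long-form score fact. The SQL relies on `AVG` skipping NULLs, so administrative rows count toward the missing rate but never toward the score mean. Table and column names are hypothetical, and `sqlite3` stands in for the project's DuckDB warehouse so the sketch runs with the standard library.

```python
import sqlite3

# Hypothetical long-form fact table; column names are illustrative.
MISSINGNESS_MART_SQL = """
SELECT section_id,
       window_id,
       COUNT(*) AS n_rows,
       -- Share of rows that are administrative (absent or non-participating).
       AVG(CASE WHEN status <> 'present' THEN 1.0 ELSE 0.0 END) AS missing_rate,
       -- CASE yields NULL for non-present rows and AVG skips NULLs,
       -- so administrative zeros never enter the score mean.
       AVG(CASE WHEN status = 'present' THEN score END) AS mean_present_score
FROM fact_assessment_score
GROUP BY section_id, window_id
ORDER BY section_id, window_id
"""


def missingness_mart(rows: list[tuple]) -> list[tuple]:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE fact_assessment_score "
                "(student_id TEXT, section_id TEXT, window_id TEXT, status TEXT, score REAL)")
    con.executemany("INSERT INTO fact_assessment_score VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(MISSINGNESS_MART_SQL).fetchall()
    con.close()
    return result
```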

Star schema

The SQL layer provides student, course, section, teacher, and assignment dimensions, plus assessment-score and LMS-enrollment facts.
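The star layout can be sketched as small dimension tables keyed by surrogate integers, with fact tables at an explicit grain that reference them. A typical downstream query then joins facts to dimensions, here summarizing present-student scores by teacher through the section dimension. All table and column names are hypothetical, and `sqlite3` stands in for DuckDB so the sketch runs with the standard library.

```python
import sqlite3

# Hypothetical DDL: names are illustrative, not the project's schema.
STAR_SCHEMA = """
CREATE TABLE dim_student    (student_key INTEGER PRIMARY KEY, student_id TEXT UNIQUE);
CREATE TABLE dim_teacher    (teacher_key INTEGER PRIMARY KEY, teacher_id TEXT UNIQUE);
CREATE TABLE dim_course     (course_key  INTEGER PRIMARY KEY, course_id  TEXT UNIQUE);
CREATE TABLE dim_section    (section_key INTEGER PRIMARY KEY, section_id TEXT UNIQUE,
                             course_key  INTEGER REFERENCES dim_course,
                             teacher_key INTEGER REFERENCES dim_teacher);
CREATE TABLE dim_assignment (assignment_key INTEGER PRIMARY KEY, assignment_id TEXT UNIQUE);
-- Grain: one row per student, section, and assignment.
CREATE TABLE fact_assessment_score (
    student_key    INTEGER REFERENCES dim_student,
    section_key    INTEGER REFERENCES dim_section,
    assignment_key INTEGER REFERENCES dim_assignment,
    status TEXT, score REAL);
-- Grain: one row per student enrollment observed in the LMS.
CREATE TABLE fact_lms_enrollment (
    student_key INTEGER REFERENCES dim_student,
    section_key INTEGER REFERENCES dim_section);
"""

TEACHER_SUMMARY_SQL = """
SELECT t.teacher_id, COUNT(*) AS n_scores, AVG(f.score) AS mean_score
FROM fact_assessment_score f
JOIN dim_section s ON s.section_key = f.section_key
JOIN dim_teacher t ON t.teacher_key = s.teacher_key
WHERE f.status = 'present'
GROUP BY t.teacher_id
ORDER BY t.teacher_id
"""


def build_demo() -> sqlite3.Connection:
    """Create the schema in memory and load a few hypothetical synthetic rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(STAR_SCHEMA)
    con.executemany("INSERT INTO dim_student VALUES (?, ?)", [(1, "S001"), (2, "S002")])
    con.executemany("INSERT INTO dim_teacher VALUES (?, ?)", [(1, "T01"), (2, "T02")])
    con.execute("INSERT INTO dim_course VALUES (1, 'ALG1')")
    con.executemany("INSERT INTO dim_section VALUES (?, ?, ?, ?)",
                    [(1, "ALG1-01", 1, 1), (2, "ALG1-02", 1, 2)])
    con.execute("INSERT INTO dim_assignment VALUES (1, 'A01')")
    con.executemany("INSERT INTO fact_assessment_score VALUES (?, ?, ?, ?, ?)",
                    [(1, 1, 1, "present", 82.0), (2, 2, 1, "present", 74.0)])
    con.executemany("INSERT INTO fact_lms_enrollment VALUES (?, ?)", [(1, 1), (2, 2)])
    return con
```

Routing the teacher link through `dim_section` keeps the fact table narrow and lets teacher-section effects be recomputed if a section's assignment changes.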

Hosted path

The repo documents an optional Supabase/Postgres serving layer for public-safe synthetic analytics tables, which becomes usable only after private credentials are supplied locally; the credentials themselves never enter the public artifacts.
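One way to keep the hosted path optional is to read the connection string from a local environment variable and fall back to the local warehouse when it is absent, redacting the password before the URL is logged. The variable name `SUPABASE_DB_URL` and both helper functions are hypothetical; the sketch does not open a database connection.

```python
import os
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def hosted_db_url(env_var: str = "SUPABASE_DB_URL") -> Optional[str]:
    """Return the locally supplied connection string, or None to stay local-only."""
    url = os.environ.get(env_var, "").strip()
    return url or None


def redact(url: str) -> str:
    """Mask the password so the URL can appear in logs or validation output."""
    parts = urlsplit(url)
    if parts.password is None:
        return url
    netloc = parts.netloc.replace(f":{parts.password}@", ":***@", 1)
    return urlunsplit(parts._replace(netloc=netloc))
```

When `hosted_db_url()` returns None, a pipeline built this way would simply skip publishing and keep DuckDB as the only warehouse.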