Introduction
Why analytics queries kill EdTech platforms
Event sourcing as the foundation
| Column | Type | Why it's there |
|---|---|---|
| event_id | uuid | Idempotency. The same client retry doesn't double-count. |
| event_type | text | The verb. Indexed, low cardinality, used in every filter. |
| actor_id | bigint | Usually the student. Indexed for per-learner rollups. |
| tenant_id | bigint | School or org. Non-negotiable in multi-tenant EdTech. |
| occurred_at | timestamptz | When it happened on the client, not when we stored it. |
| payload | jsonb | Everything else. Course id, score, video position, device. |
The CQRS pattern for read-heavy analytics
Tools we use in production
| Tool | Reach for it when | The catch |
|---|---|---|
| Plain Postgres + matviews | You're under roughly 10M events per account and the team already runs Postgres. Default choice. | Aggregations get heavy past a point. Partition by month before you feel it. |
| TimescaleDB | Time-series queries dominate and you want to stay inside Postgres. Continuous aggregates are the killer feature for rollups. | It's still Postgres underneath, so very wide analytical scans eventually hit the same ceiling. |
| ClickHouse | You're past tens of millions of events and need column-store scan speed. This is the exam-platform tier, the kind of load that runs into 10M+ requests per minute. | A real second system to operate. Updates and deletes are awkward by design. Worth it only at genuine scale. |
| dbt | Your rollup logic has grown into a tangle of SQL nobody can read. dbt versions and tests the transformations. | It's a discipline, not a database. It organizes the work, it doesn't make queries faster. |
| Metabase | Admins and your own team need ad-hoc exploration without an engineer in the loop. | Great internally, looks like Metabase. Not the thing you embed in a teacher-facing product. |













