REAL-TIME SYSTEMS

Real-time EdTech that holds at thousands of concurrent students.

Live collaboration, presence, instant feedback, synchronized state. WebSocket gateways, CRDT-based collaboration, and pub/sub messaging, all designed against explicit latency budgets rather than bolted on after launch.

250K+ daily active users

10M+ peak requests per minute

50+ products shipped

Zero downtime through migrations

Who we work with

Platforms at three inflection points.

Live class interaction layer

Who: Platforms with live classes that need chat, polls, breakouts, and real-time engagement signals.
Problem: Media stack handles video. Interaction layer needs to be separate, scaled independently, and resilient to media-layer failures.
What we do: Dedicated real-time interaction layer with WebSockets, room management, and event sourcing. Decoupled from media.

Collaborative learning experiences

Who: Platforms with shared whiteboards, group notes, collaborative coding, or multiplayer learning activities.
Problem: Multiple students editing the same artifact in real time creates concurrency challenges that scale poorly with naive implementations.
What we do: CRDT-based collaboration with Yjs. Real-time sync engine. Persistence layer that snapshots state without blocking live edits.

Live assessment and feedback

Who: Platforms running live assessments where teachers see student work in progress.
Problem: Teacher dashboard needs to show 30 students' real-time progress without lagging or losing updates.
What we do: Event-sourced student work stream. Per-student channels with aggregated teacher view. Backpressure handling so a few fast typers do not flood the dashboard.
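
One way to picture the backpressure step is a coalescing buffer: keep only the newest update per student and flush to the teacher view on a fixed tick. A minimal TypeScript sketch under that assumption; `ProgressUpdate`, `CoalescingBuffer`, and the field names are hypothetical illustrations, not the production code.

```typescript
// Hypothetical shape of one student-progress event.
interface ProgressUpdate {
  studentId: string;
  seq: number;     // per-student, monotonically increasing
  payload: string; // e.g. the current answer text
}

// Keeps only the latest update per student between flushes, so a fast
// typist costs the dashboard one message per tick instead of hundreds.
class CoalescingBuffer {
  private latest = new Map<string, ProgressUpdate>();

  push(update: ProgressUpdate): void {
    const current = this.latest.get(update.studentId);
    // Ignore stale updates that arrive after a newer one.
    if (!current || update.seq > current.seq) {
      this.latest.set(update.studentId, update);
    }
  }

  // Called on a timer (for example every 250 ms) by the dashboard feed.
  flush(): ProgressUpdate[] {
    const batch = Array.from(this.latest.values());
    this.latest.clear();
    return batch;
  }
}
```

The full event stream can still be persisted for replay; only the dashboard feed is coalesced.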

What we fix

Where platforms break. And how we rebuild them.

WebSocket connections dropping at scale

The pain: Above 10,000 concurrent connections, the WebSocket layer starts dropping connections under load. Students lose live updates.

Our approach: WebSocket gateway with sticky session routing. Per-instance connection caps. Auto-scaling based on connection count, not just CPU. Clients with reconnection logic that gracefully resyncs missed messages.
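
Two of the ideas above reduce to small calculations: sizing the gateway fleet from connection count, and spreading client reconnects over time. A hedged TypeScript sketch; the function names, the 0.7 utilization target, and the backoff defaults are assumptions chosen for illustration.

```typescript
// Gateway instances needed so that no instance exceeds its connection cap.
// `targetUtilization` (0..1] leaves headroom for reconnect storms.
function desiredInstances(
  totalConnections: number,
  capPerInstance: number,
  targetUtilization = 0.7,
): number {
  if (capPerInstance <= 0 || targetUtilization <= 0 || targetUtilization > 1) {
    throw new RangeError("cap must be positive and utilization in (0, 1]");
  }
  const usablePerInstance = capPerInstance * targetUtilization;
  return Math.max(1, Math.ceil(totalConnections / usablePerInstance));
}

// Exponential backoff with full jitter for client reconnects, so thousands
// of dropped clients do not all retry in the same instant.
function reconnectDelayMs(
  attempt: number,
  baseMs = 500,
  maxMs = 30000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}
```

After reconnecting, the client would send the last sequence number it saw so the server can replay what was missed.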

Message ordering inconsistency

The pain: In a live chat, messages arrive in different orders for different students. Confusion in class discussions.

Our approach: Per-room message ordering enforced server-side. Sequence numbers attached to each message. Client-side reordering buffer for messages that arrive out of order during reconnection.
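
The client-side reordering buffer can be sketched as follows. This is an illustrative TypeScript version that assumes per-room sequence numbers start at 1 and increase by 1; `RoomMessage` and `ReorderBuffer` are hypothetical names.

```typescript
interface RoomMessage {
  seq: number;  // assigned by the server, per room
  body: string;
}

// Delivers messages to the UI strictly in sequence order. Messages that
// arrive early are parked until the gap before them is filled.
class ReorderBuffer {
  private nextSeq: number;
  private pending = new Map<number, RoomMessage>();

  constructor(firstSeq = 1) {
    this.nextSeq = firstSeq;
  }

  // Returns the messages that are now safe to render, in order.
  accept(msg: RoomMessage): RoomMessage[] {
    if (msg.seq < this.nextSeq || this.pending.has(msg.seq)) {
      return []; // duplicate, e.g. replayed after a reconnect
    }
    this.pending.set(msg.seq, msg);
    const ready: RoomMessage[] = [];
    let next = this.pending.get(this.nextSeq);
    while (next !== undefined) {
      ready.push(next);
      this.pending.delete(this.nextSeq);
      this.nextSeq += 1;
      next = this.pending.get(this.nextSeq);
    }
    return ready;
  }

  // The sequence number to request from the server when resyncing.
  resumeFrom(): number {
    return this.nextSeq;
  }
}
```

A real client would also cap how long it waits on a gap before asking the server to resend from `resumeFrom()`.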

Memory bloat from long-lived connections

The pain: WebSocket processes consume more and more memory over hours of uptime. OOM kills cascade into outages.

Our approach: Connection-level memory budgets. Periodic compaction of in-memory state. Stateless gateway pattern where session state lives in Redis, not the gateway process.

Message delivery during partial failures

The pain: One backend service goes down. Messages route to it but never deliver. Students think the platform is broken.

Our approach: Pub/sub messaging with persistence (NATS JetStream, Redis Streams, or Kafka). Failed deliveries get retried automatically. Acknowledgement-based delivery for messages that must not be lost.
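
Brokers such as the ones named above provide acknowledgement and redelivery themselves. The in-memory TypeScript sketch below only illustrates the at-least-once semantics; `AckQueue`, its timeout, and its attempt limit are hypothetical and not any broker's API.

```typescript
interface PendingMessage {
  id: string;
  payload: string;
  attempts: number;
  nextAttemptAt: number; // ms timestamp
}

// At-least-once delivery: a message stays pending until the consumer acks
// it, and is redelivered after `ackTimeoutMs` if no ack arrives.
class AckQueue {
  private pending = new Map<string, PendingMessage>();
  readonly deadLetters: PendingMessage[] = [];
  private ackTimeoutMs: number;
  private maxAttempts: number;

  constructor(ackTimeoutMs = 5000, maxAttempts = 5) {
    this.ackTimeoutMs = ackTimeoutMs;
    this.maxAttempts = maxAttempts;
  }

  publish(id: string, payload: string, now: number): void {
    this.pending.set(id, { id, payload, attempts: 0, nextAttemptAt: now });
  }

  // Messages due for first delivery or redelivery at time `now`.
  takeDue(now: number): PendingMessage[] {
    const due: PendingMessage[] = [];
    this.pending.forEach((msg) => {
      if (msg.nextAttemptAt > now) return;
      if (msg.attempts >= this.maxAttempts) {
        // Give up and park the message for inspection.
        this.pending.delete(msg.id);
        this.deadLetters.push(msg);
        return;
      }
      msg.attempts += 1;
      msg.nextAttemptAt = now + this.ackTimeoutMs;
      due.push(msg);
    });
    return due;
  }

  // Returns true if the message was still pending.
  ack(id: string): boolean {
    return this.pending.delete(id);
  }
}
```

Because a message can be delivered more than once, consumers need to be idempotent, for example by deduplicating on `id`.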

How we approach this

Methodology tuned for platforms at scale.

01
Real-time architecture spec (weeks 1-2)
Identify which features need real-time vs near-real-time vs eventual. Pick transport (WebSocket, SSE, polling) per feature. Design the message routing topology. Set explicit latency budgets that the architecture must meet.
02
Real-time gateway build (weeks 3-8)
WebSocket gateway with horizontal scaling. Pub/sub messaging backbone. Per-room channel management. Authentication and authorization integrated into the connection lifecycle.
03
Feature integration (weeks 6-14)
Real-time features layered onto the gateway: chat, presence, live polls, collaborative editing. Each feature treated as an independent module that consumes the gateway's primitives.
04
Scale testing and hardening (weeks 12-18)
Load testing at expected concurrency plus 3x. Chaos testing of partial failures. Observability tuned for real-time metrics: connection count, message latency percentiles, and message loss rate. Pre-launch test of the full peak scenario.
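
The latency percentiles and loss rate mentioned here can be computed from raw samples and counters. A minimal TypeScript sketch using the nearest-rank method; the function names are illustrative, and a production system would more likely use histogram-based metrics than sort raw samples.

```typescript
// Nearest-rank percentile over a window of latency samples (milliseconds).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new RangeError("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Fraction of sent messages that were never acknowledged.
function lossRate(sent: number, acked: number): number {
  return sent === 0 ? 0 : (sent - acked) / sent;
}
```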

The platform behind this work

250,000+ daily users. Multi-tenant by design.

Our multi-tenant EdTech platform powers white-label brands including Your CA Buddy and Youth Pathshala. It serves 250,000+ daily active users, handles 10 million requests per minute at peak, and has sustained zero downtime through three major scaling migrations. Every pattern on this page (the architecture, the decisions, the approach) has been battle-tested there first.

READ THE PLATFORM STORY

How the platform scaled from 20K to 250K daily active users over 3 years.

Read case study →

FAQ

Questions founders ask about this.

WebSockets, Server-Sent Events, or polling: how to choose?

WebSockets for bidirectional real-time (chat, collaborative editing, live whiteboards). SSE for server-push only (notifications, leaderboards, live counts). Polling for low-frequency updates where infrastructure simplicity beats latency. Most EdTech platforms end up using all three for different features.
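
That rule of thumb can be written down as a small decision function. A TypeScript sketch; `FeatureProfile` and the 5000 ms polling threshold are assumptions for illustration, not fixed cut-offs.

```typescript
type Transport = "websocket" | "sse" | "polling";

interface FeatureProfile {
  clientSendsRealtime: boolean; // does the client push data in real time?
  maxStalenessMs: number;       // how old displayed data may be
}

// Bidirectional features get WebSockets, tolerant low-frequency features
// get polling, and everything else is server push over SSE.
function chooseTransport(
  feature: FeatureProfile,
  pollingThresholdMs = 5000, // assumed threshold
): Transport {
  if (feature.clientSendsRealtime) return "websocket";
  if (feature.maxStalenessMs >= pollingThresholdMs) return "polling";
  return "sse";
}
```

For example, chat maps to `"websocket"`, a live leaderboard to `"sse"`, and a course catalog that may be a minute stale to `"polling"`.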

How many concurrent WebSocket connections can one server handle?

On a single Node.js server with proper tuning: 10,000-50,000. With clustered Node.js behind a load balancer: hundreds of thousands. With a horizontally scaled gateway tier, such as Socket.IO with its Redis adapter: millions. The right answer depends on message volume per connection, not just connection count.

How do you handle live collaborative editing for documents or whiteboards?

CRDTs (Conflict-free Replicated Data Types) for true conflict-free collaboration. Yjs is one of the most mature libraries. Operational Transform (OT) is older and harder to implement correctly. For EdTech specifically (collaborative annotations, shared whiteboards, group note-taking), CRDT is the right default.
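
To show what "conflict-free" means in practice, here is one of the simplest CRDTs, a last-writer-wins map, sketched in TypeScript. It is not how Yjs works internally (Yjs uses a richer sequence CRDT suited to text); the names `Entry`, `LwwMap`, and `merge` are hypothetical and only illustrate the convergence property.

```typescript
// One entry of a last-writer-wins (LWW) map, e.g. a sticky note on a whiteboard.
interface Entry {
  value: string;
  timestamp: number; // logical clock of the write
  replicaId: string; // breaks ties between concurrent writes
}

type LwwMap = Map<string, Entry>;

// Total order on writes: higher timestamp wins, replica id breaks ties.
function wins(a: Entry, b: Entry): boolean {
  return a.timestamp > b.timestamp ||
    (a.timestamp === b.timestamp && a.replicaId > b.replicaId);
}

// Merge is commutative, associative, and idempotent, so replicas converge
// regardless of the order in which they exchange state.
function merge(left: LwwMap, right: LwwMap): LwwMap {
  const out: LwwMap = new Map(left);
  right.forEach((entry, key) => {
    const existing = out.get(key);
    if (!existing || wins(entry, existing)) out.set(key, entry);
  });
  return out;
}
```

Two students who edit the same note while offline end up with the same result after syncing, whichever direction the merge runs.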

Presence at scale: how do you show 'who is online' for 10K students?

Heartbeats from each connected client. Aggregated presence state stored in Redis with TTL. Per-room presence channels rather than global. UI updates throttled to 1-2 Hz so you do not flood clients with presence changes. Scales to hundreds of thousands of concurrent users with the right architecture.
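
The heartbeat, TTL, and throttle pieces fit together roughly as below. This TypeScript sketch keeps presence in memory as a stand-in for Redis keys with a TTL; `RoomPresence`, `Throttle`, and the default intervals are assumptions for illustration.

```typescript
// In-memory stand-in for per-room presence entries that expire after a TTL.
class RoomPresence {
  private lastSeen = new Map<string, number>(); // userId -> last heartbeat (ms)
  private ttlMs: number;

  constructor(ttlMs = 30000) {
    this.ttlMs = ttlMs;
  }

  heartbeat(userId: string, now: number): void {
    this.lastSeen.set(userId, now);
  }

  // Users whose last heartbeat is within the TTL; expired entries are dropped.
  online(now: number): string[] {
    const users: string[] = [];
    this.lastSeen.forEach((seenAt, userId) => {
      if (now - seenAt <= this.ttlMs) users.push(userId);
      else this.lastSeen.delete(userId);
    });
    return users.sort();
  }
}

// Allows at most one presence broadcast per interval (500 ms is 2 Hz).
class Throttle {
  private lastRun = -Infinity;
  private minIntervalMs: number;

  constructor(minIntervalMs = 500) {
    this.minIntervalMs = minIntervalMs;
  }

  shouldRun(now: number): boolean {
    if (now - this.lastRun < this.minIntervalMs) return false;
    this.lastRun = now;
    return true;
  }
}
```

One `RoomPresence` per room keeps the fan-out local: a join in one class never triggers updates for students in another.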

Latency budget for real-time EdTech features?

Live class chat: under 250ms perceived. Collaborative editing: under 150ms for character-level operations. Live polls: under 500ms. Below these numbers feels instant; above them feels broken. The architecture should be designed to those budgets from the start, not built first and measured against them afterward.
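
Budgets like these are easiest to enforce when they live in code next to the load tests. A small TypeScript sketch using the figures above; the constant and function names are hypothetical, and which percentile to compare against (p95, p99) is a choice left to the team.

```typescript
// Budgets in milliseconds, taken from the figures above.
const LATENCY_BUDGET_MS = {
  liveChat: 250,
  collaborativeEdit: 150,
  livePoll: 500,
} as const;

type Feature = keyof typeof LATENCY_BUDGET_MS;

// True when the measured latency is strictly under the feature's budget.
function withinBudget(feature: Feature, measuredMs: number): boolean {
  return measuredMs < LATENCY_BUDGET_MS[feature];
}
```

A load-test run can then fail the build whenever `withinBudget` returns false for any feature.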

Building real-time learning experiences?

Book a real-time architecture review↗

Explore related EdTech practices.

Live Learning Platform

Platform Scaling

Architecture Audit
