CASE STUDY

How We Build a Virtual Classroom Platform

A WebRTC live classroom: a custom SFU for low-latency video, an HLS fallback when a lecture gets big, a real-time whiteboard, breakout rooms, and recording that happens whether or not anyone remembers to start it.

Overview

Domain: EdTech / Live Learning

Typical Timeline: 12 to 16 weeks

Team Shape: 2 Mobile + 1 Backend + 1 WebRTC

Market Range: $90K to $140K

This is the playbook our team follows when a learning product needs live, interactive classrooms instead of a video link bolted onto an LMS. The numbers up top are how a build like this usually shapes up. Everything below is the engineering underneath it: the decisions, the trade-offs, and a couple of things we got wrong the first time and fixed.

The Problem We Keep Seeing

An education team grows past a few thousand students and the duct tape starts to show. Live classes run on one video tool. The whiteboard is a shared doc in another tab. Attendance lives in the LMS. So every teacher spends the first ten minutes of class wrangling three browser tabs instead of teaching, and students feel the seams. The experience is stitched together, and stitched-together things lose people.

Recording is the part that quietly breaks the most. When it depends on a human clicking a button at the start of every session, a slice of those sessions just never gets recorded. Nobody notices until a student asks for last week's lecture and it is not there. That is the kind of gap that erodes trust in a platform faster than any flashy missing feature.

Then there is the scaling cliff. A platform that is comfortable with a few hundred people on at once falls over when exam season pulls thousands into live sessions on the same evening. Teams look at hosted video SDKs to dodge the engineering, and the per-minute math stops them cold. At real volume those fees run into the thousands of dollars a month, and you still do not own the experience. So the brief lands the same way most times. One platform. Real-time video that holds up under load. A whiteboard people can actually draw on together. Recording that never depends on memory. And it has to survive the busiest night of the year.

How We Build It

We build it cross-platform from day one. A mobile app for students on iOS and Android, usually in Flutter, and a React web dashboard for teachers and admins. Students are mostly on phones. Teachers are mostly on a laptop running the room. Designing for both up front saves you from retrofitting one onto the other later, which never goes well.

The video layer is WebRTC with a custom Selective Forwarding Unit. Active speakers connect through the SFU for sub-200ms latency, the kind of responsiveness a teacher needs to read a room and not talk over a student. The clever part is what happens when a lecture gets big. Past a threshold, passive viewers stop getting individual WebRTC streams and start receiving an HLS broadcast instead. They give up a few seconds of immediacy they were never going to use anyway, and in exchange the cost of a 2,000-seat lecture stops looking like a 2,000-seat lecture. That hybrid is the single biggest reason a build like this stays affordable at scale.

The whiteboard runs on the Canvas API with a real-time sync channel underneath. Every stroke, shape, and bit of text reaches the rest of the room in well under a tenth of a second, so it feels like one shared surface and not a laggy screen-share. Recording is server-side on purpose. An FFmpeg pipeline composites the teacher video, the whiteboard, and any screen-share into one stream and ships it to S3, asynchronously, so it never steals cycles from the live session. No button. No human to forget. The session records because the server records it.

Tech Stack

Flutter, React, Node.js, WebRTC (custom SFU), Socket.io, Redis pub/sub, PostgreSQL, AWS (S3, CloudFront, EC2), Canvas API, FFmpeg, HLS streaming

The Decisions That Mattered

Build the SFU or rent one. That is the first real fork, and we land on building it more often than people expect. Managed WebRTC services charge per participant-minute, which is painless in a demo and brutal at volume. Run hundreds of sessions a week with dozens of people in each and that meter climbs into the thousands of dollars a month. A self-hosted SFU on a few EC2 boxes does the same job for a fraction of that. The cost is roughly three extra weeks of engineering, and on a platform with real usage that pays for itself almost immediately.

For whiteboard sync we reach for Redis pub/sub rather than broadcasting straight off one WebSocket server. The reason is boring and correct. Put 200 students drawing on the same canvas through a single process and that process becomes the ceiling. Redis fans the events out across several Node instances behind a load balancer, and because the canvas state lives in Redis and not in any one server's memory, a box can die mid-class and students reconnect somewhere else without losing the board. That last bit is the difference between a blip and a disaster.
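A minimal sketch of that fan-out, assuming an in-memory stand-in for Redis pub/sub so it runs on its own. The `Bus` and `NodeInstance` classes and the `room:` channel naming are hypothetical; in a real deployment the bus would be a Redis client and the board state would also be stored in Redis.

```typescript
type Handler = (message: string) => void;

// In-memory stand-in for Redis pub/sub, so the sketch runs without a Redis server.
class Bus {
  private handlers: Record<string, Handler[]> = {};

  subscribe(channel: string, handler: Handler): void {
    (this.handlers[channel] = this.handlers[channel] || []).push(handler);
  }

  publish(channel: string, message: string): void {
    (this.handlers[channel] || []).forEach((handler) => handler(message));
  }
}

// One Node process behind the load balancer; it only holds its own clients.
class NodeInstance {
  private inboxes: Record<string, string[]> = {};
  private bus: Bus;
  private roomId: string;

  constructor(bus: Bus, roomId: string) {
    this.bus = bus;
    this.roomId = roomId;
    // Relay everything published on the room channel to locally connected clients.
    bus.subscribe("room:" + roomId, (message) => {
      Object.keys(this.inboxes).forEach((id) => this.inboxes[id].push(message));
    });
  }

  connect(clientId: string): void {
    this.inboxes[clientId] = [];
  }

  // A local stroke is published to the bus rather than sent to local clients directly.
  draw(stroke: string): void {
    this.bus.publish("room:" + this.roomId, stroke);
  }

  received(clientId: string): string[] {
    return this.inboxes[clientId] || [];
  }
}
```

Because every instance publishes to the shared channel instead of writing to its own sockets, a student connected to one instance sees strokes drawn through another, and no single process has to hold the whole room.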

Recording fought us, honestly. The tidy idea is to record in the browser with the MediaRecorder API. It works on a laptop and falls apart on phones, where mobile browsers handle it every which way. So we moved the whole thing server-side. FFmpeg grabs the composite stream, transcodes to H.264 and AAC, and pushes to S3, all of it off the hot path so a live class never feels it. The reliable boring answer beat the clever fragile one, which is how it usually goes.
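As a rough illustration of the server-side step, the sketch below builds an FFmpeg argument list that places two inputs side by side and encodes H.264 video with AAC audio. The function name, the side-by-side layout, the 720p height, and the input names are assumptions for the example, not the pipeline described above; screen-share compositing and the S3 upload are left out.

```typescript
// Builds an ffmpeg argument list: teacher video and whiteboard side by side,
// encoded as H.264 video and AAC audio. Layout and 720p height are assumptions.
function buildCompositeArgs(
  teacherInput: string,
  whiteboardInput: string,
  output: string
): string[] {
  const filter = [
    "[0:v]scale=-2:720[teacher]", // bring both inputs to the same height
    "[1:v]scale=-2:720[board]",
    "[teacher][board]hstack=inputs=2[v]", // place them side by side
  ].join(";");

  return [
    "-i", teacherInput,
    "-i", whiteboardInput,
    "-filter_complex", filter,
    "-map", "[v]", // the composited video
    "-map", "0:a", // audio from the teacher input (assumes it carries an audio track)
    "-c:v", "libx264",
    "-preset", "veryfast",
    "-c:a", "aac",
    "-movflags", "+faststart", // moves the MP4 index to the front so playback can begin early
    output,
  ];
}
```

The array is meant to be handed to a process launcher, for example Node's `child_process.spawn("ffmpeg", args)`, on a worker separate from the SFU so encoding stays off the live path.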

And the mobile audio. The flutter_webrtc package gets you most of the way, but a real classroom has a cheap laptop speaker echoing, a fan in the background, and a student on a wobbly 3G connection. The stock echo cancellation and noise suppression are not built for that. So we drop into native code on iOS and Android for the audio path. It is unglamorous work, and it is the work that decides whether a teacher can stand to use the thing for an hour.

What Gets Built

Live Interactive Whiteboard

A shared canvas teachers draw, type, and annotate on, with everyone seeing it update in real time. Strokes render at 60fps locally while the sync layer pushes coordinate deltas to every client. Undo and redo, shape tools, text, a color palette, the usual kit. We add a follow-teacher mode so a student cannot wander off the active area by accident, which sounds minor until you watch a class of forty people scroll in forty directions. The board persists after class too, so students can come back and read the annotated slides instead of trying to remember them.

Breakout Rooms

One click splits a class into smaller groups. Each room is its own WebRTC session with its own whiteboard and chat. The teacher drops into any room, broadcasts a note to all of them at once, or yanks everyone back to the main session. Grouping can be random, hand-picked, or based on a roster you set ahead of time. We wire in a timer that auto-closes the rooms and pulls students back when it runs out, because relying on a teacher to herd a dozen rooms manually is how you lose the last five minutes of every class.
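The random grouping option can be sketched as a shuffle followed by a round-robin deal, which keeps room sizes within one student of each other. The function names are hypothetical, and the injectable `random` parameter is only there so the result can be reproduced in a test.

```typescript
// Fisher-Yates shuffle on a copy, with an injectable random source for reproducible tests.
function shuffle<T>(items: T[], random: () => number = Math.random): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = out[i];
    out[i] = out[j];
    out[j] = tmp;
  }
  return out;
}

// Deal shuffled students into rooms round-robin so room sizes differ by at most one.
function assignBreakoutRooms(
  studentIds: string[],
  roomCount: number,
  random: () => number = Math.random
): string[][] {
  if (roomCount < 1) throw new Error("roomCount must be at least 1");
  const rooms: string[][] = [];
  for (let r = 0; r < roomCount; r++) rooms.push([]);
  shuffle(studentIds, random).forEach((id, index) => {
    rooms[index % roomCount].push(id);
  });
  return rooms;
}
```

Hand-picked and roster-based grouping would skip the shuffle and fill the rooms from a stored mapping instead.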

Automatic Recording and Playback

Every session records itself. The server-side FFmpeg pipeline folds the teacher video, the whiteboard, and any screen-share into a single recording, uploads it to S3, and has it ready for playback within a few minutes of the session ending. We line up chapter markers with whiteboard page changes so a student can jump straight to the topic they need instead of scrubbing through an hour. The whole point is that nothing rides on a person remembering to press record, because the version that rides on memory is the version that quietly loses sessions.
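One way to derive those chapter markers, sketched under the assumption that the whiteboard logs a timestamp for each page change. The `PageChange` and `Chapter` shapes and the "Page N" titles are hypothetical.

```typescript
interface PageChange {
  page: number;
  atMs: number; // wall-clock time of the page change, in milliseconds
}

interface Chapter {
  title: string;
  startSeconds: number; // offset from the start of the recording
}

// Turns whiteboard page changes into chapter offsets relative to the recording start.
function chaptersFromPageChanges(recordingStartMs: number, changes: PageChange[]): Chapter[] {
  return changes
    .filter((c) => c.atMs >= recordingStartMs) // ignore changes made before recording began
    .sort((a, b) => a.atMs - b.atMs)
    .map((c) => ({
      title: "Page " + c.page,
      startSeconds: Math.floor((c.atMs - recordingStartMs) / 1000),
    }));
}
```

The resulting list can be written out in whatever chapter format the player understands.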

Engagement Signals for Teachers

We design a scoring layer that watches a handful of signals during class. Camera on or off, how fast a student answers a quiz, chat activity, whiteboard interaction, and how long they actually stay present. Roll those into a per-student score and a live heatmap, and a teacher can spot who is drifting while there is still time to pull them back. Over weeks, the same signals help an admin spot a student sliding toward dropping out before it is too late to do anything about it. This is a design we build to the data we have, not a promise about any one school's numbers.
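A per-student score of this kind can be sketched as a weighted sum of normalised signals. The field names and the weights below are purely illustrative assumptions; real weights would have to be tuned against the data a given school actually has.

```typescript
interface Signals {
  cameraOnRatio: number; // share of class time with the camera on, 0..1
  quizSpeed: number; // 0 (slow or no answer) to 1 (fast)
  chatActivity: number; // 0..1, normalised against the class
  whiteboardActivity: number; // 0..1, normalised against the class
  presenceRatio: number; // share of class time actually present, 0..1
}

// Illustrative weights only; they sum to 1.
const WEIGHTS: Signals = {
  cameraOnRatio: 0.15,
  quizSpeed: 0.25,
  chatActivity: 0.15,
  whiteboardActivity: 0.15,
  presenceRatio: 0.3,
};

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Returns a 0..100 score from the weighted, clamped signals.
function engagementScore(signals: Signals): number {
  const keys = Object.keys(WEIGHTS) as (keyof Signals)[];
  const total = keys.reduce((sum, k) => sum + WEIGHTS[k] * clamp01(signals[k]), 0);
  return Math.round(total * 100);
}
```

A live heatmap would simply recompute this per student on a short interval and colour by the result.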

Live Quizzes and Polls

A teacher can fire off a quiz mid-class without leaving the room. Multiple choice, true or false, short answer. Results come back as a bar chart that grows live as students submit, which is its own small bit of theatre that keeps people watching. The quiz engine feeds the engagement layer, so a quick, thoughtful answer counts for something. Done well, this turns a passive lecture into something people lean into, which is the entire reason live beats a pre-recorded video in the first place.
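The live bar chart only needs a running count per option. A small sketch, assuming (as a policy choice that is not stated above) that a student who resubmits replaces their earlier answer; the function and field names are hypothetical.

```typescript
interface Submission {
  studentId: string;
  answer: string;
}

// Counts answers per option; a later submission from the same student replaces the earlier one.
function tallyAnswers(options: string[], submissions: Submission[]): Record<string, number> {
  const latest: Record<string, string> = {};
  submissions.forEach((s) => {
    latest[s.studentId] = s.answer;
  });

  const counts: Record<string, number> = {};
  options.forEach((option) => {
    counts[option] = 0;
  });

  Object.keys(latest).forEach((studentId) => {
    const answer = latest[studentId];
    // Ignore answers that are not among the offered options.
    if (Object.prototype.hasOwnProperty.call(counts, answer)) counts[answer] += 1;
  });
  return counts;
}
```

Re-running the tally on each new submission and pushing the counts to clients is enough to make the bars grow as answers arrive.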

Attendance and Reporting

Attendance logs itself the moment a student joins. The system records join time, leave time, and real active duration, so a student who joins and then minimizes the tab for most of the class is flagged differently from one who was actually there. Reports export to CSV or PDF on a weekly or monthly cadence. Admins set their own thresholds, and dropping below one can trigger an automatic note to the student, or a parent, without anyone manually chasing it.
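The "real active duration" idea can be sketched as a fold over a student's session events. The event names are hypothetical; in a browser the hidden and visible events could come from the Page Visibility API, and on mobile from app lifecycle callbacks.

```typescript
type AttendanceEvent = {
  type: "join" | "leave" | "hidden" | "visible";
  atMs: number;
};

// Sums the time a student was both connected and had the classroom in the foreground.
function activeDurationMs(events: AttendanceEvent[]): number {
  let active = 0;
  let connected = false;
  let visible = true;
  let since = 0; // start of the current active stretch

  events
    .slice()
    .sort((a, b) => a.atMs - b.atMs)
    .forEach((e) => {
      const wasActive = connected && visible;
      if (e.type === "join") connected = true;
      if (e.type === "leave") connected = false;
      if (e.type === "hidden") visible = false;
      if (e.type === "visible") visible = true;
      const isActive = connected && visible;

      if (!wasActive && isActive) since = e.atMs;
      if (wasActive && !isActive) active += e.atMs - since;
    });

  return active;
}
```

A student who joins for an hour but hides the tab between minute 10 and minute 50 comes out at 20 active minutes, which is the distinction the report needs. A stretch that is still open (no leave event yet) is not counted in this sketch.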

What the Build Is Engineered to Do

These are the design targets we hold the system to, the engineering it takes to make a live classroom feel instant and stay reliable when a few thousand people pile in at once. They describe the architecture, not any one school's results.

Capability	Target	How
Speaker latency	Under 200ms	Active speakers on the custom WebRTC SFU
Big-lecture scale	Thousands concurrent	HLS fallback for passive viewers past a threshold
Whiteboard sync	Sub-50ms strokes	Delta updates fanned out over Redis pub/sub
Recording	Every session	Server-side FFmpeg composite, no manual trigger
Resilience	Survive a node loss	Canvas state in Redis, students reconnect elsewhere
Cost at scale	No per-minute meter	Self-hosted SFU instead of a hosted video SDK

Build It, or License a Video SDK?

Most teams start with a hosted SDK, and they should. For a quick launch it is the right call. It starts to pinch the moment you need your own layout, breakout rooms with a specific teaching flow, branded recording, or any feature the vendor will never expose. Here's the honest comparison for an organization running hundreds of concurrent sessions.

Approach	Cost	Timeline	Customization	Best For
Custom WebRTC Classroom	$90K to $140K upfront	12 to 16 weeks	Full control	EdTech running high session volume with its own pedagogy
Zoom SDK	$8K to $15K/mo at scale	2 to 4 weeks	Moderate (Zoom's UI constraints)	Quick launches where Zoom branding is acceptable
BigBlueButton	Free (self-hosted) plus $2K to $5K/mo infra	2 to 6 weeks	High (open source, fiddly setup)	Universities with DevOps capacity and tight budgets
Microsoft Teams Education	Free for schools (M365 license)	1 to 2 weeks	Low (locked to Microsoft's ecosystem)	K-12 schools already living in Microsoft 365

So when does building pay off? Roughly when per-minute SDK pricing crosses what self-hosted infrastructure costs, which at high session volume can be a wide gap year over year. But the invoice is rarely the real reason. The reason is that you stop being boxed in by someone else's product. No 40-minute limit, no UI you cannot touch, no recording pipeline you cannot reach into. You own the classroom.
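The crossover can be estimated with simple arithmetic. The sketch below is a planning aid only: it uses the market ranges quoted on this page as inputs, the function name is hypothetical, and it ignores ongoing engineering, TURN relay, and storage costs.

```typescript
// Months until a one-time build cost is recovered by lower monthly running costs.
// Returns Infinity when self-hosting is not cheaper per month.
function breakEvenMonths(
  buildCost: number,
  sdkMonthly: number,
  selfHostedMonthly: number
): number {
  const monthlySaving = sdkMonthly - selfHostedMonthly;
  if (monthlySaving <= 0) return Infinity;
  return Math.ceil(buildCost / monthlySaving);
}

// Bracketing the ranges on this page ($90K to $140K build, $8K to $15K/mo SDK,
// $1,200 to $2,500/mo hosting):
const bestCase = breakEvenMonths(90000, 15000, 1200); // 7 months
const worstCase = breakEvenMonths(140000, 8000, 2500); // 26 months
```

At lower session volume the SDK bill shrinks and the break-even point moves out accordingly, which is why volume is the first question to answer.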

The same WebRTC pattern travels well beyond classrooms. Telemedicine consultations use the same shape. Live customer support with screen-share uses it. Corporate training platforms run on it. If you are weighing a custom build against another year of SDK fees, the deciding question is control. Do you need your own layouts, breakout flows, or branded recording? If yes, building with a product partner is usually the move.

What We've Learned Building These

Mobile WebRTC is harder than the demos make it look. The flutter_webrtc package handles a clean peer connection on a good laptop just fine. Real classrooms are not clean. There is echo off a cheap speaker, a fan droning in the background, a student on flaky 3G. The audio work, the native echo cancellation and noise suppression, is where the time goes, and it is also the work that decides whether teachers actually like using the thing. Audio quality is teacher satisfaction. They are almost the same metric.

Never trust a collaboration feature until you have load-tested it at ten times the peak. Our first whiteboard sync sent the full canvas every hundred milliseconds. With ten people, lovely. With two hundred it was eating megabytes a second per client and the whole thing crawled. We switched to sending only the new strokes since the last frame, and the bandwidth fell off a cliff in the good way. The lesson stuck. Anything real-time, you push it to absurd numbers before you believe it.
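The switch from full-canvas frames to deltas can be sketched with a sequence-numbered stroke log: each client reports the last stroke it has seen and receives only what came after. The `BoardLog` class and `Stroke` shape are hypothetical names for illustration.

```typescript
interface Stroke {
  seq: number; // 1-based, increases by one per stroke
  points: number[]; // flattened x,y coordinates
}

class BoardLog {
  private strokes: Stroke[] = [];

  // Records a new stroke and assigns it the next sequence number.
  append(points: number[]): Stroke {
    const stroke: Stroke = { seq: this.strokes.length + 1, points };
    this.strokes.push(stroke);
    return stroke;
  }

  // Only the strokes the client has not seen yet, instead of the whole canvas.
  since(lastSeenSeq: number): Stroke[] {
    return this.strokes.slice(Math.max(0, lastSeenSeq));
  }
}
```

With this shape the payload per update scales with how much was just drawn rather than with the size of the board, and a reconnecting client can catch up by asking for everything since sequence 0.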

The hybrid WebRTC and HLS split is the right architecture, but the timing of the handoff is everything. Flip to HLS too early and people feel the latency jump from a fifth of a second to several seconds. Too late and the SFU is gasping. We settle on a load-based trigger. When the SFU sits over 70% CPU for half a minute, new joiners go to HLS while everyone already on WebRTC stays put until they leave. Smooth for the people in the room, kind to the server.
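A minimal sketch of a load-based trigger like the one described, using the 70% and 30-second figures from the text as defaults. The class and method names are hypothetical, and two behaviours are assumptions rather than details from the text: any sample at or below the threshold resets the window, and new joiners return to WebRTC as soon as that happens.

```typescript
type Transport = "webrtc" | "hls";

class HandoffTrigger {
  private overSinceMs: number | null = null;
  private latestMs = 0;
  private readonly thresholdPct: number;
  private readonly sustainMs: number;

  constructor(thresholdPct: number = 70, sustainMs: number = 30000) {
    this.thresholdPct = thresholdPct;
    this.sustainMs = sustainMs;
  }

  // Feed periodic CPU samples from the SFU host.
  recordCpu(cpuPct: number, nowMs: number): void {
    this.latestMs = nowMs;
    if (cpuPct > this.thresholdPct) {
      if (this.overSinceMs === null) this.overSinceMs = nowMs;
    } else {
      this.overSinceMs = null; // a dip to or below the threshold resets the window
    }
  }

  // Only new joiners are routed here; existing WebRTC sessions are never moved.
  transportForNewJoiner(): Transport {
    const sustained =
      this.overSinceMs !== null && this.latestMs - this.overSinceMs >= this.sustainMs;
    return sustained ? "hls" : "webrtc";
  }
}
```

Calling `transportForNewJoiner()` at join time, and never re-evaluating people already connected, is what keeps the handoff invisible to the students already in the room.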

Automatic recording fixes a problem nobody admits they have. The version that depends on a human pressing a button loses sessions, every single time, and nobody finds out until a recording is missing. Making the server own it closes that gap entirely. And recordings get used in ways you do not plan for. Students binge them at 1.5x the night before an exam, which is exactly the kind of real usage that tells you which feature to build next.

Frequently Asked Questions

How long does it take to build a virtual classroom platform?

Plan on 12 to 16 weeks for a production build. An MVP with live video and chat can ship in 8 to 10 weeks. The interactive whiteboard, breakout rooms, automatic recording, and engagement scoring are what push it toward the full timeline. The biggest variable is how big the rooms get, because a 12-person seminar and a 2,000-seat lecture are genuinely different engineering problems.

How much does a virtual classroom platform cost to build?

In the market today a production virtual classroom usually runs $90,000 to $140,000 for the first build, with hosting between $1,200 and $2,500 a month at scale depending on concurrent load and how much you record. Those are market ranges to plan against, not a quote from us. The media-server cost is the part that grows with usage, since it scales with the number of simultaneous video streams.

Why build on WebRTC instead of a video SDK like Zoom or Twilio?

Per-minute SDK pricing is fine until you have real volume, and then it compounds. Once you are running hundreds of sessions a week, a hosted SDK can cost several thousand dollars a month in API fees alone, while a self-hosted WebRTC stack with your own SFU runs a fraction of that. The bigger reason is control. You own the layout, the recording pipeline, the whiteboard, and your data, with no vendor deciding what your product looks like.

Can a platform like this handle thousands of concurrent users?

Yes, and the trick is not treating everyone the same. Active speakers connect over WebRTC for sub-200ms latency. Passive viewers in a big lecture get an HLS stream instead, which trades a few seconds of latency for cheap, near-unlimited scale. Redis pub/sub keeps the whiteboard and chat in sync across multiple server instances, so no single box is the bottleneck. We load-test these well past the expected peak before anyone goes live.

Can Geminate Solutions build a virtual classroom for our education business?

Yes. The WebRTC SFU, the whiteboard engine, and the recording pipeline are patterns we build with our team and reuse from one product to the next. Geminate Solutions has shipped 50+ products, including EdTech at real scale. A production virtual classroom usually takes 12 to 16 weeks. Start at geminatesolutions.com/get-started for a free project assessment.

Should you build a custom classroom or license a video SDK?

License when you just need video on a screen and the vendor branding is fine. Build when the classroom is the product, when you need breakout rooms with your own pedagogy, branded recording, an engagement model, or features the SDK will never expose. At high session volume the custom build also tends to cost a lot less per year than per-minute pricing, but the deciding factor is usually control, not the invoice.

What are the costs people forget to budget for?

Two big ones. TURN relay bandwidth scales with concurrent streams, so a portion of every participant-minute has a real cost behind it. And recording storage adds up faster than people expect once you are saving every session at high volume. If you are selling recorded courses you also need payment plumbing, and if trainers schedule sessions you need calendar logic. Together these can add fifteen to twenty-five percent on top of the base build. These are market ranges to plan against, not a quote.

When does custom classroom software actually make sense?

When you need things an off-the-shelf tool will not give you. Maybe you want to white-label the classroom as your own product. Maybe you run live demos, certification courses, or training that needs its own workflow and recording. If you are spending heavily every month on video subscriptions and still cannot get the experience you want, that is usually the signal that building is worth it.

How do you keep an online learning build from bloating?

Ship the thing people show up for first, which is live video and chat that just works. Skip breakout rooms in the MVP. A driver-training program does not need a whiteboard on day one, and a compliance course usually needs reliable recording before it needs anything interactive. Build what drives adoption, watch how people actually use it, then add the next feature against real usage instead of a wish list.

What a Build Like This Costs in the Market

For planning, a complete production virtual classroom sits in the $90,000 to $140,000 range in the market right now. That covers the custom WebRTC SFU, the student mobile app, the teacher web dashboard, the interactive whiteboard, the automatic recording pipeline, and the weeks of work to make it all hold together under load. Running it tends to land around $1,200 to $2,500 a month in hosting at real scale, with a few hundred more for ongoing tuning and feature work. These are market ranges to budget against, not a quote from us. The exact number depends on how big your rooms get, how much you record, and how custom the experience needs to be.

The more useful way to frame it is against the alternative. A hosted video SDK at real volume can run into the thousands of dollars a month, every month, forever, and the meter only goes up as you grow. A build you own is a one-time investment plus much cheaper infrastructure, and it becomes an asset instead of a subscription that renews and reprices on someone else's schedule. Whether that trade works for you depends on your volume, and that is exactly the conversation we would rather have honestly up front than oversell.

Why Teams Bring This to Us

A real-time classroom needs WebRTC, a working SFU, Canvas-based collaboration, and mobile audio chops in the same heads at the same time, and that combination is genuinely hard to pull together in a hurry. But more often the issue is not skill at all. The in-house engineers are already flat out on the core learning product, and pulling them onto a live-video build would stall the roadmap for months. We build it with our team in parallel, so the main product keeps moving while the classroom gets built by people who have shipped this exact shape of thing before.

Timing is usually the real driver. Exam season does not wait, and a live-classroom build that misses its window costs you a whole term. We come in as a product partner, not a code shop taking a spec over the wall. The SFU architecture, the HLS fallback, the whiteboard sync, the server-side recording, these are patterns we have built before, so we are reusing battle-tested decisions instead of discovering them on your deadline. And when we suggest the engagement layer, or push to make recording automatic, it is because we have watched which features actually keep students showing up, and we would rather tell you early than wait to be asked.

Thinking about building one?

The WebRTC architecture, the whiteboard engine, and the recording pipeline on this page are the same ones we build with our team for your product. Tell us how your classes actually run and we will tell you, honestly, what it takes.
