Decision framework for whether your EdTech platform needs a full rewrite or targeted refactoring, and how to avoid the 18-month rewrite trap that sinks half of these projects.

Rebuild vs. Refactor: When Your EdTech Platform Needs a Rewrite
2026-04-19 | EdTech, Architecture, Scale

Introduction

Almost no EdTech platform actually needs the full rewrite that gets proposed. Of the rewrite projects we have evaluated, maybe one in four was the right call. The others were either targeted refactoring problems mislabeled as rewrites, or they were rewrites that should have been migrations (different architecture, same product). Full rewrites are the most expensive technical decision a platform can make. They take 12 to 18 months. Product velocity drops to near zero. The team gets exhausted. Investors get nervous. And the success rate is famously bad: most large rewrites either ship late, ship without feature parity, or get abandoned halfway through. This article walks through the decision framework we use when a founder asks "do we need to rewrite this?", and the four alternatives we consider before a full rewrite ever gets recommended.

Symptoms that look like 'we need a rewrite'

Here is how the conversation usually starts. Deploys take 40 minutes and somebody is always holding their breath. There are two or three modules nobody wants to touch, the ones where a one-line change breaks something three screens away. A feature that should take a week takes two months. New engineers need a month before they can ship anything that matters. Every one of these is real pain. None of them, on its own, is a reason to rewrite.
Slow deploys are a CI and build problem, not an architecture problem. Scary modules are a test-coverage and refactoring problem. Features dragging on usually means the code is tangled in a few hot spots, not that the whole foundation is rotten. We have walked into platforms where the founder was certain the codebase was beyond saving, and the actual issue was a 6,000-line controller and a database with no indexes on the columns every query filtered by. Two weeks of focused work and the platform that needed a rewrite was suddenly fine. (Nobody ever frames it that way at the start.)
The tell that separates a refactoring problem from a rewrite problem is whether the pain is local or structural. Local pain clusters. It lives in specific files, specific endpoints, specific tables, and you can point at it on a map. You fix those and the rest of the system breathes again. Structural pain is everywhere at once and follows from a decision baked into the foundation, the kind you cannot refactor your way out of because the assumption is wired through every layer. Most pain is local. People mistake a pile of local pain for structural rot because it all hurts at the same time, and a rewrite feels like a clean slate. A clean slate is exactly the wrong instinct, which is the next thing worth being honest about.
There is also an ego component nobody likes to name. Inheriting someone else's code is unpleasant, and "this is garbage, let me start over" is a very natural reaction for a strong engineer who did not write the original. We have felt it ourselves. But the existing system, ugly as it is, encodes years of hard-won knowledge about what your users actually do. Throwing it away to escape the discomfort of reading it is the most expensive way to learn that the old team was solving real problems you did not know existed.

When a rewrite IS the right answer

A rewrite earns its place in one situation, and it is narrow. The architectural model is wrong for where the product has to go over the next two to three years, and no amount of refactoring gets you there because the wrong assumption is structural. Refactoring reshapes code inside the existing model. When the model itself is the problem, refactoring just gives you cleaner code that still cannot do the thing you need.
In EdTech this shows up in a few recognizable shapes. One is a platform built single-tenant, one database per school, that now needs to serve thousands of institutions from shared infrastructure. You cannot bolt multi-tenancy onto a schema that assumed one customer per deployment, because tenant isolation has to live in the data model from the first table. Another is a system whose core was designed around overnight batch processing (nightly grade rollups, nightly report generation) whose work the product now needs to do in real time. Batch-to-real-time is not a refactor. The data flow runs in the opposite direction. We dug into that exact split in our piece on real-time EdTech analytics, because it is one of the most common walls aging platforms hit.
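To make the multi-tenancy point concrete, here is a minimal sketch of what "tenant isolation in the data model" can look like. The schema, the `TenantScope` helper, and the school names are all made up for illustration, and SQLite stands in for whatever database a real platform uses. The idea it shows: the tenant key is part of every table's identity and every query, which is why it is so hard to add after the fact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: tenant_id is part of every key, not an afterthought.
conn.executescript("""
    CREATE TABLE tenants (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE students (
        tenant_id INTEGER NOT NULL REFERENCES tenants(id),
        id        INTEGER NOT NULL,
        name      TEXT NOT NULL,
        PRIMARY KEY (tenant_id, id)
    );
    CREATE TABLE grades (
        tenant_id  INTEGER NOT NULL,
        student_id INTEGER NOT NULL,
        course     TEXT NOT NULL,
        score      REAL NOT NULL,
        FOREIGN KEY (tenant_id, student_id) REFERENCES students (tenant_id, id)
    );
""")


class TenantScope:
    """All reads go through a scope bound to exactly one tenant."""

    def __init__(self, conn, tenant_id):
        self.conn = conn
        self.tenant_id = tenant_id

    def grades_for(self, student_id):
        rows = self.conn.execute(
            "SELECT course, score FROM grades "
            "WHERE tenant_id = ? AND student_id = ? ORDER BY course",
            (self.tenant_id, student_id),
        )
        return rows.fetchall()


# Two schools that both have a "student 1": only the tenant key tells them apart.
conn.executemany("INSERT INTO tenants VALUES (?, ?)",
                 [(1, "Northside High"), (2, "Lakeview Academy")])
conn.executemany("INSERT INTO students VALUES (?, ?, ?)",
                 [(1, 1, "Ada"), (2, 1, "Grace")])
conn.executemany("INSERT INTO grades VALUES (?, ?, ?, ?)",
                 [(1, 1, "algebra", 91.0), (2, 1, "algebra", 78.5)])

northside = TenantScope(conn, 1)
lakeview = TenantScope(conn, 2)
northside_grades = northside.grades_for(1)
lakeview_grades = lakeview.grades_for(1)
```

In a single-tenant schema none of these tables would carry `tenant_id`, and every key, join, and query in the application would silently assume one customer. That is the load-bearing assumption the paragraph above describes.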
The cleanest test we know: write down the three or four capabilities the product must have in two years that it cannot have today. Then ask, for each, whether the blocker is messy code or a foundational assumption. If you can describe a refactoring path, even an ugly multi-month one, it is not a rewrite. If every path runs into the same load-bearing decision (the single-tenant schema, the batch core, a framework two major versions past end-of-life with no migration route), that is your signal. And notice the framing: the case for a rewrite is always about the future. "This code is bad" is never sufficient. "This model cannot become what we need" is.
One more honest filter. Even when the model is genuinely wrong, a rewrite is only worth it if the platform is going to live long enough to pay it back. A 12-to-18-month rewrite on a product that might pivot in six months is a bet against your own roadmap. We have told founders to keep limping along on the wrong architecture because the business was still figuring out what it was, and rewriting a foundation while the product is mid-pivot just means rewriting it twice.

The four alternatives we evaluate first

Before "rewrite" gets written on any slide, we run the problem against four cheaper options. Most of the time one of them is the actual answer, and the founder walks away having spent a fraction of the money for most of the result.
Targeted refactoring. Find the 20 percent of the code causing 80 percent of the pain and fix only that. The 6,000-line controller gets carved into modules. The missing indexes go in. The one tangled service gets a clean boundary drawn around it and a test harness wrapped over it. This is unglamorous and it is usually enough. You get most of the relief of a rewrite for a few weeks of work instead of a year, and the product never stops shipping. We start here on nearly every engagement.
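As a small illustration of the "missing indexes" fix, the sketch below uses Python's built-in `sqlite3` module to show a query plan before and after adding an index. The `submissions` table and the query are invented for the example, and a production database would have its own tooling, but the pattern of checking the plan, adding the index, and checking again carries over.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for a hot path in the platform.
conn.execute(
    "CREATE TABLE submissions ("
    "id INTEGER PRIMARY KEY, student_id INTEGER, course_id INTEGER, score REAL)"
)
conn.executemany(
    "INSERT INTO submissions (student_id, course_id, score) VALUES (?, ?, ?)",
    [(i % 500, i % 40, 0.0) for i in range(5000)],
)

HOT_QUERY = "SELECT score FROM submissions WHERE student_id = ?"

# Without an index, the filter forces a scan of the whole table.
plan_before = query_plan(conn, HOT_QUERY, (42,))

# The fix: index the column the query actually filters by.
conn.execute("CREATE INDEX idx_submissions_student ON submissions (student_id)")
plan_after = query_plan(conn, HOT_QUERY, (42,))
```

Printing `plan_before` and `plan_after` shows the planner switching from a table scan to a search on `idx_submissions_student`. No application code changed.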
Strangler-fig migration. When you genuinely need new architecture but the old system works, you do not replace it, you grow the replacement around it. A routing layer sits in front, new functionality gets built behind it one slice at a time, and each slice quietly takes its traffic off the old code. The old system keeps serving everything not yet migrated and gets switched off only when nothing calls it anymore. This is how you move to a new model without a single big-bang cutover, and it is the backbone of how we run the rare rewrite that is real. We walked through it end to end in our EdTech monolith migration guide.
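The routing layer described above can be sketched in a few lines. This is a toy, not a production proxy: `StranglerRouter`, `legacy_app`, `new_gradebook`, and the URL prefixes are all hypothetical names, and a real deployment would usually do this in a reverse proxy or API gateway. What it shows is the core mechanic: each slice claims a prefix, everything else falls through to the old system, and a cutover can be reverted by removing one entry.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def legacy_app(path: str) -> str:
    # Stand-in for the existing monolith.
    return f"legacy:{path}"


def new_gradebook(path: str) -> str:
    # Stand-in for one freshly built slice.
    return f"new-gradebook:{path}"


class StranglerRouter:
    """Sends a request to a migrated slice if one claims it, else to legacy."""

    def __init__(self, legacy: Handler) -> None:
        self.legacy = legacy
        self.slices: Dict[str, Handler] = {}

    def cut_over(self, prefix: str, handler: Handler) -> None:
        self.slices[prefix] = handler

    def roll_back(self, prefix: str) -> None:
        self.slices.pop(prefix, None)

    def handle(self, path: str) -> str:
        # Check longer prefixes first so a more specific slice wins.
        for prefix in sorted(self.slices, key=len, reverse=True):
            if path.startswith(prefix):
                return self.slices[prefix](path)
        return self.legacy(path)


router = StranglerRouter(legacy_app)
before = router.handle("/grades/42")      # served by legacy
router.cut_over("/grades", new_gradebook)
after = router.handle("/grades/42")       # served by the new slice
untouched = router.handle("/courses/7")   # not migrated yet, still legacy
router.roll_back("/grades")
reverted = router.handle("/grades/42")    # back on legacy after a revert
```

The old system is ready to switch off when every prefix it used to serve has a slice registered and nothing falls through to `legacy` anymore.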
Ground-up new module. Sometimes only one part needs the new architecture and the rest is fine. Build that one part fresh, with the model it actually needs, and integrate it back through a clean API. The new live-classroom engine or the new assessment service gets to be modern while the surrounding platform stays exactly as it is. You contain the risk to one bounded piece instead of betting the whole system.
Infrastructure rewrite without a code rewrite. Surprisingly often the code is fine and the operations are the problem: slow deploys, flaky environments, no autoscaling, a database starved of memory. You can fix every one of those (containerize the app, add a real CI pipeline, move to managed data services, tune the queries) without rewriting a line of business logic. The platform feels brand new and the actual application never changed.

If you do need a rewrite: how to do it without dying

Say you have run the framework and a rewrite is genuinely warranted. The single decision that determines whether it survives is this: it has to be incremental, and the product cannot freeze. The big-bang version, where the team disappears for a year and emerges with a shiny replacement, is the one that fails. Not sometimes. As a rule. By the time you ship, the market moved, the old system kept accumulating fixes your new one does not have, and you are trying to switch every user over in one weekend that has to go perfectly.
So you run it strangler-fig. New slices go live continuously, each one taking real traffic off the old system the moment it is ready. The team is shipping the entire time, which keeps morale alive and keeps you honest, because a slice that has to handle production traffic this month cannot hide the edge cases the way a year-long greenfield branch can. Every release is small enough to roll back without ceremony. You are never one bad deploy away from a catastrophe, just one bad deploy away from a quick revert.
The discipline that makes or breaks it is ruthless scope reduction. The new system rebuilds exactly what exists, nothing more. Not the redesigned UI, not the three features sales has been promising, not the migration to a trendy framework, not the new mobile app. Every one of those is a reasonable idea and every one of them, bolted onto a rewrite, turns a hard project into an impossible one. Feature parity first, full stop. New ideas wait until the new platform is carrying the load. Teams that cannot hold this line do not finish, because the target keeps moving and the old system never gets retired.
And budget for the edge cases, because they are where rewrites actually die. The headline features are easy: you know they exist and you can see them. It is the quiet 20 percent that gets you. The one school district that uses a grading scheme nobody else does. The integration with a state reporting system that has undocumented rules. The CSV import that silently tolerates three malformed formats because a customer needed it to in 2019. None of that is written down. It lives in the old code and in the muscle memory of users who will not tell you about it until the new system breaks it. You cannot move anyone over until the new platform handles their exact weirdness, so the rediscovery of those edge cases is the real work, and it is most of the timeline whether you planned for it or not.
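One way to surface that undocumented behavior is to replay real inputs through both implementations and diff the results. The sketch below is a minimal version of that idea. Both parsers are invented stand-ins (the real tolerated formats are whatever the old code actually accepts), and `parity_report` is a hypothetical helper, not part of any library.

```python
def legacy_parse_score(raw: str) -> float:
    """Stand-in for old behavior that quietly accepts '87%' and '87,5'."""
    cleaned = raw.strip().rstrip("%").replace(",", ".")
    return float(cleaned)


def new_parse_score(raw: str) -> float:
    """Stand-in for a clean rewrite that only knows about plain decimals."""
    return float(raw)


def parity_report(samples, old, new):
    """Return (input, old_result, new_result) for every input that diverges."""
    mismatches = []
    for raw in samples:
        expected = old(raw)
        try:
            actual = new(raw)
        except ValueError as exc:
            actual = f"error: {type(exc).__name__}"
        if actual != expected:
            mismatches.append((raw, expected, actual))
    return mismatches


# In practice these samples would be recorded from production traffic.
recorded_inputs = ["87", " 87 ", "87%", "87,5"]
report = parity_report(recorded_inputs, legacy_parse_score, new_parse_score)
```

Here `report` flags the `"87%"` and `"87,5"` inputs: the old parser returns a number for both and the new one raises. Each mismatch is a decision to make on purpose (replicate the quirk, or migrate the customer's data) before anyone is moved over.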

The product velocity question

If there is one question that decides the fate of a rewrite, it is this one: can you keep shipping the whole way through? Not "is the new architecture better," not "is the code cleaner." Can the product keep moving forward while the foundation gets replaced underneath it? Get this wrong and even a technically flawless rewrite can sink the company, because while you went quiet your competitors did not.
A frozen roadmap is a slow-motion emergency. EdTech buyers (schools, districts, training companies) renew on a calendar, and they renew based on momentum. A platform that ships nothing visible for a year reads as a platform that is dying, even if behind the scenes the team is doing the best engineering of their careers. Sales cannot demo anything new. Support cannot promise the fix anyone is waiting on. Churn climbs for reasons that have nothing to do with the quality of the rewrite and everything to do with the silence. We have watched a sound rewrite finish on time and land into a customer base that had already started shopping around, because the year of nothing told them everything.
This is the entire reason we will not run a big-bang rewrite. Incremental is not a stylistic preference, it is the mechanism that keeps velocity alive. Slices ship every few weeks, customers see steady motion, sales keeps a story to tell, and the migration happens in the background where it belongs. The honest cost is the part nobody enjoys: during the transition, an urgent fix sometimes has to land in both the old and the new system until that module is fully migrated. That double work is real and it is annoying. It is also vastly cheaper than the alternative, which is a year of silence followed by a cutover you are praying goes clean. Budget the double work, plan for it, and it stays an inconvenience instead of becoming the thing that stalls the whole effort.

Rewrite cost in time, cash, and morale

Let us be concrete about what a real rewrite costs, because the number that gets pitched is almost always the optimistic one. A platform that took three or four years to build does not get faithfully rebuilt in six months, no matter how good the team or how much cleaner the new code is. Plan on 12 to 18 months for true feature parity, and treat anything shorter as a sign somebody has not yet found the edge cases. The cash follows the calendar. It is your full engineering cost for that entire window, except that a large slice of it produces no new customer-facing value. It produces a system that does what the old one already did.
The cost founders consistently miss is the opportunity cost, and it dwarfs the salary line. For 12 to 18 months your best engineers are rebuilding the past instead of building the future. Every feature you did not ship, every market you did not enter, every competitor who pulled ahead while you were heads-down reaching parity: that is the real bill, and it never shows up on an invoice. This is exactly why the future-facing test in the rewrite decision matters so much. If the new architecture does not give the business something it genuinely needs and could not have otherwise, you have paid the largest opportunity cost in your company's life to stand still.
Here is a rough honest comparison of the paths, with the time and risk we actually see.
Path | Typical time | Product keeps shipping? | Risk profile
Targeted refactoring | 2 to 8 weeks | Yes, continuously | Low
Infrastructure rewrite | 1 to 3 months | Yes | Low to moderate
Strangler-fig migration | 6 to 18 months | Yes, in slices | Moderate, contained per slice
Big-bang rewrite | 12 to 18 months plus | No, near-total freeze | High, often fatal
And then there is morale, which is the cost that quietly decides whether the project ever finishes. Twelve months of work where the deliverable is "the same thing, but cleaner" is genuinely hard on a team. There is no flashy launch, no new feature to demo to friends, no obvious dopamine. Halfway through, the new system can do maybe 60 percent of what the old one does, which means it does nothing useful yet, and that 60 percent valley is exactly where rewrites get quietly abandoned. The incremental approach is the antidote here too, because shipping real slices gives the team the steady wins that a year-long greenfield branch never does.
If you take one thing from this whole piece, take this: be very sure the model is genuinely wrong before you spend a year and a chunk of your team's spirit replacing it. When it is not, refactor, ship, and keep building.
If you want a second opinion before you commit, that is exactly the conversation we would rather have up front. See how we built an EdTech platform to 250K daily active users, read the rest of our EdTech engineering work, or tell us what you are weighing through custom software development and we will give you the honest read.
YK
Written by

CEO and co-founder of Geminate Solutions, a software and product development partner. He has led teams shipping custom web apps, mobile apps, SaaS platforms, and AI products that serve over 250,000 daily active users.

Frequently asked questions

Should we rewrite or refactor our aging EdTech platform?
Refactor, almost always. Of the rewrite proposals we have evaluated, maybe one in four was the right call. A rewrite only earns its place when the architectural model itself blocks where the product needs to go for the next two to three years: a single-tenant schema that cannot become multi-tenant, or a batch core that cannot do real time. If the pain is slow deploys, a few scary modules, or features that take too long, that is a refactoring and tooling problem wearing a rewrite costume. Fix it in place and keep shipping.
How long does an EdTech platform rewrite actually take?
Plan on 12 to 18 months for a platform that took years to build, and treat any estimate under a year with real suspicion. The reason is feature parity. Every edge case, every quiet integration, every weird grading rule a teacher relies on has to be rediscovered and rebuilt before you can switch a single user over. The old system spent years accumulating that behavior. You do not get to skip the rediscovery just because you are writing cleaner code this time.
What is the strangler-fig pattern and why does it beat a big-bang rewrite?
You put a routing layer in front of the old system, then build new functionality behind it one slice at a time. Each new slice takes over its traffic from the old code, which keeps running until nothing calls it anymore. The old system gets strangled gradually instead of replaced overnight. It beats a big-bang rewrite because you ship value every few weeks instead of going dark for a year, every release is small enough to roll back, and you are never betting the company on one cutover weekend that has to go perfectly.
How do you keep shipping features during a rewrite?
By never freezing the product, which means never doing a big-bang rewrite in the first place. With an incremental approach the new code goes live in slices, so the team is shipping the whole way through rather than disappearing for a year. The honest hard part is that urgent fixes during the transition have to land in both the old and the new system until a module fully migrates. That double work is real, and budgeting for it up front is the difference between a transition that finishes and one that stalls.
What kills most platform rewrites?
Scope creep and the absence of a scope freeze on the rewrite itself. A rewrite that also tries to redesign the UI, add the features sales has been promising, and adopt three new technologies is no longer a rewrite; it is a brand-new product with a deadline borrowed from a smaller job. The other killer is underestimating edge cases. The boring 20 percent of behavior nobody documented is where rewrites go to die, because you cannot move users until the new system handles the exact weirdness they depend on.
How do you migrate student and course data without breaking the old platform?
Run both systems in parallel and migrate read traffic before write traffic. You backfill historical data into the new store, then dual-write new changes to old and new at the same time so they stay in sync, then move reads over account by account once you have reconciled the two. Keep the old system authoritative until reconciliation passes clean for long enough that you trust the new one. For EdTech specifically, you protect grades, submissions, and enrollment records hardest, because a silent data error there is the kind of incident that loses a school district.
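The sequence above (backfill, dual-write, reconcile, then move reads per account) can be sketched with two in-memory dictionaries standing in for the old and new stores. `MigratingStore` and its method names are hypothetical, and the sketch reconciles once rather than "clean for long enough", which a real migration would track over time.

```python
from typing import Dict, List, Optional, Set, Tuple

Key = Tuple[str, str]  # (account_id, record_id)


class MigratingStore:
    def __init__(self, old: Dict[Key, dict]) -> None:
        self.old = old                      # stays authoritative during migration
        self.new: Dict[Key, dict] = {}
        self.reads_on_new: Set[str] = set() # accounts whose reads have moved

    def backfill(self) -> None:
        # Copy historical records into the new store.
        for key, record in self.old.items():
            self.new[key] = dict(record)

    def write(self, key: Key, record: dict) -> None:
        # Dual-write: every change lands in both stores.
        self.old[key] = dict(record)
        self.new[key] = dict(record)

    def mismatches(self, account_id: str) -> List[Key]:
        keys = {k for k in self.old.keys() | self.new.keys() if k[0] == account_id}
        return sorted(k for k in keys if self.old.get(k) != self.new.get(k))

    def move_reads(self, account_id: str) -> bool:
        # Only move an account once its records reconcile cleanly.
        if self.mismatches(account_id):
            return False
        self.reads_on_new.add(account_id)
        return True

    def read(self, key: Key) -> Optional[dict]:
        store = self.new if key[0] in self.reads_on_new else self.old
        return store.get(key)


store = MigratingStore({("district-a", "grade-1"): {"score": 91}})
store.backfill()
store.write(("district-a", "grade-2"), {"score": 78})

# Simulate a silent error in the new store; reconciliation should block the move.
store.new[("district-a", "grade-1")]["score"] = 19
blocked = store.move_reads("district-a")

# After the discrepancy is repaired, the account's reads can move over.
store.new[("district-a", "grade-1")]["score"] = 91
moved = store.move_reads("district-a")
```

The design choice worth noticing is that the gate sits on reads, per account. A bad grade in the new store never reaches a user, because nobody reads from it until that account reconciles.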
Can Geminate Solutions help us decide and run an EdTech platform rewrite?
Yes, and the first thing we do is try to talk you out of the full rewrite, because most teams are better served by targeted refactoring or an incremental migration. We have built EdTech platforms running at 250,000-plus daily active users and exam systems handling 10M-plus requests a minute, so we have seen which parts of these systems actually need replacing and which just need attention. If a rewrite is genuinely warranted we run it the strangler-fig way, with the product still shipping. Start at geminatesolutions.com/get-started for an honest assessment.