The architecture pattern we use in production
The pattern is boring on purpose, and boring is what survives at 250K+ daily active users. It rests on four moving parts that each do one job. First, a tenant_id on every single table that holds tenant data, no exceptions, including the join tables and the audit tables. The moment one table forgets it, that table becomes the hole the whole model leaks through. We treat a missing tenant_id the way you would treat a missing primary key: it does not get past review.
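One way to make the "does not get past review" rule mechanical is a schema check that runs in CI. The sketch below is illustrative only: it uses an in-memory SQLite database so it runs anywhere, and the table names, the GLOBAL_TABLES exemption list, and the function names are all hypothetical. Against a production database you would likely query its catalog (for example information_schema.columns) instead.

```python
import sqlite3

# Hypothetical: tables that hold no tenant data and are exempt from the rule.
GLOBAL_TABLES = {"plans"}


def tables_missing_tenant_id(conn, exempt=GLOBAL_TABLES):
    """Return the sorted names of tables that lack a tenant_id column."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    missing = []
    for table in tables:
        if table in exempt:
            continue
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
        if "tenant_id" not in columns:
            missing.append(table)
    return sorted(missing)


def build_demo_schema():
    """Create a small illustrative schema; the join table forgets tenant_id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE plans (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE courses (id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL, title TEXT);
        CREATE TABLE course_students (course_id INTEGER, student_id INTEGER);
        CREATE TABLE audit_events (id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL, action TEXT);
        """
    )
    return conn


if __name__ == "__main__":
    # A CI job could fail the build whenever this list is non-empty.
    print(tables_missing_tenant_id(build_demo_schema()))
```

In this demo schema the join table is the one that slips through, which is the failure mode the rule exists to catch.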
Second, row-level security (RLS) enforced in the database. We set a session variable for the current tenant at the start of every request, and the database policy says a session can only see rows where tenant_id equals that variable. This is the part teams skip, and skipping it is the root cause of nearly every cross-tenant leak we have been called in to fix. If isolation lives only in your application code, then isolation is one tired developer and one forgotten filter away from failing. Push it into the database and a forgotten filter returns zero rows instead of someone else's rows. The floor holds even when the code above it is wrong.
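As a rough sketch of what that policy can look like, the helper below generates the statements for one table. It assumes PostgreSQL, a bigint tenant_id column, and a custom setting named app.current_tenant. None of those are stated above, and the policy name and function names are hypothetical too.

```python
import re

# Only plain lowercase identifiers are accepted, since the name is
# interpolated into DDL.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")

# Assumption: the tenant is carried in a custom PostgreSQL setting with this name.
TENANT_SETTING = "app.current_tenant"


def rls_statements(table):
    """Return PostgreSQL statements that restrict `table` to the current tenant."""
    if not _IDENT.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    # current_setting(..., true) yields NULL when the setting is missing, and
    # NULLIF maps an empty string to NULL. Comparing tenant_id with NULL
    # matches no rows, so a request with no tenant set sees nothing.
    predicate = f"tenant_id = NULLIF(current_setting('{TENANT_SETTING}', true), '')::bigint"
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY",
        # FORCE applies the policy to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY",
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING ({predicate}) WITH CHECK ({predicate})",
    ]


def set_tenant_statement():
    """Parameterized statement to run at the start of each request.

    The third argument (true) scopes the value to the current transaction,
    a choice made here so pooled connections do not carry a tenant over.
    %s is the psycopg-style placeholder; other drivers use a different one.
    """
    return f"SELECT set_config('{TENANT_SETTING}', %s, true)"


if __name__ == "__main__":
    for statement in rls_statements("courses"):
        print(statement + ";")
```

Note that PostgreSQL superusers and roles with BYPASSRLS are not subject to policies, so the application would need to connect as an ordinary role for the floor to hold.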
Third, tenant-context middleware that runs before anything else touches data. It resolves the tenant from the request (the subdomain, the custom domain, or a token claim), validates that the tenant is active, sets the session variable, and only then lets the request proceed. One place, one responsibility. Every downstream query inherits the right tenant scope automatically because the context was set once at the door, not re-derived in forty handlers that can each get it slightly wrong.
Fourth, audit logging on any cross-tenant query. A handful of legitimate jobs do span tenants (platform analytics, a support tool, a billing rollup), and those are fine. But they are rare and known. So we log every query that reads more than one tenant and alert when one shows up that should not. In a healthy system that alert is silent for weeks, and the day it fires it has caught a bug before it became an incident.
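The third and fourth parts can be sketched together in a framework-free form. Everything named here is hypothetical: the in-memory tenant registry, the domains, the "tenant" claim key, the allow-list of cross-tenant jobs, and the order in which the request sources are tried. A real middleware would plug into your web framework and also set the database session variable.

```python
import logging
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

logger = logging.getLogger("tenant_audit")


@dataclass(frozen=True)
class Tenant:
    id: int
    slug: str
    active: bool = True


# Hypothetical stand-ins for a tenant lookup; every name below is illustrative.
TENANTS = {"acme": Tenant(1, "acme"), "globex": Tenant(2, "globex", active=False)}
CUSTOM_DOMAINS = {"learn.acme.example": "acme"}
BASE_DOMAIN = "platform.example"
ALLOWED_CROSS_TENANT_JOBS = {"platform_analytics", "support_tool", "billing_rollup"}

current_tenant: ContextVar[Optional[Tenant]] = ContextVar("current_tenant", default=None)


class TenantError(Exception):
    """Raised when a request cannot be tied to an active tenant."""


def resolve_slug(host: str, claims: dict) -> Optional[str]:
    """Try the custom domain, then the subdomain, then a token claim (order is an assumption)."""
    host = host.lower().split(":")[0]  # drop any port
    if host in CUSTOM_DOMAINS:
        return CUSTOM_DOMAINS[host]
    suffix = "." + BASE_DOMAIN
    if host.endswith(suffix):
        return host[: -len(suffix)]
    return claims.get("tenant")


def with_tenant(handler: Callable) -> Callable:
    """Resolve and validate the tenant once, then run the handler inside that scope."""
    def wrapped(host: str, claims: dict, *args, **kwargs):
        slug = resolve_slug(host, claims)
        tenant = TENANTS.get(slug) if slug else None
        if tenant is None or not tenant.active:
            raise TenantError(f"no active tenant for host {host!r}")
        token = current_tenant.set(tenant)
        try:
            # A real middleware would also set the database session variable here.
            return handler(*args, **kwargs)
        finally:
            current_tenant.reset(token)  # never leak scope into the next request
    return wrapped


def audit_query(job: str, tenant_ids: Iterable[int], alert: Optional[Callable] = None) -> str:
    """Log reads that span tenants; alert when the job is not on the allow-list."""
    distinct = set(tenant_ids)
    if len(distinct) <= 1:
        return "single"
    logger.info("cross-tenant read: job=%s tenants=%d", job, len(distinct))
    if job in ALLOWED_CROSS_TENANT_JOBS:
        return "logged"
    if alert is not None:
        alert(job, sorted(distinct))
    return "alerted"


@with_tenant
def list_courses() -> str:
    # Handlers read the scope from context instead of re-deriving it.
    return f"courses for tenant {current_tenant.get().id}"
```

Calling list_courses("acme.platform.example", {}) runs the handler with tenant 1 in scope, while a host that maps to the inactive globex tenant raises TenantError before any handler code runs. How audit_query learns which tenant IDs a query touched (result rows, a database log, a query wrapper) is left open here.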
Put together, this is defense in depth rather than a single clever trick. The middleware sets the scope, RLS enforces it at the floor, the tenant_id makes enforcement possible, and the audit log watches the seams. We have run this shape under exam-week load, and on the related exam platform side it has held at 10M+ requests per minute. If you want to see how we keep that login surge from melting the system, we wrote that up separately in handling the EdTech login storm.