Case Study

Real-Time Collaboration Tool for Documents & Chat

How we built a real-time collaboration platform for 5K+ workspaces and 25K users, with CRDT-based document sync, presence, permissions, and offline editing, achieving <100ms sync latency.

Industry
SaaS / Productivity
Duration
11 Months
Services
Full-Stack · WebSocket · CRDT
Markets
United States

CRDT · Presence · Offline Sync

5K+
Workspaces
25K
Users
<100ms
Sync Latency
CRDT
Conflict Resolution

The Client's Vision

A productivity startup wanted to compete with Notion and Google Docs by offering documents and chat in one place, with real-time collaboration. The MVP relied on polling and showed a 2–3 second lag when multiple users typed at once. Offline support was non-existent.

They wanted sub-100ms sync for cursor presence and document edits, granular permissions (view, comment, edit), and offline sync so users could keep working without connectivity and have their changes merged when back online, without conflicts or data loss.

What Was Breaking

Operational Transform

When two users edited the same paragraph simultaneously, their changes had to merge correctly. Plain last-write-wins caused overwrites, so conflict-free merging required either operational transform (OT) or a CRDT.

Presence

Users needed to see who else was viewing a document and where their cursors were. High-frequency cursor-position updates could not be allowed to flood the server, so throttling and batching were critical.
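A minimal sketch of the batching half of that requirement, assuming one batcher per document room that keeps only the latest cursor position per user and flushes everything as a single broadcast on a fixed interval. `PresenceBatcher`, `CursorUpdate`, and the `broadcast` callback are hypothetical names, not the client's code.

```typescript
// Hypothetical per-room batcher: many cursor events per user collapse into
// one room-wide broadcast per flush interval.
interface CursorUpdate {
  userId: string;
  docId: string;
  index: number; // cursor offset in the document
}

class PresenceBatcher {
  private pending = new Map<string, CursorUpdate>(); // latest update per user
  private broadcast: (batch: CursorUpdate[]) => void;

  constructor(broadcast: (batch: CursorUpdate[]) => void) {
    this.broadcast = broadcast;
  }

  // Cheap and synchronous: a newer position simply overwrites the older one.
  enqueue(update: CursorUpdate): void {
    this.pending.set(update.userId, update);
  }

  // Run on a fixed interval, e.g. setInterval(() => batcher.flush(), 100).
  flush(): void {
    if (this.pending.size === 0) return; // nothing moved, send nothing
    const batch = Array.from(this.pending.values());
    this.pending.clear();
    this.broadcast(batch);
  }
}
```

However many cursor events arrive between flushes, each room emits at most one message per interval, and its size is bounded by the number of active users rather than by typing speed.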

Permissions

Workspaces had folders and documents with inherited permissions at four levels: view, comment, edit, and admin. Sharing links could carry an expiry date. Permissions had to be checked on every operation without slowing sync.
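A small sketch of the level check and link expiry, assuming the four levels form a strict ordering view < comment < edit < admin (the case study lists the levels but not their ordering). `ShareLink`, `satisfies`, and `linkAccess` are hypothetical names for illustration.

```typescript
// Assumed ordering, least to most privileged.
const LEVELS = ["view", "comment", "edit", "admin"] as const;
type AccessLevel = (typeof LEVELS)[number];

// Hypothetical sharing-link record; expiresAt is epoch ms, null means no expiry.
interface ShareLink {
  docId: string;
  level: AccessLevel;
  expiresAt: number | null;
}

// True if the granted level is at least as strong as the required one.
function satisfies(granted: AccessLevel, required: AccessLevel): boolean {
  return LEVELS.indexOf(granted) >= LEVELS.indexOf(required);
}

// Access a link confers at time `now`, or null if it expired or targets another doc.
function linkAccess(link: ShareLink, docId: string, now: number): AccessLevel | null {
  if (link.docId !== docId) return null;
  if (link.expiresAt !== null && now >= link.expiresAt) return null;
  return link.level;
}
```

With an ordered scale, every operation needs only a required minimum level (for example, posting a comment requires `comment`, applying a document update requires `edit`).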

Offline Sync

Users on trains or unreliable networks needed to keep working. Local edits had to be queued and merged on reconnect. Divergent edits required conflict resolution, which a CRDT could handle automatically.
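A minimal sketch of a client-side offline queue, assuming each local edit is already encoded as a binary CRDT update. Because CRDT updates (Yjs's included) can be applied in any order and more than once without changing the result, replaying the backlog after reconnecting is safe even if other users edited in the meantime. The queue here lives in memory; a real client would likely persist it (for example in IndexedDB) so edits survive a page reload. `OfflineQueue` is a hypothetical name.

```typescript
// An encoded CRDT update, e.g. the bytes a Yjs document emits per change.
type EncodedUpdate = Uint8Array;

class OfflineQueue {
  private queue: EncodedUpdate[] = [];
  private online = false; // starts offline until the socket opens
  private send: (u: EncodedUpdate) => void;

  constructor(send: (u: EncodedUpdate) => void) {
    this.send = send;
  }

  // Local edits call this and never block on the network.
  push(update: EncodedUpdate): void {
    if (this.online) this.send(update);
    else this.queue.push(update);
  }

  setOffline(): void {
    this.online = false;
  }

  // On reconnect: drain the backlog in order, then resume sending directly.
  reconnect(): void {
    this.online = true;
    const backlog = this.queue;
    this.queue = [];
    for (const u of backlog) this.send(u);
  }

  get pendingCount(): number {
    return this.queue.length;
  }
}
```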

The Architecture We Built

We built the platform on persistent WebSocket connections, CRDT-based document sync, and Redis for presence broadcast. Documents use Yjs, a CRDT library, for conflict-free merging. Presence updates are throttled and broadcast via Redis pub/sub. Permissions are cached and checked at the sync layer, and a client-side offline queue replays local edits on reconnect.

System Architecture

Next.js Document Editor & Chat UI
Rich text editor with Yjs integration. Real-time cursor and presence overlay. Inline chat threads and workspace switcher
WebSocket Server & Auth Layer
Persistent WebSocket connections per user. JWT validation on connect. Room-based routing — documents and chat channels
PostgreSQL — Workspaces, Docs & Permissions
Workspaces, documents, and folder hierarchy. Permission matrix and sharing settings. Chat messages and thread metadata
Redis — Presence & Document State
Presence (user, cursor, doc) with TTL. Pub/sub for real-time broadcast. Cached permissions. Document CRDT state for active sessions
CRDT Sync & Offline Queue
Yjs CRDT for document edits. Client-side offline queue. Merge on reconnect. Periodic persistence to PostgreSQL
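A single-process sketch of the room-based routing in the WebSocket layer, assuming rooms are keyed by strings such as `doc:<id>` and `chat:<id>`. The `Conn` interface is a stand-in so the example doesn't depend on a particular WebSocket library, and `RoomRouter` is a hypothetical name. With several server instances, the broadcast step is where Redis pub/sub would fan messages out to the other nodes.

```typescript
// Minimal connection shape; any WebSocket wrapper with send() fits.
interface Conn {
  id: string;
  send(data: string): void;
}

class RoomRouter {
  private rooms = new Map<string, Set<Conn>>(); // room name -> members

  join(room: string, conn: Conn): void {
    let members = this.rooms.get(room);
    if (!members) {
      members = new Set<Conn>();
      this.rooms.set(room, members);
    }
    members.add(conn);
  }

  leave(room: string, conn: Conn): void {
    const members = this.rooms.get(room);
    if (!members) return;
    members.delete(conn);
    if (members.size === 0) this.rooms.delete(room); // free empty rooms
  }

  // Relay to every member except the sender; returns how many received it.
  broadcast(room: string, from: Conn, data: string): number {
    const members = this.rooms.get(room);
    if (!members) return 0;
    let delivered = 0;
    members.forEach((member) => {
      if (member === from) return;
      member.send(data);
      delivered++;
    });
    return delivered;
  }

  roomCount(): number {
    return this.rooms.size;
  }
}
```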

CRDT (Yjs) was the right choice for document sync. Unlike OT, it doesn't require a central server to transform operations; each client can merge updates independently. That enables offline editing: local changes are stored on the client, and when it reconnects, client and server exchange the updates they are missing and converge on the same merged state. We persist document state to PostgreSQL on a debounced schedule and on room disconnect.
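A minimal sketch of that persistence schedule: each applied update restarts a quiet-period timer, and the room's disconnect handler forces an immediate flush. `DebouncedPersister` is a hypothetical name, and `save` stands in for writing the encoded Yjs document state to PostgreSQL.

```typescript
class DebouncedPersister {
  private timer: ReturnType<typeof setTimeout> | null = null;
  private dirty = false;
  private save: () => void;
  private delayMs: number;

  constructor(save: () => void, delayMs: number) {
    this.save = save;
    this.delayMs = delayMs;
  }

  // Call after every applied update; restarts the quiet-period timer.
  markDirty(): void {
    this.dirty = true;
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  // Called by the timer, or directly when the last client leaves the room.
  // Writes at most once, and only if something changed since the last save.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (!this.dirty) return;
    this.dirty = false;
    this.save();
  }
}
```

A pure debounce can postpone writes indefinitely while someone types continuously; capping the wait (forcing a flush after some maximum delay) is a common addition.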

Tech Stack

Next.js
Node.js
PostgreSQL
Redis
WebSocket
CRDT
Yjs
Presence

How We Delivered It

Phase 1 — Weeks 1–4
Discovery & Sync Strategy

Evaluated OT vs CRDT. Chose Yjs for CRDT-based sync. Designed permission model and workspace hierarchy. Defined WebSocket room and message schemas.
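The case study doesn't describe the room and message schemas themselves. As a hypothetical illustration, a discriminated union plus a runtime validator for untrusted socket input might look like this. Every message type and field name below is invented for the example.

```typescript
// Hypothetical client-to-server wire schema.
type ClientMessage =
  | { type: "join"; room: string }
  | { type: "leave"; room: string }
  | { type: "doc-update"; room: string; update: string } // base64-encoded CRDT update
  | { type: "cursor"; room: string; index: number }
  | { type: "chat"; room: string; text: string };

// Validate raw JSON from the socket before it reaches routing or sync logic.
function parseClientMessage(raw: string): ClientMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const { type, room, update, index, text } = parsed as Record<string, unknown>;
  if (typeof room !== "string") return null; // every message targets a room
  switch (type) {
    case "join":
      return { type: "join", room };
    case "leave":
      return { type: "leave", room };
    case "doc-update":
      return typeof update === "string" ? { type: "doc-update", room, update } : null;
    case "cursor":
      return typeof index === "number" && Number.isFinite(index) ? { type: "cursor", room, index } : null;
    case "chat":
      return typeof text === "string" ? { type: "chat", room, text } : null;
    default:
      return null; // unknown message type
  }
}
```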

Phase 2 — Weeks 5–24
Core Platform & WebSocket

Built WebSocket server with room routing. Integrated Yjs for document sync. Implemented presence with throttling. Built workspace, doc, and permission APIs.

Phase 3 — Weeks 25–36
Offline Sync & Chat

Implemented client-side offline queue and merge on reconnect. Built chat with Redis pub/sub. Added permission checks at sync layer. Persisted document state to PostgreSQL.

Phase 4 — Weeks 37–44
Performance & Rollout

Optimized for <100ms sync. Load-tested with 500 concurrent doc editors. Phased rollout — beta users first, then general availability. Monitored latency and conflict rates.

The Impact

Sync latency
<100ms
Real-time edits and cursor updates
5K+ workspaces
Active
Multi-tenant with isolated data
25K users
On platform
Collaborative docs and chat
Offline edits
Conflict-free merge
CRDT handles divergent changes
“The real-time sync feels like magic — we went from 2–3 second lag to sub-100ms. Offline support was the killer feature for our remote teams. No more lost edits.”
— CTO, Productivity Startup

What Made This Work

Choosing between a CRDT (Yjs) and OT is a fundamental decision. OT requires a central authority to transform concurrent operations, which works for simple online cases but becomes complex once clients edit offline. A CRDT allows merging without a server: any two replicas that have received the same set of updates converge to the same state. The tradeoff is that CRDT state grows with edit history; we rely on Yjs's garbage collection of deleted content to keep it in check. For documents under 1MB, this works well.
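Yjs implements a sequence CRDT that is far more involved than anything that fits here. To illustrate just the convergence property described above, here is a different and much simpler CRDT, a state-based last-writer-wins (LWW) map. It is a swapped-in teaching example, not how Yjs works internally; `LwwMap`, `write`, and `merge` are hypothetical names.

```typescript
// A state-based last-writer-wins map: a deliberately simple CRDT, NOT Yjs's algorithm.
interface LwwEntry {
  value: string;
  clock: number;   // Lamport-style: a writer picks a value above any clock it has seen
  replica: string; // unique replica id, breaks ties between concurrent writes
}

type LwwMap = Map<string, LwwEntry>;

// Total order on writes: higher clock wins; equal clocks fall back to replica id.
function newer(a: LwwEntry, b: LwwEntry): boolean {
  return a.clock !== b.clock ? a.clock > b.clock : a.replica > b.replica;
}

// Record a local write on one replica (e.g. while offline).
function write(m: LwwMap, key: string, value: string, clock: number, replica: string): void {
  m.set(key, { value, clock, replica });
}

// Merge is commutative, associative and idempotent, so replicas converge
// regardless of the order, or how many times, they exchange state.
function merge(x: LwwMap, y: LwwMap): LwwMap {
  const out: LwwMap = new Map(x);
  y.forEach((entry, key) => {
    const current = out.get(key);
    if (current === undefined || newer(entry, current)) out.set(key, entry);
  });
  return out;
}
```

Two replicas can edit independently, then merge in either direction and reach identical state without any server deciding the outcome. LWW resolves a conflict by discarding one write, which is acceptable for a field like a title but not for text; that is why text editing needs a sequence CRDT such as Yjs.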

Presence is high-frequency and must be throttled. We send cursor updates at most once every 100ms per user. Presence is ephemeral: it is stored in Redis with a 30s TTL. When a user disconnects, nothing is persisted; they simply drop off the presence list. Redis pub/sub broadcasts presence updates to all clients in the document room.
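A deterministic sketch of those two presence rules, using an injected millisecond clock. The throttle enforces the 100ms per-user interval but keeps the latest suppressed position, so a user's final cursor location is not lost; the table stands in for Redis keys with a 30s TTL (in Redis itself the expiry would be set with something like `SET key value EX 30`). `CursorThrottle` and `PresenceTable` are hypothetical names.

```typescript
const THROTTLE_MS = 100;        // at most one cursor broadcast per user per 100ms
const PRESENCE_TTL_MS = 30_000; // mirrors the 30s Redis TTL

class CursorThrottle {
  private lastSent = new Map<string, number>();
  private pending = new Map<string, number>(); // latest suppressed cursor per user
  private emit: (userId: string, index: number) => void;

  constructor(emit: (userId: string, index: number) => void) {
    this.emit = emit;
  }

  private trySend(userId: string, index: number, now: number): boolean {
    const last = this.lastSent.get(userId);
    if (last !== undefined && now - last < THROTTLE_MS) return false;
    this.lastSent.set(userId, now);
    this.pending.delete(userId);
    this.emit(userId, index);
    return true;
  }

  // Called for every cursor move; suppressed moves are remembered, not dropped.
  offer(userId: string, index: number, now: number): void {
    if (!this.trySend(userId, index, now)) this.pending.set(userId, index);
  }

  // Called from a timer; sends trailing positions once their window has passed.
  poll(now: number): void {
    for (const [userId, index] of Array.from(this.pending)) this.trySend(userId, index, now);
  }
}

// In-memory stand-in for TTL'd Redis keys: users vanish unless they heartbeat.
class PresenceTable {
  private expiry = new Map<string, number>();

  heartbeat(userId: string, now: number): void {
    this.expiry.set(userId, now + PRESENCE_TTL_MS);
  }

  active(now: number): string[] {
    const live: string[] = [];
    this.expiry.forEach((exp, userId) => {
      if (exp > now) live.push(userId);
      else this.expiry.delete(userId); // expired, like a Redis key past its TTL
    });
    return live.sort();
  }
}
```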

Permissions are checked at the WebSocket layer before document updates are accepted. We cache permission results in Redis with a short TTL to avoid hitting PostgreSQL on every keystroke. Folder inheritance is computed when a permission is granted or revoked and stored, so we don't walk the folder tree on each sync. When permissions change, we invalidate the cache and disconnect affected clients so they re-authenticate.
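A minimal sketch of materializing folder inheritance at write time, assuming a node's explicit grant overrides whatever it inherits (the case study doesn't say how overrides behave) and that parents are added before their children. `PermissionTree` and its methods are hypothetical; the list of rebuilt nodes returned by `setGrant` is where cache invalidation and client disconnects would hook in. The Redis cache layer itself is omitted.

```typescript
type AccessLevel = "view" | "comment" | "edit" | "admin";

class PermissionTree {
  private parentOf = new Map<string, string | null>();
  private childrenOf = new Map<string, string[]>();
  private explicit = new Map<string, Map<string, AccessLevel>>(); // node -> user -> granted level
  private effective = new Map<string, Map<string, AccessLevel>>(); // materialized result

  // Folders and documents are both nodes; parents must be added first.
  addNode(id: string, parentId: string | null): void {
    this.parentOf.set(id, parentId);
    this.childrenOf.set(id, []);
    if (parentId !== null) this.childrenOf.get(parentId)?.push(id);
    this.recompute(id);
  }

  // Grant (or revoke with null); returns every node whose permissions were rebuilt.
  setGrant(nodeId: string, userId: string, level: AccessLevel | null): string[] {
    const grants = this.explicit.get(nodeId) ?? new Map<string, AccessLevel>();
    if (level === null) grants.delete(userId);
    else grants.set(userId, level);
    this.explicit.set(nodeId, grants);
    return this.recompute(nodeId);
  }

  // Sync-time check: one lookup, no tree walk.
  check(nodeId: string, userId: string): AccessLevel | undefined {
    return this.effective.get(nodeId)?.get(userId);
  }

  // Inherit the parent's effective map, apply explicit overrides, then recurse.
  private recompute(nodeId: string): string[] {
    const parentId = this.parentOf.get(nodeId) ?? null;
    const inherited = parentId !== null ? this.effective.get(parentId) : undefined;
    const eff = new Map<string, AccessLevel>(inherited ?? []);
    this.explicit.get(nodeId)?.forEach((level, user) => eff.set(user, level));
    this.effective.set(nodeId, eff);
    const touched = [nodeId];
    for (const child of this.childrenOf.get(nodeId) ?? []) touched.push(...this.recompute(child));
    return touched;
  }
}
```

The cost moves to grant and revoke, which touch the whole subtree, while the per-keystroke check stays constant-time; that fits a workload where permission changes are rare and sync operations are constant.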

Building a Real-Time Collaboration Tool?

We help SaaS companies build production-grade real-time and offline-first applications. Let's talk about your architecture.
