Skip to main content

google-calendar

Introduction

Google Calendar is a comprehensive calendar management system where users create events, invite attendees, RSVP, and receive automated reminders — it sounds straightforward until you dig into the design. Building a system like Google Calendar requires solving several interconnected problems:

  1. Entity Relationship Complexity — A single event can involve multiple users with different roles (organizer vs attendee), each with their own RSVP status. The many-to-many relationship between users and events, combined with per-attendee metadata (rsvp_status, notification preferences), creates a schema that must be carefully modeled to support efficient queries.

  2. Reliable Notification Scheduling — Every attendee must receive an email reminder exactly 30 minutes before an event. With millions of events, that means millions of notifications dispatched at precisely the right time. Late reminders are useless; duplicate reminders are annoying. This is essentially a distributed job scheduling problem disguised as a calendar feature.

  3. Consistency Under Concurrent Access — When an organizer updates the meeting time while an attendee is RSVPing, both operations must complete correctly. The system can't show stale data — if a meeting was moved to 3pm, sending a reminder for the old 2pm slot is a bug, not a latency issue.

Unlike social media platforms where eventual consistency is acceptable for likes and comments, Google Calendar prioritizes data correctness and relationship integrity. Each event may involve multiple users, resources, time constraints, and access controls — making consistency and structured interactions central to the design.

At scale, the system serves ~10 million DAU. Users create ~3 events/day on average (~347 write QPS), and each event is accessed multiple times for viewing, syncing, and notifications (~3,472 read QPS). This is well within the capacity of modern cloud services, so the design focuses on correctness, relationships, and reliable async processing rather than raw throughput.

Google Calendar system overview showing event lifecycle from creation to notification
Google Calendar system overview showing event lifecycle from creation to notification

Functional Requirements

We extract verbs from the problem statement to identify core operations:

  • "retrieves" events → Event retrieval by time range
  • "creates, updates, deletes" events → Event CRUD
  • "invites" attendees → Attendee management
  • "RSVPs" to events → RSVP flow
  • "notifies" attendees → Automated reminders

Each verb maps to a functional requirement that forms an event lifecycle: create → invite → RSVP → remind → retrieve.

  1. Event Retrieval — Users retrieve their calendar events for a specific time range. Only calendar owners can view all events in their calendar.

  2. Event CRUD — Calendar owners can create, update, and delete events. Modifications trigger notifications to attendees and update the notification scheduling system.

  3. RSVP Management — Event attendees can RSVP (accepted / pending / denied) to events they're invited to. The system validates that only listed attendees can RSVP and triggers notifications about status changes.

  4. Automated Notifications — The system sends email notifications to all event attendees 30 minutes before each event. The notification system must be reliable and handle large volumes efficiently.

Out of Scope

  • Multi-calendar support per user
  • Calendar sharing and permissions beyond owner/attendee
  • Time zone handling and conversion
  • Advanced recurring event patterns (discussed in deep dive)
  • Integration with external calendar systems (CalDAV, iCal)
  • Mobile push notifications (only email in scope)
  • Event conflict detection and resolution (discussed in deep dive)

Non-Functional Requirements

We extract adjectives and descriptive phrases to identify quality constraints:

  • "consistent" → Event data must be the same across all views
  • "reliable" → Notifications must not be missed or duplicated
  • "authorized" → Access control on every operation
  • "scalable" → Millions of users and events without degradation
NFRTargetReasoning
Strong ConsistencyAll users see the same event state at all timesA meeting moved from 2pm to 3pm must never show 2pm to any user — stale calendar data causes people to miss or arrive at wrong times
Notification ReliabilityEvery scheduled reminder delivered, zero duplicatesLate reminders are useless (the meeting already started). Duplicate reminders erode trust and clutter inboxes
Data IntegrityForeign key constraints, valid state transitionsAn attendee record must always reference a valid event and user. RSVP status must be one of: accepted, pending, denied — no invalid states
SecurityOnly authorized users can view/modify eventsCalendar events contain sensitive information (meeting topics, attendee lists). Only the calendar owner can modify events; attendees can only RSVP
Availability99.9% uptimeCalendar is a coordination tool — downtime means teams can't schedule meetings
Scalability10M DAU, ~30M events/dayThe system should handle growth without architectural changes

Key insight: Unlike social media feeds where eventual consistency is fine, calendar systems demand strong consistency. If an event is updated, every attendee must see the latest state immediately. Stale data causes real-world consequences — people showing up at the wrong time, missing meetings, or RSVPing to cancelled events.

Resource Estimation

Scale Assumptions

ParameterValue
Daily active users (DAU)10 million
Events created per user per day~3
Average attendees per event~5
Read:Write ratio~10:1
Average event record size~500 bytes
Average notification record size~200 bytes
Data retentionIndefinite

Throughput

Write QPS (event mutations):

10,000,000×386,400347 QPS (avg)\frac{10{,}000{,}000 \times 3}{86{,}400} \approx 347 \text{ QPS (avg)}

This is relatively low — a single relational database can handle this throughput comfortably.

Read QPS (calendar views, syncs):

Each user opens their calendar ~10 times/day (views, refreshes, syncs):

10,000,000×1086,4001,157 QPS (avg)\frac{10{,}000{,}000 \times 10}{86{,}400} \approx 1{,}157 \text{ QPS (avg)}

Including API calls per view (metadata, attendees, events for the week):

1,157×33,472 QPS (avg)1{,}157 \times 3 \approx 3{,}472 \text{ QPS (avg)}

Notification throughput:

Events per day × attendees per event:

30,000,000×5=150,000,000 notifications/day30{,}000{,}000 \times 5 = 150{,}000{,}000 \text{ notifications/day}

Fairly distributed over 24 hours (events aren't uniformly distributed — peak during business hours in each timezone):

150,000,00086,4001,736 notifications/sec (avg)\frac{150{,}000{,}000}{86{,}400} \approx 1{,}736 \text{ notifications/sec (avg)}

Peak (business hours, ~8 hour window covering major timezones):

150,000,0008×3,6005,208 notifications/sec (peak)\frac{150{,}000{,}000}{8 \times 3{,}600} \approx 5{,}208 \text{ notifications/sec (peak)}

Storage

Event data (1 year):

30,000,000 events/day×365×500 B5.5 TB/year30{,}000{,}000 \text{ events/day} \times 365 \times 500 \text{ B} \approx 5.5 \text{ TB/year}

Attendee records (1 year):

150,000,000 attendee records/day×365×100 B5.5 TB/year150{,}000{,}000 \text{ attendee records/day} \times 365 \times 100 \text{ B} \approx 5.5 \text{ TB/year}

Notification records (1 year):

150,000,000 notifications/day×365×200 B11 TB/year150{,}000{,}000 \text{ notifications/day} \times 365 \times 200 \text{ B} \approx 11 \text{ TB/year}

Total storage is modest (~22 TB/year) and well within the capacity of a sharded relational database.

Bandwidth

Inbound (writes):

347 QPS×500 B174 KB/sec347 \text{ QPS} \times 500 \text{ B} \approx 174 \text{ KB/sec}

Outbound (reads):

3,472 QPS×2 KB (avg response)6.9 MB/sec3{,}472 \text{ QPS} \times 2 \text{ KB (avg response)} \approx 6.9 \text{ MB/sec}

Bandwidth is negligible. The design is clearly compute-and-logic bound (notification scheduling, consistency enforcement) rather than bandwidth-bound.

API Design

We derive API endpoints from the functional requirements. All endpoints are RESTful — Google Calendar is a CRUD system with asynchronous side effects (notifications), not a real-time streaming system.

# ── Event Retrieval ───────────────────────────────────────────
GET /api/calendars/{calendarId}/events?start={ISO}&end={ISO}
Authorization: Bearer {token}
→ 200 OK
{
  "events": [
    {
      "event_id": "evt_123",
      "title": "Team Meeting",
      "description": "Weekly sync",
      "start_time": "2024-01-15T14:00:00Z",
      "end_time": "2024-01-15T15:00:00Z",
      "location": "Conference Room A",
      "attendees": [
        { "user_id": "user_456", "rsvp_status": "accepted" }
      ]
    }
  ]
}

# ── Event Creation ────────────────────────────────────────────
POST /api/calendars/{calendarId}/events
Authorization: Bearer {token}
{
  "title": "Team Meeting",
  "description": "Weekly sync",
  "start_time": "2024-01-15T14:00:00Z",
  "end_time": "2024-01-15T15:00:00Z",
  "location": "Conference Room A",
  "attendees": ["user_456", "user_789"]
}
→ 201 Created
{ "event_id": "evt_123", "status": "created" }

# ── Event Update ──────────────────────────────────────────────
PATCH /api/calendars/{calendarId}/events/{eventId}
Authorization: Bearer {token}
{
  "title": "Updated Team Meeting",
  "start_time": "2024-01-15T15:00:00Z"
}
→ 200 OK
{ "event_id": "evt_123", "status": "updated" }

# ── Event Deletion ────────────────────────────────────────────
DELETE /api/calendars/{calendarId}/events/{eventId}
Authorization: Bearer {token}
→ 200 OK
{ "status": "deleted" }

# ── RSVP ─────────────────────────────────────────────────────
POST /api/events/{eventId}/rsvp
Authorization: Bearer {token}
{ "rsvp_status": "accepted" }
→ 200 OK
{ "status": "rsvp_updated" }

Design Decisions

  • Calendar-scoped URLs (/calendars/{id}/events) — Events belong to a calendar. This naturally scopes authorization: verify the requester is the calendar owner before returning events.
  • RSVP is event-scoped (/events/{id}/rsvp) — The attendee's identity comes from the JWT token, not the URL. The service validates that the authenticated user is in the attendee list.
  • Side effects are asyncPOST /events doesn't wait for notification scheduling to complete. It writes the event, then publishes an event-changed message. Notifications are handled asynchronously by worker consumers.

Data Model

Five entities model the calendar domain. The key insight is that attendees is a join table with its own attribute (rsvp_status), not just a foreign key — it captures the many-to-many relationship between users and events.

users — Application accounts.

FieldTypeDescription
user_idUUIDPrimary key
emailTEXTUnique, for notifications

calendars — Each user owns one calendar (extensible to many).

FieldTypeDescription
calendar_idUUIDPrimary key
owner_idUUIDFK → users

events — Time-bounded items within a calendar.

FieldTypeDescription
event_idUUIDPrimary key
calendar_idUUIDFK → calendars
created_byUUIDFK → users (organizer)
titleTEXTEvent title
descriptionTEXTOptional description
locationTEXTOptional location
start_timeTIMESTAMPEvent start
end_timeTIMESTAMPEvent end

attendees — Join table (many-to-many between users and events).

FieldTypeDescription
event_idUUIDFK → events (composite PK)
user_idUUIDFK → users (composite PK)
rsvp_statusTEXTaccepted, pending, denied

notifications — Scheduled email reminders.

FieldTypeDescription
notification_idUUIDPrimary key
event_idUUIDFK → events
user_idUUIDFK → users
notify_atTIMESTAMPWhen to send
sentBOOLEANDefault false
sent_atTIMESTAMPWhen actually sent

Key indexes: events(calendar_id, start_time) for range queries, attendees(event_id, user_id) composite PK, notifications(notify_at, sent) for scheduler polling.

Relationships: users → calendars (1:many). calendars → events (1:many). events ↔ users via attendees (many:many). events → notifications (1:many).

High-Level Design

We build the architecture progressively, starting from single-user event CRUD and evolving through four iterations. Each step addresses a capability that the previous design lacks.

Step 1: Basic Event CRUD — Single-User Calendar

The Starting Architecture

Start with the simplest case: one user creates events and retrieves them by time range. No attendees, no notifications — just event storage and retrieval.

Event Service

A single stateless service handles all event operations. The client sends REST requests through an API Gateway, which authenticates the user (JWT) and forwards to the Event Service.

Authorization

The Event Service checks that the authenticated user owns the calendar before allowing any operation. This is a simple ownership check: calendar.owner_id == token.sub.

Indexed Range Queries

Calendar views always request events within a time window (a week, a month). The key query pattern is:

SELECT * FROM events
WHERE calendar_id = :calendar_id
  AND start_time >= :range_start
  AND start_time <= :range_end
ORDER BY start_time;

A composite index on (calendar_id, start_time) makes this query efficient — it narrows to the user's calendar first, then scans the time range.

Basic event CRUD: Client → API Gateway → Event Service → PostgreSQL
Basic event CRUD: Client → API Gateway → Event Service → PostgreSQL

Works for: A single user managing their own calendar events

Missing: No attendees — events are single-user only. No notifications. No RSVP. No way to invite others to a meeting.

Step 2: Attendee Management & RSVP — Multi-User Events

Adding the Social Layer

A calendar without attendees is just a to-do list. Real value comes from multi-user events — invitations, RSVP tracking, and ensuring only valid attendees can respond.

The Attendees Join Table

Events and users have a many-to-many relationship. The attendees table captures this relationship with its own attribute — rsvp_status:

-- When creating an event with attendees:
BEGIN;
  INSERT INTO events (event_id, calendar_id, created_by, ...)
  VALUES (:event_id, :calendar_id, :creator, ...);

  -- Auto-add creator as attendee with 'accepted'
  INSERT INTO attendees (event_id, user_id, rsvp_status)
  VALUES (:event_id, :creator, 'accepted');

  -- Add invited attendees with 'pending'
  INSERT INTO attendees (event_id, user_id, rsvp_status)
  VALUES (:event_id, :invitee_1, 'pending'),
         (:event_id, :invitee_2, 'pending');
COMMIT;

RSVP Validation

When an attendee RSVPs, the service must validate two things:

  1. The user is actually in the attendee list (prevents unauthorized RSVP)
  2. The new status is a valid transition (accepted, pending, or denied)

The RSVP flow:

  1. Attendee sends POST /events/{id}/rsvp with their status
  2. Event Service checks attendees table for (event_id, user_id) — if no row exists, reject with 403
  3. Update rsvp_status in the attendees table

Transactional Integrity

Event creation with attendees must be atomic — if the attendee inserts fail (e.g., invalid user_id), the event insert must also roll back. This is a natural fit for a SQL transaction wrapping all inserts.

Attendee management: Event creation with attendees and RSVP validation flow
Attendee management: Event creation with attendees and RSVP validation flow

NFRs addressed: Data integrity (transactional event+attendee creation), security (RSVP validation ensures only listed attendees can respond)

Still missing: When an event is created or updated, attendees don't know about it. No notifications — the organizer could change the meeting time and attendees would never find out.

Step 3: Async Notification Pipeline — Scheduled Email Reminders

Making the Calendar Actually Useful

A calendar that doesn't remind you about meetings is barely better than a spreadsheet. Notifications transform it from a passive record into an active coordination tool. But notifications introduce asynchronous complexity: they must be scheduled for the future, survive service restarts, and handle failures gracefully.

Two Types of Notifications

  1. Immediate: Event created/updated/cancelled → notify attendees now
  2. Scheduled: Reminder 30 minutes before event start → deliver at start_time - 30min

The Pipeline

We decouple event mutations from notification delivery using a message queue:

  1. Event Service writes the event to the database
  2. After commit, it publishes an event-changed message to the queue
  3. Notification Worker consumes the message and:
    • For immediate notifications: sends emails via the Email Service
    • For scheduled reminders: inserts rows into the notifications table with notify_at = event.start_time - 30min
  4. A Scheduler periodically polls the notifications table for rows where notify_at <= now() AND sent = false
  5. The Scheduler dispatches emails via the Email Service and marks rows as sent = true

Why a Message Queue?

If the Event Service called the Email Service directly, a failed email would block the event creation response. The message queue decouples the write path from the notification path:

  • Event creation always succeeds quickly (synchronous DB write)
  • Notification delivery is retried independently (at-least-once via queue)
  • If the notification worker is down, messages accumulate in the queue and are processed when it recovers

Handling Event Updates

When an event's time changes, existing notification rows become invalid. The Notification Worker must:

  1. Delete old unsent notification rows for that event
  2. Insert new rows with the updated notify_at time

This is an idempotent operation — processing the same event-changed message twice produces the same result.

Async notification pipeline: Event Service → Message Queue → Notification Worker → Scheduler → Email Service
Async notification pipeline: Event Service → Message Queue → Notification Worker → Scheduler → Email Service

Failure Handling

  • Email delivery failure: Retry 3 times with exponential backoff. After 3 failures, move to a Dead Letter Queue (DLQ) for manual review.
  • Scheduler crash: On restart, it simply polls notify_at <= now() AND sent = false — unprocessed rows are picked up. No state is lost.
  • Duplicate notifications: The sent flag prevents re-sending. The Scheduler uses UPDATE ... WHERE sent = false RETURNING * to atomically claim rows.

NFRs addressed: Notification reliability (at-least-once delivery via queue + DLQ), availability (async pipeline doesn't block event writes)

Still missing: No support for recurring events — a user can't create "Team standup every Monday at 9am" without creating 52 separate events. No concurrent modification handling.

Step 4: Recurring Events — RRULE Storage and Virtual Expansion

The Feature That Changes Everything

Recurring events are by far the most complex feature in any calendar system. A simple "weekly standup" generates potentially infinite instances — you can't store them all. The system must generate instances on the fly while supporting exceptions ("skip next Monday" or "move this Friday's meeting to Thursday").

RRULE: The Industry Standard

We use the iCalendar RFC 5545 RRULE format — a well-defined standard with open-source libraries for parsing and expanding rules:

ALTER TABLE events ADD COLUMN recurrence_rule TEXT;

-- Example values:
-- 'FREQ=WEEKLY;BYDAY=MO,WE,FR'  (Mon/Wed/Fri weekly)
-- 'FREQ=MONTHLY;BYMONTHDAY=15'  (15th of each month)
-- 'FREQ=DAILY;INTERVAL=2'       (Every 2 days)
-- 'FREQ=WEEKLY;BYDAY=MO;UNTIL=20241231T235959Z'  (Weekly until end of 2024)

Querying: Two-Phase Approach

When a user requests events for a time range, we query two categories:

  1. Non-recurring events — straightforward range query on start_time:
SELECT * FROM events
WHERE calendar_id = :cal_id
  AND recurrence_rule IS NULL
  AND start_time >= :range_start
  AND start_time <= :range_end;
  1. Recurring events — select all recurring series that started before the query end:
SELECT * FROM events
WHERE calendar_id = :cal_id
  AND recurrence_rule IS NOT NULL
  AND start_time <= :range_end;

Then expand each recurring event in application code using an RRULE library (e.g., rrule.js) to generate instances within the requested range. Merge both result sets into a single list sorted by start_time.

Exceptions to Recurring Events

Users often modify or cancel individual instances ("move this Monday's standup to Tuesday"). We handle this with an exceptions table:

CREATE TABLE recurrence_exceptions (
  exception_id UUID PRIMARY KEY,
  parent_event_id UUID REFERENCES events(event_id),
  original_date TIMESTAMP NOT NULL,  -- which instance
  replacement_event_id UUID REFERENCES events(event_id),  -- NULL = cancelled
  UNIQUE (parent_event_id, original_date)
);
  • Skip an instance: Insert an exception with replacement_event_id = NULL
  • Modify an instance: Create a new standalone event and reference it as the replacement

During expansion, the service checks the exceptions table and substitutes or removes instances accordingly.

Notification Scheduling for Recurring Events

We can't pre-schedule notifications for all future instances (infinite series). Instead, a rolling window scheduler creates notification rows for the next 7 days. A daily cron job refreshes the window.

Recurring events: RRULE storage, on-the-fly expansion, and exception handling
Recurring events: RRULE storage, on-the-fly expansion, and exception handling

NFRs addressed: Scalability (one DB row represents infinite instances), data integrity (exceptions cleanly override individual occurrences)

Still missing: No conflict detection — two events can overlap without warning. No concurrent modification handling — two users editing the same event simultaneously can overwrite each other's changes.

Step 5: Complete Architecture — Conflict Detection & Concurrent Modifications

The Final Architecture

The last pieces address event conflicts and concurrent modifications — features that make the calendar reliable enough for enterprise use.

Conflict Detection & Free/Busy Queries

Users need to see availability before scheduling. The system exposes a free/busy API that merges time ranges:

  1. Fetch all accepted events for a user in a time range
  2. Merge overlapping intervals (classic greedy algorithm)
  3. Return busy blocks — other users see these as unavailable times

Conflict detection at event creation is a warning, not a block — Google Calendar allows double-booking but shows a visual warning.

Optimistic Locking for Concurrent Modifications

When multiple users can edit events (e.g., shared calendars in the future), concurrent modifications must not silently overwrite each other. We use optimistic locking with a version field:

ALTER TABLE events ADD COLUMN version INTEGER DEFAULT 1;

-- Update with version check:
UPDATE events
SET title = :new_title,
    start_time = :new_start,
    version = version + 1
WHERE event_id = :event_id
  AND version = :expected_version;
-- If 0 rows affected → conflict, ask user to retry

If the version doesn't match, the update fails and the client receives a 409 Conflict response. The client fetches the latest state and asks the user to re-apply their changes — a reject-and-retry approach that's simple and effective for the low edit frequency of calendar events.

Caching Layer

Calendar views are read-heavy (10:1 read/write ratio). A Redis cache layer reduces database load:

  • Cache key: calendar:{calendarId}:events:{weekStart} → list of events for that week
  • Invalidation: On any event mutation, invalidate the cache entry for the affected calendar and time range
  • TTL: 5 minutes — stale data window is short enough for a calendar app
Complete Google Calendar architecture with all components
Complete Google Calendar architecture with all components

NFR Scorecard

NFRTargetHow It's Met
Strong ConsistencyAll users see latest event statePostgreSQL ACID transactions for all writes; optimistic locking prevents lost updates; cache invalidation on every mutation
Notification ReliabilityEvery reminder delivered, zero duplicatesMessage queue guarantees at-least-once delivery; sent flag prevents re-delivery; DLQ captures persistent failures
Data IntegrityValid state transitions, referential integritySQL foreign keys enforce relationships; RSVP validation checks attendee list; transactional event+attendee creation
SecurityAuthorization on every operationJWT authentication at API Gateway; calendar ownership check on every event operation; attendee validation for RSVP
Availability99.9% uptimeStateless Event Service scales horizontally; async notification pipeline doesn't block writes; scheduler recovers from crashes by re-polling
Scalability10M DAU, 30M events/dayIndexed range queries; Redis cache for read-heavy workload; rolling notification window for recurring events

Deep Dives

How do you handle recurring events efficiently at scale?

Recurring Event Deep Dive

Recurring events are deceptively complex. A "weekly standup on Mondays" sounds simple, but consider: What if the user changes the name of just one instance? What if a holiday falls on Monday and they cancel that instance? What if they change the recurrence from weekly to biweekly — does that affect past instances?

Storage Strategies

StrategyProsCons
Materialized instances (store every occurrence)Simple queries, easy per-instance edits"Every weekday forever" = infinite rows; bulk updates (change all future titles) require N writes
Virtual expansion (store rule, expand on query)One row per series; bulk updates modify one rowComplex query logic; exception handling is tricky
Hybrid (virtual + materialized exceptions)Best of both — efficient storage with clean exception handlingMore complex code path

We choose the hybrid approach: the recurring series is stored as a single row with an RRULE; exceptions (cancelled or modified instances) are materialized in a separate table.

The RRULE Expansion Pipeline

When expanding recurring events for a time range:

  1. Parse the recurrence_rule string using an RFC 5545-compliant library
  2. Generate all instance dates within [range_start, range_end]
  3. Check the recurrence_exceptions table for this series:
    • If an instance date has a replacement_event_id → use the replacement
    • If an instance date has replacement_event_id = NULL → skip (cancelled)
  4. For each surviving instance, clone the parent event's metadata (title, location, attendees) with the computed start/end time

The "This and All Future" Problem

When a user modifies "this and all future events", we split the series:

  1. Update the original series' RRULE to end at the split date (UNTIL=...)
  2. Create a new recurring event starting from the split date with the new properties

This avoids retroactively changing past instances while giving the user a clean new series going forward.

<discuss-with-ai-button title="Recurring Event Strategies" context="Google Calendar uses hybrid virtual expansion: one DB row per series with RRULE, materialized exceptions for modified/cancelled instances. 'This and all future' splits the series into two. Expansion happens in application code using RFC 5545 libraries." points='["How do you handle notification scheduling for events that repeat infinitely?","What happens if an attendee RSVPs to a specific instance of a recurring event — where is that stored?","How does timezone handling interact with recurring events — what if DST changes the wall-clock time?","What is the performance impact of expanding recurring events for a year view (52+ instances per series)?","How does Google Calendar handle 'every weekday' vs 'every Mon-Fri' when holidays are involved?"]'>

How do you implement conflict detection and find free time slots?

Conflict Detection & Free/Busy Queries

Finding free time slots is essentially the merge intervals problem from LeetCode, but applied at system design scale with real-world complications: recurring events, multiple attendees, and privacy constraints.

Overlap Detection

Two events overlap if one starts before the other ends:

SELECT COUNT(*) FROM events e
JOIN attendees a ON e.event_id = a.event_id
WHERE a.user_id = :user_id
  AND a.rsvp_status = 'accepted'
  AND e.start_time < :proposed_end
  AND e.end_time > :proposed_start;

If the count is > 0, there's a conflict. The system shows a warning but still allows creation — Google Calendar intentionally permits double-booking.

Free/Busy API

The free/busy endpoint returns merged busy blocks without exposing event details (privacy). Other users can see when someone is busy, but not why.

The Merge Intervals Algorithm

  1. Fetch all accepted events for the user in the requested time range
  2. Sort by start_time
  3. Merge overlapping intervals:
def get_free_slots(events, range_start, range_end):
    # Step 1: Merge overlapping busy intervals
    intervals = sorted((e.start_time, e.end_time) for e in events)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    # Step 2: Find gaps between busy blocks
    free = []
    prev_end = range_start
    for busy_start, busy_end in merged:
        if prev_end < busy_start:
            free.append((prev_end, busy_start))
        prev_end = max(prev_end, busy_end)
    if prev_end < range_end:
        free.append((prev_end, range_end))
    return free

Recurring Event Complication

Free/busy queries must expand recurring events within the requested range before running the merge algorithm. This means the computation is O(total instances in range), not O(stored events).

Performance Optimization

  • Index: (user_id, start_time, end_time) on the attendees+events join for efficient conflict queries
  • Caching: Cache free/busy results per user per day (invalidate on event mutation)
  • Batch queries: For scheduling meetings with multiple attendees, run free/busy queries in parallel and intersect the results

How do you handle concurrent event modifications safely?

Concurrent Modification Strategies

When multiple users can edit the same event — e.g., shared calendars or delegated access — concurrent modifications can silently overwrite each other. This is a classic distributed systems problem.

The Problem

Alice and Bob both open the same event. Alice changes the title, Bob changes the location. If both save:

  • Without protection: Last writer wins. Alice saves first → title=new, location=old. Bob saves second → title=old, location=new. Alice's title change is silently lost.
  • With optimistic locking: Bob's save fails because the version changed. He's prompted to refresh and retry.

Approach Comparison

ApproachBehaviorTradeoff
Last Writer WinsAccept latest write, overwrite previousSimple but loses data silently
Pessimistic LockingLock the event while editingPrevents conflicts but adds latency and lock management
Optimistic LockingDetect conflict at write time, reject stale writesNo locks, works well for low-contention resources
Field-Level MergeMerge non-conflicting field changesComplex but preserves both changes

Our Choice: Optimistic locking. Calendar events are low-contention resources — the probability of two users editing the same event at the same second is low. When it does happen, the simple reject-and-retry UX is acceptable.

Implementation Details

  1. Event record includes a version integer (starts at 1)
  2. Client loads event and receives the current version
  3. On save, the client sends version in the request body
  4. Server performs: UPDATE events SET ... WHERE event_id = ? AND version = ?
  5. If 0 rows affected → version mismatch → return 409 Conflict
  6. Client shows a dialog: "This event was modified by someone else. Refresh to see the latest changes."

Notification Implications

If an event update is rejected due to a version conflict, no notification is triggered — the update never happened. This prevents spurious notifications from failed writes.

How do you build a reliable notification scheduler at scale?

Distributed Notification Scheduling

Sending 5,000+ notifications per second at exactly the right time is a distributed scheduling problem. The naive approach — a single cron job polling the database — breaks down when you need precision, reliability, and scalability.

The Naive Scheduler and Its Problems

A simple SELECT * FROM notifications WHERE notify_at <= now() AND sent = false running every minute works for small scale, but:

  • Latency: Up to 60 seconds late (poll interval)
  • Throughput: One query returns thousands of rows during peak hours
  • Single point of failure: If the cron job crashes, all notifications stop

Partitioned Scheduler

Divide the notification table into time-based partitions. Multiple scheduler instances each claim a partition (e.g., by minute) using a distributed lock:

  1. Scheduler instance claims partition for minute T (e.g., via Redlock)
  2. Fetches all notifications where notify_at falls within minute T
  3. Sends emails in batch via the Email Service
  4. Marks all as sent = true
  5. Releases the partition lock

Multiple instances can process different minutes in parallel, and no two instances process the same minute.

Exactly-Once Delivery

The combination of distributed lock + sent flag gives us effectively exactly-once delivery:

  • The lock prevents two instances from claiming the same partition
  • If an instance crashes mid-processing, the lock expires and another instance takes over — but only unsent rows are processed (checking sent = false)

Clock Skew Handling

Scheduler instances must agree on "now". If instance A's clock is 2 seconds ahead of instance B, they might both try to process the same minute. The distributed lock prevents this, but to avoid wasted work, instances use a shared time source (database server time via SELECT NOW() or a time service).

Staff-Level Discussion Topics

The following topics contain open-ended architectural questions for staff+ conversations.

Timezone Handling as a Fundamental Architecture Decision

Context: Google Calendar operates across timezones, but the SDS intentionally scopes out timezone handling. In practice, it's a fundamental design decision that affects storage, queries, notifications, and recurring events.

Discussion Points:

  • All timestamps in the database should be stored in UTC. But events have a "wall-clock" intent — "team standup at 9am" means 9am in the organizer's timezone, not 9am UTC.
  • Recurring events are especially tricky: "every Monday at 9am Pacific" should remain at 9am Pacific even when DST changes (jumping from UTC-8 to UTC-7). This means the RRULE must store the timezone, and expansion must use timezone-aware libraries.
  • Notifications must be sent at the right wall-clock time for each attendee's timezone — or should they? Should the reminder be 30 minutes before the event's UTC start, or 30 minutes before the event's wall-clock time in the attendee's timezone?
  • The VTIMEZONE component in iCal RFC 5545 defines how to handle timezone transitions, but implementing it correctly is notoriously difficult.

<discuss-with-ai-button title="Timezone Handling in Calendar Systems" context="Timestamps stored in UTC, but events have wall-clock intent. Recurring events must handle DST transitions. Notifications must be timezone-aware. iCalendar RFC 5545 VTIMEZONE component defines the standard." points='["How does DST affect a 'weekly Monday 9am' recurring event — does the UTC representation shift?","Should the database store both UTC and the original timezone, or just UTC with the timezone as metadata?","How do you handle events across timezone boundaries — a meeting at 5pm ET that an attendee in PT sees at 2pm?","What is the VTIMEZONE component in iCal and why is it hard to implement?","How does timezone handling interact with free/busy queries across different timezones?"]'>

Calendar Federation and the CalDAV Protocol

Context: In the real world, organizations use different calendar systems (Google Calendar, Outlook, Apple Calendar). Scheduling a meeting across organizations requires interoperability — and the industry standard is CalDAV (RFC 4791).

Discussion Points:

  • CalDAV is an extension of WebDAV that uses iCal format (.ics files) for event data exchange. It's the protocol that makes cross-platform calendar syncing possible.
  • Free/busy lookup across organizations uses the "scheduling outbox" and "scheduling inbox" pattern — a user's calendar server publishes their availability, and other servers query it.
  • The trust model is complex: how much availability data should you expose to external organizations? Full event details? Just busy/free? Should you allow external users to propose times?
  • Supporting CalDAV means our internal data model must be mappable to/from the iCal format — which constrains schema design.
  • Event identity across systems: if an event is created in Google Calendar and synced to Outlook via CalDAV, both systems must agree on the event's identity for updates and cancellations.

<discuss-with-ai-button title="Calendar Federation via CalDAV" context="CalDAV (RFC 4791) enables cross-platform calendar interoperability using iCal format. Free/busy queries work across organizations via scheduling inboxes/outboxes. Supporting CalDAV constrains internal data model design." points='["How would supporting CalDAV change our internal data model?","What are the security implications of exposing free/busy data to external organizations?","How do you handle event updates that originate from an external CalDAV client?","What is the 'scheduling outbox' pattern and how does cross-org meeting scheduling work?","How do you handle conflicts between the internal state and a CalDAV sync — which one wins?"]'>

Calendar as an Event-Sourced System

Context: Every mutation in Google Calendar (create, update, delete, RSVP) produces a side effect — a notification, a cache invalidation, a sync to an external system. Can we treat the calendar as an event-sourced system where the mutation log is the source of truth?

Discussion Points:

  • The event-changed and rsvp-changed messages we publish to the message queue are already domain events. We could formalize them into an event store.
  • Calendar state would be a materialized view derived by replaying the event stream. Multiple consumers (notification worker, cache invalidator, CalDAV sync, analytics) each maintain their own projection.
  • Event sourcing enables temporal queries: "show me what my calendar looked like last Tuesday" — useful for debugging and auditing.
  • But: event sourcing adds complexity for a CRUD-heavy domain. Calendar mutations are relatively simple (compared to, say, financial transactions). Is the added complexity justified?
  • GDPR implications: if the event log contains all changes with user identity, "right to be forgotten" requires scrubbing the log — which breaks the append-only guarantee.

Level Expectations

Dimension
RequirementsIdentify core FRs: event CRUD, attendees, RSVP. Mention notifications. Basic scale numbers (10M DAU → write/read QPS).Define NFRs: strong consistency for calendar data, notification reliability (no duplicates, no missed sends), data integrity via foreign key constraints. Explain why eventual consistency fails for calendars.Challenge scope — argue whether recurring events and timezone handling should be in scope. Identify the notification scheduler as a distributed job scheduling problem. Discuss CalDAV interoperability.
High-Level DesignDraw client → API Gateway → Event Service → DB. Mention REST endpoints. Show the attendees join table.Design the async notification pipeline: message queue → notification worker → scheduler → email service. Decouple writes from notifications. Add caching for read-heavy workload.Event sourcing for calendar mutations. CQRS if read/write patterns diverge. Timezone-aware RRULE expansion. CalDAV federation for cross-org scheduling.
Data ModelIdentify the core entities: users, calendars, events. Know that attendees is a many-to-many relationship.Design the full schema including the attendees join table with rsvp_status, notifications table with sent flag, and key indexes. Explain the recurrence_exceptions table.Discuss VTIMEZONE storage, event versioning for optimistic locking, and how the schema must support CalDAV iCal mapping.
Recurring EventsMention that recurring events need special handling.Design RRULE storage with virtual expansion. Handle exceptions (cancel/modify single instance). Explain the "this and all future" series split.Analyze DST interaction with recurring events. Discuss materialized vs virtual expansion tradeoffs at scale. Rolling window notification scheduling for infinite series.
ReliabilityMention retries for failed notifications.Design the DLQ for failed emails. Explain idempotent notification scheduling (event-changed message processed twice produces same result). Optimistic locking for concurrent edits.Distributed scheduler with partitioned locks and exactly-once semantics. Clock skew handling. Event sourcing for complete audit trail.

Interview Cheatsheet

Core Architecture in 60 Seconds

"A CRUD system with async side effects. The write path is simple: REST endpoints → Event Service → PostgreSQL with ACID transactions. The interesting part is what happens after the write — a message queue triggers notification workers that schedule reminders, send emails, and invalidate caches."

"Strong consistency, not eventual. Calendar data can't be stale. If a meeting moves from 2pm to 3pm, every attendee must see 3pm immediately. PostgreSQL transactions + optimistic locking + cache invalidation on every mutation ensure this."

"Recurring events are virtual, not materialized. One DB row with an RRULE represents an infinite series. Instances are expanded on the fly during queries. Exceptions (cancelled or modified instances) are stored in a separate table. 'This and all future' splits the series."

"The notification scheduler is a distributed job scheduler. Time-based partition locking with multiple worker instances. Each instance claims a time partition, processes its notifications, marks them sent. The sent flag + distributed lock gives effectively exactly-once delivery."

Key Trade-offs to Mention

Trade-offOption AOption BWhen to Choose
ConsistencyStrong (ACID)EventualStrong for calendar — stale data causes real-world scheduling failures
Recurring storageMaterialized (all instances)Virtual (RRULE + expand)Virtual — can't store infinite instances; materialized only for exceptions
Concurrent editsPessimistic lockOptimistic lockOptimistic — low contention on calendar events makes lock overhead wasteful
Notification couplingSynchronous (in request path)Async (message queue)Async — decouples write latency from email delivery
Conflict detectionBlock double-bookingWarn but allowWarn — users intentionally double-book (travel + meetings)
SchedulerSingle cron jobPartitioned workersPartitioned workers for reliability and throughput
Free/busy privacyShow event detailsShow busy/free onlyBusy/free for external users; details for internal team

Common Mistakes to Avoid

  • ❌ Materializing all recurring event instances — "every weekday" is infinite; store the RRULE and expand on query
  • ❌ Sending notifications synchronously in the event creation path — blocks the response and creates a fragile coupling
  • ❌ Using eventual consistency for event data — stale calendar data causes real-world scheduling failures (wrong time, cancelled events)
  • ❌ Ignoring the "this and all future" problem in recurring events — modifying all instances retroactively is almost never the desired behavior
  • ❌ A single cron job for notification scheduling — single point of failure and can't handle burst traffic during business hours
  • ❌ Forgetting to cancel stale notifications when events are rescheduled — sending a reminder for 2pm when the meeting moved to 3pm is a critical bug