Find Median from Data Stream
Description
The median is the middle value in an ordered integer list. If the size of the list is even, there is no single middle value, and the median is defined as the average of the two middle values.
- For example, for arr = [2, 3, 4], the median is 3.
- For example, for arr = [2, 3], the median is (2 + 3) / 2 = 2.5.
Implement the MedianFinder class:
- MedianFinder() — Initializes the MedianFinder object.
- void addNum(int num) — Adds the integer
numfrom the data stream to the data structure. - double findMedian() — Returns the median of all elements so far. Answers within 10⁻⁵ of the actual answer will be accepted.
The key challenge is that numbers arrive one at a time from a stream, so we need a data structure that can efficiently incorporate new numbers and quickly return the current median at any point.
Examples
Example 1
Input:
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]
Output: [null, null, null, 1.5, null, 2.0]
Explanation:
- MedianFinder medianFinder = new MedianFinder();
- medianFinder.addNum(1); → data = [1]
- medianFinder.addNum(2); → data = [1, 2]
- medianFinder.findMedian(); → return 1.5 (median of [1, 2] is (1 + 2) / 2 = 1.5)
- medianFinder.addNum(3); → data = [1, 2, 3]
- medianFinder.findMedian(); → return 2.0 (median of [1, 2, 3] is 2)
Example 2
Input:
["MedianFinder", "addNum", "addNum", "addNum", "addNum", "findMedian"]
[[], [5], [15], [1], [3], []]
Output: [null, null, null, null, null, 4.0]
Explanation:
- After adding 5, 15, 1, 3 → sorted data = [1, 3, 5, 15]
- Median of [1, 3, 5, 15] = (3 + 5) / 2 = 4.0
Constraints
- -10⁵ ≤ num ≤ 10⁵
- There will be at least one element in the data structure before calling findMedian
- At most 5 × 10⁴ calls will be made to addNum and findMedian
Editorial
Brute Force
Intuition
The most straightforward approach: maintain an unsorted list. Every time we need the median, sort the entire list and pick the middle element(s).
Imagine you have a pile of numbered cards on a table. Each time someone hands you a new card, you just toss it onto the pile. When someone asks "what's the median?", you sort the entire pile from smallest to largest, find the middle card, and read its value.
This is simple to implement — addNum is O(1) since we just append to the list. But findMedian requires sorting the entire collection each time, which is O(n log n). If findMedian is called frequently (say after every insertion), the total cost becomes O(n² log n) for n insertions, which is very expensive.
Step-by-Step Explanation
Let's trace through the operations: addNum(3), addNum(1), addNum(5), addNum(2), findMedian():
Step 1: addNum(3). Simply append 3 to our list. data = [3]
Step 2: addNum(1). Append 1. data = [3, 1]
Step 3: addNum(5). Append 5. data = [3, 1, 5]
Step 4: addNum(2). Append 2. data = [3, 1, 5, 2]
Step 5: findMedian(). First, sort the entire list: sorted = [1, 2, 3, 5]. Size = 4 (even). Median = (sorted[1] + sorted[2]) / 2 = (2 + 3) / 2 = 2.5.
Every findMedian call re-sorts from scratch, even if only one element was added since the last call. This is wasteful.
Algorithm
- Maintain an unsorted list
data - addNum(num): Append
numtodata→ O(1) - findMedian():
a. Sortdata→ O(n log n)
b. Let n = length of data
c. If n is odd: return data[n / 2]
d. If n is even: return (data[n/2 - 1] + data[n/2]) / 2.0
Code
class MedianFinder {
vector<int> data;
public:
MedianFinder() {}
void addNum(int num) {
data.push_back(num);
}
double findMedian() {
sort(data.begin(), data.end());
int n = data.size();
if (n % 2 == 1) {
return data[n / 2];
}
return (data[n / 2 - 1] + data[n / 2]) / 2.0;
}
};class MedianFinder:
def __init__(self):
self.data = []
def addNum(self, num: int) -> None:
self.data.append(num)
def findMedian(self) -> float:
self.data.sort()
n = len(self.data)
if n % 2 == 1:
return self.data[n // 2]
return (self.data[n // 2 - 1] + self.data[n // 2]) / 2.0class MedianFinder {
private List<Integer> data;
public MedianFinder() {
data = new ArrayList<>();
}
public void addNum(int num) {
data.add(num);
}
public double findMedian() {
Collections.sort(data);
int n = data.size();
if (n % 2 == 1) {
return data.get(n / 2);
}
return (data.get(n / 2 - 1) + data.get(n / 2)) / 2.0;
}
}Complexity Analysis
Time Complexity:
- addNum: O(1) — simple append
- findMedian: O(n log n) — sorting the entire list each time
- For n operations total: O(n² log n) in the worst case
Space Complexity: O(n) — storing all the numbers
Why This Approach Is Not Efficient
Sorting the entire list every time findMedian is called is clearly wasteful. If we've added 50,000 numbers and call findMedian after each insertion, we perform 50,000 sorts, each taking up to O(n log n) time. The total work is O(n² log n).
The core issue is that sorting throws away work from the previous sort — after adding just one new element, we re-sort everything from scratch. We should be maintaining some structure that keeps the data "nearly sorted" or at least gives us fast access to the middle element(s).
Better Approach - Insertion Sort
Intuition
Instead of re-sorting everything, we can maintain the list in sorted order at all times. When a new number arrives, we find its correct position using binary search and insert it there.
Think of organizing a hand of playing cards. You keep your cards sorted. When you draw a new card, you slide it into the right spot among your existing sorted cards. Finding the right spot is quick (like binary search), but physically shifting the other cards to make room takes time.
With this approach, findMedian becomes O(1) — just look at the middle of the already-sorted list. The cost shifts to addNum, which requires O(log n) to find the insertion position via binary search, but O(n) to actually insert (shifting elements in an array). So addNum is O(n) overall.
Step-by-Step Explanation
Let's trace: addNum(5), addNum(2), addNum(8), addNum(1), findMedian(), addNum(4), findMedian():
Step 1: addNum(5). List is empty, insert 5. sorted = [5]
Step 2: addNum(2). Binary search for position of 2 in [5]. Position = 0 (before 5). Insert: sorted = [2, 5]
Step 3: addNum(8). Binary search for 8 in [2, 5]. Position = 2 (after 5). Insert: sorted = [2, 5, 8]
Step 4: addNum(1). Binary search for 1 in [2, 5, 8]. Position = 0 (before 2). Shift 2, 5, 8 to the right. Insert: sorted = [1, 2, 5, 8]
Step 5: findMedian(). n=4 (even). Median = (sorted[1] + sorted[2]) / 2 = (2 + 5) / 2 = 3.5. O(1) lookup!
Step 6: addNum(4). Binary search for 4 in [1, 2, 5, 8]. Position = 2 (between 2 and 5). Shift 5, 8 right. Insert: sorted = [1, 2, 4, 5, 8]
Step 7: findMedian(). n=5 (odd). Median = sorted[2] = 4. O(1) lookup!
Algorithm
- Maintain a sorted list
data - addNum(num):
a. Use binary search to find the correct insertion index fornumin the sorted list
b. Insertnumat that index (this shifts all subsequent elements right) - findMedian():
a. Let n = length ofdata
b. If n is odd: return data[n / 2]
c. If n is even: return (data[n/2 - 1] + data[n/2]) / 2.0
Code
class MedianFinder {
vector<int> data;
public:
MedianFinder() {}
void addNum(int num) {
auto pos = lower_bound(data.begin(), data.end(), num);
data.insert(pos, num);
}
double findMedian() {
int n = data.size();
if (n % 2 == 1) {
return data[n / 2];
}
return (data[n / 2 - 1] + data[n / 2]) / 2.0;
}
};import bisect
class MedianFinder:
def __init__(self):
self.data = []
def addNum(self, num: int) -> None:
bisect.insort(self.data, num)
def findMedian(self) -> float:
n = len(self.data)
if n % 2 == 1:
return self.data[n // 2]
return (self.data[n // 2 - 1] + self.data[n // 2]) / 2.0class MedianFinder {
private List<Integer> data;
public MedianFinder() {
data = new ArrayList<>();
}
public void addNum(int num) {
int pos = Collections.binarySearch(data, num);
if (pos < 0) pos = -(pos + 1);
data.add(pos, num);
}
public double findMedian() {
int n = data.size();
if (n % 2 == 1) {
return data.get(n / 2);
}
return (data.get(n / 2 - 1) + data.get(n / 2)) / 2.0;
}
}Complexity Analysis
Time Complexity:
- addNum: O(n) — binary search is O(log n), but the insertion into an array/list requires shifting elements, which is O(n)
- findMedian: O(1) — direct index access to the middle element(s)
- For n operations total: O(n²) in the worst case
Space Complexity: O(n) — storing all the numbers in a sorted list
Why This Approach Is Not Efficient
The insertion sort approach improved findMedian to O(1) but made addNum O(n) due to the cost of shifting elements in an array to maintain sorted order. For 50,000 insertions, the total cost is O(n²), which is approximately 2.5 billion operations — too slow.
The fundamental bottleneck is the array data structure. Arrays give us O(1) indexed access (great for findMedian) but O(n) insertion in the middle (bad for addNum). We need a data structure that supports both O(log n) insertion AND O(1) median retrieval.
The key insight is that to find the median, we don't actually need the entire list sorted. We only need to know:
- What is the largest element in the smaller half?
- What is the smallest element in the larger half?
A max-heap efficiently tracks the largest element in a collection, and a min-heap tracks the smallest. If we split our data into two halves — the smaller half in a max-heap and the larger half in a min-heap — we can find the median in O(1) by looking at the tops of both heaps, while inserting in O(log n).
Optimal Approach - Two Heaps
Intuition
We divide all incoming numbers into two halves:
- Left half — stored in a max-heap (gives us instant access to the largest element of the smaller half)
- Right half — stored in a min-heap (gives us instant access to the smallest element of the larger half)
Imagine you're a teacher lining up students by height. Instead of keeping one long sorted line, you split them into two groups: the shorter half and the taller half. You only need to know who is the tallest in the shorter group and who is the shortest in the taller group — those one or two students in the middle determine the median height.
We maintain one invariant: the two heaps are always balanced in size — the max-heap has either the same number of elements as the min-heap, or exactly one more. This ensures the median is always at the top(s) of the heaps.
Insertion strategy:
- Always push the new number into the max-heap (left half) first
- Then take the maximum of the max-heap and move it to the min-heap (right half) — this ensures nothing in the left half is larger than anything in the right half
- If the min-heap now has more elements than the max-heap, move the minimum of the min-heap back to the max-heap — this rebalances
Finding the median:
- If both heaps have the same size: median = (max-heap top + min-heap top) / 2
- If max-heap has one more element: median = max-heap top
Step-by-Step Explanation
Let's trace: addNum(1), addNum(2), findMedian(), addNum(3), findMedian():
Step 1: addNum(1).
- Push 1 to maxHeap → maxHeap = [1], minHeap = []
- Move max of maxHeap (1) to minHeap → maxHeap = [], minHeap = [1]
- minHeap (size 1) > maxHeap (size 0)? Yes! Move min of minHeap back → maxHeap = [1], minHeap = []
- State: maxHeap = [1], minHeap = []
Step 2: addNum(2).
- Push 2 to maxHeap → maxHeap = [2, 1], minHeap = []
- Move max of maxHeap (2) to minHeap → maxHeap = [1], minHeap = [2]
- minHeap (size 1) > maxHeap (size 1)? No. Balanced.
- State: maxHeap = [1], minHeap = [2]
Step 3: findMedian(). Both heaps have size 1 (equal). Median = (maxHeap top + minHeap top) / 2 = (1 + 2) / 2 = 1.5
Step 4: addNum(3).
- Push 3 to maxHeap → maxHeap = [3, 1], minHeap = [2]
- Move max of maxHeap (3) to minHeap → maxHeap = [1], minHeap = [2, 3]
- minHeap (size 2) > maxHeap (size 1)? Yes! Move min of minHeap (2) back → maxHeap = [2, 1], minHeap = [3]
- State: maxHeap = [2, 1], minHeap = [3]
Step 5: findMedian(). maxHeap has size 2, minHeap has size 1. maxHeap is bigger. Median = maxHeap top = 2.0
The smaller half [1, 2] has its max (2) at the top, and the larger half [3] has its min (3) at the top. The median of [1, 2, 3] is indeed 2.
Two Heaps — Maintaining Balanced Halves for Instant Median — Watch how each new number flows through the max-heap and min-heap, with rebalancing ensuring the median is always accessible at the heap tops.
Algorithm
-
Initialize two heaps:
maxHeap— a max-heap for the smaller half of the numbersminHeap— a min-heap for the larger half of the numbers
-
addNum(num):
a. Pushnumonto themaxHeap
b. Pop the maximum frommaxHeapand push it ontominHeap
c. IfminHeaphas more elements thanmaxHeap:- Pop the minimum from
minHeapand push it ontomaxHeap - This maintains the invariant: maxHeap.size ≥ minHeap.size and maxHeap.size - minHeap.size ≤ 1
- Pop the minimum from
-
findMedian():
a. If both heaps have the same size: return (maxHeap.top() + minHeap.top()) / 2.0
b. Otherwise: return maxHeap.top()
Code
class MedianFinder {
priority_queue<int> maxHeap; // left half (smaller)
priority_queue<int, vector<int>, greater<int>> minHeap; // right half (larger)
public:
MedianFinder() {}
void addNum(int num) {
maxHeap.push(num);
minHeap.push(maxHeap.top());
maxHeap.pop();
if (minHeap.size() > maxHeap.size()) {
maxHeap.push(minHeap.top());
minHeap.pop();
}
}
double findMedian() {
if (maxHeap.size() == minHeap.size()) {
return (maxHeap.top() + minHeap.top()) / 2.0;
}
return maxHeap.top();
}
};import heapq
class MedianFinder:
def __init__(self):
self.max_heap = [] # left half (negate values for max-heap)
self.min_heap = [] # right half
def addNum(self, num: int) -> None:
heapq.heappush(self.max_heap, -num)
heapq.heappush(self.min_heap, -heapq.heappop(self.max_heap))
if len(self.min_heap) > len(self.max_heap):
heapq.heappush(self.max_heap, -heapq.heappop(self.min_heap))
def findMedian(self) -> float:
if len(self.max_heap) == len(self.min_heap):
return (-self.max_heap[0] + self.min_heap[0]) / 2.0
return -self.max_heap[0]class MedianFinder {
private PriorityQueue<Integer> maxHeap; // left half
private PriorityQueue<Integer> minHeap; // right half
public MedianFinder() {
maxHeap = new PriorityQueue<>(Collections.reverseOrder());
minHeap = new PriorityQueue<>();
}
public void addNum(int num) {
maxHeap.offer(num);
minHeap.offer(maxHeap.poll());
if (minHeap.size() > maxHeap.size()) {
maxHeap.offer(minHeap.poll());
}
}
public double findMedian() {
if (maxHeap.size() == minHeap.size()) {
return (maxHeap.peek() + minHeap.peek()) / 2.0;
}
return maxHeap.peek();
}
}Complexity Analysis
Time Complexity:
- addNum: O(log n) — each heap operation (push/pop) is O(log n), and we perform at most 3 heap operations per call (push to maxHeap, pop+push to minHeap, and possibly pop+push back)
- findMedian: O(1) — we just peek at the tops of both heaps
- For n total operations: O(n log n)
Space Complexity: O(n)
We store all n numbers across the two heaps. Each number is in exactly one heap.
Why this is optimal: We need at least O(log n) per insertion in any comparison-based data structure that supports O(1) median queries. The two-heap approach achieves this lower bound, making it asymptotically optimal for this problem.