Substrings with K Distinct Characters
Description
Given a string s consisting of only lowercase English letters and an integer k, count the total number of substrings of s that contain exactly k distinct characters.
A substring is a contiguous sequence of characters within the string. Two substrings that are identical in content but occur at different positions in the string should each be counted separately.
For example, in the string "aa" with k = 1, the substrings "a" starting at index 0, "a" starting at index 1, and "aa" are all valid — giving a count of 3.
Examples
Example 1
Input: s = "abc", k = 2
Output: 2
Explanation: The substrings with exactly 2 distinct characters are:
- "ab" (characters 'a' and 'b') — starting at index 0
- "bc" (characters 'b' and 'c') — starting at index 1
The substring "abc" has 3 distinct characters, so it does not qualify. Single characters have only 1 distinct character each, so they do not qualify either.
Example 2
Input: s = "aba", k = 2
Output: 3
Explanation: The substrings with exactly 2 distinct characters are:
- "ab" (indices 0-1, characters 'a' and 'b')
- "ba" (indices 1-2, characters 'b' and 'a')
- "aba" (indices 0-2, characters 'a' and 'b')
All three substrings contain exactly 2 distinct characters.
Example 3
Input: s = "aa", k = 1
Output: 3
Explanation: The substrings with exactly 1 distinct character are:
- "a" (index 0)
- "a" (index 1)
- "aa" (indices 0-1)
Even though the two single-character substrings look identical, they occur at different positions and are counted separately.
Constraints
- 1 ≤ s.length ≤ 10^6
- 1 ≤ k ≤ 26
- s consists of only lowercase English letters ('a' to 'z')
Editorial
Brute Force
Intuition
The simplest way to solve this problem is to examine every possible substring of the given string and count how many of them have exactly k distinct characters.
A string of length n has n × (n + 1) / 2 substrings. For each one, we can track which characters appear using a frequency array of size 26 (one slot per lowercase letter). As we extend a substring character by character, we update the frequency array and maintain a running count of distinct characters. Whenever the distinct count equals k, we increment our answer.
This approach is intuitive because it directly mirrors the problem statement — check every substring, count its distinct characters, and see if it matches k. The downside is that we process a quadratic number of substrings.
Step-by-Step Explanation
Let's trace through with s = "aba", k = 2:
Step 1: Fix starting index i = 0. Initialize a frequency array of 26 zeros and distinctCount = 0.
Step 2: Extend to j = 0. Character 'a'. freq['a'] becomes 1. distinctCount = 1.
- Is distinctCount == 2? No. (Only 1 distinct character so far.)
Step 3: Extend to j = 1. Character 'b'. freq['b'] becomes 1. distinctCount = 2.
- Is distinctCount == 2? Yes! Count this substring "ab". answer = 1.
Step 4: Extend to j = 2. Character 'a'. freq['a'] becomes 2. distinctCount stays 2 (a was already seen).
- Is distinctCount == 2? Yes! Count this substring "aba". answer = 2.
Step 5: No more characters. Move to i = 1. Reset frequency array and distinctCount = 0.
Step 6: Extend to j = 1. Character 'b'. freq['b'] becomes 1. distinctCount = 1.
- Is distinctCount == 2? No.
Step 7: Extend to j = 2. Character 'a'. freq['a'] becomes 1. distinctCount = 2.
- Is distinctCount == 2? Yes! Count "ba". answer = 3.
Step 8: No more characters. Move to i = 2. Reset.
Step 9: Extend to j = 2. Character 'a'. distinctCount = 1.
- Is distinctCount == 2? No.
Step 10: No more characters. All starting positions exhausted. Return answer = 3.
Brute Force — Checking All Substrings of "aba" — Watch how nested loops enumerate every substring and count distinct characters using a frequency tracker. Each valid substring with exactly k=2 distinct characters is counted.
Algorithm
- Initialize
answer = 0. - For each starting index
ifrom 0 to n-1:
a. Reset a frequency array of size 26 to all zeros.
b. ResetdistinctCount = 0.
c. For each ending indexjfromito n-1:- If
freq[s[j]]is 0, incrementdistinctCount(new character). - Increment
freq[s[j]]. - If
distinctCount == k, incrementanswer.
- If
- Return
answer.
Code
#include <string>
#include <vector>
using namespace std;
class Solution {
public:
int countSubstr(string& s, int k) {
int n = s.length();
int answer = 0;
for (int i = 0; i < n; i++) {
vector<int> freq(26, 0);
int distinctCount = 0;
for (int j = i; j < n; j++) {
if (freq[s[j] - 'a'] == 0) {
distinctCount++;
}
freq[s[j] - 'a']++;
if (distinctCount == k) {
answer++;
}
}
}
return answer;
}
};class Solution:
def countSubstr(self, s: str, k: int) -> int:
n = len(s)
answer = 0
for i in range(n):
freq = [0] * 26
distinct_count = 0
for j in range(i, n):
char_index = ord(s[j]) - ord('a')
if freq[char_index] == 0:
distinct_count += 1
freq[char_index] += 1
if distinct_count == k:
answer += 1
return answerclass Solution {
public int countSubstr(String s, int k) {
int n = s.length();
int answer = 0;
for (int i = 0; i < n; i++) {
int[] freq = new int[26];
int distinctCount = 0;
for (int j = i; j < n; j++) {
if (freq[s.charAt(j) - 'a'] == 0) {
distinctCount++;
}
freq[s.charAt(j) - 'a']++;
if (distinctCount == k) {
answer++;
}
}
}
return answer;
}
}Complexity Analysis
Time Complexity: O(n²)
The outer loop runs n times and the inner loop runs up to n times for each starting position. In total, we examine n × (n + 1) / 2 substrings. Each iteration performs O(1) work (array lookup, increment, comparison). Therefore, the overall time complexity is O(n²).
Space Complexity: O(1)
We use a frequency array of fixed size 26 (one slot per lowercase letter). This does not grow with the input size, so the space usage is constant.
Why This Approach Is Not Efficient
The brute force approach examines O(n²) substrings. When the string length can be up to 10^6 (as stated in the constraints), the number of operations would be approximately (10^6)² = 10^12 — far too many to complete within typical time limits.
The core inefficiency is that we re-examine overlapping portions of the string repeatedly. When we shift the starting index from i to i+1, we discard all the frequency information we built up and start fresh. This redundant work is what the sliding window technique eliminates.
We need an approach that processes each character a constant number of times, achieving O(n) time complexity.
Optimal Approach - Sliding Window with At-Most-K Trick
Intuition
Counting substrings with exactly k distinct characters directly using a sliding window is tricky, because adding a character might increase distinct count while removing one might decrease it — the window boundaries do not move monotonically for an "exactly k" condition.
The elegant trick is to decompose the problem:
exactly(k) = atMost(k) − atMost(k − 1)
Here, atMost(k) counts the number of substrings with at most k distinct characters. The "at most" condition is perfectly suited for the sliding window technique because:
- When we add a character that pushes distinct count above
k, we shrink the window from the left until distinct count is ≤kagain. - The window boundaries move monotonically — the left pointer never moves backward.
For any valid window [left, right] where the number of distinct characters is ≤ k, every substring ending at right and starting at any index from left to right is also valid. That gives us right − left + 1 new valid substrings each time we extend the right pointer.
By computing atMost(k) and atMost(k − 1) separately and subtracting, we isolate exactly the substrings that have precisely k distinct characters — no more, no fewer.
Step-by-Step Explanation
Let's trace through with s = "aba", k = 2.
We need: exactly(2) = atMost(2) − atMost(1).
Computing atMost(2):
Initialize: left = 0, distinctCount = 0, count = 0, freq = [0]*26.
Step 1: right = 0, char = 'a'. freq['a'] = 1. distinctCount = 1 (≤ 2, OK).
- Valid substrings ending at 0: right − left + 1 = 0 − 0 + 1 = 1 → "a".
- count = 1.
Step 2: right = 1, char = 'b'. freq['b'] = 1. distinctCount = 2 (≤ 2, OK).
- Valid substrings ending at 1: 1 − 0 + 1 = 2 → "b", "ab".
- count = 1 + 2 = 3.
Step 3: right = 2, char = 'a'. freq['a'] = 2. distinctCount stays 2 (≤ 2, OK).
- Valid substrings ending at 2: 2 − 0 + 1 = 3 → "a", "ba", "aba".
- count = 3 + 3 = 6.
atMost(2) = 6.
Computing atMost(1):
Reset: left = 0, distinctCount = 0, count = 0, freq = [0]*26.
Step 4: right = 0, char = 'a'. freq['a'] = 1. distinctCount = 1 (≤ 1, OK).
- Valid substrings: 0 − 0 + 1 = 1 → "a".
- count = 1.
Step 5: right = 1, char = 'b'. freq['b'] = 1. distinctCount = 2 (> 1, SHRINK!).
- Shrink: remove s[left=0] = 'a'. freq['a'] = 0. distinctCount = 1. left = 1.
- Now distinctCount ≤ 1. Valid substrings: 1 − 1 + 1 = 1 → "b".
- count = 1 + 1 = 2.
Step 6: right = 2, char = 'a'. freq['a'] = 1. distinctCount = 2 (> 1, SHRINK!).
- Shrink: remove s[left=1] = 'b'. freq['b'] = 0. distinctCount = 1. left = 2.
- Now distinctCount ≤ 1. Valid substrings: 2 − 2 + 1 = 1 → "a".
- count = 2 + 1 = 3.
atMost(1) = 3.
Result: exactly(2) = atMost(2) − atMost(1) = 6 − 3 = 3. ✓
Sliding Window — Computing atMost(2) on "aba" — Watch how the sliding window expands right and counts valid substrings. Since all characters fit within 2 distinct, the left pointer never needs to shrink in this pass.
Algorithm
- Define a helper function
atMost(s, k)that counts substrings with at mostkdistinct characters:
a. Initializeleft = 0,count = 0,distinctCount = 0, and a frequency array of size 26.
b. For eachrightfrom 0 to n-1:- Increment
freq[s[right]]. If this is the first occurrence, incrementdistinctCount. - While
distinctCount > k: decrementfreq[s[left]]. If the frequency drops to 0, decrementdistinctCount. Incrementleft. - Add
right - left + 1tocount(all valid substrings ending atright).
c. Returncount.
- Increment
- The answer is
atMost(s, k) - atMost(s, k - 1).
Code
#include <string>
#include <vector>
using namespace std;
class Solution {
public:
// Count substrings with at most k distinct characters
int atMost(string& s, int k) {
int n = s.length();
int count = 0;
int left = 0;
int distinctCount = 0;
vector<int> freq(26, 0);
for (int right = 0; right < n; right++) {
// Expand window: add character at right
freq[s[right] - 'a']++;
if (freq[s[right] - 'a'] == 1) {
distinctCount++;
}
// Shrink window if distinct exceeds k
while (distinctCount > k) {
freq[s[left] - 'a']--;
if (freq[s[left] - 'a'] == 0) {
distinctCount--;
}
left++;
}
// All substrings ending at right with start in [left, right]
count += right - left + 1;
}
return count;
}
int countSubstr(string& s, int k) {
return atMost(s, k) - atMost(s, k - 1);
}
};class Solution:
def countSubstr(self, s: str, k: int) -> int:
def at_most(max_distinct: int) -> int:
n = len(s)
count = 0
left = 0
distinct_count = 0
freq = [0] * 26
for right in range(n):
# Expand window: add character at right
char_idx = ord(s[right]) - ord('a')
freq[char_idx] += 1
if freq[char_idx] == 1:
distinct_count += 1
# Shrink window if distinct exceeds max_distinct
while distinct_count > max_distinct:
left_idx = ord(s[left]) - ord('a')
freq[left_idx] -= 1
if freq[left_idx] == 0:
distinct_count -= 1
left += 1
# All substrings ending at right with start in [left, right]
count += right - left + 1
return count
return at_most(k) - at_most(k - 1)class Solution {
// Count substrings with at most k distinct characters
private int atMost(String s, int k) {
int n = s.length();
int count = 0;
int left = 0;
int distinctCount = 0;
int[] freq = new int[26];
for (int right = 0; right < n; right++) {
// Expand window: add character at right
freq[s.charAt(right) - 'a']++;
if (freq[s.charAt(right) - 'a'] == 1) {
distinctCount++;
}
// Shrink window if distinct exceeds k
while (distinctCount > k) {
freq[s.charAt(left) - 'a']--;
if (freq[s.charAt(left) - 'a'] == 0) {
distinctCount--;
}
left++;
}
// All substrings ending at right with start in [left, right]
count += right - left + 1;
}
return count;
}
public int countSubstr(String s, int k) {
return atMost(s, k) - atMost(s, k - 1);
}
}Complexity Analysis
Time Complexity: O(n)
The atMost function processes the string in a single pass. The right pointer moves from 0 to n-1, and the left pointer only moves forward — it never goes backward. Each character is added to the window exactly once (when right reaches it) and removed at most once (when left passes it). So the total work across both pointers is at most 2n. Since we call atMost twice (for k and k-1), the total time is O(2 × 2n) = O(n).
Space Complexity: O(1)
We use a frequency array of fixed size 26 and a handful of integer variables. The space does not grow with the input size, so it is constant.