Skip to main content

Substrings with K Distinct Characters

MEDIUMProblemSolveExternal Links

Description

Given a string s consisting of only lowercase English letters and an integer k, count the total number of substrings of s that contain exactly k distinct characters.

A substring is a contiguous sequence of characters within the string. Two substrings that are identical in content but occur at different positions in the string should each be counted separately.

For example, in the string "aa" with k = 1, the substrings "a" starting at index 0, "a" starting at index 1, and "aa" are all valid — giving a count of 3.

Examples

Example 1

Input: s = "abc", k = 2

Output: 2

Explanation: The substrings with exactly 2 distinct characters are:

  • "ab" (characters 'a' and 'b') — starting at index 0
  • "bc" (characters 'b' and 'c') — starting at index 1

The substring "abc" has 3 distinct characters, so it does not qualify. Single characters have only 1 distinct character each, so they do not qualify either.

Example 2

Input: s = "aba", k = 2

Output: 3

Explanation: The substrings with exactly 2 distinct characters are:

  • "ab" (indices 0-1, characters 'a' and 'b')
  • "ba" (indices 1-2, characters 'b' and 'a')
  • "aba" (indices 0-2, characters 'a' and 'b')

All three substrings contain exactly 2 distinct characters.

Example 3

Input: s = "aa", k = 1

Output: 3

Explanation: The substrings with exactly 1 distinct character are:

  • "a" (index 0)
  • "a" (index 1)
  • "aa" (indices 0-1)

Even though the two single-character substrings look identical, they occur at different positions and are counted separately.

Constraints

  • 1 ≤ s.length ≤ 10^6
  • 1 ≤ k ≤ 26
  • s consists of only lowercase English letters ('a' to 'z')

Editorial

Brute Force

Intuition

The simplest way to solve this problem is to examine every possible substring of the given string and count how many of them have exactly k distinct characters.

A string of length n has n × (n + 1) / 2 substrings. For each one, we can track which characters appear using a frequency array of size 26 (one slot per lowercase letter). As we extend a substring character by character, we update the frequency array and maintain a running count of distinct characters. Whenever the distinct count equals k, we increment our answer.

This approach is intuitive because it directly mirrors the problem statement — check every substring, count its distinct characters, and see if it matches k. The downside is that we process a quadratic number of substrings.

Step-by-Step Explanation

Let's trace through with s = "aba", k = 2:

Step 1: Fix starting index i = 0. Initialize a frequency array of 26 zeros and distinctCount = 0.

Step 2: Extend to j = 0. Character 'a'. freq['a'] becomes 1. distinctCount = 1.

  • Is distinctCount == 2? No. (Only 1 distinct character so far.)

Step 3: Extend to j = 1. Character 'b'. freq['b'] becomes 1. distinctCount = 2.

  • Is distinctCount == 2? Yes! Count this substring "ab". answer = 1.

Step 4: Extend to j = 2. Character 'a'. freq['a'] becomes 2. distinctCount stays 2 (a was already seen).

  • Is distinctCount == 2? Yes! Count this substring "aba". answer = 2.

Step 5: No more characters. Move to i = 1. Reset frequency array and distinctCount = 0.

Step 6: Extend to j = 1. Character 'b'. freq['b'] becomes 1. distinctCount = 1.

  • Is distinctCount == 2? No.

Step 7: Extend to j = 2. Character 'a'. freq['a'] becomes 1. distinctCount = 2.

  • Is distinctCount == 2? Yes! Count "ba". answer = 3.

Step 8: No more characters. Move to i = 2. Reset.

Step 9: Extend to j = 2. Character 'a'. distinctCount = 1.

  • Is distinctCount == 2? No.

Step 10: No more characters. All starting positions exhausted. Return answer = 3.

Brute Force — Checking All Substrings of "aba" — Watch how nested loops enumerate every substring and count distinct characters using a frequency tracker. Each valid substring with exactly k=2 distinct characters is counted.

Algorithm

  1. Initialize answer = 0.
  2. For each starting index i from 0 to n-1:
    a. Reset a frequency array of size 26 to all zeros.
    b. Reset distinctCount = 0.
    c. For each ending index j from i to n-1:
    • If freq[s[j]] is 0, increment distinctCount (new character).
    • Increment freq[s[j]].
    • If distinctCount == k, increment answer.
  3. Return answer.

Code

#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int countSubstr(string& s, int k) {
        int n = s.length();
        int answer = 0;
        
        for (int i = 0; i < n; i++) {
            vector<int> freq(26, 0);
            int distinctCount = 0;
            
            for (int j = i; j < n; j++) {
                if (freq[s[j] - 'a'] == 0) {
                    distinctCount++;
                }
                freq[s[j] - 'a']++;
                
                if (distinctCount == k) {
                    answer++;
                }
            }
        }
        
        return answer;
    }
};
class Solution:
    def countSubstr(self, s: str, k: int) -> int:
        n = len(s)
        answer = 0
        
        for i in range(n):
            freq = [0] * 26
            distinct_count = 0
            
            for j in range(i, n):
                char_index = ord(s[j]) - ord('a')
                if freq[char_index] == 0:
                    distinct_count += 1
                freq[char_index] += 1
                
                if distinct_count == k:
                    answer += 1
        
        return answer
class Solution {
    public int countSubstr(String s, int k) {
        int n = s.length();
        int answer = 0;
        
        for (int i = 0; i < n; i++) {
            int[] freq = new int[26];
            int distinctCount = 0;
            
            for (int j = i; j < n; j++) {
                if (freq[s.charAt(j) - 'a'] == 0) {
                    distinctCount++;
                }
                freq[s.charAt(j) - 'a']++;
                
                if (distinctCount == k) {
                    answer++;
                }
            }
        }
        
        return answer;
    }
}

Complexity Analysis

Time Complexity: O(n²)

The outer loop runs n times and the inner loop runs up to n times for each starting position. In total, we examine n × (n + 1) / 2 substrings. Each iteration performs O(1) work (array lookup, increment, comparison). Therefore, the overall time complexity is O(n²).

Space Complexity: O(1)

We use a frequency array of fixed size 26 (one slot per lowercase letter). This does not grow with the input size, so the space usage is constant.

Why This Approach Is Not Efficient

The brute force approach examines O(n²) substrings. When the string length can be up to 10^6 (as stated in the constraints), the number of operations would be approximately (10^6)² = 10^12 — far too many to complete within typical time limits.

The core inefficiency is that we re-examine overlapping portions of the string repeatedly. When we shift the starting index from i to i+1, we discard all the frequency information we built up and start fresh. This redundant work is what the sliding window technique eliminates.

We need an approach that processes each character a constant number of times, achieving O(n) time complexity.

Optimal Approach - Sliding Window with At-Most-K Trick

Intuition

Counting substrings with exactly k distinct characters directly using a sliding window is tricky, because adding a character might increase distinct count while removing one might decrease it — the window boundaries do not move monotonically for an "exactly k" condition.

The elegant trick is to decompose the problem:

exactly(k) = atMost(k) − atMost(k − 1)

Here, atMost(k) counts the number of substrings with at most k distinct characters. The "at most" condition is perfectly suited for the sliding window technique because:

  • When we add a character that pushes distinct count above k, we shrink the window from the left until distinct count is ≤ k again.
  • The window boundaries move monotonically — the left pointer never moves backward.

For any valid window [left, right] where the number of distinct characters is ≤ k, every substring ending at right and starting at any index from left to right is also valid. That gives us right − left + 1 new valid substrings each time we extend the right pointer.

By computing atMost(k) and atMost(k − 1) separately and subtracting, we isolate exactly the substrings that have precisely k distinct characters — no more, no fewer.

Step-by-Step Explanation

Let's trace through with s = "aba", k = 2.

We need: exactly(2) = atMost(2) − atMost(1).

Computing atMost(2):

Initialize: left = 0, distinctCount = 0, count = 0, freq = [0]*26.

Step 1: right = 0, char = 'a'. freq['a'] = 1. distinctCount = 1 (≤ 2, OK).

  • Valid substrings ending at 0: right − left + 1 = 0 − 0 + 1 = 1 → "a".
  • count = 1.

Step 2: right = 1, char = 'b'. freq['b'] = 1. distinctCount = 2 (≤ 2, OK).

  • Valid substrings ending at 1: 1 − 0 + 1 = 2 → "b", "ab".
  • count = 1 + 2 = 3.

Step 3: right = 2, char = 'a'. freq['a'] = 2. distinctCount stays 2 (≤ 2, OK).

  • Valid substrings ending at 2: 2 − 0 + 1 = 3 → "a", "ba", "aba".
  • count = 3 + 3 = 6.

atMost(2) = 6.

Computing atMost(1):

Reset: left = 0, distinctCount = 0, count = 0, freq = [0]*26.

Step 4: right = 0, char = 'a'. freq['a'] = 1. distinctCount = 1 (≤ 1, OK).

  • Valid substrings: 0 − 0 + 1 = 1 → "a".
  • count = 1.

Step 5: right = 1, char = 'b'. freq['b'] = 1. distinctCount = 2 (> 1, SHRINK!).

  • Shrink: remove s[left=0] = 'a'. freq['a'] = 0. distinctCount = 1. left = 1.
  • Now distinctCount ≤ 1. Valid substrings: 1 − 1 + 1 = 1 → "b".
  • count = 1 + 1 = 2.

Step 6: right = 2, char = 'a'. freq['a'] = 1. distinctCount = 2 (> 1, SHRINK!).

  • Shrink: remove s[left=1] = 'b'. freq['b'] = 0. distinctCount = 1. left = 2.
  • Now distinctCount ≤ 1. Valid substrings: 2 − 2 + 1 = 1 → "a".
  • count = 2 + 1 = 3.

atMost(1) = 3.

Result: exactly(2) = atMost(2) − atMost(1) = 6 − 3 = 3. ✓

Sliding Window — Computing atMost(2) on "aba" — Watch how the sliding window expands right and counts valid substrings. Since all characters fit within 2 distinct, the left pointer never needs to shrink in this pass.

Algorithm

  1. Define a helper function atMost(s, k) that counts substrings with at most k distinct characters:
    a. Initialize left = 0, count = 0, distinctCount = 0, and a frequency array of size 26.
    b. For each right from 0 to n-1:
    • Increment freq[s[right]]. If this is the first occurrence, increment distinctCount.
    • While distinctCount > k: decrement freq[s[left]]. If the frequency drops to 0, decrement distinctCount. Increment left.
    • Add right - left + 1 to count (all valid substrings ending at right).
      c. Return count.
  2. The answer is atMost(s, k) - atMost(s, k - 1).

Code

#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    // Count substrings with at most k distinct characters
    int atMost(string& s, int k) {
        int n = s.length();
        int count = 0;
        int left = 0;
        int distinctCount = 0;
        vector<int> freq(26, 0);
        
        for (int right = 0; right < n; right++) {
            // Expand window: add character at right
            freq[s[right] - 'a']++;
            if (freq[s[right] - 'a'] == 1) {
                distinctCount++;
            }
            
            // Shrink window if distinct exceeds k
            while (distinctCount > k) {
                freq[s[left] - 'a']--;
                if (freq[s[left] - 'a'] == 0) {
                    distinctCount--;
                }
                left++;
            }
            
            // All substrings ending at right with start in [left, right]
            count += right - left + 1;
        }
        
        return count;
    }
    
    int countSubstr(string& s, int k) {
        return atMost(s, k) - atMost(s, k - 1);
    }
};
class Solution:
    def countSubstr(self, s: str, k: int) -> int:
        def at_most(max_distinct: int) -> int:
            n = len(s)
            count = 0
            left = 0
            distinct_count = 0
            freq = [0] * 26
            
            for right in range(n):
                # Expand window: add character at right
                char_idx = ord(s[right]) - ord('a')
                freq[char_idx] += 1
                if freq[char_idx] == 1:
                    distinct_count += 1
                
                # Shrink window if distinct exceeds max_distinct
                while distinct_count > max_distinct:
                    left_idx = ord(s[left]) - ord('a')
                    freq[left_idx] -= 1
                    if freq[left_idx] == 0:
                        distinct_count -= 1
                    left += 1
                
                # All substrings ending at right with start in [left, right]
                count += right - left + 1
            
            return count
        
        return at_most(k) - at_most(k - 1)
class Solution {
    // Count substrings with at most k distinct characters
    private int atMost(String s, int k) {
        int n = s.length();
        int count = 0;
        int left = 0;
        int distinctCount = 0;
        int[] freq = new int[26];
        
        for (int right = 0; right < n; right++) {
            // Expand window: add character at right
            freq[s.charAt(right) - 'a']++;
            if (freq[s.charAt(right) - 'a'] == 1) {
                distinctCount++;
            }
            
            // Shrink window if distinct exceeds k
            while (distinctCount > k) {
                freq[s.charAt(left) - 'a']--;
                if (freq[s.charAt(left) - 'a'] == 0) {
                    distinctCount--;
                }
                left++;
            }
            
            // All substrings ending at right with start in [left, right]
            count += right - left + 1;
        }
        
        return count;
    }
    
    public int countSubstr(String s, int k) {
        return atMost(s, k) - atMost(s, k - 1);
    }
}

Complexity Analysis

Time Complexity: O(n)

The atMost function processes the string in a single pass. The right pointer moves from 0 to n-1, and the left pointer only moves forward — it never goes backward. Each character is added to the window exactly once (when right reaches it) and removed at most once (when left passes it). So the total work across both pointers is at most 2n. Since we call atMost twice (for k and k-1), the total time is O(2 × 2n) = O(n).

Space Complexity: O(1)

We use a frequency array of fixed size 26 and a handful of integer variables. The space does not grow with the input size, so it is constant.