Skip to main content

Longest Common Substring

Description

Given two strings s1 and s2, find the length of the longest common substring.

A substring is a contiguous sequence of characters within a string. Unlike a subsequence, a substring cannot skip characters — all characters must appear consecutively in the original string.

For example, in the string "ABCDE":

  • "BCD" is a substring (contiguous block from index 1 to 3)
  • "ACE" is NOT a substring (characters are not contiguous), although it is a subsequence

Your task is to find the longest string that appears as a contiguous block in both s1 and s2.

Examples

Example 1

Input: s1 = "ABCDGH", s2 = "ACDGHR"

Output: 4

Explanation: The longest common substring is "CDGH". It appears at positions 2–5 in s1 and positions 1–4 in s2. Both occurrences are contiguous. Other common substrings like "A" (length 1) and "DGH" (length 3) exist but are shorter.

Example 2

Input: s1 = "abc", s2 = "acb"

Output: 1

Explanation: The longest common substrings are "a", "b", and "c", each with length 1. Although both strings contain 'a', 'b', and 'c', no two consecutive characters appear in the same order in both strings.

Example 3

Input: s1 = "YZ", s2 = "yz"

Output: 0

Explanation: 'Y' and 'y' are different characters (case-sensitive comparison). Similarly for 'Z' and 'z'. There is no common substring at all, so the answer is 0.

Constraints

  • 1 ≤ s1.size(), s2.size() ≤ 10^3
  • Both strings may contain upper and lower case English alphabets.

Editorial

Brute Force

Intuition

The most straightforward approach is to consider every possible pair of starting positions — one from s1 and one from s2 — and check how far the match extends consecutively.

Imagine placing the two strings side by side. For every pair of positions (i in s1, j in s2), you start comparing characters one by one: s1[i] vs s2[j], then s1[i+1] vs s2[j+1], and so on, until you hit a mismatch or reach the end of either string. The number of consecutive matches is the length of the common substring starting at those positions.

You repeat this for every possible pair (i, j) and keep track of the maximum length found. This is like trying every possible alignment of the two strings and measuring the longest matching segment at each alignment.

Step-by-Step Explanation

Let's trace through with s1 = "ABC", s2 = "DABC":

Step 1: Start at i=0, j=0. Compare s1[0]='A' vs s2[0]='D'. Mismatch. Length = 0.

Step 2: Try i=0, j=1. Compare s1[0]='A' vs s2[1]='A'. Match! Start extending.

Step 3: Extend: compare s1[1]='B' vs s2[2]='B'. Match! Running length = 2.

Step 4: Extend: compare s1[2]='C' vs s2[3]='C'. Match! Running length = 3. Reached end of s1. Update max_len = 3.

Step 5: Try i=0, j=2. Compare s1[0]='A' vs s2[2]='B'. Mismatch. Length = 0.

Step 6: Try i=1, j=0. Compare s1[1]='B' vs s2[0]='D'. Mismatch. Length = 0.

Step 7: Try i=1, j=2. Compare s1[1]='B' vs s2[2]='B'. Match! Extend: s1[2]='C' vs s2[3]='C'. Match! Length = 2. max_len stays 3.

Step 8: Try i=2, j=3. Compare s1[2]='C' vs s2[3]='C'. Match! Length = 1. max_len stays 3.

Result: The longest common substring is "ABC" with length 3.

Brute Force — Checking All Starting Pairs — Watch how we try every pair of starting positions in s1 and s2, extending each match as far as it goes consecutively.

Algorithm

  1. Initialize max_len = 0.
  2. For each starting position i in s1 (0 to m-1):
    • For each starting position j in s2 (0 to n-1):
      • Set len = 0.
      • While s1[i + len] == s2[j + len] and neither index goes out of bounds:
        • Increment len.
      • Update max_len = max(max_len, len).
  3. Return max_len.

Code

class Solution {
public:
    int longestCommonSubstr(string &s1, string &s2) {
        int m = s1.size(), n = s2.size();
        int maxLen = 0;

        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                int len = 0;
                // Extend as far as characters match
                while (i + len < m && j + len < n &&
                       s1[i + len] == s2[j + len]) {
                    len++;
                }
                maxLen = max(maxLen, len);
            }
        }

        return maxLen;
    }
};
class Solution:
    def longestCommonSubstr(self, s1, s2):
        m, n = len(s1), len(s2)
        max_len = 0

        for i in range(m):
            for j in range(n):
                length = 0
                # Extend as far as characters match
                while (i + length < m and j + length < n and
                       s1[i + length] == s2[j + length]):
                    length += 1
                max_len = max(max_len, length)

        return max_len
class Solution {
    public int longestCommonSubstr(String s1, String s2) {
        int m = s1.length(), n = s2.length();
        int maxLen = 0;

        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                int len = 0;
                // Extend as far as characters match
                while (i + len < m && j + len < n &&
                       s1.charAt(i + len) == s2.charAt(j + len)) {
                    len++;
                }
                maxLen = Math.max(maxLen, len);
            }
        }

        return maxLen;
    }
}

Complexity Analysis

Time Complexity: O(m × n × min(m, n))

The outer two loops iterate over all m × n pairs of starting positions. For each pair, the inner while loop can extend up to min(m, n) characters. In the worst case (e.g., when both strings are identical), every pair extends fully, giving O(m × n × min(m, n)). For m = n = 1000, this could be up to 10^9 operations.

Space Complexity: O(1)

We only use a few integer variables (i, j, len, maxLen). No additional data structures are needed.

Why This Approach Is Not Efficient

The brute force approach has a worst-case time complexity of O(m × n × min(m, n)). For the given constraints where m and n can each be up to 1000, this translates to potentially 10^9 operations — far too slow for typical time limits of 1-2 seconds.

The core inefficiency is redundant extension work. When we find that s1[i..i+k] matches s2[j..j+k], and then later check starting positions (i+1, j+1), we re-verify the overlap s1[i+1..i+k] vs s2[j+1..j+k] from scratch even though we already know those characters match.

The key insight: if we know the length of the longest common suffix ending at positions (i-1, j-1), and the characters s1[i] and s2[j] match, then the longest common suffix ending at (i, j) is simply one more. If they don't match, it's zero — because a substring must be contiguous, so any mismatch breaks the chain.

This dependency pattern — each cell only needs the diagonal predecessor — is perfectly suited for dynamic programming.

Better Approach - Dynamic Programming (Tabulation)

Intuition

We build a 2D table where dp[i][j] represents the length of the longest common suffix ending at s1[i-1] and s2[j-1]. This is different from the Longest Common Subsequence — here each cell tracks a contiguous match, not a possibly-skipping match.

The recurrence is simple:

  • If s1[i-1] == s2[j-1]: dp[i][j] = dp[i-1][j-1] + 1 — extend the matching suffix by one character.
  • If they don't match: dp[i][j] = 0 — the contiguous match is broken. Any mismatch resets the count to zero.

The answer is the maximum value anywhere in the table, not just the bottom-right cell (unlike LCS). This is because the longest common substring can end at any pair of positions.

Think of it like building a heat map: bright spots appear along diagonals where characters match consecutively. The brightest spot (maximum value) is your answer. Each diagonal streak represents a common substring, and the longest streak is what you're looking for.

Step-by-Step Explanation

Let's trace through with s1 = "ABC", s2 = "DABC":

Step 1: Create a (4×5) DP table. Rows represent prefixes of s1: "", "A", "AB", "ABC". Columns represent prefixes of s2: "", "D", "DA", "DAB", "DABC". Initialize all to 0.

Step 2: dp[1][1]: s1[0]='A' vs s2[0]='D'. Mismatch → dp[1][1] = 0. The contiguous match is broken.

Step 3: dp[1][2]: s1[0]='A' vs s2[1]='A'. Match! dp[1][2] = dp[0][1] + 1 = 0 + 1 = 1. Update max_len = 1.

Step 4: dp[1][3]: s1[0]='A' vs s2[2]='B'. Mismatch → dp[1][3] = 0.

Step 5: dp[1][4]: s1[0]='A' vs s2[3]='C'. Mismatch → dp[1][4] = 0.

Step 6: dp[2][1]: s1[1]='B' vs s2[0]='D'. Mismatch → 0.

Step 7: dp[2][2]: s1[1]='B' vs s2[1]='A'. Mismatch → 0. Even though 'A' matched above, 'B'≠'A' breaks the contiguous chain.

Step 8: dp[2][3]: s1[1]='B' vs s2[2]='B'. Match! dp[2][3] = dp[1][2] + 1 = 1 + 1 = 2. We extended the 'A' match diagonally — now 'AB' is a common substring. max_len = 2.

Step 9: dp[2][4]: s1[1]='B' vs s2[3]='C'. Mismatch → 0.

Step 10: dp[3][1]: s1[2]='C' vs s2[0]='D'. Mismatch → 0.

Step 11: dp[3][2]: s1[2]='C' vs s2[1]='A'. Mismatch → 0.

Step 12: dp[3][3]: s1[2]='C' vs s2[2]='B'. Mismatch → 0. The 'AB' diagonal streak is broken.

Step 13: dp[3][4]: s1[2]='C' vs s2[3]='C'. Match! dp[3][4] = dp[2][3] + 1 = 2 + 1 = 3. We extended the 'AB' match to 'ABC'! max_len = 3.

Result: max_len = 3. The longest common substring is "ABC". Notice the diagonal streak: dp[1][2]=1, dp[2][3]=2, dp[3][4]=3.

Longest Common Substring — DP Table with Diagonal Streaks — Watch how matches form diagonal streaks in the DP table. Each match extends the previous diagonal value by 1, while mismatches reset to 0.

Algorithm

  1. Create a 2D table dp of size (m+1) × (n+1), initialized to 0.
  2. Initialize max_len = 0.
  3. For each row i from 1 to m:
    • For each column j from 1 to n:
      • If s1[i-1] == s2[j-1]: dp[i][j] = dp[i-1][j-1] + 1, and update max_len if dp[i][j] > max_len.
      • Else: dp[i][j] = 0 (contiguous match is broken).
  4. Return max_len.

Key difference from LCS: On mismatch, the cell becomes 0 (not max of neighbors), and the answer is the maximum across the entire table (not just dp[m][n]).

Code

class Solution {
public:
    int longestCommonSubstr(string &s1, string &s2) {
        int m = s1.size(), n = s2.size();
        vector<vector<int>> dp(m + 1, vector<int>(n + 1, 0));
        int maxLen = 0;

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (s1[i - 1] == s2[j - 1]) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    maxLen = max(maxLen, dp[i][j]);
                }
                // else dp[i][j] remains 0
            }
        }

        return maxLen;
    }
};
class Solution:
    def longestCommonSubstr(self, s1, s2):
        m, n = len(s1), len(s2)
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        max_len = 0

        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if s1[i - 1] == s2[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1
                    max_len = max(max_len, dp[i][j])
                # else dp[i][j] stays 0

        return max_len
class Solution {
    public int longestCommonSubstr(String s1, String s2) {
        int m = s1.length(), n = s2.length();
        int[][] dp = new int[m + 1][n + 1];
        int maxLen = 0;

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    maxLen = Math.max(maxLen, dp[i][j]);
                }
                // else dp[i][j] stays 0
            }
        }

        return maxLen;
    }
}

Complexity Analysis

Time Complexity: O(m × n)

We fill every cell of the (m+1) × (n+1) table exactly once. Each cell computation is O(1) — a character comparison and possibly an addition. No inner extension loop is needed because the DP table captures the running match length. For m = n = 1000, this is 10^6 operations.

Space Complexity: O(m × n)

We store the full 2D table with (m+1) × (n+1) entries. For m = n = 1000, this requires about 10^6 integers ≈ 4 MB.

Why This Approach Is Not Efficient

The tabulation approach runs in O(m × n) time, which is efficient enough for the constraints. However, it uses O(m × n) space for the full 2D table.

Looking at the recurrence closely, dp[i][j] depends on only one cell: dp[i-1][j-1] — the diagonal predecessor. It does not depend on dp[i-1][j] or dp[i][j-1] (unlike LCS). This means each row only needs the previous row's values.

We can exploit this by maintaining just two 1D arrays — one for the previous row and one for the current row — reducing space from O(m × n) to O(n). When characters match, we read from the previous row's diagonal entry. When they don't match, we write zero.

Optimal Approach - Space-Optimized Dynamic Programming

Intuition

The algorithm is identical to the tabulation approach — same recurrence, same time complexity. The only change is the storage strategy.

Instead of the full 2D table, we keep two arrays:

  • prev: the completed previous row
  • curr: the row we are currently computing

For each cell in the current row:

  • If characters match: curr[j] = prev[j-1] + 1 — extend the diagonal match from the previous row.
  • If they don't match: curr[j] = 0 — reset the contiguous match.

We also maintain a running max_len that we update whenever a cell exceeds the current maximum.

After finishing each row, we swap prev = curr and reset curr. This is even simpler than the LCS space optimization because we only need the diagonal value, not the cell directly above.

Step-by-Step Explanation

Let's trace through with s1 = "ABC", s2 = "DABC":

Step 1: Initialize prev = [0, 0, 0, 0, 0]. Columns represent: ε, D, A, B, C. max_len = 0.

Step 2: Process s1[0]='A', j=1: 'A' vs 'D'. Mismatch → curr[1] = 0.

Step 3: j=2: 'A' vs 'A'. Match! curr[2] = prev[1] + 1 = 0 + 1 = 1. max_len = 1.

Step 4: j=3: 'A' vs 'B'. Mismatch → curr[3] = 0.

Step 5: j=4: 'A' vs 'C'. Mismatch → curr[4] = 0. Swap: prev = [0, 0, 1, 0, 0].

Step 6: Process s1[1]='B', j=2: 'B' vs 'A'. Mismatch → curr[2] = 0.

Step 7: j=3: 'B' vs 'B'. Match! curr[3] = prev[2] + 1 = 1 + 1 = 2. max_len = 2. The 'A' match extends to 'AB'!

Step 8: j=4: 'B' vs 'C'. Mismatch → curr[4] = 0. Swap: prev = [0, 0, 0, 2, 0].

Step 9: Process s1[2]='C', j=3: 'C' vs 'B'. Mismatch → curr[3] = 0.

Step 10: j=4: 'C' vs 'C'. Match! curr[4] = prev[3] + 1 = 2 + 1 = 3. max_len = 3!

Step 11: Swap: prev = [0, 0, 0, 0, 3].

Result: max_len = 3. The longest common substring is "ABC", using only O(n) = O(5) space.

Space-Optimized LCSubstring — Diagonal Extension with Two Arrays — Watch how the two rolling arrays track the diagonal match extension. Matches grow via prev[j-1]+1, while mismatches reset to 0.

Algorithm

  1. Create two arrays prev and curr, each of size n+1, initialized to 0.
  2. Initialize max_len = 0.
  3. For each character s1[i-1] (i from 1 to m):
    • For each character s2[j-1] (j from 1 to n):
      • If s1[i-1] == s2[j-1]: curr[j] = prev[j-1] + 1. Update max_len if curr[j] > max_len.
      • Else: curr[j] = 0.
    • After all columns: swap prev and curr, reset curr to zeros.
  4. Return max_len.

Code

class Solution {
public:
    int longestCommonSubstr(string &s1, string &s2) {
        int m = s1.size(), n = s2.size();
        vector<int> prev(n + 1, 0), curr(n + 1, 0);
        int maxLen = 0;

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (s1[i - 1] == s2[j - 1]) {
                    curr[j] = prev[j - 1] + 1;
                    maxLen = max(maxLen, curr[j]);
                } else {
                    curr[j] = 0;
                }
            }
            swap(prev, curr);
            fill(curr.begin(), curr.end(), 0);
        }

        return maxLen;
    }
};
class Solution:
    def longestCommonSubstr(self, s1, s2):
        m, n = len(s1), len(s2)
        prev = [0] * (n + 1)
        curr = [0] * (n + 1)
        max_len = 0

        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if s1[i - 1] == s2[j - 1]:
                    curr[j] = prev[j - 1] + 1
                    max_len = max(max_len, curr[j])
                else:
                    curr[j] = 0
            prev, curr = curr, [0] * (n + 1)

        return max_len
import java.util.Arrays;

class Solution {
    public int longestCommonSubstr(String s1, String s2) {
        int m = s1.length(), n = s2.length();
        int[] prev = new int[n + 1];
        int[] curr = new int[n + 1];
        int maxLen = 0;

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    curr[j] = prev[j - 1] + 1;
                    maxLen = Math.max(maxLen, curr[j]);
                } else {
                    curr[j] = 0;
                }
            }
            int[] temp = prev;
            prev = curr;
            curr = temp;
            Arrays.fill(curr, 0);
        }

        return maxLen;
    }
}

Complexity Analysis

Time Complexity: O(m × n)

Identical to the tabulation approach. We iterate through every (i, j) pair exactly once, performing O(1) work per cell. Total: m × n operations.

Space Complexity: O(n)

We store only two arrays of size n+1, compared to the full (m+1) × (n+1) table. This reduces space from O(m × n) to O(n). For m = n = 1000, this means ~1000 integers instead of ~10^6.

Note: We can further optimize by making the shorter string the column string (s2), ensuring the array size is O(min(m, n)).