Matrix Chain Multiplication Bottom-Up Approach
Description
Given the dimensions of a sequence of matrices in an array arr[], where the i-th matrix has dimensions arr[i-1] × arr[i] (for i ≥ 1), find the most efficient way to multiply these matrices together such that the total number of scalar multiplications is minimized.
The problem is not to actually perform the multiplications, but to decide the order (parenthesization) in which to multiply the matrices. Matrix multiplication is associative — the result is the same regardless of how you parenthesize — but the number of scalar multiplications changes dramatically depending on the grouping.
When a matrix of size m × n is multiplied with a matrix of size n × p, it produces a matrix of size m × p and requires m × n × p scalar multiplications.
Return the minimum number of scalar multiplications needed to compute the product of the entire chain.
Focus: This problem emphasizes the bottom-up (tabulation) dynamic programming approach, where we iteratively fill a DP table by solving subproblems in order of increasing chain length — from pairs of adjacent matrices up to the full chain.
Examples
Example 1
Input: arr = [2, 1, 3, 4]
Output: 20
Explanation: There are 3 matrices: M1 (2×1), M2 (1×3), M3 (3×4). Two possible parenthesizations:
- ((M1 × M2) × M3): Cost = (2×1×3) + (2×3×4) = 6 + 24 = 30.
- (M1 × (M2 × M3)): Cost = (1×3×4) + (2×1×4) = 12 + 8 = 20.
The minimum is 20.
Example 2
Input: arr = [1, 2, 3, 4, 3]
Output: 30
Explanation: There are 4 matrices: M1 (1×2), M2 (2×3), M3 (3×4), M4 (4×3). The optimal parenthesization is ((M1 × M2) × M3) × M4:
- M1 × M2 = 1×2×3 = 6
- (M1M2) × M3 = 1×3×4 = 12
- ((M1M2)M3) × M4 = 1×4×3 = 12
- Total = 6 + 12 + 12 = 30.
Example 3
Input: arr = [10, 30, 5, 60]
Output: 4500
Explanation: There are 3 matrices: A (10×30), B (30×5), C (5×60).
- (AB)C: (10×30×5) + (10×5×60) = 1500 + 3000 = 4500
- A(BC): (30×5×60) + (10×30×60) = 9000 + 18000 = 27000
The minimum is 4500, achieved by first multiplying A×B to get a 10×5 matrix, then multiplying the result by C.
Example 4
Input: arr = [3, 4]
Output: 0
Explanation: Only one matrix (3×4). No multiplication is needed.
Constraints
- 2 ≤ arr.size() ≤ 100
- 1 ≤ arr[i] ≤ 200
Editorial
Brute Force
Intuition
The most straightforward approach is to recursively try every possible way to split the matrix chain and pick the cheapest one.
For a chain of matrices represented by indices [i..j] in the dimensions array, we can split at any position k (where i < k < j). This divides the chain into two groups: [i..k] and [k..j]. We recursively find the minimum cost for each group, then add the cost of multiplying the two resulting matrices together: arr[i] × arr[k] × arr[j].
We try every possible split point and return the minimum total cost. The base case is when there is only one matrix (i+1 == j), which costs nothing.
Step-by-Step Explanation
Let's trace with arr = [10, 30, 5, 60] (matrices: A=10×30, B=30×5, C=5×60):
Step 1: Call minCost(0, 3). Try splits k=1 and k=2.
Step 2: Split k=1: left=[0,1] (A alone, cost 0), right=[1,3] (B×C).
- minCost(1,3): split k=2 → 0 + 0 + 30×5×60 = 9000.
- Merge cost for k=1 = 10×30×60 = 18000. Total = 0 + 9000 + 18000 = 27000.
Step 3: Split k=2: left=[0,2] (A×B), right=[2,3] (C alone, cost 0).
- minCost(0,2): split k=1 → 0 + 0 + 10×30×5 = 1500.
- Merge cost for k=2 = 10×5×60 = 3000. Total = 1500 + 0 + 3000 = 4500.
Step 4: Compare: k=1 → 27000, k=2 → 4500. Minimum = 4500.
Result: 4500.
Algorithm
- Define
minCost(i, j)representing the minimum cost to multiply chain [i..j]. - Base case: if
i + 1 == j, return 0. - For each split k from i+1 to j-1:
- cost = minCost(i, k) + minCost(k, j) + arr[i] × arr[k] × arr[j]
- Track minimum.
- Return minimum cost. Answer is
minCost(0, n-1).
Code
#include <bits/stdc++.h>
using namespace std;
int minCostRec(vector<int>& arr, int i, int j) {
if (i + 1 == j)
return 0;
int res = INT_MAX;
for (int k = i + 1; k < j; k++) {
int cost = minCostRec(arr, i, k)
+ minCostRec(arr, k, j)
+ arr[i] * arr[k] * arr[j];
res = min(res, cost);
}
return res;
}
int matrixMultiplication(vector<int>& arr) {
return minCostRec(arr, 0, arr.size() - 1);
}import sys
def matrixMultiplication(arr):
def minCostRec(i, j):
if i + 1 == j:
return 0
res = sys.maxsize
for k in range(i + 1, j):
cost = (minCostRec(i, k)
+ minCostRec(k, j)
+ arr[i] * arr[k] * arr[j])
res = min(res, cost)
return res
return minCostRec(0, len(arr) - 1)class Solution {
static int minCostRec(int[] arr, int i, int j) {
if (i + 1 == j)
return 0;
int res = Integer.MAX_VALUE;
for (int k = i + 1; k < j; k++) {
int cost = minCostRec(arr, i, k)
+ minCostRec(arr, k, j)
+ arr[i] * arr[k] * arr[j];
res = Math.min(res, cost);
}
return res;
}
static int matrixMultiplication(int[] arr) {
return minCostRec(arr, 0, arr.length - 1);
}
}Complexity Analysis
Time Complexity: O(2^n)
The number of possible parenthesizations is the Catalan number C(n-2), which grows exponentially. The recursion explores all of them due to massive overlapping subproblems that are recomputed from scratch.
Space Complexity: O(n)
The recursion depth is at most n-1. Each stack frame uses O(1) space, so the call stack uses O(n) total.
Why This Approach Is Not Efficient
The brute force has exponential time complexity because it recomputes the same subproblems repeatedly. For example, minCost(1,3) might be needed by both the k=1 split of minCost(0,3) and the k=1 split of minCost(1,4) (in a larger chain). Each recomputation triggers its own cascade of recursive calls.
With arr.size() up to 100, we have up to 99 matrices. The exponential branching makes brute force completely impractical — it would take longer than the age of the universe to finish.
Key insight: There are only O(n²) unique subproblems defined by pairs (i, j). We can cache each result using memoization to avoid recomputation, bringing time down to O(n³).
Better Approach - Memoization (Top-Down DP)
Intuition
We keep the same recursive structure but add a cache: a 2D table memo[i][j] that stores the result of each subproblem after its first computation. Before solving any subproblem, we check the cache. If the answer is there, we return it instantly without re-recursing.
This transforms the exponential recursion into an O(n³) algorithm because:
- There are O(n²) unique (i, j) pairs.
- Each is computed exactly once, trying O(n) split points.
- Total: O(n²) × O(n) = O(n³).
However, this top-down approach still uses the call stack for recursion, which adds function-call overhead and uses O(n) stack space on top of the O(n²) memo table.
Step-by-Step Explanation
Let's trace with arr = [10, 30, 5, 60] using memoization:
Step 1: Call minCost(0, 3). memo is empty.
Step 2: Try k=1. Need minCost(0,1) → base case, 0. Cache it. Need minCost(1,3) → not cached.
Step 3: Solve minCost(1,3): k=2, left=minCost(1,2)=0, right=minCost(2,3)=0, merge=30×5×60=9000. Cache memo[1][3]=9000.
Step 4: k=1 total: 0 + 9000 + 10×30×60 = 0 + 9000 + 18000 = 27000. Best so far = 27000.
Step 5: Try k=2. Need minCost(0,2) → not cached. Solve: k=1, left=0 (CACHE HIT), right=0 (CACHE HIT), merge=10×30×5=1500. Cache memo[0][2]=1500.
Step 6: Need minCost(2,3)=0 (CACHE HIT). k=2 total: 1500 + 0 + 10×5×60 = 4500. Compare: 4500 < 27000. Best = 4500.
Step 7: Cache memo[0][3]=4500. Return 4500.
Algorithm
- Create
memo[n][n]initialized to -1. - Define
minCost(i, j): if cached, return cache; else compute by trying all splits, cache and return minimum. - Return
minCost(0, n-1).
Code
#include <bits/stdc++.h>
using namespace std;
int minCostRec(vector<int>& arr, int i, int j,
vector<vector<int>>& memo) {
if (i + 1 == j)
return 0;
if (memo[i][j] != -1)
return memo[i][j];
int res = INT_MAX;
for (int k = i + 1; k < j; k++) {
int cost = minCostRec(arr, i, k, memo)
+ minCostRec(arr, k, j, memo)
+ arr[i] * arr[k] * arr[j];
res = min(res, cost);
}
return memo[i][j] = res;
}
int matrixMultiplication(vector<int>& arr) {
int n = arr.size();
vector<vector<int>> memo(n, vector<int>(n, -1));
return minCostRec(arr, 0, n - 1, memo);
}import sys
def matrixMultiplication(arr):
n = len(arr)
memo = [[-1] * n for _ in range(n)]
def minCostRec(i, j):
if i + 1 == j:
return 0
if memo[i][j] != -1:
return memo[i][j]
res = sys.maxsize
for k in range(i + 1, j):
cost = (minCostRec(i, k)
+ minCostRec(k, j)
+ arr[i] * arr[k] * arr[j])
res = min(res, cost)
memo[i][j] = res
return res
return minCostRec(0, n - 1)class Solution {
static int minCostRec(int[] arr, int i, int j, int[][] memo) {
if (i + 1 == j)
return 0;
if (memo[i][j] != -1)
return memo[i][j];
int res = Integer.MAX_VALUE;
for (int k = i + 1; k < j; k++) {
int cost = minCostRec(arr, i, k, memo)
+ minCostRec(arr, k, j, memo)
+ arr[i] * arr[k] * arr[j];
res = Math.min(res, cost);
}
return memo[i][j] = res;
}
static int matrixMultiplication(int[] arr) {
int n = arr.length;
int[][] memo = new int[n][n];
for (int[] row : memo)
java.util.Arrays.fill(row, -1);
return minCostRec(arr, 0, n - 1, memo);
}
}Complexity Analysis
Time Complexity: O(n³)
O(n²) unique subproblems, each trying O(n) split points with O(1) work per split.
Space Complexity: O(n²)
The memo table uses O(n²) space. The recursion stack adds O(n) depth. Total: O(n²).
Why This Approach Is Not Efficient
The memoization approach achieves O(n³) time and O(n²) space, which are asymptotically optimal for this problem. However, there are practical drawbacks:
-
Recursion overhead: Each recursive call incurs function-call overhead — pushing/popping stack frames, managing return addresses. For n=100, the recursion can go up to 99 levels deep, and the total number of function calls can reach hundreds of thousands.
-
Stack space: The O(n) recursion depth adds to memory usage. For very deep recursion or constrained environments, this can be a concern.
-
Unpredictable order: The top-down approach computes subproblems in the order they're first needed by the recursion, which may not be cache-friendly for the CPU.
Key insight: We can eliminate recursion entirely by computing subproblems iteratively in a specific order — from smaller subchains to larger ones. Since every subproblem dp[i][j] only depends on subproblems with shorter chain lengths (which have smaller j - i values), we can fill the table diagonal by diagonal. This is the bottom-up tabulation approach — same O(n³) time and O(n²) space, but with no recursion overhead and better cache locality.
Optimal Approach - Bottom-Up DP (Tabulation)
Intuition
Instead of starting from the full chain and recursing down to base cases, we flip the computation: start from the smallest subchains and build up to the full chain iteratively.
We create a 2D table dp[i][j] where dp[i][j] represents the minimum multiplication cost for the subchain from index i to index j. The key observation is:
- Chain length 1 (single matrix, j = i+1):
dp[i][j] = 0. No multiplication needed. - Chain length 2 (two matrices, j = i+2):
dp[i][j] = arr[i] × arr[i+1] × arr[j]. Only one way to multiply. - Chain length L (L matrices, j = i+L): Try every split k from i+1 to j-1.
dp[i][j] = min over k of (dp[i][k] + dp[k][j] + arr[i] × arr[k] × arr[j]).
We fill the table diagonally — first all length-1 chains (the main diagonal), then all length-2 chains (one diagonal above), then length-3, and so on. When we compute dp[i][j] for a chain of length L, all shorter subchains dp[i][k] and dp[k][j] are already computed and available in the table.
Imagine building a pyramid brick by brick. The bottom row (length-1 chains) is the foundation — all zeros. Each higher row uses bricks from the row below. The top brick dp[0][n-1] is the answer.
The table is upper-triangular: we only fill cells where j > i. The diagonal dp[i][i+1] is all zeros, and the answer sits at dp[0][n-1] in the top-right corner.
![DP table showing diagonal fill order for Matrix Chain Multiplication with arr = [10, 30, 5, 60]](https://pub-a1b3030acfb94e84ba8a89fb182c53bc.r2.dev/public/2d3ffc92-f549-416f-9f70-e5b36e10be50/mcm_dp_table_fill_order_v1.webp)
Step-by-Step Explanation
Let's trace through with arr = [1, 2, 3, 4, 3] (4 matrices: M1=1×2, M2=2×3, M3=3×4, M4=4×3):
Step 1: Initialize DP table. Create dp[5][5] filled with 0. The diagonal dp[i][i+1] = 0 for all i (single-matrix base cases).
Step 2: Fill chain length 2 (two adjacent matrices).
- dp[0][2]: M1×M2. Only split k=1: dp[0][1] + dp[1][2] + arr[0]×arr[1]×arr[2] = 0 + 0 + 1×2×3 = 6.
- dp[1][3]: M2×M3. Only split k=2: 0 + 0 + arr[1]×arr[2]×arr[3] = 2×3×4 = 24.
- dp[2][4]: M3×M4. Only split k=3: 0 + 0 + arr[2]×arr[3]×arr[4] = 3×4×3 = 36.
Step 3: Fill chain length 3 (three adjacent matrices).
- dp[0][3]: M1×M2×M3. Try two splits:
- k=1: dp[0][1] + dp[1][3] + arr[0]×arr[1]×arr[3] = 0 + 24 + 1×2×4 = 32.
- k=2: dp[0][2] + dp[2][3] + arr[0]×arr[2]×arr[3] = 6 + 0 + 1×3×4 = 18.
- Minimum = 18. dp[0][3] = 18.
- dp[1][4]: M2×M3×M4. Try two splits:
- k=2: dp[1][2] + dp[2][4] + arr[1]×arr[2]×arr[4] = 0 + 36 + 2×3×3 = 54.
- k=3: dp[1][3] + dp[3][4] + arr[1]×arr[3]×arr[4] = 24 + 0 + 2×4×3 = 48.
- Minimum = 48. dp[1][4] = 48.
Step 4: Fill chain length 4 (full chain M1×M2×M3×M4).
- dp[0][4]: Try three splits:
- k=1: dp[0][1] + dp[1][4] + arr[0]×arr[1]×arr[4] = 0 + 48 + 1×2×3 = 54.
- k=2: dp[0][2] + dp[2][4] + arr[0]×arr[2]×arr[4] = 6 + 36 + 1×3×3 = 51.
- k=3: dp[0][3] + dp[3][4] + arr[0]×arr[3]×arr[4] = 18 + 0 + 1×4×3 = 30.
- Minimum = 30. dp[0][4] = 30.
Result: dp[0][4] = 30.
Bottom-Up DP — Filling the Table Diagonal by Diagonal — Watch how the DP table fills from the base-case diagonal outward. Each cell dp[i][j] is computed using already-filled cells with shorter chain lengths, building toward the answer at dp[0][n-1].
Algorithm
- Let n = arr.size(). Create a 2D table
dp[n][n]initialized to 0. - Iterate over chain lengths
lenfrom 2 to n-1:- For each starting index
ifrom 0 to n-1-len:- Set
j = i + len - Set
dp[i][j] = infinity - For each split point
kfrom i+1 to j-1:- Compute
cost = dp[i][k] + dp[k][j] + arr[i] × arr[k] × arr[j] - Update
dp[i][j] = min(dp[i][j], cost)
- Compute
- Set
- For each starting index
- Return
dp[0][n-1].
Code
#include <bits/stdc++.h>
using namespace std;
int matrixMultiplication(vector<int>& arr) {
int n = arr.size();
// dp[i][j] = minimum cost to multiply chain [i..j]
vector<vector<int>> dp(n, vector<int>(n, 0));
// Fill by increasing chain length
for (int len = 2; len < n; len++) {
for (int i = 0; i < n - len; i++) {
int j = i + len;
dp[i][j] = INT_MAX;
// Try every split point
for (int k = i + 1; k < j; k++) {
int cost = dp[i][k] + dp[k][j]
+ arr[i] * arr[k] * arr[j];
dp[i][j] = min(dp[i][j], cost);
}
}
}
return dp[0][n - 1];
}import sys
def matrixMultiplication(arr):
n = len(arr)
# dp[i][j] = minimum cost to multiply chain [i..j]
dp = [[0] * n for _ in range(n)]
# Fill by increasing chain length
for length in range(2, n):
for i in range(n - length):
j = i + length
dp[i][j] = sys.maxsize
# Try every split point
for k in range(i + 1, j):
cost = (dp[i][k] + dp[k][j]
+ arr[i] * arr[k] * arr[j])
dp[i][j] = min(dp[i][j], cost)
return dp[0][n - 1]class Solution {
static int matrixMultiplication(int[] arr) {
int n = arr.length;
// dp[i][j] = minimum cost to multiply chain [i..j]
int[][] dp = new int[n][n];
// Fill by increasing chain length
for (int len = 2; len < n; len++) {
for (int i = 0; i < n - len; i++) {
int j = i + len;
dp[i][j] = Integer.MAX_VALUE;
// Try every split point
for (int k = i + 1; k < j; k++) {
int cost = dp[i][k] + dp[k][j]
+ arr[i] * arr[k] * arr[j];
dp[i][j] = Math.min(dp[i][j], cost);
}
}
}
return dp[0][n - 1];
}
}Complexity Analysis
Time Complexity: O(n³)
The outer loop runs for chain lengths from 2 to n-1 (n-2 iterations). For each length, the middle loop iterates over starting positions (up to n times). For each cell, the inner loop tries all split points (up to n-2 times). The three nested loops give O(n³) total. Each iteration does O(1) work (one multiplication, two additions, one comparison).
Space Complexity: O(n²)
The DP table dp[n][n] uses O(n²) space. No recursion stack is needed — this is the key advantage over the memoization approach. All computation is done iteratively with predictable, cache-friendly memory access patterns.
Comparison with Memoization:
- Both are O(n³) time and O(n²) space asymptotically.
- Bottom-up eliminates recursion overhead (no function calls, no stack frames).
- Bottom-up has better cache locality (fills table in a predictable diagonal pattern).
- In practice, bottom-up is faster by a constant factor, especially for larger inputs.