Leetcode Explainer: LeetCode Problem 792 - Number of Matching Subsequences

792. Number of Matching Subsequences

Medium, Dynamic Programming

Given a string s and an array of strings words, return the number of words[i] that is a subsequence of s.

A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.

For example, "ace" is a subsequence of "abcde".

Example 1:

Input: s = "abcde", words = ["a","bb","acd","ace"]

Output: 3

Explanation: There are three strings in words that are a subsequence of s: "a", "acd", "ace".

Example 2:

Input: s = "dsahjpjauf", words = ["ahjpjau","ja","ahbwzgqnuk","tnmlanowax"]

Output: 2

Constraints:

1 <= s.length <= 5 * 10⁴
1 <= words.length <= 5000
1 <= words[i].length <= 50
s and words[i] consist of only lowercase English letters.

SOLUTION:

To solve this problem, we need to efficiently check how many strings in the words array are subsequences of the string s. A subsequence is defined as a sequence of characters that appear in the same order as in the original string s, but not necessarily consecutively.

Plan:

Optimize the Matching Process: Instead of checking each word in words against s character by character (which would be inefficient), we can use a bucket approach:
- Group the words based on their first character.
- For each character in s, process all words waiting for that character.
Iterate Through s:
- As we iterate through the characters of s, we advance the matching process for words waiting on that character.
Use Buckets:
- Maintain a dictionary where each key is a character, and the value is a list of iterators. Each iterator represents a word and its current position in the matching process.
Count Matches:
- When a word's iterator reaches the end, it means the word is a subsequence of s.

Implementation:

using System;
using System.Collections.Generic;

public class Solution {
    public int NumMatchingSubseq(string s, string[] words) {
        // Dictionary to hold buckets of words grouped by the current character they are waiting for
        var waiting = new Dictionary<char, Queue<IEnumerator<char>>>();
        
        // Initialize the dictionary with all lowercase letters
        for (char c = 'a'; c <= 'z'; c++) {
            waiting[c] = new Queue<IEnumerator<char>>();
        }

        // Populate the buckets based on the first character of each word
        foreach (var word in words) {
            var iterator = word.GetEnumerator();
            if (iterator.MoveNext()) {
                waiting[iterator.Current].Enqueue(iterator);
            }
        }

        int count = 0;

        // Iterate through characters of the string `s`
        foreach (char c in s) {
            var bucket = waiting[c]; // Get the bucket for the current character
            int size = bucket.Count;

            for (int i = 0; i < size; i++) {
                var iterator = bucket.Dequeue(); // Get the next word's iterator
                if (iterator.MoveNext()) {
                    waiting[iterator.Current].Enqueue(iterator); // Move to the next character
                } else {
                    count++; // Word is a subsequence
                }
            }
        }

        return count;
    }

    public static void Main(string[] args) {
        var solution = new Solution();
        Console.WriteLine(solution.NumMatchingSubseq("abcde", new string[] { "a", "bb", "acd", "ace" })); // Output: 3
        Console.WriteLine(solution.NumMatchingSubseq("dsahjpjauf", new string[] { "ahjpjau", "ja", "ahbwzgqnuk", "tnmlanowax" })); // Output: 2
    }
}

Explanation of Code:

Dictionary for Buckets:
- waiting is a dictionary where keys are characters ('a' to 'z'), and values are queues of iterators for words waiting on that character.
Iterating Over s:
- For each character in s, process all words waiting for it.
- If the iterator can move to the next character, enqueue it into the bucket for the next character.
- If the iterator is exhausted, increment the match count.
Efficiency:
- Each word is processed exactly once, and each character in s is processed once.
- Total complexity: $O(L + M)$ , where $L$ is the total length of all words in words, and $M$ is the length of s.