KMP string matching algorithm is an efficient string search algorithm, mainly used to deal with pattern matching problems in text data. The algorithm improves the efficiency of string search by using prefix tables to reduce repetitive comparison steps. The core idea of the KMP algorithm is to find a substring in the pattern string so that when matching from the starting position of the substring in the original string, it will not lead to any repeated comparison steps. In the KMP algorithm, we first create a prefix table to store the occurrence position of each character in the pattern string. Then, starting from the first character, we check whether each character in the pattern string appears in the prefix table one by one. If a character is not in the prefix table, we skip it and move on to the next character. If a character is in the prefix table, we move the corresponding part of the prefix table one bit to the right. In this way, we can find the position of the pattern string in the original string without increasing the number of comparisons. The main advantage of the KMP algorithm is that it can complete string matching within the time complexity of O (n + m), where n is the length of the mode string and m is the length of the original string. Compared with naive string matching algorithms (such as brute force matching), KMP algorithm has higher efficiency.
Detailed explanation and code implementation of KMP string matching algorithm.
In modern software development, string matching is a common and important problem. Whether it is keyword matching in search engines or search and replace functions in text editors, efficient string matching algorithms are indispensable.
KMP (Knuth-Morris-Pratt) algorithm is a classic string matching algorithm, which avoids repeated matching through partial matching table (prefix table), thereby greatly improving the matching efficiency.
This article will explain the principle of the KMP algorithm in detail, and show its implementation process step by step through the code.
\n#
What is KMP Algorithm?.
The KMP algorithm, proposed by Donald Knuth, Vaughan Pratt, and James H. Morris in 1977, is used to efficiently search a pattern string in a text string. The core idea of the KMP algorithm is to use part of the information that has been matched to avoid re-matching from scratch, thereby improving the matching efficiency.
\n#
The core concept of the KMP algorithm.
The core of the KMP algorithm is to build a "partial matching table" (also known as a "prefix table" or "failure function "). This table records the longest common prefix length of the substring before each position in the schema string.
Specifically, for pattern strings P
, define π[i]
For substring P[0...i]
The length of the longest common prefix.
For example, for pattern strings "ababab"
, whose partial matching table is as follows:
P: a b a b a b
π: 0 0 1 2 3 4
Explanation: \n-π[0] = 0
: A single character has no prefix and suffix. \n-π[1] = 0
: Substring "ab"
No prefix and suffix.
\n-π[2] = 1
: Substring "aba"
The longest common prefix of is "a"
。
\n-π[3] = 2
: Substring "abab"
The longest common prefix of is "ab"
。
\n-π[4] = 3
: Substring "abab"
The longest common prefix of is "aba"
。
\n-π[5] = 4
: Substring "ababab"
The longest common prefix of is "abab"
。
\n#
KMP algorithm steps.
1. # Build partial matching table #: first according to the pattern string P
Build a partial match table π
。
2. # to match #: use partial match table for text T
And mode P
Match.
\n#
Build a partial match table.
The construction process of the partial matching table is as follows:
def compute_prefix_table(pattern):
m = len(pattern)
pi = [0] * m
j = 0 # length of the previous longest prefix suffix
for i in range(1, m):
while j > 0 and pattern[i] != pattern[j]:
j = pi[j - 1]
if pattern[i] == pattern[j]:
j += 1
pi[i] = j
return pi
\n#To match.
With a partial match table, we can use it for text T
And mode P
Match:
def kmp_search(text, pattern):
n = len(text)
m = len(pattern)
pi = compute_prefix_table(pattern)
j = 0 # index for pattern
i = 0 # index for text
matches = []
while i < n:
if pattern[j] == text[i]:
i += 1
j += 1
if j == m:
matches.append(i - j)
j = pi[j - 1]
elif i < n and pattern[j] != text[i]:
if j != 0:
j = pi[j - 1]
else:
i += 1
return matches
\n#Sample code and explanation.
Suppose we have a text string T = "ababcabcabababd"
And a pattern string P = "ababd"
, we want to find where all patterns appear in the text. The following is the complete code implementation and explanation:
# 构建部分匹配表
def compute_prefix_table(pattern):
m = len(pattern)
pi = [0] * m
j = 0 # length of the previous longest prefix suffix
for i in range(1, m):
while j > 0 and pattern[i] != pattern[j]:
j = pi[j - 1]
if pattern[i] == pattern[j]:
j += 1
pi[i] = j
return pi
# KMP搜索算法
def kmp_search(text, pattern):
n = len(text)
m = len(pattern)
pi = compute_prefix_table(pattern)
j = 0 # index for pattern
i = 0 # index for text
matches = []
while i < n:
if pattern[j] == text[i]:
i += 1
j += 1
if j == m:
matches.append(i - j)
j = pi[j - 1]
elif i < n and pattern[j] != text[i]:
if j != 0:
j = pi[j - 1]
else:
i += 1
return matches
# 主程序
if __name__ == "__main__":
text = "ababcabcabababd"
pattern = "ababd"
matches = kmp_search(text, pattern)
print(f"Pattern '{pattern}' found at positions: {matches}")
# Run result #:
Pattern 'ababd' found at positions: [8]
\n#Summarize.
The KMP algorithm avoids repeated matching by constructing a partial matching table, thereby greatly improving the efficiency of string matching. In practical application scenarios, such as search engines, text editors, etc., KMP algorithms can play an important role.
I hope this article can help you understand the principle of KMP algorithm and its implementation process, so that you can flexibly apply this efficient string matching algorithm in actual projects.