Class NFATraceFinderGenerator
-
Method Summary
Modifier and TypeMethodDescriptionstatic NFAgenerateTraceFinder(NFA nfa) Generates a NFA that can be used to generate a backward-searching DFA that can find the result (capture group offsets) of a regex match found by a forward-searching DFA.
-
Method Details
-
generateTraceFinder
Generates a NFA that can be used to generate a backward-searching DFA that can find the result (capture group offsets) of a regex match found by a forward-searching DFA.The idea behind this is the following: If a regular expression does not contain any loops (+ and *), its NFA will be a directed acyclic graph, which can be converted to a tree. If the search pattern is a tree, we can find all possible results (capture group offsets) ahead of time, by a simple tree traversal. Knowing all possible results in advance saves us the work of updating individual result indices during the search. The only problem is that we generally have to find the pattern in a string - the result will not necessarily start at the index we start searching from.
Doing an un-anchored search with a DFA requires that we add a loop in the beginning. For example, in order to find
/a(b|cd)/in"____acd____", we have to prepend the expression with[ -]*, and the resulting NFA can no longer be seen as a tree. In order to still gain some performance from the ability to pre-calculate all results, we can do the following:- We create a "reverse" tree from the NFA, so the root of the tree is the final state of the original NFA. We annotate every node in the tree-shaped NFA with all pre-calculated results that are reachable from the respective node. All leaves of the tree have exactly one possible pre-calculated result.
- We find the regex match using a regular forward searching DFA, with no regard to capture groups.
- The result of the forward search gives us the anchor necessary to do a search based on the tree-shaped NFA, and it also gives us the information that a match has definitely been found.
- We can now use this information to search for the pre-calculated result that corresponds to the string we found. To do so, we create a DFA from the tree-shaped NFA, where all states whose NFA state set contains only one possible pre-calculated result are final states without further successors.
- To find the correct pre-calculated result, we simply run the new DFA in reverse direction, starting from the index we found with the forward searching DFA.
We also have to take care of the order in which results are to be found. For example, in the expressionExample: regular expression: /a(b|c)d(e|fg)/ this expression has two possible results: (in the form: [start CG 0, end CG 0, start CG 1, end CG 1,... ]) 0: [0, 4, 1, 2, 3, 4] 1: [0, 5, 1, 2, 3, 5] NFA in reverse tree form: I / \ e g | | d f / | | b c d | | | \ a a b c | | a a When searching for the correct result, we can immediately determine it based on the last character: e -> result 0 g -> result 1/a(b)c|ab(c)/, we always have to return the result created from taking the first branch, never the second. Therefore, we create the reverse tree-shaped NFA while traversing the original NFA in priority order.- Parameters:
nfa- the NFA used for the forward-searching DFA.- Returns:
- a new NFA suitable to find the result while searching backward.
-