Book:Bruce W. Watson/Taxonomies and Toolkits of Regular Language Algorithms
Jump to navigation
Jump to search
Bruce W. Watson: Taxonomies and Toolkits of Regular Language Algorithms
Published $\text {1995}$, Eindhoven University of Technology
- ISBN 90-386-0396-7
Subject Matter
Contents
- Acknowledgements
- I Prologue
- 1 Introduction
- 1.1 Problem statement
- 1.2 Structure of this dissertation
- 1.3 Intended audience
- 1 Introduction
- 2 Mathematical preliminaries
- 2.1 Notations and conventions
- 2.2 Basic definitions
- 2.3 Strings and languages
- 2.4 Regular expressions
- 2.5 Trees
- 2.6 Finite automata and Moore machines
- 2.6.1 Definitions and properties involving $FA$s and $MM$s
- 2.6.2 Transformations on finite automata
- 2.6.2.1 Imperative implementations of some transformations
- 2 Mathematical preliminaries
- II The taxonomies
- 3 Constructing taxonomies
- 4 Keyword pattern matching algorithms
- 4.1 Introduction and related work
- 4.2 The problem and some naive solutions
- 4.2.1 The $(P_+)$ algorithms
- 4.2.1.1 The $(P_+, S_+)$ algorithm and its improvement
- 4.2.1.2 The $(P_+, S_-)$ algorithm
- 4.2.2 The $(S_-)$ algorithms
- 4.2.2.1 The $(S_-, P_+)$ algorithms
- 4.2.2.2 The $(S_-, P_-)$ algorithm
- 4.2.1 The $(P_+)$ algorithms
- 4.3 The Aho-Corasick algorithms
- 4.3.1 Algorithm detail $(AC)$
- 4.3.2 Method $(AC-OPT)$
- 4.3.3 A Moore machine approach to the $(AC-OPT)$ algorithm
- 4.3.4 Linear search
- 4.3.5 The Aho-Corasick failure function algorithm
- 4.3.6 The Knuth-Morris-Pratt algorithm
- 4.3.6.1 Adding indices
- 4.3.7 An alternative derivation of Moore machine $M_0$
- 4.4 The Commentz-Walter algorithms
- 4.4.1 Safe shift distances and predicate weakening
- 4.4.1.1 General weakening strategies
- 4.4.1.2 The $l = \varepsilon$ and the no-lookahead cases
- 4.4.1.2.1 The no-lookahead shift function
- 4.4.2 A shift function without further weakening
- 4.4.3 Towards the CW and BM algorithms
- 4.4.4 A more easily precomputed shift function
- 4.4.5 The standard Commentz-Walter algorithm
- 4.4.6 A derivation of the Boyer-Moore algorithm
- 4.4.7 A weakened Boyer-Moore algorithm
- 4.4.8 Using the right lookahead symbol
- 4.4.1 Safe shift distances and predicate weakening
- 4.5 The Boyer-Moore family of algorithms
- 4.5.1 Larger shifts without using match information
- 4.5.2 Making use of match information
- 4.6 Conclusions
- 4 Keyword pattern matching algorithms
- 5 A new RE pattern matching algorithm
- 5.1 Introduction
- 5.2 Problem specification and a simple first algorithm
- 5.2.1 A more practical algorithm using a finite automaton
- 5.3 Greater shift distances
- 5.3.1 A more efficient algorithm by computing a greater shift
- 5.3.2 Deriving a practical range predicate
- 5.4 Precomputation
- 5.4.1 Characterising the domains of functions $d_1$ and $d_2$
- 5.4.2 Precomputing function $t$
- 5.4.3 Precomputing functions $d_1$ and $d_2$
- 5.4.4 Precomputing function $f_r$
- 5.4.5 Precomputing sets $L_q$
- 5.4.6 Precomputing function $emm$
- 5.4.7 Precomputing function $st$ and languages $L'$ and $suff(L')$
- 5.4.8 Precomputing relation $Reach(M)$
- 5.4.9 Combining the precomputation algorithms
- 5.5 Specializing the pattern matching algorithm
- 5.6 The performance of the algorithm
- 5.7 Improving the algorithm
- 5.8 Conclusions
- 5 A new RE pattern matching algorithm
- 6 $FA$ construction algorithms
- 6.1 Introduction and related work
- 6.2 Items and an alternative definition of $RE$
- 6.3 A canonical construction
- 6.4 $\varepsilon$-free constructions
- 6.4.1 Filters
- 6.5 Encoding Construction $(REM-\varepsilon)$
- 6.5.1 Using begin-markers
- 6.5.2 Composing the subset construction
- 6.6 An encoding using derivatives
- 6.7 The dual constructions
- 6.8 Precomputing the auxiliary sets and Null
- 6.9 Constructions as imperative programs
- 6.9.1 The item set constructions
- 6.9.2 The Symnodes constructions
- 6.9.3 A dual construction
- 6.10 Conclusions
- 6 $FA$ construction algorithms
- 7 $DFA$ minimisation algorithms
- 71 Introduction
- 7.2 An algorithm due to Brzozowski
- 7.3 Minimization by equivalence of states
- 7.3.1 The equivalence relation on states
- 7.3.2 Distinguishability
- 7.3.3 An upperbound on the number of approximation steps
- 7.3.4 Characterizing the equivalence classes of $E$
- 7.4 Algorithms computing $E$, $D$, or $[Q]_e$
- 7.4.1 Computing $D$ and $E$ by layerwise approximations
- 7.4.2 Computing $D$, $E$, and $[Q]_e$ by unordered approximation
- 7.4.3 More efficiently computing $D$ and $E$ by unordered approximation
- 7.4.4 An algorithm due to Hopcroft and Ullman
- 7.4.5 Hopcroft's algorithm to compute $[Q]_e$ efficiently
- 7.4.6 Computing $(p, q) \in E$
- 7.4.7 Computing $E$ by approximation from below
- 7.5 Conclusions
- 7 $DFA$ minimisation algorithms
- III The implementations
- 8 Designing and implementing class libraries
- 8.1 Motivations for writing class libraries
- 8.2 Code sharing
- 8.2.1 Base classes versus templates
- 8.2.2 Composition versus protected inheritance
- 8.3 Coding conventions and performance issues
- 8.3.1 Performance tuning
- 8.4 Presentation conventions
- 8 Designing and implementing class libraries
- 9 SPARE Parts: String PAttern REcognition in C++
- 9.1 Introduction and related work
- 9.2 Using the toolkit
- 9.2.1 Multi-threaded pattern matching
- 9.2.2 Alternative alphabets
- 9.3 Abstract pattern matchers
- 9.4 Concrete pattern matchers
- 9.4.1 The brute-force pattern matchers
- 9.4.2 The KMP pattern matcher
- 9.4.3 The AC pattern matchers
- 9.4.3.1 AC transition machines and auxiliary classes
- 9.4.4 The CW pattern matchers
- 9.4.4.1 Safe shifters and auxiliary functions
- 9.4.5 The BM pattern matchers
- 9.4.5.1 Safe shifters and auxiliary functions
- 9.4.6 Summary of user classes
- 9.5 Foundation classes
- 9.5.1 Miscellaneous
- 9.5.2 Arrays, sets, and maps
- 9.5.3 Tries and failure functions
- 9.6 Experiences and conclusions
- 9.7 Obtaining and compiling the toolkit
- 9 SPARE Parts: String PAttern REcognition in C++
- 10 FIRE Lite: $FA$s and $RE$s in C++
- 10.1 Introduction and related work
- 10.1.1 Related toolkits
- 10.1.2 Advantages and characteristics of FIRE Lite
- 10.1.3 Future directions for the toolkit
- 10.1.4 Reading this chapter
- 10.2 Using the toolkit
- 10.3 The stricture of FIRE Lite
- 10.4 $RE$s and abstract $FA$s
- 10.5 Concrete $FA$s
- 10.5.1 Finite automata
- 10.5.2 $\varepsilon$-free finite automata
- 10.5.3 Deterministic finite automata
- 10.5.4 Abstract-states classes
- 10.5.5 Summary of concrete automata classes
- 10.6 Foundation classes
- 10.6.1 Character ranges
- 10.6.2 States, positions, nodes, and maps
- 10.6.3 Bit vectors and sets
- 10.6.4 Transitions
- 10.6.5 Relations
- 10.7 Experiences and conclusions
- 10.8 Obtaining and compiling FIRE Lite
- 10.1 Introduction and related work
- 10 FIRE Lite: $FA$s and $RE$s in C++
- 11 DFA minimisation algorithms in FIRE Lite
- 11.1 Introduction
- 11.2 The algorithms
- 11.2.1 Brzozowski's algorithm
- 11.2.2 Equivalence relation algorithms
- 11.3 Foundation classes
- 11.4 Conclusions
- 11 DFA minimisation algorithms in FIRE Lite
- IV The performance of the algorithms
- 12 Measuring the performance of algorithms
- 13 The performance of pattern matchers
- 13.1 Introduction and related work
- 13.2 The algorithms
- 13.3 Testing methodology
- 13.3.1 Test environment
- 13.3.2 Natural language test data
- 13.3.3 DNA sequence test data
- 13.4 results
- 13.4.1 Performance versus keyword set size
- 13.4.2 Performance versus minimum keyword length
- 13.4.3 Single-keywords
- 13.5 Conclusions and recommendations
- 13 The performance of pattern matchers
- 14 The performance of FA construction algorithms
- 14.1 Introduction
- 14.2 The algorithms
- 14.3 Testing methodology
- 14.3.1 Test environment
- 14.3.2 Generating regular expressions
- 14.3.3 Generating input strings
- 14.1 Results
- 14.4.1 Construction times
- 14.4.2 Constructed automaton sizes
- 14.4.3 Single transition performance
- 14.5 Conclusions and recommendations
- 14 The performance of FA construction algorithms
- 15 The performance of DFA minimization algorithms
- 15.1 Introduction
- 15.2 The algorithms
- 15.3 Testing methodology
- 15.4 Results
- 15.5 Conclusions and recommendations
- 15 The performance of DFA minimization algorithms
- V Epilogue
- 16 Conclusions
- 16.1 General conclusions
- 16.2 Chapter-specific conclusions
- 16.3 A personal perspective:
- 16 Conclusions
- 17 Challenges and open problems
- References
- Index
- Summary
- Samenvatting (Dutch summary)
- Curriculum Vitae