Book:Bruce W. Watson/Taxonomies and Toolkits of Regular Language Algorithms

Subject Matter

 * Computer Science

Contents

 * Acknowledgements


 * I Prologue


 * 1 Introduction
 * 1.1 Problem statement
 * 1.2 Structure of this dissertation
 * 1.3 Intended audience


 * 2 Mathematical preliminaries
 * 2.1 Notations and conventions
 * 2.2 Basic definitions
 * 2.3 Strings and languages
 * 2.4 Regular expressions
 * 2.5 Trees
 * 2.6 Finite automata and Moore machines
 * 2.6.1 Definitions and properties involving $$FA$$s and $$MM$$s
 * 2.6.2 Transformations on finite automata
 * 2.6.2.1 Imperative implementations of some transformations


 * II The taxonomies


 * 3 Constructing taxonomies


 * 4 Keyword pattern matching algorithms
 * 4.1 Introduction and related work
 * 4.2 The problem and some naive solutions
 * 4.2.1 The $$(P_+)$$ algorithms
 * 4.2.1.1 The $$(P_+, S_+)$$ algorithm and its improvement
 * 4.2.1.2 The $$(P_+, S_-)$$ algorithm
 * 4.2.2 The $$(S_-)$$ algorithms
 * 4.2.2.1 The $$(S_-, P_+)$$ algorithms
 * 4.2.2.2 The $$(S_-, P_-)$$ algorithm
 * 4.3 The Aho-Corasick algorithms
 * 4.3.1 Algorithm detail $$(AC)$$
 * 4.3.2 Method $$(AC-OPT)$$
 * 4.3.3 A Moore machine approach to the $$(AC-OPT)$$ algorithm
 * 4.3.4 Linear search
 * 4.3.5 The Aho-Corasick failure function algorithm
 * 4.3.6 The Knuth-Morris-Pratt algorithm
 * 4.3.6.1 Adding indices
 * 4.3.7 An alternative derivation of Moore machine $$M_0$$
 * 4.4 The Commentz-Walter algorithms
 * 4.4.1 Safe shift distances and predicate weakening
 * 4.4.1.1 General weakening strategies
 * 4.4.1.2 The $$l = \varepsilon$$ and the no-lookahead cases
 * 4.4.1.2.1 The no-lookahead shift function
 * 4.4.2 A shift function without further weakening
 * 4.4.3 Towards the CW and BM algorithms
 * 4.4.4 A more easily precomputed shift function
 * 4.4.5 The standard Commentz-Walter algorithm
 * 4.4.6 A derivation of the Boyer-Moore algorithm
 * 4.4.7 A weakened Boyer-Moore algorithm
 * 4.4.8 Using the right lookahead symbol
 * 4.5 The Boyer-Moore family of algorithms
 * 4.5.1 Larger shifts without using match information
 * 4.5.2 Making use of match information
 * 4.6 Conclusions


 * 5 A new RE pattern matching algorithm
 * 5.1 Introduction
 * 5.2 Problem specification and a simple first algorithm
 * 5.2.1 A more practical algorithm using a finite automaton
 * 5.3 Greater shift distances
 * 5.3.1 A more efficient algorithm by computing a greater shift
 * 5.3.2 Deriving a practical range predicate
 * 5.4 Precomputation
 * 5.4.1 Characterising the domains of functions $$d_1$$ and $$d_2$$
 * 5.4.2 Precomputing function $$t$$
 * 5.4.3 Precomputing functions $$d_1$$ and $$d_2$$
 * 5.4.4 Precomputing function $$f_r$$
 * 5.4.5 Precomputing sets $$L_q$$
 * 5.4.6 Precomputing function $$emm$$
 * 5.4.7 Precomputing function $$st$$ and languages $$L'$$ and $$suff(L')$$
 * 5.4.8 Precomputing relation $$Reach(M)$$
 * 5.4.9 Combining the precomputation algorithms
 * 5.5 Specializing the pattern matching algorithm
 * 5.6 The performance of the algorithm
 * 5.7 Improving the algorithm
 * 5.8 Conclusions


 * 6 $$FA$$ construction algorithms
 * 6.1 Introduction and related work
 * 6.2 Items and an alternative definition of $$RE$$
 * 6.3 A canonical construction
 * 6.4 $$\varepsilon$$-free constructions
 * 6.4.1 Filters
 * 6.5 Encoding Construction $$(REM-\varepsilon)$$
 * 6.5.1 Using begin-markers
 * 6.5.2 Composing the subset construction
 * 6.6 An encoding using derivatives
 * 6.7 The dual constructions
 * 6.8 Precomputing the auxiliary sets and Null
 * 6.9 Constructions as imperative programs
 * 6.9.1 The item set constructions
 * 6.9.2 The Symnodes constructions
 * 6.9.3 A dual construction
 * 6.10 Conclusions


 * 7 $$DFA$$ minimisation algorithms
 * 71 Introduction
 * 7.2 An algorithm due to Brzozowski
 * 7.3 Minimization by equivalence of states
 * 7.3.1 The equivalence relation on states
 * 7.3.2 Distinguishability
 * 7.3.3 An upperbound on the number of approximation steps
 * 7.3.4 Characterizing the equivalence classes of $$E$$
 * 7.4 Algorithms computing $$E$$, $$D$$, or $$[Q]_e$$
 * 7.4.1 Computing $$D$$ and $$E$$ by layerwise approximations
 * 7.4.2 Computing $$D$$, $$E$$, and $$[Q]_e$$ by unordered approximation
 * 7.4.3 More efficiently computing $$D$$ and $$E$$ by unordered approximation
 * 7.4.4 An algorithm due to Hopcroft and Ullman
 * 7.4.5 Hopcroft's algorithm to compute $$[Q]_e$$ efficiently
 * 7.4.6 Computing $$(p, q) \in E$$
 * 7.4.7 Computing $$E$$ by approximation from below
 * 7.5 Conclusions


 * III The implementations


 * 8 Designing and implementing class libraries
 * 8.1 Motivations for writing class libraries
 * 8.2 Code sharing
 * 8.2.1 Base classes versus templates
 * 8.2.2 Composition versus protected inheritance
 * 8.3 Coding conventions and performance issues
 * 8.3.1 Performance tuning
 * 8.4 Presentation conventions


 * 9 SPARE Parts: String PAttern REcognition in C++
 * 9.1 Introduction and related work
 * 9.2 Using the toolkit
 * 9.2.1 Multi-threaded pattern matching
 * 9.2.2 Alternative alphabets
 * 9.3 Abstract pattern matchers
 * 9.4 Concrete pattern matchers
 * 9.4.1 The brute-force pattern matchers
 * 9.4.2 The KMP pattern matcher
 * 9.4.3 The AC pattern matchers
 * 9.4.3.1 AC transition machines and auxiliary classes
 * 9.4.4 The CW pattern matchers
 * 9.4.4.1 Safe shifters and auxiliary functions
 * 9.4.5 The BM pattern matchers
 * 9.4.5.1 Safe shifters and auxiliary functions
 * 9.4.6 Summary of user classes
 * 9.5 Foundation classes
 * 9.5.1 Miscellaneous
 * 9.5.2 Arrays, sets, and maps
 * 9.5.3 Tries and failure functions
 * 9.6 Experiences and conclusions
 * 9.7 Obtaining and compiling the toolkit


 * 10 FIRE Lite: $$FA$$s and $$RE$$s in C++
 * 10.1 Introduction and related work
 * 10.1.1 Related toolkits
 * 10.1.2 Advantages and characteristics of FIRE Lite
 * 10.1.3 Future directions for the toolkit
 * 10.1.4 Reading this chapter
 * 10.2 Using the toolkit
 * 10.3 The stricture of FIRE Lite
 * 10.4 $$RE$$s and abstract $$FA$$s
 * 10.5 Concrete $$FA$$s
 * 10.5.1 Finite automata
 * 10.5.2 $$\varepsilon$$-free finite automata
 * 10.5.3 Deterministic finite automata
 * 10.5.4 Abstract-states classes
 * 10.5.5 Summary of concrete automata classes
 * 10.6 Foundation classes
 * 10.6.1 Character ranges
 * 10.6.2 States, positions, nodes, and maps
 * 10.6.3 Bit vectors and sets
 * 10.6.4 Transitions
 * 10.6.5 Relations
 * 10.7 Experiences and conclusions
 * 10.8 Obtaining and compiling FIRE Lite


 * 11 DFA minimisation algorithms in FIRE Lite
 * 11.1 Introduction
 * 11.2 The algorithms
 * 11.2.1 Brzozowski's algorithm
 * 11.2.2 Equivalence relation algorithms
 * 11.3 Foundation classes
 * 11.4 Conclusions


 * IV The performance of the algorithms


 * 12 Measuring the performance of algorithms


 * 13 The performance of pattern matchers
 * 13.1 Introduction and related work
 * 13.2 The algorithms
 * 13.3 Testing methodology
 * 13.3.1 Test environment
 * 13.3.2 Natural language test data
 * 13.3.3 DNA sequence test data
 * 13.4 results
 * 13.4.1 Performance versus keyword set size
 * 13.4.2 Performance versus minimum keyword length
 * 13.4.3 Single-keywords
 * 13.5 Conclusions and recommendations


 * 14 The performance of FA construction algorithms
 * 14.1 Introduction
 * 14.2 The algorithms
 * 14.3 Testing methodology
 * 14.3.1 Test environment
 * 14.3.2 Generating regular expressions
 * 14.3.3 Generating input strings
 * 14.1 Results
 * 14.4.1 Construction times
 * 14.4.2 Constructed automaton sizes
 * 14.4.3 Single transition performance
 * 14.5 Conclusions and recommendations


 * 15 The performance of DFA minimization algorithms
 * 15.1 Introduction
 * 15.2 The algorithms
 * 15.3 Testing methodology
 * 15.4 Results
 * 15.5 Conclusions and recommendations


 * V Epilogue


 * 16 Conclusions
 * 16.1 General conclusions
 * 16.2 Chapter-specific conclusions
 * 16.3 A personal perspective:


 * 17 Challenges and open problems


 * References


 * Index


 * Summary


 * Samenvatting (Dutch summary)


 * Curriculum Vitae