Book:Bruce W. Watson/Taxonomies and Toolkits of Regular Language Algorithms

From ProofWiki

Jump to navigation Jump to search

Bruce W. Watson: Taxonomies and Toolkits of Regular Language Algorithms

Published $\text {1995}$, Eindhoven University of Technology

ISBN 90-386-0396-7

Subject Matter

Computer Science

Contents

Acknowledgements

I Prologue

1 Introduction

1.1 Problem statement

1.2 Structure of this dissertation

1.3 Intended audience

2 Mathematical preliminaries

2.1 Notations and conventions

2.2 Basic definitions

2.3 Strings and languages

2.4 Regular expressions

2.5 Trees

2.6 Finite automata and Moore machines

2.6.1 Definitions and properties involving $FA$s and $MM$s

2.6.2 Transformations on finite automata

2.6.2.1 Imperative implementations of some transformations

II The taxonomies

3 Constructing taxonomies

4 Keyword pattern matching algorithms

4.1 Introduction and related work

4.2 The problem and some naive solutions

4.2.1 The $(P_+)$ algorithms

4.2.1.1 The $(P_+, S_+)$ algorithm and its improvement

4.2.1.2 The $(P_+, S_-)$ algorithm

4.2.2 The $(S_-)$ algorithms

4.2.2.1 The $(S_-, P_+)$ algorithms

4.2.2.2 The $(S_-, P_-)$ algorithm

4.3 The Aho-Corasick algorithms

4.3.1 Algorithm detail $(AC)$

4.3.2 Method $(AC-OPT)$

4.3.3 A Moore machine approach to the $(AC-OPT)$ algorithm

4.3.4 Linear search

4.3.5 The Aho-Corasick failure function algorithm

4.3.6 The Knuth-Morris-Pratt algorithm

4.3.6.1 Adding indices

4.3.7 An alternative derivation of Moore machine $M_0$

4.4 The Commentz-Walter algorithms

4.4.1 Safe shift distances and predicate weakening

4.4.1.1 General weakening strategies

4.4.1.2 The $l = \varepsilon$ and the no-lookahead cases

4.4.1.2.1 The no-lookahead shift function

4.4.2 A shift function without further weakening

4.4.3 Towards the CW and BM algorithms

4.4.4 A more easily precomputed shift function

4.4.5 The standard Commentz-Walter algorithm

4.4.6 A derivation of the Boyer-Moore algorithm

4.4.7 A weakened Boyer-Moore algorithm

4.4.8 Using the right lookahead symbol

4.5 The Boyer-Moore family of algorithms

4.5.1 Larger shifts without using match information

4.5.2 Making use of match information

4.6 Conclusions

5 A new RE pattern matching algorithm

5.1 Introduction

5.2 Problem specification and a simple first algorithm

5.2.1 A more practical algorithm using a finite automaton

5.3 Greater shift distances

5.3.1 A more efficient algorithm by computing a greater shift

5.3.2 Deriving a practical range predicate

5.4 Precomputation

5.4.1 Characterising the domains of functions $d_1$ and $d_2$

5.4.2 Precomputing function $t$

5.4.3 Precomputing functions $d_1$ and $d_2$

5.4.4 Precomputing function $f_r$

5.4.5 Precomputing sets $L_q$

5.4.6 Precomputing function $emm$

5.4.7 Precomputing function $st$ and languages $L'$ and $suff(L')$

5.4.8 Precomputing relation $Reach(M)$

5.4.9 Combining the precomputation algorithms

5.5 Specializing the pattern matching algorithm

5.6 The performance of the algorithm

5.7 Improving the algorithm

5.8 Conclusions

6 $FA$ construction algorithms

6.1 Introduction and related work

6.2 Items and an alternative definition of $RE$

6.3 A canonical construction

6.4 $\varepsilon$-free constructions

6.4.1 Filters

6.5 Encoding Construction $(REM-\varepsilon)$

6.5.1 Using begin-markers

6.5.2 Composing the subset construction

6.6 An encoding using derivatives

6.7 The dual constructions

6.8 Precomputing the auxiliary sets and Null

6.9 Constructions as imperative programs

6.9.1 The item set constructions

6.9.2 The Symnodes constructions

6.9.3 A dual construction

6.10 Conclusions

7 $DFA$ minimisation algorithms

71 Introduction

7.2 An algorithm due to Brzozowski

7.3 Minimization by equivalence of states

7.3.1 The equivalence relation on states

7.3.2 Distinguishability

7.3.3 An upperbound on the number of approximation steps

7.3.4 Characterizing the equivalence classes of $E$

7.4 Algorithms computing $E$, $D$, or $[Q]_e$

7.4.1 Computing $D$ and $E$ by layerwise approximations

7.4.2 Computing $D$, $E$, and $[Q]_e$ by unordered approximation

7.4.3 More efficiently computing $D$ and $E$ by unordered approximation

7.4.4 An algorithm due to Hopcroft and Ullman

7.4.5 Hopcroft's algorithm to compute $[Q]_e$ efficiently

7.4.6 Computing $(p, q) \in E$

7.4.7 Computing $E$ by approximation from below

7.5 Conclusions

III The implementations

8 Designing and implementing class libraries

8.1 Motivations for writing class libraries

8.2 Code sharing

8.2.1 Base classes versus templates

8.2.2 Composition versus protected inheritance

8.3 Coding conventions and performance issues

8.3.1 Performance tuning

8.4 Presentation conventions

9 SPARE Parts: String PAttern REcognition in C++

9.1 Introduction and related work

9.2 Using the toolkit

9.2.1 Multi-threaded pattern matching

9.2.2 Alternative alphabets

9.3 Abstract pattern matchers

9.4 Concrete pattern matchers

9.4.1 The brute-force pattern matchers

9.4.2 The KMP pattern matcher

9.4.3 The AC pattern matchers

9.4.3.1 AC transition machines and auxiliary classes

9.4.4 The CW pattern matchers

9.4.4.1 Safe shifters and auxiliary functions

9.4.5 The BM pattern matchers

9.4.5.1 Safe shifters and auxiliary functions

9.4.6 Summary of user classes

9.5 Foundation classes

9.5.1 Miscellaneous

9.5.2 Arrays, sets, and maps

9.5.3 Tries and failure functions

9.6 Experiences and conclusions

9.7 Obtaining and compiling the toolkit

10 FIRE Lite: $FA$s and $RE$s in C++

10.1 Introduction and related work

10.1.1 Related toolkits

10.1.2 Advantages and characteristics of FIRE Lite

10.1.3 Future directions for the toolkit

10.1.4 Reading this chapter

10.2 Using the toolkit

10.3 The stricture of FIRE Lite

10.4 $RE$s and abstract $FA$s

10.5 Concrete $FA$s

10.5.1 Finite automata

10.5.2 $\varepsilon$-free finite automata

10.5.3 Deterministic finite automata

10.5.4 Abstract-states classes

10.5.5 Summary of concrete automata classes

10.6 Foundation classes

10.6.1 Character ranges

10.6.2 States, positions, nodes, and maps

10.6.3 Bit vectors and sets

10.6.4 Transitions

10.6.5 Relations

10.7 Experiences and conclusions

10.8 Obtaining and compiling FIRE Lite

11 DFA minimisation algorithms in FIRE Lite

11.1 Introduction

11.2 The algorithms

11.2.1 Brzozowski's algorithm

11.2.2 Equivalence relation algorithms

11.3 Foundation classes

11.4 Conclusions

IV The performance of the algorithms

12 Measuring the performance of algorithms

13 The performance of pattern matchers

13.1 Introduction and related work

13.2 The algorithms

13.3 Testing methodology

13.3.1 Test environment

13.3.2 Natural language test data

13.3.3 DNA sequence test data

13.4 results

13.4.1 Performance versus keyword set size

13.4.2 Performance versus minimum keyword length

13.4.3 Single-keywords

13.5 Conclusions and recommendations

14 The performance of FA construction algorithms

14.1 Introduction

14.2 The algorithms

14.3 Testing methodology

14.3.1 Test environment

14.3.2 Generating regular expressions

14.3.3 Generating input strings

14.1 Results

14.4.1 Construction times

14.4.2 Constructed automaton sizes

14.4.3 Single transition performance

14.5 Conclusions and recommendations

15 The performance of DFA minimization algorithms

15.1 Introduction

15.2 The algorithms

15.3 Testing methodology

15.4 Results

15.5 Conclusions and recommendations

V Epilogue

16 Conclusions

16.1 General conclusions

16.2 Chapter-specific conclusions

16.3 A personal perspective:

17 Challenges and open problems

References

Index

Summary

Samenvatting (Dutch summary)

Curriculum Vitae

Retrieved from "https://proofwiki.org/w/index.php?title=Book:Bruce_W._Watson/Taxonomies_and_Toolkits_of_Regular_Language_Algorithms&oldid=254156"

Books/Computer Science