Artificial Intelligence/Search/Exhaustive search/Finite state automata

< Artificial Intelligence < Search < Exhaustive search

Searching with Finite State Automata

A common way of searching for a pattern in a string of symbols is by using Finite State Automata, also known as finite state machines or FSMs. FSMs are abstract structures which can be implemented in a number of different ways, but all of them share some common properties. Each FSM has[1]:

Acceptors

When used for pattern matches within strings, an FSM is given each successive character of the string as its input. For simple substring matching, the FSM would have one state for each number of characters in the substring matched so far.

Finite state automata which simply accept or reject an input string with a yes/no response are called acceptors [2]. Here is a state diagram of an acceptor that returns a "yes" response when a string contains the substring "aba":

The machine starts in the "empty" state, representing an empty substring buffer. The "other" transitions represent inputs of any character besides "a" or "b", so the only input that will get the machine out of the empty state is the first character of the substring to be matched, "a". From then on, inputs are converted into transitions from state to state.

The double circle around "aba" indicates that it is an accept state of this FSM, meaning that the machine will give a "yes" result if it is in this state when it has no more inputs to process. All other states are non-accpting states and would result in a "no" result at the end of processing.

Transducers

An FSM with actions that provide output is instead called a finite state transducer[2]. Here is an example of a transducer written in Ruby which strips C++/Java-style comments out of an input stream:

#!/usr/bin/env ruby

# comment_stripper.rb
# 
# Requires Ruby 1.8.7
# 
# Strips C++/Java style comments from an input stream using a finite state
# transducer. Single-line comments begin with two forward slashes ("//") and
# extend to the end of the line. Block comments begin with a slash followed by an
# asterisk ("/*") and end with an asterisk followed by a slash ("*/").
# 
# When run from the command line, performs the transformation on standard input.
# 
# Example:
# 
# cat hello_world.cpp | ruby comment_stripper.rb

class CommentStripper
  def initialize()
    @current_state = :code
  end
  
  def input(c)
    output = INPUT_ACTIONS[c][@current_state] || ''
    @current_state = TRANSITIONS[c][@current_state] || @current_state
    return output
  end
  
  TRANSITIONS = Hash.new do |hash,input|
    # triggered by any characters other than the ones explicitly defined below
    {
      :slash  => :code,
      :star   => :block
    }
  end.merge(
    # overrides the default transitions defined above
    "/" => {
      :code   => :slash,
      :slash  => :line,
      :star   => :code
    },
    "*" => {
      :slash  => :block,
      :block  => :star
    },
    "\n" => {
      :line   => :code,
      :slash  => :code,
      :star   => :block
    }
  )
  
  INPUT_ACTIONS = Hash.new do |hash,input|
    {
      :code   => input,
      :slash  => ("/" + input)
    }
  end.merge(
    "/" => {},
    "*" => {
      :code   => "*"
    },
    "\n" => {
      :code   => "\n",
      :slash  => "/\n",
      :line   => "\n"
    }
  )
end

if $0 == __FILE__
  cs = CommentStripper.new
  $stdin.each_char do |c|
    print cs.input(c)
  end
end

The transitions and input actions are both defined in nested hash tables keyed by input and state. The input method takes in a single character of the input string, performs any necessary transition to a new state, and returns the transduced character based on the original state. This diagram shows the transitions and states defined in this machine:

References

  1. Wagner, Ferdinand; Ruedi Schmuki, Thomas Wagner, Peter Wolstenholme (2006). Modeling Software with Finite State Machines. CRC Press. pp. 369. ISBN 0849380863.
  2. 1 2 Roche, Emmanuel (1997). Finite-state language processing. MIT Press. pp. 464. ISBN 0262181827.

Further Reading

This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.