SEL292: COMPUTATIONAL LINGUISTICS

Lecture 10

Theory

Introduction to pushdown automata

The finite state machine is the simplest kind of computational architecture. But, as we have just seen, there are languages which no FSA can define. The standard example of such a language is a^n b^n, that is, the language consisting of strings where some number of a's is followed by exactly the same number of b's: ab, aabb, aaabbb and so on. This limitation is not merely theoretical for the computational linguist who wants to design an NLP system: natural languages include the a^n b^n structure. An example from English is centre-embedding sentences like
where the number of noun phrases and the number of verb phrases must be the same. FSA computational architecture cannot, therefore, be used for general NLP work. A more appropriate architecture is proposed in what follows: the pushdown automaton, or PDA.

A pushdown automaton is essentially an FSA with some extra features added to make it more powerful as a language-defining mechanism, and in particular to overcome the a^n b^n problem just mentioned. Since we have already studied FSAs, and since PDAs are just uprated FSAs, it is probably best to preface a formal definition of PDAs by an informal discussion of the extra features which these machines have.

Input tape

In the discussion of FSAs no mention was made of how an input string is actually presented to the machine. For a PDA, an input medium is specified. It is a tape on which the input string is written. The tape can be thought of as being divided into squares or cells, as below. Initially each cell is blank; the $ symbol is used to indicate this:
The string to be input is written on the tape, starting at the left and putting one symbol in each successive cell. For example, the string abbde appears like this:
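One way to picture the tape is as a sequence of cells holding the input symbols followed by blanks. The following is a minimal Python sketch under that assumption. Python is used here only for illustration, and the names make_tape, read_string and show are hypothetical, not part of the course material.

```python
from itertools import chain, islice, repeat

BLANK = "$"  # symbol marking a blank cell


def make_tape(string):
    """Return the tape as a one-way iterator: the input symbols, then endless blanks."""
    if BLANK in string:
        raise ValueError("the input string must not contain the blank symbol")
    return chain(string, repeat(BLANK))


def read_string(tape):
    """Read cells left to right, one at a time, stopping at the first blank."""
    symbols = []
    for cell in tape:
        if cell == BLANK:
            break
        symbols.append(cell)
    return "".join(symbols)


def show(string, width=8):
    """Return the first `width` cells of the tape as text."""
    return " ".join(islice(make_tape(string), width))


print(show("abbde"))                    # a b b d e $ $ $
print(read_string(make_tape("abbde")))  # abbde
```

Because the tape is an iterator, the sketch can only move forward through the cells, which mirrors a tape that is read from left to right.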
The tape extends as far as one likes to the right, so that it can accommodate any finite-length string. When the machine reads the tape, it does so one symbol at a time, starting at the leftmost cell, and keeps going until it reaches the first blank cell on the right (i.e., the first $), which tells it that the string has been completely read. The tape may be read from left to right only, and any given cell may only be read once.

Stack

Unlike an FSA, a PDA has a memory by means of which it can remember what symbols it has read from the input tape, and what order it read them in. It is this memory which crucially distinguishes the class of PDAs from the class of FSAs. The mechanism which gives a PDA its memory is called a stack: understanding how a stack works is essential to understanding PDAs.

Assume that you are washing dishes in a restaurant. Once you have dried each plate, you put it on top of the existing pile. Every now and then, a waiter needs a plate and takes one from the top; the pile shrinks and grows as you wash and the waiter serves, depending on supply and demand.
The two important things to note are:
The stack in a PDA works just like the pile of plates except that, instead of storing plates, it stores symbols. As the PDA reads symbols, it may decide that it needs to keep some record of having done so, and inserts each successive one on top of the stack: the first symbol goes into an empty stack, the second goes on top of it, the third on top of the second, and so on. Thus, if the machine were to read the string abcd and store each successive symbol on the stack, the following would happen:
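The growth of the stack can be traced with a short Python sketch. Python and the function name push_trace are assumptions made for illustration only. A list stands in for the stack, with the end of the list treated as the top.

```python
def push_trace(symbols):
    """Push each symbol in turn and record the stack contents after each push.

    Each recorded entry lists the stack from bottom (left) to top (right).
    """
    stack = []  # the end of the list is the top of the stack
    trace = []
    for symbol in symbols:
        stack.append(symbol)  # the new symbol goes on top
        trace.append("".join(stack))
    return trace


for step in push_trace("abcd"):
    print(step)
# a
# ab
# abc
# abcd
```

In each printed line the rightmost symbol is the top of the stack, so after the whole string has been read, d is on top and a is at the bottom.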
If these symbols are needed again later (and they typically are), they can be removed one at a time from the top of the stack. As with the top plate in a restaurant stack, only the top symbol of the PDA stack is accessible.

Why the apparently arbitrary restriction that only the top of the stack should be accessible? To function, a PDA needs to be able to remember what symbols it has read from the input tape, and also the order in which it read them. The first half of this memory requirement could be satisfied by simply throwing the symbols already read onto a heap, and letting the machine root through the heap when it needed to know whether or not it had read some particular symbol. But if the order of reading is also crucial, then this approach is clearly inadequate: this is where the restriction that symbols can only be added to or taken from the top of the stack begins to make sense. The machine knows that the first symbol it read is at the bottom of the stack, the second above it, and so on up to the most recently read symbol at the top. The original order of reading is preserved: allowing symbols to be added or removed at any other part of the stack would destroy that ordering.

Controller

The heart of a PDA is a controller which coordinates (i) reading from the tape, and (ii) putting symbols into and taking them out of the stack. This controller is an FSA whose states are restricted to the following types:
What such a controller actually looks like will emerge shortly.
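To see why a tape plus a stack is enough to handle a^n b^n, consider the following Python sketch. It is an illustration only: Python, the function name accepts_anbn and the two informal state labels are assumptions, and the control logic is not the controller defined in this lecture. The idea is to push one symbol for every a read and to pop one for every b, accepting only if the stack is empty when the first blank is reached.

```python
BLANK = "$"  # blank cell marking the end of the input


def accepts_anbn(string):
    """Return True if `string` is n a's followed by exactly n b's, with n >= 1.

    The input is assumed not to contain the blank symbol itself.
    """
    tape = iter(string + BLANK)  # input followed by the first blank cell
    stack = []
    state = "reading_as"  # hypothetical label: still in the block of a's
    for symbol in tape:
        if symbol == BLANK:
            # End of input: every a must have been matched by a b.
            return state == "reading_bs" and not stack
        if state == "reading_as" and symbol == "a":
            stack.append("a")  # remember this a
        elif symbol == "b" and stack:
            stack.pop()  # match this b against a remembered a
            state = "reading_bs"  # no further a's are allowed
        else:
            return False  # wrong symbol, wrong order, or too many b's
    return False


print(accepts_anbn("aabb"))  # True
print(accepts_anbn("aab"))   # False
```

The stack does the counting that an FSA cannot do: however many a's are read, each one leaves a record that a later b must remove.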
We are now in a position to give a formal definition of a PDA. A pushdown automaton consists of six components:
Note that items (1), (5), and (6) of this definition are exactly the same as those for an FSA: this is because a PDA is, at heart, an FSA with tape and stack added. Nothing has yet been said about the stack alphabet. This is irrelevant for the moment, but will be important when the issue of parsing or syntax analysis is addressed later in this module.

Practice

Implementing a stack in Pascal
Assignment

Enter and run the above stack program on a computer.

Reading

Pushdown automata
Pushdown automata and context free languages