next up previous contents
Next: Representation of item sets Up: Implementation Previous: Implementation

Representation of grammar and lexical entries

tex2html_wrap_inline12225 is used to represent feature structures. Feature structures are represented in a string-like notation where variables are prefixed with the sign %. Such string notation is then transformed by a special tex2html_wrap_inline12225 reader in some internal Lisp structures.

Definite clause specifications are then described in Lisp-list notation, where a definite clause is represented as a list, where the first element represents the head and the remaining elements represent the body of a clause. Each relation is represented as a list, where the first element represents the relational atom and the remaining elements represents the arguments of the relations directly specified as feature structures in string notation. The following structure shows the Lisp representation of rule (r tex2html_wrap_inline12179 ):

((sign "[(cat vp)
         (sc %Tail)
         (sem %Sem)
         (lex no)
         (v2 %V)
         (string [(d-l %p0)
                  (e-l %p)])
         (deriv [(name vp-sc)
                 (dtrs [(first %1)
                        (rest [(first %2)
                                (rest end)])])])]")

     (sign "%Arg=[(sc end)
                  (v2 no)
                  (string [(d-l %p0)
                           (e-l %p1)])
                  (deriv %1)]")

     (sign "[(cat vp)
             (sc [(first %Arg)
                  (rest %Tail)])   
             (sem %Sem)
             (v2 %V)
             (string [(d-l %p1)
                      (e-l %p)])
             (deriv %2)]"))

Grammar and lexical rules are transformed in some internal representation, where the grammar and lexicon are stored as a hash table whose key values are lists of possible entries. For grammar rules the relation name of the head of a clause and its arity is used to construct a symbol which is used as the key in the hash table. Thus, the above rule would be inserted under the key SIGN/1. However, storing of grammar rules in a hash table in that way only makes sense if we use a set of different relational symbols. For the grammar in appendix A however we only used the relation name SIGN. In order to retrieve grammar rules efficiently, we therefore use the value of the feature CAT for determining the hash key. The effect is that all grammar rules with same category value are stored in a list under the same key. Our implementation can be parameterized with respect to this issue.

For lexical rules we actually use two different hash tables, where both point to the same set of lexical entries (i.e., the same set of entries can be retrieved from two different directions). For parsing we use the first string element (which can be accessed via the path tex2html_wrap_inline12235 , and for the generation we are using the predicate name (via the path tex2html_wrap_inline12237 ).

Using these data structures, scanning is performed in two steps. First, for the selected element of an active item, a possible key is determined which is either found under the path tex2html_wrap_inline12235 or tex2html_wrap_inline12237 . If such a key can be determined the corresponding entries in the lexicon are retrieved and a list of possible candidates are returned (which can be the empty list, if no entries exist for this current key). Each candidate is then unified with the constraints of the selected element. If the constraints of the selected element do not specify any information about a string or a semantic information, no key can be determined. If this happens, the scanner does not apply. This is also the case if no lexical entry for a key can be determined.

In order to syntactically distinguish between definite clauses, that represent grammar rules and lexical rules, we prefix each clause either with the special grammar rule symbol ``;SPMlt;-'' or with the lexical entry symbol ``;SPMlt;;SPMlt;-''. Thus the above grammar rule would be rewritten as

(;SPMlt;- (sign ``...'') (sign `...'') (sign ``...''))

where as a lexical entry would be written as

(;SPMlt;;SPMlt;- (sign ``...''))

Both symbols are actually macros that call the respective functions.


next up previous contents
Next: Representation of item sets Up: Implementation Previous: Implementation

Guenter Neumann
Mon Oct 5 14:01:36 MET DST 1998