Mark Seligman, Jan Alexandersson and Kristiina Jokinen

Tracking Morphological and Semantic Co-occurrences in Spontaneous Dialogues

[Full Text]
[send contribution]
[debate procedure]
[copyright]


Overview of interactions

No.  Comment(s)                  Answer(s)                        Continued discussion
1    3.4.2000 Lars Ahrenberg     14.4.2000 Mark Seligman et al.

C1. Lars Ahrenberg (3.4.2000)

This paper is a short one with only tentative results. The value of the approach remains to be proved (as the authors admit), but given this I still think there could be more to say about the approach and the use the authors want to make of it.

The paper describes a new set of facilities for tracking lexical co-occurrences. The underlying motivation is that such tracking could enable more accurate speech recognition and lexical disambiguation. Several functions are provided that, given a morph in a discourse segment and a window consisting of that segment and a number of segments to the right, can predict a set of morphs that are likely to occur within that window.

As the most innovative part of the paper is this idea of semantic smoothing it is unfortunate that it is not explained in more detail, nor is it tested in the proof-of-concept evaluation. In fact, it is not clear to the reader whether what is done is smoothing or something else such as inheritance or generalization. As an example, let Q(M1,M21) be 0.2 for two morphs M1 and M2, Q(cat(M1),cat(M2)) be 0.2 and let M2 have 8 category mates, M22-M29. What should be the value of Q(M1,M22) given that M22 does not occur in the corpus? Is anything taken away from Q(M1,M2) in order to take care of the mates, or are they simply assigned the same conditional probability as Q(M1,M21)? If the latter is the case it seems misleading to talk about smoothing.

An objection one might have to the procedure is that there may exist dependencies between morphs of categories with co-occurrence associations. The pairs bus/street and car/street seem straightforward, but what about pairs such as pilot/airplane, driver/airplane, butcher/airplane (assuming pilot, driver and butcher to be in the same semantic category)? And I believe many examples of this kind can be found.

A matter not discussed in the paper is the relation between this approach and semantic priming implementable e.g. via marker passing. I think a comment on this relation would be appropriate, especially as one of the problems the authors will apply the method to is lexical disambiguation. I would also like to encourage you to find a better example than the notorious 'bank'-ambiguity. For one thing, it is not an ambiguity that is likely to occur in the practical dialogue systems we know today.

Finally, I wonder how evidence from several morphs can be combined in determining co-occurrence probabilities. Given that both M0 and M1 are known to occur in a segment (and hence cat(M0) and cat(M1)) is the idea to look for morphs that are supported by both of them or is this just further research?

Lars Ahrenberg


A1. Mark Seligman et al. (14.4.2000):

Lars Ahrenberg's comments:

This paper is a short one with only tentative results. The value of the approach remains to be proved (as the authors admit), but given this I still think there could be more to say about the approach and the use the authors want to make of it.

The authors reply:

We agree, and will say more in future papers.

Lars Ahrenberg's comments:

The paper describes a new set of facilities for tracking lexical co-occurrences. The underlying motivation is that such tracking could enable more accurate speech recognition and lexical disambiguation. Several functions are provided that, given a morph in a discourse segment and a window consisting of that segment and a number of segments to the right, can predict a set of morphs that are likely to occur within that window.

As the most innovative part of the paper is this idea of semantic smoothing it is unfortunate that it is not explained in more detail, nor is it tested in the proof-of-concept evaluation.

The authors reply:

Yes, it is unfortunate. There was simply insufficient time to experiment with semantic smoothing during the preliminary evaluation. This is indeed a weakness of the current paper, and filling this gap will be a main thrust of the current round of study. However, we did hope that a preview description of the semantic smoothing technique would be of interest in a workshop paper, where discussion of work in progress is expected.

Lars Ahrenberg's comments:

In fact, it is not clear to the reader whether what is done is smoothing or something else such as inheritance or generalization. As an example, let Q(M1,M21) be 0.2 for two morphs M1 and M2, Q(cat(M1),cat(M2)) be 0.2 and let M2 have 8 category mates, M22-M29. What should be the value of Q(M1,M22) given that M22 does not occur in the corpus? Is anything taken away from Q(M1,M2) in order to take care of the mates, or are they simply assigned the same conditional probability as Q(M1,M21)? If the latter is the case it seems misleading to talk about smoothing.

The authors reply:

We're a little confused by your indexes, and assume that "M2" and "M21" were intended to be the same morph. In our answer, we'll assume that M21 replaces M2 throughout.

Our procedure is as follows:

Q(cat(M1),cat(M21)) can be used as the "emergency substitute" co-occurrence value for Q(M1,M22) if

  1. the first value is dependably derived from the corpus while the second value is not and
  2. M22 is a semantic category-mate of M21, i.e. a member of cat(M21).
"Dependably derived" means that Q is above a pre-specified threshold.

In other words, one may have insufficient data to judge the co-occurrence probabilities of a particular morph pair, but may still know the semantic categories of the morphs in question, and may further have sufficient data about category co-occurrence, derived with respect to other morphs in the same class. In this case, the known co-occurrence probabilities of the categories can replace the unknown morph probability.
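To make the substitution procedure concrete, here is a minimal sketch in Python. The function and variable names (cooccurrence, morph_q, cat_q, cat_of, THRESHOLD) are ours, invented for illustration, not taken from the authors' implementation; the threshold value is likewise assumed.

```python
from typing import Dict, Optional, Tuple

# THRESHOLD stands in for the pre-specified reliability threshold
# mentioned above; the value here is an arbitrary assumption.
THRESHOLD = 0.05

def cooccurrence(m1: str, m2: str,
                 morph_q: Dict[Tuple[str, str], float],
                 cat_q: Dict[Tuple[str, str], float],
                 cat_of: Dict[str, str]) -> Optional[float]:
    """Return Q(m1, m2); when the morph-level value is missing or below
    the threshold, substitute the category-level value Q(cat(m1), cat(m2))."""
    q = morph_q.get((m1, m2))
    if q is not None and q >= THRESHOLD:
        return q                      # dependably derived from the corpus
    cq = cat_q.get((cat_of[m1], cat_of[m2]))
    if cq is not None and cq >= THRESHOLD:
        return cq                     # the "emergency substitute"
    return None                       # no reliable estimate at either level

# The M1/M21/M22 scenario from the reply: M22 never co-occurs with M1
# in the corpus, so it receives the category-level value instead.
cat_of = {"M1": "C1", "M21": "C2", "M22": "C2"}
morph_q = {("M1", "M21"): 0.2}
cat_q = {("C1", "C2"): 0.25}
```

Note that, as discussed below, the category value substituted for the unseen pair (here 0.25) need not equal the observed morph value (0.2).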

Note that Q(cat(M1),cat(M21)) will not always be equal to Q(M1,M21) as in your assumptions, but rather will in general be greater-than-or-equal, since category-mates of M1 and M21 may contribute to the category co-occurrence probability.

Assuming M23 through M29 were also semantic category-mates of M21 and M22, they would be subject to exactly the same considerations as M22. But note that the "emergency substitute" for any category-mate would be Q(cat(M1),cat(M21)), and not Q(M1, M21).

The cat and morph probabilities are derived independently of each other. While the Q of a semantic category may substitute for a missing or unreliable morph Q, neither score is in any way adjusted to take account of the other one. (It's true that one could estimate the unknown morph co-occurrence value by averaging data for semantic category-mates for which good data has been obtained, but we haven't tried this alternative technique so far.)
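The untried alternative mentioned in the parenthesis could be sketched as follows. Again, all names and numbers are illustrative only; the authors have not implemented this.

```python
def averaged_estimate(m1, m2, morph_q, cat_of, threshold=0.05):
    """Estimate a missing Q(m1, m2) by averaging the reliable morph-level
    values Q(m1, mate) over the semantic category-mates of m2; return
    None if no mate has reliable data."""
    mates = [m for m in cat_of
             if cat_of[m] == cat_of[m2] and m != m2]
    values = [morph_q[(m1, m)] for m in mates
              if morph_q.get((m1, m), 0.0) >= threshold]
    return sum(values) / len(values) if values else None

cat_of = {"M1": "C1", "M21": "C2", "M22": "C2", "M23": "C2"}
morph_q = {("M1", "M21"): 0.2, ("M1", "M23"): 0.4}  # no data for ("M1", "M22")
```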

Now as for terminology:

The term "smoothing" here means "compensation for, or regularization of, sparse morph co-occurrence data through the use of semantic co-occurrence data". We think this usage is consistent with that of e.g. the speech recognition community, where part-of-speech clustering and similar techniques can smooth, or help compensate for, missing word N-gram training data. However, if this terminology proves to be confusing or misleading for many readers, we're certainly prepared to replace it.

"Generalization" would in fact be a good suggestion for a replacement term. We are indeed generalizing by assuming (when necessary) that morphs which are similar in one way will be similar in other ways. Specifically, morphs which are semantic class-mates are guessed to have similar probabilities of co-occurring with morphs outside the semantic class.

It would also be possible to think in terms of inheritance, since morphs lacking a reliable Q with respect to M1 can receive ("inherit"?) their semantic class's Q with respect to cat(M1). However, we're afraid this terminology would be more confusing than helpful.

Lars Ahrenberg's comments:

An objection one might have to the procedure is that there may exist dependencies between morphs of categories with co-occurrence associations. The pairs bus/street and car/street seem straightforward, but what about pairs such as pilot/airplane, driver/airplane, butcher/airplane (assuming pilot, driver and butcher to be in the same semantic category)? And I believe many examples of this kind can be found.

The authors reply:

Any generalization brings the danger of overgeneralization.

But one defense against uncontrolled generalization is our preference for the most specific co-occurrence information available. In a well-articulated semantic hierarchy, "pilot" and "driver" should share a relatively specific sub-class of HUMAN-OCCUPATIONS - call it VEHICLE-OPERATOR. Thus we think generalization from "pilot" to "driver" would actually be a pretty reasonable guess, even if it turned out to be wrong. "Butcher", however, is a more distant cousin - related only via the higher-level HUMAN-OCCUPATIONS - and for just this reason would be a weaker candidate to share co-occurrences with the first two words.

Your comment does point out our dependence on a suitably branching ontology, however.

Lars Ahrenberg's comments:

A matter not discussed in the paper is the relation between this approach and semantic priming implementable e.g. via marker passing. I think a comment on this relation would be appropriate, especially as one of the problems the authors will apply the method to is lexical disambiguation.

The authors reply:

Our anticipated disambiguation does in fact amount to marker passing or spreading activation. One difference from many such approaches is that the networks through which our activation will flow will be corpus-based rather than hand-made. Granted, this feature is not entirely new, as other researchers (e.g. Hinrich Schuetze, Gen-ichiro Kikui) have recently used co-occurrence networks for spreading activation in the service of disambiguation. The main innovation in our approach to spreading activation, then, is the way in which we combine statistical and symbolic resources: activation spreads (paths are traced) through statistically derived networks whose nodes are semantic symbols, obtained either by hand or through distributional analysis.
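A toy sketch of spreading activation over a weighted co-occurrence network may help fix ideas. This is our own minimal illustration, not the algorithm from the paper: the nodes, edge weights, decay factor, and iteration count are all invented for the example.

```python
def spread_activation(graph, seeds, decay=0.5, iterations=2):
    """graph: {node: [(neighbour, weight), ...]}; seeds: {node: activation}.
    On each iteration, every active node passes decay * activation * weight
    to each neighbour, accumulating on top of existing activation."""
    act = dict(seeds)
    for _ in range(iterations):
        new = dict(act)
        for node, a in act.items():
            for nbr, w in graph.get(node, []):
                new[nbr] = new.get(nbr, 0.0) + decay * a * w
        act = new
    return act

# Context words activate the sense they co-occur with.
graph = {
    "room": [("table/FURNITURE", 0.6)],
    "door": [("table/FURNITURE", 0.5)],
    "statistics": [("table/CHART", 0.7)],
}
activation = spread_activation(graph, {"room": 1.0, "door": 1.0})
```

With "room" and "door" as seeds, activation accumulates on the FURNITURE sense while the CHART sense receives none.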

Lars Ahrenberg's comments:

I would also like to encourage you to find a better example than the notorious 'bank'-ambiguity. For one thing, it is not an ambiguity that is likely to occur in the practical dialogue systems we know today.

The authors reply:

Okay. How would you like "table", which should receive its FURNITURE reading in the context of "room", "door", etc. but its CHART reading in the context of words in mathematical, statistical, publishing, and suchlike domains? (On the other hand, a familiar example does help to orient readers quickly, so that we can quickly move on to other matters.)
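As a minimal illustration of how context-driven sense selection might look with co-occurrence scores, consider the sketch below. The words, senses, and values are invented for the example; nothing here comes from the authors' corpus or system.

```python
def pick_sense(context, senses, q):
    """Return the sense whose summed co-occurrence with the context words
    is highest. q maps (context_word, sense) -> co-occurrence value."""
    return max(senses, key=lambda s: sum(q.get((w, s), 0.0) for w in context))

# Hypothetical co-occurrence values for the two readings of "table".
q = {("room", "table/FURNITURE"): 0.4,
     ("door", "table/FURNITURE"): 0.3,
     ("statistics", "table/CHART"): 0.5}
senses = ["table/FURNITURE", "table/CHART"]
```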

Lars Ahrenberg's comments:

Finally, I wonder how evidence from several morphs can be combined in determining co-occurrence probabilities. Given that both M0 and M1 are known to occur in a segment (and hence cat(M0) and cat(M1)) is the idea to look for morphs that are supported by both of them or is this just further research?

The authors reply:

Yes, we definitely want to predict most strongly those morphs which are predicted along multiple independent lines of evidence. So far, however, we haven't tried to develop mathematical formulas for combining evidence. We should probably experiment with several ways.
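One candidate formula, offered purely as an illustration since no combination rule has been settled on, is noisy-OR combination of independent predictors: P = 1 - prod(1 - p_i), which grows as more lines of evidence support a morph.

```python
def combine_evidence(probs):
    """Noisy-OR combination: the probability that a morph occurs given
    several independent predictors, each contributing probability p_i.
    P = 1 - prod(1 - p_i); an empty list of predictors yields 0."""
    remaining = 1.0
    for p in probs:
        remaining *= (1.0 - p)
    return 1.0 - remaining
```

For example, two predictors of strength 0.2 and 0.5 combine to about 0.6, stronger than either alone.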

Warmest thanks for your stimulating comments.

Mark, Jan and Kristiina


Additional questions and answers will be added here.
To contribute, please click [send contribution] above and send your question or comment as an E-mail message.
For additional details, please click [debate procedure] above.
This debate is moderated by the guest editors.