Richard McConachy and Ingrid Zukerman

Dialogue Requirements for Argumentation Systems


Overview of interactions

No.  Comment(s)                Answer(s)                                      Continued discussion
1    7.2.00 Ehud Reiter        22.2.00 Ingrid Zukerman
2    24.2.00 Elisabeth André   30.3.00 Richard McConachy and Ingrid Zukerman

C1. Ehud Reiter (7.2.00):

I would be interested in knowing more about how the authors model and reason about persuasion. I've been working on a system which attempts to generate material that helps people stop smoking, and we don't directly argue with people or challenge their beliefs, as we believe that's likely to be counter-productive. Of course, the smoking domain (which is very personal and emotion-laden) may differ from the domain the authors are working in. Anyway, it would be interesting to see more of a discussion of the authors' model of persuasion and why they think it's appropriate.

The paper was also a bit difficult to follow at times; it would be useful to have more examples that illustrate the type of processing the system performs.

I guess I was also slightly concerned that all the texts were made-up ones. Would it be possible to also include some naturally occurring texts, and discuss how well these fit the model?


A1. Ingrid Zukerman (22.2.00):

Paragraph 1
How does NAG model and reason about persuasion? Is directly arguing and maybe challenging a user's beliefs bad?

Response
Our model of reasoning is Bayesian. We maintain two Bayesian networks: one that models the user's beliefs and a normative one that models the system's beliefs. If possible, an argument is generated from nodes and links that appear in both networks, and its effect is assessed by performing Bayesian propagation in both models. Bayesian networks were chosen because their reasoning is normatively correct; in this research we are not interested in swaying the user's opinion through other means, e.g., omitting information or appealing to emotion.

Still, owing to our use of a user model, the system described in the paper tries not to challenge people's presumed beliefs. Two effects of this approach are: (1) the user is less likely to find propositions in the argument that they disagree with, i.e., the argument is less likely to challenge their beliefs; and (2) the user is more inclined to follow and believe the resulting argument. If too few beliefs common to both models can be found in support of an argument, NAG switches to using the normative model only. For instance, this might happen when the user model covers only an insufficient subset of the normative model, so additional propositions (not known to the user) must be used in an argument. In such cases, the goal of being correct is favored over the goal of being persuasive.
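As a rough illustration of the two-network setup described above, here is a minimal sketch in Python. It assumes binary propositions and brute-force enumeration; the node names and probabilities are invented for the dinosaur-extinction example discussed later in this debate, and none of this is NAG's actual knowledge base or inference code.

  from itertools import product

  # A network maps each node to (parents, cpt), where the CPT maps a tuple
  # of parent truth values to P(node = True | parents).
  NORMATIVE = {
      "crater_found":    ((), {(): 0.9}),
      "iridium_layer":   ((), {(): 0.8}),
      "asteroid_impact": (("crater_found", "iridium_layer"),
                          {(True, True): 0.95, (True, False): 0.7,
                           (False, True): 0.6, (False, False): 0.1}),
  }

  # The user model covers only a subset of the normative propositions,
  # with its own (presumed) prior beliefs.
  USER = {
      "iridium_layer":   ((), {(): 0.5}),
      "asteroid_impact": (("iridium_layer",),
                          {(True,): 0.7, (False,): 0.2}),
  }

  def joint(net, assignment):
      """Probability of one full truth assignment under the network."""
      p = 1.0
      for node, (parents, cpt) in net.items():
          p_true = cpt[tuple(assignment[q] for q in parents)]
          p *= p_true if assignment[node] else 1.0 - p_true
      return p

  def belief(net, query, evidence):
      """P(query = True | evidence), by brute-force enumeration."""
      nodes = list(net)
      numerator = denominator = 0.0
      for values in product([False, True], repeat=len(nodes)):
          a = dict(zip(nodes, values))
          if any(a[n] != v for n, v in evidence.items()):
              continue
          p = joint(net, a)
          denominator += p
          if a[query]:
              numerator += p
      return numerator / denominator

  # Assess a candidate argument step (presenting the iridium evidence) by
  # propagating it in both models; an argument built from nodes shared by
  # both networks can be evaluated in each.
  print("shared propositions:", set(NORMATIVE) & set(USER))
  for name, net in [("normative", NORMATIVE), ("user", USER)]:
      before = belief(net, "asteroid_impact", {})
      after = belief(net, "asteroid_impact", {"iridium_layer": True})
      print(f"{name}: {before:.2f} -> {after:.2f}")

Running this prints the goal belief before and after the evidence is presented in each model, which is the kind of comparison used to judge whether an argument is both correct and persuasive.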

It is important to keep in mind that the system described in our paper does not engage in a discussion with the user. It is an exploratory system which presents an initial argument, and then lets the user explore the effect of different situations and beliefs on this argument. For example, the user may decide that a belief should not be considered, or s/he may want to force a proposition to take a certain belief value. Since the system does not argue back and forth with the user, our user model is hand-crafted rather than acquired. However, the beliefs in the user model change as a result of an argument: for example, if the user explicitly or implicitly (by not disagreeing) accepts NAG's argument, then at the end of the session the beliefs in the user model are updated to reflect those mentioned in the argument. At present we are working on a system that performs some user model acquisition.

Owing to the focus of our work, there are several issues pertaining to persuasion which we have not addressed. Firstly, as mentioned above, NAG has no rhetorical skills; at best it selects its argument presentation order to try to address phenomena such as the anti-primacy effect. Secondly, NAG has no notion of emotions or subjective value judgments: combining evidence is a purely mechanical process, there are no special subjects to be avoided or taken advantage of, and hence no belief is more or less confrontational than any other.


Paragraph 2
It would be useful to have more examples to help follow the type of processing the system performs.

Response
We shall add a running example in a future revision of the paper to address this problem. Thank you for pointing this out.


Paragraph 3
All the texts are made-up ones. Would it be possible to include some naturally occurring texts and discuss how well these fit the model?

Response
We have some examples that are derived from real scenarios. For instance, one that has appeared in other papers revolves around the Alvarezes' theory of what caused the dinosaurs to become extinct, e.g., [Zukerman et al. 1998]. This theory is nicely summarized in the Dinosaur Extinction Page and described in more detail in Walter Alvarez's book [Alvarez 1997]. The goal we set for NAG is to argue that an asteroid struck the Earth about 65 million years ago. The extinction of the dinosaurs, the discovery of a layer of deposited iridium, and the discovery of what looks to be a large impact crater in the Gulf of Mexico (all dated to about 65 million years ago) constitute real evidence for this hypothesis.

Preliminary comparisons of the arguments produced by NAG with those found in books and other media (e.g., the web site above) yield several differences: NAG does not omit information that undermines its argument (since it includes the ``whole truth'' as it knows it), and NAG cannot use different turns of phrase to artificially strengthen useful links and weaken those damaging its goal. Comparing NAG's arguments to those seen in live media, e.g., political interviews, reveals further `tricks' that NAG cannot perform, such as appealing to an audience or changing the topic rather than responding.

References
[Zukerman et al. 1998] Zukerman, I., McConachy, R., and Korb, K. (1998), Bayesian Reasoning in an Abductive Mechanism for Argument Generation and Analysis. In AAAI98 Proceedings -- the Fifteenth National Conference on Artificial Intelligence, Madison, Wisconsin, pp. 833-838, AAAI Press.

[Alvarez 1997] Alvarez, W. (1997), T. rex and the Crater of Doom, Princeton, New Jersey, Princeton University Press.


C2. Elisabeth André (24.2.00):

Dear Ingrid and Richard,

First of all, thanks a lot for your valuable contributions to the ETAI discussion. I have a couple of comments on your paper:

  1. As far as I can see, the requirements you listed are not specifically tailored to argumentation systems. Some of them are closely related to the maxims of Grice, which represent very general principles of conversation. Just replace "argument" and "argumentation" in your list with more general terms, and you will get requirements that should be met by any dialogue system. You argue that "a system which looks up bus routes is unlikely to require a complex rebuttal-handling capability." I agree that such a system needs different dialogue strategies, but I think all the requirements you mentioned should still be satisfied. I would also like to support Ehud in suggesting that you list a real dialogue (maybe one of those you mentioned in your response to Ehud). Otherwise, the reader might get the impression that your list reflects more or less just your own research interests.
  2. An additional requirement that comes to mind is the consideration of social factors, such as the social status and the personality of the conversational partner, which may have a significant influence on the acceptance of a system. For instance, a system might come up with very persuasive arguments, but the user might not be willing to accept them because the system simply does not find the right tone of interaction.
  3. I'm wondering to what extent human-like qualities should be integrated into an argumentation system, or a dialogue system in general. For instance, you list as a requirement that the system should be able to track the user's focus. However, in the sample dialogue (turn S5) the system at first fails to recognize that the focus is the player's health. From my point of view, this is fine, since it makes the system more human-like. Humans make mistakes. Thus, it is quite natural if computers make mistakes as well - as long as they are making them for plausible reasons. So the question is whether we should rather be aiming at perfect or at human-like dialogue partners.
  4. I would like to see the requirements discussed also from a user's point of view. That is, instead of just explaining how you are trying to satisfy these requirements, you might perform an empirical evaluation and discuss the users' impression of the system. For instance, a dialogue system might not be able to perfectly track the user's focus of attention. But a user might interpret this as an attempt by the system to deviate from the topic because it is missing good arguments at a certain point in the dialogue. User studies might also help to identify requirements for argumentation systems.
  5. I enjoyed reading your paper.

A2. Richard McConachy and Ingrid Zukerman (30.3.00):

Item 1
Paragraph 1
As far as I can see, the requirements you listed are not specifically tailored to argumentation systems. Some of them are closely related to the maxims of Grice, which represent very general principles of conversation. Just replace "argument" and "argumentation" in your list with more general terms, and you will get requirements that should be met by any dialogue system. You argue that "a system which looks up bus routes is unlikely to require a complex rebuttal-handling capability." I agree that such a system needs different dialogue strategies, but I think all the requirements you mentioned should still be satisfied.

Response
We agree that these requirements are indeed general dialogue requirements rather than specific argumentation requirements. The intention in our introduction (which may not have been clear) was to show that argumentation systems are an interesting area of study for dialogue systems because their requirements are at the ``high end'' of those of dialogue systems. That is, many dialogue systems, e.g., a bus-scheduling system, are likely to have simpler requirements, but it is hard to imagine a dialogue system that has more complex requirements than an argumentation system.

Paragraph 2
I would also like to support Ehud in suggesting that you list a real dialogue (maybe one of those you mentioned in your response to Ehud). Otherwise, the reader might get the impression that your list reflects more or less just your own research interests.

Response
Below are several arguments from different domains; a brief analysis of the similarities and differences between these arguments and NAG's follows them.

Investment: Email contribution to newsgroup sci.astro.amateur
The recent spike has no doubt been caused by the March 20th airing of CBS Marketwatch Weekend, where TeraBeam CEO Dan Hesse was interviewed. He made a nice pitch about the future of high speed ISP technology. The show also mentioned that while TeraBeam wasn't a public company (yet), Meade Instruments *was* and they had just inked a new service alliance with TeraBeam. Presto! Guess whose stock got a big boost? ;)

Asteroid Theory: Dinosaur Extinction WWW Page
The first people to suggest the asteroid theory were the team led by Luis and Walter Alvarez. It has been calculated that a chondritic asteroid approximately 10km in diameter would contain enough iridium to account for the iridium spike contained in the clay layer. Since the original discovery of the iridium spike other evidence has come to light to support the asteroid theory. Analysis of the clay layer has revealed the presence of soot within the layer. It is thought that the presence of the soot comes from the very large global fires that would have been the result of the large temperatures caused by an impact. Something else that was found within the clay were quartz crystals that had been physically altered. This alteration only occurs under conditions of extreme temperature and pressure and quartz of this type is known as shocked quartz. Despite all of this evidence many geologists did not believe in this theory and many were saying 'show us the crater'.

In 1990 a scientist called Alan Hildebrand was looking over some old geophysical data that had been recorded by a group of geophysicists searching for oil in the Yucatan region of Mexico. Within the data he found evidence of what could have been an impact site. What he 'found' was a ring structure 180km in diameter which was called Chicxulub. The location of this structure was just off the northwest tip of the Yucatan Peninsula. The crater has been dated (using the 40Ar/39Ar method) as being 65 million years old. The size of the crater is comparable to that which would have been caused by an impacting body with a diameter of roughly 10km.

So we now have some of the proof of the asteroid theory. We know that a chondritic meteorite with a diameter of 10km contains enough iridium to cause a spike. We also know that about 65 million years ago there was an impact of a large object. The big question is what were the results, and how did they affect the dinosaurs.

Chess: The Art of Positional Play, by Samuel Reshevsky (Chapter 4). David McKay Company, Inc. New York (1976).
There are sixty four squares on the chess board. Before the start of the game, each side controls his own first three ranks, but as soon as White makes the first move - say 1. P-K4 - his control of space increases dramatically: his pawn, his KB and his Queen already strike at squares in Black's half of the board. As long as White makes no errors and loses no time, he should continue to command slightly more space than Black. That is why White wins most decisive games and why the first move is an advantage.

Itchy and Scratchy: Collected from three year-11 students after showing them the preamble from the paper.
Student 1: I think that Itchy murdered Scratchy and all the evidence is here.
Itchy and Scratchy hated each other. Itchy's fingerprints were found on the murder weapon and Itchy's ladder with oblong supports, all support this answer.

Student 2: Poochie, because there were cylindrical/circular support marks found outside the window, and that shape ladder was only available to him.

Student 3: Poochie did it.

  1. Poochie had a cylindrical ladder, which works out with the marks outside the window.
  2. Itchy hated Scratchy. So Itchy wouldn't be stupid enough to kill him, because everyone would think he did it.
  3. Itchy wouldn't be stupid enough to leave a gun with his fingerprints in the room.

The content of these arguments is comparable to the content NAG can include in its arguments. The main difference between these arguments and NAG's is in the presentation. The naturally occurring arguments use presentation devices beyond the scope of NAG's capabilities, such as rhetorical questions (e.g., ``Presto! Guess whose stock got a big boost?''); foreshadowing required results (e.g., ``Despite all of this evidence many geologists did not believe in this theory and many were saying `show us the crater' ''); chronological narrative and background information interleaved with the argument (in the Asteroid text); examples and background information interleaved with the argument (in the Chess text); and skipping many steps of the argument (in particular in the Itchy-and-Scratchy and Investment texts).

In addition to the premise-to-goal argumentation strategy, NAG considers strategies such as reductio ad absurdum, inference to the best explanation, and reasoning by cases. However, NAG argues like a Vulcan (cf. Star Trek) rather than like a human: its arguments are normatively correct and precise, and lack the flair seen in the above examples. We believe that the first order of business is to produce correct and dispassionate arguments, and only then to manipulate the presentation of these arguments to increase their persuasiveness. The initial (dispassionate) arguments will then give us a benchmark against which we can compare the effect of varying different presentation parameters.


Item 2
An additional requirement that comes to mind is the consideration of social factors, such as the social status and the personality of the conversational partner, which may have a significant influence on the acceptance of a system. For instance, a system might come up with very persuasive arguments, but the user might not be willing to accept them because the system simply does not find the right tone of interaction.

Response
Part of this question is answered in our response to Item 1 above. At present we are focusing on the generation of arguments that persuade by means of their content rather than their style. Hovy (1990) and Marcu (1996) studied the effect of social factors on parameters that affect argument style, e.g., lexical choice, sentence type and conciseness. However, they did not evaluate the persuasiveness of the resulting arguments.


Item 3
I'm wondering to what extent human-like qualities should be integrated into an argumentation system, or a dialogue system in general. For instance, you list as a requirement that the system should be able to track the user's focus. However, in the sample dialogue (turn S5) the system at first fails to recognize that the focus is the player's health. From my point of view, this is fine, since it makes the system more human-like. Humans make mistakes. Thus, it is quite natural if computers make mistakes as well - as long as they are making them for plausible reasons. So the question is whether we should rather be aiming at perfect or at human-like dialogue partners.

Response
Focus tracking is a human activity. Endowing a system with a focus-tracking capability doesn't guarantee perfect (i.e., super-human) behaviour; rather, it aims to incorporate a human ability into the system's dialogue-processing capabilities. In the given example, the focus-tracking capability leads to the selection of a plausible interpretation, which turns out to be the ``wrong'' one (i.e., not the one intended by the user). We expect that in general people wouldn't be able to do better (some may ask a clarification question if uncertain about the intended interpretation).

So, to answer your question: we have nothing against perfect behaviour. It would be great if machines could always understand what their users mean (in that case we could use machines to mediate between people :-). But achieving human-like behaviour (in particular in a dialogue context) would be just fine. Focus tracking is one more tool to help achieve human-like behaviour. It is unclear whether the path to perfect behaviour passes through ``human-like'' behaviour or is altogether different.
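To make the point about plausible-but-unintended interpretations concrete, here is a toy Python sketch of one standard focus-tracking heuristic: rank candidate interpretations by the recency of their topic on a focus stack. The topics and readings below are invented for illustration; this is not NAG's actual dialogue or mechanism.

  def rank_interpretations(candidates, focus_stack):
      """Order (reading, topic) pairs by topic recency; topics that are
      not on the stack rank last."""
      def salience(topic):
          return focus_stack.index(topic) if topic in focus_stack else -1
      return sorted(candidates, key=lambda c: salience(c[1]), reverse=True)

  # If the dialogue has just been about the player's fitness, an ambiguous
  # utterance is read as being about fitness even if the user meant the
  # player's health -- a mistake made for a plausible reason, as in turn S5.
  stack = ["team selection", "player fitness"]
  candidates = [("the player is unwell", "player health"),
                ("the player is unfit", "player fitness")]
  print(rank_interpretations(candidates, stack)[0])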


Item 4
I would like to see the requirements discussed also from a user's point of view. That is, instead of just explaining how you are trying to satisfy these requirements, you might perform an empirical evaluation and discuss the users' impression of the system. For instance, a dialogue system might not be able to perfectly track the user's focus of attention. But a user might interpret this as an attempt by the system to deviate from the topic because it is missing good arguments at a certain point in the dialogue. User studies might also help to identify requirements for argumentation systems.

Response
On the one hand, NAG has limited interactive capabilities; on the other, it offers a new way of interacting with a system, i.e., via exploratory queries, such as ``what if'' questions or requests for the exclusion or inclusion of certain propositions in an argument. This exploratory mode is ``unnatural'', in the sense that it doesn't model a human-human dialogue. In addition, it gives users the initiative and control both of the interaction and of the system's activities. Thus, in this style of interaction, there is little opportunity for a system to misinterpret a user's intent. We speculate that having such control may encourage a user to perform a more detailed examination of an argument than the examination afforded by a ``normal'' dialogue. In this context, an empirical study could compare users' reactions to a normal dialogue with their reactions to an exploratory dialogue. We do not have a normal dialogue system (and such a system is not likely to be available in the near future), so at best we could ask people what they think of the different available capabilities.
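One plausible way to read these exploratory queries, reusing the toy belief and NORMATIVE definitions from the sketch in our earlier answer (A1), is as operations on an evidence set: a ``what if'' question clamps a proposition to a chosen value, and an exclusion request removes it before beliefs are re-propagated. The command names below are illustrative, not NAG's actual interface.

  def explore(net, goal, command, evidence):
      """Apply one exploratory query and report the resulting goal belief."""
      if command[0] == "what_if":        # force a proposition to a value
          _, prop, value = command
          evidence = {**evidence, prop: value}
      elif command[0] == "ignore":       # drop a previously set proposition
          _, prop = command
          evidence = {k: v for k, v in evidence.items() if k != prop}
      return evidence, belief(net, goal, evidence)

  evidence = {"iridium_layer": True}
  for cmd in [("what_if", "crater_found", False), ("ignore", "crater_found")]:
      evidence, b = explore(NORMATIVE, "asteroid_impact", cmd, evidence)
      print(cmd, f"-> P(asteroid_impact) = {b:.2f}")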

Once the usability of the system has been established, the main factor of interest is whether people will learn from this mode of interaction. One result of our preliminary evaluation was that after an exploratory interaction, there was a clear tendency to shift belief towards the normative values for the more ``difficult'' propositions in the argument. These are propositions whose value was not clear-cut after reading the initial argument (in our example, the difficult propositions are ``Itchy is guilty'' and ``Itchy had the opportunity to murder Scratchy'') [Zukerman et al. 1999].

Another observation from our evaluation was that some users expressed dissatisfaction with the system because of its limited domain knowledge (rather than its other capabilities). This happened when the system was unable to consider options users were interested in, e.g., ``Itchy and Poochie colluded to murder Scratchy'' or ``Poochie framed Itchy''. Some users also found the text a bit repetitive. However, nobody attributed deceptive intentions to the system (people don't think computers are really that smart). Thus, the main hurdle for an open-ended dialogue system (in terms of the domain) pertains to its knowledge representation. In contrast, dialogue systems that engage in simpler interactions, e.g., systems that look up schedules or hotels, don't face this problem.

We are not sure whether this addresses your question. Perhaps you can give additional details of what sort of evaluation you had in mind.


Item 5
I enjoyed reading your paper.

Response
Thanks! :-)


References
Eduard H. Hovy (1990), Pragmatics and Natural Language Generation. Artificial Intelligence, 43(2), pages 153-197.

Daniel Marcu (1996), The conceptual and linguistic facets of persuasive arguments. In Proceedings of ECAI-96 Workshop -- Gaps and Bridges: New Directions in Planning and NLG, Budapest, Hungary, pages 43-46.

Ingrid Zukerman, Richard McConachy, Kevin B. Korb and Deborah A. Pickett (1999), Exploratory Interaction with a Bayesian Argumentation System. In IJCAI99 - Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, Stockholm, Sweden, pages 1294-1299.

