D-B DM Life Cycle

 

Laila Dybkjær and Niels Ole Bernsen

 

Overall design goal(s): What is the general purpose(s) of the design process?

The original design goal was to build a dialogue manager for the Sundial research systems.
This entailed the following features:

Later, the design goal shifted to making the dialogue manager usable not only in research prototypes but also in commercial products.

Hardware constraints: Were there any a priori constraints on the hardware to be used in the design process?

No; Sun workstations were eventually chosen because of their availability at most sites in Sundial, but the systems also ran on DEC Ultrix and HP workstations. The first German parsers from Siemens ran on HP Explorer Lisp Machines.

Software constraints: Were there any a priori constraints on the software to be used in the design process?

No; Quintus Prolog was likewise chosen on practical grounds.

Customer constraints: Which constraints does the customer (if any) impose on the system/component? Note that customer constraints may overlap with some of the other constraints. In that case, they should only be inserted once, i.e. under one type of constraint.

The dialogue manager was originally developed for the Sundial research prototype systems, which means that there were no customers. In the Access project the customers are insurance companies. Since such companies are not interested in revealing all the internal calculation rules on which they base their offers, the system developers could not have access to the entire task structure. Part of the task model and the inferences therefore had to be made external to the dialogue manager, in contrast to what had been the case in earlier applications of the dialogue manager. They are provided by the insurance database, which means that task goals may also be introduced from outside the task interpretation module.
In other applications, the constraints were mainly on the type of the task interface, in software, protocols, and hardware connections. Currently there is a push towards PC-based solutions.

Other constraints: Were there any other constraints on the design process?

Cf. above  

Design ideas: Did the designers have any particular design ideas which they would try to realise in the design process?

There were two basic ideas underlying the design of the dialogue manager. First, it had to be generic, i.e. had to be able to work in more than one language and across several task domains. Second, it had to be co-operative during interaction with the user.

Designer preferences: Did the designers impose any constraints on the design which were not dictated from elsewhere?

Rapid prototyping was necessary in order to meet the need for intermediate demos.

Design process type: What is the nature of the design process?

Originally the design process type was exploratory research. Later design stages were adaptation and optimisation.

Development process type: How was the system/component developed?

In the Sundial project the dialogue model was developed through Wizard of Oz.

Requirements and design specification documentation: Is one or both of these specifications documented?

There was a first-year deliverable in Sundial called 'Detailed design specification'.

Development process representation: Has the development process itself been explicitly represented in some way? How?

No.

Realism criteria: Will the system/component meet real user needs, will it meet them better, in some sense to be explained (cheaper, more efficiently, faster, other), than known alternatives, is the system/component "just" meant for exploring specific possibilities (explain), other (explain)?

The goal was to

Typically, the applications in which the dialogue manager has been used have been meant to make the performance of a task faster and more efficient.
In the further development and the porting to different applications, deployer criteria were of course central, e.g. performance criteria or financial adequacy criteria.

Functionality criteria: Which functionalities should the system/component have (this entry expands the overall design goals)?

The dialogue manager is a module the main functionalities of which are

Usability criteria: What are the aims in terms of usability?

The component should be language independent, it should be possible to use the component in a variety of applications with a minimum of modifications, and the component should be able to run in different environments.

Organisational aspects: Will the system/component have to fit into some organisation or other, how?

The dialogue manager must be usable for DASA and fit their programs.

Customer(s): Who is the customer for the system/component (if any)?

The component started as part of a research system (Sundial) for which there was no real customer but for which a potential customer could have been a travel agency. In fact British Airways was involved. For the Access system companies with a call centre, e.g. insurance companies, are the (potential) customers. For the dialogue manager as such DASA is the customer.

Users: Who are the intended users of the system/component?

The original dialogue manager was developed for four different languages: German, French, Italian and English. D-B's later use of the dialogue manager has been for German systems. All systems in which the dialogue manager has been used are walk-up-and-use systems, which should be self-explanatory and thus require no particular background of their users.

Developers: How many people took significant part in the development? Did that cause any significant problems, such as time delays, loss of information, other (explain)? Characterise each person who took part in terms of novice/intermediate/expert wrt. developing the system/component in question and in terms of relevant background (e.g., novice phonetician, skilled human factors specialist, intermediate electrical engineer).

Between 5 and 12 people at different sites participated in the development of the Sundial dialogue manager. They had many different backgrounds, such as electrical engineering, philosophy, cognitive science, computer science and linguistics. After Sundial finished, three people at D-B have been involved in working on the dialogue manager, though not full time. Two of them have a background in computer science and linguistics, the third in German philology.

Development time: When was the system developed? What was the actual development time for the system/component (estimated in person/months)? Was that more or less than planned? Why?

The first dialogue manager was developed between 1989 and 1993. Between 40 and 50 person-years were spent on it, which was also what had been planned. Since then there has been ongoing work on improving the dialogue manager and using it in other applications; about 4-5 person-years have been spent on it after 1993. It typically takes 4-6 person-months of dialogue work to adapt the dialogue manager to a new application (and 3 person-months for the grammar).

Requirements and design specification evaluation: Were the requirements and/or design specifications themselves subjected to evaluation in some way, prior to system/ component implementation? If so, how?

No. The closest to this would be the reviewers' comments on the plans in the Sundial project. However, these are not publicly available. In later projects there has been no requirements and design specification evaluation for the dialogue manager, but there has been for the application as such. Of course, the dialogue manager has to be able to manage the appropriate dialogue structures.

Evaluation criteria: Which quantitative and qualitative performance measures should the system/component satisfy?

Such criteria did not exist when the Sundial project started. They were invented when there was a need, i.e. when the system/component was going to be evaluated. In the Access project there are very few evaluation metrics, and they are basically all related to costs. The Call Center manager (in the ACCeSS-1 insurance application) has a figure of Cost-Per-Contract, i.e. the expenses needed to get one customer to sign a contract. This figure can be measured for both the 'natural' and the automatic version of the dialogue, and it captures all aspects of the dialogue, in that an inadequate system will simply win considerably fewer contracts, i.e. achieve less customer acceptance. So, at the end of the day, you compare the Cost-Per-Contract figures, and this comparison also tells you exactly what the dialogue system may cost, i.e. its value on the market.
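As an illustration, the Cost-Per-Contract comparison can be sketched in a few lines. All figures below are hypothetical and not taken from the ACCeSS project; only the metric itself (total handling cost divided by signed contracts) comes from the text above.

```python
# Sketch of the Cost-Per-Contract (CPC) metric: total handling cost
# divided by the number of signed contracts. All numbers are invented.
def cost_per_contract(total_cost, contracts_signed):
    if contracts_signed == 0:
        raise ValueError("no contracts signed, CPC undefined")
    return total_cost / contracts_signed

# Hypothetical figures: the 'natural' (human agent) version handles
# 1000 calls at 4.00 per call and closes 120 contracts; the automatic
# version costs 0.50 per call but closes only 80 contracts.
cpc_natural = cost_per_contract(1000 * 4.00, 120)    # ~33.33 per contract
cpc_automatic = cost_per_contract(1000 * 0.50, 80)   # 6.25 per contract
print(round(cpc_natural, 2), round(cpc_automatic, 2))
```

The point of the comparison is exactly the one made above: even if the automatic version wins fewer contracts, the per-contract cost may still come out lower.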

Evaluation: At which stages during design and development was the system/component subjected to testing/evaluation? How? Describe the results.

[Describe, one-by-one, the aspects that were evaluated, when, the set-up and the methodologies used, e.g. Wizard of Oz scenario-based, glassbox, blackbox, progress (comparing successive measurements), diagnostic, performance, adequacy, acceptance, field, objective, subjective. Number of subjects/users involved in each test.]

Data collection was done in logfiles. The dialogues were transcribed but not annotated. Thus what was available was the transcribed dialogues, the speech data, and, for the user utterances, the recogniser results, the parser results and the predictions used during processing. There was no automatic information extraction from the data. The dialogues were looked through and transaction successes were counted. When a problem was detected, an attempt was made to identify its cause and find a way to repair it. Problem detection was not done systematically but rather on the fly, by browsing the transcribed dialogues.
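Had one wanted to automate the bookkeeping, the counting step could look like the sketch below. The record format and the success flag are invented for illustration; in the project the counting was done by hand while browsing the transcriptions.

```python
# Sketch: count transaction successes over a set of transcribed
# dialogues. The dialogue records and the 'success' flag are
# hypothetical, not the project's actual logfile format.
def transaction_success_rate(dialogues):
    successes = sum(1 for d in dialogues if d.get("success"))
    return successes, successes / len(dialogues)

dialogues = [
    {"id": 1, "success": True},
    {"id": 2, "success": False},
    {"id": 3, "success": True},
    {"id": 4, "success": True},
]
count, rate = transaction_success_rate(dialogues)
print(count, rate)  # 3 0.75
```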

Test suites were used. They are available as annexes to the Sundial final report. These tests used typed input. Other tests were performed but cannot be replicated since spoken input was used.

Nothing is really stated about the comparability of the test results with those of other components of similar scope.

Data on a further development of the Sundial system have been collected through a series of iterations, cf. Table 1. Phase 1 used read speech for recogniser evaluation and is not included in the table. In phase 2 a high-quality microphone was used and a first corpus of in-house inquiries was collected. Phase 3 switched to telephone quality.

Table 1. Corpus data per collection phase ('-' = not available).

                                  P2    P3   P3+    P4    P5    P6    P7    P8
collection date, started        9306  9311  9312  9401  9404  9408  9409  9501
corpus size (MB)                 127    42    53   133    50    28     -   204
number of dialogues              237    49    77   161    42    35     -   325
number of utterances            1742   585   533  1365   199   303     -  2187
number of words                 6384  1841  1668  4238  1144  1154     -  6773
number of different words        239   191   168   320   196   174     -  1056
number of unknown words           68    25    21   111    77     -     -     -
total duration in sec           3983  1318  1677  4152  1590   902     -  6324
avg length of dialogue in sec    183   220   204   183     -   188     -   179
avg length of utterance in sec   2.4   2.3   3.1   3.0   2.8   3.0     -  2.89
avg words per utterance         3.67  3.16  3.13  3.11  5.75  3.80     -  3.10
microphone/telephone            mic / PABX / PSTN (cells span several phases)
supervised/unsupervised         supervised / selfsup. / unsupervised (spanning)
user skills                     seminaive / expert / naive (spanning)
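Some of the derived figures in Table 1 can be cross-checked from the raw counts: average words per utterance is simply the number of words divided by the number of utterances. The sketch below redoes this for the first three phases; the computed values agree with the table entries up to rounding.

```python
# Cross-check 'avg words per utterance' in Table 1 for phases P2, P3, P3+
# from the raw word and utterance counts given in the table.
words = {"P2": 6384, "P3": 1841, "P3+": 1668}
utterances = {"P2": 1742, "P3": 585, "P3+": 533}

averages = {p: round(words[p] / utterances[p], 2) for p in words}
print(averages)  # values close to the table's 3.67, 3.16, 3.13
```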

Mastery of the development and evaluation process: Of which parts of the process did the team have sufficient mastery in advance? Of which parts didn't it have such mastery?

In the Sundial project there was no mastery of the development and evaluation process in advance since everything was new. In later projects the team has had good mastery of the process.

Problems during development and evaluation: Were there any major problems during development and evaluation? Describe these.

In Sundial there were competing implementations of some of the modules of the dialogue manager. For example, there were two task modules and two dialogue modules, which caused problems as to which module to use. It was also not unproblematic that so many sites were involved: there was a tendency to develop individual methodologies at the individual sites and not always to conform to the same standards.

Development and evaluation process sketch: Please summarise in a couple of pages key points of development and evaluation of the system/component. To be done by the developers.

Requirements were specified, an architecture was outlined, and these were distributed among the sites in the Sundial project. The flow of messages between the individual modules was specified, and a message file was specified to enable each module to interact with dummies. Every three months there was an integration week: representatives from the involved sites would meet for a couple of days to work together, e.g. to address interface problems between the modules. After each integration meeting one person was responsible for distributing the new official versions of the software. Evaluation was mainly done by comparing the current version to the requirements to see what worked and what did not work yet.
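The dummy mechanism can be pictured as follows. The message types and contents are invented for illustration; the actual Sundial message formats are not reproduced here.

```python
# Sketch: test one module against a dummy peer that answers from a
# canned message file, as in the Sundial integration setup. The message
# vocabulary is hypothetical.
CANNED_REPLIES = {
    "parse_request": {"type": "parse_result", "slots": {"dest": "Paris"}},
    "task_query": {"type": "task_answer", "status": "ok"},
}

def dummy_module(message):
    """Stand-in for a not-yet-integrated module: replies from canned data."""
    return CANNED_REPLIES.get(message["type"], {"type": "error"})

reply = dummy_module({"type": "parse_request", "text": "to Paris please"})
print(reply["type"])  # parse_result
```

A real module could thus be exercised end-to-end before its partner modules existed, which is what made the three-monthly integration weeks feasible.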

Component selection/design: Describe the system components and their origins.

The first dialogue manager version was built in the Sundial project. The sites involved were Vocalis, Erlangen University, Daimler-Benz, Surrey University, CSELT, CNET, Cap Gemini and IRISA. The dialogue manager was later further developed in-house by D-B.

Robustness: How robust is the system/component? How has this been measured? What has been done to ensure robustness?

Graceful degradation is used to ensure robust and co-operative dialogue interaction. In the beginning this method caused the system to break down quite often. Now, however, it is very stable, and if it turns out that the system cannot help the user, i.e. if even the lowest level does not help the system to understand the user correctly, the user is told that this is the case and/or redirected to a human operator.
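The graceful-degradation strategy can be sketched as a cascade of ever less demanding understanding levels, ending in a hand-over to a human operator. The level names and functions below are hypothetical, not the D-B implementation.

```python
# Sketch of graceful degradation: try understanding strategies from the
# most to the least demanding level; if none succeeds, hand over to a
# human operator. The strategies here are illustrative stand-ins.
def understand(utterance, strategies):
    for level, strategy in enumerate(strategies):
        result = strategy(utterance)
        if result is not None:
            return level, result
    return None, "redirect_to_operator"

full_parse = lambda u: {"dest": "Paris"} if "to Paris" in u else None
keyword_spotting = lambda u: {"dest": "Paris"} if "Paris" in u else None

level, result = understand("uh Paris maybe", [full_parse, keyword_spotting])
print(level, result)  # 1 {'dest': 'Paris'}
```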

Maintenance: How easy is the system to maintain, cost estimates, etc.

Not much maintenance is needed. There are no guidelines on how to do it. For modifications, customisation and additions, see below.

Portability: How easily can the system/component be ported?

If Quintus Prolog runs on a machine, it is very easy to port the dialogue manager to it. The dialogue manager runs on Windows NT and Unix platforms. A version for SICStus Prolog under Linux is being considered.

Modifications: What is required if the system is to be modified?

The dialogue management model has been extended for multimodal applications where a speech interface is integrated into a direct manipulation environment [McGlashan 1996]. Interpretation of graphical input is based on the same semantic and pragmatic structures required for spoken language, although the structures are less complex to process due to the absence of underspecified input like anaphora and ellipsis. Moreover, the algorithm for realising system dialogue acts has been modified to allow generation of graphical output, and the semantic and interpretative functions have been enhanced to handle 'command and control' utterances. However, the basic principles of the dialogue management model used in speech-only applications remain intact.
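A minimal sketch of such act realisation across output channels is given below. The act structure and channel names are assumptions for illustration, not the system's actual interfaces; the point is only that one dialogue act can drive both spoken and graphical output.

```python
# Sketch: realise a system dialogue act on one or more output channels.
# The act/channel vocabulary is hypothetical.
def realise(act, channels=("speech",)):
    renderings = []
    for channel in channels:
        if channel == "speech":
            renderings.append(("speech", f"say: {act['content']}"))
        elif channel == "graphics":
            renderings.append(("graphics", f"highlight: {act['content']}"))
    return renderings

out = realise({"type": "inform", "content": "flight LH123"},
              channels=("speech", "graphics"))
print(len(out))  # 2
```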

Additions, customisation:

To adapt the system to a new information service, it is sufficient to model the tasks, the application system, the discourse world and the language coverage. Language and task can be changed in the dialogue manager simply by resetting switches that govern the static knowledge bases consulted during dialogue management.
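The switch mechanism might be pictured as in the sketch below. The knowledge-base names and the lookup table are assumptions for illustration; the actual component implements this in Prolog.

```python
# Sketch: language and task 'switches' select which static knowledge
# bases the dialogue manager consults. All names are illustrative.
KNOWLEDGE_BASES = {
    ("german", "flight_info"): "kb_de_flights",
    ("english", "flight_info"): "kb_en_flights",
    ("german", "insurance"): "kb_de_insurance",
}

def configure(language, task):
    kb = KNOWLEDGE_BASES.get((language, task))
    if kb is None:
        raise ValueError(f"no knowledge base for {language}/{task}")
    return {"language": language, "task": task, "kb": kb}

config = configure("german", "insurance")
print(config["kb"])  # kb_de_insurance
```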

Property rights: Describe the property rights situation for the system/component.

Daimler-Benz has the property rights of the dialogue manager.

Documentation of the design process


E.g. specification documents or parts thereof.

References to additional project/system/component documentation


Bayer, Thomas; Heisterkamp, Paul; Mecklenburg, Klaus; Renz, Ingrid; Regel-Brietzmann, Peter; Kaltenmeier, Alfred; Ehrlich, Ute (1995): Natürliche Sprache - ein multimedialer Träger von Information. InfoPort - ein Projekt zur Überbrückung von Medienbrüchen. In: Proceedings of DAGM-95, Bielefeld.

Brietzmann, Astrid; Class, Fritz; Ehrlich, Ute; Heisterkamp, Paul; Kaltenmeier, Alfred; Mecklenburg, Klaus; Regel-Brietzmann, Peter; Hanrieder, Gerhard; Hiltl, Waltraud (1994): Robust speech understanding. In: Proceedings of ICSLP '94, Yokohama.

Hanrieder, Gerhard; Heisterkamp, Paul (1994): Robust analysis and interpretation in speech dialogue. In: Niemann, Heinrich; de Mori, Renato; Hanrieder, Gerhard (Eds.): Progress and prospects of speech research and technology. Proceedings of the CRIM/Forwiss Workshop, 5.-7. September 1994, Munich, Germany. St. Augustin: Infix. (Proceedings in Artificial Intelligence. 1.).

Heisterkamp, Paul; McGlashan, Scott; Youd, Nick (1992): Dialogue semantics for an oral dialogue system. In: Proceedings of ICSLP-92, Banff, Alberta, Canada, 1992.

Heisterkamp, Paul (1993): Ambiguity and uncertainty in spoken dialogue. In: Proceedings of Eurospeech '93, Berlin.

Heisterkamp, Paul (1996): Natural language analysis and generation. Materials of the course held at the 4th European Summer School on Language and Speech Communication - Dialogue Systems. Budapest, Hungary.

Heisterkamp, Paul; McGlashan, Scott (1996): Units of dialogue management: an example. In: Proceedings of ICSLP-96, Philadelphia, Pa.

[Reference to McGlashan 1995 is missing]

McGlashan, Scott; Fraser, Norman M.; Gilbert, G. Nigel; Bilange, Éric; Heisterkamp, Paul; Youd, Nick (1992): Dialogue management for telephone information systems. In: Proceedings of the 3rd Conference on Applied Natural Language Processing, Trento, Italy.

Mecklenburg, Klaus; Hanrieder, Gerhard; Heisterkamp, Paul (1995): A Robust parser for continuous spoken language using PROLOG. In: Proceedings of Natural Language Understanding and Logic Programming 1995, Lisbon, Portugal.

Regel-Brietzmann, Peter et al. (forthcoming): ACCeSS - Automated Call CEnter through Speech understanding System. A description of an advanced application. In: Proceedings of Eurospeech '97, Rhodes.