D-B DM Life Cycle

 

Laila Dybkjær and Niels Ole Bernsen

 

Overall design goal(s): What is the general purpose(s) of the design process?

The original design goal was to build a dialogue manager for the Sundial research systems.
This entailed the following features:

Later, the design goal shifted to making the dialogue manager usable not only in research prototypes but also in commercial products.

Hardware constraints: Were there any a priori constraints on the hardware to be used in the design process?

No; Sun workstations were eventually chosen because of their availability at most sites in Sundial, but the systems also ran on DEC Ultrix and HP workstations. The first German parsers from Siemens ran on HP Explorer Lisp Machines.

Software constraints: Were there any a priori constraints on the software to be used in the design process?

No; Quintus Prolog was likewise chosen on practical grounds.

Customer constraints: Which constraints does the customer (if any) impose on the system/component? Note that customer constraints may overlap with some of the other constraints. In that case, they should only be inserted once, i.e. under one type of constraint.

The dialogue manager was originally developed for the Sundial research prototype systems, which means that there were no customers. In the Access project the customers are insurance companies. Since such companies are not interested in revealing all the internal calculation rules on which they base their offers, the system developers could not have access to the entire task structure. Part of the task model and the inferences therefore had to be made external to the dialogue manager, in contrast to what had been the case in earlier applications of the dialogue manager. They are provided by the insurance database, which means that task goals may also be introduced from outside the task interpretation module.
In other applications, the constraints were mainly on the type of the task interface, in software, protocols, and hardware connections. Currently there is a push towards PC-based solutions.

Other constraints: Were there any other constraints on the design process?

Cf. above  

Design ideas: Did the designers have any particular design ideas which they would try to realise in the design process?

There were two basic ideas underlying the design of the dialogue manager. First, it had to be generic, i.e. had to be able to work in more than one language and across several task domains. Second, it had to be co-operative during interaction with the user.

Designer preferences: Did the designers impose any constraints on the design which were not dictated from elsewhere?

Rapid prototyping was necessary in order to meet the need for intermediate demos.

Design process type: What is the nature of the design process?

Originally the design process type was exploratory research. Later design stages were adaptation and optimisation.

Development process type: How was the system/component developed?

In the Sundial project the dialogue model was developed through Wizard of Oz.

Requirements and design specification documentation: Is one or both of these specifications documented?

There was a first-year deliverable in Sundial called 'Detailed design specification'.

Development process representation: Has the development process itself been explicitly represented in some way? How?

No.

Realism criteria: Will the system/component meet real user needs, will it meet them better, in some sense to be explained (cheaper, more efficiently, faster, other), than known alternatives, is the system/component "just" meant for exploring specific possibilities (explain), other (explain)?

The goal was to

Typically, the applications in which the dialogue manager has been used have been meant to make the performance of a task faster and more efficient.
In the further development and the porting to different applications, deployer criteria were of course central, e.g. performance criteria or financial adequacy criteria.

Functionality criteria: Which functionalities should the system/component have (this entry expands the overall design goals)?

The dialogue manager is a module the main functionalities of which are

Usability criteria: What are the aims in terms of usability?

The component should be language independent, it should be possible to use the component in a variety of applications with a minimum of modifications, and the component should be able to run in different environments.

Organisational aspects: Will the system/component have to fit into some organisation or other, how?

The dialogue manager must be usable for DASA and fit their programs.

Customer(s): Who is the customer for the system/component (if any)?

The component started as part of a research system (Sundial) for which there was no real customer but for which a potential customer could have been a travel agency. In fact British Airways was involved. For the Access system companies with a call centre, e.g. insurance companies, are the (potential) customers. For the dialogue manager as such DASA is the customer.

Users: Who are the intended users of the system/component?

The original dialogue manager was developed for four different languages: German, French, Italian and English. D-B's later use of the dialogue manager has been for German systems. All systems in which the dialogue manager has been used are walk-up-and-use systems, which should be self-explanatory and thus require no particular background of their users.

Developers: How many people took significant part in the development? Did that cause any significant problems, such as time delays, loss of information, other (explain)? Characterise each person who took part in terms of novice/intermediate/expert wrt. developing the system/component in question and in terms of relevant background (e.g., novice phonetician, skilled human factors specialist, intermediate electrical engineer).

Between 5 and 12 people at different sites participated in the development of the Sundial dialogue manager. They had many different backgrounds, such as electrical engineering, philosophy, cognitive science, computer science and linguistics. After Sundial finished, three people at D-B have been involved in working on the dialogue manager, though not full time. Two of them have a background in computer science and linguistics, the third in German philology.

Development time: When was the system developed? What was the actual development time for the system/component (estimated in person/months)? Was that more or less than planned? Why?

The first dialogue manager was developed between 1989 and 1993. Between 40 and 50 person-years were spent on it, which was also what had been planned. Since then there has been ongoing work on improving the dialogue manager and using it in other applications; about 4-5 person-years have been spent on it after 1993. It typically takes 4-6 person-months of dialogue work to adapt the dialogue manager to a new application (and 3 person-months for the grammar).

Requirements and design specification evaluation: Were the requirements and/or design specifications themselves subjected to evaluation in some way, prior to system/ component implementation? If so, how?

No. The closest to this would be the reviewers' comments on the plans in the Sundial project. However, these are not publicly available. In later projects there has been no requirements and design specification evaluation for the dialogue manager, but there has been for the application as such. Of course, the dialogue manager has to be able to manage the appropriate dialogue structures.

Evaluation criteria: Which quantitative and qualitative performance measures should the system/component satisfy?

Such criteria did not exist when the Sundial project started. They were invented when there was a need, i.e. when the system/component was going to be evaluated. In the Access project there are very few evaluation metrics, and they are basically all related to costs. The Call Center manager (in the ACCeSS-1 insurance application) has a figure of Cost-Per-Contract, i.e. the expenses needed to get one customer to sign a contract. This figure can be measured for both the 'natural' and the automatic version of the dialogue, and it captures all aspects of the dialogue, in that an inadequate system will simply win considerably fewer contracts, i.e. achieve less customer acceptance. So, at the end of the day, you compare the Cost-Per-Contract figures, and this comparison also tells you exactly what the dialogue system may cost, i.e. its value on the market.
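As an illustration, the Cost-Per-Contract comparison can be sketched in a few lines. All figures below are hypothetical and not taken from the ACCeSS project; only the metric itself (total handling cost divided by signed contracts) comes from the text above.

```python
# Sketch of the Cost-Per-Contract (CPC) metric: total handling cost
# divided by the number of signed contracts. All numbers are invented.
def cost_per_contract(total_cost, contracts_signed):
    if contracts_signed == 0:
        raise ValueError("no contracts signed, CPC undefined")
    return total_cost / contracts_signed

# Hypothetical figures: the 'natural' (human agent) version handles
# 1000 calls at 4.00 per call and closes 120 contracts; the automatic
# version costs 0.50 per call but closes only 80 contracts.
cpc_natural = cost_per_contract(1000 * 4.00, 120)    # ~33.33 per contract
cpc_automatic = cost_per_contract(1000 * 0.50, 80)   # 6.25 per contract
print(round(cpc_natural, 2), round(cpc_automatic, 2))
```

The point of the comparison is exactly the one made above: even if the automatic version wins fewer contracts, the per-contract cost may still come out lower.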

Evaluation: At which stages during design and development was the system/component subjected to testing/evaluation? How? Describe the results.

[Describe, one-by-one, the aspects that were evaluated, when, the set-up and the methodologies used, e.g. Wizard of Oz scenario-based, glassbox, blackbox, progress (comparing successive measurements), diagnostic, performance, adequacy, acceptance, field, objective, subjective. Number of subjects/users involved in each test.]

Data collection was done in logfiles. The dialogues were transcribed but not annotated. Thus what was available was the transcribed dialogues, the speech data, and, for the user utterances, the recogniser results, the parser results and the predictions used during processing. There was no automatic information extraction from the data. The dialogues were looked through and transaction successes were counted. When a problem was detected, an attempt was made to identify its cause and find a way to repair it. Problem detection was not done systematically but rather on the fly, by browsing the transcribed dialogues.
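Had one wanted to automate the bookkeeping, the counting step could look like the sketch below. The record format and the success flag are invented for illustration; in the project the counting was done by hand while browsing the transcriptions.

```python
# Sketch: count transaction successes over a set of transcribed
# dialogues. The dialogue records and the 'success' flag are
# hypothetical, not the project's actual logfile format.
def transaction_success_rate(dialogues):
    successes = sum(1 for d in dialogues if d.get("success"))
    return successes, successes / len(dialogues)

dialogues = [
    {"id": 1, "success": True},
    {"id": 2, "success": False},
    {"id": 3, "success": True},
    {"id": 4, "success": True},
]
count, rate = transaction_success_rate(dialogues)
print(count, rate)  # 3 0.75
```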

Test suites were used. They are available as annexes to the Sundial final report. These tests used typed input. Other tests were performed but cannot be replicated since spoken input was used.

Nothing is really stated about the comparability of the test results with those of other components of similar scope.

Data on a further development of the Sundial system have been collected through a series of iterations, cf. Table 1. Phase 1 used read speech for recogniser evaluation and is not included in the table. In phase 2 a high-quality microphone was used and a first corpus of in-house inquiries was collected. Phase 3 switched to telephone quality.

Table 1. Corpus data per collection phase ('-' = not available).

                                  P2    P3   P3+    P4    P5    P6    P7    P8
collection date, started        9306  9311  9312  9401  9404  9408  9409  9501
corpus size (MB)                 127    42    53   133    50    28     -   204
number of dialogues              237    49    77   161    42    35     -   325
number of utterances            1742   585   533  1365   199   303     -  2187
number of words                 6384  1841  1668  4238  1144  1154     -  6773
number of different words        239   191   168   320   196   174     -  1056
number of unknown words           68    25    21   111    77     -     -     -
total duration in sec           3983  1318  1677  4152  1590   902     -  6324
avg length of dialogue in sec    183   220   204   183     -   188     -   179
avg length of utterance in sec   2.4   2.3   3.1   3.0   2.8   3.0     -  2.89
avg words per utterance         3.67  3.16  3.13  3.11  5.75  3.80     -  3.10
microphone/telephone            mic / PABX / PSTN (cells span several phases)
supervised/unsupervised         supervised / selfsup. / unsupervised (spanning)
user skills                     seminaive / expert / naive (spanning)
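Some of the derived figures in Table 1 can be cross-checked from the raw counts: average words per utterance is simply the number of words divided by the number of utterances. The sketch below redoes this for the first three phases; the computed values agree with the table entries up to rounding.

```python
# Cross-check 'avg words per utterance' in Table 1 for phases P2, P3, P3+
# from the raw word and utterance counts given in the table.
words = {"P2": 6384, "P3": 1841, "P3+": 1668}
utterances = {"P2": 1742, "P3": 585, "P3+": 533}

averages = {p: round(words[p] / utterances[p], 2) for p in words}
print(averages)  # values close to the table's 3.67, 3.16, 3.13
```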

Mastery of the development and evaluation process: Of which parts of the process did the team have sufficient mastery in advance? Of which parts didn't it have such mastery?

In the Sundial project there was no mastery of the development and evaluation process in advance since everything was new. In later projects the team has had good mastery of the process.

Problems during development and evaluation: Were there any major problems during development and evaluation? Describe these.

In Sundial there were competing implementations of some of the modules of the dialogue manager. For example, there were two task modules and two dialogue modules, which caused problems as to which module to use. It was also not unproblematic that so many sites were involved: there was a tendency to develop individual methodologies at the individual sites and not always to conform to the same standards.

Development and evaluation process sketch: Please summarise in a couple of pages key points of development and evaluation of the system/component. To be done by the developers.

Requirements were specified, an architecture was outlined, and these were distributed among the sites in the Sundial project. The flow of messages between the individual modules was specified, and a message file was specified to enable each module to interact with dummies. Every three months there was an integration week: representatives from the involved sites would meet for a couple of days to work together, e.g. to address interface problems between the modules. After each integration meeting one person was responsible for distributing the new official versions of the software. Evaluation was mainly done by comparing the current version to the requirements to see what worked and what did not work yet.
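The dummy mechanism can be pictured as follows. The message types and contents are invented for illustration; the actual Sundial message formats are not reproduced here.

```python
# Sketch: test one module against a dummy peer that answers from a
# canned message file, as in the Sundial integration setup. The message
# vocabulary is hypothetical.
CANNED_REPLIES = {
    "parse_request": {"type": "parse_result", "slots": {"dest": "Paris"}},
    "task_query": {"type": "task_answer", "status": "ok"},
}

def dummy_module(message):
    """Stand-in for a not-yet-integrated module: replies from canned data."""
    return CANNED_REPLIES.get(message["type"], {"type": "error"})

reply = dummy_module({"type": "parse_request", "text": "to Paris please"})
print(reply["type"])  # parse_result
```

A real module could thus be exercised end-to-end before its partner modules existed, which is what made the three-monthly integration weeks feasible.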

Component selection/design: Describe the system components and their origins.

The first dialogue manager version was built in the Sundial project. The sites involved were Vocalis, Erlangen University, Daimler-Benz, Surrey University, CSELT, CNET, Cap Gemini and IRISA. The dialogue manager was later further developed in-house by D-B.

Robustness: How robust is the system/component? How has this been measured? What has been done to ensure robustness?

Graceful degradation is used to ensure robust and co-operative dialogue interaction. In the beginning this method caused the system to break down quite often. Now, however, it is very stable, and if it turns out that the system cannot help the user, i.e. if even the lowest level does not help the system to understand the user correctly, the user is told that this is the case and/or redirected to a human operator.
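The graceful-degradation strategy can be sketched as a cascade of ever less demanding understanding levels, ending in a hand-over to a human operator. The level names and functions below are hypothetical, not the D-B implementation.

```python
# Sketch of graceful degradation: try understanding strategies from the
# most to the least demanding level; if none succeeds, hand over to a
# human operator. The strategies here are illustrative stand-ins.
def understand(utterance, strategies):
    for level, strategy in enumerate(strategies):
        result = strategy(utterance)
        if result is not None:
            return level, result
    return None, "redirect_to_operator"

full_parse = lambda u: {"dest": "Paris"} if "to Paris" in u else None
keyword_spotting = lambda u: {"dest": "Paris"} if "Paris" in u else None

level, result = understand("uh Paris maybe", [full_parse, keyword_spotting])
print(level, result)  # 1 {'dest': 'Paris'}
```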

Maintenance: How easy is the system to maintain, cost estimates, etc.

Not much maintenance is needed. There are no guidelines on how to do it. For modifications, customisation and additions, see below.

Portability: How easily can the system/component be ported?

If Quintus Prolog runs on a machine, it is very easy to port the dialogue manager to it. The dialogue manager runs on Windows NT and Unix platforms. A version for SICStus Prolog under Linux is being considered.

Modifications: What is required if the system is to be modified?

The dialogue management model has been extended for multimodal applications where a speech interface is integrated into a direct manipulation environment [McGlashan 1996]. Interpretation of graphical input is based on the same semantic and pragmatic structures required for spoken language, although the structures are less complex to process due to the absence of underspecified input like anaphora and ellipsis. Moreover, the algorithm for realising system dialogue acts has been modified to allow generation of graphical output, and the semantic and interpretative functions have been enhanced to handle 'command and control' utterances. However, the basic principles of the dialogue management model used in speech-only applications remain intact.
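A minimal sketch of such act realisation across output channels is given below. The act structure and channel names are assumptions for illustration, not the system's actual interfaces; the point is only that one dialogue act can drive both spoken and graphical output.

```python
# Sketch: realise a system dialogue act on one or more output channels.
# The act/channel vocabulary is hypothetical.
def realise(act, channels=("speech",)):
    renderings = []
    for channel in channels:
        if channel == "speech":
            renderings.append(("speech", f"say: {act['content']}"))
        elif channel == "graphics":
            renderings.append(("graphics", f"highlight: {act['content']}"))
    return renderings

out = realise({"type": "inform", "content": "flight LH123"},
              channels=("speech", "graphics"))
print(len(out))  # 2
```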

Additions, customisation:

To adapt the system to a new information service, it is sufficient to model the tasks, the application system, the discourse world and the language coverage. Language and task can be changed in the dialogue manager simply by resetting switches that govern the static knowledge bases consulted during dialogue management.
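The switch mechanism might be pictured as in the sketch below. The knowledge-base names and the lookup table are assumptions for illustration; the actual component implements this in Prolog.

```python
# Sketch: language and task 'switches' select which static knowledge
# bases the dialogue manager consults. All names are illustrative.
KNOWLEDGE_BASES = {
    ("german", "flight_info"): "kb_de_flights",
    ("english", "flight_info"): "kb_en_flights",
    ("german", "insurance"): "kb_de_insurance",
}

def configure(language, task):
    kb = KNOWLEDGE_BASES.get((language, task))
    if kb is None:
        raise ValueError(f"no knowledge base for {language}/{task}")
    return {"language": language, "task": task, "kb": kb}

config = configure("german", "insurance")
print(config["kb"])  # kb_de_insurance
```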

Property rights: Describe the property rights situation for the system/component.

Daimler-Benz has the property rights of the dialogue manager.

Documentation of the design process


E.g. specification documents or parts thereof.

References to additional project/system/component documentation


Bayer, Thomas; Heisterkamp, Paul; Mecklenburg, Klaus; Renz, Ingrid; Regel-Brietzmann, Peter; Kaltenmeier, Alfred; Ehrlich, Ute (1995): Natürliche Sprache - ein multimedialer Träger von Information. InfoPort - ein Projekt zur Überbrückung von Medienbrüchen. In: Proceedings of DAGM-95, Bielefeld.

Brietzmann, Astrid; Class, Fritz; Ehrlich, Ute; Heisterkamp, Paul; Kaltenmeier, Alfred; Mecklenburg, Klaus; Regel-Brietzmann, Peter; Hanrieder, Gerhard; Hiltl, Waltraud (1994): Robust speech understanding. In: Proceedings of ICSLP '94, Yokohama.

Hanrieder, Gerhard; Heisterkamp, Paul (1994): Robust analysis and interpretation in speech dialogue. In: Niemann, Heinrich; de Mori, Renato; Hanrieder, Gerhard (Eds.): Progress and prospects of speech research and technology. Proceedings of the CRIM/Forwiss Workshop, 5.-7. September 1994, Munich, Germany. St. Augustin: Infix. (Proceedings in Artificial Intelligence. 1.).

Heisterkamp, Paul; McGlashan, Scott; Youd, Nick (1992): Dialogue semantics for an oral dialogue system. In: Proceedings of ICSLP-92, Banff, Alberta, Canada, 1992.

Heisterkamp, Paul (1993): Ambiguity and uncertainty in spoken dialogue. In: Proceedings of Eurospeech '93, Berlin.

Heisterkamp, Paul (1996): Natural language analysis and generation. Materials of the course held at the 4th European Summer School on Language and Speech Communication - Dialogue Systems. Budapest, Hungary.

Heisterkamp, Paul; McGlashan, Scott (1996): Units of dialogue management: an example. In: Proceedings of ICSLP-96, Philadelphia, Pa.

[Reference to McGlashan 1995 is missing]

McGlashan, Scott; Fraser, Norman M.; Gilbert, G. Nigel; Bilange, Éric; Heisterkamp, Paul; Youd, Nick (1992): Dialogue management for telephone information systems. In: Proceedings of the 3rd Conference on Applied Natural Language Processing, Trento, Italy.

Mecklenburg, Klaus; Hanrieder, Gerhard; Heisterkamp, Paul (1995): A Robust parser for continuous spoken language using PROLOG. In: Proceedings of Natural Language Understanding and Logic Programming 1995, Lisbon, Portugal.

Regel-Brietzmann, Peter et al. (forthcoming): ACCeSS - Automated Call CEnter through Speech understanding System. A description of an advanced application. In: Proceedings of Eurospeech '97, Rhodes.