Evaluating Multimodal Interfaces


The development of multimodal interface systems is only the beginning of a long journey towards computing applications that serve users’ needs without imposing limitations, whether related to disability or to location. This progress has fuelled the continuous invention of new input modes that can be recognized while users engage in diverse daily activities. The technology is promising, and future multimodal interfaces are expected to be flexible enough to support new input modes as they emerge.


Multimodal interfaces process more than one combined input mode at a time; the modes may include speech, body movement, and touch, among others (Mazor & Cohen, 2000). Integrating these modes gives computing a new means of communication between humans and computers. A multimodal system aims to recognize natural human forms of communication, including speech, gesture, and behavior (Mazor & Cohen, 2000). The success of multimodal interfaces has been enabled by the availability of input and output technology that is increasingly capable of detecting recognition errors and resolving them.

The increasing interest in and demand for multimodal systems has been driven by the steadily growing need for flexible, transparent, and efficient communication systems. Such a system is expected to be both powerful and easy to use so that it can serve any user. It is therefore expected that, even as technology develops globally, these applications will not be difficult to use.

Literature Review


In 1980, Bolt presented a demonstration system that could process speech and touch at the same time; it became known as the “Put that there” system. In this system, users faced a projection referred to as “Data land”, which was set up in a media room (Bers et al., 1998). The users would speak and touch a pad, which moved objects on the screen, so that the pointing action supplied the location that the spoken command referred to.

The earliest multimodal interfaces combined speech with keyboard and mouse input (Bers et al., 1998); they aimed to support the use of natural language in order to increase expressive power. Multimodal interfaces matured in the early 1990s, when speech recognition improved and more input modes, such as visual input, were added.

During the early period of multimodal interface development, tourists and militia could speak and then use the mouse to find certain locations on a map (Kalmer & Blaner, 2008). For instance, using the Georal system, tourists could locate a place of interest by speaking and pointing at the map on the screen. The feedback arrived in the form of a diagram showing the location or a text describing it. Since then, multimodal interface technology has moved from speech combined with touch-pad pointing to systems that accept parallel input from more than one modality and can convey rich information (Kalmer & Blaner, 2008).

The invention of systems such as Bolt’s formed a base for intensive innovation in the use of several modalities, which would later enhance human-computer interaction. Multimodal interfaces have developed substantially over the past decades, mainly through improvements in the robustness and transparency of the systems. The advanced technology employed in designing hardware and software is the primary reason why the integration of several input modalities has been successful (Kalmer & Blaner, 2008).

Human-computer interaction (HCI)

The field of human-computer interaction integrates knowledge of technology with findings from cognitive science about the limitations of human communication (McGrawn & Summerth, 2005). The principal goal of the field is to establish interaction paradigms and design principles that enhance communication between humans and computers. Human-human communication requires interpretation when it involves audio-visual signals; for this reason many technologies, such as unimodal techniques, have been developed to help simplify the process (McGrawn & Summerth, 2005).

Growth in computer technology in particular has enabled communication between multiple users through remarkable improvements in processor speed, storage, and memory. These improvements have been complemented by the availability of devices, such as phones, that are making advanced computing a reality.

The advantages of multimodal interfaces are that they help to prevent errors, make the interface robust, aid in correcting and recovering from errors, increase the bandwidth of communication, and offer alternative forms of communication in diverse environments (McGrawn & Summerth, 2005).

How Multimodal Interfaces Serve Diverse Users in Diverse Contexts

Multimodal interfaces have been developed because of their potential to serve diverse users and to perform in areas where no form of computing had previously been introduced. People differ in their communication abilities and preferences, and therefore in the communication mode they choose.

A multimodal interface serves people by presenting a single interface through which diverse users can operate computer devices. It therefore enables even people with motor impairment, cognitive problems, temporary illness, or a low skill level to communicate effectively (Klinger, 2001), because it offers users a variety of choices in how to use a device when communicating. For instance, an individual with a visual impairment may prefer speaking to typing, as may a person with an injured arm.

Multimodal interfaces have also enabled the combination of different input modes, between which users can alternate at any time. Such flexibility has helped prevent physical harm to users (Kumar & Cohen, 2000). For example, using a keyboard or mouse for a long period of time can strain the forearm, and prolonged speaking can strain the vocal cords. The flexibility of a multimodal system is of great help to users since it allows them to switch to whichever modality best suits the task or the environment they are working in at a given time.

High Performance Paired with Robustness

Multimodal interfaces have improved the stability and robustness of recognition systems by reducing recognition errors. A multimodal interface allows two different input modes to be used at the same time. Because parallel use of diverse input modes can produce ambiguity, multimodal architectures are designed so that one mode can support the correction of the other while both are in use (Kumar & Cohen, 2000). For instance, if a user says “ditches” but the speech recognizer prefers the singular form “ditch”, the information from the accompanying input mode can help the system recover the intended word (Klinger, 2001).
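The way one mode can correct the other may be easier to see in a small sketch. The Python example below is only an illustration under stated assumptions, not the architecture described by the cited authors: it assumes that each recognizer returns a ranked list of hypotheses with confidence scores, that the second mode is pen input, and that the system keeps only pairs of hypotheses that agree on whether one or several objects are meant. The names Hypothesis and fuse, and all the scores, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    label: str    # what the recognizer believes it observed
    plural: bool  # whether this reading refers to more than one object
    score: float  # recognizer confidence between 0 and 1 (assumed scale)


def fuse(speech_nbest, gesture_nbest):
    """Return the highest-scoring (speech, gesture) pair that agrees on number.

    Returns None when no compatible pair exists.
    """
    best_pair = None
    best_score = 0.0
    for s in speech_nbest:
        for g in gesture_nbest:
            if s.plural != g.plural:
                continue  # the two modes contradict each other; discard
            joint = s.score * g.score  # simple product of confidences
            if joint > best_score:
                best_pair, best_score = (s, g), joint
    return best_pair


# Hypothetical scores: on its own, speech slightly prefers the wrong word.
speech = [
    Hypothesis("ditch", plural=False, score=0.55),
    Hypothesis("ditches", plural=True, score=0.45),
]
# The pen input strongly suggests that several marks were drawn.
gesture = [
    Hypothesis("three strokes", plural=True, score=0.90),
    Hypothesis("one stroke", plural=False, score=0.10),
]

best = fuse(speech, gesture)
print(best[0].label)  # the spoken word chosen after fusion
```

With these made-up scores, the speech recognizer alone would choose “ditch”, but the jointly most plausible pair contains “ditches”, so the second mode pulls the interpretation back to the plural form. Multiplying the two confidences is one simple design choice among many; a real system could weight the modes differently, for example by trusting speech less in a noisy place.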

Such architectural design has increased the stability of multimodal systems and made them less prone to errors, and thus considerably more effective for communication. The design has also shown that multimodal systems can deliver a level of performance that a unimodal system cannot. For instance, with a unimodal system, children who have not yet developed the ability to communicate clearly, or people who are not fluent in a certain language, may not be able to communicate effectively.

Multimodal systems, however, are of great help to such users since they close the recognition gap caused by such difficulties or correct the resulting errors, thus enabling effective communication. This allows a multimodal interface to cater for a wide range of users and for a wide range of situations, including environments where communication is constantly interrupted, such as a noisy place.

Multimodal Interface Enables Users to Express Themselves in Communication

Because a multimodal system can combine input modes, users are free to manipulate information in a way that allows them to express themselves in communication. People may combine visuals and other multimedia when communicating through multimodal interfaces; in contrast, a user of a unimodal system who relies on the mouse or the keyboard is limited in expressing feelings through visuals (Klinger, 2001).


Multimodal interface provides a sophisticated way of communication characterized by robustness, efficiency and transparency. The technology is attractive since it caters for people with diverse needs and in different environments. In a fast developing technological era, where everyone needs to feel well served, multimodal interface proves that technology can cater for people with diverse needs, thus removing any bias which prevents effective communication.
