Mr. Bengler, how can communication between people and their cars be improved?
Bengler: Communication needs to become clearer and more natural. The car of the future will be able to understand its driver better and better, and communication won’t just be by voice, either. In order to minimize misunderstandings and the need to ask for clarification, the “Man-Machine Interaction” research project is using a small camera to capture gestures and facial expressions. For example, many people subconsciously nod their heads when answering a question in the affirmative and shake their heads slightly when saying no. If the system isn’t totally sure that it has understood the driver’s answer, it can look at his body language to confirm his meaning. Voice recognition can even evaluate the tone of a person’s voice. We are also looking at what the driver does with his hands. In future, he might be able to activate the CD player simply by waving a finger.
The BMW Group’s research is currently focusing on refining voice-control systems so that cars can understand everyday language as well as set commands. What are the major difficulties in this exercise and how can they be resolved?
Bengler: Voice recognition is already very convenient and robust. At the moment, our research is focused on German, although some of the voice recognition systems already in series production can handle several different languages. A few small hitches remain, but these can be reduced by coupling voice recognition with other means of gathering information. For example, gestures can also be interpreted. After all, when people talk to each other, they evaluate gestures and tone as well as listening to what is said.
And how does gesture recognition work in the car?
Bengler: A stereo camera system coupled with infrared lighting picks up hand movements, analyses them and automatically equates them to various standard gestures. The system can distinguish between 17 different gestures. These include simple intuitive movements like waving a hand to the left or right, for example to change between radio stations, as well as more complex gestures to control the navigation system.
A single camera unit evaluates color and shape data to detect head movements. The color of human skin is an excellent distinguishing feature, since it can be clearly identified, even with darker skin tones. This allows the position of the head to be located reliably so that head movements can be recognized. Both recognition systems are based on a method which stores information on the average duration of a driver’s gestures, their symmetry properties and the length of the pauses he leaves before and after each gesture. This data is then compared with standard statistical values.
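As an illustration only, the comparison with standard statistical values could be sketched as follows. The interview does not describe the actual algorithm, so everything here is an assumption: the four-value feature tuple (duration, symmetry, pause before, pause after), the z-score distance, the threshold, the names `GestureStats` and `classify`, and all numeric values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GestureStats:
    """Hypothetical reference statistics for one standard gesture."""
    name: str
    mean: tuple  # (duration_s, symmetry, pause_before_s, pause_after_s)
    std: tuple   # standard deviation of each feature, same order


def z_distance(observed, stats):
    """Distance between an observed gesture and a reference, in standard deviations."""
    return math.sqrt(sum(((o - m) / s) ** 2
                         for o, m, s in zip(observed, stats.mean, stats.std)))


def classify(observed, templates, max_distance=3.0):
    """Return the closest standard gesture, or None if nothing is close enough."""
    best = min(templates, key=lambda t: z_distance(observed, t))
    return best.name if z_distance(observed, best) <= max_distance else None


# Illustrative values only; these are not taken from the research project.
TEMPLATES = [
    GestureStats("nod", (0.6, 0.9, 0.3, 0.3), (0.15, 0.1, 0.1, 0.1)),
    GestureStats("wave_left", (0.9, 0.3, 0.4, 0.4), (0.2, 0.1, 0.15, 0.15)),
]
```

Rejecting observations that lie far from every reference is one simple way to keep casual movements from being read as commands.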
In certain situations, recognition of dynamic hand and head movements can provide an alternative to traditional means of interaction such as pressing buttons. It is particularly useful in environments with a high level of background noise where voice input systems have to deal with recognition problems.
The use of dynamic head movements as an additional input method is largely limited to the recognition of shaking and nodding of the head to signify disagreement and assent. This method allows the driver to receive an incoming telephone call simply by nodding his head, for example.
Does that mean that communication with the vehicle will no longer work properly if the driver uses a lot of gestures when communicating with passengers or if the background noise level is too loud?
Bengler: Voice recognition becomes more difficult when the level of background noise is too high. That’s why we’re researching several different recognition methods to allow the vehicle to understand commands clearly. Using a lot of gestures when communicating with other people is no problem, since the driver needs to use clear gestures accompanied by clear commands to communicate with the system.
We hear that in future cars will be able to respond to the moods of their drivers and automatically change radio channels, for example. What is the technology behind such sensitive vehicles?
Bengler: Our philosophy here is very clear – the driver always holds full responsibility and control over the car. The vehicle won’t do anything that the driver doesn’t want it to. In general, this means that it won’t change radio channels without explicit commands from the driver. The emotion recognition system will only be activated if the driver’s voice input is not clear. If clear voice input and gestures or head movements contradict one another, the system will “believe” the voice input. The other parameters serve only to stabilize the voice input system.
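The precedence rule described in this answer can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the `VoiceResult` type, the `resolve_answer` function, the confidence score and the `CLEAR_THRESHOLD` value are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed cut-off above which voice input counts as "clear" (hypothetical value).
CLEAR_THRESHOLD = 0.8


@dataclass
class VoiceResult:
    answer: Optional[str]  # "yes", "no", or None if nothing was understood
    confidence: float      # assumed recognizer score between 0.0 and 1.0


def resolve_answer(voice: VoiceResult, head_gesture: Optional[str]) -> Optional[str]:
    """Combine voice input with a head gesture ("nod", "shake" or None)."""
    # Clear voice input always wins, even if the gesture contradicts it.
    if voice.answer is not None and voice.confidence >= CLEAR_THRESHOLD:
        return voice.answer
    # Otherwise the gesture is used to stabilize the unclear voice input.
    if head_gesture == "nod":
        return "yes"
    if head_gesture == "shake":
        return "no"
    # Nothing usable: the system would have to ask for clarification.
    return None
```

For example, `resolve_answer(VoiceResult("no", 0.95), "nod")` returns `"no"`, because a clear spoken answer overrides the contradicting nod.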
What is your view on a central intelligence which not only monitors electronic systems such as ABS but also assists the driver and learns from him?
Bengler: In general, we make a distinction between regulatory systems such as ABS, which are directly related to driving, and secondary information and communication functions. An intelligent system is only useful if it can help the driver in situations which occur regularly, for example with telephone numbers he dials frequently or radio stations he often listens to, perhaps in specific driving situations. We need to provide enhanced protection from attack here, though, so we’ll be using firewalls and encryption technology to protect the user from outside manipulation.
How long do you think it will be before cars can communicate with one another in a manner which allows them to avoid traffic jams or accidents?
Bengler: Our research vehicles are already using W-LAN systems to communicate with each other and to exchange traffic information and warnings. However, before we can get these “talking vehicles” onto the roads, we need to do more than just solve technical problems. We also need common standards, for example. The BMW Group is a member of the “Car2Car Communication Consortium”, a forum in which European car manufacturers are working together to achieve standardization. Of course, the long-term ideal is for all vehicles to be able to communicate with one another. However, I’m sure it will take a number of years before the standards and the necessary dedicated frequency band are in place to make this a reality.