Tuesday 20 April 2010

A hard limit on the performance of automatic speech recognition?

I've just finished reading Jerry Feldman's book – 'From Molecule to Metaphor' – which I found quite compelling, especially with regard to his insistence that meaning must be grounded in embodied experience. What I found less savoury was one of his final questions – "if the meaning of language is based in bodily experience and if computers cannot share our subjective experience, will we ever be able to communicate naturally with computers?" (page 340).

This is a fundamental issue in human-machine (and human-robot) interaction that appears to be hardly addressed. I'm always 'banging on' about not using human-like voices in interactive speech synthesis applications, insisting instead that we should employ voices that are more 'appropriate' to the application/agent (e.g. a voice-enabled garbage can should have a garbage can voice). I've always argued that this is vitally important if we're to avoid a user overestimating the capabilities of an automated system, and thus stepping outside the range of behaviours that such a system could handle.

However, the question raised by Feldman goes even further - maybe it will never be possible to hold a meaningful conversation with a machine/robot, simply because of the lack of common grounding in real-world experiences.

If this is true, and if we accept the argument that one of the reasons human beings are so exceptionally good at recognising speech in difficult circumstances is context (expressed as a dynamic statistical prior), then there must be a hard limit on the accuracy we can expect from automatic speech recognition systems.
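To see why the prior matters so much, consider the standard noisy-channel formulation of speech recognition, in which the recogniser picks the word sequence W that maximises P(X|W)·P(W). The toy sketch below (entirely invented numbers, using the classic 'recognise speech' / 'wreck a nice beach' confusion) shows how a context-sharpened prior can rescue a decision that the acoustics alone would get wrong – and, conversely, why a machine without access to our shared real-world context is stuck with a flatter prior:

```python
# A toy sketch of the noisy-channel view of speech recognition, with
# entirely invented numbers. The recogniser picks the hypothesis W that
# maximises P(X|W) * P(W); the prior P(W) is where context lives.

acoustic_likelihood = {            # P(X|W): how well each hypothesis fits the audio
    "wreck a nice beach": 0.40,
    "recognise speech":   0.35,
}

flat_prior    = {"wreck a nice beach": 0.5, "recognise speech": 0.5}
context_prior = {"wreck a nice beach": 0.1, "recognise speech": 0.9}  # e.g. mid-conversation about ASR

def decode(prior):
    """Return the hypothesis maximising likelihood times prior."""
    return max(acoustic_likelihood, key=lambda w: acoustic_likelihood[w] * prior[w])

print(decode(flat_prior))     # -> 'wreck a nice beach' (the acoustics win)
print(decode(context_prior))  # -> 'recognise speech'   (the context wins)
```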

So, are we there yet?

Saturday 8 November 2008

Auditory Illusion

Every week I drive from my family home to stay at our cottage near the University. Normally I have the SatNav switched on - not because I don’t know the way (I’ve been doing this journey every week for the past four years), but because the SatNav beeps in advance of all the speed cameras along the route and I don’t want to earn a speeding ticket by an inadvertent lapse of concentration.

The other day I drove my regular route, but this time I didn’t have the SatNav (as it had been moved into my wife’s car). As usual I was listening to the radio, and on this particular night they were playing a live Oasis rock concert. As the journey progressed, I was amazed to find that on a number of occasions I was convinced that I could hear the usual speed camera warning beeps (issued by the non-existent SatNav)! As I became aware of what was happening, I noticed that I appeared to 'hear' the warnings at exactly the times/locations on the road at which they would have occurred if the SatNav had been on board.

The music was loud and cacophonous, so I can only assume that I experienced an auditory illusion triggered by visual (episodic) memory – a dramatic illustration of the powerful top-down predictive nature of perception, and very much in line with my PRESENCE theory. Had I not had the radio switched on, then I’m sure that I would not have experienced the illusion.
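One way to make the top-down story concrete is the textbook Bayesian account of perception, in which the percept follows the posterior: likelihood times prior. The toy calculation below (invented numbers, not a model of my actual drive) shows how a strong episodic prior at a remembered camera location can turn weak, ambiguous evidence – loud music containing beep-like transients – into a confidently 'heard' beep:

```python
# A toy Bayesian account of the illusion, with invented numbers. The percept
# follows the posterior P(beep | audio), proportional to likelihood * prior.
# Loud music makes the evidence weak and ambiguous, so the prior dominates.

def posterior_beep(p_beep, p_audio_given_beep, p_audio_given_no_beep):
    """Posterior probability of a beep, by Bayes' rule."""
    num = p_audio_given_beep * p_beep
    den = num + p_audio_given_no_beep * (1.0 - p_beep)
    return num / den

# Beep-like transients in the music: the evidence only weakly favours a beep.
evidence = (0.6, 0.5)

print(posterior_beep(0.05, *evidence))  # elsewhere on the road: ~0.06, no beep heard
print(posterior_beep(0.80, *evidence))  # at a remembered camera: ~0.83, beep 'heard'
```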

Thursday 16 October 2008

Modelling Cognitive Behaviour

Last Friday (10 October 2008) I attended a small but very stimulating meeting organised by Bristol University on ‘Modelling Cognitive Behaviour’ – see http://patterns.enm.bris.ac.uk/modelling-cognitive-behaviour for the details. The line-up of speakers included Sue Blackmore (author of ‘The Meme Machine’) and Richard Gregory (author of ‘Eye and Brain’), and the meeting covered an impressive range of scientific topics and historical time – the latter arising from Owen Holland’s review of the post-war ‘Ratio Club’ and Richard Gregory’s reminiscences of the meeting on Artificial Intelligence that took place at the National Physical Laboratory exactly 50 years ago.

Although it was interesting to hear about the early work on cybernetics, and that ideas such as conditional probabilities, control systems and information theory had all been identified as key elements of intelligent machines, by far the most stunning talk was Tony Trewavas’s insight into the intelligence of plants! If anyone had any doubt that all living organisms share many of the behaviours that one might think to be unique to animals and/or human beings, then Tony soon put us right. Amongst much other behaviour, plants were shown to exhibit general information processing, adaptation and storage, active planning, communication, competitive behaviour and an ability to distinguish between self and others. In the final discussion, Tony delivered the memorable soundbite – “plants don't have brains, in some sense they are brains”!

Whilst it was generally agreed that plants may be said to have ‘intelligence’, it was felt that they could not be said to exhibit ‘cognition’. Plants certainly have complex self-regulated networks that communicate with the environment but, unlike many animals, they don’t seem to possess an internal representation of their external world. Also, as Richard Gregory observed: “plants have needs, but humans have needs and wants”. Richard’s remark followed an extensive discussion on the nature of ‘consciousness’ and the various explanations that had been put forward during the day, ranging from internal simulations of the self and the external world (Owen Holland), through the cuing of episodic memory when the next actions aren’t obvious during search (Joanna Bryson), to the suggestion that it is all an illusion anyway (Sue Blackmore).

So what, if any, were the implications for spoken language processing? I think it’s interesting to appreciate that some of the earliest ideas in machine intelligence (such as conditional probabilities) have subsequently become central to the design and implementation of contemporary state-of-the-art speech technology systems. It confirmed my opinion that not all old ideas are bad ideas. Indeed, the vital dependencies that exist between a speaker/listener and their interlocutors/environment are still very poorly understood; no-one models spoken language as part of a complex control-feedback process. Even plant behaviour appears to be based on fundamental cybernetic principles that seem to underpin the behaviour of all living systems. Maybe the only way to make progress in our understanding of spoken language processing is to revisit some of those early ideas in cybernetics?
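By way of illustration, here is a minimal sketch (a hypothetical model with invented numbers, not anyone's published account) of spoken language as a closed-loop control process in the cybernetic spirit: a speaker continuously adjusting vocal effort to keep the signal-to-noise ratio at the listener near a target, much as happens in the Lombard effect:

```python
# A minimal (hypothetical) cybernetic sketch: the speaker as a negative
# feedback controller, adjusting vocal effort to hold the signal-to-noise
# ratio at the listener near a target, as in the Lombard effect.

TARGET_SNR_DB = 15.0   # assumed intelligibility target at the listener
GAIN = 0.5             # controller gain: how strongly effort is corrected

def speak(noise_levels_db, effort_db=60.0):
    """Step through noise levels, correcting vocal effort via feedback."""
    for noise_db in noise_levels_db:
        snr_db = effort_db - noise_db       # crude proxy for intelligibility
        error = TARGET_SNR_DB - snr_db      # feedback signal: shortfall vs. target
        effort_db += GAIN * error           # raise/lower the voice accordingly
        print(f"noise={noise_db:5.1f} dB  snr={snr_db:5.1f} dB  -> effort={effort_db:5.1f} dB")

speak([40, 40, 55, 55, 55, 45, 45])  # the voice rises as the room gets noisier, then relaxes
```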

Wednesday 15 August 2007

Whither spoken language?

Despite tremendous scientific progress over the past fifty or so years, there still seems to be a long way to go before we can reach a comprehensive explanation of human spoken language behaviour and can create a speech technology with performance approaching or exceeding that of a human being. It is my belief that progress is hampered by the fragmentation of spoken language research across many different disciplines, coupled with a failure to create an integrated view of the fundamental mechanisms that underpin one organism’s ability to interact with another.

In fact I would argue that "spoken language is the most sophisticated behaviour of the most complex organism in the known universe", and that we have grossly underestimated the amazing ability that human beings have evolved for communicating with each other. As a consequence, we have completely failed to realise the significance of spoken language as a topic of fundamental scientific investigation that could provide a unique window into the intricate workings of the human mind. Spoken language should sit alongside particle physics (the science of the infinitesimally small) and cosmology (the science of the infinitely large) as one of the most important scientific research topics of the current era. It is only the illusion of the ease with which we acquire and use spoken language that blinds us to its truly fantastic nature.