Opinion: I orea te tuatara ka puta ki waho. This whakataukī translates as: “A problem is solved by continuing to find solutions.” It refers to the need for creative thinking, adaptability and perseverance. To solve a problem, you need to have all of these.
It is easy, when we are interacting with voice assistants on digital devices or customer service phone lines, to feel that the fault lies in how we speak and understand language. But the problem lies in the inability of speech technology to recognise and synthesise the diversity of human speech. Automated voice assistants often misrecognise the words we speak and, in many cases, we struggle to understand what the voice assistant is saying as well.
Voice assistants have automatic speech recognition and speech synthesis systems embedded in them. These systems are trained on popular languages and standardised accents such as American English. Users who do not have these accents, or who have had little exposure to them, find it challenging to interact with voice assistants. This is bias: technology that is systematically and statistically less usable for a particular social group.
There is growing recognition of language bias in speech technology, but the research community is not yet certain what needs to be done to address it.
This bias is particularly significant for Aotearoa New Zealand, where two of our official languages – New Zealand English and te reo Māori – are under-represented in speech technology. Because New Zealand English is similar to standardised English variants, the problem is less pronounced for it. However, that is not the case for te reo Māori, and this must be addressed, because use of te reo on the web and social media is essential to ensure the language is used extensively by new generations.
The first and most obvious solution is to start building speech technology for all languages, which is what research communities across the world are trying to achieve. However, because speech technology is dominated by certain languages and their standardised accents, researchers and technology developers tend to approach data collection, data analysis and technology development for other languages just as they would for these popular ones.
This often involves collecting large amounts of speech and language data for a language from one or more sources, analysing the data, training models on it and developing technology. Such approaches risk treating speech and language data merely as a commodity for training models.
Communities, specifically indigenous communities, have responded negatively to the commodification of their unique speech and language resources. Māori regard speech data as taonga (a highly treasured resource). When it is shared with researchers, it is shared on the basis of trust. Researchers then become kaitiaki (guardians) of the taonga, not its owners. Many communities have similar beliefs about their language and speech.
One danger of commodifying speech and language resources is that the unique ideologies communities hold towards their speech and language tend to get lost in the process of technology building.
Another danger is that these resources then become a means for monetary benefit. Once speech and language resources are collected from the community, the community is no longer involved. The benefits of technology development, in terms of both impact and revenue, may not flow back to the community that shared its data.
In short, the solution described above has not proved to be the best for all. So, we think, adapt, and persevere to find other solutions that bring broader benefits to the community for whom the technology development is happening in the first place.
Every language is unique, and only its speakers can truly understand the philosophy behind their language. It is the voices of the language speakers that need to be prioritised when building technology. So, we need to build technology together with the language speakers at all stages of development. Co-design and co-development of speech technology with speakers and experts in the language is the way ahead.
This approach will be a long-term process of relationship building and brainstorming to ensure we understand what the community really wants at each stage. This will include collecting data alongside the community, making sure researchers truly understand the philosophy of the community and remain kaitiaki of the resources entrusted to them.
Feedback from the community should also be taken on board and changes made as needed. The goal should be to ensure technology development supports the needs of the community and brings long-term positive impacts.
This approach is new and we do not yet know if it will work. However, what we do know is that involving communities in matters related to their language and speech will help avoid the colonisation of speech technology.
The Speech Research@UoA group is involved in developing speech and language technology for the languages of Aotearoa. We have worked closely with Māori experts and speakers in all te reo Māori-related research we have done so far. Te Hiku Media is a Māori radio station in Northland that has been pioneering speech and language technology development for Māori. Our research at the university is developed in consultation with Te Hiku Media to co-design Māori language resources.
We are also taking this collaboration further by committing ourselves to a project under the SfTI Kaupapa kākano seed project fund, where the investigating team includes te reo Māori experts, te reo Māori speakers, linguists, data scientists and engineers. The project idea was framed in consultation with Te Hiku Media, and changes were made to ensure technology development is done correctly, with maximum impact for the Māori community.
As the project unfolds over the next two years, we hope the co-design approach will help develop te reo Māori speech technology and ensure the technology brings positive benefits to the community.