In early January, Facebook announced that they had acquired Wit.ai, a company that’s been building an API for voice-activated interfaces. There’s been industry speculation for some time about Facebook’s intention with speech-to-text, with the main focus on the possibilities of communicating via Messenger, hands-free. Star Trek-like hands-free messaging certainly does sound interesting, but another element that may be under construction at Facebook HQ could be the automatic translation of languages within the Facebook environment.
This is more in line with Facebook’s overall vision of ‘connecting the world’ – what if Facebook were able to perfect a translation system that could enable seamless communication between different languages? That’d be a huge boost, both for Zuckerberg’s over-arching ambitions and the platform’s growth strategy in general. While it’s already possible to do this using something like Google Translate (and note: Google Translate also got a major upgrade last week), the results can be patchy and inefficient. Facebook may look to build a system that can translate in real-time, enabling immediate interaction, all within the Messenger app. Such a move could revolutionize connectivity on the platform. With around 100 languages currently used on Facebook, and Messenger being one of the most popular messaging apps in the world (WhatsApp, also owned by Facebook, is the most popular), everything from group geography to personal connectivity to knowledge sharing could change if such capability were enabled.
The other aspect that Facebook is no doubt interested in is data. More data means more power – once you control the flow of information, you can dictate terms to advertisers and groups who’d want to use it – and the company’s moves on speech-to-text may be linked to the next evolution in data gathering.
In May last year, Facebook put a cat amongst the privacy pigeons when they announced a new, optional feature that would ‘listen’ to what was happening around you as you posted updates to the app. The system can tune into your surroundings and identify what TV show or song you’re listening to, then add that detail to your post. Predictably, people freaked out – this effectively meant Facebook was able to listen in on your life, could hear what was happening inside your home, inside your bedroom even. What’s more, critics also theorized that while Facebook had highlighted that the functionality was optional (several times in their announcement), this ability to listen could, possibly, be made active without the user even knowing it.
With a bigger trove of personal data than any company has had in history, Facebook walks a fine line on user privacy. And while the company cops more than its fair share of criticism over its handling of such sensitive info, on balance, you’d have to say they’ve managed that conflict pretty well. They’re in uncharted waters for the most part, and they’ve gone to significant efforts to communicate with users and raise awareness of privacy issues in order to keep people’s data protected. But at the end of the day, the fact remains that Facebook’s business model is structured around your personal data and the value it holds for other parties. Facebook stores data on everything, down to the status updates you write but never publish. Their databanks are their most valuable assets, and in order to maintain their market position, they need to keep that data flowing, keep seeking new ways to build upon their overflowing data lakes.
So, what if Facebook could devise a process to translate all conversations to text? You’re carrying your phone around with you all the time: it’s sitting on the table as you have coffee with friends, resting beside you as you drive. What if Facebook could track what people were talking about, in real life, and add that data to their stocks? Suddenly they’d have a whole new stream of insight to provide to marketers, a vast expanse of keyword mentions and conversational queries that could be collated, logged and passed on to third parties to target marketing messages and focus specific advertisements.
Of course, Facebook would need user permission to do this, and storing an unending amount of speech-to-text data would put a huge burden on their data capacity. But privacy concerns are lessening each day – Facebook announces a new measure and people are up in arms, but then it dies down as the new data they gather is not misused wholesale, and people go about their daily, Facebook-aligned lives. Data storage options, too, are always improving. It’s not hard to imagine that in a few years Facebook could announce a new process where they’re translating specific segments of everyday conversation to text and noting those mentions for data gathering purposes – never to be shared in detail, of course, never to be linked to any specific user, only ever to be used internally. Would people stop using Facebook if they did?
Lost in Translation
Speech-to-text has long been seen as the next progression in communications – working in media monitoring, we looked into this for years, as being able to detect mentions within TV and radio broadcasts would revolutionize how that business is conducted. The problem is that speech-to-text technology has never been up to the required standard to make a significant impact. It’s improved a lot, and it’s getting better over time, but we were never able to rely on its accuracy to a significant degree. Where speech-to-text has improved is in learning a single speaker’s voice – some of the top speech-to-text tools on the market actually have a very high level of accuracy when they are trained to transcribe a single voice; the user speaks to it, corrects misspellings and misinterpretations as they go, and over time the system learns that person’s nuances, which enables it to produce very accurate results. It’s when other distractions come in that the systems have trouble – background noises, different intonations, accents.
Where Facebook may have an advantage here is if they can incentivise users to ‘train’ the system to their individual voices. If speech-to-text for Messenger proves popular, they’ll be able to build better systems based on user examples, narrowed down to regional dialects and colloquialisms. Every time a user converts speech to text, for example, they might make a correction here or there – Facebook could track those corrections and find common misinterpretation patterns, narrowed down to specific regions. Starting off small enables Facebook to build that accuracy and increase the usability, making it more popular when it’s eventually rolled out to everyone. And as that accuracy improves, so too does the breadth of Facebook’s data gathering capabilities. It’s still some way off being anything close to a reality, but real-life conversational data tracking may be the next frontier in the big data journey.
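To make the idea of tracking corrections by region a little more concrete, here is a minimal Python sketch. It is purely illustrative and says nothing about how Facebook or Wit.ai actually work: the event format (region, what the system heard, what the user corrected it to), the function name `common_corrections` and the sample data are all assumptions made up for this example.

```python
from collections import Counter, defaultdict


def common_corrections(events, top_n=3):
    """Count (heard, corrected) pairs per region and return the most frequent.

    `events` is a hypothetical log of (region, heard, corrected) tuples.
    """
    by_region = defaultdict(Counter)
    for region, heard, corrected in events:
        heard = heard.strip().lower()
        corrected = corrected.strip().lower()
        if heard == corrected:
            continue  # the user accepted the transcription, nothing to learn
        by_region[region][(heard, corrected)] += 1
    return {
        region: counts.most_common(top_n)
        for region, counts in by_region.items()
    }


# Invented sample data, for illustration only.
sample_events = [
    ("AU", "our vote", "arvo"),
    ("AU", "our vote", "arvo"),
    ("AU", "brekky", "brekkie"),
    ("UK", "inn it", "innit"),
    ("UK", "hello", "hello"),
]

patterns = common_corrections(sample_events)
for region, pairs in patterns.items():
    print(region, pairs)
```

Even a simple tally like this would surface the regional slang a generic model keeps getting wrong, which is the kind of pattern that could then feed back into training.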