Microsoft shows off universal translator that preserves your voice (extremetech.com)
126 points by mrsebastian on March 12, 2012 | 33 comments


Microsoft has been showing off a lot of these lab projects lately. I guess they think it helps with their marketing. The problem is they've not just started working on these types of projects. Their R&D has always been doing stuff like this, but either they never seem to come out in products, or if they do, it happens a decade later. So other than marketing bragging rights, this is all pretty pointless if they don't show up in products at all, or not for a decade.


They also released several such projects.

Kinect itself was first announced on June 1, 2009 at E3 2009 under the code name "Project Natal".[48] Following in Microsoft's tradition of using cities as code names,[10] "Project Natal" was named after the Brazilian city of Natal as a tribute to the country by Brazilian-born Microsoft director Alex Kipman, who incubated the project.[10][49] The name Natal was also chosen because the word natal means "of or relating to birth", reflecting Microsoft's view of the project as "the birth of the next generation of home entertainment".[37]

I don't know if their R&D department has been a net gain, but you need to consider that 20 years from now they could be making billions from an idea that seems to have been abandoned today.


Something a lot of people forget is that for Microsoft, a research project may culminate in something that's a minor feature to be turned on or off -- a checkbox in a menu buried in a control panel, "turn on Chinese Handwriting recognition" or some such, whereas we might think that these things could be entire startups!

Take a look at http://research.microsoft.com/apps/dp/pr/projects.aspx#p=1&#... to see how many of the projects, if successful, will probably just end up as a library with a few API calls, or a series of specifications for hardware manufacturers to go after.


There is some point in what you write, and I wouldn't have downvoted you. But there are counterexamples, such as the Surface.


In reply to ocdprogrammer's dead comment:

> And where is Surface ? Is it available to buy somewhere ?(I never saw a single trace of it in europe) I have a feeling that Surface is already either too expensive or too obsolete.

I can't speak to what happened to it commercially, but it definitely was available to buy - I organised a demo of it at a trade show in 2010. Since then I haven't really paid attention to it. I know a second version came out, but I have no idea how well it did or is doing.


I do Surface development. Surface v2 isn't that great, TBH. Lots of hardware issues. Lighting is still a huge problem.


Isn't that typical of (academic or industrial) research in general? What about it is specific to MSR?


I believe something similar was developed at CMU under the name ABBY. I wonder if they are the same; nevertheless, the CMU tech was impressive - real-time translation with voice characteristic preservation. We wanted to use it to develop a plugin for Skype for international business, but it didn't pan out. A missed opportunity in my opinion. However, now that Microsoft owns Skype, maybe we can expect an announcement soon.


The samples in the article don't do a very good job of demoing the technology.

You can download an MP3 of the talk: http://msrvideo.vo.msecnd.net/rmcvideos/160725/dl/160725.mp3 Skip forward to 19:25 for a demo of English and Mandarin.

The full explanation of the system starts at 12:00.


Thanks for the MP3 link. I'm going to snip out the proper English bit and fix up the story.



> For example, Microsoft’s standard model of Spanish will have a default “S” (ess) sound, but the training process replaces it with your “S” sound. This is done for every individual sound (phoneme).

I don't think it can work as a phoneme for phoneme replacement; there must be a different heuristic at work.

Different languages have different phonemes, and one of the most telling signs of a non-expert, non-native speaker is that they don't get the sounds right.

E.g., "E" is pronounced differently in English, German and Czech. Japanese speakers don't differentiate between "L" and "R". Spanish "C" is in some contexts different to "C"s in other languages. Etc.

Obviously you can't pronounce a sound that you don't even know exists.

I would love to hear more examples.


> Obviously you can't pronounce a sound that you don't even know exists.

It's possible that they could interpolate those sounds. For example, Bengali has a sound that's halfway between a 'd' and a 't'. Using some statistical heuristics, it's possible that they could approximate that letter using your pronunciations of 'd' and 't' (and knowing how bilingual Bengali-English speakers who pronounce 'd' and 't' similarly to how you do also pronounce that letter).
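
To make the interpolation idea concrete, here is a toy sketch. It is only an illustration of blending two sounds, not a description of what Microsoft's system does. It assumes each phoneme can be summarised as a short list of acoustic features; the `interpolate` helper, the `speaker_d` and `speaker_t` names, and all the numbers are made up for the example.

```python
def interpolate(a, b, weight=0.5):
    """Linearly blend two equal-length feature vectors.

    weight=0.0 returns a, weight=1.0 returns b, 0.5 is the midpoint.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return [(1 - weight) * x + weight * y for x, y in zip(a, b)]


# Hypothetical per-speaker features for 'd' and 't' (invented values).
speaker_d = [1.0, 4.0, 2.0]
speaker_t = [3.0, 8.0, 6.0]

# Estimate a sound "halfway between" the two for this speaker.
estimate = interpolate(speaker_d, speaker_t)
print(estimate)  # [2.0, 6.0, 4.0]
```

A real system would presumably learn the blend weight from bilingual speakers rather than fixing it at 0.5, which is the "statistical heuristics" part of the idea above.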


Impressive technology. But this doesn't really solve the problem of accurately translating from one language to another, with all the ambiguities of language, does it? I mean, as far as I can tell, the really hard part is completely transferring the meaning of something spoken/written in one language into another. Sometimes it's obviously just not possible (like certain jokes, etc.). Most of the time though, it's just very hard to do with software.


This doesn't replace a professional translator, but it does replace a phrase book. "One bread, please", and "help, please call an ambulance" are massively useful, even if you end up with a baguette instead of a whole wheat bread and ask people to shout at an ambulance instead of calling 911.


Sure, this is hard to do with software, but given that 99% of people are way worse at 99.5% of the planet's languages... we have a massive improvement right there.


Or they just all speak English? Living in another country that speaks another language, my translation skills are only rarely needed.


This is really frustrating to me. "Learn German! It's the language of business in Europe!" Now when I'm in Germany, people speak English to me even when I say "auf Deutsch, bitte." Somewhere along the line, I must have picked up an accent in German, because some people even assume I'm from France and start speaking French to me. I'm B1 level in German (used to be B2, but it's slipping) and have no use for it outside of academic pursuits.

About the only language that is useful to me living in the northern US is French (being that my company does business in Quebec). Sometimes Spanish is helpful (we also do business in southern Florida). But the problem with learning these languages is that it's not French you need to learn, it's Canadian French. It's not Spanish you need to learn, it's Mexican Spanish (/Cuban Spanish/Latin Spanish/Caribbean Spanish, however you want to classify it).

Good luck finding excellent self-study classes (like memrise.com) in regional dialects.


>...it's not French you need to learn, it's Canadian French. It's not Spanish you need to learn, it's...

Is that level necessary? It's not as if a Québécois would _not_ understand Standard French, or that a Cuban would not understand standard Spanish/Castilian. At least, I would hope it's not like that. It would be as if we'd need some translation to understand British English (coming from NAm). There is a bit of getting used to an accent and some lexicon, but nothing that isn't overcome within a few interactions (with some exceptions).


It's not that they need to understand me (which I doubt they would have a problem with), it's that I need to be able to understand them. If I'm introductory in a language, all I'm really going to be doing is listening for keywords and piecing the meaning together with a small amount of grammar knowledge. That doesn't work when their words are different or sound different than what my brain is looking for.

Also, there are some people in the world who will pretend they don't understand you if you don't speak their language perfectly (looking at you, Finland!)


Makes me think of Iron Man, when Gwyneth Paltrow clicks on the 'Translate' button in the terrorist video, and it was the original voice but translated into English. I never thought it would actually be possible, but this looks pretty interesting.


This technology, once perfected, will usher in a new era of fraud and crank calls.


I don't understand the samples in the article. The English audio says something, and the Spanish audio translates to "Welcome to TechFest 2012, where today you will see first-hand how Microsoft Research is studying the key technological tendencies that will define the XXI century." which is completely different.


Listening to the two audio samples in the report, I would say the voices don't sound very similar and the translated one sounds very mechanical.


Not only that; I think the contents do not match. I am no Spanish speaker, but I detected some words that do not exist in the English version. So unless I have missed something...


Hey! I'm the author of the story.

I think Technology Review got them mixed up (which is where I got the audio samples from). I first saw it in the TechFest keynote: http://research.microsoft.com/apps/video/dl.aspx?id=160725

I think Technology Review has an English clip from Mundie, but then translations of Rick Rashid speaking in the keynote.

Sorry about the mix-up :(

(And I wish I spoke another language...)


I speak both languages and you are dead on. Maybe a simple link mix-up?


You are right.

English version says: "So... I'm just gonna open up for questions. Anything you wanna know... you know... as usual on any(?) subject but probably about what we are doing today"

Spanish version says: "Bienvenido a [...] 2012 donde (...) podrá ver de primera mano como Microsoft Research esta estudiando las tendencias tecnologicas clave que definirán el siglo 21."

Which translates to something like: "Welcome to [...] 2012 where (you)'ll be able to see firsthand how Microsoft Research is studying the key technological tendencies that will define the 21st century."


Yes, the Italian audio is about contemporary book writers, the Spanish one seems to welcome users to these new Microsoft Research projects, while the English one is a request for questions.

And I agree the audio is mechanical and seems like the usual text-to-speech, maybe a little better.


While the technology is very impressive, as a native Chinese/English speaker, I can tell you that the mixture of Chinese and English demonstrated is only useful for bilingual speakers like myself.

Also, watching the demo in this presentation in contrast with Apple's keynotes, I can really see a world of difference in how the two companies communicate with their customers.


(Pet peeve) This is not a universal translator! Until you show me a translator that can learn a new language, you're just building a common multi-language translator.


I know it's Microsoft, but why are they using Windows Media for web video in 2012? I don't want to install Flip4Mac just to watch the demo.


There is a link to the iPod (MP4) version on the MSR page. I'm watching it right now from Chrome/Mac:

http://msrvideo.vo.msecnd.net/rmcvideos/160725/dl/160725.mp4



