So speaking it back to you seems pretty natural, right? That’s how humans communicate. But we also wanted to do that only when we had a text-to-speech engine that was extremely high quality. And what you hear today, if you ask Google a question on Jelly Bean, is quite spectacular. There isn’t a text-to-speech engine, as we call them, that has accuracy as high as that.
We didn’t talk about this in the keynote, but we have built a text-to-speech engine that’s network-based, meaning it uses a very large amount of data to compose a spoken answer. You know, purely from a synthesis perspective — forget about answering questions — it takes a very large amount of data to generate synthesized audio of someone speaking. But we also have a matching engine that sits on the device. It’s the exact same voice but with a very different computational technique. You’ll always hear the same voice whether it’s speaking back to you in a connected use case, in which the audio comes from the server, or in a disconnected offline use case, in which it would just be synthesized on the device.