I have had reservations about the Echo line because of the whole "always listening" thing, regardless of what anyone's said about how it's not recording, how I can unplug it, etc. The whole "always listening" thing isn't what interests me about playing with Alexa.
As someone who's spent a fair amount of time with hardware, I think this is what will make me tinker with the Alexa service - I am interested to see what it can do and I like keeping up with Amazon's hardware projects. I've got all the parts lying around to throw this together without spending anything, so it's a neat way for them to grab some interest from a different user demographic. This also should be fairly easy to get running on a BeagleBone too, which I tend to lean towards (more I/O, PRU can be useful)
>I have had reservations about the Echo line because of the whole "always listening" thing
I've spoken with other people about feeling this way. I think that the difference here is actually 100% psychological, and that always-listening devices like the Echo are exactly as trustworthy as the company that makes them.
I am currently standing next to at least 4 different devices with microphones and internet connections that are on or in "sleep" mode. Just because they aren't "listening" to me in a way that is obvious (e.g., they respond to a command) doesn't mean that they're not recording every sound I make. There are in fact trojans designed to do exactly that.
Either you trust the manufacturer of these devices or you don't. The fact that there's a secondary processor on the Echo that does low-power constant voice recognition for the word "Alexa" (and similarly for some phones which can be activated with "OK Google") doesn't make it suddenly more likely to be storing all of your audio, all the time.
The only salient difference is just that it makes it obvious that it was in fact listening, whereas any internet-connected device around you could be listening to you right now and simply never let on.
Even if you trust the company, you also have to trust that they have no logs that can be subpoenaed and that they cannot be compelled or hacked to wiretap you.
Sure, but that misses the point: if your phone were doing that it would be functionally equivalent. You wouldn't be able to tell any more than you would with Alexa
I'm pretty sure Alexa isn't recording or transmitting what's said at all times, there's just a local process looking for the queue to start recording a sample to upload. That queue being the word "Alexa".
I believe you're right, but it still feels icky to have something always listening for that special phrase, to me. Maybe if I wrote and maintained the code myself that always listened, I'd feel more comfortable with it.
Oh, I think the proper word was cue, not queue, by the way.
You could try adding in some simple motion, like raising your hand, before the system would start listening for hot words. Maybe connect a cheap infrared sensor to GPIO and block its view so it only detects motion at or above certain height.
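A rough sketch of that gating idea, in case it helps. It only models the timing logic: hot-word listening is allowed for a short window after the sensor last fired. The class name, the 10-second window, and the injectable clock are all my own assumptions, not anything from the Alexa sample code.

```python
import time


class MotionGate:
    """Allow hot-word listening only for a short window after motion is seen."""

    def __init__(self, window_seconds=10.0, clock=time.monotonic):
        self.window_seconds = window_seconds  # assumed value, tune to taste
        self._clock = clock                   # injectable so it can be tested
        self._last_motion = None

    def motion_detected(self):
        # Call this from whatever reads the infrared sensor on the GPIO pin.
        self._last_motion = self._clock()

    def listening_allowed(self):
        # False until motion has been seen at least once, and again once
        # the window since the last motion event has expired.
        if self._last_motion is None:
            return False
        return (self._clock() - self._last_motion) <= self.window_seconds
```

On a Pi you would hook `motion_detected` up to the sensor pin's edge callback and check `listening_allowed()` before feeding audio to the hot-word detector.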
I believe the terms and conditions for the Alexa SDK state you can't activate the Alexa Voice Service via voice, your user has to purposefully interact with something like a button to use it.
EDIT: I'm wondering if this is a legal thing, i.e. they don't want any Tom, Dick or Harry creating "always listening" devices associated with their brand, or they just want to differentiate their Echo product and not have competitors.
It's worth remembering that these services depend on the network 'carrying their packets'. If you're worried about them 'phoning home', just make sure your home network is configured to record such events to the best of your ability, and set up some simple alerts or blocking.
Agreed, I think this can be generalized to say that this should be the case with all these cloud services. The hardware should be open source, so you can at least control that part. At least you have that much control over the cloud APIs.
This may bring me one step closer to my personal "holy grail" of home automation: every room in the house[1] working with seamless voice-activated home automation. This is what I'm ultimately after:
- A cheap device (DIY if possible) in the form factor of a small plug-in unit. Ideally the device itself should be practically "invisible" in each room, and won't require any special home wiring. This is definitely in the realm of possibility for a Raspberry Pi (or similar).
- A microphone for the device that works at least as well as the Echo's far-field mic. I have not been able to find any good options for this, apart from some obscure parts that are too expensive for me to test, let alone buy for every room.
- Software that allows for voice-activated operation. There's probably a suitable workaround for doing this with the Alexa Voice Service now, though it may require more CPU power than is available on the Raspberry Pi.
- Ideally, I could host the voice service myself and wouldn't have to worry about the privacy implications of going through someone like Amazon. I know there are several existing software packages that claim to do this, but none that I've found can match the quality of Echo/Alexa for everyday interaction.
- Audio feedback does not need to be high quality, but at least audible. A small speaker within the device is probably enough. For other areas of the house, it would be nice for the output to be connected to a bluetooth speaker in the room or a home audio system (if available).
The Echo Dot appears to be a pretty close match for this (though I haven't tried it) - at least in terms of functionality, but the form factor still seems a bit off. I'd rather have a self-contained plug-in unit than something that sits on a desk or table.
> This may bring me one step closer to my personal "holy grail" of home automation: every room in the house[1] working with seamless voice-activated home automation.
But why? What do you want to automate?
That's the part I never understood about this stuff. What is there to automate in the first place?
That's a good question. Maybe "home automation" isn't exactly the right name for it, because that's just a part of it. It is nice to be able to control things by voice (lights, locks, window shades, music, TV, etc), but to call any of those things "essential" just sounds lazy :)
For me, the Amazon Echo is more about having a connection to the world without looking at a screen. Getting answers to random questions (or utilities like timers), latest traffic conditions for travel, weather, etc are all really great applications for a voice interface. If that could also be combined with a good communication platform (voice calls, text/email messaging, etc), it would be even better.
If I lived alone with no pets I'd love to be able to automate the temperature to save money when I wasn't home. But there are usually people in my house, and there are always pets there, so I have to keep it pretty temperate in there at all times.
Other automatable things (lights, lawn sprinklers) are so easily accomplished with simple analog timers that I really do not see the attraction in controlling them via an app. Maybe it's just the programmer in me thinking... at this point in my life, I'm very jaded when it comes to software.
Security is a big one, I guess. It would be nice to be able to monitor that remotely. Though honestly I already have a dog in the house which is probably more effective than 90% of the solutions on the market.
My big thing is whole-house streaming audio, and we've had that for years with Airplay and Sonos and now with Chromecast as well. So that's cool.
Things I've considered automating: different light settings (temperature, which light to keep on), watch-style notifications/interactions when I'm not at my device (usually because it's charging), instructions to nearby screens or audio systems, reminders... A lot of it relates to integrations and interactions which require too many button pushes. Voice control is one step closer to "mind-reading"-like experiences, and knowing who is in which room would improve the context-awareness of existing apps and whole-home systems. For example, ideally if I said "play some music" it would know who I was and what I liked vs others in the house. This is more than voice control, and is likely phone- or app-integrated, but every small step gets us closer. ;-)
Not the OP but here's some of the stuff I have/want:
Have:
Adding things to my shopping list (via Alexa - currently in the kitchen and bathroom - two most frequent rooms where I'm like "hey I need more XXXX")
Changing the temperature (Nest via phone app and command line tool I wrote, and now Alexa just did a native integration)
Finding out my schedule for the day.
Finding out the weather forecast for the day.
Playing music (via Sonos or Alexa)
Locking front door when I'm going to bed.
Getting notified of movement or sound in the house while I'm away (Nest Cam and SmartThings)
......
Like the OP I'd love to have the voice control available in any room of the house. I'd love to have better security features (i.e. if I drive up my driveway at 3 AM turn on the outside lights, if a stranger drives up my driveway at 3 AM WAKE ME UP WITH AN ALARM). I'd love better voice controlled Sonos music selection. Composing emails, having handsfree phone or video calls, warming up the oven at a certain time, etc...
I have a few Dash buttons, but it's SUPER limited what you can link them to. Plus I'd need hundreds of them:) The "Alexa, add XXXX to my shopping list" works really well for me.
I don't wear a watch around the house, and I don't really like smart watches honestly - I like big heavy metal automatics. Likewise I enjoy being able to leave my phone on the charger somewhere else. I dunno. I mean long term dream would be integrating things like "smart" mirrors, semi-smart AI-esque assistants, and so on. Then again, I'm currently in a zero-tech house on the beach in Costa Rica, with a barely there internet connection, so it's not like any of this is that important...
* lights
* alarm clock (setting, turning it off and on, 15 more minutes)
* be able to turn on my xbox/fire tv and tell it I want to watch <x>, then it figures out how
* taking the dog out when it's 1am and I just want to go to bed (okay this one might take a little longer).
OP is describing home control more than home automation. Home automation is the house intelligently doing things for you, without your prompting. Home control is the house doing things for you when you ask it to.
E.g. Home automation: Every day at sunset, lower the blinds
E.g. Home control : You say "House, lower the blinds now".
I just want technology that WORKS at home.
Too much of my time is spent fixing other people's "mistakes".
I find Alexa fits that bill. Takes a bit of learning the right cadence of speaking to her. And unfortunately she has the same name as my daughter, which causes some interesting interactions.
It's pretty hard to order something via Alexa - I know, my friends have tried to order me several things ranging from sex toys to more Echos. Never worked. Please don't spread nonsense.
Yeah, if anyone found something equivalent to the Echo in mic tech that I can hook up to my own setup, I'd die to know. That's the one thing I can't really replicate on my own system.
I haven't bothered getting the SDK for the Kinect 2 yet but both of them seem to only be good for about 135 degrees around the device - the Echo is a much better omnidirectional microphone.
The Kinect seems pretty cool. I've heard of one person who brought it into work and used it to identify and greet people who walked into his office. So there are probably some pretty cool user identification projects you could do with it.
My (admittedly anecdotal) experience having an Xbox One is that it would pick up phrases that barely sounded like built in triggers and do unexpected things at unexpected times, such that I felt forced to disable it.
We've had an Echo since before they were available to the public (ex-Amazon employee here), and it does occasionally beep when it thinks it hears "Alexa."
But it's very rare that it goes beyond that; usually we hear the beep and one of us will yell "never mind!", and then the Echo will go silent. Sometimes it will cancel itself, too. I think that the always-listening part is less discriminating than the active voice recognition, and that it can re-parse the last couple of seconds and decide, in retrospect, that you didn't say "Alexa."
That said, I just recently said to my wife, "I wonder if Alexa knows 'sudo make me a sandwich'". The timing was such that it parsed "sudo make me a sandwich" and answered, "Well, if you ask like that, how can I refuse?" :)
We had a good laugh. Funny thing was, looking at the app later, it was actually parsed as "Pseudo make me a sandwich." I bet Google Now would have corrected it to sudo. :)
But presumably you could have the Pi listen for a trigger word or whistle or whatever using software running locally, and when triggered, kick over to the Alexa API?
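Something like this loop is what I'd picture, as a sketch only. `is_trigger` and `send_query` are hypothetical placeholders: the first stands in for whatever local detector you run (word, whistle, clap), the second for whatever code uploads the captured audio. Neither is a real Alexa API call, and the frame count is an arbitrary assumption.

```python
def run_trigger_loop(frames, is_trigger, send_query, query_frames=50):
    """Watch audio frames locally; after a trigger, capture a query and hand it off.

    frames       -- iterable of audio frames (any objects)
    is_trigger   -- hypothetical local detector: frame -> bool
    send_query   -- hypothetical uploader: list of frames -> None
    query_frames -- how many frames to capture after the trigger (assumed)
    Returns the number of queries handed to send_query.
    """
    capturing = False
    captured = []
    sent = 0
    for frame in frames:
        if capturing:
            captured.append(frame)
            if len(captured) >= query_frames:
                send_query(captured)
                sent += 1
                capturing, captured = False, []
        elif is_trigger(frame):
            # The trigger frame itself is not part of the uploaded query.
            capturing = True
    return sent
```

Nothing leaves the device until the local detector fires, which is the property people in this thread seem to care about.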
You could set up an IFTTT "Do" button [1] with the Maker Channel, which allows you to make an arbitrary web request. Then have a server running locally that can receive the request and trigger the recording. Node-RED [2] would make setting that server up pretty simple.
To expand a little on this: The Echo has a 7-microphone array which is crucial to speech recognition accuracy. This gives it the best far-field recognition ability of any consumer product I've seen, with the ability to stay accurate even if you're across the room, with music playing. That's just the hardware, and replicating its abilities will not be easy.
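If you'd rather not run a separate tool for it, the receiving end can be sketched with nothing but the Python standard library. Assumptions on my part: the `/trigger` path, port 8080, and the `start_recording` flag that a recorder thread would wait on are all made up for illustration, and IFTTT would still need some way to reach this machine.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical flag: a recorder thread would wait on this, record, then clear it.
start_recording = threading.Event()


class TriggerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path == "/trigger":  # assumed path, pick your own
            start_recording.set()
            self.send_response(204)  # success, no body
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def make_server(host="127.0.0.1", port=8080):
    """Build the server; call serve_forever() on the result to run it."""
    return HTTPServer((host, port), TriggerHandler)
```

Running it is just `make_server().serve_forever()`. There's no authentication here, so anything that can reach the port can start a recording.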
On the software side, supposedly they're using Nuance for recognition. Nuance isn't cutting edge: In the tests I've done, Nuance has a Word Error Rate (WER) that's 10%-20% higher than Google's, but it's still much better than something like Pocketsphinx or any other open source recognizer.
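For anyone who hasn't run into WER before: it's usually defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. A minimal sketch of computing it with a word-level edit distance is below. The function name is mine and this isn't the tooling used for the tests mentioned above.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two strings, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can go above 1.0 when the recognizer outputs far more words than were spoken.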
There are a lot of factors that go into making a speech interface a good experience for users: Good recognition accuracy even with background noise, good voice activity detection (even with background noise), very accurate word spotting, low latency. It's hard to hit all these things well enough to make the interface usable.
Does it mean that we can run it on any computer that runs Java? I read through the tutorial but couldn't find anything that specifically tied to raspberry pi.
From what I understand, the Echo has a specific piece of hardware in it that is 'always listening', and only once that is triggered via voice command does the Echo actually begin to listen. So unless you have something connected to the device that could reproduce that initial voice analysis hardware, you can't have the 'always listening' feature.
Probably for power reasons, much like the most recent iPhones can be controlled by "Hey Siri". However, older iPhones can always listen too, but are required to be plugged in to do it because they're using their main processor and doing it at a software level.
In short, always listening isn't difficult on non-battery devices, it's just a software problem.
This is awesome. I am strongly considering setting this up as I just purchased a fresh Raspberry Pi.
The only limitation appears to be you have to click a "start listening" button to get it to start recording audio. You can't simply say "Alexa" to get the raspberry pi + alexa web service to listen for your query.
Anyone have any ideas for a work around/ solution to this?
I've heard the anecdotes about Alexa responding erroneously when people weren't home and doing things like turning the furnace on, etc. That combined with the general privacy concerns make me much more comfortable with being able to push a button to get her to listen – but I'd rather it be a button on my person – like on my fitbit, watch, etc. – easy, always available – plus maybe a way to turn it on for listening over a duration – say while cooking.
I wrote a simple clap trigger (using the raspberry micro) that is always listening. I use it to shuffle my Philips Hue lights, but it could easily be used to trigger Echo voice input.
https://github.com/131/clap-trigger
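For the curious, the general idea behind a clap trigger can be sketched in a few lines: flag any frame whose loudness jumps well above the recent background level. To be clear, this is my own illustration and not how the linked repo does it. The `ratio` and `floor` thresholds are guesses for 16-bit samples, and frames are assumed to be non-empty lists of integer samples.

```python
import math


def rms(frame):
    """Root-mean-square level of one non-empty frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_claps(frames, ratio=4.0, floor=500.0):
    """Return indexes of frames whose RMS jumps well above the running background."""
    claps = []
    background = None
    for i, frame in enumerate(frames):
        level = rms(frame)
        if background is not None and level > floor and level > ratio * background:
            claps.append(i)
        else:
            # Only quiet frames update the background estimate
            # (exponential moving average).
            background = level if background is None else 0.9 * background + 0.1 * level
    return claps
```

A real trigger would also want to reject long loud sounds (music, speech) and maybe require a double clap, which this doesn't attempt.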
I've been looking for an excuse to tinker with a raspberry pi for a while - this seems like something I could have some fun with then give away to someone less paranoid/concerned with the privacy issues.
That's a great point. A little openness can only help Amazon sell even more of these things. I don't have one, but everybody I know who does really likes it.
Nice to see them walking through pretty much everything from getting your RPi running to making it work with AVS. That said, Sam Machin's Python CHIP / RPi client was there first, and has a smaller footprint: https://github.com/sammachin/AlexaCHIP
The value is in Amazon's voice services and speech recognition platform, not in the Echo device itself. As machine learning improves, voice and speech may play a bigger part in user interaction, and Amazon wants to be out in front of that.
The Echo hardware was just a way to get the ball rolling with this platform.
I'd imagine the incentive is more points for the speech recognition model. The race is for who can handle completely unstructured speech best, and the field is vast at this stage.
Like with their hackable instant order button, I would expect that at least for the short term, they want to use it as a way to make purchasing as fast and easy as possible, obviously to maximize their income.
Longer term, conversational AI is clearly the next huge thing in sales and customer interaction, and they probably want to start progressively building grounds in that domain
I'm able to use it on an account with only a device generated via the Alexa Voice Service. I made a device profile on the developer site, then authenticated to it via the OAuth flow.
Has anyone done hardware tinkering with the Echo? Does it run Linux? what does the mic array look like? Possible to use just the mic array and pipe the audio elsewhere?
Has anyone found a decent solution to hooking more than one mic input into an RPi? Something that would allow doing some simple DSP across, say, a 4-input array?
We have 100% totally pivoted on this one. Every proposal we put out, now has Echo front and center. As we say "screens", you mean like your father/mother used to use? How old school. A screen? Oh boy ... :-)
As Woz says, "bigger than the iPhone." That sounds like a hell of a prediction to me. Woz knows all. :-)