Amazon Provides DIY Echo Plans for Raspberry Pi (github.com/amzn)
461 points by rpdillon on March 25, 2016 | 101 comments


I have had reservations about the Echo line because of the whole "always listening" thing, regardless of what anyone's said about how it's not recording, how I can unplug it, etc. The whole "always listening" thing isn't what interests me about playing with Alexa.

As someone who's spent a fair amount of time with hardware, I think this is what will make me tinker with the Alexa service - I am interested to see what it can do and I like keeping up with Amazon's hardware projects. I've got all the parts lying around to throw this together without spending anything, so it's a neat way for them to grab some interest from a different user demographic. This should also be fairly easy to get running on a BeagleBone, which I tend to lean towards (more I/O, and the PRU can be useful).


>I have had reservations about the Echo line because of the whole "always listening" thing

I've spoken with other people about feeling this way. I think that the difference here is actually 100% psychological, and that always-listening devices like the Echo are exactly as trustworthy as the company that makes them.

I am currently standing next to at least 4 different devices with microphones and internet connections that are on or in "sleep" mode. Just because they aren't "listening" to me in a way that is obvious (e.g., they respond to a command) doesn't mean that they're not recording every sound I make. There are in fact trojans designed to do exactly that.

Either you trust the manufacturer of these devices or you don't. The fact that there's a secondary processor on the Echo that does low-power constant voice recognition for the word "Alexa" (and similarly for some phones which can be activated with "OK Google") doesn't make it suddenly more likely to be storing all of your audio, all the time.

The only salient difference is just that it makes it obvious that it was in fact listening, whereas any internet-connected device around you could be listening to you right now and simply never let on.

I'm keeping my Echo plugged in. :)


Even if you trust the company, you also have to trust that they have no logs that can be subpoenaed and that they cannot be compelled or hacked to wiretap you.


Sure, but that misses the point: if your phone were doing that it would be functionally equivalent. You wouldn't be able to tell any more than you would with Alexa


I'm pretty sure Alexa isn't recording or transmitting what's said at all times, there's just a local process looking for the queue to start recording a sample to upload. That queue being the word "Alexa".


I believe you're right, but it still feels icky to have something always listening for that special phrase, to me. Maybe if I wrote and maintained the code myself that always listened, I'd feel more comfortable with it.

Oh, I think the proper word was cue, not queue, by the way.


It would scare you to know that OnStar can be remotely activated without the driver knowing, wouldn't it?


Or any cell phone.


This has always seemed pretty marginal to me. You might be able to turn the microphone on, but when my battery dies an hour later I'll be suspicious.


Lithium ion batteries don't get old and lose capacity. That's just the NSA backing off the sleep interval.


>Maybe if I wrote and maintained the code myself that always listened, I'd feel more comfortable with it.

Maybe? Maybe you'd be comfortable? There's a world where you wrote the code yourself and still don't trust that it's not sending data back?


You could try adding in some simple motion, like raising your hand, before the system would start listening for hot words. Maybe connect a cheap infrared sensor to GPIO and block its view so it only detects motion at or above certain height.
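
A rough sketch of that gating idea, in case it helps. It assumes your GPIO library can call a function when the infrared sensor fires; the `MotionGate` class, its method names, and the 10 second window are all made up for illustration, and the hot-word listener itself is left out. The clock is injectable so the logic can be tried without any hardware.

```python
import time


class MotionGate:
    """Only allow hot-word listening for a short window after motion is seen."""

    def __init__(self, window_seconds=10.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self._clock = clock
        self._last_motion = None  # time of the most recent motion event

    def motion_detected(self):
        """Call this from the sensor callback (e.g. a GPIO edge handler)."""
        self._last_motion = self._clock()

    def listening_allowed(self):
        """True while we are still inside the window after the last motion."""
        if self._last_motion is None:
            return False
        return (self._clock() - self._last_motion) <= self.window_seconds
```

The hot-word loop would then check `gate.listening_allowed()` before it even looks at the microphone, so nothing is processed unless you have raised your hand recently.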


"queue" -> "cue"?


Looking at what Amazon's posted, it looks like what they've released doesn't even give you the "always listening" option.

You have to click on the "start listening" button and then the "stop listening" button.


Now all you need is a remote microphone in the shape of a star trek communicator for your shirt pocket. Tap, "Alexa, three to beam aboard." :-)


Just needs a vocera badge integration and a connection to Coding Insight from Talix to allow for NLP voice access to patient records...


I believe the terms and conditions for the Alexa SDK state you can't activate the Alexa Voice Service via voice, your user has to purposefully interact with something like a button to use it.

EDIT: I'm wondering if this is a legal thing, i.e. they don't want any tom, dick or harry creating "always listening" devices associated with their brand, or they just want to differentiate their Echo product and not have competitors


Confirmed.

This is not an always-listening app. You have to click "start listening".


Yes, agreed - this is perfectly fine with me. If I wanted the "always listening" feature I'd grab an "official" Alexa device :)


It's worth remembering that these services are dependent on the network 'carrying their packets'. If you're worried about them 'phoning home', just make sure your home network is configured to record such events to the best of your ability, and set up some simple alerts or blocking.


Agreed, I think this can be generalized to say that this should be the case with all these cloud services. The hardware should be open source, so you can at least control that part. At least you have that much control over the cloud APIs.


This may bring me one step closer to my personal "holy grail" of home automation: every room in the house[1] working with seamless voice-activated home automation. This is what I'm ultimately after:

- A cheap device (DIY if possible) in the form factor of a small plug-in unit. Ideally the device itself should be practically "invisible" in each room, and won't require any special home wiring. This is definitely in the realm of possibility for a Raspberry Pi (or similar).

- A microphone for the device that works at least as well as the Echo's far-field mic. I have not been able to find any good options for this, apart from some obscure parts that are too expensive for me to test, let alone buy for every room.

- Software that allows for voice-activated operation. There's probably a suitable workaround for doing this with the Alexa Voice Service now, though it may require more CPU power than is available on the Raspberry Pi.

- Ideally, I could host the voice service myself and wouldn't have to worry about the privacy implications of going through someone like Amazon. I know there are several existing software packages that claim to do this, but none that I've found can match the quality of Echo/Alexa for everyday interaction.

- Audio feedback does not need to be high quality, but at least audible. A small speaker within the device is probably enough. For other areas of the house, it would be nice for the output to be connected to a bluetooth speaker in the room or a home audio system (if available).

The Echo Dot appears to be a pretty close match for this (though I haven't tried it) - at least in terms of functionality, but the form factor still seems a bit off. I'd rather have a self-contained plug-in unit than something that sits on a desk or table.

[1] Or most of the house anyway


> This may bring me one step closer to my personal "holy grail" of home automation: every room in the house[1] working with seamless voice-activated home automation.

But why? What do you want to automate?

That's the part I never understood about this stuff. What is there to automate in the first place?


That's a good question. Maybe "home automation" isn't exactly the right name for it, because that's just a part of it. It is nice to be able to control things by voice (lights, locks, window shades, music, TV, etc), but to call any of those things "essential" just sounds lazy :)

For me, the Amazon Echo is more about having a connection to the world without looking at a screen. Getting answers to random questions (or utilities like timers), latest traffic conditions for travel, weather, etc are all really great applications for a voice interface. If that could also be combined with a good communication platform (voice calls, text/email messaging, etc), it would be even better.


What does Echo offer that Siri, etc. don't? You already have all of that functionality through your phone.


This is what I always wonder too.

If I lived alone with no pets I'd love to be able to automate the temperature to save money when I wasn't home. But there are usually people in my house, and there are always pets there, so I have to keep it pretty temperate in there at all times.

Other automatable things (lights, lawn sprinklers) are so easily accomplished with simple analog timers that I really do not see the attraction in controlling them via an app. Maybe it's just the programmer in me thinking... at this point in my life, I'm very jaded when it comes to software.

Security is a big one, I guess. It would be nice to be able to monitor that remotely. Though honestly I already have a dog in the house which is probably more effective than 90% of the solutions on the market.

My big thing is whole-house streaming audio, and we've had that for years with Airplay and Sonos and now with Chromecast as well. So that's cool.


Things I've considered automating: different light settings (temperature, which light to keep on), watch-style notifications/interactions when not at my device (usually because it's charging), instructions to nearby screens or audio systems, reminders... A lot of it relates to integrations and interactions which require too many button pushes. Voice control is one step closer to "mind-reading"-like experiences and having knowledge of who is in which room would improve the context-awareness of existing apps and whole-home systems. For example, ideally if I said "play some music" it would know who I was and what I liked vs others in the house. This is more than voice control, and is likely phone- or app-integrated, but every small step gets us closer. ;-)


Not the OP but here's some of the stuff I have/want:

Have:

Adding things to my shopping list (via Alexa - currently in the kitchen and bathroom - two most frequent rooms where I'm like "hey I need more XXXX")

Changing the temperature (Nest via phone app and command line tool I wrote, and now Alexa just did a native integration)

Finding out my schedule for the day.

Finding out the weather forecast for the day.

Playing music (via Sonos or Alexa)

Locking front door when I'm going to bed.

Getting notified of movement or sound in the house while I'm away (Nest Cam and Smart Things)

......

Like the OP I'd love to have the voice control available in any room of the house. I'd love to have better security features (i.e. if I drive up my driveway at 3 AM turn on the outside lights, if a stranger drives up my driveway at 3 AM WAKE ME UP WITH AN ALARM). I'd love better voice controlled Sonos music selection. Composing emails, having handsfree phone or video calls, warming up the oven at a certain time, etc...


It's not low cost yet, but check out http://josh.ai. Starting high end but price will drop significantly.


I will, thanks!


Sounds like all of these could also be handled by a phone/watch instead of needing a device for each room.

For adding to shopping list, check out Amazon Dash.


I have a few Dash buttons, but it's SUPER limited what you can link them to. Plus I'd need hundreds of them:) The "Alexa, add XXXX to my shopping list" works really well for me.

I don't wear a watch around the house, and I don't really like smart watches honestly - I like big heavy metal automatics. Likewise I enjoy being able to leave my phone on the charger somewhere else. I dunno. I mean long term dream would be integrating things like "smart" mirrors, semi-smart AI-esque assistants, and so on. Then again, I'm currently in a zero-tech house on the beach in Costa Rica, with a barely there internet connection, so it's not like any of this is that important...


Let me introduce you to Dasher: https://github.com/maddox/dasher


Personally I'd love to automate

* lights
* alarm clock (setting, turning it off and on, 15 more minutes)
* be able to turn on my xbox/fire tv and tell it I want to watch <x>, then it figures out how
* taking the dog out when it's 1am and I just want to go to bed (okay this one might take a little longer).


I'd also like to ask: how is inviting more networked microphones into your house a good idea?


OP more describes home control than home automation. Home automation is the house intelligently doing things for you, without your prompting. Home control is the house doing things for you when you ask it to do things for you.

E.g. Home automation: Every day at sunset, lower the blinds

E.g. Home control : You say "House, lower the blinds now".


Agreed. I want minimal technology at home. I deal with it enough all day at work.


I just want technology that WORKS at home. Too much of my time is spent fixing other people's "mistakes". I find Alexa fits that bill. Takes a bit of learning the right cadence of speaking to her. And unfortunately she has the same name as my daughter, which causes some interesting interactions.


lots of things would be nice to have control of from work, or even just from bed at night. climate control, door locks, lights, etc.


Yeah, and what about the endless false positives in every room of the house?


It's pretty hard to order something via Alexa - I know, my friends have tried to order me several things ranging from sex toys to more Echos. Never worked. Please don't spread nonsense.


Didn't know that. Edited to reflect the actual concern, minus the humor.


Yeah, if anyone found something equivalent to the Echo in mic tech that I can hook up to my own setup, I'd die to know. That's the one thing I can't really replicate on my own system.


The kinect has gotten good reviews for its microphone array


I haven't bothered getting the SDK for the Kinect 2 yet but both of them seem to only be good for about 135 degrees around the device - the Echo is a much better omnidirectional microphone.


The kinect seems pretty cool. I've heard of one person who brought it into work and used it to identify and greet people who walked into his office. So there are probably some pretty cool user identification projects you could do with it.


My (admittedly anecdotal) experience having an Xbox One is that it would pick up phrases that barely sounded like built in triggers and do unexpected things at unexpected times, such that I felt forced to disable it.


We've had an Echo since before they were available to the public (ex-Amazon employee here), and it does occasionally beep when it thinks it hears "Alexa."

But it's very rare that it goes beyond that; usually we hear the beep and one of us will yell "never mind!", and then the Echo will go silent. Sometimes it will cancel itself, too. I think that the always-listening part is less discriminating than the active voice recognition, and that it can re-parse the last couple of seconds and decide, in retrospect, that you didn't say "Alexa."
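
To make that guess concrete, here is a minimal sketch of a two-stage wake check: a tolerant detector runs on every frame, and when it fires a stricter check re-parses the last few buffered frames and may cancel. This is only my reading of the behaviour, not how the Echo is actually built; the class, the detector callables, and the buffer size are all hypothetical.

```python
from collections import deque


class TwoStageWake:
    """Cheap always-on detector plus a stricter re-check of recent audio."""

    def __init__(self, cheap_detector, strict_verifier, history_frames=20):
        self.cheap_detector = cheap_detector        # frame -> bool, tolerant
        self.strict_verifier = strict_verifier      # list of frames -> bool, picky
        self.history = deque(maxlen=history_frames)  # the last couple of seconds

    def feed(self, frame):
        """Feed one audio frame; return 'idle', 'cancelled' or 'wake'."""
        self.history.append(frame)
        if not self.cheap_detector(frame):
            return "idle"
        # First stage fired (the "beep"); re-parse the buffered audio.
        if self.strict_verifier(list(self.history)):
            return "wake"
        return "cancelled"
```

The ring buffer is the important part: the second stage sees audio from before the first stage fired, which is what would let it change its mind in retrospect.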

That said, I just recently said to my wife, "I wonder if Alexa knows 'sudo make me a sandwich'". The timing was such that it parsed "sudo make me a sandwich" and answered, "Well, if you ask like that, how can I refuse?" :)

We had a good laugh. Funny thing was, looking at the app later, it was actually parsed as "Pseudo make me a sandwich." I bet Google Now would have corrected it to sudo. :)


tl;dr:

It's literally a tutorial on configuring Alexa Voice Services + their sample code on Debian.

The way you interact with it is by clicking on a button in a Java app. No trigger phrase like Echo.


But presumably you could have the Pi listen for a trigger word or whistle or whatever using software running locally, and when triggered, kick over to the Alexa API?


You could set up an IFTTT "Do" button [1] with the Maker Channel, which allows you to make an arbitrary web request. Then have a server running locally that can receive the request and trigger the recording. Node-RED [2] would make setting that server up pretty simple.

[1] https://ifttt.com/products/do/button
[2] http://nodered.org/
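
If you would rather not run Node-RED, the "server that receives the request" part can be sketched with nothing but the Python standard library. The `/listen` path and the `make_trigger_server` helper are names I made up, and whatever `on_trigger` does to start the recording is left to you; nothing here talks to the Alexa service.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_trigger_server(on_trigger, host="127.0.0.1", port=0):
    """Build an HTTP server that calls on_trigger() on POST /listen."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path == "/listen":
                on_trigger()
                self.send_response(204)  # accepted, nothing to return
            else:
                self.send_response(404)
            self.end_headers()

        def log_message(self, fmt, *args):
            pass  # keep the console quiet

    # port=0 lets the OS pick a free port; pass a fixed one in real use
    return HTTPServer((host, port), Handler)
```

Usage would be something like `make_trigger_server(start_recording, port=8080).serve_forever()`, where `start_recording` is your own function. Note that it binds to localhost by default, so for an IFTTT request to reach it you would have to expose it yourself, and think about authentication before you do.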


There's no good open source software to do this.

Also, you need a microphone array to do it reliably (the Echo has 7 microphones).


Yes, but it won't work as well as the Echo, especially in a noisy environment.


To expand a little on this: The Echo has a 7-microphone array which is crucial to speech recognition accuracy. This gives it the best far-field recognition ability of any consumer product I've seen, with the ability to stay accurate even if you're across the room, with music playing. That's just the hardware, and replicating its abilities will not be easy.

On the software side, supposedly they're using Nuance for recognition. Nuance isn't cutting edge: In the tests I've done, Nuance has a Word Error Rate (WER) that's 10%-20% higher than Google's, but it's still much better than something like Pocketsphinx or any other open source recognizer.

There are a lot of factors that go into making a speech interface a good experience for users: Good recognition accuracy even with background noise, good voice activity detection (even with background noise), very accurate word spotting, low latency. It's hard to hit all these things well enough to make the interface usable.
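
For anyone unfamiliar with the WER metric mentioned above: it is the word-level edit distance between what was said and what was recognized, divided by the number of words actually said. A minimal sketch follows (the function name is mine, and real evaluations usually normalize case and punctuation first, which this skips):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

Using the example from elsewhere in this thread, `word_error_rate("sudo make me a sandwich", "pseudo make me a sandwich")` is one substitution over five words, so 0.2.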


That's against the Alexa Voice Service ToS though


Does it mean that we can run it on any computer that runs Java? I read through the tutorial but couldn't find anything specifically tied to the Raspberry Pi.


Probably. The nice thing about the Pi is that it's cheap and has crazy low energy consumption


the trick left for the "makers" is to add a button to Raspberry Pi that will let you press it and have the app "listen" to your voice.


From what I understand, the echo has a specific piece of hardware in it that is 'always listening', and once triggered via voice command the echo actually begins to listen. So unless you have something connected to the device that could reproduce that initial voice analysis hardware, you can't have the 'always listening' feature.


Probably for power reasons, much like the most recent iPhones can be controlled by "Hey Siri". However, older iPhones can always listen too, but are required to be plugged in to do it because they're using their main processor and doing it at a software level.

In short, always listening isn't difficult on non-battery devices, it's just a software problem.


I made an Alexa clone and use PocketSphinx to listen out for a wake word.

There's a phrase detection function you can configure to trigger audio streaming to the cloud.


This is awesome. I am strongly considering setting this up as I just purchased a fresh raspberry pi.

The only limitation appears to be you have to click a "start listening" button to get it to start recording audio. You can't simply say "Alexa" to get the raspberry pi + alexa web service to listen for your query.

Anyone have any ideas for a work around/ solution to this?


On a related note, check this project out. http://jasperproject.github.io/

You can perhaps trigger alexa to start listening through it by wiring the voice recognition to click the "start listening" button.


Jasper is outdated and incredibly difficult to install on any recent model (2, 3 or 0).


Yes, this is what I came in to say. You can hardwire a phrase similar to how Jasper does it and use that to trigger the start listening method.


This actually sounds more like my ideal.

I've heard the anecdotes about Alexa responding erroneously when people weren't home and doing things like turning the furnace on, etc. That combined with the general privacy concerns make me much more comfortable with being able to push a button to get her to listen – but I'd rather it be a button on my person – like on my fitbit, watch, etc. – easy, always available – plus maybe a way to turn it on for listening over a duration – say while cooking.


Haven't tried it yet, but I hear Pocketsphinx has a keyword spotting mode.

https://github.com/cmusphinx/pocketsphinx


Look into OpenEars. It's pretty crummy but it's free :/


I wrote a simple clap trigger (using the raspberry micro) that is always listening. I use it to shuffle my Philips Hue light, but it can easily be used to trigger echo voice: https://github.com/131/clap-trigger
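
For the curious, the general idea behind a clap trigger can be sketched in a few lines. This is not the code from the linked repo, just an illustration under my own assumptions: samples are floats in the range -1 to 1, a clap is a loud sample that follows a stretch of quiet, and the threshold and timing values are guesses you would need to tune.

```python
def detect_claps(samples, sample_rate, threshold=0.6, quiet_s=0.1):
    """Return times (seconds) where a loud onset follows a stretch of quiet."""
    quiet = int(quiet_s * sample_rate)
    claps = []
    last_loud = -quiet  # so a clap at sample 0 still counts
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if i - last_loud >= quiet:
                claps.append(i / sample_rate)
            last_loud = i  # sustained noise keeps pushing this forward
    return claps


def is_double_clap(clap_times, min_gap=0.15, max_gap=0.6):
    """True if two consecutive claps are spaced like a deliberate double clap."""
    return any(min_gap <= b - a <= max_gap
               for a, b in zip(clap_times, clap_times[1:]))
```

Requiring a double clap rather than a single one cuts down on false triggers from doors and dropped cutlery, at the cost of a slightly slower response.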


I'm imagining carrying around a little bell.


I have Blather configured to do various key-presses and so control pentadactyl and so control firefox.

It could be set up to provide whatever signals a button can provide.

http://www.jezra.net/projects/blather


I think part of the terms for the Alexa voice service forbid auto-listening.


You could use a bluetooth remote with 1 button or so to trigger that listening. I don't think the RPI is powerful enough to do voice recognition.


It is indeed for a limited number of phrases! I have used it that way for at least 2 years with Jasper and PocketSphinx


I wish I could go one step further and instead of even having a mic on the device, use a web app on my phone to record and send the audio to the pi


Privacy concerns aside, this is pretty damn cool.

I've been looking for an excuse to tinker with a raspberry pi for a while - this seems like something I could have some fun with then give away to someone less paranoid/concerned with the privacy issues.


Well, isn't the point here that you can verify that it isn't listening except when you want it to?


That's a great point. A little openness can only help Amazon sell even more of these things. I don't have one, but everybody I know who does really likes it.


You do have more control, sure, but it's still cloud dependent


But you can hard-interrupt the microphone. I mean it's a completely different dynamic as far as security.


Sure, but you still have to send your data through their API


Nice to see them walking through pretty much everything from getting your RPi running to making it work with AVS. That said, Sam Machin's Python CHIP / RPi client was there first, and has a smaller footprint: https://github.com/sammachin/AlexaCHIP


Props to Amazon for putting this up. There are hundreds of steps and a lot of it is manual drudgery. 10/10 would hack again.


Out of curiosity, does anyone know what amazon's incentive is to do this?


The value is in Amazon's voice services and speech recognition platform, not in the Echo device itself. As machine learning improves, voice and speech may play a bigger part in user interaction, and Amazon wants to be out in front of that.

The Echo hardware was just a way to get the ball rolling with this platform.


I'd imagine the incentive is more points for the speech recognition model. The race is for who can handle completely unstructured speech best, and the field is vast at this stage.


A direct response to Google opening their Voice Recognition API?


Like with their hackable instant order button, I would expect that at least for the short term, they want to use it as a way to make purchasing as fast and easy as possible, obviously to maximize their income.

Longer term, conversational AI is clearly the next huge thing in sales and customer interaction, and they probably want to start progressively building grounds in that domain


Guessing they'd make more money off of people ordering products through any echo device than they would on their own hardware?


Hypothetically, amazon wants to make money off what people use echo for and not the hardware itself.


What I think you guys are really looking for is something like this: http://www.microsemi.com/products/audio-processing/home-auto.... Ambarella uses those in their IP Cameras designs, so it should be straight forward to integrate...


Does anyone know a way to use the Android Alexa app without buying an Echo device or Fire TV first?


I'm able to use it on an account with only a device generated via the Alexa Voice Service. I made a device profile on the developer site, then authenticated to it via the OAuth flow.


Has anyone done hardware tinkering with the Echo? Does it run Linux? what does the mic array look like? Possible to use just the mic array and pipe the audio elsewhere?


Has anyone found a decent solution to hooking more than one mic input into an RPi? Something that would allow doing some simple DSP across, say, a 4-input array?
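
On the DSP half of the question, the simplest thing to try across a small array is delay-and-sum beamforming: shift each channel so sound from the direction you care about lines up, then average. A minimal sketch, assuming you already have the four channels as equal-length sample lists and know the per-microphone delays in whole samples (both are assumptions, and getting synchronized channels into the Pi is the part this does not solve):

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay, then average.

    channels: list of equal-length sample lists, one per microphone.
    delays:   per-channel arrival delay in samples (>= 0) for the
              direction you want to favour.
    """
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    length = len(channels[0])
    out = []
    for n in range(length):
        total = 0.0
        for ch, d in zip(channels, delays):
            idx = n + d  # read later samples from mics the sound reaches later
            if 0 <= idx < length:
                total += ch[idx]
        out.append(total / len(channels))
    return out
```

Sound arriving from the chosen direction adds up coherently while sound from elsewhere is smeared out, which is a large part of why arrays do better than a single mic in a noisy room.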


We have 100% totally pivoted on this one. Every proposal we put out, now has Echo front and center. As we say "screens", you mean like your father/mother used to use? How old school. A screen? Oh boy ... :-)

As Woz says, "bigger than the iPhone." That sounds like a hell of a prediction to me. Woz knows all. :-)


Would there be a way to dodge the privacy issues with this, by spoofing the service somehow?


I suppose, if you feel like implementing all the API calls yourself on your own server.


Self-signed cert?

Missed opportunity for Amazon to push Let's Encrypt...


For what reason? Amazon offers their own free certs [0] and I doubt you can get an LE cert for a local IP address.

[0]: https://aws.amazon.com/de/blogs/aws/new-aws-certificate-mana...


Anyone know how to buy one if based outside the US



