This should be a lesson to people who don't understand the distinction between "can't" and "won't".
If you're running things yourself and control the encryption keys required to access your data, then your service provider can't be compelled to release your data as it's not possible[1][2].
If you're delegating all of that to your service provider and they have access to the raw data, then you are putting all your trust in them to protect your data and prevent its release. And that has to cover everything from hackers, to snooping employees, to the Feds.
[1]: Kind of ... I don't recall the Apple/FBI case going to court for a final resolution so it's possible they can compel the service provider to hack you to get the keys but at least they can't get it directly.
[2]: And obviously they can always come after you with a court order or rubber hose (or both).
> In Tarsnap I might take this to an extreme — in addition to the aforementioned encryption, I encourage users to read the tarsnap source code rather than trusting that I got everything right
You still have to trust that what's running on your machine and on the servers is compiled directly from the source you have access to, right?
Is there a trusted compilation service that uses a distributed agreement mechanism, à la blockchain or DHT, for validating some git/svn hash against a binary?
Every downloadable source publishes its own binary hashes, but the whole practice is somewhat moot in the event of server compromise, not to mention build reproducibility.
True, although if someone was going to put a backdoor into a compiler, they'd have to be idiots to attack tarsnap; much better to have it trojan an OS kernel or something else which is trusted and far more widely used.
If you build it yourself with the same compiler and options, and get the same bit-exact binary, then it's safe to assume the binary was built from those sources, since infiltrating the compiler team is not an easy job.
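For what it's worth, once you have a bit-exact local build the comparison itself is trivial. A small Python sketch (the binary name and published digest here are just hypothetical placeholders):

    import hashlib

    LOCAL_BUILD = "tarsnap"            # the binary you compiled yourself
    PUBLISHED_SHA256 = "0123abcd..."   # hypothetical digest from the project's release page

    def sha256_of(path: str) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    if sha256_of(LOCAL_BUILD) == PUBLISHED_SHA256:
        print("bit-exact match: your build corresponds to the published binary")
    else:
        print("mismatch: different compiler/options, or something was tampered with")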
>If you're running things yourself and control the encryption keys required to access your data, then your service provider can't be compelled to release your data as it's not possible
The court can just hold you in contempt until you do[1]. They also probably won't buy "I forgot".
If you delete the data prior to getting the notice to produce it, and it wasn't deleted in anticipation of such a notice, then sure, the court could hold you in contempt, but legally you've done nothing wrong.
You cannot easily make a random third party delete your data.
You might put yourself in a worse position if the court compels you to produce the data, but now you can't. You better be very certain that you are able to prove that you deleted the data prior to any notice, or you might find yourself in for a long jail stay.
This is what a lot of people overlook: when the court compels you to produce evidence, or ill-gotten gains, saying "I can't" isn't a legal defense.
> This is what a lot of people overlook: when the court compels you to produce evidence, or ill-gotten gains, saying "I can't" isn't a legal defense.
We're getting closer and closer to testing that. IANAL but I'd imagine that some combination of the 4th and 5th amendments should cover that situation. The onus would be upon the prosecutor to prove that you destroyed the evidence after the fact.
As a general rule, one does not have to prove they didn't commit a crime, the prosecutor has to prove that they did. Where this gets murky is if a judge orders you to present the non-existent data as contempt of court may apply.
There's lots of case history where people are faced with a court order compelling them to return ill-gotten gains, and when they claim "I can't do it, I don't have the money," they are found guilty of contempt of court if the court believes otherwise. The burden of evidence is not on the court to prove you do have the money; just as in this case, the burden wouldn't be on them to prove you can retrieve the data.
It depends. If you delete all data after a year pursuant to a well-established practice, the court should not be able to hold you in contempt. But if your explanation smacks of destruction of evidence, then you've got a problem. Remember, it's not about what you did, in fact (the judge and the prosecutor can't know that). It's about what the circumstantial evidence suggests you did.
Just to be clear, my assumption was that the jurisdiction was US federal law, the entity deleting the data was an individual, the data was their data, and the systems deleting it were owned by them; this is based on the context of the comment I replied to in this thread, as I understood it.
As such, would you please explain why you provide a timeframe for keeping the data prior to deleting it?
I think he's saying "If you have a policy of deleting your data older than X, you are OK." You have to be able to prove that well before being asked for the data, you had a policy that resulted in it being deleted.
Where I work, we delete emails older than 6 months as a company policy. I have to assume this is done for exactly this reason. Don't ask why we might want to do that, I'm not in charge. :-)
Right, though my assumption is the related statements are in the context of the comment I replied to, that being an individual using only systems owned by them. As such, beyond my prior comment in this thread, I'm not aware of any law related to an individual that would bar them from deleting data at will, assuming no legal request to preserve it was expected or in place.
Opening up to any situation beyond that would be dependent on the context of that situation.
If I am missing something, please explain explicitly what you are referencing.
Well it would still be a whole lot better than what happens now. Microsoft, for instance, sued the DoJ recently because almost half of its data requests came with a gag order.
So you wouldn't even know when they got your data, and then perhaps used it against you - in secret (like putting you on a no-fly list, etc). And then they won't even tell you why you're on the no-fly list or why you're getting an audit.
The whole system is broken from top to bottom, and I think that has a lot to do with the fact that politicians who are supposed to represent the people, don't care about what the people want anymore or to actually represent them. They care about what various rich people or powers want them to do. It's not just me saying that:
Leave the thing on a timed kill switch requiring secure input to reset the timer. It acts like a warrant canary, because the default is that without acting to intercede, your data is physically destroyed. That said... unless you have something really critical you must hide, potentially at the expense of whatever an angry/frustrated state might do to you... don't do it.
If you have an incredibly valuable idea, if you're protecting state secrets, if you're a journalist with real integrity, or a criminal who stands to lose more through exposure... then it makes sense. Otherwise, just for kicks, I don't see the point.
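For concreteness, a dead-man timer along those lines could be as small as the sketch below. Everything in it (paths, the one-week deadline, plain file deletion instead of a real secure erase) is an illustrative assumption, not a hardened design:

    import hashlib, hmac, os, time

    DATA_PATH = "secrets.tar.gpg"                         # hypothetical encrypted archive
    STAMP_PATH = "last_checkin"                           # time of the last valid reset
    DEADLINE = 7 * 24 * 3600                              # one missed week triggers the wipe
    SECRET_HASH = hashlib.sha256(b"change-me").digest()   # placeholder; provision a real passphrase hash

    def reset_timer(passphrase: str) -> bool:
        digest = hashlib.sha256(passphrase.encode()).digest()
        if not hmac.compare_digest(digest, SECRET_HASH):
            return False                                  # wrong secret: the clock keeps ticking
        with open(STAMP_PATH, "w") as f:
            f.write(str(time.time()))
        return True

    def enforce() -> None:
        try:
            last = float(open(STAMP_PATH).read())
        except (OSError, ValueError):
            last = 0.0                                    # no valid check-in on record
        if time.time() - last > DEADLINE and os.path.exists(DATA_PATH):
            os.remove(DATA_PATH)                          # stand-in for a real secure erase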
They can only reveal voice data IF the victim said "Alexa" when the crime was taking place. If the victim didn't say "Alexa" that day, then there's nothing to reveal. Your example about encryption doesn't make sense in the context of the Echo.
I'm not certain there exists nothing to reveal. Is there a technical limitation of the system that prevents or avoids recording of "non-'Alexa'" audio? Or do I just trust Amazon when they say their always-on device only records when triggered with the magic word?
As I think about it, I'd ask the same regarding the voice-controllable phone in my pocket.
FWIW, I don't trust the devices, because their operations are out of any consumer's ability to control them, but maybe I'm missing something about their technical limitations that would work in my favor.
Totally possible. As an example, Codec2 used in ham-radio applications can send pretty decent voice audio at a few kilobit/s or less, and it keeps sounding perfectly understandable even at 1.2 kbit/s like this short sample:
http://www.rowetel.com/downloads/codec2/hts1a_1200.wav
To spell it out, that's < 12.7 megabytes every 24 hours. If there are 10 million Echo devices, that's 127 million megabytes a day, or 127 terabytes a day. That's actually not that hard to handle for the company that runs AWS, so while extremely unlikely, it's not impossible. Just very, very costly.
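For anyone who wants to check those figures, here's the back-of-envelope arithmetic as a quick Python sketch (the 10 million device count is just the assumption above):

    BITRATE_BPS = 1200                 # Codec2 voice at 1.2 kbit/s
    SECONDS_PER_DAY = 86_400
    DEVICES = 10_000_000               # assumed fleet size from the comment above

    per_device_bytes = BITRATE_BPS * SECONDS_PER_DAY / 8
    fleet_bytes = per_device_bytes * DEVICES
    aggregate_bps = BITRATE_BPS * DEVICES

    print(f"per device:  {per_device_bytes / 2**20:.1f} MiB/day")   # ~12.4 MiB
    print(f"whole fleet: {fleet_bytes / 10**12:.0f} TB/day")        # ~130 TB
    print(f"aggregate:   {aggregate_bps / 10**9:.1f} Gbit/s")       # 12 Gbit/s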
You don't need to record 24/7. A well-planned spying operation would involve multiple devices and connections, location discovery, proximity with other devices, etc.
Phones, tablets, home and laptop PCs, car PCs, smart TVs, and pretty much every connected device can be hijacked into becoming a bug or into cooperating with any of the others if in proximity.
The victim's cellphone could establish a secure connection via WiFi or Bluetooth with the Echo or any similar assistant, grab the audio data to transmit, alert the user that some important upgrade is needed on the phone, then start transmitting the data and fake some random download just to make the downlink look as if it's receiving something. That way those 12 megabytes of data would remain totally unnoticed.
This is of course the product of tinfoilhattery at its finest level, until someone does it for real.
Sure, I was just providing an upper bound for requirements if they decided to store all audio all the time from every device. Of the companies that have the capability to do so with their own resources, Amazon is on the short list. Amazon could possibly pull it off and hide it in the rounding of numbers for their normal business.
Companies that interact (peer) with them would likely see something though, but possibly not as easily as it seems. The average home internet connection probably downloads far more than 12.6 MB of content from AWS hosted services every day. The only question is whether the upload amount would trigger any alarms. I think in most cases not, as it would probably just go a very small amount towards evening those peering connections out, which are likely very heavy in the other direction.
Easy answer. They may not have to send the audio. They could transcribe it locally at the client, encrypt, and send text to store on the server. Consider that almost a decade ago, programs like Dragon NaturallySpeaking could be run on a relatively inexpensive laptop. It's entirely possible that a dedicated device like the Echo could do this today.
EDIT: Original reply sounded too definitive
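As a rough illustration of that idea, and emphatically not a claim about how the Echo actually works, an offline transcribe-then-encrypt step might look like this, assuming the SpeechRecognition/pocketsphinx and cryptography packages and a hypothetical local recording:

    import os
    import speech_recognition as sr
    from cryptography.fernet import Fernet

    AUDIO_FILE = "living_room_sample.wav"     # hypothetical local recording

    recognizer = sr.Recognizer()
    with sr.AudioFile(AUDIO_FILE) as source:
        audio = recognizer.record(source)     # load the clip into memory

    # Offline transcription on the client (CMU Sphinx, no network round-trip)
    transcript = recognizer.recognize_sphinx(audio)

    # Encrypt before anything leaves the device; in practice the key would be
    # provisioned ahead of time rather than generated per run
    key = Fernet.generate_key()
    ciphertext = Fernet(key).encrypt(transcript.encode("utf-8"))

    print(f"audio on disk:        {os.path.getsize(AUDIO_FILE)} bytes")
    print(f"encrypted transcript: {len(ciphertext)} bytes")   # typically far smaller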
Totally not possible. 1.2 kbps * 10M devices is 12 Gbps, or greater than the bandwidth of an STM-64 link. Not practical to either receive or store, even for Amazon, and certainly bandwidth consumption on that scale would be extremely noticeable.
I'm not sure why you would assume one of the largest computational and datacenter service providers in the world with many datacenters in many regions would require all input to be over a single connection to a single location, and even if it was, why it wouldn't come across the many, many peering agreements they have.
There are many reasons why it doesn't make sense for them to do this, but this isn't one of them.
Edit: To clarify, and put this in perspective, 12 Gbps is 1.5 GB per second, which works out to roughly the 127 terabytes a day mentioned above. Amazon, through AWS in multiple regions, is entirely capable of adding 127 terabytes of storage a day, and already transfers MUCH more than 12 Gbps. This is not impossible, just very improbable.
Compression over an hour or a day won't do anything meaningful vs compression over, say, a minute. There's just not that much additional redundancy to eliminate. Uploading in a big burst doesn't save on overall bandwidth, either.
I see a lot of people in this thread saying that's not possible, so here's the math:
As given by squarefoot, you can record human voice at 1200 bit/sec = 150 bytes/sec. A day is 86400 seconds, assume people are talking (generously) 10% of the day, so 8640 seconds * 150 bytes = 1.3 megabytes per day uploaded to Amazon.
Does anyone seriously doubt that _the company that runs AWS_ is capable of dealing with barely a megabyte per device per day?
You do realize this is the same company running AWS, one of, if not the, largest networks of cloud services. They can probably do that with the idle capacity of one region.
I don't have a link, but the Echo doesn't transmit network data until the watchword is detected. It probably has a small local buffer, but not large enough to be useful here. So I expect that an unmodified Echo would not provide historical evidence, but there's probably a way for one to be configured to stream audio continuously.
Unfortunately the only way to check if the above is true would be by examining the device firmware.
Also, the definition of "unmodified" and "configured" is very broad in this context. There's no need for a screwdriver or messing with a JTAG port when a well hidden magic packet coming from the Internet can trigger a watchword=off function.
Right, but this is a targeted action that can only be applied to a limited number of people without causing problems for Amazon. It's not useful when something unexpected happened and there was an echo present.
Listening locally for a trigger word != listening, recording and transmitting. The point of the echo is to have an on-demand voice channel to an artificially intelligent assistant.
The device is always listening (unless you mute the mic), but that doesn't mean it's always sending a stream to Amazon.
The device has to continuously process audio to listen for the wake word (upon which it does start sending everything to Amazon), keeping a few seconds of buffer from before that so you can just bark out commands rather than waiting for it to acknowledge you.
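That behaviour is easy to sketch: a small rolling pre-roll buffer plus a local wake-word check, with nothing leaving the device until the check fires. The detector and uploader below are placeholders, not the Echo's actual firmware:

    from collections import deque

    FRAMES_PER_SECOND = 50                      # assume 20 ms audio chunks
    PREROLL_SECONDS = 3                         # only this much history ever exists locally
    preroll = deque(maxlen=FRAMES_PER_SECOND * PREROLL_SECONDS)

    def wake_word_detected(frame: bytes) -> bool:
        # Placeholder for an on-device keyword spotter ("Alexa")
        return False

    def upload(frames: list) -> None:
        # Placeholder for the cloud upload that only begins after the wake word
        print(f"uploading {len(frames)} frames")

    def handle_frame(frame: bytes, streaming: bool) -> bool:
        preroll.append(frame)                   # anything older than the pre-roll is discarded
        if streaming:
            upload([frame])                     # mid-query: keep streaming
            return True
        if wake_word_detected(frame):
            upload(list(preroll))               # send the buffered pre-roll plus the wake word
            return True
        return False                            # idle: nothing leaves the device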
The device does claim to only send voice recordings to Amazon during a query (when you say the wake word).
You can ask Alexa, "Are you always listening?" to get details about it. It's possible that the device doesn't have anything interesting anyway and they were just resisting the disclosure for PR reasons.
I happen to have met the hardware engineers who developed the chipset and microphone array PCB inside the Echo and they told me there is no recording unless the trigger word is detected.
I saw no evidence to prove or disprove this and there's nothing to say that the device couldn't be altered after installation.
Once upon a time this behavior would have been determined by hardware. For a device developed this decade, however, there's no way that a PCB design would determine this. Rather, this behavior is determined by the firmware loaded onto the microcontroller.
Even if one didn't know that, the fact that the trigger word is easily changed by the user should indicate that everything is in software.
Makes me think that in future cases when someone says in court "I do not recall", we are going to see Google Home / Echo data being called into play. As someone with 3 Google Home devices in the house, I feel like the fears I've been suppressing are all coming true.
Home automation is an amazingly cool thing to do, and I theorize that it doesn't become a true part of your home unless you remove as many of the barriers to regular use as possible. Voice is very convenient, and access on every floor of the house was therefore something I wanted, so I bought 3.
The experiment is still in progress, but I can tell you they're 90% used for playing music right now :)
Apple and Amazon also have some amount of clout that helps them stand their ground to a great degree in situations like this. The Feds realize that if they shut down Amazon or Apple, the country is going to fall to pieces.
As usual, startups are the ones that will suffer if faced with a situation like this.
But in this case, even if the defendant had developed a system themselves and had complete control over the data, nothing would prevent them from turning it over as well. Here they are complying with a customer's wishes, and I imagine the same person would decide to open up their own data if they had developed their own system.
That assumes the data is there in the first place. What if Alexa just didn't record everything everyone said? It just sounds really, really dumb.
Like Snapchat saving every image privately shared on a server. It's such a big scam. Sole reason I'm not buying their shares.
I'm hoping someday we'll get true offline speech-to-annotated-text translation that we as consumers control. Like I control what data is on my machine's HDD.
That's a good lesson, but I don't see how this example teaches it. The defendant consented to releasing the information. If Echo were designed to give the user total control over the encryption keys required to access their data, they still could have consented.