This should be a lesson to people who don't understand the distinction between "can't" and "won't".
If you're running things yourself and control the encryption keys required to access your data, then your service provider can't be compelled to release your data as it's not possible[1][2].
If you're delegating all of that to your service provider and they have access to the raw data, then you are putting all your trust in them to protect your data and prevent its release. And that has to cover everything from hackers, to snooping employees, to the Feds.
[1]: Kind of ... I don't recall the Apple/FBI case going to court for a final resolution so it's possible they can compel the service provider to hack you to get the keys but at least they can't get it directly.
[2]: And obviously they can always come after you with a court order or rubber hose (or both).
> In Tarsnap I might take this to an extreme — in addition to the aforementioned encryption, I encourage users to read the tarsnap source code rather than trusting that I got everything right
You still have to trust that what's running on your machine and on the servers is compiled directly from the source you have access to, right?
Is there a trusted compilation service that uses a distributed agreement mechanism, à la blockchain or DHT, for validating some git/svn hash against a binary?
Every downloadable source publishes its own binary hashes, but the whole practice is somewhat moot in the event of server compromise, not to mention build reproducibility.
True, although if someone was going to put a backdoor into a compiler, they'd have to be idiots to attack tarsnap; much better to have it trojan an OS kernel or something else which is trusted and far more widely used.
If you build it yourself with the same compiler and options, and get the same bit-exact binary, then it's safe to assume the binary was built from those sources, since infiltrating the compiler team is not an easy job.
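For what it's worth, once you have a bit-exact local build the comparison itself is trivial. A small Python sketch (the binary name and published digest here are just hypothetical placeholders):

    import hashlib

    LOCAL_BUILD = "tarsnap"            # the binary you compiled yourself
    PUBLISHED_SHA256 = "0123abcd..."   # hypothetical digest from the project's release page

    def sha256_of(path: str) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    if sha256_of(LOCAL_BUILD) == PUBLISHED_SHA256:
        print("bit-exact match: your build corresponds to the published binary")
    else:
        print("mismatch: different compiler/options, or something was tampered with")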
>If you're running things yourself and control the encryption keys required to access your data, then your service provider can't be compelled to release your data as it's not possible
The court can just hold you in contempt until you do[1]. They also probably won't buy "I forgot".
If you delete the data prior to getting the notice to produce it, and it wasn't deleted in anticipation of such a notice, then sure, the court could hold you in contempt, but legally you've done nothing wrong.
You cannot easily make a random third party delete your data.
You might put yourself in a worse position if the court compels you to produce the data, but now you can't. You better be very certain that you are able to prove that you deleted the data prior to any notice, or you might find yourself in for a long jail stay.
This is what a lot of people overlook: when the court compels you to produce evidence, or ill-gotten gains, saying "I can't" isn't a legal defense.
> This is what a lot of people overlook: when the court compels you to produce evidence, or ill-gotten gains, saying "I can't" isn't a legal defense.
We're getting closer and closer to testing that. IANAL but I'd imagine that some combination of the 4th and 5th amendments should cover that situation. The onus would be upon the prosecutor to prove that you destroyed the evidence after the fact.
As a general rule, one does not have to prove they didn't commit a crime, the prosecutor has to prove that they did. Where this gets murky is if a judge orders you to present the non-existent data as contempt of court may apply.
There's lots of case history where people are faced with a court order compelling them to return ill-gotten gains, and when they claim "I can't do it, I don't have the money," they are found guilty of contempt of court if the court believes otherwise. The burden of evidence is not on the court to prove you do have the money; just as in this case, the burden wouldn't be on them to prove you can retrieve the data.
It depends. If you delete all data after a year pursuant to a well-established practice, the court should not be able to hold you in contempt. But if your explanation smacks of destruction of evidence, then you've got a problem. Remember, it's not about what you did, in fact (the judge and the prosecutor can't know that). It's about what the circumstantial evidence suggests you did.
Just to be clear, my assumption was that the jurisdiction was US federal law, the entity deleting the data was an individual, the data was their data, and the systems deleting it were owned by them; this is based on the context of the comment I replied to in this thread, as I understood it.
As such, would you please explain why you provide a timeframe for keeping the data prior to deleting it?
I think he's saying "If you have a policy of deleting your data older than X, you are OK." You have to be able to prove that well before being asked for the data, you had a policy that resulted in it being deleted.
Where I work, we delete emails older than 6 months as a company policy. I have to assume this is done for exactly this reason. Don't ask why we might want to do that, I'm not in charge. :-)
Right, though my assumption is the related statements are in the context of the comment I replied to, that being an individual using only systems owned by them. As such, beyond my prior comment in this thread, I'm not aware of any law related to an individual that would bar them from deleting data at will, assuming no legal request to preserve it was expected or in place.
Opening up to any situation beyond that would be dependent on the context of that situation.
If I am missing something, please explain explicitly what you are referencing.
Well it would still be a whole lot better than what happens now. Microsoft, for instance, sued the DoJ recently because almost half of its data requests came with a gag order.
So you wouldn't even know when they got your data, and then perhaps used it against you - in secret (like putting you on a no-fly list, etc). And then they won't even tell you why you're on the no-fly list or why you're getting an audit.
The whole system is broken from top to bottom, and I think that has a lot to do with the fact that politicians who are supposed to represent the people, don't care about what the people want anymore or to actually represent them. They care about what various rich people or powers want them to do. It's not just me saying that:
Leave the thing on a timed kill switch requiring secure input to reset the timer. It acts like a warrant canary, because the default is that without acting to intercede, your data is physically destroyed. That said... unless you have something really critical you must hide, potentially at the expense of whatever an angry/frustrated state might do to you... don't do it.
If you have an incredibly valuable idea, if you're protecting state secrets, if you're a journalist with real integrity, or a criminal who stands to lose more through exposure... then it makes sense. Otherwise, just for kicks, I don't see the point.
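For concreteness, a dead-man timer along those lines could be as small as the sketch below. Everything in it (paths, the one-week deadline, plain file deletion instead of a real secure erase) is an illustrative assumption, not a hardened design:

    import hashlib, hmac, os, time

    DATA_PATH = "secrets.tar.gpg"                         # hypothetical encrypted archive
    STAMP_PATH = "last_checkin"                           # time of the last valid reset
    DEADLINE = 7 * 24 * 3600                              # one missed week triggers the wipe
    SECRET_HASH = hashlib.sha256(b"change-me").digest()   # placeholder; provision a real passphrase hash

    def reset_timer(passphrase: str) -> bool:
        digest = hashlib.sha256(passphrase.encode()).digest()
        if not hmac.compare_digest(digest, SECRET_HASH):
            return False                                  # wrong secret: the clock keeps ticking
        with open(STAMP_PATH, "w") as f:
            f.write(str(time.time()))
        return True

    def enforce() -> None:
        try:
            last = float(open(STAMP_PATH).read())
        except (OSError, ValueError):
            last = 0.0                                    # no valid check-in on record
        if time.time() - last > DEADLINE and os.path.exists(DATA_PATH):
            os.remove(DATA_PATH)                          # stand-in for a real secure erase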
They can only reveal voice data IF the victim said "Alexa" when the crime was taking place. If the victim didn't say "Alexa" that day, then there's nothing to reveal. Your example about encryption doesn't make sense in the context of the Echo.
I'm not certain there exists nothing to reveal. Is there a technical limitation of the system that prevents or avoids recording of "non-'Alexa'" audio? Or do I just trust Amazon when they say their always-on device only records when triggered with the magic word?
As I think about it, I'd ask the same regarding the voice-controllable phone in my pocket.
FWIW, I don't trust the devices, because their operations are out of any consumer's ability to control them, but maybe I'm missing something about their technical limitations that would work in my favor.
Totally possible. As an example, Codec2 used in ham-radio applications can send pretty decent voice audio at a few kilobit/s or less, and it keeps sounding perfectly understandable even at 1.2 kbit/s like this short sample:
http://www.rowetel.com/downloads/codec2/hts1a_1200.wav
To spell it out, that's < 12.7 megabytes every 24 hours. If there are 10 million Echo devices, that's 127 million megabytes a day, or 127 terabytes a day. That's actually not that hard to handle for the company that runs AWS, so while extremely unlikely, it's not impossible. Just very, very costly.
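For anyone who wants to check those figures, here's the back-of-envelope arithmetic as a quick Python sketch (the 10 million device count is just the assumption above):

    BITRATE_BPS = 1200                 # Codec2 voice at 1.2 kbit/s
    SECONDS_PER_DAY = 86_400
    DEVICES = 10_000_000               # assumed fleet size from the comment above

    per_device_bytes = BITRATE_BPS * SECONDS_PER_DAY / 8
    fleet_bytes = per_device_bytes * DEVICES
    aggregate_bps = BITRATE_BPS * DEVICES

    print(f"per device:  {per_device_bytes / 2**20:.1f} MiB/day")   # ~12.4 MiB
    print(f"whole fleet: {fleet_bytes / 10**12:.0f} TB/day")        # ~130 TB
    print(f"aggregate:   {aggregate_bps / 10**9:.1f} Gbit/s")       # 12 Gbit/s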
You don't need to record 24/7. A well-planned spying operation would involve multiple devices and connections, location discovery, proximity with other devices, etc.
Phones, tablets, home and laptop PCs, car PCs, smart TVs, and pretty much every connected device can be hijacked into becoming a bug or into cooperating with any of the others if in proximity.
The victim's cellphone could establish a secure connection via WiFi or Bluetooth with the Echo or any similar assistant, grab the audio data to transmit, alert the user that some important upgrade is needed on the phone, then start transmitting the data and fake some random download just to make the downlink look as if it's receiving something. That way those 12 megabytes of data would remain totally unnoticed.
This is of course the product of tinfoilhattery at its finest level, until someone does it for real.
Sure, I was just providing an upper bound for requirements if they decided to store all audio all the time from every device. Of the companies that have the capability to do so with their own resources, Amazon is on the short list. Amazon could possibly pull it off and hide it in the rounding of numbers for their normal business.
Companies that interact (peer) with them would likely see something though, but possibly not as easily as it seems. The average home internet connection probably downloads far more than 12.6 MB of content from AWS hosted services every day. The only question is whether the upload amount would trigger any alarms. I think in most cases not, as it would probably just go a very small amount towards evening those peering connections out, which are likely very heavy in the other direction.
Easy answer. They may not have to send the audio. They could transcribe it locally at the client, encrypt, and send text to store on the server. Consider that almost a decade ago, programs like Dragon NaturallySpeaking could be run on a relatively inexpensive laptop. It's entirely possible that a dedicated device like the Echo could do this today.
EDIT: Original reply sounded too definitive
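As a rough illustration of that idea, and emphatically not a claim about how the Echo actually works, an offline transcribe-then-encrypt step might look like this, assuming the SpeechRecognition/pocketsphinx and cryptography packages and a hypothetical local recording:

    import os
    import speech_recognition as sr
    from cryptography.fernet import Fernet

    AUDIO_FILE = "living_room_sample.wav"     # hypothetical local recording

    recognizer = sr.Recognizer()
    with sr.AudioFile(AUDIO_FILE) as source:
        audio = recognizer.record(source)     # load the clip into memory

    # Offline transcription on the client (CMU Sphinx, no network round-trip)
    transcript = recognizer.recognize_sphinx(audio)

    # Encrypt before anything leaves the device; in practice the key would be
    # provisioned ahead of time rather than generated per run
    key = Fernet.generate_key()
    ciphertext = Fernet(key).encrypt(transcript.encode("utf-8"))

    print(f"audio on disk:        {os.path.getsize(AUDIO_FILE)} bytes")
    print(f"encrypted transcript: {len(ciphertext)} bytes")   # typically far smaller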
Totally not possible. 1.2 kbps * 10M devices is 12 Gbps, or greater than the bandwidth of an STM-64 link. Not practical to either receive or store, even for Amazon, and certainly bandwidth consumption on that scale would be extremely noticeable.
I'm not sure why you would assume one of the largest computational and datacenter service providers in the world with many datacenters in many regions would require all input to be over a single connection to a single location, and even if it was, why it wouldn't come across the many, many peering agreements they have.
There are many reasons why it doesn't make sense for them to do this, but this isn't one of them.
Edit: To clarify, and put this in perspective, 12 Gbps is 1.5 GB per second, which works out to roughly the 127 terabytes a day mentioned above. Amazon, through AWS in multiple regions, is entirely capable of adding 127 terabytes of storage a day, and already transfers MUCH more than 12 Gbps. This is not impossible, just very improbable.
Compression over an hour or a day won't do anything meaningful vs compression over, say, a minute. There's just not that much additional redundancy to eliminate. Uploading in a big burst doesn't save on overall bandwidth, either.
I see a lot of people in this thread saying that's not possible, so here's the math:
As given by squarefoot, you can record human voice at 1200 bit/sec = 150 bytes/sec. A day is 86400 seconds, assume people are talking (generously) 10% of the day, so 8640 seconds * 150 bytes = 1.3 megabytes per day uploaded to Amazon.
Does anyone seriously doubt that _the company that runs AWS_ is capable of dealing with barely a megabyte per device per day?
You do realize this is the same company running AWS, one of, if not the, largest networks of cloud services. They can probably do that with the idle capacity of one region.
I don't have a link, but the Echo doesn't transmit network data until the watchword is detected. It probably has a small local buffer, but not large enough to be useful here. So I expect that an unmodified Echo would not provide historical evidence, but there's probably a way for one to be configured to stream audio continuously.
Unfortunately the only way to check if the above is true would be by examining the device firmware.
Also, the definition of "unmodified" and "configured" is very broad in this context. There's no need for a screwdriver or messing with a JTAG port when a well hidden magic packet coming from the Internet can trigger a watchword=off function.
Right, but this is a targeted action that can only be applied to a limited number of people without causing problems for Amazon. It's not useful when something unexpected happened and there was an echo present.
Listening locally for a trigger word != listening, recording and transmitting. The point of the echo is to have an on-demand voice channel to an artificially intelligent assistant.
The device is always listening (unless you mute the mic), but that doesn't mean it's always sending a stream to Amazon.
The device has to continuously process audio to listen for the wake word (upon which it does start sending everything to Amazon), keeping a few seconds of buffer from before that so you can just bark out commands rather than waiting for it to acknowledge you.
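That behaviour is easy to sketch: a small rolling pre-roll buffer plus a local wake-word check, with nothing leaving the device until the check fires. The detector and uploader below are placeholders, not the Echo's actual firmware:

    from collections import deque

    FRAMES_PER_SECOND = 50                      # assume 20 ms audio chunks
    PREROLL_SECONDS = 3                         # only this much history ever exists locally
    preroll = deque(maxlen=FRAMES_PER_SECOND * PREROLL_SECONDS)

    def wake_word_detected(frame: bytes) -> bool:
        # Placeholder for an on-device keyword spotter ("Alexa")
        return False

    def upload(frames: list) -> None:
        # Placeholder for the cloud upload that only begins after the wake word
        print(f"uploading {len(frames)} frames")

    def handle_frame(frame: bytes, streaming: bool) -> bool:
        preroll.append(frame)                   # anything older than the pre-roll is discarded
        if streaming:
            upload([frame])                     # mid-query: keep streaming
            return True
        if wake_word_detected(frame):
            upload(list(preroll))               # send the buffered pre-roll plus the wake word
            return True
        return False                            # idle: nothing leaves the device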
The device does claim to only send voice recordings to Amazon during a query (when you say the wake word).
You can ask Alexa, "Are you always listening?" to get details about it. It's possible that the device doesn't have anything interesting anyway and they were just resisting the disclosure for PR reasons.
I happen to have met the hardware engineers who developed the chipset and microphone array PCB inside the Echo and they told me there is no recording unless the trigger word is detected.
I saw no evidence to prove or disprove this and there's nothing to say that the device couldn't be altered after installation.
Once upon a time this behavior would have been determined by hardware. For a device developed this decade, however, there's no way that a PCB design would determine this. Rather, this behavior is determined by the firmware loaded onto the microcontroller.
Even if one didn't know that, the fact that the trigger word is easily changed by the user should indicate that everything is in software.
Makes me think that in future cases when someone says in court "I do not recall", we are going to see Google Home / Echo data being called into play. As someone with 3 Google Home devices in the house, I feel like the fears I've been suppressing are all coming true.
Home automation is an amazingly cool thing to do, and I theorize that it doesn't become a true part of your home unless you remove as many of the barriers to regular use as possible. Voice is very convenient, and access on every floor of the house was therefore something I wanted, so I bought 3.
The experiment is still in progress, but I can tell you they're 90% used for playing music right now :)
Apple and Amazon also have some amount of clout that helps them stand their ground to a great degree in situations like this. The Feds realize that if they shut down Amazon or Apple, the country is going to fall to pieces.
As usual, startups are the ones that will suffer if faced with a situation like this.
But in this case, even if the defendant had developed a system themselves and had complete control over the data, nothing would prevent them from turning it over as well. Here they are complying with a customer's wishes, and I imagine the same person would decide to open up their own data if they had developed their own system.
That assumes the data is there in the first place. What if Alexa just didn't record everything everyone said? It just sounds really, really dumb.
Like Snapchat saving every image privately shared on a server. It's such a big scam. Sole reason I'm not buying their shares.
I'm hoping someday we'll get true offline speech-to-annotated-text translation that we as consumers control. Like I control what data is on my machine's HDD.
That's a good lesson, but I don't see how this example teaches it. The defendant consented to releasing the information. If Echo were designed to give the user total control over the encryption keys required to access their data, they still could have consented.