If we are permitted baseless speculation: NO, almost certainly not. Reason #1: A...

mlsu · on Aug 8, 2022

Just to elaborate a bit because this is a pet-peeve.

What's going on here is that people are misdirected. They've seen a bunch of sci-fi where physical robots moving through space can sense the world around them and use the information they gather to make the world legible to them. The assumption (fair, because moving robots look like us!) is that the easiest way to gather information is to physically See, Touch, and Hear it. Sophisticated robot system that looks sci-fi -> this must be the leading edge of the cyberpunk dystopia.

The reality is completely different. Your life is already legible to the computer. For the purposes of whatever cyberpunk dystopia you imagine we are in, your 1kb credit card statement is like an 8K IMAX camera to the Ministry for Coercion.

The leading edge of the cyberpunk dystopia is the apparatus that tries to get you to be more online. Write more posts and emails, sign up for smart banking, use an internet system to keep track of your employees, digitize paper files and make them legible.

Why would any intelligent malignant thing use the extremely power hungry and inefficient raw input from a physical robot for its purposes when it could use the extraordinarily efficient raw text format of bank statements, insurance claims, credit reports, browser history...

Every piece of information that you imagine could be used against you in the hypothetical dystopia hinted at in this piece would fit inside of 100 megabytes of plain text. Plain text that you, or an entity working for your insurance company, typed into a computer. They already have it.

These things -- sci-fi-looking robots -- really are just trying to clean your house better and sell you products more cheaply.

godelski · on Aug 8, 2022

I kinda agree with you, but also not. I think a lot of people here are overestimating what the robots can sense, and yes, that is the sci-fi-like reality you describe. But this doesn't mean we should throw the baby out with the bathwater. Anyone who has seriously worked with data will tell you how much subtle things can reveal. Let's look at these robots. Knowing that there are pets and/or children in a house is useful data for selling kids' or pet toys and other items. You might think "Amazon already has that info from purchasing history," but a second source builds higher confidence and can differentiate whether something was bought for yourself or for a friend. There's also information like how much the Roomba fills up. Are you a messy person or very organized? Do you take off your shoes at the door? When you sit down on the couch? At all? Do you make frequent spills or messes in certain rooms? This obviously has selling potential.

You don't need microphones or super sensors. Basic information can do a lot. There's also a lot to be said for redundancy and verification. I think you're over-correcting in your rebuttal. This is why data is scary: it is simple, and it's about things you wouldn't think are useful. But they are useful when you have thousands or millions of data points.
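The "a second source builds higher confidence" point can be made concrete with a naive-Bayes style update. This is a minimal sketch, assuming the two signals are conditionally independent given the truth; the prior and the likelihood ratios are made-up illustration numbers, not anything Amazon is known to use.

```python
def posterior(prior: float, likelihood_ratios: list[float]) -> float:
    """Update a prior probability with independent evidence by multiplying the odds."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr  # each independent signal multiplies the odds by its likelihood ratio
    return odds / (1 + odds)


# Hypothetical numbers: 30% of households have a pet.
# Signal A (pet-food purchases) is assumed 4x more likely in pet households.
# Signal B (a second, sensor-derived hint) is assumed 3x more likely.
prior = 0.30
only_purchases = posterior(prior, [4.0])
with_second_source = posterior(prior, [4.0, 3.0])
print(f"{only_purchases:.2f} -> {with_second_source:.2f}")  # prints 0.63 -> 0.84
```

The independence assumption is doing real work here: if the two signals are correlated, multiplying their likelihood ratios overstates the confidence.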

mlsu · on Aug 8, 2022

This is a great point, and you are absolutely correct about extracting signal from noise.

But not this kind of data!

There might be some useful signal in the points you describe. But it will be extremely difficult to extract, because first you have to sense it ($$$$ -- what is a pet hair to a robot?), then you have to turn the sensor data into something parsable ($$$$ -- how do you determine if this thing the vacuum picked up is a pet hair?), then you have to do the hard data work to make a link between these physical world things and a signal Amazon actually cares about ($$$$ -- OK, they have a Shih Tzu. How can we improve product recommendations with that?).

The time of a sensor fusion/robotics/statistics person at Amazon will not be spent on any of this, not for many years. Consider the data Amazon does not need these robots to collect, such as zip code, other devices on the LAN, purchase history, login time, and so on: there is enough low-hanging fruit there to occupy every newly minted PhD for years. The opportunity cost of a statistician working on this robot stuff for an hour outweighs the benefit by an order of magnitude or more, and for a robotics expert this is even more true. Robots run into stuff and break in the warehouse every single day; for a company like Amazon, that is millions of dollars.

I do concede: they will definitely collect and store it. But it will be a very long time before any of this data is used in the name of evil. On the other hand, if they make the robot good at cleaning your house, they might sell a $400 robot to a hundred million people.

I totally agree with your point, though, and will offer an Amazon-flavored extension of it:

Look at how Amazon gets YOU to freely give them data to digitize your life. They want you to sign up for digital statements on the ChaseBank Silver Plus Amazon Card. Why? They lobby against data-protection legislation. Why? The ebook app wants location data from your phone. Why? They want you to purchase PS4 games on Amazon, instead of on the publisher's website. Why?

The robots thing is easy to understand for us because we are morphologically similar to Amazon robots. The rest of this stuff, though, is the real low-hanging fruit. And there is enough of it to occupy every single $200,000/yr data science PhD being graduated, for now and for a long time to come.

buran77 · on Aug 8, 2022

> But it will be a very long time before any of this data is used in the name of evil.

I'm surprised that people here are honestly giving Amazon the benefit of the doubt when it comes to data collection. Or asking "what could Amazon do with the data?" and assuming it "must be benign". "Very long time" is very relative. One year? Three years? Even under the relatively naive assumption that Amazon doesn't already have any plan to monetize all that collected data or use it any other way, what happens when they eventually do? At that point, even discounting all the collected data as "expired", Amazon will have an army of these in the wild, and realistically people won't throw them out; they'll just internalize the escalation of data collection and keep going.

This is a Trojan horse, even more so if they come equipped with cameras, LIDAR, and microphones. Nothing wrong with that as long as you don't start from the wrong idea and assumption. This will be used to collect as much data as it can, and you'll have close to no control over that. If you're fine with it or think you can benefit too, all good.

godelski · on Aug 8, 2022

I think it is because people are really bad at logic. Seriously. Most of the arguments on here come from a very bad set of assumptions. The major one here is, like the OP said, some sci-fi sensor data. Even as I tried to explain this, people are still thinking in terms of overly exact data ("Shih Tzu" vs "small dog"). People are correct to counter these claims, but when they do, they usually go overboard. They say the data is useless, or "they already have it, so it is effectively useless." Verification is a powerful tool.

But I think a lot of it really comes down to the fact that people don't think from a statistical or mathematical viewpoint (despite claims). I've probably read more on stats and probability than the vast majority of people here, yet I would never remotely claim I'm good at stats. It is fucking hard, yet so many think it is easy (be wary of anyone who suggests it is easy). So here are some things I notice:

- People think we need crazy specific details to find trends, but you can extract signal from noise with a lot of data (especially diversified data) and very careful insights. The counter to this isn't to just dismiss but to ask if simpler data can be useful[0]. We feel the need to take a drastically opposing argument rather than a nuanced argument.

- Thinking that there are singular causal factors (see the arguments "it is about robots, not data" when it is clearly both). We talk about PCA all the time but a lot of our arguments create frameworks where we're only discussing what is the dominant factor and pretending that others don't matter[1].

- Being nowhere near familiar with emergence and how it plays a role in data and our lives (I suspect lack of emergence knowledge is why so many believe in "deep state" or conspiracies).

- Not framing things from a probabilistic viewpoint. I see this in security arguments (usually among non-experts but tech literate: see the Signal community forums or here). Most security is probabilistic in nature and about putting bounds, not guarantees. The classic example is remotely wiping a phone. If you wipe, there's a probability that an adversary hasn't gotten the data first. If it isn't wiped, said adversary has all the time in the world. Everything, and I mean literally everything, is a probability. What we call truth just has tight bounds.

- Zero-sum fallacy. Many people think the vast majority of games are zero-sum, when very few really are. We see this a lot in economics (value can in fact be added to the system; it is only zero-sum at an infinitesimal point in time). A tide raises or sinks all ships equally; it does not raise some and sink others so that the system balances out.

- Oversimplification, and thinking higher order approximations aren't necessary for "good enough."[1] People think that most probability distributions in the wild are Normal(ish) when they aren't (most have heavy tails). This is all caused by creating a "spherical cow in a vacuum" framework. An oversimplification of a problem isn't necessarily a good approximation and can often lead you in the wrong direction! Especially for the major challenges we face today, we need higher order terms to even get a reasonable approximation. For the mathematically inclined, think of the Taylor series approximation of e^x (1 + x + x^2/2! + ... + x^n/n!). y = 1 isn't a good approximation except in a very narrow region (around x = 0). y = x + 1 can be an even worse approximation (depending on your region of interest)! Even a 4th order approximation is only useful on the interval [-1, 1] (8th order will get us to [-2, 2], maybe [-3, 3]), and it drifts away from e^x quickly beyond that. If we're concerned about x < 0, then the constant approximation y = 1 is better than the next five orders (at x = -100, y = 1 is far closer to e^-100 than the 5th order polynomial is). But if we're concerned with x > 0, then each higher order is better.
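The e^x claims are easy to check numerically. This sketch evaluates the degree-n Taylor polynomial of e^x around 0 and compares it with `math.exp`; the sample points are arbitrary choices.

```python
import math


def taylor_exp(x: float, n: int) -> float:
    """Degree-n Taylor polynomial of e^x around 0: sum of x**k / k! for k = 0..n."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))


def abs_error(x: float, n: int) -> float:
    """Absolute error of the degree-n polynomial against the true e^x."""
    return abs(math.exp(x) - taylor_exp(x, n))


# Error of each approximation at a few points of interest.
for n in (0, 1, 4, 8):
    row = ", ".join(f"x={x}: {abs_error(x, n):.3g}" for x in (-100, -2, -1, 1, 2, 5))
    print(f"degree {n}: {row}")

# Far on the negative side, the constant y = 1 beats every degree from 1 to 5.
print(all(abs_error(-100, 0) < abs_error(-100, n) for n in range(1, 6)))  # True
```

On the x > 0 side every term is positive, so each extra term strictly reduces the error; on the x < 0 side the large alternating terms swamp the result until the degree is much higher.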

The real Trojan horse is the belief we've built that these things don't matter, and that simple answers are good ("you don't know something unless you can explain it to a child" is bullshit). I suspect that this is evolutionary, as this framework has allowed us to solve most of the issues related to survival. The problem is that our modern society is much more complicated than that, and we have effectively solved these survival hardships. The problems we face now are so complicated that our simple frameworks are no match for them. The above are things most people struggle with, yet any single one can quickly ruin our frameworks. I'd argue that we see all of these points showing up in the arguments on this post (this rant is still rooted in the topic, just meta. I'm writing this so we can have better arguments).

[0] To help, let me give a very clear example of something almost trivial to determine but highly useful. Suppose your Roomba constantly bumps into things near a door and those things move every single day. It is very likely that those are shoes. We now know to advertise a shoe rack so the person can organize their shoes at their door. Yes, there are more complicated examples where we can get more intimate details of one's life and sell more products, but the point here is that this is simple, noisy data we can extract signal from, and in a way that purchasing behavior or online behavior wouldn't capture.
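The shoe heuristic can be written down in a few lines once the perception work is assumed away. This is a hypothetical sketch: it assumes clean, per-day bump positions in map coordinates and a known door location (which is where most of the real difficulty lies), and the `Bump` type, the radius, and the thresholds are invented illustration values.

```python
from dataclasses import dataclass
from math import dist
from statistics import mean


@dataclass
class Bump:
    """One hypothetical bump event: the day it happened and where (metres, map frame)."""
    day: int
    x: float
    y: float


def likely_movable_clutter_near(
    bumps: list[Bump],
    door: tuple[float, float],
    radius: float = 1.0,
    min_days: int = 5,
    min_shift: float = 0.2,
) -> bool:
    """Flag obstacles near `door` that get hit on many days but keep changing position."""
    # Keep only bumps within `radius` metres of the door.
    near = [b for b in bumps if dist((b.x, b.y), door) <= radius]

    # Group bump positions by day.
    by_day: dict[int, list[tuple[float, float]]] = {}
    for b in near:
        by_day.setdefault(b.day, []).append((b.x, b.y))

    days = sorted(by_day)
    if len(days) < min_days:
        return False  # too few days with bumps to say anything

    # Average bump position per day, then how far it moves between consecutive observed days.
    centroids = [tuple(map(mean, zip(*by_day[d]))) for d in days]
    shifts = [dist(a, b) for a, b in zip(centroids, centroids[1:])]
    return mean(shifts) >= min_shift


door = (0.0, 0.0)
moving = [Bump(d, 0.3 + 0.3 * (d % 2), 0.2) for d in range(7)]  # alternates between two spots
fixed = [Bump(d, 0.5, 0.5) for d in range(7)]  # same spot every day, e.g. a cabinet
print(likely_movable_clutter_near(moving, door))  # True
print(likely_movable_clutter_near(fixed, door))  # False
```

The output is only a weak hint ("probably shoes or similar clutter"), which is the kind of signal the comment argues is still useful in aggregate.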

mlsu · on Aug 9, 2022

edit:

OK, I re-read this, and it does read sort of confrontational. I actually probably agree with like 90% of what you are saying. Probably not really worth it to quibble over made-up examples like this. I just want to emphasize that robotics in uncontrolled environments is very very hard.

What I'm getting at is: why do the hard robotics work to get a new stream of data when there are many streams that are already computer legible, probably give very similar insights*, and are more compact? The data is not useless, certainly not, but it's actually quite hard to obtain, even with the sensor kit this thing has.

If you were to focus your efforts on starving Amazon's insatiable consumer data appetite, the robot stuff they're talking about here should be very low on your priority list.

Your broader point definitely stands.

(*not precisely the same insights; I certainly see where you are going there!)

-----------------------------

I'm being overly specific to illustrate a point.

Try doing the shoe rack thing, I mean really try it. Is it really that easy? These shoes don't have barcodes the way things at the warehouse do, and the robots already screw that stuff up in the warehouse all the time. How much money will you make improving shoe rack recommendations by 2.5%? How could you come up with that number, which might justify such a project to Amazon executives, who are deciding what you work on? Why spend so much effort on the shoe thing when your robot is already running into the shoes!?

The big thing that I probably should have made clear is that robotics is really hard. Every single separate physical thing your robot runs into is difficult, and they're all different. It would take a lot of $$$ to nail the problem of determining that the thing you ran into is a pair of shoes, or that this area is the front door, or that you even ran into anything at all. Seriously!

Now, imagine putting the same number of statistics people, such as yourself, on a team in Amazon Healthcare. Or even just on a team that gets just 200 more people to sign up for Amazon Healthcare. I would be willing to bet that munging the data that comes out of that is simultaneously far easier to work with (no pesky physical reality!) and an order of magnitude more lucrative than any of these robot projects.

Amazon has five hundred open positions for data scientists. I assure you, none of them will be working on something like this. Not for a long time.

godelski · on Aug 9, 2022

> I just want to emphasize that robotics in uncontrolled environments is very very hard.

I'm not sure what convinced you I was making this argument. My example was illustrative, as a counter to *your example* of determining a specific dog breed. I'm trying to show an easier problem. But "easier" is comparative and doesn't mean the problem is "easy." I'm guessing this is the confusion? Let's be real: figuring out the dog breed by distinguishing dog hair from other debris is substantially harder, and I'm pretty sure you'd need a DNA scan. It would be near impossible (probably entirely impossible) to differentiate hair of any type with current sensors. I don't actually think the shoe example is extremely hard, considering these robots have lidar on them. Lidar can in fact tell you that something is roughly shoe-shaped, so I'm not sure what your gripe is. No need for DNA scanning. No need for cameras or microphones. The problem is likely solvable given *existing instruments on commercially sold products.* This, again, doesn't mean the problem is "easy," though. Just that it can be solved.

> How much money will you make improving shoe rack recommendations by 2.5%?

Okay, but I'm assuming there's more than one problem that the mapping can help solve. So I'm not sure what your point is.

Let's try a trivial example. One that *is easy* and will demonstrate that we can sell things beyond shoe racks with said data. Knowing the (approximate) square footage of your house is useful. For example, if you live in a 500sqft place, you can tell a lot about income because we know this is likely a small apartment. You probably can't afford a lot of stuff and Amazon probably shouldn't advertise luxury goods to you, especially large ones.
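Estimating square footage from a cleaning map is close to arithmetic. A minimal sketch, assuming a hypothetical text occupancy grid ('.' for floor the robot reached, '#' for walls or obstacles) and a known cell size; real robots store maps in their own formats, and floor under furniture the robot can't reach makes this an underestimate.

```python
SQ_FT_PER_SQ_M = 10.7639  # 1 m^2 is about 10.7639 ft^2


def floor_area_sq_ft(grid: list[str], cell_size_m: float) -> float:
    """Estimate reachable floor area from a hypothetical text occupancy grid.

    '.' = floor cell the robot reached, '#' = wall/obstacle, anything else = unexplored.
    Only '.' cells are counted.
    """
    free_cells = sum(row.count(".") for row in grid)
    return free_cells * cell_size_m**2 * SQ_FT_PER_SQ_M


# A 10 m x 5 m open room mapped at 0.5 m resolution (20 x 10 free cells, walled in).
room = ["#" * 22] + ["#" + "." * 20 + "#" for _ in range(10)] + ["#" * 22]
print(round(floor_area_sq_ft(room, 0.5)))  # 538
```

As with the other examples in this thread, the number is best treated as one more noisy feature rather than a precise measurement.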

> Now, imagine putting the same number of statistics people, such as yourself, on a team in Amazon Healthcare.

I'm not sure why you're pigeonholing me and making this a zero-sum game. I'm actively arguing an "and" position, not an "or" position. Yeah, health care is lucrative. But Amazon has an insane amount of wealth, more than they actually know what to do with. They are perfectly capable of hiring beyond the 200 positions in their robotics department, which is what they advertise on their robotics site. They are perfectly capable of increasing this number without taking away any funding from their health care side (not a zero-sum game). I'd also assume that there's a significant amount of domain overlap with what Roombas do: mapping warehouses is a pretty similar problem to mapping houses, so I'm not sure what the problem is.

godelski · on Aug 8, 2022

So I still think your dog example is too complicated. All you need to know is whether there's a four-legged creature walking around. You can probably differentiate a cat from a dog here. You can also probably differentiate a small, medium, or large dog. You don't need to know the exact breed for this data to be incredibly useful. Just knowing the size will get you a lot of useful data. It is really easy to overcomplicate the situation. A little data goes a long way.

tootie · on Aug 8, 2022

It's true. Why is Amazon/Google/Facebook considered the evil triumvirate of the panopticon and not JP Morgan Chase/Capital One/AmEx? These three know more about the US economy than the BLS or Dept of Commerce.

godelski · on Aug 8, 2022

Why are you framing it as an "or" situation? I'm pretty confident most people dislike both these sectors. It is an "and" situation. Turning "ands" into "ors" is really whataboutism.

TrevorJ · on Aug 8, 2022

Given Amazon's reliance on consumer data and metrics, as well as their past behavior with alexa and echo I find this viewpoint to be almost stubbornly naïve.

tra3 · on Aug 8, 2022

Agreed. Somewhat mirroring the "Nothing to hide" argument [0] -- just because I'm currently not doing anything illegal, I'm ok with invasion of privacy.

Amazon et al. need data. Just because I can't think, off the top of my head, of how this data could be monetized doesn't mean that someone with the right incentive hasn't come up with a way to take advantage of and abuse access to that data.

[0]: https://en.wikipedia.org/wiki/Nothing_to_hide_argument

googlryas · on Aug 8, 2022

Would you like to make some specific timebounded claims as to what you believe Amazon will do with this acquisition?

So we can come back later, and tell whether OP is naive and you are a realist, or whether OP is a realist and you are overly paranoid.

sangnoir · on Aug 8, 2022

Within the next 15 years, Amazon will provide data gathered from Roomba LIDAR and/or Sidewalk[1] to police (with or without a warrant) which will tell them how many people are in a private home, and when.

1. Or whatever their blanket WiFi system that piggybacks on your bandwidth is called

googlryas · on Aug 9, 2022

Well, I appreciate you playing, but 15 years is way too long of a time frame. A majority of the bets on longbets.org aren't even for 15 years.

sangnoir · on Aug 9, 2022

Roomba doesn't have a lidar vacuum, AFAIK, so I'm giving them 2-3 years to close the deal and do the requisite product development.

I know Amazon already has their heart in the wrong place (thanks to Ring's LEO posture), but it's going to take time for police departments to cotton on to what is possible (likely from Amazon slides). The surveillance will be technically possible in 5 years, but adjusting for human factors, I think the first such news stories may break in about 7 years, albeit poorly sourced. In 15 years, it will be common knowledge.

That said, you may want to take the sibling commenter up on their offer, as they are confident it's going to happen in 5 years.

OrderlyTiamat · on Aug 8, 2022

Same, but I'm upping it to within 5 years. Given the history with Ring, I do not think it'll take very long. Anyone wanna bet 3 years? Also reasonable, I think, but I'm staying at 5.

zamalek · on Aug 8, 2022

There are going to be high resolution cameras on them in no time, and people are going to love it.

threatofrain · on Aug 8, 2022

> Non-Reason #1: They want to sell your home information to the police state. Come on people, your local government already knows the layout of your home, you have to tell them that when you build it!

Amazon is not the state and gathering information on your home improves their chances to win in the smart home. Having access to your home network also helps.

The Roomba can also improve over time. If Amazon sees the Roomba is successful, then it will be considered more deeply for its strategic value. Given Amazon's willingness to fork over Ring data to police without a warrant, they don't have high credibility in my eyes for what they are or aren't willing to do to achieve any kind of advantage.

ebertx · on Aug 8, 2022

While I agree that #1 and #2 are the biggest drivers for acquiring Roomba, I find it hard to imagine Amazon executives not actively discussing in-home robotics as a vector for increased data collection. It might be a few years out, but it will happen.

throwaway10436 · on Aug 8, 2022

Does anyone have any data on whether Amazon robotics devs are treated any better than Amazon's reputation would imply?

I would fit squarely into this profile, but I didn't even bother applying. For context, I've talked to over 20 Bay Area robotics companies, have upcoming on-sites at some of the self-driving companies, and have passed technical interviews at other FAANGs. I've done everything from embedded computer vision / SLAM to drivers to kinematics.

_dw7s · on Aug 8, 2022

Reason #1 - I agree

Reason #2 - Nobody disagrees with this. The robots would probably clean better

Non-Reason #1 - There's a lot more information than the layout of the home that can be gathered by these robots. The layout-of-the-home argument is a strawman. Of course the government knows how your rooms are partitioned. But the layout of your furniture, kids' toys on the floor, and water bowls for pets are just some of the things that can be detected via blind exploration alone. I mean, the new models have cameras. And if you're saying 'but all this data can already be inferred from your shopping history': yes, yes it can, but another data point confirming speculative data can also be valuable. Also, they can expose this data to the police without your consent, and the data can be used against you in court. Think drug paraphernalia left on the floor or a stash of cash under your bed. Amazon's privacy policy states that they will share information with law enforcement in case of 'emergency'. A Roomba with a camera turned on remotely is a wet dream for cops.

Non-Reason #2 - Yeah, for now.

Non-Reason #3 - `They already know this information` - they will know that information better. Whenever I hear the argument 'Oh, they already know', my answer is: no, they don't already 'know'. They have indicators about a particular piece of intelligence, and now they'll have more. And the information you're talking about is the information that can be collected with the current sensors on the robots on the market. Who's to say what the future brings?

Reason #0 - That's incredibly reductionist about what a company the size of Amazon does. But sure, have it your way. They sell things. This will enable them to sell things better. But that doesn't dismiss the fact that Amazon did and does share your personal information with law enforcement without customer consent and without a warrant. And let's not forget, Amazon received the largest GDPR fine in history. I would not trust any of my data to Amazon, let alone invite them into my house.

Comments like yours, on a forum filled with people actually working on data collection related features makes me lose all hope that we'll actually reach a point in which the industry will side with the consumer.

Just to remind everyone about the 'this is fine..' comments. Amazon bought Ring: 'this is fine..' - shared footage with LEO agencies without a warrant and without consent. Amazon releases Alexa 'this is fine..' - 'Amazon and third parties are collecting and sharing Alexa voice interactions from Echo speakers with up to 41 different advertising partners'.

godelski · on Aug 8, 2022

This just sounds like 6 reasons, not 3 reasons and 3 non-reasons.