What is the core purpose of the new Self-Driving Nanodegree: to crowdsource knowledge, or to graduate students and funnel them into this sector? There seem to be two distinct value propositions at play here, and the advantage is entirely on Udacity's end.
It might be a good idea to mention that in the readme.md next to the datasets, since they're hosted separately. The wording of the MIT license doesn't really help avoid confusion here.
In many countries you cannot "release" works into the public domain. E.g. here in Germany, a work can enter the public domain when its copyright expires, but creators have rights that cannot be waived (not even voluntarily) before that.
(In such countries, a court might recognize the intent behind a public-domain dedication made in a country where that is possible, but that requires interpreting the law of the creator's home country. And it's likely not an option for locals, since the concept does not exist in local law.)
Creative Commons created the CC0 license to work around this: it states clearly that the work is intended as a public-domain dedication, and, failing that, grants all possible rights and declares that the creator does not intend to limit them in any way through whatever rights remain.
The purpose served by work contracts doesn't override the laws that prevent you from signing yourself up as a slave...
(The comparison is appropriate because that non-revocable subset of rights in Germany's Urheberrecht, and in other European countries and probably elsewhere, is based on a notion of human rights.)
This is just a database of normal driving. That's useful for learning how to follow the car in front and avoid stationary objects, but not much else. It's going to result in systems that drive like humans right up to the point they do something really bad.
A more useful database would be the one Nexar is accumulating.[1] They collect dashcam imagery of events where the driver did a hard brake or the system detected some other hazardous condition. That database could be used to train a system which recognizes trouble before braking starts.
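Nexar hasn't published how its trigger works, but as a rough illustration, a hard-brake detector over accelerometer samples might look like the sketch below; the threshold and window length are made-up values, not Nexar's.

    # Hypothetical hard-brake trigger, not Nexar's actual method.
    # Flags moments of sustained hard deceleration so the dashcam frames
    # around them can be saved as training examples.
    HARD_BRAKE_G = 0.45   # assumed deceleration threshold, in g
    WINDOW = 5            # consecutive samples required, to reject noise

    def hard_brake_events(accel_samples, rate_hz=50):
        """Yield start times (seconds) of sustained hard-brake events.

        accel_samples: longitudinal acceleration in g (negative = braking).
        """
        run = 0
        for i, a in enumerate(accel_samples):
            if a <= -HARD_BRAKE_G:
                run += 1
                if run == WINDOW:
                    yield (i - WINDOW + 1) / rate_hz
            else:
                run = 0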
Both systems need a much wider field of view. Probably at least 160 degrees, so cross traffic shows up before the collision.
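To put a rough number on that: the bearing of a crossing car off the camera axis is atan(lateral offset / forward distance), so the total field of view needed is twice that. The distances below are my own illustrative numbers.

    import math

    def required_fov_deg(forward_m, lateral_m):
        """Total horizontal FOV needed to see a point at these offsets."""
        return 2 * math.degrees(math.atan2(lateral_m, forward_m))

    print(required_fov_deg(20, 15))  # distant cross traffic: ~74 degrees
    print(required_fov_deg(10, 30))  # car entering an intersection: ~143 degrees

A car entering an intersection close ahead of you already needs ~143 degrees, which is why something on the order of 160 is a reasonable ask.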
Definitely good points. We have three cameras arranged collinearly across the whole width of the windshield, so this dataset has a pretty big effective field of view. And while it is currently limited in a lot of ways, it's just the beginning of the types of data we will be releasing. Everything will scale up to cover more use cases; this data is mainly meant to support training a visual network for steering wheel predictions. For the moment, we actually do want to train the networks to drive like "normal" humans in normal situations. Thanks for your thoughts!
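For anyone wondering what "a visual network for steering wheel predictions" looks like in practice, here is a minimal Keras sketch along the lines of NVIDIA's PilotNet. The layer sizes follow the PilotNet paper; the input shape, normalization, and loss are my assumptions, not Udacity's reference model.

    # PilotNet-style steering regressor (a sketch, not Udacity's model).
    # Maps a single front-camera frame to a steering angle via MSE regression.
    from tensorflow import keras
    from tensorflow.keras import layers

    def build_model(input_shape=(66, 200, 3)):         # assumed crop/resize
        model = keras.Sequential([
            layers.Input(shape=input_shape),
            layers.Lambda(lambda x: x / 127.5 - 1.0),  # scale pixels to [-1, 1]
            layers.Conv2D(24, 5, strides=2, activation="relu"),
            layers.Conv2D(36, 5, strides=2, activation="relu"),
            layers.Conv2D(48, 5, strides=2, activation="relu"),
            layers.Conv2D(64, 3, activation="relu"),
            layers.Conv2D(64, 3, activation="relu"),
            layers.Flatten(),
            layers.Dense(100, activation="relu"),
            layers.Dense(50, activation="relu"),
            layers.Dense(10, activation="relu"),
            layers.Dense(1),                           # predicted steering angle
        ])
        model.compile(optimizer="adam", loss="mse")
        return model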
A great, widely used dataset that teams can benchmark against is a superb start. Kudos to Udacity on this. I'd love to have a blind test set as well that teams can test and rank against.
There will be a blind test set for the challenge itself, including a public leaderboard! We are asking the world to compete on building the best vision-based network for predictive steering.
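The scoring metric isn't spelled out above, so as a guess, a leaderboard for predictive steering would typically rank submissions by RMSE against the held-out ground-truth angles:

    import math

    def rmse(predicted, actual):
        """Root-mean-square error between predicted and ground-truth angles."""
        assert len(predicted) == len(actual) and actual
        return math.sqrt(
            sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
        )

    print(rmse([0.10, -0.05, 0.00], [0.12, -0.02, 0.01]))  # lower is better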
Holy crap. And yes, it definitely is; if you haven't looked up Industry 4.0 or Industrial Internet, that entire sector is making a push to sensorize.
As a rearguard defense: yes, but what percentage of the time is CERN running an experiment [that generates that data]?
According to a quick Google search, the average time spent driving is 101 minutes per day.
It totally makes sense that CERN (and, likely, any large Science! effort) produces that level of burst data, but wouldn't these cars produce more data over time?
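A back-of-envelope estimate using that 101 minutes/day figure suggests yes, at least in aggregate. The sensor rates below (three raw 1080p cameras plus lidar) are assumptions, purely for illustration.

    # Rough per-car daily data estimate; every rate here is an assumption.
    DRIVE_MIN_PER_DAY = 101

    camera_MBps = 3 * 1920 * 1080 * 3 * 30 / 1e6   # three raw RGB cams @ 30 fps
    lidar_MBps = 70                                 # order-of-magnitude guess

    total_GB = (camera_MBps + lidar_MBps) * DRIVE_MIN_PER_DAY * 60 / 1e3
    print(f"~{total_GB:,.0f} GB per car per day, uncompressed")  # ~3,800 GB

Multiply that by a fleet of even a few thousand cars and you're well past burst-experiment territory, before compression.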
On a different topic, a different friend of mine is of the opinion that AI is dependent on the throughput of data through the system: think about the amount of information your body feeds to your brain, and how long it had been doing that before you were capable of communication.
Similarly, I've worked on robotics platforms where we had to downsample incoming sensor data; otherwise the perception algorithms (which are very statistical) wouldn't be able to spin fast enough.
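The downsampling itself can be as simple as dropping messages above a target rate. A minimal sketch (the rates are made-up):

    import time

    class RateLimiter:
        """Drop sensor messages above a target rate (e.g. 100 Hz -> 10 Hz)."""
        def __init__(self, max_hz):
            self.min_interval = 1.0 / max_hz
            self.last = float("-inf")

        def accept(self, now=None):
            now = time.monotonic() if now is None else now
            if now - self.last >= self.min_interval:
                self.last = now
                return True    # pass this sample to the perception pipeline
            return False       # drop it

    limiter = RateLimiter(max_hz=10)   # perception keeps up with ~10 Hz
    # for msg in sensor_stream:
    #     if limiter.accept(msg.stamp):
    #         perception.update(msg)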
It's worth pointing out some other awesome datasets, such as:
• http://research.comma.ai
• https://devblogs.nvidia.com/parallelforall/deep-learning-sel...
• http://data.selfracingcars.com
Any questions I can answer?