Driving dataset for car autopilot AI training (github.com/commaai)
100 points by EvgeniyZh on Aug 4, 2016 | 44 comments


Cool, but this really confirms what I suspected about Hotz's car all along. He's just done the 'easy' bit: outputting a steering angle on easy highways. That was done in the '80s (slower, admittedly, but still).

Wake me up when it can drive here (to pick a random example):

https://www.google.co.uk/maps/place/London/@52.1986058,0.143...


That's not a hard case. It's just a narrow, straight road lined with obstacles. If you have sensors that can get a height field, that's no problem. Our DARPA Grand Challenge vehicle could have done that in 2005, using a LIDAR. Google wouldn't have any problem with that. Tesla would have a problem; they're dependent on lane lines.


The main difficulty with a side street like that is there isn't actually enough room for 4 vehicles across--2 parked and 2 going in opposite directions. Humans can deal with this pretty easily because it's usually fairly obvious who can most easily yield or even back up a little bit. It's not an especially hard problem and 2 vehicles that could formally communicate would make it easier yet. However, if the behavior depends on essentially social signals, that's somewhat harder but far from impossible to model. (You probably program the computer to politely yield if at all possible.)

However, as you say, this is yet another case where vehicles aren't expected to unthinkingly follow lane lines--if they even exist.
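A minimal sketch of that "politely yield if at all possible" rule, to make it concrete (all names and thresholds here are invented for illustration, not from any real system):

    def choose_action(our_dist_to_gap_m, oncoming_dist_to_gap_m,
                      oncoming_is_waiting):
        # Single-lane, two-way street: decide who threads the needle.
        if oncoming_is_waiting:
            return "proceed"  # they already ceded the gap
        if our_dist_to_gap_m <= oncoming_dist_to_gap_m:
            return "pull_into_gap_and_wait"  # we can yield more easily
        return "stop_and_wait"  # let the oncoming car come through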


The point is that this is a two-way street with (effectively) only one lane, so regardless of how good your sensors are, your AI needs to predict and react to the actions of oncoming cars in order to successfully navigate this road.


As taco_emoji said, the difficulty is not avoiding the obstacles. What happens if a car is coming the other way? Do you reverse? Do you let them reverse? Did they flash you? Are they waiting in a 2-lane section? Should you wait? What if it's a bike instead--can you both fit?

Way more difficult than "don't hit the sides".


Comma.ai has stated their goals very clearly: they want to sell a retrofit kit you install yourself for under $1000 that allows level 3 autonomy under optimal circumstances, that being highway/freeway driving in good weather conditions.


And why would people buy that? That pretty much comes standard nowadays on a somewhat more high-end Mercedes, giving you a fully integrated system with the added bonus that your insurance won't refuse to cover you because you installed an unproven startup's Python deep-learning steering wheel.

I'm being facetious, but this is how it will work out in the end for the first early adopter to get into a crash.


Right. I wonder which year this will launch, and how long it will take until the first death.


So uhh, what happens when the weather conditions change? Will the system automatically turn off its autonomous function and let you take over?


Cruise control disengages with user input (braking), which seems to work well. Doing the same for steering input is easy.
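For what it's worth, a minimal sketch of that disengage logic (thresholds and signal names are invented, purely illustrative):

    BRAKE_THRESHOLD = 0.05        # fraction of pedal travel
    STEER_TORQUE_THRESHOLD = 1.5  # newton-metres applied by the driver

    def should_disengage(brake_pedal, driver_steer_torque_nm):
        # Hand control back the moment the human brakes or noticeably
        # fights the wheel, mirroring how cruise control already works.
        return (brake_pedal > BRAKE_THRESHOLD or
                abs(driver_steer_torque_nm) > STEER_TORQUE_THRESHOLD)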


Pretty much all that most self-driving efforts have done is "the easy bit". Google's messaging primarily focuses on bragging about miles driven, while glossing over the fact that their test drivers have had to take over hundreds (or thousands?) of times, presumably in difficult areas like intersections. If you see someone bragging about "miles driven" as a key statistic, you should be wary of their actual accomplishment, because racking up miles down a straight stretch of road is easy.

There's a massive amount of hype in the area, and a terrifyingly small amount of actual substance.


Google is driving in urban traffic, which is never easy. Tesla is being driven 'by customers' and is thus out of the company's control.

Look at any of the accident tapes and these are generally situations where people would likely also have caused an accident. Just look at the number of people rear-ended for daring to stop at a stop light.


The problem is that Google's most-bragged-about statistic, miles driven, ignores the frequency at which test drivers intervened to prevent accidents. Google reports yearly to the California DMV how many disengages the system had, both due to driver intervention and due to system failure on the part of the self-driving system. They also report how many accidents would've occurred if test drivers hadn't intervened.

Those numbers are staggering, if you read the reports, and it's incredible that people believe these vehicles are anywhere near as safe as human drivers. We are nowhere near the point where you could remove the steering wheel... something Google has begged legislators to let them do. The important statistics about self-driving car safety are NOT the statistics Google is telling Congress or the media about their cars.


The average US driver does 10,000 to ~15,000 miles per year, with over-65 drivers averaging under 7,000. Overall, an accident is reported to insurance every ~17.5 years. Fender benders are often not reported, and disengagements can be triggered by simply hitting the brakes. Two of those 13 likely accidents would have hit traffic cones, which is below the insurance threshold, making the numbers hard to compare directly.

Still, ~1 accident every 250,000 miles is likely significantly better than human norms and Google is getting close to that level.
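The back-of-envelope arithmetic behind that comparison, using the figures above:

    # Miles between insurance-reported accidents for a typical driver.
    for miles_per_year in (10_000, 15_000):
        print(miles_per_year * 17.5)  # 175,000 and 262,500 miles

So ~250,000 miles per reported accident is the top end of the human range.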

PS: People talk about bad weather, but surprisingly that's not nearly as important as you might think. Integrate weather reports, and trucks could get off the road hours before a storm, or hand off to human drivers 30 minutes before expected bad weather.


https://static.googleusercontent.com/media/www.google.com/en...

269 disengages over their entire fleet per year and perhaps 13 hypothetical accidents of unknown magnitude (but some quite minor: "2 of these involved simulated contact with traffic cones") if they hadn't disengaged is 'staggering'?


A disengage should be understood to probably be an accident, given that it's a failure of the self-driving system without warning, and given that Google's argument is that the steering wheel should be removed. Contact with a traffic cone means the car could just as easily make contact with a baby in the street; you cannot just say "since it hit something cheap, it doesn't count".

282 failures of their system across 434,000 miles is 1,539 miles between incidents. That's a system that fails more often than even old-school cars need oil changes.
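The arithmetic, for reference:

    print(434_000 / 282)  # ~1,539 miles between incidents

(Old-school oil-change intervals were around 3,000 miles, hence the comparison.)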

Suffice it to say, Google's claim that they're ready to remove the steering wheel and that it's only the law holding them up, which is the argument they made on the floor of Congress... they're liars.

EDIT: Since I'm rate limited, my response to Retric below: False. The system disengages when the system has an error and has conflicting data about the state of the world around it. It isn't overabundance of caution, it's overabundance of confusion.


> A disengage should be understood to probably be an accident

I don't think it should. A disengage is simply a precaution. Whether it would have been an accident if the self-driving car had continued on is a question of counterfactuals, and they examine it with the recorded data and simulations of the car software to see what would happen; and they report that of the 269 disengages, only 13 probably would've led to an accident with physical contact with something it shouldn't've, and at least 2 of those are trivial. You claim to have read the report in detail, so I'm not sure why I have to explain this.


Google was really conservative because people's lives were at stake. A disengage was not a system crash where the car had no idea what was going on; it was the car noticing a potential issue well ahead of time and handing off to a driver, which is a slow process and not capable of preventing an imminent crash.

PS: If they just wanted to add miles, they would have operated on highways, not the far more accident-prone city streets.


Great example, this road is half a mile from my house. I've been wondering how self driving cars will deal with a street like this.


Or in a snowstorm.


I think Ford is the first company to really be trying to test in snow at all, but that's mostly focused on mitigating the problem of snow on the ground. The sort of snowstorms I've driven in, I don't expect a self-driving car to be able to handle for the next sixty or seventy years at least.


Between highly detailed maps and multi-spectral/multi-method imaging, I think automated systems will quickly be far better than humans in snowstorms.

They will also likely have a much better handle on what speed is safe given the moment to moment road surface conditions and imaging quality.
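As a sketch of what "what speed is safe" could mean in code: pick the fastest speed at which the car can stop within its sensing range, given an estimated friction coefficient. The numbers here are illustrative, not from any shipping system:

    import math

    def max_safe_speed_ms(mu, reaction_time_s, sight_distance_m, g=9.81):
        # Solve v*t_react + v^2 / (2*mu*g) = sight_distance for v.
        a = 1.0 / (2.0 * mu * g)
        b = reaction_time_s
        c = -sight_distance_m
        return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

    # Dry asphalt (mu ~0.7) vs. packed snow (mu ~0.2), 60 m visibility:
    print(max_safe_speed_ms(0.7, 0.2, 60))  # ~27 m/s (~98 km/h)
    print(max_safe_speed_ms(0.2, 0.2, 60))  # ~15 m/s (~54 km/h)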


Under normal conditions, self-driving car systems track white/yellow lines on the road. I'm not sure what they're going to do when everything is white during snow. I know GPS is also used, but it might not have the latest info about road layout.
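One crude way a system could at least notice that its lane tracking has failed (an OpenCV sketch under my own assumptions, not how any particular car does it):

    import cv2
    import numpy as np

    def lane_line_count(frame_bgr):
        # Mask white/yellow paint in the lower half of the frame, then
        # count Hough line segments. In heavy snow almost everything
        # passes the white mask and no coherent lines survive, so the
        # count collapses -- a cue to hand control back to the driver.
        h = frame_bgr.shape[0]
        hsv = cv2.cvtColor(frame_bgr[h // 2:, :], cv2.COLOR_BGR2HSV)
        white = cv2.inRange(hsv, (0, 0, 200), (180, 40, 255))
        yellow = cv2.inRange(hsv, (15, 80, 120), (35, 255, 255))
        edges = cv2.Canny(cv2.bitwise_or(white, yellow), 50, 150)
        lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=40,
                                minLineLength=40, maxLineGap=20)
        return 0 if lines is None else len(lines)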


Furthermore, you don't just blindly follow marked lanes (even if you knew exactly where they are) in a snowstorm. You may follow tracks left by other vehicles, deal with the fact that two lanes in cities often effectively constrict to one, etc.

No, lots of people don't do a great job whether it's because they're driving too fast, not leaving enough room, or slamming on the brakes. But there's a huge amount of judgment and situational awareness needed in bad weather--especially snow.

Certainly, autonomous systems can (and almost certainly will) be restricted to certain roads/types of roads in certain weather conditions and still be useful driving aids. But as soon as you restrict systems to only working some of the time, you've effectively closed off any use that doesn't allow for having a competent driver behind the wheel at all times.


There's a military anomaly recognition system where a person with sensors attached to his head is shown satellite photos in quick succession. His conscious brain might not notice things, but his subconscious sees things like man-made structures; the sensors pick this up, and the images that cause stimulation are flagged. I wonder if a self-driving system could be made that uses the brain's image recognition facilities.

I would guess the person scanning those images would need to concentrate instead of talking to someone on the phone or next to them. So the next step would be to outsource the required concentration to brains in Elbonia...


Hope the signal doesn't drop in that tunnel!


I don't think anyone's trying to do it presently, but if the car was creating a virtual 3D map of the road surface as it was driving, it could conceivably learn to drive in the tracks of the vehicles in front of it.
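A toy version of that idea, assuming you already have a lateral snow-depth profile from such a 3D map (entirely hypothetical):

    import numpy as np

    def rut_positions(depth_profile):
        # depth_profile: snow depth (m) per lateral cell across the lane.
        # Earlier traffic compacts two ruts -- two local minima, one in
        # each half of the lane. Steer to keep the wheels over them.
        mid = len(depth_profile) // 2
        left = int(np.argmin(depth_profile[:mid]))
        right = mid + int(np.argmin(depth_profile[mid:]))
        return left, right

    ruts = rut_positions(np.array([.30, .10, .25, .30, .28, .12, .30]))
    # Steering error = midpoint of the ruts minus the vehicle centreline.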

But yeah, a car without a steering wheel right now wouldn't do a lot of people much good, if the car only works in pleasant conditions in daytime.


This is very cool. I believe scientific papers, especially in the AI space, should habitually share all data that was used so others can repeat and build on the results.

That said, a few hours of highway driving is of course woefully inadequate for learning anything but steering in normal conditions on that particular highway, if even that. So this is not the "build your Tesla Autopilot" kit, even though the OP decided to use the word “autopilot” in the title.


> This is very cool. I believe scientific papers, especially in the AI space, should habitually share all data that was used so others can repeat and build on the results.

I believe the problem here is that most AI research is commercial rather than scientific, so no data.

> even though the OP decided to use the word “autopilot” in the title.

Yeah, it might be a bit misleading. Honestly, I had some problems formulating the title in English.


> I believe the problem here is that most AI research is commercial rather than scientific, so no data.

One would hope that OpenAI [1] will come through with some significant data sets for various areas in the future. There are some significant commercial names backing the initiative (like Musk), so hopefully they will actually be relatively open and not protect corporate interests until the end of time.

[1] https://openai.com/


Most papers use public data sets. I agree there should be more public data sets, for nearly everything.

Having said that - what's the cost of doing something like this yourself? 7 hours of driving video, plus the positional/steering/etc data? Should be easy for almost anyone with enough inclination to gather hundreds of hours of data like this, if you really wanted to.
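The logging side really is simple. A sketch of a DIY rig, where read_steering_angle/read_speed stand in for whatever CAN or OBD-II interface you actually have (hypothetical names):

    import csv
    import time

    def log_drive(path, read_steering_angle, read_speed,
                  hz=20, duration_s=3600):
        # Timestamped signals; sync with the dashcam video via the clock.
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["t_unix", "steering_deg", "speed_mps"])
            end = time.time() + duration_s
            while time.time() < end:
                writer.writerow([time.time(), read_steering_angle(),
                                 read_speed()])
                time.sleep(1.0 / hz)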


I suspect that a certain company whose name starts with a 'G' has been doing just that while at the same time they were filling out their streetview database.


And if not, upon seeing this they will be filled with regret.


The big question for me is... does comma.ai intend to continue to build a public dataset? Because for companies like Google, their primary "value" is their data, that's what they won't share with everyone else, it's what gives them their edge.

If a company was truly willing to share their large datasets down the road, that'd be a Big Deal.


I guess it could get quite hard to share public data sets like that once you get into hundreds of terabytes of video.

I'd say for autonomous cars it's the ML algorithms, rather than the data, that are the bottleneck. I'd wager it would be relatively inexpensive for even smaller companies or researchers to pay a delivery/taxi/whatever company a small amount for a few thousand hours of dashcam footage.


Yeah, I imagine the eventual data we'll want to have is the trained models, based on that input video.


Well, most researchers want the raw data so they can build and train their own architectures and benchmark results against each other.


When you think about it, at least in Russia many people have video cameras in their cars, and you could probably buy their data for relatively little money, acquiring many hours of driving video. I'm surprised no company has done it yet.


In fact, this is effectively comma.ai's strategy! They released an app called chffr, a dashcam app that records all your driving and sends it to them.

https://www.youtube.com/watch?v=sD-8CteXJl4&list=PLlDK3GJKKj...

As a privacy-minded person, I'll pass on participating in that, but it's an interesting approach to take, since they're a small company and can't afford to do Google-scale things like deploying company cars nationwide.


You probably just wouldn't want to train your AI on Russian driving. I mean, it's like R-rated movies: you don't want to expose an unprepared mind to it until it reaches a stage of maturity that allows it to handle it :)


Using videos from vehicles in accidents could be useful. Take the last seconds before the accident, extract a top-view model, and try to train for ways to detect imminent trouble and avoid it.
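A sketch of how that mining could look (purely hypothetical; the actual clip extraction, e.g. with ffmpeg, is left out):

    def label_windows(accident_times_s, video_length_s,
                      window_s=5.0, stride_s=5.0):
        # Slice a drive into fixed windows; a window is a positive
        # "imminent trouble" example if an accident follows within 1 s.
        positives, negatives = [], []
        t = 0.0
        while t + window_s <= video_length_s:
            if any(0.0 <= a - (t + window_s) < 1.0
                   for a in accident_times_s):
                positives.append((t, t + window_s))
            else:
                negatives.append((t, t + window_s))
            t += stride_s
        return positives, negatives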


George Hotz, also known as geohot, is the author of the first working iOS jailbreak. He decided to enter the self-driving car area [1]. Apparently he founded a new company called comma.ai [2].

[1] http://www.bloomberg.com/features/2015-george-hotz-self-driv...

[2] http://comma.ai/


Once I read "George Hotz"... I knew I recognized the name.


This creates a model for driving that totally ignores things that can go wrong. It will work great in the normal case, and totally screw up if anything unusual happens. There's no model of "obstacle" or "oncoming vehicle". That's unsafe.

This is a field where bug reports are written in blood.



