
"A lie": FSD as it stands right now is a lie. A few cars might be able to drive in a few geofenced places, but no car anywhere can drive anywhere, even with perfect weather and visibility, and I'd even wager even with no traffic or no other cars at all. Our Subaru gives up steering if there's no lead car in front, on "suburban" and rural roads, about 35% of the time. More on some roads, less on others. I cannot determine, while driving, the cause of half of the self-driving disable occurrences. With no fog line and a center line broken for an intersection on a one-lane road, it'll shut off nearly every time. It's surprising when it doesn't.

I've clocked nearly a half million miles on the road (I'll be there sometime in the next 9 months), and the range of technical ability you need to drive in just the US (no, scratch that: in any given state or even county) varies so much, and potentially so often, that FSD is just a lie to sell cars. I'm willing to upload a full hour's drive touring a few parishes around here in my quite heavy Lexus, front and rear cameras, just to prove my point. I'd do it in the Subaru, but the dashcam isn't very good, and also its lineage is rally, so it exaggerates how poor the roads are. My YouTube has dashcam footage of drives that I'm willing to bet no automated system could handle, even if it claimed to be "level 5". Driving after a storm or hurricane is another issue: I know the hazards, in general and specifically, for the areas I'd need to travel during or after an emergency. I cannot fathom the amount of storage and processing that would take, to have that for every location with roads. On board, in the car? Maybe in 20 years.



> I cannot fathom the amount of storage and processing that would take, to have that for every location with roads. On board, in the car? Maybe in 20 years.

Doing some napkin math: with 4 million miles of road in the US, if you wanted to store 1KB of data per meter of road (hundreds of data points), you'd only need 7TB for the entire database.
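Spelled out, that napkin math looks like this (using 1 KB = 1024 bytes and TB in the binary sense; the 4-million-mile figure is the one quoted above):

```python
# Rough sizing: ~4 million miles of US road, 1 KB of map data per meter.
MILES_OF_ROAD = 4_000_000
METERS_PER_MILE = 1609.344
BYTES_PER_METER = 1024  # 1 KB

total_bytes = MILES_OF_ROAD * METERS_PER_MILE * BYTES_PER_METER
total_tb = total_bytes / 1024**4  # tebibytes
print(f"{total_tb:.1f} TB")  # about 6.0 TB, i.e. "only ~7TB" with headroom
```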

And the processing to make it shouldn't be anything special, should it? Collection would be hard.


> only [...] 7TB

Currently that would probably cost ~$500 per car to implement, based on retail pricing of 8TB SSDs. It would also need to be updated constantly: road closures, potholes, missing signage, construction. External GPS units like TomToms had radio receivers in the power cord that tuned to traffic frequencies, where available, and could route you around closures, construction, and the like; so you need a nationwide network to handle this. Cellphone won't cut it. Starlink might, but regardless, you need to add that radio and its accoutrements to the BOM for each car.

And I'm not talking about the processing of the dataset that gets put onto the 8TB SSD in the car; I am talking about the processing of the data on the 8TB SSD in the car while at speed.

Furthermore, I am fairly certain that it would take, on average, more than 1.6MB per mile to describe the road, road condition, hazards, etc. A shapefile of all roads in the US (the thing that gets one closer to knowing where the lanes are, how wide the shoulders are, etc.) is 616MB. And it's incomplete: I put in two roads near me with fairly unique names and neither is in the dataset. So your self-driving car using these GIS datasets won't know those roads.

I had an idea to put an Atomic Pi in my car, with two cameras. It has a Bosch 9-DOF sensor on the board; coupled with the cameras, you can map road surface perturbations, hazards, and the like, which I believe will take much more than 1KB per meter, especially as you need "base" conditions, updates, and current conditions (reported by the cars in front of you, ideally). The CSV GIS dataset looks like this:

>OBJECTID,ID,DIR,LENGTH,LINKID,COUNTRY,JURISCODE,JURISNAME,ROADNUM,ROADNAME,ADMIN,SURFACE,LANES,SPEEDLIM,CLASS,NHS,BORDER,Shape__Length

> 568143,964990,0,0.07,02_36250355,2,02_39,Ohio,S161,DUBLIN GRANVILLE RD,State,Paved,4,88,3,7,0,0.000759951397761616

And I ran, for example, `awk -F, '/PACIFIC COAST/ {sum += $4} END {print sum}' NTAD*.csv`, and it spat out 79.04, which I think is a bit shorter than reality. It looks like the dataset I pulled is only "major roads" as well, but that doesn't explain 79.04 as the sum of lengths of all rows with "PACIFIC COAST" in them. It does show the total length of Interstate 10 as 3986.55, roughly double the actual length (2460 mi), so perhaps I'm just not understanding this dataset.
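For anyone who'd rather sanity-check that sum in Python, here's a rough equivalent that matches on the ROADNAME column instead of the whole line (column positions are assumed from the header quoted above; the sample rows are illustrative, with one length invented):

```python
import csv
import io

def sum_lengths(csv_text: str, needle: str) -> float:
    """Sum the LENGTH column (index 3) for rows whose ROADNAME
    (index 9) contains `needle` -- roughly the awk one-liner above,
    but restricted to the name column."""
    total = 0.0
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) > 9 and needle in row[9]:
            total += float(row[3])
    return total

sample = (
    "568143,964990,0,0.07,02_36250355,2,02_39,Ohio,S161,DUBLIN GRANVILLE RD,"
    "State,Paved,4,88,3,7,0,0.000759951397761616\n"
    "568144,964991,0,1.25,02_36250356,2,02_39,Ohio,S161,DUBLIN GRANVILLE RD,"
    "State,Paved,4,88,3,7,0,0.013\n"
)
print(round(sum_lengths(sample, "DUBLIN GRANVILLE"), 2))  # 1.32
```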

Anyhow, that's 600+ MB for just that sort of information (plus shapes) for only a quite small subset of roads in the US.

Anyhow, my thoughts are scattered, this input box is too small, and I'm not really arguing. Maybe it is possible, but it would raise the price thousands of dollars per auto, you'd need infrastructure (Starlink would work) to update the cars, and so on. I'm prepared to admit I am wrong, but your comment didn't move the needle for me.

Also, just for fun: which self-driving car could manage this entire drive? https://www.youtube.com/watch?v=sNqFN7KeOYE


If you want such constant updates, that's tricky to distribute and hard to collect, but let's put that aside for a bit. I want to focus on the amount of data and how the car would use it, with $500 of SSD being nice and cheap.

> i am talking about the processing of the data on the 8TB SSD on the car while at speed.

I'm not worried about that. The actual driving takes such powerful computers that even if there were a petabyte of total data, the amount the car would have to process as it moves would be a trickle compared to what it's already doing: at most 50KB per 10 milliseconds. And the data would obviously be sorted by location, so there's very little extra processing required.
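A rough sketch of why the read rate is a trickle, assuming highway speed and the 1KB-per-meter figure from upthread (the 75 mph speed is my assumption):

```python
# How fast does the car consume map data at speed, at 1 KB per meter?
speed_mph = 75
speed_mps = speed_mph * 1609.344 / 3600  # ~33.5 m/s
bytes_per_meter = 1024

read_rate = speed_mps * bytes_per_meter  # bytes per second
print(f"{read_rate / 1024:.1f} KB/s")    # ~33.5 KB/s
```

Even with generous lookahead, that's orders of magnitude below what the perception stack is already chewing through.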

But you tell me, how many data points do you think you need per meter of road?

I really don't think you need millimeter-level surface perturbations all the way across. Mapping the precise edges of the road and lanes should only need dozens of data points, 4 bytes each. And then you can throw a few more dozen at points inside the lanes to flesh it out. You can throw a hundred data points at each pothole without breaking a sweat. Measuring the surface texture in various ways and how it responds to weather is only going to take a handful of bytes per square meter, in a way that repeats a lot and is easy to compress.

> 568143,964990,0,0.07,02_36250355,2,02_39,Ohio,S161,DUBLIN GRANVILLE RD,State,Paved,4,88,3,7,0,0.000759951397761616

That's an extremely inefficient format. Unnecessary object ids, repeating metadata over and over, way too many decimal places, and all stored as text.

But even then, your database is so tiny compared to the size I suggested that I don't think we can extrapolate anything useful. Even if we 4x it or whatever to compensate for a lack of rural roads.
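To illustrate the inefficiency, here's a hedged sketch of the same row packed into a fixed binary layout; the field widths and lookup-table codes here are my own invention, not anything the NTAD dataset defines:

```python
import struct

# Pack one road-segment record into a fixed binary layout.
# Layout (an assumption for illustration): ids as uint32, length in
# hundredths of a mile, and strings like "Ohio"/"Paved" replaced by
# small integer codes into shared lookup tables.
record = struct.pack(
    "<I I B I H B B H B B",
    568143,  # object id (could be dropped entirely)
    964990,  # id
    0,       # direction
    7,       # length: 0.07 miles as hundredths
    39,      # jurisdiction code -> lookup table ("Ohio")
    1,       # surface -> lookup table ("Paved")
    4,       # lanes
    88,      # speed limit (km/h)
    3,       # functional class
    7,       # NHS code
)
print(len(record))  # 21 bytes, vs ~115 bytes of CSV text
```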


Suffice it to say that the 600MB just lets you draw the roads on a plane; it's like comparing an ASCII art drawing of the road (from the .csv/.shp) to a digital still of the road (the amount of information you'd actually need). You absolutely cannot rely on "a couple of sensor [types]". I mentioned I have nearly a half million miles on the road. All of that prior experience influences my driving when I am driving someplace new. On that 8TB disk, you have to find a way to reproduce that "experience", except instead of my 0.5 million miles, you are talking about the aggregate "experience" of 0.5 million miles per road per unit of time (a day for some places like I-10 through Los Angeles, a month for others, maybe a year for some "rural" roads).

None of this has to do with vision or proprioception. It's knowing "every inch" of road. It's knowing how far I can leave the center of the lane if someone else crowds me or goes over the center divider, because the shoulder is soft through here where logging trucks have been exiting the forest onto the highway. It's knowing what part of I-605 floods: not the whole thing, some lanes, some places, and "flood" means 2+ inches of water on the road surface; hitting it at speed sends a tidal wave flying into other lanes, and if someone hits that in front of you, you're blind for a couple of seconds minimum. If we want semi trucks to be "FSD", they need to know, for the traffic and other conditions, how fast to go and what gear to be in to climb each hill, and then the hazards that are over the hill, the ones a trucker would know. Where's the gravel bed on more mountainous passes? Or more simply, at what time of day are neighborhoods more likely to have people approaching or going through intersections, blind or otherwise? How many "bytes" is that information, times every neighborhood? If many cars brake at the same place, there's probably a reason, and that needs to be either in the dataset or updated somehow if conditions change. Ever used Waze and had a report of something on the road, or a cop parked somewhere, and it's nowhere to be seen? And that's updated much more frequently than the radio info on the GPS systems I referred to earlier. Some roads become impassable in the rain; some roads ice more readily.

If this was easy/simple/solved, Waymo et al. would be bragging about it, about the tech in their cars. Waymo (or the other one) specifically, because they cover less than 0.1% of road surfaces in the US, in some of the best-maintained and most heavily traveled corridors in the world. So if anyone from a robotaxi company happens by and knows roughly how much storage is needed for <0.1% of the road surfaces in the US, then we could actually start to have this dialog in a meaningful way. I'm also unsure how much coverage robotaxis actually have in their service areas. A "grid system" of roads makes mapping and aggregate data "simple", for sure.

This reminded me a bit of the idea that somewhere in the US there's a database of every SMS sent to or from US cellular phones. "It's just text; it'll compress well" belies how much text there is, there.

For reference, the map in my Lexus is ~8GB for the US. And that's just "shapes" and POIs and knowing how the addressing works on each road. It doesn't know what lane I'm in, it doesn't track curves in the road effectively (the icon leaves the road while I'm driving quite often), and overpasses and the like confuse all GPS systems I've ever used, like in Dallas, TX, where it's 4 layers high with parallel roads stacked. Furthermore, just the road data on Google Maps for the nearest metro area to my house is 20MB, and I recall it climbing quickly into hundreds of MB if you need to download maps for the swaths of area with no cellphone reception, like parts of western Nevada. Given 20MB for my metro, that's something like 40GB of just road shapes and addresses for the US, which is much more than the 600MB of incomplete GIS files I downloaded.

So we've moved from the 600MB of "text" data to the actual data needed by a GPS to give directions, 8000MB. Your claim is that a mere 1000x more data is enough to autonomously self-drive anywhere in the US, at any time of day or year, etc...

You know who actually has this data and would know how big it is? Tesla.


> prior experience

The part of the computer that knows how to drive is completely separate from the 7TB database of the exact shape and location of every lane and edge and defect.

> knowing how far i can leave the center of the lane if someone else crowds me or goes over the center divider

Experience, not in the database.

> knowing what part of I-605 floods

> Where's the gravel bed on more mountainous passes?

That goes in the database but it's less than one byte per meter.

> How many "bytes" is that information, times every neighborhood?

I don't know why you would want that data (you should be wary of blind traffic at all times), but that's easy math. There are fewer than a million neighborhoods, and time-based activity levels for each neighborhood would be about a hundred bytes. So: less than 1 byte per meter and less than 100MB total.
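The arithmetic, for what it's worth:

```python
# Fewer than a million neighborhoods, ~100 bytes of time-of-day
# activity data each:
neighborhoods = 1_000_000
bytes_each = 100
total = neighborhoods * bytes_each
print(total / 10**6, "MB")  # 100.0 MB
```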

> If this was easy/simple/solved, waymo et al would be bragging about it

This doesn't happen for two reasons. One, they are collecting orders of magnitude more data than road info; two, like I keep saying, the collection is extremely difficult, and I'm only defending the storage and use as being feasible.

> This reminded me a bit of the idea that somewhere in the US there's a database of every sms sent to or from US cellular phones. "it's just text; it'll compress well" - belies how much text there is, there.

Well we know how many meters of road there are. So it's basic multiplication.

I can tell you how many hard drives you need to store a trillion texts. It's five hard drives.

Google thinks the human race sends almost ten trillion text messages per year. So I guess you could store them all very easily? Why do you think it's not doable?
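The multiplication behind "five hard drives", assuming max-length single-part messages and ~28TB drives (the drive size is my assumption from the 20-30TB range mentioned below):

```python
import math

messages = 10**12   # a trillion SMS
bytes_each = 140    # max single-part GSM payload
drive_tb = 28       # assumed drive capacity in TB

total_tb = messages * bytes_each / 10**12
drives = math.ceil(total_tb / drive_tb)
print(total_tb, "TB ->", drives, "drives")  # 140.0 TB -> 5 drives
```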

> Your claim is that a mere 1000x more data is enough to autonomously self-drive anywhere in the US, at any time of day or year, etc...

My claim is that 1000x is enough for utterly exhaustive road maps. Figuring out how to drive is another thing entirely.


Ohhhh, we're arguing past each other. I am unsure how to reconcile.

An SMS isn't just "140 characters/bytes" or whatever (I honestly don't care what your definition of "SMS" is). Of course you could fit 140 characters × 1e12 onto 5 hard drives. Where are you going to put the 1PB (for 1e12, but your own cite says it's 1e13, so 10PB) of metadata, minimum? The most barebones amount of metadata you need to actually have actionable "intelligence" is 1KB per message (technically I was able to finagle it to ~1016 bytes). And that's for every message, even an SMS that is the single character "K".

You need the metadata to derive any information from the SMS. "Lunch?" "yeah" "where?" "the place with the wheel" "okay see you in 25, bring Joel". This is what you propose to save. (Quick math shows you were going off something like ~32TB of SMS data per 1e12 messages.)

In the same way, you propose that the shapes of a road and its direction and distance, "plus 1KB of metadata per meter", are enough to derive the ability to drive upon those roads.

It's pretty obvious that just using sensors is not going to get us FSD. Maybe in the next 20 years we will develop sensor technology (and swarm networking and whatever else) that will allow us to dispense with the "7TB" of metadata. My argument is that we need much more "metadata" than 1KB per meter to "know the road baseline, current conditions, hazards", much in the same way a text message is more than 140 bytes. Driving with "only sensors" and rough GPS has killed people. It does not matter if human drivers have more deaths per million miles or whatever, because I am strictly talking about FSD, what other people are calling level 5. (I'd even concede level 4, although I wouldn't be able to use a level 4 car where I live for roughly a quarter of the year, and other areas would be out for more than that.)

enjoy your night!

Note: the metadata for a meter of road could be:

     {
       "road_segment": {
         "segment_id": 3500000,
         "meter_position": 128534,
         "coordinates": {
           "latitude": 32.5385,
           "longitude": -92.9222
         },
         "timestamp_added": "2024-06-05T23:57:34Z",
         "last_updated": {
           "timestamp": "2025-06-05T23:57:36Z",
           "delta_seconds": 2.0
         },
         "hash_signature": "89a25b6f3cd829e671bb9d42e8fae2c6",
         "road_type": "highway",
         "lane_data": {
           "lane_count": 4,
           "lane_width_m": 3.5,
           "shoulder_width_m": 2.0,
           "divider_type": "concrete barrier",
           "markings": ["solid white", "dashed white", "double yellow"]
         },
         "speed_limit_mph": 65,
         "road_material": "asphalt",
         "incline_percent": 1.2,
         "curve_radius_m": 150,
         "surface_condition": "dry",
         "weather_conditions": {
           "timestamp": "2025-06-05T23:57:36Z",
           "temperature_c": 25,
           "precipitation_mm": 0,
           "visibility_m": 5000,
           "wind_speed_mps": 2.5
         },
         "baseline_hazards": [
           {
             "type": "grade_crossing",
             "description": "Railroad crossing with signal lights",
             "location": { "latitude": 32.5386, "longitude": -92.9224 }
           },
           {
             "type": "roadwork",
             "description": "Permanent lane narrowing from past construction",
             "location": { "latitude": 32.5390, "longitude": -92.9230 }
           }
         ],
         "current_hazards": [
           {
             "type": "construction",
             "description": "Active roadwork zone with lane closure",
             "severity": "high",
             "location": { "latitude": 32.5387, "longitude": -92.9226 }
           },
           {
             "type": "downed_power_line",
             "description": "Reported electrical hazard near shoulder",
             "severity": "critical",
             "location": { "latitude": 32.5395, "longitude": -92.9232 }
           }
         ]
       }
     }
Obviously you can reduce this, but there's a minimum viable amount of metadata (that's my claim), and it's more than 1KB per meter; that snippet is ~1800 bytes as-is. The "current conditions" would not be part of the dataset on the "7TB" disk; that would need to be relayed or otherwise ingested by the car as it drives, the way my 2012 Lexus tells me I'm about to drive into a wild storm (though that's all the extra information I get out of its infotainment system). Waze is a better example of the sort of realtime updates I'd expect FSD to need, although I expect many times more points of information than Waze has, maybe dozens, maybe hundreds of times more. And each "trick" you do to reduce the size of the metadata necessarily implies more CPU needed to parse and process it.


> the most barebones amount of metadata you need to actually have actionable "intelligence" is 1KB per message (technically i was able to finagle it to ~1016 bytes.) And that's for every message, even an SMS that is the single character "K".

How did you reach that number?

I figure the most important metadata is the source and destination phone numbers and a timestamp, and I guess which cell tower each phone was on. A phone number needs 8 bytes, and a timestamp or cell tower ID can be 4 bytes, so that's 28 bytes of important metadata (two numbers at 8 bytes, plus one timestamp and two tower IDs at 4 bytes each).

> (quick math shows you went off something like ~32TB of sms data per 1e12 messages)

I was going for a full 140TB of data. 20-30TB hard drives are available.

I did consider metadata, but I figured you could probably put that in the savings from non-full-length messages.

> Where are you going to put the 1PB (for 1e12, but your own cite says it's 1e13, so 10PB) of metadata, minimum?

Well, for just the US it would be closer to 1PB. But, uh, I'd store it in a single server rack (ideally with backups somewhere). As of Backblaze's last storage pod post, almost three years ago, it cost them $20k per petabyte. That's absolutely trivial on the scale of telecoms or governments or whatever.

> My argument is that: we need much more "metadata" than 1KB per meter to "know the road baseline, current conditions, hazards", much in the same way a text message is more than 140 bytes.

I mean, I agree with you about needing extra information.

But that's why the number I gave is 10000x larger than your CSV. My number is supposed to be big enough to include those things!

> note: the metadata for a meter of road could be:

I really appreciate the effort you put into this. I have two main things to say.

A) That's less than a kilobyte of information. Most of the bytes in the JSON are key names, and even without a schema for good compression, you can replace key names with 2-byte identifier numbers. Things like "critical" and "Active roadwork zone with lane closure" should also be 1-byte or 2-byte indexes into a table, and all the numbers in there could be stored as 4-byte values. Apply all that and it goes down below 300 bytes. With a special schema for this, it would be lower still by a significant amount.

B) Most of those values would not need to be repeated per meter. Add one byte to each hazard saying how many meters it spans (0-255), and that's an instant 99% savings on storing hazard data.

> each "trick" you do to reduce the size of the metadata necessarily implies more CPU needed to parse and process it.

CPUs are measured in billions of cycles per second; they can handle some lookup tables and basic compression easily. Hell, these keys are just going to feed into a lookup table anyway, so using integers makes it faster. And not repeating unchanged sections makes it a lot faster.
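A hedged sketch of what A) looks like in practice: the road-segment record quoted earlier, packed into a fixed binary layout with enumerated strings replaced by table indexes and floats by fixed-point integers. Every field width here is my assumption, not a real schema:

```python
import struct

# Hypothetical fixed layout for one road-segment record. Strings like
# "highway" or hazard descriptions live in shared lookup tables; the
# record stores only small integer codes.
SEGMENT = struct.Struct("<I I i i I 16s B B H H B B H B h I B B")
packed = SEGMENT.pack(
    3_500_000,      # segment_id
    128_534,        # meter_position
    325_385_000,    # latitude,  1e-7 degrees
    -929_222_000,   # longitude, 1e-7 degrees
    1_749_167_856,  # last_updated, unix seconds
    bytes.fromhex("89a25b6f3cd829e671bb9d42e8fae2c6"),  # hash
    1,              # road_type -> table ("highway")
    4,              # lane_count
    350,            # lane_width, cm
    200,            # shoulder_width, cm
    2,              # divider_type -> table ("concrete barrier")
    65,             # speed_limit_mph
    3,              # markings bitfield
    12,             # incline, tenths of a percent
    150,            # curve_radius_m
    0,              # offset into shared hazard table
    2,              # baseline hazard count
    0,              # flags
)
print(len(packed))  # 55 bytes, vs ~1800 bytes of JSON
```

Hazards and weather would live in side tables keyed by segment, which is also where B)'s run-length trick applies.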


> A phone number needs 8 bytes

A phone number is not a 64-bit integer; just off the top of my head, a phone number can start with "0".

  {
    "message_id": "b72f9a6c-34d2-4ef9-89a5-623c1d7b890a",
    "timestamp_sent": "2025-06-05T23:57:34Z",
    "timestamp_received": "2025-06-05T23:57:35Z",
    "sender": {
      "phone_number": "+1-318-555-1234",
      "carrier": "AT&T",
      "device_id": "IMEI-354812345678901",
      "location": {
        "cell_tower_id": "LA5321",
        "latitude": 32.5385,
        "longitude": -92.9222
      }
    },
    "receiver": {
      "phone_number": "+1-225-555-5678",
      "carrier": "Verizon",
      "device_id": "IMEI-869712345678902",
      "location": {
        "cell_tower_id": "LA6723",
        "latitude": 30.4515,
        "longitude": -91.1871
      }
    },
    "network": {
      "protocol": "GSM",
      "message_type": "SMS",
      "message_size": "160 bytes",
      "sms_center": "+1-800-555-9876",
      "routing_path": [
        "Cell Tower LA5321",
        "Switching Center AT&T Baton Rouge",
        "SMS Center",
        "Switching Center Verizon New Orleans",
        "Cell Tower LA6723"
      ]
    },
    "status": "Delivered"
  }
and again - if you use clever tricks to reduce this, you increase the overhead to actually use the data.

Get a cell-tower snooper on your phone and watch the data it shows; that's the metadata for your phone. An SMS dragnet would need that for both phones, plus the message itself.


> a phone number is not a 64 bit integer

It's not an integer, but you can store it inside 64 bits. You can split it into a country code and then a number, or you can use 60 bits to store 18 digits and the top 4 bits to say how many leading zeros to keep/remove. Or other things. A 64-bit integer has enough bits to store variable-length numbers up to 19 digits while remembering how many leading zeros they have.

If you want really simple and extremely fast to decode you can use BCD to store up to 16 digits and pad it with F nibbles.
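A sketch of that BCD variant, which round-trips leading zeros (nibble layout as described: one digit per nibble, padded to 16 nibbles with 0xF):

```python
def bcd_encode(number: str) -> int:
    """Pack a phone number (digits only, <= 16 of them) into 64 bits."""
    assert number.isdigit() and len(number) <= 16
    value = 0
    for ch in number:
        value = (value << 4) | int(ch)
    # Pad remaining nibbles with 0xF so length and leading zeros survive.
    for _ in range(16 - len(number)):
        value = (value << 4) | 0xF
    return value

def bcd_decode(value: int) -> str:
    """Read digits high nibble first, stopping at the 0xF padding."""
    digits = []
    for shift in range(60, -4, -4):
        nibble = (value >> shift) & 0xF
        if nibble == 0xF:
            break
        digits.append(str(nibble))
    return "".join(digits)

print(bcd_decode(bcd_encode("03185551234")))  # 03185551234, zero intact
```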

> JSON

Most of this is unimportant. Routing path, really? And we don't need to store the location of a cell tower ten million times; we can have a central listing of cell towers.

I don't think we really need both the phone number and the IMEI, but fine, let's add it. Two IMEIs mean another 16 bytes. And two timestamps, sure.

Phone number, IMEI, timestamp, cell tower ID, all times two. That's still well under 100 bytes if we put even the slightest effort into using a binary format.
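A sketch of such a record, with field widths that are my assumptions rather than any real CDR format (numbers and IMEIs as 64-bit values, timestamps as unix seconds, tower IDs as 32-bit indexes into a central listing):

```python
import struct

# Fixed binary record for one SMS's metadata: two phone numbers, two
# IMEIs, two timestamps, two cell-tower ids.
SMS_META = struct.Struct("<Q Q Q Q I I I I")
record = SMS_META.pack(
    3_185_551_234,        # sender number (packed into 64 bits)
    2_255_555_678,        # receiver number
    354_812_345_678_901,  # sender IMEI (15 digits fits in 64 bits)
    869_712_345_678_902,  # receiver IMEI
    1_749_167_854,        # timestamp sent (unix seconds)
    1_749_167_855,        # timestamp received
    5321,                 # sender cell tower id
    6723,                 # receiver cell tower id
)
print(len(record))  # 48 bytes
```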

> and again - if you use clever tricks to reduce this, you increase the overhead to actually use the data.

No no no. Most of the things I would do are faster than JSON.



