Maybe you're right; I don't care about the Tesla drama.
Here is one possible perspective from an engineering standpoint:
Assume the same budget, the same software complexity, the same size of engineering teams, the same engineering hours, the same number of moving parts. One company spreads that across multiple different sensors and complex fusion, with some reliance on AI. The other bets on limited sensors and heavier reliance on AI. Which is better? I don't think the answer is clear.
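To make the "complex fusion" side concrete, here is a minimal sketch of the simplest possible sensor fusion: combining two noisy range estimates by inverse-variance weighting. This is a toy illustration, not any company's actual pipeline; the sensor names and noise figures are made-up assumptions.

```python
def fuse(est_a, var_a, est_b, var_b):
    """Inverse-variance weighted fusion of two independent noisy estimates."""
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    fused = (w_a * est_a + w_b * est_b) / (w_a + w_b)
    fused_var = 1.0 / (w_a + w_b)  # always lower than either input variance
    return fused, fused_var

# Hypothetical readings: a noisy camera-based estimate and a precise lidar one.
camera_est, camera_var = 10.4, 4.0
lidar_est, lidar_var = 9.9, 0.25
fused, fused_var = fuse(camera_est, camera_var, lidar_est, lidar_var)
```

The point of the sketch is that the fused variance beats either sensor alone, but the gain comes from the fusion math, not from either sensor being magic. Real systems replace this one-liner with Kalman filters or learned fusion networks, which is where the engineering cost piles up.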
The other point I am arguing is that many people overstate the importance of the sensors. They are important, but the post-processing matters far more. Raw sensor data is a poor representation of the real environment; it is not about the sensors but about everything else. The brain, or the post-sensor processing, is responsible for reconstructing an approximation of the environment, and we have to infer from previously learned experience of the 3D world to navigate it successfully. No 3D information comes in from the sensors: no objects, no motion, no corners, no shadows, no faces. All of that is constructed later. So whoever does a better job at the post-processing will probably outperform, regardless of the choice of sensors.
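The claim that "there is no 3D information coming in from sensors" can be made concrete with a toy sketch: a single 2D measurement (the pixel height of a car in an image) carries no depth by itself, but a learned prior (typical car height) plus camera geometry recovers an approximate distance. The focal length and car height below are illustrative assumptions, not real calibration values.

```python
# Assumed camera intrinsics and learned prior -- made-up numbers for illustration.
FOCAL_LENGTH_PX = 1000.0   # focal length expressed in pixels
PRIOR_CAR_HEIGHT_M = 1.5   # prior knowledge: a typical car is ~1.5 m tall

def depth_from_prior(pixel_height_px):
    """Pinhole model: pixel_height = f * real_height / distance,
    so distance = f * real_height / pixel_height."""
    return FOCAL_LENGTH_PX * PRIOR_CAR_HEIGHT_M / pixel_height_px

print(depth_from_prior(75.0))  # a 75 px tall car -> 20.0 m away
```

The depth here comes entirely from the prior and the geometry, not from the pixels themselves; get the prior wrong (a toy car, a truck) and the estimate is wrong too. That is the sense in which the post-processing, not the raw sensor, reconstructs the environment.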
People absolutely get that. Their issue is that Tesla relies on visual data alone and then disingenuously insists that this is okay because humans "only need eyes," or some similar strawman argument.