> However, they don’t actually have to land it 500 times - all they need is an engineering analysis
That very analysis will include the fact that they have no launch abort system like normal capsule based systems and that the belly flop landing is much more dangerous than landing a capsule. Space Shuttle memories may come to mind. So the only realistic way to get the 1 in 500 confidence would probably be to really land hundreds of times for unmanned missions and not mess up even once. Granted, they may eventually get there, but I estimate this would take about two decades, if Falcon 9 is any indication.
> That very analysis will include the fact that they have no launch abort system like normal capsule based systems and that the belly flop landing is much more dangerous than landing a capsule. Space Shuttle memories may come to mind.
You are talking about this as if it is based on feelings as opposed to probability calculations. SpaceX will present a probability calculation to NASA, who then either accepts it or disagrees with it on engineering grounds, not "memories". There is no formal requirement for a launch abort system in NASA's safety standards – just a requirement that the probability of death be below a certain threshold. A launch abort system is one way to get that probability below the threshold, but if you don't have one, that's okay so long as you have some other way of getting there.
> So the only realistic way to get the 1 in 500 confidence would probably be to really land hundreds of times for unmanned missions and not mess up even once.
What they do is list every possible failure mode, assign each one a probability of happening and a probability of a lethal outcome if it does happen, multiply those two probabilities together, and then add the products up. If the result is above the threshold, they will fail certification. If the result is below it, they then have to convince NASA engineers that (1) their probability estimates are accurate and have sufficient evidence to justify them, and (2) there are no failure modes they've omitted from the analysis. For (1), actual flight experience is a valid source of evidence, but there are others as well – such as simulations and on-the-ground testing of components. There is no minimum number of flights required to gather sufficient evidence; it all depends on how much non-flight evidence is available and how NASA engineering evaluates that evidence (which is going to depend on the component or failure mode we are talking about).
You are looking at this from a "big picture" perspective, whereas NASA will be looking at it scenario by scenario, component by component.
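To make the arithmetic concrete, here is a minimal sketch of that sum in Python. The failure modes and every number in it are invented purely for illustration (they are not SpaceX or NASA figures), and the 1 in 500 threshold is just the one quoted in this thread. Adding the products is the usual rare-event approximation, which slightly overestimates the total when the individual probabilities are small.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    p_occurs: float  # assumed probability the failure happens on one mission
    p_lethal: float  # assumed probability of crew loss if it does happen

def loss_of_crew_probability(modes):
    """Sum of P(failure) * P(lethal | failure) over all listed failure modes."""
    return sum(m.p_occurs * m.p_lethal for m in modes)

THRESHOLD = 1 / 500  # threshold as quoted in this thread

# Hypothetical failure modes with made-up numbers, for illustration only.
modes = [
    FailureMode("engine fails to relight for landing", 1e-3, 0.5),
    FailureMode("heat shield tile loss", 2e-3, 0.1),
    FailureMode("flap actuator jam during descent", 5e-4, 0.8),
]

total = loss_of_crew_probability(modes)
print(f"estimated loss-of-crew probability: {total:.5f} (about 1 in {1 / total:.0f})")
print("below threshold" if total < THRESHOLD else "above threshold")
```

Getting a total below the threshold is only the first step: the hard part, as described above, is justifying each `p_occurs` and `p_lethal` and showing the list is complete.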
Starship separation from booster could be the launch abort system. There is no landing abort system.
My concern is that the flip maneuver is just too risky for landing with people on board even if Starship manages to do 200 perfect landings in a row (that would be a better record than the Shuttle).
I guess one factor which makes it less bad is that Starship can throttle its engines low enough to hover (Falcon 9 cannot throttle low enough to hover, so it has to land with a "suicide burn" timed to reach zero velocity right at touchdown). Starship also uses multiple engines for the landing. So if one fails, it still has a few others left, and perhaps even enough time to start another one.
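For a sense of what a streak of perfect landings demonstrates on its own, here is a rough sketch using the standard zero-failure binomial bound. It assumes every flight is an independent trial with the same failure probability and that flights are the only evidence, which is a simplification; the 95% confidence level and the function names are my own choices for illustration.

```python
import math

def upper_bound_failure_prob(n_successes, confidence=0.95):
    """Upper confidence bound on per-flight failure probability
    after n_successes flights with zero failures."""
    return 1 - (1 - confidence) ** (1 / n_successes)

def flights_needed(p_max, confidence=0.95):
    """Failure-free flights needed to show the per-flight failure
    probability is below p_max at the given confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_max))

bound = upper_bound_failure_prob(200)
print(f"200 clean landings: failure probability below {bound:.4f} (about 1 in {1 / bound:.0f})")
print(f"clean flights needed for 1 in 500: {flights_needed(1 / 500)}")
```

Under these assumptions, 200 clean landings only bound the failure probability at roughly 1 in 67, and demonstrating 1 in 500 from flights alone would take about 1,500 clean flights, so flight history by itself is a slow way to build the case.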