Vehicle-induced vibrations would show up as significant peaks (overpowering any noise) on one or more accelerometer axes, and their timing (relative to other peaks) would align precisely with the peaks recorded by any other device in that vehicle.
As a layman, I think you can just sum all axes for each person, overlay the resulting track on everyone else's, and look for a time offset at which enough peaks line up. Constrain the search to within 5 minutes of the on-device timestamp, which accounts for clock drift while limiting the search space, and that should work well enough.
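To make that concrete, here is a rough Python sketch of the idea, not a claim about how anyone actually does it. All function names, the threshold, the 0.05 s step and the matching tolerance are made up for illustration; it also sums the absolute value of each axis rather than the raw values, an assumption on my part so that opposite-signed axes don't cancel out.

```python
from bisect import bisect_left


def combined_track(samples):
    """Collapse (t, x, y, z) samples into (t, |x| + |y| + |z|).

    Absolute values are an assumption so that axes cannot cancel out.
    """
    return [(t, abs(x) + abs(y) + abs(z)) for t, x, y, z in samples]


def peak_times(track, threshold):
    """Return timestamps of local maxima at or above `threshold`."""
    peaks = []
    for i in range(1, len(track) - 1):
        t, v = track[i]
        if v >= threshold and v > track[i - 1][1] and v >= track[i + 1][1]:
            peaks.append(t)
    return peaks


def count_matches(peaks_a, sorted_peaks_b, offset, tolerance):
    """Count peaks of A that land within `tolerance` of a peak of B
    once A is shifted by `offset` seconds."""
    matches = 0
    for t in peaks_a:
        target = t + offset
        # First peak of B that is not earlier than the tolerance window.
        i = bisect_left(sorted_peaks_b, target - tolerance)
        if i < len(sorted_peaks_b) and sorted_peaks_b[i] <= target + tolerance:
            matches += 1
    return matches


def best_alignment(peaks_a, peaks_b, max_drift=300.0, step=0.05, tolerance=0.05):
    """Try every offset in [-max_drift, +max_drift] (default: 5 minutes)
    and return (offset, match_count) for the offset with the most matches."""
    sorted_b = sorted(peaks_b)
    steps = int(round(max_drift / step))
    best_offset, best_count = 0.0, -1
    for k in range(-steps, steps + 1):
        offset = k * step
        count = count_matches(peaks_a, sorted_b, offset, tolerance)
        if count > best_count:
            best_offset, best_count = offset, count
    return best_offset, best_count
```

You would then call two devices "same vehicle" if `best_alignment` returns a match count above some cutoff, say most of the peaks in the shorter track. What that cutoff should be is a guess that would have to be tuned on real data.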
I'm sure the sociopaths working for Facebook will have a smarter way of doing this that's even more accurate.