It seems to me there is a reasonably low-cost way to demonstrate safety: SpaceX should load and unload the propellant multiple times before its human-qualification spaceflights. If SpaceX is right that fueling is safe, the extra cycles won't produce any incidents. This would be a fairly minor expense, since nothing is consumed or destroyed: the only costs are handling and incidental losses from evaporation and so on.
It's a good point that conditions at actual launch time might differ from those during testing, but I'd imagine that running the whole procedure a few times, as completely as possible, would reveal any important unknown unknowns. (By "important" I mean "reasonably likely to occur": if a failure mode is that likely, it will probably show up over repeated tests.)
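To put rough numbers on the "it will probably show up" claim, here is a small sketch. It assumes, as a simplification, that each fueling cycle is independent and that a given failure mode has the same probability p of appearing on every cycle; then the chance of seeing it at least once in n cycles is 1 - (1 - p)^n. The function name is just illustrative.

```python
def prob_seen_at_least_once(p: float, n: int) -> float:
    """Chance that a failure mode with per-cycle probability p appears
    at least once in n independent fueling cycles (simplifying assumption)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Complement of "never appears in any of the n cycles".
    return 1.0 - (1.0 - p) ** n


# Print a small table for a few per-cycle probabilities and cycle counts.
for p in (0.01, 0.1, 0.3):
    for n in (1, 3, 5, 10):
        print(f"p={p:<5} n={n:<3} P(seen)={prob_seen_at_least_once(p, n):.3f}")
```

For example, a failure mode with a 30% chance per cycle would be seen with about 83% probability over five cycles, while a 1% failure mode would only have roughly a 10% chance of appearing in ten cycles. That is why the argument only covers failure modes that are reasonably likely.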