Because the entire point of a Turing test is that you don't know whether you're talking to a machine or a human, and you have to decide based entirely on the conversation.
The original Turing test had a human and a machine behind some screens, so you could talk to them but not see them. Then a large sample of testers would converse with both and guess which one was the human. If the guesses were indistinguishable from random (~50% correct), then the machine passed.
It was NOT a test where you only have a bot and are told to guess whether it seems human. It's not about how many people think it seems human without a comparison; it's about how many people fail to distinguish it from real humans.
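For what it's worth, here's a rough sketch of what "indistinguishable from random" could mean in practice. This is my own illustration, not anything from Turing's paper or from the test in question: the function names, the choice of an exact two-sided binomial test, and the 0.05 cutoff are all assumptions.

```python
from math import comb


def binomial_two_sided_p(correct: int, trials: int) -> float:
    """Exact two-sided p-value for `correct` right guesses out of `trials`,
    under the null hypothesis that every guess is a fair coin flip (p = 0.5)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # With p = 0.5 the distribution is symmetric around trials / 2, so the
    # two-sided p-value sums every outcome at least as far from the middle
    # as the observed count.
    distance = abs(correct - trials / 2)
    extreme = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - trials / 2) >= distance
    )
    return extreme / 2 ** trials


def machine_passes(correct: int, trials: int, alpha: float = 0.05) -> bool:
    """Hypothetical pass rule: the judges' hit rate can't be told apart
    from chance at significance level `alpha` (an assumed cutoff)."""
    return binomial_two_sided_p(correct, trials) >= alpha


if __name__ == "__main__":
    for correct in (50, 70):
        p = binomial_two_sided_p(correct, 100)
        print(f"{correct}/100 correct: p = {p:.6f}, passes = {machine_passes(correct, 100)}")
```

With 50 correct identifications out of 100 the machine passes under this rule; with 70 out of 100 it fails. Note that a very small judge pool can never reject at this cutoff (with 4 judges the smallest possible p-value is 0.125), which is why the "large sample of testers" part matters.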
I haven't been able to track down the parameters of the test in this case. I imagine there were some humans mixed in to obfuscate the machines? (If so, it might say something about the humans used that this one seemed intelligent.)