The key piece will be when it queries multiple services by default and compares the answers to its own inferences, and is prompted to trust majority opinion or report that there isn't consensus. The iterative question about moons larger than Mercury in the Wolfram Alpha thread is a simple example of iterative tool use.