> We found that humans were able to infer the underlying program and generate the correct test output for a novel test input example, with an average of 84% of tasks solved per participant
> We found that humans were able to infer the underlying program and generate the correct test output for a novel test input example, with an average of 84% of tasks solved per participant