I personally like that it's using speech recognition.
First, chatting and speaking don't use the same skills at all; training for one does not necessarily train the other, and you can end up having a hard time finding the words you want on the spot.
Secondly, speech recognition, while not perfect, does help you make yourself understood by a native speaker. Speech recognition usually works best on what's considered the most neutral accents in the target language, which, as a foreign speaker, is exactly what you want. Seeing the recognition fail is a clue that you might need more practice speaking those words.
> TTS can't model speech accurately (it lacks emotions etc.)
I do agree on this last part though, and TTS usually lacks support for other accents as well.
I agree that chatting and speaking require different skill sets. However, I would argue that this is even more of an argument not to use speech recognition here (or at least not to force it), because ChatGPT is chatting while the learner is speaking. Transcription will always lose some information (for example, your tone can indicate sarcasm, but ChatGPT can't detect it).
On the second point: Whisper can be helpful, but how can you know whether it failed because of you rather than because of the software's own errors? I spoke in my native language with a traditional accent and it still made mistakes, and it also hallucinates. Additionally, being understood by Whisper doesn't mean a native speaker will understand you.
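One rough way to separate your mistakes from the model's is to transcribe a known-good recording of the same sentence (a native speaker, or even a TTS clip) alongside your own attempt and compare the outputs. A minimal sketch, assuming the open-source `whisper` Python package is installed (`pip install openai-whisper`, plus ffmpeg on PATH); the file names and practice sentence are placeholders:

```python
# Transcribe a reference recording and your own attempt of the
# same sentence, then compare both against the intended text.
import whisper

model = whisper.load_model("base")  # larger models transcribe more accurately

target = "Je voudrais un café, s'il vous plaît."  # hypothetical practice sentence
clips = [("reference", "native_speaker.wav"),     # known-good pronunciation
         ("attempt", "my_attempt.wav")]           # your own recording

for label, path in clips:
    result = model.transcribe(path, language="fr")
    print(f"{label:>9}: {result['text'].strip()}")

print(f"{'target':>9}: {target}")
# If even the reference transcription diverges from the target,
# the errors are the model's, not your pronunciation's.
```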
I do agree that the text generated by ChatGPT isn't really "natural": it's more verbose and text-oriented, with things a native speaker would not necessarily say, and it won't understand speech nuances. It's clearly not perfect; I'm really not claiming that it is.
It's a good tool that I'm going to use a few times per day though; there's really no substitute for speaking when it comes to getting better at it. I'm also using other methods and tools, so this would be a minor addition to my learning schedule.
> I spoke in my native language with a traditional accent and it still made mistakes, and it also hallucinates.
I'm actually in this scenario as well, because I'm a native French speaker and I can't make myself understood by Google or Siri at all; my accent is way too strong and far outside the training voices they used.
It's kind of a paradox, but in my opinion it's less of a problem for non-native speakers, who are trying to pick up the most common accent they can in order to be broadly understood.
That's understandable; these tools can definitely be helpful, but learners should know their limitations and problems.
Also, I just think speaking to an actual native speaker is still much better practice, especially given TTS quality. It even pronounces Japanese words incorrectly (wrong pitch accent).
Oh yeah, sure, there's no doubt about that. It's just that these ChatGPT tools have two massive advantages over native speakers: they are available at any time and in any timezone, day or night (even for just 2 minutes), and they never get bored, so you can ask the most mundane questions over and over again to practice.