
You're using them wrong. Everyone is, though, so I can't fault you specifically. A chatbot is just about the worst possible application of these technologies.

Of late, deaf tech forums have been taken over by debates over which language model works best for speech transcription. (Multimodal language models are the state of the art in machine transcription. Everyone seems to forget that when complaining that they can't cite sources for scientific papers yet.) The debates have gotten to the point that it's annoying how much space the topic takes up, just as it has here on HN.

But then I remember, oh yeah, there was no such thing as live machine transcription ten years ago. And now there is. And it's going to continue to get better. It's already good enough to be very useful in many situations. I have elsewhere complained about the faults of AI models for machine transcription - in particular, when they make mistakes they tend to hallucinate something that is superficially grammatical and coherent instead - but for a sporadic phrase in an audio transcription that's sometimes tolerable. In many cases you still want a human transcriber, but the cost of that means the demand for transcription can never be fully satisfied.

It's a revolutionary technology. I think in a few years I'm going to have glasses that continuously narrate the sounds around me and transcribe speech, and it's going to be so good I can probably "pass" as a hearing person in some contexts. It's hard not to get a bit giddy and carried away sometimes.



> You're using them wrong. Everyone is, though, so I can't fault you specifically.

If everyone is using them wrong, I would argue that says more about them than about the users. Chat-based interfaces are the thing that kicked LLMs into the mainstream consciousness and started the cycle/trajectory we're on now. If this is the wrong use case, everything the author said is still true.

There are still applications made better by LLMs, but they are a far cry from AGI/ASI in terms of being all-knowing problem solvers that don't make mistakes. Language tasks like transcription and translation are valuable, but by no stretch do they account for the billions of dollars spent on these platforms, I would argue.


LLM providers actually have an incentive not to write literature on how to use LLMs optimally, as that adds friction, which means less engagement/money spent with the provider. There's also the typical tin-foil hat explanation: "it's bad so you'll keep retrying to get the LLM to work, which means more money for us."


Isn't this more a product of the hype though? At worst you're describing a product marketing mistake, not some fundamental shortcoming of the tech. As you say "chat" isn't a use case, it's a language-based interface. The use case is language prediction, not an encyclopedic storage and recall of facts and specific quotes. If you are trying to get specific facts out of an LLM, you'd better be using it as an interface that accesses some other persistent knowledge store, which has been incorporated into all the major 'chat' products by now.


Surely you're not saying everyone is using them wrong. Let's say only 99% of them are using LLMs wrong, and the remaining 1% creates $100B of economic value. That's $100B of upside.

Yes the costs of training AI models these days are really high too, but now we're just making a quantitative argument, not a qualitative one.

The fact that we've discovered a near-magical tech that everyone wants to experiment with in various contexts is evidence that the tech is probably going somewhere.

Historically speaking, I don't think any scientific invention or technology has been adopted and experimented with so quickly and on such a massive scale as LLMs.

It's crazy that people like you dismiss the tech simply because people want to experiment with it. It's like some of you are against scientific experimentation for some reason.


“If everything smells like shit, check your shoe.”


If the goal is to lay off all the customer support and trap the customer in a tarpit with no exit, LLMs are likely the best choice.


The US can have fun with that. In the EU we'll likely get laws that force companies to let us talk to a human, if it gets bad enough.


I think all the technology is already in place. There are already smart glasses with tiny text displays, and smartphones have more than enough processing capacity to handle live speech transcription.


What are the best open source live machine transcription tools, would you say? Know of any guides that make it easy to set up locally?


I’ve had the exact same vibes around chatbots as an application of LLMs. But other than translation/transcription, what else is there?


> ...there was no such thing as live machine transcription ten years ago.

What? Then what the hell do you call Dragon NaturallySpeaking and other similar software in that niche?


Through the '90s and '00s and well into the '10s I generally dismissed speech recognition as useless to me, personally.

I have a minor speech impediment because of the hearing loss, and those systems never worked very well for me. I don't speak like a standard American - I have a regional accent on top of the impediment. Modern speech recognition doesn't seem to have a problem with that anymore.

IBM's ViaVoice from 1997 in particular was a major step. It was really impressive in a lot of ways, but the accuracy rate was something like 90-95%, which in practice means editing major errors in almost every sentence. And that was for people who could speak clearly. It never worked very well for me.

You also needed to speak in an unnatural way [pause] comma [pause] and it would not be fair to say that it transcribed truly natural speech [pause] full stop

Voice recognition systems before about 2016 also required training on the specific speaker: you would read many pages of text to the recognition engine to tune it to you specifically.

It could not just be pointed at the soundtrack of an old 1980s TV show and then produce a time-synced set of captions accurate enough to enjoy the show. But that can be done now.
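For what it's worth, that workflow is mostly just formatting these days: a Whisper-style transcriber emits timestamped segments, and turning those into a SubRip (.srt) caption file is a few lines of code. A minimal sketch - the `segments` shape mirrors what such models return, but the sample data here is made up:

```python
# Convert timestamped transcription segments (the kind Whisper-style
# models emit) into SubRip (.srt) caption text.

def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render [{'start': float, 'end': float, 'text': str}, ...] as SRT."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> "
            f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

# Hypothetical segments standing in for a real model's output:
segments = [
    {"start": 0.0, "end": 2.5, "text": "Previously, on our show..."},
    {"start": 2.5, "end": 5.0, "text": "I have a plan."},
]
print(segments_to_srt(segments))
```

The hard part - the acoustic model producing accurate text and timings from messy 1980s audio - is exactly what didn't exist before; the plumbing around it was never the obstacle.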


So, you started by saying

> ...there was no such thing as live machine transcription ten years ago.

Now you're saying that live machine transcription existed thirty years ago, but it has gotten substantially better in the intervening decades.

I agree with your amended claim.



