Using a BCI with LLM for enabling ALS patients to speak again with family (thevccorner.substack.com)
61 points by vasco_ 3 months ago | 23 comments



This sounds like hocus pocus to me. I presume the "headset" band is some kind of EEG, given that it's "non-invasive", and if so, EEG signals are very crude, far too crude to map to anything like language beyond, perhaps, "I'm sad" or "I'm excited". They also only measure brain activity at the surface of the brain, and have no information whatsoever about anything below the first centimetre or so, if that. EEG headsets that aren't a full skullcap, with sensors that have some kind of conductive gel applied to each one and a tangle of wires coming off the back, are pretty much entirely useless, particularly when they're claiming signals as granular as language.

Not only that, the "or" in the phrase "sensors embedded in a wristband or headband capture input from your brain" suggests that they think a wristband alone is enough to "capture input from your brain", which is an outrageous claim.

Just because they hook this up to an LLM that churns out language doesn't mean much - we all know that LLMs are far too happy to generate language for its own sake. Unless they show a great deal more detail about how this is supposed to work, this should be regarded as AI-induced magical thinking, and they're making wildly irresponsible claims that could generate a lot of false hope.


Those non-invasive headbands (which work very differently from implanted electrodes) are notoriously inaccurate at recording brain signals. Even scientific studies, which use advanced setups like the 10-20 system for scalp EEG, face unsolved challenges in removing noise from the data and in using the data to reconstruct underlying brain activity [0] – let alone making meaningful inferences about it.

Patients with locked-in syndrome (one of the use cases mentioned in the article, also called a pseudo-coma), or with other disorders of consciousness, are unable to protest, or to confirm the accuracy of the generative message which is being attributed to them. Communicating on your own terms and in your own words is fundamental to human dignity.

Meanwhile, this coincides with a lukewarm reception of generative AI among consumers; perhaps it is precisely the lack of autonomy of locked-in patients that makes them an interesting segment for this new generation of ventures scrambling for ROI on the enormous over-investment in the sector.

The conference venues look lush tho.

[0] https://en.wikipedia.org/wiki/Electroencephalography#Artifac...


I've spoken to a lot of smart people on the topic of EEG (I'm in a very related field). I agree with you.

It's an extremely powerful tool for diagnosing a limited range of conditions, but it is not magic. Electrical signals get heavily attenuated when their sources are not at the surface of the brain. Even so, a headband like this is susceptible to noise from movement and other factors. You either need to correct for this with AI, which introduces a second source of error, or you need a very still user. I'm not convinced by the ability to "read minds" with the technology; I would need to see the man in the video answer some specific questions to be convinced.

Is this better than not being able to communicate at all? Yes.


What they need to provide is surveys from the patients without the device (even locked-in patients can often communicate slowly via eye-scan interfaces). How well do the patients rate the system at aligning with what they want to say?

If they don't find that it aligns at all, then honestly that is worse than nothing. Imagine being locked in and your family communicating with an LLM pretending to be you - all while you have to watch and can't do anything about it.


It might be beneficial to the family, though, even if indeed not to you.


I'd argue the family doesn't have the right to feel better at your expense, over something that was no one's fault.


I spent a long time trying to do some sort of machine learning over the OpenBCI helmet's data with the eventual goal of moving a cursor, but I didn't seem to get anywhere. The data was indeed _extremely_ noisy, and I don't believe my model ever converged to anything useful.

That said, I was just a high schooler and so my method of collecting training data was to run the script and "think really hard about moving left". Probably could have been a good deal more sophisticated too.
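
For what it's worth, the usual baseline for that kind of attempt is band-power features plus a linear classifier over short EEG windows. Here is a minimal sketch of that approach; the array shapes, labels, and random data are made up for illustration and have nothing to do with OpenBCI's actual API:

    import numpy as np
    from scipy.signal import welch
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    FS = 250  # Hz, a common EEG sample rate

    def bandpower_features(window, fs=FS):
        # Average power in the mu (8-12 Hz) and beta (13-30 Hz) bands, per channel.
        freqs, psd = welch(window, fs=fs, nperseg=fs, axis=-1)
        mu = psd[:, (freqs >= 8) & (freqs <= 12)].mean(axis=-1)
        beta = psd[:, (freqs >= 13) & (freqs <= 30)].mean(axis=-1)
        return np.concatenate([mu, beta])

    # Fake data standing in for recorded trials: 200 two-second windows of
    # 8-channel EEG, labeled 0 = "rest", 1 = "think really hard about moving left".
    rng = np.random.default_rng(0)
    windows = rng.standard_normal((200, 8, 2 * FS))
    labels = rng.integers(0, 2, size=200)

    X = np.array([bandpower_features(w) for w in windows])
    clf = LogisticRegression(max_iter=1000)
    print(cross_val_score(clf, X, labels, cv=5).mean())  # ~0.5 on pure noise, as expected

On real recordings the hard part is exactly what you ran into: getting labels and signal clean enough that the cross-validation score climbs meaningfully above chance.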


If it's any consolation for young you, this is a really hard problem, even with electrodes implanted in the brain. There's an amazing podcast episode by Lex with the team from Neuralink, where they go into depth on how, even with good neuron signals, there's still a lot of work on the software side just to get to "moving a cursor". The first recipient of their implant still has to do a 10-30 min calibration run every morning to be able to move a cursor reliably on the screen. So all in all, don't beat yourself up; it's a really hard problem even with good data.


It's good that people are trying to help ALS patients with technology, but most ALS patients are not in the terminal phase of their disease.

Most pALS have difficulty speaking: their voice is weak and sometimes there are pronunciation problems. Many pALS use simple assistive devices like a personal voice amplifier or a text-to-speech device.

Sometimes the problem is that "normal" people think pALS are deaf, so they speak loudly, even though the patient can hear quite normally. Or doctors and other people ignore the patient and insist on speaking to the carer, as if the patient cannot understand them.

https://alsnewstoday.com/forums/forums/topic/artificial-voic...

https://alsnewstoday.com/columns/microphone-voice-hear/


Have they only demo'd it with patients who can't speak? Seems like if they cracked mind reading, it would work just as well on someone with full faculties, who could confirm it's accurate.


Are there any details on how this works? Based on what is available in the linked article, it looks like they have an LLM+RAG and are trying to pass off the responses as speech from the user. Done with full transparency and the right protections, this could be useful, but calling it a BCI and overselling it as the user's voice (especially given that voice cloning is also being done) misrepresents it.


Agreed - I don't want to come across as negative, but this is certainly in the "extraordinary claims" category for me right now. If this works, it's obviously huge, but I really want to see third party validation before I can let go of my skepticism.

I would be very curious to hear about interviews with the patients (conducted through their current means of communication, eg: eye gaze interfaces). Are they finding that the speech generated by the system reflects their intentions accurately?

EDIT: the EEG peripheral they are using has 4 channels and a 250 Hz sample rate. I freely admit I have little knowledge of neuroscience and happily defer to the experts, but that really doesn't seem like a lot of data from which to infer speech intentions.


Even if the LLM hallucinates every word, just knowing when to say something versus stay quiet based on EEG data would be a huge breakthrough.


If that's all they were doing - showing when the patient wanted to speak - that would be fine. Presenting speech as attributable to that patient, though? That feels irresponsible without solid evidence, or at least informing the families of the risk that the interface may be just totally hallucinating. Imagine someone talking to an LLM they think is their loved one, all while that person has to watch.


You’ll get no argument from me there. The whole LLM part seems like a gimmick unless it’s doing error correction on a messy data stream, like a drunk person fat-fingering a question into ChatGPT, except with an EEG. It might be a really fancy autocorrect.

I’m just saying that EEG data is so unreliable, and requires so much calibration/training per person, that reliably isolating speech intent in a paralyzed patient would be a significant development.


Definitely. An LLM seems like the wrong tool, or at least not the right first one. Surely you need some sort of big classifier mapping EEG patterns to words or thoughts/topics; then, if you used an LLM at all, its job would be "clean up this nonsense into coherent sentences, keeping the spirit of the ideas or topics that are mentioned"?


Seems like they have built on top of HALO using generative AI now (in partnership with Unbabel?)


Human speech carries about 54 bits per second of information (and that is surprisingly uniform across languages). The usable bandwidth from a consumer-grade EEG headband is maybe four bits per second. Something doesn't add up.
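
Back of the envelope, using those two figures:

    # Rough information-rate comparison (illustrative numbers only).
    speech_bits_per_sec = 54   # estimated information rate of natural speech
    headband_bits_per_sec = 4  # rough figure for a consumer EEG headband

    gap = speech_bits_per_sec / headband_bits_per_sec
    print(f"Speech carries ~{gap:.0f}x more information than the headband supplies.")
    # Whatever fills that gap has to be generated by the model, not by the user.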


I agree about the EEG part. I was curious how they managed to get that to work, and found [0], which seems to confirm my guess: they didn't - they went for EMG instead. Now, EMG sounds very plausible, given that it's well understood, already applied for "controlling with thought" (prosthetics), and a person can learn to make their signal clearer and more intentional, easier for the machine to understand.

As for 54 bits per second [1], that assumes a healthy person speaking, which is not relevant here. Communication systems for people unable to talk, write, or sign because of ALS, paralysis, or similar conditions do not have to aim for 54 bits per second! A few bits per second is already great! The alternative is no communication, or maybe half a bit per second, and only when you're paying very close attention.

Here are some quotes from [0] about the most important aspects of the solution:

> “The LLM expands what you’re saying. And then I confirm before sending it back. So there’s an interaction with the LLM where I build what I want it to say, and then I get to approve the final message,” explained Pedro. (...) “The LLM that takes a basic prompt and expands it into a fully fledged answer, almost right away. I wouldn’t have time to type all of that in the natural way. So I’m using the LLM to do the heavy lifting on the response,” he added.

> He also pointed out that the wearer has absolute control of what they are outputting: “It’s not recording what I’m thinking. It’s recording what I want to say. So it’s like having a conversation.”

So no magic here. Seems like a direct combination of:

1. Using EMG as input to get specific words/phrases;

2. Using LLM to expand those into full-blown sentences;

3. Using a TTS model to sound it out in a person's voice.

Feels like 2 and 3 could be applied to existing solutions across a wide range of illnesses and disabilities.
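
To make the division of labor concrete, here's a rough sketch of what such a three-stage pipeline could look like. All three functions are hypothetical stand-ins (nothing here reflects Unbabel's actual implementation); the approval step mirrors Pedro's description of confirming before anything is sent:

    def decode_emg_tokens(emg_stream):
        # Stage 1 (hypothetical): map EMG activity to a few intended words/phrases.
        # In a real system this would be a trained classifier over muscle signals.
        return ["tired", "but", "happy", "you", "visit"]

    def expand_with_llm(tokens):
        # Stage 2 (hypothetical): expand the terse tokens into a full sentence.
        # Stand-in for an actual LLM call.
        return "I'm tired today, but I'm really happy you came to visit."

    def speak_with_cloned_voice(text):
        # Stage 3 (hypothetical): synthesize the text in the patient's own voice.
        print(f"[TTS, patient's voice] {text}")

    def converse(emg_stream, approve):
        tokens = decode_emg_tokens(emg_stream)
        draft = expand_with_llm(tokens)
        # Key detail from the interview: nothing is spoken until the user approves it.
        if approve(draft):
            speak_with_cloned_voice(draft)

    converse(emg_stream=None, approve=lambda draft: True)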

--

[0] - https://techcrunch.com/2023/08/18/communication-using-though...

[1] - Or is it 39? https://www.science.org/content/article/human-speech-may-hav....


Thanks for the reference; I suspected it would be EMG as well. Especially in the video you can see how the patient modulates his eyebrows, facial muscles, and mouth. The residual muscle movements can be decoded into speech with the help of an LLM much more easily. Actually, if form factor were not a concern, this could be done even more easily with other sensors as well.


Halo, developed by Unbabel, combines a non-invasive BCI with an LLM to enable ALS patients to regain the ability to talk with loved ones. The search for a CEO is on.


In their desperation to push the Nvidia stock price even higher, VCs are taking advantage of vulnerable people, who cannot defend their right to dignified silence, by putting computer generated strings into their mouths. What a disgrace.


"Please don't fulminate."

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html



