
It's really cool to see how you guys are using the voice AI stack to overcome language barriers.

(btw I work at LiveKit, so let me know if we could make Agents easier to use for your use case.)


I'm working on a PR now :)


Currently it does: all audio is sent to the model.

However, we are working on turn detection within the framework, so you won't have to send silence to the model when the user isn't talking. It's a fairly straightforward path to cutting the cost by ~50%.
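
To sketch the idea (placeholder objects, not our actual API): run a VAD over incoming frames and only forward audio while speech is active.

    # Sketch: gate audio frames with a VAD so silence never reaches the model.
    # `vad` and `model` are placeholders, not the real Agents API.
    def forward_speech_only(frames, vad, model):
        speaking = False
        for frame in frames:
            if vad.is_speech(frame):
                speaking = True
                model.send_audio(frame)
            elif speaking:
                # Speech just ended: commit the turn instead of streaming silence.
                model.commit_turn()
                speaking = False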


Working on this for an internal tool - detecting no speech has been a PITA so far. Interested to see how you go with this.


Use the voice activity detector we wrote for Home Assistant. It works very well: https://github.com/rhasspy/pymicro-vad
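
Usage is roughly this (going off the repo's README; treat the exact names as an assumption):

    from pymicro_vad import MicroVad

    vad = MicroVad()
    threshold = 0.5

    # Feed 10ms chunks of 16-bit mono PCM at 16kHz (160 samples = 320 bytes).
    while chunk := read_10ms_of_audio():  # placeholder audio source
        speech_prob = vad.Process10ms(chunk)
        if speech_prob < 0:
            pass  # needs more audio before it can score
        elif speech_prob > threshold:
            print("speech")
        else:
            print("silence")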


What if I'm watching TV and use the AI to control it? It should only react to my voice (a problem I had that forced me to use a wake word).


Currently we are using Silero VAD to detect speech: https://github.com/livekit/agents/blob/main/livekit-plugins/...

It works well for detecting voice activity, though it doesn't always detect end-of-turn correctly (humans often pause mid-sentence to think). We're working on improving this behavior.
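
A common workaround for the mid-sentence-pause problem is to layer a silence timeout on top of the VAD events. A rough sketch using Silero's streaming VADIterator (an illustration, not how our plugin does it internally):

    import torch

    # Load Silero VAD and its streaming VADIterator helper from torch.hub.
    model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
    VADIterator = utils[3]

    SAMPLE_RATE = 16000
    CHUNK = 512            # samples per chunk Silero expects at 16kHz
    PAUSE_TOLERANCE = 1.0  # seconds of silence before declaring end-of-turn

    vad = VADIterator(model, sampling_rate=SAMPLE_RATE)
    stream_time = 0.0
    speech_ended_at = None

    for chunk in audio_chunks:  # placeholder: 512-sample float32 tensors
        stream_time += CHUNK / SAMPLE_RATE
        event = vad(chunk, return_seconds=True)
        if event and "start" in event:
            speech_ended_at = None  # user resumed; it was a mid-sentence pause
        if event and "end" in event:
            speech_ended_at = event["end"]
        if speech_ended_at and stream_time - speech_ended_at > PAUSE_TOLERANCE:
            handle_end_of_turn()    # placeholder for downstream handling
            speech_ended_at = None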


Can I currently put a VAD module in the pipeline and only send audio when there is an active conversation? Feels like just that would solve the problem?


only if you buy enough tokens.


Speculating here, but I would read this as "anycast" as a concept, where each user is connected to the closest location, versus anycast as in the IP protocol. The complexity of routing each UDP packet to a different server within the same session would far outweigh the benefits.


Cloudflare uses Anycast for the TCP connections they terminate. See e.g. https://blog.cloudflare.com/magic-transit-network-functions/ or ponder DNS-over-HTTPS to 1.1.1.1

I don't think they've talked much about what happens if the connection gets routed to a different PoP mid-stream.


Thanks Roshan!


Arcas is fantastic! They are doing some really great work with WebRTC.


> However, it's still a lowest common denominator solution with respect to what codec is being used. The broadcasting "peer" has to send using a codec that every other peer supports. Essentially that means you're stuck with the lowest common denominator codec.

This is correct, though I wouldn't call them "inferior codecs": all WebRTC-capable browsers support both H.264 and VP8, as required by the spec.
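
To make the negotiation concrete, here's roughly how you can inspect capabilities and pin a codec with aiortc (Python WebRTC), adapted from the pattern in aiortc's examples; an illustration, not LiveKit's implementation:

    from aiortc import RTCPeerConnection, RTCRtpSender

    # List the video codecs this endpoint can negotiate.
    caps = RTCRtpSender.getCapabilities("video")
    print([c.mimeType for c in caps.codecs])

    def force_codec(pc: RTCPeerConnection, sender, mime_type: str) -> None:
        """Restrict a sender's transceiver to one codec, e.g. "video/H264"."""
        codecs = RTCRtpSender.getCapabilities("video").codecs
        transceiver = next(t for t in pc.getTransceivers() if t.sender == sender)
        transceiver.setCodecPreferences(
            [c for c in codecs if c.mimeType == mime_type]
        )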


It's fine to dislike Medium and run your own blog, but I find the article intellectually dishonest in comparing Medium to Facebook. One harvests your data and lets advertisers target you with ads; the other charges a subscription to provide access to content.

It's disingenuous to claim they "sell whatever advertising they please" and to link to a ToS screen cap that's completely unrelated.

Companies have to make money in order to pay their developers and pay for servers. Personally, I'd rather pay them instead of seeing ads plastered everywhere.


"How quickly can we get this into people's hands? If you read the papers, you see maybe it's three years, maybe it's thirty years. And I am here to tell you that honestly, it's a bit of both."

This statement is based on what he was aware of at the time. The most pessimistic end of that range is purely a data game, and Tesla's ability to have 100k cars collecting data for them will give them a significant advantage in accelerating their progress.

