GRVYDEV's comments

Dropped you an email!


You can aggregate data from multiple sites in the hope that it gives you a true picture of what the company does.


Thank you for this thoughtful response. We have the same mindset. This is exactly what we are building. I’d love to chat with you more on this.


Thank you for the feedback. Apologies for the crappy response. This is still super nascent technology and I'm still working on improving the search process. In my eyes, the real value here lies in data enrichment from other sources, allowing you to get a really deep understanding of a company. I am actively building out that functionality now.


Thank you for the feedback! I'll get to work on an About page.

What we are doing is automating bespoke account qualification and research. Say, for example, you are a company that sells software to chemical synthesis labs. Your ideal customer synthesizes more than 5 chemicals. They need to synthesize them in large batches and they need to have more than 10 chemists on staff.

To go out and find those ideal customers, right now, you would need to spend hours upon hours searching and manually qualifying each account. We are going to automate that. By providing more accurate leads we will reduce the amount of time that companies need to spend on prospecting while also increasing the open rate.


This is potentially useful for "partner search" functionality (as operated by governments and e.g. the European Commission), for example when people want to team up to co-apply for research grants.


This is a great use-case!


I'd love to add hotword detection! Or, even better, you (or someone in the community) could add it :)


Hey! First of all thank you for this really detailed response! I am very new to the voice space and definitely have a TON to learn. I'd love to connect and chat with you sometime :)

I totally agree with you about latency. It is very, very important for use cases such as a voice assistant. I also think there are use cases in which latency doesn't matter that much. One thing I think I may have understated about S.A.T.U.R.D.A.Y. is the fact that, at its core, it is simply an abstraction layer over vocal computing workloads. This means it is 100% inference-implementation agnostic. Yes, for my demo I am using whisper.cpp; however, there is an implementation that also uses faster-whisper.
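To make that concrete, the shape of it is roughly this (the names here are made up for illustration, not the actual S.A.T.U.R.D.A.Y. API):

    from abc import ABC, abstractmethod

    class SttBackend(ABC):
        """Anything that can turn raw audio into text."""
        @abstractmethod
        def transcribe(self, pcm: bytes, sample_rate: int) -> str: ...

    class WhisperCppBackend(SttBackend):
        def transcribe(self, pcm: bytes, sample_rate: int) -> str:
            raise NotImplementedError("call whisper.cpp bindings here")

    class FasterWhisperBackend(SttBackend):
        def transcribe(self, pcm: bytes, sample_rate: int) -> str:
            raise NotImplementedError("call faster-whisper here")

    # the rest of the pipeline only ever talks to SttBackend, so swapping
    # whisper.cpp for faster-whisper (or anything else) is a one-line change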

I also want to call out that I have spent very little time optimizing and reducing the latency in the demo. Furthermore, when I recorded it I was on incredibly shoddy WiFi in northern Scotland, and since the demo still depends on OpenAI, a large chunk of the latency was introduced by the text-to-text inference. That being said, there are still a ton of areas where the latency in the current demo could be reduced, probably to the neighborhood of 1-1.5s. This will get better in the future :)

I want to touch on something else you mentioned: GPUs. I intentionally tried to avoid using any GPU acceleration with this demo. Yes, it would make it faster, BUT I think a large part of making this kind of technology ubiquitous is making it accessible to as many clients as possible. I wanted to see how far you can get with just a CPU.

In regard to your comments about NLU/NLP, I have not dug into using them in place of LLMs, but this seems like an area in which I need to do more research! I am very far from an AI expert :) I have a bunch of ideas for different ways to build the "brains" of this system; I simply have not had time to explore them yet. The nice part about this project and demo is that it doesn't matter whether you are using an LLM or an NLU/NLP model; either will plug in seamlessly.
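Rough sketch of what I mean by "plug in seamlessly" (again, hypothetical names, just to illustrate the idea):

    from abc import ABC, abstractmethod

    class Brain(ABC):
        """Takes a transcript, returns what the assistant should say."""
        @abstractmethod
        def respond(self, transcript: str) -> str: ...

    class LlmBrain(Brain):
        def respond(self, transcript: str) -> str:
            raise NotImplementedError("prompt an LLM (OpenAI, llama.cpp, ...)")

    class IntentBrain(Brain):
        def respond(self, transcript: str) -> str:
            raise NotImplementedError("classic NLU: classify intent, fill slots")

    # the TTS side only ever sees Brain.respond(), so LLM vs. NLU becomes an
    # implementation detail rather than an architectural decision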

Thank you again for your response and all of this information! I look forward to hopefully chatting with you more!


> Yes, for my demo I am using whisper.cpp; however, there is an implementation that also uses faster-whisper.

The benchmarks I referenced above show a GTX 1070 beating a Threadripper PRO 5955WX by at least 5x. Our inference server implementation runs CPU-only as well and is based on the same core as faster-whisper (ctranslate2), but our feature extractor and audio handling make it slightly faster than faster-whisper. The general point is that GPUs are so vastly different architecturally and physically: a $1K CPU can barely do large-v2 in realtime, while a $1K RTX 3090 is 17x realtime (a 4090 is 27x realtime).
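If anyone wants to sanity-check numbers like these themselves, a quick-and-dirty way to measure realtime factor with faster-whisper on CPU (model size and file name are just placeholders):

    import time
    from faster_whisper import WhisperModel

    # int8 quantization is roughly the best case on CPU
    model = WhisperModel("large-v2", device="cpu", compute_type="int8")

    start = time.perf_counter()
    segments, info = model.transcribe("sample.wav")
    text = " ".join(seg.text for seg in segments)  # decoding is lazy; this runs it
    elapsed = time.perf_counter() - start

    # realtime factor: >1 means faster than realtime
    print(f"{info.duration:.1f}s of audio in {elapsed:.1f}s "
          f"({info.duration / elapsed:.1f}x realtime)")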

Many demos online that feature local Whisper use tiny; we've found that in the real world, under real conditions, Whisper medium is the minimum for quality speech recognition with these tasks, and many of our users end up using large-v2. Using the same benchmarks above, this puts the floor for response time at 1.5 seconds (medium) for ~3 seconds of speech on CPU, just to get the transcript. I understand you're early, but if you can eventually break five seconds with this all-local on any CPU in the world, I would be very, very surprised and impressed! I suspect you'll find that even with the worst internet connection in the world, OpenAI is still faster than llama.cpp, etc. on CPU (they use GPUs, of course).
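Back-of-the-envelope version of that floor (the realtime factors are the ballpark figures from the benchmarks above, nothing more precise than that):

    def transcript_latency(audio_seconds: float, realtime_factor: float) -> float:
        """Time to produce the transcript alone, ignoring everything downstream."""
        return audio_seconds / realtime_factor

    print(transcript_latency(3.0, 2.0))    # medium on a fast CPU, ~2x realtime -> 1.5s
    print(transcript_latency(3.0, 17.0))   # large-v2 on an RTX 3090, ~17x -> ~0.18s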

With highly tuned Whisper, LLM, and TTS, our inference server is around three to four seconds all-in (Whisper, LLM, TTS) for this task on an RTX 3090, and I don't consider that usable (the LLM is almost all of it). Imagine trying to have a conversation with a person and every time you say something they stare at you blankly for 5-10 seconds (or more). Frustrating, to say the least...

I suppose the point is that for these tasks Apple, Amazon, Google, OpenAI, etc. all use GPUs (or equivalent) for their commercial products, and that is the benchmark in terms of user expectations; it's often still not fast enough and merely tolerated. For these tasks, if you're bringing a CPU to a GPU fight you're going to lose: an RTX 3090 (for example) has more than 10,000 CUDA cores and 935 GB/s of memory bandwidth. All of the software tricks and optimization in the world can't make CPUs compete with that.

That said, what Apple is doing with the Apple Neural Engine is very exciting, but that's another accessibility issue: outside of HN, most people don't have the latest and greatest Apple hardware (or Apple hardware at all). It's not as if many people just have GPUs lying around either, but for today and the foreseeable future, given the fundamental physical realities, "it is what it is": you either have specialized hardware or you wait.

Accessibility is important to us as well (that's why we support CPU-only), but I question the value of accessibility if it isn't anywhere close to practical, and for many people waiting at least several seconds for a voice response puts these kinds of tasks in "take out your phone and do it there" territory, or, in the case of already being at a desktop, "open a browser tab, type it out, and read it" territory.

I DM'd you on twitter from @toverainc - let's do something!


There's a lot to read here, and maybe this is already implemented, but my initial thought was that it waits for the full response to be generated before starting to read it aloud.
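If it isn't already doing so, streaming sentences into the TTS as the LLM produces them hides a lot of that wait. Rough sketch (assumes you have a chunked LLM text stream and some speak() function; the names are made up):

    import re

    def speak_streaming(llm_chunks, speak):
        """Send each complete sentence to TTS as soon as the LLM emits it,
        instead of waiting for the whole response."""
        buffer = ""
        for chunk in llm_chunks:                      # iterator of streamed text
            buffer += chunk
            # peel off complete sentences (ending in . ! or ?) as they appear
            while (m := re.search(r"[.!?]", buffer)):
                sentence, buffer = buffer[:m.end()], buffer[m.end():]
                speak(sentence.strip())
        if buffer.strip():
            speak(buffer.strip())                     # trailing fragment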


More so an oversight :D


Ok, we've added some periods.



Thank you for the very kind words :)

