@irfn - that's an interesting idea. will definitely try to create benchmark using my local M2 machine and llama3-7b, just for comparison.
yes, ollama and Bodhi App both use llama.cpp but our approaches are different. Ollama embeds a binary within its binary, that it copies to a tmp folder and runs this webserver. any request that comes to ollama is then forwarded to this server, and replied sent back to the client.
Bodhi embeds the llama.cpp server, so there is no tmp binary that is copied. when a request comes to Bodhi App, it invokes the code in llama.cpp and sends the response back to client. So there is no request hopping.
Hope that approach do provide us with some benefits.
Also Bodhi uses Rust as programming language. IMHO rust have excellent interface with C/C++ libraries, so the C-code is invoked using the C-FFI bridge. And given Rust's memory safety, fearless concurrency and zero cost abstractions, should definitely provide some performance benefit to Bodhi's approach.
Will get back to you once I have results for these benchmarks. Thanks for the idea.
Hope you try Bodhi, and have some equally valuable feedback on the app.
The question to about the obvious quality drop for Google is, Is this intentional? Perhaps some cost saving or ROI measures? Or the motive always was to just train their AI and we just helped with that?
>Defunkt makes it sound like it was a choice and they were a steward ... bs. If Google had a better offer, it would have been done.
Exactly! Total BS!
There are however some things that would motivate many founders to turn down their(Google) offers.
- Biggest issue with Google Acquisitions. They want the business and not the tech. They want to rewrite the tech riding their high horse.
- They often wont offer full time roles to employees of acquired companies and would keep them on contract positions pending "interview" for full time.
I'll cry if/when Google buys some enterprise software I happen to support one more time. The appeal for a lot of niche enterprise software is responsive/knowledgeable tech support, willingness to implement feaure requests, and their ability to meet SLAs. All of these instantly drop to zero in a google buyout. Also there's no guarantee that the software will even exist in a usable way within a year.
I say this purely anecdotally, though, so take it as such. It has been a point of considerable personal frustration in the past. One instance was particularly painful because we had engaged the particular company looking for several custom features, and were basically buying a sizable percentage of the product they sold, with the promise that they would work with us implementing those features. When google bought them out, not only did the ability to get features implmented vanish, but so did the ones in flight.
Not sure what you run this on and why its $50/day but sounds like a lot.
Convert your server cost to a pay as you go model. You are currently paying even for time when your server is idle.
Containerise your app and run on something like AWS Lambda which has a pay for what you use model.
- s3 is cheap for storing your model data and as long as you are just reading that should be fine.
- AWS Lambda has 1 million free invocations per month in free tier.
- Post that you pay $0.60 per 1 million requests
- AWS stack is just an example, you can easily use GCP Cloud run or any other equivalent service.
I have made a chatbot like this and I am not a software engineer.
The whole thing is trivial to make. The system prompt is trivial. I would even say it was harder to make a geocities page in the 90s than to make a chatbot like this.
3.5 turbo is great to play with but at scale it is brutal cost wise.
The real issue though is anyone who already has a working 3.5 turbo chatbot can knock this off in the next 5 minutes by just changing the system prompt.
I think this is the future though. The labor is pretty much trivial, the capital for tokens at scale is what matters.