banditelol · 2025-05-31T01:17:52 1748654272

I tried this before, but since I often need to open links in different browsers even when they come from the same app, I ended up moving to https://github.com/will-stone/browserosaurus

Not to say you can't use both, though.

banditelol · 2025-05-02T02:32:40 1746153160

I've tried Airbyte, Sling, and dlt (besides building several tools from scratch).

My best bet for now would be dlt if you have a dedicated DE team, but Sling will get you a long way for moving data around your warehouse.

banditelol · 2025-05-02T02:29:35 1746152975

Hi, I've been looking for something like this! Do any of your customers have a success story migrating off BigQuery to your platform? And how do you compare to MotherDuck? (Looks like you built some of your stack on top of DuckDB.)

mritchie712 · 2025-05-02T10:39:54 1746182394

Yes, we've had many bigquery / snowflake converts. The reality is, most companies don't have 100tb of data (which is what those platforms are optimized for). Motherduck has a good post[0] on this:

> There were many thousands of customers who paid less than $10 a month for storage, which is half a terabyte. Among customers who were using the service heavily, the median data storage size was much less than 100 GB.

I'm a fan of what motherduck is doing. We're building something different (opinionated, instant data stack), but yes, we both use duckdb under the hood.

0 - https://motherduck.com/blog/big-data-is-dead/

banditelol · 2025-02-24T01:26:39 1740360399

Has anyone tried comparing with a Qwen VL based model? I've heard good things about its OCR performance compared to other self-hostable models, but I haven't really tried benchmarking it.

jimmySixDOF · 2025-02-24T10:34:41 1740393281

Yes, I'd like to see this repeated with any of the small VLMs like IBM Granite or the HF Smols. Pretty much anything in the sub-7B range.

banditelol · 2024-12-21T12:11:00 1734783060

Now you make me wonder if I could run this entirely inside pyscript

banditelol · 2024-12-02T12:39:30 1733143170

I think you want something along the lines of dvc (github.com/iterative/dvc)

banditelol · 2024-11-08T05:18:47 1731043127

Looking at the syncer, it seems to copy the whole table to CSV every time (?) Code: https://github.com/BemiHQ/BemiDB/blob/6d6689b392ce6192fe521a...

I can't imagine up to what scale you can do this. Is there anything better we can do before using Debezium to sync the data via CDC?

Edit: add code permalink

exAspArk · 2024-11-08T14:39:42 1731076782

Our initial approach was to implement periodic full table re-syncing. We're starting to work on CDC with logical replication for incremental syncing. Here is our roadmap https://github.com/BemiHQ/BemiDB#future-roadmap
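To make the trade-off in this exchange concrete, here is a minimal sketch contrasting a full table re-sync with a watermark-based incremental sync. It is not BemiDB's code and it is not CDC via logical replication; it is the simpler middle ground of polling on a timestamp column. The `items` table, the `updated_at` column, and the helper names are all hypothetical, and SQLite stands in for both source and destination.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical `items` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
    )
    return conn


def full_resync(src, dst):
    """Copy every row on every run: simple, but cost grows with table size."""
    rows = src.execute("SELECT id, name, updated_at FROM items").fetchall()
    dst.execute("DELETE FROM items")
    dst.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    dst.commit()
    return len(rows)


def incremental_sync(src, dst, watermark):
    """Copy only rows with updated_at > watermark.

    Returns (rows_copied, new_watermark). Assumes the source reliably
    bumps `updated_at` on every write (an assumption, not a given).
    """
    rows = src.execute(
        "SELECT id, name, updated_at FROM items "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Upsert by primary key so changed rows overwrite their old copies.
    dst.executemany("INSERT OR REPLACE INTO items VALUES (?, ?, ?)", rows)
    dst.commit()
    new_watermark = rows[-1][2] if rows else watermark
    return len(rows), new_watermark


if __name__ == "__main__":
    src, dst = make_db(), make_db()
    src.executemany(
        "INSERT INTO items VALUES (?, ?, ?)", [(1, "a", 10), (2, "b", 20)]
    )
    src.commit()
    copied, wm = incremental_sync(src, dst, 0)
    print(copied, wm)
```

A watermark like this never sees deleted rows and depends on every writer maintaining the timestamp, which is one reason log-based CDC is usually the next step.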

banditelol · 2024-10-18T00:47:20 1729212440

Lol I automatically read C&H as Cyanide and Happiness

banditelol · 2024-10-11T14:02:11 1728655331

I'm curious, what kind of quick calculations do you usually use an LLM for?

Edited for clarity

golol · 2024-10-11T14:51:04 1728658264

Just earlier today I wanted to check if exp(inx) is an orthonormal basis on L^2((0, 1)) or if it needs normalization. This is an extremely trivial one though. Less trivially I had an issue where a paper claimed that a certain white noise, a random series which diverges in a certain Hilbert space, is actually convergent in some L^infinity type space. I had tried to use a Sobolev embedding but that was too crude so it didn't work. o1 correctly realized that you have to use the decay of the L^infinity norm of the eigenbasis, a technique which I had used before but just didn't think of in the moment. It also gave me the eigenbasis and checked that everything works (again, standard but takes a while to find in YOUR setting). I wasn't sure about the normalization so again I asked it to calculate the integral.

This kind of adaptation to your specific setting instead of just spitting out memorized answers in common settings is what makes o1 useful for me. Now again, it is often wrong, but if I am completely clueless I like to watch it attempt things and I can get inspiration from that. That's much more useful than seeing a confident wrong answer like 4o would give.
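
For the first, trivial check mentioned above, the computation can be sketched as follows. This assumes $n$ ranges over the integers and the standard inner product on $L^2((0,1))$; it is an illustration of the question as literally written, not the output o1 produced.

```latex
\[
\langle e^{inx}, e^{imx} \rangle
  = \int_0^1 e^{i(n-m)x}\,dx
  = \begin{cases}
      1, & n = m, \\[4pt]
      \dfrac{e^{i(n-m)} - 1}{i(n-m)} \neq 0, & n \neq m,
    \end{cases}
\]
% e^{ik} = 1 only for k in 2\pi\mathbb{Z}, and a nonzero integer k is never a multiple of 2\pi.
\[
\int_0^1 e^{2\pi i (n-m) x}\,dx = \delta_{nm}.
\]
```

Under those assumptions, each $e^{inx}$ already has unit norm on $(0,1)$ but the family is not orthogonal there, whereas the rescaled family $e^{2\pi i n x}$ is orthonormal.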

banditelol · 2024-10-03T05:41:19 1727934079

If we're talking about Chrome extensions, yes.

But Firefox extensions expose an API to inspect the response stream.

throwaway48476 · 2024-10-03T07:53:17 1727941997

Can you link the MDN page for this?