Plough_Jogger · on Feb 15, 2023

The team at R-Zero Systems has a Far-UV product on the market: https://rzero.com/vive/

Plough_Jogger · on Feb 13, 2023

My guess is that they are simply finetuning the model on your provided data: https://platform.openai.com/docs/guides/fine-tuning

Plough_Jogger · on Feb 13, 2023

I have a feeling we will see a resurgence of some of the ideas around expert systems; current language models inherently cannot provide guarantees of correctness (unless e.g., entire facts are tokenized together, but this limits functionality significantly).

Plough_Jogger · on Dec 20, 2022

If you'd like to determine if other words are harmful, feel free to use this GPT playground: https://beta.openai.com/playground/p/E8MTssK5Xk4dphyWFDtGwDb...

Plough_Jogger · on Sept 29, 2022

They forgot the "Testimonials" from fake customers.

Plough_Jogger · on July 28, 2022

My sense from the few projects I've seen attempt to use Druid is that there is quite a lot of infrastructure overhead / DevOps support required to manage a cluster at scale, and that fairly complex ingestion pipelines are required to load the data in the right format.

Anecdotally, I've heard that ClickHouse is easier to deploy from this perspective with similar performance, but I would love to get others' views on and experience with these and similar data stores.

didip · on July 28, 2022

We found the opposite. Setting up Druid clusters is easier than setting up its competitors, ClickHouse and Pinot.

Especially since ingestion goes straight to S3. We don’t really worry about backups (just deal with PG backups).

Just make sure your ZK is happy and all will be well.

The hard part about Druid is tuning:

- ingestion: spec definition, compaction, sharding strategy, RAM consumption, etc.

- query performance: RAM consumption, number of threads, timeouts, etc.

doliveira · on July 28, 2022

How is the documentation for this deployment and tuning? Last time I checked, I had the impression that the docs on the dozen or so different node types weren't very clear, and the details about ingestion were all over the place.

And it was hard to find examples of configuration or ingestion beyond basic tutorials.

didip · on July 28, 2022

Docs definitely have room for improvement.

Architecturally, it is easier to visualize the node types as two big groups:

- query serving: coordinator, historical, broker

- ingestion: overlord, middlemanager

The router unifies all of the Druid APIs behind one entry point.
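The grouping above can be written down as a small lookup table. This is only a sketch of the mental model in this comment, not anything Druid ships: the names `DRUID_PROCESS_GROUPS` and `group_of`, and the "api entry point" label for the router, are made up for illustration.

```python
# Hypothetical lookup table mirroring the grouping described above.
# The group labels are illustrative; they are not Druid configuration keys.
DRUID_PROCESS_GROUPS = {
    "query serving": ["coordinator", "historical", "broker"],
    "ingestion": ["overlord", "middlemanager"],
    "api entry point": ["router"],
}


def group_of(process: str) -> str:
    """Return the group a Druid process type falls into under this grouping."""
    name = process.strip().lower()
    for group, members in DRUID_PROCESS_GROUPS.items():
        if name in members:
            return group
    raise KeyError(f"unknown process type: {process!r}")
```

For example, `group_of("Broker")` returns `"query serving"`, which is a quick way to decide which set of tuning knobs (query or ingestion) a given node belongs to.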

I would start with the Helm chart to get a basic idea of the tuning options.

hobofan · on July 28, 2022

> Just make sure your ZK is happy and all will be well.

That sounds like the opposite of easy to set up (and maintain).

dominotw · on July 28, 2022

We had Druid in production and this was our main weak link. It's really hard to find people who know how to operate ZK well in production.

jhgg · on July 28, 2022

Strange - we use Druid at scale at work, and it doesn't really require much maintenance. In fact, for the most part aside from some incidents caused by our own misconfiguration, it buzzes away just fine. Runs on Kubernetes. Most annoying thing was tuning segment compaction.

It's actually quite nice: since everything is stored on GCS/S3, it is mostly self-healing, and we can treat the historicals as cattle, not pets.

We also run ClickHouse, and unfortunately the above is not true of it, at least in our setup.

ing33k · on July 28, 2022

ClickHouse is great. Been using it without any major issues.

We started experiencing some issues with data duplication with ClickHouse when we moved our table to a Sharded+Replicated setup.

OPTIMIZE with DEDUPLICATE helped us a lot, and we can run it on a single partition instead of the full table.

https://clickhouse.com/docs/en/sql-reference/statements/opti...
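As a rough sketch of what a per-partition deduplication statement looks like, the helper below builds the SQL string following the documented `OPTIMIZE TABLE ... [PARTITION ID '...'] [FINAL] [DEDUPLICATE]` clause order. The function name, the table name `events`, and the partition ID `202207` are assumptions for illustration; whether `FINAL` is needed depends on your setup, so it is left as an opt-in flag.

```python
import re
from typing import Optional

# Conservative patterns so nothing unexpected is spliced into the statement.
_TABLE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")
_PARTITION_ID = re.compile(r"^[A-Za-z0-9_\-]+$")


def optimize_dedup_sql(table: str,
                       partition_id: Optional[str] = None,
                       final: bool = False) -> str:
    """Build an OPTIMIZE ... DEDUPLICATE statement (hypothetical helper)."""
    if not _TABLE.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    parts = ["OPTIMIZE TABLE", table]
    if partition_id is not None:
        if not _PARTITION_ID.match(partition_id):
            raise ValueError(f"unexpected partition id: {partition_id!r}")
        # Restrict the merge to one partition instead of the whole table.
        parts.append(f"PARTITION ID '{partition_id}'")
    if final:
        parts.append("FINAL")
    parts.append("DEDUPLICATE")
    return " ".join(parts)
```

Calling `optimize_dedup_sql("events", partition_id="202207")` yields `OPTIMIZE TABLE events PARTITION ID '202207' DEDUPLICATE`, which you would then send through whatever ClickHouse client you already use.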

ClickHouse is a very powerful system, but it's not a set-up-and-forget kind of system.

magundu · on July 28, 2022

I agree. ClickHouse is the best if you are looking for an analytics database.

fuziontech · on July 28, 2022

This is one of the reasons why we chose ClickHouse as our backend at PostHog. We wanted something that was relatively simple to operate for our users who deploy PostHog on-prem. We've been super happy with it so far. Still not turn key in many ways, but it's been pretty great.

david38 · on July 28, 2022

I run a ClickHouse cluster. I’ve heard Druid is more difficult, but ClickHouse isn’t exactly a piece of cake. It’s great for a while, but sometimes you can get bitten by weird states. Still, compared to everything I’ve used, for the use case where it excels, it really excels.

kpfly2022 · on July 28, 2022

ClickHouse is difficult to operate and doesn't support joins well. I use https://github.com/apache/doris to replace our ClickHouse and Druid workloads.

drewbug01 · on July 28, 2022

Clickhouse benefits from the ability to get started as a single binary (and even includes a clickhouse-local binary to do ad-hoc analysis on CSVs on your laptop). There’s only one node type, as well. It’s simpler and easier in that sense.

Running it at scale is different. It includes everything you need, and it’s not horrible of course - but there are certainly a lot of sharp edges to be mindful of.

metadat · on July 28, 2022

Druid is horizontally scalable by itself if you have access to something like S3 or any compatible object storage. Druid's core design is remarkable, designed from the ground up to optimally leverage and work harmoniously with cloud tech. Once it is set up appropriately for your use case, it's trivial to stamp out over and over with Terraform.

mohitsingh · on July 28, 2022

Several years back, I was running it on a single server along with Kafka (I was asked to keep everything related to analytics on a single huge server), and it started with a fight over ZooKeeper between the two. While it worked quite well thereafter, keeping it up was a battle.

Maybe the situation is better with clusters and with Kafka moving away from ZooKeeper.

doliveira · on July 28, 2022

I feel like it shares a lot of the complexity of the rest of the Hadoop-adjacent products. If your company already manages standard Hadoop infra, it's probably not too different; otherwise it seems like quite a bumpy road.

cultureulterior · on July 28, 2022

Agreed. We use Druid and it's quite successful, but the Druid hosting situation up until this point has been pretty lousy.

Plough_Jogger · on July 21, 2022

This review omits techniques from reinforcement learning (especially bandits) that have been used successfully in industry for years now.

bertil · on July 21, 2022

I think that the main issue is less the technique (although… yes, please use RL if you can) and more the lack of data. Browsing gives very little insight: dwell-time is a poor proxy for interest, and mixes horrid ideas that are so bad they are worth sharing with friends and confusing photos where you need to squint to figure out if it’s what you are looking for.

Both e-commerce and social media are really not good at gathering express feedback for what people want and valuing that expressly. Please, let me tell you that I did spend time looking at this thread about the latest reality TV scandal but I don’t want to hear about it ever again! Please, let me tag options as “maybe” or let me tell you what you’d need to change for me to buy that shirt. Public, performative Likes and Favourite lists that are instantly reactivation spam-fodder… Come on, you know better.

I used to work for a big e-commerce site (the leading site for 18-25 y.o. females). We had millions of references (really) and it was a problem. The search team had layers upon layers of ranking algos, incredible papers at conferences… but still, low impact on conversion. It was more than anything else that we could do, but nowhere near as transformative as it could be. Instead, I suggested copying the Tinder interaction in a companion app:

* left, never see that item again;

* right, add it to a long list of stuff you might want to revisit. We probably would have to separate that from the Favourite list to avoid clutter, but maybe not, to make that selection worthwhile.

The learning you could get from that dataset, even with a basic RL algo to queue suggestions… People thought it was “too much” which I’m still bitter about.
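To make "a basic RL algo to queue suggestions" concrete, here is a minimal sketch using Beta-Bernoulli Thompson sampling, which is one common bandit method and not necessarily what was proposed at the time. It assumes the arms are item categories rather than individual items, treats a right swipe as reward 1 and a left swipe as reward 0, and all names (`SwipeBandit`, `simulate`, the category labels) are hypothetical.

```python
import random


class SwipeBandit:
    """Thompson sampling over arms (here: hypothetical item categories)."""

    def __init__(self, arms, seed=None):
        self._rng = random.Random(seed)
        # Per arm: [right swipes + 1, left swipes + 1], i.e. a Beta(1, 1) prior.
        self.counts = {arm: [1, 1] for arm in arms}

    def next_arm(self):
        """Sample a plausible right-swipe rate per arm and show the best one."""
        samples = {
            arm: self._rng.betavariate(rights, lefts)
            for arm, (rights, lefts) in self.counts.items()
        }
        return max(samples, key=samples.get)

    def record(self, arm, swiped_right):
        """Update the arm's counts with one swipe."""
        self.counts[arm][0 if swiped_right else 1] += 1


def simulate(true_rates, rounds, seed=0):
    """Run the bandit against made-up right-swipe rates; return shows per arm."""
    bandit = SwipeBandit(true_rates, seed=seed)
    user = random.Random(seed + 1)  # stands in for a real user's swipes
    shown = {arm: 0 for arm in true_rates}
    for _ in range(rounds):
        arm = bandit.next_arm()
        shown[arm] += 1
        bandit.record(arm, user.random() < true_rates[arm])
    return shown
```

Running `simulate({"shoes": 0.1, "tops": 0.3, "dresses": 0.8}, rounds=3000)` should concentrate most suggestions on the category with the highest right-swipe rate while still occasionally exploring the others. Handling "left, never see that item again" would need an item-level exclusion list on top of this.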

jeffreyrogers · on July 21, 2022

How are bandits used in consumer choice problems? Bandits solve almost the inverse problem: which choice to offer/take when it's uncertain which is best, but the problem under consideration in the blog post is about predicting which choice a consumer will pick, a standard marketing problem.

Plough_Jogger · on July 20, 2022

Is this referring to the first version of the model, or DALL-E 2?

Plough_Jogger · on May 26, 2022

No, you are not your own relative.

> a thing having a relation to or connection with or necessary dependence on *another* thing

actually_a_dog · on May 26, 2022

Fine. A "family member." I'm a member of my family.

But, also: https://dictionary.cambridge.org/us/dictionary/english/relat...

Integrity: still intact.

I hope you enjoyed this little exercise.

Plough_Jogger · on Aug 16, 2021

Previous [flagged] discussion on this: https://news.ycombinator.com/item?id=28203148