mkarlsch's comments

I can only second this. The placeholder in the search bar is “ex: tacos, breakfast, …” so I searched for tacos - zero results. Having a bit more content would likely leave a better first impression.


So, we were trying to showcase the recipe creation software, but I realize now that we should upload some of the couple hundred recipes we have. I have mainly been sharing this with engineering friends to get input on the software, not the recipes.

But I'm going to begin uploading them all over the next few days. I'm also reaching out to people who have more recipes so they can upload some of their own.


Why would I want to take the time to upload to your service a recipe I already know?


remerge | (Senior) Software Engineer | Full-time | Berlin, Germany | Remote (within +/- 3h CET) or On-Site | https://www.remerge.io

We help app businesses to grow.

Our stack: Go, Kafka, Druid, Bigquery, ElasticSearch, Python+Tensorflow, Ruby/Rails, Typescript+React, lots of GCP services, Ansible + Terraform.

You would help us design, build, improve and operate services that process several trillion events - coming with an "interesting technical challenges included" guarantee.

At remerge we put culture and people first:

- we are fully transparent (from salaries to company financials)

- clear role design and a growth framework - so you know where you stand, what's next and how to get there

- our remuneration includes share options, an education budget and a share of company profits as an end-of-year bonus

- unlimited vacation days (with a minimum you have to take) + a fully covered "one month of work from any office" program

- work from home/remote support (setup + internet reimbursement)

Apply at:

- https://www.remerge.io/careers/5106614003 (backend)

- https://www.remerge.io/careers/4953590003 (frontend)

- https://www.remerge.io/careers/4956657003 (machine learning)

- https://www.remerge.io/careers/4964502003 (VP Engineering)


We used Apache Drill a few years back and it could do exactly that - run locally and provide a SQL interface to query parquet files. Still great to see more and better options!


Drill is a fair choice (it supports tons of formats), but for me it was a bit heavy: it required Java, and it returned results in a form that wasn't easily manipulated (the result is a table drawn in the console with text characters [1]). If you had a really long result set, you couldn't do buffered scrolling due to text-UI limitations; everything gets printed out, and you either have to paginate manually with LIMIT clauses or keep a really long scrollback buffer. This kind of breaks the continuity of many data analysis workflows.

With DuckDB, the return result is a Python data structure (list of tuples), which is amenable to further manipulation without exporting to another format.
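(As an aside, the same run-locally, SQL-over-parquet workflow also works from Go; here is a minimal sketch, assuming the community github.com/marcboeker/go-duckdb driver and a local events.parquet file - both assumptions on my part, not something from the parent comment.)

    package main

    import (
        "database/sql"
        "fmt"
        "log"

        _ "github.com/marcboeker/go-duckdb" // assumed community driver; registers "duckdb"
    )

    func main() {
        // An empty DSN opens an in-memory database - no server to run.
        db, err := sql.Open("duckdb", "")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // read_parquet is DuckDB's built-in parquet reader.
        rows, err := db.Query(`SELECT count(*) FROM read_parquet('events.parquet')`)
        if err != nil {
            log.Fatal(err)
        }
        defer rows.Close()

        for rows.Next() {
            var n int64
            if err := rows.Scan(&n); err != nil {
                log.Fatal(err)
            }
            fmt.Println(n)
        }
    }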

Many GUI database tools return results in a tabular form that can be copy-pasted into other GUI apps (or copied out as CSV).

[1] https://drill.apache.org/docs/drill-in-10-minutes/#querying-...


You didn't connect to Drill using JDBC? With that you can use any open-source GUI, like DBeaver.


Oh I didn't think to do that. That's something I could definitely explore, particularly for apps like Tableau. Thanks!


That looks great, especially some of the unique optimisations. I gave it a non-scientific test run with a set of 1k different 2-4KB JSON-encoded messages that we see in our day-to-day traffic, using the standard Go benchmark tooling. Compared to easyjson (generated parser), goccy/go-json is unfortunately 20-25% slower and allocates four times the number of bytes.
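For context, here is a minimal sketch of the kind of comparison I ran. The struct and payload are illustrative (not our real messages), and the easyjson side assumes you have run the easyjson code generator first:

    package bench

    import (
        "testing"

        "github.com/goccy/go-json"
        "github.com/mailru/easyjson"
    )

    // Illustrative payload; our real messages are 2-4KB.
    var payload = []byte(`{"id":1,"name":"tacos","tags":["mexican","dinner"]}`)

    //easyjson:json
    type Message struct {
        ID   int      `json:"id"`
        Name string   `json:"name"`
        Tags []string `json:"tags"`
    }

    func BenchmarkGoccy(b *testing.B) {
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
            var m Message
            if err := json.Unmarshal(payload, &m); err != nil {
                b.Fatal(err)
            }
        }
    }

    // Compiles only after running `easyjson .`, which generates the
    // easyjson.Unmarshaler implementation for Message.
    func BenchmarkEasyjson(b *testing.B) {
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
            var m Message
            if err := easyjson.Unmarshal(payload, &m); err != nil {
                b.Fatal(err)
            }
        }
    }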

Any chance that you will implement code generation like easyjson?


Are you unmarshaling to interface{}? Have you tried commenting out the easyjson json.Un/Marshaler implementations? By default they wrap the easyjson code and add reflection overhead.

I had the same results when testing with a project that is already using easyjson, but after commenting out easyjson's json.Un/Marshaler impls I am seeing much improved performance using goccy/go-json.
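To make that concrete: the generated *_easyjson.go file contains stdlib-compatible wrappers roughly like the excerpt below (schematic - the real function names include a generated hash). goccy/go-json, like encoding/json, detects the UnmarshalJSON method and calls it, so unless you comment these wrappers out you end up measuring easyjson plus dispatch overhead instead of goccy's own decoder.

    // Schematic excerpt of easyjson's generated output, not a complete file.
    import "github.com/mailru/easyjson/jlexer"

    // UnmarshalJSON supports the encoding/json.Unmarshaler interface.
    // Comment this wrapper out to let goccy/go-json use its own fast path.
    func (v *Message) UnmarshalJSON(data []byte) error {
        r := jlexer.Lexer{Data: data}
        easyjsonDecodeMessage(&r, v) // generated decoder; real name differs
        return r.Error()
    }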


Indeed - thanks for the hint! While allocations are still 2.5x, with goccy/go-json no longer accidentally calling into easyjson it is now ~10% faster than easyjson with generated code. That is an impressive result.


Thank you for your report. It's an interesting result. If you have code that reproduces it, I will try to optimize the library to be faster than easyjson.


Agreed! Both reports are pretty good and reflect the status quo in the industry. There is over-collection and over-sharing of data without proper consent, and that has to stop. For Europe, forcing the ad-tech industry to adhere to the GDPR is the correct next step, as self-regulation did not work.

Having said that, the report paints a pretty dark and one-sided picture. Let's see how far the authorities will follow its argumentation and conclusions.

(full disclosure: I work in that industry.)


Interesting project! However, it looks like the last commit is from September 2017, so either it is just very stable or no longer maintained?


We wrote our bidder (in-app advertising) in Go. It is globally distributed (close to the exchanges) and handles 1.5-2M requests/s (OpenRTB, ~50-90k/s per instance) with a p99 of 10-20ms (excluding network latency). Really happy with Go, especially the GC improvements the Go team has made over the last few releases. For a previous, similar project we used Ruby, which was quite a bit slower.


Similar problem space (we don't bid but capture RTB data for analysis).

Similar throughput, our bottleneck at this point is moving data around.

We've abandoned channels for most of this. The next major improvement would be to rebuild the HTTP stack, and that's just not worth it.


Someone shared this with me: https://blog.golang.org/share-memory-by-communicating - "Share memory by communicating".
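For anyone skimming, the core idea from that post in a minimal, self-contained sketch (the squaring workload is made up): instead of guarding shared state with a mutex, one goroutine owns the state and everyone else talks to it over a channel.

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        results := make(chan int)
        done := make(chan int)

        // One goroutine owns the running total - no mutex needed.
        go func() {
            total := 0
            for r := range results {
                total += r
            }
            done <- total
        }()

        var wg sync.WaitGroup
        for i := 1; i <= 10; i++ {
            wg.Add(1)
            go func(n int) {
                defer wg.Done()
                results <- n * n // communicate instead of sharing memory
            }(i)
        }
        wg.Wait()
        close(results)

        fmt.Println(<-done) // 385
    }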

They also pointed me to nats.io, a messaging system that handles 10M messages per second on a $50 server.

See the comment at: https://www.indiehackers.com/forum/how-we-handle-25m-api-cal...



Should fasthttp be used in production? It seems to get a lot of flak for not fully implementing HTTP.

https://www.reddit.com/r/golang/comments/5w3ang/switching_fr...


Like every engineering decision, it is a risk/reward spectrum. But unless you have profiled and found that net/http is your bottleneck, no, you should almost certainly not use fasthttp.
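For what it's worth, checking is cheap with the standard library: import net/http/pprof for its side effects and profile under load. A minimal sketch:

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
    )

    func main() {
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("ok"))
        })
        // While serving load, capture a 30s CPU profile with:
        //   go tool pprof http://localhost:8080/debug/pprof/profile?seconds=30
        // and see how much time is actually spent inside net/http.
        log.Fatal(http.ListenAndServe(":8080", nil))
    }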


We use it in production, though fair warning: we have a very limited set of endpoints and consumers, which allows us to test it aggressively.

What I really meant is that we'd need to take things down to the bare network stack. Lots of our memory use/bottlenecks are fairly deep in the standard library.


Same here - we don't use channels in the hot code path, and we replaced the Go HTTP parser.


How large is your payload? Asking out of curiosity.


Between 2 and 40KB; the average is likely closer to 2-4KB (gzipped JSON or Protobuf).


