
I've got some CS1.6 / CS:GO / CS2 photos from Australian LANs https://www.flickr.com/photos/dfragtv/albums/


I'm just guessing, but...

"developer gets a great idea - let's push an update to the API as a GET request so we can cache it on the CDN... forgetting that the JWT is potentially returned in the response. Now whoever makes the call first gets their JWT cached and served to everyone else who makes the same API call."

Ta-da, Klarna.
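A minimal sketch of the failure mode described above (all names hypothetical): a cache that keys GET responses only on the URL, ignoring who is asking, will happily serve the first caller's token to everyone else.

```python
cache = {}

def origin(user):
    # stand-in for the real backend, which mints a per-user JWT
    return {"jwt": f"token-for-{user}"}

def cdn_fetch(method, url, user):
    """Cache GETs by (method, url) only -- the bug: user is not in the key."""
    key = (method, url)
    if method == "GET" and key in cache:
        return cache[key]          # cache hit: same body for everyone
    response = origin(user)
    if method == "GET":
        cache[key] = response      # the first caller's token is now shared
    return response

first = cdn_fetch("GET", "/api/session", "alice")
second = cdn_fetch("GET", "/api/session", "bob")
# bob receives alice's token back from the cache
```

The usual defense is to never let such responses into a shared cache at all, e.g. by sending `Cache-Control: private, no-store` on any token-bearing endpoint.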


I worked with a team that owned a service that resizes images. An engineer was assigned a task to add support for auto rotating images. His solution involved saving the image to a file and then using a library to handle the rotation. He used a hardcoded value for the file name. In a local environment where requests are sparse this looked fine to him and other engineers on the team missed it in code reviews. It wasn't until it went out to prod that he realized the error in this. Users started seeing other users' images because the file's content was constantly being overwritten.

When you test features like this or caching a response with a JWT it can be very easy to default to the happy path or ignore the impact of a large volume of concurrent users.
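The image-resizing race above has a standard fix, sketched here in Python with a made-up helper name: give each request its own scratch file instead of one hard-coded path shared by every concurrent request.

```python
import os
import tempfile

def save_for_rotation(image_bytes: bytes) -> str:
    """Write the upload to a unique temp file and return its path."""
    # NamedTemporaryFile picks a fresh name on every call, so concurrent
    # requests can no longer overwrite each other's images.
    with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as f:
        f.write(image_bytes)
        return f.name

a_path = save_for_rotation(b"user-a-image")
b_path = save_for_rotation(b"user-b-image")
with open(a_path, "rb") as f:
    a_data = f.read()
with open(b_path, "rb") as f:
    b_data = f.read()
os.unlink(a_path)
os.unlink(b_path)
```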


"An engineer was assigned"

Nope. That definitely wasn't an engineer.


Mistakes happen. I've never met an engineer who has never made a mistake. However, I have met brilliant engineers who have written incredibly complex software and have also managed to make some silly mistakes along the way.


No true Scottish engineer would have made that error!


:-)


Real software engineers don't make mistakes?


Not mistakes like that.


Elitism alert!


Years ago I added varnish in front of a website to cache image requests, not realizing that if the response included 'set-cookie' that was also cached.

We immediately started getting reports of random products appearing in our customers' shopping carts, as people's sessions got merged with random strangers.


Just feel the urge to point out that Varnish, by default, specifically does not cache responses that include a Set-Cookie header. :)


I expect something exactly like this happened. I had a similar bug a long time ago: Apache was somehow incorrectly caching the request, and the session cookie from it ended up in the cache. But it happened only about 1/10,000th of the time, so it was impossible to figure out the root cause.

However, one common source of this kind of bug is to "cache any URL ending in .pdf as a static file" - and then you are in fact serving logged-in users' PDFs, like customer invoices, complete with the session cookie.

I think CloudFlare used to ship a default rule treating .pdf as static content. The responses were cached when you ticked their "cache the good stuff" checkbox.


I doubt that Klarna, a bank, has OSI layer 7 proxies in the cloud, with TLS termination in their CDN solution, on AWS. I would assume this traffic is outside of that. But then again, I know they wasted 25M+ euros on a garbage NodeJS platform. They also created their own cloud once. Yes, it is in the trash bin.


I'd actually bet against you on that one. They are still stuck with one foot in the startup mindset.


Surprisingly many IT companies tried to create their own clouds, or at least their own Kubernetes.


Surprisingly many have saved boatloads of time automating processes pertaining to the tasks at hand. So, yeah, sounds reasonable. :)


They didn’t “create” their own cloud - they wanted to host their own hardware using an api layer to provision resources. That stuff was not built in-house.

Manhandled in-house though...


Sebastian used the word cloud when I met him.


Yeah, I actually took part in setting it up with them. It was CloudStack. API layer in front of hypervisors.

Such is cloud software... :) Cloud, besides APIs - i.e. managing hardware at scale - was not really what they did.

They did roll 1000s of VMs per week through it in CI/CD flows, though.

As such it did what it was supposed to do - Docker/containers were not a thing at that point in time, and I remember thinking it was pretty awesome.

Too many nifty engineers with long fingers for their own good, though. You need to be strict with automation if you want to keep something like that running reliably over time.


There was a Klarna cloud yes. At the time it was unclear if finance/banks could utilise public cloud services (regulatory requirements), so it made sense in that way, but creating your own cloud is something few orgs are capable of.


It is not that uncommon for finance tech companies in Sweden to have some in-house cloud. They often already have the knowledge to run servers with virtualization, logging, backups, redundancy and so on. Adding a service layer on top of that, by using Kubernetes for instance, is doable.


Klarna Cloud was a deployment of CloudStack or OpenStack (my memory fails me now) for internal usage, back when there were still a lot of discussions around cloud lock-in; it was not an in-house-built cloud platform.


I did not think they actually wrote the code. But I think the ambition was higher. Pretty much every CTO in this country has hubris and thinks their services will be sold to third parties.


What makes you doubt that?


I can 100% see this being the cause if it is confirmed as the root cause.

But... APIs really shouldn't be cached? At least not at the CDN level. The risk of serving up stale dashboard data alone makes users go ????... and the problem here - we definitely don't want that, it's crazy.


100% agree with this. A database is, in some form, a cache of its own. If you have to add an additional cache on top, it's an additional source of complexity and risk. If you are building a financial platform, you should DESIGN around this.


Depends on the scope of the API of course, but it's a good rule of thumb for any API with private auth


Of course you can cache it, but you're assuming it never should be. There's nothing wrong with caching API calls on the CDN forever, as long as you purge the cache when you need to. Event-based purging.
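A toy sketch of what event-based purging could look like (class and method names are made up): a read-through cache whose purge hook is wired into the write path, so entries live forever until the data actually changes.

```python
class EventPurgedCache:
    """Read-through cache invalidated by explicit events, not TTLs."""

    def __init__(self, loader):
        self._loader = loader   # fetches fresh data on a cache miss
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._loader(key)
        return self._cache[key]

    def on_update(self, key):
        """Call this from the write path -- the 'event' that purges."""
        self._cache.pop(key, None)

db = {"balance": 100}
cache = EventPurgedCache(lambda k: db[k])
cache.get("balance")          # loaded and cached
db["balance"] = 50            # a write happens...
cache.on_update("balance")    # ...and fires the purge event
fresh = cache.get("balance")  # next read reloads the new value
```

The hard part, as the replies note, is guaranteeing that every write path really does fire the event.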


"There are only two hard things in Computer Science: cache invalidation and naming things."

Cache invalidation is always a very tricky affair. It can work for a while but as complexity grows it gets very hard to maintain and debug. It's very much a "here be dragons" situation and you have to go into it with your guard up.

I was at a small startup that had a quick and dirty contractor built API. It worked, but for our largest customers, 99th percentile latency started going over the API gateway timeout. The quick and dirty hack on top of it was aggressive caching with too-clever invalidation logic. It worked until new features were added and then it started failing dramatically and unpredictably. The bugs were an absolute nightmare. We ended up spending almost a year cleaning up the data model, sharding things by customer, and fixing a bunch of N+1 queries, all so that we could get rid of our API cache layer and kill the bugs for good.


This reminds me -

A couple of years back, I was making https://lifeboxhq.com which involved users uploading quite a bit of content. I was happily testing security with some URL resource enumeration and for some reason, I could non-deterministically access user uploads via URL, even on accounts I didn't own. I spent several days looking at my Flask code, JavaScript, etc. to debug...

I knew it wasn't my code, but I was getting more and more frustrated, then I remembered I set up Cloudflare....

Remember to exclude certain routes from Cloudflare if you want to prevent arbitrary user content from being cached and served without authentication.
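In that spirit, a small sketch (route prefixes invented for illustration) of choosing cache headers per route, so an edge cache like Cloudflare never stores private user content:

```python
# Hypothetical route prefixes that serve per-user or authenticated content.
PRIVATE_PREFIXES = ("/uploads/", "/api/")

def cache_headers(path: str) -> dict:
    """Pick a Cache-Control value based on the request path."""
    if path.startswith(PRIVATE_PREFIXES):
        # "private, no-store" tells shared caches (a CDN included)
        # never to store this response at all
        return {"Cache-Control": "private, no-store"}
    # truly static assets are safe to cache publicly
    return {"Cache-Control": "public, max-age=3600"}
```

The same effect can be achieved with CDN-side page rules, but emitting the header from the app means the policy travels with the code.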


I introduced a similar bug into one of my products in the past (Be honest, who hasn't?). But I'm surprised here because Klarna is a quite mature product and something like this shouldn't really happen at that stage.


Oh, it can definitely happen even in mature products. One I worked on had pretty much the same issue as Klarna (people seeing others' info) when someone updated a web client library we were using to a new version that subtly changed how it handled concurrency.


I remember something similar when there was a load balancing issue with some website where it would randomly assign a user someone else's account.


To get around this, one could include the request IP address in the JWT and require a refresh token to be sent when the user's IP switches.


This is not a safe method for protecting against this type of cache vulnerability. IP addresses are regularly shared by multiple users, especially when behind NAT (even mobile ISPs are doing carrier grade NAT these days).


So there should be no fail-safe, since it can't be guaranteed to work in every scenario?


In this context, this would just prevent everybody from logging in. The JWT would correctly get rejected but people would still be getting the wrong token from the CDN over and over.


Which would you rather? The situation you just described or users accidentally spoofing each other's session?


very real


I wholeheartedly disagree with this entire statement.

If you want to grow as a programmer (in my case), then side projects are going to get you the skills to advance your career and get you a better job over "buddying up" with recruiters. Sure, you can be a 9-5 kinda person and then "switch off" when you get home, but you will be the one stuck wondering where it all went wrong, writing a blog article complaining about it.

insert principal skinner out-of-touch meme here


easily a 4 - Co-Founder felt embarrassed trying to show it to clients but it worked and people could give us money. knocked it out in like 2 months. - now we're 2 years in, still here.


I voted for and still use Sass, but inside Vue/Nuxt - I have no desire to write CSS inside JS. Why? Because I don't have time to relearn the wheel, and I chose Vue over everything else because I get to write code faster - not just CSS but HTML and JavaScript at the same time. I use the tools that work well for me so I can move prototypes to production faster.


The next step is to work with the tools that let your whole team work fast, onboard new members fast, etc...


In a similar boat; I switch between multiple AWS, Gmail, Digital Ocean, GCP, etc. accounts and it just didn't handle a 'full-on power user'. I dev everything in Chrome and still just use pinned tabs.

The autolaunch was also very annoying. Having user profiles or personas to handle all my logged-in states for each user would be awesome. Let me know when that's in there and I will give it another try.

I host a few services for family/friends/clients and am also working on a couple of projects at a time so I am forever switching accounts on a weekly basis.


Andddd it's down again


the charts have stopped updating as well. edit: i stand corrected, they are literally off the charts as you say haha


unable to commit or use their web interface. status page shows spikes across the board.

