>In January 2012, Pinterest hit 11.7 million monthly unique users with only 6 engineers.
The thing to strive for 12 years later should be how to scale to 100M+ users with 6 engineers. NAND and CPU cores have seen an order-of-magnitude improvement in price/performance. 1-2TB of DRAM is obtainable. MySQL and Postgres have had a decade of improvements, along with simpler, battle-tested solutions for scaling. Python and Ruby, Rails and Django, have all absorbed the lessons learned from the real world.
Every time I read one of these, no one tells me what resource was the bottleneck in vertical scaling. Does it feel like Silicon Valley is addicted to horizontal scaling?
Typically the price of not having horizontal scaling is felt more by the engineers than the users, at first:
- Data migrations, schema changes, backfills, backups & restores, etc., take so long that they can either cause or risk outages or just waste a ton of engineer time waiting around for operations to complete. If you have serious service level objectives regarding time to restore from backup, that alone could be a forcing function for horizontal sharding (doing a point-in-time backup of a 40TB database while dropping some unwanted bad DELETE transaction or something like that from the transaction log is going to be very slow and cause a long outage).
- The lack of fault isolation means that any rogue user or process making expensive queries impacts performance and availability for all users, vs being able to limit unavailability to a single shard
- When people don't have horizontal scalability, I've seen them normalize things like not using transactions and not using consistent reads even when both would substantially improve developer and end-user experience, with the explanation being a need to protect the primary/write database. It's kind of like being in an abusive relationship: you internalize the fear of overloading the primary/write server and start to think it's normal not to be able to consistently read back the data you just wrote at scale or not to be able to use transactions that span multiple tables or seconds as appropriate.
To your first point: I find that in these discussions the “just buy a bigger server” crowd massively underestimates the operational problems with giant DB servers. They are no fun to babysit, and change gets really hard and tedious when you're trying not to accidentally bring the whole thing down. It becomes a massive drain on the velocity and agility of the business.
Giant sharded DB clusters aren't that much more fun or less precarious... Ever run Cassandra or Clickhouse at scale?
IME vertically scaled replicas/hot standbys are a lot more stable to operate if your requirements allow you to get away with it. OTOH you had better already be prepared for if/when you hit scaling limits.
Cost (due to higher effective utilization) and single points of failure?
Vertical scaling is more expensive without the benefit of failover. With modern clouds, horizontal scaling is cheaper with commodity hardware and gives the added benefit of failover redundancy.
If you are already in one of the big 3 cloud providers, vertical scaling is more expensive, especially once you add failover. If you rent bare metal it's often the other way around.
So maybe one of the core reasons for Silicon Valley's love for horizontal scaling is due to plentiful AWS credits.
> In the startup world, there's infinite compute, but finite money. It’s cost savings.
If you're using cloud stuff like EC2 on AWS for cost savings, you're in for a bad time. It makes sense for stuff that you need to be able to dynamically scale on a whim, but once you know what kind of setup you need, it's almost always cheaper to go dedicated.
Bare metal, doubling the amount of RAM and using a more powerful CPU is cheaper than adding a whole server that needs space, network, its own motherboard, case, power supplies, etc. This makes vertical scaling much more cost-effective (up to a point, which is currently at about 2TB of RAM).
AWS buys big servers for its data centers too. But their advantage is that they only have big servers, and then rent them out in pieces. They don't care whether they rent you two 48-core 96GB instances or one 96-core 192GB instance. Both options run as VMs on much larger servers and take the same resources, so both are priced the same (a machine of the same category with twice the resources costs exactly twice as much). Thus there's a much smaller benefit to vertical scaling.
If you compare actual prices on large-ish instance types between AWS and regular dedicated servers (rented in a DC, not doing your own thing) then AWS is very expensive. And the hopes of paying less due to better utilization somehow never pan out. AWS has lots of advantages, but price isn't one of them; and their price structure will influence the architecture you choose.
AWS charges the same money per core and gig of ram, whether you get a big instance or 5 small instances. I guess their top end isn't as top end as hardware you can get in your DC, though.
The 21st century love for horizontal scaling is for not wanting to be locked in with IBM, because in the limit vertical scaling means buying an IBM mainframe.
For us blast radius is a reason to intentionally limit the size of our servers. We could have smaller fleets with bigger servers, but then if anything went wrong it would impact a larger number of customers.
That being said, I'm partial to having fleets as small as possible to simplify operations. While trying different instance sizes, I found our bottleneck ended up being the network. The largest instance types saturate the network first, and hence we can't use those servers as efficiently.
The interesting takeaway for me when I see "one engineer, two engineers" is not only "I could have built that stack" but "I did actually build that stack back then"
Like millions of other devs, I have made a (decent) living out of this career, but you do wonder what might have been.
I think there is a new stack - a new Dev Manual - of what to do and build to keep it simple today (Hadoop is involved :-)
These days hardware and tech have come such a long way that building a service that supports millions of customers isn't hard at all. A colleague and I spent the last 3 years at work building a system that supports tens of millions of QPS, and we did 90% of the work ourselves.
Of course this doesn't detract from the merit of the post. The devil is in the details, and there are many complexities other than raw scale.
Python is slow and Django is a beast. It's very expensive to scale to non-trivial usage. Great for productivity though. But you'll pay for that in your cloud bill.
Depends what your org is optimizing for. Are they optimizing for cost or speed of development? It’s a question I have to deal with on a daily basis when I hear of yet another “fast” tech to add to our stack.
> Are they optimizing for cost or speed of development?
Historically I would agree. There are options now that are both easy to develop in and are blazingly fast (C#, Go, Rust, Kotlin to name a few).
The elephant in the room is that both Ruby and Python fell behind in relative ease of use and computational performance with respect to other ecosystems, and instead of addressing these pressing matters there is a lot of mental gymnastics and cognitive dissonance at play, such as presenting false dichotomies.
I never experienced this mythical speed of development of Django or Rails, and I've been using both on/off for the last 15 years. Back then I was using stuff that was quite fast for development already.
But as soon as you have something moderately sized, the slowness of languages like Python or Ruby becomes a problem for running tests, migrations, debugging, increased server costs, downtime, and overhead in things like queues. I worked on some quite large projects though, so maybe for small stuff they work, but I absolutely hate them for writing big apps.
Like you said, the speed of development is IME fantastic with languages like Golang, C#, or Node.js, and they don't suffer from the performance issues. Lots of companies local to me are moving to Kotlin as well, and my friends have only good things to say.
There was a post on HN a couple of months back about Pinterest saving $2m/month on infra costs by swapping from Python to Elixir/Erlang. It copped flak in the comments because basically “ppphhh, $2m, that’s not even worth saving, how dumb, Python is good enough”.
This is over 10 years out of date. Pinterest today has literally 40x the number of users this article is talking about, and I’d bet the architecture is very different. It might still hold lessons for scaling today, but things sure have changed since 2012.
The best teams I have worked with have been tightly knit for years. I had a tenure as a consultant for a team I knew quite well and those guys, around 7 or 8, had been together for perhaps 10 years.
They knew each other so well, and of course a side effect of that was that they produced absolutely astonishing things.
I was shielded by these guys, and the level of tooling set up for me not to fuck up, along with their reviews, test strategies, etc., was lean, direct and nurtured by every single individual. They had built these skills as a team through years of hardship. It was quite an experience, to be honest.
So yes, I totally agree with you that people are what matters most, along with being in an organisation that backs them, hands down.
The thing you seem to miss is the other common denominator:
Huge amounts of money, and expectations of returns unreasonably far into the future.
That means there is no short- to medium-term pressure to optimise for efficiency or returns, which means one of the fundamental elements of a good engineering environment is missing.
These companies build in a vacuum of limitations in terms of cost, and in a vacuum in terms of goals.
because they were on a massive hiring spree at that time; if you hire tens of thousands of people in a single year it's going to drive down your average tenure.
What I meant to point out is that Stackoverflow quotes the source CNBC wrong (primary source linked to by CNBC is no longer available). It is the median that is that low. The average might be higher or lower than that number.
> My theory is the majority of the bad tech decisions made at these companies is mostly due to the high turnover rate of employees in these company.
I think your theory is less theory and more fact.
Another problem is that engineers will use a new tech project at work as an opportunity to use a new, hot technology that you can then put on the resume for your next job. That's when you see over-complication.
How do I know this? Bc I have done it - I am part of the problem. I do not want to, but the job market is competitive and I know my employer will never counter an offer I get from a new company.
A lot of this would be solved if companies would try hard to keep their engineers. I would not have made some of the decisions I have made if I knew that I would have to maintain my own garbage. I do try to write good design docs and code comments at least.
Engineers get promoted for creating complex architecture. I explicitly remember my manager asking me in my yearly review for promotion, "what is so hard about that?".
Managers get promoted for hiring more people to support the said complex systems.
Part of it is how many of these companies handle performance. If you're more likely to get a good rating by building a complex system, you might as well do it and take the money.
Depends on the company culture. I have mostly worked in companies where there are company-wide regulations on what technologies to use, with exceptions for when customers require otherwise.
So it is possible to avoid this, regardless of employee turnover.
Instead if you kept the same 6 engineers around for decades you could probably scale to 100m with those same engineers.
I doubt it. Effort does not scale linearly with user scale, in my experience, primarily because features don't. This means that horizontal scalability is insufficient to go from 10M to 100M. And that means that more complex systems are required to support it--I can easily see a team of 6 very experienced engineers with deep domain knowledge of the application becoming overwhelmed as that scale change occurs.
It’s poor leadership whichever way you slice it. Even with turnover you could implement a culture that prevents this garbage, inconsistent architecture, but leaders seem eager to trade away any culture for “scale” and “psychological safety”, basically so that they don’t have to be the ones saying “no” to someone. Hey, it works as long as you don’t ever have to make money... yeah, about that…
> Instead if you kept the same 6 engineers around for decades you could probably scale to 100m with those same engineers.
Scaling to 100m users is more about your users/product than your engineering IMHO.
If your product doesn't appeal to 100m users, you can't scale that high. It's not necessary to be ready for usage that's unrealistically high. And your team won't develop experience scaling if it's not happening.
Mega scaling gets more approachable all the time. eBay had to shard their databases pretty early, because the biggest Sun machine they could buy to run Oracle couldn't fit much. Now you can run a dual-socket Epyc and get 256 cores with terabytes of memory and petabytes of storage in one system. It might not be the best way to run your database, but if your queries aren't super awful, you can do a ton of queries per second with that much RAM.
Circa 2013 I worked at a bootstrapped startup (MaxCDN). We got a trial request from Pinterest who put a small fraction of their traffic on dedicated edge servers and we couldn’t sustain it. They ended up going with EdgeCast that had the capacity and resources.
I think people commenting how overblown the architecture for Pinterest is haven't seen a true Pinterest Person in the real world.
There are people whose lives rotate around having different boards on Pinterest. They collect ideas for recipes, interior design, clothing, vacations and _everything_ on Pinterest. It's not a site where people click three links and come back in a week for their next three clicks.
"The team removed table joins and complex queries from the database layer."
As a database person, I dislike reading this. I understand their need and priority to keep things running. But I would have loved it if they emphasized that doing something like this has so many disadvantages.
Normalization, referential integrity and a powerful query language bring so many benefits. I see young people oftentimes seeing relational vs. NoSQL as two totally valid opinions, like Coke vs. Pepsi. And not as one thing runs the world for decades and is perfect for >90% of use cases, and the other one is for niche cases and fast hyper-scalers.
I don’t like it either, but apparently semi-structured data is the future. It’s proving easier to build systems that tolerate bad data and broken references than it is to get our data normalized and cleaned.
I’m the guy on most teams pontificating about SQL and data-oriented code, but even I’ll concede that large scale live systems probably shouldn’t serve from an SQL database.
Do they have only one huge table in the db that contains all of their entities? From the key structure that seems reasonable: shard id, type, object id.
If that is the case, is a relational db still the right place to keep such data? It seems that if the goal is to store this kind of data in one or a few huge sharded tables, a different tool could be optimizing to do the job better.
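If the IDs really are (shard id, type, object id), the usual trick is to pack all three into a single 64-bit integer and route queries on the shard portion. A minimal sketch in Python (the field widths and function names here are my own assumption, not necessarily Pinterest's actual layout):

```python
# Hypothetical field widths -- not necessarily the real layout.
SHARD_BITS, TYPE_BITS, LOCAL_BITS = 16, 10, 38  # 16 + 10 + 38 = 64

def pack_id(shard_id: int, type_id: int, local_id: int) -> int:
    """Pack (shard, type, local) into one 64-bit ID."""
    assert shard_id < (1 << SHARD_BITS)
    assert type_id < (1 << TYPE_BITS)
    assert local_id < (1 << LOCAL_BITS)
    return (shard_id << (TYPE_BITS + LOCAL_BITS)) | (type_id << LOCAL_BITS) | local_id

def unpack_id(packed: int):
    """Recover (shard, type, local) from a packed ID."""
    local_id = packed & ((1 << LOCAL_BITS) - 1)
    type_id = (packed >> LOCAL_BITS) & ((1 << TYPE_BITS) - 1)
    shard_id = packed >> (TYPE_BITS + LOCAL_BITS)
    return shard_id, type_id, local_id

# Routing a lookup only needs the shard portion:
# shard, obj_type, local = unpack_id(pin_id)
```

With IDs like that, the data can still live in ordinary per-shard relational tables; the ID alone tells you which shard (and which entity type) to hit, so no cross-shard joins are needed.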
In 2024 all major cloud providers have NoSQL (AWS DynamoDB and its equivalents) that will easily scale to beyond this use case, managed purely by good single table design. That is the best practice for this and most basic use cases — not RDBMS in 2024.
Why? key/value data stores have consistent hashing which scales to planet scale built in. No need for sharding or caching or other bs that brings down slack and GitHub every other month.
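For anyone wondering what “consistent hashing built in” means in practice, here's a toy sketch of the general idea (not how DynamoDB actually implements partitioning internally):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy ring: each key maps to the first node clockwise from its hash."""

    def __init__(self, nodes, vnodes=100):
        # vnodes smooths the distribution by placing each node at many ring points.
        self.ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user#123"))
# Adding or removing a node only remaps ~1/N of the keys,
# which is why these stores can rebalance without manual resharding.
```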
do you have a reference to how this might be accomplished? how do you avoid high read / write latencies if you put everything into a single table / key-value pair per user?
I think they're referring to single-table data design as described in https://www.youtube.com/watch?v=HaEPXoXVf2k It's quite clever and scales well, but completely unnecessary for the vast majority of CRUD apps.
IMO it’s also difficult when you’re in the initial phases of a new company/product, because the system is evolving fast and that doesn’t play nice with single-table-design needing upfront design to work well.
I have had much better luck sticking to a plain old relational DB stack for those phases, because it’s fast to iterate, and then consider moving to single-table NOSQL when the system/DDD has “gelled” and traffic is starting to hockey-stick
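To make the single-table idea concrete, here's a minimal sketch of what a boards/pins layout could look like with boto3 (the table name, key conventions, and access patterns are invented for illustration, not anything Pinterest actually does):

```python
import boto3
from boto3.dynamodb.conditions import Key

# Hypothetical table with a generic partition key (PK) and sort key (SK).
table = boto3.resource("dynamodb").Table("pinboard")

# Different entity types share one table; the key prefixes encode the type.
table.put_item(Item={"PK": "USER#42", "SK": "BOARD#7", "title": "Kitchen ideas"})
table.put_item(Item={"PK": "BOARD#7", "SK": "PIN#100", "image_url": "https://example.com/a.jpg"})
table.put_item(Item={"PK": "BOARD#7", "SK": "PIN#101", "image_url": "https://example.com/b.jpg"})

# "All boards for a user" and "all pins on a board" are each a single query
# against one partition -- no joins, and partitions scale out independently.
boards = table.query(
    KeyConditionExpression=Key("PK").eq("USER#42") & Key("SK").begins_with("BOARD#")
)
pins = table.query(
    KeyConditionExpression=Key("PK").eq("BOARD#7") & Key("SK").begins_with("PIN#")
)
```

The catch, as noted above, is that the access patterns have to be known up front; changing them later usually means re-modelling the keys.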
Before I joined in late 2013, they had not known how to run schema changes without downtime. Once we fixed the kernel, the DBs were nearly completely untaxed in terms of performance. They did however need large instances due to disk usage.
Like another comment pointed out they seem to store pins as a JSON blob in a single column anyway. They probably should've used Drupal.
Really, this is Pinterest. They store 'boards' and 'pins'; you may as well have a bit of fun with it. What use cases do they have for the features you list? They probably only used a database for write transactions?
I deeply value database technology and use it daily, but I don't think something simple like this needs to worry too hard about it.
Question I have for HN is if you had a weekend to create a Pinterest competitor for a hackathon that needed to scale to the same number of users, what would you use?
Whatever you know best. The tech stack is irrelevant. Assuming 11m MAU, we can hand-wave to 200k peak active users. That's handleable on small hardware even if you use Express.
Whatever I knew best. Node, Go, .NET, and Postgres in my case. Those can all handle a lot of load if properly engineered and adequately provisioned. Redis or some other caching layer might come into play if the scale really warranted it, but it certainly wouldn’t for a hackathon project.
Completely agree - go + mysql + redis would be my go-to here. I'm of the camp that the minute you think you'll need redis, use redis. It's much easier to manage and deal with when you can turn it off than when you can't.
There are a few distributed databases available now: PostgreSQL -> CockroachDB, MySQL -> TiDB or Vitess/Planetscale, DynamoDB -> ScyllaDB, so it sounds sensible to start with any of these, maybe even the hosted versions, then move to self hosted when necessary.
Nice to see that they're just using EC2 + S3 instead of a whole suite of niche cloud offerings that we see these days. Maybe this is mentioned in the linked presentation (https://www.infoq.com/presentations/Pinterest/) but are they using managed databases? Or just running their own on EC2 instances as well?
They also used to be "pretty jazzed about Elixir" and had it in their stack [1], at least while Steve Cohen was there (btw, Steve is also the author of lexical[2] - one of the newer Elixir LSPs out there)
Cool to read. Looks very familiar. I started an Easter Egg Hunt Facebook game out of my dorm room in 2008 called Hatchlings. We actually had them beat in the engineers to users ratio. When we had 5 million users and more pageviews/mo than the NY Times it was still just me (and I had hired my mom and sister part time to do customer support). It was running on a LAMP stack + memcached (first at DreamHost, then Joyent, and finally AWS).
This article sparked in my mind a bit-twiddling hack I implemented to get a ton more scale out of our single giant MySQL box that survives in their code base to this day, much to every one of their dev’s chagrin (crazy to think that 15 years later the weekend project I launched still has tens of thousands of monthly subscribers).
At Hatchlings, we had several hundred collections of Easter Eggs (usually 7-12 unique eggs in each collection). Each user would “open” and then hunt for a few collections at a time until they “finished” them. New collections would come out every week (to ensure there was new content daily).
As we scaled both in users and number of collections I realized that the “normal” SQL layout you’d use to represent this (a users table, a collections table keyed on user_id, collection_id, and an eggs table keyed on user_id, collection_id, egg_id) was growing rapidly (each user was adding new rows every day) and needed multiple joins in our “hot” routes (when a user searches for an egg we need to know which collections they have access to, and when a user clicks an egg we need to know if it was new for them and if it newly completed that collection) which were being hit tens of thousands of times per second during peak times. Additionally, every egg collected increased the user’s score.
So I implemented an optimization: store the collection unlocked and completed info compactly in the users table in bit fields. These were several 64-bit integers where, e.g., collections_unlocked_1 being equal to 4 (binary 100) meant that the collection with ID 2 was unlocked (collections_unlocked_2 equaling 1 would mean collection 64 was unlocked, since field 1 covered the first 64 collection IDs). Your “active” collections were the bits that were 1 in an unlocked bit field but 0 in the parallel completed bit field.
This did a few things:
* Reduced the rows per user from total_collections * total_eggs to ~collections_in_progress (just in-flight collections) — about a 100x reduction in our total DB size
* Reduced joins to 0
* Since all the “common” egg collections were already completed by all the most active users we didn’t have to check the eggs table for the vast majority of finds because if a collection was marked finished we knew you already had all the eggs in it
* Made everything needed for the hot route compactly storable in a single memcached entry (which allowed us to greatly reduce writes because we would read scores and collections from memcached in the hot route and only write changes to the DB once/minute)
It was a great speed and scaling optimization… But it was really tricky to deal with and reason about… and a binary arithmetic error could completely nuke months of game progress for people. And we also had to remember to add more columns to the users table every year or two or we'd run out of space for collections in our bit fields (we forgot about this a couple of times and it led to downtime, which tended to happen at the most peak times, when we had hyped that we were going to release a ton of new content all at once).
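For anyone curious, a minimal sketch of the bit-field scheme described above (column and helper names are my own, not the actual Hatchlings code):

```python
BITS_PER_FIELD = 64

def field_and_mask(collection_id: int):
    """Map a collection ID to (which bigint column, which bit within it)."""
    return collection_id // BITS_PER_FIELD, 1 << (collection_id % BITS_PER_FIELD)

# user.collections_unlocked / user.collections_completed are assumed to be
# lists of the 64-bit integer column values (collections_unlocked_1, _2, ...).
def unlock(user, collection_id):
    field, mask = field_and_mask(collection_id)
    user.collections_unlocked[field] |= mask

def complete(user, collection_id):
    field, mask = field_and_mask(collection_id)
    user.collections_completed[field] |= mask

def active_collections(user):
    """Collections unlocked but not yet completed (1 in unlocked, 0 in completed)."""
    for field, (u, c) in enumerate(zip(user.collections_unlocked,
                                       user.collections_completed)):
        in_progress = u & ~c
        while in_progress:
            low_bit = in_progress & -in_progress          # isolate lowest set bit
            yield field * BITS_PER_FIELD + low_bit.bit_length() - 1
            in_progress ^= low_bit
```

E.g. unlocking collection 2 sets bit 2 of the first field (value 4, binary 100), matching the example above; collection 64 lands on bit 0 of the second field.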
Honestly, the last 10 years of software made scaling in terms of serving requests easy; such feats are common. The real hurdles come with spammers, moderation, and support.
Are we talking about actual, "willing" users, or people who accidentally clicked on a Pinterest link in their search results and then immediately clicked back?
Wish we could block the Kagi advertising on here. If you visit the 100 top tech sites it's great; if you go outside of that bubble you get pretty thin results.
I've been a Pinterest user since launch. I couldn't imagine life without it. I love finding creative photos and videos and saving them all for inspiration.
Without knowing the reasoning behind their decision, I would assume they did so to have better microservices control than Django's monolithic approach. I could be wrong, but that's the first thing that came to mind.
I think 11m monthly users is a lot. Sure, compared to Facebook in 2023 it's minuscule. But assuming those 11m monthly users log in 2 times a week (~8 per month) and check out 15 pages each time, that comes out to 1.3bn pageviews a month. Divided by (60s/m * 60m/h * 24h/d * 30d/m), that comes out to about 500 sustained pageviews per second.
I assume since they're a viral / addictive platform the numbers will be higher. Plus they're not a blog or so, but rather a personalized social media platform, which makes stuff more complicated (you can't just cache their news feeds, they're all personalized).
It's easy to say meh, 11m is nothing, there are other platforms with more users (especially since you don't bring up examples of how you managed otherwise). But I think it's a big technical feat to do this with 6 people.
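Spelling out that back-of-the-envelope math with the same assumptions:

```python
mau = 11_000_000
sessions_per_month = 8       # ~2 logins per week
pages_per_session = 15
seconds_per_month = 60 * 60 * 24 * 30

pageviews_per_month = mau * sessions_per_month * pages_per_session  # ~1.32 billion
print(pageviews_per_month / seconds_per_month)  # ~509 sustained pageviews/sec
```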
AFAIK a transcoding session occupies an entire CPU core; for example, a malformed GIF avatar can typically take down a single-server web forum for a few seconds.
500 pageviews per second is big, but 90 web engines + 50 API engines? That's ~6 rps per web server and ~10 rps per API server. Apart from that, ~250 servers for DB + cache. I know my comment smells so HN, and kudos to the team for building and scaling it. But as an outsider, hmmm.
A single SSD can serve over 100k ops/s. Pubsub systems built for the finance industry operate in excess of 60M messages/second [0]. I understand having multiple machines for failover reasons, but I can't help but feel that the majority of scale-out is due to the people who are doing the development not having the skills to properly optimise their systems.
There's an element of "how can they be so inefficient" in this thread, but hardware has come a long way in the last 12 years. I'd bet you could handle this scale with a single DB and 3-4 read replicas with modern hardware for example.
I think that's more because Pinterest dominates the image results but you can't view things at a higher resolution without signing in. I certainly blocked Pinterest for that reason.
Doesn't it just show that non-members dislike walls? In this case the requirement of an account. I think it would be a bad idea to rate Netflix and Discord based on how non-users experience them.
Netflix and Discord don't tease people with content that appears to be available until you click anywhere near it and get all the dark patterns vomited out over you.
Google Image search is almost useless if you don't block out crap sites like this.
They don't? If I google the name of a TV series to get more information about it, several streaming platforms that have it, including Netflix, take the first spots of the results, although I have no interest in that spam because I intend to pirate it.
this hasn't been my experience. streaming sites tend to rank much lower than IMDb, RT, Wikipedia, etc. If you already intend to pirate, wouldn't any non-pirate site be irrelevant?
I’m the writer of this. I linked to this presentation very clearly throughout the article as a source.
This was an article meant to resurface learnings from something that happened a decade ago, with added images and a clear distillation so that somebody doesn’t have to watch the 45 minute presentation to understand what happened.
I’m sorry you didn’t like it.
But I hope you can see the value I’m trying to provide here, as someone myself who doesn’t really have the time to sit through hour-long presentations to learn something.
I think most people here see the value of the article. Thank you for publishing it.
Parent comment was also valuable to me, for seeing how important it is to listen to others before jumping to conclusions about the worthiness of something.
There was a time when many of the OGs here would write original content, not for subscribers, not to make money, but to share our ideas, facilitate our own learning, to have to defend our ideas, to go deeper, learn more, and get better at our art. That was the way, once.
I'm on my nth account. 12-13 submissions a year, and barely a few this year, isn't going to feed the masses. Produce more, or accept that others will submit articles not up to your standards.
I got great value out of the article and shared it with my colleagues. Were it not for your effort, we'd never stumble upon this info, even though it existed in some other form.
Uh, I really liked both the article and the posts in this thread. Both are timely and right on target for me as I work to get my Web site running on the Internet.
I like the 11 million unique users a month early in the company -- simple architecture, popular programming tools and languages, small team, and likely enough ad revenue to pay the bills and get some earnings!
I wrote my code using Microsoft's Visual Basic .NET, ASP.NET, SQL Server, and one use of platform invoke. The code appears to run as intended.
I like the mention of the DB (relational database): I wrote the code using a free version of Microsoft's SQL Server. Right, it's only for development work, has some severe limits on DB size, and gets expensive for a production version -- also a pain since you have to count processor cores -- so I was glad to see Postgres and MySQL since I have been planning to use one of those. I liked seeing the mention, and the remark on the power, of key-value stores (e.g., Redis), since I wrote my own key-value store, with all the data just in main memory, using two instances of the .NET collection class.
Now with TB (trillion byte) main memories am even toying with the outlandish idea of keeping nearly all the DB data in main memory with SQLite -- outlandish!
And, liked seeing the steps up in capacity. That there are likely better architectural ideas now doesn't disappoint me!
The outlines of the architectures of even huge Web server farms was really good to see. You mean Google, Facebook, Amazon, Microsoft, etc. have surprisingly simple architectures???
WOW.
I didn’t “slap” any date on top of the article. 2 Oct 2023 is the date I published my article. (This is how blogs on the Internet work.)
The first line of the article says that this all happened in 2012. I’m not sure I could get any clearer there.
The “GPT distillation” phrase is pretty rude. Anyone reading the article can see the difference between it and the presentation. Every “hand-drawn” image in the article is created by me in Excalidraw, and any images from the presentation are sourced with a very clear link to the presentation.
If you have any constructive feedback, I am happy to listen and take it into account.
Otherwise, I’m not really a fan of the rudeness. Thanks.
Don't worry about it man. Some people just like to complain. Don't take it to heart.
I found it interesting and valuable. It doesn't matter if the base content originated from another source. I would never have seen that original content. I saw this. It's ok.
I missed the original InfoQ presentation so this post was definitely useful to me. In particular I'm working on something related to the clustering vs simple sharding dilemma so finding another big name use case is quite helpful. Thanks and don't let the haters get you down.
We really need to ban attitudes like yours, and stop HN from going the way Stack Overflow went. Just because a question has been asked before and you know the answer doesn't mean it can't be asked and answered again in many different forms, so to speak.
Lots of people like it. I've noticed female friends like it for interior design / fashion ideas / similar. It's now floated with a $25bn market cap, and 450 million users. According to Wikipedia they had 3,987 employees in 2022 so I guess they've bloated up a bit since 2012.
No, it isn't. I use it significantly, both personally and professionally for my work. It's an exceptionally well-designed tool and there is nothing else right now that comes close. I'm looking forward to them upgrading their AI based on all the recent advances.
On one hand, when I don't want to use Pinterest, I too curse how it takes over Google Images results.
But on the other hand, when I do want to use it, it's a great scrapbook and recommendation tool -- easy to capture images I find while browsing, and good at figuring out what else I'd like to see related to them on the same theme. A visual notebook with suggestions.
Things like:
- keep track of the bags I've been thinking about buying so I can see them all at once and find others I missed
- pin a bunch of home office photos I like, and then it suggests even more ideas (or garden ideas, or kitchen ideas, or workbench ideas...)
- pin all the recipes I've enjoyed making and that I want to make, and get even more suggestions that have the same dietary restrictions
- pin some things with a particular vibe -- let's say "motorcycle photos I think are cool" -- and get a better idea of the whole aesthetic or subculture or whatever they're from
For better or worse, it is really good at what it does. No argument that it pollutes search results, no argument that there are too many ads in the form of promoted pins, but underlying that, the recommendation algorithm is really useful.
It is. It aggregates other content and doesn't add anything of its own besides ads (which I've never seen since I use AdBlock), worse thumbnails, and a walled garden for blogs.
> The thing to strive for 12 years later should be how to scale to 100M+ users with 6 engineers. NAND and CPU cores have seen an order-of-magnitude improvement in price/performance. 1-2TB of DRAM is obtainable. MySQL and Postgres have had a decade of improvements, along with simpler, battle-tested solutions for scaling. Python and Ruby, Rails and Django, have all absorbed the lessons learned from the real world.
At least that is what should have happened.