A Depressive Journey With MongoDB

jm4 · on Jan 12, 2012

I don't think the author is exactly unnecessarily bashing MongoDB, but MongoDB is not the problem here. It is only part of the story because that's what the author chose to use. I would think that under these circumstances and time constraints, any solution the author might have come up with would have failed.

The bottom line is there were only 4 days to go from idea to supporting 30k concurrent users. That is a recipe for disaster. There are so many things to consider and implement in addition to the software. Never mind the fact that there were probably laughable requirements for this project and no time allocated to design and testing. There wasn't even sufficient hardware to support the application. This project was a failure almost from the moment it was conceived. This guy never even had a chance.

There is a story in here, but it is not so much about MongoDB.

spelunker · on Jan 12, 2012

That is what I got from the article. MongoDB may have issues, but:

1. Being expected to develop a fully working, robust application that accepts 30k users in 4 days

2. Agreeing to this commitment

I think are the big failures. Who in their right mind would agree to something like this, even going as far as taking responsibility for it?!

True, when it comes to it, your boss is your boss, fine - if your hand is being forced, do it under protest, rather than pretending this isn't a terrible idea.

TheSmoke · on Jan 12, 2012

as i have said in the disclaimer, i'm definitely not bashing or blaming mongodb for anything. when you read it, you will see that i have done a 30K/sec load testing. the url it was pointed at was doing writes to mongodb. it opened every connection, wrote to the db and closed the connection. this is a huge success for mongodb. the problem is, when you send every single read query to mongodb under a heavy load, within a few seconds, it will just accept connections and will not respond to queries. this is definitely not the case for writing. writing handles them very well. i really would like to know how we screwed up there.

jm4 · on Jan 12, 2012

I don't know enough about MongoDB to answer your question, and it sounds like you didn't know enough about it to even consider putting it into production for this project. That's where you got screwed. Trying out new technology when you have a likely impossible deadline and then having it fall over in miserable fashion for everyone to see is a good way to get fired. At that point, the debate is not over the ridiculous constraints placed upon you, but rather over the decisions you made.

http://www.mongodb.org/display/DOCS/Connections

That page says each connection spawns a thread. That could be a good lead. The documentation suggests using a connection pool (although many drivers will do this automatically). Were you using one?

http://stackoverflow.com/questions/8439639/mongodb-max-conne...

That StackOverflow question is about supporting 20k concurrent users.

It could also be a matter of insufficient hardware. I don't know, but I would start looking at what kind of performance can be expected in different setups and go from there. The most important thing is to properly test your setup. Prepare for a process of trial and error and expect it to take some time.

hengli · on Jan 12, 2012

You opened a new connection for every request? Shouldn't you have just 1 connection with all the queries going through that connection?

tmountain · on Jan 12, 2012

I work at a company where we use Mongo to store and retrieve session information for over 2.5 million users per day. Our initial deployment crashed and burned in a similar fashion for one very simple reason. Mongo opens a thread per connection. Under this model, you simply cannot point every Apache request to Mongo and expect it to handle the load.

The solution is fairly simple. You can either use persistent connections at the driver level (the PHP driver provides this, I'm not sure which other languages do), or you can put mongos (used for sharding) in front of mongod. Mongos does connection pooling and tremendously reduces the load on your mongod server.
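A minimal sketch of the connection-reuse idea described above, in Python. It is not the PHP driver's or mongos's actual mechanism: `FakeConnection`, `ConnectionPool`, and `handle_requests` are hypothetical names made up for illustration. The point is only that many requests can share a small, fixed set of connections, so the server never sees one connection (and one thread) per request.

```python
import queue
import threading


class FakeConnection:
    """Hypothetical stand-in for an expensive database connection."""

    opened = 0  # how many connections were ever created
    _lock = threading.Lock()

    def __init__(self):
        with FakeConnection._lock:
            FakeConnection.opened += 1

    def query(self, text):
        return f"result for {text}"


class ConnectionPool:
    """Fixed-size pool: requests borrow a connection instead of opening one."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())

    def run(self, text):
        conn = self._idle.get()  # blocks while every connection is busy
        try:
            return conn.query(text)
        finally:
            self._idle.put(conn)  # hand it back for the next request


def handle_requests(pool, n_requests, n_workers=8):
    """Simulate many web requests sharing one pool from worker threads."""
    results = []
    results_lock = threading.Lock()

    def worker(start):
        for i in range(start, n_requests, n_workers):
            reply = pool.run(f"q{i}")
            with results_lock:
                results.append(reply)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a pool of 4, a thousand simulated requests still open only 4 connections; a request that finds the pool empty waits instead of opening a fifth.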

The author of the post mentions that redis is "heaven sent". Well, redis handles its connections in an asynchronous fashion which is why it can handle a large amount of direct traffic gracefully. I believe there are plans to change mongo's connection handling to use select/epoll, but I'm not sure when.
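For contrast with thread-per-connection, here is a rough sketch of the select/epoll style mentioned above, using Python's standard `selectors` module. It is not Redis's or Mongo's code; `serve_once` is a hypothetical helper, and the "ack:" reply format is an assumption for the demo. One thread waits for readiness events and answers whichever sockets have data.

```python
import selectors
import socket


def serve_once(server_ends):
    """Answer one message on each socket from a single thread."""
    sel = selectors.DefaultSelector()  # epoll/kqueue/select, whichever the OS offers
    for s in server_ends:
        s.setblocking(False)
        sel.register(s, selectors.EVENT_READ)

    pending = len(server_ends)
    while pending:
        events = sel.select(timeout=1.0)
        if not events:
            break  # nothing became readable in time; stop instead of spinning
        for key, _mask in events:
            conn = key.fileobj
            data = conn.recv(1024)
            conn.sendall(b"ack:" + data)
            sel.unregister(conn)
            pending -= 1
    sel.close()
```

To try it, create a batch of `socket.socketpair()` pairs, send a message on each client end, call `serve_once` with the server ends, and read the replies back from the clients. No thread is created per connection at any point.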

hello_moto · on Jan 12, 2012

I probably don't like the tone of my comment on this one but TheSmoke, please don't do this repeatedly. ever. again.

I respect your honesty, but after reading this particular blog post and a few bits of your website, I think you should slow down and value tried and tested tools.

Let's start with your "About me:"

"Mengu, is a web developer with Ruby on Rails, Grails, Django, Pyramid, web2py, CodeIgniter and jQuery in his tool box. Available for any kind of web development jobs. Read more.."

Lots of technology there. Then you take on a project where you chose MongoDB. I think I'm starting to notice a trend here.

You burned your team members for not standing up for your choice.

"Our manager accused me and MongoDB for the failure while we all have agreed to use MongoDB in the team. None of my teammates stood up."

My prejudice senses something is off: I feel that you were the one who pushed MongoDB hard, to the point where your team members gave up and just said "alright, let's just get this ball rolling since we have 4 days anyway..." rather than argue with you.

It fails. You got blamed alone. None of your team members stood up for your "ignorance" of tried-and-tested tools. That's fitting if I may say so.

Don't do that in a public blog. Don't ever ever do that. Especially when it's your fault.

You also started off by saying that "MongoDB is okay", then trashed the community and were not willing to pay for support while admitting that you do need the support.

I know it has been hard for you but you really need to step back a little bit and do a retrospective on your decision-making ability.

I know I'm saying harsh things but I hope you understand that there's a bigger problem and MongoDB is not that problem (at least not now).

In my short time in this hi-tech world, I've met and worked with a few people who religiously push new hip tools with zero experience using said tools (especially lately, due to the explosion of blogs and hip tools) and then want to spread the blame to everybody else if it fails, just because the others agreed to use it. It's a common trend that most managers can spot quickly, and it may not be good for your future career.

TheSmoke · on Jan 12, 2012

hi. i'm not offended by what you said. in fact, i'm glad that you took your time and wrote to me. if i was reading a post like mine, i would also come to a conclusion like yours. unfortunately, i have to say that you are wrong. our team consists of 3 developers. there is not a single minute that i have pushed mongodb or anything else hard. this is a web site that has thousands of reqs / sec. nobody in our team can push new tools let alone pushing religiously. this is not a toy for new things to try. "i" was accused because it was me who stood up for the decision, among the team. plus, i am brave enough to take the blame when i am to blame. thanks again.

ww520 · on Jan 12, 2012

While the OP has made some mistakes, posting them in public is not one of them. It takes guts to publicly admit your failure. I commend OP for manning up.

He will grow from it and will do better next time.

rtperson · on Jan 12, 2012

> We could not test the application because we were told about it on tuesday and it had to be ready on sunday.

Well there's your problem right there. That is not an appropriate timeframe for a new app. The only professional response to such a request is: "It cannot be done. Not if you want this app to work in production. Here's why." Follow up with an explanation of why QA and load testing are crucial. If they press you, make your next follow-up a letter of resignation.

haasted · on Jan 12, 2012

Also, if you accept the timeframe, go with only known technology. This is not a scenario for getting experience in a new platform.

EwanToo · on Jan 12, 2012

Given that his disclaimer reads:

DISCLAIMER You are about to read a long story on how I got burnt with MongoDB and depressed with it. I am not blaming MongoDB, anyone using, advocating or developing it. I am blaming myself for this. MongoDB is a good tool. You can use it but just make sure it is what you need and it handles your requirements very well. This is not specific to MongoDB but applies to every tool we use.

I think it's a bit harsh for people to criticise either him or MongoDB too much.

At the end of the day, it was a big rush job, no time was allowed for testing for scaling, and it fell over.

My only real surprise is that it wasn't Apache that fell over first.

davidw · on Jan 12, 2012

> My only real surprise is that it wasn't Apache that fell over first.

The Apache web server is a very well-tested, tried and true piece of infrastructure, even if it's not "hip" these days.

EwanToo · on Jan 12, 2012

True, but its defaults are not really "battle ready" and since they didn't tweak MongoDB in advance, I can only guess that Apache was already pre-configured for that load - perhaps they used it for their existing web servers?

spydum · on Jan 12, 2012

I would guess Apache httpd is a tool they are familiar with and comfortable managing. Trying to build a high performance MongoDB system in just a couple of days was probably a bad first project with the tool. They really should have just stuck to what they know, until familiar with the new shiny.

dasil003 · on Jan 12, 2012

True, but it often needs tuning to handle massive loads.

TheSmoke · on Jan 12, 2012

apache did not fall over first because netscaler is load balancing between 26 web servers and they are already pre-configured for this heavy load.

nphase · on Jan 12, 2012

26 webservers and only two mongodb instances? Seems stacked.. Why not set up more instances and split the load across them (sharding, splitting collections across replica sets, etc) or similar?

Also - big question for me: were you using a ridiculously old version of mongo? I have yet to run into anyone who is on 2.0 or 1.8 and is still running master/slave (as opposed to replica sets). Sounds like your writes weren't working when you moved to replica sets because your client wasn't properly configured.

obtu · on Jan 12, 2012

People may be prompted to react because the title doesn't reflect the content.

kennu · on Jan 12, 2012

So he did zero load testing before publishing an app live on television, developed with technology completely new to him?

Jgrubb · on Jan 12, 2012

Developed in four days, don't forget that part.

kennu · on Jan 12, 2012

It just sounds like a huge gamble. It would be quite lucky to get everything working by chance on the first try, without any bottlenecks in any part of the newly created config.

GFischer · on Jan 12, 2012

I guess he took away some good lessons from this, since his "morals of the story" say:

* Do not accept responsibility for anything that you had to do in a very limited time.

* Do not accept the job if the timeline is short, the work is big and the load is heavy.

* Load test your application no matter what the cost. If they want to get them all, they need to pay for them all.

Failing is hard, but I'm sure he learned :)

gbog · on Jan 12, 2012

I'd add "don't ever change your techno 4 days before you go on TV"

skormos · on Jan 12, 2012

Seriously, the article should've been titled "Why You Can't Deploy Publicly Available Apps in Four Days".

sylvinus · on Jan 12, 2012

yes to me this is the craziest part.

psyren · on Jan 12, 2012

If you have 4 days to implement something important, the last thing you should do is pick a technology you have no experience with.

jeswin · on Jan 12, 2012

Among the reasons, I see stuff like this: "Appearantly someone set the connection timeout to 9000 secs which means a connection to a MongoDB instance through Netscaler is open for 2.5 hours. I have set it to 20 secs. I am not looking for someone to blame but probably it's our sys admin."

Wouldn't any database give you plain bad performance if settings were badly configured? I understand it was merely one of the reasons, but sometimes all you need is one.

TheSmoke · on Jan 12, 2012

this was a reason specific to our stupidity. why on earth would someone keep connections open for 2.5 hours? how did he miss this setting?

my point in this is, if you have something that balances connections between servers, check if it kills them or not.

mnutt · on Jan 12, 2012

Perhaps it was just some mismatch between the expectation (pooled, long-running connections) and reality (many, many short-lived connections).

I think another takeaway from this is any time you need to adjust the ulimit, take a step back and ask "Does this make sense? Is my use case exceptional enough that it won't perform within the default limits set by my OS?"
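One way to make that question concrete is to look at the limit before touching it. A small sketch using Python's standard `resource` module (Unix only); `fits_within_limit` and its `headroom` default are hypothetical choices for illustration, not a recommended setting.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_within_limit(expected_connections, headroom=64):
    """True if the expected sockets plus some headroom fit under the soft limit.

    Every open socket consumes one file descriptor, so the open-files limit
    caps how many simultaneous connections a process can hold.
    """
    soft, _hard = open_file_limits()
    if soft == resource.RLIM_INFINITY:
        return True
    return expected_connections + headroom <= soft
```

If `fits_within_limit(20000)` comes back false, that is the moment to ask whether 20k simultaneous sockets is really the design you want, or whether pooling would keep you inside the defaults.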

hieronymusN · on Jan 12, 2012

Were you doing any connection pooling with Mongo? I'm curious what driver you were using, etc.

alexchamberlain · on Jan 12, 2012

He gives the impression in the article that it was PHP.

meanguy · on Jan 12, 2012

"the only thing I had in my mind was MongoDB."

Well, when you approach it like that...

samarudge · on Jan 12, 2012

Interesting, I recently worked on Project4Awesome, a community event on Youtube. Our site used a single MongoDB node on a relatively standard server and handled >100k pageviews in 24 hours with no caching. That being said, I've worked with Mongo before and know many of its issues, so I knew what to expect.

Though, without meaning to criticize the author too much, increasing various ulimits should be standard on pretty much any database server, I would have thought?

kennu · on Jan 12, 2012

Do you have any advice on the MongoDB issues to be aware of in this kind of usage? I wonder if there is a checklist somewhere.

DanielShir · on Jan 12, 2012

I'd also appreciate the same checklist samarudge :)

meow · on Jan 12, 2012

A pre-production move checklist would be very helpful :)

po · on Jan 12, 2012

It is hard to tell but it seems like he's talking about serving 1/3 of your daily traffic in a few minutes. When web servers fail, they tend to fall off a cliff.

samarudge · on Jan 12, 2012

Indeed, but Mongo is actually quite good at failing predictably. I think the biggest issue is not Mongo, but failing to test the application correctly. Particularly when working with a new technology. Mongo isn't the best fit for everything but nearly all the issues he describes are more down to not having enough time to plan and test before the project, and one would have probably encountered similar issues if using nearly any software at that scale, with no prior knowledge or tuning.

lrobb · on Jan 12, 2012

Yeah, I read it as 30k requests in a matter of seconds.

TheSmoke · on Jan 12, 2012

i've checked how many pageviews we've got the very next day, it's >100K as well. that's without caching. i have not observed any problems that day. the real deal here is handling such jump in a very few seconds.
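To put rough numbers on the difference between daily volume and a sudden jump, here is a back-of-the-envelope sketch. The 100K pageviews/day and 30K requests come from this thread; the 10-second burst window is purely an assumption for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(pageviews_per_day):
    """Requests per second if the day's traffic were spread evenly."""
    return pageviews_per_day / SECONDS_PER_DAY


def burst_ratio(burst_requests, burst_seconds, pageviews_per_day):
    """How many times higher the burst rate is than the daily average rate."""
    burst_rate = burst_requests / burst_seconds
    return burst_rate / average_rate(pageviews_per_day)


# 100K pageviews/day averages out to a little over 1 request per second.
daily_avg = average_rate(100_000)

# Assumed: 30K requests arriving within 10 seconds, i.e. 3000 requests/second.
ratio = burst_ratio(30_000, 10, 100_000)
```

Under that assumption the burst is on the order of a couple of thousand times the average rate, which is why a setup that handles the daily total comfortably can still fall over in the first few seconds.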

and yes, increasing ulimits should be standard. one simply forgets every single optimization when you just have to deliver a working app.

samarudge · on Jan 12, 2012

Sudden increases are hard for any app. I just find it strange that, when you talk about Redis with such familiarity, you'd choose a new technology with 4 days to launch. An F1 car might be fast, but if I had to race one I'd end up in a wall.

In saying that, I'm quite envious of your job, that sounds pretty cool (until things go wrong).

TheSmoke · on Jan 12, 2012

it is definitely both cool and exciting but depressing and stressing as well. i think every developer should have this experience at some point of their career.

cagenut · on Jan 12, 2012

This is straight up web operations 101, has nothing to do with MongoDB.

Hearing you pushing for a new tool at the last minute, then changing its setup/config/architecture repeatedly over a few days, then blaming your sysadmin for a setting. Man I'd hate to have that guy's job, you just steamrolled him with "dev found a shiny new toy" behavior.

level09 · on Jan 12, 2012

This post was very helpful indeed. We are facing almost similar issues with our default box configurations (ulimit and connection counts etc) while we are serving real time updates to our visitors.

Thanks for posting this.

whyme · on Jan 12, 2012

> THE MORAL OF THIS STORY

The OP missed:

When given a project that has high visibility and a very short timeline, don't wing it and choose a technology that 1. you have no experience with and 2. you've done no testing on.

Sorry, but I feel the OP is failing to see he made the critical error.

ww520 · on Jan 12, 2012

One piece of advice is to use tried and mature tech that you KNOW when you have to get something done under pressure. There are enough moving parts in getting something done in 4 days, let alone introducing some "hip" untested tech in a short amount of time.

rjurney · on Jan 12, 2012

Bigger machines. More RAM. When in doubt, and in a rush, throw hardware at the problem.

mrinterweb · on Jan 12, 2012

I appreciate that the author put the disclaimer at the top that MongoDB is not to blame, but I do feel that for someone skimming headlines, the impression they may arrive at is MongoDB = bad.

chris123 · on Jan 12, 2012

Sounds like less of a Mongo problem and more of a "trying to do too much, too fast, too big" problem. The Greeks call this "Hubris." Great lesson/reminder to share. Thanks.

benmmurphy · on Jan 12, 2012

how many db servers can handle 20k connections? dbs usually aren't written to handle that. most mysql implementations are going to have problems > 500.

jerf · on Jan 12, 2012

Put this in the URL bar for the page:

javascript:void(document.getElementsByClassName("post-body")[0].style.whiteSpace = 'pre-wrap')

PaulHoule · on Jan 12, 2012

so far I haven't been impressed with mongodb. i've seen it break down and die doing tasks that mysql does without breaking a sweat.

wseymour · on Jan 12, 2012

Stop blogging and start growing a pair. Aside from the monumental error of judgement in developing on an unfamiliar database in 4 days with zero testing, the 'morals of the story' crack me up:

"Do not accept responsibility for anything that you had to do in a very limited time." Yeah, because the ability to stand by your decision is a function of deadlines.

"Do not accept the job if the timeline is short, the work is big and the load is heavy." Wow, that's some pretty serious ambition right there.

I also like the way these points are first on the list, before actual lessons like: "Do not have trust in any of your tools until they prove themselves."

GFischer · on Jan 12, 2012

I think that learning to stand up to people making unreasonable requests is an important lesson.

I hope he also learned from his error in judgement.

And he actually did have some courage in submitting his mistakes in public for us to discuss :)

TheSmoke · on Jan 12, 2012

i have written all these because i have made these mistakes so someone in a similar situation does not. it's not about courage, it's knowing yourself. when you do stupid things, you have to acknowledge so you don't do them the very next time. since the show has 5 more weeks and we will have more load, more requests, more everything, i would like to hear everyone's opinion on that.

wseymour · on Jan 12, 2012

+1. I should change that opening to "stop blogging and continue growing a pair."

dextorious · on Jan 12, 2012

""" "Do not accept responsibility for anything that you had to do in a very limited time." Yeah, because the ability to stand by your decision is a function of deadlines. """

It most certainly is.

One can be forced to come up with something under a tight deadline that he wouldn't under normal circumstances.

emmapersky · on Jan 12, 2012

I thought MongoDB was webscale. hmmm. ;)

dasil003 · on Jan 12, 2012

Yes but apparently not televisionscale.