This is an excellent course and helped me get my current job.
My background is chemistry/chemical engineering. I had applied for a data scientist position. Phone interview included a problem where I was asked about my solution's complexity. I admitted I didn't know about it.
Still got called back for an interview on site, but the weekend before I powered through this course. Unsurprisingly, it came up in the on-site and they were really pleased I had learnt about it. I got the job.
I also found it useful to implement all the algorithms in Python.
Python is the algorithm king as far as I'm concerned. It really gets out of your way and lets you focus on the abstract nature of what you're trying to accomplish.
While I applaud Python for having established a well-designed[1] layer on top of the math/numeric libraries written in Fortran, C, and C++, I hope that one day Rust will smooth out the corners.
While Rust might not become a replacement for the Python layer, it may replace the C/C++/Fortran layer with all their speed and low-level optimization, yet provide good (and especially safe!) abstractions on top of that.
Currently, people try to use C++ to fill that gap, but I'd love to see Rust's type system, borrow checker and macro system, instead of C++ templates.
It also forces you to know what's happening under the hood, though. If learning the material comprehensively is your goal, I think it's not a bad idea to dig into a C implementation.
Javascript is the kitchen hand that you taught how to chop vegetables. Yesterday you asked him to chop wood, and you've spent all of today trying to figure out why your kitchen knives are dull.
I'd disagree. Python abstracts a little too much away, and it's much easier to figure out loop ranges, invariants, etc. in an algorithm from, say, Java code.
Python works well when you have to convey your algorithm in a nice succinct functional manner, but that's optimizing for cuteness, not understanding.
Python is a fantastic language for learning algorithms. It's also a good interface for higher performance, lower level systems language libraries. That said, it is certainly worth the effort to learn one of those systems level languages to pair with python.
I always say that Python is like the English of programming languages; it's an amalgamation of other languages, programming paradigms and a rich set of libraries, and often serves as a 'Common Tongue' to glue disparate processes together. There is an inherent risk to that, though. Because it's so easy to transition between object/procedural/functional modes, because it's so easy to tie together so many different interfaces, because it's so easy to `import everything`... it is often too easy to avoid a proper separation of concerns. The flexibility comes with a disincentive towards discipline.
All of the pedantry of systems level language performance and practices aside, even the context switch alone between 'algorithm language' and 'application language' is beneficial to promoting better discipline in both areas.
Hardly. How often is it that you can read an uncommented Python program that implements a tricky algorithm, and you can easily recover basic things like loop invariants?
But then there's no such thing as an unannotated program. I'm confident in my ability to prove things the old-fashioned way, using brain, pencil and paper, so I don't see much value in merely having my proofs verified by a machine. If the machine can't contribute to the effort of actually coming up with the proof, it should stay out of the way.
In my opinion, an understanding of data structures is _much_ more useful for a data scientist than algorithms.
Why should data scientists know about algorithms? Because data scientists are typically interviewed by computer scientists/software engineers, and that's what they tend to ask.
I recently conducted many phone and on-site interviews for a data scientist position. I didn't ask about algorithms.
For software engineers, algorithmic complexity is a good filter for, say, Javascript hackers vs people with a university education in computer science. Just saying.
And in your mind, "javascript hackers" are worse than "people with a university education in computer science" at doing modern front-end web development?
In my experience, building performant web applications is much more about things like reducing bundle size, making sure animations are hardware-accelerated, being smart about _when_ you do complex work... The cost of using an O(n^2) algorithm over an O(n) algorithm will rarely have a tangible impact, since you don't typically deal with item sets that large on the client.
Not debating that CS fundamentals are valuable, just that they are _far_ from the most important skillset to have. Give me a JS hacker who understands page load times over a CS grad who writes his own bucket sort algorithm any day.
I'm not sure where you got the implication that one was being called out as worse than the other, but the comment is really just noting it's a way to filter out specific groups. That filter may be useless when hiring for a front-end developer, but on the other hand, if hiring a developer to work on your new database product, it may be very useful indeed. As you note, they commonly deal with different types of complexity.
If the 'javascript hacker' doesn't learn about the difference between iterating through a list and binary searching, and how/when one is better than the other, yes it is a problem.
I say this as a self-taught programmer who studied a non-CS engineering discipline well after learning about big-O.
Can you give me an example of when a front end developer would need to do either of those things? On the back end sure, but on the front end? Who in the world is using JS to iterate through a list or do binary searching on the front end?
And who in the back-end uses binary search? We're the people who invented NoSQL databases with an HTTP/JSON interface, because traditional databases were too much of a hassle.
The DBs implement the binary search, not the back-end dev.
For the average programmer, IMHO learning about data structures/algorithms makes you a better programmer, but it's not that essential.
Maybe for front end a better example would be the difference between a list and a hash map.
And the front end is used for some pretty crazy things these days, e.g. 3D rendering. Sure, if you're making a social network for cats it probably isn't an issue, but then not everyone is.
Where I work, one of the applications we've recently built is a planning and design tool for a specific type of structure. There are both electrical and mechanical considerations to address, and the problem domain is sufficiently broad and complex as to require, at time of writing, about 25 megabytes of constant data to cover all the possible designs. Because of a client requirement, the application is also frontend-only, with no backend interaction beyond the initial download, and a PDF of many pages as the end product of the process.
When your UI is based on 25 megabytes of lists and maps, you do a lot of iterating and filtering. In our current implementation, it's possible, but difficult, to produce a case where it spends as much as a quarter of a second doing that in one go. But only the first time; if it takes that long (anything over 100ms), we memoize the call so the next time you get it for free. Binary search hasn't yet been required, but it's on our short list of performance improvements to apply once the backing data grows enough to require them.
Another project from last year was a tool to consume very large (250-300M) XML dumps from a very large, very expensive line-of-business system used by our finance department, and produce a wide variety of aggregations to simplify verifying the output of said very large, very expensive system, which I gather may not always be able to perform arithmetic correctly.
This one wasn't a team effort; it was one of those things where you get the spec on Friday afternoon and the deadline is Monday morning. (Not something I'd tolerate on a regular basis, but when the stakes are "it's this, or the SVP Finance sends a helicopter to retrieve the VP IT from a cruise liner"...) But it's also a frontend-only application, because installers take time to get right, too, and in any case something so simple has no need for a backend.
It takes about three minutes to fully parse, process, and report on a 250M XML dump. It leaks no resources, does not hang the UI thread, and presents a cute Bootstrap progress bar along with a table of running totals, so the user knows what it's doing. (Not gonna lie, I was showing off a little with the table. It's fun to watch numbers blur as they spin upwards in value!)
I concede that this tool doesn't do much work with long lists - mainly just the values of interest from the raw dump, which are several lists of maximum cardinality on the order of ten thousand, and the negligibly short lists used to maintain parser state information. But I do gather a certain impression that you're one of those sadly behind the times folks who still thinks of the browser/JS platform as a cute toy and a decent document reader, but not up to anything remotely resembling Real Computational Work, and I thought I'd include this example, as well as the other, in order to help disabuse you of that rather outdated notion.
Hm. Yeah, fair question. I guess I was assuming that, since the applicants could be described as "javascript hackers", then it was a JS position. Most JS positions are front-end ones.
True. But before the days of Structured Programming, widespread understanding of algorithms was hoped to lead to a professional Software Engineering field.
The idea of "Software Engineering" was born with the Software Crisis report in the 1960s.
Later on, practicing SEs would study things like Design Patterns, so things evolved over time.
EDIT: Software Engineers probably should know some relevant basics, but programmers could never come up with a "body of knowledge" like real engineers have.
It all depends what you work on and what new developments keep coming out...
Seems like a terrible filter. Some CS grads who slept through college will fail, while some non-CS grads who studied on their own will pass. Of course, to me, that would be a feature, not a bug; but if you really want to filter on "university education in computer science", just read their resume instead.
Back in 2010, I was on a team with an intake of about 1 TB per day of mapping data, all of which needed to be integrated with previous data and condensed into processed output. I can assure you that we cared a _lot_ about the runtime complexity of our algorithms.
All the really interesting jobs require knowing data structures, algorithms and discrete mathematics well.
Sure. But that just means they're more domain specific than fundamental, at least as far as tech interview conventions go. To name another core CS subject: understanding operating systems and the underlying assembly and machine language that your nifty web and mobile platforms are built upon is important. But they're not relevant to every single type of job.
It's strange they didn't cover dynamic programming at all. IMO every course should include at least one classical example of dynamic programming. For example:
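Here's a minimal sketch of one classic example in Python (the function names are just mine): the same problem, Fibonacci numbers, first as plain recursion with a cache (memoization), then in the bottom-up dynamic programming form that needs no recursion at all.

    import functools

    # Top-down: ordinary recursion plus a cache (memoization).
    @functools.lru_cache(maxsize=None)
    def fib_memo(n):
        if n < 2:
            return n
        return fib_memo(n - 1) + fib_memo(n - 2)

    # Bottom-up dynamic programming: solve the smallest subproblems
    # first and build upward; no recursion, no call stack.
    def fib_bottom_up(n):
        if n < 2:
            return n
        prev, curr = 0, 1
        for _ in range(n - 1):
            prev, curr = curr, prev + curr
        return curr

    print(fib_memo(40), fib_bottom_up(40))  # both print 102334155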
You might call it a super-set of recursion-with-caching. It's a common technique, but it's not the whole field - you can also start at the "leaves" and summarize your way up to the "start" (and similar-ish things that don't involve recursive thinking at all). In many ways similar, but not actually the same. No call stack, for starters.
"just caching" is also hard to apply to e.g. the tower of hanoi problem.
The Wikipedia article is incorrect. The caching, or memoization, is a technique heavily utilized in dynamic programming. Page 183 of Algorithms [0] has an explanation of what memoization is and how it is used in dynamic programming. This StackOverflow answer [1] also has a good explanation of the differences between DP and memoization. Any decent algorithms textbook will also have a section dedicated to DP.
> Then why did recursion work so well with divide-and-conquer? The key point is that in divide-and-conquer, a problem is expressed in terms of subproblems that are substantially smaller, say half the size. For instance, mergesort sorts an array of size n by recursively sorting two subarrays of size n/2. Because of this sharp drop in problem size, the full recursion tree has only logarithmic depth and a polynomial number of nodes.
> In contrast, in a typical dynamic programming formulation, a problem is reduced to subproblems that are only slightly smaller—for instance, L(j) relies on L(j − 1).
I think I got it. The distinction isn't that clear cut anyway. For example, I could implement a mergesort with caching and say that I'm technically doing dynamic programming, even though the cache would get hit exactly zero times. It would sort of be "trivially" dynamic programming. On the other hand, the first time I calculated Fibonacci numbers recursively, I added a cache. This would be "memoization". Or in other words, the Wikipedia article is correct, but some problems don't benefit from the caching.
My Data Structures & Algorithms class uses this book - it's fantastic. One nitpick is the "EASYQUESTION" sorting visualizations in the book; they aren't easy to quickly understand (why use letters instead of numbers when demonstrating sorting algorithms or trees?).
The Coursera Stanford [0] and Princeton [1] courses start again soon, February 20 to be exact. Not sure which one is better, but to refresh my CS skills, atrophied over 10 years, I've joined the Stanford course. Not sure how it compares to the Khan Algorithms course. Anyone have any feedback?
This is just my opinion and I'm sure it differs from others...
Roughgarden's class is advanced and expects mathematical maturity. You may find his course quite fast and rough if you are a beginner.
Sedgewick's class is much easier. He is a bit boring and tries to use "real life" examples (in some instances) from the physical sciences to make the material relatable. This, in my opinion, detracts from the material. Also, he doesn't always fully explain where he got some of the big-Os here and there.
My advice? Follow MIT's OCW course (it uses CLRS). Supplement it with Algorithms Unlocked, the Khan Academy link in OP and CLRS. If you use those 4 resources and put in the work you'll understand the material.
All 4 sources have Thomas Cormen's DNA in them (he is the C in CLRS). So you'll find it consistent when you read from one source to the other. After reading/hearing the same thing about 4 different times in 4 different ways, it'll begin to click.
Order of easiness is probably Khan Academy > Algorithms Unlocked > MIT Algorithms Course > CLRS.
Algorithms Unlocked is like "pre-CLRS" and Khan Academy's version is the TL;DR version of Algorithms Unlocked.
I've taken both Stanford's and Princeton's Coursera courses, and powered through the MIT OCW, and I would say this evaluation is spot on.
If you have to pick only one, go with the MIT OCW, and snag a copy of CLRS. I got mine from my local library, and it gave me more than enough time to work through the problem sets from the MOOC.
Great feedback and insight. Appreciated. I'll check out the MIT OCW and Khan Academy first. I'm hardly a beginner, but my skills feel a bit rusty and I want to refresh them.
I'm going through Sedgewick's class right now. Is the MIT OCW course math heavy? It lists "Mathematics for Computer Science" as a prerequisite. I am somewhat familiar with the material, but not at a very deep level. Should I take that one before?
I find there are a lot of online resources for those looking to learn algorithms and data structures but I've had trouble finding the same breadth and depth of resources surrounding the math behind CS (discrete math, probability, etc.). Any suggestions?
All are good and are pitched at various levels of complexity. The Princeton course uses Java. Okay if you're into that sort of language/thinking. MIT is using Python. Found one using Lisp:
"Systematic Program Design" ~ https://www.edx.org/xseries/how-code-systematic-program-desi...
I didn't care for this book. I found the use of "doodle drawings" for visualization hard to look at and distracting. The book felt half-finished to me. For instance, how does an algorithms book not include anything on trees?
I'm not sure I would say the book is "un-pythonic"; rather, the book intentionally avoids overly language-specific idioms in order to reduce algorithms to their most basic form. Once you have the basic knowledge, it's trivial to implement them in any language you want using those language-specific idioms - swaps without a temp variable, list comprehensions, etc. It is not a "Python book" per se; it is an algorithms book that uses Python to teach. I've seen many algorithms books that use Java adopt a stripped-down, back-to-basics procedural style as well, I think for the same reason. The book also incorporates CodeLens and Python Tutor so you can step through your stack frames and pause execution. This is a wonderful teaching aid, especially for things like recursion.
It's nearly a third of the length of CLRS, and half the length of Sedgewick. Much more precise, yet it offers more in that it talks about common problem-solving use cases for data structures and algorithms, rather than going through the theoretical proofs behind them.
Talked to someone who wanted to work at one of the big megacorps as a software engineer; they asked for advice on passing the interview. I asked them if they could implement quicksort. They said maybe, but they didn't really want to study algorithms. I guess they really didn't want the job after all.
I understand that the focus on sorting is to have a simple application that everyone can understand. But I can't seriously expect people to get that excited about sorting. I sure didn't. It's not like we don't have tons of other applications that demonstrate the same principles.
And I'm kind of smirking right now, because asymptotic notation got butchered again.
I've spent the good part of this semester trying to get my head around a very formal, very dense script for my own algorithms course. And I finally cracked asymptotics.
Maybe I'm just dense. But if that's the case, I'm sharing a classroom with others who are equally dense.
We dealt with all 5 classes: big oh, small oh, theta, big omega, and little omega. We're required to always give the "most exact" classification for best/avg/worst, including "does not get as fast as" or "does not get as slow as".
I'm willing to write a "freshman friendly" write up if someone's willing to post it or use it. I'm shit at self publishing.
My high-school-age daughter is using Khan Academy to learn about logarithms for her math class. She was telling me about it, and I thought "hmm, maybe I should finally figure out what Big-O actually means". Now here we are! I guess we'll both be on KA tonight.
It's hard to pick one thing to tell budding developers they have to learn but Big-O notation is definitely up there.
The follow up to that is understanding what you're counting and why, i.e. branches vs. statements vs. dereferences vs. logical I/Os vs. physical I/Os...
Odd choice to start with the iterative factorial before moving on to the recursive one. Usually it's the other way around, since the iterative algorithm is faster and uses less memory.
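For reference, the two versions being compared look roughly like this (a minimal sketch, not the course's exact code):

    # Iterative: one loop, constant extra memory.
    def factorial_iterative(n):
        result = 1
        for i in range(2, n + 1):
            result *= i
        return result

    # Recursive: closer to the mathematical definition,
    # but each call adds a stack frame, so it uses O(n) memory.
    def factorial_recursive(n):
        if n <= 1:
            return 1
        return n * factorial_recursive(n - 1)

    print(factorial_iterative(5), factorial_recursive(5))  # 120 120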
The general idea is that something takes O(f(n)) time if it takes at most C·f(n) time for some constant C and all but finitely many values of n. The 'all but finitely many values' is what makes this definition 'asymptotic'. Basically, 'O(f(n))' ignores constant factors and the behaviour at 'small' n (i.e. small inputs); the reasoning behind this is that an algorithm in O(f(n)) is faster than any algorithm not in O(f(n)), provided you make the input big enough.
The little o, big Omega, and big Theta are just small variations on this, which won't be too hard to understand if you get the general concept. Really, the distinction isn't too important usually; just know that O(f(n)) gives an upper bound, not necessarily the best possible upper bound. To understand the big-O notation better, it might help to have some basic knowledge of limits.
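For anyone who wants the precise version, these are the standard textbook definitions behind the paragraph above (not specific to this course):

    f(n) \in O(g(n))      \iff \exists\, C > 0,\ n_0 : f(n) \le C \cdot g(n) \text{ for all } n \ge n_0
    f(n) \in \Omega(g(n)) \iff \exists\, C > 0,\ n_0 : f(n) \ge C \cdot g(n) \text{ for all } n \ge n_0
    f(n) \in \Theta(g(n)) \iff f(n) \in O(g(n)) \text{ and } f(n) \in \Omega(g(n))

Little o and little omega are the strict versions: the corresponding inequality must hold for every constant C > 0, not just for some C.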
The information made intuitive sense to me. I just couldn't apply what was read directly to the exercises. It felt as though something crucial had been omitted. That something turns out to be calculus.
It is really sad that people still start the discussion about algorithms by describing them as a sequence of actions or operations to accomplish a task. How to get to the airport is not an algorithm; how to cook food is not an algorithm.
While I agree "following instructions" !== "algorithm", it does serve as a decent bit of context to people who are COMPLETELY unfamiliar with computing let alone programming.
It provides them with incorrect context. If you really want to provide context, then start with long addition, which everyone understands, and tell them that it was the first algorithm they learned. Physical activities are not algorithms; algorithms are sequences of calculations on data, and there are many, many examples people already know from elementary mathematics that can serve as the proper context.
Data structures and algorithms are foundational topics in computer science. While they can seem daunting to beginning programmers, they become very important as you progress into writing more advanced programs. Also, the interview process for software engineers at most companies asks about them almost exclusively.
> they become very important as you progress into writing more advanced programs
As someone with a Computer Science degree, I can say that the only times I have ever used any of the algorithms I learned directly were when writing low-level C and Go. I'm willing to bet 90% of programmers... even those that write "advanced" programs, do not use them day to day.
Every algorithm and data structure worth anything was abstracted out into easy-to-use libraries years ago.
Any time I make the effort to learn more or brush up on CS fundamentals I end up forgetting most of it in short order because I so rarely use any of it at work.
Ditto mathematics past roughly Algebra 1. Every now and then I try to brush up on calculus or linear algebra or something, but it takes so much time to keep those skills up when you're not using them at work. Like keeping up foreign language skills while living somewhere you rarely encounter native speakers.
I'm a self-taught software engineer and learning about data structures and algorithms has made a huge difference in my ability to deliver solid code. The basics of graph data structures and algorithms (DAGs, topological sort, BFS, connected components, etc) seem especially helpful, and it's good to have a constant awareness of the time complexity of the code I'm writing.
I'm not talking about implementing my own sorting algorithms every time I need to sort things; I'm talking about recognizing that a particular problem can be efficiently represented by data structure X and solved with algorithm Y. Hopefully I'm writing none of this code myself, but a programmer who doesn't in theory know how it all works is never going to be able to compose pre-written code into an optimal solution. I know, because I've been that programmer.
I don't know what you mean by "advanced" programs. I guess I fall within your 10% by definition, since I can anecdotally say that having and using this knowledge has made a big difference for me. I would say that I use this stuff 5% or less of my time, but for that 5% it can make the difference between run time of under a second vs. multiple days, and between a brittle, defective solution and a rock solid one.
Not to mention that it's extremely handy for getting hired in the first place, and it's just plain fun.
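To give a flavour of that toolbox, here's a minimal BFS sketch in Python (the adjacency-list dict and the function name are just illustrative):

    from collections import deque

    # graph: dict mapping each node to a list of its neighbours.
    # Returns the shortest distance (in edges) from start to each reachable node.
    def bfs_distances(graph, start):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        return dist

    g = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}
    print(bfs_distances(g, 'a'))  # {'a': 0, 'b': 1, 'c': 1, 'd': 2}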
Sure, they'll already exist, but you need to know the performance characteristics of each, so you know when to use one algorithm/data structure vs the alternatives.
I suspect that's untrue for many modern areas of development that are still "worth" something. When it comes to road routing I don't know of any .Net libraries providing modern functionality (hierarchy pre-processing, landmarks, arc flags etc...).
There's a lot more to data structures and algorithms than sorting. For example, just knowing that bloom filters or interval trees exist opens up a ton of options, even if you don't implement them yourself.
For that matter, the choice of sorting algorithm can have security implications. QuickSort on user-facing data can easily become a DoS vulnerability.
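To make that concrete, here's a minimal sketch (illustrative only): a quicksort that always picks the first element as its pivot averages O(n log n) comparisons on random input, but on already-sorted or adversarially ordered input every partition is maximally unbalanced and it degrades to O(n^2).

    # Naive quicksort: always uses the first element as the pivot.
    def naive_quicksort(items):
        if len(items) <= 1:
            return items
        pivot, rest = items[0], items[1:]
        smaller = [x for x in rest if x < pivot]
        larger = [x for x in rest if x >= pivot]
        return naive_quicksort(smaller) + [pivot] + naive_quicksort(larger)

    print(naive_quicksort([3, 1, 2]))  # [1, 2, 3]
    # Feeding it a long already-sorted list triggers the quadratic case
    # (and, written recursively like this, it would also hit Python's recursion limit).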
Let's say you are writing a Python program, and you want to simply check if an element is in a list. So, you build a list and then check if 'd' is in the list:

    mylist = ['a', 'b', 'c', 'd']
    if 'd' in mylist:
        ...

This works just fine; however, the time it takes to find the item grows in proportion to the number of items in the list. If your list grows to 1000 items and the item you are searching for is positioned last, Python will make 1000 comparisons. This is known as O(N).
Now, how does the performance compare when using a set data structure?

    myset = {'a', 'b', 'c', 'd'}
    if 'd' in myset:
        ...

Under the hood, the set stores its elements in what's known as a hash table. The time it takes to check whether an item is (or isn't) in the set does not grow with the number of items in the set; it's constant on average: O(1).
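A quick way to see the difference yourself (timings will vary by machine; this is just an illustration):

    import timeit

    n = 100_000
    big_list = list(range(n))
    big_set = set(big_list)

    # Worst case for the list: the element we look for is at the end.
    print(timeit.timeit(lambda: (n - 1) in big_list, number=200))  # O(N) per lookup
    print(timeit.timeit(lambda: (n - 1) in big_set, number=200))   # O(1) per lookup on average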
In all seriousness, algorithms and data structures are essential to computer science theory. When trying to get a job as a software engineer, technical interviews will mainly be problems about algorithms and data structures.
Data structures are ways of organizing and storing data.
Algorithms are recipes or instructions.
Usually it's learning algorithmic techniques for solving various computational problems and implementing algorithmic coding problems in a programming language.
At some point the performance of your code (in time and or space) will really matter. Solid knowledge of algorithms and data structures are essential to getting that right.
Early in my career, actually before my career, I taught myself the available language on our department's mini-computer (don't remember the language or the computer), so I could automate parts of my and my colleagues' job.
I had to sort something, and I innocently "invented" bubble sort to do it. The system administrator visited me the day I first ran it, and told me to stop doing that.
You will learn more about this, and that will make it easier to join the tech community.
--------------------------------------------
Also, to add some signal to my noise, Tom Cormen is apparently one of the designers of the course. He co-wrote the gigantic encyclopedia of algorithms (CLRS), as well as a shorter, more entry-level book on the subject (Algorithms Unlocked). You'd be in good hands to learn the topic.
And the topic is one that teaches you how to reason about your programs in a way that works across languages and disciplines.
Exposure to more solution techniques for programming problems that you might run into, most likely in an interview session.
A lot of people will say to learn this as it is the "core" of programming, but in industry practice I've seen pure functions (functions that return the same result when the same parameters are passed in) becoming more commonplace than trickier functions that mutate a variable differently during each iteration, which makes it harder to reason about program state. Algorithmic problem solutions usually involve non-pure functions that are not intuitive until you've seen them before. Front-end UI is better built using several pure functions.
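A tiny illustration of the distinction (made-up names, Python for brevity):

    # Impure: mutates state outside itself, so the result depends on
    # how many times it has already been called.
    totals = []
    def add_to_totals(x):
        totals.append(x)
        return sum(totals)

    # Pure: the same arguments always give the same result,
    # and nothing outside the function is modified.
    def with_total(previous, x):
        new_values = previous + [x]
        return new_values, sum(new_values)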
Why Python? Any language with functions will do. I mean, just create a Java class with all public static functions if you want it to work like Python (global functions). It's really language agnostic. Your answer will be a number, a string, or a list of things. All languages can do that.
I'm making the explicit claim that Python is no better than any other language for implementing algorithms. HN, please prove me wrong in an objective way so we may all learn.
Significant whitespace is great for technical interviews - you don't need to worry about matching brackets and whatnot. No brackets also means you have some more vertical space to work with on the whiteboard.
Beyond that, if your interviewer is fluent in python, list comprehensions can greatly shorten the amount of boilerplate you have to write.
Amusingly, significant whitespace is the one reason I don't like using Python in whiteboard interviews - my handwriting is far from excellent on a board, and I don't want any ambiguity when reading my control flow.
I'll definitely second the list comprehension point, though. Between that and pleasant string support, a lot of standard interview answers are maybe 50% as long in python as Java. Not easier, necessarily, but faster to write and cleaner to read.
Indentation is a pretty good thing to practice for whiteboards. A lot of people have a tendency to waste more and more space on the left as they go. It's a pretty easy thing to fix, just do some questions and have someone there to correct you whenever you start doing it.
Even if you aren't using python, it will give you more room to work (the other part of this is divide the board before you start).
Yeah, I definitely suffer from that rightward drift, which I suppose I should work on. Somehow I never thought to divide the board, that's an easy improvement!
I'm not making a claim about Python being "best" for algorithms.
The course uses JavaScript. I recommend re-implementing the algorithms covered in the course in another language that you know well. It helped me really understand what's going on.
One thing that I like about Python implementations of algorithms is less thinking about types. When I implement my sort algorithm, I just need to say > or < and assume the caller has given those terms meaning.
Not a Java expert, can you do something similar with Java? Maybe with generics?
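Roughly what's meant, as a sketch: the same untyped Python code sorts anything whose elements support `<`, whether ints, strings, or a custom class defining __lt__.

    # Insertion sort that relies only on the < operator of the elements.
    def insertion_sort(items):
        for i in range(1, len(items)):
            current = items[i]
            j = i - 1
            while j >= 0 and current < items[j]:
                items[j + 1] = items[j]
                j -= 1
            items[j + 1] = current
        return items

    print(insertion_sort([3, 1, 2]))           # [1, 2, 3]
    print(insertion_sort(["pear", "apple"]))   # ['apple', 'pear']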
I'm no expert but due to the language I think you need to specify types somewhere. It just offloads it to your test code as opposed to the algorithms.
With Python, your algorithm will work with the primitives without specifying types either in the algorithm or the test code. You only need to specify a type, i.e. maybe a new class and definitely instantiation, if you want to use something more abstract.
This would be true of Ruby and other dynamically typed languages. For beginner algorithms courses at least, a dynamically typed language makes more sense, since the algorithms taught are only clouded by typing. Complexity theory still applies and whatnot.
Maybe when learning algorithms that do heavy crunching, a lower-level language, probably with static types, may be used, but by that point you're not going to mind the type info.
Yes, generics in Java or templates in C++ is how any algorithms library would handle this. Any type passed into such a function needs to possess the required operators and functions or it will fail to compile/run.
You might be complaining about students being taught Python and it becoming their "blub" language. We all start somewhere, and you can indeed go far with Python in some fields. It's popular with scientists for instance.
I agree with this. For me, I'm using the language I'm currently using at work (C#) to learn them better. If I'm going to be tested (in tech interviews) in the language I know, my bet is that it'd be better to know the C# way of doing it and how to implement it using C# conventions. I haven't interviewed yet, but this is just the hunch I have if the goal is to do well in a tech interview.