When learning code out of a book for a new language, one trick I found that worked really well was to read the code, then close the book and try to type as much of it from memory as I could. I would futz with it for several minutes exploring various ways of making it break, seeing if I could 1) predict how I was breaking it, and 2) use the error messages to fix my mistakes.
Then I would open up the book again and compare it to what I had typed, examine the differences between them, and see if/how they explained the errors I was getting.
I'm a high school computer science teacher, and this is a WONDERFUL idea. I'm totally going to steal it and create a few assignments based on the concept.
Please do teach them the value of data representation. I don't think this is obvious but it can make the difference between having to write a mountain of code or finding a simple solution.
Super-simple silly example: You have to write a program to work with four colors: Red, Green, Blue, Black. A naive representation will use strings: "red", "green", "blue", "black". A little smarter would be an enum type where red=0, green=1, blue=2, black=3. And, depending on context, it might even make sense to use individual bits to identify each color: red=01h, green=02h, blue=04h, black=08h. And, if the goal is to extend to other colors, it might make sense to represent them as RGB vectors.
You get the idea. I find that a lot of newbie programmers are too focused on the mechanics of writing code. They forget (or don't know) that investing some time in optimizing the representation of the data or problem they are trying to solve could make a world of difference.
I think it is important to learn enums, but this is a bad example. You really should use the built-in types when they're available. Color is a particular example (in both .NET and Java) where you should never reinvent the wheel.
In general, teaching enums is very important and something I've not really seen schools do (although their "learn to program" programs are typically 101-level material for newbies — I've never seen a program that really builds on the basics).
While I generally agree with what you are saying, it is very important to understand that built-in types are not magical. This is where coming from a low-level embedded C or assembly background can be very useful.
As an example, there are a number of sites showing the performance hit you take if you use some of the NS types in Objective-C. If I remember correctly, in some cases you are talking about NS types running 400 times slower than alternative code.
In my mind, the question of data representation could also include this very choice: Do I use an NSArray or do I do it "by hand", allocate memory and use a "simple" array?
OK, I pulled colors as an example out of thin air but I'll see if I can make it work.
Let's say you have to write a routine that does something based on a color as the input. You have a few choices in terms of how to represent the colors:
- The name of the color in a string
- A typedef enum for your colors (integers)
- A color-per-bit scheme (U8, U16, U32)
- Channel-per-bit scheme (U8)
- 4 or 6 bit packed RGB values (U16 or U32)
- 8 bit packed RGB values (U32)
- 8 bit unpacked RGB values (struct of three U8)
- 16 or 32 bit unpacked RGB values (struct of three U16 or U32)
- Unpacked RGB floats (struct of three floats)
- and more...
I won't go into the implications of each of the above. Some of it is highly dependent on both the system and the objectives of the work being done.
Say, for example, that you choose to use the names of colors stored in strings as your color representation. Now you have to compare strings in order to identify the colors:
if (strcmp(input_color, "red") == 0)
{
    // Do something with red
}
else if (strcmp(input_color, "green") == 0)
{
    // Do something with green
}
else if (strcmp(input_color, "blue") == 0)
{
    // Do something with blue
}
... etc
Regardless of language, the strings need to be compared character by character. Even if a language or OO framework lets you write something like if (string1 == string2), keep in mind that what goes on behind the scenes is pretty much exactly what strcmp() has to do. Which means that the above is, at the very least, slow.
And, of course, it isn't very portable. What happens if the input has to be in German or Japanese?
The typedef enum representation gives you the ability to use a far more efficient construct to identify your colors:
switch (color)
{
    case COLOR_RED:
        // Do something with red
        break;
    case COLOR_GREEN:
        // Do something with green
        break;
    case COLOR_BLUE:
        // Do something with blue
        break;
    ... etc
}
This is much, much faster. At worst it compiles to an if/else-if structure that only compares integers — a single machine language instruction per comparison — and compilers will often turn a dense switch into a jump table. Fast and clean, and language-portable by means of the proper text-to-integer function somewhere to deal with different languages.
If you are on an embedded system that can do bit testing in machine language it might make sense to encode one color per bit or one color channel per bit. For example, in some embedded C dialects you might be able to do something like this:
if (color.0)       // Select and test bit 0
{
    // This is red
}
else if (color.1)  // Select and test bit 1
{
    // This is green
}
... etc
At this level the advantages of doing this are tightly linked to the platform and the goals of the application.
If, for example, one needs to be able to expand the available range of color inputs beyond what can be described with simple words, a discrete RGB representation might be the best choice. This is also the case if you wanted to future-proof the program and be ready for when more colors arrive.
Here you have several choices, two of which are to represent each channel with an 8 bit value or choose floats instead.
The 8 bit values can be packed nicely into a U32, making it very efficient.
You could also create a struct to facilitate access to the components and let the compiler optimize for you.
The float example is interesting because the conversion from float to whatever (if necessary) can be of any bit width. So, for example, if the color needs to ultimately be mapped to an 8-bit-per-channel display device you can translate from float to 8 bits on output. All of your intermediate math and color manipulation would be done in full-resolution floats which means that you are not going to accumulate errors. This, for example, is important if you are applying FIR filters to calculate missing color sample data from certain video data formats.
Packing has its issues as well. If you are dealing with little-endian vs. big-endian systems there might be overhead associated with unpacking and possibly rearranging a packed RGB value. If you are dealing with processing colors at a massive scale this can have performance and even power consumption implications.
I may have been lucky in that my very first CS professor was hell-bent on teaching the importance of thinking deeply about data representation BEFORE thinking about code. He'd repeat this mantra till you were sick of hearing it. Years later I'd learn to appreciate this bit of wisdom in more ways than one.
I think it depends on a lot more of the circumstances than this. For example, in many languages, you can intern the strings so that a string equality test is just a single machine instruction. And if these color representations are crossing some interface that needs to be kept stable, it's a lot easier to add new colors if what's crossing is "red" or "#ff0000" than if it's "2". And it may be that what you're doing with the colors is just generating HTML, rather than doing multi-way switches, in which case the enum implementation has no advantage over the string representation; it just increases code duplication.
The probably more important consideration is that with an enum, your compiler can catch misspellings. Depending on your runtime environment, this can be a huge killer advantage. In particular, if your runtime environment can't do much beyond blink an LED to report errors, compile-time checking is really really important.
I think we are slicing this a little too thin. My original point is that it is important to understand that the choices made when representing data can be important. My off-the-hip example was not meant to be definitive.
Color.Bleu will give a compiler error. "Bleu" will not. And is "White" a valid color? No idea, need to check the docs. Try Color.White and the compiler or editor will tell you.
Another +1. When I've used this method I've noticed that the tendency at first is to try to just remember the code and copy it out of your memory. There comes a point, though, when you don't remember it well enough to "copy from memory" and this naturally segues into a focus on the logic and syntax of the code so you end up reconstructing working code without fully remembering the original.
One thing that has really sped up my learning over the past few months on Code Academy -- complete a lesson, then in another tab open their scratch pad feature and try to retype the code not only from memory, but also while thinking through the logic.
It has really been effective for me, as I feel like I now have a pretty good (beginner) grasp of JS and Python.
Interesting concept. Especially if you consider that better memory for plausible situations is one of the differentiators between experts and beginners. Be it in programming or chess. Because experts have developed mental shorthands for common situations.
This is exactly how I handle learning languages/frameworks. I wrote a little something on it as well [0]. I value thorough comprehension over "how fast can I finish this book?". I try to get my hands as dirty as possible. I want to see the warts of this thing I plan on using to build projects. Trying to rebuild book examples from memory forces you to consider the requirements and the end goal. This was particularly useful when I first learned about linked lists and queues in C.
Reimplementing standard library functions is great for Haskell too. The biggest of the "big breakthrough" experiences I had was from trying to reimplement the standard monadic State datatype.