
> nobody at Microsoft ever understood how the code worked (much less still understood it), and that most of the code was completely uncommented, we simply couldn't figure out why the collision detector was not working. Heck, we couldn't even find the collision detector!

This continues to be one of my pet peeves, particularly with code samples and a lot of what is posted on GitHub, even major libraries. Almost no comments, background or guidance on the intent and structure of the code.

No, I am not suggesting that something like this is necessary:

    // Iterate through all elements
    for(int i = 0; i < count; i++)
    {
        // Check that velocity isn't above threshold
        if(velocity[i] > THRESHOLD)
        {
            // Limit velocity
            velocity[i] = THRESHOLD;
            ...
That's ridiculous. However, something like this is useful:

    // Bounds-check velocities
    for(int i = 0; i < count; i++)
    {
        if(velocity[i] > THRESHOLD)
        {
            velocity[i] = THRESHOLD;
            ...
Anyhow, dumb example I pulled out of thin air.

I've heard some say "I just write self-documenting code". That's a myth but for the simplest of structures. Any non-trivial piece of work is far from being self-documenting. Code is self-documenting for the guy who wrote it. I guarantee you that anyone else reading it has to reconstruct a stack in their head to understand what the hell is going on. That's not self-documentation. I shouldn't have to think to understand what a chunk-o-code is doing. The same for functions and/or methods.

The myth of self-documenting code is easy to demonstrate if I show you a piece of code in a language you don't know. I am going to assume that most programmers these days don't know assembler, Forth or Lisp.

    (defun GetPolylineEndEntities ( plename / plends cvcenter cvsize oldcmdecho pt1 pt2 endss outlist)
        (setq plends (GetPolylineEnds plename))
        (setq cvcenter (getvar "viewctr")
            cvsize   (getvar "viewsize")
            oldcmdecho (getvar "CMDECHO")
        )    
        (setvar "CMDECHO" 0)

        (foreach point plends
         (progn
            (setq pt1 (add2d point '(-0.0125 -0.0125)))
            (setq pt2 (add2d pt1 '(0.025 0.025)))
            (command "zoom" "c" point 2)
            (setq endss (ssdel plename (ssget "C" pt2 pt1)))
            (setq outlist (append outlist (list (ssname endss 0))))
         )
        )
        (command "zoom" "c" cvcenter cvsize)
        (setvar "CMDECHO" oldcmdecho)
        outlist
    )
I wrote this twenty years ago. Even if you understand Lisp you'd have to think it through. However, this is not how I wrote it. This is what I actually wrote:

    ;=====================================================================================================
    ; GetPolylineEndEntities
    ;
    ; Argument: Polyline entity name
    ; Return: Two element list containing the first (if any) entity found at the end of the polyline.
    ;         The polyline itself is excluded.
    ;         If nothing is found at a particular end, that element in the list is set to nil.
    ;
    (defun GetPolylineEndEntities ( plename / plends cvcenter cvsize oldcmdecho pt1 pt2 endss outlist)
        (setq plends (GetPolylineEnds plename))    ;Get the endpoints

        (setq cvcenter (getvar "viewctr")
            cvsize   (getvar "viewsize")
            oldcmdecho (getvar "CMDECHO")
        )    
        (setvar "CMDECHO" 0)

        (foreach point plends
         (progn
            ;Examine what connects at each end
            (setq pt1 (add2d point '(-0.0125 -0.0125)))
            (setq pt2 (add2d pt1 '(0.025 0.025)))

            ;Zoom to the end being analyzed to have better selection accuracy
            ; **** Have to figure out a way to do this without zooming ****
            (command "zoom" "c" point 2)
        
            ;Eliminate the original cable from the resulting selection set
            (setq endss (ssdel plename (ssget "C" pt2 pt1)))

            ;Add the first entity found to the output list
            (setq outlist (append outlist (list (ssname endss 0))))
         )
        )
        (command "zoom" "c" cvcenter cvsize)
        (setvar "CMDECHO" oldcmdecho)
        outlist
    )
Even if you don't know Lisp you now have an idea of what this code is doing. My style has changed over the years. This isn't my best example, but it is here to drive a point home.

The use of an unfamiliar language serves to illustrate the point that the idea of self-documenting code is, again, a myth. I wrote that code myself and without the comments I'd have to mentally reconstruct every step to even begin to understand what's going on and what the intent was. I haven't touched Lisp in quite some time.

I've looked through so much code on GitHub without a single comment that it makes me wonder if this is what is being taught in schools these days. Accurate in-code documentation is, as far as I am concerned, part and parcel of becoming a professional programmer. It recognizes that the work represents a huge investment in time, money and intellectual effort, and it ensures that this effort and expense doesn't have to be duplicated in order to maintain, evolve or migrate the product, as the Microsoft example clearly demonstrates.




Assembler makes the point even more apparent:

    MSG_LOOP_PostMessage:
        mov     a, MSG_Head    
        cjne    a, #MSG_BUFFER_END - 2, mlpm0
        mov     a, #MSG_BUFFER
        sjmp    mlpm1
    mlpm0:
        inc     a
        inc     a
    mlpm1:
        cjne    a, MSG_Tail, mlpm2
        clr     a
        ret
    mlpm2:
        mov     r0, MSG_Head
        mov     @r0, MSG_Code
        inc     r0
        mov     @r0, MSG_Parameter
        mov     MSG_Head, a
        mov     a, #1
        ret
I wrote that about fifteen years ago. Of course, the routine's label tells you something: "MSG_LOOP_PostMessage". It must post a message to a message buffer. The rest is gibberish. What's the intent behind each and every block of code? Well, of course, that's not how I wrote it. This is what I wrote:

    ; POST MESSAGE --------------------------------------------------------------------------
    ; This routine posts a new message to the message loop buffer.
    ;
    ; The message code and parameter are written to the buffer only if buffer space is
    ; available.  This is determined by first looking at the difference between the
    ; head and tail pointers.  By design, we sacrifice one set of locations (two 
    ; bytes) in the buffer in order to create a gap between an advancing head pointer and
    ; a lagging tail pointer.  If a lot of messages are issued and not processed immediately
    ; the head pointer will quickly wrap around and threaten to collide with the tail
    ; pointer.  By sacrificing a set of locations we avoid having to keep a counter of
    ; unprocessed messages.  This, because there would be ambiguity when both head and
    ; tail pointers point to the same location: it could mean that there are no messages
    ; to process or that there's a buffer full of messages to process.  The two byte
    ; gap we are imposing between head and tail removes this ambiguity and makes it easy
    ; to determine the buffer empty and full conditions.
    ;
    ; If there's space for a new message it is stored and the head pointer advanced to
    ; the next available location.  However, if no space remains, the message is discarded
    ; and the head pointer will remain one message (two bytes) away from the tail pointer.
    ;
    ; Arguments
    ; ----------------------------
    ; MSG_Head                Message buffer head pointer
    ; MSG_Tail                Message buffer tail pointer
    ; MSG_Code                Message code
    ; MSG_Parameter        Message parameter
    ;
    ; Return
    ; ----------------------------
    ; ACC =  0 -> Message did not post
    ; ACC <> 0 -> Message posted
    ;
    ; Modified
    ; ----------------------------
    ; MSG_Head
    ; ACC, R0
    ;
    ; Preserved
    ; ----------------------------
    ; MSG_Tail
    ; MSG_Code
    ; MSG_Parameter
    ;
    MSG_LOOP_PostMessage:
        mov      a, MSG_Head                    ;Increment the head pointer by one message
        cjne     a, #MSG_BUFFER_END - 2, mlpm0  ;and parameter.
        mov      a, #MSG_BUFFER                 ;Need to wrap around.
        sjmp     mlpm1                          ;
    mlpm0:                                      ;
        inc      a                              ;No need to wrap around, just increment.
        inc      a                              ;
    mlpm1:
        cjne     a, MSG_Tail, mlpm2             ;Check for a buffer full condition.
        clr      a                              ;Flag that we did not post the message.
        ret                                     ;Exit if it is.
    mlpm2:
        ;The buffer isn't full, we can store the new message
        mov      r0, MSG_Head                   ;Store message
        mov      @r0, MSG_Code
        inc      r0
        mov      @r0, MSG_Parameter             ;Store parameter
        
        ;Now set the head pointer to the next available message location.
        mov      MSG_Head, a                    ;The accumulator already has the next "head" location
                                                ;there's no need to execute the same logic again.
        mov      a, #1                          ;Flag that the message was posted
        ret                                     ;and exit

Now you don't have to know assembler to understand this code. A couple of years later I had to re-write this in C. It was incredibly easy to do because of the exhaustive in-code documentation. This is what came out:

    // POST MESSAGE --------------------------------------------------------------------------
    // This routine posts a new message to the message loop buffer.
    //
    // The message code and parameter are written to the buffer only if buffer space is
    // available.  This is determined by first looking at the difference between the
    // head and tail pointers.  By design, we sacrifice one set of locations (two 
    // bytes) in the buffer in order to create a gap between an advancing head pointer and
    // a lagging tail pointer.  If a lot of messages are issued and not processed immediately
    // the head pointer will quickly wrap around and threaten to collide with the tail
    // pointer.  By sacrificing a set of locations we avoid having to keep a counter of
    // unprocessed messages.  This, because there would be ambiguity when both head and
    // tail pointers point to the same location: it could mean that there are no messages
    // to process or that there's a buffer full of messages to process.  The two byte
    // gap we are imposing between head and tail removes this ambiguity and makes it easy
    // to determine the buffer empty and full conditions.
    //
    // If there's space for a new message it is stored and the head pointer advanced to
    // the next available location.  However, if no space remains, the message is discarded
    // and the head pointer will remain one message (two bytes) away from the tail pointer.
    //
    //
    // Return
    // ----------------------------
    // 0 = Message did not post
    // 1 = Message posted
    //
    // 
    U8 msg_loop_post_message(U8 message, U16 parameter)
    {
        if(msg_loop.count == MESSAGE_LOOP_SIZE)
        {
            return(0);    // Can't post because buffer is full
        }
        else
        {
            msg_loop.item[msg_loop.head].message = message;      // Post message
            msg_loop.item[msg_loop.head].parameter = parameter;  //
            msg_loop.count++;                                    // Update message count

            if( msg_loop.head == (MESSAGE_LOOP_SIZE - 1))        // Wrap around?
            {
                msg_loop.head = 0;                               // Yes.
            }
            else
            {
                msg_loop.head++;                                 // No
            }
            return(1);
        }
    }
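(As an aside: the two-byte-gap scheme described in the comments could also have been carried straight into C. Here is a rough, hypothetical sketch of that variant, with made-up names and the parameter kept to a single byte as in the assembler; this is not the code that actually shipped:)

    typedef unsigned char U8;                    // Same convention as the code above

    #define MSG_BUFFER_SIZE 16                   // Bytes: 8 two-byte messages, one slot sacrificed as the gap

    static U8 msg_buffer[MSG_BUFFER_SIZE];
    static U8 msg_head = 0;                      // Offset of the next free slot
    static U8 msg_tail = 0;                      // Offset of the next message to process

    U8 msg_post(U8 code, U8 parameter)
    {
        U8 next = msg_head + 2;                  // Advance by one message (two bytes)
        if(next == MSG_BUFFER_SIZE)
        {
            next = 0;                            // Wrap around
        }
        if(next == msg_tail)
        {
            return(0);                           // Buffer full: discard the message
        }
        msg_buffer[msg_head] = code;             // Store message
        msg_buffer[msg_head + 1] = parameter;    // Store parameter
        msg_head = next;                         // Publish the new head location
        return(1);
    }

With head and tail kept one message apart, "head == tail" can only ever mean "empty", which is exactly the ambiguity the original comments describe avoiding.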


The posted C port is definitely readable, but I feel like the translation from assembler might have been too direct. Originally head was a pointer and it made sense to have conditional jumps, but in the C code head is an offset starting at 0. Why keep the entire if structure when you can simply have

    msg_loop.head = (msg_loop.head + 1) % MESSAGE_LOOP_SIZE;  // Advance head and wrap
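
With that, the whole routine collapses to something like this (same names as above, just a sketch, untested):

    U8 msg_loop_post_message(U8 message, U16 parameter)
    {
        if(msg_loop.count == MESSAGE_LOOP_SIZE)
        {
            return(0);                                            // Can't post because buffer is full
        }

        msg_loop.item[msg_loop.head].message = message;           // Post message
        msg_loop.item[msg_loop.head].parameter = parameter;       //
        msg_loop.count++;                                         // Update message count
        msg_loop.head = (msg_loop.head + 1) % MESSAGE_LOOP_SIZE;  // Advance head and wrap
        return(1);
    }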


You know, it's been about fifteen years.

I remember I had to convert this to C in a hurry because new features had to be added. Doing it in assembler was just about unthinkable. The entire project consists of nearly 100 assembler files and resulted in about 90 files once translated into C.

This was a marathon run, and once the entire project was translated and working, the focus shifted narrowly toward adding functionality rather than optimizing the code.

This code was running on an 8051-derivative. Performance was important. You are always counting clock cycles on these systems. I think that if you look at the resulting machine code, with the relevant variables mapped to the data memory space, the modulo statement doesn't really do any better than a spelled-out conditional statement. In fact, it could be a couple of clock cycles slower (just guessing).

Again, it's been a long time. I can totally see looking at the generated assembly and saying something like "That's as fast as it is going to get. Move on!".

We all make less-than-elegant code at one point or another. Hopefully not too many times. :-)

http://www.keil.com/support/man/docs/is51/

https://www.silabs.com/Support%20Documents/TechnicalDocs/C80...


> the modulo statement doesn't really do any better than a spelled-out conditional statement

Actually, on modern processors the modulo would probably be more efficient than the conditional due to cool stuff they do like pipelining. Pipelining is basically where the processor executes instructions ahead of time (sort of in pseudo-parallel), so when there's a branch there's a fair chance the results of that speculative work have to be thrown out.

One of the techniques used to get around this is branch prediction[1], where the processor initially guesses whether a branch will be taken and then adjusts the guess depending on whether the branch actually was taken. This works particularly well for for-loops and the like.

Another development on the ARM side (and MIPS too, I think) is instructions with conditional execution embedded in them.[2]
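
To make that concrete, here is a hypothetical C sketch of the two styles of wrap (not code from upthread). With a constant power-of-two size, most compilers turn the modulo into a mask, and many will turn the small conditional into a branch-free select as well:

    #define SIZE 16u                      // Assume a constant, power-of-two buffer size

    // Conditional wrap: correct, but contains a branch the CPU has to predict.
    unsigned advance_branch(unsigned head)
    {
        if(head == SIZE - 1u)
            return 0u;
        return head + 1u;
    }

    // Modulo wrap: with SIZE a constant power of two this typically compiles to an
    // add and a mask, with no branch at all, so there is nothing to mispredict.
    unsigned advance_modulo(unsigned head)
    {
        return (head + 1u) % SIZE;
    }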

[1] http://en.wikipedia.org/wiki/Branch_predictor

[2] http://en.wikipedia.org/wiki/ARM_architecture#Conditional_ex...


Absolutely true. I was talking about the 8051, which is the context of the code I posted.


>That's ridiculous. However, something like this is useful:

> // Bounds-check velocities

No, that is an example of a comment marking a region that should have been a separate method/function.


Not really.

If you throw methods and functions at everything all you are doing is adding the processing overhead of the entry and exit from the method or function. These are not magical entities. There's a time, a place and cost to using them.

Having come up from assembly and, in general, low level coding, one becomes very aware of what is being created behind the scenes. Unless something like this contrived bounds-check test will be used multiple times across a module or modules there's no reason whatsoever to add the overhead of entering and exiting a function for a simple couple of if/else-if/else statements.

I see this all the time. Everything has to be an object and everything has to be a class with a pile of properties and methods. No it doesn't. Massive projects --critical projects-- have been done over the years without any of that. Be careful not to engage in creating a monument to a coding paradigm rather than a tight, fast, practical and sensible solution to a problem.


Then tell your compiler to inline it - but don't bother doing that until you have measured it and found an actual, measurable impact. Readability and rewritability almost always dominate the kind of unguided micro-optimizations you seem to like.
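
For example (a hypothetical sketch, not anyone's actual code), the bounds-check from upthread can sit in a static inline helper; with optimization on, a modern compiler will typically fold it into the caller with no call overhead:

    #define THRESHOLD 10.0f                       // Made-up value for the sketch

    static inline void clamp_velocities(float velocity[], int count)
    {
        for(int i = 0; i < count; i++)
        {
            if(velocity[i] > THRESHOLD)
            {
                velocity[i] = THRESHOLD;          // Limit velocity
            }
        }
    }

    void update_physics(float velocity[], int count)
    {
        clamp_velocities(velocity, count);        // Typically inlined: no call overhead
    }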

That doesn't mean abstracting for the sake of abstraction; it means choosing the right abstraction. Writing code so that it is easy to reach the right abstraction helps a lot.



