> nobody at Microsoft ever understood how the code worked (much less still understood it), and that most of the code was completely uncommented, we simply couldn't figure out why the collision detector was not working. Heck, we couldn't even find the collision detector!
This continues to be one of my pet peeves, particularly with code samples and a lot of what is posted on GitHub, even major libraries: almost no comments, background, or guidance on the intent and structure of the code.
No, I am not suggesting that something like this is necessary:
// Iterate through all elements
for(int i = 0; i < count; i++)
{
    // Check that velocity isn't above threshold
    if(velocity[i] > THRESHOLD)
    {
        // Limit velocity
        velocity[i] = THRESHOLD;
...
That's ridiculous. However, a comment that captures intent, rather than restating mechanics, is genuinely useful.
I've heard some say "I just write self-documenting code". That's a myth for all but the simplest of structures. Any non-trivial piece of work is far from self-documenting. Code is self-documenting for the guy who wrote it. I guarantee you that anyone else reading it has to reconstruct a stack in their head to understand what the hell is going on. That's not self-documentation. I shouldn't have to think to understand what a chunk-o-code is doing. The same goes for functions and methods.
The myth of self-documenting code is easy to demonstrate if I show you a piece of code in a language you don't know. I am going to assume that most programmers these days don't know assembler, Forth or Lisp.
I wrote the routine below twenty years ago. Strip the comments away and, even if you understand Lisp, you'd have to think it through step by step. With them, this is what I actually wrote:
;=====================================================================================================
; GetPolylineEndEntities
;
; Argument: Polyline entity name
; Return: Two-element list containing the first (if any) entity found at each end of the polyline.
; The polyline itself is excluded.
; If nothing is found at a particular end, that element in the list is set to nil.
;
(defun GetPolylineEndEntities ( plename / plends cvcenter cvsize oldcmdecho pt1 pt2 endss outlist)
  (setq plends (GetPolylineEnds plename)) ;Get the endpoints
  (setq cvcenter (getvar "viewctr")
        cvsize (getvar "viewsize")
        oldcmdecho (getvar "CMDECHO")
  )
  (setvar "CMDECHO" 0)
  (foreach point plends
    (progn
      ;Examine what connects at each end
      (setq pt1 (add2d point '(-0.0125 -0.0125)))
      (setq pt2 (add2d pt1 '(0.025 0.025)))
      ;Zoom to the end being analyzed to have better selection accuracy
      ; **** Have to figure out a way to do this without zooming ****
      (command "zoom" "c" point 2)
      ;Eliminate the original cable from the resulting selection set
      (setq endss (ssdel plename (ssget "C" pt2 pt1)))
      ;Add the first entity found to the output list
      (setq outlist (append outlist (list (ssname endss 0))))
    )
  )
  (command "zoom" "c" cvcenter cvsize)
  (setvar "CMDECHO" oldcmdecho)
  outlist
)
Even if you don't know Lisp you now have an idea of what this code is doing. My style has changed over the years. This isn't my best example, but it is here to drive a point home.
The use of an unfamiliar language serves to illustrate the point that the idea of self-documenting code is, again, a myth. I wrote that code myself and without the comments I'd have to mentally reconstruct every step to even begin to understand what's going on and what the intent was. I haven't touched Lisp in quite some time.
I've looked through so much code on GitHub without a single comment that it makes me wonder if this is what is being taught in schools these days. Accurate in-code documentation is, as far as I am concerned, part and parcel of becoming a professional programmer. It recognizes that the work represents a huge investment in time, money and intellectual effort, and it ensures that this effort and expense doesn't have to be duplicated in order to maintain, evolve or migrate the product, as the Microsoft example clearly demonstrates.
MSG_LOOP_PostMessage:
    mov a, MSG_Head
    cjne a, #MSG_BUFFER_END - 2, mlpm0
    mov a, #MSG_BUFFER
    sjmp mlpm1
mlpm0:
    inc a
    inc a
mlpm1:
    cjne a, MSG_Tail, mlpm2
    clr a
    ret
mlpm2:
    mov r0, MSG_Head
    mov @r0, MSG_Code
    inc r0
    mov @r0, MSG_Parameter
    mov MSG_Head, a
    mov a, #1
    ret
I wrote that about fifteen years ago. Of course, the routine's label tells you something: "MSG_LOOP_PostMessage". It must post a message to a message buffer. The rest is gibberish. What's the intent behind each and every block of code? Well, of course, that's not how I wrote it. This is what I wrote:
; POST MESSAGE --------------------------------------------------------------------------
; This routine posts a new message to the message loop buffer.
;
; The message code and parameter are written to the buffer only if buffer space is
; available. This is determined by first looking at the difference between the
; head and tail pointers. By design, we sacrifice one set of locations (two
; bytes) in the buffer in order to create a gap between an advancing head pointer and
; a lagging tail pointer. If a lot of messages are issued and not processed immediately
; the head pointer will quickly wrap around and threaten to collide with the tail
; pointer. By sacrificing a set of locations we avoid having to keep a counter of
; unprocessed messages. This, because there would be ambiguity when both head and
; tail pointers point to the same location: it could mean that there are no messages
; to process or that there's a buffer full of messages to process. The two byte
; gap we are imposing between head and tail removes this ambiguity and makes it easy
; to determine the buffer empty and full conditions.
;
; If there's space for a new message it is stored and the head pointer advanced to
; the next available location. However, if no space remains, the message is discarded
; and the head pointer will remain one message (two bytes) away from the tail pointer.
;
; Arguments
; ----------------------------
; MSG_Head Message buffer head pointer
; MSG_Tail Message buffer tail pointer
; MSG_Code Message code
; MSG_Parameter Message parameter
;
; Return
; ----------------------------
; ACC = 0 -> Message did not post
; ACC <> 0 -> Message posted
;
; Modified
; ----------------------------
; MSG_Head
; ACC, R0
;
; Preserved
; ----------------------------
; MSG_Tail
; MSG_Code
; MSG_Parameter
;
MSG_LOOP_PostMessage:
    mov a, MSG_Head                     ;Increment the head pointer by one message
    cjne a, #MSG_BUFFER_END - 2, mlpm0  ;and parameter.
    mov a, #MSG_BUFFER                  ;Need to wrap around.
    sjmp mlpm1                          ;
mlpm0:                                  ;
    inc a                               ;No need to wrap around, just increment.
    inc a                               ;
mlpm1:
    cjne a, MSG_Tail, mlpm2             ;Check for a buffer full condition.
    clr a                               ;Flag that we did not post the message.
    ret                                 ;Exit if it is.
mlpm2:
    ;The buffer isn't full, we can store the new message
    mov r0, MSG_Head                    ;Store message
    mov @r0, MSG_Code
    inc r0
    mov @r0, MSG_Parameter              ;Store parameter
    ;Now set the head pointer to the next available message location.
    mov MSG_Head, a                     ;The accumulator already has the next "head" location;
                                        ;there's no need to execute the same logic again.
    mov a, #1                           ;Flag that the message was posted
    ret                                 ;and exit
Now you don't have to know assembler to understand this code. A couple of years later I had to re-write this in C. It was incredibly easy to do because of the exhaustive in-code documentation. This is what came out:
// POST MESSAGE --------------------------------------------------------------------------
// This routine posts a new message to the message loop buffer.
//
// The message code and parameter are written to the buffer only if buffer space is
// available. This is determined by first looking at the difference between the
// head and tail pointers. By design, we sacrifice one set of locations (two
// bytes) in the buffer in order to create a gap between an advancing head pointer and
// a lagging tail pointer. If a lot of messages are issued and not processed immediately
// the head pointer will quickly wrap around and threaten to collide with the tail
// pointer. By sacrificing a set of locations we avoid having to keep a counter of
// unprocessed messages. This, because there would be ambiguity when both head and
// tail pointers point to the same location: it could mean that there are no messages
// to process or that there's a buffer full of messages to process. The two byte
// gap we are imposing between head and tail removes this ambiguity and makes it easy
// to determine the buffer empty and full conditions.
//
// If there's space for a new message it is stored and the head pointer advanced to
// the next available location. However, if no space remains, the message is discarded
// and the head pointer will remain one message (two bytes) away from the tail pointer.
//
//
// Return
// ----------------------------
// 0 = Message did not post
// 1 = Message posted
//
//
U8 msg_loop_post_message(U8 message, U16 parameter)
{
    if(msg_loop.count == MESSAGE_LOOP_SIZE)
    {
        return(0);                                          // Can't post because buffer is full
    }
    else
    {
        msg_loop.item[msg_loop.head].message = message;     // Post message
        msg_loop.item[msg_loop.head].parameter = parameter; //
        msg_loop.count++;                                   // Update message count
        if(msg_loop.head == (MESSAGE_LOOP_SIZE - 1))        // Wrap around?
        {
            msg_loop.head = 0;                              // Yes.
        }
        else
        {
            msg_loop.head++;                                // No
        }
        return(1);
    }
}
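Incidentally, the C port swapped the two-byte gap for an explicit count field. The gap scheme described in the assembler's comment block, expressed in C, would look roughly like this (a sketch; the struct and names are illustrative, not from the original project):

```c
#include <assert.h>

#define MESSAGE_LOOP_SIZE 8  /* hypothetical size; must be >= 2 */

struct msg_item
{
    unsigned char  message;
    unsigned short parameter;
};

struct msg_loop_t
{
    struct msg_item item[MESSAGE_LOOP_SIZE];
    int head;   /* next free slot */
    int tail;   /* next message to consume */
};

/* With the sacrificed-slot scheme there is no count field:
 *   empty <=> head == tail
 *   full  <=> advancing head would land on tail             */
static int msg_loop_post(struct msg_loop_t *q,
                         unsigned char message, unsigned short parameter)
{
    int next = (q->head + 1) % MESSAGE_LOOP_SIZE;
    if(next == q->tail)
    {
        return 0;            /* Buffer full: discard the message */
    }
    q->item[q->head].message = message;
    q->item[q->head].parameter = parameter;
    q->head = next;
    return 1;
}
```

One slot is never used; that's the C analogue of the two-byte gap, and it keeps the empty and full conditions distinguishable without a counter.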
This code is definitely readable, but I feel like the port from assembler to C might have been too direct. Originally head was a pointer, so conditional jumps made sense, but in the C code head is an offset starting at 0. Why keep the entire if structure when you can simply write
msg_loop.head = (msg_loop.head + 1) % MESSAGE_LOOP_SIZE; // Advance head and wrap
I remember I had to convert this to C in a hurry because new features had to be added. Doing it in assembler was just about unthinkable. The entire project consists of nearly 100 assembler files and resulted in about 90 files once translated into C.
This was a marathon run, and once the entire project was translated and working, the focus shifted narrowly to adding functionality rather than optimizing the code.
This code was running on an 8051-derivative. Performance was important. You are always counting clock cycles on these systems. I think that if you look at the resulting machine code with the relevant variables mapped to the data memory space the modulo statement doesn't really do any better than a spelled-out conditional statement. In fact, it could be a couple of clock cycles slower (just guessing).
Again, it's been a long time. I can totally see looking at the generated assembly and saying something like "That's as fast as it is going to get. Move on!".
We all make less-than-elegant code at one point or another. Hopefully not too many times. :-)
> the modulo statement doesn't really do any better than a spelled-out conditional statement
Actually, on modern processors the modulo would probably be more efficient than the conditional thanks to things like pipelining. Pipelining overlaps the execution of consecutive instructions (a sort of pseudo-parallelism), so when a branch comes along there's a fair chance that work already in flight has to be thrown out.
One of the techniques used to get around this is branch prediction[1], where the processor initially guesses whether a branch will be taken and then updates that guess based on whether it actually was. This works particularly well for loops and the like.
Another development on the ARM side (and MIPS too, I think) is conditionally executed (predicated) instructions, which avoid the branch altogether.[2]
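And on the modulo-versus-branch question: if the buffer size happens to be a power of two (an assumption on my part; the original project doesn't say), the wrap needs neither a branch nor a divide, just a single mask:

```c
#include <assert.h>

#define MESSAGE_LOOP_SIZE 8                    /* must be a power of two for this trick */
#define MESSAGE_LOOP_MASK (MESSAGE_LOOP_SIZE - 1)

/* Branch-free advance: compiles down to an increment and a bitwise AND,
 * which is cheap even on cores with no divider and no branch predictor. */
static unsigned advance(unsigned head)
{
    return (head + 1) & MESSAGE_LOOP_MASK;
}
```

That sidesteps both the conditional and the cost of a real modulo on small cores.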
If you throw methods and functions at everything, all you are doing is adding the processing overhead of entering and exiting them. These are not magical entities. There's a time, a place and a cost to using them.
Having come up from assembly and, in general, low level coding, one becomes very aware of what is being created behind the scenes. Unless something like this contrived bounds-check test will be used multiple times across a module or modules there's no reason whatsoever to add the overhead of entering and exiting a function for a simple couple of if/else-if/else statements.
I see this all the time. Everything has to be an object and everything has to be a class with a pile of properties and methods. No it doesn't. Massive projects --critical projects-- have been done over the years without any of that. Be careful not to engage in creating a monument to a coding paradigm rather than a tight, fast, practical and sensible solution to a problem.
Then tell your compiler to inline it - but don't bother doing so until you have measured an actual, measurable impact. Readability and rewritability dominate the kind of unguided micro-optimizations you seem to like.
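In C that can be as simple as a static inline helper (a sketch; clamp_velocity and THRESHOLD are my own stand-ins for the bounds check under discussion), at which point any modern compiler will typically eliminate the call overhead entirely:

```c
#include <assert.h>

#define THRESHOLD 100

/* Small enough that the compiler will almost certainly inline it,
 * giving the readability of a named function at zero call cost. */
static inline int clamp_velocity(int v)
{
    return (v > THRESHOLD) ? THRESHOLD : v;
}
```

You keep the name, the reuse and the testability, and the generated code is the same couple of instructions you'd have written by hand.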
That doesn't mean abstracting for the sake of abstraction, but choosing the right abstraction. Writing code so that it is easy to reach the right abstraction helps a lot.