Assembly Language for Beginners [pdf]

GnarfGnarf · on July 17, 2018

I actually got paid a salary for learning & programming in IBM mainframe assembler (BAL or Basic Assembly Language) in 1970, for an insurance company. The CPU memory was so small (32K, yes 32,768 bytes) that the only way we could squeeze enough functionality was to write in assembler, with overlays (no virtual memory). Debugging consisted of manually toggling hex data and instructions at the control panel. What a blast!

It was a lot of fun, but terribly inefficient programmer productivity. I would not want to go back :o) Dereferencing registers prepared me for C pointers later.

cptnapalm · on July 17, 2018

Ooo! I have a question! Where do I learn more about overlays? The BSD 2.11 code I've read has comments about overlays, but I have no idea where to learn more about how to understand the topic. I came across it while I was seeing if I could get newlib to compile for the PDP-11.

sehugg · on July 18, 2018

You could look at some old Turbo Pascal articles (though don't know how applicable it is to other systems): http://www.boyet.com/articles/publishedarticles/theslithytov...

monocasa · on July 17, 2018

Game consoles supported the concept a whole lot longer than other domains. ROM bank switching can be thought of as overlay loading, and beyond that most consoles supported overlays up until 360/PS3. The Nintendo DS cartridge format natively supported overlays for instance.

exikyut · on July 19, 2018

AFAIK, early MS-DOS made heavy use of the technique, probably(?) before EMS/XMS became available/common.

"OVL" popped up in my head for some reason, and https://www.google.com/search?q=dos+ovl seems to return interesting results (showing a few real-world examples).

ddingus · on July 18, 2018

https://www.google.com/amp/s/www.vox.com/platform/amp/policy...

ddingus · on July 18, 2018

Dammit. Wrong link.

Here is the correct one:

https://www.elsevier.com/books/linkers-and-loaders/levine/97...

cptnapalm · on July 18, 2018

Was wondering about that initial link. Thank you for returning with the correction.

ddingus · on July 18, 2018

Glad it got seen.

xigency · on July 17, 2018

Wow, 32K! I'm working on an Arduino Uno project where I need to squeeze some complicated logic into 2K RAM and 32K FLASH :). I haven't resorted to assembly, yet.

Narishma · on July 17, 2018

32KB was a ton of memory in the 70s.

aswanson · on July 18, 2018

Especially in 1970. That was a lot even for 1980.

cylinder714 · on July 17, 2018

Forgive me, but I'm struggling to understand how much useful work one could extract from a computer with only 32k of RAM. A microcontroller for an appliance, sure, but a mainframe? Could you tell us more about the work you were doing?

wglb · on July 17, 2018

In 1970, a company named Telemed built an Electrocardiograph analysis service where hospitals would call in from all around the country and send ECGs over the telephone line with analog FM signals, three channels at a time. The computer located near Chicago would accept the incoming call, digitize the three analog signals every 2ms to 10 bits, write them to disk, and decode the touch-tone patient ID.

When the call was completed, the data from disk would be assembled into a full ECG record and written to tape, and simultaneously passed to the diagnostic program written in Fortran. The system would then initiate a phone call to the hospital's printer and print out an English-language diagnostic. The result was then available in ten minutes to the staff a the hospital.

The front end and back end was all Sigma 5 (first SDS then Xerox midrange real-time computer) assembler in an interrupt-rich process--one interrupt every 2ms for the analog, one interrupt for the disk write complete, interrupts for the tape record writing, interrupts for the outgoing phone call progress. This included an cost optimization process that would choose which line (this was in the days of WATS lines) based on desired response time. The middle was the Fortran program that would analyze the waveforms, identifying all the ECG wave forms--P-wave, QRS, and T-Wave--the height, duration, sometimes slope.

This all took place in a machine with 32k words (four bytes per word). There were two computers, one nominally used for development, but could be hot-switched if the other failed. I think downtime was on the order of an hour per year. This would have been called an Expert System, but I don't think the term was in common use as yet.

So the answer to your question is: "A considerable amount". Today we are all spoiled by environments with more memory on one machine than existed in the entire world at that time.

cylinder714 · on July 17, 2018

Thank you, that's brilliant! It is unfortunate that the source code is lost to humanity, as an example of what is possible.

wglb · on July 17, 2018

Ah, I could replicate it with a bunch of time and likely emulators. There wasn't anything secret about it, just good engineering with an innovation or two. The one that was the coolest was to use coroutines to kind of invert the inner loop. In the middle of the loop was a macro call that said "Get interrupt from the OS" which simply translated as a WAIT that would fire when the particular interrupt came in. We wrapped coroutine around this to properly save and restore state.

By the way, this was significantly easier than what folks have to go through with C or using that dratted asynsc/await pattern.

This particular code that used the coroutine was the outbound call processing low-level stuff. I was second fiddle on that one, and the lead was a fellow who is quoted in TAOCP. We had zero single-thread errors and one multi-thread error when we stood it up. Keep in mind that this was in the days of no debuggers other than console switches.

tux3 · on July 17, 2018

If you want examples of how far you can go packing way too many things in too little code, you may be interested in the demoscene!

Bromskloss · on July 17, 2018

I would not have guessed that there were automatic ECG analysis in 1970! Was it good? Are today's methods any better?

wglb · on July 17, 2018

We were the first commercial offering. There was one or two working at universities.

It was quite good.

I would imagine that they are--I haven't kept track.

wglb · on July 18, 2018

But to further comment --

The landscape is vastly different these days. You can't even stand up something in a medical environment that measures heart rate, much less waveforms without significant clinical trials. Apparently the FDA is all over this one.

znpy · on July 18, 2018

Awesome... Can you please share more stories of that time with us? Maybe write them somewhere ?

wglb · on July 18, 2018

I should write up a blog post. When I do, I'll submit it here.

stevenhuang · on July 18, 2018

You really really should! That was a fascinating read, thanks for sharing.

zeta0134 · on July 18, 2018

A key thing to remember is that 32k of RAM isn't the only memory available; often there was all sorts of longer term storage, and in many cases the data for a job was separate than the machine running the program, which could feed that data in using a number of methods. Today's programs load entire files into RAM because it's available; back then you might load a single record from the file at a time, and there were routines to seek through the offline data (subject to optimization) and manage the working memory more effectively.

It's also worth remembering that, at the time, a machine with 32k of RAM was one of the most powerful on the market, was still considerably expensive, and the alternative was paying (a team of) humans to do the work by hand. For all its shortcomings and the insane complexity required to get the machines to work properly, they were generally much faster than humans performing the same task and generally (assuming they were programmed correctly) could be relied on to make fewer mistakes. Their utility was remarkable, especially their ability to perform arithmetic very quickly, which was (and still is) quite tedious to perform by hand.

TylerE · on July 17, 2018

Oblig NASA factoid: The guidance computer on Apollo that got as to the moon had 4096 bytes of RAM and about 72K of ROM.

cylinder714 · on July 17, 2018

Whew, excellent point. That raises another question, though—how much computational work did that computer have to do? The real heavy lifting was performed by big NASA mainframes on Earth, right?

cbm-vic-20 · on July 17, 2018

The Apollo computer did things like control rocket burns, provide navigational information, etc. More information

http://nassp.sourceforge.net/wiki/Guidance_and_Control_Syste...

The source is on Github, too, for example:

https://github.com/chrislgarry/Apollo-11/blob/master/Luminar...

ehaliewicz2 · on July 17, 2018

The original gameboy has 8K of RAM and the first and second gen pokemon games run on it. Most of the hard work is done by the graphics and sound hardware. Fairly large games though.

colejohnson66 · on July 17, 2018

The original NES had 2K of RAM (but cartridges could expand that).

cbm-vic-20 · on July 17, 2018

You could land a spaceship in the moon with a computer with less RAM.

https://en.wikipedia.org/wiki/Apollo_Guidance_Computer

danellis · on July 17, 2018

> with a computer with less RAM

And a team of scientists who had done all the difficult calculations beforehand...

ridgeguy · on July 18, 2018

In 1969, a computer with "2048 words of RAM" and "36,864 words of ROM" managed a landing on the Moon and subsequent return[1].

People do amazing things with primitive tools.

[1] https://en.wikipedia.org/wiki/Apollo_Guidance_Computer

cylinder714 · on July 17, 2018

(An aside: do terms like "mainframe" and "microcomputer" have any meaning any more, when a Raspberry Pi Zero has orders of magnitude greater RAM and power than a '70s piece of "big iron"?)

joshAg · on July 17, 2018

Yes. Mainframes still exist. IBM still sells them, and you can do things like hotswap CPUs and RAM or setup an geographically distributed HA cluster that can swap which mainframe the VMs or databases are "running" on without dropping connections, requests, or other interruptions.

https://en.wikipedia.org/wiki/IBM_Z

nineteen999 · on July 18, 2018

One of the four large banks in my country I worked at only a few years ago still managed all their internal change management process on a fairly old IBM mainframe.

You had to connect via an arcane telnet client (tn3270 protocol perhaps?) and input the change details. No fancy web forms. Perhaps it was a limitation of the application, but you couldn't mix uppercase and lowercase in the one form.

ci5er · on July 22, 2018

The PDP (11? I think so, but maybe 7? My home PDP for running the internet in my corner of the universe was an 11) at JPL that processed the Voyager flyby snapshots of Saturn into colorized planet+ring images was 64KB. Instead of viewport-centric geometry scans doing texture lookup, they scanned in texture-space (less swapping)

molteanu · on July 17, 2018

Thanks for sharing

amorousf00p · on July 17, 2018

[flagged]

sctb · on July 17, 2018

No matter how you feel, posting like this will get your account banned.

https://news.ycombinator.com/newsguidelines.html

znpy · on July 17, 2018

Maybe too much content? 1000+ pages, many architectures... Probably too much content for a beginner book?

Btw, a great book imho is "Assembly Language Step By Step - Programming with Linux - 3rd ed" (https://musho.tk/l/d2d56a34).

The great things is that it is an easy read and really starts from the basic and explains how the i386 architecture works, and then explains how to program it using assembly.

The sad thing is that afaik the author is quite old and probably is not going to release a 4th edition, meaning that the book will stay on intel i386.

jlarocco · on July 17, 2018

I have the first or second edition of "Assembly Language Step By Step", and it's the best intro I know of.

It must be difficult to write a good assembly book. On one hand there's a lot of basics to cover, like memory addressing, segmentation registers, etc. But on the other hand, the main use case for it today is hand optimized functions for when the compiler can't optimize enough, which is inherently an advanced topic.

johannes1234321 · on July 17, 2018

There is another use: Understanding compiler behavior and reverse engineering. Those mostly need reading skills.

fouc · on July 18, 2018

I read the 2nd edition of Jeff Duntemann's book (DOS & Linux) in little more than a couple of sittings. Incredibly readable! Best introduction on the background of how CPUs work and what machine language is about.

vasili111 · on July 17, 2018

>Assembly Language Step By Step - Programming with Linux - 3rd ed +1. I have read 2-nd edition. Great book!

simias · on July 17, 2018

The fact that the document targets multiple architectures is great, most ASM tutorials online target solely x86 and as such give a very partial view of how assembly is written and how CPUs work, and on top of that x86 ASM is really quite a mess.

I've skimmed the document and it seems rather thorough and doesn't shy away from all the nasty details (which is a good thing for an ASM tutorial), the only criticism I have so far is that the assembly listings are a bit hard to read, additional whitespace and basic syntax highlighting (at least to isolate the comments) would make it a bit easier on the eyes I think, for instance: https://svkt.org/~simias/up/20180717-151848_asm-listing.png

snarfy · on July 17, 2018

Yep, I was very happy to see the ARM and MIPS output. Syntax highlighting is a great idea. It didn't even bother me but this is coming from a guy that had a hard copy print out of the 386 programmers reference.

mbowcutt · on July 17, 2018

Would have been over-the-top cool if RISC-V had been included as well.

tawayway · on July 17, 2018

This seems to the same text as the author's Reverse Engineering for Beginners (https://beginners.re/)

https://news.ycombinator.com/from?site=beginners.re

e12e · on July 17, 2018

Yeah, I'm a little confused about this "build" of the pdf. It clearly states: "The latest version (and Russian edition) of this text is accessible at beginners.re." - which points to: https://github.com/DennisYurichev/RE-for-beginners

But as far as I can tell there's no branch with a different name - maybe this was just a working title for the English version at some point?

Anyway, this new submission with a new title made take a look, so I'm happy :)

Now, I just hope someone takes a crack at forcing an epub build for better reflow/resize on small screens...

There's a (dormant?) issue: https://github.com/DennisYurichev/RE-for-beginners/issues/37...

cylinder714 · on July 17, 2018

The title pages are different. The Reverse Engineering book's title text is hex digits, and is dated July 9, 2018. The Assembly Language book's title page is English text, and is dated today, July 17, 2018. And there is no mention of the Assembly Language book on his home page, yurichev.com.

dennis714 · on July 25, 2018

Yes. Explanation: https://yurichev.com/blog/UAL/

vmateixeira · on July 17, 2018

Fun fact: both books have precisely 1082 pages.

4llan · on July 17, 2018

Fun fact: both books have precisely the same content

sus_007 · on July 17, 2018

Fun fact: both sources have precisely the same favicon as well.

nineteen999 · on July 18, 2018

At one point when I was considerably younger I started learning 32-bit x86 assembly as my very naive career goal was to become the next Linux Torvalds. I managed to construct a multiboot-compliant 32-bit kernel that could load tasks from a ext2 ramdisk image passed to the bootloader and multitask up to around 64 processes. I figured out how to use the CR3 register and craft page tables, enter into kernel syscalls using software interrupts, handle CPU faults, page faults, double faults etc. It was quite the learning experience until it eventually crumbled under my lack of skill and foresight. In short, I got more or less about as far as most amateur, self-taught, lone OS developers get before losing interest and giving up.

Fast forward a couple of decades, and I found myself reverse engineering CP/M for the Z80 processor in order to create a virtual Z80-based system that ran inside Unreal Engine. I started with Udo Munk's wonderful Z80pack system, adapted a public domain Z80 CPU emulator which was written in C to C++, and did minimal reimplementation of the Z80pack terminal and disk I/O devices. Since the systems were implemented as "actors" in UE4 it's possible to spawn and run quite a few concurrently as long as you limit the CPU speed of each instance somewhat.

The resulting virtual system in UE4 is able to run original CP/M ports of Rogue and Zork (https://i.imgur.com/gnOCp3e.png), various Z80 instruction exercisers (https://i.imgur.com/kwNuq5X.png), a Z80 C compiler and and even Wordstar 4 (https://i.imgur.com/Q6307w3.jpg) and Microsoft BASIC.

Learning assembly can be a lot of fun - it can really teach you quite a bit about systems architecture that you otherwise might not get if you're always programming in high-level languages only.

mcdevilkiller · on July 18, 2018

Have you published a plugin or the source? Sounds very interesting for using in computers inside the game (as in Fallout and the like).

nineteen999 · on July 18, 2018

I'd like to but it would still take quite a bit of work to make it production quality code, and I don't really have the time right now. One day I hope.

andreygrehov · on July 17, 2018

My god, 1000+ pages. What a great submission though. I wonder what kind of things keep such people motivated to write one fuckin' thousand and eighty two pages about assembly!? This is nuts.

vram22 · on July 17, 2018

A couple of points come to mind:

1. Assembly language needs more lines of code to achieve the same task than higher level languages, by its very nature.

2. What I call Pascal's Amendment :) - very loosely like claiming the Fifth Amendment (to the US Constitution):

https://en.wikipedia.org/wiki/Fifth_Amendment_to_the_United_...

https://www.brainyquote.com/quotes/blaise_pascal_386732

"I have made this letter longer than usual, only because I have not had the time to make it shorter."

- Blaise Pascal

As a writer, I can corroborate that. In fact, if he "had the time to make it shorter", it implies that he spent even more time to write those 1000+ pages than it would seem on first glance. Plus even more than for the same 1000+ pages if in a higher level language, since assembly is a lot more error-prone.

ianai · on July 17, 2018

I don’t think it’s nuts to share such depth of knowledge. I greatly welcome the knowledge, too. I legitimately am wondering what sort of project I could take up that would be “small enough”, only do-able in asm/c, and interesting.

jimbokun · on July 17, 2018

Think "nuts" was used as an endearing compliment there. Like "insane" in the sense of "insanely good".

K2h · on July 17, 2018

wow - impressive. In the intro of the book, yes, its a book - there is a call for proof read (English and Russian) and translators, will accept work no matter how small and credit. now that's cool. when I get a few minutes between meetings i'm going to see if I can find anything to contribute and submit. I absolutely love the tone of the book! what a cool guy.

jimbokun · on July 17, 2018

"Chapter 1" is over 400 pages. :)

merolish · on July 17, 2018

Old school, but I've always enjoyed this gentle introduction:

https://chortle.ccsu.edu/AssemblyTutorial/index.html

nickpsecurity · on July 17, 2018

That looks nice. People learning hardware, too, might follow-it up with a study of Plasma MIPS:

https://opencores.org/project/plasma

Then, they'll understand at least one architecture inside and out. Plus be able to customize it to their liking. :)

simonebrunozzi · on July 17, 2018

Amazing. One of my fondest memories is when, at age 10-11, I "played" Core War [0] with an older friend of mine (he was 20-21 back then, and a CS student at one of the best CS universities in Italy, Pisa). I loved to learn how to program in a game-like environment. I still remember several things, ~30 years later.

[0]: https://en.wikipedia.org/wiki/Core_War

g-b-r · on July 17, 2018

DiffPDF 2.1.3 finds only two tiny (if weird) differences in the appearance of the two versions of the books (https://yurichev.com/writings/AL4B-EN.pdf and https://beginners.re/RE4B-EN.pdf) besides the title, author and build date: two page references at pages 1075 and 1078.

Since both sites appear to be owned by the book's author this is most likely just a change that has not yet been pushed to github (or mentioned in the author's sites), but it would be better if the author clarified it (would that be you, dennis714?)

dennis714 · on July 25, 2018

Yes. Explanation: https://yurichev.com/blog/UAL/

moistoreos · on July 17, 2018

Godbolt is my favorite goto for "What would this look like in assembly" answer websites.

danellis · on July 17, 2018

That will only tell you what a compiler will generate, though, which sometimes isn't enough.

xfer · on July 17, 2018

I got interested in assembly fairly early on my programming career by playing wargames(io.smashthestack.org anyone?). Writing exploit payloads was very fun. After that in my career i have only written assembly to vectorize some code using neon.

I think the best reason to learn assembly is not to write rather be able to read compiler output.

pq0ak2nnd · on July 17, 2018

This is a great resource. I don't think it is for beginners, it is a bit overwhelming. But as a programmer who has to dive into asm every few months (on a variety of architectures!), this will be a very helpful reference to help me reset my frame of mind for the particular architecture.

neuromantik8086 · on July 17, 2018

I feel like 6502 assembly is probably better for beginners, but that might be because that's what I'm trying to teach myself...

lucb1e · on July 17, 2018

This sentiment is one of the reasons why I didn't bother in the past: it's apparently so complex that I have to learn something completely different and useless these days, before I can start learning something useful. I'd much rather just deep dive into x86_64 or ARM or something.

These days I know that older versions are still (partly?) included in x86_64 and that they're often mostly the same, but that was not clear to me when I saw tutorials for ancient architectures of which I didn't see the point.

But then, I've never taken well to the school system where you get taught something stupid first only to see practical applications later. It's why I dropped out of high school and went to do something of which I did see the use (a sysadmin 'study', because I knew I enjoyed working with computers, and that was indeed a good fit for me).

neuromantik8086 · on July 17, 2018

Saying that 6502 is useless is a bit harsh- I wouldn't call it an employable skill, but it's great for hobbyists who are into retrocomputing (like me).

You could deep dive into x86_64 or ARM, but in the general case you would never actually code in those (i.e., most folks trust the compiler) unless you were writing a driver or writing something with crazy performance like MenuetOS.

nickpsecurity · on July 17, 2018

"Saying that 6502 is useless is a bit harsh. I wouldn't call it an employable skill"

It must be both useful and a job skill for some people:

https://wdc65xx.com/chips/

I wouldn't study it to get a job. There's apparently still utility in it, though, with WDC's versions of it.

pro_zac · on July 17, 2018

What resources are you using to learn?

neuromantik8086 · on July 17, 2018

A little bit of Easy 6502 [0] but mostly Richard Haskell's Apple II-6502 Assembly Language Tutor [1][2] along with Virtual][ [3] . Basically I'm trying to use assorted manuals I could find in my childhood home's basement, most of which seem to be out of print (although there are some books floating around on archive.org).

[0] https://skilldrick.github.io/easy6502/

[1] https://www.amazon.com/Apple-II-6502-assembly-language-tutor...

[2] https://archive.org/details/Apple_II_6502_Assembly_Language_...

[3] http://www.virtualii.com/

eleitl · on July 17, 2018

Now do that for the GPU.

cjhanks · on July 17, 2018

To be fair to NVIDIA, they have done a pretty good job here.

https://docs.nvidia.com/cuda/parallel-thread-execution/index...

monocasa · on July 17, 2018

I thought that PTX is an intermediate format, kind of like a Nvidia specific LLVM bitcode for GPUs.

bigmit37 · on July 17, 2018

Great share. I have been curious about assembly languages for a while. Hopefully it’s not to technical for someone without a CS background.

mikestew · on July 17, 2018

13 year old kids figure it out. I was one of them, and I'm an idiot more days than I care to admit. Granted, it was 6502 assembly, but later I'd be able to sift my through x86, before it went completely nuts.

scarface74 · on July 17, 2018

I think my motivation for learning 65C02 assembly back in the day was necessity. To take advantage of the Apple //e at any reasonable speed I had to. Besides 65C02 was simple, as was 16 bit x86. Things got a lot more complicated and the necessity went down.

mikehodgson · on July 17, 2018

Great resource! It brings me back to the early days of the web, poring over "Fravia's Pages of Reverse Engineering".

kolbe · on July 17, 2018

I've never had a desire to learn about Assembly, but I thought the first few pages of this book were kinda interesting, and I might end up learning more than I had ever cared to.

lprd · on July 17, 2018

This looks great! Thank you for sharing :)

peter_retief · on July 17, 2018

Nice basic intro to assembler, as a person from some electronics background it fits with knowledge of simpler architectures

richfnelson · on July 17, 2018

Thanks for the recommendation.

young_unixer · on July 17, 2018

Thank you. This is a godsend.

max_ · on July 17, 2018

Nice!

kulu2002 · on July 17, 2018

Nice share! Thanks

paidleaf · on July 17, 2018

This isn't for beginners. What beginner's assembly text covers multiple architectures and assembly flavors?

My recommendation for beginner's assembly on linux is to write toy code in C and then view the disassembly in gdb or objdump. You have options to switch to Intel syntax from GAS/AT&T if you want.

I'm generally against using windows for anything, but Visual Studio has decent disassembly debug options where you can step through the native assembly code. You could also look at IL code ( which is very similar to native assembly ) and learn assembly concepts that way. ildasm and ilasm are great tools for that.

Assembly is so low level and can be intimidating to write from scratch in the beginning. It's better for beginners to write code in a higher level language like C and then read the compiler generated assembly code. Once they are comfortable with a disassembly of a "hello world" program, then write more complicated code and understand the disassembly. Then try to edit and compile the disassembled code. Once you are comfortable, then write your own assembly code from scratch.

Edit: Also, if you have time and the will, watch the nand2tetris lectures and try their projects. It'll give you a hands-on general overview of hardware to assembly to VM to OO. How native assembly works with hardware. How VM interacts with native assembly. How you go from OO to VM? It's a very limited but educational overview of code flow from object oriented code all the way to circuitry ( software circuitry ).

mikehodgson · on July 17, 2018

The book literally follows this process starting on page 5.

appslure · on July 17, 2018

[flagged]

JadeNB · on July 17, 2018

This is spam.