Assembly Language Programming: Still Relevant Today (2015) (wilsonminesco.com)
119 points by mr_tyzik on Dec 31, 2019 | hide | past | favorite | 96 comments


Excerpt:

"Jeff Laughton (Dr Jefyll on the 6502.org forum) says, "I recall hanging out with a programmer pal o' mine and a younger fella who was in college. The young fella was complaining, 'We have to take assembly language,' and Len corrected him immediately, saying, 'You get to take assembly language!'"

<g>


What a great way to put it.

What's great is that learning assembly is like taking the first step to understanding the bridge between software and hardware.

I remember taking a class that started out as confusing as hell. The first few exercises seemed really mystical and just so brittle. You'd have to be a wizard to ever get this, one thinks. But by the end of the class, we made an asteroids-like game, complete with RNG based on input timing (the time between game execution and when the user clicked start). That was really an enlightening moment.


"...self-modifying code, [is] sometimes appropriate to solve certain problems that have no other solution, or for improving efficiency, as in double-indirect addressing."

I have read that self-modifying code on the x86 architecture is pretty dangerous at the assembly level.

More broadly, this kind of comes back to all the issues in the "C is not a low level language" thread. Some level of assembler certainly gives the programmer the fullest access to the machine possible. But naive assembler from the 8086 - 80486 eras is going to be rearranged in a lot of ways by a modern Pentium processor, and counting on in-order execution may be a mistake.

Edit: at the same time, the modern processor doesn't really allow a lower level than assembler normally and the default approach is assuming flat memory but being aware of the pitfalls of multiple caches being involved.


X86 is particularly friendly to self-modifying code, requiring fewer fences and hints when code is modified than other ISAs. It's generally easier to implement self- and cross-modifying code on x86 than on other ISAs, like, say, ARM.

(I manage a team that writes self-modifying code for a living.)


Sounds interesting. What does your team do?


Among other things, just-in-time compilers for the WebKit JavaScript engine, JavaScriptCore.


It's quite widely used for dynamic library jump tables. When calling a function in a dynamically linked library, it calls a stub, which is initially a call to the lazy linker, but gets replaced with a call to the resolved function.

You may be remembering early x86 chips that didn't properly invalidate the instruction cache after a write. Modern chips are fully cache-coherent.


This is interesting. I've read documentation about dynamic linking and it describes this replacement process but I never truly understood the fact it was self-modifying code. Doesn't this imply the program's code is writable? I know that JIT compilers also emit code into writable and executable pages. Aren't there security implications?


There are. That’s why dynamic linking typically doesn’t use self-modifying code and JIT compilers take a number of precautions to prevent attackers from being able to execute arbitrary code.


I don't know much about x86, but on other cpus/architectures (PowerPC, MIPS, and possibly others) one might still need to fiddle with the instruction cache.


On modern IA32 and AMD64 architectures the instruction cache participates in the cache-coherency protocol, so it's all done for you.


Yes but it still needs to: a) empty the write buffers into the cache, then b) flush the current instruction stream (in case the CPU has already fetched and decoded instructions from the modified memory)

Doing this on every write (especially considering multiple possible virtual to physical mappings) is very expensive in terms of hardware - it's why some architectures (RISC-V for example) have explicit instructions to trigger these things


That’s not self modifying code, though; it’s just an indirect jump through a pointer.


I think the OP is saying that the instruction call <stub_function> gets rewritten as the instruction call <resolved_function>.


Generally that kind of thing goes through a GOT/PLT for security and performance reasons.


> I have read that self-modifying code on the x86 architecture is pretty dangerous at the assembly level.

Dangerous in what way?


Having written self-modifying x86 code: the most annoying thing is that instructions are not a fixed width. This means patching code requires being able to parse every instruction, keeping a lookup table of where each instruction starts, or just regenerating the entire code whole cloth. This can also cause problems if self-modifying code has more than one thread, since you may suddenly need to update 1 to 15 bytes atomically.

Generally it's easiest to do the last, or to pad with NOPs, or to do something like Windows' hot-patch points for functions, where hot-patchable functions are preceded by 5 bytes of NOPs and the function always starts with MOV EDI, EDI, which again is pretty much a NOP but takes two bytes.

This allows one to replace MOV EDI, EDI with a short jump to the start of those 5 bytes, which are large enough to hold a long jump to any code. Windows went this route because originally multi-byte NOPs were not part of the spec, so if you used one-byte NOPs, not only would each NOP need to be executed, slowing down function calls, but in multi-threaded code you would have to lock all threads to edit the code, since it would be fetching one byte at a time, etc.


The original 8086 had a six-byte prefetch queue (the 8088 in the original IBM PC had a four-byte one). If you modify an instruction less than six bytes ahead, the change won't be seen by the CPU unless you issue a JMP (or CALL) instruction. It was not normally that big of an issue (just make sure you modify the instruction from far enough away). You can use the difference in prefetch queue length to determine which CPU the program is executing on.

These days, I think you would need to 1) have memory pages with code that also have write permission, 2) possibly flush the instruction cache and 3) hope no other thread is using said routine. With today's security concerns, 1) will likely not be allowed by default, 2) possibly requires elevated privileges (I don't recall---I've only really done ring-3 level code on x86) and 3) is probably okay in a single-threaded program.


You can get around 1 by having the page be mapped multiple times with different permissions. Modern x86 processors don’t need a cache flush.


I remember using the prefetch queue as a way to catch debuggers single-stepping through my code. If you rewrote a nearby instruction to be a jump-to-self, single-steppers would fall victim to it.


When I was young I imagined that once I was an advanced programming/CS student I would learn advanced stuff like self-modifying code. Too bad real life is not that exciting...


You can get a job in compilers if you want. Message me via my profile if you don't know where to start.


If you’re interested, JIT compilers or malware reversing is a “real” place where this knowledge is useful.


Also don't forget binary translation, which is basically assembly-level JIT. Static binary translation is a thing too, but it has limitations.


> I have read that self-modifying code on the x86 architecture is pretty dangerous at the assembly level.

Why's that? I'm not aware of any issues specific to x86.


x86 is actually one of the easier platforms to have self-modifying code on, because you don’t have to flush the instruction cache.


Which is kind of interesting considering how far removed the assembly you write is from what is actually executed. You'd think x86 would be even less likely to notice you changed the plan out from underneath it.


While I can only recall writing real assembly once in 20 years I've read disassembly thousands of times while debugging.


> It is common for the beginner to want all the fancy tools too soon

Am I the only one who’s never felt that way? I get grief from people around me (especially “hurry up and get it done” management types) for spending too much time in the low levels, trying to really understand what I’m doing and what’s going on.


You should change the people around you, if possible.


Not sure why HN is saying it is obvious that assembly is relevant because compilers... The article's intent is around a programmer writing assembly. I'm sure there are niches but I can see web developers getting away without writing assembly in their professional career.


Ok, we've put programming in the title above to make this distinction clearer. The author is not writing about computer-generated assembly language, such as compiler back ends.


Might see that flip in the coming years, if you consider web assembly (wasm) to be assembly. I do.


WebAssembly is not assembly in the ways that the article talks about. Like, writing it directly doesn’t give you any special control or guarantees over timing.


It's assembly against a virtual machine, not a physical one. You're right it's not appropriate for an embedded system or some other RTS, but assembly doesn't stop being assembly when you target a virtual machine.


It kinda does in this case. Don’t kid yourself. In real assembly, the really interesting part is how to use a finite register file. WebAssembly has an infinite slab of variables available, in the sense that you get to say how big it is. That fundamentally changes the game.


There are real CPUs that are just like that.

In fact, most mainframes have always used microcoded CPUs, with assembly being referred to as bytecode in the programming manuals.

You just need to dive into IBM and Xerox PARC manuals, for starters.


Yep. I feel like most of HN's readership's asm education begins and ends with their 6502 class at uni.


Sounds like you’re saying those machines executed bytecode.

Otherwise there isn’t a great limiting principle to your logic. Just because someone once built hardware that executes such a high level assembly that the manual referred to it as bytecode doesn’t mean that all bytecode formats are assembly.


Indeed I am, the interpreter is the microcoded CPU.

Even modern 80x86 assembly is a low-level form of bytecode, given that the micro-ops processed by the microcoded CPU are completely unrelated to 80x86 assembly opcodes.


Wasm is different than assembly, so I don't think so.


It targets a virtual machine, not a physical one, but other than that it's "assembly-like" enough that learning some core ASM coding practices will help you.


Not sure why you are sticking to your guns here. First, it's highly unlikely anyone would ever write wasm by hand. Second, the article rambles a bit but the most compelling argument for assembly is writing fast code for constrained hardware. At a high level, that's not what wasm is solving.


WebAssembly is a bytecode format.


Bytecode is just the instructions, which is actually a level lower than asm, but for a virtual machine instead of a physical one. It's a direct enough abstraction that you can predict the bytecode you'd build from the asm you write.


> Assembly language yields maximum control and execution speed.

It yields the only control and execution. Almost all programs are generated through an assembler and assembly language. How could it possibly not be relevant?


I read that line as suggesting that when using an HLL the programmer generally does not have control over the assembly that is generated. I guess that is why sometimes, as the author suggests, the programmer may write part of a program in an HLL and another part in assembly in order to achieve increased execution speed.


It's more than just execution speed. It could be something as simple as a function, written in assembly, that swaps the virtual memory page table on a processor. That's not something you're going to find in a high-level language.


Yup. The example I implemented in college is stack pointer manipulation to implement multithreading (think pthread).

Some things can only be done in assembly.


Yeah, done that myself: save all the register state, save the stack pointer, and then set a new stack pointer and load that task's register state.

Although doing the same on Windows is kinda annoying. Also the number of callee-saved registers is kinda like, what the heck. Here is some code I wrote doing that for the Windows ABI: https://pastebin.com/jnxeMRcV


There are a huge number of software engineers today, dare I say most, who cannot read or write assembly and are isolated from it by multiple layers of abstractions.


I would probably struggle to write assembly without blowing something up, but... is most macro assembler really that hard to read? The examples the author of the article gave seemed reasonably easy to parse for anyone who has a conceptual model of control flow, variables, memory locations, and basic operations.


I know it's abstracted away, but it's still relevant to how the programs run because almost all programs go through an assembly language phase.


What languages don’t? All of them execute on the processor in some way.


You can generate machine language without an assembly language intermediate phase, if you just directly put bytes into a file. But most compilers have some kind of assembler in them.


I could be mistaken, but don't JVM languages avoid assembly steps? They use bytecode and are translated directly to machine code; I'm not sure there is a translate-to-assembly step. I'm also not sure how much of this confusion is semantics vs. actually different components of the compilation and running process.


They still have assemblers in them - they're just done with method calls rather than text.

https://github.com/openjdk/jdk/blob/e73ce9b406c34bd460f0797f...

Also if you look at the compiler's intermediate representation at the very last phases you'll see it basically looks like an assembly program.


Pure interpreters like CPython?


The interpreters themselves go through assembly language. Assembly is relevant because the whole stack is still actively built on it.


To me, assembly is one of the things that, while self-studying CS, I felt lacked good support in resources such as this, MOOCs, or just plain explainers. Usually when I look up a topic hoping to demystify it, I am greeted with an extremely hard-to-digest read that is meant for people already knowledgeable in the subject.

And I feel assembly should be more of a core building skill in a programmer's toolbox. So this article is very welcome to me.


Modern assembly is kind of...bad.

I would recommend checking out an old book for an old mainframe's assembly language. They're usually much less mystic by virtue of being much less complex. IBM had some really nice manuals and books; no one ever got fired for buying IBM because an IBM machine could be programmed by a dog.

Octal is where it's really at, though, if you get really into this. A fun weekend project is to write an octal "decompiler" (ideally you won't have compiled anything, just written some octal by hand) that lets you reason about what the code is doing by translating it to an actual language rather than just thin syntactic sugar over 1s and 0s. Octal itself isn't so difficult, and it makes binary much easier to reason with, but this definitely helps you get a more intuitive sense of what is what.

Of course, it's not something that has a substantial amount of value with modern machines. Maybe eventually we'll get back there; I think I'll enjoy it when we do. Until then, though, it's fun to play with.


> Octal is where it's really at

Why octal, not hex?


Octal is easier to keep in your head while not sacrificing any efficiency.


If you really want to dig into the concepts behind assembler, the assembler language part of TAOCP is available for free download: http://mmix.cs.hm.edu/doc/fasc1.pdf. It doesn’t talk about a “real” assembler language like 6502 or 8086, but a made-up one that was designed to present the concepts. It’s easy to move from Knuth’s academic introduction to an actual assembler.


I actually want to learn it to work on reverse engineering projects.

I just don’t know how to get started at all.

I don’t know how people can reverse engineer a device that you don’t access to the running program to. How do you monitor and track all the bits being passed around to break back firmware? Specifically video game mods and hacks I wanted to dabble in since I find their programming fascinating and know I’d be interested to contribute most in my spare time in that.


> video game mods and hacks I wanted to dabble in

Not sure you need assembly for that.

If you want to modify 3D rendered output, you normally need to adjust shaders, textures and such. For extreme cases, you can hook the entire Direct3D API adjusting how it works for the game. The only assembly you might need for that is shader assembly https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/... but not always necessary as the HLSL decompilers are often OK.

If you want to modify game logic, it’s normally implemented as scripts. Game designers and level designers don’t often know C++, and they certainly don’t want to recompile the game because it’s slow, they adjust scripts and see the result in real time.


I know and understand C++ as it’s the main language I’ve been working in for some time.

How do I modify the code of that which I don’t have access to?

What reverse engineering projects are good for beginners? I see people post here their first attempts to reverse an older gadget. I'd love to pick up an older gadget, try to reverse engineer it, and make it do what I want.


> How do I modify the code of that which I don’t have access to?

Native code reverse engineering is very time consuming. It’s often possible to achieve similar results by focusing on the code which you have access to. You don’t have source code of Windows OS components, but you do have their APIs and debug symbols, and that’s much better than just binaries.

If you want to change what’s rendered, you can replace the GPU API with a wrapped version, like renderdoc does. If you want to change what’s loaded from disk, patch game files, or replace whatever OS file I/O APIs is used by the game (DLL injection, then MinHook or Detours).

Even when you do need to change game’s own native code, directly patching machine code is rarely a good idea, very hard to implement and especially debug. An easier way is replacing complete functions with API-compatible replacements implemented in your DLL library in C++. Again, use MinHook or Detours to replace the implementation. C++ allows unrestricted memory access so you can read and write everywhere, here’s working examples: https://github.com/Const-me/vis_avs_dx/blob/master/avs_dx/Dx... https://github.com/Const-me/vis_avs_dx/blob/master/avs_dx/Dx... I didn’t have source code of these C++ classes, but wanted their data regardless. Found the offsets by using VS debugger, these third-party DLLs include GUI to change the values, I compared memory before/after making changes.

> What reverse engineer projects are good for beginners?

In the context of modern Windows games, assuming you wanna change what’s rendered, a good start might be https://renderdoc.org/. Officially, the tool is only supported when you run your own code. Technically, it often works with retail games too, just don’t open issues about that, they’ll be closed as a not supported use case. As a nice side effect, you’ll learn a thing or two about Direct3D. The tool is open source with a good license (MIT), so you can fork it, disable their frame captures, and change their API wrappers to modify the output of some particular game.

One more thing, modern games use a lot of bytecodes. E.g. D3D shaders are byte code, search “3dmigoto decompiler” to decompile dxbc into HLSL. .NET is often byte code (Unity3D is based on .NET), use reflector to decompile into C#. Many games use custom VMs, sometimes modding community has decompilers for their custom byte code.

> I’d love to pick up an older gadget and try to reverse engineer it

What do you mean by “gadget”?


Depends on the platform. Older platforms like the NES or Sega Genesis had software written in 6502 or 68000 assembly, respectively - there are huge communities around modifying these games.


Good point. Yeah, for old games like NES, Genesis, or DOS, you don’t often have other choice.


I only say this because I myself am a girl who learned 68k ASM when I was ten or eleven through the Sonic ROM hacking scene :)


I got a fairly good introduction to assembly with LC-3 (Little Computer 3, an instruction set for learning) programming in an elementary electrical engineering class in college. I haven't looked myself, but for those self studying, searching "LC-3" might be a good option for self-learning assembly.



As recently as a year ago I used the built-in assembler in Delphi to perform some low-level timing. It was not just for super-duper performance though. Surprisingly it was the most straightforward and simple way. Meanwhile my firmware for 3-phase AC motor torque and speed control on a really low-power microcontroller was doing fine with plain C without any assembly.


What do you do for a living?


I design and develop products for a living. Universal guy. Mostly software, but sometimes part of it is also firmware / electronics / hardware. I happen to be a person who has done everything from billing systems and other giant products for telcos down to firmware for microcontrollers.

Some products I own and some are made to order. Some I did on my own and for some I was the leader of a big team.


There are many kinds of assembly language: some for actual computer hardware, some for VMs, and some used as both.

I have used and sometimes still do use assembly language, including 6502 (specifically, NMOS 6502 without decimal arithmetic, including unofficial opcodes), and a little bit of x86 stuff (although the modern x86 is very messy, I think), but also Z-machine and Glulx. I have also used MIX and MMIX assembly (and may use MMIX more if I actually make a computer with it). And then some other programs (such as ZZ Zero, which is similar to ZZT) have their own kinds of assembly languages.

One feature not mentioned is the relative numbered labels such as 1H and 2H available in MIXAL and MMIXAL; you can then use 2F to find the next 2H label forward, or 2B to find the next 2H label backward. My own assemblers for Glulx and ZZ Zero support a similar feature too.


He does touch on local labels briefly - he suggests (and I agree) that you can get the same effect, more maintainably, with macros.


Macros are good too, although I think both are useful.


There's food for thought there. As part of my current project I wrote an assembler for 8-bit AVR (on GitHub: https://github.com/Lerc/AvrAsm/ )

Part of my motivation for this was to have an assembler that ran in the browser (for my fantasy console that also runs in the browser), but another big part of it was to write an assembler designed to be more friendly to people writing assembly directly.

When I wrote 6502 asm I mostly did it from Supermon which is a no frills experience. It's nice to see the features that assemblers have now, I think I'll be implementing quite a few of those macros from this link in my own assembler.



Optimizations have come a long way. Not only do you need to be an asm programmer to beat the optimizer, you need to be a good asm programmer.


Sure it is, someone has to write compiler backends and OS bootloaders, even if all instructions would be exposed as intrinsics.


Even back in the 1980s, author Lance Leventhal, who wrote some great books on assembly programming, warned that, productivity-wise, you write the same number of lines of code whether it's 6502 or C.

As that 6502 example in the article shows, you don't get great productivity with assembly. And even macros don't improve on it that much.


One writes 6502 assembly by hand because there is really no alternative on that ISA. It does not make a good target for a C compiler, there are very few of them to begin with, and hand written 6502 (or 65816) is going to be better than anything produced by any compiler for it at this point in time.


On modern x86 it's not very useful at all anymore. An ICC or LLVM backend with compiler intrinsics will get you better performance, with reduced maintenance and cost. Performance will also improve over time as backend optimizations get better.

You can still do it if you care about debug build performance.


I'm rarely keen on posting negatives on articles that clearly took a lot of time to make, but I think this requires a bit of correction.

I think this article is very, very simplistic. All of it relates to an 8-bit CPU that is 40+ years old.

I switched to an HLL as soon as I could get my hands on a compiler, namely UCSD Pascal at the time! Then Pascal proper, then C, and then myriads of other languages. I covered 6502, Z80, 68k (all of them, to 68040), PowerPC (all of them from 601 prototypes to G5s), ARMs (more than I can count) and x86s (same).

Truth be told, the assembly language I started with /helped a LOT/ with me becoming an efficient developer; a developer who understands what 'code' is being generated when he writes an expression, a statement, a loop, and one who understands what the runtime implications are for most of the 'sugar coating' HLLs give.

However, starting (a bit) with the 68k, then even more so with the PowerPC, it became pretty much impossible to write /from scratch/ an assembly equivalent that was QUICKER than the compiler generated code. That was 20+ years ago. DRAM latency happened, pipelining happened and SIMD happened.

Today, hand writing assembly is pretty much stupid on modern CPUs. Given the register files, timings, shadow registers, bus latencies etc etc the compiler will ALWAYS be better because there is so much criteria to think about when generating code...

I'm not saying that having the knowledge is not useful; the best use of assembly is to write some code in HLL, one that is supposed to be super-mega-critical-quick, then disassemble it and see how it looks. More often than not, you can't make it better than it is in situ -- most of what you will gain comes from preparing your data better, aligning it better etc etc -- basically, 'hinting' the compiler to do a better job. You can do serious code butchery like that, without a hint of assembler [0].

But really, I haven't written any assembly for /performance reasons/ in 15 years, and that was Altivec on PowerPC.

For 8 bits, it's all smooth as butter, but the article also doesn't take into account the massive progress in compilers; I'm the author of SimAVR [1] and I've seen my load of generated code for that CPU, and the GCC toolchain is /very hard to beat/ by hand these days.

[0]: critical audio loop on one of my old PCI card driver, converting float<->int, applying gain etc while using the register file to the max, and making most use of the pipelining of the G4 (at the time) https://gist.github.com/buserror/0a3a69cca927b8da6c9c7ee1605... -- note, the inner loop was generated by a script that was doing the cycle calculations (!)

[1]: https://github.com/buserror/simavr


> Today, hand writing assembly is pretty much stupid on modern CPUs

Yup. Explains all that neat hand-written AVX asm code in your video decoder, strcmp() implementation, lzma decompressor, utf8 parser, and the base64 decode logic in your browser.

A lot of people put in a lot of hard work so that you can have the cute thought that there is no more reason to write assembly. Many of them wrote your compilers, some of them wrote some of the logic I mentioned above. Quite sure that none of them appreciate being called "pretty much stupid".


You haven't written any Assembly, because the guys and girls that implemented the compilers you use have done it for you.

Someone has to create those tools.


This article is specific to the 6502 where the commonly used CC65 C compiler the author references produces much, much worse code speed wise than what you can with pure assembly. In that regard, the article is not simplistic in the least. Coincidentally, I messaged the author just yesterday about the for loop example to point out that it was generated without optimization. Even with optimization enabled, the code is still about 3 times slower than hand written assembly. I know this may not be typical for other architectures like AVR but it certainly is for 6502.


I used to do my production code exclusively in 6502 assembler, with some tools in P-system Pascal. As I would read in magazines about the C language I would try to imagine what the C compiler would generate for certain constructs, and I couldn't imagine it being efficient compared to other 8-bit processors. Then we decided to experiment (at the company) with C and got a compiler. I was right, the code was awful. It used exactly the idioms I thought I would use if I had to do it. I can picture a really top-notch compiler doing better (because I'm more familiar with optimization in compilers now), but sooner or later some of the quirks (like 8-bit index registers and only page 0 being usable for pointers) will catch you.


In my experience, you can easily beat a compiler for size, which may or may not correspond to speed.


This is like being an architect and noticing that bricks and cement are still relevant... Of course they are!


It's more like being an architect and working with bricks and cement yourself. The author's argument is that sometimes an "architect" should do that. Agree or disagree, but it's certainly not a truism.


An architect might in fact design something down to the brick-and-mortar level, such that someone else assembles it to the actual building. That architect is working with bricks and cement, effectively at the design level.

That's the same as using assembly language, rather than poking binary/hex values into memory.


Some architects are surprised by the requirement to have load bearing walls.



