
Modern CPUs are more difficult to program in assembly.

The simplicity of RISC-V is illusory. Because it lacks many features of normal ISAs, like ARM or Intel/AMD x86-64, writing programs that are both efficient and robust, i.e. that safely handle any errors, is quite difficult in assembly language.

For simpler programming in assembly language, it is hard to beat the DEC PDP-11 and the Motorola 68000 derivatives.

However, those are no longer directly useful in practice. For something useful, the best option would be to learn assembly programming on a development board for some Cortex-M microcontroller, preferably one with a less ancient core, e.g. Cortex-M23, Cortex-M33, Cortex-M55 or Cortex-M85, i.e. cores implementing the Armv8-M ISA (the latter two also implement the Helium vector extension).

A development board with a Cortex-M33 microcontroller would probably be the easiest to find, and it should cost no more than $20 to $30. For learning today I would not recommend any of the development boards with obsolete cores, like Cortex-M0+, Cortex-M3, Cortex-M4 or Cortex-M7, even if those boards can be found at $10 or even less.

Such a development board can be connected to any PC with a USB cable. All the development tools are free (there are paid alternatives, but those are not better than the free GNU tools).

You can compile or assemble your program on the PC, then load and run it on the development board. You can have a serial console window connecting to your program, by using a serial port on the development board and a USB serial interface. All such development boards have LEDs and various connectors for peripherals, allowing you to see what your program does.

I think that learning to program such an Armv8-M microcontroller in assembly is more useful than learning something like 6502. Armv8-M is less quirky than 6502 or RISC-V, and it is something that you can use for implementing some useful home device or even professionally.

Otherwise, the best option is to learn the assembly language of your personal computer, e.g. x86-64 or AArch64, but that is much more difficult than starting with a microcontroller development board from ST (e.g. a cheap STM32 Nucleo variant), NXP, Infineon, Renesas, Microchip, etc.



> Probably some development board with a microcontroller using Cortex-M33 would be the easiest to find and it should cost no more than $20 to $30.

The Pi Pico 2 RP2350 has dual Cortex-M33 cores (and RISC-V), and costs US$5.


> Because it lacks many features of normal ISAs

Do you have some examples of this?


The most important are the lack of integer overflow detection and indexed addressing. Integer overflow detection is required for any arithmetic operation unless it is possible to prove at compile time that overflow is impossible (which is possible mostly for operations with some counters or indices, whose values are confined inside known ranges), while indexed addressing is needed in all loops that access arrays, i.e. extremely frequently from the point of view of the number of actually executed instructions.
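To illustrate the distinction drawn above, here is a minimal C sketch, not taken from the original comment. `checked_sum` is a hypothetical helper, and `__builtin_add_overflow` is a GCC/Clang extension that is assumed to be available. The loop index is confined to a known range and needs no check, while the data-dependent sum does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums a[0..n-1] and reports signed overflow.
 * Returns true and stores the sum in *out on success;
 * returns false as soon as an addition would overflow. */
bool checked_sum(const int64_t *a, size_t n, int64_t *out)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* a[i] is an indexed access: address = a + i * sizeof(int64_t).
         * i stays within [0, n), so it cannot overflow and needs no check. */
        if (__builtin_add_overflow(sum, a[i], &sum)) {
            return false;   /* the data-dependent addition does need one */
        }
    }
    *out = sum;
    return true;
}
```

How many instructions the check and the `a[i]` access cost depends on the target ISA and the compiler, which is the point under discussion here.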

There are absolutely no reasons for omitting these essential features, because their hardware implementation is many times simpler and cheaper than either the software workarounds for their absence or than the hardware workarounds that can be implemented in other parts of the CPU core, e.g. instruction fusion.

6502 is much more similar to a normal CPU than RISC-V, because it has both integer overflow detection and indexed addressing.

While I believe that other ISAs are better than 6502 for learning assembly language for the first time, I consider 6502 as a much better choice than RISC-V.

RISC-V could be used for teaching if that would be done in the right way, i.e. by showing how to implement with the existing RISC-V instructions everything that is missing in RISC-V. In that case the students would still learn how to write real programs instead of toy didactic programs.

However I have not seen any source teaching RISC-V in this way and it would be tedious for newbies, in the same way as if they were first taught how to implement a floating-point library on a CPU without floating-point instructions, instead of being allowed to use the floating-point instructions that any CPU has today.


> integer overflow detection

What are you looking for here? Carry and overflow flags were explicitly not included because of the additional cost for OoO processors.

Let's compare overflow detection on RISC-V vs aarch64:

    unsigned 64-bit:
        add: RV: add+bltu         Arm: adds+bcc
        sub: RV: sub+bltu         Arm: subs+bcs
        mul: RV: mulhu+mul+beqz   Arm: umulh+mul+cbz

    unsigned 32-bit:
        add: RV: addw+bgeu       Arm: adds+bcc
        sub: RV: subw+bgeu       Arm: subs+bcs
        mul: RV: mul+slli+beqz   Arm: umull+cmp lsr 32

    signed 64-bit:
        add: RV: add+slt+slti+beq    Arm: adds+bvc
        sub: RV: sub+slt+slti+beq    Arm: subs+bvs
        mul: RV: mulh+mul+srai+beq   Arm: smulh+mul+cmp asr 63

    signed 32-bit:
        add: RV: addw+add+beq     Arm: adds+bvc
        sub: RV: subw+sub+beq     Arm: subs+bvs
        mul: RV: mul+sext.w+beq   Arm: smull+asr+cmp asr 31

So it's on par for unsigned, and RISC-V takes two additional independent instructions for signed 64-bit and one for signed 32-bit.

For teaching, using unsigned XLEN-bit values by default is probably a good idea anyway.
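As a rough C rendering of the flag-free checks in the RV column above (a sketch only; the function names are made up for illustration, and `unsigned __int128` is a GCC/Clang extension assumed to exist on a 64-bit target):

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned add (add + bltu): the wrapped sum is smaller than an
 * operand exactly when there was a carry out. */
bool uadd64_overflows(uint64_t a, uint64_t b)
{
    uint64_t sum = a + b;   /* wraps modulo 2^64, well defined in C */
    return sum < a;
}

/* Signed add (add + slt + slti + beq): overflow iff (sum < a)
 * disagrees with (b < 0). The addition is done on unsigned values to
 * avoid undefined behaviour; converting back to int64_t is
 * implementation-defined before C23 (GCC and Clang wrap). */
bool sadd64_overflows(int64_t a, int64_t b)
{
    int64_t sum = (int64_t)((uint64_t)a + (uint64_t)b);
    return (sum < a) != (b < 0);
}

/* Unsigned mul (mulhu + mul + beqz): overflow iff the high half of
 * the 128-bit product is nonzero. */
bool umul64_overflows(uint64_t a, uint64_t b)
{
    unsigned __int128 wide = (unsigned __int128)a * b;
    return (uint64_t)(wide >> 64) != 0;
}

/* Signed 32-bit add (addw + add + beq): compute the sum at full
 * width and test whether it still fits in 32 bits. */
bool sadd32_overflows(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;
    return wide < INT32_MIN || wide > INT32_MAX;
}
```

On a flags-based ISA the same functions would typically compile to the shorter sequences in the Arm column, but that depends on the compiler and optimization level.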

> indexed addressing

I'm not sure how much this actually matters in practice. It's nice when you access multiple arrays at the same index, such that you only need to increment one index instead of every pointer. Such loops are often vectorized, and the indexed loads become useless once you read two values from an array index, e.g. an array of structs.

Edit: removed measurements, because I'm not sure they are correct, might add back later.


The cost of providing carry and overflow is absolutely negligible in any CPU and even more so in an OoO CPU, which is many times more complex.

If you mean the case where the flags are stored not in a general-purpose register (a possible choice, though it requires an extra register-file write port) but in a dedicated flags register, as in x86 or ARM, then the flags register must also be renamed to allow concurrent operations, like any other register. That is a minor complication on top of the register renaming already done for all other registers.

What is extremely expensive is not having overflow and carry in hardware and having to implement software workarounds that require several times more instructions.

When loops are vectorized or you have an array of structures, this does not change anything; you still use the same indexed addressing (or auto-incremented addressing in ISAs that have it). Perhaps you are thinking of scaled indexed addressing, which may not always work for an array of structures, but in such cases you just use simple indexed addressing, with a scale factor of 1.

Without either indexed addressing or auto-incremented addressing, you need an extra addition instruction for each memory access, which increases code size and limits execution speed by occupying an extra execution port.
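A small C sketch of the two loop shapes being compared (the function names are hypothetical, and which instructions a compiler actually emits depends on the target and optimization level; an optimizer may also turn one form into the other):

```c
#include <stddef.h>
#include <stdint.h>

/* Index form: one induction variable i serves both arrays.
 * With indexed addressing, each a[i] and b[i] can be a single load
 * and only i is incremented per iteration. */
int64_t dot_indexed(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int64_t)a[i] * b[i];
    }
    return acc;
}

/* Pointer form: with only base+offset addressing and no
 * auto-increment, every array needs its own pointer and each one
 * is bumped by a separate addition. */
int64_t dot_pointers(const int32_t *a, const int32_t *b, size_t n)
{
    const int32_t *end = a + n;
    int64_t acc = 0;
    while (a != end) {
        acc += (int64_t)*a * *b;
        a++;    /* one addition per array ... */
        b++;    /* ... instead of one per loop */
    }
    return acc;
}
```

Both functions compute the same result; the difference is only in how many address updates the loop body carries.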

Because of this, the highest-performing RISC-V implementations have added non-standard ISA extensions for indexed addressing, but such extensions are still rather awkward because the base instruction encoding was not designed to allow indexed addressing, so the extensions must use a quite different encoding that has to be squeezed into a limited encoding space.


> indexed addressing

Ok, this time I have a proper benchmark.

The XuanTie C910 core supports a custom extension with indexed loads [0], and I got my access to a Milk-V Pioneer server with that CPU working again.

As a quick benchmark I used a self compile of the chibicc C compiler: https://godbolt.org/z/4MMxsEarE

Measuring with the equivalent of rv64gcb and rv64gcb_xtheadmemidx, the xtheadmemidx variant ended up about 0.3% faster on the same core.

[0] https://github.com/XUANTIE-RV/thead-extension-spec/blob/mast...



