patternMinor

Unsigned integer division ARM Cortex-M0+ Assembly

Submitted by: @import:stackexchange-codereview·Mar 10, 2026·

Viewed 0 times

divisionunsignedcortexassemblyarminteger

Problem

I am writing a subroutine for unsigned integer division in Assembly. I will call the subroutine DIVU.

Inputs: R1 will be the dividend. The divisor will be in R0.

Outputs: The quotient is going to be in RO and the remained in R1.

Basically, I am trying to make something like this:

R1 / R0 = R0remainderR1

If R0=0, I want to leave the input parameters unchanged and set the C flag when it returns. Otherwise, I just want to clear the C flag. I do not want to change any other registers' values after returning.

I have followed this idea:

Quotient = 0; 
while (Dividend >= Divisor) { 
    Dividend -= Divisor; 
    Quotient += 1;
}
Remainder = Dividend;

This is just a learning exercise, so the low performance of repeated subtraction is ok, as discussed in comments on the original Stack Overflow question posted before writing this code.

And in Assembly this is what I produced:

DIVU
    CMP R1,#0       ;compares R1 to 0
    BEQ AnsZero     ;if R1=0, it branches to AnsZero (the final answer will be 0)
    CMP R0,#0       ;compares R0 to 0
    BEQ EndFlag     ;if R0=0, it will go to the end to set C flag
    PUSH    {R3, LR}        ;saves R3 so it can used as a counter for quotient
    MOV R3,#0       ;sets R3 to 0
    While   CMP R0,R1       ;start of while loop 
    BLT EndWhil     ;Branches to end of while when dividend < divisor, otherwise goes through loop
    SUB R1,R1,R0    ;R1=R1-R0 , dividend=dividend-divisor
    ADD R3,R3,#1    ;R3=R3+1, quotient=quotient+1 (init is zero, so 0+1=1 if one successful loop)
    B   While       ;continues loop

EndWhil MOV R0,R3       ;R0=R3, the register that had the divisor gets the quotient
    POP {R3, PC}        ;R3's original value is returned
    BX  LR      ;ends subroutine

EndFlag SUBS    R0,R0,#1 
    MOV     R0, #0  
    BX  LR      ;ends subroutine

AnsZero MOV R0,#0       ;sets R0=0 because R1=1, 0/X=0r0
    BX  LR      ;ends subroutine

    BX  LR      ;ends subroutine

Solution

You have at least a couple bugs. The LT condition is signed less-than. You need BLO to branch on the unsigned Lower-Than condition (branch if Carry is unset). See also this article about carry vs. overflow.

Also, I think you forgot to put the remainder into R1.

Your custom calling convention makes life difficult. Flag return values appear to be cumbersome in Thumb mode, because many instructions are only available in flag-setting form. (Cortex-M0 only supports Thumb mode, with these instructions.) It's also strange to not let your function clobber R2 and R3 like the standard calling convention allows. This would reduce code-size for the function, although it would increase overall code size if there are many call sites.

It's normal to arrange a loop so the conditional branch is at the end. That reduces the instruction-count by one (removing the unconditional branch). Sometimes you need to test if the loop should even run once before falling into it, or jump to the test at the end, if you can't guarantee that it should always run at least once (do{}while() style).

You can combine the exit code-paths around AnsZero. You have MOV R0, #0 / BX LR twice, so you should just put AnsZero pointing at the first one and leave out the second. You also have two consecutive BX LR instructions, where you previously had a B to the next line at the end of the function. Never branch to the instruction that normal fall-through execution would take you to anyway.

The comment character in ARM asm is @. ; is used in x86 NASM / MASM, but the GNU assembler uses it to separate multiple instructions on the same line. Maybe there are ARM assemblers that use ; as a comment character, but making your code assemble with GAS seems like a good idea. Note that mov r3, #0 won't assemble with -mcpu=cortex-m0, because movs is the only immediate-mov instruction it supports. Cortex-M0 has very limited instruction choices.

Further style points: use : after label names, even if your assembler syntax doesn't technically require it. Some people may like to omit it when assembling data sections, but I don't think anyone likes it for code sections.

All-upper-case for asm instructions and register names is a valid choice. I don't like it, but I guess it doesn't hurt. Using it for symbol names is a bad idea, because you don't want to have to use all-caps names to call it from C.

Avoid useless comments like CMP R0,#0 ;compares R0 to 0. asm mnemonics are not that hard to decipher (except PowerPC). Comment space is limited, don't waste it saying the same thing the reader learned from reading the code itself.

Leave blank spaces between logically-separate blocks of code, even when there aren't branches. This improves human-readability.

I like to leave a space between operands in the operand list, like cmp r0, #0 instead of cmp r0,#0.
My version:

Always comment the top of your function with some high level description of input/output register usage. Just like in a higher-level language, describe the contract the function makes with its caller.

My asm code usually ends up littered with comments about alternatives I decided against. It's not an ideal example of good style.
`.syntax unified @ allow 2 or 3 operand forms of instructions.
.cpu cortex-m0
.thumb @ this is probably implied already by the .cpu

@@ input: R0=divisor, R1=dividend
@@ calculate R1/R0 by repeated subtraction
@@ output: R0=quotient, R1=remainder, C flag unset.
@@ or on division by zero: R0,R1 unchanged, C flag set.
@@ Other regs unmodified (even r2 and r3, which the normal calling convention allows functions to use as scratch regs)
.globl divu
divu:
CMP R1, #0 @ return 0,0 instead of divide error for the 0/0 corner case.
BEQ zero_dividend @ label names that describe why you go there are usually good. Comments at the label can describe what happens there.

CMP R0, #0
BEQ div_by_zero

@CMP R0, R1 @ let this case fall through the loop once, instead of slowing down the common case to speed up this special case.
@BLO QuotientZero

PUSH {R3, LR} @ LR doesn't make a good scratch reg, since many insns can only use low regs (R0-R7). Push/popping it saves a BX LR
MOVS R3, #0 @ R3 = quotient = repeated-subtraction counter
@ LDR R3, =#-1 @ account for the loop overshoot up-front. But don't do this because cortex-m0 can't encode it in one insn other than a PC-relative load

sub_loop: @do{
ADDS R3, #1 @ quotient += 1. (init is zero, so 0+1=1 if one successful loop)
SUBS R1, R0 @ dividend -= divisor and set flags,
@ CMP R0, R1 @ ...avoiding this cmp instruction. Potentially a significant speedup for a tight loop.
BLO sub_loop @} while(that didn't carry)@ i.e. while divisor was lower (unsigned) than the old value of dividend.

Code Snippets

$ arm-linux-gnueabi-objdump -d arm-divu.o 

arm-divu.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <divu>:
   0:   2900            cmp     r1, #0
   2:   d00a            beq.n   1a <zero_dividend>
   4:   2800            cmp     r0, #0
   6:   d007            beq.n   18 <div_by_zero>
   8:   b508            push    {r3, lr}
   a:   2300            movs    r3, #0

0000000c <sub_loop>:
   c:   3301            adds    r3, #1
   e:   1a09            subs    r1, r1, r0
  10:   d3fc            bcc.n   c <sub_loop>
  12:   4401            add     r1, r0
  14:   1e58            subs    r0, r3, #1
  16:   4280            cmp     r0, r0
  18:   bd08            pop     {r3, pc}

0000001a <div_by_zero>:
  1a:   3801            subs    r0, #1

0000001c <zero_dividend>:
  1c:   2000            movs    r0, #0
  1e:   4770            bx      lr

Context

StackExchange Code Review Q#141388, answer score: 4

Revisions (0)

No revisions yet.