patternMinor
Unsigned integer division ARM Cortex-M0+ Assembly
Viewed 0 times
divisionunsignedcortexassemblyarminteger
Problem
I am writing a subroutine for unsigned integer division in Assembly. I will call the subroutine
Inputs:
Outputs: The quotient is going to be in
Basically, I am trying to make something like this:
If
I have followed this idea:
This is just a learning exercise, so the low performance of repeated subtraction is ok, as discussed in comments on the original Stack Overflow question posted before writing this code.
And in Assembly this is what I produced:
DIVU.Inputs:
R1 will be the dividend. The divisor will be in R0.Outputs: The quotient is going to be in
RO and the remained in R1.Basically, I am trying to make something like this:
R1 / R0 = R0remainderR1If
R0=0, I want to leave the input parameters unchanged and set the C flag when it returns. Otherwise, I just want to clear the C flag. I do not want to change any other registers' values after returning.I have followed this idea:
Quotient = 0;
while (Dividend >= Divisor) {
Dividend -= Divisor;
Quotient += 1;
}
Remainder = Dividend;This is just a learning exercise, so the low performance of repeated subtraction is ok, as discussed in comments on the original Stack Overflow question posted before writing this code.
And in Assembly this is what I produced:
DIVU
CMP R1,#0 ;compares R1 to 0
BEQ AnsZero ;if R1=0, it branches to AnsZero (the final answer will be 0)
CMP R0,#0 ;compares R0 to 0
BEQ EndFlag ;if R0=0, it will go to the end to set C flag
PUSH {R3, LR} ;saves R3 so it can used as a counter for quotient
MOV R3,#0 ;sets R3 to 0
While CMP R0,R1 ;start of while loop
BLT EndWhil ;Branches to end of while when dividend < divisor, otherwise goes through loop
SUB R1,R1,R0 ;R1=R1-R0 , dividend=dividend-divisor
ADD R3,R3,#1 ;R3=R3+1, quotient=quotient+1 (init is zero, so 0+1=1 if one successful loop)
B While ;continues loop
EndWhil MOV R0,R3 ;R0=R3, the register that had the divisor gets the quotient
POP {R3, PC} ;R3's original value is returned
BX LR ;ends subroutine
EndFlag SUBS R0,R0,#1
MOV R0, #0
BX LR ;ends subroutine
AnsZero MOV R0,#0 ;sets R0=0 because R1=1, 0/X=0r0
BX LR ;ends subroutine
BX LR ;ends subroutineSolution
You have at least a couple bugs. The LT condition is signed less-than. You need BLO to branch on the unsigned Lower-Than condition (branch if Carry is unset). See also this article about carry vs. overflow.
Also, I think you forgot to put the remainder into R1.
Your custom calling convention makes life difficult. Flag return values appear to be cumbersome in Thumb mode, because many instructions are only available in flag-setting form. (Cortex-M0 only supports Thumb mode, with these instructions.) It's also strange to not let your function clobber R2 and R3 like the standard calling convention allows. This would reduce code-size for the function, although it would increase overall code size if there are many call sites.
It's normal to arrange a loop so the conditional branch is at the end. That reduces the instruction-count by one (removing the unconditional branch). Sometimes you need to test if the loop should even run once before falling into it, or jump to the test at the end, if you can't guarantee that it should always run at least once (
You can combine the exit code-paths around AnsZero. You have
The comment character in ARM asm is
Further style points: use
All-upper-case for asm instructions and register names is a valid choice. I don't like it, but I guess it doesn't hurt. Using it for symbol names is a bad idea, because you don't want to have to use all-caps names to call it from C.
Avoid useless comments like
Leave blank spaces between logically-separate blocks of code, even when there aren't branches. This improves human-readability.
I like to leave a space between operands in the operand list, like
My version:
Always comment the top of your function with some high level description of input/output register usage. Just like in a higher-level language, describe the contract the function makes with its caller.
My asm code usually ends up littered with comments about alternatives I decided against. It's not an ideal example of good style.
`.syntax unified @ allow 2 or 3 operand forms of instructions.
.cpu cortex-m0
.thumb @ this is probably implied already by the .cpu
@@ input: R0=divisor, R1=dividend
@@ calculate R1/R0 by repeated subtraction
@@ output: R0=quotient, R1=remainder, C flag unset.
@@ or on division by zero: R0,R1 unchanged, C flag set.
@@ Other regs unmodified (even r2 and r3, which the normal calling convention allows functions to use as scratch regs)
.globl divu
divu:
CMP R1, #0 @ return 0,0 instead of divide error for the 0/0 corner case.
BEQ zero_dividend @ label names that describe why you go there are usually good. Comments at the label can describe what happens there.
CMP R0, #0
BEQ div_by_zero
@CMP R0, R1 @ let this case fall through the loop once, instead of slowing down the common case to speed up this special case.
@BLO QuotientZero
PUSH {R3, LR} @ LR doesn't make a good scratch reg, since many insns can only use low regs (R0-R7). Push/popping it saves a BX LR
MOVS R3, #0 @ R3 = quotient = repeated-subtraction counter
@ LDR R3, =#-1 @ account for the loop overshoot up-front. But don't do this because cortex-m0 can't encode it in one insn other than a PC-relative load
sub_loop: @do{
ADDS R3, #1 @ quotient += 1. (init is zero, so 0+1=1 if one successful loop)
SUBS R1, R0 @ dividend -= divisor and set flags,
@ CMP R0, R1 @ ...avoiding this cmp instruction. Potentially a significant speedup for a tight loop.
BLO sub_loop @} while(that didn't carry)@ i.e. while divisor was lower (unsigned) than the old value of dividend.
Also, I think you forgot to put the remainder into R1.
Your custom calling convention makes life difficult. Flag return values appear to be cumbersome in Thumb mode, because many instructions are only available in flag-setting form. (Cortex-M0 only supports Thumb mode, with these instructions.) It's also strange to not let your function clobber R2 and R3 like the standard calling convention allows. This would reduce code-size for the function, although it would increase overall code size if there are many call sites.
It's normal to arrange a loop so the conditional branch is at the end. That reduces the instruction-count by one (removing the unconditional branch). Sometimes you need to test if the loop should even run once before falling into it, or jump to the test at the end, if you can't guarantee that it should always run at least once (
do{}while() style).You can combine the exit code-paths around AnsZero. You have
MOV R0, #0 / BX LR twice, so you should just put AnsZero pointing at the first one and leave out the second. You also have two consecutive BX LR instructions, where you previously had a B to the next line at the end of the function. Never branch to the instruction that normal fall-through execution would take you to anyway.The comment character in ARM asm is
@. ; is used in x86 NASM / MASM, but the GNU assembler uses it to separate multiple instructions on the same line. Maybe there are ARM assemblers that use ; as a comment character, but making your code assemble with GAS seems like a good idea. Note that mov r3, #0 won't assemble with -mcpu=cortex-m0, because movs is the only immediate-mov instruction it supports. Cortex-M0 has very limited instruction choices.Further style points: use
: after label names, even if your assembler syntax doesn't technically require it. Some people may like to omit it when assembling data sections, but I don't think anyone likes it for code sections.All-upper-case for asm instructions and register names is a valid choice. I don't like it, but I guess it doesn't hurt. Using it for symbol names is a bad idea, because you don't want to have to use all-caps names to call it from C.
Avoid useless comments like
CMP R0,#0 ;compares R0 to 0. asm mnemonics are not that hard to decipher (except PowerPC). Comment space is limited, don't waste it saying the same thing the reader learned from reading the code itself.Leave blank spaces between logically-separate blocks of code, even when there aren't branches. This improves human-readability.
I like to leave a space between operands in the operand list, like
cmp r0, #0 instead of cmp r0,#0.My version:
Always comment the top of your function with some high level description of input/output register usage. Just like in a higher-level language, describe the contract the function makes with its caller.
My asm code usually ends up littered with comments about alternatives I decided against. It's not an ideal example of good style.
`.syntax unified @ allow 2 or 3 operand forms of instructions.
.cpu cortex-m0
.thumb @ this is probably implied already by the .cpu
@@ input: R0=divisor, R1=dividend
@@ calculate R1/R0 by repeated subtraction
@@ output: R0=quotient, R1=remainder, C flag unset.
@@ or on division by zero: R0,R1 unchanged, C flag set.
@@ Other regs unmodified (even r2 and r3, which the normal calling convention allows functions to use as scratch regs)
.globl divu
divu:
CMP R1, #0 @ return 0,0 instead of divide error for the 0/0 corner case.
BEQ zero_dividend @ label names that describe why you go there are usually good. Comments at the label can describe what happens there.
CMP R0, #0
BEQ div_by_zero
@CMP R0, R1 @ let this case fall through the loop once, instead of slowing down the common case to speed up this special case.
@BLO QuotientZero
PUSH {R3, LR} @ LR doesn't make a good scratch reg, since many insns can only use low regs (R0-R7). Push/popping it saves a BX LR
MOVS R3, #0 @ R3 = quotient = repeated-subtraction counter
@ LDR R3, =#-1 @ account for the loop overshoot up-front. But don't do this because cortex-m0 can't encode it in one insn other than a PC-relative load
sub_loop: @do{
ADDS R3, #1 @ quotient += 1. (init is zero, so 0+1=1 if one successful loop)
SUBS R1, R0 @ dividend -= divisor and set flags,
@ CMP R0, R1 @ ...avoiding this cmp instruction. Potentially a significant speedup for a tight loop.
BLO sub_loop @} while(that didn't carry)@ i.e. while divisor was lower (unsigned) than the old value of dividend.
Code Snippets
$ arm-linux-gnueabi-objdump -d arm-divu.o
arm-divu.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <divu>:
0: 2900 cmp r1, #0
2: d00a beq.n 1a <zero_dividend>
4: 2800 cmp r0, #0
6: d007 beq.n 18 <div_by_zero>
8: b508 push {r3, lr}
a: 2300 movs r3, #0
0000000c <sub_loop>:
c: 3301 adds r3, #1
e: 1a09 subs r1, r1, r0
10: d3fc bcc.n c <sub_loop>
12: 4401 add r1, r0
14: 1e58 subs r0, r3, #1
16: 4280 cmp r0, r0
18: bd08 pop {r3, pc}
0000001a <div_by_zero>:
1a: 3801 subs r0, #1
0000001c <zero_dividend>:
1c: 2000 movs r0, #0
1e: 4770 bx lrContext
StackExchange Code Review Q#141388, answer score: 4
Revisions (0)
No revisions yet.