patternMinor
Checking substring in 8086 ASM
substring checking asm 8086
Problem
I tried the following to check whether a substring occurs in a main string on the 8086. Is there a shorter way of doing this? My implementation seems lengthy.
DATA SEGMENT
STR1 DB 'MADAM'
LEN1 DW ($-STR1); storing the length of STR1
STR2 DB 'MADAA'
LEN2 DW ($-STR2); storing the length of STR2
DATA ENDS
CODE SEGMENT
LEA SI, STR1
LEA DI, STR2
MOV DX, LEN1
MOV CX, LEN2
CMP CX, DX; comparing main & substring length
JA EXIT; if the substring is longer than the main string, it cannot be found in it
JE SAMELENGTH; if main string & substring have the same length, then we can compare them directly
JB FIND; general case (substring length < ... FIND
ADD SP, 0001H
CLD
REPE CMPSB
JNE TEMPRED
JMP GREEN
TEMPRED:; substring not found starting from the current character of main string, but it is possible to find match if we start from next character in main string
MOV SI,SP; going to the next character of main string (after REPE CMPSB of CHECK segment)
DEC DX
LEA DI, STR2; reloading substring index in DI (after REPE CMPSB of CHECK segment)
JMP FIND; if a character matches but the following substring mismatches in main string then we start over the same process from the next character of main string by going to FIND segment
GREEN:
MOV BX, 0001H; substring found
JMP EXIT
RED:
MOV BX, 0000H; substring not found
JMP EXIT
EXIT:
CODE ENDS
END
RET
Solution
There are a number of things that could be improved with this code. I hope you find these suggestions helpful.
Specify which assembler
Unlike C or Python, there are a great many variations in assembler syntax, even for the same architecture, such as the x86 of this code. Generally, it's useful to note which assembler, which target processor and which OS (if any) in the comments at the top of the file. In this case, it looked most like 16-bit TASM, so that's the assembler I used to test this code.
Use an ASSUME directive
The code would not assemble for me until I added an ASSUME directive. The ASSUME directive doesn't actually generate any code. It simply specifies which assumptions the assembler should make when generating the output. It also helps human readers of your code understand the intended context. In this particular case, I added this line just after the CODE SEGMENT declaration:
ASSUME CS:CODE, DS:DATA, ES:DATA
The CS and DS assumptions are obvious, but the ES assumption is less so. However, the code uses the CMPSB instruction and based on the context, this means an implicit assumption that ES also points to the DATA segment. In my case (emulated 16-bit DOS), I had to add a few statements to the start of the code to actually load the DS and ES segment registers appropriately.
Avoid instructions outside any segment
The EXIT code currently looks like this:
EXIT:
CODE ENDS
END
RET
The problem is that CODE ENDS closes the CODE segment and the END directive tells the assembler that there is no more code, and thus the RET instruction may or may not be assembled, and may or may not actually be placed in the CODE segment. You probably meant instead to do this:
EXIT:
RET
CODE ENDS
END
Eliminate convoluted branching
Avoid needless branching. It makes your code harder to read and slower to execute. For example, the code currently has this:
JA EXIT
JE SAMELENGTH
JB FIND
SAMELENGTH:
CLD
REPE CMPSB
JNE RED
JMP GREEN
; ... code elided
GREEN:
MOV BX, 0001H; substring found
JMP EXIT
RED:
MOV BX, 0000H; substring not found
JMP EXIT
EXIT:
This could be very much simplified:
JA EXIT
JB FIND
; fall through to same length
SAMELENGTH:
XOR BX,BX ; assume string not found
CLD
REPE CMPSB
JNE EXIT
INC BX ; indicate that string was found
EXIT:
There are a number of such simplifications possible with little effort.
Know your instruction set
The code currently has this set of instructions:
DEC DX
CMP DX, 0000H
JE RED
However, the DEC instruction already sets the Z flag, so the CMP instruction is not needed.
Use REPNE SCASB as appropriate
The code at the location FIND is largely the same as would have been done by using REPNE SCASB. The only difference is in which registers are used. The code you have isn't necessarily wrong, but it could probably be shorter.
Avoid using SP as a general register
Just after CHECK, the code saves a copy of the pointer (not an index as the comment falsely claims) to the SP register. However, SP is a stack pointer, so this code can only be used in an environment in which the stack is not used. That could be the case, but it makes the code much less portable to code it that way, especially because the AX or BX registers could just as easily have been used here.
Consider using standard length lines
The comments in the code are very long and the semicolon is right after the instruction. Neither of these things is necessarily wrong, but they differ from the usual convention, which is to align the semicolon in a fixed column and to keep lines no longer than 72 characters (some use 78).
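As a rough sketch of how several of these suggestions might fit together, here is one possible shape for the whole program. It is not a tested drop-in replacement, and a number of details are my own assumptions rather than anything from the original code: the START label, the STK stack segment, the sample substring 'ADA', exiting through DOS INT 21h function 4Ch instead of RET, and the requirement that the substring is not empty. It also uses a plain position-by-position search with REPE CMPSB rather than the REPNE SCASB variant, and keeps the current start position in AX instead of SP.

```asm
; Assembler: TASM (MASM-compatible mode), target: 8086, OS: DOS
; Sets BX = 1 if STR2 occurs in STR1, else BX = 0.
; Assumption: STR2 is not empty.
STK SEGMENT STACK
        DW 64 DUP (?)           ; small stack (assumed size)
STK ENDS

DATA SEGMENT
STR1    DB 'MADAM'
LEN1    DW ($-STR1)             ; length of main string
STR2    DB 'ADA'                ; sample substring (illustrative)
LEN2    DW ($-STR2)             ; length of substring
DATA ENDS

CODE SEGMENT
        ASSUME CS:CODE, DS:DATA, ES:DATA
START:
        MOV AX, DATA            ; load DS and ES, because CMPSB
        MOV DS, AX              ; compares DS:SI with ES:DI
        MOV ES, AX
        CLD                     ; compare in ascending order
        XOR BX, BX              ; assume substring not found
        MOV DX, LEN1
        SUB DX, LEN2            ; DX = LEN1 - LEN2
        JB  EXIT                ; substring longer: cannot match
        INC DX                  ; DX = number of start positions
        LEA AX, STR1            ; AX -> current start position
NEXT:
        MOV SI, AX              ; DS:SI -> candidate in main string
        LEA DI, STR2            ; ES:DI -> substring
        MOV CX, LEN2
        REPE CMPSB              ; ZF = 1 if all LEN2 bytes matched
        JE  FOUND
        INC AX                  ; try the next start position
        DEC DX                  ; DEC sets ZF, so no CMP is needed
        JNZ NEXT
        JMP EXIT                ; every position tried, no match
FOUND:
        INC BX                  ; indicate that string was found
EXIT:
        MOV AL, BL              ; result doubles as DOS exit code
        MOV AH, 4CH             ; INT 21h, AH=4Ch: terminate
        INT 21H
CODE ENDS
END START
```

With the sample data above, the second start position ('ADA' inside 'MADAM') should match, so the intent is that BX ends up as 1. Computing the number of start positions once (LEN1 - LEN2 + 1) removes the need for a separate same-length case, since equal lengths simply give one position to try.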
Context
StackExchange Code Review Q#60389, answer score: 9