Checking substring in 8086 ASM


Problem

I tried the following to check whether a substring occurs in a main string on the 8086. Is there a shorter way of doing this? My implementation seems lengthy.

DATA SEGMENT
STR1 DB 'MADAM'
LEN1 DW ($-STR1);       storing the length of STR1
STR2 DB 'MADAA'
LEN2 DW ($-STR2);       storing the length of STR2
DATA ENDS

CODE SEGMENT

LEA SI, STR1
LEA DI, STR2
MOV DX, LEN1
MOV CX, LEN2
CMP CX, DX;             comparing main & substring length
JA EXIT;                if the substring is longer than the main string, it cannot be found in it
JE SAMELENGTH;          if the main string and substring have the same length, then we can compare them directly
JB FIND;                general case (substring length < main string length)
        ; ... code elided
        ADD SP, 0001H
        CLD
        REPE CMPSB
        JNE TEMPRED
        JMP GREEN

TEMPRED:;               substring not found starting from the current character of main string, but it is possible to find match if we start from next character in main string
        MOV SI,SP;      going to the next character of main string (after REPE CMPSB of CHECK segment)
        DEC DX
        LEA DI, STR2;   reloading substring index in DI (after REPE CMPSB of CHECK segment)
        JMP FIND;       if a character matches but the following substring mismatches in main string then we start over the same process from the next character of main string by going to FIND segment         

GREEN:  
        MOV BX, 0001H;  substring found
        JMP EXIT

RED:    
        MOV BX, 0000H;  substring not found
        JMP EXIT

EXIT:         
    CODE ENDS
    END
    RET

Solution

There are a number of things that could be improved with this code. I hope you find these suggestions helpful.

Specify which assembler

Unlike C or Python, assembly language has a great many syntax variations, even for a single architecture such as the x86 targeted by this code. Generally, it's useful to note which assembler, which target processor and which OS (if any) in the comments at the top of the file. In this case, it looked most like 16-bit TASM, so that's the assembler I used to test this code.

Use an ASSUME directive

The code would not assemble for me until I added an ASSUME directive. The ASSUME directive doesn't actually generate any code. It simply specifies which assumptions the assembler should make when generating the output. It also helps human readers of your code understand the intended context. In this particular case, I added this line just after the CODE SEGMENT declaration:

ASSUME CS:CODE, DS:DATA, ES:DATA


The CS and DS assumptions are obvious, but the ES assumption is less so. The code uses the CMPSB instruction, which compares the byte at DS:SI with the byte at ES:DI, so it implicitly assumes that ES also points to the DATA segment. In my case (emulated 16-bit DOS), I had to add a few statements to the start of the code to actually load the DS and ES segment registers appropriately.
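The statements that load the segment registers might look like the following. This is a minimal sketch in TASM/MASM syntax, assuming the DATA segment name from the question; AX is used only as a temporary because a segment register cannot be loaded from an immediate value.

```asm
        MOV AX, DATA    ; AX = segment address of DATA
        MOV DS, AX      ; DS -> DATA (STR1, STR2, LEN1, LEN2; source side of CMPSB)
        MOV ES, AX      ; ES -> DATA (destination side of CMPSB, ES:DI)
```

These three instructions would go at the very start of the code, before the first LEA.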

Avoid instructions outside any segment

The EXIT code currently looks like this:

EXIT:         
        CODE ENDS
        END
        RET


The problem is that CODE ENDS closes the CODE segment and the END directive tells the assembler that there is no more code. As a result, the RET instruction may or may not be assembled, and even if it is, it may not be placed in the CODE segment. You probably meant to do this instead:

EXIT:         
        RET
        CODE ENDS
        END


Eliminate convoluted branching

Avoid needless branching. Extra branches make your code harder to read and slower to execute. For example, the code currently has this:

        JA EXIT
        JE SAMELENGTH
        JB FIND

SAMELENGTH:
        CLD
        REPE CMPSB
        JNE RED
        JMP GREEN
        ; ... code elided
GREEN:  
        MOV BX, 0001H;  substring found
        JMP EXIT

RED:    
        MOV BX, 0000H;  substring not found
        JMP EXIT
EXIT:


This could be simplified considerably:

        JA EXIT
        JB FIND
        ;  fall through to same length
SAMELENGTH:
        XOR BX,BX      ; assume string not found
        CLD
        REPE CMPSB
        JNE EXIT
        INC BX         ; indicate that string was found
EXIT:


There are a number of such simplifications possible with little effort.
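Two more simplifications of the same kind, sketched against the labels visible in the question (GREEN, TEMPRED, RED, EXIT): a conditional jump followed by an unconditional one can often be inverted so that one path simply falls through, and a JMP to the label on the very next line can be dropped.

```asm
        REPE CMPSB
        JE  GREEN       ; match: report success
        ; mismatch: fall through into TEMPRED (replaces JNE TEMPRED / JMP GREEN)
TEMPRED:
        ; ... code elided

RED:
        MOV BX, 0000H   ; substring not found
        ; EXIT follows immediately, so no JMP EXIT is needed
EXIT:
```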

Know your instruction set

The code currently has this set of instructions:

        DEC DX
        CMP DX, 0000H
        JE RED


However, the DEC instruction already sets the Z flag, so the CMP instruction is not needed.
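The shortened sequence would then be something like this, reusing the RED label from the question:

```asm
        DEC DX          ; DEC sets ZF when the result is zero
        JZ  RED         ; counter reached zero: substring not found
```

JZ and JE are the same opcode; JZ just reads more naturally after a decrement. Note that DEC leaves the carry flag unchanged, which does not matter here because only ZF is tested.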

Use REPNE SCASB as appropriate

The loop at FIND does largely what a single REPNE SCASB instruction would do; the only difference is in which registers are used (SCASB compares AL against the byte at ES:DI and counts down CX). The code you have isn't necessarily wrong, but it could probably be shorter.
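As an illustration, here is one way the whole search could be built around REPNE SCASB and REPE CMPSB. This is a sketch under stated assumptions, not a tested drop-in replacement: it assumes DS and ES both point to DATA, that LEN2 is at least 1, and that a stack is available. The names SEARCH, SCAN, FOUND and DONE are hypothetical; STR1, LEN1, STR2 and LEN2 are the definitions from the question.

```asm
; Returns BX = 1 if STR2 occurs in STR1, otherwise BX = 0.
; Assumes DS = ES = DATA, LEN2 >= 1, and a usable stack.
SEARCH  PROC NEAR
        XOR  BX, BX          ; assume not found
        MOV  DX, LEN2        ; DX = substring length
        MOV  CX, LEN1
        SUB  CX, DX          ; CX = LEN1 - LEN2
        JB   DONE            ; substring longer than main string
        INC  CX              ; CX = number of possible start positions
        LEA  DI, STR1        ; SCASB scans the bytes at ES:DI
        MOV  AL, STR2        ; AL = first byte of the substring
        CLD
SCAN:   REPNE SCASB          ; stop on a byte equal to AL or when CX = 0
        JNE  DONE            ; first byte not found in the remaining positions
        PUSH CX              ; save the number of positions still to try
        PUSH DI              ; save scan pointer (one past the matched byte)
        DEC  DI              ; back up to the matched byte
        LEA  SI, STR2
        MOV  CX, DX
        REPE CMPSB           ; compare substring (DS:SI) with main string (ES:DI)
        POP  DI              ; POP does not change the flags
        POP  CX
        JE   FOUND           ; all LEN2 bytes matched
        JCXZ DONE            ; no start positions left
        JMP  SCAN            ; resume scanning after the failed candidate
FOUND:  INC  BX              ; indicate that the string was found
DONE:   RET
SEARCH  ENDP
```

Limiting the scan to LEN1 - LEN2 + 1 start positions means the REPE CMPSB never reads past the end of STR1.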

Avoid using SP as a general register

Just after CHECK, the code saves a copy of the pointer (not an index as the comment falsely claims) in the SP register. However, SP is the stack pointer, so this code can only be used in an environment in which the stack is not used. That could be the case, but coding it that way makes the code much less portable, especially because the AX or BX register could just as easily have been used here.
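A sketch of the same bookkeeping with BX in place of SP follows. The instruction that originally copies the pointer into SP is not shown above, so the MOV BX, SI line is an assumption about its counterpart; the other two lines replace instructions that are visible in the question. BX is free for this because the visible code only assigns it at GREEN and RED.

```asm
        MOV BX, SI      ; assumed counterpart of the original copy into SP
        INC BX          ; replaces ADD SP, 0001H
        ; ... code elided
TEMPRED:
        MOV SI, BX      ; replaces MOV SI, SP
```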

Consider using standard length lines

The comments in the code are very long, and each semicolon comes right after the instruction. Neither of these things is necessarily wrong, but both differ from the usual convention, which is to align the semicolons in a fixed column and to keep lines no longer than 72 characters (some use 78).
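For example, three lines from the start of the question's code might be laid out like this. The comment column and the shortened wording are only one possible choice:

```asm
        CMP     CX, DX          ; compare substring and main string lengths
        JA      EXIT            ; a longer substring can never be found
        JE      SAMELENGTH      ; equal lengths: compare the strings directly
```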

Context

StackExchange Code Review Q#60389, answer score: 9