Program to retrieve key/message from a multiple times used one time pad

Submitted by: @import:stackexchange-codereview··

Problem

I wrote a program to recover the key and messages from 10 different ciphertexts that were all encrypted with the same key using an XOR one-time pad. The attack I use is crib dragging.

To do this, I wrote a Python script that converts a test string (the crib) to hex. It then XORs the 10 cipher strings (200 hex characters each), two at a time. After that, it XORs the test string with the 5 XOR hashes at each position and checks via a website whether the result is a reasonably good string to build on before trying again.

I already tested it and got positive results.

With the real ciphers, however, I have gotten nowhere so far, and I am not sure whether that is because of the script or because I am unable to expand on the valid string fragments I retrieve.

Here is the code. I know that the website lookup is very bad, but I found no other way that was this fast and easy (to implement, not to run).

EDIT: Since I do not think overwriting the code is good, I put it here in case somebody wants to see it: Gist link

# -*- coding: UTF-8 -*-
import codecs
from urllib.error import URLError
import urllib.request

forbiddenSigns = ['{', '}', '[', ']', '(', ')', '?', '=', '%', '_', ';', '-', '^', ':', '<', '>', '|', '!', '`', '´', "'", '\\']

with open("testCiphers.txt") as f:
    ciphers = f.readlines()

ciphers = [s.strip() for s in ciphers]

crib = "the"
cribList = []

for c in crib:
    cribList.append(hex(ord(c))[2:])

cribHash = ''.join(cribList)
cribLength = len(cribHash)

cipherHashList = []
for i in range(0, len(ciphers) - 1):
    cipherHashList.append(int(ciphers[i], 16) ^ int(ciphers[i + 1], 16))

hashResultList = [str(hex(hr))[2:] for hr in cipherHashList]

hexList = []

for hashResultString in hashResultList:
    for i in range(0, len(hashResultString) - cribLength + 1, 1):
        hexList.append(hashResultString[i:i + cribLength])

for ha in hexList:
    cribAttempt = int(cribHash, 16) ^ int(ha, 16)
    cribAttempt = (hex(cribAttempt))[2:].zfill(cribLength)
    clearText = codecs.decode(cribAttempt, "hex")
    if

Solution

List comprehensions

You use list comprehensions sometimes, and sometimes you use for ...: someList.append(...). This is not consistent. Use list comprehensions as much as you can since you are already building lists (and list comprehensions are slightly faster than for loops):

cribList = [hex(ord(letter))[2:] for letter in crib]

cipherHashList = [int(ciphers[i],16)^int(ciphers[i+1],16) for i in range(len(ciphers)-1)]

hexList = [hashResultString[i:i+cribLength] for hashResultString in hashResultList for i in range(len(hashResultString) - cribLength + 1)]


You can speed things up a little more (maybe at the cost of readability) by using more complex expressions in the list comprehension to avoid creating intermediate lists:

hashResultList = [hex(int(ciphers[i],16)^int(ciphers[i+1],16))[2:] for i in range(len(ciphers)-1)]
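
As a possible variation on the comprehensions above, adjacent ciphertexts can also be paired with zip instead of indexing. This is only a sketch: the three short hex strings are made up for illustration and are not the real ciphers.

```python
# Hypothetical sample data: three short hex strings standing in for the ciphers.
ciphers = ["1c0e1b", "0a1f00", "7f0102"]

# Index-based version, as in the comprehension above.
by_index = [int(ciphers[i], 16) ^ int(ciphers[i + 1], 16)
            for i in range(len(ciphers) - 1)]

# zip pairs each ciphertext with its successor, so no index arithmetic is needed.
by_zip = [int(a, 16) ^ int(b, 16) for a, b in zip(ciphers, ciphers[1:])]
```

Both lists hold the same values; which form reads better is a matter of taste.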


Naming

See PEP 8 for the rationale, but prefer UPPERCASE names for constants, such as FORBIDDEN_SIGNS or CRIB, and lower_snake_case for variables, such as hex_list or crib_length.

Some of the variables are named thingList; a plain plural such as things would read better.

Reusability

Write functions to make testing in the interpreter easier and to improve reusability:

def get_crib_hash():
    return ''.join(hex(ord(letter))[2:] for letter in CRIB)


Change the value of CRIB in the interpreter and voilà: every time you (or any function) call get_crib_hash(), the result reflects the new value without your having to recompute everything.

For efficiency reasons I recommend using a generator function for file reading.
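
A minimal sketch of such a generator, assuming one hex-encoded ciphertext per line. The name read_ciphers is made up for this example, and skipping blank lines is a choice of the sketch rather than something the original script does.

```python
def read_ciphers(filename):
    """Yield each non-empty line of the file, stripped of whitespace."""
    with open(filename) as f:
        # Iterating over the file object reads one line at a time,
        # so the whole file is never held in memory at once.
        for line in f:
            line = line.strip()
            if line:
                yield line
```

Calling list(read_ciphers("testCiphers.txt")) would give the same kind of list the original script builds with readlines() and strip().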

Flaws

On my computer, instead of returning b'test':

>>> codecs.decode("74657374","hex")
Traceback (most recent call last):
  File "<pyshell#39>", line 1, in <module>
    codecs.decode("74657374","hex")
LookupError: unknown encoding: hex


A more robust approach is binascii.unhexlify, which does exactly what you want.
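
For illustration, here is the same hex string decoded with binascii.unhexlify, with the built-in bytes.fromhex shown as an alternative. This assumes Python 3.3 or later, where unhexlify accepts an ASCII str as well as bytes.

```python
import binascii

hex_text = "74657374"

# unhexlify turns a string of hex digits into the bytes they encode.
raw = binascii.unhexlify(hex_text)

# bytes.fromhex is a built-in alternative that gives the same result.
same = bytes.fromhex(hex_text)

print(raw, raw == same)  # prints: b'test' True
```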

For your issue with “real ciphers”, you might want to print the clear text first instead of looking words up through a web API. It will help you understand what is happening.
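
One way to follow that advice is a small helper that works on bytes and simply prints every candidate. This is a sketch, not the original script: the names xor_bytes and drag_crib, the key, and the two messages are all made up. It also slides the crib one whole byte (two hex digits) at a time and keeps leading zeros, which are assumptions about the intended alignment.

```python
def xor_bytes(a, b):
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def drag_crib(xored, crib):
    """Yield (offset, candidate) for the crib XORed at each byte offset."""
    for offset in range(len(xored) - len(crib) + 1):
        yield offset, xor_bytes(xored[offset:offset + len(crib)], crib)


# Made-up example: two messages encrypted with the same hypothetical key.
key = bytes(range(1, 20))
m1 = b"meet at the bridge"
m2 = b"bring the red file"
c1, c2 = xor_bytes(m1, key), xor_bytes(m2, key)

# The key cancels out: c1 ^ c2 == m1 ^ m2.
xored = xor_bytes(c1, c2)

# Print every candidate so readable fragments can be spotted by eye.
for offset, candidate in drag_crib(xored, b"the"):
    print(offset, candidate)
```

Wherever the crib matches one message, the candidate at that offset is the corresponding fragment of the other message, so readable fragments in the printed list point to promising positions.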

Small improvements

  • The definition

forbiddenSigns = ['{', '}', '[', ']', '(', ')', '?', '=', '%', '_', ';', '-', '^', ':', '<', '>', '|', '!', '`', '´', "'", '\\']


is better written (more resource friendly) as:

FORBIDDEN_SIGNS = '{}[]()?=%_;-^:<>|!`´\'\\'


and you will still be able to iterate over each symbol.

  • range() does not need a 0 as first argument (unless you want to specify a step); similarly a step of 1 is the default.



  • compute each value only once if you can, e.g. str(clearText, "ASCII").



  • consider using an if __name__ == '__main__': construct to emphasize reusability and make testing in the interactive interpreter easier.



  • format is often more efficient for building strings; it can even replace hex for your use case.
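
Two of the points above, format in place of hex and a plain string in place of a list of symbols, can be sketched as follows. The test string "t{e" is made up for the example.

```python
# format with '02x' gives two lowercase hex digits per character, no '0x' prefix.
crib = "the"
with_hex = ''.join(hex(ord(letter))[2:] for letter in crib)
with_format = ''.join(format(ord(letter), '02x') for letter in crib)

# Unlike hex(...)[2:], '02x' keeps a leading zero for code points below 16.
newline_hex = hex(ord('\n'))[2:]        # one digit
newline_fmt = format(ord('\n'), '02x')  # two digits

# A string constant supports both iteration and membership tests.
FORBIDDEN_SIGNS = '{}[]()?=%_;-^:<>|!`´\'\\'
has_forbidden = any(sign in "t{e" for sign in FORBIDDEN_SIGNS)
```

For ordinary letters both approaches give the same hex string; the fixed width only matters for small code points.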



Proposed improvements

import binascii

FORBIDDEN_SIGNS = '{}[]()?=%_;-^:<>|!`´\'\\'
CRIB = 'the'

def get_crib_hash():
    return ''.join(format(ord(letter), '02x') for letter in CRIB)

def hash_file(filename):
    with open(filename) as f:
        previous = None
        for line in f:
            line = line.strip()
            try:
                yield format(int(previous, 16) ^ int(line, 16), 'x')
            except TypeError: # for int(None, 16) at first pass
                pass
            previous = line # Yeah, I’ll compute int(this_thing, 16) a second time…
            # Avoiding it is left as an exercise for the reader.

def decipher_file(filename):
    crib_hash = get_crib_hash()
    crib_length = len(crib_hash)
    crib_hash = int(crib_hash, 16)

    for hash_result in hash_file(filename):
        for i in range(len(hash_result) - crib_length + 1):
            attempt = crib_hash ^ int(hash_result[i:i+crib_length], 16)
            attempt = '{0:0{fill}x}'.format(attempt, fill=crib_length)
            yield str(binascii.unhexlify(attempt), 'ASCII')

if __name__ == '__main__':
    from urllib.error import URLError
    from urllib.request import urlopen

    for clear_text in decipher_file('testCiphers.txt'):
        # or just print(clear_text)…
        # Keep only candidates that contain none of the forbidden signs.
        if '\\x' not in clear_text and all(s not in clear_text
                                           for s in FORBIDDEN_SIGNS):
            url = 'http://www.morewords.com/contains/{}'.format(clear_text)
            try:
                response = str(urlopen(url).read())
                if 'words found.' in response:
                    print(clear_text)
            except URLError:
                print('error getting webpage')

Context

StackExchange Code Review Q#109058, answer score: 2
