Encoding and decoding small strings of text
Problem
This is supposed to encode and decode small strings of text. Unfortunately, even on a really good laptop, it runs slowly. This might be due to my looping, or just the sheer amount of computation.
It takes a string, applies simple RSA, subtracts a random number, converts both to fixed-length binary, then compresses the result using the base-80 alphabet "everyletter". To decrypt, it just reverses the process.
The encryption and decryption use binary numbers of length 5 and 19, so it is very important that those lengths don't change dynamically.
Take a look and feel free to comment on anything else that seems wrong:
```
# P, M, and D are initialized here
def encode_list( string, m, p ):
    prog.config(maximum=len(string), value=0)
    l = list(string)
    code = []
    for c in l:
        code.append(encode(c, m, p))
        prog.step()
        win.update()
    return code

def encode( ch, m, p ):
    return (ord(ch)**p) % m

def decode_list( num_list ):
    prog.config(maximum=len(num_list), value=0)
    trans_nums = []
    word = []
    for q in num_list:
        trans_nums.append(decode(q))
        prog.step() #for a progress bar, usually more than 5 sec for 30+ characters
        win.update()
    for r in trans_nums:
        word.append(chr(r))
    return word

def decode( num ):
    return (num**D) % M

def to_binary( number, base ): #I need a fixed length base for repeatability
    r = ""
    temp = number
    powlis = []
    for power in range(base-1, -1, -1):
        n = pow(2, power)
        powlis.append(n)
    for p in powlis:
        if temp-p >= 0:
            r+="1"
            temp-=p
        else:
            r+="0"
    return r

def from_binary( bi_string, base ):
    r = 0
    bitlist = []
    for bit in bi_string:
        bitlist.append(int(bit))
    bitlist.reverse()
    for add in range(base-1, -1, -1):
        r += pow(2, add)*bitlist[add]
    return r

everyletter=[]
for x in range(1, 11):
    everyletter.append(str(x)[len(str(x))-1])
```
Solution
Please note that I am only commenting on general readability, not performance.
You really like single-letter variable names:
l = list(string)
encode( ch, m, p )
r = ""
Please, use longer variable names to enhance readability.
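To give a rough idea of what longer names buy you, here is a sketch of the two binary helpers rewritten with descriptive names. The names `to_fixed_binary`, `from_fixed_binary` and `width` are my own, not from the question. The sketch also assumes the number always fits in the requested width: it raises `ValueError` otherwise, whereas the original `to_binary` silently returns all ones for a number that is too large.

```python
def to_fixed_binary(number, width):
    """Return `number` as a binary string zero-padded to exactly `width` digits."""
    # Assumption: refuse values that cannot be represented in `width` bits.
    if number < 0 or number >= 2 ** width:
        raise ValueError("number does not fit in the requested width")
    return format(number, "0{}b".format(width))


def from_fixed_binary(bit_string):
    """Return the integer encoded by a string of '0' and '1' characters."""
    return int(bit_string, 2)


print(to_fixed_binary(5, 5))        # 00101
print(from_fixed_binary("00101"))   # 5
```

Both helpers lean on built-ins (`format` and `int` with base 2), so there is no loop left to name variables in.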
Quoting from Janne's good answer and adding to it:
"Try to separate computations into functions with clear inputs and outputs"
You may also create two files:
- compress.py, in which you write all your compression algorithms.
- gui.py, in which you write the Tkinter code for the user interface.
This makes maintenance simpler: if you see that the encoding is broken, go to that file; if you think that the GUI is broken, go to the GUI file.
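One possible way to get that separation, sketched under my own assumptions: the encoding function takes an optional callback instead of touching `prog` and `win` directly. The names `encode_char`, `encode_text` and `on_progress` are hypothetical, and the key values are a toy textbook example, not the question's P and M.

```python
def encode_char(char, modulus, exponent):
    """Encrypt one character the same way the question does."""
    return (ord(char) ** exponent) % modulus


def encode_text(text, modulus, exponent, on_progress=None):
    """Encrypt every character of `text`, calling `on_progress()` after each one."""
    codes = []
    for char in text:
        codes.append(encode_char(char, modulus, exponent))
        if on_progress is not None:
            on_progress()
    return codes


# Usage without any GUI: count the steps instead of moving a progress bar.
steps = []
codes = encode_text("hi", modulus=3233, exponent=17,
                    on_progress=lambda: steps.append(1))
print(codes, len(steps))
```

In the GUI file, the callback would be a small function that calls `prog.step()` and `win.update()`, so the file with the algorithms never needs to import tkinter.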
"Also, it would be a good idea to add some explanation about what the program is doing and what algorithm it is using."
To be more precise, Python has a widely used convention called docstrings. They look like this:
def double(x):
    """Returns the double of x"""
    return 2*x

from tkinter import *
This is acceptable for tkinter, but it is more readable to use:
import tkinter as tk
That way, people reading the code will instantly know when something is a tkinter class.
Now let's analyze the following block of code:
everyletter=[]
for x in range(1, 11):
    everyletter.append(str(x)[len(str(x))-1])
for x in range(97, 123):
    everyletter.append(chr(x))
for x in range(65, 91):
    everyletter.append(chr(x))
for char in ['~','!','@','#','$','%','^','&','[',']','{','}','|',';',':',',','.','?']:
    everyletter.append(char)
Running this gives:
['1', '2', '3', '4', '5', '6', '7', '8', '9', '0', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '~', '!', '@', '#', '$', '%', '^', '&', '[', ']', '{', '}', '|', ';', ':', ',', '.', '?']
So you can simplify your code a lot if you declare EVERY_LETTER as a constant. At the start of the file, write:
EVERY_LETTER = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '0', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '~', '!', '@', '#', '$', '%', '^', '&', '[', ']', '{', '}', '|', ';', ':', ',', '.', '?']
But the name is wrong: '4', '}' and ';', for example, are not letters; they are called 'chars' (a technical term, short for characters). So it would be better to call it ALL_CHARS.
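If typing out 80 quoted characters feels error-prone, the same constant can probably be built from the standard `string` module. This is only a sketch: `DIGITS` and `SYMBOLS` are helper names I made up, and the digits are reordered so that '0' comes last, as in the original loop.

```python
import string

# Digits in the order the original loop produces them: '1'..'9', then '0'.
DIGITS = string.digits[1:] + string.digits[0]
# The 18 punctuation characters from the question, in the same order.
SYMBOLS = "~!@#$%^&[]{}|;:,.?"

ALL_CHARS = list(DIGITS + string.ascii_lowercase + string.ascii_uppercase + SYMBOLS)

print(len(ALL_CHARS))  # 80
```

The order matters here, because the position of each character is its value in the base-80 encoding.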
Performance
Use PyPy. It can give drastic speed-ups with no code changes.
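Another thing that may be worth trying, assuming P and D are large exponents as is usual for RSA: `(num**D) % M` first builds the full power, which can be an enormous integer, and only then reduces it. Python's built-in three-argument `pow(base, exponent, modulus)` reduces as it goes. A minimal sketch with a toy key (3233, 17 and 2753 are a textbook example, not the question's M, P and D; the extra `modulus` and `exponent` parameters on `decode` are my own change):

```python
MODULUS = 3233            # toy modulus, 61 * 53
PUBLIC_EXPONENT = 17      # toy stand-in for P
PRIVATE_EXPONENT = 2753   # toy stand-in for D


def encode(char, modulus, exponent):
    """Modular exponentiation without building the huge intermediate power."""
    return pow(ord(char), exponent, modulus)


def decode(number, modulus, exponent):
    """Inverse of encode when `exponent` is the matching private exponent."""
    return pow(number, exponent, modulus)


cipher = encode("A", MODULUS, PUBLIC_EXPONENT)
# Same result as the question's formula, just computed more cheaply.
assert cipher == (ord("A") ** PUBLIC_EXPONENT) % MODULUS
print(chr(decode(cipher, MODULUS, PRIVATE_EXPONENT)))  # A
```

I have not measured this against the original program, so treat it as something to benchmark rather than a guaranteed fix.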
Code Snippets
def double(x):
    """Returns the double of x"""
    return 2*x

from tkinter import *

import tkinter as tk

everyletter=[]
for x in range(1, 11):
    everyletter.append(str(x)[len(str(x))-1])
for x in range(97, 123):
    everyletter.append(chr(x))
for x in range(65, 91):
    everyletter.append(chr(x))
for char in ['~','!','@','#','$','%','^','&','[',']','{','}','|',';',':',',','.','?']:
    everyletter.append(char)

['1', '2', '3', '4', '5', '6', '7', '8', '9', '0', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '~', '!', '@', '#', '$', '%', '^', '&', '[', ']', '{', '}', '|', ';', ':', ',', '.', '?']

Context
StackExchange Code Review Q#78377, answer score: 4