patternpythonMinor
Unique short string for URL
Viewed 0 times
uniqueshortforstringurl
Problem
I needed short, unique, seemingly random URLs for a project I'm working on, so I put together the following after some research. I believe it's working as expected, but I want to know if there's a better or simpler way of doing this. Basically, it takes an integer and returns a string of the specified length using the character list. I'm new to Python, so help is much appreciated.
def genkey(value, length=5, chars='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890', prime=694847539):
digits = []
base = len(chars)
seed = base ** (length - 1)
domain = (base ** length) - seed
num = ((((value - 1) + seed) * prime) % domain) + seed
while num:
digits.append(chars[num % base])
num /= base
return ''.join(reversed(digits))Solution
The trouble with using a pseudo-random number generator to produce your shortened URLs is, what do you do if there is a collision? That is, what happens if there are values
Anyway, if I didn't need to solve the collision problem, I would use Python's built-in pseudo-random number generator for this instead of writing my own. Also, I would add a docstring and some doctests.
Update: you clarified in comments that you do need to avoid collisions, and moreover you know that
-
As far as I can see, there's no need to subtract 1 from
-
There's no need to use the value
-
There's no need to call
Applying all those simplifications yields the following:
A couple of things you might want to beware of:
-
This pseudo-random scheme is not cryptographically strong: that is, it's fairly easy to go back from the URL to the value that produced it. This can be a problem in some applications.
-
The random strings produced by this scheme may include real words or names in human languages. This could be unfortunate in some applications, if the resulting words were offensive or otherwise inappropriate.
v and w such that v != w but genkey(v) == genkey(w)? Would this be a problem for your application, or would it be entirely fine?Anyway, if I didn't need to solve the collision problem, I would use Python's built-in pseudo-random number generator for this instead of writing my own. Also, I would add a docstring and some doctests.
import random
import string
def genkey(value, length = 5, chars = string.ascii_letters + string.digits):
"""
Return a string of `length` characters chosen pseudo-randomly from
`chars` using `value` as the seed.
>>> ' '.join(genkey(i) for i in range(5))
'0UAqF i0VpE 76dfZ oHwLM ogyje'
"""
random.seed(value)
return ''.join(random.choice(chars) for _ in xrange(length))Update: you clarified in comments that you do need to avoid collisions, and moreover you know that
value is a number between 1 and domain inclusive. In that case, you're right that your transformation is an injection, since prime is coprime to domain, so your method is fine, but there are several things that can be done to simplify it:-
As far as I can see, there's no need to subtract 1 from
value.-
There's no need to use the value
seed at all. You're using this to ensure that you have at least length digits in num, but it's easier just to generate exactly length digits. (This gives you a bigger domain in any case.)-
There's no need to call
reversed: the reverse of a pseudo-random string is also a pseudo-random string.Applying all those simplifications yields the following:
def genkey(value, length=5, chars=string.ascii_letters + string.digits, prime=694847539):
"""
Return a string of `length` characters chosen pseudo-randomly from
`chars` using `value` as the seed and `prime` as the multiplier.
`value` must be a number between 1 and `len(chars) ** length`
inclusive.
>>> ' '.join(genkey(i) for i in range(1, 6))
'xKFbV UkbdG hVGer Evcgc 15HhX'
"""
base = len(chars)
domain = base ** length
assert(1 <= value <= domain)
n = value * prime % domain
digits = []
for _ in xrange(length):
n, c = divmod(n, base)
digits.append(chars[c])
return ''.join(digits)A couple of things you might want to beware of:
-
This pseudo-random scheme is not cryptographically strong: that is, it's fairly easy to go back from the URL to the value that produced it. This can be a problem in some applications.
-
The random strings produced by this scheme may include real words or names in human languages. This could be unfortunate in some applications, if the resulting words were offensive or otherwise inappropriate.
Code Snippets
import random
import string
def genkey(value, length = 5, chars = string.ascii_letters + string.digits):
"""
Return a string of `length` characters chosen pseudo-randomly from
`chars` using `value` as the seed.
>>> ' '.join(genkey(i) for i in range(5))
'0UAqF i0VpE 76dfZ oHwLM ogyje'
"""
random.seed(value)
return ''.join(random.choice(chars) for _ in xrange(length))def genkey(value, length=5, chars=string.ascii_letters + string.digits, prime=694847539):
"""
Return a string of `length` characters chosen pseudo-randomly from
`chars` using `value` as the seed and `prime` as the multiplier.
`value` must be a number between 1 and `len(chars) ** length`
inclusive.
>>> ' '.join(genkey(i) for i in range(1, 6))
'xKFbV UkbdG hVGer Evcgc 15HhX'
"""
base = len(chars)
domain = base ** length
assert(1 <= value <= domain)
n = value * prime % domain
digits = []
for _ in xrange(length):
n, c = divmod(n, base)
digits.append(chars[c])
return ''.join(digits)Context
StackExchange Code Review Q#22954, answer score: 5
Revisions (0)
No revisions yet.