patternpythondjangoMinor
Generating a unique ID for a filename
Viewed 0 times
filenameuniquegeneratingfor
Problem
This function generates a unique id for my files, which will be stored in the database to reference them. It returns the id if unique, if not, it generates a new one. My separate file server can then use these ids to locate and retrieve a file.
I'm planning on storing thousands upon maybe millions of files uploaded by users in the future, so this needs to be robust and futureproof.
The
For example:
Media
When the files are created, they are stored like this:
Let's say the ID is, for example:
and the filename for the file that we want to get is:
Then the file would be saved like this in my file server:
This ensures that there will be no collisions, and the file system is split and doesn't have thousand of files in one folder, which will damage performance.
Is there any way I can make this perform better or be more reliable? Avoiding collisions and performance issues is a must.
I'm planning on storing thousands upon maybe millions of files uploaded by users in the future, so this needs to be robust and futureproof.
from django.utils.crypto import get_random_string
from .models import Media
def generateUID():
uid = get_random_string(length=16, allowed_chars=u'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
date = datetime.datetime.now()
result = '%s-%s-%s_%s' % (date.year, date.month, date.day, uid)
print(result)
try:
obj = Media.objects.get(uid=result)
except Media.DoesNotExist:
return uid
else:
return generateUID()The
Media model is a parent which stores the uid. There is an image model that Media can relate to.For example:
Media
uid = 'test':- Image
name = 't200x200'
- Image
name = 'original'
When the files are created, they are stored like this:
Let's say the ID is, for example:
2016-9-14_cKYT96osstPB8z2Fand the filename for the file that we want to get is:
avatars_2016-9-14_cKYT96osstPB8z2F_t200x200.pngThen the file would be saved like this in my file server:
media/avatars/2016/9/14/c/K/Y/cKYT96osstPB8z2F/t200x200.pngThis ensures that there will be no collisions, and the file system is split and doesn't have thousand of files in one folder, which will damage performance.
Is there any way I can make this perform better or be more reliable? Avoiding collisions and performance issues is a must.
Solution
Code review:
Well, your code seems ok, it can be improved:
Analyze:
Instead of creating an UID and then writing a file, I usually process in reverse order and use tempfile.mkstemp
Generate a UUID from a host ID, sequence number, and the current time. If node is not given, getnode() is used to obtain the hardware address. If clock_seq is given, it is used as the sequence number; otherwise a random 14-bit sequence number is chosen.
Example:
If you want folders, you only need to replace "-" by "/".
Well, your code seems ok, it can be improved:
- You can replace the letters by
string.ascii_letters
- You can remove
objsince it is not used.
- You can format your date using the
formatmethod, i.e. ˋ"{0:%Y/%j}".format(date). That way, you can have a directory by year and a subdirectory by day in the year.
Analyze:
Instead of creating an UID and then writing a file, I usually process in reverse order and use tempfile.mkstemp
.
[It] Creates a temporary file in the most secure manner possible…
That way, you avoid collisions. Then, I use the newly created file name as an ID.
You speak about performance: note that the more you have subdirectories, the more you have inodes. This can decrease performance when you want to access the files.
To summarize:
- Create a directory for each day using the format
"{0:%Y/%j}".format(date)
- Create a temporary file in that directory
- Use the relative path and the filename to construct your UID. For instance, replace "/" and dots by underscore.
Generating UUID
If, for any reason, you can't rely on the file system temporary files, and you must create an UID yourself, I suggest you to consider using UUIDs (Universally Unique IDentifier).
See the Python module uuid available for Python 2 and 3.
For instance, you can use uudi.uuid1`:Generate a UUID from a host ID, sequence number, and the current time. If node is not given, getnode() is used to obtain the hardware address. If clock_seq is given, it is used as the sequence number; otherwise a random 14-bit sequence number is chosen.
Example:
>>> import uuid
>>> # make a UUID based on the host ID and current time
>>> uuid.uuid1()
UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')If you want folders, you only need to replace "-" by "/".
Code Snippets
>>> import uuid
>>> # make a UUID based on the host ID and current time
>>> uuid.uuid1()
UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')Context
StackExchange Code Review Q#141369, answer score: 5
Revisions (0)
No revisions yet.