HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythondjangoMinor

Generating a unique ID for a filename

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
filenameuniquegeneratingfor

Problem

This function generates a unique id for my files, which will be stored in the database to reference them. It returns the id if unique, if not, it generates a new one. My separate file server can then use these ids to locate and retrieve a file.

I'm planning on storing thousands upon maybe millions of files uploaded by users in the future, so this needs to be robust and futureproof.

from django.utils.crypto import get_random_string
from .models import Media

def generateUID():
    uid = get_random_string(length=16, allowed_chars=u'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
    date = datetime.datetime.now()

    result = '%s-%s-%s_%s' % (date.year, date.month, date.day, uid)
    print(result)

    try:
        obj = Media.objects.get(uid=result)
    except Media.DoesNotExist:
        return uid
    else:
        return generateUID()


The Media model is a parent which stores the uid. There is an image model that Media can relate to.

For example:

Media uid = 'test':

  • Image name = 't200x200'



  • Image name = 'original'



When the files are created, they are stored like this:

Let's say the ID is, for example:

2016-9-14_cKYT96osstPB8z2F


and the filename for the file that we want to get is:

avatars_2016-9-14_cKYT96osstPB8z2F_t200x200.png


Then the file would be saved like this in my file server:

media/avatars/2016/9/14/c/K/Y/cKYT96osstPB8z2F/t200x200.png


This ensures that there will be no collisions, and the file system is split and doesn't have thousand of files in one folder, which will damage performance.

Is there any way I can make this perform better or be more reliable? Avoiding collisions and performance issues is a must.

Solution

Code review:

Well, your code seems ok, it can be improved:

  • You can replace the letters by string.ascii_letters



  • You can remove obj since it is not used.



  • You can format your date using the format method, i.e. ˋ"{0:%Y/%j}".format(date). That way, you can have a directory by year and a subdirectory by day in the year.



Analyze:

Instead of creating an UID and then writing a file, I usually process in reverse order and use
tempfile.mkstemp.


[It] Creates a temporary file in the most secure manner possible…

That way, you avoid collisions. Then, I use the newly created file name as an ID.

You speak about performance: note that the more you have subdirectories, the more you have inodes. This can decrease performance when you want to access the files.

To summarize:

  • Create a directory for each day using the format "{0:%Y/%j}".format(date)



  • Create a temporary file in that directory



  • Use the relative path and the filename to construct your UID. For instance, replace "/" and dots by underscore.



Generating UUID

If, for any reason, you can't rely on the file system temporary files, and you must create an UID yourself, I suggest you to consider using UUIDs (Universally Unique IDentifier).

See the Python module uuid available for Python 2 and 3.

For instance, you can use
uudi.uuid1`:


Generate a UUID from a host ID, sequence number, and the current time. If node is not given, getnode() is used to obtain the hardware address. If clock_seq is given, it is used as the sequence number; otherwise a random 14-bit sequence number is chosen.

Example:

>>> import uuid

>>> # make a UUID based on the host ID and current time
>>> uuid.uuid1()
UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')


If you want folders, you only need to replace "-" by "/".

Code Snippets

>>> import uuid

>>> # make a UUID based on the host ID and current time
>>> uuid.uuid1()
UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')

Context

StackExchange Code Review Q#141369, answer score: 5

Revisions (0)

No revisions yet.