Rewrite Amazon S3 key


Problem

I've created a function that rewrites the key, or "path", of an object in S3.

By default, Amazon Web Services Firehose writes to S3 in the format YYYY/MM/DD/HH/foo.json. We have an AWS Lambda function listening for PutObject events on l1/source/event_type/fh/. When a new file is added to S3, the Lambda is invoked and the key, or "path", of that file is rewritten to the flatter structure l1/source/event_type/daily/dt=YYYY-MM-DD/foo.json. Yes, I purposely left out the fh and HH path segments.

input key: l1/source/event_type/fh/YYYY/MM/DD/HH/foo.json

output key: l1/source/event_type/daily/dt=YYYY-MM-DD/foo.json

import re


def create_date_parition_from_key(key):
    '''Creates a new date partition prefix.
    '''

    try:
        # Split on '/YYYY'; the capture group keeps the year in key_split[1].
        key_split = re.split(r'(/\d{4})', key)
        # Only the first two path segments are kept here.
        start_path = (key_split[0].split('/')[0], key_split[0].split('/')[1])
        remove_fh_path = '/'.join(start_path)
        default = key_split[2].split('/')
        year = key_split[1][1:]

        s3_prefix = remove_fh_path + '/'  # e.g. l1/source/
        date_partion = ('daily/dt=' +
                        '-'.join([year, default[1], default[2]]) +
                        '/')  # daily/dt=YYYY-MM-DD/
        file_name = default[-1]  # foo.json

        new_key = s3_prefix + date_partion + file_name

        print('New partition key created: {}.'.format(new_key))

        return new_key
    except Exception as ex:
        print(ex)
        print('Error partitioning key {}.'.format(key))
        raise ex

I'm new to Python and am looking for ways to improve my code, as it seems fragile.

EDIT: The input key can vary in its number of path segments:

  • l1/source/event_type/fh/YYYY/MM/DD/HH/foo.json
  • l1/app/source/event_type/fh/YYYY/MM/DD/HH/foo.json
  • l1/event_type/fh/YYYY/MM/DD/HH/foo.json

Solution

Simplicity

Tuple unpacking and str.format make the function much simpler:

def create_date_parition_from_key(key):
    a, b, c, _, year, month, day, _, name = key.split('/')
    return "{}/{}/{}/daily/dt={}-{}-{}/{}".format(
        a, b, c, year, month, day, name)
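
As a quick check, the sketch below runs this function on the example key from the question and shows that a key with a different number of segments fails to unpack. The concrete date values in the sample keys are made up for illustration; the question only gives the YYYY/MM/DD/HH pattern.

```python
def create_date_parition_from_key(key):
    # Expects exactly nine '/'-separated segments:
    # three prefix parts, 'fh', year, month, day, hour, file name.
    a, b, c, _, year, month, day, _, name = key.split('/')
    return "{}/{}/{}/daily/dt={}-{}-{}/{}".format(
        a, b, c, year, month, day, name)


# Hypothetical sample key; the date values are assumptions.
new_key = create_date_parition_from_key(
    'l1/source/event_type/fh/2016/09/23/14/foo.json')
print(new_key)  # l1/source/event_type/daily/dt=2016-09-23/foo.json

# A key with one extra prefix segment no longer unpacks into nine names.
try:
    create_date_parition_from_key(
        'l1/app/source/event_type/fh/2016/09/23/14/foo.json')
    unpack_failed = False
except ValueError:
    unpack_failed = True
print(unpack_failed)  # True
```

The fixed unpacking is short, but it ties the function to one exact key depth.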


If the number of path segments in the input key can vary:

def create_date_parition_from_key(key):
    *_, year, month, day, _, name = key.split('/')
    return "{}/daily/dt={}-{}-{}/{}".format(
        key[:key.index("/fh/")], year, month, day, name)
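
To illustrate, here is a small sketch that feeds the three key shapes from the question's edit through this version. The date values are made up, and the starred unpacking (`*_`) needs Python 3.

```python
def create_date_parition_from_key(key):
    # *_ absorbs however many prefix segments precede the date parts.
    *_, year, month, day, _, name = key.split('/')
    # Everything before '/fh/' is kept unchanged as the new prefix.
    return "{}/daily/dt={}-{}-{}/{}".format(
        key[:key.index("/fh/")], year, month, day, name)


# Hypothetical sample keys, one per shape; the dates are assumptions.
keys = [
    'l1/source/event_type/fh/2016/09/23/14/foo.json',
    'l1/app/source/event_type/fh/2016/09/23/14/foo.json',
    'l1/event_type/fh/2016/09/23/14/foo.json',
]
new_keys = [create_date_parition_from_key(k) for k in keys]
for new_key in new_keys:
    print(new_key)
# l1/source/event_type/daily/dt=2016-09-23/foo.json
# l1/app/source/event_type/daily/dt=2016-09-23/foo.json
# l1/event_type/daily/dt=2016-09-23/foo.json
```

Both ways this can fail on a malformed key, too few segments to unpack or no "/fh/" for str.index to find, raise ValueError.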


Exception handling

try:
    # Things
except Exception as ex:
    print(ex)
    print('Error partitioning key {}.'.format(key))
    raise ex


This works against reuse. I cannot reuse this function, because it always prints to the console on error, and I have no way to suppress that even if I wrap the call in my own try/except.

This kind of error reporting belongs in the main function, whose job is exactly to communicate the errors and successes of the other functions to the end user.
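
A rough sketch of that split might look like the following. The `main` function, its `keys` parameter, and the sample keys are assumptions for illustration, not part of the original code; the key-rewriting function stays silent and only the caller prints.

```python
import sys


def create_date_parition_from_key(key):
    # No printing here: errors simply propagate to the caller.
    *_, year, month, day, _, name = key.split('/')
    return "{}/daily/dt={}-{}-{}/{}".format(
        key[:key.index("/fh/")], year, month, day, name)


def main(keys):
    # Hypothetical entry point: the only place that talks to the user.
    failures = 0
    for key in keys:
        try:
            new_key = create_date_parition_from_key(key)
        except ValueError as ex:
            print('Error partitioning key {}: {}'.format(key, ex),
                  file=sys.stderr)
            failures += 1
        else:
            print('New partition key created: {}.'.format(new_key))
    return failures


# Made-up input: one well-formed key and one malformed key.
failure_count = main([
    'l1/source/event_type/fh/2016/09/23/14/foo.json',
    'not-a-key',
])
```

Another caller that wants no console output can now use the function directly and handle ValueError its own way.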

Finally, Exception is too vague and too much code sits inside the try block. Keep the code inside try to a minimum and catch a specific exception type.
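
If some handling is wanted, a narrowed version could look roughly like this. It is a sketch under the assumption that a malformed key should surface as a ValueError naming the key; the wording of the error message is made up.

```python
def create_date_parition_from_key(key):
    try:
        # Guard only the two steps that can fail on a malformed key;
        # both raise ValueError (bad unpacking, or '/fh/' not found).
        *_, year, month, day, _, name = key.split('/')
        prefix = key[:key.index("/fh/")]
    except ValueError as ex:
        # Re-raise with context instead of printing.
        raise ValueError('Cannot partition key {!r}'.format(key)) from ex
    return "{}/daily/dt={}-{}-{}/{}".format(prefix, year, month, day, name)


# Hypothetical sample key; the date values are assumptions.
good = create_date_parition_from_key(
    'l1/event_type/fh/2016/09/23/14/foo.json')

# A key with too few segments raises a ValueError that names the key.
try:
    create_date_parition_from_key('l1/foo.json')
    message = None
except ValueError as ex:
    message = str(ex)
```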

I simply removed all exception handling from my version: in my opinion it only made the code worse, and there was no need for it.


Context

StackExchange Code Review Q#142263, answer score: 4
