
Convert international datestring to ISO-format

Submitted by: @import:stackexchange-codereview

Problem

The function below takes a datestring in the form d-mmm-yyyy and converts it to ISO date format (yyyy-mm-dd). Delimiters may be hyphen, space or /, and Dutch or English month abbreviations may be used.

Now, I know there is dateutil, but it fails with an "unknown string format" error if you try to parse something with a non-English month in it. I haven't digested all its documentation, but I think dateutil is mainly intended for date calculation. I'm not doing that; I'm just cleaning up user input.

So I wrote my own.

import re
.
.

def ISOdate(date): 
    '''
        converts the following date string format to ISO (yyyy-mm-dd):  
        28-okt-1924 (dutch month abbreviations)
        28 oct 1924 (english..) 
         9/nov/2012 (single digit)
    '''

    shortmonths = [
        'jan', 'feb', 'mrt', 'apr', 'mei', 'jun', 
        'jul', 'aug', 'sep', 'okt', 'nov', 'dec', 
        'jan', 'feb', 'mar', 'apr', 'may', 'jun', 
        'jul', 'aug', 'sep', 'oct', 'nov', 'dec'
        ] 

    # Month abbreviations differ only for March, May and October.

    pat = r'(\d{1,2})\s?[-\/]?\s?(\w{3})\s?[-\/]?\s?(\d{4})'

    q = re.match(pat, date)
    if q: 
        year = q.group(3)
        day = int(q.group(1)) 
        month = shortmonths.index(q.group(2).lower()) % 12 + 1
        return u'{}-{:02d}-{:02d}'.format(year, month, day)
    else:
        # just return input, date fields may be empty
        return date


The regex match that parses day, month and year may not be pretty, but it works and it's easy to expand for other patterns. Likewise for the month number lookup with index, which is more concise than a chain of if, elif to match month strings to numbers.
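To make the index lookup concrete, here is a minimal sketch that isolates it from the function above. The names SHORTMONTHS, PAT and month_number are made up for this illustration; the list and the pattern are copied from the question.

```python
import re

# Dutch abbreviations occupy positions 0-11, English ones positions 12-23.
SHORTMONTHS = [
    'jan', 'feb', 'mrt', 'apr', 'mei', 'jun',
    'jul', 'aug', 'sep', 'okt', 'nov', 'dec',
    'jan', 'feb', 'mar', 'apr', 'may', 'jun',
    'jul', 'aug', 'sep', 'oct', 'nov', 'dec',
]

PAT = r'(\d{1,2})\s?[-\/]?\s?(\w{3})\s?[-\/]?\s?(\d{4})'


def month_number(abbrev):
    """Hypothetical helper: map a Dutch or English abbreviation to 1-12."""
    # index() returns the first position of the abbreviation; % 12 folds
    # the English half (12-23) back onto 0-11, and + 1 makes it 1-based.
    return SHORTMONTHS.index(abbrev.lower()) % 12 + 1


dutch = month_number('okt')      # position 9, so 9 % 12 + 1
english = month_number('Oct')    # position 21, so 21 % 12 + 1
groups = re.match(PAT, '9/nov/2012').groups()   # day, month, year as strings
```

Both 'okt' and 'Oct' end up as month 10. Note that list.index raises ValueError when the three-character token is not in the list, so an input such as 12-xyz-2000 would raise instead of being returned unchanged.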

Instead of the code between if q: and else:, I also had this, which uses datetime:

year = int(q.group(3))
day = int(q.group(1))
month = shortmonths.index(q.group(2).lower()) % 12 + 1
d = datetime.datetime(year, month, day)
return u'{:%Y-%m-%d}'.format(d)


This works too, but

Solution

Neat stuff.

Suggestions:

Change shortmonths to a dictionary. That pairs each month number with its abbreviation(s), so there is no need to repeat 'jan', for example, as you do now.

Pythonic: unpack day, month and year in one line.

Use datetime's strftime to format dates. It makes life easier in case you want to change the format down the road.

import re
import datetime

def ISOdate(date):

    # month number -> accepted abbreviations (English first, Dutch if different)
    month_d = {'01': ['jan'],
               '02': ['feb'],
               '03': ['mar', 'mrt'],
               '04': ['apr'],
               '05': ['may', 'mei'],
               '06': ['jun'],
               '07': ['jul'],
               '08': ['aug'],
               '09': ['sep'],
               '10': ['oct', 'okt'],
               '11': ['nov'],
               '12': ['dec']
               }

    pat = r'(\d{1,2})\s?[-\/]?\s?(\w{3})\s?[-\/]?\s?(\d{4})'
    q = re.match(pat, date)

    if q:
        day, month, year = q.groups()
        if month.isalpha(): # change from letters to numbers
            # raises IndexError if the abbreviation is not in month_d
            month = [k for k, v in month_d.items() if month.lower() in v][0]
        out_date = datetime.date(int(year), int(month), int(day))
        return out_date.strftime('%Y-%m-%d')

    else:
        return date
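
A possible variation on the dictionary idea, not part of the answer above: key the dictionary by abbreviation instead of by month number, so the lookup is a single dict access. This is only a sketch under a few assumptions: the names MONTHS, PAT and iso_date are invented here, unknown abbreviations and impossible dates are returned unchanged (the original code would raise instead), and date.isoformat() is swapped in for strftime because it always produces yyyy-mm-dd.

```python
import datetime
import re

# Assumed layout: abbreviation -> month number, Dutch and English merged.
MONTHS = {
    'jan': 1, 'feb': 2, 'mar': 3, 'mrt': 3, 'apr': 4, 'may': 5, 'mei': 5,
    'jun': 6, 'jul': 7, 'aug': 8, 'sep': 9, 'oct': 10, 'okt': 10,
    'nov': 11, 'dec': 12,
}

# Same pattern as in the question, compiled once.
PAT = re.compile(r'(\d{1,2})\s?[-\/]?\s?(\w{3})\s?[-\/]?\s?(\d{4})')


def iso_date(text):
    """Return text as yyyy-mm-dd if it parses, otherwise return it unchanged."""
    q = PAT.match(text)
    if not q:
        return text                      # empty or unrecognised field
    day, month, year = q.groups()        # unpack all three groups at once
    number = MONTHS.get(month.lower())
    if number is None:
        return text                      # three characters, but not a month
    try:
        d = datetime.date(int(year), number, int(day))
    except ValueError:
        return text                      # e.g. 31 feb: not a real date
    return d.isoformat()
```

With this version, '28-okt-1924' and '28 oct 1924' both give '1924-10-28', while '31-feb-2012' comes back untouched. Whether silently returning bad input is the right choice depends on how the cleaned-up field is used afterwards.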


Context

StackExchange Code Review Q#57841, answer score: 2
