HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Simple python script to shorten URLs in specified text files

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
scriptsimpletextshortenfilespythonspecifiedurls

Problem

A recent question inspired me to write a script for this task:

  • Shorten URLs in specified text files using googleapis.com



  • Handle multiple URLs, whether on the same line or multiple lines



The original question tried to do this in Bash, but it gets quite tricky when it comes to handling multiple URLs, so I decided to rewrite in Python:

#!/usr/bin/env python

import sys
import os
import re
import urllib2
import json

re_url_pattern = re.compile(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')

def shorten_url(url):
    print("Shortening url {} ...".format(url))
    post_url = 'https://www.googleapis.com/urlshortener/v1/url?fields=id'
    postdata = {'longUrl': url}
    headers = {'Content-Type': 'application/json'}
    req = urllib2.Request(post_url, json.dumps(postdata), headers)
    return json.loads(urllib2.urlopen(req).read())['id']

def shorten_urls_in_text(text):
    found_urls = False
    for url in set(re_url_pattern.findall(text)):
        short_url = shorten_url(url)
        text = text.replace(url, short_url)
        found_urls = True

    return found_urls, text

def shorten_urls_in_file(path):
    text = read_file(path)
    found_urls, text = shorten_urls_in_text(text)

    if found_urls:
        write_file(path, text)

def shorten_urls_in_files(*paths):
    for path in paths:
        if os.path.isfile(path):
            print('Shortening urls in file {} ...'.format(path))
            shorten_urls_in_file(path)
        else:
            print('No such file: {}'.format(path))

def read_file(path):
    with open(path) as fh:
        return fh.read()

def write_file(path, text):
    with open(path, 'w') as fh:
        fh.write(text)

if __name__ == '__main__':
    shorten_urls_in_files(*sys.argv[1:])


And some unit tests:

```
#!/usr/bin/env python

import unittest
from test.test_support import run_unittest

from shorten_urls import shorten_url
from shorten_urls import shorten_urls_in_text

sample_url1 = 'http:

Solution

Overall, LGTM.

Suggestions:

-
try ... except would look better than os.path.isfile(). Even if isfile returns True, you still may encounter an exception trying to open it (say, insufficient privileges), so you should be ready to handle exceptions anyway. Besides, using isfile may lead to a race (not likely in this code), if a file is removed right between testing and opening it.

-
The script spends most of the time in shorten_url, which in turn mostly waits for network IO. I'd recommend to give each invocation a thread.

-
Original files are destroyed. I'd prefer to keep originals intact, or at least provide an option for that. Also an ability to read urls from standard input is usually desirable.

Context

StackExchange Code Review Q#57848, answer score: 5

Revisions (0)

No revisions yet.