patternpythonMinor
Simple python script to shorten URLs in specified text files
Viewed 0 times
scriptsimpletextshortenfilespythonspecifiedurls
Problem
A recent question inspired me to write a script for this task:
The original question tried to do this in Bash, but it gets quite tricky when it comes to handling multiple URLs, so I decided to rewrite in Python:
And some unit tests:
```
#!/usr/bin/env python
import unittest
from test.test_support import run_unittest
from shorten_urls import shorten_url
from shorten_urls import shorten_urls_in_text
sample_url1 = 'http:
- Shorten URLs in specified text files using googleapis.com
- Handle multiple URLs, whether on the same line or multiple lines
The original question tried to do this in Bash, but it gets quite tricky when it comes to handling multiple URLs, so I decided to rewrite in Python:
#!/usr/bin/env python
import sys
import os
import re
import urllib2
import json
re_url_pattern = re.compile(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')
def shorten_url(url):
print("Shortening url {} ...".format(url))
post_url = 'https://www.googleapis.com/urlshortener/v1/url?fields=id'
postdata = {'longUrl': url}
headers = {'Content-Type': 'application/json'}
req = urllib2.Request(post_url, json.dumps(postdata), headers)
return json.loads(urllib2.urlopen(req).read())['id']
def shorten_urls_in_text(text):
found_urls = False
for url in set(re_url_pattern.findall(text)):
short_url = shorten_url(url)
text = text.replace(url, short_url)
found_urls = True
return found_urls, text
def shorten_urls_in_file(path):
text = read_file(path)
found_urls, text = shorten_urls_in_text(text)
if found_urls:
write_file(path, text)
def shorten_urls_in_files(*paths):
for path in paths:
if os.path.isfile(path):
print('Shortening urls in file {} ...'.format(path))
shorten_urls_in_file(path)
else:
print('No such file: {}'.format(path))
def read_file(path):
with open(path) as fh:
return fh.read()
def write_file(path, text):
with open(path, 'w') as fh:
fh.write(text)
if __name__ == '__main__':
shorten_urls_in_files(*sys.argv[1:])And some unit tests:
```
#!/usr/bin/env python
import unittest
from test.test_support import run_unittest
from shorten_urls import shorten_url
from shorten_urls import shorten_urls_in_text
sample_url1 = 'http:
Solution
Overall, LGTM.
Suggestions:
-
-
The script spends most of the time in
-
Original files are destroyed. I'd prefer to keep originals intact, or at least provide an option for that. Also an ability to read urls from standard input is usually desirable.
Suggestions:
-
try ... except would look better than os.path.isfile(). Even if isfile returns True, you still may encounter an exception trying to open it (say, insufficient privileges), so you should be ready to handle exceptions anyway. Besides, using isfile may lead to a race (not likely in this code), if a file is removed right between testing and opening it.-
The script spends most of the time in
shorten_url, which in turn mostly waits for network IO. I'd recommend to give each invocation a thread.-
Original files are destroyed. I'd prefer to keep originals intact, or at least provide an option for that. Also an ability to read urls from standard input is usually desirable.
Context
StackExchange Code Review Q#57848, answer score: 5
Revisions (0)
No revisions yet.