patternpythonMinor
Keeping just the rare characters in a long string
Viewed 0 times
keepingthejustlongcharactersrarestring
Problem
I'm working on The Python Challenge. The current code iterates over a 98,766-character string before printing an 8-character string. Current execution time is between 7 and 10 seconds.
How can I optimize my current code and what rookie mistakes am I making that may be adding to the execution time?
How can I optimize my current code and what rookie mistakes am I making that may be adding to the execution time?
import urllib.request
import re
url = 'http://www.pythonchallenge.com/pc/def/ocr.html'
x = urllib.request.urlopen(url).read().decode('utf-8')
text = (re.findall(r'', x, re.DOTALL))[1]
# source contains two adjacent html comments
# returns second comment
s = ''
for i in text:
if text.count(i) < 10:
s += i
print(s)Solution
x is not a good variable name. I suggest page_source.Your approach is inefficient because you re-count all of the occurrences of each character, which is O(n2), where n is the length of the string.
It would be better to use a
Counter. It's more efficient because counts all characters using just one pass through the string. (We call that O(n).)from collections import Counter
count = Counter(text)
s = ''.join(char for char in text if count[char] < 10)
print(s)If you wanted to make your own counter with similar performance characteristics, you could do something similar to this:
count = {}
for char in text:
count[char] = 1 + count.get(char, 0)Code Snippets
from collections import Counter
count = Counter(text)
s = ''.join(char for char in text if count[char] < 10)
print(s)count = {}
for char in text:
count[char] = 1 + count.get(char, 0)Context
StackExchange Code Review Q#94385, answer score: 4
Revisions (0)
No revisions yet.