patternpythonMinor
Splitting plain text dictionary data to multiple files
Viewed 0 times
textsplittingfilesmultipledictionarydataplain
Problem
I have a plain text file with the content of a dictionary (Webster's Unabridged Dictionary) in this format:
`A
A (named a in the English, and most commonly ä in other languages).
Defn: The first letter of the English and of many other alphabets.
The capital A of the alphabets of Middle and Western Europe, as also
the small letter (a), besides the forms in Italic, black letter,
etc., are all descended from the old Latin A, which was borrowed from
the Greek Alpha, of the same form; and this was made from the first
letter (Aleph, and itself from the Egyptian origin. The Aleph was a
consonant letter, with a guttural breath sound that was not an
element of Greek articulation; and the Greeks took it to represent
their vowel Alpha with the ä sound, the Phoenician alphabet having no
vowel symbols. This letter, in English, is used for several different
vowel sounds. See Guide to pronunciation, §§ 43-74. The regular long
a, as in fate, etc., is a comparatively modern sound, and has taken
the place of what, till about the early part of the 17th century, was
a sound of the quality of ä (as in far).
Defn: The name of the sixth tone in the model major scale (that in
C), or the first tone of the minor scale, which is named after it the
scale in A minor. The second string of the violin is tuned to the A
in the treble staff.
-- A sharp (A#) is the name of a musical tone intermediate between A
and B.
-- A flat (A) is the name of a tone intermediate between A and G.
A per se Etym: (L. per se by itself), one preëminent; a nonesuch.
[Obs.]
O fair Creseide, the flower and A per se Of Troy and Greece. Chaucer.
A
A, prep. Etym: [Abbreviated form of an (AS. on). See On.]
tiptoe." "A Sundays" Shak. "Wit that men have now a days." Chaucer.
"Set them a work." Robynson (More's Utopia)
substantives in -ing which begin with a consonant. This is a
s
`A
A (named a in the English, and most commonly ä in other languages).
Defn: The first letter of the English and of many other alphabets.
The capital A of the alphabets of Middle and Western Europe, as also
the small letter (a), besides the forms in Italic, black letter,
etc., are all descended from the old Latin A, which was borrowed from
the Greek Alpha, of the same form; and this was made from the first
letter (Aleph, and itself from the Egyptian origin. The Aleph was a
consonant letter, with a guttural breath sound that was not an
element of Greek articulation; and the Greeks took it to represent
their vowel Alpha with the ä sound, the Phoenician alphabet having no
vowel symbols. This letter, in English, is used for several different
vowel sounds. See Guide to pronunciation, §§ 43-74. The regular long
a, as in fate, etc., is a comparatively modern sound, and has taken
the place of what, till about the early part of the 17th century, was
a sound of the quality of ä (as in far).
- (Mus.)
Defn: The name of the sixth tone in the model major scale (that in
C), or the first tone of the minor scale, which is named after it the
scale in A minor. The second string of the violin is tuned to the A
in the treble staff.
-- A sharp (A#) is the name of a musical tone intermediate between A
and B.
-- A flat (A) is the name of a tone intermediate between A and G.
A per se Etym: (L. per se by itself), one preëminent; a nonesuch.
[Obs.]
O fair Creseide, the flower and A per se Of Troy and Greece. Chaucer.
A
A, prep. Etym: [Abbreviated form of an (AS. on). See On.]
- In; on; at; by. [Obs.] "A God's name." "Torn a pieces." "Stand a
tiptoe." "A Sundays" Shak. "Wit that men have now a days." Chaucer.
"Set them a work." Robynson (More's Utopia)
- In process of; in the act of; into; to; -- used with verbal
substantives in -ing which begin with a consonant. This is a
s
Solution
I've executed the code using the cleaned file you provided and it worked fine
for me.
I have a few comments:
like
find easier to read
However, when looking at the directories that only have one character and an
underscore, the counter doesn't seem to be right.
is going to be written to disk. Otherwise
because strings are immutable.
Regarding how to use
Once you have more logs you can play a little bit setting different levels to each message and adding a command line option to set the desired level and get the desired level of verbosity.
for me.
I have a few comments:
- Comments and docstrings would make the code more readable.
- Try to use constants instead of hardcoded values (instead of
2something
like
DIR_LENGTH or PREFIX_LENGTH would be nice)- Replace
printstatements (only one at the moment) withloggingcalls
- The index file is opened and closed for every entry. That doesn't seem to be efficient.
not line.count(' ')seems to be equivalent to' ' not in linewhich I
find easier to read
- I see there's a counter for all entries starting with the same character.
However, when looking at the directories that only have one character and an
underscore, the counter doesn't seem to be right.
contentshould be a list of strings and it should be joined when the term
is going to be written to disk. Otherwise
+= with strings isn't efficientbecause strings are immutable.
Regarding how to use
logging, the very basic to start with would be as follows:import logging
...
def write_entry_file(dirname, filename, content, debug=False):
...
logging.debug('writing to file %s', path)
...
def main():
...
logging.basicConfig(
format='%(levelname)s: %(message)s',
level=logging.DEBUG)
...Once you have more logs you can play a little bit setting different levels to each message and adding a command line option to set the desired level and get the desired level of verbosity.
Code Snippets
import logging
...
def write_entry_file(dirname, filename, content, debug=False):
...
logging.debug('writing to file %s', path)
...
def main():
...
logging.basicConfig(
format='%(levelname)s: %(message)s',
level=logging.DEBUG)
...Context
StackExchange Code Review Q#59629, answer score: 7
Revisions (0)
No revisions yet.