Splitting plain text dictionary data to multiple files

Submitted by: @import:stackexchange-codereview

Problem

I have a plain text file with the content of a dictionary (Webster's Unabridged Dictionary) in this format:

A
A (named a in the English, and most commonly ä in other languages).

Defn: The first letter of the English and of many other alphabets.
The capital A of the alphabets of Middle and Western Europe, as also
the small letter (a), besides the forms in Italic, black letter,
etc., are all descended from the old Latin A, which was borrowed from
the Greek Alpha, of the same form; and this was made from the first
letter (Aleph, and itself from the Egyptian origin. The Aleph was a
consonant letter, with a guttural breath sound that was not an
element of Greek articulation; and the Greeks took it to represent
their vowel Alpha with the ä sound, the Phoenician alphabet having no
vowel symbols. This letter, in English, is used for several different
vowel sounds. See Guide to pronunciation, §§ 43-74. The regular long
a, as in fate, etc., is a comparatively modern sound, and has taken
the place of what, till about the early part of the 17th century, was
a sound of the quality of ä (as in far).

  1. (Mus.)

Defn: The name of the sixth tone in the model major scale (that in
C), or the first tone of the minor scale, which is named after it the
scale in A minor. The second string of the violin is tuned to the A
in the treble staff.
-- A sharp (A#) is the name of a musical tone intermediate between A
and B.
-- A flat (A) is the name of a tone intermediate between A and G.

A per se Etym: (L. per se by itself), one preëminent; a nonesuch.
[Obs.]
O fair Creseide, the flower and A per se Of Troy and Greece. Chaucer.

A
A, prep. Etym: [Abbreviated form of an (AS. on). See On.]

  1. In; on; at; by. [Obs.] "A God's name." "Torn a pieces." "Stand a
tiptoe." "A Sundays" Shak. "Wit that men have now a days." Chaucer.
"Set them a work." Robynson (More's Utopia)

  1. In process of; in the act of; into; to; -- used with verbal
substantives in -ing which begin with a consonant. This is a
s

Solution

I've executed the code using the cleaned file you provided, and it worked fine for me.

I have a few comments:

  • Comments and docstrings would make the code more readable.

  • Try to use constants instead of hardcoded values (instead of 2, something
like DIR_LENGTH or PREFIX_LENGTH would be nice).

  • Replace print statements (only one at the moment) with logging calls.

  • The index file is opened and closed for every entry. That doesn't seem to be efficient.

  • not line.count(' ') is equivalent to ' ' not in line, which I find easier
to read (and the in test can stop at the first space instead of counting all of them).

  • I see there's a counter for all entries starting with the same character.
However, in the directories whose name is a single character followed by an
underscore, the counter doesn't seem to be right.

  • content should be a list of strings that is joined only when the term is
written to disk. Otherwise, += on strings isn't efficient: strings are
immutable, so each concatenation can create a new string.
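As an illustration of several of the points above (a named constant instead of the literal 2, ' ' not in line, docstrings, and collecting an entry's lines in a list that is joined once), here is a minimal sketch. It is not the asker's code: PREFIX_LENGTH, PAD_CHAR, is_headword, entry_dirname and split_entries are hypothetical names, and it assumes that a headword line is a non-empty, all-capitals line with no spaces, and that short headwords are padded with an underscore to form the directory name.

```python
PREFIX_LENGTH = 2  # hypothetical: headword characters used for the directory name
PAD_CHAR = '_'     # hypothetical: pads headwords shorter than PREFIX_LENGTH


def is_headword(line):
    """Return True if the line looks like a headword line.

    Assumption: a headword line is non-empty, contains no spaces and is
    all capitals, like the lone A lines in the sample above.
    """
    line = line.strip()
    return bool(line) and ' ' not in line and line.isupper()


def entry_dirname(headword):
    """Return the directory name for a headword: 'AB' for 'ABACUS', 'A_' for 'A'."""
    return headword[:PREFIX_LENGTH].ljust(PREFIX_LENGTH, PAD_CHAR)


def split_entries(lines):
    """Yield (headword, text) pairs from an iterable of lines."""
    headword = None
    content = []  # a list of lines, joined once per entry instead of using +=
    for line in lines:
        if is_headword(line):
            if headword is not None:
                yield headword, ''.join(content)
            headword = line.strip()
            content = []
        elif headword is not None:  # skip anything before the first headword
            content.append(line)
    if headword is not None:
        yield headword, ''.join(content)


sample = ['A\n', 'A (named a in the English).\n', '\n',
          'B\n', 'B, the second letter.\n']
for headword, text in split_entries(sample):
    print(entry_dirname(headword), repr(text))
# A_ 'A (named a in the English).\n\n'
# B_ 'B, the second letter.\n'
```

Because split_entries is a generator, the real file object can be passed in directly and entries are produced one at a time without reading the whole dictionary into memory.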

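For the index file, one option is to open it once before the loop and write one line per entry inside it. The sketch below assumes a hypothetical layout that may differ from the asker's: each entry goes to <dirname>/<n>.txt, the index is a tab-separated index.txt, and write_all and entry_dirname are illustrative names. It keeps the running numbers in a collections.Counter keyed by directory name, which makes the per-directory numbering easy to inspect when checking the counter issue mentioned above.

```python
import os
import tempfile
from collections import Counter

PREFIX_LENGTH = 2  # hypothetical: headword characters used for the directory name
PAD_CHAR = '_'     # hypothetical: pads headwords shorter than PREFIX_LENGTH


def entry_dirname(headword):
    """Return the directory name for a headword, e.g. 'A_' for 'A'."""
    return headword[:PREFIX_LENGTH].ljust(PREFIX_LENGTH, PAD_CHAR)


def write_all(entries, root):
    """Write each (headword, text) pair to its own file under root.

    The index file is opened once for the whole run, not once per entry,
    and a Counter keeps a separate running number for every directory.
    """
    counters = Counter()
    with open(os.path.join(root, 'index.txt'), 'w', encoding='utf-8') as index:
        for headword, text in entries:
            dirname = entry_dirname(headword)
            counters[dirname] += 1
            filename = f'{counters[dirname]}.txt'
            os.makedirs(os.path.join(root, dirname), exist_ok=True)
            path = os.path.join(root, dirname, filename)
            with open(path, 'w', encoding='utf-8') as entry_file:
                entry_file.write(text)
            index.write(f'{headword}\t{dirname}/{filename}\n')
    return counters


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    counts = write_all([('A', 'first A\n'), ('A', 'second A\n'), ('AB', 'ab\n')], root)
    with open(os.path.join(root, 'index.txt'), encoding='utf-8') as f:
        index_lines = f.read().splitlines()
```

Here both A entries land in A_ as 1.txt and 2.txt, while AB gets its own numbering starting at 1.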
Regarding how to use logging, a very basic setup to start with would be as follows:

import logging

...

def write_entry_file(dirname, filename, content, debug=False):
    ...
    logging.debug('writing to file %s', path)
    ...

def main():
    ...
    logging.basicConfig(
        format='%(levelname)s: %(message)s',
        level=logging.DEBUG)
    ...


Once you have more log messages, you can experiment with assigning different levels to each message and adding a command-line option to select the desired level of verbosity.
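One way to add such an option is with argparse from the standard library. This is a sketch, not part of the original answer: the option name --log-level and the parse_args and main helpers are assumptions, and the level name given on the command line is passed to logging.basicConfig.

```python
import argparse
import logging

LEVELS = ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']


def parse_args(argv=None):
    """Parse command-line options; --log-level is a hypothetical option name."""
    parser = argparse.ArgumentParser(description='Split a dictionary file into entries.')
    parser.add_argument('--log-level', default='WARNING', choices=LEVELS,
                        type=str.upper,  # accept "debug" as well as "DEBUG"
                        help='minimum level of log messages to show')
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    logging.basicConfig(format='%(levelname)s: %(message)s',
                        level=getattr(logging, args.log_level))
    logging.debug('only shown with --log-level DEBUG')
    logging.info('shown with --log-level INFO or DEBUG')
    return args
```

In the script, main() would be called under the usual if __name__ == '__main__': guard, so running it with --log-level debug shows every message while the default WARNING hides the debug and info calls.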

Context

StackExchange Code Review Q#59629, answer score: 7