"Compare" program for Eclipse preference files

Submitted by: @import:stackexchange-codereview

Problem

I am trying to write a simple (trivial?) "compare" program for Eclipse preferences files.

Eclipse preferences files take more or less this form:

# optional comment line
/a/sequence/of/path/elements=string
/another/sequence/of/path/elements=42
# ... (possibly repeated)


Let's call the path sequences "keys" and what follows the = sign "values".
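
As a minimal sketch of that key/value split (Python 3; `parse_line` is a hypothetical helper name, not part of the question's code), `str.partition` splits on the first = sign only, so a value that itself contains = stays intact. The sketch assumes comment lines start with #, as in the sample above.

```python
def parse_line(line):
    """Return (key, value) for a preference line, or None for comments and blanks."""
    stripped = line.strip()
    # Assumption: comment lines start with '#', as in the sample above.
    if not stripped or stripped.startswith('#'):
        return None
    key, sep, value = stripped.partition('=')
    if not sep:
        return None  # no '=' on the line, so it is not a key/value entry
    return key, value


print(parse_line('/a/sequence/of/path/elements=string'))
print(parse_line('# optional comment line'))
```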

The rules of the program should be:

  • Exit upon detecting an invalid number of arguments (must be 2)
  • argument1 and argument2 are the files to compare
  • Exit if any of the input files are empty
  • One preference entry per line
  • Lines with a path key will have a path value, guaranteed
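
The first three rules could be sketched as follows (Python 3). The function name `check_arguments` is hypothetical, the exit codes reuse the constants defined in the attempt below, and treating "empty" as "zero bytes long" is an assumption; the rules do not say whether a file containing only comments counts as empty.

```python
import os
import sys

COMMAND_SYNTAX_ERROR = 2
EMPTY_PREFS_FILE_ERROR = 3


def check_arguments(argv):
    """Return the two file paths, or exit with an error code."""
    if len(argv) != 3:  # program name plus exactly two files
        sys.exit(COMMAND_SYNTAX_ERROR)
    paths = argv[1:]
    for path in paths:
        # Assumption: "empty" means a zero-byte file.
        if os.path.getsize(path) == 0:
            sys.exit(EMPTY_PREFS_FILE_ERROR)
    return paths
```

A script would call it once at startup, for example `file1, file2 = check_arguments(sys.argv)`.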



The output should be as follows:

Part 1

- All lines from argument1 which have keys not present in argument2, like so:

/path 42
/path2 banana

- Blank line

Part 2

- All keys that are present in both argument1 and argument2, like so:

/shared (valueInArgument1, valueInArgument2)
/shared2 (valueInArgument1, valueInArgument2)

- Blank line

Part 3

- All lines from argument2 which have keys not present in argument1, like so:

/path3 24
/path4 ananab

Note that Part 1, Part 2, and Part 3 must each be sorted.

I would like the result to be:

  • Pythonic
  • Efficient (do as little work as possible computationally, use as little space as possible)
  • Clear and readable from a logical point of view
  • Correct (gets the right result even in edge cases)
  • Instructive (uses data structures and algorithms properly)



Here's my attempt:

```
#!/usr/bin/env /opt/local/bin/python2.7

import sys
import os
import re
import pprint

COMMAND_SYNTAX_ERROR = 2
EMPTY_PREFS_FILE_ERROR = 3

# key: everything up to the first '='; value: the rest of the line
PREFS_REGEX_PATTERN = '^(.*?)=(.*)$'
PREFS_REGEX = re.compile(PREFS_REGEX_PATTERN)

def parse_prefs_line(line):
    regex_result = PREFS_REGEX.match(line)
    if not regex_result:
        return None, None
    return regex_result.group(1), regex_result.group(2)

arguments = sys.argv
n_arguments = len(arguments)

if n_arguments != 3:
    print 'usage: e
```

Solution

Don't optimize unless your profiler says so.

Start with the simplest code that works. For example, here is a straightforward translation of your requirements (Python 2.7):

import sys

def get_entries(filename):
    with open(filename) as file:
        # extract 'key = value' entries
        entries = (map(str.strip, line.partition('=')[::2]) for line in file)
        # note: if keys are repeated, the last value wins
        # enforce non-empty values, skip comments
        return {key: value for key, value in entries
                if value and not key.startswith('#')}

if len(sys.argv) != 3:
    sys.exit(2) # wrong number of arguments
d1, d2 = map(get_entries, sys.argv[1:])
if not (d1 and d2):
    sys.exit(1) # no entries in a file

def print_entries(keys, d, d2=None):
    for k in sorted(keys):
        value = d[k] if d2 is None else "(%s, %s)" % (d[k], d2[k])
        print k, value
    print

print_entries(d1.viewkeys() - d2.viewkeys(), d1)
print_entries(d1.viewkeys() & d2.viewkeys(), d1, d2)
print_entries(d2.viewkeys() - d1.viewkeys(), d2)
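
The code above targets Python 2.7 (`print` statements, `viewkeys()`). A rough Python 3 sketch of the same approach, run on small in-memory inputs rather than files, might look like the following. The sample lines and the `format_entries` helper are made up for illustration; in Python 3, `dict.keys()` already returns a set-like view, so it takes the place of `viewkeys()`.

```python
def get_entries(lines):
    """Build {key: value} from an iterable of lines (a file object works too)."""
    entries = {}
    for line in lines:
        key, _, value = (part.strip() for part in line.partition('='))
        # Skip comments and lines without a value; a repeated key keeps its last value.
        if value and not key.startswith('#'):
            entries[key] = value
    return entries


def format_entries(keys, d, d2=None):
    """Return sorted 'key value' lines; with d2, show both values as a pair."""
    return ['%s %s' % (k, d[k] if d2 is None else '(%s, %s)' % (d[k], d2[k]))
            for k in sorted(keys)]


d1 = get_entries(['# comment\n', '/path=42\n', '/shared=a\n'])
d2 = get_entries(['/shared=b\n', '/path3=24\n'])

# In Python 3, dict.keys() returns a set-like view, replacing viewkeys().
for part in (format_entries(d1.keys() - d2.keys(), d1),
             format_entries(d1.keys() & d2.keys(), d1, d2),
             format_entries(d2.keys() - d1.keys(), d2)):
    print('\n'.join(part))
    print()
```

Returning the formatted lines instead of printing them inside the helper is a design choice made here so the three parts are easy to check in isolation.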


You could compare its results and performance with those of your code.
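
One way to make such a comparison is the standard `timeit` module. The sketch below (Python 3) times two hypothetical parsing functions on synthetic input: one regex-based, assuming a pattern along the lines of the question's, and one based on `str.partition`. It checks that both give the same result before timing them; the numbers will vary by machine.

```python
import re
import timeit

LINES = ['/some/path/key%d=value%d' % (i, i) for i in range(1000)]  # synthetic input
PREFS_REGEX = re.compile(r'^(.*?)=(.*)$')  # assumed form of the question's pattern


def parse_with_regex(lines):
    return dict(PREFS_REGEX.match(line).groups() for line in lines)


def parse_with_partition(lines):
    return dict(line.partition('=')[::2] for line in lines)


# Check that both give the same result before comparing their speed.
assert parse_with_regex(LINES) == parse_with_partition(LINES)

for func in (parse_with_regex, parse_with_partition):
    seconds = timeit.timeit(lambda: func(LINES), number=100)
    print('%s: %.4f s for 100 runs' % (func.__name__, seconds))
```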

You could also compare it with the comm command from coreutils:

$ comm <(sort file1) <(sort file2)
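
Keep in mind that comm compares whole lines, not keys, so an entry whose key appears in both files with different values shows up as unique to each file rather than as shared. The Python 3 sketch below approximates comm's three columns with sets to make that visible; `comm_like` is a hypothetical name, and unlike the real command it ignores duplicate lines.

```python
def comm_like(lines1, lines2):
    """Split whole lines into (only in first, only in second, in both), each sorted."""
    s1, s2 = set(lines1), set(lines2)
    return sorted(s1 - s2), sorted(s2 - s1), sorted(s1 & s2)


only1, only2, both = comm_like(['/path=42', '/shared=a', '/same=x'],
                               ['/shared=b', '/same=x', '/path3=24'])
print(only1)  # ['/path=42', '/shared=a']
print(only2)  # ['/path3=24', '/shared=b']
print(both)   # ['/same=x']
```

Here /shared lands in both "only" lists because its values differ, whereas the dictionary-based code reports it once with both values.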

Context

StackExchange Code Review Q#14774, answer score: 5
