Selecting specific lines from a file and focusing on them only?

Submitted by: @import:stackexchange-codereview··

Problem

I have a file containing a huge list of sentences, some of which contain keywords and some of which do not. To focus on the ones with keywords, I used the method below. It works, but is there another way to do this without having to create a new file?

keyW = ["love", "like", "best", "hate", "lol", "better", "worst", "good", "happy", "haha", "please", "great", "bad", "save", "saved", "pretty", "greatest", 'excited', 'tired', 'thanks', 'amazing', 'glad', 'ruined', 'negative', 'loving', 'sorry', 'hurt', 'alone', 'sad', 'positive', 'regrets', 'God']
with open('tweets.txt') as oldfile, open('newfile.txt', 'w') as newfile:
    for line in oldfile:
        if any(word in line for word in keyW):
            newfile.write(line)


I am asking because I am going to use these specific tweets in another function:

for line in open('tweets.txt'):
    line = line.split(" ")
    lat = float(line[0][1:-1]) #Stripping the [ and the ,
    long = float(line[1][:-1])  #Stripping the ]
    if eastern.contains(lat, long):
        eastScore += score(line)
    elif central.contains(lat, long):
        centralScore += score(line)
    elif mountain.contains(lat, long):
        mountainScore += score(line)
    elif pacific.contains(lat, long):
        pacificScore += score(line)
    else:
        continue


Ultimately, this is what my code looks like.

```
from collections import Counter
try:
    keyW_Path = input("Enter file named keywords: ")
    keyFile = open(keyW_Path, "r")
except IOError:
    print("Error: file not found.")
    exit()
# Read the keywords into a list
keywords = {}
wordFile = open('keywords.txt', 'r')
for line in wordFile.readlines():
    word = line.replace('\n', '')
    if not(word in keywords.keys()): #Checks that the word doesn't already exist.
        keywords[word] = 0 # Adds the word to the DB.
wordFile.close()
# Read the file name from the user and open the file.
try:
    tweet_path = input("Enter file named tweets: ")
    tweetFile = open(tweet
```

Solution

Your code in general

You should definitely familiarize yourself with PEP 8, which specifies Python's coding style guidelines.
Those include lowercase names with underscores (snake_case) for variables and functions, two blank lines around top-level function and class definitions, and a maximum line length of 79 characters.
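As a small illustration of those naming rules, the names from the question could be rewritten roughly as follows. The replacement names, the helper `contains_keyword` and the shortened keyword list are only suggestions made up for this sketch.

```python
# Before (camelCase, abbreviated): keyW, eastScore, centralScore
# After (lowercase with underscores, as PEP 8 recommends):
keywords = ["love", "like", "best", "hate"]  # shortened sample list
east_score = 0
central_score = 0


def contains_keyword(line, keywords):
    """Return True if any keyword occurs in the line (substring match)."""
    return any(keyword in line for keyword in keywords)
```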

Consistency

Whatever you do, be consistent. You mix single and double quotes for string literals, and you do not use the file object's built-in context management and iteration capabilities on wordFile.
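A minimal sketch of reading the keywords with a context manager and direct iteration could look like this. It assumes one keyword per line, which the question does not confirm, and the function name `read_keywords` is hypothetical. An in-memory `io.StringIO` stands in for the real file so the example runs on its own.

```python
import io


def read_keywords(lines):
    """Collect unique, non-empty keywords from an iterable of lines."""
    keywords = set()
    for line in lines:          # file objects can be iterated directly
        word = line.strip()
        if word:
            keywords.add(word)  # a set ignores duplicates automatically
    return keywords


# io.StringIO is a stand-in for a file on disk; with a real file this
# would read: with open('keywords.txt') as word_file:
with io.StringIO("love\nhate\nlove\n") as word_file:
    keywords = read_keywords(word_file)
```

The `with` block closes the file even if an exception occurs, so no explicit `close()` call is needed.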

Divide and conquer

Your code seems to do some complex data processing in several steps. Isolate these different tasks and move them into separate functions, each specialized for one task. You already did this with the functions score() and total() and your Regions() class. Let's have more of those.
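For instance, the coordinate parsing from the question's loop could become its own function. This is only a sketch: the name `parse_coordinates` is made up here, and it assumes every line starts with `[lat, long]` followed by a space, as the slicing in the question suggests.

```python
def parse_coordinates(line):
    """Return (latitude, longitude) from a line starting with '[lat, long]'."""
    fields = line.split(" ")
    latitude = float(fields[0][1:-1])   # strip the leading '[' and trailing ','
    longitude = float(fields[1][:-1])   # strip the trailing ']'
    return latitude, longitude
```

The loop body then shrinks to `lat, long = parse_coordinates(line)` followed by the region checks, and the parsing can be tested without any file.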

Use the script's __name__

Put the part of your code that should run when you execute your script inside an

if __name__ == '__main__':
    ...  # your code here

block. This prevents it from running when the script is imported as a module, should you or someone else one day decide to reuse its functions and classes in other programs.
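One common way to arrange this, sketched here with a hypothetical `main()` function and a placeholder return value, is to keep the top-level logic in a single function and call it from the guarded block:

```python
def main():
    """Entry point: prompt for the file names and run the analysis here."""
    return "analysis finished"  # placeholder for the real work


if __name__ == '__main__':
    # Runs only when the file is executed as a script, not on import.
    print(main())
```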

Comments

Although you commented parts of your code, those comments show how not to do it. You state the obvious by commenting on the membership check against a dictionary, which can be read from the code itself. On the other hand, it is not clear why you store comma-separated values as keys in this very dictionary, giving each key the value 0, without using the dictionary any further apart from printing it.

Filtering lines

Regarding your first question: filtering the lines of a file by certain keywords is a common use case for coroutines, and it does not require writing a second file. You can use a generator function like

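def grep(keywords):
    """Yields lines containing keywords"""

    file = yield  # receive the file object via send()
    yield         # pause here so that send() does not consume the first match

    for line in file:
        if any(keyword in line for keyword in keywords):
            yield line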


and invoke it with

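with open(tweets_file) as tweets:
    fltr = grep(words)
    next(fltr)          # prime the coroutine: run it up to its first yield
    fltr.send(tweets)   # hand over the open file

    for line in fltr:
        print(line, end='')  # each line already ends with a newline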


assuming words is your keyword list and tweets_file is the path to your tweets file.
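If the `send()` handshake feels heavier than needed, a plain generator that takes the lines as an argument is a simpler alternative. This is a different technique from the coroutine above, not part of the original answer, and the name `matching_lines` is made up for the sketch. An `io.StringIO` object stands in for the open tweets file.

```python
import io


def matching_lines(lines, keywords):
    """Yield each line that contains at least one keyword."""
    for line in lines:
        if any(keyword in line for keyword in keywords):
            yield line


# io.StringIO is a stand-in for open('tweets.txt') so the sketch runs alone.
sample = io.StringIO("I love this\nnothing to see\nworst day ever\n")
selected = list(matching_lines(sample, ["love", "worst"]))
```

Because the generator is lazy, the scoring loop can iterate over `matching_lines(tweets, words)` directly while the file is open, so no intermediate file or list is required.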

Context

StackExchange Code Review Q#147297, answer score: 3
