Selecting specific lines from a file and focusing on them only?
Problem
I have a file containing a huge list of sentences, some of which contain keywords and some of which do not. To focus specifically on the ones with keywords, I used this method. It works, but is there another way to do this without having to create a new file?
```
keyW = ["love", "like", "best", "hate", "lol", "better", "worst", "good", "happy", "haha", "please", "great", "bad", "save", "saved", "pretty", "greatest", 'excited', 'tired', 'thanks', 'amazing', 'glad', 'ruined', 'negative', 'loving', 'sorry', 'hurt', 'alone', 'sad', 'positive', 'regrets', 'God']
with open('tweets.txt') as oldfile, open('newfile.txt', 'w') as newfile:
    for line in oldfile:
        if any(word in line for word in keyW):
            newfile.write(line)
```
I ask because I'm going to use these specific tweets in another function:
```
for line in open('tweets.txt'):
    line = line.split(" ")
    lat = float(line[0][1:-1]) #Stripping the [ and the ,
    long = float(line[1][:-1]) #Stripping the ]
    if eastern.contains(lat, long):
        eastScore += score(line)
    elif central.contains(lat, long):
        centralScore += score(line)
    elif mountain.contains(lat, long):
        mountainScore += score(line)
    elif pacific.contains(lat, long):
        pacificScore += score(line)
    else:
        continue
```
Ultimately, this is what my code looks like.
```
from collections import Counter
try:
    keyW_Path = input("Enter file named keywords: ")
    keyFile = open(keyW_Path, "r")
except IOError:
    print("Error: file not found.")
    exit()
# Read the keywords into a list
keywords = {}
wordFile = open('keywords.txt', 'r')
for line in wordFile.readlines():
    word = line.replace('\n', '')
    if not(word in keywords.keys()): #Checks that the word doesn't already exist.
        keywords[word] = 0 # Adds the word to the DB.
wordFile.close()
# Read the file name from the user and open the file.
try:
    tweet_path = input("Enter file named tweets: ")
    tweetFile = open(tweet
```
Solution
Your code in general
You should definitely familiarize yourself with PEP 8, which specifies Python coding style guidelines.
Those include lowercase names with words separated by underscores for variables and functions, blank lines around function and class definitions, and a maximum line length of 79 characters.
Consistency
Whatever you do, be consistent. You mix single and double quotes for string literals, and you do not use the file's built-in context management and iteration capability on `wordFile`.
Divide and conquer
Your code seems to do some complex data processing in several steps. Isolate these tasks and move them into separate functions, each specialized for one step. You already did this with the functions `score()` and `total()` and your `Regions()` class. Let's have more of those.
Use the script's `__name__`
Put the part of your code that should run when you execute your script inside an `if __name__ == '__main__':` block. This prevents it from running when the script is imported as a module, if you or some other user one day decide to reuse its members, i.e. functions and classes, in other programs.
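The three points above (context management and iteration on the file object, small single-purpose functions, and the main guard) can be combined in a minimal sketch like the one below. It is not taken from the original code: the names `load_keywords` and `main` are hypothetical, and an in-memory `io.StringIO` stands in for the real keywords file so that the example runs anywhere.

```python
import io


def load_keywords(lines):
    """Return the set of non-empty keywords found in an iterable of lines."""
    return {line.strip() for line in lines if line.strip()}


def main():
    # Assumption: in the real script this would be
    # `with open(keyword_path) as keyword_file:`.
    # The context manager closes the file even if an error occurs,
    # and the file object is iterated directly, line by line.
    with io.StringIO("love\nhate\n\nlove\n") as keyword_file:
        keywords = load_keywords(keyword_file)
    print(sorted(keywords))


if __name__ == '__main__':
    # Runs only when the file is executed as a script, not on import.
    main()
```

Using a set here removes the need for the explicit membership check, since adding an existing keyword to a set has no effect.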
Comments
Though you commented parts of your code, those comments are a counterexample of their kind. You state the obvious by commenting on checking the membership of items in a dictionary, which can be read from the code itself. On the other hand, it is not clear why you store comma-separated values as keys in this very dictionary, giving each key the value `0`, without using the dictionary any further apart from printing it out.
Filtering lines
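Regarding your first question, filtering lines of a file by certain keywords is a common example for coroutines. You can use a function like
```
def grep(keywords):
    """Yields lines containing keywords"""
    file = yield  # receives the open file via send()
    yield  # pause here so that send() does not consume the first match
    for line in file:
        if any(keyword in line for keyword in keywords):
            yield line
```
and invoke it with
```
with open(tweets_file) as tweets:
    fltr = grep(words)
    next(fltr)  # advance to the first yield so the generator accepts send()
    fltr.send(tweets)
    for line in fltr:
        print(line, end='')  # each line already ends with a newline
```
assuming `words` is your keyword list.
Code Snippets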
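As a simpler alternative to the coroutine from the "Filtering lines" section, a plain generator function can filter the lines lazily without writing a second file. This sketch is not part of the original answer: the name `matching_lines` and the sample data are made up for illustration, and a list of strings stands in for the open tweets file.

```python
def matching_lines(lines, keywords):
    """Yield each line that contains at least one keyword as a substring."""
    for line in lines:
        if any(keyword in line for keyword in keywords):
            yield line


# Hypothetical sample data; an open file object could be passed instead,
# because iterating over a file also yields one line at a time.
sample_lines = ["I love this\n", "nothing here\n", "worst day ever\n"]
sample_keywords = ["love", "worst"]

selected = list(matching_lines(sample_lines, sample_keywords))
for line in selected:
    print(line, end='')  # each line already ends with a newline
```

Like the original `any(word in line ...)` test, this matches substrings, so a keyword such as `like` would also match a line containing `unlike`.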
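```
if __name__ == '__main__':
    <your code here>
```
```
def grep(keywords):
    """Yields lines containing keywords"""
    file = yield  # receives the open file via send()
    yield  # pause here so that send() does not consume the first match
    for line in file:
        if any(keyword in line for keyword in keywords):
            yield line
```
```
with open(tweets_file) as tweets:
    fltr = grep(words)
    next(fltr)  # advance to the first yield so the generator accepts send()
    fltr.send(tweets)
    for line in fltr:
        print(line, end='')  # each line already ends with a newline
```
Context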
StackExchange Code Review Q#147297, answer score: 3