Recent Entries 10
- pattern minor 112d ago: Python CGI front-end for web service to perform machine translation

  I am trying to optimize this Python script, which processes web requests for machine translation. The actual translation executable that is called is quite fast, and so are the Perl scripts that are called. The largest performance boost came from removing unnecessary imports. I would like to have this code reviewed so I can further optimize the performance. I also welcome any advice on a Pythonic way of testing performance. My code is littered with timing and print commands that I removed for this post.

  ```
  #!/usr/bin/env python
  # -*- coding: UTF-8 -*-
  import time
  import sys
  import cgi
  import subprocess
  import string
  import xmlrpclib

  reload(sys)
  sys.setdefaultencoding('utf8')

  isTestPerformance = len(sys.argv) == 4

  # Parameters
  if isTestPerformance:
      source = sys.argv[1]
      target = sys.argv[2]
      sourceText = sys.argv[3]
  else:
      # this part is important to tell the browser that output is html text.
      print "Access-Control-Allow-Origin: *"
      print "Content-Type: text/plain;charset=utf-8"
      print
      form = cgi.FieldStorage()
      sourceText = form.getvalue("sourceText").decode('utf8')
      source = form.getvalue("source").lower()
      target = form.getvalue("target").lower()

      # Decode the CGI encoded source text
      # NOTE: Custom encoding of semicolon (;), (?), (&), (#), etc, is only done here because
      # CGI can not handle them. Do not use this (decode) if you are not using CGI,
      # or use some other decoding that matches the encoding from the caller of this code
      sourceText = sourceText.replace("__QUESTION_MARK__", "?")
      sourceText = sourceText.replace("__SEMICOLON__", ";")
      sourceText = sourceText.replace("__AMPERSAND__", "&")
      sourceText = sourceText.replace("__NUMBER__", "#")
      # sourceText = sourceText.replace("__NEWLINE__", "\n")

  # Tokenize the Source Text
  if source == "zh":
      # Chinese has to do word alignment
      # options are slim: write the text to a file
      # use NLTK Stanford NLP (python>java) to segment chinese
  ```
- pattern minor 112d ago: CSGO inventory and price python code

  ```
  #!/usr/bin/python
  # -*- coding: utf-8 -*-
  import urllib2
  import json
  import datetime
  import time

  global file_name
  file_name = "skins 2017-05-05 23-15-16.txt"
  wear_list = ["Factory New", "Minimal Wear", "Field-Tested", "Well-Worn", "Battle-Scarred"]
  wear_val = {"Factory New": 1, "Minimal Wear": 2, "Field-Tested": 3, "Well-Worn": 4, "Battle-Scarred": 5}
  items = []
  item_prices = {}

  def getInventory(steamid):
      try:
          data = urllib2.urlopen('http://steamcommunity.com/profiles/'+steamid+'/inventory/json/730/2')
      except:
          print("Overloaded the server...")
          print("Waiting...")
          time.sleep(60)
          data = urllib2.urlopen('http://steamcommunity.com/profiles/'+steamid+'/inventory/json/730/2')
      json_data = json.loads(data.read())
      descriptions = json_data['rgDescriptions']
      now = datetime.datetime.now()
      date = now.strftime("%Y-%m-%d %H-%M-%S")
      global file_name
      file_name = "skins " + str(date) + ".txt"
      txt = open(file_name, "w+")
      for v in descriptions:
          name = str([descriptions[v]['market_name']])
          name = name[3:]
          name = name[:-2]
          if name.endswith("Flip Knife | Rust Coat (Battle-Scarred)"):
              name = name[7:]
          if name.startswith("StatTrak"):
              name = name[15:]
              name = 'StatTrak ' + name
          if name.endswith("(Dragon King) (Minimal Wear)"):
              name = "M4A4 | Dragon King (Minimal Wear"
          txt.write(name)
          txt.write('\n')
          #txt.write(str(descriptions[v]))
          #txt.write('\n')
          print(name)
      txt.close()
      print('Done!')
      return

  def getPrice():
      x = 1
      gun_name_wear = 0
      txt = open(file_name, "r+")
      for line in txt:
          stattrak = 0
          wear = line[line.find("(")+1:line.find(")")]
          if wear in wear_list:
              print(wear)
              wear = wear.replace(" ","%20")
              gun = line.split(' |', 1)[0].replace('.', '')
              print(gun)
              if "StatTr
  ```
- principle minor 112d ago: Game winning optimal strategy

  Consider a row of n coins of values v1 . . . vn, where n is even. A player selects either the first or last coin from the row, removes it from the row permanently, and receives the value of the coin. Determine the maximum possible amount of money we can definitely win if we move first.

  Let us understand the problem with a few examples:

  [8, 15, 3, 7] : The user collects maximum value as 22 (7 + 15)

  Does choosing the best at each move give an optimal solution? Is there any way to improve this code?

  ```
  def optimal_strategy_game(v):
      n = len(v)
      t = [[0 for x in xrange(n)] for x in xrange(n)]
      for gap in xrange(n):
          for i in xrange(n):
              j = i + gap
              if j < n:
                  x = 0 if i+1 > j-1 else t[i+1][j-1]
                  y = t[i+2][j] if i+2 <j else 0
                  z = t[i][j-2] if i <= j-2 else 0
                  t[i][j] = max(v[i]+min(x,y), v[j]+min(x,z))
      print t
      print t[0][n-1]

  v = [8,15,3,7]
  n = len(v)
  print optimal_strategy(0,n-1,v)
  optimal_strategy_game(v)
  ```
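A minimal Python 3 sketch of the interval dynamic programming behind this game, assuming both players play optimally. The function name `max_first_player_value` and the prefix-sum formulation are illustrative choices, not the asker's code. It also suggests why picking the larger end coin each turn is not enough: on `[8, 15, 3, 7]` taking 8 first exposes the 15 to the opponent.

```python
def max_first_player_value(coins):
    """Return the largest total the player who moves first can guarantee."""
    n = len(coins)
    if n == 0:
        return 0
    # prefix[k] is the sum of the first k coins, so any interval sum is O(1).
    prefix = [0]
    for c in coins:
        prefix.append(prefix[-1] + c)
    # best[i][j]: most the player to move can collect from coins[i..j].
    best = [[0] * n for _ in range(n)]
    for i in range(n):
        best[i][i] = coins[i]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            total = prefix[j + 1] - prefix[i]
            # After we take one end, the opponent plays optimally on the rest;
            # we keep the interval total minus what the opponent can secure.
            best[i][j] = total - min(best[i + 1][j], best[i][j - 1])
    return best[0][n - 1]


print(max_first_player_value([8, 15, 3, 7]))
```

The table has n² entries and each is filled in constant time, so the sketch runs in O(n²) time and space.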
- pattern minor 112d ago: Use Python to determine the repeating pattern in a string

  I am writing an algorithm to count the number of times a substring repeats itself. The string is between 1-200 characters ranging from letters a-z. There should be no leftovers at the end of the pattern either, and it should be split into the smallest possible combination.

  ```
  answer("abcabcabcabc"): output = 4
  answer("abccbaabccba"): output = 2
  answer("abcabcd"): output = 1
  ```

  My code:

  ```
  import re

  def answer(s):
      length = len(s)
      x=[]
      reg = ""
      for i in range(1, length+1):
          if length % i == 0:
              x.append(i)
      repeat = len(x)*10
      for _ in range(0, repeat):
          a = length/max(x)
          print(length)
          print(max(x))
          print(a)
          for _ in range(0, a):
              reg = reg + "."
          exp = re.findall(reg, s)
          print exp
          if all(check==exp[0] for check in exp):
              print len(exp)
              return len(exp)
          elif all(check!=exp[0] for check in exp):
              x.remove(max(x))
  ```

  This is Python 2.7 code and I don't have to use regex. It just seemed like the easiest way. Is there a better/faster/more optimal way to do this?

  NOTE: it breaks if the string size is too big.

  EDIT: Fixed indentation
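One regex-free way to approach this, sketched in Python 3 under the assumption that the input is a plain string: try each candidate piece length from shortest to longest and check whether repeating the prefix rebuilds the whole string. The name `count_repeats` is hypothetical.

```python
def count_repeats(s):
    """Return how many times the shortest repeating unit of s occurs."""
    n = len(s)
    for size in range(1, n + 1):
        # Only lengths that divide n can tile the string with no leftovers.
        if n % size == 0 and s[:size] * (n // size) == s:
            return n // size
    return 0  # only reached for the empty string


print(count_repeats("abcabcabcabc"))
```

Because the shortest length is tried first, the first match is the smallest unit, which gives the largest repeat count. For strings of at most 200 characters the cost of the repeated comparisons is negligible.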
- pattern minor 112d ago: Loop through database and run shell commands with Python and exiftool

  Briefly, I'm looking at getting the code below to execute faster. I have 100k images to go through. I'm running a query against MySQL, looping through results and then running exiftool against an image, then moving it. I started running it and it quickly became evident it wouldn't be a quick thing :-(

  ```
  import mysql.connector
  import os

  cnx = mysql.connector.connect(user='root',database='database', password='password')
  cursor = cnx.cursor()
  query = ("SELECT post_title,Event,File,Name from a order by File")
  cursor.execute(query)

  def shellquote(s):
      return s.replace("'", "")

  for (post_title, Event,File,Name) in cursor:
      olddir = r'/home/alan/Downloads/OLD/'
      newdir = r'/home/alan/Downloads/NEW/' + post_title
      oldfile = olddir + File
      newfile = newdir + "/"+File
      if not os.path.exists(newfile):
          os.makedirs(newfile)
      if os.path.isfile(oldfile):
          print " > PROCESSING: " + oldfile
          os.system("exiftool -q "+shellquote(oldfile)+" -xmp:title='"+shellquote(post_title)+"'")
          os.system("exiftool -q "+shellquote(oldfile)+" -xmp:description='"+shellquote(Name)+" courtesy of https://www.festivalflyer.com'")
          os.system("exiftool -q "+shellquote(oldfile)+" -description='"+shellquote(Name)+" courtesy of https://www.festivalflyer.com'")
          os.rename(oldfile, newfile)

  cursor.close()
  cnx.close()
  ```

  I tried using subprocess but for whatever reason, I didn't get it to run. Any advice is welcome. I suppose I could move the 3 lines of `exiftool` commands to just one and pass multiple arguments. I also saw `-stay_open` as an option to `exiftool` but I'm not sure how to apply it.
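The idea of merging the three `exiftool` calls into one could look roughly like the Python 3 sketch below. It only reuses the options already present in the question (`-q`, `-xmp:title`, `-xmp:description`, `-description`); the helper names `build_exiftool_args` and `tag_image` are hypothetical, and whether this is fast enough for 100k images has not been measured here.

```python
import subprocess

CREDIT = " courtesy of https://www.festivalflyer.com"


def build_exiftool_args(path, title, name):
    """Build one argument list that sets all three tags in a single run."""
    return [
        "exiftool",
        "-q",
        path,
        "-xmp:title=" + title,
        "-xmp:description=" + name + CREDIT,
        "-description=" + name + CREDIT,
    ]


def tag_image(path, title, name):
    """Run exiftool once for the file and return its exit status."""
    # A list of arguments bypasses the shell, so quotes inside the
    # title or name do not need to be stripped or escaped.
    return subprocess.call(build_exiftool_args(path, title, name))


print(build_exiftool_args("a.jpg", "Some Event", "Some Name"))
```

This starts one process per image instead of three. Keeping a single `exiftool` process alive with `-stay_open` would go further, but that is not shown here.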
- pattern minor 112d ago: Simplify the restructuring of json data

  I am trying to add nesting to some flat data. Basically this code works the following way:

  - "taglevel":1 tags should be the key of the array
  - "taglevel":2 or higher tags should be nested within an array and not be duplicated in its array
  - If no "taglevel":1 exists, add it to a generic "NoLevel_1" array

  The code is still clunky and I feel there is a much cleaner way to achieve this.

  ```
  import json

  generic = []
  result = []
  for i in json_data:
      if any(d['taglevel'] == 1 for d in i['tag']):
          tag_data = {}
          tag_child = []
          for tag in i['tag']:
              if tag['taglevel'] == 1:
                  tag_data['name'] = tag['name']
                  tag_data['taglevel'] = 1
              else:
                  tag_child.append(tag)
          filtered = {tuple((k, d[k]) for k in sorted(d) if k in ['name']): d for d in tag_child}
          tag_data['tag_child'] = list(filtered.values())
          if any(d['name'] == tag_data['name'] for d in result):
              for t in result:
                  if t['name'] == tag_data['name']:
                      t['tag_child'] = t['tag_child'] + tag_child
                      filtered = {tuple((k, d[k]) for k in sorted(d) if k in ['name']): d for d in t['tag_child']}
                      t['tag_child'] = list(filtered.values())
          else:
              result.append(tag_data)
      else:
          for tag in i['tag']:
              generic.append(tag)

  tag_data = {}
  tag_data['name'] = 'NoLevel1'
  tag_data['taglevel'] = 1
  tag_data['tag_child'] = generic
  result.append(tag_data)

  print json.dumps(result, indent=4, sort_keys=True)
  ```

  The data:

  ```
  json_data = [{
      "title": "Random",
      "tag": [
          { "name": "Fruit", "taglevel": 1 },
          { "name": "Apple", "taglevel": 2 }
      ]
  }, {
      "title": "Other",
      "tag": [
          { "name": "Fruit", "taglevel": 1
  ```
- debug moderate 112d ago: Find the total number of ways W, in which a sum S can be reached in N throws of a dice

  I was solving this question:

  Given a 6-sided die, find the total number of ways W in which a sum S can be reached in N throws.

  Example:

  S = 1, N = 6 => W = 0
  S = 6, N = 6 => W = 1
  S = 7, N = 6 => W = 6
  S = 3, N = 2 => W = 2

  How can I improve its complexity and make it more readable?

  ```
  def get_sum_dp(n,s):
      t = [[0 for i in xrange(1,s+2)] for j in xrange(1,n+2)]
      for j in xrange(1,7):
          t[1][j] = 1
      for i in range(2, n+1):
          for j in range(1, s+1):
              for k in range(1,7):
                  if k < j:
                      t[i][j] += t[i-1][j-k]
      print t[n][s]

  get_sum_dp(2,8)
  ```
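A readable variant of the same dynamic programme, sketched in Python 3. It keeps only the previous row of the table, so memory is proportional to S rather than N times S. The name `count_ways` and the choice to return the value instead of printing it are assumptions made for illustration. The argument order (throws, then sum) follows `get_sum_dp`.

```python
def count_ways(n, s):
    """Count ordered sequences of n throws of a 6-sided die that sum to s."""
    if n < 0 or s < 0:
        return 0
    # ways[total] after zero throws: only the empty sum 0 is reachable.
    ways = [1] + [0] * s
    for _ in range(n):
        nxt = [0] * (s + 1)
        for total in range(1, s + 1):
            # The last throw shows `face`; the earlier throws made total - face.
            nxt[total] = sum(ways[total - face]
                             for face in range(1, 7) if face <= total)
        ways = nxt
    return ways[s]


print(count_ways(2, 8))
```

Starting from the zero-throw row avoids seeding the first row by hand, which also sidesteps the index error the original hits when S is smaller than 6.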
- pattern moderate 112d ago: Check if a number N is a power of K

  I was asked this question in an interview:

  Check if a number N is a power of K.

  Example:

  N = 32, K = 2 => True
  N = 40, K = 5 => False

  I wrote the following code but got the feedback that its complexity could have been improved. How can I improve its complexity?

  ```
  def check_kth_power(n, k):
      while n%k == 0:
          n = n/k
      if n != 1:
          return False
      return True

  print check_kth_power(128, 5)
  ```
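For reference, a Python 3 sketch of the same repeated-division idea with the edge cases guarded. The name `is_power_of` is hypothetical, and treating 1 as K to the power 0 is an assumption about what the interviewer wanted. It does not change the asymptotic cost: the loop still performs about log base K of N divisions.

```python
def is_power_of(n, k):
    """Return True if n equals k**e for some integer e >= 0."""
    if n < 1 or k < 1:
        return False
    if k == 1:
        # Every power of 1 is 1; without this guard the loop never ends.
        return n == 1
    while n % k == 0:
        n //= k  # integer division keeps n an int in Python 3
    return n == 1


print(is_power_of(32, 2), is_power_of(40, 5))
```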
- pattern minor 112d ago: Student Attendance Record II

  I'm currently working on the Student Attendance Record II problem:

  Given a positive integer \$n\$, return the number of all possible attendance records with length \$n\$, which will be regarded as rewardable. The answer may be very large, return it after `mod 10^9 + 7`.

  A student attendance record is a string that only contains the following three characters:

  ```
  'A' : Absent.
  'L' : Late.
  'P' : Present.
  ```

  A record is regarded as rewardable if it doesn't contain more than one 'A' (absent) or more than two continuous 'L' (late).

  The idea is to use Dynamic Programming to use the results for smaller `n` to calculate the results of bigger `n`. I'm currently keeping two lists - one to track `L` and `P` and the other - to track `A`:

  ```
  MODULO = 10 ** 9 + 7

  class Solution(object):
      def checkRecord(self, n):
          lates = [[0, 0, 0], [1, 1, 0]] + [[0, 0, 0] for _ in xrange(2, n + 1)]
          absences = [[0, 0, 0], [1, 0, 0]] + [[0, 0, 0] for _ in xrange(2, n + 1)]

          for i in xrange(2, n + 1):
              last_late_row = lates[i - 1]
              last_late_row_sum = last_late_row[0] + last_late_row[1] + last_late_row[2]
              last_absence_row = absences[i - 1]
              last_absence_row_sum = last_absence_row[0] + last_absence_row[1] + last_absence_row[2]

              lates[i] = last_late_row_sum % MODULO, last_late_row[0], last_late_row[1]
              absences[i] = ((last_late_row_sum + last_absence_row_sum) % MODULO, last_absence_row[0], last_absence_row[1])

          return (sum(lates[n]) + sum(absences[n])) % MODULO
  ```

  The code works, but it does not pass the Time Limit requirements for big `n`, even though for `n = 100000` LeetCode OJ runs it for only ~220ms. How would you recommend improving the running time?
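Since each row depends only on the previous one, the two full-length lists are not needed. The Python 3 sketch below keeps just the current row of each table; the function name `check_record` and the state layout are illustrative assumptions, and it has not been timed against the judge.

```python
MODULO = 10 ** 9 + 7


def check_record(n):
    """Count rewardable attendance records of length n, modulo 10**9 + 7."""
    # no_a[k]: records without 'A' that end in exactly k consecutive 'L'.
    # one_a[k]: the same, for records containing exactly one 'A'.
    no_a = [1, 0, 0]   # the empty record
    one_a = [0, 0, 0]
    for _ in range(n):
        sum_no_a = sum(no_a) % MODULO
        sum_one_a = sum(one_a) % MODULO
        no_a, one_a = (
            # Append 'P' to anything, or 'L' to extend the late run by one.
            [sum_no_a, no_a[0], no_a[1]],
            # Append 'P' to a one-'A' record or 'A' to a no-'A' record,
            # or 'L' to extend the late run by one.
            [(sum_no_a + sum_one_a) % MODULO, one_a[0], one_a[1]],
        )
    return (sum(no_a) + sum(one_a)) % MODULO


print(check_record(2))
```

This is still linear in `n`, but it allocates six integers instead of two lists of length `n + 1`. If linear time is still too slow, the transition is a fixed 6-state linear map, so fast matrix exponentiation would bring it down to a logarithmic number of steps; that is not shown here.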
- pattern minor 112d ago: Track changes inside a directory

  I have built a Python 2.7 script to track all file and subdir changes inside a nominated directory. It is used with directories that have multiple levels of subdirectories, hundreds of thousands of files, and hundreds of GB of file data. The filenames can have Unicode characters (encoded in UTF-8). By "changes" I mean additions/deletions of files and subdirs, or changes in filesizes (i.e., we are not concerned with the content of the files). The tracking is not continuous, we are just comparing to the last time we checked (typically checking twice a day).

  The script works fine to the best of my knowledge. I would gladly receive feedback on any aspect of the script including best coding practices and design pattern usage, handling of unexpected cases, and performance.

  I include the whole script here. It's 310 lines long and I am wondering whether this might be too long as a question body, but I could not find size guidelines on the site. I opted to include everything instead of code snippets since this seems to be the recommended practice here. I also recognise that the width of my lines does not offer the best viewing opportunity inside the code box (which seems to fit 93 char lines). I normally use 120-char vertical rulers in my code, and sometimes I allow lines to go past them. I am not sure if I should modify my code to offer a better viewing chance here. Let me know if reading the code here is too annoying, and I'll wrap-line it.

  You can find the code, with more backstory, details, and other pieces of code that help run the tool as a background agent here: https://github.com/boulis/Track-Dir-Changes

  ```
  import json, subprocess
  from argparse import ArgumentParser
  from os import walk
  from os.path import join, getsize
  from datetime import datetime

  parser = ArgumentParser(description="Tracks any changes in a specified directory. Additions, deletions,\n \
  changes of files and subdirs are tracked and recorded in a log file.\
  ```