Recent Entries 10
- pattern minor 112d agoPython CGI front-end for web service to perform machine translationI am trying to optimize this python script that is used to process web requests for machine translation. The actual translation executable that is called is quite fast. Also, the perl scripts that are called are fast as well. The largest performance boost came from removing unnecessary import libraries. I would like to have this code reviewed so I can further optimize the performance. Also, I welcome any advice on a pythonic way of testing performance. My code is littered with timing and print commands that I removed for this post. ``` #!/usr/bin/env python # -*- coding: UTF-8 -*- import time import sys import cgi import subprocess import string import xmlrpclib reload(sys) sys.setdefaultencoding('utf8') isTestPerformance = len(sys.argv) == 4 # Parameters if isTestPerformance: source = sys.argv[1] target = sys.argv[2] sourceText = sys.argv[3] else: # this part is important to tell the browser that output is html text. print "Access-Control-Allow-Origin: *" print "Content-Type: text/plain;charset=utf-8" print form = cgi.FieldStorage() sourceText = form.getvalue("sourceText").decode('utf8') source = form.getvalue("source").lower() target = form.getvalue("target").lower() # Decode the CGI encoded source text # NOTE: Custom encoding of semicolon (;), (?), (&), (#), etc, is only done here because # CGI can not handle them. Do not used this (decode) if you are not using CGI, # or use some other decoding that matches the encoding from the caller of this code sourceText = sourceText.replace("__QUESTION_MARK__", "?") sourceText = sourceText.replace("__SEMICOLON__", ";") sourceText = sourceText.replace("__AMPERSAND__", "&") sourceText = sourceText.replace("__NUMBER__", "#") # sourceText = sourceText.replace("__NEWLINE__", "\n") # Tokenize the Source Text if source == "zh": # Chinese has to do word alignment # options are slim: write the text to a file # use NLTK Stanford NLP (python>java) to segment chinese
- pattern minor 112d agoFind factors of a numberI am a beginner in C. I have just written this program to find the factors of a provided number \$n\$ where \$1\leq n \leq 10^9\$. However, when I input large numbers (e.g. the maximum, \$10^9\$), the program takes a long time to finish finding the larger factors. How do I reduce its time taken? Also, are there any possible improvements for this code? ``` #include int main(){ int a,i; scanf("%d",&a); for(i=1;i<(a/2+1);i++){ if(a%i==0){ printf("%d\n",i); } } printf("%d\n",a); return 0; } ```
- pattern minor 112d agoPreprocessing steps to follow while cleaning and extracting text data from tweetsI have a dataset of around 200,000 tweets. I am running a classification task on them. Dataset has two columns - class label and the tweet text. In the preprocessing step I am passing the dataset through following cleaning step: ``` import re from nltk.corpus import stopwords import pandas as pd def preprocess(raw_text): # keep only words letters_only_text = re.sub("[^a-zA-Z]", " ", raw_text) # convert to lower case and split words = letters_only_text.lower().split() # remove stopwords stopword_set = set(stopwords.words("english")) meaningful_words = [w for w in words if w not in stopword_set] # join the cleaned words in a list cleaned_word_list = " ".join(meaningful_words) return cleaned_word_list def process_data(dataset): tweets_df = pd.read_csv(dataset,delimiter='|',header=None) num_tweets = tweets_df.shape[0] print("Total tweets: " + str(num_tweets)) cleaned_tweets = [] print("Beginning processing of tweets at: " + str(datetime.now())) for i in range(num_tweets): cleaned_tweet = preprocess(tweets_df.iloc[i][1]) cleaned_tweets.append(cleaned_tweet) if(i % 10000 == 0): print(str(i) + " tweets processed") print("Finished processing of tweets at: " + str(datetime.now())) return cleaned_tweets cleaned_data = process_data("tweets.csv) ``` And here is the relevant output: ``` Total tweets: 216041 Beginning processing of tweets at: 2017-05-16 13:45:47.183113 Finished processing of tweets at: 2017-05-16 13:47:01.436338 ``` It's taking approx. 2 minutes to process the tweets. Although it looks relatively a small timeframe for current dataset I would like to improve it further especially when I use a dataset of much bigger size. Can the steps/code in the `preprocess(raw_text)` method be improved in order to achieve faster execution?
- pattern minor 112d agoLow C performance for TSP heuristicI've been recently doing classes on the traveling salesman problem. I've been tackling the nearest neighbor heuristic for the problem and seeing that the algorithm was simplistic, I've decided to see how fast would be solutions implemented in different languages. I've implemented several versions of the solution using different lua extensions, of those most importantly, luajit. I've also done the same algorithm in C. What surprised me is that luajit solution using C data structures actually beats the pure C implementation significantly (5-6 times faster) C was compiled with `gcc -o3` and the problem size was such that completion times were on the order of a few seconds. I'm not very well versed in the ways of the language, so I wonder if I didn't do something right in C code: ``` #include typedef struct city {double x; double y; double ind;} city; double tsp(city* cities, int N){ city current=cities[0]; double S=0; for (int i=N-1; i>0; i--){ int pos=i; city viewing= cities[i]; double d2=pow((current.x-viewing.x),2)+pow((current.y-viewing.y),2); double ind=viewing.ind; for (int j=1; j<i; j++){ viewing=cities[j]; double D2=pow((current.x-viewing.x),2)+pow((current.y-viewing.y),2); if ((D2 < d2)||((D2==d2)&&(viewing.ind<ind))){ pos=j; d2=D2; ind=viewing.ind; } } S+=sqrt(d2); current=cities[pos]; cities[pos]=cities[i]; } return S+sqrt(pow((cities[0].x-cities[1].x),2)+pow((cities[0].y-cities[1].y),2)); } ``` The function implements the following algorithm: start at city[0], find nearest city, go there, find nearest unvisited city, go there, repeat until no cities left, go to city[0].
- pattern minor 112d agoMonty Hall in BBC BASIC and PythonHow would I be able to improve the speed of this Monty Hall program? Interestingly, the same code written using BBC BASIC for Windows completes the task in half the time of the Python code. Python: ``` import random t = 10000001 j = 0 k = 0 for a in range(1, t): p = int(random.random() * 3) + 1 g = int(random.random() * 3) + 1 if p == g: r = int(random.random() * 2) + 1 if p == 1: r += 1 if p == 2 and r == 2: r = 3 else: r = p ^ g s = g f = g ^ r if s == p: j = j + 1 if f == p: k = k + 1 print(f"After a total of {t - 1} trials,") print(f"The 'sticker' won {j} times ({int(j/t*100)}%)") print(f"The 'swapper' won {k} times ({int(k/t*100)}%)") ``` BBC BASIC for Windows: ``` T% = 10000000 for A% = 1 to T% P% = rnd(3) G% = rnd(3) if P% = G% then R% = rnd(2) if P% = 1 then R% += 1 if P% = 2 and R% = 2 then R% = 3 else R% = P% eor G% endif S% = G% F% = G% eor R% if S% = P% then J% = J% + 1 if F% = P% then K% = K% + 1 next print "After a total of ";T%;" trials," print "The 'sticker' won ";J%;" times (";int(J%/T%*100);"%)" print "The 'swapper' won ";K%;" times (";int(K%/T%*100);"%)" ```
- pattern minor 112d agoCollision detection methodI think everyone has some code they are embarrassed and not proud of and today I have decided to show mine. I'm not sure how to go about making this more efficient. At the time I was just happy it did what I wanted it to do and it's the first game I ever made. So basically, this is a collision detection method for my game and it describes which objects should collide and if they do, what should happen? Any ideas on how to improve would be much appreciated :) ``` void collisionHandling(GameObject other) { //the following method details the outcome whenever one class comes into contact with another if ((this instanceof Heart && other instanceof Ship) && this.overlap(other)) { //increments ship lives if heart is hit by ship unless the ship has 5 lives Game.ship.dead = false; if (Ship.lives < 5) Game.ship.incLives(); this.hit(); } if ((this instanceof ShieldSprite && other instanceof Ship) && this.overlap(other)) { //makes the ship invulnerable if the sprite is hit and sets a counter till it runs out Game.ship.dead = false; Ship.invul = true; Game.shipInvulCounter = 400; this.hit(); } if ((this instanceof Enemy && other instanceof Bullet) && this.overlap(other)) { //if the enemy gets it by a bullet it gets hit this.hit(); } if ((this instanceof Enemy && other instanceof Ship) && this.overlap(other)) { //if the enemy gets it by a bullet it gets hit this.hit(); other.hit(); } if((this instanceof Enemy && other instanceof BigBullet) && this.overlap(other)){ //if the enemy gets it by a big bullet it dies instantly this.dead = true; other.hit(); } if((this instanceof BigBullet && other instanceof Asteroid) && this.overlap(other)){ //if a big bullet hits an asteroid they both get hit other.hit(); this.hit(); } if((this instanceof Bullet && other instanceof Aster
- pattern minor 112d agoBeating memcmp in C++Once again I decided to beat system `memcmp` function. This time I decided to use a template and to "precompile" all cases from 0 to 31 bytes. Result is 400% improvement - from about 1:15 min to 0:25 min. Finally I had rewritten `memcmp_fixed_` with naive looking for statement and I noticed that the compiler can optimize it as well. However I did not tested with random data, so I am not sure what role the cache line and branch predictor plays in the tests. Here is the code: ``` #include #include namespace{ template int memcmp_fixed_(const unsigned char *s1, const unsigned char *s2){ for(size_t i = 0; i int memcmp_fixed_(const unsigned char *s1, const unsigned char *s2){ return *s1 - *s2; } template int memcmp_fixed_(const void *a1, const void *a2){ const unsigned char *s1 = (const unsigned char *) a1; const unsigned char *s2 = (const unsigned char *) a2; return memcmp_fixed_(s1, s2); } } inline int fast_memcmp(const void *a1, const void *a2, size_t const size){ switch(size){ case 0: return 0; case 1: return memcmp_fixed_(a1, a2); case 2: return memcmp_fixed_(a1, a2); case 3: return memcmp_fixed_(a1, a2); case 4: return memcmp_fixed_(a1, a2); case 5: return memcmp_fixed_(a1, a2); case 6: return memcmp_fixed_(a1, a2); case 7: return memcmp_fixed_(a1, a2); case 8: return memcmp_fixed_(a1, a2); case 9: return memcmp_fixed_(a1, a2); case 10: return memcmp_fixed_(a1, a2); case 21: return memcmp_fixed_(a1, a2); case 22: return memcmp_fixed_(a1, a2); case 23: return memcmp_fixed_(a1, a2); case 24: return memcmp_fixed_(a1, a2); case 25: return memcmp_fixed_(a1, a2); case 26: return memcmp_fixed_(a1, a2); case 27: return memcmp_fixed_(a1, a2); case 28: return memcmp_fixed_(a1, a2); case 29: return memcmp_fixed_(a1, a2); case 30: return memcmp_fixed_(a1, a2);
- pattern minor 112d agoFind lexicographical permutationsI have tried many possible methods here but I am still not able to reduce the time complexity of below algorithm. The basic layout of my problem is as follows: I have a string that is composed of only H's and V's. There can be not more than 10 H's and 10 V's possible. I need to find the Kth lexicographical order of such pattern. Also, inp[] array contains input is format: ``` 2 2 3 4 5 4 ``` which means that total H = 2, total V = 2 and I need to print 3rd lexicographical permutation of HHVV. Similarly for 4 5 4, I need to print 4th lexicographical permutation of HHHHVVVVV. ``` //This function returns the next lexicographical permutation of the StringBuilder s static StringBuilder getKRankString(StringBuilder s) { char ch[] = (s.toString()).toCharArray(); int i = 0, j=0; //This loop gets the highest i for which ch[i]ch[i] for(int k=s.length()-1;k>i;k--){ if(ch[k]>ch[i]){ j = k; break; } } //System.out.println(j); //swap characters at i and j char temp = ch[i]; ch[i] = ch[j]; ch[j] = temp; //Append the original string till i to reversed string from i till end StringBuilder swapped = new StringBuilder(new String(ch)); //System.out.println(swapped); String sb1 = swapped.substring(0, i+1); String sb2 = (new StringBuilder(swapped.substring(i+1, s.length())).reverse()).toString(); StringBuilder str = new StringBuilder(sb1+sb2); return str; } static String[] gridLand(String[] inp) { String[] outputArr = new String[inp.length]; for(int i=0;i<inp.length;i++){ //get the first line of input and get integer x and y int x = Integer.valueOf(inp[i].split(" ")[0]); int y = Integer.valueOf(inp[i].split(" ")[1]); int k = Integer.valueOf(inp[i].split(" ")[2]); StringBuilder sb = new StringBuilder(); //form the original string using x H's and y V's. //This will be the first String of this order for(in
- pattern minor 112d agoReturn all valid expressions with specific targetThe problem is stated as: Given a string that contains only digits 0-9 and a target value, return all expressions that are created by adding some binary operators (+, -, or *) between the digits so they evaluate to the target value. In some cases there may not be any binary operators that will create valid expressions, in which case the function should return an empty array. The numbers in the new expressions should not contain leading zeros. The function should return all valid expressions that evaluate to target, sorted lexicographically. For example: `digits = "123"` and `target = 6`, should return: `["1*2*3", "1+2+3"]` My current algorithm is below. I optimized it as much as I can. The question is base on code fights. My algo will generate all combinations of operands and operators accordingly. For the example above, it'll generate: Operands: ``` [['1', '2', '3'], ['1', '23'], ['12', '3'], ['123']] ``` Operators: ``` {0: [()], 1: [('+',), ('-',), ('*',)], 2: [('+', '+'), ('+', '-'), ('+', '*'), ('-', '+'), ('-', '-'), ('-', '*'), ('*', '+'), ('*', '-'), ('*', '*')]} ``` It then combines all possible combinations of operands and operators and evaluate each. For `digits = "1234506789"` and `target = 6`, it takes about 2.2 secs. It should be enough for code fights that has a limit of 4 sec, but I guess that also depends on the speed of the processor. But for some reason it still hits the time limit from the site. Thus Im trying to see how it can be optimized a bit more. Restrictions are: ``` 2 <= digits.length <= 10 -10^4 <= target <= 10^4 ``` My code is below. I commented out some of the alternatives I used, which pretty much has the same speed though. ``` from itertools import combinations, permutations import itertools import time def getExpression(digits, target): operation = { '+': lambda a, b: a + b, '-': lambda a, b: a - b, '*': lambda a, b: a * b, } seen = {} def calculate2(e,sign):
- pattern minor 112d agoFind the center of a red dot of an image in HaskellI try to find the center of all the red pixels contained in an image using `Haskell` and `repa`. My problem is, that the code is not fast enough, probably because I read the image with `repa-io` which returns an `Array Z DIM3 Word8`, but I was not able to figure out how to filter the repa array by how red the given pixels are. That's why I convert the array to a normal list, using `R.toList`. The rest of the calculations are then done using lists. Here is my code: ``` import Data.Array.Repa.Repr.ForeignPtr (F, fromForeignPtr, toForeignPtr) import Data.Array.Repa (Z(..), (:.)(..), DIM0, DIM1, DIM2, DIM3, Array(..), Source, Shape) import qualified Data.Array.Repa as R import Data.Array.Repa.IO.DevIL import Data.List import System.Environment (getArgs) main :: IO () main = do [path] Array r DIM3 a -> (Int, Int) findRedDot img = (median . fst $ splitted, median . snd $ splitted) where -- function returning the coordinates as a tuple, when the pixel -- is red enough, otherwise return (-1, -1) convert f (Z :. i :. j) = let r = f (Z :. i :. j :. 0) g = f (Z :. i :. j :. 1) b = f (Z :. i :. j :. 2) in if r > 237 && (g + b) Z :. w :. h) convert -- turns the DIM2 repa array into a default haskell list lst = R.toList packed p (-1, -1) = False p _ = True -- split up the list into x and y component splitted = splitUp $ filter p lst -- | split up a list of tuples into two lists splitUp :: [(Int, Int)] -> ([Int], [Int]) splitUp [] = ([], []) splitUp ((x, y):rest) = (x:(fst next), y:(snd next)) where next = splitUp rest -- | find the median of a numeric list median :: (Num a, Ord a) => [a] -> a median [] = -1 median l = sorted !! mid where len = length l mid = len `quot` 2 sorted = sort l ``` I did compile with `g