Recent Entries 10
- snippet moderate 112d agoCount number of bits to convert one integer to another integerReturn the number of bits (N) that will need to be changed in order to convert an integer(X), into another integer(Y). Constraint: 0 N = 2 X = 2, Y = 3 => N = 1 X = 0, Y = 4 => N = 1 I have written following code, how to improve it? ``` def bit_diff(n1,n2): cmn_d = n1 & n2 all_d = n1 | n2 diff = all_d - cmn_d cnt = 0 while diff: if diff & 1: cnt+=1 diff>>=1 return cnt print bit_diff(1,2) ```
- pattern minor 112d agoFinding roughly matching genome sequences in Python dictionaryThe purpose of my code here is to play a part in genome sequencing analysis, and while functional it takes days to run, so I am looking for any way I can improve speed. The input is up to 500 million lines long (making speed code efficiency important) and contains sequencing reads and corresponding info. Each read takes up 4 lines within the input file and looks something like this: ``` @A001 <-header AAAAACCCCCCCCCCCC <-seq read (finalRead) + ################# <-quality (trimmed_quality) ``` The portion of my code that is very slow takes a dictionary as input, which contains all of the data found within the input sequencing file and is in the form shown below: ``` duplexDict[umi] = {'header':header, 'seq':finalRead, 'qual':trimmed_quality} ``` In the first part of the code I am looking for pairs of sequences by checking for similar keys (termed umi in the code). The goal is to find keys that when converted to complement sequence are only different by a single letter. Then for each key if there is only one closely matching key, the associated dictionaries are retained. If there are no matches or more than one matching key, all of these keys should be ignored. ``` from Levenshtein import distance deDuplexDict = {} # dict that will contain key pairs finalList = [] # list to keep track of valid key pairs for i in duplexDict: # dict with sequencing file info tempList = [] for j in duplexDict: complement = str(Seq(j).complement()) # this is just finding complementary sequence if distance(i,complement) <= 1: # find similar umi/read seq pairs tempList.append(j) # make a list of all similar pairs # only keep a complementary pair if there are exactly two matching consensus reads if len(tempList) == 1: if i not in finalList and j not in finalList: finalList.append(i) finalList.append(j) # only retain those dict values that are true pairs for key in finalList: deDuplexDict[key] =
- pattern minor 112d agoHamming distance between numbers in JavaScriptIn Leetcode it states that my runtime is only faster than 38% all of submitted JavaScript solutions. Is there anything I can change to make it more efficient? ``` /** * @param {number} x * @param {number} y * @return {number} */ var hammingDistance = function(x, y) { var resultStr =( x^y); let count =0; while(resultStr>0){ resultStr =(resultStr&resultStr-1) ; count++; } return count; }; ```
- pattern minor 112d agoGiven two strings, a and b, determine the minimum number of character deletions required to make a and b anagramsI'm trying to figure out how my code could be improved. I solved a HackerRank problem from the Cracking the Coding Interview section, and am thinking that it should be able to be solved a simpler than what I did. The problem is as follows: Strings: Making Anagrams Alice is taking a cryptography class and finding anagrams to be very useful. We consider two strings to be anagrams of each other if the first string's letters can be rearranged to form the second string. In other words, both strings must contain the same exact letters in the same exact frequency For example, `bacdc` and `dcbac` are anagrams, but `bacdc` and `dcbad` are not. Alice decides on an encryption scheme involving two large strings where encryption is dependent on the minimum number of character deletions required to make the two strings anagrams. Can you help her find this number? Given two strings, `a` and `b`, that may or may not be of the same length, determine the minimum number of character deletions required to make `a` and `b` anagrams. Any characters can be deleted from either of the strings. Input Format The first line contains a single string, `a`. The second line contains a single string, `b`. Constraints - \$1 \le |a|,|b| \le 10^4\$ - It is guaranteed that `a` and `b` consist of lowercase English alphabetic letters (i.e., a through z). Output Format Pring a single integer denoting the number of characters you must delete to make the two strings anagrams of each other. Sample Input ``` cde abc ``` Sample Output ``` 4 ``` Explanation We delete the following characters from our two strings to turn them into anagrams of each other: - Remove `d` and `e` from `cde` to get `c`. - Remove `a` and `b` from `abc` to get `c`. We must delete 4 characters to make both strings anagrams, so we print 4 on a new line. My Implementation I am using C++, and since s
- pattern minor 112d agoA telephone book command line program in ANSI C(See the next iteration.) Introduction I was in the mood for some C code and wrote this program for handling a personal telephone record book. One of the goals is strict portability; the code compiles with `gcc -ansi -pedantic -Wall ...` silently. The second goal is being prepared for dealing with problems (out of memory, file I/O errors). Usage The program creates and works on a file called `.telephone_book` that is placed in the user's home directory. In order to add new record: `./tbook --add Claus Santa 040-1234567 ` In order to list all records: `./tbook # list all ./tbook blah # list all records whose last name is closest to "blah" by Levenshtein distance ./tbook - blah # list all records whose first name is closest to "blah" by Levenshtein distance ./tbook bluh blah # list all records whose both last and first names are closest to "bluh blah" by Levenshtein distance ` The output may be something like: `----------+------------+------------------+--- Last name | First name | Telephone number | ID ----------+------------+------------------+--- Bro | Vector | 11 | 1 Efremov | Rodion | 8055 | 2 Funk | Funky | 12321 | 3 ----------+------------+------------------+--- Gates | Bill | 677 | 4 Minogue | Kylie | 4401 | 5 Ryazanov | Viktor | 3454 | 6 ` In order to delete records by their IDs: `./tbook --remove ID1 ID2 ... IDn ` Here are all 5 source files: telephone_book.h: `#ifndef TELEPHONE_BOOK_H #define TELEPHONE_BOOK_H #include /******************************************************************************* * This structure holds a single telephone book record. * *******************************************************************************/ typedef struct { char* first_name; char* last_name; char* telephone_number; int id; } telephone_book_record; /**********************************
- pattern minor 112d agoEdit distance implementationThis problem is taken from the book Introduction to Algorithms, Third Edition By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein: In order to transform one source string of text x[1..m] to a target string y[1..n], we can perform various transformation operations. Our goal is, given x and y, to produce a series of transformations that change x to y. We use an array ́z—assumed to be large enough to hold all the characters it will need—to hold the intermediate results. Initially, z is empty, and at termination, we should have z[j] = y[j] for j = 1, 2,..., n. We maintain current indices i into x and j into z, and the operations are allowed to alter z and these indices. Initially, i = j = 1. We are required to examine every character in x during the transformation, which means that at the end of the sequence of transformation operations, we must have i = m + 1. We may choose from among six transformation operations: Copy a character from x to z by setting ́z[j] = x[i] and then incrementing both i and j. This operation examines x[i]. Replace a character from x by another character c, by setting z[j] = c, and then incrementing both i and j . This operation examines x[i]. Delete a character from x by incrementing i but leaving j alone. This operation examines x[i]. Insert the character c into z by setting z[j] = c and then incrementing j , but leaving i alone. This operation examines no characters of x. Twiddle (i.e., exchange) the next two characters by copying them from x to z but in the opposite order; we do so by setting z[j] = x[i + 1] and ́z[j + 1] = x[i] and then setting i = i + 2 and j = j + 2. This operation examines x[i] and x[i + 1]. Kill the remainder of x by setting i = m + 1. This operation examines all characters in x that have not yet been examined. This operation, if performed, must be the final operation. a. Given two sequences x[1..m] and y[1..n]
- pattern minor 112d agoCheck if two strings are one 'edit' apart using PythonThe code below is my solution in python 3 to exercise 1.5 in Cracking the Coding Interview. I would appreciate feedback on both the algorithm (improvements to space and time complexity) and/or coding style. I think the time and space complexity of the code below is \$O(n^{2})\$ and \$O(n)\$ respectively. The exercise statement is as follows: Given 2 Strings write a function to check if they are 1 edit away. There are three type of edits 1) Insert a character 2) Remove a character 3) Replace a character I wrote the code in Python 3.5 and confirmed that it passed a small unit test. For this problem I am particularly interested in feedback on where in my code (if at all) I should include more comments. ``` import unittest def is_one_away(first: str, other: str) -> bool: """Given two strings, check if they are one edit away. An edit can be any one of the following. 1) Inserting a character 2) Removing a character 3) Replacing a character""" if len(first) 1: return False elif len(first) - len(other) == 1: for pos in range(len(first)): if first[:pos] + first[pos+1:] == other: return True return False else: num_different_chars = sum(1 for pos in range(len(first)) if first[pos] != other[pos]) return num_different_chars < 2 class MyTest(unittest.TestCase): def test_is_one_away(self): self.assertEqual(is_one_away('pale', 'ale'), True) self.assertEqual(is_one_away('pales', 'pale'), True) self.assertEqual(is_one_away('pale', 'bale'), True) self.assertEqual(is_one_away('pale', 'bake'), False) self.assertEqual(is_one_away('ale', 'pale'), True) self.assertEqual(is_one_away('aale', 'ale'), True) self.assertEqual(is_one_away('aael', 'ale'), False) self.assertEqual(is_one_away('motherinlaw', 'womanhitler'), False) self.assertEqual(is_one_away('motherinlaw','motherinlow'), True) ``
- pattern minor 112d agoMatlab implementation of Needleman-Wunsch algorithmThis code (an implementation of the path finding Needleman-Wunsch algorithm), given two sequences, correctly aligns and calculates the similarity of them. For example, given two sequences: AAA,CCC or AA,CA it correctly aligns them as ``` ---AAA CCC--- ``` and ``` -AA CA- ``` respectively, while returning the similarity. ``` % Matlab implementation of NW algorithm % Written by Joseph Farah % Completed 7/14/16 % Last updated 7/14/16 function similarity = needleman(s1, s2) % defining variables % sequence related variables s1 = strcat('-',s1); s2 = strcat('-',s2); seq1 = s1; seq2 = s2; %seq1 = '-ACTTAGATTACTTG'; %seq2 = '-CCCACTAGATTATTAG'; fseq1 = ''; fseq2 = ''; % weights w_s = 2; w_indel = 1; % temporary costs for the matrix fill h_cost = 0; v_cost = 0; d_cost = 0; % the matrices cost_matrix = zeros(length(seq1),length(seq2)); direction_cell_matrix = cell(length(seq1),length(seq2)); %% Initiliazation and Scoring for x = 1:length(seq1) for y = 1:length(seq2) point = [seq1(x), seq2(y)]; if point(1) == '-' && point(2) == '-' cost_matrix(x,y) = 0; direction_cell_matrix{x,y} = 'n'; continue end % apparently, to get it to go in the right order, moving % horizontally is defined by y-1 and vertically by x-1 if point(1) == '-' % this means it is on the lip -- cost_matrix(x,y) = cost_matrix(x,y-1) + w_indel; % the only way it can go is left! direction_cell_matrix{x,y} = 'h'; continue end if point(2) == '-' % is it vertical? cost_matrix(x,y) = cost_matrix(x-1,y) + w_indel; % the only way it can go is up! direction_cell_matrix{x,y} = 'v'; continue end % if none of the above conditions were met, the point must be in % the body of the array. First thing we have to do is calculate the % cost of the current point. h_cost = cost
- pattern minor 112d agoA String.prototype.diff() implementation (text diff)I just had the idea to develop an algorithm to calculate and highlight the difference between two strings. I know that there are already some libraries to do the job but I just tried to make my own. I really don't know whether this one is similar to the existing ones but it seems to work fine. As of now it's still in it's early stage which means depending on the situation it sometimes produce some multiple consecutive deletion and insertion spans which probably require a second pass to consolidate them under two single delete and insert spans. How it works: It takes two strings like "the quick brown fox jumps over the lazy dog" and "the quick brown coyote jumps over the lazy dog" It will create match a string up until it meets the first mismatching character in-between the two. The index of this character is designated by base index (`bi`). So in this case `matchStr` is "the quick brown " then it will generate two new strings as `longer` ("coyote jumps over the lazy dog") and `shorter` ("fox jumps over the lazy dog"). Now the `Array.prototype.rotate()` generic method, which i had implemented a while back for another project walks in. `Array.prototype.rotate()` can rotate the array in both directions but in this particular case we will only rotate it in one direction. `shorter` stays static and `longer` gets rotated to find the longest overlapping sub-string. ``` XX: "fox jumps over the lazy dog" 00: "coyote jumps over the lazy dog" 01: "oyote jumps over the lazy dogc" 02: "yote jumps over the lazy dogco" 03: "ote jumps over the lazy dogcoy" 04: "te jumps over the lazy dogcoyo" 05: "e jumps over the lazy dogcoyot" 06: " jumps over the lazy dogcoyote" 07: "jumps over the lazy dogcoyote " 08: "umps over the lazy dogcoyote j" 09: "mps over the lazy dogcoyote ju" 10: "ps over the lazy dogcoyote jum" 11: "s over the lazy dogcoyote jump" 12: " over the lazy dogcoyote jumps" 13: "over the lazy dogcoyote jumps " 14: "ver the lazy dogcoyote jumps o" 15: "er the l
- pattern minor 112d agoClustering nodes with Hamming distance < 3I want to speed up the following code, which is from an algorithm class. I get a list of 200000 nodes where every node is a tuple of the length of 24 where every item is either a 1 or 0. These items represent a graph where the distance between them is the hamming distance(number of bits that two nodes are different). I have to run the union find algorithm on them to union all the nodes with distances 2. I wrote the code and it runs unfortunately in 3 minutes. After profiling the code I got ``` 143269498 function calls in 185.193 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 47.656 47.656 183.583 183.583 ex2_big.py:18(cluster_alg) 1 1.090 1.090 185.193 185.193 ex2_big.py:2() 192670 0.165 0.000 0.165 0.000 ex2_big.py:39(union) 11247652 13.388 0.000 13.388 0.000 ex2_big.py:49(find) 110327340 120.790 0.000 122.365 0.000 ex2_big.py:5(all_string_with_diff) 14511524 1.175 0.000 1.175 0.000 ex2_big.py:6() 5000000 0.355 0.000 0.355 0.000 ex2_big.py:61() 596364 0.044 0.000 0.044 0.000 {len} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 2 0.010 0.005 0.010 0.005 {method 'keys' of 'dict' objects} 200001 0.144 0.000 0.144 0.000 {method 'split' of 'str' objects} 200001 0.020 0.000 0.020 0.000 {method 'strip' of 'str' objects} 1 0.000 0.000 0.000 0.000 {open} 993940 0.355 0.000 0.355 0.000 {range} ``` The heavy function is all_string_with_diff The code of this function is very simple: ``` def all_string_with_diff(tup,k): perms = [(j for j in range(i-1,len(tup)-k+i))for i in range(1,k+1)] for x in product(*perms): l = list(tup) for y in x: if l[y] =='1': l[y] = '0' else: l[y] = '1' yield