Recent Entries 10
- pattern minor 112d ago

  Compress svg files in PHP

  I wrote something to "compress" svg files. The svg files I am using often have comments and empty `` tags, and I want to remove them. My main goal is not the speed of the compression, but the size of the compressed svg file.

  Here is an example svg file: https://image.flaticon.com/icons/svg/222/222436.svg

  And here is the code I am using:

  ```
  public function compress($svg)
  {
      $svg = preg_replace('//', '', $svg);
      $svg = preg_replace('/[\n\r\s]*/', '', $svg);
      $svg = preg_replace('/\n/', ' ', $svg);
      $svg = preg_replace('/\t/', ' ', $svg);
      $svg = preg_replace('/\s\s+/', ' ', $svg);
      $svg = str_replace('> <', '><', $svg);
      $svg = str_replace(';"', '"', $svg);
      return $svg;
  }
  ```

  - Do you see any dangers here? Could this ruin some svg files?
  - Is there something I could do to compress the file even more?
  - Is there any way to speed this up?
- pattern minor 112d ago

  Helper functions to perform simple token replacements in a string

  This script provides the function T9r, which has some methods to detect, parse and replace tokens "{{ some_token }}" in a string with properties on an object.

  My use case is to have "composable" JSON objects or strings (used for configs) that can be populated with real values at runtime.

  It's been a while since I worked in JS and I just wanted to get a gauge on my style and the general readability of my code, as well as see if there are any improvements I could make to the script.

  Example Usage:

  ```
  var str = '{{ App_Root }}/apple/sauce/{{ OK }}';
  var context = { app_root: ':-)', Ok: '' };
  T9r.replaceTokens(str,context);
  ```

  Module:

  ```
  // T9r ~ "T{oken Parse}(9)r"
  const T9r = function(){};

  /**
    A helper function to process and replace all tokens within a string.
  */
  T9r.replaceTokens = function(Str, Context){
    return T9r.parseTokens( T9r.extractTokens(Str), Context, Str );
  };

  /**
    Will match all tokens "{{ some_token }}, {{some_token}}, {{ Some_ToKen }}"
    within a body of text, extracting all matches ( Tokens ), while pruning each
    match, removing the opening and closing curly brackets, as well as stripping
    out any whitespace, so we have text that can be used to look up props on an
    object. Will return an empty array if no tokens are found.
  */
  T9r.extractTokens = function( str, pattern){
    pattern = pattern ? pattern : /\{([^}]+)\}/ig;
    var matches = str.match(pattern);
    if( ! matches ) return [];
    return T9r.pruneTokens(matches);
  };

  /**
    Returns the count of Tokens that exist within a string body.
  */
  T9r.tokenCount = function( str, pattern){
    pattern = pattern ? pattern : /\{([^}]+)\}/ig;
    return str.match(pattern).length;
  };

  /**
    Removes the leading and trailing wrapping-chars from a token match,
    as well as stripping out all whitespace.
  */
  T9r.pruneTokens = function(Tokens){
    Tokens.forEach(function( token, idx, tokens ){
      tokens[idx] = token.slice(2,-1).replace(/\s+/g,'');
    });
  ```
- pattern minor 112d ago

  Matching a time interval using regex

  What I needed to do was check whether a given string matches a certain pattern. The pattern is this:

  ```
  00:00:00,000 --> 00:00:00,000
  ```

  Things to keep in mind:

  - The 0s can be numbers from 0 to 9.
  - The pattern must be alone in a single line; the line has to only consist of the pattern.

  I came up with this:

  ```
  "^(\\d\\d):(\\d\\d):(\\d\\d),(\\d\\d\\d) --> (\\d\\d):(\\d\\d):(\\d\\d),(\\d\\d\\d)"
  ```

  I tested it a few times and strings that respect the pattern return `true`, the ones that don't return `false`, as they should. Here's a test case:

  ```
  import java.util.regex.Pattern;

  public class TestCaseRegex {

      public static void main(String[] args) {
          //will return true
          String testOne = "00:01:23,846 --> 00:01:26,212";
          //will return false, there's a letter where a number should be
          String testTwo = "00:01:23,84a --> 00:01:21,221";
          //will return true
          String testThree = "00:05:54,846 --> 00:01:16,450";
          //will return false. The string doesn't match the format.
          String testFour = "00:05:54,6 --> 00:0116,450";

          System.out.println(patternMatch(testOne));
          System.out.println(patternMatch(testTwo));
          System.out.println(patternMatch(testThree));
          System.out.println(patternMatch(testFour));
      }

      public static boolean patternMatch(String str) {
          Pattern p = Pattern.compile("^(\\d\\d):(\\d\\d):(\\d\\d),(\\d\\d\\d) "
                  + "--> (\\d\\d):(\\d\\d):(\\d\\d),(\\d\\d\\d)");
          return p.matcher(str).matches();
      }
  }
  ```

  As I'm very new to regex, I'm wondering if this is the most efficient/correct way to accomplish this.
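For comparison with the time-interval entry above, here is a minimal Python sketch of the same whole-line check. It is not the asker's code: the names `TIMESTAMP`, `INTERVAL` and `is_interval_line` are hypothetical, and it assumes the digits should be ASCII 0-9 only. It uses counted quantifiers instead of repeated `\d`, drops the capture groups (the question only needs a yes/no answer), and relies on `fullmatch` instead of explicit anchors.

```python
import re

# One timestamp, e.g. 00:01:23,846 (hours:minutes:seconds,milliseconds).
TIMESTAMP = r"\d{2}:\d{2}:\d{2},\d{3}"

# re.ASCII restricts \d to 0-9; without it, \d also matches other Unicode digits.
INTERVAL = re.compile(TIMESTAMP + " --> " + TIMESTAMP, re.ASCII)


def is_interval_line(line: str) -> bool:
    """Return True only if the whole string is a time interval."""
    # fullmatch must consume the entire string, so no ^ or $ is needed.
    return INTERVAL.fullmatch(line) is not None


if __name__ == "__main__":
    samples = [
        "00:01:23,846 --> 00:01:26,212",
        "00:01:23,84a --> 00:01:21,221",
        "00:05:54,846 --> 00:01:16,450",
        "00:05:54,6 --> 00:0116,450",
    ]
    for sample in samples:
        print(repr(sample), is_interval_line(sample))
```

Compiling the pattern once at module level avoids rebuilding it on every call, which is likely the main efficiency concern when checking many lines.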
- pattern minor 112d ago

  Extract title with regex

  I'm quite new to Go and I feel like this code could be smaller and cleaner. I would love any suggestions and/or hints about mistakes and conventional Go things!

  ```
  func getBookTitle(client *http.Client) {
      rsp, err := client.Get(bookSite)
      if err != nil {
          panic(err)
      }
      html, _ := ioutil.ReadAll(rsp.Body)
      //Get div with title
      regTitle := regexp.MustCompile("()[\n+\\s]*()[a-zA-Z–\\-\n\\s:]*()[\n+\\sdd]*()")
      //remove linebreaks regex
      regFormatTitle := regexp.MustCompile("[\r\n]*")
      //apply regex
      title := regFormatTitle.ReplaceAllString(string(regTitle.Find(html)),"")
      //Remove html tags and remove whitespaces
      title = strings.TrimSpace(title[strings.Index(title,"")+len(""):strings.Index(title,"")])
      fmt.Printf("Book title:%s\n",title)
      rsp.Body.Close()
  }
  ```
- pattern minor 112d ago

  PHP Regex Input Sanitization

  This helper class is supposed to validate the user input from a text-based wifi login form. The entire process is basically this:

  I use the `create_voucher` function of this API, passing `voucher_duration` to set the amount of time for which the voucher is going to be valid and `$this->clean['name']." ".$this->clean['surname']` as a note so I can later identify which device belongs to which user.

  The code returned by `create_voucher` is then sent to the given e-mail address using php-mailer so the user can log in and use the wifi.

  I am particularly unhappy with my error handling and would like to know if you see any obvious ways to break the code or inject malicious code.

  ```
  class sanitizer
  {
      public $clean;
      private $post=null;
      private $reg_email ='/^\S+@\S+\.\S+$/'; //Just some basic checking
      private $reg_name = '/^[\'\p{L} -]+[\n]?$/im'; //Allowing some weird names
      private $reg_number='/^[[:digit:]]*$/im'; //A single integer no fuzz

      public function __construct($post){
          $this->post=$post;
      }

      private function sanitize($key, $regex){
          if (preg_match($regex, $this->post[$key])) {
              $this->clean[$key]= $this->post[$key];
          } else {
              $this->clean[$key]=null;
          }
      }

      public function clean_up(){
          if (isset($this->post['smt_sent'])) {
              if ($this->post['smt_sent']==1) {
                  $this->sanitize('name', $this->reg_name);
                  $this->sanitize('surname', $this->reg_name);
                  $this->sanitize('voucher_duration', $this->reg_number);
                  if ($this->post['voucher_duration'] > 0 && ($this->post['voucher_duration']/60 > 48)) {
                      $this->clean['duration']=null;
                  }
                  $this->sanitize('email_own', $this->reg_email);
                  $this->clean['smt_received']=1;
                  $this->clean['error'] = false; //No errors yet
                  foreach ($this->clean as $field) { //Lo
  ```
- pattern minor 112d ago

  Remove specific CSS rules from WYSIWYG text

  I have created a working function for cleaning some specific styles within the text from a WYSIWYG editor.

  ```
  var rulesText = textArea.value;

  var selector = "a:link, span.MsoHyperlink";
  var pattern = new RegExp(selector.replace(/\./g, "\\.") + "\\s*{[^}]*?}", "gim");
  rulesText = rulesText.replace(pattern,"");

  var nextSelector = "a:visited, span.MsoHyperlinkFollowed";
  var nextPattern = new RegExp(nextSelector.replace(/\./g, "\\.") + "\\s*{[^}]*?}", "gim");
  rulesText = rulesText.replace(nextPattern,"");

  textArea.value = rulesText;
  console.log("FIRED!");
  ```

  Here are the styles being cleaned:

  ```
  a:link, span.MsoHyperlink
      {mso-style-unhide:no;
      color:blue;
      text-decoration:underline;
      text-underline:single;}
  a:visited, span.MsoHyperlinkFollowed
      {mso-style-noshow:yes;
      mso-style-priority:99;
      color:purple;
      mso-themecolor:followedhyperlink;
      text-decoration:underline;
      text-underline:single;}
  ```

  This solution seems a bit wonky to me, and I feel like there is definitely more room for elegance. How can I improve on this, possibly combine the two RegExp instances, and make sure that multiple instances of these rules are removed?
- pattern minor 112d ago

  Beginner code for MadLibs game

  I am working through Automate the Boring Stuff. How can I make this code cleaner/better?

  ```
  #madlibs.py : ABS Chapter 8 Project
  #Make mad libs using madlibtemplate.txt file that will prompt user for parts of speech
  #Will return new file with replacements called madlib.txt

  import re, os

  #Create template.txt file if doesn't exist
  print("Let's play MadLibs! \n Note: Your current working directory is ", os.getcwd())
  if os.path.exists('madlibtemplate.txt') == False:
      print("No template file found. Using default 'madlibtemplate.txt'")
      template = open('madlibtemplate.txt', 'w+')
      template.write("""The ADJECTIVE girl walked to the NOUN and then VERB. A nearby NOUN was upset by this.""")
  else:
      print("Using your 'madlibtemplate.txt' file:")
      template = open('madlibtemplate.txt', 'r')

  #Extract text from template file & turn into list of words
  template.seek(0) #reset pointer to beginning of file
  text = (template.read()).lower() #make lowercase to match parts of speech
  wordregex = re.compile(r'\w+|\s|\S')
  words = re.findall(wordregex, text)

  #Find and replace parts of speech
  for i, word in enumerate(words):
      for part_of_speech in ['adjective', 'noun', 'adverb', 'verb', 'interjection']:
          if word == part_of_speech:
              print('Enter a(n) %s ' % part_of_speech)
              words[i] = input()
  madlib_text_unformatted = ''.join(words)

  #Capitalize first letter in each sentence (Function found on stackoverflow)
  def sentence_case(text):
      # Split into sentences. Therefore, find all text that ends
      # with punctuation followed by white space or end of string.
      sentences = re.findall('[^.!?]+[.!?](?:\s|\Z)', text)
      # Capitalize the first letter of each sentence
      sentences = [x[0].upper() + x[1:] for x in sentences]
      # Combine sentences
      return ''.join(sentences)

  madlib_text = sentence_case(madlib_text_unformatted)

  #Print result and save to a new file
  print('Here we go... \n')
  print(madlib_text)
  madlib = open('madlib
  ```
- pattern minor 112d ago

  Remove all characters except

  My code takes a string and replaces all characters which are not:

  - English letters
  - Numbers
  - `, / -`

  I have tested it and it seems to generally work well enough. But it may have some catastrophic bug in it and/or can be simplified.

  ```
  x <- "dog/John is a cutting-edge pilot^¢„þ"
  gsub("[^a-zA-Z0-9,-:space:]+", " ", x, perl = TRUE)
  "dog/John is a cutting-edge pilot "
  ```
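The character class in the entry above is worth a closer look. Inside a bracket expression, `,-:` reads as a range from `,` to `:`, and `space:` is a list of literal characters, not the POSIX class `[:space:]`. The following Python sketch (not the asker's R code) shows an explicit whitelist and demonstrates the range behaviour. The names `NOT_ALLOWED`, `keep_allowed` and `RANGE_CLASS` are hypothetical, and it assumes spaces should be kept in addition to the characters listed in the question.

```python
import re

# Explicit whitelist: ASCII letters, digits, comma, slash, space and hyphen.
# The hyphen is placed last so it is a literal, not a range operator.
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9,/ -]+")


def keep_allowed(text: str) -> str:
    """Replace each run of disallowed characters with a single space."""
    return NOT_ALLOWED.sub(" ", text)


# For comparison: inside a character class, ",-:" is a range from "," to ":",
# so it also covers ".", "/", the digits and ":" itself.
RANGE_CLASS = re.compile(r"[,-:]")

if __name__ == "__main__":
    print(repr(keep_allowed("dog/John is a cutting-edge pilot^¢„þ")))
    print(RANGE_CLASS.findall("a.b:c;d"))
```

Writing the allowed characters out one by one makes accidental ranges easy to spot, at the cost of a slightly longer pattern.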
- pattern minor 112d ago

  Replacing MySQL's AUTO_INCREMENT with Postgres' SERIAL

  I am making changes to a Python script that converts MySQL scripts to PostgreSQL, and I want to replace strings such as `id INTEGER NOT NULL AUTO_INCREMENT` with `id SERIAL NOT NULL`. This is the code I got to work:

  ```
  import re

  line = 'id INTEGER(11) NOT NULL AUTO_INCREMENT,'
  numeric_types = ['(BIG|MEDIUM|SMALL|TINY)*INT(EGER)*(\(.*?\))*',
                   'DEC(IMAL)*(\(.*?\))*',
                   'NUMERIC(\(.*\))*',
                   'FIXED(\(.*\))*',
                   'FLOAT(\(.*\))*',
                   'DOUBLE( PRECISION)*(\(.*?\))*',
                   'REAL(\(.*\))*',
                   'BIT',
                   'BOOL(EAN)*']

  for i in range(len(numeric_types)):
      type = numeric_types[i]
      if (re.search(type, line)):
          line = re.sub(type, "SERIAL", line).replace(" AUTO_INCREMENT", "")
          print line
          break
  ```

  Notes:

  - `line` will be a column from a CREATE TABLE statement inputted by the user
  - I could probably join all regular expressions into one using `OR`s, but I do not know if that would be a good practice or not
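The last note in the entry above asks about joining the expressions into one. A minimal Python 3 sketch of that idea follows; it is one possible arrangement, not the asker's code. The names `NUMERIC_TYPE`, `AUTO_INCREMENT` and `to_serial` are hypothetical, the type list is taken from the question, and the sketch assumes that only lines containing `AUTO_INCREMENT` should be rewritten and that matching should be case-insensitive.

```python
import re

# All numeric type names from the question in a single alternation.
# re.VERBOSE ignores unescaped whitespace in the pattern, so the one literal
# space (in DOUBLE PRECISION) is written as "\ ".
NUMERIC_TYPE = re.compile(
    r"""
    \b(?:
        (?:BIG|MEDIUM|SMALL|TINY)?INT(?:EGER)?
      | DEC(?:IMAL)?
      | NUMERIC
      | FIXED
      | FLOAT
      | DOUBLE(?:\ PRECISION)?
      | REAL
      | BIT
      | BOOL(?:EAN)?
    )\b
    (?:\s*\([^)]*\))?   # optional (length) or (precision, scale)
    """,
    re.VERBOSE | re.IGNORECASE,
)

AUTO_INCREMENT = re.compile(r"\s+AUTO_INCREMENT\b", re.IGNORECASE)


def to_serial(line: str) -> str:
    """Rewrite a MySQL AUTO_INCREMENT column definition to use SERIAL."""
    if not AUTO_INCREMENT.search(line):
        return line  # nothing to convert
    # Replace only the first type name found, then drop the AUTO_INCREMENT keyword.
    line = NUMERIC_TYPE.sub("SERIAL", line, count=1)
    return AUTO_INCREMENT.sub("", line)


if __name__ == "__main__":
    print(to_serial("id INTEGER(11) NOT NULL AUTO_INCREMENT,"))
```

The word boundaries keep short names such as `BIT` or `INT` from matching inside longer identifiers, and `?` replaces the `*` quantifiers from the question because each optional part can occur at most once.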
- pattern minor 112d ago

  Extract unique terms from a PANDAS series

  Background

  I have to process tons of DataFrames with shapes of ~230 columns x ~2000-50000+ rows. Here is an extremely simplified example:

  ```
     numbers              colors
  0  0.03620894806802     1xYellow ; 2xRed
  1  0.7641262315308163   2xYellow ; 1xOrange
  2  0.5607449770945651   3xYellow ; 2xGreen
  3  0.6714547913365702   1xYellow ; 1xRed
  4  0.8646309438322237   2xYellow ; 1xRed
  ```

  Problem

  I need to break the `colors` column down to a set that looks like this: `{'Green', 'Orange', 'Red', 'Yellow'}`. The example code below can do this, but it is painfully slow on huge DataFrames.

  ```
  import re
  import pandas as pd
  import numpy as np

  # Generating example data
  color = ["1xYellow ; 2xRed ",
           "2xYellow ; 1xOrange ",
           "3xYellow ; 2xGreen ",
           "1xYellow ; 1xRed ",
           "2xYellow ; 1xRed "]
  numbers = np.random.rand(len(color))
  ex_df = pd.DataFrame(np.array([numbers,color]).T, columns = ["numbers","colors"])

  # Compile the regex to apply with findall
  rx = re.compile("x(\w+)\s")
  just_colors = ex_df.colors.apply(rx.findall)

  # Below is the painfully slow operation that needs optimization.
  present_colors = set(sum(just_colors,[]))
  ```

  Question

  Is there a better method out there for pulling unique terms out of a pandas series?
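One likely cause of the slowdown in the entry above is `sum(just_colors, [])`, which builds a new list for every row and so does quadratic work as the row count grows. The sketch below is an assumption-laden alternative, not a benchmarked answer: `TERM` and `unique_terms` are hypothetical names, the regex is copied from the question, and the function uses only the standard library. Since iterating a pandas Series yields its values, a column such as `ex_df.colors` could be passed in directly.

```python
import re
from itertools import chain
from typing import Iterable, Set

# Same pattern as in the question: the word after "x", followed by whitespace.
TERM = re.compile(r"x(\w+)\s")


def unique_terms(values: Iterable[str]) -> Set[str]:
    """Collect the distinct terms found across all strings."""
    # chain.from_iterable flattens the per-row match lists lazily, so no
    # intermediate list is rebuilt for each row as with sum(lists, []).
    return set(chain.from_iterable(TERM.findall(value) for value in values))


if __name__ == "__main__":
    colors = [
        "1xYellow ; 2xRed ",
        "2xYellow ; 1xOrange ",
        "3xYellow ; 2xGreen ",
        "1xYellow ; 1xRed ",
        "2xYellow ; 1xRed ",
    ]
    print(sorted(unique_terms(colors)))
```

Because the set is filled in a single pass, the cost should grow roughly linearly with the total number of matches.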