Recent Entries 10
- pattern minor 112d ago
  Mapping .csv rows to a Dictionary

  I have a .csv file in which each row represents an association between a software application and a web server. My goal is to fetch a `List<Server>` by application, so I'm mapping the .csv to a `Dictionary<string, List<Server>>`. I'm using CsvHelper to read the .csv file and map it to objects.

  .csv format:

  ```
  Family | Environment | Name   | Application
  01     | Dev         | WEBD01 | application1
  01     | Production  | WEBP01 | application1
  02     | Dev         | WEBD02 | application2
  ```

  Server class:

  ```
  public class Server
  {
      public string Name { get; set; }
      public string Family { get; set; }
      public string Environment { get; set; }
  }
  ```

  Mapping:

  ```
  protected override void RefreshData()
  {
      // _serverDictionary is a class-level Dictionary<string, List<Server>>
      _serverDictionary.Clear();
      using (TextReader textReader = File.OpenText(_csvFile.FullName))
      using (CsvReader reader = new CsvReader(textReader))
      {
          while (reader.Read())
          {
              string applicationName = reader.GetField("Application");
              Server server = reader.GetRecord<Server>();
              if (_serverDictionary.ContainsKey(applicationName))
                  _serverDictionary[applicationName].Add(server);
              else
                  _serverDictionary.Add(applicationName, new List<Server> { server });
          }
      }
  }
  ```

  Fetching:

  ```
  public IEnumerable<Server> GetByApplication(string applicationName)
  {
      List<Server> servers = new List<Server>();
      _appServers.Where(pair => pair.Key.EqualsIgnoreCase(applicationName))
                 .ForEach(pair => servers.AddRange(pair.Value));
      return servers;
  }
  ```

  Concerns:

  - Is a `Dictionary` the best data structure to use in the first place?
  - I feel that both the mapping code and the fetch code could be more efficient by better utilizing LINQ projection.
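The fetch code in this entry scans every dictionary entry to compare keys case-insensitively, which gives up the constant-time lookup a dictionary normally offers. As a language-neutral illustration of one alternative, here is a minimal Python sketch (not the author's C#) that normalises the key once at insert time. The function names `group_servers_by_application` and `get_by_application` are hypothetical, and the rows are assumed to be already parsed into dicts keyed by the column names from the sample file.

```python
from collections import defaultdict


def group_servers_by_application(rows):
    """Group server records by application name, ignoring case.

    rows: iterable of dicts with the keys Family, Environment, Name and
    Application (assumed to match the sample file's column names).
    """
    servers_by_app = defaultdict(list)
    for row in rows:
        server = {key: row[key] for key in ("Name", "Family", "Environment")}
        # Normalise the key once, at insert time, so a lookup needs no scan.
        servers_by_app[row["Application"].casefold()].append(server)
    return dict(servers_by_app)


def get_by_application(servers_by_app, application_name):
    """Return the servers for one application, or an empty list if unknown."""
    return servers_by_app.get(application_name.casefold(), [])
```

In C#, the same idea would presumably be expressed by constructing the dictionary with a case-insensitive comparer such as `StringComparer.OrdinalIgnoreCase`, so that the lookup becomes a plain keyed access instead of a `Where` over all pairs.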
- pattern minor 112d agoExtracting data from database to CSVI have a feature for users to be able to export the database information. But if the user choose all options to export it takes 1 minute or more to download the .csv file. I'm only including 1 part of the if statement, where I'm pulling in all the data. here it is: ``` function exportTheData() { //get the data for data array if(exportVolumeData == 1) { for(j=0; j<plantData1.length; j++) { i = plantData["MergeKey_lvl00"].indexOf(plantData1["MergeKey_lvl00"][j]); data.push(plantData["PlantName"][i]); if(statesExport == 1) { countyindex = counties["CountyId"].indexOf(plantData["LocationId"][i]); stateid = counties["StateId"][countyindex]; statename = states["StateName"][states["StateId"].indexOf(stateid)]; data.push(statename); } if(countyExport == 1) { countyindex = counties["CountyId"].indexOf(plantData["LocationId"][i]); countyname = counties["CountyName"][countyindex]; data.push(countyname); } if(basinsExport == 1) { countyindex = counties["CountyId"].indexOf(plantData["LocationId"][i]); subbasinid = counties["SubBasinId"][countyindex]; subbasinindex = basinSub["SubBasinId"].indexOf(subbasinid); basinid = basinSub["BasinId"][subbasinindex]; basinindex = basin["BasinId"].indexOf(basinid); basinname = basin["BasinName"][basinindex]; data.push(basinname); } if(subBasinsExport == 1) { countyindex = counties["CountyId"].indexOf(plantData["LocationId"][i]); subbasinid = counties["SubBasinId"][countyindex]; subbasinindex = basinSub["SubBasinId"].ind
- snippet minor 112d agoPython command-line program to convert genomic data fileBackground: I have written this code to transforme a .csv file exported from a software called Geneious containing SNPs and concatenate them into a DNA sequence. So basically take fields from .csv file to make strings. The code itself is just a bunch of functions that perform small tasks, some functions call others and in the end the result is printed to a file. I used argparse because this is going to be a command line tool, and is useful to have obligatory arguments and default values for the others. I am inexperienced in coding and have noone to review my code. I feel that needing to call each argument for each function is really awkward. My questions: Is this the best structure? Is creating a "chain" of functions like this the Best practice? Code ``` import argparse import collections import csv def cleaning(file_as_list, snp, names): """From input file get the SNPS.""" with open(file_as_list, 'r') as input_file: reader = csv.reader(input_file) file = list(reader) have_SNP = [x for x in file if x[snp] == '1'] for i in range(len(have_SNP)): mult_names = have_SNP[i][names].replace(':', ',').replace(', ', ',') sep_names = mult_names.split(',') only_names = [x for x in sep_names if ' ' not in x] have_SNP[i][names] = only_names return have_SNP def reference_dic(file_as_list, snp, names, col_ref, pos): """Creates the dict with all positions and reference nucleotides.""" have_SNP = cleaning(file_as_list, snp, names) ref_dic = {} for i in have_SNP: ref_dic[int(i[pos].replace(',', ''))] = i[col_ref] return ref_dic def pos_list(file_as_list, snp, names, col_ref, pos): """Creates a list with all the ehxisting positions in reference.""" ref_dic = reference_dic(file_as_list, snp, names, col_ref, pos) list_pos = [] for key in ref_dic: list_pos.append(key) sorted_pos_lis = sorted(list_pos) return sorted_pos_lis def genomes_list(file_as_list,
- snippet minor 112d agoConvert XML to CSVI'm pretty sure this code can be optimized, but I'm not talented enough in Linq to do it myself. Here's what I'm trying to do: I have an XML file that needs to be converted into a .csv file. The XML looks like this: ``` Super Mario Bros 14 29,99 -No Comment- N/A Nintendo Video Games 1985 001 The Legend of Zelda 12 34,99 -No Comment- N/A Nintendo Video Games 1986 002 ``` (There are many more Items in the list, but they are all the same.) The code I'm currently using is working as intended, here it is: ``` public void fileConvert_XMLToCSV() { //This method converts an xml file into a .csv file XDocument xDocument = XDocument.Load(FilePath_CSVToXML); StringBuilder dataToBeWritten = new StringBuilder(); var results = xDocument.Descendants("Item").Select(x => new { title = (string)x.Element("Name"), amount = (string)x.Element("Count"), price = (string)x.Element("Price"), year = (string)x.Element("Year"), productID = (string)x.Element("ProductID") }).ToList(); for (int i = 0; i < results.Count; i++) { string tempTitle = results[i].title; string tempAmount = results[i].amount; string tempPrice = results[i].price; string tempYear = results[i].year; string tempID = results[i].productID; dataToBeWritten.Append(tempYear); dataToBeWritten.Append(";"); dataToBeWritten.Append(tempTitle); dataToBeWritten.Append(";"); dataToBeWritten.Append(tempID); dataToBeWritten.Append(";"); dataToBeWritten.Append(tempAmount); dataToBeWritten.Append(";"); dataToBeWritten.Append(tempPrice); dataToBeWritten.Append(";"); dataToBeWritten.Append(0); dataToBeWritten.Append(";"); dataToBeWritten.Append(0); dataToBeWritt
- snippet minor 112d agoValidating a CSV list of contacts and convert it to JSONI've written a class that takes a file, validates the formatting of the lines from an input file and writes the set of valid lines to an output file. Each line of the file should have a first name, last name, phone number, color, and zip code. A zip code is valid if it has only 5 characters, a phone number can have only 10 digits (in addition to dashes/parentheses in appropriate places). The accepted formats of each line of the input file are the following: ``` Lastname, Firstname, (703)-742-0996, Blue, 10013 Firstname Lastname, Red, 11237, 703 955 0373 Firstname, Lastname, 10013, 646 111 0101, Green ``` The program needs to write a JSON object with all of the valid lines from the input file in a list sorted in ascending alphabetical order by (last name, first name). These are the test cases I ran with it as well as the JSON output. I think I've identified all of the edge cases with the tests but I could have missed something. This code should exemplify good design choices and extensibility and should be production quality. Should anything be added/removed from the solution to meet these requirements? Also, any tests that would make the code fail are welcome. The code for the solution is below: __main__.py ``` import sys from file_formatter import FileFormatter if __name__ == "__main__": formatter = FileFormatter(sys.argv[-1],"result.out") formatter.parse_file() ``` file_formatter.py ``` """ file_formatter module The class contained in this module validates a CSV file based on a set of internally specified accepted formats and generates a JSON file containing normalized forms of the valid lines from the CSV file. Example: The class in this module can be imported and passed an initial value for the input data file from the command line like this: $ python example_program.py name_of_data_file.in Classes: FileFormatter: Takes an input file and output its valid lines to a result file. 
""" import json class FileFormatter
- pattern minor 112d agoFootball game simulationI'm working on a text-based football simulation game along the lines of Football Simulator. Below is a subset of my total code, specifically the functions used to create a new player. I also have functions (not shown) to create a new coach, create the teams, create the weekly schedules, etc. I'm hoping to be able to use the feedback I get here to improve those sections as well. Before anyone suggests storing the data in a database, I started out that way, but ending up opting for dictionaries/lists instead for several reasons, so please try to look past that. Anyway, here goes. The biggest thing I'm struggling with is having to pass a list (`person_data`) of all the parameters needed by `create_new_player`. I don't feel it's efficient to have to build up a list before calling the function, pass it, then have to deconstruct it inside the function. I know using global variables isn't recommended, so I'm not sure if there are any other options. I have to do similar things (albeit using a list of different parameters) for my other functions. I appreciate all feedback you may have. EDIT: I made a mistake in my original post, I use player_id_index to keep track of how many players have been created, so that the next time I call `create_new_player` it starts where the previous one left off, even though it's not shown below. ``` # python3 import csv from random import choice, randint, gauss def create_names_first_data(): ''' create a list of all possible first names using text file as source data ''' first_names = [] filename_first = 'resources/names_first.txt' with open(filename_first, 'r') as file_to_open: for line in file_to_open: data = line.split() new_name = data[0] first_names.append(new_name) return first_names def create_names_last_data(): ''' create a list of all possible last names using text file as source data ''' last_names = [] filename_last = 'resources/names_la
- pattern minor 112d agoPython code to split csv into smaller csvs, not splitting IDsI have Python code that splits a given large csv into smaller csvs. This large CSV has an ID column (column 1), which consecutive entries in the csv can share. The large csv might look something like this: ``` sfsddf8sdf8, 123, -234, dfsdfe, fsefsddfe sfsddf8sdf8, 754, 464, sdfgdg, QFdgdfgdr sfsddf8sdf8, 485, 469, mgyhjd, brgfgrdfg sfsddf8sdf8, 274, -234, dnthfh, jyfhghfth sfsddf8sdf8, 954, -145, lihgyb, fthgfhthj powedfnsk93, 257, -139, sdfsfs, sdfsdfsdf powedfnsk93, 284, -126, sdgdgr, sdagssdff powedfnsk93, 257, -139, srfgfr, sdffffsss erfsfeeeeef, 978, 677, dfgdrg, ssdttnmmm etc... ``` The IDs are not sorted alphabetically in the input file, but consecutive identical IDs are grouped together. My code does not split the IDs into different csvs, ensuring that each id appears in only one output csv. My code is: ``` import pandas as pd import os def iterateIDs(file): #create chunks based on tripID csv_reader = pd.read_csv(file, iterator=True, chunksize=1, header=None) first_chunk = csv_reader.get_chunk() id = first_chunk.iloc[0,0] chunk = pd.DataFrame(first_chunk) for l in csv_reader: if id == l.iloc[0,0] or len(chunk) 100000000: #if file too big, split into seperate chunks chunk_count = 1 chunk_Iterate = iterateIDs("TripRecordsReportWaypoints.csv") for chunk in chunk_Iterate: chunk.to_csv('SmallWaypoints_{}.csv'.format(chunk_count),header=None,index=None) chunk_count = chunk_count+1 ``` However, this code runs very slowly. I tested it on a small file, 284 MB and 3.5 million rows, however it took over an hour to run. Is there any way I can achieve this result quicker? I don't mind if it's outside of python.
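One possible direction for this entry, sketched under assumptions: instead of reading the file through pandas one row at a time, stream it with the standard-library `csv` module and let `itertools.groupby` collect consecutive rows that share an ID. This is a swapped-in technique, not the author's code; the names `split_csv_by_id`, `out_pattern` and `max_rows` are hypothetical, and the ID is assumed to be the first column, as described in the entry.

```python
import csv
from itertools import groupby
from operator import itemgetter


def split_csv_by_id(src_path, out_pattern, max_rows):
    """Split src_path into smaller CSVs without splitting an ID across files.

    out_pattern is a format string such as 'SmallWaypoints_{}.csv'.
    A file may exceed max_rows, because a group is always written whole.
    Returns the list of file names written.
    """
    written = []
    out_file = None
    writer = None
    rows_in_file = 0
    try:
        with open(src_path, newline="") as src:
            rows = (row for row in csv.reader(src) if row)  # skip blank lines
            # groupby merges only *consecutive* rows with equal keys, which
            # matches input where identical IDs are adjacent but unsorted.
            for _id, group in groupby(rows, key=itemgetter(0)):
                # Roll over to a new file only at a group boundary.
                if writer is None or rows_in_file >= max_rows:
                    if out_file is not None:
                        out_file.close()
                    name = out_pattern.format(len(written) + 1)
                    out_file = open(name, "w", newline="")
                    writer = csv.writer(out_file)
                    written.append(name)
                    rows_in_file = 0
                for row in group:
                    writer.writerow(row)
                    rows_in_file += 1
    finally:
        if out_file is not None:
            out_file.close()
    return written
```

A call might look like `split_csv_by_id("TripRecordsReportWaypoints.csv", "SmallWaypoints_{}.csv", 1000000)`. The sketch holds only one row in memory at a time and avoids building a one-row DataFrame per line, which is likely where much of the original run time goes, though that has not been measured here.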
- pattern minor 112d agoRead CSV into 2D float array in GoDepending on how you count, this is my first Go program. I'm trying to read a CSV into a two-dimensional array of some numeric type, and then print it out. (I want to use this to read "edge weights" to build a Graph; that is my next mission, unrelated to the code below.) So the code below works. But particularly as I'm new to the language, I'd like to know: - Are there shorter ways to accomplish the same functionality? - Any ways to make this code more idiomatic? - float64 feels arbitrary, but Go is statically typed -- any way I can make this more dynamic, allowing other types? Here 'tis; rip her apart if you want! Trying to learn. ``` package csfloat import ( "encoding/csv" "fmt" "os" "strconv" "strings" ) // make2dFloatArray makes a new 2d array of float64s based on the // rowCount and colCount provided as arguments func make2dFloatArray(rowCount int, colCount int) [][]float64 { values := make([][]float64, rowCount) for rowIndex := range values { values[rowIndex] = make([]float64, colCount) } return values } // stringValuesToFloats converts a 2d array of strings into a 2d array // of float64s. func stringValuesToFloats(stringValues [][]string) ([][]float64, error) { values := make2dFloatArray(len(stringValues), len(stringValues[0])) for rowIndex, _ := range values { for colIndex, _ := range values[rowIndex] { var err error = nil trimString := strings.TrimSpace(stringValues[rowIndex][colIndex]) values[rowIndex][colIndex], err = strconv.ParseFloat(trimString, 64) if err != nil { fmt.Println(err) return values, err } } } return values, nil } // ReadFromCsv will read the csv file at filePath and return its // contents as a 2d array of floats func ReadFromCsv(filePath string) ([][]float64, error) { file, err := os.Open(filePath) if err != nil {
- pattern minor 112d ago
  Delete lines from a CSV file that contain fields listed in a text file

  I wrote a PowerShell script to compare words from a text file with a CSV column. If a word matches the value in that column, the line is deleted.

  ```
  $reader = [System.IO.File]::OpenText($fc_file.Text)
  try {
      for() {
          $line = $reader.ReadLine()
          if ($line -eq $null) { break }
          if ($line -eq "") { break }
          # process the line
          $fc_suchfeld = $fc_ComboBox.Text
          $tempstorage = $scriptPath + "\temp\temp.csv"
          Import-Csv $tempfile -Delimiter $delimeter -Encoding $char | where {$_.$fc_suchfeld -notmatch [regex]::Escape($line)} | Export-Csv $tempstorage -Delimiter $delimeter -Encoding $char -notypeinfo
          Remove-Item $tempfile
          Rename-Item $tempstorage $tempfile_ext
      }
  }
  finally {
      $reader.Close()
  }
  ```

  My code works, but it is very slow because it saves and copies the CSV file after every line. Is there a way to improve it?
- pattern minor 112d agoCounting weather events for visualization using GnuplotI wrote this code to analyze ~800 MB of weather data from the US. I am planning on visualizing the data with Gnuplot and Gimp. I have already made the images and a gif file. This code runs somewhat fast, although I do not suppose it is efficent. How can I improve it? The idea behind my code is: - find keywords in the lines, such as state names and their weather conditions - save the weather condition in a map according to the state and year - get the yearly weather conditions to a file named after the analyzed state main code: ``` int main() { std::string data[18] = {"stormdata_1996.csv", ... // I left out the files from her on purpose "stormdata_2013.csv"}; for(int j=0;j weather; std::ifstream fin(data[j].c_str(),std::ios::in); getWeather(weather, fin, city[i]); fin.close(); std::ofstream fout(city[i].c_str(),std::ios::app); outPut(weather, fout, city[i],j+1996); fout.close(); weather.clear(); } } return 0; } ``` functions: ``` void getWeather(std::map& weather, std::ifstream& fin, std::string& city){ std::string line; while(!fin.eof()){ getline(fin, line); if(line.find(city) != std::string::npos){ if(line.find("Drought") != std::string::npos){ weather["Drought"]++; }else if(line.find("Flood") != std::string::npos){ weather["Flood"]++; }else if(line.find("Heavy Snow") != std::string::npos){ weather["Heavy Snow"]++; ... // I left out some else if statements from here to shorten the post }else if(line.find("High Surf") != std::string::npos){ weather["High Surf"]++; } } } } void outPut(std::map& weather, std::ofstream& fout,std::string& city, int date){ fout << city << "-" << date << std::endl; for(auto i:weather){