Recent Entries 10
- pattern minor 112d ago
  C program that recovers lost JPEG files

  My overall strategy is to load one block (512 bytes) into memory and check whether the block's first 4 bytes match a JPEG's first 4 bytes. If they do, I open a new output file and start writing the bytes into it. I repeat this for every "lost" JPEG. You can assume that the "lost" JPEGs are stored consecutively.

  ```
  #include <stdio.h>

  int main( int argc, char *argv[] )
  {
      //check for proper usage
      if ( argc != 2 )
      {
          fprintf(stderr, "Proper usage: ./recover file\n" );
          return 1;
      }

      //Remember name of file
      char *infile = argv[1];

      //open infile (binary mode, since the data is raw bytes)
      FILE *inptr = fopen( infile, "rb");
      if ( inptr == NULL )
      {
          fprintf(stderr, "Could not open %s, please try again\n", infile );
          return 2;
      }

      //INIT block size and buffer
      int blockSize = 512;
      unsigned char buffer[blockSize];

      //Storage for file name and file
      char fileBuffer[8];
      FILE *outptr = NULL;

      //count how many images have been found
      int imageCount = 0;

      //iterate over the infile 512 bytes at a time
      while ( fread( buffer, blockSize, 1, inptr ) == 1 )
      {
          //look for jpeg header
          if ( buffer[0] == 0xff && buffer[1] == 0xd8 && buffer[2] == 0xff && (buffer[3] & 0xf0) == 0xe0 )
          {
              //close output file if its open
              if( outptr != NULL)
              {
                  fclose(outptr);
              }

              //create the new file name and open that file
              sprintf( fileBuffer, "%03d.jpg", imageCount );
              outptr = fopen( fileBuffer, "wb" );
              if ( outptr == NULL )
              {
                  fprintf( stderr, "Could not create %s\n", fileBuffer );
                  return 3;
              }

              //INCREMENT Image count
              imageCount++;
          }

          //write to out file if an image was found
          if( outptr != NULL )
          {
              fwrite( buffer, blockSize, 1, outptr);
          }
      }

      //close last photo, if any photo was found
      if ( outptr != NULL )
      {
          fclose(outptr);
      }

      //close input file
      fclose(inptr);

      //success
      return 0;
  }
  ```
- pattern minor 112d ago
  Adding a new node in an AVL tree

  I was working on a class assignment today and wrote this code, which works perfectly fine. It does seem to have too many `if` statements, though, and I think it could be written much better than this. (I'm still a beginner in C.) Can you suggest any techniques I could use or learn to avoid so many `if` statements, and how would you improve this code without making it much longer or more complex? The code is part of a much bigger piece of code that adds a new node to an AVL tree.

  ```
  if (avlt->root == NULL) {
      new = &(avlt->root);
      parent = NULL;
  }
  if (avlt->root != NULL) {
      parent = avlt->root;
      while (1) {
          if (parent->value > value) {
              if (parent->left)
                  parent = parent->left;
              else {
                  new = &(parent->left);
                  break;
              }
          }
          else if (parent->value < value) {
              if (parent->right)
                  parent = parent->right;
              else {
                  new = &(parent->right);
                  break;
              }
          }
          else
              return;
      }
  }
  ```

  These are the structs, in case they help you understand the code:

  ```
  struct AVLNode {
      struct AVLNode* left;
      struct AVLNode* right;
      struct AVLNode* parent;
      int value;
      int height;
  };

  struct AVLTree {
      struct AVLNode* root;
      int numberOfNodes;
  };

  typedef struct AVLNode AVLNode;
  typedef struct AVLTree AVLTree;
  ```
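One common way to cut down the branching in a search like the one above is to walk a pointer to the current link (`struct AVLNode **`) rather than a pointer to the current node. The empty-tree case and the left/right cases then all end in the same place. The following is a minimal sketch under assumptions: `find_insert_slot` is a hypothetical helper name that does not appear in the original code, and it only locates the slot; allocation and AVL rebalancing are left out.

```c
#include <assert.h>
#include <stddef.h>

struct AVLNode {
    struct AVLNode *left;
    struct AVLNode *right;
    struct AVLNode *parent;
    int value;
    int height;
};

struct AVLTree {
    struct AVLNode *root;
    int numberOfNodes;
};

/* Hypothetical helper (not from the original post).
 * Returns the address of the link (root, left or right) where a node
 * holding `value` should be attached, or NULL if `value` is already in
 * the tree. On success, *parent_out is set to the node that owns the
 * link, or to NULL when the link is the tree's root. On a duplicate,
 * *parent_out is left untouched. */
struct AVLNode **find_insert_slot(struct AVLTree *avlt, int value,
                                  struct AVLNode **parent_out)
{
    assert(avlt != NULL && parent_out != NULL);

    struct AVLNode **link = &avlt->root; /* link currently being examined */
    struct AVLNode *parent = NULL;       /* node that owns that link */

    while (*link != NULL) {
        parent = *link;
        if (value < parent->value)
            link = &parent->left;
        else if (value > parent->value)
            link = &parent->right;
        else
            return NULL;                 /* value already present */
    }

    *parent_out = parent;
    return link;
}
```

A caller would allocate the new node, set its `parent` field from `parent_out`, and attach it by storing through the returned pointer (`*slot = node`).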
- snippet minor 112d ago
  In-place sorting of a binary file using heapsort

  I have been working on sorting a file in place with heapsort, and I'm very glad it is working. However, it takes far too long, so I am wondering how I could improve it.

  Here are my requirements:

  - The file consists of 14-byte index entries. Each entry consists of 8 bytes of the actual index and 6 bytes of offset. I want to sort the entries by the first 8 bytes. The remaining 6 bytes need to be carried along but are not used for sorting.
  - The sorting has to occur in place. No additional files may be created.
  - I need to be able to tell the progress. I display a progress bar while the sort runs.
  - Some amount of RAM may be used to speed up sorting. The amount is passed as a parameter.
  - You can parallelize the sorting, but there is a limit on the number of threads you can create, which is also passed as a parameter if you need it.
  - The sorting does not need to be stable.
  - You may use a different sorting algorithm.

  Before I show you my current code, I want to add a few notes:

  The reason I chose heapsort over quicksort is that I want to have a progress bar. I ask you not to debate this requirement.

  The files may be a lot larger than the available RAM, which is why you can't load a file entirely into RAM.

  The first 8 bytes of each entry are pretty much random and evenly distributed.

  My current approach consists of loading the first index entries into RAM until I reach the limit, and writing that data back into the file after the sorting finishes.

  Should you find any other issues, such as bad code style, feel free to correct me too.

  I use CSI escape sequences to display the progress bar. If you are not familiar with them, just leave that code as it is. It works as intended anyway, and I only included it for the sake of completeness.
sortidx.h ``` #ifndef SORTIDX_H #define SORTIDX_H // Includes #include #include #include #include #include #include #include #include "util.h" // Functions // Returns the parent index. constexpr size_t
- pattern moderate 112d ago
  Reading a repetitive file with repetitive code

  I wrote this code to extract some data from a .txt file. The text file is a large list of temperature data in a repetitive format, with some lines at the beginning holding the station reference. I take the first ten lines only once, then jump to line 12 to take the headers. After that, a marker tells me when a new year of records begins; I then jump to line 13 and write lines 13 to 30.

  Is there a Pythonic way to do this, without a lot of `next(f)` calls?

  ```
  from itertools import islice

  with open(rutaEscribir,'w') as w:
      with open(rutaLeer) as f:
          for line in islice(f,1,10):
              w.write(line)
          f.seek(0)
          for line in islice(f,12,13):
              w.write(line)
          f.seek(0)
          ano = 1977
          for line in f:
              sena = '&'
              if sena in line:
                  print('Found')
                  print('Año: ',ano)
                  next(f)
                  next(f)
                  next(f)
                  next(f)
                  next(f)
                  next(f)
                  next(f)
                  next(f)
                  next(f)
                  next(f)
                  next(f)
                  next(f)
                  next(f)
                  for n in islice(f,0,31):
                      w.write(n)
                  ano += 1
  ```

  Here is an excerpt from the file:

  ```
  & I D E A M - INSTITUTO DE HIDROLOGIA, METEOROLOGIA Y ESTUDIOS AMBIENTALES SISTEMA DE INFORMACION VALORES TOTALES DIARIOS DE RECORRIDO DEL VIENTO (Kms) NACIONAL AMBIENTAL FECHA DE PROCESO : 2016/09/08 ANO 1986 ESTACION : 26185020 MESOPOTAMIA LATITUD 0553 N TIPO EST CO DEPTO ANTIOQUIA FECHA-INSTALACION 1970-JUN LONGITUD 7519 W
  ```
- pattern minor 112d ago
  Regex Matching a Naming Convention

  Program Purpose

  So, I have a naming convention for certain folders. I want to take in a folder name, and determine if it conforms to the convention.

  Naming Convention

  The convention (case insensitive) can be as simple as "Surname, Firstname".

  It could be as complicated as "Surname (meta), Firstname (meta) & Firstname (meta) ; Surname (meta), Firstname (meta) & Firstname (meta)".

  It is broken down like so:

  - A name is made up of a `[Surname]` and 1 or 2 `[Firstnames]`.
  - Each `[Surname]` and `[Firstname]` can have an optional `[ (metadata)]` after it.
  - If there are 2 `[Firstnames]`, they are separated by `[ & ]`.
  - A name can, optionally, have a second set of `[Surname]` & `[Firstnames]`, separated from the first set by `[ ; ]`.

  As part of a larger program, I have a class object which handles information relating to a folder. When a folder name is passed to the class, it validates the naming convention. It currently does this via regex, but I find regex to be an incredible source of bugs and unmaintainable code. So, is there a better way?

  Program Flow

  - Receive folder name
  - Copy folder name
  - Regex Match/Replace the copy with `vbNullString`
  - If the copy is now `vbNullString`, the whole string matched and is valid

  Validation Code

  ```
  Private Sub AddNamesFromClientFolder(ByVal ClientFolderName As String)
      '/ Copy folder name
      '/ Replace copy's regex matching with null string
      '/ If the copy is now a null string, the whole name matched and is valid

      '/ Client Folder names should be of the form:
      '/ "[Surname] ( [misc] ), [Firstname] ( [Misc] ) & [Firstname] ( [Misc] ) ; [Other Surname] ( [Misc] ), [Other Firstname] ( [Misc] ) & [Other Firstname] ( [Misc] )"
      '/
      '/ With minimum form:
      '/ "[Surname], [Firstname]"

      Dim IsValid As Boolean

      If Len(ClientFolderName) > 0 Then

          Dim validationRegex As RegExp
          Set validationRegex = New RegExp
          With validationRegex
              .Global =
  ```
- pattern minor 112d ago
  Validating FileSystem Structure

  I have a File System. It is *supposed* to be laid out / used / added to in certain ways. This is a program to report on the *actual* state of the file system versus what it's supposed to be. In particular, it should pick out unexpected folders and (eventually) validate that Client Folder Names follow a particular convention.

  Expected File Structure:

  [Drives] -> [Root folders] -> [Adviser Folders] -> [Type Of Business Folders] -> [Client Folders]

  Components:

  - `GetRootDrives()`: Dictionary of expected Drives (currently 1)
  - `GetRootFolderNames()`: Dictionary of expected RootFolders (currently 1)
  - `GetAdviserFolderNames()`: Dictionary of expected Adviser Folders
  - `GetBusinessTypeFolderNames()`: Dictionary of expected Business Type Folders

  Code for the above not included.

  - `GetDirectoryMap()`: Returns a list of `CLS_Client_Folder_Properties` objects. One for every unexpected folder. One for every Client Folder.

  Code for `CLS_Client_Folder_Properties` not included.

  Program Flow:

  - Retrieve lists of expected Drives/Folders
  - Iterate through folders
    - If the folder is not in the relevant list, create a partial folder_properties object and add to return list
    - If the folder is in the relevant list, iterate through the Sub Folders
  - Once we get to a folder expected to contain client files, iterate over each sub folder, creating a folder_properties object for each and add to return list
  - Return the list

  Concerns

  This feels very hacky. It's a 6-level nested For/If Loop. There must be a better way.

  Code

  ```
  Option Explicit

  Public Function GetLuminDirectoryMap() As Variant
      '/ All directories should be stored in the form "[Directory Name][Delimiter]" E.G. "SomeDirectory\"

      '/ Assumed Directory Structure: [Drives] ->
      '/ [Root Directories] ->
      '/ [Adviser Directories] ->
      '/ [Type of Business Directories] ->
      '
  ```
- snippet minor 112d ago
  Reading binary files in XTF format

  I have a few thousand binary files, each containing a series of structs, where every struct is followed by some number of bytes (the exact number is given in one of the struct's fields). I read data from the binary files into a struct and assign variables from certain fields of those structs. I was wondering whether there is any significant improvement I could make to speed things up, specifically regarding the binary file reading.

  Note: I'm reading 256 bytes at a time into a struct, and that struct contains a number that says how many bytes follow until the next struct. So there isn't a static pattern to the data that I can follow.

  ```
  const int STRUCT_HEADER_SIZE = 256; //256 bytes
  const int FILE_HEADER_SIZE = 1024; //1024 bytes

  [StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi)]
  public struct STRUCTHEADER
  {
      //Sample fields
      public ushort MagicNumber;
      public byte SubChannelNumber;
      public ushort NumChansToFollow;
      [MarshalAs(UnmanagedType.ByValArray, SizeConst = 2)]
      public ushort[] Reserved1;
      public int NumBytesThisRecord; //This value says how many bytes there are total in the current 'packet' which includes this struct header.
      public ushort Year;
      public byte Month;
      public byte Day;
      public byte Hour;
  }

  FileStream stream1;
  STRUCTHEADER testStruct = new STRUCTHEADER();
  List<string> filePaths = new List<string>();

  foreach (string filePath in filePaths) //for each binary file
  {
      ReadBinaryFile(filePath); //Read the binary file
  }

  public void ReadBinaryFile(string filePath)
  {
      try
      {
          stream1 = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.None);
      }
      catch(Exception ex)
      {
      }

      try
      {
          stream1.Position = FILE_HEADER_SIZE; //Start stream after the file header
          while (stream1.Position (stream1); //read data from binary file into STRUCTHEADER type struct
          //assigning fields here
          //ex: int year = testStruct.year;
          str
  ```
- pattern minor 112d ago
  Extract PNGs embedded in a file

  My approach was to find each occurrence of the PNG file signature followed by its end of file (EOF) marker and write the bytes between them to a new file whose name is simply a counter starting at zero.

  ```
  use std::env;
  use std::fs::File;
  use std::io::prelude::*;

  const SIGNATURE: [u8; 8] = [137, 80, 78, 71, 13, 10, 26, 10];
  const EOF: [u8; 8] = [73, 69, 78, 68, 174, 66, 96, 130];

  fn main() {
      let path = env::args().nth(1).unwrap();
      let mut f = File::open(path).unwrap();
      let mut buf = Vec::new();
      f.read_to_end(&mut buf).unwrap();

      let mut iter = buf.iter();
      let mut i = 0;

      while let Some(start) = iter.rposition(|&x| x == SIGNATURE[0]) {
          if buf[start..].starts_with(&SIGNATURE) {
              let mut iter = buf[start..].iter();
              let mut end = iter.position(|&x| x == 130).unwrap();

              loop {
                  let slice = &buf[start..start + end + 1];

                  if slice.ends_with(&EOF) {
                      let mut f = File::create(i.to_string() + ".png").unwrap();
                      f.write_all(slice).unwrap();
                      i += 1;
                      break;
                  }

                  end += match iter.position(|&x| x == 130) {
                      Some(x) => { x + 1 },
                      None => break,
                  };
              }
          }
      }
  }
  ```

  The only reason I used `rposition` instead of `position` for `start` is that the file I was working with had all its PNGs embedded at the end.

  I attempted to replace some of the `unwrap`s with `try!`s, but realized that the macro needs to be called from a function that returns a `Result`, i.e. it can't be used in `fn main()`, and I couldn't figure out how best to break up the code.

  I was wondering whether there might be a better way to approach the problem. If not, how can I at least clean up this implementation?
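The scan described above, signature first and then the `IEND` trailer, can also be expressed as a plain forward search over the buffer. The sketch below is written in C rather than Rust and is not the post's code: `find_bytes` and `next_png` are hypothetical names, and it assumes the whole file is already in memory and that each embedded PNG ends at the first trailer following its signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same byte values as SIGNATURE and EOF in the post. */
static const unsigned char PNG_SIGNATURE[8] = {137, 80, 78, 71, 13, 10, 26, 10};
static const unsigned char PNG_TRAILER[8]   = {73, 69, 78, 68, 174, 66, 96, 130};

/* Returns the offset of the first occurrence of `pat` at or after `from`,
 * or `len` if there is none. */
static size_t find_bytes(const unsigned char *buf, size_t len, size_t from,
                         const unsigned char *pat, size_t pat_len)
{
    if (pat_len == 0 || len < pat_len)
        return len;
    for (size_t i = from; i + pat_len <= len; i++) {
        if (memcmp(buf + i, pat, pat_len) == 0)
            return i;
    }
    return len;
}

/* Finds the next embedded PNG at or after *pos.
 * On success stores its offset in *start and its size in *png_len,
 * advances *pos past the trailer, and returns 1. Returns 0 when no
 * complete PNG remains. */
int next_png(const unsigned char *buf, size_t len, size_t *pos,
             size_t *start, size_t *png_len)
{
    assert(buf != NULL && pos != NULL && start != NULL && png_len != NULL);

    size_t s = find_bytes(buf, len, *pos, PNG_SIGNATURE, sizeof PNG_SIGNATURE);
    if (s == len)
        return 0;

    size_t e = find_bytes(buf, len, s + sizeof PNG_SIGNATURE,
                          PNG_TRAILER, sizeof PNG_TRAILER);
    if (e == len)
        return 0;

    *start = s;
    *png_len = e + sizeof PNG_TRAILER - s;
    *pos = e + sizeof PNG_TRAILER;
    return 1;
}
```

Calling `next_png` in a loop until it returns 0 yields each image's offset and length in file order, which a caller could then write out under a counter-based file name as in the post.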
- pattern minor 112d ago
  Parsing 4 Million Filenames

  This is the next step in my project to query Companies House records. The first step, retrieving and validating the company numbers to be targeted, was covered here and I will include that code at the end for context.

  In this stage, `GetTargetFilenames`, I need to parse the entire series of Corporate Filings (typically 100,000 per month, going back 30 months at this point) and, for each:

  - Extract the company number from the filename
  - Check the company number against the ones I am targeting
  - If it is being targeted, add it to a `Dictionary` of target filenames

  With this project, I am trying to take particular care with regard to future maintainability (naming, commenting etc.), so any critiques of that aspect would be especially welcome.

  ```
  Option Explicit

  Public Const COMPANY_NUMBER_COLUMN As Long = 1
  Public Const parentFolderPath As String = "S:\Investments\Data\Companies House\Monthly Companies House Downloads\"

  Public Sub ParseAllCompanyRecords()

      '/ Data Structure: "Company Numbers", once input, will be stored as strings
      '/ Company Number: 8-character string, generally 8-digits but sometimes with text prefixes E.G. "OC374102"
      '/ Folder Path for monthly CH downloads: "S:\Investments\Data\Companies House\Monthly Companies House Downloads\"
      '/ Filename Structure of a Monthly Folder: [parentFolderPath]"Accounts_Monthly_Data-"[Full Month Name][yyyy]"\" - Square Brackets not in filename
      '/ Filename Structure of an individual filing: [Monthly Folder Path]"Prod224_"[4-character code]"_"[8-character Company Registration Number]"_"[yyyymmdd][.html OR .xml] - Square Brackets not in filename

      Dim targetCompanyNumbers As Dictionary
      Set targetCompanyNumbers = GetTargetCompanyNumbers

      Dim targetFilenames As Dictionary
      Set targetFilenames = GetTargetFilenames(targetCompanyNumbers)

  End Sub

  Public Function GetTargetFilenames(ByRef targetCompanyNumbers As Dictionary) As Dictionary

      '/ Folder Path for monthly CH downloads: "S:\I
  ```
- pattern minor 112d ago
  Pulling PE32 header info

  Context Info

  I coded up a program that maps an executable file (mainly .exe and .dll) into the program's memory space, which allows for easier extraction of the PE header info. I extract the information by simply casting a structure of a certain header to a memory location of the mapped file.

  What I'm asking for

  Readability and general structure. Some of the naming is horrible and I don't know which is the right alternative. The main function looks like it's all over the place for some reason and feels very hard to read. And anything else of course; I'm sure there are tons of bad things that I'm not aware of.

  pe32inf.c

  ```
  #include
  #include
  #include "pe32inf.h"

  void Terminate(const char *s);
  LPCTSTR DecodeInput(int argc, char *argv[]);
  LPVOID Map(LPCTSTR lpFileName);
  MZ_DOS SetDOSheader(LPVOID lpFileBase);
  COFF SetCOFFheader(LPVOID lpCOFFoffset);
  SectionTable SetSectionTable(LPVOID SectionTableOffset, int NumberOfSections);

  int main(int argc, char *argv[]) {
      LPCTSTR lpFileName;
      LPVOID lpFileBase;
      MZ_DOS DOSheader; //Naming issues with the headers.
      COFF COFFheader;
      OptionalHeader OPTheader; //I tried to get around the 2 declarations because 1 is going to be unused.
      OptionalHeader64 OPTheader64; //However, I can only think of malloc and then there's inconsistency since this will be a PTR.
      SectionTable SECtable;

      lpFileName = DecodeInput(argc, argv);
      lpFileBase = Map(lpFileName);
      DOSheader = SetDOSheader(lpFileBase);
      COFFheader = SetCOFFheader(lpFileBase + DOSheader.pe_offset + 0x4); //0x4 To skip PE sig.

      LPVOID lpOptionalHeader = lpFileBase + DOSheader.pe_offset + 0x4 + sizeof(COFF);
      WORD magic = *(WORD*)lpOptionalHeader;

      if (magic == 0x10b) { //PE32
          OPTheader = *(OptionalHeader*)lpOptionalHeader;
      } else if (magic == 0x20b) { //PE32+
          OPTheader64 = *(OptionalHeader64*)lpOptionalHeader;
      } else {
          Terminate("Unknown PE magic.");
      }

      LPVOID SectionTableOffset = lpOptionalHea
  ```