patterncModerate
Forming word combinations from file
Viewed 0 times
filecombinationswordfromforming
Problem
I am looking for ideas to optimize the below code for speed. I'm also looking for some good improvement in execution time.
This code reads a text file. The text file has many lines of ASCII text. Then it finds words out of them (space/tab delimiter). It has to neglect any punctuation symbol in the file. Then it uses those words to form a combinations as
```
#include "stdio.h"
#include "string.h"
#include "stdlib.h"
#include "ctype.h"
main(int argc,char *argv[])
{
FILE *fin;
unsigned int i,k,j=0,m=0;
char c;
char ptr, p;
char line_buf[1500];
char word[75];
unsigned int line_no = 1;
char encrypted_pass[120];//Currently it would support MD5,DES,SHA256,SHA512(max - 104 character encrypted password)
char *result;
char pass_key[120];
char num[6] = {'0','2','4','8','\0'};
unsigned int words_found = 0,len=0;
char words[1500][8];
fin = fopen(argv[1], "rt");
if(fin == NULL)
{
//Exit if file not found
printf("Error opening input file,...exiting...\n");
exit(-1);
}
//Read lines from the file until end of file
while(!feof(fin))
{
//Initialize a index to point to current character in line buffer
i=0;
//Initialize a index to current character to be stored in word string
k=0;
//read one line at time, from the file
ptr = fgets(line_buf,1500,fin);
if(ptr == NULL)//End of file has reached,stop reading
{
break;
}
if(line_buf[i] == '\n')//End of line has reached,continue to read next line
{
continue;
}
if(line_no == 1)//This is the line containing encrypted password
{
line_no++;
//Remove any leading spaces,tabs present in the encrypted password line
while(isspace((unsigned char)(line_buf[m])))
{
This code reads a text file. The text file has many lines of ASCII text. Then it finds words out of them (space/tab delimiter). It has to neglect any punctuation symbol in the file. Then it uses those words to form a combinations as
[word1][word2] to pass to the hashing function to get a result. ```
#include "stdio.h"
#include "string.h"
#include "stdlib.h"
#include "ctype.h"
main(int argc,char *argv[])
{
FILE *fin;
unsigned int i,k,j=0,m=0;
char c;
char ptr, p;
char line_buf[1500];
char word[75];
unsigned int line_no = 1;
char encrypted_pass[120];//Currently it would support MD5,DES,SHA256,SHA512(max - 104 character encrypted password)
char *result;
char pass_key[120];
char num[6] = {'0','2','4','8','\0'};
unsigned int words_found = 0,len=0;
char words[1500][8];
fin = fopen(argv[1], "rt");
if(fin == NULL)
{
//Exit if file not found
printf("Error opening input file,...exiting...\n");
exit(-1);
}
//Read lines from the file until end of file
while(!feof(fin))
{
//Initialize a index to point to current character in line buffer
i=0;
//Initialize a index to current character to be stored in word string
k=0;
//read one line at time, from the file
ptr = fgets(line_buf,1500,fin);
if(ptr == NULL)//End of file has reached,stop reading
{
break;
}
if(line_buf[i] == '\n')//End of line has reached,continue to read next line
{
continue;
}
if(line_no == 1)//This is the line containing encrypted password
{
line_no++;
//Remove any leading spaces,tabs present in the encrypted password line
while(isspace((unsigned char)(line_buf[m])))
{
Solution
First, a note for those who might wonder why I'd help somebody who's apparently trying to build a tool to do dictionary attacks on passwords. The reason is pretty simple: first, there are tools that do the job pretty well already, so this is hardly unleashing anything new on the world. Second, this doesn't seem to allow for salt, which means it's pretty much obsolete anyway (and anybody who doesn't use salt deserves to be broken into in any case). Ultimately it comes down to this: I don't think helping out on this will cause any real reduction in security.
While I suppose I can guess at why you've crammed everything into
Other than that, your code seems to have quite a bit of redundancy. For example:
Now, let me ask a question: do you think it's possible for
If you're trying to avoid the "overhead" of calling
You then have code like:
This is quite inefficient. First it walks through the code modifying it in place, then it copies the modified data to a new location. You should be able to improve speed by combining the two:
Looking at a slightly higher level, you seem to care very little about the overall line structure of the input, and are interested primarily (exclusively?) in words. That being the case, you'd probably be better off not reading a line at a time. The result would be simpler, and probably faster as well. For example, you have a pretty big block of code to read the encrypted/hashed password and eliminate leading/trailing white space:
It appears that this entire block of code works our roughly equivalent to:
I've changed the maximum length to 119 based on the buffer length you provided. If you prefer to limit it to 34, that's pretty easy to handle as well. I'd note, however, that not only is this dramatically shorter, but probably faster as well -- for example, your code copies data you don't really want, then walks through all that data to find the length, truncates some of it, walks through to find the length again, and redundantly attempts to truncate the same data again (perhaps you didn't realize that
While I suppose I can guess at why you've crammed everything into
main, I think it's a mistake anyway. On a reasonably modern processor, the overhead of calling a function is extremely minimal (to the point of being essentially impossible to even measure, in most cases). At the same time, the size of cache (especially level 1 cache) is limited, so eliminating redundancy can and will help speed by fitting more of your code into the cache.Other than that, your code seems to have quite a bit of redundancy. For example:
if((line_buf[i] == ' ')&&(
(line_buf[i-1] != '\t')&&
(line_buf[i-1] != ' ')&&
(isalpha((unsigned char)(line_buf[i-1]))))
)Now, let me ask a question: do you think it's possible for
isalpha(x) to return true, if x=='\t' or x==' ' was true? I'm pretty sure the definition of isalpha precludes that possibility, so you can dispense with those tests. All you appear to need is: if(line_buf[i] == ' ') && isalpha((unsigned char)line_buf[i-1]))If you're trying to avoid the "overhead" of calling
isalpha, be aware that in most C libraries it's implemented as a macro that uses a table lookup, so testing for ' ' and '\t' separately is more likely to be a loss than a gain. Even when/if it's a function, modern processors reduce function call overhead to the point that the extra tests are probably a loss anyway.You then have code like:
for (p = word; *p != '\0'; p++)
{
*p = (char) tolower(*p);
}
strcpy(words[j],word);This is quite inefficient. First it walks through the code modifying it in place, then it copies the modified data to a new location. You should be able to improve speed by combining the two:
for (i=0; p[i]; i++)
words[j][i] = tolower(p[i]);
words[j][i] = '\0';Looking at a slightly higher level, you seem to care very little about the overall line structure of the input, and are interested primarily (exclusively?) in words. That being the case, you'd probably be better off not reading a line at a time. The result would be simpler, and probably faster as well. For example, you have a pretty big block of code to read the encrypted/hashed password and eliminate leading/trailing white space:
if(line_no == 1)//This is the line containing encrypted password
{
line_no++;
//Remove any leading spaces,tabs present in the encrypted password line
while(isspace((unsigned char)(line_buf[m])))
{
m++;
}
strcpy(encrypted_pass,&line_buf[m]);
//Remove any trailing spaces,tabs present in the encrypted password line
len = strlen(encrypted_pass);
while ((len > 0) && (isspace((unsigned char)encrypted_pass[len - 1])))
{
encrypted_pass[--len] = 0;
}
len = strlen(encrypted_pass);
if(len == 15)//DES-based algorithm where encryped password is 13 characters
{
if(encrypted_pass[len-2] == '\r')
{
encrypted_pass[len-2] = '\0';//for \r\n newline character
}
}
else if(len == 14)//DES-based algorithm where encryped password is 13 characters
{
if(encrypted_pass[len-1] == '\n')
{
encrypted_pass[len-1] = '\0';//for \n newline character
}
}
else if(len == 36)//MD5-based algorithm where encryped password is 34 characters
{
if(encrypted_pass[len-2] == '\r')
{
encrypted_pass[len-2] = '\0';//for \r\n newline character
}
}
else if(len == 35)//MD5-based algorithm where encryped password is 34 characters
{
if(encrypted_pass[len-1] == '\n')
{
encrypted_pass[len-1] = '\0';//for \n newline character
}
}It appears that this entire block of code works our roughly equivalent to:
fscanf(" %119[^ \r\n]", encrypted_pass);I've changed the maximum length to 119 based on the buffer length you provided. If you prefer to limit it to 34, that's pretty easy to handle as well. I'd note, however, that not only is this dramatically shorter, but probably faster as well -- for example, your code copies data you don't really want, then walks through all that data to find the length, truncates some of it, walks through to find the length again, and redundantly attempts to truncate the same data again (perhaps you didn't realize that
'\r' and '\n' are classified as white space, at least in the "C" locale?) Using fscanf, you read (only) the datCode Snippets
if((line_buf[i] == ' ')&&(
(line_buf[i-1] != '\t')&&
(line_buf[i-1] != ' ')&&
(isalpha((unsigned char)(line_buf[i-1]))))
)if(line_buf[i] == ' ') && isalpha((unsigned char)line_buf[i-1]))for (p = word; *p != '\0'; p++)
{
*p = (char) tolower(*p);
}
strcpy(words[j],word);for (i=0; p[i]; i++)
words[j][i] = tolower(p[i]);
words[j][i] = '\0';if(line_no == 1)//This is the line containing encrypted password
{
line_no++;
//Remove any leading spaces,tabs present in the encrypted password line
while(isspace((unsigned char)(line_buf[m])))
{
m++;
}
strcpy(encrypted_pass,&line_buf[m]);
//Remove any trailing spaces,tabs present in the encrypted password line
len = strlen(encrypted_pass);
while ((len > 0) && (isspace((unsigned char)encrypted_pass[len - 1])))
{
encrypted_pass[--len] = 0;
}
len = strlen(encrypted_pass);
if(len == 15)//DES-based algorithm where encryped password is 13 characters
{
if(encrypted_pass[len-2] == '\r')
{
encrypted_pass[len-2] = '\0';//for \r\n newline character
}
}
else if(len == 14)//DES-based algorithm where encryped password is 13 characters
{
if(encrypted_pass[len-1] == '\n')
{
encrypted_pass[len-1] = '\0';//for \n newline character
}
}
else if(len == 36)//MD5-based algorithm where encryped password is 34 characters
{
if(encrypted_pass[len-2] == '\r')
{
encrypted_pass[len-2] = '\0';//for \r\n newline character
}
}
else if(len == 35)//MD5-based algorithm where encryped password is 34 characters
{
if(encrypted_pass[len-1] == '\n')
{
encrypted_pass[len-1] = '\0';//for \n newline character
}
}Context
StackExchange Code Review Q#3525, answer score: 11
Revisions (0)
No revisions yet.