Counting letters as quickly as possible
Problem
I received a task: take a known text file (the Linux dictionary), use threads to count the occurrences of the different letters in it, and present the results in an array.
The code doesn't have to be pretty, elegant, clean or short. The only requirement is to count the letters correctly and to do it as fast as possible. The program will be timed with the time command, so every millisecond counts.
I know using a single thread may prove to be more efficient, but multithreading is a requirement here...
What I did so far:
- memory mapped the file so I can work with it from memory.
- used 4 threads so each of the 4 cores can do part of the job.
- avoided race conditions by design (each thread has its own counter array), so no locks are needed and no time is spent on them.
- tried to avoid calculations and unnecessary function calls.
- tried to get higher priority for my process.
- used register on variables whenever possible.
I am compiling the file with `gcc count.c -o count -lpthread -Ofast` and running it as sudo su, hoping it will get higher priority.
I will be very happy if you could review my code and tell me where you think I can improve the speed (no matter how negligible you think it is).
Also, if you think something I did is pointless, please feel free to throw it in my face as well.
```
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* declare counter arrays for the threads - global is quicker */
int counter[256];
int counter2[256];
int counter3[256];
int counter4[256];

/* declare a global pointer for the mapped file */
unsigned char* mapped_file;

/* declare variables for thread ids */
pthread_t pthread_id_1;
pthread_t pthread_id_2;
pthread_t pthread_id_3;
pthread_t pthread_id_4;

/* there are 4 functions that are declared independently, for each thread.
   it can be smarter, but we are looking for quick, not smart */
void *thread_func1(void *NotInUse)
{
    /* using register to quicken processing.
       234712 is a quarter of the size of the file */
    register int i;
    for (i = 0; i < 234712; i++)
        counter[mapped_file[i]]++;
    return NULL;
}
```
Solution
Disclaimer: only actual profiling can give a definitive answer about performance.
That said, there are no miracles; the file has to be processed in an effectively sequential manner: every byte from the disk must be shipped into RAM, added up, and forgotten. The bottleneck is disk IO anyway. I have a gut feeling that memory mapping would create more problems than it solves, due to the multitude of page faults it incurs.
I would rather go for double (or triple, or 4-way, or as much as needed) buffering, with one thread doing the prefetch and one thread doing the calculations.
Context
StackExchange Code Review Q#77268, answer score: 3