32-bit checksum of a file

Submitted by: @import:stackexchange-codereview··

Problem

I have implemented a function that produces a 32-bit checksum of a file using the following method: checksum = word_1 + word_2 + ... + word_n, where word_i is the i-th 32-bit word of the file.

Here are the questions I'm most interested in:

  • Is the way I read the file word by word correct, or is there a better way? (I want to avoid reading the whole file at once because it can be very large.)



  • Are there any problems with the chosen data types, such as uint32_t, unsigned and so on?



  • Am I handling a file whose size isn't a multiple of 4 bytes correctly? For example, for a 7-byte file I set bits 0-7 to 0 to avoid picking up an accidental value. Or should I set bits 24-31 to 0 instead?



Here is the code I have so far:

void execute(std::ifstream& file) 
{
    const size_t WORD_SIZE = sizeof(uint32_t);

    file.seekg(0, ios::end);
    auto sizeInBytes = file.tellg();

    file.seekg(0);
    uint32_t checksum = 0U; // ???
    if(auto sizeInEntireWords = sizeInBytes / WORD_SIZE)
    {
        for(int i = 0; i < sizeInEntireWords; i++)
        {
            uint32_t word;
            file.read(reinterpret_cast<char*>(&word), WORD_SIZE); // ???
            checksum += word;
        }
    }

    if(auto additionalSizeInBytes = sizeInBytes % WORD_SIZE)
    {
        uint32_t word;
        file.read(reinterpret_cast<char*>(&word), WORD_SIZE);
        word &= (~0U << (WORD_SIZE - additionalSizeInBytes * 8)); // ???
        checksum += word;
    }

    cout << checksum << endl;
}

Solution

First of all, your checksum function should probably be named checksum and actually return the checksum, rather than simply printing it.

Now, you're not taking advantage of what the std::basic_ifstream API actually gives you. First, read() returns a basic_istream&, which is convertible to bool. That bool tells you if the complete read succeeded or not. So all you have to do is:

uint32_t checksum(std::ifstream& file) 
{
    uint32_t sum = 0;

    uint32_t word = 0;
    while (file.read(reinterpret_cast<char*>(&word), sizeof(word))) {
        sum += word;
    }

    // ??


Now, when read() fails, that means our file has run out (assuming we had a valid file to begin with). But we have more information to work with. There is also gcount(), which:

Returns the number of characters extracted by the last unformatted input operation.
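
To see how read() and gcount() interact, here is a minimal sketch that uses a std::istringstream as a stand-in for a file. The helper names count_reads and demo_seven_bytes are made up for this illustration and are not part of the code under review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper, for illustration only: returns
// {number of complete 4-byte reads, bytes extracted by the final failed read}.
std::pair<std::size_t, std::streamsize> count_reads(std::istream& in)
{
    std::uint32_t word = 0;
    std::size_t full_words = 0;

    // read() returns the stream, which converts to false once fewer than
    // sizeof(word) bytes could be extracted.
    while (in.read(reinterpret_cast<char*>(&word), sizeof(word))) {
        ++full_words;
    }

    // A failed read always extracted fewer bytes than were requested.
    assert(in.gcount() < static_cast<std::streamsize>(sizeof(word)));

    // gcount() still reports what that last, failed read managed to extract.
    return {full_words, in.gcount()};
}

// Example input: 7 bytes, so one full word followed by a short read.
std::pair<std::size_t, std::streamsize> demo_seven_bytes()
{
    std::istringstream in("abcdefg");
    return count_reads(in);
}
```

For the 7-byte input this yields one full word and a gcount() of 3. An 8-byte input gives two full words and a gcount() of 0, because the third read extracts nothing.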

If the read partially succeeded, we can mask off the other bytes and add the remainder. Thus, the full solution might be:

uint32_t checksum(std::ifstream& file) 
{
    uint32_t sum = 0;

    uint32_t word = 0;
    while (file.read(reinterpret_cast<char*>(&word), sizeof(word))) {
        sum += word;
    }

    if (file.gcount()) {
        word &= (~0U >> ((sizeof(uint32_t) - file.gcount()) * 8));
        sum += word;
    }

    return sum;
}
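
The mask in that version keeps the low-order bytes of word. That matches where a partial read lands on a little-endian host, which is an assumption here: on a big-endian machine the bytes read would occupy the high-order bits instead. A small sketch of the calculation in isolation, with tail_mask as a made-up name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: mask that keeps the lowest `bytes_read` bytes of a
// 32-bit word. Assumes a little-endian host, where the first byte read from
// the file becomes the least significant byte of the word.
std::uint32_t tail_mask(std::size_t bytes_read)
{
    assert(bytes_read >= 1 && bytes_read <= sizeof(std::uint32_t));

    // Subtract first, then convert bytes to bits. The shift is 0..24 bits,
    // which is well defined for a 32-bit operand.
    const unsigned shift =
        static_cast<unsigned>((sizeof(std::uint32_t) - bytes_read) * 8);
    return UINT32_MAX >> shift;
}
```

With 3 bytes read, tail_mask(3) is 0x00FFFFFF, so the stale top byte is cleared. Note the parentheses around the subtraction: multiplication binds tighter than subtraction, so an expression like WORD_SIZE - n * 8 computes something quite different.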


Doing the masking looks kind of terrible though, so instead we could simply zero out word every time so we can just add the result:

uint32_t checksum(std::ifstream& file) 
{
    uint32_t sum = 0;

    uint32_t word = 0;
    while (file.read(reinterpret_cast<char*>(&word), sizeof(word))) {
        sum += word;
        word = 0;
    }

    sum += word; // add the last word, could be 0
                 // if the file size is divisible by 4

    return sum;
}
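
As a usage example, here is a hedged sketch of the same loop written against std::istream instead of std::ifstream, so it can be exercised with an in-memory stream. That generalization and the names checksum_stream, checksum_file and checksum_self_check are mine, not part of the answer above. A real file should be opened with std::ios::binary so that no newline translation happens on platforms that do it.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Same zero-the-word loop, but accepting any std::istream.
std::uint32_t checksum_stream(std::istream& in)
{
    std::uint32_t sum = 0;
    std::uint32_t word = 0;
    while (in.read(reinterpret_cast<char*>(&word), sizeof(word))) {
        sum += word;  // unsigned arithmetic wraps modulo 2^32
        word = 0;     // so a short final read leaves the unread bytes zero
    }
    sum += word;      // partial last word, or 0 if nothing was left
    return sum;
}

// Hypothetical convenience wrapper: open the file in binary mode and sum it.
// Returns 0 if the file cannot be opened (a simplification for this sketch).
std::uint32_t checksum_file(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    return checksum_stream(file);
}

// Small self-check: a 7-byte input sums the same as its zero-padded
// 8-byte form, which is exactly what zeroing `word` is meant to achieve.
void checksum_self_check()
{
    std::istringstream seven(std::string("AAAABBB"));
    std::istringstream eight(std::string("AAAABBB\0", 8));
    assert(checksum_stream(seven) == checksum_stream(eight));
}
```

Because an unopened or empty stream fails the first read with word still 0, both cases simply produce a checksum of 0. Whether that is acceptable or should be reported as an error depends on the caller.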


Context

StackExchange Code Review Q#104948, answer score: 5
