HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Analyzer for compression algorithms (or tools)

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
compressionalgorithmsfortoolsanalyzer

Problem

This file contains the main functions of my tool, rac. In particular, I care a lot about the Shannon function.

Would you please give some hints to make the file cleaner? I have a perversion for concise and good indented code.

#include 
#include 
#include "headers/utils.h"

double ccount(char * foo)
{
  FILE * file = fopen(foo, "r");
  int a = 0, b;
  while ((b = getc(file)) != EOF)
    a++;

  fclose(file);
  return a;
}

int cmp(double foo, double bar)
{
  if (foo >= bar)
    return 1;
  else
    return 0;
}

double ratio(double foo, double bar)
{
  return (100*(foo - bar)/ foo);
}

void stat(char * file, double vector[256])
  {
    int c;
    FILE * ftopen = fopen(file, "r");
    while ((c = getc(ftopen)) != EOF)
      {
        if (c >= 0 && c <= 256) 
            vector[c]++;  
      }

      fclose(ftopen);
  }

void freq(char * foo, double vector[256])
{
  int i;
  for (i = 0; i < 256; i++)
  {
    if (vector[i] == 0)
      continue;
    else
      {
        vector[i] = (vector[i] / ccount(foo));
      }
  }
}

double shannon(double vector[256])
{
  int i;
  double shan = 0.0;
  for (i = 0; i < 256; i++)
  {
    if (vector[i] == 0.0)
      continue;
    else
      {
        shan = shan + (log2(pow(vector[i], vector[i]))); 
      }
  }
  return (shan * -1.0); //entropy of the file
}

Solution

MAGIC NUMBERS

The number 256 is used in multiple places as an array size. Rather than using the raw number
256, it would be better to use a symbolic constant. Raw numbers in code are sometimes referred to as magic numbers, because it isn't clear what they mean.
There are at least two reasons for this,

  • It identifies what the number means.



  • If at some point in the future the number has to be changed for all uses, the code only needs to be changed in one place.



An Example in C:

#define ARRAY_SIZE 256

void stat(char * file, double vector[ARRAY_SIZE])
{
    int c;
    FILE * ftopen = fopen(file, "r");
    while ((c = getc(ftopen)) != EOF)
      {
        if (c >= 0 && c <= ARRAY_SIZE)
            vector[c]++;
      }

      fclose(ftopen);
}


An Example in C++:

constexpr int ARRAY_SIZE = 256;


Use Meaningful Variable and Function Names

The code contains foo and bar in a number of places. These are not meaningful names.
The names should indicate what the variables are, so that if you or someone else needs
to change the code 2 years in the future it is an easy task rather than having to go
through all the code to see what needs to be changed.

Another exmaple of this is the function ccount(), which could be renamed file_size().

A third example is the array vector. Other than array if doubles it is not clear what
this array is used for.

Inconsistent Coding Style

In a professional setting where multiple people may have to maintain the code it is best
to be as consistent as possible.

The else clauses almost always have { block} but the if statements do not.
This is true whether or not the block contains multiple statement or not.
You should choose one way or the other.

In the
for loops the { is immediately under the for, but in the else clauses it
is indented. Choose one or the other. The most common usage is how you do the for loops.

This is partially a matter of choice, I use
{ block }` on all control statements:

if ()
{
}
else
{
}

for ()
{
}

while ()
{
}

do
{
} while ();


I do this even for single statements because if the code needs to be updated by adding a statement
to a block it is easier for anyone to see where the line of code should be inserted. This was
also required as part of a coding standard at a company where I worked.

Empty Clauses in IF/ELSE

In the function shannon() the code contains:

if (vector[i] == 0.0)
      continue;
    else
      {
        shan = shan + (log2(pow(vector[i], vector[i])));
      }


This would be clearer if it was written

if (vector[i] != 0.0)
      {
        shan = shan + (log2(pow(vector[i], vector[i])));
      }


An else clause can be added at a later time if it is needed. The if portion of the current
implementation is meaningless and would be optimized out by the compiler.

Code Snippets

#define ARRAY_SIZE 256

void stat(char * file, double vector[ARRAY_SIZE])
{
    int c;
    FILE * ftopen = fopen(file, "r");
    while ((c = getc(ftopen)) != EOF)
      {
        if (c >= 0 && c <= ARRAY_SIZE)
            vector[c]++;
      }

      fclose(ftopen);
}
constexpr int ARRAY_SIZE = 256;
if ()
{
}
else
{
}

for ()
{
}

while ()
{
}

do
{
} while ();
if (vector[i] == 0.0)
      continue;
    else
      {
        shan = shan + (log2(pow(vector[i], vector[i])));
      }
if (vector[i] != 0.0)
      {
        shan = shan + (log2(pow(vector[i], vector[i])));
      }

Context

StackExchange Code Review Q#140121, answer score: 2

Revisions (0)

No revisions yet.