HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Get a floating point number from a string with a finite state machine

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
numberfinitepointwithfloatinggetstatemachinefromstring

Problem

I wanted to restart the big-float project I had paused the moment I suddenly realized that the implementation of the main four rounding modes (which I postponed so long, that I got all of the IEEE-754 functions implemented in the meantime) needs three guard bits. And it needs them from the beginning. Which needs a complete refactoring. Yippy.

But that might also be a good chance to clean the whole thing up, I pondered--the function that parses the input is quite a mess, for example. Parsing is normally done with a small program: a lexer: put some regular expressions in and get your stuff out in the right order, very simple. But it has the disadvantage to need a third party program--or an "app" as the kids say today--and that is a dependency to be avoided -- a standard compliant C-compiler must suffice. Which exact C-standard the term "standard compliant" talks about is open for negotiations but it should not be older than, say, 15 years.

Writing a finite state machine manually is probably not the brightest idea I had in my life but the actual amount of such a task is not always easy to approximate and there are worse things to do on a weekend (although I really need to repair the heating before the start of winter).

The format of the input string is best described formally, to avoid any ambiguities, so what follows is a description of the acceptable input in EBNF (ISO 14977).

```
( space (0x20), used only for the thousands separators )
space = ? US-ASCII character 32 ?;

( sign )
sign = '+'|'-';

( thousands separator, must be between two digits )
tsep = '_' | space;

( A leading zero is treated as a prefix, hence the special treatment for it )
zero = '0';

( Integers. Integers must contain at least one digit of the respective base )

( binary digit )
bindig = '1';
binnum = (bindig | zero) | (bindig | zero), tsep, ( bindig | zero );
( binary integer )
binint = binnum, {binnum};

( octal digit );
octdig = bindig|'2'|'3'|'4'|'5'|'6'|'7';
octnum

Solution

There is a lot in your post. So far only some minor ideas.

-
Fundamentally, code is relying on long double with a greater precision than double. Better to code a solution without this reliance. Else how to code a long double version?

-
str2dbl(char s, double d) only uses char *s as a non-const to do trimming. I'd expect code to cope with a const string. The code modifications needed are small.

-
Re-order str2dbl() for clarity by starting with space and sign handling. Suggested layout:

int str2dbl(char *s, double *d) {
  ...
  while (isspace(*s)) s++;
  int sign = *s;
  if (sign == '+' || sign == '-') s++;
  // At this point sign is either '-' or not.

  // Handle NAN

  // Handle Infinity

  // main converison

  // detect is end-of-string is \0

  if (sign == '-') result = -result;


-
Small simplification with add_int(int a, int b). Using a b >= 0 instead of b > 0 allows a simplification that a compiler may not catch. Similar with sub_int()

// if (((b > 0) && (a > (INT_MAX - b))) || ((b = 0) && (a > (INT_MAX - b))) || ((b = 0) { if (a > INT_MAX - b) Over(); } 
else { if (a < INT_MIN - b) Under(); }.


-
Wrong return value on overflow (2 places).

if (b < (INT_MIN / a)) {
  errno = ERANGE;
  // return INT_MAX;
  return INT_MIN;


-
See little value is type matching simple constants.

// long double power = 1.0L;
long double power = 1.0;


-
Questionable code when exponent >. I'd expect either (% and /) or (& and >>). With signed arithmetic, these have slightly different functionalities.

while (exponent) {
  // if (negative_exponent % 2 --> -1) 
  // if (exponent % 2 == 1) {  
  if (exponent % 2) {
    power *= base;
  }
  // exponent >>= 1;
  exponent /= 2;
  base *= base;
}


-
Casting to unsigned char is better for string is....() and to...() functions. These functions are UB for negative values. Of course, ASCII is always positive, but not much work to prevent.

const char *s1,
  ...
  // c1 = tolower(*s1);
  c1 = tolower((unsigned char) *s1);


-
Questionable code when char is unsigned. Suggest signed char. Else main_sig will the wrong value even is one assumes ASCII-only in the string.

// static const char digit_map[] = {
static const signed char digit_map[] = {
  -1, -1, -1, -1, -1, -1, -1, -1, //  0x00-0x07


-
strncasecmp() has the same problem. Better to use unsigned char here.

// char c1 = 0;
// char c2 = 0;
unsigned char c1 = 0;
unsigned char c2 = 0;


-
For locale issues, the decimal point may be something else than '.'.

-
Avoid magic numbers.

// static int fsm_table[31][13] = {
static int fsm_table[][OTHER] = {


-
Concerning // empty input, would an error be better?. Any parsing that lacks a scanned digit (or inf or NaN) should flag an error. E. g.: "", "-", "-+0", "in", " " etc.

Code Snippets

int str2dbl(char *s, double *d) {
  ...
  while (isspace(*s)) s++;
  int sign = *s;
  if (sign == '+' || sign == '-') s++;
  // At this point sign is either '-' or not.

  // Handle NAN

  // Handle Infinity

  // main converison

  // detect is end-of-string is \0

  if (sign == '-') result = -result;
// if (((b > 0) && (a > (INT_MAX - b))) || ((b < 0) && (a < (INT_MIN - b)))) {
//      v-------------- opposite ------------v
if (((b >= 0) && (a > (INT_MAX - b))) || ((b < 0) && (a < (INT_MIN - b)))) {
// or
if (b >= 0) { if (a > INT_MAX - b) Over(); } 
else { if (a < INT_MIN - b) Under(); }.
if (b < (INT_MIN / a)) {
  errno = ERANGE;
  // return INT_MAX;
  return INT_MIN;
// long double power = 1.0L;
long double power = 1.0;
while (exponent) {
  // if (negative_exponent % 2 --> -1) 
  // if (exponent % 2 == 1) {  
  if (exponent % 2) {
    power *= base;
  }
  // exponent >>= 1;
  exponent /= 2;
  base *= base;
}

Context

StackExchange Code Review Q#144583, answer score: 4

Revisions (0)

No revisions yet.