patterncMinor
Get a floating point number from a string with a finite state machine
Viewed 0 times
numberfinitepointwithfloatinggetstatemachinefromstring
Problem
I wanted to restart the big-float project I had paused the moment I suddenly realized that the implementation of the main four rounding modes (which I postponed so long, that I got all of the IEEE-754 functions implemented in the meantime) needs three guard bits. And it needs them from the beginning. Which needs a complete refactoring. Yippy.
But that might also be a good chance to clean the whole thing up, I pondered--the function that parses the input is quite a mess, for example. Parsing is normally done with a small program: a lexer: put some regular expressions in and get your stuff out in the right order, very simple. But it has the disadvantage to need a third party program--or an "app" as the kids say today--and that is a dependency to be avoided -- a standard compliant C-compiler must suffice. Which exact C-standard the term "standard compliant" talks about is open for negotiations but it should not be older than, say, 15 years.
Writing a finite state machine manually is probably not the brightest idea I had in my life but the actual amount of such a task is not always easy to approximate and there are worse things to do on a weekend (although I really need to repair the heating before the start of winter).
The format of the input string is best described formally, to avoid any ambiguities, so what follows is a description of the acceptable input in EBNF (ISO 14977).
```
( space (0x20), used only for the thousands separators )
space = ? US-ASCII character 32 ?;
( sign )
sign = '+'|'-';
( thousands separator, must be between two digits )
tsep = '_' | space;
( A leading zero is treated as a prefix, hence the special treatment for it )
zero = '0';
( Integers. Integers must contain at least one digit of the respective base )
( binary digit )
bindig = '1';
binnum = (bindig | zero) | (bindig | zero), tsep, ( bindig | zero );
( binary integer )
binint = binnum, {binnum};
( octal digit );
octdig = bindig|'2'|'3'|'4'|'5'|'6'|'7';
octnum
But that might also be a good chance to clean the whole thing up, I pondered--the function that parses the input is quite a mess, for example. Parsing is normally done with a small program: a lexer: put some regular expressions in and get your stuff out in the right order, very simple. But it has the disadvantage to need a third party program--or an "app" as the kids say today--and that is a dependency to be avoided -- a standard compliant C-compiler must suffice. Which exact C-standard the term "standard compliant" talks about is open for negotiations but it should not be older than, say, 15 years.
Writing a finite state machine manually is probably not the brightest idea I had in my life but the actual amount of such a task is not always easy to approximate and there are worse things to do on a weekend (although I really need to repair the heating before the start of winter).
The format of the input string is best described formally, to avoid any ambiguities, so what follows is a description of the acceptable input in EBNF (ISO 14977).
```
( space (0x20), used only for the thousands separators )
space = ? US-ASCII character 32 ?;
( sign )
sign = '+'|'-';
( thousands separator, must be between two digits )
tsep = '_' | space;
( A leading zero is treated as a prefix, hence the special treatment for it )
zero = '0';
( Integers. Integers must contain at least one digit of the respective base )
( binary digit )
bindig = '1';
binnum = (bindig | zero) | (bindig | zero), tsep, ( bindig | zero );
( binary integer )
binint = binnum, {binnum};
( octal digit );
octdig = bindig|'2'|'3'|'4'|'5'|'6'|'7';
octnum
Solution
There is a lot in your post. So far only some minor ideas.
-
Fundamentally, code is relying on
-
-
Re-order
-
Small simplification with
-
Wrong return value on overflow (2 places).
-
See little value is type matching simple constants.
-
Questionable code when
-
Casting to
-
Questionable code when
-
-
For locale issues, the decimal point may be something else than
-
Avoid magic numbers.
-
Concerning
-
Fundamentally, code is relying on
long double with a greater precision than double. Better to code a solution without this reliance. Else how to code a long double version?-
str2dbl(char s, double d) only uses char *s as a non-const to do trimming. I'd expect code to cope with a const string. The code modifications needed are small.-
Re-order
str2dbl() for clarity by starting with space and sign handling. Suggested layout:int str2dbl(char *s, double *d) {
...
while (isspace(*s)) s++;
int sign = *s;
if (sign == '+' || sign == '-') s++;
// At this point sign is either '-' or not.
// Handle NAN
// Handle Infinity
// main converison
// detect is end-of-string is \0
if (sign == '-') result = -result;-
Small simplification with
add_int(int a, int b). Using a b >= 0 instead of b > 0 allows a simplification that a compiler may not catch. Similar with sub_int()// if (((b > 0) && (a > (INT_MAX - b))) || ((b = 0) && (a > (INT_MAX - b))) || ((b = 0) { if (a > INT_MAX - b) Over(); }
else { if (a < INT_MIN - b) Under(); }.-
Wrong return value on overflow (2 places).
if (b < (INT_MIN / a)) {
errno = ERANGE;
// return INT_MAX;
return INT_MIN;-
See little value is type matching simple constants.
// long double power = 1.0L;
long double power = 1.0;-
Questionable code when
exponent >. I'd expect either (% and /) or (& and >>). With signed arithmetic, these have slightly different functionalities.while (exponent) {
// if (negative_exponent % 2 --> -1)
// if (exponent % 2 == 1) {
if (exponent % 2) {
power *= base;
}
// exponent >>= 1;
exponent /= 2;
base *= base;
}-
Casting to
unsigned char is better for string is....() and to...() functions. These functions are UB for negative values. Of course, ASCII is always positive, but not much work to prevent.const char *s1,
...
// c1 = tolower(*s1);
c1 = tolower((unsigned char) *s1);-
Questionable code when
char is unsigned. Suggest signed char. Else main_sig will the wrong value even is one assumes ASCII-only in the string.// static const char digit_map[] = {
static const signed char digit_map[] = {
-1, -1, -1, -1, -1, -1, -1, -1, // 0x00-0x07-
strncasecmp() has the same problem. Better to use unsigned char here. // char c1 = 0;
// char c2 = 0;
unsigned char c1 = 0;
unsigned char c2 = 0;-
For locale issues, the decimal point may be something else than
'.'.-
Avoid magic numbers.
// static int fsm_table[31][13] = {
static int fsm_table[][OTHER] = {-
Concerning
// empty input, would an error be better?. Any parsing that lacks a scanned digit (or inf or NaN) should flag an error. E. g.: "", "-", "-+0", "in", " " etc.Code Snippets
int str2dbl(char *s, double *d) {
...
while (isspace(*s)) s++;
int sign = *s;
if (sign == '+' || sign == '-') s++;
// At this point sign is either '-' or not.
// Handle NAN
// Handle Infinity
// main converison
// detect is end-of-string is \0
if (sign == '-') result = -result;// if (((b > 0) && (a > (INT_MAX - b))) || ((b < 0) && (a < (INT_MIN - b)))) {
// v-------------- opposite ------------v
if (((b >= 0) && (a > (INT_MAX - b))) || ((b < 0) && (a < (INT_MIN - b)))) {
// or
if (b >= 0) { if (a > INT_MAX - b) Over(); }
else { if (a < INT_MIN - b) Under(); }.if (b < (INT_MIN / a)) {
errno = ERANGE;
// return INT_MAX;
return INT_MIN;// long double power = 1.0L;
long double power = 1.0;while (exponent) {
// if (negative_exponent % 2 --> -1)
// if (exponent % 2 == 1) {
if (exponent % 2) {
power *= base;
}
// exponent >>= 1;
exponent /= 2;
base *= base;
}Context
StackExchange Code Review Q#144583, answer score: 4
Revisions (0)
No revisions yet.