
Check if a numpy array contains numerical data

Submitted by: @import:stackexchange-codereview

Problem

This function serves as a testing utility to check that a result is really numeric, and sometimes as input validation when many operations would otherwise run before an exception revealed that the input is non-numeric.

But I feel like the .dtype.kind in ... check is too complex. I wrote this function a while back and looked for a better approach, but I couldn't find a solution that works on Python 2.7 and 3.x and across NumPy versions 1.7+.

The code:

import numpy as np

def is_numeric_array(array):
    """Checks if the dtype of the array is numeric.

    Booleans, unsigned integer, signed integer, floats and complex are
    considered numeric. 

    Parameters
    ----------
    array : `numpy.ndarray`-like
        The array to check.

    Returns
    -------
    is_numeric : `bool`
        True if it is a recognized numerical and False if object or
        string.
    """
    numerical_dtype_kinds = {'b', # boolean
                             'u', # unsigned integer
                             'i', # signed integer
                             'f', # floats
                             'c'} # complex
    try:
        return array.dtype.kind in numerical_dtype_kinds
    except AttributeError:
        # in case it's not a numpy array it will probably have no dtype.
        return np.asarray(array).dtype.kind in numerical_dtype_kinds


and I have the following tests:

```
def test_not_array():
    assert is_numeric_array(1)
    assert is_numeric_array(1.)
    assert is_numeric_array(1+1j)
    assert not is_numeric_array('a')
    assert not is_numeric_array(None)
    assert is_numeric_array([1, 2, 3])

def test_array():
    assert is_numeric_array(np.array(1))
    assert is_numeric_array(np.array(1.))
    assert is_numeric_array(np.array(1+1j))
    assert is_numeric_array(np.array([1]))
    assert is_numeric_array(np.array([1.]))
    assert is_numeric_array(np.array([1+1j]))
    assert not is_numeric_array(np.array('a'))
    # assert not is_numeric  (the rest of this line is truncated in the source)
```

Solution


  1. Question



You write, "I feel like the .dtype.kind in ... is too complex". You're probably right about that: I've never needed anything like this in NumPy code. Normally I know what the datatypes are, or I rely on the caller setting them up correctly. But I don't think I can help unless you can explain what you are using this for. Why do you need to know whether an array is numeric or not?

Update: you say in comments that you are trying to validate data read in from files. For this use case, consider using np.genfromtxt, passing loose=False, so that a value that cannot be converted raises an error. For example:

>>> from io import BytesIO
>>> import numpy as np
>>> np.genfromtxt(BytesIO(b'1,2,x,3'), dtype=float, delimiter=',', loose=False)
Traceback (most recent call last):
  File "numpy/lib/_iotools.py", line 688, in _strict_call
    new_value = self.func(value)
ValueError: could not convert string to float: b'x'
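
For validation code, the traceback above would normally be caught rather than shown. A minimal sketch, assuming a hypothetical helper named parses_as_float (not part of the original answer) and comma-delimited input held in memory as bytes:

```python
from io import BytesIO

import numpy as np


def parses_as_float(raw, delimiter=','):
    """Return True if every field in `raw` (bytes) converts to float.

    Hypothetical helper for illustration: it relies on
    np.genfromtxt(..., loose=False) raising ValueError for a field
    that cannot be converted.
    """
    try:
        np.genfromtxt(BytesIO(raw), dtype=float,
                      delimiter=delimiter, loose=False)
    except ValueError:
        return False
    return True


good = parses_as_float(b'1,2,3')
bad = parses_as_float(b'1,2,x,3')
```

With the default loose=True, the unconvertible field would be replaced by nan instead of raising, so the check only works with loose=False.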


  2. Review



- You've written a docstring! That's excellent.

- But I think the docstring could be clearer about what happens when array is not a NumPy array. I would write something like, "Determine whether the argument has a numeric datatype, when converted to a NumPy array."

- The docstring says, "False if object or string" but those are not the only non-numeric kinds (there's also unicode and void), so I would write something like, "True if the array has a numeric datatype, False if not."

- np.asarray is cheap if the argument is already an array: "No copy is performed if the input is already an ndarray" so you might as well call it in all cases, and avoid the try: ... except: and the duplicated code.

- The set numerical_dtype_kinds is always the same, and so it ought to be a global variable.

- The test cases would be more convenient to run if you used the features in the unittest module.

- There's a lot of repetition in the test cases. Since all the tests are of the form assert is_numeric_array(x) or assert not is_numeric_array(x), it would make sense to put the test cases in a couple of lists, and iterate over them. There's duplication between test_not_array and test_array that could easily be removed.

- There are no test cases checking that Booleans are numeric, or that objects are not.
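
Two of the points above can be checked directly: that np.asarray hands back the same object when given an ndarray, and that there are non-numeric kinds beyond object and string. A small sketch, with variable names chosen here purely for illustration:

```python
import numpy as np

# np.asarray returns the very same object for ndarray input (no copy),
# so calling it unconditionally costs almost nothing.
a = np.array([1.0, 2.0])
same_object = np.asarray(a) is a

# One-character kind codes for a selection of dtypes.
numeric_kinds = [np.dtype(t).kind for t in
                 (bool, np.uint8, np.int32, np.float64, np.complex128)]
other_kinds = [np.dtype(t).kind for t in
               (object, 'S3', 'U3', 'V4', 'datetime64[s]', 'timedelta64[s]')]
```

Here numeric_kinds comes out as the five codes in 'buifc', while other_kinds contains 'O', 'S', 'U', 'V', 'M' and 'm', none of which the function treats as numeric.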

  3. Revised code



import numpy as np

# Boolean, unsigned integer, signed integer, float, complex.
_NUMERIC_KINDS = set('buifc')

def is_numeric(array):
    """Determine whether the argument has a numeric datatype, when
    converted to a NumPy array.

    Booleans, unsigned integers, signed integers, floats and complex
    numbers are the kinds of numeric datatype.

    Parameters
    ----------
    array : array-like
        The array to check.

    Returns
    -------
    is_numeric : `bool`
        True if the array has a numeric datatype, False if not.

    """
    return np.asarray(array).dtype.kind in _NUMERIC_KINDS

from unittest import TestCase

class TestIsNumeric(TestCase):
    NUMERIC = [True, 1, -1, 1.0, 1+1j]
    NOT_NUMERIC = [object(), 'string', u'unicode', None]

    def test_is_numeric(self):
        for x in self.NUMERIC:
            for y in (x, [x], [x] * 2):
                for z in (y, np.array(y)):
                    self.assertTrue(is_numeric(z))
        for x in self.NOT_NUMERIC:
            for y in (x, [x], [x] * 2):
                for z in (y, np.array(y)):
                    self.assertFalse(is_numeric(z))
        # Note: np.sctypes was removed in NumPy 2.0, so this loop
        # requires NumPy < 2.0.
        for kind, dtypes in np.sctypes.items():
            if kind != 'others':
                for dtype in dtypes:
                    self.assertTrue(is_numeric(np.array([0], dtype=dtype)))
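
One caveat on the final loop of the test: np.sctypes was removed from the main namespace in NumPy 2.0, so that loop fails on current releases. A minimal sketch of a similar check, assuming np.typecodes is an acceptable substitute (this is a swapped-in approach, not part of the original answer):

```python
import numpy as np

_NUMERIC_KINDS = set('buifc')


def is_numeric(array):
    """Same function as in the revised code above."""
    return np.asarray(array).dtype.kind in _NUMERIC_KINDS


# '?' is the bool type code; 'AllInteger' covers signed and unsigned
# integers; 'AllFloat' covers both real and complex floating types.
numeric_codes = '?' + np.typecodes['AllInteger'] + np.typecodes['AllFloat']

# Collect any type code whose array is not reported as numeric.
failures = [code for code in numeric_codes
            if not is_numeric(np.array([0], dtype=code))]
```

Inside the TestCase, the same idea would be a loop over numeric_codes calling self.assertTrue, leaving the rest of the test unchanged.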

Context

StackExchange Code Review Q#128032, answer score: 5
