Check if a numpy array contains numerical data
Problem
I use this function as a testing utility to check that a result is really numeric, and sometimes as input validation when a lot of operations would otherwise run before an exception revealed that the input is not numeric.
But I feel like the `.dtype.kind in ...` check is too complex. I wrote this function a while back and looked for a better approach, but I couldn't find any solution that works for Python 2.7 and 3.x and for NumPy versions 1.7+.

The code:

```
import numpy as np

def is_numeric_array(array):
    """Checks if the dtype of the array is numeric.

    Booleans, unsigned integer, signed integer, floats and complex are
    considered numeric.

    Parameters
    ----------
    array : `numpy.ndarray`-like
        The array to check.

    Returns
    -------
    is_numeric : `bool`
        True if it is a recognized numerical and False if object or
        string.
    """
    numerical_dtype_kinds = {'b',  # boolean
                             'u',  # unsigned integer
                             'i',  # signed integer
                             'f',  # floats
                             'c'}  # complex
    try:
        return array.dtype.kind in numerical_dtype_kinds
    except AttributeError:
        # in case it's not a numpy array it will probably have no dtype.
        return np.asarray(array).dtype.kind in numerical_dtype_kinds
```

and I have the following tests:

```
def test_not_array():
    assert is_numeric_array(1)
    assert is_numeric_array(1.)
    assert is_numeric_array(1+1j)
    assert not is_numeric_array('a')
    assert not is_numeric_array(None)
    assert is_numeric_array([1, 2, 3])

def test_array():
    assert is_numeric_array(np.array(1))
    assert is_numeric_array(np.array(1.))
    assert is_numeric_array(np.array(1+1j))
    assert is_numeric_array(np.array([1]))
    assert is_numeric_array(np.array([1.]))
    assert is_numeric_array(np.array([1+1j]))
    assert not is_numeric_array(np.array('a'))
    assert not is_numeric
```
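For reference, here is a small sketch of what `.dtype.kind` reports for the kinds of input these tests exercise. The `samples` list is made up for illustration, and the `'U'` result for a string assumes Python 3 (Python 2 would report `'S'` for a plain `str`).

```python
import numpy as np

# Hypothetical sample inputs, mirroring the values used in the tests.
samples = [1, 1.0, 1 + 1j, True, 'a', None, [1, 2, 3]]

# Convert each input to an array and read the one-letter kind code of its dtype.
kinds = [np.asarray(sample).dtype.kind for sample in samples]

for sample, kind in zip(samples, kinds):
    print(repr(sample), '->', kind)
```

The kind code is independent of item size, so `int32` and `int64` both report `'i'`, which is why checking the kind is enough here.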
Solution
- Question
You write, "I feel like the `.dtype.kind in ...` is too complex". You're probably right about that: I've never needed anything like this in NumPy code. Normally I know what the datatypes are, or I rely on the caller setting them up correctly. But I don't think I can help unless you can explain what you are using this for. Why do you need to know whether an array is numeric or not?

Update: you say in comments that you are trying to validate data read in from files. For this use case, consider using `np.genfromtxt`, passing `loose=False`. For example:

```
>>> from io import BytesIO
>>> np.genfromtxt(BytesIO(b'1,2,x,3'), dtype=float, delimiter=',', loose=False)
Traceback (most recent call last):
  File "numpy/lib/_iotools.py", line 688, in _strict_call
    new_value = self.func(value)
ValueError: could not convert string to float: b'x'
```

- Review
  - You've written a docstring! That's excellent.
  - But I think the docstring could be clearer about what happens when `array` is not a NumPy array. I would write something like, "Determine whether the argument has a numeric datatype, when converted to a NumPy array."
  - The docstring says, "False if object or string" but those are not the only non-numeric kinds (there's also unicode and void), so I would write something like, "True if the array has a numeric datatype, False if not."
  - `np.asarray` is cheap if the argument is already an array: "No copy is performed if the input is already an ndarray" so you might as well call it in all cases, and avoid the `try: ... except:` and the duplicated code.
  - The set `numerical_dtype_kinds` is always the same, and so it ought to be a global variable.
  - The test cases would be more convenient to run if you used the features in the `unittest` module.
  - There's a lot of repetition in the test cases. Since all the tests are of the form `assert is_numeric_array(x)` or `assert not is_numeric_array(x)`, it would make sense to put the test cases in a couple of lists, and iterate over them. There's duplication between `test_not_array` and `test_array` that could easily be removed.
  - There are no test cases checking that Booleans are numeric, or that objects are not.
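The point about `np.asarray` being cheap can be checked directly. The following is a minimal sketch (the variable names are illustrative only) showing that an existing plain `ndarray` is passed through as the same object, while a list is converted to a new array:

```python
import numpy as np

existing = np.array([1.0, 2.0, 3.0])

# For a plain ndarray input (and no dtype argument), asarray returns
# the very same object, so nothing is copied.
same_object = np.asarray(existing) is existing

# A list has no dtype attribute; asarray builds a new array from it.
converted = np.asarray([1.0, 2.0, 3.0])

print(same_object, converted.dtype.kind)
```

Because both paths end in an object with a `dtype`, a single `np.asarray(array).dtype.kind` expression covers the cases that the original `try: ... except AttributeError:` handled separately.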
- Revised code
```
import numpy as np

# Boolean, unsigned integer, signed integer, float, complex.
_NUMERIC_KINDS = set('buifc')

def is_numeric(array):
    """Determine whether the argument has a numeric datatype, when
    converted to a NumPy array.

    Booleans, unsigned integers, signed integers, floats and complex
    numbers are the kinds of numeric datatype.

    Parameters
    ----------
    array : array-like
        The array to check.

    Returns
    -------
    is_numeric : `bool`
        True if the array has a numeric datatype, False if not.

    """
    return np.asarray(array).dtype.kind in _NUMERIC_KINDS

from unittest import TestCase

class TestIsNumeric(TestCase):
    NUMERIC = [True, 1, -1, 1.0, 1+1j]
    NOT_NUMERIC = [object(), 'string', u'unicode', None]

    def test_is_numeric(self):
        for x in self.NUMERIC:
            for y in (x, [x], [x] * 2):
                for z in (y, np.array(y)):
                    self.assertTrue(is_numeric(z))
        for x in self.NOT_NUMERIC:
            for y in (x, [x], [x] * 2):
                for z in (y, np.array(y)):
                    self.assertFalse(is_numeric(z))
        # Note: np.sctypes is only available in NumPy versions before 2.0.
        for kind, dtypes in np.sctypes.items():
            if kind != 'others':
                for dtype in dtypes:
                    self.assertTrue(is_numeric(np.array([0], dtype=dtype)))
```
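Since `np.sctypes` was removed in NumPy 2.0, the last loop of the test only runs on older versions. One possible alternative, sketched here under the assumption that `np.typecodes` is an acceptable source of dtype codes, is to iterate over its one-letter codes instead. The `non_numeric` list is an illustrative addition that also exercises kinds the original tests do not cover (datetime, timedelta, void and bytes).

```python
import numpy as np

_NUMERIC_KINDS = set('buifc')

def is_numeric(array):
    """Same check as the revised code, repeated so this sketch runs alone."""
    return np.asarray(array).dtype.kind in _NUMERIC_KINDS

# '?' is the bool code; 'AllInteger' covers signed and unsigned integers,
# and 'AllFloat' covers both the float and the complex codes.
numeric_codes = '?' + np.typecodes['AllInteger'] + np.typecodes['AllFloat']
all_numeric = all(is_numeric(np.array([0], dtype=code)) for code in numeric_codes)

# Arrays whose kinds fall outside 'buifc'.
non_numeric = [
    np.array(['2020-01-01'], dtype='datetime64[D]'),  # kind 'M'
    np.array([1], dtype='timedelta64[s]'),            # kind 'm'
    np.zeros(1, dtype=[('x', 'f8')]),                 # kind 'V' (structured)
    np.array([b'bytes']),                             # kind 'S'
]
none_numeric = not any(is_numeric(a) for a in non_numeric)

print(all_numeric, none_numeric)
```

Whether datetimes and timedeltas should count as numeric depends on the use case; this sketch keeps the answer's definition and treats them as non-numeric.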
Context
StackExchange Code Review Q#128032, answer score: 5