Numpy#
Numpy – short for numerical Python
A library that is used extensively used in scientific computing
Provides:
A multidimensional array object called
ndarray
(or just array)Fast routines, i.e., functions for sorting, selecting, mathematical and linear algebra operations etc.
Differences between numpy arrays and Python sequences (lists and tuples)
Numpy arrays are mutable while sequences can be immutable
Numpy arrays are homogenous (same element type) while sequences can be heterogenous
In general, numpy arrays have a fixed size in memory unlike Python sequences
Consider a problem asking you to find the resultant of two vectors
\(F_1 = 2i + 3j - 4k\)
\(F_2 = 3i - 1j + 4k\)
\(R = F_1 + F_2\)
You need to add these vectors element-wise
Requires loops and indexing
F1 = (2, 3, -4)
F2 = (3, -1, 4)
# Elementwise addition of a and b
c = []
for i in range(len(F1)):
c.append(F1[i] + F2[i])
print(c, '\n')
print(f"R = {c[0]}i + {c[1]}j + {c[2]}k")
[5, 2, 0]
R = 5i + 2j + 0k
The same program using numpy
import numpy as np
F1 = np.array([2, 3, -4])
F2 = np.array((3, -1, 4))
R = F1 + F2
print(R, '\n')
print(f"R = {R[0]}i + {R[1]}j + {R[2]}k")
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[2], line 1
----> 1 import numpy as np
3 F1 = np.array([2, 3, -4])
4 F2 = np.array((3, -1, 4))
ModuleNotFoundError: No module named 'numpy'
Numpy allows for more direct element-wise operations using the
ndarray
datatypeNumpy arrays are fast and efficient compared to python sequences
Vectorization - no explicit looping and indexing
These operations happen in the background with pre-compiled C code
Broadcasting - all operations are done element-wise, in general
This is the method used to carry out element-wise operation
Makes up for concise, readable code with less scope for bugs
Real math-like expressions and enables vectorization
Numpy Arrays#
ndarray
– n-dimensional array, the basic numpy data typeCan be used to represent vectors (1D arrays), matrices (2D arrays), …
Creating numpy arrays
Using the
np.array
functionInput is a list, tuple or array-like object
Or any object that has the magic method
__array__
The data type of each element in the
ndarray
is automatically inferred and stored in thedtype
attribute
a1 = np.array([1, 2, 5, 6]) # 1d-array
a2 = np.array([[1.4, 2],
[2, 3]]) # 2d-array
ndarray attributes
dtype
– the datatype of the stored elementsndim
– the dimension of the arrayshape
– a tuple with the size of each dimension (also referred to as an axis)size
– the total number of elements in the arrayitemsize
– the size of each element of the array in bytes
a1 = np.array([1, 2, 5, 6]) # 1d-array
a2 = np.array([[1.4, 2],
[2, 3]]) # 2d-array
a3 = np.array([3, 5, 9j]) # 1d-array with a complex number
print(a1.dtype, a2.dtype, a3.dtype, sep='\t')
print(a1.shape, a2.shape, a3.shape, sep='\t')
int64 float64 complex128
(4,) (2, 2) (3,)
Array Creation#
arange
#
Generating a sequence of numbers
np.arange
– like therange
but can create non-integer sequencesInputs – start, stop, and increment
The start and increment values can be optionally discarded
Default values of 0 and 1 will be used respectively
a = np.arange(1.2, 2.00005, 0.2)
a
array([1.2, 1.4, 1.6, 1.8, 2. ])
np.arange(2.2) # start = 0, increment = 1 - defaults
array([0., 1., 2.])
Example - function evaluation
Consider the function \(y = 3x^2\)
Evaluate \(y\) uniformly in an interval of \([0, 10]\)
x = np.arange(0, 10.001, 0.2)
y = 3*x**2
y
array([0.0000e+00, 1.2000e-01, 4.8000e-01, 1.0800e+00, 1.9200e+00,
3.0000e+00, 4.3200e+00, 5.8800e+00, 7.6800e+00, 9.7200e+00,
1.2000e+01, 1.4520e+01, 1.7280e+01, 2.0280e+01, 2.3520e+01,
2.7000e+01, 3.0720e+01, 3.4680e+01, 3.8880e+01, 4.3320e+01,
4.8000e+01, 5.2920e+01, 5.8080e+01, 6.3480e+01, 6.9120e+01,
7.5000e+01, 8.1120e+01, 8.7480e+01, 9.4080e+01, 1.0092e+02,
1.0800e+02, 1.1532e+02, 1.2288e+02, 1.3068e+02, 1.3872e+02,
1.4700e+02, 1.5552e+02, 1.6428e+02, 1.7328e+02, 1.8252e+02,
1.9200e+02, 2.0172e+02, 2.1168e+02, 2.2188e+02, 2.3232e+02,
2.4300e+02, 2.5392e+02, 2.6508e+02, 2.7648e+02, 2.8812e+02,
3.0000e+02])
linspace
#
Generating a sequence of numbers
np.linspace
– evenly spaced valuesInputs – start, stop, and number
The number can be left out; default value of 50 will be used
Can also use
np.logspace
andnp.geomspace
np.linspace(0, 10, 5)
array([ 0. , 2.5, 5. , 7.5, 10. ])
np.linspace(2.5, -2.5)
array([ 2.5 , 2.39795918, 2.29591837, 2.19387755, 2.09183673,
1.98979592, 1.8877551 , 1.78571429, 1.68367347, 1.58163265,
1.47959184, 1.37755102, 1.2755102 , 1.17346939, 1.07142857,
0.96938776, 0.86734694, 0.76530612, 0.66326531, 0.56122449,
0.45918367, 0.35714286, 0.25510204, 0.15306122, 0.05102041,
-0.05102041, -0.15306122, -0.25510204, -0.35714286, -0.45918367,
-0.56122449, -0.66326531, -0.76530612, -0.86734694, -0.96938776,
-1.07142857, -1.17346939, -1.2755102 , -1.37755102, -1.47959184,
-1.58163265, -1.68367347, -1.78571429, -1.8877551 , -1.98979592,
-2.09183673, -2.19387755, -2.29591837, -2.39795918, -2.5 ])
zeros
, ones
, full
#
Generating arrays filled with constant values
np.zeros(shape)
– creates anndarray
with zeros given the shapeshape can either be an
int
or atuple
orlist
of integers
np.ones(shape)
– same asnp.zeros
but fills the array with onesnp.full(shape, fill)
– creates an array with the given shape filled with the fill value
np.zeros(4)
array([0., 0., 0., 0.])
np.full([2,3],2)
array([[2, 2, 2],
[2, 2, 2]])
eye
#
Identity matrix
np.eye(N)
– creates an identity matrix withN
rows and columnsnp.eye(N, M)
– creates an identity matrix withN
rows andM
columnsnp.identity(N)
– equivalent tonp.eye(N)
np.eye(4)
array([[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.]])
np.eye(4, 3) # truncated identity matrix
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.],
[0., 0., 0.]])
diag
#
Diagonal matrices
np.diag(arr)
– creates a diagonal matrix given an array-like object (lists, tuples, 1D ndarray etc.)Note: if the input is a 2D ndarray, the output is a 1D ndarray containing the diagonal elements
np.diag((1, 2, 3))
array([[1, 0, 0],
[0, 2, 0],
[0, 0, 3]])
Importing Data#
From a text file
np.loadtxt(filename, delimiter, skiprows, dtype)
– creates an array with data from a file
File contents
x, y |
---|
0, 0 |
1, 1 |
2, 4 |
3, 9 |
4, 16 |
5, 25 |
a = np.loadtxt('Data.csv', delimiter=',', skiprows = 1, dtype = 'i')
print(a)
[[ 0 0]
[ 1 1]
[ 2 4]
[ 3 9]
[ 4 16]
[ 5 25]]
Check the links below for more info on array creation routines
Datatypes#
Almost all array creation methods accept an optional parameter
dtype
to specify the data typeThe default data type in numpy is
float
a_int = np.array((1, 2, 4), dtype=int)
a_flt = np.array((1, 2, 4), dtype=float)
a_cplx = np.array((1, 2, 4), dtype=complex)
a_bool = np.array((1, 2, 4), dtype=bool)
print(a_int, a_flt, a_cplx, a_bool, sep="\n\n", end="\n\n")
print(a_int.dtype, a_flt.dtype, a_cplx.dtype, a_bool.dtype, sep="\t")
[1 2 4]
[1. 2. 4.]
[1.+0.j 2.+0.j 4.+0.j]
[ True True True]
int64 float64 complex128 bool
Check this webpage for more info on data types
Data Type Conversion#
Arrays can be converted from one data type to another using the
astype
method
mat = np.full((3, 3), 2, dtype = int)
print(mat)
[[2 2 2]
[2 2 2]
[2 2 2]]
mat_complex = mat.astype(complex)
print(mat_complex)
[[2.+0.j 2.+0.j 2.+0.j]
[2.+0.j 2.+0.j 2.+0.j]
[2.+0.j 2.+0.j 2.+0.j]]
Accessing Elements#
Indexing and slicing (introduced with Python sequences) works with numpy arrays too
[𝑖 :𝑗 :Δ] – extracts elements beginning from index 𝑖 to 𝑗−1 in increments of Δ
If 𝑗 is not specified, it is assumed to be the length of the sequence
If 𝑖 is not specified, it is assumed to be 0
If Δ is not specified, it is assumed to be 1
arr1d = np.array([1, 4, 9, 16, 25, 36])
arr1d.shape
(6,)
# Basic indexing
arr1d[-2]
25
# Slicing
arr1d[0::2]
array([ 1, 9, 25])
# Slicing
arr1d[::-1]
array([36, 25, 16, 9, 4, 1])
Advanced indexing with 1D arrays
https://numpy.org/doc/stable/user/quickstart.html#advanced-indexing-and-index-tricks
arr1d = np.array([1, 4, 9, 16, 25, 36])
arr1d[[0, 3, 2, 0]]
array([ 1, 16, 9, 1])
j = np.array([[3, 4],
[1, 2]])
arr1d[j]
array([[16, 25],
[ 4, 9]])
Indexing and slicing of 2D arrays
Indices are [row, column]
:
used to indicates all elements along the axis are to be selected
arr2d = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# Basic indexing
arr2d[1] # row
array([4, 5, 6])
# Basic indexing
arr2d[2, 0] # row, col
7
# Basic indexing
arr2d[(2, 0)] # Same result with a tuple
7
# Slicing
arr2d[0::2]
array([[1, 2, 3],
[7, 8, 9]])
# Slicing + Indexing
arr2d[:, :]
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Advanced indexing for 2D arrays
arr2d = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
inds = np.array([[1, 2],
[2, 0]])
arr2d[inds]
array([[[4, 5, 6],
[7, 8, 9]],
[[7, 8, 9],
[1, 2, 3]]])
Treats the 2D array as a 1D array and generated a higher dimensional
ndarray
arr2d[0, [2, 1]]
array([3, 2])
Elements with indices
(0, 1) and (0, 2)
arr2d[[0, 0, 2, 2],
(0, 2, 0, 2)] # Tuples or lists can be used for specifying indices
array([1, 3, 7, 9])
Elements with indices
(0, 0), (0, 2), (2, 0), and (2, 2)
rows = np.array([[0, 0],
[2, 2]])
columns = [[0, 2],
[0, 2]]
arr2d[rows, columns]
array([[1, 3],
[7, 9]])
Same elements as before are selected but the results is now a 2D array
Indexing and slicing for 3D arrays
a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
arr3d = a.reshape((2, 2, 2))
arr3d
array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
The
reshape
method changes the shape of thendarray
given atuple
provided the number of elements remain the samenp.reshape
module function does the same thing
arr3d[0, 1, 0]
3
arr3d[(0, 1, 0)] # Same as above, but using a tuple; list will result in advanced slicing
3
arr3d[:, 1, 1]
array([4, 8])
Each element is a 2D array
Boolean array indexing#
The comparison operator is broadcast to all elements in the array
arr2d = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
arr2d > 5
array([[False, False, False],
[False, False, True],
[ True, True, True]])
arr2d[arr2d > 5]
array([6, 7, 8, 9])
arr2d[arr2d % 2 == 1]
array([1, 3, 5, 7, 9])
np.isnan
andnp.isinf
can be used to filter bad data
arr = np.array([[1, np.inf, 3],
[4, 5, 6],
[7, 8, np.nan]])
arr = arr[~np.isinf(arr)] # ~ is the bitwise not operator
arr = arr[~np.isnan(arr)]
print(arr)
[1. 3. 4. 5. 6. 7. 8.]
Assignment#
Numpy arrays are mutable
They are modified just like Python lists
Augmented assignment can also be used
a = np.linspace(1, 9, 9, dtype=int)
m = a.reshape((3, 3)) # np.reshape(a, (3, 3))
m
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Assigning a constant value to a row
m[1] = 0
m
array([[1, 2, 3],
[0, 0, 0],
[7, 8, 9]])
Assigning a constant value to a column
m[:, 1] = 10
m
array([[ 1, 10, 3],
[ 0, 10, 0],
[ 7, 10, 9]])
Assign an
ndarray
orlist
ortuple
to a rowDimensions of the LHS and RHS must match
m[2] = [2, 3, 4]
m
array([[ 1, 10, 3],
[ 0, 10, 0],
[ 2, 3, 4]])
Arithmetic Operations#
Arithmetic operations result in a new instance of
ndarray
a = np.linspace(1, 9, 9, dtype=int).reshape((3, 3))
a
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
a + 2
array([[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
Values are broadcast across array
In other words, the scale 2 is stretched to match the dimensions of
a
Add a constant value to the array via broadcasting
a + 2
a ** [1, 2, 3]
array([[ 1, 4, 27],
[ 4, 25, 216],
[ 7, 64, 729]])
The
list
,tuple
, orndarray
being multiplied will be stretched to match the dimensions ofa
Multiplying a 2D array with an 1D
ndarray
,list
, ortuple
a * [1, 2, 3]
All the other Python operators like
-
,/
,//
,%
,<
,>
, and==
work in the same manner
Broadcasting works on any two arrays with compatible dimensions
# A 2D array with 4 rows and 1 column
x = np.arange(4, dtype=int).reshape((4, 1))
x
array([[0],
[1],
[2],
[3]])
# A 2D array with 1 row and 3 columns
# Not the same as a 1D array!
y = np.ones(3).reshape((1, 3))
y
array([[1., 1., 1.]])
x + y
array([[1., 1., 1.],
[2., 2., 2.],
[3., 3., 3.],
[4., 4., 4.]])
x + y
Matrix Math#
By default, all operations in numpy are element-wise.
a = np.linspace(1, 9, 9, dtype=int)
x = a.reshape((3, 3))
y = np.eye(3)
y
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
x*y
array([[1., 0., 0.],
[0., 5., 0.],
[0., 0., 9.]])
To perform matrix multiplication, use
@
operator instead of*
Alternately,
np.matmul(x, y)
can be used
x@y
array([[1., 2., 3.],
[4., 5., 6.],
[7., 8., 9.]])
@
operators works withlist
andtuple
objects too in the same way it does withndarray
b = (1, 1, 1)
x@b # b is treated as a column vector
array([ 6, 15, 24])
c = [1, 2, 3]
c@x # c is treated as a row vector
Augmented Assignment#
Augmented assignment modifies the
nadrray
in place
a = np.linspace(1, 9, 9, dtype=int).reshape((3, 3))
a
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Multiply a constant value to a row
a[1] *= 2
a
array([[ 1, 2, 3],
[ 8, 10, 12],
[ 7, 8, 9]])
Subtract an
ndarray
,list
, ortuple
to a column (dimensions must match)
a[:, 2] -= (2, 3, 4)
a
array([[ 1, 2, 1],
[ 8, 10, 9],
[ 7, 8, 5]])
Multiply the entire array by an
ndarray
(dimensions should match)
a += [1, 2, 4]
a
array([[ 2, 4, 5],
[ 9, 12, 13],
[ 8, 10, 9]])
Other operations work the same way
The two operands should have the same data type
For division, the
dtype
ofndarray
should befloat
Broadcasting Rules#
The basic rules of broadcasting:
The trailing dimension of each array is either equal or one of them is 1
The array with trailing dimension 1 will be stretched to match the other array’s dimension
This is repeated for the other dimensions of the array going right to left
Example 1:
|
shape |
Description |
---|---|---|
|
(512, 512, 4) |
A color image with 512x512 pixels and rgba values for each pixel |
|
(4,) |
Scaling values, one for each channel (, in the shape can be ignored) |
|
(512, 512, 4) |
A new image with scaled rgba values |
Example 2:
|
shape |
---|---|
|
(8, 1, 6, 1) |
|
(7, 1, 5) |
|
(8, 7, 6, 5) |
Example 3:
|
shape |
---|---|
|
(2, 1) |
|
(8, 4, 3) |
|
? |
In this case, the dimensions are incompatible and broadcasting is not possible
More info on broadcasting
Numpy Array Methods#
Some ndarray methods
transpose
– transpose of the array (equivalent to the attribute, T)max
,min
,mean
– return the max, min and mean of the array, respectively, along a given axissum
,prod
– return the sum and product, respectively, of all the elements along a given axissort
– sort the array in-placeargsort
– returns an array of indices that would sort the arrayflatten
– return a flattened copy of the array (i.e., collapsed to 1d)
Check ndarray documentation for a full list
a = np.array([[2, 3, 1],
[4, 15, 6],
[7, 8, 9]])
a.sort(0) # Sort along axis 0 (columns)
a
a.sort(1) # Sort along axis 1 (rows)
a
Numpy Functions#
Numpy provides functions/routines for
Array creation
array
,zeros
,ones
,full
,arange
,linspace
, …
Array manipulation
reshape
,delete
,insert
,append
,hstack
,vstack
,split
,transpose
, …
Linear algebra
dot
/inner
,outer
,linalg.inv
,linalg.eig
,linalg.det
, …
Mathematical functions (https://numpy.org/doc/1.21/reference/routines.math.html)
Vectorized – apply the function element-wise
Trigonometric functions –
sin
,cos
,tan
,arcsin
,arccos
,arctan
,arctan2
Hyperbolic functions –
sinh
,cosh
,tanh
,arcsinh
,arccosh
,arctanh
exp
,log
,ceil
,floor
,abs
, …
x = np.arange(0, 2.01*np.pi, 0.2*np.pi)
y = np.sin(x)
a = np.vstack((x, y)).T
a
print(f'{"x":^4}\t{"y":^5}')
print(f'{"-"*4}\t{"-"*5}')
for p in a:
print(f"{p[0]:1.2f}\t{p[1]: 1.2f}")
Vectorization#
Python functions like
math.sin
do not work withndarray
objects
import math
x = np.arange(0, 2, 0.01)
y = [math.sin(i) for i in x] # List comprehension - alternate syntax for a simple for loop
# The above code is equivalent to the commented code below
# y = []
# for i in x:
# y += math.sin(i)
print(y)
%timeit [math.sin(i) for i in x]
np.sin
works readily withndarray
s and is much faster
%timeit np.sin(x)
Use
np.vectorize(func)
for functions that are not numpy awareReturns a vectorized version of the function
Just wraps a
for
loop around numpy unaware function; do not expect a speedupThe ouput datatype is determined by evaluating
func
with the first element
def mySqrt(x):
if x < 0:
# If this is 0, output of the vectorized function will be of type int
return 0.0
else:
return x**0.5
Cannot use
ndarray
as an input
x = np.arange(-1, 2, 0.1)
mySqrt_vectorized = np.vectorize(mySqrt)
y = mySqrt_vectorized(x)
y
array([0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0.31622777, 0.4472136 , 0.54772256, 0.63245553,
0.70710678, 0.77459667, 0.83666003, 0.89442719, 0.9486833 ,
1. , 1.04880885, 1.09544512, 1.14017543, 1.18321596,
1.22474487, 1.26491106, 1.30384048, 1.34164079, 1.37840488])