Numpy#

  • Numpy – short for numerical Python

    • A library that is used extensively used in scientific computing

    • Provides:

      • A multidimensional array object called ndarray (or just array)

      • Fast routines, i.e., functions for sorting, selecting, mathematical and linear algebra operations etc.

  • Differences between numpy arrays and Python sequences (lists and tuples)

    • Numpy arrays are mutable while sequences can be immutable

    • Numpy arrays are homogenous (same element type) while sequences can be heterogenous

    • In general, numpy arrays have a fixed size in memory unlike Python sequences

  • Consider a problem asking you to find the resultant of two vectors

    • \(F_1 = 2i + 3j - 4k\)

    • \(F_2 = 3i - 1j + 4k\)

    • \(R = F_1 + F_2\)

  • You need to add these vectors element-wise

    • Requires loops and indexing

F1 = (2, 3, -4)
F2 = (3, -1, 4)

# Elementwise addition of a and b
c = []
for i in range(len(F1)):
    c.append(F1[i] + F2[i])
    
print(c, '\n')
print(f"R = {c[0]}i + {c[1]}j + {c[2]}k")
[5, 2, 0] 

R = 5i + 2j + 0k
  • The same program using numpy

import numpy as np

F1 = np.array([2, 3, -4])
F2 = np.array((3, -1, 4))
R = F1 + F2

print(R, '\n')
print(f"R = {R[0]}i + {R[1]}j + {R[2]}k")
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[2], line 1
----> 1 import numpy as np
      3 F1 = np.array([2, 3, -4])
      4 F2 = np.array((3, -1, 4))

ModuleNotFoundError: No module named 'numpy'
  • Numpy allows for more direct element-wise operations using the ndarray datatype

  • Numpy arrays are fast and efficient compared to python sequences

    • Vectorization - no explicit looping and indexing

      • These operations happen in the background with pre-compiled C code

    • Broadcasting - all operations are done element-wise, in general

      • This is the method used to carry out element-wise operation

      • Makes up for concise, readable code with less scope for bugs

      • Real math-like expressions and enables vectorization

Numpy Arrays#

  • ndarray – n-dimensional array, the basic numpy data type

    • Can be used to represent vectors (1D arrays), matrices (2D arrays), …

  • Creating numpy arrays

    • Using the np.array function

      • Input is a list, tuple or array-like object

      • Or any object that has the magic method __array__

  • The data type of each element in the ndarray is automatically inferred and stored in the dtype attribute

a1 = np.array([1, 2, 5, 6])    # 1d-array

a2 = np.array([[1.4, 2],
               [2, 3]])        # 2d-array
  • ndarray attributes

    • dtype – the datatype of the stored elements

    • ndim – the dimension of the array

    • shape – a tuple with the size of each dimension (also referred to as an axis)

    • size – the total number of elements in the array

    • itemsize – the size of each element of the array in bytes

a1 = np.array([1, 2, 5, 6])    # 1d-array
a2 = np.array([[1.4, 2],
               [2, 3]])        # 2d-array
a3 = np.array([3, 5, 9j])      # 1d-array with a complex number
print(a1.dtype, a2.dtype, a3.dtype, sep='\t')
print(a1.shape, a2.shape, a3.shape, sep='\t')
int64	float64	complex128
(4,)	(2, 2)	(3,)

Array Creation#

arange#

  • Generating a sequence of numbers

    • np.arange – like the range but can create non-integer sequences

      • Inputs – start, stop, and increment

    • The start and increment values can be optionally discarded

      • Default values of 0 and 1 will be used respectively

a = np.arange(1.2, 2.00005, 0.2)
a
array([1.2, 1.4, 1.6, 1.8, 2. ])
np.arange(2.2) # start = 0, increment = 1 - defaults
array([0., 1., 2.])
  • Example - function evaluation

    • Consider the function \(y = 3x^2\)

    • Evaluate \(y\) uniformly in an interval of \([0, 10]\)

x = np.arange(0, 10.001, 0.2)
y = 3*x**2
y
array([0.0000e+00, 1.2000e-01, 4.8000e-01, 1.0800e+00, 1.9200e+00,
       3.0000e+00, 4.3200e+00, 5.8800e+00, 7.6800e+00, 9.7200e+00,
       1.2000e+01, 1.4520e+01, 1.7280e+01, 2.0280e+01, 2.3520e+01,
       2.7000e+01, 3.0720e+01, 3.4680e+01, 3.8880e+01, 4.3320e+01,
       4.8000e+01, 5.2920e+01, 5.8080e+01, 6.3480e+01, 6.9120e+01,
       7.5000e+01, 8.1120e+01, 8.7480e+01, 9.4080e+01, 1.0092e+02,
       1.0800e+02, 1.1532e+02, 1.2288e+02, 1.3068e+02, 1.3872e+02,
       1.4700e+02, 1.5552e+02, 1.6428e+02, 1.7328e+02, 1.8252e+02,
       1.9200e+02, 2.0172e+02, 2.1168e+02, 2.2188e+02, 2.3232e+02,
       2.4300e+02, 2.5392e+02, 2.6508e+02, 2.7648e+02, 2.8812e+02,
       3.0000e+02])

linspace#

  • Generating a sequence of numbers

    • np.linspace – evenly spaced values

      • Inputs – start, stop, and number

    • The number can be left out; default value of 50 will be used

    • Can also use np.logspace and np.geomspace

np.linspace(0, 10, 5)
array([ 0. ,  2.5,  5. ,  7.5, 10. ])
np.linspace(2.5, -2.5)
array([ 2.5       ,  2.39795918,  2.29591837,  2.19387755,  2.09183673,
        1.98979592,  1.8877551 ,  1.78571429,  1.68367347,  1.58163265,
        1.47959184,  1.37755102,  1.2755102 ,  1.17346939,  1.07142857,
        0.96938776,  0.86734694,  0.76530612,  0.66326531,  0.56122449,
        0.45918367,  0.35714286,  0.25510204,  0.15306122,  0.05102041,
       -0.05102041, -0.15306122, -0.25510204, -0.35714286, -0.45918367,
       -0.56122449, -0.66326531, -0.76530612, -0.86734694, -0.96938776,
       -1.07142857, -1.17346939, -1.2755102 , -1.37755102, -1.47959184,
       -1.58163265, -1.68367347, -1.78571429, -1.8877551 , -1.98979592,
       -2.09183673, -2.19387755, -2.29591837, -2.39795918, -2.5       ])

zeros, ones, full#

  • Generating arrays filled with constant values

    • np.zeros(shape) – creates an ndarray with zeros given the shape

      • shape can either be an int or a tuple or list of integers

    • np.ones(shape) – same as np.zeros but fills the array with ones

    • np.full(shape, fill) – creates an array with the given shape filled with the fill value

np.zeros(4)
array([0., 0., 0., 0.])
np.full([2,3],2)
array([[2, 2, 2],
       [2, 2, 2]])

eye#

  • Identity matrix

    • np.eye(N) – creates an identity matrix with N rows and columns

    • np.eye(N, M) – creates an identity matrix with N rows and M columns

    • np.identity(N) – equivalent to np.eye(N)

np.eye(4)
array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])
np.eye(4, 3) # truncated identity matrix
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 0.]])

diag#

  • Diagonal matrices

    • np.diag(arr) – creates a diagonal matrix given an array-like object (lists, tuples, 1D ndarray etc.)

      • Note: if the input is a 2D ndarray, the output is a 1D ndarray containing the diagonal elements

np.diag((1, 2, 3))
array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

Importing Data#

  • From a text file

    • np.loadtxt(filename, delimiter, skiprows, dtype) – creates an array with data from a file

  • File contents

x, y

0, 0

1, 1

2, 4

3, 9

4, 16

5, 25

a = np.loadtxt('Data.csv', delimiter=',', skiprows = 1, dtype = 'i')
print(a)
[[ 0  0]
 [ 1  1]
 [ 2  4]
 [ 3  9]
 [ 4 16]
 [ 5 25]]

Datatypes#

  • Almost all array creation methods accept an optional parameter dtype to specify the data type

  • The default data type in numpy is float

a_int = np.array((1, 2, 4), dtype=int)
a_flt = np.array((1, 2, 4), dtype=float)
a_cplx = np.array((1, 2, 4), dtype=complex)
a_bool = np.array((1, 2, 4), dtype=bool)
print(a_int, a_flt, a_cplx, a_bool, sep="\n\n", end="\n\n")
print(a_int.dtype, a_flt.dtype, a_cplx.dtype, a_bool.dtype, sep="\t")
[1 2 4]

[1. 2. 4.]

[1.+0.j 2.+0.j 4.+0.j]

[ True  True  True]

int64	float64	complex128	bool
  • Check this webpage for more info on data types

Data Type Conversion#

  • Arrays can be converted from one data type to another using the astype method

mat = np.full((3, 3), 2, dtype = int)
print(mat)
[[2 2 2]
 [2 2 2]
 [2 2 2]]
mat_complex = mat.astype(complex)
print(mat_complex)
[[2.+0.j 2.+0.j 2.+0.j]
 [2.+0.j 2.+0.j 2.+0.j]
 [2.+0.j 2.+0.j 2.+0.j]]

Accessing Elements#

  • Indexing and slicing (introduced with Python sequences) works with numpy arrays too

    • [𝑖 :𝑗 :Δ] – extracts elements beginning from index 𝑖 to 𝑗−1 in increments of Δ

    • If 𝑗 is not specified, it is assumed to be the length of the sequence

    • If 𝑖 is not specified, it is assumed to be 0

    • If Δ is not specified, it is assumed to be 1

arr1d = np.array([1, 4, 9, 16, 25, 36])
arr1d.shape
(6,)
# Basic indexing
arr1d[-2]
25
# Slicing
arr1d[0::2]
array([ 1,  9, 25])
# Slicing
arr1d[::-1]
array([36, 25, 16,  9,  4,  1])
arr1d = np.array([1, 4, 9, 16, 25, 36])
arr1d[[0, 3, 2, 0]]
array([ 1, 16,  9,  1])
j = np.array([[3, 4],
              [1, 2]])
arr1d[j]
array([[16, 25],
       [ 4,  9]])
  • Indexing and slicing of 2D arrays

    • Indices are [row, column]

    • : used to indicates all elements along the axis are to be selected

arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
# Basic indexing
arr2d[1] # row
array([4, 5, 6])
# Basic indexing
arr2d[2, 0] # row, col
7
# Basic indexing
arr2d[(2, 0)] # Same result with a tuple
7
# Slicing
arr2d[0::2]
array([[1, 2, 3],
       [7, 8, 9]])
# Slicing + Indexing
arr2d[:, :]
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
  • Advanced indexing for 2D arrays

arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
inds = np.array([[1, 2], 
                [2, 0]])
arr2d[inds]
array([[[4, 5, 6],
        [7, 8, 9]],

       [[7, 8, 9],
        [1, 2, 3]]])
  • Treats the 2D array as a 1D array and generated a higher dimensional ndarray

arr2d[0, [2, 1]]
array([3, 2])
  • Elements with indices (0, 1) and (0, 2)

arr2d[[0, 0, 2, 2], 
      (0, 2, 0, 2)] # Tuples or lists can be used for specifying indices
array([1, 3, 7, 9])
  • Elements with indices (0, 0), (0, 2), (2, 0), and (2, 2)

rows = np.array([[0, 0], 
                 [2, 2]])
columns = [[0, 2], 
           [0, 2]]
arr2d[rows, columns]
array([[1, 3],
       [7, 9]])
  • Same elements as before are selected but the results is now a 2D array

  • Indexing and slicing for 3D arrays

a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
arr3d = a.reshape((2, 2, 2))
arr3d
array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])
  • The reshape method changes the shape of the ndarray given a tuple provided the number of elements remain the same

  • np.reshape module function does the same thing

arr3d[0, 1, 0]
3
arr3d[(0, 1, 0)] # Same as above, but using a tuple; list will result in advanced slicing
3
arr3d[:, 1, 1]
array([4, 8])
  • Each element is a 2D array

Boolean array indexing#

  • The comparison operator is broadcast to all elements in the array

arr2d = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])
arr2d > 5
array([[False, False, False],
       [False, False,  True],
       [ True,  True,  True]])
arr2d[arr2d > 5]
array([6, 7, 8, 9])
arr2d[arr2d % 2 == 1]
array([1, 3, 5, 7, 9])
  • np.isnan and np.isinf can be used to filter bad data

arr = np.array([[1, np.inf, 3],
                [4, 5, 6],
                [7, 8, np.nan]])
arr = arr[~np.isinf(arr)] # ~ is the bitwise not operator
arr = arr[~np.isnan(arr)]
print(arr)
[1. 3. 4. 5. 6. 7. 8.]

Assignment#

  • Numpy arrays are mutable

    • They are modified just like Python lists

  • Augmented assignment can also be used

a = np.linspace(1, 9, 9, dtype=int)
m = a.reshape((3, 3)) # np.reshape(a, (3, 3))
m
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
  • Assigning a constant value to a row

m[1] = 0
m
array([[1, 2, 3],
       [0, 0, 0],
       [7, 8, 9]])
  • Assigning a constant value to a column

m[:, 1] = 10
m
array([[ 1, 10,  3],
       [ 0, 10,  0],
       [ 7, 10,  9]])
  • Assign an ndarray or list or tuple to a row

    • Dimensions of the LHS and RHS must match

m[2] = [2, 3, 4]
m
array([[ 1, 10,  3],
       [ 0, 10,  0],
       [ 2,  3,  4]])

Arithmetic Operations#

  • Arithmetic operations result in a new instance of ndarray

a = np.linspace(1, 9, 9, dtype=int).reshape((3, 3))
a
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
a + 2
array([[ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
  • Values are broadcast across array

    • In other words, the scale 2 is stretched to match the dimensions of a

  • Add a constant value to the array via broadcasting

a + 2
🠓 Broadcasting
$$ \begin{array}{|c|c|c|} \hline 1 & 2 & 3 \\ \hline 4 & 5 & 6 \\ \hline 7 & 8 & 9 \\ \hline \end{array} \quad + \quad \begin{array}{|c|c|c|} \hline 2 & \color{#969696} 2 & \color{#969696} 2 \\ \hline \color{#969696} 2 & \color{#969696} 2 & \color{#969696} 2 \\ \hline \color{#969696} 2 & \color{#969696} 2 & \color{#969696} 2 \\ \hline \end{array} \quad = \quad \begin{array}{|c|c|c|} \hline 3 & 4 & 5 \\ \hline 6 & 7 & 8 \\ \hline 9 & 10 & 11 \\ \hline \end{array} $$
a ** [1, 2, 3]
array([[  1,   4,  27],
       [  4,  25, 216],
       [  7,  64, 729]])
  • The list, tuple, or ndarray being multiplied will be stretched to match the dimensions of a

  • Multiplying a 2D array with an 1D ndarray, list, or tuple

a * [1, 2, 3]
🠓 Broadcasting
$$ \begin{array}{|c|c|c|} \hline 1 & 2 & 3 \\ \hline 4 & 5 & 6 \\ \hline 7 & 8 & 9 \\ \hline \end{array} \quad + \quad \begin{array}{|c|c|c|} \hline 1 & 2 & 3 \\ \hline \color{#969696} 1 & \color{#969696} 2 & \color{#969696} 3 \\ \hline \color{#969696} 1 & \color{#969696} 2 & \color{#969696} 3 \\ \hline \end{array} \quad = \quad \begin{array}{|c|c|c|} \hline 1 & 4 & 9 \\ \hline 4 & 10 & 18 \\ \hline 7 & 16 & 27 \\ \hline \end{array} $$
  • All the other Python operators like -, /, //, %, <, >, and == work in the same manner

  • Broadcasting works on any two arrays with compatible dimensions

# A 2D array with 4 rows and 1 column
x = np.arange(4, dtype=int).reshape((4, 1))
x
array([[0],
       [1],
       [2],
       [3]])
# A 2D array with 1 row and 3 columns
# Not the same as a 1D array!
y = np.ones(3).reshape((1, 3))
y
array([[1., 1., 1.]])
x + y
array([[1., 1., 1.],
       [2., 2., 2.],
       [3., 3., 3.],
       [4., 4., 4.]])
x + y
🠓 Broadcasting
$$ \begin{array}{|c|c|c|} \hline 0 & \color{#969696} 0 & \color{#969696} 0 \\ \hline 1 & \color{#969696} 1 & \color{#969696} 1 \\ \hline 2 & \color{#969696} 2 & \color{#969696} 2 \\ \hline 3 & \color{#969696} 3 & \color{#969696} 3 \\ \hline \end{array} \quad + \quad \begin{array}{|c|c|c|} \hline 1 & 1 & 1 \\ \hline \color{#969696} 1 & \color{#969696} 1 & \color{#969696} 1 \\ \hline \color{#969696} 1 & \color{#969696} 1 & \color{#969696} 1 \\ \hline \color{#969696} 1 & \color{#969696} 1 & \color{#969696} 1 \\ \hline \end{array} \quad = \quad \begin{array}{|c|c|c|} \hline 1 & 1 & 1 \\ \hline 2 & 2 & 2 \\ \hline 3 & 3 & 3 \\ \hline 4 & 4 & 4 \\ \hline \end{array} $$

Matrix Math#

  • By default, all operations in numpy are element-wise.

a = np.linspace(1, 9, 9, dtype=int)
x = a.reshape((3, 3))
y = np.eye(3)
y
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
x*y
array([[1., 0., 0.],
       [0., 5., 0.],
       [0., 0., 9.]])
  • To perform matrix multiplication, use @ operator instead of *

    • Alternately, np.matmul(x, y) can be used

x@y
array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])
  • @ operators works with list and tuple objects too in the same way it does with ndarray

b = (1, 1, 1)
x@b # b is treated as a column vector
array([ 6, 15, 24])
c = [1, 2, 3]
c@x # c is treated as a row vector

Augmented Assignment#

  • Augmented assignment modifies the nadrray in place

a = np.linspace(1, 9, 9, dtype=int).reshape((3, 3))
a
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
  • Multiply a constant value to a row

a[1] *= 2
a
array([[ 1,  2,  3],
       [ 8, 10, 12],
       [ 7,  8,  9]])
  • Subtract an ndarray, list, or tuple to a column (dimensions must match)

a[:, 2] -= (2, 3, 4)
a
array([[ 1,  2,  1],
       [ 8, 10,  9],
       [ 7,  8,  5]])
  • Multiply the entire array by an ndarray (dimensions should match)

a += [1, 2, 4]
a
array([[ 2,  4,  5],
       [ 9, 12, 13],
       [ 8, 10,  9]])
  • Other operations work the same way

  • The two operands should have the same data type

  • For division, the dtype of ndarray should be float

Broadcasting Rules#

  • The basic rules of broadcasting:

    • The trailing dimension of each array is either equal or one of them is 1

      • The array with trailing dimension 1 will be stretched to match the other array’s dimension

    • This is repeated for the other dimensions of the array going right to left

  • Example 1:

ndarray

shape

Description

A

(512, 512, 4)

A color image with 512x512 pixels and rgba values for each pixel

B

(4,)

Scaling values, one for each channel (, in the shape can be ignored)

A*B

(512, 512, 4)

A new image with scaled rgba values

  • Example 2:

ndarray

shape

A

(8, 1, 6, 1)

B

(7, 1, 5)

A<op>B

(8, 7, 6, 5)

  • Example 3:

ndarray

shape

A

(2, 1)

B

(8, 4, 3)

A<op>B

?

  • In this case, the dimensions are incompatible and broadcasting is not possible

Numpy Array Methods#

  • Some ndarray methods

    • transpose – transpose of the array (equivalent to the attribute, T)

    • max, min, mean – return the max, min and mean of the array, respectively, along a given axis

    • sum, prod – return the sum and product, respectively, of all the elements along a given axis

    • sort – sort the array in-place

    • argsort – returns an array of indices that would sort the array

    • flatten – return a flattened copy of the array (i.e., collapsed to 1d)

a = np.array([[2, 3, 1],
              [4, 15, 6],
              [7, 8, 9]])
a.sort(0) # Sort along axis 0 (columns)
a
a.sort(1) # Sort along axis 1 (rows)
a

Numpy Functions#

  • Numpy provides functions/routines for

    • Array creation

      • array, zeros, ones, full, arange, linspace, …

    • Array manipulation

      • reshape, delete, insert, append, hstack, vstack, split, transpose, …

    • Linear algebra

      • dot/inner, outer, linalg.inv, linalg.eig, linalg.det, …

    • Mathematical functions (https://numpy.org/doc/1.21/reference/routines.math.html)

      • Vectorized – apply the function element-wise

      • Trigonometric functions – sin, cos, tan, arcsin, arccos, arctan, arctan2

      • Hyperbolic functions – sinh, cosh, tanh, arcsinh, arccosh, arctanh

      • exp, log, ceil, floor, abs, …

x = np.arange(0, 2.01*np.pi, 0.2*np.pi)
y = np.sin(x)

a = np.vstack((x, y)).T
a
print(f'{"x":^4}\t{"y":^5}')
print(f'{"-"*4}\t{"-"*5}')
for p in a:
    print(f"{p[0]:1.2f}\t{p[1]: 1.2f}")

Vectorization#

  • Python functions like math.sin do not work with ndarray objects

import math

x = np.arange(0, 2, 0.01)
y = [math.sin(i) for i in x] # List comprehension - alternate syntax for a simple for loop
# The above code is equivalent to the commented code below
# y = []
# for i in x:
#     y += math.sin(i)
print(y)
%timeit [math.sin(i) for i in x]
  • np.sin works readily with ndarrays and is much faster

%timeit np.sin(x)
  • Use np.vectorize(func) for functions that are not numpy aware

    • Returns a vectorized version of the function

    • Just wraps a for loop around numpy unaware function; do not expect a speedup

    • The ouput datatype is determined by evaluating func with the first element

def mySqrt(x):
    if x < 0:
        # If this is 0, output of the vectorized function will be of type int
        return 0.0
    else:
        return x**0.5
  • Cannot use ndarray as an input

x = np.arange(-1, 2, 0.1)
mySqrt_vectorized = np.vectorize(mySqrt)
y = mySqrt_vectorized(x)
y
array([0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.        , 0.        , 0.        , 0.        ,
       0.        , 0.31622777, 0.4472136 , 0.54772256, 0.63245553,
       0.70710678, 0.77459667, 0.83666003, 0.89442719, 0.9486833 ,
       1.        , 1.04880885, 1.09544512, 1.14017543, 1.18321596,
       1.22474487, 1.26491106, 1.30384048, 1.34164079, 1.37840488])

Python to Matlab#