Basic Data Types#

How to Build the World in Binary#

  • Computers operate in binary – transistors are on (1) or off (0)

  • Decimal (base 10) to binary (base 2):

Decimal   Binary
0         00000000
1         00000001
2         00000010
3         00000011
4         00000100
8         00001000
74        01001010

  • What about negative numbers? Decimals? Letters? Punctuation? Graphics?

  • This is the letter J: 01001010

  • It’s also the number 74

  • We need to inform the computer whether this is meant to be an integer or a letter or … something else
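The point above can be checked directly in Python. This is a minimal sketch using only built-in functions; the variable names are illustrative and not part of the original notes.

```python
bits = "01001010"

as_number = int(bits, 2)    # read the bit pattern as a base-2 integer
as_letter = chr(as_number)  # read the same value as a character code

print(as_number)                 # 74
print(as_letter)                 # J
print(format(ord("J"), "08b"))   # 01001010
```

The bits never change; only the interpretation chosen by the program does.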

Data Types#

  • In statically typed languages, the data type of a variable must be specified explicitly.

  • In C/C++:

    • int a = 12345;

    • float b = 1.2345e+22;

    • char c = '#';

  • Dynamically typed languages like Python detect and assign the data type automatically from the assigned value (you can still specify it manually if desired):

a = 12345
b = 1.2345e+22
c = '#'
d = 1 + 2j

type(d)
complex
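As a hedged illustration of dynamic typing, the sketch below rebinds one name to values of different types and shows two ways of "specifying" a type manually. The names value, ratio and explicit are made up for this example. Note that a type annotation is only a hint and is not enforced by the interpreter, whereas calling float(...) actually converts the value.

```python
value = 12345
print(type(value).__name__)   # int

value = "12345"               # the same name can be rebound to a different type
print(type(value).__name__)   # str

ratio: float = 3              # annotation is a hint only; the object is still an int
print(type(ratio).__name__)   # int

explicit = float(3)           # explicit conversion really produces a float
print(type(explicit).__name__)  # float
```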

Python Data Types#

  • Built-in data types in Python

    • Numeric types

    • Boolean type

    • Sequence types

      • For sequences of numerical/textual data

  • Programmers can also create their own data types through classes

    • A class defines the blueprint/template for creating new instances of objects

    • Objects combine data (i.e., attributes of the object) with methods (i.e., functions that operate on the object’s data)

    • Examples

      • Squares, rectangles and rhombi are instances of the class quadrilaterals

      • Basketball, volleyball, tennis, hockey etc. are instances of the class sports

  • Built-in basic data types in Python:

    • bool – Boolean data

    • int, float, complex – numbers

    • str – sequences of text termed strings

  • In Python, all data is represented by objects

    • Each object has an identity, type, and value

    • This contrasts with languages like C and C++, where basic data types such as bool, int, and float are plain values in memory rather than objects
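A minimal sketch of the two ideas above: a user-defined class that bundles data with a method, and the identity, type, and value that every Python object carries. The Rectangle class and its attribute names are hypothetical and chosen only for illustration.

```python
class Rectangle:
    """Blueprint for rectangle objects: data (width, height) plus a method."""

    def __init__(self, width, height):
        self.width = width      # attribute
        self.height = height    # attribute

    def area(self):             # method operating on the object's data
        return self.width * self.height


shape = Rectangle(3, 4)         # an instance of the class
print(type(shape).__name__, shape.area())   # Rectangle 12

number = 12345
print(id(number))               # identity (an integer that varies between runs)
print(type(number))             # type: <class 'int'>
print(number)                   # value: 12345
print(isinstance(number, object))  # True: even a plain int is an object
```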

Boolean#

  • Boolean type - bool

    • Takes a value of either True or False

y = True
n = False
  • y and n are variable names or identifiers

  • True and False are the values assigned to them

  • y and n are of type bool

type(y)
bool

Integer#

  • Integer type - int

    • Integers of arbitrary size

    • In C/C++, the int data type is typically 4 bytes (= 32 bits) in size

      • There exists a longer integer data type called long, typically 8 bytes in size on 64-bit Linux and macOS (but 4 bytes on 64-bit Windows)

      • And a shorter integer data type called short, typically 2 bytes in size

  • Signed integer representation in memory:

b31 b30 b29 ... ... ... b2 b1 b0
  • With the two's-complement encoding used on essentially all modern hardware, the decimal number that it represents is given by:

\[d = -b_{31} 2^{31} + \sum_{i=0}^{30} b_i 2^i\]
  • Here \(b_i\) are bits of data (0’s or 1’s) (\(i = 0, 1, \cdots, 30\))

  • \(b_{31}\) is the sign bit (0 is non-negative, 1 is negative)

  • Represents integers from \(-2^{31}\) to \(2^{31}-1\) (-2,147,483,648 to 2,147,483,647)

  • Unsigned integers do not use a sign bit:

\[d = \sum_{i=0}^{31} b_i 2^i\]
  • Represents integers from 0 to \(2^{32} - 1\) (0 to 4,294,967,295)
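The two interpretations can be sketched in Python. This assumes two's-complement encoding for the signed case, which is what yields the range \(-2^{31}\) to \(2^{31}-1\) quoted above. The function names decode_unsigned and decode_signed are illustrative.

```python
def decode_unsigned(bits: str) -> int:
    """Sum b_i * 2**i, where the rightmost character is bit 0."""
    return sum(int(b) << i for i, b in enumerate(reversed(bits)))


def decode_signed(bits: str) -> int:
    """Two's complement: the leading bit carries weight -2**(n-1)."""
    n = len(bits)
    return decode_unsigned(bits[1:]) - int(bits[0]) * 2 ** (n - 1)


all_ones = "1" * 32
print(decode_unsigned(all_ones))         # 4294967295
print(decode_signed(all_ones))           # -1
print(decode_signed("1" + "0" * 31))     # -2147483648
print(decode_signed("0" + "1" * 31))     # 2147483647
```

The same 32 bits give very different numbers depending on whether the leading bit is treated as a sign bit.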

Integer Overflow#

  • Consider this C program:

#include <stdio.h>
#include <math.h>

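int main(void)
{
    /* 2^32 does not fit in a 4-byte int; the result of this conversion
       is undefined behavior in C */
    int a = pow(2, 32);
    printf("Value of a is %d\n", a);
    printf("Size of a is %zu bytes\n", sizeof(a));

    /* An 8-byte long can hold 2^32 */
    long b = pow(2, 32);
    printf("Value of b is %ld\n", b);
    printf("Size of b is %zu bytes\n", sizeof(b));

    return 0;
}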
  • The output of this program is (you can run it on cpp.sh, an online C/C++ compiler; the value printed for a is not guaranteed by the C standard and may differ between compilers):

Value of a is 2147483647
Size of a is 4 bytes
Value of b is 4294967296
Size of b is 8 bytes

  • In C, C++ and other related languages, care should be taken to use the correct data type

  • Not doing so will lead to unpredictable (and sometimes disastrous) results

  • Python's int data type can hold integers of arbitrary size, so it does not have an integer overflow problem

x = 1
y = 200000
z = 1_000_000_000
  • _ can be used optionally to separate digits for clarity

# Arbitrary precision of Python int data type
import sys
a = 2**320
print(f"The value of a is {a}")
print(f"The size of a is {sys.getsizeof(a)} bytes")
The value of a is 2135987035920910082395021706169552114602704522356652769947041607822219725780640550022962086936576
The size of a is 68 bytes
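To see what a fixed-width integer would do with values that Python handles without complaint, the sketch below models 32-bit wraparound with a bit mask. The helper wrap_int32 is hypothetical and only imitates the behavior commonly seen on two's-complement hardware; in C itself, signed overflow is undefined.

```python
def wrap_int32(value: int) -> int:
    """Reduce a Python int to what a 32-bit two's-complement int would hold."""
    value &= 0xFFFFFFFF                      # keep only the lowest 32 bits
    return value - 2**32 if value >= 2**31 else value


print(wrap_int32(2**31 - 1))   # 2147483647 (largest 32-bit signed value)
print(wrap_int32(2**31))       # -2147483648 (wraps to the most negative value)
print(wrap_int32(2**32))       # 0

print(2**31 + 1)               # 2147483649 (a Python int simply grows)
```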

Float#

  • Float type - float

    • Real numbers of size 8 bytes (same as the data type double in C/C++)

x = 1.24
y = 1.1e-4 # Scientific notation
z = 0.000_000_0001
  • Decimal point or scientific notation must be used; otherwise, the number is interpreted as an int

Figure: Double-precision floating-point number binary representation (Source: Wikipedia.org)

  • Max float value \(\approx\) 1.7977e+308

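  • Min positive float value \(\approx\) 4.9406e-324 (a subnormal number; the smallest normal float is \(\approx\) 2.2251e-308)

  • Special floating-point values inf and nan

    • inf - the result of any computation whose magnitude exceeds the largest representable float

      • Short for infinity

    • nan - the result of an operation with no well-defined numerical value (e.g., inf * 0 or inf - inf)

      • Short for not a number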

2.e+308
inf
-2.1e+308
-inf
2.e+308 * 0
nan
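The limits and special values above can be inspected with the standard sys and math modules. This is a small sketch; the variable names are illustrative.

```python
import math
import sys

largest = sys.float_info.max           # about 1.7977e+308
smallest_normal = sys.float_info.min   # about 2.2251e-308 (smallest normal float)
print(largest, smallest_normal)

overflowed = largest * 10              # too large to represent: becomes inf
undefined = overflowed - overflowed    # inf - inf has no defined value: nan

print(math.isinf(overflowed))          # True
print(math.isnan(undefined))           # True
print(undefined == undefined)          # False: nan is not equal to anything, even itself
print(5e-324 / 2)                      # 0.0: below the smallest subnormal, underflows to zero
```

Because nan never compares equal to itself, math.isnan is the reliable way to test for it.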

Complex#

  • Complex type – complex

    • Represents complex numbers through two floats

    • Written as the sum of the real part and imaginary part

    • The numeric value of the imaginary part should be immediately followed by the letter “j” which denotes the imaginary unit

      • \(j^2=−1\)

a = 2 + 4j
print(a)
(2+4j)
b = 1.2 + 2.4j
b.real
1.2
b.imag
2.4
  • real and imag are attributes of a complex object that return its real and imaginary parts as floats
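A few further operations on complex numbers, shown as a short sketch using only built-ins (abs and the conjugate method); the variable z is illustrative.

```python
z = 3 + 4j

print(abs(z))               # 5.0, the magnitude sqrt(3**2 + 4**2)
print(z.conjugate())        # (3-4j)
print(z * z.conjugate())    # (25+0j)
print(1j ** 2)              # (-1+0j), confirming j**2 = -1
```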

String#

  • Text type – str

    • Represents a sequence/string of characters

    • Characters include:

      • Alphabet – lowercase and uppercase

      • Digits – 0, 1, 2, ⋯, 9

      • Symbols - !, @, #, $, ⋯

      • Special characters:

Name                      Character
Line feed or new line     \n
Form feed or page break   \f
Carriage return           \r
Tab                       \t
Backspace                 \b
Bell                      \a

c1 = 'a'
c2 = '#'
c3 = '0'
  • C and C++ have a data type called char for single characters

  • In Python, even a single character is stored as a string object
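The sketch below shows how special characters behave inside a string and confirms that a single character is just a string of length one. The variable names are illustrative.

```python
table = "x\ty\n1\t2"      # \t is a tab, \n starts a new line
print(table)              # prints two tab-separated rows
print(repr(table))        # 'x\ty\n1\t2' shows the escapes as written

print(len("\n"))          # 1: an escape sequence is a single character
print(len(r"\n"))         # 2: in a raw string the backslash is kept literally

single = "a"
print(type(single).__name__, len(single))   # str 1
```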

s1 = "Hello World"
s2 = 'Hello Washingtonians'
s3 = '''Hello Cougs'''
s4 = """Hello MME"""
  • Single (' ') or double (" ") quotes must be used to define a string

  • Alternatively, triple single (''' ''') or triple double (""" """) quotes can be used

  • The opening and closing quotes must be of the same kind; single and double quotes cannot be mixed as delimiters

  • With triple single/double quotes, a string can be split into multiple lines

s2 = "She sells seashells \
by the seashore."
print(s2)
She sells seashells by the seashore.
  • The backslash (\) character splits a single line of code into two lines

    • But, it is still interpreted as a single line by the interpreter

  • Here, \ is not a part of the string definition

  • Triple single or double quotes allow splitting and inclusion of whitespace characters like \n

s3 = '''She sells seashells
by the seashore.'''
print(s3)
She sells seashells
by the seashore.
s4 = """She sells seashells
by the seashore."""
print(s4)
She sells seashells
by the seashore.
s3 = '''She sells seashells \
by the seashore.'''
print(s3)
She sells seashells by the seashore.
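A compact sketch of the quoting rules above. While the delimiters must match, the other kind of quote can appear freely inside the string. Using repr makes the stored newline visible. All variable names are illustrative.

```python
contraction = "It's a string"        # single quote inside double quotes
dialogue = 'She said "hello"'        # double quotes inside single quotes

multi = """line one
line two"""                          # the line break is stored as \n

joined = "line one \
line two"                            # backslash continuation: no \n is stored

print(repr(multi))     # 'line one\nline two'
print(repr(joined))    # 'line one line two'
print(contraction, dialogue)
```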

Identifiers#

  • Variables of different data types are typically assigned a name or identifier

  • It is important that variable names be descriptive

# Not descriptive
_ = 20
# Descriptive identifiers
velocity = 20
distance = 400
time = 34
  • Cannot use any of the Python keywords for a variable name

False     await     else      import    pass
None      break     except    in        raise
True      class     finally   is        return
and       continue  for       lambda    try
as        def       from      nonlocal  while
assert    del       global    not       with
async     elif      if        or        yield

  • Interpreter throws an exception or error if an attempt is made to use a keyword for a variable name

def = 20.4
  Cell In[21], line 1
    def = 20.4
        ^
SyntaxError: invalid syntax
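Rather than memorizing the table, the keyword list can be queried at run time with the standard keyword module. A short sketch:

```python
import keyword

print(keyword.iskeyword("def"))        # True
print(keyword.iskeyword("velocity"))   # False

# Full list for the running interpreter (contents depend on the Python version)
print(keyword.kwlist)
```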
  • Rules for identifiers

    • Can be of any length

    • Can combine English alphabet (a to z and A to Z) and digits (0 to 9) with underscore (_)

      • A_1, b2, b_2

      • _b, __b

      • This_is_a_long_overly_descriptive_identifier

    • Identifier cannot begin with a digit (e.g., 1a)

    • Identifier cannot contain spaces

  • Underscore usage conventions

    • A trailing underscore is used with a variable if it conflicts with a keyword

    • A single underscore is a temporary variable

      • Also stores the last evaluated expression when using Python interpreter interactively

    • Identifiers with leading and trailing double underscores (e.g., __init__) and identifiers with a single leading underscore (e.g., _b) are, by convention, used only in the definition of Python class attributes and methods
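The rules above can be checked programmatically. The sketch below combines the built-in str.isidentifier method with keyword.iskeyword; the helper is_usable_name is hypothetical. Note that isidentifier also accepts non-English Unicode letters, which goes slightly beyond the rules as listed here.

```python
import keyword


def is_usable_name(name: str) -> bool:
    """True if name is syntactically valid and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["A_1", "_b", "1a", "b 2", "def", "class_"]:
    print(f"{candidate!r:10} {is_usable_name(candidate)}")
```

Here "1a" fails because it begins with a digit, "b 2" because it contains a space, and "def" because it is a keyword; "class_" shows the trailing-underscore convention for avoiding a keyword clash.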

Naming Convention#

  • A consistent naming scheme for identifiers keeps the source code readable and thus, maintainable

Convention                Example      Convention                Example
Single lowercase letter   b = 241      Single uppercase letter   B = 241
lowercase                 num = 241    UPPERCASE                 NUM = 241

  • More practical conventions

    • Snake case: course_num = 241

      • Words connected by _ (underscore symbol)

    • Camel case: CourseNum = 241

      • Start of each word is capitalized

    • Mixed case: courseNum = 241

      • Same as camel case except the first word is not capitalized

Type Conversion#

  • Python provides built-in functions to convert one data type to another.

    • bool - converts other data types to a Boolean type

    • int - converts to an integer type

    • float - converts to float type

    • str - converts to string type

    • complex - converts to complex type

bool(20)
True
print(bool('d'))       # str -> bool
print(int(-2.4))         # float -> int
print(float(4))          # int -> float
print(complex(3.2))      # float -> complex
print(int('2_000'))      # str -> int
print(float('3.21e+2'))  # str -> float
True
-2
4.0
(3.2+0j)
2000
321.0
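Some conversions behave in ways that are easy to misjudge. The sketch below, using only built-ins, highlights a few: any non-empty string is truthy, int truncates toward zero rather than rounding, and int cannot parse a string containing a decimal point directly.

```python
print(bool(""))             # False: the empty string is falsy
print(bool("False"))        # True: any non-empty string is truthy

print(int(2.9), int(-2.9))  # 2 -2: truncation toward zero, not rounding
print(round(2.9))           # 3

try:
    int("3.7")              # a decimal string is not a valid integer literal
except ValueError as err:
    print("ValueError:", err)

print(int(float("3.7")))    # 3: convert to float first, then truncate
```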

Utility Functions#

  • Can determine type of object using the built-in type function.

  • Can determine the size in memory of object using sys.getsizeof function

    • Not a built-in function, must include import sys in code

    • Returns the memory size of the input object in bytes

import sys

a = 20.
sys.getsizeof(a)
24
sys.getsizeof(10**100)
72
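The two utility functions can be combined to survey the basic data types. This is a sketch only: the exact byte counts depend on the Python version and platform, so no specific outputs are claimed here.

```python
import sys

samples = [True, 1, 10**100, 1.0, 1 + 2j, "", "a"]

for value in samples:
    # type() gives the class of the object, sys.getsizeof() its size in bytes
    print(f"{type(value).__name__:8} {sys.getsizeof(value):4} bytes")
```

Larger integers and longer strings take more memory because their storage grows with their content, whereas every float occupies the same amount.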

Methods#

  • As all Python data types are objects of a specific class, they have built-in methods

  • The dir function can be used to list these methods

  • The methods with leading and trailing double underscores (__) are referred to as magic methods (also called special or dunder methods)

print(dir(complex))
['__abs__', '__add__', '__bool__', '__class__', '__complex__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__mul__', '__ne__', '__neg__', '__new__', '__pos__', '__pow__', '__radd__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__rpow__', '__rsub__', '__rtruediv__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', 'conjugate', 'imag', 'real']
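Magic methods are what operators and built-in functions call behind the scenes. The sketch below illustrates this for complex and filters the dir output down to the ordinary (non-magic) names. The variable names are illustrative, and the exact list of public names may vary with the Python version.

```python
a = 1 + 2j
b = 3 - 1j

print(a + b == a.__add__(b))    # True: the + operator calls __add__
print(abs(a) == a.__abs__())    # True: the abs function calls __abs__

# Keep only names without the leading double underscore
public = [name for name in dir(complex) if not name.startswith("__")]
print(public)
```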

The dir Function and del Statement#

  • When called without an argument, the dir function returns a list of the names (variables, functions, imported modules) defined in the current scope

dir()
  • The output of dir() therefore shows what is currently in the scope of the program

  • The del statement removes names (such as variables or imported modules) from the current scope

a = 20
print(a)
20
del a
print(a)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[73], line 2
      1 del a
----> 2 print(a)

NameError: name 'a' is not defined
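A subtle point is that del removes a name, not necessarily the object it refers to. In the sketch below (variable names are illustrative), a second name keeps the list alive after the first is deleted.

```python
values = [1, 2, 3]
alias = values          # a second name bound to the same list object
del values              # removes the name "values" only

try:
    values              # looking up the deleted name fails
    name_removed = False
except NameError:
    name_removed = True

print(name_removed)     # True
print(alias)            # [1, 2, 3]: the list is still reachable through alias
```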