Basic Data Types

How to Build the World in Binary#

Computers operate in binary – transistors are on (1) or off (0)

Decimal (base 10) to binary (base 2):

Decimal	Binary
0	00000000
1	00000001
2	00000010
3	00000011
4	00000100
8	00001000
74	01001010

What about negative numbers? Decimals? Letters? Punctuation? Graphics?

This is the letter J: 01001010

It’s also the number 74
We need to inform the computer whether this is meant to be an integer or a letter or … something else
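As a minimal sketch of this ambiguity, the Python built-ins int, chr, ord, and format can read the same bit pattern either as a number or as a letter (the variable names are illustrative only):

```python
# The same 8-bit pattern can be read as an integer or as a character
bits = "01001010"

as_int = int(bits, 2)    # interpret the bits as a base-2 integer
as_char = chr(as_int)    # interpret that integer as a character code

print(as_int)                    # 74
print(as_char)                   # J
print(format(ord("J"), "08b"))   # 01001010
```

The bits never change; only the interpretation applied to them does.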

Data Types#

In statically typed languages, the data type of each variable must be explicitly specified.
In C/C++:
- int a = 12345;
- float b = 1.2345e+22;
- char c = '#';

Dynamically typed languages like Python detect the data type automatically from the value being assigned (you can still manually specify the type if desired):

a = 12345
b = 1.2345e+22
c = '#'
d = 1 + 2j

type(d)

complex

Python Data Types#

Built-in data types in Python
- Numeric types
- Boolean type
- Sequence types
  - For sequences of numerical/textual data
- …

Programmers can also create their own data types through classes
- A class defines the blueprint/template for creating new instances of objects
- Objects combine data (i.e., attributes of the object) with methods (i.e., functions that operate on the object’s data)
- Examples
  - Squares, rectangles and rhombi are instances of the class quadrilaterals
  - Basketball, volleyball, tennis, hockey etc. are instances of the class sports
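A minimal sketch of a user-defined data type follows. The class name Rectangle, its attributes, and its method are hypothetical and chosen only to illustrate the blueprint/instance idea described above:

```python
class Rectangle:
    """Hypothetical class used only for illustration."""

    def __init__(self, width, height):
        # Attributes: the data stored in each object
        self.width = width
        self.height = height

    def area(self):
        # Method: a function that operates on the object's data
        return self.width * self.height


r = Rectangle(3, 4)        # r is an instance (object) of the class Rectangle
print(type(r).__name__)    # Rectangle
print(r.area())            # 12
```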

Built-in basic data types in Python:
- bool – Boolean data
- int, float, complex – numbers
- str – sequences of text termed strings

In Python, all data is represented by objects
- Each object has an identity, type, and value
- This contrasts with languages such as C and C++, where basic data types like bool, int, and float are just data
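The three properties of an object can be inspected with the built-ins id and type. A short sketch (the exact number printed by id differs from run to run):

```python
x = 12345

print(id(x))      # identity: an integer unique to this object while it exists
print(type(x))    # type: <class 'int'>
print(x)          # value: 12345

# Even a plain integer is an object, so it carries methods
print(x.bit_length())          # 14
print(isinstance(x, object))   # True
```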

Boolean#

Boolean type - bool
- Takes a value of either True or False

y = True
n = False

y and n are variable names or identifiers
True and False are the values assigned to them
y and n are of type bool

type(y)

bool

Integer#

Integer type - int
- Integers of arbitrary size
- In C/C++, the int data type is typically 4 bytes (= 32 bits) in size
  - There exists a longer integer data type called long, typically 8 bytes in size on 64-bit Linux and macOS (but 4 bytes on Windows)
  - And a shorter integer data type called short, typically 2 bytes in size

Signed integer representation in memory:

b₃₁  b₃₀  b₂₉  ...  b₂  b₁  b₀

The decimal number that it represents (in the two's-complement encoding used by essentially all modern hardware) is given by:

\[d = -b_{31} 2^{31} + \sum_{i=0}^{30} b_i 2^i\]

Here $b_i$ ($i = 0, 1, \cdots, 30$) are bits of data (0’s or 1’s)
$b_{31}$ is the sign bit (0 is non-negative, 1 is negative)
Represents integers from $-2^{31}$ to $2^{31}-1$ (-2,147,483,648 to 2,147,483,647)

Unsigned integers do not use a sign bit:

\[d = \sum_{i=0}^{31} b_i 2^i\]

Represents integers from 0 to $2^{32} - 1$ (0 to 4,294,967,295)
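The difference between the two interpretations can be sketched in Python with int.from_bytes, which reads the same four bytes either as an unsigned or as a signed integer. This assumes the two's-complement encoding that int.from_bytes uses for signed=True; the variable names are illustrative:

```python
# Four bytes with every bit set to 1
raw = bytes([0xFF, 0xFF, 0xFF, 0xFF])

unsigned_value = int.from_bytes(raw, byteorder="big", signed=False)
signed_value = int.from_bytes(raw, byteorder="big", signed=True)

print(unsigned_value)   # 4294967295, i.e. 2**32 - 1
print(signed_value)     # -1 under two's complement

# Limits of the 32-bit signed range
print(-2**31, 2**31 - 1)   # -2147483648 2147483647
```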

Integer Overflow#

Consider this C program:

#include <stdio.h>
#include <math.h>

int main()
{
    /* 2^32 does not fit in a 4-byte int */
    int a = pow(2, 32);
    printf("Value of a is %d\n", a);
    printf("Size of a is %zu bytes\n", sizeof(a));

    /* 2^32 fits in an 8-byte long */
    long b = pow(2, 32);
    printf("Value of b is %ld\n", b);
    printf("Size of b is %zu bytes\n", sizeof(b));

    return 0;
}

A typical output of this program on a 64-bit Linux system is (you can run it on cpp.sh, an online C/C++ compiler):

Value of a is 2147483647
Size of a is 4 bytes
Value of b is 4294967296
Size of b is 8 bytes

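In C, C++ and other related languages, care should be taken to use the correct data type
Not doing so will lead to unpredictable (and sometimes disastrous) results
- Converting an out-of-range value to int, as done for a above, is undefined behavior in C, so the value printed can differ between compilers
- The loss of the Ariane 5 rocket on its maiden flight in 1996 was a direct result of an overflow when a 64-bit floating-point value was converted to a 16-bit signed integer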

Python allows for an arbitrary sized int data type and does not have an integer overflow problem

x = 1
y = 200000
z = 1_000_000_000

_ can be used optionally to separate digits for clarity

# Arbitrary precision of Python int data type
import sys
a = 2**320
print(f"The value of a is {a}")
print(f"The size of a is {sys.getsizeof(a)} bytes")

The value of a is 2135987035920910082395021706169552114602704522356652769947041607822219725780640550022962086936576
The size of a is 68 bytes
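To contrast this with a fixed-width integer, the sketch below uses ctypes.c_int32 from the standard library, which keeps only the low 32 bits and performs no overflow checking. It illustrates bit-level wraparound and is not a reproduction of the C program above, which converts from a floating-point value:

```python
import ctypes

big = 2**31   # one more than the largest 32-bit signed value

# Python int simply grows to hold the result
print(big)                          # 2147483648

# A 32-bit signed integer wraps around instead
print(ctypes.c_int32(big).value)    # -2147483648

wrapped = ctypes.c_int32(2**32).value
print(wrapped)                      # 0
```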

Float#

Float type - float
- Real numbers of size 8 bytes (same as the data type double in C/C++)

x = 1.24
y = 1.1e-4 # Scientific notation
z = 0.000_000_0001

Decimal point or scientific notation must be used; otherwise, the number is interpreted as an int

Figure: Double-precision floating-point number binary representation (Source: Wikipedia.org)

Max float value $\approx$ 1.7977e+308
Min positive float value $\approx$ 4.9406e-324
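These limits can be checked with sys.float_info. Note that sys.float_info.min reports the smallest normalized float, while the smaller value quoted above is the smallest subnormal float. A short sketch:

```python
import sys

print(sys.float_info.max)   # 1.7976931348623157e+308
print(sys.float_info.min)   # 2.2250738585072014e-308 (smallest normalized value)
print(5e-324)               # 5e-324 (smallest subnormal value)
print(5e-324 / 2)           # 0.0 (underflows to zero)
```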

Floating point numbers inf and nan
- inf - any number whose magnitude exceeds the largest value a float can represent
  - Short for infinity
- nan - the result of an operation that is mathematically undefined (e.g., inf * 0)
  - Short for not a number

2.e+308

inf

-2.1e+308

-inf

2.e+308 * 0

nan
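The math module provides functions for detecting these special values. A brief sketch (the variable names are illustrative):

```python
import math

pos_inf = float("inf")
not_a_number = pos_inf - pos_inf   # undefined operation, gives nan

print(math.isinf(pos_inf))            # True
print(math.isnan(not_a_number))       # True
print(not_a_number == not_a_number)   # False: nan is not equal to anything, itself included
print(1e308 * 10 == pos_inf)          # True: the product overflows to inf
```

Because nan never compares equal to itself, math.isnan is the reliable way to test for it.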

Complex#

Complex type – complex
- Represents complex numbers through two floats
- Written as the sum of the real part and imaginary part
- The numeric value of the imaginary part should be immediately followed by the letter “j” which denotes the imaginary unit
  - $j^2=-1$

a = 2 + 4j
print(a)

(2+4j)

b = 1.2 + 2.4j
b.real

1.2

b.imag

2.4

real and imag are attributes that hold the real and imaginary parts of a complex number as floats (int and float objects also provide them)
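A few further operations on complex numbers, sketched with the built-in abs and the conjugate method:

```python
z = 3 + 4j

print(abs(z))              # 5.0, the magnitude sqrt(3**2 + 4**2)
print(z.conjugate())       # (3-4j)
print(z * z.conjugate())   # (25+0j)
print(1j * 1j)             # (-1+0j), confirming j**2 = -1
```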

String#

Text type – str
- Represents a sequence/string of characters
- Characters include:
  - Alphabet – lowercase and uppercase
  - Digits – 0, 1, 2, ⋯, 9
  - Symbols - !, @, #, $, ⋯
  - Special characters:

Name	Character
Line feed or new line	\n
Form feed or page break	\f
Carriage return	\r
Tab	\t
Backspace	\b
Bell	\a
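As a small illustration of the special characters in the table, the sketch below embeds a tab and a line feed in a string (the variable name text is arbitrary):

```python
text = "x\ty\nz"

print(text)         # x and y separated by a tab, then z on a new line
print(repr(text))   # 'x\ty\nz' shows the escape sequences themselves
print(len(text))    # 5: each escape sequence counts as one character
print(ord("\n"))    # 10, the character code of line feed
```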

c1 = 'a'
c2 = '#'
c3 = '0'

C and C++ have a data type called char for single characters
In Python, even a single character is stored as a string object

s1 = "Hello World"
s2 = 'Hello Washingtonians'
s3 = '''Hello Cougs'''
s4 = """Hello MME"""

Single (' ') or double (" ") quotes must be used to define a string
Alternatively, triple single (''' ''') or triple double (""" """) quotes can be used
The opening and closing quotes must be of the same kind; single and double quotes cannot be mixed
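Having two quote styles is convenient when the text itself contains a quote character. A short sketch (variable names are illustrative):

```python
s_a = "It's a Python string"             # single quote inside double quotes
s_b = 'She said "hello"'                 # double quotes inside single quotes
s_c = 'It\'s escaped with a backslash'   # same quote style, escaped with \

print(s_a)
print(s_b)
print(s_c)
print('abc' == "abc")   # True: the quote style does not change the value
```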

With triple single/double quotes, a string can be split into multiple lines

s2 = "She sells seashells \
by the seashore."
print(s2)

She sells seashells by the seashore.

The backslash (\) character splits a single line of code into two lines
- However, the interpreter still treats it as a single line
Here, the \ and the line break that follows it do not become part of the string

Triple single or double quotes allow splitting and inclusion of whitespace characters like \n

s3 = '''She sells seashells
by the seashore.'''
print(s3)

She sells seashells
by the seashore.

s4 = """She sells seashells
by the seashore."""
print(s4)

She sells seashells
by the seashore.

s3 = '''She sells seashells \
by the seashore.'''
print(s3)

She sells seashells by the seashore.

Identifiers#

Variables of different data types are typically assigned a name or identifier
It is important that variable names be descriptive

# Not descriptive
_ = 20

# Descriptive identifiers
velocity = 20
distance = 400
time = 34

Cannot use any of the Python keywords for a variable name


False	await	else	import	pass
None	break	except	in	raise
True	class	finally	is	return
and	continue	for	lambda	try
as	def	from	nonlocal	while
assert	del	global	not	with
async	elif	if	or	yield

Interpreter throws an exception or error if an attempt is made to use a keyword for a variable name

def = 20.4

  Cell In[21], line 1
    def = 20.4
        ^
SyntaxError: invalid syntax
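The standard-library keyword module can be used to check a name before using it. A minimal sketch:

```python
import keyword

print(keyword.iskeyword("def"))        # True
print(keyword.iskeyword("velocity"))   # False
print(len(keyword.kwlist))             # number of keywords in this Python version
```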

Rules for identifiers
- Can be of any length
- Can combine English alphabet (a to z and A to Z) and digits (0 to 9) with underscore (_)
  - A_1, b2, b_2
  - _b, __b
  - This_is_a_long_overly_descriptive_identifier
- Identifier cannot begin with a digit (e.g., 1a)
- Identifier cannot contain spaces

Underscore usage conventions
- A trailing underscore is used with a variable if it conflicts with a keyword
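- A single underscore is conventionally used as the name of a temporary (throwaway) variable
  - Also stores the last evaluated expression when using the Python interpreter interactively
- Identifiers with leading and trailing double underscores (e.g., __init__) are reserved for special methods and attributes defined by Python
- Identifiers with a single leading underscore (e.g., _b) are, by convention, used for class attributes meant for internal use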
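The conventions above might look as follows in practice. The class Sensor and all variable names are hypothetical and serve only as illustration:

```python
class_ = "MME"        # trailing underscore avoids a clash with the keyword class

total = 0
for _ in range(3):    # the loop variable is never used, so it is named _
    total += 10
print(total)          # 30


class Sensor:
    def __init__(self, reading):   # double underscores mark a special method
        self._reading = reading    # leading underscore marks an internal attribute


s = Sensor(4.2)
print(s._reading)     # 4.2 (still accessible; the underscore is only a convention)
```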

Naming Convention#

A consistent naming scheme for identifiers keeps the source code readable and thus, maintainable

A document named PEP 8 – Style Guide for Python Code describes some commonly used conventions

Convention	Example	Convention	Example
Single lowercase letter	`b = 241`	Single uppercase letter	`B = 241`
lowercase	`num = 241`	UPPERCASE	`NUM = 241`

More practical conventions
- Snake case: course_num = 241
  - Words connected by _ (underscore symbol)
- Camel case: CourseNum = 241
  - Start of each word is capitalized
- Mixed case: courseNum = 241
  - Same as camel case except the first word is not capitalized

Type Conversion#

Python provides built-in functions to convert one data type to another.
- bool - converts other data types to a Boolean type
- int - converts to an integer type
- float - converts to float type
- str - converts to string type
- complex - converts to complex type

bool(20)

True

print(bool('d'))         # str -> bool
print(int(-2.4))         # float -> int
print(float(4))          # int -> float
print(complex(3.2))      # float -> complex
print(int('2_000'))      # str -> int
print(float('3.21e+2'))  # str -> float

True
-2
4.0
(3.2+0j)
2000
321.0
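Some conversions behave in ways that are easy to overlook. The sketch below shows a few such cases; the variable name converted is illustrative:

```python
print(bool(""))        # False: the empty string is falsy
print(bool("False"))   # True: any non-empty string is truthy
print(int(2.9))        # 2: truncation toward zero, not rounding
print(round(2.9))      # 3

try:
    int("2.4")         # a string with a decimal point is not a valid int literal
except ValueError as err:
    print(f"ValueError: {err}")

converted = int(float("2.4"))   # convert in two steps instead
print(converted)                # 2
```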

Utility Functions#

Can determine type of object using the built-in type function.
Can determine the size in memory of object using sys.getsizeof function
- Not a built-in function, must include import sys in code
- Returns the memory size of the input object in bytes

import sys

a = 20.
sys.getsizeof(a)

sys.getsizeof(10**100)
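The exact byte counts returned by sys.getsizeof depend on the Python implementation and version, so the sketch below compares sizes rather than quoting them. It assumes the standard CPython interpreter:

```python
import sys

small = 1
large = 10**100

print(sys.getsizeof(small) < sys.getsizeof(large))   # True: bigger ints need more bytes
print(sys.getsizeof(1.0) == sys.getsizeof(1e300))    # True: floats have a fixed size
```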

Methods#

As all Python data types are objects of a specific class, they have built-in methods
The dir function can be used to list these methods
The methods with leading and trailing double underscores (__) are referred to as magic methods (also called special or dunder methods)

print(dir(complex))

['__abs__', '__add__', '__bool__', '__class__', '__complex__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__mul__', '__ne__', '__neg__', '__new__', '__pos__', '__pow__', '__radd__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__rpow__', '__rsub__', '__rtruediv__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', 'conjugate', 'imag', 'real']
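The long listing can be filtered to show only the names without double underscores. The sketch also shows that an operator such as + is carried out by the corresponding magic method (variable names are illustrative):

```python
public_names = [name for name in dir(complex) if not name.startswith("__")]
print(public_names)            # ['conjugate', 'imag', 'real']

z = 2 + 4j
print(z.__add__(1 + 1j))       # (3+5j), the call made behind z + (1 + 1j)
print(z.__abs__() == abs(z))   # True
```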

The dir Function and the del Statement#

When no input is provided, the dir function returns a list of the names (loaded modules, defined variables, functions, etc.) in the current scope

dir()

The output of dir() lists the names in the current scope of the program

The del statement removes names (of objects or modules) from the current scope

a = 20
print(a)

del a
print(a)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[73], line 2
      1 del a
----> 2 print(a)

NameError: name 'a' is not defined