Basic Data Types#
How to Build the World in Binary#
Computers operate in binary – transistors are on (1) or off (0)
Decimal (base 10) to binary (base 2):
Decimal | Binary |
---|---|
0 | 00000000 |
1 | 00000001 |
2 | 00000010 |
3 | 00000011 |
4 | 00000100 |
8 | 00001000 |
74 | 01001010 |
What about negative numbers? Decimals? Letters? Punctuation? Graphics?
This is the letter J:
01001010
It’s also the number 74
We need to inform the computer whether this is meant to be an integer or a letter or … something else
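A short Python sketch of this point: the same eight bits read as the number 74 or as the letter J, depending on the interpretation requested. The variable names are illustrative only.

```python
bits = "01001010"

# Interpret the bit pattern as a base-2 integer
as_integer = int(bits, 2)

# Interpret the same integer as a character code
as_letter = chr(as_integer)

print(as_integer)                 # 74
print(as_letter)                  # J
print(format(ord("J"), "08b"))    # 01001010
```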
Data Types#
In statically typed languages, the data type of a variable is fixed at compile time and is usually declared explicitly.
In C/C++:
int a = 12345;
float b = 1.2345e+22;
char c = '#';
Dynamically typed languages like Python detect the data type automatically from the assigned value (the type can still be stated explicitly if desired):
a = 12345
b = 1.2345e+22
c = '#'
d = 1 + 2j
type(d)
complex
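As a small illustration of dynamic typing, the sketch below rebinds one name to objects of different types and shows two optional ways of being explicit about a type. The names `value`, `count` and `ratio` are made up for this example.

```python
value = 12345
print(type(value).__name__)   # int

value = 1.2345e+22            # the same name now refers to a float object
print(type(value).__name__)   # float

# An annotation documents the intended type; Python does not enforce it at run time
count: int = 3

# Calling the type's constructor is another way to be explicit
ratio = float(3)
print(ratio)                  # 3.0
```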
Python Data Types#
Built-in data types in Python
Numeric types
Boolean type
Sequence types
For sequences of numerical/textual data
…
Programmers can also create their own data types through classes
A class defines the blueprint/template for creating new instances of objects
Objects combine data (i.e., attributes of the object) with methods (i.e., functions that operate on the object’s data)
Examples
Squares, rectangles and rhombi are instances of the class quadrilaterals
Basketball, volleyball, tennis, hockey etc. are instances of the class sports
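The quadrilateral example can be sketched as a small class. `Quadrilateral`, its attributes and its `perimeter` method are hypothetical names chosen for illustration, not part of Python itself.

```python
class Quadrilateral:
    """Hypothetical class: a four-sided shape described by its side lengths."""

    def __init__(self, name, sides):
        self.name = name      # attribute (data)
        self.sides = sides    # attribute (data)

    def perimeter(self):
        # method: operates on the object's own data
        return sum(self.sides)


square = Quadrilateral("square", [2, 2, 2, 2])
rectangle = Quadrilateral("rectangle", [2, 3, 2, 3])

print(square.perimeter())      # 8
print(rectangle.perimeter())   # 10
print(type(square).__name__)   # Quadrilateral
```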
Built-in basic data types in Python:
bool – Boolean data
int, float, complex – numbers
str – sequences of text termed strings
In Python, all data is represented by objects
Each object has an identity, type, and value
This contrasts with languages such as C and C++, where basic data types like bool, int, and float are just data
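The three properties of an object can be inspected with the built-in `id` and `type` functions. A minimal sketch (the number printed by `id` differs from run to run):

```python
x = 12345

print(id(x))      # identity: an integer unique to this object while it exists
print(type(x))    # type: <class 'int'>
print(x)          # value: 12345

y = x             # y is a second name for the same object, not a copy
print(x is y)     # True
```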
Boolean#
Boolean type - bool
Takes a value of either True or False
y = True
n = False
y and n are variable names or identifiers
True and False are the values assigned to them
y and n are of type bool
type(y)
bool
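Boolean values most often arise as the result of a comparison. A brief sketch, with `speed` and `is_fast` as made-up names:

```python
speed = 20
is_fast = speed > 15          # a comparison evaluates to a bool

print(is_fast)                # True
print(type(is_fast).__name__) # bool
print(is_fast and not False)  # True
```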
Integer#
Integer type - int
Integers of arbitrary size
In C/C++, the int data type is typically 4 bytes (= 32 bits) in size
There exists a longer integer data type called long, 8 bytes in size on most 64-bit Unix-like platforms
And a shorter integer data type called short, typically 2 bytes in size
Signed integer representation in memory:
b31 | b30 | b29 | ... | ... | ... | b2 | b1 | b0 |
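The decimal number that it represents is given by (two's complement, the convention consistent with the range quoted below):
\[N = -b_{31} \, 2^{31} + \sum_{i=0}^{30} b_i \, 2^{i}\]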
Here \(b_i\) are bits of data (0’s or 1’s) (\(i = 0, 1, \cdots, 30\))
\(b_{31}\) is the sign bit (0 is positive, 1 is negative)
Represents integers from \(-2^{31}\) to \(2^{31}-1\) (-2,147,483,648 to 2,147,483,647)
Unsigned integers do not use a sign bit; all 32 bits contribute to the value:
\[N = \sum_{i=0}^{31} b_i \, 2^{i}\]
Represents integers from 0 to \(2^{32} - 1\) (0 to 4,294,967,295)
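A minimal sketch of decoding a 32-bit pattern both ways, assuming the two's-complement convention (the one consistent with the range \(-2^{31}\) to \(2^{31}-1\) quoted above). The function names are hypothetical.

```python
def decode_unsigned(bits):
    """Value of a bit string read as an unsigned integer."""
    return sum(int(b) << i for i, b in enumerate(reversed(bits)))


def decode_signed(bits):
    """Value of a bit string read as a two's-complement signed integer."""
    sign_bit_weight = 1 << (len(bits) - 1)
    # The leading bit contributes a negative weight; the rest are read as unsigned
    return decode_unsigned(bits[1:]) - int(bits[0]) * sign_bit_weight


all_ones = "1" * 32
print(decode_unsigned(all_ones))        # 4294967295
print(decode_signed(all_ones))          # -1
print(decode_signed("1" + "0" * 31))    # -2147483648
print(decode_signed("0" + "1" * 31))    # 2147483647
```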
Integer Overflow#
Consider this C program:
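#include <stdio.h>
#include <math.h>
int main()
{
    /* 2^32 does not fit in a 32-bit int, so this conversion overflows */
    int a = pow(2, 32);
    printf("Value of a is %d\n", a);
    printf("Size of a is %zu bytes\n", sizeof(a));
    /* long is assumed to be 8 bytes here, so 2^32 fits */
    long b = pow(2, 32);
    printf("Value of b is %ld\n", b);
    printf("Size of b is %zu bytes\n", sizeof(b));
    return 0;
}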
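The output of this program is (you can run it on cpp.sh, an online C/C++ compiler; the value printed for a may differ between compilers because the out-of-range conversion is undefined behavior):
Value of a is 2147483647
Size of a is 4 bytes
Value of b is 4294967296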
Size of b is 8 bytes
In C, C++ and other related languages, care should be taken to use the correct data type
Not doing so will lead to unpredictable (and sometimes disastrous) results
The explosion of the first Ariane 5 rocket in 1996 was a direct result of an overflow when a 64-bit floating-point value was converted to a 16-bit signed integer
Python allows for an arbitrarily sized int data type and does not have an integer overflow problem
x = 1
y = 200000
z = 1_000_000_000
_ can optionally be used to separate digits for clarity
# Arbitrary precision of Python int data type
import sys
a = 2**320
print(f"The value of a is {a}")
print(f"The size of a is {sys.getsizeof(a)} bytes")
The value of a is 2135987035920910082395021706169552114602704522356652769947041607822219725780640550022962086936576
The size of a is 68 bytes
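For comparison, fixed-width behavior can be imitated in Python with `ctypes.c_int32` from the standard library, which stores a value in a 32-bit signed slot without overflow checking. This is a sketch of wraparound, one common outcome of overflow; it is not a reproduction of the C program above.

```python
import ctypes

largest = 2**31 - 1

print(ctypes.c_int32(largest).value)        # 2147483647
print(ctypes.c_int32(largest + 1).value)    # -2147483648 (wraps around)
print(ctypes.c_int32(2**32).value)          # 0

# The native Python int simply grows to hold the result
print(largest + 1)                          # 2147483648
```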
Float#
Float type - float
Real numbers of size 8 bytes (same as the data type double in C/C++)
x = 1.24
y = 1.1e-4 # Scientific notation
z = 0.000_000_0001
Decimal point or scientific notation must be used; otherwise, the number is interpreted as an int
Double-precision floating-point number binary representation (Source: Wikipedia.org)
Max float value \(\approx\) 1.7977e+308
Min positive float value \(\approx\) 4.9406e-324
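These limits can be queried at run time. The sketch below uses `sys.float_info` and `math.ulp` (the latter assumes Python 3.9 or newer). Note that `sys.float_info.min` is the smallest normalized value, while the smaller figure quoted above is the smallest subnormal value.

```python
import math
import sys

print(sys.float_info.max)   # 1.7976931348623157e+308
print(sys.float_info.min)   # 2.2250738585072014e-308 (smallest normalized value)
print(math.ulp(0.0))        # 5e-324 (smallest positive subnormal value)

# Halving the smallest positive value underflows to zero
print(5e-324 / 2)           # 0.0
```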
Floating-point values inf and nan
inf - any number whose magnitude exceeds the largest value a float can represent
Short for infinity
nan - the result of an operation that has no defined numerical value (e.g., inf * 0)
Short for not a number
2.e+308
inf
-2.1e+308
-inf
2.e+308 * 0
nan
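Both special values can also be created directly and tested with the `math` module. A brief sketch with illustrative variable names:

```python
import math

pos_inf = float("inf")
not_a_number = float("nan")

print(math.isinf(pos_inf))             # True
print(pos_inf > 1.7e308)               # True
print(pos_inf - pos_inf)               # nan
print(math.isnan(not_a_number))        # True
print(not_a_number == not_a_number)    # False: nan is not equal to anything, itself included
```

Because of the last line, `math.isnan` is the reliable way to test for nan; an equality comparison is not.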
Complex#
Complex type –
complex
Represents complex numbers through two floats
Written as the sum of the real part and imaginary part
The numeric value of the imaginary part should be immediately followed by the letter “j” which denotes the imaginary unit
\(j^2=−1\)
a = 2 + 4j
print(a)
(2+4j)
b = 1.2 + 2.4j
b.real
1.2
b.imag
2.4
real and imag are attributes of the complex data type
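A few more operations on complex numbers, shown as a short sketch:

```python
z = 3 + 4j

print(z.real, z.imag)        # 3.0 4.0
print(abs(z))                # 5.0 (the modulus)
print(z.conjugate())         # (3-4j)
print(1j * 1j)               # (-1+0j)
print(complex(3, 4) == z)    # True
```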
String#
Text type –
str
Represents a sequence/string of characters
Characters include:
Alphabet – lowercase and uppercase
Digits – 0, 1, 2, ⋯, 9
Symbols - !, @, #, $, ⋯
Special characters:
Name | Character |
---|---|
Line feed or new line | \n |
Form feed or page break | \f |
Carriage return | \r |
Tab | \t |
Backspace | \b |
Bell | \a |
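A small sketch of how these escape sequences behave inside a string. Each one counts as a single character, and `repr` shows them in escaped form. The raw-string prefix `r` is included only for contrast.

```python
row = "x\ty\nz"

print(row)          # x and y separated by a tab, then z on a new line
print(len(row))     # 5: each escape sequence is one character
print(repr(row))    # 'x\ty\nz'

# A raw string keeps the backslash as an ordinary character
raw = r"x\ty"
print(len(raw))     # 4
```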
c1 = 'a'
c2 = '#'
c3 = '0'
C and C++ have a data type called char for single characters
In Python, even a single character is stored as a string object
s1 = "Hello World"
s2 = 'Hello Washingtonians'
s3 = '''Hello Cougs'''
s4 = """Hello MME"""
Single (' ') or double (" ") quotes must be used to define a string
Alternatively, triple single (''' ''') or triple double (""" """) quotes can be used
The opening and closing quotes must be of the same kind; single and double quotes cannot be mixed
With triple single/double quotes, a string can be split into multiple lines
s2 = "She sells seashells \
by the seashore."
print(s2)
She sells seashells by the seashore.
The backslash (\) character splits a single line of code into two lines
But, it is still interpreted as a single line by the interpreter
Here, \ is not a part of the string definition
Triple single or double quotes allow splitting and inclusion of whitespace characters like \n
s3 = '''She sells seashells
by the seashore.'''
print(s3)
She sells seashells
by the seashore.
s4 = """She sells seashells
by the seashore."""
print(s4)
She sells seashells
by the seashore.
s3 = '''She sells seashells \
by the seashore.'''
print(s3)
She sells seashells by the seashore.
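The difference between the two forms is easiest to see with `repr`, which shows whether a newline character was actually stored in the string. A short sketch with illustrative variable names:

```python
continued = "She sells seashells \
by the seashore."

triple = '''She sells seashells
by the seashore.'''

print(repr(continued))    # 'She sells seashells by the seashore.'
print(repr(triple))       # 'She sells seashells\nby the seashore.'
print("\n" in continued, "\n" in triple)    # False True
```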
Identifiers#
Variables of different data types are typically assigned a name or identifier
It is important that variable names be descriptive
# Not descriptive
_ = 20
# Descriptive identifiers
velocity = 20
distance = 400
time = 34
Cannot use any of the Python keywords for a variable name
False | await | else | import | pass |
None | break | except | in | raise |
True | class | finally | is | return |
and | continue | for | lambda | try |
as | def | from | nonlocal | while |
assert | del | global | not | with |
async | elif | if | or | yield |
The interpreter raises a SyntaxError if an attempt is made to use a keyword as a variable name
def = 20.4
Cell In[21], line 1
def = 20.4
^
SyntaxError: invalid syntax
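Rather than memorizing the table, a name can be checked against the keyword list with the standard-library `keyword` module. A minimal sketch:

```python
import keyword

print(keyword.iskeyword("def"))         # True
print(keyword.iskeyword("velocity"))    # False

# The full list; its length varies with the Python version
print(len(keyword.kwlist))
```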
Rules for identifiers
Can be of any length
Can combine English alphabet (a to z and A to Z) and digits (0 to 9) with underscore (_)
A_1, b2, b_2
_b, __b
This_is_a_long_overly_descriptive_identifier
Identifier cannot begin with a digit (e.g., 1a)
Identifier cannot contain spaces
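These rules can be checked programmatically. The sketch below combines `str.isidentifier`, which tests the spelling rules only, with `keyword.iskeyword`. The candidate names are examples chosen for illustration.

```python
import keyword

candidates = ["A_1", "b2", "_b", "1a", "course num", "def"]

usable = {}
for name in candidates:
    # A usable variable name must be spelled correctly and must not be a keyword
    usable[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(f"{name!r}: {usable[name]}")
```

The first three names pass; `1a` starts with a digit, `course num` contains a space, and `def` is a keyword.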
Underscore usage conventions
A trailing underscore is used with a variable if it conflicts with a keyword
A single underscore is conventionally used as a temporary (throwaway) variable
Also stores the last evaluated expression when using the Python interpreter interactively
Identifiers with a leading underscore, or with leading and trailing double underscores, are by convention used only in the definition of Python class attributes
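The conventions above, gathered into one sketch. `Sensor`, `class_` and `greetings` are hypothetical names used only for illustration.

```python
# Trailing underscore: avoids a clash with the keyword "class"
class_ = "quadrilateral"

# Single underscore: a throwaway loop variable whose value is never used
greetings = ["hello" for _ in range(3)]
print(greetings)    # ['hello', 'hello', 'hello']


class Sensor:
    """Hypothetical class, for illustration only."""

    def __init__(self, reading):    # leading and trailing double underscores
        self._reading = reading     # leading underscore: intended for internal use

    def __repr__(self):
        return f"Sensor({self._reading})"


print(Sensor(20))   # Sensor(20)
```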
Naming Convention#
A consistent naming scheme for identifiers keeps the source code readable and thus, maintainable
A document named PEP 8 – Style Guide for Python Code describes some commonly used conventions
Convention | Example | Convention | Example |
---|---|---|---|
Single lowercase letter | | Single uppercase letter | |
lowercase | | UPPERCASE | |
More practical conventions
Snake case:
course_num = 241
Words connected by _ (underscore symbol)
Camel case:
CourseNum = 241
Start of each word is capitalized
Mixed case:
courseNum = 241
Same as camel case except the first word is not capitalized
Type Conversion#
Python provides built-in functions to convert one data type to another.
bool - converts other data types to a Boolean type
int - converts to an integer type
float - converts to float type
str - converts to string type
complex - converts to complex type
bool(20)
True
print(bool('d')) # str -> bool
print(int(-2.4)) # float -> int
print(float(4)) # int -> float
print(complex(3.2)) # float -> complex
print(int('2_000')) # str -> int
print(float('3.21e+2')) # str -> float
True
-2
4.0
(3.2+0j)
2000
321.0
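A few conversion cases that tend to surprise, shown as a hedged sketch. The exact wording of the error message is left to the interpreter.

```python
print(int(2.9), int(-2.9))                 # 2 -2 (truncates toward zero, does not round)
print(round(2.9))                          # 3
print(bool(0), bool(""), bool("False"))    # False False True (any non-empty string is True)
print(str(3.0) + "1")                      # 3.01 (string concatenation, not addition)

try:
    int("2.5")      # a string with a decimal point is not a valid integer literal
except ValueError as err:
    print(f"ValueError: {err}")

print(int(float("2.5")))                   # 2 (convert to float first, then to int)
```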
Utility Functions#
Can determine the type of an object using the built-in type function.
Can determine the size in memory of an object using the sys.getsizeof function
Not a built-in function, must include import sys in code
Returns the memory size of the input object in bytes
import sys
a = 20.
sys.getsizeof(a)
24
sys.getsizeof(10**100)
72
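The sketch below applies both functions to a handful of objects to show how the size of an int grows with its magnitude. The byte counts depend on the Python implementation and version, so no expected output is listed.

```python
import sys

samples = (0, 1, 2**30, 2**60, 2**320, 1.5, "a", "ab")

sizes = {}
for value in samples:
    sizes[value] = sys.getsizeof(value)
    print(f"{type(value).__name__:<6} {sizes[value]:>4} bytes")
```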
Methods#
As all Python data types are objects of a specific class, they have built-in methods
The dir function can be used to list these methods
The methods with leading and trailing double underscores (__) are referred to as magic methods
print(dir(complex))
['__abs__', '__add__', '__bool__', '__class__', '__complex__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getnewargs__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__mul__', '__ne__', '__neg__', '__new__', '__pos__', '__pow__', '__radd__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__rpow__', '__rsub__', '__rtruediv__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', 'conjugate', 'imag', 'real']
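Magic methods are what operators and built-in functions call behind the scenes. A minimal sketch using the complex type; the list of non-magic names may contain more entries on other Python versions than the one used above.

```python
a = 1 + 2j
b = 3 - 1j

print(a + b)                    # (4+1j)
print(a.__add__(b))             # (4+1j), the call that the + operator makes
print(abs(a) == a.__abs__())    # True

# Filter out the magic methods to see the regular attributes and methods
public_names = [name for name in dir(complex) if not name.startswith("__")]
print(public_names)
```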
The dir and del Functions#
When no input is provided, the dir function returns a list of the names (loaded modules and defined variables) in the current scope
dir()
The output of dir() is the list of names in the scope of the program
The del statement removes names (of objects or modules) from the current scope
a = 20
print(a)
20
del a
print(a)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[73], line 2
1 del a
----> 2 print(a)
NameError: name 'a' is not defined
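The two can be combined to confirm that a name has left the scope, and the error can be caught rather than left to stop the program. A short sketch:

```python
a = 20
print("a" in dir())    # True

del a                  # removes the name a from the current scope
print("a" in dir())    # False

try:
    print(a)
except NameError as err:
    print(f"NameError: {err}")
```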