Basics¶
Variables¶
To check if a local variable is defined, we can use the locals function:
if 'my_variable' in locals():
    print('Variable exists')
Conditions and boolean context¶
Comparison operators¶
Python uses the standard set of comparison operators (==, !=, <, >, <=, >=).
They are functionally similar to C++ operators: they can be overloaded, and the semantic meaning of == is equality, not identity (in contrast to Java).
Automatic conversion to bool¶
Unlike in other languages, any expression can be used in a boolean context in Python, as there are rules for converting any type to bool. The following statement is valid, for example:
s = 'hello'
if s:
    print(s)
The code above prints 'hello', as the variable s evaluates to True.
Any object in Python evaluates to True, with the exception of:
- False
- None
- numerically zero values (e.g., 0, 0.0)
- standard library types that are empty (e.g., empty string, list, dict)
The automatic conversion to bool in a boolean context has some counter-intuitive consequences. The following conditions are not equal:
s = 'hello'
if s:          # s evaluates to True
if s == True:  # the result of s == True is False, so this branch is not taken
Checking the type¶
To check the exact type:
if type(<VAR>) is <TYPE>:
# e.g.
if type(o) is str:
To check the type in the polymorphic way, including the subtypes:
if isinstance(<VAR>, <TYPE>):
# e.g.
if isinstance(o, str):
Built-in data types¶
Numbers¶
Python has the following numeric types:
- int - integer
- float - floating point number
The int type is unlimited, i.e., it can represent any integer number. The float type is limited by the machine precision, i.e., it can represent only a finite number of real numbers.
Check if a float number is integer¶
To check whether a float number is an integer, we can use the float.is_integer method:
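A minimal sketch:

```python
# float.is_integer returns True when the float has no fractional part
whole = (3.0).is_integer()       # True
fractional = (3.5).is_integer()  # False
```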
Check if a number is NaN¶
To check whether a number is NaN, we can use the math.isnan function or the numpy.isnan function:
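A short example with the standard library variant; note that comparing NaN with == does not work:

```python
import math

nan_value = float('nan')
is_nan = math.isnan(nan_value)        # True
# NaN is not equal to itself, so == cannot be used to detect it
naive_check = nan_value == nan_value  # False
```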
Rounding¶
To round a number, use the round function.
For rounding up, use the math.ceil function.
For rounding down, use the math.floor function.
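A small comparison of the three functions; note that round uses banker's rounding, so ties go to the nearest even number:

```python
import math

r1 = round(2.4)         # 2
r2 = round(2.5)         # 2 - banker's rounding: the tie goes to the even number
up = math.ceil(2.1)     # 3
down = math.floor(2.9)  # 2
```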
Strings¶
Strings in Python can be enclosed in single or double quotes (equivalent). The triple quotes can be used for multiline strings.
String formatting¶
The string formatting can be done in several ways:
- using the f prefix of a string literal: f'{<VAR>}'
- using the format method: '{}'.format(<VAR>)
Each variable can be formatted; for that, Python has a string formatting mini-language. The format is specified after the : character (e.g., f'{47:4}' sets the width of the number 47 to 4 characters). Most of the format specifiers have default values, so we can omit them (e.g., f'{47:4}' is equivalent to f'{47:4d}').
The following are the most common options:
- the fill character, the alignment (<, >, ^), and the width
- the precision (e.g., .2)
- the type (e.g., d for integer, f for fixed-point, e for scientific, s for string)
To use the characters { and } in the string, we have to escape them using double braces: {{ and }}.
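For example, the doubled braces produce literal braces in the result:

```python
value = 47
s = f'{{width: {value}}}'  # the doubled braces produce literal { and }
```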
String methods¶
- capitalize: capitalize the first letter of the string
- lower: convert the string to lowercase
- upper: convert the string to uppercase
- strip: remove leading and trailing whitespace
- lstrip: remove leading whitespace
- rstrip: remove trailing whitespace
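The methods above in action (they all return a new string, leaving the original unchanged):

```python
s = '  hello World  '
stripped = s.strip()         # 'hello World'
cap = stripped.capitalize()  # 'Hello world' - it also lowercases the rest
upper = stripped.upper()     # 'HELLO WORLD'
lower = stripped.lower()     # 'hello world'
left = s.lstrip()            # 'hello World  '
```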
Enumerations¶
For enumerations, we can use the enum
module. The basic syntax is:
from enum import Enum

class MyEnum(Enum):
    VALUE1 = 1
    VALUE2 = 2
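Members can then be accessed by name or looked up by value; a minimal sketch with a hypothetical Color enum:

```python
from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2

member = Color.RED
name = member.name    # 'RED'
value = member.value  # 1
by_value = Color(2)   # look up a member by its value
```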
Collections and generators¶
Python has several built-in data structures, most notably list, tuple, dict, and set. These are less efficient than comparable structures in other languages, but they are very convenient to use.
Also, there is a special generator type. It does not store the data; it is only a convenient way to access data generated by some function.
Generator¶
Generators are mostly used in iteration; we can iterate over them the same way as lists.
To get the first item of the generator, we can use the next function:
g = (x for x in range(10))
first = next(g) # 0
To create a generator function (a function that returns a generator), we can use the yield keyword. The following function returns a generator that yields the numbers 1, 2, and 3:
def gen():
    yield 1
    yield 2
    yield 3
The length of a generator is not known in advance; to get the length, we have to exhaust the generator first, for example using len(list(<generator>)).
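Note that this consumes the generator, so a second pass yields nothing:

```python
g = (x for x in range(10))
length = len(list(g))  # 10
# the generator is now exhausted; iterating again yields nothing
remaining = list(g)    # []
```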
Tuple¶
Tuples are meant to store a fixed sequence of values. They are immutable.
The tuple literal is a comma-separated list of values in round braces:
t = (1, 2, 3)
Dictionary¶
Dictionaries are initialized using curly braces ({}) and the : operator:
d = {
    'key1': 'value1',
    'key2': 'value2',
    ...
}
Two dictionaries can be merged using the | operator:
d3 = d1 | d2
Set¶
Sets are initialized using curly braces ({}) or the set function:
s = {1, 2, 3}
s = set([1, 2, 3])
To add elements to the set, we use either the add method for a single element or the update method for multiple elements. In both cases, a union of the set and the new elements is computed, i.e., no exception is raised if an element is already in the set.
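Both methods in action:

```python
s = {1, 2, 3}
s.add(3)             # element already present: no error, set unchanged
s.add(4)             # single element
s.update([4, 5, 6])  # multiple elements at once
# s is now {1, 2, 3, 4, 5, 6}
```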
Comprehensions¶
In addition to literals, Python has a convenient way of creating basic data structures: the comprehensions. The basic syntax is:
<struct var> = <op. brace> <member var expr.> for <member var> in <iterable> <cl. brace>
As for literals, we use square braces ([]) for lists, curly braces ({}) for sets, and curly braces with colons for dictionaries. In contrast, we get a generator expression when using round braces (()), not a tuple.
We can also use the if keyword to filter the elements:
a = [it for it in range(10) if it % 2 == 0] # [0, 2, 4, 6, 8]
Sorting¶
For sorting, you can use the sorted function.
Instead of using comparators, Python has a different concept of key functions for custom sorting. The key function is applied to each element before sorting; for any expected object, it should return a value that can be compared.
Complex sorting using tuples¶
If we need to apply some complex sorting, we can use tuples as the key function return value. Tuples have comparison operators defined; the implementation is as follows:
- elements are compared one by one
- on the first non-equal element, the comparison result is returned
This way, we can implement a complex sorting that would normally require several conditions by storing the condition results in the tuple.
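For example, with hypothetical (name, age) records, we can sort by age first and break ties by name:

```python
# hypothetical records: (name, age)
people = [('bob', 30), ('alice', 25), ('carol', 30)]
# primary condition: age ascending; secondary condition: name ascending
result = sorted(people, key=lambda p: (p[1], p[0]))
# [('alice', 25), ('bob', 30), ('carol', 30)]
```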
Slices¶
Many Python data structures support slicing: selecting a subset of elements. The syntax is:
<object>[<start>:<end>:<step>]
The start is inclusive, while the end is exclusive.
The step is optional and defaults to 1. The start is also optional and defaults to 0.
Instead of omitting the start and end, we can use the None keyword:
a = [1, 2, 3, 4, 5]
a[None:3] # [1, 2, 3]
Sometimes, it is not possible to use the slice syntax:
- when we need to use a variable for the step, or
- when the object uses the slice syntax for something else, e.g., for selecting columns in a Pandas dataframe.
In such cases, we can use the slice object:
a[0:10:2]
s = slice(0, 10, 2)
a[s] # equivalent
Here, the parameters can be omitted as well. We can select everything by using slice(None), which is equivalent to slice(None, None, None).
Copying collections¶
If we copy a complex collection (e.g., a list of dictionaries), we typically want to create a deep copy so that the original collection is not modified. We can use the copy
module for that:
import copy
a = [{'a': 1}, {'b': 2}]
b = copy.deepcopy(a)
Date and time¶
The base object for date and time is datetime.
datetime construction¶
The datetime object can be directly constructed from the parts:
from datetime import datetime
d = datetime(2022, 12, 20, 22, 30, 0) # 2022-12-20 22:30:00
The time part can be omitted.
We can load a datetime from a string using the strptime function:
d = datetime.strptime('2022-05-20 18:00', '%Y-%m-%d %H:%M')
For all possible time formats, check the strftime cheatsheet.
Accessing the parts of datetime¶
The datetime object has the following attributes:
- year
- month
- day
- hour
- minute
- second
We can also query the day of the week using the weekday()
method. The day of the week is represented as an integer, where Monday is 0 and Sunday is 6.
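The attributes and the weekday method in a short example:

```python
from datetime import datetime

d = datetime(2022, 12, 20, 22, 30)
parts = (d.year, d.month, d.day, d.hour, d.minute)  # (2022, 12, 20, 22, 30)
dow = d.weekday()  # 1 - 2022-12-20 was a Tuesday
```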
Intervals¶
There is also a dedicated object for time intervals named timedelta. It can be constructed from parts (seconds to days); all parts are optional.
We can obtain a timedelta by subtracting a datetime from another datetime:
d1 = datetime.strptime('2022-05-20 18:00', '%Y-%m-%d %H:%M')
d2 = datetime.strptime('2022-05-20 18:30', '%Y-%m-%d %H:%M')
interval = d2 - d1 # 30 minutes
We can also add or subtract a timedelta object to or from a datetime object:
from datetime import datetime, timedelta

d = datetime.strptime('2022-05-20 18:00', '%Y-%m-%d %H:%M')
interval = timedelta(hours=1)
d2 = d + interval # '2022-05-20 19:00'
Converting to Unix timestamp¶
To convert a datetime object to a Unix timestamp, we can use the timestamp method. It returns the number of seconds since the epoch (1970-01-01 00:00:00 UTC). Note, however, that the timestamp is computed based on the datetime object's timezone, or your local timezone if the datetime object has no timezone information.
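With an explicit UTC timezone, the result is unambiguous:

```python
from datetime import datetime, timezone

# one day after the epoch, with an explicit UTC timezone
d = datetime(1970, 1, 2, tzinfo=timezone.utc)
ts = d.timestamp()  # 86400.0 - exactly one day in seconds
```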
Named tuples¶
Apart from the standard tuple, Python has a named tuple class that can be created using the collections.namedtuple function. In a named tuple, each member has a name and can be accessed using the dot operator:
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
print(p.x) # 1
Functions¶
Argument unpacking¶
If we need to conditionally execute a function with a different set of parameters (assuming the function has optional/default parameters), we can avoid multiple function calls inside the branching tree by using argument unpacking.
Suppose we have a function with three optional parameters: a, b, c.
If we skip only the last n parameters, we can use a list for the parameters and unpack it using *:
def call_me(a=0, b=0, c=0):
    ...

l = ['param A', True]
call_me(*l) # calls the function with a = 'param A' and b = True
If we need to skip some parameters in the middle, we have to use a dict and unpack it using **:
d = {'c': 142}
call_me(**d) # calls the function with c = 142
String formatting¶
To format Python strings, we can use the format method of the string or the equivalent f-string:
a = 'world'
message = "Hello {}".format(a)
message = f"Hello {a}" # equivalent
If we need special formatting for a variable, we can specify it after the :, as in the following example, which pads the number from the left:
uid = 47
message = "Hello user {:0>4d}".format(uid) # 'Hello user 0047'
message = f"Hello user {uid:0>4d}" # equivalent
More formatting options can be found in the Python string formatting cookbook.
Classes¶
Classes in Python are defined using the class
keyword:
class MyClass:
    ...
Unlike in other languages, we only declare the function members; other members are declared in the constructor or even later.
Constructor¶
The constructor is a special function named __init__
. Usually, non-function members are declared in the constructor:
class MyClass:
    def __init__(self, a, b):
        self.a = a
        self.b = b
        self.c = 0
        self.d = None
Check if an object contains a member¶
To check whether an object contains a member, we can use the hasattr
function:
if hasattr(obj, 'member'):
    ...
Constructor overloading¶
Python does not support function overloading, including for the constructor. That is unfortunate, as default arguments are a less powerful mechanism. For other functions, we can substitute overloading with a function with a different name. However, for the constructor, we need a different approach.
The cleanest way is to use a class method as a constructor. Example:
class MyClass:
    def __init__(self, a, b=0):
        self.a = a
        self.b = b
        self.c = 0
        self.d = None

    @classmethod
    def from_a(cls, b):
        return cls(0, b)
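The alternative constructor is then called on the class itself (repeating the class above in condensed form):

```python
class MyClass:
    def __init__(self, a, b=0):
        self.a = a
        self.b = b

    @classmethod
    def from_a(cls, b):
        return cls(0, b)

obj = MyClass.from_a(42)  # a defaults to 0, b is set to 42
```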
Importing¶
In Python, we can import whole modules as:
import <module>
Also, we can import specific functions, classes, or variables from the module:
from <module> import <name>
Note that when importing a variable, we import a reference to its current value. It will therefore become out of sync with the original variable if the original is reassigned, so importing non-constant variables is not recommended.
The module path can be absolute or relative (starting with .). Absolute imports are recommended, as they are more robust and less error-prone.
Resolving absolute module paths¶
If the path is absolute, it is resolved as follows:
- The already imported modules are searched.
- The built-in modules are searched.
- The module is searched in the import path, which is a list of directories stored in the sys.path variable. The sys.path variable typically contains the following directories:
    - the directory of the script that is executed ('' in case of the interactive shell),
    - the directories in the PYTHONPATH environment variable,
    - the standard library directories (e.g., /usr/lib/python3.9), and
    - the site-packages directory.
Resolving relative module paths¶
Relative imports can only be used in packages (directories with an __init__.py file). The relative path may start with:
- .: relative to the current module,
- ..: relative to the parent module.
Imports in tests¶
The tests are located outside the main package, so we cannot use an absolute import starting with the package name. One option is to use relative imports. But a better option is to use absolute imports starting from the project root. We can do that because test suites like pytest add the project root to the sys.path variable.
The project root is typically determined automatically by the test suite, e.g., by searching for the setup.py file. Therefore, if the tests directory is located in the same directory as the setup.py file, we can import as follows:
import tests.common
Exceptions¶
Syntax:
try:
    <code that can raise exception>
except <ERROR TYPE> as <ERROR VAR>:
    <ERROR HANDLING>
finally:
    <code that is executed always>
The except and finally blocks are optional. In other words, we can handle errors without having any cleanup code, and we can have cleanup code without handling errors.
Raising exceptions¶
To raise an exception, we can use the raise keyword:
raise ValueError('message')
Sometimes, we just want to re-raise an exception after some partial exception handling. In such cases, we can use the raise keyword without arguments:
try:
    ...
except:
    ...
    raise
Assertions¶
In Python, assertions are executed by default. They can be disabled by running Python with the -O or -OO flag.
The syntax is:
assert <condition>, <message>
Filesystem¶
There are three commonly used ways to work with the filesystem in Python: plain strings, the os.path module, and the pathlib module.
The following code compares these approaches for path concatenation:
import os
from pathlib import Path

# string path concatenation
a = "C:/workspace"
b = "project/file.txt"
c = f"{a}/{b}"

# os.path concatenation
a = "C:/workspace"
b = "project/file.txt"
c = os.path.join(a, b)

# pathlib concatenation
a = Path("C:/workspace")
b = Path("project/file.txt")
c = a / b
As pathlib is the most modern approach, we will use it in the following examples. Apart from the pathlib documentation, there is also a cheat sheet available on GitHub.
Path editing¶
Computing relative path¶
To prevent mistakes, it is better to compute relative paths between directories than to hard-code them. Fortunately, there are methods we can use for that.
If the desired relative path is a child of the start path, we can simply use the relative_to method of the Path object:
a = Path("C:/workspace")
b = Path("C:/workspace/project/file.txt")
rel = b.relative_to(a) # rel = 'project/file.txt'
However, if we need to go back in the file tree, we need a more sophisticated method from os.path:
a = Path("C:/Users")
b = Path("C:/workspace/project/file.txt")
rel = os.path.relpath(b, a) # rel = '../workspace/project/file.txt'
Get parent directory¶
We can use the parent property of the Path object:
p = Path("C:/workspace/project/file.txt")
parent = p.parent # 'C:\\workspace\\project'
Absolute and canonical path¶
We can use the absolute method of the Path object to get the absolute path. To get the canonical path (with symlinks resolved), we can use the resolve method.
Splitting paths and working with path parts¶
To read the file extension, we can use the suffix property of the Path object. The property returns the extension including the dot.
To change the extension, we can use the with_suffix method:
p = Path("C:/workspace/project/file.txt")
p = p.with_suffix('.csv') # 'C:\\workspace\\project\\file.csv'
To remove the extension, just use the with_suffix method with an empty string.
We can split the path into parts using the parts property:
p = Path("C:/workspace/project/file.txt")
parts = p.parts # ('C:\\', 'workspace', 'project', 'file.txt')
To find the index of a specific part, we can use the index method of the parts tuple:
p = Path("C:/workspace/project/file.txt")
index = p.parts.index('project') # 2
Later, we can use the index to manipulate the path:
p = Path("C:/workspace/project/file.txt")
index = p.parts.index('project') # 2
p = Path(*p.parts[:index]) # 'C:\\workspace'
Changing path separators¶
To change the path separators to forward slashes, we can use the as_posix method:
p = Path(r"C:\workspace\project\file.txt")
p = p.as_posix() # 'C:/workspace/project/file.txt'
Using ~ as the home directory in paths¶
Normally, the ~ character is not recognized as the home directory in Python paths. To expand it, we can use the expanduser method:
p = Path("~/project/file.txt")
p = p.expanduser() # 'C:\\Users\\user\\project\\file.txt'
Working directory¶
- os.getcwd() - get the current working directory
- os.chdir(<path>) - set the current working directory
Iterating over files¶
The pathlib module provides a convenient way to iterate over files in a directory. The particular methods are:
- iterdir - iterate over all files and directories in a directory
- glob - iterate over files in a single directory, using a filter
- rglob - iterate over files in a directory and all its subdirectories, using a filter
The order of the results is not guaranteed. When we need a specific order, we have to store the results in a list and sort it.
Single directory iteration¶
Using pathlib, we can iterate over files using a filter with the glob
method:
p = Path("C:/workspace/project")
for filepath in p.glob('*.txt'): # iterate over all txt files in the project directory
    ...
The old way is to use the os.listdir
method:
p = Path("C:/workspace/project")
for filename in os.listdir(p):
    if filename.endswith('.txt'):
        filepath = p / filename
Recursive iteration¶
Using pathlib, we can iterate over files using a filter with the rglob
method:
p = Path("C:/workspace/project")
for filepath in p.rglob('*.txt'): # iterate over all txt files in the project directory and all its subdirectories
    ...
The old way is to use the os.walk
method:
p = Path("C:/workspace/project")
for root, dirs, files in os.walk(p):
    for filename in files:
        if filename.endswith('.txt'):
            filepath = Path(root) / filename
Iterate only directories/files¶
There is no specific filter for files/directories, but we can use the is_file or is_dir method to filter out the unwanted entries:
p = Path("C:/workspace/project")
for filepath in p.glob('*'):
    if filepath.is_file():
        # do something
Use more complex filters¶
Unfortunately, the glob and rglob methods do not support more complex filters (like regexes). However, we can easily apply a regex filter manually:
import re

p = Path("C:/workspace/project")
for filepath in p.glob('*'):
    if not re.match(r'^config.yaml$', filepath.name):
        # do something
Get the path to the current script¶
Path(__file__).resolve().parent
Checking write permissions for a directory¶
Unfortunately, most of the methods for checking write permissions are not reliable outside Unix systems. The most reliable way is to try to create a file in the directory:
p = Path("C:/workspace/project")
try:
    test_file = p / 'test.txt'
    with open(test_file, 'w') as f:
        pass
    test_file.unlink()
    return True
except PermissionError:
    return False
except:
    raise # re-raise the exception
Other methods like os.access
or using tempfile
module are not reliable on Windows (see e.g.: https://github.com/python/cpython/issues/66305).
Creating directories¶
To create a directory, we can use the mkdir method of the Path object:
p = Path("C:/workspace/project")
p.mkdir()
Important parameters:
- parents: if set to True, the directory will be created even if the parent directories do not exist. Default is False.
- exist_ok: if set to True, no error is raised if the directory already exists. Default is False.
Copying files and directories¶
For copying files and directories, we can use the shutil module. The most used function is copy2, which copies the file together with its metadata:
import shutil
p1 = Path("C:/workspace/project/file.txt")
p2 = Path("C:/workspace/project/file2.txt")
shutil.copy2(p1, p2)
The copy2 function can also copy into a directory:
p1 = Path("C:/workspace/project/file.txt")
p2 = Path("C:/workspace/project2")
shutil.copy2(p1, p2) # the new file will be 'C:/workspace/project2/file.txt'
Other methods and their comparison are described in a SO question.
Deleting files and directories¶
To delete a file, we can use the unlink method of the Path object:
p = Path("C:/workspace/project/file.txt")
p.unlink()
For deleting directories, we can use the rmdir method:
p = Path("C:/workspace/project")
p.rmdir()
However, the rmdir method can delete only empty directories. To delete a directory with its content, we can use the shutil module:
p = Path("C:/workspace/project")
shutil.rmtree(p)
Deleting Windows read-only files (i.e. Access Denied error)¶
On Windows, all the delete methods can fail because many files and directories are read-only. This is not a problem for most applications, but it breaks the Python delete methods. One way to solve this is to handle the error and change the attribute in the handler. Example for shutil:
import os
import stat
import shutil
p = Path("C:/workspace/project")
shutil.rmtree(p, onerror=lambda func, path, _: (os.chmod(path, stat.S_IWRITE), func(path)))
Working with temporary files¶
The tempfile module provides a convenient way to work with temporary files.
To create a temporary file, we can use the NamedTemporaryFile function:
import tempfile

with tempfile.NamedTemporaryFile() as f:
    f.write(<data>)
Unlike normal files, we can both read and write the temporary file using a single file object. However, we must return the file pointer to the beginning of the file. The same works for regular files opened in the 'w+' mode:
with open('file.txt', 'w+') as f:
    f.write('data')
    f.seek(0)
    data = f.read()
I/O¶
For simple file operations, we can use the open function. A simple file read is done as follows:
with open('file.txt', 'r') as f:
    data = f.read()
A simple file write is done as follows:
with open('file.txt', 'w') as f:
    f.write('data')
By default, the open function opens the file in text mode. To open the file in binary mode, we have to use the b flag:
with open('file.txt', 'rb') as f:
    data = f.read()
CSV¶
The csv module provides a Python interface for working with CSV files. The basic usage is:
import csv

with open('file.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        # do something
Reader parameters:
- delimiter - the delimiter character
JSON¶
To read a JSON file:
import json
with open('file.json', 'r') as f:
    data = json.load(f)
To write a JSON file:
import json
data = {'a': 1, 'b': 2}
with open('file.json', 'w') as f:
    json.dump(data, f)
Important parameters:
- indent: the number of spaces used for indentation. Setting it also enables other pretty-printing functionality, like newlines after each element.
Custom serialization¶
The json module can serialize only basic types. If we need to serialize custom objects, we have to provide a custom serialization class. We then supply the class to the cls parameter of the dump function.
The serialization class is usually a subclass of the JSONEncoder class. The class has to implement the default method, which is called for each object that cannot be serialized by the standard serialization methods. Example:
import json

class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, MyCustomClass):
            return obj.to_json()
        return super().default(obj)
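The encoder is then passed via the cls parameter. A complete sketch, with a hypothetical MyCustomClass providing the assumed to_json method:

```python
import json

class MyCustomClass:
    # hypothetical class with a to_json method
    def __init__(self, x):
        self.x = x

    def to_json(self):
        return {'x': self.x}

class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        # called only for objects the standard encoder cannot handle
        if isinstance(obj, MyCustomClass):
            return obj.to_json()
        return super().default(obj)

serialized = json.dumps({'item': MyCustomClass(1)}, cls=MyEncoder)
# '{"item": {"x": 1}}'
```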
HDF5¶
HDF5 is a binary file format for storing large amounts of data. The h5py module provides a Python interface for working with HDF5 files.
An example of reading a dataset from an HDF5 file is on SO.
Command line arguments¶
The sys module provides access to the command line arguments. They are stored in the argv list, with the first element being the name of the script.
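A minimal sketch of reading them (all elements are plain strings):

```python
import sys

script_name = sys.argv[0]  # the path of the executed script
arguments = sys.argv[1:]   # the actual arguments, always as strings
```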
INI files¶
The configparser module provides a Python interface for working with INI files. The basic usage is:
import configparser

config = configparser.ConfigParser()
config.read('file.ini')
value = config['section']['key']
If we do not have sections in the INI file, we have to:
- use the allow_unnamed_section argument of the ConfigParser: config = configparser.ConfigParser(allow_unnamed_section=True)
- use configparser.UNNAMED_SECTION in place of the section name: value = config[configparser.UNNAMED_SECTION]['key']
Logging¶
The logging itself is then done using the logging
module methods:
logging.info("message")
logging.warning("message %s", "with parameter")
A simple logging configuration:
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s',
    handlers=[
        logging.FileHandler("log.txt"),
        logging.StreamHandler()
    ]
)
Note that this configuration can be done only once. Therefore, it should not be done in a library, as it would prevent the user from configuring the logging.
To set the level for a specific logger, we use the setLevel method: logger.setLevel(logging.DEBUG). We can also use a string representation of the level, e.g., logger.setLevel('DEBUG').
To check the level of the logger, we can use the isEnabledFor method:
if logger.isEnabledFor(logging.DEBUG):
    ...
This can be useful for avoiding expensive computations needed just for logging if the logging level is set to a higher level.
Type hints¶
Type hints are a useful way to help the IDE and other tools to understand the code so that they can provide better support (autocompletion, type checking, refactoring, etc.). The type hints are not enforced at runtime, so they do not affect the performance of the code.
We can specify the type of a variable using the : operator:
a: int = 1
Apart from the basic types, we can also use the typing module to specify more complex types:
from typing import List, Dict, Tuple, Set, Optional, Union, Any
a: List[int] = [1, 2, 3]
We can also specify the type of a function argument and return value:
def foo(a: int, b: str) -> List[int]:
    return [a, len(b)]
Type hints in loops¶
The type of the loop variable is usually inferred by the IDE from the type of the iterable. However, this sometimes fails, e.g., for zip objects. In such cases, we need to specify the type of the loop variable. However, we cannot use the : syntax directly in the loop; instead, we have to declare the variable before the loop:
for a: int in ...  # error
a: int
for a in ...  # ok
Circular type hints¶
Unfortunately, Python currently does not support circular type hints directly. However, it should be possible to use circular type hints since Python 3.14.
There are two types of circular type hints:
- we need to refer to the type while defining it. For that, we use the Self type:
from typing import Self

class MyClass:
    def get_me(self) -> Self:
        return self
- two or more types refer to each other. For that, use a string representation of the type:
class ClassA:
    def __init__(self, b: 'ClassB'):
        self.b = b

class ClassB:
    def set_a(self, a: ClassA):
        self.a = a
Common type hints¶
Language types¶
- None
- numeric types: int, float, bool
- str
- collection types: list or List[<type>], tuple or Tuple[<type>, ...], set or Set[<type>], dict or Dict[<key type>, <value type>]
- iterables: Iterable[<type>] - any iterable; Sequence[<type>] - an iterable with random access (the [] operator)
- Any - any type
- Union[<type>, ...] - any of the specified types
- Optional[<type>] - the specified type or None
- Callable[[<arg type>, ...], <return type>] - a function with the specified arguments and return type
Pandas types¶
Pandas does not provide type hints. We can use the types themselves, but this is only partially useful: we can use Series as a hint, but we cannot specify the inner type (e.g., Series[int]). For that, we can use a wrapper library called pandera:
from pandera.typing import Series

a: Series[int]
Official documentation for Pandera data types
Calling external programs¶
To call an external program, we use the subprocess module. Most of the time, we use the run function:
import subprocess

subprocess.run(['ls', '-l'])
Important parameters:
- check: if set to True, the function raises an exception if the return code is not 0. Default is False.
- text (or universal_newlines): if set to True, the function returns the output as a string instead of bytes. Default is None.
- env: a dictionary with the environment variables. Note that the environment inherited from the parent process is not extended but replaced by the provided dictionary. Therefore, if we want to extend the environment, we have to initialize the dictionary with a copy of the parent environment: env = os.environ.copy()
However, the subprocess.run function has some limitations. Notably, it cannot both capture and stream the output. To achieve this, and some other advanced features, we have to use the subprocess.Popen class.
subprocess.Popen
¶
The subprocess.Popen class provides more control over the process. The basic usage is:
p = subprocess.Popen(['ls', '-l']) # start the process
# now we can communicate with the process, stream the output, etc.
p.wait() # wait for the process to finish
# now we can get the return code, continue with the code, etc.
Loading resources¶
Resources can be loaded using the importlib.resources module. This way, we can handle plain files but also resources stored in an archive.
The basic usage is:
import importlib.resources

resource = importlib.resources.files('package').joinpath('file.txt')
# materialize the resource as a real file and pass its path
# to the function expecting a file path
with importlib.resources.as_file(resource) as path:
    my_function(path)
Numpy¶
Data types¶
Documentation of basic data types
Date and time¶
Numpy uses the datetime64 data type for date and time. This type has an internal resolution, which can be anything from nanoseconds to years. The resolution is displayed in the type name, e.g., datetime64[ns], and it is determined:
- automatically from the input data, if dtype is not specified or is specified as datetime64, or
- by the dtype parameter, if specified as datetime64[<resolution>].
Initialization¶
We can create a new array as:
- zero-filled: np.zeros(<shape>, <dtype>)
- ones-filled: np.ones(<shape>, <dtype>)
- empty (uninitialized): np.empty(<shape>, <dtype>)
- filled with a constant: np.full(<shape>, <value>, <dtype>)
Sorting¶
For sorting, we use the sort function.
There is no way to set the sorting order; we have to use a trick instead:
a = np.array([1, 2, 3, 4, 5])
a[::-1].sort() # sort in reverse order
Export to CSV¶
To export a numpy array to CSV, we can use the savetxt function:
np.savetxt('file.csv', a, delimiter=',')
By default, the function saves the values in a float format, even if the values are integers. To save the values as integers, we can use the fmt parameter:
np.savetxt('file.csv', a, delimiter=',', fmt='%i')
Useful array properties¶
- size: the number of array items; unlike len, it counts all items in a multi-dimensional array
- itemsize: the memory (in bytes) needed to store one item of the array
- nbytes: the array size in bytes; should be equal to size * itemsize
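The properties above, for a small float64 array:

```python
import numpy as np

a = np.zeros((2, 3), dtype=np.float64)
n = a.size          # 6 - all items, across both dimensions
first_dim = len(a)  # 2 - len only counts the first dimension
item = a.itemsize   # 8 - bytes per float64 value
total = a.nbytes    # 48 - equals size * itemsize
```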
Useful functions¶
Regular expressions¶
In Python, regex patterns are not compiled by default; therefore, we can use plain strings to store them.
The basic syntax for a regex search is:
import re

result = re.search(<pattern>, <string>)
if result: # the pattern matches
    group = result.group(<group index>)
The 0th group is the whole match, as usual.
To substitute the matched pattern, we can use the sub
function:
pattern = re.compile(r'(\d+)')
result = pattern.sub(r'[\1]', '123') # '[123]'
Sometimes, we need to place digits right after the group reference. In such cases, a plain \<group index> reference would be ambiguous, so we have to use the \g<group index>
notation:
pattern = re.compile(r'(\d+)')
result = pattern.sub(r'\g<1>2025', '123') # '1232025'
Lambda functions¶
Lambda functions in python have the following syntax:
lambda <input parameters>: <return value>
Example:
f = lambda x: x**2
Only a single expression can be used in the lambda function, so we need standard functions for more complex logic (temporary variables, loops, etc.).
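A typical use is a short key function, e.g., for sorting:

```python
words = ['banana', 'fig', 'apple']

# the lambda computes the sort key for each item
words_by_length = sorted(words, key=lambda w: len(w))
print(words_by_length)  # ['fig', 'apple', 'banana']
```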
Decorators¶
Decorators are a special type of function that can be used to modify other functions.
When we write an annotation with the name of a function above another function, the annotated function is decorated. It means that when we call the annotated function, a wrapper function is called instead. The wrapper function is the function returned by the decorater: the function with the same name as the annotation.
If we want to also keep the original function functionality, we have to pass the function to the decorator and call it inside the wrapper function.
In the following example, we create a dummy decorator that keeps the original function functionality: Example:
def decorator(func):
    def wrapper():
        result = func()
        return result
    return wrapper

@decorator
def my_func():
    result = ...  # do something
    return result
Decorator with arguments¶
If the original function has arguments, we have to pass them to the wrapper function. Example:
def decorator(func):
    def wrapper(param_1, param_2):
        result = func(param_1, param_2)
        return result
    return wrapper

@decorator
def my_func(param_1, param_2):
    result = ...  # do something
    return result
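If the decorator should work with any signature, a common general pattern is to forward *args and **kwargs; functools.wraps additionally preserves the metadata of the original function (the add function below is just an illustration):

```python
import functools

def decorator(func):
    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        return result
    return wrapper

@decorator
def add(a, b):
    return a + b
```

Without functools.wraps, add.__name__ would be 'wrapper' instead of 'add'.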
Singletons¶
There are several ways how to implement a singleton in Python. The most common are:
- using a module-level variable
- using the __new__ method, possibly in combination with a base class
- using a metaclass
- using a decorator
Module-level variable¶
The simplest way is to use a module-level variable. Note that if the singleton has to be initialized from the outside, the initialization has to be done in a singleton method, not in the constructor!
class Singleton:
    def __init__(self):
        self.initialized = False

    def init(self, init_param):
        if not self.initialized:
            self.initialized = True
            # do the initialization

singleton = Singleton()

# initialization
singleton.init(init_param)
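__new__ method¶
The __new__ approach from the list above can be sketched as follows (a minimal illustration, without the base-class generalization):

```python
class Singleton:
    _instance = None

    def __new__(cls):
        # create the instance only on the first call
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance
```

Singleton() then always returns the same object.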
Testing with pytest¶
Pytest is a simple testing framework for Python. It uses the assert
statement for testing. The tests are defined in functions with the test_
prefix.
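A minimal example: pytest collects any function with the test_ prefix and reports its failed asserts:

```python
def add(a, b):
    return a + b

def test_add():
    assert add(1, 2) == 3
```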
Fixtures¶
Fixtures are used to set up the environment for more than one test. If defined in the conftest.py
file, they are available for all tests in the project.
Fixtures are defined using the @pytest.fixture
decorator. The fixture can be used in the test function by passing the fixture name as an argument. The fixture has the following structure:
@pytest.fixture
def my_fixture():
    # code for setting up the environment
    yield  # the test is executed here
    # clean up code
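A fixture can also yield a value, which is then passed to the test through the argument with the fixture's name (a small sketch with illustrative names):

```python
import pytest

@pytest.fixture
def numbers():
    data = [1, 2, 3]  # set up
    yield data        # the test runs here, receiving data
    data.clear()      # clean up

def test_sum(numbers):
    assert sum(numbers) == 6
```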
Mocking¶
For mocking, we can use the pytest-mock
package. After installation, we can use the mocker
fixture in any test function.
Capturing output¶
To capture the output, we can use the capsys
fixture:
def test_output(capsys):
    print('hello')
    captured = capsys.readouterr()
    assert captured.out == 'hello\n'
Similarly, we can inspect the standard error output using the captured.err
attribute.
Jupyter¶
Memory¶
Most of the time, when the memory allocated by the notebook is larger than expected, it is caused by some library objects (plots, tables...). Sometimes, however, the cause is forgotten user objects. To list all user objects, from the largest:
import sys

# These are the usual ipython objects, including this one you are creating
ipython_vars = ['In', 'Out', 'exit', 'quit', 'get_ipython', 'ipython_vars']

# Get a sorted list of the objects and their sizes
sorted([(x, sys.getsizeof(globals().get(x))) for x in dir() if not x.startswith('_') and x not in sys.modules and x not in ipython_vars], key=lambda x: x[1], reverse=True)
Reloading modules with autoreload¶
When modules are imported, they are not reloaded unless the kernel is restarted. In Python scripts, this does not matter: we just execute the script again. However, when working with notebooks, it may be inconvenient to restart the kernel and rerun all necessary cells just because of a small change in an imported module. Instead, we can use the autoreload
extension.
First, we have to load the extension:
%load_ext autoreload
Then, we configure the autoreload with %autoreload <mode>. The most common modes are:
- now (default): reload all modules immediately (if not excluded by the %aimport magic)
  - This is useful especially if the automatic reloading does not work as expected.
- 0, off: disable autoreload
- 1, explicit: reload modules that were imported using the %aimport magic every time before executing the Python code
- 2, all: reload all modules (except those excluded by %aimport) every time before executing the Python code
- 3, complete: same as 2, but also add any new objects in the module
Plotting¶
There are several libraries for plotting in Python. The most common are:
matplotlib
plotly
In the table below, we can see a comparison of the most common plotting libraries:
| Functionality | Matplotlib | Plotly |
| --- | --- | --- |
| real 3D plots | no | yes |
| detailed legend styling (padding, round corners...) | yes | no |
Matplotlib¶
Saving figures¶
To save a figure, we can use the savefig
function. The savefig
function has to be called before the show
function, otherwise the figure will be empty.
Docstrings¶
For documenting Python code, we use docstrings, special comments surrounded by three quotation marks: """ docstring """
Unlike in other languages, there are multiple styles for docstring content. The most common are:
- Epytext
"""
@param <param name>: <param description>
@return: <return description>
"""
- Google
"""
Args:
    <param name>: <param description>
Returns:
    <return description>
"""
- Numpy
"""
Parameters
----------
<param name> : <param type>
    <param description>
Returns
-------
<return type>
    <return description>
"""
- reStructuredText
"""
:param <param name>: <param description>
:return: <return description>
"""
Progress bars¶
For displaying progress bars, we can use the tqdm
library. It is very simple to use:
from tqdm import tqdm

for i in tqdm(range(100)):
    ...
Important parameters:
- desc: description of the progress bar
TQDM in Jupyter¶
When using tqdm
in Jupyter, the basic progress bar may not work (it may print other logs repeatedly). In such cases, we can change the import to:
from tqdm.notebook import tqdm
If the code can be called both from Jupyter and from the console, we can use the tqdm.autonotebook module, which selects the appropriate progress bar automatically.
PostgreSQL¶
When working with PostgreSQL databases, we usually use either
- the psycopg2 adapter, or
- the sqlalchemy library.
psycopg2¶
To connect to a database:
con = psycopg2.connect(<connection string>)
After running this code, a new session is created in the database; this session is handled by the con
object.
Operations on the database are then done as follows:
- create a cursor object, which represents a database transaction
cur = con.cursor()
- execute any number of SQL commands
cur.execute(<sql>)
- commit the transaction
con.commit()
SQLAlchemy¶
SQLAlchemy works with engine objects that represent the application's connection to the database. The engine object is created using the create_engine
function:
from sqlalchemy import create_engine
engine = create_engine('postgresql://user:password@localhost:5432/dbname')
A simple SELECT
query can be executed using the following code:
with engine.connect() as connection:
    result = connection.execute("SELECT * FROM table")
    ...
With modifying statements, the situation is more complicated, as SQLAlchemy uses transactions by default. Therefore, we need to commit the transaction. There are two ways how to do that:
- using the commit method of the connection object
with engine.connect() as connection:
    connection.execute("INSERT INTO table VALUES (1, 2, 3)")
    connection.commit()
- creating a new block for the transaction using the begin method of the connection object
with engine.connect() as connection:
    with connection.begin():
        connection.execute("INSERT INTO table VALUES (1, 2, 3)")
  - this option also has a shortcut: the begin method of the engine object
with engine.begin() as connection:
    connection.execute("INSERT INTO table VALUES (1, 2, 3)")
Note that the old execute
method of the engine object is not available anymore in newer versions of SQLAlchemy.
Statements with parameters¶
Sometimes it is desirable to use parameters in the SQL statements:
- it prevents SQL injection in case of user input,
- the provided parameters are automatically escaped, so we don't have to worry about quoting.
The parameters are marked with :
in the SQL statement.
The syntax is:
connection.execute("INSERT INTO table VALUES (:param1, :param2)", param1=1, param2=2)
# or
connection.execute("INSERT INTO table VALUES (:param1, :param2)", {'param1': 1, 'param2': 2})
Executing statements without transaction¶
By default, SQLAlchemy executes SQL statements in a transaction. However, some statements (e.g., CREATE DATABASE
) cannot be executed in a transaction. To execute such statements, we have to use the execution_options
method:
with sqlalchemy_engine.connect() as conn:
    conn.execution_options(isolation_level="AUTOCOMMIT")
    conn.execute("<sql>")
    conn.commit()
Getting the affected rowcount¶
The result object returned by the execute
method has the rowcount
attribute that contains the number of affected rows.
Executing multiple statements at once¶
To execute multiple statements at once, for example when executing a script, it is best to use the execute
method of the psycopg2 connection object. Moreover, to safely handle errors, it is best to catch the exceptions and manually rollback the transaction in case of an error:
conn = psycopg2.connect(<connection string>)
cursor = conn.cursor()
try:
    cursor.execute(<sql>)
    conn.commit()
except Exception as e:
    conn.rollback()
    raise e
finally:
    cursor.close()
    conn.close()
Working with GIS¶
When working with GIS data, we usually replace the pandas
library with its GIS extension called geopandas
.
For more, see the pandas manual.
Geocoding¶
For geocoding, we can use the Geocoder library.
Complex data structures¶
KDTree¶
KDTree can be found in the scipy
library.
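A small sketch of a nearest-neighbor query (assuming scipy is installed):

```python
import numpy as np
from scipy.spatial import KDTree

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
tree = KDTree(points)

# distance to and index of the nearest point
dist, idx = tree.query([0.1, 0.1])
print(idx)  # 0, i.e., the point [0.0, 0.0]
```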
Geometry¶
There are various libraries for working with geometry in Python:
- scipy.spatial: for basic geometry operations
- shapely: for geometric objects and operations on them
- geopandas: for GIS data
Downloading files¶
To download files from the internet, we can use the requests
library. The basic usage is:
import requests

response = requests.get(<url>)
with open(<filename>, 'wb') as f:
    f.write(response.content)