Sean's Blog

Python

Just some notes on how to use Python effectively. Python strives to be simple and concise and is a good choice for rapid software development. Its execution speed is rather slow, which is fine because it interoperates nicely with C, C++ and Rust, which are fast. Another major selling point is the rich ecosystem of useful packages, spanning web development, data science and much more.

General Tips

  • Keep functions pure (no side effects) if possible
  • Use early returns in functions to avoid deep nesting
  • Use type hints and a type checker (ty) with pre-commit hooks
  • Use ruff - a fast linter / formatter
  • Use pathlib module for dealing with file system paths
  • No magic numbers (use expressive variable names e.g. waiting_time_ms)
  • Use f-strings for formatting strings
  • Validate variable types from external (untrustworthy) inputs, e.g. user input, web requests
    • try attrs and cattrs instead of pydantic
  • Use caching for heavy computations (see the sketch after this list)
  • Use pytest for unit testing
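
For the caching tip, the standard library's functools already covers the common in-process case. A minimal sketch; fib is just a stand-in for any expensive pure function:

from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache; pass e.g. maxsize=128 to cap memory use
def fib(n: int) -> int:
    # Stand-in for an expensive pure computation
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(100))  # instant, since every subproblem is computed only once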

Package Management

Use uv (preferred and popular) or pixi (can install conda packages - useful for GPU/CUDA stuff) for fast and sane package management.

Web Development

  • Use FastAPI to create clean and simple REST APIs supporting both synchronous and asynchronous routes (minimal sketch below)
  • Use httpx for network requests (also supports sync and async)
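
A minimal sketch of a FastAPI app with one synchronous and one asynchronous route (the route paths are made up for illustration):

from fastapi import FastAPI

app = FastAPI()

@app.get("/sync")
def sync_route():
    # Sync routes are executed in a worker thread pool
    return {"message": "hello from sync"}

@app.get("/async")
async def async_route():
    # Async routes run directly on the event loop
    return {"message": "hello from async"}

# Serve with e.g.: uvicorn main:app --reload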

CLI

  • For creating CLI: cyclopts (see the sketch after this list)
  • For formatting console output: rich
  • Progress bar: tqdm
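
A minimal cyclopts sketch (the greet command and its flags are made up for illustration) - the CLI is derived from plain type hints:

from cyclopts import App

app = App()

@app.command
def greet(name: str, *, shout: bool = False):
    """Greet someone by name."""
    message = f"Hello, {name}!"
    print(message.upper() if shout else message)

if __name__ == "__main__":
    app()  # parses sys.argv, e.g.: python cli.py greet Sean --shout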

Interactive Development

Great for prototyping, one-off analysis scripts and literate programming.

Jupyter Notebook and Marimo.

Desktop Apps

PyTauri / PyLoid / PyWebView

Concurrency

Python has a Global Interpreter Lock (GIL), so only one thread executes Python bytecode at a time. This means only multi-processing enables real parallel execution of non-IO code; multi-threading and async merely allow for concurrent IO operations (network / file system reads and writes).

Python 3.13 added experimental support for a free-threaded (no-GIL) build, enabling true multi-threaded parallelism, which may become the default in the future.

Multi-Processing

Use joblib for sane multi-processing. Note that multi-processing should only be used to parallelize very CPU-heavy tasks, since the overhead of starting processes is very high (always benchmark).

from math import sqrt
from joblib import Parallel, delayed

# Runs in 4 processes in parallel, preserves input order
results = Parallel(n_jobs=4)(delayed(sqrt)(i ** 2) for i in range(16))
print(results)

Async

I do not like async syntax-wise (just personal preference) - I prefer multi-threading to speed up heavy IO tasks (relevant HN thread); a thread-based sketch follows the async example below.

import asyncio
import httpx

async def fetch_multiple_urls(urls):
    async with httpx.AsyncClient() as client:
        tasks = [client.get(url) for url in urls]
        responses = await asyncio.gather(*tasks)  # run all requests concurrently
        return responses

responses = asyncio.run(fetch_multiple_urls(["https://example.com"]))
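
For comparison, the thread-based equivalent of the snippet above - a minimal sketch using the standard library's ThreadPoolExecutor (the worker count is an arbitrary example):

import httpx
from concurrent.futures import ThreadPoolExecutor

def fetch_multiple_urls_threaded(urls):
    # Threads work well here: the GIL is released while waiting on network IO
    with ThreadPoolExecutor(max_workers=8) as executor:
        return list(executor.map(httpx.get, urls))  # preserves input order

responses = fetch_multiple_urls_threaded(["https://example.com"])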

Generators

For memory-efficient (lazy / easy on RAM) code:

# Loads entire file into RAM
def read_large_file_bad(filename):
    with open(filename) as f:
        return [int(line.strip()) for line in f]

# Only keeps one line in memory
def read_large_file_good(filename):
    with open(filename) as f:
        for line in f:
            yield int(line.strip())

# Memory efficient processing (file can be bigger than RAM)
total_sum = 0
for number in read_large_file_good("huge_file.txt"):
    total_sum += number

SQLite

SQLite3 support is built into the Python standard library and is a simple option to embed, store and analyze relational data.

When creating tables, always use the STRICT keyword to enforce type consistency on INSERT and UPDATE operations. This prevents ugly typing bugs that are otherwise possible, since Python does not guarantee type consistency at runtime.
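
A minimal sketch (table and column names are made up; STRICT requires SQLite 3.37+):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT) STRICT")

conn.execute("INSERT INTO users VALUES (1, 'Sean')")  # ok

try:
    conn.execute("INSERT INTO users VALUES ('one', 'Sean')")  # TEXT into INTEGER column
except sqlite3.IntegrityError as err:
    print(err)  # cannot store TEXT value in INTEGER column users.id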

Postgres

Postgres is a very versatile and powerful DBMS. Use it with psycopg and a docker image.

from contextlib import asynccontextmanager
from fastapi import Depends, FastAPI
import psycopg_pool

conn_string = "postgres://postgres@localhost"

pool = psycopg_pool.AsyncConnectionPool(conn_string, open=False)

@asynccontextmanager
async def lifespan(app: FastAPI):
    await pool.open()
    yield
    await pool.close()

app = FastAPI(lifespan=lifespan)

async def get_conn():
    async with pool.connection() as conn:
        yield conn

@app.get("/visit/")
async def add_visit(conn = Depends(get_conn)):
    async with conn.cursor() as cursor:
        # Run our queries
        await cursor.execute("insert into visits(timestamp) values (now())")

    return {"message": "Visit logged"}

Docker

Bundle your apps and make them reproducible using docker (with uv or pixi).

Logging

Use loguru - it comes with a multi-processing queue that just works (sketch below).
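
A minimal sketch (the log file name and rotation size are arbitrary examples):

from loguru import logger

# enqueue=True pushes records through a multiprocessing-safe queue
logger.add("app.log", enqueue=True, rotation="10 MB")

logger.info("Processed {} items", 42)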

Performance

Use a profiler (pyinstrument) to find slow code paths; for RAM-heavy code, reach for a dedicated memory profiler such as memray.
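
A minimal pyinstrument sketch (it can also wrap a whole script from the shell: pyinstrument your_script.py):

from pyinstrument import Profiler

profiler = Profiler()
profiler.start()

total = sum(i * i for i in range(10_000_000))  # stand-in for a slow code path

profiler.stop()
profiler.print()  # prints a statistical call tree with time spent per function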

Use C / C++ / Rust / Zig / Mojo etc. for performance-critical code, or try PyPy and Cython first. Or check out a transpiler: https://github.com/py2many/py2many / https://github.com/paiml/depyler.

#programming