
Managing Python dependencies


My intention #

I don’t know how often you find yourself trying to deploy code to some external service like AWS, GCP or even some on-premise solution. There is always some complexity associated with it, especially if you have a codebase with thousands of files and dependencies.

In my search for a scalable solution, everything I found turned out to be made for simple problems. Google and GPT don’t know much beyond a basic hello-world example, and real engineering projects consist of setups and technical decisions that go well beyond a basic tutorial.

I’ve been thinking about this a lot, so I figured maybe writing would help me organize some ideas. If you have experienced the same issues, I would be more than happy to hear from you.

Context #

My experience with Go and monorepo #

You may be wondering what the fuck Go has to do with anything. And yes, you’re right, but bear with me for just a second, and you’ll learn something fun.

If you don’t know what Go is, it’s a programming language developed at Google around 2007. At the time, Google was already a big company, and their de facto language, as I understand it, was C++ (they still use it a lot). C++ is great for a lot of things: it’s compiled, statically typed, and high performance.

But if you have worked with C++, you know it is a painful experience. For starters, the language has become so complex that it basically lets you do everything—so much so that it becomes unreadable. They even have some kind of C++ bible, with the 10 commandments of what you shall or shall not use when writing C++. If you are curious, you can read it here.

The problem with dependencies in C++ is that once a project grows large enough, compilation becomes a huge pain. You have to be explicit about everything that needs to be compiled, and with many files, this is not trivial. That is why we have tools like CMake.

Google saw growth in languages like Python and Java because of how easy they were to use, but they wanted to keep the performance features of C++. That is when Go came in.

Some of the advantages of Go are: it’s easy to compile (the compiler is smart, so you don’t need to use CMake), easy to understand, it handles concurrency with goroutines so threads are managed for you, it manages dependencies for you, and it’s compiled, so it’s portable. If you ask me, I would use Go for everything. However, you have to use the best tool for your solution, and as perfect as Go is, it doesn’t cover everything.

Now that I’ve bored you with some history, deploying code in Go is as simple as

GOOS=linux GOARCH=amd64 go build -o main main.go

And you just take that binary and put it on your server. You can keep everything in a monorepo (like Google does) and let the compiler figure out what to compile. And no, there is no catch. If I had to guess when this works, I’d say around 80% of the time. The other 20% of cases are when CGO is involved and the complexity increases a bit.
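Shipping that binary can be as simple as copying it over (the server and path here are just illustrative):

scp ./main user@my-server:/opt/app/main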

Interpreted languages #

Now that I’ve talked about C++ and Go, there are other languages worth mentioning: interpreted ones. Languages like Python or JavaScript are interpreted. What does that mean? In short, an interpreted language needs an additional runtime responsible for translating the code as you execute it. This is different from compiled languages, where the whole codebase is read up front and translated to machine code, with additional optimizations. At runtime, no interpreter is needed; just your binary runs.

The interpreter is like a man in the middle who reads each line of code, translates it, and then executes it.

Why you should care about deployment size #

Before jumping to the next section, there is another issue here that is not talked about enough. Deployment size plays a huge role in how your final application performs. There is a whole area of expertise about delivering these packages in the most efficient way possible; you probably know it as Continuous Integration and Continuous Delivery, or CI/CD for short.

But with the growth of storage capacity and internet bandwidth, developers became less strict about the final package size—and perhaps for good reason. If it works, don’t touch it, right?

However, like everything, there is a limit, and throwing more resources at the problem usually doesn’t end well. Besides, caring about these things yields better results in the end: faster load times, easier distribution, and lower resource consumption.

Deploy #

A basic app #

Let’s start from the most common and basic solution and then build up to a more complex one. Usually you have a great idea and then proceed to create a repository. You use Python because you have PyTorch as a dependency (you could use C++ here, but sometimes it is not as straightforward as you would like it to be).

First, you need a Git repo, so you create it.

git init ml-model
cd ml-model
touch main.py
touch Dockerfile

Then you proceed with your code.

# main.py
import torch
from torch import nn

# Pick the best available accelerator (e.g. CUDA) or fall back to CPU.
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"
print(f"Using {device} device")

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

# Run a random 28x28 "image" through the untrained model.
X = torch.rand(1, 28, 28, device=device)
logits = model(X)
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

You need to create a virtualenv and “freeze” your dependencies with a requirements file.

python -m venv .venv
source .venv/bin/activate
pip install torch torchvision
pip freeze > requirements.txt

And your Dockerfile, so you can actually make it portable.

FROM pytorch/pytorch:2.7.0-cuda11.8-cudnn9-runtime

COPY ./requirements.txt /install/requirements.txt
RUN pip install -r /install/requirements.txt

WORKDIR /code
COPY main.py /code/main.py

# Run the model by default when the container starts.
CMD ["python", "main.py"]

Now you don’t have a portable binary, but you do have a Docker image. It’s a little heavier, but at least it’s guaranteed to run anywhere. We won’t dig into the details of deployment—that’s another problem—but with Docker, you can pretty much deploy it anywhere.
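For example, building and running the image locally might look like this (the image tag is illustrative, and you’d add --gpus all only if the host has a CUDA-capable GPU):

docker build -t ml-model:latest .
docker run --rm ml-model:latest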

Business is doing well #

Your solution turns out to be very good, but software is in constant change, and new ideas and services get developed. So, how can we deploy multiple services and structure the repository so they don’t mess with each other? Well, there are many solutions. One, for instance, is to create a new repository for each service. But then you’d have to copy common libraries over and over, and changes would be hard to propagate. Of course, multi-repo makes a lot of things easier, but if your services share common logic, things get messy.

If the idea of using multi-repo doesn’t make sense for your use case, the following example might be of help.

Growing repo #

The previous model classifies digits in 28×28 images. Now we need to add another model, and here is where things get interesting. Let’s say the new model only works in TensorFlow with Python 3.11, because for some reason the other versions are broken for this specific model. Meanwhile, the previous model requires a Python version greater than 3.12.

In this case, we need to use a tool like pyenv, but then we have to modify our CI/CD pipeline to support an additional tool, and mark these two models with different Python versions so that neither the pipelines nor the developers install the wrong ones.
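A minimal sketch of that setup, assuming a reasonably recent pyenv (which resolves 3.11 to the latest patch release):

pyenv install 3.11
pyenv install 3.13
(cd model_b && pyenv local 3.11)   # writes model_b/.python-version
(cd model_a && pyenv local 3.13)   # writes model_a/.python-version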

Two Dockerfiles are now needed to support the different configurations.
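The split might start like this (the base images are illustrative; the rest of each Dockerfile stays as before):

# model_a/Dockerfile
FROM python:3.13-slim

# model_b/Dockerfile
FROM python:3.11-slim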

Shared libraries #

Let’s say our two models are a success. But we need a way to log what is going on with our services in a structured way. We need a shared library for our two services—in this case, a logger. Now we have an interesting problem, because we could use two approaches here. Either we create a new repository for each library and manage the dependencies with pip, or copy the dependency for each project that needs it. Of course, the latter doesn’t scale very well.

The repository would now look like this:

logger/
    requirements.txt
    logger.py
model_a/
    Dockerfile
    src/
        requirements.txt
        main.py
model_b/
    Dockerfile
    src/
        requirements.txt
        main.py

Creating a new repo for shared libraries doesn’t seem like a bad idea at first, but as the shared libraries grow in size, the final deployed application also grows unnecessarily big. And we’re not even talking about the dependencies of those dependencies—which could be large too.

Maybe I am being a pessimist and you won’t need to scale to hundreds of shared packages. But if you want to avoid duplicating code and creating another repo, there is a simple hack we could use.

The hack is very simple: we create a Makefile that copies the logger to the current directory, then we build the image, and finally, we clean the copied package.

.PHONY: build
build:
    cp -r ../logger .
    docker buildx build -t model_a:latest .
    rm -rf logger

And for the Dockerfile we have some changes too.

FROM pytorch/pytorch:2.7.0-cuda11.8-cudnn9-runtime

WORKDIR /code
COPY logger /code/logger
COPY src /code
RUN pip install -r /code/requirements.txt
RUN pip install -r /code/logger/requirements.txt

Now, instead of building our Docker image directly, we run the Makefile’s build target: make build.

Introducing uv #

One of the most interesting things Go introduced is its built-in build system and tooling. We talked previously about C++ and how painful compiling it was. C++ doesn’t ship with a build system or standard tooling, so there are a bunch of third-party tools like Make, CMake, and Bazel, just to name a few.

Just like C++, the Python ecosystem is a little fragmented. Python supports virtual environments out of the box, and this makes isolation somewhat easy. But it falls short, because virtualenvs are not package managers; they just provide isolation. If you want to use different Python versions, you need a tool like pyenv; if you need a package manager, there is Poetry; and for distribution, there is twine.

uv is a package manager written in Rust and developed by Astral (astral-sh on GitHub). Their aim is to provide an ecosystem of high-performance developer tools for Python. Astral is also known for creating Ruff, a linter and code formatter for Python.

But what makes uv so special? First, it replaces all the tools previously mentioned, and many more. Having a unified tool makes a lot of things easier, from code editor integration (I’m planning to write another entry on Neovim integration) to local development and CI/CD pipelines.
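To give a taste, here is roughly how uv maps onto those tools (all real uv subcommands; the package and version are just examples):

uv python install 3.12   # manage interpreters, like pyenv
uv venv                  # create a virtual environment
uv add requests          # add and resolve a dependency, like poetry
uv publish               # upload builds to an index, like twine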

Personally, I think one of the best features they provide is workspaces. It is an interesting concept they took from Cargo (Rust’s package manager), where you have many related projects that you want to manage together, probably with different dependencies. However, it falls short in cases where there are conflicts in dependencies or Python versions.

Multiple projects using workspaces #

Now, I promise you that we’ll get back to the original example, but for now, let’s consider a simple case where workspaces excel.

Let’s say we have a startup with two services. One is our backend app, where all the code related to the core product is hosted. payments is a separate service that handles payments. The two are related, but different teams maintain them, and keeping them separate makes some things easier.

There are two shared packages: logger is used by both services, while auth is used only by app. This is an interesting example because shipping the payments service with an unrelated package is not ideal, and uv solves it very nicely.

There is also another problem that workspaces solve: package discovery. Python can’t import a package that isn’t on its path; for instance, the app project couldn’t import logger because it lives in a different folder. A workaround would be to add logger to the PYTHONPATH environment variable, but this becomes annoying when there are many packages and you start to lose track of each dependency.
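That workaround would look something like this, with one entry per shared package (the path follows the layout shown below), which is exactly what gets tedious:

export PYTHONPATH="$PYTHONPATH:$(pwd)/packages/logger/src"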

Now, our project might look like this:

├── packages
│   ├── auth
│   │   ├── pyproject.toml
│   │   └── src
│   │       └── auth
│   │           ├── __init__.py
│   │           └── jwt.py
│   └── logger
│       ├── pyproject.toml
│       └── src
│           └── logger
│               ├── __init__.py
│               └── logger.py
├── servers
│   ├── app
│   │   ├── main.py
│   │   └── pyproject.toml
│   └── payments
│       ├── main.py
│       └── pyproject.toml
├── main.py
├── pyproject.toml
└── uv.lock

Here we have declared our root folder as a uv project. You can identify it because there is a pyproject.toml and a uv.lock. We have two folders, packages and servers, each containing two uv projects.

To make each package a member of the workspace, we must use the tool.uv.workspace table in the pyproject.toml file and include the folders where each package is located. For the root pyproject.toml, it would look like this.

[project]
name = "workspaces-demo"
version = "0.1.0"
description = "Basic infrastructure repo for saas"
readme = "README.md"
requires-python = ">=3.9"
dependencies = []

[tool.uv.workspace]
members = ["servers/*", "packages/*"]

Now, if we want app to use logger and auth packages, we must go to servers/app/pyproject.toml and add them as dependencies. Also, we need to specify that they come from the workspace so uv knows exactly where to find them.

# servers/app/pyproject.toml
[project]
name = "app"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.9"
dependencies = ["auth", "logger"]

[tool.uv.sources]
auth = { workspace = true }
logger = { workspace = true }

Now your imports work fine:

# servers/app/main.py
from auth import jwt
from logger import logger

def main():
    logger.Info("Hello from app!", jwt.validate_token("sometoken"))

if __name__ == "__main__":
    main()
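Assuming the names above, you can run a member straight from the workspace root (--package tells uv which member’s environment to use):

uv sync
uv run --package app python servers/app/main.py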

Limits of workspaces #

When dealing with workspaces, there is a root project responsible for managing dependencies across all members: everything is resolved into a single lockfile that every member shares. If your members need conflicting versions of a dependency, workspaces won’t work for your use case. The same applies to Python versions: all members need a common compatible Python version, otherwise resolution will fail.

However, not everything is lost. You can still use workspaces and keep different projects in the same repo. So let’s return to our initial example, where we had two different models and a logger, and integrate it with what we’ve learned so far.

We could design it so logger can be used by all four projects, in the following way.

├── README.md
├── main.py
├── ml
│   ├── model_b
│   │   ├── main.py
│   │   └── pyproject.toml
│   └── model_a
│       ├── main.py
│       └── pyproject.toml
├── packages
│   ├── auth
│   │   ├── pyproject.toml
│   │   └── src
│   │       └── auth
│   │           └── jwt.py
│   └── logger
│       ├── README.md
│       ├── pyproject.toml
│       └── src
│           └── logger
│               └── logger.py
├── servers
│   ├── app
│   │   ├── main.py
│   │   └── pyproject.toml
│   └── payments
│       ├── main.py
│       └── pyproject.toml
├── pyproject.toml
└── uv.lock

As you can see, we’ve added a new folder called ml, where all our code related to ML models will belong. However, these two models cannot be workspace members, because—as we mentioned previously—they have conflicting Python versions.

In order to make them work with the logger, we must first remove logger from the workspace:

[project]
name = "workspaces-demo"
version = "0.1.0"
description = "Basic infrastructure repo for saas"
readme = "README.md"
requires-python = ">=3.9"
dependencies = []

[tool.uv.workspace]
members = [
    "servers/*",
    "packages/auth",
]

Then, in each model’s pyproject.toml, we add it as a dependency in the following way:

[project]
name = "model-a"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    "logger",
    "requests>=2.32.4",
]

[tool.uv.sources]
logger = { path = "../../packages/logger", editable = true }

The important part above is the tool.uv.sources table, which declares the location of each package; in this case, a local path in the repository, marked as editable so local changes to logger are picked up without reinstalling.

Distributing using uv #

One of the best things uv offers, compared with other build tools, is its build system. To deploy an app, a requirements.txt file is required for installing external dependencies, along with the local code that needs to run on the server.

With workspaces, distribution is easier. First, we export external (third-party) dependencies to a requirements.txt. For local dependencies, we use uv build --all-packages to discover and package them into wheels (you can also use tarballs). Finally, we pack the main application into a wheel as well, using uv build.

The workflow would look like this:

uv export --no-emit-workspace > requirements.txt
uv build --all-packages --wheel --out-dir ./dist
uv build --wheel --out-dir ./dist

This process generates a set of .whl files in the dist directory, including one for each local package and one for the main application. By doing this, we ensure our application can be reliably built and installed in any environment—local development, CI pipelines, or production servers—without worrying about whether local imports or paths are resolved correctly.

Also, wheels are the preferred method of distribution for Python packages. They’re faster to install and don’t require running setup.py, making them safer and more consistent across environments.

To install our whole application we only need two commands.

pip install -r requirements.txt
pip install dist/*.whl
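Putting it all together, a deployment Dockerfile might look like this (the base image and the entrypoint module are illustrative, not part of the setup above):

FROM python:3.12-slim

# Install third-party dependencies first so this layer caches well.
COPY requirements.txt /install/requirements.txt
RUN pip install -r /install/requirements.txt

# Then install our own wheels produced by uv build.
COPY dist/ /install/dist/
RUN pip install /install/dist/*.whl

# The entrypoint module is hypothetical.
CMD ["python", "-m", "app"]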

Conclusion #

Managing Python dependencies in real-world projects is rarely as simple as the tutorials make it seem. Between conflicting Python versions, internal packages, and deployment constraints, things get messy fast. With uv, we gain a modern toolset that simplifies local development using workspaces and enables clean, reproducible builds for production.