We operate more like an academic group than a production DevOps team, prioritizing accessible, well-commented, simple code with informative READMEs and a flat code organization. Our science-in-the-open approach means our codebase must be newcomer-friendly and easily interpretable, with timely research progress taking precedence over production-grade software standards.
uv (specifically uv sync) for environment management
native PyTorch DDP as opposed to Accelerate, Lightning, etc.
config files (e.g., YAML) for defining hyperparameters (as opposed to command-line argparse or hard-coded changes)
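As a minimal sketch of the config-file preference, the snippet below reads hyperparameters from a YAML file into a dataclass. The TrainConfig fields, their defaults, and the function names are hypothetical and chosen only for illustration; reading YAML assumes the third-party PyYAML package is installed.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from pathlib import Path
from typing import Any


@dataclass
class TrainConfig:
    # Hypothetical hyperparameters; defaults apply when the file omits a key.
    lr: float = 3e-4
    batch_size: int = 64
    epochs: int = 10
    model_name: str = "resnet18"


def config_from_dict(raw: dict[str, Any]) -> TrainConfig:
    """Build a TrainConfig, rejecting unknown keys so typos fail loudly."""
    known = {f.name for f in fields(TrainConfig)}
    unknown = set(raw) - known
    if unknown:
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    return TrainConfig(**raw)


def load_config(path: str | Path) -> TrainConfig:
    """Read hyperparameters from a YAML file (requires PyYAML)."""
    import yaml  # third-party; imported here so config_from_dict works without it

    with open(path) as f:
        raw = yaml.safe_load(f) or {}  # an empty file parses to None
    return config_from_dict(raw)
```

A train.py could then call load_config("config.yaml") once at startup and pass the resulting object around, so changing an experiment means editing the YAML file rather than the code. One detail worth knowing: PyYAML reads a value such as 3e-4 as a string, so write floats with a decimal point (3.0e-4 or 0.0003).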
Think of a repo where you immediately see (almost) every .py file that matters:
my_experiment/
│
├── train.py
├── eval.py
├── dataset.py
├── models.py
├── losses.py
├── utils.py
└── README.md
That, plus maybe a few small sub-packages such as models/ or datasets/, is a flat layout. It deliberately avoids the deep, layered package trees you’d find in a production service (e.g., my_app/core/data/io/...). Further, we prefer to limit dependencies to core libraries (torch, torchvision, timm, etc.) so that everything you need to understand the project is right there in the folder. This practice speeds up iteration, improves readability for newcomers, reduces the number of abstractions, and isolates the project from external breakages.
Some exceptions can be made to this, such as wanting to fork an existing codebase and make minimal adjustments to it to get a proof of concept working, or for non-research projects.
See here for more information on the “wide-and-flat” codebase approach.
When working with larger teams of volunteers, it is important not to overcomplicate things. Simply use Discord to share a project's tasks and who is working on what. To organize core tasks, feel free to also use GitHub Issues (helpful for linking PRs to specific issues or for having a clearly defined place to describe the nuances of a specific issue), but these issues should also be shared in the Discord and not created in isolation.
Collaborators should not be able to push directly to main. Instead, they should work on a fork or a separate branch and open PRs, and the project lead should approve every merge into main. The project lead handles all pull request approvals (including their own).
Use branches to protect the integrity of the main repository. The main branch should represent the definitive, stable version of the project: a reliable and up-to-date entry point for all newcomers. New and experimental features should first be developed in their own isolated branches. If an experiment proves to be a superior solution, its code should be quickly merged to replace the relevant logic in main, not just appended to it (that is, unless both solutions can be simply implemented as a config parameter). This helps prevent code bloat and keeps the primary codebase focused and maintainable. Don’t wait too long to merge features into main; otherwise, someone else's contributions to main may turn out to be wholly incompatible with your in-development experimental branch.
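The "config parameter" exception can be sketched as follows: when two solutions are both worth keeping, a single config value selects between them instead of duplicating the surrounding training logic. The loss functions, the LOSSES registry, and the get_loss helper are hypothetical names used only for illustration, and the losses operate on plain Python lists to keep the sketch dependency-free.

```python
def mse(pred, target):
    """Mean squared error over two equal-length sequences of numbers."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred, target):
    """Mean absolute error over two equal-length sequences of numbers."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


# Hypothetical registry: each config value maps to one implementation.
LOSSES = {"mse": mse, "mae": mae}


def get_loss(name):
    """Return the loss function named in the config, failing loudly on typos."""
    try:
        return LOSSES[name]
    except KeyError:
        raise ValueError(
            f"Unknown loss {name!r}; choose from {sorted(LOSSES)}"
        ) from None
```

With this pattern, the training script would call get_loss with the value read from the config file, and switching between the two solutions requires no code change. If one option is clearly superseded, it is still better to delete it from main than to keep it behind a flag.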
PRs should focus on a single feature at a time whenever possible, to cleanly separate modifications to the main branch. It is totally fine to have a messy branch where you test various experimental features and never intend to merge it into main. In such cases, whenever you have a new feature that you are relatively confident improves the codebase, we prefer that you create a new branch off of main and implement solely that one feature. Then test that the codebase still works, and open a clean pull request against main. If you use well-labeled commits, it should be simple enough to find the code specific to the feature you want to isolate and merge into main.