When working in Databricks, it’s common to develop additional Python classes as reusable helpers—for example, to read data from APIs or to apply repeatable PySpark transformations.
The most common approach is to create separate .ipynb notebook files and import them into your main notebook using the %run magic command:
%run ./PythonClass
%run ./Notebook
There are several major drawbacks to relying on %run for code reuse:
No Version Control:
If someone accidentally modifies a shared notebook (especially in a production workspace), every notebook that imports it with %run silently picks up the change and may break. There's no built-in versioning or dependency management.
Performance Issues:
Every time you use %run, the imported notebook is fully re-executed, even if nothing has changed. This can slow down development and job execution.
Not Portable:
Databricks notebooks are stored as JSON files. Code written in them is tightly coupled to the Databricks environment and cannot be easily reused or shared outside of Databricks (unlike a Python package or module).
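The performance drawback is worth seeing concretely. Unlike %run, a standard Python import is cached in sys.modules, so a module's body runs at most once per session. A minimal sketch (the module name `helpers` and the `clean_name` function are hypothetical, created on the fly for the demonstration):

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Create a throwaway module whose body counts how many times it executes.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "helpers.py").write_text(textwrap.dedent("""
    import builtins
    builtins.HELPER_RUNS = getattr(builtins, "HELPER_RUNS", 0) + 1

    def clean_name(name: str) -> str:
        return name.strip().lower()
"""))
sys.path.insert(0, tmp_dir)

import helpers   # module body executes here
import helpers   # cached in sys.modules: body does NOT run again
import builtins

print(builtins.HELPER_RUNS)          # 1, not 2
print(helpers.clean_name("  Foo "))  # foo
```

With %run, by contrast, the referenced notebook would be executed on every invocation.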
A Python wheel (.whl) is a portable, versioned package.
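Building a wheel requires only a small amount of metadata. A minimal pyproject.toml sketch, assuming a setuptools backend (the package name and version below are placeholders):

```toml
[build-system]
requires = ["setuptools>=68", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "databricks-helpers"   # placeholder name
version = "0.1.0"
requires-python = ">=3.9"

[tool.setuptools.packages.find]
where = ["src"]
```

Running `python -m build` in the project root then produces an installable .whl file under dist/.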
With GitHub Actions, you can automate the build and upload process every time you push to main.
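A workflow sketch along these lines (file path, job names, and action versions are illustrative, not prescribed by this article):

```yaml
# .github/workflows/build-wheel.yml
name: build-wheel
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install build
      - run: python -m build          # produces dist/*.whl
      - uses: actions/upload-artifact@v4
        with:
          name: wheel
          path: dist/*.whl
```

The final upload step could instead push the wheel to a package index or to Databricks, depending on your deployment setup.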
Project structure (src layout)
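An illustrative directory tree for the src layout (the package and file names are placeholders):

```
databricks-helpers/
├── pyproject.toml
├── src/
│   └── databricks_helpers/
│       ├── __init__.py
│       └── transforms.py
└── tests/
    └── test_transforms.py
```

Keeping the code under src/ ensures tests run against the installed wheel rather than the working directory, which catches packaging mistakes early.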