My overall approach to coding, including data science and machine learning.

Programming languages

Mainly Python, although I would really like to try Julia.

Programming IDE

Visual Studio Code, in my opinion, doesn't have any real competition at the moment. It's just so good 😍 Even for notebooks, its IntelliSense autocomplete, extensions and proper debugging make it a more productive and practical option than JupyterLab. Add the option to use scripts as notebooks, and it solves almost all of the complaints from Joel Grus' famous talk, "I don't like notebooks".
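As a minimal sketch of the scripts-as-notebooks idea: VS Code's Python extension treats "# %%" comments in a regular .py file as cell boundaries, so each chunk can be run interactively while the file stays a plain, diffable script. The data and variable names below are made up for illustration.

```python
# %% Load some data (a hypothetical, hard-coded sample)
import statistics

measurements = [2.0, 4.0, 4.0, 5.0, 7.0]

# %% Compute summary statistics
summary = {
    "mean": statistics.mean(measurements),
    "stdev": statistics.stdev(measurements),
}

# %% Inspect the result
print(summary)
```

Since the markers are just comments, the same file also runs top to bottom with a plain Python interpreter, which helps with the hidden-state and version-control problems that notebooks tend to have.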

For better pipeline management, I'll be looking into Orchest and similar options soon.

Data science frameworks

For most stuff, I use Pandas. When I have larger datasets, I should opt for either:

Machine learning

For deep learning models, PyTorch. For other ML models, scikit-learn and XGBoost.

TensorFlow can be a valid alternative to PyTorch. JAX also seems to be gaining steam, as do some Julia packages.

Comet is great for logging model training experiments and performing hyperparameter tuning.

SHAP is my go-to for model interpretability (by the way, I adapted it to RNN-type models, as you can see in my article Interpreting recurrent neural networks on multivariate time series and in my Master's Thesis Presentation).

RaySGD (which is part of Ray) is a great tool for efficient distributed training. PyTorch Lightning is a good alternative.

Cloud computing

Colab is a free option (with GPUs and TPUs) that can suit some non-confidential exploration.