I find that I mostly work on getting things to work so I can then do work.
Does that last sentence work?
Basically, most of the time I'm not doing analyses or visualizing something cool. Instead I'm setting up an environment or making sure things are pointing at the right files or whatever. I'm getting things ready a lot, so I wanted to try to smooth out some of the hassle of setting things up.
So, I made a little template called vanillin to set up a project quickly, allowing me to work out of Docker and code in Jupyter Notebooks with a few of the sharp edges removed.
To back up, let's start with "I like coding in notebooks".
I like code notebooks. But they require a bit of setup each time, and keeping track of all that can be a pain. Mostly it's keeping track of Python.
I've always been told that when working in Python, you have to work out of virtual environments. The annoying part, I've found, is sharing notebooks with other people. Collaborators have their own setups and make their own environments, and it is annoying to keep track of. People have made a lot of tools like pipenv and Poetry, or settled on general practices like requirements.txt, but I find it a bit of a hassle because I only want to run a Jupyter notebook. After all of the generating and setup, we end up working out of a browser in Jupyter anyway. It would be nice if, when sharing notebooks, I didn't have to worry so much about the environment they are run in. I just want a command that builds an environment and gets me inside Jupyter.
In other words, I want to limit the amount of time I worry about my kitchen and instead focus on cooking.
Turns out to do this, you kind of have to think a lot about kitchens first.
My first attempt was to make my own kitchen from scratch. I was using Docker to make custom environments for each project. This worked, but it was a hassle, since there was a lot to manage:
- Had to make custom environments for different use cases and host them on Docker Hub
- Had to maintain all that stuff
Rather than making things easier to use, it was just a hassle to handle. I wasn't very familiar with Docker and I wanted to leave the maintenance to people with more experience.
Turns out Jupyter already had this setup...setup.
I came across the Jupyter Docker Stacks, and that made things a lot simpler. They maintain Docker images, and I can pull the ones I want to run Jupyter notebooks. When I'm working out of the images, I can be a lot less worried about installing things. I can also now pass a notebook along with a Dockerfile and know it'll run for someone. Then we can actually collaborate rather than keep asking questions about how our machines are working.
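As a rough illustration, a project Dockerfile on top of the stacks can be as small as two lines. The base image (jupyter/scipy-notebook, one of the stack images) and the extra package (plotnine) are just example picks on my part, not necessarily what vanillin ships with. This snippet writes such a Dockerfile into the current folder:

```shell
# Write a minimal Dockerfile into the current directory.
# Both the base image and the package below are example choices;
# swap in whatever your project needs.
cat > Dockerfile <<'EOF'
FROM jupyter/scipy-notebook
RUN pip install --no-cache-dir plotnine
EOF
```

Anyone who gets the notebook plus this file can build the same kitchen on their own machine.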
This setup is still a bit of a hassle. The Docker commands are quite intimidating:
    docker build --rm -t DOCKER_IMAGE_NAME .
    docker run --rm -p 10000:8888 -e JUPYTER_ENABLE_LAB=yes -v $PWD:/home/jovyan/work DOCKER_IMAGE_NAME
When I run these, I only ever change the image name and the port. So let's smooth that out a bit.
- The Dockerfile takes a base image from the Jupyter Docker Stacks, so I can pick the fanciest kitchen they have at no extra effort. They have a bunch to pick from if size is a big issue.
- Additional packages I may need can be installed in the Dockerfile or run in a notebook cell
- Command to start things up is similar to jupyter notebook, trying to be as easy to use as possible

    vanillin DOCKER_IMAGE_NAME        # build command
    vanillin DOCKER_IMAGE_NAME 10000  # run notebook on port 10000
It's not actually doing much, but I find it much more user-friendly.
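To give a sense of how little a wrapper like this has to do, here is a rough sketch. To be clear, this is not vanillin's actual source: the notebook function name and the DRY_RUN switch are made up for illustration, and all it does is wrap the two docker commands from earlier.

```shell
# Hypothetical wrapper (NOT vanillin's real implementation).
#   notebook IMAGE_NAME        -> build the image from ./Dockerfile
#   notebook IMAGE_NAME PORT   -> run the image, Jupyter served on PORT
# Set DRY_RUN=1 to print the docker command instead of running it.
notebook() {
    image="${1:-}"
    port="${2:-}"

    if [ -z "$image" ]; then
        echo "usage: notebook IMAGE_NAME [PORT]" >&2
        return 1
    fi

    # Reuse the function's argument list to hold the docker command.
    if [ -z "$port" ]; then
        set -- docker build --rm -t "$image" .
    else
        set -- docker run --rm -p "$port:8888" \
            -e JUPYTER_ENABLE_LAB=yes \
            -v "$PWD:/home/jovyan/work" "$image"
    fi

    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"   # show what would run
    else
        "$@"                 # actually run it
    fi
}
```

With that in a shell profile, notebook my-project builds the image and notebook my-project 10000 starts Jupyter on port 10000. Only the two things I ever change are left as arguments.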
A lot of ice cream: The drawback of this setup is that the images are big. The fancy kitchen does come at a cost: an image size of 1-2 GB. But it handles everything, and I think it's a good tradeoff.
I'm sure I'll soon find a smoother way to code things up in code notebooks, but right now this setup has been very helpful.
- Have smart people handle Docker containers/images
- Make a Dockerfile that suits the needs of a project
- Get going! Start up Jupyter easily. Install things without getting goosebumps. Share work with others without a bunch of back and forth (jupytext does help as well).
Make a mess in the kitchen. Cook up something tasty.
That's it. Feel free to give vanillin a try. Hope you are doing well.
Till next time!