Python — Docker image security and Trivy scan
Today we will talk about Docker, security, and best practices for keeping your images secure. We need to shift our mindset and remember that our solution is only one part of a larger ecosystem, and once it runs inside that ecosystem it is exposed to risks.
Docker is a wonderful tool that changed the game some years ago, eliminating or at least reducing developer comments like “it works on my machine,” because it “isolates” our application in an environment similar to production.
Our app may depend on caches, databases, and other services to work in a production environment, and the database may hold a lot of data. We can discuss that another time, perhaps in an article about integration tests or docker-compose.
In this article, our application is assumed to be a simple Django application. We will use a Dockerfile based on the public Python 3.11 image and scan the resulting image with Trivy.
Throughout this article, we will discuss how Trivy or similar tools can catch vulnerabilities before they are delivered to a production environment. By the conclusion, we will have discussed Docker image security, learned how to scan our images, and seen the advantages of using these tools the right way.
But first, I need to explain that I have more than 15 years of experience developing software. Currently, I work primarily with Python, but I have also worked with other programming languages like Java, JavaScript, and PHP.
Let’s move on to the explanation, and I hope this article helps you identify and address some common code issues.
About Docker
Docker is a very important tool for developing solutions with almost no incompatibility problems between operating systems or environments. It works very well in CI/CD pipelines and with Kubernetes or a PaaS (Platform as a Service) such as Heroku. It gives us more control over how our application will behave in a production environment and how much it will cost in the cloud.
A common use of an application’s Docker image is to push it to a container registry, from which Kubernetes pulls it to run our application inside a cluster and handle tasks such as starting, restarting, and upgrading a pod. If we are using a Linux distribution, our application will look similar to this:
Thinking about this image, where can our application’s vulnerabilities be? In the first layer, our own code could have vulnerabilities and bugs, such as a route that executes a command that compromises our database. In the second layer, the selected version of Python could have vulnerabilities that may not immediately disrupt our application but could be exploited later. In the third layer, our chosen distribution may also have vulnerabilities.
Our app can have other vulnerabilities in upper layers, but most of the time we don’t have control over them, so I won’t mention them.
Given where vulnerabilities can live, we need to be careful with our Dockerfile and the commands we use in it. We also need to know how to check for these vulnerabilities. There are many tools for this, but in this article we are talking about Trivy.
Okay, we talked about vulnerabilities and know where we can find them. So, what is “wrong” for our application with this Dockerfile?
Note: remember that every team is watching for vulnerabilities, and a possible mistake in the Dockerfile can be handled or mitigated in another layer of security. Keep in mind that “everything can happen, including nothing,” but “security is never too much.”
Looking only at this, we have a public Python image, version 3.11. At this moment, we can’t say this image is “wrong,” but we can expect some risks, for example because the Dockerfile never runs an update command to apply security updates on top of the python:3.11 image.
Let's scan
Okay, we don’t know or remember every security issue in the public Python 3.11 image, so we can use Trivy to check whether this image has problems. To install it, follow the installation guide in the Trivy documentation.
In this case, we are using the tool installed on our computer, and only to scan the image. Trivy has other options, such as scanning public images, your computer’s files, and a Kubernetes cluster.
First we build the image from our Dockerfile, and then we scan it with Trivy to see which vulnerabilities it contains.
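The two steps above can be sketched as a small Python helper that assembles the commands. This is only a sketch under assumptions: the tag django-app:scan is a hypothetical name, the Dockerfile is assumed to be in the current directory, and the --severity filter is my guess at how to limit the report to HIGH and CRITICAL findings, since the exact command used for this article is not shown.

```python
import shlex

# Hypothetical tag for the image built from the article's Dockerfile.
IMAGE_TAG = "django-app:scan"


def build_command(tag: str, context: str = ".") -> list[str]:
    """Return the `docker build` invocation for the Dockerfile in `context`."""
    return ["docker", "build", "-t", tag, context]


def scan_command(tag: str, severities: tuple[str, ...] = ("HIGH", "CRITICAL")) -> list[str]:
    """Return the `trivy image` invocation, limited to the given severities."""
    return ["trivy", "image", "--severity", ",".join(severities), tag]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for command in (build_command(IMAGE_TAG), scan_command(IMAGE_TAG)):
        print(shlex.join(command))
```

To actually run them, each list can be passed to subprocess.run(command, check=True), or the printed lines can be pasted into a terminal.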
Default and bookworm
When we scanned this image, Trivy detected 60 HIGH and 3 CRITICAL severity vulnerabilities, listing the CVEs and the libraries affected. So at this point our application carries 63 external vulnerabilities that could potentially cause it to malfunction, stop, expose data, or slow down unnecessarily.
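If we want these totals in a script instead of reading them from the table, Trivy can write a JSON report (for example with --format json --output report.json). The sketch below counts findings per severity. The sample report is an illustrative fragment with placeholder IDs and package names, not real scan output, and count_by_severity is a hypothetical helper name.

```python
import json
from collections import Counter


def count_by_severity(report: dict) -> Counter:
    """Count vulnerabilities per severity across every target in a Trivy JSON report."""
    counts: Counter = Counter()
    # A target with no findings may omit the "Vulnerabilities" key entirely.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


# Illustrative fragment, NOT real scan output: IDs and packages are placeholders.
SAMPLE_REPORT = json.loads("""
{
  "Results": [
    {"Target": "example (debian)", "Vulnerabilities": [
      {"VulnerabilityID": "CVE-0000-0001", "PkgName": "pkg-a", "Severity": "HIGH"},
      {"VulnerabilityID": "CVE-0000-0002", "PkgName": "pkg-b", "Severity": "CRITICAL"}
    ]},
    {"Target": "Python"}
  ]
}
""")

if __name__ == "__main__":
    counts = count_by_severity(SAMPLE_REPORT)
    print(dict(counts), "total:", sum(counts.values()))
```

With a real report, replace SAMPLE_REPORT with json.load(open("report.json")).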
Then we can try to solve this problem by adding the "apt-get update -y && apt-get upgrade -y" command to our Dockerfile to upgrade the packages in our image.
But this doesn’t give a good result: we are left with an image that still has 50 vulnerabilities, 49 HIGH and 1 CRITICAL.
Ok, let’s try something different: another Python 3.11 base image. By default, the Python 3.11 image is based on Debian Bookworm (Docker Hub).
We will experiment with the same version (3.11 + Bookworm), but in its slim variant, and run the scan again. We tested both FROM python:3.11-slim-bookworm (Docker Hub) and FROM python:3.11-slim, and they return the same result:
Bullseye
Well, we have made some progress. What can we do now? Let’s try another Python image, python:3.11-bullseye. This is the result:
WOW! If you weren’t familiar with Bullseye, this result may surprise you. But let’s keep trying. As we did with the 3.11 default and Bookworm, we will test the slim variant of Bullseye and run the scan again.
Alpine
Ok, we found that Bullseye has more HIGH and CRITICAL vulnerabilities than the default and Bookworm images. Now we’ll try the Alpine variant, python:3.11-alpine.
Nice, this image has no HIGH or CRITICAL vulnerabilities, which is great. This happens because the image is smaller and ships fewer libraries than the other images, so we are safer at the OS layer. On the other hand, we have to install almost everything our application needs to work ourselves. Common examples are packages for communication and auxiliary tasks.
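The comparison we did by hand could also be scripted. The sketch below is one possible approach, not the process used for this article: it assumes we scan the public base tags directly rather than images built from our Dockerfile, and that trivy is on the PATH. The names trivy_json, compare, and print_summary are hypothetical, and the scan function is injectable so the logic can be tried without Trivy installed.

```python
import json
import subprocess
from collections import Counter
from typing import Callable

# Base image tags discussed in this article.
TAGS = [
    "python:3.11",
    "python:3.11-slim",
    "python:3.11-bullseye",
    "python:3.11-slim-bullseye",
    "python:3.11-alpine",
]


def trivy_json(tag: str) -> dict:
    """Run Trivy against `tag` and return its parsed JSON report."""
    completed = subprocess.run(
        ["trivy", "image", "--quiet", "--format", "json",
         "--severity", "HIGH,CRITICAL", tag],
        capture_output=True, text=True, check=True,
    )
    return json.loads(completed.stdout)


def compare(tags: list[str], scan: Callable[[str], dict] = trivy_json) -> dict[str, Counter]:
    """Scan each tag and return a per-tag count of findings by severity."""
    summary = {}
    for tag in tags:
        report = scan(tag)
        summary[tag] = Counter(
            vuln["Severity"]
            for result in report.get("Results") or []
            for vuln in result.get("Vulnerabilities") or []
        )
    return summary


def print_summary(summary: dict[str, Counter]) -> None:
    """Print one line per tag with its HIGH and CRITICAL totals."""
    for tag, counts in summary.items():
        print(f"{tag:28} HIGH={counts['HIGH']:3} CRITICAL={counts['CRITICAL']:3}")
```

Calling print_summary(compare(TAGS)) pulls and scans every tag, so it can take a while on the first run. The numbers will also change over time as the images and the vulnerability database are updated.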
About the images
Our example application uses Poetry to manage its dependencies, which are only Django and Gunicorn.
Alpine
The Alpine image is leaner, and it can make our Docker image safer by reducing the attack surface. That’s good. But we need to know our application better in order to keep the environment ready to run it.
This app may require some communication protocol or OS-level dependency, and there’s a high chance that the dependency doesn’t exist in the image. Then we will hear “it works on my computer,” which happens because our personal computer’s OS has more packages installed than our Alpine image.
In our example, we need to add one more command before installing the dependencies with Poetry: “apk add libffi-dev libc-dev gcc”. Something like this:
It works, but we need to keep an eye on it: eventually we will need to add something else for our Python application to work well, and we still need to make sure that we have no vulnerabilities.
Default-Slim
The Slim image is lean, yet it still includes many utility packages and libraries that help us avoid problems outside of our code, unlike the Alpine version, where we keep asking, “Do I need to install something for my application to work well?” That risk is reduced. The Dockerfile for the same example has fewer instructions:
But our application is still left with 3 vulnerabilities, in libc and perl-base. We can update and/or remove unused libraries.
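To decide between updating and removing, it helps to know which findings already have a fix published. Trivy’s JSON report carries a FixedVersion field for each vulnerability when one exists, so a small sketch like the one below can separate the two groups. The helper name split_by_fix is hypothetical, and the sample data uses placeholder packages and versions, not the real libc and perl-base findings.

```python
def split_by_fix(report: dict) -> tuple[list[dict], list[dict]]:
    """Split findings into those with a published fix and those without one."""
    fixable, unfixed = [], []
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            # An empty or missing FixedVersion means no patched package is available yet.
            (fixable if vuln.get("FixedVersion") else unfixed).append(vuln)
    return fixable, unfixed


# Illustrative data only: package names, versions, and IDs are placeholders.
SAMPLE_REPORT = {
    "Results": [
        {
            "Target": "example (debian)",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "PkgName": "pkg-a",
                 "InstalledVersion": "1.0", "FixedVersion": "1.1", "Severity": "HIGH"},
                {"VulnerabilityID": "CVE-0000-0002", "PkgName": "pkg-b",
                 "InstalledVersion": "2.0", "FixedVersion": "", "Severity": "HIGH"},
            ],
        }
    ]
}

if __name__ == "__main__":
    fixable, unfixed = split_by_fix(SAMPLE_REPORT)
    print("upgrade candidates:", [v["PkgName"] for v in fixable])
    print("no fix yet:", [v["PkgName"] for v in unfixed])
```

Packages in the first group are candidates for an upgrade in the Dockerfile. For the second group, the options are removing the package if it is unused or accepting the risk for now. Trivy also has an --ignore-unfixed flag that hides the second group from the report.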
Other versions
The other versions help by already including dependencies that our application may need. We need to remember that some applications need specific dependencies, and the Alpine and Slim versions require more attention because we must add these manually.
On the other hand, the default and larger images are more prone to security flaws, which can make our application vulnerable, slow, or even break it.
Conclusion
When we are developing our application, we need to think about the environment in which it will run. In this article, we talked about some Docker images, scanning them one by one and discussing the results. After this, we have a decision to make.
Our decision should consider questions such as “what does my application do?”, “what vulnerabilities are exposed?”, “how likely is it to break because of this?”, “am I willing to do the work to prevent, mitigate, or contain damage to the application?”, and “am I able to detect an error that comes from outside my code?”.
Errors outside our application are very hard to detect; we can monitor the environment and add some tools to verify performance or irregular behavior. In my opinion, we need to prevent this as much as possible to keep our attention more on our code.
Another point is that to keep the software alive, you will need to know where it can break and help other teams, like SRE and security teams, to identify possible failures.
Finally, if we are building a simple REST API or an application that uses few OS resources, I tend to use the Slim or Alpine version. But if I have more OS dependencies and I intend my application to run in a controlled environment, then I prefer the larger images such as Bullseye or the default.
I hope this was helpful. Thank you for taking the time to read this. :D