5. Optimizing Dockerfiles (Best Practices) | The Complete Docker Handbook.
Welcome to Article 5 of The Complete Docker Handbook.
In Article 4, you wrote your first Dockerfile and successfully built a custom image. Congratulations! But if you check the size of that image, you might be surprised. A simple "Hello World" Python app could easily result in an image over 900MB.
In production, large images mean:
- Slower Builds: More data to transfer and process.
- Slower Deploys: More data to pull onto your servers.
- Security Risks: More installed packages mean more potential vulnerabilities.
In this article, we will transform you from a Docker beginner into a Docker practitioner. We will cover how to shrink your images, speed up your builds, and secure your containers.
1. Choose the Right Base Image
The foundation of your image determines its final size. Docker Hub offers various versions of the same image.
| Image Tag | Description | Size (Approx) | Use Case |
|---|---|---|---|
| `python:3.9` | Full Debian-based image. | ~900 MB | Debugging, complex deps requiring glibc. |
| `python:3.9-slim` | Stripped-down Debian. | ~120 MB | Recommended default. Good balance. |
| `python:3.9-alpine` | Based on Alpine Linux. | ~50 MB | Minimal size. Caution: uses musl libc instead of glibc. |
The Alpine Trap
Alpine images are tiny, but they aren't always compatible. Some Python packages (like numpy or pandas) or Node.js binaries require specific Linux libraries (glibc) that Alpine doesn't have.
- Tip: Start with `-slim`. Switch to `-alpine` only if you need to shave off megabytes and have verified compatibility.
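As a concrete starting point, here is a minimal sketch of a Dockerfile on the `-slim` base. The `app.py` and `requirements.txt` filenames are placeholders for your own project files:

```dockerfile
# Slim Debian base: glibc-compatible, ~120 MB instead of ~900 MB
FROM python:3.9-slim

WORKDIR /app

# Copy and install dependencies before the app code so this layer caches
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
CMD ["python", "app.py"]
```

`--no-cache-dir` tells pip not to keep downloaded wheels in the layer, saving a few more megabytes.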
2. Leverage Build Cache (Order Matters)
Docker builds images layer by layer. If a layer hasn't changed since the last build, Docker uses the cache instead of re-running the instruction.
The Golden Rule: Copy files that change least often first.
❌ The Slow Way
```dockerfile
COPY . .
RUN pip install -r requirements.txt
```
Why it's bad: If you change one line of code, the COPY . . layer changes. Docker invalidates the cache for this layer AND all subsequent layers. pip install will run every single time, even if dependencies didn't change.
✅ The Fast Way
```dockerfile
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
```
Why it's good: If you change your code, only the last COPY layer is invalidated. The pip install layer is cached because requirements.txt didn't change. This saves minutes on every build.
3. Minimize Layers (Combine RUN Commands)
Every RUN instruction creates a new layer. Too many layers can bloat your image. You can combine commands using `&&` and line continuations (`\`).
❌ Many Layers
```dockerfile
RUN apt-get update
RUN apt-get install -y vim
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*
```
✅ Single Layer
```dockerfile
RUN apt-get update && apt-get install -y --no-install-recommends \
    vim \
    curl \
    && rm -rf /var/lib/apt/lists/*
```
- `--no-install-recommends`: Prevents installing unnecessary suggested packages.
- `rm -rf /var/lib/apt/lists/*`: Cleans up the apt cache in the same layer. If you do this in a separate `RUN` command, the data is deleted in the new layer, but the old layer still contains the cached files, keeping the image large.
4. The .dockerignore File
We mentioned this in Article 4, but it deserves emphasis. Without a .dockerignore, Docker copies everything in your folder to the build context, including:
- `.git` folders (huge!)
- `node_modules` (huge!)
- Local environment files (`.env`) containing secrets.
- Build artifacts (`__pycache__`, `dist/`).
Create a .dockerignore:
```
.git
__pycache__
*.pyc
.env
venv
node_modules
*.md
```
This reduces the build context size, speeding up the initial transfer to the Docker Daemon.
5. Multi-Stage Builds (The Game Changer)
This is the most powerful optimization technique. It allows you to use one image to build your app and a different, smaller image to run it.
Scenario: You have a Go application.
- Stage 1: You need a large image with the Go compiler to build the binary.
- Stage 2: You only need the compiled binary to run. You don't need the compiler in production.
Example: Multi-Stage Dockerfile
```dockerfile
# --- Stage 1: Build ---
FROM golang:1.19 AS builder
WORKDIR /app
COPY . .
# CGO_ENABLED=0 produces a statically linked binary,
# so it runs on Alpine (musl) without glibc
RUN CGO_ENABLED=0 go build -o main .

# --- Stage 2: Run ---
FROM alpine:latest
WORKDIR /root/
# Copy only the binary from the builder stage
COPY --from=builder /app/main .
CMD ["./main"]
```
Result:
- Single Stage Image: ~800 MB (Includes Go compiler).
- Multi-Stage Image: ~15 MB (Includes only the binary and Alpine OS).
You can use this for Node.js (copy node_modules from builder to runner) or Python (compile bytecode in builder, run in runner).
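As a sketch of the same pattern for Node.js, assuming a typical project with a `package.json` and a `build` script that outputs to `dist/` (the `dist/index.js` entry point is a placeholder):

```dockerfile
# --- Stage 1: Build ---
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# --- Stage 2: Run ---
FROM node:18-alpine
WORKDIR /app
# Install only production dependencies in the runtime image
COPY package*.json ./
RUN npm ci --omit=dev
# Copy only the build output from the builder stage
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/index.js"]
```

The dev toolchain (TypeScript, bundlers, test frameworks) stays in the builder stage and never reaches production.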
6. Security Best Practices
Optimization isn't just about size; it's about safety.
1. Don't Run as Root
By default, containers run as the root user. If a hacker exploits your app, they have root access to the container.
Fix: Create a user and switch to it.
```dockerfile
FROM python:3.9-slim

# Create user
RUN useradd -m appuser

WORKDIR /app
COPY . .
RUN chown -R appuser /app

# Switch user
USER appuser

CMD ["python", "app.py"]
```
2. Scan for Vulnerabilities
Use tools to check your images for known security flaws.
- Docker Scan: `docker scan my-image`. Note that `docker scan` has been deprecated in favor of `docker scout`.
- Trivy: A popular open-source vulnerability scanner.
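For example, assuming Trivy is installed, scanning a local image is a single command (`my-image` is a placeholder for your image name):

```
# Scan a local image for known CVEs
trivy image my-image

# Fail a CI pipeline when high or critical vulnerabilities are found
trivy image --exit-code 1 --severity HIGH,CRITICAL my-image
```

The `--exit-code` form is useful in CI, where a non-zero exit status stops the build.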
3. No Secrets in Dockerfile
Never hardcode passwords or API keys in your Dockerfile.
- Bad: `ENV DB_PASSWORD=supersecret`
- Good: Pass secrets at runtime using Docker Secrets or environment variables via Docker Compose (covered in Article 8).
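As a sketch of the runtime approach, a Compose file can read the value from your shell or a local `.env` file (the service name `web` is illustrative):

```yaml
# docker-compose.yml
# The value comes from the host environment or an .env file at runtime,
# never from a layer baked into the image.
services:
  web:
    build: .
    environment:
      - DB_PASSWORD=${DB_PASSWORD}
```

Because the secret is injected when the container starts, it never appears in `docker history` or in the image you push to a registry.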
Comparison: Before vs. After
Let's look at the impact of these changes on a hypothetical Node.js app.
| Metric | Beginner Dockerfile | Optimized Dockerfile |
|---|---|---|
| Base Image | `node:latest` (full) | `node:18-alpine` |
| Layer Order | `COPY . .` then `npm install` | `COPY package.json` then `npm install` |
| Layers | Multiple RUN commands | Combined RUN commands |
| User | Root | Non-root user |
| Image Size | ~950 MB | ~110 MB |
| Build Time | 2 minutes (no cache) | 10 seconds (with cache) |
Summary Checklist
By the end of this article, you should be able to:
- Choose appropriate base images (`slim` vs `alpine`).
- Order Dockerfile instructions to maximize cache usage.
- Combine `RUN` commands to minimize layers.
- Implement multi-stage builds to reduce image size.
- Configure a non-root user for security.
- Use `.dockerignore` effectively.
What's Next?
Your images are now lean, mean, and secure. But so far, our containers have been stateless. If you delete a container, any data created inside it (like a database record or an uploaded file) is lost forever.
In Module 3, we tackle Data and Networking. In Article 6, we will learn how to persist data using Docker Volumes so your database survives container restarts.
Link: Read Article 6: Persistent Data with Volumes
Challenge: Take the Dockerfile you wrote in Article 4. Try to reduce its size using Multi-Stage builds or by switching to a -slim image. Share your size reduction in the comments!
Next Up: Persistent Data with Volumes