5. Optimizing Dockerfiles (Best Practices) | The Complete Docker Handbook.


Welcome to Article 5 of The Complete Docker Handbook.

In Article 4, you wrote your first Dockerfile and successfully built a custom image. Congratulations! But if you check the size of that image, you might be surprised. A simple "Hello World" Python app could easily result in an image over 900MB.

In production, large images mean:

  • Slower Builds: More data to transfer and process.
  • Slower Deploys: More data to pull onto your servers.
  • Security Risks: More installed packages mean more potential vulnerabilities.

In this article, we will transform you from a Docker beginner into a Docker practitioner. We will cover how to shrink your images, speed up your builds, and secure your containers.


1. Choose the Right Base Image

The foundation of your image determines its final size. Docker Hub offers various versions of the same image.

Image Tag          Description               Size (Approx)   Use Case
python:3.9         Full Debian-based image   ~900 MB         Debugging; complex deps requiring glibc
python:3.9-slim    Stripped-down Debian      ~120 MB         Recommended default; good balance
python:3.9-alpine  Based on Alpine Linux     ~50 MB          Minimal size. Caution: uses musl libc instead of glibc

The Alpine Trap

Alpine images are tiny, but they aren't always compatible. Some Python packages (like numpy or pandas) or Node.js binaries require specific Linux libraries (glibc) that Alpine doesn't have.

  • Tip: Start with -slim. Switch to -alpine only if you need to shave off megabytes and have verified compatibility.
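
As a concrete sketch, here is how the base-image choice looks in practice for a hypothetical Python app (app.py and requirements.txt are assumed file names, not from a specific project). Only the FROM line changes between variants:

```dockerfile
# Same app, different foundation: only the FROM line changes.
# FROM python:3.9        -> roughly 900 MB final image
# FROM python:3.9-alpine -> roughly 50 MB, but verify musl compatibility first
FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
# --no-cache-dir keeps pip's download cache out of the image layer
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

CMD ["python", "app.py"]
```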

2. Leverage Build Cache (Order Matters)

Docker builds images layer by layer. If a layer hasn't changed since the last build, Docker uses the cache instead of re-running the instruction.

The Golden Rule: Copy files that change least often first.

❌ The Slow Way

    COPY . .
    RUN pip install -r requirements.txt

Why it's bad: If you change one line of code, the COPY . . layer changes. Docker invalidates the cache for this layer AND all subsequent layers. pip install will run every single time, even if dependencies didn't change.

✅ The Fast Way

    COPY requirements.txt .
    RUN pip install -r requirements.txt
    COPY . .

Why it's good: If you change your code, only the last COPY layer is invalidated. The pip install layer is cached because requirements.txt didn't change. This saves minutes on every build.
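
The same ordering rule applies to any language. A Node.js sketch (server.js is an assumed entry point) would look like this:

```dockerfile
FROM node:18-alpine
WORKDIR /app

# Dependency manifests change rarely: copy them first so npm install stays cached
COPY package*.json ./
RUN npm install

# Source code changes often: copy it last so edits only invalidate this layer
COPY . .

CMD ["node", "server.js"]
```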


3. Minimize Layers (Combine RUN Commands)

Every RUN instruction creates a new layer. Files written in one layer still occupy space even if a later layer deletes them, so a long chain of RUN commands can bloat your image. You can combine commands using && and line continuations (\).

❌ Many Layers

    RUN apt-get update
    RUN apt-get install -y vim
    RUN apt-get install -y curl
    RUN rm -rf /var/lib/apt/lists/*

✅ Single Layer

    RUN apt-get update && apt-get install -y --no-install-recommends \
        vim \
        curl \
        && rm -rf /var/lib/apt/lists/*

  • --no-install-recommends: Prevents installing unnecessary suggested packages.
  • rm -rf ...: Cleans up apt cache in the same layer. If you do this in a separate RUN command, the data is deleted in the new layer, but the old layer still contains the cached files, keeping the image size large.

4. The .dockerignore File

We mentioned this in Article 4, but it deserves emphasis. Without a .dockerignore, Docker copies everything in your folder to the build context, including:

  • .git folders (huge!)
  • node_modules (huge!)
  • Local environment files (.env) containing secrets.
  • Build artifacts (__pycache__, dist/).

Create a .dockerignore:

    .git
    __pycache__
    *.pyc
    .env
    venv
    node_modules
    *.md

This reduces the build context size, speeding up the initial transfer to the Docker Daemon.


5. Multi-Stage Builds (The Game Changer)

This is the most powerful optimization technique. It allows you to use one image to build your app and a different, smaller image to run it.

Scenario: You have a Go application.

  • Stage 1: You need a large image with the Go compiler to build the binary.
  • Stage 2: You only need the compiled binary to run. You don't need the compiler in production.

Example: Multi-Stage Dockerfile

    # --- Stage 1: Build ---
    FROM golang:1.19 AS builder
    WORKDIR /app
    COPY . .
    # Disable CGO so the binary is statically linked and runs on Alpine (musl)
    RUN CGO_ENABLED=0 go build -o main .

    # --- Stage 2: Run ---
    FROM alpine:latest
    WORKDIR /root/
    # Copy only the binary from the builder stage
    COPY --from=builder /app/main .
    CMD ["./main"]

Result:

  • Single Stage Image: ~800 MB (Includes Go compiler).
  • Multi-Stage Image: ~15 MB (Includes only the binary and Alpine OS).

You can use this for Node.js (copy node_modules from builder to runner) or Python (compile bytecode in builder, run in runner).
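
The Node.js variant mentioned above could be sketched like this (server.js and the package manifests are assumed names; treat this as a pattern, not a drop-in file):

```dockerfile
# --- Stage 1: Install dependencies (full image with build tools) ---
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
# npm ci installs exactly what the lockfile specifies; --omit=dev skips devDependencies
RUN npm ci --omit=dev

# --- Stage 2: Run (small Alpine image) ---
FROM node:18-alpine
WORKDIR /app
# Copy only the production dependencies and the application code
COPY --from=builder /app/node_modules ./node_modules
COPY . .
CMD ["node", "server.js"]
```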


6. Security Best Practices

Optimization isn't just about size; it's about safety.

1. Don't Run as Root

By default, containers run as the root user. If a hacker exploits your app, they have root access to the container.

Fix: Create a user and switch to it.

    FROM python:3.9-slim

    # Create a non-root user
    RUN useradd -m appuser

    WORKDIR /app
    COPY . .
    RUN chown -R appuser /app

    # Switch to the non-root user
    USER appuser

    CMD ["python", "app.py"]

2. Scan for Vulnerabilities

Use tools to check your images for known security flaws.

  • Docker Scout: docker scout cves my-image (replaces the older, deprecated docker scan command).
  • Trivy: A popular open-source vulnerability scanner, e.g. trivy image my-image.

3. No Secrets in Dockerfile

Never hardcode passwords or API keys in your Dockerfile.

  • Bad: ENV DB_PASSWORD=supersecret
  • Good: Pass secrets at runtime using Docker Secrets or Environment Variables via Docker Compose (covered in Article 8).
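
If a secret is needed during the build itself (for example, to reach a private package registry), BuildKit secret mounts keep it out of every image layer. This is a sketch, not a prescription: setup_db.py and the db_password secret id are hypothetical names, and the secret file is supplied at build time with --secret.

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.9-slim
WORKDIR /app
COPY . .

# The secret is mounted only for this RUN step and is never stored in a layer.
# Build with: docker build --secret id=db_password,src=./db_password.txt .
RUN --mount=type=secret,id=db_password \
    DB_PASSWORD="$(cat /run/secrets/db_password)" python setup_db.py

CMD ["python", "app.py"]
```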

Comparison: Before vs. After

Let's look at the impact of these changes on a hypothetical Node.js app.

Metric        Beginner Dockerfile          Optimized Dockerfile
Base Image    node:latest (full)           node:18-alpine
Layer Order   COPY . . then npm install    COPY package.json then npm install
Layers        Multiple RUN commands        Combined RUN commands
User          Root                         Non-root user
Image Size    ~950 MB                      ~110 MB
Build Time    ~2 minutes (no cache)        ~10 seconds (with cache)

Summary Checklist

By the end of this article, you should be able to:

  • Choose appropriate base images (slim vs alpine).
  • Order Dockerfile instructions to maximize cache usage.
  • Combine RUN commands to minimize layers.
  • Implement Multi-Stage Builds to reduce image size.
  • Configure a non-root user for security.
  • Use .dockerignore effectively.

What's Next?

Your images are now lean, mean, and secure. But so far, our containers have been stateless. If you delete a container, any data created inside it (like a database record or an uploaded file) is lost forever.

In Module 3, we tackle Data and Networking. In Article 6, we will learn how to persist data using Docker Volumes so your database survives container restarts.

Link: Read Article 6: Persistent Data with Volumes


Challenge: Take the Dockerfile you wrote in Article 4. Try to reduce its size using Multi-Stage builds or by switching to a -slim image. Share your size reduction in the comments!

Next Up: Persistent Data with Volumes
