6. Persistent Data with Volumes | The Complete Docker Handbook.

Welcome to Article 6 of The Complete Docker Handbook.

In Article 5, we optimized our Dockerfiles for size and security. Now we face a critical challenge: Data Persistence.

If you have been following along, you might have noticed something troubling. When you delete a container (docker rm), any data created inside that container disappears forever. This is because containers are designed to be ephemeral (temporary).

This is fine for stateless web servers, but what about databases? What about uploaded files? If your database container is removed and recreated, for example during an image upgrade, you don't want to lose all your customer data.

In this article, we will solve the ephemeral problem using Docker Volumes. We will learn how to persist data outside the container's lifecycle, ensuring your information survives restarts, updates, and deletions.


The Problem: The Writable Layer

To understand volumes, we need to recall the container architecture from Article 3.

  1. Image Layers: Read-only.
  2. Container Layer: A thin writable layer on top.

When you write a file inside a running container, it is saved to this Container Layer.

  • The Catch: When the container is deleted, this writable layer is deleted with it.
  • The Limitation: This layer is tied to the specific container instance. You cannot easily share this data with another container.
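
To see this behavior for yourself, here is a minimal sketch. It assumes the public alpine image and a throwaway container name, layer-demo, both chosen only for illustration.

```shell
# Sketch: a file written to the container's writable layer disappears
# once the container is removed. "layer-demo" is an illustrative name.
demo_writable_layer() {
  # 1. Write a file into the writable layer of a new container.
  docker run --name layer-demo alpine sh -c 'echo "hello" > /note.txt'

  # 2. Remove the container; its writable layer is deleted with it.
  docker rm layer-demo

  # 3. A fresh container from the same image starts from the read-only
  #    image layers, so the file is not there and cat exits non-zero.
  docker run --rm alpine cat /note.txt
}
```

Call demo_writable_layer in a shell with Docker running. The last step should fail with a "No such file or directory" error, because the file only ever existed in the removed container's layer.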

[Visual Idea: Diagram showing data written to the container layer vanishing when the container is removed]


The Solution: Docker Volumes

Docker Volumes allow you to store data on the host machine (or a remote drive) and mount it into the container.

  • Persistence: Data exists independently of the container. You can delete the container, and the data remains safe on the host.
  • Sharing: Multiple containers can mount the same volume to share data.
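  • Performance: Volumes bypass the storage driver used by the container's writable layer, so write-heavy workloads often run faster on a volume. On Docker Desktop (Mac/Windows), volumes are also usually faster than bind mounts.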
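
The persistence and sharing points can be sketched with two short-lived containers. The volume name shared-data and the file name are illustrative assumptions, not something the rest of this article depends on.

```shell
# Sketch: two containers sharing one named volume.
# "shared-data" is an illustrative volume name.
demo_shared_volume() {
  docker volume create shared-data

  # Writer: mounts the volume at /data and writes a file into it.
  # --rm removes this container as soon as it exits.
  docker run --rm -v shared-data:/data alpine \
    sh -c 'echo "written by container A" > /data/message.txt'

  # Reader: a different container mounts the same volume and reads the file.
  docker run --rm -v shared-data:/data alpine cat /data/message.txt

  # Clean up the volume when you are done experimenting.
  docker volume rm shared-data
}
```

The second container prints the line written by the first, even though the first container no longer exists. The data belongs to the volume, not to either container.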

Types of Storage in Docker

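Docker offers three main ways to mount storage into a container. Volumes and bind mounts persist data; tmpfs mounts are deliberately temporary.

1. Volumes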

  • Location: Stored in a part of the host filesystem managed by Docker (/var/lib/docker/volumes/ on Linux).
  • Management: Created and managed via Docker CLI (docker volume create).
  • Best For: Production databases, persistent app data.
  • Security: Non-root users cannot easily access the data on the host.

2. Bind Mounts

  • Location: Anywhere on the host machine (e.g., /home/user/code).
  • Management: Managed by the host OS.
  • Best For: Development. Mounting your source code into the container so changes reflect instantly without rebuilding.
  • Security: Riskier. The container can access any file in the mounted path.
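
One way to reduce that risk is to mount the directory read-only by appending :ro to the mount. The sketch below assumes the alpine image and takes the host path as an argument; the function name is hypothetical.

```shell
# Sketch: bind-mount a host directory read-only so the container can
# read the files but not modify them.
run_with_readonly_mount() {
  host_dir="$1"   # absolute path on the host, e.g. /home/user/code

  # Listing the files works; the touch is expected to fail because the
  # mount is read-only.
  docker run --rm -v "${host_dir}:/app:ro" alpine \
    sh -c 'ls /app && touch /app/should-fail.txt'
}
```

For example, run_with_readonly_mount "$(pwd)" lists the current directory from inside the container and then reports a read-only file system error on the write attempt.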

3. Tmpfs Mounts

  • Location: Stored in the host system's memory (RAM).
  • Best For: Sensitive data (secrets) that should never be written to disk.
  • Limitation: Data is lost when the container stops. Linux only.
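
As a rough sketch, a tmpfs mount can be requested with the --tmpfs flag on a Linux host. The mount path /scratch and the file name are illustrative assumptions.

```shell
# Sketch: mount an in-memory tmpfs at /scratch (illustrative path).
demo_tmpfs() {
  # Write a file to the tmpfs, then list the mount to confirm its type.
  docker run --rm --tmpfs /scratch alpine \
    sh -c 'echo "temporary" > /scratch/token && mount | grep /scratch'
}
```

The mount listing should show /scratch with type tmpfs. Nothing written there reaches the host disk, and it is gone as soon as the container stops.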

Practical Example: Persisting a Database

Let's see volumes in action using PostgreSQL. We will compare running a database without a volume vs. with a volume.

Scenario A: Without Volume (Data Loss)

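  1. Run the container:
    Bash
    docker run --name db-test -e POSTGRES_PASSWORD=mysecret -d postgres
  2. Create data: (Imagine you connect and create a table).
  3. Delete the container:
    Bash
    docker rm -f db-test
  4. Restart: If you run the command again, you get a fresh, empty database. The data from the first container is not carried over. (The official image stores its data in an anonymous volume, but every new container gets a new one, so the old data is orphaned rather than reused.)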

Scenario B: With Volume (Data Persistence)

  1. Create a volume:
    Bash
    docker volume create pg-data
  2. Run the container with the volume:
    Bash
    docker run --name db-prod -e POSTGRES_PASSWORD=mysecret -v pg-data:/var/lib/postgresql/data -d postgres:16
    • -v pg-data:/var/lib/postgresql/data: This maps the Docker volume pg-data to the directory inside the container where Postgres stores its data.
    • postgres:16: The tag is pinned because /var/lib/postgresql/data is the data directory for this release. Newer major versions of the image may use a different location, so check the image documentation before changing the tag.
  3. Create data: (Create a table).
  4. Delete the container:
    Bash
    docker rm -f db-prod
  5. Restart:
    Bash
    docker run --name db-prod-new -e POSTGRES_PASSWORD=mysecret -v pg-data:/var/lib/postgresql/data -d postgres:16
  6. Result: Your data is still there! The new container attached to the existing volume.
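
Steps 3 and 6 can be made concrete by running psql through docker exec. This is a sketch: the customers table and its columns are hypothetical, and it assumes the default postgres superuser that the official image creates.

```shell
# Sketch: create a table in the first container, then check it from the
# replacement container. Table and column names are illustrative.

# Step 3: run against db-prod once PostgreSQL has finished starting.
create_sample_data() {
  docker exec db-prod psql -U postgres -c \
    "CREATE TABLE customers (id serial PRIMARY KEY, name text);"
  docker exec db-prod psql -U postgres -c \
    "INSERT INTO customers (name) VALUES ('Ada');"
}

# Step 6: run against db-prod-new after the restart in step 5.
verify_sample_data() {
  docker exec db-prod-new psql -U postgres -c "SELECT * FROM customers;"
}
```

If the volume was attached correctly, verify_sample_data returns the row inserted by create_sample_data, even though it runs in a different container.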

Managing Volumes

You can manage volumes using the docker volume command suite.

Command                 Description
docker volume create    Create a new volume.
docker volume ls        List all volumes.
docker volume inspect   View details (mountpoint, driver).
docker volume rm        Delete a specific volume.
docker volume prune     Remove unused local volumes. Since Docker 23.0 this only removes anonymous volumes unless you pass --all.

Inspecting a Volume:

Bash
docker volume inspect pg-data

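The output includes the Mountpoint, the path on the host where the data physically lives. On Docker Desktop, that path is inside Docker's Linux VM rather than directly on your Mac or Windows filesystem.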
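
If you only need the path, the --format flag accepts a Go template. The helper below is a small sketch; the function name is hypothetical.

```shell
# Sketch: print only the mountpoint of a volume using a Go template.
volume_mountpoint() {
  docker volume inspect --format '{{ .Mountpoint }}' "$1"
}
```

For example, volume_mountpoint pg-data prints a single path, which on a default Linux install is typically located under /var/lib/docker/volumes/.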


Bind Mounts for Development

While Volumes are best for production, Bind Mounts are essential for development workflows.

Imagine you are coding a Python app. Without a bind mount, every time you change a line of code, you would need to:

  1. Save file.
  2. Rebuild Docker image.
  3. Restart container.

With a Bind Mount, you map your local code folder to the container's app folder.

Bash
docker run -v "$(pwd)":/app -p 8000:8000 my-python-app
  • $(pwd): Expands to your current host directory path (use %cd% in Windows CMD or ${PWD} in PowerShell). The quotes keep paths that contain spaces intact.
  • /app: The path inside the container.

Result: You edit app.py on your laptop using VS Code, and the container sees the change immediately. No rebuild needed.
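
On Linux, files that the container creates inside a bind-mounted folder are owned by the container's user, which is often root. One common workaround, sketched below, is to run the container with your own user and group IDs. It reuses the my-python-app image from the command above and assumes that image can run as a non-root user, which is not guaranteed.

```shell
# Sketch: run the container with the host user's UID and GID so files
# created under /app are owned by you rather than root.
run_dev_as_host_user() {
  docker run --rm \
    --user "$(id -u):$(id -g)" \
    -v "$(pwd)":/app \
    -p 8000:8000 \
    my-python-app
}
```

If the application needs to write outside /app (for example to a cache directory owned by root in the image), this approach may need an image that was built with a matching non-root user.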


Best Practices & Security

  1. Use Named Volumes for Databases: Always use docker volume create or named volumes (-v my-db-data:/data) for databases. Avoid bind mounts for DB data in production, because host permission and filesystem differences can stop the database from starting or put its data at risk.
  2. Backup Volumes: Since volumes live on the host, you need to back them up. You can use a temporary container to tarball the volume data.
  3. Permissions: If you use Bind Mounts on Linux, ensure the user inside the container has permission to write to the host folder.
  4. Don't Mount Root: Never bind mount your host's root directory (/) to a container. This gives the container full control over your host OS.
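
The backup approach from point 2 can be sketched with a temporary alpine container that mounts both the volume and a host directory. The function names and the /source, /target, and /backup paths are illustrative assumptions.

```shell
# Sketch: archive a named volume into a tarball in a host directory.
backup_volume() {
  volume="$1"      # e.g. pg-data
  backup_dir="$2"  # absolute host path where the archive is written
  docker run --rm \
    -v "${volume}:/source:ro" \
    -v "${backup_dir}:/backup" \
    alpine tar czf "/backup/${volume}.tar.gz" -C /source .
}

# Sketch: unpack that archive into a new or empty volume.
restore_volume() {
  volume="$1"
  backup_dir="$2"
  docker run --rm \
    -v "${volume}:/target" \
    -v "${backup_dir}:/backup:ro" \
    alpine tar xzf "/backup/${volume}.tar.gz" -C /target
}
```

For example, backup_volume pg-data "$(pwd)" writes pg-data.tar.gz to the current directory. For a live database, stop the container first or use the database's own dump tool, since copying files from a running database may produce an inconsistent backup.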

Summary Checklist

By the end of this article, you should be able to:

  • Explain why container data is ephemeral.
  • Differentiate between Volumes, Bind Mounts, and Tmpfs.
  • Create and manage Docker Volumes.
  • Run a database container with persistent storage.
  • Use Bind Mounts for local development.

What's Next?

You now know how to store data permanently. But containers rarely work in isolation. Your web app needs to talk to your database. Your frontend needs to talk to your backend.

How do containers find each other? How do you expose ports securely?

In Article 7, we will dive into Docker Networking.

  • Bridge networks.
  • Container-to-container communication.
  • Port mapping deep dive.

Link: Read Article 7: Docker Networking Explained


Challenge: Run a MySQL container with a volume. Create a database and a table. Stop and remove the container. Start a new container attached to the same volume and verify the table still exists.
