
Postgres databases in Kubernetes

Vegard S. Hagen · 3575 words · 17 mins

Traditionally, an ideal Kubernetes workload should be stateless, meaning it should be able to run anywhere and scale freely. This requirement also severely limits which kinds of applications we can run, effectively ruling out stateful applications.

In this article we’ll take a look at how we can solve the run anywhere and scalability challenges with stateful workloads in Kubernetes. We’ll first create a rudimentary Postgres database workload to get a grasp of the basics, before we take a look at Helm charts to ease some of the hassle of deploying a database. Lastly we’ll take a look at database operators and how they might make things even easier. At the end we’ll deploy the CloudNative PG operator to automate everything for us.

I’m not a database admin, so if you are — or at least know more than me about databases, I’d be happy to get your feedback and corrections!

Why a standalone database?
#

Databases are used to store data in an organised and consistent manner, making the data easy to access and manipulate. Many applications intended for self-hosting opt for an embedded database like SQLite, though they often support external databases.

Assuming we decide to use an application’s embedded database we now have a stateful Kubernetes workload, thus running into both the node affinity and scalability issues previously mentioned.

To alleviate the affinity limitation for stateful applications we can use distributed storage solutions like Ceph or Longhorn, which free the workload from being bound to a single node. Though, with distributed storage we have no guarantee that the application natively supports concurrent writes, meaning we have to make sure only one instance of the application runs at any one time. In Kubernetes, we can guarantee this by e.g. using a StatefulSet, or the Recreate strategy for a Deployment, making sure replicas is set to 1.
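
For a concrete picture of that last point, here is a minimal sketch (the names and image are made up) of a single-replica Deployment using the Recreate strategy, so the old pod releases the volume before a new one starts:

# Hypothetical example: one replica, Recreate strategy
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateful-app
spec:
  replicas: 1
  strategy:
    type: Recreate   # stop the old pod before starting a new one
  selector:
    matchLabels:
      app: stateful-app
  template:
    metadata:
      labels:
        app: stateful-app
    spec:
      containers:
        - name: app
          image: example.com/stateful-app:latest
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: stateful-app-data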

A possible solution to both the affinity and scalability challenges is using a database like PostgreSQL. If done correctly we should be able to connect to the database from every node — solving the node affinity issue, and if we’re lucky the database connection will also deal with concurrent writes — solving the scalability issue.

Here we’re effectively shifting the state from one workload — the main application, to another workload — the database. This helps with separation of concerns, and by creating a database cluster we can potentially more easily configure high availability for the application. Backup of an external database is often also easier than with an embedded database, and is made even easier if you rely on an operator like CloudNative PG. We can potentially also collect multiple applications under different schemas in the same database, though this is not a recommended approach.

Databases in Kubernetes?
#

This article won’t discuss the benefits and drawbacks of running databases in Kubernetes. Instead, I’d like to guide you to a very informative article titled Recommended architectures for PostgreSQL in Kubernetes on the official CNCF blog.

I’ve decided that running databases inside the cluster makes sense in my homelab, though your mileage may vary.

A Simple Database in Kubernetes
#

The easiest way to get started is by creating a StatefulSet using Docker’s official Postgres image

# simple/stateful-set.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: postgres-simple
spec:
  serviceName: postgres-service
  selector:
    matchLabels:
      app: postgres
  replicas: 1
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: docker.io/postgres:17.0
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_DB
              value: postgres
            - name: POSTGRES_USER
              value: admin
            - name: POSTGRES_PASSWORD
              value: supers3cret

This spins up a containerised PostgreSQL 17 database in a pod with no persistent storage, meaning we will lose state when the pod is deleted.

If you’re inclined you can connect to the database by port-forwarding the pod which can be done by running

kubectl -n postgres-simple port-forward postgres-0 5432:5432

You should now be able to reach the database at localhost:5432, connecting to the postgres database with admin as the username and supers3cret as the password.
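
With the port-forward running, you can for instance test the connection with psql:

psql --host localhost --port 5432 --username admin --dbname postgres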

To actually connect to the database from a different workload in the cluster we can create a Headless Service which makes it possible to address a specific pod directly on <pod-name>.<service-name>.<namespace>.svc.cluster.local

We can create a headless service by setting clusterIP to None, e.g.

# simple/svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
  namespace: postgres-simple
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: postgres
  ports:
    - protocol: TCP
      port: 5432

This allows other workloads inside the cluster to reach the database at postgres-0.postgres-service.postgres-simple.svc.cluster.local.
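
A workload could then use a connection string along these lines (with the credentials from the StatefulSet above):

postgresql://admin:supers3cret@postgres-0.postgres-service.postgres-simple.svc.cluster.local:5432/postgres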

Note that the StatefulSetSpec API reference mentions that the serviceName Service must exist before the StatefulSet, so you probably have to delete and recreate the database StatefulSet above for the pod to be reachable.

We’re still not persisting state across pod deletion though; for that we need to attach a persistent volume.

Assuming you’re following my Kubernetes journey you might know that I’m currently running Talos on Proxmox and using the Proxmox CSI Plugin to provision storage, though however you provision storage should be fine.

If you’re just getting started and want to try things out, a simple hostPath mount might be enough, e.g. adding

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: postgres-simple
spec:
  ...
  template:
    ...
    spec:
      containers:
        - name: postgres
          ...
          volumeMounts:
            - name: local
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: local
          hostPath:
            path: <HOST_NODE_PATH>

to the StatefulSet above. Though if you intend for a more permanent installation you should instead consider using a local persistent volume.
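
If you already have a StorageClass that provisions volumes dynamically, another option is to let the StatefulSet request storage itself through volumeClaimTemplates, which creates one PersistentVolumeClaim per replica. A minimal sketch, with the storage class name as a placeholder:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: postgres-simple
spec:
  ...
  template:
    ...
    spec:
      containers:
        - name: postgres
          ...
          volumeMounts:
            - name: db
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: db
      spec:
        storageClassName: <STORAGE_CLASS_NAME>
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 4Gi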

You can find a complete example using Kustomize to deploy a Postgres database connected to an NFS volume in the Summary section.

Leveraging Helm charts
#

Instead of manually manufacturing and maintaining manifests for our Postgres installation we can use Helm charts to do the heavy lifting. A popular choice for this is the Bitnami PostgreSQL chart, and we can often find it bundled as an optional sub-chart to other Helm charts, e.g. Authelia and Immich.

The Bitnami PostgreSQL chart has a lot of buttons and levers, including LDAP integration, an easy way to create multiple read-only replicas, and settings for taking backups. Properly covering all the options is an article in itself, so I’ll allow myself to be brief by only spinning up a primary instance. You can take a look at all the options in the chart default values.yaml file.

Before we spin up a database using the Bitnami PostgreSQL chart we have to decide how we want to store the state. Assuming you already have a StorageClass that can automatically provision PersistentVolumes — e.g. one that uses Proxmox CSI Plugin, the easiest way to get started is to reference that in the chart values.yaml file, e.g.

global:
  defaultStorageClass: "<STORAGE_CLASS_NAME>"

If you don’t care about persistence you can instead disable it for the primary instance by adding

primary:
  persistence:
    enabled: false

to the values.yaml file.

As a third option I’ve chosen to manually create a PersistentVolumeClaim using a pre-defined StorageClass, though you can also find an example of a local-storage PV and PVC in the Summary section.

# helm/pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: postgres-helm
spec:
  storageClassName: proxmox-csi
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi

Applying the above PVC, we can reference it in the chart values.yaml file for the primary database instance by adding

primary:
  persistence:
    existingClaim: postgres-data

Next we should configure authentication for our database. A common pattern is to create an admin user for setting up the database using e.g. Flyway, and a less privileged user for the application runtime.

To do this we can append

auth:
  enablePostgresUser: true
  username: user
  existingSecret: auth

to the values.yaml file. The secret containing the passwords for the postgres admin user and the custom user can be created by running

kubectl create secret generic auth \
  --from-literal postgres-password=supers3cret \
  --from-literal password=s3cret \
  --namespace postgres-helm

using the default secret key names, which are postgres-password and password for the admin and custom user respectively.

If you even think about checking this secret into your version control system, you should take a look at another Bitnami project called Sealed Secrets, which is used to encrypt secrets.
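
As a rough sketch of how that might look (assuming the Sealed Secrets controller is installed in the cluster and the kubeseal CLI is available locally), you can pipe a dry-run Secret through kubeseal and commit only the sealed output:

kubectl create secret generic auth \
  --from-literal postgres-password=supers3cret \
  --from-literal password=s3cret \
  --namespace postgres-helm \
  --dry-run=client -o yaml | \
  kubeseal --format yaml > sealed-auth.yaml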

Having crafted a desired values.yaml configuration we can install the Helm chart by running

helm install postgresql \
  oci://registry-1.docker.io/bitnamicharts/postgresql \
  --values values.yaml \
  --namespace postgres-helm

which should greet us with a command we can use to connect to the database similar to

kubectl port-forward --namespace postgres-helm svc/postgresql 5432:5432 &
psql --host 127.0.0.1 --dbname postgres --port 5432 --username postgres

Note that the Helm chart creates both a headless Service for internal communication and a regular Service for external connections which is what we connect to using the above port-forwarding.
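
From inside the cluster, other workloads would typically go through the regular Service, with a connection string along the lines of (using the custom user from the auth secret above):

postgresql://user:s3cret@postgresql.postgres-helm.svc.cluster.local:5432/postgres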

The chart also creates a PodDisruptionBudget setting maxUnavailable primary database pods to 1 — which doesn’t really do much with our one replica, and a NetworkPolicy allowing all egress traffic and ingress traffic on port 5432 from any source. A ServiceAccount is also created, which I assume can be given special privileges through the chart.

Sub-charts
#

Configuring a standalone PostgreSQL database helps us explore the different options. For other Helm charts bundling the Bitnami PostgreSQL chart we can often directly configure the installation under the postgresql key in the main application chart, e.g.

postgresql:
  global:
    defaultStorageClass: proxmox-csi
  primary:
    persistence:
      enabled: true

CloudNative PG Operator
#

A step up from Helm charts in regard to automating the Kubernetes workload lifecycle is using operators. Operators extend the Kubernetes API with Custom Resource Definitions to allow for new functionality.

There are many database operators to choose from. Inspired by Jérôme Petazzoni’s and Alexandre Buisine’s KubeCon + CloudNativeCon Europe 2024 talk titled We Tested and Compared 6 Database Operators. The Results are In!, I decided to give CloudNative PG a go. An honorable mention goes to the Zalando postgres-operator, which also has an optional GUI.

Installation
#

Following the CloudNative PG documentation, we can install the operator either directly from the manifest

kubectl apply --server-side -f \
  https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.24/releases/cnpg-1.24.1.yaml

or through the CloudNative-PG Helm chart as

helm repo add cnpg https://cloudnative-pg.github.io/charts
helm upgrade --install cnpg \
  --namespace cnpg-system \
  --create-namespace \
  cnpg/cloudnative-pg

which will install the operator in the cnpg-system namespace. Note that CloudNative PG requires server-side apply to ensure compatibility with Kubernetes 1.29 and higher. Keep this in mind if you’re planning to install CloudNative PG using other methods, e.g. using Argo CD Applications.

To see if the operator is up and running, we can check that the associated deployment is ready by executing

kubectl get deploy -n cnpg-system 

For Operator customisation see the Helm chart values file and the documentation on Operator Configuration.

With the operator operational we’re ready to deploy a Postgres Database Cluster using the postgresql.cnpg.io/v1 Cluster CRD. This is also available as a Cluster Helm chart from the same place as the operator.
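
If you prefer Helm here as well, deploying a database through that chart might look something like this (the cnpg/cluster chart comes from the same Helm repository added above; double-check the chart name and values against its own documentation):

helm upgrade --install mydb \
  --namespace cnpg-database \
  --create-namespace \
  cnpg/cluster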

Kubectl Plugin
#

CloudNative PG comes with an optional kubectl plugin to make it easier to manage CloudNative PG resources. You can install the plugin using the time-honoured tradition of curling directly to a shell

curl -sSfL \
  https://github.com/cloudnative-pg/cloudnative-pg/raw/main/hack/install-cnpg-plugin.sh | \
  sudo sh -s -- -b /usr/local/bin

The plugin documentation also lists other ways of installing the plugin if you prefer your packages signed. There is also an addon for the popular K9s TUI.
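
Once installed, the plugin can for example report the state of a database cluster:

kubectl cnpg status <CLUSTER_NAME> --namespace <NAMESPACE>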

Basic Database Cluster
#

Assuming you’ve configured a default StorageClass that can automatically provision volumes for you, the simplest database “cluster” manifest you can create is

# cnpg/simplest.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: simple-database
  namespace: cnpg-database
spec:
  storage:
    size: 4Gi

This will provision a 4 GiB volume for our database, create RBAC objects (ServiceAccount, Role, and RoleBinding), certificates, a PodDisruptionBudget, and three different Services, along with a bootstrap Job, before finally starting the database in a new Pod. A Secret containing the database connection info is also created.

The certificates are used to facilitate TLS encryption and can be user provided with e.g. Cert-manager, or manually created.

Note that the default spec.instances value is 1, which spins up only a primary database with no replicas, i.e. not really a database cluster.
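
To check whether the cluster has come up you can query the Cluster resource directly, or use the kubectl plugin from earlier:

kubectl get clusters.postgresql.cnpg.io --namespace cnpg-database
kubectl cnpg status simple-database --namespace cnpg-database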

Database Services
#

The Cluster resource creates three Services (suffixed rw, ro and r). In the case of multiple instances the rw-service (read/write) points to the primary instance, the ro-service (read-only) points to the replicas, and the r-service (read) points to any database instance. For further customisation, the ro- and r-services can be disabled. It’s also possible to add custom services if you e.g. require a LoadBalancer type Service.

---
title: Database services and their connections
---
flowchart TB
  rw{{"Read/Write"}}
  ro{{"Read Only"}}
  r{{"Read"}}
  primary[("Primary")]
  repl1[("Replica")]
  repl2[("Replica")]
  rw --> primary
  ro --> repl1
  ro --> repl2
  r --> primary
  r --> repl1
  r --> repl2
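
The Services follow a <CLUSTER_NAME>-<suffix> naming convention, so for the simple-database cluster above an application would reach the primary at

simple-database-rw.cnpg-database.svc.cluster.local:5432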

Bootstrapping
#

The bootstrap job can be used to create a new cluster from scratch, or from an existing PostgreSQL cluster, either directly or indirectly through a backup.

In this article we’ll only take a look at bootstrapping a fresh cluster, though I encourage you to get to know the other bootstrapping possibilities for recovery purposes.

Database Users
#

CloudNative PG automatically generates a postgres and an app user with a random password.

If no user password is configured using the .spec.bootstrap.initdb.secret field, a <CLUSTER_NAME>-app secret is created containing the user password and database connection details. The .spec.bootstrap.initdb section can also be used to change the username of the custom user (via the owner field) — in which case the generated secret will be suffixed with that username instead of app.
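
The generated secret holds both the credentials and ready-made connection details. As a sketch (the exact key names, like password and uri, may differ between operator versions, so verify against your own secret), the password for the simple-database cluster could be read with

kubectl get secret simple-database-app --namespace cnpg-database \
  --output jsonpath='{.data.password}' | base64 -d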

The postgres superuser (and the postgres database) is meant to be used only by the operator to configure the database cluster. Superuser access can however be enabled by setting .spec.enableSuperuserAccess to true, which creates a <CLUSTER_NAME>-superuser secret similar to the app-user secret.

To create additional users we can use the declarative Database Role Management options to configure them.

Custom Database Cluster
#

The CloudNative PG Cluster resource has a lot of options. I recommend you take a look at the samples along with the API Reference to explore all the possibilities.

To go through some of the options I’ve prepared the following custom three-instance (line 9) cluster — one primary and two replicas. You can find the full example in the Summary section.

In the example cluster, the storageClass is explicitly set (line 12) to be proxmox-csi (more details in this article). The database version is pinned in line 15 to be 16.4. Against better judgement we’ve enabled superuser access (line 20) with the password provided in the secret on line 22.

# cnpg/custom-cluster.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: custom-cluster
  namespace: cnpg-database
spec:

  instances: 3

  storage:
    storageClass: proxmox-csi
    size: 2Gi

  imageName: ghcr.io/cloudnative-pg/postgresql:16.4

  monitoring:
    enablePodMonitor: true

  enableSuperuserAccess: true
  superuserSecret:
    name: postgres-credentials

  bootstrap:
    initdb:

      database: appdb

      owner: user
      secret:
        name: user-credentials

      postInitApplicationSQLRefs:
        configMapRefs:
          - name: post-init
            key: post-init.sql

  postgresql:
    parameters:
      timezone: "Europe/Oslo"

  managed:

    roles:
      - name: read
        passwordSecret:
          name: read-credentials
        ensure: present
        comment: Read Only user
        login: true
        inherit: true
        inRoles:
          - pg_read_all_data

    services:
      disabledDefaultServices: [ "ro" ]
      additional:
        - selectorType: rw
          updateStrategy: patch
          serviceTemplate:
            metadata:
              name: custom-database-ext
              annotations:
                io.cilium/lb-ipam-ips: 192.168.1.233
            spec:
              type: LoadBalancer

In the bootstrap section we’re creating a new database called appdb (line 27) with an owner called user (line 29). We again provide the login password using a named Secret (line 31). In the post init section we’ve referenced a ConfigMap with some illustrative instructions (lines 33-36).

We lightly touch the postgresql parameters to set the timezone to something appropriate (line 40).

Next we utilise the managed configuration to create a read-only user called read and disable the default read-only (ro) service on line 56. We also create a custom LoadBalancer type Service to allow for external connections. Here we’ve also taken advantage of Cilium’s LB-IPAM functionality (line 64) to declaratively give the Service an IP we can reach outside the cluster (more details in this article).
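
With the LB-IPAM annotation in place, the read role could then connect from outside the cluster along these lines (assuming 192.168.1.233 is routable from your machine):

psql --host 192.168.1.233 --port 5432 --username read --dbname appdb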

Backups and Recovery
#

CloudNative PG relies on Barman to perform physical and Write-Ahead Log (WAL) backups, though if you prefer a logical backup of your database, they describe how to perform one using pg_dump in the Emergency backup section of their documentation.

This “emergency” procedure boils down to running

kubectl exec "<PRIMARY_CLUSTER_POD>" -c postgres \
  -- pg_dump -Fc -d "<DATABASE>" > "<DATABASE>.dump"

to save the logical backup on your work machine. To restore the logical backup you can then run

kubectl exec -i "<NEW_PRIMARY_CLUSTER_POD>" -c postgres \
  -- pg_restore --no-owner --role="<USER>" -d "<DATABASE>" --verbose < "<DATABASE>.dump"

Regular backups using CloudNative PG require that the database Cluster is provided with a backup configuration to perform either a Kubernetes volume snapshot — backed by a StorageClass that supports it, or an object store backup using the Barman Cloud tool against an S3-compatible object store like e.g. MinIO.

The recommended way of performing backups with CloudNative PG is their ScheduledBackup resource, though they also offer on-demand backups using the Backup resource. Assuming you’ve configured the backup section of your Cluster correctly, an on-demand backup can be created using the kubectl plugin mentioned earlier by executing

kubectl cnpg backup <CLUSTER_NAME>

which should start creating a backup of your database.
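
For reference, here is a hedged sketch of what the backup section of a Cluster and a nightly ScheduledBackup might look like when targeting an S3-compatible object store (bucket, endpoint, and secret names are placeholders; see the CloudNative PG backup documentation for the full set of options):

# Sketch: object store backup configuration plus a nightly ScheduledBackup
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: custom-cluster
  namespace: cnpg-database
spec:
  # ...
  backup:
    retentionPolicy: 30d
    barmanObjectStore:
      destinationPath: s3://<BUCKET_NAME>/
      endpointURL: https://<S3_ENDPOINT>
      s3Credentials:
        accessKeyId:
          name: s3-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-credentials
          key: ACCESS_SECRET_KEY
---
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: nightly-backup
  namespace: cnpg-database
spec:
  schedule: "0 0 2 * * *"  # six-field cron expression, every night at 02:00
  backupOwnerReference: self
  cluster:
    name: custom-cluster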

Monitoring
#

CloudNative PG readily integrates with both Prometheus and Grafana.

To make it easy to get started they’ve also prepared a Grafana dashboard template to show relevant metrics for your different Cluster resources.

For more details see the Monitoring section in the CloudNative PG documentation.

Other features
#

Also worth noting are CloudNative PG’s Image Catalog feature and the Rolling Updates options, which allow for controlled upgrades of an existing database cluster.

For more advanced setups, CloudNative PG also supports Connection Pooling using PgBouncer, Tablespaces, as well as the PostGIS extension, and pprof for profiling along with other nifty features.

For inspiration and further reading on how to use the CloudNative PG operator, I found this article to be interesting.

Summary
#

To avoid too many details in the main text I’ve put some of them in this summary to hopefully provide better examples. You can also find these examples on GitLab.

Kustomize
#

# kustomize/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

configMapGenerator:
  - name: database
    namespace: postgres-simple
    literals:
      - POSTGRES_DB=postgres

resources:
  - ns.yaml
  - svc.yaml
  - local-pv.yaml
  - local-pvc.yaml
  - credentials.yaml
  - stateful-set.yaml
# kustomize/ns.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: postgres-simple
# kustomize/svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
  namespace: postgres-simple
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: postgres
  ports:
    - protocol: TCP
      port: 5432
# kustomize/credentials.yaml
apiVersion: v1
kind: Secret
metadata:
  name: credentials
  namespace: postgres-simple
type: Opaque
stringData:
  POSTGRES_USER: 'admin'
  POSTGRES_PASSWORD: 'supers3cret'
# kustomize/stateful-set.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: postgres-simple
spec:
  serviceName: postgres-service
  selector:
    matchLabels:
      app: postgres
  replicas: 1
  template:
    metadata:
      labels:
        app: postgres
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: <NFS_UID>
        runAsGroup: <NFS_GID>
      containers:
        - name: postgres
          image: docker.io/postgres:17.0
          ports:
            - containerPort: 5432
          envFrom:
            - configMapRef:
                name: database
            - secretRef:
                name: credentials
          volumeMounts:
            - name: db
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: db
          nfs:
            server: "<SERVER>"
            path: "<PATH>"

Bitnami Helm chart
#

# helm/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ns.yaml
  - local-pv.yaml
  - local-pvc.yaml
  - auth-secret.yaml

helmCharts:
  - name: postgresql
    repo: oci://registry-1.docker.io/bitnamicharts
    releaseName: postgresql
    namespace: postgres-helm
    version: 16.0.3
    valuesFile: values.yaml
# helm/values.yaml
image:
  tag: 17.0.0
auth:
  username: app
  existingSecret: auth
primary:
  persistence:
    existingClaim: postgres-data
# helm/ns.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: postgres-helm
# helm/local-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-data
spec:
  capacity:
    storage: 8Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-db-storage
  local:
    path: "<LOCAL_PATH>"
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - { key: kubernetes.io/hostname, operator: In, values: [ "<NODE_NAME>" ] }
# helm/local-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: postgres-helm
spec:
  storageClassName: local-db-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
# helm/auth-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: auth
  namespace: postgres-helm
type: Opaque
stringData:
  password: 's3cret'
  postgres-password: 'supers3cret'

Cloud Native PG
#

# cnpg/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ns.yaml
  - postgres-credentials.yaml
  - user-credentials.yaml
  - read-credentials.yaml
  - cm-post-init.yaml
  - custom-cluster.yaml
# cnpg/ns.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: cnpg-database
# cnpg/postgres-credentials.yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
  namespace: cnpg-database
type: kubernetes.io/basic-auth
stringData:
  username: 'postgres'
  password: 'supers3cret'
# cnpg/user-credentials.yaml
apiVersion: v1
kind: Secret
metadata:
  name: user-credentials
  namespace: cnpg-database
type: kubernetes.io/basic-auth
stringData:
  username: 'user'
  password: 's3cret'
# cnpg/read-credentials.yaml
apiVersion: v1
kind: Secret
metadata:
  name: read-credentials
  namespace: cnpg-database
type: kubernetes.io/basic-auth
stringData:
  username: 'read'
  password: 'letmelook'
# cnpg/cm-post-init.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: post-init
  namespace: cnpg-database
data:
  post-init.sql: |
    create table for_two (seats integer);
    insert into for_two (seats) values (2);
    grant all privileges on table for_two to "user";    
# cnpg/custom-cluster.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: custom-cluster
  namespace: cnpg-database
spec:

  instances: 3

  storage:
    storageClass: proxmox-csi
    size: 2Gi

  imageName: ghcr.io/cloudnative-pg/postgresql:16.4

  monitoring:
    enablePodMonitor: true

  enableSuperuserAccess: true
  superuserSecret:
    name: postgres-credentials

  bootstrap:
    initdb:

      database: appdb

      owner: user
      secret:
        name: user-credentials

      postInitApplicationSQLRefs:
        configMapRefs:
          - name: post-init
            key: post-init.sql

  postgresql:
    parameters:
      timezone: "Europe/Oslo"

  managed:

    roles:
      - name: read
        passwordSecret:
          name: read-credentials
        ensure: present
        comment: Read Only user
        login: true
        inherit: true
        inRoles:
          - pg_read_all_data

    services:
      disabledDefaultServices: [ "ro" ]
      additional:
        - selectorType: rw
          updateStrategy: patch
          serviceTemplate:
            metadata:
              name: custom-database-ext
              annotations:
                io.cilium/lb-ipam-ips: 192.168.1.233
            spec:
              type: LoadBalancer