Traditionally, an ideal Kubernetes workload should be stateless, meaning it can run anywhere and scale freely. This ideal also severely limits which kinds of applications we can run, effectively excluding stateful applications.
In this article we’ll take a look at how we can solve the run anywhere and scalability challenges with stateful workloads in Kubernetes. We’ll first create a rudimentary Postgres database workload to get a grasp of the basics, before we take a look at Helm charts to ease some of the hassle of deploying a database. Lastly we’ll take a look at database operators and how they might make things even easier. At the end we’ll deploy the CloudNative PG operator to automate everything for us.
I’m not a database admin, so if you are — or at least know more than me about databases, I’d be happy to get your feedback and corrections!
Why a standalone database?#
Databases are used to store data in an organised and consistent manner, making the data easy to access and manipulate. Many applications intended for self-hosting opt for an embedded database like SQLite, though they often support external databases.
Assuming we decide to use an application’s embedded database we now have a stateful Kubernetes workload, thus running into both the node affinity and scalability issues previously mentioned.
To alleviate the affinity limitation for stateful applications we can use distributed storage solutions like Ceph or Longhorn, which free us from binding the workload to a single node. With distributed storage, however, we have no guarantee that the application natively supports concurrent writes, meaning we have to make sure only one instance of the application runs at any one time. In Kubernetes we can guarantee this by e.g. using a StatefulSet, or the Recreate strategy for a Deployment, in both cases making sure replicas is set to 1.
A possible solution to both the affinity and scalability challenges is using a database like PostgreSQL. If done correctly we should be able to connect to the database from every node — solving the node affinity issue — and, if we’re lucky, the database will also handle concurrent writes for us — solving the scalability issue.
Here we’re effectively shifting the state from one workload — the main application, to another workload — the database. This helps with separation of concerns, and by creating a database cluster we can potentially more easily configure high availability of the application. Backup of an external database is often also easier than with an embedded database, and is made even easier if you rely on an operator like CloudNative PG. We can potentially also collect multiple applications under different schemas in the same database, though this is not a recommended approach.
Databases in Kubernetes?#
This article won’t discuss the benefits and drawbacks of running databases in Kubernetes. Instead, I’d like to guide you to a very informative article titled Recommended architectures for PostgreSQL in Kubernetes on the official CNCF blog.
I’ve decided that running databases inside the cluster makes sense in my homelab, though your mileage may vary.
A Simple Database in Kubernetes#
The easiest way to get started is by creating a StatefulSet using Docker’s official Postgres image.
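A minimal manifest could look something like the following sketch, where the namespace and credentials match the rest of this section (a more complete Kustomize-based example is listed in the Summary):
# a minimal StatefulSet sketch, no persistent storage
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: postgres-example
spec:
  serviceName: postgres-service
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: docker.io/postgres:17.0
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_DB
              value: postgres
            - name: POSTGRES_USER
              value: admin
            - name: POSTGRES_PASSWORD
              value: supers3cret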
This spins up a containerised PostgreSQL 17 database in a pod with no persistent storage, meaning we will lose state when the pod is deleted.
If you’re inclined, you can connect to the database by port-forwarding the pod, which can be done by running
kubectl -n postgres-example port-forward postgres-0 5432:5432
You should now be able to connect to the postgres database at localhost:5432 using admin as the username and supers3cret as the password.
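With e.g. the psql client installed locally, the connection can be verified with something along the lines of
psql "host=localhost port=5432 dbname=postgres user=admin password=supers3cret"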
To actually connect to the database from a different workload in the cluster we can create a Headless Service, which makes it possible to address a specific pod directly at <pod-name>.<service-name>.<namespace>.svc.cluster.local. We can create a headless service by setting clusterIP to None.
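Mirroring the kustomize/svc.yaml listed in the Summary, here using the postgres-example namespace from this section, such a Service can look like
# headless Service for the StatefulSet above
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
  namespace: postgres-example
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: postgres
  ports:
    - protocol: TCP
      port: 5432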
This allows other workloads inside the cluster to reach the database at postgres-0.postgres-service.postgres-example.svc.cluster.local.
Note that the StatefulSetSpec API reference mentions that the serviceName Service must exist before the StatefulSet, so you probably have to delete and recreate the database StatefulSet above for the pod to be reachable.
We’re still not persisting state across pod deletions though; for that we need to attach a persistent volume.
Assuming you’re following my Kubernetes journey you might know that I’m currently running Talos on Proxmox and using the Proxmox CSI Plugin to provision storage, though however you provision storage should be fine.
If you’re just getting started and want to try things out, a simple hostPath mount might be enough, e.g. adding
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: postgres-example
spec:
  ...
  template:
    ...
    spec:
      containers:
        - name: postgres
          ...
          volumeMounts:
            - name: local
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: local
          hostPath:
            path: <HOST_NODE_PATH>
to the StatefulSet above. Though if you intend for a more permanent installation you should instead consider using a local persistent volume.
You can find a complete example using Kustomize to deploy a Postgres database connected to an NFS volume in the Summary section.
Leveraging Helm charts#
Instead of manually manufacturing and maintaining manifests for our Postgres installation we can use Helm charts to do the heavy lifting. A popular choice for this is the Bitnami PostgreSQL chart, and we can often find it bundled as an optional sub-chart to other Helm charts, e.g. Authelia and Immich.
The Bitnami PostgreSQL chart has a lot of buttons and levers, including LDAP integration, an easy way to create multiple read-only replicas, and settings for taking backups. Properly covering all the options is an article in itself, so I’ll allow myself to be brief by only spinning up a primary instance. You can take a look at all the options in the chart default values.yaml file.
Before we spin up a database using the Bitnami PostgreSQL chart we have to decide how we want to store the state.
Assuming you already have a StorageClass that can automatically provision PersistentVolumes — e.g. one that uses the Proxmox CSI Plugin — the easiest way to get started is to reference that in the chart values.yaml file, e.g.
global:
  defaultStorageClass: "<STORAGE_CLASS_NAME>"
If you don’t care about persistence you can instead disable it for the primary instance by adding
primary:
  persistence:
    enabled: false
to the values.yaml file.
As a third option I’ve chosen to manually create a PersistentVolumeClaim using a pre-defined StorageClass, though you can also find an example of a local-storage PV and PVC in the Summary section.
# helm/pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: postgres-helm
spec:
  storageClassName: proxmox-csi
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
Applying the above PVC, we can reference it in the chart values.yaml file for the primary database instance by adding
primary:
  persistence:
    existingClaim: postgres-data
Next we should configure authentication for our database. A common pattern is to create an admin user for setting up the database using e.g. Flyway, and a less privileged user for the application runtime.
To do this we can append
auth:
  enablePostgresUser: true
  username: user
  existingSecret: auth
to the values.yaml file.
The secret containing the passwords for the postgres superuser and the custom user can be created by running
kubectl create secret generic auth \
--from-literal postgres-password=supers3cret \
--from-literal password=s3cret \
--namespace postgres-helm
using the default secret key names, which are postgres-password and password for the admin and custom user respectively.
If you even think about checking this secret into your version control system you should take a look at another Bitnami project called Sealed Secrets, which is used to encrypt secrets.
Having crafted a desired values.yaml configuration we can install the Helm chart by running
helm install postgresql \
oci://registry-1.docker.io/bitnamicharts/postgresql \
--values values.yaml \
--namespace postgres-helm
which should greet us with a command we can use to connect to the database similar to
kubectl port-forward --namespace postgres-helm svc/postgresql 5432:5432 &
psql --host 127.0.0.1 --dbname postgres --port 5432 --username postgresql
Note that the Helm chart creates both a headless Service for internal communication and a regular Service for external connections, which is the one we connect to with the above port-forwarding.
The chart also creates a PodDisruptionBudget setting maxUnavailable primary database pods to 1 — which doesn’t really do much with our one replica — and a NetworkPolicy allowing all egress traffic and ingress traffic on port 5432 from any source. A ServiceAccount is also created, which I assume can be given special privileges through the chart.
Sub-charts#
Configuring a standalone PostgreSQL database helps us explore the different options.
For other Helm charts bundling the Bitnami PostgreSQL chart we can often directly configure the installation under the postgresql key in the main application chart, e.g.
postgresql:
  global:
    defaultStorageClass: proxmox-csi
  primary:
    persistence:
      enabled: true
CloudNative PG Operator#
A step up from Helm charts in regard to automating the Kubernetes workload lifecycle is using operators. Operators extend the Kubernetes API with Custom Resource Definitions to allow for new functionality.
There are many database operators to choose from. Inspired by Jérôme Petazzoni’s and Alexandre Buisine’s KubeCon + CloudNativeCon Europe 2024 talk titled We Tested and Compared 6 Database Operators. The Results are In! I decided to give CloudNative PG a go. An honorable mention is the Zalando postgres-operator which also has an optional GUI.
Installation#
Following the CloudNative PG documentation, we can install the operator either directly from the manifest
kubectl apply --server-side -f \
https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.24/releases/cnpg-1.24.1.yaml
or through the CloudNative-PG Helm chart as
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm upgrade --install cnpg \
--namespace cnpg-system \
--create-namespace \
cnpg/cloudnative-pg
which will install the operator in the cnpg-system namespace.
Note that CloudNative PG requires Server-Side Apply to ensure compatibility with Kubernetes 1.29 and higher. Keep this in mind if you’re planning to install CloudNative PG using other methods, e.g. using Argo CD Applications.
To see if the operator is up and running, we can check that the associated deployment is ready by executing
kubectl get deploy -n cnpg-system
For Operator customisation see the Helm chart values file and the documentation on Operator Configuration.
With the operator operational we’re ready to deploy a Postgres Database Cluster using the postgresql.cnpg.io/v1 Cluster CRD. This is also available as a Cluster Helm chart from the same place as the operator.
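As a rough sketch, assuming the chart is published as cnpg/cluster in the repository we added above, spinning up a cluster through Helm could look something like
helm upgrade --install database \
  --namespace cnpg-database \
  --create-namespace \
  cnpg/cluster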
Kubectl Plugin#
CloudNative PG comes with an optional kubectl plugin that makes it easier to manage CloudNative PG resources. You can install the plugin using the time-honoured tradition of curling directly to a shell
curl -sSfL \
https://github.com/cloudnative-pg/cloudnative-pg/raw/main/hack/install-cnpg-plugin.sh | \
sudo sh -s -- -b /usr/local/bin
The plugin documentation also lists other ways of installing the plugin if you prefer your packages signed. There is also an addon for the popular K9s TUI.
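Once installed, the plugin can e.g. give us a quick overview of a cluster with its status command:
kubectl cnpg status <CLUSTER_NAME> -n <NAMESPACE>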
Basic Database Cluster#
Assuming you’ve configured a default StorageClass that can automatically provision volumes for you, the simplest database “cluster” manifest you can create is
# cnpg/simplest.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: simple-database
  namespace: cnpg-database
spec:
  storage:
    size: 4Gi
This will provision a 4 GiB volume for our database, create RBAC objects — ServiceAccount, Role, and RoleBinding — certificates, a PodDisruptionBudget, and three different Services, along with a bootstrap Job, before finally starting the database in a new Pod. A Secret containing the database connection info is also created.
The certificates are used to facilitate TLS encryption and can be user provided with e.g. Cert-manager, or manually created.
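To get an overview of everything the operator created we can run something like
kubectl -n cnpg-database get cluster,pods,svc,secrets
where kubectl get cluster should resolve to the CloudNative PG Cluster CRD as long as no other installed API registers a cluster resource.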
Note that the default spec.instances value is 1, which spins up only a primary database with no replicas, i.e. not really a database cluster.
Database Services#
The Cluster resource creates three Services, suffixed -rw, -ro and -r. In the case of multiple instances the rw-service (read/write) will point to the primary instance, the ro-service (read-only) points to the replicas, and the r-service (read) points to any database instance. For further customisation, the ro- and r-services can be disabled. It’s also possible to add custom services, if you e.g. require a LoadBalancer type Service.
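To illustrate, a hypothetical application Deployment could point at the rw Service and pull credentials from the generated app Secret, here assuming the simple-database cluster from earlier (more on the generated Secrets in the next section):
# assumed fragment of a container spec in an application Deployment
env:
  - name: DB_HOST
    value: simple-database-rw.cnpg-database.svc.cluster.local
  - name: DB_PORT
    value: "5432"
  - name: DB_USER
    valueFrom:
      secretKeyRef:
        name: simple-database-app   # the generated <CLUSTER_NAME>-app Secret
        key: username
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: simple-database-app
        key: password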
Bootstrapping#
The bootstrap job can be used to create a new cluster from scratch, or from an existing PostgreSQL cluster, either directly or indirectly through a backup.
In this article we’ll only take a look at bootstrapping a fresh cluster, though I encourage you to get to know the other bootstrapping possibilities for recovery purposes.
Database Users#
CloudNative PG automatically generates a postgres and an app user with a random password. If no user password is configured using the .spec.bootstrap.initdb.secret field, a <CLUSTER_NAME>-app secret is created containing the user password and database connection details. The .spec.bootstrap.initdb section can also be used to change the username of the custom user — in which case the generated secret will be postfixed with that username instead of app. The postgres superuser — and the postgres database — are supposed to be used only by the operator to configure the database cluster, though superuser access can be enabled by setting .spec.enableSuperuserAccess to true, which will create a <CLUSTER_NAME>-superuser secret similar to the app-user secret.
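To peek at the generated credentials, again assuming the simple-database cluster from earlier, something like this should work:
kubectl -n cnpg-database get secret simple-database-app \
  -o jsonpath='{.data.password}' | base64 -d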
To create additional users we can use the declarative Database Role Management options to configure them.
Custom Database Cluster#
The CloudNative PG Cluster resource has a lot of options. I recommend you take a look at the samples along with the API Reference to explore all the possibilities.
To go through some of the options I’ve prepared the following custom three-instance (line 9) cluster — one primary and two replicas. You can find the full example in the Summary section.
In the example cluster, the storageClass is explicitly set (line 12) to be proxmox-csi (more details in this article). The database version is pinned in line 15 to be 16.4. Against better judgement we’ve enabled superuser access (line 20) with the password provided in the secret on line 22.
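For reference, here is the full manifest, identical to the cnpg/custom-cluster.yaml listing in the Summary:
# cnpg/custom-cluster.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: custom-cluster
  namespace: cnpg-database
spec:
  instances: 3
  storage:
    storageClass: proxmox-csi
    size: 2Gi
  imageName: ghcr.io/cloudnative-pg/postgresql:16.4
  monitoring:
    enablePodMonitor: true
  enableSuperuserAccess: true
  superuserSecret:
    name: postgres-credentials
  bootstrap:
    initdb:
      database: appdb
      owner: user
      secret:
        name: user-credentials
      postInitApplicationSQLRefs:
        configMapRefs:
          - name: post-init
            key: post-init.sql
  postgresql:
    parameters:
      timezone: "Europe/Oslo"
  managed:
    roles:
      - name: read
        passwordSecret:
          name: read-credentials
        ensure: present
        comment: Read Only user
        login: true
        inherit: true
        inRoles:
          - pg_read_all_data
    services:
      disabledDefaultServices: [ "ro" ]
      additional:
        - selectorType: rw
          updateStrategy: patch
          serviceTemplate:
            metadata:
              name: custom-database-ext
              annotations:
                io.cilium/lb-ipam-ips: 192.168.1.233
            spec:
              type: LoadBalancer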
In the bootstrap section we’re creating a new database called appdb (line 27) with an owner called user (line 29). We again provide the login password using a named Secret (line 31). In the post-init section we’ve referenced a ConfigMap with some illustrative instructions (lines 33-36). We lightly touch the postgresql parameters to set the timezone to something appropriate (line 40). Next we utilise the managed configuration to create a read-only user called read and disable the default read-only (ro) service on line 56. We also create a custom LoadBalancer type Service to allow for external connections. Here we’ve also taken advantage of Cilium’s LB-IPAM functionality (line 64) to declaratively give the Service an IP we can reach outside the cluster (more details in this article).
Backups and Recovery#
CloudNative PG relies on Barman to perform physical and Write-Ahead Logging (WAL) backups, though if you prefer to have a logical backup of your database they’ve described how to perform this using pg_dump in the Emergency backup section of their documentation.
This “emergency” procedure boils down to running
kubectl exec "<PRIMARY_CLUSTER_POD>" -c postgres \
-- pg_dump -Fc -d "<DATABASE>" > "<DATABASE>.dump"
to save the logical backup on your work machine. To restore the logical backup you can then run
kubectl exec -i "<NEW_PRIMARY_CLUSTER_POD>" -c postgres \
-- pg_restore --no-owner --role="<USER>" -d "<DATABASE>" --verbose < "<DATABASE>.dump"
Regular backups using CloudNative PG require that the database Cluster is provided with a BackupConfiguration to perform either a Kubernetes volume snapshot — backed by a StorageClass that supports it — or an object store backup using the Barman Cloud tool to an S3-compatible object store such as MinIO.
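As a hedged sketch of the latter, where the bucket, endpoint, and credential Secret below are placeholders, the backup section of a Cluster could look along these lines:
# assumed snippet of the Cluster .spec
backup:
  retentionPolicy: 30d
  barmanObjectStore:
    destinationPath: s3://postgres-backups/
    endpointURL: http://minio.minio.svc.cluster.local:9000
    s3Credentials:
      accessKeyId:
        name: minio-credentials
        key: ACCESS_KEY_ID
      secretAccessKey:
        name: minio-credentials
        key: ACCESS_SECRET_KEY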
The recommended way of performing a backup using CloudNative PG is by using their ScheduledBackup resource, though they also offer an on-demand backup using the Backup resource. Assuming you’ve configured the backup section of your Cluster correctly, such a Backup resource can be created using the kubectl plugin mentioned earlier by executing
kubectl cnpg backup <CLUSTER_NAME>
which should start creating a backup of your database.
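For reference, a nightly ScheduledBackup targeting the custom cluster could be declared something like this, noting that CloudNative PG uses a six-field cron expression with seconds first:
# assumed example of a ScheduledBackup resource
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: nightly-backup
  namespace: cnpg-database
spec:
  schedule: "0 0 2 * * *"   # every night at 02:00
  cluster:
    name: custom-cluster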
Monitoring#
CloudNative PG readily integrates with both Prometheus and Grafana.
To make it easy to get started they’ve also prepared a Grafana dashboard template to show relevant metrics for your different Cluster resources.
For more details see the Monitoring section in the CloudNative PG documentation.
Other features#
Worth noting is CloudNative PG’s Image Catalog feature along with the Rolling Updates options that allow for a controlled upgrade of an existing database cluster.
For more advanced setups, CloudNative PG also supports Connection Pooling using PgBouncer, Tablespaces, as well as the PostGIS extension, and pprof for profiling along with other nifty features.
For inspiration and further reading on how to use the CloudNative PG operator, I found this article to be interesting.
Summary#
To avoid too many details in the main text I’ve put some of them in the summary to hopefully provide better examples. You can also find these examples on GitLab.
Kustomize#
# kustomize/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
configMapGenerator:
  - name: database
    namespace: postgres-simple
    literals:
      - POSTGRES_DB=postgres
resources:
  - ns.yaml
  - svc.yaml
  - local-pv.yaml
  - local-pvc.yaml
  - credentials.yaml
  - stateful-set.yaml
# kustomize/ns.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: postgres-simple
# kustomize/svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
  namespace: postgres-simple
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: postgres
  ports:
    - protocol: TCP
      port: 5432
# kustomize/credentials.yaml
apiVersion: v1
kind: Secret
metadata:
  name: credentials
  namespace: postgres-simple
type: Opaque
stringData:
  POSTGRES_USER: 'admin'
  POSTGRES_PASSWORD: 'supers3cret'
# kustomize/stateful-set.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: postgres-simple
spec:
  serviceName: postgres-service
  selector:
    matchLabels:
      app: postgres
  replicas: 1
  template:
    metadata:
      labels:
        app: postgres
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: <NFS_UID>
        runAsGroup: <NFS_GID>
      containers:
        - name: postgres
          image: docker.io/postgres:17.0
          ports:
            - containerPort: 5432
          envFrom:
            - configMapRef:
                name: database
            - secretRef:
                name: credentials
          volumeMounts:
            - name: db
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: db
          nfs:
            server: "<SERVER>"
            path: "<PATH>"
Bitnami Helm chart#
# helm/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ns.yaml
  - local-pv.yaml
  - local-pvc.yaml
  - auth-secret.yaml
helmCharts:
  - name: postgresql
    repo: oci://registry-1.docker.io/bitnamicharts
    releaseName: postgresql
    namespace: postgres-helm
    version: 16.0.3
    valuesFile: values.yaml
# helm/values.yaml
image:
  tag: 17.0.0
auth:
  username: app
  existingSecret: auth
primary:
  persistence:
    existingClaim: postgres-data
# helm/ns.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: postgres-helm
# helm/local-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-data
spec:
  capacity:
    storage: 8Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-db-storage
  local:
    path: "<LOCAL_PATH>"
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - { key: kubernetes.io/hostname, operator: In, values: [ "<NODE_NAME>" ] }
# helm/local-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: postgres-helm
spec:
  storageClassName: local-db-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
# helm/auth-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: auth
  namespace: postgres-helm
type: Opaque
stringData:
  password: 's3cret'
  postgres-password: 'supers3cret'
CloudNative PG#
# cnpg/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ns.yaml
  - postgres-credentials.yaml
  - user-credentials.yaml
  - read-credentials.yaml
  - cm-post-init.yaml
  - custom-cluster.yaml
# cnpg/ns.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: cnpg-database
# cnpg/postgres-credentials.yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
  namespace: cnpg-database
type: kubernetes.io/basic-auth
stringData:
  username: 'postgres'
  password: 'supers3cret'
# cnpg/user-credentials.yaml
apiVersion: v1
kind: Secret
metadata:
  name: user-credentials
  namespace: cnpg-database
type: kubernetes.io/basic-auth
stringData:
  username: 'user'
  password: 's3cret'
# cnpg/read-credentials.yaml
apiVersion: v1
kind: Secret
metadata:
  name: read-credentials
  namespace: cnpg-database
type: kubernetes.io/basic-auth
stringData:
  username: 'read'
  password: 'letmelook'
# cnpg/cm-post-init.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: post-init
  namespace: cnpg-database
data:
  post-init.sql: |
    create table for_two (seats integer);
    insert into for_two (seats) values (2);
    grant all privileges on table for_two to "user";
# cnpg/custom-cluster.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: custom-cluster
  namespace: cnpg-database
spec:
  instances: 3
  storage:
    storageClass: proxmox-csi
    size: 2Gi
  imageName: ghcr.io/cloudnative-pg/postgresql:16.4
  monitoring:
    enablePodMonitor: true
  enableSuperuserAccess: true
  superuserSecret:
    name: postgres-credentials
  bootstrap:
    initdb:
      database: appdb
      owner: user
      secret:
        name: user-credentials
      postInitApplicationSQLRefs:
        configMapRefs:
          - name: post-init
            key: post-init.sql
  postgresql:
    parameters:
      timezone: "Europe/Oslo"
  managed:
    roles:
      - name: read
        passwordSecret:
          name: read-credentials
        ensure: present
        comment: Read Only user
        login: true
        inherit: true
        inRoles:
          - pg_read_all_data
    services:
      disabledDefaultServices: [ "ro" ]
      additional:
        - selectorType: rw
          updateStrategy: patch
          serviceTemplate:
            metadata:
              name: custom-database-ext
              annotations:
                io.cilium/lb-ipam-ips: 192.168.1.233
            spec:
              type: LoadBalancer