Nessie in Kubernetes (Minikube)

kurangdoa
4 min read · Aug 27, 2024

Intro

This article does not explain Nessie in depth; it focuses on how to deploy it in Kubernetes (specifically via Minikube). In short, Nessie will serve as a catalog for Iceberg (https://iceberg.apache.org/concepts/catalog/?h=catalog). I was curious about Nessie (https://projectnessie.org/guides/about/), so I decided to use it as Iceberg's catalog.

Deployment

In my current setup, Nessie requires PostgreSQL (PSQL) as a prerequisite. Its deployment is described in separate articles.

The final configuration looks like the scenario below: the replicas of the Nessie deployment (shown as 3, but actually 2) connect to the PostgreSQL instance that was already deployed in Kubernetes (explained in the articles mentioned above).

All of the code is available in this GitHub repo: https://github.com/kurangdoa/lakehouse_iceberg/tree/main/nessie

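First, we create a namespace to keep the Nessie deployment separate from the others. Before creating it, we delete the namespace in case it already exists from a previous run (the --ignore-not-found flag keeps the delete from failing on the first run).

kubectl delete namespace nessie-dev --ignore-not-found
kubectl create namespace nessie-dev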

Then we apply the secret in Kubernetes. It contains the password used to access PostgreSQL; in this example, the password is “postgres”.

kubectl apply -f nessie-secret.yaml -n nessie-dev

Creating the PV below is optional, because Nessie uses PostgreSQL as its version store (https://projectnessie.org/nessie-latest/configuration/).

kubectl delete pv nessie-volume --ignore-not-found --grace-period=0 --force
kubectl delete pvc nessie-volume-claim -n nessie-dev --ignore-not-found
kubectl apply -n nessie-dev -f nessie-pv.yaml
kubectl apply -n nessie-dev -f nessie-pvclaim.yaml

Then, deploy Nessie and its service with the commands below.

kubectl apply -n nessie-dev -f nessie-deployment.yaml
kubectl apply -n nessie-dev -f nessie-service.yaml

To troubleshoot the deployment, you can use any of these commands:

kubectl get pv -n nessie-dev
kubectl get pvc -n nessie-dev
kubectl get deployments -n nessie-dev
kubectl get pods -n nessie-dev -o wide
kubectl get svc -n nessie-dev
kubectl get all -n nessie-dev
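
When the listing commands above show pods that are not ready, it usually helps to look at rollout status, pod events, and logs together. The sketch below wraps three standard kubectl commands in a helper function. The function name `diagnose_nessie` is hypothetical and not part of the repo, and it assumes the deployment is named `nessie` with the label `app=nessie`, as in nessie-deployment.yaml.

```shell
# Hypothetical helper (not part of the repo): gather common diagnostics.
# Optional argument: the namespace, defaulting to nessie-dev.
diagnose_nessie() {
  ns="${1:-nessie-dev}"

  # Wait up to two minutes for the replicas to become ready.
  kubectl rollout status deployment/nessie -n "$ns" --timeout=120s

  # Show events (scheduling, image pulls, volume mounts) for the Nessie pods.
  kubectl describe pods -l app=nessie -n "$ns"

  # Print the latest log lines from one pod of the deployment.
  kubectl logs deployment/nessie -n "$ns" --tail=50
}
```

Calling `diagnose_nessie` (or `diagnose_nessie some-other-namespace`) runs the three checks in order. A failing database connection would most likely show up in the log output.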

Finally, you can access the UI at http://127.0.0.1:6788 once the Minikube tunnel described in the nessie-service.yaml section is running.

Detailed Explanation

I am going to explain the files in https://github.com/kurangdoa/lakehouse_iceberg/tree/main/nessie one by one.

nessie-secret.yaml

The postgresql-password below is the base64-encoded form of “postgres”.

apiVersion: v1
kind: Secret
metadata:
  name: nessie-secret
type: Opaque
data:
  postgresql-password: cG9zdGdyZXM=
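
The encoded value in the secret can be reproduced and checked from a terminal. This is a minimal sketch assuming a `base64` command that accepts `--decode` (as the GNU coreutils and macOS versions do). The variable names are only for illustration.

```shell
# Encode the plain-text password. printf is used instead of echo so that
# no trailing newline ends up inside the encoded value.
encoded=$(printf '%s' 'postgres' | base64)
echo "$encoded"

# Decode it again to confirm the round trip.
decoded=$(printf '%s' "$encoded" | base64 --decode)
echo "$decoded"
```

The first line printed should match the `postgresql-password` value in nessie-secret.yaml. Keep in mind that base64 is an encoding, not encryption, so the file should be treated as sensitive.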

nessie-pv.yaml

Only one PV is created

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nessie-volume
  labels:
    type: local
    app: nessie
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /data/nessie

nessie-pvc.yaml

A PVC is created to claim the volume so that the Nessie deployment can use it.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nessie-volume-claim
  labels:
    app: nessie
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi

nessie-deployment.yaml

The Nessie deployment contains the connection settings for PSQL. Using the environment variables described in https://projectnessie.org/nessie-latest/configuration/, I matched the setup to the PSQL deployment explained in another article (linked at the top of this article).

Pay special attention to the JDBC URL “jdbc:postgresql://datasaku-postgres-postgresql-ha-pgpool.psql-dev.svc.cluster.local:5433/nessie”, which follows this pattern:

jdbc:postgresql://<service-name>.<namespace>.svc.cluster.local:<psql-api-port>/<name of the database>

For a more detailed explanation, see https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#services
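
To make the pattern concrete, the sketch below assembles the URL from its four parts. The function name `build_jdbc_url` is hypothetical, and the argument values are the ones used in this article's setup.

```shell
# Hypothetical helper: build the in-cluster JDBC URL.
# Arguments: service name, namespace, port, database name.
build_jdbc_url() {
  printf 'jdbc:postgresql://%s.%s.svc.cluster.local:%s/%s\n' "$1" "$2" "$3" "$4"
}

jdbc_url=$(build_jdbc_url datasaku-postgres-postgresql-ha-pgpool psql-dev 5433 nessie)
echo "$jdbc_url"
```

If your PSQL service lives in a different namespace or listens on a different port, only the corresponding argument needs to change.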

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nessie
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nessie
  template:
    metadata:
      labels:
        app: nessie
    spec:
      containers:
        - name: nessie
          image: ghcr.io/projectnessie/nessie:0.83.2-java
          imagePullPolicy: IfNotPresent
          env:
            # - name: QUARKUS_HTTP_PORT
            #   value: "6789"
            - name: NESSIE_VERSION_STORE_TYPE
              value: JDBC
            - name: NESSIE_VERSION_STORE_PERSIST_JDBC_DATASOURCE
              value: postgresql
            - name: QUARKUS_DATASOURCE_POSTGRESQL_JDBC_URL
              value: "jdbc:postgresql://datasaku-postgres-postgresql-ha-pgpool.psql-dev.svc.cluster.local:5433/nessie"
            - name: QUARKUS_DATASOURCE_POSTGRESQL_USERNAME
              value: postgres
            - name: QUARKUS_DATASOURCE_POSTGRESQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: nessie-secret
                  key: postgresql-password
          ports:
            - containerPort: 19120
          volumeMounts:
            - mountPath: /data/nessie
              name: nessiedata
      volumes:
        - name: nessiedata
          persistentVolumeClaim:
            claimName: nessie-volume-claim

nessie-service.yaml

To expose the service outside Minikube, you need a LoadBalancer service together with an open Minikube tunnel. If you haven’t opened the tunnel yet, run:

minikube tunnel -p datasaku-cluster

Port 6788 is the port used to access Nessie; the service forwards it to container port 19120.

apiVersion: v1
kind: Service
metadata:
  name: nessie-service
spec:
  type: LoadBalancer
  ports:
    - name: api
      port: 6788
      targetPort: 19120
  selector:
    app: nessie
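
Once the tunnel is open, a quick way to confirm that the service answers is to request Nessie's configuration endpoint. The sketch below only builds the URL. The helper name `nessie_config_url` is hypothetical, and the `/api/v2/config` path is my assumption about the Nessie REST API for this image version, so adjust it if your server reports otherwise.

```shell
# Hypothetical helper: build the URL of the (assumed) config endpoint.
# Arguments: host and service port.
nessie_config_url() {
  printf 'http://%s:%s/api/v2/config\n' "$1" "$2"
}

url=$(nessie_config_url 127.0.0.1 6788)
echo "$url"

# With the tunnel running, this request should return a JSON document:
#   curl -s "$url"
# The external IP assigned to the service can be inspected with:
#   kubectl get svc nessie-service -n nessie-dev
```

If the request hangs, the tunnel is probably not running, or the service has not been assigned an external IP yet.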

Final Remark

Nessie is now deployed in Kubernetes. Since it will be connected to Iceberg, you might want to look at the Iceberg + Spark deployment next.
