Intro
This article does not explain Nessie in depth; it focuses on how to deploy it in Kubernetes (specifically via minikube). In short, Nessie will be used as a catalog for Iceberg, https://iceberg.apache.org/concepts/catalog/?h=catalog. I was curious about Nessie, https://projectnessie.org/guides/about/, so I decided to use it as Iceberg’s catalog.
Deployment
In my current setup, Nessie requires PostgreSQL (PSQL) as a prerequisite; its deployment is described in my earlier articles.
The final configuration looks like the scenario below, where 3 (actually 2) replicas of the Nessie deployment connect to the PostgreSQL instance that was previously deployed in the same Kubernetes cluster (explained in the link above).
All of the code can be found in this GitHub repo: https://github.com/kurangdoa/lakehouse_iceberg/tree/main/nessie
First, we need a namespace to separate the Nessie deployment from the others. To start from a clean state, we delete the namespace first (in case it was created before) and then create it again.
kubectl delete namespace nessie-dev
kubectl create namespace nessie-dev
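If you rerun these steps often, the delete-then-create pair can be wrapped in a small helper. This is only a sketch: the function name recreate_namespace is my own invention and is not part of the repo. It relies on the --ignore-not-found flag of kubectl delete, so the very first run (when the namespace does not exist yet) does not report an error.

```shell
# Hypothetical helper (not part of the repo): recreate a namespace from scratch.
recreate_namespace() {
  # $1 is the namespace name, e.g. nessie-dev
  # --ignore-not-found keeps the delete quiet when the namespace is missing
  kubectl delete namespace "$1" --ignore-not-found &&
    kubectl create namespace "$1"
}
```

Calling recreate_namespace nessie-dev then does the same as the two commands above. Keep in mind that deleting a namespace removes everything deployed inside it.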
Then, we apply the secret in Kubernetes. It contains the password used to access PostgreSQL; in this example, the password is “postgres”.
kubectl apply -f nessie-secret.yaml -n nessie-dev
Creating the PV below is optional, because Nessie will use PostgreSQL as its version store: https://projectnessie.org/nessie-latest/configuration/
kubectl delete pv nessie-volume -n nessie-dev --grace-period=0 --force
kubectl delete pvc nessie-volume-claim -n nessie-dev
kubectl apply -n nessie-dev -f nessie-pv.yaml
kubectl apply -n nessie-dev -f nessie-pvclaim.yaml
Then, the deployment and its service are created with the commands below.
kubectl apply -n nessie-dev -f nessie-deployment.yaml
kubectl apply -n nessie-dev -f nessie-service.yaml
To troubleshoot the deployment, you can use any of these commands:
kubectl get pv -n nessie-dev
kubectl get pvc -n nessie-dev
kubectl get deployments -n nessie-dev
kubectl get pods -n nessie-dev -o wide
kubectl get svc -n nessie-dev
kubectl get all -n nessie-dev
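When a pod does not reach the Running state, the listing commands above only tell you that something is wrong. Below is a small sketch of the next steps I would try. It assumes the deployment is named nessie and its pods carry the label app=nessie (both as in nessie-deployment.yaml); the function name debug_nessie and the 120s timeout are arbitrary choices of mine.

```shell
# Hypothetical helper: dig deeper when the nessie pods are not healthy.
debug_nessie() {
  ns="${1:-nessie-dev}"
  # Wait until the rollout finishes, or give up after the timeout
  kubectl rollout status deployment/nessie -n "$ns" --timeout=120s
  # Show events and container state for the nessie pods
  kubectl describe pods -l app=nessie -n "$ns"
  # Print the most recent log lines from one pod of the deployment
  kubectl logs deployment/nessie -n "$ns" --tail=50
}
```

Run debug_nessie (or debug_nessie some-other-namespace) after applying the manifests. Connection problems to PostgreSQL usually show up in the log output.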
Finally, you can access the UI at http://127.0.0.1:6788
Detailed Explanation
I am going to explain the files in https://github.com/kurangdoa/lakehouse_iceberg/tree/main/nessie one by one.
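Before opening a browser, a quick check from the terminal can confirm that something is answering on that port. The helper below is a sketch; the name http_status is hypothetical, and it only uses standard curl options.

```shell
# Hypothetical helper: print the HTTP status code returned by a URL.
http_status() {
  # -s silences the progress output, -o /dev/null discards the body,
  # -w '%{http_code}' prints only the status code
  curl -s -o /dev/null -w '%{http_code}' "$1"
}
```

For example, http_status http://127.0.0.1:6788 prints the status code of the response. If curl cannot connect at all (for instance because the minikube tunnel is not running), it prints 000.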
nessie-secret.yaml
The postgresql-password below is the base64 form of “postgres”.
apiVersion: v1
kind: Secret
metadata:
  name: nessie-secret
type: Opaque
data:
  postgresql-password: cG9zdGdyZXM=
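If you want to use a different password, the base64 value can be produced on the command line. A minimal sketch, assuming a base64 binary that supports --decode (GNU coreutils and macOS both do); the variable names are just for illustration:

```shell
# Encode the plain-text password; printf avoids the trailing newline
# that echo would add (and that would end up inside the secret).
encoded="$(printf '%s' 'postgres' | base64)"
printf '%s\n' "$encoded"

# Decode it again to double-check the value stored in the YAML.
decoded="$(printf '%s' "$encoded" | base64 --decode)"
printf '%s\n' "$decoded"
```

The first line printed is cG9zdGdyZXM=, the value used in the secret above. Note that base64 is only an encoding, not encryption, so the file should still be treated as sensitive.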
nessie-pv.yaml
Only one PV is created
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nessie-volume
  labels:
    type: local
    app: nessie
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /data/nessie
nessie-pvc.yaml
A PVC is created to claim the volume and make it available to the Nessie deployment.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nessie-volume-claim
  labels:
    app: nessie
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
nessie-deployment.yaml
The Nessie deployment contains the connection to PSQL. Following the environment variables described in https://projectnessie.org/nessie-latest/configuration/, I matched the setup to the PSQL deployment explained in another article (the link is at the top of this article).
Pay special attention to the JDBC URL “jdbc:postgresql://datasaku-postgres-postgresql-ha-pgpool.psql-dev.svc.cluster.local:5433/nessie”, which follows the pattern
jdbc:postgresql://<service-name>.<namespace>.svc.cluster.local:<psql-api-port>/<name of the database>
For a more detailed explanation of the service DNS name, see https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#services
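To make the pattern concrete, here is a small sketch that assembles the URL from its four parts. The values are the ones from my setup; the variable names are only illustrative, so replace the values with your own service, namespace, port, and database.

```shell
# Parts of the JDBC URL (values from this article's PSQL setup)
service="datasaku-postgres-postgresql-ha-pgpool"  # Kubernetes service name
namespace="psql-dev"                              # namespace of the service
port="5433"                                       # port exposed by the service
database="nessie"                                 # database used by Nessie

# <service>.<namespace>.svc.cluster.local is the in-cluster DNS name
jdbc_url="jdbc:postgresql://${service}.${namespace}.svc.cluster.local:${port}/${database}"
printf '%s\n' "$jdbc_url"
```

The printed string is exactly the value passed to QUARKUS_DATASOURCE_POSTGRESQL_JDBC_URL in the deployment.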
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nessie
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nessie
  template:
    metadata:
      labels:
        app: nessie
    spec:
      containers:
        - name: nessie
          image: ghcr.io/projectnessie/nessie:0.83.2-java
          imagePullPolicy: IfNotPresent
          env:
            # - name: QUARKUS_HTTP_PORT
            #   value: "6789"
            - name: NESSIE_VERSION_STORE_TYPE
              value: JDBC
            - name: NESSIE_VERSION_STORE_PERSIST_JDBC_DATASOURCE
              value: postgresql
            - name: QUARKUS_DATASOURCE_POSTGRESQL_JDBC_URL
              value: "jdbc:postgresql://datasaku-postgres-postgresql-ha-pgpool.psql-dev.svc.cluster.local:5433/nessie"
            - name: QUARKUS_DATASOURCE_POSTGRESQL_USERNAME
              value: postgres
            - name: QUARKUS_DATASOURCE_POSTGRESQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: nessie-secret
                  key: postgresql-password
          ports:
            - containerPort: 19120
          volumeMounts:
            - mountPath: /data/nessie
              name: nessiedata
      volumes:
        - name: nessiedata
          persistentVolumeClaim:
            claimName: nessie-volume-claim
nessie-service.yaml
To expose the service outside of minikube, you need a LoadBalancer service together with an open minikube tunnel. If you haven’t opened the tunnel yet, run:
minikube tunnel -p datasaku-cluster
Port 6788 is the port used to access Nessie; the service forwards it to container port 19120.
apiVersion: v1
kind: Service
metadata:
  name: nessie-service
spec:
  type: LoadBalancer
  ports:
    - name: api
      port: 6788
      targetPort: 19120
  selector:
    app: nessie
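To see whether the tunnel has actually given the service an address, or to reach Nessie without a tunnel at all, the two helpers below may be useful. They are a sketch only: the function names are hypothetical, and the service name and namespace are the ones used in this article.

```shell
# Hypothetical helpers around nessie-service.

# Print the external IP assigned to the LoadBalancer
# (prints nothing while the address is still pending).
nessie_external_ip() {
  kubectl get svc nessie-service -n nessie-dev \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Alternative without a tunnel: forward local port 6788 to the service.
nessie_port_forward() {
  kubectl port-forward svc/nessie-service 6788:6788 -n nessie-dev
}
```

nessie_port_forward keeps running in the foreground until you stop it with Ctrl+C; while it runs, the UI is reachable on local port 6788.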
Final Remarks
Nessie is now deployed on Kubernetes. Since it will be connected to Iceberg, you might want to look at the Iceberg + Spark deployment next.