Jun 15, 2021

NEGs with Load Balancer on GKE

On GKE, Google’s managed Kubernetes service, you can expose your services with an Ingress, which behind the scenes uses Cloud Load Balancing and NEGs (Network Endpoint Groups). The Ingress creates all the necessary components, including the backend services and the load balancer, and configures Cloud CDN and more.

However, the Ingress on GKE currently does not support all the load balancer options or all the Cloud CDN features. Negative caching, for example, is enabled by default and creates really big problems when your site is down. If you want to control these features you don’t have too many options. One is to deploy your own version of ingress-gce; I wrote an article about this, Deploy custom Ingress on GKE. The other is to not use the Ingress at all and to create and manage all the components manually.

In this tutorial I will show you how to expose your application to the internet using Cloud Load Balancing, NEGs (Network Endpoint Groups) and Cloud CDN, without using the Ingress.

Prerequisites

A GCP project with billing enabled. If you don’t have one, sign in to the Google Cloud Platform Console and create a new project
Access to a standard internet browser

Setup

First let’s define some variables:

PROJECT_ID=$(gcloud config list project --format='value(core.project)')
ZONE=us-central1-a
CLUSTER_NAME=demo-cluster

and we need a cluster

gcloud container clusters create $CLUSTER_NAME \
    --zone $ZONE \
    --machine-type "e2-medium" \
    --enable-ip-alias \
    --num-nodes=2

The --enable-ip-alias flag enables VPC-native traffic routing for your cluster. With this option GKE allocates secondary IP ranges on the VPC subnet and the pods get their IP addresses from those ranges, so the load balancer can address the pods directly. This is known as container-native load balancing.
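If you want to confirm that the cluster really is VPC-native, you can read the IP allocation policy from the cluster description. This is only a minimal sketch: the helper name is_vpc_native is my own invention, and I am assuming gcloud prints the boolean as True or False.

```shell
# Hypothetical helper (not part of gcloud): succeeds when the cluster
# reports that it uses alias IPs, i.e. it is VPC-native.
is_vpc_native() {
  local cluster="$1" zone="$2" value
  value=$(gcloud container clusters describe "$cluster" \
    --zone "$zone" \
    --format='value(ipAllocationPolicy.useIpAliases)')
  # Compare case-insensitively so that both "True" and "true" match.
  [ "$(printf '%s' "$value" | tr '[:upper:]' '[:lower:]')" = "true" ]
}

# Usage, with the variables defined earlier:
# is_vpc_native "$CLUSTER_NAME" "$ZONE" && echo "cluster is VPC-native"
```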

Next we need a simple deployment; we will use nginx

cat << EOF > app-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
EOF
kubectl apply -f app-deployment.yaml

and the service

cat << EOF > app-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: app-service
  annotations:
    cloud.google.com/neg: '{"exposed_ports": {"80":{"name": "app-service-80-neg"}}}'
spec:
  type: ClusterIP
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: nginx
EOF
kubectl apply -f app-service.yaml

The cloud.google.com/neg annotation tells GKE to create a NEG for this service and to add and remove endpoints (pods) as they come and go.

Notice that the type is ClusterIP. Yes, it is possible to expose the service to the internet even if the type is ClusterIP. This is part of the magic of NEGs.

You can check whether the NEG was created with the following command

gcloud compute network-endpoint-groups list
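GKE also writes the result on the Service itself, in the cloud.google.com/neg-status annotation, so you can wait for the NEG from the Kubernetes side as well. A small sketch of that idea follows; the helper names neg_status and wait_for_neg and the NEG_POLL_SECONDS variable are hypothetical, and the string match on the JSON is deliberately naive.

```shell
# Hypothetical helper: print the NEG status JSON that GKE stores on the Service.
neg_status() {
  kubectl get service "$1" \
    -o jsonpath='{.metadata.annotations.cloud\.google\.com/neg-status}'
}

# Hypothetical helper: poll until the annotation mentions the expected NEG name.
# $1 = service name, $2 = NEG name, $3 = max attempts (default 30)
wait_for_neg() {
  local service="$1" neg="$2" tries="${3:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    case "$(neg_status "$service")" in
      *"\"$neg\""*) return 0 ;;   # the quoted NEG name appears in the JSON
    esac
    i=$((i + 1))
    sleep "${NEG_POLL_SECONDS:-2}"
  done
  return 1
}

# Usage:
# wait_for_neg app-service app-service-80-neg && echo "NEG is ready"
```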

Next let’s create the load balancer and all the required components.

We need a firewall rule that allows traffic from the load balancer. The two source ranges below are the ranges Google Cloud load balancers and health checks connect from

# find the network tags used by our cluster
NETWORK_TAGS=$(gcloud compute instances describe \
    $(kubectl get nodes -o jsonpath='{.items[0].metadata.name}') \
    --zone=$ZONE --format="value(tags.items[0])")

# create the firewall rule
gcloud compute firewall-rules create $CLUSTER_NAME-lb-fw \
    --allow tcp:80 \
    --source-ranges 130.211.0.0/22,35.191.0.0/16 \
    --target-tags $NETWORK_TAGS

and a health check configuration

gcloud compute health-checks create http app-service-80-health-check \
  --request-path / \
  --port 80 \
  --check-interval 60 \
  --unhealthy-threshold 3 \
  --healthy-threshold 1 \
  --timeout 5

and a backend service

gcloud compute backend-services create $CLUSTER_NAME-lb-backend \
  --health-checks app-service-80-health-check \
  --port-name http \
  --global \
  --enable-cdn \
  --connection-draining-timeout 300

next we need to add our NEG to the backend service

gcloud compute backend-services add-backend $CLUSTER_NAME-lb-backend \
  --network-endpoint-group=app-service-80-neg \
  --network-endpoint-group-zone=$ZONE \
  --balancing-mode=RATE \
  --capacity-scaler=1.0 \
  --max-rate-per-endpoint=1.0 \
  --global
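To get a feeling for what these numbers mean: as far as I understand the documentation, in RATE mode --max-rate-per-endpoint is a target used to distribute traffic between backends, not a hard limit, and --capacity-scaler multiplies that target. The arithmetic can be sketched like this; target_rps is a made-up helper name and the formula is my simplified reading, not an official one.

```shell
# Hypothetical helper: target requests/second for one NEG backend in RATE mode.
# $1 = number of endpoints (pods)
# $2 = value of --max-rate-per-endpoint
# $3 = value of --capacity-scaler
target_rps() {
  awk -v n="$1" -v rate="$2" -v scaler="$3" \
    'BEGIN { printf "%.1f\n", n * rate * scaler }'
}

# 3 nginx replicas with the flags used above
target_rps 3 1.0 1.0   # prints 3.0
```

With a single backend this target does not matter much, because there is nowhere else to send the traffic; it becomes relevant once you add NEGs from other zones or clusters.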

That was the backend configuration; let’s also set up the frontend.

First the url map

gcloud compute url-maps create $CLUSTER_NAME-url-map \
  --default-service $CLUSTER_NAME-lb-backend

and then the http proxy

gcloud compute target-http-proxies create $CLUSTER_NAME-http-proxy \
  --url-map $CLUSTER_NAME-url-map

and finally the global forwarding rule

gcloud compute forwarding-rules create $CLUSTER_NAME-forwarding-rule \
  --global \
  --ports 80 \
  --target-http-proxy $CLUSTER_NAME-http-proxy

Done! Give the load balancer some time to set up all the components, and then you can test whether your setup works as expected.

# get the public ip address
IP_ADDRESS=$(gcloud compute forwarding-rules describe $CLUSTER_NAME-forwarding-rule --global --format="value(IPAddress)")
# print the public ip address
echo $IP_ADDRESS
# make a request to the service
curl -s -I http://$IP_ADDRESS/

and the output should be similar to this

HTTP/1.1 200 OK
Server: nginx/1.21.0
.....
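Right after the setup the load balancer may not answer with a 200 yet, so instead of retrying by hand you can poll. Here is a minimal sketch; wait_for_200 is a hypothetical helper, and the defaults of 30 attempts and 10 seconds between them are arbitrary values I picked.

```shell
# Hypothetical helper: poll a URL until it answers with HTTP 200.
# $1 = URL, $2 = max attempts (default 30), $3 = seconds between attempts (default 10)
wait_for_200() {
  local url="$1" tries="${2:-30}" delay="${3:-10}" i=0 code
  while [ "$i" -lt "$tries" ]; do
    # -o /dev/null discards the body; -w prints only the status code
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url") || true
    [ "$code" = "200" ] && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage:
# wait_for_200 "http://$IP_ADDRESS/" && curl -s -I "http://$IP_ADDRESS/"
```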

You can now control the Cloud CDN options, for example disable negative caching.

gcloud compute backend-services update $CLUSTER_NAME-lb-backend \
  --no-negative-caching \
  --global
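To double check that the change was applied, you can read the CDN policy back from the backend service. This is a sketch under assumptions: negative_caching_disabled is a hypothetical helper, and it treats an empty value the same as False.

```shell
# Hypothetical helper: succeeds when negative caching is NOT enabled
# on the given global backend service.
negative_caching_disabled() {
  local value
  value=$(gcloud compute backend-services describe "$1" \
    --global \
    --format='value(cdnPolicy.negativeCaching)')
  # Anything other than "true" (for example "False" or empty) counts as disabled.
  [ "$(printf '%s' "$value" | tr '[:upper:]' '[:lower:]')" != "true" ]
}

# Usage:
# negative_caching_disabled "$CLUSTER_NAME-lb-backend" && echo "negative caching is off"
```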

You can find out more about the limitations of standalone zonal NEGs in Container-native load balancing through standalone zonal NEGs. Pay special attention to NEG leaks:

When a GKE service is deleted, the associated NEG will not be garbage collected if the NEG is still referenced by a backend service. Dereference the NEG from the backend service to allow NEG deletion.
When a cluster is deleted, standalone NEGs are not deleted.
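Because of these leaks it is worth checking for leftover NEGs after you delete a service or a cluster. One way to do it, as a rough sketch: leftover_negs is a hypothetical helper that simply filters the NEG list by a name pattern.

```shell
# Hypothetical helper: list NEGs whose name matches a regular expression,
# printing the name and the zone of each one.
leftover_negs() {
  gcloud compute network-endpoint-groups list \
    --filter="name~$1" \
    --format='value(name,zone.basename())'
}

# Usage: anything printed here still exists and has to be deleted by hand
# leftover_negs '^app-service-'
```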

Cleaning up

This tutorial uses billable resources, so you should clean up when you are done.

Make sure you still have the environment variables defined; if not, go back to the Setup section and define them again.

# delete the forwarding-rule aka frontend
gcloud -q compute forwarding-rules delete $CLUSTER_NAME-forwarding-rule --global
# delete the http proxy
gcloud -q compute target-http-proxies delete $CLUSTER_NAME-http-proxy
# delete the url map
gcloud -q compute url-maps delete $CLUSTER_NAME-url-map
# delete the backend
gcloud -q compute backend-services delete $CLUSTER_NAME-lb-backend --global
# delete the health check
gcloud -q compute health-checks delete app-service-80-health-check
# delete the firewall rule
gcloud -q compute firewall-rules delete $CLUSTER_NAME-lb-fw
# delete the cluster
gcloud -q container clusters delete $CLUSTER_NAME --zone=$ZONE
# delete the NEG (standalone NEGs are not deleted together with the cluster)
gcloud -q compute network-endpoint-groups delete app-service-80-neg --zone=$ZONE

Conclusion

When the out-of-the-box solution does not work the way you want, you can always go back to the basics.