---

copyright:
  years: 2020
lastupdated: "2020-07-31"

keywords: kubernetes, iks, nginx, nlb, help

subcollection: containers

---
{:beta: .beta} {:codeblock: .codeblock} {:deprecated: .deprecated} {:download: .download} {:external: target="_blank" .external} {:faq: data-hd-content-type='faq'} {:gif: data-image-type='gif'} {:help: data-hd-content-type='help'} {:important: .important} {:new_window: target="_blank"} {:note: .note} {:pre: .pre} {:preview: .preview} {:screen: .screen} {:shortdesc: .shortdesc} {:support: data-reuse='support'} {:table: .aria-labeledby="caption"} {:tip: .tip} {:troubleshoot: data-hd-content-type='troubleshoot'} {:tsCauses: .tsCauses} {:tsResolve: .tsResolve} {:tsSymptoms: .tsSymptoms}
# Debugging load balancers {: #cs_troubleshoot_lb}

As you use {{site.data.keyword.containerlong}}, consider these techniques for general load balancer troubleshooting and debugging.
{: shortdesc}

While you troubleshoot, you can use the {{site.data.keyword.containerlong_notm}} Diagnostics and Debug Tool to run tests and gather pertinent information from your cluster.
{: tip}
## Cannot connect to an app by using the public IP address of the NLB {: #cs_loadbalancer_fails}

{: tsSymptoms}
You publicly exposed your app by creating an NLB service in your classic cluster. When you tried to connect to your app by using the public IP address of the NLB, the connection failed or timed out.
{: tsCauses}
Your NLB service might not be working properly for one of the following reasons:
- The cluster is a free cluster or a standard cluster with only one worker node.
- The cluster is not fully deployed yet.
- The configuration script for your NLB service includes errors.
{: tsResolve}
To troubleshoot your NLB service:
- Check that you set up a standard cluster that is fully deployed and has at least two worker nodes to ensure high availability for your NLB service.

  ```
  ibmcloud ks worker ls --cluster <cluster_name_or_ID>
  ```
  {: pre}

  In your CLI output, make sure that the **Status** of your worker nodes displays **Ready** and that the **Machine Type** shows a flavor other than **free**.
- For version 2.0 NLBs: Ensure that you complete the NLB 2.0 prerequisites.

- Check the accuracy of the configuration file for your NLB service.

  Version 2.0 NLBs:

  ```yaml
  apiVersion: v1
  kind: Service
  metadata:
    name: myservice
    annotations:
      service.kubernetes.io/ibm-load-balancer-cloud-provider-enable-features: "ipvs"
  spec:
    type: LoadBalancer
    selector:
      <selector_key>: <selector_value>
    ports:
    - protocol: TCP
      port: 8080
    externalTrafficPolicy: Local
  ```
  {: screen}

  - Check that you defined `LoadBalancer` as the type for your service.
  - Check that you included the `service.kubernetes.io/ibm-load-balancer-cloud-provider-enable-features: "ipvs"` annotation.
  - In the `spec.selector` section of the LoadBalancer service, ensure that the `<selector_key>` and `<selector_value>` are the same as the key-value pair that you used in the `spec.template.metadata.labels` section of your deployment YAML. If the labels do not match, the **Endpoints** section in your LoadBalancer service displays `<none>` and your app is not accessible from the internet.
  - Check that you used the port that your app listens on.
  - Check that you set `externalTrafficPolicy` to `Local`.
  Version 1.0 NLBs:

  ```yaml
  apiVersion: v1
  kind: Service
  metadata:
    name: myservice
  spec:
    type: LoadBalancer
    selector:
      <selector_key>: <selector_value>
    ports:
    - protocol: TCP
      port: 8080
  ```
  {: screen}

  - Check that you defined `LoadBalancer` as the type for your service.
  - In the `spec.selector` section of the LoadBalancer service, ensure that the `<selector_key>` and `<selector_value>` are the same as the key-value pair that you used in the `spec.template.metadata.labels` section of your deployment YAML. If the labels do not match, the **Endpoints** section in your LoadBalancer service displays `<none>` and your app is not accessible from the internet.
  - Check that you used the port that your app listens on.
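The selector-matching check described above can be sketched offline. The following is a minimal illustration (not part of the product tooling) that assumes the Service and Deployment manifests are already loaded as Python dictionaries, for example with a YAML parser:

```python
# Minimal sketch: verify that a Service's spec.selector matches the
# labels in a Deployment's pod template, which is the condition the
# checklist above describes. Hypothetical helper, not an official tool.

def selector_matches(service: dict, deployment: dict) -> bool:
    """Return True if every key-value pair in the Service selector is
    present in the Deployment's spec.template.metadata.labels."""
    selector = service["spec"].get("selector", {})
    labels = deployment["spec"]["template"]["metadata"].get("labels", {})
    if not selector:
        # An empty selector cannot target the app pods.
        return False
    return all(labels.get(key) == value for key, value in selector.items())

service = {"spec": {"type": "LoadBalancer", "selector": {"app": "myapp"}}}
good_deploy = {"spec": {"template": {"metadata": {"labels": {"app": "myapp"}}}}}
bad_deploy = {"spec": {"template": {"metadata": {"labels": {"app": "other"}}}}}

print(selector_matches(service, good_deploy))  # True
print(selector_matches(service, bad_deploy))   # False: Endpoints would be empty
```

If the function returns `False` for your manifests, the NLB forwards traffic to no pods, which matches the empty **Endpoints** symptom above.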
- Check your NLB service and review the **Events** section to find potential errors.

  ```
  kubectl describe service <myservice>
  ```
  {: pre}
  Look for the following error messages:

  ```
  Clusters with one node must use services of type NodePort
  ```
  {: screen}

  To use the NLB service, you must have a standard cluster with at least two worker nodes.

  ```
  No cloud provider IPs are available to fulfill the NLB service request. Add a portable subnet to the cluster and try again
  ```
  {: screen}

  This error message indicates that no portable public IP addresses are left to be allocated to your NLB service. Refer to Adding subnets to clusters for information about how to request portable public IP addresses for your cluster. After portable public IP addresses are available to the cluster, the NLB service is automatically created.

  ```
  Requested cloud provider IP is not available. The following cloud provider IPs are available:
  ```
  {: screen}

  You defined a portable public IP address for your load balancer YAML in the **`loadBalancerIP`** section, but this portable public IP address is not available in your portable public subnet. In the **`loadBalancerIP`** section of your configuration script, remove the existing IP address and add one of the available portable public IP addresses. You can also remove the **`loadBalancerIP`** section from your script so that an available portable public IP address is allocated automatically.

  ```
  No available nodes for NLB services
  ```
  {: screen}

  You do not have enough worker nodes to deploy an NLB service. One reason might be that you deployed a standard cluster with more than one worker node, but the provisioning of the worker nodes failed.

  - List the available worker nodes.

    ```
    kubectl get nodes
    ```
    {: pre}

  - If at least two available worker nodes are found, list the worker node details.

    ```
    ibmcloud ks worker get --cluster <cluster_name_or_ID> --worker <worker_ID>
    ```
    {: pre}

  - Make sure that the public and private VLAN IDs for the worker nodes that were returned by the `kubectl get nodes` and `ibmcloud ks worker get` commands match.
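The VLAN comparison in the last step can be sketched in a few lines. The `publicVLAN` and `privateVLAN` label names are an assumption based on classic worker nodes; verify the actual label names in your `kubectl get nodes --show-labels` output:

```python
# Sketch: group worker nodes by their (publicVLAN, privateVLAN) label
# pair so that nodes on mismatched VLANs stand out. The label names are
# an assumption for classic clusters, not guaranteed by this document.

def group_by_vlans(nodes: dict) -> dict:
    """Map (publicVLAN, privateVLAN) pairs to the worker nodes that use them."""
    groups = {}
    for name, labels in nodes.items():
        key = (labels.get("publicVLAN"), labels.get("privateVLAN"))
        groups.setdefault(key, []).append(name)
    return groups

# Made-up node names and VLAN IDs for illustration.
nodes = {
    "10.176.48.67": {"publicVLAN": "2234945", "privateVLAN": "2234947"},
    "10.176.48.79": {"publicVLAN": "2234945", "privateVLAN": "2234947"},
}
groups = group_by_vlans(nodes)
# A single group means that all workers share the same VLAN pair.
print(len(groups))  # 1
```

More than one group in the result indicates the VLAN mismatch that prevents the NLB from finding available nodes.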
- If you use a custom domain to connect to your NLB service, make sure that your custom domain is mapped to the public IP address of your NLB service.

  - Find the public IP address of your NLB service.

    ```
    kubectl describe service <service_name> | grep "LoadBalancer Ingress"
    ```
    {: pre}

  - Check that your custom domain is mapped to the portable public IP address of your NLB service in the Pointer record (PTR).
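A quick way to confirm the domain mapping is to resolve the custom domain and compare the result with the NLB IP address from the previous step. A minimal standard-library sketch (the domain and IP in the example are placeholders):

```python
import socket

def domain_points_to(domain: str, expected_ip: str) -> bool:
    """Resolve the domain to IPv4 addresses and check whether the NLB's
    public IP address is among them."""
    try:
        _, _, addresses = socket.gethostbyname_ex(domain)
    except socket.gaierror:
        return False  # the domain does not resolve at all
    return expected_ip in addresses

# Example with loopback; substitute your custom domain and the
# "LoadBalancer Ingress" IP address from the previous step.
print(domain_points_to("localhost", "127.0.0.1"))
```

If this returns `False` for your domain, the DNS record is missing, still propagating, or points at the wrong IP address.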
## Subnet limit reached in the VLAN {: #cs_subnet_limit_lb}

{: tsSymptoms}
When you describe the `ibm-cloud-provider-vlan-ip-config` configmap in your classic cluster, you might see an error message similar to the following example output.

```
kubectl describe cm ibm-cloud-provider-vlan-ip-config -n kube-system
```
{: pre}

```
Warning  CreatingLoadBalancerFailed ... ErrorSubnetLimitReached: There are already the maximum number of subnets permitted in this VLAN.
```
{: screen}
{: tsCauses}
In standard clusters, the first time that you create a cluster in a zone, a public VLAN and a private VLAN in that zone are automatically provisioned for you in your IBM Cloud infrastructure account. In that zone, one public portable subnet is requested on the public VLAN that you specify and one private portable subnet is requested on the private VLAN that you specify. For {{site.data.keyword.containerlong_notm}}, VLANs have a limit of 40 subnets. If the cluster's VLAN in a zone already reached that limit, you might not have a portable public IP address available to create a network load balancer (NLB).
To view how many subnets a VLAN has:
- From the IBM Cloud infrastructure console, select **Network** > **IP Management** > **VLANs**.
- Click the **VLAN Number** of the VLAN that you used to create your cluster. Review the **Subnets** section to see whether 40 or more subnets exist.
{: tsResolve}
If you need a new VLAN, order one by contacting {{site.data.keyword.cloud_notm}} support. Then, create a cluster that uses this new VLAN.
If you have another VLAN that is available, you can set up VLAN spanning in your existing cluster. Afterward, you can add new worker nodes to the cluster that use the other VLAN with available subnets. To check whether VLAN spanning is already enabled, use the `ibmcloud ks vlan spanning get --region <region>` command.
If you are not using all the subnets in the VLAN, you can reuse subnets on the VLAN by adding them to your cluster.
- Check that the subnet that you want to use is available.
The infrastructure account that you use might be shared across multiple {{site.data.keyword.cloud_notm}} accounts. In this case, even if you run the `ibmcloud ks subnets` command to see subnets with **Bound Clusters**, you can see information only for your clusters. Check with the infrastructure account owner to make sure that the subnets are available and not in use by any other account or team.
- Use the `ibmcloud ks cluster subnet add` command to make an existing subnet available to your cluster.

- Verify that the subnet was successfully created and added to your cluster. The subnet CIDR is listed in the **Subnet VLANs** section.

  ```
  ibmcloud ks cluster get --show-resources <cluster_name_or_ID>
  ```
  {: pre}

  In this example output, a second subnet was added to the `2234945` public VLAN:

  ```
  Subnet VLANs
  VLAN ID   Subnet CIDR          Public   User-managed
  2234947   10.xxx.xx.xxx/29     false    false
  2234945   169.xx.xxx.xxx/29    true     false
  2234945   169.xx.xxx.xxx/29    true     false
  ```
  {: screen}
- Verify that the portable IP addresses from the subnet that you added are used for the load balancer's **EXTERNAL-IP**. It might take several minutes for the services to use the portable IP addresses from the newly added subnet.

  ```
  kubectl get svc -n kube-system
  ```
  {: pre}
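Each portable subnet in the example output above is a `/29`. Python's `ipaddress` module can show what a CIDR of that size provides; this is a generic illustration with a documentation-range CIDR standing in for the `169.xx.xxx.xxx/29` subnets, and it does not account for the addresses that IBM Cloud reserves in each portable subnet:

```python
import ipaddress

# A /29 subnet, using a documentation-range CIDR as a stand-in for the
# portable subnets in the example output above.
subnet = ipaddress.ip_network("192.0.2.8/29")

print(subnet.num_addresses)         # 8 addresses in total
print(list(subnet.hosts())[0])      # first host address: 192.0.2.9
```

This is why each added `/29` portable subnet only yields a handful of IP addresses for NLB services, and why a VLAN can accumulate many subnets as a cluster grows.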
## Source IP preservation fails for version 1.0 load balancer services {: #cs_source_ip_fails_lb}

{: tsSymptoms}
In a classic cluster, you enabled source IP preservation for a version 1.0 load balancer service by changing `externalTrafficPolicy` to `Local` in the service's configuration file. However, no traffic reaches the back-end service for your app.
{: tsCauses}
When you enable source IP preservation for load balancer services, the source IP address of the client request is preserved. The service forwards traffic to app pods on the same worker node only to ensure that the request packet's IP address isn't changed. Typically, load balancer service pods are deployed to the same worker nodes that the app pods are deployed to. However, some situations exist where the service pods and app pods might not be scheduled onto the same worker node. If you use Kubernetes taints{: external} on worker nodes, any pods that don't have a taint toleration are prevented from running on the tainted worker nodes. Source IP preservation might not work, depending on the type of taint that you used:
- **Edge node taints**: You added the `dedicated=edge` label to two or more worker nodes on each public VLAN in your cluster to ensure that load balancer pods deploy to those worker nodes only. Then, you also tainted those edge nodes to prevent any other workloads from running on edge nodes. However, you didn't add an edge node affinity rule and toleration to your app deployment. Your app pods can't be scheduled on the same tainted nodes as the service pods, and no traffic reaches the back-end service for your app.

- **Custom taints**: You used custom taints on several nodes so that only app pods with that taint toleration can deploy to those nodes. You added affinity rules and tolerations to the deployments of your app and load balancer service so that their pods deploy to only those nodes. However, the `ibm-cloud-provider-ip` `keepalived` pods that are automatically created in the `ibm-system` namespace ensure that the load balancer and the app pods are always scheduled onto the same worker node. These `keepalived` pods don't have the tolerations for the custom taints that you used. They can't be scheduled on the same tainted nodes that your app pods are running on, and no traffic reaches the back-end service for your app.
{: tsResolve}
Resolve the issue by choosing one of the following options:

- **Edge node taints**: To ensure that your load balancer and app pods deploy to tainted edge nodes, add edge node affinity rules and tolerations to your app deployment. Load balancer pods have these affinity rules and tolerations by default.

- **Custom taints**: Remove custom taints that the `keepalived` pods don't have tolerations for. Instead, you can label worker nodes as edge nodes, and then taint those edge nodes.
If you complete one of the above options but the `keepalived` pods are still not scheduled, you can get more information about the `keepalived` pods:
- Get the `keepalived` pods.

  ```
  kubectl get pods -n ibm-system
  ```
  {: pre}

- In the output, look for `ibm-cloud-provider-ip` pods that have a **Status** of `Pending`. Example:

  ```
  ibm-cloud-provider-ip-169-61-XX-XX-55967b5b8c-7zv9t   0/1   Pending   0   2m   <none>   <none>
  ibm-cloud-provider-ip-169-61-XX-XX-55967b5b8c-8ptvg   0/1   Pending   0   2m   <none>   <none>
  ```
  {: screen}

- Describe each `keepalived` pod and look for the **Events** section. Address any error or warning messages that are listed.

  ```
  kubectl describe pod ibm-cloud-provider-ip-169-61-XX-XX-55967b5b8c-7zv9t -n ibm-system
  ```
  {: pre}
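The scheduling behavior behind the `Pending` pods can be illustrated with a simplified model of Kubernetes taint and toleration matching. This sketch models only `NoSchedule` taints with exact key-value tolerations, not the scheduler's full logic:

```python
# Simplified model: a pod can be scheduled onto a node only if every
# NoSchedule taint on the node is covered by one of the pod's
# tolerations. Only exact key/value matching is modeled here.

def tolerates(taints: list, tolerations: list) -> bool:
    for taint in taints:
        matched = any(
            tol.get("key") == taint["key"] and tol.get("value") == taint["value"]
            for tol in tolerations
        )
        if not matched:
            return False
    return True

# Made-up taint for illustration, like a custom taint on your nodes.
node_taints = [{"key": "env", "value": "prod", "effect": "NoSchedule"}]
app_tolerations = [{"key": "env", "value": "prod"}]
keepalived_tolerations = []  # keepalived pods lack custom-taint tolerations

print(tolerates(node_taints, app_tolerations))         # True: app pods schedule
print(tolerates(node_taints, keepalived_tolerations))  # False: pods stay Pending
```

The second result mirrors the symptom above: the `keepalived` pods cannot land on the tainted nodes where your app runs, so the service pods stay `Pending`.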
## Cannot connect to an app by using a VPC load balancer {: #vpc_ts_lb}

{: tsSymptoms}
You publicly exposed your app by creating a Kubernetes `LoadBalancer` service in your VPC cluster. When you try to connect to your app by using the hostname that is assigned to the Kubernetes `LoadBalancer`, the connection fails or times out.

When you run `kubectl describe svc <kubernetes_lb_service_name>`, you might see a warning message similar to one of the following in the **Events** section:
```
The VPC load balancer that routes requests to this Kubernetes `LoadBalancer` service is offline.
```
{: screen}

```
The VPC load balancer that routes requests to this Kubernetes `LoadBalancer` service was deleted from your VPC.
```
{: screen}
{: tsCauses}
When you create a Kubernetes `LoadBalancer` service in your cluster, a VPC load balancer is automatically created in your VPC. The VPC load balancer routes requests only to the app that the Kubernetes `LoadBalancer` service exposes. Requests cannot be routed to your app in the following situations:
- A VPC security group is blocking incoming traffic to your worker nodes, including incoming requests to your app.
- The VPC load balancer is offline, such as due to load balancer provisioning errors or VSI connection errors.
- The VPC load balancer is deleted through the VPC console or the CLI.
- The VPC load balancer's DNS entry is still registering.
- You reached the maximum number of VPC load balancers permitted per account. Across all of your VPC clusters in your VPC, a maximum of 20 VPC load balancers can be created.
{: tsResolve}
Verify that no VPC security groups are blocking traffic to your cluster and that the VPC load balancer is available.

- VPC Gen 2 clusters: Allow traffic requests that are routed by the VPC load balancer to node ports on your worker nodes.

- Verify that the VPC load balancer for the Kubernetes `LoadBalancer` service exists. In the output, look for the VPC load balancer that is named in the format `kube-<cluster_ID>-<kubernetes_lb_service_UID>`. You can get the Kubernetes `LoadBalancer` service UID by running `kubectl get svc <service_name> -o yaml`.

  ```
  ibmcloud is load-balancers
  ```
  {: pre}

  - If the VPC load balancer is not listed, it does not exist for one of the following reasons:
    - You reached the maximum number of VPC load balancers permitted per account. Across all of your VPC clusters in your VPC, a maximum of 20 VPC load balancers can be created. One VPC load balancer is created for each Kubernetes `LoadBalancer` service that you create, and it routes requests to that Kubernetes `LoadBalancer` service only.
    - The VPC load balancer was deleted through the VPC console or the CLI. To re-create the VPC load balancer for your Kubernetes `LoadBalancer` service, restart the Kubernetes master by running `ibmcloud ks cluster master refresh --cluster <cluster_name_or_id>`.

    If you want to remove the load-balancing setup for an app in your VPC cluster, delete the Kubernetes `LoadBalancer` service by running `kubectl delete svc`. The VPC load balancer that is associated with the Kubernetes `LoadBalancer` service is automatically deleted from your VPC.

  - If the VPC load balancer is listed, it might not be responsive for the following reasons:
    - Its DNS entry might still be registering. When a VPC load balancer is created, the hostname is registered through a public DNS. In some cases, it can take several minutes for this DNS entry to be replicated to the specific DNS server that your client is using. You can either wait for the hostname to be registered in your DNS, or access the VPC load balancer directly by using one of its IP addresses. To find the VPC load balancer IP addresses, look for the **Public IP** column in the output of `ibmcloud is load-balancers`.
    - If after several minutes you cannot reach the load balancer, it might be offline due to provisioning or connection issues. Open an {{site.data.keyword.cloud_notm}} support case. For the type, select **Technical**. For the category, select **Network** in the VPC section. In the description, include your cluster ID and the VPC load balancer ID.
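When scanning the `ibmcloud is load-balancers` output, the name to look for can be built from the cluster ID and the service UID as described above. A small sketch (all IDs below are made up for illustration):

```python
def expected_vpc_lb_name(cluster_id: str, service_uid: str) -> str:
    """Build the VPC load balancer name in the
    kube-<cluster_ID>-<kubernetes_lb_service_UID> format described above."""
    return f"kube-{cluster_id}-{service_uid}"

def find_lb(lb_names: list, cluster_id: str, service_uid: str):
    """Return the matching load balancer name, or None if it does not exist."""
    target = expected_vpc_lb_name(cluster_id, service_uid)
    return target if target in lb_names else None

# Made-up IDs for illustration; take the real service UID from
# `kubectl get svc <service_name> -o yaml`.
names = ["kube-bcr4tabd0t1rqopeafog-2d93nab64fe6401db9f8a6571dbb0c33"]
print(find_lb(names, "bcr4tabd0t1rqopeafog", "2d93nab64fe6401db9f8a6571dbb0c33"))
```

A `None` result corresponds to the "not listed" branch above: the VPC load balancer was never created or was deleted.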
## Insufficient IP addresses to create a VPC load balancer {: #vpc_no_lb}

{: tsSymptoms}
You publicly exposed your app by creating a Kubernetes `LoadBalancer` service in your VPC cluster. When you run `kubectl describe svc <kubernetes_lb_service_name>`, you see a warning message in the **Events** section similar to the following:

```
The subnet with ID(s) '<subnet_id>' has insufficient available ipv4 addresses.
```
{: screen}
{: tsCauses}
When you create a Kubernetes `LoadBalancer` service in your cluster, a VPC load balancer is automatically created in your VPC. The VPC load balancer puts a floating IP address for your Kubernetes `LoadBalancer` service behind a hostname that you can access your app through.

In VPC clusters, both worker nodes and services are assigned IP addresses from the same subnets. Traffic routing is enabled between subnets, so when all IP addresses in a subnet for a zone are used by worker nodes or services, you can still create new worker nodes or services in that zone because they use IP addresses from subnets in other zones. However, if all IP addresses on all subnets are in use, a new Kubernetes `LoadBalancer` service cannot be successfully provisioned.
{: tsResolve}
After you create a VPC subnet, you cannot resize it or change its IP range. Instead, you must create a larger VPC subnet in one or more zones where you have worker nodes. Then, create a new worker pool that uses the larger subnets.
- Create a new VPC subnet{: external} in the same VPC and in one or more zones where your cluster has worker nodes. Make sure that you create a subnet that can support both the number of worker nodes and services that you plan to create in your cluster. The default CIDR size of each VPC subnet is `/24`, which can support up to 253 worker nodes and services. To check your cluster's VPC and zones, run `ibmcloud ks cluster get --cluster <cluster_name_or_ID>`.

- Create a new worker pool in your cluster.

  - VPC Generation 1 clusters:

    ```
    ibmcloud ks worker-pool create vpc-classic --name <name> --cluster <cluster_name_or_ID> --flavor <flavor> --size-per-zone <number_of_worker_nodes> --label <key>=<value>
    ```
    {: pre}

  - VPC Generation 2 clusters:

    ```
    ibmcloud ks worker-pool create vpc-gen2 --name <name> --cluster <cluster_name_or_ID> --flavor <flavor> --size-per-zone <number_of_worker_nodes> --label <key>=<value>
    ```
    {: pre}

- Using the ID for the larger subnets that you created in step 1, add the zones to the worker pool. Repeat the following command for each zone and subnet.

  - VPC Generation 1 clusters:

    ```
    ibmcloud ks zone add vpc-classic --zone <zone> --subnet-id <subnet_id> --cluster <cluster_name_or_ID> --worker-pool <worker_pool_name>
    ```
    {: pre}

  - VPC Generation 2 clusters:

    ```
    ibmcloud ks zone add vpc-gen2 --zone <zone> --subnet-id <subnet_id> --cluster <cluster_name_or_ID> --worker-pool <worker_pool_name>
    ```
    {: pre}
- After a few minutes, verify that your `LoadBalancer` service is successfully provisioned onto one of the new subnets. If the service is provisioned successfully, no `Warning` or `Error` events are displayed.

  ```
  kubectl describe svc <kubernetes_lb_service_name>
  ```
  {: pre}
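The `/24` sizing guidance in the first step can be sanity-checked with Python's `ipaddress` module. The figure of 253 usable addresses comes from the guidance above; the sketch only shows the raw address math, using a documentation-range CIDR as a placeholder:

```python
import ipaddress

# A default-sized /24 VPC subnet, using a documentation-range CIDR.
subnet = ipaddress.ip_network("198.51.100.0/24")

total = subnet.num_addresses  # 256 addresses in a /24
usable_per_doc = 253          # per the guidance above; the remaining
                              # addresses are reserved by the platform
print(total, total - usable_per_doc)  # 256 3
```

Because worker nodes and services draw from the same pool, plan the subnet size around the sum of both, not just the worker count.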