Node-undertaker is a tool for handling unhealthy Kubernetes nodes.
Kubernetes marks such nodes itself and then uses the NoExecute taint to evict pods from them. However, such a node keeps running in the cloud provider and consuming resources. This tool detects such nodes and terminates them in the cloud provider.
Currently supported cloud providers:
Every minute, this tool checks whether each node has a “fresh” lease in a given namespace. It can check leases in the kube-node-lease namespace (created by the kubelet) or in any other namespace that contains similar leases (for a custom health-checking solution).
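The freshness check described above can be illustrated with a small sketch. How node-undertaker itself decides that a lease is “fresh” is not spelled out here, so this example assumes a lease counts as fresh while the current time is no later than its `renewTime` plus `leaseDurationSeconds` (both are fields of a Kubernetes Lease spec). The function name `lease_is_fresh` and the sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def lease_is_fresh(renew_time: Optional[datetime],
                   lease_duration_seconds: int,
                   now: Optional[datetime] = None) -> bool:
    """Return True if the lease was renewed within its own duration window.

    Assumption: "fresh" means now <= renewTime + leaseDurationSeconds.
    """
    if renew_time is None:
        # A lease that was never renewed cannot be considered fresh.
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    return now <= renew_time + timedelta(seconds=lease_duration_seconds)


# Hypothetical sample data: one recently renewed lease and one stale lease.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
recent = now - timedelta(seconds=10)
stale = now - timedelta(minutes=5)

print(lease_is_fresh(recent, 40, now))  # True
print(lease_is_fresh(stale, 40, now))   # False
```

A node whose lease fails a check like this would be a candidate for termination; the actual thresholds and grace periods used by the tool may differ.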
Before node-undertaker can start, it needs credentials that grant access to the cloud provider.
On AWS, node-undertaker requires an IAM role with the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:TerminateInstances",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeTrafficSources",
        "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
        "elasticloadbalancing:DeregisterTargets"
      ],
      "Resource": "*"
    }
  ]
}
If there are resources belonging to more than one cluster, it is advisable to limit access to a single cluster’s resources (for example, by using Conditions). Example policy for clusters tagged with ‘kubernetes.io/cluster/CLUSTER_NAME=owned’:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:TerminateInstances",
        "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
        "elasticloadbalancing:DeregisterTargets"
      ],
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "aws:ResourceTag/kubernetes.io/cluster/CLUSTER_NAME": "owned"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeTrafficSources"
      ],
      "Resource": "*"
    }
  ]
}
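When several clusters need their own scoped policy, the CLUSTER_NAME placeholder has to be substituted for each one. The following is a minimal sketch of generating the tag-scoped policy shown above for a given cluster name; the helper `build_scoped_policy` and the cluster name `my-cluster` are hypothetical and not part of node-undertaker.

```python
import json


def build_scoped_policy(cluster_name: str) -> dict:
    """Build the tag-scoped IAM policy for one cluster.

    Mirrors the example policy: mutating actions are limited to resources
    tagged kubernetes.io/cluster/<cluster_name>=owned, while the read-only
    autoscaling actions stay unrestricted.
    """
    tag_key = f"aws:ResourceTag/kubernetes.io/cluster/{cluster_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:TerminateInstances",
                    "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
                    "elasticloadbalancing:DeregisterTargets",
                ],
                "Resource": "*",
                "Condition": {"StringLike": {tag_key: "owned"}},
            },
            {
                "Effect": "Allow",
                "Action": [
                    "autoscaling:DescribeAutoScalingInstances",
                    "autoscaling:DescribeTrafficSources",
                ],
                "Resource": "*",
            },
        ],
    }


# "my-cluster" is a placeholder cluster name used only for illustration.
policy = build_scoped_policy("my-cluster")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console or passed to whatever tooling manages the role.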
helm repo add dbschenker https://dbschenker.github.io/node-undertaker
helm upgrade --install --create-namespace -n node-undertaker node-undertaker node-undertaker
make kwok
make local
or run the command with a customized configuration
example/kwok/node*.yaml
example/kwok/create-node-lease.sh NODE_NAME kube-node-lease 100
Cleanup: kwokctl delete cluster
make kind
make docker
make kind_load
make kind_helm
Cleanup: kind delete cluster
This project is maintained by: