Kubernetes scheduler extender

Motivation

Kubernetes (k8s) currently schedules pods based on CPU and memory resources. Because k8s uses flat networking, the pod specification doesn't carry any networking requirements, so networking is not a scheduling constraint. Similarly, storage is network-attached and therefore not a scheduling constraint either. To help enterprise applications achieve consistent performance, however, I/O (storage and network) and QoS are also important considerations for the scheduler. This led us to explore extensibility options for the k8s scheduler.

Requirements
We came up with the following requirements for the scheduler extender:

  1. The apiserver should remain the API endpoint for pods, and existing tools such as kubectl should work as is.
  2. Any networking, storage, and quality-of-service requirements should be specified in the pod spec.
  3. It should leverage the CPU- and memory-based scheduling already available in k8s.
  4. It should work with the k8s binaries delivered by OS vendors (such as Red Hat or CoreOS).
  5. It should implement a generic interface that the community can benefit from.

The Details (From #13580)
There are three ways to add new scheduling rules (predicates and priority functions) to Kubernetes:

  1. adding these rules to the scheduler and recompiling (described here: https://github.com/kubernetes/kubernetes/blob/master/docs/devel/scheduler.md),
  2. implementing your own scheduler process that runs instead of, or alongside, the standard Kubernetes scheduler, or
  3. implementing a “scheduler extender” process that the standard Kubernetes scheduler calls out to as a final pass when making scheduling decisions.

The third approach is needed for use cases where scheduling decisions need to be made on resources not directly managed by the standard Kubernetes scheduler. The extender helps make scheduling decisions based on such resources. (Note that the three approaches are not mutually exclusive.)

When scheduling a pod, the extender allows an external process to filter and prioritize nodes. The extender is configured through the k8s scheduler policy file.

A sample scheduler policy file with extender configuration:

{
  "predicates": [
    {
      "name": "HostName"
    },
    {
      "name": "MatchNodeSelector"
    },
    {
      "name": "PodFitsResources"
    }
  ],
  "priorities": [
    {
      "name": "LeastRequestedPriority",
      "weight": 1
    }
  ],
  "extenders": [
    {
      "urlPrefix": "http://127.0.0.1:12345/api/scheduler",
      "filterVerb": "filter",
      "enableHttps": false
    }
  ]
}
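To show how the "extenders" section of such a policy might be consumed, here is a minimal Go sketch that parses it and derives the endpoint URL for each verb. It is not the scheduler's own code: the struct names are hypothetical, `prioritizeVerb` is included only because the text below refers to a PrioritizeVerb endpoint (the sample policy does not set it), and joining `urlPrefix` and the verb with a single "/" is an assumption about how the call URL is built.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ExtenderConfig mirrors one entry of the policy's "extenders" array.
// PrioritizeVerb is an assumption; the sample policy leaves it unset.
type ExtenderConfig struct {
	URLPrefix      string `json:"urlPrefix"`
	FilterVerb     string `json:"filterVerb"`
	PrioritizeVerb string `json:"prioritizeVerb"`
	EnableHTTPS    bool   `json:"enableHttps"`
}

// Policy keeps only the extender section; encoding/json silently ignores
// "predicates" and "priorities".
type Policy struct {
	Extenders []ExtenderConfig `json:"extenders"`
}

func parsePolicy(data []byte) (Policy, error) {
	var p Policy
	err := json.Unmarshal(data, &p)
	return p, err
}

// verbURL joins prefix and verb with one "/" (an assumed URL scheme).
// An empty verb means that call is not configured, so "" is returned.
func verbURL(prefix, verb string) string {
	if verb == "" {
		return ""
	}
	return strings.TrimRight(prefix, "/") + "/" + verb
}

func main() {
	policy := []byte(`{
  "predicates": [{"name": "HostName"}],
  "extenders": [
    {"urlPrefix": "http://127.0.0.1:12345/api/scheduler", "filterVerb": "filter", "enableHttps": false}
  ]
}`)
	p, err := parsePolicy(policy)
	if err != nil {
		panic(err)
	}
	for _, e := range p.Extenders {
		fmt.Println("filter:", verbURL(e.URLPrefix, e.FilterVerb))
		if u := verbURL(e.URLPrefix, e.PrioritizeVerb); u != "" {
			fmt.Println("prioritize:", u)
		} else {
			fmt.Println("prioritize: not configured")
		}
	}
}
```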

The FilterVerb endpoint on the extender receives the pod and the set of nodes that passed the k8s predicates. The PrioritizeVerb endpoint receives the pod and the set of nodes that passed both the k8s predicates and the extender predicates.
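A minimal sketch of the extender side of the filter call in Go, using only the standard library. The `Pod`, `Node`, `NodeList`, and `ExtenderArgs` types are simplified stand-ins, not the real Kubernetes types, so their JSON shape is an assumption. The annotation key `example.com/bandwidth-mbps` and the `freeMbps` inventory are hypothetical, chosen to illustrate a networking constraint the k8s scheduler does not track itself.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

// Simplified stand-ins; NOT the real Kubernetes API types.
type Pod struct {
	Name        string            `json:"name"`
	Annotations map[string]string `json:"annotations"`
}

type Node struct {
	Name string `json:"name"`
}

type NodeList struct {
	Items []Node `json:"items"`
}

// ExtenderArgs carries the pod plus the nodes that passed the k8s predicates.
type ExtenderArgs struct {
	Pod   Pod      `json:"pod"`
	Nodes NodeList `json:"nodes"`
}

// Hypothetical annotation through which a pod states its bandwidth needs.
const bandwidthKey = "example.com/bandwidth-mbps"

// freeMbps is a hypothetical inventory of spare network bandwidth per node.
var freeMbps = map[string]int{"node-a": 100, "node-b": 1000}

// requestedMbps returns 0 (no constraint) if the annotation is absent or malformed.
func requestedMbps(p Pod) int {
	n, err := strconv.Atoi(p.Annotations[bandwidthKey])
	if err != nil {
		return 0
	}
	return n
}

// filterNodes keeps only the nodes with enough spare bandwidth for the pod.
func filterNodes(args ExtenderArgs) NodeList {
	want := requestedMbps(args.Pod)
	out := NodeList{Items: []Node{}}
	for _, n := range args.Nodes.Items {
		if freeMbps[n.Name] >= want {
			out.Items = append(out.Items, n)
		}
	}
	return out
}

// filterHandler decodes the arguments and replies with the surviving nodes.
func filterHandler(w http.ResponseWriter, r *http.Request) {
	var args ExtenderArgs
	if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(filterNodes(args))
}

func main() {
	// Exercise the handler in-process. A deployed extender would serve it at the
	// (assumed) path urlPrefix + "/filter", e.g. http.HandleFunc("/api/scheduler/filter",
	// filterHandler) followed by http.ListenAndServe("127.0.0.1:12345", nil).
	body := `{"pod":{"name":"web","annotations":{"example.com/bandwidth-mbps":"500"}},` +
		`"nodes":{"items":[{"name":"node-a"},{"name":"node-b"}]}}`
	rec := httptest.NewRecorder()
	filterHandler(rec, httptest.NewRequest(http.MethodPost, "/api/scheduler/filter", strings.NewReader(body)))
	fmt.Print(rec.Body.String()) // prints {"items":[{"name":"node-b"}]}
}
```

Keeping the predicate in a plain function (`filterNodes`) separate from the HTTP handler makes it easy to test without a network round trip.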

The “filter” call returns a list of nodes (api.NodeList). The “prioritize” call returns priorities for each node (schedulerapi.HostPriorityList).
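The prioritize side can be sketched the same way. Here `HostPriority` and `HostPriorityList` are simplified stand-ins for schedulerapi.HostPriorityList with assumed `host`/`score` JSON fields, the 0 to 10 score range is an assumption, and `freeMbps` is a hypothetical per-node bandwidth inventory. The pod is part of the request but this toy priority does not need it, so it is not decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

type Node struct {
	Name string `json:"name"`
}

// ExtenderArgs decodes only the node list; unknown fields such as "pod" are ignored.
type ExtenderArgs struct {
	Nodes struct {
		Items []Node `json:"items"`
	} `json:"nodes"`
}

// Simplified stand-ins for schedulerapi.HostPriority / HostPriorityList.
type HostPriority struct {
	Host  string `json:"host"`
	Score int    `json:"score"`
}

type HostPriorityList []HostPriority

// freeMbps is a hypothetical inventory of spare network bandwidth per node.
var freeMbps = map[string]int{"node-a": 100, "node-b": 1000}

// maxScore assumes a 0-10 priority range.
const maxScore = 10

// prioritize scores each node in proportion to its spare bandwidth,
// so the node with the most headroom receives maxScore.
func prioritize(nodes []Node) HostPriorityList {
	best := 0
	for _, n := range nodes {
		if freeMbps[n.Name] > best {
			best = freeMbps[n.Name]
		}
	}
	out := make(HostPriorityList, 0, len(nodes))
	for _, n := range nodes {
		score := 0
		if best > 0 {
			score = freeMbps[n.Name] * maxScore / best
		}
		out = append(out, HostPriority{Host: n.Name, Score: score})
	}
	return out
}

func prioritizeHandler(w http.ResponseWriter, r *http.Request) {
	var args ExtenderArgs
	if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(prioritize(args.Nodes.Items))
}

func main() {
	body := `{"pod":{"name":"web"},"nodes":{"items":[{"name":"node-a"},{"name":"node-b"}]}}`
	rec := httptest.NewRecorder()
	prioritizeHandler(rec, httptest.NewRequest(http.MethodPost, "/api/scheduler/prioritize", strings.NewReader(body)))
	fmt.Print(rec.Body.String()) // prints [{"host":"node-a","score":1},{"host":"node-b","score":10}]
}
```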

The “filter” call may prune the set of nodes based on its predicates. Scores returned by the “prioritize” call are added to the k8s scores (computed through its priority functions) and used for final host selection.
Multiple extenders can be configured in the scheduler policy.
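The following in-process Go sketch shows how filtering and score addition might compose across several extenders. The `Extender` interface, `staticExtender`, and `selectHost` are hypothetical names (real extenders are reached over HTTP), and running extenders in configuration order and breaking ties by node name are assumptions of this sketch, not documented scheduler behavior. Scores are simply summed, as the text describes.

```go
package main

import (
	"fmt"
	"sort"
)

// Extender is a hypothetical in-process stand-in for one configured extender.
type Extender interface {
	Filter(nodes []string) []string
	Prioritize(nodes []string) map[string]int
}

// selectHost prunes nodes through every extender filter, adds extender scores
// to the k8s scores, and returns the highest-scoring node.
func selectHost(nodes []string, k8sScores map[string]int, extenders []Extender) (string, bool) {
	for _, e := range extenders { // assumed: extenders run in config order
		nodes = e.Filter(nodes)
	}
	if len(nodes) == 0 {
		return "", false
	}
	totals := make(map[string]int, len(nodes))
	for _, n := range nodes {
		totals[n] = k8sScores[n]
	}
	for _, e := range extenders {
		for host, s := range e.Prioritize(nodes) {
			if _, ok := totals[host]; ok { // ignore scores for pruned nodes
				totals[host] += s
			}
		}
	}
	// Sort a copy so ties go to the lexicographically smallest name.
	candidates := append([]string(nil), nodes...)
	sort.Strings(candidates)
	best := candidates[0]
	for _, n := range candidates[1:] {
		if totals[n] > totals[best] {
			best = n
		}
	}
	return best, true
}

// staticExtender is a hypothetical extender with fixed answers.
type staticExtender struct {
	allowed map[string]bool
	scores  map[string]int
}

func (s staticExtender) Filter(nodes []string) []string {
	var out []string
	for _, n := range nodes {
		if s.allowed[n] {
			out = append(out, n)
		}
	}
	return out
}

func (s staticExtender) Prioritize(nodes []string) map[string]int {
	out := make(map[string]int, len(nodes))
	for _, n := range nodes {
		out[n] = s.scores[n]
	}
	return out
}

func main() {
	nodes := []string{"node-a", "node-b", "node-c"}
	k8s := map[string]int{"node-a": 9, "node-b": 5, "node-c": 7}
	network := staticExtender{
		allowed: map[string]bool{"node-a": true, "node-b": true, "node-c": true},
		scores:  map[string]int{"node-a": 1, "node-b": 10, "node-c": 2},
	}
	storage := staticExtender{
		allowed: map[string]bool{"node-b": true, "node-c": true},
		scores:  map[string]int{"node-b": 0, "node-c": 4},
	}
	host, ok := selectHost(nodes, k8s, []Extender{network, storage})
	fmt.Println(host, ok) // prints node-b true
}
```

In the example, node-a has the best k8s score but is pruned by the second extender; node-b then wins with 5 + 10 + 0 = 15 against node-c's 7 + 2 + 4 = 13.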

Feedback
Please send us any feedback.