One of the changes in progress to improve the scheduler or placement engine in Nova is something called generic resource pools. These are a way of representing an entity which can provide inventories of one or more classes of resources (things like virtual CPUs, disk, networks).
I was in the bathtub having a think (a somewhat common and useful thing) and started thinking about failure modes of a scheduler that operates at scale. Things like, what happens when there's a network partition? When I first started working on resource pools my intuitive reaction to keeping all this information in a relational database (MySQL by default) was pretty negative. It smelled incorrect: we're duplicating information from its authoritative source to somewhere else. Isn't that bound to be a problem? Also, with any new database table that is added to Nova there is a deep debate on whether the table should go in the API database or the cell database.
Meanwhile, elsewhere in my brain, I was remembering the Super Scheduling cross-project spec proposal that explores how to do scheduling for all sorts of resources, not just VMs.
With those things in context I had a blue sky idea which may have zero merit, but which I thought I had better write down:
One could create a distributed, hierarchical generic resource scheduler by having:
- a scheduler suite at the API layer
- a scheduler suite in each cell
- resource pools at the API layer that represent non-CPU resources that are shared between cells
- a single resource pool per cell that represents aggregates of all classes of resource (including cpu) that are only in that cell
- resource pools and providers in the current cell representing the actual resources
Everything is stored in the API database except for the resources local to any given cell. Those are stored in the cell.
A scheduling request changes from "I want a VM with the following characteristics" to "I want a volume of resources with these classes and limits". If compute-related stuff is in there, you get a VM, otherwise you get something else.
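To make that shift concrete, here is a minimal sketch of what such a request might look like. Everything in it is an assumption for illustration: the `ResourceRequest` name, the `COMPUTE_CLASSES` set and the resource class strings are hypothetical and are not Nova's actual API.

```python
from dataclasses import dataclass

# Hypothetical set of resource classes treated as "compute-related".
COMPUTE_CLASSES = frozenset({"VCPU", "MEMORY_MB"})


@dataclass
class ResourceRequest:
    """A request for a volume of resources, not for a VM as such."""

    # Maps a resource class name to the amount being asked for.
    amounts: dict

    def wants_compute(self):
        # True if any compute-related class is requested in a non-zero amount.
        return any(
            rc in COMPUTE_CLASSES and amount > 0
            for rc, amount in self.amounts.items()
        )

    def kind(self):
        # Compute in the request means a VM; otherwise it is "something else".
        return "vm" if self.wants_compute() else "other"
```

With this shape, `ResourceRequest({"VCPU": 2, "DISK_GB": 20})` would be treated as a VM request, while `ResourceRequest({"DISK_GB": 100})` would not.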
A top layer scheduling request returns those resource pools which can satisfy it, some of which may be an entire cell. If a cell is selected then the request is passed to the cell and it does its own internal scheduling.
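The two-level flow can be sketched roughly as follows. This is a toy model under stated assumptions, not how Nova does or would do it: `ResourcePool`, `Cell`, `schedule_top` and the resource class names are all hypothetical, requests are plain dicts of resource class to amount, and the "single resource pool per cell" is modelled as a simple sum of the inventories inside the cell.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    name: str
    inventory: dict  # resource class -> available amount

    def can_satisfy(self, request):
        # Every requested class must be available in sufficient quantity.
        return all(
            self.inventory.get(rc, 0) >= amount
            for rc, amount in request.items()
        )


@dataclass
class Cell:
    name: str
    pools: list = field(default_factory=list)

    def aggregate_pool(self):
        # The cell as seen from the API layer: one pool summing all inventories.
        totals = {}
        for pool in self.pools:
            for rc, amount in pool.inventory.items():
                totals[rc] = totals.get(rc, 0) + amount
        return ResourcePool(self.name, totals)

    def schedule(self, request):
        # Internal scheduling: which pools in this cell can take the request?
        return [p.name for p in self.pools if p.can_satisfy(request)]


def schedule_top(request, shared_pools, cells):
    """Return (name, cell_result) pairs; cell_result is None for shared pools."""
    results = [(p.name, None) for p in shared_pools if p.can_satisfy(request)]
    for cell in cells:
        if cell.aggregate_pool().can_satisfy(request):
            # Hand the request down; the cell does its own scheduling.
            results.append((cell.name, cell.schedule(request)))
    return results
```

One thing the sketch makes visible: because the per-cell pool is an aggregate, a cell can look like a match at the top layer and still return nothing from its internal scheduling, if no single pool inside it can hold the whole request.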