Nova's use of Placement

Posted on: Thu 26 July 2018

Category: openstack – Tags: openstack, opensource, placement

A year and a half ago I did some analysis on how nova uses placement.

I've repeated some of that analysis today, and here's a brief summary of the results. I'm not presenting this because I'm concerned about load on placement; we've demonstrated that placement scales pretty well. Rather, this analysis indicates that the compute node is doing redundant work which we'd prefer it didn't do. The compute node can't scale horizontally the way placement does. If offloading the work to placement, even redundantly, is the easiest way to avoid work on the compute node, let's do that, but that doesn't seem to be quite what's happening here.

Nova uses placement mainly from two places:

The nova-compute nodes report resource provider and inventory to placement and make sure that the placement view of what hardware is present is accurate.
The nova-scheduler processes request allocation candidates from placement and claim resources by writing allocations to placement.
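To make the two scheduler interactions concrete, here is a minimal Python sketch that only builds (and does not send) the request shapes. It assumes the `resources=CLASS:amount,...` query syntax of `GET /allocation_candidates` and the dict-keyed allocations body for `PUT /allocations/{consumer_uuid}` (placement microversion 1.12 or later). The function names, flavor and identifiers are hypothetical and are not nova's code.

```python
from urllib.parse import urlencode

def allocation_candidates_path(resources):
    """Build a GET /allocation_candidates path from {resource_class: amount}."""
    # Placement takes a comma-separated list of CLASS:amount pairs; sorting
    # just makes the output deterministic.
    spec = ",".join(f"{rc}:{amount}" for rc, amount in sorted(resources.items()))
    return "/placement/allocation_candidates?" + urlencode({"resources": spec})

def allocation_claim(rp_uuid, resources, project_id, user_id):
    """Build a body for PUT /placement/allocations/{consumer_uuid} (dict format)."""
    return {
        "allocations": {rp_uuid: {"resources": dict(resources)}},
        "project_id": project_id,
        "user_id": user_id,
    }

# Hypothetical small flavor and identifiers, for illustration only.
flavor = {"VCPU": 1, "MEMORY_MB": 512}
print(allocation_candidates_path(flavor))
print(allocation_claim("82fffbc6-572b-4db0-b044-c47e34b27ec6", flavor,
                       "demo-project", "demo-user"))
```

The scheduler picks one of the returned candidates and writes its allocations; a successful PUT is the claim.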

There are some additional interactions, mostly associated with migrations or fixing up unusual edge cases. Since those things are rare they are sort of noise in this discussion, so I've left them out.

When a basic compute node (where basic means no nested resource providers) starts up, it POSTs to create a resource provider and then PUTs to set the inventory. After that, a periodic job runs, usually every 60 seconds. In that job we see the following 11 requests:
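The two startup requests can be sketched as `(method, path, body)` tuples, again without sending anything. The body fields follow placement's `POST /resource_providers` and `PUT /resource_providers/{uuid}/inventories` formats; a freshly created provider has generation 0, which the inventory PUT must echo back. The hostname and inventory numbers are made up for illustration; only the provider UUID comes from the log below.

```python
import json

RP_UUID = "82fffbc6-572b-4db0-b044-c47e34b27ec6"  # provider seen in the log

def create_provider_request(name, uuid):
    """POST a new resource provider representing the compute node."""
    return ("POST", "/placement/resource_providers", {"name": name, "uuid": uuid})

def set_inventory_request(uuid, generation, inventories):
    """PUT the full inventory; the generation guards against concurrent writers."""
    path = f"/placement/resource_providers/{uuid}/inventories"
    body = {"resource_provider_generation": generation, "inventories": inventories}
    return ("PUT", path, body)

startup = [
    create_provider_request("devstack-node", RP_UUID),  # hypothetical hostname
    set_inventory_request(RP_UUID, 0, {                 # illustrative numbers
        "VCPU": {"total": 8, "allocation_ratio": 16.0},
        "MEMORY_MB": {"total": 16384, "reserved": 512},
        "DISK_GB": {"total": 100},
    }),
]

for method, path, body in startup:
    print(method, path, json.dumps(body, sort_keys=True))
```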

GET /placement/resource_providers?in_tree=82fffbc6-572b-4db0-b044-c47e34b27ec6
GET /placement/resource_providers/82fffbc6-572b-4db0-b044-c47e34b27ec6/inventories
GET /placement/resource_providers/82fffbc6-572b-4db0-b044-c47e34b27ec6/aggregates
GET /placement/resource_providers/82fffbc6-572b-4db0-b044-c47e34b27ec6/traits
GET /placement/resource_providers/82fffbc6-572b-4db0-b044-c47e34b27ec6/inventories
GET /placement/resource_providers/82fffbc6-572b-4db0-b044-c47e34b27ec6/allocations
GET /placement/resource_providers?in_tree=82fffbc6-572b-4db0-b044-c47e34b27ec6
GET /placement/resource_providers/82fffbc6-572b-4db0-b044-c47e34b27ec6/inventories
GET /placement/resource_providers/82fffbc6-572b-4db0-b044-c47e34b27ec6/aggregates
GET /placement/resource_providers/82fffbc6-572b-4db0-b044-c47e34b27ec6/traits
GET /placement/resource_providers/82fffbc6-572b-4db0-b044-c47e34b27ec6/inventories

A year and a half ago it was 5 requests per cycle, but they were different requests:

GET /placement/resource_providers/0e33c6f5-62f3-4522-8f95-39b364aa02b4/aggregates
GET /placement/resource_providers/0e33c6f5-62f3-4522-8f95-39b364aa02b4/inventories
GET /placement/resource_providers/0e33c6f5-62f3-4522-8f95-39b364aa02b4/allocations
GET /placement/resource_providers/0e33c6f5-62f3-4522-8f95-39b364aa02b4/aggregates
GET /placement/resource_providers/0e33c6f5-62f3-4522-8f95-39b364aa02b4/inventories

The difference comes from two changes:

We no longer confirm allocations on the compute node.
We now have things called ProviderTrees, which are responsible for managing nested providers, aggregates and traits in a unified fashion.

It appears, however, that we have some redundancies. We get inventories 4 times; aggregates, providers (the in_tree query) and traits 2 times each; and allocations once.
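One way to check those counts is to tally the logged cycle by what each request asks for. This small Python sketch re-enters the 11 requests from the log above (two identical groups of five with the allocations request between them) and counts them by kind. It also shows that only 5 of the 11 URLs are distinct, so 6 are identical repeats within a single cycle.

```python
from collections import Counter

RP = "82fffbc6-572b-4db0-b044-c47e34b27ec6"
BASE = "/placement/resource_providers"

# One group of five requests, as logged.
GROUP = [
    f"GET {BASE}?in_tree={RP}",
    f"GET {BASE}/{RP}/inventories",
    f"GET {BASE}/{RP}/aggregates",
    f"GET {BASE}/{RP}/traits",
    f"GET {BASE}/{RP}/inventories",
]
# The full cycle: group, allocations check, group again.
CYCLE = GROUP + [f"GET {BASE}/{RP}/allocations"] + GROUP

def kind(line):
    """Classify a logged request by the resource it fetches."""
    path = line.split(" ", 1)[1]
    if "?in_tree=" in path:
        return "providers"
    return path.rsplit("/", 1)[-1]

counts = Counter(kind(line) for line in CYCLE)
print(counts)
print("total:", len(CYCLE), "distinct:", len(set(CYCLE)))
```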

The in_tree calls happen from the report client method _get_providers_in_tree, which is called by _ensure_resource_provider. That method can be called from multiple places, but in this case both calls come from get_provider_tree_and_ensure_root, which is also responsible for two of the inventory requests.

get_provider_tree_and_ensure_root is called by _update in the resource tracker.

_update is called by both _init_compute_node and _update_available_resource, and _init_compute_node is itself called from _update_available_resource. So every single periodic job iteration calls _update twice.

That accounts for the overall doubling.
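The call chain described above can be captured in a toy Python model that records a log line wherever a GET would happen. The class and method names mirror nova's, but the bodies are stubs for a single non-nested provider and are not nova's code. In particular, placing _remove_deleted_instances_allocations between the two _update calls is an assumption read off the order of the log, not taken from nova's source. Running one periodic iteration reproduces the 11 requests in the logged order.

```python
RP = "82fffbc6-572b-4db0-b044-c47e34b27ec6"
log = []

class ProviderTree:
    def __init__(self):
        self.uuids = set()

    def get_provider_uuids(self):
        return sorted(self.uuids)

class ReportClient:
    def __init__(self):
        self._provider_tree = ProviderTree()

    def get(self, path):
        # Stand-in for an HTTP GET: just record what would be requested.
        log.append("GET /placement/resource_providers" + path)

    def _get_providers_in_tree(self, uuid):
        self.get(f"?in_tree={uuid}")
        return [uuid]  # non-nested: the tree is just the root provider

    def _refresh_and_get_inventory(self, uuid):
        self.get(f"/{uuid}/inventories")

    def _refresh_associations(self, uuid):
        self.get(f"/{uuid}/aggregates")
        self.get(f"/{uuid}/traits")

    def _ensure_resource_provider(self, uuid):
        for rp in self._get_providers_in_tree(uuid):
            self._provider_tree.uuids.add(rp)
            self._refresh_and_get_inventory(rp)   # first inventory GET
        self._refresh_associations(uuid)

    def get_provider_tree_and_ensure_root(self, uuid):
        self._ensure_resource_provider(uuid)
        for rp in self._provider_tree.get_provider_uuids():
            self._refresh_and_get_inventory(rp)   # second inventory GET
        return self._provider_tree

class ResourceTracker:
    def __init__(self, client, uuid):
        self.client, self.uuid = client, uuid

    def _update(self):
        self.client.get_provider_tree_and_ensure_root(self.uuid)

    def _init_compute_node(self):
        self._update()

    def _remove_deleted_instances_allocations(self):
        self.client.get(f"/{self.uuid}/allocations")

    def _update_available_resource(self):
        self._init_compute_node()                     # first _update
        self._remove_deleted_instances_allocations()
        self._update()                                # second _update

ResourceTracker(ReportClient(), RP)._update_available_resource()
print("\n".join(log))
```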

The two inventory calls per group come from the following, in get_provider_tree_and_ensure_root:

_ensure_resource_provider in the report client calls _refresh_and_get_inventory for every provider in the tree (the result of the in_tree query)
Immediately after the call to _ensure_resource_provider, _refresh_and_get_inventory is called again for every provider in the provider tree (from self._provider_tree.get_provider_uuids()).

In a non-sharing, non-nested scenario (such as a single node devstack, which is where I'm running this analysis) both of these refer to the same single resource provider. I'm not familiar enough with what might be in the provider tree in more complex situations to say what could be done to limit redundancy here, but it's a place worth looking.

The requests for aggregates and traits happen via _refresh_associations in _ensure_resource_provider.

The single allocations request comes from the resource tracker calling _remove_deleted_instances_allocations, which checks whether any allocations left over from migrations can be cleaned up.

Summary/Actions

So what now? There are two avenues for potential investigation:

Each time _update is called it calls get_provider_tree_and_ensure_root. Can one of those be skipped while keeping the rest of _update? Or perhaps it is possible to avoid one of the calls to _update entirely?
Can the way get_provider_tree_and_ensure_root tries to manage inventory twice be rationalized for simple cases?
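For the simple single-provider case, the potential effect of each avenue can be estimated with a back-of-the-envelope count. This Python sketch assumes the per-refresh pattern seen in the log (one in_tree query, one inventory GET per pass, one aggregates GET and one traits GET) plus the single allocations check. It is only arithmetic on that pattern and says nothing about nested or sharing providers.

```python
def requests_per_cycle(tree_refreshes=2, inventory_passes=2):
    """Estimate placement GETs per periodic cycle for one non-nested node.

    tree_refreshes: calls to get_provider_tree_and_ensure_root per cycle.
    inventory_passes: inventory GETs for the provider within each refresh.
    """
    per_refresh = 1 + inventory_passes + 2  # in_tree + inventories + aggregates + traits
    return tree_refreshes * per_refresh + 1  # plus the one allocations check

scenarios = {
    "today": requests_per_cycle(),
    "one tree refresh (avenue 1)": requests_per_cycle(tree_refreshes=1),
    "one inventory pass (avenue 2)": requests_per_cycle(inventory_passes=1),
    "both": requests_per_cycle(tree_refreshes=1, inventory_passes=1),
}
for name, count in scenarios.items():
    print(f"{name}: {count}")
```

On this model the first avenue removes more requests than the second, because it drops a whole group of five rather than one inventory GET per group.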

I've run out of time for now, so this doesn't address the requests that happen once an instance exists. I'll get to that another time.
