This is placement update 18-24, a weekly update of ongoing development related to the OpenStack placement service.
It's been quite a while since the last one, mostly because of travel, but also because coming to grips with the placement universe takes some time. Catching up will mean that this update is likely to be a bit long. Bear with it. This is obviously an expand style update (where we add new stuff). Next week will be a contract.
One thing I'd like to highlight is that with the merge of change 560459 we've hit a long promised milestone with placement. Thanks to an initial hit by Eric Fried and considerable followups by Bhagyashri Shewale, we now have rudimentary support in nova for libvirt-using compute nodes that use shared disk to accurately report and claim that disk. Using it requires some currently manual set up for the resource provider associated with the disk and creating the aggregate of that disk with the compute nodes that use it. But: this is one of the earliest promises provided by the placement concept, in the works for more than two years by many different people, finally showing up. Open the bubbly or something, a light celebration is in order.
The flip side of this is that it highlights that we have a growing documentation debt with the many features provided by placement and how to make best use of them in nova (and other services that might like to use placement). Before the end of the cycle we will need to be sure that we set aside a considerable chunk of time to address this gap.
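As a small down payment on that documentation debt, here is a toy model of the shared disk arrangement described above. It is only a sketch and makes no calls to placement: the data layout, the `shares` flag and the `disk_providers_for` helper are all hypothetical names made up for illustration. The idea it shows is that a compute node can get DISK_GB either from its own inventory or from a sharing provider that is in the same aggregate.

```python
def disk_providers_for(compute_node, providers, aggregates):
    """Return the sorted names of providers that could supply DISK_GB.

    providers: name -> {"inventory": {resource class: total}, "shares": bool}
    aggregates: aggregate name -> set of member provider names
    Both structures are hypothetical stand-ins, not placement's data model.
    """
    candidates = set()
    # A compute node with local disk can satisfy the request itself.
    if providers[compute_node]["inventory"].get("DISK_GB", 0) > 0:
        candidates.add(compute_node)
    # Otherwise (or additionally) look for sharing providers in any
    # aggregate that the compute node is a member of.
    for members in aggregates.values():
        if compute_node not in members:
            continue
        for name in members:
            provider = providers[name]
            if provider["shares"] and provider["inventory"].get("DISK_GB", 0) > 0:
                candidates.add(name)
    return sorted(candidates)


providers = {
    "compute1": {"inventory": {"VCPU": 16}, "shares": False},
    "compute2": {"inventory": {"VCPU": 16}, "shares": False},
    "compute3": {"inventory": {"VCPU": 16, "DISK_GB": 500}, "shares": False},
    "shared-disk": {"inventory": {"DISK_GB": 4000}, "shares": True},
}
# The manual step: one aggregate tying the disk to the nodes that use it.
aggregates = {"shared-storage-agg": {"compute1", "compute2", "shared-disk"}}
```

In this model `compute1` and `compute2` find disk only on `shared-disk`, while `compute3`, which is outside the aggregate, finds only its own local disk.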
Most Important
Getting nested providers and consumer generations working is still the key piece of work. See the links in the themes below.
A lot of complicated work is in progress or recently merged and we are getting deeper into the cycle. There are going to be bugs. The sooner we get stuff merged so it has time to interact and we have time to experiment with it the better. And there's also that documentation gap mentioned above.
Also a reminder that for blueprints that have code that is ready for wide review, put it on the runway.
What's Changed
(This is rather long because of the gap since the last report, but also because we've hit a point where lots of stuff can merge.)
Discussion revealed an issue with allocations and inventory that exists on a top-level resource provider which we'd later like to move to a nested provider. An example is VGPU inventory which, until sometime very soon, was represented as inventory on the compute node (I think). Fixing this should be an atomic operation so a spec is in progress for Handling Reshaped Provider Trees. This suggests a new /migrator URI in the placement service, and for the sake of fast-forward-upgrades, a way to reach that URI from a within-process placement service (rather than over HTTP). The PlacementDirect tool has been created to allow this and has merged. Quite a lot of work will need to be done to implement that spec, so I'm going to add it as a theme (below).
Nova now requires the 1.25 placement microversion. It will go up again soon.
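For anyone wondering what "requires 1.25" means in practice, a client asks for a microversion with the OpenStack-API-Version header, and versions compare numerically by component rather than as strings (so 1.9 is older than 1.25). A minimal sketch follows; the helper names are made up for illustration and are not taken from nova's code.

```python
PLACEMENT_SERVICE_TYPE = "placement"


def parse_microversion(version):
    """Turn a string like '1.25' into (1, 25) so versions compare numerically."""
    major, minor = version.split(".")
    return int(major), int(minor)


def satisfies_minimum(available, required):
    """True if the service's maximum microversion is at least the required one."""
    return parse_microversion(available) >= parse_microversion(required)


def microversion_header(version):
    """Build the header that asks the service to use the given microversion."""
    return {"OpenStack-API-Version": f"{PLACEMENT_SERVICE_TYPE} {version}"}
```

A plain string comparison would wrongly treat "1.9" as newer than "1.25", which is why the sketch parses the two components into integers first.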
The groundwork for consumer generations (including requiring some form of project and user on all allocations) has merged. What remains is exposing it all at the API layer.
The placement version discovery document was incomplete, causing trouble for certain ways of using the openstacksdk. This has been fixed.
Placement now supports granular policy (policy per URI) in-code, with customization possible via a policy file.
A potential 500 when listing usage information has been fixed.
There is now a heal allocations CLI which is designed to help people migrate away from the CachingScheduler (which doesn't use placement).
Nova host aggregates are now magically mirrored as placement aggregates and, amongst other things, this is used to honor the availability_zone hint via placement.
Bugs
- Placement related bugs not yet in progress: 16, same as last time, but a different set of bugs.
- In progress placement bugs: 9, -1 on last time.
Specs
Total four weeks ago: 13. Now: 13
Spec-freeze has passed, so presumably exceptions will be required for these. There's already a notional exception for "Reshaped Provider Trees".
- https://review.openstack.org/#/c/549067/ VMware: place instances on resource pool (using update_provider_tree)
- https://review.openstack.org/#/c/552924/ Proposes NUMA topology with RPs
- https://review.openstack.org/#/c/544683/ Account for host agg allocation ratio in placement
- https://review.openstack.org/#/c/552105/ Support default allocation ratios
- https://review.openstack.org/#/c/438640/ Spec on preemptible servers
- https://review.openstack.org/#/c/555081/ Standardize CPU resource tracking
- https://review.openstack.org/#/c/509042/ Propose counting quota usage from placement
- https://review.openstack.org/#/c/560174/ Add history behind nullable project_id and user_id
- https://review.openstack.org/#/c/565730/ Placement: any traits in allocation_candidate query
- https://review.openstack.org/#/c/565741/ Placement: support mixing required traits with any traits
- https://review.openstack.org/#/c/559718/ [WIP] Support Placement in Cinder
- https://review.openstack.org/#/c/572583/ Handling Reshaped Provider Trees
- https://review.openstack.org/#/c/569011/ Count quota based on resource class
Main Themes
"Mirror nova host aggregates to placement" and "Granular" are done, so no longer listed as a theme. "Reshaped Provider Trees" is added because we're stuck if we don't do it.
Nested providers in allocation candidates
Quite a bit of the work related to nested providers in allocation candidates has merged. What remains is on this topic:
Eric noticed that in this process we've injected some changes in behavior in Rocky in the response to /allocation_candidates without guarding them with microversion changes. There's some discussion about it in IRC, first with me and then later with Jay. The gist is that it's unfortunate that this happened, but it's not a disaster, and the best outcome is that the diff between Queens and Rocky demonstrates the right behavior.
Consumer Generations
This allows multiple agents to "safely" update allocations for a single consumer. The code is in progress:
As noted above, much of this is merged. Most of what is left is exposing the functionality at the API level.
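The "safely" here is optimistic concurrency: a writer says which generation of the consumer it last saw, and the write is refused if somebody else got there first. The following is a toy in-memory sketch of that idea only. The class, the exception and the generation numbering are assumptions for illustration, not the placement implementation or its API.

```python
class ConsumerGenerationConflict(Exception):
    """Raised when a writer's view of the consumer is out of date."""


class AllocationStore:
    """Hypothetical in-memory stand-in for consumer allocations."""

    def __init__(self):
        # consumer uuid -> (generation, allocations)
        self._consumers = {}

    def get(self, consumer_uuid):
        """Return (generation, allocations); unknown consumers have generation None."""
        generation, allocations = self._consumers.get(consumer_uuid, (None, {}))
        return generation, dict(allocations)

    def put(self, consumer_uuid, allocations, consumer_generation):
        """Replace allocations only if the caller saw the current generation."""
        current, _ = self._consumers.get(consumer_uuid, (None, {}))
        if consumer_generation != current:
            raise ConsumerGenerationConflict(
                f"expected generation {current}, got {consumer_generation}"
            )
        # Assumption: generations start at 0 and go up by one per write.
        new_generation = 0 if current is None else current + 1
        self._consumers[consumer_uuid] = (new_generation, dict(allocations))
        return new_generation
```

If two agents both read generation 0 and both try to write, the first write succeeds and bumps the generation, and the second gets a conflict and has to re-read before trying again.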
Reshaped Provider Trees
This allows moving inventory and allocations that were on resource provider A to resource provider B in an atomic fashion. Right now this is a spec on the following topic:
A glance at the spec will reveal that this is a multi-faceted and multi-party effort. Nine people are listed in the Assignee section.
The placement direct part merged today.
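To make "atomic" a bit more concrete, here is a toy sketch of moving one resource class of inventory, and the allocations against it, from one provider to another. Everything in it (the `reshape` function, the shape of the state dict, the provider names) is hypothetical and has nothing to do with what the spec will eventually define. The point is only that either the whole move happens or none of it does.

```python
import copy


def reshape(state, source, target, resource_class):
    """Return a new state with resource_class moved from source to target.

    state: {"inventories": {provider: {resource class: total}},
            "allocations": {consumer: {provider: {resource class: used}}}}
    The input is never modified, so a failure leaves the caller's state intact.
    """
    new_state = copy.deepcopy(state)
    inventories = new_state["inventories"]
    if resource_class not in inventories.get(source, {}):
        raise ValueError(f"{source} has no {resource_class} inventory")
    if resource_class in inventories.get(target, {}):
        raise ValueError(f"{target} already has {resource_class} inventory")

    # Move the inventory record.
    inventories.setdefault(target, {})[resource_class] = inventories[source].pop(
        resource_class
    )

    # Move every consumer's allocation of that class along with it.
    for allocs in new_state["allocations"].values():
        if resource_class in allocs.get(source, {}):
            amount = allocs[source].pop(resource_class)
            allocs.setdefault(target, {})[resource_class] = amount
            if not allocs[source]:
                del allocs[source]
    return new_state
```

Working on a copy and returning it is the cheapest way to get all-or-nothing behavior in a sketch. A real service would presumably rely on a database transaction instead.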
Extraction
The placement db connection change was previously +W'd but has since had a few merge conflicts. It presumably will merge soon. This will allow installations to optionally use a separate database for placement data. When that merges, a zuul change to use it will adjust the nova-next job. The changes required to devstack are already in place.
A stack of changes to placement unit tests to make them not rely on nova.test has merged. There are functional tests remaining which still use that. If you are looking for extraction-related work, finding ways in which nova code is imported but isn't really needed is a good way to make progress.
A while back, Jay made a first pass at an os-resource-classes, which needs some additional eyes on it. I personally thought it might be heavier than required. If you have ideas please share them.
The placement extraction forum session went well. There was pretty good consensus from the people in the room and we got some useful feedback from some operators on how things ought to work.
An area we will need to prepare for is dealing with the various infra and co-gating issues that will come up once placement is extracted.
Other
19 entries four weeks ago. 23 now.
Some of the older items in this list are not getting much attention. That's a shame. The list is ordered (oldest first) the way it is on purpose.
- https://review.openstack.org/#/c/546660/ Purge comp_node and res_prvdr records during deletion of cells/hosts
- https://review.openstack.org/#/q/topic:bp/placement-osc-plugin-rocky A huge pile of improvements to osc-placement
- https://review.openstack.org/#/c/527791/ Get resource provider by uuid or name (osc-placement)
- https://review.openstack.org/#/c/477478/ placement: Make API history doc more consistent
- https://review.openstack.org/#/c/556669/ Handle agg generation conflict in report client
- https://review.openstack.org/#/c/537614/ Add unit test for non-placement resize
- https://review.openstack.org/#/c/493865/ cover migration cases with functional tests
- https://review.openstack.org/#/q/topic:bug/1732731 Bug fixes for sharing resource providers
- https://review.openstack.org/#/c/535517/ Move refresh time from report client to prov tree
- https://review.openstack.org/#/c/561770/ PCPU resource class
- https://review.openstack.org/#/c/566166/ rework how we pass candidate request information
- https://review.openstack.org/#/c/564876/ add root parent NULL online migration
- https://review.openstack.org/#/q/topic:bp/bandwidth-resource-provider add resource_requests field to RequestSpec
- https://review.openstack.org/#/c/575127/ replace deprecated accept.best_match
- https://review.openstack.org/#/c/575222/ Don't heal allocations for deleted servers
- https://review.openstack.org/#/c/575237/ Ignore UserWarning for scope checks during test runs
- https://review.openstack.org/#/c/568965/ Enforce placement minimum in nova.cmd.status
- https://review.openstack.org/#/c/560107/ normalize_name helper (in os-traits)
- https://review.openstack.org/#/c/573475/ Fix nits in nested provider allocation candidates(2)
- https://review.openstack.org/#/c/538498/ Convert driver supported capabilities to compute node provider traits
- https://review.openstack.org/#/c/568639/ Use placement.inventory.inuse in report client
- https://review.openstack.org/#/c/517921/ ironic: Report resources as reserved when needed
- https://review.openstack.org/#/c/568713/ Test for multiple limit/group_policy qparams
End
Yow. That was long. Thanks for reading. Review some code please.