It's been quite a while since the last one, mostly because of travel, but also because coming to grips with the placement universe takes some time. Catching up will mean that this update is likely to be a bit long. Bear with it. This is obviously an expand-style update (where we add new stuff); next week will be a contract.
One thing I'd like to highlight is that with the merge of change 560459 we've hit a long-promised milestone with placement. Thanks to an initial hit by Eric Fried and considerable follow-ups by Bhagyashri Shewale, we now have rudimentary support in nova for libvirt-using compute nodes that use shared disk to accurately report and claim that disk. Using it requires some currently-manual setup: creating the resource provider associated with the disk, and creating the aggregate that associates that provider with the compute nodes that use it. But this is one of the earliest promises of the placement concept, in the works for more than two years by many different people, finally showing up. Open the bubbly or something; a light celebration is in order.
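To make the shared-disk idea concrete, here's a small in-memory sketch (not nova code) of how placement's sharing-provider model works: a disk provider carries the real MISC_SHARES_VIA_AGGREGATE trait, and a request for VCPU plus DISK_GB is satisfied by claiming VCPU from a compute node and DISK_GB from the sharing provider in the same aggregate. All names and numbers are made up for illustration.

```python
# Toy model: two compute nodes and a shared-disk provider in one aggregate.
# The sharing provider is marked with the MISC_SHARES_VIA_AGGREGATE trait,
# which is how real placement identifies providers that share inventory.

providers = {
    "compute1": {"inventory": {"VCPU": 8}, "aggregates": {"agg1"}, "traits": set()},
    "compute2": {"inventory": {"VCPU": 8}, "aggregates": {"agg1"}, "traits": set()},
    "nfs_share": {
        "inventory": {"DISK_GB": 1000},
        "aggregates": {"agg1"},
        "traits": {"MISC_SHARES_VIA_AGGREGATE"},
    },
}

def candidates(request):
    """Yield {resource_class: provider_name} mappings satisfying the request."""
    sharing = {
        name: p for name, p in providers.items()
        if "MISC_SHARES_VIA_AGGREGATE" in p["traits"]
    }
    for name, p in providers.items():
        if name in sharing:
            continue  # sharing providers don't anchor a candidate by themselves
        alloc, ok = {}, True
        for rc, amount in request.items():
            if p["inventory"].get(rc, 0) >= amount:
                alloc[rc] = name
            else:
                # Look for a sharing provider in a common aggregate.
                for sname, sp in sharing.items():
                    if p["aggregates"] & sp["aggregates"] and sp["inventory"].get(rc, 0) >= amount:
                        alloc[rc] = sname
                        break
                else:
                    ok = False
        if ok:
            yield alloc

# Each compute node yields a candidate where DISK_GB comes from nfs_share.
assert {"VCPU": "compute1", "DISK_GB": "nfs_share"} in list(candidates({"VCPU": 2, "DISK_GB": 50}))
```

The real flow involves UUIDs, generations, and the HTTP API rather than dicts, but the shape of the result (disk claimed against the shared provider, not the compute node) is the point of the milestone.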
The flip side of this is that it highlights that we have a growing documentation debt with the many features provided by placement and how to make best use of them in nova (and other services that might like to use placement). Before the end of the cycle we will need to be sure that we set aside a considerable chunk of time to address this gap.
Getting nested providers and consumer generations working are still the key pieces of work. See the links in the themes below.
A lot of complicated work is in progress or recently merged and we are getting deeper into the cycle. There are going to be bugs. The sooner we get stuff merged so it has time to interact and we have time to experiment with it the better. And there's also that documentation gap mentioned above.
Also a reminder that for blueprints that have code that is ready for wide review, put it on the runway.
(This is rather long because of the gap since the last report, but also because we've hit a point where lots of stuff can merge.)
Discussion revealed an issue with allocations and inventory that exists on a top-level resource provider which we'd later like to move to a nested provider. An example is VGPU inventory which, until sometime very soon, was represented as inventory on the compute node (I think). Fixing this should be an atomic operation, so a spec is in progress for Handling Reshaped Provider Trees. This suggests a /migrator URI in the placement service, and, for the sake of fast-forward-upgrades, a way to reach that URI from a within-process placement service (rather than over HTTP). The PlacementDirect tool has been created to allow this and has merged. Quite a lot of work will need to be done to implement that spec, so I'm going to add it as a theme below.
Nova now requires the 1.25 placement microversion. It will go up again soon.
The groundwork for consumer generations (including requiring some form of project and user on all allocations) has merged. What remains is exposing it all at the API layer.
The placement version discovery document was incomplete, causing trouble for certain ways of using the openstacksdk. This has been fixed.
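For context, the version discovery document is the JSON body returned from GET / on the placement endpoint, and clients use its min_version/max_version fields to negotiate a microversion. A hedged sketch of that check, with a literal document standing in for a real response (the helper names are mine, not from any SDK):

```python
# Sketch: decide whether a desired placement microversion falls within the
# [min_version, max_version] range advertised in the discovery document.

def parse_version(v):
    """Turn a microversion string like '1.25' into a comparable tuple."""
    major, minor = v.split(".")
    return (int(major), int(minor))

def supports(discovery_doc, wanted):
    for version in discovery_doc["versions"]:
        lo = parse_version(version["min_version"])
        hi = parse_version(version["max_version"])
        if lo <= parse_version(wanted) <= hi:
            return True
    return False

# Shape mirrors what placement publishes at its root URI.
doc = {
    "versions": [{
        "id": "v1.0",
        "status": "CURRENT",
        "min_version": "1.0",
        "max_version": "1.25",
    }]
}

assert supports(doc, "1.25")
assert not supports(doc, "1.26")
```

Note the tuple comparison: string comparison would wrongly order "1.9" after "1.25", which is exactly the kind of subtlety that makes an incomplete discovery document painful for SDKs.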
Placement now supports granular policy (policy per URI) in-code, with customization possible via a policy file.
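The gist of granular policy is that each API action gets its own rule, with sane in-code defaults that an operator can override per rule in a policy file. A toy model of that (the rule-name pattern follows placement's "placement:<resource>:<action>" convention; the defaults and the simplistic role matching here are illustrative, not oslo.policy's actual engine):

```python
# Toy model of granular (per-URI) policy with in-code defaults and
# policy-file overrides. Real placement uses oslo.policy for this.

DEFAULT_RULES = {
    "placement:resource_providers:list": "role:admin",
    "placement:resource_providers:create": "role:admin",
    "placement:usages": "role:admin",
}

def authorize(rules, action, credentials):
    """Very simplified check: does the caller hold the role the rule names?"""
    rule = rules.get(action)
    if rule is None:
        return False  # unknown actions are denied
    required_role = rule.split(":", 1)[1]
    return required_role in credentials.get("roles", ())

# An operator override, as would normally come from a policy file: open up
# the usages endpoint to a hypothetical "reader" role.
rules = dict(DEFAULT_RULES, **{"placement:usages": "role:reader"})

assert authorize(rules, "placement:usages", {"roles": ["reader"]})
assert not authorize(DEFAULT_RULES, "placement:usages", {"roles": ["reader"]})
```

The win over a single blanket rule is that one endpoint can be opened up (or locked down) without touching the others.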
A potential 500 when listing usage information has been fixed.
There is now a heal allocations CLI which is designed to help people migrate away from the CachingScheduler (which doesn't use placement).
Nova host aggregates are now magically mirrored as placement aggregates and, amongst other things, this is used to honor the availability_zone hint via placement.
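With host aggregates mirrored into placement, honoring an availability zone boils down to a membership filter over provider aggregates, mimicking placement's member_of query parameter. A toy illustration (the aggregate names and providers are invented):

```python
# Toy illustration of filtering providers by aggregate membership, the way
# placement's member_of=in:<uuid>,<uuid> query works. With nova host
# aggregates mirrored into placement, an availability zone is just one of
# these aggregates.

provider_aggs = {
    "compute1": {"agg-az-east"},
    "compute2": {"agg-az-west"},
    "compute3": {"agg-az-east", "agg-ssd"},
}

def member_of(required_aggs):
    """Providers belonging to at least one of the required aggregates."""
    required = set(required_aggs)
    return sorted(p for p, aggs in provider_aggs.items() if aggs & required)

assert member_of(["agg-az-east"]) == ["compute1", "compute3"]
```

Pushing this filter into placement means the scheduler never has to consider hosts outside the requested zone at all.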
- Placement-related bugs not yet in progress: 16, the same count as last time, but a different set of bugs.
- In-progress placement bugs: 9, one fewer than last time.
Total specs four weeks ago: 13. Now: 13.
Spec-freeze has passed, so presumably exceptions will be required for these. There's already a notional exception for "Reshaped Provider Trees".
https://review.openstack.org/#/c/549067/ VMware: place instances on resource pool (using update_provider_tree)
https://review.openstack.org/#/c/552924/ Proposes NUMA topology with RPs
https://review.openstack.org/#/c/544683/ Account for host agg allocation ratio in placement
https://review.openstack.org/#/c/552105/ Support default allocation ratios
https://review.openstack.org/#/c/438640/ Spec on preemptible servers
https://review.openstack.org/#/c/555081/ Standardize CPU resource tracking
https://review.openstack.org/#/c/509042/ Propose counting quota usage from placement
https://review.openstack.org/#/c/560174/ Add history behind nullable project_id and user_id
https://review.openstack.org/#/c/565730/ Placement: any traits in allocation_candidate query
https://review.openstack.org/#/c/565741/ Placement: support mixing required traits with any traits
https://review.openstack.org/#/c/559718/ [WIP] Support Placement in Cinder
https://review.openstack.org/#/c/572583/ Handling Reshaped Provider Trees
https://review.openstack.org/#/c/569011/ Count quota based on resource class
"Mirror nova host aggregates to placement" and "Granular" are done, so no longer listed as a theme. "Reshaped Provider Trees" is added because we're stuck if we don't do it.
Nested providers in allocation candidates
Quite a bit of the work related to nested providers in allocation candidates has merged. What remains is on this topic:
Eric noticed that in this process we've injected some changes of behavior into the Rocky response to /allocation_candidates without guarding them behind a microversion bump. There's been some discussion about it in IRC, first with me and then later with Jay. The gist is that while it's unfortunate this happened, it's not a disaster, and the best outcome is that the diff between Queens and Rocky demonstrates the right behavior.
This allows multiple agents to "safely" update allocations for a single consumer. The code is in progress:
As noted above, much of this is merged. Most of what is left is exposing the functionality at the API level.
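The core mechanic behind "safely" is optimistic concurrency: every consumer's allocations carry a generation, a writer must send back the generation it last saw, and a mismatch (a 409 from the real API) means someone else got there first, so the writer re-reads and retries. An in-memory stand-in for that loop, with invented class and function names:

```python
# Sketch of the consumer-generation idea. Not the real API: placement
# signals a mismatch with HTTP 409; here a ConflictError stands in for it.

class ConflictError(Exception):
    pass

class Allocations:
    """A consumer's allocations plus the generation guarding them."""
    def __init__(self):
        self.generation = 0
        self.allocations = {}

    def replace(self, allocations, generation):
        if generation != self.generation:
            raise ConflictError("consumer generation conflict")
        self.allocations = allocations
        self.generation += 1

def safe_update(consumer, update, retries=3):
    """Read-modify-write that retries when another agent won the race."""
    for _ in range(retries):
        seen = consumer.generation
        merged = dict(consumer.allocations, **update)
        try:
            consumer.replace(merged, seen)
            return True
        except ConflictError:
            continue  # re-read the (now newer) state and try again
    return False

consumer = Allocations()
consumer.replace({"VCPU": 2}, 0)
assert safe_update(consumer, {"DISK_GB": 10})
assert consumer.allocations == {"VCPU": 2, "DISK_GB": 10}
```

Exposing this at the API layer mostly means returning the generation with allocations and rejecting writes that present a stale one.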
Reshaped Provider Trees
This allows moving inventory and allocations that were on resource provider A to resource provider B in an atomic fashion. Right now this is a spec on the following topic:
A glance at the spec will reveal that this is a multi-faceted and multi-party effort. Nine people are listed in the Assignee section.
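To see why atomicity is the hard requirement, here's a toy sketch of the reshape operation: move one resource class's inventory, and the allocations against it, from provider A to provider B in a single all-or-nothing step. This is purely illustrative of the concept in the spec (the real thing happens inside one database transaction, not on dict copies):

```python
import copy

def reshape(providers, allocations, resource_class, src, dst):
    """Move inventory of resource_class, and allocations against it, from
    src to dst atomically: on any failure the originals are untouched."""
    # Work on copies so a failure part-way through changes nothing.
    new_p = copy.deepcopy(providers)
    new_a = copy.deepcopy(allocations)

    if resource_class not in new_p[src]["inventory"]:
        raise ValueError("source provider has no such inventory")
    new_p[dst]["inventory"][resource_class] = new_p[src]["inventory"].pop(resource_class)

    for consumer, alloc in new_a.items():
        if src in alloc and resource_class in alloc[src]:
            amount = alloc[src].pop(resource_class)
            alloc.setdefault(dst, {})[resource_class] = amount
            if not alloc[src]:
                del alloc[src]

    # "Commit" both structures together only after everything succeeded.
    return new_p, new_a

# The VGPU example from above: inventory moves from the compute node to a
# new nested provider, and the instance's allocation follows it.
providers = {
    "compute1": {"inventory": {"VCPU": 8, "VGPU": 4}},
    "compute1_pgpu": {"inventory": {}},
}
allocations = {"instance1": {"compute1": {"VCPU": 2, "VGPU": 1}}}

providers, allocations = reshape(providers, allocations, "VGPU", "compute1", "compute1_pgpu")
assert providers["compute1"]["inventory"] == {"VCPU": 8}
assert allocations["instance1"]["compute1_pgpu"] == {"VGPU": 1}
```

If the inventory moved but the allocations didn't (or vice versa), usage accounting would be wrong in between, which is why the spec insists on a single operation.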
The placement direct part merged today.
The placement db connection change was previously +W'd but has since had a few merge conflicts; it will presumably merge soon. It will allow installations to optionally use a separate database for placement data. When it merges, a zuul change will adjust the nova-next job to use it. The changes required to devstack are already in place.
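For the curious, the change hinges on a new [placement_database] option group in nova.conf; a sketch of what opting in looks like (the connection string below is an example only):

```ini
[placement_database]
# Optional separate database for placement data. If this is unset,
# placement data continues to live in the nova_api database.
connection = mysql+pymysql://root:secret@127.0.0.1/placement?charset=utf8
```

Opting in early is one way installations can smooth the eventual extraction of placement from nova.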
A stack of changes to placement unit tests to make them not rely on nova.test has merged. There are functional tests remaining which still use that. If you are looking for extraction-related work, finding ways in which nova code is imported but isn't really needed is a good way to make progress.
A while back, Jay made a first pass at an os-resource-classes library, which needs some additional eyes on it. I personally thought it might be heavier than required. If you have ideas, please share them.
The placement extraction forum session went well. There was pretty good consensus from the people in the room and we got some useful feedback from some operators on how things ought to work.
An area we will need to prepare for is dealing with the various infra and co-gating issues that will come up once placement is extracted.
19 entries four weeks ago. 23 now.
Some of the older items in this list are not getting much attention. That's a shame. The list is ordered (oldest first) the way it is on purpose.
https://review.openstack.org/#/c/546660/ Purge comp_node and res_prvdr records during deletion of cells/hosts
https://review.openstack.org/#/q/topic:bp/placement-osc-plugin-rocky A huge pile of improvements to osc-placement
https://review.openstack.org/#/c/527791/ Get resource provider by uuid or name (osc-placement)
https://review.openstack.org/#/c/477478/ placement: Make API history doc more consistent
https://review.openstack.org/#/c/556669/ Handle agg generation conflict in report client
https://review.openstack.org/#/c/537614/ Add unit test for non-placement resize
https://review.openstack.org/#/c/493865/ cover migration cases with functional tests
https://review.openstack.org/#/q/topic:bug/1732731 Bug fixes for sharing resource providers
https://review.openstack.org/#/c/535517/ Move refresh time from report client to prov tree
https://review.openstack.org/#/c/561770/ PCPU resource class
https://review.openstack.org/#/c/566166/ rework how we pass candidate request information
https://review.openstack.org/#/c/564876/ add root parent NULL online migration
https://review.openstack.org/#/q/topic:bp/bandwidth-resource-provider add resource_requests field to RequestSpec
https://review.openstack.org/#/c/575127/ replace deprecated accept.best_match
https://review.openstack.org/#/c/575222/ Don't heal allocations for deleted servers
https://review.openstack.org/#/c/575237/ Ignore UserWarning for scope checks during test runs
https://review.openstack.org/#/c/568965/ Enforce placement minimum in nova.cmd.status
https://review.openstack.org/#/c/560107/ normalize_name helper (in os-traits)
https://review.openstack.org/#/c/573475/ Fix nits in nested provider allocation candidates(2)
https://review.openstack.org/#/c/538498/ Convert driver supported capabilities to compute node provider traits
https://review.openstack.org/#/c/568639/ Use placement.inventory.inuse in report client
https://review.openstack.org/#/c/517921/ ironic: Report resources as reserved when needed
https://review.openstack.org/#/c/568713/ Test for multiple limit/group_policy qparams
Yow. That was long. Thanks for reading. Review some code please.