Woo! Placement update 19-19. First one post PTG and Summit. Thanks to everyone who helped make it a useful event for Placement. Having the pre-PTG meant that we had addressed most issues before getting there, so people were freed up to work in other areas and the discussions we did have were highly coherent.
Thanks, also, to everyone involved in getting placement deleted from nova. We did that while at the PTG and had a little celebration.
Most Important
We're still working on narrowing priorities and focusing the details of those priorities. There's an etherpad where we're taking votes on what's important. Three specs in progress have come out of that and need review and refinement. Two others have been put on the back burner (see specs section below).
What's Changed
- We're now running a subset of nova's functional tests in placement's gate.
- osc-placement is using the PlacementFixture to run its functional tests making them much faster.
- There's a set of StoryBoard worklists that can be used to help find in progress work and new bugs. That section also describes how tags are used.
- There's a summary of summaries email message that summarizes and links to various results from the PTG.
Specs/Features
As the summary of summaries points out, we have two major features this cycle, one of which is large: getting consumer types going and getting a whole suite of features going to support nested providers in a more effective fashion.
- https://review.opendev.org/654799 Support Consumer Types. This is very close with a few details to work out on what we're willing and able to query on. It only has reviews from me so far.
- https://review.opendev.org/658510 Spec for Nested Magic. This is associated with a lengthy story that includes visual artifacts from the PTG. It covers several related features to enable nested-related requirements from nova and neutron. It is a work in progress, with several unanswered questions. It is also something that efried started but will be unable to finish, so the rest of us will need to finish it up as the questions get answered. And it also mostly subsumes a previous spec on subtree affinity. (Eric, please correct me if I'm wrong on that.)
- https://review.opendev.org/657582 Resource provider - request group mapping in allocation candidate. This spec was copied over from nova. It is a requirement of the overall nested magic theme. While it has a well-defined and refined design, there's currently no one on the hook to implement it.
There are also two specs that are still live but de-prioritized:
- https://review.openstack.org/649992 support any trait in allocation candidates
- https://review.openstack.org/649368 support mixing required traits with any traits
These and other features being considered can be found on the feature worklist.
Some non-placement specs are listed in the Other section below.
Stories/Bugs
There are 23 stories in the placement group. 0 are untagged. 4 are bugs. 5 are cleanups. 12 are rfes. 2 are docs.
If you're interested in helping out with placement, those stories are good places to look.
On launchpad:
- Placement related nova bugs not yet in progress on launchpad: 16. +3
- In progress placement bugs on launchpad: 6. +2. These are placement-related, in nova.
Of those there are two interesting ones to note:
- https://bugs.launchpad.net/nova/+bug/1829062 nova placement api non-responsive due to eventlet error. When using placement-in-nova in stein, recent eventlet changes can cause issues. As I've mentioned on the bug, the best way out of this problem is to use placement-in-placement, but there are other solutions.
- https://bugs.launchpad.net/nova/+bug/1829479 The allocation table has residual records when instance is evacuated and the source physical node is removed. This appears to be yet another issue related to orphaned allocations during one of the several move operations. The impact they are most concerned with, though, seems to be the common "When I bring up a new compute node with the same name there's an existing resource provider in the way" problem, which happens because of the unique constraint on the rp name column.
I'm still not sure that constraint is the right thing unless we want to make people's lives hard when they leave behind allocations. We may want to make it hard because it will impact quota...
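To make the "provider in the way" failure concrete, here is a minimal sketch of how a unique constraint on a name column behaves. It uses an in-memory SQLite table with a simplified, hypothetical two-column schema (not placement's real schema); the `try_create` helper and the uuid and name values are made up for illustration.

```python
import sqlite3

# Simplified, hypothetical stand-in for a resource provider table.
# The only detail that matters here is UNIQUE on the name column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource_providers ("
    " uuid TEXT PRIMARY KEY,"
    " name TEXT NOT NULL UNIQUE)"
)


def try_create(conn, uuid, name):
    """Insert a provider; return False if the name is already taken."""
    try:
        conn.execute(
            "INSERT INTO resource_providers (uuid, name) VALUES (?, ?)",
            (uuid, name),
        )
        return True
    except sqlite3.IntegrityError:
        # The old row with the same name is still present.
        return False


# The old compute node's provider was never cleaned up.
first = try_create(conn, "uuid-old", "compute-1")
# A rebuilt node arrives with a new uuid but the same hostname.
second = try_create(conn, "uuid-new", "compute-1")
print(first, second)
```

In this sketch the second insert is rejected until the stale row is removed, which is the shape of the problem described in the bug: the leftover provider (and its allocations) has to be dealt with before the name can be reused.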
osc-placement
osc-placement is currently behind by 11 microversions. No change since the last report.
Pending changes:
Note: a few of these have been sitting for some time with my +2, awaiting review by some other placement core. Please remember osc-placement when reviewing.
- https://review.openstack.org/#/c/640898/ Add 'resource provider inventory update' command (that helps with aggregate allocation ratios).
- https://review.openstack.org/#/c/651783/ Add support for 1.22 microversion
- https://review.openstack.org/586056 Provide a useful message in the case of 500-error
- https://review.openstack.org/650257 Remove unused cruft from doc and releasenotes config
- https://review.openstack.org/652100 Improve aggregate version check error messages with min_version
- https://review.opendev.org/653285 Expose version error message generically
Main Themes
Now that the PTG has passed, some themes have emerged. Since the Nested Magic one is rather all-encompassing and Cleanup is a catchall, I think we can consider three enough. If there's some theme that you think is critical that is being missed, let me know.
For people coming from the nova-side of the world who need or want something like review runways to know where they should be focusing their review energy, consider these themes and the links within them as a runway. But don't forget bugs and everything else.
Nested Magic
At the PTG we decided that it was worth the effort, in both Nova and Placement, to make the push to make better use of nested providers — things like NUMA layouts, multiple devices, networks — while keeping the "simple" case working well. The general ideas for this are described in a story and an evolving spec.
Some code has started, mostly to reveal issues:
- https://review.opendev.org/657419 Changing request group suffix to string
- https://review.opendev.org/657510 WIP: Allow RequestGroups without resources
- https://review.opendev.org/657463 Add NUMANetworkFixture for gabbits
- https://review.opendev.org/658192 Gabbi test cases for can_split
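For anyone who hasn't pictured a nested provider tree before, here is a rough sketch of the data shape involved: a compute node as the root, NUMA nodes beneath it, and a device beneath one of those. The `Provider` class, the `walk` and `providers_with` helpers, and the inventory numbers are all hypothetical and only illustrate the layout. This is not how placement stores or queries providers.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Hypothetical in-memory model of a resource provider in a tree."""
    name: str
    inventory: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def walk(provider):
    """Yield the provider and all of its descendants, depth first."""
    yield provider
    for child in provider.children:
        yield from walk(child)


def providers_with(root, resource_class, amount):
    """Names of providers in the tree holding at least `amount` of a class."""
    return [
        p.name
        for p in walk(root)
        if p.inventory.get(resource_class, 0) >= amount
    ]


# Illustrative layout: memory on the root, VCPU per NUMA node, VFs on a NIC.
root = Provider("compute-1", {"MEMORY_MB": 16384}, [
    Provider("numa-0", {"VCPU": 8}, [Provider("nic-0", {"SRIOV_NET_VF": 4})]),
    Provider("numa-1", {"VCPU": 4}),
])

print(providers_with(root, "VCPU", 6))
print(providers_with(root, "VCPU", 2))
```

The "simple" case is a tree with only a root. The nested work is about asking sensible questions once inventory is spread over several levels like this.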
Consumer Types
Adding a type to consumers will allow them to be grouped for various purposes, including quota accounting. A spec has started. There are some questions about request and response details that need to be resolved, but the overall concept is sound.
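As a rough illustration of why a type is useful for quota accounting, the sketch below groups usage by consumer type. The data shapes, the `usage_by_consumer_type` function, and the type names (`INSTANCE`, `MIGRATION`) are assumptions made up for this example. The spec has not settled the request and response details, so none of this reflects the eventual API.

```python
from collections import defaultdict


def usage_by_consumer_type(consumer_types, allocations):
    """Sum allocated amounts per consumer type and resource class.

    consumer_types: dict mapping consumer id -> type name (hypothetical)
    allocations: iterable of (consumer id, resource class, amount)
    """
    totals = defaultdict(lambda: defaultdict(int))
    for consumer_id, resource_class, amount in allocations:
        # Consumers with no recorded type are counted under "UNKNOWN".
        ctype = consumer_types.get(consumer_id, "UNKNOWN")
        totals[ctype][resource_class] += amount
    return {ctype: dict(usage) for ctype, usage in totals.items()}


consumer_types = {"c1": "INSTANCE", "c2": "INSTANCE", "c3": "MIGRATION"}
allocations = [
    ("c1", "VCPU", 2),
    ("c1", "MEMORY_MB", 2048),
    ("c2", "VCPU", 4),
    ("c3", "VCPU", 2),
]

usage = usage_by_consumer_type(consumer_types, allocations)
print(usage)
```

With types available, a quota check could count only the `INSTANCE` usage and ignore allocations held temporarily by something like a migration, which is the sort of grouping the spec is after.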
Cleanup
As we explore and extend nested functionality we'll need to do some work to make sure that the code is maintainable and has suitable performance. There's some work in progress for this that's important enough to call out as a theme:
- https://storyboard.openstack.org/#!/story/2005712 Some work from Tetsuro exploring ways to remove redundancies in the code. There's a related WIP.
- https://review.opendev.org/659522 Enhance debug logging in allocation candidate handling
- https://review.opendev.org/658164 Start of a stack that will allow us to remove the protections against null root providers (which turns out to be a pretty significant performance hit).
- https://review.opendev.org/643269 WIP: Optionally run a wsgi profiler when asked. This was used to find some of the above issues. Should we make it generally available or is it better as a thing to base off when exploring?
Ed Leafe has also been doing some intriguing work on using graph databases with placement. It's not yet clear if or how it could be integrated with mainline placement, but there are likely many things to be learned from the experiment.
Other Placement
- https://review.opendev.org/#/q/topic:refactor-classmethod-diaf A suite of refactorings that, given their lack of attention, perhaps we don't need or want. But let's be explicit about that rather than ignoring the patches, if that is indeed the case.
- https://review.opendev.org/645255 A start at some unit tests for the PlacementFixture which got lost in the run up to the PTG. They may be less of a requirement now that placement is running nova's functional tests. But again, we should be explicit about that decision.
Other Service Users
New discoveries are added to the end. Merged stuff is removed.
- https://review.openstack.org/552924 Nova: Spec: Proposes NUMA topology with RPs
- https://review.openstack.org/622893 Nova: Spec: Virtual persistent memory libvirt driver implementation
- https://review.openstack.org/641899 Nova: Check compute_node existence in when nova-compute reports info to placement
- https://review.openstack.org/601596 Nova: spec: support virtual persistent memory
- https://review.openstack.org/#/q/topic:bug/1790204 Workaround doubling allocations on resize
- https://review.openstack.org/645316 Nova: Pre-filter hosts based on multiattach volume support
- https://review.openstack.org/647396 Nova: Add flavor to requested_resources in RequestSpec
- https://review.openstack.org/633204 Blazar: Retry on inventory update conflict
- https://review.openstack.org/#/q/topic:bp/count-quota-usage-from-placement Nova: count quota usage from placement
- https://review.openstack.org/#/q/topic:bug/1819923 Nova: nova-manage: heal port allocations
- https://review.openstack.org/648665 Nova: Spec for a new nova virt driver to manage an RSD
- https://review.openstack.org/625284 Cyborg: Initial readme for nova pilot
- https://review.openstack.org/629142 Tempest: Add QoS policies and minimum bandwidth rule client
- https://review.openstack.org/648687 Nova-spec: Add PENDING vm state
- https://review.openstack.org/650188 nova-spec: Allow compute nodes to use DISK_GB from shared storage RP
- https://review.openstack.org/651024 nova-spec: RMD Plugin: Energy Efficiency using CPU Core P-State control
- https://review.openstack.org/651455 puppet: Debian: Add support for placement-api over uwsgi
- https://review.openstack.org/650963 nova-spec: Proposes NUMA affinity for vGPUs. This describes a legacy way of doing things because affinity in placement may be a ways off. But it also may not be.
- https://review.openstack.org/#/q/topic:heal_allocations_dry_run Nova: heal allocations, --dry-run
- https://review.openstack.org/642527 Neutron: Fullstack test for placement sync
- https://review.opendev.org/656448 Watcher spec: Add Placement helper
- https://review.opendev.org/659233 Cyborg: Placement report
- https://review.opendev.org/657884 Nova: Spec to pre-filter disabled computes with placement
- https://review.opendev.org/657801 rpm-packaging: placement service
- https://review.opendev.org/657016 Delete resource providers for all nodes when deleting compute service
End
I'm out of practice on these things. This one took a long time.