Here's a placement update. Last week there wasn't one, because of the PTG. There will be some references to various PTG stuff within but since we haven't fully resolved what the priorities will be, the discussion here will be somewhat unfocused.
Most Important
Two important things to do:
As is typical (at least in my experience), last week we discussed and planned more work than anyone could reasonably be expected to accomplish in a few years, let alone a single cycle, so there will be an inevitable winnowing and prioritizing of ideas and specs over the next few days. There's some discussion of priorities on an etherpad, but the details of which to do and how to implement are not fully resolved. Reviewing the specs (below) ought to help that.
We're still working towards a complete set of integration and upgrade tests for the new placement repo. The unit and functional tests are happy and nicely fast, but they aren't covering important things like upgrading from placement-in-nova to just-placement, nor do they do any live testing with a full devstack. Work is in progress on all of this, see the "extraction" section below.
What's Changed
We had a meeting to come up with a plan for migrating placement to an independent project. Mel wrote up a summary email with the steps.
Questions and Links
(I've added "links" to this section because there's a good one this week, so why not?)
- There was a demo at the PTG for the minimum bandwidth work. That's been written up in a blog post.
- Yesterday, belmoreira showed up in #openstack-placement with some issues with expected resource providers not showing up in allocation candidates. This was traced back to max_unit for VCPU being locked at == total and hardware which had had SMT turned off now reporting fewer CPUs, thus being unable to accept existing large flavors. Discussion ensued about ways to potentially make max_unit more manageable by operators. There are two issues with this: the existing constraint is there for a reason (discussed in IRC), but that reason is not universally agreed and we didn't resolve that. Also, management of max_unit of any inventory gets more complicated in a world of complex NUMA topologies.
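To make the max_unit problem concrete, here is a minimal sketch of the kind of per-request check an inventory imposes. It is a simplified model, not placement's actual code: the Inventory class, the can_allocate helper, and the host numbers (32 threads with SMT on, 16 with it off, an allocation ratio of 16.0) are all hypothetical, chosen only to illustrate why a flavor that used to fit no longer yields a candidate.

```python
from dataclasses import dataclass


@dataclass
class Inventory:
    """Simplified, hypothetical stand-in for one inventory record."""
    total: int
    reserved: int = 0
    min_unit: int = 1
    max_unit: int = 1
    step_size: int = 1
    allocation_ratio: float = 1.0


def can_allocate(inv: Inventory, used: int, amount: int) -> bool:
    """Return True if a single request for `amount` fits this inventory."""
    # Per-request bounds: one consumer may take no less than min_unit
    # and no more than max_unit, regardless of how much capacity is free.
    if amount < inv.min_unit or amount > inv.max_unit:
        return False
    # Requests must be a multiple of step_size.
    if amount % inv.step_size != 0:
        return False
    # Overall capacity, including overcommit via allocation_ratio.
    capacity = int((inv.total - inv.reserved) * inv.allocation_ratio)
    return used + amount <= capacity


# Assumed example host: max_unit is locked to total in both cases.
smt_on = Inventory(total=32, max_unit=32, allocation_ratio=16.0)
smt_off = Inventory(total=16, max_unit=16, allocation_ratio=16.0)

# A 24-VCPU flavor fits while SMT is on...
print(can_allocate(smt_on, used=0, amount=24))
# ...but is rejected once SMT is off, even though overcommitted
# capacity (16 * 16.0 = 256) is far larger than the request.
print(can_allocate(smt_off, used=0, amount=24))
```

In this model the rejection comes entirely from the max_unit bound, not from capacity, which is why the allocation ratio does not help the large flavor.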
Bugs
- Placement related bugs not yet in progress: 17. No change (in number) from last time.
- In progress placement bugs: 10. Same as last time.
Specs
New (or newly discovered) ones are at the end. Specs which have merged have been removed. As stated above: We still haven't solidified priorities, so some specs may merge as "low priority".
- https://review.openstack.org/#/c/544683/ Account for host agg allocation ratio in placement (Still in rocky/)
- https://review.openstack.org/#/c/595236/ Add subtree filter for GET /resource_providers
- https://review.openstack.org/#/c/597601/ Resource provider - request group mapping in allocation candidate
- https://review.openstack.org/#/c/549067/ VMware: place instances on resource pool (still in rocky/)
- https://review.openstack.org/#/c/555081/ Standardize CPU resource tracking
- https://review.openstack.org/#/c/599957/ Allow overcommit of dedicated CPU (Has an alternative which changes allocations to a float)
- https://review.openstack.org/#/c/600016/ List resource providers having inventory
- https://review.openstack.org/#/c/593475/ Bi-directional enforcement of traits
- https://review.openstack.org/#/c/599598/ allow transferring ownership of instance
- https://review.openstack.org/#/c/591037/ Modelling passthrough devices for report to placement
- https://review.openstack.org/#/c/509042/ Propose counting quota usage from placement and API database (A bit out of date but may be worth resurrecting)
- https://review.openstack.org/#/c/603585/ Spec: allocation candidates in tree
- https://review.openstack.org/#/c/603805/ [WIP] generic device discovery policy
- https://review.openstack.org/#/c/603955/ Nova Cyborg interaction specification.
- https://review.openstack.org/#/c/601596/ supporting virtual NVDIMM devices
- https://review.openstack.org/#/c/603352/ Spec: Support filtering by forbidden aggregate
- https://review.openstack.org/#/c/552924/ Proposes NUMA topology with RPs
- https://review.openstack.org/#/c/552105/ Support initial allocation ratios (There are at least two pending allocation ratio handling cleanup specs. It's not clear from the PTG etherpad which of these was chosen as the future (we did choose, but the etherpad is confusing). 544683 (above) is the other one.)
- https://review.openstack.org/#/c/569011/ Count quota based on resource class
Main Themes
These are interim themes while we work out what priorities are.
Making Nested Useful
An acknowledged outcome from the PTG was that we need to do the work to make workloads that want to use nested resource providers actually able to land on a host somewhere. This involves work across many parts of nova and could easily lead to a mass of bug fixes in placement. I'm probably missing a fair bit but the following topics are good starting points:
- https://review.openstack.org/#/q/topic:bp/use-nested-allocation-candidates
- https://review.openstack.org/#/q/topic:use-nested-allocation-candidates
- https://review.openstack.org/#/q/topic:bug/1792503
Consumer Generations
gibi is still working hard to drive home support for consumer generations on the nova side. Because of some dependency management that stuff is currently in the following topic:
Extraction
As mentioned above, getting the extracted placement happy is proceeding apace. Besides the many generic cleanups happening to the repo, we need to focus some effort on upgrade and integration testing, docs publishing, and doc correctness.
Dan has started a database migration script which will be used by deployers and grenade for upgrades. Matt is hoping to make some progress on the grenade side of things. I have a hacked up devstack for using the extracted placement.
All of this is dependent on:
- database migrations being "collapsed"
- the existence of a placement-manage script to initialize the database
I made a faked up placement-manage for the devstack patch above, but it only creates tables, doesn't migrate, and is not fit for purpose as a generic CLI.
I have started some experiments on using gabbi-tempest to drive some integration tests for placement with solely gabbi YAML files. I initially did this using "legacy" style zuul jobs and made it work, but it was ugly. I've since started using more modern zuul, but haven't yet made it work.
Other
As with last time, I'm not going to make a list of links to pending changes that aren't already listed above. I'll start doing that again eventually (once priorities are more clear), but for now it is useful to look at open placement patches and patches from everywhere which mention placement in the commit message.
End
In case anyone is wondering where I am, I'm out M-W next week.