Placement Update 19-08

Posted on: Fri 01 March 2019

Welcome back to the placement update. If I've read the signs correctly, I should now be back to this as a regular thing. Apologies for the gap; I had to attend to some other responsibilities.

Most Important

A lot has changed in the past few months, so it's hard to single out one most important thing; it will depend on who is reading. See What's Changed, next, for a summary of the important stuff.

What's Changed

Placement is now its own official project. Until elections are held (it looks like nominations start this coming Tuesday), Mel is the PTL.
Setting up storyboard for placement-related projects is in progress. For the time being we are continuing to use launchpad for most tracking. See a related email thread.
Deleting placement code from nova has been put on hold until Train to make it easier for certain types of upgrades to happen. New installs should prefer the extracted code, as the nova-side is frozen, but the placement side is not.
A large stack of code to remove oslo.versionedobjects from placement has merged. This has resulted in a significant change in performance on the perfload test that runs in the gate. While not a complete representation of the entire system, it's enough to say "yeah, that was worth it": A request for allocation candidates that used to take around 2.5 seconds now takes 1.2. That refactoring continues (see below), seeking additional simplifications.
Microversion 1.31 adds in_tree and in_treeN query parameters to GET /allocation_candidates. This is useful in a variety of nested resource provider scenarios, including the big bandwidth QoS changes that are in progress in nova and neutron.
Placement is now publishing install docs, but it is important to note that those docs have not been validated (as far as I'm aware) by the packagers. That validation still needs to happen.
os-resource-classes 0.3.0 has been released with a normalize_name function.
There are some pending specs from nova which are primarily placement feature specs. We'll continue with those as is (see below), but come the next cycle the plan is to manage specs in the placement repo, not have a separate repo, and not have separate spec cores.
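To make the microversion 1.31 change a bit more concrete, here's a rough sketch of building the query string for GET /allocation_candidates with in_tree and in_treeN. The helper name, the UUIDs, and the choice of resource classes are made up for illustration; the only things taken from placement itself are the parameter names and the microversion header. This is not code from placement or nova.

```python
from urllib.parse import urlencode

# Header needed to opt in to the microversion that understands in_tree.
MICROVERSION_HEADER = {"OpenStack-API-Version": "placement 1.31"}


def allocation_candidates_query(resources, in_tree=None, numbered=None):
    """Build a query string for GET /allocation_candidates.

    Hypothetical helper, for illustration only.

    resources: dict of resource class -> amount (the unnumbered group)
    in_tree:   UUID of a provider whose tree the unnumbered group
               should be limited to
    numbered:  dict of suffix -> (resources dict, in_tree UUID or None)
    """
    def fmt(res):
        # Placement expects "CLASS:amount" pairs joined by commas.
        return ",".join(f"{rc}:{amount}" for rc, amount in sorted(res.items()))

    params = [("resources", fmt(resources))]
    if in_tree:
        params.append(("in_tree", in_tree))
    for suffix, (res, tree) in sorted((numbered or {}).items()):
        params.append((f"resources{suffix}", fmt(res)))
        if tree:
            params.append((f"in_tree{suffix}", tree))
    # Keep ":" and "," unescaped so the result is easier to read.
    return urlencode(params, safe=":,")


# Made-up provider UUIDs.
COMPUTE_RP = "11111111-2222-3333-4444-555555555555"
NIC_RP = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"

query = allocation_candidates_query(
    {"VCPU": 1},
    in_tree=COMPUTE_RP,
    numbered={1: ({"SRIOV_NET_VF": 1}, NIC_RP)},
)
print("/allocation_candidates?" + query)
```

The request would then be sent with MICROVERSION_HEADER set; requests made with an older microversion don't get in_tree support.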

Specs/Blueprints/Features

Near to Done

Filter Allocation Candidates by Provider Tree has been mostly completed by Tetsuro, but there's a pending update to the spec.

Not yet Done

Not yet Approved

Update alloc-candidates-in-tree updates the in-tree spec above to reflect what was learned while doing the actual implementation, notably how numbered in_tree parameters impact results.
Resource provider - request group mapping in allocation candidate has had a recent resurgence in attention.

Bugs

Placement-related bugs not yet in progress: 15.
In-progress placement bugs: 17.

osc-placement

osc-placement is currently behind by 14 microversions.

Code for 1.18 is under review.

Main Themes

This section now overlaps a bit with the Specs/Features bit above. This will settle out with a bit more clarity as we move along.

Nested

Reshaper handling in nova keeps exposing additional things that need to be remembered on the nova side, so there are a few patches remaining related to vgpu reshaping, but it is mostly ready.
The bandwidth-resource-provider topic has merged a vast amount of code but there is still plenty left.

Related to all this nested stuff: The complex hardware models that drove the development of the nested resource provider system are challenging to test. The cloud hardware provided to OpenStack infrastructure does not expose the hardware that would allow real integration tests. If anyone reading this is in a position to provide third party CI with fancy hardware for NUMA, NFV, FPGA, and GPU related integration testing with nova, there's a significant need for that.

Refactoring

(I think refactoring should be a constant theme. To reflect that, I'm going to have a section here. Editorial privilege or something.)

There's a collection of patches in progress, currently under the topic scrub-Lists that is a follow up to the patches that removed oslo versioned objects. That work pointed out some opportunities to DRY-up the List classes (e.g., UsageList) to remove some duplication and simplify. Then, after looking at that, it became clear that entirely removing the List classes, in favor of using python native lists, would further simplify the code.
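For anyone who hasn't looked at the patches, the shape of the change is roughly as follows. This is a heavily simplified sketch, not the real placement code: the Usage fields, the function name, and the "rows" input (standing in for a database query) are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    resource_class: str
    usage: int


# Before: a dedicated List class that mostly wraps a Python list and
# re-implements the list behaviours callers need.
class UsageList:
    def __init__(self, objects=None):
        self.objects = objects or []

    def __len__(self):
        return len(self.objects)

    def __iter__(self):
        return iter(self.objects)

    @classmethod
    def get_all_by_resource_provider(cls, rows):
        # "rows" stands in for the result of a database query.
        return cls([Usage(**row) for row in rows])


# After: a module-level function that returns a native list, so callers
# get len(), iteration, indexing, and so on for free.
def get_all_by_resource_provider(rows):
    return [Usage(**row) for row in rows]


rows = [
    {"resource_class": "VCPU", "usage": 2},
    {"resource_class": "MEMORY_MB", "usage": 1024},
]
old_style = UsageList.get_all_by_resource_provider(rows)
new_style = get_all_by_resource_provider(rows)
```

Both styles carry the same data; the second just has less code to maintain and fewer places for bugs to hide.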

Apart from the previously mentioned performance and simplicity benefits of these changes, they've also managed to expose and fix a few bugs, simply because we were looking at things and moving them around. If you pick up rocks, you can see the bugs and squash them. If you don't, they breed.

Other Placement

https://review.openstack.org/#/q/topic:improve-debug-log A series of improvements leading to a better debug log when retrieving allocation candidates.
https://review.openstack.org/#/c/639628/ Docs: extract testing info to own sub-page
https://review.openstack.org/#/q/topic:cd/gabbi-tempest-job Gabbi-based integration tests of placement. These recently found a bug that none of the functional, grenade, or tempest tests caught.
https://review.openstack.org/#/c/619050/ Optionally migrate database at service startup (so you don't have to run placement-manage db sync if you don't want to).
https://review.openstack.org/#/c/630216/ Add a vision-reflection (of the Technical Vision doc).

Other Service Users

Nova

See also the several links above for more nova changes. Also, I'm a bit behind on my tracking in this area, so there is likely plenty of other stuff too. This will improve over time.

https://review.openstack.org/538498 Convert driver supported capabilities to compute node provider traits
https://review.openstack.org/621494 Add descriptions of numbered resource classes and traits
https://review.openstack.org/636412 Make move_allocations handle empty source allocations (Part of a series on cross-cell resize)
https://review.openstack.org/#/q/topic:bp/count-quota-usage-from-placement Using placement (from nova) for counting (some of) quota.

Not Nova

https://review.openstack.org/#/q/topic:tripleo-nova-placement-removal
https://review.openstack.org/#/q/topic:tripleo-placement-extraction
https://review.openstack.org/#/q/topic:minimum-bandwidth-allocation-placement-api Neutron side of minimum bandwidth.
https://review.openstack.org/#/q/topic:puppet-placement-extraction
https://review.openstack.org/#/q/bp/no-affinity-instance-reservation Blazar reservation handling, including some manipulation of inventory in placement.
https://review.openstack.org/633204 Blazar: Retry on inventory update conflict

End

Though this is long, it doesn't really bring us fully up to date. If something is missing that you think is important please let me know. Once I'm back in the flow it should become increasingly complete.

Category: openstack – Tags: placement