Placement Update 19-32

Posted on: Fri 16 August 2019

Here's placement update 19-32. There will be no update 33; I'm going to take next week off. If there are Placement-related issues that need immediate attention please speak with any of Eric Fried (efried), Balazs Gibizer (gibi), or Tetsuro Nakamura (tetsuro).

Most Important

Same as last week: The main things on the Placement radar are implementing Consumer Types and cleanups, performance analysis, and documentation related to nested resource providers.

A thing we should place on the "important" list is bringing the osc placement plugin up to date. We also need to discuss what we would like the plugin to be. Is it required that it have ways to perform all the functionality of the API, or is it about providing ways to do what humans need to do with the placement API? Is there a difference?

We decided that consumer types is medium priority: The nova-side use of the functionality is not going to happen in Train, but it would be nice to have the placement-side ready when U opens. The primary person working on it, tssurya, is spread pretty thin so it might not happen unless someone else has the cycles to give it some attention.

On the documentation front, we realized during some performance work last week that it is easy to have an incorrect grasp of how same_subtree works when there are more than two groups involved. It is critical that we create good "how to use" documentation for this and other advanced placement features. Not only can it be easy to get wrong, it can be a challenge to see that you've got it wrong (the failure mode is "more results, only some of which you actually wanted").
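To make the "more than two groups" case concrete, here is a minimal sketch of how such a request might be assembled. The helper name `build_candidates_query`, the group suffixes, and the resource amounts are all made up for illustration. It assumes that same_subtree takes a comma-separated list of group suffixes and is available from microversion 1.36; check the API reference before relying on either.

```python
from urllib.parse import urlencode


def build_candidates_query(groups, same_subtree, group_policy="none"):
    """Build a query string for GET /allocation_candidates.

    groups: dict of suffix (e.g. "_COMPUTE") -> {resource class: amount}
    same_subtree: list of suffix lists; each inner list becomes one
        same_subtree parameter (hypothetical helper, not part of placement).
    """
    params = []
    for suffix, resources in groups.items():
        value = ",".join(f"{rc}:{amount}" for rc, amount in resources.items())
        params.append((f"resources{suffix}", value))
    for suffixes in same_subtree:
        params.append(("same_subtree", ",".join(suffixes)))
    # Assumption: a group_policy is expected once several suffixed groups
    # are used in one request.
    params.append(("group_policy", group_policy))
    # Keep ":" and "," readable instead of percent-encoding them.
    return urlencode(params, safe=":,")


# Assumption: same_subtree needs at least this microversion.
HEADERS = {"OpenStack-API-Version": "placement 1.36"}

if __name__ == "__main__":
    groups = {
        "_COMPUTE": {"VCPU": 2},
        "_ACCEL1": {"FPGA": 1},
        "_ACCEL2": {"FPGA": 1},
    }
    # One parameter naming all three groups ...
    print(build_candidates_query(groups, [["_COMPUTE", "_ACCEL1", "_ACCEL2"]]))
    # ... is a different request from two pairwise parameters.
    print(build_candidates_query(
        groups, [["_COMPUTE", "_ACCEL1"], ["_COMPUTE", "_ACCEL2"]]))
```

The two requests printed at the end are not interchangeable and may return different sets of candidates. That distinction is the kind of thing the "how to use" documentation needs to spell out, with the expected outcome for each provider topology.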

What's Changed

Yet more performance fixes are in the process of merging. Most of these are related to getting _merge_candidates and _build_provider_summaries to have less impact. The fixes are generally associated with avoiding duplicate work by generating dicts of reusable objects earlier in the request. This is possible because of the relatively new RequestWideSearchContext. In a request that returns many provider summaries, _build_provider_summaries continues to have a significant impact because it has to create many objects, but overall everything is much less heavyweight. More on performance in Themes, below.

The combination of all these performance fixes, together with microversions, makes it reasonable for anyone running placement in a resource-constrained environment (or simply wanting things to be faster) to consider running Train placement with any release of OpenStack. Obviously you should test it first, but it is worth investigating. More information on how to achieve this can be found in the upgrade to Stein docs.
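The "build reusable objects once, early in the request" idea can be illustrated with a small sketch. This is not placement's code: `RequestWideCache` is a hypothetical stand-in for a per-request context, and the row and summary shapes are invented. It only shows the general pattern of sharing one object per provider across many candidates.

```python
class RequestWideCache:
    """Hypothetical per-request context; not placement's real class."""

    def __init__(self, provider_rows):
        self.build_count = 0
        # Build every summary exactly once, at the start of the request.
        self._summaries = {
            row["id"]: self._build_summary(row) for row in provider_rows
        }

    def _build_summary(self, row):
        # Stands in for the comparatively expensive object creation.
        self.build_count += 1
        return {"provider": row["id"], "resources": dict(row["resources"])}

    def summary_for(self, provider_id):
        # Later stages reuse the shared object instead of rebuilding it.
        return self._summaries[provider_id]


def summaries_for_candidates(cache, candidates):
    """Map each candidate (a list of provider ids) to its shared summaries."""
    return [[cache.summary_for(pid) for pid in cand] for cand in candidates]


if __name__ == "__main__":
    rows = [
        {"id": 1, "resources": {"VCPU": 8}},
        {"id": 2, "resources": {"FPGA": 1}},
    ]
    cache = RequestWideCache(rows)
    result = summaries_for_candidates(cache, [[1, 2], [1], [2, 1]])
    print(cache.build_count, len(result))
```

However many candidates mention a provider, its summary is created once per request. The cost that remains scales with the number of distinct providers, which fits the observation that requests returning many provider summaries are still the expensive ones.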

Stories/Bugs

(Numbers in () are the change since the last pupdate.)

There are 23 (1) stories in the placement group. 0 (0) are untagged. 4 (1) are bugs. 4 (0) are cleanups. 11 (0) are rfes. 4 (0) are docs.

If you're interested in helping out with placement, those stories are good places to look.

Placement related nova bugs not yet in progress on launchpad: 18 (1).
Placement related nova in progress bugs on launchpad: 4 (-1).

osc-placement

osc-placement is currently behind by 12 microversions.

https://review.opendev.org/666542 Add support for multiple member_of. There's been some useful discussion about how to achieve this, and a consensus has emerged on how to get the best results.
https://review.opendev.org/640898 Adds a new '--amend' option which can update resource provider inventory without requiring the user to pass a full replacement for inventory. This has been broken up into three patches to help with review.

Main Themes

Consumer Types

Adding a type to consumers will allow them to be grouped for various purposes, including quota accounting.

https://review.opendev.org/#/q/topic:bp/support-consumer-types A WIP, as microversion 1.37, has started.

As mentioned above, this is currently paused while other things take priority. If you have time that you could spend on this please respond here expressing that interest.

Cleanup

Cleanup is an overarching theme related to improving documentation, performance and the maintainability of the code. The changes we are making this cycle are fairly complex to use and are fairly complex to write, so it is good that we're going to have plenty of time to clean and clarify all these things.

As said above, there's lots of performance work in progress. We'll need to make a similar effort with regard to docs. For example, all of the coders involved in the creation and review of the same_subtree functionality struggle to explain, clearly and simply, how it will work in a variety of situations. We need to enumerate the situations and the outcomes, in documentation.

One outcome of this work will be something like a Deployment Considerations document to help people choose how to tweak their placement deployment to match their needs. The simple answer is to use more web servers and more database servers, but that's often very wasteful.

On the performance front, there is one major area of impact which has not received much attention yet. When requesting allocation candidates (or resource providers) that will return many results, the cost of JSON serialization is just under one quarter of the processing time. This is to be expected when the response body is 2379k in size and 154000 lines long (when pretty printed) for 7000 provider summaries and 2000 allocation requests.

But there are ways to fix it. One is to ask more focused questions (so fewer results are expected). Another is to limit=N the results (but this can lead to issues with migrations).

Another is to use a different JSON serializer. Should we do that? It makes a big difference with large result sets (which will be common in big and sparse clouds).
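For anyone wanting to get a rough feel for the serialization cost locally, the following sketch times the standard library serializer on synthetic data. The payload is only loosely shaped like an allocation candidates response and the helper names are hypothetical, so the numbers will not match the real service. An alternative serializer can be passed in through the `dumps` argument for comparison.

```python
import json
import time


def make_payload(n_summaries, n_requests):
    """Synthetic data, loosely shaped like an allocation candidates body.

    n_summaries must be at least 1.
    """
    summaries = {
        f"rp-{i}": {"resources": {"VCPU": {"capacity": 64, "used": i % 64}}}
        for i in range(n_summaries)
    }
    requests = [
        {"allocations": {f"rp-{i % n_summaries}": {"resources": {"VCPU": 1}}}}
        for i in range(n_requests)
    ]
    return {"allocation_requests": requests, "provider_summaries": summaries}


def time_serialization(payload, dumps=json.dumps):
    """Return (elapsed seconds, serialized length) for one dumps() call."""
    start = time.perf_counter()
    body = dumps(payload)
    elapsed = time.perf_counter() - start
    return elapsed, len(body)


if __name__ == "__main__":
    # Same counts as the example above: 7000 summaries, 2000 requests.
    payload = make_payload(7000, 2000)
    elapsed, size = time_serialization(payload)
    print(f"serialized {size} characters in {elapsed:.4f}s")
```

A single timing like this is noisy. Repeating the call several times and comparing medians gives a fairer picture before deciding whether a serializer swap is worth the added dependency.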

Other Placement

Miscellaneous changes can be found in the usual place.

There are two os-traits changes being discussed. And zero os-resource-classes changes.

Other Service Users

New discoveries are added to the end. Merged stuff is removed. Anything that has had no activity in 4 weeks has been removed.

https://review.openstack.org/#/q/topic:bug/1819923 Nova: nova-manage: heal port allocations
https://review.opendev.org/659233 Cyborg: Placement report
https://review.opendev.org/662229 helm: add placement chart
https://review.opendev.org/634551 libvirt: report pmem namespaces resources by provider tree
https://review.opendev.org/660852 Nova: Remove PlacementAPIConnectFailure handling from AggregateAPI
https://review.opendev.org/670112 Nova: WIP: Add a placement audit command
https://review.opendev.org/671312 blazar: Fix placement operations in multi-region deployments
https://review.opendev.org/671793 Nova: libvirt: Start reporting PCPU inventory to placement. A part of https://review.opendev.org/#/q/topic:bp/cpu-resources
https://review.opendev.org/#/q/topic:bp/support-move-ops-with-qos-ports Nova: support move ops with qos ports
https://review.opendev.org/666202 Blazar: Create placement client for each request
https://review.opendev.org/667952 nova: Support filtering of hosts by forbidden aggregates
https://review.opendev.org/669079 blazar: Send global_request_id for tracing calls
https://review.opendev.org/670696 tempest: Add placement API methods for testing routed provider nets
https://review.opendev.org/672678 openstack-helm: Build placement in OSH-images
https://review.opendev.org/674129 Correct global_request_id sent to Placement
https://review.opendev.org/#/q/topic:bp/cross-cell-resize Nova: cross cell resize
https://review.opendev.org/674524 Nova: Scheduler translate properties to traits
https://review.opendev.org/623558 Nova: single pass instance info fetch in host manager
https://review.opendev.org/674708 Zun: [WIP] Claim container allocation in placement

End

Have a good next week.

Category: openstack – Tags: placement