OpenStack Denver PTG

Posted on: Mon 02 October 2017

Because it's a thing, here's my summary from the most recent OpenStack PTG in Denver. A PTG is a "Project Teams Gathering": a time for contributors to the various OpenStack projects to do detailed planning for the coming development cycle, without the conference obligations normally associated with the summit.

Denver was the second PTG. The first was in Atlanta (I wrote a post about that one too). It's pretty clear that the organizers took the feedback from Atlanta to heart when orchestrating Denver:

In the entire six days of OpenStack work (the first day there was a board meeting) I experienced a coffee drought only once. This is a huge improvement over Atlanta (and the intervening summit in Boston).
The food was much better. I looked forward to lunch.

Despite the trains, the facilities at the hotel worked very well for what we were there to do: sit in a room and talk.

The week was divided into two sections: the first two days were oriented towards cross-project or horizontal teams, and the latter three days were more project specific. I've already reported on the TC-related work in the most recent TC report. The rest of this post discusses the API SIG and Nova.

API

In Atlanta the API room ended up being something of a destination. In part this was because people didn't know where else to go, but it was also because we were talking about formalizing the interoperability guidelines, and thus talking about microversions, and that tends to draw the crowds.

(Etherpad from the API room.)

This time around microversions were not on the agenda. Instead we chatted about capabilities, the API working group becoming a SIG, and reaching out to developers of SDKs that happen to support OpenStack.

Capabilities is an overloaded term. In the context of APIs it has at least four meanings:

What can be done with this cloud?
What can be done with this service (in this cloud)?
What can be done with this type of resource (in this service)?
What can be done with this instance (of this type of resource)?

Each of these can change according to the authorization of the requesting user.

For the first, there's work in progress on a cloud profile document which will, in broad strokes, describe the bones of a single OpenStack deployment. This is useful to prescribe because openstack-infra, which uses multiple clouds, already has lots of experience with needing this.

For the other meanings we are less sure of the needs, so some exploration is needed. The Cinder project has committed to exploring service-level capabilities. From there a standard will evolve. The hope is that we will be able to have something consistent such that discovery of capabilities follows the same pattern in each service. Otherwise we're doing a huge disservice to API consumers.

Becoming a SIG is an effort to ensure that all people who are interested in OpenStack APIs have a place to collaborate. In the past, because the vast majority of working group effort was in improving or correcting the implementations of service APIs, there was a perception that the group was for developers only. This has never been the intent and the hope is that now anyone who uses, makes, or makes tools for, OpenStack APIs can use the SIG as a meeting place.

One major audience is developers of SDKs (such as gophercloud) that consume OpenStack APIs. We discussed strategies for implementing microversions (the topic got in after all!) in clients. One strategy is to be opinionated about which microversion for any given request is "best" but also allow the caller to declare otherwise. Supporting this kind of thing in clients is complex ("Does this mean we need to support every single microversion ever?") but is the real cost of making clients interoperate with different deployments that may be at different points in apparent time ("Yes, mostly.").
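As a rough illustration of the "opinionated, but overridable" strategy, here is a minimal sketch in Python. It is not taken from any real SDK: the function names, the parameters, and the idea that "best" means "the newest version the client knows about that the server also supports" are all assumptions made for the example. The only detail borrowed from OpenStack itself is the `OpenStack-API-Version` request header.

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, int]


def parse(version: str) -> Version:
    """Turn "2.53" into (2, 53) so that 2.10 sorts after 2.9."""
    major, minor = version.split(".")
    return int(major), int(minor)


def choose_microversion(
    server_min: str,
    server_max: str,
    client_preferred: str,
    caller_requested: Optional[str] = None,
) -> str:
    """Pick the microversion to send with a request (hypothetical helper).

    server_min/server_max: the range the deployment says it supports.
    client_preferred: the client's opinion of the "best" version.
    caller_requested: an explicit override from the caller, if any.
    """
    lowest, highest = parse(server_min), parse(server_max)

    if caller_requested is not None:
        # The caller declared otherwise: honor it, or fail loudly.
        if not lowest <= parse(caller_requested) <= highest:
            raise ValueError(
                f"{caller_requested} is outside {server_min}..{server_max}"
            )
        return caller_requested

    # Assumed policy: use the client's preference, capped at what
    # this particular deployment supports.
    best = min(parse(client_preferred), highest)
    if best < lowest:
        raise ValueError(
            f"client supports nothing in {server_min}..{server_max}"
        )
    return f"{best[0]}.{best[1]}"


def version_header(service_type: str, version: str) -> Dict[str, str]:
    """Build the microversion request header for a service type."""
    return {"OpenStack-API-Version": f"{service_type} {version}"}
```

A client built this way would call `choose_microversion("2.1", "2.53", "2.60")` against an older cloud and quietly fall back to the cloud's maximum, while a caller that needs specific behavior can pass `caller_requested` and get an error rather than a silent substitution. The fallback is where the "every single microversion ever" cost shows up: the client has to know how to speak whatever version it lands on.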

There was also discussion about coming up with straightforward ways to evaluate existing client implementations and bless them as preferred solutions, not because we like them, but because they work correctly. To make that most effective, a plan is underway to provide testing resources to SDK developers so they have known good OpenStack clouds to test against.

Nova

I traditionally dread any in-person Nova planning event. I like seeing the people, and it's great to share ideas, but the gap between the amount of stuff we need to talk about and the amount we actually get around to talking about is so astronomically huge that it is difficult to not become frustrated, depressed and even angry.

A cycle retrospective was the first item on the three-day agenda. It wasn't really a retrospective. People are sufficiently burnt from lack of change, too much work, or inability to really talk in earnest about the issues that only token gestures were made toward addressing anything. The most concrete statements were "try harder to not get into nit-picky arguments" and "merge sooner in the cycle so we can catch global breakages and avoid end-of-cycle chaos". Both of these are reasonable things if taken at face value, but if our behavior since the PTG is any indicator we're going to struggle to live up to those plans.

A great deal of the time at the PTG was devoted to placement and placement-related topics. Part of this is because placement is in the middle of everything, part of it is because lots of "enhanced platform" work is blocked on placement, and part of it is because those of us who are placement people are prolix. Our main accomplishment was to limit placement priorities to something potentially realistic:

implementing alternate selected hosts to support retries in cellsv2
getting allocation handling for migrations using migration uuids
implementing nested resource providers

Typically, these are all bigger than they sound. There's some additional information in one of my recent resource provider updates.

Another large topic, which will have implications in placement, is coming up with new ways to generically support device management that can deal with PCI devices, FPGAs, GPUs and such things in libvirt environments as well as in other hypervisor setups (PowerVM, VMware) where such devices are not locally visible in the filesystem. The outcome is we're not going to work on that yet, but we made sure that work happening now with nested resource providers will not limit our options later. We also agreed that we're going to stop using the PCI whitelist for everything related to PCI devices and take it back to only being a whitelist. Eventually.

The next in-person gathering will be in Sydney for the next summit, including the "Forum", which is oriented towards engaging and extracting feedback from users, operators and other developers to make long-term plans. I'm hoping that we can limit the amount of "placement, placement, placement" at the Forum as we've got enough to work with for now and there are plenty of other topics that need attention.

As reported in the last TC report, I'm on the hook to report at summit on the community health issues related to "developer happiness". I feel pretty strongly that some of the issues not addressed in the Nova retrospective are strong factors in these community health issues, but I'm as yet unsure how to express them or address them. This is not as simple as "Nova needs more cores" (something that has been stated in the past). The unhealthiness is just as much an issue for cores (who are overstretched and too often predisposed to heroics) as for non-cores. If you have ideas, please let me know, and I'll work to integrate them. I'll be writing more about this.

Category: openstack – Tags: opensource
