Copyright 2015 Hewlett-Packard Development Company, L.P.

This work is licensed under a Creative Commons Attribution 3.0
Unported License.
http://creativecommons.org/licenses/by/3.0/legalcode

shade: A library that understands clouds

Infra uses multiple clouds and, as a result, has learned a lot about what it takes to do so. In the interest of being good citizens, that knowledge should live in a reusable library rather than inside nodepool.

Problem Description

As much as OpenStack promises a utopian future where an application can be written once and target multiple clouds that run OpenStack, the reality is that deployer choices leak through the abstractions to the point where the end user must know about them. As a result, application logic requires a priori knowledge about individual clouds, and needs complex branching even for differences that could be discovered at runtime.

The current user interface libraries, python-*client, are particularly user-unfriendly, as they were primarily written with server-to-server communication in mind. They were also each designed completely differently, so an application that uses more than one OpenStack service quickly becomes confusing to write.

Beyond Infra, ansible has a set of modules focused on creating and managing cloud resources. Since Infra already uses ansible to orchestrate puppet, it only makes sense for Infra to also use ansible to manage its cloud resources, which means the logic Infra has learned about how to do that should be applicable there as well. The specifics of using ansible for that purpose are out of scope for this spec, but upstream ansible as a consumer is an important design consideration.

Proposed Change

The shade library will handle all of this. It will contain the logic learned from nodepool and, moving forward, any new complex cloud manipulation logic that nodepool needs. nodepool should be considered shade’s primary user.

To that end, shade must support constructs such as application-based API rate limiting and caching that are appropriate for long-lived connections.
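As a rough illustration of the two constructs above, the following Python sketch spaces out calls to a cloud API and caches their results for a fixed time-to-live. The class RateLimitedCache, its parameters, and the fixed-interval policy are hypothetical assumptions for this example, not shade’s design; the clock and sleep functions are injectable so the behavior can be exercised without real waiting.

```python
# Hypothetical sketch: rate limiting plus TTL caching for a long-lived client.
# Names and policy are illustrative assumptions, not shade's actual API.
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class RateLimitedCache:
    """Space out real API calls and reuse recent results."""

    def __init__(
        self,
        min_interval: float,
        ttl: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval  # minimum seconds between real API calls
        self.ttl = ttl  # seconds a cached result is considered fresh
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None
        self._cache: Dict[Hashable, Tuple[float, Any]] = {}
        self.api_calls = 0  # counts calls that actually reached the cloud

    def call(self, key: Hashable, fn: Callable[[], Any]) -> Any:
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # fresh cache hit: no API traffic at all
        if self._last_call is not None:
            # Rate limit: block until min_interval has elapsed since the last call.
            wait = self.min_interval - (now - self._last_call)
            if wait > 0:
                self._sleep(wait)
        result = fn()
        self.api_calls += 1
        self._last_call = self._clock()
        self._cache[key] = (self._last_call, result)
        return result
```

A long-lived client would likely also need explicit cache invalidation after writes (for example, after creating a server) and per-service limits; both are left out to keep the sketch short.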

A consumer of shade should never need to put in logic such as “if my cloud supports X, then do Y, else Z”. There are two situations in which such logic might arise.

Firstly, there may be two or more ways of performing the same logical action. An example is getting a floating IP, which could be the purview of either neutron or nova. shade should present a single, general create_floating_ip call to the user and hide all details about where the IP came from.
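A minimal Python sketch of hiding that choice: the caller asks for a floating IP and gets back a provider-neutral value. The Cloud and FloatingIP classes are hypothetical, the two backends are stubbed as injected callables, and the response keys (floating_ip_address for neutron, ip for nova) are simplified assumptions about the two APIs’ payloads. Choosing the backend purely from the service catalog is also a simplification; real detection is likely messier.

```python
# Hypothetical sketch: one create_floating_ip, two possible backends.
# Backends are injected callables; payload keys are simplified assumptions.
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable


@dataclass(frozen=True)
class FloatingIP:
    """Provider-neutral result: callers never see a nova or neutron object."""
    id: str
    address: str


class Cloud:
    def __init__(
        self,
        catalog_types: Iterable[str],
        neutron_create: Callable[[], Dict[str, Any]],
        nova_create: Callable[[], Dict[str, Any]],
    ) -> None:
        self._catalog = set(catalog_types)  # service types from the catalog
        self._neutron_create = neutron_create
        self._nova_create = nova_create

    def create_floating_ip(self) -> FloatingIP:
        # Prefer neutron when the cloud advertises a network service;
        # otherwise fall back to nova's floating IP support.
        if "network" in self._catalog:
            raw = self._neutron_create()
            return FloatingIP(id=str(raw["id"]), address=raw["floating_ip_address"])
        raw = self._nova_create()
        return FloatingIP(id=str(raw["id"]), address=raw["ip"])
```

The calling code is identical on both kinds of cloud; only the constructor’s view of the catalog differs.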

Secondly, there is functionality that simply does not exist on some clouds. For example, some clouds are deployed without trove. In that case, the user will receive an error message stating that the selected cloud does not support managing trove resources.
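One way the “does not support” case might look, as a hypothetical Python sketch: each method declares the service it needs, and a missing catalog entry becomes a clear, library-level error rather than a failure deep inside a client library. ServiceNotSupported, Cloud, and list_databases are illustrative names, and “database” is assumed here to be trove’s catalog service type.

```python
# Hypothetical sketch: turn a missing service into a clear, library-level error.
# Class and method names are illustrative; "database" is assumed to be trove's type.
from typing import Iterable, List


class ServiceNotSupported(Exception):
    """Raised when the selected cloud lacks a service."""


class Cloud:
    def __init__(self, name: str, catalog_types: Iterable[str]) -> None:
        self.name = name
        # Service types assumed to come from the keystone service catalog.
        self._catalog = set(catalog_types)

    def _require(self, service_type: str, project: str) -> None:
        if service_type not in self._catalog:
            raise ServiceNotSupported(
                f"Cloud {self.name!r} does not support managing {project} resources"
            )

    def list_databases(self) -> List[str]:
        self._require("database", "trove")
        return []  # placeholder: a real implementation would call trove here
```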

The python-*client libraries are not written with end users in mind. Their primary use case is enabling server-to-server communication, and as such they make a set of assumptions that are not in keeping with a consumer’s point of view. Their use should be replaced by python-openstacksdk once it is ready. It is not ready yet, however, so in the meantime the python-*client libraries need to be used. Because the plan is to replace them, all objects they return and exceptions they raise should be expressly hidden, even though masking exceptions is generally considered poor form.
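A hedged Python sketch of what “expressly hidden” could mean in practice: a decorator translates any exception from an underlying client into a single library-level type, and a helper copies the fields a caller needs out of a client object into a plain dict. CloudError, hide_client_errors, and to_plain_dict are hypothetical names. Chaining with Python’s raise-from syntax keeps the original exception available as __cause__, which softens the usual cost of masking exceptions.

```python
# Hypothetical sketch: hide client exceptions and client objects from callers.
# CloudError, hide_client_errors and to_plain_dict are illustrative names.
import functools
from typing import Any, Callable, Dict, Iterable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


class CloudError(Exception):
    """Library-level exception: the only type callers need to catch."""


def hide_client_errors(fn: F) -> F:
    """Translate any exception from an underlying client into CloudError."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return fn(*args, **kwargs)
        except CloudError:
            raise  # already ours; pass through unchanged
        except Exception as exc:
            # 'from exc' keeps the original client error as __cause__ for debugging.
            raise CloudError(f"{fn.__name__} failed: {exc}") from exc

    return cast(F, wrapper)


def to_plain_dict(obj: Any, fields: Iterable[str]) -> Dict[str, Any]:
    """Copy selected attributes so the client object itself never leaks out."""
    return {name: getattr(obj, name, None) for name in fields}
```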

A future state could be imagined in which shade and python-openstacksdk merge, but that does not seem to be the primary concern of either library at the moment. If it did happen, it would likely take the form of a “simple” API on top of, or alongside, the rest of the SDK. The reason is largely that python-openstacksdk is concerned with providing an SDK for programming against the OpenStack APIs, while shade is concerned with hiding the ways in which deployer choices leak through those APIs. A future in which shade is deprecated is most likely one in which the issues it deals with have been absorbed into the server APIs, at which point a separate layer of business logic would no longer be needed.

Passthrough access to the underlying Client objects is useful for phased adoption of shade. Before 1.0 is released, it should either be removed or hidden behind a warning that can be disabled. This ensures that a user has to explicitly opt in, knowing that those objects are not part of the API.
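The opt-in could be built on Python’s standard warnings module, sketched below under the assumption of a hypothetical PassthroughWarning category and a nova_client property on a hypothetical Cloud class. Every access emits the warning, and a user who knowingly relies on the raw client can silence that one category with warnings.filterwarnings("ignore", category=PassthroughWarning).

```python
# Hypothetical sketch: passthrough access that warns unless explicitly silenced.
# PassthroughWarning and Cloud.nova_client are illustrative names.
import warnings
from typing import Any


class PassthroughWarning(UserWarning):
    """Warning category for reaching past the library's supported API."""


class Cloud:
    def __init__(self, nova_client: Any) -> None:
        self._nova_client = nova_client

    @property
    def nova_client(self) -> Any:
        # Every access warns, so using the raw client is a visible choice.
        warnings.warn(
            "nova_client exposes an underlying client object that is not part "
            "of the API and may be removed before 1.0",
            PassthroughWarning,
            stacklevel=2,
        )
        return self._nova_client
```

Using a dedicated warning category means opting out of this warning does not hide any other warnings.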

ansible is the second user of shade. The main requirement it adds is idempotent operations. The shade API must give the ansible modules enough to provide idempotency without large amounts of repeated logic in the modules themselves. In fact, most ansible modules should contain very little code beyond ansible argument processing and translating results into a suitable format.
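To illustrate the kind of idempotent call that keeps ansible modules thin, here is a Python sketch of an “ensure” function that converges a keypair to a desired state and reports whether anything changed, which is the information an ansible module returns as changed. FakeKeypairAPI stands in for a real cloud, and ensure_keypair and its signature are hypothetical. The sketch also assumes keypairs cannot be updated in place, so a mismatched key is replaced.

```python
# Hypothetical sketch: an idempotent "ensure" call returning (changed, resource).
# FakeKeypairAPI and ensure_keypair are illustrative, not shade's API.
from typing import Dict, Optional, Tuple


class FakeKeypairAPI:
    """In-memory stand-in for a cloud's keypair calls."""

    def __init__(self) -> None:
        self.keypairs: Dict[str, str] = {}  # name -> public key
        self.create_calls = 0

    def get_keypair(self, name: str) -> Optional[Dict[str, str]]:
        key = self.keypairs.get(name)
        return None if key is None else {"name": name, "public_key": key}

    def create_keypair(self, name: str, public_key: str) -> Dict[str, str]:
        self.create_calls += 1
        self.keypairs[name] = public_key
        return {"name": name, "public_key": public_key}

    def delete_keypair(self, name: str) -> None:
        del self.keypairs[name]


def ensure_keypair(
    cloud: FakeKeypairAPI,
    name: str,
    public_key: Optional[str] = None,
    state: str = "present",
) -> Tuple[bool, Optional[Dict[str, str]]]:
    """Converge on the desired state; return (changed, keypair)."""
    existing = cloud.get_keypair(name)
    if state == "absent":
        if existing is None:
            return False, None  # nothing to delete
        cloud.delete_keypair(name)
        return True, None
    if public_key is None:
        raise ValueError("public_key is required when state is 'present'")
    if existing is not None and existing["public_key"] == public_key:
        return False, existing  # already correct: do nothing
    if existing is not None:
        # Assumed: keypairs cannot be edited in place, so replace the mismatch.
        cloud.delete_keypair(name)
    return True, cloud.create_keypair(name, public_key)
```

An ansible module built on such a call would mostly just map its arguments onto name, public_key, and state and hand the (changed, keypair) result back to ansible.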

Finally, it is not shade’s purpose in life to express what is or is not OpenStack, nor to be involved in such categorizations. Its job is to improve the end user’s experience. For that reason, shade should take a maximal approach to supporting services. If someone wants to add support for designate or magnum or manila or whatever, that’s awesome.

It is a conscious and active decision not to use a plugin interface for this. Again, because shade exists to reduce the cognitive burden on the user, the user should not have to know to install plugins to be able to use their cloud. The two main reasons for pluggable clients in the past have been:

Strict policies on what is ‘Integrated’
To enable proprietary extensions

The first is no longer a problem for OpenStack broadly, and even if it were, it would still not be a practical issue for an Infra project.

The second is the thing that will ultimately cause OpenStack to die if it is allowed to continue. While people’s right to choose to destroy all the goodness in the world is an important one for them to have, there is no need for Infra to involve itself in such a tragedy.

Anything that’s in shade needs to be testable by running shade functional tests against a devstack in the Infra gates.

There is currently one exception to the rule that everything must be testable in the Infra gates: the Rackspace Task API for Glance does not work in devstack, so we cannot test it. We have made this exception because nodepool currently must use that API, and it is an API that exists in glance, even if the backing code is broken. However, the general rule stands, and any violation of it needs to be a carefully considered exception, probably accompanied by a large amount of complaining.

Alternatives

We could skip writing a library and put all of our logic directly in nodepool. This is problematic because it would make a lot of really useful code and logic hard for the community at large to reuse.

We could write all of the logic directly in the upstream ansible modules and then turn nodepool into an engine that consumes those modules. This is more tempting, but ansible does not support long-lived objects, which means we would be exec’ing ansible on every operation, which seems rather extreme. It would also mean that people not using ansible could not benefit from the logic.

We could improve the client libraries or python-openstacksdk instead. We’ve tried to include richer logic in the client libraries and have been told that is not what they are for. python-openstacksdk is still young, and we’ve been told it is not ready for production use yet. We need some of this logic now, so the timescale for getting it done in python-openstacksdk isn’t very workable.

Implementation

Assignee(s)

Primary assignee: mordred
Additional assignee(s): Shrews, greghaynes, dguerri, TheJulia, Spamaps

Gerrit Topic

shade is a library itself, so there is no dedicated gerrit topic.

Work Items

Implement Image uploading for nodepool
Get to feature parity with nodepool on floating-ips and server creation
Implement ansible modules for every function in shade

Repositories

openstack-infra/shade

Servers

None

DNS Entries

None

Documentation

shade needs developer documentation of its API

Security

None

Testing

shade should have both unit tests and functional tests. The functional tests should run against devstack VMs. If a developer chooses to, they should be able to manually run functional tests against live clouds, since the purpose of shade is to enable use of myriad clouds, not to support or expose theoretical APIs.

Dependencies

None