Copyright 2019 Red Hat Inc.

This work is licensed under a Creative Commons Attribution 3.0
Unported License.
http://creativecommons.org/licenses/by/3.0/legalcode

Retire static.openstack.org

Include the URL of your StoryBoard story:

https://storyboard.openstack.org/#!/story/2006598

Move the services provided by static.openstack.org into less centralised approaches more consistent with modern deployment trends.

Problem Description

The static.openstack.org host is a monolithic server providing various hosting services via a large amount of volume-attached storage.

The immediate problem is that it is currently running Ubuntu Trusty, which is reaching the end of its supported life.

The secondary problems are twofold:

Firstly, we would like to move the various publishing and hosting operations from centralised volumes on a single server to our AFS distributed file-system.

Secondly, we would like to make the hosting portion more OpenDev compatible; this means avoiding working on legacy deployment methods (i.e. puppet) and integrating with our general idea of a “whitebox” service that can be used by many different projects.

Thus we propose breaking up the services it offers to utilise more modern infrastructure alternatives, and then retiring the host.

Proposed Change

We can break the services down as follows:

Log storage: Legacy log storage (~14TB)
Redirects: Apache service redirects a number of legacy URLs to new locations
Static site serving: 100GB attached partition holding various static sites (i.e. plain HTML publishing, no middleware)
Tarball: 512GB partition which holds and publishes release tarballs for all projects.

Alternatives

apt-get dist-upgrade the host to a more recent distribution, fix any puppet issues and ignore it until the next time it needs updating.

Implementation

Assignee(s)

Primary assignee: TBD

Gerrit Topic

Use Gerrit topic “static-services” for all patches related to this spec.

git-review -t static-services

Work Items

Log storage

OpenDev CI logs have been moved to various object-storage backends provided by donors. The existing logs will age out per our existing old-log cleanup jobs.

Since logs were always ephemeral there should be no issues with old links. For clarity we will remove (rather than redirect) the logs.openstack.org DNS entry so there is no confusion that logs might still live there.

Workitems:

remove logs.openstack.org DNS entries after old log entries have cleared out

Legacy redirects

The following do straight redirects from their config hostnames to docs.openstack.org

50-cinder.openstack.org.conf
50-devstack.org.conf
50-glance.openstack.org.conf
50-horizon.openstack.org.conf
50-keystone.openstack.org.conf
50-nova.openstack.org.conf
50-swift.openstack.org.conf

The following have slightly different semantics

50-ci.openstack.org.conf
- /nodepool, /shade, /zuul, etc all to docs; see https://opendev.org/opendev/system-config/src/branch/master/modules/openstack_project/templates/ci.vhost.erb
50-qa.openstack.org.conf
- currently redirects to broken link https://docs.openstack.org/developer/qa

The following redirects to openstack.org

50-summit.openstack.org.conf

Clearly there is a need for a generic ability to redirect various URLs as things change over time.

We will use a single containerised haproxy instance to handle redirects for the OpenDev project. Although initially it will simply be handling 302 redirects, future services may also use it for its availability or load-balancing features. Note that the gitea services also have their own load-balancer; although it reuses all the deployment mechanisms, that production service is kept separate to maintain isolation between what is probably the most important service (code) and more informational services.
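The redirect behaviour for the hosts listed above amounts to a host-to-target lookup. The following is a minimal Python sketch of that logic only, not haproxy configuration. The `REDIRECTS` table and the `redirect_for` helper are hypothetical names, and the targets are assumed to be the site roots; the real per-project paths live in the existing vhost configuration.

```python
from typing import Optional, Tuple

# Hypothetical table: legacy hostname -> redirect target.
# Targets are assumed to be site roots for illustration only.
DOCS = "https://docs.openstack.org/"
REDIRECTS = {
    "cinder.openstack.org": DOCS,
    "devstack.org": DOCS,
    "glance.openstack.org": DOCS,
    "horizon.openstack.org": DOCS,
    "keystone.openstack.org": DOCS,
    "nova.openstack.org": DOCS,
    "swift.openstack.org": DOCS,
    "summit.openstack.org": "https://openstack.org/",
}


def redirect_for(host_header: str) -> Optional[Tuple[int, str]]:
    """Return (status, Location) for a legacy host, or None if unknown."""
    # Hostnames are case-insensitive and the Host header may carry a port.
    host = host_header.strip().lower().split(":", 1)[0]
    target = REDIRECTS.get(host)
    if target is None:
        return None
    return 302, target
```

Hosts with more involved semantics, such as ci.openstack.org with its per-path rules, would need a path-aware lookup rather than this flat table.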

Proof-of-concept reviews are provided at:

https://review.opendev.org/677903 : make haproxy role more generic

https://review.opendev.org/678159 : add a service load balancer

The work items consist of:

approval of the above reviews

starting the production host

iterating the extant DNS records and pointing them to the new load-balancer

OpenDev infrastructure migration

We wish to provide new services only using our latest deployment methods, to avoid introducing even more legacy services and to provide a basis for the migration process to OpenDev services.

Although files02.openstack.org has an existing role as a webserver serving content from the /openstack.org AFS mount, it is configured using legacy puppet. Thus a new server will be provisioned using our Ansible environment, rather than adding more hosts to legacy configuration.

This server should be a “whitebox” server that is capable of serving a range of domains that OpenDev would like to serve. However, its role will only be to serve static directories on AFS volumes. After this process, there will be numerous examples of SSL certificate generation, vhost configuration, AFS volume setup and publishing jobs for any other projects to copy and implement.

Initially this server needs to serve https sites for the replacement services; namely

governance.openstack.org

specs.openstack.org

security.openstack.org

service-types.openstack.org

releases.openstack.org

tarballs.openstack.org

Currently, SSL certificates are manually provisioned and entered into puppet secret data, from where they are deployed to the host. We wish to use automatically renewing letsencrypt certificates per our other infrastructure, utilising our DNS based authentication. However, since openstack.org remains administered by external teams in RAX’s proprietary environment, we will make an exception and set up DNS validation records manually for these legacy sites until a full migration of openstack.org to OpenDev infrastructure is possible. Other domains will use OpenDev nameservers, which support automated DNS validation renewals.

We will have the new server provisioned and ready before we begin the steps of migrating publishing locations. This means we can debug any setup issues outside production, and effects a zero-downtime cutover when the sites are ready.

Workitems are as follows:

Write roles and tests to provision a new static01.opendev.org server which will be limited to running Apache and serving AFS directories.
Create the server
Create CNAME static.opendev.org which will be the main service hostname, to provide for easier server replacement or other updates in the future.
Pre-provision https certificates for the above listed services
- Using the RAX web interface for name services and the openstack infra permissions, set up _acme-challenge.<service>.openstack.org records as a CNAME to acme.opendev.org.
- Each site should have a separate certificate provisioned. The configuration would be something like
```
letsencrypt_certs:
  governance-openstack-org:
    - governance.openstack.org
  specs-openstack-org:
    - specs.openstack.org
  # ... and so on, one entry per site
```
- Debug any failures; however the theory is (taking one example): the existing letsencrypt roles should request a certificate for governance.openstack.org on static01.opendev.org and receive the authentication key, which is placed in a TXT record in acme.opendev.org. The certificate creation will trigger a lookup of _acme-challenge.governance.openstack.org, which will be a CNAME to acme.opendev.org, which contains the correct TXT record. The certificate is then issued on static01.opendev.org.
Preconfigure the vhost configuration for the above sites (using prior provisioned keys for SSL)
Confirm correct operation of the sites with dummy content.
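The CNAME delegation used for certificate validation can be illustrated with a small simulation. This is a sketch under stated assumptions, not the letsencrypt roles themselves: the `ZONES` dictionary is a hypothetical in-memory stand-in for the openstack.org and opendev.org zones, and the key authorization string is a made-up placeholder. The TXT value derivation follows the dns-01 challenge in RFC 8555, section 8.4.

```python
import base64
import hashlib

# Hypothetical in-memory DNS data: name -> (record type, value).
ZONES = {
    # Created by hand in the openstack.org zone.
    "_acme-challenge.governance.openstack.org": ("CNAME", "acme.opendev.org"),
}


def dns01_txt_value(key_authorization: str) -> str:
    """Unpadded base64url of the SHA-256 digest of the key authorization."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def lookup_txt(name: str, zones: dict, max_hops: int = 5) -> str:
    """Follow CNAMEs from `name` until a TXT record is found."""
    for _ in range(max_hops):
        rtype, value = zones[name]  # KeyError if the name does not exist
        if rtype == "TXT":
            return value
        name = value  # CNAME: continue at the target name
    raise LookupError("too many CNAME hops")


# The automated side publishes the TXT record in the opendev.org zone.
# "example-token.example-thumbprint" is a placeholder, not a real value.
expected = dns01_txt_value("example-token.example-thumbprint")
ZONES["acme.opendev.org"] = ("TXT", expected)

# The validating side starts at the openstack.org name and, via the
# CNAME, arrives at the same TXT value.
found = lookup_txt("_acme-challenge.governance.openstack.org", ZONES)
```

The point of the design is that only the one-off CNAME needs manual work in the externally administered zone; every renewal afterwards touches only the opendev.org record.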

Static hosting

A number of jobs publish directly to /srv/static on the server. These are then served by Apache as static websites.

In general, we want these jobs to publish to our AFS volumes. By publishing to AFS we remove the central point of failure of a single server and its attached disks (mitigated by multiple AFS servers and replicas).

The AFS volumes are then served by static01.opendev.org which has a dedicated role as an AFS to HTTP bridge.

The sites in question are:

50-governance.openstack.org.conf
- https://governance.openstack.org
- main source -> https://opendev.org/openstack/governance-website
- published via https://opendev.org/openstack/project-config/src/branch/master/zuul.d/projects.yaml#L2298
- aliases /srv/static/<election|sigs|tc|uc>
50-security.openstack.org.conf
- https://security.openstack.org
- single repo source -> https://opendev.org/openstack/ossa
- deployed by publish-security job -> https://opendev.org/openstack/project-config/src/branch/master/zuul.d/jobs.yaml#L739
50-service-types.openstack.org.conf
- https://service-types.openstack.org
- single repo -> https://opendev.org/openstack/service-types-authority
- https://opendev.org/openstack/project-config/src/branch/master/zuul.d/jobs.yaml#L551
50-specs.openstack.org.conf
- https://specs.openstack.org
- various spec repos; published by openstack-spec-jobs to subdirectories
50-releases.openstack.org.conf
- https://releases.openstack.org
- generated by -> https://opendev.org/openstack/releases/
- note: generates .htaccess with constraints links, used widely in pip
- publish-tox-jobs-static : https://opendev.org/openstack/project-config/src/branch/master/zuul.d/jobs.yaml#L685
50-tarballs.openstack.org.conf
- https://tarballs.openstack.org
- every project’s release jobs

The extant AFS layout has volumes for each project. Thus we will continue this theme and an admin will create one volume for each of the above static sites; e.g.

/afs/openstack.org/project/governance.openstack.org (~200MB)
/afs/openstack.org/project/security.openstack.org (100MB)
/afs/openstack.org/project/service-types.openstack.org (520KB)
/afs/openstack.org/project/specs.openstack.org (currently 706MB)
/afs/openstack.org/project/releases.openstack.org (currently 57MB)
/afs/openstack.org/project/tarballs.openstack.org (currently 134GB)
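The one-volume-per-site layout gives each hostname a predictable document root. A minimal Python sketch of that mapping follows; `AFS_PROJECT_ROOT`, `SITES`, `volume_path` and `DOCROOTS` are illustrative names, not part of any existing role.

```python
# Base path taken from the volume list above.
AFS_PROJECT_ROOT = "/afs/openstack.org/project"

# The six sites this spec proposes to migrate.
SITES = [
    "governance.openstack.org",
    "security.openstack.org",
    "service-types.openstack.org",
    "specs.openstack.org",
    "releases.openstack.org",
    "tarballs.openstack.org",
]


def volume_path(site: str) -> str:
    """Return the AFS path that backs `site`; reject unknown sites."""
    if site not in SITES:
        raise ValueError(f"no AFS volume planned for {site!r}")
    return f"{AFS_PROJECT_ROOT}/{site}"


# Hostname -> document root, as a vhost template might consume it.
DOCROOTS = {site: volume_path(site) for site in SITES}
```

Keeping the volume name equal to the hostname means a new site needs only one new list entry.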

The work items are as follows

Create the volumes for each site as described above
Migrate the extant data to the new volumes. It is impractical to recreate all the sites from scratch, as that would require triggering jobs on many repos that are often updated infrequently.
Publishing jobs will be updated to use AFS publishing to these new locations. During the transition period, we can publish to both locations.
Update the site configuration on static01.opendev.org to serve the site from the new location
We should be able to fully test the new sites at this point with manual host entries. Ensure:
- https certificates working correctly
- old links remain consistent
For each site, move to production by updating the CNAME entries in the openstack.org domain for the main server to point to static.opendev.org (note, not the server directly, i.e. static01.opendev.org, to give us flexibility in managing the backend service with server replacements or load-balancing in the future). Per prior testing, this should be transparent.
Remove the old publishing jobs
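One way to gain confidence in the data migration step, and that old links will remain consistent, is to compare the old tree with the new one before cutover. The following Python sketch is an assumption about how such a check could look, not an existing tool; `manifest` and `compare_trees` are hypothetical helpers, and the roots passed in would be the old /srv/static directory and the matching AFS volume.

```python
import hashlib
import os


def manifest(root: str) -> dict:
    """Map each file path relative to `root` to its SHA-256 hex digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sha = hashlib.sha256()
            with open(full, "rb") as handle:
                # Read in chunks so large tarballs do not fill memory.
                for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                    sha.update(chunk)
            result[os.path.relpath(full, root)] = sha.hexdigest()
    return result


def compare_trees(old_root: str, new_root: str) -> dict:
    """Report files missing from, changed in, or extra in the new tree."""
    old, new = manifest(old_root), manifest(new_root)
    return {
        "missing": sorted(set(old) - set(new)),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
        "extra": sorted(set(new) - set(old)),
    }
```

An empty "missing" list suggests every previously published path still exists; during the dual-publishing period "extra" and "changed" entries are expected as jobs continue to run.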

Repositories

Unlikely to require new repositories

Servers

A new HTTP server for serving AFS content
A load-balancer server is suggested to host the haproxy container

DNS Entries

Quite a few DNS entries will need to be updated as described

Documentation

Developers should largely not care where the results are published.

Small doc updates for any new services.

A guide to setting up jobs, host configuration, etc. for publishing static data for other projects may be useful.

Security

N/A

Testing

Since all updates are replacements, we can confirm that the new sites are operational before putting them into production. Any DNS switches can be essentially zero impact.

Dependencies

N/A at this time