As this is a complex topic, it will be easier to break the problems down into
their constituent parts, based on which part of TripleO they affect:
Problem #1: TripleO uses DHCP/PXE on the Undercloud provisioning network (ctlplane).
Neutron on the Undercloud does not yet support DHCP relay or multiple L2
subnets, since it serves DHCP/PXE directly on the provisioning network.
Possible Solutions, Ideas, or Approaches:
Modify Ironic and/or Neutron to support multiple DHCP ranges in the dnsmasq
configuration, and use a DHCP relay running on the top-of-rack switches, which
receives DHCP requests and forwards them to dnsmasq on the Undercloud.
There is a patch in progress to support that.
Modify Neutron to support DHCP relay. There is a patch in progress to
support that.
Currently, if one adds a subnet to a network, the Neutron DHCP agent will pick
up the changes and configure separate subnets correctly in dnsmasq. For instance,
after adding a second subnet to the ctlplane network, here is the resulting
startup command for Neutron’s instance of dnsmasq:
  dnsmasq --no-hosts --no-resolv --strict-order --except-interface=lo \
    --pid-file=/var/lib/neutron/dhcp/aae53442-204e-4c8e-8a84-55baaeb496cf/pid \
    --dhcp-hostsfile=/var/lib/neutron/dhcp/aae53442-204e-4c8e-8a84-55baaeb496cf/host \
    --addn-hosts=/var/lib/neutron/dhcp/aae53442-204e-4c8e-8a84-55baaeb496cf/addn_hosts \
    --dhcp-optsfile=/var/lib/neutron/dhcp/aae53442-204e-4c8e-8a84-55baaeb496cf/opts \
    --dhcp-leasefile=/var/lib/neutron/dhcp/aae53442-204e-4c8e-8a84-55baaeb496cf/leases \
    --dhcp-match=set:ipxe,175 --bind-interfaces --interface=tap4ccef953-e0 \
    --dhcp-range=set:tag0,172.19.0.0,static,86400s \
    --dhcp-range=set:tag1,172.20.0.0,static,86400s \
    --dhcp-option-force=option:mtu,1500 --dhcp-lease-max=512 \
    --conf-file=/etc/dnsmasq-ironic.conf --domain=openstacklocal
The router information is written to the dhcp-optsfile. Here are the contents
of /var/lib/neutron/dhcp/aae53442-204e-4c8e-8a84-55baaeb496cf/opts:
  tag:tag0,option:classless-static-route,172.20.0.0/24,0.0.0.0,0.0.0.0/0,172.19.0.254
  tag:tag0,249,172.20.0.0/24,0.0.0.0,0.0.0.0/0,172.19.0.254
  tag:tag0,option:router,172.19.0.254
  tag:tag1,option:classless-static-route,169.254.169.254/32,172.20.0.1,172.19.0.0/24,0.0.0.0,0.0.0.0/0,172.20.0.254
  tag:tag1,249,169.254.169.254/32,172.20.0.1,172.19.0.0/24,0.0.0.0,0.0.0.0/0,172.20.0.254
  tag:tag1,option:router,172.20.0.254
The above options file will result in separate routers being handed out to
separate IP subnets. Furthermore, Neutron appears to “do the right thing” with
regard to routes for other subnets on the same network. We can see that the
option “classless-static-route” is given, with pointers to both the default
route and the other subnet(s) on the same Neutron network.
In order to modify Ironic-Inspector to use multiple subnets, we will need to
extend instack-undercloud to support network segments. There is a patch in
review to support segments in instack-undercloud.
Potential Workaround
One possibility is to use an alternate method of DHCP/PXE booting, such as
configuring DHCP directly on the router, or configuring a host on the remote
network which provides DHCP and PXE URLs, and then provides routes back to the
ironic-conductor and metadata server as part of the DHCP response.
It is not always feasible for groups doing testing or development to configure
DHCP relay on the switches. For proof-of-concept implementations of
spine-and-leaf, we may want to configure all provisioning networks to be
trunked back to the Undercloud. This would allow the Undercloud to provide DHCP
for all networks without special switch configuration. In this case, the
Undercloud would act as a router between subnets/VLANs. This should be
considered a small-scale solution, as this is not as scalable as DHCP relay.
The configuration file for dnsmasq is the same whether all subnets are local or
remote, but dnsmasq may have to listen on multiple interfaces (today it only
listens on br-ctlplane). The dnsmasq process currently runs with
--bind-interfaces --interface=tap-XXX, but the process will need to be run
either bound to multiple interfaces, or with --except-interface=lo and multiple
interfaces bound to the namespace.
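As a rough sketch (not the exact command Neutron would generate), a dnsmasq
instance serving two trunked provisioning VLANs from within the DHCP namespace
might be started with options along these lines, where the interface names and
ranges are assumptions for illustration:

  # Illustrative only: bind to two VLAN interfaces in the namespace instead of
  # a single tap interface.
  dnsmasq --no-hosts --no-resolv --strict-order --except-interface=lo \
    --interface=vlan101 --interface=vlan102 --bind-interfaces \
    --dhcp-range=set:leaf1,172.19.0.0,static,86400s \
    --dhcp-range=set:leaf2,172.20.0.0,static,86400s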
For proof-of-concept deployments, as well as testing environments, it might
make sense to run a DHCP relay on the Undercloud, and trunk all provisioning
VLANs back to the Undercloud. This would allow dnsmasq to listen on the tap
interface, and DHCP requests would be forwarded to the tap interface. The
downside of this approach is that the Undercloud would need to have IP
addresses on each of the trunked interfaces.
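For example, a minimal sketch using the ISC dhcrelay agent on the Undercloud,
where the VLAN interface names and the dnsmasq address are assumptions for
illustration (depending on the dhcrelay version, the server-facing interface
may also need to be listed):

  # Forward DHCP requests arriving on the trunked provisioning VLANs to the
  # dnsmasq instance serving the ctlplane network (address assumed).
  dhcrelay -4 -i vlan101 -i vlan102 -i br-ctlplane 172.19.0.1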
Another option is to configure dedicated hosts or VMs to be used as DHCP relay
and router for subnets on multiple VLANs, all of which would be trunked to the
relay/router host, thus acting exactly like routing switches.
Problem #2: Neutron’s model for a segmented network that spans multiple L2
domains uses the segment object to allow multiple subnets to be assigned to
the same network. This functionality needs to be integrated into the
Undercloud.
Possible Solutions, Ideas, or Approaches:
Implement Neutron segments on the undercloud.
The spec for Neutron routed network segments provides a schema that we can
use to model a routed network. By implementing support for network segments, we
can assign Ironic nodes to networks on routed subnets. This allows us
to continue to use Neutron for IP address management, as ports are assigned by
Neutron and tracked in the Neutron database on the Undercloud. See approach #1
below.
Multiple Neutron networks (1 set per rack), to model all L2 segments.
Using a different set of networks in each rack provides us with
the flexibility to use different network architectures on a per-rack basis.
Each rack could have its own set of networks, and we would no longer have
to provide all networks in all racks. Additionally, a split-datacenter
architecture would naturally have a different set of networks in each
site, so this approach makes sense. This is detailed in approach #2 below.
Multiple subnets per Neutron network.
This is probably the best approach for provisioning, since Neutron is
already able to handle DHCP relay with multiple subnets as part of the
same network. Additionally, this allows a clean separation between local
subnets associated with provisioning, and networks which are used
in the overcloud (such as External networks in two different datacenters).
This is covered in more detail in approach #3 below.
Use another system for IPAM, instead of Neutron.
Although we could use a database, flat file, or some other method to keep
track of IP addresses, Neutron as an IPAM back-end provides many integration
benefits. Neutron offers DHCP, hardware switch port configuration (through
the use of plugins), integration with Ironic, and other features such as
IPv6 support. This approach has been deemed infeasible due to the level of
effort required to replace both Neutron and the Neutron DHCP server (dnsmasq).
Approaches to Problem #2:
Approach 1 (Implement Neutron segments on the Undercloud):
The Neutron segments model provides a schema in Neutron that allows us to
model the routed network. Using multiple subnets provides the flexibility
we need without creating exponentially more resources. We would create the same
provisioning network that we do today, but use multiple segments associated
to different routed subnets. The disadvantage to this approach is that it makes
it impossible to represent network VLANs with more than one IP subnet (Neutron
technically supports more than one subnet per port). Currently TripleO only
supports a single subnet per isolated network, so this should not be an issue.
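For illustration only, creating an additional segment and an associated subnet
on the existing provisioning network might look something like the following;
the segment name, physical network, and address ranges are assumptions rather
than values defined by this spec:

  # Add a segment to the existing ctlplane network
  openstack network segment create --network ctlplane \
    --network-type flat --physical-network leaf1 ctlplane-leaf1

  # Add a subnet bound to that segment; Neutron continues to handle IPAM
  openstack subnet create --network ctlplane \
    --network-segment ctlplane-leaf1 \
    --subnet-range 172.20.0.0/24 --gateway 172.20.0.254 \
    --allocation-pool start=172.20.0.10,end=172.20.0.200 \
    ctlplane-leaf1-subnet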
Approach 2 (Multiple Neutron networks (1 set per rack), to model all L2 segments):
We will be using multiple networks to represent isolated networks in multiple
L2 domains. One sticking point is that although Neutron will configure multiple
routes for multiple subnets within a given network, we need to be able to both
configure static IPs and routes, and be able to scale the network by adding
additional subnets after initial deployment.
Since we control addresses and routes on the host nodes using a
combination of Heat templates and os-net-config, it is possible to use
static routes to supernets to provide L2 adjacency. This approach only
works for non-provisioning networks, since we rely on Neutron DHCP servers
providing routes to adjacent subnets for the provisioning network.
Example:
Suppose 2 subnets are provided for the Internal API network: 172.19.1.0/24
and 172.19.2.0/24. We want all Internal API traffic to traverse the Internal
API VLANs on both the controller and a remote compute node. The Internal API
network uses different VLANs for the two nodes, so we need the routes on the
hosts to point toward the Internal API gateway instead of the default gateway.
This can be provided by a supernet route to 172.19.x.x pointing to the local
gateway on each subnet (e.g. 172.19.1.1 and 172.19.2.1 on the respective
subnets). This could be represented in os-net-config with the following:
  - type: interface
    name: nic3
    addresses:
      - ip_netmask: {get_param: InternalApiIpSubnet}
    routes:
      - ip_netmask: {get_param: InternalApiSupernet}
        next_hop: {get_param: InternalApiRouter}

Where InternalApiIpSubnet is the IP address on the local subnet,
InternalApiSupernet is ‘172.19.0.0/16’, and InternalApiRouter is either
172.19.1.1 or 172.19.2.1 depending on which local subnet the host belongs to.
The end result of this is that each host has a set of IP addresses and routes
that isolate traffic by function. In order for the return traffic to also be
isolated by function, similar routes must exist on both hosts, pointing to the
local gateway on the local subnet for the larger supernet that contains all
Internal API subnets.
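As an illustration, the resulting supernet routes on the two hosts might look
roughly like the following (interface names are assumptions):

  # Controller, local subnet 172.19.1.0/24
  172.19.0.0/16 via 172.19.1.1 dev vlan201

  # Remote compute node, local subnet 172.19.2.0/24
  172.19.0.0/16 via 172.19.2.1 dev vlan201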
The downside of this is that proper supernetting is required, and this may
lead to larger blocks of IP addresses being used to provide ample space for
future growth. For instance, in the example above an entire /16 network is set
aside for up to 256 local subnets for the Internal API network. This could be
reduced to a more reasonable space, such as a /18, if the number of local
subnets will not exceed 64, etc. This will be less of an issue with native IPv6
than with IPv4, where scarcity is much more likely.
Approach 3 (Multiple subnets per Neutron network):
The approach we will use for the provisioning network will be to use multiple
subnets per network, using Neutron segments. This will allow us to take
advantage of Neutron's support for multiple subnets on a single network in
combination with DHCP relay. The DHCP server will supply the necessary routes
via DHCP until the nodes are configured with a static IP post-deployment.
Problem #3: Ironic introspection DHCP doesn’t yet support DHCP relay
This makes it difficult to do introspection when the hosts are not on the same L2
domain as the controllers. Patches are either merged or in review to support
DHCP relay.
Possible Solutions, Ideas, or Approaches:
A patch to support a dnsmasq PXE filter driver has been merged. This will
allow us to support selective DHCP when using DHCP relay (where the packet
is not coming from the MAC of the host but rather the MAC of the switch).
A patch has been merged to puppet-ironic to support multiple DHCP subnets
for Ironic Inspector.
A patch is in review to add support for multiple subnets for the
provisioning network in the instack-undercloud scripts.
For more information about solutions, please refer to the
tripleo-routed-networks-ironic-inspector blueprint and spec.
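As a rough sketch, the dnsmasq configuration generated for ironic-inspector
could then carry one DHCP range per provisioning subnet, with dnsmasq selecting
the range that matches the relay (GIADDR) address of the forwarded request; the
ranges below are illustrative only:

  # Illustrative ironic-inspector dnsmasq configuration
  port=0
  interface=br-ctlplane
  dhcp-range=172.19.0.100,172.19.0.120,255.255.255.0,10m
  dhcp-range=172.20.0.100,172.20.0.120,255.255.255.0,10m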
Problem #4: The IP addresses on the provisioning network need to be
static IPs for production.
Possible Solutions, Ideas, or Approaches:
Dan Prince wrote a patch in Newton to convert the ctlplane network
addresses to static addresses post-deployment. This will need to be
refactored to support multiple provisioning subnets across routers.
Solution Implementation
This work is done and merged for the legacy use case. During the
initial deployment, the nodes receive their IP address via DHCP, but during
Heat deployment the os-net-config script is called, which writes static
configuration files for the NICs with static IPs.
This work will need to be refactored to support assigning IPs from the
appropriate subnet, but that work will be part of the TripleO Heat Template
refactoring described in Problems #6 and #7 below.
For the deployment model where the IPs are specified (ips-from-pool-all.yaml),
we need to develop a model where the Control Plane IP can be specified
on multiple deployment subnets. This may happen in a later cycle than the
initial work being done to enable routed networks in TripleO. For more
information, reference the tripleo-predictable-ctlplane-ips blueprint
and spec.
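A minimal sketch of what such a model might look like in an environment file,
assuming a ctlplane entry is added alongside the existing per-network
predictable IPs (the exact parameter layout is not defined by this spec):

  parameter_defaults:
    ControllerIPs:
      ctlplane:
        - 172.19.0.10   # controller-0, leaf 0 subnet
    ComputeIPs:
      ctlplane:
        - 172.20.0.10   # compute-0, leaf 1 subnet
        - 172.21.0.10   # compute-1, leaf 2 subnet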
Problem #5: Heat Support For Routed Networks
The Neutron routed networks extensions were only added in recent releases, and
the TripleO Heat Templates have a dependency on Heat support for them.
Possible Solutions, Ideas or Approaches:
Add the required objects to Heat. At a minimum, we will probably have to
add OS::Neutron::Segment, which represents layer 2 segments; the
OS::Neutron::Network resource will be updated to support the l2-adjacency
attribute; and OS::Neutron::Subnet and OS::Neutron::Port would be extended
to support the segment_id attribute.
Solution Implementation:
Heat now supports the OS::Neutron::Segment resource. For example:
  heat_template_version: 2015-04-30
  ...
  resources:
    ...
    the_resource:
      type: OS::Neutron::Segment
      properties:
        description: String
        name: String
        network: String
        network_type: String
        physical_network: String
        segmentation_id: Integer

This work has been completed in Heat with this review.
Problem #6: Static IP assignment: Choosing static IPs from the correct
subnet
Some roles, such as Compute, can likely be placed in any subnet, but we will
need to keep certain roles co-located within the same set of L2 domains. For
instance, whatever role is providing Neutron services will need all controllers
in the same L2 domain for VRRP to work properly.
The network interfaces will be configured using templates that create
configuration files for os-net-config. The IP addresses that are written to each
node’s configuration will need to be on the correct subnet for each host. In
order for Heat to assign ports from the correct subnets, we will need to have a
host-to-subnets mapping.
Possible Solutions, Ideas or Approaches:
1. The simplest implementation of this would probably be a mapping of role/index
   to a set of subnets, so that it is known to Heat that Controller-1 is in
   subnet set X and Compute-3 is in subnet set Y (see the illustrative sketch
   after this list).
2. We could associate particular subnets with roles, and then use one role
   per L2 domain (such as per-rack).
3. The roles and templates should be refactored to allow for dynamic IP
   assignment within subnets associated with the role. We may wish to evaluate
   the possibility of storing the routed subnets in Neutron using the routed
   networks extensions that are still under development. This would provide
   additional flexibility, but is probably not required to implement separate
   subnets in each rack.
4. A scalable long-term solution is to map which subnet the host is on
   during introspection. If we can identify the correct subnet for each
   interface, then we can correlate that with IP addresses from the correct
   allocation pool. This would have the advantage of not requiring a static
   mapping of role to node to subnet. In order to do this, additional
   integration would be required between Ironic and Neutron (to make Ironic
   aware of multiple subnets per network, and to add the ability to make
   that association during introspection).
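A purely hypothetical sketch of such a role/index-to-subnet mapping follows;
none of these parameter names exist in TripleO today, and the snippet only
illustrates the shape of the data Heat would need:

  # Hypothetical host-to-subnet mapping consumed by the Heat templates
  role_subnet_map:
    Controller:
      0: leaf0    # all controllers share one L2 domain for VRRP
      1: leaf0
      2: leaf0
    Compute:
      0: leaf1
      3: leaf2    # Compute-3 is in subnet set leaf2
  subnet_sets:
    leaf0: {internal_api: 172.19.1.0/24, tenant: 172.16.1.0/24}
    leaf1: {internal_api: 172.19.2.0/24, tenant: 172.16.2.0/24}
    leaf2: {internal_api: 172.19.3.0/24, tenant: 172.16.3.0/24}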
Solution Implementation:
Solutions 1 and 2 above have been implemented in the “composable roles” series
of patches. The initial implementation uses separate Neutron networks
for different L2 domains. These templates are responsible for assigning the
isolated VLANs used for the data plane and overcloud control planes, but do not
address the provisioning network. Future work may refactor the non-provisioning
networks to use segments, but for now non-provisioning networks must use
different networks for different roles.
Ironic autodiscovery may allow us to determine the subnet where each node
is located without manual entry. More work is required to automate this
process.
Problem #7: Isolated Networking Requires Static Routes to Ensure Correct VLAN
is Used
In order to continue using the Isolated Networks model, routes will need to be
in place on each node, to steer traffic to the correct VLAN interfaces. The
routes are written when os-net-config first runs, but may change. We
can’t just rely on the specific routes to other subnets, since the number of
subnets will increase or decrease as racks are added or taken away. Rather than
try to deal with constantly changing routes, we should use static routes that
will not need to change, to avoid disruption on a running system.
Possible Solutions, Ideas or Approaches:
Require that supernets are used for various network groups. For instance,
all the Internal API subnets would be part of a supernet such as
172.17.0.0/16, broken up into many smaller subnets, such as /24s. This would
simplify the routes, since only a single route for 172.17.0.0/16 would be
required, pointing to the local router on the 172.17.x.0/24 network.
Modify os-net-config so that routes can be updated without bouncing
interfaces, and then run os-net-config on all nodes when scaling occurs.
A review for this functionality was considered and abandoned.
The patch was determined to have the potential to lead to instability.
os-net-config configures static routes for each interface. If we can keep the
routing simple (one route per functional network), then we would be able to
isolate traffic onto functional VLANs like we do today.
It would be a change to the existing workflow to have os-net-config run on
updates as well as deployment, but if this were a non-impacting event (the
interfaces didn’t have to be bounced), that would probably be OK.
At a later time, the possibility of using dynamic routing should be considered,
since it reduces the possibility of user error and is better suited to
centralized management. SDN solutions are one way to provide this, or other
approaches may be considered, such as setting up OVS tunnels.
Proposed Change
The proposed changes are discussed below.
Overview
In order to provide spine-and-leaf networking for deployments, several changes
will have to be made to TripleO:
Support for DHCP relay in the Ironic and Neutron DHCP servers, implemented in
a patch and a patch series.
Refactoring of TripleO Heat Templates network isolation to support multiple
subnets per isolated network, as well as per-subnet and supernet routes.
The bulk of this work is done in a patch series and a patch.
Changes to Infra CI to support testing.
Documentation updates.
Alternatives
The approach outlined here is very prescriptive, in that the networks must be
known ahead of time, and the IP addresses must be selected from the appropriate
pool. This is due to the reliance on static IP addresses provided by Heat.
One alternative approach is to use DHCP servers to assign IP addresses on all
hosts on all interfaces. This would simplify configuration within the Heat
templates and environment files. Unfortunately, this was the original approach
of TripleO, and it was deemed insufficient by end-users, who wanted stability
of IP addresses, and didn’t want to have an external dependency on DHCP.
Another approach is to use the DHCP server functionality in the network switch
infrastructure in order to PXE boot systems, then assign static IP addresses
after the PXE boot is done via DHCP. This approach only solves part of the
requirement: the network booting. It does not address the desire to have static
IP addresses on each network. That could be achieved by keeping static IP
addresses in some sort of per-node map; however, this approach is not as
scalable as programmatically determining the IPs, since it only applies to a
fixed number of hosts. Ideally, we want to retain the ability to use Neutron
as an IP address management (IPAM) back-end.
Another approach which was considered was simply trunking all networks back
to the Undercloud, so that dnsmasq could respond to DHCP requests directly,
rather than requiring a DHCP relay. Unfortunately, this has already been
identified as being unacceptable by some large operators, who have network
architectures that make heavy use of L2 segregation via routers. This also
won’t work well in situations where there is geographical separation between
the VLANs, such as in split-site deployments.
Security Impact
One of the major differences between spine-and-leaf and standard isolated
networking is that the various subnets are connected by routers, rather than
being completely isolated. This means that without proper ACLs on the routers,
networks which should be private may be opened up to outside traffic.
This should be addressed in the documentation, and it should be stressed that
ACLs should be in place to prevent unwanted network traffic. For instance, the
Internal API network is sensitive in that the database and message queue
services run on that network. It is supposed to be isolated from outside
connections. This can be achieved fairly easily if supernets are used, so
that if all Internal API subnets are a part of the 172.19.0.0/16 supernet,
an ACL rule will allow only traffic between Internal API IPs (this is a
simplified example that could be applied to any Internal API VLAN, or as a
global ACL):
  allow traffic from 172.19.0.0/16 to 172.19.0.0/16
  deny traffic from * to 172.19.0.0/16
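Where router ACLs are not available, a roughly equivalent pair of iptables
rules on the Controller might look like the following; this is illustrative
only and not the firewall configuration TripleO generates:

  # Accept Internal API traffic only from within the supernet, drop the rest
  iptables -A INPUT -s 172.19.0.0/16 -d 172.19.0.0/16 -j ACCEPT
  iptables -A INPUT -d 172.19.0.0/16 -j DROP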
Other End User Impact
Deploying with spine-and-leaf will require additional parameters to
provide the routing information and multiple subnets required. This will have
to be documented. Furthermore, the validation scripts may need to be updated
to ensure that the configuration is validated, and that there is proper
connectivity between overcloud hosts.
Other Deployer Impact
A spine-and-leaf deployment will be more difficult to troubleshoot than a
deployment that simply uses a set of VLANs. The deployer may need to have
more network expertise, or a dedicated network engineer may be needed to
troubleshoot in some cases.
Developer Impact
Spine-and-leaf is not easily tested in virt environments. This should be
possible, but due to the complexity of setting up libvirt bridges and
routes, we may want to provide a simulation of spine-and-leaf for use in
virtual environments. This may involve building multiple libvirt bridges
and routing between them on the Undercloud, or it may involve using a
DHCP relay on the virt-host as well as routing on the virt-host to simulate
a full routing switch. A plan for development and testing will need to be
developed, since not every developer can be expected to have a routed
environment to work in. It may take some time to develop a routed virtual
environment, so initial work will be done on bare metal.
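A minimal sketch of what such a simulation on the virt-host could involve,
assuming two Linux bridges standing in for leaf provisioning VLANs (all names
and addresses here are illustrative):

  # Create a bridge per simulated leaf, with the virt-host as gateway
  ip link add name br-leaf1 type bridge && ip link set br-leaf1 up
  ip link add name br-leaf2 type bridge && ip link set br-leaf2 up
  ip addr add 172.20.0.254/24 dev br-leaf1
  ip addr add 172.21.0.254/24 dev br-leaf2

  # Route between the leaves and the Undercloud provisioning network
  sysctl -w net.ipv4.ip_forward=1

  # Relay DHCP from the leaves to the Undercloud (address assumed)
  dhcrelay -4 -i br-leaf1 -i br-leaf2 -i br-ctlplane 192.168.24.1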
Implementation
Work Items
1. Add static IP assignment to Control Plane [DONE]
2. Modify Ironic Inspector dnsmasq.conf generation to allow export of
   multiple DHCP ranges, as described in Problem #1 and Problem #3.
3. Evaluate the Routed Networks work in Neutron, to determine if it is required
   for spine-and-leaf, as described in Problem #2.
4. Add OS::Neutron::Segment and l2-adjacency support to Heat, as described
   in Problem #5. This may or may not be a dependency for spine-and-leaf, based
   on the results of work item #3.
5. Modify the Ironic-Inspector service to record the host-to-subnet mappings,
   perhaps during introspection, to address Problem #6.
6. Add parameters to the Isolated Networking model in Heat to support supernet
   routes for individual subnets, as described in Problem #7.
7. Modify the Isolated Networking model in Heat to support multiple subnets, as
   described in Problem #8.
8. Add support for setting routes to supernets in os-net-config NIC templates,
   as described in the proposed solution to Problem #2.
9. Implement support for iptables on the Controller, in order to mitigate
   the APIs potentially being reachable via remote routes. Alternatively,
   document the mitigation procedure using ACLs on the routers.
10. Document the testing procedures.
11. Modify the documentation in tripleo-docs to cover the spine-and-leaf case.
Implementation Details
Workflow:
Operator configures DHCP networks and IP address ranges
Operator imports baremetal instackenv.json
When introspection or deployment is run, the DHCP server receives the DHCP
request from the baremetal host via DHCP relay
If the node has not been introspected, reply with an IP address from the
introspection pool and the inspector PXE boot image
If the node already has been introspected, then the server assumes this is
a deployment attempt, and replies with the Neutron port IP address and the
overcloud-full deployment image
The Heat templates are processed, which generates the os-net-config templates,
and os-net-config is run to assign static IPs from the correct subnets, as well
as routes to other subnets via the router gateway addresses.
When using spine-and-leaf, the DHCP server will need to provide an introspection
IP address on the appropriate subnet, depending on the information contained in
the DHCP relay packet that is forwarded by the segment router. dnsmasq will
automatically match the gateway address (GIADDR) of the router that forwarded
the request to the subnet where the DHCP request was received, and will respond
with an IP and gateway appropriate for that subnet.
The above workflow for the DHCP server should allow for provisioning IPs on
multiple subnets.
Dependencies
There may be a dependency on the Neutron Routed Networks. This won’t be clear
until a full evaluation is done on whether we can represent spine-and-leaf
using only multiple subnets per network.
There will be a dependency on routing switches that perform DHCP relay service
for production spine-and-leaf deployments.
Testing
In order to properly test this framework, we will need to establish at least
one CI test that deploys spine-and-leaf. As discussed in this spec, it isn’t
necessary to have a full routed bare metal environment in order to test this
functionality, although there is some work to get it working in virtual
environments such as OVB.
For bare metal testing, it is sufficient to trunk all VLANs back to the
Undercloud, then run a DHCP proxy on the Undercloud to receive all the
requests and forward them to br-ctlplane, where dnsmasq listens. This
will provide a substitute for routers running DHCP relay. For Neutron
DHCP, some modifications to the iptables rules may be required to ensure
that all DHCP requests from the overcloud nodes are received by the
DHCP proxy and/or the Neutron dnsmasq process running in the dhcp-agent
namespace.
Documentation Impact
The procedure for setting up a dev environment will need to be documented,
and a work item mentions this requirement.
The TripleO docs will need to be updated to include detailed instructions
for deploying in a spine-and-leaf environment, including the environment
setup. Covering specific vendor implementations of switch configurations
is outside this scope, but a specific overview of required configuration
options should be included, such as enabling DHCP relay (or “helper-address”
as it is also known) and setting the Undercloud as a server to receive
DHCP requests.
The updates to TripleO docs will also have to include a detailed discussion
of choices to be made about IP addressing before a deployment. If supernets
are to be used for network isolation, then a good plan for IP addressing will
be required to ensure scalability in the future.
References