podman/test/upgrade
Matthew Heon 141de86862 Revamp Libpod state strings for Docker compat
Improve our compatibility with Docker by better handling the
state strings that we print in `podman ps`. Docker capitalizes
all states in `ps` (we do not) - fix this in our PS code. Also,
stop normalizing ContainerStateConfigured to the "Created" state,
and instead make it always be Created, with the existing Created
state becoming Initialized.

I didn't rename the actual states because I'm somewhat reticent
to make such a large change a day before we leave for break. It's
somewhat confusing that ContainerStateConfigured now returns
Created, but internally and externally we're still consistent.

[NO NEW TESTS NEEDED] existing tests should catch anything that
broke.

I also consider this a breaking change. I will flag appropriately
on Github.

Fixes RHBZ#2010432 and RHBZ#2032561

Signed-off-by: Matthew Heon <mheon@redhat.com>
2022-01-17 13:56:07 -05:00
..
helpers.bash podman upgrade tests for networking 2021-09-09 10:26:29 +02:00
README.md codespell cleanup 2021-05-05 15:50:59 -04:00
test-upgrade.bats Revamp Libpod state strings for Docker compat 2022-01-17 13:56:07 -05:00

Background

For years we've been needing a way to test podman upgrades; this became much more critical on December 7, 2020, when Matt disclosed a bug he had found over the weekend (#8613) in which reuse of a previously-defined field name would result in fatal JSON decode failures if current-podman were to try reading containers created with podman <= 1.8 (FIXME: confirm)

Upgrade testing is a daunting problem; but in the December 12 Cabal meeting Dan suggested using podman-in-podman. This PR is the result of fleshing out that idea.

Overview

The BATS script in this directory fetches and runs an old-podman container image from quay.io/podman, uses it to create and run a number of containers, then uses new-podman to interact with those containers.

As of 2021-02-23 the available old-podman versions are:

$ ./bin/podman search --list-tags quay.io/podman/stable | awk '$2 ~ /^v/ { print $2}' | sort | column -c 75
v1.4.2  v1.5.0  v1.6    v1.9.0  v2.0.2  v2.1.1
v1.4.4  v1.5.1  v1.6.2  v1.9.1  v2.0.6  v2.2.1

Test invocation is:

$ sudo env PODMAN=bin/podman PODMAN_UPGRADE_FROM=v1.9.0 PODMAN_UPGRADE_TEST_DEBUG= bats test/upgrade

(Path assumes you're cd'ed to top-level podman repo). PODMAN_UPGRADE_FROM can be any of the versions above. PODMAN_UPGRADE_TEST_DEBUG is empty here, but listed so you can set it =1 and leave the podman_parent container running. Interacting with this container is left as an exercise for the reader.

The script will pull the given podman image, invoke it with a scratch root directory, and have it do a small set of podman stuff (pull an image, create/run some containers). This podman process stays running because if it exits, it kills containers running inside the container.

We then invoke the current (host-installed) podman, using the same scratch root directory, and perform operations on those images and containers. Most of those operations are done in individual @tests.

The goal is to have this upgrade test run in CI, iterating over a loop of known old versions. This list would need to be hand-maintained and updated on new releases. There might also need to be extra configuration defined, such as per-version commands (see below).

Findings

Well, first, v1.6.2 won't work on default f32/f33: the image does not include crun, so it can't work at all:

ERRO[0000] oci runtime "runc" does not support CGroups V2: use system migrate to mitigate

I realize that it's kind of stupid not to test 1.6, since that's precisely the test that would've caught #8613 early, but I just don't think it's worth the hassle of setting up cgroupsv1 VMs.

For posterity, in an earlier incantation of this script I tried booting f32 into cgroupsv1 and ran into the following warnings when running new-podman on old-containers:

ERRO[0000] error joining network namespace for container 322b66d94640e31b2e6921565445cf0dade4ec13cabc16ee5f29292bdc038341: error retrieving network namespace at /var/run/netns/cni-577e2289-2c05-2e28-3c3d-002a5596e7da: failed to Statfs "/var/run/netns/cni-577e2289

Where To Go From Here

  • Tests are still (2021-02-23) incomplete, with several failing outright. See FIXMEs in the code.

  • Figuring out how/if to run rootless. I think this is possible, perhaps even necessary, but will be tricky to get right because of home-directory mounting.

  • Figuring out how/if to run variations with different config files (e.g. running OLD-PODMAN that creates a user libpod.conf, tweaking that in the test, then running NEW-PODMAN upgrade tests)