We currently submit builds to Coverity manually every now and then,
but it would make sense to submit them more frequently and periodically,
so that it can detect defects sooner.
Add a "coverity" stage to the pipeline, which submits a build to Coverit
(the scheduls currently set to run every week).
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1973
Add explanation of how to indicate the new issues workflow to
MAINTAINERS.md: triage -> investigation -> devel. The different
stages are indicated using Gitlab's scoped labels (mutually exclusive).
These stages try to hightlight that the issue cannot be fixed and it's
not moving forward because more info is needed, already. Also, add a
section to CONTRIBUTING.md highlighting the importance of helping in
the triage and investigation stages: developers often cannot fix bugs
because lack of time to investigate, but even users that doesn't know
how to fix it due to lack of knowledge of the code base can help thanks
to their knowledge on networking.
Finally, make the 'triage:issues' CI job to work again, adding some
new policies with new automations. The automation will add or remove the
labels: stale, help-needed::{triage, investigation, devel} and
unassigned.
The labels help-needed::* and unassigned will be automatically added to
all issues without an assignee. This reflects better the reality of not
having enough time to work on most of the issues unless there is some
external help.
This job was supposed to run periodically. However, it stopped working
when a "workflow" section was added to .gitlab-ci.yml because it
prevented pipelines of the type "scheduled" to be created.
7fa72645e5 ('gitlab-ci: make detached MR pipeline for external contributor's pipelines to run')
Now, if it's run, it fails with error:
multi_xml requires Ruby version >= 3.1.4. The current ruby version is 2.7.8.225.
Let's disable the job until we fix it and we decide what triage we want
to do. When we do it, we will need to adapt the jobs to be run with the
right periodicity, maybe using custom pipeline variables.
This will force to regenerate the various images of the distributions
that we want to test so we get an updated snapshot on them. Otherwise we
might be testing a months old version of them.
A bug in ci-fairy was making the deletion of the images to partially
fail. It is fixed in the latest version of ci-fairy, so we need to
update the value of templates_sha to pick it.
The task will run only on pipelines of type "scheduled". Then we can
create a weekly scheduled pipeline in Gitlab.
Some warnings in the generation of the translation files indicate real
errors, like strings that cannot be extracted for translations. Check
that no warnings are emitted.
Debian 9 (stretch) is end of life, and the repositories are archived. We
need to patch the containers so that `apt-get update` continues to work.
A new ci-templates version brings that.
Note that at the moment, there is still another issue for debian:9
containers. Unclear whether that can be fixed. In any case, bumping to
latest ci-templates is not wrong, and works around the first issue on
debian:9, making it possible to at least look at the second issue.
https://gitlab.freedesktop.org/freedesktop/ci-templates/-/merge_requests/175
These stages were not properly implemented and don't seem to work.
Drop them.
Note that we do want that our cached containers get collected eventually.
As these are just caches for performance reasons, that could be done with
little downsides (we can just regenerate the containers when we need them).
However, that's not done by our gitlab-ci stages. Instead, it should
be done on a project level. It's not clear whether that is actually done,
but if there is a need (because of the resources that this wastes), then
we should do that (on freedesktop.org's gitlab instance).
We want that the tier2+ tests are only run manually. As those tests
depend on the respective prep step, there are 3 possibilities:
1) make prep manual and the tier test automatic. That is what we would
want, because then we can just manually trigger the prep step (one
click). However, in the past this didn't work.
2) make the prep automatic and the test manual. That works, the downside
is that we often run the prep step when its not needed. This is what
we used to do to workaround 1).
3) make prep and the test manual. Then there are no unnecessary tests
run, but triggering a manual test is cumbersome. First click to start
the prep step, then wait, then click again.
Revisit this. It seems 1) is working now. Yeay.
Also rename the prep stages, so that it's clear to which tier they
belong. I guess, I could move them instead to prep1, prep2, prep3
stages, but then there are a lot of columns on the web site.
The distro.name is not just a pretty name, its the name under which we fetch
the container. It is thus a well-known name, that we can rely on.
The "base_type" only depends on the distro name, and it makes no sense
to ever choose a different name. Tracking it in the "distributions"
array is thus redundant.
Move the mapping of distro.name to the base type to a separate place.
The tag we actually use already contains a hash of the input files and
is generated (by `ci-fairy generate-templates`). There is no need for having
this fixed prefix. As also seens by having a date there, which is maintained
badly and meaningless.
Drop it.
The benefit is that instead of one long running job for fedora:37 (the
current tier1 test), we have several smaller.
A minor downside is, that if the build is broken, then usually the very
first test would already fail. Previously, that meant that the follow up
tests were skipped. Now, they run all in parallel. However, test
failures should be the exception, so the wasted resources are probably
irrelevant. The upside is, that we can see which tests fail, and we run
them much faster (in parallel).
This is only done for the tier1 test, because those tests are started
automatically. Other tiers need to be triggered manually, which already
means a lot of clicking. Making those also matrix tests, would result in
an insane amount of clicking. As those other tests are run much more
seldom, having them huge is probably fine.
We have many test configurations (i.e. distros like fedora:37,
debian:9). Almost all of them run manually triggered, because running
them every time would be wasteful.
Still, even as we trigger those tests only seldom, whenever we trigger
them all together, they consume still too many resources of the
freedesktop.org gitlab infrastructure.
One possibility would be to just drop old distros (e.g. fedora:30).
Which tests are setup in gitlab-ci is constantly refined and adjusted.
So dropping some distros is not necessarily wrong and bound to happen
eventually.
However, I also don't find it great to just disable tests that are still
passing. If we want to avoid consuming too many resources, we can just
choose not to run those tests. We don't need to enforce that by deleting
tests. Once deleted, such a configuration cannot be tested anymore as it
would be too cumbersome to recreate the setup manually.
Instead, introduce stages/tiers to clearer mark configuration that we
should test even less frequently.
Note that it is still required from the developer to not trigger too
many tests at once, to not monopolize the CI resources. The stages
should make that clearer to see, but don't solve it. Deleting tests
might solve it, but only if we delete a significant number of those
tests, which seems not desirable.
During the test, we `tee` the output to a log file in "/tmp".
We do that, because the test script cleans the working directory
several times, so the file cannot reside there.
Afterwards, we need to move the file back into the git-tree, so that
gitlab can archive it.
Previously that was done by "after_script", but the "after_script" may not
see the same "/tmp" as the test run ([1]). This needs to be done as part of the
"script" step.
[1] https://docs.gitlab.com/ee/ci/yaml/#after_script
The "check-{patch,tree}" jobs use the same container as the default
test on Fedora ("pages_build", which also builds our documentation).
Previously, we thus extended "t_fedora:35". But that way we also
got things that we didn't want (.nm_artifacts and .build@template).
Solve this differently, by letting the jobs directly define what they
need. It's not much more, than extending "t_fedora:35" and workaround
to drop stuff we don't want.
Our test is long and verbose. The output gets truncated after
a few megabytes, but sometimes it's interesting to see what
happens afterwards. Redirect also to a file and archive it.
The output of our test scripts is captured by gitlab. It does however
sanitize things that look like secrets. So it was reasonably save
to call `env` from within the test script.
Next, we will redirect (`tee`) the output of the test script to a
file and archive it. When we do that, the output does not get sanitized
and can be downloaded from the artifacts page.
Stop running `env` as part of the test script. Do it instead as a
separate step. After all, it is useful to see the environment variables
of the test. But sanitized.
It's true, that our gitlab-ci test mostly consists of building NetworkManager.
Hence the name of the script was not entirely wrong. But it's not only building.
I think "run-test.sh" is a much better name. Rename.
"nm-code-format.sh" is going to change the default behavior from "-n" to
"-i", that is, from check-only to reformat. Explicitly pass "-n" where
we want it.
...
File "/usr/local/lib/python3.9/site-packages/jinja2/environment.py", line 1361, in generate
yield self.environment.handle_exception()
File "/usr/local/lib/python3.9/site-packages/jinja2/environment.py", line 925, in handle_exception
raise rewrite_traceback_stack(source=source)
File ".gitlab-ci/ci.template", line 178, in top-level template code
{% if not version in distro.always and (distro.name != pages_build.name or version != pages_build.version) %}
jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'always'
- the container that is also "pages_build" should always
run automatically. This can replace the "always" tag.
- comment out the "always: 33" part, because we no longer need
it. It was also wrong, because by now we should run Fedora 34
automatically.
Ubuntu/Debian and CentOS/Fedora are sufficiently similar that it's
better that we have only one variant of ".gitlab-ci/*-install.sh"
and "contrib/*/REQUIRED_PACKAGES".
This was already the case, however, we used to symlink
".gitlab-ci/centos-install.sh" to "fedora-install.sh". That
worked, but it didn't scale very well. For example, if we would follow
that pattern, we would also need a symlink "contrib/centos/REQUIRED_PACKAGES"
Or should "contrib/centos" symlink to "contrib/fedora"? That seems even
more wrong.
We already had the "distro.base_type" variable for that. Make use of
that also for the install script.
ci-templates builds and caches the test containers. When the build
scripts, the ci-template or "config.yml" changes, we need to bump
the tag so that the containers get rebuild.
Partly automate this. The tag now gets generated by the template and
contains a checksum of certain build files. Thus, if you change
any build files, then `ci-fairy generate-template` would generate a
different tag. You can not miss that, because we have tests that ensure
that our ".gitlab-ci.yml" is up to date. Also, you no longer need to
manually bump the tag when a build script changes, just regenerate
".gitlab-ci.yml" with `ci-fairy generate-template`.
See also: https://gitlab.freedesktop.org/freedesktop/ci-templates/-/merge_requests/54
The goal is to run most distros only manually. However, it would be nice
to avoid (manually) clicking twice to start the tests for one distro:
once for the container preparation, and once for the actual test.
Previously, the container prep part was set to manual and the actual
test automatic. It worked almost as desired, except that this leads
to the entire gitlab-ci pipeline be be in running state indefinitely.
To fix that, always run the container prep steps. If the container is
cached, this is supposed to be fast and cheap. Now only the actual tests
are marked as "manual".
It seems "pages" test does not get properly triggered, if only
t_fedora:33 completes. It should, because the other distros are
optional. Try to set "needs" to fix that.
Certain parts of the code are entirely generated or must follow
a certain format that can be enforced by a tool. These invariants
must never fail:
- ci-fairy generate-template (check-ci-script)
- black python formatting
- clang-format C formatting
- msgfmt -vs
On the other hand, we also have a checkpatch script that checks
the current patch for common errors. These are heuristics and
only depend on the current patch (contrary to the previous type
that depend on the entire source tree).
Refactor the gitlab-ci tests:
- split "checkpatch" into "check-patch" and "check-tree".
- merge the "check-ci-script" test into "check-tree".
All the steps of "checkpatch" test (except the last) check
the current tree for consistency. Those checks must always
pass.
Only the last step calls the "checkpatch-feature-branch.sh".
That script checks for common patterns, like avoiding g_assert()
(in favor of other assertion types). That last check only checks
the current patch, and there are many cases where the test is
known to fail (because these are just heuristics). As such, the
step that may fail should be called as last.
ci-templates encourages building specific containers that can be re-used:
- containers are re-used across pipelines, producing consistent results
- containers are re-used by contributors since they will use the upstream
containers for their MR, thus guaranteeing the same results.
Containers are automatically rebuild whenever the respective
FDO_DISTRIBUTION_TAG changes. This is particularly interesting now that
Docker Hub will introduce pull limits.
This CI script consists of a config file and a jinja2 template, simply
running 'ci-fairy generate-template' produces the .gitlab-ci.yml.
ci-fairy is part of the freedesktop.org ci-templates and can be pip
installed, see the check-ci-script job.
Functional changes to the previous script:
- new job: check-ci-script, verifies that our gitlab-ci.yml is the one
generated by the sources
- Added distributions:
- Fedora 33
- The actual work is now down by a set of scripts in .gitlab-ci/,
specifically:
- .gitlab-ci/build.sh is the previous do_build job
- .gitlab-ci/{fedora|debian}-install.sh are the previous {fedora|debian}_install jobs
symlinks are in place for centos and ubuntu
Why the scripts instead of steps in the CI? Easer to reading and
reproduce. With the containers being static, it's easy to pull one
locally and re-run the CI job to reproduce an issue. Having everything in a
single script makes that trivial.
Signed-off-by: Peter Hutterer <peter.hutterer@who-t.net>
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/664