we are currently using something like /run/user/UID/run as runroot, as
it is already done by Buildah. This ends up with
/run/user/UID/run/runc for the runc directory. Change to drop the
additional /run so that runc will use /run/user/UID/runc.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
It was setting the wrong variable (CamelCase)
in the wrong module ("main", not "libpod")...
Signed-off-by: Anders F Björklund <anders.f.bjorklund@gmail.com>
for the purposes of performance and security, we use securejoin to contstruct
the root fs's path so that symlinks are what they appear to be and no pointing
to something naughty.
then instead of chrooting to parse /etc/passwd|/etc/group, we now use the runc user/group
methods which saves us quite a bit of performance.
Signed-off-by: baude <bbaude@redhat.com>
We are still requiring oci-systemd-hook to be installed in order to run
systemd within a container. This patch properly mounts
/sys/fs/cgroup/systemd/libpod_parent/libpod-UUID on /sys/fs/cgroup/systemd inside of container.
Since we need the UUID of the container, we needed to move Systemd to be a config option of the
container.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
If someone runs podman as a user (uid) that is not defined in the container
we want generate a passwd file so that getpwuid() will work inside of container.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
As of now, there is no way to debug podman clean up processes.
They are started by conmon with no stdout/stderr and log nowhere.
This allows us to actually figure out what is going on when a
cleanup process runs.
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
Add the --ip flag back with bash completions. Manpages still
missing.
Add plumbing to pass appropriate the appropriate option down to
libpod to connect the flag to backend logic added in the previous
commits.
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
the issue is caused by the Go Runtime that messes up with the process
signals, overriding SIGSETXID and SIGCANCEL which are used internally
by glibc. They are used to inform all the threads to update their
stored uid/gid information. This causes a hang on the set*id glibc
wrappers since the handler installed by glibc is never invoked.
Since we are running with only one thread, we don't really need to
update other threads or even the current thread as we are not using
getuid/getgid before the execvp.
Closes: https://github.com/containers/libpod/issues/1625
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
The vendoring issues with libnetwork were significant (it was
dragging in massive amounts of code) and were just not worth
spending the time to work through. Highly unlikely we'll ever end
up needing to update this code, so move it directly into pkg/ so
we don't need to vendor libnetwork. Make a few small changes to
remove the need for the remainder of libnetwork.
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
To work better with Kata containers, we need to delete() from the
OCI runtime as a part of cleanup, to ensure resources aren't
retained longer than they need to be.
To enable this, we need to add a new state to containers,
ContainerStateExited. Containers transition from
ContainerStateStopped to ContainerStateExited via cleanupRuntime
which is invoked as part of cleanup(). A container in the Exited
state is identical to Stopped, except it has been removed from
the OCI runtime and thus will be handled differently when
initializing the container.
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
* Update varlink document
* Add NoContainersInPod error in go and python
* Add support for varlink pod interface
* New code passes pylint
* Fix bug in test_runner.sh
* Update integration tests for race condition on status check
* Add missing port config file support
Signed-off-by: Jhon Honce <jhonce@redhat.com>
For the sake of debug and problem reporting, we would benefit from knowing
what buildah version was vendored into podman. Also, knowing the distribution
and distribution version would also be handy.
Signed-off-by: baude <bbaude@redhat.com>
We added a timeout for convenience, but most invocations don't
care about it. Refactor it into WaitWithTimeout() and add a
Wait() that doesn't require a timeout and uses the default.
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
Closes: #1527
Approved by: mheon
Most container images assume there are at least 65536 UIDs/GIDs
available. Raise an error if there are not enough IDs allocated to
the current user.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1520
Approved by: rhatdan
Also update some missing fields libpod.conf obtions in man pages.
Fix sort order of security options and add a note about disabling
labeling.
When a process requests a new label. libpod needs to reserve all
labels to make sure that their are no conflicts.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #1406
Approved by: mheon
Firstly, when adding the privileged catch-all resource device,
first remove the spec's default catch-all resource device.
Second, remove our default rootfs propogation config - Docker
does not set this by default, so I don't think we should either.
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
Closes: #1491
Approved by: TomSweeneyRedHat
when using the varlink api, we should pass on the registries information
as is present in the cli info command.
Signed-off-by: baude <bbaude@redhat.com>
This matches Docker behavior more closely and should resolve an
issue we were seeing with /sys mounts
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
Closes: #1465
Approved by: rhatdan
Waiting uses a lot of CPU, so drop back to checking once/second
and allow user to pass in the interval.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
This is an incomplete fix, as it would be best for the libpod library to be in charge of coordinating the container's dependencies on the infra container. A TODO was left as such. UTS is a special case, because the docker library that namespace handling is based off of doesn't recognize a UTS based on another container as valid, despite the library being able to handle it correctly. Thus, it is left in the old way.
Signed-off-by: haircommander <pehunt@redhat.com>
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #1347
Approved by: mheon
Use the new firewall code vendored from CNI to replace the
existing iptables rule addition handler we had in place. This
adds proper support for firewalld and should be much better at
interacting with the firewall.
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
Closes: #1431
Approved by: baude
The upstream CNI project has a PR open for adding iptables and
firewalld support, but this has been stalled for the better part
of a year upstream.
On advice of several maintainers, we are vendoring this code into
libpod, to perform the relevant firewall configuration ourselves.
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
Closes: #1431
Approved by: baude
We should be sharing cgroups namespace by default in pods
uts namespace sharing was broken in pods.
Create a new libpod/pkg/namespaces for handling of namespace fields
in containers
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #1418
Approved by: mheon
When there was a conflict between a user-added volume and a mount
already in the spec, we previously respected the mount already in
the spec and discarded the user-added mount. This is counter to
expected behavior - if I volume-mount /dev into the container, I
epxect it will override the default /dev in the container, and
not be ignored.
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
Closes: #1419
Approved by: TomSweeneyRedHat
When user-specified volume mounts overlap with mounts already in
the spec, remove the mount in the spec to ensure there are no
conflicts.
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
Closes: #1419
Approved by: TomSweeneyRedHat
Podman logs was not parsing CRI logs well, especially
the F and P logs. Now using the same parsing code as
in kube here.
Signed-off-by: umohnani8 <umohnani@redhat.com>
Closes: #1403
Approved by: rhatdan
change the tests to use chroot to set a numeric UID/GID.
Go syscall.Credential doesn't change the effective UID/GID of the
process.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1372
Approved by: mheon
This will help document the defaults in podman build.
podman build --help will now show the defaults and mention
the environment variables that can be set to change them.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #1364
Approved by: mheon
In the API docs, we generally state the type of error that should be returned
if a container or image cannot be found. In several cases, the code did not
match the API doc, when the API doc was correct.
Signed-off-by: baude <bbaude@redhat.com>
Closes: #1353
Approved by: rhatdan
Unfortunately this is not enough to get it working as runc doesn't
allow to bind mount /proc.
Depends on: https://github.com/opencontainers/runc/pull/1832
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1349
Approved by: rhatdan
Fix the test for checking when /sys must be bind mounted from the
host. It should be done only when userNS are enabled (the
!UsernsMode.IsHost() check is not enough for that).
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1349
Approved by: rhatdan
Manage the case where the main process of the container creates and
joins a new user namespace.
In this case we want to join only the first child in the new
hierarchy, which is the user namespace that was used to create the
container.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1331
Approved by: rhatdan
We cannot re-exec into a new user namespace to gain privileges and
access an existing as the new namespace is not the owner of the
existing container.
"unshare" is used to join the user namespace of the target container.
The current implementation assumes that the main process of the
container didn't create a new user namespace.
Since in the setup phase we are not running with euid=0, we must skip
the setup for containers/storage.
Closes: https://github.com/containers/libpod/issues/1329
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1331
Approved by: rhatdan
Also it fix the issue of exposing both tc/udp port even if
only one proto specified.
Signed-off-by: Kunal Kushwaha <kushwaha_kunal_v7@lab.ntt.co.jp>
Closes: #1325
Approved by: mheon
I am often asked about the list of capabilities availabel to a container.
We should be listing this data in the inspect command for effective
capabilities and the bounding set.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #1335
Approved by: TomSweeneyRedHat
As well as small style corrections, update pod_top_test to use CreatePod, and move handling of adding a container to the pod's namespace from container_internal_linux to libpod/option.
Signed-off-by: haircommander <pehunt@redhat.com>
Closes: #1187
Approved by: mheon
A pause container is added to the pod if the user opts in. The default pause image and command can be overridden. Pause containers are ignored in ps unless the -a option is present. Pod inspect and pod ps show shared namespaces and pause container. A pause container can't be removed with podman rm, and a pod can be removed if it only has a pause container.
Signed-off-by: haircommander <pehunt@redhat.com>
Closes: #1187
Approved by: mheon
This results in some functionality changes:
If a ErrCtrStateInvalid is returned to GetPodStats, the container is ommitted from the stats.
As such, if an empty slice of Container stats are returned to GetPodStats in varlink, an error will occur.
GetContainerStats will return the ErrCtrStateInvalid as well.
Finally, if ErrCtrStateInvalid is returned to the podman stats call, the container will be ommitted from the stats.
Signed-off-by: haircommander <pehunt@redhat.com>
Closes: #1319
Approved by: baude
When a non-nil process was used and a hook was set to match
always, this would not actually match. Fix this.
Fixes: #1308
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
Closes: #1311
Approved by: rhatdan
Devices are supposed to be able to be passed in via the form of
--device /dev/foo
--device /dev/foo:/dev/bar
--device /dev/foo:rwm
--device /dev/foo:/dev/bar:rwm
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #1299
Approved by: umohnani8
Do not set any hostname value in the OCI configuration when --uts=host
is used and the user didn't specify any value. This prevents an error
from the OCI runtime as it cannot set the hostname without a new UTS
namespace.
Differently, the HOSTNAME environment variable is always set. When
--uts=host is used, HOSTNAME gets the value from the host.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1280
Approved by: baude
To better reflect it's usage: to share functions between podman and varlink.
Signed-off-by: haircommander <pehunt@redhat.com>
Closes: #1275
Approved by: mheon
Need to get some small changes into libpod to pull back into buildah
to complete buildah transition.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #1270
Approved by: mheon
We need to pass the image format OCI or docker in the varlink commit command.
Signed-off-by: Qi Wang <qiwan@redhat.com>
Closes: #1281
Approved by: mheon
Hostname should be set to the hosts hostname when network is none.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #1274
Approved by: giuseppe
Use a pipe instead of a temporary file to load the apparmor profile.
This change has a measurable speed improvement for apparmor users.
Signed-off-by: Valentin Rothberg <vrothberg@suse.com>
Closes: #1262
Approved by: mheon
Move all Linux-related data under the corresponding buildtags to reduce
the memory footprint and speed up compilation for non-apparmor builds.
Signed-off-by: Valentin Rothberg <vrothberg@suse.com>
Closes: #1262
Approved by: mheon
when searching multiple registries for images, if we get an error on one
of the searches, we should keep going and complete the search. if there
is only one search registry however, we will return an error.
Resolves: #1255
Signed-off-by: baude <bbaude@redhat.com>
Closes: #1257
Approved by: mheon
Lookup the current username by UID if the USER env variable is not
set.
Reported in: https://github.com/projectatomic/libpod/issues/1092
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1217
Approved by: rhatdan
Rather than making a runtime each time a client hits a varlink endpoint, we now
make a single runtime when the varlink service starts up. This fixes a problem
where we hit a max inotify limit from CNI.
Resolves: #1211
Signed-off-by: baude <bbaude@redhat.com>
Closes: #1215
Approved by: rhatdan
It is required only when directly configuring the user namespace.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1200
Approved by: rhatdan
The goal is to be very explicit about which functions try to heuristically
guess what is the expected format of the string. Not quite "shaming"
the users, but making sure they stand out.
RFC:
- Is this at all acceptable? Desirable?
- varlink ExportImage says "destination must have transport type";
should it be using alltransports.ParseImageReference
+ PushImageToReference, then?
(While touching the call in cmd/podman, also remove a commented-out
older version of the call.)
Should not change behavior (but does not add unit tests).
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Closes: #1176
Approved by: rhatdan
When removing an image via varlink, we should always return the
ID of the image even in the case where the image has multiple
repository names and one was only untagged.
Reported by jhonce during integration testing.
Signed-off-by: baude <bbaude@redhat.com>
Closes: #1191
Approved by: jwhonce
The CNI plugins upstream removed their network namespace creation
code, making it a test package only. Copy it into our repository
and slightly modify it for our use (most notably, use MNT_DETACH
when unmounting namespaces).
This new CNI code splits closing and unmounting network
namespaces, which allows us to greatly reduce the number of
occasions on which we call teardownNetwork() and make more errors
in that function fatal instead of warnings. Instead, we can call
Close() and just close the open file descriptor in cases where
the namespace has already been cleaned up.
Signed-off-by: Matthew Heon <matthew.heon@gmail.com>
Closes: #1165
Approved by: baude
If more than one volume was mounted using the --volume flag in
podman run, the second and onwards volumes were picking up options
of the previous volume mounts defined. Found out that the options were
not be cleared out after every volume was parsed.
Signed-off-by: umohnani8 <umohnani@redhat.com>
Closes: #1142
Approved by: mheon
Most images won't work without multiple ids/gids. Error out
immediately if there are no multiple ids available.
The error code when the user is not present in /etc/sub{g,u}id looks
like:
$ bin/podman run --rm -ti alpine echo hello
ERRO[0000] No subuid ranges found for user "gscrivano"
Closes: https://github.com/projectatomic/libpod/issues/1087
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1097
Approved by: rhatdan
unshare the mount namespace as well when creating an user namespace so
that we are the owner of the mount namespace and we can mount FUSE
file systems on Linux 4.18. Tested on Fedora Rawhide:
podman --storage-opt overlay.fuse_program=/usr/bin/fuse-overlayfs run alpine echo hello
hello
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
This is a refresh of Dan William's PR #974 with a rebase and proper
vendoring of ocicni and containernetworking/cni. It adds the ability
to define multiple networks as so:
podman run --network=net1,net2,foobar ...
Signed-off-by: baude <bbaude@redhat.com>
Closes: #1082
Approved by: baude
when using the getattachsockets endpoint, which returns the sockets needed
to create and use a terminal, we should check if the container is just in the
configured state. if so, we need to perform a container init to have conmon
create the required sockets so we can attach to them prior to starting the container.
Signed-off-by: baude <bbaude@redhat.com>
Closes: #1067
Approved by: jwhonce
Make users of libpod more secure by adding the libpod/apparmor package
to load a pre-defined AppArmor profile. Large chunks of libpod/apparmor
come from github.com/moby/moby.
Also check if a specified AppArmor profile is actually loaded and throw
an error if necessary.
The default profile is loaded only on Linux builds with the `apparmor`
buildtag enabled.
Signed-off-by: Valentin Rothberg <vrothberg@suse.com>
Closes: #1063
Approved by: rhatdan
use execvp instead of exec so that we keep the PATH environment
variable and the lookup for the "podman" executable works.
Closes: https://github.com/projectatomic/libpod/issues/1070
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1072
Approved by: mheon
podman now supports --volumes-from flag, which allows users
to add all the volumes an existing container has to a new one.
Signed-off-by: umohnani8 <umohnani@redhat.com>
Closes: #931
Approved by: mheon
We added the explicit int64 casts for 32-bit builds in 35e1ad78 (Make
libpod build on 32-bit systems, 2018-02-12, #324), but the explicit
casts work fine on 64-bit systems too.
Signed-off-by: W. Trevor King <wking@tremily.us>
Closes: #1058
Approved by: mheon
This removes some boilerplate from the libpod package, so we can focus
on container stuff there. And it gives us a tidy sub-package for
focusing on ctime extraction, so we can focus on unit testing and
portability of the extraction utility there.
For the unsupported implementation, I'm falling back to Go's ModTime
[1]. That's obviously not the creation time, but it's likely to be
closer than the uninitialized Time structure from cc6f0e85 (more
changes to compile darwin, 2018-07-04, #1047). Especially for our use
case in libpod/oci, where we're looking at write-once exit files.
The test is more complicated than I initially expected, because on
Linux filesystem timestamps come from a truncated clock without
interpolation [2] (and network filesystems can be completely decoupled
[3]). So even for local disks, creation times can be up to a jiffie
earlier than 'before'. This test ensures at least monotonicity by
creating two files and ensuring the reported creation time for the
second is greater than or equal to the reported creation time for the
first. It also checks that both creation times are within the window
from one second earlier than 'before' through 'after'. That should be
enough of a window for local disks, even if the kernel for those
systems has an abnormally large jiffie. It might be ok on network
filesystems, although it will not be very resilient to network clock
lagging behind the local system clock.
[1]: https://golang.org/pkg/os/#FileInfo
[2]: https://groups.google.com/d/msg/linux.kernel/mdeXx2TBYZA/_4eJEuJoAQAJ
Subject: Re: Apparent backward time travel in timestamps on file creation
Date: Thu, 30 Mar 2017 20:20:02 +0200
Message-ID: <tqMPU-1Sb-21@gated-at.bofh.it>
[3]: https://groups.google.com/d/msg/linux.kernel/mdeXx2TBYZA/cTKj4OBuAQAJ
Subject: Re: Apparent backward time travel in timestamps on file creation
Date: Thu, 30 Mar 2017 22:10:01 +0200
Message-ID: <tqOyl-36A-1@gated-at.bofh.it>
Signed-off-by: W. Trevor King <wking@tremily.us>
Closes: #1050
Approved by: mheon
b96be3af (changes to allow for darwin compilation, 2018-06-20, #1015)
made AddPrivilegedDevices per-platform and cc6f0e85 (more changes to
compile darwin, 2018-07-04, #1047) made CreateBlockIO per-platform.
But both left but left out docs for the unsupported version [1]:
pkg/spec/config_unsupported.go:18:1⚠️ exported method
CreateConfig.AddPrivilegedDevices should have comment or be
unexported (golint)
pkg/spec/config_unsupported.go:22:1⚠️ exported method
CreateConfig.CreateBlockIO should have comment or be unexported
(golint)
To keep the docs DRY, I've restored the public methods and their docs,
and I've added new, internal methods for the per-platform
implementations.
[1]: https://travis-ci.org/projectatomic/libpod/jobs/400555937#L160
Signed-off-by: W. Trevor King <wking@tremily.us>
Closes: #1034
Approved by: baude
The files were split apart by b96be3af (changes to allow for darwin
compilation, 2018-06-20, #1015), but the C import and two functions
left in rootless.go are all Linux-specific as well. This commit moves
all of the pre-b96be3af rootless.go into rootless_linux.go, just
adding the '// +build linux' header (b96be3af also scrambled the + in
that header) and keeping the new GetRootlessUID from a1545fe6
(rootless: add function to retrieve the original UID, 2018-07-05, #1048).
Signed-off-by: W. Trevor King <wking@tremily.us>
Closes: #1034
Approved by: baude
this should represent the last major changes to get darwin to **compile**. again,
the purpose here is to get darwin to compile so that we can eventually implement a
ci task that would protect against regressions for darwin compilation.
i have left the manual darwin compilation largely static still and in fact now only
interject (manually) two build tags to assist with the build. trevor king has great
ideas on how to make this better and i will defer final implementation of those
to him.
Signed-off-by: baude <bbaude@redhat.com>
Closes: #1047
Approved by: rhatdan
After we re-exec in the userNS os.Getuid() returns the new UID (= 0)
which is not what we want to use.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #1048
Approved by: mheon
When we run containers in detach mode, nothing cleans up the network stack or
the mount points. This patch will tell conmon to execute the cleanup code when
the container exits.
It can also be called to attempt to cleanup previously running containers.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #942
Approved by: mheon
If the caller sets up the app to be in logrus.DebugLevel,
then we will add the --syslog flag to conmon to get all of the
messages.
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
Closes: #1014
Approved by: TomSweeneyRedHat
Catching up with opencontainers/runtime-tools@84a62c6a (generate: Move
Generator.spec to Generator.Config, 2016-11-06, #266, v0.6.0), now
that we've bumped runtime-tools in f6c0fc1a (Vendor in latest
runtime-tools, 2018-06-26, #1007).
Signed-off-by: W. Trevor King <wking@tremily.us>
Closes: #1008
Approved by: mheon
When running podman as non root user always create an userNS and let
the OCI runtime use it.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Closes: #936
Approved by: rhatdan