There's no need for an arch pointer to get segments. We can call the
routine directly since we don't need this code to be called from
different contexts where a pointer is needed.
Sponsored by: Netflix
Reviewed by: kevans, andrew
Differential Revision: https://reviews.freebsd.org/D38266
A number of bug fixes to loading kernels and modules on aarch64 and amd64.
Fix offset calculations.
Add a number of debugs, commented out for now (will GC them in the future)
With this, and the MD aarch64 commands, we can linux boot in qemu and on
real hardware.
Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38261
Copy more of the necessary state for FreeBSD to boot:
o Copy EFI memory tables
o Create custom page tables needed for the kernel to find itself
o Simplify the passing of args to the trampoline by putting them
on the stack rather than in dedicated memory.
This is only partially successful... we get only part way through the
amd64 startup code before dying. However, it's much further than before
the changes.
Sponsored by: Netflix
Reviewed by: tsoome, kevans
Differential Revision: https://reviews.freebsd.org/D38259
Update exec.c (copied from efi/loader/arch/arm64/exec.c) to allow
execution of aarch64 kernels. This includes a new trampoline code that
handles copying the UEFI memory map, if available from the Linux FDT
provided PA. This is a complete implementation now, able to boot from
the LinuxBoot environment on an aarch64 server that only offers
LinuxBoot (though a workaround for the gicv3 inability to re-init is not
yet in FreeBSD). Many 'fit and finish' issues will be addressed in
subsequent commits.
Sponsored by: Netflix
Reviewed by: tsoome, kevans, andrew
Differential Revision: https://reviews.freebsd.org/D38258
Connect efi's bootinfo.c to the kboot build, and adjust to use
the kboot specific routines.
The getrootmount() call is independent of EFI. Remove ifdefs so it's
called for kboot too.
The differences between the kboot and efi bootinfo.c files are now tiny.
This could use some more refactoring, but this is a working checkpoint.
Sponsored by: Netflix
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D38350
Since aarch64 is different, it needs a different smap. We first see if
we have the PA of the table from the FDT info. If so, we copy that and
quit. Otherwise, we do the best we can in translating the /proc/iomem
into EFI Memory Table format.
We also send the system table to the kernel.
Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38255
Copy the EFI memory tables we were able to get into the MODINFOMD_SMAP
metadata area for the kernel.
Sponsored by: Netflix
Reviewed by: tsoome, kevans
Differential Revision: https://reviews.freebsd.org/D38254
It's just a stub, since the kernel learns of memory via FDT.
Sponsored by: Netflix
Reviewed by: tsoome, kevans
Differential Revision: https://reviews.freebsd.org/D38253
Each architecture will soon be required to provide this to load memory
maps as metadata for the platforms that require it (or a stub function
for those that don't).
Sponsored by: Netflix
Reviewed by: tsoome, kevans
Differential Revision: https://reviews.freebsd.org/D38252
Now that all architectures provide this, enumerate the platform's memory
before we go to interact(). This needs to be done only once, but relies
on our ability to open host: files on some platforms, so it needs to be
done after devinit().
Sponsored by: Netflix
Reviewed by: tsoome, kevans
Differential Revision: https://reviews.freebsd.org/D38251
Move memory enumeration to the enumerate_memory_arch(), tweak the code a
bit to make that fit into that framework.
Also fix a bug in the name of the end location. The old code never found
memory (though amd64 doesn't yet work, this led to using fallback
addresses that were good enough for QEMU...).
Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38250
We have an odd situation with aarch64 memory enumeration. The fdt that
we can get has a PA of the UEFI memory map, as modified by the current
running Linux kernel so it can retain those pages it needs for EFI and
other services. We have to pass in this EFI table, but don't have access
to it in the boot loader. We do in the trampoline code, so a forthcoming
commit will copy it there for the kernel to use. All for want of /dev/mem
in the target environment sometimes.
However, we also have to find a place to load the kernel, so we have to
fall back to /proc/iomem when we can't read the UEFI memory map directly
from /dev/mem. It will give us good enough results to do this task. This
table isn't quite suitable to be converted to the EFI table, so we use
both methods. We'll fall back to this method also if there's no EFI
table advertised in the fdt. There's no /sys file on aarch64 that has
this information, hence using the old-style /proc/iomem. We're unlikely
to work if there's no EFI, though.
Note: The underlying Linux mechanism is different than the amd64 method
which seems like it should be MI, but unimplemented on aarch64.
Sponsored by: Netflix
Discussed with: kevans
Differential Revision: https://reviews.freebsd.org/D38249
Add stub for new MI interface for enumerating memory. Right now powerpc
looks in the FDT table at a later point in boot since we don't need to
pass a specific memory table to the kernel. Leave it like that for now,
but note plans for the future.
Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38248
We'll be using this code for most / all of the platforms since iomem is
the only interface that can tell us which areas are reserved to the Linux
kernel: areas that we cannot place the new kernel into, but that we are
free to use once we hit the trampoline. aarch64 will use this shortly, and
similar code in amd64 will be refactored when I make that platform work.
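
As a rough illustration of the kind of parsing involved, here is a minimal
sketch of decoding one /proc/iomem line of the form
"40000000-7fffffff : System RAM". The struct and function names are
hypothetical and are not taken from kboot; the sketch uses hosted stdio
rather than the loader's own routines, and treats the end address as
inclusive, as Linux prints it.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record for one /proc/iomem line; not kboot's actual type. */
struct iomem_ent {
	bool		valid;		/* line parsed successfully */
	uint64_t	start;		/* first byte of the range */
	uint64_t	end;		/* last byte of the range (inclusive) */
	char		name[64];	/* e.g. "System RAM" or "reserved" */
};

/*
 * Parse one line such as "40000000-7fffffff : System RAM".
 * Nested entries are indented; scanf's %x skips that leading whitespace.
 */
static struct iomem_ent
parse_iomem_line(const char *line)
{
	struct iomem_ent e = { .valid = false };

	if (sscanf(line, "%" SCNx64 "-%" SCNx64 " : %63[^\n]",
	    &e.start, &e.end, e.name) == 3 && e.end >= e.start)
		e.valid = true;
	return (e);
}
```

A caller would feed each line of the file through this and keep the ranges
whose name marks them as usable or reserved.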
Sponsored by: Netflix
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D38309
Create segment handling code up to the top level. Move it all into
seg.c, and make necessary adjustments for it being in a new file,
including inventing print_avail() and first_avail() to print the array
and find the first large enough memory hole. aarch64 will use this,
and I'll refactor the other platforms to use it as I make them work.
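
To make the "first large enough memory hole" idea concrete, here is a
minimal sketch of such a search. The struct layout, the enum, the
exclusive end address, the power-of-two alignment parameter and the name
first_avail_sketch() are all assumptions for illustration; the real
first_avail() in seg.c may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment types and layout, for illustration only. */
enum seg_type { SEG_AVAIL, SEG_RESERVED };

struct mem_seg {
	uint64_t	start;	/* first byte of the segment */
	uint64_t	end;	/* one past the last byte (assumed exclusive) */
	enum seg_type	type;
};

/*
 * Return the first aligned address inside an available segment that has
 * at least min_size bytes after it, or 0 if no segment qualifies.
 * align must be a power of two.
 */
static uint64_t
first_avail_sketch(const struct mem_seg *segs, size_t nsegs, uint64_t align,
    uint64_t min_size)
{
	for (size_t i = 0; i < nsegs; i++) {
		uint64_t s;

		if (segs[i].type != SEG_AVAIL)
			continue;
		/* Round the start up to the requested alignment. */
		s = (segs[i].start + align - 1) & ~(align - 1);
		if (s < segs[i].end && segs[i].end - s >= min_size)
			return (s);
	}
	return (0);
}
```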
Sponsored by: Netflix
Discussed with: kevans
Differential Revision: https://reviews.freebsd.org/D38308
enumerate_memory_arch is called once early in kboot's startup to allow
us to discover the memory layout, reserved areas, etc of the system
memory. Add the MI interface part of this.
Sponsored by: Netflix
Reviewed by: tsoome, kevans
Differential Revision: https://reviews.freebsd.org/D38247
Guess where to boot from when bootdev= isn't on the command line or
other config. Search all the disks and partitions for one that looks
like it could be a boot partition (same as we do when probing
zpools). Return the first one we find.
Sponsored by: Netflix
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D38319
Turns out that the loadaddr interface is not sufficiently expressive to
do the loading we need to do. Instead, we'll emulate some of its
features with inline math in copyin/copyout.
Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38260
When converting from a Linux error to a FreeBSD errno, assert that the
value passed in is negative, as is Linux's custom.
Suggested by: brooks
Sponsored by: Netflix
Reviewed by: tsoome, brooks
Differential Revision: https://reviews.freebsd.org/D38357
To properly size segments, we have to know how much memory we have in
the system, as well as how much this process can allocate. Due to our
inability to overcommit, we need to know how much memory is
available. commit_limit is the grand total allowed. committed_as is the
current memory used. mem_avail is what Linux tells us is available. Find
these from /proc/meminfo. We'll use them later to allocate the biggest
possible segment sizes, but for now print the raw numbers.
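
A minimal sketch of pulling one field out of /proc/meminfo text follows.
It assumes the file has already been read into a NUL-terminated buffer and
that values are reported in kB, as Linux does for CommitLimit,
Committed_AS and MemAvailable. The function name is hypothetical and this
is not the code from the commit.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look for a line starting with "key:" in a /proc/meminfo style buffer
 * and return its value converted from kB to bytes, or 0 if absent.
 */
static uint64_t
meminfo_field(const char *buf, const char *key)
{
	size_t klen = strlen(key);
	const char *p = buf;

	while (p != NULL && *p != '\0') {
		if (strncmp(p, key, klen) == 0 && p[klen] == ':') {
			/* strtoull skips the padding spaces after the colon. */
			return (strtoull(p + klen + 1, NULL, 10) * 1024);
		}
		/* Advance to the start of the next line, if any. */
		p = strchr(p, '\n');
		if (p != NULL)
			p++;
	}
	return (0);
}
```

With the three values in hand, the headroom for new allocations is roughly
the commit limit minus what is already committed.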
Sponsored by: Netflix
Reviewed by: kevans (earlier version)
Differential Revision: https://reviews.freebsd.org/D38267
Translate the Linux error return from read to a FreeBSD errno. We use a
simplified translation: 1-34 are the same between the systems, so any of
those will be returned directly. All other errno map to EINVAL. This
will suffice for some code that reads /dev/mem in producing the right
diagnostic.
A fully generalized version is much harder. Linux has a number of errno
that don't translate well and has architecture dependent
encodings. Avoid this mess with a simple macro for now. Add comment
explaining why we use the simple method we do.
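
The simplified translation described above could look roughly like the
following sketch. It is written as an inline function rather than a macro
for readability, the name is hypothetical, and it assumes the raw system
call return value is passed in, so errors arrive as negative numbers.

```c
#include <errno.h>

/*
 * Sketch only: map a negative Linux error return to an errno value.
 * Values 1-34 are passed through unchanged; everything else becomes
 * EINVAL.
 */
static inline int
host_to_errno_sketch(long e)
{
	if (e <= -1 && e >= -34)
		return ((int)-e);
	return (EINVAL);
}
```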
Sponsored by: Netflix
Reviewed by: kevans, andrew
Differential Revision: https://reviews.freebsd.org/D38265
The device name was totally wrong. It should be "/dev/mumble:" not just
"mumble".
Sponsored by: Netflix
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D38318
We only need 64MB to read off ZFS pools. Since Linux doesn't do
overcommit by default, the extra 64MB is 64MB less we can allocate for
things like RAM disks.
Sponsored by: Netflix
Reviewed by: kevans, andrew
Differential Revision: https://reviews.freebsd.org/D38268
Use the standard set_currdev instead of the (now very old) copy of
setting currdev and loaddev directly. We do this only when we don't go
find the ZFS pool to boot from.
Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38012
When hostdisk_override is set, all the /dev devices are hidden, and only
the files in that directory are used. This will allow filesystem testing
on FreeBSD without root, for example. Adjust the parse routine to not
require devices start with /dev (plus fix a leak for an error
condition). Add a match routine to allow the device name to be something
like "/home/user/testing/zfsfoo:" instead of strictly in /dev. Note:
since we need to look at all the devices in the system to probe for ZFS
zpools, you can't generally use a full path to get a 'virtual disk' at
this time.
Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38011
Fetch bootdev from the environment variable (so it should be set on the
command line). Default to 'zfs:' which will in the future look for the
first zpool that we can boot from. Prior versions of kboot would set
this from the second argument on the command line.
Fetch hostfs_root from the environment (defaulting to '/'). Prior
versions of kboot would set this from the first arg on the command line.
Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38010
Now that all the pieces are in place, allow kboot to be built with ZFS
support.
Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38009
Add the zfs device and filesystem to config and write the hook we need
to probe zfs since there's not a generic mechanism in place to do that
when ZFS is configured.
Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38008
Add helper function to walk through the disk drives we've found to look
for zpools. main.c will still need to call this because the loader
hasn't implemented a good way to 'taste' drives for zpools and/or GELI
partitions (mostly because there's no generic list of candidate
devices).
Sponsored by: Netflix
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D38007
Keep a list of disks and partitions that we have. Keep track of the
sizes of the media and sector and use that to implement DIOCGMEDIASIZE
and DIOCGSECTORSIZE. Provide a way to look up disks by name.
Sponsored by: Netflix
Reviewed by: kevans (prior version)
Differential Revision: https://reviews.freebsd.org/D38013
ZFS uses a lot of memory. The old minimal allocations won't work when
ZFS support is added. Most environments where this will be used (or will
likely be used) have >> 256MB, so 128MB should be safe everywhere and allow
examination of a fair number of ZFS pools to boot from.
Sponsored by: Netflix
Add the familiar macros for file types for stat's st_mode
member. Prepend HOST_ to the start of these. Make sure all the values
match the linux nolibc and uapi headers. These values are the same as
native values since they appear to be required by POSIX. Define anyway
to allow the reader of the code to know that they are in the 'host (eg
Linux)' namespace rather than the 'loader' namespace.
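
For illustration, the prefixed macros might look like the sketch below.
The octal values are the conventional Linux ones; the HOST_S_IS*() test
macros are an assumption added here to show how the constants are used,
and the exact set defined by the commit may differ.

```c
/* File type bits of st_mode, in the host (Linux) namespace. */
#define	HOST_S_IFMT	0170000	/* mask for the file type bits */
#define	HOST_S_IFIFO	0010000	/* named pipe (fifo) */
#define	HOST_S_IFCHR	0020000	/* character special */
#define	HOST_S_IFDIR	0040000	/* directory */
#define	HOST_S_IFBLK	0060000	/* block special */
#define	HOST_S_IFREG	0100000	/* regular file */
#define	HOST_S_IFLNK	0120000	/* symbolic link */
#define	HOST_S_IFSOCK	0140000	/* socket */

/* Hypothetical convenience tests built on the constants above. */
#define	HOST_S_ISDIR(m)	(((m) & HOST_S_IFMT) == HOST_S_IFDIR)
#define	HOST_S_ISREG(m)	(((m) & HOST_S_IFMT) == HOST_S_IFREG)
#define	HOST_S_ISLNK(m)	(((m) & HOST_S_IFMT) == HOST_S_IFLNK)
```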
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D37967
Linux pre-boot environments will often have a number of pseudo disks
that are small, all smaller than a few MB. 16MB is a good cutoff since
it's big enough to filter these devices, yet small enough to allow a
super-minimal partition through (the smallest I've been able to make
that's useful lately is around 20MB).
Sponsored by: Netflix
`int foo();` means 'a function that takes any number of arguments',
not 'a function that takes no arguments'; that's spelled `int foo(void);`.
Adopt the latter.
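
A small sketch of the preferred spelling, using a hypothetical function
name. Before C23, an empty parameter list declares a function without a
prototype, so the compiler does not check the arguments at call sites;
with (void) it does.

```c
/* Prototype: calls that pass arguments are rejected by the compiler. */
int answer(void);

int
answer(void)
{
	return (42);
}
```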
Sponsored by: Netflix
Linux has /sys/firmware/fdt and /proc/device-tree to publish the dtb for
the system. The former has it all in one file, while the latter breaks
it out. Prefer the former since it's the more modern interface, but
retain both since I don't have a PS3 to test to see if its kernel is new
enough for /sys/firmware or not.
In addition, do the proper fixup.
Sponsored by: Netflix
Do the standard command line parsing... With a small twist to deal with
the quirks of booting via linuxboot to the initrd from the command line
in shell.efi and other observed oddities.
Sponsored by: Netflix
main() of the boot loader is expected to call devinit() early. We do
this at the same time we do it in the EFI loader (except we don't have a
buffer cache here, we don't need to initialize time and we don't have
special efi partition handles to enumerate). This is just after we probe
for the console.
Sponsored by: Netflix
Copy EFI's bootinfo.c and make minor adjustments for kboot's needs. Do
not connect this to the build just yet until other pieces are in place.
Sponsored by: Netflix
These are declared as extern in a number of files (some with the wrong
return type). Centralize this in modinfo.h and remove a few extra stray
declarations as well that are no longer used. No functional change.
Note: I've not tried to cope with the bi_load() functions which are the
same logical thing. These will be handled separately.
Sponsored by: Netflix
Some typedefs are system dependent, so move them into stat_arch.h where
they are used. On amd64, nlinks is a int64_t, while on aarch64 it's an
int (or int32_t).
Sponsored by: Netflix
For the 64-bit platforms, this is a nop. Currently kboot only supports
64-bit platforms, though. If we support 32-bit in the future, this will
become important.
Noticed by: rpokala
Sponsored by: Netflix
Added missing functionality to allow us to boot off of things like
/dev/nvme0n1p2 successfully. And to list all available devices and
partitions with 'lsdev'.
Sponsored by: Netflix
Use the system's firmware memory map to find a good place to put the
kernel that won't stomp on anything else. While this uses ostensibly MI
interfaces to get this data, arm64 doesn't have this, nor does
powerpc64, so place it here.
Sponsored by: Netflix
We can use devparse directly now. No need to invent a kboot_parsedev
that just does what devparse does now that we've refactored.
Sponsored by: Netflix
Most of the files in /sys/ and /proc/ are small with one value. Create
two routines to help us read the file and decode that value.
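
One possible shape for such helpers is sketched below. The names, the
default-value convention and the use of hosted stdio are assumptions made
for illustration; the loader itself would go through its own host system
call wrappers rather than fopen().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read up to len - 1 bytes of a small file into buf and NUL-terminate. */
static bool
read_small_file(const char *path, char *buf, size_t len)
{
	FILE *fp;
	size_t n;

	if (len == 0 || (fp = fopen(path, "r")) == NULL)
		return (false);
	n = fread(buf, 1, len - 1, fp);
	fclose(fp);
	buf[n] = '\0';
	return (n > 0);
}

/* Decode a decimal or 0x-prefixed number; return dflt if none is found. */
static uint64_t
decode_u64(const char *s, uint64_t dflt)
{
	char *end;
	uint64_t v;

	v = strtoull(s, &end, 0);
	return (end != s ? v : dflt);
}

/* Read a one-value file such as those under /sys or /proc. */
static uint64_t
read_file_u64(const char *path, uint64_t dflt)
{
	char buf[64];

	if (!read_small_file(path, buf, sizeof(buf)))
		return (dflt);
	return (decode_u64(buf, dflt));
}
```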
Sponsored by: Netflix
Add hostfs for the Linux environment. We can't use the userboot one
that's kinda similar because the Linux system calls we have in kboot are
not quite POSIX compliant (Linux takes care of providing the POSIX
interface in libc), so we have to cope with a number of quirks in that
area.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D36607
The fixups needed vary somewhat by architecture, so move the FDT fixup
to be per-arch. Rename the fdt_linux_fixup() routine to be
fdt_arch_fixup() and expect all architectures to fix things up as
needed.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D36604
We assume in all the code that a DEVT_DISK uses common/disk.c and/or
common/part.c and we can access a struct disk_devdesc. hostdisk.c
opens raw devices directly, so has no such structures. Define a
kboot-specific DEVT_HOSTDISK and use that instead.
In addition, disk_fmtdev assumes it is working with a struct
disk_devdesc, so write hostdisk_fmtdev as well.
Sponsored by: Netflix
The load address computations are highly architecture specific. There
are generic ways that are augmented by specific constraints of specific
way things work on each architecture. Move the current load segment
computations into a MD routine load_addr.
As part of the move, I'm marking kboot_get_kernel_machine_bits as
unused. This arrived in a prior commit, but never seems to have been
connected, suggesting an incomplete merge at the time, or a path not yet
taken.
Create a stub for amd64 that will be filled in with a later commit.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D36603
Normally in the boot loader, we key off of MACHINE since that specifies
the kernel and the loader is very tuned to each type of MACHINE in
general. In this case, however, we're producing a Linux binary, with
Linux system calls encoded in it. These align better along the
MACHINE_ARCH axis of FreeBSD. For PowerPC the system calls are radically
different for each of our MACHINE_ARCHes, with only powerpc64 and
powerpc64le sharing the same numbers and memory layout. The same was
true about mips when it was in the tree. 32-bit arm uses the same
layout, however, for both armv6 and armv7 ports: that can be easily
shared in the unlikely event we support that in the future.
Sponsored by: Netflix
It is desirable to run kboot as the first program in some LinuxBoot
environments. This is the traditional "pid 1" or "init" program. When
running as pid 1, provide a minimal environment based on what sysvinit,
u-root, initramfs-tools and other like projects do. We mount /dev, /sys,
/proc, make symlinks from /dev/fd to /dev/proc, and create /tmp, /run,
and /var. We also setup stdin/out/err to the console, set the tty
characteristics of same and block the appropriate signals.
This is intended as an environment that never does a fork/exec. If
that's required, the process groups, session leaders and all things
POSIX terminal handlers will need to be added.
Unlike the general purpose linux projects in this area, no attempt is
made to support very old kernels.
When not pid 1, we skip all of the above.
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D36368
All of the archsw fmtdev functions treat DEVT_DISK as a call to
disk_fmtdev. Set all disks' dv_fmtdev to disk_fmtdev so devformat
will return the same thing.
Sponsored by: Netflix
Reviewed by: tsoome (prior version)
Differential Revision: https://reviews.freebsd.org/D35917
Put the console into raw mode on startup. This allows the menus to work
as expected. Boot is now interruptible.
Note: Likely should restore the terminal settings on most exits. It's
not clear the best way to do this, and most shells have an auto stty
sane anyway, so note it for future improvement.
Sponsored by: Netflix
Implement a stripped down termios, obtained from various files in musl
and HOST_ or host_ prepended to most things and a few unavoidable style
tweaks. Only implements the bits of termios we need for the boot loader:
put the terminal into raw mode, restore terminal settings and speed
stuff.
Sponsored by: Netflix
Clients of libsa are expected to implement exit(). The current exit just
loops forever. It is better to really exit: when running as init that
will reboot the system. When not running as init, other programs can
recover (not that we support running as init, but when we do in the
future, this is still the right thing).
Sponsored by: Netflix
Add support for aarch64. exec.c and ldscript are copied from the EFI
version with #ifdefs for the differences. Once complete, I'll refactor
them. host_syscall.S implements a generic system call. tramp.S is a
first attempt to create a trampoline that we can use to jump to the
aarch64 kernel. Add aarch64-specific startup and stat files as well.
exec.c tweaked slightly to avoid bringing in bi_load(), which will come
in later. Includes tweaks to stat due to name differences between names
on different Linux architectures.
Sponsored by: Netflix
conf.c is the same now between powerpc64 and amd64, so move it up to
kboot. Move powerpc file formats defines to ppc64_elf_freebsd.c
Sponsored by: Netflix
This was copied from powerpc/ofw and has never been used. We also don't
care about -DAIM. It's only relevant for in-kernel structures, which we
don't use in this userland program.
Sponsored by: Netflix
Linux 2.4 introduced getdents64. Switch to using it because aarch64
doesn't have getdents as that syscall was obsoleted before that port was
created.
Sponsored by: Netflix
dv_cleanup is specified almost everywhere. Use nullsys instead of NULL
to indicate 'do nothing'. Also, be consistent in trailing commas that
were missing before.
Sponsored by: Netflix
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D35913
Create a wrapper for the mount system call. To ensure a sane early boot
environment and to gather data we need for kexec, we may need to mount
some special filesystems.
Sponsored by: Netflix
Early in boot, we need to create the normal stdin/out/err env for the
boot loader to run in. To do that, we need to open the console and
duplicate the file descriptors which requires dup(2). Implement a
wrapper as host_dup.
Sponsored by: Netflix
Linux's /dev/fd is implemented inside of /proc/self/fd, so we may need
to create a symlink to it early in boot. "/dev/fd" and "/dev/std*" might
not be strictly required for the boot loader, but should be present for
maximum flexibility.
Sponsored by: Netflix
Add host_getpid() so we can know if we're running as init(8) or not. If
we are, we may choose to do early system setup / sanity operations.
Sponsored by: Netflix
Implement stat(2) and fstat(2) in terms of newfstatat and newfstat
system calls respectively (assume we have a compat #define when
there's no newfstat and just a regular fstat and do so for ppc).
Snag struct kstat (the Linux kernel stat(2), et al interface) from musl
and attribute properly.
Sponsored by: Netflix
Add the common O_ constants for the open, fcntl, etc system calls. They
are different than FreeBSD's. While they can differ based on
architecture, they are constant for architectures we care about, and
those architectures use the 'generic' version so future architectures
will also work.
Sponsored by: Netflix
Fall back to currdev when NULL is passed in when 'rootdev' is NULL. Other
getdevs do this. Additional features are needed here still, though.
Sponsored by: Netflix
Split _start into _start and _start_c (inspired by musl and the powerpc
impl is copied from there). This allows us to actually get the command
line arguments on all the platforms. We have a very simplified startup
that supports only static linking.
Sponsored by: Netflix