cmd/cgo: add implementation comment

R=golang-dev, r, bradfitz, iant
CC=golang-dev
https://golang.org/cl/7407050
Russ Cox 2013-02-27 20:55:01 -08:00
parent 3b69efb010
commit 062a239046


@@ -134,3 +134,266 @@ See "C? Go? Cgo!" for an introduction to using cgo:
http://golang.org/doc/articles/c_go_cgo.html
*/
package main
/*
Implementation details.

Cgo provides a way for Go programs to call C code linked into the same
address space. This comment explains the operation of cgo.

Cgo reads a set of Go source files and looks for statements saying
import "C". If the import has a doc comment, that comment is
taken as literal C code to be used as a preamble to any C code
generated by cgo. A typical preamble #includes necessary definitions:

	// #include <stdio.h>
	import "C"

For more details about the usage of cgo, see the documentation
comment at the top of this file.

Understanding C

Cgo scans the Go source files that import "C" for uses of that
package, such as C.puts. It collects all such identifiers. The next
step is to determine the kind of each name. In C.xxx the xxx might refer
to a type, a function, a constant, or a global variable. Cgo must
decide which.
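
The scanning step can be illustrated with the standard go/parser and
go/ast packages. This is only a minimal sketch, not cgo's actual code:
the function name collectCRefs and the sample input are hypothetical,
and the sketch does not check that "C" is really imported or that the
identifier C has not been shadowed.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
)

// collectCRefs parses src and returns the sorted, de-duplicated names xxx
// that appear in selector expressions of the form C.xxx.
// Hypothetical helper for illustration; cgo's real scanner does more.
func collectCRefs(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "x.go", src, 0)
	if err != nil {
		return nil, err
	}
	seen := map[string]bool{}
	ast.Inspect(f, func(n ast.Node) bool {
		sel, ok := n.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		// Record only selectors whose left-hand side is the identifier C.
		if id, ok := sel.X.(*ast.Ident); ok && id.Name == "C" {
			seen[sel.Sel.Name] = true
		}
		return true
	})
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)
	return names, nil
}

// example is a made-up input file used only by main below.
const example = `package p

// #include <stdio.h>
import "C"

func hello() {
	C.puts(C.CString("hi"))
	C.fflush(C.stdout)
}
`

func main() {
	names, err := collectCRefs(example)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // prints [CString fflush puts stdout]
}
```

The parser alone cannot tell whether puts is a function or stdout a
variable; that is the question the rest of this section answers.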

The obvious thing for cgo to do is to process the preamble, expanding
#includes and processing the corresponding C code. That would require
a full C parser and type checker that was also aware of any extensions
known to the system compiler (for example, all the GNU C extensions) as
well as the system-specific header locations and system-specific
pre-#defined macros. This is certainly possible to do, but it is an
enormous amount of work.

Cgo takes a different approach. It determines the meaning of C
identifiers not by parsing C code but by feeding carefully constructed
programs into the system C compiler and interpreting the generated
error messages, debug information, and object files. In practice,
parsing these is significantly less work and more robust than parsing
C source.

Cgo first invokes gcc -E -dM on the preamble, in order to find out
about simple #defines for constants and the like. These are recorded
for later use.

Next, cgo needs to identify the kinds for each identifier. For the
identifiers C.foo and C.bar, cgo generates this C program:

	<preamble>
	void __cgo__f__(void) {
	#line 1 "cgo-test"
		foo;
		enum { _cgo_enum_0 = foo };
		bar;
		enum { _cgo_enum_1 = bar };
	}

This program will not compile, but cgo can look at the error messages
to infer the kind of each identifier. The line number given in the
error tells cgo which identifier is involved.

An error like "unexpected type name" or "useless type name in empty
declaration" or "declaration does not declare anything" tells cgo that
the identifier is a type.

An error like "statement with no effect" or "expression result unused"
tells cgo that the identifier is not a type, but not whether it is a
constant, function, or global variable.

An error like "not an integer constant" tells cgo that the identifier
is not a constant. If it is also not a type, it must be a function or
global variable. For now, those can be treated the same.
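
The inference rules above can be sketched as a small classifier over
the compiler's error output. This is a simplified illustration rather
than cgo's implementation: the function classify, the result labels
"type", "const", and "func-or-var", and the sample error text are all
hypothetical, and the sketch assumes diagnostics shaped like
"cgo-test:LINE: ...", with each identifier occupying two consecutive
lines of the probe program.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// classify infers a kind for each name from compiler error output.
// Assumed layout: after #line 1 "cgo-test", name i occupies line 2*i+1
// (the bare expression) and line 2*i+2 (the enum declaration).
// A name with no disqualifying error is reported as "const", so an
// undeclared identifier would be misclassified by this sketch.
func classify(names []string, stderr string) map[string]string {
	isType := make([]bool, len(names))
	notConst := make([]bool, len(names))
	const prefix = "cgo-test:"
	for _, line := range strings.Split(stderr, "\n") {
		if !strings.HasPrefix(line, prefix) {
			continue
		}
		rest := line[len(prefix):]
		colon := strings.Index(rest, ":")
		if colon < 0 {
			continue
		}
		n, err := strconv.Atoi(rest[:colon])
		if err != nil || n < 1 {
			continue
		}
		i := (n - 1) / 2 // index of the identifier this line belongs to
		if i >= len(names) {
			continue
		}
		msg := rest[colon+1:]
		switch {
		case strings.Contains(msg, "unexpected type name"),
			strings.Contains(msg, "useless type name in empty declaration"),
			strings.Contains(msg, "declaration does not declare anything"):
			isType[i] = true
		case strings.Contains(msg, "not an integer constant"):
			notConst[i] = true
		}
	}
	kinds := make(map[string]string, len(names))
	for i, name := range names {
		switch {
		case isType[i]:
			kinds[name] = "type"
		case notConst[i]:
			kinds[name] = "func-or-var"
		default:
			kinds[name] = "const"
		}
	}
	return kinds
}

func main() {
	// Made-up compiler output for the foo/bar probe program.
	stderr := strings.Join([]string{
		"cgo-test:1: error: statement with no effect",
		"cgo-test:3: error: statement with no effect",
		"cgo-test:4: error: enumerator value for '_cgo_enum_1' is not an integer constant",
	}, "\n")
	kinds := classify([]string{"foo", "bar"}, stderr)
	fmt.Println(kinds["foo"], kinds["bar"]) // prints: const func-or-var
}
```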

Next, cgo must learn the details of each type, variable, function, or
constant. It can do this by reading object files. If cgo has decided
that t1 is a type, v2 and v3 are variables or functions, and c4, c5,
and c6 are constants, it generates:

	<preamble>
	typeof(t1) *__cgo__1;
	typeof(v2) *__cgo__2;
	typeof(v3) *__cgo__3;
	typeof(c4) *__cgo__4;
	enum { __cgo_enum__4 = c4 };
	typeof(c5) *__cgo__5;
	enum { __cgo_enum__5 = c5 };
	typeof(c6) *__cgo__6;
	enum { __cgo_enum__6 = c6 };

	long long __cgo_debug_data[] = {
		0, // t1
		0, // v2
		0, // v3
		c4,
		c5,
		c6,
		1
	};

and again invokes the system C compiler, to produce an object file
containing debug information. Cgo parses the DWARF debug information
for __cgo__N to learn the type of each identifier. (The types also
distinguish functions from global variables.) If using a standard gcc,
cgo can parse the DWARF debug information for the __cgo_enum__N to
learn the identifier's value. The LLVM-based gcc on OS X emits
incomplete DWARF information for enums; in that case cgo reads the
constant values from the __cgo_debug_data from the object file's data
segment.
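
The probe program above is mechanical to produce once the kinds are
known. The following sketch generates its text from three lists of
names. The function probeSource and its parameters are hypothetical
names chosen for this illustration; compiling the output and reading
its DWARF data are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// probeSource builds the text of the debug probe program.
// types, vars (variables or functions), and consts hold names that have
// already been classified; preamble is the user's C preamble.
// Hypothetical helper for illustration only.
func probeSource(preamble string, types, vars, consts []string) string {
	var b strings.Builder
	b.WriteString(preamble)
	if preamble != "" && !strings.HasSuffix(preamble, "\n") {
		b.WriteString("\n")
	}
	n := 0
	var data []string // entries of __cgo_debug_data, in declaration order

	// Types, variables, and functions only need a typeof declaration.
	plain := append(append([]string{}, types...), vars...)
	for _, name := range plain {
		n++
		fmt.Fprintf(&b, "typeof(%s) *__cgo__%d;\n", name, n)
		data = append(data, "0, // "+name)
	}
	// Constants also get an enum, so their value lands in the debug info.
	for _, name := range consts {
		n++
		fmt.Fprintf(&b, "typeof(%s) *__cgo__%d;\n", name, n)
		fmt.Fprintf(&b, "enum { __cgo_enum__%d = %s };\n", n, name)
		data = append(data, name+",")
	}
	b.WriteString("long long __cgo_debug_data[] = {\n")
	for _, d := range data {
		b.WriteString("\t" + d + "\n")
	}
	b.WriteString("\t1\n};\n")
	return b.String()
}

func main() {
	fmt.Print(probeSource("#include <stdio.h>\n",
		[]string{"t1"}, []string{"v2", "v3"}, []string{"c4", "c5", "c6"}))
}
```

Numbering every declaration with the same counter keeps __cgo__N,
__cgo_enum__N, and the position in __cgo_debug_data aligned, so a
value read from any of the three can be matched back to its name.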

At this point cgo knows the meaning of each C.xxx well enough to start
the translation process.

Translating Go

[The rest of this comment refers to 6g and 6c, the Go and C compilers
that are part of the amd64 port of the gc Go toolchain. Everything here
applies to another architecture's compilers as well.]

Given the input Go files x.go and y.go, cgo generates these source
files:

	x.cgo1.go       # for 6g
	y.cgo1.go       # for 6g
	_cgo_gotypes.go # for 6g
	_cgo_defun.c    # for 6c
	x.cgo2.c        # for gcc
	y.cgo2.c        # for gcc
	_cgo_export.c   # for gcc
	_cgo_main.c     # for gcc

The file x.cgo1.go is a copy of x.go with the import "C" removed and
references to C.xxx replaced with names like _Cfunc_xxx or _Ctype_xxx.
The definitions of those identifiers, written as Go functions, types,
or variables, are provided in _cgo_gotypes.go.

Here is a _cgo_gotypes.go containing definitions for C.flush (provided
in the preamble) and C.puts (from stdio):

	type _Ctype_char int8
	type _Ctype_int int32
	type _Ctype_void [0]byte

	func _Cfunc_CString(string) *_Ctype_char
	func _Cfunc_flush() _Ctype_void
	func _Cfunc_puts(*_Ctype_char) _Ctype_int

For functions, cgo only writes an external declaration in the Go
output. The implementation is in a combination of C for 6c (meaning
any gc-toolchain compiler) and C for gcc.

The 6c file contains the definitions of the functions. They all have
similar bodies that invoke runtime·cgocall to make a switch from the
Go runtime world to the system C (GCC-based) world.

For example, here is the definition of _Cfunc_puts:

	void _cgo_be59f0f25121_Cfunc_puts(void*);

	void
	·_Cfunc_puts(struct{uint8 x[1];}p)
	{
		runtime·cgocall(_cgo_be59f0f25121_Cfunc_puts, &p);
	}

The hexadecimal number is a hash of cgo's input, chosen to be
deterministic yet unlikely to collide with other uses. The actual
function _cgo_be59f0f25121_Cfunc_puts is implemented in a C source
file compiled by gcc, the file x.cgo2.c:

	void
	_cgo_be59f0f25121_Cfunc_puts(void *v)
	{
		struct {
			char* p0;
			int r;
			char __pad12[4];
		} __attribute__((__packed__)) *a = v;
		a->r = puts((void*)a->p0);
	}

It extracts the arguments from the pointer to _Cfunc_puts's argument
frame, invokes the system C function (in this case, puts), stores the
result in the frame, and returns.

Linking

Once the _cgo_export.c and *.cgo2.c files have been compiled with gcc,
they need to be linked into the final binary, along with the libraries
they might depend on (in the case of puts, stdio). 6l has been
extended to understand basic ELF files, but it does not understand ELF
in the full complexity that modern C libraries embrace, so it cannot
in general generate direct references to the system libraries.

Instead, the build process generates an object file using dynamic
linkage to the desired libraries. The main function is provided by
_cgo_main.c:

	int main() { return 0; }
	void crosscall2(void(*fn)(void*, int), void *a, int c) { }
	void _cgo_allocate(void *a, int c) { }
	void _cgo_panic(void *a, int c) { }

The extra functions here are stubs to satisfy the references in the C
code generated for gcc. The build process links this stub, along with
_cgo_export.c and *.cgo2.c, into a dynamic executable and then lets
cgo examine the executable. Cgo records the list of shared library
references and resolved names and writes them into a new file
_cgo_import.c, which looks like:

	#pragma dynlinker "/lib64/ld-linux-x86-64.so.2"
	#pragma dynimport puts puts#GLIBC_2.2.5 "libc.so.6"
	#pragma dynimport __libc_start_main __libc_start_main#GLIBC_2.2.5 "libc.so.6"
	#pragma dynimport stdout stdout#GLIBC_2.2.5 "libc.so.6"
	#pragma dynimport fflush fflush#GLIBC_2.2.5 "libc.so.6"
	#pragma dynimport _ _ "libpthread.so.0"
	#pragma dynimport _ _ "libc.so.6"
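
To make the shape of these directives concrete, here is a small sketch
that splits the dynimport lines shown above into their three fields.
It is not how 6c or 6l reads them. The type dynImport, its field names
Local, Remote, and Library, and the function parseDynImports are
hypothetical, and the reading of each line as "local name, versioned
remote name, library" is an assumption based on the listing. The
sketch also assumes library names contain no spaces.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// dynImport holds the three fields of one "#pragma dynimport" line.
// The field names are this sketch's own, not taken from the toolchain.
type dynImport struct {
	Local, Remote, Library string
}

// parseDynImports extracts the dynimport pragmas from src and ignores
// every other line, including #pragma dynlinker.
func parseDynImports(src string) []dynImport {
	var out []dynImport
	for _, line := range strings.Split(src, "\n") {
		f := strings.Fields(line)
		if len(f) != 5 || f[0] != "#pragma" || f[1] != "dynimport" {
			continue
		}
		// The library is written as a quoted C string.
		lib, err := strconv.Unquote(f[4])
		if err != nil {
			continue
		}
		out = append(out, dynImport{Local: f[2], Remote: f[3], Library: lib})
	}
	return out
}

func main() {
	src := "#pragma dynlinker \"/lib64/ld-linux-x86-64.so.2\"\n" +
		"#pragma dynimport puts puts#GLIBC_2.2.5 \"libc.so.6\"\n" +
		"#pragma dynimport _ _ \"libpthread.so.0\"\n"
	for _, imp := range parseDynImports(src) {
		fmt.Printf("%s -> %s in %s\n", imp.Local, imp.Remote, imp.Library)
	}
}
```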

In the end, the compiled Go package, which will eventually be
presented to 6l as part of a larger program, contains:

	_go_.6        # 6g-compiled object for _cgo_gotypes.go *.cgo1.go
	_cgo_defun.6  # 6c-compiled object for _cgo_defun.c
	_all.o        # gcc-compiled object for _cgo_export.c, *.cgo2.c
	_cgo_import.6 # 6c-compiled object for _cgo_import.c

The final program will be a dynamic executable, so that 6l can avoid
needing to process arbitrary .o files. It only needs to process the .o
files generated from C files that cgo writes, and those are much more
limited in the ELF or other features that they use.

In essence, the _cgo_import.6 file includes the extra linking
directives that 6l is not sophisticated enough to derive from _all.o
on its own. Similarly, the _all.o uses dynamic references to real
system object code because 6l is not sophisticated enough to process
the real code.

The main benefits of this system are that 6l remains relatively simple
(it does not need to implement a complete ELF and Mach-O linker) and
that gcc is not needed after the package is compiled. For example,
package net uses cgo for access to name resolution functions provided
by libc. Although gcc is needed to compile package net, gcc is not
needed to link programs that import package net.

Runtime

When using cgo, Go must not assume that it owns all details of the
process. In particular it needs to coordinate with C in the use of
threads and thread-local storage. The runtime package, in its own
(6c-compiled) C code, declares a few uninitialized (default bss)
variables:

	bool	runtime·iscgo;
	void	(*libcgo_thread_start)(void*);
	void	(*initcgo)(G*);

Any package using cgo imports "runtime/cgo", which provides
initializations for these variables. It sets iscgo to 1, initcgo to a
gcc-compiled function that can be called early during program startup,
and libcgo_thread_start to a gcc-compiled function that can be used to
create a new thread, in place of the runtime's usual direct system
calls.

*/