From 18c5b488a3b2e218c0e0cf2a7d4820d9da93a554 Mon Sep 17 00:00:00 2001
From: Robert Griesemer
Date: Sun, 2 Mar 2008 20:47:34 -0800
Subject: [PATCH] Go spec starting point.

SVN=111041
---
 doc/go_spec | 1197 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 1197 insertions(+)
 create mode 100644 doc/go_spec

diff --git a/doc/go_spec b/doc/go_spec
new file mode 100644
index 0000000000..b9fc63912e
--- /dev/null
+++ b/doc/go_spec
@@ -0,0 +1,1197 @@
+The Go Annotated Specification
+
+This document supersedes all previous Go spec attempts. The intent is
+to make this a reference for syntax and semantics. It is annotated
+with additional information not strictly belonging in a language
+spec.
+
+
+Recent design decisions
+
+A list of decisions made but for which we haven't incorporated proper
+language into this spec. Keep this section small and the spec
+up-to-date instead.
+
+- multi-dimensional arrays: implementation restriction for now
+
+- no '->', always '.'
+- (*a)[i] can be sugared into: a[i]
+- '.' to select package elements
+
+- arrays are not automatically pointers, we must always say
+  explicitly: "*array T" if we mean a pointer to that array
+- there is no pointer arithmetic in the language
+- there are no unions
+
+- packages: need to pin it all down
+
+- tuple notation: (a, b) = (b, a);
+  generally: need to make this clear
+
+- for now: no (C) 'static' variables inside functions
+
+- exports: we write: 'export a, b, c;' (with a, b, c, etc. a list of
+  exported names, possibly also: structure.field)
+- the ordering of methods in interfaces is not relevant
+- structs must be identical (same decl) to be the same
+  (Ken has different implementation: equivalent declaration is the
+  same; what about methods?)
+
+- new methods can be added to a struct outside the package where the
+  struct is declared (need to think through all implications)
+- array assignment by value
+- do we need a type switch?
+
+- write down scoping rules for statements
+
+- semicolons: where are they needed and where are they not needed.
+  need a simple and consistent rule
+
+- we have: postfix ++ and -- as statements
+
+
+
+Guiding principles
+
+Go is an attempt at a new systems programming language.
+[gri: this needs to be expanded. some keywords below]
+
+- small, concise, crisp
+- procedural
+- strongly typed
+- few, orthogonal, and general concepts
+- avoid repetition of declarations
+- multi-threading support in the language
+- garbage collected
+- containers w/o templates
+- compiler can be written in Go and so can its GC
+- very fast compilation possible (1MLOC/s stretch goal)
+- reasonably efficient (C ballpark)
+- compact, predictable code
+  (local program changes generally have local effects)
+- no macros
+
+
+Syntax
+
+The syntax of Go borrows from the C tradition with respect to
+statements and from the Pascal tradition with respect to declarations.
+Go programs are written using a lean notation with a small set of
+keywords, without filler keywords (such as 'of', 'to', etc.) or other
+gratuitous syntax, and with a slight preference for expressive
+keywords (e.g. 'function') over operators or other syntactic
+mechanisms. Generally, "light" language features (variables, simple
+control flow, etc.) are expressed using a light-weight notation (short
+keywords, little syntax), while "heavy" language features use a more
+heavy-weight notation (longer keywords, more syntax).
+
+[gri: should say something about syntactic alternatives: if a
+syntactic form foreseeably will lead to a style recommendation, try to
+make that the syntactic form instead. For instance, Go structured
+statements always require the {} braces even if there is only a single
+sub-statement. Similar ideas apply elsewhere.]
+
+
+Modularity, identifiers and scopes
+
+A Go program consists of one or more files compiled separately, though
+not independently.
+A single file or compilation unit may make
+individual identifiers visible to other files by marking them as
+exported; there is no "header file". The exported interface of a file
+may be exposed in condensed form (without the corresponding
+implementation) through tools.
+
+A package collects types, constants, functions, and so on into a named
+entity that may be imported to enable its constituents to be used in
+another compilation unit. Each source file is part of exactly one
+package; each package is constructed from one source file.
+
+Within a file, all identifiers are declared explicitly (except for
+general predeclared identifiers such as true and false) and thus for
+each identifier in a file the corresponding declaration can be found
+in that same file (usually before its use, except for the rare case of
+forward declarations). Identifiers may denote program entities that
+are implemented in other files. Nevertheless, such identifiers are
+still declared via an import declaration in the file that is referring
+to them. This explicit declaration requirement ensures that every
+compilation unit can be read by itself.
+
+The scoping of identifiers is uniform: An identifier is visible from
+the point of its declaration to the end of the immediately surrounding
+block, and nested identifiers shadow outer identifiers with the same
+name. All identifiers are in the same namespace; i.e., no two
+identifiers in the same scope may have the same name even if they
+denote different language concepts (for instance, a variable vs
+a function). Uniform scoping rules make Go programs easier to read
+and to understand.
+
+
+Program structure
+
+A compilation unit consists of a package specifier followed by import
+declarations followed by other declarations. There are no statements
+at the top level of a file. [gri: do we have a main function?
+or do we treat all functions uniformly and instead permit a program to be
+started by providing a package name and a "start" function? I like
+the latter because it gives a lot of flexibility and should not be
+hard to implement]. [r: i suggest that we define a symbol, main or
+Main or start or Start, and begin execution in the single exported
+function of that name in the program. the flexibility of having a
+choice of name is unimportant and the corresponding need to define the
+name in order to link or execute adds complexity. by default it
+should be trivial; we could allow a run-time flag to override the
+default for gri's flexibility.]
+
+
+Typing, polymorphism, and object-orientation
+
+Go programs are strongly typed; i.e., each program entity has a static
+type known at compile time. Variables also have a dynamic type, which
+is the type of the value they hold at run-time. Generally, the
+dynamic and the static type of a variable are identical, except for
+variables of interface type. In that case the dynamic type of the
+variable is a pointer to a structure that implements the variable's
+(static) interface type. There may be many different structures
+implementing an interface and thus the dynamic type of such variables
+is generally not known at compile time. Such variables are called
+polymorphic.
+
+Interface types are the mechanism to support an object-oriented
+programming style. Different interface types are independent of each
+other and no explicit hierarchy is required (such as single or
+multiple inheritance explicitly specified through respective type
+declarations). Interface types only define a set of functions that a
+corresponding implementation must provide. Thus interface and
+implementation are strictly separated.
+
+An interface is implemented by associating functions (methods) with
+structures. If a structure implements all methods of an interface, it
+implements that interface and thus can be used where that interface is
+required.
+Unless used through a variable of interface type, methods
+can always be statically bound (they are not "virtual"), and incur no
+runtime overhead compared to an ordinary function.
+
+Go has no explicit notion of classes, sub-classes, or inheritance.
+These concepts are trivially modeled in Go through the use of
+functions, structures, associated methods, and interfaces.
+
+Go has no explicit notion of type parameters or templates. Instead,
+containers (such as stacks, lists, etc.) are implemented through the
+use of abstract data types operating on interface types. [gri: there
+is some automatic boxing, semi-automatic unboxing support for basic
+types].
+
+
+Pointers and garbage collection
+
+Variables may be allocated automatically (when entering the scope of
+the variable) or explicitly on the heap. Pointers are used to refer
+to heap-allocated variables. Pointers may also be used to point to
+any other variable; such a pointer is obtained by "getting the
+address" of that variable. In particular, pointers may point "inside"
+other variables, or to automatic variables (which are usually
+allocated on the stack). Variables are automatically reclaimed when
+they are no longer accessible. There is no pointer arithmetic in Go.
+
+
+Functions
+
+Functions contain declarations and statements. They may be invoked
+recursively. Functions may declare nested functions, and nested
+functions have access to the variables in the surrounding functions;
+they are in fact closures. Functions may be anonymous and appear as
+literals in expressions.
+
+
+Multithreading and channels
+
+[Rob: We need something here]
+
+
+
+
+Notation
+
+The syntax is specified in green productions using Extended
+Backus-Naur Form (EBNF).
+In particular:
+
+''  encloses lexical symbols
+|   separates alternatives
+()  used for grouping
+[]  specifies option (0 or 1 times)
+{}  specifies repetition (0 to n times)
+
+A production may be referred to from various places in this document
+but is usually defined close to its first use. Code examples are
+written in gray. Annotations are in blue, and open issues are in red.
+One goal is to get rid of all red text in this document. [r: done!]
+
+
+Vocabulary and representation
+
+REWRITE THIS: BADLY EXPRESSED
+
+Go program source is a sequence of characters. Each character is a
+Unicode code point encoded in UTF-8.
+
+A Go program is a sequence of symbols satisfying the Go syntax. A
+symbol is a non-empty sequence of characters. Symbols are
+identifiers, numbers, strings, operators, delimiters, and comments.
+White space must not occur within symbols (except in comments, and in
+the case of blanks and tabs in strings). It is ignored unless it
+is essential to separate two consecutive symbols.
+
+White space is composed of blanks, newlines, carriage returns, and
+tabs only.
+
+A character is a Unicode code point. In particular, capital and
+lower-case letters are considered as being distinct. Note that some
+Unicode characters (e.g., the character ä) may be representable in
+two forms, as a single code point, or as two code points. For the
+Unicode standard these two encodings represent the same character, but
+for Go, these two encodings correspond to two different characters.
+
+Source encoding
+
+The input is encoded in UTF-8. In the grammar we use the notation
+
+utf8_char
+
+to refer to an arbitrary Unicode code point encoded in UTF-8.
+
+Digits and Letters
+
+octal_digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' .
+decimal_digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' .
+hex_digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' | 'a' |
+            'A' | 'b' | 'B' | 'c' | 'C' | 'd' | 'D' | 'e' | 'E' | 'f' | 'F' .
+letter = 'A' | 'a' | ... 'Z' | 'z' | '_' .
+
+For now, letters and digits are ASCII. We may expand this to allow
+Unicode definitions of letters and digits.
+
+
+Identifiers
+
+An identifier is a name for a program entity such as a variable, a
+type, a function, etc.
+
+identifier = letter { letter | decimal_digit } .
+
+
+- need to explain scopes, visibility (elsewhere)
+- need to say something about predeclared identifiers, and their
+  (universe) scope (elsewhere)
+
+
+Character and string literals
+
+A RawStringLit is a string literal delimited by back quotes ``; the
+first back quote encountered after the opening back quote terminates
+the string.
+
+RawStringLit = '`' { utf8_char } '`' .
+
+`abc`
+`\n`
+
+Character and string literals are very similar to C except:
+  - Octal character escapes are always 3 digits (\077 not \77)
+  - Hexadecimal character escapes are always 2 digits (\x07 not \x7)
+  - Strings are UTF-8 and represent Unicode
+  - `` strings exist; they do not interpret backslashes
+
+CharLit = '\'' ( UnicodeValue | ByteValue ) '\'' .
+StringLit = RawStringLit | InterpretedStringLit .
+InterpretedStringLit = '"' { UnicodeValue | ByteValue } '"' .
+ByteValue = OctalByteValue | HexByteValue .
+OctalByteValue = '\' octal_digit octal_digit octal_digit .
+HexByteValue = '\' 'x' hex_digit hex_digit .
+UnicodeValue = utf8_char | EscapedCharacter | LittleUValue | BigUValue .
+LittleUValue = '\' 'u' hex_digit hex_digit hex_digit hex_digit .
+BigUValue = '\' 'U' hex_digit hex_digit hex_digit hex_digit
+            hex_digit hex_digit hex_digit hex_digit .
+EscapedCharacter = '\' ( 'a' | 'b' | 'f' | 'n' | 'r' | 't' | 'v' ) .
+
+An OctalByteValue contains three octal digits. A HexByteValue
+contains two hexadecimal digits. (Note: This differs from C but is
+simpler.)
+
+It is erroneous for an OctalByteValue to represent a value larger than 255.
+(By construction, a HexByteValue cannot.)
+
+A UnicodeValue takes one of four forms:
+
+  1.
+     The UTF-8 encoding of a Unicode code point. Since Go source
+     text is in UTF-8, this is the obvious translation from input
+     text into Unicode characters.
+  2. The usual list of C backslash escapes: \n \t etc.
+  3. A `little u' value, such as \u12AB. This represents the Unicode
+     code point with the corresponding hexadecimal value. It always
+     has exactly 4 hexadecimal digits.
+  4. A `big U' value, such as '\U00101234'. This represents the
+     Unicode code point with the corresponding hexadecimal value.
+     It always has exactly 8 hexadecimal digits.
+
+Some values that can be represented this way are illegal because they
+are not valid Unicode code points. These include values above
+0x10FFFF and surrogate halves.
+
+A character literal is a form of unsigned integer constant. Its value
+is that of the Unicode code point represented by the text between the
+quotes.
+
+'a'
+'ä'
+'本'
+'\t'
+'\000'
+'\007'
+'\377'
+'\x07'
+'\xff'
+'\u12e4'
+'\U00101234'
+
+A string literal has type 'string'. Its value is constructed by
+taking the byte values formed by the successive elements of the
+literal. For ByteValues, these are the literal bytes; for
+UnicodeValues, these are the bytes of the UTF-8 encoding of the
+corresponding Unicode code points. Note that "\u00FF" and "\xFF" are
+different strings: the first contains the two-byte UTF-8 expansion of
+the value 255, while the second contains a single byte of value 255.
+The same rules apply to raw string literals, except the contents are
+uninterpreted UTF-8.
+
+""
+"Hello, world!\n"
+"日本語"
+"\u65e5本\U00008a9e"
+"\xff\u00FF"
+
+These examples all represent the same string:
+
+"日本語"                                 // UTF-8 input text
+`日本語`                                 // UTF-8 input text as a raw literal
+"\u65e5\u672c\u8a9e"                    // The explicit Unicode code points
+"\U000065e5\U0000672c\U00008a9e"        // The explicit Unicode code points
+"\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e"  // The explicit UTF-8 bytes
+
+The language does not canonicalize Unicode text or evaluate combining
+forms.
+The text of source code is passed uninterpreted.
+
+If the source code represents a character as two code points, such as
+a combining form involving an accent and a letter, the result will be
+an error if placed in a character literal (it is not a single code
+point), and will appear as two code points if placed in a string
+literal. [This simple strategy may be insufficient in the long run
+but is surely fine for now.]
+
+
+Numeric literals
+
+Integer literals take the usual C form, except for the absence of the
+'U', 'L' etc. suffixes, and represent integer constants. (Character
+literals are also integer constants.) Similarly, floating point
+literals are also C-like, without suffixes and decimal only.
+
+An integer constant represents an abstract integer value of arbitrary
+precision. Only when an integer constant (or arithmetic expression
+formed from integer constants) is assigned to a variable (or other
+l-value) is it required to fit into a particular size - that of the
+type of the variable. In other words, integer constants and arithmetic
+upon them are not subject to overflow; only assignment of integer
+constants (and constant expressions) to an l-value can cause overflow.
+It is an error if the value of the constant or expression cannot be
+represented correctly in the range of the type of the l-value.
+
+Floating point literals also represent an abstract, ideal floating
+point value that is constrained only upon assignment. [r: what do we
+need to say here? trickier because of truncation of fractions.]
+
+IntLit = [ '+' | '-' ] UnsignedIntLit .
+UnsignedIntLit = DecimalIntLit | OctalIntLit | HexIntLit .
+DecimalIntLit = ( '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' )
+                { decimal_digit } .
+OctalIntLit = '0' { octal_digit } .
+HexIntLit = '0' ( 'x' | 'X' ) hex_digit { hex_digit } .
+FloatLit = [ '+' | '-' ] UnsignedFloatLit .
+UnsignedFloatLit = "the usual decimal-only floating point representation".
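The rule that integer constants have arbitrary precision, and are only constrained when assigned, can be illustrated with a minimal sketch. It is written in the syntax of present-day Go, which differs from the notation of this draft, and the names big, small, and constantAsInt8 are made up for the example.

```go
package main

import "fmt"

// big does not fit in any fixed-size integer type, but it is legal as a
// constant because constant arithmetic is exact.
const big = 1 << 100

// small is computed from big entirely at compile time and equals 4.
const small = big >> 98

// constantAsInt8 (hypothetical helper) gives the constant a concrete type.
// Only at this assignment must the value fit into a particular size.
func constantAsInt8() int8 {
	var x int8 = small // fits; assigning big here would be a compile-time error
	return x
}

func main() {
	fmt.Println(constantAsInt8()) // prints 4
}
```

The intermediate value 1 << 100 never has to fit into a machine word; only the final assignment is checked against the range of int8.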
+
+
+
+Compound Literals
+
+THIS SECTION IS WRONG
+Compound literals require some fine tuning. I think we did ok in
+Sawzall but there are some loose ends. I don't like that one cannot
+easily distinguish between an array and a struct. We may need to
+specify a type if these literals appear in expressions, but we don't
+want to specify a type if these literals appear as initializer
+expressions where the variable is already typed. And we don't want to
+do any implicit conversions.
+
+CompoundLit = ArrayLit | FunctionLit | StructureLit | MapLit.
+ArrayLit = '{' [ ExpressionList ] ']'.  // all elems must have "the same" type
+StructureLit = '{' [ ExpressionList ] '}'.
+MapLit = '{' [ PairList ] '}'.
+PairList = Pair { ',' Pair }.
+Pair = Expression ':' Expression.
+
+Literals
+
+Literal = BasicLit | CompoundLit .
+BasicLit = CharLit | StringLit | IntLit | FloatLit .
+
+
+Function Literals
+[THESE ARE CORRECT]
+
+FunctionLit = FunctionType Block.
+
+// Function literal
+func (a, b int, z float) bool { return a*b < int(z); }
+
+// Method literal
+func (p *T) . (a, b int, z float) bool { return a*b < int(z) + p.x; }
+
+
+Operators
+
+- incomplete
+
+
+Delimiters
+
+- incomplete
+
+
+Comments
+
+There are two forms of comments.
+
+The first starts at '//' and ends at a newline.
+
+The second starts at '/*' and ends at the first '*/'. It may cross
+newlines. It does not nest.
+
+Comments are treated like white space.
+
+
+Common productions
+
+IdentifierList = identifier { ',' identifier }.
+ExpressionList = Expression { ',' Expression }.
+
+QualifiedIdent = [ PackageName '.' ] identifier.
+PackageName = identifier.
+
+
+Types
+
+A type specifies the set of values which variables of that type may
+assume, and the operators that are applicable.
+
+Except for variables of interface types, the static type of a variable
+(i.e. the type the variable is declared with) is the same as the
+dynamic type of the variable (i.e. the type of the variable at
+run-time).
+Variables of interface types may hold values of
+different dynamic types, but their dynamic types must be compatible
+with the static interface type. At any given instant during run-time,
+a variable has exactly one dynamic type. A type declaration
+associates an identifier with a type.
+
+Array and struct types are called structured types; all other types
+are called unstructured. A structured type cannot contain itself.
+[gri: this needs to be formulated much more precisely].
+
+Type = TypeName | ArrayType | ChannelType | InterfaceType |
+       FunctionType | MapType | StructType | PointerType .
+TypeName = QualifiedIdent.
+
+
+[gri: To make the type specifications more precise we need to
+introduce some general concepts such as what it means to 'contain'
+another type, to be 'equal' to another type, etc. Furthermore, we are
+imprecise as we sometimes use the word type, sometimes just the type
+name (int), or the structure (array) to denote different things (types
+and variables). We should explain more precisely. Finally, there is
+a difference between equality of types and assignment compatibility -
+or isn't there?]
+
+
+Basic types
+
+Go defines a number of basic types which are referred to by their
+predeclared type names.
+There are signed and unsigned integer types,
+and floating point types:
+
+  bool     the truth values true and false
+
+  uint8    the set of all unsigned 8bit integers
+  uint16   the set of all unsigned 16bit integers
+  uint32   the set of all unsigned 32bit integers
+  uint64   the set of all unsigned 64bit integers
+
+  byte     same as uint8
+
+  int8     the set of all signed 8bit integers, in 2's complement
+  int16    the set of all signed 16bit integers, in 2's complement
+  int32    the set of all signed 32bit integers, in 2's complement
+  int64    the set of all signed 64bit integers, in 2's complement
+
+  float32  the set of all valid IEEE-754 32bit floating point numbers
+  float64  the set of all valid IEEE-754 64bit floating point numbers
+  float80  the set of all valid IEEE-754 80bit floating point numbers
+
+  double   same as float64
+
+Additionally, Go declares 3 basic types, uint, int, and float, which
+are platform-specific. The bit width of these types corresponds to
+the "natural bit width" for the respective types for the given
+platform (e.g. int is usually the same as int32 on a 32bit
+architecture, or int64 on a 64bit architecture). These types are by
+definition platform-specific and should be used with the appropriate
+caution.
+
+[gri: do we specify minimal sizes for uint, int, float? e.g. int is
+at least int32?] [gri: do we say something about the correspondence of
+sizeof(*T) and sizeof(int)? Are they the same?] [r: do we want
+int128 and uint128?.]
+
+
+Built-in types
+
+Besides the basic types there is a set of built-in types: string, and chan,
+with maybe more to follow.
+
+
+Type string
+
+The string type represents the set of string values (strings).
+A string behaves like an array of bytes, with the following properties:
+
+- They are immutable: after creation, it is not possible to change the
+  contents of a string
+- No internal pointers: it is illegal to create a pointer to an inner
+  element of a string
+- They can be indexed: given string s1, s1[i] is a byte value
+- They can be concatenated: given strings s1 and s2, s1 + s2 is a value
+  combining the elements of s1 and s2 in sequence
+- Known length: the length of a string s1 can be obtained by the function/
+  operator len(s1). [r: is it a builtin? do we make it a method? etc. this is
+  a placeholder]. The length of a string is the number of bytes within.
+  Unlike in C, there is no terminal NUL byte.
+- Creation 1: a string can be created from an integer value by a conversion
+  string('x') yields "x"
+- Creation 2: a string can be created from an array of integer values (maybe
+  just array of bytes) by a conversion
+  a [3]byte; a[0] = 'a'; a[1] = 'b'; a[2] = 'c'; string(a) == "abc";
+
+The language has string literals as discussed above. The type of a string
+literal is 'string'.
+
+
+Array types
+
+An array is a structured type consisting of a number of elements which
+are all of the same type, called the element type. The number of
+elements of an array is called its length. The elements of an array
+are designated by indices which are integers between 0 and the
+length - 1.
+
+THIS SECTION NEEDS WORK REGARDING STATIC AND DYNAMIC ARRAYS
+
+An array type specifies a set of arrays with a given element type and
+an optional array length. The array length must be a (compile-time)
+constant expression, if present. Arrays without length specification
+are called open arrays. An open array must not contain other open
+arrays, and open arrays can only be used as parameter types or in a
+pointer type (for instance, a struct may not contain an open array
+field, but only a pointer to an open array).
+
+[gri: Need to define when array types are the same!
+Also need to define assignment compatibility]
+[gri: Need to define a mechanism to
+get to the length of an array at run-time. This could be a
+predeclared function 'length' (which may be problematic due to the
+name). Alternatively, we could define an interface for array types
+and say that there is a 'length()' method. So we would write
+a.length() which I think is pretty clean.]. [r: if array types have
+an interface and a string is an array, some stuff (but not enough)
+falls out nicely.]
+
+ArrayType = 'array' { '[' ArrayLength ']' } ElementType.
+ArrayLength = Expression.
+ElementType = Type.
+
+The notation
+
+  array [n][m] T
+
+is a syntactic shortcut for
+
+  array [n] array [m] T.
+
+(the shortcut may be applied recursively).
+
+array uint8
+array [64] struct { x, y int32; }
+array [1000][1000] float64
+
+
+Channel types
+
+
+ChannelType = 'channel' '(' Type '<-' Type ')' .
+
+channel(int <- float)
+
+- incomplete
+
+
+Pointer types
+
+- TODO: Need some intro here.
+
+Two pointer types are the same if they are pointing to variables of
+the same type.
+
+PointerType = '*' Type.
+
+- We do not allow pointer arithmetic of any kind.
+
+Interface types
+
+- TBD: This needs to be much more precise. For now we understand what it means.
+
+An interface type specifies a set of methods, the "method interface"
+of structs. No two methods in one interface can have the same name.
+
+Two interfaces are the same if their set of functions is the same,
+i.e., if all methods exist in both interfaces and if the function
+names and signatures are the same. The order of declaration of
+methods in an interface is irrelevant.
+
+A set of interface types implicitly creates an unconnected, ordered
+lattice of types. An interface type T1 is said to be smaller than or
+equal to an interface type T2 (T1 <= T2) if the entire interface of
+T1 "is part" of T2. Thus, two interface types T1, T2 are the same if
+T1 <= T2, and T2 <= T1, and thus we can write T1 == T2.
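The ordering T1 <= T2 described above can be sketched as follows. This is only an illustration in present-day Go syntax, not the notation of this draft; Reader, ReadCloser, file, and narrow are hypothetical names chosen for the example.

```go
package main

import "fmt"

// Reader plays the role of T1: its entire interface "is part" of ReadCloser.
type Reader interface {
	Read() string
}

// ReadCloser plays the role of T2: it has every method of Reader plus Close.
type ReadCloser interface {
	Read() string
	Close() string
}

// file is a hypothetical struct implementing both interfaces.
type file struct{ name string }

func (f *file) Read() string  { return "read " + f.name }
func (f *file) Close() string { return "close " + f.name }

// narrow converts a value of the larger interface to the smaller one.
// This is accepted because Reader <= ReadCloser; the reverse direction
// would not compile without an explicit run-time check.
func narrow(rc ReadCloser) Reader {
	return rc
}

func main() {
	r := narrow(&file{name: "a.txt"})
	fmt.Println(r.Read()) // prints "read a.txt"
}
```

Because the order of method declarations is irrelevant, an interface listing Close before Read would be the same type as ReadCloser.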
+
+
+InterfaceType = 'interface' '{' { MethodDecl } '}' .
+MethodDecl = identifier Signature ';' .
+
+// An empty interface.
+interface {};
+
+// A basic file interface.
+interface {
+  Read(Buffer) bool;
+  Write(Buffer) bool;
+  Close();
+}
+
+
+Interface pointers can be implemented as "fat pointers"; namely a pair
+(ptr, tdesc) where ptr is simply the pointer to a struct instance
+implementing the interface, and tdesc is the struct's type descriptor.
+Only when crossing the boundary from statically typed structs to
+interfaces and vice versa, does the type descriptor come into play.
+In those places, the compiler statically knows the value of the type
+descriptor.
+
+
+Function types
+
+FunctionType = 'func' Signature .
+Signature = [ Receiver '.' ] Parameters [ Result ] .
+Receiver = '(' identifier Type ')' .
+Parameters = '(' [ ParameterList ] ')' .
+ParameterList = ParameterSection { ',' ParameterSection } .
+ParameterSection = [ IdentifierList ] Type .
+Result = [ Type ] | '(' ParameterList ')' .
+
+// Function types
+func ()
+func (a, b int, z float) bool
+func (a, b int, z float) (success bool)
+func (a, b int, z float) (success bool, result float)
+
+// Method types
+func (p *T) . ()
+func (p *T) . (a, b int, z float) bool
+func (p *T) . (a, b int, z float) (success bool)
+func (p *T) . (a, b int, z float) (success bool, result float)
+
+
+Map types
+
+MapType = 'map' '(' Type '<-' Type ')'.
+
+map(int <- string)
+
+- incomplete
+
+
+Struct types
+
+Struct types are similar to C structs.
+
+NEED TO DEFINE STRUCT EQUIVALENCE
+Two struct types are the same if and
+only if they are declared by the same struct type; i.e., struct types
+are compared via equivalence, and *not* structurally. For that
+reason, struct types are usually given a type name so that it is
+possible to refer to the same struct in different places in a program.
+What about equivalence of structs w/ respect to methods? What if
+methods can be added in another package? TBD.
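As a hedged illustration of declaration-based (rather than structural) equivalence: in present-day Go, two separately declared named struct types are distinct even when their fields match. Whether that corresponds exactly to what this draft intends is left open above. Celsius, Fahrenheit, and sameType are hypothetical names, and the check uses the standard reflect package.

```go
package main

import (
	"fmt"
	"reflect"
)

// Celsius and Fahrenheit have the same field layout but come from
// different declarations, so they are different types.
type Celsius struct{ Degrees float64 }
type Fahrenheit struct{ Degrees float64 }

// sameType (hypothetical helper) reports whether two values have the
// identical dynamic type.
func sameType(a, b interface{}) bool {
	return reflect.TypeOf(a) == reflect.TypeOf(b)
}

func main() {
	fmt.Println(sameType(Celsius{}, Celsius{}))    // prints true
	fmt.Println(sameType(Celsius{}, Fahrenheit{})) // prints false
}
```

Giving the struct a type name is what makes it possible to refer to the same struct type in different places of a program.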
+
+Each field of a struct represents a variable within the data
+structure. In particular, a function field represents a function
+variable, not a method.
+
+StructType = 'struct' '{' { FieldDecl } '}' .
+FieldDecl = IdentifierList Type ';' .
+
+// An empty struct.
+struct {}
+
+// A struct with 5 fields.
+struct {
+  x, y int;
+  u float;
+  a []int;
+  f func();
+}
+
+
+
+Note that a program which never uses interface types can be fully
+statically typed. That is, the "usual" implementation of structs (or
+classes as they are called in other languages) having an extra type
+descriptor prepended in front of every single struct is not required.
+Only when a pointer to a struct is assigned to an interface variable
+does the type descriptor come into play, and at that point it is
+statically known at compile-time!
+
+Package specifiers
+
+Every source file belongs to a package and names that package in its
+first element, which must be a package specifier:
+
+PackageSpecifier = 'package' PackageName .
+
+package Math
+
+
+Package import declarations
+
+A program can access exported items from another package. It does so
+by in effect declaring a local name providing access to the package,
+and then using the local name as a namespace with which to address the
+elements of the package.
+
+ImportDecl = 'import' PackageName FileName .
+FileName = DoubleQuotedString .
+DoubleQuotedString = '"' TEXT '"' .
+
+(DoubleQuotedString should be replaced by the correct string literal production!)
+Package import declarations must be the first statements in a file
+after the package specifier.
+
+A package import associates an identifier with a package, named by a
+file.
+In effect, it is a declaration:
+
+import Math "lib/Math";
+import library "my/library";
+
+After such an import, one can use the identifier (e.g. Math) to access
+elements within it:
+
+x float = Math.sin(y);
+
+Note that this process derives nothing explicit about the type of the
+`imported' function (here Math.sin()). The import must execute to
+provide this information to the compiler (or the programmer, for that
+matter).
+
+An angled-string refers to official stuff in a public place, in effect
+the run-time library. A double-quoted-string refers to arbitrary
+code; it is probably a local file name that needs to be discovered
+using rules outside the scope of the language spec.
+
+The file name in a package import must be complete except for a suffix.
+Moreover, the package name must correspond to the (basename of the)
+source file name. For instance, the implementation of package Bar
+must be in file Bar.go, and if it lives in directory foo we write
+
+import Bar "foo/Bar";
+
+to import it.
+
+[This is a little redundant but if we allow multiple files per package
+it will seem less so, and in any case the redundancy is useful and
+protective.]
+
+We assume Unix syntax for file names: / separators, no suffix for
+directories. If the language is ported to other systems, the
+environment must simulate these properties to avoid changing the
+source code.
+
+
+Declarations
+
+- This needs to be expanded.
+- We need to think about enums (or some alternative mechanism).
+
+Declaration = (ConstDecl | VarDecl | TypeDecl | FunctionDecl |
+               ForwardDecl | AliasDecl) .
+
+
+Const declarations
+
+ConstDecl = 'const' ( ConstSpec | '(' ConstSpecList [ ';' ] ')' ).
+ConstSpec = identifier [ Type ] '=' Expression .
+ConstSpecList = ConstSpec { ';' ConstSpec }.
+
+const pi float = 3.14159265
+const e = 2.718281828
+const (
+  one int = 1;
+  two = 2
+)
+
+
+Variable declarations
+
+VarDecl = 'var' ( VarSpec | '(' VarSpecList [ ';' ] ')' ) | ShortVarDecl .
+VarSpec = IdentifierList ( Type [ '=' ExpressionList ] | '=' ExpressionList ) .
+VarSpecList = VarSpec { ';' VarSpec } .
+ShortVarDecl = identifier ':=' Expression .
+
+var i int
+var u, v, w float
+var k = 0
+var x, y float = -1.0, -2.0
+var (
+  i int;
+  u, v = 2.0, 3.0
+)
+
+If the expression list is present, it must have the same number of elements
+as there are variables in the variable specification.
+
+[ TODO: why is x := 0 not legal at the global level? ]
+
+
+Type declarations
+
+TypeDecl = 'type' ( TypeSpec | '(' TypeSpecList [ ';' ] ')' ).
+TypeSpec = identifier Type .
+TypeSpecList = TypeSpec { ';' TypeSpec }.
+
+
+type IntArray [16] int
+type (
+  Point struct { x, y float };
+  Polar Point
+)
+
+
+Function and method declarations
+
+FunctionDecl = 'func' [ Receiver ] identifier Parameters [ Result ] ( ';' | Block ) .
+Block = '{' { Statement } '}' .
+
+
+func min(x int, y int) int {
+  if x < y {
+    return x;
+  }
+  return y;
+}
+
+func foo (a, b int, z float) bool {
+  return a*b < int(z);
+}
+
+
+A method is a function that also declares a receiver. The receiver is
+a struct with which the function is associated. The receiver type
+must denote a pointer to a struct.
+
+func (p *T) foo (a, b int, z float) bool {
+  return a*b < int(z) + p.x;
+}
+
+func (p *Point) Length() float {
+  return Math.sqrt(p.x * p.x + p.y * p.y);
+}
+
+func (p *Point) Scale(factor float) {
+  p.x = p.x * factor;
+  p.y = p.y * factor;
+}
+
+The last two examples are methods of struct type Point. The variable p is
+the receiver; within the body of the method it represents the value of
+the receiving struct.
+
+Note that methods are declared outside the body of the corresponding
+struct.
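For comparison, the Point methods above can be written as a complete, runnable program. This is a sketch in present-day Go syntax, which is an assumption on top of this draft: it uses float64 and the standard math package in place of the draft's float and Math.sqrt.

```go
package main

import (
	"fmt"
	"math"
)

// Point is the struct with which the methods below are associated.
type Point struct{ x, y float64 }

// Length is declared outside the struct body; p is the receiver and
// must be a pointer to the struct.
func (p *Point) Length() float64 {
	return math.Sqrt(p.x*p.x + p.y*p.y)
}

// Scale modifies the receiving struct through the pointer receiver.
func (p *Point) Scale(factor float64) {
	p.x = p.x * factor
	p.y = p.y * factor
}

func main() {
	p := &Point{3, 4}
	p.Scale(2)              // p is now (6, 8)
	fmt.Println(p.Length()) // prints 10
}
```

Since the receiver is a pointer, Scale changes the caller's Point rather than a copy.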
+ +Functions and methods can be forward declared by omitting the body: + +func foo (a, b int, z float) bool; +func (p *T) foo (a, b int, z float) bool; + + + +Statements + +Statement = EmptyStat | Assignment | CompoundStat | Declaration | + ExpressionStat | IncDecStat | IfStat | WhileStat | ReturnStat . + + +Empty statements + +EmptyStat = ';' . + + +Assignments + +Assignment = Designator '=' Expression . + +- no automatic conversions +- values can be assigned to variables if they are of the same type, or +if they satisfy the interface type (much more precision needed here!) + + + +Compound statements + +CompoundStat = '{' { Statement } '}' . + + +Expression statements + +ExpressionStat = Expression . + + +IncDec statements + +IncDecStat = Expression ( '++' | '--' ) . + + + + +If statements + +IfStat = 'if' ( [ Expression ] '{' { IfCaseList } '}' ) | + ( Expression '{' { Statement } '}' [ 'else' { Statement } ] ). +IfCaseList = ( 'case' ExpressionList | 'default' ) ':' { Statement } . + +if x < y { + return x; +} else { + return y; +} + +if tag { +case 0, 1: s1(); +case 2: s2(); +default: ; +} + +if { +case x < y: f1(); +case x < z: f2(); +} + + +While statements + +WhileStat = 'while' ( [ Expression ] '{' { WhileCaseList } '}' ) | + ( Expression '{' { Statement } '}' ). +WhileCaseList = 'case' ExpressionList ':' { Statement } . + +while { +case i < n: f1(); +case i < m: f2(); +} + + +Return statements + +ReturnStat = 'return' [ ExpressionList ] . + +There are two ways to return values from a function. 
The first is to +explicitly list the return value or values in the return statement: + +func simple_f () int { + return 2; +} + +func complex_f1() (re float, im float) { + return -7.0, -4.0; +} + +The second is to provide names for the return values and assign them +explicitly in the function; the return statement will then provide no +values: + +func complex_f2() (re float, im float) { + re = 7.0; + im = 4.0; + return; +} + +It is legal to name the return values in the declaration even if the +first form of return statement is used: + + +func complex_f3() (re float, im float) { + return 7.0, 4.0; +} + + +Expressions + +Expression = Conjunction { '||' Conjunction }. +Conjunction = Comparison { '&&' Comparison }. +Comparison = SimpleExpr [ relation SimpleExpr ]. +relation = '==' | '!=' | '<' | '<=' | '>' | '>='. +SimpleExpr = Term { add_op Term }. +add_op = '+' | '-' | '|' | '^'. +Term = Factor { mul_op Factor }. +mul_op = '*' | '/' | '%' | '<<' | '>>' | '&'. + +The corresponding precedence hierarchy is as follows: (5 levels of +precedence is about the maximum people can keep comfortably in their +heads. The experience with C and C++ shows that more than that +usually requires explicit manual consultation...). [gri: I still +think we should consider 0 levels of binary precedence: All operators +are on the same level, but parentheses are required when different +operators are mixed. That would make it really easy, and really +clear. It would also open the door for straight-forward introduction +of user-defined operators, which would be rather useful.] + +Precedence Operator + 1 || + 2 && + 3 == != < <= > >= + 4 + - | ^ + 5 * / % << >> & + + +For integer values, / and % satisfy the following relationship: + + (a / b) * b + a % b == a + +and + + (a / b) is "truncated towards zero". + +The shift operators implement arithmetic shifts for signed integers, +and logical shifts for unsigned integers. TBD: is there any range +checking on s in x >> s, or x << s ? 
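The relationship between / and % and the two kinds of shift can be checked with a small sketch. It is written in the syntax of released Go, not the notation of this draft, and the helper names (quoRem, divModHolds, shiftSigned, shiftUnsigned) and the sized types int8 and uint8 are hypothetical choices made for illustration only. It does not address the open question about range checking on the shift count.

```go
package main

import "fmt"

// quoRem returns a / b and a % b for signed integers.
func quoRem(a, b int) (int, int) {
	return a / b, a % b
}

// divModHolds reports whether (a / b) * b + a % b == a.
func divModHolds(a, b int) bool {
	q, r := quoRem(a, b)
	return q*b+r == a
}

// shiftSigned shifts a signed value right; the sign bit is replicated
// (arithmetic shift).
func shiftSigned(x int8, s uint) int8 {
	return x >> s
}

// shiftUnsigned shifts an unsigned value right; zeros are shifted in
// (logical shift).
func shiftUnsigned(x uint8, s uint) uint8 {
	return x >> s
}

func main() {
	for _, a := range []int{7, -7} {
		for _, b := range []int{2, -2} {
			q, r := quoRem(a, b)
			fmt.Println(a, b, q, r, divModHolds(a, b))
		}
	}
	// -8 as int8 and 0xF8 as uint8 share the same bit pattern.
	fmt.Println(shiftSigned(-8, 1), shiftUnsigned(0xF8, 1))
}
```

With truncation towards zero, the remainder takes the sign of the dividend, which is what keeps the identity valid for negative operands.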
+ +[gri: We decided on a couple of issues here that we need to write down +more nicely] + +- There are no implicit type conversions except for +constants/literals. In particular, unsigned and signed integers +cannot be mixed in an expression w/o explicit casting. + +- Unary '^' corresponds to C '~' (bitwise negate). + +- Arrays can be subscripted (a[i]) or sliced (a[i : j]). A slice a[i +: j] is a new array of length (j - i), and consisting of the elements +a[i], a[i + 1], ... a[j - 1]. [gri/r: Is the slice array bounds +check hard (leading to an error), or soft (truncating) ?]. +Furthermore: Array slicing is very tricky! Do we get a copy (a new +array) or a new array descriptor? This is open at this point. There +is a simple way out of the mess: Structured types are always passed by +reference, and there is no value assignment for structured types. It +gets very complicated very quickly. + +[gri: Syntax below is incomplete - what about method invocation?] + +Factor = Literal | Designator | '!' Expression | '-' Expression | + '^' Expression | '&' Expression | '(' Expression ')' | Call. +Designator = QualifiedIdent { Selector }. +Selector = '.' identifier | '[' Expression [ ':' Expression ] ']'. +Call = Factor '(' ExpressionList ')'. + +[gri: We need a precise definition of a constant expression] + + + + +Compilation units + +The unit of compilation is a single file. A compilation unit consists +of a package specifier followed by a list of import declarations +followed by a list of global declarations. + +CompilationUnit = { ImportDecl } { GlobalDeclaration }. +GlobalDeclaration = Declaration. + + +Exports + +Globally declared identifiers may be exported, thus making the +exported identifier visible outside the package. Another package may +then import the identifier to use it. + +Export directives must only appear at the global level of a +compilation unit (at least for now). 
That is, one can export +compilation-unit global identifiers but not, for example, local +variables or structure fields. + +Exporting an identifier makes the identifier visible externally to the +package. If the identifier represents a type, the type structure is +exported as well. The exported identifiers may appear later in the +source than the export directive itself, but it is an error to specify +an identifier not declared anywhere in the source file containing the +export directive. + +ExportDirective = 'export' ExportIdentifier { ',' ExportIdentifier } . +ExportIdentifier = identifier . + +export sin, cos; + +One may export variables and types, but (at least for now), not +aliases. [r: what is needed to make aliases exportable? issue is +transitivity.] + +Exporting a variable does not automatically export the type of the +variable. For illustration, consider the program fragment: + +package P; +export v1, v2, p; +type S struct { a int; b int; } +var v1 S; +var v2 S; +var p *S; + +Notice that S is not exported. Another source file may contain: + +import P; +alias v1 P.v1; +alias v2 P.v2; +alias p P.p; + +This program can use v1, v2 and p but cannot access the fields (a and b) of +structure type S explicitly. For instance, it could legally contain + +if p == nil { } +if v1 == v2 { } + +but not + +if v1.a == 0 { } + + +