knowledge/technology/tools/JSON Schema.md

20 KiB

obj website
concept https://json-schema.org

JSON Schema

JSON Schema is a schema specification for JSON, which can validate a JSON Document to be of a specific format. Some common schemas can be found here.

type Keyword

The type keyword is used to restrict the types of a property

{ "type": "string" }

Every type has specific keywords that can be used:

string

Length

The length of a string can be constrained:

{
	"type": "string",
	"minLength": 2,
	"maxLength": 3
}

Regex

The pattern keyword is used to restrict a string to a particular regular expression

{
  "type": "string",
  "pattern": "^(\\([0-9]{3}\\))?[0-9]{3}-[0-9]{4}$"
}

Format

The format keyword allows for basic semantic identification of certain kinds of string values that are commonly used. For example, because JSON doesn't have a "DateTime" type, dates need to be encoded as strings. format allows the schema author to indicate that the string value should be interpreted as a date. By default, format is just an annotation and does not effect validation.

Optionally, validator implementations can provide a configuration option to enable format to function as an assertion rather than just an annotation. That means that validation will fail if, for example, a value with a date format isn't in a form that can be parsed as a date. This can allow values to be constrained beyond what the other tools in JSON Schema, including Regular Expressions can do.

Built-in formats

The following is the list of formats specified in the JSON Schema specification.

Dates and times

Dates and times are represented in RFC 3339, section 5.6. This is a subset of the date format also commonly known as ISO8601 format.

  • "date-time": Date and time together, for example, 2018-11-13T20:20:39+00:00.
  • "time": Time, for example, 20:20:39+00:00
  • "date": Date, for example, 2018-11-13.
  • "duration": A duration as defined by the ISO 8601 ABNF for "duration". For example, P3D expresses a duration of 3 days.
Email addresses
Hostnames
IP Addresses
Resource identifiers
  • "uuid": A Universally Unique Identifier as defined by RFC 4122. Example: 3e4666bf-d5e5-4aa7-b8ce-cefe41c7568a
  • "uri": A universal resource identifier (URI), according to RFC3986.
  • "uri-reference": A URI Reference (either a URI or a relative-reference), according to RFC3986, section 4.1.
  • "iri": The internationalized equivalent of a "uri", according to RFC3987.
  • "iri-reference": New in draft 7The internationalized equivalent of a "uri-reference", according to RFC3987

If the values in the schema have the ability to be relative to a particular source path (such as a link from a webpage), it is generally better practice to use "uri-reference" (or "iri-reference") rather than "uri" (or "iri"). "uri" should only be used when the path must be absolute.

URI template
  • "uri-template": A URI Template (of any level) according to RFC6570. If you don't already know what a URI Template is, you probably don't need this value.
JSON Pointer
  • "json-pointer": A JSON Pointer, according to RFC6901. There is more discussion on the use of JSON Pointer within JSON Schema in Structuring a complex schema. Note that this should be used only when the entire string contains only JSON Pointer content, e.g. /foo/bar. JSON Pointer URI fragments, e.g. #/foo/bar/ should use "uri-reference".
  • "relative-json-pointer": A relative JSON pointer.
Regular Expressions

integer

The integer type is used for integral numbers. JSON does not have distinct types for integers and floating-point values. Therefore, the presence or absence of a decimal point is not enough to distinguish between integers and non-integers. For example, 1 and 1.0 are two ways to represent the same value in JSON. JSON Schema considers that value an integer no matter which representation was used.

For differencing float and integer values these types can be used.

{ "type": "integer"}
{ "type": "number" }

Multiples

Numbers can be restricted to a multiple of a given number, using the multipleOf keyword. It may be set to any positive number.

{
	"type": "number",
	"multipleOf" : 10
}

Range

Ranges of numbers are specified using a combination of the minimum and maximum keywords, (or exclusiveMinimum and exclusiveMaximum for expressing exclusive range).

{
  "type": "number",
  "minimum": 0,
  "exclusiveMaximum": 100
}

object

Objects are the mapping type in JSON. They map "keys" to "values". In JSON, the "keys" must always be strings. Each of these pairs is conventionally referred to as a "property".

{ "type": "object" }

Properties

The properties (key-value pairs) on an object are defined using the properties keyword. The value of properties is an object, where each key is the name of a property and each value is a schema used to validate that property. Any property that doesn't match any of the property names in the properties keyword is ignored by this keyword.

{
  "type": "object",
  "properties": {
    "number": { "type": "number" },
    "street_name": { "type": "string" },
    "street_type": { "enum": ["Street", "Avenue", "Boulevard"] }
  }
}

Pattern Properties

Sometimes you want to say that, given a particular kind of property name, the value should match a particular schema. That's where patternProperties comes in: it maps regular expressions to schemas. If a property name matches the given regular expression, the property value must validate against the corresponding schema.

{
  "type": "object",
  "patternProperties": {
    "^S_": { "type": "string" },
    "^I_": { "type": "integer" }
  }
}

Additional Properties

The additionalProperties keyword is used to control the handling of extra stuff, that is, properties whose names are not listed in the properties keyword or match any of the regular expressions in the patternProperties keyword. By default any additional properties are allowed.

The value of the additionalProperties keyword is a schema that will be used to validate any properties in the instance that are not matched by properties or patternProperties. Setting the additionalProperties schema to false means no additional properties will be allowed.

{
  "type": "object",
  "properties": {
    "number": { "type": "number" },
    "street_name": { "type": "string" },
    "street_type": { "enum": ["Street", "Avenue", "Boulevard"] }
  },
  "additionalProperties": false
}

Required Properties

By default, the properties defined by the properties keyword are not required. However, one can provide a list of required properties using the required keyword.

{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "email": { "type": "string" },
    "address": { "type": "string" },
    "telephone": { "type": "string" }
  },
  "required": ["name", "email"]
}

Property names

The names of properties can be validated against a schema, irrespective of their values. This can be useful if you don't want to enforce specific properties, but you want to make sure that the names of those properties follow a specific convention. You might, for example, want to enforce that all names are valid ASCII tokens so they can be used as attributes in a particular programming language.

{
  "type": "object",
  "propertyNames": {
    "pattern": "^[A-Za-z_][A-Za-z0-9_]*$"
  }
}

Size

The number of properties on an object can be restricted using the minProperties and maxProperties keywords. Each of these must be a non-negative integer.

{
  "type": "object",
  "minProperties": 2,
  "maxProperties": 3
}

array

Arrays are used for ordered elements. In JSON, each element in an array may be of a different type.

{ "type": "array" }

Items

List validation is useful for arrays of arbitrary length where each item matches the same schema. For this kind of array, set the items keyword to a single schema that will be used to validate all of the items in the array.

{
  "type": "array",
  "items": {
    "type": "number"
  }
}

Tuple Validation

Tuple validation is useful when the array is a collection of items where each has a different schema and the ordinal index of each item is meaningful.

{
  "type": "array",
  "prefixItems": [
    { "type": "number" },
    { "type": "string" },
    { "enum": ["Street", "Avenue", "Boulevard"] },
    { "enum": ["NW", "NE", "SW", "SE"] }
  ]
}

Additional Items

The items keyword can be used to control whether it's valid to have additional items in a tuple beyond what is defined in prefixItems. The value of the items keyword is a schema that all additional items must pass in order for the keyword to validate.

{
  "type": "array",
  "prefixItems": [
    { "type": "number" },
    { "type": "string" },
    { "enum": ["Street", "Avenue", "Boulevard"] },
    { "enum": ["NW", "NE", "SW", "SE"] }
  ],
  "items": false
}

Unevaluated Items

The unevaluatedItems keyword is useful mainly when you want to add or disallow extra items to an array.

unevaluatedItems applies to any values not evaluated by an items, prefixItems, or contains keyword. Just as unevaluatedProperties affects only properties in an object, unevaluatedItems affects only items in an array.

Watch out! The word "unevaluated" does not mean "not evaluated by items, prefixItems, or contains." "Unevaluated" means "not successfully evaluated", or "does not evaluate to true".

Like with items, if you set unevaluatedItems to false, you can disallow extra items in the array.

{
  "prefixItems": [
    { "type": "string" }, { "type": "number" }
  ],
  "unevaluatedItems": false
}

Contains

While the items schema must be valid for every item in the array, the contains schema only needs to validate against one or more items in the array.

{
  "type": "array",
  "contains": {
    "type": "number"
  }
}

minContains / maxContains

minContains and maxContains can be used with contains to further specify how many times a schema matches a contains constraint. These keywords can be any non-negative number including zero.

{
  "type": "array",
  "contains": {
    "type": "number"
  },
  "minContains": 2,
  "maxContains": 3
}

Length

The length of the array can be specified using the minItems and maxItems keywords. The value of each keyword must be a non-negative number. These keywords work whether doing list validation or tuple-validation.

{
  "type": "array",
  "minItems": 2,
  "maxItems": 3
}

Uniqueness

A schema can ensure that each of the items in an array is unique. Simply set the uniqueItems keyword to true.

{
  "type": "array",
  "uniqueItems": true
}

boolean

The boolean type matches only two special values: true and false. Note that values that evaluate to true or false, such as 1 and 0, are not accepted by the schema.

{ "type": "boolean" }

null

When a schema specifies a type of null, it has only one acceptable value: null.

{ "type": "null" }

Annotations

JSON Schema includes a few keywords, that aren't strictly used for validation, but are used to describe parts of a schema. None of these "annotation" keywords are required, but they are encouraged for good practice, and can make your schema "self-documenting".

The title and description keywords must be strings. A "title" will preferably be short, whereas a "description" will provide a more lengthy explanation about the purpose of the data described by the schema.

The default keyword specifies a default value. This value is not used to fill in missing values during the validation process. Non-validation tools such as documentation generators or form generators may use this value to give hints to users about how to use a value. However, default is typically used to express that if a value is missing, then the value is semantically the same as if the value was present with the default value. The value of default should validate against the schema in which it resides, but that isn't required.

The examples keyword is a place to provide an array of examples that validate against the schema. This isn't used for validation, but may help with explaining the effect and purpose of the schema to a reader. Each entry should validate against the schema in which it resides, but that isn't strictly required. There is no need to duplicate the default value in the examples array, since default will be treated as another example.

The boolean keywords readOnly and writeOnly are typically used in an API context. readOnly indicates that a value should not be modified. It could be used to indicate that a PUT request that changes a value would result in a 400 Bad Request response. writeOnly indicates that a value may be set, but will remain hidden. In could be used to indicate you can set a value with a PUT request, but it would not be included when retrieving that record with a GET request.

The deprecated keyword is a boolean that indicates that the instance value the keyword applies to should not be used and may be removed in the future.

{
  "title": "Match anything",
  "description": "This is a schema that matches anything.",
  "default": "Default value",
  "examples": [
    "Anything",
    4035
  ],
  "deprecated": true,
  "readOnly": true,
  "writeOnly": false
}

Enumerated values

The enum keyword is used to restrict a value to a fixed set of values. It must be an array with at least one element, where each element is unique.

{
  "enum": ["red", "amber", "green"]
}

Constant Values

The const keyword is used to restrict a value to a single value.

{
  "properties": {
    "country": {
      "const": "United States of America"
    }
  }
}

Media: string-encoding non JSON Data

JSON schema has a set of keywords to describe and optionally validate non-JSON data stored inside JSON strings. Since it would be difficult to write validators for many media types, JSON schema validators are not required to validate the contents of JSON strings based on these keywords. However, these keywords are still useful for an application that consumes validated JSON.

contentMediaType

The contentMediaType keyword specifies the MIME type of the contents of a string, as described in RFC 2046. There is a list of MIME types officially registered by the IANA, but the set of types supported will be application and operating system dependent. Mozilla Developer Network also maintains a shorter list of MIME types that are important for the web

contentEncoding

The contentEncoding keyword specifies the encoding used to store the contents, as specified in RFC 2054, part 6.1 and RFC 4648.

The acceptable values are 7bit, 8bit, binary, quoted-printable, base16, base32, and base64. If not specified, the encoding is the same as the containing JSON document.

Without getting into the low-level details of each of these encodings, there are really only two options useful for modern usage:

  • If the content is encoded in the same encoding as the enclosing JSON document (which for practical purposes, is almost always UTF-8), leave contentEncoding unspecified, and include the content in a string as-is. This includes text-based content types, such as text/html or application/xml.
  • If the content is binary data, set contentEncoding to base64 and encode the contents using Base64. This would include many image types, such as image/png or audio types, such as audio/mpeg.
{
  "type": "string",
  "contentMediaType": "text/html"
}
{
  "type": "string",
  "contentEncoding": "base64",
  "contentMediaType": "image/png"
}

Conditional Schemas

dependentRequired

The dependentRequired keyword conditionally requires that certain properties must be present if a given property is present in an object. For example, suppose we have a schema representing a customer. If you have their credit card number, you also want to ensure you have a billing address. If you don't have their credit card number, a billing address would not be required. We represent this dependency of one property on another using the dependentRequired keyword. The value of the dependentRequired keyword is an object. Each entry in the object maps from the name of a property, p, to an array of strings listing properties that are required if p is present.

{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "credit_card": { "type": "number" },
    "billing_address": { "type": "string" }
  },
  "required": ["name"],
  "dependentRequired": {
    "credit_card": ["billing_address"]
  }
}

dependentSchemas

The dependentSchemas keyword conditionally applies a subschema when a given property is present. This schema is applied in the same way allOf applies schemas. Nothing is merged or extended. Both schemas apply independently.

{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "credit_card": { "type": "number" }
  },
  "required": ["name"],
  "dependentSchemas": {
    "credit_card": {
      "properties": {
        "billing_address": { "type": "string" }
      },
      "required": ["billing_address"]
    }
  }
}

If-Then-Else

The if, then and else keywords allow the application of a subschema based on the outcome of another schema, much like the if/then/else constructs you've probably seen in traditional programming languages.

{
  "type": "object",
  "properties": {
    "street_address": {
      "type": "string"
    },
    "country": {
      "default": "United States of America",
      "enum": ["United States of America", "Canada"]
    }
  },
  "if": {
    "properties": { "country": { "const": "United States of America" } }
  },
  "then": {
    "properties": { "postal_code": { "pattern": "[0-9]{5}(-[0-9]{4})?" } }
  },
  "else": {
    "properties": { "postal_code": { "pattern": "[A-Z][0-9][A-Z] [0-9][A-Z][0-9]" } }
  }
}