Canonical JSON
Revision as of 22:28, 14 August 2007
A "canonical JSON" format is defined in order to provide meaningful and repeatable hashes of JSON-encoded data. Canonical JSON can be parsed by any full JSON parser, but security-conscious applications will want to verify that input is in canonical form before authenticating any hash or signature on that input.
The grammar for canonical JSON closely matches the grammar presented at json.org:
object:
    {}
    { members }
members:
    pair
    pair , members
pair:
    string : value
array:
    []
    [ elements ]
elements:
    value
    value , elements
value:
    string
    number
    object
    array
    true
    false
    null
string:
    ""
    " chars "
chars:
    char
    char chars
char:
    any-7-bit-ASCII-character-except-"-or-\-and-control-characters
    \u four-hex-digits-in-lowercase
number:
    int
int:
    digit
    digit1-9 digits
    - digit1-9
    - digit1-9 digits
digits:
    digit
    digit digits
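The grammar is small enough to check with a hand-written recursive-descent recognizer. The Python sketch below shows one possible approach, not a reference implementation. It transcribes the int, string and char productions as regular expressions and also enforces the whitespace, escaping and Normalization Form D rules stated on this page. Several choices are assumptions that this page does not spell out. Any value is accepted at the top level. "Lexicographically sorted" is read as ordering by UTF-16 code units, with duplicate keys rejected. Unpaired surrogates are rejected. The names `is_canonical`, `NotCanonical` and `_Checker` are hypothetical. The sketch requires Python 3.8+ for `unicodedata.is_normalized`.

```python
from __future__ import annotations

import re
import unicodedata
from typing import Callable, NoReturn

# Hypothetical regexes transcribing the lexical productions int, string and char.
INT = re.compile(r"0|-?[1-9][0-9]*")
STRING = re.compile(r'"(?:[\x20\x21\x23-\x5b\x5d-\x7e]|\\u[0-9a-f]{4})*"')
LITERAL = re.compile(r"true|false|null")


class NotCanonical(ValueError):
    """Raised when input deviates from the canonical JSON rules."""


class _Checker:
    def __init__(self, text: str) -> None:
        self.s, self.i = text, 0

    def fail(self, msg: str) -> NoReturn:
        raise NotCanonical(f"{msg} at offset {self.i}")

    def expect(self, ch: str) -> None:
        if self.s[self.i:self.i + 1] != ch:
            self.fail(f"expected {ch!r}")
        self.i += 1

    def token(self, pattern: re.Pattern[str]) -> str:
        m = pattern.match(self.s, self.i)
        if not m:
            self.fail("unexpected token")
        self.i = m.end()
        return m.group()

    def value(self) -> None:
        c = self.s[self.i:self.i + 1]
        if c == "{":
            keys: list[tuple[int, ...]] = []
            self.seq("}", lambda: self.member(keys))
        elif c == "[":
            self.seq("]", self.value)
        elif c == '"':
            self.string()
        elif c in ("t", "f", "n"):
            self.token(LITERAL)
        else:
            self.token(INT)  # the int production; "-0" and stray whitespace fail here

    def seq(self, close: str, item: Callable[[], None]) -> None:
        # Shared by object and array: "{}"/"[]", or items separated by single
        # commas with no whitespace and no trailing comma.
        self.i += 1  # opening bracket, already checked by value()
        if self.s.startswith(close, self.i):
            self.i += 1
            return
        item()
        while self.s.startswith(",", self.i):
            self.i += 1
            item()
        self.expect(close)

    def member(self, keys: list[tuple[int, ...]]) -> None:
        key = self.string()
        # Assumption: keys compare as UTF-16 code-unit sequences and must be
        # strictly increasing, which also rejects duplicate keys.
        if keys and key <= keys[-1]:
            self.fail("object keys not in sorted order")
        keys.append(key)
        self.expect(":")
        self.value()

    def string(self) -> tuple[int, ...]:
        """Match a string token and return its contents as UTF-16 code units."""
        body = self.token(STRING)[1:-1]
        pairs = re.findall(r"\\u([0-9a-f]{4})|(.)", body)
        units = [int(esc, 16) if esc else ord(ch) for esc, ch in pairs]
        escaped = [int(esc, 16) for esc, _ in pairs if esc]
        if any(31 < u < 127 and u not in (34, 92) for u in escaped):
            self.fail("needless \\u escape of printable ASCII")
        try:
            text = b"".join(u.to_bytes(2, "big") for u in units).decode("utf-16-be")
        except UnicodeDecodeError:
            self.fail("unpaired surrogate")
        if not unicodedata.is_normalized("NFD", text):
            self.fail("string is not in Normalization Form D")
        return tuple(units)


def is_canonical(text: str) -> bool:
    checker = _Checker(text)
    try:
        checker.value()
        if checker.i != len(text):
            checker.fail("trailing data")
    except (NotCanonical, RecursionError):
        return False
    return True
```

Rejection happens as early as possible. A float such as `[1.5]`, for example, fails at the `.` because the array expects `,` or `]` after the integer `1`.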
Whitespace is not permitted between tokens. Leading and trailing whitespace is likewise disallowed. The members production in object must consist of keys in lexicographically sorted order. Contents of strings must be in a UTF-16 encoding of Unicode Normalization Form D (UAX #15) with no byte-order mark (since each character or escape sequence encodes one full 16-bit code unit). Control characters are defined as Unicode code point 127 (decimal) and those with Unicode code points less than 32.

The "backslash u" escape form must not be used for any Unicode character with a code point greater than 31 and less than 127 (decimal), except for code points 34 and 92 (decimal), which are, respectively, the quotation mark and backslash characters. Conversely, because the char production admits nothing else, the escape is mandatory for the quotation mark, the backslash, the control characters, and every UTF-16 code unit outside 7-bit ASCII.
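The string rules can be sketched in Python as a single encoding function. The function assumes its input is an ordinary Python `str` and works in four steps. It normalizes to NFD. It splits the text into UTF-16 code units. It writes printable ASCII other than `"` and `\` literally. It escapes everything else as `\u` followed by four lowercase hex digits. The name `encode_string` is hypothetical. A string containing a lone surrogate makes `.encode("utf-16-be")` raise `UnicodeEncodeError`, and the sketch does not try to handle that case.

```python
import unicodedata


def encode_string(s: str) -> str:
    """Encode s as a canonical JSON string token (a sketch of the rules above)."""
    # 1. Normalize to Unicode Normalization Form D.
    nfd = unicodedata.normalize("NFD", s)
    # 2. Split into 16-bit UTF-16 code units; "utf-16-be" writes no byte-order mark.
    raw = nfd.encode("utf-16-be")
    out = ['"']
    for i in range(0, len(raw), 2):
        unit = int.from_bytes(raw[i:i + 2], "big")
        # 3. Printable ASCII (32..126) is written literally, except 34 and 92.
        if 31 < unit < 127 and unit not in (34, 92):
            out.append(chr(unit))
        else:
            # 4. Everything else becomes \u plus four lowercase hex digits.
            out.append(f"\\u{unit:04x}")
    out.append('"')
    return "".join(out)
```

For example, `encode_string("café")` yields the token `"cafe\u0301"`: NFD decomposes the é, and the combining accent, being outside 7-bit ASCII, is escaped. A character beyond the Basic Multilingual Plane comes out as two escapes, one per surrogate.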
Notes
- Floating point numbers are not allowed in canonical JSON. Neither are leading zeros or "minus 0" for integers.
- All map keys must be quoted, and must appear in sorted order.
- All whitespace outside of strings is eliminated.
- Trailing commas in members and elements are not allowed.
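- Only one escape sequence (\u followed by four lowercase hex digits) is defined for strings, and its use is mandatory for the quotation mark, the backslash, control characters, and any character outside 7-bit ASCII.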
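The notes above translate fairly directly into a serializer. The Python sketch below also shows hashing the result. It is written under assumptions this page does not fix. Object keys are sorted by the UTF-16 code units of their NFD form, and keys that collide after normalization are an error. SHA-256 is an arbitrary choice, since the page names no hash function. The verification helper swaps in a different technique from a dedicated grammar check. It parses with the standard `json` module, re-serializes canonically, and accepts the input only if the two texts are identical. That mirrors the advice to verify canonical form before trusting a hash. The functions `canonical_dumps`, `canonical_sha256` and `is_canonical_roundtrip` are hypothetical names.

```python
import hashlib
import json
import unicodedata
from typing import Any


def _encode_string(s: str) -> str:
    # NFD-normalize, then emit one token per UTF-16 code unit: printable ASCII
    # other than " and \ literally, everything else as \u + 4 lowercase hex digits.
    raw = unicodedata.normalize("NFD", s).encode("utf-16-be")
    parts = ['"']
    for i in range(0, len(raw), 2):
        unit = int.from_bytes(raw[i:i + 2], "big")
        if 31 < unit < 127 and unit not in (34, 92):
            parts.append(chr(unit))
        else:
            parts.append(f"\\u{unit:04x}")
    parts.append('"')
    return "".join(parts)


def canonical_dumps(value: Any) -> str:
    """Serialize a JSON-like Python value as canonical JSON (sketch)."""
    if value is None:
        return "null"
    if value is True or value is False:  # checked before int: bool subclasses int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)  # no leading zeros, and int 0 never prints as "-0"
    if isinstance(value, str):
        return _encode_string(value)
    if isinstance(value, (list, tuple)):
        return "[" + ",".join(canonical_dumps(v) for v in value) + "]"
    if isinstance(value, dict):
        items = []
        for k, v in value.items():
            if not isinstance(k, str):
                raise TypeError("object keys must be strings")
            nk = unicodedata.normalize("NFD", k)
            # Assumed sort key: the key's UTF-16 code units (big-endian bytes).
            items.append((nk.encode("utf-16-be"), nk, v))
        items.sort(key=lambda t: t[0])
        if any(a[0] == b[0] for a, b in zip(items, items[1:])):
            raise ValueError("two keys collide after NFD normalization")
        body = ",".join(_encode_string(k) + ":" + canonical_dumps(v) for _, k, v in items)
        return "{" + body + "}"
    # Floats (and any other type) have no canonical form.
    raise TypeError(f"cannot canonicalize {type(value).__name__}")


def canonical_sha256(value: Any) -> str:
    # Canonical output is pure 7-bit ASCII, so the ascii codec is lossless here.
    return hashlib.sha256(canonical_dumps(value).encode("ascii")).hexdigest()


def is_canonical_roundtrip(text: str) -> bool:
    # Parse leniently, re-serialize, and require exact textual equality.
    try:
        return canonical_dumps(json.loads(text)) == text
    except (ValueError, TypeError, RecursionError):
        return False
```

Because both sides of the comparison go through `canonical_dumps`, two dictionaries with the same contents hash identically regardless of insertion order. Input containing whitespace, floats, `-0`, unsorted keys or needless escapes fails the round-trip comparison.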