Contents manifest specification: Difference between revisions

From OLPC
Jump to navigation Jump to search
(→‎Credentials: Update to match the firmware signature spec more closely.)
(→‎Manifests: update manifest format)
Line 128: Line 128:
efficient updates, because we don't have to synchronize irrelevant
efficient updates, because we don't have to synchronize irrelevant
metadata. Omitted metadata includes file creation, modification, and
metadata. Omitted metadata includes file creation, modification, and
access times, and hard link and xattr information. We also use
access times, and hard link and xattr information. We also use both
numeric user and group ids, which may not correlate with the target
numeric and textual user and group ids, which may require care on the target
system if the user/group databases are not synchronized. Our design
system if the user/group databases are not synchronized. Our design
is sufficient for the present; we will consider adding additional
is sufficient for the present; we will consider adding additional
Line 141: Line 141:


A manifest is an envelope with type "manifest" and version 1.0.
A manifest is an envelope with type "manifest" and version 1.0.
A manifest's datum is a 2-tuple (JSON 2-element list):
A manifest's datum is a list:


(hash-algorithm-list, directory-list) ::
directory-list :: List<Directory>
(List<String>, List<Directory>)


The hash-algorithm-list is an ordered list of opaque algorithm
identifiers. There must be at least one item in the list. This hash algorithm order is used by the included [[#Directory objects|directory objects]].
The directory-list is an list of [[#Directory objects|directory]] envelopes, in post-order.
The directory-list is an list of [[#Directory objects|directory]] envelopes, in post-order.
The first directory corresponds to the root of the filesystem defined by the manifest, and it is followed by directory objects for its subdirectories (and their subdirectories) in the order in which they appear in the root directory object. For example, the directory objects for the directory tree:
The first directory corresponds to the root of the filesystem defined by the manifest, and it is followed by directory objects for its subdirectories (and their subdirectories) in the order in which they appear in the root directory object. For example, the directory objects for the directory tree:
Line 178: Line 175:
where all subdirectories are contained in the directory list.
where all subdirectories are contained in the directory list.


Since subdirectories may be elided from the directory list, the credential authenticating a manifest should sign the root directory object (ie, 'a' in the example above). The validity of the remaining subdirectories is assured by the [http://en.wikipedia.org/wiki/Hash_tree hash tree] construction of the directory objects.
Since subdirectories may be elided from the directory list, '''the credential authenticating a manifest should sign the root directory object''' (ie, 'a' in the example above). The validity of the remaining subdirectories is assured by the [http://en.wikipedia.org/wiki/Hash_tree hash tree] construction of the directory objects. The credential need not share hash algorithms with the manifest.


Although objects may be omitted from the directory list, it should
Although objects may be omitted from the directory list, it should
be considered an error to include extraneous directories not directly referenced from the included directory objects.
be considered an error to include extraneous directories not directly referenced from the included directory objects.


A manifest is just a convenience object for bundling a number of related directory objects; it should not be directly signed. Instead, the root directory object in the manifest should be the element which receives a signature.
In version 1 of the manifest format, there must be exactly two
hash algorithms specified, and they must be 'sha-256' and
'whirlpool-XXX'. (Future subversions of the manifest format may add new
algorithms.)

Note that adding a bogus hash algorithm to the hash-algorithm-list
will allow injection of arbitrary data into the manifest, which could
comprise an attack on the hash algorithm(s) used. For this reason,
security-conscious applications will require a fixed algorithm list
(as version 1 does). Migration to a new hash algorithm requires
first distributing an upgrade which recognizes a new manifest version
(either a new subversion, if the hash algorithm list is simply
extended, or a new version if hash algorithms are removed). After the
upgrade, manifests with the new version (and thus hash algorithm list)
can be published.

Similarly, a manifest with N listed hash algorithms admits N! possible
variants where the algorithms are listed in a different order. Since
N! grows much faster than 2^N, it is wise to keep the list of hash
algorithms small.

This attack does not apply to hash and signature algorithms in a
credential authenticating this manifest's root directory object, since the credential object is not itself hashed or
signed. Any desired set of hash and signature algorithms can be used in the credential. It is recommended, however, that the
hash algorithms used in the credential exactly match those named in the
manifest header.


===Directory objects===
===Directory objects===
Line 216: Line 188:


A directory object is an envelope with type "dir" and version
A directory object is an envelope with type "dir" and version
1. The datum is a map:
1. The datum is a 2-tuple (JSON 2-element list):

(hash-algorithm-list, contents-map) ::
(List<String>, Map)

The hash-algorithm-list is an ordered list of opaque algorithm
identifiers. There must be at least one item in the list. This hash algorithm list is used by the contents-map, which contains:

{ filename1: file-descriptor1, ..., filenameN: file-descriptorN } ::
{ filename1: file-descriptor1, ..., filenameN: file-descriptorN } ::
Map<String, Map<String, Value>>
Map<String, Map<String, Value>>
Line 237: Line 216:


The list of hashes under key 'h' should match the order of
The list of hashes under key 'h' should match the order of
hash-algorithm-list in a containing manifest object. The hash for a
hash-algorithm-list. The hash for a
entry naming a subdirectory is defined to be the hash of the
entry naming a subdirectory is defined to be the hash of the
"canonical JSON"-encoded directory object describing that directory.
"canonical JSON"-encoded directory object describing that directory; the contents and order of the hash-algorithm-list in the subdirectory must exactly match the one for this directory.


Note that both numeric and symbolic user and group names are included. This is the same strategy [http://www.gnu.org/software/tar/manual/html_node/Attributes.html#index-numeric_002downer-485 used by the 'tar' utility]. When extracting or verifying a manifest, certain of this information may be ignored: for example, activities in the user's home directory will typically belong to the user, while 'system' activities may belong to root. Care should be taken when generating manifests to ensure that user and group information has standardized content, even if that information will be ignored in the expected use, to avoid having multiple valid manifests for a given activity or library bundle.
Note that both numeric and symbolic user and group names are included. This is the same strategy [http://www.gnu.org/software/tar/manual/html_node/Attributes.html#index-numeric_002downer-485 used by the 'tar' utility]. When extracting or verifying a manifest, certain of this information may be ignored: for example, activities in the user's home directory will typically belong to the user, while 'system' activities may belong to root. Care should be taken when generating manifests to ensure that user and group information has standardized content, even if that information will be ignored in the expected use, to avoid having multiple valid manifests for a given activity or library bundle.

In version 1 of the directory format, there must be exactly two
hash algorithms specified in the hash-algorithm-list, and they must be 'sha-256' and
'whirlpool-512' (ISO/IEC 10118-3), in that order.

Note that adding a bogus hash algorithm to the hash-algorithm-list
will allow injection of arbitrary data into the directory object, which could
comprise an attack on the hash algorithm(s) used. For this reason,
security-conscious applications will require a fixed algorithm list
(as version 1 does). Migration to a new hash algorithm requires
first distributing an upgrade which recognizes a new directory version. After the
upgrade, directories with the new version (and thus hash algorithm list)
can be published.

Similarly, a directory with N listed hash algorithms admits N! possible
variants where the algorithms are listed in a different order. Since
N! grows much faster than 2^N, it is wise to keep the list of hash
algorithms small.

This attack does not apply to hash and signature algorithms in a
credential authenticating this directory object, since the credential object is not itself hashed or signed. Any desired set of hash and signature algorithms can be used in the credential.


A small example of a directory object might be:
A small example of a directory object might be:
["dir", 1, {
["dir", 1, [ [ "sha1", "md5" ], {
"bar": { g: "users", g#: 1000, h: [ "e242ed3bffccdf271b7fbaf34ed72d089537b42f", "c157a79031e1c40f85931829bc5fc552" ], m: 33188, u: "olpc", u#: 1000 },
"bar": { g: "users", g#: 1000, h: [ "e242ed3bffccdf271b7fbaf34ed72d089537b42f", "c157a79031e1c40f85931829bc5fc552" ], m: 33188, u: "olpc", u#: 1000 },
"fifo": { g: "users", g#: 1000, m: 4516, u: "olpc", u#: 1000 },
"fifo": { g: "users", g#: 1000, m: 4516, u: "olpc", u#: 1000 },
Line 250: Line 250:
"null": { d: 2049, g: "users", g#: 1000, m: 8612, u: "olpc", u#: 1000 },
"null": { d: 2049, g: "users", g#: 1000, m: 8612, u: "olpc", u#: 1000 },
"subdir": { g: "users", g#: 1000, h: [ "da39a3ee5e6b4b0d3255bfef95601890afd80709", "d41d8cd98f00b204e9800998ecf8427e" ], m: 16804, u: "olpc", u#: 1000 }
"subdir": { g: "users", g#: 1000, h: [ "da39a3ee5e6b4b0d3255bfef95601890afd80709", "d41d8cd98f00b204e9800998ecf8427e" ], m: 16804, u: "olpc", u#: 1000 }
} ]
} ] ]
(Of course, the JSON encoding should quote map keys and use normalized whitespace
(Of course, the JSON encoding should quote map keys and remove whitespace
to actually be canonical JSON.)
to actually be [[Canonical JSON]].)


A simple manifest containing this directory might be:
A simple manifest containing this directory might be:
["manifest", 1, [
["manifest", 1, [
[ "sha1", "md5" ], [
["dir", 1, [ [ "sha1", "md5" ], {
["dir", 1, {
"bar": { g: "users", g#: 1000, h: [ "e242ed3bffccdf271b7fbaf34ed72d089537b42f", "c157a79031e1c40f85931829bc5fc552" ], m: 33188, u: "olpc", u#: 1000 },
"bar": { g: "users", g#: 1000, h: [ "e242ed3bffccdf271b7fbaf34ed72d089537b42f", "c157a79031e1c40f85931829bc5fc552" ], m: 33188, u: "olpc", u#: 1000 },
"fifo": { g: "users", g#: 1000, m: 4516, u: "olpc", u#: 1000 },
"fifo": { g: "users", g#: 1000, m: 4516, u: "olpc", u#: 1000 },
Line 264: Line 263:
"null": { d: 2049, g: "users", g#: 1000, m: 8612, u: "olpc", u#: 1000 },
"null": { d: 2049, g: "users", g#: 1000, m: 8612, u: "olpc", u#: 1000 },
"subdir": { g: "users", g#: 1000, h: [ "da39a3ee5e6b4b0d3255bfef95601890afd80709", "d41d8cd98f00b204e9800998ecf8427e" ], m: 16804, u: "olpc", u#: 1000 }
"subdir": { g: "users", g#: 1000, h: [ "da39a3ee5e6b4b0d3255bfef95601890afd80709", "d41d8cd98f00b204e9800998ecf8427e" ], m: 16804, u: "olpc", u#: 1000 }
} ],
} ] ],
[dir,1,{}]
[dir, 1, [ [ "sha1", "md5" ], {} ] ]
] ] ]
] ]


==Notes on resource-bounded parsing==
==Notes on resource-bounded parsing==

Revision as of 16:03, 27 August 2007

  This page is monitored by the OLPC team.


Pencil.png NOTE: The contents of this page are not set in stone, and are subject to change!

This page is a draft in active flux ...
Please leave suggestions on the talk page.

Pencil.png
Author
Michael Stone <michael@laptop.org>, Noah Kantrowitz <noah@laptop.org>, C. Scott Ananian <cscott@laptop.org>
Date
August 12, 2007

Our system for update verification manipulates five kinds of data: envelopes, keys, credentials, directory objects, and manifests.

Presently, we intend to use JSON as a transfer encoding because we often would like to represent structured data: trees, lists, tuples. JSON is a "modern S-expression" format which is easily and safely parseable in Python and other languages.

In security-sensitive applications, we will use a "canonical JSON". Canonical JSON is restricted such that there is only one possible encoding of a given data structure: whitespace is normalized, keys in mappings are ordered, and string encoding is restricted. Canonical JSON can be read by any standard JSON parser, although security-conscious applications will want to verify that the restrictions of canonical JSON format are observed. Canonical JSON format is fully specified in another document.

If we learn that we need a streamable or more space-efficient encoding, then we will consider alternatives.


Envelopes

Envelopes are fundamental self-describing objects. They contain an opaque string describing the type of the object, semi-opaque integers providing version information, and json-encoded contents. Envelopes are represented by 3-tuples (JSON 3-element lists) of the form:

 (type, version, data) :: (String, Integer, Value)
    ^       ^       ^
    |       |       \________ a JSON value in a format governed by
    |       |                   (type, version)
    |       \______ an opaque version identifier; see below.
    \_________ a hint about how to interpret the data field that follows.


Version information is opaque, and specifies the data format that must be supported by compliant clients. Version numbers are incomparable: only equality testing of versions is valid.

Keys

Keys are envelopes with type "key" and version 1. A key's datum consists of a tuple (JSON 3-element list):

 (algorithm, fingerprint, key)  :: (String, String, String)
      ^          ^         ^
      |          |         \______ an opaque string specifiying the actual key
      |          \______ an opaque string to use to refer to the key 
      \_________ an opaque algorithm identifier; 

Algorithms should be careful to specify a canonical representation for the key and fingerprint. In particular, if the key data and fingerprint are represented as hex digits, then (for example) only lowercase hex digits should be allowed.

In practice, we might see a key encoded as:

 ["key", 1, ["rsa-2048-pub", "12508713460986140897", "ae1908579871250981230896109761"]]

Initially, the key data should be the same hexadecimal-encoded octet string given in the firmware key spec, the key fingerprint should be the trailing 64 characters of that string, as in the firmware signature spec, and the algorithm string should be "rsa-2048-pub" exactly. An encoding for private keys is not (yet) given.

Credentials

A credential is an envelope with type "sig" and version 1. Conceptually, its datum consists of a relation whose tuples are signatures of the form:

 (signature-algorithm, key, message, signature)

However, in practice, the message is defined implicitly by context (i.e. the manifest with which this credential is paired) and we wish to use the credential/manifest system to help detect accidental corruption as early as possible, including corruption of the manifest or confusion of manifest/signature pairs.

Therefore, we actually propose to represent the datum of a message credential as a list of tuples, where each tuple (JSON 5-item list) is:

 (sig-alg, key-identifier, signature)

Each tuple in the list specifies a different signature algorithm. A valid credential must have at least one tuple in the list, and *all* signatures must verify for the credential to be valid. (Implementations may chose to verify only a subset of the provided signatures, either for performance reasons or to allow migration to future preferred signature algorithms, but any invalid signature invalidates the credential.) Since the list of tuples is nominally unordered, we will require that the encoding of the credential list tuples in lexicographic order. That is, tuples are sorted first lexicographically on sig-alg, then tuples with the same sig-alg are lexicographically sorted by key-identifier, and further ties are broken by the signature field if necessary. All tuples must be unique: the credential is invalid if the same (sig-alg, key-identifier, signature) tuple is present more than once.

Note that detached signatures are a pervasive implementation detail within bitfrost; we assume some external mechanism to associate this signature with the appropriate message.

The key fingerprint identifies the key that should be used to verify the signature. It must match the fingerprint specified for a "key" envelope exactly.

The signature itself is opaque.

As with keys, the specific signature algorithms should be careful to unambiguously encode the signature. For example, if hex digits are used, only lowercase hex digits should be permitted.

In actual JSON, we might imagine that such a message would look like:

 ["sig", 1, [ ["rsassa-pss-sha256", "123mykey", "0paqu3s1g"], 
              ["rsassa-pkcs1-rmd160", "9912key", "8567a"] ] ]

Initially, the only two sig-alg strings allowed are "rsassa-pss-sha256" and "rsassa-pkcs1-rmd160", defined using the same encoding as used in the firmware signature format.

Manifests

Manifests describe the contents of a filesystem. Their primary use is in describing software releases and allowing upgrades between these.

In version 1 of the manifest format, we have decided to omit unnecessary file metadata from the manifest. This will allow more efficient updates, because we don't have to synchronize irrelevant metadata. Omitted metadata includes file creation, modification, and access times, and hard link and xattr information. We also use both numeric and textual user and group ids, which may require care on the target system if the user/group databases are not synchronized. Our design is sufficient for the present; we will consider adding additional metadata to the manifest in future versions if a need can be demonstrated.

(Hard link information is omitted because (a) it seems to be unnecessary at the present, and (b) it is difficult to properly support in our current containerization copy-on-write framework. Manifest-generation tools should check that hard links are not used.)

A manifest is an envelope with type "manifest" and version 1.0. A manifest's datum is a list:

 directory-list :: List<Directory>

The directory-list is an list of directory envelopes, in post-order. The first directory corresponds to the root of the filesystem defined by the manifest, and it is followed by directory objects for its subdirectories (and their subdirectories) in the order in which they appear in the root directory object. For example, the directory objects for the directory tree:

  /
  /a
  /a/b
  /a/c
  /d
  /d/e
  /d/e/f
  /g

are written out in the order:

  [a, b, c, d, e, f, g]

The directory-list must include the 'root' object, but it need not be complete: subdirectory trees may be omitted from the manifest object if they can be obtained by other means. In the example above, 'f' could be omitted, and 'e' and 'f' could be omitted, but 'f' should not be included in the directory list if 'e' is omitted. Omitted subdirectories can be discovered by comparing the hash of each directory object encountered with the hash declared in its parent directory, and skipping omitted directories until the hashes match.

Only the root directory need be included in a manifest transmitted for the purpose of validating a directory tree: hashes for the subdirectories can be recomputed from the filesystem contents and compared to those named in the root directory object. Similarly, one may chose to store subdirectory manifest objects in special hidden files in the directories themselves, either as bare JSON-encoded directory objects, or as full-fledged manifests with their (sub)directory lists. The manifest format allows a completely self-contained manifest, however, where all subdirectories are contained in the directory list.

Since subdirectories may be elided from the directory list, the credential authenticating a manifest should sign the root directory object (ie, 'a' in the example above). The validity of the remaining subdirectories is assured by the hash tree construction of the directory objects. The credential need not share hash algorithms with the manifest.

Although objects may be omitted from the directory list, it should be considered an error to include extraneous directories not directly referenced from the included directory objects.

A manifest is just a convenience object for bundling a number of related directory objects; it should not be directly signed. Instead, the root directory object in the manifest should be the element which receives a signature.

Directory objects

Directory objects describe the contents of a single directory; a manifest is built from a collection of directory objects.

A directory object is an envelope with type "dir" and version 1. The datum is a 2-tuple (JSON 2-element list):

 (hash-algorithm-list, contents-map) ::
                     (List<String>, Map)

The hash-algorithm-list is an ordered list of opaque algorithm identifiers. There must be at least one item in the list. This hash algorithm list is used by the contents-map, which contains:

 { filename1: file-descriptor1, ..., filenameN: file-descriptorN } ::
                                      Map<String, Map<String, Value>>

The map keys are filenames for the contents in the directory. Path components are not included in these keys. The values are maps describing the file entry, with the following keys:

  m:  numeric POSIX mode of the entry
  u:  username of the entry's owner (string)
  u#: numeric UID of the entry's owner
  g:  group name of the entry's group (string)
  g#: numeric GID of the entry's group
  d:  numeric device (for entries where mode indicates a character or
                      block device ONLY)
      [This is 'st_rdev' in the entry's lstat]
  l: link target (for entries where mode indicates a symlink ONLY)
  h: an ordered list of hashes for this entry (for regular files or
                                               directories ONLY)

The list of hashes under key 'h' should match the order of hash-algorithm-list. The hash for a entry naming a subdirectory is defined to be the hash of the "canonical JSON"-encoded directory object describing that directory; the contents and order of the hash-algorithm-list in the subdirectory must exactly match the one for this directory.

Note that both numeric and symbolic user and group names are included. This is the same strategy used by the 'tar' utility. When extracting or verifying a manifest, certain of this information may be ignored: for example, activities in the user's home directory will typically belong to the user, while 'system' activities may belong to root. Care should be taken when generating manifests to ensure that user and group information has standardized content, even if that information will be ignored in the expected use, to avoid having multiple valid manifests for a given activity or library bundle.

In version 1 of the directory format, there must be exactly two hash algorithms specified in the hash-algorithm-list, and they must be 'sha-256' and 'whirlpool-512' (ISO/IEC 10118-3), in that order.

Note that adding a bogus hash algorithm to the hash-algorithm-list will allow injection of arbitrary data into the directory object, which could comprise an attack on the hash algorithm(s) used. For this reason, security-conscious applications will require a fixed algorithm list (as version 1 does). Migration to a new hash algorithm requires first distributing an upgrade which recognizes a new directory version. After the upgrade, directories with the new version (and thus hash algorithm list) can be published.

Similarly, a directory with N listed hash algorithms admits N! possible variants where the algorithms are listed in a different order. Since N! grows much faster than 2^N, it is wise to keep the list of hash algorithms small.

This attack does not apply to hash and signature algorithms in a credential authenticating this directory object, since the credential object is not itself hashed or signed. Any desired set of hash and signature algorithms can be used in the credential.

A small example of a directory object might be:

 ["dir", 1, [ [ "sha1", "md5" ], {
   "bar": { g: "users", g#: 1000, h: [ "e242ed3bffccdf271b7fbaf34ed72d089537b42f", "c157a79031e1c40f85931829bc5fc552" ], m: 33188, u: "olpc", u#: 1000 },
   "fifo": { g: "users", g#: 1000, m: 4516, u: "olpc", u#: 1000 },
   "frobnitz": { g: "users", g#: 1000, l: "bar", m: 41380, u: "olpc", u#: 1000 },
   "null": { d: 2049, g: "users", g#: 1000, m: 8612, u: "olpc", u#: 1000 },
   "subdir": { g: "users", g#: 1000, h: [ "da39a3ee5e6b4b0d3255bfef95601890afd80709", "d41d8cd98f00b204e9800998ecf8427e" ], m: 16804, u: "olpc", u#: 1000 }
 } ] ]

(Of course, the JSON encoding should quote map keys and remove whitespace to actually be Canonical JSON.)

A simple manifest containing this directory might be:

 ["manifest", 1, [
       ["dir", 1, [ [ "sha1", "md5" ], {
         "bar": { g: "users", g#: 1000, h: [ "e242ed3bffccdf271b7fbaf34ed72d089537b42f", "c157a79031e1c40f85931829bc5fc552" ], m: 33188, u: "olpc", u#: 1000 },
         "fifo": { g: "users", g#: 1000, m: 4516, u: "olpc", u#: 1000 },
         "frobnitz": { g: "users", g#: 1000, l: "bar", m: 41380, u: "olpc", u#: 1000 },
         "null": { d: 2049, g: "users", g#: 1000, m: 8612, u: "olpc", u#: 1000 },
         "subdir": { g: "users", g#: 1000, h: [ "da39a3ee5e6b4b0d3255bfef95601890afd80709", "d41d8cd98f00b204e9800998ecf8427e" ], m: 16804, u: "olpc", u#: 1000 }
       } ] ],
       [dir, 1, [ [ "sha1", "md5" ], {} ] ]
     ] ]

Notes on resource-bounded parsing

Small bounds can be enforced when parsing directory entry descriptions in directory objects: the numeric components can be limited to 2^32=10 digits (2^64=20 digits for the 'size' field), and the string components (hash, symlink, entry key) can be limited to 256 characters. It is a little more problematic to limit the number of entries "reasonably" found in a directory, but some empirical bound can be set here as well.

Credentials can be resource-bounded as well, using small bounds on the number of signatures included and the size of the hash and signature algorithm descriptors and the signature data.

For manifests, small bounds on the hash algorithm list are wise to limit the possible permutations which can be generated. The 'directory-list' in the manifest will have as many elements as there are directories in the filesystem. However, the directory list can be directly checked as the entries are parsed, requiring space proportional only to the depth of the directory tree (since we must keep around subdirectory hashes until we parse the subdirectory's object). Thus, a limit on directory depth can be used instead of limits on aggregate size of the manifest.