
8. Content Parameters

8.1 Media Types

HTTP uses Internet Media Types [15], formerly referred to as MIME Content-Types [6], in order to provide open and extensible data typing and type negotiation. For mail applications, where there is no type negotiation between sender and receiver, it is reasonable to put strict limits on the set of allowed media types. With HTTP, however, user agents can identify acceptable media types as part of the connection, and thus are allowed more freedom in the use of non-registered types. The following grammar for media types is a superset of that for MIME because it does not restrict itself to the official IANA and x-token types.

media-type     = type "/" subtype *( ";" parameter )
type           = token
subtype        = token

Parameters may follow the type/subtype in the form of attribute/value pairs.

parameter      = attribute "=" value
attribute      = token
value          = token | quoted-string

The type, subtype, and parameter attribute names are not case-sensitive. Parameter values may or may not be case-sensitive, depending on the semantics of the parameter name. No LWS is allowed between the type and subtype, nor between an attribute and its value.
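The grammar and case rules above can be sketched as a small parser. This is a minimal illustration, not a reference implementation: the function name parse_media_type is hypothetical, and the character classes for token and quoted-string are assumptions taken from the basic rules defined elsewhere in the specification (token excludes control characters, space, and the special separator characters; a quoted-string is taken here to be any run of non-quote characters between double quotes).

```python
import re

# Assumption: token = one or more printable US-ASCII characters other than
# space and the separator characters ( ) < > @ , ; : \ " / [ ] ? = { }
TOKEN = r"[!#$%&'*+\-.0-9A-Z^_`a-z|~]+"
# Assumption: quoted-string = any characters except the double quote, in quotes.
QUOTED = r'"[^"]*"'

# No whitespace is permitted around "/" or around "=".
TYPE_RE = re.compile(rf"({TOKEN})/({TOKEN})")
PARAM_RE = re.compile(rf"[ \t]*;[ \t]*({TOKEN})=({TOKEN}|{QUOTED})")


def parse_media_type(value):
    """Split a media-type string into (type, subtype, parameters)."""
    value = value.strip()
    head = TYPE_RE.match(value)
    if head is None:
        raise ValueError(f"not a media type: {value!r}")
    # Type and subtype are not case-sensitive, so fold them to lower case.
    main_type, subtype = head.group(1).lower(), head.group(2).lower()

    params = {}
    pos = head.end()
    while pos < len(value):
        param = PARAM_RE.match(value, pos)
        if param is None:
            raise ValueError(f"malformed parameter at offset {pos}: {value!r}")
        attribute, raw = param.group(1).lower(), param.group(2)
        # Parameter values may be case-sensitive, so only the quotes are removed.
        if raw.startswith('"'):
            raw = raw[1:-1]
        params[attribute] = raw
        pos = param.end()
    return main_type, subtype, params


print(parse_media_type('Text/HTML; Charset="ISO-8859-1"'))
```

Values are deliberately left in their original case, since whether a value is case-sensitive depends on the parameter in question.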

If a given media-type value has been registered by the IANA, any use of that value must be indicative of the registered data format. Although HTTP allows the use of non-registered media types, such usage must not conflict with the IANA registry. Data providers are strongly encouraged to register their media types with IANA via the procedures outlined in RFC 1590 [15].

All media types registered by IANA must be preferred over extension tokens. However, HTTP does not limit conforming applications to the use of officially registered media types, nor does it encourage the use of an "x-" prefix for unofficial types outside of explicitly short experimental use between consenting applications.

8.1.1 Canonicalization and Text Defaults

Media types are registered in a canonical form. In general, entity bodies transferred via HTTP must be represented in the appropriate canonical form prior to transmission. If the body has been encoded via a Content-Encoding and/or Content-Transfer-Encoding, the data must be in canonical form prior to that encoding. However, HTTP modifies the canonical form requirements for media of primary type "text" and for "application" types consisting of text-like records.

HTTP redefines the canonical form of text media to allow multiple octet sequences to indicate a text line break. In addition to the preferred form of CRLF, HTTP applications must accept a bare CR or LF alone as representing a single line break in text media. Furthermore, if the text media is represented in a character set which does not use octets 13 and 10 for CR and LF respectively (as is the case for some multi-byte character sets), HTTP allows the use of whatever octet sequence(s) is defined by that character set to represent the equivalent of CRLF, bare CR, and bare LF. It is assumed that any recipient capable of using such a character set will know the appropriate octet sequence for representing line breaks within that character set.

Note: This interpretation of line breaks applies only to the contents of an Entity-Body and only after any Content-Transfer-Encoding and/or Content-Encoding has been removed. All other HTTP constructs use CRLF exclusively to indicate a line break. Encoding mechanisms define their own line break requirements.

A recipient of an HTTP text entity should translate the received entity line breaks to the local line break conventions before saving the entity external to the application and its cache; whether this translation takes place immediately upon receipt of the entity, or only when prompted by the user, is entirely up to the individual application.
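As an illustration of the line break rules for text media, the following sketch accepts CRLF, bare CR, and bare LF in a decoded entity body and rewrites each as the local convention. The function name to_local_line_breaks is hypothetical, and the sketch assumes a character set in which CR and LF are octets 13 and 10 (for example US-ASCII or ISO-8859-1); it does not cover character sets that use other octet sequences for line breaks.

```python
import os
import re

# CRLF is listed first so that it counts as one line break rather than two.
_LINE_BREAK = re.compile(rb"\r\n|\r|\n")


def to_local_line_breaks(body, local=os.linesep.encode("ascii")):
    """Replace every CRLF, bare CR, or bare LF in a text body with `local`.

    Assumes any Content-Encoding or Content-Transfer-Encoding has already
    been removed, and that CR and LF are octets 13 and 10.
    """
    # A function is used as the replacement so `local` is inserted literally.
    return _LINE_BREAK.sub(lambda match: local, body)


print(to_local_line_breaks(b"one\r\ntwo\rthree\nfour", b"\n"))
```

When this translation runs (on receipt, or only when the user saves the entity) is left to the application, so the sketch is written as a standalone step rather than as part of the receive path.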

HTTP also redefines the default character set for text media in an entity body. If a textual media type defines a charset parameter with a registered default value of "US-ASCII", HTTP changes the default to be "ISO-8859-1". Since the character set ISO-8859-1 [19] is a superset of US-ASCII [18], this has no effect upon the interpretation of entity bodies which only contain octets within the US-ASCII set (0-127). The presence of a charset parameter value in a Content-Type header field overrides the default.
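The default rule can be expressed in a few lines. This is a sketch under stated assumptions: the function name effective_charset is hypothetical, and the parameters are assumed to arrive as a dictionary whose attribute names have already been folded to lower case.

```python
def effective_charset(params, registered_default="US-ASCII"):
    """Pick the character set used to interpret a text entity body.

    `params` holds the Content-Type parameters (lower-case attribute names);
    `registered_default` is the default registered for the media type.
    """
    explicit = params.get("charset")
    if explicit is not None:
        # An explicit charset parameter always overrides the default.
        return explicit
    if registered_default.upper() == "US-ASCII":
        # HTTP replaces a registered US-ASCII default with ISO-8859-1.
        return "ISO-8859-1"
    return registered_default


body = b"caf\xe9"  # octet 0xE9 lies outside the US-ASCII range
print(body.decode(effective_charset({})))
```

Octets in the range 0-127 decode identically under both character sets, so the changed default only matters for bodies that contain octets above 127.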

HTTP does not require that the character set of an entity body be labelled as the lowest common denominator of the character codes used within a document.

8.1.2 Multipart Types

MIME provides for a number of "multipart" types -- encapsulations of several entities within a single message's Entity-Body. The multipart types registered by IANA [17] do not have any special meaning for HTTP/1.0, though user agents may need to understand each type in order to correctly interpret the purpose of each body-part. Ideally, an HTTP user agent should follow the same or similar behavior as a MIME user agent does upon receipt of a multipart type.

As in MIME [6], all multipart types share a common syntax and must include a boundary parameter as part of the media type value. The message body is itself a protocol element and must therefore use only CRLF to represent line breaks between body-parts. Unlike in MIME, multipart body-parts may contain HTTP header fields which are significant to the meaning of that part.

A URI-header field (Section 7.1.13) should be included in the body-part for each enclosed entity that can be identified by a URI.
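The multipart rules can be illustrated by assembling a small multipart body. This is a sketch only: the function name build_multipart is hypothetical, and the delimiter layout ("--" plus the boundary before each body-part, and the same with a trailing "--" after the last one) is taken from MIME [6] rather than from this section. Header fields for each body-part are supplied by the caller; the URI-header field is not shown because its syntax is defined in Section 7.1.13.

```python
CRLF = b"\r\n"  # the only line break allowed between body-parts


def build_multipart(parts, boundary, subtype="mixed"):
    """Return (content_type, body) for a list of (headers, payload) parts.

    `headers` is a list of (name, value) string pairs; `payload` is bytes.
    """
    marker = b"--" + boundary.encode("ascii")
    chunks = []
    for headers, payload in parts:
        if marker in payload:
            raise ValueError("boundary occurs inside a body-part")
        chunks.append(marker + CRLF)
        for name, value in headers:
            chunks.append(f"{name}: {value}".encode("ascii") + CRLF)
        # An empty line separates the part's header fields from its content.
        chunks.append(CRLF + payload + CRLF)
    chunks.append(marker + b"--" + CRLF)
    # The boundary parameter is mandatory for every multipart type.
    content_type = f'multipart/{subtype}; boundary="{boundary}"'
    return content_type, b"".join(chunks)


content_type, body = build_multipart(
    [
        ([("Content-Type", "text/plain")], b"hello"),
        ([("Content-Type", "text/html")], b"<p>hi</p>"),
    ],
    "XyZ",
)
print(content_type)
print(body.decode("ascii"))
```

Line breaks inside a text payload are left untouched here; only the framing between body-parts is required to use CRLF.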


T. Berners-Lee, R. T. Fielding, H. Frystyk Nielsen - 12 MAR 95
