
8. Content Parameters

8.3 Character Sets

HTTP uses the same definition of the term "character set" as that described for MIME:

The term "character set" is used in this document to refer to a method used with one or more tables to convert a sequence of octets into a sequence of characters. Note that unconditional conversion in the other direction is not required, in that not all characters may be available in a given character set and a character set may provide more than one sequence of octets to represent a particular character. This definition is intended to allow various kinds of character encodings, from simple single-table mappings such as US-ASCII to complex table switching methods such as those that use ISO 2022's techniques. However, the definition associated with a MIME character set name must fully specify the mapping to be performed from octets to characters. In particular, use of external profiling information to determine the exact mapping is not permitted.
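As an illustration of the definition above, the following minimal Python sketch shows the required direction (octets to characters), a case where the reverse conversion is unavailable, and a table-switching encoding. The codec names used here ("iso-8859-1", "us-ascii", "iso2022_jp") are Python's own codec aliases and the sample strings are arbitrary; neither is taken from this specification.

```python
# Octets -> characters: ISO-8859-1 is a simple single-table mapping,
# so every octet converts to exactly one character.
octets = b"caf\xe9"
text = octets.decode("iso-8859-1")

# Characters -> octets is not guaranteed: US-ASCII has no way to
# represent the character U+00E9, so the reverse conversion fails.
try:
    text.encode("us-ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Table switching: ISO-2022-JP selects a character table with an
# escape sequence (ESC $ B designates JIS X 0208) before the octets
# that encode the Japanese characters.
jp_octets = "\u65e5\u672c".encode("iso2022_jp")
switches_tables = jp_octets.startswith(b"\x1b$B")
```

The failure in the second step is the situation the definition allows for: a character set only has to specify the mapping from octets to characters in full.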

Character sets are identified by case-insensitive tokens. The complete set of allowed charset values is defined by the IANA Character Set registry [17]. However, because that registry does not define a single, consistent token for each character set, we define here the preferred names for those character sets most likely to be used with HTTP entities. This set of charset values includes those registered by RFC 1521 [6] -- the US-ASCII [18] and ISO-8859 [19] character sets -- and other character set names specifically recommended for use within MIME charset parameters.

charset	=	"US-ASCII"
	|	"ISO-8859-1" | "ISO-8859-2" | "ISO-8859-3"
	|	"ISO-8859-4" | "ISO-8859-5" | "ISO-8859-6"
	|	"ISO-8859-7" | "ISO-8859-8" | "ISO-8859-9"
	|	"ISO-2022-JP" | "ISO-2022-JP-2" | "ISO-2022-KR"
	|	"UNICODE-1-1" | "UNICODE-1-1-UTF-7" | "UNICODE-1-1-UTF-8"
	|	token

Although HTTP allows an arbitrary token to be used as a character set value, any token that has a predefined value within the IANA Character Set registry [17] must represent the character set defined by that registry. Applications are encouraged, but not required, to limit their use of character sets to those defined by the IANA registry.
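The matching rules above can be sketched in Python as follows. This is a minimal illustration, not a normative algorithm: the names PREFERRED_CHARSETS and normalize_charset are hypothetical, and the token pattern assumes the usual HTTP definition of a token (any US-ASCII character except control characters and tspecials), which is defined elsewhere in the specification rather than in this section.

```python
import re

# Preferred names listed in the charset grammar of this section.
PREFERRED_CHARSETS = (
    "US-ASCII",
    "ISO-8859-1", "ISO-8859-2", "ISO-8859-3",
    "ISO-8859-4", "ISO-8859-5", "ISO-8859-6",
    "ISO-8859-7", "ISO-8859-8", "ISO-8859-9",
    "ISO-2022-JP", "ISO-2022-JP-2", "ISO-2022-KR",
    "UNICODE-1-1", "UNICODE-1-1-UTF-7", "UNICODE-1-1-UTF-8",
)

# Lookup table keyed by upper-cased name, since tokens are case-insensitive.
_BY_UPPER = {name.upper(): name for name in PREFERRED_CHARSETS}

# Assumed token syntax: printable US-ASCII minus tspecials and space.
_TOKEN_RE = re.compile(r"[!#$%&'*+\-.0-9A-Z^_`a-z|~]+")


def normalize_charset(value: str) -> str:
    """Return the preferred spelling of a charset value.

    Known names are matched case-insensitively and returned in their
    preferred form; any other syntactically valid token is returned
    unchanged; anything else raises ValueError.
    """
    if not _TOKEN_RE.fullmatch(value):
        raise ValueError(f"not a valid charset token: {value!r}")
    return _BY_UPPER.get(value.upper(), value)
```

For example, normalize_charset("iso-8859-1") returns "ISO-8859-1", while an unregistered token such as "x-custom" passes through unchanged. A fuller implementation would also consult the IANA registry, which this sketch does not attempt.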


T. Berners-Lee, R. T. Fielding, H. Frystyk Nielsen - 12 MAR 95

