5. Use of encoded-words in message headers

An 'encoded-word' may appear in a message header or body part header according to the following rules:


  1. An 'encoded-word' may replace a 'text' token (as defined by RFC 822) in any Subject or Comments header field, any extension message header field, or any MIME body part field for which the field body is defined as '*text'. An 'encoded-word' may also appear in any user-defined ("X-") message or body part header field.

    Ordinary ASCII text and 'encoded-word's may appear together in the same header field. However, an 'encoded-word' that appears in a header field defined as '*text' MUST be separated from any adjacent 'encoded-word' or 'text' by 'linear-white-space'.


  2. An 'encoded-word' may appear within a 'comment' delimited by "(" and ")", i.e., wherever a 'ctext' is allowed. More precisely, the RFC 822 ABNF definition for 'comment' is amended as follows:

    comment = "(" *(ctext / quoted-pair / comment / encoded-word) ")"

    A "Q"-encoded 'encoded-word' which appears in a 'comment' MUST NOT contain the characters "(", ")" or ". An 'encoded-word' that appears in a 'comment' MUST be separated from any adjacent 'encoded-word' or 'ctext' by 'linear-white-space'.

    It is important to note that 'comment's are only recognized inside "structured" field bodies. In fields whose bodies are defined as '*text', "(" and ")" are treated as ordinary characters rather than comment delimiters, and rule (1) of this section applies. (See RFC 822, sections 3.1.2 and 3.1.3)

  3. As a replacement for a 'word' entity within a 'phrase', for example, one that precedes an address in a From, To, or Cc header. The ABNF definition for 'phrase' from RFC 822 thus becomes:

    phrase = 1*( encoded-word / word )

    In this case the set of characters that may be used in a "Q"-encoded 'encoded-word' is restricted to: <upper and lower case ASCII letters, decimal digits, "!", "*", "+", "-", "/", "=", and "_" (underscore, ASCII 95.)>. An 'encoded-word' that appears within a 'phrase' MUST be separated from any adjacent 'word', 'text' or 'special' by 'linear-white-space'.
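The character restriction in rule (3) can be sketched as a small encoder. This is an illustration only, not part of the specification. The function name q_encode_phrase_word and the default charset are assumptions, the use of "_" for SPACE follows the "Q" encoding as it appears in the examples of section 8, and the length limit on an 'encoded-word' is not enforced here.

```python
import string

# Characters emitted literally inside a 'phrase'. "=" and "_" belong to the
# permitted set only as encoding syntax, so a literal "=" or "_" in the
# input is hex-escaped below.
PHRASE_SAFE = set(string.ascii_letters + string.digits + "!*+-/")


def q_encode_phrase_word(text, charset="ISO-8859-1"):
    """Return one "Q"-encoded 'encoded-word' usable as a 'word' in a 'phrase'."""
    pieces = []
    for octet in text.encode(charset):
        ch = chr(octet)
        if ch in PHRASE_SAFE:
            pieces.append(ch)
        elif octet == 0x20:
            pieces.append("_")            # SPACE is written as underscore
        else:
            pieces.append("=%02X" % octet)  # everything else as =XX
    return "=?%s?Q?%s?=" % (charset, "".join(pieces))


print(q_encode_phrase_word("Keld J\xf8rn Simonsen"))
# =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?=
```

The result must still be separated from neighbouring 'word's by 'linear-white-space' when it is placed in the header.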

These are the ONLY locations where an 'encoded-word' may appear. In particular:


The 'encoded-text' in an 'encoded-word' must be self-contained; 'encoded-text' MUST NOT be continued from one 'encoded-word' to another. This implies that the 'encoded-text' portion of a "B" 'encoded-word' will be a multiple of 4 characters long; for a "Q" 'encoded-word', any "=" character that appears in the 'encoded-text' portion will be followed by two hexadecimal characters.

Each 'encoded-word' MUST encode an integral number of octets. The 'encoded-text' in each 'encoded-word' must be well-formed according to the encoding specified; the 'encoded-text' may not be continued in the next 'encoded-word'. (For example, "=?charset?Q?=?= =?charset?Q?AB?=" would be illegal, because the two hex digits "AB" must follow the "=" in the same 'encoded-word'.)
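The two self-containment conditions above (a "B" 'encoded-text' is a multiple of 4 characters long, and every "=" in a "Q" 'encoded-text' is followed by two hexadecimal characters) can be checked mechanically. The following is a minimal sketch. The function name is hypothetical, and accepting lowercase hexadecimal digits is a leniency chosen for this sketch rather than something stated above.

```python
import re

# Base64 alphabet followed by at most two "=" padding characters.
_B_TEXT = re.compile(r"[A-Za-z0-9+/]*={0,2}")
# Either a character other than "=", "?" or white space, or "=" plus two hex digits.
_Q_TEXT = re.compile(r"(?:[^=?\s]|=[0-9A-Fa-f]{2})*")


def encoded_text_is_self_contained(encoding, encoded_text):
    """Return True if 'encoded-text' is complete within a single 'encoded-word'."""
    kind = encoding.upper()
    if kind == "B":
        return (len(encoded_text) % 4 == 0
                and _B_TEXT.fullmatch(encoded_text) is not None)
    if kind == "Q":
        return _Q_TEXT.fullmatch(encoded_text) is not None
    return False  # encodings other than "B" and "Q" are not handled here


# The illegal example from the text: "=" in one word, "AB" in the next.
print(encoded_text_is_self_contained("Q", "="))    # False
print(encoded_text_is_self_contained("Q", "=AB"))  # True
```

Note that the second word of the illegal example, "AB", passes this check on its own. It is the first word, ending in a bare "=", that is malformed.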

Each 'encoded-word' MUST represent an integral number of characters. A multi-octet character may not be split across adjacent 'encoded-word's.
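One way to honour the integral-character rule when a long string has to be spread over several 'encoded-word's is to fill each word one whole character at a time. The sketch below makes several assumptions: the function name b_encode_words is hypothetical, the charset is stateless (such as UTF-8) so that characters can be encoded independently, and each word is limited to 75 characters, the figure used by the recognition rules in section 6.1.

```python
import base64


def b_encode_words(text, charset="UTF-8", limit=75):
    """Split text into "B" 'encoded-word's without splitting any character."""
    overhead = len("=?%s?B??=" % charset)
    # Base64 maps every 3 octets to 4 characters, so this many octets fit.
    max_octets = (limit - overhead) // 4 * 3
    chunks, current = [], b""
    for ch in text:
        piece = ch.encode(charset)  # all octets of one character
        if current and len(current) + len(piece) > max_octets:
            chunks.append(current)  # close the word before the character
            current = b""
        current += piece
    if current:
        chunks.append(current)
    return ["=?%s?B?%s?=" % (charset, base64.b64encode(c).decode("ascii"))
            for c in chunks]


for word in b_encode_words("\u3042" * 20):
    print(len(word), word)
```

Because every chunk ends on a character boundary, each word can be decoded on its own, and the decoded pieces concatenate back to the original text.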


Only printable and white space character data should be encoded using this scheme. However, since these encoding schemes allow the encoding of arbitrary octet values, mail readers that implement this decoding should also ensure that display of the decoded data on the recipient's terminal will not cause unwanted side-effects.


Use of these methods to encode non-textual data (e.g., pictures or sounds) is not defined by this memo. Use of 'encoded-word's to represent strings of purely ASCII characters is allowed, but discouraged. In rare cases it may be necessary to encode ordinary text that looks like an 'encoded-word'.


6. Support of 'encoded-word's by mail readers

6.1. Recognition of 'encoded-word's in message headers

A mail reader must parse the message and body part headers according to the rules in RFC 822 to correctly recognize 'encoded-word's.

'encoded-word's are to be recognized as follows:


  1. Any message or body part header field defined as '*text', or any user-defined header field, should be parsed as follows: Beginning at the start of the field-body and immediately following each occurrence of 'linear-white-space', each sequence of up to 75 printable characters (not containing any 'linear-white-space') should be examined to see if it is an 'encoded-word' according to the syntax rules in section 2. Any other sequence of printable characters should be treated as ordinary ASCII text.

  2. Any header field not defined as '*text' should be parsed according to the syntax rules for that header field. However, any 'word' that appears within a 'phrase' should be treated as an 'encoded-word' if it meets the syntax rules in section 2. Otherwise it should be treated as an ordinary 'word'.


  3. Within a 'comment', any sequence of up to 75 printable characters (not containing 'linear-white-space'), that meets the syntax rules in section 2, should be treated as an 'encoded-word'. Otherwise it should be treated as normal comment text.


  4. A MIME-Version header field is NOT required to be present for 'encoded-word's to be interpreted according to this specification. One reason for this is that the mail reader is not expected to parse the entire message header before displaying lines that may contain 'encoded-word's.
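Rule (1) above, for fields defined as '*text', can be sketched as a simple tokenizer. This is an illustration under stated assumptions: the regular expression is only an approximation of the section 2 syntax (=?charset?encoding?encoded-text?=), and the function name classify_text_field is hypothetical.

```python
import re

# Approximation of the section 2 syntax: three "?"-separated parts, none of
# which contains "?" or white space, wrapped in "=?" and "?=".
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([^?\s]+)\?([^?\s]+)\?=")


def classify_text_field(field_body):
    """Label each white-space-delimited token of a '*text' field body."""
    result = []
    for token in field_body.split():
        # Only sequences of up to 75 characters are examined.
        match = ENCODED_WORD.fullmatch(token) if len(token) <= 75 else None
        if match:
            result.append(("encoded-word", match.groups()))
        else:
            result.append(("text", token))
    return result


for item in classify_text_field("Re: =?ISO-8859-1?Q?a?= (=?ISO-8859-1?Q?b?=)"):
    print(item)
```

The last token is reported as ordinary text: in a '*text' field the parentheses are ordinary characters, so the token does not begin with "=?" and is not an 'encoded-word'.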


6.2. Display of 'encoded-word's

Any 'encoded-word's so recognized are decoded, and if possible, the resulting unencoded text is displayed in the original character set.


NOTE: Decoding and display of encoded-words occurs *after* a structured field body is parsed into tokens. It is therefore possible to hide 'special' characters in encoded-words which, when displayed, will be indistinguishable from 'special' characters in the surrounding text. For this and other reasons, it is NOT generally possible to translate a message header containing 'encoded-word's to an unencoded form which can be parsed by an RFC 822 mail reader.

When displaying a particular header field that contains multiple 'encoded-word's, any 'linear-white-space' that separates a pair of adjacent 'encoded-word's is ignored. (This is to allow the use of multiple 'encoded-word's to represent long strings of unencoded text, without having to separate 'encoded-word's where spaces occur in the unencoded text.)
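The white space rule above can be sketched for a '*text' field as follows. This is a minimal illustration, not a complete decoder: the names decode_word and display_text_field are hypothetical, the pattern only approximates the section 2 syntax, the 75-character limit is not checked, and malformed or unsupported words are not handled (they raise an exception here).

```python
import base64
import re

# Approximation of the section 2 syntax, limited to the "B" and "Q" encodings.
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]+)\?=")


def decode_word(match):
    """Decode one matched 'encoded-word' to text."""
    charset, encoding, payload = match.groups()
    if encoding.upper() == "B":
        octets = base64.b64decode(payload, validate=True)
    else:
        # "Q": "_" stands for SPACE, "=XX" for the octet with hex value XX.
        octets = re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda h: bytes([int(h.group(1), 16)]),
            payload.replace("_", " ").encode("ascii"),
        )
    return octets.decode(charset)


def display_text_field(field_body):
    """Decode a '*text' field body, dropping white space between adjacent words."""
    out, previous_was_encoded = [], False
    for space, token in re.findall(r"(\s*)(\S+)", field_body):
        match = ENCODED_WORD.fullmatch(token)
        if match:
            if not previous_was_encoded:
                out.append(space)  # keep white space after ordinary text
            out.append(decode_word(match))
        else:
            out.append(space + token)
        previous_was_encoded = match is not None
    return "".join(out)


print(display_text_field("=?ISO-8859-1?Q?a?=\r\n   =?ISO-8859-1?Q?b?="))  # ab
print(display_text_field("=?ISO-8859-1?Q?a?= =?ISO-8859-2?Q?_b?="))       # a b
```

White space is kept between an 'encoded-word' and ordinary text, and dropped only when both neighbours are 'encoded-word's.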


In the event other encodings are defined in the future, and the mail reader does not support the encoding used, it may either (a) display the 'encoded-word' as ordinary text, or (b) substitute an appropriate message indicating that the text could not be decoded.


If the mail reader does not support the character set used, it may (a) display the 'encoded-word' as ordinary text (i.e., as it appears in the header), (b) make a "best effort" to display using such characters as are available, or (c) substitute an appropriate message indicating that the decoded text could not be displayed.
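Options (a) and (b) above might be combined as in the following sketch. The function name and its parameters are hypothetical, and the octets are assumed to have been obtained already by undoing the "B" or "Q" encoding.

```python
import codecs


def display_or_fallback(raw_word, charset, octets):
    """Return displayable text for one 'encoded-word'.

    raw_word: the 'encoded-word' exactly as it appears in the header
    charset:  the charset named in the 'encoded-word'
    octets:   the octets recovered from the 'encoded-text'
    """
    try:
        codecs.lookup(charset)
    except LookupError:
        # Option (a): unknown character set, show the word as ordinary text.
        return raw_word
    # Option (b): best effort, substituting a marker for undecodable octets.
    return octets.decode(charset, errors="replace")


print(display_or_fallback("=?X-UNKNOWN?Q?a?=", "X-UNKNOWN", b"a"))
# =?X-UNKNOWN?Q?a?=
```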

If the character set being used employs code-switching techniques, display of the encoded text implicitly begins in "ASCII mode". In addition, the mail reader must ensure that the output device is once again in "ASCII mode" after the 'encoded-word' is displayed.


6.3. Mail reader handling of incorrectly formed 'encoded-word's

It is possible that an 'encoded-word' that is legal according to the syntax defined in section 2, is incorrectly formed according to the rules for the encoding being used. For example:

  1. An 'encoded-word' which contains characters which are not legal for a particular encoding (for example, a "-" in the "B" encoding, or a SPACE or HTAB in either the "B" or "Q" encoding), is incorrectly formed.


  2. Any 'encoded-word' which encodes a non-integral number of characters or octets is incorrectly formed.


A mail reader need not attempt to display the text associated with an 'encoded-word' that is incorrectly formed. However, a mail reader MUST NOT prevent the display or handling of a message because an 'encoded-word' is incorrectly formed.
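The requirement that a malformed 'encoded-word' must not prevent display of the message can be met by falling back to the raw header text. The sketch below covers only the "B" encoding, and the function name safe_b_decode is hypothetical.

```python
import base64
import binascii


def safe_b_decode(raw_word, charset, encoded_text):
    """Decode a "B" 'encoded-word', or return it unchanged if it is malformed."""
    try:
        octets = base64.b64decode(encoded_text, validate=True)
        return octets.decode(charset)
    except (binascii.Error, UnicodeDecodeError, LookupError):
        # Illegal characters, bad padding, a partial multi-octet character,
        # or an unknown charset: keep the message displayable.
        return raw_word


print(safe_b_decode("=?US-ASCII?B?YQ==?=", "US-ASCII", "YQ=="))  # a
print(safe_b_decode("=?US-ASCII?B?Y-Q=?=", "US-ASCII", "Y-Q="))  # =?US-ASCII?B?Y-Q=?=
```

The second call contains "-", which is not legal in the "B" encoding, so the word is shown as it appears in the header instead of raising an error.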


7. Conformance

A mail composing program claiming compliance with this specification MUST ensure that any string of non-white-space printable ASCII characters within a '*text' or '*ctext' that begins with "=?" and ends with "?=" be a valid 'encoded-word'. ("begins" means: at the start of the field-body, immediately following 'linear-white-space', or immediately following a "(" for an 'encoded-word' within '*ctext'; "ends" means: at the end of the field-body, immediately preceding 'linear-white-space', or immediately preceding a ")" for an 'encoded-word' within '*ctext'.) In addition, any 'word' within a 'phrase' that begins with "=?" and ends with "?=" must be a valid 'encoded-word'.


A mail reading program claiming compliance with this specification must be able to distinguish 'encoded-word's from 'text', 'ctext', or 'word's, according to the rules in section 6, anytime they appear in appropriate places in message headers. It must support both the "B" and "Q" encodings for any character set which it supports. The program must be able to display the unencoded text if the character set is "US-ASCII". For the ISO-8859-* character sets, the mail reading program must at least be able to display the characters which are also in the ASCII set.


8. Examples

The following are examples of message headers containing 'encoded-word's:


From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
CC: =?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>
Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
 =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=

Note: In the first 'encoded-word' of the Subject field above, the last "=" at the end of the 'encoded-text' is necessary because each 'encoded-word' must be self-contained (the "=" character completes a group of 4 base64 characters representing 2 octets). An additional octet could have been encoded in the first 'encoded-word' (so that the encoded-word would contain an exact multiple of 3 encoded octets), except that the second 'encoded-word' uses a different 'charset' than the first one.
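The arithmetic in the note can be checked directly. The sketch below decodes the 'encoded-text' of the first 'encoded-word' in the Subject field with Python's standard base64 module. It is offered only as an illustration of the padding rule.

```python
import base64

encoded_text = "SWYgeW91IGNhbiByZWFkIHRoaXMgeW8="
octets = base64.b64decode(encoded_text, validate=True)

print(octets)             # b'If you can read this yo'
print(len(octets))        # 23: seven full groups of 3 octets, plus 2 octets
print(len(encoded_text))  # 32: a multiple of 4 only because of the final "="
```

The last two octets fill three base64 characters, and the trailing "=" completes the group of four, so the 'encoded-word' is self-contained.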

From: =?ISO-8859-1?Q?Olle_J=E4rnefors?= <ojarnef@admin.kth.se>
To: ietf-822@dimacs.rutgers.edu, ojarnef@admin.kth.se
Subject: Time for ISO 10646?

To: Dave Crocker <dcrocker@mordor.stanford.edu>
Cc: ietf-822@dimacs.rutgers.edu, paf@comsol.se
From: =?ISO-8859-1?Q?Patrik_F=E4ltstr=F6m?= <paf@nada.kth.se>
Subject: Re: RFC-HDR care and feeding

From: Nathaniel Borenstein <nsb@thumper.bellcore.com>
To: Greg Vaudreuil <gvaudre@NRI.Reston.VA.US>, Ned Freed
   <ned@innosoft.com>, Keith Moore <moore@cs.utk.edu>
Subject: Test of new header generator
MIME-Version: 1.0
Content-type: text/plain; charset=ISO-8859-1

The following examples illustrate how text containing 'encoded-word's is displayed when it appears in a structured field body. The rules are slightly different for fields defined as '*text' because "(" and ")" are not recognized as 'comment' delimiters. [Section 5, paragraph (1)].


In each of the following examples, if the same sequence were to occur in a '*text' field, the sequences would NOT be treated as encoded words, and the "displayed as" form would be identical to the "encoded form". This is because each of the encoded-words in the following examples is adjacent to a "(" or ")" character.

encoded form                                displayed as
(=?ISO-8859-1?Q?a?=)                        (a)

(=?ISO-8859-1?Q?a?= b)                      (a b)

Within a 'comment', white space MUST appear between an 'encoded-word' and surrounding text. [Section 5, paragraph (2)]. However, white space is not needed between the initial "(" that begins the 'comment', and the 'encoded-word'.


(=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=)     (ab)

White space between adjacent 'encoded-word's is not displayed.


(=?ISO-8859-1?Q?a?=  =?ISO-8859-1?Q?b?=)    (ab)

Even multiple SPACEs between 'encoded-word's are ignored for the purpose of display.

(=?ISO-8859-1?Q?a?=                         (ab)
   =?ISO-8859-1?Q?b?=)

Any amount of 'linear-white-space' between 'encoded-word's, even if it includes a CRLF followed by one or more SPACEs, is ignored for the purposes of display.


(=?ISO-8859-1?Q?a_b?=)                      (a b)

In order to cause a SPACE to be displayed within a portion of encoded text, the SPACE MUST be encoded as part of the 'encoded-word'.


(=?ISO-8859-1?Q?a?= =?ISO-8859-2?Q?_b?=)    (a b)

In order to cause a SPACE to be displayed between two strings of encoded text, the SPACE MAY be encoded as part of one of the 'encoded-word's.


9. References

10. Security Considerations

Security issues are not discussed in this memo.


11. Acknowledgements

The author wishes to thank Nathaniel Borenstein, Issac Chan, Lutz Donnerhacke, Paul Eggert, Ned Freed, Andreas M. Kirchwitz, Olle Jarnefors, Mike Rosin, Yutaka Sato, Bart Schaefer, and Kazuhiko Yamamoto, for their helpful advice, insightful comments, and illuminating questions in response to earlier versions of this specification.

12. Author's Address

Keith Moore
University of Tennessee
107 Ayres Hall
Knoxville TN 37996-1301

EMail: moore@cs.utk.edu

Appendix - changes since RFC 1522 (in no particular order)