# HTML 4, HTML 5, XHTML, MIME types - the definitive resource

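The subject of HTML vs. XHTML, and of XHTML served as text/html vs. XHTML served as XHTML, is quite complex. Unfortunately it is hard to get a complete picture, because the information is mostly scattered around the web in bits and pieces or buried deep in W3C technical jargon. On top of that, some misinformation is in circulation. I propose making this the definitive SO resource on the subject, describing the most important aspects of: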

• HTML 4
• HTML 5
• XHTML 1.0 as text/html, application/xhtml+xml
• XHTML 1.1 as application/xhtml+xml

## Contents

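• Terminology
• Languages and Serializations
• Specifications
• Browser parsers and content (MIME) types
• Browser support
• Validators and Document Type Definitions
• Quirks, Limited Quirks, and Standards modes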

## Terminology

One of the difficulties of describing this clearly is that the terminology within the official specifications has changed over the years since HTML was first introduced. What follows is based on HTML5 terminology. Also, "file" is used as a generic term to mean a file, document, input stream, octet stream, etc., to avoid having to make fine distinctions.

## Languages and Serializations

HTML and XHTML are each defined in terms of a language and a serialization.

The serialization defines how mark-up is used to describe the language's elements and attributes within a text document. This includes which tags are required and which can be inferred, and the rules for those inferences. It also covers such things as how void elements should be marked up (e.g. “>” vs “/>”) and when attribute values need to be quoted.
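To make the difference between the two serializations concrete, here is a minimal Python sketch that feeds the same paragraph, written both ways, to a lenient HTML tokenizer and to a strict XML parser. The helper names (`TagCollector`, `html_start_tags`, `is_well_formed_xml`) are made up for this illustration, and Python's `html.parser` is only a stand-in: it is not a conforming browser parser, it just tolerates the same kind of shorthand.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Records the start tags seen by Python's lenient HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def html_start_tags(markup):
    """Return (tag, attributes) pairs as the HTML tokenizer sees them."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def is_well_formed_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML serialization: unquoted attribute value, void element written as <br>.
html_style = '<p class=intro>Line one<br>Line two</p>'
# XML serialization: quoted attribute value, void element written as <br/>.
xml_style = '<p class="intro">Line one<br/>Line two</p>'

print(html_start_tags(html_style))
print(html_start_tags(xml_style))
print(is_well_formed_xml(html_style), is_well_formed_xml(xml_style))
```

Both strings describe the same elements and attributes, so the HTML tokenizer reports the same tags for each, but only the second one is well-formed XML.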

## Specifications

The HTML 4.01 specification is the current specification that defines both the HTML language and the HTML serialization.

The XML 1.0 specification defines a serialization but leaves the language to be defined by other specifications, which are termed “XML applications”.

The XHTML 1.0 and 1.1 specifications are both in use. Essentially, they use the same language as HTML 4.01 but a different serialization, one that is compatible with the XML 1.0 specification. In other words, XHTML is an XML application.
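Because XHTML is an XML application, a generic XML parser that knows nothing about HTML can read it. The sketch below uses Python's standard `xml.etree.ElementTree` on a small hand-written document; the sample document and the `x` prefix are just illustrative choices. The XHTML namespace URI is the one XHTML documents declare on their root element.

```python
import xml.etree.ElementTree as ET

# Namespace that XHTML documents declare on the root element.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# A tiny XHTML document (doctype omitted to keep the example short).
xhtml_document = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>Example</title></head>'
    '<body><p>Hello<br/>world</p></body>'
    '</html>'
)

# A plain XML parser handles it without any HTML-specific rules.
root = ET.fromstring(xhtml_document)

# Element names are namespace-qualified, as in any other XML application.
title = root.find("x:head/x:title", {"x": XHTML_NS})

print(root.tag)
print(title.text)
```

The same parser would reject the HTML serialization of this page as soon as it met an unclosed `<br>` or an unquoted attribute value.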

The HTML5 specification (a draft as of 2010-04-18) describes a new language for both HTML and XHTML. This language is mostly a superset of the HTML 4.01 language, but where differences arise it is intended to be backward compatible only with existing web tools (e.g. browsers, search engines and authoring tools), not with previous specifications. So the meaning of some elements occasionally differs from the earlier specifications. Similarly, each of the serializations is backward compatible with the current tools.

## Validators and Document Type Definitions

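HTML and XHTML files can start with a Document Type Definition (DTD) declaration, which indicates the language and serialization used in the document. Validators, such as the one at http://validator.w3.org/, use this information to check the language and serialization used in the file against the rules defined in the DTD. They then report errors based on where the mark-up in the file violates the rules in the DTD.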

Not all HTML serialization and language rules can be described in a DTD, so validators only test for a subset of all the rules described by the specifications.

HTML 4.01 and XHTML 1.0 each define Strict, Transitional and Frameset DTDs, which differ in the language elements and attributes that are permitted in compliant files.
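For reference, the six DTDs can be tabulated by their public and system identifiers. The sketch below is only a lookup table plus a small formatter; the names `DOCTYPES` and `doctype_declaration` are invented for this example, while the identifiers themselves are the ones published in the HTML 4.01 and XHTML 1.0 specifications.

```python
# (language, variant) -> (public identifier, system identifier)
DOCTYPES = {
    ("HTML 4.01", "Strict"): (
        "-//W3C//DTD HTML 4.01//EN",
        "http://www.w3.org/TR/html4/strict.dtd",
    ),
    ("HTML 4.01", "Transitional"): (
        "-//W3C//DTD HTML 4.01 Transitional//EN",
        "http://www.w3.org/TR/html4/loose.dtd",
    ),
    ("HTML 4.01", "Frameset"): (
        "-//W3C//DTD HTML 4.01 Frameset//EN",
        "http://www.w3.org/TR/html4/frameset.dtd",
    ),
    ("XHTML 1.0", "Strict"): (
        "-//W3C//DTD XHTML 1.0 Strict//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
    ),
    ("XHTML 1.0", "Transitional"): (
        "-//W3C//DTD XHTML 1.0 Transitional//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
    ),
    ("XHTML 1.0", "Frameset"): (
        "-//W3C//DTD XHTML 1.0 Frameset//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd",
    ),
}


def doctype_declaration(language, variant):
    """Build the doctype declaration for one of the six DTDs above."""
    public_id, system_id = DOCTYPES[(language, variant)]
    # XML is case-sensitive, so XHTML needs the lowercase root element name;
    # HTML 4.01 conventionally writes it in uppercase.
    root_name = "HTML" if language == "HTML 4.01" else "html"
    return f'<!DOCTYPE {root_name} PUBLIC "{public_id}" "{system_id}">'


print(doctype_declaration("HTML 4.01", "Strict"))
print(doctype_declaration("XHTML 1.0", "Transitional"))
```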

## Quirks, Limited Quirks, and Standards modes

Where the DTD is not recognised, the mode is determined by a complex set of rules. One special case is where the public and system identifiers are omitted and the declaration is simply `<!DOCTYPE HTML>`. This is known to be the shortest doctype declaration for which current browsers will render the file in standards mode. For that reason, it is the declaration specified for use in HTML5-compliant files.
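As a rough illustration of how such rules work, here is a heavily simplified Python sketch of doctype-based mode selection. It is an assumption-laden subset, not the real algorithm: it only understands a doctype with no identifiers or with a double-quoted `PUBLIC` identifier, it covers only the HTML 4.01 and XHTML 1.0 Transitional and Frameset identifiers, and it ignores the long list of legacy identifiers that real browsers also map to quirks mode. The names `DOCTYPE_RE` and `rendering_mode` are invented for the example.

```python
import re

# Matches <!DOCTYPE name>, optionally with PUBLIC "public-id" ["system-id"].
# Simplification: SYSTEM-only and single-quoted forms are not handled.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(?P<name>[^\s>]+)'
    r'(?:\s+PUBLIC\s+"(?P<public>[^"]*)"(?:\s+"(?P<system>[^"]*)")?)?'
    r'\s*>',
    re.IGNORECASE,
)

# Public identifier prefixes (lowercased) that select limited quirks mode.
XHTML10_LOOSE = (
    "-//w3c//dtd xhtml 1.0 frameset//",
    "-//w3c//dtd xhtml 1.0 transitional//",
)
# These select limited quirks only when a system identifier is present.
HTML401_LOOSE = (
    "-//w3c//dtd html 4.01 frameset//",
    "-//w3c//dtd html 4.01 transitional//",
)


def rendering_mode(markup):
    """Guess the rendering mode of a text/html file from its doctype."""
    match = DOCTYPE_RE.match(markup.lstrip())
    # No doctype at the start, or a doctype not named "html": quirks mode.
    if match is None or match.group("name").lower() != "html":
        return "quirks"
    public_id = (match.group("public") or "").lower()
    system_id = match.group("system")
    if public_id.startswith(XHTML10_LOOSE):
        return "limited-quirks"
    if public_id.startswith(HTML401_LOOSE):
        return "limited-quirks" if system_id is not None else "quirks"
    # Everything else this sketch recognises, including <!DOCTYPE HTML>.
    return "standards"


print(rendering_mode("<!DOCTYPE HTML><title>t</title>"))
print(rendering_mode("<html><title>t</title></html>"))
```

Note how the HTML 4.01 Transitional doctype lands in a different mode depending on whether the system identifier is present, which is one reason the short `<!DOCTYPE HTML>` form is attractive.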