Emacs GNU Texinfo Problems; Invalid HTML

By Xah Lee. Date:

The Texinfo software produces invalid HTML documents. The following gives some detail.

In around , i created an HTML version of the GNU Emacs Lisp Reference Manual for my website, with corrected HTML and cleaned-up CSS, so that the pages are valid HTML documents and the CSS is handcrafted for better online presentation.

The version can be seen here: GNU Emacs Lisp Reference Manual. The story of my motivation is documented here (warning: rant): A Record of Frustration in IT Industry.

In the process of creating this cleaned-up HTML version, i found several problems in the HTML that the texinfo software generates. Here's a summary. I hope others who want to convert GNU docs to valid HTML might benefit, or that texinfo developers might fix these problems.

Invalid HTML

Problems with texinfo generated HTML, with respect to HTML 4 transitional:

Problems with respect to HTML4 strict:

General HTML issues

Dead Links

In the elisp manual (one node per HTML page, roughly 850 HTML pages), there are 70 (local) links to other GNU documents. The local links are nice in that they provide cross-reference, but if one hosts only the elisp doc, all these local links will be dead.

Therefore, it would be nice to have, perhaps at the texinfo level, a way to mark links that cross-reference external docs, or perhaps at the HTML conversion level, an option to filter local links, so that local links can be replaced with non-links (such as “See Emacs manual node on Abbrev”) or with full http links to the right URI at gnu.org.

The vast majority of the 70 local links in the elisp doc are references to the Emacs doc, but there are 6 that refer to widget, ses, cl, libc.

I presume that people who wish to host a GNU doc do not want to host the entire set of GNU's documentation.
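The link filtering suggested above can be sketched as a small post-processing step over the generated HTML. This is only an illustration, not part of texinfo: it assumes cross-manual links look like `../emacs/Abbrevs.html#Abbrevs`, and the `MANUAL_BASE_URLS` table, the function name `rewrite_local_links`, and the gnu.org URL layout are all assumptions that should be checked before use.

```python
import re

# Hypothetical table: manual directory name -> public base URL.
# The URL layout is an assumption for illustration; verify it against gnu.org.
MANUAL_BASE_URLS = {
    "emacs": "https://www.gnu.org/software/emacs/manual/html_node/emacs/",
}

# Assumed shape of a local cross-manual link in the generated HTML, e.g.
# <a href="../emacs/Abbrevs.html#Abbrevs">Abbrevs</a>
LOCAL_LINK = re.compile(
    r'<a\s+href="\.\./(?P<manual>[\w-]+)/(?P<target>[^"]*)"[^>]*>(?P<text>.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def rewrite_local_links(html: str) -> str:
    """Turn local cross-manual links into full http links or plain text."""

    def replace(match: re.Match) -> str:
        manual = match.group("manual")
        text = match.group("text")
        base = MANUAL_BASE_URLS.get(manual)
        if base is None:
            # No known public URL: degrade to a non-link reference.
            return f"{text} (see the {manual} manual)"
        return f'<a href="{base}{match.group("target")}">{text}</a>'

    return LOCAL_LINK.sub(replace, html)
```

Links within the same manual (such as `href="Lists.html"`) do not start with `../`, so they are left untouched. Manuals missing from the table, for example widget, ses, cl, or libc, fall back to a plain-text reference.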

Use of ASCII

Also, texinfo still uses the convention of backtick ` and straight single quote ' to emulate curly quotes “ ” or ‘ ’. It also uses other ASCII kludges such as => instead of ⇒. The ability to display these chars has been widely available on commercial platforms since the mid 1990s, and on Linuxes since about 2003 or so (emacs itself has supported Unicode to a practical degree since emacs 21, released in 2001). It is perhaps time to update the GNU doc convention to utf8 and use the proper characters.
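As a rough sketch of what such an update could look like on already generated text, the following converts the ASCII quoting convention to Unicode characters. The function name `modernize_ascii` is hypothetical, and the choice of ⇒ as the replacement for => is an assumption. It is meant for prose only: Lisp code samples use backquote and quote as real syntax and should be skipped.

```python
import re

# ``text'' -> “text”
DOUBLE_QUOTED = re.compile(r"``(.+?)''")
# `text' -> ‘text’ (no backtick, quote, or newline inside)
SINGLE_QUOTED = re.compile(r"`([^`'\n]+)'")


def modernize_ascii(text: str) -> str:
    """Replace ASCII quote emulation and => with Unicode characters."""
    # Handle the doubled form first so it is not mistaken for nested single quotes.
    text = DOUBLE_QUOTED.sub(lambda m: f"\u201c{m.group(1)}\u201d", text)
    text = SINGLE_QUOTED.sub(lambda m: f"\u2018{m.group(1)}\u2019", text)
    # Assumed target character for the => kludge: U+21D2 RIGHTWARDS DOUBLE ARROW.
    return text.replace("=>", "\u21d2")


print(modernize_ascii("`car' returns ``the first'' element => a"))
```

A lone straight quote with no preceding backtick, as in `(car '(a b))`, is left unchanged.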

Emacs Modernization