Emacs Flaw: Elisp Syntax Table

By Xah Lee.

Emacs syntax table is truly, really, greatly annoying. We need to write a parsing lib that doesn't use syntax table.

Every time I think of using it and relying on it, I run into a bunch of problems.

For example, I recently rewrote a command to select text between quotes. It's based on syntax table.

(defun my-select-text-in-quote-1 ()
  "Select text between ASCII quotes, APOSTROPHE or QUOTATION MARK delimited."
  (interactive)
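  (let (p1 p2)
    ;; (nth 3 (syntax-ppss)) is non-nil when point is inside a string,
    ;; as judged by the current buffer's syntax table.
    (if (nth 3 (syntax-ppss))
        (progn
          ;; Move to the opening quote, then jump over the whole string.
          (backward-up-list 1 "ESCAPE-STRINGS" "NO-SYNTAX-CROSSING")
          (setq p1 (point))
          (forward-sexp 1)
          (setq p2 (point))
          ;; Select the text between the quotes, excluding the quotes.
          (goto-char (1+ p1))
          (set-mark (1- p2)))
      (user-error "Cursor not inside quote"))))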

(Note: backward-up-list changed in Emacs 24.x. I'm using 24.4.)

Try the command in nxml-mode. It doesn't work. Why? Probably because nxml-mode makes complex use of syntax table.

But syntax table is useful, no? No. It's not that Emacs syntax table is useful. It's the underlying builtin Emacs parsing lib that's useful.

For example, the earlier version of the select-text-in-quote command looked like this, without using syntax table:

(defun my-select-text-in-quote-2 ()
  "Select text between the nearest left and right delimiters.
Delimiters are paired characters: ()[]{}<>«»“”‘’「」, including \"\"."
  (interactive)
  (let (b1 b2)
    (skip-chars-backward "^<>(“{[「«\"‘")
    (setq b1 (point))
    (skip-chars-forward "^<>)”}]」»\"’")
    (setq b2 (point))
    (set-mark b1)))

The problem was that it couldn't deal with nested quotes or backslash-escaped quotes. Nor could it deal with APOSTROPHE-delimited strings, as used in {Python, Ruby, HTML, etc}. (If you include APOSTROPHE as a delimiter, it's a problem because the same character is also used as an apostrophe, which happens often (e.g. “it's so!”).) But at least, for any QUOTATION MARK delimited string that doesn't contain backslash-escaped quotes, it always works, reliably.

So, the solution is either to add perhaps 50 lines of code to do the parsing, or to rely on Emacs's builtin parser. (See: Parsing Expressions (ELISP Manual).)
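To give a rough idea of what the first option might look like, here is a minimal sketch of a scanner that never consults syntax table. It is not the command from this article: the name my-quote-bounds-at is hypothetical, and it assumes that only QUOTATION MARK delimits strings, that backslash escapes the next character inside a string, and that quotes are balanced from the start of the buffer. It knows nothing about comments.

```elisp
(defun my-quote-bounds-at (pos)
  "Return (START . END) of the text inside the double-quoted string around POS.
Return nil if POS is not inside such a string.
Hypothetical sketch: scans characters directly and ignores syntax table."
  (save-excursion
    (goto-char (point-min))
    (let ((start nil) (end nil))
      ;; Pass 1: walk from buffer start to POS, tracking whether we are
      ;; inside a string.  START is non-nil exactly when we are.
      (while (and (< (point) pos) (not (eobp)))
        (let ((ch (char-after)))
          (cond
           ;; Inside a string, backslash escapes the next character.
           ((and start (eq ch ?\\))
            (forward-char (min 2 (- (point-max) (point)))))
           ;; An unescaped quote opens or closes a string.
           ((eq ch ?\")
            (setq start (if start nil (1+ (point))))
            (forward-char 1))
           (t (forward-char 1)))))
      ;; Pass 2: if inside a string, find the closing unescaped quote.
      (when start
        (while (and (not end) (not (eobp)))
          (let ((ch (char-after)))
            (cond
             ((eq ch ?\\)
              (forward-char (min 2 (- (point-max) (point)))))
             ((eq ch ?\") (setq end (point)))
             (t (forward-char 1)))))
        (and end (cons start end))))))
```

A selecting command could then call (my-quote-bounds-at (point)), go to the START position and set the mark at END. Because nothing here depends on the buffer's major mode, the result is the same in every buffer holding the same text, at the cost of scanning from the beginning of the buffer each time.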

If you rely on the Emacs parsing engine (such as the syntax-ppss function), it saves you the time of writing a parser. But it relies on syntax table, meaning your command's behavior is unpredictable: it depends on each buffer's or major mode's syntax table.

Theoretically, the idea of syntax table is useful, because each major mode can have its own concept of quote, suited to the language the major mode is designed for. Great. But in reality, it doesn't work out that way, as the nxml-mode example shows. (nxml-mode is written by James Clark, the world's top XML expert and also a top Emacs Lisp expert, who also wrote the classic html-mode and xml-mode in the 1990s, all part of Emacs.)

All these HTML modes make complex use of syntax table, because syntax table is not flexible. For example, in HTML the APOSTROPHE isn't normally a quoting character, except when inside a tag (and not already inside a quote):
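The dependence on syntax table can be seen in a small experiment. The sketch below parses the same text at the same position under two syntax tables that differ only in how they classify QUOTATION MARK. The helper name my-inside-string-p and both tables are made up for illustration. It calls parse-partial-sexp directly, which is the lower-level function that syntax-ppss is built on.

```elisp
(defun my-inside-string-p (text pos table)
  "Return t if buffer position POS in TEXT is inside a string under syntax TABLE.
Hypothetical helper for illustration only."
  (with-temp-buffer
    (insert text)
    (with-syntax-table table
      ;; Element 3 of the parse state is non-nil when inside a string.
      (and (nth 3 (parse-partial-sexp (point-min) pos)) t))))

(let ((quote-table (make-syntax-table))
      (punct-table (make-syntax-table)))
  ;; In one table, QUOTATION MARK is a string delimiter.
  (modify-syntax-entry ?\" "\"" quote-table)
  ;; In the other, it is plain punctuation.
  (modify-syntax-entry ?\" "." punct-table)
  (list (my-inside-string-p "say \"hi\" now" 7 quote-table)
        (my-inside-string-p "say \"hi\" now" 7 punct-table)))
;; evaluates to (t nil)
```

Position 7 is between the h and the i of "hi". Same text, same position, opposite answers. A command built on syntax-ppss inherits this: whether it considers the cursor to be inside a quote is decided by whatever table the current major mode installed.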

<div class='wow' title="it's">complex</div>

Emacs's syntax table is mostly designed for just 3 languages of the 1990s: {C, Lisp, TeX}.

https://plus.google.com/113859563190964307534/posts/PWchD5VtnpZ