Xah Lee, 2007-08, 2009-08-20, 2011-01-13
Emacs's regex is not based on Perl or Python's, but is very similar. In emacs regex, the parenthesis characters () are
literal. If you want to capture a pattern, you need to escape the
paren like this: \(myPattern\).
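A minimal sketch of that difference, using the built-in `string-match` and `match-string`. The sample strings are made up for illustration, and because the patterns are written as elisp strings here, each backslash is doubled.

```elisp
;; Unescaped parens match literal "(" and ")" characters.
(string-match "(abc)" "x(abc)y")   ; returns 1, the index of "("
(string-match "(abc)" "xabcy")     ; returns nil: the text has no parens

;; Escaped parens form a capture group.
(let ((text "xabcy"))
  (when (string-match "\\(abc\\)" text)
    (match-string 1 text)))        ; returns "abc"
```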
Here are some common patterns:
| Pattern | Matches |
|---|---|
| . | any single character except newline |
| \. | one period |
| [0-9]+ | sequence of digits |
| [A-Za-z]+ | sequence of letters |
| [-A-Za-z0-9]+ | sequence of letter, digit, hyphen |
| [_A-Za-z0-9]+ | sequence of letter, digit, underscore |
| [-_A-Za-z0-9]+ | sequence of letter, digit, hyphen, underscore |
| [[:blank:]]+ | sequence of tabs and spaces |
| [[:upper:]]+ | sequence of cap letters |
| [[:lower:]]+ | sequence of lowercase letters |
| "\([^"]+?\)" | capture text between double quotes (non-greedy) |
| “\([^”]+?\)” | capture text between curly double quotes (non-greedy; unicode char) |
| (\([^)]+?\)) | capture text between parentheses (non-greedy) |
| + | means match previous pattern 1 or more times |
| * | means match previous pattern 0 or more times |
| ? | means match previous pattern 0 or 1 time |
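
To see a few of these patterns in action, here is a small sketch that applies three of them to one string with `string-match`. The sample text is hypothetical, and the patterns are written as elisp strings, so backslashes are doubled and the double quote is escaped.

```elisp
;; Hypothetical sample text, used only for illustration.
(let ((text "order-42 said \"hello world\" at 10:30"))
  (list
   ;; [0-9]+ : the first run of digits
   (and (string-match "[0-9]+" text) (match-string 0 text))
   ;; [-A-Za-z0-9]+ : letters, digits, and hyphens
   (and (string-match "[-A-Za-z0-9]+" text) (match-string 0 text))
   ;; "\([^"]+?\)" : text between double quotes, captured as group 1
   (and (string-match "\"\\([^\"]+?\\)\"" text) (match-string 1 text))))
;; returns ("42" "order-42" "hello world")
```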
If you are familiar with Perl's regex, here are some practical major differences.
- To capture a pattern, the parens must be escaped. Use \(pattern\) to capture instead.
- Avoid [A-z], as its meaning is currently ambiguous. Use [A-Za-z].
- Perl's \d and \s do not work: \d is not defined, and \s introduces a syntax class (e.g. \s- for whitespace). (\w does match a word character in emacs too.) Use [[:digit:]], [[:word:]], [[:space:]] instead. For example, Perl's \d+ for one or more digits is emacs's [[:digit:]]+. (Yes, you need double brackets.)
- When typing a regex at an interactive prompt, you cannot use \t, \n. To enter a literal Tab, press 【Ctrl+q Tab】. To match a new line, press 【Ctrl+q Ctrl+j】. (For explanation, see: Emacs's Key Notations Explained (/r, ^M, C-m, RET, <return>, M-, meta)) In an elisp string, you can use \t, \n, with no need to double the backslash.
- A newline in the buffer is always \n (Line Feed; ASCII 10). You do not need to worry whether the file has Unix, Windows, or Mac style line endings. Also, you should NOT change a file's eol style by doing find & replace. (See: Emacs Line Return and Windows, Unix, Mac, All That ^M ^J ^L.)

Emacs has an interactive regex mode that shows matches as you type. To go into the mode, call “regexp-builder”. Alternatively, you can call “query-replace-regexp” to test your pattern.
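The character-class and tab/newline differences from Perl can also be checked from elisp. The following is a rough sketch with made-up sample strings. It only covers the elisp-string case, where \t and \n are written with a single backslash.

```elisp
;; Perl's \d+ is written [[:digit:]]+ in emacs.
(let ((text "room 101"))
  (and (string-match "[[:digit:]]+" text)
       (match-string 0 text)))            ; returns "101"

;; Inside an elisp string, \t and \n are the tab and newline characters
;; themselves, so they take a single backslash.
(string-match "a\tb" "a\tb")              ; returns 0

(let ((text "line one\nline two"))
  (and (string-match "one\n\\(line\\)" text)
       (match-string 1 text)))            ; returns "line"
```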
To test a regex in your elisp code, you can open an empty file and place the regex function at the top and the text you want to match below it, like this:
(search-forward-regexp "yourRegex")
whatever text here
Then, put your cursor to the right of the closing parenthesis and call “eval-last-sexp”. If your regex matches, the cursor moves to the end of the matched text. If you get a lisp error saying search failed, then your regex didn't match. If you get a lisp syntax error, then you probably screwed up on the backslashes.
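The same kind of check can be done without moving the cursor around by hand. Here is a minimal sketch that runs the search in a temporary buffer. The helper name `my-regex-test` is hypothetical and not part of emacs; `with-temp-buffer`, `search-forward-regexp`, and `match-string` are standard.

```elisp
(defun my-regex-test (regex text)
  "Return the text matched by REGEX in TEXT, or nil if there is no match.
`my-regex-test' is a hypothetical helper name, not part of emacs."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; A non-nil third argument makes the search return nil on failure
    ;; instead of signaling a "search failed" error.
    (when (search-forward-regexp regex nil t)
      (match-string 0))))

(my-regex-test "[0-9]+" "whatever text 123 here")   ; returns "123"
(my-regex-test "[0-9]+" "no digits here")           ; returns nil
```

Returning nil instead of signaling an error makes it easier to try several patterns in a row.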
In a lisp function that takes a regex string (e.g. “search-forward-regexp”), you will need to use double backslashes. This is because, in an elisp string, a backslash needs to be prefixed with a backslash; the string is interpreted first, and the result is then passed to emacs's regex engine.
For example, suppose you have this text:
Sin[x] + Sin[y]
and you need to capture the x or y. You can use:
(search-backward-regexp "\\[\\([a-z]\\)\\]")
The regex engine really just got:
\[\([a-z]\)\]
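As a sketch of how the double backslashes play out in practice, the loop below collects each letter found inside square brackets in that text. The capture group is placed inside the escaped brackets so that group 1 holds just the letter. The variable names are illustrative only.

```elisp
;; The elisp reader turns "\\[" into the two characters \[
;; before the regex engine ever sees the pattern.
(length "\\[")                      ; returns 2

(let ((text "Sin[x] + Sin[y]")
      (start 0)
      (found '()))
  ;; Keep searching from the end of the previous match.
  (while (string-match "\\[\\([a-z]\\)\\]" text start)
    (push (match-string 1 text) found)
    (setq start (match-end 0)))
  (nreverse found))
;; returns ("x" "y")
```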