Xah Lee, 2010-08, 2011-02, 2011-03-14
This article shows a convenient emacs lisp package 〔xfrp_find_replace_pairs.el〕 for doing multi-pair find & replace.
You have a given region in a buffer. You want to apply more than one find & replace pair to it. For example:
HTML entities:
& ⟷ &amp;
< ⟷ &lt;
> ⟷ &gt;
URL percent encoding:
“ ” ⟷ “%20”
~ ⟷ %7e
, ⟷ %2c
For more examples, see: Emacs Lisp Multi-Pair Find & Replace Applications.
The normal idiom to do find replace in a region is like this:
(defun replace-html-chars-region (start end)
  "Replace “<” to “&lt;” and some other chars in HTML.
This works on the current region."
  (interactive "r")
  (save-restriction
    (narrow-to-region start end)
    ;; “&” must be done first, else the “&” in “&lt;” would be replaced again
    (goto-char (point-min))
    (while (search-forward "&" nil t) (replace-match "&amp;" nil t))
    (goto-char (point-min))
    (while (search-forward "<" nil t) (replace-match "&lt;" nil t))
    (goto-char (point-min))
    (while (search-forward ">" nil t) (replace-match "&gt;" nil t))))
Basically, you narrow to region, and for each pair you use a while loop. This is quite cumbersome.
It would be nicer if you could write it like this:
(defun replace-html-chars-region (start end)
  "Replace “<” to “&lt;” and some other chars in HTML.
This works on the current region."
  (interactive "r")
  (replace-pairs-region start end
                        '(["&" "&amp;"]
                          ["<" "&lt;"]
                          [">" "&gt;"])))
I wrote an elisp package that solves this problem. It can be downloaded at: code.google.com xfrp_find_replace_pairs.el.
Here are several elisp functions that make this easy.
For each function, there's a plain text version and a regex version.
Each function also has a string version and a region version. The string version works on a given string; the region version works on a region in a buffer. This saves you from doing string/region conversion.
The region versions call the string versions to do their work. This makes the code more manageable.
Both string versions call the built-in elisp function “replace-regexp-in-string” to do their work.
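The layering described above, where a region version delegates to a string version, can be sketched roughly as follows. This is not the package's code. Both function names are hypothetical, and the string helper here is a stand-in that uses a different technique from the package: it matches all find strings in one pass with “regexp-opt” and looks up each match in a table.

```elisp
(defun my-replace-pairs-in-string-sketch (str pairs)
  "Hypothetical stand-in for a string version (not the package's code).
Replace in STR each find string of PAIRS by its replacement, in one pass.
Find strings are taken literally and matched case sensitively."
  (let ((case-fold-search nil)
        ;; alist of (FIND . REPLACEMENT) built from the vector pairs
        (table (mapcar (lambda (pair) (cons (elt pair 0) (elt pair 1)))
                       pairs)))
    (replace-regexp-in-string
     (regexp-opt (mapcar #'car table))
     ;; called with the matched text; returns its replacement
     (lambda (match) (cdr (assoc match table)))
     str t t)))

(defun my-replace-pairs-region-sketch (start end pairs)
  "Hypothetical region version: replace PAIRS between START and END.
It extracts the region text, calls the string version, and swaps the text."
  (let ((new-text (my-replace-pairs-in-string-sketch
                   (buffer-substring-no-properties start end)
                   pairs)))
    (save-excursion
      (delete-region start end)
      (goto-char (min start end))
      (insert new-text))))
```

With this split, only the string function contains replacement logic. The region function is just glue between buffer text and strings.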
The code is 130 lines (not counting comment header). Here's the main code that does the bulk of the work.
(defun replace-pairs-in-string (str pairs)
  "Replace string STR by find & replace PAIRS sequence.

Example:
 (replace-pairs-in-string \"abcdef\"
  '([\"a\" \"1\"] [\"b\" \"2\"] [\"c\" \"3\"])) ⇒ “\"123def\"”.

The search strings are not case sensitive.
The replacements are literal and case sensitive.
If you want search strings to be case sensitive, set
case-fold-search to nil. Like this:
 (let ((case-fold-search nil)) (replace-pairs-in-string …))

Once a substring in the input string is replaced, that part is not
changed again. For example, if the input string is “abcd”, and the
pairs are a → c and c → d, then, result is “cbdd”, not “dbdd”.
See also `replace-pairs-in-string-recursive'.

This function calls `replace-regexp-in-string' to do its work.

See also `replace-regexp-pairs-in-string'."
  (let (ii (mystr str) (randomStrList '()))
    (random t) ; set a seed
    ;; generate a random string list for intermediate replacement
    (setq ii 0)
    (while (< ii (length pairs))
      (setq randomStrList
            ;; use rarely used unicode char to prevent match in input string
            (cons (concat "ㄓ" (number-to-string (random)) "ㄘ")
                  randomStrList))
      (setq ii (1+ ii)))
    ;; replace each find string by corresponding item in random string list
    (setq ii 0)
    (while (< ii (length pairs))
      (setq mystr (replace-regexp-in-string
                   (regexp-quote (elt (elt pairs ii) 0))
                   (elt randomStrList ii)
                   mystr t t))
      (setq ii (1+ ii)))
    ;; replace each random string by corresponding replacement string
    (setq ii 0)
    (while (< ii (length pairs))
      (setq mystr (replace-regexp-in-string
                   (elt randomStrList ii)
                   (elt (elt pairs ii) 1)
                   mystr t t))
      (setq ii (1+ ii)))
    mystr))
One interesting issue about multiple find & replace is feedback: if the pairs are applied one after another, the output of an earlier pair can be matched by a later pair, and you may end up with a substring that's not in the original input string nor in any of the find & replace pairs.
For example, if the input string is “abcd”, and the pairs are “a → c” and “c → d”, then the result is “dbdd”, though most of the time you want “cbdd”. This is especially important if you use regex in your find string.
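The feedback effect in this example can be reproduced with a few lines of elisp. The helper name below is made up for illustration and is not part of the package. It simply applies each pair in turn, which is what the “while loop per pair” idiom amounts to.

```elisp
(defun my-replace-pairs-sequentially (str pairs)
  "Apply each pair of PAIRS to STR, one after another.
Hypothetical helper, for illustration only.  A later pair sees the
output of earlier pairs, so feedback can happen."
  (let ((case-fold-search nil))
    (dolist (pair pairs str)
      (setq str (replace-regexp-in-string
                 (regexp-quote (elt pair 0)) ; find string, taken literally
                 (elt pair 1)                ; replacement string
                 str t t)))))

(my-replace-pairs-sequentially "abcd" '(["a" "c"] ["c" "d"]))
;; ⇒ "dbdd"  (the “c” produced by the first pair is changed by the second)
```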
The function “replace-pairs-in-string” does not do this feedback loop. It guarantees that a replacement is done IF AND ONLY IF the original input string contains a substring that matches one of your find strings.
This is important when you do complex text processing such as transforming HTML4 to HTML5 or HTML to XHTML.
For a version that does feedback, use “replace-pairs-in-string-recursive”, also in the package.
To implement the non-feedback version, I first replace each find string with an intermediate random string. For example, suppose the input pairs are “a → b” and “c → d”. Then, the code will actually do this:
a → random string 1
c → random string 2
random string 1 → b
random string 2 → d
The random strings so generated should not occur in the input string. This is achieved by using rarely used Unicode chars plus a random number for the intermediate string.
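The two-phase idea can be condensed into a short sketch. This is not the package's code. The function name is hypothetical, and the placeholder format (a counter plus a random number between the two rare chars) is my own assumption, chosen so that two placeholders can never be equal.

```elisp
(defun my-replace-pairs-no-feedback (str pairs)
  "Replace in STR each find string of PAIRS, without feedback.
Hypothetical sketch.  Each find string is first replaced by a unique
placeholder, then each placeholder by the real replacement."
  (let* ((case-fold-search nil)
         (index 0)
         ;; one placeholder per pair: rare chars around a counter and a
         ;; random number (this format is an assumption of the sketch)
         (placeholders
          (mapcar (lambda (_pair)
                    (setq index (1+ index))
                    (format "ㄓ%d-%dㄘ" index (random 1000000)))
                  pairs)))
    ;; phase 1: find string → placeholder
    (dotimes (i (length pairs))
      (setq str (replace-regexp-in-string
                 (regexp-quote (elt (elt pairs i) 0))
                 (elt placeholders i)
                 str t t)))
    ;; phase 2: placeholder → replacement string
    (dotimes (i (length pairs))
      (setq str (replace-regexp-in-string
                 (regexp-quote (elt placeholders i))
                 (elt (elt pairs i) 1)
                 str t t)))
    str))

(my-replace-pairs-no-feedback "abcd" '(["a" "c"] ["c" "d"]))
;; ⇒ "cbdd"
```

Like any placeholder scheme, this sketch assumes that no find string matches inside a placeholder. A find string made of digits, for example, could match the number part.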
For about 10 examples of using multi-pair find & replace, see: Emacs Lisp Multi-Pair Find & Replace Applications.
Emacs is fantastic!