
Emacs Lisp: Multi-Pair String Replacement: xfrp_find_replace_pairs.el


Xah Lee, 2010-08, 2011-02, 2011-03-14

This article presents an Emacs Lisp package, 〔xfrp_find_replace_pairs.el〕, for doing multi-pair find & replace.

Examples of Multi-Pair Replacement

Suppose you have a region in a buffer and you want to apply more than one find & replace pair to it. For example:

HTML entities:

URL percentage encoding:
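As an illustration of the two cases above, the pairs could be written as lists of two-element vectors, which is the form used by the functions later in this article. This is only a sketch: the variable names are hypothetical, and the URL list covers just a few characters.

```elisp
;; Illustrative pair lists. The variable names are hypothetical and
;; are not part of the package.
(defvar my-html-entity-pairs
  '(["&" "&amp;"]
    ["<" "&lt;"]
    [">" "&gt;"])
  "Find/replace pairs for escaping HTML special characters.")

(defvar my-url-percent-pairs
  '(["%" "%25"]
    [" " "%20"]
    ["#" "%23"]
    ["?" "%3F"])
  "Find/replace pairs for percent-encoding a few URL characters.")
```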

For more examples, see: Emacs Lisp Multi-Pair Find & Replace Applications.

Standard Elisp Solution for Multi-Pair Replacement

The normal idiom for doing find & replace in a region looks like this:

(defun replace-html-chars-region (start end)
  "Replace “<” to “&lt;” and some other chars in HTML.
This works on the current region."
  (interactive "r")
  (save-restriction 
    (narrow-to-region start end)

    (goto-char (point-min))
    (while (search-forward "&" nil t) (replace-match "&amp;" nil t))

    (goto-char (point-min))
    (while (search-forward "<" nil t) (replace-match "&lt;" nil t))

    (goto-char (point-min))
    (while (search-forward ">" nil t) (replace-match "&gt;" nil t))
    ) )

Basically, you narrow to the region, and for each pair you write a while loop. This is quite cumbersome.

It would be nicer if you could write it like this:

(defun replace-html-chars-region (start end)
  "Replace “<” to “&lt;” and some other chars in HTML.
This works on the current region."
  (interactive "r")
  (replace-pairs-region start end
 '(
 ["&" "&amp;"]
 ["<" "&lt;"]
 [">" "&gt;"]
 )
 ))
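A naive way to get a function of that shape is to wrap the standard idiom in a loop over the pairs. The sketch below does that under a hypothetical name, my-naive-replace-pairs-region. It is not the package's implementation: it applies the pairs one after another, so a later pair can match text inserted by an earlier one.

```elisp
(defun my-naive-replace-pairs-region (start end pairs)
  "Between START and END, apply each [FIND REPLACEMENT] in PAIRS in order.
Hypothetical sketch. Later pairs can match text inserted by earlier ones."
  (save-excursion
    (save-restriction
      (narrow-to-region start end)
      (dolist (pair pairs)
        ;; restart from the top of the narrowed region for each pair
        (goto-char (point-min))
        (while (search-forward (elt pair 0) nil t)
          ;; t t: keep the replacement's case as written, insert it literally
          (replace-match (elt pair 1) t t))))))
```

With the three HTML pairs above, the order matters: “&” has to come first, otherwise the “&” in “&lt;” would itself be escaped.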

Emacs Lisp Package: xfrp_find_replace_pairs.el

I wrote an elisp package that solves this problem. It can be downloaded at: code.google.com xfrp_find_replace_pairs.el.

Here are several elisp functions that make this easy.

For each function, there's a plain text version and a regex version.

Each function also has a string and region version. The string version works on a given string, the region version works on a region in buffer. This saves you from doing string/region conversion.

Some Discussion of Implementation

The region versions call the string versions to do their work. This makes the code more manageable.
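The delegation from region to string can be sketched roughly as follows. The helper name my-apply-to-region and its function argument are assumptions made for this illustration. The package's region functions may be written differently.

```elisp
(defun my-apply-to-region (start end string-fn)
  "Replace the text between START and END with STRING-FN applied to it.
STRING-FN takes one string and returns a string.
Hypothetical helper, not part of the package."
  (let* ((old (buffer-substring-no-properties start end))
         (new (funcall string-fn old)))
    ;; only touch the buffer when something actually changed
    (unless (string= old new)
      (save-excursion
        (delete-region start end)
        (goto-char (min start end))
        (insert new)))))
```

A region version of a pair replacement would then amount to a call such as (my-apply-to-region start end (lambda (s) (replace-pairs-in-string s pairs))).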

Both string versions call the built-in elisp function “replace-regexp-in-string” to do their work.

Main Code of replace-pairs-in-string

The code is 130 lines (not counting comment header). Here's the main code that does the bulk of the work.

(defun replace-pairs-in-string (str pairs)
  "Replace string STR by find & replace PAIRS sequence.

Example:
 (replace-pairs-in-string \"abcdef\"
  '([\"a\" \"1\"] [\"b\" \"2\"] [\"c\" \"3\"]))  ⇒ “\"123def\"”.

The search strings are not case sensitive.
The replacements are literal and case sensitive.

If you want search strings to be case sensitive, set
case-fold-search to nil. Like this:

 (let ((case-fold-search nil))
   (replace-pairs-in-string …))

Once a substring in the input string is replaced, that part is not changed again.
For example, if the input string is “abcd”, and the pairs are
a → c and c → d, then, result is “cbdd”, not “dbdd”.
See also `replace-pairs-in-string-recursive'.

This function calls `replace-regexp-in-string' to do its work.

See also `replace-regexp-pairs-in-string'."
  (let (ii (mystr str) (randomStrList '()))
    (random t) ; set a seed

    ;; generate a random string list for intermediate replacement
    (setq ii 0)
    (while (< ii (length pairs))
      (setq randomStrList (cons
                    (concat "ㄓ" (number-to-string (random)) "ㄘ")
 ; use rarely used unicode char to prevent match in input string
                    randomStrList ))
      (setq ii (1+ ii))
      )

    ;; replace each find string by corresponding item in random string list
    (setq ii 0)
    (while (< ii (length pairs))
      (setq mystr (replace-regexp-in-string
                   (regexp-quote (elt (elt pairs ii) 0))
                   (elt randomStrList ii)
                   mystr t t))
      (setq ii (1+ ii))
      )

    ;; replace each random string by corresponding replacement string
    (setq ii 0)
    (while (< ii (length pairs))
      (setq mystr (replace-regexp-in-string
                   (elt randomStrList ii)
                   (elt (elt pairs ii) 1)
                   mystr t t))
      (setq ii (1+ ii))
      )
    
    mystr))

Find & Replace Feedback Loop Problem

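One interesting issue with multi-pair find & replace is that, when the pairs are applied one after another, the input string is in effect recursively replaced: text inserted by an earlier pair can be matched again by a later pair, and you may end up with a result that no single pair would produce from the original input string.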

For example, if the input string is “abcd”, and the pairs are “a → c” and “c → d”, then the result of replacing one pair after another is “dbdd”, though most of the time you want “cbdd”. This is especially important if you use regex in your find string.
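The feedback effect is easy to reproduce with a plain sequential loop. The helper below, under the hypothetical name my-sequential-replace, is a minimal sketch for demonstration only and is not taken from the package.

```elisp
(defun my-sequential-replace (str pairs)
  "Apply each [FIND REPLACEMENT] in PAIRS to STR, one after another.
Hypothetical helper that shows the feedback effect."
  (let ((case-fold-search nil))
    (dolist (pair pairs str)
      ;; each pass sees the output of the previous pass
      (setq str (replace-regexp-in-string
                 (regexp-quote (elt pair 0)) (elt pair 1) str t t)))))

(my-sequential-replace "abcd" '(["a" "c"] ["c" "d"])) ; ⇒ "dbdd"
```

The first pass turns “abcd” into “cbcd”, and the second pass then replaces both occurrences of “c”, including the one that came from “a”.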

The function “replace-pairs-in-string” does not do this feedback loop. It guarantees that a replacement is done IF AND ONLY IF the original input string contains a substring that matches one of your find strings.

This is important when you do complex text processing such as transforming HTML4 to HTML5 or HTML to XHTML.

For a version that does feedback, use “replace-pairs-in-string-recursive”, also in the package.

To implement the non-feedback version, I first replace each find string with an intermediate random string. For example, suppose the input pairs are “a → b” and “c → d”. Then, the code will actually do this:

a → random string 1
c → random string 2
random string 1 → b
random string 2 → d

The random strings so generated should not occur in the input string. This is achieved by building each intermediate string from rarely used Unicode characters plus a random number.
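If you want more than “should not occur”, one option is to test each candidate against the input and retry on a collision. The function below is a sketch of that idea under a hypothetical name. The package code shown above does not do this check.

```elisp
(defun my-make-placeholder (str)
  "Return a placeholder string that does not occur in STR.
Hypothetical helper. It retries until no collision is found."
  (let ((candidate (format "ㄓ%dㄘ" (random 100000000))))
    ;; regexp-quote makes the search a literal substring test
    (while (string-match-p (regexp-quote candidate) str)
      (setq candidate (format "ㄓ%dㄘ" (random 100000000))))
    candidate))
```

This only checks the input string. It does not guard against a find string that matches inside a placeholder, for example a find string consisting of a single digit.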

Applications

For about 10 examples of using multi-pair find & replace, see: Emacs Lisp Multi-Pair Find & Replace Applications.

Emacs is fantastic!
