ELisp: Text Processing, Transforming Page Tag

By Xah Lee.

This page shows an example of using Emacs Lisp for text processing: updating the navigation bar in a set of HTML pages.

Problem

You have hundreds of HTML pages that have a nav bar like this:

<div class="pages">Goto Page:
<a href="1.html">1</a>,
<a href="2.html">2</a>,
<a href="3.html">3</a>,
…
</div>

It looks like this in browser (with CSS):

[image: page tag 1]

This is the page navigation bar. Note that the page contains a link to itself.

You want to remove the self-link. The result should look like this:

<div class="pages">Goto Page:
1,
<a href="2.html">2</a>,
<a href="3.html">3</a>,
…
</div>

[image: page tag 2]

Solution

Here are the steps we need to do for each file:

  1. open the file.
  2. move cursor to the beginning of the page navigation string.
  3. move cursor to the file's own name inside the nav bar (the self-link).
  4. call sgml-delete-tag to remove the anchor tag. (sgml-delete-tag is from html-mode)
  5. save file.
  6. close buffer.

We begin by writing test code that processes a single file.

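(defun my-process-file-navbar (fPath)
  "Remove the self-link from the page nav bar of the HTML file at FPATH."
  (let (fName myBuffer)
    (setq fName (file-name-nondirectory fPath))
    (setq myBuffer (find-file fPath)) ; opening a .html file turns on html-mode
    (widen) ; in case buffer already open, and narrow-to-region is in effect
    (goto-char (point-min))
    ;; move to the nav bar, then to this file's own name inside it
    (search-forward "<div class=\"pages\">Goto Page:")
    (search-forward fName)
    ;; cursor is now inside the self-link's opening tag; delete it and its </a>
    (sgml-delete-tag 1)
    (save-buffer)
    (kill-buffer myBuffer)))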

(my-process-file-navbar "~/test1.html")

To test this code, create the files {test1.html, test2.html, test3.html} in a temp directory, and adjust the path in the call above to match. Place the following content into each file:

<div class="pages">Goto Page: <a href="test1.html">XYZ Overview</a>, <a href="test2.html">Second Page</a>, <a href="test3.html">Summary Z</a></div>

(Note that the link text is not necessarily 1, 2, 3.)
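The test files can also be created from within Emacs. The following is a minimal sketch, not part of the original code: the function name my-make-navbar-test-files is made up for this illustration, and it writes the sample nav bar shown above into each file.

```elisp
;; Hypothetical helper (name invented for this example).
(defun my-make-navbar-test-files (dir)
  "Create test1.html, test2.html, test3.html in DIR with a sample nav bar.
Return the list of file paths created."
  (let ((content (concat
                  "<div class=\"pages\">Goto Page: "
                  "<a href=\"test1.html\">XYZ Overview</a>, "
                  "<a href=\"test2.html\">Second Page</a>, "
                  "<a href=\"test3.html\">Summary Z</a></div>\n")))
    (mapcar
     (lambda (name)
       (let ((path (expand-file-name name dir)))
         ;; with-temp-file writes the temp buffer's content to PATH
         (with-temp-file path
           (insert content))
         path))
     '("test1.html" "test2.html" "test3.html"))))
```

For example, (my-make-navbar-test-files (make-temp-file "navbar-test" t)) creates a fresh temp directory and puts the three files in it.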

The elisp code above is very basic.

sgml-delete-tag is defined in sgml-mode.el, the library that provides html-mode (loaded automatically when an HTML file is opened).

sgml-delete-tag deletes the tag the cursor is on, together with its matching closing or opening tag.
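To see what sgml-delete-tag does without touching any file, it can be tried in a temporary buffer. This is only a sketch: the function name my-demo-delete-tag is made up for this illustration, and it assumes sgml-delete-tag behaves in a temp buffer in html-mode as it does in a visited file.

```elisp
(require 'sgml-mode) ; provides html-mode and sgml-delete-tag

;; Hypothetical demo function (name invented for this example).
(defun my-demo-delete-tag (html anchor)
  "Return HTML with the tag pair around the first ANCHOR removed.
ANCHOR is text that occurs inside the opening tag, such as the href value."
  (with-temp-buffer
    (insert html)
    (html-mode)
    (goto-char (point-min))
    ;; leave the cursor inside the opening tag
    (search-forward anchor)
    ;; delete that tag and its matching closing tag
    (sgml-delete-tag 1)
    (buffer-string)))
```

Called with the test nav bar above and "test1.html" as the anchor, the first link should come back as the plain text "XYZ Overview", with the other two links left alone.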

All we need to do now is to feed it a bunch of file paths.

To get the list of files that contain the page-nav tag, we can simply use the Unix commands “find” and “grep”, like this:

find . -name "*.html" -exec grep -l '<div class="pages">' {} \;
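The same list could also be built inside Emacs instead of the shell. The following is a sketch of that alternative, not the method used on this page: the function name my-find-navbar-files is made up for this illustration, and directory-files-recursively requires Emacs 25.1 or later.

```elisp
(require 'seq) ; for seq-filter

;; Hypothetical helper (name invented for this example).
(defun my-find-navbar-files (dir)
  "Return the .html files under DIR that contain the page nav div."
  (seq-filter
   (lambda (path)
     (with-temp-buffer
       ;; read the file without visiting it, so no major mode is loaded
       (insert-file-contents path)
       ;; non-nil if the nav bar's opening tag is present
       (search-forward "<div class=\"pages\">" nil t)))
   (directory-files-recursively dir "\\.html\\'")))
```

The result is a list of paths that can be passed directly to mapc, with no need to edit the command output by hand.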

From the output, we can use string-rectangle and query-replace to construct the following code:

(mapc 'my-process-file-navbar
      [
       "~/web/cat1.html"
       "~/web/dog.html"
       "~/web/something.html"
       "~/web/xyz.html"
       ]
      )

mapc is the Lisp idiom for looping through a list or vector. The first argument is a function, which is applied to every element of the sequence. The single quote in front of the function name is necessary: it prevents the symbol “my-process-file-navbar” from being evaluated as a variable.
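Here is a small self-contained illustration of mapc with a quoted function name. The names my-demo-seen and my-demo-visit are made up for this example.

```elisp
(defvar my-demo-seen nil
  "Elements visited by `my-demo-visit', most recent first.")

(defun my-demo-visit (x)
  "Record X in `my-demo-seen'."
  (push x my-demo-seen))

;; The quoted symbol names the function. Without the quote, Emacs would
;; try to evaluate my-demo-visit as a variable and signal a void-variable error.
(mapc 'my-demo-visit ["a.html" "b.html"]) ; a vector
(mapc 'my-demo-visit '("c.html"))         ; a list works the same way
```

After evaluating this, my-demo-seen holds ("c.html" "b.html" "a.html"). mapc is called for its side effects only; its return value is the sequence it was given.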
