
Emacs Lisp: Writing a url-linkify Command


Xah Lee, 2010-12-03, …, 2011-05-16

This page is a short Emacs Lisp tutorial: an example of writing a command that transforms the text under the cursor on the fly. If you are not familiar with elisp, see: Emacs Lisp Basics.

Problem

I want to write an elisp command so that, when I press a key, the URL under the cursor, such as:

http://some.example.com/xyz.html

becomes this:

<a class="sorc" href="http://some.example.com/xyz.html" title="accessed:2011-05-16">Source some.example.com</a>

with today's date automatically inserted into the “accessed” part. Pressing another key turns the link into this:

<span class="sorcdd" title="accessed:2011-05-16; defunct:2011-05-16; http://some.example.com/xyz.html">Source some.example.com</span>

with today's date added to the “defunct” part.

Detail

When writing blogs, you often need to cite links. The links may point to other blogs, news sites, or some random site. Many such URLs are ephemeral: they exist today but may be dead a few months later. Typically, a URL that is not on its own domain but on a hosted blog service is more likely to go bad sooner.

I write many blogs on xahlee.info, so I have hundreds of links. When you update your pages years later, you find dead links like 〔http://someRandomBlog.org/importantToday.html〕 and may not remember what the link was about. No author, no title, no idea when the link was active or when it went dead. Sometimes the link still works, but the domain has changed owners, so the linked page may have turned into a porn site or been bought by domain squatters.

One partial solution is to add the access date together with the link, like this:

<p>I found a fantastic <a href="http://some.example.com/xyz.html">emacs blog</a>
(Accessed on 2010-12-03) today!</p>

With an access date, at least you know when the link was good. If the link goes bad, you or your readers can still try to view the page through a web archive such as the Internet Archive.
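Because the access date is recorded, part of that archive lookup can be automated. The sketch below is a hypothetical helper, my-wayback-url, that is not part of the original code. It assumes the Wayback Machine accepts URLs of the form https://web.archive.org/web/YYYYMMDD/original-url and redirects to the snapshot nearest that date; treat that URL scheme as an assumption about the archive, not something established on this page.

```elisp
;; Hypothetical helper (not in the original): build an Internet Archive
;; (Wayback Machine) lookup URL from an original URL and an access date
;; in the "YYYY-MM-DD" form used on this page.
;; ASSUMPTION: the archive accepts https://web.archive.org/web/TIMESTAMP/URL
;; and redirects to the snapshot closest to TIMESTAMP.
(defun my-wayback-url (url access-date)
  "Return a Wayback Machine URL for URL near ACCESS-DATE (\"YYYY-MM-DD\")."
  (concat "https://web.archive.org/web/"
          ;; "2011-05-16" -> "20110516"
          (replace-regexp-in-string "-" "" access-date)
          "/" url))

;; Usage:
;; (my-wayback-url "http://some.example.com/xyz.html" "2011-05-16")
;; => "https://web.archive.org/web/20110516/http://some.example.com/xyz.html"
```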

However, this requires inserting the date manually. Also, the “accessed on …” note in your content is very distracting.

It would be better if the access date were embedded in the link itself, in some uniform format. Neither HTML4 nor HTML5 has a way to embed an access date. I decided to put the access date into the “title” attribute, like this:

<a class="sorc" href="http://example.com/" title="accessed:2011-05-16">Source example.com</a> 

This is not an ideal solution, because the “title” attribute is supposed to hold a title, not a date stamp. But in practice, I decided it's OK for me to adopt this solution.

Later on, if I find that a link is dead, I can press a key and emacs will change the link to this format:

<span class="sorcdd" title="accessed:2011-05-16; defunct:2011-05-16; http://example.com/">Source example.com</span> 

Notice that the value of the “class” attribute has changed from “sorc” to “sorcdd”, and the “a” tag has become a “span”, so the dead link is no longer clickable. With proper CSS, the link text is shown crossed out, like this: Source some.example.com.

A uniform format is good, because if HTML6 or some HTML microformat later adds a way to attach an access date to links, I can easily write a script that changes all my thousands of external links to the new format.
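Because every link uses the same title format, such a conversion script only has to parse one pattern. As a minimal sketch of that idea, assuming fields are always separated by "; " exactly as in the examples above, the hypothetical helper my-parse-sorc-title (not part of the original code) splits a title value into its parts:

```elisp
;; Hypothetical helper (not in the original): split a "sorc"/"sorcdd"
;; title value such as
;;   "accessed:2011-05-16; defunct:2011-05-16; http://example.com/"
;; into an alist.  ASSUMPTION: fields are separated by "; ".
(defun my-parse-sorc-title (title)
  "Parse TITLE into an alist with keys `accessed', `defunct' and `url'."
  (let (result)
    (dolist (field (split-string title "; " t))
      (cond
       ((string-match "\\`accessed:\\(.+\\)\\'" field)
        (push (cons 'accessed (match-string 1 field)) result))
       ((string-match "\\`defunct:\\(.+\\)\\'" field)
        (push (cons 'defunct (match-string 1 field)) result))
       ;; any other field is taken to be the original URL
       (t (push (cons 'url field) result))))
    ;; `push' built the list backwards; restore the original field order
    (nreverse result)))

;; Usage:
;; (my-parse-sorc-title "accessed:2011-05-16; defunct:2011-05-16; http://example.com/")
;; => ((accessed . "2011-05-16") (defunct . "2011-05-16") (url . "http://example.com/"))
```

A script that migrates links to a future format could call this on each title value and rebuild the markup from the alist.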

Solution

Here's the code:

(require 'thingatpt) ; provides `bounds-of-thing-at-point'

(defun source-linkify ()
  "Make url at cursor point into a html link.
If there's a text selection, use the text selection as input.

Example: http://example.com/xyz.htm
becomes
<a class=\"sorc\" href=\"http://example.com/xyz.htm\" title=\"accessed:2008-12-25\">Source example.com</a>"
  (interactive)
  (let (url resultLinkStr bds p1 p2 domainName)

    ;; get the boundary of url or text selection, as a cons cell (start . end)
    (if (region-active-p)
        (setq bds (cons (region-beginning) (region-end)))
      (setq bds (bounds-of-thing-at-point 'url)))
    (unless bds (user-error "No URL at cursor"))

    ;; set url
    (setq p1 (car bds))
    (setq p2 (cdr bds))
    (setq url (buffer-substring-no-properties p1 p2))

    ;; get the domainName: text after "://" up to the next "/" or the end;
    ;; fall back to the whole text if there is no "://"
    (setq domainName
          (if (string-match "://\\([^/]+\\)" url)
              (match-string 1 url)
            url))

    ;; escape ampersands for HTML (LITERAL arg t: insert "&amp;" as-is)
    (setq url (replace-regexp-in-string "&" "&amp;" url t t))
    (setq resultLinkStr
          (concat "<a class=\"sorc\" href=\"" url "\""
                  " title=\"accessed:" (format-time-string "%Y-%m-%d") "\""
                  ">"
                  "Source " domainName
                  "</a>"))

    ;; delete url and insert the link
    (delete-region p1 p2)
    (insert resultLinkStr)))
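To check the output format without touching a buffer, the string-building step can be split into a pure function. This is a minimal sketch with a hypothetical name, my-sorc-link, not part of the original code; it takes the date as an argument instead of calling format-time-string, so its result is deterministic and easy to test.

```elisp
;; Hypothetical helper (not in the original): the string-building part of
;; `source-linkify' as a pure function.  DATE is passed in (for example from
;; (format-time-string "%Y-%m-%d")) so the output is deterministic.
(defun my-sorc-link (url date)
  "Return a \"sorc\" link for URL with access DATE (a \"YYYY-MM-DD\" string)."
  (let ((domain (if (string-match "://\\([^/]+\\)" url)
                    (match-string 1 url)
                  url)))
    (concat "<a class=\"sorc\" href=\""
            ;; escape & for HTML; LITERAL is t so "&amp;" is inserted as-is
            (replace-regexp-in-string "&" "&amp;" url t t)
            "\" title=\"accessed:" date "\">"
            "Source " domain
            "</a>")))

;; Usage:
;; (my-sorc-link "http://some.example.com/xyz.html" "2011-05-16")
;; => "<a class=\"sorc\" href=\"http://some.example.com/xyz.html\" title=\"accessed:2011-05-16\">Source some.example.com</a>"
```

The interactive command would then only find the bounds, call this helper with today's date, and replace the text.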

The code is easy to understand. If you find it difficult, try reading Emacs Lisp: Writing a Wrap-URL Function, which has more explanation.

You can assign a hotkey to this command.
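For example, a line like the following in your init file binds it globally. The key C-c l is an arbitrary choice for illustration; sequences of C-c followed by a plain letter are reserved for users by Emacs's key binding conventions, so well-behaved major modes should not override them.

```elisp
;; Bind `source-linkify' to C-c l in the global keymap.
;; The key is only an example; any free key works.
(global-set-key (kbd "C-c l") 'source-linkify)
```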

The following code turns a link into the dead-link format.

(require 'sgml-mode) ; provides `sgml-skip-tag-backward' and `sgml-skip-tag-forward'

(defun defunct-link ()
  "Change the html link under cursor into a defunct form.
Example:
If cursor is inside the opening tag of
<a class=\"sorc\" href=\"http://example.com/\" title=\"accessed:2008-12-26\">…</a>

it becomes:
<span class=\"sorcdd\" title=\"accessed:2008-12-26; defunct:2008-12-26; http://example.com/\">…</span>"
  (interactive)
  (let (p1 p2 wholeLinkStr newLinkStr url titleStr anchorText)
    (save-excursion
      ;; get the boundary of the whole link, from "<a" through "</a>"
      (sgml-skip-tag-backward 1) (setq p1 (point))
      (sgml-skip-tag-forward 1) (setq p2 (point))

      ;; get wholeLinkStr
      (setq wholeLinkStr (buffer-substring-no-properties p1 p2))

      ;; generate replacement text
      (with-temp-buffer
        (insert wholeLinkStr)

        ;; each search starts from the beginning, so attribute order doesn't matter
        (goto-char (point-min))
        (re-search-forward "href=\"\\([^\"]+?\\)\"")
        (setq url (match-string 1))

        (goto-char (point-min))
        (re-search-forward "title=\"\\([^\"]+?\\)\"")
        (setq titleStr (match-string 1))

        ;; anchor text is assumed to contain no "<"
        (goto-char (point-min))
        (re-search-forward ">\\([^<]+?\\)</a>")
        (setq anchorText (match-string 1))

        (setq newLinkStr
              (concat "<span class=\"sorcdd\" "
                      "title=\""
                      titleStr
                      "; defunct:" (format-time-string "%Y-%m-%d") "; " url
                      "\">" anchorText "</span>"))))

    (delete-region p1 p2)
    (insert newLinkStr)))
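The temp-buffer step can also be done directly on a string with string-match, which makes the conversion easy to test on its own. This is a minimal sketch with a hypothetical name, my-sorc-defunct-string, not part of the original code; like the command, it assumes the link has href and title attributes and that the anchor text contains no "<", and it takes the defunct date as an argument.

```elisp
;; Hypothetical helper (not in the original): the same conversion as
;; `defunct-link', done on a string with `string-match' instead of a
;; temp buffer.  DATE is a "YYYY-MM-DD" string passed in by the caller.
(defun my-sorc-defunct-string (link date)
  "Turn the \"sorc\" LINK string into the \"sorcdd\" span form, defunct on DATE."
  (let (url title anchor)
    (if (string-match "href=\"\\([^\"]+\\)\"" link)
        (setq url (match-string 1 link))
      (error "No href in %s" link))
    (if (string-match "title=\"\\([^\"]+\\)\"" link)
        (setq title (match-string 1 link))
      (error "No title in %s" link))
    ;; anchor text: between the first ">" and "</a>", with no "<" inside
    (if (string-match ">\\([^<]+\\)</a>" link)
        (setq anchor (match-string 1 link))
      (error "No anchor text in %s" link))
    (concat "<span class=\"sorcdd\" title=\""
            title "; defunct:" date "; " url
            "\">" anchor "</span>")))

;; Usage:
;; (my-sorc-defunct-string
;;  "<a class=\"sorc\" href=\"http://example.com/\" title=\"accessed:2011-05-16\">Source example.com</a>"
;;  "2011-05-16")
;; => "<span class=\"sorcdd\" title=\"accessed:2011-05-16; defunct:2011-05-16; http://example.com/\">Source example.com</span>"
```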

Here's the css for the dead link:

span.sorcdd {text-decoration:line-through}

Elisp is fantastic!
