Xah Lee, 2007-10-09, 2010-12-02
This page shows an example of writing an Emacs Lisp function that cleans up a file's content by repeated application of find & replace operations. If you don't know elisp, first take a look at Emacs Lisp Basics.
I want to write a command that runs several find & replace pairs, one after another, on the current file.
I have a website, Math Surface Gallery, which contains a Java applet called JavaView that lets people view 3D objects with real-time rotation by the mouse. For example, this is one of the Java applet pages: Costa surface applet. There are about 70 such surfaces. Each of these surfaces has a raw data file that the Java applet reads. For example, for the Costa surface above, the raw data file is: costa.mgs.gz. These files are just Mathematica graphics in plain text, compressed with gzip.
The content of the file looks like this:
Graphics3D[{{
Polygon[{{3.552, -0.001061, 2.689}, {3.552, 0.03079, 2.689},
{3.025, 0.02634, 2.524}, {3.025, -0.001061, 2.524}}],
Polygon[{{3.552, 0.03079, 2.689}, {3.550, 0.1250, 2.689},
{3.023, 0.1074, 2.524}, {3.025, 0.02634, 2.524}}],
Polygon[{…}],
…
}}]
Because the file contains thousands or tens of thousands of polygons, it can get large, and it takes a while for the Java applet to load it from the net. One way to reduce file size is to reduce the number of polygons. But for a given file, spaces and end-of-line characters can also be deleted, and the decimal numbers can be safely truncated to 3 decimal places. So, typically, I open the file and do global find-and-replace operations with 【Alt+x query-replace】: replace “, ” (comma followed by a space) with just “,”, delete line endings (replace \n with the empty string), and collapse runs of multiple spaces into one. To truncate decimals to 3 places, I use 【Alt+x query-replace-regexp】 with the pattern \([0-9]\)\.\([0-9][0-9][0-9]\)[0-9]+ and replace it with \1.\2.
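To see what the truncation pattern does before running it on a whole file, here is a minimal sketch that applies it to a string with “replace-regexp-in-string”. The helper name “my-truncate-decimals” is made up for this illustration. Note that the digits are cut off, not rounded.

```elisp
;; Sketch only: `my-truncate-decimals' is a hypothetical helper name.
(defun my-truncate-decimals (text)
  "Return TEXT with decimal numbers cut to 3 digits after the point."
  (replace-regexp-in-string
   "\\([0-9]\\)\\.\\([0-9][0-9][0-9]\\)[0-9]+" ; digit, dot, 3 digits, extra digits
   "\\1.\\2"                                   ; keep the digit and the first 3 decimals
   text
   t))   ; FIXEDCASE is t; LITERAL is left nil so \1 and \2 are expanded

(my-truncate-decimals "{3.552, -0.001061, 2.689}")
;; ⇒ "{3.552, -0.001, 2.689}"
```

Numbers that already have 3 or fewer decimals do not match the pattern, so they are left alone.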
After a while, this process gets repetitious. It would be nice to have an emacs command that, when invoked, performs all these find & replace operations on the current file in one shot. This would reduce some 50 keystrokes and eye-balling into a single brainless button punch.
Here's the solution:
(defun clean-mgs-buffer ()
  "Reduce size of a mgs file by removing whitespace and truncating numbers.

This command does several find & replace operations on the current buffer:
removing spaces, removing newlines, truncating numbers to 3 decimals, etc.
The goal of these replacements is to reduce the file size of a
Mathematica Graphics file (.mgs) that is read over the net by JavaView."
  (interactive)
  (goto-char 1)
  (while (search-forward "\n" nil t) (replace-match "" nil t))
  (goto-char 1)
  (while (search-forward-regexp "  +" nil t) (replace-match " " nil t))
  (goto-char 1)
  (while (search-forward ", " nil t) (replace-match "," nil t))
  (goto-char 1)
  (while (search-forward-regexp "\\([0-9]\\)\\.\\([0-9][0-9][0-9]\\)[0-9]+" nil t)
    (replace-match "\\1.\\2" t nil)))
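This function is relatively simple. It does a series of replacements using the “while” loop, each time first moving the cursor to the beginning of the file. The core is the functions “search-forward”, “search-forward-regexp”, and “replace-match”.
The “search-forward” function takes a string and moves the cursor to the end of the next piece of text that matches it. “search-forward-regexp” does the same, but takes a regex pattern instead of a plain string. With the third argument set to “t”, both return “nil” when nothing more is found instead of signaling an error, which is what ends the “while” loop. The “replace-match” function simply replaces the text matched by the last search.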
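As a small illustration of one search followed by one replacement, here is a sketch that runs in a temporary buffer. The function name “my-search-replace-demo” is made up for this example; it returns the cursor position after the search together with the resulting text.

```elisp
;; Sketch only: `my-search-replace-demo' is a hypothetical name.
(defun my-search-replace-demo ()
  "Return (POINT-AFTER-SEARCH . BUFFER-TEXT) for one search and replace."
  (with-temp-buffer
    (insert "a, b, c")
    (goto-char (point-min))
    ;; Second arg nil: no search bound.  Third arg t: return nil on failure.
    (let ((pos (search-forward ", " nil t))) ; cursor lands after the match
      (replace-match "," nil t)              ; replaces only that one match
      (cons pos (buffer-string)))))

(my-search-replace-demo)
;; ⇒ (4 . "a,b, c")
```

Only the first “, ” is changed, which is why the command wraps each search in a “while” loop.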
One interesting aspect of “search-forward-regexp” is that, in the elisp string, you must write 2 backslashes for each backslash in the regex. This is because the backslash is the escape character in an emacs string, so a literal backslash must itself be escaped. Emacs first reads “\\” in the string as a single backslash, and the resulting string is then passed to emacs's regex engine.
Another thing of interest is that the first 2 optional parameters of the “replace-match” function are “fixedcase” and “literal”; both are booleans. If “fixedcase” is non-nil, then emacs will not alter the case of the replacement. (Otherwise it adjusts the case of the replacement based on the case of the matched text.) If “literal” is non-nil, then emacs will insert the replacement string literally. (In our case, “literal” must be “nil” in the last replacement, because its replacement string uses \1 and \2 to refer to the captured groups.)
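The backslash doubling can be checked directly by evaluating a few expressions. This is just a sketch with made-up sample strings:

```elisp
;; The string "\\." holds 2 characters: one backslash and one dot.
(length "\\.")                  ; ⇒ 2

;; As a regex, backslash-dot matches a literal dot (index 1 in "3.552").
(string-match "\\." "3.552")    ; ⇒ 1

;; A bare dot matches any character, so it matches at index 0.
(string-match "." "3.552")      ; ⇒ 0

;; `regexp-quote' builds the escaped form for you.
(regexp-quote ".")              ; ⇒ "\\."
```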
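Here is a minimal sketch of how the two flags change the result. The helper “my-replace-once” and the sample strings are made up for this illustration.

```elisp
;; Sketch only: `my-replace-once' is a hypothetical helper name.
(defun my-replace-once (regexp replacement text fixedcase literal)
  "Replace the first match of REGEXP in TEXT with REPLACEMENT.
FIXEDCASE and LITERAL are passed on to `replace-match'.  Return the new text."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search nil))            ; make the search case-sensitive
      (when (search-forward-regexp regexp nil t)
        (replace-match replacement fixedcase literal)))
    (buffer-string)))

;; fixedcase nil: the replacement is upcased to follow the matched text.
(my-replace-once "HELLO" "goodbye" "HELLO world" nil t) ; ⇒ "GOODBYE world"
;; fixedcase t: the replacement is inserted as written.
(my-replace-once "HELLO" "goodbye" "HELLO world" t t)   ; ⇒ "goodbye world"

;; literal nil: \1 and \2 stand for the captured groups.
(my-replace-once "\\([0-9]\\)\\.\\([0-9]+\\)" "\\2.\\1" "1.2345" t nil)
;; ⇒ "2345.1"
;; literal t: the backslashes are inserted as plain text.
(my-replace-once "\\([0-9]\\)\\.\\([0-9]+\\)" "\\2.\\1" "1.2345" t t)
;; ⇒ "\\2.\\1"
```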
Emacs is beautiful!
PS: Note that in this tutorial, each replacement pair is done using a while loop, and each starts with (goto-char 1). What if you have lots of pairs?
Won't it be great if you can simply write:
'( ["alpha" "α"] ["beta" "β"] ["gamma" "γ"] )
instead of each with a while loop? For a solution for this, see: Elisp Package: Multi-Pair String Replacement: xfrp_find_replace_pairs.el.
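For a rough idea of how such a pair list could be processed, here is a minimal sketch. It is not the code of the package mentioned above; the function name “my-replace-pairs-in-buffer” is made up, and it only handles plain strings, not regex.

```elisp
;; Sketch only: `my-replace-pairs-in-buffer' is a hypothetical name,
;; not a function from the xfrp_find_replace_pairs.el package.
(defun my-replace-pairs-in-buffer (pairs)
  "Apply each [FIND REPLACE] vector in PAIRS, literally, to the current buffer."
  (dolist (pair pairs)
    (goto-char (point-min))                    ; restart from the top for each pair
    (while (search-forward (aref pair 0) nil t)
      (replace-match (aref pair 1) t t))))     ; fixedcase and literal both t

(with-temp-buffer
  (insert "alpha beta gamma")
  (my-replace-pairs-in-buffer
   '(["alpha" "α"] ["beta" "β"] ["gamma" "γ"]))
  (buffer-string))
;; ⇒ "α β γ"
```

The pairs are applied one after another, so text produced by an earlier pair can be matched again by a later pair. Order the pairs with that in mind.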
Addendum: here's the Mathematica code to export graphics into a text file forcing all numbers to be printed in a simple d.dddd format.
Otherwise, Mathematica may print numbers in various forms such as
2.25`*^-9,
\(7.2389`\),
3.141592653589793238462643383279503`20.
writeToFileRounded[expr_Graphics3D,fileName_?StringQ,prec_:4]:=Module[{},
OpenWrite[fileName];
WriteString[fileName,"Graphics3D["];
WriteString[fileName,
StringReplace[
ToString@
NumberForm[First@SetPrecision[Chop[expr,10^-(prec+1)],prec],
ExponentFunction\[Rule](If[-Infinity<#<Infinity,Null,#]&)],
"],"->"],\n"]];
WriteString[fileName,"]"];
Close[fileName]
];
writeToFileRounded[surf,"helicoid.ma",4]
(*the first argument is a Graphics3D object, the second is a name to
save to, the third is number of decimal places for the coordinate
values.*)