Linux: grep

By Xah Lee.

What is grep

grep is a command to find text in files.

Show Matching Lines

# show lines containing xyz in myFile
grep 'xyz' myFile
# show lines containing xyz in all files whose names end in html, in the current dir only (subdirs are not searched)
grep 'xyz' *html

Grep for All Files in a Dir

# show matching lines in files whose names end in html, in ~/web and all its subdirs
grep -r 'xyz' --include='*html' ~/web

Here's what the options mean:

-r
Search recursively: the given directory and all its subdirectories.
--include='*html'
Search only files whose names match the glob pattern. (* is a wildcard that matches zero or more of any character.)
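
Here is a small self-contained sketch of -r together with --include. It assumes a grep that supports --include (GNU grep does). The directory and file names are made up for the demo.

```shell
# Build a throwaway directory tree (all names are hypothetical).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf 'abc xyz\n' > "$demo_dir/index.html"
printf 'xyz again\n' > "$demo_dir/sub/page.html"
printf 'xyz in a text file\n' > "$demo_dir/notes.txt"

# -r descends into sub/. --include keeps only file names matching the glob.
# The glob is quoted so the shell passes it to grep unexpanded.
# sort is only there to make the output order predictable.
matched=$(cd "$demo_dir" && grep -r --include='*html' 'xyz' . | sort)
printf '%s\n' "$matched"

rm -rf "$demo_dir"
```

Both html files are reported, including the one in the subdirectory. notes.txt also contains xyz, but it is skipped because its name does not match the glob.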

Grep Fixed String (no regex)

Use the option -F. (F means fixed string.) The pattern is matched literally, not as a regex.

# search html files for lines that contain .* literally
grep -F '.*' *html
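
To see the difference -F makes, here is a minimal sketch that runs the same pattern with and without it. The sample file content is invented for the demo.

```shell
# A two-line sample file (content is hypothetical).
sample=$(mktemp)
printf 'a.*b\naXYZb\n' > "$sample"

# As a regex, a.*b means "a, then any characters, then b": both lines match.
regex_hits=$(grep 'a.*b' "$sample")

# With -F the dot and star are ordinary characters: only the first line matches.
fixed_hits=$(grep -F 'a.*b' "$sample")

printf 'regex:\n%s\n\nfixed:\n%s\n' "$regex_hits" "$fixed_hits"
rm -f "$sample"
```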

Put complicated pattern in a file

If your search pattern contains quote characters, you can put it in a file and use the option

--file=my_pattern_filename

Each line of the file is used as a separate pattern.

# search html source code in current dir and all subdirs. The regex is stored in a file named myPattern.txt
grep -r --file=myPattern.txt --include='*html' .
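
A minimal sketch of the --file option with a pattern that holds both kinds of quote. The file names and contents are made up. A quoted here-document ('EOF') is used so the shell leaves the quotes alone.

```shell
work=$(mktemp -d)

# The pattern file: one pattern per line, written without any shell escaping.
cat > "$work/myPattern.txt" <<'EOF'
it's "quoted"
EOF

# A hypothetical html file to search.
cat > "$work/page.html" <<'EOF'
<p>it's "quoted" text</p>
<p>plain text</p>
EOF

# grep reads the pattern from the file, so no quoting is needed here.
pattern_hits=$(grep --file="$work/myPattern.txt" "$work/page.html")
printf '%s\n' "$pattern_hits"

rm -rf "$work"
```

One thing to watch: an empty line in the pattern file is an empty pattern, which matches every line.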

Options for Pattern String

-F
Use fixed string. (no regex)
-P
Use Perl-compatible regex syntax. (This is a GNU grep option. Perl's and Python's regex syntaxes are largely compatible.)
-i
Ignore case.
-v
Print lines NOT containing the pattern.
# print lines that do not contain a string, in all files with names ending in “log”
grep -v 'html HTTP' *log
# print lines containing “png HTTP” or “jpg HTTP”
grep -P 'png HTTP|jpg HTTP' *log
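
Here is a small sketch of -i and -v on a made-up log file. Note that it swaps in -E (extended regex) for -P. Alternation with | works the same way in both for this pattern, and -E is also available where grep lacks -P.

```shell
# A hypothetical four-line log.
log=$(mktemp)
cat > "$log" <<'EOF'
GET /a.png HTTP/1.1
GET /b.jpg HTTP/1.1
GET /c.html HTTP/1.1
get /d.PNG http/1.1
EOF

# Alternation: the two upper-case png/jpg lines match.
images=$(grep -E 'png HTTP|jpg HTTP' "$log")

# -i ignores case, so the lower-case "get /d.PNG http" line matches too.
images_any_case=$(grep -i -E 'png HTTP|jpg HTTP' "$log")

# -v inverts the match: every line except the html request.
not_html=$(grep -v 'html HTTP' "$log")

printf '%s\n\n%s\n\n%s\n' "$images" "$images_any_case" "$not_html"
rm -f "$log"
```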

Options for File Selection

*.html
Search all files ending in “.html” in the current dir. (Files in subdirs are ignored.)
grep -r --include='*.html' pattern dirName
Search for pattern in dirName including subdirs, but only in files ending in “.html”.

Output Options

-H
Include file name in the result.
-h
Do NOT print file name.
-l
Print just the names of files that contain a match; do NOT print the matched lines.
-L
Print just the names of files that contain NO match.
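
The four output options side by side, as a minimal sketch. The two files are made up: a.txt contains the search string and b.txt does not.

```shell
work=$(mktemp -d)
printf 'xyz here\n' > "$work/a.txt"
printf 'nothing\n' > "$work/b.txt"

# -l: names of files with a match.
with_match=$(cd "$work" && grep -l 'xyz' a.txt b.txt)

# -L: names of files with no match.
without_match=$(cd "$work" && grep -L 'xyz' a.txt b.txt)

# -H: prefix each line with the file name, even when only one file is searched.
with_name=$(cd "$work" && grep -H 'xyz' a.txt)

# -h: no file name prefix, even when several files are searched.
no_name=$(cd "$work" && grep -h 'xyz' a.txt b.txt)

printf '%s\n' "$with_match" "$without_match" "$with_name" "$no_name"
rm -rf "$work"
```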

More Grep Examples

# print lines containing “html HTTP” in a log file, keep only the 12th and 7th columns, keep only lines that mention certain sites, sort, collapse repeated lines into one line with a count, then sort by that count.

grep 'html HTTP' apache.log | awk '{print $12 , $7}' | grep -i -P "livejournal|blogspot" | sort | uniq -c | sort -n
# print all lines containing “http://” in all html files of a dir, except lines that mention certain sites. Output to xx.txt

grep -r --include='*html' -F 'http://' ~/web | grep -v -P 'google\.com|twitter\.com|reddit\.com|wikipedia\.org' > xx.txt
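
To try the filter, sort, and count idiom from the log example without a real Apache log, here is a scaled-down sketch. The log is invented and has only three columns, so the awk column numbers differ from the real command. It also uses -E in place of -P.

```shell
# A hypothetical log: request path, the word HTTP, referrer site.
log=$(mktemp)
cat > "$log" <<'EOF'
/a.html HTTP blogspot
/a.html HTTP blogspot
/b.html HTTP livejournal
/c.png HTTP blogspot
/a.html HTTP example
EOF

# Keep html requests, print referrer then path, keep two referrers,
# then count identical lines and sort by the count (smallest first).
report=$(grep 'html HTTP' "$log" \
  | awk '{print $3, $1}' \
  | grep -i -E 'livejournal|blogspot' \
  | sort | uniq -c | sort -n)

printf '%s\n' "$report"
rm -f "$log"
```

uniq -c only merges adjacent duplicate lines, which is why the first sort comes before it.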
