Linux: grep
What is grep
grep is a command to find text in files.
Show Matching Lines
# show lines containing xyz in myFile
grep 'xyz' myFile
# show lines containing xyz in all files ending in html, top level of the current dir only
grep 'xyz' *html
Grep for All Files in a Dir
# show matching lines in a dir and its subdirs, only in files whose names end in html
grep -r 'xyz' --include='*html' ~/web
Here's what the options mean:
-r
- Search all subdirectories, recursively.
--include='*html'
- Search only files whose names match a glob pattern. (* is a wildcard that matches zero or more characters.)
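To see the two options working together, here is a minimal sketch that builds a throwaway directory tree and searches it. The directory and file names are invented for the demo, and it assumes a grep that supports --include (GNU grep does).

```shell
# Build a small throwaway tree; every name below is made up for this demo.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf 'top xyz\n' > "$demo_dir/index.html"
printf 'nested xyz\n' > "$demo_dir/sub/page.html"
printf 'notes xyz\n' > "$demo_dir/sub/notes.txt"

# -r descends into sub/; --include keeps only file names matching the glob.
# The glob is quoted so the shell hands it to grep unexpanded.
# -h drops the file name prefix so only the matching lines are printed.
recursive_matches=$(grep -r -h 'xyz' --include='*html' "$demo_dir" | sort)
printf '%s\n' "$recursive_matches"

rm -r "$demo_dir"
```

The line from notes.txt is not printed, because that file name does not match the glob.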
Grep Fixed String (no regex)
Use the option -F. (F means fixed string.)
# search html files for lines that contain .* literally
grep -F '.*' *html
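The difference is easy to check on a two-line file. This is a small sketch, with a temporary file standing in for real input; -c makes grep print the number of matching lines instead of the lines themselves.

```shell
sample_file=$(mktemp)
printf 'a.*b\nplain text\n' > "$sample_file"

# As a regex, .* matches any run of characters, so every line matches.
regex_count=$(grep -c '.*' "$sample_file")

# With -F the two characters are taken literally, so only the first line matches.
fixed_count=$(grep -c -F '.*' "$sample_file")

echo "regex: $regex_count, fixed: $fixed_count"
rm "$sample_file"
```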
Put complicated pattern in a file
If your search string contains quotes, you can put it in a file and use the option --file=my_pattern_filename.
# search html source code in dir and all subdirs. The regex is stored in a file named myPattern.txt
grep -r --file=myPattern.txt --include='*html' .
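A minimal sketch of the pattern-file approach, using a pattern that contains both kinds of quote. The file contents are invented for the demo. Note that grep treats each line of the pattern file as a separate pattern, so the file here holds exactly one line.

```shell
work_dir=$(mktemp -d)

# The quoted heredoc delimiter ('EOF') stops the shell from interpreting
# anything in the body, so the quotes reach the file unchanged.
cat > "$work_dir/myPattern.txt" <<'EOF'
class="it's"
EOF

cat > "$work_dir/page.html" <<'EOF'
<p class="it's">hi</p>
<p class="other">bye</p>
EOF

# grep reads the pattern from the file, so no command-line quoting is needed.
pattern_matches=$(grep --file="$work_dir/myPattern.txt" "$work_dir/page.html")
printf '%s\n' "$pattern_matches"

rm -r "$work_dir"
```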
Options for Pattern String
-F
- Use fixed string. (no regex)
-P
- Use Perl-compatible regex syntax. (Perl's and Python's regex syntaxes are largely compatible.)
-i
- Ignore case.
-v
- Print lines NOT containing the pattern.
# print lines not matching a string, for all files ending in “log”
grep -v 'html HTTP' *log
# print lines containing “png HTTP” or “jpg HTTP”
grep -P 'png HTTP|jpg HTTP' *log
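The -i and -v options can be combined. The sketch below runs them on three invented log-like lines (not real server output) to show how each one changes the result.

```shell
log_file=$(mktemp)
printf 'GET /a.HTML HTTP\nGET /b.png HTTP\nGET /c.html HTTP\n' > "$log_file"

# -i ignores case, so both a.HTML and c.html match. -c prints the count.
ignore_case_count=$(grep -c -i 'html HTTP' "$log_file")

# -v inverts the match. Without -i the search is case-sensitive,
# so the a.HTML line counts as "not matching" and is printed too.
inverted=$(grep -v 'html HTTP' "$log_file")

# -v together with -i leaves only the png line.
inverted_ignore_case=$(grep -v -i 'html HTTP' "$log_file")

echo "$ignore_case_count"
printf '%s\n' "$inverted"
printf '%s\n' "$inverted_ignore_case"
rm "$log_file"
```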
Options for File Selection
*.html
- Search all files ending in “.html” in the current dir. (Files in subdirs are ignored.)
grep -r --include='*html' pattern dirName
- Search files for pattern in dirName including subdirs, but only files ending in “.html”.
Output Options
-H
- Include file name in the result.
-h
- Do NOT print file name.
-l
- Print only the names of files that contain a match; do NOT print the matched lines.
-L
- Print only the names of files that do NOT contain a match.
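The four output options can be compared side by side on two small files. This is a sketch with invented file names. Each grep runs in a subshell that changes into the temporary directory, so the printed file names stay short.

```shell
out_dir=$(mktemp -d)
printf 'has xyz\n' > "$out_dir/a.txt"
printf 'nothing here\n' > "$out_dir/b.txt"

# -l: names of files that contain a match.
files_with=$(cd "$out_dir" && grep -l 'xyz' a.txt b.txt)

# -L: names of files that contain no match.
files_without=$(cd "$out_dir" && grep -L 'xyz' a.txt b.txt)

# -H: prefix each matching line with its file name, even for a single file.
with_name=$(cd "$out_dir" && grep -H 'xyz' a.txt)

# -h: never print the file name, even when several files are searched.
without_name=$(cd "$out_dir" && grep -h 'xyz' a.txt b.txt)

printf '%s\n' "$files_with" "$files_without" "$with_name" "$without_name"
rm -r "$out_dir"
```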
More Grep Examples
# print lines containing “html HTTP” in a log file, keep only the 12th and 7th columns, keep only certain lines, sort, collapse repeated lines into one with a count, then sort by that count.
grep 'html HTTP' apache.log | awk '{print $12 , $7}' | grep -i -P "livejournal|blogspot" | sort | uniq -c | sort -n
# print all lines containing http:// links in all html files of a dir and its subdirs, except lines with certain links. Output to xx.txt
grep -r --include='*html' -F 'http://' ~/web | grep -v -P 'google\.com|twitter\.com|reddit\.com|wikipedia\.org' > xx.txt
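To follow the first pipeline stage by stage, here is a scaled-down sketch that runs on four invented log lines. Several things are assumptions made for the demo: the log layout (Apache "combined" format, where the referrer is whitespace-separated field 11 rather than 12), the host names, and the use of -E in place of -P. For a plain alternation like this one the two behave the same, and -E does not depend on grep being built with Perl regex support.

```shell
log=$(mktemp)
# Four invented requests; field 7 is the path, field 11 the referrer.
cat > "$log" <<'EOF'
1.1.1.1 - - [01/Jan/2020:00:00:01 +0000] "GET /a.html HTTP/1.1" 200 10 "http://x.blogspot.com/" "UA"
2.2.2.2 - - [01/Jan/2020:00:00:02 +0000] "GET /a.html HTTP/1.1" 200 10 "http://x.blogspot.com/" "UA"
3.3.3.3 - - [01/Jan/2020:00:00:03 +0000] "GET /b.html HTTP/1.1" 200 10 "http://y.livejournal.com/" "UA"
4.4.4.4 - - [01/Jan/2020:00:00:04 +0000] "GET /c.png HTTP/1.1" 200 10 "http://x.blogspot.com/" "UA"
EOF

# 1. keep html requests       2. keep referrer and path columns
# 3. keep selected referrers  4. sort so identical lines are adjacent
# 5. uniq -c collapses them with a count  6. sort numerically by count
# The final awk only squeezes the padding that uniq -c puts before the count.
referrer_counts=$(grep 'html HTTP' "$log" \
  | awk '{print $11, $7}' \
  | grep -i -E 'livejournal|blogspot' \
  | sort | uniq -c | sort -n \
  | awk '{$1=$1; print}')
printf '%s\n' "$referrer_counts"

rm "$log"
```

The png request is dropped at the first stage, and the two identical blogspot lines are collapsed into one line with a count of 2.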