Taking Notes From Grep Pocket Reference

Introduction to Regular Expressions

Metacharacters

grep -e 'e[^a]' name.list

Regular expression metacharacters

Metacharacter Matches

Items to match a single character

. Dot Any one character

[...] Character class Any character listed in brackets

[^...] Negated character class Any character not listed in brackets

\char Escape character

Items that match a position

^ Caret Start of a line

$ Dollar sign End of a line

\< Backslash less-than Start of a word

\> Backslash greater-than End of a word

The quantifiers

? Question mark Optional; considered a quantifier

* Asterisk Any number (including zero)

+ Plus One or more of the preceding expression

{N} Match exactly Match exactly N times

{N,} Match at least Match at least N times

{min,max} Specified range Match between min and max times

Other

| Alternation Matches either expression given

- Dash Indicates a range

(...) Parentheses Used to limit scope of alternation

\1, \2, ... Backreference, matches text previously matched within parentheses (e.g., first set, second set, etc.)

\b Word boundary Matches the empty string at the edge of a word (its start or its end)

\B Non-word boundary Matches the empty string anywhere that is not at the edge of a word

\w Word character This is used to match any “word” character

\W Non-word character This matches any character that isn’t used in words

\` Start of buffer Matches the start of a buffer sent to grep

\' End of buffer Matches the end of a buffer sent to grep
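A few of the metacharacters above can be tried on a small file. This is a minimal sketch: the contents of name.list are made up for illustration, and \< is assumed to be available (it is a GNU grep extension).

```shell
# Hypothetical sample data; name.list only mirrors the filename used in these notes.
tmpdir=$(mktemp -d)
printf '%s\n' 'east' 'reed' 'elm' 'theme' > "$tmpdir/name.list"

# Negated character class: an 'e' followed by any character other than 'a'
neg_class=$(grep -e 'e[^a]' "$tmpdir/name.list")

# Caret: lines that start with 'e'
anchored=$(grep '^e' "$tmpdir/name.list")

# Start-of-word anchor: a word beginning with 'th'
word_start=$(grep '\<th' "$tmpdir/name.list")

printf '%s\n' "$neg_class" "$anchored" "$word_start"
rm -r "$tmpdir"
```

Note that 'east' is skipped by the first search because its only 'e' is followed by 'a'.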

POSIX Character Classes

[:alpha:] Any alphabetical character, regardless of case

[:digit:] Any numerical character

[:alnum:] Any alphabetical or numerical character

[:blank:] Space or tab characters

[:xdigit:] Hexadecimal characters; any number or A–F or a–f

[:punct:] Any punctuation symbol

[:print:] Any printable character (not control characters)

[:space:] Any whitespace character

[:graph:] Any visible character (printable characters, excluding the space)

[:upper:] Any uppercase letter

[:lower:] Any lowercase letter

[:cntrl:] Control characters

Each of these POSIX classes matches exactly one character, and the class name has to appear inside a bracket expression, for example [[:digit:]].
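The POSIX classes can be exercised as follows. This is a small sketch with invented sample lines; the point is that each class sits inside an outer pair of brackets and needs a quantifier to match more than one character.

```shell
# Hypothetical sample input
sample=$(printf '%s\n' 'abc' 'a1c' 'A C' '123')

# The class name goes inside an outer bracket expression
with_digit=$(printf '%s\n' "$sample" | grep '[[:digit:]]')

# One class matches one character; a quantifier repeats it
all_digits=$(printf '%s\n' "$sample" | grep -E '^[[:digit:]]+$')

# Several classes can share the same outer brackets
upper_or_blank=$(printf '%s\n' "$sample" | grep '[[:upper:][:blank:]]')

printf '%s\n' "$with_digit" "$all_digits" "$upper_or_blank"
```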

grep Basics

“Chaining” grep commands is inefficient most of the time. Often, a regular expression can be crafted to combine several conditions into a single search.
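As an illustration of combining conditions, the sketch below compares a chained search with a single pattern. The log lines are invented for the example.

```shell
# Hypothetical log lines
log=$(printf '%s\n' 'ERROR disk full' 'WARNING disk slow' 'ERROR net down' 'INFO ok')

# OR: one alternation instead of running grep once per keyword
either=$(printf '%s\n' "$log" | grep -E 'ERROR|WARNING')

# AND, chained: two grep processes
chained=$(printf '%s\n' "$log" | grep 'ERROR' | grep 'disk')

# AND, single pattern: cover both possible orders of the two words
single=$(printf '%s\n' "$log" | grep -E 'ERROR.*disk|disk.*ERROR')

printf '%s\n' "$either" "$chained" "$single"
```

The single-pattern AND form grows quickly as conditions are added, so for more than two or three terms chaining may still be the more readable choice.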

There is a case to be made for piping commands when you wish to search through content that is continually streaming.

tail -f /var/log/messages | grep WARNING

The grep program is actually a package of four different pattern-matching programs that use different regular-expression models.

Basic Regular Expressions (grep or grep -G)

Match Control

-e pattern, --regexp=pattern

grep -e -style doc.txt

Ensures that grep recognizes the pattern as the regular expression argument. Useful if the regular expression begins with a hyphen, which makes it look like an option

-f file, --file=file Takes patterns from file.

-i, --ignore-case

-v, --invert-match Returns lines that do not match, instead of lines that do.

-w, --word-regexp Matches only when the input text consists of full words.

This is the equivalent of putting \b at the beginning and end of the regular expression.

-x, --line-regexp grep -x 'Hello, world!' filename

This example matches only lines that consist entirely of “Hello, world!”.
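The match-control options above can be compared side by side. A minimal sketch with made-up input lines:

```shell
# Hypothetical sample input
text=$(printf '%s\n' 'Hello, world!' 'Hello, world! Again.' 'the cat' 'concatenate' '-style')

# -e: the pattern starts with a hyphen, so mark it explicitly as the pattern
leading_hyphen=$(printf '%s\n' "$text" | grep -e '-style')

# -w: 'cat' must be a whole word, so 'concatenate' is skipped
whole_word=$(printf '%s\n' "$text" | grep -w 'cat')

# -x: the pattern must match the entire line
whole_line=$(printf '%s\n' "$text" | grep -x 'Hello, world!')

# -v with -i: lines that do not contain 'hello' in any letter case
inverted=$(printf '%s\n' "$text" | grep -v -i 'hello')

printf '%s\n' "$leading_hyphen" "$whole_word" "$whole_line" "$inverted"
```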

General Output Control

-c, --count grep -c contact.html access.log

Instead of the normal output, you receive just a count of how many lines matched in each input file.

--color[=WHEN], --colour[=WHEN] grep --color[=auto] regexp filename

Assuming the terminal can support color, grep will colorize the pattern in the output. WHEN has three options: never, always, and auto.

-l, --files-with-matches

Instead of normal output, prints just the names of input files containing the pattern.

-L, --files-without-match

Instead of normal output, prints just the names of input files that contain no matches

-m NUM, --max-count=NUM grep -m 10 'ERROR:' *.log

This option tells grep to stop reading a file after NUM lines are matched

-o, --only-matching

Prints only the text that matches, instead of the whole line of input.

-q, --quiet, --silent

Suppresses output. The command still conveys useful information because the grep command’s exit status (0 for success if a match is found, 1 for no match found, 2 if the program cannot run because of an error) can be checked. The option is used in scripts to determine the presence of a pattern in a file without displaying unnecessary output.

-s, --no-messages

Silently discards any error messages resulting from nonexistent files or permission errors.
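The output-control options can be sketched like this. The access.log contents are invented; the exit-status handling at the end shows how -q is typically used in a script.

```shell
# Hypothetical access log
tmpdir=$(mktemp -d)
printf '%s\n' 'GET /contact.html 200' 'GET /index.html 200' 'GET /contact.html 404' > "$tmpdir/access.log"

# -c: count of matching lines
count=$(grep -c 'contact.html' "$tmpdir/access.log")

# -o: print only the matched text (here, the trailing status code)
codes=$(grep -o '[0-9][0-9][0-9]$' "$tmpdir/access.log")

# -m 1: stop reading after the first matching line
first=$(grep -m 1 'contact' "$tmpdir/access.log")

# -q: no output, only the exit status (0 = match, 1 = no match, 2 = error)
if grep -q ' 404$' "$tmpdir/access.log"; then found_404=yes; else found_404=no; fi
status_500=0
grep -q ' 500$' "$tmpdir/access.log" || status_500=$?

printf '%s\n' "$count" "$codes" "$first" "$found_404" "$status_500"
rm -r "$tmpdir"
```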

Output Line Prefix Control

-b, --byte-offset

Displays the byte offset of each matching text instead of the line number.

-H, --with-filename

Always includes the name of the file before each line printed.

-h, --no-filename

--label=LABEL

When the input is taken from standard input (for instance, when the output of another file is redirected into grep), the --label option will prefix the line with LABEL.

-n, --line-number Includes the line number of each line displayed.

-T, --initial-tab

Inserts a tab before each matching line, putting the tab between the information generated by grep and the matching lines.

-Z, --null

Prints an ASCII NUL (a zero byte) after each filename. This is useful when processing filenames that may contain special characters (such as carriage returns).
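The prefix options can be tried as below. This sketch assumes GNU grep and uses the long form --null for -Z; the file name (which deliberately contains a space) and the label stdin-data are made up for the example.

```shell
# Hypothetical file whose name contains a space
tmpdir=$(mktemp -d)
printf '%s\n' 'alpha' 'beta' 'alphabet' > "$tmpdir/my notes.txt"

# -n: prefix each matching line with its line number
numbered=$(grep -n 'alpha' "$tmpdir/my notes.txt")

# -H with --label: give standard input a readable name in the prefix
labelled=$(printf 'alpha\n' | grep -H --label=stdin-data 'alpha')

# -l with --null: NUL-terminated file names survive the space when read by xargs -0
line_total=$(grep -l --null 'alpha' "$tmpdir"/* | xargs -0 wc -l | awk '{print $1}')

printf '%s\n' "$numbered" "$labelled" "$line_total"
rm -r "$tmpdir"
```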

Context Line Control

-A NUM, --after-context=NUM grep -A 3 Copyright filename

Offers a context for matching lines by printing the NUM lines that follow each match. A group separator (--) is placed between each set of matches.

-B NUM, --before-context=NUM grep -B 3 Copyright filename

Same concept as the -A NUM option, except that it prints the lines before the match instead of after it.

-C NUM, -NUM, --context=NUM grep -C 3 Copyright filename

The -C NUM option operates as if the user entered both the -A NUM and -B NUM options. It will display NUM lines before and after the match. A group separator (--) is placed between each set of matches.
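The three context options differ only in which side of the match they print. A short sketch on invented input:

```shell
# Hypothetical five-line document
doc=$(printf '%s\n' 'line 1' 'line 2' 'Copyright 2010' 'line 4' 'line 5')

after=$(printf '%s\n' "$doc" | grep -A 1 'Copyright')    # match plus 1 line after
before=$(printf '%s\n' "$doc" | grep -B 1 'Copyright')   # match plus 1 line before
around=$(printf '%s\n' "$doc" | grep -C 1 'Copyright')   # 1 line on each side

printf '%s\n' "$after" "$before" "$around"
```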

File and Directory Selection

-a, --text

Equivalent to the --binary-files=text option, allowing a binary file to be processed as if it were a text file.

--binary-files=TYPE grep --binary-files=TYPE pattern filename

TYPE can be either binary, without-match, or text. When grep first examines a file, it determines whether the file is a “binary” file (a file primarily composed of non-human-readable text) and changes its output accordingly. By default, a match in a binary file causes grep to display simply the message “Binary file somefile.bin matches.” The default behavior can also be specified with the --binary-files=binary option.

When TYPE is without-match, grep does not search the binary file and proceeds as if it had no matches (equivalent to the -I option). When TYPE is text, the binary file is processed like text (equivalent to the -a option).

-D ACTION, --devices=ACTION grep -D read 123-45-6789 /dev/hda1

If the input file is a special file, such as a FIFO or a socket, this flag tells grep how to proceed. By default, grep will process these files as if they were normal files on a system. If ACTION is set to skip, grep will silently ignore them. The example will search an entire disk partition for the fake Social Security number shown. When ACTION is set to read, grep will read through the device as if it were a normal file.

-d ACTION, --directories=ACTION grep -d ACTION pattern path

This flag tells grep how to process directories submitted as input files. When ACTION is read, this reads the directory as if it were a file. recurse searches the files within that directory (same as the -R option), and skip skips the directory without searching it.

--exclude=GLOB grep --exclude=PATTERN path

Refines the list of input files by telling grep to ignore files whose names match the specified pattern.

--exclude-from=FILE

Similar to the --exclude option, except that it takes a list of patterns from a specified filename, which lists each pattern on a separate line.

--exclude-dir=DIR grep --exclude-dir=DIR pattern path

Any directories in the path matching the pattern DIR will be excluded from recursive searches.

-I grep -I pattern filename

Same as the --binary-files=without-match option. When grep finds a binary file, it will assume there is no match in the file.

--include=GLOB

grep --include='*.log' pattern filename

Limits searches to input files whose names match the given pattern

-R, -r, --recursive

Searches all files underneath each directory submitted as an input file to grep.
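The file-selection options are easiest to see on a small directory tree. This is a sketch assuming GNU grep; the directory and file names are hypothetical. The globs are quoted so that the shell passes them to grep unexpanded.

```shell
# Hypothetical directory tree
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/logs" "$tmpdir/cache"
printf 'ERROR: one\n'   > "$tmpdir/logs/app.log"
printf 'ERROR: two\n'   > "$tmpdir/logs/app.txt"
printf 'ERROR: three\n' > "$tmpdir/cache/old.log"

# -r with --include: only files whose names match the glob
only_logs=$(cd "$tmpdir" && grep -rl --include='*.log' 'ERROR' . | sort)

# --exclude-dir: skip a whole directory during recursion
skip_cache=$(cd "$tmpdir" && grep -rl --exclude-dir=cache 'ERROR' . | sort)

# --exclude: skip files whose names match the glob
skip_txt=$(cd "$tmpdir" && grep -rl --exclude='*.txt' 'ERROR' . | sort)

printf '%s\n' "$only_logs" "$skip_cache" "$skip_txt"
rm -r "$tmpdir"
```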

Other Options

--line-buffered, --mmap

-U, --binary

An MS-DOS/Windows-specific option that causes grep to treat all files as binary.

-V, --version

Extended Regular Expressions (egrep or grep -E)

Supported metacharacters: ?, +, {n,m}, |, (). The backslash (\) negates the metacharacter’s behavior and forces the search to match the character in a literal sense.

The metacharacter { is not supported by the traditional egrep. Although some versions interpret \{ literally, it should be avoided in egrep patterns. Instead, [{] should be used to match the character without invoking the special meaning.

Basic regular expressions Extended regular expressions

'\(red\)' '(red)'

'a\{1,3\}' 'a{1,3}'

'behaviou\?r' 'behaviou?r'

'pattern\+' 'pattern+'

In basic regular expressions, each of these extended metacharacters needs to be prefaced by an escape to draw out its special meaning. Note that this is the reverse of normal escaping behavior, which usually strips special meaning.
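The equivalence between the two columns can be checked directly. This sketch assumes GNU grep, where \? and \+ are accepted in basic regular expressions as extensions; the word list is made up.

```shell
# Hypothetical word list
words=$(printf '%s\n' 'behavior' 'behaviour' 'a' 'aaaa' 'red')

# Optional 'u': escaped in basic mode, bare in extended mode
bre_optional=$(printf '%s\n' "$words" | grep 'behaviou\?r')
ere_optional=$(printf '%s\n' "$words" | grep -E 'behaviou?r')

# Interval: one to three 'a' characters making up the whole line
bre_range=$(printf '%s\n' "$words" | grep -x 'a\{1,3\}')
ere_range=$(printf '%s\n' "$words" | grep -Ex 'a{1,3}')

# An unescaped ? in basic mode is just a literal question mark
bre_literal=$(printf '%s\n' 'why?' 'why' | grep 'why?')

printf '%s\n' "$bre_optional" "$bre_range" "$bre_literal"
```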

Fixed Strings (fgrep or grep -F)

It is known as “fast grep” because of the great performance it has compared to grep and egrep. It accomplishes this by dropping regular expressions altogether and looking for a defined string pattern. It is useful for searching for specific static content in a precise manner.

It supports the following options:

-b Shows the block number where the string_pattern was found.

-c This counts the number of lines that contain one or more instances of the string_pattern.

-e, -string fgrep -e string_pattern filename

Used for the search of more than one pattern or when the string_pattern begins with hyphen.

-f file

Takes the string patterns to search for from file, one per line, instead of from the command line.

-h no-filename

-l Displays the files containing the string_pattern but not the matching lines themselves.

-n Prints out the line number before the line that matches the given string_pattern.

-v Matches any lines that do not contain the given string_pattern.

-x Prints out the lines that match the string_pattern in their entirety.

-i ignore case
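The difference between a regular-expression search and a fixed-string search shows up as soon as the search text contains a metacharacter. A minimal sketch on invented data, using grep -F:

```shell
# Hypothetical sample input
data=$(printf '%s\n' 'price: 1.50' 'price: 1x50' 'a*b' 'aab')

# As a regular expression, '.' matches any character, so both price lines match
regex_dot=$(printf '%s\n' "$data" | grep '1.50')

# As a fixed string, '.' and '*' are ordinary characters
fixed_dot=$(printf '%s\n' "$data" | grep -F '1.50')
fixed_star=$(printf '%s\n' "$data" | grep -F 'a*b')

printf '%s\n' "$regex_dot" "$fixed_dot" "$fixed_star"
```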

Perl-Style Regular Expressions (grep -P)

Perl-style regular expressions use the Perl-Compatible Regular Expressions (PCRE) library to interpret the pattern and perform searches.

Debian does not enable Perl-style regular expressions by default in their grep package. Instead, they ship a pcregrep program, which provides very similar functionality to grep -P.

Test: grep -P test /bin/ls

All the same command line options that are present for grep will work with grep -P.

Character Types

PCRE-specific escapes

\a Matches the “alarm” character (HEX 07)

\cX Matches Ctrl-X, where X is any letter

\e Matches escape character (HEX 1B)

\f Matches form feed character (HEX 0C)

\n Matches newline character (HEX 0A)

\r Matches carriage return (HEX 0D)

\t Matches tab character (HEX 09)

\d Any decimal digit

\D Any character that is not a decimal digit

\s Any whitespace character

\S Any non-whitespace character

\w Any “word” character

\W Any “non-word” character

\b Matches when at word boundary

\B Matches when not at word boundary

\A Matches when at start of subject

\Z Matches when at end of subject or before newline

\z Matches when at end of subject

\G Matches at first matching position

Octal Searching

To search for “space”, use \40 or \040.

Introduction to grep-Relevant Environment Variables

Issue an env command in a terminal to output all the current parameters.

Change your EDITOR from vi to vim. In .profile (sh or bash), type: export EDITOR=vim. In csh or tcsh, put setenv EDITOR vim in .cshrc instead.

Advanced Tips and Tricks with grep

Backreferences: \N, where N is the number of the parenthesized group being referred to

grep -E '(red|green|blue).*\1' filename

grep -E '(I am the very model of a modern major general.).*\1' filename
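A backreference matches the same text the group captured, not merely the same pattern. The sketch below shows this on invented lines; backreferences with -E are assumed to be available, as they are in GNU grep.

```shell
# Hypothetical sample input
lines=$(printf '%s\n' 'red car, red door' 'red car, blue door' 'green on green')

# \1 must repeat whichever color the group actually matched
repeated=$(printf '%s\n' "$lines" | grep -E '(red|green|blue).*\1')

printf '%s\n' "$repeated"
```

The middle line is skipped: it contains two colors from the list, but neither one appears twice.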

Binary File Searching grep help /bin/ls

Grep does do an initial check to see if a file is binary and alters the way it displays results accordingly.

Useful Recipes

IP addresses

grep -E '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b' patterns

grep -E '\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b' patterns
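The two IP address patterns differ in how strictly they check each octet. A sketch on made-up input, assuming GNU grep for \b:

```shell
# Hypothetical candidates: one valid address, one with out-of-range octets, one too short
candidates=$(printf '%s\n' 'host 192.168.1.10 up' 'bad 999.300.1.1' 'ver 1.2.3')

# Loose pattern: any four groups of one to three digits
loose=$(printf '%s\n' "$candidates" | grep -E '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b')

# Strict pattern: each octet limited to 0-255
strict=$(printf '%s\n' "$candidates" | grep -E '\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b')

printf '%s\n' "$loose" "$strict"
```

The loose form is easier to read and is often good enough for log files; the strict form is worth its length when false positives such as 999.300.1.1 matter.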

MAC addresses

grep -Ei '\b[0-9a-f]{2}(:[0-9a-f]{2}){5}\b' patterns

Email addresses

grep -Ei '\b[a-z0-9._-]+@[a-z0-9.-]+\.(com|net|org|uk|mil|gov|edu)\b' patterns

U.S.-based phone numbers

grep -E '\b(\(|)[0-9]{3}(\)|-|\)-|)[0-9]{3}(-|)[0-9]{4}\b' patterns

Social Security numbers

grep -E '\b[0-9]{3}( |-|)[0-9]{2}( |-|)[0-9]{4}\b' patterns

Credit card numbers

grep -E '\b[0-9]{4}(( |-|)[0-9]{4}){3}\b' patterns

If your corporation uses the following copyright tag, you could use the following command to locate files that have this content:

fgrep -l 'ACME Corp. Proprietary and Confidential' patterns

Searching through large numbers of files

There is a limit to the number of files that can be handled in a single command. If you ask grep to process too many files, it will produce an error saying “Too Many Files” or the equivalent.

A tool called xargs can get around this limitation.

find / -type f -print0 | xargs -0 grep 'ABCDEFGH'

Matching strings across multiple lines

grep -P '(?m)red\ndog' test

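This is intended to overcome the limitation that grep examines only one line at a time. With GNU grep the pattern is still applied to each line separately, so the input usually also has to be read as a single record with -z (--null-data) before \n in the pattern can match, for example grep -Pzo 'red\ndog' test.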
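A multi-line match can be sketched as follows, assuming GNU grep built with PCRE support. The -z option, which is not part of the command in the notes, makes grep treat the input as NUL-separated records, so a file without NUL bytes is read as one record and \n can match inside it. The file contents are invented.

```shell
# Hypothetical three-line file named after the one used in the notes
tmpdir=$(mktemp -d)
printf '%s\n' 'big red' 'dog runs' 'red cat' > "$tmpdir/test"

# -z: whole file is one record; -o: print only the matched text
# The match ends with a NUL byte, which tr turns back into a newline
spanning=$(grep -Pzo 'red\ndog' "$tmpdir/test" | tr '\0' '\n')

printf '%s\n' "$spanning"
rm -r "$tmpdir"
```

Because -z loads each record whole, this approach is best kept to files of moderate size.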

grep --help on Ubuntu

Regexp selection and interpretation:

-P, --perl-regexp PATTERN is a Perl regular expression

-f, --file=FILE obtain PATTERN from FILE

-i, --ignore-case ignore case distinctions

-w, --word-regexp force PATTERN to match only whole words

-x, --line-regexp force PATTERN to match only whole lines

Miscellaneous:

-s, --no-messages suppress error messages

-v, --invert-match select non-matching lines

Output control:

-m, --max-count=NUM stop after NUM matches

-n, --line-number print line number with output lines

--line-buffered flush output on every line

-H, --with-filename print the filename for each match

-h, --no-filename suppress the prefixing filename on output

--label=LABEL print LABEL as filename for standard input

-o, --only-matching show only the part of a line matching PATTERN

-q, --quiet, --silent suppress all normal output

-d, --directories=ACTION how to handle directories; ACTION is `read', `recurse', or `skip'

-R, -r, --recursive equivalent to --directories=recurse

--include=FILE_PATTERN search only files that match FILE_PATTERN

--exclude=FILE_PATTERN skip files and directories matching FILE_PATTERN

--exclude-from=FILE skip files matching any file pattern from FILE

--exclude-dir=PATTERN directories that match PATTERN will be skipped.

-L, --files-without-match print only names of FILEs containing no match

-l, --files-with-matches print only names of FILEs containing matches

-c, --count print only a count of matching lines per FILE

-T, --initial-tab make tabs line up (if needed)


Context control:

-B, --before-context=NUM print NUM lines of leading context

-A, --after-context=NUM print NUM lines of trailing context

-C, --context=NUM print NUM lines of output context

-NUM same as --context=NUM

--color[=WHEN], --colour[=WHEN] use markers to highlight the matching strings; WHEN is `always', `never', or `auto'


Resources:

Grep Pocket Reference

