Bash Regular Expressions

A regular expression is a pattern that describes a set of strings.

Regular expressions are constructed analogously to arithmetic expressions by using various operators to combine smaller expressions.

The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any metacharacter with special meaning may be quoted by preceding it with a backslash.

Regular expression metacharacters

A regular expression may be followed by one of several repetition operators. The table below lists them together with the other metacharacters:

Regular expression operators

Operator  Effect
.         Matches any single character.
?         The preceding item is optional and is matched at most once.
*         The preceding item is matched zero or more times.
+         The preceding item is matched one or more times.
{N}       The preceding item is matched exactly N times.
{N,}      The preceding item is matched N or more times.
{N,M}     The preceding item is matched at least N times, but not more than M times.
-         Represents a range, provided it is not the first or last character in a list or the ending point of a range in a list.
^         Matches the empty string at the beginning of a line; also represents the characters not in the range of a list.
$         Matches the empty string at the end of a line.
\b        Matches the empty string at the edge of a word.
\B        Matches the empty string provided it’s not at the edge of a word.
\<        Matches the empty string at the beginning of a word.
\>        Matches the empty string at the end of a word.
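
A quick way to get a feel for the repetition operators is to feed a few sample words to grep and see which ones come back. This is only a sketch: it assumes a grep that accepts -E (extended syntax), and the sample words are made up for illustration.

```shell
# Sample words, one per line (made up for illustration).
words='color
colour
colouur
ct
cat
caat'

# ? : the "u" is optional, so both spellings match.
optional=$(printf '%s\n' "$words" | grep -E '^colou?r$')

# + : one or more "a" between "c" and "t".
one_or_more=$(printf '%s\n' "$words" | grep -E '^ca+t$')

# * : zero or more "a", so "ct" matches as well.
zero_or_more=$(printf '%s\n' "$words" | grep -E '^ca*t$')

# {2} : exactly two "u".
exactly_two=$(printf '%s\n' "$words" | grep -E '^colou{2}r$')

printf '?   -> %s\n' "$(echo $optional)"
printf '+   -> %s\n' "$(echo $one_or_more)"
printf '*   -> %s\n' "$(echo $zero_or_more)"
printf '{2} -> %s\n' "$(echo $exactly_two)"
```

The ^ and $ anchors are there so that each pattern has to cover the whole word; without them grep would also report words that merely contain a match.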

Two regular expressions may be concatenated; the resulting regular expression matches any string formed by concatenating two substrings that respectively match the concatenated subexpressions.

Two regular expressions may be joined by the infix operator “|”; the resulting regular expression matches any string matching either subexpression.

Repetition takes precedence over concatenation, which in turn takes precedence over alternation. A whole subexpression may be enclosed in parentheses to override these precedence rules.
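
The precedence rules are easier to see side by side. The sketch below assumes grep -E and uses -x so that a line is reported only when the whole line matches the pattern; the four-letter test strings are invented for the example.

```shell
# Alternation binds loosest: 'ab|cd' means "ab" OR "cd".
loose=$(printf '%s\n' ab cd abd acd | grep -E -x 'ab|cd')

# Parentheses limit the alternation to the middle character: "abd" OR "acd".
grouped=$(printf '%s\n' ab cd abd acd | grep -E -x 'a(b|c)d')

# Repetition binds tightest: 'ab*' repeats only the "b" ...
tight=$(printf '%s\n' abab abbb | grep -E -x 'ab*')

# ... while '(ab)*' repeats the whole group "ab".
group_rep=$(printf '%s\n' abab abbb | grep -E -x '(ab)*')

printf 'ab|cd    -> %s\n' "$(echo $loose)"
printf 'a(b|c)d  -> %s\n' "$(echo $grouped)"
printf 'ab*      -> %s\n' "$tight"
printf '(ab)*    -> %s\n' "$group_rep"
```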

Basic versus extended regular expressions

In basic regular expressions the metacharacters “?”, “+”, “{”, “|”, “(”, and “)” lose their special meaning; instead use the backslashed versions “\?”, “\+”, “\{”, “\|”, “\(”, and “\)”.

Check in your system documentation whether commands using regular expressions support extended expressions.
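
As a small illustration of the difference, here is the same interval written in both styles. It assumes that plain grep uses basic syntax and that grep -E switches to extended syntax, which is the usual behaviour but worth checking on your own system.

```shell
line='aaa'

# Basic syntax (plain grep): the interval braces must be backslashed.
basic=$(printf '%s\n' "$line" | grep -c 'a\{3\}')

# Extended syntax (grep -E): the braces are operators as written.
extended=$(printf '%s\n' "$line" | grep -E -c 'a{3}')

# In basic syntax an unescaped brace is an ordinary character, so this
# pattern looks for the literal text "a{3}" and finds nothing in "aaa".
# grep exits with a non-zero status when nothing matches, hence "|| true".
literal=$(printf '%s\n' "$line" | grep -c 'a{3}' || true)

echo "basic=$basic extended=$extended literal=$literal"
```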

What is grep?

grep searches the input files for lines containing a match to a given pattern list. When it finds a match in a line, it copies the line to standard output (by default), or whatever other sort of output you have requested with options.

Though grep expects to do the matching on text, it has no limits on input line length other than available memory, and it can match arbitrary characters within a line. If the final byte of an input file is not a newline, grep silently supplies one. Since newline is also a separator for the list of patterns, there is no way to match newline characters in a text.

grep regular expression

To learn more about grep, read its manual page or run it with --help:

Th3-Gam3 ~ # man grep
Th3-Gam3 ~ # grep --help

Examples:

To search for a user “akm” in /etc/passwd

Th3-Gam3 ~ # grep akm /etc/passwd
akm:x:1000:1000:Falcon,,,:/home/akm:/bin/bash

To find a word and print its line number from the file, use -n

Th3-Gam3 ~ # grep -n akm /etc/passwd
41:akm:x:1000:1000:Falcon,,,:/home/akm:/bin/bash

To print every line that does NOT contain a word, use -v

Th3-Gam3 ~ # grep -v akm /etc/passwd

This prints every line of the file except the lines that contain the word akm.
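
If you would rather experiment without touching the real /etc/passwd, the same options can be tried on a few sample lines. The sketch below reuses lines shown in this post as its input; the variable names are just for illustration.

```shell
# Three passwd-style lines used as sample input.
sample='root:x:0:0:root:/root:/bin/bash
akm:x:1000:1000:Falcon,,,:/home/akm:/bin/bash
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin'

# -n prefixes each matching line with its line number.
numbered=$(printf '%s\n' "$sample" | grep -n 'akm')

# -v inverts the match: only lines WITHOUT "akm" are printed.
inverted=$(printf '%s\n' "$sample" | grep -v 'akm')

printf '%s\n' "with -n:" "$numbered" "with -v:" "$inverted"
```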

To count the lines that contain the word, use -c (it counts matching lines, not individual occurrences of the word)

Th3-Gam3 ~ # grep -c akm /etc/passwd
1
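
Keep in mind that -c reports the number of matching lines, so a line that contains the word twice still counts once. If you need the number of occurrences, one common approach is to combine -o with wc -l. This is a sketch that assumes your grep supports -o (GNU and BSD grep do, but it is not required by POSIX); the sample text is invented.

```shell
# Two lines contain "akm", but the word appears three times in total.
sample='akm and akm again
only one akm here
nothing to see'

# -c counts matching LINES.
matching_lines=$(printf '%s\n' "$sample" | grep -c 'akm')

# -o prints every match on its own line; counting those lines gives the
# number of occurrences. tr strips the padding some wc versions add.
occurrences=$(printf '%s\n' "$sample" | grep -o 'akm' | wc -l | tr -d ' ')

echo "matching lines: $matching_lines, occurrences: $occurrences"
```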

To find only the lines that start with a word, use ^word

Th3-Gam3 ~ # grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
nm-openvpn:x:117:124:NetworkManager OpenVPN,,,:/var/lib/openvpn/chroot:/bin/false

Th3-Gam3 ~ # grep ^root /etc/passwd
root:x:0:0:root:/root:/bin/bash
Th3-Gam3 ~ #

To find lines that end with a word, use word$

Th3-Gam3 ~ # grep bash$ /etc/passwd
root:x:0:0:root:/root:/bin/bash
akm:x:1000:1000:Falcon,,,:/home/akm:/bin/bash
Th3-Gam3 ~ #
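
The two anchors can also be compared on a small sample file, which keeps the result independent of whatever your own /etc/passwd contains. A minimal sketch, assuming mktemp is available; the three lines are copied from the output above.

```shell
# Write a small passwd-like sample to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
nm-openvpn:x:117:124:NetworkManager OpenVPN,,,:/var/lib/openvpn/chroot:/bin/false
akm:x:1000:1000:Falcon,,,:/home/akm:/bin/bash
EOF

anywhere=$(grep -c 'root' "$sample")    # "root" anywhere in the line
at_start=$(grep -c '^root' "$sample")   # only lines that begin with "root"
at_end=$(grep -c 'bash$' "$sample")     # only lines that end with "bash"

echo "anywhere=$anywhere at_start=$at_start at_end=$at_end"
rm -f "$sample"
```

The unanchored pattern also counts the nm-openvpn line because of the "chroot" in its path, which is exactly the kind of accidental match that ^ and $ prevent.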

Bracket expressions

A bracket expression is a list of characters enclosed by [ and ]. It matches any single character in that list; if the first character of the list is the caret, ^, then it matches any character NOT in the list.

To match any one of several specific characters, such as a, b, or c, use [abc], or use the range [a-c], which means from a to c.
To match any character except a, b, or c, use [^abc] or [^a-c].
To match a range of letters together with a range of digits, such as a to c and 0 to 5, use [a-c0-5].
To match any capital letter, use [A-Z].
To match any small letter, use [a-z].
To match any digit, use [0-9].

You can use any combination of characters and ranges.
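
Here is a small sketch of these lists at work on single characters. It sets LC_ALL=C as a precaution, on the assumption that your locale might otherwise sort letters in a way that makes a range like [a-c] include capitals; -x makes grep report a line only when the whole line matches.

```shell
# One character per line (sample input).
chars='a
d
3
7
B
-'

# Letters a to c or digits 0 to 5.
in_list=$(printf '%s\n' "$chars" | LC_ALL=C grep -x '[a-c0-5]')

# Everything NOT in that list.
not_in_list=$(printf '%s\n' "$chars" | LC_ALL=C grep -x '[^a-c0-5]')

# A hyphen placed last in the list is an ordinary character, not a range.
dash_last=$(printf '%s\n' "$chars" | LC_ALL=C grep -x '[a-c-]')

printf '[a-c0-5]  -> %s\n' "$(echo $in_list)"
printf '[^a-c0-5] -> %s\n' "$(echo $not_in_list)"
printf '[a-c-]    -> %s\n' "$(echo $dash_last)"
```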

Examples:

To find lines that start with the small letter a or b:

Th3-Gam3 ~ # grep '^[ab]' /etc/passwd
bin:x:2:2:bin:/bin:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
avahi-autoipd:x:109:117:Avahi autoip daemon,,,:/var/lib/avahi-autoipd:/bin/false
avahi:x:110:118:Avahi mDNS daemon,,,:/var/run/avahi-daemon:/bin/false
akm:x:1000:1000:Falcon,,,:/home/akm:/bin/bash
account:x:1003:1003::/home/account:
ahmed:x:1004:1004::/home/ahmed:
bacula:x:126:139:Bacula:/var/lib/bacula:/bin/false
Th3-Gam3 ~ # grep '^[AB]' /etc/passwd
Th3-Gam3 ~ #

To find all lines except those that start with a or b:

Th3-Gam3 ~ # grep '^[^ab]' /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin

Note: a ^ outside brackets means "starts with" (it anchors the match to the beginning of the line).
A ^ as the first character inside brackets means reverse, or NOT.
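
One practical caution: patterns such as ^[ab] contain brackets, which the shell also treats as filename wildcards, so it is safer to wrap the pattern in single quotes. The sketch below shows what can happen otherwise. It assumes a POSIX-style shell such as bash and works in a scratch directory that contains a file literally named ^a, a name chosen only to trigger the problem.

```shell
# Work in a scratch directory that contains a file literally named "^a".
start_dir=$PWD
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch '^a'

# Unquoted, the shell treats ^[ab] as a filename pattern and expands it
# before the command ever runs; here it turns into the file name "^a".
set -- ^[ab]
unquoted=$1

# Quoted, the pattern reaches the command unchanged.
set -- '^[ab]'
quoted=$1

echo "unquoted=$unquoted quoted=$quoted"

# Clean up.
cd "$start_dir" || exit 1
rm -rf "$scratch"
```

With grep in place of set --, the unquoted form would silently search for the text "^a" at the wrong anchor instead of the list you intended.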

Character classes

Character classes can be specified within the square brackets, using the syntax [:CLASS:], where CLASS has one of the values

“alnum”, “alpha”, “ascii”, “blank”, “cntrl”, “digit”, “graph”, “lower”, “print”, “punct”, “space”, “upper”, “word” or “xdigit”.

Most of these names are defined in the POSIX standard. “ascii” and “word” are not; Bash accepts them in its own patterns, but grep may reject them.

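To list all files and directories whose names start with a digit (this is a shell filename pattern rather than a grep regular expression, but the class syntax is the same):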

Th3-Gam3 ~ # ls -ld [[:digit:]]*
drwxr-xr-x 1 root root 0 May  8 17:32 0akm
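
The same class names can be used inside a bracket expression in grep. A minimal sketch with made-up sample names, using only the POSIX classes digit and alnum:

```shell
# Sample names, one per line (made up, apart from 0akm from the listing above).
names='0akm
akm
9lives
_tmp'

# Lines whose first character is a digit.
starts_with_digit=$(printf '%s\n' "$names" | grep '^[[:digit:]]')

# Lines whose first character is neither a letter nor a digit.
starts_with_other=$(printf '%s\n' "$names" | grep '^[^[:alnum:]]')

printf 'digit first: %s\n' "$(echo $starts_with_digit)"
printf 'other first: %s\n' "$starts_with_other"
```

Note the double brackets: the outer pair opens the list, and [:digit:] is one item inside it, so a class can be combined with other characters, as in [[:digit:]_].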

Full example:

To match an email address:

"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$"

Analysis:

^ : means the match must start at the beginning of the line.
[A-Za-z0-9._%+-] : means any one capital letter, small letter, digit, or one of the special characters . _ % + -
+ : means one or more of the previous item.

That was the name part of the email address.

@ : means an @ character must come next.

[A-Za-z0-9.-]+ : means one or more letters, digits, dots, or dashes (that is what + means).
\. : means a dot must be found; the backslash in front makes it an ordinary dot, NOT the “any character” metacharacter.
[A-Za-z]{2,4}$ : means the line must end with two to four letters.

Together these represent the domain and extension part of the address.

[email protected] for example.
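
To try the pattern out, it can be wrapped in a small shell function. This is only a sketch: is_email is a hypothetical helper name, the test addresses are made up, and the pattern is the simplified one from this post rather than a complete email validator.

```shell
# Pattern taken from the text above.
email_re='^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$'

# is_email (hypothetical helper): succeeds, with exit status 0, when its
# first argument matches the pattern. -q keeps grep quiet.
is_email() {
    printf '%s\n' "$1" | grep -E -q "$email_re"
}

for candidate in 'user@example.com' 'user@example' 'user@example.museum'; do
    if is_email "$candidate"; then
        echo "$candidate: looks like an email address"
    else
        echo "$candidate: rejected"
    fi
done
```

The last address is rejected because {2,4} allows at most four letters after the final dot, so longer extensions would need a larger upper bound.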

That is it. I know it was long, but there was no way around it: regular expressions are a critical skill for any programmer, not just for Bash. Thanks for joining me.
Enjoy!
