Sunday, February 7, 2010

Regular Expressions - 2

Alternation:
The difference between a character class and alternation is that alternation can match any of several complete regular expressions, while a character class matches only a single character. Unlike a character class, however, alternation cannot be negated.
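
A small sketch of the difference, using Python's re module as a stand-in for egrep (the sample words are made up for illustration; for these particular constructs the syntax is the same):

```python
import re

words = ["grey", "gray", "groy"]

# Character class: exactly one character chosen from the set.
char_class = re.compile(r"gr[ea]y")
# Alternation: each alternative is a full sub-expression.
alternation = re.compile(r"gr(e|a)y")
# Alternation between whole words, which a class cannot express.
whole_words = re.compile(r"(first|1st)")
# Negated class: any single character except 'e' or 'a'.
negated = re.compile(r"gr[^ea]y")

matches_class = [w for w in words if char_class.fullmatch(w)]
matches_alt = [w for w in words if alternation.fullmatch(w)]
matches_words = [w for w in ["first", "1st", "fst"] if whole_words.fullmatch(w)]
matches_negated = [w for w in words if negated.fullmatch(w)]
```

For single characters the class and the alternation accept the same words; only the alternation can choose between 'first' and '1st', and only the class can be negated.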

Ignoring differences in capitalization:
We can tell egrep to ignore capitalization differences with the '-i' option. Remember, though, that this is an option of egrep and not part of the regular expression language itself.
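
As a rough illustration in Python (an assumption for demonstration, since these notes use egrep), the same idea shows up as a flag passed alongside the pattern rather than something written inside it. The sample line is made up:

```python
import re

pattern = r"july"
line = "Independence Day is on JULY 4th"

# Without the flag the lowercase pattern does not match the uppercase text.
case_sensitive = re.search(pattern, line)
# re.IGNORECASE plays the role that '-i' plays for egrep.
case_insensitive = re.search(pattern, line, re.IGNORECASE)
```

The pattern itself is unchanged in both calls; only the option given to the tool differs.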

Word boundaries:

Metacharacters:
Different metacharacters apply inside a character class and outside it. Some metacharacters also change meaning depending on whether they appear inside a class or outside it.

. Matches any single character
[...] Represents a character class
[^...] Negated character class
^ Start of a line when outside a class, negation when it is the first character in a class
$ End of a line
\< Start of a word
\> End of a word
| Alternation
() Used to limit the scope of alternation... there are other uses as well
- Used to specify ranges within a character class
? Matches the preceding item 0 or 1 times (makes it optional)
+ Matches the preceding item 1 or more times
* Matches the preceding item any number of times, including none (0 or more)
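
A few of the entries above can be tried out with a short sketch. It uses Python's re module as a stand-in for egrep, which is an assumption of this example. Python's re does not support \< and \>, so \b (a boundary at either end of a word) is used as the closest substitute. All sample strings are made up.

```python
import re

lines = ["cat", "bat", "concat", "a-b"]

# Outside a class, ^ anchors the match to the start of the line.
starts_with_cat = [s for s in lines if re.search(r"^cat", s)]
# As the first character inside a class, ^ negates the class.
not_c_then_at = [s for s in lines if re.search(r"[^c]at", s)]

# Inside a class, - specifies a range; outside a class it is a literal dash.
range_matches_b = re.fullmatch(r"[a-c]", "b") is not None
range_matches_dash = re.fullmatch(r"[a-c]", "-") is not None
literal_dash = re.search(r"a-b", "a-b") is not None

# Quantifiers apply to the item just before them.
optional = [s for s in ["color", "colour", "colouur"] if re.fullmatch(r"colou?r", s)]
one_or_more = [s for s in ["ac", "abc", "abbc"] if re.fullmatch(r"ab+c", s)]
zero_or_more = [s for s in ["ac", "abc", "abbc"] if re.fullmatch(r"ab*c", s)]

# Word boundaries: 'cat' as a whole word, not as part of 'concat'.
whole_word = re.search(r"\bcat\b", "the cat sat") is not None
inside_word = re.search(r"\bcat\b", "concat") is not None
```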

Example:

July? (fourth|4(th)?)
This matches a 'J' followed by a 'u' followed by an 'l', which may optionally be followed by a 'y'. After that comes a space, and then 'fourth', '4', or '4th'.
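
The expression can be checked against a few made-up candidate strings. This sketch uses Python's re module in place of egrep (an assumption of this example), and fullmatch to require that the whole string fits the pattern, whereas egrep looks for the pattern anywhere in a line:

```python
import re

pattern = re.compile(r"July? (fourth|4(th)?)")

candidates = [
    "July fourth",
    "Jul 4",
    "July 4th",
    "Jul fourth",
    "July4",      # no space, so it should be rejected
    "July 5th",   # wrong day, so it should be rejected
]

# Keep only the strings that the whole pattern accepts.
matched = [s for s in candidates if pattern.fullmatch(s)]
```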

