Sunday, February 7, 2010

Regular Expressions - 2

The difference between a character class and alternation is that alternation can match alternatives that are entire regular expressions, while a character class matches exactly one character. However, alternation cannot be negated the way a character class can.
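To make the difference concrete, here is a small sketch. It uses Python's re module rather than egrep, and the sample words ('grey', 'silver' and so on) are made up for illustration.

```python
import re

samples = ['grey', 'gray', 'groy']

# Character class: exactly one character, either 'e' or 'a'.
char_class = re.compile(r'gr[ea]y')
# Alternation doing the same job for single characters.
alternation = re.compile(r'gr(e|a)y')
# Alternation between whole sub-expressions, which a class cannot express.
whole_words = re.compile(r'(grey|silver)')
# A class can be negated: 'gr', then any one character except 'e', then 'y'.
negated = re.compile(r'gr[^e]y')

class_hits = [w for w in samples if char_class.search(w)]
alt_hits = [w for w in samples if alternation.search(w)]
word_hits = [w for w in ['grey', 'silver', 'gold'] if whole_words.search(w)]
negated_hits = [w for w in samples if negated.search(w)]
```

Here class_hits and alt_hits are identical, while only the class has a direct negated form.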

Ignoring differences in capitalization:
We can tell egrep to ignore capitalization differences by using the '-i' option. But remember that this is not part of the regular expression language itself.
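In the same spirit, Python keeps case-insensitivity outside the pattern by default, as a flag passed to the matching function. A minimal sketch, with made-up sample lines:

```python
import re

lines = ['Parag wrote this', 'a note from parag', 'PARAG in capitals', 'nothing here']

# Without the flag the pattern is case sensitive.
case_sensitive = [line for line in lines if re.search(r'parag', line)]
# re.IGNORECASE plays the role that '-i' plays for egrep.
ignore_case = [line for line in lines if re.search(r'parag', line, re.IGNORECASE)]
```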

Word boundaries:

There are different metacharacters within a class and outside of it. Sometimes the meaning of a metacharacter also changes depending on whether it is within a class or outside of it.

. Matches anything
[...] Represents a character class
[^..] Negated character class
^ Start of a line when outside a class, negation when the 1st char in a class
$ End of a line
\< Start of a word
\> End of a word
| Alternation
() Used to limit the scope of alternation... there are other uses as well
- Used to specify ranges within a character class
? 0 or 1 match (signifies an optional match)
+ Matches 1 or more
* Matches any number including none (0 or more)


July? (fourth|4(th)?)
This means: match a 'J' followed by a 'u' followed by an 'l', which may optionally be followed by a 'y'. After that comes a space, and then 'fourth', '4', or '4th'.
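The pattern above can be tried out in Python's re module, which accepts the same syntax for this expression. This is only a sketch: the candidate strings are invented, and the trailing '$' is my addition so that each whole string has to match (egrep would report any line that merely contains a match).

```python
import re

# Same expression as above, anchored at the end for this test.
pattern = re.compile(r'July? (fourth|4(th)?)$')

candidates = ['July fourth', 'Jul 4', 'July 4th', 'Jul fourth', 'Juy 4', 'July 4t']

# re.match anchors at the start of the string; '$' anchors at the end.
matched = [c for c in candidates if pattern.match(c)]
```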

Saturday, February 6, 2010

Regular Expressions - 1

Regular expressions by practice

I want to search for my name in a large text file. This is a case where we can use simple regular expressions.

egrep Parag file.txt

This will search the file called file.txt and print all lines which contain 'Parag'. But normally things are not as simple as this. What if I suddenly remember that some instances of Parag may not have the first character in upper case? I want to search for all instances of 'Parag' or 'parag'. So basically I want to search for a word which begins with either 'P' or 'p', followed by 'arag'. How do I specify 'P' or 'p'? It can be done with what is known as a character class. A character class matches any single character listed inside it.

egrep '[Pp]arag' file.txt

will print all lines containing 'Parag' or 'parag'
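The same search can be sketched in Python. The lines below stand in for the contents of file.txt and are made up for illustration.

```python
import re

# Hypothetical contents of file.txt, one string per line.
lines = ['Parag wrote this', 'a note from parag', 'PARAG in capitals', 'nothing here']

# [Pp] matches exactly one character: either 'P' or 'p'.
pattern = re.compile(r'[Pp]arag')

hits = [line for line in lines if pattern.search(line)]
```

Note that 'PARAG' is not a hit, because the class only covers the first letter.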

Tuesday, August 11, 2009

Python lists

Here are some methods which can be used with lists in Python

We will first create a list to represent a bag of flowers:

flowers = ['rock banana', 'swollen finger grass', 'velvet leaf', 'turmeric', 'rock banana']

Let's first determine how many rock banana flowers we have in the list:

>>> flowers.count('rock banana')
2

At what index in the list does 'swollen finger grass' appear?

>>> flowers.index('swollen finger grass')
1

Let's insert a rose in there at index 2
>>>flowers.insert(2, 'rose')
>>>print flowers
['rock banana', 'swollen finger grass', 'rose', 'velvet leaf', 'turmeric', 'rock banana']

Let's say we want to delete the rose
>>> flowers.pop(2)
'rose'
Notice that the pop() method returns the element it removes. If we just want to delete without getting the element back, we can use Python's del statement. I have discussed built-in functions in this blog post.
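Here is a short side-by-side sketch of pop() and del on the flowers list. The copies (popped_from, deleted_from) are names I introduced so that both variants start from the same list.

```python
flowers = ['rock banana', 'swollen finger grass', 'rose',
           'velvet leaf', 'turmeric', 'rock banana']

# pop() removes the element at the given index and hands it back.
popped_from = list(flowers)      # work on a copy
removed = popped_from.pop(2)

# del removes the element at the given index and gives nothing back.
deleted_from = list(flowers)     # another copy
del deleted_from[2]
```

Both copies end up without the rose; only pop() tells us what was removed.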

What if we want to remove an element by value rather than by index? Let's remove 'swollen finger grass' using the remove() method, which deletes the first matching element.

>>> flowers.remove('swollen finger grass')
>>> print flowers
['rock banana', 'velvet leaf', 'turmeric', 'rock banana']

We have already spoken about slicing in the blog post on sequences. There are a few more things we can do with slicing, like replacing elements, and removing them as well.

Let's create a simple list of numbers:
numbers = [3, 5, 4, 7, 4]

Let's insert a number at the 3rd position in the list
>>> numbers[2:2] = [10]
>>> print numbers
[3, 5, 10, 4, 7, 4]
Notice that when we use slice assignment to insert into a list, the value we assign has to be an iterable. This is the reason we use [10] and not 10.

Let's replace some elements of the list
>>> print numbers
[3, 5, 10, 4, 7, 4]
>>> numbers[3:5] = [14, 17]
>>> print numbers
[3, 5, 10, 14, 17, 4]

Let's delete the 4th element from the list
>>> numbers[3:4] = []
>>> print numbers
[3, 5, 10, 17, 4]
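A few further properties of slice assignment, as a sketch that continues from the list as it stood before the deletion. The intermediate names (grown, shrunk, error_raised) are mine.

```python
numbers = [3, 5, 10, 14, 17, 4]

# The replacement need not have the same length as the slice:
# two elements are replaced by three, so the list grows by one.
numbers[1:3] = [6, 7, 8]
grown = list(numbers)

# del on a slice has the same effect as assigning an empty list to it.
del numbers[1:4]
shrunk = list(numbers)

# Assigning a non-iterable to a slice is rejected.
try:
    numbers[2:2] = 10
    error_raised = False
except TypeError:
    error_raised = True
```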

Monday, August 10, 2009

Python built in functions

Some built in functions in Python

Let's make a simple list of numbers

>>>numbers = [2,4,3,6,5]

To get the number of elements in numbers, we can use the len() function
>>> len(numbers)
5

To get the max element in the list we can use the max() function
>>> max(numbers)
6

The max() function is not restricted to lists of numbers. It works with any iterable.

Just like the max() function returns the largest value from an iterable, the min() function returns the smallest value.
>>> min(numbers)
2
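As a quick illustration of the "any iterable" point, here is a sketch applying max() and min() to a string and a tuple. The tuple values are made up.

```python
# Characters are compared by their character codes,
# so upper case letters sort before lower case ones.
largest_char = max("Hello")
smallest_char = min("Hello")

# A tuple works just as well as a list.
largest_in_tuple = max((2.5, 1, 7))
smallest_in_tuple = min((2.5, 1, 7))
```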


Suppose we have a String "Hello" and we want to create a list of the characters of the String. We can use the built-in list() function.

>>> list("Hello")
['H', 'e', 'l', 'l', 'o']

The list() function is not limited to Strings. We can give it any iterable and it will return a list of the elements of the iterable, in the same order as they appear in the iterable.
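A minimal sketch of list() applied to a few different iterables; the sample values are arbitrary.

```python
from_string = list("Hello")      # one element per character
from_tuple = list((3, 1, 2))     # order is preserved, nothing is sorted
from_range = list(range(3))      # the numbers 0, 1, 2
```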

If we want to delete an element from a list, we can use the del statement
>>> hello_list = list("Hello")
>>> del hello_list[0]
>>> print hello_list
['e', 'l', 'l', 'o']

We can determine the type of an object using the type() function.
>>> type(hello_list)
<type 'list'>

Slicing in Python

Python supports slicing of sequences. Just watched a video on Python slicing here.

My notes follow:

Slicing a sequence is the act of taking a sequence and getting back some elements from it. These elements could be consecutive elements or they could be spaced at an interval.

Let's create a list and then slice it

my_list = [2,3,4,5,6,7,8,9,]

print my_list[2:7]
Will return the elements from index 2 up to, but not including, index 7 (that is, indexes 2 through 6).
[4, 5, 6, 7, 8]

print my_list[2:]
Will print all elements from index 2 to the end of the list.
[4, 5, 6, 7, 8, 9]

We need not get only consecutive elements. They can also be spaced.
print my_list[2:7:2]
Will print every 2nd element, starting at index 2 and stopping before index 7.
[4, 6, 8]

We can also address the list using negative indexes.
print my_list[-5:-1]
Will return the elements starting from the 5th element from the end, up to but not including the last element.
[5, 6, 7, 8]

If we want to print from the 5th element from the end through the last element, we need to leave out the end index:
print my_list[-5:]
[5, 6, 7, 8, 9]

Slices using negative indexes can also be spaced.
print my_list[-5:-1:2]
[5, 7]
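One way to picture what a slice does is to spell it out with explicit indexes. The helper below, manual_slice, is a hypothetical name of my own and only a sketch: it assumes start and stop are both given, lie within the list, and have the same sign, and that step is positive.

```python
my_list = [2, 3, 4, 5, 6, 7, 8, 9]

def manual_slice(seq, start, stop, step=1):
    """Collect seq[start], seq[start + step], ... stopping before stop."""
    return [seq[i] for i in range(start, stop, step)]

consecutive = manual_slice(my_list, 2, 7)        # same as my_list[2:7]
spaced = manual_slice(my_list, 2, 7, 2)          # same as my_list[2:7:2]
from_end = manual_slice(my_list, -5, -1)         # same as my_list[-5:-1]
from_end_spaced = manual_slice(my_list, -5, -1, 2)
```

The real slice syntax is more forgiving than this helper, since it also handles omitted and out-of-range bounds.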

Python Sequences

I just viewed some videos on Python sequences and lists at . My notes follow:

  • A populated tuple need not be enclosed in parentheses. Does that mean that the print statement in
print "Hello", "Again"
takes a tuple of Strings?
  • Even though ['h','e','l','l','o'] and "Hello" are both sequences, we cannot concatenate them, because a list can only be concatenated with another list.
  • Sequences can be multiplied... "Hello"*3 will result in "HelloHelloHello"
  • Lists support a sort method. This method sorts the list in place. It does not return anything (that is, it returns None).
  • Python has a sorted() function which takes any iterable, such as a String, and returns a new list containing its elements in sorted order. The original is left unchanged.
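The notes above can be checked with a short sketch. All variable names are mine, and I use the lower case string "hello" so that the sorted order is easy to see.

```python
# A tuple written without parentheses.
point = 1, 2, 3

letters = ['h', 'e', 'l', 'l', 'o']

# A list and a string are both sequences, but they do not concatenate.
try:
    letters + "hello"
    concat_failed = False
except TypeError:
    concat_failed = True

# Converting the string to a list first makes concatenation possible.
combined = letters + list("hello")

# Multiplying a sequence repeats it.
tripled = "Hello" * 3

# sorted() returns a new list and leaves its argument alone.
sorted_chars = sorted("hello")
sorted_string = ''.join(sorted_chars)

# sort() reorders the list in place and returns None.
sort_result = letters.sort()
```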

Python Strings

Just watched the basic and advanced tutorials on Python Strings over at . Here are my notes on Python Strings.

When we concatenate a String with another object, we will get an Exception (a TypeError) if the other object is not a String. In such cases we should convert the other object to a String first. There are a few ways to convert non string objects to Strings. Assuming obj is a non string:

  1. str(obj)
  2. repr(obj)
  3. `obj`

Strings have a find method which will find a substring within a String and return the index where the substring starts, or -1 if the substring is not found.
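A small sketch of conversion and find(). The values are made up, and the backtick form is left out because it only exists in Python 2.

```python
count = 3

# Concatenating a String with an int fails, so convert first.
message = "Count: " + str(count)

# str() gives the readable form, repr() the form with quotes.
str_form = str("hi")
repr_form = repr("hi")

# find() returns the index where the substring starts, or -1.
position = "Hello".find("ll")
missing = "Hello".find("z")
```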

We can change the case of a String using the upper() and lower() methods. Note that these methods return the changed String and do not change the existing String (since String objects are immutable)

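>>> s = "HeLlo"
>>> upper(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'upper' is not defined
>>> s.upper()
'HELLO'
>>> s.lower()
'hello'
>>> s
'HeLlo'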

Strings also have a join method which can be used to join a list with a String, such that the String delimits every element from the list.

>>> words = ['hello', 'howdy', 'namaste']
>>> s = ':'
>>> s.join(words)
'hello:howdy:namaste'
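Two related points, as a sketch: split() (not covered in my notes above) reverses a join, and join() only accepts Strings, so other values have to be converted first. The list of numbers is made up.

```python
words = ['hello', 'howdy', 'namaste']

joined = ':'.join(words)

# split() on the same delimiter gives the original list back.
round_trip = joined.split(':')

# join() needs Strings, so numbers are converted with str() first.
numbers_joined = ', '.join(str(n) for n in [1, 2, 3])
```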

We can replace characters or parts of a String using the replace() method.

>>> hello = 'hello'
>>> hello.replace('e', 'g')
'hgllo'
>>> hello.replace('hell', 'heaven')
'heaveno'
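A short sketch of two details of replace(): it replaces every occurrence unless an optional count is given as a third argument, and it never changes the original String.

```python
hello = 'hello'

# Every occurrence is replaced by default.
all_replaced = hello.replace('l', 'L')

# The optional third argument limits the number of replacements.
first_only = hello.replace('l', 'L', 1)

# The original String is untouched, since Strings are immutable.
original = hello
```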