2011-02-20 this page is under revision.
Xah Lee, 2005, 2011-02
Some functions and methods, such as “search()” and “match()”, return a “MatchObject”. The following are its methods and attributes:
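“expand(‹template›)” returns the string ‹template›, with its back references replaced by the corresponding captured groups. It does the same backslash substitution as the “sub()” function, but it returns only the expanded template, not the whole modified string. That is, after
matchObj = re.compile(pat).search(stringText)
result = matchObj.expand(template)
the string result is exactly the text that
re.sub(pat, template, stringText, count=1)
puts in place of the first match.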
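As a quick check of how “expand()” relates to “sub()”, here is a small Python 3 sketch. The pattern, template, and sample text are made-up values chosen for illustration. expand() gives only the expanded template, while re.sub() gives the whole string with that expansion spliced in where the match was.

```python
import re

# Made-up sample values, named after the placeholders in the text above.
pat = r'(\w+)@(\w+)\.com'
template = r'\2 at \1'
stringText = 'mail joe@example.com today'

matchObj = re.compile(pat).search(stringText)

# expand() returns only the template, with \1 and \2 filled in.
result = matchObj.expand(template)
print(result)        # example at joe

# re.sub() returns the whole string, with the first match replaced.
substituted = re.sub(pat, template, stringText, count=1)
print(substituted)   # mail example at joe today

# The sub() result is the text around the match with result spliced in.
spliced = stringText[:matchObj.start()] + result + stringText[matchObj.end():]
print(substituted == spliced)   # True
```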
Back references include the numeric forms, e.g. \1, \2, …, or \g<1>, \g<2>, …, as well as named forms, e.g. \g<name>. Note that a named form requires the group to be named in the regex pattern with (?P<name>‹pattern›) (see: Regex Syntax.)
Here's a complete usage example of “expand()”:
# python
import re
xx = re.compile(r'date is (\d\d\d\d)')
yy = xx.search('the date is 1999.')
print(yy.expand(r'Date: \1'))  # prints: Date: 1999
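The numeric and named back-reference forms can be compared side by side. This sketch reuses the date example but gives the group a name, year, which is an assumption added for illustration. All three forms below refer to the same group; the \g<1> form also matters when a literal digit follows the reference, since \10 would be read as a reference to group 10.

```python
import re

m = re.search(r'date is (?P<year>\d\d\d\d)', 'the date is 1999.')

# Numeric, \g<number>, and \g<name> forms all refer to group 1 here.
print(m.expand(r'Date: \1'))        # Date: 1999
print(m.expand(r'Date: \g<1>'))     # Date: 1999
print(m.expand(r'Date: \g<year>'))  # Date: 1999

# \g<1>0 means "group 1, then a literal 0".
print(m.expand(r'\g<1>0'))          # 19990
```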
The following methods, “groups()”, “group()”, and “groupdict()”, all return the captured groups in different ways.
“groups()” returns them all, “group(n1,n2, …)” returns them in a user-specified order and combination, and “groupdict()” returns the named captures as a dictionary.
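“groups()” returns a tuple containing all the captured groups of the match, from group 1 up to the last group in the pattern. An optional argument, as in “groups(default)”, gives the value used for groups that did not participate in the match; it defaults to None.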
Example:
# python
import re
myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(\w\d+) (\w\d+) (\w\d+).+')
matchObj = patObj.search(myText)
print(matchObj.groups())  # prints: ('a1', 'a2', 'a3')
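The default argument of groups() only shows up when some group is skipped. A minimal sketch, using a made-up pattern whose second group is optional and does not participate when the text has no digits:

```python
import re

# ([a-z]+) matches 'abc'; the optional (\d+)? takes no part in the match.
m = re.match(r'([a-z]+)(\d+)?', 'abc')
print(m.groups())    # ('abc', None)
print(m.groups(''))  # ('abc', '')
```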
“group(…)” returns one or more captured groups. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Each argument is an integer reference to a captured group, with 0 denoting the entire match. Calling group() with no argument is equivalent to group(0).
Example:
# -*- coding: utf-8 -*-
# python
import re
myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(\w\d+) (\w\d+) (\w\d+).+')
matchObj = patObj.search(myText)
print(matchObj.groups())       # ⇒ ('a1', 'a2', 'a3')
print(matchObj.group())        # ⇒ 'some a1 a2 a3 list'
print(matchObj.group(0))       # ⇒ 'some a1 a2 a3 list'
print(matchObj.group(1))       # ⇒ 'a1'
print(matchObj.group(2))       # ⇒ 'a2'
print(matchObj.group(1, 2))    # ⇒ ('a1', 'a2')
print(matchObj.group(2, 1, 1)) # ⇒ ('a2', 'a1', 'a1')
print(matchObj.group(0, 1))    # ⇒ ('some a1 a2 a3 list', 'a1')
If an argument is negative or larger than the number of groups defined in the pattern, an IndexError exception is raised.
If a group is contained in a part of the pattern that did not match, the corresponding result is None. If a group is contained in a part of the pattern that matched multiple times, the last match is returned.
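Here is a minimal sketch of both cases, using made-up patterns: an optional group that is skipped, and a repeated group that matches several times.

```python
import re

# The optional group (x)? is skipped when matching 'y', so group(1) is None.
m = re.match(r'(x)?y', 'y')
print(m.group(1))   # None
print(m.group())    # y

# The group (\w\d) is repeated by +, so it matches a1, b2, c3 in turn;
# group(1) keeps only the last of those captures.
m2 = re.match(r'(\w\d)+', 'a1b2c3')
print(m2.group(1))  # c3
print(m2.group())   # a1b2c3
```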
If the regular expression uses the (?P<name>...) syntax, the arguments may also be strings identifying groups by name.
Example:
# -*- coding: utf-8 -*-
# python
import re
myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(\w\d+) (?P<second>\w\d+) (\w\d+).+')
matchObj = patObj.search(myText)
print(matchObj.group(1, 'second', 3))  # prints: ('a1', 'a2', 'a3')
If a string argument is not used as a group name in the pattern, an IndexError exception is raised.
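A small sketch of the error cases, assuming a made-up pattern with a single named group: an unknown group name, a group number past the last group, and a negative number all raise IndexError.

```python
import re

m = re.search(r'(?P<second>\w\d+)', 'some a1 list')
print(m.group('second'))   # a1

raised = []
for arg in ('nosuchname', 2, -1):
    try:
        m.group(arg)
    except IndexError:
        raised.append(arg)   # this argument does not refer to an existing group
print(raised)              # ['nosuchname', 2, -1]
```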
“groupdict()” returns a dictionary containing all the named groups of the match, keyed by the group name. An optional argument, as in “groupdict(default)”, gives the value used for groups that did not participate in the match; it defaults to None.
Example:
# -*- coding: utf-8 -*-
# python
import re
myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(?P<this>\w\d+) (?P<second>\w\d+) (?P<thatt>\w\d+).+')
matchObj = patObj.search(myText)
print(matchObj.groupdict())  # prints: {'this': 'a1', 'second': 'a2', 'thatt': 'a3'}
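The default argument of groupdict() matters only when some named group is skipped. A minimal sketch with a made-up pattern in which the group num is optional and does not participate:

```python
import re

m = re.match(r'(?P<word>[a-z]+)(?P<num>\d+)?', 'abc')
print(m.groupdict())     # {'word': 'abc', 'num': None}
print(m.groupdict('-'))  # {'word': 'abc', 'num': '-'}
```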
✻ ✻ ✻
“start(n)” and “end(n)” return the indices of the start and end of the substring matched by group n; the end index is one past the last matched character. start() is equivalent to start(0), and similarly for end(). (Group 0 represents the string matched by the whole regex pattern.) Example:
# -*- coding: utf-8 -*-
# python
import re
myText = 'some a1 a2 a3 list'
patObj = re.compile(r'.+(?P<this>\w\d+) (?P<second>\w\d+) (?P<thatt>\w\d+).+')
matchObj = patObj.search(myText)
print(matchObj.start(1))  # prints 5
print(matchObj.end(1))    # prints 7
Both return -1 if the group exists in the pattern but did not contribute to the match, for example an optional group, or one branch of an alternation, that was skipped. For a match object m and a group g that did contribute to the match, the substring matched by group g (equivalent to m.group(g)) is
m.string[m.start(g):m.end(g)]
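The following sketch shows the -1 case and the slicing identity. The pattern (a)|(b) is a made-up example: when it matches the 'b' in 'xb', group 1 exists in the pattern but takes no part in the match.

```python
import re

m = re.search(r'(a)|(b)', 'xb')

# Group 1 did not contribute to the match.
print(m.group(1))   # None
print(m.start(1))   # -1
print(m.end(1))     # -1

# Group 2 did contribute, so slicing m.string gives the same text as m.group(2).
print(m.string[m.start(2):m.end(2)])                # b
print(m.string[m.start(2):m.end(2)] == m.group(2))  # True
```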
Note that m.start(myGroup) will equal m.end(myGroup) if myGroup matched an empty string. For example, after m = re.search('b(c?)', 'cba'), m.start(0) is 1, m.end(0) is 2, m.start(1) and m.end(1) are both 2, and m.start(2) raises an “IndexError” exception.
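The example in the paragraph above can be run directly to confirm the values. Group 1, (c?), matches the empty string right after the 'b', and the pattern has no group 2.

```python
import re

m = re.search('b(c?)', 'cba')
print(m.start(0))  # 1
print(m.end(0))    # 2
print(m.start(1))  # 2
print(m.end(1))    # 2

try:
    m.start(2)
except IndexError:
    print('IndexError')  # the pattern has only one group
```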
“span(n)” returns, for a MatchObject m, the 2-tuple (m.start(n), m.end(n)). If group n did not contribute to the match, the result is (-1, -1). span() is equivalent to span(0).
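A sketch of span() with a made-up pattern whose second group is optional and is skipped in this match, since the text has em rather than px after the number:

```python
import re

m = re.search(r'(\d+)(px)?', 'width: 40em')
print(m.span())   # (7, 9)
print(m.span(1))  # (7, 9)
print(m.span(2))  # (-1, -1)
```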
✻ ✻ ✻
The following are various attributes of the MatchObject.
“string”: the string passed to match() or search().
Example:
# -*- coding: utf-8 -*-
# python
import re
mm = re.compile(r'some.+').search('some text')
print(mm.string)  # prints: some text
“re”: the regular expression object whose match() or search() method produced this MatchObject instance.
“pos”: the value of pos that was passed to the search() or match() method of the RegexObject. This is the index into the string at which the RE engine started looking for a match.
“endpos”: the value of endpos that was passed to the search() or match() method of the RegexObject. This is the index into the string beyond which the RE engine will not go.
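The attributes re, pos, and endpos can be seen together in a short sketch. The pattern and the pos/endpos values (3 and 8) are arbitrary choices for illustration; they are passed positionally to the compiled pattern's search() method, so only 'b22 c' (indices 3 to 7) is searched.

```python
import re

patObj = re.compile(r'\d+')
m = patObj.search('a1 b22 c333', 3, 8)

print(m.group())       # 22
print(m.re is patObj)  # True
print(m.pos)           # 3
print(m.endpos)        # 8
print(m.string)        # a1 b22 c333
```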
“lastindex”: the integer index of the last matched capturing group, or None if no group was matched at all. For example, the expressions (a)b, ((a)(b)), and ((ab)) will have lastindex == 1 if applied to the string 'ab', while the expression (a)(b) will have lastindex == 2 if applied to the same string.
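The lastindex examples above can be checked with a short loop. The extra pattern ab, which has no groups, is added here only to show the None case.

```python
import re

results = {}
for pattern in (r'(a)b', r'((a)(b))', r'((ab))', r'(a)(b)', r'ab'):
    m = re.match(pattern, 'ab')
    results[pattern] = m.lastindex
    print('%-10s lastindex = %s' % (pattern, m.lastindex))
```

For ((a)(b)) the outer group closes last, which is why its lastindex is 1 rather than 3.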
“lastgroup”: the name of the last matched capturing group, or None if that group has no name, or if no group was matched at all.
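A small sketch of lastgroup, using made-up patterns: one where the last matched group is named, one where it is unnamed, and one with no groups at all.

```python
import re

m1 = re.match(r'(?P<first>\w)(?P<rest>\w+)', 'hello')
print(m1.lastgroup)  # rest

m2 = re.match(r'(?P<first>\w)(\w+)', 'hello')
print(m2.lastgroup)  # None, because the last matched group has no name

m3 = re.match(r'\w+', 'hello')
print(m3.lastgroup)  # None, because the pattern has no groups
```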