Code 401 Class 19 Reading Notes
Python Regular Expressions Tutorial
Basic Pattern: Ordinary Characters
Ordinary characters are the simplest regular expressions. They match themselves exactly and do not have a special meaning in their regular expression syntax.
Example of ordinary characters used to perform simple exact matches:
pattern = r"Cookie"
sequence = "Cookie"
if re.match(pattern, sequence):
print("Match!")
else: print("Not a match!")
> Output: Match!
Wild Card Characters: Special Characters, are characters that do no match themselves as seen but have a special meaning when used in a regular expression. Can be thought of as reserved metacharacters that denote something else and not what they look like.
. - A period. Matches any single character except the newline characters
Example
re.search(r’Co.k.e’, ‘Cookie’).group()
‘Cookie’
^
- A caret. Matches the start of the string.- Helpful if you want to make sure a document/sentence starts with certain characters.
$
- Matches the end of string.- Helpful if you want to make sure a document/sentence ends with certain characters.
- [abc] - Matches a or b or c.
- [a-zA-Z0-9] - Matches any letter from (a to z) or (A to Z) or (0 - 9)
- \ - Backlash.
- If the character following the backslash is a recognized escape character, the the special meaning of the term is taken.
- Else if the charcter following the \ is not a recognized escape character, then the \ is treated like any other character and passed through
- \ Can be used in front of all metacharacters to remove their special meaning.
Standard Library reference for a complete list.
compile(patter, flags=0): can computer a regular expression pattern into a regular expression object.
search(pattern, string, flags=0), can scan through the given string/sequence, looking for the first location where the regular expression produces a match.
match(patter, string, flags=0), Returns a corresponding match object if zero or more characters at the beginning of string match the pattern
findall(pattern, string, flags=0), possible matches in the entire sequence and returns them as a list of strings. Each returned string represents one match.
finditer(string, [position, end_position]), finds all the possible matches in the entire sequence but returns regex match objects as an iterator.
Things I want to know more about
Love these functions that ‘re’ has.