Regular Expression
We will start with Regular Expressions. A RegEx is a pattern, a rule that describes the text you want to find in, or filter out of, your data. We will keep Python as the coding language for this complete exercise.
Many of you must have come across SQL questions where you need to get the data of customers whose name starts with A and in the WHERE condition you write something like,
WHERE Customer_Name LIKE 'A%'
Well! This is pattern matching in its simplest form: you ask the query to return only the rows that fit a rule. Regular Expressions work on the same idea, but the way we write them in Python is a bit different and a lot more expressive. Check out the functions and the table below.
To use Regular Expressions, you first need to import the re package ("import re"). The following 4 functions are the ones you will use most often with your regex:
1. findall – Returns a list of all the matches
2. search – Returns a match object for the first match, or None if there is no match
3. split – Splits the string wherever there is a match
4. sub – Replaces one or many matches of the regex with a string of your choice
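Here is a minimal sketch of the four functions side by side. The sample sentence and the variable names are made up for illustration; only the re calls themselves come from the standard library.

```python
import re

# Hypothetical sample text, used only for this illustration
text = "Order 66 shipped on day 12 to Data Monk"

# findall: every non-overlapping match, returned as a list of strings
numbers = re.findall(r"\d+", text)
print(numbers)

# search: the first match as a match object (None when nothing matches)
first = re.search(r"\d+", text)
print(first.group(), first.start())

# split: cut the string at every match, here at each run of whitespace
words = re.split(r"\s+", text)
print(words)

# sub: replace every match with the given replacement string
masked = re.sub(r"\d+", "#", text)
print(masked)
```

Because search returns None when nothing matches, it is worth checking the result before calling .group() on real data.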
Following are some important metacharacters and special sequences:
RegEx | Description |
\w+ | One or more word characters, i.e. a whole word |
\d | A digit |
\s | A whitespace character |
\S | Anything but whitespace |
^ | Starts with |
$ | Ends with |
* | Zero or more occurrences |
+ | One or more occurrences |
| | Either or |
[] | A set of characters |
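To see a few of these in action, here is a small sketch. The customer names and the sample sentence are invented for illustration; the first filter is a rough Python counterpart of the SQL LIKE 'A%' condition from earlier.

```python
import re

# Hypothetical customer names, used only for this illustration
customers = ["Alice", "Bob", "Anita", "carl", "Zara"]

# ^A keeps names that start with A, much like LIKE 'A%' in SQL
starts_with_a = [name for name in customers if re.search(r"^A", name)]

# a$ keeps names that end with a lower case a
ends_with_a = [name for name in customers if re.search(r"a$", name)]

sample = "Room 42 is free, room 7 is not"

words = re.findall(r"\w+", sample)        # runs of word characters
digits = re.findall(r"\d", sample)        # single digits
either = re.findall(r"free|not", sample)  # either "free" or "not"
rooms = re.findall(r"[Rr]oom", sample)    # a set: upper or lower case R

print(starts_with_a, ends_with_a)
print(words, digits, either, rooms)
```

Note the r prefix on every pattern: a raw string keeps the backslash in \w, \d and \s from being treated as a Python escape character.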
Let’s work through a few questions to understand the basics of how to write a regex:
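1. re.split(r'\s+', 'My name is Data Monk')
['My', 'name', 'is', 'Data', 'Monk']
The regex \s+ matches each run of whitespace, so the string is split at the spaces and we get back a list of the words.
2. end_Sentence = r'[.?!]'
print(re.split(end_Sentence, String))
The above lines of code will split the document (held here in the variable String) wherever a sentence ends with a full stop, a question mark, or an exclamation mark.
3. [a-zA-Z0-9.-]
This will match any single upper case letter, lower case letter, digit, hyphen, or full stop. Keep the set free of spaces and put the hyphen last: written as [a-z A-Z 0-9 -.], the spaces become part of the set and " -." is read as a range running from the space character to the full stop.
4. r".*"
The dot matches any character except a newline, and the asterisk repeats it zero or more times, so this will match anything and everything on a line. Inside square brackets both characters lose their special meaning: r"[.*]" only matches a literal full stop or a literal asterisk.
You can find many more RegEx exercise questions on different websites. Do practice a few 🙂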
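As a starting point for that practice, the examples above can be tried out with a short script like the one below. It is only a sketch: the sample strings and variable names are assumptions made for illustration.

```python
import re

# Hypothetical document, used only for this illustration
document = "Is this regex? Yes it is. Try it now!"

# Splitting on sentence endings leaves leading spaces and a trailing
# empty string, so the pieces are stripped and empty ones dropped
end_sentence = r"[.?!]"
parts = re.split(end_sentence, document)
sentences = [p.strip() for p in parts if p.strip()]
print(sentences)

# A character set of letters, digits, full stop and hyphen;
# fullmatch succeeds only if the whole string fits the pattern
allowed = re.fullmatch(r"[a-zA-Z0-9.-]+", "data-monk.v2")
rejected = re.fullmatch(r"[a-zA-Z0-9.-]+", "data monk")
print(allowed is not None, rejected is None)

# Inside brackets . and * are literal; outside they mean "anything"
literal = re.findall(r"[.*]", "a.b*c")
everything = re.search(r".*", "a.b*c").group()
print(literal, everything)
```

The last two lines show why the brackets matter: the set finds only the two punctuation characters, while the unbracketed pattern swallows the whole string.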
Let’s continue with our NLP