Regular Expression

We will start with Regular Expressions. A RegEx is a way of creating a rule to filter what you want from your data. We will keep Python as the consistent coding language for this complete exercise.

Many of you must have come across SQL questions where you need to get the data of customers whose name starts with A and in the WHERE condition you write something like,

WHERE Customer_Name LIKE 'A%'

Well! This is a basic form of pattern matching, where you ask the query to return only the rows that fit a rule. Regular Expressions follow the same idea, but the way we write them in Python is a bit different.
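To make the comparison concrete, here is a minimal sketch of the same "starts with A" filter in Python. The list customer_names is hypothetical sample data standing in for the Customer_Name column.

```python
import re

# Hypothetical sample data standing in for a Customer_Name column.
customer_names = ["Amit", "Bella", "Arjun", "chandra", "Anna"]

# ^A anchors the pattern to the start of the string, much like LIKE 'A%'.
starts_with_a = [name for name in customer_names if re.search(r"^A", name)]

print(starts_with_a)  # ['Amit', 'Arjun', 'Anna']
```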

To use Regular Expressions, you first need to import the re package (import re).
The following 4 functions are quite useful when working with regex:
1. findall – returns a list of all the matches
2. search – returns a match object for the first match (or None if nothing matches)
3. split – splits the string wherever there is a match
4. sub – replaces one or many matches of the regex with a given string
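A short sketch of the four functions side by side may help. The string text is made up for illustration; only the re calls themselves come from the standard library.

```python
import re

# Hypothetical sample text for illustration.
text = "Order 12 shipped on day 7, order 305 is pending"

# findall returns a list of every non-overlapping match.
numbers = re.findall(r"\d+", text)

# search returns a Match object for the first match, or None if there is none.
first = re.search(r"\d+", text)

# split cuts the string wherever the pattern matches.
parts = re.split(r",\s*", text)

# sub replaces every match with the replacement string.
masked = re.sub(r"\d+", "#", text)

print(numbers)        # ['12', '7', '305']
print(first.group())  # 12
print(first.span())   # (6, 8)
print(parts)          # ['Order 12 shipped on day 7', 'order 305 is pending']
print(masked)         # Order # shipped on day #, order # is pending
```

Note that search stops at the first match, so it is worth checking for None before calling group() on real data.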

Following are some important metacharacters and special sequences:

RegEx    Description
\w+      One or more word characters (gets all the words)
\d       Digits
\s       White spaces
\S       Anything but white spaces
^        Starts with
$        Ends with
*        Zero or more occurrences
+        One or more occurrences
|        Either or
[]       A set of characters
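The sketch below tries each of these metacharacters and special sequences on a small string. The sample text and variable names are assumptions chosen for illustration.

```python
import re

sample = "Data Monk has 2 books and 15 posts"  # hypothetical sample string

words = re.findall(r"\w+", sample)     # runs of word characters
digits = re.findall(r"\d", sample)     # one digit at a time
numbers = re.findall(r"\d+", sample)   # + groups consecutive digits
chunks = re.findall(r"\S+", sample)    # runs of non-whitespace
spaces = re.findall(r"\s", sample)     # each whitespace character

starts = bool(re.search(r"^Data", sample))   # ^ anchors to the start
ends = bool(re.search(r"posts$", sample))    # $ anchors to the end

either = re.findall(r"books|posts", sample)  # | means either or
vowels = re.findall(r"[aeiou]", "Monk")      # [] is a set of characters

star = re.findall(r"bo*k", "bk bok book")    # * allows zero o's
plus = re.findall(r"bo+k", "bk bok book")    # + needs at least one o

print(words)    # ['Data', 'Monk', 'has', '2', 'books', 'and', '15', 'posts']
print(digits)   # ['2', '1', '5']
print(numbers)  # ['2', '15']
print(either)   # ['books', 'posts']
print(star)     # ['bk', 'bok', 'book']
print(plus)     # ['bok', 'book']
```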

Let's get down to some questions to understand the basics of how to write a regex.

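1. re.split(r'\s+', 'My name is Data Monk')
['My', 'name', 'is', 'Data', 'Monk'] – The regex \s+ matches one or more whitespace characters, so the string is split at every gap between the words.
2. end_Sentence = r'[.?!]'
print(re.split(end_Sentence, String))
The above lines of code will split the document (held in the variable String) wherever a sentence ends with a full stop, question mark, or exclamation mark.
3. [a-zA-Z0-9.-]
This will match any single upper case letter, lower case letter, digit, - or . character. The hyphen is placed last so that it is read literally and not as a range, and there are no spaces inside the brackets because a space would also become part of the set.
4. r"[.*]"
Inside square brackets the dot and the asterisk lose their special meaning, so this matches a single literal . or * character. To match anything and everything, drop the brackets and use r".*" instead.
You can find many more RegEx exercise questions on different websites. Do practice a few 🙂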
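The exercises above can be checked with a short script. This is a sketch under a few assumptions: the variable document and the test strings are made up, and the third pattern is written without spaces and with the hyphen last so that it is taken literally. Keep in mind that inside square brackets . and * are ordinary characters, which the last two lines demonstrate.

```python
import re

# Exercise 1: split on runs of whitespace.
words = re.split(r"\s+", "My name is Data Monk")

# Exercise 2: split on sentence-ending punctuation.
document = "Hello there! How are you? I am fine."  # hypothetical text
sentences = re.split(r"[.?!]", document)

# Exercise 3: letters, digits, dot and hyphen (hyphen last, so it is literal).
tokens = re.findall(r"[a-zA-Z0-9.-]+", "data-monk.com @ 2024")

# Exercise 4: inside [] the dot and asterisk are literal characters...
literal = re.findall(r"[.*]", "a.b*c")
# ...while a bare .* matches any run of characters except a newline.
everything = re.search(r".*", "a.b*c").group()

print(words)       # ['My', 'name', 'is', 'Data', 'Monk']
print(sentences)   # ['Hello there', ' How are you', ' I am fine', '']
print(tokens)      # ['data-monk.com', '2024']
print(literal)     # ['.', '*']
print(everything)  # a.b*c
```

The trailing empty string in the second result appears because the text ends with a delimiter; filtering out empty pieces is a common follow-up step.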
Let’s continue with our NLP

 
