Question

BOX8 | What is tokenization, and what are the important NLTK tokenizers?

in NLP

Machine Learning TheDataMonk 55 years 2 Answers 1025 views Grand Master 0

About TheDataMonkGrand Master

I am the Co-Founder of The Data Monk. I have a total of 6+ years of analytics experience 3+ years at Mu Sigma 2 years at OYO 1 year and counting at The Data Monk I am an active trader and a logically sarcastic idiot :)

Answers ( 2 )

Ramya Mamidipaka · Answer 1 · June 21, 2020

Tokenization is the process of dividing a large body of text into smaller units called tokens.

Natural language processing is used to build applications such as text classification, intelligent chatbots, sentiment analysis, and language translation. To achieve this, it is vital to understand the patterns in the text. Tokens are very useful for finding such patterns, and tokenization is also a base step for stemming and lemmatization.

Some NLTK tokenizers: TweetTokenizer, MWETokenizer, sent_tokenize, word_tokenize
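
To make the idea concrete, here is a minimal sketch of sentence-level and word-level tokenization using only Python's standard `re` module. The helper names `simple_sent_tokenize` and `simple_word_tokenize` are made up for this illustration, and the rules are far cruder than NLTK's `sent_tokenize` and `word_tokenize` (abbreviations such as "Dr." would be split incorrectly, for instance).

```python
import re

def simple_sent_tokenize(text):
    """Hypothetical helper: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def simple_word_tokenize(sentence):
    """Hypothetical helper: runs of word characters, plus single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

paragraph = "Tokenization splits text. Each piece is a token!"

# Paragraph -> sentence tokens
sentences = simple_sent_tokenize(paragraph)

# Each sentence -> word tokens
words = [simple_word_tokenize(s) for s in sentences]

print(sentences)
print(words)
```

In a real project the NLTK functions named above would normally be preferred, since they handle many edge cases this sketch ignores.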

swap007 Grand Master · Answer 2 · June 21, 2020

Tokenization is basically splitting a string into parts (tokens) based
on a particular delimiter or rule.
When a sentence is tokenized into words, each word is a token. When a paragraph is
tokenized into sentences, each sentence is a token.
Different types of tokenizers:
SpaceTokenizer() – Tokenizes on the space character
WordPunctTokenizer() – Splits text into runs of alphabetic characters and runs of non-alphabetic characters (punctuation)
TweetTokenizer() – Designed for tweets and similar informal text; keeps items such as hashtags,
mentions and emoticons together as single tokens.
StanfordTokenizer() – NLTK's interface to the Stanford tokenizer; follows the Stanford conventions for generating tokens.
TabTokenizer() – Tokenizes on the tab character
LineTokenizer() – Treats every line as a token.
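
The delimiter-based tokenizers in the list above can be approximated with plain Python, which shows what each one splits on. This is only a sketch with the standard library, not NLTK's own code; the sample `text` is made up, and NLTK's LineTokenizer additionally discards blank lines by default, which `splitlines()` does not.

```python
import re

text = "Hello, world!\tTabs and spaces.\nSecond line here."

# Roughly what SpaceTokenizer does: split on the space character only
space_tokens = text.split(" ")

# Roughly what TabTokenizer does: split on the tab character only
tab_tokens = text.split("\t")

# Roughly what LineTokenizer does: one token per line
line_tokens = text.splitlines()

# Roughly what WordPunctTokenizer does: runs of word characters,
# or runs of characters that are neither word characters nor whitespace
word_punct_tokens = re.findall(r"\w+|[^\w\s]+", text)

print(space_tokens)
print(tab_tokens)
print(line_tokens)
print(word_punct_tokens)
```

Note how the space-based split leaves the tab and newline inside tokens, while the regular-expression split separates punctuation from words.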
