Bag of Words in NLP: Natural Language Processing

What is bag-of-words (BoW)? Going by the name, we might assume it is a bag carrying words.

That is more or less true: this bag does carry words. But what is it used for, and how does it work? That is what I am going to explain now.

  • He wants to go USA
  • She wants to go Germany
  • Bodybuilders like good diet

Above are three sentences. If I give this text directly to my model as input, will it understand it? No, it won’t. A model works on numbers, not on raw text, so to make the computer understand these sentences we need to create an equivalent vector for each sentence.

A vector, in simple words, is nothing but a numerical representation of the text.

This whole process of creating an equivalent vector is called bag of words. Now let’s explore the process of BoW.

Process of BoW:

Assume that we have only these three statements and that they are the entire data we need to work on.

Unique Words:

First, BoW finds all the unique words in the three statements. A word that repeats is counted only once.


In the given data we found 11 unique words: He, wants, to, go, USA, She, Germany, Bodybuilders, like, good, diet.
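The unique-word step can be sketched in a few lines of Python. This is only a minimal illustration, not a library implementation: the names `sentences` and `build_vocabulary` are made up for this example, and it assumes that words are separated by spaces and that case matters (so ‘He’ and ‘he’ would be two different words).

```python
# Illustrative sketch; `sentences` and `build_vocabulary` are hypothetical names.
sentences = [
    "He wants to go USA",
    "She wants to go Germany",
    "Bodybuilders like good diet",
]

def build_vocabulary(sentences):
    """Return the unique words, in order of first appearance."""
    vocabulary = []
    seen = set()
    for sentence in sentences:
        # Assumption: splitting on whitespace is enough to get the words.
        for word in sentence.split():
            if word not in seen:
                seen.add(word)
                vocabulary.append(word)
    return vocabulary

vocabulary = build_vocabulary(sentences)
print(len(vocabulary))  # 11
print(vocabulary)
```

Keeping the words in order of first appearance is just a choice made here so that the columns are easy to follow; any fixed order would work as long as it stays the same for every statement.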

Creating Table:

As soon as BoW has found all the unique words, it creates a column for each word and a row for each statement.

So in the current scenario we have 11 words and 3 statements, which means our table will contain 11 columns and 3 rows.
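As a rough sketch, the empty table can be represented in Python as a list of rows. The variable names here are my own, and the sizes are simply the ones from this example.

```python
# Sizes taken from the example: 11 unique words, 3 statements.
num_words = 11
num_statements = 3

# One row per statement, one column per unique word; every count starts at 0.
# A list comprehension is used so that each row is a separate list.
table = [[0] * num_words for _ in range(num_statements)]

print(len(table))     # 3 rows
print(len(table[0]))  # 11 columns
```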


Creating Vectors:

Now we are going to use numbers to fill out our table and create an equivalent vector for each statement.

If a word is in the sentence, we write the number of times it is present. If it is not present, we write 0.


Our first statement is ‘He wants to go USA’. ‘He’ appears in the statement exactly once, so we put 1 in that column for the first statement’s row. ‘wants’ is there once, so we put 1. ‘to’ is there once, so again 1. ‘go’ is there once, so 1. ‘USA’ is there once, so 1. The remaining words are not in our first sentence, so we put 0 in their columns of the first row.
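The same walk through the first row can be written as a small Python function. Treat it as a sketch under simple assumptions: `to_vector` is a name I made up, the column order is the order in which the words first appear in the three statements, and words are split on spaces with case kept as is.

```python
# Columns in order of first appearance across the three statements (assumed order).
vocabulary = ["He", "wants", "to", "go", "USA", "She", "Germany",
              "Bodybuilders", "like", "good", "diet"]

def to_vector(sentence, vocabulary):
    """Count how many times each vocabulary word occurs in the sentence."""
    words = sentence.split()
    # list.count returns 0 when the word is not in the sentence.
    return [words.count(word) for word in vocabulary]

first_row = to_vector("He wants to go USA", vocabulary)
print(first_row)  # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
```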


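With the columns in the order He, wants, to, go, USA, She, Germany, Bodybuilders, like, good, diet, the vector form of each statement looks like this:

  • He wants to go USA
[1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
  • She wants to go Germany
[0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
  • Bodybuilders like good diet
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]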

Note one thing: a 1 in a row does not only represent presence. It is the number of times the word occurs in that specific statement.

Example:

Now suppose our statement is ‘He He wants to go USA’.

‘He’ occurs twice, which is why this time we put 2 in the ‘He’ column for this statement.
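A quick way to check this in Python is with `collections.Counter` from the standard library. This is just a sketch: the column order is an assumption (order of first appearance in the three original statements), and the words are split on spaces.

```python
from collections import Counter

# Assumed column order: first appearance across the three original statements.
vocabulary = ["He", "wants", "to", "go", "USA", "She", "Germany",
              "Bodybuilders", "like", "good", "diet"]

# Counter maps each word to its number of occurrences;
# words that never occur give 0 when looked up.
counts = Counter("He He wants to go USA".split())
vector = [counts[word] for word in vocabulary]

print(vector)  # [2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
```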

After converting our text into vectors, we can pass them as input to our model.

If this article has helped you understand BoW, kindly follow me and share it with your friends. In my next article I’ll be writing about TF-IDF.

An energetic and motivated individual, iOS developer and Data Science Practitioner. Apart from computer science, martial arts interest me the most.

Umair Ishrat Khan
