
Bag of Words Model

--> Package installed - NLTK

  • NLTK stands for 'Natural Language Toolkit'. It provides the most common NLP algorithms, such as tokenization, part-of-speech tagging, stemming, sentiment analysis, topic segmentation, and named entity recognition. NLTK helps the computer analyze, preprocess, and understand written text.
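
A minimal tokenization sketch with NLTK (not the script itself); the sample sentence is illustrative, and depending on your NLTK version the required tokenizer data may be named 'punkt' or 'punkt_tab':

```python
import nltk

nltk.download("punkt")  # one-time download of the sentence/word tokenizer models

sentence = "NLTK helps the computer analyze and preprocess written text."
tokens = nltk.word_tokenize(sentence)  # split the sentence into word tokens
print(tokens)
# ['NLTK', 'helps', 'the', 'computer', 'analyze', 'and', 'preprocess', 'written', 'text', '.']
```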

--> Pandas

  • pandas is a library for storing, analyzing, and processing data in a row-and-column (tabular) representation.
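
A small illustrative sketch of a pandas DataFrame holding word counts (the column names and numbers here are placeholders, not output of the script):

```python
import pandas as pd

# Rows are documents, columns are words, cells are counts.
df = pd.DataFrame(
    [[1, 0, 2],
     [0, 1, 1]],
    columns=["bag", "of", "words"],
)
print(df)
```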

--> from sklearn.feature_extraction.text import CountVectorizer

  • Scikit-learn's CountVectorizer is used to convert a collection of text documents to a vector of term/token counts. It also enables pre-processing of text data prior to generating the vector representation. This functionality makes it a highly flexible feature representation module for text.
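
A minimal CountVectorizer sketch; the two sample sentences are illustrative. Note that in scikit-learn versions before 1.0 the vocabulary method is `get_feature_names()` rather than `get_feature_names_out()`:

```python
from sklearn.feature_extraction.text import CountVectorizer

sentences = ["the cat sat on the mat", "the dog sat on the log"]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(sentences)   # sparse matrix of token counts

print(vectorizer.get_feature_names_out())      # vocabulary learned from the text
print(counts.toarray())                        # one row per sentence, one column per word
```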

Setup instructions

  1. Provide the sentences you would like to vectorize.
  2. The script tokenizes the sentences.
  3. It transforms the text into vectors, where each word and its count become a feature.
  4. The bag-of-words model is then ready.
  5. The counts are loaded into a pandas DataFrame, which is analogous to an Excel spreadsheet (see the sketch after this list).
  6. Open 'bowp.xlsx' in Excel and check the sheet named 'data'; the DataFrame is stored there.
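
A hedged end-to-end sketch of the steps above, not the script's exact code: the input sentences are placeholders, and writing 'bowp.xlsx' with a sheet named 'data' follows the README but assumes the `openpyxl` package is installed for Excel output:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# 1. Input sentences (placeholders for whatever text you want to vectorize).
sentences = [
    "this is a bag of words example",
    "a bag of words counts each word",
]

# 2-4. Tokenize, count terms, and build the bag-of-words representation.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(sentences)

# 5. Put the counts into a DataFrame: one row per sentence, one column per word.
df = pd.DataFrame(
    counts.toarray(),
    columns=vectorizer.get_feature_names_out(),
)

# 6. Save to Excel so it can be inspected in the 'data' sheet of bowp.xlsx.
df.to_excel("bowp.xlsx", sheet_name="data", index=False)
print(df)
```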

Output

(Output image)

Author(s)

Disclaimers, if any

There are no disclaimers for this script.
