Simple tweet sentiment analysis tutorial

July 25, 2018

Natural Language Processing (NLP) is one of the areas gaining momentum thanks to deep learning. The advances in AI over the last few years have revived interest in several old-school AI problems, NLP among them.

Though several areas of the huge field that is NLP have been under development for centuries (see the German-Danish scientist Christian Gottlieb Kratzenstein), the state of the art is gradually moving away from purely linguistic models toward deep-learning-based ones (linguistics is still key in their design). Training large neural nets on vast amounts of data is reaching human-like performance in many areas of NLP (watch the Google Assistant presentation at Google I/O 2018).

Today we will see how to run real-time sentiment analysis on tweets using pretrained models and readily available Python libraries.

First of all, you are going to need a Twitter API key; you can get one here.

The libraries we are going to use are:

pandas
numpy
tweepy
matplotlib
textblob
re

Pandas, NumPy, matplotlib and re are well-known libraries for data scientists (and any Python programmer); re is part of the Python standard library.

Tweepy is the tool we are going to use to call the Twitter API, and TextBlob is an NLP library with pre-trained sentiment analysis models and several other features; its documentation is worth a look.

Let's get to it!

First we import all the libraries (the %matplotlib inline line is a Jupyter magic command, so it only works inside a notebook):

import pandas as pd 
import numpy as np
import tweepy
from IPython.display import display
import matplotlib.pyplot as plt
%matplotlib inline
from textblob import TextBlob
import re

Then we need to set our Twitter API keys:

CONSUMER_KEY = "XXX"
CONSUMER_SECRET = "XXX"

ACCESS_TOKEN = "XXX"
ACCESS_SECRET = "XXX"
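
Hardcoding keys in a notebook makes it easy to leak them by accident. As an alternative, here is a minimal sketch that reads them from environment variables. The variable names and the load_twitter_credentials helper are assumptions made for this example; they are not part of tweepy.

```python
import os

# Hypothetical variable names; any consistent naming scheme works.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)


def load_twitter_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    # Treat unset and empty variables the same way.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

With the variables exported in your shell, creds = load_twitter_credentials() gives you a dict whose values can be assigned to CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN and ACCESS_SECRET.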

A usual step in any NLP pipeline is to “clean” the text. In our case, tweets tend to contain special characters coming from links, emojis, #hashtags and @mentions.

So we define a function that cleans the text, making it easier for our sentiment analysis function to produce good results.

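def clean_tweet(tweet):
    # Replace @mentions, URLs and anything that is not a letter, digit,
    # space or tab with a space, then collapse repeated whitespace.
    return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())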
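
To see what the cleaning step does, here is a small self-contained run on two made-up tweets. The function is repeated so the snippet works on its own; the sample strings are invented for illustration.

```python
import re


def clean_tweet(tweet):
    # Same pattern as above, repeated so this snippet runs on its own.
    pattern = r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"
    return " ".join(re.sub(pattern, " ", tweet).split())


sample = "@user I love this! https://t.co/abc #NLP"
print(clean_tweet(sample))  # I love this NLP

# Caveat: the mention pattern stops at "_", so part of the handle survives.
print(clean_tweet("@some_user hi"))  # user hi
```

The second call shows a limitation worth knowing about: handles that contain an underscore are only partly removed, so a fragment of the username ends up in the text that gets scored.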

And now a function that produces the sentiment analysis using TextBlob. This analize_sentiment function returns -1 for negative posts, 0 for neutral ones and 1 for positive ones.

def analize_sentiment(tweet):
    # TextBlob's polarity is a float between -1.0 (negative) and 1.0 (positive)
    analysis = TextBlob(clean_tweet(tweet))
    if analysis.sentiment.polarity > 0:
        return 1
    elif analysis.sentiment.polarity == 0:
        return 0
    else:
        return -1
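
The exact comparison with 0 means that even a barely positive score such as 0.05 is labelled positive. If that turns out to be too sensitive, one option is a small neutral band around zero. The sketch below shows the idea in plain Python; polarity_to_label and its neutral_band parameter are hypothetical names for this example, not part of TextBlob.

```python
def polarity_to_label(polarity, neutral_band=0.0):
    """Map a polarity score in [-1.0, 1.0] to -1, 0 or 1.

    neutral_band is a hypothetical tolerance: scores whose absolute value
    is at most neutral_band count as neutral. With the default of 0.0 the
    result matches the exact comparison used in the tutorial.
    """
    if polarity > neutral_band:
        return 1
    if polarity < -neutral_band:
        return -1
    return 0


print(polarity_to_label(0.5))        # 1
print(polarity_to_label(0.05))       # 1
print(polarity_to_label(0.05, 0.1))  # 0
print(polarity_to_label(-0.3, 0.1))  # -1
```

A suitable band width depends on your data; 0.1 here is only a placeholder.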

Now we have to deal with the tweet feed. Tweepy needs the keys to query the API, so we pass them in and set up a “listener” that gathers the tweets we want. The code below targets the Tweepy 3.x API; in Tweepy 4.0 StreamListener was merged into Stream.

class TwitterStreamListener(tweepy.StreamListener):
    # Called by tweepy for every incoming tweet that matches the filter
    def on_status(self, status):
        print(status.text)
        print(status.user.profile_image_url_https)
        print(analize_sentiment(status.text))

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.secure = True
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)

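# retry_errors takes a collection of HTTP status codes to retry on;
# the set below is an illustrative choice, adjust it to your needs.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True, retry_count=10, retry_delay=5, retry_errors={500, 502, 503, 504})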
streamListener = TwitterStreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=streamListener)

myStream.filter(track=["keyword"])

This will give us tweets containing the keyword in real time. Check the Tweepy documentation to see how to get information on users, hashtags, single tweets, and so on.
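
Printing one label per tweet is fine for a demo, but to get a feel for the overall mood around a keyword it helps to keep a running tally. Here is a minimal sketch in plain Python; SentimentTally is a hypothetical helper, not part of tweepy or TextBlob, and the labels in the loop are stand-ins for real results.

```python
from collections import Counter


class SentimentTally:
    """Running count of -1 / 0 / 1 labels (hypothetical helper)."""

    def __init__(self):
        self.counts = Counter()

    def update(self, label):
        self.counts[label] += 1

    def shares(self):
        """Return the fraction of negative, neutral and positive tweets."""
        total = sum(self.counts.values())
        if total == 0:
            return {"negative": 0.0, "neutral": 0.0, "positive": 0.0}
        return {
            "negative": self.counts[-1] / total,
            "neutral": self.counts[0] / total,
            "positive": self.counts[1] / total,
        }


tally = SentimentTally()
for label in [1, 1, 0, -1]:  # stand-ins for analize_sentiment(status.text)
    tally.update(label)
print(tally.shares())  # {'negative': 0.25, 'neutral': 0.25, 'positive': 0.5}
```

Inside on_status you would call tally.update(analize_sentiment(status.text)) and print or plot tally.shares() every so often.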

We made a simple visualization tool to display the tweets; in future posts we will show you how to make that too!

Huge credit to GitHub user eledroos for parts of the code.

Writer: Ricardo Gasperini
Category: NLP, MACHINE LEARNING, TUTORIAL
Skill: Python
Post date: 10 July, 2018
Tags: nlp, machine learning, tutorial, twitter
