Welcome to labMT-simple’s documentation!

Contents:

Getting Started

TL;DR a simple labMT usage script

Usage

This script uses the language assessment by Mechanical Turk (labMT) word list to score the happiness of a corpus. The labMT word list was created by combining the 5000 words appearing most frequently in each of four sources: Twitter, the New York Times, Google Books, and music lyrics, and then scoring the words for sentiment on Amazon’s Mechanical Turk. The list is described in detail in Dodds et al. 2011, PLOS ONE, “Temporal Patterns of Happiness and Information in a Global-Scale Social Network: Hedonometrics and Twitter.”

Given two corpora, the script “storylab.py” creates a word-shift graph illustrating the words most responsible for the difference in happiness between them. The corpora should be large (e.g. at least 10,000 words) for the difference to be meaningful, since this is a bag-of-words approach. As an example, random collections of English tweets from Saturday January 18, 2014 and Tuesday January 21, 2014 are included in the “example” directory. They can be compared by moving to that directory and running

python example.py example-shift.html

and opening the file example-shift.html in a web browser. For an explanation of the resulting plot, please visit

http://www.hedonometer.org/shifts.html

Installation

Cloning the github repository directly and then installing locally is recommended:

git clone https://github.com/andyreagan/labMT-simple.git
cd labMT-simple
python setup.py install

This repository can also be installed using pip

pip install labMTsimple

in which case you can download the tests from github and run them, if desired.

Running tests

Tests are based on nose2 (pip install nose2), and can be run by executing

nose2

in the root directory of this repository.

This will compare the two days in test/data and write test.html, a shift between them that allows for a changeable lens.

Developing with labMT-simple locally

I find it really useful to reload the library when testing it interactively:

try:
    reload
except NameError:
    # Python 3
    from importlib import reload
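A quick way to confirm the shim behaves the same on both versions is to reload a standard-library module; here json stands in for labMTsimple, purely for illustration:

```python
try:
    reload  # builtin in Python 2
except NameError:
    from importlib import reload  # Python 3

import json  # stand-in for the module you are actually developing

# After editing the module's source on disk, re-execute it in place.
# reload() returns the same (now re-executed) module object.
module = reload(json)
print(module is json)
```

In an interactive session you would call `reload(labMTsimple.storyLab)` (or whichever submodule you just edited) instead.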

Building these docs

Go into the docs directory (activate local virtualenv first), and do the following:

\rm -rf _build/*
make html
make latexpdf
git add -f *
git commit -am "new docs, probably should just add a pre-commit hook"

Note that these docs build locally under python 2, where the dependencies are installed. Under python 3, these dependencies are mocked (and this is how the online readthedocs site is configured).

(sphinx-apidoc -o . ../labMTsimple was run once.)

Detailed Examples

Preparing texts

This is really simple: just load the text to be scored into python. I’m using a subset of public tweets from a couple of days, and I’ve already put the tweet text into .txt files that I load into strings:

import codecs

f = codecs.open("data/18.01.14.txt","r","utf8")
saturday = f.read()
f.close()

f = codecs.open("data/21.01.14.txt","r","utf8")
tuesday = f.read()
f.close()

Loading dictionaries

Again, this is really simple: just use the emotionFileReader function:

from labMTsimple.storyLab import emotionFileReader

lang = 'english'
labMT,labMTvector,labMTwordList = emotionFileReader(stopval=0.0,lang=lang,returnVector=True)

Then we can score the text and get the word vector at the same time:

saturdayValence,saturdayFvec = emotion(saturday,labMT,shift=True,happsList=labMTvector)
tuesdayValence,tuesdayFvec = emotion(tuesday,labMT,shift=True,happsList=labMTvector)

But we don’t want to use these happiness scores yet, because they include all words, even neutral ones. So, zero out the frequencies of the neutral words, and regenerate the scores:

tuesdayStoppedVec = stopper(tuesdayFvec,labMTvector,labMTwordList,stopVal=1.0)
saturdayStoppedVec = stopper(saturdayFvec,labMTvector,labMTwordList,stopVal=1.0)

saturdayValence = emotionV(saturdayStoppedVec,labMTvector)
tuesdayValence = emotionV(tuesdayStoppedVec,labMTvector)
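The stopper/emotionV step above amounts to zeroing the counts of words within stopVal of the neutral center, then taking a frequency-weighted average of the remaining scores (the docs for emotionV note it is equivalent to np.dot(freq,happs)/np.sum(freq)). A stdlib-only sketch of that arithmetic, with toy scores and counts rather than the real labMT data, and illustrative function names rather than the library’s own:

```python
def stop(freq, scores, center=5.0, stop_val=1.0):
    """Zero the counts of words whose score is within stop_val of neutral."""
    return [f if abs(s - center) >= stop_val else 0
            for f, s in zip(freq, scores)]

def happiness(freq, scores):
    """Frequency-weighted average score: dot(freq, scores) / sum(freq)."""
    return sum(f * s for f, s in zip(freq, scores)) / sum(freq)

# Toy example: three words with labMT-style scores on a 1-9 scale.
scores = [8.5, 5.2, 2.0]   # happy, neutral, sad
freq = [3, 10, 1]          # raw counts in the corpus

stopped = stop(freq, scores)       # the neutral word (5.2) is zeroed: [3, 0, 1]
print(happiness(stopped, scores))  # (3*8.5 + 1*2.0) / 4 = 6.875
```

Without the stop step, the 10 neutral tokens would drag the average toward the center, which is why the lens matters.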

Making Wordshifts

I just merged updates to the d3 wordshift plotting into labMTsimple, and combined with phantom crowbar (see previous post), it’s easier than ever to use the labMT data set to compare texts.

To make an html page with the shift, you’ll just need to have labMT-simple installed. To automate the generation of svg files, you’ll need the phantom crowbar, which depends on phantomjs. To go all the way to pdf, you’ll also need inkscape for making vectorized pdfs, or rsvg for making better formatted, but rasterized, versions.

Let’s get set up to make shifts automatically. Since there aren’t many dependencies all the way down, start by getting phantomjs installed, then the phantom-crowbar.

Installing phantom-crowbar

For phantomjs, I prefer to use homebrew:

brew update
brew upgrade
brew install phantomjs

Then to get the crowbar, clone the git repository.

cd ~
git clone https://github.com/andyreagan/phantom-crowbar

To use it system-wide, I use the bash alias:

alias phantom-crowbar="/usr/local/bin/phantomjs ~/phantom-crowbar/phantom-crowbar.js"

Without too much detail, I recommend adding this to your ~/.bash_profile so that it’s loaded every time you start a terminal session.

Installing inkscape

You only need inkscape if you want to go from svg to pdf (and there are other ways too), but this one is easy with, again, homebrew.

brew install inkscape

Installing rsvg

You only need rsvg if you want to go from svg to pdf (and there are other ways too), but this one is easy with, again, homebrew.

brew install librsvg

Installing labMTsimple

There are two ways to get it: using pip or cloning the git repo. If you’re not sure, use pip; I think pip makes it easier to keep things up to date.

pip install labMTsimple

Making your first shift

If you cloned the git repository, install the thing and then you can check out the example in examples/example.py. If you went with pip, see that file on github.

Go ahead and run that script!

python example-002.py

You can open the html file in any browser to see the shift, served by your choice of local webserver. Python’s SimpleHTTPServer works fine, and I’ve found that the node-based http-server is a bit more stable.

To extract the svg, go ahead and use the phantom-crowbar.js file copied to the example/static directory. For me, running it looks like this:

/usr/local/bin/phantomjs js/shift-crowbar.js example-002.html shiftsvg wordshift.svg

Using inkscape or librsvg on my computer looks like this:

/Applications/Inkscape.app/Contents/Resources/bin/inkscape -f $(pwd)/wordshift.svg -A $(pwd)/wordshift-inkscape.pdf

rsvg-convert --format=eps wordshift.svg > wordshift-rsvg.eps
epstopdf wordshift-rsvg.eps

And again, feel free to tweet suggestions at @andyreagan, and submit pull requests to the source code!

Full Automation

I’ve wrapped up all of this into what is potentially the most backwards way imaginable to generate a figure. The shiftPDF() function operates the same way as shiftHtml(), but uses the headless browser to render the d3 graphic, then executes a piece of injected JS to save a local SVG, and uses command line image manipulation libraries to massage it into a PDF.

On my macbook, this works, but your mileage will most certainly vary.

Advanced Usage

About Tries

For dictionary lookup of scores from phrases, the fastest benchmarks I’ve found that were reasonably stable were from the libraries datrie and marisa-trie, which both have python bindings.

They’re used in the speedy module in an attempt both to speed things up and to match against word stems.

Advanced Parsing

Some dictionaries use word stems to cover the multiple uses of a single word with a single score. We can match these word stems very quickly using a prefix match on a trie. This is much better than using many compiled RE matches, which in my testing took a very long time.
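The stem-matching idea can be illustrated without a trie library: keep the stems sorted and use bisect to locate a prefix match, which is the same longest-prefix lookup a trie provides (datrie/marisa-trie just do it much faster, over much larger dictionaries). A stdlib sketch with made-up stems and scores, purely illustrative:

```python
import bisect

# Hypothetical stemmed lexicon: a trailing '*' marks a stem, LIWC-style.
stems = sorted(["ador*", "hate*", "happi*"])
scores = {"ador*": 8.0, "hate*": 2.0, "happi*": 8.5}

def stem_lookup(word, stems, scores):
    """Return the score of the longest stem prefixing `word`, else None."""
    # Any stem that prefixes `word` sorts at or before it lexicographically,
    # and among matching stems the longest sorts last, so scan backward.
    i = bisect.bisect_right(stems, word)
    for stem in reversed(stems[:i]):
        if word.startswith(stem.rstrip("*")):
            return scores[stem]
    return None

print(stem_lookup("happiness", stems, scores))  # 8.5
print(stem_lookup("neutral", stems, scores))    # None
```

A trie replaces the backward scan with a single walk down the prefix tree, which is what makes it the right structure when the dictionary has tens of thousands of entries.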

labMTsimple package

labMTsimple.speedy module

class labMTsimple.speedy.AFINN(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 0.0
citation = "@inproceedings{nielsen2011new,\n\tAuthor = {Nielsen, Finn {\\AA}rup},\n\tBooktitle = {CEUR Workshop Proceedings},\n\tEditor = {Matthew Rowe and Milan Stankovic and Aba-Sah Dadzie and Mariann Hardey},\n\tMonth = {May},\n\tPages = {93-98},\n\tTitle = {A new {ANEW}: Evaluation of a word list for sentiment analysis in microblogs},\n\tVolume = {Proceedings of the ESWC2011 Workshop on 'Making Sense of Microposts': Big things come in small packages 718},\n\tYear = {2011}}"
citation_key = 'nielsen2011new'
construction_note = 'Manual'
license = 'ODbL v1.0'
loadDict(bananas, lang)
note = 'Words manually rated -5 to 5 with impact scores by Finn Nielsen'
score_range_type = 'integer'
stems = False
title = 'AFINN'
class labMTsimple.speedy.ANEW(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

ANEW class.

center = 5.0
citation = '@techreport{bradley1999affective,\n\tAddress = {Gainesville, FL},\n\tAuthor = {Bradley, M. M. and Lang, P. J.},\n\tInstitution = {University of Florida},\n\tKey = {psychology},\n\tTitle = {Affective norms for English words ({ANEW}): Stimuli, instruction manual and affective ratings},\n\tType = {Technical report C-1},\n\tYear = {1999}}'
citation_key = 'bradley1999affective'
construction_note = 'Survey: FSU Psych 101'
license = 'Free for research'
loadDict(bananas, lang)

Load the corpus into a dictionary, straight from the origin corpus file.

note = 'Affective Norms of English Words'
score_range_type = 'continuous'
stems = False
title = 'ANEW'
class labMTsimple.speedy.EmoLex(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 0.0
citation = '@article{mohammad2013crowdsourcing,\n title={Crowdsourcing a word--emotion association lexicon},\n author={Mohammad, Saif M and Turney, Peter D},\n journal={Computational Intelligence},\n volume={29},\n number={3},\n pages={436--465},\n year={2013},\n publisher={Wiley Online Library}\n}'
citation_key = 'mohammad2013crowdsourcing'
construction_note = 'Survey: MT'
license = 'Free for research'
loadDict(bananas, lang)
note = 'NRC Word-Emotion Association Lexicon: emotions and sentiment evoked by common words and phrases using Mechanical Turk'
score_range_type = 'integer'
stems = False
title = 'EmoLex'
class labMTsimple.speedy.EmoSenticNet(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 0.0
citation = '@article{poria2013enhanced,\n\tAuthor = {Poria, Soujanya and Gelbukh, Alexander and Hussain, Amir and Howard, Newton and Das, Dipankar and Bandyopadhyay, Sivaji},\n\tJournal = {IEEE Intelligent Systems},\n\tNumber = {2},\n\tPages = {31--38},\n\tTitle = {Enhanced SenticNet with affective labels for concept-based opinion mining},\n\tVolume = {28},\n\tYear = {2013}}'
citation_key = 'poria2013enhanced'
construction_note = 'Bootstrapped extension'
license = 'Non-commercial'
loadDict(bananas, lang)
note = 'extends SenticNet words with WNA labels'
score_range_type = 'integer'
stems = False
title = 'EmoSenticNet'
url = 'http://www.gelbukh.com/emosenticnet/'
class labMTsimple.speedy.Emoticons(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 0.0
citation = "@inproceedings{gonccalves2013comparing,\n title={Comparing and combining sentiment analysis methods},\n author={Gon{\\c{c}}alves, Pollyanna and Ara{'u}jo, Matheus and Benevenuto, Fabr{'\\i}cio and Cha, Meeyoung},\n booktitle={Proceedings of the first ACM conference on Online social networks},\n pages={27--38},\n year={2013},\n organization={ACM}}"
citation_key = 'gonccalves2013comparing'
construction_note = 'Manual'
license = 'Open source code'
loadDict(bananas, lang)
note = 'Commonly used emoticons with their positive, negative, or neutral emotion'
score_range_type = 'integer'
stems = False
title = 'Emoticons'
class labMTsimple.speedy.GI(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 0.0
citation = '@article{stone1966general,\n\tAuthor = {Stone, Philip J and Dunphy, Dexter C and Smith, Marshall S},\n\tJournal = {MIT Press},\n\tPublisher = {MIT press},\n\tTitle = {The General Inquirer: A Computer Approach to Content Analysis.},\n\tYear = {1966}}'
citation_key = 'stone1966general'
construction_note = 'Harvard-IV-4'
license = 'Unspecified'
loadDict(bananas, lang)
note = 'General Inquirer: database of words and manually created semantic and cognitive categories, including positive and negative connotations'
score_range_type = 'integer'
stems = False
title = 'GI'
class labMTsimple.speedy.HashtagSent(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 0.0
citation = '@inproceedings{zhu2014nrc,\n title={Nrc-canada-2014: Recent improvements in the sentiment analysis of tweets},\n author={Zhu, Xiaodan and Kiritchenko, Svetlana and Mohammad, Saif M},\n booktitle={Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014)},\n pages={443--447},\n year={2014},\n organization={Citeseer}\n}'
citation_key = 'zhu2014nrc'
construction_note = 'PMI with hashtags'
license = 'Free for research'
loadDict(bananas, lang)
note = 'NRC Hashtag Sentiment Lexicon: created from tweets using Pairwise Mutual Information with sentiment hashtags as positive and negative labels (here we use only the unigrams)'
score_range_type = 'continuous'
stems = False
title = 'HashtagSent'
class labMTsimple.speedy.LIWC(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

LIWC class.

affect = 125
center = 0.0
citation = '@article{pennebaker2001linguistic},\n\tAuthor = {Pennebaker, James W and Francis, Martha E and Booth, Roger J},\n\tJournal = {Mahway: Lawrence Erlbaum Associates},\n\tPages = {2001},\n\tTitle = {Linguistic inquiry and word count: {LIWC} 2001},\n\tVolume = {71},\n\tYear = {2001}}'
citation_key = 'pennebaker2001linguistic'
construction_note = 'Manual'
license = 'Paid, commercial'
loadDict(bananas, lang)

Load the corpus into a dictionary, straight from the origin corpus file.

negative = 127
note = 'Linguistic Inquiry and Word Count, three versions'
positive = 126
score_range_type = 'integer'
stems = True
title = 'LIWC'
word_types = {}
year = '07'
class labMTsimple.speedy.LIWC01(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.LIWC

affect = 12
negative = 16
positive = 13
title = 'LIWC01'
year = '01'
class labMTsimple.speedy.LIWC07(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.LIWC

This is the default; defined anyway for completeness.

affect = 125
negative = 127
positive = 126
title = 'LIWC07'
year = '07'
class labMTsimple.speedy.LIWC15(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.LIWC

affect = 30
negative = 32
positive = 31
title = 'LIWC15'
year = '15'
class labMTsimple.speedy.LabMT(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

LabMT class.

Now takes the full name of the language.

center = 5.0
citation = '@article{dodds2015human,\n\tAuthor = {Dodds, P. S. and Clark, E. M. and Desu, S. and Frank, M. R. and Reagan, A. J. and Williams, J. R. and Mitchell, L. and Harris, K. D. and Kloumann, I. M. and Bagrow, J. P. and Megerdoomian, K. and McMahon, M. T. and Tivnan, B. F. and Danforth, C. M.},\nn\tJournal = {PNAS},\n\tNumber = {8},\n\tPages = {2389--2394},\n\tTitle = {Human language reveals a universal positivity bias},\n\tVolume = {112},\n\tYear = {2015}}'
citation_key = 'dodds2015human'
construction_note = 'Survey: MT, 50 ratings'
license = 'CC'
loadDict(bananas, lang)
note = 'language assessment by Mechanical Turk'
score_range_type = 'continuous'
stems = False
title = 'labMT'
class labMTsimple.speedy.MPQA(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

MPQA class.

center = 0.0
citation = '@article{wilson2005recognizing,\n\tAuthor = {Theresa Wilson and Janyce Wiebe and Paul Hoffmann},\n\tJournal = {Proceedings of Human Language Technologies Conference/Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP 2005)},\n\tTitle = {Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis},\n\tYear = {2005}}'
citation_key = 'wilson2005recognizing'
construction_note = 'Manual + ML'
license = 'GNU GPL'
loadDict(bananas, lang)

Load the corpus into a dictionary, straight from the origin corpus file.

note = 'The Multi-Perspective Question Answering (MPQA) Subjectivity Dictionary'
score_range_type = 'integer'
stems = True
title = 'MPQA'
class labMTsimple.speedy.MaxDiff(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 0.0
citation = '@article{kiritchenko2014sentiment,\n title={Sentiment analysis of short informal texts},\n author={Kiritchenko, Svetlana and Zhu, Xiaodan and Mohammad, Saif M},\n journal={Journal of Artificial Intelligence Research},\n volume={50},\n pages={723--762},\n year={2014}\n}'
citation_key = 'kiritchenko2014sentiment'
construction_note = 'Survey: MT, MaxDiff'
license = 'Free for research'
loadDict(bananas, lang)
note = 'NRC MaxDiff Twitter Sentiment Lexicon: crowdsourced real-valued scores using the MaxDiff method'
score_range_type = 'continuous'
stems = False
title = 'MaxDiff'
class labMTsimple.speedy.OL(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 0.0
citation = '@article{liu2010sentiment,\n\tAuthor = {Liu, Bing},\n\tJournal = {Handbook of natural language processing},\n\tPages = {627--666},\n\tPublisher = {Chapman \\& Hall Goshen, CT},\n\tTitle = {Sentiment analysis and subjectivity},\n\tVolume = {2},\n\tYear = {2010}}'
citation_key = 'liu2010sentiment'
construction_note = 'Dictionary propagation'
license = 'Free'
loadDict(bananas, lang)

Load the corpus into a dictionary, straight from the origin corpus file.

note = 'Opinion Lexicon, developed by Bing Liu'
score_range_type = 'integer'
stems = False
title = 'OL'
class labMTsimple.speedy.PANASX(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 0.0
citation = '@jurthesis{watson1999panas,\n\tAuthor = {Watson, David and Clark, Lee Anna},\n\tSchool = {University of Iowa},\n\tTitle = {The {PANAS-X}: Manual for the positive and negative affect schedule-expanded form: Manual for the positive and negative affect schedule-expanded form},\n\tYear = {1999}}'
citation_key = 'watson1999panas'
construction_note = 'Manual'
license = 'Copyrighted paper'
loadDict(bananas, lang)
note = 'The Positive and Negative Affect Schedule --- Expanded'
score_range_type = 'integer'
stems = False
title = 'PANAS-X'
class labMTsimple.speedy.Pattern(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 0.0
citation = '@article{de2012pattern,\n\tAuthor = {De Smedt, Tom and Daelemans, Walter},\n\tJournal = {The Journal of Machine Learning Research},\n\tNumber = {1},\n\tPages = {2063--2067},\n\tPublisher = {JMLR. org},\n\tTitle = {Pattern for {P}ython},\n\tVolume = {13},\n\tYear = {2012}}'
citation_key = 'de2012pattern'
construction_note = 'Unspecified'
license = 'BSD'
loadDict(bananas, lang)
note = 'A web mining module for the Python programming language, version 2.6'
score_range_type = 'continuous'
stems = False
title = 'Pattern'
class labMTsimple.speedy.SOCAL(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 0.0
citation = '@article{taboada2011lexicon,\n Author = {Taboada, Maite and Brooke, Julian and Tofiloski, Milan and Voll, Kimberly and Stede, Manfred},\n Date-Added = {2016-07-13 20:17:18 +0000},\n Date-Modified = {2016-07-13 20:17:18 +0000},\n Journal = {Computational linguistics},\n Number = {2},\n Pages = {267--307},\n Publisher = {MIT Press},\n Title = {Lexicon-based methods for sentiment analysis},\n Volume = {37},\n Year = {2011}}'
citation_key = 'taboada2011lexicon'
construction_note = 'Manual'
license = 'GNU GPL'
loadDict(bananas, lang)
note = 'Manually constructed general-purpose sentiment dictionary'
score_range_type = 'continuous'
stems = True
title = 'SOCAL'
class labMTsimple.speedy.Sent140Lex(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 0.0
citation = '@inproceedings{MohammadKZ2013,\n\tAddress = {Atlanta, Georgia, USA},\n\tAuthor = {Mohammad, Saif M. and Kiritchenko, Svetlana and Zhu, Xiaodan},\n\tBooktitle = {Proceedings of the seventh international workshop on Semantic Evaluation Exercises (SemEval-2013)},\n\tMonth = {June},\n\tTitle = {NRC-Canada: Building the State-of-the-Art in Sentiment Analysis of Tweets},\n\tYear = {2013}}'
citation_key = 'MohammadKZ2013'
construction_note = 'PMI with emoticons'
license = 'Free for research'
loadDict(bananas, lang)
note = "NRC Sentiment140 Lexicon: created from the ``sentiment140'' corpus of tweets, using Pairwise Mutual Information with emoticons as positive and negative labels (here we use only the unigrams)"
score_range_type = 'continuous'
stems = False
title = 'Sent140Lex'
class labMTsimple.speedy.SentiStrength(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 0.0
citation = '@article{thelwall2010sentiment,\n\tAuthor = {Thelwall, Mike and Buckley, Kevan and Paltoglou, Georgios and Cai, Di and Kappas, Arvid},\n\tDate-Added = {2016-07-13 20:51:52 +0000},\n\tDate-Modified = {2016-07-13 20:51:52 +0000},\n\tJournal = {Journal of the American Society for Information Science and Technology},\n\tNumber = {12},\n\tPages = {2544--2558},\n\tPublisher = {Wiley Online Library},\n\tTitle = {Sentiment strength detection in short informal text},\n\tVolume = {61},\n\tYear = {2010}}'
citation_key = 'thelwall2010sentiment'
construction_note = 'LIWC+GI'
license = 'Free for research'
loadDict(bananas, lang)
note = 'an API and Java program for general purpose sentiment detection (here we use only the sentiment dictionary)'
score_range_type = 'integer'
stems = True
title = 'SentiStrength'
url = 'http://sentistrength.wlv.ac.uk/'
class labMTsimple.speedy.SentiWordNet(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 0.0
citation = '@inproceedings{baccianella2010sentiwordnet,\n title={SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining.},\n\tAuthor = {Baccianella, Stefano and Esuli, Andrea and Sebastiani, Fabrizio},\n\tBooktitle = {LREC},\n\tPages = {2200--2204},\n\tTitle = {Senti{W}ord{N}et 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining.},\n\tVolume = {10},\n\tYear = {2010}}'
citation_key = 'baccianella2010sentiwordnet'
construction_note = 'Synset synonyms'
license = 'CC BY-SA 3.0'
loadDict(bananas, lang)
note = 'WordNet synsets each assigned three sentiment scores: positivity, negativity, and objectivity'
score_range_type = 'continuous'
stems = False
title = 'SentiWordNet'
class labMTsimple.speedy.SenticNet(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 0.0
citation = '@inproceedings{cambria2014senticnet,\n title={SenticNet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis},\n author={Cambria, Erik and Olsher, Daniel and Rajagopal, Dheeraj},\n booktitle={Proceedings of the twenty-eighth AAAI conference on artificial intelligence},\n pages={1515--1521},\n year={2014},\n organization={AAAI Press}}'
citation_key = 'cambria2014senticnet'
construction_note = 'Label propagation'
license = 'Citation requested'
loadDict(bananas, lang)
note = 'Sentiment dataset labelled with semantics and 5 dimensions of emotions by Cambria \\etal, version 3'
score_range_type = 'continuous'
stems = False
title = 'SenticNet'
class labMTsimple.speedy.USent(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 0.0
citation = '@inproceedings{pappas2013distinguishing,\n title={Distinguishing the popularity between topics: A system for up-to-date opinion retrieval and mining in the web},\n author={Pappas, Nikolaos and Katsimpras, Georgios and Stamatatos, Efstathios},\n booktitle={International Conference on Intelligent Text Processing and Computational Linguistics},\n pages={197--209},\n year={2013},\n organization={Springer}}'
citation_key = 'pappas2013distinguishing'
construction_note = 'Manual'
license = 'CC'
loadDict(bananas, lang)
note = 'set of emoticons and bad words that extend MPQA'
score_range_type = 'integer'
stems = False
title = 'USent'
class labMTsimple.speedy.Umigon(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 0.0
citation = '@inproceedings{levallois2013umigon,\n\tAuthor = {Levallois, Clement},\n\tBooktitle = {Second Joint Conference on Lexical and Computational Semantics (* SEM)},\n\tDate-Added = {2016-07-13 21:41:50 +0000},\n\tDate-Modified = {2016-07-13 21:41:50 +0000},\n\tPages = {414--417},\n\tTitle = {Umigon: sentiment analysis for tweets based on terms lists and heuristics},\n\tVolume = {2},\n\tYear = {2013}}'
citation_key = 'levallois2013umigon'
construction_note = 'Manual'
license = 'Public Domain'
loadDict(bananas, lang)
note = 'Manually built specifically to analyze Tweets from the sentiment140 corpus'
score_range_type = 'integer'
stems = False
title = 'Umigon'
class labMTsimple.speedy.VADER(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 0.0
citation = '@inproceedings{hutto2014vader,\n title={Vader: A parsimonious rule-based model for sentiment analysis of social media text},\n author={Hutto, Clayton J and Gilbert, Eric},\n booktitle={Eighth International AAAI Conference on Weblogs and Social Media},\n year={2014}}'
citation_key = 'hutto2014vader'
construction_note = 'MT survey, 10 ratings'
license = 'Freely available'
loadDict(bananas, lang)
note = 'method developed specifically for Twitter and social media analysis'
score_range_type = 'continuous'
stems = False
title = 'VADER'
class labMTsimple.speedy.WDAL(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 1.5
citation = "@article{whissell1986dictionary,\n\tAuthor = {Whissell, Cynthia and Fournier, Michael and Pelland, Ren{'e} and Weir, Deborah and Makarec, Katherine},\n\tJournal = {Perceptual and Motor Skills},\n\tNumber = {3},\n\tPages = {875--888},\n\tPublisher = {Ammons Scientific},\n\tTitle = {A dictionary of affect in language: IV. Reliability, validity, and applications},\n\tVolume = {62},\n\tYear = {1986}}"
citation_key = 'whissell1986dictionary'
construction_note = 'Survey: Columbia students'
license = 'Unspecified'
loadDict(bananas, lang)
note = "Whissel's Dictionary of Affective Language: words rated in terms of their Pleasantness, Activation, and Imagery (concreteness)"
score_range_type = 'continuous'
stems = False
title = 'WDAL'
class labMTsimple.speedy.WK(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: labMTsimple.speedy.sentiDict

center = 5.0
citation = '@article{warriner2013norms,\n\tAbstract = {Information about the affective meanings of words is used by researchers working on emotions and moods, word recognition and memory, and text-based sentiment analysis. Three components of emotions are traditionally distinguished: valence (the pleasantness of a stimulus), arousal (the intensity of emotion provoked by a stimulus), and dominance (the degree of control exerted by a stimulus). Thus far, nearly all research has been based on the ANEW norms collected by Bradley and Lang (1999) for 1,034 words. We extended that database to nearly 14,000 English lemmas, providing researchers with a much richer source of information, including gender, age, and educational differences in emotion norms. As an example of the new possibilities, we included stimuli from nearly all of the category norms (e.g., types of diseases, occupations, and taboo words) collected by Van Overschelde, Rawson, and Dunlosky (Journal of Memory and Language 50:289-335, 2004), making it possible to include affect in studies of semantic memory.},\n\tAuthor = {Warriner, Amy Beth and Kuperman, Victor and Brysbaert, Marc},\n\tDate-Added = {2015-09-23 21:58:30 +0000},\n\tDate-Modified = {2017-01-16 02:20:03 +0000},\n\tDoi = {10.3758/s13428-012-0314-x},\n\tIssn = {1554-3528},\n\tJournal = {Behavior research methods},\n\tKeywords = {age,are in high demand,are used in at,because they,crowdsourcing,differences,emotion,emotional ratings of words,gender differences,least four lines of,research,semantics,the first of these},\n\tMonth = December,\n\tNumber = {4},\n\tPages = {1191--1207},\n\tPmid = {23404613},\n\tTitle = {Norms of valence, arousal, and dominance for 13,915 English lemmas},\n\tUrl = {http://www.ncbi.nlm.nih.gov/pubmed/23404613},\n\tVolume = {45},\n\tYear = {2013},\n\tBdsk-Url-1 = {http://www.ncbi.nlm.nih.gov/pubmed/23404613},\n\tBdsk-Url-2 = {http://dx.doi.org/10.3758/s13428-012-0314-x}}'
citation_key = 'warriner2013norms'
construction_note = 'Survey: MT, 14--18 ratings'
license = 'CC'
loadDict(bananas, lang)
note = 'Warriner and Kuperman rated words from SUBTLEX by Mechanical Turk'
score_range_type = 'continuous'
stems = False
title = 'WK'
class labMTsimple.speedy.sentiDict(datastructure='dict', stopVal=0.0, bananas=False, loadFromFile=False, saveFile=False, lang='english')

Bases: object

An abstract class to score them all.

computeStatistics(stopVal)
data = {}
fmt = 'Hf'
makeListsFromDict()

Make lists from a dict, used internally.

makeMarisaTrie(save_flag=False)

Turn a dictionary into a marisa_trie.

matcherDictBool(word)

matcherDictBool(word) just checks if a word is in the dict.

matcherTrieBool(word)

matcherTrieBool(word) just checks if a word is in the list. Returns 0 or 1.

Works for both trie types. Only one needed to make the plots. Only use this for coverage, so don’t even worry about using with a dict.

matcherTrieDict(word, wordVec, count)

Not sure what this one does.

matcherTrieMarisa(word, wordVec, count)

Not sure what this one does.

my_marisa = (<marisa_trie.RecordTrie object>, <marisa_trie.RecordTrie object>)

Declare this globally.

openWithPath(filename, mode)

Helper function for searching for files.

scoreTrieDict(wordDict, idx=1, center=0.0, stopVal=0.0)

Score a wordDict using the dict backend.

INPUTS:

-wordDict is a favorite hash table of word and count.

scoreTrieMarisa(wordDict, idx=1, center=0.0, stopVal=0.0)

Score a wordDict using the marisa_trie backend.

INPUTS:

-wordDict is a favorite hash table of word and count.

stopData()
stopper(tmpVec, stopVal=1.0, ignore=[])

Take a frequency vector, and 0 out the stop words.

Will always remove the nig* words.

Return the 0’ed vector.

wordVecifyTrieDict(wordDict)

Make a word vec from word dict using dict backend.

INPUTS:

-wordDict is our favorite hash table of word and count.

wordVecifyTrieMarisa(wordDict)

Make a word vec from word dict using marisa_trie backend.

INPUTS:

-wordDict is our favorite hash table of word and count.

labMTsimple.speedy.u(x)

Python 2/3 agnostic unicode function
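A minimal sketch of such a helper; the package’s actual implementation may differ in detail.

```python
import sys

def u(x):
    """Return a unicode string on both Python 2 and 3."""
    if sys.version_info[0] >= 3:
        # On Python 3, str is already unicode; only bytes need decoding.
        return x if isinstance(x, str) else x.decode("utf8")
    # On Python 2, byte strings (str) are decoded to unicode.
    return x.decode("utf8") if isinstance(x, str) else x
```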

labMTsimple.storyLab module

labMTsimple.storyLab.copy_static_files()

Deprecated method to copy files from this module’s static directory into the directory where shifts are being made.

labMTsimple.storyLab.emotion(tmpStr, someDict, scoreIndex=1, shift=False, happsList=[])

Take a string and the happiness dictionary, and rate the string.

If shift=True, will return a vector (also then needs the happsList).

labMTsimple.storyLab.emotionFileReader(stopval=1.0, lang='english', min=1.0, max=9.0, returnVector=False)

Load the dictionary of sentiment words.

stopval is our lens, $\Delta_h$; the labMT dataset (which must be tab-delimited) is read into a dict with this lens applied.

With returnVector = True, returns tmpDict,tmpList,wordList. Otherwise, just the dictionary.
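The lens amounts to excluding a band of neutral words around the center score. A sketch of that filtering, with a hypothetical function name:

```python
def apply_lens(happs_dict, stopval=1.0, center=5.0, min_score=1.0, max_score=9.0):
    """Keep only words whose happiness differs from center by at least stopval."""
    return {word: h for word, h in happs_dict.items()
            if min_score <= h <= max_score and abs(h - center) >= stopval}
```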

labMTsimple.storyLab.emotionV(frequencyVec, scoreVec)

Given the frequency vector and the score vector, compute the happs.

Doesn’t use numpy, but equivalent to np.dot(freq,happs)/np.sum(freq).
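A pure-Python equivalent of that computation (assuming at least one nonzero frequency):

```python
def emotionV(frequencyVec, scoreVec):
    """Frequency-weighted average score: dot(freq, scores) / sum(freq)."""
    total = sum(frequencyVec)
    return sum(f * s for f, s in zip(frequencyVec, scoreVec)) / total
```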

Same as copy_static_files, but makes symbolic links.

labMTsimple.storyLab.shift(refFreq, compFreq, lens, words, sort=True)

Compute a shift, and return the results.

If sort=True, will return the three sorted lists, and sumTypes. Else, just the two shift lists, and sumTypes (words don’t need to be sorted).
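The per-word contributions underlying a shift can be sketched as follows; shift_contributions is an illustrative name, and the package’s shift returns additional bookkeeping (the sorted lists and sumTypes). Each word contributes according to how far its score sits from the reference happiness and how much its relative frequency changed, and the contributions sum exactly to the happiness difference between the two corpora.

```python
def shift_contributions(refFreq, compFreq, lens):
    """Per-word contributions to the happiness difference comp - ref."""
    ref_total, comp_total = float(sum(refFreq)), float(sum(compFreq))
    p_ref = [f / ref_total for f in refFreq]
    p_comp = [f / comp_total for f in compFreq]
    happs_ref = sum(p * h for p, h in zip(p_ref, lens))
    # contribution_i = (h_i - H_ref) * (p_i_comp - p_i_ref)
    return [(h - happs_ref) * (pc - pr)
            for h, pc, pr in zip(lens, p_comp, p_ref)]
```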

labMTsimple.storyLab.shiftHtml(scoreList, wordList, refFreq, compFreq, outFile, corpus='LabMT', advanced=False, customTitle=False, title='', ref_name='reference', comp_name='comparison', ref_name_happs='', comp_name_happs='', isare='')

Make an interactive shift for exploring and sharing.

The most insane-o piece of code here (lots of file copying, writing vectors into html files, etc).

Accepts a score list, a word list, two frequency vectors, and the name of an HTML file to generate.

Will make the HTML file, and a directory called static that hosts the .js and .css files it needs.

labMTsimple.storyLab.shiftHtmlJupyter(scoreList, wordList, refFreq, compFreq, outFile, corpus='LabMT', advanced=False, customTitle=False, title='', ref_name='reference', comp_name='comparison', ref_name_happs='', comp_name_happs='', isare='', saveFull=True, selfshift=False, bgcolor='white')

Shifter that generates HTML in two pieces, designed to work inside of a Jupyter notebook.

Saves the file under the name given (with .html extension), and sneaks in a filename-wrapper.html; the wrapper file has the HTML headers and everything needed to be a standalone file. The named HTML file is just the guts, because the complete markup isn’t needed inside the notebook.

labMTsimple.storyLab.shiftHtmlPreshifted(scoreList, wordList, refFreq, compFreq, outFile, corpus='LabMT', advanced=False, customTitle=False, title='', ref_name='reference', comp_name='comparison', ref_name_happs='', comp_name_happs='', isare='')

Make an interactive shift for exploring and sharing.

The most insane-o piece of code here (lots of file copying, writing vectors into html files, etc).

Accepts a score list, a word list, two frequency vectors, and the name of an HTML file to generate.

Will make the HTML file, and a directory called static that hosts the .js and .css files it needs.

labMTsimple.storyLab.stopper(tmpVec, score_list, word_list, stopVal=1.0, ignore=[], center=5.0)

Take a frequency vector, and 0 out the stop words.

Will always remove the nig* words.

Return the 0’ed vector.
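The core stopping logic can be sketched in plain Python (this sketch omits the hard-coded slur removal noted above):

```python
def stopper(tmpVec, score_list, word_list, stopVal=1.0, ignore=(), center=5.0):
    """Zero out counts for words within stopVal of center, or in ignore."""
    out = list(tmpVec)
    for i, (score, word) in enumerate(zip(score_list, word_list)):
        if abs(score - center) < stopVal or word in ignore:
            out[i] = 0
    return out
```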

labMTsimple.storyLab.stopper_mat(tmpVec, score_list, word_list, stopVal=1.0, ignore=[], center=5.0)

Take a frequency vector, and 0 out the stop words.

A sparse-aware matrix stopper; frequency vectors are the rows ([i, :]).

Will always remove the nig* words.

Return the 0’ed matrix, sparse.