meme_get

This library provides a high-level abstraction for extracting memes from popular websites. Currently, we support extracting memes from quickmeme.com, memegenerator.net, and the Reddit /r/memes subreddit.

Here is a short example:

>>> from meme_get.memesites import RedditMemes
>>> a = RedditMemes()
>>> meme_list = a.get_memes(100)
>>> for meme in meme_list:
...     print(meme.get_title())

Why do we care about memes?

According to Wikipedia, an Internet meme is “an activity, concept, catchphrase or piece of media which spreads, often as mimicry, from person to person via the Internet.” In our case, we care about a specific format of meme: one that takes the form of an image macro and captions.


We find these memes interesting because they are inherently simple, yet with the right combination of caption and image, a meme can go viral within a few hours (see Bad Luck Brian). We think these phenomena are worth investigating.

Authors: Jingnan Shi, Lingdong Huang, Jason Ma

Guides

If you are looking for information on installation, or on how to use a specific method or class, refer to the sections below.

About

This library is one part of the ongoing Physical Generative Arts using Trending Internet Memes project at Harvey Mudd College.

Installation

Users can choose to install the package using pip.

Use Pip

It’s simple:

pip install meme_get

Documentation

This is the documentation page for the meme_get package.

Meme

Meme objects contain the information you need to analyze a meme.

class meme_get.memesites.Meme(pic_url, time, title=None, caption=None, raw_pic_url=None, origin=<Origins.NA: 0>, tags=[], score=-1)[source]

A class for representing memes

This class provides a high-level abstraction for memes.

Attributes:
  • _pic_url (str): A string representing the url of the picture
  • _caption (str): A string representing the caption of the meme
  • _time (datetime object): The time of creation of the meme
  • _origin (Origins Enum): The Origins enum object representing the origin
  • _tags (list): A list of strings representing the categories of the meme
get_caption()[source]

Get caption of the meme

Returns:The caption of the meme.
Return type:str
get_origin()[source]

Return the origin of the meme

get_pic_url()[source]

Get url to the picture

Returns:The url to the meme picture. Notice that this picture contains the captions.
Return type:str
get_raw_pic_url()[source]

Return the url of the meme’s picture without caption

Returns:The url pointing to the meme’s background picture
Return type:str
Raises:ValueError – if the meme does not have an empty background picture
get_tags()[source]

Return the list of tags for the meme

get_time()[source]

Return the meme’s creation time

Returns:The creation time of the meme
Return type:datetime object
get_title()[source]

Get the title of the meme

Returns:The title of the meme
Return type:str
Raises:ValueError – if the meme does not have a title
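
Because get_title() and get_raw_pic_url() raise ValueError when the value is missing, callers may want to wrap them. The following is a minimal sketch: title_or_default and FakeMeme are hypothetical names used only for illustration, and FakeMeme stands in for a real Meme so that the snippet runs without network access.

```python
def title_or_default(meme, default="(untitled)"):
    """Return meme.get_title(), or `default` if the meme has no title."""
    try:
        return meme.get_title()
    except ValueError:
        return default


class FakeMeme:
    """Hypothetical stand-in mimicking the documented get_title() contract."""

    def __init__(self, title=None):
        self._title = title

    def get_title(self):
        if self._title is None:
            raise ValueError("meme has no title")
        return self._title


print(title_or_default(FakeMeme("Bad Luck Brian")))
print(title_or_default(FakeMeme()))
```

The same pattern should apply to get_raw_pic_url(), which is documented to raise ValueError in the analogous case.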
ocr_caption(method='Tesseract', **kwargs)[source]

Use OCR to update the meme's caption

OCR Methods Available

  • Tesseract: Open-source OCR Engine
  • FontMatching: Using Impact Font and template matching to conduct OCR

When using Tesseract, users need to provide two keyword arguments:

  • thres (bool): a boolean indicating whether we need to threshold the image
  • cfg (str): a string representing the configuration to use for Tesseract
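
A call with the two Tesseract keyword arguments might look like the sketch below. refresh_caption and RecordingMeme are hypothetical names; RecordingMeme only records the call so that the snippet runs without Tesseract installed, and the cfg value shown is a placeholder, since the accepted configuration strings are not specified here. The sketch also assumes that get_caption() returns the updated caption after ocr_caption() has run.

```python
def refresh_caption(meme, cfg, thres=True):
    """Run Tesseract OCR on `meme`, then return its (updated) caption."""
    # Both documented keyword arguments are passed explicitly.
    meme.ocr_caption(method="Tesseract", thres=thres, cfg=cfg)
    return meme.get_caption()


class RecordingMeme:
    """Hypothetical stand-in that records the call instead of running OCR."""

    def __init__(self):
        self.calls = []
        self._caption = None

    def ocr_caption(self, method="Tesseract", **kwargs):
        self.calls.append((method, kwargs))
        self._caption = "caption from OCR"

    def get_caption(self):
        return self._caption


demo = RecordingMeme()
print(refresh_caption(demo, cfg="<tesseract-config>"))
print(demo.calls)
```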

MemeSite

MemeSite is the superclass that defines the interface for all the meme-extraction subclasses. The subclasses should implement caching to improve reading performance.

class meme_get.memesites.MemeSite(url, cache_size=500, maxcache_day=1)[source]

A superclass for any site that hosts memes.

This class should be subclassed. MemeSite is designed to keep all Memes in a cache file, so that even if the Python process is terminated, we don’t need to re-download all the memes from the Internet the next time the program runs. The _meme_pool and _meme_deque store memes, but users should not treat their contents as fixed, because operations on the object change the memes inside the pool and the deque.

Attributes:
  • _url (str): URL for the website hosting memes
  • _max_tries (int): Max tries for http requests
  • _meme_pool (set): A set containing stored memes
  • _meme_deque (deque): A deque containing stored memes
  • _last_update (datetime object): The time of last download of memes
  • _cache_size (int): Number of memes stored on disk
  • _maxcache_day (int): Max day of keeping the cache on disk
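
To make the roles of these attributes concrete, here is a minimal in-memory sketch of a pool-plus-deque cache with an age check. It is not the library's implementation: TinyMemeCache, add and is_stale are hypothetical names, bounding the deque with maxlen is an assumption, and writing the cache to disk is left out.

```python
from collections import deque
from datetime import datetime, timedelta


class TinyMemeCache:
    """Hypothetical sketch of a pool/deque cache; not the library's code."""

    def __init__(self, cache_size=500, maxcache_day=1):
        self._meme_pool = set()                      # unique memes
        self._meme_deque = deque(maxlen=cache_size)  # arrival order, bounded
        self._last_update = None                     # time of last download
        self._maxcache_day = maxcache_day

    def add(self, meme, now=None):
        """Store a meme and remember when it was stored."""
        self._meme_pool.add(meme)
        self._meme_deque.append(meme)
        self._last_update = now or datetime.now()

    def is_stale(self, now=None):
        """True if nothing was stored yet or the cache is too old."""
        if self._last_update is None:
            return True
        now = now or datetime.now()
        return now - self._last_update > timedelta(days=self._maxcache_day)
```

In this sketch the deque may hold duplicates while the set does not, which is one plausible reason for having both get_meme_num() and get_unique_meme_num().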
clean_meme_deque()[source]

Empty the meme deque

Returns:None
Return type:NoneType
clean_meme_pool()[source]

Empty the meme pool

Returns:None
Return type:NoneType
get_captions(num_memes)[source]

Return a list of captions.

Returns:A list of strings representing the captions. If a meme has no caption, its entry in the list is None.
Return type:list
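
Since the returned list can contain None entries, callers will usually filter them out before further processing. A minimal sketch, using a hand-written sample list in place of a real get_captions() result (non_empty_captions is a hypothetical helper):

```python
def non_empty_captions(captions):
    """Drop the None placeholders from a get_captions()-style list."""
    return [c for c in captions if c is not None]


# Sample data standing in for the return value of get_captions().
sample = ["ONE DOES NOT SIMPLY", None, "SUCH WOW"]
print(non_empty_captions(sample))
```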
get_meme_num()[source]

Return the number of memes we have.

Returns:An int
Return type:int
get_meme_pool()[source]

Return a set of memes

Returns:A set of Memes
Return type:set
get_memes(num_memes)[source]

Return a list of Memes.

Returns:A list of Meme objects.
Return type:list
get_unique_meme_num()[source]

Return the number of unique memes we have

Returns:An int
Return type:int
get_url()[source]

Return the base url

Returns:A string representing the url to the origin site.
Return type:str
QuickMeme

This is a subclass derived from MemeSite to provide meme extraction from www.quickmeme.com.

class meme_get.memesites.QuickMeme(cache_size=500, maxcache_day=1)[source]

The MemeSite subclass that deals with the quickmeme site.

quickmeme.com uses an infinite-scrolling homepage. Fortunately, we can also access the later pages by going to the url:

www.quickmeme.com/page/i/, where i is the page number. Each page contains 10 user posts; each post consists of an image and an alternative text.
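
The page arithmetic implied by this scheme can be sketched as follows. quickmeme_page_urls is a hypothetical helper, not part of the library; the http scheme and page numbering starting at 1 are assumptions.

```python
import math

POSTS_PER_PAGE = 10  # each page holds 10 user posts, as stated above


def quickmeme_page_urls(num_memes, base="http://www.quickmeme.com"):
    """Return the URLs of the pages needed to collect `num_memes` posts."""
    if num_memes <= 0:
        return []
    pages = math.ceil(num_memes / POSTS_PER_PAGE)
    # Assumption: pages are numbered from 1.
    return ["{}/page/{}/".format(base, i) for i in range(1, pages + 1)]


for url in quickmeme_page_urls(25):
    print(url)
```

Requesting 25 memes therefore touches three pages, with the last page only partly used.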

get_memes(num_memes)[source]

Get a number of memes from Quickmeme.com

MemeGenerator

This is a subclass derived from MemeSite to provide meme extraction from memegenerator.net.

class meme_get.memesites.MemeGenerator(cache_size=500, maxcache_day=1, popular_type='Daily', timeout=20)[source]

This class represents the memegenerator.net website

get_memes(num_memes)[source]

Get a number of memes from memegenerator.net

RedditMemes

This is a subclass derived from MemeSite to provide meme extraction from /r/memes.

class meme_get.memesites.RedditMemes(cache_size=500, maxcache_day=1, popular_type='Daily', timeout=20)[source]
get_memes(num)[source]

Get memes from the Reddit /r/memes subreddit

Origins Enum

This is an enum-type designed to provide abstraction for the origins of the memes we extract from the Internet.

class meme_get.memesites.Origins[source]

Enum for holding the origins of memes.

MEMEGENERATOR = None

Representing memegenerator.net.

NA = None

Representing an unknown origin.

QUICKMEME = None

Representing quickmeme.com.

REDDITMEMES = None

Representing the Reddit /r/memes subreddit.

classmethod string_to_enum(s)[source]

Convert a string to an Origins Enum object

Parameters:s (str) – The string representing the name of the origin
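
A name-to-member lookup of this kind could be written as in the sketch below. OriginsSketch is a hypothetical re-creation: only NA = 0 is confirmed by the Meme signature above, the other numeric values are assumptions, and falling back to NA for unknown names is also an assumption about behavior that is not documented here.

```python
from enum import Enum


class OriginsSketch(Enum):
    """Hypothetical re-creation; only NA = 0 is confirmed by the docs."""

    NA = 0
    MEMEGENERATOR = 1  # assumed value
    QUICKMEME = 2      # assumed value
    REDDITMEMES = 3    # assumed value

    @classmethod
    def string_to_enum(cls, s):
        """Look up a member by name, falling back to NA (an assumption)."""
        return cls.__members__.get(s, cls.NA)


print(OriginsSketch.string_to_enum("QUICKMEME"))
print(OriginsSketch.string_to_enum("unknown site"))
```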
