meme_get
This is a library that provides a high-level abstraction for extracting memes from popular online websites. Currently, we support extracting memes from:
- quickmeme.com
- memegenerator.net
- Memes subreddit from Reddit.
Here is a short example:
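>>> # Import path assumed: RedditMemes alongside the other site classes
>>> from meme_get.memesites import RedditMemes
>>> a = RedditMemes()
>>> meme_list = a.get_memes(100)
>>> for meme in meme_list:
...     print(meme.get_title())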

Why do we care about memes?
According to Wikipedia, an Internet meme is “an activity, concept, catchphrase or piece of media which spreads, often as mimicry, from person to person via the Internet.” In our case, we care about a specific format of meme: one that takes the form of an image macro and captions.

We find these memes interesting because they are inherently simple, yet with the right combination of image and caption they can go viral within a few hours (see Bad Luck Brian). We think this phenomenon is worth investigating.
Authors: Jingnan Shi, Lingdong Huang, Jason Ma
Guides
If you are looking for information on installation and/or how to use a specific method or class, then you can refer to the following:
About
This library is one part of the ongoing Physical Generative Arts using Trending Internet Memes Project at Harvey Mudd College.
Installation
Users can install the package using pip.
Documentation
This is the documentation page for the meme_get package.
Meme
Meme objects contain the information you need to analyze a meme.
- class meme_get.memesites.Meme(pic_url, time, title=None, caption=None, raw_pic_url=None, origin=<Origins.NA: 0>, tags=[], score=-1)
A class for representing memes.
This class provides a high-level abstraction for memes.
- Attributes:
- _pic_url (str): A string representing the url of the picture
- _caption (str): A string representing the caption of the meme
- _time (datetime object): The time of creation of the meme
- _origin (Origins Enum): The Origins enum object representing the origin
- _tags (list): A list of strings representing the categories of the meme
- get_pic_url()
Get the URL of the picture.
Returns: The URL of the meme picture. Note that this picture includes the captions.
Return type: str
- get_raw_pic_url()
Return the URL of the meme’s picture without the caption.
Returns: The URL pointing to the meme’s background picture
Return type: str
Raises: ValueError – if the meme does not have a caption-free background picture
Representing a list of tags for the meme
- get_time()
Return the meme’s creation time.
Returns: The creation time of the meme
Return type: datetime object
- get_title()
Get the title of the meme.
Returns: The title of the meme
Return type: str
Raises: ValueError – if the meme does not have a title
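Because get_title is documented to raise ValueError when a meme has no title, callers may want a fallback. The following is a minimal sketch of that pattern. StubMeme and title_or_default are hypothetical names introduced here for illustration; StubMeme only mimics the documented behavior and is not the library's Meme class.

```python
class StubMeme:
    """Hypothetical stand-in that mimics the documented get_title behavior."""

    def __init__(self, pic_url, title=None):
        self._pic_url = pic_url
        self._title = title

    def get_pic_url(self):
        return self._pic_url

    def get_title(self):
        # Documented behavior: raise ValueError when there is no title.
        if self._title is None:
            raise ValueError("this meme has no title")
        return self._title


def title_or_default(meme, default="(untitled)"):
    """Return the meme's title, or a default when the title is missing."""
    try:
        return meme.get_title()
    except ValueError:
        return default


memes = [
    StubMeme("http://example.com/a.jpg", title="Bad Luck Brian"),
    StubMeme("http://example.com/b.jpg"),
]
titles = [title_or_default(m) for m in memes]
print(titles)
```

The same try/except shape would presumably apply to get_raw_pic_url, which is documented to raise ValueError as well.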
- ocr_caption(method='Tesseract', **kwargs)
Use OCR to update the meme’s caption.
OCR Methods Available
- Tesseract: Open-source OCR Engine
- FontMatching: Using Impact Font and template matching to conduct OCR
When using Tesseract, users need to provide two keyword arguments:
- thres (bool): a boolean indicating whether we need to threshold the image
- cfg (str): a string representing the configuration to use for Tesseract
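To illustrate what the thres option refers to: thresholding turns a grayscale image into pure black and white before OCR, which can help isolate bright caption text. The sketch below works on a plain nested list of pixel values so that it runs without any imaging library. The function name binarize and the cutoff of 200 are assumptions chosen for illustration, not values taken from meme_get.

```python
def binarize(pixels, cutoff=200):
    """Map each grayscale value (0-255) to 255 if it is >= cutoff, else 0."""
    return [[255 if value >= cutoff else 0 for value in row] for row in pixels]


# A tiny 3x4 grayscale "image": bright caption pixels on a darker background.
image = [
    [30, 250, 245, 40],
    [90, 255, 120, 60],
    [10, 199, 200, 20],
]

binary = binarize(image)
for row in binary:
    print(row)
```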
MemeSite
MemeSite is the superclass that defines the interface for all the meme-extraction subclasses. The subclasses should implement caching to improve reading performance.
- class meme_get.memesites.MemeSite(url, cache_size=500, maxcache_day=1)
A superclass for sites that host memes.
This class should be subclassed. MemeSite is designed to keep all Memes in a cache file, so that even if the Python process is terminated, the next run does not need to re-download all the memes from the Internet. The _meme_pool and _meme_deque store memes, but users should not treat their contents as constant, because operations on the object change the memes inside the pool and the deque.
- Attributes:
- _url (str): URL for the website hosting memes
- _max_tries (int): Max tries for http requests
- _meme_pool (set): A set containing stored memes
- _meme_deque (deque): A deque containing stored memes
- _last_update (datetime object): The time of last download of memes
- _cache_size (int): Number of memes stored on disk
- _maxcache_day (int): Max day of keeping the cache on disk
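One way the attributes above could fit together is sketched below: a set for duplicate checks, a deque for insertion order, a size limit, and an age limit in days. This is an in-memory illustration only. The class name CacheSketch and its methods add and is_stale are hypothetical, on-disk persistence is omitted, and nothing here is taken from meme_get's actual implementation.

```python
from collections import deque
from datetime import datetime, timedelta


class CacheSketch:
    """Hypothetical pool-plus-deque cache; not meme_get's actual code."""

    def __init__(self, cache_size=500, maxcache_day=1):
        self._cache_size = cache_size
        self._maxcache_day = maxcache_day
        self._meme_pool = set()      # fast duplicate checks
        self._meme_deque = deque()   # remembers insertion order
        self._last_update = None

    def add(self, meme_url, now=None):
        """Store a meme once, evicting the oldest when the cache is full."""
        if meme_url in self._meme_pool:
            return False
        if len(self._meme_deque) >= self._cache_size:
            oldest = self._meme_deque.popleft()
            self._meme_pool.discard(oldest)
        self._meme_pool.add(meme_url)
        self._meme_deque.append(meme_url)
        self._last_update = now or datetime.now()
        return True

    def is_stale(self, now=None):
        """Return True when the cache is empty or older than maxcache_day."""
        if self._last_update is None:
            return True
        now = now or datetime.now()
        return now - self._last_update > timedelta(days=self._maxcache_day)


cache = CacheSketch(cache_size=2, maxcache_day=1)
start = datetime(2017, 1, 1, 12, 0)
for url in ["a.jpg", "b.jpg", "a.jpg", "c.jpg"]:
    cache.add(url, now=start)

cached = list(cache._meme_deque)
print(cached)
print(cache.is_stale(now=start + timedelta(hours=6)))
print(cache.is_stale(now=start + timedelta(days=2)))
```

Keeping both containers trades a little memory for constant-time membership tests (the set) and constant-time eviction of the oldest entry (the deque).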
- get_captions(num_memes)
Return a list of captions.
Returns: A list of strings representing the captions. If a meme has no caption, its entry is None.
Return type: list
- get_memes(num_memes)
Return a list of Memes.
Returns: A list of Meme objects.
Return type: list
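Since get_captions is documented to return None for memes without a caption, downstream text processing will usually want to drop those entries first. A minimal sketch follows, using sample data shaped like the documented return value. The helper name present_captions and the sample captions are made up for illustration.

```python
def present_captions(captions):
    """Drop the None entries that stand for memes without a caption."""
    return [caption for caption in captions if caption is not None]


# Sample data shaped like the documented return value of get_captions.
captions = ["ONE DOES NOT SIMPLY", None, "SUCH WOW", None]
cleaned = present_captions(captions)
print(cleaned)
```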
QuickMeme
This is a subclass derived from MemeSite to provide meme extraction from www.quickmeme.com.
- class meme_get.memesites.QuickMeme(cache_size=500, maxcache_day=1)
The MemeSite subclass that deals with the quickmeme site.
quickmeme.com uses an infinite-scrolling homepage. Fortunately, later pages can also be reached directly at the URL www.quickmeme.com/page/i/, where i is the page number. Each page contains 10 user posts, and each post consists of an image and its alternative text.
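The paging scheme described above implies a simple calculation: with 10 posts per page, fetching n memes needs ceil(n / 10) pages. The sketch below builds the corresponding URLs. The helper name page_urls, the http scheme, and the assumption that page numbering starts at 1 are not confirmed by the documentation.

```python
import math

POSTS_PER_PAGE = 10  # stated in the description above


def page_urls(num_memes, base="http://www.quickmeme.com/page/{}/"):
    """Return the page URLs needed to cover num_memes posts.

    Assumes pages are numbered from 1; the scheme in `base` is also assumed.
    """
    if num_memes <= 0:
        return []
    num_pages = math.ceil(num_memes / POSTS_PER_PAGE)
    return [base.format(i) for i in range(1, num_pages + 1)]


urls = page_urls(25)
for url in urls:
    print(url)
```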
MemeGenerator
This is a subclass derived from MemeSite to provide meme extraction from memegenerator.net.
Origins Enum
This is an enum type designed to provide an abstraction for the origins of the memes we extract from the Internet.
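As a rough picture of what such an enum might look like, the sketch below uses Python's standard enum module. Only the NA member with value 0 is confirmed by this page (it appears as the default origin=<Origins.NA: 0> in the Meme signature). The class name OriginsSketch and every other member name and value are assumptions based on the three supported sites.

```python
from enum import Enum


class OriginsSketch(Enum):
    """Hypothetical stand-in for the Origins enum."""

    NA = 0              # confirmed by the default origin=<Origins.NA: 0>
    QUICKMEME = 1       # assumed name and value
    MEMEGENERATOR = 2   # assumed name and value
    REDDITMEMES = 3     # assumed name and value


default_origin = OriginsSketch.NA
print(repr(default_origin))
print(OriginsSketch(0) is default_origin)
```

Using an enum rather than bare strings or integers means a misspelled origin fails at attribute lookup instead of silently creating a new category.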