8chan Python Library¶
py8chan is a Python library that gives access to the 8chan API and an object-oriented way to browse and get board and thread information quickly and easily.
py8chan is based on BASC-py4chan, a 4chan API wrapper that was adopted and extended by the Bibliotheca Anonoma.
The py8chan repository is located on Github, where pull requests and issues can be submitted.
Getting Help
If you want help, or you have some trouble using this library, our primary IRC channel is #bibanon on irc.rizon.net. Simply head in there and talk to dan or antonizoon. Otherwise, you can file an issue on our GitHub Issue Tracker and we’ll respond as soon as we can!
General Documentation¶
Tutorial¶
When using py8chan, it can be a bit hard to find where to begin. Here, we run through how to create and use the various objects available in this module.
Boards¶
py8chan.Board is the first thing you create when using py8chan. Everything else is created through that class. The most basic way to create a board is as below:
board = py8chan.Board('v')
This creates a py8chan.Board object that you can then use to create py8chan.Thread and py8chan.Post objects.
But what sort of things does a py8chan.Board object let you do?
Here’s a short code snippet of us printing out how many threads are active on a board:
import py8chan
board = py8chan.Board('v')
thread_ids = board.get_all_thread_ids()
str_thread_ids = [str(id) for id in thread_ids]  # need to do this so str.join below works
print('There are', len(thread_ids), 'active threads on /v/:', ', '.join(str_thread_ids))
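The formatting step in the snippet above can be tried without a network connection. The sketch below is only an illustration: `format_thread_summary` is a hypothetical helper, not part of py8chan, and the IDs passed to it are made-up stand-ins for what `board.get_all_thread_ids()` would return.

```python
def format_thread_summary(board_name, thread_ids):
    """Build the summary line for a list of integer thread IDs.

    Hypothetical helper for illustration; not part of py8chan.
    """
    # str.join only accepts strings, so convert each integer ID first
    str_thread_ids = [str(thread_id) for thread_id in thread_ids]
    return 'There are {} active threads on /{}/: {}'.format(
        len(thread_ids), board_name, ', '.join(str_thread_ids))


# Made-up IDs standing in for board.get_all_thread_ids()
print(format_thread_summary('v', [101, 205, 309]))
```

Keeping the formatting in its own function makes it easy to check the output before pointing the code at a live board.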
Threads¶
Listing how many threads exist on a board is all well and good, but most people want to actually get threads and do things with them. Here, we’ll describe how to do that.
All py8chan.Thread objects are created by a py8chan.Board object, using one of the py8chan.Board.get_thread() methods.
For this example, we have a user ask us about “thread 1234”, and we return information about it:
thread_id = 1234
board = py8chan.Board('v')
if board.thread_exists(thread_id):
    thread = board.get_thread(thread_id)
    # print thread information
    print('Thread', thread_id)
    if thread.closed:
        print(' is closed')
    if thread.sticky:
        print(' is a sticky')
    # information from the OP
    topic = thread.topic
    print(' is named:', topic.subject)
    print(' and was made by:', topic.name, topic.email)
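The same report can be built as a list of lines, which is handy when the text goes somewhere other than standard output. This is a minimal sketch, assuming only the `closed`, `sticky` and `topic` attributes documented for py8chan.Thread; `describe_thread` is a hypothetical helper, and the stand-in object with invented values replaces a live thread so the code runs offline.

```python
from types import SimpleNamespace


def describe_thread(thread_id, thread):
    """Return the report lines for a thread-like object.

    Hypothetical helper; it only reads the closed, sticky and topic
    attributes documented for py8chan.Thread.
    """
    lines = ['Thread {}'.format(thread_id)]
    if thread.closed:
        lines.append(' is closed')
    if thread.sticky:
        lines.append(' is a sticky')
    topic = thread.topic
    lines.append(' is named: {}'.format(topic.subject))
    lines.append(' and was made by: {} {}'.format(topic.name, topic.email))
    return lines


# Stand-in object with invented values, used instead of a live thread
fake_thread = SimpleNamespace(
    closed=False,
    sticky=True,
    topic=SimpleNamespace(subject='Example', name='Anonymous', email=''),
)
print('\n'.join(describe_thread(1234, fake_thread)))
```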
Changes from the original py4chan¶
Since Edgeworth has gone MIA, BASC has adopted the project and made the following improvements.
Changes by antonizoon¶
- 4chan Link Structure Update - 4chan has heavily reformed its link structure, finally removing the strange folder structure inherited from the Futaba Channel.
- 4chan cdn Link update - To save money on bandwidth, 4chan has changed its image/thumbnail/json/css servers to a domain name with fewer characters.
- Thread Class: new filenames() function that returns the filenames of all files (not thumbnails) in a thread.
- Thread Class: new thumbnames() function that returns the filenames of all thumbnails in a thread.
- Post Class: new image_fname and thumbnail_fname properties, designed for Thread Class filenames() and thumbnames().
- Actual API Documentation - Real documentation on using the py-4chan library is a must. For some people, it is rocket science.
Changes by Anorov¶
- Anorov’s underscore_function_notation - Even I have to say that CamelCase is beginning to suck, so we’ve adopted Anorov’s function notation for py4chan. This breaks API compatibility with the original py-4chan, but just use find/replace to change your functions.
- Break up classes into separate files. - Makes the code much cleaner.
- Thread Class: expand() function, used to display omitted posts and images. Used by all_posts().
- Thread Class: semantic_thread_url() function, used to obtain 4chan’s new URL format, which tacks on the thread title (obtained from slug()).
- Post Class: comment() has been modified to use clean_comment_body() when returning a comment. The raw text from the 4chan API can still be obtained from orig_comment().
- Util Class: clean_comment_body() function, which converts all HTML tags and entities within 4chan comments into human-readable text equivalents (e.g. <br> to a newline, <a href> into a raw link).
- Board Class: _get_json() function, which dumps the raw JSON from the 4chan API.
- A whole host of new Catalog parsing functions:
  - Board Class: refresh_cache() and clear_cache() - Get the latest Catalog of all threads in the board, or clear the current cache.
  - Board Class: get_threads(page) - Get a list of all threads on a certain page. (Pages are now indexed starting from 1).
  - Board Class: get_all_thread_ids() - Get a list of all thread IDs on the board.
  - Board Class: get_all_threads() - Return all threads on all pages in the board.
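To make the comment-cleaning item above concrete, here is a rough sketch of that kind of conversion using only the standard library. It is not the library's clean_comment_body(); the function name `clean_comment_sketch` and the regular expressions are assumptions chosen for illustration, and real comment HTML may need more careful handling.

```python
import html
import re


def clean_comment_sketch(raw_html):
    """Roughly convert comment HTML to plain text (illustrative only)."""
    # <br> and <br/> become newlines
    text = re.sub(r'<br\s*/?>', '\n', raw_html)
    # <a href="URL">label</a> becomes the raw URL
    text = re.sub(r'<a\s[^>]*href="([^"]*)"[^>]*>.*?</a>', r'\1', text, flags=re.S)
    # drop any remaining tags
    text = re.sub(r'<[^>]+>', '', text)
    # decode entities such as &gt; and &amp;
    return html.unescape(text)


sample = 'first line<br>see <a href="http://example.com/a">link</a> &gt;quote'
print(clean_comment_sketch(sample))
```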
Changes by Daniel Oaks¶
- ReadTheDocs Documentation - Splitting the documentation out to ReadTheDocs, using Sphinx to generate nice, useful docs!
API Documentation¶
py8chan – 8chan Python Library¶
py8chan gives access to 8chan from a clean Python interface.
Basic Usage¶
4chan Python Library.
BASC-py4chan is a Python library that gives access to the 4chan API and an object-oriented way to browse and get board and thread information quickly and easily.
py8chan.Board – 8chan Boards¶
py8chan.Board provides access to an 8chan board, including checking if threads exist, retrieving appropriate py8chan.Thread objects, and returning lists of all the threads that exist on the given board.
Example¶
Here is a sample application that grabs and uses Board information:
from __future__ import print_function
import py8chan
board = py8chan.Board('tg')
thread_ids = board.get_all_thread_ids()
str_thread_ids = [str(id) for id in thread_ids] # need to do this so str.join below works
print('There are', len(thread_ids), 'active threads on /tg/:', ', '.join(str_thread_ids))
Basic Usage¶
class py8chan.Board(board_name, https=False, session=None)
    Represents an 8chan board.

    name
        Name of the board, such as “tg” or “k”.
        Type: string

    title
        Board title, such as “Animu and Mango”.
        Type: string

    is_worksafe
        Whether this board is worksafe.
        Type: bool

    page_count
        How many pages this board has.
        Type: int

    threads_per_page
        How many threads there are on each page.
        Type: int
Methods¶
Board.__init__(board_name, https=False, session=None)
    Creates a py8chan.Board object.
    Parameters:
    - board_name (string) – Name of the board, such as “tg” or “etc”.
    - https (bool) – Whether to use a secure connection to 8chan.
    - session – Existing requests.session object to use instead of our current one.
Board.thread_exists(thread_id)
    Check if a thread exists or has 404’d.
    Parameters: thread_id (int) – Thread ID
    Returns: Whether the given thread exists on this board.
    Return type: bool
Board.get_thread(thread_id, update_if_cached=True, raise_404=False)
    Get a thread from 8chan via the 8chan API.
    Parameters:
    - thread_id (int) – Thread ID
    - update_if_cached (bool) – Whether the thread should be updated if it’s already in our cache
    - raise_404 (bool) – Raise an Exception if thread has 404’d
    Returns: Thread object
    Return type: py8chan.Thread
Board.get_threads(page=0)
    Returns all threads on a certain page.
    Gets a list of Thread objects for every thread on the given page. If a thread is already in our cache, the cached version is returned and thread.want_update is set to True on the specific thread object.
    Pages on 8chan/vichan are indexed from 0 onwards (unlike modern 4chan, which starts from 1; 4chan used to start from 0).
    Parameters: page (int) – Page to request threads for. Defaults to the first page.
    Returns: List of Thread objects representing the threads on the given page.
    Return type: list of py8chan.Thread
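Because pages are indexed from 0, a loop over every page runs from 0 to page_count - 1. The sketch below illustrates that with an offline stand-in: `FakeBoard` and `threads_on_all_pages` are hypothetical names, the thread values are invented, and only the `page_count` attribute and `get_threads(page)` call are taken from the documentation above.

```python
class FakeBoard:
    """Offline stand-in for py8chan.Board; data is invented."""

    def __init__(self, pages):
        self._pages = pages
        self.page_count = len(pages)

    def get_threads(self, page=0):
        return self._pages[page]


def threads_on_all_pages(board):
    """Collect threads page by page, starting at page 0."""
    threads = []
    for page in range(board.page_count):  # 0, 1, ..., page_count - 1
        threads.extend(board.get_threads(page))
    return threads


board = FakeBoard([['thread-a', 'thread-b'], ['thread-c']])
print(threads_on_all_pages(board))
```

With a real board, get_all_threads() covers this case already; the loop is only meant to show the 0-based indexing.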
Board.get_all_threads(expand=False)
    Return every thread on this board.
    If not expanded, result is same as get_threads run across all board pages, with last 3-5 replies included.
    Uses the catalog when not expanding, and uses the flat thread ID listing at /{board}/threads.json when expanding for more efficient resource usage.
    If expanded, all data of all threads is returned with no omitted posts.
    Parameters: expand (bool) – Whether to download every single post of every thread. If enabled, this option can be very slow and bandwidth-intensive.
    Returns: List of Thread objects representing every thread on this board.
    Return type: list of py8chan.Thread
Board.get_all_thread_ids()
    Return the ID of every thread on this board.
    Returns: List of IDs of every thread on this board.
    Return type: list of ints
py8chan.Thread – 8chan Threads¶
py8chan.Thread allows for standard access to an 8chan thread, including listing all the posts in the thread, information such as whether the thread is locked and stickied, and lists of attached file URLs or thumbnails.
Basic Usage¶
class py8chan.Thread(board, id)
    Represents a thread.

    closed
        Whether the thread has been closed.
        Type: bool

    sticky
        Whether this thread is a ‘sticky’.
        Type: bool

    topic
        Topic post of the thread, the OP.
        Type: py8chan.Post

    posts
        List of all posts in the thread, including the OP.
        Type: list of py8chan.Post

    all_posts
        List of all posts in the thread, including the OP and any omitted posts.
        Type: list of py8chan.Post

    url
        URL of the thread, not including semantic slug.
        Type: string

    Undefined Attributes (Not implemented in 8chan API. Do not use.):

    replies and images
        Infuriatingly, the OP post in a thread doesn’t list how many replies there are in a thread.

    semantic_url
        URL of this post, with the thread’s ‘semantic’ component.
        Type: string

    semantic_slug
        This post’s ‘semantic slug’.
        Type: string
Methods¶
Thread objects are not instantiated directly, but instead through the appropriate py8chan.Board methods such as py8chan.Board.get_thread().
py8chan.Post – 8chan Post¶
py8chan.Post allows for standard access to an 8chan post.
Example¶
Here is a sample application that grabs and prints py8chan.Thread and py8chan.Post information:
# credits to Anorov for improved example
from __future__ import print_function
import py8chan
# get the board we want
board = py8chan.Board('v')
# select the first thread on the board
all_thread_ids = board.get_all_thread_ids()
first_thread_id = all_thread_ids[0]
thread = board.get_thread(first_thread_id)
# print thread information
print(thread)
print('Sticky?', thread.sticky)
print('Closed?', thread.closed)
print('Replies:', len(thread.posts) - 1)  # thread.posts includes the OP; thread.replies is listed as undefined on 8chan
# print topic post information
topic = thread.topic
print('Topic Repr', topic)
print('Postnumber', topic.post_number)
print('Timestamp', topic.timestamp)
print('Datetime', repr(topic.datetime))
print('Filemd5hex', topic.file_md5_hex)
print('Fileurl', topic.file_url)
print('Subject', topic.subject)
print('Comment', topic.comment)
print('Thumbnailurl', topic.thumbnail_url)
Basic Usage¶
class py8chan.Post(thread, data)
    Represents an 8chan post.

    post_id
        ID of this post. Eg: 123123123, 456456456.
        Type: int

    poster_id
        Poster ID.
        Type: int

    name
        Poster’s name.
        Type: string

    email
        Poster’s email.
        Type: string

    tripcode
        Poster’s tripcode.
        Type: string

    subject
        Subject of this post.
        Type: string

    comment
        This comment, with the <wbr> tag removed.
        Type: string

    html_comment
        Original, direct HTML of this comment.
        Type: string

    text_comment
        Plaintext version of this comment.
        Type: string

    is_op
        Whether this is the OP (first post of the thread).
        Type: bool

    timestamp
        Unix timestamp for this post.
        Type: int

    datetime
        Datetime time of this post.
        Type: datetime.datetime

    has_file
        Whether this post has a file attached to it.
        Type: bool

    has_extra_files
        Whether this post has more than one file attached to it.
        Type: bool

    url
        URL of this post.
        Type: string

    Undefined Attributes (Not implemented in 8chan API. Do not use.):
    - poster_id (int): Poster ID.
    - file_deleted (bool): Whether the file attached to this post was deleted after being posted.
    - semantic_url (string): URL of this post, with the thread’s ‘semantic’ component.
    - semantic_slug (string): This post’s ‘semantic slug’.

Post objects are not instantiated directly, but through a py8chan.Thread object with an attribute like py8chan.Thread.all_posts.
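The relationship between the `timestamp` and `datetime` attributes can be illustrated with the standard library. This is a sketch, not the library's code: `timestamp_to_datetime` is a hypothetical helper, and using UTC is an assumption, since the documentation above does not say which timezone py8chan applies.

```python
from datetime import datetime, timezone


def timestamp_to_datetime(timestamp):
    """Convert a Unix timestamp (seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


# Arbitrary sample value, not taken from a real post
print(timestamp_to_datetime(1400000000).isoformat())
```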
py8chan.File – 8chan File¶
py8chan.File allows for standard access to an 8chan file. This provides programs with a complete File object that contains all metadata about the file, and makes migration easy if 4chan ever makes multiple files in one Post possible (as 8chan does).
Basic Usage¶
class py8chan.File(post, data)
    Represents File objects and their thumbnails.

    Constructor:
    - post (py8chan.Post) – parent Post object.
    - data (dict) – The post or extra_files dict from the 8chan API.

    file_md5
        MD5 hash of the file attached to this post.
        Type: string

    file_md5_hex
        Hex-encoded MD5 hash of the file attached to this post.
        Type: string

    filename_original
        Original name of the file attached to this post.
        Type: string

    filename
        Filename of the file attached to this post.
        Type: string

    file_url
        URL of the file attached to this post.
        Type: string

    file_extension
        Extension of the file attached to this post. Eg: png, webm, etc.
        Type: string

    file_size
        Size of the file attached to this post.
        Type: int

    file_width
        Width of the file attached to this post.
        Type: int

    file_height
        Height of the file attached to this post.
        Type: int

    thumbnail_width
        Width of the thumbnail attached to this post.
        Type: int

    thumbnail_height
        Height of the thumbnail attached to this post.
        Type: int

    thumbnail_fname
        Filename of the thumbnail attached to this post.
        Type: string

    thumbnail_url
        URL of the thumbnail attached to this post.
        Type: string
File objects are not instantiated directly, but through a py8chan.Post object with an attribute like py8chan.Post.first_file.
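The difference between `file_md5` and `file_md5_hex` is one of encoding. The sketch below assumes the API delivers the MD5 digest Base64-encoded, as the 4chan API does; that assumption is not confirmed here for 8chan, and `md5_base64_to_hex` is a hypothetical helper rather than part of py8chan.

```python
import base64
import binascii
import hashlib


def md5_base64_to_hex(md5_b64):
    """Decode a Base64 MD5 digest and return it as a hex string."""
    raw_digest = base64.b64decode(md5_b64)
    return binascii.hexlify(raw_digest).decode('ascii')


# Build a sample digest locally so the example needs no network access
sample_b64 = base64.b64encode(hashlib.md5(b'hello').digest()).decode('ascii')
print(md5_base64_to_hex(sample_b64))
```

The hex form is the one that matches the output of common command-line tools such as md5sum, which makes it the convenient choice for verifying downloaded files.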