pyPodcastParser

Introduction

pyPodcastParser is a podcast parser. It should parse any RSS file, but it specializes in parsing podcast RSS feeds. pyPodcastParser is agnostic about the method you use to get a podcast RSS feed. Most users will be most comfortable with the Requests library.

Installation

pip install pyPodcastParser

Usage

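import requests
from pyPodcastParser.Podcast import Podcast

# Fetch the raw feed; any HTTP client works, Requests is used here.
response = requests.get('https://some_rss_feed')
# Pass the raw response body to the parser.
podcast = Podcast(response.content)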
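
Because the parser only needs the feed content, a feed saved to disk should work just as well as one fetched over HTTP. The following is a minimal sketch; `read_feed_bytes` and the file name `feed.xml` are hypothetical and not part of pyPodcastParser.

```python
from pathlib import Path


def read_feed_bytes(path):
    """Return the raw bytes of a feed file saved on disk."""
    return Path(path).read_bytes()


# Hypothetical usage (requires pyPodcastParser to be installed):
#   from pyPodcastParser.Podcast import Podcast
#   podcast = Podcast(read_feed_bytes("feed.xml"))
```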

Objects and their Useful Attributes

Notes:

  • All attributes with an empty or nonexistent element will have a value of None.
  • Attributes are generally strings or lists of strings, because we want to record the literal value of elements.
  • The cloud element, aka RSS Cloud, is not supported, as it has been superseded by the superior PubSubHubbub protocol.
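
Since values are kept as literal strings and missing elements become None, numeric fields such as image_width or ttl need a conversion step before arithmetic. One possible helper is sketched below; `to_int` is a hypothetical name, not something the library provides.

```python
def to_int(value, default=None):
    """Convert a literal attribute value such as '144' to an int.

    Returns `default` when the value is None (missing element) or is
    not a plain integer string.
    """
    if value is None:
        return default
    try:
        return int(str(value).strip())
    except ValueError:
        return default


# Hypothetical usage with a parsed feed:
#   width = to_int(podcast.image_width)
```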

Podcast

  • categories (list): A list of strings representing the feed categories
  • copyright (string): The feed’s copyright
  • creative_commons (string): The feed’s creative commons license
  • items (list): A list of Item objects
  • description (string): The feed’s description
  • generator (string): The feed’s generator
  • image_title (string): Feed image title
  • image_url (string): Feed image url
  • image_link (string): Feed image link to homepage
  • image_width (string): Feed image width
  • image_height (string): Feed image height
  • itunes_author_name (string): The podcast’s author name for iTunes
  • itunes_block (boolean): Whether the podcast is blocked from iTunes
  • itunes_categories (list): List of strings of itunes categories
  • itunes_complete (string): Is this podcast done and complete
  • itunes_explicit (string): Whether this feed is explicit. Should only be “yes” or “clean”.
  • itune_image (string): URL to itunes image
  • itunes_keywords (list): List of strings of itunes keywords
  • itunes_new_feed_url (string): The new url of this podcast
  • language (string): Language of feed
  • last_build_date (string): Last build date of this feed
  • link (string): URL to homepage
  • managing_editor (string): managing editor of feed
  • published_date (string): Date feed was published
  • pubsubhubbub (string): The URL of the pubsubhubbub service for this feed
  • owner_name (string): Name of feed owner
  • owner_email (string): Email of feed owner
  • subtitle (string): The feed subtitle
  • title (string): The feed title
  • ttl (string): The time to live or number of minutes to cache feed
  • web_master (string): The feed’s webmaster
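
As an illustration of working with these string attributes, the ttl value can be turned into a cache expiry time. This is only a sketch: `cache_expiry` and the 60-minute fallback for a missing ttl are assumptions, not behavior defined by pyPodcastParser.

```python
from datetime import datetime, timedelta, timezone


def cache_expiry(fetched_at, ttl, default_minutes=60):
    """Return the time after which a cached copy of the feed is stale.

    `ttl` is the literal string from the feed (or None when absent).
    `default_minutes` is an assumed fallback, not defined by the parser.
    """
    try:
        minutes = int(ttl)
    except (TypeError, ValueError):
        minutes = default_minutes
    return fetched_at + timedelta(minutes=minutes)


fetched = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
print(cache_expiry(fetched, "30"))  # 30 minutes after the fetch time
```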

Item

  • author (string): The author of the item
  • comments (string): URL of comments
  • creative_commons (string): creative commons license for this item
  • description (string): Description of the item.
  • enclosure_url (string): URL of enclosure
  • enclosure_type (string): File MIME type
  • enclosure_length (integer): File size in bytes
  • guid (string): globally unique identifier
  • itunes_author_name (string): Author name given to iTunes
  • itunes_block (boolean): Whether this item is blocked from iTunes
  • itunes_closed_captioned (string): Whether this item has closed captions
  • itunes_duration (string): Duration of enclosure
  • itunes_explicit (string): Whether this item is explicit. Should only be “yes” or “clean”.
  • itune_image (string): URL of item cover art
  • itunes_order (string): Override published_date order
  • itunes_subtitle (string): The item subtitle
  • itunes_summary (string): The summary of the item
  • link (string): The URL of item.
  • published_date (string): Date item was published
  • title (string): The title of item.
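
Because itunes_duration is stored as the literal string from the feed, converting it to seconds is left to the caller. The sketch below assumes the common iTunes forms HH:MM:SS, MM:SS, or a plain number of seconds; `duration_to_seconds` is a hypothetical helper, not part of the library.

```python
def duration_to_seconds(duration):
    """Convert 'HH:MM:SS', 'MM:SS' or 'SS' to a number of seconds.

    Returns None for a missing (None) or unrecognized value.
    """
    if duration is None:
        return None
    parts = duration.strip().split(":")
    if not 1 <= len(parts) <= 3:
        return None
    try:
        numbers = [int(part) for part in parts]
    except ValueError:
        return None
    seconds = 0
    for number in numbers:
        # Each colon-separated field is worth 60 times the next one.
        seconds = seconds * 60 + number
    return seconds


# Hypothetical usage with a parsed feed:
#   for item in podcast.items:
#       print(item.title, duration_to_seconds(item.itunes_duration))
```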

Credits

Testing

Build status (Travis CI): https://travis-ci.org/jrigden/pyPodcastParser.svg?branch=master

Test coverage (Coveralls): https://coveralls.io/repos/github/jrigden/pyPodcastParser/badge.svg?branch=master

License

The MIT License (MIT) Copyright (c) 2016 Jason Rigden

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

API

These are the details:

class pyPodcastParser.Podcast.Podcast(feed_content)

Parses an XML RSS feed

RSS Specs http://cyber.law.harvard.edu/rss/rss.html

More RSS Specs http://www.rssboard.org/rss-specification

iTunes Podcast Specs http://www.apple.com/itunes/podcasts/specs.html

The cloud element, aka RSS Cloud, is not supported, as it has been superseded by the superior PubSubHubbub protocol.

Parameters: feed_content (str) – An RSS string

Note

All attributes with an empty or nonexistent element will have a value of None.

Attributes are generally strings or lists of strings, because we want to record the literal value of elements.

feed_content

str

The actual xml of the feed

soup

bs4.BeautifulSoup

A soup of the xml with items and image removed

image_soup

bs4.BeautifulSoup

soup of image

full_soup

bs4.BeautifulSoup

A soup of the xml with items

categories

list

List of strings representing the feed categories

copyright

str

The feed’s copyright

creative_commons

str

The feed’s creative commons license

items

list

A list of Item objects

description

str

The feed’s description

generator

str

The feed’s generator

image_title

str

Feed image title

image_url

str

Feed image url

image_link

str

Feed image link to homepage

image_width

str

Feed image width

image_height

str

Feed image height

itunes_author_name

str

The podcast’s author name for iTunes

itunes_block

bool

Whether the podcast is blocked from iTunes

itunes_categories

list

List of strings of itunes categories

itunes_complete

str

Is this podcast done and complete

itunes_explicit

str

Whether this feed is explicit. Should only be “yes” or “clean”.

itune_image

str

URL to itunes image

itunes_keywords

list

List of strings of itunes keywords

itunes_new_feed_url

str

The new url of this podcast

language

str

Language of feed

last_build_date

str

Last build date of this feed

link

str

URL to homepage

managing_editor

str

managing editor of feed

published_date

str

Date feed was published

pubsubhubbub

str

The URL of the pubsubhubbub service for this feed

owner_name

str

Name of feed owner

owner_email

str

Email of feed owner

subtitle

str

The feed subtitle

title

str

The feed title

ttl

str

The time to live or number of minutes to cache feed

web_master

str

The feed’s webmaster

is_valid_rss

bool

Is this a valid RSS Feed

is_valid_podcast

bool

Is this a valid Podcast
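
One way to use the two validity flags is to check them before touching any other attribute. The sketch below relies only on the documented is_valid_rss and is_valid_podcast attributes; `require_valid_podcast` is a hypothetical helper, and raising ValueError is an assumed design choice.

```python
def require_valid_podcast(podcast):
    """Return `podcast` unchanged, or raise ValueError if a flag is falsy."""
    if not podcast.is_valid_rss:
        raise ValueError("feed is not valid RSS")
    if not podcast.is_valid_podcast:
        raise ValueError("feed is valid RSS but not a valid podcast")
    return podcast


# Hypothetical usage with a parsed feed:
#   podcast = require_valid_podcast(Podcast(feed_content))
```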

date_time

datetime

When published

count_items()

Counts Items in full_soup and soup. For debugging

set_categories()

Parses and sets feed categories

set_copyright()

Parses copyright and sets value

set_creative_commons()

Parses creative commons for item and sets value

set_description()

Parses description and sets value

set_extended_elements()

Parses and sets non required elements

set_full_soup()

Sets soup and keeps items

set_generator()

Parses feed generator and sets value

set_image()

Parses image element and sets values

set_is_valid_rss()

Checks whether this is actually a valid RSS feed

set_itune_image()

Parses itunes images and sets url as value

set_itunes()

Sets elements related to itunes

set_itunes_author_name()

Parses author name from itunes tags and sets value

set_itunes_block()

Checks whether the podcast is blocked from iTunes and sets value

set_itunes_categories()

Parses and sets itunes categories

set_itunes_complete()

Parses complete from itunes tags and sets value

set_itunes_explicit()

Parses explicit from itunes tags and sets value

set_itunes_keywords()

Parses itunes keywords and sets value

set_itunes_new_feed_url()

Parses new feed url from itunes tags and sets value

set_language()

Parses feed language and sets value

set_last_build_date()

Parses last build date and sets value

set_link()

Parses link to homepage and sets value

set_managing_editor()

Parses managing editor and sets value

set_optional_elements()

Sets elements considered optional by RSS spec

set_owner()

Parses owner name and email then sets value

set_published_date()

Parses published date and sets value

set_pubsubhubbub()

Parses pubsubhubbub and sets value

set_required_elements()

Sets elements required by RSS spec

set_soup()

Sets soup and strips items

set_subtitle()

Parses subtitle and sets value

set_summary()

Parses summary and sets value

set_title()

Parses title and sets value

set_ttl()

Parses ttl and sets value

set_web_master()

Parses the feed’s webmaster and sets value

class pyPodcastParser.Item.Item(soup)

Parses a single item of an XML RSS feed

RSS Specs http://cyber.law.harvard.edu/rss/rss.html

iTunes Podcast Specs http://www.apple.com/itunes/podcasts/specs.html

Parameters: soup (bs4.BeautifulSoup) – BeautifulSoup object representing an RSS item

Note

All attributes with an empty or nonexistent element will have a value of None.

author

str

The author of the item

comments

str

URL of comments

creative_commons

str

creative commons license for this item

description

str

Description of the item.

enclosure_url

str

URL of enclosure

enclosure_type

str

File MIME type

enclosure_length

int

File size in bytes

guid

str

globally unique identifier

itunes_author_name

str

Author name given to iTunes

itunes_block

bool

Whether this item is blocked from iTunes

itunes_closed_captioned

str

Whether this item has closed captions

itunes_duration

str

Duration of enclosure

itunes_explicit

str

Is this item explicit. Should only be yes or clean.

itune_image

str

URL of item cover art

itunes_order

str

Override published_date order

itunes_subtitle

str

The item subtitle

itunes_summary

str

The summary of the item

link

str

The URL of item.

published_date

str

Date item was published

title

str

The title of item.

date_time

datetime

When published
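
Since date_time holds a datetime rather than a string, it can be used to order episodes. The following is a minimal sketch that relies only on the documented date_time attribute; `newest_first` is a hypothetical helper, and it assumes the datetimes are mutually comparable (all timezone-aware or all naive).

```python
def newest_first(items):
    """Return items sorted newest first; items without a date go last."""
    dated = [item for item in items if item.date_time is not None]
    undated = [item for item in items if item.date_time is None]
    dated.sort(key=lambda item: item.date_time, reverse=True)
    return dated + undated


# Hypothetical usage with a parsed feed:
#   for item in newest_first(podcast.items):
#       print(item.date_time, item.title)
```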

set_author()

Parses author and sets value.

set_categories()

Parses and sets categories

set_comments()

Parses comments and sets value.

set_creative_commons()

Parses creative commons for item and sets value

set_description()

Parses description and sets value.

set_enclosure()

Parses enclosure_url and enclosure_type, then sets values.

set_guid()

Parses guid and sets value

set_itune_image()

Parses itunes item images and sets url as value

set_itunes_author_name()

Parses author name from itunes tags and sets value

set_itunes_block()

Checks whether the item is blocked from iTunes and sets value

set_itunes_closed_captioned()

Parses isClosedCaptioned from itunes tags and sets value

set_itunes_duration()

Parses duration from itunes tags and sets value

set_itunes_element()

Set each of the itunes elements.

set_itunes_explicit()

Parses explicit from itunes item tags and sets value

set_itunes_order()

Parses episode order and sets value

set_itunes_subtitle()

Parses subtitle from itunes tags and sets value

set_itunes_summary()

Parses summary from itunes tags and sets value

set_link()

Parses link and sets value.

set_published_date()

Parses published date and sets value.

set_rss_element()

Set each of the basic rss elements.

set_title()

Parses title and sets value.