Sphinx Tutorial

Welcome to the Introduction to Sphinx & Read the Docs. This tutorial will walk you through the initial steps of writing documentation with reStructuredText and Sphinx, and deploying it to Read the Docs.

Please provide feedback to @ericholscher.


Getting Started: Overview & Installing Initial Project

Concepts

Sphinx Philosophy

Sphinx is what is called a documentation generator. This means that it takes a bunch of source files in plain text, and generates a bunch of other awesome things, mainly HTML. For our use case you can think of it as a program that takes in plain text files in reStructuredText format, and outputs HTML.

reST -> Sphinx -> HTML

So as a user of Sphinx, your main job will be writing these text files. This means that you should be minimally familiar with reStructuredText as a language. If you are already familiar with Markdown, you’ll find reStructuredText similar in a lot of ways.

Tasks

Installing Sphinx

The first step is installing Sphinx. Sphinx is a Python project, so it can be installed like any other Python library. Most operating systems ship with Python pre-installed, so you should just have to run:

pip install sphinx

Note

Advanced users can install this in a virtualenv if they wish. Also, easy_install Sphinx works fine if you don’t have pip.

Get this repo

To do this tutorial, you need the actual repository. It contains the example code that we will be documenting.

You can clone it here:

git clone https://github.com/ericholscher/pycon-sphinx-tutorial
Getting Started

Now you are ready to start creating documentation. You should have a directory called crawler, which contains source code in its src directory. Inside the crawler directory, create a docs directory and move into it:

cd crawler
mkdir docs
cd docs

Then you can create the Sphinx project skeleton in this directory:

sphinx-quickstart

Have the Project name be Crawler, put in your own Author name, and put in 1.0 as the Project version. Otherwise you can accept the default options.

My output looks like this:

-> sphinx-quickstart
Welcome to the Sphinx 1.3.1 quickstart utility.

Please enter values for the following settings (just press Enter to
accept a default value, if one is given in brackets).

Enter the root path for documentation.
> Root path for the documentation [.]:

You have two options for placing the build directory for Sphinx output.
Either, you use a directory "_build" within the root path, or you separate
"source" and "build" directories within the root path.
> Separate source and build directories (y/n) [n]: 

Inside the root directory, two more directories will be created; "_templates"
for custom HTML templates and "_static" for custom stylesheets and other static
files. You can enter another prefix (such as ".") to replace the underscore.
> Name prefix for templates and static dir [_]:

The project name will occur in several places in the built documentation.
> Project name: Crawler
> Author name(s): Eric Holscher

Sphinx has the notion of a "version" and a "release" for the
software. Each version can have multiple releases. For example, for
Python the version is something like 2.5 or 3.0, while the release is
something like 2.5.1 or 3.0a1.  If you don't need this dual structure,
just set both to the same value.
> Project version: 1.0
> Project release [1.0]:

If the documents are to be written in a language other than English,
you can select a language here by its language code. Sphinx will then
translate text that it generates into that language.

For a list of supported codes, see
http://sphinx-doc.org/config.html#confval-language.
> Project language [en]:

The file name suffix for source files. Commonly, this is either ".txt"
or ".rst".  Only files with this suffix are considered documents.
> Source file suffix [.rst]:

One document is special in that it is considered the top node of the
"contents tree", that is, it is the root of the hierarchical structure
of the documents. Normally, this is "index", but if your "index"
document is a custom template, you can also set this to another filename.
> Name of your master document (without suffix) [index]:

Sphinx can also add configuration for epub output:
> Do you want to use the epub builder (y/n) [n]:

Please indicate if you want to use one of the following Sphinx extensions:
> autodoc: automatically insert docstrings from modules (y/n) [n]:
> doctest: automatically test code snippets in doctest blocks (y/n) [n]:
> intersphinx: link between Sphinx documentation of different projects (y/n) [n]:
> todo: write "todo" entries that can be shown or hidden on build (y/n) [n]:
> coverage: checks for documentation coverage (y/n) [n]:
> pngmath: include math, rendered as PNG images (y/n) [n]:
> mathjax: include math, rendered in the browser by MathJax (y/n) [n]:
> ifconfig: conditional inclusion of content based on config values (y/n) [n]:
> viewcode: include links to the source code of documented Python objects (y/n) [n]:

A Makefile and a Windows command file can be generated for you so that you
only have to run e.g. `make html' instead of invoking sphinx-build
directly.
> Create Makefile? (y/n) [y]: 
> Create Windows command file? (y/n) [y]: 

Creating file ./conf.py.
Creating file ./index.rst.
Creating file ./Makefile.
Creating file ./make.bat.

Finished: An initial directory structure has been created.

You should now populate your master file ./index.rst and create other documentation
source files. Use the Makefile to build the docs, like so:
   make builder
where "builder" is one of the supported builders, e.g. html, latex or linkcheck.

Your file system should now look similar to this:

crawler
├── src
└── docs
    ├── index.rst
    ├── conf.py
    ├── Makefile
    ├── make.bat
    ├── _build
    ├── _static
    ├── _templates

We have a top-level docs directory in the main project directory. Inside of this is:

index.rst:
This is the index file for the documentation, or what lives at /. It normally contains a Table of Contents that will link to all other pages of the documentation.
conf.py:
Allows for customization of Sphinx. You won’t need to use this too much yet, but it’s good to be familiar with this file.
Makefile & make.bat:
These are the main interface for local development, and shouldn’t be changed.
_build:
The directory that your output files go into.
_static:
The directory to include all your static files, like images.
_templates:
Allows you to override Sphinx templates to customize look and feel.
Building docs

Let’s build our docs into HTML to see how it works. Simply run:

# Inside top-level docs/ directory.
make html

This should run Sphinx in your shell, and output HTML. At the end, it should say something about the documents being ready in _build/html. You can now open them in your browser by typing:

# On OS X
open _build/html/index.html

You can also view it by running a web server in that directory:

# Inside docs/_build/html directory.
python -m SimpleHTTPServer

# For python 3
python3 -m http.server

Then open your browser to http://localhost:8000.

This should display a rendered HTML page that says Welcome to Crawler’s documentation! at the top.

Note

make html is the main way you will build HTML documentation locally. It is simply a wrapper around a more complex call to Sphinx, which you can see as the first line of output.

Custom Theme

You’ll notice your docs look a bit different than mine. That’s because mine use the Read the Docs theme, sphinx_rtd_theme.

First, you need to install the theme:

$ pip install sphinx_rtd_theme

Then you need to update a few settings in your conf.py.

import sphinx_rtd_theme

html_theme = 'sphinx_rtd_theme'

html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]

If you rebuild your documentation, you will see the new theme:

make html

Warning

Didn’t see your new theme? That’s because Sphinx is smart, and only rebuilds pages that have changed. It might have thought none of your pages changed, so it didn’t rebuild anything. Fix this by running make clean html, which will force a full rebuild.

Extra Credit

Have some extra time left? Check out these other cool things you can do with Sphinx.

Understanding conf.py

Sphinx is quite configurable, which can be a bit overwhelming. However, the conf.py file is quite well documented. You can read through it to get some ideas about what it can do.

A few of the more useful settings are:

  • project
  • html_theme
  • extensions
  • exclude_patterns

This is all well documented in the Sphinx Configuration doc.

Moving on

Now it is time to move on to Step 1: Getting started with RST.

Step 1: Getting started with RST

Now that we have our basic skeleton, let’s document the project. As you might have guessed from the name, we’ll be documenting a basic web crawler.

For this project, we’ll have the following pages:

  • Index Page
  • Support
  • Installation
  • Cookbook
  • Command Line Options
  • API

Let’s go over the concepts we’ll cover, and then we can talk more about the pages to create.

Concepts

A lot of these RST syntax examples are covered in the Sphinx reStructuredText Primer.

Sections
Title
=====

Section
-------

Subsection
~~~~~~~~~~

Every Sphinx document has multiple levels of headings. Section headers are created by underlining the section title with a punctuation character that is at least as long as the text.

They give structure to the document, which is used in navigation and in the display in all output formats.

Code Samples
You can use ``backticks`` for showing ``highlighted`` code.

If you want to make sure that text is shown in a monospaced font for code examples or concepts, use double backticks around it. In the output, it will be rendered like the highlighted words above.

Code Example Syntax
A cool bit of code::

   Some cool Code

.. code-block:: rst

   A bit of **rst** which should be *highlighted* properly.

The syntax for displaying code is ::. When it is used at the end of a sentence, Sphinx is smart and displays one : in the output, and knows there is a code example in the following indented block.

Sphinx, like Python, uses meaningful whitespace. Blocks of content are structured based on their indentation level. You can see this concept in the code-block directive above.

Table of Contents Tree
.. toctree::
   :maxdepth: 2

   install
   support

Now would be a good time to introduce the toctree. One of the main concepts in Sphinx is that it allows multiple pages to be combined into a cohesive hierarchy. The toctree directive is a fundamental part of this structure.

The above example will output a Table of Contents in the page where it occurs. The maxdepth argument tells Sphinx to include 2 levels of headers in its output. It will output the 2 top-level headers of the pages listed. This also tells Sphinx that the other pages are sub-pages of the current page, creating a “tree” structure of the pages:

index
├── install
├── support

Note

The TOC Tree is also used for generating the navigation elements inside Sphinx. It is quite important, and one of the most powerful concepts in Sphinx.

Tasks

Create Installation page

Installation documentation is really important. Anyone who is coming to the project will need to install it. For our example, we are installing a basic Python script, so it will be pretty easy.

Include the following in your install.rst, on the same level as index.rst, properly marked up:

Installation

At the command line:

easy_install crawler

Or, if you have pip installed:

pip install crawler

Note

Live Preview: Installation
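If you get stuck adding the markup, here is one possible version; a sketch only, since your heading style and exact wording may differ:

Installation
============

At the command line::

    easy_install crawler

Or, if you have pip installed::

    pip install crawler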

Create Support page

It’s always important that users can ask questions when they get stuck. There are many ways to handle this, but normal approaches are to have an IRC channel and mailing list.

Go ahead and put this in your support.rst, but add the proper RST markup:

Support

The easiest way to get help with the project is to join the #crawler
channel on Freenode.
We hang out there and you can get real-time help with your projects.
The other good way is to open an issue on Github.

The mailing list at https://groups.google.com/forum/#!forum/crawler 
is also available for support.

Freenode: irc://freenode.net
Github: http://github.com/example/crawler/issues

Note

Live Preview: Support
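Here is a sketch of one way to mark this up, using standard RST hyperlink syntax for the Freenode and Github links; treat it as a suggestion rather than the canonical answer:

Support
=======

The easiest way to get help with the project is to join the
`#crawler channel on Freenode <irc://freenode.net>`_.
We hang out there and you can get real-time help with your projects.
The other good way is to open an issue on
`Github <http://github.com/example/crawler/issues>`_.

The mailing list at https://groups.google.com/forum/#!forum/crawler
is also available for support.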

You can now open the support.html file directly, but it isn’t showing in the navigation.

Add TocTree

Now you need to tie all these files together. As we mentioned above, the Table of Contents Tree is the best way to do this. Go ahead and complete the toctree directive in your index.rst file, adding the new install and support pages.
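Once completed, the directive in your index.rst should look something like the toctree example from the Concepts section above:

.. toctree::
   :maxdepth: 2

   install
   support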

Sanity Check

Your filesystem should now look something like this:

crawler
├── src
└── docs
    ├── index.rst
    ├── support.rst
    ├── install.rst
    ├── Makefile
    ├── conf.py
Build Docs

Now that you have a few pages of content, go ahead and build your docs again:

make html

If you open up your index.html, you should see the basic structure of your docs from the included toctree directive.

Extra Credit

Have some extra time left? Check out these other cool things you can do with Sphinx.

Make a manpage

The beauty of Sphinx is that it can output in multiple formats, not just HTML. All of those formats share the same base format though, so you only have to change things in one place. So you can generate a manpage for your docs:

make man

This will place a manpage in _build/man. You can then view it with:

man _build/man/crawler.1
Create a single page document

Some people prefer one large HTML document, instead of having to look through multiple pages. This is another area where Sphinx shines. You can write your documentation in multiple files to make editing and updating easier. Then if you want to distribute a single page HTML version:

make singlehtml

This will combine all of your HTML pages into a single page. Check it out by opening it in your browser:

open _build/singlehtml/index.html

Note

You’ll notice that it included the documents in the order that your TOC Tree was defined.

Play with RST

RST takes a bit of practice to wrap your head around. Go over to http://rst.ninjs.org, which gives you a live preview of rendered RST.

Note

Use the Cheat Sheet for lots more ideas!

Looking for some ideas of what the syntax contains? The reStructuredText Primer in the Sphinx docs is a great place to start.

Moving on

Now it is time to move on to Step 2: Building References & API docs.

Step 2: Building References & API docs


Concepts

Referencing

Another important Sphinx feature is that it allows referencing across documents. This is another powerful way to tie documents together.

The simplest way to do this is to define an explicit reference object:

.. _reference-name:

Cool section
------------

Which can then be referenced with :ref::

:ref:`reference-name`

Which will then be rendered with the title of the section Cool section.

Sphinx also supports :doc:`docname` for linking to a document.
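For example, to link to the install page we created earlier, you could write::

    See :doc:`install` for more information.

This renders as a link whose text is the title of the install document.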

Semantic Descriptions and References

Sphinx also has much more powerful semantic referencing capabilities, which knows all about software development concepts.

Say you’re creating a CLI application. You can define an option for that program quite easily:

.. option:: -i <regex>, --ignore <regex>

   Ignore pages that match a specific pattern.

That can also be referenced quite simply:

:option:`-i`

Sphinx includes a large number of these semantic types through its domains, including ones for Python, C, C++, and JavaScript objects.

External References

Sphinx also includes a number of pre-defined references for external concepts, like PEPs and RFCs:

You can learn more about this at :pep:`8` or :rfc:`1984`.

You can read more about this in the Sphinx inline-markup docs.

Automatically generating this markup

Of course, Sphinx wants to make your life easy. It includes ways to automatically create these object definitions for your own code. This is called autodoc, which allows you to use syntax like this:

.. automodule:: crawler

and have it document the full Python module importable as crawler. You can also do a full range of auto functions:

.. autoclass::
.. autofunction::
.. autoexception::

Warning

The module must be importable by Sphinx when running. We’ll cover how to do this in the Tasks below.

You can read more about this in the Sphinx autodoc docs.

Tasks

Referencing Code

Let’s go ahead and add a cookbook to our documentation. Users will often come to your project to solve the same problems. Including a Cookbook or Examples section will be a great resource for this content.

In your cookbook.rst, add the following:

Cookbook

Crawl a web page

The most simple way to use our program is with no arguments.
Simply run:

python main.py -u <url>

to crawl a webpage.

Crawl a page slowly

To add a delay to your crawler,
use -d:

python main.py -d 10 -u <url>

This will wait 10 seconds between page fetches.

Crawl only your blog

You will want to use the -i flag,
which will ignore URLs matching the passed regex::

python main.py -i "^blog" -u <url>

This will only crawl pages that contain your blog URL.

Note

Live Preview: Cookbook

Remember, you will need to use :option: blocks here. This is because they are referencing a command line option for our program.
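As a sketch, the “Crawl a page slowly” recipe might be marked up like this, wrapping the flag in an :option: role:

Crawl a page slowly
-------------------

To add a delay to your crawler,
use :option:`-d`::

    python main.py -d 10 -u <url>

This will wait 10 seconds between page fetches.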

Adding Reference Targets

Now that we have pointed at our CLI options, we need to actually define them. In your cli.rst file, add the following:

Command Line Options

These flags allow you to change the behavior of Crawler.
Check out how to use them in the Cookbook.

-d <sec>, --delay <sec>

Use a delay in between page fetches so we don't overwhelm the remote server.
Value in seconds.

Default: 1 second
    
-i <regex>, --ignore <regex>

Ignore pages that match a specific pattern.

Default: None

Note

Live Preview: Command Line Options

Here you are documenting the actual options your code takes.
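As a sketch, the delay flag could be defined with the option directive covered in the Concepts section; the exact layout is up to you:

.. option:: -d <sec>, --delay <sec>

   Use a delay in between page fetches so we don't overwhelm the remote server.
   Value in seconds.

   Default: 1 second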

Try it out

Let’s go ahead and build the docs and see what happens. Do a:

make html

Here you will see that the :option: blocks magically become links to the definition. This is your first taste of Semantic Markup. With Sphinx, we are able to simply say that something is an option, and then it handles everything for us: linking between the definition and the usage.

Importing Code

Being able to define options and link to them is pretty neat. Wouldn’t it be great if we could do that with actual code too? Sphinx makes this easy; let’s take a look.

We’ll go ahead and create an api.rst that will hold our API reference:

Crawler Python API

Getting started with Crawler is easy.
The main class you need to care about is crawler.main.Crawler

crawler.main

automodule: crawler.main

Note

Live Preview: Crawler Python API

Remember, you’ll need to use the .. autoclass:: directive to pull in your source code. This will render the docstrings of your Python code nicely.
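A minimal sketch of the marked-up api.rst might look like this; the :members: option is one common way to pull in the class’s methods, though your version may differ:

Crawler Python API
==================

Getting started with Crawler is easy.
The main class you need to care about is :class:`crawler.main.Crawler`.

crawler.main
------------

.. autoclass:: crawler.main.Crawler
   :members: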

Requirements

In order for Sphinx to build your API docs, it needs to be able to import your code. This means all of the Python modules your code imports need to be available.

If you have third party dependencies, that means you have to have them installed in your Python environment. Luckily, in most cases you can instead mock these modules using autodoc_mock_imports.

In your conf.py go ahead and add:

autodoc_mock_imports = ['bs4', 'requests']

This will allow your docs to import the example code without requiring those modules be installed.

Tell Sphinx about your code

When Sphinx runs autodoc, it imports your Python code to pull out the docstrings. This means that Sphinx has to be able to see your code. We’ll need to add our source directory to sys.path in our conf.py so it can import the code.

If you open up your conf.py file, you should see something close to this on line 18:

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#sys.path.insert(0, os.path.abspath('.'))

As it notes, you need to let Sphinx know the path to your Python source. In our example it will be ../src/, so go ahead and uncomment that line and put the path in.
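Uncommented and pointed at our source directory, the line in conf.py would look something like this (add the os and sys imports at the top of the file if they aren’t already there):

import os
import sys
sys.path.insert(0, os.path.abspath('../src/'))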

Note

You should always use relative paths here. Part of the value of Sphinx is having your docs build on other people’s computers, and if you hard code local paths that won’t work!

Try it out

Now go ahead and regenerate your docs and look at the magic that happened:

make html

Your Python docstrings have been magically imported into the project.

Tie it all together

Now let’s link directly to that for users who come in to the project. Update your index.rst to look like:

Crawler Step 2 Documentation

User Guide

toctree:

install
support
cookbook

Programmer Reference

toctree:

cli
api

Note

Live Preview: Crawler Step 2 Documentation
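A sketch of the marked-up index.rst, using two toctree directives to build the two sections:

Crawler Step 2 Documentation
============================

User Guide
----------

.. toctree::
   :maxdepth: 2

   install
   support
   cookbook

Programmer Reference
--------------------

.. toctree::
   :maxdepth: 2

   cli
   api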

One last time, let’s rebuild those docs:

make html

Warning

You now have awesome documentation! :)

Now you have a beautiful documentation reference that is coming directly from your code. This means that every time you change your code, it will automatically be reflected in your documentation.

The beauty of this approach is that it allows you to keep your prose and reference documentation in the same place. It even lets you semantically reference the code from inside the docs. This is amazingly powerful and a great way to write documentation.

Extra Credit

Have some extra time left? Let’s look through the code to understand what’s happening here more.

Look through intersphinx

Intersphinx allows you to bring the power of Sphinx references to multiple projects. It lets you pull in references, and semantically link them across projects. For example, in this guide we reference the Sphinx docs a lot, so we have this intersphinx setting:

intersphinx_mapping = {
    'sphinx': ('http://sphinx-doc.org/', None),
}

Which allows us to add a prefix to references and have them resolve:

:ref:`sphinx:inline-markup`

We can also ignore the prefix, and Sphinx will fall back to intersphinx references if none exist in the current project:

:ref:`inline-markup`

You can read more about this in the intersphinx docs.

Understand the code

A lot of the magic that is happening in Importing Code above is actually in the source code.

Check out the code for crawler/main.py:

"""
Main Module
"""
import time
from optparse import OptionParser
# Python 3 compat
try:
    from urlparse import urlparse
except ImportError:
    from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

from utils import log, should_ignore


class Crawler(object):

    """
    Main Crawler object.

    Example::

        c = Crawler('http://example.com')
        c.crawl()

    :param delay: Number of seconds to wait between searches
    :param ignore: Paths to ignore

    """

    def __init__(self, url, delay, ignore):
        self.url = url
        self.delay = delay
        if ignore:
            self.ignore = ignore.split(',')
        else:
            self.ignore = []

    def get(self, url):
        """
        Get a specific URL, log its response, and return its content.

        :param url: The fully qualified URL to retrieve
        """
        response = requests.get(url)
        log(url, response.status_code)
        return response.content

    def crawl(self):
        """
        Crawl the URL set up in the crawler.

        This is the main entry point, and will block while it runs.
        """
        html = self.get(self.url)
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup.findAll('a', href=True):
            link = tag['href']
            parsed = urlparse(link)
            if parsed.scheme:
                to_get = link
            else:
                to_get = self.url + link
            if should_ignore(self.ignore, to_get):
                print('Ignoring URL: {url}'.format(url=to_get))
                continue
            self.get(to_get)
            time.sleep(self.delay)


def run_main():
    """
    A small wrapper that is used for running as a CLI Script.
    """

    parser = OptionParser()
    parser.add_option("-u", "--url", dest="url", default="http://docs.readthedocs.org/en/latest/",
                      help="URL to fetch")
    parser.add_option("-d", "--delay", dest="delay", type="int", default=1,
                      help="Delay between fetching")
    parser.add_option("-i", "--ignore", dest="ignore", default='',
                      help="Ignore a subset of URL's")

    (options, args) = parser.parse_args()

    c = Crawler(url=options.url, delay=options.delay, ignore=options.ignore)
    c.crawl()

if __name__ == '__main__':
    run_main()

As you can see, we’re heavily using RST in our docstrings. This gives us the same power as we have in Sphinx, but allows it to live within the code base.

This approach of having the docs live inside the code is great for some things. However, the power of Sphinx allows you to mix docstrings and prose documentation together. This lets you keep the amount of content in your docstrings manageable, while telling the larger story in your prose documentation.

Moving on

Could it get better? In fact, it can and it will. Let’s go on to Step 3: Keeping Documentation Up to Date.

Step 3: Keeping Documentation Up to Date

Now we have a wonderful set of documentation, so we want to make sure it stays up to date and correct.

There are two factors here:

  • The documentation is up to date with the code
  • The user is seeing the latest version of the docs

We will solve the first problem with Sphinx’s doctest module. The second problem we will solve by deploying our docs to Read the Docs.

Concepts

Testing your code

Sphinx ships with a doctest module which is quite powerful. It allows you to run tests against your code inside your docs. This means that you can verify all of the code examples work, so that your docs are always up to date with your code!

Warning

This only works for Python currently.

You can read the full Sphinx docs for doctest, but here is a basic example:

.. doctest::

        >>> 2 + 2
        4

When you run this example, Sphinx will validate that the output is what is expected.

If you need any other code to be run, but not output to the user, you can use testsetup:

.. testsetup::

        import os

        x = 4

This will then be available in the examples that you actually show your user.

Hosting docs on Read the Docs

Read the Docs (https://readthedocs.org) is an open source doc hosting site. It’s built in Django, and is free to use for open source projects. It hosts Sphinx documentation, automatically building it each time you make a commit.

Read the Docs gives you a number of additional features over hosting Sphinx yourself:

  • You can add Versions to your project for each tag & branch.
  • You can get alerts when your doc build fails.
  • You can search across the full set of docs with Elasticsearch.

We’ll be putting your docs up on Read the Docs at the end of this tutorial.

Tasks

Add doctests to our utils

The utils module inside crawler is a good candidate for testing. It has small, self-contained pieces of logic that will work great as doctests.

Open your api.rst, and update it to look like:

Crawler Python API

Getting started with Crawler is easy.
The main class you need to care about is crawler.main.Crawler

crawler.main

crawler.utils

crawler.utils.should_ignore

should_ignore(['blog/$'], 'http://ericholscher.com/blog/')
True

# This test should fail
should_ignore(['home'], 'http://ericholscher.com/blog/')
True

crawler.utils.log

log('http://ericholscher.com/blog/', 200)
OK: 200 http://ericholscher.com/blog/ 

log('http://ericholscher.com/blog/', 500)
ERR: 500 http://ericholscher.com/blog/

# This test should fail
log('http://ericholscher.com/blog/', 500)
OK: 500 http://ericholscher.com/blog/

Note

Live Preview: Crawler Python API

Now go ahead and add the RST markup that is covered above in the Concepts section.

As you can see here, we are actually testing our logic. It also acts as documentation for your users, and is included in the output of your documentation.

These doctests do double duty, acting as tests and documentation.
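As a sketch, the should_ignore examples might be marked up like this; the import path in the testsetup block assumes the sys.path setup from Step 2 makes crawler.utils importable:

.. testsetup::

   from crawler.utils import should_ignore, log

.. doctest::

   >>> should_ignore(['blog/$'], 'http://ericholscher.com/blog/')
   True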

Caveats

Note that we have to import our code in the testsetup:: block. This is so that Sphinx can call the functions properly in our doctest blocks. This is hidden in the output of the docs though, so users won’t be confused.

Note

You can also put doctest blocks directly in your docstrings. They will need to include full import paths though, as Sphinx can’t guarantee the testsetup:: directive will be called.

Test your docs

You can now go ahead and test your docs:

make doctest

Note

You will need to make sure to add sphinx.ext.doctest to your extensions list. Open up your conf.py file and make sure it’s there.
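Depending on which extensions you enabled during quickstart, the setting might end up looking something like this:

extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.doctest',
]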

It should provide output that looks similar to this:

Doctest summary
===============
    5 tests
    2 failures in tests
    0 failures in setup code
    0 failures in cleanup code
build finished with problems.

As you can see, some of the tests are broken! You should go ahead and fix the tests :)

Requirements

In order for Read the Docs to build your docs, it needs to be able to import your code. This means all of the Python modules your code imports need to be available.

You can add a requirements.txt to the top-level of your project:

beautifulsoup4
requests
Read the Docs

Last but not least, once you’ve written your documentation you have to put it somewhere for the world to see! Read the Docs makes this quite simple, and is free for all open source projects.

  • Register for an account at http://readthedocs.org
  • Click the Import Project button
  • Add the URL for a specific repository you want to build docs for
  • Sit back and have a drink while Read the Docs does the rest.

It will:

  • Pull down your code
  • Install your requirements.txt
  • Build HTML, PDF, and ePub of your docs
  • Serve it up online at http://<projectname>.readthedocs.org

Extra Credit

Have some extra time left? Let’s run the code and see if it actually works!

Explore doctests more

Sphinx’s doctest module has more interesting options. You can do things that look more like normal unit tests, as well as specific “doctest-style” testing. Go in and re-write one of the existing tests to use the testcode directive instead of the doctest directive.

Run the crawler

Go ahead and run the crawler against the Read the Docs documentation:

# in crawler/src/crawler
python main.py -u https://docs.readthedocs.org/en/latest/

If your internet connection is working, you should see your terminal start printing output.

Can you add another command line option, and document it?

Moving on

Now we are at the last part of our Tutorial. Let’s head on over to Finishing Up: Additional Extensions & Individual Exploration.

Finishing Up: Additional Extensions & Individual Exploration

If there is much time left in the session, take some time to play around and get to know Sphinx better. There is a large ecosystem of extensions, and lots of builtin features we haven’t covered.

I’m happy to consult with you about interesting challenges you might be facing with docs.

Part of being a good user of Sphinx is knowing what all is there. Here are a few options for what to look at:

  • Developing extensions for Sphinx
  • Read through all the existing extensions
  • Breathe
  • Explore the Read the Docs Admin Panel
  • Apply these docs to a project you have
  • Show a neighbor what you’ve done & talk about the concepts learned.

Also, here are a number of more thought out examples of things you might do:

Markdown Support

You can use Markdown and reStructuredText in the same Sphinx project. We support this natively on Read the Docs, and you can do it locally:

$ pip install recommonmark

Then in your conf.py:

from recommonmark.parser import CommonMarkParser

source_parsers = {
    '.md': CommonMarkParser,
}

source_suffix = ['.rst', '.md']

Note

Markdown doesn’t support a lot of the features of Sphinx, like inline markup and directives. However, it works for basic prose content.

You can now add a Markdown file with a .md extension, and Sphinx will build it into the project. You can do things like include it in your normal TOC Tree, and Sphinx will search it.

Go ahead and add a new Markdown File with an .md extension. Since we haven’t covered Markdown in this text, here is an example community.md:

# Community Standards

The Crawler community is quite large,
and with that we have a specific set of standards that we apply in our community.

All of our project spaces are covered by the [Django Community Code of Conduct](https://djangoproject.com/conduct/).

### Feedback

Any issues can be sent directly to our [project mailing list](mailto:community@crawler.com).

Add it to your toctree in your index.rst as well, and you will see it appear properly in Sphinx.

Generate i18n Files

Sphinx has support for i18n. If you do a make gettext on your project, you should get a gettext catalog for your documentation. Check for it in _build/locale.

You can then use these files to translate your documentation using most standard tools. You can read more about this in Sphinx’s Internationalization doc.

Play with Sphinx autoapi

sphinx-autoapi is a tool that I am helping develop which will make doing API docs easier. It depends on parsing, instead of importing code. This means you don’t need to change your PYTHONPATH at all, and we have a few other different design decisions.

First you need to install autoapi:

pip install sphinx-autoapi

Then add it to your Sphinx project’s conf.py:

extensions = ['autoapi.extension']

# Document Python Code
autoapi_type = 'python'
autoapi_dir = '../src'

AutoAPI will automatically add itself to the last TOCTree in your top-level index.rst.

This is needed because we will be outputting rst files into the autoapi directory. This adds it into the global TOCTree for your project, so that it appears in the menus.

Add Django Support

Have a Django project laying around? Add Sphinx documentation to it! There isn’t anything special for Django projects except for the DJANGO_SETTINGS_MODULE.

You can set it in your conf.py, similar to autodoc. Try this piece of code:

# Set this to whatever your settings file should default to.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "settings.test")

Tables

Tables can be a tricky part of a lot of lightweight markup languages. Luckily, RST has some really nice features around tables. It supports tables in a couple of easier-to-use formats, such as CSV tables and list tables.

So for example, you can manage your tables in Google Docs, then export them as CSV in your docs.

An example of a CSV table:

.. csv-table::
   :header: "Treat", "Quantity", "Description"
   :widths: 15, 10, 30

   "Albatross", 2.99, "On a stick!"
   "Crunchy Frog", 1.49, "If we took the bones out, it wouldn't be
   crunchy, now would it?"
   "Gannet Ripple", 1.99, "On a stick!"

And a rendered example:

Treat           Quantity   Description
Albatross       2.99       On a stick!
Crunchy Frog    1.49       If we took the bones out, it wouldn’t be crunchy, now would it?
Gannet Ripple   1.99       On a stick!

Go ahead and try it yourself!

Cheat Sheet

We have made a cheat sheet to help you remember the syntax for RST & Sphinx.

RST Cheat Sheet

[Image: RST cheat sheet, front (_images/cheatsheet-front-full.png)]

Sphinx Cheat Sheet

[Image: Sphinx cheat sheet, back (_images/cheatsheet-back-full.png)]
