Programming Guide
API Stability
In general, any method or class prefixed with an underscore (like _method or _ClassName) is private, and the API may change at any time. You SHOULD NOT use these. Any method in an interface class (which all begin with I, like IAnInterface) is a stable, public API and will maintain backwards-compatibility between releases.
There is one exception to this at the moment: the hidden-/onion-services APIs are NOT yet considered stable, and may still change somewhat.
Any APIs that will go away will first be deprecated for at least one major release before being removed.
There are also some attributes which don’t have underscores but really should; these will get “deprecated” via an @property decorator so your code will still work.
High Level Overview
Interacting with Tor via txtorcon should involve only calling methods of the Tor class.
You get an instance of Tor in one of two ways:
call txtorcon.connect(); or
call txtorcon.launch().
Once you’ve got a Tor instance you can use it to gain access to (or create) instances of the other interesting classes; see “A Tor Instance” below for various use-cases.
Note that for historical reasons (namely: Tor is a relatively new class) there are many other functions and classes exported from txtorcon, but you shouldn’t need to instantiate these directly. If something is missing from this top-level class, please get in touch (file a bug, chat on IRC, etc.) because it’s probably a missing feature.
A Tor Instance
You will need a connection to a Tor instance for txtorcon to control. This can be either an already-running Tor that you’re authorized to connect to, or a Tor instance that has been freshly launched by txtorcon.
We abstract “a Tor instance” behind the Tor class, which provides a very high-level API for all the other things you might want to do:
make client-type connections over Tor (see “Making Connections Over Tor”);
change its configuration (see “Tracking and Changing Tor’s Configuration”);
monitor its state (see “Monitor and Change Tor’s State”);
offer hidden-/onion-services via Tor (see “Onion (Hidden) Services”);
create and use custom circuits (see “Custom Circuits”);
issue low-level commands (see “Low-Level Protocol Classes”).
The actual control-protocol connection to Tor is abstracted behind TorControlProtocol. This can usually be ignored by most users, but it can be useful for issuing protocol commands directly, listening to raw events, etc.
In general, txtorcon tries never to look at Tor’s version and instead queries required information directly via the control-protocol (there is only one exception to this). So the names of configuration values and events may change (or, more typically, expand) depending on what version of Tor you’re connected to.
Connecting to a Running Tor
Tor can listen for control connections on TCP ports or UNIX sockets. See “Tor Configuration” for information on how to configure Tor to work with txtorcon. By default, “COOKIE” authentication is used; only if that is not available do we try password authentication.
To connect, use txtorcon.connect(), which returns a Deferred that will fire with a Tor instance. If you need access to the TorControlProtocol instance, it’s available via the .protocol property (there is always exactly one of these per Tor instance). Similarly, the current configuration is available via .get_config() (which returns a Deferred firing a TorConfig). You can change the configuration by updating attributes on this class, but the changes won’t take effect until you call TorConfig.save().
Launching a New Tor
It’s also possible to launch your own Tor instance. txtorcon keeps a “global” Tor available for use by e.g. the .global_tor endpoint factory functions (like TCPHiddenServiceEndpoint.global_tor()). You can access it via get_global_tor_instance(). There is exactly zero or one of these per Python process that uses txtorcon.
To explicitly launch your own Tor instance, use launch(). You can pass a couple of minimal options (data_directory being recommended). If you need to set other Tor options, use .config to retrieve the TorConfig instance associated with this Tor and change the configuration afterwards.
Setting data_directory gives your Tor instance a place to cache its state information, which includes the current “consensus” document. If you don’t set it, txtorcon creates a temporary directory (which is deleted when this Tor instance exits). Startup time is dramatically improved if Tor already has a recent consensus, so when integrating with Tor by launching your own client it’s highly recommended to specify a data_directory somewhere sensible (e.g. ~/.config/your_program_name/ is a popular choice on Linux). See the Tor manual under the DataDirectory option for more information.
Tor itself will create a missing data_directory with the correct permissions, and Tor will also chdir into its DataDirectory when running. For these reasons, txtorcon doesn’t try to create the data_directory nor does it do any chdir-ing, and neither should you.
A Note On Style
Most of txtorcon tends towards “attribute-style access”. The guiding principle is that “mere data” that is immediately available will be an attribute, whereas things that “take work” or are async (and thus return Deferreds) will be methods. For example, Router.get_location() is a method because it potentially has to ask Tor for the country, whereas Router.id_hex is a plain attribute because it’s always available.
Tracking and Changing Tor’s Configuration
Instances of the TorConfig class represent the current, live state of a running Tor. There is a bit of attribute-magic to make it possible to simply get and set things easily:
# inside an @inlineCallbacks function; launch() returns a Deferred
tor = yield txtorcon.launch(reactor)
print("SOCKS ports: {}".format(tor.config.SOCKSPort))
tor.config.ControlPort.append(4321)
yield tor.config.save()
Only when .save() is called are any SETCONF commands issued, and then all configuration values are sent in a single command. All TorConfig instances subscribe to configuration updates from Tor, so “live state” includes actions by any other controllers that may be connected.
For some configuration items, the order they’re sent to Tor matters. Sometimes, if you change one config item, you have to set a series of related items. TorConfig handles these cases for you: you just manipulate the configuration and wait for the Deferred returned by .save() to fire, at which point the running Tor’s configuration has been updated.
Note there is a tiny window during which the state may appear slightly inconsistent if you have multiple TorConfig instances: after Tor has acknowledged a SETCONF command, but before a separate TorConfig instance has received all the CONF_CHANGED events (because they’re held up in the networking stack for some reason). This shouldn’t concern most users. (I’m not even 100% sure this is possible; it may be that Tor doesn’t send the OK until after all the CONF_CHANGED events.) In normal use, there should only be a single TorConfig instance for every Tor instance, so this shouldn’t affect you unless you’ve created your own TorConfig.
Since TorConfig conforms to the Iterator protocol, you can easily find all the config-options that Tor supports:
# inside an @inlineCallbacks function
tor = yield txtorcon.launch(reactor)
for config_key in tor.config:
    print("{} has value: {}".format(config_key, getattr(tor.config, config_key)))
These come from interrogating Tor using GETINFO config/names and so represent the configuration options of the currently connected Tor process. If the value “isn’t set” (i.e. is the default), the value from Tor will be .DEFAULT_VALUE.
When you set values into TorConfig, they are parsed according to the control-spec for the different types given to the values, via information from GETINFO config/names. So, for example, setting .SOCKSPort to "quux" won’t work. Of course, the whole SETCONF command would also fail if txtorcon happens to allow some values that Tor doesn’t. Unfortunately, for any item that’s a list, Tor doesn’t tell us anything about each element, so they’re all strings. This means we can’t pre-validate them, and so some things may not fail until you call .save().
Monitor and Change Tor’s State
Instances of TorState represent a live, interactive version of all the relays/routers (Router instances), all circuits (Circuit instances) and streams (Stream instances) active in the underlying Tor instance.
As the TorState instance has subscribed to various events from Tor, the “live” state represents an “as up-to-date as possible” view. This includes the actions of all other controllers, Tor Browser, etcetera that might be interacting with your Tor client.
A Tor instance doesn’t have a TorState instance by default (it can take a few hundred milliseconds to set up), so one is created via the asynchronous method Tor.create_state().
Note
If you need to be absolutely sure there’s nothing stuck in networking buffers and that your instance is “definitely up-to-date”, you can issue a do-nothing command to Tor via TorControlProtocol.queue_command() (e.g. yield queue_command("GETINFO version")). Most users shouldn’t have to worry about this edge-case. In any case, there could be a new update that Tor decides to issue at any moment.
You can modify the state of Tor in a few simple ways. For example, you can call Stream.close() or Circuit.close() to cause a stream or circuit to be closed. You can wait for a circuit to become usable with Circuit.when_built().
For a lot of the read-only state, you can simply access interesting attributes. The relays a circuit traverses are in Circuit.path (a list of Router instances), Circuit.streams contains a list of Stream instances, and .state and .purpose are strings. .time_created returns a datetime instance. There are also some convenience functions like Circuit.age().
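A small, hypothetical helper that only reads the attributes above. It assumes state is a TorState (for example from Tor.create_state()) and that state.circuits maps circuit IDs to Circuit instances; since it touches only “mere data”, no Deferreds are involved.

```python
def describe_circuits(state):
    """Return one summary line per circuit (and per stream) in a TorState."""
    lines = []
    # Copy the values: the live dict can change while we're iterating.
    for circuit in list(state.circuits.values()):
        path = " -> ".join(router.id_hex for router in circuit.path)
        lines.append("circuit {} ({}, {}) age={}s: {}".format(
            circuit.id, circuit.state, circuit.purpose, circuit.age(), path))
        for stream in circuit.streams:
            lines.append("  stream {} to {}".format(stream.id, stream.target_host))
    return lines
```

For example, print("\n".join(describe_circuits(state))) dumps a snapshot of every circuit.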
For sending streams over a particular circuit, Circuit.stream_via() returns an IStreamClientEndpoint implementation that will cause a subsequent .connect() on it to go via the given circuit in Tor. A similar method (Circuit.web_agent()) exists for Web requests.
Listening for certain events can be done by implementing the interfaces interface.IStreamListener and interface.ICircuitListener. You can request notifications on a Tor-wide basis with TorState.add_circuit_listener() or TorState.add_stream_listener(). If you are just interested in a single circuit, you can call Circuit.listen() directly on a Circuit instance.
You can instead use methods (which also function as decorators) such as TorState.on_circuit_launched() or TorState.on_stream_closed() to add listeners for single events.
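A sketch of the decorator form, assuming the callbacks receive the same arguments as the matching listener methods (ICircuitListener.circuit_launched(circuit) and IStreamListener.stream_closed(stream, **kw)); watch is a hypothetical name.

```python
def watch(state):
    """Print a line whenever a circuit launches or a stream closes."""

    # Each decorator registers the function as a listener for one event.
    @state.on_circuit_launched
    def circuit_launched(circuit):
        print("circuit {} launched".format(circuit.id))

    @state.on_stream_closed
    def stream_closed(stream, **kw):
        # Closing events may carry extra keyword arguments (e.g. a reason).
        print("stream {} closed".format(stream.id))
```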
The Tor relays are abstracted with Router instances. Again, these have read-only attributes for interesting information, e.g.: id_hex, ip, flags (a list of strings), bandwidth, policy, etc. Note that all information in these objects is from “microdescriptors”. If you’re doing a long-running iteration over relays, it may be important to remember that the collection of routers can change every hour (when a new “consensus” from the Directory Authorities is published), which may change the underlying collection (e.g. TorState.routers_by_hash) over which you’re iterating.
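One way to guard against the router collection changing mid-iteration is to iterate over a snapshot. This hypothetical helper copies routers_by_hash before filtering, and compares flags case-insensitively because we’re not assuming a particular capitalization.

```python
def snapshot_exits(state):
    """Return exit-flagged relays from a point-in-time copy of the router table."""
    # Copy first: a new consensus can replace entries in routers_by_hash
    # while a long-running loop (e.g. one that yields per relay) runs.
    routers = list(state.routers_by_hash.values())
    return [
        router for router in routers
        if "exit" in {flag.lower() for flag in router.flags}
    ]
```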
Here’s a simple sketch that traverses all circuits, printing their router IDs and closing each stream and circuit afterwards:
from twisted.internet.task import react
from twisted.internet.defer import inlineCallbacks
from twisted.internet.endpoints import UNIXClientEndpoint
import txtorcon


@react
@inlineCallbacks
def main(reactor):
    """
    Close all open streams and circuits in the Tor we connect to
    """
    # Assumes Tor's control socket lives at /var/run/tor/control
    control_ep = UNIXClientEndpoint(reactor, '/var/run/tor/control')
    tor = yield txtorcon.connect(reactor, control_ep)
    state = yield tor.create_state()
    print("Closing all circuits:")
    # Iterate over copies: closing things mutates the live collections
    for circuit in list(state.circuits.values()):
        path = '->'.join(map(lambda r: r.id_hex, circuit.path))
        print("Circuit {} through {}".format(circuit.id, path))
        for stream in list(circuit.streams):
            print("  Stream {} to {}".format(stream.id, stream.target_host))
            yield stream.close()
            print("  closed")
        yield circuit.close()
        print("closed")
    yield tor.quit()
Making Connections Over Tor
SOCKS5
Tor exposes a SOCKS5 interface to make client-type connections over the network. There are also a couple of custom extensions Tor provides to do DNS resolution over a Tor circuit (txtorcon supports these, too).
All client-side interactions are via instances that implement IStreamClientEndpoint. There are several factory functions used to create suitable instances.
The recommended API is to acquire a Tor instance (see “A Tor Instance”) and then call Tor.create_client_endpoint(). To do DNS lookups (or reverse lookups) via a Tor circuit, use Tor.dns_resolve() and Tor.dns_resolve_ptr().
A common use-case is to download a Web resource; you can do so via Twisted’s built-in twisted.web.client package, or using the friendlier treq library. In both cases, you need a twisted.web.client.Agent instance, which you can acquire with Tor.web_agent() or Circuit.web_agent(). The latter is used to make the request over a specific circuit. Usually, txtorcon will simply use one of the available SOCKS ports configured in the Tor it is connected to; if you care which one, you can specify it as the optional _socks_endpoint= argument (this starts with an underscore on purpose, as it’s not recommended for “public” use and its semantics might change in the future).
Note
Tor supports SOCKS over Unix sockets. So does txtorcon. To take advantage of this, simply pass a valid SocksPort value for Unix sockets (e.g. unix:/tmp/foo/socks) as the _socks_endpoint argument to either web_agent() call. If this doesn’t already exist in the underlying Tor, it will be added. Tor has particular requirements for the directory in which the socket file lives (it must have mode 0700). We don’t have a way (yet?) to auto-discover whether the Tor we’re connected to supports Unix sockets, so the default is to use TCP.
You can also use Twisted’s clientFromString API, as txtorcon registers a tor: plugin. This also implies that any Twisted-using program that supports configuring endpoint strings gets Tor support “for free”. For example, passing a string like tor:fjblvrw2jrxnhtg67qpbzi45r7ofojaoo3orzykesly2j3c2m3htapid.onion:80 to clientFromString will return an endpoint that will connect to txtorcon’s onion-service website. Note that these endpoints will use the “global to txtorcon” Tor instance (available from get_global_tor()). Thus, if you want to control which Tor instance your circuit goes over, this is not a suitable API.
There are also lower-level APIs to create TorClientEndpoint instances directly if you have a TorConfig instance. These very APIs are used by the Tor object mentioned above. If you have a use-case that requires using this API, I’d be curious to learn why the Tor methods are unsuitable (as those are the suggested API).
You should expect these APIs to raise SOCKS5 errors, which can all be handled by catching the socks.SocksError class. If you need to work with each specific error (corresponding to the RFC-specified SOCKS5 replies), see the “txtorcon.socks Module” for a list of them.
Custom Circuits
Tor provides a way to let controllers like txtorcon decide which streams go on which circuits. Since your Tor client will then be acting differently from a “normal” Tor client, it may become easier to de-anonymize you.
High Level
With that in mind, you may still decide to attach streams to circuits. Most often, this means you simply want to make a client connection over a particular circuit. The recommended API uses Circuit.stream_via() for arbitrary protocols or Circuit.web_agent() as a convenience for Web connections. The latter can be used via Twisted’s Web client or via treq (a “requests”-like library for Twisted).
Note that these APIs mimic Tor.stream_via() and Tor.web_agent(), except they use a particular Circuit.
Low Level
Under the hood of these calls, txtorcon provides a low-level interface directly over top of Tor’s circuit-attachment API.
This works by:
setting __LeaveStreamsUnattached 1 in Tor’s configuration;
listening for STREAM events;
telling Tor (via ATTACHSTREAM) what circuit to put each new stream on (we can also choose to tell Tor “attach this one however you normally would”).
This is an asynchronous API (i.e. Tor isn’t “asking us” for each stream), so arbitrary work can be done on a per-stream basis before telling Tor which circuit to use. There are two limitations, though:
Tor doesn’t play nicely with multiple controllers playing the role of attaching streams. Generally, there’s no good way to know if there’s another controller trying to attach streams, but basically the first one to answer “wins”.
Tor doesn’t currently allow controllers to attach streams destined for onion-services (even if the circuit is actually suitable and goes to the correct Introduction Point).
In order to do custom stream -> circuit mapping, you call TorState.set_attacher() with an object implementing interface.IStreamAttacher. Then every time a new stream is detected, txtorcon will call interface.IStreamAttacher.attach_stream() with the Stream instance and a list of all available circuits. Your attach_stream() returns the Circuit to use, or None to let Tor attach the stream however it normally would.
There can be either no attacher at all or a single attacher object. You can “un-set” an attacher by calling set_attacher(None) (in which case __LeaveStreamsUnattached will be set back to 0).
If you really do need multiple attachers, you can use the utility class attacher.PriorityAttacher, which acts as the “top level” one (so you add your multiple attachers to it).
Be aware that txtorcon internally uses this API itself if you’ve ever called the “high level” API (Circuit.stream_via() or Circuit.web_agent()), and so it is an error to set a new attacher if there is already an existing attacher.
Building Your Own Circuits
To reiterate the warning above, building your own circuits differently from how Tor normally does runs a high risk of de-anonymizing you. That said, you can build custom circuits using txtorcon.
Building a Single Circuit
If your use-case needs just a single circuit, it is probably easiest to call TorState.build_circuit(). This method takes a list of Router instances, which you can get from the TorState instance by using one of the attributes:
.all_routers
.routers
.routers_by_name
.routers_by_hash
The last three are all dicts. For relays that have the Guard flag, you can access the dicts .guards (for all of them) or .entry_guards (for just the entry guards configured on this Tor client).
If you don’t actually care which relays are used, but simply want a fresh circuit, you can call TorState.build_circuit() without any arguments at all, which asks Tor to build a new circuit in the way it normally would (i.e. respecting your guard nodes etc.).
There is also build_timeout_circuit() as a convenience method if you wish the attempt to time out after a while.
Building Many Circuits
Caution
This API doesn’t exist yet; this is documenting what may become a new API in a future version of txtorcon. Please get in touch if you want this now.
If you would like to build many circuits, you’ll want an instance that implements txtorcon.ICircuitBuilder (which is usually simply an instance of CircuitBuilder). Instances of this class can be created by calling one of the factory functions like circuit_builder_fixed_exit().
XXX what about a “config object” idea, e.g. could have keys:
guard_selection: one of entry_only (use one of the current entry guards) or random_guard (use any relay with the Guard flag, selected by XXX).
middle_selection: one of uniform (selected randomly from all relays) or weighted (selected randomly, but weighted by consensus weight; basically the same way Tor would select).