Welcome to PyPI to 0install’s documentation!
User documentation
Installation
To install:
git clone https://github.com/timdiels/pypi-to-0install.git
cd pypi-to-0install
python3 -m venv venv && . venv/bin/activate
pip install -r requirements.txt
wget https://downloads.sf.net/project/zero-install/0install/2.3.4/0install-2.3.4.tar.bz2
tar -xaf 0install-2.3.4.tar.bz2
mv 0install-2.3.4/zeroinstall .
rm -rf 0install-2.3.4* # cleanup, optional
python3 must be Python 3.4 or newer.
Running
To run (with $repo_root set to the path of the cloned pypi-to-0install directory):
. $repo_root/venv/bin/activate
export PYTHONPATH="$repo_root"
python3 $repo_root/pypi_to_0install/main.py
Developer documentation
For full testing, set up a local mirror of PyPI and use that instead. This way you do not download all of PyPI more than once. You can set it up with bandersnatch.
For tests, set PYTHONPATH as in the regular run instructions, then run pytest.
Design
PyPI XMLRPC interface
This document clarifies some aspects of PyPI’s XMLRPC interface, which is used for the conversion.
While PyPI’s metadata is structured, little input validation is performed. E.g. some fields may be None, '' or something bogus such as UNKNOWN (see “analyzing pypi metadata”). Likewise, author_email isn’t required to be an email address.
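As a minimal sketch of coping with such values, the helper below maps the placeholder values mentioned above to None before further processing. The name clean_field and the exact set of placeholders are assumptions made for illustration; they are not taken from the project's code.

```python
# Placeholder values mentioned in the text above. The exact set is an
# assumption; extend it as more bogus values turn up.
_PLACEHOLDERS = {'', 'UNKNOWN'}

def clean_field(value):
    """Return the stripped field value, or None if it is effectively empty."""
    if value is None:
        return None
    value = str(value).strip()
    return None if value in _PLACEHOLDERS else value
```

For instance, clean_field(release_data['home_page']) would yield None for a missing home page, so later steps only need to check for None.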
The following is a non-exhaustive list of descriptions of the output of some of the interface’s commands:
release_data:
- name: the package name. This is not the canonical name. You are required to use this name when requesting info through the interface, not the canonical name.
- home_page: a URL.
- license: a string such as GPL, potentially with variations such as General Public License (and bogus values such as LICENSE.txt).
- summary: short description string
- description: long description string in reStructuredText format
- keywords: whitespace separated list of keywords as string
- classifiers: list of Trove classifiers, list of str.
- release_url: the PyPI page corresponding to this version
- package_url: the PyPI page of the latest version of this package
- docs_url: if the package has hosted its documentation at PyPI, this URL points to it. Submitting documentation to PyPI has been deprecated (in favor of Read the Docs).
- platform: do not use. It is tied to a version, but not to a download URL (release_urls), so it can’t be meaningful. E.g. for numpy it returns Windows while numpy is supported on Linux as well.
- stable_version: always an empty string, useless.
- requires, requires_dist, provides, provides_dist: these appear not to be returned, or are always empty.
release_urls:
- packagetype: meaning of the most common values:
  - sdist: source distribution
  - bdist_wheel: Python wheel
  - bdist_egg: Python egg, can be converted to a wheel
  - bdist_wininst: ... bdist --format=wininst output, a self-extracting ZIP for Windows; it can also be converted to a wheel
- python_version: meaning unknown. Examples: source, any, py3, ...
- url: the download URL
- filename: file name of the download. For a wheel, this follows the wheel file name convention. Eggs also follow a file name convention. Metadata such as which platform the download is for is missing; instead, one has to derive it from the filename or download and inspect the binary.
Conversion: General
This details how PyPI packages are converted to Zero Install (ZI) feeds. Parts specific to the packagetype (sdist, wheel, ...) are detailed in the other conversion pages.
I will use shorthands such as release_data['summary'] throughout the text (instead of release_data(...)['summary']) to refer to the PyPI XMLRPC interface.
We will refer to a PyPI project as a package (e.g. numpy; this follows PyPI’s terminology) and its downloads as distributions (e.g. an sdist/wheel of numpy).
Overview
This pseudo-feed gives an overview of the conversion (end tags omitted):
<interface>
<name>{canonical_name(release_data['name'])}
<summary>{release_data['summary']}
<homepage>{release_data['home_page']}
<description>{pandoc(release_data['description'], from=rst, to=txt)}
<category type={uri_to_trove_namespace}>{release_data['classifiers'][i]}
...
<needs-terminal/> iff ``Environment :: Console`` in classifiers
<implementation
id={release_urls['path']}
version={converted_version}
released={format(release_urls['upload_time'], 'YYYY-MM-DD')}
stability={stability}
langs={langs}
license={license}
...
>
<requires interface='https://pypi_to_zi_feeds.github.io/...' importance='{importance}' />
...
Where:
import re

def canonical_name(pypi_name):
    return re.sub(r"[-_.]+", "-", pypi_name).lower()
Here, release_data refers to the release data of the newest release/version of the package.
The description is converted from reST to plain text.
Categories are Trove classifiers.
TODO What’s the format of the xml file describing the categories? Need more info before I can convert Trove database into what’s expected by ZI (or find something existing).
For the meaning of {converted_version}, see the Version conversion section below.
{stability} is developer if the Python version has a .dev segment. Else, if the version contains a prerelease segment (a|b|rc), stability is testing. Otherwise, stability is stable.
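The stability rule can be sketched as follows. This is an illustrative sketch, not the project's implementation; it assumes the input is an already normalised Python version string (as described in the Version conversion section), and the function name stability is hypothetical.

```python
import re

# In a normalised version, a prerelease segment directly follows the release
# digits, e.g. "1.0rc1" or "2.3a4".
_PRERELEASE = re.compile(r'\d(a|b|rc)\d+')

def stability(normalised_version):
    """Map a normalised Python version string to a ZI stability level."""
    if '.dev' in normalised_version:
        return 'developer'
    if _PRERELEASE.search(normalised_version):
        return 'testing'
    return 'stable'
```

Note that the .dev check comes first, so a version such as 1.0a1.dev2 is rated developer rather than testing.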
{langs} is derived from Natural Language :: classifiers.
{license} is a Trove classifier. If a License :: classifier is present in classifiers, it is used. If there are multiple, pick one in a deterministic fashion. If there is none, try to derive it from release_data['license']. If that is missing or its value is not understood, try to derive it from a LICENSE.txt. If there is no such file, omit the license attribute.
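The first step of the license rule could look like the sketch below. Picking the alphabetically first classifier is only one possible deterministic choice and is an assumption here, as is the function name; the fallbacks to release_data['license'] and LICENSE.txt are not shown.

```python
def pick_license_classifier(classifiers):
    """Return one 'License ::' Trove classifier, or None if there is none."""
    licenses = sorted(c for c in classifiers if c.startswith('License ::'))
    # sorted() makes the choice independent of the order PyPI returns.
    return licenses[0] if licenses else None
```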
For <requires ...>..., see dependency conversion below.
Additional attributes and content of each <implementation> depend on the packagetype of the corresponding release_urls entry.
Version conversion
As Python and ZI versioning schemes differ, conversion is required. Given a Python version, we convert it to a normalised Python version (via packaging.version.parse), which gives us:
{epoch}!{release}[{prerelease_type}{prerelease_number}][.post{post_number}][.dev{dev_number}]
Where:
- [] denotes an optional part
- release := N(.N)*, with N an integer
- prerelease_type := a|b|rc
- epoch, prerelease_number, post_number, dev_number are non-negative numbers
This is converted to the ZI version:
{epoch}-{stripped_release}-{modifiers}
Where:
- stripped_release is release with trailing .0 components trimmed off. This is necessary due to 1 < 1.0 in ZI, while 1 == 1.0 in Python.
- modifiers is a list of up to 3 modifiers where prerelease, post and dev segments are considered modifiers. Modifiers are joined by -, e.g. {modifiers[0]}-{modifiers[1]}. A modifier is formatted as {type}.{number} where:
  - type is a number derived from this mapping: types = {'dev': 0, 'a': 1, 'b': 2, 'rc': 3, 'post': 5}
  - number is one of prerelease_number, post_number, dev_number, depending on the modifier type.
  When a version has fewer than the maximum number of modifiers, i.e. fewer than 3, an empty modifier (-4) is appended to the list. This ensures correct version ordering.
Some examples of modifier conversion:
a10.post20.dev30 -> 1.10-5.20-0.30
b10.dev30 -> 2.10-0.30-4
post20.dev30 -> 5.20-0.30-4
dev30 -> 0.30-4
rc10 -> 3.10-4
For examples of the whole conversion, see test_convert_version.
This conversion does not change version ordering.
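The rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the project's code: it parses an already normalised version with a regular expression instead of using packaging, the name convert_version is hypothetical, and treating a missing epoch as 0 is an assumption.

```python
import re

# Modifier types from the mapping above; 4 is the "empty modifier".
TYPES = {'dev': 0, 'a': 1, 'b': 2, 'rc': 3, 'post': 5}
EMPTY_MODIFIER = '4'

# {epoch}!{release}[{prerelease}][.post{N}][.dev{N}]
_NORMALISED = re.compile(
    r'^(?:(?P<epoch>\d+)!)?'
    r'(?P<release>\d+(?:\.\d+)*)'
    r'(?:(?P<pre_type>a|b|rc)(?P<pre_number>\d+))?'
    r'(?:\.post(?P<post_number>\d+))?'
    r'(?:\.dev(?P<dev_number>\d+))?$'
)

def convert_version(normalised):
    """Convert a normalised Python version string to a ZI version string."""
    match = _NORMALISED.match(normalised)
    if match is None:
        raise ValueError('not a normalised version: {!r}'.format(normalised))
    epoch = match.group('epoch') or '0'  # assumption: missing epoch means 0

    # Trim trailing .0 components, as 1 < 1.0 in ZI.
    release = match.group('release').split('.')
    while len(release) > 1 and release[-1] == '0':
        release.pop()

    modifiers = []
    if match.group('pre_type'):
        modifiers.append('{}.{}'.format(
            TYPES[match.group('pre_type')], match.group('pre_number')))
    if match.group('post_number'):
        modifiers.append('{}.{}'.format(
            TYPES['post'], match.group('post_number')))
    if match.group('dev_number'):
        modifiers.append('{}.{}'.format(
            TYPES['dev'], match.group('dev_number')))
    if len(modifiers) < 3:
        modifiers.append(EMPTY_MODIFIER)

    return '-'.join([epoch, '.'.join(release)] + modifiers)
```

Under these assumptions, convert_version('2.0b10.dev30') gives 0-2-2.10-0.30-4, which matches the modifier examples above.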
Dependency conversion
Dependencies are derived from the distribution (egg_info: requires.txt and depends.txt) as this information is not available through PyPI’s metadata (e.g. release_data['requires'] is missing).
{importance} is essential if the dependency is in install_requires and recommended otherwise (extras_require).
Python packages allow for optional named groups of dependencies called extras. Further, Python dependencies can be conditional (by using environment markers). If a dependency is either conditional or appears in extras_require, it is added as a recommended dependency in the converted feed; otherwise it is added as an essential dependency. Note that Zero Install tries to select all recommended dependencies, but does not fail to select the depending interface when one of its recommended dependencies cannot be selected.
For example:
install_requires = ['dep1 ; python_version<"2.7"', 'dep2==3.*']
extras_require = {
    ':python_version<"2.7"': ['install_requires_dep'],
    'test:platform_system=="Windows"': ['pywin32'],  # only on windows
    'test': ['somepkg'],  # regardless of platform
    'special_feature': ['dep2>=3.3,<4'],  # regardless of platform
}
is converted to:
<implementation ...>
<requires interface='.../feeds/dep1.xml' importance='recommended' />
<requires interface='.../feeds/dep2.xml' importance='essential' version='{constraints}' />
<requires interface='.../feeds/install_requires_dep.xml' importance='recommended' />
<requires interface='.../feeds/pywin32.xml' importance='recommended' />
<requires interface='.../feeds/somepkg.xml' importance='recommended' />
where {constraints} are all Python version specifiers converted to a ZI version expression.
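As a rough sketch of how importance could be derived from an egg-info requires.txt, the function below assumes the setuptools layout in which unconditional install_requires entries come first and extras or conditional dependencies follow under [extra:marker] section headers. The function name is hypothetical, and version constraints are ignored here.

```python
import re

# Leading project name of a requirement line, e.g. "dep2" in "dep2>=3.3,<4".
_NAME = re.compile(r'[A-Za-z0-9][A-Za-z0-9._-]*')

def parse_requires_txt(text):
    """Return a list of (name, importance) pairs for a requires.txt body."""
    importance = 'essential'
    result = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        if line.startswith('[') and line.endswith(']'):
            # Everything below a section header is an extra or conditional.
            importance = 'recommended'
            continue
        requirement, _, marker = line.partition(';')
        match = _NAME.match(requirement.strip())
        if match is None:
            continue  # not a line this sketch understands
        # An inline environment marker also makes the dependency conditional.
        result.append(
            (match.group(0), 'recommended' if marker.strip() else importance))
    return result
```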
Conversion: Packagetype specifics
This documents the parts of the conversion that depend on the packagetype of each download (from release_urls). These only affect <implementation>.
There can be multiple download URLs for the same version, and each can have a different packagetype.
Currently, only source distributions are supported.
Generally, a <manifest-digest> requires downloading and unpacking the archive. In doing so, the download’s md5sum is compared to release_urls['md5_digest'].
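The md5 comparison can be sketched with the standard library as below. The function name and the chunked reading are illustrative choices, not taken from the project.

```python
import hashlib

def md5_matches(path, expected_md5, chunk_size=1 << 20):
    """Return True if the file at path has the given hexadecimal md5 digest."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # Read in chunks so large archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()
```

A download would then be rejected when md5_matches(archive_path, release_urls['md5_digest']) is false, where archive_path stands for wherever the archive was saved.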
Python distributions, installation
Generally, a Python distribution (the download from release_urls) is an archive/executable which installs:
- Platform independent Python code into a location in PYTHONPATH.
- Platform dependent libraries, such as extension modules, into PYTHONPATH.
- Python scripts (according to distutils). These are added to PATH. Some of these are stored as files in the distribution, others are generated from entry_points metadata.
  Upon build (python setup.py build_scripts), the stored scripts are copied and their shebang is edited to point to the python interpreter used for the build (this is an absolute path).
  Only upon installation are scripts generated from entry_points.
- Data files as specified by data_files in setup.py. This does not include package_data files; those are placed next to the Python source files. data_files can have both absolute and relative destination paths.
  Files with a relative destination path can end up being installed anywhere and the application/library has no way of finding out where these data files have been installed; as such we can safely ignore these files in converting to ZI.
  Files with an absolute destination path will be installed to a predictable location and so the application/library can depend on them. However, making this possible in ZI would require a layered file system to make the file appear installed (e.g. a destination in /etc) without modifying global state. This is not currently supported. I expect there are few popular packages, if any, which use this.
  Bottom line: the conversion drops data_files. (package_data is still included!)
- C/C++ header files.
pyc files
Normally, when installed, py files are compiled to pyc files. These are
specific to the Python version and implementation (e.g. CPython 3.6).
Having pyc files in our binary ZI implementation would restrict its reusability to os-cpu-python_implementation-python_implementation_version, i.e. it kills reuse. So, pyc files are not included in implementations.
When Python imports a package, it tries to write a pyc file if missing. This pyc file is written (in a __pycache__ directory) near the py file. Before Python 3.8 (which added PYTHONPYCACHEPREFIX), there is no way of writing pyc files to a different location. All these pyc writes result in permission errors as the 0store cache is read-only.
This means we either generate highly platform-specific ZI implementations or have no pyc files. According to #python, the lack of pyc files results in an unnoticeable performance hit on startup time.
The permission errors can be avoided by setting the environment variable PYTHONDONTWRITEBYTECODE=true.
As such, we disable pyc file generation on installation and set PYTHONDONTWRITEBYTECODE.
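The effect of the variable can be checked with a small sketch like the one below, which starts a child interpreter with PYTHONDONTWRITEBYTECODE set. The helper name run_without_bytecode is hypothetical; in a feed, the variable is set through an <environment> element instead.

```python
import os
import subprocess
import sys

def run_without_bytecode(args):
    """Run the current Python with pyc writing disabled; return its stdout."""
    # Any non-empty value disables bytecode writing.
    env = dict(os.environ, PYTHONDONTWRITEBYTECODE='true')
    result = subprocess.run(
        [sys.executable] + list(args),
        env=env,
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    )
    return result.stdout
```

The child process then reports sys.dont_write_bytecode as True, meaning imports will not attempt to create __pycache__ entries.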
Source distribution
A source distribution (release_urls['packagetype'] == 'sdist') is a tgz/zip containing at least a setup.py. The preferred way to install these is with pip.
After unpacking the distribution, it can be installed without affecting global state like so:
pip install \
--install-option="--install-purelib=/lib" \
--install-option="--install-platlib=/lib" \
--install-option="--install-headers=/headers" \
--install-option="--install-scripts=/scripts" \
--install-option="--install-data=/data" \
--root "$PWD/install" \
--no-deps .
--root prevents installing outside the install directory; this mainly counters data_files with absolute paths.
The resulting dir contains:
- lib
  - Cross platform ‘libraries’: Python source and pyc files, egg-info directories, package_data files, ...
  - Platform specific libraries such as Python extension modules.
- scripts
  - Python scripts with a shebang that points by absolute path to the python used by pip. This includes generated scripts.
- headers
  - C/C++ headers. Unused.
- data
  - Data files from data_files with relative destination paths. Unused.
- *
  - Data files from data_files with absolute destination paths. Unused.
The source implementation as pseudo-code (extends the <implementation> from Conversion: General):
<implementation arch='*-src'>
<archive href='{release_urls['url']}' size='{release_urls['size']}' />
<command name='compile' ...>
...
<compile:implementation arch='*-*'>
<archive href='{release_urls['url']}' size='{release_urls['size']}' />
<environment name='PYTHONPATH' insert='{lib}' />
<environment name='PATH' insert='{scripts}' />
<environment name='PYTHONDONTWRITEBYTECODE' value='true' mode='replace' />
For now, some requirements are omitted from the compiled implementation (it may be easier to tackle them when real-life cases arise where this poses a problem):
- For example, the NumPy package does not work on PyPy. One way to add this constraint is <restricts interface=PyPy version='0'> where version 0 does not exist.
- Script generation depends on os.name=posix|java|nt and sys.platform.startswith('java'). It appears it is not possible to express this in ZI currently. However, rather than expressing it in ZI, we should generate our own cross-platform scripts.
- The Python code itself could be platform dependent. This could be derived from classifiers; but these are often omitted and one can doubt the correctness of those that do list it. In this case, it may be better to be too lenient rather than too restrictive.
- Extension modules require a certain os-cpu architecture (and perhaps an ABI unless that’s standardised by a PEP). When these are present, os-cpu should be set.
Wheel
Not supported.
Notes:
- release_urls['packagetype'] == 'bdist_wheel'
- can derive arch from release_urls['filename']. See the PyPI XMLRPC interface notes.
- bdist_egg and bdist_wininst can be converted to a wheel
- Wheels cannot be used as binary ZI implementation as scripts need to be generated for entry_points.
- release_urls['python_version'] should be used to restrict which python interpreters and versions may be used, if it’s not already mentioned in the wheel name.
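Deriving the tags from a wheel file name could look like the sketch below, which follows the wheel file name convention ({distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl). Mapping the platform tag to a ZI arch is not shown, and the function name is hypothetical.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its components."""
    if not filename.endswith('.whl'):
        raise ValueError('not a wheel: {!r}'.format(filename))
    parts = filename[:-len('.whl')].split('-')
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError('unexpected wheel file name: {!r}'.format(filename))
    return {
        'name': name,
        'version': version,
        'build_tag': build_tag,
        'python_tag': python_tag,      # e.g. cp35 or py2.py3
        'abi_tag': abi_tag,            # e.g. cp35m or none
        'platform_tag': platform_tag,  # e.g. manylinux1_x86_64 or any
    }
```

A platform tag of any indicates a wheel that is not tied to a particular os-cpu.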
Egg
Not supported.
Notes:
- release_urls['packagetype'] == 'bdist_egg'
- can derive arch from release_urls['filename']. See the PyPI XMLRPC interface notes (follow the link to the egg file name convention and search it for “Filename-Embedded Metadata”).
- for an example of eggs, see the pymongo project on PyPI
- Eggs cannot be used as binary ZI implementation as scripts need to be generated for entry_points.