Introduction
Patroni is a template for high availability (HA) PostgreSQL solutions using Python. For maximum accessibility, Patroni supports a variety of distributed configuration stores like ZooKeeper, etcd, Consul or Kubernetes. Database engineers, DBAs, DevOps engineers, and SREs who are looking to quickly deploy HA PostgreSQL in datacenters — or anywhere else — will hopefully find it useful.
We call Patroni a “template” because it is far from being a one-size-fits-all or plug-and-play replication system. It will have its own caveats. Use wisely. There are many ways to run high availability with PostgreSQL; for a list, see the PostgreSQL Documentation.
Currently supported PostgreSQL versions: 9.3 to 16.
Note to Citus users: Starting from 3.0 Patroni nicely integrates with the Citus database extension to Postgres. Please check the Citus support page in the Patroni documentation for more info about how to use Patroni high availability together with a Citus distributed cluster.
Note to Kubernetes users: Patroni can run natively on top of Kubernetes. Take a look at the Kubernetes chapter of the Patroni documentation.
Patroni originated as a fork of Governor, the project from Compose, and includes plenty of new features.
For an example of a Docker-based deployment with Patroni, see Spilo, currently in use at Zalando.
For additional background info, see:
PostgreSQL HA with Kubernetes and Patroni, talk by Josh Berkus at KubeCon 2016 (video)
Development Status
Patroni is in active development and accepts contributions. See our Contributing section below for more details.
We report new releases information here.
Technical Requirements/Installation
Go here for guidance on installing and upgrading Patroni on various platforms.
Planning the Number of PostgreSQL Nodes
Patroni/PostgreSQL nodes are decoupled from DCS nodes (except when Patroni implements Raft on its own), so there is no minimum number of nodes required. Running a cluster consisting of one primary and one standby is perfectly fine; you can add more standby nodes later.
Running and Configuring
The following section assumes the Patroni repository has been cloned from https://github.com/zalando/patroni. Namely, you will need the example configuration files postgres0.yml and postgres1.yml. If you installed Patroni with pip, you can obtain those files from the git repository and replace ./patroni.py below with the patroni command.
To get started, do the following from different terminals:
> etcd --data-dir=data/etcd --enable-v2=true
> ./patroni.py postgres0.yml
> ./patroni.py postgres1.yml
You will then see a high-availability cluster start up. Test different settings in the YAML files to see how the cluster’s behavior changes. Kill some of the components to see how the system behaves.
Add more postgres*.yml files to create an even larger cluster.
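A third member only needs a unique name, its own data directory, and non-conflicting ports. A sketch of a hypothetical postgres2.yml, assuming the scope and local etcd used by the bundled example files (all values here are placeholders):

```yaml
scope: batman              # must match the existing members
name: postgresql2          # must be unique within the cluster
restapi:
  listen: 127.0.0.1:8010
  connect_address: 127.0.0.1:8010
etcd:
  host: 127.0.0.1:2379
postgresql:
  listen: 127.0.0.1:5434
  connect_address: 127.0.0.1:5434
  data_dir: data/postgresql2
```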
Patroni provides an HAProxy configuration, which will give your application a single endpoint for connecting to the cluster’s leader. To configure, run:
> haproxy -f haproxy.cfg
> psql --host 127.0.0.1 --port 5000 postgres
YAML Configuration
Go here for comprehensive information about settings for etcd, consul, and ZooKeeper. And for an example, see postgres0.yml.
Environment Configuration
Go here for comprehensive information about configuring (overriding) settings via environment variables.
Replication Choices
Patroni uses Postgres’ streaming replication, which is asynchronous by default. Patroni’s asynchronous replication configuration allows for a maximum_lag_on_failover setting. This setting ensures that a follower more than a certain number of bytes behind the leader will not be promoted during failover. Increase or decrease this setting based on your business requirements. It’s also possible to use synchronous replication for better durability guarantees. See the replication modes documentation for details.
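Because maximum_lag_on_failover is part of the dynamic configuration, it can be changed cluster-wide at runtime. A sketch of the fragment as it would appear via patronictl edit-config (the 1 MiB threshold is only an illustrative value):

```yaml
# dynamic configuration fragment (patronictl edit-config)
maximum_lag_on_failover: 1048576   # bytes; followers lagging further than this
                                   # are excluded from leader election
```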
Applications Should Not Use Superusers
When connecting from an application, always use a non-superuser. Patroni requires access to the database to function properly. By connecting as a superuser from an application, you can potentially exhaust the entire connection pool, including the connections reserved for superusers via the superuser_reserved_connections setting. If Patroni cannot access the primary because the connection pool is full, behavior will be undesirable.
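One way to guarantee applications never connect as a superuser is to create a dedicated role at bootstrap time, as the bundled example configurations do. A sketch (the role name and password are placeholders):

```yaml
bootstrap:
  users:
    app_user:              # hypothetical application role; not a superuser
      password: change-me
      options:
        - login
```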
Testing Your HA Solution
Testing an HA solution is a time-consuming process with many variables, particularly for a cross-platform application. You need a trained system administrator or a consultant to do this work; it is not something we can cover in depth in the documentation.
That said, here are some pieces of your infrastructure you should be sure to test:
Network (the network in front of your system as well as the NICs [physical or virtual] themselves)
Disk IO
file limits (nofile in Linux)
RAM. Even if you have the OOM killer turned off, the unavailability of RAM could cause issues.
CPU
Virtualization Contention (overcommitting the hypervisor)
Any cgroup limitation (likely to be related to the above)
kill -9 of any postgres process (except postmaster!). This is a decent simulation of a segfault.
One thing that you should not do is run kill -9 on a postmaster process, because doing so does not mimic any real-life scenario. If you are concerned that your infrastructure is insecure and an attacker could run kill -9, no amount of HA process is going to fix that. The attacker will simply kill the process again, or cause chaos in another way.
Installation
Pre-requirements for Mac OS
To install requirements on a Mac, run the following:
brew install postgresql etcd haproxy libyaml python
Psycopg
Starting from psycopg2-2.8, the binary version of psycopg2 is no longer installed by default. Installing it from source requires a C compiler and the PostgreSQL and Python development packages. Since in the Python world it is not possible to specify a dependency as psycopg2 OR psycopg2-binary, you will have to decide how to install it.
There are a few options available:
Use the package manager from your distro
sudo apt-get install python3-psycopg2 # install psycopg2 module on Debian/Ubuntu
sudo yum install python3-psycopg2 # install psycopg2 on RedHat/Fedora/CentOS
Specify one of psycopg, psycopg2, or psycopg2-binary in the list of dependencies when installing Patroni with pip.
General installation for pip
Patroni can be installed with pip:
pip install patroni[dependencies]
where dependencies can be either empty, or consist of one or more of the following:
- etcd or etcd3
python-etcd module in order to use Etcd as Distributed Configuration Store (DCS)
- consul
python-consul module in order to use Consul as DCS
- zookeeper
kazoo module in order to use Zookeeper as DCS
- exhibitor
kazoo module in order to use Exhibitor as DCS (same dependencies as for Zookeeper)
- kubernetes
kubernetes module in order to use Kubernetes as DCS in Patroni
- raft
pysyncobj module in order to use python Raft implementation as DCS
- aws
boto3 in order to use AWS callbacks
- all
all of the above (except psycopg family)
- psycopg
psycopg[binary]>=3.0.0 module
- psycopg2
psycopg2>=2.5.4 module
- psycopg2-binary
psycopg2-binary module
For example, to install Patroni together with psycopg3, dependencies for Etcd as a DCS, and AWS callbacks, run:
pip install patroni[psycopg3,etcd3,aws]
Note that external tools called by replica creation or custom bootstrap scripts (e.g., WAL-E) must be installed independently of Patroni.
Package installation on Linux
Patroni packages may be available for your operating system, produced by the Postgres community for:
RHEL, RockyLinux, AlmaLinux;
Debian and Ubuntu;
SUSE Enterprise Linux.
You can also find packages for direct dependencies of Patroni, like python modules that might not be available in the official operating system repositories.
For more information see the PGDG repository documentation.
If you are on a RedHat Enterprise Linux derivative operating system you may also require packages from EPEL, see EPEL repository documentation.
Once you have installed the PGDG repository for your OS you can install patroni.
Note
Patroni packages are not maintained by the Patroni developers, but rather by the Postgres community. If you require support please first try connecting on Postgres slack.
Installing on Debian derivatives
With the PGDG repo installed (see above), install Patroni via apt:
apt-get install patroni
Installing on RedHat derivatives
With the PGDG repo installed (see above), install Patroni with an etcd DCS via dnf on RHEL 9 (and derivatives):
dnf install patroni patroni-etcd
You can install etcd from PGDG if your RedHat derivative distribution does not provide packages. On the nodes that will host the DCS run:
dnf install 'dnf-command(config-manager)'
dnf config-manager --enable pgdg-rhel9-extras
dnf install etcd
You can replace 9 with 8 in the repo name to get pgdg-rhel8-extras if needed. The repo name remains pgdg-rhelN-extras on RockyLinux, AlmaLinux, Oracle Linux, etc.
Installing on SUSE Enterprise Linux
You might need to enable the SUSE PackageHub repositories for some dependencies; see the SUSE PackageHub documentation.
For SLES 15 with the PGDG repo installed (see above), you can install Patroni using:
zypper install patroni patroni-etcd
With the SUSE PackageHub repo enabled you can also install etcd:
SUSEConnect -p PackageHub/15.5/x86_64
zypper install etcd
Upgrading
Upgrading Patroni is a very simple process: update the software installation and restart the Patroni daemon on each node in the cluster.
However, restarting the Patroni daemon will result in a Postgres database restart. In some situations this may cause a failover of the primary node in your cluster; therefore it is recommended to put the cluster into maintenance mode until the Patroni daemon restart has been completed.
To put the cluster in maintenance mode, run the following command on one of the patroni nodes:
patronictl pause --wait
Then on each node in the cluster, perform the package upgrade required for your OS:
apt-get update && apt-get install patroni patroni-etcd
Restart the patroni daemon process on each node:
systemctl restart patroni
Then finally resume monitoring of Postgres with patroni to take it out of maintenance mode:
patronictl resume --wait
The cluster will now be fully operational with the new version of Patroni.
Patroni configuration
Dynamic Configuration Settings
Dynamic configuration is stored in the DCS (Distributed Configuration Store) and applied on all cluster nodes.
In order to change the dynamic configuration you can use either patronictl edit-config tool or Patroni REST API.
loop_wait: the number of seconds the loop will sleep. Default value: 10, minimum possible value: 1
ttl: the TTL to acquire the leader lock (in seconds). Think of it as the length of time before initiation of the automatic failover process. Default value: 30, minimum possible value: 20
retry_timeout: timeout for DCS and PostgreSQL operation retries (in seconds). DCS or network issues shorter than this will not cause Patroni to demote the leader. Default value: 10, minimum possible value: 3
Warning
When changing the values of loop_wait, retry_timeout, or ttl, you have to follow the rule:
loop_wait + 2 * retry_timeout <= ttl
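A quick check with the defaults shows the rule holds exactly at the boundary: 10 + 2 * 10 = 30 <= 30. If you raise retry_timeout, ttl must grow accordingly; a consistent sketch (values are illustrative):

```yaml
# dynamic configuration fragment (patronictl edit-config)
loop_wait: 10
retry_timeout: 20   # 10 + 2 * 20 = 50, so ttl must be at least 50
ttl: 60
```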
maximum_lag_on_failover: the maximum bytes a follower may lag to be able to participate in leader election.
maximum_lag_on_syncnode: the maximum bytes a synchronous follower may lag before it is considered unhealthy and swapped for a healthy asynchronous follower. Patroni uses the maximum replica LSN if there is more than one follower, otherwise it uses the leader’s current WAL LSN. Default is -1; Patroni will not swap an unhealthy synchronous follower when the value is 0 or below. Please set the value high enough so Patroni won’t swap synchronous followers frequently during high transaction volume.
max_timelines_history: maximum number of timeline history items kept in DCS. Default value: 0. When set to 0, it keeps the full history in DCS.
primary_start_timeout: the amount of time a primary is allowed to recover from failures before failover is triggered (in seconds). Default is 300 seconds. When set to 0 failover is done immediately after a crash is detected if possible. When using asynchronous replication a failover can cause lost transactions. Worst case failover time for primary failure is: loop_wait + primary_start_timeout + loop_wait, unless primary_start_timeout is zero, in which case it’s just loop_wait. Set the value according to your durability/availability tradeoff.
primary_stop_timeout: The number of seconds Patroni is allowed to wait when stopping Postgres and effective only when synchronous_mode is enabled. When set to > 0 and the synchronous_mode is enabled, Patroni sends SIGKILL to the postmaster if the stop operation is running for more than the value set by primary_stop_timeout. Set the value according to your durability/availability tradeoff. If the parameter is not set or set <= 0, primary_stop_timeout does not apply.
synchronous_mode: turns on synchronous replication mode. In this mode a replica will be chosen as synchronous and only the latest leader and synchronous replica are able to participate in leader election. Synchronous mode makes sure that successfully committed transactions will not be lost at failover, at the cost of losing availability for writes when Patroni cannot ensure transaction durability. See replication modes documentation for details.
synchronous_mode_strict: prevents disabling synchronous replication if no synchronous replicas are available, blocking all client writes to the primary. See replication modes documentation for details.
failsafe_mode: Enables DCS Failsafe Mode. Defaults to false.
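Taken together, a minimal sketch of these dynamic settings as they might appear in patronictl edit-config (values are illustrative, not recommendations):

```yaml
# dynamic configuration fragment (patronictl edit-config)
synchronous_mode: true
synchronous_mode_strict: false   # allow falling back to async when no sync replica is available
failsafe_mode: true
```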
postgresql:
use_pg_rewind: whether or not to use pg_rewind. Defaults to false.
use_slots: whether or not to use replication slots. Defaults to true on PostgreSQL 9.4+.
recovery_conf: additional configuration settings written to recovery.conf when configuring follower. There is no recovery.conf anymore in PostgreSQL 12, but you may continue using this section, because Patroni handles it transparently.
parameters: list of configuration settings for Postgres.
pg_hba: list of lines that Patroni will use to generate pg_hba.conf. Patroni ignores this parameter if the hba_file PostgreSQL parameter is set to a non-default value.
- host all all 0.0.0.0/0 md5
- host replication replicator 127.0.0.1/32 md5: A line like this is required for replication.
pg_ident: list of lines that Patroni will use to generate pg_ident.conf. Patroni ignores this parameter if the ident_file PostgreSQL parameter is set to a non-default value.
- mapname1 systemname1 pguser1
- mapname1 systemname2 pguser2
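Put together, a minimal sketch of these two lists in the configuration file (entries are the illustrative ones from above):

```yaml
postgresql:
  pg_hba:
    - host all all 0.0.0.0/0 md5
    - host replication replicator 127.0.0.1/32 md5
  pg_ident:
    - mapname1 systemname1 pguser1
    - mapname1 systemname2 pguser2
```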
standby_cluster: if this section is defined, we want to bootstrap a standby cluster.
host: an address of remote node
port: a port of remote node
primary_slot_name: which slot on the remote node to use for replication. This parameter is optional, the default value is derived from the instance name (see function slot_name_from_member_name).
create_replica_methods: an ordered list of methods that can be used to bootstrap standby leader from the remote primary, can be different from the list defined in PostgreSQL
restore_command: command to restore WAL records from the remote primary to nodes in a standby cluster, can be different from the list defined in PostgreSQL
archive_cleanup_command: cleanup command for standby leader
recovery_min_apply_delay: how long to wait before actually apply WAL records on a standby leader
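A minimal standby_cluster sketch, assuming a remote primary at a placeholder address and the built-in basebackup replica creation method:

```yaml
standby_cluster:
  host: 10.0.0.1        # placeholder address of the remote primary
  port: 5432
  create_replica_methods:
    - basebackup
```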
slots: define permanent replication slots. These slots will be preserved during switchover/failover. Permanent slots that don’t exist will be created by Patroni. From PostgreSQL 11 onwards permanent physical slots are created on all nodes and their position is advanced every loop_wait seconds. For PostgreSQL versions older than 11 permanent physical replication slots are maintained only on the current primary. Logical slots are copied from the primary to a standby with a restart, and after that their position is advanced every loop_wait seconds (if necessary). Copying logical slot files is performed via a libpq connection using either rewind or superuser credentials (see the postgresql.authentication section). There is always a chance that the logical slot position on the replica is a bit behind the former primary, therefore applications should be prepared to receive some messages a second time after a failover. The easiest way of doing so is by tracking confirmed_flush_lsn. Enabling permanent replication slots requires postgresql.use_slots to be set to true. If there are permanent logical replication slots defined, Patroni will automatically enable hot_standby_feedback. Since the failover of logical replication slots is unsafe on PostgreSQL 9.6 and older, and PostgreSQL 10 is missing some important functions, the feature only works with PostgreSQL 11+.
my_slot_name: the name of the permanent replication slot. If the permanent slot name matches the name of the current node, it will not be created on this node. If you add a permanent physical replication slot whose name matches the name of a Patroni member, Patroni will ensure that the slot is not removed even if the corresponding member becomes unresponsive, a situation which would normally result in the slot’s removal by Patroni. Although this can be useful in some situations, such as when you want replication slots used by members to persist during temporary failures or when importing existing members into a new Patroni cluster (see Convert a Standalone to a Patroni Cluster for details), the operator should take care that these name clashes do not persist in the DCS when the slot is no longer required, due to their effect on the normal functioning of Patroni.
type: slot type. Could be physical or logical. If the slot is logical, you have to additionally define database and plugin.
database: the database name where logical slots should be created.
plugin: the plugin name for the logical slot.
ignore_slots: list of sets of replication slot properties for which Patroni should ignore matching slots. This feature is useful when some replication slots are managed outside of Patroni. Any subset of matching properties will cause a slot to be ignored.
name: the name of the replication slot.
type: slot type. Can be physical or logical. If the slot is logical, you may additionally define database and/or plugin.
database: the database name (when matching a logical slot).
plugin: the logical decoding plugin (when matching a logical slot).
Note: slots is a hashmap while ignore_slots is an array. For example:
slots:
permanent_logical_slot_name:
type: logical
database: my_db
plugin: test_decoding
permanent_physical_slot_name:
type: physical
...
ignore_slots:
- name: ignored_logical_slot_name
type: logical
database: my_db
plugin: test_decoding
- name: ignored_physical_slot_name
type: physical
...
Note: if the cluster topology is static (a fixed number of nodes that never change their names) you can configure permanent physical replication slots with names corresponding to the names of the nodes, to avoid recycling of WAL files while a replica is temporarily down:
slots:
node_name1:
type: physical
node_name2:
type: physical
node_name3:
type: physical
...
Warning
Permanent replication slots are synchronized only from the primary/standby_leader to replica nodes. That means applications are supposed to use them only on the leader node; using them on replica nodes will cause indefinite growth of pg_wal on all other nodes in the cluster.
An exception to that rule are permanent physical slots that match the Patroni member names, if you happen to configure any. Those will be synchronized among all nodes, as they are used for replication among them.
YAML Configuration Settings
Global/Universal
name: the name of the host. Must be unique for the cluster.
namespace: path within the configuration store where Patroni will keep information about the cluster. Default value: “/service”
scope: cluster name
Log
level: sets the general logging level. Default value is INFO (see the docs for Python logging)
traceback_level: sets the level where tracebacks will be visible. Default value is ERROR. Set it to DEBUG if you want to see tracebacks only if you enable log.level=DEBUG.
format: sets the log formatting string. Default value is %(asctime)s %(levelname)s: %(message)s (see the LogRecord attributes)
dateformat: sets the datetime formatting string. (see the formatTime() documentation)
max_queue_size: Patroni uses two-step logging: log records are written into an in-memory queue, and a separate thread pulls them from the queue and writes them to stderr or a file. The maximum size of the internal queue is limited to 1000 records by default, which is enough to keep logs for the past 1h20m.
dir: Directory to write application logs to. The directory must exist and be writable by the user executing Patroni. If you set this value, the application will retain four 25 MB logs by default. You can tune those retention values with file_num and file_size (see below).
file_num: The number of application logs to retain.
file_size: Size of patroni.log file (in bytes) that triggers a log rolling.
loggers: This section allows redefining logging level per python module
patroni.postmaster: WARNING
urllib3: DEBUG
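A sketch of a complete log section using the settings above (the directory path is a placeholder):

```yaml
log:
  level: INFO
  traceback_level: ERROR
  dir: /var/log/patroni    # placeholder; must exist and be writable by the Patroni user
  file_num: 4
  file_size: 26214400      # 25 MiB per file
  loggers:
    patroni.postmaster: WARNING
    urllib3: DEBUG
```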
Bootstrap configuration
Note
Once Patroni has initialized the cluster for the first time and the settings have been stored in the DCS, all future changes to the bootstrap.dcs section of the YAML configuration will have no effect! If you want to change them, please use either patronictl edit-config or the Patroni REST API.
bootstrap:
dcs: the global dynamic configuration for the cluster. You can put any of the parameters described in the Dynamic Configuration settings under bootstrap.dcs, and after Patroni has initialized (bootstrapped) the new cluster, it will write this section into /<namespace>/<scope>/config of the given configuration store.
method: custom script to use for bootstrapping this cluster. See the custom bootstrap methods documentation for details. When initdb is specified, Patroni reverts to the default initdb command. initdb is also triggered when no method parameter is present in the configuration file.
initdb: (optional) list of options to be passed on to initdb.
- data-checksums: Must be enabled when pg_rewind is needed on 9.3.
- encoding: UTF8: default encoding for new databases.
- locale: UTF8: default locale for new databases.
post_bootstrap or post_init: An additional script that will be executed after initializing the cluster. The script receives a connection string URL (with the cluster superuser as the user name). The PGPASSFILE variable is set to the location of the pgpass file.
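A sketch of a bootstrap section combining the pieces above (the post_bootstrap script path is a placeholder):

```yaml
bootstrap:
  dcs:                     # written to /<namespace>/<scope>/config once, at cluster init
    ttl: 30
    loop_wait: 10
    postgresql:
      use_pg_rewind: true
  initdb:
    - encoding: UTF8
    - data-checksums
  post_bootstrap: /usr/local/bin/post_bootstrap.sh   # placeholder script
```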
Citus
Enables integration of Patroni with Citus. If configured, Patroni will take care of registering Citus worker nodes on the coordinator. You can find more information about Citus support here.
group: the Citus group id, integer. Use 0 for the coordinator and 1, 2, etc. for workers.
database: the database where the citus extension should be created. Must be the same on the coordinator and all workers. Currently only one database is supported.
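A sketch of the citus section for a coordinator node (the database name is an example):

```yaml
citus:
  group: 0               # 0 = coordinator; workers use 1, 2, ...
  database: citus_demo   # must be identical on the coordinator and all workers
```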
Consul
Most of the parameters are optional, but you have to specify one of host or url.
host: the host:port for the Consul local agent.
url: url for the Consul local agent, in format: http(s)://host:port.
port: (optional) Consul port.
scheme: (optional) http or https, defaults to http.
token: (optional) ACL token.
verify: (optional) whether to verify the SSL certificate for HTTPS requests.
cacert: (optional) The ca certificate. If present it will enable validation.
cert: (optional) file with the client certificate.
key: (optional) file with the client key. Can be empty if the key is part of cert.
dc: (optional) Datacenter to communicate with. By default the datacenter of the host is used.
consistency: (optional) Select the Consul consistency mode. Possible values are default, consistent, or stale (more details in the Consul API reference).
checks: (optional) list of Consul health checks used for the session. By default an empty list is used.
register_service: (optional) whether or not to register a service with the name defined by the scope parameter and the tag master, primary, replica, or standby-leader depending on the node’s role. Defaults to false.
service_tags: (optional) additional static tags to add to the Consul service apart from the role (master/primary/replica/standby-leader). By default an empty list is used.
service_check_interval: (optional) how often to perform a health check against the registered url. Defaults to ‘5s’.
service_check_tls_server_name: (optional) override the SNI host when connecting via TLS; see also the Consul agent check API reference.
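A minimal consul section sketch, assuming a local agent on the default port:

```yaml
consul:
  host: 127.0.0.1:8500
  scheme: http
  register_service: true   # register the scope as a Consul service with role tags
```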
The token needs to have the following ACL permissions:
service_prefix "${scope}" {
policy = "write"
}
key_prefix "${namespace}/${scope}" {
policy = "write"
}
session_prefix "" {
policy = "write"
}
Etcd
Most of the parameters are optional, but you have to specify one of host, hosts, url, proxy or srv.
host: the host:port for the etcd endpoint.
hosts: list of etcd endpoints in the format host1:port1,host2:port2,etc. Could be a comma-separated string or an actual YAML list.
use_proxies: If this parameter is set to true, Patroni will consider hosts as a list of proxies and will not perform a topology discovery of etcd cluster.
url: url for the etcd.
proxy: proxy url for the etcd. If you are connecting to the etcd using proxy, use this parameter instead of url.
srv: Domain to search the SRV record(s) for cluster autodiscovery. Patroni will try to query these SRV service names for the specified domain (in that order until the first success): _etcd-client-ssl, _etcd-client, _etcd-ssl, _etcd, _etcd-server-ssl, _etcd-server. If SRV records for _etcd-server-ssl or _etcd-server are retrieved, then the etcd peer protocol is used to query etcd for available members. Otherwise hosts from the SRV records will be used.
srv_suffix: Configures a suffix to the SRV name that is queried during discovery. Use this flag to differentiate between multiple etcd clusters under the same domain. Works only in conjunction with srv. For example, if srv_suffix: foo and srv: example.org are set, the following DNS SRV query is made: _etcd-client-ssl-foo._tcp.example.org (and so on for every possible etcd SRV service name).
protocol: (optional) http or https; if not specified, http is used. If url or proxy is specified, the protocol will be taken from them.
username: (optional) username for etcd authentication.
password: (optional) password for etcd authentication.
cacert: (optional) The ca certificate. If present it will enable validation.
cert: (optional) file with the client certificate.
key: (optional) file with the client key. Can be empty if the key is part of cert.
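A sketch of an etcd section pointing at a three-member cluster (addresses are placeholders):

```yaml
etcd:
  hosts:
    - 10.0.0.11:2379
    - 10.0.0.12:2379
    - 10.0.0.13:2379
  protocol: http
```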
Etcdv3
If you want Patroni to work with an etcd cluster via protocol version 3, you need to use the etcd3 section in the Patroni configuration file. All configuration parameters are the same as for etcd.
Warning
Keys created with protocol version 2 are not visible with protocol version 3 and vice versa, therefore it is not possible to switch from etcd to etcd3 just by updating the Patroni config file.
ZooKeeper
hosts: List of ZooKeeper cluster members in format: [‘host1:port1’, ‘host2:port2’, ‘etc…’].
use_ssl: (optional) Whether SSL is used or not. Defaults to false. If set to false, all SSL-specific parameters are ignored.
cacert: (optional) The CA certificate. If present it will enable validation.
cert: (optional) File with the client certificate.
key: (optional) File with the client key.
key_password: (optional) The client key password.
verify: (optional) Whether to verify the certificate or not. Defaults to true.
set_acls: (optional) If set, configure Kazoo to apply a default ACL to each ZNode that it creates. ACLs will assume the ‘x509’ schema and should be specified as a dictionary with the principal as the key and one or more permissions as a list in the value. Permissions may be one of CREATE, READ, WRITE, DELETE or ADMIN. For example, set_acls: {CN=principal1: [CREATE, READ], CN=principal2: [ALL]}.
Note
It is required to install kazoo>=2.6.0 to support SSL.
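A minimal zookeeper section sketch (hostnames are placeholders):

```yaml
zookeeper:
  hosts:
    - zk1.example.com:2181
    - zk2.example.com:2181
    - zk3.example.com:2181
  use_ssl: false
```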
Exhibitor
hosts: initial list of Exhibitor (ZooKeeper) nodes in format: ‘host1,host2,etc…’. This list updates automatically whenever the Exhibitor (ZooKeeper) cluster topology changes.
poll_interval: how often the list of ZooKeeper and Exhibitor nodes should be updated from Exhibitor.
port: Exhibitor port.
Kubernetes
bypass_api_service: (optional) When communicating with the Kubernetes API, Patroni usually relies on the kubernetes service, the address of which is exposed in the pods via the KUBERNETES_SERVICE_HOST environment variable. If bypass_api_service is set to true, Patroni will resolve the list of API nodes behind the service and connect directly to them.
namespace: (optional) Kubernetes namespace where the Patroni pod is running. Default value is default.
labels: Labels in the format {label1: value1, label2: value2}. These labels will be used to find existing objects (Pods and either Endpoints or ConfigMaps) associated with the current cluster. Patroni will also set them on every object (Endpoint or ConfigMap) it creates.
scope_label: (optional) name of the label containing the cluster name. Default value is cluster-name.
role_label: (optional) name of the label containing the role (master, replica, or another custom value). Patroni will set this label on the pod it runs in. Default value is role.
leader_label_value: (optional) value of the pod label when the Postgres role is master. Default value is master.
follower_label_value: (optional) value of the pod label when the Postgres role is replica. Default value is replica.
standby_leader_label_value: (optional) value of the pod label when the Postgres role is standby_leader. Default value is master.
tmp_role_label: (optional) name of the temporary label containing the role (master or replica). The value of this label will always use the default of the corresponding role. Set only when necessary.
use_endpoints: (optional) if set to true, Patroni will use Endpoints instead of ConfigMaps to run leader elections and keep cluster state.
pod_ip: (optional) IP address of the pod Patroni is running in. This value is required when use_endpoints is enabled and is used to populate the leader endpoint subsets when the pod’s PostgreSQL is promoted.
ports: (optional) if the Service object has a name for the port, the same name must appear in the Endpoint object, otherwise the service won’t work. For example, if your service is defined as {Kind: Service, spec: {ports: [{name: postgresql, port: 5432, targetPort: 5432}]}}, then you have to set kubernetes.ports: [{"name": "postgresql", "port": 5432}] and Patroni will use it for updating subsets of the leader Endpoint. This parameter is used only if kubernetes.use_endpoints is set.
cacert: (optional) Specifies the file with the CA_BUNDLE containing certificates of trusted CAs to use while verifying Kubernetes API SSL certs. If not provided, Patroni will use the value provided by the ServiceAccount secret.
retriable_http_codes: (optional) list of HTTP status codes from the Kubernetes API to retry on. By default Patroni retries on 500, 503, and 504, or if the Kubernetes API response has a retry-after HTTP header.
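A sketch of a kubernetes section for an Endpoints-based setup (label values and the pod IP are placeholders; the pod IP is typically injected via the downward API):

```yaml
kubernetes:
  namespace: default
  labels:
    application: patroni    # placeholder label used to find cluster objects
  use_endpoints: true
  pod_ip: 10.1.2.3          # placeholder
  ports:
    - name: postgresql
      port: 5432
```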
Raft (deprecated)
self_addr: ip:port to listen on for Raft connections. The self_addr must be accessible from other nodes of the cluster. If not set, the node will not participate in consensus.
bind_addr: (optional) ip:port to listen on for Raft connections. If not specified, self_addr will be used.
partner_addrs: list of other Patroni nodes in the cluster in the format: [‘ip1:port’, ‘ip2:port’, ‘etc…’]
data_dir: directory where to store the Raft log and snapshot. If not specified, the current working directory is used.
password: (optional) Encrypt Raft traffic with a specified password; requires the cryptography python module.
Short FAQ about the Raft implementation
Q: How do I list all the nodes providing consensus?
A: syncobj_admin -conn host:port -status, where host:port is the address of one of the cluster nodes.
Q: A node that was part of the consensus has gone away and I can’t reuse the same IP for another node. How do I remove this node from the consensus?
A: syncobj_admin -conn host:port -remove host2:port2, where host2:port2 is the address of the node you want to remove from the consensus.
Q: Where do I get the syncobj_admin utility?
A: It is installed together with the pysyncobj module (a python Raft implementation), which is a Patroni dependency.
Q: Is it possible to run a Patroni node without adding it to the consensus?
A: Yes, just comment out or remove raft.self_addr from the Patroni configuration.
Q: Is it possible to run Patroni and PostgreSQL on only two nodes?
A: Yes, on the third node you can run patroni_raft_controller (without Patroni and PostgreSQL). In such a setup, one can temporarily lose one node without affecting the primary.
PostgreSQL
postgresql:
authentication:
superuser:
username: name for the superuser, set during initialization (initdb) and later used by Patroni to connect to postgres.
password: password for the superuser, set during initialization (initdb).
sslmode: (optional) maps to the sslmode connection parameter, which allows a client to specify the type of TLS negotiation mode with the server. For more information on how each mode works, please visit the PostgreSQL documentation. The default mode is prefer.
sslkey: (optional) maps to the sslkey connection parameter, which specifies the location of the secret key used with the client’s certificate.
sslpassword: (optional) maps to the sslpassword connection parameter, which specifies the password for the secret key specified in sslkey.
sslcert: (optional) maps to the sslcert connection parameter, which specifies the location of the client certificate.
sslrootcert: (optional) maps to the sslrootcert connection parameter, which specifies the location of a file containing one or more certificate authority (CA) certificates that the client will use to verify a server’s certificate.
sslcrl: (optional) maps to the sslcrl connection parameter, which specifies the location of a file containing a certificate revocation list. A client will reject connecting to any server that has a certificate present in this list.
sslcrldir: (optional) maps to the sslcrldir connection parameter, which specifies the location of a directory with files containing a certificate revocation list. A client will reject connecting to any server that has a certificate present in this list.
gssencmode: (optional) maps to the gssencmode connection parameter, which determines whether or with what priority a secure GSS TCP/IP connection will be negotiated with the server.
channel_binding: (optional) maps to the channel_binding connection parameter, which controls the client’s use of channel binding.
replication:
username: replication username; the user will be created during initialization. Replicas will use this user to access the replication source via streaming replication.
password: replication password; the user will be created during initialization.
sslmode: (optional) maps to the sslmode connection parameter, which allows a client to specify the type of TLS negotiation mode with the server. For more information on how each mode works, please visit the PostgreSQL documentation. The default mode is prefer.
sslkey: (optional) maps to the sslkey connection parameter, which specifies the location of the secret key used with the client’s certificate.
sslpassword: (optional) maps to the sslpassword connection parameter, which specifies the password for the secret key specified in sslkey.
sslcert: (optional) maps to the sslcert connection parameter, which specifies the location of the client certificate.
sslrootcert: (optional) maps to the sslrootcert connection parameter, which specifies the location of a file containing one or more certificate authority (CA) certificates that the client will use to verify a server’s certificate.
sslcrl: (optional) maps to the sslcrl connection parameter, which specifies the location of a file containing a certificate revocation list. A client will reject connecting to any server that has a certificate present in this list.
sslcrldir: (optional) maps to the sslcrldir connection parameter, which specifies the location of a directory with files containing a certificate revocation list. A client will reject connecting to any server that has a certificate present in this list.
gssencmode: (optional) maps to the gssencmode connection parameter, which determines whether or with what priority a secure GSS TCP/IP connection will be negotiated with the server.
channel_binding: (optional) maps to the channel_binding connection parameter, which controls the client’s use of channel binding.
rewind:
username: (optional) name for the user for pg_rewind; the user will be created during initialization of postgres 11+ and all necessary permissions will be granted.
password: (optional) password for the user for pg_rewind; the user will be created during initialization.
sslmode: (optional) maps to the sslmode connection parameter, which allows a client to specify the type of TLS negotiation mode with the server. For more information on how each mode works, please visit the PostgreSQL documentation. The default mode is prefer.
sslkey: (optional) maps to the sslkey connection parameter, which specifies the location of the secret key used with the client’s certificate.
sslpassword: (optional) maps to the sslpassword connection parameter, which specifies the password for the secret key specified in sslkey.
sslcert: (optional) maps to the sslcert connection parameter, which specifies the location of the client certificate.
sslrootcert: (optional) maps to the sslrootcert connection parameter, which specifies the location of a file containing one or more certificate authority (CA) certificates that the client will use to verify a server’s certificate.
sslcrl: (optional) maps to the sslcrl connection parameter, which specifies the location of a file containing a certificate revocation list. A client will reject connecting to any server that has a certificate present in this list.
sslcrldir: (optional) maps to the sslcrldir connection parameter, which specifies the location of a directory with files containing a certificate revocation list. A client will reject connecting to any server that has a certificate present in this list.
gssencmode: (optional) maps to the gssencmode connection parameter, which determines whether or with what priority a secure GSS TCP/IP connection will be negotiated with the server.
channel_binding: (optional) maps to the channel_binding connection parameter, which controls the client’s use of channel binding.
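A condensed sketch of the authentication subtree, assuming placeholder user names, passwords, and certificate paths:

```yaml
postgresql:
  authentication:
    superuser:
      username: postgres
      password: super-secret            # placeholder
      sslmode: verify-ca
      sslrootcert: /etc/ssl/ca.crt      # hypothetical path
    replication:
      username: replicator
      password: rep-secret              # placeholder
    rewind:                             # used by pg_rewind on postgres 11+
      username: rewind_user
      password: rewind-secret           # placeholder
```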
callbacks: callback scripts to run on certain actions. Patroni will pass the action, role and cluster name. (See scripts/aws.py as an example of how to write them.)
on_reload: run this script when configuration reload is triggered.
on_restart: run this script when the postgres restarts (without changing role).
on_role_change: run this script when the postgres is being promoted or demoted.
on_start: run this script when the postgres starts.
on_stop: run this script when the postgres stops.
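For example, a hypothetical callback wired to role changes could be configured like this; Patroni invokes the script with the action, role, and cluster name as arguments:

```yaml
postgresql:
  callbacks:
    # hypothetical script; invoked as: on_role_change.sh on_role_change <role> <cluster-name>
    on_role_change: /usr/local/bin/on_role_change.sh
    on_stop: /usr/local/bin/on_stop.sh
```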
connect_address: IP address + port through which Postgres is accessible from other nodes and applications.
proxy_address: IP address + port through which a connection pool (e.g. pgbouncer) running next to Postgres is accessible. The value is written to the member key in DCS as proxy_url and could be useful for service discovery.
create_replica_methods: an ordered list of the create methods for turning a Patroni node into a new replica. “basebackup” is the default method; other methods are assumed to refer to scripts, each of which is configured as its own config item. See the custom replica creation methods documentation for further explanation.
data_dir: The location of the Postgres data directory, either existing or to be initialized by Patroni.
config_dir: The location of the Postgres configuration directory, defaults to the data directory. Must be writable by Patroni.
bin_dir: (optional) Path to PostgreSQL binaries (pg_ctl, initdb, pg_controldata, pg_basebackup, postgres, pg_isready, pg_rewind). If not provided or is an empty string, PATH environment variable will be used to find the executables.
bin_name: (optional) Make it possible to override Postgres binary names, if you are using a custom Postgres distribution:
pg_ctl: (optional) Custom name for pg_ctl binary.
initdb: (optional) Custom name for initdb binary.
pg_controldata: (optional) Custom name for pg_controldata binary.
pg_basebackup: (optional) Custom name for pg_basebackup binary.
postgres: (optional) Custom name for postgres binary.
pg_isready: (optional) Custom name for pg_isready binary.
pg_rewind: (optional) Custom name for pg_rewind binary.
listen: IP address + port that Postgres listens to; must be accessible from other nodes in the cluster if you’re using streaming replication. Multiple comma-separated addresses are permitted, as long as the port component is appended to the last one with a colon, i.e. listen: 127.0.0.1,127.0.0.2:5432. Patroni will use the first address from this list to establish local connections to the PostgreSQL node.
use_unix_socket: specifies that Patroni should prefer to use unix sockets to connect to the cluster. Default value is false. If unix_socket_directories is defined, Patroni will use the first suitable value from it to connect to the cluster and fall back to tcp if nothing is suitable. If unix_socket_directories is not specified in postgresql.parameters, Patroni will assume that the default value should be used and omit host from the connection parameters.
use_unix_socket_repl: specifies that Patroni should prefer to use unix sockets for the replication user cluster connection. Default value is false. If unix_socket_directories is defined, Patroni will use the first suitable value from it to connect to the cluster and fall back to tcp if nothing is suitable. If unix_socket_directories is not specified in postgresql.parameters, Patroni will assume that the default value should be used and omit host from the connection parameters.
pgpass: path to the .pgpass password file. Patroni creates this file before executing pg_basebackup, the post_init script and under some other circumstances. The location must be writable by Patroni.
recovery_conf: additional configuration settings written to recovery.conf when configuring follower.
custom_conf: path to an optional custom postgresql.conf file, that will be used in place of postgresql.base.conf. The file must exist on all cluster nodes, be readable by PostgreSQL and will be included from its location on the real postgresql.conf. Note that Patroni will not monitor this file for changes, nor back it up. However, its settings can still be overridden by Patroni’s own configuration facilities - see dynamic configuration for details.
parameters: list of configuration settings for Postgres. Many of these are required for replication to work.
pg_hba: list of lines that Patroni will use to generate pg_hba.conf. Patroni ignores this parameter if the hba_file PostgreSQL parameter is set to a non-default value. Together with dynamic configuration this parameter simplifies management of pg_hba.conf.
- host all all 0.0.0.0/0 md5
- host replication replicator 127.0.0.1/32 md5: A line like this is required for replication.
pg_ident: list of lines that Patroni will use to generate pg_ident.conf. Patroni ignores this parameter if the ident_file PostgreSQL parameter is set to a non-default value. Together with dynamic configuration this parameter simplifies management of pg_ident.conf.
- mapname1 systemname1 pguser1
- mapname1 systemname2 pguser2
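In a YAML configuration file, the pg_hba and pg_ident entries above nest under the postgresql section as plain lists of lines, e.g.:

```yaml
postgresql:
  pg_hba:
  - host all all 0.0.0.0/0 md5
  - host replication replicator 127.0.0.1/32 md5   # required for replication
  pg_ident:
  - mapname1 systemname1 pguser1
  - mapname1 systemname2 pguser2
```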
pg_ctl_timeout: How long should pg_ctl wait when doing start, stop or restart. Default value is 60 seconds.
use_pg_rewind: try to use pg_rewind on the former leader when it joins the cluster as a replica.
remove_data_directory_on_rewind_failure: If this option is enabled, Patroni will remove the PostgreSQL data directory and recreate the replica. Otherwise it will try to follow the new leader. Default value is false.
remove_data_directory_on_diverged_timelines: Patroni will remove the PostgreSQL data directory and recreate the replica if it notices that the timelines are diverging and the former primary cannot start streaming from the new primary. This option is useful when pg_rewind cannot be used. While performing the timeline divergence check on PostgreSQL v10 and older, Patroni will try to connect with the replication credentials to the “postgres” database; hence, such access should be allowed in pg_hba.conf. Default value is false.
replica_method: for each create_replica_method other than basebackup, you would add a configuration section of the same name. At a minimum, this should include “command” with a full path to the actual script to be executed. Other configuration parameters will be passed along to the script in the form “parameter=value”.
pre_promote: a fencing script that executes during a failover after acquiring the leader lock but before promoting the replica. If the script exits with a non-zero code, Patroni does not promote the replica and removes the leader key from DCS.
before_stop: a script that executes immediately prior to stopping postgres. As opposed to a callback, this script runs synchronously, blocking shutdown until it has completed. The return code of this script does not impact whether shutdown proceeds afterwards.
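A sketch of a custom replica creation method, with a hypothetical method name and script path; any extra keys are passed to the script as parameter=value arguments:

```yaml
postgresql:
  create_replica_methods:
  - custom_restore        # hypothetical method name, tried before basebackup
  - basebackup
  custom_restore:
    command: /usr/local/bin/restore_replica.sh   # hypothetical script
    threads: 4            # passed to the script as threads=4
```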
REST API
restapi:
connect_address: IP address (or hostname) and port used to access Patroni’s REST API. All the members of the cluster must be able to connect to this address, so unless the Patroni setup is intended for a demo on localhost, this address must not be a “localhost” or loopback address (i.e. “localhost” or “127.0.0.1”). It can serve as an endpoint for HTTP health checks (see the “listen” REST API parameter below), for user queries (either directly or via the REST API), and for the health checks done by the cluster members during leader elections (for example, to determine whether the leader is still running, or whether there is a node whose WAL position is ahead of the one doing the query, etc.). The connect_address is put in the member key in DCS, making it possible to translate the member name into the address to connect to its REST API.
listen: IP address (or hostname) and port that Patroni will listen on for the REST API - to provide the same health checks and cluster messaging between the participating nodes as described above, and to provide health-check information for HAProxy (or any other load balancer capable of doing HTTP “OPTIONS” or “GET” checks).
authentication: (optional)
username: Basic-auth username to protect unsafe REST API endpoints.
password: Basic-auth password to protect unsafe REST API endpoints.
certfile: (optional): Specifies the file with the certificate in the PEM format. If the certfile is not specified or is left empty, the API server will work without SSL.
keyfile: (optional): Specifies the file with the secret key in the PEM format.
keyfile_password: (optional): Specifies a password for decrypting the keyfile.
cafile: (optional): Specifies the file with the CA_BUNDLE with certificates of trusted CAs to use while verifying client certs.
ciphers: (optional): Specifies the permitted cipher suites (e.g. “ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:!SSLv1:!SSLv2:!SSLv3:!TLSv1:!TLSv1.1”)
verify_client: (optional): none (default), optional or required. When none, the REST API will not check client certificates. When required, client certificates are required for all REST API calls. When optional, client certificates are required only for unsafe REST API endpoints. When required is used, client authentication succeeds if the certificate signature verification succeeds. For optional, the client cert will only be checked for PUT, POST, PATCH, and DELETE requests.
allowlist: (optional): Specifies the set of hosts that are allowed to call unsafe REST API endpoints. A single element could be a host name, an IP address or a network address in CIDR notation. By default allow all is used. If allowlist or allowlist_include_members are set, anything that is not included is rejected.
allowlist_include_members: (optional): If set to true, it allows accessing unsafe REST API endpoints from other cluster members registered in DCS (IP address or hostname is taken from the member’s api_url). Be careful, it might happen that the OS will use a different IP for outgoing connections.
http_extra_headers: (optional): HTTP headers let the REST API server pass additional information with an HTTP response.
https_extra_headers: (optional): HTTPS headers let the REST API server pass additional information with an HTTP response when TLS is enabled. This will also pass additional information set in http_extra_headers.
request_queue_size: (optional): Sets the request queue size for the TCP socket used by the Patroni REST API. Once the queue is full, further requests get a “Connection denied” error. The default value is 5.
Here is an example of both http_extra_headers and https_extra_headers:
restapi:
listen: <listen>
connect_address: <connect_address>
authentication:
username: <username>
password: <password>
http_extra_headers:
'X-Frame-Options': 'SAMEORIGIN'
'X-XSS-Protection': '1; mode=block'
'X-Content-Type-Options': 'nosniff'
cafile: <ca file>
certfile: <cert>
keyfile: <key>
https_extra_headers:
'Strict-Transport-Security': 'max-age=31536000; includeSubDomains'
Warning
The restapi.connect_address must be accessible from all nodes of a given Patroni cluster. Internally Patroni uses it during the leader race to find nodes with minimal replication lag.
If you enabled client certificate validation (restapi.verify_client is set to required), you also must provide valid client certificates in ctl.certfile, ctl.keyfile and ctl.keyfile_password. If not provided, Patroni will not work correctly.
CTL
ctl: (optional)
authentication:
username: Basic-auth username for accessing protected REST API endpoints. If not provided patronictl will use the value provided for REST API “username” parameter.
password: Basic-auth password for accessing protected REST API endpoints. If not provided patronictl will use the value provided for REST API “password” parameter.
insecure: Allow connections to REST API without verifying SSL certs.
cacert: Specifies the file with the CA_BUNDLE file or directory with certificates of trusted CAs to use while verifying REST API SSL certs. If not provided patronictl will use the value provided for REST API “cafile” parameter.
certfile: Specifies the file with the client certificate in the PEM format.
keyfile: Specifies the file with the client secret key in the PEM format.
keyfile_password: Specifies a password for decrypting the client keyfile.
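A sketch of the ctl section with client certificates, assuming hypothetical file paths (needed when restapi.verify_client is set to required):

```yaml
ctl:
  insecure: false
  cacert: /etc/patroni/ca.pem          # hypothetical paths
  certfile: /etc/patroni/client.pem
  keyfile: /etc/patroni/client.key
```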
Watchdog
mode: off, automatic or required. When off, watchdog is disabled. When automatic, watchdog will be used if available, but ignored if it is not. When required, the node will not become a leader unless watchdog can be successfully enabled.
device: Path to the watchdog device. Defaults to /dev/watchdog.
safety_margin: Number of seconds of safety margin between watchdog triggering and leader key expiration.
Environment Configuration Settings
It is possible to override some of the configuration parameters defined in the Patroni configuration file using the system environment variables. This document lists all environment variables handled by Patroni. The values set via those variables always take precedence over the ones set in the Patroni configuration file.
Global/Universal
PATRONI_CONFIGURATION: it is possible to set the entire configuration for Patroni via the PATRONI_CONFIGURATION environment variable. In this case any other environment variables will not be considered!
PATRONI_NAME: name of the node where the current instance of Patroni is running. Must be unique for the cluster.
PATRONI_NAMESPACE: path within the configuration store where Patroni will keep information about the cluster. Default value: “/service”
PATRONI_SCOPE: cluster name
Log
PATRONI_LOG_LEVEL: sets the general logging level. Default value is INFO (see the docs for Python logging)
PATRONI_LOG_TRACEBACK_LEVEL: sets the level where tracebacks will be visible. Default value is ERROR. Set it to DEBUG if you want to see tracebacks only if you enable PATRONI_LOG_LEVEL=DEBUG.
PATRONI_LOG_FORMAT: sets the log formatting string. Default value is %(asctime)s %(levelname)s: %(message)s (see the LogRecord attributes)
PATRONI_LOG_DATEFORMAT: sets the datetime formatting string. (see the formatTime() documentation)
PATRONI_LOG_MAX_QUEUE_SIZE: Patroni uses two-step logging. Log records are written into an in-memory queue and a separate thread pulls them from the queue and writes them to stderr or a file. The maximum size of the internal queue is limited to 1000 records by default, which is enough to keep logs for the past 1h20m.
PATRONI_LOG_DIR: Directory to write application logs to. The directory must exist and be writable by the user executing Patroni. If you set this env variable, the application will retain 4 25MB logs by default. You can tune those retention values with PATRONI_LOG_FILE_NUM and PATRONI_LOG_FILE_SIZE (see below).
PATRONI_LOG_FILE_NUM: The number of application logs to retain.
PATRONI_LOG_FILE_SIZE: Size of patroni.log file (in bytes) that triggers a log rolling.
PATRONI_LOG_LOGGERS: Redefine logging level per python module. Example
PATRONI_LOG_LOGGERS="{patroni.postmaster: WARNING, urllib3: DEBUG}"
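As a sketch, a logging setup via environment variables might look like the following (the directory and sizes are illustrative):

```shell
# Illustrative logging configuration via environment variables
export PATRONI_LOG_LEVEL=DEBUG
export PATRONI_LOG_DIR=/var/log/patroni          # must exist and be writable
export PATRONI_LOG_FILE_SIZE=26214400            # roll at 25MB
export PATRONI_LOG_FILE_NUM=4                    # keep 4 rolled files
export PATRONI_LOG_LOGGERS="{patroni.postmaster: WARNING, urllib3: DEBUG}"
```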
Citus
Enables integration of Patroni with Citus. If configured, Patroni will take care of registering Citus worker nodes on the coordinator. You can find more information about Citus support here.
PATRONI_CITUS_GROUP: the Citus group id, integer. Use 0 for the coordinator and 1, 2, etc… for workers.
PATRONI_CITUS_DATABASE: the database where the citus extension should be created. Must be the same on the coordinator and all workers. Currently only one database is supported.
Consul
PATRONI_CONSUL_HOST: the host:port for the Consul local agent.
PATRONI_CONSUL_URL: url for the Consul local agent, in format: http(s)://host:port
PATRONI_CONSUL_PORT: (optional) Consul port
PATRONI_CONSUL_SCHEME: (optional) http or https, defaults to http
PATRONI_CONSUL_TOKEN: (optional) ACL token
PATRONI_CONSUL_VERIFY: (optional) whether to verify the SSL certificate for HTTPS requests
PATRONI_CONSUL_CACERT: (optional) The ca certificate. If present it will enable validation.
PATRONI_CONSUL_CERT: (optional) File with the client certificate
PATRONI_CONSUL_KEY: (optional) File with the client key. Can be empty if the key is part of certificate.
PATRONI_CONSUL_DC: (optional) Datacenter to communicate with. By default the datacenter of the host is used.
PATRONI_CONSUL_CONSISTENCY: (optional) Select consul consistency mode. Possible values are default, consistent, or stale (more details in the consul API reference).
PATRONI_CONSUL_CHECKS: (optional) list of Consul health checks used for the session. By default an empty list is used.
PATRONI_CONSUL_REGISTER_SERVICE: (optional) whether or not to register a service with the name defined by the scope parameter and the tag master, primary, replica, or standby-leader depending on the node’s role. Defaults to false
PATRONI_CONSUL_SERVICE_TAGS: (optional) additional static tags to add to the Consul service apart from the role (master/primary/replica/standby-leader). By default an empty list is used.
PATRONI_CONSUL_SERVICE_CHECK_INTERVAL: (optional) how often to perform a health check against the registered url.
PATRONI_CONSUL_SERVICE_CHECK_TLS_SERVER_NAME: (optional) override the SNI host when connecting via TLS, see also the consul agent check API reference.
Etcd
PATRONI_ETCD_PROXY: proxy url for the etcd. If you are connecting to the etcd using proxy, use this parameter instead of PATRONI_ETCD_URL
PATRONI_ETCD_URL: url for the etcd, in format: http(s)://(username:password@)host:port
PATRONI_ETCD_HOSTS: list of etcd endpoints in format ‘host1:port1’,’host2:port2’,etc…
PATRONI_ETCD_USE_PROXIES: If this parameter is set to true, Patroni will consider hosts as a list of proxies and will not perform a topology discovery of etcd cluster but stick to a fixed list of hosts.
PATRONI_ETCD_PROTOCOL: http or https, if not specified http is used. If the url or proxy is specified - will take protocol from them.
PATRONI_ETCD_HOST: the host:port for the etcd endpoint.
PATRONI_ETCD_SRV: Domain to search the SRV record(s) for cluster autodiscovery. Patroni will try to query these SRV service names for the specified domain (in that order until the first success): _etcd-client-ssl, _etcd-client, _etcd-ssl, _etcd, _etcd-server-ssl, _etcd-server. If SRV records for _etcd-server-ssl or _etcd-server are retrieved, then the etcd peer protocol is used to query etcd for available members. Otherwise hosts from the SRV records will be used.
PATRONI_ETCD_SRV_SUFFIX: Configures a suffix to the SRV name that is queried during discovery. Use this flag to differentiate between multiple etcd clusters under the same domain. Works only in conjunction with PATRONI_ETCD_SRV. For example, if PATRONI_ETCD_SRV_SUFFIX=foo and PATRONI_ETCD_SRV=example.org are set, the following DNS SRV query is made: _etcd-client-ssl-foo._tcp.example.org (and so on for every possible etcd SRV service name).
PATRONI_ETCD_USERNAME: username for etcd authentication.
PATRONI_ETCD_PASSWORD: password for etcd authentication.
PATRONI_ETCD_CACERT: The ca certificate. If present it will enable validation.
PATRONI_ETCD_CERT: File with the client certificate.
PATRONI_ETCD_KEY: File with the client key. Can be empty if the key is part of certificate.
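For instance, pointing Patroni at a TLS-protected three-node etcd cluster could be sketched as follows (addresses and certificate paths are placeholders):

```shell
# Placeholder etcd endpoints and client TLS material
export PATRONI_ETCD_HOSTS="'10.0.0.1:2379','10.0.0.2:2379','10.0.0.3:2379'"
export PATRONI_ETCD_PROTOCOL=https
export PATRONI_ETCD_CACERT=/etc/etcd/ca.pem
export PATRONI_ETCD_CERT=/etc/etcd/client.pem
export PATRONI_ETCD_KEY=/etc/etcd/client-key.pem
```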
Etcdv3
Environment names for Etcdv3 are similar to those for Etcd; you just need to use ETCD3 instead of ETCD in the variable name. Examples: PATRONI_ETCD3_HOST, PATRONI_ETCD3_CACERT, and so on.
Warning
Keys created with protocol version 2 are not visible with protocol version 3 and the other way around, therefore it is not possible to switch from Etcd to Etcdv3 just by updating Patroni configuration.
ZooKeeper
PATRONI_ZOOKEEPER_HOSTS: Comma separated list of ZooKeeper cluster members: “‘host1:port1’,’host2:port2’,’etc…’”. It is important to quote every single entity!
PATRONI_ZOOKEEPER_USE_SSL: (optional) Whether SSL is used or not. Defaults to false. If set to false, all SSL specific parameters are ignored.
PATRONI_ZOOKEEPER_CACERT: (optional) The CA certificate. If present it will enable validation.
PATRONI_ZOOKEEPER_CERT: (optional) File with the client certificate.
PATRONI_ZOOKEEPER_KEY: (optional) File with the client key.
PATRONI_ZOOKEEPER_KEY_PASSWORD: (optional) The client key password.
PATRONI_ZOOKEEPER_VERIFY: (optional) Whether to verify the certificate or not. Defaults to true.
PATRONI_ZOOKEEPER_SET_ACLS: (optional) If set, configure Kazoo to apply a default ACL to each ZNode that it creates. ACLs will assume the ‘x509’ schema and should be specified as a dictionary with the principal as the key and one or more permissions as a list in the value. Permissions may be one of CREATE, READ, WRITE, DELETE or ADMIN. For example, set_acls: {CN=principal1: [CREATE, READ], CN=principal2: [ALL]}.
Note
It is required to install kazoo>=2.6.0 to support SSL.
Exhibitor
PATRONI_EXHIBITOR_HOSTS: initial list of Exhibitor (ZooKeeper) nodes in format: ‘host1,host2,etc…’. This list updates automatically whenever the Exhibitor (ZooKeeper) cluster topology changes.
PATRONI_EXHIBITOR_PORT: Exhibitor port.
Kubernetes
PATRONI_KUBERNETES_BYPASS_API_SERVICE: (optional) When communicating with the Kubernetes API, Patroni usually relies on the kubernetes service, the address of which is exposed in the pods via the KUBERNETES_SERVICE_HOST environment variable. If PATRONI_KUBERNETES_BYPASS_API_SERVICE is set to true, Patroni will resolve the list of API nodes behind the service and connect directly to them.
PATRONI_KUBERNETES_NAMESPACE: (optional) Kubernetes namespace where the Patroni pod is running. Default value is default.
PATRONI_KUBERNETES_LABELS: Labels in format {label1: value1, label2: value2}. These labels will be used to find existing objects (Pods and either Endpoints or ConfigMaps) associated with the current cluster. Also Patroni will set them on every object (Endpoints or ConfigMap) it creates.
PATRONI_KUBERNETES_SCOPE_LABEL: (optional) name of the label containing the cluster name. Default value is cluster-name.
PATRONI_KUBERNETES_ROLE_LABEL: (optional) name of the label containing the role (master or replica or other custom value). Patroni will set this label on the pod it runs in. Default value is role.
PATRONI_KUBERNETES_LEADER_LABEL_VALUE: (optional) value of the pod label when Postgres role is master. Default value is master.
PATRONI_KUBERNETES_FOLLOWER_LABEL_VALUE: (optional) value of the pod label when Postgres role is replica. Default value is replica.
PATRONI_KUBERNETES_STANDBY_LEADER_LABEL_VALUE: (optional) value of the pod label when Postgres role is standby_leader. Default value is master.
PATRONI_KUBERNETES_TMP_ROLE_LABEL: (optional) name of the temporary label containing the role (master or replica). The value of this label always uses the default value of the corresponding role. Set only when necessary.
PATRONI_KUBERNETES_USE_ENDPOINTS: (optional) if set to true, Patroni will use Endpoints instead of ConfigMaps to run leader elections and keep cluster state.
PATRONI_KUBERNETES_POD_IP: (optional) IP address of the pod Patroni is running in. This value is required when PATRONI_KUBERNETES_USE_ENDPOINTS is enabled and is used to populate the leader endpoint subsets when the pod’s PostgreSQL is promoted.
PATRONI_KUBERNETES_PORTS: (optional) if the Service object has a name for the port, the same name must appear in the Endpoints object, otherwise the service won’t work. For example, if your service is defined as {Kind: Service, spec: {ports: [{name: postgresql, port: 5432, targetPort: 5432}]}}, then you have to set PATRONI_KUBERNETES_PORTS='[{"name": "postgresql", "port": 5432}]' and Patroni will use it for updating subsets of the leader Endpoints object. This parameter is used only if PATRONI_KUBERNETES_USE_ENDPOINTS is set.
PATRONI_KUBERNETES_CACERT: (optional) Specifies the file with the CA_BUNDLE with certificates of trusted CAs to use while verifying Kubernetes API SSL certs. If not provided, Patroni will use the value provided by the ServiceAccount secret.
PATRONI_KUBERNETES_RETRIABLE_HTTP_CODES: (optional) list of HTTP status codes from the K8s API to retry on. By default Patroni retries on 500, 503, and 504, or if the K8s API response has a retry-after HTTP header.
Raft (deprecated)
PATRONI_RAFT_SELF_ADDR: ip:port to listen on for Raft connections. The self_addr must be accessible from other nodes of the cluster. If not set, the node will not participate in consensus.
PATRONI_RAFT_BIND_ADDR: (optional) ip:port to listen on for Raft connections. If not specified, self_addr will be used.
PATRONI_RAFT_PARTNER_ADDRS: list of other Patroni nodes in the cluster in format "'ip1:port1','ip2:port2'". It is important to quote every single entity!
PATRONI_RAFT_DATA_DIR: directory where to store the Raft log and snapshot. If not specified, the current working directory is used.
PATRONI_RAFT_PASSWORD: (optional) Encrypt Raft traffic with a specified password; requires the cryptography python module.
PostgreSQL
PATRONI_POSTGRESQL_LISTEN: IP address + port that Postgres listens on. Multiple comma-separated addresses are permitted, as long as the port component is appended to the last one with a colon, i.e. listen: 127.0.0.1,127.0.0.2:5432. Patroni will use the first address from this list to establish local connections to the PostgreSQL node.
PATRONI_POSTGRESQL_CONNECT_ADDRESS: IP address + port through which Postgres is accessible from other nodes and applications.
PATRONI_POSTGRESQL_PROXY_ADDRESS: IP address + port through which a connection pool (e.g. pgbouncer) running next to Postgres is accessible. The value is written to the member key in DCS as proxy_url and can be used for service discovery.
PATRONI_POSTGRESQL_DATA_DIR: The location of the Postgres data directory, either existing or to be initialized by Patroni.
PATRONI_POSTGRESQL_CONFIG_DIR: The location of the Postgres configuration directory, defaults to the data directory. Must be writable by Patroni.
PATRONI_POSTGRESQL_BIN_DIR: Path to PostgreSQL binaries (pg_ctl, initdb, pg_controldata, pg_basebackup, postgres, pg_isready, pg_rewind). The default value is an empty string, meaning that the PATH environment variable will be used to find the executables.
PATRONI_POSTGRESQL_BIN_PG_CTL: (optional) Custom name for the pg_ctl binary.
PATRONI_POSTGRESQL_BIN_INITDB: (optional) Custom name for the initdb binary.
PATRONI_POSTGRESQL_BIN_PG_CONTROLDATA: (optional) Custom name for the pg_controldata binary.
PATRONI_POSTGRESQL_BIN_PG_BASEBACKUP: (optional) Custom name for the pg_basebackup binary.
PATRONI_POSTGRESQL_BIN_POSTGRES: (optional) Custom name for the postgres binary.
PATRONI_POSTGRESQL_BIN_IS_READY: (optional) Custom name for the pg_isready binary.
PATRONI_POSTGRESQL_BIN_PG_REWIND: (optional) Custom name for the pg_rewind binary.
PATRONI_POSTGRESQL_PGPASS: path to the .pgpass password file. Patroni creates this file before executing pg_basebackup and under some other circumstances. The location must be writable by Patroni.
PATRONI_REPLICATION_USERNAME: replication username; the user will be created during initialization. Replicas will use this user to access the replication source via streaming replication.
PATRONI_REPLICATION_PASSWORD: replication password; the user will be created during initialization.
PATRONI_REPLICATION_SSLMODE: (optional) maps to the sslmode connection parameter, which allows a client to specify the type of TLS negotiation mode with the server. For more information on how each mode works, please visit the PostgreSQL documentation. The default mode is prefer.
PATRONI_REPLICATION_SSLKEY: (optional) maps to the sslkey connection parameter, which specifies the location of the secret key used with the client's certificate.
PATRONI_REPLICATION_SSLPASSWORD: (optional) maps to the sslpassword connection parameter, which specifies the password for the secret key specified in PATRONI_REPLICATION_SSLKEY.
PATRONI_REPLICATION_SSLCERT: (optional) maps to the sslcert connection parameter, which specifies the location of the client certificate.
PATRONI_REPLICATION_SSLROOTCERT: (optional) maps to the sslrootcert connection parameter, which specifies the location of a file containing one or more certificate authority (CA) certificates that the client will use to verify a server's certificate.
PATRONI_REPLICATION_SSLCRL: (optional) maps to the sslcrl connection parameter, which specifies the location of a file containing a certificate revocation list. A client will reject connecting to any server that has a certificate present in this list.
PATRONI_REPLICATION_SSLCRLDIR: (optional) maps to the sslcrldir connection parameter, which specifies the location of a directory with files containing a certificate revocation list. A client will reject connecting to any server that has a certificate present in this list.
PATRONI_REPLICATION_GSSENCMODE: (optional) maps to the gssencmode connection parameter, which determines whether or with what priority a secure GSS TCP/IP connection will be negotiated with the server.
PATRONI_REPLICATION_CHANNEL_BINDING: (optional) maps to the channel_binding connection parameter, which controls the client’s use of channel binding.
PATRONI_SUPERUSER_USERNAME: name for the superuser, set during initialization (initdb) and later used by Patroni to connect to Postgres. This user is also used by pg_rewind.
PATRONI_SUPERUSER_PASSWORD: password for the superuser, set during initialization (initdb).
PATRONI_SUPERUSER_SSLMODE: (optional) maps to the sslmode connection parameter, which allows a client to specify the type of TLS negotiation mode with the server. For more information on how each mode works, please visit the PostgreSQL documentation. The default mode is prefer.
PATRONI_SUPERUSER_SSLKEY: (optional) maps to the sslkey connection parameter, which specifies the location of the secret key used with the client's certificate.
PATRONI_SUPERUSER_SSLPASSWORD: (optional) maps to the sslpassword connection parameter, which specifies the password for the secret key specified in PATRONI_SUPERUSER_SSLKEY.
PATRONI_SUPERUSER_SSLCERT: (optional) maps to the sslcert connection parameter, which specifies the location of the client certificate.
PATRONI_SUPERUSER_SSLROOTCERT: (optional) maps to the sslrootcert connection parameter, which specifies the location of a file containing one or more certificate authority (CA) certificates that the client will use to verify a server's certificate.
PATRONI_SUPERUSER_SSLCRL: (optional) maps to the sslcrl connection parameter, which specifies the location of a file containing a certificate revocation list. A client will reject connecting to any server that has a certificate present in this list.
PATRONI_SUPERUSER_SSLCRLDIR: (optional) maps to the sslcrldir connection parameter, which specifies the location of a directory with files containing a certificate revocation list. A client will reject connecting to any server that has a certificate present in this list.
PATRONI_SUPERUSER_GSSENCMODE: (optional) maps to the gssencmode connection parameter, which determines whether or with what priority a secure GSS TCP/IP connection will be negotiated with the server.
PATRONI_SUPERUSER_CHANNEL_BINDING: (optional) maps to the channel_binding connection parameter, which controls the client’s use of channel binding.
PATRONI_REWIND_USERNAME: (optional) name of the user for pg_rewind; the user will be created during initialization of postgres 11+ and all necessary permissions will be granted.
PATRONI_REWIND_PASSWORD: (optional) password for the pg_rewind user; the user will be created during initialization.
PATRONI_REWIND_SSLMODE: (optional) maps to the sslmode connection parameter, which allows a client to specify the type of TLS negotiation mode with the server. For more information on how each mode works, please visit the PostgreSQL documentation. The default mode is prefer.
PATRONI_REWIND_SSLKEY: (optional) maps to the sslkey connection parameter, which specifies the location of the secret key used with the client's certificate.
PATRONI_REWIND_SSLPASSWORD: (optional) maps to the sslpassword connection parameter, which specifies the password for the secret key specified in PATRONI_REWIND_SSLKEY.
PATRONI_REWIND_SSLCERT: (optional) maps to the sslcert connection parameter, which specifies the location of the client certificate.
PATRONI_REWIND_SSLROOTCERT: (optional) maps to the sslrootcert connection parameter, which specifies the location of a file containing one or more certificate authority (CA) certificates that the client will use to verify a server's certificate.
PATRONI_REWIND_SSLCRL: (optional) maps to the sslcrl connection parameter, which specifies the location of a file containing a certificate revocation list. A client will reject connecting to any server that has a certificate present in this list.
PATRONI_REWIND_SSLCRLDIR: (optional) maps to the sslcrldir connection parameter, which specifies the location of a directory with files containing a certificate revocation list. A client will reject connecting to any server that has a certificate present in this list.
PATRONI_REWIND_GSSENCMODE: (optional) maps to the gssencmode connection parameter, which determines whether or with what priority a secure GSS TCP/IP connection will be negotiated with the server.
PATRONI_REWIND_CHANNEL_BINDING: (optional) maps to the channel_binding connection parameter, which controls the client’s use of channel binding.
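The PATRONI_POSTGRESQL_LISTEN format described above, where the port is appended to the last comma-separated address, can be illustrated with a small parsing sketch. This is an illustration only, not Patroni's actual implementation, and it does not handle IPv6 addresses:

```python
def parse_listen(listen: str, default_port: int = 5432):
    """Split a listen value like "127.0.0.1,127.0.0.2:5432"
    into (hosts, port). IPv6 addresses (which contain colons)
    are intentionally out of scope for this sketch."""
    hosts = [h.strip() for h in listen.split(",")]
    port = default_port
    if ":" in hosts[-1]:
        hosts[-1], port_str = hosts[-1].rsplit(":", 1)
        port = int(port_str)
    return hosts, port

# Patroni uses the first address to establish local connections.
hosts, port = parse_listen("127.0.0.1,127.0.0.2:5432")
local_host = hosts[0]  # "127.0.0.1"
```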
REST API
PATRONI_RESTAPI_CONNECT_ADDRESS: IP address and port to access the REST API.
PATRONI_RESTAPI_LISTEN: IP address and port that Patroni will listen to, to provide health-check information for HAProxy.
PATRONI_RESTAPI_USERNAME: Basic-auth username to protect unsafe REST API endpoints.
PATRONI_RESTAPI_PASSWORD: Basic-auth password to protect unsafe REST API endpoints.
PATRONI_RESTAPI_CERTFILE: Specifies the file with the certificate in the PEM format. If the certfile is not specified or is left empty, the API server will work without SSL.
PATRONI_RESTAPI_KEYFILE: Specifies the file with the secret key in the PEM format.
PATRONI_RESTAPI_KEYFILE_PASSWORD: Specifies a password for decrypting the keyfile.
PATRONI_RESTAPI_CAFILE: Specifies the file with the CA_BUNDLE with certificates of trusted CAs to use while verifying client certs.
PATRONI_RESTAPI_CIPHERS: (optional) Specifies the permitted cipher suites (e.g. “ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:!SSLv1:!SSLv2:!SSLv3:!TLSv1:!TLSv1.1”)
PATRONI_RESTAPI_VERIFY_CLIENT: none (default), optional or required. When none, the REST API will not check client certificates. When required, client certificates are required for all REST API calls. When optional, client certificates are required for all unsafe REST API endpoints. When required is used, client authentication succeeds if the certificate signature verification succeeds. For optional, the client cert will only be checked for PUT, POST, PATCH, and DELETE requests.
PATRONI_RESTAPI_ALLOWLIST: (optional) Specifies the set of hosts that are allowed to call unsafe REST API endpoints. A single element could be a host name, an IP address or a network address using CIDR notation. By default allow all is used. If allowlist or allowlist_include_members are set, anything that is not included is rejected.
PATRONI_RESTAPI_ALLOWLIST_INCLUDE_MEMBERS: (optional) If set to true, it allows accessing unsafe REST API endpoints from other cluster members registered in the DCS (the IP address or hostname is taken from the member's api_url). Be careful: the OS might use a different IP for outgoing connections.
PATRONI_RESTAPI_HTTP_EXTRA_HEADERS: (optional) HTTP headers let the REST API server pass additional information with an HTTP response.
PATRONI_RESTAPI_HTTPS_EXTRA_HEADERS: (optional) HTTPS headers let the REST API server pass additional information with an HTTP response when TLS is enabled. This will also pass additional information set in http_extra_headers.
PATRONI_RESTAPI_REQUEST_QUEUE_SIZE: (optional) Sets the request queue size for the TCP socket used by the Patroni REST API. Once the queue is full, further requests get a "Connection denied" error. The default value is 5.
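An allowlist check of the kind described above (entries as single IPs or CIDR networks; hostname resolution left out) can be sketched with Python's standard ipaddress module. This is an illustrative helper, not Patroni's actual code:

```python
import ipaddress

def ip_allowed(client_ip: str, allowlist: list) -> bool:
    """Return True if client_ip falls into any allowlist entry.

    Entries may be single IP addresses or networks in CIDR notation;
    a bare IP is treated as a /32 (or /128) host network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(entry, strict=False)
               for entry in allowlist)

ip_allowed("10.0.1.7", ["10.0.1.0/24"])      # True
ip_allowed("192.168.5.1", ["10.0.1.0/24"])   # False
```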
Warning
The PATRONI_RESTAPI_CONNECT_ADDRESS must be accessible from all nodes of a given Patroni cluster. Internally Patroni uses it during the leader race to find nodes with minimal replication lag.
If you enabled client certificate validation (PATRONI_RESTAPI_VERIFY_CLIENT is set to required), you also must provide valid client certificates in PATRONI_CTL_CERTFILE, PATRONI_CTL_KEYFILE and PATRONI_CTL_KEYFILE_PASSWORD. If not provided, Patroni will not work correctly.
CTL
PATRONICTL_CONFIG_FILE: (optional) location of the configuration file.
PATRONI_CTL_USERNAME: (optional) Basic-auth username for accessing protected REST API endpoints. If not provided patronictl will use the value provided for REST API “username” parameter.
PATRONI_CTL_PASSWORD: (optional) Basic-auth password for accessing protected REST API endpoints. If not provided patronictl will use the value provided for REST API “password” parameter.
PATRONI_CTL_INSECURE: (optional) Allow connections to REST API without verifying SSL certs.
PATRONI_CTL_CACERT: (optional) Specifies the CA_BUNDLE file or directory with certificates of trusted CAs to use while verifying REST API SSL certs. If not provided, patronictl will use the value provided for the REST API "cafile" parameter.
PATRONI_CTL_CERTFILE: (optional) Specifies the file with the client certificate in the PEM format.
PATRONI_CTL_KEYFILE: (optional) Specifies the file with the client secret key in the PEM format.
PATRONI_CTL_KEYFILE_PASSWORD: (optional) Specifies a password for decrypting the client keyfile.
There are 3 types of Patroni configuration:
- Global dynamic configuration.
These options are stored in the DCS (Distributed Configuration Store) and applied on all cluster nodes. Dynamic configuration can be set at any time using the patronictl edit-config tool or the Patroni REST API. If the options changed are not part of the startup configuration, they are applied asynchronously (upon the next wake up cycle) to every node, which gets subsequently reloaded. If the node requires a restart to apply the configuration (for PostgreSQL parameters with context postmaster, if their values have changed), a special flag pending_restart indicating this is set in the members.data JSON. Additionally, the node status indicates this by showing "restart_pending": true.
- Local configuration file (patroni.yml).
These options are defined in the configuration file and take precedence over dynamic configuration. patroni.yml can be changed and reloaded at runtime (without restarting Patroni) by sending SIGHUP to the Patroni process, performing a POST /reload REST API request, or executing patronictl reload. Local configuration can be either a single YAML file or a directory. When it is a directory, all YAML files in that directory are loaded one by one in sorted order. In case a key is defined in multiple files, the occurrence in the last file takes precedence.
- Environment configuration.
It is possible to set/override some of the "Local" configuration parameters with environment variables. Environment configuration is very useful when you are running in a dynamic environment and you don't know some of the parameters in advance (for example, it's not possible to know your external IP address when you are running inside docker).
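The "last file wins" behavior of a local configuration directory can be sketched as a merge in sorted filename order. Real Patroni parses YAML files; plain dicts stand in for the parsed files in this illustration:

```python
# Hypothetical parsed contents of two files in a config directory.
parsed_files = {
    "10-base.yml": {"scope": "demo", "ttl": 30},
    "20-site.yml": {"ttl": 25, "loop_wait": 10},
}

config = {}
for name in sorted(parsed_files):       # files are loaded in sorted order
    config.update(parsed_files[name])   # later files override earlier keys

# config == {"scope": "demo", "ttl": 25, "loop_wait": 10}
```

Here ttl comes from 20-site.yml because it sorts after 10-base.yml, while scope survives from the earlier file.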
Important rules
PostgreSQL parameters controlled by Patroni
Some of the PostgreSQL parameters must hold the same values on the primary and the replicas. For those, values set either in the local patroni configuration files or via the environment variables have no effect. To alter or set their values one must change the shared configuration in the DCS. Below is the actual list of such parameters together with the default values:
max_connections: 100
max_locks_per_transaction: 64
max_worker_processes: 8
max_prepared_transactions: 0
wal_level: hot_standby
track_commit_timestamp: off
For the parameters below, PostgreSQL does not require equal values among the primary and all the replicas. However, considering the possibility that a replica may become the primary at any time, it doesn't really make sense to set them differently; therefore, Patroni restricts setting their values to the dynamic configuration.
max_wal_senders: 5
max_replication_slots: 5
wal_keep_segments: 8
wal_keep_size: 128MB
These parameters are validated to ensure they are sane, or meet a minimum value.
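Such a minimum-value check could look roughly like the sketch below. The minima shown are illustrative (taken from the defaults listed above), not Patroni's exact validation logic:

```python
# Illustrative only: clamp dynamic-configuration values to a minimum.
MINIMUMS = {"max_wal_senders": 5, "max_replication_slots": 5}

def enforce_minimums(params: dict) -> dict:
    """Raise any value below its known minimum up to that minimum;
    parameters without a known minimum pass through unchanged."""
    return {k: max(v, MINIMUMS[k]) if k in MINIMUMS else v
            for k, v in params.items()}

enforce_minimums({"max_wal_senders": 2, "wal_keep_segments": 8})
```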
There are some other Postgres parameters controlled by Patroni:
listen_addresses - set either from postgresql.listen or from the PATRONI_POSTGRESQL_LISTEN environment variable
port - set either from postgresql.listen or from the PATRONI_POSTGRESQL_LISTEN environment variable
cluster_name - set either from scope or from the PATRONI_SCOPE environment variable
hot_standby: on
wal_log_hints: on - for Postgres 9.4 and newer.
To be on the safe side, parameters from the above lists are not written into postgresql.conf, but passed as a list of arguments to pg_ctl start, which gives them the highest precedence, even above ALTER SYSTEM.
There are also some parameters, like postgresql.listen and postgresql.data_dir, that can be set only locally, i.e. in the Patroni config file or via environment variables. In most cases the local configuration will override the dynamic configuration.
When applying the local or dynamic configuration options, the following actions are taken:
The node first checks if there is a postgresql.base.conf file or if the custom_conf parameter is set.
If the custom_conf parameter is set, the file it specifies is used as the base configuration, ignoring postgresql.base.conf and postgresql.conf.
If the custom_conf parameter is not set and postgresql.base.conf exists, it contains the renamed "original" configuration and is used as the base configuration.
If there is no custom_conf nor postgresql.base.conf, the original postgresql.conf is renamed to postgresql.base.conf and used as the base configuration.
The dynamic options (with the exceptions above) are dumped into postgresql.conf and an include is set in postgresql.conf to the base configuration (either postgresql.base.conf or the file at custom_conf). Therefore, we are able to apply new options without re-reading the configuration file to check if the include is present or not.
Some parameters that are essential for Patroni to manage the cluster are overridden using the command line.
If an option that requires restart is changed (we should look at the context in pg_settings and at the actual values of those options), a pending_restart flag is set on that node. This flag is reset on any restart.
The parameters are applied in the following order (run-time parameters are given the highest priority):
load parameters from the postgresql.base.conf file (or from a custom_conf file, if set)
load parameters from the postgresql.conf file
load parameters from the postgresql.auto.conf file
run-time parameters using -o --name=value
This allows configuration for all the nodes (2), configuration for a specific node using ALTER SYSTEM (3), and ensures that parameters essential to the running of Patroni are enforced (4), as well as leaving room for configuration tools that manage postgresql.conf directly without involving Patroni (1).
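This layered precedence can be modeled with Python's collections.ChainMap, which looks keys up left to right, so the highest-priority layer goes first. The parameter values below are made up for illustration:

```python
from collections import ChainMap

# Layers in increasing priority: base conf < postgresql.conf <
# postgresql.auto.conf (ALTER SYSTEM) < run-time command-line options.
base_conf = {"shared_buffers": "128MB", "work_mem": "4MB"}
postgresql_conf = {"work_mem": "8MB"}   # written by Patroni
auto_conf = {"work_mem": "16MB"}        # set via ALTER SYSTEM
runtime = {"max_connections": "100"}    # -o --name=value, always wins

effective = ChainMap(runtime, auto_conf, postgresql_conf, base_conf)
effective["work_mem"]         # "16MB" - ALTER SYSTEM overrides the files
effective["shared_buffers"]   # "128MB" - falls through to the base conf
```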
Patroni configuration parameters
Also the following Patroni configuration options can be changed only dynamically:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
max_timelines_history: 0
check_timeline: false
postgresql.use_slots: true
Upon changing these options, Patroni will read the relevant section of the configuration stored in DCS and change its run-time values.
Patroni nodes dump the state of the DCS options to disk upon every change of the configuration into the file patroni.dynamic.json located in the Postgres data directory. Only the leader is allowed to restore these options from the on-disk dump if they are completely absent from the DCS or if they are invalid.
Configuration generation and validation
Patroni provides command-line interfaces for Patroni local configuration generation and validation. Using the patroni executable you can:
Create a sample local Patroni configuration;
Create a Patroni configuration file for the locally running PostgreSQL instance (e.g. as a preparation step for the Patroni integration);
Validate a given Patroni configuration file.
Sample Patroni configuration
patroni --generate-sample-config [configfile]
Description
Generate a sample Patroni configuration file in yaml format.
Parameter values are taken from the Environment configuration; otherwise, if not set, the defaults used in Patroni or the #FIXME string for values that should be later defined by the user.
Some default values are defined based on the local setup:
postgresql.listen: the IP address returned by a gethostname call for the current machine's hostname, and the standard 5432 port.
postgresql.connect_address: the IP address returned by a gethostname call for the current machine's hostname, and the standard 5432 port.
postgresql.authentication.rewind: only defined if the PostgreSQL version can be determined from the binary and the version is 11 or later.
restapi.listen: the IP address returned by a gethostname call for the current machine's hostname, and the standard 8008 port.
restapi.connect_address: the IP address returned by a gethostname call for the current machine's hostname, and the standard 8008 port.
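A sketch of how such a hostname-based default could be derived, assuming simple gethostname/gethostbyname resolution (Patroni's actual logic may differ):

```python
import socket

def default_address(port: int) -> str:
    """Resolve the machine's hostname to an IP address and append
    the given standard port, falling back to loopback if resolution
    fails in a minimal environment."""
    hostname = socket.gethostname()
    try:
        ip = socket.gethostbyname(hostname)
    except socket.gaierror:
        ip = "127.0.0.1"
    return f"{ip}:{port}"

default_address(8008)   # e.g. "10.0.0.5:8008"
```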
Parameters
configfile - full path to the configuration file used to store the result. If not provided, the result is sent to stdout.
Patroni configuration for a running instance
patroni --generate-config [--dsn DSN] [configfile]
Description
Generate a Patroni configuration in yaml format for the locally running PostgreSQL instance.
Either the provided DSN (which takes precedence) or the PostgreSQL environment variables will be used for the PostgreSQL connection. If the password is not provided, it will be requested via a prompt.
All the non-internal GUCs defined in the source Postgres instance, regardless of whether they were set through a configuration file, through the postmaster command line, or through environment variables, will be used as the source for the following Patroni configuration parameters:
scope: the cluster_name GUC value;
postgresql.listen: the listen_addresses and port GUC values;
postgresql.datadir: the data_directory GUC value;
postgresql.parameters: the archive_command, restore_command, archive_cleanup_command, recovery_end_command, ssl_passphrase_command, hba_file, ident_file and config_file GUC values;
bootstrap.dcs: all other gathered PostgreSQL GUCs.
If scope, postgresql.listen or postgresql.datadir is not set from the Postgres GUCs, the respective Environment configuration value is used.
Other rules applied for the values definition:
name: the PATRONI_NAME environment variable value if set, otherwise the current machine's hostname.
postgresql.bin_dir: path to the Postgres binaries gathered from the running instance.
postgresql.connect_address: the IP address returned by a gethostname call for the current machine's hostname, and the port used for the instance connection or the port GUC value.
postgresql.authentication.superuser: the configuration used for the instance connection;
postgresql.pg_hba: the lines gathered from the source instance's hba_file.
postgresql.pg_ident: the lines gathered from the source instance's ident_file.
restapi.listen: the IP address returned by a gethostname call for the current machine's hostname, and the standard 8008 port.
restapi.connect_address: the IP address returned by a gethostname call for the current machine's hostname, and the standard 8008 port.
Other parameters defined using Environment configuration are also included into the configuration.
Parameters
configfile - full path to the configuration file used to store the result. If not provided, the result is sent to stdout.
dsn - optional DSN string for the local PostgreSQL instance to get GUC values from.
Validate Patroni configuration
patroni --validate-config [configfile]
Description
Validate the given Patroni configuration and print the information about the failed checks.
Parameters
configfile - full path to the configuration file to check. If not given or the file does not exist, Patroni will try to read from the PATRONI_CONFIG_VARIABLE environment variable or, if not set, from the Patroni environment variables.
Patroni REST API
Patroni has a rich REST API, which is used by Patroni itself during the leader race, by the patronictl tool in order to perform failovers/switchovers/reinitialize/restarts/reloads, by HAProxy or any other kind of load balancer to perform HTTP health checks, and of course could also be used for monitoring. Below you will find the list of Patroni REST API endpoints.
Health check endpoints
For all health check GET requests Patroni returns a JSON document with the status of the node, along with the HTTP status code. If you don't want or don't need the JSON document, you might consider using the HEAD or OPTIONS method instead of GET.
The following requests to Patroni REST API will return HTTP status code 200 only when the Patroni node is running as the primary with leader lock:
GET /
GET /primary
GET /read-write
GET /standby-leader: returns HTTP status code 200 only when the Patroni node is running as the leader in a standby cluster.
GET /leader: returns HTTP status code 200 when the Patroni node has the leader lock. The major difference from the two previous endpoints is that it doesn't take into account whether PostgreSQL is running as the primary or the standby_leader.
GET /replica: replica health check endpoint. It returns HTTP status code 200 only when the Patroni node is in the state running, the role is replica and the noloadbalance tag is not set.
GET /replica?lag=<max-lag>: replica check endpoint. In addition to the checks from replica, it also checks replication latency and returns status code 200 only when it is below the specified value. The key cluster.last_leader_operation from the DCS is used for the leader WAL position to compute latency on the replica for performance reasons. max-lag can be specified in bytes (integer) or in human-readable values, e.g. 16kB, 64MB, 1GB.
GET /replica?lag=1048576
GET /replica?lag=1024kB
GET /replica?lag=10MB
GET /replica?lag=1GB
GET /replica?tag_key1=value1&tag_key2=value2: replica check endpoint. In addition, it will also check for user-defined tags key1 and key2 and their respective values in the tags section of the yaml configuration. If a tag isn't defined for an instance, or if the value in the yaml configuration doesn't match the query value, it will return HTTP status code 503.
In the following requests, since we are checking for the leader or standby-leader status, Patroni doesn't apply any of the user-defined tags and they will be ignored.
GET /?tag_key1=value1&tag_key2=value2
GET /leader?tag_key1=value1&tag_key2=value2
GET /primary?tag_key1=value1&tag_key2=value2
GET /read-write?tag_key1=value1&tag_key2=value2
GET /standby_leader?tag_key1=value1&tag_key2=value2
GET /standby-leader?tag_key1=value1&tag_key2=value2
GET /read-only: like the above endpoint, but also includes the primary.
GET /synchronous or GET /sync: returns HTTP status code 200 only when the Patroni node is running as a synchronous standby.
GET /read-only-sync: like the above endpoint, but also includes the primary.
GET /asynchronous or GET /async: returns HTTP status code 200 only when the Patroni node is running as an asynchronous standby.
GET /asynchronous?lag=<max-lag> or GET /async?lag=<max-lag>: asynchronous standby check endpoint. In addition to the checks from asynchronous or async, it also checks replication latency and returns status code 200 only when it is below the specified value. The key cluster.last_leader_operation from the DCS is used for the leader WAL position to compute latency on the replica for performance reasons. max-lag can be specified in bytes (integer) or in human-readable values, e.g. 16kB, 64MB, 1GB.
GET /async?lag=1048576
GET /async?lag=1024kB
GET /async?lag=10MB
GET /async?lag=1GB
GET /health: returns HTTP status code 200 only when PostgreSQL is up and running.
GET /liveness: returns HTTP status code 200 if the Patroni heartbeat loop is properly running and 503 if the last run was more than ttl seconds ago on the primary or 2*ttl on the replica. Could be used for livenessProbe.
GET /readiness: returns HTTP status code 200 when the Patroni node is running as the leader or when PostgreSQL is up and running. The endpoint could be used for readinessProbe when it is not possible to use Kubernetes endpoints for leader elections (OpenShift).
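The max-lag query value accepts an integer number of bytes or a human-readable suffix such as 16kB, 64MB or 1GB. A hypothetical parser for that format (not Patroni's actual code) could look like this:

```python
# Unit factors for the human-readable lag suffixes mentioned above.
UNITS = {"kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

def parse_max_lag(value: str) -> int:
    """Convert a max-lag value such as "1024kB" or "1048576" to bytes."""
    for suffix, factor in UNITS.items():
        if value.endswith(suffix):
            return int(value[:-len(suffix)]) * factor
    return int(value)  # plain integer means bytes

parse_max_lag("1024kB")   # 1048576, same as parse_max_lag("1048576")
```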
Both the readiness and liveness endpoints are very lightweight and do not execute any SQL. Probes should be configured in such a way that they start failing about the time the leader key is expiring. With the default ttl value of 30s, example probes would look like:
readinessProbe:
httpGet:
scheme: HTTP
path: /readiness
port: 8008
initialDelaySeconds: 3
periodSeconds: 10
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
livenessProbe:
httpGet:
scheme: HTTP
path: /liveness
port: 8008
initialDelaySeconds: 3
periodSeconds: 10
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
Monitoring endpoint
The GET /patroni endpoint is used by Patroni during the leader race. It could also be used by your monitoring system. The JSON document produced by this endpoint has the same structure as the JSON produced by the health check endpoints.
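For instance, a monitoring script might interpret a parsed /patroni response roughly like this. The helper is hypothetical, not part of Patroni, and relies on the fields shown in the example responses in this section:

```python
def classify(status: dict) -> str:
    """Summarize a parsed /patroni JSON document as a short label."""
    if status.get("pause"):
        return "paused"
    if status.get("cluster_unlocked"):
        return "no leader lock held"
    if status.get("role") in ("master", "primary"):
        return "primary"
    return status.get("role", "unknown")

classify({"state": "running", "role": "master"})          # "primary"
classify({"role": "replica", "cluster_unlocked": True})   # "no leader lock held"
```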
Example: A healthy cluster
$ curl -s http://localhost:8008/patroni | jq .
{
"state": "running",
"postmaster_start_time": "2023-08-18 11:03:37.966359+00:00",
"role": "master",
"server_version": 150004,
"xlog": {
"location": 67395656
},
"timeline": 1,
"replication": [
{
"usename": "replicator",
"application_name": "patroni2",
"client_addr": "10.89.0.6",
"state": "streaming",
"sync_state": "async",
"sync_priority": 0
},
{
"usename": "replicator",
"application_name": "patroni3",
"client_addr": "10.89.0.2",
"state": "streaming",
"sync_state": "async",
"sync_priority": 0
}
],
"dcs_last_seen": 1692356718,
"tags": {
"clonefrom": true
},
"database_system_identifier": "7268616322854375442",
"patroni": {
"version": "3.1.0",
"scope": "demo",
"name": "patroni1"
}
}
Example: An unlocked cluster
$ curl -s http://localhost:8008/patroni | jq .
{
"state": "running",
"postmaster_start_time": "2023-08-18 11:09:08.615242+00:00",
"role": "replica",
"server_version": 150004,
"xlog": {
"received_location": 67419744,
"replayed_location": 67419744,
"replayed_timestamp": null,
"paused": false
},
"timeline": 1,
"replication": [
{
"usename": "replicator",
"application_name": "patroni2",
"client_addr": "10.89.0.6",
"state": "streaming",
"sync_state": "async",
"sync_priority": 0
},
{
"usename": "replicator",
"application_name": "patroni3",
"client_addr": "10.89.0.2",
"state": "streaming",
"sync_state": "async",
"sync_priority": 0
}
],
"cluster_unlocked": true,
"dcs_last_seen": 1692356928,
"tags": {
"clonefrom": true
},
"database_system_identifier": "7268616322854375442",
"patroni": {
"version": "3.1.0",
"scope": "demo",
"name": "patroni1"
}
}
Example: An unlocked cluster with DCS failsafe mode enabled
$ curl -s http://localhost:8008/patroni | jq .
{
"state": "running",
"postmaster_start_time": "2023-08-18 11:09:08.615242+00:00",
"role": "replica",
"server_version": 150004,
"xlog": {
"location": 67420024
},
"timeline": 1,
"replication": [
{
"usename": "replicator",
"application_name": "patroni2",
"client_addr": "10.89.0.6",
"state": "streaming",
"sync_state": "async",
"sync_priority": 0
},
{
"usename": "replicator",
"application_name": "patroni3",
"client_addr": "10.89.0.2",
"state": "streaming",
"sync_state": "async",
"sync_priority": 0
}
],
"cluster_unlocked": true,
"failsafe_mode_is_active": true,
"dcs_last_seen": 1692356928,
"tags": {
"clonefrom": true
},
"database_system_identifier": "7268616322854375442",
"patroni": {
"version": "3.1.0",
"scope": "demo",
"name": "patroni1"
}
}
Example: A cluster with the pause mode enabled
$ curl -s http://localhost:8008/patroni | jq .
{
"state": "running",
"postmaster_start_time": "2023-08-18 11:09:08.615242+00:00",
"role": "replica",
"server_version": 150004,
"xlog": {
"location": 67420024
},
"timeline": 1,
"replication": [
{
"usename": "replicator",
"application_name": "patroni2",
"client_addr": "10.89.0.6",
"state": "streaming",
"sync_state": "async",
"sync_priority": 0
},
{
"usename": "replicator",
"application_name": "patroni3",
"client_addr": "10.89.0.2",
"state": "streaming",
"sync_state": "async",
"sync_priority": 0
}
],
"pause": true,
"dcs_last_seen": 1692356928,
"tags": {
"clonefrom": true
},
"database_system_identifier": "7268616322854375442",
"patroni": {
"version": "3.1.0",
"scope": "demo",
"name": "patroni1"
}
}
Retrieve the Patroni metrics in Prometheus format through the GET /metrics endpoint.
$ curl http://localhost:8008/metrics
# HELP patroni_version Patroni semver without periods.
# TYPE patroni_version gauge
patroni_version{scope="batman",name="patroni1"} 020103
# HELP patroni_postgres_running Value is 1 if Postgres is running, 0 otherwise.
# TYPE patroni_postgres_running gauge
patroni_postgres_running{scope="batman",name="patroni1"} 1
# HELP patroni_postmaster_start_time Epoch seconds since Postgres started.
# TYPE patroni_postmaster_start_time gauge
patroni_postmaster_start_time{scope="batman",name="patroni1"} 1657656955.179243
# HELP patroni_master Value is 1 if this node is the leader, 0 otherwise.
# TYPE patroni_master gauge
patroni_master{scope="batman",name="patroni1"} 1
# HELP patroni_primary Value is 1 if this node is the leader, 0 otherwise.
# TYPE patroni_primary gauge
patroni_primary{scope="batman",name="patroni1"} 1
# HELP patroni_xlog_location Current location of the Postgres transaction log, 0 if this node is not the leader.
# TYPE patroni_xlog_location counter
patroni_xlog_location{scope="batman",name="patroni1"} 22320573386952
# HELP patroni_standby_leader Value is 1 if this node is the standby_leader, 0 otherwise.
# TYPE patroni_standby_leader gauge
patroni_standby_leader{scope="batman",name="patroni1"} 0
# HELP patroni_replica Value is 1 if this node is a replica, 0 otherwise.
# TYPE patroni_replica gauge
patroni_replica{scope="batman",name="patroni1"} 0
# HELP patroni_sync_standby Value is 1 if this node is a sync standby replica, 0 otherwise.
# TYPE patroni_sync_standby gauge
patroni_sync_standby{scope="batman",name="patroni1"} 0
# HELP patroni_xlog_received_location Current location of the received Postgres transaction log, 0 if this node is not a replica.
# TYPE patroni_xlog_received_location counter
patroni_xlog_received_location{scope="batman",name="patroni1"} 0
# HELP patroni_xlog_replayed_location Current location of the replayed Postgres transaction log, 0 if this node is not a replica.
# TYPE patroni_xlog_replayed_location counter
patroni_xlog_replayed_location{scope="batman",name="patroni1"} 0
# HELP patroni_xlog_replayed_timestamp Current timestamp of the replayed Postgres transaction log, 0 if null.
# TYPE patroni_xlog_replayed_timestamp gauge
patroni_xlog_replayed_timestamp{scope="batman",name="patroni1"} 0
# HELP patroni_xlog_paused Value is 1 if the Postgres xlog is paused, 0 otherwise.
# TYPE patroni_xlog_paused gauge
patroni_xlog_paused{scope="batman",name="patroni1"} 0
# HELP patroni_postgres_streaming Value is 1 if Postgres is streaming, 0 otherwise.
# TYPE patroni_postgres_streaming gauge
patroni_postgres_streaming{scope="batman",name="patroni1"} 1
# HELP patroni_postgres_in_archive_recovery Value is 1 if Postgres is replicating from archive, 0 otherwise.
# TYPE patroni_postgres_in_archive_recovery gauge
patroni_postgres_in_archive_recovery{scope="batman",name="patroni1"} 0
# HELP patroni_postgres_server_version Version of Postgres (if running), 0 otherwise.
# TYPE patroni_postgres_server_version gauge
patroni_postgres_server_version{scope="batman",name="patroni1"} 140004
# HELP patroni_cluster_unlocked Value is 1 if the cluster is unlocked, 0 if locked.
# TYPE patroni_cluster_unlocked gauge
patroni_cluster_unlocked{scope="batman",name="patroni1"} 0
# HELP patroni_failsafe_mode_is_active Value is 1 if failsafe mode is active, 0 otherwise.
# TYPE patroni_failsafe_mode_is_active gauge
patroni_failsafe_mode_is_active{scope="batman",name="patroni1"} 0
# HELP patroni_postgres_timeline Postgres timeline of this node (if running), 0 otherwise.
# TYPE patroni_postgres_timeline counter
patroni_postgres_timeline{scope="batman",name="patroni1"} 24
# HELP patroni_dcs_last_seen Epoch timestamp when DCS was last contacted successfully by Patroni.
# TYPE patroni_dcs_last_seen gauge
patroni_dcs_last_seen{scope="batman",name="patroni1"} 1677658321
# HELP patroni_pending_restart Value is 1 if the node needs a restart, 0 otherwise.
# TYPE patroni_pending_restart gauge
patroni_pending_restart{scope="batman",name="patroni1"} 1
# HELP patroni_is_paused Value is 1 if auto failover is disabled, 0 otherwise.
# TYPE patroni_is_paused gauge
patroni_is_paused{scope="batman",name="patroni1"} 1
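Outside of a Prometheus server, this exposition format is also easy to consume directly, e.g. in a monitoring script. Below is a minimal, illustrative parser; `parse_metrics` is a hypothetical helper, not part of Patroni, and the sample text is an abbreviated copy of the output above:

```python
# Minimal parser for the Prometheus text exposition format served by /metrics.
# Illustrative sketch only; a real deployment should use a Prometheus client.
def parse_metrics(text):
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        name_part, value = line.rsplit(" ", 1)
        # strip the {label="..."} block, if any, keeping only the metric name
        name = name_part.split("{", 1)[0]
        metrics[name] = float(value)
    return metrics

sample = """\
# HELP patroni_postgres_running Value is 1 if Postgres is running, 0 otherwise.
# TYPE patroni_postgres_running gauge
patroni_postgres_running{scope="batman",name="patroni1"} 1
# HELP patroni_pending_restart Value is 1 if the node needs a restart, 0 otherwise.
# TYPE patroni_pending_restart gauge
patroni_pending_restart{scope="batman",name="patroni1"} 1
"""

metrics = parse_metrics(sample)
```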
Cluster status endpoints
The GET /cluster endpoint generates a JSON document describing the current cluster topology and state:
$ curl -s http://localhost:8008/cluster | jq .
{
"members": [
{
"name": "patroni1",
"role": "leader",
"state": "running",
"api_url": "http://10.89.0.4:8008/patroni",
"host": "10.89.0.4",
"port": 5432,
"timeline": 5,
"tags": {
"clonefrom": true
}
},
{
"name": "patroni2",
"role": "replica",
"state": "streaming",
"api_url": "http://10.89.0.6:8008/patroni",
"host": "10.89.0.6",
"port": 5433,
"timeline": 5,
"tags": {
"clonefrom": true
},
"lag": 0
}
],
"scope": "demo",
"scheduled_switchover": {
"at": "2023-09-24T10:36:00+02:00",
"from": "patroni1",
"to": "patroni3"
}
}
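Scripts can consume this document directly, for example to locate the leader or to flag lagging replicas. A sketch using a trimmed copy of the response above (the 10 MB lag threshold is an arbitrary assumption for illustration):

```python
import json

# A trimmed /cluster response, mirroring the fields shown above.
cluster = json.loads("""
{
  "members": [
    {"name": "patroni1", "role": "leader", "state": "running", "timeline": 5},
    {"name": "patroni2", "role": "replica", "state": "streaming", "timeline": 5, "lag": 0}
  ],
  "scope": "demo"
}
""")

# Find the leader by role.
leader = next(m["name"] for m in cluster["members"] if m["role"] == "leader")

# Flag replicas whose reported lag exceeds a threshold (hypothetical check).
LAG_THRESHOLD_BYTES = 10 * 1024 * 1024
lagging = [m["name"] for m in cluster["members"]
           if m["role"] == "replica" and m.get("lag", 0) > LAG_THRESHOLD_BYTES]
```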
The GET /history endpoint provides a view of the history of cluster switchovers/failovers. The format is very similar to the content of history files in the pg_wal directory. The only difference is the timestamp field showing when the new timeline was created.
$ curl -s http://localhost:8008/history | jq .
[
[
1,
25623960,
"no recovery target specified",
"2019-09-23T16:57:57+02:00"
],
[
2,
25624344,
"no recovery target specified",
"2019-09-24T09:22:33+02:00"
],
[
3,
25624752,
"no recovery target specified",
"2019-09-24T09:26:15+02:00"
],
[
4,
50331856,
"no recovery target specified",
"2019-09-24T09:35:52+02:00"
]
]
Config endpoint
GET /config: Get the current version of the dynamic configuration:
$ curl -s http://localhost:8008/config | jq .
{
"ttl": 30,
"loop_wait": 10,
"retry_timeout": 10,
"maximum_lag_on_failover": 1048576,
"postgresql": {
"use_slots": true,
"use_pg_rewind": true,
"parameters": {
"hot_standby": "on",
"wal_level": "hot_standby",
"max_wal_senders": 5,
"max_replication_slots": 5,
"max_connections": "100"
}
}
}
PATCH /config: Change the existing configuration.
$ curl -s -XPATCH -d \
'{"loop_wait":5,"ttl":20,"postgresql":{"parameters":{"max_connections":"101"}}}' \
http://localhost:8008/config | jq .
{
"ttl": 20,
"loop_wait": 5,
"maximum_lag_on_failover": 1048576,
"retry_timeout": 10,
"postgresql": {
"use_slots": true,
"use_pg_rewind": true,
"parameters": {
"hot_standby": "on",
"wal_level": "hot_standby",
"max_wal_senders": 5,
"max_replication_slots": 5,
"max_connections": "101"
}
}
}
The above REST API call patches the existing configuration and returns the new configuration.
Let’s check that the node processed this configuration. First of all, it should start printing log lines every 5 seconds (loop_wait=5). The change of max_connections requires a restart, so the pending_restart flag should be exposed:
$ curl -s http://localhost:8008/patroni | jq .
{
"pending_restart": true,
"database_system_identifier": "6287881213849985952",
"postmaster_start_time": "2016-06-13 13:13:05.211 CEST",
"xlog": {
"location": 2197818976
},
"patroni": {
"version": "1.0",
"scope": "batman",
"name": "patroni1"
},
"state": "running",
"role": "master",
"server_version": 90503
}
Removing parameters:
If you want to remove (reset) a setting, just patch it with null:
$ curl -s -XPATCH -d \
'{"postgresql":{"parameters":{"max_connections":null}}}' \
http://localhost:8008/config | jq .
{
"ttl": 20,
"loop_wait": 5,
"retry_timeout": 10,
"maximum_lag_on_failover": 1048576,
"postgresql": {
"use_slots": true,
"use_pg_rewind": true,
"parameters": {
"hot_standby": "on",
"unix_socket_directories": ".",
"wal_level": "hot_standby",
"max_wal_senders": 5,
"max_replication_slots": 5
}
}
}
The above call removes postgresql.parameters.max_connections
from the dynamic configuration.
PUT /config: It’s also possible to perform a full rewrite of the existing dynamic configuration unconditionally:
$ curl -s -XPUT -d \
'{"maximum_lag_on_failover":1048576,"retry_timeout":10,"postgresql":{"use_slots":true,"use_pg_rewind":true,"parameters":{"hot_standby":"on","wal_level":"hot_standby","unix_socket_directories":".","max_wal_senders":5}},"loop_wait":3,"ttl":20}' \
http://localhost:8008/config | jq .
{
"ttl": 20,
"maximum_lag_on_failover": 1048576,
"retry_timeout": 10,
"postgresql": {
"use_slots": true,
"parameters": {
"hot_standby": "on",
"unix_socket_directories": ".",
"wal_level": "hot_standby",
"max_wal_senders": 5
},
"use_pg_rewind": true
},
"loop_wait": 3
}
Switchover and failover endpoints
Switchover
The /switchover endpoint only works when the cluster is healthy (there is a leader). It also allows scheduling a switchover at a given time.
When calling the /switchover endpoint, a candidate can be specified but is not required, in contrast to the /failover endpoint. If a candidate is not provided, all the eligible nodes of the cluster will participate in the leader race after the leader steps down.
In the JSON body of the POST request you must specify the leader field. The candidate field is optional; the scheduled_at field is also optional and can be used to schedule the switchover at a specific time.
Depending on the situation, requests may return different HTTP status codes and bodies. Status code 200 is returned when the switchover or failover completes successfully. If the switchover was successfully scheduled, Patroni returns HTTP status code 202. If something goes wrong, an error status code (one of 400, 412, or 503) is returned, with some details in the response body.
DELETE /switchover can be used to delete the currently scheduled switchover.
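A client wrapping these endpoints can branch on the returned status code. The helper below is a hypothetical sketch; only the meaning of each code comes from the description above:

```python
def interpret_switchover_response(status_code):
    """Map an HTTP status code returned by POST /switchover or POST /failover
    to a short outcome label (hypothetical helper, not part of Patroni)."""
    if status_code == 200:
        return "completed"   # switchover/failover finished successfully
    if status_code == 202:
        return "scheduled"   # switchover was scheduled for later
    if status_code in (400, 412, 503):
        return "failed"      # details are in the response body
    return "unexpected"
```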
Example: perform a switchover to any healthy standby
$ curl -s http://localhost:8008/switchover -XPOST -d '{"leader":"postgresql1"}'
Successfully switched over to "postgresql2"
Example: perform a switchover to a specific node
$ curl -s http://localhost:8008/switchover -XPOST -d \
'{"leader":"postgresql1","candidate":"postgresql2"}'
Successfully switched over to "postgresql2"
Example: schedule a switchover from the leader to any other healthy standby in the cluster at a specific time.
$ curl -s http://localhost:8008/switchover -XPOST -d \
'{"leader":"postgresql0","scheduled_at":"2019-09-24T12:00+00"}'
Switchover scheduled
Failover
The /failover endpoint can be used to perform a manual failover when there are no healthy nodes (e.g. to fail over to an asynchronous standby if none of the synchronous standbys is healthy enough to promote). However, there is no requirement that the cluster have no leader; a failover can also be run on a healthy cluster.
In the JSON body of the POST request you must specify the candidate field. If the leader field is specified, a switchover is triggered instead.
Example:
$ curl -s http://localhost:8008/failover -XPOST -d '{"candidate":"postgresql1"}'
Successfully failed over to "postgresql1"
Warning
Be very careful when using this endpoint, as this can cause data loss in certain situations. In most cases, the switchover endpoint satisfies the administrator’s needs.
POST /switchover
and POST /failover
endpoints are used by patronictl switchover and patronictl failover, respectively.
DELETE /switchover
is used by patronictl flush cluster-name switchover.
|                              | Failover | Switchover                         |
|------------------------------|----------|------------------------------------|
| Requires leader specified    | no       | yes                                |
| Requires candidate specified | yes      | no                                 |
| Can be run in pause          | yes      | yes (only to a specific candidate) |
| Can be scheduled             | no       | yes (if not in pause)              |
Healthy standby
There are a couple of checks that a cluster member must pass to be able to participate in the leader race during a switchover, or to become the leader as a failover/switchover candidate:
- be reachable via the Patroni API;
- not have the nofailover tag set to true;
- have the watchdog fully functional (if required by the configuration);
- in case of a switchover in a healthy cluster or an automatic failover, not exceed the maximum replication lag (maximum_lag_on_failover configuration parameter);
- in case of a switchover in a healthy cluster or an automatic failover, not have a timeline number smaller than the cluster timeline, if the check_timeline configuration parameter is set to true;
- in synchronous mode, in case of a switchover (both with and without a candidate): be listed in the /sync key members. For a failover in both healthy and unhealthy clusters, this check is omitted.
Warning
In case of a manual failover in a cluster without a leader, a candidate will be allowed to promote even if:
- it is not in the /sync key members when synchronous mode is enabled;
- its lag exceeds the maximum replication lag allowed;
- its timeline number is smaller than the last known cluster timeline.
Restart endpoint
POST /restart: You can restart Postgres on a specific node by performing a POST /restart call. In the JSON body of the POST request it is possible to optionally specify some restart conditions:
- restart_pending: boolean; if set to true, Patroni will restart PostgreSQL only when a restart is pending, in order to apply some changes in the PostgreSQL config.
- role: perform the restart only if the current role of the node matches the role from the POST request.
- postgres_version: perform the restart only if the current version of Postgres is smaller than specified in the POST request.
- timeout: how long to wait before PostgreSQL starts accepting connections. Overrides primary_start_timeout.
- schedule: timestamp with time zone; schedule the restart somewhere in the future.
DELETE /restart: delete the scheduled restart.
POST /restart and DELETE /restart endpoints are used by patronictl restart and patronictl flush cluster-name restart, respectively.
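The restart conditions described above are combined into a single JSON body. A sketch of assembling such a request follows; the schedule timestamp and the target URL are illustrative values, not recommendations:

```python
import json

# Combine several restart conditions into one request body:
# restart only if a restart is pending, only if the node is currently a
# replica, and schedule the restart for a point in the future.
conditions = {
    "restart_pending": True,
    "role": "replica",
    "schedule": "2023-09-13T04:00:00+00:00",  # illustrative timestamp
}
body = json.dumps(conditions)

# Equivalent curl invocation against a node's REST API (illustrative URL):
print(f"curl -s -XPOST -d '{body}' http://localhost:8008/restart")
```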
Reload endpoint
The POST /reload call will order Patroni to re-read and apply the configuration file. This is the equivalent of sending the SIGHUP signal to the Patroni process. If you changed some of the Postgres parameters that require a restart (like shared_buffers), you still have to explicitly restart Postgres, either by calling the POST /restart endpoint or with the help of patronictl restart.
The reload endpoint is used by patronictl reload.
Reinitialize endpoint
POST /reinitialize: reinitialize the PostgreSQL data directory on the specified node. It is allowed to be executed only on replicas. Once called, it will remove the data directory and start pg_basebackup or some alternative replica creation method.
The call might fail if Patroni is in a loop trying to recover (restart) a failed Postgres. To overcome this problem, one can specify {"force":true} in the request body.
The reinitialize endpoint is used by patronictl reinit.
patronictl
Patroni has a command-line interface named patronictl, which is used to interact with Patroni’s REST API and with the DCS. It is intended to make it easier to perform operations in the cluster, and can easily be used by humans or scripts.
Configuration
patronictl uses 3 sections of the configuration:
- ctl: how to authenticate against the Patroni REST API, and how to validate the server identity. Refer to ctl settings for more details;
- restapi: how to authenticate against the Patroni REST API, and how to validate the server identity. Only used if the ctl configuration is not enough. patronictl is mainly interested in the restapi.authentication section (in case ctl.authentication is missing) and the restapi.cafile setting (in case ctl.cacert is missing). Refer to REST API settings for more details;
- DCS (e.g. etcd): how to contact and authenticate against the DCS used by Patroni.
Those configuration options can come either from environment variables or from a configuration file. Look for the above sections in Environment Configuration Settings or YAML Configuration Settings to understand how you can set the options for them through environment variables or through a configuration file.
If you opt for using environment variables, the approach is straightforward: patronictl will read the environment variables and use their values.
If you opt for using a configuration file, there are different ways to tell patronictl which file to use. By default patronictl will attempt to load a configuration file named patronictl.yaml, which is expected to be found under one of these paths, depending on your system:
- Mac OS X: ~/Library/Application Support/patroni
- Mac OS X (POSIX): ~/.patroni
- Unix: ~/.config/patroni
- Unix (POSIX): ~/.patroni
- Windows (roaming): C:\Users\<user>\AppData\Roaming\patroni
- Windows (not roaming): C:\Users\<user>\AppData\Local\patroni
You can override that behavior either by:
- setting the environment variable PATRONICTL_CONFIG_FILE with the path to a custom configuration file; or
- using the -c/--config-file command-line argument of patronictl with the path to a custom configuration file.
Note
If you are running patronictl on the same host where the patroni daemon is running, you may just use the same configuration file, provided it contains all the configuration sections required by patronictl.
Usage
patronictl
exposes several handy operations. This section is intended to describe each of them.
Before jumping into each of the sub-commands of patronictl
, be aware that patronictl
itself has the following command-line arguments:
-c / --config-file
As explained before, used to provide a path to a configuration file for patronictl.
-d / --dcs-url / --dcs
Provide a connection string to the DCS used by Patroni. This argument can be used either to override the DCS and namespace settings from the patronictl configuration, or to define them if they are missing from the configuration. The value should be in the format DCS://HOST:PORT/NAMESPACE, e.g. etcd3://localhost:2379/service to connect to etcd v3 running on localhost with the Patroni cluster stored under the service namespace. Any part that is missing from the argument value will be replaced with the value present in the configuration, or with its default.
-k / --insecure
Flag to bypass validation of the REST API server SSL certificate.
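Since DCS://HOST:PORT/NAMESPACE follows ordinary URL syntax, its parts can be extracted with a standard URL parser. A quick sketch with Python's standard library, using the example value from the text:

```python
from urllib.parse import urlparse

# Split a DCS connection string of the form DCS://HOST:PORT/NAMESPACE
# into its components (illustrative; not how patronictl parses it internally).
url = urlparse("etcd3://localhost:2379/service")

dcs = url.scheme                   # "etcd3"
host = url.hostname                # "localhost"
port = url.port                    # 2379
namespace = url.path.lstrip("/")   # "service"
```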
This is the synopsis for running a command from patronictl:
patronictl [ { -c | --config-file } CONFIG_FILE ]
[ { -d | --dcs-url | --dcs } DCS_URL ]
[ { -k | --insecure } ]
SUBCOMMAND
Note
This is the syntax for the synopsis:
- Options between square brackets are optional;
- Options between curly brackets represent a “choose one of the set” operation;
- Options followed by [, ... ] can be specified multiple times;
- Things written in uppercase represent a literal that should be given a value.
We will use this same syntax when describing patronictl
sub-commands in the following sub-sections.
Also, when describing sub-commands in the following sub-sections, each command’s synopsis should be seen as a replacement for the SUBCOMMAND in the above synopsis.
In the following sub-sections you can find a description of each command implemented by patronictl. For the sake of example, we will use the configuration files present in the GitHub repository of Patroni (files postgres0.yml, postgres1.yml and postgres2.yml).
patronictl dsn
Synopsis
dsn
[ CLUSTER_NAME ]
[ { { -r | --role } { leader | primary | standby-leader | replica | standby | any } | { -m | --member } MEMBER_NAME } ]
[ --group CITUS_GROUP ]
Description
patronictl dsn
gets the connection string for one member of the Patroni cluster.
If multiple members match the parameters of this command, one of them will be chosen, prioritizing the primary node.
Parameters
CLUSTER_NAME
Name of the Patroni cluster. If not given, patronictl will attempt to fetch it from the scope configuration, if that exists.
-r / --role
Choose a member that has the given role. Role can be one of:
- leader: the leader of either a regular Patroni cluster or a standby Patroni cluster;
- primary: the leader of a regular Patroni cluster;
- standby-leader: the leader of a standby Patroni cluster;
- replica: a replica of a Patroni cluster;
- standby: same as replica;
- any: any role. Same as omitting this parameter.
-m / --member
Choose a member of the cluster with the given name. MEMBER_NAME is the name of the member.
--group
Choose a member that is part of the given Citus group. CITUS_GROUP is the ID of the Citus group.
Examples
Get DSN of the primary node:
$ patronictl -c postgres0.yml dsn batman -r primary
host=127.0.0.1 port=5432
Get DSN of the node named postgresql1
:
$ patronictl -c postgres0.yml dsn batman --member postgresql1
host=127.0.0.1 port=5433
patronictl edit-config
Synopsis
edit-config
[ CLUSTER_NAME ]
[ --group CITUS_GROUP ]
[ { -q | --quiet } ]
[ { -s | --set } CONFIG="VALUE" [, ... ] ]
[ { -p | --pg } PG_CONFIG="PG_VALUE" [, ... ] ]
[ { --apply | --replace } CONFIG_FILE ]
[ --force ]
Description
patronictl edit-config
changes the dynamic configuration of the cluster and updates the DCS with that.
Note
When invoked through a TTY the command attempts to show a diff of the dynamic configuration through a pager. By default, it attempts to use either less
or more
. If you want a different pager, set the PAGER
environment variable with the desired one.
Parameters
CLUSTER_NAME
Name of the Patroni cluster. If not given, patronictl will attempt to fetch it from the scope configuration, if that exists.
--group
Change the dynamic configuration of the given Citus group. If not given, patronictl will attempt to fetch it from the citus.group configuration, if that exists. CITUS_GROUP is the ID of the Citus group.
is the ID of the Citus group.-q
/--quiet
Flag to skip showing the configuration diff.
-s
/--set
Set a given dynamic configuration option with a given value.
CONFIG
is the name of the dynamic configuration path in the YAML tree, with levels joined by.
.VALUE
is the value forCONFIG
. If it isnull
, thenCONFIG
will be removed from the dynamic configuration.-p
/--pg
Set a given dynamic Postgres configuration option with the given value.
It is essentially a shorthand for
--s
/--set
withCONFIG
prepended withpostgresql.parameters.
.PG_CONFIG
is the name of the Postgres configuration to be set.PG_VALUE
is the value forPG_CONFIG
. If it isnulll
, thenPG_CONFIG
will be removed from the dynamic configuration.--apply
Apply dynamic configuration from the given file.
It is similar to specifying multiple
-s
/--set
options, one for each configuration fromCONFIG_FILE
.CONFIG_FILE
is the path to a file containing the dynamic configuration to be applied, in YAML format. Use-
if you want to read fromstdin
.--replace
Replace the dynamic configuration in the DCS with the dynamic configuration specified in the given file.
CONFIG_FILE
is the path to a file containing the new dynamic configuration to take effect, in YAML format. Use-
if you want to read fromstdin
.--force
Flag to skip confirmation prompts when changing the dynamic configuration.
Useful for scripts.
Examples
Change max_connections
Postgres GUC:
patronictl -c postgres0.yml edit-config batman --pg max_connections="150" --force
---
+++
@@ -1,6 +1,8 @@
loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
+ parameters:
+ max_connections: 150
pg_hba:
- host replication replicator 127.0.0.1/32 md5
- host all all 0.0.0.0/0 md5
Configuration changed
Change loop_wait
and ttl
settings:
patronictl -c postgres0.yml edit-config batman --set loop_wait="15" --set ttl="45" --force
---
+++
@@ -1,4 +1,4 @@
-loop_wait: 10
+loop_wait: 15
maximum_lag_on_failover: 1048576
postgresql:
pg_hba:
@@ -6,4 +6,4 @@
- host all all 0.0.0.0/0 md5
use_pg_rewind: true
retry_timeout: 10
-ttl: 30
+ttl: 45
Configuration changed
Remove maximum_lag_on_failover
setting from dynamic configuration:
patronictl -c postgres0.yml edit-config batman --set maximum_lag_on_failover="null" --force
---
+++
@@ -1,5 +1,4 @@
loop_wait: 10
-maximum_lag_on_failover: 1048576
postgresql:
pg_hba:
- host replication replicator 127.0.0.1/32 md5
Configuration changed
patronictl failover
Synopsis
failover
[ CLUSTER_NAME ]
[ --group CITUS_GROUP ]
[ { --leader | --primary } LEADER_NAME ]
--candidate CANDIDATE_NAME
[ --force ]
Description
patronictl failover
performs a manual failover in the cluster.
It is designed to be used when the cluster is not healthy, e.g.:
- there is no leader; or
- there is no synchronous standby available in a synchronous cluster.
It also allows failing over to an asynchronous node if synchronous mode is enabled.
Note
Nothing prevents you from running patronictl failover
in a healthy cluster. However, we recommend using patronictl switchover
in those cases.
Warning
Triggering a failover can cause data loss depending on how up-to-date the promoted replica is in comparison to the primary.
Parameters
CLUSTER_NAME
Name of the Patroni cluster. If not given, patronictl will attempt to fetch it from the scope configuration, if that exists.
--group
Perform a failover in the given Citus group. CITUS_GROUP is the ID of the Citus group.
--leader / --primary
Indicate who is the expected leader at failover time. If given, a switchover is performed instead of a failover. LEADER_NAME should match the name of the current leader in the cluster.
Warning
This argument is deprecated and will be removed in a future release.
--candidate
The node to be promoted on failover. CANDIDATE_NAME is the name of the node to be promoted.
--force
Flag to skip confirmation prompts when performing the failover. Useful for scripts.
Examples
Fail over to node postgresql2
:
$ patronictl -c postgres0.yml failover batman --candidate postgresql2 --force
Current cluster topology
+ Cluster: batman (7277694203142172922) -+-----------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader | running | 3 | |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 3 | 0 |
| postgresql2 | 127.0.0.1:5434 | Replica | streaming | 3 | 0 |
+-------------+----------------+---------+-----------+----+-----------+
2023-09-12 11:52:27.50978 Successfully failed over to "postgresql2"
+ Cluster: batman (7277694203142172922) -+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+---------+---------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Replica | stopped | | unknown |
| postgresql1 | 127.0.0.1:5433 | Replica | running | 3 | 0 |
| postgresql2 | 127.0.0.1:5434 | Leader | running | 3 | |
+-------------+----------------+---------+---------+----+-----------+
patronictl flush
Synopsis
flush
CLUSTER_NAME
[ MEMBER_NAME [, ... ] ]
{ restart | switchover }
[ --group CITUS_GROUP ]
[ { -r | --role } { leader | primary | standby-leader | replica | standby | any } ]
[ --force ]
Description
patronictl flush
discards scheduled events, if any.
Parameters
CLUSTER_NAME
Name of the Patroni cluster.
MEMBER_NAME
Discard scheduled events for the given Patroni member(s).
Multiple members can be specified. If no members are specified, all of them are considered.
Note
Only used if discarding scheduled restart events.
restart
Discard scheduled restart events.
switchover
Discard scheduled switchover event.
--group
Discard scheduled events from the given Citus group. CITUS_GROUP is the ID of the Citus group.
-r / --role
Discard scheduled events for members that have the given role. Role can be one of:
- leader: the leader of either a regular Patroni cluster or a standby Patroni cluster;
- primary: the leader of a regular Patroni cluster;
- standby-leader: the leader of a standby Patroni cluster;
- replica: a replica of a Patroni cluster;
- standby: same as replica;
- any: any role. Same as omitting this parameter.
Note
Only used if discarding scheduled restart events.
--force
Flag to skip confirmation prompts when performing the flush.
Useful for scripts.
Examples
Discard a scheduled switchover event:
$ patronictl -c postgres0.yml flush batman switchover --force
Success: scheduled switchover deleted
Discard scheduled restart of all standby nodes:
$ patronictl -c postgres0.yml flush batman restart -r replica --force
+ Cluster: batman (7277694203142172922) -+-----------+----+-----------+---------------------------+
| Member | Host | Role | State | TL | Lag in MB | Scheduled restart |
+-------------+----------------+---------+-----------+----+-----------+---------------------------+
| postgresql0 | 127.0.0.1:5432 | Leader | running | 5 | | 2023-09-12T17:17:00+00:00 |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 5 | 0 | 2023-09-12T17:17:00+00:00 |
| postgresql2 | 127.0.0.1:5434 | Replica | streaming | 5 | 0 | 2023-09-12T17:17:00+00:00 |
+-------------+----------------+---------+-----------+----+-----------+---------------------------+
Success: flush scheduled restart for member postgresql1
Success: flush scheduled restart for member postgresql2
Discard scheduled restart of nodes postgresql0
and postgresql1
:
$ patronictl -c postgres0.yml flush batman postgresql0 postgresql1 restart --force
+ Cluster: batman (7277694203142172922) -+-----------+----+-----------+---------------------------+
| Member | Host | Role | State | TL | Lag in MB | Scheduled restart |
+-------------+----------------+---------+-----------+----+-----------+---------------------------+
| postgresql0 | 127.0.0.1:5432 | Leader | running | 5 | | 2023-09-12T17:17:00+00:00 |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 5 | 0 | 2023-09-12T17:17:00+00:00 |
| postgresql2 | 127.0.0.1:5434 | Replica | streaming | 5 | 0 | 2023-09-12T17:17:00+00:00 |
+-------------+----------------+---------+-----------+----+-----------+---------------------------+
Success: flush scheduled restart for member postgresql0
Success: flush scheduled restart for member postgresql1
patronictl history
Synopsis
history
[ CLUSTER_NAME ]
[ --group CITUS_GROUP ]
[ { -f | --format } { pretty | tsv | json | yaml } ]
Description
patronictl history
shows a history of failover and switchover events from the cluster, if any.
The following information is included in the output:
TL
Postgres timeline at which the event occurred.
LSN
Postgres LSN at which the event occurred.
Reason
Reason fetched from the Postgres .history file.
Timestamp
Time when the event occurred.
New Leader
Patroni member that has been promoted during the event.
Parameters
CLUSTER_NAME
Name of the Patroni cluster. If not given, patronictl will attempt to fetch it from the scope configuration, if that exists.
--group
Show history of events from the given Citus group. CITUS_GROUP is the ID of the Citus group. If not given, patronictl will attempt to fetch it from the citus.group configuration, if that exists.
-f / --format
How to format the list of events in the output. Format can be one of:
- pretty: prints history as a pretty table;
- tsv: prints history as tabular information, with columns delimited by \t;
- json: prints history in JSON format;
- yaml: prints history in YAML format.
The default is pretty.
Examples
Show the history of events:
$ patronictl -c postgres0.yml history batman
+----+----------+------------------------------+----------------------------------+-------------+
| TL | LSN | Reason | Timestamp | New Leader |
+----+----------+------------------------------+----------------------------------+-------------+
| 1 | 24392648 | no recovery target specified | 2023-09-11T22:11:27.125527+00:00 | postgresql0 |
| 2 | 50331864 | no recovery target specified | 2023-09-12T11:34:03.148097+00:00 | postgresql0 |
| 3 | 83886704 | no recovery target specified | 2023-09-12T11:52:26.948134+00:00 | postgresql2 |
| 4 | 83887280 | no recovery target specified | 2023-09-12T11:53:09.620136+00:00 | postgresql0 |
+----+----------+------------------------------+----------------------------------+-------------+
Show the history of events in YAML format:
$ patronictl -c postgres0.yml history batman -f yaml
- LSN: 24392648
New Leader: postgresql0
Reason: no recovery target specified
TL: 1
Timestamp: '2023-09-11T22:11:27.125527+00:00'
- LSN: 50331864
New Leader: postgresql0
Reason: no recovery target specified
TL: 2
Timestamp: '2023-09-12T11:34:03.148097+00:00'
- LSN: 83886704
New Leader: postgresql2
Reason: no recovery target specified
TL: 3
Timestamp: '2023-09-12T11:52:26.948134+00:00'
- LSN: 83887280
New Leader: postgresql0
Reason: no recovery target specified
TL: 4
Timestamp: '2023-09-12T11:53:09.620136+00:00'
patronictl list
Synopsis
list
[ CLUSTER_NAME [, ... ] ]
[ --group CITUS_GROUP ]
[ { -e | --extended } ]
[ { -t | --timestamp } ]
[ { -f | --format } { pretty | tsv | json | yaml } ]
[ { -W | { -w | --watch } TIME } ]
Description
patronictl list shows information about the Patroni cluster and its members.
The following information is included in the output:
Cluster
Name of the Patroni cluster.
Member
Name of the Patroni member.
Host
Host where the member is located.
Role
Current role of the member.
Can be one of:
Leader: the current leader of a regular Patroni cluster; or
Standby Leader: the current leader of a Patroni standby cluster; or
Sync Standby: a synchronous standby of a Patroni cluster with synchronous mode enabled; or
Replica: a regular standby of a Patroni cluster.
State
Current state of Postgres in the Patroni member.
Some examples among the possible states:
running: if Postgres is currently up and running;
streaming: if a replica and Postgres is currently streaming WALs from the primary node;
in archive recovery: if a replica and Postgres is currently fetching WALs from the archive;
stopped: if Postgres has been shut down;
crashed: if Postgres has crashed.
TL
Current Postgres timeline in the Patroni member.
Lag in MB
Replication lag in megabytes between the Patroni member and its upstream.
Besides that, the following information may be included in the output:
System identifier
Postgres system identifier.
Note
Shown in the table header.
Only shown if output format is pretty.
Group
Citus group ID.
Note
Shown in the table header.
Only shown if a Citus cluster.
Pending restart
* indicates that the node needs a restart for some Postgres configuration to take effect. An empty value indicates the node does not require a restart.
Note
Shown as a member attribute.
Shown if:
Printing in pretty or tsv format and with extended output enabled; or
The node requires a restart.
Scheduled restart
Timestamp at which a restart has been scheduled for the Postgres instance managed by the Patroni member. An empty value indicates there is no scheduled restart for the member.
Note
Shown as a member attribute.
Shown if:
Printing in pretty or tsv format and with extended output enabled; or
The node has a scheduled restart.
Tags
Contains tags set for the Patroni member. An empty value indicates that either no tags have been configured, or that they have been configured with default values.
Note
Shown as a member attribute.
Shown if:
Printing in pretty or tsv format and with extended output enabled; or
The node has any custom tags, or any default tags with non-default values.
Scheduled switchover
Timestamp at which a switchover has been scheduled for the Patroni cluster, if any.
Note
Shown in the table footer.
Only shown if there is a scheduled switchover, and output format is pretty.
Maintenance mode
Indicates that cluster management is currently paused.
Note
Shown in the table footer.
Only shown if the cluster is paused, and output format is pretty.
Parameters
CLUSTER_NAME
Name of the Patroni cluster.
If not given, patronictl will attempt to fetch it from the scope configuration, if it exists.
--group
Show information about members from the given Citus group.
CITUS_GROUP is the ID of the Citus group.
-e
/--extended
Show extended information.
Force showing Pending restart, Scheduled restart and Tags attributes, even if their value is empty.
Note
Only applies to pretty and tsv output formats.
-t
/--timestamp
Print timestamp before printing information about the cluster and its members.
-f
/--format
How to format the list of members in the output.
Format can be one of:
pretty: prints members as a pretty table; or
tsv: prints members as tabular information, with columns delimited by \t; or
json: prints members in JSON format; or
yaml: prints members in YAML format.
The default is pretty.
-W
Automatically refresh information every 2 seconds.
-w
/--watch
Automatically refresh information at the specified interval.
TIME
is the interval between refreshes, in seconds.
Examples
Show information about the cluster in pretty format:
$ patronictl -c postgres0.yml list batman
+ Cluster: batman (7277694203142172922) -+-----------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader | running | 5 | |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 5 | 0 |
| postgresql2 | 127.0.0.1:5434 | Replica | streaming | 5 | 0 |
+-------------+----------------+---------+-----------+----+-----------+
Show information about the cluster in pretty format with extended columns:
$ patronictl -c postgres0.yml list batman -e
+ Cluster: batman (7277694203142172922) -+-----------+----+-----------+-----------------+-------------------+------+
| Member | Host | Role | State | TL | Lag in MB | Pending restart | Scheduled restart | Tags |
+-------------+----------------+---------+-----------+----+-----------+-----------------+-------------------+------+
| postgresql0 | 127.0.0.1:5432 | Leader | running | 5 | | | | |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 5 | 0 | | | |
| postgresql2 | 127.0.0.1:5434 | Replica | streaming | 5 | 0 | | | |
+-------------+----------------+---------+-----------+----+-----------+-----------------+-------------------+------+
Show information about the cluster in YAML format, with timestamp of execution:
$ patronictl -c postgres0.yml list batman -f yaml -t
2023-09-12 13:30:48
- Cluster: batman
Host: 127.0.0.1:5432
Member: postgresql0
Role: Leader
State: running
TL: 5
- Cluster: batman
Host: 127.0.0.1:5433
Lag in MB: 0
Member: postgresql1
Role: Replica
State: streaming
TL: 5
- Cluster: batman
Host: 127.0.0.1:5434
Lag in MB: 0
Member: postgresql2
Role: Replica
State: streaming
TL: 5
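In monitoring scripts, the json format is the easiest to consume. A minimal sketch of finding the leader and checking member health (the sample data is embedded for illustration and uses the field names shown in the YAML example above; a real script would capture the command's output via a subprocess):

```python
import json

# Sample output of `patronictl list batman -f json`; the field names
# mirror the YAML example above.
raw = """
[
 {"Cluster": "batman", "Member": "postgresql0", "Host": "127.0.0.1:5432",
  "Role": "Leader", "State": "running", "TL": 5},
 {"Cluster": "batman", "Member": "postgresql1", "Host": "127.0.0.1:5433",
  "Role": "Replica", "State": "streaming", "TL": 5, "Lag in MB": 0},
 {"Cluster": "batman", "Member": "postgresql2", "Host": "127.0.0.1:5434",
  "Role": "Replica", "State": "streaming", "TL": 5, "Lag in MB": 0}
]
"""

members = json.loads(raw)
leader = next(m for m in members if m["Role"] == "Leader")
# Treat a member as healthy if it is running (leader) or streaming (replica).
healthy = all(m["State"] in ("running", "streaming") for m in members)
print(f'leader: {leader["Member"]}, all members healthy: {healthy}')
```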
patronictl pause
Synopsis
pause
[ CLUSTER_NAME ]
[ --group CITUS_GROUP ]
[ --wait ]
Description
patronictl pause
temporarily puts the Patroni cluster in maintenance mode and disables automatic failover.
Parameters
CLUSTER_NAME
Name of the Patroni cluster.
If not given, patronictl will attempt to fetch it from the scope configuration, if it exists.
--group
Pause the given Citus group.
CITUS_GROUP is the ID of the Citus group. If not given, patronictl will attempt to fetch it from the citus.group configuration, if it exists.
--wait
Wait until all Patroni members are paused before returning control to the caller.
Examples
Put the cluster in maintenance mode, and wait until all nodes have been paused:
$ patronictl -c postgres0.yml pause batman --wait
'pause' request sent, waiting until it is recognized by all nodes
Success: cluster management is paused
patronictl query
Synopsis
query
[ CLUSTER_NAME ]
[ --group CITUS_GROUP ]
[ { { -r | --role } { leader | primary | standby-leader | replica | standby | any } | { -m | --member } MEMBER_NAME } ]
[ { -d | --dbname } DBNAME ]
[ { -U | --username } USERNAME ]
[ --password ]
[ --format { pretty | tsv | json | yaml } ]
[ { { -f | --file } FILE_NAME | { -c | --command } SQL_COMMAND } ]
[ --delimiter ]
[ { -W | { -w | --watch } TIME } ]
Description
patronictl query
executes a SQL command or script against a member of the Patroni cluster.
Parameters
CLUSTER_NAME
Name of the Patroni cluster.
If not given, patronictl will attempt to fetch it from the scope configuration, if it exists.
--group
Query the given Citus group.
CITUS_GROUP is the ID of the Citus group.
-r
/--role
Choose a member that has the given role.
Role can be one of:
leader: the leader of either a regular Patroni cluster or a standby Patroni cluster; or
primary: the leader of a regular Patroni cluster; or
standby-leader: the leader of a standby Patroni cluster; or
replica: a replica of a Patroni cluster; or
standby: same as replica; or
any: any role. Same as omitting this parameter.
-m
/--member
Choose a member that has the given name.
MEMBER_NAME is the name of the member to be picked.
-d
/--dbname
Database to connect to and run the query against.
DBNAME is the name of the database. If not given, defaults to USERNAME.
-U
/--username
User to connect to the database as.
USERNAME is the name of the user. If not given, defaults to the operating system user running patronictl query.
--password
Prompt for the password of the connecting user.
As Patroni uses libpq, you can alternatively create a ~/.pgpass file or set the PGPASSWORD environment variable.
--format
How to format the output of the query.
Format can be one of:
pretty: prints query output as a pretty table; or
tsv: prints query output as tabular information, with columns delimited by \t; or
json: prints query output in JSON format; or
yaml: prints query output in YAML format.
The default is tsv.
-f
/--file
Use a file as source of commands to run queries.
FILE_NAME is the path to the source file.
-c
/--command
Run the given SQL command in the query.
SQL_COMMAND is the SQL command to be executed.
--delimiter
The delimiter used when printing information in tsv format; defaults to \t if omitted.
-W
Automatically re-run the query every 2 seconds.
-w
/--watch
Automatically re-run the query at the specified interval.
TIME
is the interval between re-runs, in seconds.
Examples
Run a SQL command as postgres
user, and ask for its password:
$ patronictl -c postgres0.yml query batman -U postgres --password -c "SELECT now()"
Password:
now
2023-09-12 18:10:53.228084+00:00
Run a SQL command as postgres
user, and take password from libpq
environment variable:
$ PGPASSWORD=zalando patronictl -c postgres0.yml query batman -U postgres -c "SELECT now()"
now
2023-09-12 18:11:37.639500+00:00
Run a SQL command and print in pretty
format every 2 seconds:
$ patronictl -c postgres0.yml query batman -c "SELECT now()" --format pretty -W
+----------------------------------+
| now |
+----------------------------------+
| 2023-09-12 18:12:16.716235+00:00 |
+----------------------------------+
+----------------------------------+
| now |
+----------------------------------+
| 2023-09-12 18:12:18.732645+00:00 |
+----------------------------------+
+----------------------------------+
| now |
+----------------------------------+
| 2023-09-12 18:12:20.750573+00:00 |
+----------------------------------+
Run a SQL command on database test
and print the output in YAML format:
$ patronictl -c postgres0.yml query batman -d test -c "SELECT now() AS column_1, 'test' AS column_2" --format yaml
- column_1: 2023-09-12 18:14:22.052060+00:00
column_2: test
Run a SQL command on member postgresql2
:
$ patronictl -c postgres0.yml query batman -m postgresql2 -c "SHOW port"
port
5434
Run a SQL command on any of the standbys:
$ patronictl -c postgres0.yml query batman -r replica -c "SHOW port"
port
5433
patronictl reinit
Synopsis
reinit
CLUSTER_NAME
[ MEMBER_NAME [, ... ] ]
[ --group CITUS_GROUP ]
[ --wait ]
[ --force ]
Description
patronictl reinit
rebuilds a Postgres standby instance managed by a replica member of the Patroni cluster.
Parameters
CLUSTER_NAME
Name of the Patroni cluster.
MEMBER_NAME
Name of the replica member for which the Postgres instance will be rebuilt.
Multiple replica members can be specified. If no members are specified, the command does nothing.
--group
Rebuild a replica member of the given Citus group.
CITUS_GROUP is the ID of the Citus group.
--wait
Wait until the reinitialization of the Postgres standby node(s) is finished.
--force
Flag to skip confirmation prompts when rebuilding Postgres standby instances.
Useful for scripts.
Examples
Request a rebuild of all replica members of the Patroni cluster and immediately return control to the caller:
$ patronictl -c postgres0.yml reinit batman postgresql1 postgresql2 --force
+ Cluster: batman (7277694203142172922) -+-----------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader | running | 5 | |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 5 | 0 |
| postgresql2 | 127.0.0.1:5434 | Replica | streaming | 5 | 0 |
+-------------+----------------+---------+-----------+----+-----------+
Success: reinitialize for member postgresql1
Success: reinitialize for member postgresql2
Request a rebuild of postgresql2
and wait for it to complete:
$ patronictl -c postgres0.yml reinit batman postgresql2 --wait --force
+ Cluster: batman (7277694203142172922) -+-----------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader | running | 5 | |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 5 | 0 |
| postgresql2 | 127.0.0.1:5434 | Replica | streaming | 5 | 0 |
+-------------+----------------+---------+-----------+----+-----------+
Success: reinitialize for member postgresql2
Waiting for reinitialize to complete on: postgresql2
Reinitialize is completed on: postgresql2
patronictl reload
Synopsis
reload
CLUSTER_NAME
[ MEMBER_NAME [, ... ] ]
[ --group CITUS_GROUP ]
[ { -r | --role } { leader | primary | standby-leader | replica | standby | any } ]
[ --force ]
Description
patronictl reload
requests a reload of local configuration for one or more Patroni members.
It also triggers pg_ctl reload
on the managed Postgres instance, even if nothing has changed.
Parameters
CLUSTER_NAME
Name of the Patroni cluster.
MEMBER_NAME
Request a reload of local configuration for the given Patroni member(s).
Multiple members can be specified. If no members are specified, all of them are considered.
--group
Request a reload of members of the given Citus group.
CITUS_GROUP is the ID of the Citus group.
-r
/--role
Select members that have the given role.
Role can be one of:
leader: the leader of either a regular Patroni cluster or a standby Patroni cluster; or
primary: the leader of a regular Patroni cluster; or
standby-leader: the leader of a standby Patroni cluster; or
replica: a replica of a Patroni cluster; or
standby: same as replica; or
any: any role. Same as omitting this parameter.
--force
Flag to skip confirmation prompts when requesting a reload of the local configuration.
Useful for scripts.
Examples
Request a reload of the local configuration of all members of the Patroni cluster:
$ patronictl -c postgres0.yml reload batman --force
+ Cluster: batman (7277694203142172922) -+-----------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader | running | 5 | |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 5 | 0 |
| postgresql2 | 127.0.0.1:5434 | Replica | streaming | 5 | 0 |
+-------------+----------------+---------+-----------+----+-----------+
Reload request received for member postgresql0 and will be processed within 10 seconds
Reload request received for member postgresql1 and will be processed within 10 seconds
Reload request received for member postgresql2 and will be processed within 10 seconds
patronictl remove
Synopsis
remove
CLUSTER_NAME
[ --group CITUS_GROUP ]
[ { -f | --format } { pretty | tsv | json | yaml } ]
Description
patronictl remove
removes the cluster's information from the DCS.
It is an interactive action.
Warning
This operation will destroy the information of the Patroni cluster from the DCS.
Parameters
CLUSTER_NAME
Name of the Patroni cluster.
--group
Remove information about the Patroni cluster related to the given Citus group.
CITUS_GROUP is the ID of the Citus group.
-f
/--format
How to format the list of members in the output when prompting for confirmation.
Format can be one of:
pretty: prints members as a pretty table; or
tsv: prints members as tabular information, with columns delimited by \t; or
json: prints members in JSON format; or
yaml: prints members in YAML format.
The default is pretty.
Examples
Remove information about Patroni cluster batman
from the DCS:
$ patronictl -c postgres0.yml remove batman
+ Cluster: batman (7277694203142172922) -+-----------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader | running | 5 | |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 5 | 0 |
| postgresql2 | 127.0.0.1:5434 | Replica | streaming | 5 | 0 |
+-------------+----------------+---------+-----------+----+-----------+
Please confirm the cluster name to remove: batman
You are about to remove all information in DCS for batman, please type: "Yes I am aware": Yes I am aware
This cluster currently is healthy. Please specify the leader name to continue: postgresql0
patronictl restart
Synopsis
restart
CLUSTER_NAME
[ MEMBER_NAME [, ...] ]
[ --group CITUS_GROUP ]
[ { -r | --role } { leader | primary | standby-leader | replica | standby | any } ]
[ --any ]
[ --pg-version PG_VERSION ]
[ --pending ]
[ --timeout TIMEOUT ]
[ --scheduled TIMESTAMP ]
[ --force ]
Description
patronictl restart
requests a restart of the Postgres instance managed by a member of the Patroni cluster.
The restart can be performed immediately or scheduled for later.
Parameters
CLUSTER_NAME
Name of the Patroni cluster.
--group
Restart the Patroni cluster related to the given Citus group.
CITUS_GROUP is the ID of the Citus group.
-r
/--role
Choose members that have the given role.
Role can be one of:
leader: the leader of either a regular Patroni cluster or a standby Patroni cluster; or
primary: the leader of a regular Patroni cluster; or
standby-leader: the leader of a standby Patroni cluster; or
replica: a replica of a Patroni cluster; or
standby: same as replica; or
any: any role. Same as omitting this parameter.
--any
Restart a single random node among the ones which match the given filters.
--pg-version
Select only members whose managed Postgres instance version is older than the given version.
PG_VERSION is the Postgres version to be compared.
--pending
Select only members that are flagged as Pending restart.
--timeout
Abort the restart if it takes more than the specified timeout, and fail over to a replica if the issue is on the primary.
TIMEOUT is the number of seconds to wait before aborting the restart.
--scheduled
Schedule a restart to occur at the given timestamp.
TIMESTAMP is the timestamp when the restart should occur. Specify it in an unambiguous format, preferably with a time zone. You can also use the literal now for the restart to be executed immediately.
--force
Flag to skip confirmation prompts when requesting the restart operations.
Useful for scripts.
Examples
Restart all members of the cluster immediately:
$ patronictl -c postgres0.yml restart batman --force
+ Cluster: batman (7277694203142172922) -+-----------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader | running | 6 | |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 6 | 0 |
| postgresql2 | 127.0.0.1:5434 | Replica | streaming | 6 | 0 |
+-------------+----------------+---------+-----------+----+-----------+
Success: restart on member postgresql0
Success: restart on member postgresql1
Success: restart on member postgresql2
Restart a random member of the cluster immediately:
$ patronictl -c postgres0.yml restart batman --any --force
+ Cluster: batman (7277694203142172922) -+-----------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader | running | 6 | |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 6 | 0 |
| postgresql2 | 127.0.0.1:5434 | Replica | streaming | 6 | 0 |
+-------------+----------------+---------+-----------+----+-----------+
Success: restart on member postgresql1
Schedule a restart to occur at 2023-09-13T18:00-03:00
:
$ patronictl -c postgres0.yml restart batman --scheduled 2023-09-13T18:00-03:00 --force
+ Cluster: batman (7277694203142172922) -+-----------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader | running | 6 | |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 6 | 0 |
| postgresql2 | 127.0.0.1:5434 | Replica | streaming | 6 | 0 |
+-------------+----------------+---------+-----------+----+-----------+
Success: restart scheduled on member postgresql0
Success: restart scheduled on member postgresql1
Success: restart scheduled on member postgresql2
patronictl resume
Synopsis
resume
[ CLUSTER_NAME ]
[ --group CITUS_GROUP ]
[ --wait ]
Description
patronictl resume
takes the Patroni cluster out of maintenance mode and re-enables automatic failover.
Parameters
CLUSTER_NAME
Name of the Patroni cluster.
If not given, patronictl will attempt to fetch it from the scope configuration, if it exists.
--group
Resume the given Citus group.
CITUS_GROUP is the ID of the Citus group. If not given, patronictl will attempt to fetch it from the citus.group configuration, if it exists.
--wait
Wait until all Patroni members are unpaused before returning control to the caller.
Examples
Take the cluster out of maintenance mode, and wait until all nodes have been resumed:
$ patronictl -c postgres0.yml resume batman --wait
'resume' request sent, waiting until it is recognized by all nodes
Success: cluster management is resumed
patronictl show-config
Synopsis
show-config
[ CLUSTER_NAME ]
[ --group CITUS_GROUP ]
Description
patronictl show-config
shows the dynamic configuration of the cluster that is stored in the DCS.
Parameters
CLUSTER_NAME
Name of the Patroni cluster.
If not given, patronictl will attempt to fetch it from the scope configuration, if it exists.
--group
Show the dynamic configuration of the given Citus group.
CITUS_GROUP is the ID of the Citus group. If not given, patronictl will attempt to fetch it from the citus.group configuration, if it exists.
Examples
Show dynamic configuration of cluster batman
:
$ patronictl -c postgres0.yml show-config batman
loop_wait: 10
postgresql:
parameters:
max_connections: 250
pg_hba:
- host replication replicator 127.0.0.1/32 md5
- host all all 0.0.0.0/0 md5
use_pg_rewind: true
retry_timeout: 10
ttl: 30
patronictl switchover
Synopsis
switchover
[ CLUSTER_NAME ]
[ --group CITUS_GROUP ]
[ { --leader | --primary } LEADER_NAME ]
--candidate CANDIDATE_NAME
[ --scheduled TIMESTAMP ]
[ --force ]
Description
patronictl switchover
performs a switchover in the cluster.
It is designed to be used when the cluster is healthy, e.g.:
There is a leader;
There are synchronous standbys available in a synchronous cluster.
Note
If your cluster is unhealthy you might be interested in patronictl failover
instead.
Parameters
CLUSTER_NAME
Name of the Patroni cluster.
If not given, patronictl will attempt to fetch it from the scope configuration, if it exists.
--group
Perform a switchover in the given Citus group.
CITUS_GROUP is the ID of the Citus group.
--leader
/--primary
Indicate the leader to be demoted at switchover time.
LEADER_NAME should match the name of the current leader in the cluster.
--candidate
The node to be promoted on switchover, taking over the primary role.
CANDIDATE_NAME is the name of the node to be promoted.
--scheduled
Schedule a switchover to occur at the given timestamp.
TIMESTAMP is the timestamp when the switchover should occur. Specify it in an unambiguous format, preferably with a time zone. You can also use the literal now for the switchover to be executed immediately.
--force
Flag to skip confirmation prompts when performing the switchover.
Useful for scripts.
Examples
Switch over to node postgresql2:
$ patronictl -c postgres0.yml switchover batman --leader postgresql0 --candidate postgresql2 --force
Current cluster topology
+ Cluster: batman (7277694203142172922) -+-----------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader | running | 6 | |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 6 | 0 |
| postgresql2 | 127.0.0.1:5434 | Replica | streaming | 6 | 0 |
+-------------+----------------+---------+-----------+----+-----------+
2023-09-13 14:15:23.07497 Successfully switched over to "postgresql2"
+ Cluster: batman (7277694203142172922) -+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+---------+---------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Replica | stopped | | unknown |
| postgresql1 | 127.0.0.1:5433 | Replica | running | 6 | 0 |
| postgresql2 | 127.0.0.1:5434 | Leader | running | 6 | |
+-------------+----------------+---------+---------+----+-----------+
Schedule a switchover between postgresql0
and postgresql2
to occur at 2023-09-13T18:00:00-03:00
:
$ patronictl -c postgres0.yml switchover batman --leader postgresql0 --candidate postgresql2 --scheduled 2023-09-13T18:00-03:00 --force
Current cluster topology
+ Cluster: batman (7277694203142172922) -+-----------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader | running | 8 | |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 8 | 0 |
| postgresql2 | 127.0.0.1:5434 | Replica | streaming | 8 | 0 |
+-------------+----------------+---------+-----------+----+-----------+
2023-09-13 14:18:11.20661 Switchover scheduled
+ Cluster: batman (7277694203142172922) -+-----------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader | running | 8 | |
| postgresql1 | 127.0.0.1:5433 | Replica | streaming | 8 | 0 |
| postgresql2 | 127.0.0.1:5434 | Replica | streaming | 8 | 0 |
+-------------+----------------+---------+-----------+----+-----------+
Switchover scheduled at: 2023-09-13T18:00:00-03:00
from: postgresql0
to: postgresql2
patronictl topology
Synopsis
topology
[ CLUSTER_NAME [, ... ] ]
[ --group CITUS_GROUP ]
[ { -W | { -w | --watch } TIME } ]
Description
patronictl topology
shows information about the Patroni cluster and its members in a tree view.
The following information is included in the output:
Cluster
Name of the Patroni cluster.
Note
Shown in the table header.
System identifier
Postgres system identifier.
Note
Shown in the table header.
Member
Name of the Patroni member.
Note
Information in this column is shown as a tree view of members in terms of replication connections.
Host
Host where the member is located.
Role
Current role of the member.
Can be one of:
Leader: the current leader of a regular Patroni cluster; or
Standby Leader: the current leader of a Patroni standby cluster; or
Sync Standby: a synchronous standby of a Patroni cluster with synchronous mode enabled; or
Replica: a regular standby of a Patroni cluster.
State
Current state of Postgres in the Patroni member.
Some examples among the possible states:
running: if Postgres is currently up and running;
streaming: if a replica and Postgres is currently streaming WALs from the primary node;
in archive recovery: if a replica and Postgres is currently fetching WALs from the archive;
stopped: if Postgres has been shut down;
crashed: if Postgres has crashed.
TL
Current Postgres timeline in the Patroni member.
Lag in MB
Replication lag in megabytes between the Patroni member and its upstream.
Besides that, the following information may be included in the output:
Group
Citus group ID.
Note
Shown in the table header.
Only shown if a Citus cluster.
Pending restart
* indicates that the node needs a restart for some Postgres configuration to take effect. An empty value indicates the node does not require a restart.
Note
Shown as a member attribute.
Shown if the node requires a restart.
Scheduled restart
Timestamp at which a restart has been scheduled for the Postgres instance managed by the Patroni member. An empty value indicates there is no scheduled restart for the member.
Note
Shown as a member attribute.
Shown if the node has a scheduled restart.
Tags
Contains tags set for the Patroni member. An empty value indicates that either no tags have been configured, or that they have been configured with default values.
Note
Shown as a member attribute.
Shown if the node has any custom tags, or any default tags with non-default values.
Scheduled switchover
Timestamp at which a switchover has been scheduled for the Patroni cluster, if any.
Note
Shown in the table footer.
Only shown if there is a scheduled switchover.
Maintenance mode
Indicates that cluster management is currently paused.
Note
Shown in the table footer.
Only shown if the cluster is paused.
Parameters
CLUSTER_NAME
Name of the Patroni cluster.
If not given, patronictl will attempt to fetch it from the scope configuration, if it exists.
--group
Show information about members from the given Citus group.
CITUS_GROUP is the ID of the Citus group.
-W
Automatically refresh information every 2 seconds.
-w
/--watch
Automatically refresh information at the specified interval.
TIME
is the interval between refreshes, in seconds.
Examples
Show topology of the cluster batman
– postgresql1
and postgresql2
are replicating from postgresql0
:
$ patronictl -c postgres0.yml topology batman
+ Cluster: batman (7277694203142172922) ---+-----------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+---------------+----------------+---------+-----------+----+-----------+
| postgresql0 | 127.0.0.1:5432 | Leader | running | 8 | |
| + postgresql1 | 127.0.0.1:5433 | Replica | streaming | 8 | 0 |
| + postgresql2 | 127.0.0.1:5434 | Replica | streaming | 8 | 0 |
+---------------+----------------+---------+-----------+----+-----------+
patronictl version
Synopsis
version
[ CLUSTER_NAME [, ... ] ]
[ MEMBER_NAME [, ... ] ]
[ --group CITUS_GROUP ]
Description
patronictl version
shows the version of the patronictl application. Besides that, it may also include version information about Patroni clusters and their members.
Parameters
CLUSTER_NAME
Name of the Patroni cluster.
MEMBER_NAME
Name of the member of the Patroni cluster.
--group
Consider a Patroni cluster with the given Citus group.
CITUS_GROUP is the ID of the Citus group.
Examples
Get version of patronictl
only:
$ patronictl -c postgres0.yml version
patronictl version 3.1.0
Get version of patronictl
and of all members of cluster batman
:
$ patronictl -c postgres0.yml version batman
patronictl version 3.1.0
postgresql0: Patroni 3.1.0 PostgreSQL 15.2
postgresql1: Patroni 3.1.0 PostgreSQL 15.2
postgresql2: Patroni 3.1.0 PostgreSQL 15.2
Get version of patronictl
and of members postgresql1
and postgresql2
of cluster batman
:
$ patronictl -c postgres0.yml version batman postgresql1 postgresql2
patronictl version 3.1.0
postgresql1: Patroni 3.1.0 PostgreSQL 15.2
postgresql2: Patroni 3.1.0 PostgreSQL 15.2
Replica imaging and bootstrap
Patroni allows customizing the creation of a new replica. It also supports defining what happens when a new empty cluster is being bootstrapped. The distinction between the two is well defined: Patroni creates replicas only if the initialize key is present in DCS for the cluster. If there is no initialize key, Patroni calls bootstrap exclusively on the first node that takes the initialize key lock.
Bootstrap
PostgreSQL provides the initdb command to initialize a new cluster, and Patroni calls it by default. In certain cases, particularly when creating a new cluster as a copy of an existing one, it is necessary to replace the built-in method with custom actions. Patroni supports executing user-defined scripts to bootstrap new clusters, supplying some required arguments to them, i.e. the name of the cluster and the path to the data directory. This is configured in the bootstrap section of the Patroni configuration. For example:
bootstrap:
method: <custom_bootstrap_method_name>
<custom_bootstrap_method_name>:
command: <path_to_custom_bootstrap_script> [param1 [, ...]]
keep_existing_recovery_conf: False
no_params: False
recovery_conf:
recovery_target_action: promote
recovery_target_timeline: latest
restore_command: <method_specific_restore_command>
Each bootstrap method must define at least a name and a command. A special initdb method is available to trigger the default behavior, in which case the method parameter can be omitted altogether. The command can be specified using either an absolute path, or a path relative to the patroni command location. In addition to the fixed parameters defined in the configuration files, Patroni supplies two cluster-specific ones:
- --scope
Name of the cluster to be bootstrapped
- --datadir
Path to the data directory of the cluster instance to be bootstrapped
Passing these two additional flags can be disabled by setting the special no_params parameter to True.
If the bootstrap script returns 0, Patroni tries to configure and start the PostgreSQL instance produced by it. If any of the intermediate steps fail, or the script returns a non-zero value, Patroni assumes that the bootstrap has failed, cleans up after itself and releases the initialize lock to give another node the opportunity to bootstrap.
If a recovery_conf block is defined in the same section as the custom bootstrap method, Patroni will generate a recovery.conf before starting the newly bootstrapped instance. Typically, such a recovery.conf should contain at least one of the recovery_target_* parameters, together with recovery_target_action set to promote.
If keep_existing_recovery_conf is defined and set to True, Patroni will not remove the existing recovery.conf file if it exists. This is useful when bootstrapping from a backup with tools like pgBackRest that generate the appropriate recovery.conf for you.
Note
Bootstrap methods are neither chained nor do they fall back to the default one if the chosen method fails.
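To make this concrete, here is a hedged sketch of a custom bootstrap method that restores the initial cluster from a pgBackRest backup; the method name, stanza name, and command path are placeholder assumptions for your environment:

```yaml
bootstrap:
  method: pgbackrest_bootstrap
  pgbackrest_bootstrap:
    # Placeholder stanza name and binary path; adjust for your setup.
    command: /usr/bin/pgbackrest --stanza=demo-stanza --delta restore
    # pgBackRest writes its own recovery configuration, so keep it.
    keep_existing_recovery_conf: True
    # pgbackrest does not understand --scope/--datadir, so suppress them.
    no_params: True
```

If no_params were left at False, Patroni would append --scope=<cluster-name> and --datadir=<path> to the command line, which pgBackRest would reject.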
Building replicas
Patroni uses the tried and proven pg_basebackup in order to create new replicas. One downside of it is that it requires a running leader node. Another is the lack of on-the-fly compression for the backup data and no built-in cleanup of outdated backup files. Some people prefer other backup solutions, such as WAL-E, pgBackRest, Barman and others, or simply roll their own scripts. In order to accommodate all those use cases, Patroni supports running custom scripts to clone a new replica. Those are configured in the postgresql configuration block:
postgresql:
  create_replica_methods:
    - <method name>
  <method name>:
    command: <command name>
    keep_data: True
    no_params: True
    no_leader: 1
example: wal_e
postgresql:
  create_replica_methods:
    - wal_e
    - basebackup
  wal_e:
    command: patroni_wale_restore
    no_leader: 1
    envdir: {{WALE_ENV_DIR}}
    use_iam: 1
  basebackup:
    max-rate: '100M'
example: pgbackrest
postgresql:
  create_replica_methods:
    - pgbackrest
    - basebackup
  pgbackrest:
    command: /usr/bin/pgbackrest --stanza=<scope> --delta restore
    keep_data: True
    no_params: True
  basebackup:
    max-rate: '100M'
The create_replica_methods defines the available replica creation methods and the order of executing them. Patroni will stop on the first one that returns 0. Each method should define a separate section in the configuration file, listing the command to execute and any custom parameters that should be passed to that command. All parameters will be passed in a --name=value format. Besides user-defined parameters, Patroni supplies a couple of cluster-specific ones:
- --scope       Which cluster this replica belongs to
- --datadir     Path to the data directory of the replica
- --role        Always 'replica'
- --connstring  Connection string to connect to the cluster member to clone from (primary or other replica). The user in the connection string can execute SQL and replication protocol commands.
A special no_leader parameter, if defined, allows Patroni to call the replica creation method even if there is no running leader or replicas. In that case, an empty string will be passed as the connection string. This is useful for restoring a formerly running cluster from a binary backup.
A special keep_data parameter, if defined, will instruct Patroni not to clean the PGDATA folder before calling restore.
A special no_params parameter, if defined, restricts passing parameters to the custom command.
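To make the calling convention concrete, here is a minimal sketch (not a real backup tool) of how a custom create_replica script might parse the flags Patroni passes; all names and the echoed message are illustrative:

```shell
# Illustrative sketch only: parse the --name=value flags Patroni passes
# to a custom replica creation script (names here are hypothetical).
create_replica() {
    scope="" datadir="" role="" connstring=""
    for arg in "$@"; do
        case "$arg" in
            --scope=*)      scope="${arg#--scope=}" ;;
            --datadir=*)    datadir="${arg#--datadir=}" ;;
            --role=*)       role="${arg#--role=}" ;;
            --connstring=*) connstring="${arg#--connstring=}" ;;
        esac
    done
    # A real script would restore a backup into "$datadir" here and
    # return 0 on success so that Patroni stops trying further methods.
    echo "cloning $scope into $datadir as $role"
}

create_replica --scope=batman --datadir=/tmp/batman-data --role=replica \
    --connstring="host=127.0.0.1 port=5432 user=replicator"
```

Patroni runs the configured command with these flags appended (unless no_params is set), and treats a zero exit status as success.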
The basebackup method is a special case: it will be used if create_replica_methods is empty, although it is also possible to list it explicitly among the create_replica_methods. This method initializes a new replica with pg_basebackup; the base backup is taken from the leader unless there are replicas with the clonefrom tag, in which case one of such replicas will be used as the origin for pg_basebackup. It works without any configuration; however, it is possible to specify a basebackup configuration section. The same rules as with the other method configurations apply, namely, only long (with --) options should be specified there. Not all parameters make sense: if you override the connection string or provide an option to create tar-ed or compressed base backups, Patroni won't be able to make a replica out of the result. There is no validation performed on the names or values of the parameters passed to the basebackup section.
Also note that in case symlinks are used for the WAL folder, it is up to the user to specify the correct --waldir path as an option, so that after replica buildup or re-initialization the symlink persists. This option is only supported since PostgreSQL v10 though.
You can specify basebackup parameters as either a map (key-value pairs) or a list of elements, where each element could be either a key-value pair or a single key (for options that do not receive any values, for instance, --verbose).
Consider those 2 examples:
postgresql:
  basebackup:
    max-rate: '100M'
    checkpoint: 'fast'
and
postgresql:
  basebackup:
    - verbose
    - max-rate: '100M'
    - waldir: /pg-wal-mount/external-waldir
If all replica creation methods fail, Patroni will try all of them again, in order, during the next event loop cycle.
Standby cluster
Another available option is to run a "standby cluster", which consists only of standby nodes replicating from some remote node. This type of cluster has:
a "standby leader", which behaves pretty much like a regular cluster leader, except that it replicates from a remote node.
cascade replicas, which replicate from the standby leader.
The standby leader holds and updates a leader lock in DCS. If the leader lock expires, cascade replicas will perform an election to choose another leader from the standbys.
There is no further relationship between the standby cluster and the primary cluster it replicates from; in particular, they must not share the same DCS scope if they use the same DCS. They do not know anything about each other apart from replication information. Also, the standby cluster is not displayed in patronictl list or patronictl topology output on the primary cluster.
For the sake of flexibility, you can specify methods of creating a replica and recovering WAL records when a cluster is in "standby mode" by providing a create_replica_methods key in the standby_cluster section. It is distinct from creating replicas when the cluster is detached and functions as a normal cluster, which is controlled by create_replica_methods in the postgresql section. Both "standby" and "normal" create_replica_methods reference keys in the postgresql section.
To configure such a cluster you need to specify the section standby_cluster in the Patroni configuration:
bootstrap:
  dcs:
    standby_cluster:
      host: 1.2.3.4
      port: 5432
      primary_slot_name: patroni
      create_replica_methods:
        - basebackup
Note that these options will only be applied once, during cluster bootstrap; the only way to change them afterwards is through DCS.
Patroni expects to find postgresql.conf or postgresql.conf.backup in PGDATA of the remote primary and will not start if it does not find it after a basebackup. If the remote primary keeps its postgresql.conf elsewhere, it is your responsibility to copy it to PGDATA.
If you use replication slots on the standby cluster, you must also create the corresponding replication slot on the primary cluster. It will not be done automatically by the standby cluster implementation. You can use Patroni's permanent replication slots feature on the primary cluster to maintain a replication slot with the same name as primary_slot_name, or its default value if primary_slot_name is not provided.
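For example, assuming the slot name patroni from the standby_cluster example above, a permanent physical slot on the primary cluster could be declared like this (a sketch; see the dynamic configuration documentation for the full set of slot options):

```yaml
# On the PRIMARY cluster: keep a physical slot named "patroni" alive
# so the standby cluster's standby leader can stream from it.
slots:
  patroni:
    type: physical
```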
Replication modes
Patroni uses PostgreSQL streaming replication. For more information about streaming replication, see the Postgres documentation. By default Patroni configures PostgreSQL for asynchronous replication. Choosing your replication schema is dependent on your business considerations. Investigate both async and sync replication, as well as other HA solutions, to determine which solution is best for you.
Asynchronous mode durability
In asynchronous mode the cluster is allowed to lose some committed transactions to ensure availability. When the primary server fails or becomes unavailable for any other reason Patroni will automatically promote a sufficiently healthy standby to primary. Any transactions that have not been replicated to that standby remain in a “forked timeline” on the primary, and are effectively unrecoverable [1].
The amount of transactions that can be lost is controlled via the maximum_lag_on_failover parameter. Because the primary transaction log position is not sampled in real time, in reality the amount of data lost on failover is worst-case bounded by maximum_lag_on_failover bytes of transaction log plus the amount that is written in the last ttl seconds (loop_wait/2 seconds in the average case). However, typical steady-state replication delay is well under a second.
By default, when running leader elections, Patroni does not take into account the current timeline of the replicas, which in some cases could be undesirable behavior. You can prevent a node that does not have the same timeline as a former primary from becoming the new leader by changing the value of the check_timeline parameter to true.
PostgreSQL synchronous replication
You can use Postgres’s synchronous replication with Patroni. Synchronous replication ensures consistency across a cluster by confirming that writes are written to a secondary before returning to the connecting client with a success. The cost of synchronous replication: reduced throughput on writes. This throughput will be entirely based on network performance.
In hosted datacenter environments (like AWS, Rackspace, or any network you do not control), synchronous replication significantly increases the variability of write performance. If followers become inaccessible from the leader, the leader effectively becomes read-only.
To enable a simple synchronous replication test, add the following lines to the parameters section of your YAML configuration files:
synchronous_commit: "on"
synchronous_standby_names: "*"
When using PostgreSQL synchronous replication, use at least three Postgres data nodes to ensure write availability if one host fails.
Using PostgreSQL synchronous replication does not guarantee zero lost transactions under all circumstances. When the primary and the secondary that is currently acting as a synchronous replica fail simultaneously a third node that might not contain all transactions will be promoted.
Synchronous mode
For use cases where losing committed transactions is not permissible you can turn on Patroni's synchronous_mode. When synchronous_mode is turned on Patroni will not promote a standby unless it is certain that the standby contains all transactions that may have returned a successful commit status to the client [2]. This means that the system may be unavailable for writes even though some servers are available. System administrators can still use manual failover commands to promote a standby even if it results in transaction loss.
Turning on synchronous_mode does not guarantee multi-node durability of commits under all circumstances. When no suitable standby is available, the primary server will still accept writes, but does not guarantee their replication. When the primary fails in this mode no standby will be promoted. When the host that used to be the primary comes back it will get promoted automatically, unless the system administrator performed a manual failover. This behavior makes synchronous mode usable with 2-node clusters.
When synchronous_mode is on and a standby crashes, commits will block until the next iteration of Patroni runs and switches the primary to standalone mode (worst-case delay for writes: ttl seconds; average case: loop_wait/2 seconds). Manually shutting down or restarting a standby will not cause a commit service interruption. The standby will signal the primary to release it from synchronous standby duties before PostgreSQL shutdown is initiated.
When it is absolutely necessary to guarantee that each write is stored durably on at least two nodes, enable synchronous_mode_strict in addition to synchronous_mode. This parameter prevents Patroni from switching off synchronous replication on the primary when no synchronous standby candidates are available. As a downside, the primary is not available for writes (unless the Postgres transaction explicitly turns off synchronous_commit), blocking all client write requests until at least one synchronous replica comes up.
You can ensure that a standby never becomes the synchronous standby by setting its nosync tag to true. This is recommended for standbys that are behind slow network connections and would cause performance degradation when becoming a synchronous standby.
Synchronous mode can be switched on and off via Patroni REST interface. See dynamic configuration for instructions.
Note: Because of the way synchronous replication is implemented in PostgreSQL it is still possible to lose transactions even when using synchronous_mode_strict. If the PostgreSQL backend is cancelled while waiting to acknowledge replication (as a result of packet cancellation due to client timeout or backend failure) transaction changes become visible to other backends. Such changes are not yet replicated and may be lost in case of standby promotion.
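As mentioned above, synchronous mode can be toggled via the REST interface; the body of such a PATCH request to the /config endpoint could be as small as the following JSON fragment (a sketch of the dynamic configuration payload):

```json
{"synchronous_mode": true}
```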
Synchronous Replication Factor
The parameter synchronous_node_count is used by Patroni to manage the number of synchronous standby databases. It is set to 1 by default. It has no effect when synchronous_mode is set to off. When enabled, Patroni manages the precise number of synchronous standby databases based on the synchronous_node_count parameter and adjusts the state in DCS and synchronous_standby_names as members join and leave.
Synchronous mode implementation
When in synchronous mode Patroni maintains synchronization state in the DCS, containing the latest primary and current synchronous standby databases. This state is updated with strict ordering constraints to ensure the following invariants:
A node must be marked as the latest leader whenever it can accept write transactions. Patroni crashing or PostgreSQL not shutting down can cause violations of this invariant.
A node must be set as the synchronous standby in PostgreSQL as long as it is published as the synchronous standby.
A node that is not the leader or current synchronous standby is not allowed to promote itself automatically.
Patroni will only assign one or more synchronous standby nodes to synchronous_standby_names based on the synchronous_node_count parameter.
On each HA loop iteration Patroni re-evaluates the choice of synchronous standby nodes. If the current synchronous standby nodes are connected and have not requested their synchronous status to be removed, they remain picked. Otherwise, the cluster member available for sync that is furthest ahead in replication is picked.
Watchdog support
Having multiple PostgreSQL servers running as primary can result in transactions lost due to diverging timelines. This situation is also called a split-brain problem. To avoid split-brain Patroni needs to ensure PostgreSQL will not accept any transaction commits after leader key expires in the DCS. Under normal circumstances Patroni will try to achieve this by stopping PostgreSQL when leader lock update fails for any reason. However, this may fail to happen due to various reasons:
Patroni has crashed due to a bug, out-of-memory condition or by being accidentally killed by a system administrator.
Shutting down PostgreSQL is too slow.
Patroni does not get to run due to high load on the system, the VM being paused by the hypervisor, or other infrastructure issues.
To guarantee correct behavior under these conditions Patroni supports watchdog devices. Watchdog devices are software or hardware mechanisms that will reset the whole system when they do not get a keepalive heartbeat within a specified timeframe. This adds an additional layer of fail safe in case usual Patroni split-brain protection mechanisms fail.
Patroni will try to activate the watchdog before promoting PostgreSQL to primary. If watchdog activation fails and watchdog mode is required then the node will refuse to become leader. When deciding to participate in leader election Patroni will also check that the watchdog configuration will allow it to become leader at all. After demoting PostgreSQL (for example due to a manual failover) Patroni will disable the watchdog again. The watchdog will also be disabled while Patroni is in paused state.
By default Patroni will set up the watchdog to expire 5 seconds before TTL expires. With the default setup of loop_wait=10 and ttl=30 this gives the HA loop at least 15 seconds (ttl - safety_margin - loop_wait) to complete before the system gets forcefully reset. By default accessing DCS is configured to time out after 10 seconds. This means that when DCS is unavailable, for example due to network issues, Patroni and PostgreSQL will have at least 5 seconds (ttl - safety_margin - loop_wait - retry_timeout) to come to a state where all client connections are terminated.
The safety margin is the amount of time that Patroni reserves between the leader key update and the watchdog keepalive. Patroni will try to send a keepalive immediately after confirmation of the leader key update. If the Patroni process is suspended for an extended amount of time at exactly the right moment, the keepalive may be delayed for more than the safety margin without triggering the watchdog. This results in a window of time where the watchdog will not trigger before leader key expiration, invalidating the guarantee. To be absolutely sure that the watchdog will trigger under all circumstances, set safety_margin to -1; this sets the watchdog to expire after half of the TTL (a timeout of ttl // 2). If you need this guarantee you probably should increase ttl and/or reduce loop_wait and retry_timeout.
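The timing budgets described above can be sketched with simple arithmetic, using the default values from this section (the variable names are illustrative, not Patroni settings names in code):

```shell
# Default settings discussed above
ttl=30; loop_wait=10; retry_timeout=10; safety_margin=5

# Watchdog is set up to expire safety_margin seconds before TTL
watchdog_timeout=$((ttl - safety_margin))                             # 25 s

# Budget for one HA loop before a forced reset
ha_loop_budget=$((ttl - safety_margin - loop_wait))                   # 15 s

# Budget to terminate client connections when DCS times out
dcs_down_budget=$((ttl - safety_margin - loop_wait - retry_timeout))  # 5 s

# With safety_margin=-1 the watchdog instead expires after ttl / 2
strict_timeout=$((ttl / 2))                                           # 15 s

echo "$watchdog_timeout $ha_loop_budget $dcs_down_budget $strict_timeout"
```

Increasing ttl or reducing loop_wait and retry_timeout grows these budgets accordingly.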
Currently watchdogs are only supported using Linux watchdog device interface.
Setting up software watchdog on Linux
The default Patroni configuration will try to use /dev/watchdog on Linux if it is accessible to Patroni. For most use cases using the software watchdog built into the Linux kernel is secure enough.
To enable software watchdog issue the following commands as root before starting Patroni:
modprobe softdog
# Replace postgres with the user you will be running patroni under
chown postgres /dev/watchdog
For testing it may be helpful to disable rebooting by adding soft_noboot=1 to the modprobe command line. In this case the watchdog will just log a line in the kernel ring buffer, visible via dmesg.
Patroni will log information about the watchdog when it is successfully enabled.
Pause/Resume mode for the cluster
The goal
Under certain circumstances Patroni needs to temporarily step down from managing the cluster, while still retaining the cluster state in DCS. Possible use cases are uncommon activities on the cluster, such as major version upgrades or corruption recovery. During those activities nodes are often started and stopped for reasons unknown to Patroni, some nodes can be even temporarily promoted, violating the assumption of running only one primary. Therefore, Patroni needs to be able to “detach” from the running cluster, implementing an equivalent of the maintenance mode in Pacemaker.
The implementation
When Patroni runs in a paused mode, it does not change the state of PostgreSQL, except for the following cases:
For each node, the member key in DCS is updated with the current information about the cluster. This causes Patroni to run read-only queries on a member node if the member is running.
For the Postgres primary with the leader lock Patroni updates the lock. If the node with the leader lock stops being the primary (i.e. is demoted manually), Patroni will release the lock instead of promoting the node back.
Manual unscheduled restart, manual unscheduled failover/switchover and reinitialize are allowed. No scheduled action is allowed. Manual switchover is only allowed if the node to switch over to is specified.
If ‘parallel’ primaries are detected by Patroni, it emits a warning, but does not demote the primary without the leader lock.
If there is no leader lock in the cluster, the running primary acquires the lock. If there is more than one primary node, then the first primary to acquire the lock wins. If there is no primary at all, Patroni does not try to promote any replicas. There is an exception to this rule: if there is no leader lock because the old primary has demoted itself due to a manual promotion, then only the candidate node mentioned in the promotion request may take the leader lock. When the new leader lock is granted (i.e. after promoting a replica manually), Patroni makes sure the replicas that were streaming from the previous leader will switch to the new one.
When Postgres is stopped, Patroni does not try to start it. When Patroni is stopped, it does not try to stop the Postgres instance it is managing.
Patroni will not try to remove replication slots that do not represent other cluster members or are not listed in the configuration of the permanent slots.
User guide
patronictl supports pause and resume commands.
One can also issue a PATCH request to the {namespace}/{cluster}/config key with {"pause": true/false/null}.
DCS Failsafe Mode
The problem
Patroni relies heavily on the Distributed Configuration Store (DCS) to solve the task of leader elections and to detect network partitioning. That is, a node is allowed to run Postgres as the primary only if it can update the leader lock in DCS. If the update of the leader lock fails, Postgres is immediately demoted and restarted as read-only. Depending on which DCS is used, the chances of hitting the "problem" differ. For example, with Etcd, which is only used by Patroni, the chances are close to zero, while with the K8s API (backed by Etcd) it can be observed more frequently.
Reasons for the current implementation
The leader lock update failure could be caused by two main reasons:
Network partitioning
DCS being down
In general, it is impossible to distinguish between these two from a single node, and therefore Patroni assumes the worst case - network partitioning. In the case of a partitioned network, other nodes of the Patroni cluster may successfully grab the leader lock and promote Postgres to primary. In order to avoid a split-brain, the old primary is demoted before the leader lock expires.
DCS Failsafe Mode
We introduce a new special option, failsafe_mode. It can be enabled only via the global dynamic configuration stored in the DCS /config key. If failsafe mode is enabled and the leader lock update in DCS fails due to reasons other than a version/value/index mismatch, Postgres may continue to run as a primary if it can access all known members of the cluster via the Patroni REST API.
Low-level implementation details
We introduce a new, permanent key in DCS, named /failsafe.
The /failsafe key contains all known members of the given Patroni cluster at a given time.
The current leader maintains the /failsafe key.
A member is allowed to participate in the leader race and become the new leader only if it is present in the /failsafe key.
If the cluster consists of a single node, the /failsafe key will contain a single member.
In the case of a DCS "outage" the existing primary connects to all members present in the /failsafe key via the POST /failsafe REST API and may continue to run as the primary if all replicas acknowledge it. If one of the members doesn't respond, the primary is demoted.
Replicas use incoming POST /failsafe REST API requests as an indicator that the primary is still alive. This information is cached for ttl seconds.
F.A.Q.
Why MUST the current primary see ALL other members? Can't we rely on quorum here?
This is a great question! The problem is that the view on the quorum might differ between DCS and Patroni. While DCS nodes must be evenly distributed across availability zones, there is no such rule for Patroni, and more importantly, there is no mechanism for introducing and enforcing such a rule. If the majority of Patroni nodes (including the primary) ends up in the losing part of the partitioned network while the minority nodes are in the winning part, the primary must be demoted. Only checking ALL other members allows detecting such a situation.
What if a node/pod gets terminated while DCS is down?
If DCS isn't accessible, the check "are ALL other cluster members accessible?" is executed on every cycle of the heartbeat loop (every loop_wait seconds). If the pod/node is terminated, the check will fail and Postgres will be demoted to read-only and will not recover until DCS is restored.
What if all members of the Patroni cluster are lost while DCS is down?
Patroni can be configured to create a new replica from a backup even when the cluster doesn't have a leader. But if the new member isn't present in the /failsafe key, it will not be able to grab the leader lock and promote.
What will happen if the primary loses access to DCS while the replicas don't?
The primary will execute the failsafe code and contact all known replicas. These replicas will use this information as an indicator that the primary is alive and will not start the leader race even if the leader lock in DCS has expired.
How to enable the Failsafe Mode?
Before enabling failsafe_mode please make sure that the Patroni version on all members is up-to-date. After that, you can use either the PATCH /config REST API or patronictl edit-config -s failsafe_mode=true.
Using Patroni with Kubernetes
Patroni can use Kubernetes objects to store the state of the cluster and manage the leader key. That makes it capable of operating Postgres in a Kubernetes environment without any external consistency store; namely, one doesn't need to run an extra Etcd deployment. There are two different types of Kubernetes objects Patroni can use to store the leader and the configuration keys; they are selected with the kubernetes.use_endpoints setting or the PATRONI_KUBERNETES_USE_ENDPOINTS environment variable.
Use Endpoints
Despite the fact that this is the recommended mode, it is turned off by default for compatibility reasons. When it is on, Patroni stores the cluster configuration and the leader key in the metadata: annotations fields of the respective Endpoints it creates. Changing the leader is safer than when using ConfigMaps, since both the annotations, containing the leader information, and the actual addresses pointing to the running leader pod are updated simultaneously in one go.
Use ConfigMaps
In this mode, Patroni will create ConfigMaps instead of Endpoints and store keys inside meta-data of those ConfigMaps. Changing the leader takes at least two updates, one to the leader ConfigMap and another to the respective Endpoint.
To direct the traffic to the Postgres leader you need to configure the Kubernetes Postgres service to use a label selector with the role_label (configured in the Patroni configuration).
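As a hedged sketch, such a Service might look like the following; the service name "demo-primary" and cluster name "demo" are placeholders, and the label keys/values reflect Patroni's default labels (adjust them if you customized role_label or leader_label_value):

```yaml
# Hypothetical Service directing client traffic to the Postgres leader pod.
apiVersion: v1
kind: Service
metadata:
  name: demo-primary
spec:
  selector:
    cluster-name: demo
    role: master
  ports:
    - port: 5432
      targetPort: 5432
```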
Note that in some cases, for instance, when running on OpenShift, there is no alternative to using ConfigMaps.
Configuration
Patroni Kubernetes settings and environment variables are described in the general chapters of the documentation.
Customize role label
By default, Patroni will set corresponding labels on the pod it runs in based on the node's role, such as role=master. The key and value of the label can be customized with kubernetes.role_label, kubernetes.leader_label_value, kubernetes.follower_label_value and kubernetes.standby_leader_label_value.
Note that if you migrate from default role labels to custom ones, you can reduce downtime by following these migration steps:
Add a temporary label using the original role value for the pod with kubernetes.tmp_role_label (like tmp_role). Once pods are restarted they will get the following labels set by Patroni:
labels:
  cluster-name: foo
  role: master
  tmp_role: master
After all pods have been updated, modify the service selector to select the temporary label.
selector:
  cluster-name: foo
  tmp_role: master
Add your custom role label (e.g., set kubernetes.leader_label_value=primary). Once pods are restarted they will get the following new labels set by Patroni:
labels:
  cluster-name: foo
  role: primary
  tmp_role: master
After all pods have been updated again, modify the service selector to use the new role value.
selector:
  cluster-name: foo
  role: primary
Finally, remove the temporary label from your configuration and update all pods.
labels:
  cluster-name: foo
  role: primary
Examples
The kubernetes folder of the Patroni repository contains examples of the Docker image, and the Kubernetes manifest to test Patroni Kubernetes setup. Note that in the current state it will not be able to use PersistentVolumes because of permission issues.
You can find the full-featured Docker image that can use Persistent Volumes in the Spilo Project.
There is also a Helm chart to deploy the Spilo image configured with Patroni running using Kubernetes.
In order to run your database clusters at scale using Patroni and Spilo, take a look at the postgres-operator project. It implements the operator pattern to manage Spilo clusters.
Citus support
Patroni makes it extremely simple to deploy Multi-Node Citus clusters.
TL;DR
There are only a few simple rules you need to follow:
The Citus database extension to PostgreSQL must be available on all nodes. The absolute minimum supported Citus version is 10.0, but to take full advantage of transparent switchovers and restarts of workers we recommend using at least Citus 11.2.
Cluster name (scope) must be the same for all Citus nodes!
Superuser credentials must be the same on the coordinator and all worker nodes, and pg_hba.conf should allow superuser access between all nodes.
REST API access should be allowed from worker nodes to the coordinator. E.g., credentials should be the same and, if configured, client certificates from worker nodes must be accepted by the coordinator.
Add the following section to patroni.yaml:
citus:
  group: X  # 0 for coordinator and 1, 2, 3, etc for workers
  database: citus  # must be the same on all nodes
After that you just need to start Patroni and it will handle the rest:
The citus extension will be automatically added to shared_preload_libraries.
If max_prepared_transactions isn't explicitly set in the global dynamic configuration, Patroni will automatically set it to 2*max_connections.
The citus.local_hostname GUC value will be adjusted from localhost to the value that Patroni is using to connect to the local PostgreSQL instance. The value sometimes should be different from localhost because PostgreSQL might not be listening on it.
The citus.database will be automatically created, followed by CREATE EXTENSION citus.
Current superuser credentials will be added to the pg_dist_authinfo table to allow cross-node communication. Don't forget to update them if you later decide to change the superuser username/password/sslcert/sslkey!
The coordinator primary node will automatically discover worker primary nodes and add them to the pg_dist_node table using the citus_add_node() function.
Patroni will also maintain pg_dist_node in case a failover/switchover occurs on the coordinator or worker clusters.
patronictl
Coordinator and worker clusters are physically different PostgreSQL/Patroni clusters that are only logically grouped together using the Citus database extension to PostgreSQL. Therefore, in most cases it is not possible to manage them as a single entity.
This results in two major differences in patronictl behaviour when patroni.yaml contains the citus section, compared with the usual behaviour:
- The list and topology commands by default output all members of the Citus formation (coordinators and workers). The new column Group indicates which Citus group they belong to.
- A new option named --group is introduced for all patronictl commands. For some commands the default value of the group might be taken from patroni.yaml. For example, patronictl pause will enable maintenance mode by default for the group that is set in the citus section, while for patronictl switchover or patronictl remove the group must be specified explicitly.
An example of patronictl list output for the Citus cluster:
postgres@coord1:~$ patronictl list demo
+ Citus cluster: demo ----------+--------------+---------+----+-----------+
| Group | Member | Host | Role | State | TL | Lag in MB |
+-------+---------+-------------+--------------+---------+----+-----------+
| 0 | coord1 | 172.27.0.10 | Replica | running | 1 | 0 |
| 0 | coord2 | 172.27.0.6 | Sync Standby | running | 1 | 0 |
| 0 | coord3 | 172.27.0.4 | Leader | running | 1 | |
| 1 | work1-1 | 172.27.0.8 | Sync Standby | running | 1 | 0 |
| 1 | work1-2 | 172.27.0.2 | Leader | running | 1 | |
| 2 | work2-1 | 172.27.0.5 | Sync Standby | running | 1 | 0 |
| 2 | work2-2 | 172.27.0.7 | Leader | running | 1 | |
+-------+---------+-------------+--------------+---------+----+-----------+
If we add the --group option, the output will change to:
postgres@coord1:~$ patronictl list demo --group 0
+ Citus cluster: demo (group: 0, 7179854923829112860) -----------+
| Member | Host | Role | State | TL | Lag in MB |
+--------+-------------+--------------+---------+----+-----------+
| coord1 | 172.27.0.10 | Replica | running | 1 | 0 |
| coord2 | 172.27.0.6 | Sync Standby | running | 1 | 0 |
| coord3 | 172.27.0.4 | Leader | running | 1 | |
+--------+-------------+--------------+---------+----+-----------+
postgres@coord1:~$ patronictl list demo --group 1
+ Citus cluster: demo (group: 1, 7179854923881963547) -----------+
| Member | Host | Role | State | TL | Lag in MB |
+---------+------------+--------------+---------+----+-----------+
| work1-1 | 172.27.0.8 | Sync Standby | running | 1 | 0 |
| work1-2 | 172.27.0.2 | Leader | running | 1 | |
+---------+------------+--------------+---------+----+-----------+
Citus worker switchover
When a switchover is orchestrated for a Citus worker node, Citus offers the opportunity to make the switchover close to transparent for the application. Because the application connects to the coordinator, which in turn connects to the worker nodes, it is possible with Citus to pause the SQL traffic on the coordinator for the shards hosted on a worker node. The switchover then happens while the traffic is held on the coordinator, and it resumes as soon as the new primary worker node is ready to accept read-write queries.
An example of patronictl switchover on the worker cluster:
postgres@coord1:~$ patronictl switchover demo
+ Citus cluster: demo ----------+--------------+---------+----+-----------+
| Group | Member | Host | Role | State | TL | Lag in MB |
+-------+---------+-------------+--------------+---------+----+-----------+
| 0 | coord1 | 172.27.0.10 | Replica | running | 1 | 0 |
| 0 | coord2 | 172.27.0.6 | Sync Standby | running | 1 | 0 |
| 0 | coord3 | 172.27.0.4 | Leader | running | 1 | |
| 1 | work1-1 | 172.27.0.8 | Leader | running | 1 | |
| 1 | work1-2 | 172.27.0.2 | Sync Standby | running | 1 | 0 |
| 2 | work2-1 | 172.27.0.5 | Sync Standby | running | 1 | 0 |
| 2 | work2-2 | 172.27.0.7 | Leader | running | 1 | |
+-------+---------+-------------+--------------+---------+----+-----------+
Citus group: 2
Primary [work2-2]:
Candidate ['work2-1'] []:
When should the switchover take place (e.g. 2022-12-22T08:02 ) [now]:
Current cluster topology
+ Citus cluster: demo (group: 2, 7179854924063375386) -----------+
| Member | Host | Role | State | TL | Lag in MB |
+---------+------------+--------------+---------+----+-----------+
| work2-1 | 172.27.0.5 | Sync Standby | running | 1 | 0 |
| work2-2 | 172.27.0.7 | Leader | running | 1 | |
+---------+------------+--------------+---------+----+-----------+
Are you sure you want to switchover cluster demo, demoting current primary work2-2? [y/N]: y
2022-12-22 07:02:40.33003 Successfully switched over to "work2-1"
+ Citus cluster: demo (group: 2, 7179854924063375386) ------+
| Member | Host | Role | State | TL | Lag in MB |
+---------+------------+---------+---------+----+-----------+
| work2-1 | 172.27.0.5 | Leader | running | 1 | |
| work2-2 | 172.27.0.7 | Replica | stopped | | unknown |
+---------+------------+---------+---------+----+-----------+
postgres@coord1:~$ patronictl list demo
+ Citus cluster: demo ----------+--------------+---------+----+-----------+
| Group | Member | Host | Role | State | TL | Lag in MB |
+-------+---------+-------------+--------------+---------+----+-----------+
| 0 | coord1 | 172.27.0.10 | Replica | running | 1 | 0 |
| 0 | coord2 | 172.27.0.6 | Sync Standby | running | 1 | 0 |
| 0 | coord3 | 172.27.0.4 | Leader | running | 1 | |
| 1 | work1-1 | 172.27.0.8 | Leader | running | 1 | |
| 1 | work1-2 | 172.27.0.2 | Sync Standby | running | 1 | 0 |
| 2 | work2-1 | 172.27.0.5 | Leader | running | 2 | |
| 2 | work2-2 | 172.27.0.7 | Sync Standby | running | 2 | 0 |
+-------+---------+-------------+--------------+---------+----+-----------+
And this is how it looks on the coordinator side:
# The worker primary notifies the coordinator that it is going to execute "pg_ctl stop".
2022-12-22 07:02:38,636 DEBUG: query("BEGIN")
2022-12-22 07:02:38,636 DEBUG: query("SELECT pg_catalog.citus_update_node(3, '172.27.0.7-demoted', 5432, true, 10000)")
# From this moment all application traffic on the coordinator to the worker group 2 is paused.
# The future worker primary notifies the coordinator that it acquired the leader lock in DCS and about to run "pg_ctl promote".
2022-12-22 07:02:40,085 DEBUG: query("SELECT pg_catalog.citus_update_node(3, '172.27.0.5', 5432)")
# The new worker primary has just finished promoting and notifies the coordinator that it is ready to accept read-write traffic.
2022-12-22 07:02:41,485 DEBUG: query("COMMIT")
# From this moment the application traffic on the coordinator to the worker group 2 is unblocked.
Peek into DCS
The Citus cluster (coordinator and workers) is stored in the DCS as a fleet of Patroni clusters logically grouped together:
/service/batman/ # scope=batman
/service/batman/0/ # citus.group=0, coordinator
/service/batman/0/initialize
/service/batman/0/leader
/service/batman/0/members/
/service/batman/0/members/m1
/service/batman/0/members/m2
/service/batman/1/ # citus.group=1, worker
/service/batman/1/initialize
/service/batman/1/leader
/service/batman/1/members/
/service/batman/1/members/m3
/service/batman/1/members/m4
...
Such an approach was chosen because for most DCS it makes it possible to fetch the entire Citus cluster with a single recursive read request. Only Citus coordinator nodes read the whole tree, because they have to discover worker nodes. Worker nodes read only the subtree of their own group, and in some cases they may read the subtree of the coordinator group.
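As a minimal sketch (not Patroni's actual implementation), a flat recursive DCS read over the key layout shown above can be split into per-group subtrees like this:

```python
# Sketch: group flat DCS keys of the form /service/<scope>/<group>/<key...>
# by their Citus group number, mirroring how a single recursive read can
# yield the whole Citus formation at once.

def group_citus_keys(keys):
    """Return a mapping of Citus group number -> list of relative keys."""
    groups = {}
    for key in keys:
        parts = key.strip("/").split("/")
        # parts: ["service", scope, group, ...]; skip keys without a group level
        if len(parts) < 4:
            continue
        group = int(parts[2])
        groups.setdefault(group, []).append("/".join(parts[3:]))
    return groups

keys = [
    "/service/batman/0/initialize",
    "/service/batman/0/leader",
    "/service/batman/0/members/m1",
    "/service/batman/1/leader",
    "/service/batman/1/members/m3",
]
print(group_citus_keys(keys))
# {0: ['initialize', 'leader', 'members/m1'], 1: ['leader', 'members/m3']}
```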
Citus on Kubernetes
Since Kubernetes doesn't support hierarchical structures, we had to include the Citus group in the names of all K8s objects Patroni creates:
batman-0-leader # the leader config map for the coordinator
batman-0-config # the config map holding initialize, config, and history "keys"
...
batman-1-leader # the leader config map for worker group 1
batman-1-config
...
I.e., the naming pattern is: ${scope}-${citus.group}-${type}.
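The naming pattern can be expressed as a small illustrative helper (hypothetical, not part of Patroni):

```python
# Reproduces the ${scope}-${citus.group}-${type} naming pattern used for
# Kubernetes objects created by Patroni with Citus support.

def k8s_object_name(scope: str, citus_group: int, obj_type: str) -> str:
    """Build the Kubernetes object name for a given scope, group, and type."""
    return f"{scope}-{citus_group}-{obj_type}"

print(k8s_object_name("batman", 0, "leader"))  # batman-0-leader
print(k8s_object_name("batman", 1, "config"))  # batman-1-config
```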
All Kubernetes objects are discovered by Patroni using a label selector; therefore all Pods running Patroni with Citus, as well as the Endpoints/ConfigMaps, must carry matching labels, and Patroni must be configured to use them via Kubernetes settings or environment variables.
A couple of examples of Patroni configuration using Pod environment variables:
For the coordinator cluster:
apiVersion: v1
kind: Pod
metadata:
  labels:
    application: patroni
    citus-group: "0"
    citus-type: coordinator
    cluster-name: citusdemo
  name: citusdemo-0-0
  namespace: default
spec:
  containers:
  - env:
    - name: PATRONI_SCOPE
      value: citusdemo
    - name: PATRONI_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: PATRONI_KUBERNETES_POD_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: PATRONI_KUBERNETES_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: PATRONI_KUBERNETES_LABELS
      value: '{application: patroni}'
    - name: PATRONI_CITUS_DATABASE
      value: citus
    - name: PATRONI_CITUS_GROUP
      value: "0"
For the worker cluster from group 2:
apiVersion: v1
kind: Pod
metadata:
  labels:
    application: patroni
    citus-group: "2"
    citus-type: worker
    cluster-name: citusdemo
  name: citusdemo-2-0
  namespace: default
spec:
  containers:
  - env:
    - name: PATRONI_SCOPE
      value: citusdemo
    - name: PATRONI_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    - name: PATRONI_KUBERNETES_POD_IP
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: PATRONI_KUBERNETES_NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: PATRONI_KUBERNETES_LABELS
      value: '{application: patroni}'
    - name: PATRONI_CITUS_DATABASE
      value: citus
    - name: PATRONI_CITUS_GROUP
      value: "2"
As you may have noticed, both examples have the citus-group label set. This label allows Patroni to identify an object as belonging to a certain Citus group. In addition, there is the PATRONI_CITUS_GROUP environment variable, which has the same value as the citus-group label. When Patroni creates new Kubernetes objects (ConfigMaps or Endpoints), it automatically puts the citus-group: ${env.PATRONI_CITUS_GROUP} label on them:
apiVersion: v1
kind: ConfigMap
metadata:
  name: citusdemo-0-leader  # generated as ${env.PATRONI_SCOPE}-${env.PATRONI_CITUS_GROUP}-leader
  labels:
    application: patroni     # set from ${env.PATRONI_KUBERNETES_LABELS}
    cluster-name: citusdemo  # automatically set from ${env.PATRONI_SCOPE}
    citus-group: '0'         # automatically set from ${env.PATRONI_CITUS_GROUP}
You can find a complete example of Patroni deployment on Kubernetes with Citus support in the kubernetes folder of the Patroni repository.
There are two important files for you:
- Dockerfile.citus
- citus_k8s.yaml
Citus upgrades and PostgreSQL major upgrades
First, please read about upgrading the Citus version in the Citus documentation. There is one minor change in the process: when executing the upgrade, you have to use patronictl restart instead of systemctl restart to restart PostgreSQL.
A PostgreSQL major upgrade with Citus is a bit more complex. You will have to combine the techniques used in the Citus documentation about major upgrades and the Patroni documentation about PostgreSQL major upgrades. Please keep in mind that a Citus cluster consists of many Patroni clusters (coordinator and workers), and they all have to be upgraded independently.
Convert a Standalone to a Patroni Cluster
This section describes the process for converting a standalone PostgreSQL instance into a Patroni cluster.
To deploy a Patroni cluster without using a pre-existing PostgreSQL instance, see Running and Configuring instead.
Procedure
You can find below an overview of steps for converting an existing Postgres cluster to a Patroni managed cluster. In the steps we assume all nodes that are part of the existing cluster are currently up and running, and that you do not intend to change Postgres configuration while the migration is ongoing. The steps:
Create the Postgres users as explained in the authentication section of the Patroni configuration. You can find sample SQL commands to create the users in the code block below, in which you need to replace the usernames and passwords as per your environment. If you already have the relevant users, you can skip this step.
-- Patroni superuser
-- Replace PATRONI_SUPERUSER_USERNAME and PATRONI_SUPERUSER_PASSWORD accordingly
CREATE USER PATRONI_SUPERUSER_USERNAME WITH SUPERUSER ENCRYPTED PASSWORD 'PATRONI_SUPERUSER_PASSWORD';

-- Patroni replication user
-- Replace PATRONI_REPLICATION_USERNAME and PATRONI_REPLICATION_PASSWORD accordingly
CREATE USER PATRONI_REPLICATION_USERNAME WITH REPLICATION ENCRYPTED PASSWORD 'PATRONI_REPLICATION_PASSWORD';

-- Patroni rewind user, if you intend to enable use_pg_rewind in your Patroni configuration
-- Replace PATRONI_REWIND_USERNAME and PATRONI_REWIND_PASSWORD accordingly
CREATE USER PATRONI_REWIND_USERNAME WITH ENCRYPTED PASSWORD 'PATRONI_REWIND_PASSWORD';
GRANT EXECUTE ON function pg_catalog.pg_ls_dir(text, boolean, boolean) TO PATRONI_REWIND_USERNAME;
GRANT EXECUTE ON function pg_catalog.pg_stat_file(text, boolean) TO PATRONI_REWIND_USERNAME;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text) TO PATRONI_REWIND_USERNAME;
GRANT EXECUTE ON function pg_catalog.pg_read_binary_file(text, bigint, bigint, boolean) TO PATRONI_REWIND_USERNAME;
Perform the following steps on all Postgres nodes. Perform all steps on one node before proceeding with the next node. Start with the primary node, then proceed with each standby node:
If you are running Postgres through systemd, disable the Postgres systemd unit. This is done because Patroni manages starting and stopping the Postgres daemon.
Create a YAML configuration file for Patroni. You can use Patroni configuration generation and validation tooling for that.
Note (specific to the primary node): If you have replication slots being used for replication between cluster members, it is recommended that you enable use_slots and configure the existing replication slots as permanent via the slots configuration item. Be aware that, when use_slots is enabled, Patroni automatically creates replication slots for replication between members and drops replication slots that it does not recognize. The idea of using permanent slots here is to allow your existing slots to persist while the migration to Patroni is in progress. See YAML Configuration Settings for details.
Start Patroni using the patroni systemd service unit. It automatically detects that Postgres is already running and starts monitoring the instance.
Hand over the Postgres "start up procedure" to Patroni. In order to do that, you need to restart the cluster members through the patronictl restart cluster-name member-name command. For minimal downtime you might want to split this step into:
Immediate restart of the standby nodes.
Scheduled restart of the primary node within a maintenance window.
If you configured permanent slots in step 1.2., then you should remove them from the slots configuration through the patronictl edit-config cluster-name command once the restart_lsn of the slots created by Patroni has caught up with the restart_lsn of the original slots for the corresponding members. By removing the slots from the slots configuration you allow Patroni to drop the original slots from your cluster once they are no longer needed. You can find below an example query to check the restart_lsn of a couple of slots, so you can compare them:

-- Assume original_slot_for_member_x is the name of the slot in your original
-- cluster for replicating changes to member X, and slot_for_member_x is the
-- slot created by Patroni for that purpose. You need restart_lsn of
-- slot_for_member_x to be >= restart_lsn of original_slot_for_member_x
SELECT slot_name, restart_lsn
FROM pg_replication_slots
WHERE slot_name IN (
    'original_slot_for_member_x',
    'slot_for_member_x'
);
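For illustration, the permanent slots mentioned in the steps above might be declared in the slots section of the dynamic configuration like this (the slot name is a placeholder; use the names of your existing slots):

```yaml
slots:
  original_slot_for_member_x:  # placeholder; one entry per pre-existing slot
    type: physical
```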
Major Upgrade of PostgreSQL Version
The only possible way to do a major upgrade currently is:
1. Stop Patroni.
2. Upgrade the PostgreSQL binaries and perform pg_upgrade on the primary node.
3. Update patroni.yml.
4. Remove the initialize key from the DCS, or wipe the complete cluster state from the DCS. The latter can be achieved by running patronictl remove cluster-name. It is necessary because pg_upgrade runs initdb, which actually creates a new database with a new PostgreSQL system identifier.
5. If you wiped the cluster state in the previous step, you may wish to copy patroni.dynamic.json from the old data directory to the new one. This will help you retain some of the PostgreSQL parameters you had set before.
6. Start Patroni on the primary node.
7. Upgrade the PostgreSQL binaries, update patroni.yml, and wipe the data_dir on the standby nodes.
8. Start Patroni on the standby nodes and wait for the replication to complete.
Running pg_upgrade on standby nodes is not supported by PostgreSQL. If you know what you are doing, you can try the rsync procedure described in https://www.postgresql.org/docs/current/pgupgrade.html instead of wiping the data_dir on the standby nodes. The safest way, however, is to let Patroni replicate the data for you.
FAQ
During Patroni startup, Patroni complains that it cannot bind to the PostgreSQL port.
You need to verify listen_addresses and port in postgresql.conf and postgresql.listen in patroni.yml. Don't forget that pg_hba.conf should allow such access.

After asking Patroni to restart the node, PostgreSQL displays the error message could not open configuration file "/etc/postgresql/10/main/pg_hba.conf": No such file or directory

This can mean various things, depending on how you manage the PostgreSQL configuration. If you specified postgresql.config_dir, Patroni generates pg_hba.conf based on the settings in the bootstrap section only when it bootstraps a new cluster. In this scenario PGDATA was not empty, therefore no bootstrap happened, and this file must exist beforehand.
Security Considerations
A Patroni cluster has two interfaces to be protected from unauthorized access: the distributed configuration storage (DCS) and the Patroni REST API.
Protecting DCS
Patroni and patronictl both store and retrieve data to/from the DCS.
Although the DCS doesn't contain any sensitive information, it allows changing some of the Patroni/Postgres configuration. Therefore the very first thing that should be protected is the DCS itself.
The details of protection depend on the type of DCS used. The authentication and encryption parameters (tokens/basic-auth/client certificates) for the supported types of DCS are covered in settings.
The general recommendation is to enable TLS for all DCS communication.
Protecting the REST API
Protecting the REST API is a more complicated task.
The Patroni REST API is used by Patroni itself during the leader race, by the patronictl tool in order to perform failovers/switchovers/reinitialize/restarts/reloads, by HAProxy or any other kind of load balancer to perform HTTP health checks, and of course could also be used for monitoring.
From a security point of view, the REST API contains safe endpoints (GET requests, which only retrieve information) and unsafe endpoints (PUT, POST, PATCH and DELETE requests, which change the state of nodes).
The unsafe endpoints can be protected with HTTP basic-auth by setting the restapi.authentication.username and restapi.authentication.password parameters. There is no way to protect the safe endpoints without enabling TLS.
When TLS for the REST API is enabled and a PKI is established, mutual authentication of the API server and API client is possible for all endpoints.
The restapi section parameters enable TLS client authentication to the server. Depending on the value of the verify_client parameter, the API server requires a successful client certificate verification for both safe and unsafe API calls (verify_client: required), only for unsafe API calls (verify_client: optional), or for no API calls (verify_client: none).
The ctl section parameters enable TLS server authentication to the client (the patronictl tool, which uses the same configuration as patroni). Set insecure: true to disable server certificate verification by the client. See settings for a detailed description of the TLS client parameters.
Protecting the PostgreSQL database proper from unauthorized access is beyond the scope of this document and is covered in https://www.postgresql.org/docs/current/client-authentication.html
HA multi datacenter
The high availability of a PostgreSQL cluster deployed in multiple data centers is based on replication, which can be synchronous or asynchronous (replication_modes).
In both cases, it is important to be clear about the following concepts:
- Postgres can run as a primary or standby leader only when it owns the leader key and can update it.
- You should run an odd number of etcd, ZooKeeper or Consul nodes: 3 or 5!
Synchronous Replication
To have a multi-DC cluster that can automatically tolerate the loss of a zone, a minimum of three data centers is required.
The architecture diagram would be the following:

We must deploy a cluster of etcd, ZooKeeper or Consul across the different DCs, with a minimum of 3 nodes, one in each zone.
Regarding Postgres, we must deploy at least 2 nodes, in different DCs. Then you have to set synchronous_mode: true in the global dynamic configuration.
This enables synchronous replication, and the primary node will choose one of the nodes as synchronous.
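Enabling it through patronictl edit-config would leave the dynamic configuration containing, for example:

```yaml
synchronous_mode: true
# Optionally, refuse writes when no synchronous standby is available:
# synchronous_mode_strict: true
```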
Asynchronous Replication
With only two data centers it would be better to have two independent etcd clusters and run a Patroni standby cluster in the second data center. If the first site is down, you can MANUALLY promote the standby cluster.
The architecture diagram would be the following:

Automatic promotion is not possible, because DC2 will never be able to figure out the state of DC1.
You should not use pg_ctl promote in this scenario; you need to "manually promote" the healthy cluster by removing the standby_cluster section from the dynamic configuration.
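For reference, a standby_cluster section in the dynamic configuration might look like the sketch below (the host and port are placeholders pointing at the source cluster in DC1):

```yaml
standby_cluster:
  host: dc1-postgres.example.com  # placeholder: endpoint of the source cluster
  port: 5432
  create_replica_methods:
  - basebackup
```

Removing this whole section from the dynamic configuration is what triggers the "manual promotion" of the standby cluster.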
Warning
If the source cluster is still up and running and you promote the standby cluster, you create a split-brain.
In case you want to return to the "initial" state, there are only two ways of resolving it:
- Add the standby_cluster section back; this will trigger pg_rewind, but there is a chance that pg_rewind will fail.
- Rebuild the standby cluster from scratch.
Before promoting the standby cluster, one has to manually ensure that the source cluster is down (STONITH). When DC1 recovers, its cluster has to be converted to a standby cluster.
Before doing that, you may manually examine the database and extract all changes that happened between the time when the network between DC1 and DC2 stopped working and the time when you manually stopped the cluster in DC1.
Once extracted, you may also manually apply these changes to the cluster in DC2.
Release notes
Version 3.1.2
Bugfixes
Fixed bug with wal_keep_size checks (Alexander Kukushkin)
The wal_keep_size is a GUC that normally has a unit, and Patroni was failing to cast its value to int. As a result, the value of bootstrap.dcs was not written to the /config key afterwards.

Detect and resolve inconsistencies between the /sync key and synchronous_standby_names (Alexander Kukushkin)
Normally, Patroni updates /sync and synchronous_standby_names in a very specific order, but in case of a bug, or when someone manually reset synchronous_standby_names, Patroni was getting into an inconsistent state. As a result, it was possible that a failover happened to an asynchronous node.

Read GUC values when joining a running Postgres (Alexander Kukushkin)
When restarted in pause, Patroni was discarding the synchronous_standby_names GUC from postgresql.conf. To solve this and avoid similar issues, Patroni will read GUC values if it is joining an already running Postgres.

Silenced annoying warnings when checking for node uniqueness (Alexander Kukushkin)
WARNING messages are produced by urllib3 if Patroni is quickly restarted.
Version 3.1.1
Bugfixes
Reset failsafe state on promote (ChenChangAo)
If a switchover/failover happened shortly after failsafe mode had been activated, the newly promoted primary was demoting itself after failsafe became inactive.

Silence useless warnings in patronictl (Alexander Kukushkin)
If patronictl uses the same patroni.yaml file as Patroni and can access the PGDATA directory, it might have been showing annoying warnings about incorrect values in the global configuration.

Explicitly enable synchronous mode for a corner case (Alexander Kukushkin)
Synchronous mode effectively was never activated if there were no replicas streaming from the primary.

Fixed bug with 0 integer values validation (Israel Barth Rubio)
In most cases it didn't cause any issues, just warnings.

Don't return logical slots for standby cluster (Alexander Kukushkin)
Patroni can't create logical replication slots in the standby cluster, thus they should be ignored if they are defined in the global configuration.

Avoid showing docstring in patronictl --help output (Israel Barth Rubio)
The click module needs to get a special hint for that.

Fixed bug with kubernetes.standby_leader_label_value (Alexander Kukushkin)
This feature effectively never worked.

Returned cluster system identifier to the patronictl list output (Polina Bungina)
The problem was introduced while implementing the support for Citus, where we needed to hide the identifier because it is different for the coordinator and all workers.

Override write_leader_optime method in Kubernetes implementation (Alexander Kukushkin)
The method is supposed to write the shutdown LSN to the leader Endpoint/ConfigMap when there are no healthy replicas available to become the new primary.

Don't start stopped postgres in pause (Alexander Kukushkin)
Due to a race condition, Patroni was falsely assuming that the standby should be restarted because some recovery parameters (primary_conninfo or similar) were changed.

Fixed bug in patronictl query command (Israel Barth Rubio)
It didn't work when only the -m argument was provided, or when none of -r or -m were provided.

Properly treat integer parameters that are used in the command line to start postgres (Polina Bungina)
If values are supplied as strings and not cast to integer, it was resulting in an incorrect calculation of max_prepared_transactions based on max_connections for Citus clusters.

Don't rely on pg_stat_wal_receiver when deciding on pg_rewind (Alexander Kukushkin)
It could happen that received_tli reported by pg_stat_wal_receiver is ahead of the actual replayed timeline, while the timeline reported by IDENTIFY_SYSTEM via the replication connection is always correct.
Version 3.1.0
Breaking changes
Changed semantics of restapi.keyfile and restapi.certfile (Alexander Kukushkin)
Previously, Patroni was using restapi.keyfile and restapi.certfile as client certificates as a fallback if there were no respective configuration parameters in the ctl section.

Warning

If you enabled client certificate validation (restapi.verify_client is set to required), you also must provide valid client certificates in ctl.certfile, ctl.keyfile, and ctl.keyfile_password. If not provided, Patroni will not work correctly.
New features
Make Pod role label configurable (Waynerv)
Values could be customized using the kubernetes.leader_label_value, kubernetes.follower_label_value and kubernetes.standby_leader_label_value parameters. This feature will be very useful when we change the master role to primary. You can read more about the feature and migration steps here.
Improvements
Various improvements of patroni --validate-config (Alexander Kukushkin)
Improved parameter validation for different DCS, and for the bootstrap.dcs, ctl, restapi, and watchdog sections.

Start Postgres not in recovery if it crashed during recovery while Patroni is running (Alexander Kukushkin)
It may reduce recovery time and helps to prevent unnecessary timeline increments.

Avoid unnecessary updates of the /status key (Alexander Kukushkin)
When there are no permanent logical slots, Patroni was updating the /status key on every heartbeat loop, even when the LSN on the primary didn't move forward.

Don't allow a stale primary to win the leader race (Alexander Kukushkin)
If Patroni was hanging for a significant time due to lack of resources, it will additionally check that no other node promoted Postgres before acquiring the leader lock.

Implemented visibility of certain PostgreSQL parameters validation (Alexander Kukushkin, Feike Steenbergen)
If validation of max_connections, max_wal_senders, max_prepared_transactions, max_locks_per_transaction, max_replication_slots, or max_worker_processes failed, Patroni was using some sane default value. Now, in addition to that, it will also show a warning.

Set permissions for files and directories created in PGDATA (Alexander Kukushkin)
All files created by Patroni had only owner read/write permissions. This behaviour was breaking backup tools that run under a different user and rely on group read permissions. Now Patroni honours permissions on PGDATA and correctly sets permissions on all directories and files it creates inside PGDATA.
Bugfixes
Run archive_command through shell (Waynerv)
Patroni might archive some WAL segments before doing crash recovery in single-user mode or before pg_rewind. If the archive_command contains shell operators, like &&, it didn't work with Patroni.

Fixed "on switchover" shutdown checks (Polina Bungina)
It was possible that the specified candidate was still streaming and hadn't yet received the shutdown checkpoint, but the leader key was removed because some other nodes were healthy.

Fixed "is primary" check (Alexander Kukushkin)
During the leader race, replicas were not able to recognize that Postgres on the old leader was still running as a primary.

Fixed patronictl list (Alexander Kukushkin)
The Cluster name field was missing in the tsv, json, and yaml output formats.

Fixed pg_rewind behaviour after pause (Alexander Kukushkin)
Under certain conditions, Patroni wasn't able to join the false primary back to the cluster with pg_rewind after coming out of maintenance mode.

Fixed bug in Etcd v3 implementation (Alexander Kukushkin)
Invalidate the internal KV cache if a key update is performed using the create_revision/mod_revision field, due to revision mismatch.

Fixed behaviour of replicas in a standby cluster in pause (Alexander Kukushkin)
When the leader key expires, replicas in a standby cluster will not follow the remote node but keep primary_conninfo as it is.
Version 3.0.4
New features
Make the replication status of standby nodes visible (Alexander Kukushkin)
For PostgreSQL 9.6+, Patroni will report the replication state as streaming when the standby is streaming from another node, or in archive recovery when there is no replication connection and restore_command is set. The state is visible in the member keys in DCS, in the REST API, and in patronictl list output.
Improvements
Improved error messages with Etcd v3 (Alexander Kukushkin)
When an Etcd v3 cluster isn't accessible, Patroni was reporting that it can't access /v2 endpoints.

Use quorum read in patronictl if it is possible (Alexander Kukushkin)
Etcd or Consul clusters could be degraded to read-only, but from the patronictl view everything was fine. Now it will fail with an error.

Prevent split-brain from duplicate names in configuration (Mark Pekala)
When starting, Patroni will check whether a node with the same name is registered in the DCS and try to query its REST API. If the REST API is accessible, Patroni exits with an error. This helps to protect from human error.

Start Postgres not in recovery if it crashed while Patroni is running (Alexander Kukushkin)
It may reduce recovery time and helps to prevent unnecessary timeline increments.
Bugfixes
REST API SSL certificates were not reloaded upon receiving a SIGHUP (Israel Barth Rubio)
The regression was introduced in 3.0.3.

Fixed integer GUC validation for parameters like max_connections (Feike Steenbergen)
Patroni didn't like quoted numeric values. The regression was introduced in 3.0.3.

Fixed issue with synchronous_mode (Alexander Kukushkin)
Execute txid_current() with synchronous_commit=off so it doesn't accidentally wait for absent synchronous standbys when synchronous_mode_strict is enabled.
Version 3.0.3
New features
Compatibility with PostgreSQL 16 beta1 (Alexander Kukushkin)
Extended the GUC validator rules.
Make PostgreSQL GUC validators extensible (Israel Barth Rubio)
Validator rules are loaded from YAML files located in the `patroni/postgresql/available_parameters/` directory. The files are ordered alphabetically and applied one after another. This makes it possible to have custom validators for non-standard Postgres distributions.
Added `restapi.request_queue_size` option (Andrey Zhidenkov, Aleksei Sukhov)
Sets the request queue size for the TCP socket used by the Patroni REST API. Once the queue is full, further requests get a "Connection denied" error. The default value is 5.
Call `initdb` directly when initializing a new cluster (Matt Baker)
Previously it was called via `pg_ctl`, which required special quoting of parameters passed to `initdb`.
Added a before-stop hook (Le Duane)
The hook can be configured via `postgresql.before_stop` and is executed right before `pg_ctl stop`. Its exit code doesn't impact the shutdown process.
Added support for custom Postgres binary names (Israel Barth Rubio, Polina Bungina)
When using a custom Postgres distribution, the Postgres binaries may be compiled with names different from the ones used by the community Postgres distribution. Custom binary names can be configured using `postgresql.bin_name.*` and `PATRONI_POSTGRESQL_BIN_*` environment variables.
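A minimal configuration sketch combining the new options above (the script path and binary names are placeholders, not defaults):

```yaml
restapi:
  request_queue_size: 10      # TCP listen backlog for the REST API (default 5)

postgresql:
  before_stop: /usr/local/bin/my_before_stop.sh   # hypothetical script, run right before pg_ctl stop
  bin_name:
    postgres: postgres-custom    # placeholder names for a custom Postgres build
    pg_ctl: pg_ctl-custom
```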
Improvements
Various improvements of `patroni --validate-config` (Polina Bungina)
Make `bootstrap.initdb` optional. It is only required for new clusters, but `patroni --validate-config` was complaining if it was missing from the config.
Don't error out when `postgresql.bin_dir` is empty or not set. Try to find the Postgres binaries in the default PATH first instead.
Make the `postgresql.authentication.rewind` section optional. If it is missing, Patroni uses the superuser.
Improved error reporting in `patronictl` (Israel Barth Rubio)
The `\n` symbol was rendered literally instead of as an actual newline.
Bugfixes
Fixed issue in Citus support (Alexander Kukushkin)
If the REST API call from the promoted worker to the coordinator failed during switchover, it left the given Citus group blocked for an indefinite time.
Allow etcd3 URL in the --dcs-url option of patronictl (Israel Barth Rubio)
If users attempted to pass an etcd3 URL through the --dcs-url option of patronictl, they would face an exception.
Version 3.0.2
Warning
Version 3.0.2 dropped support for Python versions older than 3.6.
New features
Added sync standby replica status to the `/metrics` endpoint (Thomas von Dein, Alexander Kukushkin)
Before, only `primary`/`standby_leader`/`replica` were reported.
User-friendly handling of `PAGER` in `patronictl` (Israel Barth Rubio)
It makes the pager configurable via the `PAGER` environment variable, which overrides the defaults `less` and `more`.
Make K8s retriable HTTP status codes configurable (Alexander Kukushkin)
On some managed platforms it is possible to get status code `401 Unauthorized`, which sometimes gets resolved after a few retries.
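As an illustrative sketch of the K8s retry option (the option name follows the Patroni configuration docs; the value is an example, not a default):

```yaml
kubernetes:
  retriable_http_codes: [401]   # additionally retry these K8s API status codes
```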
Improvements
Set `hot_standby` to `off` during custom bootstrap only if `recovery_target_action` is set to `promote` (Alexander Kukushkin)
This was necessary to make `recovery_target_action=pause` work correctly.
Don't allow the `on_reload` callback to kill other callbacks (Alexander Kukushkin)
`on_start`/`on_stop`/`on_role_change` are usually used to add/remove a virtual IP, and `on_reload` should not interfere with them.
Switched to `IMDSFetcher` in the aws callback example script (Polina Bungina)
`IMDSv2` requires a token to work with, and the `IMDSFetcher` handles it transparently.
Bugfixes
Fixed `patronictl switchover` on a Citus cluster running on Kubernetes (Lukáš Lalinský)
It didn't work for namespaces other than `default`.
Don't write to `PGDATA` if the major version is not known (Alexander Kukushkin)
If right after the start `PGDATA` was empty (maybe not yet mounted), Patroni made a false assumption about the PostgreSQL version and falsely created a `recovery.conf` file even if the actual major version is v10+.
Fixed bug with Citus metadata after coordinator failover (Alexander Kukushkin)
The `citus_set_coordinator_host()` call doesn't cause a metadata sync, so the change was invisible on worker nodes. The issue is solved by switching to `citus_update_node()`.
Use etcd hosts listed in the config file as a fallback when all etcd nodes "failed" (Alexander Kukushkin)
The etcd cluster may change topology over time, and Patroni tries to follow it. If at some point all nodes become unreachable, Patroni will use a combination of the nodes from the config plus the last known topology when trying to reconnect.
Version 3.0.1
Bugfixes
Pass the proper role name to an `on_role_change` callback script (Alexander Kukushkin, Polina Bungina)
Patroni erroneously passed the `promoted` role to an `on_role_change` callback script on promotion. The passed role name was changed back to `master`. This regression was introduced in 3.0.0.
Version 3.0.0
This version adds integration with Citus and makes it possible to survive temporary DCS outages without demoting the primary.
Warning
Version 3.0.0 is the last release supporting Python 2.7. The upcoming release will drop support for Python versions older than 3.7.
The RAFT support is deprecated. We will do our best to maintain it, but take neither guarantee nor responsibility for possible issues.
This version is the first step in getting rid of "master" in favor of "primary". Upgrading to the next major release will work reliably only if you run at least 3.0.0.
New features
DCS failsafe mode (Alexander Kukushkin, Polina Bungina)
If the feature is enabled, it allows the Patroni cluster to survive temporary DCS outages. You can find more details in the documentation.
Citus support (Alexander Kukushkin, Polina Bungina, Jelte Fennema)
Patroni enables easy deployment and management of Citus clusters with HA. Please check the Citus support page in the documentation for more information.
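A sketch of how the two features above are enabled (option names as in the Patroni documentation; values are illustrative): `failsafe_mode` belongs to the dynamic (DCS) configuration, while Citus support is configured per node:

```yaml
# dynamic configuration seeded at bootstrap
bootstrap:
  dcs:
    failsafe_mode: true   # survive temporary DCS outages without demoting

# per-node Citus settings
citus:
  group: 0                # 0 = coordinator, >0 = worker group
  database: citus
```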
Improvements
Suppress recurring errors when dropping unknown but active replication slots (Michael Banck)
Patroni will still write these logs, but only in DEBUG.
Run only one monitoring query per HA loop (Alexander Kukushkin)
Previously this was not the case when synchronous replication was enabled.
Keep only the latest failed data directory (William Albertus Dembo)
If bootstrap failed, Patroni used to rename the $PGDATA folder with a timestamp suffix. From now on the suffix will be `.failed`, and if such a folder exists it is removed before renaming.
Improved check of synchronous replication connections (Alexander Kukushkin)
When a new host is added to `synchronous_standby_names`, it will be set as synchronous in DCS only once it has managed to catch up with the primary, in addition to `pg_stat_replication.sync_state = 'sync'`.
Removed functionality
Remove `patronictl scaffold` (Alexander Kukushkin)
The only reason for having it was a hacky way of running standby clusters.
Version 2.1.7
Bugfixes
Fixed small incompatibilities with legacy Python modules (Alexander Kukushkin)
They prevented building/running Patroni on Debian buster/Ubuntu bionic.
Version 2.1.6
Improvements
Fix annoying exceptions on SSL socket shutdown (Alexander Kukushkin)
HAProxy closes connections as soon as it gets the HTTP status code, leaving no time for Patroni to properly shut down the SSL connection.
Adjust example Dockerfile for arm64 (Polina Bungina)
Remove explicit `amd64` and `x86_64`; don't remove `libnss_files.so.*`.
Security improvements
Enforce `search_path=pg_catalog` for non-replication connections (Alexander Kukushkin)
Since Patroni heavily relies on superuser connections, we want to protect it from possible attacks carried out using user-defined functions and/or operators in the `public` schema with the same name and signature as the corresponding objects in `pg_catalog`. For that, `search_path=pg_catalog` is enforced for all connections created by Patroni (except replication connections).
Prevent passwords from being recorded in `pg_stat_statements` (Feike Steenbergen)
This is achieved by setting `pg_stat_statements.track_utility=off` when creating users.
Bugfixes
Declare `proxy_address` as optional (Denis Laxalde)
It is effectively a non-required option.
Improve behaviour of the insecure option (Alexander Kukushkin)
Ctl's `insecure` option didn't work properly when client certificates were used for REST API requests.
Take watchdog configuration from `bootstrap.dcs` when a new cluster is bootstrapped (Matt Baker)
Patroni used to initially configure the watchdog with defaults when bootstrapping a new cluster, rather than taking the configuration used to bootstrap the DCS.
Fix the way file extensions are treated while finding executables on WIN32 (Martín Marqués)
Only add `.exe` to a file name if it has no extension yet.
Fix Consul TTL setup (Alexander Kukushkin)
We used `ttl/2.0` when setting the value on the HTTPClient, but forgot to multiply the current value by 2 in the class property, resulting in the Consul TTL being off by a factor of two.
Removed functionality
Remove `patronictl configure` (Polina Bungina)
There is no longer a need for separate `patronictl` config creation.
Version 2.1.5
This version enhances compatibility with PostgreSQL 15 and declares Etcd v3 support as production ready. The Patroni on Raft remains in Beta.
New features
Improve `patroni --validate-config` (Denis Laxalde)
Exit with code 1 if the config is invalid and print errors to stderr.
Don't drop replication slots in pause (Alexander Kukushkin)
Patroni automatically creates/removes physical replication slots when members join/leave the cluster. In pause, slots will no longer be removed.
Support the `HEAD` request method for monitoring endpoints (Robert Cutajar)
If used instead of `GET`, Patroni will return only the HTTP status code.
Support behave tests on Windows (Alexander Kukushkin)
Emulate graceful Patroni shutdown (`SIGTERM`) on Windows by introducing the new REST API endpoint `POST /sigterm`.
Introduce `postgresql.proxy_address` (Alexander Kukushkin)
It will be written to the member key in DCS as `proxy_url` and could be used/useful for service discovery.
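For instance, `postgresql.proxy_address` is a plain host:port string (the address below is a placeholder):

```yaml
postgresql:
  proxy_address: 10.0.0.100:5000   # published to DCS as proxy_url for service discovery
```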
Stability improvements
Call `pg_replication_slot_advance()` from a thread (Alexander Kukushkin)
On busy clusters with many logical replication slots the `pg_replication_slot_advance()` call was affecting the main HA loop and could result in member key expiration.
Archive possibly missing WALs before calling `pg_rewind` on the old primary (Polina Bungina)
If the primary crashed and was down for a considerable time, some WAL files could be missing from the archive and from the new primary. There is a chance that `pg_rewind` could remove these WAL files from the old primary, making it impossible to start it as a standby. By archiving `ready` WAL files we not only mitigate this problem but in general improve the continuous-archiving experience.
Ignore `403` errors when trying to create a Kubernetes Service (Nick Hudson, Polina Bungina)
Patroni was spamming logs with unsuccessful attempts to create the service, which in fact could already exist.
Improve liveness probe (Alexander Kukushkin)
The liveness probe will start failing if the heartbeat loop runs longer than ttl on the primary or 2*ttl on the replica. That will allow us to use it as an alternative to the watchdog on Kubernetes.
Make sure only the sync node tries to grab the lock during switchover (Alexander Kukushkin, Polina Bungina)
Previously there was a slim chance that an up-to-date async member could become the leader if a manual switchover was performed without specifying the target.
Avoid cloning while bootstrap is running (Ants Aasma)
Do not allow a create-replica method that does not require a leader to be triggered while the cluster bootstrap is running.
Compatibility with kazoo-2.9.0 (Alexander Kukushkin)
Depending on the Python version, the `SequentialThreadingHandler.select()` method may raise `TypeError` and `IOError` exceptions if `select()` is called on a closed socket.
Explicitly shut down the SSL connection before socket shutdown (Alexander Kukushkin)
Not doing so resulted in `unexpected eof while reading` errors with OpenSSL 3.0.
Compatibility with prettytable>=2.2.0 (Alexander Kukushkin)
Due to internal API changes, the cluster name header was shown on the incorrect line.
Bugfixes
Handle expired token for Etcd lease_grant (monsterxx03)
In case of an error, get a new token and retry the request.
Fix bug in the `GET /read-only-sync` endpoint (Alexander Kukushkin)
It was introduced in the previous release and effectively never worked.
Handle the case when the data dir storage disappeared (Alexander Kukushkin)
Patroni periodically checks that PGDATA is there and not empty, but in case of storage issues `os.listdir()` raises an `OSError` exception, breaking the heartbeat loop.
Apply `master_stop_timeout` when waiting for user backends to close (Alexander Kukushkin)
Something that looks like a user backend could in fact be a background worker (e.g., the Citus Maintenance Daemon) that is failing to stop.
Accept `*:<port>` for `postgresql.listen` (Denis Laxalde)
`patroni --validate-config` was complaining about it being invalid.
Timeout fixes in Raft (Alexander Kukushkin)
When Patroni or patronictl start, they try to get the Raft cluster topology from known members. These calls were made without proper timeouts.
Forcefully update the consul service if the token was changed (John A. Lotoski)
Not doing so results in errors "rpc error making call: rpc error making call: ACL not found".
Version 2.1.4
New features
Improve `pg_rewind` behavior on typical Debian/Ubuntu systems (Gunnar "Nick" Bluth)
On Postgres setups that keep postgresql.conf outside of the data directory (e.g. Ubuntu/Debian packages), `pg_rewind --restore-target-wal` fails to figure out the value of the `restore_command`.
Allow setting `TLSServerName` on Consul service checks (Michael Gmelin)
Useful when checks are performed by IP and the Consul `node_name` is not an FQDN.
Added `ppc64le` support in watchdog (Jean-Michel Scheiwiler)
And fixed watchdog support on some non-x86 platforms.
Switched the aws.py callback from `boto` to `boto3` (Alexander Kukushkin)
`boto` 2.x has been abandoned since 2018 and fails with Python 3.9.
Periodically refresh the service account token on K8s (Haitao Li)
Since Kubernetes v1.21 service account tokens expire in 1 hour.
Added `/read-only-sync` monitoring endpoint (Dennis4b)
It is similar to `/read-only` but includes only synchronous replicas.
Stability improvements
Don't copy the logical replication slot to a replica if there is a configuration mismatch in the logical decoding setup with the primary (Alexander Kukushkin)
A replica won't copy a logical replication slot from the primary anymore if the slot doesn't match the `plugin` or `database` configuration options. Previously, the check for whether the slot matches those configuration options was not performed until after the replica copied the slot and started with it, resulting in unnecessary and repeated restarts.
Special handling of recovery configuration parameters for PostgreSQL v12+ (Alexander Kukushkin)
While starting as a replica, Patroni should be able to update `postgresql.conf` and restart/reload if the leader address has changed, by caching current parameter values instead of querying them from `pg_settings`.
Better handling of IPv6 addresses in the `postgresql.listen` parameter (Alexander Kukushkin)
Since the `listen` parameter has a port, people put IPv6 addresses into square brackets, which were not correctly stripped when there is more than one IP in the list.
Use `replication` credentials when performing the divergence check only on PostgreSQL v10 and older (Alexander Kukushkin)
If `rewind` is enabled, Patroni will again use either `superuser` or `rewind` credentials on newer Postgres versions.
Bugfixes
Fixed missing import of `dateutil.parser` (Wesley Mendes)
Tests weren't failing only because it was also imported from other modules.
Ensure that the `optime` annotation is a string (Sebastian Hasler)
In certain cases Patroni was trying to pass it as numeric.
Better handling of a failed `pg_rewind` attempt (Alexander Kukushkin)
If the primary becomes unavailable during `pg_rewind`, `$PGDATA` will be left in a broken state. Following that, Patroni will remove the data directory even if this is not allowed by the configuration.
Don't remove `slots` annotations from the leader `ConfigMap`/`Endpoint` when PostgreSQL isn't ready (Alexander Kukushkin)
If the `slots` value isn't passed, the annotation will keep the current value.
Handle concurrency problem with K8s API watchers (Alexander Kukushkin)
Under certain (unknown) conditions watchers might become stale; as a result, the `attempt_to_acquire_leader()` method could fail with HTTP status code 409. In that case we reset watcher connections and restart from scratch.
Version 2.1.3
New features
Added support for encrypted TLS keys for `patronictl` (Alexander Kukushkin)
It can be configured via `ctl.keyfile_password` or the `PATRONI_CTL_KEYFILE_PASSWORD` environment variable.
Added more metrics to the /metrics endpoint (Alexandre Pereira)
Specifically, `patroni_pending_restart` and `patroni_is_paused`.
Make it possible to specify multiple hosts in the standby cluster configuration (Michael Banck)
If the standby cluster is replicating from a Patroni cluster, it might be nice to rely on client-side failover, which is available in `libpq` since PostgreSQL v10. That is, the `primary_conninfo` on the standby leader and `pg_rewind` use `target_session_attrs=read-write` in the connection string. The `pgpass` file will be generated with multiple lines (one line per host), and instead of calling `CHECKPOINT` on the primary cluster nodes the standby cluster will wait for `pg_control` to be updated.
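A configuration sketch for the two options above (hostnames and the password are placeholders; the comma-separated `host` form is an assumption based on the multi-host feature described here):

```yaml
ctl:
  keyfile_password: secret        # placeholder; or use PATRONI_CTL_KEYFILE_PASSWORD

bootstrap:
  dcs:
    standby_cluster:
      host: host1,host2           # assumed comma-separated list of source hosts
      port: 5432
```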
Stability improvements
Compatibility with legacy `psycopg2` (Alexander Kukushkin)
For example, the `psycopg2` installed from Ubuntu 18.04 packages doesn't yet have the `UndefinedFile` exception.
Restart the `etcd3` watcher if all Etcd nodes don't respond (Alexander Kukushkin)
If the watcher is alive, the `get_cluster()` method continues returning stale information even if all Etcd nodes are failing.
Don't remove the leader lock in the standby cluster while paused (Alexander Kukushkin)
Previously the lock was maintained only by the node that was running as a primary and not a standby leader.
Bugfixes
Fixed bug in the standby-leader bootstrap (Alexander Kukushkin)
Patroni was considering bootstrap as failed if Postgres didn’t start accepting connections after 60 seconds. The bug was introduced in the 2.1.2 release.
Fixed bug with failover to a cascading standby (Alexander Kukushkin)
When figuring out which slots should be created on cascading standby we forgot to take into account that the leader might be absent.
Fixed small issues in the Postgres config validator (Alexander Kukushkin)
Integer parameters introduced in PostgreSQL v14 were failing to validate because min and max values were quoted in validator.py.
Use replication credentials when checking leader status (Alexander Kukushkin)
It could be that `remove_data_directory_on_diverged_timelines` is set, but there are no `rewind_credentials` defined and superuser access between nodes is not allowed.
Fixed "port in use" error on REST API certificate replacement (Ants Aasma)
When switching certificates there was a race condition with a concurrent API request. If one is active during the replacement period, the replacement errors out with a "port in use" error and Patroni gets stuck in a state without an active API server.
Fixed a bug in cluster bootstrap if passwords contain `%` characters (Bastien Wirtz)
The bootstrap method executes the `DO` block with all parameters properly quoted, but the `cursor.execute()` method didn't like an empty list of parameters passed.
Fixed the "AttributeError: no attribute 'leader'" exception (Hrvoje Milković)
It could happen if synchronous mode is enabled and the DCS content was wiped out.
Fix bug in the divergence timeline check (Alexander Kukushkin)
Patroni was falsely assuming that timelines had diverged. For pg_rewind this didn't create any problem, but if pg_rewind is not allowed and `remove_data_directory_on_diverged_timelines` is set, it resulted in reinitializing the former leader.
Version 2.1.2
New features
Compatibility with `psycopg>=3.0` (Alexander Kukushkin)
By default `psycopg2` is preferred. psycopg>=3.0 will be used only if `psycopg2` is not available or its version is too old.
Add `dcs_last_seen` field to the REST API (Michael Banck)
This field notes the last time (as unix epoch) a cluster member successfully communicated with the DCS. This is useful to identify and/or analyze network partitions.
Release the leader lock when `pg_controldata` reports "shut down" (Alexander Kukushkin)
To solve the problem of slow switchover/shutdown in case `archive_command` is slow/failing, Patroni will remove the leader key immediately after `pg_controldata` starts reporting PGDATA as `shut down` cleanly and it has verified that there is at least one replica that received all changes. If no replicas fulfill this condition, the leader key is not removed and the old behavior is retained, i.e. Patroni will keep updating the lock.
Add `sslcrldir` connection parameter support (Kostiantyn Nemchenko)
The new connection parameter was introduced in PostgreSQL v14.
Allow setting ACLs for ZNodes in Zookeeper (Alwyn Davis)
Introduce a new configuration option `zookeeper.set_acls` so that Kazoo will apply a default ACL for each ZNode that it creates.
Stability improvements
Delay the next attempt of recovery till the next HA loop (Alexander Kukushkin)
If Postgres crashed due to running out of disk space (for example) and fails to start because of that, Patroni was too eagerly trying to recover it, flooding the logs.
Add a log message before demoting, which can take some time (Michael Banck)
It can take some time for the demote to finish, and it might not be obvious from looking at the logs what exactly is going on.
Improve "I am" status messages (Michael Banck)
`no action. I am a secondary ({0})` vs `no action. I am ({0}), a secondary`
Cast `wal_keep_segments` to int when converting to `wal_keep_size` (Jorge Solórzano)
It is possible to specify `wal_keep_segments` as a string in the global dynamic configuration, and due to Python being a dynamically typed language the string was simply multiplied. Example: `wal_keep_segments: "100"` was converted to `100100100100100100100100100100100100100100100100MB`.
Allow switchover only to sync nodes when synchronous replication is enabled (Alexander Kukushkin)
In addition, run the leader race only against known synchronous nodes.
Use the cached role as a fallback when Postgres is slow (Alexander Kukushkin)
In some extreme cases Postgres could be so slow that the normal monitoring query does not finish in a few seconds. The `statement_timeout` exception not being properly handled could lead to a situation where Postgres was not demoted in time when the leader key expired or the update failed. In case of such an exception, Patroni will use the cached `role` to determine whether Postgres is running as a primary.
Avoid unnecessary updates of the member ZNode (Alexander Kukushkin)
If no values have changed in the member's data, the update should not happen.
Optimize checkpoint after promote (Alexander Kukushkin)
Avoid doing a `CHECKPOINT` if the latest timeline is already stored in `pg_control`. It helps to avoid an unnecessary `CHECKPOINT` right after initializing the new cluster with `initdb`.
Prefer members without `nofailover` when picking sync nodes (Alexander Kukushkin)
Previously sync nodes were selected only based on replication lag, hence a node with the `nofailover` tag had the same chance to become synchronous as any other node. That behavior was confusing and dangerous at the same time, because in case of a failed primary the failover could not happen automatically.
Remove duplicate hosts from the etcd machine cache (Michael Banck)
Advertised client URLs in the etcd cluster could be misconfigured. Removing duplicates in Patroni in this case is low-hanging fruit.
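The `wal_keep_segments` cast described above comes from Python's dynamic typing: multiplying a string repeats it instead of doing arithmetic. A minimal illustration of the bug and the fix (not Patroni's actual code; the helper name and the 16MB segment size are assumptions for the example):

```python
def wal_keep_size_mb(wal_keep_segments, segment_size_mb=16):
    """Convert wal_keep_segments to a wal_keep_size value in MB.

    Casting to int first is essential: if the value arrives as a string
    (e.g. "100" from YAML), multiplication would repeat the string.
    """
    return f"{int(wal_keep_segments) * segment_size_mb}MB"

# The buggy behavior: multiplying a string repeats the text
assert "100" * 3 == "100100100"
# The fixed conversion works for both int and str inputs
assert wal_keep_size_mb("100") == "1600MB"
assert wal_keep_size_mb(100) == "1600MB"
```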
Bugfixes
Skip temporary replication slots while doing slot management (Alexander Kukushkin)
Starting from v10, `pg_basebackup` creates a temporary replication slot for WAL streaming, and Patroni was trying to drop it because the slot name looked unknown. In order to fix it, we skip all temporary slots when querying the `pg_stat_replication_slots` view.
Ensure `pg_replication_slot_advance()` doesn't time out (Alexander Kukushkin)
Patroni was using the default `statement_timeout` in this case, and once the call failed there were very high chances that it would never recover, resulting in an increased size of `pg_wal` and `pg_catalog` bloat.
The `/status` key wasn't updated on demote (Alexander Kukushkin)
After demoting PostgreSQL, the old leader updates the last LSN in DCS. Starting from `2.1.0` the new `/status` key was introduced, but the optime was still written to `/optime/leader`.
Handle DCS exceptions when demoting (Alexander Kukushkin)
While demoting the master due to failure to update the leader lock, it could happen that DCS goes completely down and the `get_cluster()` call raises an exception. If not handled properly, this results in Postgres remaining stopped until DCS recovers.
The `use_unix_socket_repl` option didn't work in some cases (Alexander Kukushkin)
Specifically, if `postgresql.unix_socket_directories` is not set. In this case Patroni is supposed to use the default value from `libpq`.
Fix a few issues with the Patroni REST API (Alexander Kukushkin)
The `clusters_unlocked` sometimes could be undefined, which resulted in exceptions in the `GET /metrics` endpoint. In addition, the error handling method assumed that the `connect_address` tuple always has two elements, while in fact there could be more in case of IPv6.
Wait for a newly promoted node to finish recovery before deciding to rewind (Alexander Kukushkin)
It could take some time before the actual promote happens and the new timeline is created. Without waiting, replicas could come to the conclusion that a rewind isn't required.
Handle missing timelines in a history file when deciding to rewind (Alexander Kukushkin)
If the current replica timeline is missing in the history file on the primary, the replica was falsely assuming that a rewind isn't required.
Version 2.1.1
New features
Support for ETCD SRV name suffix (David Pavlicek)
Etcd allows differentiating between multiple Etcd clusters under the same domain, and from now on Patroni also supports it.
Enrich history with the new leader (huiyalin525)
It adds a new column to the `patronictl history` output.
Make the CA bundle configurable for in-cluster Kubernetes config (Aron Parsons)
By default Patroni uses `/var/run/secrets/kubernetes.io/serviceaccount/ca.crt`, and this new feature allows specifying a custom `kubernetes.cacert`.
Support dynamically registering/deregistering as a Consul service and changing tags (Tommy Li)
Previously this required a Patroni restart.
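For example, the custom CA bundle would be set like this (the path is a placeholder):

```yaml
kubernetes:
  cacert: /etc/patroni/custom-ca.crt   # overrides the default in-cluster CA bundle
```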
Bugfixes
Avoid unnecessary reload of REST API (Alexander Kukushkin)
The previous release added a feature of reloading REST API certificates if changed on disk. Unfortunately, the reload was happening unconditionally right after the start.
Don't resolve cluster members when `etcd.use_proxies` is set (Alexander Kukushkin)
When starting up, Patroni checks the healthiness of the Etcd cluster by querying the list of members. In addition, it also tried to resolve their hostnames, which is not necessary when working with Etcd via a proxy and was causing unnecessary warnings.
Skip rows with NULL values in `pg_stat_replication` (Alexander Kukushkin)
It seems that the `pg_stat_replication` view can contain NULL values in the `replay_lsn`, `flush_lsn`, or `write_lsn` fields even when `state = 'streaming'`.
Version 2.1.0
This version adds compatibility with PostgreSQL v14, makes logical replication slots survive failover/switchover, implements allowlist support for the REST API, and also reduces logging to one line per heartbeat.
New features
Compatibility with PostgreSQL v14 (Alexander Kukushkin)
Unpause WAL replay if Patroni is not in a "pause" mode itself. It could be "paused" due to the change of certain parameters, like for example `max_connections` on the primary.
Failover logical slots (Alexander Kukushkin)
Make logical replication slots survive failover/switchover on PostgreSQL v11+. The replication slot is copied from the primary to the replica with a restart, and later the pg_replication_slot_advance() function is used to move it forward. As a result, the slot will already exist before the failover and no events should be lost, but there is a chance that some events could be delivered more than once.
Implemented allowlist for the Patroni REST API (Alexander Kukushkin)
If configured, only IPs matching the rules are allowed to call unsafe endpoints. In addition, it is possible to automatically include the IPs of cluster members in the list.
Added support of replication connections via unix socket (Mohamad El-Rifai)
Previously Patroni always used TCP for the replication connection, which could cause some issues with SSL verification. Using unix sockets allows exempting the replication user from SSL verification.
Health check on user-defined tags (Arman Jafari Tehrani)
Along with predefined tags, it is possible to specify any number of custom tags that become visible in the `patronictl list` output and in the REST API. From now on it is possible to use custom tags in health checks.
Added Prometheus `/metrics` endpoint (Mark Mercado, Michael Banck)
The endpoint exposes the same metrics as `/patroni`.
Reduced chattiness of Patroni logs (Alexander Kukushkin)
When everything goes normally, only one line is written for every run of the HA loop.
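A sketch of the allowlist feature described above (option names follow the Patroni REST API configuration docs; the network is an example):

```yaml
restapi:
  allowlist:                       # networks allowed to call unsafe endpoints
    - 10.0.0.0/24
  allowlist_include_members: true  # also allow the IPs of cluster members
```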
Breaking changes
The old `permanent logical replication slots` feature will no longer work with PostgreSQL v10 and older (Alexander Kukushkin)
The strategy of creating logical slots after performing a promotion can't guarantee that no logical events are lost, and is therefore disabled.
The `/leader` endpoint always returns 200 if the node holds the lock (Alexander Kukushkin)
Promoting the standby cluster requires updating load-balancer health checks, which is not very convenient and easy to forget. To solve this, we changed the behavior of the `/leader` health check endpoint. It returns 200 without taking into account whether the cluster is normal or a `standby_cluster`.
Improvements in Raft support
Reliable support of Raft traffic encryption (Alexander Kukushkin)
Due to various issues in `PySyncObj`, the encryption support was very unstable.
Handle DNS issues in the Raft implementation (Alexander Kukushkin)
If `self_addr` and/or `partner_addrs` are configured using DNS names instead of IPs, `PySyncObj` was effectively resolving them only once, when the object is created. This was causing problems when the same node came back online with a different IP.
Stability improvements
Compatibility with `psycopg2-2.9+` (Alexander Kukushkin)
In `psycopg2` the `autocommit = True` is ignored in the `with connection` block, which breaks replication protocol connections.
Fix excessive HA loop runs with Zookeeper (Alexander Kukushkin)
Updates of member ZNodes were causing a chain reaction and resulted in running the HA loop multiple times in a row.
Reload if the REST API certificate is changed on disk (Michael Todorovic)
If the REST API certificate file was updated in place, Patroni didn't perform a reload.
Don't create the pgpass dir if kerberos auth is used (Kostiantyn Nemchenko)
Kerberos and password authentication are mutually exclusive.
Fixed small issues with custom bootstrap (Alexander Kukushkin)
Start Postgres with `hot_standby=off` only when we do a PITR, and restart it after the PITR is done.
Bugfixes
Compatibility with `kazoo-2.7+` (Alexander Kukushkin)
Since Patroni handles retries on its own, it relies on the old behavior of `kazoo` that requests to a Zookeeper cluster are immediately discarded when there are no connections available.
Explicitly request the version of the Etcd v3 cluster when it is known that we are connecting via proxy (Alexander Kukushkin)
Patroni works with an Etcd v3 cluster via the gRPC-gateway, and depending on the cluster version different endpoints (`/v3`, `/v3beta`, or `/v3alpha`) must be used. The version was resolved only together with the cluster topology, but the latter was never fetched when connecting via proxy.
Version 2.0.2
New features
Ability to ignore externally managed replication slots (James Coleman)
Patroni is trying to remove any replication slot which is unknown to it, but there are certainly cases when replication slots should be managed externally. From now on it is possible to configure slots that should not be removed.
Added support for cipher suite limitation for REST API (Gunnar "Nick" Bluth)
It can be configured via restapi.ciphers or the PATRONI_RESTAPI_CIPHERS environment variable.
Added support for encrypted TLS keys for REST API (Jonathan S. Katz)
It can be configured via restapi.keyfile_password or the PATRONI_RESTAPI_KEYFILE_PASSWORD environment variable.
Constant time comparison of REST API authentication credentials (Alex Brasetvik)
Use hmac.compare_digest() instead of ==, which is vulnerable to timing attacks.
Choose synchronous nodes based on replication lag (Krishna Sarabu)
If the replication lag on a synchronous node starts exceeding the configured threshold, it can be demoted to asynchronous and/or replaced by another node. The behaviour is controlled with maximum_lag_on_syncnode.
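The lag-based selection can be sketched as below; this is an illustrative helper under assumed semantics (lag measured in bytes, members given as a name-to-lag mapping), not Patroni's actual implementation.

```python
# Illustrative candidate selection: only members whose replication lag does not
# exceed maximum_lag_on_syncnode qualify as synchronous standbys, least-lagged first.
def sync_candidates(members, maximum_lag_on_syncnode):
    return [name for name, lag in sorted(members.items(), key=lambda kv: kv[1])
            if lag <= maximum_lag_on_syncnode]

members = {"replica1": 1024, "replica2": 10 * 1024 * 1024}
print(sync_candidates(members, maximum_lag_on_syncnode=1048576))  # -> ['replica1']
```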
Stability improvements
Start postgres with hot_standby = off when doing custom bootstrap (Igor Yanchenko)
During a custom bootstrap Patroni restores the basebackup, starts Postgres up, and waits until recovery finishes. Some PostgreSQL parameters on a standby can't be smaller than on the primary, and if the new value (restored from WAL) is higher than the configured one, Postgres panics and stops. In order to avoid such behavior the custom bootstrap is done without hot_standby mode.
Warn the user if the required watchdog is not healthy (Nicolas Thauvin)
When the watchdog device is not writable or missing in required mode, the member cannot be promoted. Added a warning to show the user where to search for this misconfiguration.
Better verbosity for single-user mode recovery (Alexander Kukushkin)
If Patroni notices that PostgreSQL wasn't shut down cleanly, in certain cases crash recovery is executed by starting Postgres in single-user mode. It could happen that the recovery failed (for example due to a lack of disk space) but the errors were swallowed.
Added compatibility with the python-consul2 module (Alexander Kukushkin, Wilfried Roset)
The good old python-consul has not been maintained for a few years, therefore someone created a fork with new features and bug-fixes.
Don't use bypass_api_service when running patronictl (Alexander Kukushkin)
When a K8s pod is running in a non-default namespace it does not necessarily have enough permissions to query the kubernetes endpoint. In this case Patroni shows a warning and ignores the bypass_api_service setting. In the case of patronictl the warning was a bit annoying.
Create raft.data_dir if it doesn't exist, or make sure that it is writable (Mark Mercado)
Improves user-friendliness and usability.
Bugfixes
Don’t interrupt restart or promote if lost leader lock in pause (Alexander Kukushkin)
In pause mode it is allowed to run postgres as primary without the leader lock.
Fixed issue with shutdown_request() in the REST API (Nicolas Limage)
In order to improve handling of SSL connections and delay the handshake until the thread is started, Patroni overrides a few methods in HTTPServer. The shutdown_request() method was forgotten.
Fixed issue with sleep time when using Zookeeper (Alexander Kukushkin)
There were chances that Patroni was sleeping up to twice as long between runs of the HA code.
Fixed invalid os.symlink() calls when moving data directory after failed bootstrap (Andrew L'Ecuyer)
If the bootstrap failed, Patroni renames the data directory, pg_wal, and all tablespaces. After that it updates the symlinks so the filesystem remains consistent. The symlink creation was failing due to the src and dst arguments being swapped.
Fixed bug in the post_bootstrap() method (Alexander Kukushkin)
If the superuser password wasn't configured, Patroni was failing to call the post_init script and therefore the whole bootstrap was failing.
Fixed an issue with pg_rewind in the standby cluster (Alexander Kukushkin)
If the superuser name is different from postgres, pg_rewind in the standby cluster was failing because the connection string didn't contain the database name.
Exit only if authentication with Etcd v3 explicitly failed (Alexander Kukushkin)
On start Patroni performs discovery of the Etcd cluster topology and authenticates if necessary. It could happen that one of the etcd servers was not accessible; Patroni was trying to perform authentication on this server and failing instead of retrying with the next node.
Handle the case of psutil cmdline() returning an empty list (Alexander Kukushkin)
Zombie processes are still children of the postmaster, but they don't have a cmdline().
Treat the PATRONI_KUBERNETES_USE_ENDPOINTS environment variable as boolean (Alexander Kukushkin)
Not doing so made it impossible to disable kubernetes.use_endpoints via the environment.
Improve handling of concurrent endpoint update errors (Alexander Kukushkin)
Patroni will explicitly query the current endpoint object, verify that the current pod still holds the leader lock, and repeat the update.
Version 2.0.1
New features
Use more as pager in patronictl edit-config if less is not available (Pavel Golub)
On Windows it would be more.com. In addition to that, cdiff was changed to ydiff in requirements.txt, but patronictl still supports both for compatibility.
Added support of raft bind_addr and password (Alexander Kukushkin)
raft.bind_addr might be useful when running behind NAT. raft.password enables traffic encryption (requires the cryptography module).
Added sslpassword connection parameter support (Kostiantyn Nemchenko)
The connection parameter was introduced in PostgreSQL 13.
Stability improvements
Changed the behavior in pause (Alexander Kukushkin)
Patroni will not call the bootstrap method if the PGDATA directory is missing/empty.
Patroni will not exit on a sysid mismatch in pause, only log a warning.
The node will not try to grab the leader key in pause mode if Postgres is running not in recovery (accepting writes) but the sysid doesn't match the initialize key.
Apply master_start_timeout when executing crash recovery (Alexander Kukushkin)
If Postgres crashed on the leader node, Patroni does a crash recovery by starting Postgres in single-user mode. During the crash recovery the leader lock is being updated. If the crash recovery didn't finish in master_start_timeout seconds, Patroni will stop it forcefully and release the leader lock.
Removed the secure extra from the urllib3 requirements (Alexander Kukushkin)
The only reason for adding it there was the ipaddress dependency for python 2.7.
Bugfixes
Fixed a bug in Kubernetes.update_leader() (Alexander Kukushkin)
An unhandled exception was preventing demoting the primary when the update of the leader object failed.
Fixed hanging patronictl when RAFT is being used (Alexander Kukushkin)
When using patronictl with the Patroni config, self_addr should be added to the partner_addrs.
Fixed bug in get_guc_value() (Alexander Kukushkin)
Patroni was failing to get the value of restore_command on PostgreSQL 12, therefore fetching missing WALs for pg_rewind didn't work.
Version 2.0.0
This version enhances compatibility with PostgreSQL 13, adds support for multiple synchronous standbys, has significant improvements in handling of pg_rewind, adds support for Etcd v3 and Patroni on pure RAFT (without Etcd, Consul, or Zookeeper), and makes it possible to optionally call the pre_promote (fencing) script.
PostgreSQL 13 support
Don't fire on_reload when promoting to standby_leader on PostgreSQL 13+ (Alexander Kukushkin)
When promoting to standby_leader we change primary_conninfo, update the role, and reload Postgres. Since on_role_change and on_reload effectively duplicate each other, Patroni will call only on_role_change.
Added support for gssencmode and channel_binding connection parameters (Alexander Kukushkin)
PostgreSQL 12 introduced the gssencmode and PostgreSQL 13 the channel_binding connection parameter; now they can be used if defined in the postgresql.authentication section.
Handle renaming of wal_keep_segments to wal_keep_size (Alexander Kukushkin)
In case of misconfiguration (wal_keep_segments on 13 or wal_keep_size on older versions) Patroni will automatically adjust the configuration.
Use pg_rewind with --restore-target-wal on 13 if possible (Alexander Kukushkin)
On PostgreSQL 13 Patroni checks if restore_command is configured and tells pg_rewind to use it.
New features
[BETA] Implemented support of Patroni on pure RAFT (Alexander Kukushkin)
This makes it possible to run Patroni without third-party dependencies like Etcd, Consul, or Zookeeper. For HA you will have to run either three Patroni nodes, or two nodes with Patroni and one node with patroni_raft_controller. For more information please check the documentation.
[BETA] Implemented support for the Etcd v3 protocol via the gRPC-gateway (Alexander Kukushkin)
Etcd 3.0 was released more than four years ago and Etcd 3.4 has v2 disabled by default. There are also chances that v2 will be completely removed from Etcd, therefore we implemented support of Etcd v3 in Patroni. In order to start using it you have to explicitly create the etcd3 section in the Patroni configuration file.
Supporting multiple synchronous standbys (Krishna Sarabu)
It allows running a cluster with more than one synchronous replica. The maximum number of synchronous replicas is controlled by the new parameter synchronous_node_count. It is set to 1 by default and has no effect when synchronous_mode is set to off.
Added possibility to call the pre_promote script (Sergey Dudoladov)
Unlike callbacks, the pre_promote script is called synchronously after acquiring the leader lock, but before promoting Postgres. If the script fails or exits with a non-zero exit code, the current node will release the leader lock.
Added support for configuration directories (Floris van Nee)
YAML files in the directory are loaded and applied in alphabetical order.
Advanced validation of PostgreSQL parameters (Alexander Kukushkin)
In case a specific parameter is not supported by the current PostgreSQL version, or its value is incorrect, Patroni will remove the parameter completely or try to fix the value.
Wake up the main thread when the forced checkpoint after promote has completed (Alexander Kukushkin)
Replicas are waiting for a checkpoint indication via the member key of the leader in DCS. The key is normally updated only once per HA loop. Without waking the main thread up, replicas would have to wait up to loop_wait seconds longer than necessary.
Use of the pg_stat_wal_receiver view on 9.6+ (Alexander Kukushkin)
The view contains up-to-date values of primary_conninfo and primary_slot_name, while the contents of recovery.conf could be stale.
Improved handling of IPv6 addresses in the Patroni config file (Mateusz Kowalski)
The IPv6 address is supposed to be enclosed in square brackets, but Patroni was expecting to get it plain. Now both formats are supported.
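The dual-format parsing can be sketched as below; this is an illustrative parser (not Patroni's actual function) that accepts both plain host:port and bracketed [addr]:port notation.

```python
# Illustrative host:port parser supporting both IPv4 and bracketed IPv6 forms.
def parse_host_port(value):
    if value.startswith('['):
        # IPv6 enclosed in square brackets, e.g. [2001:db8::1]:5432
        host, _, port = value[1:].partition(']:')
    else:
        # plain hostname or IPv4; rpartition tolerates colons in the host part
        host, _, port = value.rpartition(':')
    return host, int(port)

print(parse_host_port('127.0.0.1:5432'))      # -> ('127.0.0.1', 5432)
print(parse_host_port('[2001:db8::1]:5432'))  # -> ('2001:db8::1', 5432)
```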
Added Consul service_tags configuration parameter (Robert Edström)
Tags are useful for dynamic service discovery, for example by load balancers.
Implemented SSL support for Zookeeper (Kostiantyn Nemchenko)
It requires kazoo>=2.6.0.
Implemented the no_params option for the custom bootstrap method (Kostiantyn Nemchenko)
It allows calling wal-g, pgBackRest, and other backup tools without wrapping them into shell scripts.
Move WAL and tablespaces after a failed init (Feike Steenbergen)
When doing reinit, Patroni was already removing not only PGDATA but also the symlinked WAL directory and tablespaces. Now the move_data_directory() method will do a similar job, i.e. rename the WAL directory and tablespaces and update the symlinks in PGDATA.
Improvements in pg_rewind support
Improved timeline divergence check (Alexander Kukushkin)
We don't need to rewind when the replayed location on the replica is not ahead of the switchpoint, or when the end of the checkpoint record on the former primary is the same as the switchpoint. In order to get the end of the checkpoint record we use pg_waldump and parse its output.
Try to fetch missing WAL if pg_rewind complains about it (Alexander Kukushkin)
It could happen that the WAL segment required for pg_rewind no longer exists in the pg_wal directory, and therefore pg_rewind can't find the checkpoint location before the divergence point. Starting from PostgreSQL 13, pg_rewind can use restore_command to fetch missing WALs. For older PostgreSQL versions Patroni parses the errors of a failed rewind attempt and tries to fetch the missing WAL by calling the restore_command on its own.
Detect a new timeline in the standby cluster and trigger rewind/reinitialize if necessary (Alexander Kukushkin)
The standby_cluster is decoupled from the primary cluster and therefore doesn't immediately know about leader elections and timeline switches. In order to detect them, the standby_leader periodically checks for new history files in pg_wal.
Shorten and beautify history log output (Alexander Kukushkin)
When Patroni is trying to figure out the necessity of pg_rewind, it could write the content of the history file from the primary into the log. The history file grows with every failover/switchover and eventually starts taking up too many lines, most of which are not so useful. Instead of showing the raw data, Patroni will show only 3 lines before the current replica timeline and 2 lines after.
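The trimming logic amounts to a simple slice, sketched here with a hypothetical helper (the indexing convention — one history line per timeline — is an assumption for illustration):

```python
# Hypothetical sketch: keep 3 history lines before the current replica
# timeline, the current one, and 2 lines after it.
def trim_history(history_lines, replica_timeline):
    i = replica_timeline - 1  # assume history_lines[i] describes timeline i + 1
    return history_lines[max(i - 3, 0):i + 3]

lines = ['timeline %d' % tl for tl in range(1, 11)]  # 10 timelines
print(trim_history(lines, replica_timeline=7))
# -> ['timeline 4', 'timeline 5', 'timeline 6', 'timeline 7', 'timeline 8', 'timeline 9']
```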
Improvements on K8s
Get rid of the kubernetes python module (Alexander Kukushkin)
The official python kubernetes client contains a lot of auto-generated code and is therefore very heavy. Patroni uses only a small fraction of the K8s API endpoints, and implementing support for them wasn't hard.
Make it possible to bypass the kubernetes service (Alexander Kukushkin)
When running on K8s, Patroni usually communicates with the K8s API via the kubernetes service, the address of which is exposed in the KUBERNETES_SERVICE_HOST environment variable. Like any other service, the kubernetes service is handled by kube-proxy, which in turn, depending on the configuration, relies either on a userspace program or iptables for traffic routing. Skipping the intermediate component and connecting directly to the K8s master nodes allows us to implement a better retry strategy and mitigate the risk of demoting Postgres when K8s master nodes are upgraded.
Sync HA loops of all pods of a Patroni cluster (Alexander Kukushkin)
Not doing so was increasing the failure detection time from ttl to ttl + loop_wait.
Populate references and nodename in the subsets addresses on K8s (Alexander Kukushkin)
Some load-balancers rely on this information.
Fix possible race conditions in update_leader() (Alexander Kukushkin)
A concurrent update of the leader configmap or endpoint happening outside of Patroni might cause the update_leader() call to fail. In this case Patroni rechecks that the current node still owns the leader lock and repeats the update.
Explicitly disallow patching a non-existent config (Alexander Kukushkin)
For DCS other than kubernetes the PATCH call fails with an exception due to cluster.config being None, but on Kubernetes it was happily creating the config annotation and preventing the bootstrap configuration from being written after the bootstrap finished.
Fix bug in pause (Alexander Kukushkin)
Replicas were removing primary_conninfo and restarting Postgres when the leader key was absent, but they should do nothing.
Improvements in REST API
Defer TLS handshake until worker thread has started (Alexander Kukushkin, Ben Harris)
If the TLS handshake was done in the API thread and the client-side didn’t send any data, the API thread was blocked (risking DoS).
Check basic-auth independently from the client certificate in the REST API (Alexander Kukushkin)
Previously only the client certificate was validated. Doing the two checks independently is an absolutely valid use-case.
Write a double CRLF after the HTTP headers of the OPTIONS request (Sergey Burladyan)
HAProxy was happy with a single CRLF, while the Consul health-check complained about a broken connection and unexpected EOF.
GET /cluster was showing stale member info for Zookeeper (Alexander Kukushkin)
The endpoint was using the Patroni internal cluster view. For Patroni itself it didn't cause any issues, but when exposed to the outside world we need to show up-to-date information, especially replication lag.
Fixed health-checks for the standby cluster (Alexander Kukushkin)
The GET /standby-leader for a master and GET /master for a standby_leader were incorrectly responding with 200.
Implemented DELETE /switchover (Alexander Kukushkin)
The REST API call deletes the scheduled switchover.
Created /readiness and /liveness endpoints (Alexander Kukushkin)
They can be useful to eliminate "unhealthy" pods from subsets addresses when the K8s service is used with label selectors.
Enhanced GET /replica and GET /async REST API health-checks (Krishna Sarabu, Alexander Kukushkin)
The checks now support the optional keyword ?lag=<max-lag> and will respond with 200 only if the lag is smaller than the supplied value. If relying on this feature, please keep in mind that information about the WAL position on the leader is updated only every loop_wait seconds!
Added support for user-defined HTTP headers in the REST API response (Yogesh Sharma)
This feature might be useful if requests are made from a browser.
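The lag-aware health check described above can be sketched as a pure decision function. This is an illustration of the status-code logic only (the 503 failure code is an assumption), not Patroni's handler:

```python
# Illustrative decision logic for the ?lag=<max-lag> health-check keyword:
# without a threshold the replica check passes; with one, the reported lag
# must not exceed it.
def replica_check_status(replica_lag, max_lag=None):
    if max_lag is not None and replica_lag > max_lag:
        return 503  # assumed failure code for illustration
    return 200

print(replica_check_status(replica_lag=512, max_lag=1048576))           # -> 200
print(replica_check_status(replica_lag=5 * 1048576, max_lag=1048576))   # -> 503
```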
Improvements in patronictl
Don't try to call a non-existing leader in patronictl pause (Alexander Kukushkin)
While pausing a cluster without a leader on K8s, patronictl was showing warnings that member "None" could not be accessed.
Handle the case when the member conn_url is missing (Alexander Kukushkin)
On K8s it is possible that the pod doesn't have the necessary annotations because Patroni is not yet running. It was making patronictl fail.
Added ability to print the ASCII cluster topology (Maxim Fedotov, Alexander Kukushkin)
It is very useful to get an overview of a cluster with cascading replication.
Implement patronictl flush switchover (Alexander Kukushkin)
Before that, patronictl flush only supported cancelling scheduled restarts.
Bugfixes
Attribute error during bootstrap of a cluster with existing PGDATA (Krishna Sarabu)
When trying to create/update the /history key, Patroni was accessing the ClusterConfig object which wasn't created in DCS yet.
Improved exception handling in Consul (Alexander Kukushkin)
An unhandled exception in the touch_member() method caused the whole Patroni process to crash.
Enforce synchronous_commit=local for the post_init script (Alexander Kukushkin)
Patroni was already doing that when creating users (replication, rewind), but missing it in the case of post_init was an oversight. As a result, if the script wasn't doing it internally on its own, the bootstrap in synchronous_mode wasn't able to finish.
Increased maxsize in the Consul pool manager (ponvenkates)
With the default size=1 some warnings were generated.
Patroni was wrongly reporting Postgres as running (Alexander Kukushkin)
The state wasn't updated when, for example, Postgres crashed due to an out-of-disk error.
Put * into pgpass instead of missing or empty values (Alexander Kukushkin)
If, for example, the standby_cluster.port is not specified, the pgpass file was incorrectly generated.
Skip physical replication slot creation on the leader node with special characters (Krishna Sarabu)
Patroni appeared to be creating a dormant slot (when slots were defined) for the leader node when the name contained special chars such as '-' (e.g. "abc-us-1").
Avoid removing a non-existent pg_hba.conf in the custom bootstrap (Krishna Sarabu)
Patroni was failing if pg_hba.conf happened to be located outside of the pgdata dir after a custom bootstrap.
Version 1.6.5
New features
Master stop timeout (Krishna Sarabu)
The number of seconds Patroni is allowed to wait when stopping Postgres. Effective only when synchronous_mode is enabled. When set to a value greater than 0 and synchronous_mode is enabled, Patroni sends SIGKILL to the postmaster if the stop operation runs for more than the value set by master_stop_timeout. Set the value according to your durability/availability trade-off. If the parameter is not set or is set to a non-positive value, master_stop_timeout has no effect.
Don't create a permanent physical slot with the name of the primary (Alexander Kukushkin)
It is a common problem that the primary recycles WAL segments while the replica is down. Now we have a good solution for static clusters, with a fixed number of nodes and names that never change. You just need to list the names of all nodes in the slots configuration, so the primary will not remove a slot when the node is down (not registered in DCS).
First draft of Config Validator (Igor Yanchenko)
Use patroni --validate-config patroni.yaml in order to validate the Patroni configuration.
Possibility to configure the max length of the timelines history (Krishna Sarabu)
Patroni writes the history of failovers/switchovers into the /history key in DCS. Over time the size of this key becomes big, but in most cases only the last few lines are interesting. The max_timelines_history parameter allows specifying the maximum number of timeline history items to be kept in DCS.
Kazoo 2.7.0 compatibility (Danyal Prout)
Some non-public methods in Kazoo changed their signatures, but Patroni was relying on them.
Improvements in patronictl
Show member tags (Kostiantyn Nemchenko, Alexander Kukushkin)
Tags are configured individually for every node, and there was no easy way to get an overview of them.
Improve members output (Alexander Kukushkin)
The redundant cluster name won’t be shown anymore on every line, only in the table header.
$ patronictl list
+ Cluster: batman (6813309862653668387) +---------+----+-----------+---------------------+
| Member      | Host           | Role   | State   | TL | Lag in MB | Tags                |
+-------------+----------------+--------+---------+----+-----------+---------------------+
| postgresql0 | 127.0.0.1:5432 | Leader | running |  3 |           | clonefrom: true     |
|             |                |        |         |    |           | noloadbalance: true |
|             |                |        |         |    |           | nosync: true        |
+-------------+----------------+--------+---------+----+-----------+---------------------+
| postgresql1 | 127.0.0.1:5433 |        | running |  3 |       0.0 |                     |
+-------------+----------------+--------+---------+----+-----------+---------------------+
Fail if a config file is specified explicitly but not found (Kaarel Moppel)
Previously patronictl was only reporting a DEBUG message.
Solved the problem of a not-initialized K8s pod breaking patronictl (Alexander Kukushkin)
Patroni relies on certain pod annotations on K8s. When one of the Patroni pods is stopping or starting, there is no valid annotation yet and patronictl was failing with an exception.
Stability improvements
Apply 1 second backoff if LIST call to K8s API server failed (Alexander Kukushkin)
It is mostly necessary to avoid flooding logs, but also helps to prevent starvation of the main thread.
Retry if the retry-after HTTP header is returned by the K8s API (Alexander Kukushkin)
If the K8s API server is overwhelmed with requests it might ask to retry.
Scrub the KUBERNETES_ environment from the postmaster (Feike Steenbergen)
The KUBERNETES_ environment variables are not required for PostgreSQL, yet having them exposed to the postmaster will also expose them to backends and to regular database users (using pl/perl for example).
Clean up tablespaces on reinitialize (Krishna Sarabu)
During reinit, Patroni was removing only PGDATA and leaving user-defined tablespace directories. This was causing Patroni to loop in reinit. The previous workaround for the problem was implementing a custom bootstrap script.
Explicitly execute CHECKPOINT after promote happened (Alexander Kukushkin)
It helps to reduce the time before the new primary is usable for pg_rewind.
Smart refresh of Etcd members (Alexander Kukushkin)
In case Patroni failed to execute a request on all members of the Etcd cluster, Patroni will re-check the A or SRV records for changes of IPs/hosts before retrying the next time.
Skip missing values from pg_controldata (Feike Steenbergen)
Values are missing when trying to use binaries of a version that doesn't match PGDATA. Patroni will try to start Postgres anyway, and Postgres will complain that the major version doesn't match and abort with an error.
Bugfixes
Disable SSL verification for Consul when required (Julien Riou)
Starting from a certain version of urllib3, cert_reqs must be explicitly set to ssl.CERT_NONE in order to effectively disable SSL verification.
Avoid opening a replication connection on every cycle of the HA loop (Alexander Kukushkin)
The regression was introduced in 1.6.4.
Call the on_role_change callback on a failed primary (Alexander Kukushkin)
In certain cases it could lead to the virtual IP remaining attached to the old primary. The regression was introduced in 1.4.5.
Reset the rewind state if postgres started after a successful pg_rewind (Alexander Kukushkin)
As a result of this bug, Patroni was starting up a manually shut down postgres in pause mode.
Convert recovery_min_apply_delay to ms when checking recovery.conf
Patroni was indefinitely restarting the replica if recovery_min_apply_delay was configured on PostgreSQL older than 12.
PyInstaller compatibility (Alexander Kukushkin)
PyInstaller freezes (packages) Python applications into stand-alone executables. The compatibility was broken when we switched to the spawn method instead of fork for multiprocessing.
Version 1.6.4
New features
Implemented the --wait option for patronictl reinit (Igor Yanchenko)
Patronictl will wait for reinit to finish if the --wait option is used.
Further improvements of Windows support (Igor Yanchenko, Alexander Kukushkin)
All shell scripts which are used for integration testing were rewritten in python.
pg_ctl kill will be used to stop postgres on non-posix systems.
Don't try to use unix-domain sockets.
Stability improvements
Make sure unix_socket_directories and stats_temp_directory exist (Igor Yanchenko)
Upon the start of Patroni and Postgres, make sure that unix_socket_directories and stats_temp_directory exist, or try to create them. Patroni will exit if it fails to create them.
Make sure postgresql.pgpass is located in a place where Patroni has write access (Igor Yanchenko)
If it doesn't have write access, Patroni will exit with an exception.
Disable the Consul serfHealth check by default (Kostiantyn Nemchenko)
Even in the case of small network problems, the failing serfHealth leads to invalidation of all sessions associated with the node. Therefore the leader key is lost much earlier than ttl, which causes unwanted restarts of replicas and maybe demotion of the primary.
Configure tcp keepalives for connections to the K8s API (Alexander Kukushkin)
If we get nothing from the socket after TTL seconds, the connection can be considered dead.
Avoid logging of passwords on user creation (Alexander Kukushkin)
If the password is rejected, or logging is configured to verbose or not configured at all, it might happen that the password is written into the postgres logs. In order to avoid this, Patroni will change log_statement, log_min_duration_statement, and log_min_error_statement to safe values before attempting to create/update the user.
Bugfixes
Use restore_command from the standby_cluster config on cascading replicas (Alexander Kukushkin)
The standby_leader was already doing this from the moment the feature was introduced. Not doing the same on replicas might prevent them from catching up with the standby leader.
Update the timeline reported by the standby cluster (Alexander Kukushkin)
In case of a timeline switch, the standby cluster was correctly replicating from the primary, but patronictl was reporting the old timeline.
Allow certain recovery parameters to be defined in the custom_conf (Alexander Kukushkin)
When doing validation of recovery parameters on a replica, Patroni will skip archive_cleanup_command, promote_trigger_file, recovery_end_command, recovery_min_apply_delay, and restore_command if they are not defined in the patroni config but in files other than postgresql.auto.conf or postgresql.conf.
Improve handling of postgresql parameters with a period in their name (Alexander Kukushkin)
Such parameters could be defined by extensions, where the unit is not necessarily a string. Changing the value might require a restart (for example pg_stat_statements.max).
Improve exception handling during shutdown (Alexander Kukushkin)
During shutdown Patroni tries to update its status in the DCS. If the DCS is inaccessible, an exception might be raised. The lack of exception handling was preventing the logger thread from stopping.
Version 1.6.3
Bugfixes
Don't expose the password when running pg_rewind (Alexander Kukushkin)
The bug was introduced in #1301.
Apply connection parameters specified in postgresql.authentication to pg_basebackup and custom replica creation methods (Alexander Kukushkin)
They were relying on a url-like connection string and therefore the parameters were never applied.
Version 1.6.2
New features
Implemented patroni --version (Igor Yanchenko)
It prints the current version of Patroni and exits.
Set the user-agent http header for all http requests (Alexander Kukushkin)
Patroni communicates with Consul, Etcd, and the Kubernetes API via the http protocol. Having a specifically crafted user-agent (for example Patroni/1.6.2 Python/3.6.8 Linux) might be useful for debugging and monitoring.
Make it possible to configure the log level for exception tracebacks (Igor Yanchenko)
If you set log.traceback_level=DEBUG, the tracebacks will be visible only when log.level=DEBUG. The default behavior remains the same.
Stability improvements
Avoid importing all DCS modules when searching for the module required by the config file (Alexander Kukushkin)
There is no need to import modules for Etcd, Consul, and Kubernetes if we only need, for example, Zookeeper. It helps to reduce memory usage and solves the problem of INFO messages like "Failed to import smth".
Removed the python requests module from the explicit requirements (Alexander Kukushkin)
It wasn't used for anything critical, but was causing a lot of problems whenever a new version of urllib3 was released.
Improve handling of etcd.hosts written as a comma-separated string instead of a YAML array (Igor Yanchenko)
Previously it was failing when written in the format host1:port1, host2:port2 (with a space character after the comma).
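The tolerant parsing can be sketched as below; a hypothetical normalization helper, not Patroni's actual code:

```python
# Illustrative normalization: accept either a list of hosts or a
# comma-separated string, stripping any whitespace around each entry.
def normalize_hosts(hosts):
    if isinstance(hosts, str):
        return [h.strip() for h in hosts.split(',') if h.strip()]
    return list(hosts)

print(normalize_hosts('host1:port1, host2:port2'))  # -> ['host1:port1', 'host2:port2']
print(normalize_hosts(['host1:2379']))              # -> ['host1:2379']
```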
Usability improvements
Don't force users to choose members from an empty list in patronictl (Igor Yanchenko)
If the user provides a wrong cluster name, we will raise an exception rather than ask to choose a member from an empty list.
Make the error message more helpful if the REST API cannot bind (Igor Yanchenko)
For an inexperienced user it might be hard to figure out what is wrong from the Python stacktrace.
Bugfixes
Fix calculation of wal_buffers (Alexander Kukushkin)
The base unit has been changed from 8 kB blocks to bytes in PostgreSQL 11.
Use passfile in primary_conninfo only on PostgreSQL 10+ (Alexander Kukushkin)
On older versions there is no guarantee that passfile will work, unless the latest version of libpq is installed.
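The version-dependent interpretation can be illustrated with a hedged sketch (a hypothetical converter written only to show the unit change described above, not Patroni's actual code):

```python
# Hedged illustration of the unit change: before PostgreSQL 11 a raw
# wal_buffers value is interpreted in 8 kB blocks, from 11 onwards in bytes.
def wal_buffers_bytes(raw_value, pg_major_version):
    base_unit = 1 if pg_major_version >= 11 else 8192
    return raw_value * base_unit

print(wal_buffers_bytes(2048, 10))      # -> 16777216 (2048 blocks of 8 kB)
print(wal_buffers_bytes(16777216, 11))  # -> 16777216 (already bytes)
```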
Version 1.6.1
New features
Added
PATRONICTL_CONFIG_FILE
environment variable (msvechla)It allows configuring the
--config-file
argument forpatronictl
from the environment.Implement
patronictl history
(Alexander Kukushkin)It shows the history of failovers/switchovers.
Pass
-c statement_timeout=0
inPGOPTIONS
when doingpg_rewind
(Alexander Kukushkin)It protects from the case when
statement_timeout
on the server is set to some small value and one of the statements executed by pg_rewind is canceled.Allow lower values for PostgreSQL configuration (Soulou)
Patroni didn’t allow some of the PostgreSQL configuration parameters be set smaller than some hardcoded values. Now the minimal allowed values are smaller, default values have not been changed.
Allow for certificate-based authentication (Jonathan S. Katz)
This feature enables certificate-based authentication for the superuser, replication, and rewind accounts and allows the user to specify the sslmode they wish to connect with.
Use the passfile in the primary_conninfo instead of password (Alexander Kukushkin)
It makes it possible to avoid setting 600 permissions on postgresql.conf.
Perform pg_ctl reload regardless of config changes (Alexander Kukushkin)
It is possible that some config files are not controlled by Patroni. When somebody does a reload via the REST API or by sending SIGHUP to the Patroni process, the usual expectation is that Postgres will also be reloaded. Previously this didn't happen when there were no changes in the postgresql section of the Patroni config.
Compare all recovery parameters, not only primary_conninfo (Alexander Kukushkin)
Previously the check_recovery_conf() method only checked whether primary_conninfo had changed, never taking into account all other recovery parameters.
Make it possible to apply some recovery parameters without restart (Alexander Kukushkin)
Starting from PostgreSQL 12 the following recovery parameters can be changed without a restart: archive_cleanup_command, promote_trigger_file, recovery_end_command, and recovery_min_apply_delay. In future Postgres releases this list will be extended and Patroni will support it automatically.
Make it possible to change use_slots online (Alexander Kukushkin)
Previously it required restarting Patroni and removing slots manually.
Remove only PATRONI_-prefixed environment variables when starting up Postgres (Cody Coons)
This solves a lot of problems with running different Foreign Data Wrappers.
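The environment filtering described above amounts to a one-line comprehension; this is a minimal sketch, not Patroni's actual code.

```python
# Sketch: build the environment for the postmaster by dropping only
# PATRONI_-prefixed variables, leaving e.g. FDW-related settings intact.
def postgres_env(environ):
    return {k: v for k, v in environ.items() if not k.startswith('PATRONI_')}

env = postgres_env({'PATRONI_SCOPE': 'demo', 'PGAPPNAME': 'patroni',
                    'PATH': '/usr/bin'})
print(sorted(env))  # → ['PATH', 'PGAPPNAME']
```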
Stability improvements
Use LIST + WATCH when working with the K8s API (Alexander Kukushkin)
It allows efficiently receiving object changes (pods, endpoints/configmaps) and puts less stress on K8s master nodes.
Improve the workflow when PGDATA is not empty during bootstrap (Alexander Kukushkin)
According to the initdb source code it might consider a PGDATA empty when there are only lost+found and .dotfiles in it. Now Patroni does the same. If PGDATA happens to be non-empty, and at the same time not valid from the pg_controldata point of view, Patroni will complain and exit.
Avoid calling expensive os.listdir() on every HA loop (Alexander Kukushkin)
When the system is under IO stress, os.listdir() could take a few seconds (or even minutes) to execute, badly affecting the HA loop of Patroni. This could even cause the leader key to disappear from DCS due to the lack of updates. There is a better and less expensive way to check that the PGDATA is not empty: now we check for the presence of the global/pg_control file in the PGDATA.
Some improvements in logging infrastructure (Alexander Kukushkin)
Previously there was a possibility to lose the last few log lines on shutdown because the logging thread was a daemon thread.
Use the spawn multiprocessing start method on Python 3.4+ (Maciej Kowalczyk)
It is a known issue in Python that threading and multiprocessing do not mix well. Switching from the default method fork to spawn is the recommended workaround. Not doing so might result in the postmaster-starting process hanging and Patroni indefinitely reporting INFO: restarting after failure in progress, while Postgres is actually up and running.
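The workaround can be requested explicitly through a multiprocessing context; this is a minimal sketch of the idea, not Patroni's actual startup code.

```python
import multiprocessing as mp

# Sketch: use the 'spawn' start method instead of the default 'fork', so a
# child process starts from a fresh interpreter and does not inherit locks
# held by threads in the parent (the root cause of the hang described above).
ctx = mp.get_context('spawn')
# Children would then be created via ctx.Process(...) as usual.
print(ctx.get_start_method())  # → spawn
```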
Improvements in REST API
Make it possible to check client certificates in the REST API (Alexander Kukushkin)
If verify_client is set to required, Patroni will check client certificates for all REST API calls. When it is set to optional, client certificates are checked for all unsafe REST API endpoints.
Return the response code 503 for the GET /replica health check request if Postgres is not running (Alexander Anikin)
Postgres might spend significant time in recovery before it starts accepting client connections.
Implement /history and /cluster endpoints (Alexander Kukushkin)
The /history endpoint shows the content of the history key in DCS. The /cluster endpoint shows all cluster members and some service info, like pending and scheduled restarts or switchovers.
Improvements in Etcd support
Retry on Etcd RAFT internal error (Alexander Kukushkin)
When the Etcd node is being shut down, it sends response code=300, data='etcdserver: server stopped', which was causing Patroni to demote the primary.
Don't give up on Etcd request retry too early (Alexander Kukushkin)
When there were network problems, Patroni was quickly exhausting the list of Etcd nodes and giving up without using the whole retry_timeout, potentially resulting in demoting the primary.
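The retry fix boils down to spending the whole timeout budget while cycling through the nodes. Below is a minimal sketch under assumptions: `request` is a hypothetical callable and the node names are examples, not Patroni's real Etcd client.

```python
import time

# Sketch: keep cycling through the DCS nodes until the whole retry_timeout
# budget is spent, instead of giving up after a single pass over the list.
def retry_request(nodes, request, retry_timeout):
    deadline = time.monotonic() + retry_timeout
    last_error = None
    while time.monotonic() < deadline:
        for node in nodes:
            try:
                return request(node)
            except Exception as e:   # network error: try the next node
                last_error = e
    raise last_error

calls = []
def flaky(node):                      # fails twice, then succeeds
    calls.append(node)
    if len(calls) < 3:
        raise OSError('unreachable')
    return 'ok from ' + node

result = retry_request(['etcd1', 'etcd2'], flaky, retry_timeout=5)
print(result)  # → ok from etcd1
```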
Bugfixes
Disable synchronous_commit when granting execute permissions to the pg_rewind user (kremius)
If the bootstrap is done with synchronous_mode_strict: true, the GRANT EXECUTE statement was waiting indefinitely due to no synchronous nodes being available.
Fix memory leak on Python 3.7 (Alexander Kukushkin)
Patroni uses ThreadingMixIn to process REST API requests, and Python 3.7 made the threads spawned for every request non-daemon by default.
Fix race conditions in asynchronous actions (Alexander Kukushkin)
There was a chance that patronictl reinit --force could be overwritten by the attempt to recover stopped Postgres. This ended up in a situation where Patroni was trying to start Postgres while a basebackup was running.
Fix race condition in the postmaster_start_time() method (Alexander Kukushkin)
If the method is executed from the REST API thread, it requires a separate cursor object to be created.
Fix the problem of not promoting the sync standby that had a name containing upper case letters (Alexander Kukushkin)
We converted the name to lower case because Postgres does the same while comparing the application_name with the value in synchronous_standby_names.
Kill all children along with the callback process before starting the new one (Alexander Kukushkin)
Not doing so makes it hard to implement callbacks in bash and can eventually lead to a situation where two callbacks are running at the same time.
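Terminating a callback together with its children is typically done by signalling the whole process group. The snippet below is an illustrative sketch of that idea, not Patroni's actual implementation.

```python
import os, signal, subprocess, sys

# Sketch: run a callback in its own session/process group so that the whole
# group, including any children it spawned (e.g. from a bash script), can be
# terminated at once.
def run_callback(cmd):
    return subprocess.Popen(cmd, start_new_session=True)

def kill_callback(proc):
    try:
        os.killpg(proc.pid, signal.SIGTERM)   # signal the whole group
    except ProcessLookupError:
        pass
    proc.wait()

p = run_callback([sys.executable, '-c', 'import time; time.sleep(60)'])
kill_callback(p)
print(p.returncode)  # negative value: terminated by a signal
```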
Fix ‘start failed’ issue (Alexander Kukushkin)
Under certain conditions the Postgres state might be set to ‘start failed’ despite Postgres being up and running.
Version 1.6.0
This version adds compatibility with PostgreSQL 12, makes it possible to run pg_rewind without superuser on PostgreSQL 11 and newer, and enables IPv6 support.
New features
Psycopg2 was removed from requirements and must be installed independently (Alexander Kukushkin)
Starting from 2.8.0, psycopg2 was split into two different packages, psycopg2 and psycopg2-binary, which could be installed at the same time into the same place on the filesystem. In order to reduce the dependency hell problem, we let the user choose how to install it. There are a few options available; please consult the documentation.
Compatibility with PostgreSQL 12 (Alexander Kukushkin)
Starting from PostgreSQL 12 there is no recovery.conf anymore and all former recovery parameters are converted into GUCs. In order to protect from ALTER SYSTEM SET primary_conninfo or similar, Patroni will parse postgresql.auto.conf and remove all standby and recovery parameters from there. The Patroni config remains backward compatible. For example, despite restore_command being a GUC, one can still specify it in the postgresql.recovery_conf.restore_command section and Patroni will write it into postgresql.conf for PostgreSQL 12.
Make it possible to use pg_rewind without superuser on PostgreSQL 11 and newer (Alexander Kukushkin)
If you want to use this feature, please define username and password in the postgresql.authentication.rewind section of the Patroni configuration file. For an already existing cluster you will have to create the user manually and GRANT EXECUTE permission on a few functions. You can find more details in the PostgreSQL documentation.
Do a smart comparison of actual and desired primary_conninfo values on replicas (Alexander Kukushkin)
It might help to avoid a replica restart when you are converting an already existing primary-standby cluster to one managed by Patroni.
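For the manual setup on an existing cluster, the statements follow the pattern documented for non-superuser pg_rewind in the PostgreSQL docs. The snippet below just generates them; the role name rewind_user is a hypothetical example.

```python
# Sketch: the GRANT EXECUTE statements needed for a non-superuser pg_rewind
# account on PostgreSQL 11+, per the PostgreSQL documentation. The role name
# 'rewind_user' is a hypothetical example.
FUNCTIONS = (
    'pg_catalog.pg_ls_dir(text, boolean, boolean)',
    'pg_catalog.pg_stat_file(text, boolean)',
    'pg_catalog.pg_read_binary_file(text)',
    'pg_catalog.pg_read_binary_file(text, bigint, bigint, boolean)',
)

def rewind_grants(role):
    stmts = ['CREATE USER {0} LOGIN'.format(role)]
    stmts += ['GRANT EXECUTE ON function {0} TO {1}'.format(f, role)
              for f in FUNCTIONS]
    return stmts

statements = rewind_grants('rewind_user')
for stmt in statements:
    print(stmt + ';')
```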
IPv6 support (Alexander Kukushkin)
There were two major issues: the Patroni REST API service was listening only on 0.0.0.0, and IPv6 addresses used in the api_url and conn_url were not properly quoted.
Kerberos support (Ajith Vilas, Alexander Kukushkin)
It makes it possible to use Kerberos authentication between Postgres nodes instead of defining passwords in the Patroni configuration file.
Manage pg_ident.conf (Alexander Kukushkin)
This functionality works similarly to pg_hba.conf: if postgresql.pg_ident is defined in the config file or DCS, Patroni will write its value to pg_ident.conf; however, if postgresql.parameters.ident_file is defined, Patroni will assume that pg_ident is managed from outside and will not update the file.
Improvements in REST API
Added /health endpoint (Wilfried Roset)
It will return HTTP status code 200 only if PostgreSQL is running.
Added /read-only and /read-write endpoints (Julien Riou)
The /read-only endpoint enables reads balanced across replicas and the primary. The /read-write endpoint is an alias for /primary, /leader and /master.
Use SSLContext to wrap the REST API socket (Julien Riou)
Usage of ssl.wrap_socket() is deprecated and was still allowing soon-to-be-deprecated protocols like TLS 1.1.
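The SSLContext approach can be sketched as below; the certificate path is a placeholder and the minimum-version choice is an illustrative assumption, not Patroni's exact configuration.

```python
import ssl

# Sketch: build a TLS server context for the REST API socket instead of the
# deprecated ssl.wrap_socket(). The commented cert path is hypothetical.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.minimum_version = ssl.TLSVersion.TLSv1_2     # reject TLS 1.0 and 1.1
# ctx.load_cert_chain('/etc/patroni/restapi.pem')  # supply real cert/key here
# The listening socket would then be wrapped with ctx.wrap_socket(sock,
# server_side=True).
print(ctx.minimum_version)
```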
Logging improvements
Two-step logging (Alexander Kukushkin)
All log messages are first written into the in-memory queue and later they are asynchronously flushed into the stderr or file from a separate thread. The maximum queue size is limited (configurable). If the limit is reached, Patroni will start losing logs, which is still better than blocking the HA loop.
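The two-step scheme maps naturally onto the standard-library queue handlers; this is a minimal sketch of the idea (queue size and logger name are example values), not Patroni's logging module.

```python
import logging
import queue
from logging.handlers import QueueHandler, QueueListener

# Sketch: the HA loop only enqueues records into a bounded in-memory queue;
# a listener thread flushes them to the real handler. A full queue drops
# records instead of blocking the loop.
messages = []

class ListHandler(logging.Handler):      # stands in for a file/stderr handler
    def emit(self, record):
        messages.append(record.getMessage())

log_queue = queue.Queue(maxsize=1000)    # the size limit is an example value
listener = QueueListener(log_queue, ListHandler())
logger = logging.getLogger('patroni-example')
logger.addHandler(QueueHandler(log_queue))
logger.setLevel(logging.INFO)

listener.start()
logger.info('flushed from a separate thread')
listener.stop()                          # drains the queue on shutdown
print(messages)  # → ['flushed from a separate thread']
```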
Enable debug logging for GET/OPTIONS API calls together with latency (Jan Tomsa)
It will help with debugging of health-checks performed by HAProxy, Consul or other tooling that decides which node is the primary/replica.
Log exceptions caught in Retry (Daniel Kucera)
Log the final exception when either the number of attempts or the timeout were reached. It will hopefully help to debug some issues when communication to DCS fails.
Improvements in patronictl
Enhance dialogues for scheduled switchover and restart (Rafia Sabih)
Previously dialogues did not take into account scheduled actions and therefore were misleading.
Check if config file exists (Wilfried Roset)
Be verbose about the configuration file when the given filename does not exist, instead of ignoring it silently (which can lead to misunderstanding).
Add fallback value for EDITOR (Wilfried Roset)
When the EDITOR environment variable was not defined, patronictl edit-config was failing with PatroniCtlException. The new strategy is to try editor and then vi, which should be available on most systems.
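The fallback strategy can be sketched with a small helper; this mirrors the behavior described above but is illustrative, not patronictl's actual code.

```python
import os
import shutil

# Sketch: honor $EDITOR if set, otherwise fall back to 'editor', then 'vi'.
def find_editor(environ=os.environ):
    editor = environ.get('EDITOR')
    if editor:
        return editor
    for candidate in ('editor', 'vi'):
        path = shutil.which(candidate)
        if path:
            return path
    raise RuntimeError('no editor found; set the EDITOR environment variable')

print(find_editor({'EDITOR': 'nano'}))  # → nano
```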
Improvements in Consul support
Allow to specify Consul consistency mode (Jan Tomsa)
You can read more about consistency mode here.
Reload Consul config on SIGHUP (Cameron Daniel Kucera, Alexander Kukushkin)
It is especially useful when somebody is changing the value of token.
Bugfixes
Fix corner case in switchover/failover (Sharoon Thomas)
The variable scheduled_at may be undefined if the REST API is not accessible and we are using DCS as a fallback.
Open trust to localhost in pg_hba.conf during custom bootstrap (Alexander Kukushkin)
Previously it was open only to the unix socket, which was causing a lot of errors:
FATAL: no pg_hba.conf entry for replication connection from host "127.0.0.1", user "replicator"
Consider synchronous node as healthy even when the former leader is ahead (Alexander Kukushkin)
If the primary loses access to the DCS, it restarts Postgres in read-only, but it might happen that other nodes can still access the old primary via the REST API. Such a situation was causing the synchronous standby not to promote because the old primary was reporting WAL position ahead of the synchronous standby.
Standby cluster bugfixes (Alexander Kukushkin)
Make it possible to bootstrap a replica in a standby cluster when the standby_leader is not accessible and a few other minor fixes.
Version 1.5.6
New features
Support work with an etcd cluster via a set of proxies (Alexander Kukushkin)
It might happen that the etcd cluster is not accessible directly but only via a set of proxies. In this case Patroni will not perform etcd topology discovery but just round-robin via the proxy hosts. The behavior is controlled by etcd.use_proxies.
Changed callbacks behavior when role on the node is changed (Alexander Kukushkin)
If the role was changed from master or standby_leader to replica, or from replica to standby_leader, the on_restart callback will no longer be called, in favor of the on_role_change callback.
Change the way we start postgres (Alexander Kukushkin)
Use multiprocessing.Process instead of executing itself, and multiprocessing.Pipe to transmit the postmaster pid to the Patroni process. Before that we were using pipes, which left the postmaster process with stdin closed.
Bug fixes
Fix role returned by REST API for the standby leader (Alexander Kukushkin)
It was incorrectly returning replica instead of standby_leader
Wait for callback end if it could not be killed (Julien Tachoires)
Patroni doesn’t have enough privileges to terminate a callback script running under sudo, which was cancelling the new callback. If the running script cannot be killed, Patroni will wait until it finishes and then run the next callback.
Reduce lock time taken by dcs.get_cluster method (Alexander Kukushkin)
Due to the lock being held, DCS slowness was affecting the REST API health checks, causing false positives.
Improve cleaning of PGDATA when pg_wal/pg_xlog is a symlink (Julien Tachoires)
In this case Patroni will explicitly remove files from the target directory.
Remove unnecessary usage of os.path.relpath (Ants Aasma)
It depends on being able to resolve the working directory, which will fail if Patroni is started in a directory that is later unlinked from the filesystem.
Do not enforce ssl version when communicating with Etcd (Alexander Kukushkin)
For some unknown reason python3-etcd on Debian and Ubuntu is not based on the latest version of the package and therefore enforces TLSv1, which is not supported by Etcd v3. We solved this problem on the Patroni side.
Version 1.5.5
This version introduces the possibility of automatic reinit of the former master, improves patronictl list output and fixes a number of bugs.
New features
Add support of PATRONI_ETCD_PROTOCOL, PATRONI_ETCD_USERNAME and PATRONI_ETCD_PASSWORD environment variables (Étienne M)
Before, it was possible to configure them only in the config file or as part of PATRONI_ETCD_URL, which is not always convenient.
Make it possible to automatically reinit the former master (Alexander Kukushkin)
If the pg_rewind is disabled or can’t be used, the former master could fail to start as a new replica due to diverged timelines. In this case, the only way to fix it is wiping the data directory and reinitializing. This behavior could be changed by setting postgresql.remove_data_directory_on_diverged_timelines. When it is set, Patroni will wipe the data directory and reinitialize the former master automatically.
Show information about timelines in patronictl list (Alexander Kukushkin)
It helps to detect stale replicas. In addition to that, Host will include ‘:{port}’ if the port value isn’t default or there is more than one member running on the same host.
Create a headless service associated with the $SCOPE-config endpoint (Alexander Kukushkin)
The “config” endpoint keeps information about the cluster-wide Patroni and Postgres configuration and the history file, and, last but most important, it holds the initialize key. When the Kubernetes master node is restarted or upgraded, it removes endpoints without services. The headless service will prevent it from being removed.
Bug fixes
Adjust the read timeout for the leader watch blocking query (Alexander Kukushkin)
According to the Consul documentation, the actual response timeout is increased by a small random amount of additional wait time added to the supplied maximum wait time to spread out the wake up time of any concurrent requests. It adds up to wait / 16 additional time to the maximum duration. In our case we are adding wait / 15 or 1 second depending on what is bigger.
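The adjustment described above is a one-line formula; this is an illustrative sketch of the calculation, with an example wait value.

```python
# Sketch: Consul may extend a blocking query by up to wait/16 of jitter, so
# the client-side read timeout adds the larger of wait/15 and one second on
# top of the supplied maximum wait time.
def leader_watch_read_timeout(wait):
    return wait + max(wait / 15.0, 1.0)

print(leader_watch_read_timeout(30))  # → 32.0 (30/15 = 2 seconds added)
print(leader_watch_read_timeout(10))  # → 11.0 (10/15 < 1, so 1 second added)
```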
Always use replication=1 when connecting via replication protocol to the postgres (Alexander Kukushkin)
Starting from Postgres 10 the line in the pg_hba.conf with database=replication doesn’t accept connections with the parameter replication=database.
Don’t write primary_conninfo into recovery.conf for wal-only standby cluster (Alexander Kukushkin)
Despite having neither host nor port defined in the standby_cluster config, Patroni was putting the primary_conninfo into the recovery.conf, which is useless and was generating a lot of errors.
Version 1.5.4
This version implements flexible logging and fixes a number of bugs.
New features
Improvements in logging infrastructure (Alexander Kukushkin, Lucas Capistrant, Alexander Anikin)
Logging can now be configured not only from environment variables but also from the Patroni config file. This makes it possible to change the logging configuration at runtime by updating the config and doing a reload or sending SIGHUP to the Patroni process. By default Patroni writes logs to stderr, but now it is also possible to write logs directly to a file and rotate it when it reaches a certain size. In addition, support was added for a custom dateformat and for fine-tuning the log level of each Python module.
Make it possible to take into account the current timeline during leader elections (Alexander Kukushkin)
It could happen that a node considers itself the healthiest one although it is currently not on the latest known timeline. In some cases we want to avoid promoting such a node, which can be achieved by setting the check_timeline parameter to true (the default behavior remains unchanged).
Relaxed requirements on superuser credentials
Libpq allows opening connections without explicitly specifying either username or password. Depending on the situation, it relies either on the pgpass file or the trust authentication method in pg_hba.conf. Since pg_rewind also uses libpq, it works the same way.
Implemented possibility to configure Consul Service registration and check interval via environment variables (Alexander Kukushkin)
Registration of service in Consul was added in the 1.5.0, but so far it was only possible to turn it on via patroni.yaml.
Stability Improvements
Set archive_mode to off during the custom bootstrap (Alexander Kukushkin)
We want to avoid archiving wals and history files until the cluster is fully functional. It really helps if the custom bootstrap involves pg_upgrade.
Apply five seconds backoff when loading global config on start (Alexander Kukushkin)
It helps to avoid hammering DCS when Patroni just starting up.
Reduce amount of error messages generated on shutdown (Alexander Kukushkin)
They were harmless but rather annoying and sometimes scary.
Explicitly secure rw perms for recovery.conf at creation time (Lucas Capistrant)
We don’t want anybody except patroni/postgres user reading this file, because it contains replication user and password.
Redirect HTTPServer exceptions to logger (Julien Riou)
By default, such exceptions were logged on standard output messing with regular logs.
Bug fixes
Removed stderr pipe to stdout on pg_ctl process (Cody Coons)
Inheriting stderr from the main Patroni process allows all Postgres logs to be seen along with all Patroni logs. This is very useful in a container environment as Patroni and Postgres logs may be consumed using standard tools (docker logs, kubectl, etc). In addition, this change fixes a bug with Patroni not being able to catch the postmaster pid when postgres writes warnings to stderr.
Set Consul service check deregister timeout in Go time format (Pavel Kirillov)
Without explicitly mentioned time unit registration was failing.
Relax checks of standby_cluster cluster configuration (Dmitry Dolgov, Alexander Kukushkin)
It was accepting only strings as valid values, and therefore it was not possible to specify the port as an integer or create_replica_methods as a list.
Version 1.5.3
Compatibility and bugfix release.
Improve stability when running with python3 against zookeeper (Alexander Kukushkin)
Change of loop_wait was causing Patroni to disconnect from zookeeper and never reconnect back.
Fix broken compatibility with postgres 9.3 (Alexander Kukushkin)
When opening a replication connection we should specify replication=1, because 9.3 does not understand replication=’database’
Make sure we refresh Consul session at least once per HA loop and improve handling of consul sessions exceptions (Alexander Kukushkin)
A restart of the local consul agent invalidates all sessions related to the node. Not refreshing the session on time and not properly handling session errors was causing demotion of the primary.
Version 1.5.2
Compatibility and bugfix release.
Compatibility with kazoo-2.6.0 (Alexander Kukushkin)
In order to make sure that requests are performed with an appropriate timeout, Patroni redefines the create_connection method from the python-kazoo module. The latest release of kazoo slightly changed the way the create_connection method is called.
Fix Patroni crash when Consul cluster loses the leader (Alexander Kukushkin)
The crash was happening due to an incorrect implementation of the touch_member method: it should return a boolean and not raise any exceptions.
Version 1.5.1
This version implements support for permanent replication slots, adds support for pgBackRest, and fixes a number of bugs.
New features
Permanent replication slots (Alexander Kukushkin)
Permanent replication slots are preserved on failover/switchover; that is, Patroni on the new primary will create the configured replication slots right after promoting. Slots can be configured with the help of patronictl edit-config. The initial configuration can also be done in bootstrap.dcs.
Add pgbackrest support (Yogesh Sharma)
pgBackRest can restore into an existing $PGDATA folder, which allows a speedy restore, as files that have not changed since the last backup are skipped. To support this feature, a new parameter keep_data has been introduced. See the replica creation method section for additional examples.
Bug fixes
A few bugfixes in the “standby cluster” workflow (Alexander Kukushkin)
Please see https://github.com/zalando/patroni/pull/823 for more details.
Fix REST API health check when cluster management is paused and DCS is not accessible (Alexander Kukushkin)
Regression was introduced in https://github.com/zalando/patroni/commit/90cf930036a9d5249265af15d2b787ec7517cf57
Version 1.5.0
This version enables Patroni HA cluster to operate in a standby mode, introduces experimental support for running on Windows, and provides a new configuration parameter to register PostgreSQL service in Consul.
New features
Standby cluster (Dmitry Dolgov)
One or more Patroni nodes can form a standby cluster that runs alongside the primary one (i.e. in another datacenter) and consists of standby nodes that replicate from the master in the primary cluster. All PostgreSQL nodes in the standby cluster are replicas; one of those replicas elects itself to replicate directly from the remote master, while the others replicate from it in a cascading manner. A more detailed description of this feature and some configuration examples can be found here.
Register Services in Consul (Pavel Kirillov, Alexander Kukushkin)
If register_service parameter in the consul configuration is enabled, the node will register a service with the name scope and the tag master, replica or standby-leader.
Experimental Windows support (Pavel Golub)
From now on it is possible to run Patroni on Windows, although Windows support is brand-new and hasn’t received as much real-world testing as its Linux counterpart. We welcome your feedback!
Improvements in patronictl
Add patronictl -k/–insecure flag and support for restapi cert (Wilfried Roset)
In the past, if the REST API was protected by self-signed certificates, patronictl would fail to verify them, and there was no way to disable that verification. It is now possible to configure patronictl to skip the certificate verification altogether or to provide CA and client certificates in the ctl: section of the configuration.
Exclude members with nofailover tag from patronictl switchover/failover output (Alexander Anikin)
Previously, those members were incorrectly proposed as candidates when performing interactive switchover or failover via patronictl.
Stability improvements
Avoid parsing non-key-value output lines of pg_controldata (Alexander Anikin)
Under certain circumstances pg_controldata outputs lines without a colon character. That would trigger an error in the Patroni code that parses pg_controldata output, hiding the actual problem; often such lines are emitted in a warning shown by pg_controldata before the regular output, e.g. when the binary major version does not match the one of the PostgreSQL data directory.
Add member name to the error message during the leader election (Jan Mussler)
During the leader election, Patroni connects to all known members of the cluster and requests their status. Such status is written to the Patroni log and includes the name of the member. Previously, if the member was not accessible, the error message did not indicate its name, containing only the URL.
Immediately reserve the WAL position upon creation of the replication slot (Alexander Kukushkin)
Starting from 9.6, pg_create_physical_replication_slot function provides an additional boolean parameter immediately_reserve. When it is set to false, which is also the default, the slot doesn’t reserve the WAL position until it receives the first client connection, potentially losing some segments required by the client in a time window between the slot creation and the initial client connection.
Fix bug in strict synchronous replication (Alexander Kukushkin)
When running with synchronous_mode_strict: true, in some cases Patroni puts * into the synchronous_standby_names, changing the sync state of most replication connections to potential. Previously, Patroni couldn’t pick a synchronous candidate under such circumstances, as it only considered those with the state async.
Version 1.4.6
Bug fixes and stability improvements
This release fixes a critical issue with the Patroni API /master endpoint returning 200 for a non-master node. This was a reporting issue with no actual split-brain, but under certain circumstances clients might have been directed to the read-only node.
Reset is_leader status on demote (Alexander Kukushkin, Oleksii Kliukin)
Make sure demoted cluster member stops responding with code 200 on the /master API call.
Add new “cluster_unlocked” field to the API output (Dmitry Dolgov)
This field indicates whether the cluster has the master running. It can be used when it is not possible to query any other node but one of the replicas.
Version 1.4.5
New features
Improve logging when applying new postgres configuration (Don Seiler)
Patroni logs changed parameter names and values.
Python 3.7 compatibility (Christoph Berg)
async is a reserved keyword in Python 3.7.
Set state to “stopped” in the DCS when a member is shut down (Tony Sorrentino)
This shows the member state as “stopped” in “patronictl list” command.
Improve the message logged when stale postmaster.pid matches a running process (Ants Aasma)
The previous one was beyond confusing.
Implement patronictl reload functionality (Don Seiler)
Before that it was only possible to reload configuration by either calling REST API or by sending SIGHUP signal to the Patroni process.
Take and apply some parameters from controldata when starting as a replica (Alexander Kukushkin)
The value of max_connections and some other parameters set in the global configuration may be lower than the ones actually used by the primary; when this happens, the replica cannot start and must be fixed manually. Patroni now takes care of that by reading and applying the value from pg_controldata, starting postgres, and setting the pending_restart flag.
If set, use LD_LIBRARY_PATH when starting postgres (Chris Fraser)
When starting up Postgres, Patroni was passing along PATH, LC_ALL and LANG env vars if they are set. Now it is doing the same with LD_LIBRARY_PATH. It should help if somebody installed PostgreSQL to non-standard place.
Rename create_replica_method to create_replica_methods (Dmitry Dolgov)
To make it clear that it’s actually an array. The old name is still supported for backward compatibility.
Bug fixes and stability improvements
Fix condition for the replica start due to pg_rewind in paused state (Oleksii Kliukin)
Avoid starting the replica that had already executed pg_rewind before.
Respond 200 to the master health-check only if update_lock has been successful (Alexander Kukushkin)
Prevent Patroni from reporting itself a master on the former (demoted) master if DCS is partitioned.
Fix compatibility with the new consul module (Alexander Kukushkin)
Starting from v1.1.0 python-consul changed internal API and started using list instead of dict to pass query parameters.
Catch exceptions from Patroni REST API thread during shutdown (Alexander Kukushkin)
Those uncaught exceptions kept PostgreSQL running at shutdown.
Do crash recovery only when Postgres runs as the master (Alexander Kukushkin)
Require pg_controldata to report ‘in production’ or ‘shutting down’ or ‘in crash recovery’. In all other cases no crash recovery is necessary.
Improve handling of configuration errors (Henning Jacobs, Alexander Kukushkin)
It is possible to change a lot of parameters in runtime (including restapi.listen) by updating Patroni config file and sending SIGHUP to Patroni process. This fix eliminates obscure exceptions from the ‘restapi’ thread when some of the parameters receive invalid values.
Version 1.4.4
Stability improvements
Fix race condition in poll_failover_result (Alexander Kukushkin)
It didn’t directly affect either failover or switchover, but in some rare cases it reported success too early, when the former leader released the lock, producing a ‘Failed over to “None”’ instead of a ‘Failed over to “desired-node”’ message.
Treat Postgres parameter names as case insensitive (Alexander Kukushkin)
Most of the Postgres parameters have snake_case names, but there are three exceptions to this rule: DateStyle, IntervalStyle and TimeZone. Postgres accepts those parameters when written in a different case (e.g. timezone = ‘some/tzn’); however, Patroni was unable to find case-insensitive matches of those parameter names in pg_settings and ignored such parameters as a result.
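The matching logic amounts to a lower-cased lookup table mapping back to the canonical spellings; this is an illustrative sketch, not Patroni's actual validator.

```python
# Sketch: map lower-cased parameter names to the canonical spelling used in
# pg_settings, so e.g. 'timezone' in the config matches 'TimeZone'.
CASE_SENSITIVE_NAMES = {'datestyle': 'DateStyle',
                        'intervalstyle': 'IntervalStyle',
                        'timezone': 'TimeZone'}

def canonical_name(name):
    lowered = name.lower()
    return CASE_SENSITIVE_NAMES.get(lowered, lowered)

print(canonical_name('TIMEZONE'))        # → TimeZone
print(canonical_name('Shared_Buffers'))  # → shared_buffers
```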
Abort start if attaching to running postgres and cluster not initialized (Alexander Kukushkin)
Patroni can attach itself to an already running Postgres instance. It is imperative to start running Patroni on the master node before getting to the replicas.
Fix behavior of patronictl scaffold (Alexander Kukushkin)
Pass dict object to touch_member instead of json encoded string, DCS implementation will take care of encoding it.
Don’t demote master if failed to update leader key in pause (Alexander Kukushkin)
During maintenance a DCS may start failing write requests while continuing to respond to read ones. In that case, Patroni used to put the Postgres master node into read-only mode after failing to update the leader lock in DCS.
Sync replication slots when Patroni notices a new postmaster process (Alexander Kukushkin)
If Postgres has been restarted, Patroni has to make sure that list of replication slots matches its expectations.
Verify sysid and sync replication slots after coming out of pause (Alexander Kukushkin)
During the maintenance mode it may happen that data directory was completely rewritten and therefore we have to make sure that Database system identifier still belongs to our cluster and replication slots are in sync with Patroni expectations.
Fix a possible failure to start not running Postgres on a data directory with postmaster lock file present (Alexander Kukushkin)
Detect reuse of PID from the postmaster lock file. More likely to hit such problem if you run Patroni and Postgres in the docker container.
Improve protection of DCS being accidentally wiped (Alexander Kukushkin)
Patroni has a lot of logic in place to prevent failover in such case; it can also restore all keys back; however, until this change an accidental removal of /config key was switching off pause mode for 1 cycle of HA loop.
Do not exit when encountering invalid system ID (Oleksii Kliukin)
Do not exit when the cluster system ID is empty or the one that doesn’t pass the validation check. In that case, the cluster most likely needs a reinit; mention it in the result message. Avoid terminating Patroni, as otherwise reinit cannot happen.
Compatibility with Kubernetes 1.10+
Added check for empty subsets (Cody Coons)
Kubernetes 1.10.0+ started returning Endpoints.subsets set to None instead of [].
Bootstrap improvements
Make deleting recovery.conf optional (Brad Nicholson)
If bootstrap.<custom_bootstrap_method_name>.keep_existing_recovery_conf is defined and set to True, Patroni will not remove the existing recovery.conf file. This is useful when bootstrapping from a backup with tools like pgBackRest that generate the appropriate recovery.conf for you.
Allow options to the basebackup built-in method (Oleksii Kliukin)
It is now possible to supply options to the built-in basebackup method by defining the basebackup section in the configuration, similar to how those are defined for custom replica creation methods. The difference is in the format accepted by the basebackup section: since pg_basebackup accepts both --key=value and --key options, the contents of the section can be either a dictionary of key-value pairs, or a list of one-element dictionaries or plain keys (for the options that don't accept values). See the replica creation method section for additional examples.
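As a sketch, the two bootstrap improvements above might be configured like this (the method name, script path and option values are illustrative, not from the release notes):

```yaml
bootstrap:
  method: pgbackrest_restore           # hypothetical custom bootstrap method name
  pgbackrest_restore:
    command: /usr/local/bin/restore_from_backup.sh   # illustrative script path
    keep_existing_recovery_conf: True  # keep the recovery.conf the tool generated

postgresql:
  basebackup:                          # options passed to pg_basebackup
    - verbose                          # an option without a value
    - max-rate: '100M'                 # a --key=value option
```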
Version 1.4.3
Improvements in logging
Make log level configurable from environment variables (Andy Newton, Keyvan Hedayati)
PATRONI_LOGLEVEL sets the general logging level; PATRONI_REQUESTS_LOGLEVEL sets the logging level for all HTTP requests, e.g. Kubernetes API calls. See the Python logging documentation (https://docs.python.org/3.6/library/logging.html#levels) for the names of the possible log levels.
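A minimal sketch of these variables in the environment of the Patroni process (the values are illustrative):

```
PATRONI_LOGLEVEL=DEBUG
PATRONI_REQUESTS_LOGLEVEL=WARNING
```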
Stability improvements and bug fixes
Don’t rediscover etcd cluster topology when watch timed out (Alexander Kukushkin)
If there is only one host in the etcd configuration and exactly this host is not accessible, Patroni would start discovering the cluster topology and never succeed; instead it should just switch to the next available node.
Write content of bootstrap.pg_hba into a pg_hba.conf after custom bootstrap (Alexander Kukushkin)
Now it behaves similarly to the usual bootstrap with initdb
Single user mode was waiting for user input and never finished (Alexander Kukushkin)
Regression was introduced in https://github.com/zalando/patroni/pull/576
Version 1.4.2
Improvements in patronictl
Rename scheduled failover to scheduled switchover (Alexander Kukushkin)
Failover and switchover functions were separated in version 1.4, but patronictl list was still reporting Scheduled failover instead of Scheduled switchover.
Show information about pending restarts (Alexander Kukushkin)
In order to apply some configuration changes it is sometimes necessary to restart postgres. Patroni was already giving a hint about that in the REST API and when writing the node status into DCS, but there was no easy way to display it.
Make show-config work with the cluster_name from the config file (Alexander Kukushkin)
It works similarly to patronictl edit-config.
Stability improvements
Avoid calling pg_controldata during bootstrap (Alexander Kukushkin)
During initdb or a custom bootstrap there is a time window when pgdata is not empty but pg_controldata has not been written yet. In such a case the pg_controldata call failed with error messages.
Handle exceptions raised from psutil (Alexander Kukushkin)
cmdline is read and parsed every time the cmdline() method is called. It could happen that the process being examined has already disappeared; in that case NoSuchProcess is raised.
Kubernetes support improvements
Don’t swallow errors from k8s API (Alexander Kukushkin)
A call to the Kubernetes API could fail for a number of different reasons. In some cases such a call should be retried; in other cases we should log the error message and the exception stack trace. The change here will help debug Kubernetes permission issues.
Update Kubernetes example Dockerfile to install Patroni from the master branch (Maciej Szulik)
Before that it was using feature/k8s, which became outdated.
Add proper RBAC to run patroni on k8s (Maciej Szulik)
Add the service account assigned to the pods of the cluster, the role that holds only the necessary permissions, and the role binding that connects the service account and the role.
Version 1.4.1
Fixes in patronictl
Don’t show current leader in suggested list of members to failover to. (Alexander Kukushkin)
patronictl failover still works when there is a leader in the cluster, but the leader should be excluded from the list of members that it is possible to fail over to.
Make patronictl switchover compatible with the old Patroni api (Alexander Kukushkin)
If the POST /switchover REST API call fails with status code 501, the request is retried against the /failover endpoint.
Version 1.4
This version adds support for using Kubernetes as a DCS, making it possible to run Patroni as a cloud-native agent in Kubernetes without any additional deployments of Etcd, ZooKeeper or Consul.
Upgrade notice
Installing Patroni via pip will no longer bring in extra dependencies (such as libraries for Etcd, ZooKeeper, Consul or Kubernetes, or support for AWS). In order to enable them, one needs to list them explicitly in the pip install command, for instance pip install patroni[etcd,kubernetes].
Kubernetes support
Implement Kubernetes-based DCS. The endpoints metadata is used to store the configuration and the leader key. The metadata field inside the pod definitions is used to store the member-related data. In addition to using Endpoints, Patroni supports ConfigMaps. You can find more information about this feature in the Kubernetes chapter of the documentation.
Stability improvements
Factor out postmaster process into a separate object (Ants Aasma)
This object identifies a running postmaster process via pid and start time and simplifies detection (and resolution) of situations when the postmaster was restarted behind our back or when postgres directory disappeared from the file system.
Minimize the number of SELECTs issued by Patroni on every iteration of the HA cycle (Alexander Kukushkin)
On every iteration of the HA loop Patroni needs to know the recovery status and the absolute wal position. From now on Patroni will run only a single SELECT to get this information, instead of two on the replica and three on the master.
Remove leader key on shutdown only when we have the lock (Ants Aasma)
Unconditional removal was generating unnecessary and misleading exceptions.
Improvements in patronictl
Add version command to patronictl (Ants Aasma)
It will show the version of installed Patroni and versions of running Patroni instances (if the cluster name is specified).
Make optional specifying cluster_name argument for some of patronictl commands (Alexander Kukushkin, Ants Aasma)
It will work if patronictl is using the usual Patroni configuration file with the scope defined.
Show information about scheduled switchover and maintenance mode (Alexander Kukushkin)
Before that it was possible to get this information only from Patroni logs or directly from DCS.
Improve patronictl reinit (Alexander Kukushkin)
Sometimes patronictl reinit refused to proceed when Patroni was busy with other actions, namely trying to start postgres. patronictl didn't provide any commands to cancel such long-running actions and the only (dangerous) workaround was removing the data directory manually. The new implementation of reinit forcefully cancels other long-running actions before proceeding.
Implement --wait flag in patronictl pause and patronictl resume (Alexander Kukushkin)
It will make patronictl wait until the requested action is acknowledged by all nodes in the cluster. Such behaviour is achieved by exposing the pause flag for every node in DCS and via the REST API.
Rename patronictl failover into patronictl switchover (Alexander Kukushkin)
The previous failover was actually only capable of doing a switchover; it refused to proceed in a cluster without a leader.
Alter the behavior of patronictl failover (Alexander Kukushkin)
It will work even if there is no leader, but in that case you will have to explicitly specify a node which should become the new leader.
Expose information about timeline and history
Expose current timeline in DCS and via API (Alexander Kukushkin)
Store information about the current timeline for each member of the cluster. This information is accessible via the API and is stored in DCS.
Store promotion history in the /history key in DCS (Alexander Kukushkin)
In addition, store the timeline history enriched with the timestamp of the corresponding promotion in the /history key in DCS and update it with each promote.
Add endpoints for getting synchronous and asynchronous replicas
Add new /sync and /async endpoints (Alexander Kukushkin, Oleksii Kliukin)
Those endpoints (also accessible as /synchronous and /asynchronous) return 200 only for synchronous and asynchronous replicas correspondingly (excluding those marked as noloadbalance).
Allow multiple hosts for Etcd
Add a new hosts parameter to Etcd configuration (Alexander Kukushkin)
This parameter should contain the initial list of hosts that will be used to discover and populate the list of the running etcd cluster members. If at some point the list of discovered hosts is exhausted (no hosts from that list are available), Patroni will fall back to the initial list from the hosts parameter.
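A minimal sketch of the new hosts parameter (the addresses are illustrative):

```yaml
etcd:
  hosts:
    - 10.0.1.11:2379
    - 10.0.1.12:2379
    - 10.0.1.13:2379
```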
Version 1.3.6
Stability improvements
Verify process start time when checking if postgres is running. (Ants Aasma)
After a crash that doesn’t clean up postmaster.pid there could be a new process with the same pid, resulting in a false positive for is_running(), which will lead to all kinds of bad behavior.
Shutdown postgresql before bootstrap when we lost data directory (ainlolcat)
When the data directory on the master is forcefully removed, the postgres process can still stay alive for some time and prevent the replica created in place of that former master from starting or replicating. The fix makes Patroni cache the postmaster pid and its start time, and terminate the old postmaster in case it is still running after the corresponding data directory has been removed.
Perform crash recovery in a single user mode if postgres master dies (Alexander Kukushkin)
It is unsafe to start immediately as a standby, and it is not possible to run pg_rewind if postgres hasn't been shut down cleanly. The single-user crash recovery only kicks in if pg_rewind is enabled or there is no master at the moment.
Consul improvements
Make it possible to provide datacenter configuration for Consul (Vilius Okockis, Alexander Kukushkin)
Before that Patroni was always communicating with the datacenter of the host it runs on.
Always send a token in X-Consul-Token http header (Alexander Kukushkin)
If consul.token is defined in the Patroni configuration, we will always send it in the X-Consul-Token HTTP header. The python-consul module tries to be "consistent" with the Consul REST API, which doesn't accept the token as a query parameter for the session API, but the API still works with the X-Consul-Token header.
Adjust session TTL if the supplied value is smaller than the minimum possible (Stas Fomin, Alexander Kukushkin)
It could happen that the TTL provided in the Patroni configuration is smaller than the minimum one supported by Consul. In that case, Consul agent fails to create a new session. Without a session Patroni cannot create member and leader keys in the Consul KV store, resulting in an unhealthy cluster.
Other improvements
Define custom log format via environment variable PATRONI_LOGFORMAT (Stas Fomin)
Allow disabling timestamps and other similar fields in Patroni logs if they are already added by the system logger (usually when Patroni runs as a service).
Version 1.3.5
Bugfix
Set role to ‘uninitialized’ if data directory was removed (Alexander Kukushkin)
If the node had been running as a master, this was preventing failover.
Stability improvement
Try to run postmaster in a single-user mode if we tried and failed to start postgres (Alexander Kukushkin)
Usually such a problem happens when a node running as a master was terminated and the timelines diverged. If recovery.conf has restore_command defined, there is a really high chance that postgres will abort startup and leave the controldata unchanged. This makes it impossible to use pg_rewind, which requires a clean shutdown.
Consul improvements
Make it possible to specify health checks when creating session (Alexander Kukushkin)
If not specified, Consul will use "serfHealth". On one hand, this allows fast detection of an isolated master; on the other hand, it makes it impossible for Patroni to tolerate short network lags.
Bugfix
Fix watchdog on Python 3 (Ants Aasma)
A misunderstanding of the ioctl() call interface. If mutable=False then fcntl.ioctl() actually returns the arg buffer back. This accidentally worked on Python2 because int and str comparison did not return an error. Error reporting is actually done by raising IOError on Python2 and OSError on Python3.
Version 1.3.4
Different Consul improvements
Pass the consul token as a header (Andrew Colin Kissa)
Headers are now the preferred way to pass the token to the consul API.
Advanced configuration for Consul (Alexander Kukushkin)
Possibility to specify scheme, token, client and CA certificate details.
Compatibility with python-consul 0.7.1 and above (Alexander Kukushkin)
The new python-consul module has changed the signature of some methods.
“Could not take out TTL lock” message was never logged (Alexander Kukushkin)
Not a critical bug, but lack of proper logging complicates investigation in case of problems.
Quote synchronous_standby_names using quote_ident
When writing synchronous_standby_names into postgresql.conf, its value must be quoted (Alexander Kukushkin)
If it is not quoted properly, PostgreSQL will effectively disable synchronous replication and continue to work.
Different bugfixes around pause state, mostly related to watchdog (Alexander Kukushkin)
Do not send keepalives if watchdog is not active
Avoid activating watchdog in a pause mode
Set correct postgres state in pause mode
Do not try to run queries from API if postgres is stopped
Version 1.3.3
Bugfixes
synchronous replication was disabled shortly after promotion even when synchronous_mode_strict was turned on (Alexander Kukushkin)
create an empty pg_ident.conf file if it is missing after restoring from a backup (Alexander Kukushkin)
open access in pg_hba.conf to all databases, not only postgres (Franco Bellagamba)
Version 1.3.2
Bugfix
patronictl edit-config didn’t work with ZooKeeper (Alexander Kukushkin)
Version 1.3.1
Bugfix
failover via API was broken due to a change in _MemberStatus (Alexander Kukushkin)
Version 1.3
Version 1.3 adds custom bootstrap possibility, significantly improves support for pg_rewind, enhances the synchronous mode support, adds configuration editing to patronictl and implements watchdog support on Linux. In addition, this is the first version to work correctly with PostgreSQL 10.
Upgrade notice
There are no known compatibility issues with the new version of Patroni. Configuration from version 1.2 should work without any changes. It is possible to upgrade by installing new packages and either restarting Patroni (will cause PostgreSQL restart), or by putting Patroni into a pause mode first and then restarting Patroni on all nodes in the cluster (Patroni in a pause mode will not attempt to stop/start PostgreSQL), resuming from the pause mode at the end.
Custom bootstrap
Make the process of bootstrapping the cluster configurable (Alexander Kukushkin)
Allow custom bootstrap scripts instead of initdb when initializing the very first node in the cluster. The bootstrap command receives the name of the cluster and the path to the data directory. The resulting cluster can be configured to perform recovery, making it possible to bootstrap from a backup and do point-in-time recovery. Refer to the documentation page for a more detailed description of this feature.
Smarter pg_rewind support
Decide on whether to run pg_rewind by looking at the timeline differences from the current master (Alexander Kukushkin)
Previously, Patroni had a fixed set of conditions to trigger pg_rewind, namely when starting a former master, when doing a switchover to the designated node for every other node in the cluster or when there is a replica with the nofailover tag. All those cases have in common a chance that some replica may be ahead of the new master. In some cases, pg_rewind did nothing, in some other ones it was not running when necessary. Instead of relying on this limited list of rules make Patroni compare the master and the replica WAL positions (using the streaming replication protocol) in order to reliably decide if rewind is necessary for the replica.
Synchronous replication mode strict
Enhance synchronous replication support by adding the strict mode (James Sewell, Alexander Kukushkin)
Normally, when synchronous_mode is enabled and there are no replicas attached to the master, Patroni will disable synchronous replication in order to keep the master available for writes. The synchronous_mode_strict option changes that: when it is set, Patroni will not disable synchronous replication even in the absence of replicas, effectively blocking all clients writing data to the master. In addition to the synchronous mode guarantee of preventing any data loss due to automatic failover, the strict mode ensures that each write is either durably stored on two nodes or does not happen at all if there is only one node in the cluster.
Configuration editing with patronictl
Add configuration editing to patronictl (Ants Aasma, Alexander Kukushkin)
Add to patronictl the ability to edit the dynamic cluster configuration stored in DCS. Supports specifying parameters/values on the command line, invoking the $EDITOR, or applying configuration from a YAML file.
Linux watchdog support
Implement watchdog support for Linux (Ants Aasma)
Support the Linux software watchdog in order to reboot a node where Patroni is not running or not responding (e.g. because of high load). It is possible to configure the watchdog device to use (/dev/watchdog by default) and the mode (on, automatic, off) in the watchdog section of the Patroni configuration. You can get more information from the watchdog documentation.
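A sketch of the watchdog section described above, using the defaults mentioned in the text:

```yaml
watchdog:
  mode: automatic        # on, automatic, or off
  device: /dev/watchdog  # watchdog device to use
```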
Add support for PostgreSQL 10
Patroni is compatible with all beta versions of PostgreSQL 10 released so far, and we expect it to be compatible with PostgreSQL 10 when it is released.
PostgreSQL-related minor improvements
Define pg_hba.conf via the Patroni configuration file or the dynamic configuration in DCS (Alexander Kukushkin)
Allow defining the contents of pg_hba.conf in the pg_hba sub-section of the postgresql section of the configuration. This simplifies managing pg_hba.conf on multiple nodes, as one needs to define it only once in DCS instead of logging in to every node, changing it manually and reloading the configuration.
When defined, the contents of this section replace the current pg_hba.conf completely. Patroni ignores it if the hba_file PostgreSQL parameter is set.
Support connecting via a UNIX socket to the local PostgreSQL cluster (Alexander Kukushkin)
Add the use_unix_socket option to the postgresql section of the Patroni configuration. When set to true and the PostgreSQL unix_socket_directories option is not empty, Patroni uses the first value from it to connect to the local PostgreSQL cluster. If unix_socket_directories is not defined, Patroni assumes its default value and omits the host parameter from the PostgreSQL connection string altogether.
Support change of superuser and replication credentials on reload (Alexander Kukushkin)
Support storing of configuration files outside of PostgreSQL data directory (@jouir)
Add the new postgresql configuration directive config_dir. It defaults to the data directory and must be writable by Patroni.
Bug fixes and stability improvements
Handle EtcdEventIndexCleared and EtcdWatcherCleared exceptions (Alexander Kukushkin)
Faster recovery when the watch operation is ended by Etcd by avoiding useless retries.
Remove error spinning on Etcd failure and reduce log spam (Ants Aasma)
Avoid immediate retrying and emitting stack traces in the log on the second and subsequent Etcd connection failures.
Export locale variables when forking PostgreSQL processes (Oleksii Kliukin)
Avoid the "postmaster became multithreaded during startup" fatal error on non-English locales for PostgreSQL built with NLS.
Extra checks when dropping the replication slot (Alexander Kukushkin)
In some cases Patroni is prevented from dropping the replication slot by the WAL sender.
Truncate the replication slot name to 63 (NAMEDATALEN - 1) characters to comply with PostgreSQL naming rules (Nick Scott)
Fix a race condition resulting in extra connections being opened to the PostgreSQL cluster from Patroni (Alexander Kukushkin)
Release the leader key when the node restarts with an empty data directory (Alex Kerney)
Set asynchronous executor busy when running bootstrap without a leader (Alexander Kukushkin)
Failure to do so could have resulted in errors stating the node belonged to a different cluster, as Patroni proceeded with the normal business while being bootstrapped by a bootstrap method that doesn’t require a leader to be present in the cluster.
Improve WAL-E replica creation method (Joar Wandborg, Alexander Kukushkin).
Use csv.DictReader when parsing WAL-E base backup, accepting ISO dates with space-delimited date and time.
Support fetching current WAL position from the replica to estimate the amount of WAL to restore. Previously, the code used to call system information functions that were available only on the master node.
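The slot-name truncation mentioned in this list can be sketched in Python. This is an approximation of the rule, not Patroni's actual implementation: slot names allow only lower-case letters, digits and underscores, and are capped at NAMEDATALEN - 1 characters:

```python
# Sketch of the 63-character (NAMEDATALEN - 1) truncation rule applied
# to replication slot names derived from member names.
import re

NAMEDATALEN = 64  # PostgreSQL identifier limit, including the terminating byte

def slot_name_from_member_name(member_name: str) -> str:
    # Replace characters that are not valid in a slot name
    # (only lower-case letters, digits and underscore are allowed).
    name = re.sub(r'[^a-z0-9_]', '_', member_name.lower())
    # Truncate to 63 characters so PostgreSQL does not reject the name.
    return name[:NAMEDATALEN - 1]

print(slot_name_from_member_name('node-1.example.com'))  # node_1_example_com
```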
Version 1.2
This version introduces significant improvements over the handling of synchronous replication, makes the startup process and failover more reliable, adds PostgreSQL 9.6 support and fixes plenty of bugs. In addition, the documentation, including these release notes, has been moved to https://patroni.readthedocs.io.
Synchronous replication
Add synchronous replication support. (Ants Aasma)
Adds a new configuration variable synchronous_mode. When enabled, Patroni will manage synchronous_standby_names to enable synchronous replication whenever there are healthy standbys available. When synchronous mode is enabled, Patroni will automatically fail over only to a standby that was synchronously replicating at the time of the master failure. This effectively means that no user-visible transaction gets lost in such a case. See the feature documentation for the detailed description and implementation details.
Reliability improvements
Do not try to update the leader position stored in the leader optime key when PostgreSQL is not 100% healthy. Demote immediately when the update of the leader key fails. (Alexander Kukushkin)
Exclude unhealthy nodes from the list of targets to clone the new replica from. (Alexander Kukushkin)
Implement retry and timeout strategy for Consul similar to how it is done for Etcd. (Alexander Kukushkin)
Make --dcs and --config-file apply to all options in patronictl. (Alexander Kukushkin)
Write all postgres parameters into postgresql.conf. (Alexander Kukushkin)
It allows starting PostgreSQL configured by Patroni with just pg_ctl.
Avoid exceptions when there are no users in the config. (Kirill Pushkin)
Allow pausing an unhealthy cluster. Before this fix, patronictl would bail out if the node it tries to execute pause on is unhealthy. (Alexander Kukushkin)
Improve the leader watch functionality. (Alexander Kukushkin)
Previously the replicas were always watching the leader key (sleeping until the timeout or the leader key changes). With this change, they only watch when the replica's PostgreSQL is in the running state, and not when it is stopped/starting or restarting.
Avoid running into race conditions when handling SIGCHLD as PID 1. (Alexander Kukushkin)
Previously a race condition could occur when running inside Docker containers, since the same Patroni process both spawned new processes and handled SIGCHLD from them. This change uses fork/exec for Patroni and leaves the original PID 1 process responsible for handling signals from children.
Fix WAL-E restore. (Oleksii Kliukin)
Previously WAL-E restore used the no_master flag to avoid consulting the master altogether, making Patroni always choose restoring from WAL over pg_basebackup. This change reverts it to the original meaning of no_master, namely that WAL-E restore may be selected as a replication method if the master is not running. The latter is checked by examining the connection string passed to the method. In addition, it makes the retry mechanism more robust and handles other minutiae.
Implement asynchronous DNS resolver cache. (Alexander Kukushkin)
Avoid failing when DNS is temporarily unavailable (for instance, due to excessive traffic received by the node).
Implement starting state and master start timeout. (Ants Aasma, Alexander Kukushkin)
Previously pg_ctl waited for a timeout and then happily trotted on, considering PostgreSQL to be running. This caused PostgreSQL to show up in listings as running when it was actually not, and caused a race condition that resulted in either a failover, or a crash recovery, or a crash recovery interrupted by a failover and a missed rewind. This change adds a master_start_timeout parameter and introduces a new state for the main HA loop: starting. When master_start_timeout is 0, we will failover immediately when the master crashes, as soon as there is a failover candidate. Otherwise, Patroni will wait after attempting to start PostgreSQL on the master for the duration of the timeout; when it expires, it will failover if possible. Manual failover requests will be honored during the crash of the master even before the timeout expires.
Introduce the timeout parameter to the restart API endpoint and patronictl. When it is set and the restart takes longer than the timeout, PostgreSQL is considered unhealthy and the other nodes become eligible to take the leader lock.
Fix pg_rewind behavior in pause mode. (Ants Aasma)
Avoid an unnecessary restart in pause mode when Patroni thinks it needs to rewind but rewind is not possible (i.e. pg_rewind is not present). Fall back to default libpq values for the superuser (default OS user) if superuser authentication is missing from the pg_rewind-related Patroni configuration section.
Serialize callback execution. Kill the previous callback of the same type when the new one is about to run. Fix the issue of spawning zombie processes when running callbacks. (Alexander Kukushkin)
Avoid promoting a former master when the leader key is set in DCS but update to this leader key fails. (Alexander Kukushkin)
This avoids the issue of a current master continuing to keep its role when it is partitioned together with the minority of nodes in Etcd and other DCSs that allow “inconsistent reads”.
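The master_start_timeout parameter described in this list lives in the dynamic (DCS) configuration; a minimal sketch with an illustrative value:

```yaml
bootstrap:
  dcs:
    master_start_timeout: 300  # seconds to wait for a crashed master; 0 = fail over immediately
```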
Miscellaneous
Add post_init configuration option on bootstrap. (Alejandro Martínez)
Patroni will call the script argument of this option right after running initdb and starting up PostgreSQL for a new cluster. The script receives a connection URL with the superuser and sets PGPASSFILE to point to the .pgpass file containing the password. If the script fails, Patroni initialization fails as well. It is useful for adding new users or creating extensions in the new cluster.
Implement PostgreSQL 9.6 support. (Alexander Kukushkin)
Use wal_level = replica as a synonym for hot_standby, avoiding the pending_restart flag when it changes from one to the other. (Alexander Kukushkin)
Documentation improvements
Add a Patroni main loop workflow diagram. (Alejandro Martínez, Alexander Kukushkin)
Improve README, adding the Helm chart and links to release notes. (Lauri Apple)
Move Patroni documentation to Read the Docs. The up-to-date documentation is available at https://patroni.readthedocs.io. (Oleksii Kliukin)
Makes the documentation easily viewable from different devices (including smartphones) and searchable.
Move the package to semantic versioning. (Oleksii Kliukin)
Patroni will follow the major.minor.patch version schema to avoid releasing the new minor version on small but critical bugfixes. We will only publish the release notes for the minor version, which will include all patches.
Version 1.1
This release improves management of the Patroni cluster by bringing in pause mode, improves maintenance with scheduled and conditional restarts, makes Patroni interaction with Etcd or ZooKeeper more resilient, and greatly enhances patronictl.
Upgrade notice
When upgrading from releases below 1.0, read about the changes to credentials and configuration format in the 1.0 release notes.
Pause mode
Introduce pause mode to temporarily detach Patroni from managing the PostgreSQL instance (Murat Kabilov, Alexander Kukushkin, Oleksii Kliukin).
Previously, one had to send SIGKILL signal to Patroni to stop it without terminating PostgreSQL. The new pause mode detaches Patroni from PostgreSQL cluster-wide without terminating Patroni. It is similar to the maintenance mode in Pacemaker. Patroni is still responsible for updating member and leader keys in DCS, but it will not start, stop or restart PostgreSQL server in the process. There are a few exceptions, for instance, manual failovers, reinitializes and restarts are still allowed. You can read a detailed description of this feature.
In addition, patronictl supports the new pause and resume commands to toggle the pause mode.
Scheduled and conditional restarts
Add conditions to the restart API command (Oleksii Kliukin)
This change enhances Patroni restarts by adding a couple of conditions that can be verified in order to do the restart. Among the conditions are restarting when PostgreSQL role is either a master or a replica, checking the PostgreSQL version number or restarting only when restart is necessary in order to apply configuration changes.
Add scheduled restarts (Oleksii Kliukin)
It is now possible to schedule a restart in the future. Only one scheduled restart per node is supported. It is possible to clear the scheduled restart if it is not needed anymore. A combination of scheduled and conditional restarts is supported, making it possible, for instance, to schedule minor PostgreSQL upgrades at night, restarting only the instances that are running the outdated minor version, without adding postgres-specific logic to administration scripts.
Add support for conditional and scheduled restarts to patronictl (Murat Kabilov).
patronictl restart supports several new options. There is also a patronictl flush command to clear the scheduled actions.
Robust DCS interaction
Set Kazoo timeouts depending on the loop_wait (Alexander Kukushkin)
Originally, the ping_timeout and connect_timeout values were calculated from the negotiated session timeout; Patroni's loop_wait was not taken into account. As a result, a single retry could take more time than the session timeout, forcing Patroni to release the lock and demote.
This change sets the ping and connect timeouts to half the value of loop_wait, speeding up detection of connection issues and leaving enough time to retry the connection attempt before losing the lock.
Update Etcd topology only after original request succeed (Alexander Kukushkin)
Postpone updating the Etcd topology known to the client until after the original request. When retrieving the cluster topology, implement the retry timeouts depending on the known number of nodes in the Etcd cluster. This makes our client prefer getting the results of the request over having an up-to-date list of nodes.
Both changes make Patroni connections to DCS more robust in the face of network issues.
Patronictl, monitoring and configuration
Return information about streaming replicas via the API (Feike Steenbergen)
Previously, there was no reliable way to query Patroni about PostgreSQL instances that fail to stream changes (for instance, due to connection issues). This change exposes the contents of pg_stat_replication via the /patroni endpoint.
Add patronictl scaffold command (Oleksii Kliukin)
Add a command to create the cluster structure in Etcd. The cluster is created with a user-specified sysid and leader, and both leader and member keys are made persistent. This command is useful for creating so-called master-less configurations, where a Patroni cluster consisting only of replicas replicates from an external master node that is unaware of Patroni. Subsequently, one may remove the leader key, promoting one of the Patroni nodes and replacing the original master with the Patroni-based HA cluster.
Add configuration option bin_dir to locate PostgreSQL binaries (Ants Aasma)
It is useful to be able to specify the location of PostgreSQL binaries explicitly when using Linux distros that support installing multiple PostgreSQL versions at the same time.
Allow the configuration file path to be overridden using custom_conf (Alejandro Martínez)
Allows for custom configuration file paths that are unmanaged by Patroni.
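Combined, the two options above might appear in the Patroni YAML file like this (paths are illustrative; placement under the postgresql section follows the Patroni configuration layout):

```yaml
postgresql:
  # Explicit location of PostgreSQL binaries, useful on distros that ship
  # several major versions side by side (path is an example).
  bin_dir: /usr/lib/postgresql/11/bin
  # Path to a base configuration file that Patroni will leave unmanaged.
  custom_conf: /etc/postgresql/custom.conf
```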
Bug fixes and code improvements
Make Patroni compatible with new version schema in PostgreSQL 10 and above (Feike Steenbergen)
Make sure that Patroni understands 2-digit version numbers when doing conditional restarts based on the PostgreSQL version.
Use pkgutil to find DCS modules (Alexander Kukushkin)
Use the dedicated python module instead of traversing directories manually in order to find DCS modules.
Always call on_start callback when starting Patroni (Alexander Kukushkin)
Previously, Patroni did not call any callbacks when attaching to an already running node with the correct role. Since callbacks are often used to route client connections, this could result in a failure to register the running node in the connection routing scheme. With this fix, Patroni calls the on_start callback even when attaching to an already running node.
Do not drop active replication slots (Murat Kabilov, Oleksii Kliukin)
Avoid dropping active physical replication slots on the master. PostgreSQL cannot drop such slots anyway. This change makes it possible to run non-Patroni-managed replicas/consumers on the master.
Close Patroni connections during start of the PostgreSQL instance (Alexander Kukushkin)
Forces Patroni to close all former connections when the PostgreSQL node is started. Avoids the trap of reusing former connections if the postmaster was killed with SIGKILL.
Replace invalid characters when constructing slot names from member names (Ants Aasma)
Make sure that standby names that do not comply with the slot naming rules don’t cause slot creation and standby startup to fail. Replace dashes in slot names with underscores and all other characters not allowed in slot names with their Unicode code points.
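A minimal sketch of that sanitization rule (an illustrative reimplementation, not Patroni's actual function):

```python
import re


def slot_name_from_member_name(member_name: str) -> str:
    """Sanitize a member name into a valid replication slot name.

    Slot names may only contain lower-case letters, digits and
    underscores. Dashes become underscores; any other disallowed
    character is replaced by its (hex) Unicode code point.
    """
    def replace(match: "re.Match") -> str:
        c = match.group(0)
        return '_' if c == '-' else 'u{:04x}'.format(ord(c))

    return re.sub(r'[^a-z0-9_]', replace, member_name.lower())


print(slot_name_from_member_name('node-1'))  # node_1
```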
Version 1.0
This release introduces the global dynamic configuration that allows dynamic changes of the PostgreSQL and Patroni configuration parameters for the entire HA cluster. It also delivers numerous bugfixes.
Upgrade notice
When upgrading from v0.90 or below, always upgrade all replicas before the master. Since we don’t store replication credentials in DCS anymore, an old replica won’t be able to connect to the new master.
Dynamic Configuration
Implement the dynamic global configuration (Alexander Kukushkin)
Introduce a new REST API endpoint /config to provide PostgreSQL and Patroni configuration parameters that should be set globally for the entire HA cluster (master and all the replicas). Those parameters are set in DCS and in many cases can be applied without disrupting PostgreSQL or Patroni. Patroni sets a special flag called “pending restart”, visible via the API, when some of the values require a PostgreSQL restart. In that case, the restart should be issued manually via the API.
Sending SIGHUP to the Patroni process or issuing a POST to /reload will make it re-read the configuration file.
See the Patroni configuration for details on which parameters can be changed and the order in which different configuration sources are processed.
The configuration file format has changed since v0.90. Patroni is still compatible with the old configuration files, but in order to take advantage of the bootstrap parameters one needs to change it. Users are encouraged to update them by referring to the dynamic configuration documentation page.
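As an illustration, dynamic settings could be changed by issuing a PATCH to the /config endpoint with a JSON body along these lines (parameter choice and values are hypothetical; see the dynamic configuration documentation for the authoritative list):

```json
{
  "ttl": 30,
  "loop_wait": 10,
  "postgresql": {
    "parameters": {
      "max_connections": 200
    }
  }
}
```

Changing a parameter such as max_connections would set the "pending restart" flag, after which the restart can be triggered via the API.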
More flexible configuration
Make postgresql configuration and database name Patroni connects to configurable (Misja Hoebe)
Introduce database and config_base_name configuration parameters. Among other things, this makes it possible to run Patroni with PipelineDB and other PostgreSQL forks.
Implement possibility to configure some Patroni configuration parameters via environment (Alexander Kukushkin)
Those include the scope, the node name and the namespace, as well as the secrets. This makes it easier to run Patroni in a dynamic environment, e.g. Kubernetes. Please refer to the supported environment variables for further details.
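For example (values are hypothetical; the PATRONI_-prefixed names follow the environment-configuration convention described above):

```shell
# Equivalent to setting scope, name and namespace in the YAML file.
export PATRONI_SCOPE=batman
export PATRONI_NAME=postgresql0
export PATRONI_NAMESPACE=/service/
echo "$PATRONI_SCOPE/$PATRONI_NAME"
```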
Update the built-in Patroni docker container to take advantage of environment-based configuration (Feike Steenbergen).
Add Zookeeper support to Patroni docker image (Alexander Kukushkin)
Split the Zookeeper and Exhibitor configuration options (Alexander Kukushkin)
Make patronictl reuse the code from Patroni to read configuration (Alexander Kukushkin)
This allows patronictl to take advantage of environment-based configuration.
Set application name to node name in primary_conninfo (Alexander Kukushkin)
This simplifies identification and configuration of synchronous replication for a given node.
Stability, security and usability improvements
Reset sysid and do not call pg_controldata when a restore from backup is in progress (Alexander Kukushkin)
This change reduces the amount of noise generated by Patroni API health checks during the lengthy initialization of this node from the backup.
Fix a bunch of pg_rewind corner-cases (Alexander Kukushkin)
Avoid running pg_rewind if the source cluster is not the master.
In addition, avoid removing the data directory on an unsuccessful rewind, unless the new parameter remove_data_directory_on_rewind_failure is set to true. By default it is false.
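In the YAML file this might look like the following (placement under the postgresql section follows the Patroni configuration layout):

```yaml
postgresql:
  use_pg_rewind: true
  # Opt in to removing and re-initializing the data directory when
  # pg_rewind fails; defaults to false.
  remove_data_directory_on_rewind_failure: true
```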
Remove passwords from the replication connection string in DCS (Alexander Kukushkin)
Previously, Patroni always used the replication credentials from the Postgres URL in DCS. That is now changed to take the credentials from the Patroni configuration. The secrets (replication username and password) are no longer exposed in DCS.
Fix the asynchronous machinery around the demote call (Alexander Kukushkin)
Demote now runs totally asynchronously without blocking the DCS interactions.
Make patronictl always send the authorization header if it is configured (Alexander Kukushkin)
This allows patronictl to issue “protected” requests, i.e. restart or reinitialize, when Patroni is configured to require authorization on those.
Handle the SystemExit exception correctly (Alexander Kukushkin)
Avoids issues with Patroni not stopping properly when receiving SIGTERM.
Sample haproxy templates for confd (Alexander Kukushkin)
Generates and dynamically changes the haproxy configuration from the Patroni state in the DCS using confd.
Improve and restructure the documentation to make it more friendly to new users (Lauri Apple)
API must report role=master during pg_ctl stop (Alexander Kukushkin)
Makes the callback calls more reliable, particularly in the cluster stop case. In addition, introduce the pg_ctl_timeout option to set the timeout for the start, stop and restart calls via pg_ctl.
Fix the retry logic in etcd (Alexander Kukushkin)
Make retries more predictable and robust.
Make Zookeeper code more resilient against short network hiccups (Alexander Kukushkin)
Reduce the connection timeouts to make Zookeeper connection attempts more frequent.
Version 0.90
This release adds support for Consul, includes a new noloadbalance tag, changes the behavior of the clonefrom tag, improves pg_rewind handling, and improves the patronictl control program.
Consul support
Implement Consul support (Alexander Kukushkin)
Patroni runs against Consul, in addition to Etcd and Zookeeper. The connection parameters can be configured in the YAML file.
New and improved tags
Implement noloadbalance tag (Alexander Kukushkin)
This tag makes Patroni always return that the replica is not available to the load balancer.
Change the implementation of the clonefrom tag (Alexander Kukushkin)
Previously, a node name had to be supplied to clonefrom, forcing a tagged replica to clone from that specific node. The new implementation makes clonefrom a boolean tag: if it is set to true, the replica becomes a candidate for other replicas to clone from. When multiple candidates are present, the replica picks one randomly.
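For instance, a replica that should serve as a clone source but never receive load-balanced traffic could be tagged like this (a sketch based on the tag semantics above):

```yaml
tags:
  # Candidate for other replicas to clone from.
  clonefrom: true
  # Always report this replica as unavailable to the load balancer.
  noloadbalance: true
```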
Stability and security improvements
Numerous reliability improvements (Alexander Kukushkin)
Removes some spurious error messages, improves the stability of the failover, addresses some corner cases with reading data from DCS, shutdown, demote and reattaching of the former leader.
Improve the systemd script to avoid killing Patroni children on stop (Jan Keirse, Alexander Kukushkin)
Previously, when stopping Patroni, systemd also sent a signal to PostgreSQL. Since Patroni also tried to stop PostgreSQL by itself, this resulted in two different shutdown requests (the smart shutdown, followed by the fast shutdown). That caused replicas to disconnect too early and a former master to be unable to rejoin after demote. Fix by Jan with prior research by Alexander.
Eliminate some cases where the former master was unable to call pg_rewind before rejoining as a replica (Oleksii Kliukin)
Previously, we only called pg_rewind if the former master had crashed. Change this to always run pg_rewind for the former master as long as pg_rewind is present in the system. This fixes the case when the master is shut down before the replicas have managed to get the latest changes (i.e. during a “smart” shutdown).
Numerous improvements to unit and acceptance tests, in particular enabling support for Zookeeper and Consul (Alexander Kukushkin).
Make Travis CI faster and implement support for running tests against Zookeeper (Exhibitor) and Consul (Alexander Kukushkin)
Both unit and acceptance tests run automatically against Etcd, Zookeeper and Consul on each commit or pull-request.
Clear environment variables before calling PostgreSQL commands from Patroni (Feike Steenbergen)
This prevents a possibility of reading system environment variables by connecting to the PostgreSQL cluster managed by Patroni.
Configuration and control changes
Unify patronictl and Patroni configuration (Feike Steenbergen)
patronictl can use the same configuration file as Patroni itself.
Enable Patroni to read the configuration from the environment variables (Oleksii Kliukin)
This simplifies generating configuration for Patroni automatically, or merging a single configuration from different sources.
Include database system identifier in the information returned by the API (Feike Steenbergen)
Implement delete_cluster for all available DCSs (Alexander Kukushkin)
Enables support for DCSs other than Etcd in patronictl.
Version 0.80
This release adds support for cascading replication and simplifies Patroni management by providing scheduled failovers. One may use older versions of Patroni (in particular, 0.78) combined with this one in order to migrate to the new release. Note that the scheduled failover and cascading replication related features will only work with Patroni 0.80 and above.
Cascading replication
Add support for the replicatefrom and clonefrom tags for the patroni node (Oleksii Kliukin).
The replicatefrom tag allows a replica to use an arbitrary node as a source, not necessarily the master. The clonefrom tag does the same for the initial backup. Together, they enable Patroni to fully support cascading replication.
Add support for running replication methods to initialize the replica even without a running replication connection (Oleksii Kliukin).
This is useful for creating replicas from snapshots stored on S3 or FTP. A replication method that does not require a running replication connection should supply no_master: true in the YAML configuration. Such scripts will still be called in order if the replication connection is present.
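A hypothetical replica-creation method that restores from S3 without a replication connection might be declared like this (the method and script names are invented for illustration, and the create_replica_methods layout follows present-day Patroni configuration):

```yaml
postgresql:
  create_replica_methods:
    - s3_restore
    - basebackup
  s3_restore:
    command: /usr/local/bin/restore_from_s3.sh
    # This method can run even without a working replication connection.
    no_master: true
```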
Patronictl, API and DCS improvements
Implement scheduled failovers (Feike Steenbergen).
Failovers can be scheduled to happen at a certain time in the future, using either patronictl, or API calls.
Add support for dbuser and password parameters in patronictl (Feike Steenbergen).
Add PostgreSQL version to the health check output (Feike Steenbergen).
Improve Zookeeper support in patronictl (Oleksandr Shulgin)
Migrate to python-etcd 0.43 (Alexander Kukushkin)
Configuration
Add a sample systemd configuration script for Patroni (Jan Keirse).
Fix the problem of Patroni ignoring the superuser name specified in the configuration file for DB connections (Alexander Kukushkin).
Fix the handling of CTRL-C by creating a separate session ID and process group for the postmaster launched by Patroni (Alexander Kukushkin).
Tests
Add acceptance tests with behave in order to check real-world scenarios of running Patroni (Alexander Kukushkin, Oleksii Kliukin).
The tests can be launched manually using the behave command. They are also launched automatically for pull requests and after commits.
Release notes for some older versions can be found on the project’s GitHub page.
Contributing
Resources and information for developers can be found in the pages below.
Contributing guidelines
Chatting
If you have a question, are looking for interactive troubleshooting help, or want to chat with other Patroni users, join us on the #patroni channel in the PostgreSQL Slack.
Reporting bugs
Before reporting a bug, please make sure to reproduce it with the latest Patroni version! Also, please double-check whether the issue already exists in our Issues Tracker.
Running tests
Requirements for running behave tests:
PostgreSQL packages including contrib modules need to be installed.
PostgreSQL binaries must be available in your PATH. You may need to add them to the path with something like PATH=/usr/lib/postgresql/11/bin:$PATH python -m behave.
If you’d like to test with external DCSs (e.g., Etcd, Consul, and Zookeeper), you’ll need the packages installed and the respective services running and accepting unencrypted/unprotected connections on localhost and the default port. In the case of Etcd or Consul, the behave test suite can start them up if the binaries are available in the PATH.
Install dependencies:
# You may want to use Virtualenv or specify pip3.
pip install -r requirements.txt
pip install -r requirements.dev.txt
After you have all dependencies installed, you can run the various test suites:
# You may want to use Virtualenv or specify python3.
# Run flake8 to check syntax and formatting:
python setup.py flake8
# Run the pytest suite in tests/:
python setup.py test
# Moreover, you may want to run tests in different scopes for debugging purposes;
# the -s option includes print output during test execution.
# Tests in pytest typically follow the pattern: FILEPATH::CLASSNAME::TESTNAME.
pytest -s tests/test_api.py
pytest -s tests/test_api.py::TestRestApiHandler
pytest -s tests/test_api.py::TestRestApiHandler::test_do_GET
# Run the behave (https://behave.readthedocs.io/en/latest/) test suite in features/;
# modify DCS as desired (raft has no dependencies so is the easiest to start with):
DCS=raft python -m behave
Testing with tox
To run tox tests you only need to install one dependency (other than Python):
pip install tox>=4
If you wish to run behave tests then you also need docker installed.
Tox configuration in tox.ini has “environments” to run the following tasks:
lint: Python code lint with flake8
test: unit tests for all available python interpreters with pytest, generates XML reports or HTML reports if a TTY is detected
dep: detect package dependency conflicts using pipdeptree
type: static type checking with pyright
black: code formatting with black
docker-build: build docker image used for the behave env
docker-cmd: run arbitrary command with the above image
docker-behave-etcd: run tox for behave tests with above image
py*behave: run behave with available python interpreters (without docker, although this is what is called inside docker containers)
docs: build docs with sphinx
Running tox
To run the default env list (dep, lint, test, and docs), just run:
tox
The test envs can be run with the label test:
tox -m test
The behave docker tests can be run with the label behave:
tox -m behave
Similarly, docs has the label docs.
All other envs can be run with their respective env names:
tox -e lint
tox -e py39-test-lin
It is also possible to select partial env lists using factors. For example, if you want to run all envs for python 3.10:
tox -f py310
This is equivalent to running all the envs listed below:
$ tox -l -f py310
py310-test-lin
py310-test-mac
py310-test-win
py310-type-lin
py310-type-mac
py310-type-win
py310-behave-etcd-lin
py310-behave-etcd-win
py310-behave-etcd-mac
You can list all configured combinations of environments with tox (>=v4) like so:
tox l
The test and docs envs will attempt to open the HTML output files when the job completes, if tox is run with an active terminal. This is intended for the benefit of the developer running the env locally. It will attempt to run open on a Mac and xdg-open on Linux. To use a different command, set the env var OPEN_CMD to the name or path of the command. If this step fails, it will not fail the run overall. If you want to disable this facility, set the env var OPEN_CMD to the : no-op command.
OPEN_CMD=: tox -m docs
Behave tests
Behave tests with -m behave will build docker images based on PG_MAJOR version 11 through 16 and then run all behave tests. This can take quite a long time to run so you might want to limit the scope to a select version of Postgres or to a specific feature set or steps.
To specify the version of Postgres, include the full name of the dependent image-build env that you want, followed by the behave env name. For instance, if you want Postgres 14, use:
tox -e pg14-docker-build,pg14-docker-behave-etcd-lin
If, on the other hand, you want to test a specific feature, you can pass positional arguments to behave. This will run the watchdog behave feature test scenario with all versions of Postgres.
tox -m behave -- features/watchdog.feature
Of course you can combine the two.
Contributing a pull request
Fork the repository, develop and test your code changes.
Reflect changes in the user documentation.
Submit a pull request with a clear description of the change’s objective. Link an existing issue if necessary.
You’ll get feedback about your pull request as soon as possible.
Happy Patroni hacking ;-)
patroni
patroni package
Subpackages
patroni.dcs package
Submodules
patroni.dcs.consul module
- class patroni.dcs.consul.Consul(config: Dict[str, Any])View on GitHub
Bases:
AbstractDCS
- __init__(config: Dict[str, Any]) None View on GitHub
Prepare DCS paths, Citus group ID, initial values for state information and processing dependencies.
- Variables:
config – dict, reference to the config section of the selected DCS, i.e. zookeeper for zookeeper, etcd for etcd, etc…
- _abc_impl = <_abc._abc_data object>
- _citus_cluster_loader(path: str) Dict[int, Cluster] View on GitHub
Load and build all Patroni clusters from a single Citus cluster.
- Parameters:
path – the path in DCS where to load Cluster(s) from.
- Returns:
all Citus groups as dict, with group IDs as keys and Cluster objects as values, or a Cluster object representing the coordinator with a filled Cluster.workers attribute.
- _cluster_loader(path: str) Cluster View on GitHub
Load and build the Cluster object from DCS, which represents a single Patroni or Citus cluster.
- Parameters:
path – the path in DCS where to load Cluster(s) from.
- Returns:
Cluster
instance.
- _delete_leader(**kwargs: Any) Any View on GitHub
Remove leader key from DCS.
This method should remove leader key if current instance is the leader.
- Parameters:
leader – Leader object with information about the leader.
- Returns:
True if successfully committed to DCS.
- _do_attempt_to_acquire_leader(retry: Retry) bool View on GitHub
- _do_refresh_session(force: bool = False) bool View on GitHub
- Returns:
True if it had to create a new session
- _load_cluster(path: str, loader: Callable[[str], Cluster | Dict[int, Cluster]]) Cluster | Dict[int, Cluster] View on GitHub
Main abstract method that implements the loading of a Cluster instance.
Note
Internally this method should call the loader method that will build the Cluster object representing the current state and topology of the cluster in DCS. This method is supposed to be called only by the get_cluster() method.
- Parameters:
path – the path in DCS where to load Cluster(s) from.
loader – one of _cluster_loader() or _citus_cluster_loader().
- Raise:
DCSError in case of communication problems with DCS. If the current node was running as a primary and an exception was raised, the instance would be demoted.
- static _run_and_handle_exceptions(method: Callable[[...], Any], *args: Any, **kwargs: Any) Any View on GitHub
- _set_service_name() None View on GitHub
- _update_leader(**kwargs: Any)View on GitHub
Update leader key (or session) ttl.
Note
You have to use a CAS (Compare And Swap) operation in order to update the leader key; for example, for etcd the prevValue parameter must be used.
If the update fails due to DCS not being accessible or because it is not able to process requests (hopefully temporarily), the DCSError exception should be raised.
- Parameters:
leader – a reference to a current leader object.
- Returns:
True if the leader key (or session) has been updated successfully.
- _write_failsafe(**kwargs: Any) Any View on GitHub
Write current cluster topology to DCS that will be used by failsafe mechanism (if enabled).
- Parameters:
value – failsafe topology serialized in JSON format.
- Returns:
True
if successfully committed to DCS.
- _write_leader_optime(**kwargs: Any) Any View on GitHub
Write current WAL LSN into the /optime/leader key in DCS.
- Parameters:
last_lsn – absolute WAL LSN in bytes.
- Returns:
True
if successfully committed to DCS.
- _write_status(**kwargs: Any) Any View on GitHub
Write current WAL LSN and confirmed_flush_lsn of permanent slots into the /status key in DCS.
- Parameters:
value – status serialized in JSON format.
- Returns:
True
if successfully committed to DCS.
- adjust_ttl() None View on GitHub
- attempt_to_acquire_leader(**kwargs: Any)View on GitHub
Attempt to acquire leader lock.
Note
This method should create the /leader key with the value _name.
The key must be created atomically. In case the key already exists it should not be overwritten and False must be returned.
If key creation fails due to DCS not being accessible or because it is not able to process requests (hopefully temporarily), the DCSError exception should be raised.
- Returns:
True if the key has been created successfully.
- cancel_initialization(**kwargs: Any) Any View on GitHub
Removes the initialize key for a cluster.
- Returns:
True
if successfully committed to DCS.
- create_session() None View on GitHub
- delete_cluster(**kwargs: Any) Any View on GitHub
Delete cluster from DCS.
- Returns:
True
if successfully committed to DCS.
- delete_sync_state(**kwargs: Any) Any View on GitHub
Delete the synchronous state from DCS.
- Parameters:
version – for conditional deletion of the key/object.
- Returns:
True
if delete successful.
- deregister_service(**kwargs: Any) Any View on GitHub
- initialize(**kwargs: Any) Any View on GitHub
Race for cluster initialization.
This method should atomically create the initialize key and return True, otherwise it should return False.
- Parameters:
create_new – False if the key should already exist (in the case we are setting the system_id).
sysid – PostgreSQL cluster system identifier; if specified, it is written to the key.
- Returns:
True
if key has been created successfully.
- refresh_session() bool View on GitHub
- register_service(**kwargs: Any) Any View on GitHub
- reload_config(config: Config | Dict[str, Any]) None View on GitHub
Load and set relevant values from configuration.
Sets the loop_wait, ttl and retry_timeout properties.
- Parameters:
config – Loaded configuration information object or dictionary of key value pairs.
- set_config_value(**kwargs: Any) Any View on GitHub
Create or update the /config key in DCS.
- Parameters:
value – new value to set in the config key.
version – for conditional update of the key/object.
- Returns:
True
if successfully committed to DCS.
- set_failover_value(**kwargs: Any) Any View on GitHub
Create or update the /failover key.
- Parameters:
value – value to set.
version – for conditional update of the key/object.
- Returns:
True
if successfully committed to DCS.
- set_history_value(**kwargs: Any) Any View on GitHub
Set value for history in DCS.
- Parameters:
value – new value of the history key/object.
- Returns:
True
if successfully committed to DCS.
- set_retry_timeout(retry_timeout: int) None View on GitHub
Set the new value for retry_timeout.
- set_sync_state_value(**kwargs: Any) Any View on GitHub
Set synchronous state in DCS.
- Parameters:
value – the new value of the /sync key.
version – for conditional update of the key/object.
- Returns:
version of the new object or
False
in case of error.
- set_ttl(ttl: int) bool | None View on GitHub
Set the new ttl value for DCS keys.
- take_leader() bool View on GitHub
Establish a new leader in DCS.
Note
This method should create the leader key with value of _name and ttl of ttl.
Since it could be called only on initial cluster bootstrap, it could create this key regardless, overwriting the key if necessary.
- Returns:
True
if successfully committed to DCS.
- touch_member(**kwargs: Any) Any View on GitHub
Update member key in DCS.
Note
This method should create or update the key with the name /members/ + _name and the value of data in a given DCS.
- Parameters:
data – information about an instance (including connection strings).
- Returns:
True
if successfully committed to DCS.
- update_service(**kwargs: Any) Any View on GitHub
- watch(leader_version: int | None, timeout: float) bool View on GitHub
Sleep if the current node is a leader, otherwise, watch for changes of leader key with a given timeout.
- Parameters:
leader_version – version of a leader key.
timeout – timeout in seconds.
- Returns:
if True this will reschedule the next run of the HA cycle.
- class patroni.dcs.consul.ConsulClient(*args: Any, **kwargs: Any)View on GitHub
Bases:
Consul
- __init__(*args: Any, **kwargs: Any) None View on GitHub
Consul client with Patroni customisations.
Note
Parameters token, cert and ca_cert are not passed to the parent class consul.base.Consul.
Original class documentation:
token is an optional ACL token. If supplied it will be used by default for all requests made with this client session. It’s still possible to override this token by passing a token explicitly for a request.
consistency sets the consistency mode to use by default for all reads that support the consistency option. It’s still possible to override this by passing it explicitly for a given request. consistency can be either ‘default’, ‘consistent’ or ‘stale’.
dc is the datacenter that this agent will communicate with. By default, the datacenter of the host is used.
verify is whether to verify the SSL certificate for HTTPS requests
cert client side certificates for HTTPS requests
- Parameters:
args – positional arguments to pass to consul.base.Consul
kwargs – keyword arguments, with cert, ca_cert and token removed, passed to consul.base.Consul
- connect(*args: Any, **kwargs: Any) HTTPClient View on GitHub
- http_connect(*args: Any, **kwargs: Any) HTTPClient View on GitHub
- exception patroni.dcs.consul.ConsulError(value: Any)View on GitHub
Bases:
DCSError
- exception patroni.dcs.consul.ConsulInternalErrorView on GitHub
Bases:
ConsulException
An internal Consul server error occurred
- class patroni.dcs.consul.HTTPClient(host: str = '127.0.0.1', port: int = 8500, token: str | None = None, scheme: str = 'http', verify: bool = True, cert: str | None = None, ca_cert: str | None = None)View on GitHub
Bases:
object
- __init__(host: str = '127.0.0.1', port: int = 8500, token: str | None = None, scheme: str = 'http', verify: bool = True, cert: str | None = None, ca_cert: str | None = None) None View on GitHub
- static response(response: HTTPResponse) Response View on GitHub
- set_read_timeout(timeout: float) None View on GitHub
- set_ttl(ttl: int) bool View on GitHub
- exception patroni.dcs.consul.InvalidSessionView on GitHub
Bases:
ConsulException
invalid session
- exception patroni.dcs.consul.InvalidSessionTTLView on GitHub
Bases:
ConsulException
Session TTL is too small or too big
- class patroni.dcs.consul.Response(code, headers, body, content)View on GitHub
Bases:
NamedTuple
- _asdict()View on GitHub
Return a new dict which maps field names to their values.
- _field_defaults = {}
- _fields = ('code', 'headers', 'body', 'content')
- classmethod _make(iterable)View on GitHub
Make a new Response object from a sequence or iterable
- _replace(**kwds)View on GitHub
Return a new Response object replacing specified fields with new values
- patroni.dcs.consul.catch_consul_errors(func: Callable[[...], Any]) Callable[[...], Any] View on GitHub
- patroni.dcs.consul.force_if_last_failed(func: Callable[[...], Any]) Callable[[...], Any] View on GitHub
- patroni.dcs.consul.service_name_from_scope_name(scope_name: str) str View on GitHub
Translate a scope name to a service name which can be used in DNS.
230 = 253 - len(‘replica.’) - len(‘.service.consul’)
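The constant in that comment follows from the 253-character limit on a full DNS name; a quick check:

```python
# A full DNS name may be at most 253 characters. The service name is
# embedded as e.g. "replica.<name>.service.consul", leaving 230 for <name>.
MAX_DNS_NAME = 253
budget = MAX_DNS_NAME - len('replica.') - len('.service.consul')
print(budget)  # 230
```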
patroni.dcs.etcd module
patroni.dcs.etcd3 module
patroni.dcs.exhibitor module
patroni.dcs.kubernetes module
- class patroni.dcs.kubernetes.CoreV1ApiProxy(use_endpoints: bool | None = False, bypass_api_service: bool | None = False)View on GitHub
Bases:
object
Proxy class to work with k8s_client.CoreV1Api() object
- _DEFAULT_RETRIABLE_HTTP_CODES = frozenset({500, 503, 504})
- __init__(use_endpoints: bool | None = False, bypass_api_service: bool | None = False) None View on GitHub
- configure_retriable_http_codes(retriable_http_codes: List[int]) None View on GitHub
- refresh_api_servers_cache() None View on GitHub
- class patroni.dcs.kubernetes.K8sClientView on GitHub
Bases:
object
- class ApiClient(bypass_api_service: bool | None = False)View on GitHub
Bases:
object
- _API_URL_PREFIX = '/api/v1/namespaces/'
- __init__(bypass_api_service: bool | None = False) None View on GitHub
- _calculate_timeouts(api_servers: int, timeout: float | None = None) Tuple[int, float, int] View on GitHub
Calculate a request timeout and number of retries per single K8s API server node. In case the timeout per node is too small (less than one second) we will reduce the number of nodes. For a cluster with only one API server node we will try to do 1 retry. No retries for clusters with 2 or more API server nodes; we prefer to rely on switching to a different node.
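The described policy could be sketched as follows (an illustrative reimplementation of the documented behavior, not the actual method):

```python
def calculate_timeouts(api_servers: int, timeout: float = 3.0):
    """Split a total timeout across K8s API server nodes.

    Returns (nodes_to_try, timeout_per_node, retries_per_node).
    If the per-node share would drop below one second, fewer nodes
    are tried; a single-node cluster gets one retry, larger clusters
    get none (switching to a different node is preferred).
    """
    per_node = timeout / api_servers
    while api_servers > 1 and per_node < 1.0:
        api_servers -= 1
        per_node = timeout / api_servers
    retries = 1 if api_servers == 1 else 0
    return api_servers, per_node, retries


print(calculate_timeouts(3, 9.0))  # (3, 3.0, 0)
print(calculate_timeouts(1, 2.0))  # (1, 2.0, 1)
```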
- _do_http_request(retry: Retry | None, api_servers_cache: List[str], method: str, path: str, **kwargs: Any) HTTPResponse View on GitHub
- static _handle_server_response(response: HTTPResponse, _preload_content: bool) HTTPResponse | K8sObject View on GitHub
- _load_api_servers_cache() None View on GitHub
- _refresh_api_servers_cache(updating_cache: bool | None = False) None View on GitHub
- call_api(method: str, path: str, headers: Dict[str, str] | None = None, body: Any | None = None, _retry: Retry | None = None, _preload_content: bool = True, _request_timeout: float | None = None, **kwargs: Any) HTTPResponse | K8sObject View on GitHub
- refresh_api_servers_cache() None View on GitHub
- request(retry: Retry | None, method: str, path: str, timeout: int | float | Tuple[int | float, int | float] | Timeout | None = None, **kwargs: Any) HTTPResponse View on GitHub
- set_api_servers_cache_ttl(ttl: int) None View on GitHub
- set_base_uri(value: str) None View on GitHub
- set_read_timeout(timeout: int | float) None View on GitHub
- class CoreV1Api(api_client: ApiClient | None = None)View on GitHub
Bases:
object
- __init__(api_client: ApiClient | None = None) None View on GitHub
- class _K8sObjectTemplate(**kwargs: Any)View on GitHub
Bases:
K8sObject
The template for objects which we create locally, e.g. k8s_client.V1ObjectMeta & co
- __init__(**kwargs: Any) None View on GitHub
- __init__() None View on GitHub
- class restView on GitHub
Bases:
object
- class patroni.dcs.kubernetes.K8sConfigView on GitHub
Bases:
object
- exception ConfigExceptionView on GitHub
Bases:
Exception
- __init__() None View on GitHub
- static _get_by_name(config: Dict[str, List[Dict[str, Any]]], section: str, name: str) Dict[str, Any] | None View on GitHub
- _pool_config_from_file_or_data(config: Dict[str, str], file_key_name: str, pool_key_name: str) None View on GitHub
- _read_token_file() str View on GitHub
- _set_token(token: str) None View on GitHub
- load_incluster_config(ca_certs: str = '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt', token_refresh_interval: timedelta = datetime.timedelta(seconds=60)) None View on GitHub
- load_kube_config(context: str | None = None) None View on GitHub
- exception patroni.dcs.kubernetes.K8sConnectionFailedView on GitHub
Bases:
K8sException
- exception patroni.dcs.kubernetes.K8sExceptionView on GitHub
Bases:
Exception
- class patroni.dcs.kubernetes.K8sObject(kwargs: Dict[str, Any])View on GitHub
Bases:
object
- to_dict() Dict[str, Any] View on GitHub
- class patroni.dcs.kubernetes.Kubernetes(config: Dict[str, Any])View on GitHub
Bases:
AbstractDCS
- _CITUS_LABEL = 'citus-group'
- __init__(config: Dict[str, Any]) None View on GitHub
Prepare DCS paths, Citus group ID, initial values for state information and processing dependencies.
- Variables:
config – dict, reference to the config section of the selected DCS, i.e. zookeeper for ZooKeeper, etcd for etcd, etc.
- __load_cluster(group: str | None, loader: Callable[[Dict[str, Any]], Cluster | Dict[int, Cluster]]) Cluster | Dict[int, Cluster]
- _abc_impl = <_abc._abc_data object>
- _citus_cluster_loader(path: Dict[str, Any]) Dict[int, Cluster] View on GitHub
Load and build all Patroni clusters from a single Citus cluster.
- Parameters:
path – the path in DCS where to load Cluster(s) from.
- Returns:
all Citus groups as dict, with group IDs as keys and Cluster objects as values, or a Cluster object representing the coordinator with a filled Cluster.workers attribute.
- _cluster_from_nodes(group: str, nodes: Dict[str, K8sObject], pods: Collection[K8sObject]) Cluster View on GitHub
- _cluster_loader(path: Dict[str, Any]) Cluster View on GitHub
Load and build the Cluster object from DCS, which represents a single Patroni or Citus cluster.
- Parameters:
path – the path in DCS where to load Cluster(s) from.
- Returns:
Cluster instance.
- _create_config_service() None View on GitHub
- _delete_leader(leader: Leader) bool View on GitHub
Unused
- _load_cluster(path: str, loader: Callable[[Any], Cluster | Dict[int, Cluster]]) Cluster | Dict[int, Cluster] View on GitHub
Main abstract method that implements the loading of a Cluster instance.
Note
Internally this method should call the loader method that builds the Cluster object representing the current state and topology of the cluster in DCS. This method is supposed to be called only by the get_cluster() method.
- Parameters:
path – the path in DCS where to load Cluster(s) from.
loader – one of _cluster_loader() or _citus_cluster_loader().
- Raise:
DCSError in case of communication problems with DCS. If the current node was running as a primary and the exception is raised, the instance will be demoted.
- _patch_or_create(name: str, annotations: Dict[str, Any], resource_version: str | None = None, patch: bool = False, retry: Callable[[...], Any] | None = None, ips: List[str] | None = None) K8sObject View on GitHub
Patch or create K8s object, Endpoint or ConfigMap.
- Parameters:
name – the name of the object.
annotations – mapping of annotations that we want to create/update.
resource_version – the object should be updated only if the resource_version matches the provided value.
patch – True if we know in advance that the object already exists and we should patch it.
retry – a callable that will take care of retries.
ips – IP address that we want to put into the subsets of the endpoint. Can have the following values:
None – when we don't need to touch the subsets;
[] – to set subsets to the empty list, when the delete_leader() method is called;
['ip.add.re.ss'] – when we want to make sure that the subsets of the leader endpoint contain the IP address of the leader, which we get from kubernetes.pod_ip;
[''] – when we want to make sure that the subsets of the leader endpoint contain the IP address of the leader, but the kubernetes.pod_ip configuration is missing. In this case we will try to take the IP address of the Pod whose name matches the name from the config file.
- Returns:
the new V1Endpoints or V1ConfigMap object that was created or updated.
- _update_leader(leader: Leader) bool View on GitHub
Unused
- _update_leader_with_retry(annotations: Dict[str, Any], resource_version: str | None, ips: List[str]) bool View on GitHub
- _wait_caches(stop_time: float) None View on GitHub
- _write_failsafe(value: str) bool View on GitHub
Unused
- _write_leader_optime(last_lsn: str) bool View on GitHub
Unused
- _write_status(value: str) bool View on GitHub
Unused
- attempt_to_acquire_leader() bool View on GitHub
Attempt to acquire leader lock.
Note
This method should create the /leader key with the value _name.
The key must be created atomically. If the key already exists it must not be overwritten and False must be returned.
If key creation fails because DCS is not accessible or cannot process requests (hopefully temporarily), the DCSError exception should be raised.
- Returns:
True if the key has been created successfully.
- cancel_initialization() bool View on GitHub
Removes the initialize key for a cluster.
- Returns:
True if successfully committed to DCS.
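The atomic create-if-absent contract that attempt_to_acquire_leader() relies on can be illustrated with a toy in-memory store. This is illustrative only; real implementations delegate the atomicity to the DCS itself (for example a conditional create on a Kubernetes object, or etcd's prevExist=false).

```python
import threading

class ToyDCS:
    """In-memory stand-in for a DCS, showing create-if-absent semantics."""
    def __init__(self):
        self._lock = threading.Lock()
        self._keys = {}

    def create(self, key, value):
        """Atomically create the key; fail if it already exists."""
        with self._lock:
            if key in self._keys:
                return False  # never overwrite an existing leader key
            self._keys[key] = value
            return True

dcs = ToyDCS()
assert dcs.create('/leader', 'node1') is True   # first contender wins
assert dcs.create('/leader', 'node2') is False  # key is not overwritten
```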
- client_path(path: str) str View on GitHub
Construct the absolute key name from appropriate parts for the DCS type.
- Parameters:
path – The key name within the current Patroni cluster.
- Returns:
absolute key name for the current Patroni cluster.
- static compare_ports(p1: K8sObject, p2: K8sObject) bool View on GitHub
- delete_cluster(*args: Any, **kwargs: Any) Any View on GitHub
Delete cluster from DCS.
- Returns:
True
if successfully committed to DCS.
- delete_leader(leader: Leader | None, last_lsn: int | None = None) bool View on GitHub
Update optime/leader and voluntarily remove the leader key from DCS.
This method should remove the leader key if the current instance is the leader.
- Parameters:
leader – Leader object with information about the leader.
last_lsn – latest checkpoint location in bytes.
- Returns:
boolean result of the called abstract _delete_leader().
- delete_sync_state(version: str | None = None) bool View on GitHub
Patch annotations of the $SCOPE-sync Endpoint or ConfigMap with empty values.
Effectively this removes the “leader” and “sync_standby” annotations from the object.
- Parameters:
version – last known resource_version for conditional update of the object.
- Returns:
True if the “delete” was successful.
- get_citus_coordinator() Cluster | None View on GitHub
Load the Patroni cluster for the Citus Coordinator.
Note
This method is only executed on the worker nodes (group != 0) to find the coordinator.
- Returns:
Cluster instance associated with the Citus Coordinator group ID.
- initialize(create_new: bool = True, sysid: str = '') bool View on GitHub
Race for cluster initialization.
This method should atomically create the initialize key and return True, otherwise it should return False.
- Parameters:
create_new – False if the key should already exist (in the case we are setting the system_id).
sysid – PostgreSQL cluster system identifier; if specified, it is written to the key.
- Returns:
True if the key has been created successfully.
- manual_failover(leader: str | None, candidate: str | None, scheduled_at: datetime | None = None, version: str | None = None) bool View on GitHub
Prepare dictionary with given values and set the /failover key in DCS.
- Parameters:
leader – value to set for leader.
candidate – value to set for member.
scheduled_at – value converted to ISO date format for scheduled_at.
version – for conditional update of the key/object.
- Returns:
True if successfully committed to DCS.
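For illustration, the dictionary that manual_failover() prepares might look like this. Field names follow the parameter list above; the member names are made up.

```python
import json
from datetime import datetime, timezone

failover = {
    'leader': 'postgresql0',   # node that currently holds the leader key
    'member': 'postgresql1',   # desired failover candidate
    # scheduled_at is converted to ISO date format
    'scheduled_at': datetime(2030, 1, 1, 12, 0, tzinfo=timezone.utc).isoformat(),
}
value = json.dumps(failover)  # serialized payload for the /failover key
```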
- static member(pod: K8sObject) Member View on GitHub
- patch_or_create(*args: Any, **kwargs: Any) Any View on GitHub
- patch_or_create_config(annotations: Dict[str, Any], resource_version: str | None = None, patch: bool = False, retry: bool = True) bool View on GitHub
- reload_config(config: Config | Dict[str, Any]) None View on GitHub
Handles dynamic configuration changes, caused either by changes in the local configuration file (plus SIGHUP) or by changes of the dynamic configuration.
- set_config_value(value: str, version: str | None = None) bool View on GitHub
Create or update the /config key in DCS.
- Parameters:
value – new value to set in the config key.
version – for conditional update of the key/object.
- Returns:
True if successfully committed to DCS.
- set_history_value(value: str) bool View on GitHub
Set value for history in DCS.
- Parameters:
value – new value of the history key/object.
- Returns:
True if successfully committed to DCS.
- set_retry_timeout(retry_timeout: int) None View on GitHub
Set the new value for retry_timeout.
- set_ttl(ttl: int) bool | None View on GitHub
Set the new ttl value for DCS keys.
- static subsets_changed(last_observed_subsets: List[K8sObject], ip: str, ports: List[K8sObject]) bool View on GitHub
>>> ip = '1.2.3.4'
>>> a = [k8s_client.V1EndpointAddress(ip=ip)]
>>> s = [k8s_client.V1EndpointSubset(addresses=a)]
>>> Kubernetes.subsets_changed(s, '1.2.3.5', [])
True
>>> s = [k8s_client.V1EndpointSubset(addresses=a, ports=[k8s_client.V1EndpointPort(protocol='TCP', port=1)])]
>>> Kubernetes.subsets_changed(s, '1.2.3.4', [k8s_client.V1EndpointPort(port=5432)])
True
>>> p1 = k8s_client.V1EndpointPort(name='port1', port=1)
>>> p2 = k8s_client.V1EndpointPort(name='port2', port=2)
>>> p3 = k8s_client.V1EndpointPort(name='port3', port=3)
>>> s = [k8s_client.V1EndpointSubset(addresses=a, ports=[p1, p2])]
>>> Kubernetes.subsets_changed(s, ip, [p2, p3])
True
>>> s2 = [k8s_client.V1EndpointSubset(addresses=a, ports=[p2, p1])]
>>> Kubernetes.subsets_changed(s, ip, [p2, p1])
False
- take_leader() bool View on GitHub
Establish a new leader in DCS.
Note
This method should create the leader key with the value of _name and a ttl of ttl.
Since it can only be called on initial cluster bootstrap, it may create this key regardless, overwriting the key if necessary.
- Returns:
True if successfully committed to DCS.
- touch_member(*args: Any, **kwargs: Any) Any View on GitHub
Update member key in DCS.
Note
This method should create or update the key with the name /members/ + _name and the value of data in the given DCS.
- Parameters:
data – information about an instance (including connection strings).
- Returns:
True if successfully committed to DCS.
- update_leader(leader: Leader, last_lsn: int | None, slots: Dict[str, int] | None = None, failsafe: Dict[str, str] | None = None) bool View on GitHub
Update the leader key (or session) ttl and optime/leader.
- Parameters:
leader – Leader object with information about the leader.
last_lsn – absolute WAL LSN in bytes.
slots – dictionary with permanent slots confirmed_flush_lsn.
failsafe – if defined, dictionary passed to write_failsafe().
- Returns:
True if the leader key (or session) has been updated successfully.
- watch(leader_version: str | None, timeout: float) bool View on GitHub
Sleep if the current node is a leader, otherwise watch for changes of the leader key with a given timeout.
- Parameters:
leader_version – version of the leader key.
timeout – timeout in seconds.
- Returns:
if True, the next run of the HA cycle will be rescheduled.
- write_leader_optime(last_lsn: int) None View on GitHub
Write value for WAL LSN to the optime annotation of the leader object.
- Parameters:
last_lsn – absolute WAL LSN in bytes.
- write_sync_state(leader: str | None, sync_standby: Collection[str] | None, version: str | None = None) SyncState | None View on GitHub
Prepare and write annotations to the $SCOPE-sync Endpoint or ConfigMap.
- Parameters:
leader – name of the leader node that manages the /sync key.
sync_standby – collection of currently known synchronous standby node names.
version – last known resource_version for conditional update of the object.
- Returns:
the new SyncState object or None.
- exception patroni.dcs.kubernetes.KubernetesError(value: Any)View on GitHub
Bases:
DCSError
- exception patroni.dcs.kubernetes.KubernetesRetriableException(orig: ApiException)View on GitHub
Bases:
ApiException
- __init__(orig: ApiException) None View on GitHub
- class patroni.dcs.kubernetes.ObjectCache(dcs: Kubernetes, func: Callable[[...], Any], retry: Retry, condition: Condition, name: str | None = None)View on GitHub
Bases:
Thread
- __init__(dcs: Kubernetes, func: Callable[[...], Any], retry: Retry, condition: Condition, name: str | None = None) None View on GitHub
This constructor should always be called with keyword arguments. Arguments are:
group should be None; reserved for future extension when a ThreadGroup class is implemented.
target is the callable object to be invoked by the run() method. Defaults to None, meaning nothing is called.
name is the thread name. By default, a unique name is constructed of the form “Thread-N” where N is a small decimal number.
args is a list or tuple of arguments for the target invocation. Defaults to ().
kwargs is a dictionary of keyword arguments for the target invocation. Defaults to {}.
If a subclass overrides the constructor, it must make sure to invoke the base class constructor (Thread.__init__()) before doing anything else to the thread.
- _build_cache() None View on GitHub
- _do_watch(resource_version: str) None View on GitHub
- static _finish_response(response: HTTPResponse) None View on GitHub
- _list() K8sObject View on GitHub
- _watch(resource_version: str) HTTPResponse View on GitHub
- copy() Dict[str, K8sObject] View on GitHub
- get(name: str) K8sObject | None View on GitHub
- is_ready() bool View on GitHub
Must be called only when holding the lock on _condition
- kill_stream() None View on GitHub
- run() None View on GitHub
Method representing the thread’s activity.
You may override this method in a subclass. The standard run() method invokes the callable object passed to the object’s constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.
- patroni.dcs.kubernetes._cleanup_temp_files() None View on GitHub
- patroni.dcs.kubernetes._create_temp_file(content: bytes) str View on GitHub
- patroni.dcs.kubernetes._run_and_handle_exceptions(method: Callable[[...], Any], *args: Any, **kwargs: Any) Any View on GitHub
- patroni.dcs.kubernetes.catch_kubernetes_errors(func: Callable[[...], Any]) Callable[[...], Any] View on GitHub
- patroni.dcs.kubernetes.to_camel_case(value: str) str View on GitHub
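A minimal sketch of what a snake_case-to-camelCase helper like to_camel_case() does for Kubernetes-style field names. This is illustrative; Patroni's actual implementation may differ.

```python
def to_camel_case(value: str) -> str:
    """Convert a snake_case identifier to camelCase, e.g. for K8s fields."""
    first, *rest = value.split('_')
    return first + ''.join(word[:1].upper() + word[1:] for word in rest)

# 'resource_version' becomes the K8s-style field name 'resourceVersion'
```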
patroni.dcs.raft module
- class patroni.dcs.raft.DynMemberSyncObj(selfAddress: str | None, partnerAddrs: Collection[str], conf: SyncObjConf, retry_timeout: int = 10)View on GitHub
Bases:
SyncObj
- __init__(selfAddress: str | None, partnerAddrs: Collection[str], conf: SyncObjConf, retry_timeout: int = 10) None View on GitHub
Main SyncObj class, you should inherit your own class from it.
- Parameters:
selfNode (Node or str) – object representing the self-node or address of the current node server ‘host:port’
otherNodes (iterable of Node or iterable of str) – objects representing the other nodes or addresses of partner nodes [‘host1:port1’, ‘host2:port2’, …]
conf (SyncObjConf) – configuration object
consumers (list of SyncObjConsumer inherited objects) – objects to be replicated
nodeClass (class) – class used for representation of nodes
transport (Transport or None) – transport object; if None, transportClass is used to initialise such an object
transportClass (class) – the Transport subclass to be used for transferring messages to and from other nodes
- _onTick(timeToWait: float = 0.0)View on GitHub
- class patroni.dcs.raft.KVStoreTTL(on_ready: Callable[[...], Any] | None, on_set: Callable[[str, Dict[str, Any]], None] | None, on_delete: Callable[[str], None] | None, **config: Any)View on GitHub
Bases:
DynMemberSyncObj
- __init__(on_ready: Callable[[...], Any] | None, on_set: Callable[[str, Dict[str, Any]], None] | None, on_delete: Callable[[str], None] | None, **config: Any) None View on GitHub
Main SyncObj class, you should inherit your own class from it.
- Parameters:
selfNode (Node or str) – object representing the self-node or address of the current node server ‘host:port’
otherNodes (iterable of Node or iterable of str) – objects representing the other nodes or addresses of partner nodes [‘host1:port1’, ‘host2:port2’, …]
conf (SyncObjConf) – configuration object
consumers (list of SyncObjConsumer inherited objects) – objects to be replicated
nodeClass (class) – class used for representation of nodes
transport (Transport or None) – transport object; if None, transportClass is used to initialise such an object
transportClass (class) – the Transport subclass to be used for transferring messages to and from other nodes
- _autoTickThread() None View on GitHub
- _expire(key: str, value: Dict[str, Any], callback: Callable[[...], Any] | None = None) None View on GitHub
- _expire_v0(key: str, value: Dict[str, Any], callback: Callable[[...], Any] | None = None) None View on GitHub
- _onTick(timeToWait: float = 0.0) None View on GitHub
- destroy() None View on GitHub
Correctly destroy SyncObj. Stop autoTickThread, close connections, etc.
- get(key: str, recursive: bool = False) None | Dict[str, Any] | Dict[str, Dict[str, Any]] View on GitHub
- set(key: str, value: str, ttl: int | None = None, handle_raft_error: bool = True, **kwargs: Any) bool | Dict[str, Any] View on GitHub
- set_retry_timeout(retry_timeout: int) None View on GitHub
- startAutoTick() None View on GitHub
- class patroni.dcs.raft.Raft(config: Dict[str, Any])View on GitHub
Bases:
AbstractDCS
- __init__(config: Dict[str, Any]) None View on GitHub
Prepare DCS paths, Citus group ID, initial values for state information and processing dependencies.
- Variables:
config – dict, reference to the config section of the selected DCS, i.e. zookeeper for ZooKeeper, etcd for etcd, etc.
- _abc_impl = <_abc._abc_data object>
- _citus_cluster_loader(path: str) Dict[int, Cluster] View on GitHub
Load and build all Patroni clusters from a single Citus cluster.
- Parameters:
path – the path in DCS where to load Cluster(s) from.
- Returns:
all Citus groups as dict, with group IDs as keys and Cluster objects as values, or a Cluster object representing the coordinator with a filled Cluster.workers attribute.
- _cluster_loader(path: str) Cluster View on GitHub
Load and build the Cluster object from DCS, which represents a single Patroni or Citus cluster.
- Parameters:
path – the path in DCS where to load Cluster(s) from.
- Returns:
Cluster instance.
- _delete_leader(leader: Leader) bool View on GitHub
Remove the leader key from DCS.
This method should remove the leader key if the current instance is the leader.
- Parameters:
leader – Leader object with information about the leader.
- Returns:
True if successfully committed to DCS.
- _load_cluster(path: str, loader: Callable[[str], Cluster | Dict[int, Cluster]]) Cluster | Dict[int, Cluster] View on GitHub
Main abstract method that implements the loading of a Cluster instance.
Note
Internally this method should call the loader method that builds the Cluster object representing the current state and topology of the cluster in DCS. This method is supposed to be called only by the get_cluster() method.
- Parameters:
path – the path in DCS where to load Cluster(s) from.
loader – one of _cluster_loader() or _citus_cluster_loader().
- Raise:
DCSError in case of communication problems with DCS. If the current node was running as a primary and the exception is raised, the instance will be demoted.
- _on_delete(key: str) None View on GitHub
- _update_leader(leader: Leader) bool View on GitHub
Update the leader key (or session) ttl.
Note
You have to use a CAS (Compare And Swap) operation in order to update the leader key; for example, for etcd the prevValue parameter must be used.
If the update fails because DCS is not accessible or cannot process requests (hopefully temporarily), the DCSError exception should be raised.
- Parameters:
leader – a reference to the current leader object.
- Returns:
True if the leader key (or session) has been updated successfully.
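The CAS requirement described in the note can be sketched with a toy store. This is illustrative only; with etcd it corresponds to the prevValue parameter, and with Kubernetes to a resource_version-conditional update.

```python
import threading

class ToyDCS:
    """In-memory stand-in showing a compare-and-swap update of a key."""
    def __init__(self):
        self._lock = threading.Lock()
        self._keys = {}

    def cas(self, key, prev_value, new_value):
        """Update the key only if its current value equals prev_value."""
        with self._lock:
            if self._keys.get(key) != prev_value:
                return False  # somebody else owns (or took over) the key
            self._keys[key] = new_value
            return True

dcs = ToyDCS()
dcs._keys['/leader'] = 'node1'
assert dcs.cas('/leader', 'node1', 'node1') is True   # leader refreshes its key
assert dcs.cas('/leader', 'node2', 'node2') is False  # non-owner cannot update
```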
- _write_failsafe(value: str) bool View on GitHub
Write the current cluster topology to DCS, to be used by the failsafe mechanism (if enabled).
- Parameters:
value – failsafe topology serialized in JSON format.
- Returns:
True
if successfully committed to DCS.
- _write_leader_optime(last_lsn: str) bool View on GitHub
Write current WAL LSN into the /optime/leader key in DCS.
- Parameters:
last_lsn – absolute WAL LSN in bytes.
- Returns:
True if successfully committed to DCS.
- _write_status(value: str) bool View on GitHub
Write current WAL LSN and confirmed_flush_lsn of permanent slots into the /status key in DCS.
- Parameters:
value – status serialized in JSON format.
- Returns:
True if successfully committed to DCS.
- attempt_to_acquire_leader() bool View on GitHub
Attempt to acquire leader lock.
Note
This method should create the /leader key with the value _name.
The key must be created atomically. If the key already exists it must not be overwritten and False must be returned.
If key creation fails because DCS is not accessible or cannot process requests (hopefully temporarily), the DCSError exception should be raised.
- Returns:
True if the key has been created successfully.
- cancel_initialization() bool View on GitHub
Removes the initialize key for a cluster.
- Returns:
True if successfully committed to DCS.
- delete_cluster() bool View on GitHub
Delete cluster from DCS.
- Returns:
True
if successfully committed to DCS.
- delete_sync_state(version: int | None = None) bool View on GitHub
Delete the synchronous state from DCS.
- Parameters:
version – for conditional deletion of the key/object.
- Returns:
True
if delete successful.
- initialize(create_new: bool = True, sysid: str = '') bool View on GitHub
Race for cluster initialization.
This method should atomically create the initialize key and return True, otherwise it should return False.
- Parameters:
create_new – False if the key should already exist (in the case we are setting the system_id).
sysid – PostgreSQL cluster system identifier; if specified, it is written to the key.
- Returns:
True if the key has been created successfully.
- reload_config(config: Config | Dict[str, Any]) None View on GitHub
Load and set relevant values from configuration.
Sets loop_wait, ttl and retry_timeout properties.
- Parameters:
config – loaded configuration information object or dictionary of key-value pairs.
- set_config_value(value: str, version: int | None = None) bool View on GitHub
Create or update the /config key in DCS.
- Parameters:
value – new value to set in the config key.
version – for conditional update of the key/object.
- Returns:
True if successfully committed to DCS.
- set_failover_value(value: str, version: int | None = None) bool View on GitHub
Create or update the /failover key.
- Parameters:
value – value to set.
version – for conditional update of the key/object.
- Returns:
True if successfully committed to DCS.
- set_history_value(value: str) bool View on GitHub
Set value for history in DCS.
- Parameters:
value – new value of the history key/object.
- Returns:
True if successfully committed to DCS.
- set_retry_timeout(retry_timeout: int) None View on GitHub
Set the new value for retry_timeout.
- set_sync_state_value(value: str, version: int | None = None) int | bool View on GitHub
Set synchronous state in DCS.
- Parameters:
value – the new value of the /sync key.
version – for conditional update of the key/object.
- Returns:
version of the new object, or False in case of error.
- set_ttl(ttl: int) bool | None View on GitHub
Set the new ttl value for DCS keys.
- take_leader() bool View on GitHub
Establish a new leader in DCS.
Note
This method should create the leader key with the value of _name and a ttl of ttl.
Since it can only be called on initial cluster bootstrap, it may create this key regardless, overwriting the key if necessary.
- Returns:
True if successfully committed to DCS.
- touch_member(data: Dict[str, Any]) bool View on GitHub
Update member key in DCS.
Note
This method should create or update the key with the name /members/ + _name and the value of data in the given DCS.
- Parameters:
data – information about an instance (including connection strings).
- Returns:
True if successfully committed to DCS.
- watch(leader_version: int | None, timeout: float) bool View on GitHub
Sleep if the current node is a leader, otherwise watch for changes of the leader key with a given timeout.
- Parameters:
leader_version – version of the leader key.
timeout – timeout in seconds.
- Returns:
if True, the next run of the HA cycle will be rescheduled.
- exception patroni.dcs.raft.RaftError(value: Any)View on GitHub
Bases:
DCSError
- class patroni.dcs.raft.SyncObjUtility(otherNodes: Collection[str | TCPNode], conf: SyncObjConf, retry_timeout: int = 10)View on GitHub
Bases:
object
- __init__(otherNodes: Collection[str | TCPNode], conf: SyncObjConf, retry_timeout: int = 10) None View on GitHub
- executeCommand(command: List[Any]) Any View on GitHub
- getMembers() List[str] | None View on GitHub
- class patroni.dcs.raft._TCPTransport(syncObj: DynMemberSyncObj, selfNode: TCPNode | None, otherNodes: Collection[TCPNode])View on GitHub
Bases:
TCPTransport
- __init__(syncObj: DynMemberSyncObj, selfNode: TCPNode | None, otherNodes: Collection[TCPNode]) None View on GitHub
Initialise the TCP transport. On normal (non-read-only) nodes, this will start a TCP server. On all nodes, it will initiate relevant connections to other nodes.
- Parameters:
syncObj (SyncObj) – SyncObj
selfNode (TCPNode or None) – current node (None if this is a read-only node)
otherNodes (iterable of TCPNode) – partner nodes
- _connectIfNecessarySingle(node: TCPNode) bool View on GitHub
Connect to a node if necessary.
- Parameters:
node (Node) – node to connect to
- patroni.dcs.raft.resolve_host(self: TCPNode) str | None View on GitHub
patroni.dcs.zookeeper module
- class patroni.dcs.zookeeper.PatroniKazooClient(hosts='127.0.0.1:2181', timeout=10.0, client_id=None, handler=None, default_acl=None, auth_data=None, sasl_options=None, read_only=None, randomize_hosts=True, connection_retry=None, command_retry=None, logger=None, keyfile=None, keyfile_password=None, certfile=None, ca=None, use_ssl=False, verify_certs=True, **kwargs)View on GitHub
Bases:
KazooClient
- class patroni.dcs.zookeeper.PatroniSequentialThreadingHandler(connect_timeout: int | float)View on GitHub
Bases:
SequentialThreadingHandler
- __init__(connect_timeout: int | float) None View on GitHub
Create a SequentialThreadingHandler instance.
- create_connection(*args: Any, **kwargs: Any) socket View on GitHub
- This method tries to establish a connection with one of the ZooKeeper nodes.
Somehow the strategy “fail earlier and retry more often” works much better than the original strategy “try to connect with the specified timeout”. Since we want to try connecting to ZooKeeper more often (with a smaller connect_timeout), we have to override the create_connection method in the SequentialThreadingHandler class (which is used by kazoo.Client).
- Parameters:
args – always contains tuple(host, port) as the first element and could contain connect_timeout (negotiated session timeout) as the second element.
- select(*args: Any, **kwargs: Any) Any View on GitHub
Python 3.XY may raise the following exceptions if select/poll are called with an invalid socket:
ValueError: because fd == -1
TypeError: Invalid file descriptor: -1 (starting from kazoo 2.9)
Python 2.7 may raise IOError instead of socket.error (starting from kazoo 2.9).
When it is appropriate, we map these exceptions to socket.error.
- set_connect_timeout(connect_timeout: int | float) None View on GitHub
- class patroni.dcs.zookeeper.ZooKeeper(config: Dict[str, Any])View on GitHub
Bases:
AbstractDCS
- __init__(config: Dict[str, Any]) None View on GitHub
Prepare DCS paths, Citus group ID, initial values for state information and processing dependencies.
- Variables:
config – dict, reference to the config section of the selected DCS, i.e. zookeeper for ZooKeeper, etcd for etcd, etc.
- _abc_impl = <_abc._abc_data object>
- _cancel_initialization() None View on GitHub
- _citus_cluster_loader(path: str) Dict[int, Cluster] View on GitHub
Load and build all Patroni clusters from a single Citus cluster.
- Parameters:
path – the path in DCS where to load Cluster(s) from.
- Returns:
all Citus groups as dict, with group IDs as keys and Cluster objects as values, or a Cluster object representing the coordinator with a filled Cluster.workers attribute.
- _cluster_loader(path: str) Cluster View on GitHub
Load and build the Cluster object from DCS, which represents a single Patroni or Citus cluster.
- Parameters:
path – the path in DCS where to load Cluster(s) from.
- Returns:
Cluster instance.
- _delete_leader(leader: Leader) bool View on GitHub
Remove the leader key from DCS.
This method should remove the leader key if the current instance is the leader.
- Parameters:
leader – Leader object with information about the leader.
- Returns:
True if successfully committed to DCS.
- _kazoo_connect(*args: Any) Tuple[int | float, int | float] View on GitHub
Kazoo uses pings to determine the health of the connection to ZooKeeper. If there is no response to a ping within the ping interval (1/2 of read_timeout), it considers the current connection dead and tries to connect to another node. Without this “magic” it was taking up to 2/3 of the session timeout (ttl) to figure out that the connection was dead, which left only a small window for reconnect and retry.
This method is needed to return a different value of read_timeout, calculated not from the negotiated session timeout but from the value of loop_wait. It is 2 seconds smaller than loop_wait, because we can spend up to 2 seconds calling touch_member() and write_leader_optime(), which may also hang.
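The relationship described above can be sketched as follows. This is a simplified illustration with assumed function names; the 2-second reserve and the half-of-read_timeout ping interval come from the docstring.

```python
def derive_read_timeout(loop_wait: float) -> float:
    """Derive kazoo's read_timeout from loop_wait, reserving ~2 seconds
    for potentially slow touch_member()/write_leader_optime() calls."""
    return loop_wait - 2

def ping_interval(read_timeout: float) -> float:
    """Kazoo pings after half of read_timeout without a response."""
    return read_timeout / 2
```

With loop_wait = 10, the derived read_timeout is 8 seconds and a dead connection is suspected after 4 seconds of silence.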
- _load_cluster(path: str, loader: Callable[[str], Cluster | Dict[int, Cluster]]) Cluster | Dict[int, Cluster] View on GitHub
Main abstract method that implements the loading of a Cluster instance.
Note
Internally this method should call the loader method that builds the Cluster object representing the current state and topology of the cluster in DCS. This method is supposed to be called only by the get_cluster() method.
- Parameters:
path – the path in DCS where to load Cluster(s) from.
loader – one of _cluster_loader() or _citus_cluster_loader().
- Raise:
DCSError in case of communication problems with DCS. If the current node was running as a primary and the exception is raised, the instance will be demoted.
- _set_or_create(key: str, value: str, version: int | None = None, retry: bool = False, do_not_create_empty: bool = False) int | bool View on GitHub
- _update_leader(leader: Leader) bool View on GitHub
Update the leader key (or session) ttl.
Note
You have to use a CAS (Compare And Swap) operation in order to update the leader key; for example, for etcd the prevValue parameter must be used.
If the update fails because DCS is not accessible or cannot process requests (hopefully temporarily), the DCSError exception should be raised.
- Parameters:
leader – a reference to the current leader object.
- Returns:
True if the leader key (or session) has been updated successfully.
- _watcher(event: WatchedEvent) None View on GitHub
- _write_failsafe(value: str) bool View on GitHub
Write the current cluster topology to DCS, to be used by the failsafe mechanism (if enabled).
- Parameters:
value – failsafe topology serialized in JSON format.
- Returns:
True
if successfully committed to DCS.
- _write_leader_optime(last_lsn: str) bool View on GitHub
Write current WAL LSN into the /optime/leader key in DCS.
- Parameters:
last_lsn – absolute WAL LSN in bytes.
- Returns:
True
if successfully committed to DCS.
- _write_status(value: str) bool View on GitHub
Write current WAL LSN and confirmed_flush_lsn of permanent slots into the /status key in DCS.
- Parameters:
value – status serialized in JSON format.
- Returns:
True
if successfully committed to DCS.
- attempt_to_acquire_leader() bool View on GitHub
Attempt to acquire leader lock.
Note
This method should create the /leader key with the value of _name.
The key must be created atomically. If the key already exists it should not be overwritten and False must be returned.
If key creation fails because the DCS is not accessible or is unable to process requests (hopefully temporarily), the DCSError exception should be raised.
- Returns:
True
if key has been created successfully.
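The atomic create-if-not-exists semantics required here can be sketched with a toy in-memory store (illustrative only; a real DCS enforces this server-side):

```python
class ToyDCS:
    """Toy store illustrating the atomic create semantics that
    attempt_to_acquire_leader() implementations must provide."""
    def __init__(self):
        self._kv = {}

    def create(self, key: str, value: str) -> bool:
        # Atomic create: an existing key is never overwritten.
        if key in self._kv:
            return False
        self._kv[key] = value
        return True

dcs = ToyDCS()
assert dcs.create('/leader', 'node1') is True   # first contender wins the race
assert dcs.create('/leader', 'node2') is False  # everyone else must get False
```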
- cancel_initialization() bool View on GitHub
Remove the initialize key for a cluster.
- Returns:
True
if successfully committed to DCS.
- delete_cluster() bool View on GitHub
Delete cluster from DCS.
- Returns:
True
if successfully committed to DCS.
- delete_sync_state(version: int | None = None) bool View on GitHub
Delete the synchronous state from DCS.
- Parameters:
version – for conditional deletion of the key/object.
- Returns:
True
if delete successful.
- get_children(key: str) List[str] View on GitHub
- get_node(key: str, watch: Callable[[WatchedEvent], None] | None = None) Tuple[str, ZnodeStat] | None View on GitHub
- initialize(create_new: bool = True, sysid: str = '') bool View on GitHub
Race for cluster initialization.
This method should atomically create the initialize key and return True, otherwise it should return False.
- Parameters:
create_new – False if the key should already exist (in the case we are setting the system_id).
sysid – PostgreSQL cluster system identifier; if specified, it is written to the key.
- Returns:
True
if key has been created successfully.
- load_members(path: str) List[Member] View on GitHub
- static member(name: str, value: str, znode: ZnodeStat) Member View on GitHub
- reload_config(config: Config | Dict[str, Any]) None View on GitHub
Load and set relevant values from configuration.
Sets the loop_wait, ttl and retry_timeout properties.
- Parameters:
config – Loaded configuration information object or dictionary of key value pairs.
- set_config_value(value: str, version: int | None = None) bool View on GitHub
Create or update the /config key in DCS.
- Parameters:
value – new value to set in the config key.
version – for conditional update of the key/object.
- Returns:
True
if successfully committed to DCS.
- set_failover_value(value: str, version: int | None = None) bool View on GitHub
Create or update the /failover key.
- Parameters:
value – value to set.
version – for conditional update of the key/object.
- Returns:
True
if successfully committed to DCS.
- set_history_value(value: str) bool View on GitHub
Set value for the history key in DCS.
- Parameters:
value – new value of the history key/object.
- Returns:
True
if successfully committed to DCS.
- set_retry_timeout(retry_timeout: int) None View on GitHub
Set the new value for retry_timeout.
- set_sync_state_value(value: str, version: int | None = None) int | bool View on GitHub
Set synchronous state in DCS.
- Parameters:
value – the new value of the /sync key.
version – for conditional update of the key/object.
- Returns:
version of the new object or
False
in case of error.
- set_ttl(ttl: int) bool | None View on GitHub
It is not possible to change the ttl (session_timeout) in ZooKeeper without destroying the old session and creating a new one. This method returns True if session_timeout has been changed (i.e. restart() has been called).
- take_leader() bool View on GitHub
Establish a new leader in DCS.
Note
This method should create the leader key with the value of _name and a ttl of ttl.
Since it is only called on initial cluster bootstrap, it may create this key regardless, overwriting the key if necessary.
- Returns:
True
if successfully committed to DCS.
- touch_member(data: Dict[str, Any]) bool View on GitHub
Update member key in DCS.
Note
This method should create or update the key with the name /members/ + _name and the value of data in the given DCS.
- Parameters:
data – information about an instance (including connection strings).
- Returns:
True
if successfully committed to DCS.
- watch(leader_version: int | None, timeout: float) bool View on GitHub
Sleep if the current node is a leader, otherwise, watch for changes of leader key with a given timeout.
- Parameters:
leader_version – version of a leader key.
timeout – timeout in seconds.
- Returns:
if
True
this will reschedule the next run of the HA cycle.
- exception patroni.dcs.zookeeper.ZooKeeperError(value: Any)View on GitHub
Bases:
DCSError
Module contents
Abstract classes for Distributed Configuration Store.
- class patroni.dcs.AbstractDCS(config: Dict[str, Any])View on GitHub
Bases:
ABC
Abstract representation of DCS modules.
Implementations of a concrete DCS class, using appropriate backend client interfaces, must include the following methods and properties.
Functional methods that are critical in their timing, required to complete within the retry_timeout period in order to prevent the DCS being considered inaccessible, each performing construction of complex data objects:
_cluster_loader(): method which processes the structure of data stored in the DCS, used to build the Cluster object with all relevant associated data.
_citus_cluster_loader(): similar to the above, but specifically representing Citus group and workers information.
_load_cluster(): main method for calling the specific loader method to build the Cluster object representing the state and topology of the cluster.
Functional methods that are critical in their timing and must be written with ACID transaction properties in mind:
attempt_to_acquire_leader(): method used in the leader race to attempt to acquire the leader lock by creating the leader key in the DCS, if it does not exist.
_update_leader(): method to update the leader key in DCS. Relies on Compare-And-Set to ensure the Primary lock key is updated. If this fails to update within the retry_timeout window, the Primary will be demoted.
Functional method that relies on Compare-And-Create to ensure only one member creates the relevant key:
initialize(): method used in the race for cluster initialization which creates the initialize key in the DCS.
DCS backend getter and setter methods and properties:
take_leader(): method to create a new leader key in the DCS.
set_ttl(): method for setting the TTL value in DCS.
ttl(): property which returns the current TTL.
set_retry_timeout(): method for setting retry_timeout in the DCS backend.
_write_leader_optime(): compatibility method to write WAL LSN to DCS.
_write_status(): method to write WAL LSN for slots to the DCS.
_write_failsafe(): method to write cluster topology to the DCS, used by the failsafe mechanism.
touch_member(): method to update the individual member key in the DCS.
set_history_value(): method to set the history key in the DCS.
DCS setter methods using Compare-And-Set which although important are less critical if they fail, attempts can be retried or may result in warning log messages:
set_failover_value(): method to create and/or update the failover key in the DCS.
set_config_value(): method to create and/or update the config key in the DCS.
set_sync_state_value(): method to set the synchronous state sync key in the DCS.
DCS data and key removal methods:
delete_sync_state(): method to remove the synchronous state sync key from the DCS.
delete_cluster(): method which will remove cluster information from the DCS. Used only from patronictl.
_delete_leader(): method that relies on CAS, used by a member that is the current leader, to remove the leader key from the DCS.
cancel_initialization(): method to remove the initialize key for the cluster from the DCS.
If either of the sync_state set or delete methods fails, although not critical, this may result in “Synchronous replication key updated by someone else” messages being logged.
Care should be taken to consult each abstract method for any additional information and requirements, such as expected exceptions that should be raised in certain conditions, and the object types for arguments and return values of methods and properties.
- _CONFIG = 'config'
- _FAILOVER = 'failover'
- _FAILSAFE = 'failsafe'
- _HISTORY = 'history'
- _INITIALIZE = 'initialize'
- _LEADER = 'leader'
- _LEADER_OPTIME = 'optime/leader'
- _MEMBERS = 'members/'
- _OPTIME = 'optime'
- _STATUS = 'status'
- _SYNC = 'sync'
- __get_patroni_cluster(path: str | None = None) Cluster
Low-level method to load a Cluster object from DCS.
- Parameters:
path – optional client path in the DCS backend to load from.
- Returns:
a loaded Cluster instance.
- __init__(config: Dict[str, Any]) None View on GitHub
Prepare DCS paths, Citus group ID, initial values for state information and processing dependencies.
- Variables:
config – dict, reference to the config section of the selected DCS, i.e. zookeeper for ZooKeeper, etcd for etcd, etc.
- _abc_impl = <_abc._abc_data object>
- abstract _citus_cluster_loader(path: Any) Cluster | Dict[int, Cluster] View on GitHub
Load and build all Patroni clusters from a single Citus cluster.
- abstract _cluster_loader(path: Any) Cluster View on GitHub
Load and build the Cluster object from DCS, which represents a single Patroni or Citus cluster.
- Parameters:
path – the path in DCS where to load Cluster(s) from.
- Returns:
Cluster instance.
- abstract _delete_leader(leader: Leader) bool View on GitHub
Remove leader key from DCS.
This method should remove leader key if current instance is the leader.
- Parameters:
leader – Leader object with information about the leader.
- Returns:
True
if successfully committed to DCS.
- _get_citus_cluster() Cluster View on GitHub
Load Citus cluster from DCS.
- Returns:
A Citus
Cluster
instance for the coordinator with workers clusters in the Cluster.workers dict.
- abstract _load_cluster(path: str, loader: Callable[[Any], Cluster | Dict[int, Cluster]]) Cluster | Dict[int, Cluster] View on GitHub
Main abstract method that implements the loading of a Cluster instance.
Note
Internally this method should call the loader method that builds the Cluster object representing the current state and topology of the cluster in DCS. This method is supposed to be called only by the get_cluster() method.
- Parameters:
path – the path in DCS where to load Cluster(s) from.
loader – one of _cluster_loader() or _citus_cluster_loader().
- Raise:
DCSError in case of communication problems with DCS. If the current node was running as a primary when the exception was raised, the instance will be demoted.
- _set_loop_wait(loop_wait: int) None View on GitHub
Set new loop_wait value.
- Parameters:
loop_wait – value to set.
- abstract _update_leader(leader: Leader) bool View on GitHub
Update the leader key (or session) ttl.
Note
You have to use a CAS (Compare And Swap) operation in order to update the leader key; for example, for etcd the prevValue parameter must be used.
If the update fails because the DCS is not accessible or is unable to process requests (hopefully temporarily), the DCSError exception should be raised.
- Parameters:
leader – a reference to the current leader object.
- Returns:
True if the leader key (or session) has been updated successfully.
- abstract _write_failsafe(value: str) bool View on GitHub
Write current cluster topology to DCS that will be used by failsafe mechanism (if enabled).
- Parameters:
value – failsafe topology serialized in JSON format.
- Returns:
True
if successfully committed to DCS.
- abstract _write_leader_optime(last_lsn: str) bool View on GitHub
Write current WAL LSN into the /optime/leader key in DCS.
- Parameters:
last_lsn – absolute WAL LSN in bytes.
- Returns:
True
if successfully committed to DCS.
- abstract _write_status(value: str) bool View on GitHub
Write current WAL LSN and confirmed_flush_lsn of permanent slots into the /status key in DCS.
- Parameters:
value – status serialized in JSON format.
- Returns:
True
if successfully committed to DCS.
- abstract attempt_to_acquire_leader() bool View on GitHub
Attempt to acquire leader lock.
Note
This method should create the /leader key with the value of _name.
The key must be created atomically. If the key already exists it should not be overwritten and False must be returned.
If key creation fails because the DCS is not accessible or is unable to process requests (hopefully temporarily), the DCSError exception should be raised.
- Returns:
True
if key has been created successfully.
- abstract cancel_initialization() bool View on GitHub
Remove the initialize key for a cluster.
- Returns:
True
if successfully committed to DCS.
- client_path(path: str) str View on GitHub
Construct the absolute key name from appropriate parts for the DCS type.
- Parameters:
path – The key name within the current Patroni cluster.
- Returns:
absolute key name for the current Patroni cluster.
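A hypothetical sketch of such path construction — the namespace/scope layout shown here is an assumption for illustration, not Patroni's exact scheme:

```python
def client_path(namespace: str, scope: str, key: str) -> str:
    # Compose an absolute key name: /<namespace>/<scope>/<key>.
    # namespace and scope are assumed config values, key is the
    # relative name within the current Patroni cluster.
    return '/'.join(['', namespace.strip('/'), scope, key.lstrip('/')])

assert client_path('service', 'batman', 'leader') == '/service/batman/leader'
```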
- abstract delete_cluster() bool View on GitHub
Delete cluster from DCS.
- Returns:
True
if successfully committed to DCS.
- delete_leader(leader: Leader | None, last_lsn: int | None = None) bool View on GitHub
Update optime/leader and voluntarily remove the leader key from DCS.
This method should remove the leader key if the current instance is the leader.
- Parameters:
leader – Leader object with information about the leader.
last_lsn – latest checkpoint location in bytes.
- Returns:
boolean result of the called abstract _delete_leader().
- abstract delete_sync_state(version: Any | None = None) bool View on GitHub
Delete the synchronous state from DCS.
- Parameters:
version – for conditional deletion of the key/object.
- Returns:
True
if delete successful.
- get_citus_coordinator() Cluster | None View on GitHub
Load the Patroni cluster for the Citus Coordinator.
Note
This method is only executed on the worker nodes (group != 0) to find the coordinator.
- Returns:
the selected Cluster instance associated with the Citus Coordinator group ID.
- get_cluster() Cluster View on GitHub
Retrieve a fresh view of DCS.
Note
Stores copy of time, status and failsafe values for comparison in DCS update decisions. Caching is required to avoid overhead placed upon the REST API.
Returns either a Citus or Patroni implementation of Cluster depending on availability.
- Returns:
- abstract initialize(create_new: bool = True, sysid: str = '') bool View on GitHub
Race for cluster initialization.
This method should atomically create the initialize key and return True, otherwise it should return False.
- Parameters:
create_new – False if the key should already exist (in the case we are setting the system_id).
sysid – PostgreSQL cluster system identifier; if specified, it is written to the key.
- Returns:
True
if key has been created successfully.
- is_citus_coordinator() bool View on GitHub
Check whether this Cluster instance has the Citus Coordinator group ID.
- Returns:
True if the given node is running as the Citus Coordinator (group=0).
- property leader_optime_path: str
Get the client path for optime/leader (legacy key, superseded by status).
- manual_failover(leader: str | None, candidate: str | None, scheduled_at: datetime | None = None, version: Any | None = None) bool View on GitHub
Prepare a dictionary with the given values and set the /failover key in DCS.
- Parameters:
leader – value to set for leader.
candidate – value to set for member.
scheduled_at – value converted to ISO date format for scheduled_at.
version – for conditional update of the key/object.
- Returns:
True
if successfully committed to DCS.
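A sketch of how such a failover dictionary might be assembled before being written to the /failover key — the key names follow the parameters above, but the real serialization in Patroni may differ:

```python
import json
from datetime import datetime, timezone

def failover_value(leader=None, candidate=None, scheduled_at=None):
    # The candidate is stored under the "member" key, mirroring the
    # parameter descriptions above; empty fields are dropped.
    data = {'leader': leader,
            'member': candidate,
            'scheduled_at': scheduled_at.isoformat() if scheduled_at else None}
    return json.dumps({k: v for k, v in data.items() if v})

value = failover_value(leader='node1', candidate='node2')
assert json.loads(value) == {'leader': 'node1', 'member': 'node2'}

scheduled = failover_value(
    candidate='node2',
    scheduled_at=datetime(2016, 1, 14, 10, 9, 57, tzinfo=timezone.utc))
assert '2016-01-14T10:09:57+00:00' in scheduled
```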
- reload_config(config: Config | Dict[str, Any]) None View on GitHub
Load and set relevant values from configuration.
Sets the loop_wait, ttl and retry_timeout properties.
- Parameters:
config – Loaded configuration information object or dictionary of key value pairs.
- reset_cluster() None View on GitHub
Clear cached state of DCS.
- abstract set_config_value(value: str, version: Any | None = None) bool View on GitHub
Create or update the /config key in DCS.
- Parameters:
value – new value to set in the config key.
version – for conditional update of the key/object.
- Returns:
True
if successfully committed to DCS.
- abstract set_failover_value(value: str, version: Any | None = None) bool View on GitHub
Create or update the /failover key.
- Parameters:
value – value to set.
version – for conditional update of the key/object.
- Returns:
True
if successfully committed to DCS.
- abstract set_history_value(value: str) bool View on GitHub
Set value for the history key in DCS.
- Parameters:
value – new value of the history key/object.
- Returns:
True
if successfully committed to DCS.
- abstract set_retry_timeout(retry_timeout: int) None View on GitHub
Set the new value for retry_timeout.
- abstract set_sync_state_value(value: str, version: Any | None = None) Any | bool View on GitHub
Set synchronous state in DCS.
- Parameters:
value – the new value of the /sync key.
version – for conditional update of the key/object.
- Returns:
version of the new object or
False
in case of error.
- abstract set_ttl(ttl: int) bool | None View on GitHub
Set the new ttl value for DCS keys.
- static sync_state(leader: str | None, sync_standby: Collection[str] | None) Dict[str, Any] View on GitHub
Build the sync_state dictionary.
- Parameters:
leader – name of the leader node that manages the /sync key.
sync_standby – collection of currently known synchronous standby node names.
- Returns:
dictionary that later could be serialized to JSON or saved directly to DCS.
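A plausible shape of the resulting dictionary — the exact keys and serialization used by Patroni may differ, so treat this as a sketch:

```python
def sync_state(leader, sync_standby):
    # sync_standby is stored as a sorted, comma-separated string, or None
    # when there are no synchronous standbys.
    return {'leader': leader,
            'sync_standby': ','.join(sorted(sync_standby)) if sync_standby else None}

assert sync_state('node1', {'node3', 'node2'}) == {'leader': 'node1',
                                                   'sync_standby': 'node2,node3'}
assert sync_state('node1', None)['sync_standby'] is None
```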
- abstract take_leader() bool View on GitHub
Establish a new leader in DCS.
Note
This method should create the leader key with the value of _name and a ttl of ttl.
Since it is only called on initial cluster bootstrap, it may create this key regardless, overwriting the key if necessary.
- Returns:
True
if successfully committed to DCS.
- abstract touch_member(data: Dict[str, Any]) bool View on GitHub
Update member key in DCS.
Note
This method should create or update the key with the name /members/ + _name and the value of data in the given DCS.
- Parameters:
data – information about an instance (including connection strings).
- Returns:
True
if successfully committed to DCS.
- update_leader(leader: Leader, last_lsn: int | None, slots: Dict[str, int] | None = None, failsafe: Dict[str, str] | None = None) bool View on GitHub
Update the leader key (or session) ttl and optime/leader.
- Parameters:
leader – Leader object with information about the leader.
last_lsn – absolute WAL LSN in bytes.
slots – dictionary with permanent slots confirmed_flush_lsn.
failsafe – if defined, dictionary passed to write_failsafe().
- Returns:
True if the leader key (or session) has been updated successfully.
- watch(leader_version: Any | None, timeout: float) bool View on GitHub
Sleep if the current node is a leader, otherwise, watch for changes of leader key with a given timeout.
- Parameters:
leader_version – version of a leader key.
timeout – timeout in seconds.
- Returns:
if
True
this will reschedule the next run of the HA cycle.
- write_failsafe(value: Dict[str, str]) None View on GitHub
Write the /failsafe key in DCS.
- Parameters:
value – dictionary value to set, consisting of the name and api_url of members.
- write_leader_optime(last_lsn: int) None View on GitHub
Write value for WAL LSN to the optime/leader key in DCS.
Note
This method abstracts away the required data structure of write_status(), so it is not needed in the caller. However, optime/leader is only written in write_status() when the cluster has members with a Patroni version old enough to require it (i.e. the old Patroni version doesn’t understand the new format).
- Parameters:
last_lsn – absolute WAL LSN in bytes.
- write_status(value: Dict[str, Any]) None View on GitHub
Write cluster status to DCS if changed.
Note
The DCS key /status was introduced in Patroni version 2.1.0. Prior to this, the position of the last known leader LSN was stored in optime/leader. This method detects members with a version older than this for backwards compatibility.
- Parameters:
value – JSON serializable dictionary with current WAL LSN and confirmed_flush_lsn of permanent slots.
- write_sync_state(leader: str | None, sync_standby: Collection[str] | None, version: Any | None = None) SyncState | None View on GitHub
Write the new synchronous state to DCS.
Calls sync_state() to build a dictionary and then calls the DCS-specific set_sync_state_value().
- Parameters:
leader – name of the leader node that manages the /sync key.
sync_standby – collection of currently known synchronous standby node names.
version – for conditional update of the key/object.
- Returns:
the new SyncState object or None.
- class patroni.dcs.Cluster(*args: Any, **kwargs: Any)View on GitHub
Bases:
Cluster
Immutable object (namedtuple) which represents PostgreSQL or Citus cluster.
Note
We are using an old-style attribute declaration here because otherwise it is not possible to override the __new__ method. Without it, the workers would by default always get the same dict object, which could be mutated.
Consists of the following fields:
- Variables:
initialize – shows whether this cluster has an initialize key stored in DCS or not.
config – global dynamic configuration, reference to a ClusterConfig object.
leader – Leader object which represents the current leader of the cluster.
status – Status object which represents the /status key.
members – list of Member objects, all PostgreSQL cluster members including the leader.
failover – reference to a Failover object.
sync – reference to a SyncState object, the last observed synchronous replication state.
history – reference to a TimelineHistory object.
failsafe – failsafe topology. A node is allowed to become the leader only if its name is found in this list.
workers – dictionary of workers of the Citus cluster, optional. Each key is an int representing the group, and the corresponding value is a Cluster instance.
- property __permanent_logical_slots: Dict[str, Any]
Dictionary of permanent
logical
replication slots.
- property __permanent_physical_slots: Dict[str, Any]
Dictionary of permanent
physical
replication slots.
- property __permanent_slots: Dict[str, Dict[str, Any] | Any]
Dictionary of permanent replication slots with their known LSN.
- _get_members_slots(my_name: str, role: str) Dict[str, Dict[str, str]] View on GitHub
Get physical replication slots configuration for members that replicate from this node.
If the replicatefrom tag is set on a member, we should not create a replication slot for it on the current primary, because that member would replicate from elsewhere. We still create the slot if the replicatefrom destination member is currently not a member of the cluster (fallback to the primary), or if the replicatefrom destination member happens to be the current primary.
Will log an error if:
Conflicting slot names between members are found.
- Parameters:
my_name – name of this node.
role – role of this node; if this is a primary or standby_leader, return a list of members replicating from this node, otherwise return a list of members replicating as cascaded replicas from this node.
- Returns:
dictionary of physical replication slots that should exist on a given node.
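The replicatefrom decision described above reduces to a small predicate; this is a sketch of the rule as stated, not Patroni's code:

```python
def slot_needed_on_primary(replicatefrom, member_names, primary_name):
    # Create the slot on the primary unless the member is configured to
    # replicate from some other live member of the cluster.
    return (replicatefrom is None                  # replicates from the primary
            or replicatefrom == primary_name       # explicitly points at the primary
            or replicatefrom not in member_names)  # fallback: target not in cluster

assert slot_needed_on_primary(None, {'a', 'b'}, 'p') is True
assert slot_needed_on_primary('p', {'a', 'b', 'p'}, 'p') is True
assert slot_needed_on_primary('a', {'a', 'b', 'p'}, 'p') is False
```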
- _get_permanent_slots(*, is_standby_cluster: bool, role: str, nofailover: bool, major_version: int) Dict[str, Any] View on GitHub
Get configured permanent replication slots.
Note
Permanent replication slots are only considered if the use_slots configuration is enabled. A node that is not supposed to become a leader (nofailover) will not have permanent replication slots.
In a standby cluster we only support physical replication slots.
The returned dictionary for a non-standby cluster always contains permanent logical replication slots in order to show a warning if they are not supported by PostgreSQL before v11.
- Parameters:
is_standby_cluster – True if it is known that this is a standby cluster. We pass the value from the outside because we want to protect from /config key removal.
role – role of this node: primary, standby_leader or replica.
nofailover – True if this node is tagged to not be a failover candidate.
major_version – PostgreSQL major version.
- Returns:
dictionary of permanent slot names mapped to attributes.
- _has_permanent_logical_slots(my_name: str, nofailover: bool) bool View on GitHub
Check if the given member node has permanent logical replication slots configured.
- Parameters:
my_name – name of the member node to check.
nofailover – True if this node is tagged to not be a failover candidate.
- Returns:
True if any detected replication slots are logical, otherwise False.
- _merge_permanent_slots(slots: Dict[str, Dict[str, str]], permanent_slots: Dict[str, Any], my_name: str, major_version: int) List[str] View on GitHub
Merge replication slots for members with permanent_slots.
Perform validation of configured permanent slot name, skipping invalid names.
Will update slots in-line based on the type of slot, physical or logical, and the name of the node. Type is assumed to be physical if there are no attributes stored as the slot value.
- Parameters:
slots – Slot names with existing attributes if known.
my_name – name of this node.
permanent_slots – dictionary containing slot name key and slot information values.
major_version – postgresql major version.
- Returns:
List of disabled permanent, logical slot names, if postgresql version < 11.
- static empty() Cluster View on GitHub
Produce an empty
Cluster
instance.
- filter_permanent_slots(slots: Dict[str, int], is_standby_cluster: bool, major_version: int) Dict[str, int] View on GitHub
Filter out all non-permanent slots from provided slots dict.
- Parameters:
slots – slot names with LSN values
is_standby_cluster – True if it is known that this is a standby cluster. We pass the value from the outside because we want to protect from /config key removal.
major_version – PostgreSQL major version.
- Returns:
a
dict
object that contains only slots that are known to be permanent.
- get_clone_member(exclude_name: str) Member | Leader | None View on GitHub
Get member or leader object to use as clone source.
- Parameters:
exclude_name – name of the member to exclude.
- Returns:
a randomly selected candidate member from the available running members that are configured as viable sources for cloning (have the tag clonefrom in their configuration). If no member is appropriate, the current leader is used.
- get_member(member_name: str, fallback_to_leader: bool = True) Member | Leader | None View on GitHub
- get_my_slot_name_on_primary(my_name: str, replicatefrom: str | None) str View on GitHub
Canonical slot name for physical replication.
Note
P <– I <– L
In the case of cascading replication we have to check not our physical slot, but the slot of the replica that connects us to the primary.
- Parameters:
my_name – the member node name that is replicating.
replicatefrom – the intermediate member name that is configured to replicate from, for cascading replication.
- Returns:
the slot name that is in use for physical replication on this node.
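The cascading rule (P <– I <– L) can be sketched by walking the replicatefrom chain; this illustrates the idea, not the actual implementation:

```python
def slot_name_on_primary(name, replicatefrom_by_member):
    # Walk up the replicatefrom chain: the primary holds the slot of the
    # member that actually connects the chain to it (I, not L).
    rf = replicatefrom_by_member.get(name)
    if rf and rf in replicatefrom_by_member:
        return slot_name_on_primary(rf, replicatefrom_by_member)
    return name

members = {'L': 'I', 'I': None}  # L replicates from I, I from the primary
assert slot_name_on_primary('L', members) == 'I'
assert slot_name_on_primary('I', members) == 'I'
```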
- get_replication_slots(my_name: str, role: str, nofailover: bool, major_version: int, *, is_standby_cluster: bool = False, show_error: bool = False) Dict[str, Dict[str, Any]] View on GitHub
Lookup configured slot names in the DCS, report issues found and merge with permanent slots.
Will log an error if:
Any logical slots are disabled, due to version compatibility, and show_error is True.
- Parameters:
my_name – name of this node.
role – role of this node.
nofailover – True if this node is tagged to not be a failover candidate.
major_version – PostgreSQL major version.
is_standby_cluster – True if it is known that this is a standby cluster. We pass the value from the outside because we want to protect from /config key removal.
show_error – if True, report an error if any disabled logical slots or conflicting slot names are found.
- Returns:
final dictionary of slot names, after merging with permanent slots and performing sanity checks.
- has_member(member_name: str) bool View on GitHub
Check if the given member name is present in the cluster.
- Parameters:
member_name – name to look up in the members.
- Returns:
True
if the member name is found.
- has_permanent_slots(my_name: str, *, is_standby_cluster: bool = False, nofailover: bool = False, major_version: int = 110000) bool View on GitHub
Check if the given member node has permanent replication slots configured.
- Parameters:
my_name – name of the member node to check.
is_standby_cluster – True if it is known that this is a standby cluster. We pass the value from the outside because we want to protect from /config key removal.
nofailover – True if this node is tagged to not be a failover candidate.
major_version – PostgreSQL major version.
- Returns:
True
if there are permanent replication slots configured, otherwiseFalse
.
- is_empty()View on GitHub
Validate the definition of all attributes of this Cluster instance.
- Returns:
True if all attributes of the current Cluster are unpopulated.
- static is_logical_slot(value: Any | Dict[str, Any]) bool View on GitHub
Check whether provided configuration is for permanent logical replication slot.
- Parameters:
value – configuration of the permanent replication slot.
- Returns:
True
if value is a logical replication slot, otherwiseFalse
.
- static is_physical_slot(value: Any | Dict[str, Any]) bool View on GitHub
Check whether provided configuration is for permanent physical replication slot.
- Parameters:
value – configuration of the permanent replication slot.
- Returns:
True
if value is a physical replication slot, otherwiseFalse
.
- is_unlocked() bool View on GitHub
Check if the cluster does not have the leader.
- Returns:
True
if a leader name is not defined.
- property min_version: Tuple[int, ...] | None
Lowest Patroni software version found in known members of the cluster.
- should_enforce_hot_standby_feedback(my_name: str, nofailover: bool) bool View on GitHub
Determine whether hot_standby_feedback should be enabled for the given member.
The hot_standby_feedback must be enabled if the current replica has logical slots, or if it is working as a cascading replica for another node that has logical slots.
- Parameters:
my_name – name of the member node to check.
nofailover –
True
if this node is tagged to not be a failover candidate.
- Returns:
True
if this node or any member replicating from this node has permanent logical slots, otherwiseFalse
.
- property slots: Dict[str, int] | None
State of permanent replication slots on the primary in the format {"slot_name": int}.
- property timeline: int
Get the cluster history index from the history.
- Returns:
If the recorded history is empty, assume the timeline is 1. If it is not defined or the stored history is not formatted as expected, 0 is returned and an error will be logged. Otherwise, the last number stored incremented by 1 is returned.
- Example:
No history provided:
>>> Cluster(0, 0, 0, Status.empty(), 0, 0, 0, 0, None, {}).timeline
0
Empty history, assume timeline is 1:
>>> Cluster(0, 0, 0, Status.empty(), 0, 0, 0, TimelineHistory.from_node(1, '[]'), None, {}).timeline
1
Invalid history format, a string of "a", returns 0:
>>> Cluster(0, 0, 0, Status.empty(), 0, 0, 0, TimelineHistory.from_node(1, '[["a"]]'), None, {}).timeline
0
History as a list of strings:
>>> history = TimelineHistory.from_node(1, '[["3", "2", "1"]]')
>>> Cluster(0, 0, 0, Status.empty(), 0, 0, 0, history, None, {}).timeline
4
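The behaviour shown in the doctests above amounts to the following rule, distilled here as a sketch rather than Patroni's actual property code:

```python
def timeline_from_history(history_lines):
    # Empty history -> assume timeline 1; malformed history -> 0 (with an
    # error logged in the real implementation); otherwise the last recorded
    # timeline number plus one.
    if not history_lines:
        return 1
    try:
        return int(history_lines[-1][0]) + 1
    except (ValueError, TypeError, IndexError):
        return 0

assert timeline_from_history([]) == 1
assert timeline_from_history([['3', '2', '1']]) == 4
assert timeline_from_history([['a']]) == 0
```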
- class patroni.dcs.ClusterConfig(version: int | str, data: Dict[str, Any], modify_version: int | str)View on GitHub
Bases:
NamedTuple
Immutable object (namedtuple) which represents cluster configuration.
- Variables:
version – version number for the object.
data – dictionary of configuration information.
modify_version – modified version number.
- _asdict()View on GitHub
Return a new dict which maps field names to their values.
- _field_defaults = {}
- _fields = ('version', 'data', 'modify_version')
- classmethod _make(iterable)View on GitHub
Make a new ClusterConfig object from a sequence or iterable
- _replace(**kwds)View on GitHub
Return a new ClusterConfig object replacing specified fields with new values
- static from_node(version: int | str, value: str, modify_version: int | str | None = None) ClusterConfig View on GitHub
Factory method to parse value as configuration information.
- Parameters:
version – version number for object.
value – raw JSON serialized data, if not parsable replaced with an empty dictionary.
modify_version – optional modify version number, use version if not provided.
- Returns:
constructed ClusterConfig instance.
- Example:
>>> ClusterConfig.from_node(1, '{') is None
False
- property ignore_slots_matchers: List[Dict[str, Any]]
The value for ignore_slots from data if defined, or an empty list.
- class patroni.dcs.Failover(version: int | str, leader: str | None, candidate: str | None, scheduled_at: datetime | None)View on GitHub
Bases:
NamedTuple
Immutable object (namedtuple) representing configuration information required for failover/switchover capability.
- Variables:
version – version of the object.
leader – name of the leader. If value isn’t empty we treat it as a switchover from the specified node.
candidate – the name of the member node to be considered as a failover candidate.
scheduled_at – in the case of a switchover, the datetime object at which to perform the scheduled switchover.
- Example:
>>> 'Failover' in str(Failover.from_node(1, '{"leader": "cluster_leader"}'))
True
>>> 'Failover' in str(Failover.from_node(1, {"leader": "cluster_leader"}))
True
>>> 'Failover' in str(Failover.from_node(1, '{"leader": "cluster_leader", "member": "cluster_candidate"}'))
True
>>> Failover.from_node(1, 'null') is None
False
>>> n = '''{"leader": "cluster_leader", "member": "cluster_candidate",
... "scheduled_at": "2016-01-14T10:09:57.1394Z"}'''
>>> 'tzinfo=' in str(Failover.from_node(1, n))
True
>>> Failover.from_node(1, None) is None
False
>>> Failover.from_node(1, '{}') is None
False
>>> 'abc' in Failover.from_node(1, 'abc:def')
True
- _asdict()View on GitHub
Return a new dict which maps field names to their values.
- _field_defaults = {}
- _fields = ('version', 'leader', 'candidate', 'scheduled_at')
- classmethod _make(iterable)View on GitHub
Make a new Failover object from a sequence or iterable
- _replace(**kwds)View on GitHub
Return a new Failover object replacing specified fields with new values
- static from_node(version: int | str, value: str | Dict[str, str]) Failover View on GitHub
Factory method to parse value as failover configuration.
- Parameters:
version – version number for the object.
value – JSON serialized data or a dictionary of configuration. Can also be a colon (:) delimited list of leader followed by candidate name (legacy format). If the scheduled_at key is defined, its value will be parsed by dateutil.parser.parse().
- Returns:
constructed Failover information object.
- class patroni.dcs.Leader(version: int | str, session: int | float | str | None, member: Member)View on GitHub
Bases:
NamedTuple
Immutable object (namedtuple) which represents leader key.
Consists of the following fields:
- Variables:
version – modification version of a leader key in a Configuration Store.
session – either session id or just ttl in seconds.
member – reference to a Member object which represents the current leader (see Cluster.members).
- _asdict()View on GitHub
Return a new dict which maps field names to their values.
- _field_defaults = {}
- _fields = ('version', 'session', 'member')
- classmethod _make(iterable)View on GitHub
Make a new Leader object from a sequence or iterable
- _replace(**kwds)View on GitHub
Return a new Leader object replacing specified fields with new values
- property checkpoint_after_promote: bool | None
Determine whether a checkpoint has occurred for this leader after promotion.
- Returns:
True if the role is master or primary and checkpoint_after_promote is not set, False if not a master or primary or if the checkpoint hasn’t occurred. If the version of Patroni is older than 1.5.6, return None.
- Example:
>>> Leader(1, '', Member.from_node(1, '', '', '{"version":"z"}')).checkpoint_after_promote
- conn_kwargs(auth: Dict[str, str] | None = None) Dict[str, str] View on GitHub
Connection keyword arguments.
- Parameters:
auth – an optional dictionary containing authentication information.
- Returns:
the result of the called Member.conn_kwargs() method.
- class patroni.dcs.Member(version: int | str, name: str, session: int | float | str | None, data: Dict[str, Any])View on GitHub
Immutable object (namedtuple) which represents a single member of the PostgreSQL cluster.
Note
We are using an old-style attribute declaration here because otherwise it is not possible to override the __new__ method in the RemoteMember class.
Note
These two keys in data are always written to the DCS, but care is taken to maintain consistency and resilience from data that is read:
conn_url: connection string containing host, user and password which could be used to access this member.
api_url: REST API URL of the Patroni instance.
Consists of the following fields:
- Variables:
version – modification version of a given member key in a Configuration Store.
name – name of PostgreSQL cluster member.
session – either session id or just ttl in seconds.
data – dictionary containing arbitrary data, i.e. conn_url, api_url, xlog_location, state, role, tags, etc.
- _abc_impl = <_abc._abc_data object>
- conn_kwargs(auth: Any | Dict[str, Any] | None = None) Dict[str, Any] View on GitHub
Give keyword arguments used for PostgreSQL connection settings.
- Parameters:
auth – authentication properties; can be defined as anything supported by the psycopg2 or psycopg modules. Converts a key of username to user if supplied.
- Returns:
a dictionary containing a merge of the default parameter keys host, port and dbname with the contents of the data conn_kwargs key. If those are not defined, connection parameters are parsed and reformed from conn_url. One of these two attributes needs to have data defined to construct the output dictionary. Finally, auth parameters are merged with the dictionary before it is returned.
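As an illustration of the merge described above, here is a minimal sketch (not Patroni’s actual implementation; build_conn_kwargs is a hypothetical helper) that derives the default keys from a conn_url and merges auth on top, mapping username to user:

```python
from urllib.parse import urlparse

def build_conn_kwargs(conn_url, auth=None):
    # Hypothetical helper: derive default keys from a conn_url, then
    # merge auth on top, mapping 'username' to 'user' as described above.
    parsed = urlparse(conn_url)
    kwargs = {
        'host': parsed.hostname,
        'port': str(parsed.port or 5432),
        'dbname': parsed.path.lstrip('/'),
    }
    for key, value in (auth or {}).items():
        kwargs['user' if key == 'username' else key] = value
    return kwargs

kwargs = build_conn_kwargs('postgres://10.0.0.1:5433/postgres',
                           auth={'username': 'admin', 'password': 's3cret'})
```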
- property conn_url: str | None
The conn_url value from data if defined, or constructed from conn_kwargs.
- static from_node(version: int | str, name: str, session: int | float | str | None, value: str) Member View on GitHub
Factory method for instantiating Member from a JSON serialised string or object.
- Parameters:
version – modification version of a given member key in a Configuration Store.
name – name of PostgreSQL cluster member.
session – either session id or just ttl in seconds.
value – JSON encoded string containing arbitrary data, i.e. conn_url, api_url, xlog_location, state, role, tags, etc., OR a connection URL starting with postgres://.
- Returns:
a Member instance built with the given arguments.
- Example:
>>> Member.from_node(-1, '', '', '{"conn_url": "postgres://foo@bar/postgres"}') is not None
True
>>> Member.from_node(-1, '', '', '{')
Member(version=-1, name='', session='', data={})
- class patroni.dcs.RemoteMember(name: str, data: Dict[str, Any])View on GitHub
Bases:
Member
Represents a remote member (typically a primary) for a standby cluster.
- Variables:
ALLOWED_KEYS – controls access to relevant key names that could be in the stored data.
- ALLOWED_KEYS: Tuple[str, ...] = ('primary_slot_name', 'create_replica_methods', 'restore_command', 'archive_cleanup_command', 'recovery_min_apply_delay', 'no_replication_slot')
- _abc_impl = <_abc._abc_data object>
- exception patroni.dcs.ReturnFalseExceptionView on GitHub
Bases:
Exception
Exception to be caught by the catch_return_false_exception() decorator.
- class patroni.dcs.Status(last_lsn: int, slots: Dict[str, int] | None)View on GitHub
Bases:
NamedTuple
Immutable object (namedtuple) which represents /status key.
Consists of the following fields:
- Variables:
last_lsn – int object containing the position of the last known leader LSN.
slots – state of permanent replication slots on the primary in the format {"slot_name": int}.
- _asdict()View on GitHub
Return a new dict which maps field names to their values.
- _field_defaults = {}
- _fields = ('last_lsn', 'slots')
- classmethod _make(iterable)View on GitHub
Make a new Status object from a sequence or iterable
- _replace(**kwds)View on GitHub
Return a new Status object replacing specified fields with new values
- static empty() Status View on GitHub
Construct an empty Status instance.
- Returns:
empty Status object.
- class patroni.dcs.SyncState(version: int | str | None, leader: str | None, sync_standby: str | None)View on GitHub
Bases:
NamedTuple
Immutable object (namedtuple) which represents last observed synchronous replication state.
- Variables:
version – modification version of a synchronization key in a Configuration Store.
leader – reference to member that was leader.
sync_standby – synchronous standby list (comma delimited) which are last synchronized to leader.
- _asdict()View on GitHub
Return a new dict which maps field names to their values.
- _field_defaults = {}
- _fields = ('version', 'leader', 'sync_standby')
- classmethod _make(iterable)View on GitHub
Make a new SyncState object from a sequence or iterable
- _replace(**kwds)View on GitHub
Return a new SyncState object replacing specified fields with new values
- static _str_to_list(value: str) List[str] View on GitHub
Split a string by comma and return a list of strings.
- Parameters:
value – a comma separated string.
- Returns:
list of non-empty strings after splitting an input value by comma.
- static empty(version: int | str | None = None) SyncState View on GitHub
Construct an empty SyncState instance.
- Parameters:
version – optional version number.
- Returns:
empty synchronisation state object.
- static from_node(version: int | str | None, value: str | Dict[str, Any] | None) SyncState View on GitHub
Factory method to parse value as synchronisation state information.
- Parameters:
version – optional version number for the object.
value – (optionally JSON serialised) synchronisation state information.
- Returns:
constructed SyncState object.
- Example:
>>> SyncState.from_node(1, None).leader is None
True
>>> SyncState.from_node(1, '{}').leader is None
True
>>> SyncState.from_node(1, '{').leader is None
True
>>> SyncState.from_node(1, '[]').leader is None
True
>>> SyncState.from_node(1, '{"leader": "leader"}').leader == "leader"
True
>>> SyncState.from_node(1, {"leader": "leader"}).leader == "leader"
True
- leader_matches(name: str | None) bool View on GitHub
Compare the given name to stored leader value.
- Returns:
True if name matches the leader value.
- matches(name: str | None, check_leader: bool = False) bool View on GitHub
Check if the given node name is present in the /sync state.
Since PostgreSQL does case-insensitive checks for synchronous_standby_names, we do the same.
- Parameters:
name – name of the node.
check_leader – by default the name is searched for only in members; a value of True will also include the leader in the list.
- Returns:
True if the /sync key is not is_empty() and the given name is among those present in the sync state.
- Example:
>>> s = SyncState(1, 'foo', 'bar,zoo')
>>> s.matches('foo')
False
>>> s.matches('fOo', True)
True
>>> s.matches('Bar')
True
>>> s.matches('zoO')
True
>>> s.matches('baz')
False
>>> s.matches(None)
False
>>> SyncState.empty(1).matches('foo')
False
- property members: List[str]
sync_standby as a list, or an empty list if undefined or the object is considered empty.
- class patroni.dcs.TimelineHistory(version: int | str, value: Any, lines: List[Tuple[int, int, str] | Tuple[int, int, str, str] | Tuple[int, int, str, str, str]])View on GitHub
Bases:
NamedTuple
Object representing timeline history file.
Note
The content held in lines, deserialized from value, are lines parsed from PostgreSQL timeline history files, consisting of the timeline number, the LSN where the timeline split, and any other string held in the file. The files are parsed by parse_history().
- Variables:
version – version number of the file.
value – raw JSON serialised data consisting of parsed lines from history files.
lines – List of Tuple of parsed lines from history files.
- _asdict()View on GitHub
Return a new dict which maps field names to their values.
- _field_defaults = {}
- _fields = ('version', 'value', 'lines')
- classmethod _make(iterable)View on GitHub
Make a new TimelineHistory object from a sequence or iterable
- _replace(**kwds)View on GitHub
Return a new TimelineHistory object replacing specified fields with new values
- static from_node(version: int | str, value: str) TimelineHistory View on GitHub
Parse the given JSON serialized string as a list of timeline history lines.
- Parameters:
version – version number.
value – JSON serialized string, consisting of parsed lines of PostgreSQL timeline history files; see TimelineHistory.
- Returns:
composed timeline history object using parsed lines.
- Example:
If the passed value argument is not parsed, an empty list of lines is returned:
>>> h = TimelineHistory.from_node(1, 2)
>>> h.lines
[]
- patroni.dcs.catch_return_false_exception(func: Callable[[...], Any]) Any View on GitHub
Decorator function for catching functions raising ReturnFalseException.
- Parameters:
func – function to be wrapped.
- Returns:
wrapped function.
- patroni.dcs.dcs_modules() List[str] View on GitHub
Get names of DCS modules, depending on execution environment.
Note
If being packaged with PyInstaller, modules aren’t discoverable dynamically by scanning the source directory, because importlib.machinery.FrozenImporter doesn’t implement iter_modules(). But it is still possible to find all potential DCS modules by iterating through toc, which contains a list of all “frozen” resources.
- Returns:
list of known module names with absolute Python module path namespace, e.g. patroni.dcs.etcd.
- patroni.dcs.find_dcs_class_in_module(module: module) Type[AbstractDCS] | None View on GitHub
Try to find the implementation of the AbstractDCS interface in a module matching the module name.
- Parameters:
module – Imported DCS module.
- Returns:
class with a name matching the name of module that implements AbstractDCS, or None if not found.
- patroni.dcs.get_dcs(config: Config | Dict[str, Any]) AbstractDCS View on GitHub
Attempt to load a Distributed Configuration Store from known available implementations.
Note
Using the list of available DCS classes returned by iter_dcs_classes(), attempt to dynamically instantiate the class that implements a DCS using the abstract class AbstractDCS.
Basic top-level configuration parameters retrieved from config are propagated to the DCS specific config before being passed to the module DCS class.
If no module is found to satisfy configuration then report and log an error. This will cause Patroni to exit.
- Raises:
PatroniFatalException – if a load of all available DCS modules has been tried and none succeeded.
- Parameters:
config – object or dictionary with Patroni configuration. This is normally a representation of the main Patroni configuration.
- Returns:
the first successfully loaded DCS module which is an implementation of AbstractDCS.
- patroni.dcs.iter_dcs_classes(config: Config | Dict[str, Any] | None = None) Iterator[Tuple[str, Type[AbstractDCS]]] View on GitHub
Attempt to import DCS modules that are present in the given configuration.
Note
If a module successfully imports we can assume that all its requirements are installed.
- Parameters:
config – configuration information with possible DCS names as keys. If given, only attempt to import DCS modules defined in the configuration. Else, if None, attempt to import any supported DCS module.
- Yields:
a tuple containing the module name and the imported DCS class object.
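The import-and-skip behaviour described above can be sketched with importlib; iter_available_modules is a hypothetical stand-in used purely for illustration:

```python
import importlib

def iter_available_modules(candidates):
    # Hypothetical sketch: yield (name, module) for every module that
    # imports cleanly, silently skipping those with missing requirements.
    for name in candidates:
        try:
            yield name, importlib.import_module(name)
        except ImportError:
            continue

# 'json' imports fine; the bogus name is skipped rather than raising.
available = dict(iter_available_modules(['json', 'no_such_dcs_module_xyz']))
```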
- patroni.dcs.parse_connection_string(value: str) Tuple[str, str | None] View on GitHub
Split and rejoin a URL string into a connection URL and an API URL.
Note
Original Governor stores connection strings for each cluster member in the following format:
postgres://{username}:{password}@{connect_address}/postgres
Since each of our Patroni instances provides its own REST API endpoint, it’s good to store this information in DCS along with the PostgreSQL connection string. In order to not introduce new keys and be compatible with original Governor, we decided to extend the original connection string in the following way:
postgres://{username}:{password}@{connect_address}/postgres?application_name={api_url}
This way the original Governor could use such a connection string as is, thanks to a feature of the libpq library.
- Parameters:
value – The URL string to split.
- Returns:
the connection string stored in DCS split into two parts, conn_url and api_url.
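A rough sketch of the described split, using only the standard library (an illustration under the stated URL format, not Patroni’s exact code; split_conn_and_api_url is a hypothetical helper):

```python
from urllib.parse import urlparse, parse_qs

def split_conn_and_api_url(value):
    # Peel application_name off the query string and treat it as the
    # api_url; everything before '?' remains the connection URL.
    parsed = urlparse(value)
    api_url = (parse_qs(parsed.query).get('application_name') or [None])[0]
    conn_url = value.split('?', 1)[0]
    return conn_url, api_url

conn, api = split_conn_and_api_url(
    'postgres://user:pass@10.0.0.1:5432/postgres'
    '?application_name=http://10.0.0.1:8008/patroni')
```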
- patroni.dcs.slot_name_from_member_name(member_name: str) str View on GitHub
Translate member name to valid PostgreSQL slot name.
Note
PostgreSQL’s replication slot names must be valid PostgreSQL names. This function maps the wider space of member names to valid PostgreSQL names. Names have their case lowered, dashes and periods common in hostnames are replaced with underscores, other characters are encoded as their unicode codepoint. Name is truncated to 64 characters. Multiple different member names may map to a single slot name.
- Parameters:
member_name – The string to convert to a slot name.
- Returns:
The string converted using the rules described above.
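The documented mapping rules can be sketched as follows (member_name_to_slot_name is a hypothetical helper; Patroni’s actual implementation may differ in details such as the exact truncation length):

```python
import re

def member_name_to_slot_name(member_name):
    # Sketch of the documented rules: lowercase the name, turn '-' and '.'
    # into '_', replace any other invalid character with its unicode
    # codepoint, and truncate the result to 64 characters.
    def replace_char(match):
        char = match.group(0)
        return '_' if char in '-.' else 'u{:04d}'.format(ord(char))
    return re.sub(r'[^a-z0-9_]', replace_char, member_name.lower())[:64]

slot = member_name_to_slot_name('node-1.example.com')
```

Note that because of the lowering and truncation, multiple different member names may map to the same slot name.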
patroni.postgresql package
Submodules
patroni.postgresql.bootstrap module
patroni.postgresql.callback_executor module
patroni.postgresql.cancellable module
- class patroni.postgresql.cancellable.CancellableExecutorView on GitHub
Bases:
object
There must be only one such process so that AsyncExecutor can easily cancel it.
- __init__() None View on GitHub
- _kill_children() None View on GitHub
- _kill_process() None View on GitHub
- class patroni.postgresql.cancellable.CancellableSubprocessView on GitHub
Bases:
CancellableExecutor
- __init__() None View on GitHub
- cancel(kill: bool = False) None View on GitHub
- reset_is_cancelled() None View on GitHub
patroni.postgresql.citus module
patroni.postgresql.config module
patroni.postgresql.connection module
patroni.postgresql.misc module
patroni.postgresql.postmaster module
patroni.postgresql.rewind module
patroni.postgresql.slots module
patroni.postgresql.sync module
patroni.postgresql.validator module
Module contents
patroni.scripts package
Submodules
patroni.scripts.aws module
- class patroni.scripts.aws.AWSConnection(cluster_name: str | None)View on GitHub
Bases:
object
- __init__(cluster_name: str | None) None View on GitHub
- _tag_ebs(conn: Any, role: str) None View on GitHub
Set tags carrying the cluster name, instance role and instance id for the EBS storage.
- _tag_ec2(conn: Any, role: str) None View on GitHub
Tag the current EC2 instance with a cluster role.
- aws_available() bool View on GitHub
- on_role_change(new_role: str) bool View on GitHub
- retry(*args: Any, **kwargs: Any) Any View on GitHub
- patroni.scripts.aws.main()View on GitHub
patroni.scripts.wale_restore module
Module contents
patroni.watchdog package
Submodules
Module contents
Submodules
patroni.api module
patroni.async_executor module
patroni.collections module
Patroni custom object types, somewhat like the collections module.
Provides case-insensitive dict and set object types.
- class patroni.collections.CaseInsensitiveDict(data: Dict[str, Any] | None = None)View on GitHub
Bases:
MutableMapping[str, Any]
A case-insensitive dict-like object.
Implements all methods and operations of MutableMapping as well as dict’s copy(). All keys are expected to be strings. The structure remembers the case of the last key to be set, and iter(), dict.keys(), dict.items(), dict.iterkeys(), and dict.iteritems() will contain case-sensitive keys. However, querying and contains testing is case insensitive.
- __init__(data: Dict[str, Any] | None = None) None View on GitHub
Create a new instance of CaseInsensitiveDict with the given data.
- Parameters:
data – initial dictionary to create a CaseInsensitiveDict from.
- _abc_impl = <_abc._abc_data object>
- copy() CaseInsensitiveDict View on GitHub
Create a copy of this dict.
- Returns:
a new dict object with the same keys and values of this dict.
- keys() KeysView[str] View on GitHub
Return a new view of the dict’s keys.
- Returns:
a set-like object providing a view of the dict’s keys.
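The remember-the-last-key behaviour can be illustrated with a toy sketch (CIDict is a simplified stand-in, not the real CaseInsensitiveDict):

```python
class CIDict(dict):
    # Toy sketch of the described behaviour: store values under a
    # case-folded key while remembering the last-set key's original case.
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), (key, value))
    def __getitem__(self, key):
        return super().__getitem__(key.lower())[1]
    def __contains__(self, key):
        return super().__contains__(key.lower())
    def keys(self):
        # Return the remembered (case-preserving) spellings of the keys.
        return [original for original, _ in super().values()]

d = CIDict()
d['Max_Connections'] = 100
```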
- class patroni.collections.CaseInsensitiveSet(values: Collection[str] | None = None)View on GitHub
Bases:
MutableSet[str]
A case-insensitive set-like object.
Implements all methods and operations of MutableSet. All values are expected to be strings. The structure remembers the case of the last value set; however, contains testing is case insensitive.
- __init__(values: Collection[str] | None = None) None View on GitHub
Create a new instance of CaseInsensitiveSet with the given values.
- Parameters:
values – values to be added to the set.
- _abc_impl = <_abc._abc_data object>
- add(value: str) None View on GitHub
Add value to this set.
Search is performed case-insensitively. If value is already in the set, overwrite it with value, so we “remember” the last case of value.
- Parameters:
value – value to be added to the set.
- discard(value: str) None View on GitHub
Remove value from this set.
Search is performed case-insensitively. If value is not present in the set, no exception is raised.
- Parameters:
value – value to be removed from the set.
- issubset(other: CaseInsensitiveSet) bool View on GitHub
Check if this set is a subset of other.
- Parameters:
other – another set to be compared with this set.
- Returns:
True if this set is a subset of other, else False.
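A toy sketch of the set behaviour described above (CISet is a simplified stand-in, not the real CaseInsensitiveSet):

```python
class CISet:
    # Toy sketch: membership is case-insensitive, and add() keeps the
    # most recently used spelling of each value.
    def __init__(self, values=None):
        self._values = {}
        for value in values or ():
            self.add(value)
    def add(self, value):
        self._values[value.lower()] = value
    def discard(self, value):
        # Removing a missing value raises no exception.
        self._values.pop(value.lower(), None)
    def __contains__(self, value):
        return value.lower() in self._values
    def __iter__(self):
        return iter(self._values.values())

s = CISet(['Node1'])
s.add('NODE1')   # overwrites the remembered spelling
```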
patroni.config module
patroni.config_generator module
patroni.ctl module
patroni.daemon module
Daemon processes abstraction module.
This module implements abstraction classes and functions for creating and managing daemon processes in Patroni.
Currently it is only used for the main “Thread” of the patroni and patroni_raft_controller commands.
- class patroni.daemon.AbstractPatroniDaemon(config: Config)View on GitHub
Bases:
ABC
A Patroni daemon process.
Note
When inheriting from AbstractPatroniDaemon you are expected to define the methods _run_cycle(), to determine what it should do in each execution cycle, and _shutdown(), to determine what it should do when shutting down.
- Variables:
logger – log handler used by this daemon.
config – configuration options for this daemon.
- __init__(config: Config) None View on GitHub
Set up signal handlers, logging handler and configuration.
- Parameters:
config – configuration options for this daemon.
- _abc_impl = <_abc._abc_data object>
- abstract _run_cycle() None View on GitHub
Define what the daemon should do in each execution cycle.
It keeps being called in the daemon’s main loop until the daemon is eventually terminated.
- abstract _shutdown() None View on GitHub
Define what the daemon should do when shutting down.
- api_sigterm() bool View on GitHub
Guarantee only a single SIGTERM is being processed.
Flag the daemon as “SIGTERM received” with a lock-based approach.
- Returns:
True if the daemon was flagged as “SIGTERM received”.
- reload_config(sighup: bool = False, local: bool | None = False) None View on GitHub
Reload configuration.
- Parameters:
sighup – if it is related to a SIGHUP signal. The sighup parameter could be used in the method overridden in a child class.
local – will be True if there are changes in the local configuration file.
- run() None View on GitHub
Run the daemon process.
Start the logger thread and keep running execution cycles until a SIGTERM is eventually received. Also reload configuration upon receiving SIGHUP.
- setup_signal_handlers() None View on GitHub
Set up daemon signal handlers.
Set up SIGHUP and SIGTERM signal handlers.
Note
SIGHUP is only handled in non-Windows environments.
- shutdown() None View on GitHub
Shut the daemon down when a SIGTERM is received.
Shut down the daemon process and the logger thread.
- sighup_handler(*_: Any) None View on GitHub
Handle SIGHUP signals.
Flag the daemon as “SIGHUP received”.
- sigterm_handler(*_: Any) None View on GitHub
Handle SIGTERM signals.
Terminate the daemon process through api_sigterm().
- patroni.daemon.abstract_main(cls: Type[AbstractPatroniDaemon], configfile: str) None View on GitHub
Create the main entry point of a given daemon process.
- Parameters:
cls – a class that should inherit from AbstractPatroniDaemon.
configfile – path to the Patroni configuration file.
- patroni.daemon.get_base_arg_parser() ArgumentParser View on GitHub
Create a basic argument parser with the arguments used for both the patroni and raft controller daemons.
- Returns:
an argparse.ArgumentParser object.
patroni.exceptions module
Implement high-level Patroni exceptions.
More specific exceptions can be found in other modules, as subclasses of any exception defined in this module.
- exception patroni.exceptions.ConfigParseError(value: Any)View on GitHub
Bases:
PatroniException
Any issue identified while loading or validating the Patroni configuration.
- exception patroni.exceptions.DCSError(value: Any)View on GitHub
Bases:
PatroniException
Parent class for all kinds of DCS related exceptions.
- exception patroni.exceptions.PatroniException(value: Any)View on GitHub
Bases:
Exception
Parent class for all kinds of Patroni exceptions.
- Variables:
value – description of the exception.
- __init__(value: Any) None View on GitHub
Create a new instance of PatroniException with the given description.
- Parameters:
value – description of the exception.
- exception patroni.exceptions.PatroniFatalException(value: Any)View on GitHub
Bases:
PatroniException
Catastrophic exception that prevents Patroni from performing its job.
- exception patroni.exceptions.PostgresConnectionException(value: Any)View on GitHub
Bases:
PostgresException
Any problem faced while connecting to a Postgres instance.
- exception patroni.exceptions.PostgresException(value: Any)View on GitHub
Bases:
PatroniException
Any exception related to Postgres management.
- exception patroni.exceptions.WatchdogError(value: Any)View on GitHub
Bases:
PatroniException
Any problem faced while managing a watchdog device.
patroni.file_perm module
Helper object for figuring out file and directory permissions based on the permissions of PGDATA.
- var logger:
logger of this module.
- var pg_perm:
instance of the __FilePermissions object.
- class patroni.file_perm.__FilePermissionsView on GitHub
Bases:
object
Helper class for managing permissions of directories and files under PGDATA.
Execute set_permissions_from_data_directory() to figure out which permissions should be used for files and directories under PGDATA, based on the permissions of the PGDATA root directory.
- _FilePermissions__PG_DIR_MODE_GROUP = 488
- _FilePermissions__PG_DIR_MODE_OWNER = 448
- _FilePermissions__PG_FILE_MODE_GROUP = 416
- _FilePermissions__PG_FILE_MODE_OWNER = 384
- _FilePermissions__PG_MODE_MASK_GROUP = 23
- _FilePermissions__PG_MODE_MASK_OWNER = 63
- _FilePermissions__set_group_permissions() None View on GitHub
Make directories/files accessible by the owner and readable by group.
- _FilePermissions__set_owner_permissions() None View on GitHub
Make directories/files accessible only by the owner.
- _FilePermissions__set_umask() None View on GitHub
Set umask value based on calculations.
Note
Should only be called once either __set_owner_permissions() or __set_group_permissions() has been executed.
- __init__() None View on GitHub
Create a __FilePermissions object and set default permissions.
- set_permissions_from_data_directory(data_dir: str) None View on GitHub
Set new permissions based on provided data_dir.
- Parameters:
data_dir – reference to PGDATA to calculate permissions from.
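The decimal constants shown above are simply octal file modes; a short sketch makes the correspondence and the umask derivation explicit (the names are spelled out here for readability):

```python
# The decimal constants are octal permission modes in disguise:
PG_DIR_MODE_OWNER = 0o700    # 448: directories accessible only by the owner
PG_DIR_MODE_GROUP = 0o750    # 488: directories also readable/traversable by group
PG_FILE_MODE_OWNER = 0o600   # 384: files readable/writable only by the owner
PG_FILE_MODE_GROUP = 0o640   # 416: files also readable by group
PG_MODE_MASK_OWNER = 0o077   # 63: umask stripping all group and other bits
PG_MODE_MASK_GROUP = 0o027   # 23: umask stripping group-write and all other bits

# Each umask is the complement of the corresponding directory mode:
assert 0o777 & ~PG_DIR_MODE_OWNER == PG_MODE_MASK_OWNER
assert 0o777 & ~PG_DIR_MODE_GROUP == PG_MODE_MASK_GROUP
```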
patroni.ha module
patroni.log module
Patroni logging facilities.
Daemon processes will use a 2-step logging handler. Whenever a log message is issued it is initially enqueued in-memory and is later asynchronously flushed by a thread to the final destination.
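The same 2-step pattern can be illustrated with the standard library’s QueueHandler and QueueListener (Patroni implements its own queue handler and flushing thread, but the idea is the same):

```python
import logging
import queue
from logging.handlers import QueueHandler, QueueListener

log_queue = queue.Queue(maxsize=1000)   # bounded in-memory backlog
records = []

class ListHandler(logging.Handler):
    # Stands in for the "final destination" (file, stderr, ...).
    def emit(self, record):
        records.append(record.getMessage())

logger = logging.getLogger('demo')
logger.setLevel(logging.INFO)
logger.addHandler(QueueHandler(log_queue))   # step 1: enqueue in-memory

listener = QueueListener(log_queue, ListHandler())
listener.start()                             # step 2: flush from a thread
logger.info('hello from the queue')
listener.stop()                              # drains the queue before returning
```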
- class patroni.log.PatroniLoggerView on GitHub
Bases:
Thread
Logging thread for the Patroni daemon process.
It is a 2-step logging approach. Any time a log message is issued it is initially enqueued in-memory, and then asynchronously flushed to the final destination by the logging thread.
See also
QueueHandler: object used for enqueueing messages in-memory.
- Variables:
DEFAULT_LEVEL – default logging level (INFO).
DEFAULT_TRACEBACK_LEVEL – default traceback logging level (ERROR).
DEFAULT_FORMAT – default format of log messages (%(asctime)s %(levelname)s: %(message)s).
NORMAL_LOG_QUEUE_SIZE – expected number of log messages per HA loop when operating under a normal situation.
DEFAULT_MAX_QUEUE_SIZE – default maximum queue size for holding a backlog of log messages that are pending to be flushed.
LOGGING_BROKEN_EXIT_CODE – exit code to be used if it is detected that logging is broken (5).
log_handler – log handler that is currently being used by the thread.
log_handler_lock – lock used to modify log_handler.
- DEFAULT_FORMAT = '%(asctime)s %(levelname)s: %(message)s'
- DEFAULT_LEVEL = 'INFO'
- DEFAULT_MAX_QUEUE_SIZE = 1000
- DEFAULT_TRACEBACK_LEVEL = 'ERROR'
- LOGGING_BROKEN_EXIT_CODE = 5
- NORMAL_LOG_QUEUE_SIZE = 2
- __init__() None View on GitHub
Prepare logging queue and proxy handlers as they become ready during daemon startup.
Note
While Patroni is starting up it keeps
DEBUG
log level, and writes log messages through a proxy handler. Once the logger thread is finally started, it switches from that proxy handler to the queue based logger, and applies the configured log settings. The switching is used to avoid that the logger thread prevents Patroni from shutting down if any issue occurs in the meantime until the thread is properly started.
- _close_old_handlers() None View on GitHub
Close old log handlers.
Note
It is used to remove different handlers that were configured previous to a reload in the configuration, e.g. if we are switching from
RotatingFileHandler
to class:~logging.StreamHandler and vice-versa.
- reload_config(config: Dict[str, Any]) None View on GitHub
Apply log related configuration.
Note
It is also able to deal with runtime configuration changes.
- Parameters:
config – log section from Patroni configuration.
- run() None View on GitHub
Run logger’s thread main loop.
Keep consuming the log queue until requested to quit through a None special log record.
- shutdown() None View on GitHub
Shut down the logger thread.
- update_loggers() None View on GitHub
Configure loggers’ log level as defined in the log.loggers section of the Patroni configuration.
Note
It creates logger objects that are not yet defined in the log manager.
- class patroni.log.ProxyHandler(patroni_logger: PatroniLogger)View on GitHub
Bases:
Handler
Handle log records in place of pending log handlers.
Note
This is used to handle log messages while the logger thread has not started yet, in which case the queue-based handler is not yet started.
- Variables:
patroni_logger – the logger thread.
- __init__(patroni_logger: PatroniLogger) None View on GitHub
Create a new ProxyHandler instance.
- Parameters:
patroni_logger – the logger thread.
- emit(record: LogRecord) None View on GitHub
Emit each log record that is handled.
Will push the log record down to the handle() method of the currently configured log handler.
- Parameters:
record – the record that was emitted.
- class patroni.log.QueueHandlerView on GitHub
Bases:
Handler
Queue-based logging handler.
- Variables:
queue – queue to hold log messages that are pending to be flushed to the final destination.
- __init__() None View on GitHub
Initialise the queue and establish the initial records_lost counter.
- _put_record(record: LogRecord) None View on GitHub
Asynchronously enqueue a log record.
- Parameters:
record – the record to be logged.
- _try_to_report_lost_records() None View on GitHub
Report the number of log messages that have been lost and reset the counter.
Note
It will issue a WARNING message in the logs with the number of lost log messages.
- emit(record: LogRecord) None View on GitHub
Handle each log record that is emitted.
Call _put_record() to enqueue the emitted log record. Also check if we have previously lost any log records, and if so, log a WARNING message.
- Parameters:
record – the record that was emitted.
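The behavior described above can be sketched as a minimal, self-contained handler (the class name here is hypothetical, and this is an illustration of the idea, not Patroni's actual implementation): emit() only enqueues the record, so the caller never blocks on slow log destinations, and records dropped on a full queue are counted for later reporting.

```python
import logging
import queue

class SketchQueueHandler(logging.Handler):
    """Non-blocking queue-based handler (illustrative sketch)."""

    def __init__(self, maxsize=0):
        super().__init__()
        self.queue = queue.Queue(maxsize)
        self.records_lost = 0

    def emit(self, record):
        try:
            # Never block the application thread on logging I/O.
            self.queue.put_nowait(record)
        except queue.Full:
            # Drop the record and remember how many were lost,
            # so a WARNING can be issued later.
            self.records_lost += 1

logger = logging.getLogger("queue-handler-sketch")
logger.propagate = False
handler = SketchQueueHandler(maxsize=1)
logger.addHandler(handler)
logger.warning("first")   # enqueued
logger.warning("second")  # queue already full: counted as lost
```

A separate consumer thread would then drain the queue and pass records to the real destination handlers.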
- patroni.log.debug_exception(self: Logger, msg: object, *args: Any, **kwargs: Any) None View on GitHub
Add full stack trace info to debug log messages and partial to others.
Handle exception() calls for self.
Note
If self log level is set to DEBUG, then issue a DEBUG message with the complete stack trace;
If self log level is INFO or higher, then issue an ERROR message with only the last line of the stack trace.
- Parameters:
self – logger for which exception() will be processed.
msg – the message related to the exception to be logged.
args – positional arguments to be passed to debug() or error().
kwargs – keyword arguments to be passed to debug() or error().
- patroni.log.error_exception(self: Logger, msg: object, *args: Any, **kwargs: Any) None View on GitHub
Add full stack trace info to error messages.
Handle exception() calls for self.
Note
By default issue an ERROR message with the complete stack trace. If you do not want to show the complete stack trace, call with exc_info=False.
- Parameters:
self – logger for which exception() will be processed.
msg – the message related to the exception to be logged.
args – positional arguments to be passed to error().
kwargs – keyword arguments to be passed to error().
patroni.psycopg module
patroni.raft_controller module
patroni.request module
patroni.utils module
Utility objects and functions that can be used throughout the Patroni code.
- var tzutc:
UTC time zone info object.
- var logger:
logger of this module.
- var USER_AGENT:
identifies the Patroni version, Python version, and the underlying platform.
- var OCT_RE:
regular expression to match octal numbers, signed or unsigned.
- var DEC_RE:
regular expression to match decimal numbers, signed or unsigned.
- var HEX_RE:
regular expression to match hex strings, signed or unsigned.
- var DBL_RE:
regular expression to match double precision numbers, signed or unsigned. Matches scientific notation too.
- var WHITESPACE_RE:
regular expression to match whitespace characters.
- class patroni.utils.Retry(max_tries: int | None = 1, delay: float = 0.1, backoff: int = 2, max_jitter: float = 0.8, max_delay: int = 3600, sleep_func: ~typing.Callable[[int | float], None] = <function _sleep>, deadline: int | float | None = None, retry_exceptions: ~typing.Type[Exception] | ~typing.Tuple[~typing.Type[Exception], ...] = <class 'patroni.exceptions.PatroniException'>)View on GitHub
Bases:
object
Helper for retrying a method in the face of retryable exceptions.
- Variables:
max_tries – how many times to retry the command.
delay – initial delay between retry attempts.
backoff – backoff multiplier between retry attempts.
max_jitter – additional max jitter period to wait between retry attempts to avoid slamming the server.
max_delay – maximum delay in seconds, regardless of other backoff settings.
sleep_func – function used to introduce artificial delays.
deadline – timeout for operation retries.
retry_exceptions – single exception or tuple of exceptions that should trigger a retry.
- __init__(max_tries: int | None = 1, delay: float = 0.1, backoff: int = 2, max_jitter: float = 0.8, max_delay: int = 3600, sleep_func: ~typing.Callable[[int | float], None] = <function _sleep>, deadline: int | float | None = None, retry_exceptions: ~typing.Type[Exception] | ~typing.Tuple[~typing.Type[Exception], ...] = <class 'patroni.exceptions.PatroniException'>) None View on GitHub
Create a Retry instance for retrying function calls.
- Parameters:
max_tries – how many times to retry the command. -1 means infinite tries.
delay – initial delay between retry attempts.
backoff – backoff multiplier between retry attempts. Defaults to 2 for exponential backoff.
max_jitter – additional max jitter period to wait between retry attempts to avoid slamming the server.
max_delay – maximum delay in seconds, regardless of other backoff settings.
sleep_func – function used to introduce artificial delays.
deadline – timeout for operation retries.
retry_exceptions – single exception or tuple of exceptions that should trigger a retry.
- copy() Retry View on GitHub
Return a clone of this retry manager.
- ensure_deadline(timeout: float, raise_ex: Exception | None = None) bool View on GitHub
Calculate, set, and check the remaining deadline time.
- Parameters:
timeout – if the deadline is smaller than the provided timeout value raise raise_ex exception.
raise_ex – the exception object that will be raised if the deadline is smaller than provided timeout.
- Returns:
False if the deadline is smaller than the provided timeout and raise_ex isn’t set; otherwise True.
- Raises:
Exception: raise_ex, if the calculated deadline is smaller than the provided timeout.
- reset() None View on GitHub
Reset the attempt counter, delay and stop time.
- property sleeptime: float
Get next cycle sleep time.
It is based on the current delay plus a number up to max_jitter.
- update_delay() None View on GitHub
Set next cycle delay.
It will be the minimum value between:
the current delay multiplied by backoff; or
max_delay.
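The retry behavior documented above (exponential backoff capped at max_delay, plus random jitter between attempts) can be sketched as follows. This is an illustrative reimplementation under assumed semantics, not Patroni's exact code; names and formulas are approximations of the documented attributes.

```python
import random

class RetrySketch:
    """Illustrative retry helper with exponential backoff and jitter."""

    def __init__(self, max_tries=3, delay=0.1, backoff=2, max_jitter=0.8,
                 max_delay=3600, sleep_func=None, retry_exceptions=(Exception,)):
        self.max_tries = max_tries
        self.delay = delay
        self.backoff = backoff
        self.max_jitter = max_jitter
        self.max_delay = max_delay
        self.sleep_func = sleep_func or (lambda _seconds: None)
        self.retry_exceptions = retry_exceptions
        self.reset()

    def reset(self):
        # Reset the attempt counter and the current delay.
        self._attempts = 0
        self._cur_delay = self.delay

    @property
    def sleeptime(self):
        # Current delay plus a random jitter of up to max_jitter seconds.
        return self._cur_delay + random.random() * self.max_jitter

    def update_delay(self):
        # Next delay: current delay times backoff, capped at max_delay.
        self._cur_delay = min(self._cur_delay * self.backoff, self.max_delay)

    def __call__(self, func, *args, **kwargs):
        self.reset()
        while True:
            try:
                return func(*args, **kwargs)
            except self.retry_exceptions:
                self._attempts += 1
                if self._attempts >= self.max_tries:
                    raise
                self.sleep_func(self.sleeptime)
                self.update_delay()

attempts = []
def flaky():
    # Fails twice with a retryable error, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise OSError("transient failure")
    return "ok"

result = RetrySketch(max_tries=5, retry_exceptions=(OSError,))(flaky)
```

With a real sleep function the delays would grow as 0.1, 0.2, 0.4, … seconds, each padded with up to max_jitter seconds of random jitter.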
- exception patroni.utils.RetryFailedError(value: Any)View on GitHub
Bases:
PatroniException
Maximum number of attempts exhausted in retry operation.
- patroni.utils._sleep(interval: int | float) None View on GitHub
Wrap sleep().
- Parameters:
interval – Delay execution for a given number of seconds. The argument may be a floating point number for subsecond precision.
- patroni.utils.cluster_as_json(cluster: Cluster, global_config: GlobalConfig | None = None) Dict[str, Any] View on GitHub
Get a JSON representation of cluster.
- Parameters:
cluster – the Cluster object to be parsed as JSON.
global_config – optional GlobalConfig object to check the cluster state. If not provided, it will be instantiated from Cluster.config.
- Returns:
JSON representation of cluster.
These are the possible keys in the returned object, depending on the available information in cluster:
members: list of members in the cluster. Each value is a dict that may have the following keys:
name: the name of the host (unique in the cluster). The members list is sorted by this key;
role: leader, standby_leader, sync_standby, or replica;
state: stopping, stopped, stop failed, crashed, running, starting, start failed, restarting, restart failed, initializing new cluster, initdb failed, running custom bootstrap script, custom bootstrap failed, or creating replica;
api_url: REST API URL based on restapi->connect_address configuration;
host: PostgreSQL host based on postgresql->connect_address;
port: PostgreSQL port based on postgresql->connect_address;
timeline: PostgreSQL current timeline;
pending_restart: True if PostgreSQL is pending to be restarted;
scheduled_restart: scheduled restart timestamp, if any;
tags: any tags that were set for this member;
lag: replication lag, if applicable;
pause: True if the cluster is in maintenance mode;
scheduled_switchover: if a switchover has been scheduled, then it contains this entry with these keys:
at: timestamp when the switchover was scheduled to occur;
from: name of the member to be demoted;
to: name of the member to be promoted.
- patroni.utils.compare_values(vartype: str, unit: str | None, old_value: Any, new_value: Any) bool View on GitHub
Check if old_value and new_value are equivalent after parsing them as vartype.
- Parameters:
vartype –
the target type to parse old_value and new_value as before comparing them. Accepts any of the following (case sensitive):
bool: parse values using parse_bool(); or
integer: parse values using parse_int(); or
real: parse values using parse_real(); or
enum: parse values as lowercase strings; or
string: parse values as strings. This one is used by default if no valid value is passed as vartype.
unit – base unit to be used as argument when calling parse_int() or parse_real() for new_value.
old_value – value to be compared with new_value.
new_value – value to be compared with old_value.
- Returns:
True if old_value is equivalent to new_value when both are parsed as vartype.
- Example:
>>> compare_values('enum', None, 'remote_write', 'REMOTE_WRITE') True
>>> compare_values('string', None, 'remote_write', 'REMOTE_WRITE') False
>>> compare_values('real', None, '1e-06', 0.000001) True
>>> compare_values('integer', 'MB', '6GB', '6GB') False
>>> compare_values('integer', None, '6GB', '6GB') False
>>> compare_values('integer', '16384kB', '64', ' 0x400 MB ') True
>>> compare_values('integer', '2MB', 524288, '1TB') True
>>> compare_values('integer', 'MB', 1048576, '1TB') True
>>> compare_values('integer', 'kB', 4098, '4097.5kB') True
- patroni.utils.convert_to_base_unit(value: int | float, unit: str, base_unit: str | None) int | float | None View on GitHub
Convert value as a unit of compute information or time to base_unit.
- Parameters:
value – value to be converted to the base unit.
unit –
unit of value. Accepts these units (case sensitive):
For space: B, kB, MB, GB, or TB;
For time: d, h, min, s, ms, or us.
base_unit –
target unit in the conversion. May contain the target unit with an associated value, e.g. 512MB. Accepts these units (case sensitive):
For space: B, kB, or MB;
For time: ms, s, or min.
- Returns:
value in unit converted to base_unit. Returns None if unit or base_unit is invalid.
- Example:
>>> convert_to_base_unit(1, 'GB', '256MB') 4
>>> convert_to_base_unit(1, 'GB', 'MB') 1024
>>> convert_to_base_unit(1, 'gB', '512MB') is None True
>>> convert_to_base_unit(1, 'GB', '512 MB') is None True
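A minimal sketch of the conversion described above, reproducing the doctest results. The unit tables (power-of-1024 space units, microsecond-based time units) and the rounding are illustrative assumptions, not Patroni's exact code.

```python
import re

# Assumed unit tables: bytes for space, microseconds for time.
SPACE_UNITS = {'B': 1, 'kB': 1024, 'MB': 1024 ** 2, 'GB': 1024 ** 3, 'TB': 1024 ** 4}
TIME_UNITS = {'us': 1, 'ms': 1000, 's': 10 ** 6, 'min': 60 * 10 ** 6,
              'h': 3600 * 10 ** 6, 'd': 86400 * 10 ** 6}

def convert_to_base_unit(value, unit, base_unit):
    # base_unit may carry an associated value, e.g. '512MB' (no spaces).
    m = re.match(r'^(\d+)?(\S+)$', (base_unit or '').strip())
    if not m:
        return None
    base_value, base_name = int(m.group(1) or 1), m.group(2)
    for table in (SPACE_UNITS, TIME_UNITS):
        # Both units must come from the same table (space or time).
        if unit in table and base_name in table:
            converted = value * table[unit] / (table[base_name] * base_value)
            return int(converted) if converted == int(converted) else converted
    return None
```

For example, 1 GB expressed in units of 256 MB is 4, and an unknown unit like 'gB' (wrong case) yields None.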
- patroni.utils.data_directory_is_empty(data_dir: str) bool View on GitHub
Check if a PostgreSQL data directory is empty.
Note
In non-Windows environments data_dir is also considered empty if it only contains hidden files and/or a lost+found directory.
- Parameters:
data_dir – the PostgreSQL data directory to be checked.
- Returns:
True
if data_dir is empty.
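A sketch of the check described above, including the non-Windows allowance for hidden files and lost+found (illustrative, not Patroni's exact code; the demo assumes a non-Windows platform):

```python
import os
import tempfile

def data_directory_is_empty(data_dir):
    # A missing directory counts as empty; on non-Windows systems,
    # hidden files and a "lost+found" directory do not count as content.
    if not os.path.exists(data_dir):
        return True
    entries = os.listdir(data_dir)
    if os.name != 'nt':
        entries = [e for e in entries
                   if not e.startswith('.') and e != 'lost+found']
    return not entries

demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, '.hidden'), 'w').close()
empty_with_hidden_file = data_directory_is_empty(demo_dir)

open(os.path.join(demo_dir, 'PG_VERSION'), 'w').close()
empty_with_real_file = data_directory_is_empty(demo_dir)
```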
- patroni.utils.deep_compare(obj1: Dict[Any, Any | Dict[Any, Any]], obj2: Dict[Any, Any | Dict[Any, Any]]) bool View on GitHub
Recursively compare two dictionaries to check if they are equal in terms of keys and values.
Note
Values are compared based on their string representation.
- Parameters:
obj1 – dictionary to be compared with obj2.
obj2 – dictionary to be compared with obj1.
- Returns:
True if all keys and values match between the two dictionaries.
- Example:
>>> deep_compare({'1': None}, {}) False
>>> deep_compare({'1': {}}, {'1': None}) False
>>> deep_compare({'1': [1]}, {'1': [2]}) False
>>> deep_compare({'1': 2}, {'1': '2'}) True
>>> deep_compare({'1': {'2': [3, 4]}}, {'1': {'2': [3, 4]}}) True
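The comparison rules above can be sketched as follows (an illustrative reimplementation that reproduces the doctests: key sets must match exactly, nested dicts recurse, and leaf values are compared by their string representation):

```python
def deep_compare(obj1, obj2):
    # Key sets must match exactly in both dictionaries.
    if set(obj1.keys()) != set(obj2.keys()):
        return False
    for key, value in obj1.items():
        other = obj2[key]
        if isinstance(value, dict):
            # Nested dictionaries are compared recursively.
            if not (isinstance(other, dict) and deep_compare(value, other)):
                return False
        elif str(value) != str(other):
            # Leaf values compare by string representation,
            # so 2 and '2' are considered equal.
            return False
    return True
```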
- patroni.utils.enable_keepalive(sock: socket, timeout: int, idle: int, cnt: int = 3) None View on GitHub
Enable keepalive for sock.
Will set socket options depending on the platform, as per the return of keepalive_socket_options().
Note
The value for TCP_KEEPINTVL will be calculated through keepalive_intvl() based on timeout, idle, and cnt.
- Parameters:
sock – the socket for which keepalive will be enabled.
timeout – value for TCP_USER_TIMEOUT.
idle – value for TCP_KEEPIDLE.
cnt – value for TCP_KEEPCNT.
- Returns:
output of
ioctl()
if we are on Windows, nothing otherwise.
- patroni.utils.get_major_version(bin_dir: str | None = None, bin_name: str = 'postgres') str View on GitHub
Get the major version of PostgreSQL.
It is based on the output of postgres --version.
- Parameters:
bin_dir – path to the PostgreSQL binaries directory. If None or an empty string, it will use the first bin_name binary that is found by the subprocess in the PATH.
bin_name – name of the postgres binary to call (postgres by default).
- Returns:
the PostgreSQL major version.
- Raises:
PatroniException: if the postgres binary call failed due to OSError.
- Example:
Returns 9.6 for PostgreSQL 9.6.24
Returns 15 for PostgreSQL 15.2
- patroni.utils.is_subpath(d1: str, d2: str) bool View on GitHub
Check if the file system path d2 is contained within d1 after resolving symbolic links.
Note
It will not check if the paths actually exist, it will only expand the paths and resolve any symbolic links that happen to be found.
- Parameters:
d1 – path to a directory.
d2 – path to be checked if is within d1.
- Returns:
True if d2 is contained within d1.
- patroni.utils.iter_response_objects(response: HTTPResponse) Iterator[Dict[str, Any]] View on GitHub
Iterate over the chunks of an HTTPResponse and yield each JSON document that is found.
- Parameters:
response – the HTTP response from which JSON documents will be retrieved.
- Yields:
current JSON document.
- patroni.utils.keepalive_intvl(timeout: int, idle: int, cnt: int = 3) int View on GitHub
Calculate the value to be used as TCP_KEEPINTVL based on timeout, idle, and cnt.
- Parameters:
timeout – value for TCP_USER_TIMEOUT.
idle – value for TCP_KEEPIDLE.
cnt – value for TCP_KEEPCNT.
- Returns:
the value to be used as TCP_KEEPINTVL.
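One plausible derivation, shown as a sketch (the exact formula Patroni uses is an assumption here): spread the keepalive probes that remain after the idle period evenly across the rest of the timeout window.

```python
def keepalive_intvl(timeout, idle, cnt=3):
    # Assumed formula, for illustration only: the (timeout - idle)
    # window is divided among cnt probes, never going below 1 second.
    return max(1, (timeout - idle) // cnt)

# e.g. a 30s TCP_USER_TIMEOUT with 6s TCP_KEEPIDLE and 3 probes
interval = keepalive_intvl(30, 6, 3)
```

With these inputs each probe would be sent 8 seconds apart, so the connection is declared dead within the 30-second user timeout.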
- patroni.utils.keepalive_socket_options(timeout: int, idle: int, cnt: int = 3) Iterator[Tuple[int, int, int]] View on GitHub
Get all keepalive related options to be set in a socket.
- Parameters:
timeout – value for TCP_USER_TIMEOUT.
idle – value for TCP_KEEPIDLE.
cnt – value for TCP_KEEPCNT.
- Yields:
all keepalive related socket options to be set. The first item in the tuple is the protocol, the second item is the option, and the third item is the value to be used. The return values depend on the platform:
Windows: SO_KEEPALIVE.
Linux: SO_KEEPALIVE; TCP_USER_TIMEOUT; TCP_KEEPIDLE; TCP_KEEPINTVL; TCP_KEEPCNT.
macOS: SO_KEEPALIVE; TCP_KEEPIDLE; TCP_KEEPINTVL; TCP_KEEPCNT.
- patroni.utils.parse_bool(value: Any) bool | None View on GitHub
Parse a given value to a bool object.
Note
The parsing is case-insensitive, and takes into consideration these values:
on, true, yes, and 1 as True;
off, false, no, and 0 as False.
- Parameters:
value – value to be parsed to bool.
- Returns:
the parsed value. If not able to parse, returns None.
- Example:
>>> parse_bool(1) True
>>> parse_bool('off') False
>>> parse_bool('foo')
- patroni.utils.parse_int(value: Any, base_unit: str | None = None) int | None View on GitHub
Parse value as an int.
- Parameters:
value – any value that can be handled either by strtol() or strtod(). If value contains a unit, then base_unit must be given.
base_unit – an optional base unit to convert value through convert_to_base_unit(). Not used if value does not contain a unit.
- Returns:
the parsed value, if able to parse. Otherwise returns None.
- Example:
>>> parse_int('1') == 1 True
>>> parse_int(' 0x400 MB ', '16384kB') == 64 True
>>> parse_int('1MB', 'kB') == 1024 True
>>> parse_int('1000 ms', 's') == 1 True
>>> parse_int('1TB', 'GB') is None True
>>> parse_int(50, None) == 50 True
>>> parse_int("51", None) == 51 True
>>> parse_int("nonsense", None) == None True
>>> parse_int("nonsense", "kB") == None True
>>> parse_int("nonsense") == None True
>>> parse_int(0) == 0 True
>>> parse_int('6GB', '16MB') == 384 True
>>> parse_int('4097.4kB', 'kB') == 4097 True
>>> parse_int('4097.5kB', 'kB') == 4098 True
- patroni.utils.parse_real(value: Any, base_unit: str | None = None) float | None View on GitHub
Parse value as a float.
- Parameters:
value – any value that can be handled by strtod(). If value contains a unit, then base_unit must be given.
base_unit – an optional base unit to convert value through convert_to_base_unit(). Not used if value does not contain a unit.
- Returns:
the parsed value, if able to parse. Otherwise returns None.
- Example:
>>> parse_real(' +0.0005 ') == 0.0005 True
>>> parse_real('0.0005ms', 'ms') == 0.0 True
>>> parse_real('0.00051ms', 'ms') == 0.001 True
- patroni.utils.patch_config(config: Dict[Any, Any | Dict[Any, Any]], data: Dict[Any, Any | Dict[Any, Any]]) bool View on GitHub
Update and append to dictionary config from overrides in data.
Note
If the value of a given key in data is None, then the key is removed from config;
If a key is present in data but not in config, the key with the corresponding value is added to config;
For keys that are present on both sides, it compares the string representation of the corresponding values and overrides the value in config if they don’t match.
- Parameters:
config – configuration to be patched.
data – new configuration values to patch config with.
- Returns:
True
if config was changed.
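The patch semantics described in the note can be sketched like this (illustrative, not Patroni's exact code; the recursion into nested dicts is an assumption based on the Dict[Any, Any | Dict[Any, Any]] signature):

```python
def patch_config(config, data):
    # None removes the key, unknown keys are added, nested dicts recurse,
    # and leaf values are compared by their string representation.
    changed = False
    for key, value in data.items():
        if value is None:
            if key in config:
                del config[key]
                changed = True
        elif key not in config:
            config[key] = value
            changed = True
        elif isinstance(value, dict) and isinstance(config[key], dict):
            changed = patch_config(config[key], value) or changed
        elif str(config[key]) != str(value):
            config[key] = value
            changed = True
    return changed

cfg = {'ttl': 30, 'postgresql': {'use_pg_rewind': True}}
changed = patch_config(cfg, {'ttl': None, 'loop_wait': 10,
                             'postgresql': {'use_pg_rewind': True}})
```

Here ttl is removed, loop_wait is added, and the unchanged postgresql subtree is left alone, so the call reports that config was changed.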
- patroni.utils.polling_loop(timeout: int | float, interval: int | float = 1) Iterator[int] View on GitHub
Return an iterator that yields values every interval seconds until timeout has passed.
Note
Timeout is measured from start of iteration.
- Parameters:
timeout – for how long (in seconds) from now it should keep returning values.
interval – for how long to sleep before returning a new value.
- Yields:
current iteration counter, starting from
0
.
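A sketch of such a loop; the time_func and sleep_func parameters are injection points added here so the example can run instantly under a fake clock, and are not part of Patroni's signature:

```python
import time

def polling_loop(timeout, interval=1, time_func=time.time, sleep_func=time.sleep):
    # Yield an increasing counter until `timeout` seconds have passed,
    # measured from the start of iteration, sleeping `interval` seconds
    # between values.
    start = time_func()
    counter = 0
    while time_func() - start < timeout:
        yield counter
        counter += 1
        sleep_func(interval)

# Drive the loop with a fake clock so the example finishes instantly.
clock = [0.0]

def fake_time():
    return clock[0]

def fake_sleep(seconds):
    clock[0] += seconds

iterations = list(polling_loop(3, 1, time_func=fake_time, sleep_func=fake_sleep))
```

With timeout=3 and interval=1 the loop yields three values, 0 through 2, then stops once the deadline is reached.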
- patroni.utils.read_stripped(file_path: str) Iterator[str] View on GitHub
Iterate over stripped lines in the given file.
- Parameters:
file_path – path to the file to read from.
- Yields:
each line from the given file, stripped.
- patroni.utils.split_host_port(value: str, default_port: int | None) Tuple[str, int] View on GitHub
Extract host(s) and port from value.
- Parameters:
value –
string from which host(s) and port will be extracted. Accepts either of these formats:
host:port; or
host1,host2,...,hostn:port.
Each host portion of value can be either:
an FQDN; or
an IPv4 address; or
an IPv6 address, with or without square brackets.
default_port – if no port can be found in value, use default_port instead.
- Returns:
the first item is composed of a CSV list of hosts from value, and the second item is either the port from value or default_port.
- Example:
>>> split_host_port('127.0.0.1', 5432) ('127.0.0.1', 5432)
>>> split_host_port('127.0.0.1:5400', 5432) ('127.0.0.1', 5400)
>>> split_host_port('127.0.0.1,192.168.0.101:5400', 5432) ('127.0.0.1,192.168.0.101', 5400)
>>> split_host_port('127.0.0.1,www.mydomain.com,[fe80:0:0:0:213:72ff:fe3c:21bf], 0:0:0:0:0:0:0:0:5400', 5432) ('127.0.0.1,www.mydomain.com,fe80:0:0:0:213:72ff:fe3c:21bf,0:0:0:0:0:0:0:0', 5400)
- patroni.utils.strtod(value: Any) Tuple[float | None, str] View on GitHub
Extract the double precision part from the beginning of a string that represents a configuration value.
A close equivalent of the strtod(3) C function, which is used by Postgres to parse parameter values.
- Parameters:
value – any value from which we want to extract a double precision.
- Returns:
the first item is the extracted double precision from value, and the second item is the remaining string of value. If not able to match a double precision in value, then the first item will be None, and the second item will be the original value.
- Example:
>>> strtod(' A ') == (None, 'A') True
>>> strtod('1 A ') == (1.0, ' A') True
>>> strtod('1.5A') == (1.5, 'A') True
>>> strtod('8.325e-10A B C') == (8.325e-10, 'A B C') True
- patroni.utils.strtol(value: Any, strict: bool | None = True) Tuple[int | None, str] View on GitHub
Extract the long integer part from the beginning of a string that represents a configuration value.
A close equivalent of the strtol(3) C function (with base=0), which is used by Postgres to parse parameter values.
Takes into consideration numbers represented in hex, octal, or decimal format.
- Parameters:
value – any value from which we want to extract a long integer.
strict – dictates how the first item in the returned tuple is set when strtol() is not able to find a long integer in value. If strict is True, then the first item will be None, else it will be 1.
- Returns:
the first item is the extracted long integer from value, and the second item is the remaining string of value. If not able to match a long integer in value, then the first item will be either None or 1 (depending on the strict argument), and the second item will be the original value.
- Example:
>>> strtol(0) == (0, '') True
>>> strtol(1) == (1, '') True
>>> strtol(9) == (9, '') True
>>> strtol(' +0x400MB') == (1024, 'MB') True
>>> strtol(' -070d') == (-56, 'd') True
>>> strtol(' d ') == (None, 'd') True
>>> strtol(' 1 d ') == (1, ' d') True
>>> strtol('9s', False) == (9, 's') True
>>> strtol(' s ', False) == (1, 's') True
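The parsing described above can be sketched with regular expressions in the spirit of the HEX_RE/OCT_RE/DEC_RE variables mentioned at the top of the module (the exact patterns here are assumptions). The base is detected from the prefix, as in strtol(3) with base=0, and the sketch reproduces the doctests above.

```python
import re

# Assumed patterns: hex with 0x prefix, octal with leading 0, plain decimal.
HEX_RE = re.compile(r'^([+-]?0[xX][0-9a-fA-F]+)')
OCT_RE = re.compile(r'^([+-]?0[0-7]*)')
DEC_RE = re.compile(r'^([+-]?[1-9][0-9]*)')

def strtol(value, strict=True):
    # Try hex first (so '0x...' is not swallowed by the octal pattern),
    # then octal, then decimal; the unmatched tail is returned as-is.
    value = str(value).strip()
    for regex, base in ((HEX_RE, 16), (OCT_RE, 8), (DEC_RE, 10)):
        m = regex.match(value)
        if m:
            return int(m.group(1), base), value[m.end():]
    return (None if strict else 1), value
```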
- patroni.utils.unquote(string: str) str View on GitHub
Unquote a fully quoted string.
- Parameters:
string – The string to be checked for quoting.
- Returns:
The string with quotes removed, if it is a fully quoted single string, or the original string if quoting is not detected, or unquoting was not possible.
- Examples:
A string with quotes will have those quotes removed
>>> unquote('"a quoted string"') 'a quoted string'
A string with multiple quotes will be returned as is
>>> unquote('"a multi" "quoted string"') '"a multi" "quoted string"'
So will a string with unbalanced quotes
>>> unquote('unbalanced "quoted string') 'unbalanced "quoted string'
- patroni.utils.uri(proto: str, netloc: List[str] | Tuple[str, int | str] | str, path: str | None = '', user: str | None = None) str View on GitHub
Construct URI from given arguments.
- Parameters:
proto – the URI protocol.
netloc –
the URI host(s) and port. Can be specified either as a list or tuple of host and port, or as a str in either of these formats:
host:port; or
host1,host2,...,hostn:port.
In all cases, each host portion of netloc can be either:
an FQDN; or
an IPv4 address; or
an IPv6 address, with or without square brackets.
path – the URI path.
user – the authenticating user, if any.
- Returns:
constructed URI.
- patroni.utils.validate_directory(d: str, msg: str = '{} {}') None View on GitHub
Ensure directory exists and is writable.
Note
If the directory does not exist, validate_directory() will attempt to create it.
- Parameters:
d – the directory to be checked.
msg –
a message to be used when raising PatroniException, if any issue is faced. It must contain 2 placeholders to be used by format():
the first placeholder will be replaced with the path d;
the second placeholder will be replaced with the error condition.
- Raises:
PatroniException: if any issue is observed while validating d. Can be thrown if:
d did not exist, and validate_directory() was not able to create it; or
d is an existing directory, but Patroni is not able to write to that directory; or
d is an existing file, not a directory.
patroni.validator module
Patroni configuration validation helpers.
This module contains facilities for validating configuration of Patroni processes.
- var schema:
configuration schema of the daemon launched by the patroni command.
- class patroni.validator.AtMostOne(*args: str)View on GitHub
Bases:
object
Mark that at most one option from a Case can be supplied.
Represents a list of possible configuration options in a given scope, where at most one can actually be provided.
Note
It should be used together with a Case object.
- __init__(*args: str) None View on GitHub
Create an AtMostOne object.
- Parameters:
*args – any arguments that the caller wants to be stored in this AtMostOne object.
- Example:
AtMostOne("nofailover", "failover_priority"): Case({
    "nofailover": bool,
    "failover_priority": IntValidator(min=0, raise_assert=True),
})
The AtMostOne object is used to define that at most one of nofailover and failover_priority can be provided.
- class patroni.validator.BinDirectory(contains: List[str] | None = None, contains_executable: List[str] | None = None)View on GitHub
Bases:
Directory
Check if a Postgres binary directory contains the expected files.
It is a subclass of Directory with an extended capability: translating BINARIES according to the configured postgresql.bin_name, if any.
- Variables:
BINARIES – list of executable files that should exist directly under a given Postgres binary directory.
- BINARIES = ['pg_ctl', 'initdb', 'pg_controldata', 'pg_basebackup', 'postgres', 'pg_isready']
- validate(name: str) Iterator[Result] View on GitHub
Check if the expected executables can be found under name binary directory.
- Parameters:
name – path to the base directory against which executables will be validated. Check against PATH if name is not provided.
- Yields:
objects with the error message related to the failure, if any check fails.
- class patroni.validator.Case(schema: Dict[str, Any])View on GitHub
Bases:
object
Map how a list of available configuration options should be validated.
Note
It should be used together with an Or object. The Or object will define the list of possible configuration options in a given context, and the Case object will dictate how to validate each of them, if they are set.
- __init__(schema: Dict[str, Any]) None View on GitHub
Create a Case object.
- Parameters:
schema – the schema for validating a set of attributes that may be available in the configuration. Each key is the configuration that is available in a given scope and that should be validated, and the related value is the validation function or expected type.
- Example:
Case({
    "host": validate_host_port,
    "url": str,
})
That will check that the host configuration, if given, is valid based on validate_host_port(), and will also check that the url configuration, if given, is a str instance.
- class patroni.validator.Directory(contains: List[str] | None = None, contains_executable: List[str] | None = None)View on GitHub
Bases:
object
Check if a directory contains the expected files.
The attributes of objects of this class are used by their validate() method.
- Parameters:
contains – list of paths that should exist relative to a given directory.
contains_executable – list of executable files that should exist directly under a given directory.
- __init__(contains: List[str] | None = None, contains_executable: List[str] | None = None) None View on GitHub
Create a Directory object.
- Parameters:
contains – list of paths that should exist relative to a given directory.
contains_executable – list of executable files that should exist directly under a given directory.
- _check_executables(path: str | None = None) Iterator[Result] View on GitHub
Check that all executables from the contains_executable list exist within the given directory or within PATH.
- Parameters:
path – optional path to the base directory against which executables will be validated. If not provided, check within PATH.
- Yields:
objects with the error message containing the name of the executable, if any check fails.
- validate(name: str) Iterator[Result] View on GitHub
Check if the expected paths and executables can be found under name directory.
- Parameters:
name – path to the base directory against which paths and executables will be validated. Check against PATH if name is not provided.
- Yields:
objects with the error message related to the failure, if any check fails.
- class patroni.validator.EnumValidator(allowed_values: Tuple[str, ...], case_sensitive: bool = False, raise_assert: bool = False)View on GitHub
Bases:
object
Validate an enum setting.
- Variables:
allowed_values – a set or CaseInsensitiveSet object with allowed enum values.
raise_assert – if an assert call should be performed regarding expected values.
- __init__(allowed_values: Tuple[str, ...], case_sensitive: bool = False, raise_assert: bool = False) None View on GitHub
Create an EnumValidator object with the given allowed values.
- Parameters:
allowed_values – a tuple with allowed enum values.
case_sensitive – set to True to do case-sensitive comparisons.
raise_assert – if an assert call should be performed regarding expected values.
- class patroni.validator.IntValidator(min: int | None = None, max: int | None = None, base_unit: str | None = None, expected_type: Any = None, raise_assert: bool = False)View on GitHub
Bases:
object
Validate an integer setting.
- Variables:
min – minimum allowed value for the setting, if any.
max – maximum allowed value for the setting, if any.
base_unit – the base unit to convert the value to before checking if it’s within min and max range.
expected_type – the expected Python type.
raise_assert – if an assert test should be performed regarding expected type and valid range.
- __init__(min: int | None = None, max: int | None = None, base_unit: str | None = None, expected_type: Any = None, raise_assert: bool = False) None View on GitHub
Create an IntValidator object with the given rules.
- Parameters:
min – minimum allowed value for the setting, if any.
max – maximum allowed value for the setting, if any.
base_unit – the base unit to convert the value to before checking if it’s within min and max range.
expected_type – the expected Python type.
raise_assert – if an assert test should be performed regarding expected type and valid range.
- class patroni.validator.Optional(name: str, default: Any | None = None)View on GitHub
Bases:
object
Mark a configuration option as optional.
- Variables:
name – name of the configuration option.
default – value to set if the configuration option is not explicitly provided.
- class patroni.validator.Or(*args: Any)View on GitHub
Bases:
object
Represent the list of options that are available.
It can represent either a list of configuration options that are available in a given scope, or a list of validation functions and/or expected types for a given configuration option.
- __init__(*args: Any) None View on GitHub
Create an Or object.
- Parameters:
*args – any arguments that the caller wants to be stored in this Or object.
- Example:
Or("host", "hosts"): Case({
    "host": validate_host_port,
    "hosts": Or(comma_separated_host_port, [validate_host_port]),
})
The outer Or is used to define that host and hosts are possible options in this scope. The inner Or in the hosts key value is used to define that the hosts option is valid if either comma_separated_host_port() or validate_host_port() succeeds in validating it.
- class patroni.validator.Result(status: bool, error: str | None = "didn't pass validation", level: int = 0, path: str = '', data: Any = '')View on GitHub
Bases:
object
Represent the result of a given validation that was performed.
- Variables:
status – If the validation succeeded.
path – YAML tree path of the configuration option.
data – value of the configuration option.
level – error level, in case of error.
error – error message if the validation failed, otherwise None.
- __init__(status: bool, error: str | None = "didn't pass validation", level: int = 0, path: str = '', data: Any = '') None View on GitHub
Create a Result object based on the given arguments.
Note
The error attribute is only set if status is failed.
- Parameters:
status – if the validation succeeded.
error – error message related to the validation that was performed, if the validation failed.
level – error level, in case of error.
path – YAML tree path of the configuration option.
data – value of the configuration option.
- class patroni.validator.Schema(validator: Any)View on GitHub
Bases:
object
Define a configuration schema.
It contains all the configuration options that are available in each scope, including the validation(s) that should be performed against each one of them. The validations will be performed whenever the Schema object is called, or its validate() method is called.
- Variables:
validator –
validator of the configuration schema. Can be any of these:
str: defines that a string value is required; or
type: any subclass of type, defines that a value of the given type is required; or
callable: any callable object, defines that validation will follow the code defined in the callable object. If the callable object contains an expected_type attribute, then it will check if the configuration value is of the expected type before calling the code of the callable object; or
list: list representing one or more values in the configuration; or
dict: dictionary representing the YAML configuration tree.
- __init__(validator: Any) None View on GitHub
Create a Schema object.
Note
This class is expected to be initially instantiated with a dict based validator argument. The idea is that dict represents the full YAML tree of configuration options. The validate() method will then walk recursively through the configuration tree, creating new instances of Schema with the new “base path”, to validate the structure and the leaf values of the tree. The recursion stops on leaf nodes, when it performs checks of the actual setting values.
- Parameters:
validator –
validator of the configuration schema. Can be any of these:
str
: defines that a string value is required; ortype
: any subclass oftype
, defines that a value of the given type is required; orcallable
: Any callable object, defines that validation will follow the code defined in the callable object. If the callable object contains anexpected_type
attribute, then it will check if the configuration value is of the expected type before calling the code of the callable object; orlist
: list representing it expects to contain one or more values in the configuration; ordict
: dictionary representing the YAML configuration tree.
The first 3 items in the above list are here referenced as “base validators”, which cause the recursion to stop.
If validator is a dict, then you should follow these rules:
For the keys it can be either:
A str instance. It will be the name of the configuration option; or
An Optional instance. The name attribute of that object will be the name of the configuration option, and that class marks this configuration option as optional to the user, allowing it to not be specified in the YAML; or
An Or instance. The args attribute of that object will contain a tuple of configuration option names. At least one of them should be specified by the user in the YAML.
For the values it can be either:
A new dict instance. It will represent a new level in the YAML configuration tree; or
A Case instance. This is required if the key of this value is an Or instance, and the Case instance is used to map each of the args in Or to their corresponding base validator in Case; or
An Or instance with one or more base validators; or
A list instance with a single item which is the base validator; or
A base validator.
- Example:
Schema({
    "application_name": str,
    "bind": {
        "host": validate_host,
        "port": int,
    },
    "aliases": [str],
    Optional("data_directory"): "/var/lib/myapp",
    Or("log_to_file", "log_to_db"): Case({
        "log_to_file": bool,
        "log_to_db": bool,
    }),
    "version": Or(int, float),
})
This sample schema defines that your YAML configuration follows these rules:
It must contain an application_name entry whose value should be a str instance;
It must contain a bind.host entry whose value should be valid as per function validate_host;
It must contain a bind.port entry whose value should be an int instance;
It must contain an aliases entry whose value should be a list of str instances;
It may optionally contain a data_directory entry, with a value which should be a string;
It must contain at least one of log_to_file or log_to_db, with a value which should be a bool instance;
It must contain a version entry whose value should be either an int or a float instance.
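The recursive walk described above can be illustrated with a small, self-contained sketch. This is not Patroni's actual code: the walk function, its error-message wording, and the sample schema keys are hypothetical, and only type and callable base validators are handled.

```python
# Minimal sketch (NOT Patroni's implementation) of recursive dict-schema
# validation: descend into nested dicts alongside the configuration and
# check leaf values against base validators.
from typing import Any, Iterator


def walk(validator: Any, data: Any, path: str = "") -> Iterator[str]:
    """Yield an error message for every configuration entry that fails."""
    if isinstance(validator, dict):
        for key, sub_validator in validator.items():
            if key not in data:
                yield f"{path}{key}: is not defined"
            else:
                # Recurse with an extended "base path", mirroring how Schema
                # creates new instances for each level of the YAML tree.
                yield from walk(sub_validator, data[key], f"{path}{key}.")
    elif isinstance(validator, type):  # base validator: expect that type
        if not isinstance(data, validator):
            yield f"{path.rstrip('.')}: {data!r} is not a {validator.__name__}"
    elif callable(validator):  # base validator: run the callable check
        if not validator(data):
            yield f"{path.rstrip('.')}: {data!r} didn't pass validation"


schema = {"restapi": {"listen": str, "connect_address": str}, "ttl": int}
config = {"restapi": {"listen": "0.0.0.0:8008"}, "ttl": "thirty"}
errors = list(walk(schema, config))
# errors reports the missing restapi.connect_address and the wrong ttl type
```

The real Schema additionally handles Optional, Or, Case and list validators, but the recursion-with-base-path idea is the same.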
- _data_key(key: str | Optional | Or | AtMostOne) → Iterator[str]
Map a key from the validator dictionary to the corresponding key(s) in the data dictionary.
- Parameters:
key – key from the validator attribute.
- Yields:
keys that should be used to access the corresponding value in the data attribute.
- iter() → Iterator[Result]
Iterate over validator, if it is an iterable object, to validate the corresponding entries in data.
Only dict, list, Directory and Or objects are considered iterable objects.
- Yields:
objects with the error message related to the failure, if any check fails.
- iter_dict() → Iterator[Result]
Iterate over a dict based validator to validate the corresponding entries in data.
- Yields:
objects with the error message related to the failure, if any check fails.
- iter_or() → Iterator[Result]
Perform all validations defined in an Or object for a given configuration option.
This method can only be called against leaf nodes in the configuration tree. Or objects defined in the validator keys are handled by the iter_dict() method.
- Yields:
objects with the error message related to the failure, if any check fails.
- validate(data: Any) → Iterator[Result]
Perform all validations from the schema against the given configuration.
It first checks that the data argument type is compliant with the type of the validator attribute.
Additionally, if the validator attribute is a callable object, it is called to validate the data argument. Before doing so, if the validator contains an expected_type attribute, data is checked against that expected type.
- Parameters:
data – configuration to be validated against validator.
- Yields:
objects with the error message related to the failure, if any check fails.
- patroni.validator._get_type_name(python_type: Any) → str
Get a user-friendly name for a given Python type.
- Parameters:
python_type – Python type whose user-friendly name should be taken.
- Returns:
user-friendly name of the given Python type.
- patroni.validator.assert_(condition: bool, message: str = 'Wrong value') → None
Assert that a given condition is True.
If the assertion fails, an exception with the given message is thrown.
- Parameters:
condition – result of a condition to be asserted.
message – message to be thrown if the condition is False.
- patroni.validator.comma_separated_host_port(string: str) → bool
Validate a comma-separated list of host and port items.
Call validate_host_port_list() with the list represented by the CSV string.
- Parameters:
string – comma-separated list of host and port items.
- Returns:
True if all items in the CSV string are valid.
- patroni.validator.data_directory_empty(data_dir: str) → bool
Check if the PostgreSQL data directory is empty.
- Parameters:
data_dir – path to the PostgreSQL data directory to be checked.
- Returns:
True if the data directory is empty.
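The emptiness check itself can be sketched with the standard library. This is a simplified stand-in: directory_empty is a hypothetical name, and here a path counts as empty when it does not exist or has no entries.

```python
# Sketch (not Patroni's exact helper) of an "is this data directory empty?"
# check, exercised against a throwaway temporary directory.
import os
import tempfile


def directory_empty(path: str) -> bool:
    """Treat a path as empty if it does not exist or contains no entries."""
    return not os.path.exists(path) or not os.listdir(path)


with tempfile.TemporaryDirectory() as data_dir:
    empty_before = directory_empty(data_dir)             # freshly created, no files
    open(os.path.join(data_dir, "PG_VERSION"), "w").close()
    empty_after = directory_empty(data_dir)              # now contains a file
```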
- patroni.validator.get_bin_name(bin_name: str) → str
Get the value of the postgresql.bin_name[*bin_name*] configuration option.
- Parameters:
bin_name – a key to be retrieved from the postgresql.bin_name configuration.
- Returns:
value of postgresql.bin_name[*bin_name*], if present, otherwise bin_name.
- patroni.validator.is_ipv4_address(ip: str) → bool
Check if ip is a valid IPv4 address.
- Parameters:
ip – the IP to be checked.
- Returns:
True if the IP is an IPv4 address.
- Raises:
ConfigParseError: if ip is not a valid IPv4 address.
- patroni.validator.is_ipv6_address(ip: str) → bool
Check if ip is a valid IPv6 address.
- Parameters:
ip – the IP to be checked.
- Returns:
True if the IP is an IPv6 address.
- Raises:
ConfigParseError: if ip is not a valid IPv6 address.
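Both IP checks can be approximated with the standard-library ipaddress module. This is an illustrative sketch, not Patroni's implementation: the helpers below return False on invalid input instead of raising ConfigParseError as the real functions do.

```python
# Sketch of IPv4/IPv6 address checks using the stdlib ipaddress module.
import ipaddress


def looks_like_ipv4(ip: str) -> bool:
    """True if the string parses as an IPv4 address."""
    try:
        return isinstance(ipaddress.ip_address(ip), ipaddress.IPv4Address)
    except ValueError:
        return False


def looks_like_ipv6(ip: str) -> bool:
    """True if the string parses as an IPv6 address."""
    try:
        return isinstance(ipaddress.ip_address(ip), ipaddress.IPv6Address)
    except ValueError:
        return False
```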
- patroni.validator.validate_binary_name(bin_name: str) → bool
Validate the value of the postgresql.binary_name[*bin_name*] configuration option.
If postgresql.bin_dir is set, the value of bin_name must meet these conditions:
The path join of postgresql.bin_dir plus the bin_name value exists; and
The path join as above is executable.
If postgresql.bin_dir is not set, the value of bin_name must meet this condition:
It is found in the system PATH using which.
- Parameters:
bin_name – the value of postgresql.bin_name[*bin_name*].
- Returns:
True if the conditions are true.
- Raises:
ConfigParseError if:
bin_name is not set; or
the path join of postgresql.bin_dir plus bin_name does not exist; or
the path join as above is not executable; or
bin_name cannot be found in the system PATH.
- patroni.validator.validate_connect_address(address: str) → bool
Check if options related to the connection address were properly configured.
- Parameters:
address – address to be validated in the format host:port.
- Returns:
True if the address is valid.
- Raises:
ConfigParseError:
If the address is not in the expected format; or
If the host is set to a disallowed value (127.0.0.1, 0.0.0.0, *, ::1, or localhost).
- patroni.validator.validate_data_dir(data_dir: str) → bool
Validate the value of the postgresql.data_dir configuration option.
It requires that postgresql.data_dir is set and matches one of the following conditions:
Points to a path that does not exist yet; or
Points to an empty directory; or
Points to a non-empty directory that seems to contain a valid PostgreSQL data directory.
- Parameters:
data_dir – the value of the postgresql.data_dir configuration option.
- Returns:
True if the PostgreSQL data directory is valid.
- Raises:
ConfigParseError:
If no data_dir was given; or
If data_dir is a file and not a directory; or
If data_dir is a non-empty directory and:
the PG_VERSION file is not available in the directory; or
pg_wal/pg_xlog is not available in the directory; or
the PG_VERSION content does not match the major version reported by postgres --version.
- patroni.validator.validate_host_port(host_port: str, listen: bool = False, multiple_hosts: bool = False) → bool
Check if host(s) and port are valid and available for usage.
- Parameters:
host_port – the host(s) and port to be validated. It can be in either of these formats:
host:port, if multiple_hosts is False; or
host_1,host_2,...,host_n:port, if multiple_hosts is True.
listen – if the address is expected to be available for binding. False means it expects to connect to that address, and True that it expects to bind to that address.
multiple_hosts – if host_port can contain multiple hosts.
- Returns:
True if the host(s) and port are valid.
- Raises:
ConfigParseError:
If host_port is not in the expected format; or
If * was specified along with more hosts in host_port; or
If we are expecting to bind to an address that is already in use; or
If we are not able to connect to an address that we are expecting to be able to; or
If gaierror is thrown by the socket module when attempting to connect to the given address(es).
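The two accepted host_port formats can be parsed by splitting on the last colon. This sketch (split_host_port is a hypothetical helper, not Patroni's) ignores IPv6 bracket handling and the bind/connect availability checks that validate_host_port additionally performs.

```python
# Sketch of parsing 'host:port' and 'host_1,...,host_n:port' strings.
from typing import List, Tuple


def split_host_port(host_port: str, multiple_hosts: bool = False) -> Tuple[List[str], int]:
    """Split on the last colon; optionally treat the host part as a CSV list."""
    host_part, _, port = host_port.rpartition(':')
    hosts = host_part.split(',') if multiple_hosts else [host_part]
    return hosts, int(port)


single = split_host_port("localhost:8008")
multi = split_host_port("10.0.0.1,10.0.0.2:5432", multiple_hosts=True)
```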
- patroni.validator.validate_host_port_list(value: List[str]) → bool
Validate a list of host(s) and port items.
Call validate_host_port() with each item in value.
- Parameters:
value – list of host(s) and port items to be validated.
- Returns:
True if all items are valid.
- patroni.validator.validate_host_port_listen(host_port: str) → bool
Check if host and port are valid and available for binding.
Call validate_host_port() with listen set to True.
- Parameters:
host_port – the host and port to be validated. Must be in the format host:port.
- Returns:
True if the host and port are valid and available for binding.
- patroni.validator.validate_host_port_listen_multiple_hosts(host_port: str) → bool
Check if host(s) and port are valid and available for binding.
Call validate_host_port() with both listen and multiple_hosts set to True.
- Parameters:
host_port – the host(s) and port to be validated. It can be in either of these formats:
host:port; or
host_1,host_2,...,host_n:port.
- Returns:
True if the host(s) and port are valid and available for binding.
- patroni.validator.validate_watchdog_mode(value: Any) → None
Validate the watchdog.mode configuration option.
- Parameters:
value – value of watchdog.mode to be validated.
patroni.version module
This module specifies the current Patroni version.
- var __version__:
the current Patroni version.
Module contents
Define general variables and functions for patroni.
- var PATRONI_ENV_PREFIX:
prefix for Patroni related configuration environment variables.
- var KUBERNETES_ENV_PREFIX:
prefix for Kubernetes related configuration environment variables.
- var MIN_PSYCOPG2:
minimum version of psycopg2 required by Patroni to work.
- var MIN_PSYCOPG3:
minimum version of psycopg required by Patroni to work.
- patroni.parse_version(version: str) → Tuple[int, ...]
Convert version from human-readable format to a tuple of integers.
Note: designed for easy comparison of software versions in Python.
- Parameters:
version – human-readable software version, e.g. 2.5.4.dev1 (dt dec pq3 ext lo64).
- Returns:
tuple of version parts, each part an integer.
- Example:
>>> parse_version('2.5.4.dev1 (dt dec pq3 ext lo64)')
(2, 5, 4)
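A rough equivalent of that behavior (a sketch, not Patroni's exact code) keeps the leading numeric dot-separated components and stops at the first non-numeric part:

```python
# Sketch of parse_version-style parsing: take the first whitespace-separated
# token, then collect dot-separated components until one is not numeric.
from typing import Tuple


def parse_version_sketch(version: str) -> Tuple[int, ...]:
    parts = []
    for part in version.split(' ')[0].split('.'):
        if not part.isdigit():
            break  # stop at e.g. 'dev1' or any non-numeric suffix
        parts.append(int(part))
    return tuple(parts)


result = parse_version_sketch('2.5.4.dev1 (dt dec pq3 ext lo64)')
```

The resulting tuples compare element-wise, which is what makes versions easy to order in Python.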