Rubix Documentation¶
RubiX is a light-weight data caching framework that can be used by Big-Data engines. RubiX can be extended to support any engine that accesses data in cloud stores using the Hadoop FileSystem interface, via plugins. Using the same plugins, RubiX can also be extended to work with any cloud store.
RubiX¶
RubiX is a light-weight data caching framework that can be used by Big-Data engines. RubiX can be extended to support any engine that accesses data in cloud stores using the Hadoop FileSystem interface, via plugins. Using the same plugins, RubiX can also be extended to work with any cloud store.
Use Case¶
RubiX provides disk or in-memory caching of data that would otherwise be accessed over the network when it resides in a cloud store, thereby improving performance.
Installation Guide¶
This section provides instructions to install RubiX and use it with Presto, Hive, or Spark. If you need help installing on-prem or on other distributions, please contact the community with your questions or issues.
These instructions require RubiX Admin to be installed on your cluster.
Getting Started¶
Note
Make sure passwordless SSH is set up for your cluster before starting.
Install RubiX¶
Install RubiX Admin using PIP:
pip install rubix_admin
Run rubix_admin -h to generate a config file for RubiX Admin at ~/.radminrc.
Add the worker node IPs to the workers list so the file looks like the following:
coordinator:
- localhost
workers:
- <worker-ip1>
- <worker-ip2>
..
remote_packages_path: /tmp/rubix_rpms
Once RubiX Admin is configured, install the latest version of RubiX on all nodes specified in ~/.radminrc:
rubix_admin installer install --cluster-type <type>
To install a specific version of RubiX:
rubix_admin installer install --cluster-type <type> --rpm-version <rubix-version>
To install from an RPM file:
rubix_admin installer install --cluster-type <type> --rpm <path-to-rubix-rpm>
Start RubiX Daemons¶
Use the following command to start the BookKeeperServer and LocalDataTransferServer on all nodes specified in ~/.radminrc:
rubix_admin daemon start
To verify that the daemons are running, run the following command on each node:
sudo jps -m
You should see the following two entries in the resulting list:
<pid> RunJar ... com.qubole.rubix.bookkeeper.BookKeeperServer
<pid> RunJar ... com.qubole.rubix.bookkeeper.LocalDataTransferServer
If there was an issue starting the daemons, logs for RubiX can be found at /var/log/rubix/.
Configure engine to use RubiX¶
Presto¶
In order for Presto to use RubiX, you will first need to create an external table through Hive using RubiX as the URL scheme in place of S3.
Start Hive with the following command. This will restart the metastore server, allowing the rubix:// scheme to be recognized:
hive --hiveconf hive.metastore.uris="" \
--hiveconf fs.rubix.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
You will also need to set your AWS access & secret keys for authenticating with S3:
hive> set fs.s3n.awsAccessKeyId=<access-key>
hive> set fs.s3n.awsSecretAccessKey=<secret-key>
Once this is done, create your external table, but specify rubix:// instead of s3:// as the URL scheme:
CREATE EXTERNAL TABLE...
...
LOCATION 'rubix://<s3-path>'
Once created, your table is configured to use RubiX.
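For example, a complete statement might look like the following sketch; the table name, columns, storage format, and bucket path are placeholders, not part of the RubiX documentation:
CREATE EXTERNAL TABLE my_rubix_table (
  id INT,
  name STRING
)
STORED AS ORC
LOCATION 'rubix://my-bucket/path/to/data/';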
Spark¶
In order to use Spark with S3, you will need to specify your AWS access & secret keys when running your application:
...
--conf spark.hadoop.fs.s3.awsAccessKeyId=<access-key>
--conf spark.hadoop.fs.s3.awsSecretAccessKey=<secret-key>
...
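As an illustration, a full spark-submit invocation carrying these keys might look like the following sketch; the application class and JAR are placeholders:
spark-submit \
  --class com.example.MyApp \
  --conf spark.hadoop.fs.s3.awsAccessKeyId=<access-key> \
  --conf spark.hadoop.fs.s3.awsSecretAccessKey=<secret-key> \
  my-app.jar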
Alternatively, you can add the following lines to your Spark properties file (default location: $SPARK_HOME/conf/spark-defaults.conf) to set them for every application:
spark.hadoop.fs.s3.awsAccessKeyId <access-key>
spark.hadoop.fs.s3.awsSecretAccessKey <secret-key>
Note
RubiX configuration options can also be set this way by prefixing them with spark.hadoop. (spark.hadoop.<rubix-conf-key>).
Run your first RubiX-enhanced query¶
Once you have properly configured your data engine, RubiX will cache data as it is fetched from S3.
You can verify this in the logs for your data engine, which should show usage of a Caching…S3FileSystem, as well as in the BookKeeper logs at /var/log/rubix/bks.log.
Troubleshooting¶
ClassNotFoundException: org.apache.hadoop.fs.s3native.NativeS3FileSystem¶
Hadoop requires the hadoop-aws JAR in order to access files stored on S3. If Hadoop is unable to find the NativeS3FileSystem class, make sure this JAR is included in your Hadoop classpath. This JAR should be provided as part of your Hadoop installation. Check Hadoop’s S3 documentation for more details.
ClassNotFoundException: org.jets3t.service.ServiceException¶
Spark requires JetS3t in order to execute applications using S3. If Spark is unable to find this class, make sure jets3t-x.x.x.jar is included in spark.driver.extraClassPath and spark.executor.extraClassPath in your Spark configuration. This JAR should be provided as part of your Hadoop installation.
Data Engines¶
This section provides additional information for using RubiX with various data engines.
These instructions require that RubiX Admin has been installed on your cluster.
Presto¶
Enable RubiX Caching for Table¶
In order for Presto to use RubiX, you will first need to create an external table through Hive using RubiX as the URL scheme in place of S3.
Start Hive with the following command. This will restart the metastore server, allowing the rubix:// scheme to be recognized:
hive --hiveconf hive.metastore.uris="" \
--hiveconf fs.rubix.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
You will also need to set your AWS access & secret keys for authenticating with S3:
hive> set fs.s3n.awsAccessKeyId=<access-key>
hive> set fs.s3n.awsSecretAccessKey=<secret-key>
Once this is done, create your external table, but specify rubix:// instead of s3:// as the URL scheme:
CREATE EXTERNAL TABLE...
...
LOCATION 'rubix://<s3-path>'
Once created, your table is configured to use RubiX.
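Presto queries against the table then read through RubiX transparently. As a quick illustration, assuming the table was created in the default schema of the hive catalog (the table name is a placeholder):
presto-cli --catalog hive --schema default \
  --execute "SELECT COUNT(*) FROM my_rubix_table"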
Spark¶
Configuration¶
In order to use Spark with S3, you will need to specify your AWS access & secret keys when running your application:
...
--conf spark.hadoop.fs.s3.awsAccessKeyId=<access-key>
--conf spark.hadoop.fs.s3.awsSecretAccessKey=<secret-key>
...
Alternatively, you can add the following lines to your Spark properties file (default location: $SPARK_HOME/conf/spark-defaults.conf) to set them for every application:
spark.hadoop.fs.s3.awsAccessKeyId <access-key>
spark.hadoop.fs.s3.awsSecretAccessKey <secret-key>
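With the keys in place, reads through the s3 scheme use the configured credentials. A minimal spark-shell sketch, assuming a hypothetical bucket containing Parquet data and an s3 scheme that is configured in your deployment:
scala> val df = spark.read.parquet("s3://my-bucket/path/to/data/")
scala> df.count()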
Note
RubiX configuration options can also be set this way by prefixing them with spark.hadoop. (spark.hadoop.<rubix-conf-key>).
Hive¶
Add RubiX JARs to Hadoop Classpath¶
Add RubiX JARs through the Hive CLI:
hive> add jar /usr/lib/rubix/lib/rubix-hadoop2.jar
hive> add jar /usr/lib/rubix/lib/rubix-common.jar
hive> add jar /usr/lib/rubix/lib/rubix-core.jar
hive> add jar /usr/lib/rubix/lib/rubix-spi.jar
OR
Copy JARs to Hadoop’s shared lib directory:
cp /usr/lib/rubix/lib/rubix-hadoop2.jar \
/usr/lib/rubix/lib/rubix-common.jar \
/usr/lib/rubix/lib/rubix-core.jar \
/usr/lib/rubix/lib/rubix-spi.jar \
/usr/lib/hadoop/lib/share/hadoop/tools/lib
Configure Hive to use RubiX CachingFileSystem for S3/S3N/S3A schemes¶
If you use this option, all file system interactions with tables located in AWS S3 will automatically use RubiX’s CachingFileSystem.
hive --hiveconf \
fs.s3.impl=com.qubole.rubix.hadoop2.CachingNativeS3FileSystem \
fs.s3n.impl=com.qubole.rubix.hadoop2.CachingNativeS3FileSystem \
fs.s3a.impl=com.qubole.rubix.hadoop2.CachingS3AFileSystem
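If you prefer to persist these mappings instead of passing them on the command line, the standard Hadoop approach is to add them to a configuration file such as core-site.xml. A sketch for the s3a scheme only (adapt the s3/s3n entries analogously; the exact file location depends on your distribution):
<property>
  <name>fs.s3a.impl</name>
  <value>com.qubole.rubix.hadoop2.CachingS3AFileSystem</value>
</property>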
Configuration¶
Cache¶
Option | Description | Type | Default | Client/ Server | Applicable to Embedded mode |
---|---|---|---|---|---|
rubix.cache.block.size | The amount of data downloaded per block requested for caching. (if block size = 10MB, request for 45MB of data will download 5 blocks of 10MB) | integer (bytes) | 1048576 (1MB) | C & S | Yes |
rubix.cache.dirprefix.list | The list of directories to be used as parents for storing cache files. Example: /media/ephemeral0/fcache/ | list (comma-separated) | /media/ephemeral | C & S | No |
rubix.cache.max.disks | The number of (zero-indexed) disks within the parent directory to be used for storing cached files. Example: /media/ephemeral0 to /media/ephemeral4 | integer | 5 | C & S | No |
rubix.cache.dirsuffix | The name of the subdirectory to be used for storing cache files. Example: /media/ephemeral0/fcache/ | string | /fcache/ | C & S | Yes |
rubix.cache.expiration.after-write | The time files will be kept in cache prior to eviction. | integer (ms) | MAX_VALUE | S | Yes |
rubix.cache.usage.percentage | The percentage of the disk space that will be filled with cached data before cached files will start being evicted. | integer (%) | 80 | S | Yes |
rubix.cache.strict.mode | Propagate exceptions if there is an error while caching data if true; otherwise fall back on reading data directly from remote file system. | boolean | false | C | No |
rubix.cache.file.staleness-check.enable | When true, always check for updates to file metadata from remote filesystem. When false, file metadata will be cached for a period of time before being fetched again. | boolean | true | S | Yes |
rubix.cache.stale.fileinfo.expiry.period | (rubix.cache.file.staleness-check.enable must be false) The time file metadata will be cached before it will be fetched again from the remote filesystem. | integer (s) | 36000 | S | Yes |
rubix.cache.parallel.warmup | When true, cache will be warmed up asynchronously. | boolean | false | C & S | No |
rubix.cache.dummy.mode | When true, the cache is not populated with data and queries read data directly from the source, but metadata is updated so that statistics such as hitrate, cache reads etc. can be collected as if the data was cached. | boolean | false | C | Yes |
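These options are ordinary Hadoop configuration keys, so client-side options can be passed the same way as the other settings in this guide, e.g. via --hiveconf for Hive or the spark.hadoop. prefix for Spark. A sketch with illustrative values only; note the assumption that options marked S in the table must also reach the configuration used by the BookKeeper daemons:
hive --hiveconf rubix.cache.block.size=2097152 \
     --hiveconf rubix.cache.strict.mode=true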
Network¶
Option | Description | Type | Default | Client/Server | Applicable to Embedded mode |
---|---|---|---|---|---|
rubix.network.bookkeeper.server.port | The port on which the BookKeeper server is listening. | integer | 8899 | C & S | No |
rubix.network.local.transfer.server.port | The port on which the Local Data Transfer server is listening. | integer | 8898 | C | No |
rubix.network.client.num-retries | The maximum number of retry attempts for executing calls to the BookKeeper server. | integer | 3 | C & S | Yes |
rubix.network.server.connect.timeout | The maximum time to wait for a connection to the BookKeeper server. | integer (ms) | 1000 | C & S | Yes |
rubix.network.server.socket.timeout | The maximum time to wait for a response to requests sent to the BookKeeper server. | integer (ms) | 3000 | C & S | Yes |
rubix.network.client.read.timeout | The maximum time to wait when reading data from another node. | integer (ms) | 3000 | C | Yes |
Cluster¶
Option | Description | Type | Default | Client / Server | Applicable to Embedded mode |
---|---|---|---|---|---|
rubix.cluster.node.refresh.time | The frequency at which the cluster node membership will be checked. | integer (s) | 300 sec | C & S | Yes |
rubix.cluster.manager.hadoop.class | The ClusterManager class to use for fetching node-related information for Hadoop clusters. | string | com.qubole.rubix.hadoop2.Hadoop2ClusterManager | C & S | No |
rubix.cluster.manager.presto.class | The ClusterManager class to use for fetching node-related information for Presto clusters. | string | com.qubole.rubix.presto.PrestoClusterManager | C & S | No |
Metrics¶
Option | Description | Type | Default | Client / Server |
---|---|---|---|---|
rubix.metrics.cache.enabled | Collect cache-level metrics if true. | boolean | true | S |
rubix.metrics.health.enabled | Collect heartbeat metrics if true. | boolean | true | S |
rubix.metrics.jvm.enabled | Collect JVM-level metrics if true. | boolean | false | S |
rubix.metrics.reporters | The reporters to be used for collecting metrics. Options: JMX, GANGLIA | list (comma-separated) | JMX,GANGLIA | S |
rubix.metrics.reporting.interval | The interval at which all registered reporters will report their metrics. | integer (ms) | 10000 | S |
rubix.metrics.ganglia.host | The host at which the Ganglia server (gmond) is running. | string | 127.0.0.1 (localhost) | S |
rubix.metrics.ganglia.port | The port on which the Ganglia server (gmond) is listening. | integer | 8649 | S |
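As an illustration, reporting metrics to a Ganglia gmond running on a different host could be configured roughly as follows; the host is a placeholder, and how these keys reach the BookKeeper server depends on your deployment:
rubix.metrics.reporters=JMX,GANGLIA
rubix.metrics.ganglia.host=<gmond-host>
rubix.metrics.ganglia.port=8649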
Metrics¶
BookKeeper Server Metrics¶
These metrics are available on the BookKeeper server.
Health Metrics
Metrics relating to daemon & service health.
Metric | Description | Abnormalities |
---|---|---|
rubix.bookkeeper.gauge.live_workers | The number of workers currently reporting to the master node. | Mismatch with number reported by engine (Presto, Spark, etc.) |
rubix.bookkeeper.gauge.caching_validated_workers | The number of workers reporting caching validation success. | Mismatch with live worker count (one or more workers failed validation) |
Cache Metrics
Metrics relating to cache interactions.
Metric | Description | Abnormalities |
---|---|---|
rubix.bookkeeper.gauge.cache_size_mb | The current size of the local cache in MB. | Cache size is bigger than configured capacity |
rubix.bookkeeper.gauge.available_cache_size_mb | The current disk space available for cache in MB. | |
rubix.bookkeeper.count.cache_eviction | The number of files removed from the local cache due to size constraints. | No cache evictions & cache has exceeded configured capacity |
rubix.bookkeeper.count.cache_invalidation | The number of files invalidated from the local cache when the source file has been modified. | |
rubix.bookkeeper.count.cache_expiry | The number of files removed from the local cache once expired. | |
rubix.bookkeeper.gauge.cache_hit_rate | The percentage of cache hits for the local cache. | Cache hit rate near 0% |
rubix.bookkeeper.gauge.cache_miss_rate | The percentage of cache misses for the local cache. | Cache miss rate near 100% |
rubix.bookkeeper.count.total_request | The total number of requests made to read data. | |
rubix.bookkeeper.count.cache_request | The number of requests made to read data cached locally. | No cache requests made |
rubix.bookkeeper.count.nonlocal_request | The number of requests made to read data from another node. | No non-local requests made |
rubix.bookkeeper.count.remote_request | The number of requests made to download data from the data store. | No remote requests made |
rubix.bookkeeper.count.total_async_request | The total number of requests made to download data asynchronously. | |
rubix.bookkeeper.count.processed_async_request | The total number of asynchronous download requests that have already been processed. | |
rubix.bookkeeper.gauge.async_queue_size | The current number of queued asynchronous download requests. | High queue size (requests not being processed) |
rubix.bookkeeper.count.async_downloaded_mb | The amount of data asynchronously downloaded, in MB. (If there are no cache evictions, this should match cache_size_mb.) | |
rubix.bookkeeper.count.async_download_time | Total time spent on downloading data, in seconds. | |
JVM Metrics
Metrics relating to JVM statistics, supplied by the Dropwizard Metrics metrics-jvm module (https://metrics.dropwizard.io/3.1.0/manual/jvm/).
Metric | Description | Abnormalities |
---|---|---|
rubix.bookkeeper.jvm.gc.* rubix.ldts.jvm.gc.* | Metrics relating to garbage collection (GarbageCollectorMetricSet) | |
rubix.bookkeeper.jvm.memory.* rubix.ldts.jvm.memory.* | Metrics relating to memory usage (MemoryUsageGaugeSet) | |
rubix.bookkeeper.jvm.threads.* rubix.ldts.jvm.threads.* | Metrics relating to thread states (CachedThreadStatesGaugeSet) |
Client side Metrics¶
These metrics are available on the client side, i.e. Presto or Spark, where the jobs that read data are run.
Client-side metrics are divided into two groups:
- Basic stats: available under the name rubix:name=stats
- Detailed stats: available under the name rubix:name=stats,type=detailed
If RubiX is used in embedded mode, an engine-specific suffix is added to these names; e.g., Presto adds a catalog=<catalog_name> suffix.
The following sections cover the metrics available under both these types in detail.
Basic stats
Metric | Description |
---|---|
mb_read_from_cache | Data read from cache by the client jobs |
mb_read_from_source | Data read from Source by the client jobs |
cache_hit | Cache Hit ratio, between 0 and 1 |
Detailed Stats
The data unit for all metrics above is MB.
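As a quick illustration of how these MBeans can be read, the following minimal Java sketch connects to a JVM over remote JMX and dumps the attributes of the basic stats bean. The JMX URL and port are placeholders and assume remote JMX has been enabled on the engine process; in embedded mode, remember the engine-specific suffix mentioned above.
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RubixStatsReader
{
  public static void main(String[] args) throws Exception
  {
    // Hypothetical endpoint; point this at the engine JVM with remote JMX enabled.
    JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
    try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
      MBeanServerConnection connection = connector.getMBeanServerConnection();
      ObjectName stats = new ObjectName("rubix:name=stats");
      // Print every attribute exposed under the basic stats MBean.
      for (MBeanAttributeInfo attr : connection.getMBeanInfo(stats).getAttributes()) {
        System.out.println(attr.getName() + " = " + connection.getAttribute(stats, attr.getName()));
      }
    }
  }
}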
Contribution Guidelines¶
This section provides guidelines to contribute to the project through code, issues and documentation.
Developer Environment¶
Rubix is a Maven project and uses Java 8. It uses JUnit as the testing framework. Ensure that you have a development environment that supports this configuration.
Pre-requisites¶
- The thrift binary needs to be available at /usr/local/bin/thrift. Rubix will not compile with newer versions of thrift; it is recommended to install thrift version 0.9.3 by downloading the source from here and installing it using the steps mentioned here (you can confirm the installed version with the check below).
- Java JDK 8 needs to be used. If you see an error like Fatal error compiling: invalid target release: 1.8 during compilation, then set up your system to use Java JDK 8 for the build.
- For generating the RPM you need the rpmbuild command available. On Debian-based systems sudo apt-get install rpmbuild and on RPM-based systems sudo yum install rpm-build make it available.
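To confirm which Thrift compiler the build will pick up, you can check the version of the binary directly (expected output shown as a comment):
/usr/local/bin/thrift -version
# Thrift version 0.9.3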
Building¶
Fork your own copy of RubiX into your github account by clicking on the “Fork” button
Navigate to your account and clone that copy to your development box
git clone https://github.com/<username>/rubix
Run tests in the RubiX root directory.
mvn test
Add Qubole RubiX as upstream
git remote add upstream https://github.com/qubole/rubix.git
git fetch upstream
How to contribute code on Github¶
1. Create a branch and start working on your change.¶
cd rubix
git checkout -b new_rubix_branch
2. Code¶
- Adhere to code standards.
- Include tests and ensure they pass.
- Add release notes, if required, in the (next-release) rst file under docs/release/release_notes/
3. Commit¶
For every commit, please write a short summary (max 72 characters) in the first line, followed by a blank line and then a more detailed description of the change.
Don’t forget a prefix!
More details in Commit Guidelines
4. Update your branch¶
git fetch upstream
git rebase upstream/master
5. Push to remote¶
git push -u origin new_rubix_branch
6. Issue a Pull Request¶
- Navigate to the Rubix repository you just pushed to (e.g. https://github.com/your-user-name/rubix)
- Click Pull Request.
- Write your branch name in the branch field (this is filled with master by default)
- Click Update Commit Range.
- Ensure the changesets you introduced are included in the Commits tab.
- Ensure that the Files Changed incorporate all of your changes.
- Fill in some details about your potential patch including a meaningful title.
- Click Send pull request.
7. Respond to feedback¶
The RubiX team may recommend adjustments to your code. Interacting with a healthy open-source community requires you to be open to learning new techniques and strategies; don’t get discouraged! Remember: if the RubiX team suggests changes to your code, they care enough about your work that they want to include it, and hope that you can assist by implementing those revisions on your own.
8. Postscript¶
Once all the changes are approved, one contributor will push the change to the upstream code.
Coding conventions¶
- Two spaces, no tabs
- No trailing whitespace; blank lines should have no spaces
- Do not mix multiple fixes into a single commit.
- Add comments for your future selves and for your current/future peers
- Do not make whitespace changes as part of your regular/feature commits.
- If you feel whitespace issues need to be fixed, please push a separate commit for the same. It will be approved quickly without any discussion.
Testing¶
This section provides contribution guidelines specific to testing.
Robot Framework Integration Tests¶
For more detailed info regarding Robot Framework and its capabilities, read the Robot Framework user guide.
Test Suites¶
Each .robot file is a test suite. Related tests should be kept in the same test suite, to allow for reuse of variables and suite-level keywords.
Test suites contain the following sections:
*** Settings ***¶
This section is for specifying suite-level documentation, as well as keywords for suite-level setup & teardown.
This section also specifies other sources for keywords used by the test suite. These can be:
- Robot Framework built-in libraries, such as Collections and OperatingSystem
- Other .robot files containing helpful keywords, such as setup.robot for setting up & tearing down tests
- A fully-qualified Java class from a custom library containing methods that can be run as keywords (more info below)
Example:
*** Settings ***
Library      OperatingSystem
Resource     bookkeeper.robot
Library      com.qubole.rubix.client.robotframework.BookKeeperClientRFLibrary
*** Variables ***¶
This section contains any variables common to the test cases in the suite.
Common variables needed across test suites include the following:
${WORKINGDIR} ${TEMPDIR}${/}<test-suite-name>
${DATADIR} ${WORKINGDIR}${/}data
${CACHE_DIR_PFX} ${WORKINGDIR}${/}
${CACHE_DIR_SFX} /fcache/
${CACHE_NUM_DISKS} <number-of-cache-disks>
Note: ${TEMPDIR} is supplied by Robot Framework and points to the operating system’s temp directory, while ${/} is the operating system’s path separator.
*** Test Cases ***¶
This is where test cases are defined.
A test case starts with its name on its own line, followed by the keywords to be executed on indented lines below it.
For RubiX, we use test templates to run the test and verify that it passes with different modes of execution.
*** Keywords ***¶
This section contains any suite-level keywords, as well as keywords for running tests defined as templates.
Like test cases, keywords include their name on its own line, and the keywords to be run on indented lines after it.
Keywords should include a [Documentation] tag to provide details regarding the purpose and/or usage of the keyword.
Test Cases¶
In general, a test case will require the following components:
- Setup
- Body
- Data generation
- Execution
- Verification
- Teardown
Setup¶
Start the test case with the Cache test setup keyword to start a BookKeeper server with the provided configuration options and create the directory used for storing generated data for the test.
The following example starts a server as a master, and configures the cache directory settings and its maximum size.
Cache test setup
... ${DATADIR}
... rubix.cluster.is-master=true
... rubix.cache.dirprefix.list=${CACHE_DIR_PFX}
... rubix.cache.dirsuffix=${CACHE_DIR_SFX}
... rubix.cache.max.disks=${CACHE_NUM_DISKS}
... rubix.cache.fullness.size=${CACHE_MAX_SIZE}
Test Body¶
Integration tests need to:
- generate any files needed for test execution
- execute whatever steps necessary to sufficiently test the desired scenario
- verify the state of the BookKeeper & cache using metrics and other helper keywords
You can generate data files individually:
${fileName} = Generate single test file ${filePath} ${fileLength}
or as a batch of files with similar characteristics:
@{fileNames} = Generate test files ${filePathPrefix} ${fileLength} ${numberOfFiles}
In order to execute calls using the BookKeeper server, you will need to make a request object.
Similar to generating test files, requests can be generated individually:
${request} = Make read request
... ${fileName}
... ${startBlock}
... ${endBlock}
... ${fileLength}
... ${lastModified}
... ${clusterType}
or as a batch of requests with similar characteristics:
@{requests} = Make similar read requests
... ${fileNames}
... ${startBlock}
... ${endBlock}
... ${fileLength}
... ${lastModified}
... ${clusterType}
For read calls, current execution modes include combinations of:
- Caching data by directly calling the BookKeeper server OR using a client file system
- Executing caching calls sequentially OR concurrently
The execution mode is determined by the keyword name passed into the test template. For example, the template below will first run the test with sequential calls to the BookKeeper server, and then with concurrent calls using the client file system.
Cache eviction
[Template] Test cache eviction
Execute read requests using BookKeeper server call runConcurrently=${false}
Concurrently execute read requests using client file system runConcurrently=${true}
The actual execution of the keyword is controlled by the following step, which will run the keyword concurrently on the specified number of threads if the flag is set to true, or sequentially otherwise.
RUN KEYWORD IF ${runConcurrently}
... Execute concurrent requests
... ${executionKeyword}
... ${numThreads}
... ${requests}
... ELSE
... Execute sequential requests
... ${executionKeyword}
... ${requests}
Test execution can be verified by comparing metrics values to expected values.
Verify metric value ${metricName} ${expectedValue}
As well, you can verify that the size of the cache is the expected size.
Verify cache directory size
... ${cacheDirPrefix}
... ${cacheDirSuffix}
... ${cacheDirNumDisks}
... ${expectedCacheSize}
Teardown¶
Finish the test case with the Cache test teardown keyword as a [Teardown] step; this ensures the BookKeeper server used for this test is properly shut down and the environment is cleaned before execution of the next test.
[Teardown] Cache test teardown ${DATADIR}
Style Guide¶
Variables¶
Variables within a keyword are camelCase.
@{testFileNames} = Generate test files ${REMOTE_PATH} ${FILE_LENGTH} ${NUM_TEST_FILES}
Variables in the “Variables” section of a test case are ALL_CAPS_AND_UNDERSCORE_SEPARATED (like Java constants).
${NUM_EXPECTED_EVICTIONS} 3
Keywords¶
Built-in keywords are ALL CAPS.
CREATE DIRECTORY ${directoryName}
Test keywords (defined in a test or resource .robot
file) are Sentence capitalized.
Generate single test file ${fileName} ${fileLength}
Custom library keywords (e.g. from BookKeeperClientRFLibrary) are camel Case.
&{metrics} = get Cache Metrics
Arguments are separated from the keyword and from each other by two spaces.
Verify metric value ${METRIC_EVICTION} ${NUM_EXPECTED_EVICTIONS}
If the keyword needs more than 3 arguments, place the arguments on separate lines.
${request} = Create test client read request
... ${fileName}
... ${startBlock}
... ${endBlock}
... ${fileLength}
... ${lastModified}
... ${clusterType}
Use named arguments for keywords where possible to enhance clarity.
Verify cache directory size
.....
... expectedCacheSize=${CACHE_MAX_SIZE}
For sets of keywords with similar arguments, alignment of arguments is preferred.
[Template] Test cache eviction
Download requests runConcurrently=${false}
Concurrently download requests runConcurrently=${true}
Read requests runConcurrently=${false}
Concurrently read requests runConcurrently=${true}
Custom Keywords¶
If a test requires more functionality than what Robot Framework can offer (such as when executing requests using the BookKeeper server), keywords can be created as functions in BookKeeperClientRFLibrary. All public methods in this class are exposed as keywords to be used by Robot Framework.
In the following example, getCacheMetrics() in BookKeeperClientRFLibrary is accessible for use by our custom Robot Framework keyword Verify metric value:
public Map<String, Double> getCacheMetrics() throws IOException, TException
{
try (RetryingBookkeeperClient client = createBookKeeperClient()) {
return client.getCacheMetrics();
}
}
Verify metric value
[Arguments] ${metricName} ${expectedValue}
&{metrics} = get Cache Metrics
...
Commit Message¶
Commits are used as a source of truth for various reports. A couple of examples are:
- Release Notes
- Issues resolved for QA to plan the QA cycle.
To be able to generate these reports, uniform commit messages are required. All commits should follow this convention:
For every commit, please write a short summary (max 72 characters) in the first line, followed by a blank line and then a more detailed description of the change.
Format of summary:
ACTION: AUDIENCE: COMMIT_MSG
Description:
ACTION is one of 'chg', 'fix', 'new'
Is WHAT the change is about.
'chg' is for refactor, small improvement, cosmetic changes...
'fix' is for bug fixes
'new' is for new features, big improvement
AUDIENCE is one of 'dev', 'usr', 'pkg', 'test', 'doc'
Is WHO is concerned by the change.
'dev' is for developers (API changes, refactors...)
'usr' is for final users
You will use your environment’s default editor (EDITOR=vi|emacs) to compose the commit message. Do NOT use the command line git commit -m "my mesg", as this only allows you to write a single line that most of the time turns out to be useless to others reading or reviewing your commit.
Example¶
new: dev: #124: report liveness metric for BookKeeper daemon (#139)
Add a liveness gauge that the daemon is up & alive. Right now, this
is a simple check that a thread (reporter to be added in a subsequent
commit) is alive. In the future, this simple framework will be used
to add more comprehensive health checks. Ref: #140
The above example shows that the commit summary is a single line composed of four columns:
- column 1 tells us the nature of the change or ACTION: new
- column 2 tells us the intended AUDIENCE: dev
- column 3 references the relevant issue: #124
- column 4 is a short one-line summary of WHAT the commit is doing
The description or the body of the commit message delves into more detail that is intended to serve as a history for developers on the team on how the code is evolving. There are more immediate uses of this description however. When you raise pull requests to make your contributions into the project, your commit descriptions serve as explanations of WHY you fixed an issue. HOW you fixed an issue is explained by code already. This is also the place where the peer-reviewers will begin understanding your code. An unclear commit message is the source of a lot of back and forth resulting in frustration between reviewers and committers.
Reference: http://chris.beams.io/posts/git-commit/
How to report issues¶
A bug report means something is broken, preventing normal/typical use of Rubix.
- Make sure the bug isn’t already resolved. Search for similar issues.
- Make sure you have clear instructions to reproduce your problem.
- If possible, submit a Pull Request with a failing test, or, if you’d rather take matters into your own hands, try to fix the bug yourself.
- Make a report of everything you know about the bug so far by opening an issue about it.
- When the bug is fixed, you can usually expect to see an update posted on the reporting issue.
Documentation Style Guide¶
- Documentation uses the Sphinx documentation generator.
- Documentation is hosted on ReadTheDocs.
- File issues if you notice bugs in the documentation or to request more information. Label these issues with doc.
- Contributions to documentation are accepted as Pull Requests.
- Choose Markdown if you will add new pages.
- Choose reStructuredText (rst) for indexes or if the documentation needs tables.
- To locally test docs changes, run python -msphinx . _build inside the docs directory.
Release Notes¶
Release 0.3.10¶
- Bypass getFileInfo network call when staleness check is enabled
- Remove runtime dependencies to fix ClassNotFound errors in Embedded mode
- Ensure InputStream for DirectReadRequestChain is always closed
- Make CachingFileSystem extend FilterFileSystem
- Fix connection leak in RetryingPooledThriftClient that happens if a connection terminates with an exception
Release 0.3.11¶
- Fix a regression from 0.3.10 that caused wrong BlockLocations to be returned in CachingFileSystem#listLocatedStatus
- Presto’s native NodeManager can now be used as cluster manager in embedded mode
Release 0.3.12¶
Fixes and Features¶
- Prevent RemoteFetchProcessor from stopping on exception
- Fail fast in BookKeeper startup if no disks are available for caching
- Fix over-estimation of disk usage by cache
- Enable FileSystem object cache in Rubix servers
- Allow configuring Rubix via a separate xml file. rubix.site.location can be used to provide the location of the Rubix configuration file
- Removed shading of GCS connector to fix caching over GoogleHadoopFileSystem
New Extensions¶
- CachingPrestoAliyunOSSFileSystem: Caching over AliyunOSSFileSystem
- CachingPrestoAdlFileSystem: Caching over AdlFileSystem
Release 0.3.13¶
Fixes and Features¶
- Generation numbers are added for files cached on disk to avoid several race conditions with invalidations
- Scavenger service has been added to reap the idle connections
- Local Data Server connections are now pooled
- Fail fast when BookKeeper or Local Data Server sockets cannot be created
- Use bounded thread pools for BookKeeper and Local Data Server
- Parallel warmup is now enabled by default
Release 0.3.14¶
Fixes and Features¶
- Fixed a regression from 0.3.11 which slows down split generation.
- JMX stats refactoring for better accounting of stats.
- Added support to plug in a custom metrics reporter that can send metrics to custom sinks. It can be used by setting rubix.metrics.reporters=CUSTOM and providing the implementation class using rubix.metric-collector.impl.
Release 0.3.15¶
Fixes and Features¶
- Run scavenger thread in daemon mode to allow the JVM to exit. This prevents Spark apps from getting stuck during exit.
Release 0.3.16¶
Fixes and Features¶
- Allow ClusterManager implementations to provide hostname and host-address instead of always using fixed ones
- Initialize caching stats in embedded-mode initialisation instead of CachingFileSystem initialisation
Release 0.3.17¶
Note
This release removes shading of thrift jars from rubix-spi. If you are using rubix-spi as a dependency, you will not find the thrift classes from this jar and you will need to use rubix-build jars instead.
Fixes and Features¶
- Improvements in consistent hashing logic to minimize redistributions during change in the membership of the cluster
- Moved shading of Thrift classes into a common sub-module rubix-build. Project is now traversable in IDE and mvn test works at the root. Clients should now include rubix-build artifact instead of including sub-modules independently
- Consider requests served from another node’s cache as cache hits
- Added total_system_source_mb_read stat in detailed metrics to show total data read from source: during read + warmups
Release 0.3.18¶
Fixes and Features¶
- Cleanup cache directories before initializing the cache to correctly measure available disk space
- Update cache status with each successful readRequest in case of parallel warmup to minimize the errors in accounting disk space. Maximum read-request length is also limited to 100MB by default to minimize accounting errors.
Release 0.3.19¶
Fixes and Features¶
- Add an implementation of PrestoClusterManager that does not cache the list of worker nodes. Set rubix.cluster.manager.presto.class as com.qubole.rubix.prestosql.SyncPrestoClusterManager to use the new implementation.