Welcome to BestConfig’s documentation!

BestConfig is a system for automatically finding the best configuration setting, within a given resource limit, for a deployed system under a given application workload. BestConfig is designed with an extensible architecture to automate configuration tuning for general systems.

Contents

QuickStart

Good tools make system performance tuning quicker, easier, and cheaper than doing everything manually or by experience.

BestConfig can find better configurations for a specific large-scale system deployed under a given application workload.

Overview

_images/BestConfig.png

Deployment architecture

Here, “deployment environment” refers to the actual running environment of your applications, while “staging environment” is an environment that is almost identical to the deployment environment, but where tests can be run without interfering with the actual application.

_images/workflow.jpg

The process of deploying BestConfig

The detailed method of using BestConfig to tune a practical system is as follows, illustrated by a case of Spark tuning.

BestConfig Tuning – Taking Spark as the example SUT

Step 1. Deploy shell scripts for the system under tune

There are 9 shell scripts in BestConfig and they are classified into two groups.

  1. One group consists of 5 shell scripts: start.sh, isStart.sh, stop.sh, isClosed.sh and terminateSystem.sh. They are deployed on the system under tune.

_images/shells-tune.jpg

The start.sh and stop.sh scripts deployed on the master node differ from those deployed on the worker nodes.

(1) Shell scripts (start.sh and stop.sh) on master node

_images/start.jpg

start.sh(master) – this script will start the system on the master node

_images/stop.jpg

stop.sh(master) – this script will stop the system on the master node

(2) Shell scripts (start.sh and stop.sh) on worker node

_images/start_worker.jpg

start.sh(worker) – this script will start the system on the worker node

_images/stop_worker.jpg

stop.sh(worker) – this script will stop the system on the worker node

(3) Identical shell scripts on master and worker nodes

_images/isStart.jpg

isStart.sh – this script will return OK if the system is successfully started

_images/terminateSystem.jpg

terminateSystem.sh – this script will terminate the system process on the server

_images/isClosed.jpg

isClosed.sh – this script will return OK if the system is successfully terminated

  2. The other group consists of 4 shell scripts: startTest.sh, getTestResult.sh, terminateTest.sh and isFinished.sh. They are deployed on the test node.

_images/shell-test.jpg

_images/startTest.jpg

startTest.sh – this script will start a test towards the system under tune

_images/isFinished.jpg

isFinished.sh – this script will return OK if the test is done

_images/getTestResult.jpg

getTestResult.sh – this script will return performance metrics regarding the test

_images/terminateTest.jpg

terminateTest.sh – this script will terminate the testing process
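
Taken together, these nine scripts form a simple text-based contract between BestConfig and the system under tune: the status scripts (isStart.sh, isClosed.sh, isFinished.sh) report success by printing OK, and getTestResult.sh prints the performance metrics of a test run. The following is a minimal Java sketch of that calling convention only; it is not BestConfig's actual code, and the local ProcessBuilder invocation and script paths are assumptions made purely for illustration (in a real deployment the scripts live on the SUT and test nodes and would be reached remotely, e.g. via ssh).

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ScriptContractSketch {
    // Run a script and return the first line it prints (or "" if it prints nothing).
    static String firstLineOf(String... command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line = r.readLine();
            p.waitFor();
            return line == null ? "" : line.trim();
        }
    }

    public static void main(String[] args) throws Exception {
        firstLineOf("sh", "start.sh");                                   // start the SUT
        boolean started = "OK".equals(firstLineOf("sh", "isStart.sh"));  // prints OK once the SUT is up
        if (started) {
            firstLineOf("sh", "startTest.sh");                           // launch the benchmark run
            while (!"OK".equals(firstLineOf("sh", "isFinished.sh"))) {   // poll until the test is done
                Thread.sleep(5000);
            }
            String metrics = firstLineOf("sh", "getTestResult.sh");      // performance metrics of this run
            System.out.println("Test result: " + metrics);
        }
        firstLineOf("sh", "stop.sh");                                    // stop the SUT before the next sample
    }
}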

Step 2. Implement the ConfigReadin and ConfigWrite interfaces

For Spark tuning, we need to implement the ConfigReadin and ConfigWrite interfaces as SparkConfigReadin and SparkConfigWrite.

Next, we need to compile SparkConfigReadin and SparkConfigWrite to bytecode. The location (path) of the compiled bytecode then needs to be added to the classpath of the BestConfig project. A hedged sketch of such an implementation pair is given after the screenshots below.

_images/interface1.jpg

_images/interface2.jpg

_images/interface3.jpg
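
The screenshots above show the actual interface definitions. As a hedged illustration only, assuming the interfaces roughly expose a way to read the current parameter values and to write a new configuration back to the SUT (the method names and the Map-based representation below are assumptions, not BestConfig's real signatures), a Spark-specific pair might look like this:

import java.util.HashMap;
import java.util.Map;

// Assumed shapes of the two interfaces, for illustration only; the real
// definitions are the ones shown in the screenshots above.
interface ConfigReadin {
    Map<String, String> readCurrentConfig();
}

interface ConfigWrite {
    void writeConfig(Map<String, String> newConfig);
}

// Hypothetical Spark-specific implementations.
class SparkConfigReadin implements ConfigReadin {
    @Override
    public Map<String, String> readCurrentConfig() {
        // In practice this would parse conf/spark-defaults.conf on the SUT.
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.executor.memory", "4g");
        return conf;
    }
}

class SparkConfigWrite implements ConfigWrite {
    @Override
    public void writeConfig(Map<String, String> newConfig) {
        // In practice this would rewrite spark-defaults.conf with the sampled
        // values before the SUT is restarted for the next test run.
        newConfig.forEach((k, v) -> System.out.println(k + " " + v));
    }
}

Once compiled, the resulting class files are what need to be added to the BestConfig classpath, as described above.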

Step 3. Specify the parameter set for tuning and their ranges

(1) An example of defaultConfig.yaml (specifying the parameters for tuning)

_images/defaultConfig.jpg

(2) An example of defaultConfig.yaml_range (the valid ranges of parameters)

_images/defaultConfig_range.jpg
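
The two screenshots above are the authoritative examples. Purely as a hypothetical illustration of the idea (the parameter names below are standard Spark settings, but the values, units, and range syntax are assumptions rather than the actual file format), defaultConfig.yaml pairs each tunable parameter with its default value, while defaultConfig.yaml_range gives the valid range of each parameter:

# defaultConfig.yaml (hypothetical excerpt)
spark.executor.memory: 4
spark.executor.cores: 2

# defaultConfig.yaml_range (hypothetical excerpt)
spark.executor.memory: [1, 16]
spark.executor.cores: [1, 8]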

Step 4. Specify the resource limit (i.e., the sample size and round number) and the tuning environment settings

(1) bestconf.properties

_images/bestconf_propertiesNew4.jpg

(2) SUTconfig.properties

_images/SUTconfig_propertiesNew3.jpg
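
The exact property keys are the ones shown in the two screenshots above. As a purely hypothetical illustration (every key and value below is invented for this example and is not taken from the real files), bestconf.properties carries the resource limit, i.e., how many configurations to test per round and how many rounds to run, while SUTconfig.properties describes how BestConfig reaches the system under tune and the test node:

# bestconf.properties (hypothetical illustration only)
sampleSetSize=100
roundNumber=2

# SUTconfig.properties (hypothetical illustration only)
masterHost=192.168.1.10
testNodeHost=192.168.1.20
username=root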

Step 5. Start BestConfig

Now you can start BestConfig. BestConfig will run the tuning process automatically, without any need for user intervention, until the process ends due to resource exhaustion or unrecoverable environment errors.

BestConfig will output the best configuration setting into files once the tuning is done.

You can start BestConfig with the help of Ant. The detailed instructions are as follows.

(1) cd bestconf-master

(2) ant compile

(3) ant run

Implementing your own sampling/tuning algorithms for BestConfig

You can also choose to extend and tailor BestConfig for your specific use cases using your own sampling/tuning algorithms.

  1. To implement your own sampling algorithms, extend the abstract class ConfigSampler (a hedged sketch follows the screenshots below).

_images/ConfigSampler1.jpg

_images/ConfigSampler2.jpg
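
The actual abstract class is the one shown in the screenshots above. The sketch below is a hedged illustration of the extension idea only: it assumes a single abstract method that, given each parameter's valid range, returns a batch of configurations to test, which is not the real ConfigSampler signature. The example sampler simply draws each parameter uniformly at random within its range.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Assumed shape of the abstract class, for illustration only.
abstract class ConfigSampler {
    abstract List<Map<String, Double>> sample(Map<String, double[]> ranges, int count);
}

// A hypothetical custom sampler: uniform random sampling within each range.
class UniformRandomSampler extends ConfigSampler {
    private final Random rnd = new Random();

    @Override
    List<Map<String, Double>> sample(Map<String, double[]> ranges, int count) {
        List<Map<String, Double>> samples = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Map<String, Double> point = new HashMap<>();
            // Draw every parameter uniformly at random within its [min, max] range.
            ranges.forEach((param, range) ->
                    point.put(param, range[0] + rnd.nextDouble() * (range[1] - range[0])));
            samples.add(point);
        }
        return samples;
    }
}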

  2. To implement your own tuning algorithms, implement the Optimization interface (see the sketch after the screenshots below).

_images/Optimization1.jpg

_images/Optimization2.jpg

_images/Optimization3.jpg

_images/Optimization4.jpg
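
Again, the real Optimization interface is the one in the screenshots above. The sketch below is only a hedged illustration of the plug-in point: it assumes the interface boils down to choosing a configuration given the configurations tested so far and their measured performance, and the deliberately trivial strategy shown here just returns the best sample seen, standing in for a real search algorithm.

import java.util.List;
import java.util.Map;

// Assumed shape of the interface, for illustration only.
interface Optimization {
    Map<String, Double> optimize(List<Map<String, Double>> configs, List<Double> performance);
}

// A hypothetical, deliberately trivial tuning strategy.
class BestSampleOptimization implements Optimization {
    @Override
    public Map<String, Double> optimize(List<Map<String, Double>> configs, List<Double> performance) {
        // Return the configuration with the highest measured performance so far;
        // a real tuning algorithm would instead decide where to sample next.
        int best = 0;
        for (int i = 1; i < performance.size(); i++) {
            if (performance.get(i) > performance.get(best)) {
                best = i;
            }
        }
        return configs.get(best);
    }
}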

FAQ

Q1. When I tried to start BestConfig by executing start.sh, an exception of “can’t find the class of bestconf” occurred. What should I do?

You need to move bestconf.jar to the deploy directory, and remember to move the data directory to the deploy directory as well.

Q2. Why did an exception of “connection refused” occur?

You need to modify the sshd_config file located in the /etc/ssh/ directory. Set the value of “PermitRootLogin” to yes and then restart the ssh service.

Q3. In a production environment, the scale of a Hadoop cluster can be considerably large; for example, the number of nodes may reach 1500, so the overhead of starting and stopping the cluster is huge. In that case, is BestConfig still applicable?

No problem. It just depends on how long you can tolerate the tuning process taking. If you can bear a long waiting time, you can use BestConfig to obtain a best configuration setting.

Q4. Can BestConfig only be used for software system tuning, or can it also tune hardware systems?

BestConfig can tune both software and hardware systems. Given the set of parameters of the system (software or hardware) under tune and their valid ranges, BestConfig is able to generate a best configuration setting.

Q5. Can BestConfig only be used to tune a certain system such as MySQL or Hadoop? Is it able to tune other systems, such as database systems or big data systems?

Yes! BestConfig is a general system tuning tool, able to tune widely used systems including MySQL, Tomcat, the JVM, Hadoop (Hive), Spark, Cassandra, etc.

Q6. Is the BestConfig tuning process based on the online real system and workload?

No. The tuning process of BestConfig runs on the staging environment, which is a mirror of the production environment using the same actual deployment settings (e.g., hardware, clustering, and software).

The staging environment is mainly for a final test of the system, using live data, before production. Implementing the real application workload in the workload generator is possible for the system in the staging environment, e.g., by log replay.

When the tuning process on the staging system ends, the best configuration setting is obtained and then applied to the real system. In this way, samples can be collected without affecting applications on the real system deployment.

Q7. The workloads generated by the large number of users on an online system are constantly changing. Can the workloads applied in the staging environment simulate the real online workloads exactly?

Application workloads can be classified into two types: periodic workloads and non-periodic workloads. For periodic workloads, we can apply log replay to reproduce the real application workload. For non-periodic workloads, we can use a benchmark to simulate a workload that is highly similar to the real application workload.

Q8. How to build BestConfig project from source?

It’s easy to build the BestConfig project from source. You only need to import the whole BestConfig project into Eclipse and then build it.

Q9. How to use BestConfig?

You can use BestConfig by following these steps.

Step 1. Download the latest release of BestConfig and then unzip it. After that, enter the directory of bestconf-master/deploy.

Step 2. Set up a system for tuning. In the project, we offer deployable examples for 6 systems, including Spark, Hive+Hadoop, Cassandra, MySQL, and Tomcat. We also specify the workload generators to be used for tuning these systems.
The detailed steps for setting up BestConfig for your own systems are presented in QuickStart.

Step 3. Run BestConfig.
On a Linux system, first update all system- and deployment-related scripts accordingly and move them to the correct paths on the servers. Next, move the system-specific jar file to lib (for example, move deploy/4BI/bestconfBI.jar to deploy/lib). Finally, enter the deploy directory and run the “start.sh” shell script.

Q10. Why did I encounter a system exception of “java.lang.ClassNotFoundException” when I tried to run BestConfig for the Tomcat system?

First, you need to implement the ConfigReadin and ConfigWrite interfaces as TomcatConfigReadin and TomcatConfigWrite.

Next, you need to compile TomcatConfigReadin and TomcatConfigWrite to bytecode. After that, the location (path) of the compiled bytecode needs to be added to the classpath of the BestConfig project.

Q11. How can I get detailed information about BestConfig’s implementation mechanism?

You can read the following two papers to get all the information that you need.

[1] Yuqing ZHU, Jianxun Liu, Mengying Guo, Yungang Bao, Wenlong Ma, Zhuoyue Liu, Kunpeng Song, Yingchun Yang. BestConfig: Tapping the Performance Potential of Systems via Automatic Configuration Tuning. Proceedings of the ACM Symposium on Cloud Computing 2017 (SoCC’17) [pdf_Socc2017] [slides_Socc2017]

[2] Yuqing ZHU, Jianxun Liu, Mengying Guo, Yungang Bao. ACTS in Need: Automatic Configuration Tuning with Scalability. Proceedings of the 8th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys’17) [pdf_APSys2017]

Use cases

BestConfig for Hadoop + Hive

Experimental Settings

We executed BestConfig for a Hadoop cluster with 4 nodes. The Hadoop cluster consists of 1 master node and 3 slave nodes. All nodes used in our experiment are shown below.

Node      OS       CPU                                        Memory
Master    CentOS   16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz    32GB
Slave 1   CentOS   16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz    32GB
Slave 2   CentOS   16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz    32GB
Slave 3   CentOS   16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz    32GB

Performance Surface

We use HiBench, a widely adopted benchmark suite, as the workload generator for Hadoop+Hive to generate the target workload. Figure 1 plots the performance surface for the Hadoop+Hive Join workload.

_images/hadoop-join.jpg

The performance surface of Hadoop+Hive under Hibench-Join workload

Test Results

The test results of Hadoop under the Join workload: hadoopJoin.arff.

The test results of Hadoop under the Pagerank workload: hadoopPageRank.arff.

The test results of Hadoop under the Join workload with 500 samples: join-trainingBestConf.arff and join-BestConfig.arff.

Interface Impl

The source files of HadoopConfigReadin and HadoopConfigWrite implement the interfaces of ConfigReadin and ConfigWrite respectively.

BestConfig for Spark

Experimental Settings

We executed BestConfig for a Spark cluster with 4 nodes. The Spark cluster consists of 1 master node and 3 slave nodes. All nodes used in our experiment are shown below.

Node      OS       CPU                                        Memory
Master    CentOS   16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz    32GB
Slave 1   CentOS   16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz    32GB
Slave 2   CentOS   16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz    32GB
Slave 3   CentOS   16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz    32GB

Performance Surface

We use HiBench, a widely adopted benchmark suite, as the workload generator for Spark to generate the target workload. Figure 1 plots the performance surface for the Spark Pagerank workload.

_images/spark-pagerank.jpg

The performance surface of Spark under Hibench-Pagerank workload

Test Results

The test result of Spark under the Pagerank workload: pagerank. The test result of Spark under the KMeans workload: kmeans.

Interface Impl

The source files of SparkConfigReadin and SparkConfigWrite implement the interfaces of ConfigReadin and ConfigWrite respectively.

BestConfig for Cassandra

Experimental Settings

We executed BestConfig for a Cassandra cluster with 3 nodes, plus one node running the YCSB workload generator. All nodes used in our experiment are shown below.

Node          OS       CPU                                        Memory
Cassandra 1   CentOS   16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz    32GB
Cassandra 2   CentOS   16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz    32GB
Cassandra 3   CentOS   16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz    32GB
YCSB          CentOS   16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz    32GB

Performance Surface

We use YCSB, a widely adopted benchmark tool, as the workload generator for Cassandra to generate the target workload. Currently, the workload adopted in our test is workloada, and we set recordcount to 17000000 and operationcount to 720000. Figure 1 is the scatter plot of performance for Cassandra under YCSB workloada.

_images/cassandra-scatter.jpg

The scatter plot of performance for Cassandra under YCSB workloada.

Test Results

The test result of Cassandra under YCSB workloada: cassandraYcsba.arff.

Interface Impl

The source files of CassandraConfigReadin and CassandraConfigWrite implement the interfaces of ConfigReadin and ConfigWrite respectively.

BestConfig for MySQL

Experimental Settings

We executed BestConfig for the MySQL system and applied Sysbench to test the performance of MySQL. All nodes used in our experiment are shown below.

Node       OS       CPU                                        Memory
MySQL      CentOS   16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz    32GB
Sysbench   CentOS   16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz    32GB

Performance Surface

We use Sysbench, a widely adopted benchmark tool, as the workload generator for MySQL to generate the target workload. Currently, the test type in our experiment is oltp and the test mode is simple, and we set num-threads to 16, oltp-table-size to 10000000, and max-time to 300. Figure 1 is the scatter plot of performance for MySQL under the OLTP simple test mode.

_images/mysql-simple.jpg

The scatter plot of performance for MySQL under OLTP simple test mode

Test Results

The result of MySQL under the zipfian read-write workload: MySQL_zipfian_readwrite.arff. The result of MySQL under the OLTP simple test mode: MySQL_OLTP_simple.arff.

Interface Impl

The source files of MySQLConfigReadin and MySQLConfigWrite implement the interfaces of ConfigReadin and ConfigWrite respectively.

BestConfig for Tomcat Server

Experimental Settings

We executed BestConfig for the Tomcat server and applied JMeter to test the performance of the Tomcat server. All nodes used in our experiment are shown below.

Node            OS       CPU                                        Memory
Tomcat Server   CentOS   16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz    32GB
JMeter          CentOS   16 Intel(R) Xeon(R) CPU E5620 @ 2.40GHz    32GB

Performance Surface

We use JMeter, a widely adopted benchmark tool, as the workload generator for Tomcat to generate the target workload.

_images/tomcat.png

The performance surface of Tomcat under a page navigation workload.

Test Results

All the test results of Tomcat under different workloads: Tomcat_Results.

Interface Impl

The source files of TomcatConfigReadin and TomcatConfigWrite implement the interfaces of ConfigReadin and ConfigWrite respectively.

Citing BestConfig

Please cite:

@inproceedings{Zhu:2017:BTP:3127479.3128605,
  author = {Zhu, Yuqing and Liu, Jianxun and Guo, Mengying and Bao, Yungang and
            Ma, Wenlong and Liu, Zhuoyue and Song, Kunpeng and Yang, Yingchun},
  title = {BestConfig: Tapping the Performance Potential of Systems via Automatic
           Configuration Tuning},
  booktitle = {Proceedings of the 2017 Symposium on Cloud Computing},
  series = {SoCC '17},
  year = {2017},
  isbn = {978-1-4503-5028-0},
  location = {Santa Clara, California},
  pages = {338--350},
  numpages = {13},
  url = {http://doi.acm.org/10.1145/3127479.3128605},
  acmid = {3128605},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {ACT, automatic configuration tuning, performance optimization},
}