Overview

Configuration files of Optimizer Studio are located in the installation directory /opt/concertio-optimizer/studio/. You can specify your own configuration files via command line switches.
There are two configuration files:

  1. knobs.yaml - Specifies the knobs, input metrics and target function of Optimizer Studio.
  2. settings.yaml - Specifies the system-wide settings of Optimizer Studio.

Optimization Target

Users can configure Optimizer Studio to search for the optimal parameters that maximize a specific target metric.

The target metric can be one of the following:

  1. performance - defined as the total number of retired instructions msr.inst_retired_all measured in all CPUs per second.
  2. energy - defined as the package energy of the CPUs per retired instruction. In case of energy, Optimizer Studio will attempt to minimize the energy per retired instruction.
  3. duration - optimizes for the total duration of a workload, running it until completion and changing knob values between runs.
  4. Any other available metric, as long as it is sampled by Optimizer Studio, such as proc.diskstats.sda.sectors_written. See below for more details about metrics.

Example
To define performance as the target metric, add the following to the knobs.yaml file (more details about knobs.yaml below):

domain:
  common:
    target: performance

knobs.yaml: knob definition

Knobs are tunable parameters of the system which Optimizer Studio will try to change in order to optimize the system.

Concertio Optimizer Studio ships with an embedded set of knob definitions containing many knobs that were tested and benchmarked by Concertio engineers. If you want to use these knobs, specify the --knobs=embedded command line switch.

It is possible to provide additional knob files by using the --knobs=/path/to/knob-file.yaml command line switch, where you can add custom knobs that are relevant for your system. If you don't supply your knobs file, the embedded knobs will be used automatically.

Optimizer Studio can accept any number of --knobs=PATH command line switches. In that case, Optimizer Studio loads the knob files in the order they appear on the command line, so a knob file appearing later can override any settings in previous knob files. Note that this doesn't include the embedded knobs, which are always processed first.

knobs section: adding a Knob

It is possible to add or override an existing knob by adding a knob section to the knobs list. Each knob section contains a few values and POSIX shell scripts used for defining and manipulating the knob.

Knob types

Optimizer Studio can work with two kinds of knobs - scripted knobs and memory knobs.
The main differences between the two knob types are as follows:

  1. scripted knobs use two scripts which are used to get and set knob values to the system:

    • get_script to return current knob value
    • set_script to set new knob value
  2. The current values of memory knobs are passed to the workload script as environment variables on each workload invocation. For example, $my_knob. Such knobs can't affect the system in any other way

  3. The knob baseline value for a scripted knob is obtained by get_script at the beginning of the optimization. For memory knobs the first value of the options list (below) is used for this purpose
  4. Since shell environment variable names can consist only of letters, digits and underscore characters, the same limitation applies to the names of memory knobs.
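
The naming restriction on memory knobs can be checked ahead of time. The following is a minimal sketch, not part of Optimizer Studio: the helper name is_valid_memory_knob_name is hypothetical. It also rejects names that start with a digit, which is a POSIX rule for variable names; whether Optimizer Studio enforces that rule is an assumption.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with Optimizer Studio): succeeds when the
# name uses only letters, digits and underscores and does not start with a
# digit, so that it can be exported as a shell environment variable.
is_valid_memory_knob_name() {
    case "$1" in
        ''|[0-9]*|*[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

for name in my_knob num_threads io.scheduler my-knob; do
    if is_valid_memory_knob_name "$name"; then
        echo "$name: ok"
    else
        echo "$name: not usable as a memory knob name"
    fi
done
```

Names such as io.scheduler remain valid for scripted knobs, which do not rely on environment variable names.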

Knob scripts

  1. get_script - returns current knob value. Mandatory for scripted knobs
  2. set_script - sets new knob value. Mandatory for scripted knobs
  3. pre_set_script - when provided, this script is run prior to running any of the set_script(s)
    Note that in case more than one knob refers to the same pre_set_script, identical script invocations will be merged.
  4. post_set_script - when provided, this script is run after running any of the set_script(s)
    Note that in case more than one knob refers to the same post_set_script, identical script invocations will be merged.
  5. enable_script - when provided, returns 0 to indicate that the knob should be discarded, or any other value to enable the knob. When not provided, the knob is enabled by default.
  6. options_script - when provided, populates available knob options. When not provided, available options must be provided by options list (below).
  7. devices_script - when provided, returns list of applicable devices, and the knob is split into multiple per-device knobs, e.g.
    io.scheduler → io.scheduler.sda, io.scheduler.sdb, io.scheduler.sdc

Each knob script is allowed to use the following environment variables:

  1. $KNOB_VAL - knob value being set by set_script
  2. $KNOB_DEV - device name, in device-specific knobs, as returned by devices_script

Optional knob parameters

  1. options - a list of available knob options
  2. disable - when provided, the knob will be discarded and erased from internal data structures. This parameter overrides enable_script (above).
  3. skip_tuning - when provided, the knob will be excluded from the tuning process
  4. default - when provided, sets the baseline default to the value provided. This option is not allowed for knobs with a get_script.

Knob options definition

The simplest way to define knob options is to list all valid values explicitly:

options:
  values: [10, 15, 20]
default: 15

The second way is to define an options script which returns the list of option values. For example, the following script creates a list of numbers from 0 to 7 on a computer with 8 logical CPUs:

options:
  script: cat /proc/cpuinfo | awk '/processor/ {print $3}'

Yet another possibility is to define a list of options using a numeric range:

options:
  range:
    min: 1
    max: 16
    step: 1
    format: --num-threads=%d
default: 8

This definition creates the list of options "--num-threads=1", "--num-threads=2", etc.
The printf-style format field is optional. It contains a string with one of the supported format specifiers, as in the following examples:

Format | Numeric Value | Knob Value
"-O%d" | 2 | -O2
"--pi=%.2f" | 3.14159 | --pi=3.14
"--pi=%.2f" | 3 | --pi=3.00
"%g" | 5000000 | 5000000
"%g" | 2.718000 | 2.718
"%x" | 2047 | 7ff
"%X" | 2047 | 7FF
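
To preview what a range definition with a format field expands to, the expansion can be imitated with the shell's own printf. This is only a sketch under the assumption that Optimizer Studio's formatting behaves like printf for integer values. The helper expand_range is hypothetical and is not a command provided by the product.

```shell
#!/bin/bash
# Hypothetical helper: print one option per line for the integer range
# min..max (inclusive), applying a printf-style format to each value.
expand_range() {
    local min=$1 max=$2 step=$3 format=$4
    local value=$min
    while [ "$value" -le "$max" ]; do
        # "--" stops option parsing, so formats that begin with "-" are accepted.
        printf -- "$format\n" "$value"
        value=$((value + step))
    done
}

expand_range 1 4 1 '--num-threads=%d'
expand_range 2047 2047 1 '%x'
```

The first call prints --num-threads=1 through --num-threads=4, one per line, and the second prints 7ff, matching the "%x" row above.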

You can combine different ways of specification:

options:
  values: [ "" ]
  range:
    min: 1
    max: 16
    step: 1
    format: --num-threads=%d

The example above adds an empty option value in addition to the ones defined by the range.

Knob dependency

In practice, one parameter sometimes depends on another, and such dependencies should be taken into account when defining and optimizing knobs. Optimizer Studio supports knob dependencies. Below is a simple example of dependency handling between knobs.

In the provided example knob Y depends on knob X in the following manner:

  1. 1 <= X < 50: 1 <= Y <= 150
  2. 50 <= X <= 100: Y = 1

Given Z = X + Y, the maximum value of Z is achieved when {X, Y} = {49, 150}.
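
The stated optimum can be checked by enumerating the constraints directly. The sketch below is independent of Optimizer Studio and only encodes the two rules listed above. The function name max_y_for_x is introduced for illustration.

```shell
#!/bin/bash
# Largest value of Y allowed for a given X, following the two rules above.
max_y_for_x() {
    if [ "$1" -lt 50 ]; then echo 150; else echo 1; fi
}

best_z=0
x=1
while [ "$x" -le 100 ]; do
    # Z = X + Y grows with Y, so only the largest valid Y needs to be tried.
    y=$(max_y_for_x "$x")
    z=$((x + y))
    if [ "$z" -gt "$best_z" ]; then
        best_z=$z
        best_x=$x
        best_y=$y
    fi
    x=$((x + 1))
done
echo "max Z = $best_z at X=$best_x, Y=$best_y"   # prints: max Z = 199 at X=49, Y=150
```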

The dependency is maintained via shell environment variable BIGX.
BIGX is set each time X is written back.
BIGX is consulted each time Y is written back.

In order for this dependency to be properly maintained:

  1. Knob X has to be written back prior to knob Y - the knobs are ordered alphabetically
  2. Knob Y has to be written back each time knob X is written back - this is controlled by 'dependency' section in knob Y definition

The knob definition for this example is available at /opt/concertio-optimizer/studio/examples/knob-dependency.

Disabling a knob

It is possible to disable a knob, either temporarily or permanently, without actually removing its definition from the knobs file.
To temporarily disable a knob, add to the knob section the following value:

skip_tuning: ""

To permanently erase a knob from the internal data structures, either add to the knob section the following value:

disable: ""

or define an appropriate enable_script:

enable_script: "echo -n 0"

Note that this approach allows disabling embedded knobs as well. For example:

domain:
  common:
    knobs:
      kernel.io_scheduler:  ## The name of the knob you wish to disable
        disable: ""

scripts section: named reusable scripts

Frequently different knobs reuse the same scripts, in particular pre_set_script(s)/post_set_script(s), as these are intended to preface/conclude multiple knobs.
In the scripts section, reusable scripts are represented as name:code pairs. Instead of providing the full code, it is possible to refer to a script by its name prefixed with the @ symbol, e.g. "@my-script".

Examples

A: vm.dirty_background_ratio

domain:
  common:
    knobs:
      sys.vm.dirty_background_ratio:
        description: "The percentage of available memory at which background flusher threads begin writing out dirty data."
        get_script: "/sbin/sysctl -n vm.dirty_background_ratio"
        set_script: "/sbin/sysctl -w vm.dirty_background_ratio=$KNOB_VAL"
        options: [10, 15, 20]

B: io.scheduler

Below is a more complex knob example of a device-specific knob. Eventually, this knob definition produces multiple knobs: io.scheduler.sda, io.scheduler.sdb etc.

domain:
  common:
    knobs:
      io.scheduler:
        description: "Block device scheduling algorithm selector."
        get_script: "cat /sys/block/$KNOB_DEV/queue/scheduler | sed 's/.*\\[\\([^]]*\\)\\].*/\\1/g'"
        set_script: "echo $KNOB_VAL > /sys/block/$KNOB_DEV/queue/scheduler"
        devices_script: "ls -d /sys/block/sd* | cut -d/ -f 4"
        options_script: cat /sys/block/$KNOB_DEV/queue/scheduler | sed 's/\[\|\]//g ; s/ $//g ; s/\s/\n/g'
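
The text processing in the get_script and options_script can be tried without touching /sys. The sketch below applies it to a sample string. The sample scheduler line is an assumption about what the kernel file contains, and the options pipeline uses tr as a portable stand-in for the sed expression above rather than the exact command.

```shell
#!/bin/bash
# Sample contents of /sys/block/<device>/queue/scheduler (assumed for illustration).
sample='noop deadline [cfq]'

# Same expression as the get_script, once YAML has unescaped "\\[" to "\[":
# keep only the name enclosed in square brackets, i.e. the active scheduler.
current_scheduler() {
    printf '%s\n' "$1" | sed 's/.*\[\([^]]*\)\].*/\1/'
}

# Stand-in for the options_script: drop the brackets and print one name per line.
available_schedulers() {
    printf '%s\n' "$1" | tr -d '[]' | tr -s ' ' '\n'
}

current_scheduler "$sample"      # prints: cfq
available_schedulers "$sample"   # prints noop, deadline and cfq on separate lines
```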

Passing knob values to the workload

The easiest way of passing knob values to applications is via the shell environment using memory knobs. Below is an example of a simple knob with five values:

domain:
  common:
    knobs:
      my_knob:
        options: [1,2,3,4,5]

Note that no set_script or get_script are defined for memory knobs. The workload scripts will receive $my_knob as an environment variable, as follows:

#!/bin/bash
executable_name ${my_knob}

In the above case, the baseline value of my_knob is 1 as it is the first value in the options list.

It is also possible to use scripted knobs to pass values of application knobs. One way to achieve this is using the filesystem. For example, a knob's set script can write a value into /tmp/knob_file:

domain:
  common:
    knobs:
      my_knob:
        get_script: "cat /tmp/knob_file"
        set_script: "echo $KNOB_VAL > /tmp/knob_file"
        options: [1,2,3,4,5]

Then, the workload script can read this value when invoked. For example:

#!/bin/bash
my_parameter=$(cat /tmp/knob_file)
executable_name ${my_parameter}

In order for the above to work, the file /tmp/knob_file should be populated, for example by:

$ echo "1" > /tmp/knob_file

The workload script can also read the knob names and values through an associative array. This is useful when experimenting with numerous knobs in the configuration files because the workload script can detect which knobs have been defined. Below is an example:

# source studio functions
. /opt/concertio-optimizer/studio/studio-functions.bash
# call memory knobs associative array function
get_memory_knobs_assoc
for K in "${!ASSOC_KNOBS_ARRAY[@]}"; do
        args+="--$K=${ASSOC_KNOBS_ARRAY[$K]} "
done
executable_name $args

The code in knobs.yaml can be tested as follows:

$ optimizer-studio --knobs=knobs.yaml --testknob=my_knob
I[3857][12:34:41.278] Concertio Optimizer, version 2.5.0
I[3857][12:34:41.278] License expiration date: January 1, 2021
I[3857][12:34:41.285] Knob my_knob: set value: 1 --> 2 
I[3857][12:34:41.287] Knob my_knob: set value: 2 --> 3 
I[3857][12:34:41.288] Knob my_knob: set value: 3 --> 4 
I[3857][12:34:41.290] Knob my_knob: set value: 4 --> 5 
I[3857][12:34:41.292] Knob my_knob: set value: 5 --> 1 [baseline]
E[3857][12:34:41.293] Knob my_knob test: success
Knob my_knob test: success

You can also test all knobs by using the option --testknob=all:

$ optimizer-studio --knobs=knobs.yaml --testknob=all

By default, Optimizer Studio performs a silent test of all knobs prior to optimization tasks. If you wish to skip this test (not recommended), you can do so by using the --testknob=none option:

$ optimizer-studio --knobs=knobs.yaml ./my_workload.sh --testknob=none

metrics section: named HW and SW metrics

Metrics are used by Optimizer Studio to learn about the system behavior and to detect different phases of execution. Optimizer Studio will then attempt to find an optimal knob configuration that maximizes a certain sampled metric for each phase. The metrics are sampled periodically.

Metrics definition

Comma separated regular expressions define which metrics are sampled, and which metrics are excluded.

domain:
  common:
    include_metrics: [msr.*, proc.*]
    exclude_metrics: [proc.diskstats.sda.sectors_written]

In the above example, all msr metrics and all proc metrics, except for proc.diskstats.sda.sectors_written will be considered by Optimizer Studio for learning about the system behavior.
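
The effect of an include/exclude pair can be approximated with grep before starting an experiment. This sketch assumes that each pattern must match the whole metric name; the exact matching rules of Optimizer Studio are not stated here. The helper filter_metrics is hypothetical, and the metric names fed to it are samples.

```shell
#!/bin/bash
# Hypothetical helper: read metric names on stdin, keep those matching the
# include pattern and drop those matching the exclude pattern.
# Both patterns are extended regular expressions matched against the whole name.
filter_metrics() {
    local include=$1 exclude=$2
    grep -E -x "$include" | grep -E -v -x "$exclude"
}

printf '%s\n' \
    msr.inst_retired_all \
    proc.diskstats.sda.sectors_written \
    proc.diskstats.sda.sectors_read \
    sys.load \
    | filter_metrics 'msr.*|proc.*' 'proc\.diskstats\.sda\.sectors_written'
```

With these sample names, only msr.inst_retired_all and proc.diskstats.sda.sectors_read are printed.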

User-defined metrics

Optimizer Studio supports user-developed plugins for sampling custom metrics.

Target metric

You can designate one of the metrics as the target metric. Optimizer Studio will search for the optimal parameters that maximize the target metric.

domain:
  common:
    target: performance

You can also direct Optimizer Studio to minimize the target metric, as shown below:

domain:
  common:
    target: my.custom.metric:min

The max keyword can be used in the same way to specify the optimization goal explicitly.

Importing Configuration Files

It is possible to import configuration files by using the import directive. For example, the default embedded knobs of Optimizer Studio can be imported as follows:

import:
  optimizer.studio:
domain:
  common:
    knobs:
...

Other embedded knob definitions will always have the optimizer. prefix. In order to import a configuration file from the filesystem, its yaml extension should be removed and slashes (/) need to be converted into dots (.). For example, my_configurations/my_software_knobs.yaml will be imported as my_configurations.my_software_knobs.
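
The path-to-import-name rule is mechanical, so it can be scripted. The following is a small sketch. The helper to_import_name is hypothetical and handles only relative paths ending in .yaml, as in the example above.

```shell
#!/bin/bash
# Hypothetical helper: strip the .yaml extension and turn slashes into dots.
to_import_name() {
    local path=${1%.yaml}
    printf '%s\n' "$path" | tr '/' '.'
}

to_import_name my_configurations/my_software_knobs.yaml
# prints: my_configurations.my_software_knobs
```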

Embedded knob categories supported by Optimizer-Studio:

knobs category | import syntax | description | comments/limitations
mellanox | import optimizer.mellanox.connectx3 | Mellanox Connect-X 3, 4 and 5 NIC cards | optional args: MELLANOX_DEVICES. Works on bare metal machines only. NICs are detected automatically if not provided
intel | import optimizer.intel.msrs | Intel CPUs msr tuning parameters | works on bare metal machines only. Disabled automatically on unsupported platforms
java | import optimizer.jvm.jvm-[7, 8, 9, 11] | Java Virtual Machine tunables | JRE or JDK to be installed
nginx | import optimizer.nginx.nginx | NGINX Web and Proxy server tunables | required args: NGINX_CONF_FILE - for location of nginx.conf file
mysql | import optimizer.mysql.mysql | MySQL 5.7.8 and above system and caching tunables | required args: MYSQL_CONF_FILE - for location of mysqld.cnf. Currently assumes mysql client and server installed on the same machine
mongodb | import optimizer.mongodb | MongoDB 4.x and above tunables | assumes mongo client and server installed on the same machine
php | import optimizer.php.php7 | PHP 7 tuning parameters | required args: PHP_CONF_FILE - for location of php.ini
apache2 | import optimizer.apache.apache2 | Apache 2.x web server tuning parameters | required args: APACHE_CONF_FILE - for location of mpm_prefork.conf file
postgresql | import optimizer.postgresql.v11 or v10 | PostgreSQL 10 and 11 best practice tuning parameters | required args: POSTGRESQL_CONF_FILE - for location of postgresql.conf file
openmpi | import optimizer.posgresql.openmpi.mca | OpenMPI Modular Component Architecture (MCA) tuning parameters | required args: MCA_CONF_FILE - for location of mca-params.conf
network | import optimizer.network | Operating System (Linux) system level network tunables |
hhvm | import optimizer.hhvm.hack | HHVM benchmark tuning | required args: HHVM_CONF_FILE - for the server.ini file path. HHVM must be installed. Supports hhvm version 4.6.0
hadoop yarn | import optimizer.hadoop.yarn | tuning parameters for Hadoop Yarn Cluster | required args: MAPRED_CONF_FILE - to point to Yarn config file path
hadoop spark | import optimizer.hadoop.spark | tuning parameters for Hadoop Spark Cluster | required args: SPARK_CONF_FILE - to point to Spark config file path
gcc | import optimizer.compilers.gcc.[4-7-0, 4-8-0, 4-9-0, 5-3-0, 7-1-0] | all GCC compilation flags tuning parameters | gcc of supported version to be installed
llvm | import optimizer.compilers.llvm.4-0-0 | all LLVM compilation flags tuning parameters compatible with version 4 | llvm of supported version to be installed - tested up to version 9

Example of an import with a configuration file passed as an argument:

import:
  optimizer.postgresql.v11:
    args:
      POSTGRESQL_CONF_FILE: /etc/postgres/postgresql.conf

See the examples folder under the optimizer-studio folder for many embedded knob examples with self-documented knob files.

We are working continuously to add support for more projects.

Filtering knobs from imported files

Specific knobs can be selected from imported files using regular expressions. In the following example, all embedded knobs are imported, except those that have "net" in their names:

import:
  optimizer.studio:
    include_knobs: [ .* ]
    exclude_knobs: [ .*net.* ]

File-specific directives

Configuration files can have their own enable and onload directives, as shown in the following example:

import:
  my_example_import:
enable:
  script: echo 1
onload:
  source: my_script.sh
domain: ...

File-specific enable

The enable directive determines whether the configuration file should be loaded. It can either be a scalar value (enable:), a script, or a sourced script. A script can be defined as follows:

enable:
  script: echo 1

A sourced script can be defined as follows:

enable:
  source: my_filesystem_script.sh

In all of these cases, if 1 is returned, the configuration file is loaded. Otherwise it is skipped.

Passing arguments to the enable scripts of imported files

Passing arguments as environment variables is possible using the args directive:

import:
  my_example_import:
    args:
      MY_ENV_VARIABLE: value

The parameter can then be used in my_example_import.yaml's enable script as follows:

enable:
  script: echo ${MY_ENV_VARIABLE}
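
An enable script can do more than echo a variable. The sketch below shows one possible body for such a script, assuming that "returning 1" means printing 1 on standard output, as the echo 1 examples suggest. The variable MY_CONF_FILE and the function enable_check are hypothetical names chosen for illustration and would be supplied through args in the importing file.

```shell
#!/bin/bash
# Hypothetical enable logic: print 1 when the file named by MY_CONF_FILE
# (an illustrative variable passed via args) is readable, otherwise print 0.
enable_check() {
    if [ -r "${MY_CONF_FILE:-}" ]; then
        echo 1
    else
        echo 0
    fi
}

enable_check
```

With this approach, an imported file that tunes a particular application is skipped on machines where that application's configuration file is absent.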

File-specific onload

When a configuration file is found to be enabled, the onload script is invoked. It can either be sourced (using source:) or in-lined (using script:).

Workload definition and multi-objective optimization

A workload is normally defined in a script supplied to Optimizer Studio via its command line.
Optimizer Studio also supports complex workload definitions with a scalarized multi-objective target.
The complex workload is defined in the configuration file as a sequence of steps, each comprising a script, metrics and validity checks, as follows:

  1. Script: a script that runs the workload
  2. Metrics: after the script (if defined) completes, metrics are gathered from the filesystem
  3. Validity checks: after the metrics are gathered, they are validated in a sequence of tests. If all checks pass, the next workload step is executed. Otherwise, the knob configuration is deemed invalid and Optimizer Studio resumes testing a different configuration.

Upon completion and validation of all the steps, the workload target metric is calculated. The workload target is specified through scalarization of the step metrics, as shown in the example below.

Optimization target metric definition

The workload target can be set as target metric of the whole optimization.

Example

domain:
  common:

    ...

    target: workload.target:max

workload_settings:
  workloads:
    -
      script: ./workload_1.sh arg1
      metrics:
        workload1_output: /tmp/target1
        metric1: /tmp/metric1
      validity: 
        - workload1_output > 3200
        - metric1 < 200
    -
      script: ./workload_2.sh arg2
      metrics:
        workload2_output: /tmp/target2
      validity: workload2_output < 10000

  target: workload1_output * 0.12 + workload2_output * 8 - metric1

The above example demonstrates a 2-step complex workload.
Optimizer Studio comes up with a configuration to test.
It will then run ./workload_1.sh arg1. Upon completion, metrics will be sampled, and validity criteria applied - (workload1_output > 3200) && (metric1 < 200).
If both checks pass, Optimizer Studio will proceed to running the next step script (./workload_2.sh arg2). Otherwise, Optimizer Studio will come up with a different configuration, and start testing the first step again.

If all the workloads run correctly and pass their validity tests, Optimizer Studio will calculate the workload target according to the formula at the last line of the example.
The workload target is defined as the optimization target metric and is maximized.
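
The arithmetic of the example can be reproduced outside Optimizer Studio to sanity-check validity thresholds and weights. The sketch below is an illustration only: the function names and the sample metric values are assumptions, and the real product reads the metrics from the files listed in the configuration.

```shell
#!/bin/bash
# Evaluate a boolean expression with awk; exit status 0 means it holds.
holds() { awk "BEGIN { exit !($1) }"; }

# Arguments: workload1_output metric1 workload2_output (sample values only).
workload_target() {
    local w1=$1 m1=$2 w2=$3
    # Step 1 validity: workload1_output > 3200 and metric1 < 200
    holds "$w1 > 3200 && $m1 < 200" || { echo invalid; return; }
    # Step 2 validity: workload2_output < 10000
    holds "$w2 < 10000" || { echo invalid; return; }
    # Scalarized target taken from the last line of the example
    awk -v w1="$w1" -v w2="$w2" -v m1="$m1" \
        'BEGIN { print w1 * 0.12 + w2 * 8 - m1 }'
}

workload_target 4000 100 500   # prints 4380
workload_target 3000 100 500   # prints invalid (workload1_output is not above 3200)
```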

Reusing Optimization Results In a New Experiment

More often than not, users perform several optimization experiments in the same system. Users can reuse results of previous experiments in succeeding ones in order to reduce total optimization time by starting optimization from best results of a previous optimization run. In addition, users can define a baseline knob configuration based on results of a previous optimization run in order to calculate incremental improvement of succeeding experiments.

At the end of optimization, an optimization report file is created in the ${HOME}/.concertio directory next to the log and CSV files. The report file has the name report_<timestamp>.json, where timestamp specifies the file creation time. The file is stored in JSON format. Among other information, the report contains a list of the best knob configurations found during optimization.

Reusing best configurations

If users believe that the best knob configurations found in one experiment can be good candidates for another optimization, there is a way to reuse the previously found configurations. Doing this can save time in comparison to starting optimization from the baseline. However, in other cases, previous results can be irrelevant for the experiment, so users have to apply their own judgement.

A user can specify a special directive in the knobs.yaml file which points to "topKnobConfigs" item in the report file, which contains a list of best knob configurations:

domain:
  common:
    knobs:
    ...
    seed_configs: <report_file_path>#topKnobConfigs

Here, # is the separator character and topKnobConfigs is the JSON tag name. The default tag name can be omitted, together with its separator. If users provide their own list(s) of knob configurations, they can specify their own tag, provided that the JSON format is kept.

It is possible to provide several files with seed configurations:

domain:
  common:
    knobs:
    ...
    seed_configs:
     - <report_file_path1>#topKnobConfigs
     - <report_file_path2>#topKnobConfigs
     ...

Defining an alternative baseline configuration

Before running an optimization, Optimizer Studio runs the workload using the baseline knob configuration and uses the obtained target metric as a base for improvement calculations. Users can define an alternative baseline configuration in a knobs.yaml file:

domain:
  common:
    knobs:
    ...
    baseline_config: <report_file_path>#bestKnobConfig

Here, the config specification includes a JSON report file and a tag name inside this JSON document. The format of this JSON item is as follows:

"tag": [
  {"name":"knob1", "value":"value1"},
  {"name":"knob2", "value":"value2"},
  ...
]

System-wide Settings

System-wide settings can be configured in the settings.yaml file or directly in the configuration file, as follows:

global_settings:
  max_config_mean_cv: 0.02

Note that some settings need to be defined in settings.yaml or an equivalent parameters file, such as out_directory, metrics_csv_directory, and shell_command. All of the others can be defined in the regular configuration files, together with the knobs. It is recommended to use the template in the installation directory.

The possible settings are summarized below:

Setting name | Default value | Description
interval_seconds | 1 | The interval in seconds between samples. Relevant only for asynchronous sampling mode.
knob_ranking_max_num_of_samples | 10000 | Maximum number of configurations to use for knob ranking calculations.
max_baseline_cv | 0.04 | The maximum allowed coefficient of variation of the mean of the measurements in baseline settings. A high value will mandate additional measurements.
max_config_mean_cv | 0.04 | The maximum allowed coefficient of variation of the mean of the measurements per knob configuration. A high value will mandate additional measurements.
max_configs_in_report | 10 | Maximum number of knob configurations to include into best configurations report
max_invalid_samples_per_config | 0 | The maximum allowed invalid measurements per knob configuration, above which the configuration is considered invalid and will not be further tested.
max_samples_per_config | 120 | Optimizer will not test any knob configuration more than the number of times specified by this parameter.
metrics_csv_filename | - | If specified, Optimizer Studio creates a CSV file with the details of all the knob settings and metric measurements.
min_baseline_samples | 2 | The minimum number of baseline samples. This is used in conjunction with max_baseline_cv.
min_samples_per_config | 2 | The minimum number of samples per knob configuration. This is used in conjunction with max_config_mean_cv.
min_samples_to_early_retire | 4 | The minimum number of samples after which the optimizer can decide about knob configuration retirement if it doesn't contribute to improvement.
optimization_strategy | evolution | The algorithm employed for searching through the knobs. Available options are greedy and evolution.
optimization_strategy_settings | | Settings specific for each optimization strategy. Only settings specific for the selected strategy will be parsed.
out_directory | ${HOME}/.concertio | Concertio Optimizer Studio generates output data such as optimization database file, log files, etc. into this location.
pending_config_timeout_minutes | 10000 | Maximum time the optimizer waits for the workload to respond with a metric. After this time, the optimization session is aborted.
point_estimator | average: <no_value> | Point estimation function. Additional functions: percentile: <percent>, mode: <no_value>
save_interval_minutes | 120 | The interval in minutes between saving the data file to the disk.
shell_command | /bin/sh +e | This defines the backend shell of the knobs and metrics. It is possible to run all knobs and metrics scripts on remote hosts using a different shell command.
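
To get a feel for thresholds such as max_baseline_cv and max_config_mean_cv, it can help to compute the coefficient of variation of a few measurements by hand. The sketch below uses the common definition, sample standard deviation divided by the mean. Whether Optimizer Studio uses exactly this formula is an assumption, and the helper name cv is hypothetical. At least two samples are required.

```shell
#!/bin/bash
# Hypothetical helper: coefficient of variation (sample standard deviation
# divided by the mean) of the values passed as arguments.
cv() {
    printf '%s\n' "$@" | awk '
        { sum += $1; sumsq += $1 * $1; n++ }
        END {
            mean = sum / n
            var = (sumsq - n * mean * mean) / (n - 1)   # sample variance
            if (var < 0) var = 0                        # guard against rounding
            printf "%.4f\n", sqrt(var) / mean
        }'
}

cv 98 100 102   # prints 0.0200
```

With the default threshold of 0.04, these three sample measurements would be within the allowed variation under this definition.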