Using InfluxDB to store performance metrics
InfluxDB is a time series database. It was designed to handle a large number of write requests and queries, which makes it a good fit for storing timestamped data – data we want to query by timestamp rather than by index. It supports an SQL-like language for pulling data out, as well as JSON requests through an HTTP API. Influx uses tags and fields – tags to index series of data (you can filter by them in the WHERE clause) and fields for keeping values (select field_1, field_2 from …).
In order to read or write data you can use the influx shell, started directly from the Linux command line, or use the HTTP endpoint with proper requests (to save data) or queries (to get data). To start the shell use:
$ influx -port 7086
Visit https://enterprise.influxdata.com to register for updates
Connected to http://localhost:7086 version 1.4.3
InfluxDB shell version: 1.0.0
>
We can have multiple databases within a single Influx instance, so each type of metric can be stored in a different database. To switch between databases type use <database>:
InfluxDB shell version: 1.0.0
> use jmeter
Using database jmeter
>
Databases must be created manually with create database <name> command.
Within a database, data is structured in so-called measurements, which are the equivalents of tables. To list all measurements in a database use show measurements. Measurements are created dynamically upon a write request – when there is a request with data for a non-existing measurement, it will be created automatically with all the fields and tags (likewise, a write request with non-existing tags or fields will extend the measurement). To learn the structure of a measurement we can use the following commands:
· show tag keys from <measurement> – to list all tags within a measurement
· show field keys from <measurement> – to list all fields and their types within a measurement
name: test#1
------------
tagKey
env
successful
transaction
name: test#1
------------
fieldKey fieldType
build integer
response string
value float
Having this information, we can create a select query to grab the data. The simplest one is select * from "test#1" (note that the measurement's name must be in quotes):
> select * from "test#1"
name: test#1
------------
time build env response
1460628592046000000 1429 unit OK
1460628594449000000 1429 unit OK
1460628598740400000 1429 unit OK
1460628602446000000 1429 unit OK
1460628602987009000 1429 unit OK
1460628612249600000 1429 unit OK
1460628624798000000 1429 unit OK
1460628627985000000 1429 unit OK
1460628684854200000 1429 unit OK
1460628692048800000 1429 unit OK
And a more complicated one, using fields and tags, looks like:
> select response, value from "test#1" where "transaction"='User Authentication'
name: test#1
------------
time response value
1460628592046000000 OK 2990
Please notice that tag names are also put in double quotes, while string values go in single quotes.
Finally, we can query based on timestamp, which is a key feature here:
select * from "test#1" where time < 1460628601937000000 – based on a given timestamp
select * from "test#1" where time < now() - 10d – list all entries from the last 10 days
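Timestamps in Influx are nanoseconds since the Unix epoch, which is hard to read at a glance. A minimal Python sketch for converting them (the helper name is ours, not part of any Influx library):

```python
from datetime import datetime, timezone

def ns_to_datetime(ns):
    """Convert an InfluxDB nanosecond epoch timestamp to a UTC datetime."""
    return datetime.fromtimestamp(ns / 1e9, tz=timezone.utc)

# Timestamp taken from the query output above
print(ns_to_datetime(1460628592046000000))  # 2016-04-14, UTC
```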
Influx also has several aggregation functions, especially useful for data analysis, like:
- COUNT()
- MEAN()
- STDDEV()
- SUM()
- DISTINCT()
Like regular SQL, it supports data selectors such as:
- FIRST()
- LAST()
- MAX()
- PERCENTILE()
And transformations:
- EXP()
- COS()
- ROUND()
> select MEAN("value") from "test#1" where time < 1460628612249600000
name: test#1
------------
time mean
0 4425.25
Writing data to InfluxDB can be done in two ways. The first one is through the Influx shell, using an INSERT statement which follows the line protocol:
INSERT <measurement>[,<tag set>] <field set> [<timestamp>]
So an example insert into InfluxDB via the CLI looks like:
INSERT test#1,env=unit,successful=true,transaction=abc value=1111,build=1234i,response="OK" 1439938800000000000
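Such a line can also be assembled programmatically. Below is a small sketch of a line protocol builder (our own helper, not part of any client library) that follows the quoting rules: tags are never quoted, string field values are double-quoted, and integer fields get an 'i' suffix:

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build an InfluxDB line protocol string from a measurement,
    a dict of tags, a dict of fields and a nanosecond timestamp."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_items = []
    for k, v in sorted(fields.items()):
        if isinstance(v, bool):          # bool must be checked before int
            field_items.append(f"{k}={'true' if v else 'false'}")
        elif isinstance(v, int):         # integers need the 'i' suffix
            field_items.append(f"{k}={v}i")
        elif isinstance(v, float):       # floats are written bare
            field_items.append(f"{k}={v}")
        else:                            # everything else as a quoted string
            field_items.append(f'{k}="{v}"')
    return f"{measurement},{tag_part} {','.join(field_items)} {timestamp_ns}"

line = to_line_protocol(
    "test#1",
    {"env": "unit", "successful": "true", "transaction": "abc"},
    {"value": 1111.0, "build": 1234, "response": "OK"},
    1439938800000000000,
)
print(line)
# test#1,env=unit,successful=true,transaction=abc build=1234i,response="OK",value=1111.0 1439938800000000000
```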
The second option for writing data into InfluxDB is through the HTTP API. A POST request with the point in line protocol format is sent to
<influx_host>/write?db=<database>&u=<username>&p=<password>; client libraries such as influxdb-python also accept a JSON structure which they convert to line protocol on the fly:
[{
    "measurement": "test#1",
    "tags": {
        "env": "unit",
        "transaction": "abc",
        "successful": "True"
    },
    "time": 1460628601937000000,
    "fields": {
        "build": 1234,
        "value": 2990.0,
        "response": "OK"
    }
}]
Instead of putting credentials into the request, basic HTTP authentication can be used. Also, the structure of the body allows saving multiple points within a single request. Time can be specified in the body, but if it is missing it will be generated automatically. As mentioned before, Influx columns (tags or fields) can be created dynamically, so we can add them whenever we want without any error – existing series will be filled with nulls in these columns.
Here is a Python example of how to write data to InfluxDB using the HTTP API (via the influxdb-python client):
from influxdb import InfluxDBClient

host = 'influx_host'
port = 8086
username = 'username'
password = 'password'

points = [{
    "measurement": 'test#1',
    "tags": {
        'env': 'unit',
        'transaction': 'abc',
        'successful': 'True'
    },
    "time": 1460628601937000000,
    "fields": {
        'build': 1234,
        'value': 2990.0,
        'response': 'OK'
    }
}]

client = InfluxDBClient(host, port, username, password)
client.write_points(points=points, database='jmeter')
In InfluxDB, measurements, tag keys, tag values and field keys are always strings. This is why no math operation can be performed on tag values. Field values are different – they can store several data types:
· Float – the default numerical value; both value=2990.0 and value=2990 will be stored as float
· Integer – to force a value to be saved as integer it must have an 'i' at the end: value=2990i
· String – a value put in double quotes will be treated as string: value="2990"
· Boolean – TRUE is represented by t, T, true, True, or TRUE; FALSE is represented by f, F, false, False, or FALSE
The type of a field value is set during the first write request.
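The per-type literal syntax above can be expressed compactly in code. A hedged Python sketch (the helper name is ours, for illustration only):

```python
def field_literal(v):
    """Render a Python value as an InfluxDB field literal:
    bools as true/false, ints with an 'i' suffix, floats bare,
    everything else as a double-quoted string."""
    if isinstance(v, bool):  # bool is a subclass of int, so check it first
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"
    if isinstance(v, float):
        return str(v)
    return f'"{v}"'

print(field_literal(2990))    # 2990i
print(field_literal(2990.0))  # 2990.0
print(field_literal("2990"))  # "2990"
print(field_literal(True))    # true
```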
Summary
Influx seems to be a very decent DB for storing large amounts of time-oriented data. According to external benchmarks it is much more efficient than e.g. Cassandra in terms of write throughput and response times. Additionally, many data visualization tools like Grafana have built-in support for pulling data out of it, which makes chart creation very easy.