# Migrate from InfluxDB
This guide explains the differences between the data models of InfluxDB and GreptimeDB and walks you through the migration process.
## Data model differences
While you may already be familiar with InfluxDB's key concepts, the data model of GreptimeDB is something new to explore. Let's start with the similarities and differences:
- Both solutions are schemaless, eliminating the need to define a schema before writing data.
- In InfluxDB, a point represents a single data record with a measurement, tag set, field set, and a timestamp. In GreptimeDB, it is represented as a row of data in the time-series table, where the table name aligns with the measurement, and the columns are divided into three types: Tag, Field, and Timestamp.
- GreptimeDB uses `TimestampMillisecond` as the data type for timestamp data from the InfluxDB line protocol API.
- GreptimeDB uses `Float64` as the data type for numeric data from the InfluxDB line protocol API.
Consider the following sample data borrowed from InfluxDB docs as an example:
| _time | _measurement | location | scientist | _field | _value |
|---|---|---|---|---|---|
| 2019-08-18T00:00:00Z | census | klamath | anderson | bees | 23 |
| 2019-08-18T00:00:00Z | census | portland | mullen | ants | 30 |
| 2019-08-18T00:06:00Z | census | klamath | anderson | bees | 28 |
| 2019-08-18T00:06:00Z | census | portland | mullen | ants | 32 |
The data mentioned above is formatted as follows in the InfluxDB line protocol:
```
census,location=klamath,scientist=anderson bees=23 1566086400000000000
census,location=portland,scientist=mullen ants=30 1566086400000000000
census,location=klamath,scientist=anderson bees=28 1566086760000000000
census,location=portland,scientist=mullen ants=32 1566086760000000000
```
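The format above is mechanical: the measurement and comma-separated tag set, a space, the field set, a space, and a nanosecond timestamp. A minimal sketch in Python (the `to_line_protocol` helper is illustrative, not part of any client library, and ignores the protocol's escaping rules for special characters):

```python
def to_line_protocol(measurement, tags, fields, ts_ns):
    """Format one point as InfluxDB line protocol:
    measurement,tag1=v1,tag2=v2 field1=v1 timestamp_ns"""
    # Sort tag keys, as InfluxDB recommends for best write performance
    tag_set = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_set = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement},{tag_set} {field_set} {ts_ns}"

line = to_line_protocol(
    "census",
    {"location": "klamath", "scientist": "anderson"},
    {"bees": 23},
    1566086400000000000,
)
# → census,location=klamath,scientist=anderson bees=23 1566086400000000000
```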
In the GreptimeDB data model, the data is represented as follows in the `census` table:
```
+---------------------+----------+-----------+------+------+
| ts                  | location | scientist | bees | ants |
+---------------------+----------+-----------+------+------+
| 2019-08-18 00:00:00 | klamath  | anderson  |   23 | NULL |
| 2019-08-18 00:06:00 | klamath  | anderson  |   28 | NULL |
| 2019-08-18 00:00:00 | portland | mullen    | NULL |   30 |
| 2019-08-18 00:06:00 | portland | mullen    | NULL |   32 |
+---------------------+----------+-----------+------+------+
```
The schema of the `census` table is as follows:
```
+-----------+----------------------+------+------+---------+---------------+
| Column    | Type                 | Key  | Null | Default | Semantic Type |
+-----------+----------------------+------+------+---------+---------------+
| location  | String               | PRI  | YES  |         | TAG           |
| scientist | String               | PRI  | YES  |         | TAG           |
| bees      | Float64              |      | YES  |         | FIELD         |
| ts        | TimestampMillisecond | PRI  | NO   |         | TIMESTAMP     |
| ants      | Float64              |      | YES  |         | FIELD         |
+-----------+----------------------+------+------+---------+---------------+
```
## Database connection information
Before you begin writing or querying data, it's important to understand the differences in database connection information between InfluxDB and GreptimeDB.
- **Token**: The InfluxDB API token, used for authentication, aligns with the GreptimeDB authentication. When interacting with GreptimeDB using InfluxDB's client libraries or HTTP API, you can use `<greptimedb_user:greptimedb_password>` as the token.
- **Organization**: Unlike InfluxDB, GreptimeDB does not require an organization for connection.
- **Bucket**: In InfluxDB, a bucket serves as a container for time series data, which is equivalent to the database name in GreptimeDB.
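In practice the mapping is mechanical, as sketched below (the `influx_to_greptime` helper and its parameter names are illustrative, not part of any library):

```python
def influx_to_greptime(bucket, org, user, password):
    """Map InfluxDB v2 connection settings to GreptimeDB equivalents."""
    return {
        "db": bucket,                   # bucket -> GreptimeDB database name
        "token": f"{user}:{password}",  # token -> "<user>:<password>"
        "org": "",                      # organization is not required
    }

conn = influx_to_greptime("<db-name>", "my-org", "greptime_user", "greptimedb_password")
```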
## Write data
GreptimeDB is compatible with both v1 and v2 of InfluxDB's line protocol format, facilitating a seamless migration from InfluxDB to GreptimeDB.
### HTTP API
To write a measurement to GreptimeDB, you can use the following HTTP API request:
**InfluxDB line protocol v2**

```shell
curl -X POST 'http://<greptimedb-host>:4000/v1/influxdb/api/v2/write?db=<db-name>' \
  -H 'authorization: token <greptime_user:greptimedb_password>' \
  -d 'census,location=klamath,scientist=anderson bees=23 1566086400000000000'
```

**InfluxDB line protocol v1**

```shell
curl 'http://<greptimedb-host>:4000/v1/influxdb/write?db=<db-name>&u=<greptime_user>&p=<greptimedb_password>' \
  -d 'census,location=klamath,scientist=anderson bees=23 1566086400000000000'
```
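If you are not using `curl`, the same v2 write can be issued from Python's standard library. This sketch only builds the request object; the host, database name, and credentials are placeholders:

```python
import urllib.request

def build_influx_v2_write(host, db, token, payload):
    """Build a POST request for GreptimeDB's InfluxDB v2 write endpoint."""
    url = f"http://{host}:4000/v1/influxdb/api/v2/write?db={db}"
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"authorization": f"token {token}"},
        method="POST",
    )

req = build_influx_v2_write(
    "localhost", "public", "greptime_user:greptimedb_password",
    "census,location=klamath,scientist=anderson bees=23 1566086400000000000",
)
# Send with urllib.request.urlopen(req) once the server is reachable.
```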
### Telegraf
GreptimeDB's support for the InfluxDB line protocol ensures its compatibility with Telegraf. To configure Telegraf, simply add the `http://<greptimedb-host>:4000` URL to your Telegraf configuration:
**InfluxDB line protocol v2**

```toml
[[outputs.influxdb_v2]]
  urls = ["http://<greptimedb-host>:4000/v1/influxdb"]
  token = "<greptime_user>:<greptimedb_password>"
  bucket = "<db-name>"
  ## Leave empty
  organization = ""
```

**InfluxDB line protocol v1**

```toml
[[outputs.influxdb]]
  urls = ["http://<greptimedb-host>:4000/v1/influxdb"]
  database = "<db-name>"
  username = "<greptime_user>"
  password = "<greptimedb_password>"
```
### Client libraries
Writing data to GreptimeDB is a straightforward process when using InfluxDB client libraries. Simply include the URL and authentication details in the client configuration.
For example:
**Node.js**

```js
'use strict'
/** @module write **/

import { InfluxDB, Point } from '@influxdata/influxdb-client'

/** Environment variables **/
const url = 'http://<greptimedb-host>:4000/v1/influxdb'
const token = '<greptime_user>:<greptimedb_password>'
const org = ''
const bucket = '<db-name>'

const influxDB = new InfluxDB({ url, token })
const writeApi = influxDB.getWriteApi(org, bucket)
writeApi.useDefaultTags({ region: 'west' })
const point1 = new Point('temperature')
  .tag('sensor_id', 'TLM01')
  .floatField('value', 24.0)
writeApi.writePoint(point1)
```

**Python**

```python
import influxdb_client
from influxdb_client.client.write_api import SYNCHRONOUS

bucket = "<db-name>"
org = ""
token = "<greptime_user>:<greptimedb_password>"
url = "http://<greptimedb-host>:4000/v1/influxdb"

client = influxdb_client.InfluxDBClient(
    url=url,
    token=token,
    org=org
)

# Write script
write_api = client.write_api(write_options=SYNCHRONOUS)

p = influxdb_client.Point("my_measurement").tag("location", "Prague").field("temperature", 25.3)
write_api.write(bucket=bucket, org=org, record=p)
```

**Go**

```go
bucket := "<db-name>"
org := ""
token := "<greptime_user>:<greptimedb_password>"
url := "http://<greptimedb-host>:4000/v1/influxdb"

client := influxdb2.NewClient(url, token)
writeAPI := client.WriteAPIBlocking(org, bucket)

p := influxdb2.NewPoint("stat",
    map[string]string{"unit": "temperature"},
    map[string]interface{}{"avg": 24.5, "max": 45},
    time.Now())
writeAPI.WritePoint(context.Background(), p)

client.Close()
```

**Java**

```java
private static String url = "http://<greptimedb-host>:4000/v1/influxdb";
private static String org = "";
private static String bucket = "<db-name>";
private static char[] token = "<greptime_user>:<greptimedb_password>".toCharArray();

public static void main(final String[] args) {
    InfluxDBClient influxDBClient = InfluxDBClientFactory.create(url, token, org, bucket);
    WriteApiBlocking writeApi = influxDBClient.getWriteApiBlocking();
    Point point = Point.measurement("temperature")
            .addTag("location", "west")
            .addField("value", 55D)
            .time(Instant.now().toEpochMilli(), WritePrecision.MS);
    writeApi.writePoint(point);
    influxDBClient.close();
}
```

**PHP**

```php
$client = new Client([
    "url" => "http://<greptimedb-host>:4000/v1/influxdb",
    "token" => "<greptime_user>:<greptimedb_password>",
    "bucket" => "<db-name>",
    "org" => "",
    "precision" => InfluxDB2\Model\WritePrecision::S
]);

$writeApi = $client->createWriteApi();

$dateTimeNow = new DateTime('NOW');
$point = Point::measurement("weather")
    ->addTag("location", "Denver")
    ->addField("temperature", rand(0, 20))
    ->time($dateTimeNow->getTimestamp());
$writeApi->write($point);
```
In addition to the languages mentioned above, GreptimeDB also supports client libraries for other languages supported by InfluxDB. You can code in your language of choice by referencing the connection information and code snippets provided earlier.
## Query data
GreptimeDB does not support Flux and InfluxQL, opting instead for SQL and PromQL.
SQL is a universal language designed for managing and manipulating relational databases. With flexible capabilities for data retrieval, manipulation, and analytics, it also reduces the learning curve for users who are already familiar with SQL.
PromQL (Prometheus Query Language) allows users to select and aggregate time series data in real time. The result of an expression can be shown as a graph, viewed as tabular data in Prometheus's expression browser, or consumed by external systems via the HTTP API.
Suppose you are querying the maximum CPU usage from the `monitor` table, recorded over the past 24 hours.
In InfluxQL, the query might look something like this:
```sql
SELECT
  MAX("cpu")
FROM
  "monitor"
WHERE
  time > now() - 24h
GROUP BY
  time(1h)
```
This InfluxQL query computes the maximum value of the `cpu` field from the `monitor` table, considering only the data where the time is within the last 24 hours. The results are then grouped into one-hour intervals.
In Flux, the query might look something like this:
```flux
from(bucket: "public")
  |> range(start: -24h)
  |> filter(fn: (r) => r._measurement == "monitor")
  |> aggregateWindow(every: 1h, fn: max)
```
The similar query in GreptimeDB SQL would be:
```sql
SELECT
  ts,
  host,
  AVG(cpu) RANGE '1h' as mean_cpu
FROM
  monitor
WHERE
  ts > NOW() - INTERVAL '24 hours'
ALIGN '1h' TO NOW
ORDER BY ts DESC;
```
In this SQL query, the `RANGE` clause determines the time window for the `AVG(cpu)` aggregation function, while the `ALIGN` clause sets the alignment time for the time series data.
For more information on time window grouping, please refer to the Aggregate data by time window document.
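Such SQL can also be submitted over GreptimeDB's HTTP API. This sketch only assembles the request URL and form body; the `/v1/sql` path and the `db`/`sql` parameter names follow GreptimeDB's HTTP API, and the host is a placeholder:

```python
from urllib.parse import urlencode

def build_sql_request(host, db, sql):
    """Assemble the endpoint URL and form body for a POST to
    GreptimeDB's SQL HTTP API."""
    url = f"http://{host}:4000/v1/sql?{urlencode({'db': db})}"
    body = urlencode({"sql": sql})
    return url, body

url, body = build_sql_request(
    "localhost", "public",
    "SELECT ts, host, AVG(cpu) RANGE '1h' as mean_cpu FROM monitor "
    "WHERE ts > NOW() - INTERVAL '24 hours' ALIGN '1h' TO NOW ORDER BY ts DESC",
)
```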
The similar query in PromQL would be something like:
```promql
avg_over_time(monitor[1h])
```
To query time series data from the last 24 hours, you need to execute this PromQL, using the `start` and `end` parameters of the HTTP API to define the time range.
For more information on PromQL, please refer to the PromQL document.
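To turn "the last 24 hours" into `start` and `end` values, you can compute Unix timestamps at request time. A minimal sketch (the parameter names follow the Prometheus-compatible range-query API; the helper and the default step are illustrative):

```python
import time

def promql_range_params(query, hours=24, step="1h"):
    """Build query parameters for a Prometheus-style range query
    covering the last `hours` hours."""
    end = int(time.time())
    start = end - hours * 3600
    return {"query": query, "start": start, "end": end, "step": step}

params = promql_range_params("avg_over_time(monitor[1h])")
```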
## Visualize data
It is recommended to use Grafana to visualize data in GreptimeDB. Please refer to the Grafana documentation for details on configuring GreptimeDB.
## Migrate data
For a seamless migration of data from InfluxDB to GreptimeDB, you can follow these steps:
- Write data to both GreptimeDB and InfluxDB to avoid data loss during migration.
- Export all historical data from InfluxDB and import the data into GreptimeDB.
- Stop writing data to InfluxDB and remove the InfluxDB server.
### Write data to both GreptimeDB and InfluxDB simultaneously
Writing data to both GreptimeDB and InfluxDB simultaneously is a practical strategy to avoid data loss during migration. By utilizing InfluxDB's client libraries, you can set up two client instances - one for GreptimeDB and another for InfluxDB. For guidance on writing data to GreptimeDB using the InfluxDB line protocol, please refer to the write data section.
If retaining all historical data isn't necessary, you can simultaneously write data to both GreptimeDB and InfluxDB for a specific period to accumulate the required recent data. Subsequently, cease writing to InfluxDB and continue exclusively with GreptimeDB. If a complete migration of all historical data is needed, please proceed with the following steps.
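The dual-write step can be as simple as fanning out the same line-protocol payload to both write endpoints. A minimal sketch (the `fan_out` helper and the endpoint URLs are illustrative; a real setup would use the client libraries shown earlier):

```python
def fan_out(payload, endpoints):
    """Pair one line-protocol payload with every write endpoint,
    so the same data can be posted to both databases."""
    return [(url, payload) for url in endpoints]

writes = fan_out(
    "census,location=klamath,scientist=anderson bees=23 1566086400000000000",
    [
        "http://<influxdb-host>:8086/api/v2/write?bucket=<bucket-name>",
        "http://<greptimedb-host>:4000/v1/influxdb/api/v2/write?db=public",
    ],
)
```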
### Export data from InfluxDB v1 Server
Create a temporary directory to store the exported data of InfluxDB.
```shell
mkdir -p /path/to/export
```
Use the `influx_inspect export` command of InfluxDB to export data:
```shell
influx_inspect export \
  -database <db-name> \
  -end <end-time> \
  -lponly \
  -datadir /var/lib/influxdb/data \
  -waldir /var/lib/influxdb/wal \
  -out /path/to/export/data
```
- The `-database` flag specifies the database to be exported.
- The `-end` flag specifies the end time of the data to be exported. Must be in RFC3339 format, such as `2024-01-01T00:00:00Z`. You can use the timestamp when simultaneously writing data to both GreptimeDB and InfluxDB as the end time.
- The `-lponly` flag specifies that only the Line Protocol data should be exported.
- The `-datadir` flag specifies the path to the data directory, as configured in the InfluxDB data settings.
- The `-waldir` flag specifies the path to the WAL directory, as configured in the InfluxDB data settings.
- The `-out` flag specifies the output directory.
The exported data in InfluxDB line protocol looks like the following:
```
disk,device=disk1s5s1,fstype=apfs,host=bogon,mode=ro,path=/ inodes_used=356810i 1714363350000000000
diskio,host=bogon,name=disk0 iops_in_progress=0i 1714363350000000000
disk,device=disk1s6,fstype=apfs,host=bogon,mode=rw,path=/System/Volumes/Update inodes_used_percent=0.0002391237988702021 1714363350000000000
...
```
### Export data from InfluxDB v2 Server
Create a temporary directory to store the exported data of InfluxDB.
```shell
mkdir -p /path/to/export
```
Use the `influxd inspect export-lp` command of InfluxDB to export data in the bucket to line protocol:
```shell
influxd inspect export-lp \
  --bucket-id <bucket-id> \
  --engine-path /var/lib/influxdb2/engine/ \
  --end <end-time> \
  --output-path /path/to/export/data
```
- The `--bucket-id` flag specifies the bucket ID to be exported.
- The `--engine-path` flag specifies the path to the engine directory, as configured in the InfluxDB data settings.
- The `--end` flag specifies the end time of the data to be exported. Must be in RFC3339 format, such as `2024-01-01T00:00:00Z`. You can use the timestamp when simultaneously writing data to both GreptimeDB and InfluxDB as the end time.
- The `--output-path` flag specifies the output directory.
The outputs look like the following:
```
{"level":"info","ts":1714377321.4795408,"caller":"export_lp/export_lp.go:219","msg":"exporting TSM files","tsm_dir":"/var/lib/influxdb2/engine/data/307013e61d514f3c","file_count":1}
{"level":"info","ts":1714377321.4940555,"caller":"export_lp/export_lp.go:315","msg":"exporting WAL files","wal_dir":"/var/lib/influxdb2/engine/wal/307013e61d514f3c","file_count":1}
{"level":"info","ts":1714377321.4941633,"caller":"export_lp/export_lp.go:204","msg":"export complete"}
```
The exported data in InfluxDB line protocol looks like the following:
```
cpu,cpu=cpu-total,host=bogon usage_idle=80.4448912910468 1714376180000000000
cpu,cpu=cpu-total,host=bogon usage_idle=78.50167052182304 1714376190000000000
cpu,cpu=cpu-total,host=bogon usage_iowait=0 1714375700000000000
cpu,cpu=cpu-total,host=bogon usage_iowait=0 1714375710000000000
...
```
### Import data to GreptimeDB
Before importing data to GreptimeDB, if the data file is too large, it's recommended to split the data file into multiple slices:
```shell
split -l 100000 -d -a 10 data data.
# -l [line_count]    Create split files line_count lines in length.
# -d                 Use a numeric suffix instead of an alphabetic suffix.
# -a [suffix_length] Use suffix_length letters to form the suffix of the file name.
```
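If the `split` utility is unavailable (for example on Windows), the same slicing can be done in Python. The `split_file` helper below is illustrative and mirrors `split -l 100000 -d -a 10 data data.`:

```python
def split_file(path, lines_per_slice=100000, prefix="data.", suffix_len=10):
    """Split a text file into numbered slices of at most
    lines_per_slice lines each; return the slice file names."""
    slices = []

    def flush(chunk, index):
        # Zero-padded numeric suffix, like `split -d -a 10`
        name = f"{prefix}{index:0{suffix_len}d}"
        with open(name, "w") as out:
            out.writelines(chunk)
        slices.append(name)

    with open(path) as f:
        chunk, index = [], 0
        for line in f:
            chunk.append(line)
            if len(chunk) == lines_per_slice:
                flush(chunk, index)
                chunk, index = [], index + 1
        if chunk:
            flush(chunk, index)
    return slices
```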
You can import data using the HTTP API as described in the write data section. The Python script provided below will help you read data from the files and import it into GreptimeDB.
Create a Python file named `ingest.py` (Python 3.9 or later is required), and copy the following code into it.
```python
import os
import sys
import subprocess


def process_file(file_path, url, token):
    print("Ingesting file:", file_path)
    curl_command = ['curl', '-i',
                    '-H', "authorization: token {}".format(token),
                    '-X', "POST",
                    '--data-binary', "@{}".format(file_path),
                    url]
    print(" ".join(curl_command))
    attempts = 0
    while attempts < 3:  # Retry up to 3 times
        result = subprocess.run(curl_command, universal_newlines=True,
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        print(result)
        # Check if there are any warnings or errors in the curl command output
        output = result.stderr.lower()
        if "warning" in output or "error" in output:
            print("Warnings or errors detected. Retrying...")
            attempts += 1
        else:
            break
    if attempts == 3:
        print("Request failed after 3 attempts. Giving up.")
        sys.exit(1)


def process_directory(directory, url, token):
    file_names = []
    # Walk through the directory
    for root, dirs, files in os.walk(directory):
        for file in files:
            file_path = os.path.join(root, file)
            file_names.append(file_path)
    # Sort the file names array
    file_names.sort()
    # Process each file
    for file_name in file_names:
        process_file(file_name, url, token)


# Check if the arguments are provided
if len(sys.argv) < 4:
    print("Please provide the directory path as the first argument, the url as the second argument and the token as the third argument.")
    sys.exit(1)

directory_path = sys.argv[1]
url = sys.argv[2]
token = sys.argv[3]

# Call the function to process the directory
process_directory(directory_path, url, token)
```
Suppose your directory tree is as follows:
```
.
├── ingest.py
└── slices
    ├── data.0000000000
    ├── data.0000000001
    └── data.0000000002
```
Execute the Python script in the current directory and wait for the data import to complete.
```shell
python3 ingest.py slices 'http://<greptimedb-host>:4000/v1/influxdb/write?db=<db-name>' <token>
```