# Migrate from InfluxDB
This guide explains the differences between the data models of InfluxDB and GreptimeDB and walks you through the migration process.
## Data model differences
While you may already be familiar with InfluxDB's key concepts, the data model of GreptimeDB is something new to explore. Let's start with the similarities and differences:
- Both solutions are schemaless, eliminating the need to define a schema before writing data.
- In InfluxDB, a point represents a single data record with a measurement, tag set, field set, and a timestamp. In GreptimeDB, it is represented as a row of data in the time-series table, where the table name aligns with the measurement, and the columns are divided into three types: Tag, Field, and Timestamp.
- GreptimeDB uses `TimestampMillisecond` as the data type for timestamp data from the InfluxDB line protocol API.
- GreptimeDB uses `Float64` as the data type for numeric data from the InfluxDB line protocol API.
Consider the following sample data borrowed from InfluxDB docs as an example:
| _time | _measurement | location | scientist | _field | _value |
|---|---|---|---|---|---|
| 2019-08-18T00:00:00Z | census | klamath | anderson | bees | 23 |
| 2019-08-18T00:00:00Z | census | portland | mullen | ants | 30 |
| 2019-08-18T00:06:00Z | census | klamath | anderson | bees | 28 |
| 2019-08-18T00:06:00Z | census | portland | mullen | ants | 32 |
The data mentioned above is formatted as follows in the InfluxDB line protocol:
```
census,location=klamath,scientist=anderson bees=23 1566086400000000000
census,location=portland,scientist=mullen ants=30 1566086400000000000
census,location=klamath,scientist=anderson bees=28 1566086760000000000
census,location=portland,scientist=mullen ants=32 1566086760000000000
```
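The format above is mechanical: the measurement and comma-separated tag set, a space, the field set, a space, and a nanosecond timestamp. A minimal sketch in Python (the `to_line_protocol` helper is illustrative, not part of any client library, and ignores the protocol's escaping rules for special characters):

```python
def to_line_protocol(measurement, tags, fields, ts_ns):
    """Format one point as InfluxDB line protocol:
    measurement,tag1=v1,tag2=v2 field1=v1 timestamp_ns"""
    # Sort tag keys, as InfluxDB recommends for best write performance
    tag_set = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_set = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement},{tag_set} {field_set} {ts_ns}"

line = to_line_protocol(
    "census",
    {"location": "klamath", "scientist": "anderson"},
    {"bees": 23},
    1566086400000000000,
)
# → census,location=klamath,scientist=anderson bees=23 1566086400000000000
```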
In the GreptimeDB data model, the data is represented as follows in the `census` table:
```
+---------------------+----------+-----------+------+------+
| ts                  | location | scientist | bees | ants |
+---------------------+----------+-----------+------+------+
| 2019-08-18 00:00:00 | klamath  | anderson  |   23 | NULL |
| 2019-08-18 00:06:00 | klamath  | anderson  |   28 | NULL |
| 2019-08-18 00:00:00 | portland | mullen    | NULL |   30 |
| 2019-08-18 00:06:00 | portland | mullen    | NULL |   32 |
+---------------------+----------+-----------+------+------+
```
The schema of the `census` table is as follows:
```
+-----------+----------------------+------+------+---------+---------------+
| Column    | Type                 | Key  | Null | Default | Semantic Type |
+-----------+----------------------+------+------+---------+---------------+
| location  | String               | PRI  | YES  |         | TAG           |
| scientist | String               | PRI  | YES  |         | TAG           |
| bees      | Float64              |      | YES  |         | FIELD         |
| ts        | TimestampMillisecond | PRI  | NO   |         | TIMESTAMP     |
| ants      | Float64              |      | YES  |         | FIELD         |
+-----------+----------------------+------+------+---------+---------------+
```
## Database connection information
Before you begin writing or querying data, it's important to understand the differences in database connection information between InfluxDB and GreptimeDB.
- **Token**: The InfluxDB API token, used for authentication, aligns with the GreptimeDB authentication. When interacting with GreptimeDB using InfluxDB's client libraries or HTTP API, you can use `<greptimedb_user:greptimedb_password>` as the token.
- **Organization**: Unlike InfluxDB, GreptimeDB does not require an organization for connection.
- **Bucket**: In InfluxDB, a bucket serves as a container for time series data, which is equivalent to the database name in GreptimeDB.
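In practice the mapping is mechanical, as sketched below (the `influx_to_greptime` helper and its parameter names are illustrative, not part of any library):

```python
def influx_to_greptime(bucket, org, user, password):
    """Map InfluxDB v2 connection settings to GreptimeDB equivalents."""
    return {
        "db": bucket,                   # bucket -> GreptimeDB database name
        "token": f"{user}:{password}",  # token -> "<user>:<password>"
        "org": "",                      # organization is not required
    }

conn = influx_to_greptime("<db-name>", "my-org", "greptime_user", "greptimedb_password")
```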
## Write data
GreptimeDB is compatible with both v1 and v2 of InfluxDB's line protocol format, facilitating a seamless migration from InfluxDB to GreptimeDB.
### HTTP API
To write a measurement to GreptimeDB, you can use the following HTTP API request:
**InfluxDB line protocol v2**

```shell
curl -X POST 'http://<greptimedb-host>:4000/v1/influxdb/api/v2/write?db=<db-name>' \
  -H 'authorization: token <greptime_user:greptimedb_password>' \
  -d 'census,location=klamath,scientist=anderson bees=23 1566086400000000000'
```

**InfluxDB line protocol v1**

```shell
curl 'http://<greptimedb-host>:4000/v1/influxdb/write?db=<db-name>&u=<greptime_user>&p=<greptimedb_password>' \
  -d 'census,location=klamath,scientist=anderson bees=23 1566086400000000000'
```
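If you are not using `curl`, the same v2 write can be issued from Python's standard library. This sketch only builds the request object; the host, database name, and credentials are placeholders:

```python
import urllib.request

def build_influx_v2_write(host, db, token, payload):
    """Build a POST request for GreptimeDB's InfluxDB v2 write endpoint."""
    url = f"http://{host}:4000/v1/influxdb/api/v2/write?db={db}"
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"authorization": f"token {token}"},
        method="POST",
    )

req = build_influx_v2_write(
    "localhost", "public", "greptime_user:greptimedb_password",
    "census,location=klamath,scientist=anderson bees=23 1566086400000000000",
)
# Send with urllib.request.urlopen(req) once the server is reachable.
```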
### Telegraf
GreptimeDB's support for the InfluxDB line protocol ensures its compatibility with Telegraf. To configure Telegraf, simply add the `http://<greptimedb-host>:4000` URL to your Telegraf configuration:
**InfluxDB line protocol v2**

```toml
[[outputs.influxdb_v2]]
  urls = ["http://<greptimedb-host>:4000/v1/influxdb"]
  token = "<greptime_user>:<greptimedb_password>"
  bucket = "<db-name>"
  ## Leave empty
  organization = ""
```

**InfluxDB line protocol v1**

```toml
[[outputs.influxdb]]
  urls = ["http://<greptimedb-host>:4000/v1/influxdb"]
  database = "<db-name>"
  username = "<greptime_user>"
  password = "<greptimedb_password>"
```
### Client libraries
Writing data to GreptimeDB is a straightforward process when using InfluxDB client libraries. Simply include the URL and authentication details in the client configuration.
For example:
**Node.js**

```js
'use strict'
/** @module write **/

import { InfluxDB, Point } from '@influxdata/influxdb-client'

/** Environment variables **/
const url = 'http://<greptimedb-host>:4000/v1/influxdb'
const token = '<greptime_user>:<greptimedb_password>'
const org = ''
const bucket = '<db-name>'

const influxDB = new InfluxDB({ url, token })
const writeApi = influxDB.getWriteApi(org, bucket)
writeApi.useDefaultTags({ region: 'west' })
const point1 = new Point('temperature')
  .tag('sensor_id', 'TLM01')
  .floatField('value', 24.0)
writeApi.writePoint(point1)
```

**Python**

```python
import influxdb_client
from influxdb_client.client.write_api import SYNCHRONOUS

bucket = "<db-name>"
org = ""
token = "<greptime_user>:<greptimedb_password>"
url = "http://<greptimedb-host>:4000/v1/influxdb"

client = influxdb_client.InfluxDBClient(
    url=url,
    token=token,
    org=org
)

# Write script
write_api = client.write_api(write_options=SYNCHRONOUS)

p = influxdb_client.Point("my_measurement").tag("location", "Prague").field("temperature", 25.3)
write_api.write(bucket=bucket, org=org, record=p)
```

**Go**

```go
bucket := "<db-name>"
org := ""
token := "<greptime_user>:<greptimedb_password>"
url := "http://<greptimedb-host>:4000/v1/influxdb"

client := influxdb2.NewClient(url, token)
writeAPI := client.WriteAPIBlocking(org, bucket)

p := influxdb2.NewPoint("stat",
    map[string]string{"unit": "temperature"},
    map[string]interface{}{"avg": 24.5, "max": 45},
    time.Now())
writeAPI.WritePoint(context.Background(), p)

client.Close()
```

**Java**

```java
private static String url = "http://<greptimedb-host>:4000/v1/influxdb";
private static String org = "";
private static String bucket = "<db-name>";
private static char[] token = "<greptime_user>:<greptimedb_password>".toCharArray();

public static void main(final String[] args) {
    InfluxDBClient influxDBClient = InfluxDBClientFactory.create(url, token, org, bucket);
    WriteApiBlocking writeApi = influxDBClient.getWriteApiBlocking();
    Point point = Point.measurement("temperature")
            .addTag("location", "west")
            .addField("value", 55D)
            .time(Instant.now().toEpochMilli(), WritePrecision.MS);
    writeApi.writePoint(point);
    influxDBClient.close();
}
```

**PHP**

```php
$client = new Client([
    "url" => "http://<greptimedb-host>:4000/v1/influxdb",
    "token" => "<greptime_user>:<greptimedb_password>",
    "bucket" => "<db-name>",
    "org" => "",
    "precision" => InfluxDB2\Model\WritePrecision::S
]);

$writeApi = $client->createWriteApi();

$dateTimeNow = new DateTime('NOW');
$point = Point::measurement("weather")
    ->addTag("location", "Denver")
    ->addField("temperature", rand(0, 20))
    ->time($dateTimeNow->getTimestamp());
$writeApi->write($point);
```
In addition to the languages mentioned above, GreptimeDB also supports client libraries for other languages supported by InfluxDB. You can code in your language of choice by referencing the connection information and code snippets provided earlier.
## Query data
GreptimeDB does not support Flux and InfluxQL, opting instead for SQL and PromQL.
SQL is a universal language designed for managing and manipulating relational databases. With flexible capabilities for data retrieval, manipulation, and analytics, it also reduces the learning curve for users who are already familiar with SQL.
PromQL (Prometheus Query Language) allows users to select and aggregate time series data in real time. The result of an expression can be shown as a graph, viewed as tabular data in Prometheus's expression browser, or consumed by external systems via the HTTP API.
Suppose you are querying the maximum CPU usage from the `monitor` table, recorded over the past 24 hours.
In InfluxQL, the query might look something like this:
```sql
SELECT
  MAX("cpu")
FROM
  "monitor"
WHERE
  time > now() - 24h
GROUP BY
  time(1h)
```
This InfluxQL query computes the maximum value of the `cpu` field from the `monitor` table, considering only the data where the time is within the last 24 hours. The results are then grouped into one-hour intervals.
In Flux, the query might look something like this:
```flux
from(bucket: "public")
  |> range(start: -24h)
  |> filter(fn: (r) => r._measurement == "monitor")
  |> aggregateWindow(every: 1h, fn: max)
```
The similar query in GreptimeDB SQL would be:
```sql
SELECT
  ts,
  host,
  AVG(cpu) RANGE '1h' as mean_cpu
FROM
  monitor
WHERE
  ts > NOW() - INTERVAL '24 hours'
ALIGN '1h' TO NOW
ORDER BY ts DESC;
```
In this SQL query, the `RANGE` clause determines the time window for the `AVG(cpu)` aggregation function, while the `ALIGN` clause sets the alignment time for the time series data.
For more information on time window grouping, please refer to the Aggregate data by time window document.
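Such SQL can also be submitted over GreptimeDB's HTTP API. This sketch only assembles the request URL and form body; the `/v1/sql` path and the `db`/`sql` parameter names follow GreptimeDB's HTTP API, and the host is a placeholder:

```python
from urllib.parse import urlencode

def build_sql_request(host, db, sql):
    """Assemble the endpoint URL and form body for a POST to
    GreptimeDB's SQL HTTP API."""
    url = f"http://{host}:4000/v1/sql?{urlencode({'db': db})}"
    body = urlencode({"sql": sql})
    return url, body

url, body = build_sql_request(
    "localhost", "public",
    "SELECT ts, host, AVG(cpu) RANGE '1h' as mean_cpu FROM monitor "
    "WHERE ts > NOW() - INTERVAL '24 hours' ALIGN '1h' TO NOW ORDER BY ts DESC",
)
```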
The similar query in PromQL would be something like:
```promql
avg_over_time(monitor[1h])
```
To query time series data from the last 24 hours, you need to execute this PromQL, using the `start` and `end` parameters of the HTTP API to define the time range.
For more information on PromQL, please refer to the PromQL document.
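To turn "the last 24 hours" into `start` and `end` values, you can compute Unix timestamps at request time. A minimal sketch (the parameter names follow the Prometheus-compatible range-query API; the helper and the default step are illustrative):

```python
import time

def promql_range_params(query, hours=24, step="1h"):
    """Build query parameters for a Prometheus-style range query
    covering the last `hours` hours."""
    end = int(time.time())
    start = end - hours * 3600
    return {"query": query, "start": start, "end": end, "step": step}

params = promql_range_params("avg_over_time(monitor[1h])")
```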
## Visualize data
It is recommended to use Grafana to visualize data in GreptimeDB. Please refer to the Grafana documentation for details on configuring GreptimeDB.
## Migrate data
For a seamless migration of data from InfluxDB to GreptimeDB, you can follow these steps:
- Write data to both GreptimeDB and InfluxDB to avoid data loss during migration.
- Export all historical data from InfluxDB and import the data into GreptimeDB.
- Stop writing data to InfluxDB and remove the InfluxDB server.
### Write data to both GreptimeDB and InfluxDB simultaneously
Writing data to both GreptimeDB and InfluxDB simultaneously is a practical strategy to avoid data loss during migration. By utilizing InfluxDB's client libraries, you can set up two client instances - one for GreptimeDB and another for InfluxDB. For guidance on writing data to GreptimeDB using the InfluxDB line protocol, please refer to the write data section.
If retaining all historical data isn't necessary, you can simultaneously write data to both GreptimeDB and InfluxDB for a specific period to accumulate the required recent data. Subsequently, cease writing to InfluxDB and continue exclusively with GreptimeDB. If a complete migration of all historical data is needed, please proceed with the following steps.
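The dual-write step can be as simple as fanning out the same line-protocol payload to both write endpoints. A minimal sketch (the `fan_out` helper and the endpoint URLs are illustrative; a real setup would use the client libraries shown earlier):

```python
def fan_out(payload, endpoints):
    """Pair one line-protocol payload with every write endpoint,
    so the same data can be posted to both databases."""
    return [(url, payload) for url in endpoints]

writes = fan_out(
    "census,location=klamath,scientist=anderson bees=23 1566086400000000000",
    [
        "http://<influxdb-host>:8086/api/v2/write?bucket=<bucket-name>",
        "http://<greptimedb-host>:4000/v1/influxdb/api/v2/write?db=public",
    ],
)
```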
### Export data from InfluxDB v1 Server
Create a temporary directory to store the exported data of InfluxDB.
```shell
mkdir -p /path/to/export
```
Use the `influx_inspect export` command of InfluxDB to export data:
```shell
influx_inspect export \
  -database <db-name> \
  -end <end-time> \
  -lponly \
  -datadir /var/lib/influxdb/data \
  -waldir /var/lib/influxdb/wal \
  -out /path/to/export/data
```
- The `-database` flag specifies the database to be exported.
- The `-end` flag specifies the end time of the data to be exported. Must be in RFC3339 format, such as `2024-01-01T00:00:00Z`. You can use the timestamp when simultaneously writing data to both GreptimeDB and InfluxDB as the end time.
- The `-lponly` flag specifies that only the Line Protocol data should be exported.
- The `-datadir` flag specifies the path to the data directory, as configured in the InfluxDB data settings.
- The `-waldir` flag specifies the path to the WAL directory, as configured in the InfluxDB data settings.
- The `-out` flag specifies the output directory.
The exported data in InfluxDB line protocol looks like the following:
```
disk,device=disk1s5s1,fstype=apfs,host=bogon,mode=ro,path=/ inodes_used=356810i 1714363350000000000
diskio,host=bogon,name=disk0 iops_in_progress=0i 1714363350000000000
disk,device=disk1s6,fstype=apfs,host=bogon,mode=rw,path=/System/Volumes/Update inodes_used_percent=0.0002391237988702021 1714363350000000000
...
```
### Export data from InfluxDB v2 Server
Create a temporary directory to store the exported data of InfluxDB.
```shell
mkdir -p /path/to/export
```
Use the `influxd inspect export-lp` command of InfluxDB to export data in the bucket to line protocol:
```shell
influxd inspect export-lp \
  --bucket-id <bucket-id> \
  --engine-path /var/lib/influxdb2/engine/ \
  --end <end-time> \
  --output-path /path/to/export/data
```
- The `--bucket-id` flag specifies the bucket ID to be exported.
- The `--engine-path` flag specifies the path to the engine directory, as configured in the InfluxDB data settings.
- The `--end` flag specifies the end time of the data to be exported. Must be in RFC3339 format, such as `2024-01-01T00:00:00Z`. You can use the timestamp when simultaneously writing data to both GreptimeDB and InfluxDB as the end time.
- The `--output-path` flag specifies the output directory.
The outputs look like the following:
```
{"level":"info","ts":1714377321.4795408,"caller":"export_lp/export_lp.go:219","msg":"exporting TSM files","tsm_dir":"/var/lib/influxdb2/engine/data/307013e61d514f3c","file_count":1}
{"level":"info","ts":1714377321.4940555,"caller":"export_lp/export_lp.go:315","msg":"exporting WAL files","wal_dir":"/var/lib/influxdb2/engine/wal/307013e61d514f3c","file_count":1}
{"level":"info","ts":1714377321.4941633,"caller":"export_lp/export_lp.go:204","msg":"export complete"}
```
The exported data in InfluxDB line protocol looks like the following:
```
cpu,cpu=cpu-total,host=bogon usage_idle=80.4448912910468 1714376180000000000
cpu,cpu=cpu-total,host=bogon usage_idle=78.50167052182304 1714376190000000000
cpu,cpu=cpu-total,host=bogon usage_iowait=0 1714375700000000000
cpu,cpu=cpu-total,host=bogon usage_iowait=0 1714375710000000000
...
```
### Import data to GreptimeDB
Before importing data to GreptimeDB, if the data file is too large, it's recommended to split the data file into multiple slices:
```shell
split -l 100000 -d -a 10 data data.
# -l [line_count]    Create split files line_count lines in length.
# -d                 Use a numeric suffix instead of an alphabetic suffix.
# -a [suffix_length] Use suffix_length letters to form the suffix of the file name.
```
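If the `split` utility is unavailable (for example on Windows), the same slicing can be done in Python. The `split_file` helper below is illustrative and mirrors `split -l 100000 -d -a 10 data data.`:

```python
def split_file(path, lines_per_slice=100000, prefix="data.", suffix_len=10):
    """Split a text file into numbered slices of at most
    lines_per_slice lines each; return the slice file names."""
    slices = []

    def flush(chunk, index):
        # Zero-padded numeric suffix, like `split -d -a 10`
        name = f"{prefix}{index:0{suffix_len}d}"
        with open(name, "w") as out:
            out.writelines(chunk)
        slices.append(name)

    with open(path) as f:
        chunk, index = [], 0
        for line in f:
            chunk.append(line)
            if len(chunk) == lines_per_slice:
                flush(chunk, index)
                chunk, index = [], index + 1
        if chunk:
            flush(chunk, index)
    return slices
```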
You can import data using the HTTP API as described in the write data section. The Python script provided below will help you read data from the files and import it into GreptimeDB.
Create a Python file named `ingest.py` (Python 3.9 or later is required), and copy the following code into it.
```python
import os
import sys
import subprocess


def process_file(file_path, url, token):
    print("Ingesting file:", file_path)
    curl_command = ['curl', '-i',
                    '-H', "authorization: token {}".format(token),
                    '-X', "POST",
                    '--data-binary', "@{}".format(file_path),
                    url]
    print(" ".join(curl_command))
    attempts = 0
    while attempts < 3:  # Retry up to 3 times
        result = subprocess.run(curl_command, universal_newlines=True,
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        print(result)
        # Check if there are any warnings or errors in the curl command output
        output = result.stderr.lower()
        if "warning" in output or "error" in output:
            print("Warnings or errors detected. Retrying...")
            attempts += 1
        else:
            break
    if attempts == 3:
        print("Request failed after 3 attempts. Giving up.")
        sys.exit(1)


def process_directory(directory, url, token):
    file_names = []
    # Walk through the directory
    for root, dirs, files in os.walk(directory):
        for file in files:
            file_path = os.path.join(root, file)
            file_names.append(file_path)
    # Sort the file names array
    file_names.sort()
    # Process each file
    for file_name in file_names:
        process_file(file_name, url, token)


# Check if the arguments are provided
if len(sys.argv) < 4:
    print("Please provide the directory path as the first argument, the url as the second argument and the token as the third argument.")
    sys.exit(1)

directory_path = sys.argv[1]
url = sys.argv[2]
token = sys.argv[3]

# Call the function to process the directory
process_directory(directory_path, url, token)
```
Suppose your directory tree is as follows:
```
.
├── ingest.py
└── slices
    ├── data.0000000000
    ├── data.0000000001
    └── data.0000000002
```
Execute the Python script in the current directory and wait for the data import to complete.
```shell
python3 ingest.py slices 'http://<greptimedb-host>:4000/v1/influxdb/write?db=<db-name>' <token>
```