Version: 0.8

Migrate from InfluxDB

This guide will help you understand the differences between the data models of GreptimeDB and InfluxDB, and guide you through the migration process.

Data model differences

While you may already be familiar with InfluxDB key concepts, the data model of GreptimeDB is something new to explore. Here are the similarities and differences between the data models of GreptimeDB and InfluxDB:

  • Both solutions are schemaless, eliminating the need to define a schema before writing data.
  • In InfluxDB, a point represents a single data record with a measurement, tag set, field set, and a timestamp. In GreptimeDB, it is represented as a row of data in the time-series table, where the table name aligns with the measurement, and the columns are divided into three types: Tag, Field, and Timestamp.
  • GreptimeDB uses TimestampNanosecond as the data type for timestamp data from the InfluxDB line protocol API.
  • GreptimeDB uses Float64 as the data type for numeric data from the InfluxDB line protocol API.

Consider the following sample data borrowed from InfluxDB docs as an example:

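+----------------------+--------------+----------+-----------+--------+--------+
| _time                | _measurement | location | scientist | _field | _value |
+----------------------+--------------+----------+-----------+--------+--------+
| 2019-08-18T00:00:00Z | census       | klamath  | anderson  | bees   | 23     |
| 2019-08-18T00:00:00Z | census       | portland | mullen    | ants   | 30     |
| 2019-08-18T00:06:00Z | census       | klamath  | anderson  | bees   | 28     |
| 2019-08-18T00:06:00Z | census       | portland | mullen    | ants   | 32     |
+----------------------+--------------+----------+-----------+--------+--------+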

The data mentioned above is formatted as follows in the InfluxDB line protocol:

census,location=klamath,scientist=anderson bees=23 1566086400000000000
census,location=portland,scientist=mullen ants=30 1566086400000000000
census,location=klamath,scientist=anderson bees=28 1566086760000000000
census,location=portland,scientist=mullen ants=32 1566086760000000000

In the GreptimeDB data model, the data is represented as follows in the census table:

+---------------------+----------+-----------+------+------+
| ts                  | location | scientist | bees | ants |
+---------------------+----------+-----------+------+------+
| 2019-08-18 00:00:00 | klamath  | anderson  |   23 | NULL |
| 2019-08-18 00:06:00 | klamath  | anderson  |   28 | NULL |
| 2019-08-18 00:00:00 | portland | mullen    | NULL |   30 |
| 2019-08-18 00:06:00 | portland | mullen    | NULL |   32 |
+---------------------+----------+-----------+------+------+

The schema of the census table is as follows:

+-----------+----------------------+------+------+---------+---------------+
| Column    | Type                 | Key  | Null | Default | Semantic Type |
+-----------+----------------------+------+------+---------+---------------+
| location  | String               | PRI  | YES  |         | TAG           |
| scientist | String               | PRI  | YES  |         | TAG           |
| bees      | Float64              |      | YES  |         | FIELD         |
| ts        | TimestampNanosecond  | PRI  | NO   |         | TIMESTAMP     |
| ants      | Float64              |      | YES  |         | FIELD         |
+-----------+----------------------+------+------+---------+---------------+
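
To make the mapping concrete, the following Python sketch splits one line-protocol record into the table name, Tag columns, Field columns, and timestamp described above. The function name `parse_point` is hypothetical, and the parser assumes the simple unescaped form used in this guide (no quoted strings, no escaped spaces or commas); it is an illustration, not a full line protocol parser.

```python
from datetime import datetime, timezone


def parse_point(line: str) -> dict:
    """Split one simple line-protocol record into GreptimeDB-style parts.

    Assumes no escaping and no string fields (illustration only).
    """
    head, field_part, ts_part = line.strip().split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in field_part.split(","):
        key, raw = pair.split("=", 1)
        # A trailing "i" marks an integer in line protocol; bare numbers are floats.
        fields[key] = int(raw[:-1]) if raw.endswith("i") else float(raw)
    ts_ns = int(ts_part)
    return {
        "table": measurement,  # measurement -> table name
        "tags": tags,          # tag set -> Tag columns
        "fields": fields,      # field set -> Field columns
        "ts_ns": ts_ns,        # timestamp -> Timestamp column (nanoseconds)
        "ts": datetime.fromtimestamp(ts_ns // 1_000_000_000, tz=timezone.utc),
    }


row = parse_point(
    "census,location=klamath,scientist=anderson bees=23 1566086400000000000"
)
```

Here `row["fields"]` holds `bees` as a float, matching the Float64 column type in the schema, and `row["ts"]` corresponds to 2019-08-18 00:00:00 UTC.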

Database connection information

Before you begin writing or querying data, it's crucial to comprehend the differences in database connection information between InfluxDB and GreptimeDB.

  • Token: The InfluxDB API token, used for authentication, aligns with the GreptimeDB authentication. When interacting with GreptimeDB using InfluxDB's client libraries or HTTP API, you can use <greptimedb_user:greptimedb_password> as the token.
  • Organization: Unlike InfluxDB, GreptimeDB does not require an organization for connection.
  • Bucket: In InfluxDB, a bucket serves as a container for time series data, which is equivalent to the database name in GreptimeDB.
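
As a rough illustration of this mapping, the Python sketch below collects the settings an InfluxDB v2 client would need in order to talk to GreptimeDB. The helper name `influx_client_settings` and the sample host and credentials are hypothetical; the URL path and port follow the examples used elsewhere in this guide.

```python
def influx_client_settings(host: str, user: str, password: str, db: str) -> dict:
    """Return InfluxDB v2 client settings that point at GreptimeDB."""
    return {
        "url": f"http://{host}:4000/v1/influxdb",  # endpoint used in this guide
        "token": f"{user}:{password}",             # token carries user and password
        "org": "",                                 # organization is not required
        "bucket": db,                              # bucket maps to the database name
    }


# Placeholder values; substitute your own connection details.
settings = influx_client_settings("localhost", "greptime_user", "secret", "public")
```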

Write data

GreptimeDB is compatible with both v1 and v2 of InfluxDB's line protocol format, facilitating a seamless migration from InfluxDB to GreptimeDB.

HTTP API

To write a measurement to GreptimeDB, you can use the following HTTP API request:

curl -X POST 'http://<greptimedb-host>:4000/v1/influxdb/api/v2/write?db=<db-name>' \
-H 'authorization: token <greptime_user:greptimedb_password>' \
-d 'census,location=klamath,scientist=anderson bees=23 1566086400000000000'
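
The same request can be assembled from a script. The following Python sketch uses only the standard library and mirrors the curl command above; the function name `build_write_request` and the sample host and credentials are assumptions made for illustration. It builds the request without sending it, so it can be inspected first.

```python
import urllib.parse
import urllib.request


def build_write_request(host, db, user, password, lines):
    """Build (but do not send) a line-protocol write request for GreptimeDB."""
    query = urllib.parse.urlencode({"db": db})
    url = f"http://{host}:4000/v1/influxdb/api/v2/write?{query}"
    body = "\n".join(lines).encode("utf-8")  # one point per line
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"token {user}:{password}"},
        method="POST",
    )


req = build_write_request(
    "localhost", "public", "greptime_user", "secret",
    ["census,location=klamath,scientist=anderson bees=23 1566086400000000000"],
)
# To actually send it: urllib.request.urlopen(req)
```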

Telegraf

GreptimeDB's support for the InfluxDB line protocol makes it compatible with Telegraf. To configure Telegraf, add the GreptimeDB URL to the Telegraf configuration:

[[outputs.influxdb_v2]]
urls = ["http://<greptimedb-host>:4000/v1/influxdb"]
token = "<greptime_user>:<greptimedb_password>"
bucket = "<db-name>"
## Leave empty
organization = ""

Client libraries

Writing data to GreptimeDB is a straightforward process when using InfluxDB client libraries. Simply include the URL and authentication details in the client configuration.

For example:

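'use strict'
/** @module write
 * Writes a data point to GreptimeDB using the InfluxDB JavaScript client.
 **/

import { InfluxDB, Point } from '@influxdata/influxdb-client'

/** Connection settings **/
const url = 'http://<greptimedb-host>:4000/v1/influxdb'
const token = '<greptime_user>:<greptimedb_password>'
const org = ''
const bucket = '<db-name>'

const influxDB = new InfluxDB({ url, token })
const writeApi = influxDB.getWriteApi(org, bucket)
// Attach a default tag to every point written through this API
writeApi.useDefaultTags({ region: 'west' })
const point1 = new Point('temperature')
  .tag('sensor_id', 'TLM01')
  .floatField('value', 24.0)
writeApi.writePoint(point1)
// Flush buffered points before the process exits
writeApi
  .close()
  .then(() => console.log('Write finished'))
  .catch((e) => console.error('Write failed', e))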

In addition to the languages previously mentioned, GreptimeDB also accommodates client libraries for other languages supported by InfluxDB. You can code in your language of choice by referencing the connection information and code snippets provided earlier.
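
Whatever the language, a client library ultimately serializes each point into a line-protocol record before sending it. The Python sketch below shows that serialization step in a simplified form; the helper names are hypothetical and it is not the code any InfluxDB client actually uses. It sorts tags by key and applies the basic escaping rules for commas, equals signs, and spaces.

```python
def _escape(text: str) -> str:
    # Commas, equals signs, and spaces are backslash-escaped in tag keys,
    # tag values, and field keys.
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def _format_field(value) -> str:
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"       # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'        # strings are double-quoted


def to_line_protocol(measurement, tags, fields, ts_ns=None) -> str:
    """Render one point as an InfluxDB line-protocol record."""
    head = measurement.replace(",", r"\,").replace(" ", r"\ ")
    tag_part = "".join(
        f",{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(k)}={_format_field(v)}" for k, v in fields.items()
    )
    line = f"{head}{tag_part} {field_part}"
    return line if ts_ns is None else f"{line} {ts_ns}"


line = to_line_protocol(
    "temperature", {"sensor_id": "TLM01", "region": "west"}, {"value": 24.0}
)
```

The resulting string can be sent as the request body of the HTTP write API described earlier.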

Query data

GreptimeDB does not support Flux or InfluxQL; it uses SQL and PromQL instead.

SQL is a universal language designed for managing and manipulating relational databases. It offers flexible capabilities for data retrieval, manipulation, and analytics, and it reduces the learning curve for users who are already familiar with it.

PromQL (Prometheus Query Language) allows users to select and aggregate time series data in real time. The result of an expression can be shown as a graph, viewed as tabular data in Prometheus's expression browser, or consumed by external systems via the HTTP API.

Suppose you are querying the maximum CPU usage from the monitor table, recorded over the past 24 hours. In InfluxQL, the query might look something like this:

SELECT
    MAX("cpu")
FROM
    "monitor"
WHERE
    time > now() - 24h
GROUP BY
    time(1h)

This InfluxQL query computes the maximum value of the cpu field from the monitor table, considering only the data where the time is within the last 24 hours. The results are then grouped into one-hour intervals.
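
The grouping step can be illustrated outside the database. The sketch below is plain Python with hypothetical sample data; it floors each timestamp to the start of its one-hour window and keeps the maximum per window, which is the idea behind grouping by time(1h). It does not reflect how InfluxDB executes the query internally.

```python
from collections import defaultdict


def max_per_window(samples, window_s=3600):
    """Group (epoch_seconds, value) samples into fixed windows and take the max."""
    buckets = defaultdict(list)
    for ts, value in samples:
        window_start = ts - ts % window_s  # floor to the window boundary
        buckets[window_start].append(value)
    return {start: max(values) for start, values in sorted(buckets.items())}


# Hypothetical CPU samples: two fall in the first hour, two in the second.
samples = [(0, 10.0), (1800, 35.5), (3600, 20.0), (7199, 12.5)]
result = max_per_window(samples)
```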

In Flux, the query might look something like this:

from(bucket: "public")
  |> range(start: -24h)
  |> filter(fn: (r) => r._measurement == "monitor" and r._field == "cpu")
  |> aggregateWindow(every: 1h, fn: max)

A similar query in GreptimeDB SQL would be:

SELECT
    ts,
    host,
    MAX(cpu) RANGE '1h' AS max_cpu
FROM
    monitor
WHERE
    ts > NOW() - INTERVAL '24 hours'
ALIGN '1h' TO NOW
ORDER BY ts DESC;

In this SQL query, the RANGE clause determines the time window for the MAX(cpu) aggregation function, while the ALIGN clause sets the alignment time for the time series data. For more information on time window grouping, please refer to the Aggregate data by time window document.

The similar query in PromQL would be something like:

max_over_time(monitor[1h])

To query time series data from the last 24 hours, you need to execute this PromQL, using the start and end parameters of the HTTP API to define the time range. For more information on PromQL, please refer to the PromQL document.
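
As a sketch of how the start and end parameters fit together, the Python snippet below builds a range-query URL. It assumes GreptimeDB exposes a Prometheus-compatible endpoint at `/v1/prometheus/api/v1/query_range` and accepts a `db` parameter; verify the path against the PromQL document for your version. The function name, host, and timestamps are hypothetical.

```python
import urllib.parse


def promql_range_url(host, query, start, end, step="1h", db="public"):
    """Build a Prometheus-style range query URL (endpoint path is an assumption)."""
    params = urllib.parse.urlencode(
        {"db": db, "query": query, "start": start, "end": end, "step": step}
    )
    return f"http://{host}:4000/v1/prometheus/api/v1/query_range?{params}"


# A 24-hour range expressed as Unix timestamps in seconds (sample values).
end_ts = 1714376400
start_ts = end_ts - 24 * 3600
url = promql_range_url("localhost", "monitor", start_ts, end_ts)
```

Replace the `"monitor"` selector with the PromQL expression you want to evaluate.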

Visualize data

We recommend using Grafana to visualize data in GreptimeDB. Please refer to the Grafana documentation for details on configuring GreptimeDB.

Migrate data

For a seamless migration of data from InfluxDB to GreptimeDB, you can follow these steps:

Double write to GreptimeDB and InfluxDB

  1. Write data to both GreptimeDB and InfluxDB to avoid data loss during migration.
  2. Export all historical data from InfluxDB and import the data into GreptimeDB.
  3. Stop writing data to InfluxDB and remove the InfluxDB server.

Write data to both GreptimeDB and InfluxDB simultaneously

Writing data to both GreptimeDB and InfluxDB simultaneously is a practical strategy to avoid data loss during migration. By utilizing InfluxDB's client libraries, you can set up two client instances - one for GreptimeDB and another for InfluxDB. For guidance on writing data to GreptimeDB using the InfluxDB line protocol, please refer to the write data section.
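
One way to structure the dual write is a small wrapper that forwards every batch to both databases and reports failures per target. The Python sketch below is an assumption about how such a wrapper could look, not an API of either product; the class name `DualWriter` is hypothetical, and the targets here are in-memory stand-ins rather than real clients.

```python
class DualWriter:
    """Fan each batch of line-protocol records out to several write targets."""

    def __init__(self, targets):
        # targets: mapping of name -> callable that accepts a list of lines
        self.targets = dict(targets)

    def write(self, lines):
        """Send lines to every target; return {name: exception} for failures."""
        failures = {}
        for name, send in self.targets.items():
            try:
                send(lines)
            except Exception as exc:
                # Keep going so an outage on one side does not block the other.
                failures[name] = exc
        return failures


# Stand-in targets that only record what they receive (no network access).
influx_log, greptime_log = [], []
writer = DualWriter({"influxdb": influx_log.extend, "greptimedb": greptime_log.extend})
failures = writer.write(
    ["census,location=klamath,scientist=anderson bees=23 1566086400000000000"]
)
```

In a real setup each callable would wrap a client instance configured for the corresponding database, and any entries in `failures` would be logged or retried.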

If retaining all historical data isn't necessary, you can simultaneously write data to both GreptimeDB and InfluxDB for a specific period to accumulate the required recent data. Subsequently, cease writing to InfluxDB and continue exclusively with GreptimeDB. If a complete migration of all historical data is needed, please proceed with the following steps.

Export data from InfluxDB v1 Server

Create a temporary directory to store the exported data of InfluxDB.

mkdir -p /path/to/export

Use the influx_inspect export command of InfluxDB to export data.

influx_inspect export \
-database <db-name> \
-end <end-time> \
-lponly \
-datadir /var/lib/influxdb/data \
-waldir /var/lib/influxdb/wal \
-out /path/to/export/data

  • The -database flag specifies the database to be exported.
  • The -end flag specifies the end time of the data to be exported. It must be in RFC3339 format, such as 2024-01-01T00:00:00Z. You can use the time at which you started writing to both GreptimeDB and InfluxDB as the end time.
  • The -lponly flag specifies that only the Line Protocol data should be exported.
  • The -datadir flag specifies the path to the data directory, as configured in the InfluxDB data settings.
  • The -waldir flag specifies the path to the WAL directory, as configured in the InfluxDB data settings.
  • The -out flag specifies the output file.
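
If you script the export, the end time and argument list can be generated rather than typed by hand. The Python sketch below formats a cutover time as RFC3339 and assembles the same flags listed above; the function names, the database name `mydb`, and the cutover date are hypothetical. It only builds the argument list and does not run the command.

```python
from datetime import datetime, timezone


def rfc3339(moment: datetime) -> str:
    """Format an aware datetime as an RFC3339 UTC timestamp."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def export_v1_command(db, end, out_file,
                      data_dir="/var/lib/influxdb/data",
                      wal_dir="/var/lib/influxdb/wal"):
    """Assemble the influx_inspect export argument list with the flags above."""
    return [
        "influx_inspect", "export",
        "-database", db,
        "-end", rfc3339(end),
        "-lponly",
        "-datadir", data_dir,
        "-waldir", wal_dir,
        "-out", out_file,
    ]


cutover = datetime(2024, 1, 1, tzinfo=timezone.utc)  # hypothetical dual-write start
cmd = export_v1_command("mydb", cutover, "/path/to/export/data")
# Run it on the InfluxDB host with: subprocess.run(cmd, check=True)
```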

The exported data in InfluxDB line protocol looks like the following:

disk,device=disk1s5s1,fstype=apfs,host=bogon,mode=ro,path=/ inodes_used=356810i 1714363350000000000
diskio,host=bogon,name=disk0 iops_in_progress=0i 1714363350000000000
disk,device=disk1s6,fstype=apfs,host=bogon,mode=rw,path=/System/Volumes/Update inodes_used_percent=0.0002391237988702021 1714363350000000000
...

Export data from InfluxDB v2 Server

Create a temporary directory to store the exported data of InfluxDB.

mkdir -p /path/to/export

Use the influxd inspect export-lp command of InfluxDB to export data in the bucket to line protocol.

influxd inspect export-lp \
--bucket-id <bucket-id> \
--engine-path /var/lib/influxdb2/engine/ \
--end <end-time> \
--output-path /path/to/export/data

  • The --bucket-id flag specifies the bucket ID to be exported.
  • The --engine-path flag specifies the path to the engine directory, as configured in the InfluxDB data settings.
  • The --end flag specifies the end time of the data to be exported. It must be in RFC3339 format, such as 2024-01-01T00:00:00Z. You can use the time at which you started writing to both GreptimeDB and InfluxDB as the end time.
  • The --output-path flag specifies the output file.

The output looks like the following:

{"level":"info","ts":1714377321.4795408,"caller":"export_lp/export_lp.go:219","msg":"exporting TSM files","tsm_dir":"/var/lib/influxdb2/engine/data/307013e61d514f3c","file_count":1}
{"level":"info","ts":1714377321.4940555,"caller":"export_lp/export_lp.go:315","msg":"exporting WAL files","wal_dir":"/var/lib/influxdb2/engine/wal/307013e61d514f3c","file_count":1}
{"level":"info","ts":1714377321.4941633,"caller":"export_lp/export_lp.go:204","msg":"export complete"}
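
When the export runs unattended, a script can confirm that it finished by scanning these log records. The Python sketch below assumes the log format shown above (one JSON object per line with a `msg` key); the function name `export_completed` is hypothetical.

```python
import json


def export_completed(log_lines) -> bool:
    """Return True if any JSON log line reports "export complete"."""
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log record
        if isinstance(entry, dict) and entry.get("msg") == "export complete":
            return True
    return False


log = [
    '{"level":"info","ts":1714377321.4795408,"caller":"export_lp/export_lp.go:219",'
    '"msg":"exporting TSM files","tsm_dir":"/var/lib/influxdb2/engine/data/307013e61d514f3c","file_count":1}',
    '{"level":"info","ts":1714377321.4941633,"caller":"export_lp/export_lp.go:204","msg":"export complete"}',
]
done = export_completed(log)
```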

The exported data in InfluxDB line protocol looks like the following:

cpu,cpu=cpu-total,host=bogon usage_idle=80.4448912910468 1714376180000000000
cpu,cpu=cpu-total,host=bogon usage_idle=78.50167052182304 1714376190000000000
cpu,cpu=cpu-total,host=bogon usage_iowait=0 1714375700000000000
cpu,cpu=cpu-total,host=bogon usage_iowait=0 1714375710000000000
...

Import data to GreptimeDB

Before importing data to GreptimeDB, if the data file is too large, it's recommended to split the data file into multiple slices:

split -l 100000 -d -a 10 data data.
# -l [line_count] Create split files line_count lines in length.
# -d Use a numeric suffix instead of an alphabetic suffix.
# -a [suffix_length] Use suffix_length letters to form the suffix of the file name.
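
For environments without the split utility, the same slicing can be sketched in Python. The generator below mirrors the options used above (lines per file, numeric suffix, suffix length) but works on any iterable of lines and leaves writing the files to the caller; the name `iter_slices` is hypothetical.

```python
from itertools import islice


def iter_slices(lines, lines_per_file=100000, prefix="data.", suffix_len=10):
    """Yield (file_name, chunk) pairs, mirroring split -l N -d -a LEN."""
    it = iter(lines)
    index = 0
    while True:
        chunk = list(islice(it, lines_per_file))
        if not chunk:
            break
        # Zero-padded numeric suffix, for example data.0000000000
        yield f"{prefix}{index:0{suffix_len}d}", chunk
        index += 1


slices = list(iter_slices(["a", "b", "c", "d", "e"], lines_per_file=2))
```

Passing an open file object as `lines` streams the export without loading it into memory all at once.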

You can import data using the HTTP API as described in the write data section. The script below reads data from the files and imports it into GreptimeDB.

Suppose you are in the directory where the data files are stored:

.
├── data.0000000000
├── data.0000000001
├── data.0000000002
...

Replace the following placeholders with your GreptimeDB connection information to set up the environment variables:

export GREPTIME_USERNAME=<greptime_username>
export GREPTIME_PASSWORD=<greptime_password>
export GREPTIME_HOST=<host>
export GREPTIME_DB=<db-name>

Import the data from the files into GreptimeDB:

for file in data.*; do
  curl -i --retry 3 \
    -X POST "http://${GREPTIME_HOST}:4000/v1/influxdb/write?db=${GREPTIME_DB}&u=${GREPTIME_USERNAME}&p=${GREPTIME_PASSWORD}" \
    --data-binary "@${file}"
  sleep 1
done
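
For larger imports it can help to record which slices failed so they can be retried later. The Python sketch below shows one possible loop with a bounded number of attempts per file; the function name `import_slices` and its parameters are hypothetical, and the `send` callable is a stand-in for whatever performs the HTTP write.

```python
import time


def import_slices(files, send, attempts=3, pause_s=1.0, sleep=time.sleep):
    """Call send(path) for each file, retrying on failure; return failed paths."""
    failed = []
    for path in files:
        for attempt in range(1, attempts + 1):
            try:
                send(path)
                break
            except Exception:
                if attempt == attempts:
                    failed.append(path)  # give up on this file after the last attempt
        sleep(pause_s)  # brief pause between files, like `sleep 1` in the script
    return failed


# Demonstration with a fake sender that always fails for one file.
calls = []


def fake_send(path):
    calls.append(path)
    if path == "data.0000000001":
        raise OSError("simulated failure")


failed = import_slices(
    ["data.0000000000", "data.0000000001"], fake_send, attempts=2, sleep=lambda _s: None
)
```

In practice `send` would post the file contents to the write endpoint used in the shell script, and the returned list tells you which slices to import again.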