Exporting Data From GaussDB
Comparison of Export Methods
Export Tool
Export Steps
Applicable Scenarios and Notes
Using GDS Tool to Export Data to a Regular File System
Note: The GDS tool must be installed on the server where the data files are exported
Remote Export Mode: Export business data from the cluster to an external host.
1. Plan the export path, create GDS operation users, and set write permissions for the GDS user on the export path.
2. Install, configure, and start GDS on the server where data will be exported.
3. Create an external table in the cluster, with the location path in the format “gsfs://192.168.0.90:5000/”.
Local Export Mode: Export business data from the cluster to the host where the cluster nodes are located. This strategy is tailored for numerous small files.
1. Plan the export path and create directories to store exported data files on each DN in the cluster, such as “/output_data”, and change the owner of this path to omm.
2. Install, configure, and start GDS on the server where data will be exported.
3. Create an external table in the cluster, with the location path in the format “file:///output_data/”.
GDS tools suitable for scenarios with high concurrency and large data exports. Utilizes multi-DN parallelism to export data from the database to data files, improving overall export performance. Does not support direct export to HDFS file system.
Notes on Remote Export:
1. Supports concurrent export by multiple GDS services, but one GDS can only provide export services for one cluster at a time.
2. Configure GDS services within the same intranet as the cluster nodes. Export speed is affected by network bandwidth. The recommended network configuration is 10GE.
3. Supported data file formats: TEXT, CSV, and FIXED. Single-row data size must be <1GB.
Notes on Local Export:
1. Data will be evenly split and generated in the specified folders on the cluster nodes, occupying disk space on the cluster nodes.
2. Supports data file formats: TEXT, CSV, and FIXED. Single-row data size must be <1GB.
gs_dump and gs_dumpall Tools
gs_dump supports exporting a single database or its objects.
gs_dumpall supports exporting all databases in the cluster or common global objects in each database.
The tools support exporting content at the database level, schema level, and second level. Each level can be separately defined to export the entire content, only object definitions, or only data files.
Step 1: The omm operating system user logs into any host with MPPDB service installed and executes: source $ {BIGDATA_HOME}/mppdb/.mppdb
gs_profile command to start environment variables
Step 2: Use gs_dump to export the postgres database: gs_dump -W Bigdata@123 -U jack -f /home/omm/backup/postgres_backup.tar -p 25308 postgres -F t
1. Export the entire database information, including data and all object definitions.
2. Export the full information of all databases, including each database in the cluster and common global objects (including roles and tablespace information).
3. Export only all object definitions, including tablespace, database definitions, function definitions, schema definitions, table definitions, index definitions, and stored procedure definitions.
4. Export only data, excluding all object definitions.
GDS External Table Remote Export Example
groupadd gdsgrp
useradd -g gdsgrp gds_user
chown -R gds_user:gdsgrp /output_data
/opt/bin/gds/gds -d /output_data -p 192.168.0.90:5000 -H 10.10.0.1/24 -D
CREATE FOREIGN TABLE foreign_tpcds_reasons
(
r_reason_sk integer not null,
r_reason_id char(16) not null,
r_reason_desc char(100)
) SERVER gsmpp_server OPTIONS (LOCATION ‘gsfs://192.168.0.90:5000/’, FORMAT ‘CSV’,ENCODING
‘utf8′,DELIMITER E’x08′, QUOTE E’x1b’, NULL ”) WRITE ONLY;
INSERT INTO foreign_tpcds_reasons SELECT * FROM reasons;
ps -ef|grep gds
gds_user 128954 1 0 15:03 ? 00:00:00 gds -d /output_data -p 192.168.0.90:5000 -D
gds_user 129003 118723 0 15:04 pts/0 00:00:00 grep gds
kill -9 128954