千万级数据拆分与迁移

​ 将数据从一张表中迁移到另一张表中,在数据量少的情况下很容易实现,而且方法也有很多,但当数据量到了百万、千万级时,不同方法在效率上会有很大差别。

业务场景

项目在运行时会存储实时数据,主表每秒插入2,3条数据,一条主数据对应10条左右明细数据,现在数据库中,主表1千万条,明细8千万;需求是要根据主表中的type字段进行分类,将主表和明细表拆再拆出2张表。

实现

项目在运行,实时会有数据存入,所以分为3个阶段;原表alarm为主表,alarm_detail为明细表;

1、将数据按类型从原表中导出为文件,再导入到对应的新建表中

2、项目暂停,执行替换sql,更新代码,重启项目

3、将在导入导出期间产生的数据再次按类型插入到新表中

一、项目正常运行,代码未更新

复制alarm 与 alarm_detail 表

1
2
CREATE TABLE alarm_copy1 LIKE alarm;
CREATE TABLE alarm_detail_copy1 LIKE alarm_detail;

新建alarm_2表与明细表alarm_2_detail

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
CREATE TABLE `alarm_2`  (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`alarmId` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '告警Id',
`type` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE,
UNIQUE INDEX `alarmId`(`alarmId`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 1 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci COMMENT = '主表' ROW_FORMAT = Dynamic;

CREATE TABLE `alarm_2_detail` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`iotId` bigint(20) NOT NULL,
`paramValue` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE,
INDEX `iotId`(`iotId`) USING BTREE,
CONSTRAINT `fk_iotId` FOREIGN KEY (`iotId`) REFERENCES `alarm_2` (`id`) ON DELETE CASCADE
) ENGINE = InnoDB AUTO_INCREMENT = 0 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci COMMENT = '明细表' ROW_FORMAT = Compact;

获取原表主键最大id

1
SELECT @maxId:=MAX(id) FROM alarm;

导出出A类型主数据,再导入新表

FIELDS TERMINATED BY ','表示字段间用,分割

OPTIONALLY ENCLOSED BY '"'表示所有字符用"包裹

ESCAPED BY '*'表示用*代表转义字符

LINES TERMINATED BY '\r\n'表示每条数据用空格加回车结束

1
2
3
SELECT * FROM alarm WHERE id<@maxId AND type='A' INTO OUTFILE 'alarm_2.txt' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '*' LINES TERMINATED BY '\r\n';
#导入
LOAD DATA INFILE 'alarm_2.txt' INTO TABLE alarm_2 FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '*' LINES TERMINATED BY '\r\n';

导出出A类型明细数据,再导入新表

1
2
SELECT ad.* FROM alarm_detail ad WHERE EXISTS (SELECT iotId FROM alarm a WHERE ad.iotId=a.id) INTO OUTFILE 'alarm_2_detail.txt' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '*' LINES TERMINATED BY '\r\n';
LOAD DATA INFILE 'alarm_2_detail.txt' INTO TABLE alarm_2_detail FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '*' LINES TERMINATED BY '\r\n';

导出出B类型主数据,再导入copy表

1
2
SELECT * FROM alarm WHERE id<@maxId AND type='A' INTO OUTFILE 'alarm_copy1.txt' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '*' LINES TERMINATED BY '\r\n';
LOAD DATA INFILE 'alarm_copy1.txt' INTO TABLE alarm_copy1 FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '*' LINES TERMINATED BY '\r\n';

导出出B类型明细数据,再导入新表

1
2
SELECT ad.* FROM alarm_detail ad WHERE EXISTS (SELECT iotId FROM alarm_copy1 ac WHERE ac.iotId=ad.id) INTO OUTFILE 'alarm_detail_copy1.txt' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '*' LINES TERMINATED BY '\r\n';
LOAD DATA INFILE 'alarm_detail_copy1.txt' INTO TABLE alarm_detail_copy1 FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '*' LINES TERMINATED BY '\r\n';

二、项目暂停、更新代码

项目需要暂停几分钟,修改表名并更新代码

修改表名称

将*_copy1表名与原表互换

1
2
3
4
5
6
ALTER TABLE alarm RENAME TO alarm_temp;
ALTER TABLE alarm_detail RENAME TO alarm_detail_temp;
ALTER TABLE alarm_copy1 RENAME TO alarm;
ALTER TABLE alarm_detail_copy1 RENAME TO alarm_detail;
ALTER TABLE alarm_temp RENAME TO alarm_copy1;
ALTER TABLE alarm_detail_temp RENAME TO alarm_detail_copy1;

修改自增数

为了保证在新数据插入新表后不会与暂未拆分数据主键冲突,需要将所有表的自增值改为原表自增数大小,

1
2
3
4
5
6
7
8
9
10
11
#创建存储过程修改自增值
DROP PROCEDURE IF EXISTS copy_auto_increment;
CREATE PROCEDURE `copy_auto_increment`(IN `sourceTable` varchar(50),IN `targetTable` varchar(50))
BEGIN
DECLARE num BIGINT;
DECLARE alterSql VARCHAR(250);
SELECT AUTO_INCREMENT INTO num FROM information_schema.tables WHERE table_name = sourceTable AND table_schema = DATABASE();
SET num=num+100;
SET alterSql=CONCAT('ALTER TABLE ',targetTable,' AUTO_INCREMENT=',num);
EXECUTE alterSql;
END;

修改alarmalarm_detailalarm_2alarm_2_detail的自增数(大于原表目前的自增数值即可)

1
2
3
4
5
#调用
CALL copy_auto_increment('iot_alarm_copy1','iot_alarm');
CALL copy_auto_increment('iot_alarm_copy1','iot_alarm_fault');
CALL copy_auto_increment('iot_alarm_detail_copy1','iot_alarm_detail');
CALL copy_auto_increment('iot_alarm_detail_copy1','iot_alarm_detail_fault');

查询已拆分数据的最大id,并记录

1
2
3
4
SELECT @maxId:=MAX(a.id) MAXId FROM(
SELECT MAX(id) id FROM iot_alarm
UNION
SELECT MAX(id) id FROM iot_alarm_fault)a;

三、代码更新完成,项目启动

现在在*_copy1表中还有未拆分的数据,现在再进行拆分

创建拆分存储过程

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
DROP PROCEDURE IF EXISTS post_alarm_split;
CREATE PROCEDURE `post_alarm_split`(IN `maxId` bigint)
BEGIN
-- 定义接收游标数据的变量
DECLARE done BOOLEAN DEFAULT 0;
DECLARE var_id BIGINT DEFAULT 0;
DECLARE var_alarmId VARCHAR(50);
DECLARE var_type VARCHAR(50);

DECLARE cur_alarm CURSOR FOR (SELECT id,alarmId,type FROM iot_alarm_copy1 WHERE id>=maxId ORDER BY id);
#退出标志
DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done = 1 ;
OPEN cur_alarm;

REPEAT
FETCH cur_alarm INTO var_id,var_alarmId,var_type;
IF NOT done THEN
#A类
IF var_type='A' THEN
INSERT INTO alarm_2 VALUES(var_id,var_alarmId,var_type);
INSERT INTO alarm_2_detail (SELECT id,iotId,paramValue FROM alarm_detail_copy1 WHERE iotId = var_id);
#B类
ELSE
INSERT INTO alarm VALUES(var_id,var_alarmId,var_type);
INSERT INTO alarm_detail (SELECT id,iotId,paramValue FROM alarm_detail_copy1 WHERE iotId = var_id);
END IF;
END IF;
UNTIL done END REPEAT;
CLOSE cur_alarm;
END;

执行拆分

@maxId为之前已拆分过数据的最大主键id+1

1
2
#后置拆分
CALL post_alarm_split(@maxId);

总结

1、这种拆分方式适合主键id自增的情况

2、项目重启后的拆分也可以使用导入导出的方式拆,重点是记录好已拆分的id

3、利用导出导入的方式可以大大提高数据迁移的效率

4、如果预计某些表数据量会很大时,可以考虑在表创建时就进行分区、或分表

备注

导出时生成的文件在mysql的Data目录下,可以删除,节省服务器存储空间