
Commit 704b5d3

copy edits:
1 parent 30dbd7c commit 704b5d3


blog/2022/lessons-learned-migrating-large-mysql-databases.md

Lines changed: 8 additions & 6 deletions
@@ -26,7 +26,7 @@ All of these lessons require careful thought, but this one especially so.

```sql
set foreign_key_checks = 0;
-load data from s3 /* ... */;
+/* import data here */;
set foreign_key_checks = 1
```

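An aside for readers of the hunk above, since it only shows the toggle itself: the sketch below fills in how that pattern is typically used around a bulk load. It is an illustration, not part of the commit or of the post; the session scoping, the follow-up integrity check, and the `ChildTable`/`ParentTable` names are assumptions.

```sql
-- Illustrative sketch, not part of the commit: disable FK checks only for this
-- session so other connections keep enforcement while the bulk import runs.
set session foreign_key_checks = 0;

/* import data here, e.g. a bulk load or a restored dump */

set session foreign_key_checks = 1;

-- Rows loaded while checks are off are never validated, so a quick query for
-- orphaned child rows is a cheap sanity check (table and column names hypothetical).
select count(*)
from ChildTable c
    left join ParentTable p on p.id = c.parent_id
where p.id is null;
```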
@@ -71,12 +71,14 @@ Now imagine your table is hundreds of gigabytes, with a bunch of indexes. In the

## Lesson #3: Split the work up to use the best tool for the data you need to move

-`mysqldump` is great, but not the best approach when you've got a bunch of tables that are hundreds of gigabytes each that need to be moved. Since we're on AWS, and our MySQL databases are actually [Aurora MySQL][aurora], we were able to use some special syntax to [export data to CSV's on S3][export] and then [load it back into MySQL][import] on the new instance:
+`mysqldump` is great, but not the best approach when you've got a bunch of tables that are hundreds of gigabytes each that need to be moved. Since we're on AWS, and our MySQL databases are actually [Aurora MySQL][aurora], we were able to use some special syntax to [export data to CSV's on S3][export] and then [load it back into MySQL][import] on the new instance.

[aurora]: https://aws.amazon.com/rds/aurora/
[export]: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.SaveIntoS3.html
[import]: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraMySQL.Integrating.LoadFromS3.html

+We also heavily used `mysqldump` to handle the migration of smaller tables. That's why this lesson is labeled "split the work up;" because sometimes it makes sense to use `mysqldump` and sometimes it doesn't. Anything under 1gb we especially didn't even consider worth ... considering. But for those really big tables, here's how we exported them to S3 and then into the new database server:
+
```sql
select * from Fubar into outfile s3 's3-us-east-1://bucket-name/Fubar.csv'
fields terminated by ','
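A hedged aside on the "split the work up" rule of thumb in the added paragraph above: the query below is one way to see which tables clear the roughly-1gb bar for the S3 route and which can stay on `mysqldump`. It is an illustration, not part of the commit; the schema name is a placeholder and `information_schema` sizes are estimates.

```sql
-- Illustrative sketch, not part of the commit: rough per-table sizes, so the big
-- tables get the S3 export/import path and the small ones go through mysqldump.
select table_name,
       round((data_length + index_length) / 1024 / 1024 / 1024, 2) as approx_size_gb
from information_schema.tables
where table_schema = 'your_database'  -- placeholder schema name
order by approx_size_gb desc;
```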
@@ -107,20 +109,20 @@ set

_Repeat the `load data` step for each 6gb chunk, e.g. `.part_00001`, `.part_00002`, etc._

-The approach described above is great for moving really large tables to a new db server instance. You wouldn't want to do this for all tables, unless they're all really large, because it adds a bunch of manual steps. You have to create the tables manually, and you'll want to wait until the data is inserted before adding the indexes so that the indexes don't have to be updated for every record you insert.
+The approach described above is great for moving really large tables to a new db server instance. You wouldn't want to do this for all tables, unless they're all really large, because it adds a bunch of manual steps. When you do it this way, you have to create the tables manually and you'll want to wait until the data is inserted before adding the indexes so that the indexes don't have to be updated for every record you insert.

-But this approach is better than using `mysqldump` specifically because it allows you to import the data before the indexes are added. I bet there are arguments to `mysqldump` to make it not create indexes or to affect when it adds them, but by default it includes them in the `create table` statements.
+But this approach is better than using `mysqldump` for large tables specifically because it allows you to import the data before the indexes are added. I bet there are arguments to `mysqldump` to make it not create indexes or to affect when it adds them, but by default it includes them in the `create table` statements.

This approach also gives you the opportunity to:

-1. import a subset of the data
+1. import only a subset of the data (in our case, the most recent data)
2. add the indexes
3. bring the application back online so that users can continue working, and then
4. allow the rest of the data to restore over time, albeit more slowly because of the constant index rebuilding.

This is what we ended up doing.

-After waiting for _more than 6 hours_ for 5 indexes to be added to a single table, we stumbled our way into the idea of importing only the most recent data for each table, then added the indexes. This allowed our app to run in a slightly degraded state, but it was online. Then we queued up the rest of the imports (`load data from s3`) and went to bed. When I woke up in the morning, I checked in on it and the data loads were complete.
+After waiting for _more than 6 hours_ for 5 indexes to be added to a single table after importing its entire history, we stumbled our way into the idea of importing only the most recent data for each table, then adding the indexes. This allowed our app to run in a slightly degraded state, but it was online. Then we queued up the rest of the imports (`load data from s3`) and went to bed. When I woke up in the morning, I checked in on it and the data loads were complete.

This approach also requires some special care.

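For readers following the numbered list and the "most recent data first" paragraph in the hunk above, here is a hedged sketch of that sequence using Aurora MySQL's `LOAD DATA FROM S3`. It is an illustration, not the post's actual statements; the table, column, index, and bucket names are hypothetical, and it assumes the export was split into recent and historical objects.

```sql
-- Illustrative sketch, not part of the commit: bring the app back sooner by loading
-- recent data, indexing, then backfilling history. All names below are hypothetical.

-- 1. Load only the most recent export so the table is usable quickly.
load data from s3 's3-us-east-1://bucket-name/Fubar-recent.part_00000'
    into table Fubar
    fields terminated by ','
    lines terminated by '\n';

-- 2. Add the indexes while the table is still comparatively small.
alter table Fubar
    add index idx_fubar_created_at (created_at),
    add index idx_fubar_account_id (account_id);

-- 3. Backfill the historical chunks (repeating per .part_NNNNN file, as the post
--    describes); these loads run slower because every row now updates the indexes.
load data from s3 's3-us-east-1://bucket-name/Fubar-history.part_00000'
    into table Fubar
    fields terminated by ','
    lines terminated by '\n';
```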