blog/2022/lessons-learned-migrating-large-mysql-databases.md (8 additions, 6 deletions)
@@ -26,7 +26,7 @@ All of these lessons require careful thought, but this one especially so.
```sql
set foreign_key_checks = 0;
-load data from s3 /*...*/;
+/* import data here */;
set foreign_key_checks = 1;
```
@@ -71,12 +71,14 @@ Now imagine your table is hundreds of gigabytes, with a bunch of indexes. In the
## Lesson #3: Split the work up to use the best tool for the data you need to move

-`mysqldump` is great, but not the best approach when you've got a bunch of tables that are hundreds of gigabytes each that need to be moved. Since we're on AWS, and our MySQL databases are actually [Aurora MySQL][aurora], we were able to use some special syntax to [export data to CSV's on S3][export] and then [load it back into MySQL][import] on the new instance:
+`mysqldump` is great, but not the best approach when you've got a bunch of tables that are hundreds of gigabytes each that need to be moved. Since we're on AWS, and our MySQL databases are actually [Aurora MySQL][aurora], we were able to use some special syntax to [export data to CSVs on S3][export] and then [load it back into MySQL][import] on the new instance.
+
+We also heavily used `mysqldump` to handle the migration of the smaller tables. That's why this lesson is labeled "split the work up": sometimes it makes sense to use `mysqldump` and sometimes it doesn't. Anything under 1gb we didn't even consider worth, well, considering. But for the really big tables, here's how we exported them to S3 and then into the new database server:
+
```sql
select * from Fubar into outfile s3 's3-us-east-1://bucket-name/Fubar.csv'
    fields terminated by ','
@@ -107,20 +109,20 @@ set
_Repeat the `load data` step for each 6gb chunk, e.g. `.part_00001`, `.part_00002`, etc._
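For reference, a single per-chunk load into Aurora MySQL might look roughly like the sketch below; the table name, bucket path, and field/line options are illustrative assumptions rather than the post's exact statement.

```sql
-- Sketch: load one exported chunk back into the new instance, then repeat
-- for .part_00001, .part_00002, and so on. Paths and options are assumed.
load data from s3 's3-us-east-1://bucket-name/Fubar.csv.part_00000'
    into table Fubar
    fields terminated by ','
    lines terminated by '\n';
```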
-The approach described above is great for moving really large tables to a new db server instance. You wouldn't want to do this for all tables, unless they're all really large, because it adds a bunch of manual steps. You have to create the tables manually, and you'll want to wait until the data is inserted before adding the indexes so that the indexes don't have to be updated for every record you insert.
+The approach described above is great for moving really large tables to a new db server instance. You wouldn't want to do this for all tables, unless they're all really large, because it adds a bunch of manual steps. When you do it this way, you have to create the tables manually, and you'll want to wait until the data is inserted before adding the indexes so that the indexes don't have to be updated for every record you insert.

-But this approach is better than using `mysqldump` specifically because it allows you to import the data before the indexes are added. I bet there are arguments to `mysqldump` to make it not create indexes or to affect when it adds them, but by default it includes them in the `create table` statements.
+But this approach is better than using `mysqldump` for large tables specifically because it allows you to import the data before the indexes are added. I bet there are arguments to `mysqldump` to make it skip creating indexes or to control when they're added, but by default it includes them in the `create table` statements.
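A minimal sketch of that load-then-index ordering, assuming a hypothetical `Fubar` schema (the columns and index names are made up for illustration):

```sql
-- Sketch only: create the table with just its primary key and defer the
-- secondary indexes until after the bulk load.
create table Fubar (
    id bigint unsigned not null,
    account_id bigint unsigned not null,
    created_at datetime not null,
    payload text,
    primary key (id)
);

-- ... run the `load data from s3` import here ...

-- Now build the secondary indexes once, instead of updating them on every
-- row inserted during the load.
alter table Fubar
    add index idx_fubar_account_id (account_id),
    add index idx_fubar_created_at (created_at);
```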
This approach also gives you the opportunity to:

-1. import a subset of the data
+1. import only a subset of the data (in our case, the most recent data)
2. add the indexes
3. bring the application back online so that users can continue working, and then
4. allow the rest of the data to restore over time, albeit more slowly because of the constant index rebuilding.
This is what we ended up doing.

-After waiting for _more than 6 hours_ for 5 indexes to be added to a single table, we stumbled our way into the idea of importing only the most recent data for each table, then added the indexes. This allowed our app to run in a slightly degraded state, but it was online. Then we queued up the rest of the imports (`load data from s3`) and went to bed. When I woke up in the morning, I checked in on it and the data loads were complete.
+After waiting _more than 6 hours_ for 5 indexes to be added to a single table after importing its entire history, we stumbled our way into the idea of importing only the most recent data for each table and then adding the indexes. This allowed our app to run in a slightly degraded state, but it was online. Then we queued up the rest of the imports (`load data from s3`) and went to bed. When I woke up in the morning, I checked in on it and the data loads were complete.
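To make that sequence concrete, here is a hedged sketch of how the "recent data first, history later" ordering could look in SQL; the cutoff date, `created_at` column, and bucket paths are assumptions, not the post's actual values.

```sql
-- 1. Export and load only the most recent rows first (the cutoff and the
--    -recent/-history prefixes are hypothetical).
select * from Fubar where created_at >= '2022-01-01'
    into outfile s3 's3-us-east-1://bucket-name/Fubar-recent.csv'
    fields terminated by ',';

load data from s3 prefix 's3-us-east-1://bucket-name/Fubar-recent.csv'
    into table Fubar
    fields terminated by ',';

-- 2. Add the indexes while the table holds only recent data (see the
--    `alter table ... add index` sketch above), then bring the app back online.

-- 3. Queue the remaining history (exported separately with the inverse filter)
--    and let it load slowly; every row inserted from here on also updates the
--    indexes.
load data from s3 prefix 's3-us-east-1://bucket-name/Fubar-history.csv'
    into table Fubar
    fields terminated by ',';
```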