Thinking about affordable, reliable data migrations
I'm currently working with two tools to try and perform full load and ongoing replication onto Redshift.
AWS DMS
- Much less expensive than FlyData as an option - I dropped and recreated all the tables several times last month and that + running the replication cost $64.74 (charges would be lower if dropping and reloading the tables hadn't been needed). There is an additional cost from the replication instance, but it's still nothing compared to FlyData.
- Allows some operations on the data to be imported (e.g. truncating long text fields), which allows us to customize the tables so they might work better for queries
- I'm still doing back and forth with AWS customer support to figure out the issues that are going on. I'd also like to spend a bit more time debugging this with customer support, because
- DMS enables some more transformations on the incoming data -> we can optimize the long text columns
- using DMS doesn't bind us to a new third party service provider (we're already on Amazon)
- we can use DMS for more sources than just the AK database - for example, if we'll want to write data into a different schema from the 'email a target' tool Omar is working on right now, we could just set up a DMS task.
FlyData
- A third party service that replicates using binary logs similarly to DMS
- Seems to just work without any issues with a quick setup
- Would get very expensive due to all the writes we do to core_usermailing - in December, we wrote 75 million rows and in January, 54 million. The plan for 75 million writes per costs $1,163 per month if payed monthly and $874 on an annual plan.
- If we instead used a different tool for migrating core_usermailing (e.g. periodic script / DMS), we'd only have a fraction of the writes as the next most active table, core_open, gets only about 0.15 of the writes that core_usermailing does. Imagining that we would then be at up to 15 million writes per month, we would still be at $398 per month if payed monthly and $200 on an annual plan.
- Doesn't really support transformations on the data to be imported - so we can't really optimize the data types on Redshift to work better for queries (e.g. value fields in core_actionfield and core_userfield will be max length varchar fields)
- This might actually be a deal breaker! Redshift does not perform well with queries over long text columns. We might want to consider whether it makes sense to have an expensive Redshift instance that might not work any better than a well indexed MySQL / Postgres db.
Thinking about affordable, reliable data migrations
I'm currently working with two tools to try and perform full load and ongoing replication onto Redshift.
AWS DMS
FlyData