Cloud Migration — Part 2

Yaron Edry
Aug 24, 2019

--

In my previous post, I focused on virtual machine migration with Velostrata. The purpose of this post is to explore migration options for other cloud services and DBs from AWS to GCP.

Relational DBs:

DMS (Database Migration Service) is an AWS service that helps you migrate databases quickly and securely. Define the source and destination DB endpoints, then set up an ongoing migration task that runs in the background with zero interference to your live environment.

  1. Go to the GCP dashboard, open the SQL menu, and create a MySQL or PostgreSQL instance. Make sure to choose the region, zone, and network in which your resources reside.
GCP dashboard — SQL service

2. Go to the AWS console and navigate to the DMS dashboard.

3. Create the source and target endpoints and verify they show an “active” status:

4. Go to Database migration tasks and create a job (a CLI sketch of these steps follows below).
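If you prefer scripting these steps, a rough CLI sketch might look like the following. The instance name, tier, endpoint identifiers, server names, and ARNs are placeholders, not values from my setup:

# 1. Create the target Cloud SQL instance on GCP (MySQL in this sketch)
gcloud sql instances create target-mysql \
  --database-version=MYSQL_5_7 \
  --tier=db-n1-standard-2 \
  --region=us-central1

# 2. Register the source endpoint in AWS DMS (repeat with --endpoint-type target
#    for the Cloud SQL instance, reachable over its public or peered IP)
aws dms create-endpoint \
  --endpoint-identifier source-mysql \
  --endpoint-type source \
  --engine-name mysql \
  --server-name source-db.example.com \
  --port 3306 \
  --username admin \
  --password '********'

# 3. Create the ongoing migration task (full load + change data capture)
aws dms create-replication-task \
  --replication-task-identifier aws-to-gcp \
  --source-endpoint-arn arn:aws:dms:region:account:endpoint/source-id \
  --target-endpoint-arn arn:aws:dms:region:account:endpoint/target-id \
  --replication-instance-arn arn:aws:dms:region:account:rep/instance-id \
  --migration-type full-load-and-cdc \
  --table-mappings file://table-mappings.json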

The process itself is pretty fast: 500GB migrated in a matter of 1–2 days. However, it's important to note that after the data transfer completed, our web apps ran into errors because of missing primary key and auto-increment (PK, AI) definitions in the table schemas.

Solution:

  • Since it is really a minor issue, I suggest enforcing these configs on the schemas with a UI:
MySQL Workbench UI — mark PK and AI on your ID field and apply the changes
  • or simply fix it with a query:
ALTER TABLE `table_name` 
CHANGE COLUMN `id` `id` INT(11) NOT NULL AUTO_INCREMENT ,
ADD PRIMARY KEY (`id`);

For any issues or errors:

A. Make sure your firewall rules/security groups enable this transfer.

B. Use CloudWatch for troubleshooting — it's not cheap, but it's very useful.

Load Balancers and Certificates

If you have web apps running on AWS, you probably have LBs and SSL certificates configured under EC2 Load Balancers and in the AWS Certificate Manager:

Now, there is a nice guide here on how to create LBs in GCP, but before you can create a (Google-issued) SSL certificate in GCP, you need to remove the existing certificate on AWS for the specific domain you wish to migrate (in my example above: go to Actions => Delete).

deleting the certificate for xxx.mydomain.com before issuing a new certificate on GCP

Important: attempting to issue a new certificate for your service's subdomain before removing the old one will result in a never-ending “PROVISIONING” status. In general, if the PROVISIONING takes longer than 20 minutes, remove the certificate and try again.

note: the domain should not include “www”; use “subdomain.domain.com”.
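If you prefer the command line over the console, a minimal sketch of issuing the Google-managed certificate and polling its status could look like this (the certificate name and domain are placeholders):

# Create a Google-managed certificate for the subdomain
gcloud compute ssl-certificates create sub-mydomain-cert \
  --domains=sub.mydomain.com

# Poll the provisioning status; it should leave PROVISIONING within ~20 minutes
gcloud compute ssl-certificates describe sub-mydomain-cert \
  --format="get(managed.status)"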

We have thousands of users engaging with this endpoint/subdomain every day. As our goal was always zero downtime and zero interruption to the production environment, this posed an issue: removing the LBs and SSL certificates in AWS and re-issuing them in GCP takes about 15–20 minutes, during which the endpoint is unavailable.

Solutions:

1. Before the move, create an alternative/temporary endpoint that points to the web apps' LB in your newly deployed GCP environment (a DNS sketch follows the diagram below).

Your web application environment should be hybrid at this point to support both endpoints in both clouds:

Hybrid cloud — sub.domain.com and sub-temp.domain.com both available for clients
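If your DNS zone lives in Route 53, a sketch of adding that temporary record could look like this. The hosted-zone ID, record name, and the GCP LB's static IP (203.0.113.10) are all illustrative values:

cat > temp-endpoint.json <<'EOF'
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "temp-sub.domain.com",
      "Type": "A",
      "TTL": 300,
      "ResourceRecords": [{"Value": "203.0.113.10"}]
    }
  }]
}
EOF

aws route53 change-resource-record-sets \
  --hosted-zone-id <your-hosted-zone-id> \
  --change-batch file://temp-endpoint.json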

The same LB can be used with several certificates, so you can create a new certificate for temp-sub.domain.com and temporarily update your clients to use this URL (a gcloud sketch of attaching both certificates to the LB follows below). If you do this, make sure you update your server with the new subdomain. For example, if you use Nginx:

updating the Nginx conf.d file with the new temporary endpoint temp-sub.domain.com
LBs configuration on GCP — multiple certificates for the same Load Balancer
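Attaching the temporary certificate alongside the production one can also be done from the command line; a sketch with assumed proxy and certificate names:

gcloud compute target-https-proxies update my-https-proxy \
  --ssl-certificates=sub-mydomain-cert,temp-sub-mydomain-cert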

2. If you did step 1 you are pretty much covered and you can already remove the certificate in AWS without disturbing your live environment. If you didn't, I would advise notifying your clients of one hour of unavailability for maintenance — this way you have about 15 minutes for the certificate issuing in GCP and more time to deploy your code with the new configurations and handle unexpected issues that may arise.

3. Make sure no resources from your old AWS environment are still in use.

Non-relational DBs:

  1. MongoDB — if you are working with MongoDB in a replica-set setup, I suggest migrating your MongoDB instances with Velostrata and then configuring these instances in GCP as secondaries. Simply execute within a mongo shell:
rs.add("new_mongo_ip_or_dns_gcp.com:27017")

2. After you verify a healthy sync, you can force one of the replicas to become primary (see the docs here).

rs.status() // => see which is the index of your preferred node
cfg = rs.conf()
cfg.members[0].priority = 0.5
cfg.members[1].priority = 0.5
cfg.members[2].priority = 1
rs.reconfig(cfg) // => member[2] will become primary immediately

3. Once rs.status() looks okay, you can remove the replica-set members in the old AWS environment:

rs.remove("your.aws.node.com:27017")
the connectivity JSON from rs.status() should look healthy

An issue you may encounter: not reachable member

"stateStr" : "(not reachable/healthy)",

Solution: go to VPC network => Firewall rules (GCP) and Security Groups (AWS), and allow the relevant IPs and port 27017 in both directions.
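For reference, a sketch of those rules from the command line, with placeholder network name, IPs, and security-group ID:

# GCP side: allow MongoDB traffic from the AWS replica-set members
gcloud compute firewall-rules create allow-mongodb-from-aws \
  --network=default --direction=INGRESS --action=ALLOW \
  --rules=tcp:27017 --source-ranges=<aws-node-ip>/32

# AWS side: open 27017 to the GCP nodes in the relevant security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 27017 \
  --cidr <gcp-node-ip>/32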

ElastiCache

If you are using a cache, my best suggestion is to rebuild it in the new environment. A hybrid-cloud setup, as in the “Load Balancers and Certificates” section above, is the best way to go: you can build the cache in the new environment while the old environment is still operational and working with the cache on AWS. Then, after the cache is built, you simply deploy your code with the new cache configs and remove the ElastiCache cluster from AWS. (More about Memorystore here.)
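A minimal sketch of standing up the replacement Memorystore (Redis) instance from the CLI, with an assumed name, size, and region:

# Create the Memorystore instance in the same region as your GCP workloads
gcloud redis instances create app-cache \
  --size=1 --region=us-central1 --redis-version=redis_4_0

# Fetch the host and port to put in the new cache configs
gcloud redis instances describe app-cache --region=us-central1 \
  --format="get(host,port)"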

Apache Solr

Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene™

Today it is really common to have a synced Solr/Elasticsearch cluster in addition to other SQL/NoSQL DBs. More and more apps require full-text search, and you are likely to find these configurations in many applications:

Built on top of Lucene

Exciting news: full-text search will soon be natively built into MongoDB, integrating directly with Lucene (I will not get into that; click here if you want to hear more about it). Until then, an additional search engine synced with your SQL/NoSQL DBs is pretty much inevitable.

Solr is used for many tasks, performs fast, and is easily maintained. As part of a feature to support “suggestions” for a query term (read about Solr's SuggestComponent), an opportunity presented itself: the cloud migration of Solr should also include a version upgrade.

Trust me when I say that upgrading Solr from version 4.x.x to 7.x.x is not something that can be done with a simple bash command.

To start with I see two options:

  1. Deploy a Linux-based machine and install Java/Tomcat/Solr — there are many step-by-step guides for that.
  2. Deploy Solr directly from the GCP Marketplace; the available stable version at the time of writing is 7.7.2.

Now we can start configuring our newly upgraded Solr on GCP and be up and running in no time:

A. security.json: Update your security.json:

cd /var/solr/data
sudo vim security.json

replace the content of the security.json file with this:

{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "solr": "IV0EHq1OnNrj6gvRCwvFwTrZ1+z1oBbnQdiVC3otuq0= Ndd7LKvVBAaZIF0QAVi1ekCfAJXr1GGfLtRUXhgrF8c="
    }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": {"solr": "admin"},
    "permissions": [{"name": "security-edit", "role": "admin"}]
  }
}

This creates the user ‘solr’ with the password ‘SolrRocks’. After that, you can add users or edit passwords with the set-user command (more about Solr's Authentication API here).
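For example, adding another user afterwards goes through the Authentication API; the username and password below are obviously placeholders:

curl --user solr:SolrRocks http://localhost:8983/solr/admin/authentication \
  -H 'Content-type:application/json' \
  -d '{"set-user": {"new_admin_user": "a-strong-password"}}'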

B. Log in: log in to the Solr UI and browse through it to verify that your access is granted.

C. Additional disk: at this point, I suggest you add a secondary (non-boot) zonal persistent disk for your data directory. Create the disk from the GCP dashboard (VM instances); formatting and mounting it is very straightforward and explained perfectly in the GCP docs (see here).
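A condensed sketch of those docs, assuming the new disk shows up as /dev/sdb (check with lsblk) and that you want the Solr data directory on it:

# Format and mount the secondary persistent disk
sudo mkfs.ext4 -m 0 -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb
sudo mkdir -p /mnt/disks/solr-data
sudo mount -o discard,defaults /dev/sdb /mnt/disks/solr-data

# Point the Solr data directory at the mounted disk (move the old one aside first)
# and make sure the solr user, not root, owns it
sudo mv /var/solr/data /var/solr/data.old
sudo ln -s /mnt/disks/solr-data /var/solr/data
sudo chown -R solr:solr /mnt/disks/solr-data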

important: make sure you have created the symbolic link to the mounted disk and that it is owned by ‘solr’ and not ‘root’:

permissions: ‘solr’ user

to set the permissions, adjust the path to the file/directory and execute:

sudo chown -R solr:solr path

D. Cores: Create the Cores:

cd /opt/solr/bin/
sudo -H -u solr bash solr create -c core1
sudo -H -u solr bash solr create -c core2
...
...
...

E. Rebuild: now you can add fields to the schemas easily with Solr's Schema UI. You can look at your old core's schema on one side and rebuild it on the other.

I highly recommend not copying your old schema. Stretching backwards compatibility across one or two Solr versions is acceptable, but beyond that you are likely to spend more time fixing errors in your schemas and configs than it would take to rebuild them from scratch.

Adding fields/dynamic fields/copy fields from the UI

Problem: after creating a field, verify that it was inserted with the correct properties. Common bugs in the Solr UI cause fields to be inserted with more properties than what you've box-checked.

Solution: mark the checkbox twice, or insert the field directly into the managed-schema file via SSH. After each insert, go to ‘Core Admin’ and reload the core; make sure no error occurs and continue setting the schema fields.
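Inserting a field through the Schema API is another workaround that avoids the UI entirely; the core name and field definition here are just examples:

curl -u solr:SolrRocks -X POST http://localhost:8983/solr/core1/schema \
  -H 'Content-type:application/json' \
  --data-binary '{"add-field": {"name": "title", "type": "text_general", "indexed": true, "stored": true}}'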

F. Indexing: although this part comes last, it might take the longest, depending on how many documents are contained in your Solr cores. If inserting the documents into your index is something your business logic already does daily/weekly, then you are in luck: simply run your code and you are done. Otherwise, create a process that reads from the old AWS Solr and writes to the new GCP Solr. Another option is Solr's Data Import Handler (see here); I'm not optimistic, but you are welcome to try.
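If you go the read-and-write route, a rough sketch of copying one batch of documents between cores with curl and jq might look like this. The host names, core name, and field list are assumptions, and for large cores you would page through the results (e.g. with cursorMark) instead of a single select:

OLD="http://old-aws-solr:8983/solr/core1"
NEW="http://new-gcp-solr:8983/solr/core1"

# Pull a batch of documents from the old core
curl -s "$OLD/select?q=*:*&wt=json&rows=1000&fl=id,title,body" \
  | jq '.response.docs' > batch.json

# Push them into the new (authenticated) core and commit
curl -u solr:SolrRocks "$NEW/update?commit=true" \
  -H 'Content-type:application/json' --data-binary @batch.json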

For me, reindexing ~20 million documents into the new Solr took about 2.5 days.

Summary

In this post (Part 2), we've covered additional services and DBs that needed to be migrated from AWS to GCP.

I think that minimal (zero) interference to production can be accomplished if the migration is planned correctly, and the key is a ‘hybrid cloud’ — gradually relocate resources while enabling connectivity between resources in both environments. Although it may require more work, coding, and deployments, there is no impact on live production end clients.

Feel free to share your experience migrating these or other resources between clouds, or ask any questions.

Happy Migrating
