
Migrating from Shipyard to Orchestra

Everything you need to know to ensure minimal disruption

9 min read · Jul 18, 2024


At Orchestra, our mission is to make data engineers’ lives easier, so we have put together this short but detailed technical migration guide for Shipyard users to migrate their workflows to Orchestra, the Unified Control Plane for Data Pipelines.

To find out more about Orchestra, speak to us or start your migration here for free.

Introduction

This is a technical guide for Shipyard users who are considering migrating to Orchestra. In this guide, we’ll show how to migrate a simple flow and provide additional resources for more low-level tasks.

High-Level Concepts and Definitions

  • In Shipyard, a Fleet corresponds to a Data Pipeline in Orchestra. A Pipeline is a DAG.
  • In Shipyard, a Vessel is a container for a Task in Orchestra: the atomic unit in a DAG. Whereas in Shipyard you may require multiple “Blueprints” to achieve something, in Orchestra you can do this with a single Task.
  • Triggers in Shipyard and Orchestra have the same meaning.
  • Pipelines are built in Shipyard’s UI; Orchestra also has a UI for this.

Technical Migration Steps

  1. Download the YAML representation for your Data Pipelines or Shipyard Fleets. This can be found in the .yaml editor here. If your code is located in a git repository, saving this down will suffice.
The yaml editor in Shipyard

Normally, this would only have been possible via the API. We understand API access has now been enabled for all accounts. With the API, you can export Fleet YAML configurations, giving you a record of any workflows you have built in Shipyard and their underlying definitions. For more information, see the guide in Appendix 1 at the end of this article.
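
If you prefer to script this step, a minimal sketch using Python’s requests library is below. Note that the base URL, endpoint path and header name here are illustrative assumptions, not Shipyard’s documented contract; check the Shipyard API docs referenced in Appendix 1 for the exact routes.

import os
import requests

# Assumed values -- replace with the routes from Shipyard's API docs
BASE_URL = "https://api.app.shipyardapp.com"
API_KEY = os.getenv("SHIPYARD_API_KEY")

def export_fleet_yaml(org_id: str, project_id: str, fleet_id: str) -> str:
    # Hypothetical endpoint returning a single Fleet's YAML definition
    url = f"{BASE_URL}/orgs/{org_id}/projects/{project_id}/fleets/{fleet_id}/yaml"
    response = requests.get(url, headers={"X-Shipyard-API-Key": API_KEY})
    response.raise_for_status()
    return response.text

# Save the definition locally (identifiers are placeholders)
with open("my_fleet.yaml", "w") as f:
    f.write(export_fleet_yaml("my_org", "my_project", "my_fleet"))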

2. Download any code you are currently using to run in Shipyard. Group it into the following categories: Python, dbt-core, Other.

  3. Create a list of the Integrations or Blueprints you currently use. An example would be:

Python, dbt-core, Fivetran, Airbyte Server, Tableau, Snowflake, etc.

You then have two options:

Option 1: Managed implementation

Managed implementation will be offered to Shipyard customers free of charge. Set up a call with the team, and we will assist in creating pipelines from your existing code in steps (1–3) and ensure you are able to run your pipelines in Orchestra.

Option 2: Self-service implementation

See more below.

Self-service implementation

  1. Identify important Pipelines: you should identify the most mission-critical pipelines ahead of any migration efforts. Make a note of the Integrations required to facilitate these.
  2. Identify a pipeline candidate: choose a pipeline to build in Orchestra, and ensure Orchestra supports the relevant Integrations. These can be found here and within the portal under “Integrations”.
  3. Create additional data resources: to avoid disruption, it is imperative to adopt an “expand and contract” approach. This means creating a new staging data environment in which to run data pipelines in Orchestra while migrating from Shipyard.

Example: Fivetran, dbt-core and Snowflake

Firstly, create a new database and schema in Snowflake. This will be used for “raw” data or data landing into Snowflake.

-- Create an isolated environment for data landing during the migration
CREATE DATABASE my_new_database;
USE DATABASE my_new_database;
CREATE SCHEMA my_new_schema;
-- A dedicated warehouse keeps migration workloads separate
CREATE WAREHOUSE my_new_warehouse;

Next, reconfigure your dbt profiles.yml file and add the resources from before.

my_dbt_project:
  target: staging_new  # default to the new staging output created for the migration
  outputs:
    staging_new:
      type: snowflake
      account: <your_snowflake_account>
      user: <your_username>
      password: <your_password>
      role: <your_role>
      database: my_new_database
      warehouse: my_new_warehouse
      schema: my_new_schema
      threads: 4
      client_session_keep_alive: False

If using dbt Cloud, you will need to set up a new Project with new credentials, or override the connection configurations in dbt Cloud.

Configuring connections to a project in dbt Cloud

You should then create a new connection in Fivetran (though this could be any ingestion tool, such as Airbyte, Portable, etc.). This connection should land data for your data pipeline in the schema you created previously. This will require you to create a new Snowflake destination as well.

Creating a new Snowflake Destination in Fivetran

Well done. You have completed the bare minimum set-up required to migrate a simple Shipyard Fleet to an Orchestra Pipeline that leverages Fivetran, dbt-core/cloud and Snowflake.

4. Set up Connections: in order to work with pieces of cloud infrastructure, you will need to set up connections to the Integrations from Step (3). Ensure you have the credentials handy. An example of the information you might need for a Snowflake connection is below (key-pair auth is also supported):

import os

# Cloud storage details -- only needed if your pipeline also stages files in object storage
STORAGE_ACCOUNT_URL = os.getenv("STORAGE_ACCOUNT_URL")
STORAGE_ACCOUNT_KEY = os.getenv("STORAGE_ACCOUNT_KEY")
CONTAINER_NAME = os.getenv("CONTAINER_NAME")

# Snowflake connection details
SNOWFLAKE_USER = os.getenv("SNOWFLAKE_USER")
SNOWFLAKE_PASSWORD = os.getenv("SNOWFLAKE_PASSWORD")
SNOWFLAKE_ACCOUNT = os.getenv("SNOWFLAKE_ACCOUNT")
SNOWFLAKE_DATABASE = os.getenv("SNOWFLAKE_DATABASE")
SNOWFLAKE_SCHEMA = os.getenv("SNOWFLAKE_SCHEMA")
SNOWFLAKE_WAREHOUSE = os.getenv("SNOWFLAKE_WAREHOUSE")
SNOWFLAKE_ROLE = os.getenv("SNOWFLAKE_ROLE")
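
Before entering these into Orchestra, it can be worth sanity-checking the credentials locally, for example with the snowflake-connector-python package. This is just a verification sketch using the same environment variables as above:

import os
import snowflake.connector

# Open a session with the same details Orchestra will use
conn = snowflake.connector.connect(
    user=os.getenv("SNOWFLAKE_USER"),
    password=os.getenv("SNOWFLAKE_PASSWORD"),
    account=os.getenv("SNOWFLAKE_ACCOUNT"),
    warehouse=os.getenv("SNOWFLAKE_WAREHOUSE"),
    database=os.getenv("SNOWFLAKE_DATABASE"),
    schema=os.getenv("SNOWFLAKE_SCHEMA"),
    role=os.getenv("SNOWFLAKE_ROLE"),
)
try:
    cur = conn.cursor()
    cur.execute("SELECT CURRENT_WAREHOUSE(), CURRENT_ROLE()")
    print(cur.fetchone())  # confirms the session context is valid
finally:
    conn.close()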

5. Build Pipelines: to build pipelines, head to the Pipeline Builder. It is intuitive, easy to use, and similar to Shipyard’s. You can find more information in the docs.

6. Manually run a pipeline: you should manually run a pipeline before placing it on a schedule.

7. Cross-reference data: you should cross-reference the output data in the new tables, schemas and data assets produced by Orchestra against your existing data assets that are still being updated by Shipyard.
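
A simple way to do this is to compare row counts (and spot-check aggregates) between the old and new tables. The sketch below reuses the Snowflake connection from step 4; the database, schema and table names are illustrative:

# Compare the Shipyard-maintained table with its Orchestra-produced counterpart
def row_count(cursor, table_fqn: str) -> int:
    cursor.execute(f"SELECT COUNT(*) FROM {table_fqn}")
    return cursor.fetchone()[0]

cur = conn.cursor()  # `conn` is the connection opened in step 4
shipyard_count = row_count(cur, "old_database.old_schema.orders")
orchestra_count = row_count(cur, "my_new_database.my_new_schema.orders")
print(f"shipyard={shipyard_count} orchestra={orchestra_count} "
      f"match={shipyard_count == orchestra_count}")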

8. Schedule the pipeline: add a schedule or other trigger to your pipeline and ensure it runs smoothly and as expected.

9. Repeat: repeat steps 5–8 for your remaining pipelines.

Good to know

Task parameters: you will see that task parameters such as retries and timeouts are not exposed as options in Orchestra. While we support configurability, we have already optimised these parameters at the task level so users do not have to tweak them manually.

Alerting: alerting should be added to Orchestra Pipelines in the pipeline builder. More information can be found here.

Adding alerting to Orchestra

Lineage: Orchestra displays granular end-to-end lineage for every DAG run. You should confirm this is correct for your data pipeline runs. To check this, simply head to the main page and click on the Pipeline.

How to view the lineage for the pipeline’s latest run
Coalesce end-to-end lineage

Non-Technical Migration Steps

  1. Establish a timeline for migration: this is crucial. Without a timeline, data teams risk having to maintain multiple systems, which creates pain.
  2. Ensure “expand and contract” is feasible: we recommend expanding and contracting as this is by far the most robust way to ensure minimal disruption to existing processes. If you need a faster implementation, please get in touch.
  3. Evaluate technical gaps: Orchestra works well in a modular environment, and is not responsible for executing any Python or dbt-core code. This means that if you are currently using Shipyard for these purposes, you will need to identify these technical gaps (a minimal wrapper sketch follows this list).
    For Python, Orchestra supports you running this on cloud infrastructure like AWS EC2/ECS, Azure Virtual Machines, GCP, Databricks, Prefect and Snowpark.
    For dbt-core, Orchestra supports you running this on cloud infrastructure like AWS EC2/ECS and Azure Virtual Machines.
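
As promised above, here is a minimal wrapper sketch for the dbt-core case: a short script that could run on an EC2 instance or Azure VM and be triggered as a Task. The target name and profiles directory assume the profiles.yml shown earlier in this guide:

import subprocess

# Run dbt-core against the staging_new target defined in profiles.yml
result = subprocess.run(
    ["dbt", "run", "--target", "staging_new", "--profiles-dir", "."],
    capture_output=True,
    text=True,
)
print(result.stdout)
result.check_returncode()  # raise if any model failed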

Conclusion

Migrating from Shipyard to Orchestra is straightforward. As the best-in-class unified control plane for data operations, Orchestra makes the experience of building Data Pipelines even more seamless, thanks to higher-level abstractions and more out-of-the-box features such as lineage, alerting, data quality monitoring and the Observability UI.

It will be important for organisations to consider how existing data SLAs may be impacted by migration. Where SLAs must be adhered to, it is imperative to adopt an “expand and contract” pattern under a decided timeline to ensure minimal disruption to data teams.

📜 To learn more about Orchestra, visit our website

🚀 For accelerated implementation, get in touch.

💪 To get started yourself today, get on to the portal here.

📗 Find our docs here.

Find out more about Orchestra

Orchestra is a platform for getting the most value out of your data as humanly possible. It’s also a feature-rich orchestration tool that can solve for multiple use-cases and solutions. Our docs are here, but why not also check out our integrations; we manage these so you can get started with your pipelines instantly. We also have a blog, written by the Orchestra team + guest writers, and some whitepapers for more in-depth reads.

Appendix 1: Exporting Your Shipyard Data

The original migration guide from the Shipyard team can be found here.

In order to facilitate a swift transition from Shipyard to another external platform, our team first recommends exporting records of your activity in the platform.

Generate an API Key

Follow the instructions in our documentation to generate a new API key. If you already have a Shipyard API key, you can skip this step.

Creating a New Fleet

We will be walking through steps to simultaneously:

  • Export the last X days of logs as a CSV, where X is equal to the log limits of your current subscription plan.
  • Export the YAML configuration files for all Fleets in your organization.

After exporting the contents of your Logs and YAMLs, we will zip the resulting files and send them via email. You can also choose to upload these files to any other location that can store files (Google Drive, OneDrive, S3, Box, etc.).
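
If you would rather script the zipping step locally instead of using a Vessel, Python’s standard library is enough. The sketch below assumes the exported YAML folder layout described in Step 2 further down:

import shutil

# Bundle the exported YAML directory into shipyard_exports.zip
shutil.make_archive("shipyard_exports", "zip", "shipyard_yaml")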

Option 1 — Use a One-Click Deployment Link

  1. Click this link. This will take you to the Fleet builder with a Fleet 90% of the way set up.
  2. Click the “Export Logs” Vessel. Enter your Shipyard API Key.
  3. Click the “Export YAML Configs” Vessel. Enter your Shipyard API Key.
  4. Click the “Send Message with File” Vessel. Add the emails you want to send a message to under the To field.
  5. Click Save & Finish.
  6. Click “Run Your Fleet”.

Option 2 — Build from Scratch

Start out by clicking the “+ New Fleet” button on the sidebar. For each Blueprint listed below, you’ll need to click the “Add Vessel” button in the top-left of the canvas.

Step 1 — Export Logs

  1. Search for “Export Logs” and find the option listed under “Shipyard API”.
  2. Click the name to add this Blueprint.
  3. Provide your Shipyard API Key. All other input fields can be left blank.

This Vessel will export the last X days of logs, where X is equal to the log limits of your current subscription plan. The metadata of your logs will be stored as shipyard_logs.csv, located in the current working directory.

Step 2 — Export YAML

  1. Search for “Export YAML” and find the option listed under “Shipyard API”.
  2. Click the name to add this Blueprint.
  3. Provide your Shipyard API Key. All other input fields can be left blank.

This Vessel will export all Fleet YAML configurations in your organization. The file contents will be downloaded into a folder structure of ./shipyard_yaml/{org_name}/{project_name}. The project directory will contain all of the Fleets that lived under the project as YAML files named {fleet_name}.yaml.
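
For a quick inventory of what was exported, you can walk that folder structure with a few lines of Python:

from pathlib import Path

# List every exported Fleet as org / project / fleet
for fleet_yaml in Path("shipyard_yaml").glob("*/*/*.yaml"):
    org, project, fleet_file = fleet_yaml.parts[-3:]
    print(f"{org} / {project} / {fleet_file}")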

Step 3 — Compress Files

  1. Search for “Compress Files” and find the option listed under “File Manipulation”.
  2. Click the name to add this Blueprint.
  3. Fill out the inputs as follows:
    • Change the “Local File Name Match Type” to Regex.
    • Set the “Local File Name” to shipyard_.*
    • Set the “New File Name” to shipyard_exports.zip

Step 4 — Send Files Externally (via Email)

  1. Search for “Send Message” and find the option titled “Send Message with File” listed under “Email”.
  2. Click the name to add this Blueprint.
  3. Fill out the inputs as follows (a scripted equivalent is sketched after this list):
    • Set the “SMTP Host” to smtp.gmail.com
    • Set the “SMTP Port” to 587
    • Add your own email to the “To” field.
    • Set the “Subject” to anything you like.
    • Set the “Message” to anything you’d like.
    • Set the “File Name” to shipyard_exports.zip
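
A rough scripted equivalent of this Vessel, using Python’s standard smtplib with the same host and port, is sketched below. The sender address and app-password environment variables are assumptions:

import os
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = os.getenv("GMAIL_USER")  # assumed env var for the sender
msg["To"] = "you@example.com"          # your own email
msg["Subject"] = "Shipyard exports"
msg.set_content("Attached: exported Shipyard logs and YAML configs.")

# Attach the zip produced by the "Compress Files" step
with open("shipyard_exports.zip", "rb") as f:
    msg.add_attachment(f.read(), maintype="application",
                       subtype="zip", filename="shipyard_exports.zip")

with smtplib.SMTP("smtp.gmail.com", 587) as smtp:
    smtp.starttls()
    smtp.login(os.getenv("GMAIL_USER"), os.getenv("GMAIL_APP_PASSWORD"))
    smtp.send_message(msg)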

Step 5 — Connect the Vessels Together

  1. Drag from the circles on both the “Export Logs” and “Export YAML” Vessels to the “Compress Files” Vessel.
  2. Drag from the circles on the “Compress Files” Vessel to the “Send Message” Vessel.
  3. Click Save & Finish.
  4. Click “Run Your Fleet”.


I write on data engineering and the coolest data stuff. CEO @ Orchestra, the best-in-class data pipeline management platform. https://app.getorchestra.io/signup