Creating a Server on AWS as Remote and Establishing a SQL Connection for Data Manipulation

Dr. Fatih Hattatoglu
Academy Team
Published in
17 min readApr 24, 2024

In today’s data-driven world, accessing databases and manipulating data is critical, especially in data science and analytics projects. In this blog post, we will guide you on setting up a PostgreSQL database on Amazon Web Services (AWS), one of the leading names in cloud technologies, and how to connect to this database using a popular SQL client like DBeaver to manipulate data. By using an EC2 Instance on AWS to create our PostgreSQL database, remote access becomes possible, enhancing the flexibility and strength of your data analysis and processing operations.

Setting up a PostgreSQL database via an EC2 Instance on AWS provides remote access and enhances the work's scalability and adaptability for data scientists and analysts. This configuration allows for global access, enabling teams to work beyond geographical boundaries, thus fostering a more dynamic and collaborative work environment. Furthermore, using cloud-based solutions simplifies data security and backups, optimizing critical data protection and disaster recovery processes. Utilizing powerful tools like AWS and DBeaver facilitates faster and more effective manipulations of datasets, which accelerates and enhances data-driven decision-making processes. This technical infrastructure helps organizations expand their data management capacities, securing a competitive advantage.

Let’s Begin Our Journey in the World of Data Science with This Guide

As we embark on this journey into data science, let’s start with this article, which will serve as your guide. In this article, we will see a below flowchart similar to the one pictured that broadly summarizes our work. Generally, we will connect the SQL infrastructure on our local computer with the infrastructure we create on a remote AWS machine, enabling communication between them. For SQL-related operations on our local computer, we will use the DBeaver program. Of course, an SQL Database should also be installed on our computer. In this study, we will carry out database operations using PostgreSQL, though you can also use other databases. Data manipulation will be done locally using DBeaver, while we will create a server on the remote machine. As mentioned in our previous article, you must already have an AWS Free account. After installing PostgreSQL on our remote AWS machine, we will edit two configuration files to enable external access to the databases we will create.

Flowchart for this paper

Let’s Start by Creating an EC2 Instance on AWS

To begin, let’s create an EC2 instance on Amazon Web Services. Here are the steps to follow:

  1. Navigate to the EC2 service. On the page, click “Launch Instance,” which is highlighted in orange.

2. Create the Machine for Installing PostgreSQL
Next, we will create the machine where PostgreSQL will be installed. Let’s name it “postgres”:

Select an Ubuntu machine as the remote machine. Choose a free-tier machine sufficient for PostgreSQL to facilitate our smooth operations. Other free-tier machines can also be selected if preferred.

Keep the Instance Type as Default (such as below figure)

3. Choose a Key Pair

In the key pair selection, either use an existing key file or generate a new one. For instructions on how to produce a key, you can refer to our previous article.

4. Network Settings

In the network settings section, we will continue to retain the existing port 22 and additionally create port 5432 for PostgreSQL. To do this, click on the ‘Edit’ section in the upper right corner of the image, and then click on “Add Security Group” at the bottom of the page. After that, as shown in the next image, add port 5432 in the manner displayed.

5. The rest of the settings can remain in their default state. At the bottom of the final page, click “Launch Instance.” This will initiate the remote machine.

6. On the new page, you will see a confirmation that the instance has been successfully launched. In the lower right corner, click on “View all Instances.” After waiting for a moment on the resulting page, you will see the PostgreSQL instance you created.

Now that we’ve created our machine on AWS using EC2, let’s leave it as it is and move on to establishing a terminal connection. Later, through this terminal, we will connect our computer to the EC2 instance we created as the remote machine.

Why VS Code? Because on an Ubuntu system, you can very easily perform folder creation or file operations through the VS Code interface. Normally, doing these through the terminal would be more difficult. VS Code offers conveniences at this point. Alternatively, an SSH connection can also be established through the Windows terminal, as mentioned in the previous article (SSH connection is the type of connection that will link our computer to the remote machine).

VS Code Operations

  1. On VS Code, go to the extensions section and install the Remote-SSH extension. This tool will enable us to connect remotely to our EC2 instance.

Since I have previously installed it, you will see the “Install” option in the image below on your screen.

2. Connect VS Code with SSH

To do this, click on the connection icon indicated by the arrow at the bottom left of the VS Code interface, as shown in the image. This will start the process to establish an SSH connection.

In the list that opens in the middle of the VS Code window, click on the top option “Connect to Host” to proceed with establishing a connection to your remote host.

Continue by selecting “Add new Host” from the new list that appears in the same location. If you are performing this action for the second time, that is, if you have previously created a Host, then you can choose “Configure SSH Host” instead.

Enter the name for your new Host and continue by pressing Enter.

3. From the list that appears, select the top option and proceed to edit the configuration file. In this file, define the new connection port so that VS Code can establish a connection.

For the newly created Host, enter the required information into this configuration file and modify it using a text editor. Following the previous step, this will open the configuration file which allows us to modify settings for the host named “techubuntu”. This configuration file may contain information about other hosts in the future; this is not an issue. You will only need to modify the information for the host you wish to continue with.

Host techubuntu
HostName 44.201.137.117
IdentityFile ~/.ssh/streamlit4.pem
User ubuntu
ServerAliveInterval 60

In this configuration file, fill in the relevant fields as follows:

Host: The name here should be the hostname we are currently creating or have created previously.

Hostname: This part is very crucial. Let’s go back to the EC2 page on AWS where we created our instance. Copy the Public IPv4 address shown by the arrow in the image below (Note: clicking on the box above the number will copy it).

IMPORTANT: Remember to terminate your AWS instance once you are done with your tasks to avoid incurring unwanted charges. 😊

IdentityFile: In this section of the config file, you will indicate the location of the key.pem file you used while creating the AWS instance. It is preferable for this file to be located in the ssh folder as shown in the image below. The path is indicated in the config file as follows: (C:\Users\pc\.ssh). For this exercise, I used a previously created file named streamlit4.pem. As I mentioned earlier, you can either use a newly created pem file or an existing one.

The other fields in the configuration file will remain the same as they currently are.

4. Now that you have set up the configuration file in VS Code as described, save it by pressing `Ctrl+S`. This will ensure that all your changes are applied and preserved for your connection setup.

5. Now, click on the Connection button located at the bottom left corner of the VS Code screen again. This will allow you to proceed with the next steps in establishing the SSH connection.

In the options that appear at the top center of the screen, click again on the “Connect to Host” option. This will allow you to proceed with connecting to your specified host via SSH.

Now, from the list that appears, select the host named “techubuntu” that we just created. Continue by clicking on it to initiate the connection process.

In the new VSCode window that opens, continue by selecting the Linux option among the choices provided. This means we will soon be working with Linux commands to interact with the server.

The text “SSH:techubuntu” displayed in the bottom left corner of the new window indicates that we have successfully established an SSH connection to our Linux server.

From the menu of this new window, open a New Terminal by selecting the option as shown in the figure. We will conduct our server operations on AWS from this terminal. This allows you to interact directly with the AWS server using command line tools.

In this window, click on the areas indicated by arrows and select “Open Folder” to easily manage file operations on this server. This feature allows you to navigate through the server’s directory structure and access or edit files directly within VS Code, streamlining your workflow.

Now, proceed with the operations indicated in the subsequent windows one by one.

If necessary, reopen the terminal.

You can see the green line in the image below, indicating that our terminal connection has been established. This section represents the interior of our remote machine on AWS.

6. From now on, we will proceed with the following steps from this terminal screen. Firstly, we will install PostgreSQL on our remote machine because it is currently empty.

Let’s execute the command “pwd” to see the file structure of the machine.

First, let’s run the following command to download the last packages to the machine.

sudo apt update -y

Type the following command to install PostgreSQL (copy paste)

sudo apt install postgresql -y

After PostgreSQL is successfully installed, let’s check the version with the code below.

psql -V

According to the image below we can see that Postgre version 14.11 is installed

7. Now we need to make 2 important changes in the configuration with 2 files so that anyone can connect to this machine, which is a server, so that we can connect to this machine remotely. These changes will allow remote connection.

To access PostgreSQL on the AWS remote machine, let’s execute the command below and log in to PostgreSQL. The bottom line of the bottom image with the arrow indicates that you are connected to PostgreSQL. If you want to return to the Ubuntu command line from this section in the future, you will need to type \q.

sudo -u postgres psql

This problem can be created if every user connects to the database with SuperUser authorization. So let’s create USER with the code below to connect to this database. When you type the code, CREATE ROLE should appear on the bottom line.

CREATE USER techubuntu  WITH PASSWORD 'tech1234';

To see this role, let’s type \du in the command line.

Let’s write the command below to give superuser authorization to this User: (then again with the \du command and see the result at the bottom)

ALTER ROLE techubuntu SUPERUSER;

8. Now we are at the stage of creating a database. We will first create this database with Linux commands and by typing in the VSCode terminal. Then we will use DBeaver as the SQL interface to see if this database is successful or if we have successfully entered other records.
To create a database:

CREATE DATABASE mydatabase;

Then type \l to list the databases: (mydatabase appears in the list, the others are there by default)

Now we authorize the techubuntu user role we just created to perform operations (data manipulation) on mydatabase. First, let’s type the command line and enter, then list the users and their authorization status with \du

GRANT ALL PRIVILEGES ON DATABASE mydatabase TO techubuntu;
\du -->> list users

9. Now we will connect with techubuntu user information via DBeaver which we use as SQL interface. For this, let’s first log out from postgre in VSCode terminal. We will use the \q command for this purpose.

10. In order for everyone to access the remote machine, we need to change the configuration settings so that everyone can access it. For this purpose we will make mini changes in 2 configuration files. All tools have a config file and they control access from outside.

First, let’s open the 1st configuration file with the command below.

sudo vim /etc/postgresql/14/main/postgresql.conf

When the file is opened, the terminal screen looks like the one below:

In order to navigate and make changes within this configuration file, it is necessary to press the Insert key on the keyboard. In this file, we will modify a single variable, thereby allowing connections from external sources. Scrolling down the screen, we will change the ‘listen_addresses = ‘localhost’’ entry seen below to ‘*’ (asterisk). The ‘*’ symbol signifies that all machines can establish connections. If we wanted to allow access only from a single machine, we would need to specify the IP address of that machine here.

IMPORTANT NOTE: It is necessary to remove the ‘#’ symbol at the beginning of the ‘#listen_addresses’ line in the image above. Otherwise, your operation on this line will be considered invalid because this line will remain as a comment line in the file.

To save the file, we will first press ESC to exit the editing mode, and then type ‘:wq’ in the command line.”

11. Let’s modify the 2.config file. Let’s enter the following command

sudo vim /etc/postgresql/14/main/pg_hba.conf

In this file, we will now handle the configuration differently from how we specified ‘*’ earlier. We will add a new line at the bottom of the file. This line will be for ‘md5’. Let’s navigate to the bottom of the page and add the following line at the very end. To do this, we need to press the Insert key on the keyboard again to move to the next line.

host    all             all              0.0.0.0/0               md5

We have added the bottom line ourselves.

Now, let’s exit again by pressing ESC and then type “:wq” to exit from here.

12. Type the following code into the terminal to UPDATE the new settings.

sudo systemctl restart postgresql

To check the latest system status, let’s write the following code.

sudo systemctl status postgresql

13. Let’s check the status of logging into the database for the last time on the terminal screen. For this, let’s type the command below.

psql -U techubuntu -h localhost -d mydatabase

Following this, it will prompt us for a password. Let’s type “tech1234” (as we defined earlier) when prompted for the password, but note that it won’t be visible on the screen when typed. Once you enter the password and press Enter, you will see the following screen, indicating that the database login has been successful. In the subsequent steps, to return from the database to the Ubuntu command line, you will need to type “\q”. However, let’s keep the terminal as it is for now.

14. While keeping our terminal screen open, let’s now establish a connection to the database on the remote machine through DBeaver. To do this, let’s open DBeaver, which was previously installed on our computer, and click on “New Database Connection” from the indicated location in the image. Then, in the next window, let’s continue by clicking on the PostgreSQL icon.

Let’s navigate back to our AWS EC2 instance window and recopy the IPv4 address of the instance we had copied earlier.

Now, let’s enter the required information into the relevant fields as shown in the image below. In this screen, we have modified the areas indicated by arrows.

In the Host field, let’s enter the IPv4 address obtained from AWS. Let’s change the database name to “mydatabase,” and for the Username and Password fields, let’s enter the username and password we created earlier, which are “techubuntu” and “tech1234” respectively. Then, click on the “Test Connection” button at the bottom left and wait for a successful connection. Afterward, click “Finish.”

On the left side of the DBeaver screen, we can now see our remote machine’s “mydatabase” and its tables.

15. Let’s create a new SQL script in DBeaver by right-clicking on the location where our “mydatabase” connection name is displayed (SQL Editor/Open SQL Script), as shown in the image. Then, paste the following code into it to create a table.

CREATE TABLE Users (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100)
);

Then, let’s insert some data using the following line:

INSERT INTO Users (name, email) VALUES ('John Doe', 'john.doe@example.com');

Let’s see the table we created:

SELECT * FROM Users;

16. Now, let’s verify if the actions we performed in DBeaver have indeed taken place on the remote machine. To do this, let’s enter the following code into the terminal in VS Code. When you enter the code, it will ask for the password for the user “techubuntu”. As we did before, let’s type the password without it being displayed and press Enter. (Actually, I left that part open in the previous step, so there is no need for me to rewrite it on my screen.)

psql -U techubuntu -h localhost -d mydatabase 

When we reach the “mydatabase” prompt, we can use the command `\dt` to view the tables. In fact, if we make a different change in DBeaver, when we type `\dt` again in the terminal, we can see the new change reflected once more on the terminal line.

By entering the command `SELECT * FROM users;` in the terminal command line, we can also see the data in the table. This way, we will observe that all the operations we performed in DBeaver are reflected here as well.

For exit, you can use the following command:

\q -->> quit

To terminate the connection in the terminal, click on the connection button at the bottom left of the screen in the image and select “Close Remote Connection.”

NOTE: After you have finished your work, don’t forget to go back to the AWS console, select the instance you were working on, and terminate it.

Through this exercise, we have learned how to create a remote machine, install SQL infrastructure on it, and perform data manipulation using third-party applications like DBeaver. This workflow represents a typical scenario in many industry tasks.

In this exercise, we manually managed the remote machine setup using EC2. However, AWS also offers a service called RDS, which automates all SQL-related tasks within its own infrastructure. While we have full control and responsibility for the EC2 instance, in the case of RDS, the responsibility lies with AWS. Of course, the pricing structure for these services differs, with AWS charging additional fees for maintenance and repairs. If you want to set up a Free SQL automation via RDS, you can follow the steps with the images below.

When selecting PostgreSQL in the image, AWS automatically installs PostgreSQL on any AWS machine without our intervention. We cannot customize it. By choosing the Multi-AZ option below, PostgreSQL is installed simultaneously on machines in different countries, providing a more robust infrastructure, albeit at a higher cost.

We can select the Free Tier feature in the bottom image and install the features on this page.

After making all the settings, we will go to Dbeaver again with the IP it will give and we will do the same connection operations as above.

In this article, we have seen in detail how to connect to a server on AWS remote machine via our local machine and manipulate data there.

I wish it to be useful.

--

--