What is Predictive Analysis?

AGEDB
Bitnine Global
Published in
4 min readAug 22, 2023

The graph data model provides a “data model” in an intuitive form close to the perceptions of human brains. The intuitive data modeling of the graph database enables data modeling similar to the data model of the actual organization, and through this, it can be of great help in easily understanding and utilizing the data flow of the entire organization. This article will introduce how graph modeling enables predictive analysis.

Graph Data Modeling
When modeling from relational to graph, the modeling process is based on the variables that relational considers important. For example, after setting the account as the central node in the bank’s Fraud Detection System (FDS), it is possible to model the necessary variables according to the deposit and withdrawal process. Similarly, in a commerce site, listing necessary variables centering on a transaction (Order) and setting nodes and edges can be another method of modeling.

Let’s consider an example: how to perform GDB modeling that meets the purpose of analysis using data from Brazil’s E-Commerce platform ‘Olist’. It has information related to customer and seller information, products, orders, shipping, reviews, etc. When storing data in a relational database(RDB), it is necessary to map the complex intertwined tables using keys.

In order to graph a series of processes in which a consumer orders a product from a seller, the consumer, product, seller, and order are set as nodes. And ‘order’, which is the standard action, becomes the ego node. In addition, multiple products and sellers may exist in one order.

Property setting
Prior to the property setting, it is necessary to find out the final purpose to be performed using data. This is because modeling that can efficiently extract data according to the purpose of the analysis is necessary. The purpose of the final analysis using the data was set to ‘prediction of delivery time’, and it is divided into ‘Seller_Time’– the time it takes for the seller to ship the product, and ‘Carrier_Time’ — the time it takes for the shipper to ship the product to the customer.

Figure 1. Prediction performance property setting

Additionally, explanatory data analysis determined that the following features affect delivery time.

  1. Distance, Freight_Value

Now we need to insert the variables that affect Carrier_Time, distance, and freight_value, into the model. Since ‘distance’ indicates the distance between ‘customer’ and ‘seller’, it would be reasonable to set it as a property on the edge connecting the two nodes. However, if it is setup this way, several steps must be taken when searching based on the Ego Node, Order, which can cause issues such as increased search time and reduced performance. Therefore, we decided to set it as a property on the edge between the product and order node so that it can be quickly searched based on the central ‘Order’.

As mentioned above, since there can be multiple products in one order, this setting may cause the same data to be duplicated. However, due to the nature of graph database, it is possible to set and query properties on multiple edges, so we will proceed as is.

2. Seller_Time, Carrier_Time

Let’s start modeling Seller_Time and Carrier_Time, which are the target variables of the analysis. First, there is a way to set each variable to a node that derives from one order. In this case, it is necessary to segment the new delivery time to classify it into a specific node, and a classification analysis model is needed when predicting the delivery time for a new order.

However, in the case of delivery time, since ‘predicting’ the time corresponds to the original purpose more than classifying the section, it would be more appropriate to insert continuous data as a property rather than setting it as a separate node. Since there is one delivery time information per order based on ‘Order’ in the data, it is set as a property of the ‘Order’ node.

Figure 2. Insert delivery time data as a property of the ‘Order’ node
Figure 3. Completed GDB Model

As seen, Graph Data Modeling concisely express tables that require mapping individually. If data is loaded through the graph database, the desired information can be retrieved more easily and quickly, and the analysis model can be executed more efficiently. As industries continue to embrace data-driven decision-making, the amalgamation of graph modeling and predictive analysis will undoubtedly play a pivotal role in unraveling hidden patterns and driving informed business strategies.

Follow us for more tips in graph modeling and techniques! Need a graph-capable database management system?
Learn more from our website!

--

--