Customized Coordinate Point Grouping

Mia Renauly
3 min read · Sep 27, 2017


Some time ago, I ran into a business problem that required dividing coordinate points into several groups, where each group had a predetermined number of points. As an amateur data analyst, my first move was, of course, to search Stack Overflow with "clustering" as the keyword. I visited several pages, but found no solution I could actually work with.

Finally, I realized I had been using the wrong keyword all along. If you want the same number of points in each group, you are doing grouping, not clustering (unsupervised learning). I was finally able to solve the problem with a simple Python script and a PostGIS query.

For example, suppose I have a table called tb_points containing 4,000 coordinate points, and I want to divide them into 10 groups of 400 points each. Here is the table structure:

CREATE TABLE tb_points (
id SERIAL PRIMARY KEY,
outlet_id INTEGER,
longitude FLOAT,
latitude FLOAT,
group_id INTEGER
);
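To try this out locally, you could fill the table with random points first. Here is a minimal sketch; the Jakarta-area bounding box, the seed, and the row count are my own assumptions for illustration, not from the original setup:

```python
import random

# Hypothetical test data: 4,000 random points scattered around Jakarta
# (approx. lon 106.8, lat -6.2); group_id is left NULL to start.
random.seed(42)
rows = [
    (i, 106.8 + random.uniform(-0.5, 0.5), -6.2 + random.uniform(-0.5, 0.5))
    for i in range(1, 4001)
]

insert_sql = "INSERT INTO tb_points (outlet_id, longitude, latitude) VALUES (%s, %s, %s)"
# With psycopg2 you would load it with:
#   cursor.executemany(insert_sql, rows)
print(len(rows))  # 4000 points, enough for 10 groups of 400
```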

Then I do the following steps:

  1. Find the first coordinate to use as your starting point
  2. Find the nearest coordinates to the starting point, ordered ascending by distance, limited to your preferred number of points per group (in this case 400)
  3. Assign those points to the group by updating their group_id column
  4. Repeat the three steps above 10 times for the remaining rows whose group_id is still NULL

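Step 2 relies on PostGIS's ST_Distance over the geography type, which measures distance along the Earth's surface rather than in raw degrees. As a rough illustration of what the query orders by, here is a plain-Python haversine sketch; it uses a spherical approximation (PostGIS geography uses a more accurate spheroid model), and the sample coordinates are my own:

```python
import math

def haversine_m(lon1, lat1, lon2, lat2):
    """Approximate great-circle distance in metres on a spherical Earth.

    Results will differ slightly from PostGIS ST_Distance on geography,
    which accounts for the Earth's flattening.
    """
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Jakarta to Bandung is roughly 115 km in a straight line
print(round(haversine_m(106.85, -6.21, 107.61, -6.91) / 1000), "km")
```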
Here is the implementation in Python:

import psycopg2

# Fill in your own connection details
dbhost = ''
dbuser = ''
dbpass = ''
dbname = ''
dbport = 5432

conn = psycopg2.connect(host=dbhost,
                        user=dbuser,
                        password=dbpass,
                        database=dbname,
                        port=dbport)

def fetch(sql):
    """Run a SELECT and return all rows, or 'error' on failure."""
    cursor = conn.cursor()
    rs = None
    try:
        cursor.execute(sql)
        rs = cursor.fetchall()
    except psycopg2.Error as e:
        print(e.pgerror)
        rs = 'error'
    cursor.close()
    return rs

def execScalar(sql):
    """Run a statement and return the number of affected rows, or -1 on failure."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        conn.commit()
        rowsaffected = cursor.rowcount
    except psycopg2.Error as e:
        print(e.pgerror)
        rowsaffected = -1
        conn.rollback()
    cursor.close()
    return rowsaffected


def select_first_cluster_id():
    # Among the rows that have no group yet, take the pair of points that
    # are farthest apart and use the first one as the next starting point.
    sql = """ SELECT a.outlet_id AS ori_id, a.longitude AS ori_lon,
                     a.latitude AS ori_lat, b.outlet_id AS dest_id,
                     b.longitude AS dest_lon, b.latitude AS dest_lat,
                     ST_Distance(
                         CAST(ST_SetSRID(ST_Point(a.longitude, a.latitude), 4326) AS geography),
                         CAST(ST_SetSRID(ST_Point(b.longitude, b.latitude), 4326) AS geography)
                     ) AS air_distance
              FROM tb_points a CROSS JOIN tb_points b
              WHERE a.outlet_id != b.outlet_id
                AND a.group_id IS NULL
                AND b.group_id IS NULL
              ORDER BY air_distance DESC
              LIMIT 1 """
    return sql

def update_group_id(group_id, ori_id, limit_constraint):
    # Assign the N ungrouped points nearest to the starting point
    # (including the starting point itself) to the given group.
    sql = """ UPDATE tb_points
              SET group_id = %s
              WHERE outlet_id IN
                  (SELECT b.outlet_id
                   FROM tb_points a,
                        tb_points b
                   WHERE a.outlet_id = '%s'
                     AND a.group_id IS NULL
                     AND b.group_id IS NULL
                   ORDER BY ST_Distance(
                       CAST(ST_SetSRID(ST_Point(a.longitude, a.latitude), 4326) AS geography),
                       CAST(ST_SetSRID(ST_Point(b.longitude, b.latitude), 4326) AS geography)
                   ) ASC
                   LIMIT %s)
          """ % (group_id, ori_id, limit_constraint)
    return sql

def clustering():
    data_constraint = [400]  # points per group: 4,000 points / 10 groups
    n = 1
    while n <= 10:
        sql = select_first_cluster_id()
        res = fetch(sql)
        ori_id = res[0][0]

        sql = update_group_id(n, ori_id, data_constraint[0])
        print(sql)
        execScalar(sql)

        n += 1

clustering()
Coordinate Grouping Result

Tadaaa. There is the result. You can also customize the code, such as how many groups to make and how many points go into each group. It turns out I didn't need a fancy algorithm to solve the problem; I just needed to see it from a different perspective.
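A sanity check worth running afterwards (my own addition, reusing the fetch() helper from the script above): count the rows per group_id and confirm every group holds exactly 400 points, with none left ungrouped.

```python
# Hypothetical verification query: after clustering() has run, each of the
# 10 group_id values should appear exactly 400 times, and no row should
# still have a NULL group_id.
check_sql = """
    SELECT group_id, COUNT(*) AS n_points
    FROM tb_points
    GROUP BY group_id
    ORDER BY group_id
"""

# With the fetch() helper from the script above, this would be:
#   for group_id, n_points in fetch(check_sql):
#       assert group_id is not None and n_points == 400
print(check_sql.strip())
```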
