APGCE Geohack 2022 Part 1: Design Intent

Tan Nian Wei
Nov 2, 2022


How we set up the heart of our upcoming subsurface-themed hackathon

We hosted the first Geohack in 2019, as a side dish to the APGCE 2019 conference. Ten teams of five came together to build solutions at the intersection of machine learning and exploration geoscience. As APGCE 2022 will be held physically in Kuala Lumpur again this year, we jumped at the chance to organize Geohack once more.

With one month to go until the event, details are being hammered out like steel in a forge. Although we are currently in the thick of things, I thought it would be a good idea to take advantage of present-day clarity and document our efforts as a reference for posterity. In this post, we’ll talk about how we intend to set up the heart of the hackathon: the data, the challenges, and the scoring.

Disclaimer: if you are a prospective participant, take all this information with a grain of salt; everything here is subject to change.

Data selection

Photo by Alexander Sinn on Unsplash

We want the hackathon to be unique, so the idea of using data from producing Malaysian oilfields came to mind. How many subsurface hackathons in the world let you work on data that still has commercial value?

As great as that sounds, permission to use this live data comes with strict stipulations. To comply with Malaysian data export regulations, we would need to ensure participants can access the data only during the event itself. And that’s the easy part. The hard part is ensuring that participants no longer have access to the data once the event ends. Although it is possible to monitor data transfer throughout the event (don’t ask how), the team isn’t too keen to do that. We would know; we did exactly that at the last Geohack.

The alternative would be to use publicly available oil and gas datasets. Our lives would indeed be easier, but with so few datasets out there, we suspect most of them have already been used in some hackathon or project one way or another.

Eventually, the thought came to us: why not both? Kaggle, the online machine learning competition platform, runs a dual scoring system where submitted models are scored on a public and a private leaderboard. The public leaderboard is computed on one portion of the held-out test data and is visible to participants during the competition, while the private leaderboard (which decides the final ranking) is computed on the remaining portion and stays hidden until the competition ends. We plan to emulate the concept in spirit: participants develop their solutions on existing public oil and gas datasets, then run their final solution on the Malaysian field data in a controlled environment. We plan to use Dataiku as a virtual data room of sorts.
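For readers unfamiliar with the dual leaderboard idea, here is a minimal Python sketch of the mechanism. It only illustrates the Kaggle-style concept and is not how Geohack itself will be run. The function names, the 30% public fraction, and the choice of accuracy as the metric are all assumptions made for the example.

```python
import random


def split_public_private(sample_ids, public_fraction=0.3, seed=0):
    """Shuffle held-out sample ids and split them into public and private sets.

    The 30% default is an arbitrary choice for this sketch.
    """
    ids = sorted(sample_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    cut = int(len(ids) * public_fraction)
    return set(ids[:cut]), set(ids[cut:])


def accuracy(predictions, truth, ids):
    """Fraction of the samples in `ids` that were predicted correctly."""
    if not ids:
        raise ValueError("no samples to score")
    correct = sum(1 for i in ids if predictions.get(i) == truth[i])
    return correct / len(ids)


def score_submission(predictions, truth, public_ids, private_ids):
    """Score one submission on both leaderboards.

    Only the "public" score would be shown during the competition;
    the "private" score would decide the final ranking.
    """
    return {
        "public": accuracy(predictions, truth, public_ids),
        "private": accuracy(predictions, truth, private_ids),
    }


# Toy example: ten labelled samples and a submission that gets all of them right.
truth = {i: i % 2 for i in range(10)}
public_ids, private_ids = split_public_private(truth.keys())
perfect = dict(truth)
scores = score_submission(perfect, truth, public_ids, private_ids)
```

The point of the split is that a model tuned to chase the visible public score gains nothing on the private portion unless it genuinely generalizes.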

We’re still tinkering with the interplay between public and private data when it comes to the challenges. Speaking of which …

Challenges

For challenge design, we decided to use a bottom-up approach: we listed every challenge we could recall or think of, then zoomed in on a select few as a group. At the moment, we have narrowed it down to two well-specific challenges and two seismic-specific challenges. We made a conscious decision to limit the number of challenges for two reasons: (i) to focus our data prep efforts better, and (ii) to keep the challenges balanced for scoring, which gets harder as their number grows. We hope that the challenges will cater to teams of varying skillsets; ultimately we just want everyone to have fun at Geohack.

Speaking of scoring …

Scoring

I know we mentioned Kaggle earlier, but we don’t plan to score the hackathon like a Kaggle competition. We want Geohack to be open and welcoming to everybody regardless of skill level; turning it into a score race would defeat the purpose of the event. Instead, participants will be asked to pitch their solutions, preferably with a demo. The scoring criteria will take into account creativity, execution, impact, and presentation. No surprises here; our scoring will resemble that of a standard hackathon.
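As a rough illustration of how a rubric like this is often tallied, here is a small Python sketch that averages judges' marks per criterion and combines them into one team score. The 0 to 10 scale, the equal weights, and the `team_score` helper are assumptions for the example only; the actual Geohack scoring sheet has not been finalized.

```python
from statistics import mean

# The four criteria named above; the equal weighting below is an assumption.
CRITERIA = ("creativity", "execution", "impact", "presentation")


def team_score(judge_sheets, weights=None):
    """Combine several judges' score sheets into one weighted team score.

    `judge_sheets` is a list of dicts mapping each criterion to a mark
    (a hypothetical 0-10 scale). Marks are averaged across judges per
    criterion, then combined as a weighted average.
    """
    weights = weights or {c: 1.0 for c in CRITERIA}
    total_weight = sum(weights[c] for c in CRITERIA)
    per_criterion = {c: mean(sheet[c] for sheet in judge_sheets) for c in CRITERIA}
    return sum(weights[c] * per_criterion[c] for c in CRITERIA) / total_weight


# Two hypothetical judges scoring one team.
sheets = [
    {"creativity": 8, "execution": 6, "impact": 7, "presentation": 9},
    {"creativity": 6, "execution": 8, "impact": 7, "presentation": 7},
]
result = team_score(sheets)
```

Passing a different `weights` dict would let organizers emphasize, say, impact over presentation without changing the rest of the tally.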

Wrapping up

That is as much as we can reveal at this point. We’ll share more as the event approaches; hope you enjoyed the peek behind the scenes! If you are interested, the event page and signup link can be found at https://icep.com.my/apgce/geohack.
