My Epic 3-Day Journey to Craft a Compliance & Regulatory Framework Dashboard for a Crypto Exchange
This challenge was self-imposed upon discovering an online job posting for analytic roles that required the creation of dashboards to monitor transactions and clients’ compliance and regulatory statuses.
I had been creating dashboards for the company I worked for and thought it might be a good challenge to make a demo dashboard.
Day 1 Data Gathering…or faking….
Before I can begin working on the dashboard using my Power BI desktop app on my Mac (yes, a Mac — I installed Parallels to run Windows 11 and Power BI), I need the following:
- Crypto exchange clients' data, transaction records and trading records
- Regulatory frameworks that concern a crypto exchange the most
Obviously, the data for a crypto exchange is not readily available. That’s why I began exploring ways to simulate data and consider what types of data a typical crypto exchange would have.
- client’s personal information
- KYC information
- transitional data (fiat deposit, fiat withdrawal, crypto deposit, crypto withdrawal)
- trading records (OTC trading, crypto futures trading, crypto spot trading)
I opened a new notebook and began typing and ‘designing’ a database for a global crypto exchange.
Based on the structure and contents of the CSV files you provided, we can outline a schema diagram for the database of a crypto exchange. Here is a summary of the structure and relationships among the tables:
- Users Table: Contains information about users of the crypto exchange, including their ID, name, email, password, type (individual or enterprise), registration date, and possibly enterprise name and country.
- Crypto Table: Lists the cryptocurrencies available on the exchange, with each cryptocurrency having an ID, name, and symbol.
- Fiat Table: Lists the fiat currencies available on the exchange, with each currency having an ID, name, and symbol.
- CryptoTransaction Table: Records transactions related to cryptocurrencies, including an order ID, transaction type (deposit or withdrawal), user ID, cryptocurrency ID, the amount of cryptocurrency, and timestamp.
- FiatTransaction Table: Records transactions related to fiat currencies, including an order ID, transaction type (deposit or withdrawal), user ID, fiat ID, the amount of fiat, and timestamp.
- FuturesTrades Table: Details futures trading on the exchange, with records including a trade ID, user ID, contract type (long or short), cryptocurrency ID, crypto amount, fiat ID, fiat amount, and timestamp.
- OTCTrades Table: Details over-the-counter (OTC) trades, with records including a trade ID, trade type (buy or sell), initiating user ID, counterparty user ID, cryptocurrency ID, crypto amount, fiat amount, fiat ID, and timestamp.
- SpotTrades Table: Details spot market trades, with records including an order ID, order type (buy or sell), user ID, cryptocurrency ID, crypto amount, fiat amount, fiat ID, and timestamp.
- KYCInformation Table: Contains Know Your Customer (KYC) information for users, including a user ID, name, date of birth, country, and KYC verification status.
Master Tables:
- Users: Master table for user-related information.
- Crypto: Master table for cryptocurrency-related information.
- Fiat: Master table for fiat currency-related information.
Transaction and Trade Tables (linked to Master Tables):
- CryptoTransaction: Linked to both Users and Crypto tables.
- FiatTransaction: Linked to both Users and Fiat tables.
- FuturesTrades: Linked to Users, Crypto, and Fiat tables.
- OTCTrades: Linked to Users, Crypto, and Fiat tables.
- SpotTrades: Linked to Users, Crypto, and Fiat tables.
- KYCInformation: Linked to the Users table.
How to generate them
My first instinct was to use random number generators, but I wondered how to generate data that wasn’t merely random numbers.
From some research online, I found that some packages in Python can help generate data for training purposes.
from faker import Faker
For example, in generating the Fiat table, I used both the Faker package and random numbers.
# Fiat table
fiat_data = {
'FiatID': ['CUR' + str(i) for i in range(1, 31)],
'Name': [fake.currency_name() for _ in range(30)],
'Symbol': [fake.currency_code() for _ in range(30)],
}
fiat_df = pd.DataFrame(fiat_data)
And this is how I generate the Fiat Transaction table:
# FiatTransaction table
FiatTransaction = {
'OrderID': ['FiatTrans' + str(i) for i in range(1, num_records + 1)],
'TransType': [np.random.choice(['Fiat Deposit', 'Fiat Withdrawl']) for _ in range(num_records)],
'UserID': ['USR' + str(np.random.randint(1, num_records_user + 1)) for _ in range(num_records)],
# 'CryptocurrencyID': ['CRY' + str(np.random.randint(1, 11)) for _ in range(num_records)],
# 'CryptoAmount': [round(np.random.uniform(0.1, 100), 2) for _ in range(num_records)],
'FiatID': ['CUR' + str(np.random.randint(1, 31)) for _ in range(num_records)],
'FiatAmount': [round(np.random.uniform(100, 10000), 2) for _ in range(num_records)],
'Timestamp': [fake.date_time_between(start_date='-1y', end_date='now') for _ in range(num_records)],
}
FiatTransaction_df = pd.DataFrame(FiatTransaction)
After generating 2,000,000 records of “transactional data”, I split them into different CSV files. (At first, I wanted to use Spark to simulate a very large database, but it had lots of problems using just my laptop, so instead I changed to using 100 csv files for simulation and demonstration purposes.)
# Split the DataFrame into chunks and export each one to a CSV file
total_records = len(FiatTransaction_df) # Total number of records in the DataFrame
num_files = (total_records + num_records_per_file - 1) // num_records_per_file # Calculate the number of files needed
directory = f'{base_path}/FiatTransaction'
if not os.path.exists(directory):
os.makedirs(directory)
for i in range(num_files):
start_idx = i * num_records_per_file
end_idx = start_idx + num_records_per_file
chunk = FiatTransaction_df.iloc[start_idx:end_idx]
chunk.to_csv(f'{base_path}/FiatTransaction/FiatTransaction_{i+1:02d}.csv', index=False)
Issues encountered during generation:
When I checked the data generated, there were some problems. However, I don’t think it’s worth the time to altering the code to generate a new set of data. Afterall, the data is just for building a demo dashboard.
- The data was too ‘nice’, lacking real randomness. For example, the customer distribution was evenly split between individuals and enterprises, 50% each. In reality, I believe it would be more like 99% individuals versus 1% enterprises.
- The data did not reconcile: the balance in a particular customer’s account was not consistent with their trades. For instance, a customer might have only $8,000 but executed a trade for $10,000.
Researching Regulatory Framework
I searched for what compliance issues concern a crypto exchange operating in Canada and the US the most.
The findings below do not necessarily reflect the true business pains, but at least they can be the foundation of my demo dashboard.
- Anti-Money Laundering (AML) and Know Your Customer (KYC) Compliance:
- Use Case: Implement robust AML and KYC procedures to prevent money laundering and terrorist financing activities.
- Canada: Proceeds of Crime (Money Laundering) and Terrorist Financing Act (PCMLTFA) and its associated regulations, overseen by the Financial Transactions and Reports Analysis Centre of Canada (FINTRAC). FINTRAC
- US: Bank Secrecy Act (BSA) and Anti-Money Laundering (AML) regulations enforced by the Financial Crimes Enforcement Network (FinCEN). FinCEN - Transaction Monitoring and Suspicious Activity Reporting:
- Use Case: Develop algorithms to monitor transactions and identify suspicious activities for reporting.
- Canada: Mandatory reporting of suspicious transactions to FINTRAC. Requirements outlined in PCMLTFA regulations.
- US: Suspicious Activity Report (SAR) filing requirements under the BSA regulations administered by FinCEN. - Regulatory Compliance for Securities:
- Use Case: Ensure compliance with securities regulations for tokenized assets and digital securities trading.
- Canada: Compliance with securities laws enforced by the Canadian Securities Administrators (CSA) at both federal and provincial levels. Canadian Securities Administrators
- US: Compliance with securities laws enforced by the Securities and Exchange Commission (SEC). SEC - Tax Compliance:
- Use Case: Facilitate tax reporting and compliance for cryptocurrency transactions.
- Canada: Compliance with Canada Revenue Agency (CRA) regulations for reporting cryptocurrency transactions for tax purposes. Canada Revenue Agency
- US: Compliance with Internal Revenue Service (IRS) regulations for reporting cryptocurrency transactions and paying taxes. IRS - Data Privacy and Security:
- Use Case: Implement measures to protect customer data and ensure compliance with privacy regulations.
- Canada: Compliance with the Personal Information Protection and Electronic Documents Act (PIPEDA) for handling personal data. Office of the Privacy Commissioner of Canada
- US: Compliance with the California Consumer Privacy Act (CCPA) and other state-level privacy regulations, as well as sector-specific regulations like HIPAA for healthcare data.
For now, there’s already a lot of research and preparation work done. Let’s call it a day.
Day 2 Dashboard building
To continue from yesterday, I had all the data, but still, I needed to know what had to be built.
- What metrics can help an exchange ensure that it is in compliance?
To get the first dashboard version up I decided to choose a few regulatory frameworks to develop some metrics. Otherwise, the scope of this project would be enormous and impossible to finish in a few days.
Besides compliance-related metrics, I also added some operational metrics.
Metrics
Percentage of users with approved KYC verification:
- Formula: `(Number of users with ‘Approved’ verification status / Total number of users) * 100`
- Query: Count the number of users with ‘Approved’ verification status in the KYCInformation table.
Number of KYC verifications pending approval:
- Query: Count the number of users with ‘Pending’ verification status in the KYCInformation table.
Number of suspicious activity reports filed:
- Query: Count the number of entries in the SuspiciousActivityReport table.
Suspicious trading caught per customer and the total amount of suspicious trade:
- create a new suspicious table (see a later section)
- created a new column to map the transaction table with the suspicious activity report table
SuspiciousType = LOOKUPVALUE(SuspiciousActivityReport[Type],
SuspiciousActivityReport[OrderID],CryptoTransaction[OrderID],"Normal")
From there, I could start building the dashboard, let me open the Power BI desktop app
- Loading data into Power BI
- Build the relationship of the tables
- Create new tables
At first, when I was thinking of building a SQL server for the dashboard, it was harder than I thought if I didn’t want to spend a dime. But I came up with this SQL code before changing to use CSV file instead.
SELECT
CONCAT('R', id) AS ReportID,
UserID,
TradeID,
CURRENT_TIMESTAMP() AS ReportDate,
CASE
WHEN FiatAmount > 10000 OR CryptoAmount > 1000
THEN 'Large Transactions'
WHEN (OrderType = 'Buy' AND FiatAmount > 5000)
OR (OrderType = 'Sell' AND FiatAmount > 3000)
THEN 'Unusual Trading Patterns'
WHEN TIMESTAMPDIFF(DAY, Timestamp, CURRENT_TIMESTAMP()) <= 7
AND AccountType = 'Main'
THEN 'Rapid Account Activity'
WHEN Country != 'US' AND Country != 'Canada'
THEN 'Cross-Border Transactions'
ELSE 'Other Reasons'
END AS Reason
FROM (
SELECT
UserID,
OrderID AS TradeID,
FiatAmount,
CryptoAmount,
Timestamp,
User.AccountType,
COALESCE(Country, 'Unknown') AS Country,
OrderType
FROM spot_trades
JOIN users ON spot_trades.UserID = users.UserID
UNION ALL
SELECT
SellerUserID AS UserID,
OrderID AS TradeID,
FiatAmount,
CryptoAmount,
Timestamp,
User.AccountType,
COALESCE(Country, 'Unknown') AS Country,
OrderType
FROM otc_trades
JOIN users ON otc_trades.SellerUserID = users.UserID
UNION ALL
SELECT
UserID,
TradeID,
FiatAmount,
CryptoAmount,
Timestamp,
User.AccountType,
COALESCE(Country, 'Unknown') AS Country,
'N/A' AS OrderType
FROM futures_trades
JOIN users ON futures_trades.UserID = users.UserID
) AS combined_trades
From the idea of the SQL code, I now change to use DAX to create a new table on Power BI directly. That’s much easier for me to handle. However, it may not be possible for a real crypto exchange as there would be way more data and executing this code would be impossible on Power BI.
SuspiciousActivityReport =
VAR CryptoWithdrawals =
FILTER(
CryptoTransaction,
CryptoTransaction[TransType] = "Crypto Withdrawl" && CryptoTransaction[CryptoAmount] > 50
)
VAR CryptoDesposits =
FILTER(
CryptoTransaction,
CryptoTransaction[TransType] = "Crypto Deposit" && CryptoTransaction[CryptoAmount] > 70
)
VAR FiatDeposits =
FILTER(
FiatTransaction,
FiatTransaction[TransType] = "Fiat Deposit" && FiatTransaction[FiatAmount] > 6000
)
VAR FiatWithdrawls =
FILTER(
FiatTransaction,
FiatTransaction[TransType] = "Fiat Withdrawl" && FiatTransaction[FiatAmount] > 5000
)
RETURN
UNION(
SELECTCOLUMNS(
CryptoWithdrawals,
"OrderID", CryptoTransaction[OrderID],
"UserID", CryptoTransaction[UserID],
"Type", "High Crypto Withdrawal",
"Amount", CryptoTransaction[CryptoAmount],
"Timestamp", CryptoTransaction[Timestamp]
),
SELECTCOLUMNS(
CryptoDesposits,
"OrderID", CryptoTransaction[OrderID],
"UserID", CryptoTransaction[UserID],
"Type", "High Crypto Deposits",
"Amount", CryptoTransaction[CryptoAmount],
"Timestamp", CryptoTransaction[Timestamp]
),
SELECTCOLUMNS(
FiatDeposits,
"OrderID", FiatTransaction[OrderID],
"UserID", FiatTransaction[UserID],
"Type", "Large Fiat Deposit",
"Amount", FiatTransaction[FiatAmount],
"Timestamp", FiatTransaction[Timestamp]
),
SELECTCOLUMNS(
FiatWithdrawls,
"OrderID", FiatTransaction[OrderID],
"UserID", FiatTransaction[UserID],
"Type", "Large Fiat Withdrawls",
"Amount", FiatTransaction[FiatAmount],
"Timestamp", FiatTransaction[Timestamp]
)
)
- Create new columns and measures
The suspicious type column was mentioned above, and here are the other measures I created:
% KYC Approved =
VAR __BASELINE_VALUE =
CALCULATE(
COUNTA('kyc_information'[UserID]),
'kyc_information'[VerificationStatus] IN { "Approved" }
)
VAR __MEASURE_VALUE = COUNTA('kyc_information'[UserID])
RETURN
DIVIDE(__BASELINE_VALUE,__MEASURE_VALUE)
KYC Pending Users =
CALCULATE(
COUNTA('kyc_information'[UserID]),
'kyc_information'[VerificationStatus] IN { "Pending" }
)
KYC Rejected Users =
CALCULATE(
COUNTA('kyc_information'[UserID]),
'kyc_information'[VerificationStatus] IN { "Rejected" }
)
After typing all the formulas and dragging the widgets, it’s time to change the dashboard to dark mode because it just looks fancier. I like dark because it makes the information more apparent, allowing viewers to focus on the data instead of facing a big white illuminated screen. I referenced the design discussed in this blog post.
Day 3 Publish, fine-tune and present it
Free account? Yes.. but Power BI does not allow you to share your dashboard with others unless you have Premium. Therefore the best I can do is to use screenshots.
Also writing this blog post proves that I actually did the work.
Fine-tuning: Some visuals got screwed up after publishing (this also happens a lot in my previous job), I enlarged some widgets, and made the fonts smaller.
The full layout
For all the data and the Jupyter notebook used for generating the data, please refer to Github.
Learning points
It is time to reflect on the project…
How I can do it faster and how to start
At the same time when I was working on this project, I was also reading some time management books and watching some YouTube videos. (This is my time management practice. There is a lot of idle time when loading data, and publishing the dashboard).
Specifically for this project, I learned and applied the principle of
asking the right question - “now”, not “how”
It’s commonly heard that you should ask the right question when you plan to do one thing, you should ask who can get the job done but not how. But this time I am the one who will be working on the task, so a better question would be “Am I going to do that NOW? If not then do it now.”
Any analysis ahead of action is purely speculation.
I didn’t know what data I needed until I literally pulled up a paper on my desk and started drafting the dashboard.
The sooner I roll up my sleeves, the clearer the task is. Action really clarifies everything.
Subject matter expert matters
Can someone who wrote excellent SQLs and has built millions of dashboards be better than myself in completing this dashboard? I don’t think so.
If he doesn't understand how a crypto exchange operates, he will be unable to create all the simulated data. If someone is not from a financial, technical, and legal background combined, it’s impossible to figure out what is needed in the dashboard.
Thanks to my experience in financial regulatory frameworks, banking practice, business laws, cryptos, dashboard building, and database building, all make it easier to complete this demo project.
̶C̶o̶n̶c̶l̶u̶s̶i̶o̶n̶ ̶ Continous Improvements
There is no end to a project, only people giving it up.
There is much to do to improve the dashboard.
- synchronizing color tone
- creating more matrics
- creating a summary dashboard showing the most important metrics on the first page, other pages for drilling down the data
- generating better data
- adding more metrics
- …