Who Owned That Bitcoin Before You?
Traversing bitcoin’s public ledger in Python, for beginners
A few weeks ago I was camping in Joshua Tree with my sister and my brother-in-law, and there were three things we talked about:
- cryptocurrency
- learning how to code
- other stuff
Although my sister greatly preferred #3 from the above list, this blog post is about 1 and 2, cryptocurrencies and learning how to code.
Learning how to code is hard
It took me several tries to learn how to code, spaced out over 8 years. My first attempt was the book Programming for Dummies, which I checked out from the library when I was in fifth grade. I quickly felt overwhelmed and ashamed, finding that a book written for “dummies” was actually quite challenging for me.
Every attempt after that, save one, was similar: I wanted to “get into coding,” but didn’t have a clear idea of what I would do with my beginner-level capabilities once I acquired them. As a result, I would always get frustrated and give up, not really seeing the point of the obscure, confusing stuff I was learning anyway.
Once I had a good motivator, though, everything changed. There was a university class that promised a cool robotics lab, and I could take it if I could just show some basic programming skills. Plus, all my friends were taking it. There was a pre-requisite class, but it looked boring. I tried one more time to teach myself how to code… and it’s my job now, and I love it. (Also, the class was really fun.)
This article == self-learning motivation
This article is written for people like my brother-in-law — you want to learn how to code, but you need a fun project to help motivate you. Also, for you, blockchain counts as “fun.”
This article != Python tutorial
There are a LOT of good Python tutorials out there, explaining how to use the basic data structures and tools the language offers. This article doesn’t do that. Instead, I hope to offer an interesting project that people can use to motivate themselves to learn stuff that can otherwise feel kind of dry and boring — like how to access fields on a Python dictionary, for instance.
Feel free to flip back and forth between this article and a more standard Python tutorial, like the one on the Python website or Codecademy. You can read some of this, get to something you don’t know how to do, read some more of the Python tutorial with renewed interest, find the treasure, recite the incantation, feel the magic flow through you … and complete the project! That’s the dream, anyway.
The story
It’s more fun to work on a “story” than a “problem” or, worse, a “ticket.” (This is a big deal in Agile Development, which I won’t get into right now.) The story of this project is:
You just sold something on craigslist to a guy who insisted on paying you in bitcoin. You thought sure, why not, so you created a wallet, looked at some app saying he paid you, and drove away feeling trendy. Later though, doubts crept in. Why wasn’t Venmo good enough? Did the buyer have something to hide? You Google it and find out about the Silk Road, and your guilt multiplies. What if that fraction-of-a-bitcoin sitting in your account is dirty money?
Fortunately, bitcoin is a “distributed public ledger,” meaning that the history of your coin is public information. The wallet your coin came from has all the credits it has ever received encoded in the blockchain for all to see. All you have to do is find an address you’re worried about — this Silk Road wallet, for example — and the transaction hash of your incoming payment, and you can verify whether there’s a chain of transactions between them.
There are a bunch of caveats to this, of course, but whatever. It’s a fun project that you can do with basic Python.
Designing a solution
One of the hardest parts of designing a technical solution to a real-world problem is knowing where to start. I can help with that.
You want to traverse bitcoin’s public ledger to prove or disprove that bitcoin from some transaction originated from a sketchy address.
To do this, you’ll need:
- An API that you can provide with a transaction hash, and request two things from in return: the sender address, and the “previous transaction(s)” indicating where the sender got the money.
- The ability to parse an API response, store the information, and query the API again to keep traversing the blockchain an arbitrary number of steps.
That may sound like a lot. If you’re just learning programming, you probably haven’t used a web API before. Don’t give up just yet — I’ll try to break down the solution process further and share my code, which you can use if you get stuck.
Try to solve each section yourself before reading my code. It feels like a paradox sometimes, but learning how to code requires you to actively write code — just reading other people’s code won’t get you very far.
Finding the right API
We want an API that not only gives us the “from” addresses, but also gives us the “previous transaction hash,” which is a unique identifier of where the sender got the money to begin with. For example, if Alice paid Bob 1BTC minus fees, we’d expect something like this to show up in the public ledger:
{
'hash': 'this_transaction_hash',
'total': 9950000,
'fees': 50000,
'inputs': [{
'prev_hash': 'prev_transaction_hash',
'output_value': 10000000,
'addresses': ['Alice_wallet_address'],
}],
'outputs': [{
'value': 9950000,
'addresses': ['Bob_wallet_address'],
}]
}
(Note that the example is simplified. There are actually many more fields in a transaction object, and uglier addresses and transaction hashes. If you’re interested in what they are, or learning more about how Bitcoin works in general, I recommend Mastering Bitcoin.)
I’ll save you the trouble of finding a website that offers the right API— Blockcypher gives us what we need. (To find it, I just Googled bitcoin APIs and read the documentation for the top few in the results.)
Step Zero: boring installation
To code your own solution, you’ll need to install Python.
You should also use virtualenv for all your Python projects, though it isn’t strictly necessary for this one.
Optionally, you can use jupyter if you’re interested in trying out a useful data science tool that lets you run a few lines of Python at a time. Jupyter works with virtual environments — you can add your virtualenv to your jupyter config by running ipython kernel install --user --name=your_env_name
from your command line.
Open up a python file in your preferred IDE (Python comes with IDLE, so you can use that for now, but I prefer PyCharm), or if you’re justing jupyter, type jupyter notebook
from the command line (more detailed jupyter instructions here).
Step One: query the API for one transaction
Yay, we get to write some code!
Remember, the end goal is to walk back in time and figure out where your bitcoin came from. The first step will be to query Blockcypher’s API for information about a single transaction.
Feel free to try this with your own transaction hash! But if you haven’t actually used bitcoin before, or the transaction hashes are annoying to find with your wallet provider, you can use this transaction hash instead: 0627052b6f28912f2703066a912ea577f2ce4da4caa5a5fbd8a57286c345c2f2
The URL to query is https://api.blockcypher.com/v1/btc/main/txs/YOUR_TRANSACTION_HASH
. (replace YOUR_TRANSACTION_HASH with your transaction hash).
Take a moment to try querying the API in Python before scrolling past the … below. You may need to Google how to do that, and that’s ok! Googling is part of the coding process.
Step One Solution
Here’s my code for querying Blockcypher’s API for a single transaction:
import requestsEXAMPLE_TRANSACTION_HASH = '0627052b6f28912f2703066a912ea577f2ce4da4caa5a5fbd8a57286c345c2f2'
response = requests.get('https://api.blockcypher.com/v1/btc/main/txs/' + EXAMPLE_TRANSACTION_HASH)
print(response.json())
See, not bad! I imported requests
, a Python module for sending HTTP requests. Then I sent a GET request to Blockcypher’s API, and provided the transaction hash for the transaction I am interested in. I printed the “json” of the response, which contains the info sent back from Blockcypher for a successful response.
Note if you got an error like ModuleNotFoundError: No module named ‘requests’
when you ran this, you need to install requests
by typing pip install requests
from the command line.
The result is kinda ugly, but if you paste it into a JSON formatter like this one it’s more readable. For the example transaction hash it looks like this:
{
"block_hash":"0000000000000001b6b9a13b095e96db41c4a928b97ef2d944a9b31b2cc7bdc4",
"block_height":277316,
"block_index":64,
"hash":"0627052b6f28912f2703066a912ea577f2ce4da4caa5a5fbd8a57286c345c2f2",
"addresses":[
"1Cdid9KFAaatwczBwBttQcwXYCpvK8h7FK",
"1GdK9UzpHBzqzX2A9JFP3Di4weBwqgmoQA"
],
"total":9950000,
"fees":50000,
"size":258,
"vsize":258,
"preference":"high",
"confirmed":"2013-12-27T23:11:54Z",
"received":"2013-12-27T23:11:54Z",
"ver":1,
"double_spend":false,
"vin_sz":1,
"vout_sz":2,
"confirmations":398396,
"confidence":1,
"inputs":[
{
"prev_hash":"7957a35fe64f80d234d76d83a2a8f1a0d8149a41d81de548f0a65a8a999f6f18",
"output_index":0,
"script":"483045022100884d142d86652a3f47ba4746ec719bbfbd040a570b1deccbb6498c75c4ae24cb02204b9f039ff08df09cbe9f6addac960298cad530a863ea8f53982c09db8f6e381301410484ecc0d46f1918b30928fa0e4ed99f16a0fb4fde0735e7ade8416ab9fe423cc5412336376789d172787ec3457eee41c04f4938de5cc17b4a10fa336a8d752adf",
"output_value":10000000,
"sequence":4294967295,
"addresses":[
"1Cdid9KFAaatwczBwBttQcwXYCpvK8h7FK"
],
"script_type":"pay-to-pubkey-hash",
"age":277298
}
],
"outputs":[
{
"value":1500000,
"script":"76a914ab68025513c3dbd2f7b92a94e0581f5d50f654e788ac",
"addresses":[
"1GdK9UzpHBzqzX2A9JFP3Di4weBwqgmoQA"
],
"script_type":"pay-to-pubkey-hash"
},
{
"value":8450000,
"script":"76a9147f9b1a7fb68d60c536c2fd8aeaa53a8f3cc025a888ac",
"addresses":[
"1Cdid9KFAaatwczBwBttQcwXYCpvK8h7FK"
],
"script_type":"pay-to-pubkey-hash"
}
]
}
Printed out in its entirety, it looks intimidating — but remember, all we care about are two things:
- The
prev_hash
field telling you where the sender got the money, and - The
addresses
field telling you who the sender was.
Step Two: Get Previous Transaction Hashes from API Response
We need to be able to traverse the public ledger backwards in time. So for each transaction, we need to get all the “previous transaction” hashes.
What’s a previous transaction again?
When Alice pays Bob, the “previous transaction” is the time Alice got the 1BTC from Chris that she is now paying to Bob. She may also have received 0.5BTC from Chris, and 0.5BTC from Diane, that she is combining to pay Bob — in this case there are multiple “previous transactions.”
So how do we get previous transaction hashes from the API response?
Take a minute to try this on your own before scrolling past the ellipses below. Hint: you’ll need to know something about dictionaries, “for loops”, and lists.
Ok, here’s my code for getting a list of previous transaction hashes from a Blockcypher transactions API response.
transaction_info = response.json() # Blockcypher api response
inputs = transaction_info['inputs']
prev_hashes = []
for elt in inputs:
prev_hashes.append(elt['prev_hash'])print(prev_hashes)
Note that the word “elt” is short for “element.”
If you run my codeafter assigning the response
variable to a Blockcypher api response, you should see something like this printed to the console:
['7957a35fe64f80d234d76d83a2a8f1a0d8149a41d81de548f0a65a8a999f6f18']
Woohoo! You traced your bitcoin back one transaction!
Step Three: Get Sender Wallet Address from API Response
The previous transaction hash is nice and all, but you want to know who the sender was. Otherwise, what’s the point of traversing back through the public ledger if you can’t see whose hands your money passed through?
For this step, try to get the sender’s wallet address from the API response. Hint: your solution should look pretty similar to the code from Step 2…
Here’s how I got the sender’s address:
transaction_info = response.json() # Blockcypher api response
inputs = transaction_info['inputs']sender_addresses = []
for elt in inputs:
sender_addresses += elt['addresses']print(sender_addresses)
There’s a small difference in how I did this from how I collected the previous transaction hashes. In particular, instead of using append
to add bitcoin senders’ wallet addresses to my list, I used +=
. To see why, try changing it to
sender_addresses.append(elt['addresses'])
and run the code again to see what happens. I won’t give any spoilers here; it would ruin the fun.
Step Four: Repeat!
This next step is to iterate on the previous three steps, namely:
- Query the Blockcypher API for information about a transaction,
- get the previous transaction hashes from the transaction info
- get the sender addresses from the transaction info.
Notice that since step 2 gives us previous transaction hashes, we can query the Blockcypher API all over again for each of those transaction hashes, and then we can theoretically repeat the process indefinitely until we reach the beginning of your bitcoin’s lifecycle (when it was “minted”).
More likely though, we can only repeat our querying until Blockcypher gets annoyed with us for using their API too many times in a short time span. This is called “getting rate limited.” If you’re wondering when exactly this would happen, the answer is here. TLDR: don’t call their API more than 5 times per second or 600 times per hour.
One way you can complete this step is just dump the code from the previous steps into a big loop. I urge you to not do that. It will be a pain to read, and even more painful to debug. Instead, try writing functions for each of the previous steps:
- A function to query the Blockcypher API,
- a function to get the previous transaction hashes from a Blockcypher API response,
- and another function to get the sender addresses from a Blockcypher API response.
Then you can call those functions from inside a loop (or, if you’re virtuous, another function!) to query the Blockcypher API several times and traverse the public ledger.
Again, please be respectful of Blockcypher’s rules on how often you can query their API without paying for it. You may find this code useful for avoiding querying an API more than 5 times per second:
import timedef intentionally_slow_function():
time.sleep(0.2) # waits for 0.2 seconds
... # does other stuff
One more hint: you may want to use a “queue” of transaction hashes that you want to query Blockcypher’s API for one at a time.
Ok, now go give it a shot! Let’s see what you come up with :)
Alright, here’s how I completed this step, where we traversed the public ledger and collected a list of wallet addresses your bitcoin passed through on its way to you.
num_transactions_back = 0
addresses_seen = set()
transaction_queue = [EXAMPLE_TRANSACTION_HASH]
while num_transactions_back < 10 and len(transaction_queue) > 0:
transaction_info = get_transaction_info_blockcypher(transaction_queue.pop(0))
transaction_queue += get_prev_hashes(transaction_info)
sender_addresses = get_sender_addresses(transaction_info)
addresses_seen.update(sender_addresses)
num_transactions_back += 1
My algorithm is pretty simple. Inside the loop, I:
- pop a transaction hash off a queue
- query blockcypher’s api
- add the sender addresses to a set
- add previous transaction hashes to a queue to explore next
I exit the loop as soon as:
- I have queried Blockcypher 10 times, or
- my queue is empty
I am only traversing the public ledger 10 steps back because I want to be able to run my code several times per hour before I get rate limited by Blockcypher. If my bitcoin went through more than 10 people since it was in a sketchy person’s hands, I guess I’m fine with that. (If I wasn’t fine with it though, I could always pay Blockcypher for the ability to dig deeper, or download the public ledger myself.)
If you’re wondering where these magical functions like get_transaction_info_blockcypher
, get_prev_hashes
, and get_sender_addresses
come from, good question! I defined them earlier in my code, and I leave their implementation as an exercise to the reader :). (It should be pretty easy to piece together from reading steps 1, 2, and 3 above.)
Step Five: Check if payer was sketchy
The last step is to use the set (or list) of wallet addresses your money has passed through and compare it to a value of some wallet that you are hoping wasn’t included in that set.
Since we only went back ten steps in the example, you could theoretically just do this with a visual scan. But, for the sake of argument, let’s say your bitcoin went through MANY wallets before it got to yours, and you cared to check every single one of those against a known “bad wallet” address. You can use 1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx
as the bad address for this step. (It’s reportedly linked to the Silk Road.)
This step should be pretty straightforward if you’ve made it this far. Go ahead, give it a shot!
SILK_ROAD_ADDRESS = '1HQ3Go3ggs8pFnXuHVHRytPCq5fGG8Hbhx'funds_from_silk_road = SILK_ROAD_ADDRESS in addresses_seen
print(f'You do {"" if funds_from_silk_road else "not (as far as we know) "}have funds from silk road!')
Ok, I got a bit fancy there and used an “f-string” and a conditional expression. But the idea was simple: my program checked if a string was in a set of strings, and used that to decide what to print out as a result.
What just happened?
In this project, we used a few basic Python programming concepts to do something cool: to check whether some bitcoin we received was linked to the Silk Road (at least, within the last few transactions).
If you enjoyed this project, and you’re hoping to ride the wave of motivation to learn more about coding and/or cryptocurrencies, I see a couple ways to extend it that could be fun:
- Given a pair of wallet addresses, determine whether the two parties ever transacted with each other (via those particular wallet addresses).
- Given a pair of wallet addresses, determine how many “degrees of separation” there are between the wallets. (Oh look, we have the same bitcoin great-grandparent! Did you sell a craigslist item to Tony back in 2018 too?)
- Produce a plot of the “leaky bucket” of bitcoin transaction fees from the last 10 or so transactions tracing back from when your bitcoin arrived in your account.
I’m sure there are more fun projects in this area that I haven’t thought of — if you think of any, please comment and let me know!
Additionally, if you had trouble understanding any part of this article, please let me know so I can rethink, revise, and improve it for other learners.
Happy coding!