Why Google PubSub? Streaming financial data through PubSub to BigQuery
In modern cloud architecture, building and deploying applications is easier than ever. Applications are also built from smaller, independent building blocks that are easier to develop and maintain. However, along with the advantages the modern cloud offers, an important consideration when building and deploying apps is how resilient they are.
One aspect of resiliency, scalability, is already handled by many cloud products; for example, spinning up more Compute Engine instances in Google Cloud can absorb heavy traffic coming into the apps. Another important aspect of resiliency is the interaction between apps, meaning how heavily they depend on each other. A simple case: Compute Engine instances send data to a database, which interacts with an API that handles user info. All of these components depend on each other to keep the system working, so if any single one stops functioning, the entire system goes down. To help with this aspect of resiliency, Google Cloud PubSub is a perfect choice!
In Google Cloud Platform, PubSub is a message queue service that lets you send and receive messages between independent applications. PubSub plays a critical role in NOT making apps rely heavily on each other; it decouples them, so each app performs its task independently while still working together as part of the system deployed in the cloud. In the scenario above, instead of letting the apps interact directly, PubSub sits between the database and the API (hosted in App Engine or elsewhere): the database publishes messages to PubSub, and the subscriber, the API, receives them through the same channel. This way, even if the subscriber breaks down, PubSub holds the messages until it comes back online and then delivers them, meaning the two sides are decoupled.
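The decoupling idea can be illustrated in plain Python with an in-memory queue standing in for PubSub. This is only an analogy, not the real google-cloud-pubsub client: the `publish` and `drain` helpers below are hypothetical names used for the sketch.

```python
from queue import Queue

# The "topic": an in-memory buffer standing in for PubSub.
topic = Queue()

def publish(message: str) -> None:
    """Publisher side: the database pushes messages without knowing who reads them."""
    topic.put(message)

def drain(handler) -> int:
    """Subscriber side: the API pulls everything that accumulated while it was offline."""
    delivered = 0
    while not topic.empty():
        handler(topic.get())
        delivered += 1
    return delivered

# The publisher keeps sending while the subscriber is "down".
for price in (25.0, 25.3, 24.9):
    publish(f"GOOGL:{price}")

# When the subscriber comes back, no messages were lost.
received = []
count = drain(received.append)
print(count, received)
```

The publisher never calls the subscriber directly; the queue in the middle is what lets either side go down without breaking the other.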
Through this independent side project using PubSub and BigQuery, I would like to share the key concepts and the steps I took. For demonstration, I have created a simple Python function that fabricates Google stock prices to serve as data streamed through PubSub into BigQuery. All of the work can be found here,
1. Go to the Google Cloud Console and enable the required APIs.
2. Open your Google Cloud Shell and clone the repository.
3. This is the most important step, so do not forget it! Enter the following command:
gcloud auth application-default login
Type in ‘Y’ and go to the link provided. After allowing access to the account, make sure to copy the verification code provided and paste it back into the shell. If the Google Cloud Shell is disconnected due to inactivity, you will have to go through this process again.
4. Go to the BigQuery console and create a dataset called ‘demo_stock’ with a table called ‘google’. These names can be changed, but make sure you also change them in the Python script. Then enter the following command:
python pubsub_to_bigquery.py --price='put any number here'
--price is the initial GOOGL stock price to start at. For example, --price=25.
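The fabrication step can be sketched as a simple random walk starting from the --price argument. This is a hypothetical reconstruction for illustration; the actual pubsub_to_bigquery.py in the repository may generate prices differently, and the publishing-to-PubSub part is omitted here.

```python
import random

def next_price(price: float, max_move: float = 0.5) -> float:
    """Fabricate the next GOOGL price as a small random step from the current one."""
    step = random.uniform(-max_move, max_move)
    # Round to cents and never let the fake price drop below one cent.
    return max(round(price + step, 2), 0.01)

def fabricate(initial: float, n: int) -> list:
    """Generate n fabricated prices starting from the initial --price value."""
    prices = [initial]
    for _ in range(n - 1):
        prices.append(next_price(prices[-1]))
    return prices

random.seed(0)  # deterministic output for the demo
series = fabricate(25.0, 5)
print(series)
```

In the real script, each fabricated price would be published to the PubSub topic and streamed on into the demo_stock.google table.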
5. Fake Google stock prices are now being generated and sent to BigQuery. Go to the BigQuery console.
In the table preview, you may not see the data streaming into BigQuery yet because it is still in the streaming buffer. However, when a query is run, you will see the data, so do not worry!
6. A simple query can be run on BigQuery, such as:
SELECT * FROM demo_stock.google ORDER BY STOCK DESC;
I hope you are starting to grasp the concept of Google PubSub; as for me, it took a while to understand the concept and its functionality. For more info, you can always visit https://cloud.google.com/pubsub/docs/. Thanks for reading, and feel free to leave comments.
“We learn the most from a failure, not from a success”