Kafka — Some Real-World Applications

In this blog post I will talk about a few real-world applications where using Kafka would be ideal. The endgame of learning Apache Kafka and its functionality is to apply it to a real-world scenario to solve a particular problem, so it is beneficial to have some understanding of how Kafka comes into play in day-to-day scenarios. Let's talk through some case studies here, so you know what you will be using Apache Kafka for.
So, without further delay, let's look at our first example.
Let's assume there is a company that offers a platform for passengers to book taxis, something like Uber or PickMe. How would we incorporate Kafka into the architectural design? If we made a simple design, what would it look like?
These are the requirements of the company:
The passenger (the user) should be matched with a nearby driver.
The price should rise when demand is high (many users or few drivers).
All the location or geo data before, during, and after the ride should be stored in an analytics store, so that a reasonable price for the ride can be computed.
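Before we get to the topics, it helps to pin down what "price should rise if the demand is high" could mean in code. Here is a minimal sketch of a surge-pricing rule: the multiplier grows with the ratio of waiting passengers to free taxis. The 1.0 floor and the 3.0 cap are my own illustrative assumptions, not values from any real platform.

```python
def surge_multiplier(waiting_passengers: int, free_taxis: int, cap: float = 3.0) -> float:
    """Return a price multiplier >= 1.0 based on demand vs. supply."""
    if free_taxis <= 0:
        return cap                        # no supply at all: charge the cap
    ratio = waiting_passengers / free_taxis
    return min(cap, max(1.0, ratio))      # never below base price, never above cap
```

With plenty of free taxis the multiplier stays at 1.0, and on a busy night with three passengers per taxi it climbs toward the cap.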
If you were the solution architect, how would you implement this using Kafka?
So there are two topics:
passenger_position and taxi_position
We need to know where the passengers and taxis are, so whenever they open their respective applications that data is fed into the above topics via two producers. There is a proxy between the passenger app and the passenger_position topic which acts as a service to insert data into Kafka; this proxy is therefore a producer (and the same goes for the taxi app and the taxi_position topic).
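To make the producer side concrete, here is a sketch of the (key, value) pair such a proxy service might publish to the taxi_position topic. The field names and timestamp format are my own assumptions for illustration; with a client library like kafka-python, this pair would be handed to `KafkaProducer.send("taxi_position", key=key, value=value)`.

```python
import json
import time

def taxi_position_record(taxi_id: str, lat: float, lon: float) -> tuple[bytes, bytes]:
    """Build the (key, value) pair sent to the taxi_position topic."""
    key = taxi_id.encode("utf-8")          # key = taxi_id: one taxi's updates stay on one partition
    value = json.dumps({
        "taxi_id": taxi_id,
        "lat": lat,
        "lon": lon,
        "ts": int(time.time() * 1000),     # event time in epoch millis (assumed format)
    }).encode("utf-8")
    return key, value
```

The passenger_position producer would look the same, keyed by passenger_id instead.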
The diagram above shows how the scenario plays out.
Then we have another topic called "pricing", which holds the computed price (the cost) of the ride. We need both the taxi position and the passenger position to calculate the price. There is a pricing computation model outside Kafka: a Kafka Streams application reads data from both position topics into this model and sends the resulting records to the pricing topic.
We can feed that back into the user application via a consumer, which on this occasion we will name the pricing service or taxi cost service.
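The Streams step above can be sketched in plain Python: keep the latest position per key from each topic, and once a passenger and a matched taxi are both known, emit a record for the pricing topic. The distance-based fare (a base charge plus a per-kilometre rate) is an illustrative assumption standing in for the real pricing model.

```python
import math

latest_taxi = {}        # taxi_id -> (lat, lon), fed from taxi_position
latest_passenger = {}   # passenger_id -> (lat, lon), fed from passenger_position

def ride_price(passenger_id: str, taxi_id: str,
               base: float = 100.0, per_km: float = 50.0) -> dict:
    """Compute a fare from the latest known positions (rough flat-earth distance)."""
    plat, plon = latest_passenger[passenger_id]
    tlat, tlon = latest_taxi[taxi_id]
    km = math.hypot(plat - tlat, plon - tlon) * 111   # ~111 km per degree, a rough approximation
    return {"passenger_id": passenger_id, "taxi_id": taxi_id,
            "price": round(base + per_km * km, 2)}
```

In the real design, the two dicts correspond to state stores that Kafka Streams builds from the two position topics, and the returned dict is what gets produced to "pricing".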
Further, the business may want data to compute various other parameters: for example, historic data to identify peak hours, or which areas have more taxis available, and so on.
Therefore, let's feed this positional data and pricing data into an analytics store, preferably Amazon S3, via Kafka Connect. We will name this proxy the analytics service.
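For the Kafka Connect piece, a sink such as Confluent's S3 sink connector can do this without any custom code. A rough sketch of what the connector config might look like follows; the bucket name, region, and flush size are placeholder assumptions you would set for your own deployment.

```json
{
  "name": "analytics-s3-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "passenger_position,taxi_position,pricing",
    "s3.bucket.name": "ride-analytics",
    "s3.region": "us-east-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "1000"
  }
}
```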
Remember, positional data is high volume; it just keeps coming in. Therefore these topics can have multiple producers.
The topics should be highly distributed, probably with more than 30 partitions. Imagine the requests coming in during New Year's Eve or the Sinhala Avurudu season.
For the keys I chose taxi_id and passenger_id, because we need records with the same ID to go to the same partition; that is how we keep the data in order.
We don't need to store the data in Kafka for long, so a low retention period is fine.
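A quick sketch of why the key choice matters: Kafka's default partitioner hashes the record key (with murmur2) and takes the result modulo the partition count, so every record with the same taxi_id lands on the same partition and therefore stays in order. CRC32 stands in for murmur2 below, purely for illustration.

```python
import zlib

NUM_PARTITIONS = 30   # the "more than 30 partitions" figure from the text

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Same key -> same partition, every time (illustrative hash, not Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```

Records without a key, by contrast, get spread across partitions, and their relative order is lost.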
The pricing topic is fed by the Kafka Streams application; this topic is also high volume.
Let's look at another application.
How would Kafka behave if the application were a social media platform?
Requirements -
This application allows people to post photos and posts, and lets others react to them.
Users should be able to post, like, and comment.
Users should see comments and likes per post in real time.
Further, users should also be able to see trending posts.
How can we implement this in Kafka?
I suggest you take a piece of paper and try this by yourself, then refer to the article again. Since we have already done one example, you won't find this difficult.
This can be implemented in many ways; there are no hard and fast rules, but I will give you my design here.
There are three topics: posts, likes, and comments.
We need to get the user posts, so let's get them via a posting service (producer).
For user likes and comments, let's create a likes-and-comments service, which contributes (sends data) to the likes and comments topics.
Now these posts get likes and comments, so we need another topic. Let's name it posts_with_reactions.
We can't do this in a traditional database, because of the high volume of data flowing in, so we are using Kafka Streams. The Kafka Streams application collects data from all three topics (posts, likes, and comments), computes the likes and comments for each post, and sends the data to the newly created topic "posts_with_reactions".
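The aggregation can be sketched in plain Python: count likes and comments per post_id and emit an updated record for posts_with_reactions on every incoming event. The event shapes are my own assumptions; in real Kafka Streams this would be an `aggregate` over streams keyed by post_id, backed by a state store.

```python
from collections import defaultdict

# running counts per post, standing in for a Kafka Streams state store
reaction_counts = defaultdict(lambda: {"likes": 0, "comments": 0})

def on_like(post_id: str) -> dict:
    """Handle one event from the likes topic; return the updated posts_with_reactions record."""
    reaction_counts[post_id]["likes"] += 1
    return {"post_id": post_id, **reaction_counts[post_id]}

def on_comment(post_id: str) -> dict:
    """Handle one event from the comments topic; return the updated posts_with_reactions record."""
    reaction_counts[post_id]["comments"] += 1
    return {"post_id": post_id, **reaction_counts[post_id]}
```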
Oh wait, we need another topic: trending_posts. Let's assume trending posts are the posts that get the most likes and comments. So let's make another Kafka Streams service: it will take data from the posts, likes, and comments topics, work out which posts have the most likes, the most comments, or both, and write the results to the trending_posts topic.
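The trending computation itself can be sketched as a top-N ranking over the per-post counts. In real Kafka Streams this would be a windowed aggregation (e.g. "most reactions in the last hour"); the plain function below ignores windowing for brevity, and the top-N of 3 is an arbitrary choice.

```python
def trending_posts(counts: dict[str, dict], top_n: int = 3) -> list[str]:
    """Return post_ids ordered by total reactions (likes + comments), highest first."""
    return sorted(counts,
                  key=lambda pid: counts[pid]["likes"] + counts[pid]["comments"],
                  reverse=True)[:top_n]
```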
Now let's send the data back to the users.
We need a consumer; let's name it the news feed service, which gives the users a fresh feed with likes and comments updating in real time.
We need another consumer (this can even be a Kafka Connect sink) to send trending posts to the user. Let's name this service the trending posts service.
Finally, the data from the two consumers is fed into the website for the users to see and experience.
There may be a lot of people posting, liking, and commenting at the same time, so the topics can have multiple producers.
For posts, user_id can be used as the key.
We need a high retention period, since the posts have to be kept for a while.
For likes and comments, post_id will be used as the key, since the likes and comments for a post should be stored in the same place for retrieval.
Easy, right? No big deal. Kafka provides all you need.

