Kafka Connect API
Well, let's start like this: if a reliable resource is available, we can always use it without reinventing the wheel. That's the whole idea behind Kafka Connect. We already know there are many sources we can import data into Kafka from, and you are not the first person to pull data from a source and direct it to Kafka.
So what is Kafka Connect?
Kafka Connect is basically a framework that is included in Apache Kafka.
Its main purpose is to connect other systems to Kafka.
So to exchange data between Kafka and another system, we need to set up Kafka connectors.
There are two types of connectors:
source connectors
sink connectors
Source connectors import data into Kafka from different sources. For example, to push real-time tweets from Twitter into Kafka we can use a Twitter source connector. We can pull data from many sources, including Elasticsearch, files and directories, Jenkins, GitHub, CouchDB, Apache Ignite, Solr and many more; there are connectors available for each.
Sink connectors, on the other hand, export data from Kafka to a different system. This target may be a relational database, a NoSQL database or even Elasticsearch.
Connectors are written by Confluent, other vendors or third parties to facilitate this exchange of data between Kafka and other systems.
Here you will find a complete list of the available connectors, both sink and source.
Therefore you don't need to write new code; instead, you fetch the code that is already there and adjust the configurations so that it suits your purpose.
Isn't it easy?
So the Kafka Connect API makes life easy and helps you connect Kafka to almost any known popular system. This flexibility is what makes Kafka so powerful.
Kafka Connect is all about connectors, reuse, and simplifying getting data in and out of Kafka.
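The idea can be sketched in a few lines of Python. To be clear, this is only an illustration, not the real Kafka Connect API (which is Java-based and which you normally never have to touch): a source connector moves records from an external system into a Kafka topic, and a sink connector moves them back out.

```python
# Illustration only: this is NOT the real Kafka Connect API.
# Lists stand in for the external systems and the Kafka topic.

def source_connector(external_system, kafka_topic):
    """Poll an external system and append each record to a Kafka topic."""
    for record in external_system:   # e.g. tweets, DB rows, file lines
        kafka_topic.append(record)   # in reality: a producer write

def sink_connector(kafka_topic, external_system):
    """Read records from a Kafka topic and write them to an external system."""
    for record in kafka_topic:
        external_system.append(record)   # in reality: write to a DB, Elasticsearch, etc.

tweets = ["tweet one", "tweet two"]  # pretend Twitter is the source
topic = []                           # pretend this list is a Kafka topic
database = []                        # pretend this list is a sink database

source_connector(tweets, topic)
sink_connector(topic, database)
print(database)  # -> ['tweet one', 'tweet two']: records flowed source -> Kafka -> sink
```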
So our objective is to pull data from Twitter into Kafka.
For that we will need a Twitter source connector.
If you follow the above link and scroll down, you will find the Twitter source connector with two links next to it. I will use this one.
Read it carefully and go to the configurations section.
There you will find the configurations explained:
name=connector1
tasks.max=1
connector.class=com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector
# Set these required values
twitter.oauth.accessTokenSecret=
process.deletes=
filter.keywords=
kafka.status.topic=
kafka.delete.topic=
twitter.oauth.consumerSecret=
twitter.oauth.accessToken=
twitter.oauth.consumerKey=
For the first configuration, twitter.oauth.accessTokenSecret, and the last three, you will receive the tokens necessary to fill in these values when you register for a Twitter developer account.
Go here and register if you haven't; they will review your application and then allow you to extract tweets from Twitter.
process.deletes=
filter.keywords=
kafka.status.topic=
kafka.delete.topic=
These four we have to fill in ourselves.
Go to the releases page and get the latest zip file.
Include all the jars from the zip file in your project.
In your config folder there is a file named connect-standalone.properties. Open it; it will have the defaults, and at the bottom, in the plugin.path field, set the path to the jars you added to the project.
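For example, assuming you unpacked the connector jars into a folder named connectors (the path below is just a placeholder; use wherever you actually put the jars):

```properties
# bottom of config/connect-standalone.properties
plugin.path=/path/to/your/kafka-connect/connectors
```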
Make another file, twitter.properties, and include these keys with their values:
name=connector1
tasks.max=1
connector.class=com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector
# Set these required values
twitter.oauth.accessTokenSecret=
process.deletes=
filter.keywords=
kafka.status.topic=twitter_status_connect
kafka.delete.topic=
twitter.oauth.consumerSecret=
twitter.oauth.accessToken=
twitter.oauth.consumerKey=
Create a topic for the Kafka connector to write to and include it at kafka.status.topic:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic twitter_status_connect
Create the other topic too:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic twitter_deletes_connect
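If you want to double-check that both topics were created, you can list the topics (same ZooKeeper address as in the commands above; this obviously needs your cluster running):

```shell
bin/kafka-topics.sh --list --zookeeper localhost:2181
```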
Now go ahead and add this newly created topic to the configs as well:
name=connector1
tasks.max=1
connector.class=com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector
# Set these required values
twitter.oauth.accessTokenSecret= from Twitter developer credentials
process.deletes=false
filter.keywords=cricket
kafka.status.topic=twitter_status_connect
kafka.delete.topic=twitter_deletes_connect
twitter.oauth.consumerSecret= from Twitter developer credentials
twitter.oauth.accessToken= from Twitter developer credentials
twitter.oauth.consumerKey= from Twitter developer credentials
I made process.deletes false, and the keyword was set to cricket so that we pull tweets with the word cricket in them.
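Before launching the worker, it can save a restart to check that every required key actually has a value. Here is a small throwaway Python helper (not part of Kafka or the connector; purely a convenience sketch) that parses properties-style text and reports required keys that are still empty:

```python
# Throwaway helper (not part of Kafka): report required keys that are
# still empty in a .properties-style configuration.
REQUIRED_KEYS = [
    "name", "connector.class",
    "twitter.oauth.accessTokenSecret", "twitter.oauth.consumerSecret",
    "twitter.oauth.accessToken", "twitter.oauth.consumerKey",
    "process.deletes", "filter.keywords",
    "kafka.status.topic", "kafka.delete.topic",
]

def missing_keys(properties_text):
    """Parse key=value lines and return the required keys with no value."""
    props = {}
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return [k for k in REQUIRED_KEYS if not props.get(k)]

example = """name=connector1
tasks.max=1
connector.class=com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector
process.deletes=false
filter.keywords=cricket
kafka.status.topic=twitter_status_connect
kafka.delete.topic=twitter_deletes_connect
"""
print(missing_keys(example))  # the four OAuth keys are still empty here
```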
Make a folder kafka-connect and put in it: the connect-standalone.properties file, a connectors folder with the jars in it, a run.sh file where we have the commands to run, and the twitter.properties file.
bin/connect-standalone.sh connect-standalone.properties twitter.properties
What this command basically says is: use connect-standalone.properties as the worker configuration, and then run the connector defined in twitter.properties.
When you run this command you will see data being pulled from Twitter, and you can watch the records arrive from a consumer console.
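For example, a console consumer on the status topic created earlier (assuming the broker runs on localhost:9092) will print the tweets as they arrive:

```shell
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic twitter_status_connect --from-beginning
```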
So how easy was it? Get your connector from the Confluent page, set its properties and configurations, include the jars in your project, create the necessary topics to which the Twitter connector should send data, then run your connector. A walk in the park.
These are the fundamentals. I strongly suggest you do some research yourself about Kafka Connect; it's a vast area even though the idea sounded simple.
So good luck! You can refer to the official Kafka documentation if you want to deep dive into this topic.

