Log in to the AWS console, search for EMR, and click EMR.
Create a security configuration named ns-emr-kdc.
Choose “Enable Kerberos authentication”
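For readers who prefer scripting to the console, the same security configuration can be sketched with boto3. This is a minimal sketch, not the method used in the screenshots: it assumes a cluster-dedicated KDC, and the 24-hour ticket lifetime and the `us-east-1` region are illustrative values, not settings taken from this walkthrough.

```python
import json


def kerberos_security_configuration(ticket_lifetime_hours=24):
    """Build the security configuration document for a cluster-dedicated KDC.

    The 24-hour default is an illustrative assumption.
    """
    return {
        "AuthenticationConfiguration": {
            "KerberosConfiguration": {
                "Provider": "ClusterDedicatedKdc",
                "ClusterDedicatedKdcConfiguration": {
                    "TicketLifetimeInHours": ticket_lifetime_hours
                },
            }
        }
    }


def create_security_configuration(name="ns-emr-kdc", region="us-east-1"):
    """Create the configuration in EMR (needs AWS credentials and boto3)."""
    import boto3  # imported lazily so the builder above works without it

    emr = boto3.client("emr", region_name=region)
    return emr.create_security_configuration(
        Name=name,
        SecurityConfiguration=json.dumps(kerberos_security_configuration()),
    )
```

Calling `create_security_configuration()` should leave a configuration named ns-emr-kdc that can be selected when the cluster is created.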
Create the cluster (ns-blog-emr): click Create cluster, then “Go to advanced options”.
Choose the services that you need.
You can choose Spot Instances if you like, then press Next.
Choose a cluster name and add tags if needed.
Important: choose an EC2 key pair that you already have, or create a new one using this link.
Choose the security configuration that we created (ns-emr-kdc).
Let’s fill in the security settings: the realm and the KDC admin password. You will need this information at a later stage, during principal creation.
Press Create cluster.
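If the cluster is created from code rather than the console, the realm and KDC admin password entered above correspond to the `KerberosAttributes` argument of boto3's `run_job_flow`. The helper below is a sketch that builds only those security-related arguments. The realm `EC2.INTERNAL` and the password are placeholders, not values from this walkthrough.

```python
def kerberos_cluster_settings(realm, kdc_admin_password,
                              security_configuration="ns-emr-kdc"):
    """Return the security-related keyword arguments for run_job_flow."""
    return {
        "SecurityConfiguration": security_configuration,
        "KerberosAttributes": {
            "Realm": realm,
            "KdcAdminPassword": kdc_admin_password,
        },
    }


# Placeholder values: substitute your own realm and a real secret.
settings = kerberos_cluster_settings("EC2.INTERNAL", "change-me")
# These would be merged into a full call, for example:
#   boto3.client("emr").run_job_flow(Name="ns-blog-emr", ..., **settings)
```

Keeping these arguments in one place makes it harder to lose track of the realm, which is needed again when principals are created.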
Cluster creation is in progress. Click Hardware; you will need this information to SSH into the EC2 instances.
Hardware tab: click the master ID.
Let’s add a new user called neeraj.sab and add a principal so that the user can run EMR Spark and Hive jobs. This user needs to exist on all the EMR nodes.
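The commands behind this step might look like the following sketch, which only builds and prints the command lines. It assumes a cluster-dedicated KDC running on the master node, so `kadmin.local` is run there, while `useradd` is run on every node. `kadmin.local` will prompt for the new principal's password.

```python
import shlex


def add_user_command(user):
    """Linux account; run this on every node of the cluster."""
    return ["sudo", "useradd", user]


def add_principal_command(user):
    """Kerberos principal; run this on the master node, where the KDC lives."""
    return ["sudo", "kadmin.local", "-q", f"addprinc {user}"]


for cmd in (add_user_command("neeraj.sab"), add_principal_command("neeraj.sab")):
    print(shlex.join(cmd))
```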
Let’s create an HDFS directory for user neeraj.sab.
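As a sketch, the HDFS home directory can be created and handed over to the user with two `hdfs dfs` commands. The helper below only assembles them. It assumes they are run by an account with HDFS superuser rights and a valid Kerberos ticket, and that a group with the same name as the user exists.

```python
def hdfs_home_commands(user):
    """Commands that create /user/<user> and give the user ownership of it."""
    home = f"/user/{user}"
    return [
        ["hdfs", "dfs", "-mkdir", "-p", home],
        # Assumes a group named after the user exists.
        ["hdfs", "dfs", "-chown", f"{user}:{user}", home],
    ]


for cmd in hdfs_home_commands("neeraj.sab"):
    print(" ".join(cmd))
```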
Let’s run a pyspark job as user neeraj.sab.
demo.txt exists on HDFS under /user/neeraj.sab/.
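The walkthrough does not say what the job does, so the following is only an illustrative stand-in: a function that counts the lines of demo.txt. It assumes it is called from the pyspark shell, where a `SparkSession` named `spark` already exists, and that neeraj.sab holds a valid Kerberos ticket (obtained with `kinit`).

```python
def count_lines(spark, path="/user/neeraj.sab/demo.txt"):
    """Return the number of lines in a text file on the cluster's default
    filesystem (HDFS on EMR)."""
    return spark.read.text(path).count()


# In the pyspark shell:
#   count_lines(spark)
```

Without a ticket, the read should fail with an authentication error rather than return a count.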
EMR Hive example
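On a Kerberos-enabled cluster, beeline needs the HiveServer2 service principal in its JDBC URL. A hedged sketch of how that URL might be assembled follows. The host name and realm are hypothetical placeholders, and the `hive/<host>@<realm>` principal form and port 10000 are assumptions about the cluster's defaults.

```python
def hive_jdbc_url(host, realm, port=10000, database="default"):
    """Build a HiveServer2 JDBC URL carrying the Kerberos service principal."""
    return (
        f"jdbc:hive2://{host}:{port}/{database};"
        f"principal=hive/{host}@{realm}"
    )


# Hypothetical master node name and realm:
url = hive_jdbc_url("ip-10-0-0-1.ec2.internal", "EC2.INTERNAL")
print(f'beeline -u "{url}"')
```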
What happens if a user has no principal in the KDC and no Kerberos ticket? We have an OS user, bosco.
User bosco is logged in, but beeline fails because there is no authentication for user bosco in the Kerberos-enabled EMR cluster.
Let’s create a principal called bosco and generate a Kerberos ticket.
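Once the principal exists and bosco has run `kinit bosco`, the ticket can be checked before launching beeline. The sketch below relies on `klist -s`, which prints nothing and exits with status 0 only when the credential cache holds a valid ticket. The `run` parameter is a hypothetical hook added here so the function can be exercised without Kerberos installed.

```python
import subprocess


def has_kerberos_ticket(run=subprocess.run):
    """Return True when `klist -s` reports a valid ticket for this user."""
    try:
        result = run(["klist", "-s"])
    except FileNotFoundError:
        # klist is not installed on this machine.
        return False
    return result.returncode == 0


if not has_kerberos_ticket():
    print("No valid Kerberos ticket; run kinit first.")
```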
Now user bosco can run beeline and execute other operations, such as pyspark. This user needs to exist on all the nodes of the EMR cluster.