Most common errors when setting up Amazon EMR

Nowsath
2 min read · Nov 14, 2023

AWS EMR configuration with DynamoDB

In this article, I’ll guide you through resolving common errors that often arise during the configuration of Amazon EMR with DynamoDB.

Error - 1

Could not lookup table test_ddb in DynamoDB.
In this case, my DynamoDB table name is test_ddb.

Insufficient permissions to access DynamoDB can lead to this kind of error when you attempt to create an external table backed by DynamoDB.

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: Could not lookup table test_ddb in DynamoDB.

Solution:
Add the AWS access key ID and the AWS secret access key as properties in the Hadoop configuration file.

File name: core-site.xml
File path: /etc/hadoop/conf/core-site.xml

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>NKKIXXXXXXXXTRDQDPNG</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>TYwQnTXXXXxxxxXXXX9kvVc54</value>
</property>

In certain instances, it might be necessary to include the same properties in the tez-site.xml file too.

File path: /etc/tez/conf/tez-site.xml
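
Before retrying the Hive statement, it can help to confirm that both properties actually landed in the file. The sketch below is only an illustration and not part of EMR or Hadoop: the function name missing_properties and the inline sample XML are assumptions made for this example. On a cluster you would read /etc/hadoop/conf/core-site.xml (or /etc/tez/conf/tez-site.xml) instead of the sample string.

```python
import xml.etree.ElementTree as ET

# The two credential properties named in the solution above.
REQUIRED = ("fs.s3.awsAccessKeyId", "fs.s3.awsSecretAccessKey")

# Stand-in for the real file content; the value is a placeholder, not a real key.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>PLACEHOLDER_ACCESS_KEY</value>
  </property>
</configuration>"""


def missing_properties(xml_text, required=REQUIRED):
    """Return the required property names that are absent or have an empty value."""
    root = ET.fromstring(xml_text)
    present = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        present[name] = value
    return [key for key in required if not present.get(key)]


if __name__ == "__main__":
    # The sample only defines the access key ID, so the secret key is reported.
    print(missing_properties(SAMPLE_CORE_SITE))
```

With the sample content, the function reports fs.s3.awsSecretAccessKey as missing. An empty list means both properties are present with non-empty values.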

Error - 2

Execution errors for any DB queries.

When you query data from the external table, this error may arise because of missing properties in the Tez configuration.

hive> select count(*) from ddb_testtable;
Query ID = hadoop_20231112163703_8e8fd7d7-0a00-45ff-97d6-c4cf11a58ad5
Total jobs = 1
Launching Job 1 out of 1
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask

Solution:
Add the following property to the Hive configuration file.

File name: hive-site.xml
File path: /etc/hive/conf/hive-site.xml

<property>
  <name>hive.conf.hidden.list</name>
  <value>javax.jdo.option.ConnectionPassword,hive.server2.keystore.password,fs.s3a.proxy.password,dfs.adls.oauth2.credential,fs.adl.oauth2.credential</value>
</property>
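
The value above does not contain the two S3 credential keys set for Error 1. The article does not say why, so treat the following as an assumption: the override is meant to stop Hive from hiding those keys from the job configuration. Under that assumption, this small sketch checks a hidden list for the credential keys. The function name hidden_credentials is hypothetical.

```python
# Keys from the core-site.xml step; assumed here to be the ones that must not be hidden.
CREDENTIAL_KEYS = ("fs.s3.awsAccessKeyId", "fs.s3.awsSecretAccessKey")

# The hive.conf.hidden.list value from the solution above.
HIDDEN_LIST = (
    "javax.jdo.option.ConnectionPassword,hive.server2.keystore.password,"
    "fs.s3a.proxy.password,dfs.adls.oauth2.credential,fs.adl.oauth2.credential"
)


def hidden_credentials(hidden_list, keys=CREDENTIAL_KEYS):
    """Return the credential keys that still appear in a comma-separated hidden list."""
    hidden = {item.strip() for item in hidden_list.split(",") if item.strip()}
    return [key for key in keys if key in hidden]


if __name__ == "__main__":
    # An empty list means neither credential key is hidden.
    print(hidden_credentials(HIDDEN_LIST))
```

If the function returns a non-empty list for the value in your own hive-site.xml, those keys are still listed as hidden.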

Error - 3

Hive Runtime Error while processing row.

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row 
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:996)

This type of error may occur due to datatype mapping issues arising from unsupported formats in Hive.

Solution:
Set these two properties to false in the Hive terminal before running the query.

set hive.vectorized.execution.enabled=false;
set hive.vectorized.execution.reduce.enabled=false;

Error - 4

Hive Runtime Error while processing writable.

Caused by: java.lang.NumberFormatException: For input string: "240381698172046689239"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

This kind of error can be caused by datatype limitations: the number is too large to convert to an integral type. According to the Apache Hive documentation on Numeric Types, the maximum value for a BIGINT is 9223372036854775807, and the input 240381698172046689239 exceeds that limit.

Solution:
Refer to the Apache Hive documentation on Numeric Types to handle large numeric values.
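
To see why the value from the stack trace fails, the sketch below compares a numeric string against the BIGINT range (a signed 64-bit integer) and counts its digits. The helper names fits_bigint and digits_needed are made up for this example. The Hive documentation lists DECIMAL with up to 38 digits of precision, but whether the DynamoDB connector accepts that type for a given column is not covered in this article, so verify it before relying on it.

```python
from decimal import Decimal

# Hive BIGINT is a signed 64-bit integer.
BIGINT_MIN = -(2**63)
BIGINT_MAX = 2**63 - 1  # 9223372036854775807


def fits_bigint(text):
    """Return True if the integer written in `text` is within the BIGINT range."""
    value = int(text)
    return BIGINT_MIN <= value <= BIGINT_MAX


def digits_needed(text):
    """Return the number of significant digits in the numeric string."""
    return len(Decimal(text).as_tuple().digits)


if __name__ == "__main__":
    sample = "240381698172046689239"  # value from the NumberFormatException above
    print(fits_bigint(sample), digits_needed(sample))
```

The sample value has 21 digits, while the largest BIGINT has 19, so fits_bigint returns False for it.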

Conclusion

These are the primary errors I encountered while setting up Amazon EMR with DynamoDB for data backfilling. I will add any further issues here as they arise.

If you encounter any other issues, please feel free to mention them in the comment section.

Also, be sure to give me a follow too!!

Nowsath

DevOps/Cloud Engineer | AWS Community Builder | CKA | RHCSA | Docker & Kubernetes Enthusiast