HIVE — know the unknown

Ramprakash
2 min read · Jul 25, 2020


1. Does every query in Hive trigger a MapReduce job?

No.

When you run select * from table, it won’t trigger a MapReduce job. This is because no processing is involved here; we are just reading all the records in the table. So Hive simply performs a fetch task, similar to hdfs dfs -cat <table location>. The same applies when we use the LIMIT operator.
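For example (the table name emp is made up here), a plain select or select with LIMIT runs as a fetch task, while an aggregation needs real processing and launches a job. Which simple queries get converted to fetch tasks is controlled by the hive.fetch.task.conversion setting, so this can vary between Hive versions:

-- Runs as a simple fetch task, no MapReduce job; Hive just reads the table files
SELECT * FROM emp;
SELECT * FROM emp LIMIT 10;

-- An aggregation needs actual processing, so a MapReduce/Tez job is launched
SELECT dept, COUNT(*) FROM emp GROUP BY dept;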

2. Do both of these queries give the same result?

-- Statement 1
CREATE TABLE new AS SELECT * FROM old WHERE 1=0;
-- Statement 2
CREATE TABLE new LIKE old;

No.

If the old table is a regular (non-partitioned) table, the results are the same. But if the old table is partitioned, they are not: in statement 1 the new table is created without partitions, whereas in statement 2 it is created with the same partition definition as the old table.
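A quick sketch of the difference (table and column names are made up); DESCRIBE FORMATTED shows that only the LIKE copy keeps the partition definition:

-- Assume the old table is partitioned by country
CREATE TABLE old (id INT, name STRING) PARTITIONED BY (country STRING);

-- Statement 1: country becomes a regular column in new1, no partitions
CREATE TABLE new1 AS SELECT * FROM old WHERE 1=0;

-- Statement 2: new2 copies the full definition, including the partition on country
CREATE TABLE new2 LIKE old;

DESCRIBE FORMATTED new1; -- no partition information
DESCRIBE FORMATTED new2; -- country listed as a partition column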

3. Will the load succeed if we load wrong data into a Hive table?

Yes.

When we load data into a Hive table, Hive just moves the specified file into the table’s location. It does not read the data, so no errors are generated while loading; the error appears only when you try to read the data. Hive is schema-on-read, i.e. it validates data against the schema while reading, unlike a regular RDBMS, which validates while writing (schema-on-write).
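As an illustration (the file path, table, and delimiter are hypothetical), LOAD DATA only moves the file into the table directory, so any mismatch surfaces only when the table is read:

CREATE TABLE sales (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- The file is just moved into the table location; nothing is validated here
LOAD DATA INPATH '/tmp/wrong_data.csv' INTO TABLE sales;

-- The schema is applied only now (schema on read); mismatched fields
-- fail or come back as NULL
SELECT * FROM sales;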

4. Why does ORC give better performance with Hive than Parquet?

Both ORC and Parquet store data in a columnar format, and both support compression.

But there is one key difference: Hive’s vectorized query execution supports only the ORC file format. With vectorization enabled, Hive processes 1024 rows at a time; by default it processes one row at a time. Also, the ORC file format keeps an index in each block.

Also, to support ACID transactions in Hive, the file format must be ORC and the table must be transactional and bucketed.
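A sketch of the relevant settings and DDL (the table name, columns, and bucket count are illustrative; exact properties can differ between Hive versions):

-- Enable vectorized execution so ORC data is processed in batches of 1024 rows
SET hive.vectorized.execution.enabled = true;

-- ACID in Hive needs an ORC, bucketed, transactional table
CREATE TABLE orders (id INT, amount DOUBLE)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');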

I will add a few more points later. Hope you had a quality time.

Thanks
