Hive commands

Varchasa Aggarwal
Apr 19 · 7 min read
Photo by Markus Spiske on Unsplash

For basics of Hive read my medium article, Hadoop-Hive.

Basics of Hadoop-Hive article consists of some topics listed below.

  • Architecture of Hive
  • Functionality of Hive
  • Limitations of Hive
  • Hive Data types
  • String types
  • Complex types.

And, this article consists of Hive commands.

Create Database

  • This command will create a database.
hive> create database <database-name>;
For Ex - create database demo;
  • This command will show all databases that are present.
hive> show databases; 
  • This command will only create a database if it is not present.
hive> create database if not exists <database-name>;
For ex - create database if not exists demo;
  • Assigning properties with the database in the form of key-value pair.
hive> create the database demo
> WITH DBPROPERTIES ('creator'='Varchasa Aggarwal', 'date'='18-04-2021');
  • Let’s retrieve the information associated with the database.
hive> describe database extended demo;

Drop Database

Delete a defined database.

  • This command will delete a database.
hive> drop database demo;
  • To check that database is deleted or not.
hive> show databases;
  • Drop database if and only if it exists.
hive> drop database if exists demo;
  • In Hive, it is not allowed to drop the database that contains the tables directly. In such a case, we can drop the database either by dropping tables first or use Cascade keyword with the command.
  • Let’s see the cascade command used to drop the database:-
hive> drop database if exists demo cascade. 

This command automatically drops the table present in the database first.

Hive — Create Table

In hive, we have two types of table —

  • Internal table
  • External table

Internal Table

  • The internal tables are also called managed tables as the lifecycle of their data is controlled by the Hive.
  • By default, these tables are stored in a subdirectory under the directory defined by hive.metastore.warehouse.dir (i.e. /user/hive/warehouse).
hive> create table demo.employee(Id int, Name string, Salary float)
> row format delimited
> fields terminated by ',';

Here, the command also includes the information that the data is separated by ‘,’.

  • Let’s see the metadata of the created table.
hive> describe demo.employee
  • Let’s create a table if it not exists.
hive> create table if not exists demo.employee(Id int, Name string, Salary float)
> row format delimited
> fields terminated by ','
  • While creating a table, we can add the comments to the columns and can also define the table properties.
hive> create table demo.new_employee(Id int comment 'Employee Id' Name string comment 'Employee Name', Salary float comment 'Employee Salary') comment 'Table Description' TBLProperties ('creator'='Varchasa Aggarwal','created at'='18-04-2021');
  • Let’s see the metadata of the created table.
hive> describe new_employee;
  • Hive allows creating a new table by using the schema of an existing table.

Schema is the skeleton structure that represents the logical view of the entire database. It defines how the data is organized and how the relations among them are associated.

hive> create table if not exists demo.copy_employee like demo.employee;

Here, we can say that the new table is a copy of an existing table.

Hive — Load Data

Once the internal table has been created, the next step is to load the data into it.

  • Let’s load the data of the file into the database by using the following command: -
load data local inpath '/home/<username>/hive/emp_details'into table demo.employee;select * from.demo.employee;

Hive — Drop Table

Let’s delete a specific table from the database.

hive> show databases;
hive> use demo;
hive> show tables;
hive> drop table new_employee;
hive> show tables;

Hive — Alter table

In Hive, we can perform modifications in the existing table like changing the table name, column name, comments, and table properties.

  • Rename a table
hive> Alter table <old_table_name> rename to <new_table_name>

Let’s check table name changed or not.

hive> show tables;
  • Adding a column —
Alter table table_name add columns(columnName datatype);
  • Change column —
hive> Alter table_name change <old_column_name> <new_column_name> datatype;
  • Delete or replace column —
alter table employee_data replace columns( id string, first_name string, age int);

Hive — Partitioning

The partitioning in hive can be done in two ways —

  • Static partitioning
  • Dynamic partitioning

Static Partitioning

In static or manual partitioning, it is required to pass the values of partitioned columns manually while loading the data into the table. Hence, the data file doesn’t contain the partitioned columns.

hive> use test;
hive> create table student (id int, name string, age int, institute string)
> partitioned by (course string)
> row format delimited
> fields terminated by ',';
  • Let’s retrieve the information.
hive> describe student;
  • Load the data into the table and pass the values of partition columns with it by using the following command: -
hive> load data local inpath '/home/<username>/hive/student_details1' into table student 
partition(course= "python");
hive> load data local inpath '/home/<username>/hive/student_details1' into table student
partition(course= "Hadoop");
  • Now retrieve the data.
hive> select * from student;
hive> select * from student where course = 'Hadoop';

Dynamic Partitioning

In dynamic partitioning, the values of partitioned columns exist within the table. So, it is not required to pass the values of partitioned columns manually.

hive> use show;
  • Enable the dynamic partitioning.
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
  • Create the dummy table.
hive> create table stud_demo(id int, name string, age int, institute string, course string)
row format delimited
fields terminated by ',';
  • Now load the data.
hive> load data local inpath '/home/<username>/hive/student_details' into table stud_demo;
  • Create a partition table.
hive> create table student_part (id int, name string, age int, institute string)
partitioned by (course string)
row format delimited
fields terminated by ',';
  • Insert the data of dummy table in the partition table.
hive> insert into student_part
partition(course)
select id, name age, institute, course
from stud_demo;
  • Now you can view the table data with the help of select command.

HiveQL — Operators

he HiveQL operators facilitate to perform various arithmetic and relational operations.

hive> use hql;
hive> create table employee (Id int, Name string , Salary float)
row format delimited
fields terminated by ',' ;
  • Now load the data.
hive> load data local inpath '/home/<username>/hive/emp_data' into table employee;
  • Fetch the data.
select * from employee;

Arithmetic Operators in Hive

  • Adding 50 to salary column.
hive> select id, name, salary + 50 from employee;
  • Substracting 50 from the salary column.
hive> select id, name, salary -50 from employee;
  • Find out the 10% salary of each employee.
hive> select id, name, salary *10 from employee;

Relational Operators in Hive

  • Fetch the details of the employee having salary>=25000.
hive> select * from employee where salary>=25000;
  • Fetch the details of the employee having salary<25000.
hive> select * from employee where salary < 25000;

Functions in Hive

hive> use hql;
hive> create table employee_data (Id int, Name string , Salary float)
row format delimited
fields terminated by ',' ;
  • Now load the data.
hive> load data local inpath '/home/<username>/hive/employee_data' into table employee;
  • Fetch the data.
select * from employee_data;

Mathematical Functions in Hive

  • Let’s see an example to fetch the square root of each employee’s salary.
hive> select Id, Name, sqrt(Salary) from employee_data ;

Aggregate Functions

  • Let’s see an example to fetch the maximum/minimum salary of an employee.
hive> select max(Salary) from employee_data;
hive> select min(Salary) from employee_data;

Other functions in Hive

  • Let’s see an example to fetch the name of each employee in uppercase.
hive> select Id, upper(Name) from employee_data;
  • Let’s see an example to fetch the name of each employee in lowercase.
hive> select Id, lower(Name) from employee_data;

GROUP BY Clause

The HQL Group By clause is used to group the data from the multiple records based on one or more column. It is generally used in conjunction with the aggregate functions (like SUM, COUNT, MIN, MAX and AVG) to perform an aggregation over each group.

hive> use hql;
hive> create table employee_data (Id int, Name string , Salary float)
row format delimited
fields terminated by ',' ;
  • Now load the data.
hive> load data local inpath '/home/<username>/hive/employee_data' into table employee;
  • Fetch the data.
select department, sum(salary) from employee_data group by department;

HAVING CLAUSE

The HQL HAVING clause is used with GROUP BY clause. Its purpose is to apply constraints on the group of data produced by GROUP BY clause. Thus, it always returns the data where the condition is TRUE.

  • Let’s fetch the sum of employee’s salary based on department having sum >= 35000 by using the following command:
hive> select department, sum(salary) from emp group by department having sum(salary)>=35000;

HiveQL — ORDER BY Clause

In HiveQL, ORDER BY clause performs a complete ordering of the query result set. Hence, the complete data is passed through a single reducer. This may take much time in the execution of large datasets. However, we can use LIMIT to minimize the sorting time.

hive> use hql;
hive> create table employee_data (Id int, Name string , Salary float)
row format delimited
fields terminated by ',' ;
  • Now load the data.
hive> load data local inpath '/home/<username>/hive/employee_data' into table employee;
  • Fetch the data.
select * from emp order by salary desc;

HiveQL — SORT BY Clause

The HiveQL SORT BY clause is an alternative of ORDER BY clause. It orders the data within each reducer. Hence, it performs the local ordering, where each reducer’s output is sorted separately. It may also give a partially ordered result.

  • Let’s fetch the data in the descending order by using the following command:
select * from emp sort by order by salary desc;

This was all about HIVE COMMANDS. Hope you have learned something from here.

Found it useful! do leave a clap. Wanna add something in this blog, add in comments.

Thank You. 🙌

Nerd For Tech

From Confusion to Clarification

Nerd For Tech

NFT is an Educational Media House. Our mission is to bring the invaluable knowledge and experiences of experts from all over the world to the novice. To know more about us, visit https://www.nerdfortech.org/.

Varchasa Aggarwal

Written by

Machine learning enthusiast | self-learner | Python www.linkedin.com/in/varchasaaggarwal06 https://varchasaaggarwal.herokuapp.com/

Nerd For Tech

NFT is an Educational Media House. Our mission is to bring the invaluable knowledge and experiences of experts from all over the world to the novice. To know more about us, visit https://www.nerdfortech.org/.