Do you know you can use Variables in Hive?
Have you every wondered if you can use variables in the Apache Hive database?
Hive variables are key-value pairs that can be set using the set command and they can be used in scripts and Hive SQL. The values of the variables in Hive scripts are substituted during the query construct.
In this article, I will explain Hive variables, how to create and set values to the variables and use them on Hive QL and scripts, and finally pass them through the command line.
What are the Hive variables?
When working with Hive QL and scripts we are often required to use specific values for each environment, and hard-coding these values on code is not a good practice as the values change for each environment.
To overcome this tight coupling of environment-specific values within the Hive QL script code, we can externalize these by creating variables and setting values outside of the scripts.
These variables are similar to Unix variables.
You can set these variables on Hive CLI (older version), Beeline, and Hive scripts.
Note that when you set values to variables they are local to the active Hive session and these values are not visible to other sessions.
Create and Set Hive variables
Hive stores variables in four different namespaces, namespace is a way to separate variables.
- hiveconf
- hivevar
- system, and
- env.
Hive Config Variables (hiveconf)
hiveconf is the default namespace, if you don’t provide a namespace at the time of setting a variable, it will store your variable in hiveconf namespace by default. hiveconf namespace also contains several Hive default configuration variables.
The below examples set’s emp to table variable in hiveconf namespace.
0: jdbc:hive2://127.0.0.1:10000>SET table='emp';
To retrieve the variable from hiveconf
, you have to explicitly specify the namespace. Not specifying the namespace returns an error.
0: jdbc:hive2://127.0.0.1:10000>SELECT * FROM ${hiveconf:table}
# Below statement returns an error
0: jdbc:hive2://127.0.0.1:10000>SELECT * FROM ${table}
Hive Custom Variables (hivevar)
Hive version 0.8.0 introduced a new namespace hivevar
to set the custom variables (JIRA HIVE-2020), this separates custom variables from Hive default config variables. Hive recommends using hivevar
explicitly for custom variables.
The below example sets the emp value to the table variable in hivevar
namespace.
0: jdbc:hive2://127.0.0.1:10000>SET hivevar:table='emp';
To retrieve values from hivevar
namespace, you can either specify hivevar
namespace or ignore it as hivevar
is a default namespace for retrieval.
0: jdbc:hive2://127.0.0.1:10000>SELECT * FROM ${table}
(or)
0: jdbc:hive2://127.0.0.1:10000>SELECT * FROM ${hivevar:table}
Hive System Variables (system)
Hive default provides certain system variables and all system variables can be accessed in Hive using system
namespace. For example.
system:java.version
system:user.timezone
system:user.home
Hive Environment Variables (env)
Hive also default provides certain environment variables and all environment variables can be accessed in Hive using env
namespace. For example
env:PWD
env:USER
3. Set Variables in Hive Scripts
Hive scripts support using all variables explained above, you can use any of these along with their namespace.
let’s create test.hql
file with the below content. If you notice, I am referring to the table name from hivevar
namespace.
0: jdbc:hive2://127.0.0.1:10000>SELECT * FROM ${hivevar:table};
Execute the test.hql
script by running the below command.
bin/beeline -u jdbc:hive2://127.0.0.1:10000 scott tiger -f test.hql
(or)
bin/hive -f test.hql
Before executing the queries mentioned in the script, hive replaces the variables with the actual values from the namespace during the query construct stage.
4. How to set and pass command line values to Hive scripts?
Hive also supports setting a variable from the command line when starting a Hive CLI or beeline.
bin/beeline -u jdbc:hive2://127.0.0.1:10000 scott tiger --hivevar table='emp'
bin/hive --hivevar table='emp'
You can refer to this variable in Hive CLI.
0: jdbc:hive2://127.0.0.1:10000>SELECT & FROM ${hivevar:table};
You can also call test.hql
script by setting command line variables.
bin/beeline -u jdbc:hive2://127.0.0.1:10000 scott tiger --hivevar table='dept' -f test.hql
(or)
bin/hive --hivevar table='dept' -f test.hql
This should select data from the dept
table.
5. List all Hive variables
From Hive scripts you can access the environment (env), system, hive configuration, and custom variables. since there are many variables it’s hard to remember all of them, use a set to list variables in the hive
terminal
0: jdbc:hive2://127.0.0.1:10000>set;
(or)
0: jdbc:hive2://127.0.0.1:10000>set -v; #This lists all variables with namespace
This set command lists all available variables and configurations in Hive.
6. How variable substitute works?
Hive substitutes the value for a variable when a query is constructed with the variable.
- Values on Hive variables are visible only to active sessions where it’s been assigned and they cannot be accessible from another session.
7. Disabling Variable Substitution
By default Hive substitutes all variables, you can disable these using (hive.variable.substitute=true) in case you want to run a script without substitution variables.
0: jdbc:hive2://127.0.0.1:10000>set hive.variable.substitute=false;
I hope this helps!!
Happy Learning !!