Sharing The Knowledge

Massoud Mazar

Setting up Hadoop/Hive to use MySQL as metastore

In a previous post I showed how to set up Hadoop/Hive to use Derby in server mode as the metastore. Many believe MySQL is a better choice for this purpose, so here I'm going to show how to configure the cluster we created previously to use a MySQL server as the Hive metastore.

First we need to install MySQL. In this scenario, I'm going to install MySQL on our Master node, which is named centos1.

When logged in as root user:

yum install mysql-server

Now make sure MySQL server is started:

/etc/init.d/mysqld start

Next, from the mysql client (started by running mysql as root), I'm going to create a new MySQL user for hadoop/hive:

mysql> CREATE USER 'hadoop'@'centos1' IDENTIFIED BY 'hadoop';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hadoop'@'centos1' WITH GRANT OPTION;
mysql> exit
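
The grant above gives the hadoop account every privilege on every database. As a narrower alternative, here is a minimal sketch that limits the account to a single metastore database. The database name hive is an assumption made for illustration (it does not come from this post), and the script only writes the statements to a scratch file so they can be reviewed before being applied.

```shell
# Sketch only: write a narrower grant script to a scratch file for review.
# Assumption: the metastore database is called "hive" (hypothetical name).
SQL_FILE="${SQL_FILE:-$(mktemp)}"

cat > "$SQL_FILE" <<'EOF'
CREATE USER 'hadoop'@'centos1' IDENTIFIED BY 'hadoop';
GRANT ALL PRIVILEGES ON hive.* TO 'hadoop'@'centos1';
EOF

# Nothing is sent to MySQL here; applying the file is a separate manual step.
echo "Review $SQL_FILE, then apply it as the MySQL root user:"
echo "  mysql -u root < $SQL_FILE"
```

This would be used instead of the two statements above, not in addition to them, since the CREATE USER statement fails if the account already exists.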

To make sure this new user can connect to the MySQL server, switch to user hadoop:

su - hadoop
mysql -h centos1 -u hadoop -p

We need to change the Hive configuration so it can use MySQL:

nano /hadoop/hive/conf/hive-site.xml

and the new configuration values are:
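
As a minimal sketch, a MySQL-backed metastore is usually configured through the four javax.jdo.option connection properties in hive-site.xml. The values below are assumptions based on the user and host created earlier: the database name hive and the createDatabaseIfNotExist=true flag in particular are illustrative choices. The script writes the file to a scratch directory so it can be compared with the real hive-site.xml rather than overwriting it.

```shell
# Sketch: generate a minimal hive-site.xml in a scratch directory for review.
# Assumed values (not confirmed by the post): database "hive" on host centos1,
# user "hadoop" with password "hadoop".
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

cat > "$OUT_DIR/hive-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://centos1/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hadoop</value>
  </property>
</configuration>
EOF

echo "Wrote $OUT_DIR/hive-site.xml"
```

The driver class com.mysql.jdbc.Driver is the one shipped in the 5.1 series of MySQL Connector/J, which is the version copied into hive/lib in this post.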

Some of the above parameters do not match what we did to set up the Derby server in the previous post, so I decided to delete the file:

rm /hadoop/hive/conf/

Hive needs the MySQL JDBC driver, so we need to download the Connector/J archive (here into /hadoop) and copy its jar to the hive/lib folder:

cd /hadoop
tar -xvzf mysql-connector-java-5.1.11.tar.gz
cp mysql-connector-java-5.1.11/*.jar /hadoop/hive/lib
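
A quick way to confirm the copy worked is to check that a Connector/J jar is actually sitting in the lib folder. The following is a small sketch; has_connector_jar is a hypothetical helper name, and HIVE_LIB is an assumed variable that defaults to the path used in this post.

```shell
# Sketch: report whether a MySQL Connector/J jar is present in a lib directory.
# has_connector_jar is a hypothetical helper, not part of Hive or MySQL.
has_connector_jar() {
  lib_dir="$1"
  for jar in "$lib_dir"/mysql-connector-java-*.jar; do
    # If the glob matched nothing, $jar is the literal pattern and -f fails.
    [ -f "$jar" ] && return 0
  done
  return 1
}

# HIVE_LIB is an assumed override; the default is the folder used above.
if has_connector_jar "${HIVE_LIB:-/hadoop/hive/lib}"; then
  echo "MySQL JDBC driver found"
else
  echo "MySQL JDBC driver missing"
fi
```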

To make sure all settings are correct, start the Hive CLI and list the tables:

cd /hadoop/hive
bin/hive
hive> show tables;
