DeMar.is

Tag Archives: engineering

How-To: VirtualBox Hive Server with Python Client

06 Monday Apr 2015

Posted by Justin DeMaris in Engineering

Tags

engineering, hadoop, hive

Getting a Hive proof of concept up and running is not very intuitive, and I had to piece the steps together from a mishmash of documents. I hope these instructions help somebody else get their proof of concept instance up and running sooner!

Baseline

My goal was to have a local running Hive Server instance that I could use to query various types of data to get a better idea of what was possible with Hive and how maintainable it would be in a future production architecture. This tutorial does not cover hardening, security, or scaling beyond a single node.

I do all of my development on a MacBook Pro with 16GB of RAM and a 2.2 GHz Intel Core i7, running OS X Yosemite. My proof of concept systems generally run on Docker using boot2docker, but in this case it was easier for me to build it directly within a VirtualBox instance. I am using Ubuntu Server 14.04.2 as my base image for the VirtualBox virtual machine.

I set the networking for the VirtualBox instance to Bridged so that my host MacBook can access it.

Setup Hadoop

1. Create a new VirtualBox image. I granted mine 1GB of RAM and 8GB of disk space and installed it with the latest Ubuntu Server ISO to get started.

2. Once installation is complete, log in as your admin user.

3. Set up the Guest Additions CD (Devices -> Insert Guest Additions CD). This will make copy-pasting from the host into the VM easier.

4. Install the required packages: sudo apt-get install -y ssh openjdk-7-jre openjdk-7-jdk wget vim

5. Make sure SSHD is working properly by executing “ssh localhost” and accepting the server's host key.

6. Work out of /tmp so we have a clean baseline: cd /tmp

7. Download Hadoop: wget http://mirrors.advancedhosters.com/apache/hadoop/common/hadoop-2.7.0/hadoop-2.7.0.tar.gz

8. Extract the archive (the file name must match the version you downloaded): tar xzf hadoop-2.7.0.tar.gz

9. sudo mv hadoop-2.7.0 /usr/local/hadoop

10. export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/jre

11. export HADOOP_PREFIX=/usr/local/hadoop

12. export PATH=/usr/local/hadoop/bin:$PATH

13. Modify /usr/local/hadoop/etc/hadoop/core-site.xml to add the following inside of the <configuration> tag:

<property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property>

14. Modify /usr/local/hadoop/etc/hadoop/hdfs-site.xml to add the following inside of the <configuration> tag:

<property><name>dfs.replication</name><value>1</value></property>

15. Set up passwordless SSH (the -P '' argument is an empty passphrase):

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
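
If you would rather script the two XML edits above than make them by hand, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper name set_property is my own invention and is not part of Hadoop. Note that ElementTree drops comments and the XML declaration when it rewrites a file, so treat this as an illustration of what the edit does rather than a polished tool.

```python
import xml.etree.ElementTree as ET


def set_property(config_path, name, value):
    """Add or update one <property> inside the <configuration> root element.

    Hypothetical helper for illustration; it rewrites the file in place and
    does not preserve comments or the XML declaration.
    """
    tree = ET.parse(config_path)
    root = tree.getroot()  # expected to be the <configuration> element

    # Reuse an existing <property> with the same <name> if there is one.
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            break
    else:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name

    value_el = prop.find("value")
    if value_el is None:
        value_el = ET.SubElement(prop, "value")
    value_el.text = value

    tree.write(config_path)


# Example calls, using the paths and values from the steps above:
# set_property("/usr/local/hadoop/etc/hadoop/core-site.xml",
#              "fs.defaultFS", "hdfs://localhost:9000")
# set_property("/usr/local/hadoop/etc/hadoop/hdfs-site.xml",
#              "dfs.replication", "1")
```

Calling it twice with the same name updates the existing entry instead of adding a duplicate.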

Prepare HDFS

1. cd /usr/local/hadoop

2. bin/hdfs namenode -format

3. Modify /usr/local/hadoop/etc/hadoop/hadoop-env.sh by adding a line at the end:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/jre

4. sbin/start-dfs.sh (be sure to say yes when SSH fingerprint verification comes up)

5. bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/justin (replace justin with your own username)

6. bin/hdfs dfs -put etc/hadoop input

Prepare Hive

1. wget http://mirrors.advancedhosters.com/apache/hive/hive-1.1.0/apache-hive-1.1.0-bin.tar.gz

2. tar xzf apache-hive-1.1.0-bin.tar.gz

3. sudo mv apache-hive-1.1.0-bin /usr/local/hive

4. export HIVE_HOME=/usr/local/hive

5. rm /usr/local/hive/lib/hive-jdbc-1.1.0-standalone.jar

6. rm /usr/local/hadoop/share/hadoop/yarn/lib/jline-0.9.94.jar

7. cp /usr/local/hive/conf/hive-env.sh.template /usr/local/hive/conf/hive-env.sh

8. Add the following lines to the end of /usr/local/hive/conf/hive-env.sh

export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/jre
export HIVE_HOME=/usr/local/hive

9. Start the HiveServer2 instance:

/usr/local/hive/bin/hive --service hiveserver2
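
Before moving on to the client, it can help to confirm that HiveServer2 is actually reachable from the host. The sketch below only checks that a TCP connection can be opened; it assumes the server is listening on port 10000 (the port the client script in this post connects to) and says nothing about whether Hive itself is healthy. The function name port_is_open is hypothetical.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
    conn.close()
    return True


# Example call from the host machine; replace the address with your VM's IP:
# print(port_is_open("192.168.20.82", 10000))
```

If this returns False, check the bridged networking setting and make sure the hiveserver2 process is still running in the VM.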

Simple Python Client

1. Back on your host computer, install the Python requirements:

sudo easy_install pip
pip install pyhs2

2. Create a simple script (test.py) – replace the host with the IP address of your VirtualBox (you can get it by running ifconfig on the VirtualBox instance):

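import pyhs2

# Connect to HiveServer2 on the VM; replace host with your VM's IP address.
with pyhs2.connect(host='192.168.20.82',
                   port=10000,
                   authMechanism='PLAIN',
                   user='hdfs',
                   password='hdfs',
                   database='default') as conn:
    with conn.cursor() as cur:
        # List the databases visible to this user.
        print(cur.getDatabases())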

3. Run the script to test it:

python test.py

You should see some output that looks like this:

[['default', '']]
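
Once the connection test works, the natural next step is running a query. Here is a small sketch of a helper for that, assuming the pyhs2 cursor exposes execute and fetch methods as described in the pyhs2 README. The run_query name is my own, and the function only relies on the connection object it is given, so it does not import pyhs2 itself.

```python
def run_query(conn, sql):
    """Execute sql on an open pyhs2-style connection and return all rows.

    Hypothetical helper: assumes conn.cursor() is a context manager whose
    cursor provides execute(sql) and fetch().
    """
    with conn.cursor() as cur:
        cur.execute(sql)
        return cur.fetch()


# Example use inside the "with pyhs2.connect(...) as conn:" block of test.py:
#     for row in run_query(conn, "SHOW TABLES"):
#         print(row)
```

Keeping the query logic in one function makes it easy to swap in other HiveQL statements while you explore what Hive can do.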

References:

https://hadoop.apache.org/docs/r1.2.1/single_node_setup.html

https://cwiki.apache.org/confluence/display/Hive/AdminManual+Installation#AdminManualInstallation-InstallingfromaTarball

https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2

https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-PythonClientDriver

How To: Apiary / GitHub README.md Integration

19 Thursday Mar 2015

Posted by Justin DeMaris in Engineering

Tags

engineering

Are you using Apiary for documenting your API and GitHub for storing your code base? Apiary has a really cool feature that will keep your markdown stored inside of your git repository and update the Apiary version every time you push a change to GitHub. To set this up, go to the Settings section of your Apiary account by clicking this icon in the top bar:

Apiary Settings Icon

Then scroll to the bottom of the page and you will see a “Link Your GitHub Repository” section:

Link GitHub

Check the box to grant access to the private repositories (if you’re connecting to a private one) and then click Connect to GitHub. Once you have finished the authorization process, you will be back on the same page in Apiary, but it will look like this:

GitHub Repo Selector

Click on the repository you want to connect it to and click Go. Apiary will add a file to the master branch of the repository named apiary.apib. If you update the Apiary documentation on their website and save it, it will do a git commit to keep the repository up to date. Even better, if you commit changes to the Apiary markdown inside of your repository and push it to GitHub, the latest documentation will be reflected in Apiary! This makes it very convenient to keep your Apiary in lock step with your code and you can even apply branching and tagging to it now.

Now for the pièce de résistance: Integration with your README.md!

As you probably know, GitHub supports a special file in the root of the repo called README.md. This is a Markdown file that almost everybody uses to display the documentation for the repository. GitHub renders it below the root file listing on the repository homepage. Since the Markdown dialects for Apiary and GitHub are reasonably similar, it would be really awesome if your Apiary documentation showed up here!

Luckily for us, git supports symlinks. Check out your repository and get rid of your existing README.md file (if any). I actually moved mine to INSTALL.md since it was more appropriate for installation than primary documentation. Now run the following from your command line (Linux and Mac Only – Sorry Windows folks, I have no clue if Windows has some version of symlinks yet):

ln -s apiary.apib README.md
git add README.md
git commit -m "Symlinking README.md to apiary.apib thanks to demar.is"
git push origin master

Voila! Other than the FORMAT: 1A line at the top, this looks great!

GitHub Apiary Link Complete
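
For anyone who prefers to script the symlink step, here is a rough Python equivalent of the ln -s command above, using only the standard library. The link_readme function name is hypothetical, and the sketch assumes a platform where os.symlink is available to your user (Linux or Mac). It creates a relative link, which is what you want so the link still resolves after the repository is cloned elsewhere.

```python
import os


def link_readme(repo_dir, target="apiary.apib", link_name="README.md"):
    """Create link_name inside repo_dir as a relative symlink to target.

    os.symlink raises an error if link_name already exists, so move or
    delete any existing README.md first.
    """
    link_path = os.path.join(repo_dir, link_name)
    # A relative target keeps the link valid wherever the repo is checked out.
    os.symlink(target, link_path)
    return link_path


# Example call from the root of a checked-out repository:
# link_readme(".")
```

After running it you still need the git add, git commit and git push steps from the commands above.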

 
