7/16/2015

Building Hadoop 2.4.0 on Mac OS X Yosemite 10.10.3 with native components

Install prerequisites

We'll need these for the actual build.

sudo port install cmake gmake gcc48 zlib gzip maven32 apache-ant
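Before going further, it doesn't hurt to check that the build tools are actually on the PATH. (If mvn isn't found under that name, `port contents maven32` will show which binary names the port installed - that part depends on your MacPorts setup.)

cmake --version
ant -version
mvn -version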

Install protobuf 2.5.0

As the latest version in MacPorts is currently 2.6.x, we need to stick to an earlier revision of the port:

cd ~/tools
svn co http://svn.macports.org/repository/macports/trunk/dports/devel/protobuf-cpp -r 105333
cd protobuf-cpp/
sudo port install

To verify:

protoc --version
# libprotoc 2.5.0

Acquire sources

As I needed an exact version for my work to reproduce an issue, I'll go with version 2.4.0 for now. I suppose some of the fixes will work with earlier or later versions as well. Look around in the tags folder for other versions.

cd ~/dev
svn co http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0 hadoop-2.4.0
cd hadoop-2.4.0
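If you want to double-check that the working copy really points at the 2.4.0 tag, svn info will tell you:

svn info | grep URL
# URL: http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0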

Fix sources

We need to patch JniBasedUnixGroupsNetgroupMapping.c: on OS X setnetgrent() is declared to return void, so the upstream code comparing its return value to 1 doesn't compile:

patch -p0 <<EOF
--- hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c.orig 2015-07-16 17:14:20.000000000 +0200
+++ hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c 2015-07-16 17:17:47.000000000 +0200
@@ -74,7 +74,7 @@
   // endnetgrent)
   setnetgrentCalledFlag = 1;
 #ifndef __FreeBSD__
-  if(setnetgrent(cgroup) == 1) {
+  setnetgrent(cgroup); {
 #endif
     current = NULL;
     // three pointers are for host, user, domain, we only care

EOF
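If patch complains about offsets or rejected hunks (line numbers can drift a bit between tags), it's worth checking the result before moving on:

grep -n 'setnetgrent(cgroup)' hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c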

As well as container-executor.c, where OS X doesn't define LOGIN_NAME_MAX, has a four-argument mount() instead of Linux's five, and lacks fcloseall() (hence the small implementation appended at the end of the patch):

patch -p0 <<EOF
--- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c.orig 2015-07-16 17:49:15.000000000 +0200
+++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c 2015-07-16 18:13:03.000000000 +0200
@@ -498,7 +498,7 @@
   char **users = whitelist;
   if (whitelist != NULL) {
     for(; *users; ++users) {
-      if (strncmp(*users, user, LOGIN_NAME_MAX) == 0) {
+      if (strncmp(*users, user, 64) == 0) {
         free_values(whitelist);
         return 1;
       }
@@ -1247,7 +1247,7 @@
               pair);
     result = -1; 
   } else {
-    if (mount("none", mount_path, "cgroup", 0, controller) == 0) {
+    if (mount("none", mount_path, "cgroup", 0) == 0) {
       char *buf = stpncpy(hier_path, mount_path, strlen(mount_path));
       *buf++ = '/';
       snprintf(buf, PATH_MAX - (buf - hier_path), "%s", hierarchy);
@@ -1274,3 +1274,21 @@
   return result;
 }
 
+int fcloseall(void)
+{
+    int succeeded; /* return value */
+    FILE *fds_to_close[3]; /* the size being hardcoded to '3' is temporary */
+    int i; /* loop counter */
+    succeeded = 0;
+    fds_to_close[0] = stdin;
+    fds_to_close[1] = stdout;
+    fds_to_close[2] = stderr;
+    /* max iterations being hardcoded to '3' is temporary: */
+    for ((i = 0); (i < 3); i++) {
+ succeeded += fclose(fds_to_close[i]);
+    }
+    if (succeeded != 0) {
+ succeeded = EOF;
+    }
+    return succeeded;
+}

EOF
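The same caveat about drifting line numbers applies here; a quick way to spot any rejected hunks from either patch is:

find . -name '*.rej'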

Install Oracle JDK 1.7

You'll need to install "Java SE Development Kit 7 (Mac OS X x64)" from Oracle. Then let's create the classes.jar symlink that the build expects in a different place:

export JAVA_HOME=`/usr/libexec/java_home -v 1.7`
sudo mkdir $JAVA_HOME/Classes
sudo ln -s $JAVA_HOME/lib/tools.jar $JAVA_HOME/Classes/classes.jar
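To verify the symlink and that the expected JDK is the one picked up:

ls -l $JAVA_HOME/Classes/classes.jar
java -version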

Build Hadoop 2.4.0

Sooner or later we were bound to get here, right?

mvn package -Pdist,native -DskipTests -Dtar
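One optional tip, in case you hit one of the errors listed further below halfway through: Maven can resume the reactor from a given module instead of starting over (the module name here is just an example, pick whichever module failed for you):

mvn package -Pdist,native -DskipTests -Dtar -rf :hadoop-common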

If all goes well:

main:
     [exec] $ tar cf hadoop-2.4.0.tar hadoop-2.4.0
     [exec] $ gzip -f hadoop-2.4.0.tar
     [exec] 
     [exec] Hadoop dist tar available at: /Users/doma/dev/hadoop-2.4.0/hadoop-dist/target/hadoop-2.4.0.tar.gz
     [exec] 
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @ hadoop-dist ---
[INFO] Building jar: /Users/doma/dev/hadoop-2.4.0/hadoop-dist/target/hadoop-dist-2.4.0-javadoc.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop Main ................................ SUCCESS [1.177s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [1.548s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [3.394s]
[INFO] Apache Hadoop Assemblies .......................... SUCCESS [0.277s]
[INFO] Apache Hadoop Project Dist POM .................... SUCCESS [1.765s]
[INFO] Apache Hadoop Maven Plugins ....................... SUCCESS [3.143s]
[INFO] Apache Hadoop MiniKDC ............................. SUCCESS [2.498s]
[INFO] Apache Hadoop Auth ................................ SUCCESS [3.265s]
[INFO] Apache Hadoop Auth Examples ....................... SUCCESS [2.074s]
[INFO] Apache Hadoop Common .............................. SUCCESS [1:26.460s]
[INFO] Apache Hadoop NFS ................................. SUCCESS [4.527s]
[INFO] Apache Hadoop Common Project ...................... SUCCESS [0.032s]
[INFO] Apache Hadoop HDFS ................................ SUCCESS [2:09.326s]
[INFO] Apache Hadoop HttpFS .............................. SUCCESS [14.876s]
[INFO] Apache Hadoop HDFS BookKeeper Journal ............. SUCCESS [5.814s]
[INFO] Apache Hadoop HDFS-NFS ............................ SUCCESS [2.941s]
[INFO] Apache Hadoop HDFS Project ........................ SUCCESS [0.034s]
[INFO] hadoop-yarn ....................................... SUCCESS [0.034s]
[INFO] hadoop-yarn-api ................................... SUCCESS [57.713s]
[INFO] hadoop-yarn-common ................................ SUCCESS [20.985s]
[INFO] hadoop-yarn-server ................................ SUCCESS [0.040s]
[INFO] hadoop-yarn-server-common ......................... SUCCESS [6.935s]
[INFO] hadoop-yarn-server-nodemanager .................... SUCCESS [12.889s]
[INFO] hadoop-yarn-server-web-proxy ...................... SUCCESS [2.362s]
[INFO] hadoop-yarn-server-applicationhistoryservice ...... SUCCESS [4.059s]
[INFO] hadoop-yarn-server-resourcemanager ................ SUCCESS [11.368s]
[INFO] hadoop-yarn-server-tests .......................... SUCCESS [0.467s]
[INFO] hadoop-yarn-client ................................ SUCCESS [4.109s]
[INFO] hadoop-yarn-applications .......................... SUCCESS [0.043s]
[INFO] hadoop-yarn-applications-distributedshell ......... SUCCESS [2.123s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher .... SUCCESS [1.902s]
[INFO] hadoop-yarn-site .................................. SUCCESS [0.030s]
[INFO] hadoop-yarn-project ............................... SUCCESS [3.828s]
[INFO] hadoop-mapreduce-client ........................... SUCCESS [0.069s]
[INFO] hadoop-mapreduce-client-core ...................... SUCCESS [19.507s]
[INFO] hadoop-mapreduce-client-common .................... SUCCESS [13.039s]
[INFO] hadoop-mapreduce-client-shuffle ................... SUCCESS [2.232s]
[INFO] hadoop-mapreduce-client-app ....................... SUCCESS [7.625s]
[INFO] hadoop-mapreduce-client-hs ........................ SUCCESS [6.198s]
[INFO] hadoop-mapreduce-client-jobclient ................. SUCCESS [5.440s]
[INFO] hadoop-mapreduce-client-hs-plugins ................ SUCCESS [1.534s]
[INFO] Apache Hadoop MapReduce Examples .................. SUCCESS [4.577s]
[INFO] hadoop-mapreduce .................................. SUCCESS [2.903s]
[INFO] Apache Hadoop MapReduce Streaming ................. SUCCESS [3.509s]
[INFO] Apache Hadoop Distributed Copy .................... SUCCESS [6.723s]
[INFO] Apache Hadoop Archives ............................ SUCCESS [1.705s]
[INFO] Apache Hadoop Rumen ............................... SUCCESS [4.460s]
[INFO] Apache Hadoop Gridmix ............................. SUCCESS [3.330s]
[INFO] Apache Hadoop Data Join ........................... SUCCESS [2.585s]
[INFO] Apache Hadoop Extras .............................. SUCCESS [2.361s]
[INFO] Apache Hadoop Pipes ............................... SUCCESS [9.603s]
[INFO] Apache Hadoop OpenStack support ................... SUCCESS [3.797s]
[INFO] Apache Hadoop Client .............................. SUCCESS [6.102s]
[INFO] Apache Hadoop Mini-Cluster ........................ SUCCESS [0.091s]
[INFO] Apache Hadoop Scheduler Load Simulator ............ SUCCESS [3.251s]
[INFO] Apache Hadoop Tools Dist .......................... SUCCESS [5.068s]
[INFO] Apache Hadoop Tools ............................... SUCCESS [0.032s]
[INFO] Apache Hadoop Distribution ........................ SUCCESS [24.974s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 8:54.425s
[INFO] Finished at: Thu Jul 16 18:22:12 CEST 2015
[INFO] Final Memory: 173M/920M
[INFO] ------------------------------------------------------------------------

Using it

First we'll extract the results of our build. Then there is a little bit of configuration needed even for a single-node setup. Don't worry, I'll copy it here for your convenience ;-)

tar -xvzf /Users/doma/dev/hadoop-2.4.0/hadoop-dist/target/hadoop-2.4.0.tar.gz -C ~/tools
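Optionally, it may be convenient to put the freshly extracted distribution on your PATH. The variable names below are just the conventional ones, not something the tarball sets up for you, and the steps further down use relative paths from the hadoop-2.4.0 directory anyway:

export HADOOP_HOME=~/tools/hadoop-2.4.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH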

The contents of ~/tools/hadoop-2.4.0/etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

The contents of ~/tools/hadoop-2.4.0/etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
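I'd also suggest (this is my own habit, not something the docs require for this setup) pinning JAVA_HOME in ~/tools/hadoop-2.4.0/etc/hadoop/hadoop-env.sh, so the daemons use the same JDK 1.7 we built with:

export JAVA_HOME=`/usr/libexec/java_home -v 1.7`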

Passwordless SSH

From the official docs:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
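To verify (on OS X you may also need to enable Remote Login under System Preferences > Sharing so that sshd is running at all):

ssh localhost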

Starting up

Let's see what we've done. This is a raw copy from the official docs; a quick jps sanity check follows the list.

  1. Format the filesystem:
    bin/hdfs namenode -format
    
  2. Start NameNode daemon and DataNode daemon:
    sbin/start-dfs.sh
    

    The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

  3. Browse the web interface for the NameNode; by default it is available at http://localhost:50070/
  4. Make the HDFS directories required to execute MapReduce jobs:
    bin/hdfs dfs -mkdir /user
    bin/hdfs dfs -mkdir /user/<username>
    
  5. Copy the input files into the distributed filesystem:
    bin/hdfs dfs -put etc/hadoop input
    

    Check if they are there at http://localhost:50070/explorer.html#/

  6. Run some of the examples provided (that's actually one line...):
    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar grep input output 'dfs[a-z.]+'
    
  7. Examine the output files:

    Copy the output files from the distributed filesystem to the local filesystem and examine them:

    bin/hdfs dfs -get output output
    cat output/*
    

    or

    View the output files on the distributed filesystem:

    bin/hdfs dfs -cat output/*
    
  8. When you're done, stop the daemons with:
    sbin/stop-dfs.sh
    
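As promised, a quick sanity check after step 2: jps should list the HDFS daemons, and with this setup I'd expect something along these lines (the PIDs will obviously differ):

jps
# 12345 NameNode
# 12346 DataNode
# 12347 SecondaryNameNode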

Possible errors without the fixes & tweaks above

This list is an excerpt from my efforts during the build. It's meant to drive you here via Google ;-) Apply the procedure above and all of these errors will be fixed for you.

Without ProtoBuf

If you don't have protobuf, you'll get the following error:

[INFO] --- hadoop-maven-plugins:2.4.0:protoc (compile-protoc) @ hadoop-common ---
[WARNING] [protoc, --version] failed: java.io.IOException: Cannot run program "protoc": error=2, No such file or directory
[ERROR] stdout: []

Wrong version of ProtoBuf

If you don't have the correct version of protobuf, you'll get

[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.4.0:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: protoc version is 'libprotoc 2.6.1', expected version is '2.5.0' -> [Help 1]

CMake missing

If you don't have cmake, you'll get

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-common: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "cmake" (in directory "/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/native"): error=2, No such file or directory
[ERROR] around Ant part ...... @ 4:132 in /Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/antrun/build-main.xml

JAVA_HOME missing

If you don't have JAVA_HOME correctly set, you'll get

     [exec] -- Detecting CXX compiler ABI info
     [exec] -- Detecting CXX compiler ABI info - done
     [exec] -- Detecting CXX compile features
     [exec] -- Detecting CXX compile features - done
     [exec] CMake Error at /opt/local/share/cmake-3.2/Modules/FindPackageHandleStandardArgs.cmake:138 (message):
     [exec]   Could NOT find JNI (missing: JAVA_AWT_LIBRARY JAVA_JVM_LIBRARY
     [exec]   JAVA_INCLUDE_PATH JAVA_INCLUDE_PATH2 JAVA_AWT_INCLUDE_PATH)
     [exec] Call Stack (most recent call first):
     [exec]   /opt/local/share/cmake-3.2/Modules/FindPackageHandleStandardArgs.cmake:374 (_FPHSA_FAILURE_MESSAGE)
     [exec]   /opt/local/share/cmake-3.2/Modules/FindJNI.cmake:287 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
     [exec]   JNIFlags.cmake:117 (find_package)
     [exec]   CMakeLists.txt:24 (include)
     [exec] 
     [exec] 
     [exec] -- Configuring incomplete, errors occurred!
     [exec] See also "/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/native/CMakeFiles/CMakeOutput.log".

JniBasedUnixGroupsNetgroupMapping.c patch missing

If you don't have the patch for JniBasedUnixGroupsNetgroupMapping.c above, you'll get

     [exec] [ 38%] Building C object CMakeFiles/hadoop.dir/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c.o
     [exec] /Library/Developer/CommandLineTools/usr/bin/cc  -Dhadoop_EXPORTS -g -Wall -O2 -D_REENTRANT -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -fPIC -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/native/javah -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/main/native/src -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/src -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/native -I/Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/include/darwin -I/opt/local/include -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util    -o CMakeFiles/hadoop.dir/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c.o   -c /Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c
     [exec] /Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c:77:26: error: invalid operands to binary expression ('void' and 'int')
     [exec]   if(setnetgrent(cgroup) == 1) {
     [exec]      ~~~~~~~~~~~~~~~~~~~ ^  ~
     [exec] 1 error generated.
     [exec] make[2]: *** [CMakeFiles/hadoop.dir/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c.o] Error 1
     [exec] make[1]: *** [CMakeFiles/hadoop.dir/all] Error 2
     [exec] make: *** [all] Error 2

fcloseall patch missing

Without applying the fcloseall patch above, you might get the following error:

     [exec] Undefined symbols for architecture x86_64:
     [exec]   "_fcloseall", referenced from:
     [exec]       _launch_container_as_user in libcontainer.a(container-executor.c.o)
     [exec] ld: symbol(s) not found for architecture x86_64
     [exec] collect2: error: ld returned 1 exit status
     [exec] make[2]: *** [target/usr/local/bin/container-executor] Error 1
     [exec] make[1]: *** [CMakeFiles/container-executor.dir/all] Error 2
     [exec] make: *** [all] Error 2

Symlink missing

Without the classes.jar symlink created above (export JAVA_HOME=`/usr/libexec/java_home -v 1.7`; sudo mkdir $JAVA_HOME/Classes; sudo ln -s $JAVA_HOME/lib/tools.jar $JAVA_HOME/Classes/classes.jar), you'll get

Exception in thread "main" java.lang.AssertionError: Missing tools.jar at: /Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home/Classes/classes.jar. Expression: file.exists()
 at org.codehaus.groovy.runtime.InvokerHelper.assertFailed(InvokerHelper.java:395)
 at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.assertFailed(ScriptBytecodeAdapter.java:683)
 at org.codehaus.mojo.jspc.CompilationMojoSupport.findToolsJar(CompilationMojoSupport.groovy:371)
 at org.codehaus.mojo.jspc.CompilationMojoSupport.this$4$findToolsJar(CompilationMojoSupport.groovy)
...

References:

http://java-notes.com/index.php/hadoop-on-osx

https://issues.apache.org/jira/secure/attachment/12602452/HADOOP-9350.patch

http://www.csrdu.org/nauman/2014/01/23/geting-started-with-hadoop-2-2-0-building/

https://developer.apple.com/library/mac/documentation/Porting/Conceptual/PortingUnix/compiling/compiling.html

https://github.com/cooljeanius/libUnixToOSX/blob/master/fcloseall.c

http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/SingleCluster.html


Install Apache Ant 1.8.1 via MacPorts

If the latest version of Apache Ant in MacPorts is not what you're after, you can try to downgrade. Here's how simple it is:

cd ~/tools
svn co http://svn.macports.org/repository/macports/trunk/dports/devel/apache-ant -r 74985
cd apache-ant/
sudo port install

To verify:

ant -version
# Apache Ant version 1.8.1 compiled on April 30 2010

If you want other revisions, browse the history of the port in the MacPorts Subversion repository.

Switch between versions

But wait, there's more! You can easily switch between the installed versions. This is what makes the whole process comfortable: no need to keep different versions of the tools at arbitrary locations...

sudo port activate apache-ant
--->  The following versions of apache-ant are currently installed:
--->      apache-ant @1.8.1_1 (active)
--->      apache-ant @1.8.4_0
--->      apache-ant @1.9.4_0
Error: port activate failed: Registry error: Please specify the full version as recorded in the port registry.

I have these three versions - I used 1.8.4 for the earlier Tomcat build, and 1.8.1 for the Hadoop build above.

But now that Hadoop is also built on OS X for my work, I'll switch back to the latest version:

sudo port activate apache-ant@1.9.4_0
--->  Deactivating apache-ant @1.8.1_1
--->  Cleaning apache-ant
--->  Activating apache-ant @1.9.4_0
--->  Cleaning apache-ant

To verify:

ant -version
# Apache Ant(TM) version 1.9.4 compiled on April 29 2014

Neat, huh?