Install prerequisites
We'll need these for the actual build.
sudo port install cmake gmake gcc48 zlib gzip maven32 apache-ant
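A quick sanity check that everything landed on the PATH won't hurt. The exact binary names below are my assumption; depending on your MacPorts defaults some may be versioned (the gcc48 port, for instance, usually installs gcc-mp-4.8) until you activate them with sudo port select:
# each of these should print a version banner
cmake --version
ant -version
mvn -version
gcc-mp-4.8 --version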
Install protobuf 2.5.0
As the latest version in MacPorts is currently 2.6.x, we need to stick to an earlier version:
cd ~/tools
svn co http://svn.macports.org/repository/macports/trunk/dports/devel/protobuf-cpp -r 105333
cd protobuf-cpp/
sudo port install
To verify:
protoc --version # libprotoc 2.5.0
Acquire sources
As I needed an exact version to reproduce an issue at work, I'll go with version 2.4.0 for now. I suppose some of the fixes will apply to earlier or later versions as well; look around in the tags folder for other releases.
cd ~/dev
svn co http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0 hadoop-2.4.0
cd hadoop-2.4.0
Fix sources
We need to patch JniBasedUnixGroupsNetgroupMapping:
patch -p0 <<EOF
--- hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c.orig	2015-07-16 17:14:20.000000000 +0200
+++ hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c	2015-07-16 17:17:47.000000000 +0200
@@ -74,7 +74,7 @@
   // endnetgrent)
   setnetgrentCalledFlag = 1;
 #ifndef __FreeBSD__
-  if(setnetgrent(cgroup) == 1) {
+  setnetgrent(cgroup); {
 #endif
     current = NULL;
     // three pointers are for host, user, domain, we only care
EOF
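For the curious: on Linux (glibc) setnetgrent(3) returns an int, but on OS X it is declared to return void, so the original comparison against 1 simply doesn't compile there. The patch drops the check and keeps the block. You can confirm the local prototype yourself; this is just a sketch assuming the stock OS X headers:
# on OS X this shows a void prototype; on Linux/glibc it returns int
grep -n -A1 setnetgrent /usr/include/netdb.h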
As well as container-executor.c:
patch -p0 <<EOF
--- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c.orig	2015-07-16 17:49:15.000000000 +0200
+++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c	2015-07-16 18:13:03.000000000 +0200
@@ -498,7 +498,7 @@
   char **users = whitelist;
   if (whitelist != NULL) {
     for(; *users; ++users) {
-      if (strncmp(*users, user, LOGIN_NAME_MAX) == 0) {
+      if (strncmp(*users, user, 64) == 0) {
         free_values(whitelist);
         return 1;
       }
@@ -1247,7 +1247,7 @@
               pair);
       result = -1;
     } else {
-      if (mount("none", mount_path, "cgroup", 0, controller) == 0) {
+      if (mount("none", mount_path, "cgroup", 0) == 0) {
         char *buf = stpncpy(hier_path, mount_path, strlen(mount_path));
         *buf++ = '/';
         snprintf(buf, PATH_MAX - (buf - hier_path), "%s", hierarchy);
@@ -1274,3 +1274,21 @@
   return result;
 }
 
+int fcloseall(void)
+{
+  int succeeded; /* return value */
+  FILE *fds_to_close[3]; /* the size being hardcoded to '3' is temporary */
+  int i; /* loop counter */
+  succeeded = 0;
+  fds_to_close[0] = stdin;
+  fds_to_close[1] = stdout;
+  fds_to_close[2] = stderr;
+  /* max iterations being hardcoded to '3' is temporary: */
+  for ((i = 0); (i < 3); i++) {
+    succeeded += fclose(fds_to_close[i]);
+  }
+  if (succeeded != 0) {
+    succeeded = EOF;
+  }
+  return succeeded;
+}
EOF
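This patch works around three Linux-isms at once: LOGIN_NAME_MAX comes from glibc's <limits.h> and isn't defined on OS X (the patch simply hard-codes a reasonable maximum user-name length), mount(2) on OS X has a different, four-argument signature (the cgroup branch is dead code on a Mac anyway, it only has to compile), and fcloseall() is a GNU extension missing from OS X's libc, hence the simple replacement appended at the end. A quick local check, sketched under the assumption of stock OS X headers:
# both greps should come up empty on OS X, and the mount(2) man page
# shows a four-argument prototype:
grep -n LOGIN_NAME_MAX /usr/include/limits.h
grep -n fcloseall /usr/include/stdio.h
man 2 mount | head -20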
Install Oracle JDK 1.7
You'll need to install "Java SE Development Kit 7 (Mac OS X x64)" from Oracle. Then let's provide a few files the build expects in a different place:
export JAVA_HOME=`/usr/libexec/java_home -v 1.7`
sudo mkdir $JAVA_HOME/Classes
sudo ln -s $JAVA_HOME/lib/tools.jar $JAVA_HOME/Classes/classes.jar
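If the symlink is in place, this should now resolve to the JDK's tools.jar:
ls -l $JAVA_HOME/Classes/classes.jar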
Build Hadoop 2.4.0
Sooner or later we were bound to get here, right?
mvn package -Pdist,native -DskipTests -Dtar
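The build takes a good while. If it fails halfway, Maven can resume from the failing module instead of starting over; the module below is only an example:
# resume the reactor from a given module (-rf = --resume-from):
mvn package -Pdist,native -DskipTests -Dtar -rf :hadoop-yarn-server-nodemanager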
If all goes well:
main:
     [exec] $ tar cf hadoop-2.4.0.tar hadoop-2.4.0
     [exec] $ gzip -f hadoop-2.4.0.tar
     [exec]
     [exec] Hadoop dist tar available at: /Users/doma/dev/hadoop-2.4.0/hadoop-dist/target/hadoop-2.4.0.tar.gz
     [exec]
[INFO] Executed tasks
[INFO]
[INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @ hadoop-dist ---
[INFO] Building jar: /Users/doma/dev/hadoop-2.4.0/hadoop-dist/target/hadoop-dist-2.4.0-javadoc.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................ SUCCESS [1.177s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [1.548s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [3.394s]
[INFO] Apache Hadoop Assemblies .......................... SUCCESS [0.277s]
[INFO] Apache Hadoop Project Dist POM .................... SUCCESS [1.765s]
[INFO] Apache Hadoop Maven Plugins ....................... SUCCESS [3.143s]
[INFO] Apache Hadoop MiniKDC ............................. SUCCESS [2.498s]
[INFO] Apache Hadoop Auth ................................ SUCCESS [3.265s]
[INFO] Apache Hadoop Auth Examples ....................... SUCCESS [2.074s]
[INFO] Apache Hadoop Common .............................. SUCCESS [1:26.460s]
[INFO] Apache Hadoop NFS ................................. SUCCESS [4.527s]
[INFO] Apache Hadoop Common Project ...................... SUCCESS [0.032s]
[INFO] Apache Hadoop HDFS ................................ SUCCESS [2:09.326s]
[INFO] Apache Hadoop HttpFS .............................. SUCCESS [14.876s]
[INFO] Apache Hadoop HDFS BookKeeper Journal ............. SUCCESS [5.814s]
[INFO] Apache Hadoop HDFS-NFS ............................ SUCCESS [2.941s]
[INFO] Apache Hadoop HDFS Project ........................ SUCCESS [0.034s]
[INFO] hadoop-yarn ....................................... SUCCESS [0.034s]
[INFO] hadoop-yarn-api ................................... SUCCESS [57.713s]
[INFO] hadoop-yarn-common ................................ SUCCESS [20.985s]
[INFO] hadoop-yarn-server ................................ SUCCESS [0.040s]
[INFO] hadoop-yarn-server-common ......................... SUCCESS [6.935s]
[INFO] hadoop-yarn-server-nodemanager .................... SUCCESS [12.889s]
[INFO] hadoop-yarn-server-web-proxy ...................... SUCCESS [2.362s]
[INFO] hadoop-yarn-server-applicationhistoryservice ...... SUCCESS [4.059s]
[INFO] hadoop-yarn-server-resourcemanager ................ SUCCESS [11.368s]
[INFO] hadoop-yarn-server-tests .......................... SUCCESS [0.467s]
[INFO] hadoop-yarn-client ................................ SUCCESS [4.109s]
[INFO] hadoop-yarn-applications .......................... SUCCESS [0.043s]
[INFO] hadoop-yarn-applications-distributedshell ......... SUCCESS [2.123s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher .... SUCCESS [1.902s]
[INFO] hadoop-yarn-site .................................. SUCCESS [0.030s]
[INFO] hadoop-yarn-project ............................... SUCCESS [3.828s]
[INFO] hadoop-mapreduce-client ........................... SUCCESS [0.069s]
[INFO] hadoop-mapreduce-client-core ...................... SUCCESS [19.507s]
[INFO] hadoop-mapreduce-client-common .................... SUCCESS [13.039s]
[INFO] hadoop-mapreduce-client-shuffle ................... SUCCESS [2.232s]
[INFO] hadoop-mapreduce-client-app ....................... SUCCESS [7.625s]
[INFO] hadoop-mapreduce-client-hs ........................ SUCCESS [6.198s]
[INFO] hadoop-mapreduce-client-jobclient ................. SUCCESS [5.440s]
[INFO] hadoop-mapreduce-client-hs-plugins ................ SUCCESS [1.534s]
[INFO] Apache Hadoop MapReduce Examples .................. SUCCESS [4.577s]
[INFO] hadoop-mapreduce .................................. SUCCESS [2.903s]
[INFO] Apache Hadoop MapReduce Streaming ................. SUCCESS [3.509s]
[INFO] Apache Hadoop Distributed Copy .................... SUCCESS [6.723s]
[INFO] Apache Hadoop Archives ............................ SUCCESS [1.705s]
[INFO] Apache Hadoop Rumen ............................... SUCCESS [4.460s]
[INFO] Apache Hadoop Gridmix ............................. SUCCESS [3.330s]
[INFO] Apache Hadoop Data Join ........................... SUCCESS [2.585s]
[INFO] Apache Hadoop Extras .............................. SUCCESS [2.361s]
[INFO] Apache Hadoop Pipes ............................... SUCCESS [9.603s]
[INFO] Apache Hadoop OpenStack support ................... SUCCESS [3.797s]
[INFO] Apache Hadoop Client .............................. SUCCESS [6.102s]
[INFO] Apache Hadoop Mini-Cluster ........................ SUCCESS [0.091s]
[INFO] Apache Hadoop Scheduler Load Simulator ............ SUCCESS [3.251s]
[INFO] Apache Hadoop Tools Dist .......................... SUCCESS [5.068s]
[INFO] Apache Hadoop Tools ............................... SUCCESS [0.032s]
[INFO] Apache Hadoop Distribution ........................ SUCCESS [24.974s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 8:54.425s
[INFO] Finished at: Thu Jul 16 18:22:12 CEST 2015
[INFO] Final Memory: 173M/920M
[INFO] ------------------------------------------------------------------------
Using it
First we'll extract the result of our build. Then a little bit of configuration is needed, even for a single-node setup. Don't worry, I'll copy it here for your comfort ;-)
tar -xvzf /Users/doma/dev/hadoop-2.4.0/hadoop-dist/target/hadoop-2.4.0.tar.gz -C ~/tools
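It's worth verifying that the native parts were really built for OS X. I'd expect Mach-O dylibs in lib/native rather than Linux .so files, but treat this check as a sketch rather than a guarantee:
# -Pnative should have produced Mach-O libraries here:
ls ~/tools/hadoop-2.4.0/lib/native/
file ~/tools/hadoop-2.4.0/lib/native/*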
The contents of ~/tools/hadoop-2.4.0/etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
The contents of ~/tools/hadoop-2.4.0/etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
Passwordless SSH
From the official docs:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
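On OS X you also have to enable Remote Login (System Preferences > Sharing), otherwise sshd isn't listening at all. Then verify the login really is passwordless:
# should print "ok" without asking for a password:
ssh localhost echo ok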
Starting up
Let's see what we've done. The following is a raw copy from the official docs.
- Format the filesystem:
bin/hdfs namenode -format
- Start NameNode daemon and DataNode daemon:
sbin/start-dfs.sh
The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
- Browse the web interface for the NameNode; by default it is available at:
- NameNode - http://localhost:50070/
- Make the HDFS directories required to execute MapReduce jobs:
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/<username>
- Copy the input files into the distributed filesystem:
bin/hdfs dfs -put etc/hadoop input
Check if they are there at http://localhost:50070/explorer.html#/
- Run some of the examples provided (that's actually one line...):
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar grep input output 'dfs[a-z.]+'
- Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
bin/hdfs dfs -get output output
cat output/*
or
View the output files on the distributed filesystem:
bin/hdfs dfs -cat output/*
- When you're done, stop the daemons with:
sbin/stop-dfs.sh
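While the daemons are running, jps (shipped with the JDK) is a quick way to check that everything came up:
# expect NameNode, DataNode and SecondaryNameNode among the listed JVMs:
jps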
Possible errors without the fixes & tweaks above
This list is an excerpt from my efforts during the build. These errors are meant to drive you here via Google ;-) Apply the procedure above and all of them will be fixed for you.
Without protobuf
If you don't have protobuf, you'll get the following error:
[INFO] --- hadoop-maven-plugins:2.4.0:protoc (compile-protoc) @ hadoop-common ---
[WARNING] [protoc, --version] failed: java.io.IOException: Cannot run program "protoc": error=2, No such file or directory
[ERROR] stdout: []
Wrong version of protobuf
If you don't have the correct version of protobuf, you'll get:
[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.4.0:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: protoc version is 'libprotoc 2.6.1', expected version is '2.5.0' -> [Help 1]
CMake missing
If you don't have cmake, you'll get:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-common: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "cmake" (in directory "/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/native"): error=2, No such file or directory
[ERROR] around Ant part ...... @ 4:132 in /Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/antrun/build-main.xml
JAVA_HOME missing
If you don't have JAVA_HOME correctly set, you'll get:
     [exec] -- Detecting CXX compiler ABI info
     [exec] -- Detecting CXX compiler ABI info - done
     [exec] -- Detecting CXX compile features
     [exec] -- Detecting CXX compile features - done
     [exec] CMake Error at /opt/local/share/cmake-3.2/Modules/FindPackageHandleStandardArgs.cmake:138 (message):
     [exec]   Could NOT find JNI (missing: JAVA_AWT_LIBRARY JAVA_JVM_LIBRARY
     [exec]   JAVA_INCLUDE_PATH JAVA_INCLUDE_PATH2 JAVA_AWT_INCLUDE_PATH)
     [exec] Call Stack (most recent call first):
     [exec]   /opt/local/share/cmake-3.2/Modules/FindPackageHandleStandardArgs.cmake:374 (_FPHSA_FAILURE_MESSAGE)
     [exec]   /opt/local/share/cmake-3.2/Modules/FindJNI.cmake:287 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
     [exec]   JNIFlags.cmake:117 (find_package)
     [exec]   CMakeLists.txt:24 (include)
     [exec]
     [exec]
     [exec] -- Configuring incomplete, errors occurred!
     [exec] See also "/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/native/CMakeFiles/CMakeOutput.log".
JniBasedUnixGroupsNetgroupMapping.c patch missing
If you don't have the patch for JniBasedUnixGroupsNetgroupMapping.c above, you'll get:
     [exec] [ 38%] Building C object CMakeFiles/hadoop.dir/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c.o
     [exec] /Library/Developer/CommandLineTools/usr/bin/cc -Dhadoop_EXPORTS -g -Wall -O2 -D_REENTRANT -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -fPIC -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/native/javah -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/main/native/src -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/src -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/native -I/Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/include/darwin -I/opt/local/include -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util -o CMakeFiles/hadoop.dir/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c.o -c /Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c
     [exec] /Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c:77:26: error: invalid operands to binary expression ('void' and 'int')
     [exec]     if(setnetgrent(cgroup) == 1) {
     [exec]        ~~~~~~~~~~~~~~~~~~~ ^  ~
     [exec] 1 error generated.
     [exec] make[2]: *** [CMakeFiles/hadoop.dir/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c.o] Error 1
     [exec] make[1]: *** [CMakeFiles/hadoop.dir/all] Error 2
     [exec] make: *** [all] Error 2
fcloseall patch missing
Without applying the fcloseall patch above, you might get the following error:
     [exec] Undefined symbols for architecture x86_64:
     [exec]   "_fcloseall", referenced from:
     [exec]       _launch_container_as_user in libcontainer.a(container-executor.c.o)
     [exec] ld: symbol(s) not found for architecture x86_64
     [exec] collect2: error: ld returned 1 exit status
     [exec] make[2]: *** [target/usr/local/bin/container-executor] Error 1
     [exec] make[1]: *** [CMakeFiles/container-executor.dir/all] Error 2
     [exec] make: *** [all] Error 2
Symlink missing
Without the "export JAVA_HOME=`/usr/libexec/java_home -v 1.7`;sudo mkdir $JAVA_HOME/Classes;sudo ln -s $JAVA_HOME/lib/tools.jar $JAVA_HOME/Classes/classes.jar" line creating the symlinks above, you'll get
Exception in thread "main" java.lang.AssertionError: Missing tools.jar at: /Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home/Classes/classes.jar. Expression: file.exists() at org.codehaus.groovy.runtime.InvokerHelper.assertFailed(InvokerHelper.java:395) at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.assertFailed(ScriptBytecodeAdapter.java:683) at org.codehaus.mojo.jspc.CompilationMojoSupport.findToolsJar(CompilationMojoSupport.groovy:371) at org.codehaus.mojo.jspc.CompilationMojoSupport.this$4$findToolsJar(CompilationMojoSupport.groovy) ...
References:
http://java-notes.com/index.php/hadoop-on-osx
https://issues.apache.org/jira/secure/attachment/12602452/HADOOP-9350.patch
http://www.csrdu.org/nauman/2014/01/23/geting-started-with-hadoop-2-2-0-building/
https://github.com/cooljeanius/libUnixToOSX/blob/master/fcloseall.c
http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/SingleCluster.html