Install pre-requisites
We'll need these for the actual build.
sudo port install cmake gmake gcc48 zlib gzip maven32 apache-ant
Install protobuf 2.5.0
MacPorts currently ships protobuf 2.6.x, but the Hadoop 2.4.0 build insists on exactly 2.5.0, so we install the port from an older revision of the ports tree:
cd ~/tools
svn co http://svn.macports.org/repository/macports/trunk/dports/devel/protobuf-cpp -r 105333
cd protobuf-cpp/
sudo port install
To verify:
protoc --version
# libprotoc 2.5.0
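If you want the check to be scriptable, something like the sketch below could work. The function name protoc_version_ok is made up for this example; the expected string is simply the output shown above.

```shell
# Hypothetical helper: succeed only if the given "protoc --version" output
# is exactly the string the Hadoop 2.4.0 build expects.
protoc_version_ok() {
    [ "$1" = "libprotoc 2.5.0" ]
}

if command -v protoc >/dev/null 2>&1; then
    found=$(protoc --version)
    if protoc_version_ok "$found"; then
        echo "protoc looks fine: $found"
    else
        echo "unexpected protoc version: $found" >&2
    fi
else
    echo "protoc is not on PATH" >&2
fi
```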
Acquire sources
I needed an exact version to reproduce an issue, so I'll go with 2.4.0 for now. Some of the fixes will probably work with earlier or later versions as well; look around in the tags folder for other versions.
cd ~/dev
svn co http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0 hadoop-2.4.0
cd hadoop-2.4.0
Fix sources
We need to patch JniBasedUnixGroupsNetgroupMapping.c, because on OS X setnetgrent() returns void, so its result cannot be compared to 1:
patch -p0 <<EOF
--- hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c.orig 2015-07-16 17:14:20.000000000 +0200
+++ hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c 2015-07-16 17:17:47.000000000 +0200
@@ -74,7 +74,7 @@
// endnetgrent)
setnetgrentCalledFlag = 1;
#ifndef __FreeBSD__
- if(setnetgrent(cgroup) == 1) {
+ setnetgrent(cgroup); {
#endif
current = NULL;
// three pointers are for host, user, domain, we only care
EOF
container-executor.c needs a patch as well (among other things, OS X has no fcloseall()):
patch -p0 <<EOF
--- hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c.orig 2015-07-16 17:49:15.000000000 +0200
+++ hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c 2015-07-16 18:13:03.000000000 +0200
@@ -498,7 +498,7 @@
char **users = whitelist;
if (whitelist != NULL) {
for(; *users; ++users) {
- if (strncmp(*users, user, LOGIN_NAME_MAX) == 0) {
+ if (strncmp(*users, user, 64) == 0) {
free_values(whitelist);
return 1;
}
@@ -1247,7 +1247,7 @@
pair);
result = -1;
} else {
- if (mount("none", mount_path, "cgroup", 0, controller) == 0) {
+ if (mount("none", mount_path, "cgroup", 0) == 0) {
char *buf = stpncpy(hier_path, mount_path, strlen(mount_path));
*buf++ = '/';
snprintf(buf, PATH_MAX - (buf - hier_path), "%s", hierarchy);
@@ -1274,3 +1274,21 @@
return result;
}
+int fcloseall(void)
+{
+ int succeeded; /* return value */
+ FILE *fds_to_close[3]; /* the size being hardcoded to '3' is temporary */
+ int i; /* loop counter */
+ succeeded = 0;
+ fds_to_close[0] = stdin;
+ fds_to_close[1] = stdout;
+ fds_to_close[2] = stderr;
+ /* max iterations being hardcoded to '3' is temporary: */
+ for ((i = 0); (i < 3); i++) {
+ succeeded += fclose(fds_to_close[i]);
+ }
+ if (succeeded != 0) {
+ succeeded = EOF;
+ }
+ return succeeded;
+}
EOF
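Before kicking off a long build, it may be worth checking that both patches really landed. The sketch below just greps for one line that each patch introduces; contains_line is a helper name I made up, and the paths are the ones from the patches above, relative to the hadoop-2.4.0 checkout.

```shell
# Hypothetical helper: succeed if file $1 exists and contains the fixed string $2.
contains_line() {
    [ -f "$1" ] && grep -qF -- "$2" "$1"
}

# Paths relative to the hadoop-2.4.0 checkout, as used in the patches.
jni=hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c
ce=hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c

# Each string only appears after the corresponding patch has been applied.
contains_line "$jni" 'setnetgrent(cgroup); {' || echo "not patched: $jni" >&2
contains_line "$ce" 'int fcloseall(void)' || echo "not patched: $ce" >&2
```

If nothing is printed, both files contain the patched lines.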
Install Oracle JDK 1.7
You'll need to install "Java SE Development Kit 7 (Mac OS X x64)" from Oracle. The build looks for tools.jar at a different location (Classes/classes.jar inside the JDK home), so point JAVA_HOME at the JDK and create a symlink there:
export JAVA_HOME=`/usr/libexec/java_home -v 1.7`
sudo mkdir $JAVA_HOME/Classes
sudo ln -s $JAVA_HOME/lib/tools.jar $JAVA_HOME/Classes/classes.jar
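If you tend to re-run your setup steps, a repeatable variant of the symlink step might look like the sketch below. link_tools_jar is a hypothetical name; it takes the JDK home as its only argument and skips whatever already exists. Since the JDK directory is owned by root, it would have to run from a root shell (for example from a script started with sudo).

```shell
# Hypothetical helper: make $1/Classes/classes.jar a symlink to $1/lib/tools.jar.
# Safe to call more than once; fails if the JDK has no lib/tools.jar.
link_tools_jar() {
    jdk_home=$1
    if [ ! -f "$jdk_home/lib/tools.jar" ]; then
        echo "no lib/tools.jar under: $jdk_home" >&2
        return 1
    fi
    mkdir -p "$jdk_home/Classes" || return 1
    # Only create the link if nothing is there yet.
    [ -e "$jdk_home/Classes/classes.jar" ] ||
        ln -s "$jdk_home/lib/tools.jar" "$jdk_home/Classes/classes.jar"
}

# Usage (as root), with JAVA_HOME set as above:
#   link_tools_jar "$JAVA_HOME"
```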
Build Hadoop 2.4.0
Sooner or later we had to get here, right?
mvn package -Pdist,native -DskipTests -Dtar
If all goes well:
main:
[exec] $ tar cf hadoop-2.4.0.tar hadoop-2.4.0
[exec] $ gzip -f hadoop-2.4.0.tar
[exec]
[exec] Hadoop dist tar available at: /Users/doma/dev/hadoop-2.4.0/hadoop-dist/target/hadoop-2.4.0.tar.gz
[exec]
[INFO] Executed tasks
[INFO]
[INFO] --- maven-javadoc-plugin:2.8.1:jar (module-javadocs) @ hadoop-dist ---
[INFO] Building jar: /Users/doma/dev/hadoop-2.4.0/hadoop-dist/target/hadoop-dist-2.4.0-javadoc.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................ SUCCESS [1.177s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [1.548s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [3.394s]
[INFO] Apache Hadoop Assemblies .......................... SUCCESS [0.277s]
[INFO] Apache Hadoop Project Dist POM .................... SUCCESS [1.765s]
[INFO] Apache Hadoop Maven Plugins ....................... SUCCESS [3.143s]
[INFO] Apache Hadoop MiniKDC ............................. SUCCESS [2.498s]
[INFO] Apache Hadoop Auth ................................ SUCCESS [3.265s]
[INFO] Apache Hadoop Auth Examples ....................... SUCCESS [2.074s]
[INFO] Apache Hadoop Common .............................. SUCCESS [1:26.460s]
[INFO] Apache Hadoop NFS ................................. SUCCESS [4.527s]
[INFO] Apache Hadoop Common Project ...................... SUCCESS [0.032s]
[INFO] Apache Hadoop HDFS ................................ SUCCESS [2:09.326s]
[INFO] Apache Hadoop HttpFS .............................. SUCCESS [14.876s]
[INFO] Apache Hadoop HDFS BookKeeper Journal ............. SUCCESS [5.814s]
[INFO] Apache Hadoop HDFS-NFS ............................ SUCCESS [2.941s]
[INFO] Apache Hadoop HDFS Project ........................ SUCCESS [0.034s]
[INFO] hadoop-yarn ....................................... SUCCESS [0.034s]
[INFO] hadoop-yarn-api ................................... SUCCESS [57.713s]
[INFO] hadoop-yarn-common ................................ SUCCESS [20.985s]
[INFO] hadoop-yarn-server ................................ SUCCESS [0.040s]
[INFO] hadoop-yarn-server-common ......................... SUCCESS [6.935s]
[INFO] hadoop-yarn-server-nodemanager .................... SUCCESS [12.889s]
[INFO] hadoop-yarn-server-web-proxy ...................... SUCCESS [2.362s]
[INFO] hadoop-yarn-server-applicationhistoryservice ...... SUCCESS [4.059s]
[INFO] hadoop-yarn-server-resourcemanager ................ SUCCESS [11.368s]
[INFO] hadoop-yarn-server-tests .......................... SUCCESS [0.467s]
[INFO] hadoop-yarn-client ................................ SUCCESS [4.109s]
[INFO] hadoop-yarn-applications .......................... SUCCESS [0.043s]
[INFO] hadoop-yarn-applications-distributedshell ......... SUCCESS [2.123s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher .... SUCCESS [1.902s]
[INFO] hadoop-yarn-site .................................. SUCCESS [0.030s]
[INFO] hadoop-yarn-project ............................... SUCCESS [3.828s]
[INFO] hadoop-mapreduce-client ........................... SUCCESS [0.069s]
[INFO] hadoop-mapreduce-client-core ...................... SUCCESS [19.507s]
[INFO] hadoop-mapreduce-client-common .................... SUCCESS [13.039s]
[INFO] hadoop-mapreduce-client-shuffle ................... SUCCESS [2.232s]
[INFO] hadoop-mapreduce-client-app ....................... SUCCESS [7.625s]
[INFO] hadoop-mapreduce-client-hs ........................ SUCCESS [6.198s]
[INFO] hadoop-mapreduce-client-jobclient ................. SUCCESS [5.440s]
[INFO] hadoop-mapreduce-client-hs-plugins ................ SUCCESS [1.534s]
[INFO] Apache Hadoop MapReduce Examples .................. SUCCESS [4.577s]
[INFO] hadoop-mapreduce .................................. SUCCESS [2.903s]
[INFO] Apache Hadoop MapReduce Streaming ................. SUCCESS [3.509s]
[INFO] Apache Hadoop Distributed Copy .................... SUCCESS [6.723s]
[INFO] Apache Hadoop Archives ............................ SUCCESS [1.705s]
[INFO] Apache Hadoop Rumen ............................... SUCCESS [4.460s]
[INFO] Apache Hadoop Gridmix ............................. SUCCESS [3.330s]
[INFO] Apache Hadoop Data Join ........................... SUCCESS [2.585s]
[INFO] Apache Hadoop Extras .............................. SUCCESS [2.361s]
[INFO] Apache Hadoop Pipes ............................... SUCCESS [9.603s]
[INFO] Apache Hadoop OpenStack support ................... SUCCESS [3.797s]
[INFO] Apache Hadoop Client .............................. SUCCESS [6.102s]
[INFO] Apache Hadoop Mini-Cluster ........................ SUCCESS [0.091s]
[INFO] Apache Hadoop Scheduler Load Simulator ............ SUCCESS [3.251s]
[INFO] Apache Hadoop Tools Dist .......................... SUCCESS [5.068s]
[INFO] Apache Hadoop Tools ............................... SUCCESS [0.032s]
[INFO] Apache Hadoop Distribution ........................ SUCCESS [24.974s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 8:54.425s
[INFO] Finished at: Thu Jul 16 18:22:12 CEST 2015
[INFO] Final Memory: 173M/920M
[INFO] ------------------------------------------------------------------------
Using it
First we'll extract the result of our build. After that, a little configuration is needed even for a single-node setup. Don't worry, I'll copy it here for your comfort ;-)
tar -xvzf /Users/doma/dev/hadoop-2.4.0/hadoop-dist/target/hadoop-2.4.0.tar.gz -C ~/tools
The contents of ~/tools/hadoop-2.4.0/etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
The contents of ~/tools/hadoop-2.4.0/etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
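If you'd rather not edit the two files by hand, a sketch like the one below could write them for you. write_single_node_conf is a name I made up; it overwrites core-site.xml and hdfs-site.xml in the directory you pass in, with the same values as above plus an XML declaration.

```shell
# Hypothetical helper: write the two minimal config files into directory $1,
# overwriting whatever is there.
write_single_node_conf() {
    conf_dir=$1
    cat > "$conf_dir/core-site.xml" <<'XML' || return 1
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
XML
    cat > "$conf_dir/hdfs-site.xml" <<'XML'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
XML
}

# Usage:
#   write_single_node_conf ~/tools/hadoop-2.4.0/etc/hadoop
```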
Passwordless SSH
From the official docs:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
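If ssh still asks for a password afterwards, two usual suspects are file permissions (sshd ignores key files that are too open) and, on OS X, Remote Login being switched off in the Sharing preferences. The sketch below covers the first one; fix_ssh_perms is a hypothetical helper name.

```shell
# Hypothetical helper: restrict the ssh directory $1 and its authorized_keys
# file to the owner, which is what sshd expects.
fix_ssh_perms() {
    chmod 700 "$1" && chmod 600 "$1/authorized_keys"
}

# Usage:
#   fix_ssh_perms ~/.ssh
#   # BatchMode makes ssh fail instead of prompting for a password:
#   ssh -o BatchMode=yes localhost true && echo "passwordless ssh works"
```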
Starting up
Let's see what we've built. This is a raw copy from the official docs; run the commands from ~/tools/hadoop-2.4.0.
- Format the filesystem:
bin/hdfs namenode -format
- Start NameNode daemon and DataNode daemon:
sbin/start-dfs.sh
The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
- Browse the web interface for the NameNode; by default it is available at:
NameNode - http://localhost:50070/
- Make the HDFS directories required to execute MapReduce jobs:
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/<username>
- Copy the input files into the distributed filesystem:
bin/hdfs dfs -put etc/hadoop input
- Run some of the examples provided (that's actually one line...):
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar grep input output 'dfs[a-z.]+'
- Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
bin/hdfs dfs -get output output
cat output/*
or
View the output files on the distributed filesystem:
bin/hdfs dfs -cat output/*
- When you're done, stop the daemons with:
sbin/stop-dfs.sh
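While the daemons are running (so before the stop step), a quick way to see whether everything came up is jps from the JDK, which lists Java processes as "<pid> <class name>". The sketch below filters that output; missing_daemons is a made-up helper, and the list of three daemon names is my assumption of what a default start of HDFS brings up.

```shell
# Hypothetical helper: read jps-style lines ("<pid> <name>") on stdin and
# print each expected HDFS daemon that does not show up.
missing_daemons() {
    seen=$(awk '{print $2}')
    for d in NameNode DataNode SecondaryNameNode; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$seen" | grep -qx "$d" || printf '%s\n' "$d"
    done
}

if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```

No output means all three were found; otherwise check the logs under $HADOOP_LOG_DIR for the daemon that is listed.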
Possible errors without the fixes & tweaks above
This list is an excerpt from the errors I ran into during the build. They are meant to lead you here via Google ;-) Apply the procedure above and all of these errors should go away.
Without ProtoBuf
If you don't have protobuf, you'll get the following error:
[INFO] --- hadoop-maven-plugins:2.4.0:protoc (compile-protoc) @ hadoop-common ---
[WARNING] [protoc, --version] failed: java.io.IOException: Cannot run program "protoc": error=2, No such file or directory
[ERROR] stdout: []
Wrong version of ProtoBuf
If you don't have the correct version of protobuf, you'll get
[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.4.0:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: protoc version is 'libprotoc 2.6.1', expected version is '2.5.0' -> [Help 1]
CMAKE missing
If you don't have cmake, you'll get
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-common: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "cmake" (in directory "/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/native"): error=2, No such file or directory
[ERROR] around Ant part ...... @ 4:132 in /Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/antrun/build-main.xml
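The "No such file or directory" errors so far all come from a tool missing on the PATH, and they can be caught before starting Maven. A minimal sketch, where missing_tools is a made-up helper and the tool list is my pick of commands used in this post:

```shell
# Hypothetical helper: print every argument that is not a command on PATH.
missing_tools() {
    for t in "$@"; do
        command -v "$t" >/dev/null 2>&1 || printf '%s\n' "$t"
    done
}

# Commands used by the steps in this post.
missing_tools cmake protoc mvn ant svn patch
```

Anything it prints still needs to be installed.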
JAVA_HOME missing
If you don't have JAVA_HOME correctly set, you'll get
[exec] -- Detecting CXX compiler ABI info
[exec] -- Detecting CXX compiler ABI info - done
[exec] -- Detecting CXX compile features
[exec] -- Detecting CXX compile features - done
[exec] CMake Error at /opt/local/share/cmake-3.2/Modules/FindPackageHandleStandardArgs.cmake:138 (message):
[exec] Could NOT find JNI (missing: JAVA_AWT_LIBRARY JAVA_JVM_LIBRARY
[exec] JAVA_INCLUDE_PATH JAVA_INCLUDE_PATH2 JAVA_AWT_INCLUDE_PATH)
[exec] Call Stack (most recent call first):
[exec] /opt/local/share/cmake-3.2/Modules/FindPackageHandleStandardArgs.cmake:374 (_FPHSA_FAILURE_MESSAGE)
[exec] /opt/local/share/cmake-3.2/Modules/FindJNI.cmake:287 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
[exec] JNIFlags.cmake:117 (find_package)
[exec] CMakeLists.txt:24 (include)
[exec]
[exec]
[exec] -- Configuring incomplete, errors occurred!
[exec] See also "/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/native/CMakeFiles/CMakeOutput.log".
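CMake finds JNI through the headers under the JDK home, so a quick sanity check is whether JAVA_HOME points at a directory that has include/jni.h. A sketch, with jni_headers_present being a hypothetical helper name:

```shell
# Hypothetical helper: succeed if $1 is non-empty and contains include/jni.h.
jni_headers_present() {
    [ -n "$1" ] && [ -f "$1/include/jni.h" ]
}

if jni_headers_present "${JAVA_HOME:-}"; then
    echo "JNI headers found under $JAVA_HOME"
else
    echo "JAVA_HOME is unset or has no include/jni.h: '${JAVA_HOME:-}'" >&2
fi
```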
JniBasedUnixGroupsNetgroupMapping.c patch missing
If you don't have the patch for JniBasedUnixGroupsNetgroupMapping.c above, you'll get
[exec] [ 38%] Building C object CMakeFiles/hadoop.dir/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c.o
[exec] /Library/Developer/CommandLineTools/usr/bin/cc -Dhadoop_EXPORTS -g -Wall -O2 -D_REENTRANT -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -fPIC -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/native/javah -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/main/native/src -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/src -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/target/native -I/Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/include/darwin -I/opt/local/include -I/Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/util -o CMakeFiles/hadoop.dir/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c.o -c /Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c
[exec] /Users/doma/dev/hadoop-2.4.0/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c:77:26: error: invalid operands to binary expression ('void' and 'int')
[exec] if(setnetgrent(cgroup) == 1) {
[exec] ~~~~~~~~~~~~~~~~~~~ ^ ~
[exec] 1 error generated.
[exec] make[2]: *** [CMakeFiles/hadoop.dir/main/native/src/org/apache/hadoop/security/JniBasedUnixGroupsNetgroupMapping.c.o] Error 1
[exec] make[1]: *** [CMakeFiles/hadoop.dir/all] Error 2
[exec] make: *** [all] Error 2
fcloseall patch missing
Without applying the fcloseall patch above, you might get the following error:
[exec] Undefined symbols for architecture x86_64:
[exec] "_fcloseall", referenced from:
[exec] _launch_container_as_user in libcontainer.a(container-executor.c.o)
[exec] ld: symbol(s) not found for architecture x86_64
[exec] collect2: error: ld returned 1 exit status
[exec] make[2]: *** [target/usr/local/bin/container-executor] Error 1
[exec] make[1]: *** [CMakeFiles/container-executor.dir/all] Error 2
[exec] make: *** [all] Error 2
Symlink missing
Without the Classes/classes.jar symlink created in the JDK step above (the export JAVA_HOME, mkdir and ln -s lines), you'll get
Exception in thread "main" java.lang.AssertionError: Missing tools.jar at: /Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home/Classes/classes.jar. Expression: file.exists()
at org.codehaus.groovy.runtime.InvokerHelper.assertFailed(InvokerHelper.java:395)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.assertFailed(ScriptBytecodeAdapter.java:683)
at org.codehaus.mojo.jspc.CompilationMojoSupport.findToolsJar(CompilationMojoSupport.groovy:371)
at org.codehaus.mojo.jspc.CompilationMojoSupport.this$4$findToolsJar(CompilationMojoSupport.groovy)
...
References:
http://java-notes.com/index.php/hadoop-on-osx
https://issues.apache.org/jira/secure/attachment/12602452/HADOOP-9350.patch
http://www.csrdu.org/nauman/2014/01/23/geting-started-with-hadoop-2-2-0-building/
https://developer.apple.com/library/mac/documentation/Porting/Conceptual/PortingUnix/compiling/compiling.html
https://github.com/cooljeanius/libUnixToOSX/blob/master/fcloseall.c
http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/SingleCluster.html