Hadoop Learning, Step 1: Setting Up the Basic Environment
1. Download and install SSH
sudo apt-get install openssh-server openssh-client
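Before moving on, it may be worth confirming that the SSH daemon actually came up; a quick check (assuming the standard Ubuntu service name, ssh):
sudo service ssh status    # should report that the daemon is running
ps -e | grep sshd          # alternatively, look for the sshd process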
3. Set up vsftpd
#sudo apt-get update
#sudo apt-get install vsftpd
For configuration details, see the reference. Starting, stopping and restarting vsftpd:
$sudo /etc/init.d/vsftpd start    #start
$sudo /etc/init.d/vsftpd stop     #stop
$sudo /etc/init.d/vsftpd restart  #restart
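To confirm vsftpd is really serving, one option is to check its status and verify something is listening on the default FTP port 21 (the port is an assumption; adjust it if you changed it in the config):
sudo service vsftpd status
sudo netstat -tlnp | grep :21    # vsftpd should appear as the listener on port 21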
4. Install the JDK
sudo chown -R hadoop:hadoop /opt
cp /soft/jdk* /opt                 # copy the JDK tarball from /soft into /opt
sudo vi /etc/profile               # add the following alias to /etc/profile
alias untar='tar -zxvf'
source /etc/profile                # source needs no sudo; it runs in the current shell
untar jdk*
Configure the environment variables
# vi /etc/profile
●Append the following at the end of the profile file:
# set java environment
export JAVA_HOME=/opt/<your-jdk-directory>
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
After the configuration is done, save and exit.
●Apply the changes without rebooting:
#source /etc/profile
●Test whether the installation succeeded:
# java -version
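Besides java -version, it can help to confirm the variables themselves resolved as expected:
echo $JAVA_HOME     # should print the JDK directory you configured
which java          # should point into $JAVA_HOME/bin
javac -version      # confirms a full JDK (not just a JRE) is on the PATH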
Other issues:
If "unable to resolve host" appears, see the reference for the fix.
If the machine hangs at "Starting sendmail" during boot, see the reference for the solution.
If "E: Unable to locate package vsftpd" appears when installing software, see the reference.
For an introduction to using vi/vim, see the reference.
--------------------------------------------
Clone the master virtual machine to create node1 and node2.
Set the hostname of master to master, of node1 to node1, and of node2 to node2.
(When node1 and node2 are started, the system assigns incrementing IP addresses by default, so no manual change is needed.)
On every node, edit the IPs and hostnames in /etc/hosts (including the IPs and hostnames of the other nodes).
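For reference, /etc/hosts on every node might end up looking like the sketch below; the 192.168.1.x addresses are placeholders, so substitute the IPs your virtual machines actually received:
127.0.0.1      localhost
192.168.1.100  master
192.168.1.101  node1
192.168.1.102  node2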
---------
Configure passwordless SSH login
hadoop@node1:~$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_dsa.
Your public key has been saved in /home/hadoop/.ssh/id_dsa.pub.
The key fingerprint is:
SHA256:B8vBju/uc3kl/v9lrMqtltttttCcXgRkQPbVoU hadoop@node1
The key's randomart image is:
+---[DSA 1024]----+
| .... |
| o+.E . |
| . oo + |
| .. + + |
|o +. o ooo +|
|=o. . o. ooo. o.|
|*o... .+=o .+++.+|
+----[SHA256]-----+
hadoop@node1:~$ cd .ssh
hadoop@node1:~/.ssh$ ll
total 16
drwx------ 2 hadoop hadoop 4096 Jul 24 20:31 ./
drwxr-xr-x 18 hadoop hadoop 4096 Jul 24 20:31 ../
-rw------- 1 hadoop hadoop 668 Jul 24 20:31 id_dsa
-rw-r--r-- 1 hadoop hadoop 602 Jul 24 20:31 id_dsa.pub
hadoop@node1:~/.ssh$ cat id_dsa.pub >> authorized_keys
hadoop@node1:~/.ssh$ ll
total 20
drwx------ 2 hadoop hadoop 4096 Jul 24 20:32 ./
drwxr-xr-x 18 hadoop hadoop 4096 Jul 24 20:31 ../
-rw-rw-r-- 1 hadoop hadoop 602 Jul 24 20:32 authorized_keys
-rw------- 1 hadoop hadoop 668 Jul 24 20:31 id_dsa
-rw-r--r-- 1 hadoop hadoop 602 Jul 24 20:31 id_dsa.pub
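Note that sshd with StrictModes enabled (the default) may refuse key-based login if ~/.ssh or authorized_keys is group-writable, as in the 664 permissions shown above. If passwordless login fails later, tightening the permissions is worth trying:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys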
Test passwordless SSH loopback login on the single machine
hadoop@node1:~/.ssh$ ssh localhost
The authenticity of host 'localhost ' can't be established.
ECDSA key fingerprint is
SHA256:daO0dssyqt12tt9yGUauImOh6tt6A1SgxzSfSmpQqJVEiQTxas.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu (GNU/Linux x86_64)
* Documentation: packages can be updated.
178 updates are security updates.
New release ' LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Sun Jul 24 20:21:39 2016 from ...
hadoop@node1:~$ exit
logout
Connection to localhost closed.
hadoop@node1:~/.ssh$
Seeing the output above means the operation succeeded. Repeat the same steps on the other two nodes.
Let the master node log in to the two slave nodes via SSH without a password
hadoop@node1:~/.ssh$ scp hadoop@master:~/.ssh/id_dsa.pub ./
The authenticity of host 'master ' can't be established.
ECDSA key fingerprint is
SHA256:daO0dssyqtt9yGUuImOh646A1SgxzSfatSmpQqJVEiQTxas.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master' (ECDSA) to the list of known hosts.
hadoop@master's password:
100% 603 s 00:00
hadoop@node1:~/.ssh$ cat id_dsa.pub >> authorized_keys
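For reference, the same key distribution can be done in one step from the master side with ssh-copy-id (assuming the DSA key pair generated earlier and the default OpenSSH client on Ubuntu):
ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@node1    # appends master's public key to node1's authorized_keys
ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@node2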
The scp and cat steps above show node1 logging in to the master node remotely via scp and copying master's public key file into the current directory; this step requires password authentication. Master's public key is then appended to the authorized_keys file. Once this is done, if nothing goes wrong, master can connect to node1 over SSH without a password. Test it on the master node as follows:
hadoop@master:~/.ssh$ ssh node1
The authenticity of host 'node1 ' can't be established.
ECDSA key fingerprint is
SHA256:daO0dssyqt9yGUuImOh3466A1SttgxzSfSmpQqJVEiQTxas.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1' (ECDSA) to the list of known hosts.
Welcome to Ubuntu (GNU/Linux x86_64)
* Documentation: packages can be updated.
178 updates are security updates.
New release ' LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Sun Jul 24 20:39:30 2016 from ...
hadoop@node1:~$ exit
logout
Connection to node1 closed.
hadoop@master:~/.ssh$
As the output above shows, the first connection to node1 still requires typing "yes" to confirm. In other words, when master connects to node1 a manual confirmation is needed and the connection is not fully automatic. After entering yes, the login succeeds and we log out back to master. One more step remains for fully prompt-free SSH: simply run ssh node1 once more; if it no longer asks you to type "yes", the setup is complete. The process is as follows:
hadoop@master:~/.ssh$ ssh node1
Welcome to Ubuntu (GNU/Linux x86_64)
* Documentation: packages can be updated.
178 updates are security updates.
New release ' LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Sun Jul 24 20:47:20 2016 from ...
hadoop@node1:~$ exit
logout
Connection to node1 closed.
hadoop@master:~/.ssh$
As shown above, master can now log in to node1 via SSH without a password.
The node2 node can be configured in exactly the same way.
On the face of it, passwordless SSH to these two nodes is now set up, but the same work still has to be done for the master node itself. This step can seem puzzling, but there is a reason for it: reportedly it is needed on clusters of real physical nodes, because the jobtracker may be placed on another node, i.e. the jobtracker does not necessarily run on master.
Test passwordless SSH login from master to itself:
hadoop@master:~/.ssh$ scp hadoop@master:~/.ssh/id_dsa.pub ./
The authenticity of host 'master ' can't be established.
ECDSA key fingerprint is
SHA256:daO0dssttqt9yGUuImOahtt166AgxttzSfSmpQqJVEiQTxas.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master' (ECDSA) to the list of known hosts.
100% 603 s 00:00
hadoop@master:~/.ssh$ cat id_dsa.pub >> authorized_keys
hadoop@master:~/.ssh$ ssh master
Welcome to Ubuntu (GNU/Linux x86_64)
* Documentation: packages can be updated.
178 updates are security updates.
New release ' LTS' available.
Run 'do-release-upgrade' to upgrade to it.
Last login: Sun Jul 24 20:39:24 2016 from ...
hadoop@master:~$ exit
logout
Connection to master closed.
At this point, passwordless SSH login has been configured successfully.
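A convenient final check is a short loop from master that touches every node; it should print the three hostnames without asking for a password or a "yes" confirmation (hostnames as configured in /etc/hosts above):
for h in master node1 node2; do ssh $h hostname; done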
-------------------------
Extract the Hadoop tarball to /opt/hadoop,
then update the environment variables:
vi /etc/profile
export JAVA_HOME=/opt/<your-jdk-directory>
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS=""
alias untar='tar -zxvf'
alias viprofile='vi /etc/profile'
alias sourceprofile='source /etc/profile'
alias catprofile='cat /etc/profile'
alias cdhadoop='cd /opt/hadoop/'
alias startdfs='$HADOOP_HOME/sbin/start-dfs.sh'
alias startyarn='$HADOOP_HOME/sbin/start-yarn.sh'
alias stopdfs='$HADOOP_HOME/sbin/stop-dfs.sh'
alias stopyarn='$HADOOP_HOME/sbin/stop-yarn.sh'
source /etc/profile
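Before continuing, it may be worth confirming that the new variables and aliases took effect (assuming Hadoop was extracted to /opt/hadoop as above):
echo $HADOOP_HOME    # should print /opt/hadoop
hadoop version       # prints the Hadoop build information if $HADOOP_HOME/bin is on the PATH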
------------------
Step 6: Modify the configuration
There are 7 files to modify in total:
$HADOOP_HOME/etc/hadoop/hadoop-env.sh
$HADOOP_HOME/etc/hadoop/yarn-env.sh
$HADOOP_HOME/etc/hadoop/core-site.xml
$HADOOP_HOME/etc/hadoop/hdfs-site.xml
$HADOOP_HOME/etc/hadoop/mapred-site.xml
$HADOOP_HOME/etc/hadoop/yarn-site.xml
$HADOOP_HOME/etc/hadoop/slaves
where $HADOOP_HOME is the Hadoop root directory.
a) hadoop-env.sh and yarn-env.sh
In these two files the main change is the directory after JAVA_HOME; set it to the directory where the JDK actually lives on this machine.
vi etc/hadoop/hadoop-env.sh (and vi etc/hadoop/yarn-env.sh)
Find the line below and change it (to your JDK directory; adjust to your actual situation):
export JAVA_HOME=/opt/<your-jdk-directory>
In addition, it is recommended to add this line to hadoop-env.sh:
export HADOOP_PREFIX=/opt/hadoop
b) core-site.xml — modify it with reference to the sketch below:
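The XML bodies for this and the following *-site.xml sections (c through e below) were lost in the original formatting. The following is a minimal sketch with typical Hadoop 2.x values consistent with the rest of this guide (NameNode on master, temporary directory /opt/hadoop/tmp, two datanodes); the port number and property values are common choices rather than the original author's, so adjust them to your cluster.
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmp</value>
  </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

<!-- yarn-site.xml -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>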
Note: if the /opt/hadoop/tmp directory does not exist, create it manually with mkdir first. For the complete list of core-site.xml parameters, see the reference.
c) hdfs-site.xml — modify it with reference to the sketch above. Note: dfs.replication is the number of data replicas; it should generally not exceed the number of datanodes. For the complete parameter list, see the reference.
d) mapred-site.xml — modify it with reference to the sketch above. For the complete parameter list, see the reference.
e) yarn-site.xml — modify it with reference to the sketch above. For the complete parameter list, see the reference.
In addition, compared with Hadoop 1.x, many parameters in the Hadoop 2.x configuration files have been marked as deprecated; see the reference for details.
Leave the last file, slaves, alone for the moment (you can move it out of the way with mv for now). Once the configuration above is in place, you can start the NameNode on master and test it:
$HADOOP_HOME/bin/hdfs namenode -format    (format it first)
16/07/25 ...
16/07/25 20:34:42 INFO: Allocated new BlockPoolId: ...
16/07/25 20:34:42 INFO: Storage directory /opt/hadoop/tmp/dfs/name has been successfully formatted.
16/07/25 20:34:43 INFO: Going to retain 1 images with txid >= 0
16/07/25 20:34:43 INFO: Exiting with status 0
16/07/25 20:34:43 INFO: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/...
When you see this, the format succeeded.
$HADOOP_HOME/sbin/start-dfs.sh
After it starts, run jps (or ps -ef | grep ...) to check the processes. If you see the following two processes:
5161 SecondaryNameNode
4989 NameNode
the master node is basically OK.
Next run $HADOOP_HOME/sbin/start-yarn.sh; when it finishes, run jps again:
5161 SecondaryNameNode
5320 ResourceManager
4989 NameNode
If you see these 3 processes, YARN is OK as well.
f) Modify /opt/hadoop/etc/hadoop/slaves
If you renamed the slaves file with mv earlier, rename it back to slaves first. Then run vi slaves, edit the file and enter:
node1
node2
Save and exit. Finally run
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/stop-dfs.sh
to stop the services you just started.
Step 7: Copy the hadoop directory on master to node1 and node2
Still on the master machine:
cd        (go to the home directory first)
cd /opt
zip -r hadoop.zip hadoop
scp hadoop.zip hadoop@node1:/opt/
scp hadoop.zip hadoop@node2:/opt/
then unzip hadoop.zip on node1 and node2.
Note: the Hadoop temporary directory (tmp) and data directory (data) on node1 and node2 still have to be created manually first.
-----
Step 8: Verification
On the master node, start the services again:
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh
------
hadoop@master:/opt/hadoop/sbin$ ./start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /opt/hadoop/logs/...
node1: starting datanode, logging to /opt/hadoop/logs/...
node2: starting datanode, logging to /opt/hadoop/logs/...
Starting secondary namenodes [...]
starting secondarynamenode, logging to /opt/hadoop/logs/...
------
hadoop@master:/opt/hadoop/sbin$ ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/logs/...
node1: starting nodemanager, logging to /opt/hadoop/logs/...
node2: starting nodemanager, logging to /opt/hadoop/logs/...
------
If all goes well, the master node has the following 3 processes:
ps -ef | grep ResourceManager
ps -ef | grep SecondaryNameNode
ps -ef | grep NameNode
7482 ResourceManager
7335 SecondaryNameNode
7159 NameNode
and node1 and node2 each have the following 2 processes:
ps -ef | grep DataNode
ps -ef | grep NodeManager
2296 DataNode
2398 NodeManager
You can also check the web interfaces, and run
bin/hdfs dfsadmin -report
to view an HDFS status report.
Other notes:
a) If the master (i.e. namenode) node has to be reformatted, first empty the data directory on every datanode (preferably empty the tmp directory as well); otherwise, after the format completes, the datanodes will fail to start when dfs is started.
b) If running only the namenode on the master machine feels wasteful and you would like master to also act as a datanode, simply add a line "master" to the slaves file.
c) For convenience, you can edit /etc/profile to add the Hadoop lib directories to the CLASSPATH environment variable and the hadoop/bin and hadoop/sbin directories to PATH; refer to the following (adjust to your actual setup):
export HADOOP_HOME=/home/hadoop/...
JAVA_HOME=/usr/java/...
CLASSPATH=.:$JAVA_HOME/jre/lib/...:$JAVA_HOME/lib/...:$JAVA_HOME/lib/...:$HADOOP_HOME/share/hadoop/common/...
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
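As a final sanity check beyond dfsadmin -report, a small end-to-end test can be run from master; the example jar name below contains a version number that depends on your Hadoop release, so adjust the path as needed:
hdfs dfs -mkdir -p /test
hdfs dfs -put /etc/profile /test/                # write a small file into HDFS
hdfs dfs -ls /test                               # the file should be listed
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10    # runs a tiny MapReduce job on YARN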