Rocks cluster notes: common problems with Rocks installation
December 4, 2018, 23:35
软科
Common problems and commands

1. Permanently disable the firewall:

rocks run host "chkconfig iptables off"

2. Adding environment variables: system-wide variables go in /etc/profile; variables for the current user go in ~/.bashrc.
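As a concrete illustration of item 2, here is a minimal sketch; the directory /opt/vasp/bin is only a placeholder for whatever path you actually need to add:

```
# system-wide: append to /etc/profile (placeholder path /opt/vasp/bin)
echo 'export PATH=$PATH:/opt/vasp/bin' >> /etc/profile

# current user only: append to ~/.bashrc
echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc

# re-read the file (or log in again) for the change to take effect
source /etc/profile
```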
3. Setting the system time:

```
date -s 20071215
date -s 15:35
```

To update the BIOS clock at the same time, follow up with `clock -w`.

After all nodes have finished installing:

4. When ssh-ing to other nodes you may see:

```
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Warning: No xauth data; using fake authentication data for X11 forwarding.
```

Fix: edit /etc/ssh/ssh_config and add `ForwardX11Trusted yes` at the end. Do this on every node, and copy the head node's key files over:

scp /root/.ssh/* compute:/root/.ssh/

Then log out and run `rocks sync config`.

5. Reinstalling nodes. If a compute node in the cluster needs to be reinstalled, run on that node:

`/boot/kickstart/cluster-kickstart`

Alternatively, run on the frontend:

`rocks run host '/boot/kickstart/cluster-kickstart'`

to reinstall all compute nodes. To reinstall all compute nodes and have them resume the computations interrupted by the reinstall, drive the process through SGE by running:

/opt/gridengine/examples/jobs/sge-reinstall.sh

6. How do I remove a compute node from the cluster?

On your frontend, execute:

```
# rocks remove host "[your compute node name]"
```

For example, if the compute node's name is compute-0-1, you would execute:

```
# rocks remove host compute-0-1
# rocks sync config
```

The compute node has been removed from the cluster.

7. How do I export a new directory from the frontend to all the compute nodes so that it is accessible under /home?

Execute this procedure:
• Add the directory you want to export to the file /etc/exports. For example, to export the directory /export/disk1, add the following to /etc/exports:
/export/disk1 10.0.0.0/255.0.0.0(rw)
• Restart NFS:
# /etc/rc.d/init.d/nfs restart
• Add an entry to /etc/auto.home. For example, to mount /export/disk1 on the frontend machine (named frontend-0) as /home/scratch on each compute node, add the following entry to /etc/auto.home:
scratch frontend-0:/export/disk1
• Inform 411 of the change:
# make -C /var/411
Now when you log in to any compute node and change your directory to /home/scratch, it will be automounted.

8. Note: whenever a rocks command changes the database configuration (for example, removing a compute node), run

rocks sync config

afterwards to write the updated database information into the nodes' system configuration files; otherwise other management commands may fail with puzzling errors.

9. Submitting VASP jobs

1) (Zhou Jian) Script name: vasp.sh

```
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash

mpirun -r ssh -f $TMPDIR/machines -n $NSLOTS /home/software/vasp/vasp
```

The SGE header lines above must appear in every job script. Entries which start with #$ are treated as SGE options:
• -cwd means execute the job from the current working directory.
• -j y merges the standard error stream into the standard output stream instead of keeping two separate error and output streams.
• -S /bin/bash specifies the interpreting shell for this job to be the Bash shell.
The -np $NSLOTS argument tells mpirun how many processor cores to use for the computation, and is followed by the path to the computational program.

To submit:

qsub -pe mpich 4 vasp.sh

2) The parallel environment can also be requested inside the script:

```
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -pe mpich 16

# (you may also add: export PATH=$PATH:<path>)
mpirun -r ssh -f $TMPDIR/machines -n $NSLOTS /home/software/vasp/vasp
# (alternative form: MPI_DIR=/opt/mpich/gnu; $MPI_DIR/bin/mpirun -np $NSLOTS -machinefile $TMP/machines ./cpi)
```

Again, the SGE header lines must appear in every job script. #$ -pe mpich 16 sets the parallel environment to mpich and requests 16 processor cores for the run; adjust the rest according to the particular application.

To submit:

qsub vasp.sh   (or ./vasp.sh)

3) Run qstat to check job status. A job in state qw is waiting; a job in state r is running. The slots column shows how many processor cores the job is currently using.

10. Software installation

Rename a group: groupmod -n <new group name> <old group name>
Change a user's primary group: usermod -g <group> <user>
Rename a user: usermod -l <new username> <old username>
Change a user's login directory: usermod -d <login directory> <user>
Delete a user and its home directory: userdel -r <user>
Create a group: groupadd cluster

11. Adding users (when the cluster group does not exist)

1)
adduser -g root mu
adduser -g root soft
passwd mu
rocks sync users
make -C /var/411/ force
rocks sync config

By default, creating the user mu creates the directory /export/home/mu; this directory is shared with the other compute nodes and corresponds to /home/mu (including on the head node; software can be installed under /export/home/mu/soft/).

2) As root, create the user soft:
useradd soft

3) As root, remove its password and open up its home directory:
passwd -d soft
chmod a+rwx /export/home/soft
Synchronize the accounts:
rocks sync users
Publish the password information:
make -C /var/411 force

4) Use XFTP to copy the program to the soft account: as root, copy it under /export/home/soft/src, then change the owner:
chown -v soft:soft <file or directory>   (user:group)

5) Running commands on nodes:
rocks run host compute-0-0 command="hostname"
rocks run host n "reboot"
The second form runs the given command (here "reboot"; another example is 'ls /tmp/') on all n nodes.

12. ERROR: unable to send message to qmaster using port 536 on host "cluster.local": got send error

(Rocks-Discuss, Luca Clementi, Wed Sep 12 2012)

On Wed, Sep 12, 2012 at 4:42 PM, 杨燚 <yang_yi at neusoft.com> wrote:
> I just deleted some users and stopped some services, and then rocks sync config doesn't work any more:
>
> [root at cluster ~]# rocks sync config
> error: commlib error: got select error (Connection refused)
> ERROR: unable to send message to qmaster using port 536 on host "cluster.local": got send error

I would think it's an SGE problem. Can you restart it from the init script?

/etc/init.d/sgemaster.zhaoming start
/etc/init.d/sgeexecd.sten start

Luca

13. Problems with X11 forwarding and qlogin

(Rocks-Discuss, Anoop Rajendra, Fri Oct 9 2009)

On your frontend, add the line

ForwardX11Trusted yes

to your /etc/ssh/ssh_config and let us know if that solves your problem.

Note: in a Rocks cluster, this line has to be added to /etc/ssh/ssh_config on every node.

14. Installing software on all nodes

See section 5.1, "Adding Packages to Compute Nodes", of the Rocks cluster 6.2 manual. After installing software with yum on the frontend, the downloaded packages sit under /var/cache/yum; copy all packages from that directory to /export/rocks/install/contrib/6.2/arch/RPMS, then follow the steps below to install the frontend's software on all compute nodes.

Put the package you want to add in:

/export/rocks/install/contrib/6.2/arch/RPMS

where arch is your architecture ("i386" or "x86_64").

Create a new XML configuration file that will extend the current compute.xml configuration file:

```
# cd /export/rocks/install/site-profiles/6.2/nodes
# cp skeleton.xml extend-compute.xml
```

If you use extend-compute.xml, your packages will be installed only on your compute nodes. If you want your packages to be installed on all other appliances (e.g. login nodes, nas nodes, etc.), use extend-base.xml instead of extend-compute.xml.

Inside extend-compute.xml, add the package name by changing the section from:

<package> <!-- insert your package name here --> </package>

to:

<package> your package </package>

for example:

<package>rsh-server</package>

It is important that you enter the base name of the package in extend-compute.xml and not the full name. For example, if the package you are adding is named XFree86-100dpi-fonts-4.2.0-6.47.i386.rpm, input XFree86-100dpi-fonts as the package name in extend-compute.xml:

<package>XFree86-100dpi-fonts</package>

If you have multiple packages you'd like to add, you'll need a separate <package> tag for each. For example, to add both the 100 and 75 dpi fonts, the following lines should be in extend-compute.xml:

<package>XFree86-100dpi-fonts</package>
<package>XFree86-75dpi-fonts</package>

Also, make sure that you remove any package lines which do not have a package in them; the file should NOT contain any lines such as:

<package> <!-- insert your package name here --> </package>

Now build a new Rocks distribution. This will bind the new package into a RedHat-compatible distribution in the directory /export/rocks/install/rocks-dist/...

```
# cd /export/rocks/install
# rocks create distro
```

Now, reinstall your compute nodes.
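Pulling the fragments of step 14 together, here is a hedged sketch of what a finished extend-compute.xml might look like. It assumes the skeleton file's usual layout (a <kickstart> root element with <package> entries and an optional <post> section); check your own skeleton.xml, since the layout can differ between Rocks releases:

```
<?xml version="1.0" standalone="no"?>
<kickstart>

  <description>
  Extra packages for the compute nodes
  </description>

  <!-- base package names only, one <package> tag per package -->
  <package>XFree86-100dpi-fonts</package>
  <package>XFree86-75dpi-fonts</package>
  <package>rsh-server</package>

  <!-- post-install commands, if any, would go in a <post> section here -->

</kickstart>
```

After saving the file, rebuild the distribution with rocks create distro as shown above and reinstall the compute nodes.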
15. Network reinstall of all compute nodes

After your frontend completes its installation, the last step is to force a re-installation of all of your compute nodes. The following will force a PXE (network install) reboot of all your compute nodes:

```
# ssh-agent $SHELL
# ssh-add
# rocks run host compute '/boot/kickstart/cluster-kickstart-pxe'
```

16. [Rocks-Discuss] Installing rsh in Rocks cluster 5.3 (see item 14 above, installing software on all nodes)

(Rocks-Discuss, Go Yoshimura, Mon May 17 2010)

Hi Leo!

base-rsh.xml
- We are not sure about base-rsh.xml, but you can create it by hand.
- Perhaps /export/rocks/install/rocks-dist/x86_64/build/nodes/rsh.xml may be the answer (I'm not sure).

[root at panrocks53 nodes]# cat /export/rocks/install/rocks-dist/x86_64/build/nodes/rsh.xml | grep package
<package>rsh</package>
<package>rsh-server</package>

- We usually install rsh-server, telnet-server and vsftpd by specifying
<package>rsh-server</package>
<package>telnet-server</package>
<package>vsftpd</package>
in a node file.
- About node files and graph files, http://www.rocksclusters.org/rocksapalooza/2009/customizing.pdf is helpful.

RPM
- We pick up RPMs from the CentOS 5.4 iso file.

thank you
go

Leo P. wrote:
> I am trying to install rsh in rocks cluster 5.3. I tried the old way specified here:
> http://www.rocksclusters.org/rocks-documentation/4.2/customization-rsh.html
> but I cannot find the base-rsh.xml and RPM in the repository.
> So can anyone please tell me how I can install rsh in rocks cluster 5.3?
> I need rsh to run an old piece of software and cannot use ssh instead.

-- Go Yoshimura <go-yoshimura at sstc.co.jp>, Scalable Systems Co., Ltd., http://www.sstc.co.jp/

17. About partitioning: /export is linked to the remaining disk space; under the share directory, create a new link named apps that points to the apps folder under /export.
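Item 17 is terse, so here is a minimal sketch of what it appears to describe, assuming the goal is to expose an apps directory living on the large /export partition under the share directory. The paths are taken from the note and from Rocks defaults and may differ on your cluster; on many Rocks installs /share/apps is already automounted from the frontend's /export/apps, in which case no link is needed:

```
# create the target directory on the partition that holds the free disk space
mkdir -p /export/apps

# expose it under the share directory as "apps"
ln -s /export/apps /share/apps
```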
18. Building a restore ISO of the frontend and upgrading the nodes

See section 3.4, "Upgrade or Reconfigure Your Existing Frontend", of the Rocks cluster 6.2 manual. This procedure describes how to use a Restore Roll to upgrade or reconfigure your existing Rocks cluster.

First create a Restore Roll for your frontend. This roll will contain site-specific info that will be used to quickly reconfigure your frontend (see the Rocks manual for details):

```
# cd /export/site-roll/rocks/src/roll/restore
# make roll
```

The above command will output a roll ISO image with a name of the form hostname-restore-date-0.arch.disk1.iso. For example, on an i386-based frontend with the FQDN rocks-45.sdsc.edu, the roll will be named something like:

rocks-45.sdsc.edu-restore-2006.07.24-0.i386.disk1.iso

Burn your restore roll ISO image to a CD. Reinstall the frontend by putting the Rocks Boot CD in the CD tray (generally, this is the Kernel/Boot Roll) and rebooting the frontend. At the boot: prompt type:

build

At this point, the installation follows the same steps as a normal frontend installation (see the section "Install Frontend"), with two exceptions:
1. On the first user-input screen (the screen that asks for 'local' and 'network' rolls), be sure to supply the Restore Roll that you just created.
2. You will be forced to manually partition your frontend's root disk. You must reformat your / partition, your /var partition and your /boot partition (if it exists). Also, be sure to assign the mountpoint /export to the partition that contains the users' home areas. Do NOT erase or format this partition, or you will lose the user home directories. Generally, this is the largest partition on the first disk.

After your frontend completes its installation, the last step is to force a re-installation of all of your compute nodes. The following will force a PXE (network install) reboot of all your compute nodes:

```
# ssh-agent $SHELL
# ssh-add
# rocks run host compute '/boot/kickstart/cluster-kickstart-pxe'
```
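The manual's "burn the ISO to a CD" step can be done from the frontend itself; a minimal sketch, assuming the burner shows up as /dev/cdrom and using the example ISO name above (substitute your own ISO name and device):

```
# burn the restore roll ISO to a blank CD (device path and ISO name are placeholders)
cdrecord -v dev=/dev/cdrom rocks-45.sdsc.edu-restore-2006.07.24-0.i386.disk1.iso
```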