Friday, September 12, 2014

Git ignore pattern in a file not working? Watch out for spaces on the line!

I couldn't figure out why a git ignore pattern applied via:

git config --global core.excludesfile

wasn't working.  It turned out that spaces at the end of the pattern line were preventing it from matching.  According to the docs, trailing spaces aren't supposed to matter:

Trailing spaces are ignored unless they are quoted with backslash ("\")

However, they did matter in this case.  Perhaps this is just an issue with Git on Windows.
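
One quick way to check for this problem is to search the excludes file for lines that end in whitespace.  The following is only a minimal sketch: the function names trailing_ws and strip_trailing_ws are made up for illustration, and it assumes a POSIX shell with grep and sed available (for example, Git Bash on Windows).

```shell
# Print "LINENO:CONTENT" for every line that ends in whitespace.
# Usage: trailing_ws FILE
trailing_ws() {
    grep -n '[[:space:]]$' "$1"
}

# Write FILE to stdout with trailing whitespace removed from every line.
# Usage: strip_trailing_ws FILE
strip_trailing_ws() {
    sed 's/[[:space:]]*$//' "$1"
}
```

Call trailing_ws with the path to your global excludes file.  Two caveats: strip_trailing_ws would also remove a space you deliberately escaped with a backslash, and because [[:space:]] matches carriage returns, a file with Windows line endings will be reported on every line.  Review the output before overwriting anything.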

TIP:  Use the following to test out your ignore patterns in dry run mode via -n:

git add -n *
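
For a more direct check, and assuming Git 1.8.2 or later, "git check-ignore -v" reports which ignore file, line number, and pattern matched a given path.  The wrapper name why_ignored below is hypothetical; the sketch simply forwards its arguments to Git.

```shell
# Show the ignore rule (source file, line number, pattern) matching each path.
# Prints nothing and returns non-zero when none of the paths are ignored.
# Usage: why_ignored PATH...
why_ignored() {
    git check-ignore -v -- "$@"
}
```

If a path you expected to be ignored is missing from the output, the pattern is not matching, which is a good cue to look for stray whitespace on that line.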

Monday, September 8, 2014

Resetting (Deleting and Cleaning Out) an Ambari Cluster

If you are experimenting with Ambari for Hadoop cluster provisioning, it is useful to be able to wipe the Ambari server and agents clean so you can try again.  Ambari provides some commands for this, but there are also a couple of things to watch out for, detailed below.  These instructions worked for me on Ambari 1.6.1 with Red Hat 6.5.

First, stop and reset on the Ambari server:

[root@test-ambari ambuser]# ambari-server stop
[root@test-ambari ambuser]# ambari-server reset

Next, to prevent a possible obscure "no more mirrors to try" error on re-provisioning, clean out the yum cache on all the agent machines, as described in my September 5 post on that error.  I have SaltStack installed so I can run it across my cluster like this (or just log into each machine and run 'yum clean all'):

[root@test-ambari ~]# salt '*' 'yum clean all'

Then go to each Ambari agent machine and run the host cleanup.  It would be nice to do this with SaltStack, but that requires giving sudo tty permissions for the command (which I didn't want to get into).  

I'm showing some of the output below but you may see different behaviour depending on the particulars of the cluster and how far the prior provisioning process got:

[root@master-master ~]# python /usr/lib/python2.6/site-packages/ambari_agent/ --silent

Now restart the Ambari server:

[root@test-ambari ambuser]# ambari-server start

I don't know if this is documented anywhere, but I am using a script to provision my cluster via the API, and I found I had to wait until all the machines (agents) had self-registered with the Ambari server (at least this is what I think is going on).  Here, I am using the Ambari API, piped through "grep" and "wc", to monitor the count of registered machines.  It took about 45 seconds for all the agents to register (when the count finally hit 4).

[root@test-ambari ~]# curl -sH "X-Requested-By: ambari" -u $USER:$PWD -i  http://localhost:8080/api/v1/hosts | grep host_name | wc
      2       6     102
[root@test-ambari ~]# curl -sH "X-Requested-By: ambari" -u $USER:$PWD -i  http://localhost:8080/api/v1/hosts | grep host_name | wc
      3       9     152
[root@test-ambari ~]# curl -sH "X-Requested-By: ambari" -u $USER:$PWD -i  http://localhost:8080/api/v1/hosts | grep host_name | wc
      3       9     152
[root@test-ambari ~]# curl -sH "X-Requested-By: ambari" -u $USER:$PWD -i  http://localhost:8080/api/v1/hosts | grep host_name | wc
      4      12     202
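
Rather than re-running the command by hand, the wait can be scripted.  This is a rough sketch of one way to do it, not something provided by Ambari: the function names count_hosts and wait_for_hosts and the variables AMBARI_USER, AMBARI_PASS, and AMBARI_URL are my own placeholders.  I use separate variable names here because PWD normally holds the shell's current working directory.

```shell
# Count registered hosts by counting "host_name" lines in the API response,
# the same signal as the grep and wc pipeline above.
count_hosts() {
    curl -s -H "X-Requested-By: ambari" -u "$AMBARI_USER:$AMBARI_PASS" \
        "${AMBARI_URL:-http://localhost:8080}/api/v1/hosts" | grep -c host_name
}

# Poll until at least EXPECTED hosts are registered or TIMEOUT seconds pass.
# Returns 0 on success and 1 on timeout.
# Usage: wait_for_hosts EXPECTED [TIMEOUT] [INTERVAL]
wait_for_hosts() {
    expected=$1
    timeout=${2:-120}
    interval=${3:-5}
    waited=0
    while :; do
        n=$(count_hosts)
        [ "${n:-0}" -ge "$expected" ] && return 0
        [ "$waited" -ge "$timeout" ] && return 1
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

A provisioning script could then run "wait_for_hosts 4 || exit 1" before making the cluster-creation calls.  The default timeout and interval are arbitrary choices; adjust them for your cluster.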

If you proceed before everything is registered, you may run into this error using the API:

  "status" : 400,
  "message" : "Attempted to add unknown hosts to a cluster.  These hosts have not been registered with the server:"

At this point, you should have a clean Ambari server/agent substrate on which to create the next cluster.  Happy provisioning!

Here are the commands with output:

Ambari-server  stop/reset:

[root@test-ambari ambuser]# ambari-server stop
Using python  /usr/bin/python2.6
Stopping ambari-server
Ambari Server stopped
[root@test-ambari ambuser]# ambari-server reset
Using python  /usr/bin/python2.6
Resetting ambari-server
**** WARNING **** You are about to reset and clear the Ambari Server database. This will remove all cluster host and configuration information from the database. You will be required to re-configure the Ambari server and re-run the cluster wizard. 
Are you SURE you want to perform the reset [yes/no] (no)? y
Confirm server reset [yes/no](no)? y
Resetting the Server database...
Connecting to local database...done.
WARNING: Non critical error in DDL, use --verbose for more information
Ambari Server 'reset' completed with warnings.

Yum cache cleaning:

[root@test-ambari ~]# salt '*' 'yum clean all'
    Loaded plugins: product-id, refresh-packagekit, rhnplugin, security,
    Cleaning repos: HDP-2.1 HDP-UTILS- Updates-ambari-1.6.1 ambari-1.x
                  : dogfood dogfood_6_x86-64 epel6_x86-64 rhel-x86_64-server-6
                  : rhel-x86_64-server-optional-6 rhel-x86_64-server-supplementary-6
    Cleaning up Everything

Host Cleanup (on the agents)--your output could be quite different:

[root@master-master ~]# python /usr/lib/python2.6/site-packages/ambari_agent/ --silent
Killing pid's: ['']
INFO:HostCleanup:Deleting packages: ['']
Deleting users: ['ambari-qa', 'yarn', 'hdfs', 'mapred', 'zookeeper']
INFO:HostCleanup:Executing command: sudo userdel -rf ambari-qa
INFO:HostCleanup:Successfully deleted user: ambari-qa
INFO:HostCleanup:Executing command: sudo userdel -rf yarn
INFO:HostCleanup:Successfully deleted user: yarn
INFO:HostCleanup:Executing command: sudo userdel -rf hdfs
INFO:HostCleanup:Successfully deleted user: hdfs
INFO:HostCleanup:Executing command: sudo userdel -rf mapred
INFO:HostCleanup:Successfully deleted user: mapred
INFO:HostCleanup:Executing command: sudo userdel -rf zookeeper
INFO:HostCleanup:Successfully deleted user: zookeeper
INFO:HostCleanup:Executing command: sudo groupdel hadoop
WARNING:HostCleanup:Cannot delete group : hadoop, groupdel: cannot remove the primary group of user 'tez'
INFO:HostCleanup:Path doesn't exists: /home/ambari-qa
INFO:HostCleanup:Path doesn't exists: /home/yarn
INFO:HostCleanup:Path doesn't exists: /home/hdfs
INFO:HostCleanup:Path doesn't exists: /home/mapred
INFO:HostCleanup:Path doesn't exists: /home/zookeeper
Deleting directories: ['']
INFO:HostCleanup:Path doesn't exists: 
Deleting repo files: []
Erasing alternatives:{'symlink_list': [''], 'target_list': ['']}
INFO:HostCleanup:Path doesn't exists: 

INFO:HostCleanup:Clean-up completed. The output is at /var/lib/ambari-agent/data/hostcleanup.result

Restart the Ambari server:

[root@test-ambari ambuser]# ambari-server start
Using python  /usr/bin/python2.6
Starting ambari-server
Ambari Server running with 'root' privileges.
Organizing resource files at /var/lib/ambari-server/resources...
Waiting for server start...
sh: line 0: ulimit: open files: cannot modify limit: Operation not permitted
Server PID at: /var/run/ambari-server/
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Ambari Server 'start' completed successfully.

[root@test-ambari ambuser]# python 

Friday, September 5, 2014

Ambari Cluster Provisioning Failure -- No More Mirrors To Try

I saw this when trying to re-provision a cluster after doing an "ambari-server reset" (Ambari 1.6.1 on Red Hat 6.5):

Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install hadoop-yarn' returned 1. Error Downloading Packages:
  hadoop-yarn- failure: hadoop/hadoop-yarn- from HDP-2.1: [Errno 256] No more mirrors to try.
  hadoop- failure: hadoop/hadoop- from HDP-2.1: [Errno 256] No more mirrors to try.
  zookeeper- failure: zookeeper/zookeeper- from HDP-2.1: [Errno 256] No more mirrors to try.

The solution was to do a "yum clean all" on the agents and retry (which requires doing the "ambari-server reset" and agent cleanup all over again).