Thursday, 3 June 2010

Hadoop Cluster Problems

Network Instability

The cluster had intermittent network availability, sometimes it would accept ssh connections only sometimes. The connections only seemed to last a period of 10 minutes before they were disconnected.
It was initailly thought that it may have been something to do with the dhcp server but it turned out that this was not the cause.
One of the symptoms was during a ping to the DNS server one would get a few returns then a few Destination Host Unreachable errors.
It was eventually traced to having two gateways setup in the '/etc/network/interfaces' one on the public network and one on the private network, when this was corrected to having only one public gateway this fixed the problem.

No comments:

Post a Comment

File resolv.conf changed on reboot

The file /etc/resolv.conf was being reset and losing the correct nameserver on my Raspberry Pi after a reboot. The unfriendly way of fixing ...