Linux clustering is a steaming pile of shit
Jan. 4th, 2010 @ 02:50 am
[To longtime readers of my blog, sorry about the more CS-y content that's starting to appear. It's all part of Iron Blogger. I'll try to also make sure to post some more personal stuff.]

I've spent the past 6 hours trying to configure a cluster of Linux machines. Before I talk about the current sad state of affairs, let me give you a brief history of Linux clustering.

In the beginning, there was Heartbeat. Released in 1998, it was a very simple clustering system. It only supported two nodes, and it managed a simple set of "resources" that could be running on either machine. It wasn't very powerful, but it also just worked.

Meanwhile, Sistina (later Red Hat) was creating cman, its own clustering system. cman was so bad that it was completely replaced with cman 2, a somewhat modular system in which cman became a thin layer of configuration utilities controlling the OpenAIS cluster library.

Heartbeat, meanwhile, had developed into Heartbeat 2, with support for more than 2 nodes and much more complete resource management. But the big feature it lacked was distributed locking, a mechanism for synchronizing access between all the nodes in the cluster. Thus, I started looking at cman.

Around this time, Heartbeat development started to stagnate. So, some of the developers split Heartbeat into a new tool called "pacemaker", which just manages the resources in the cluster, and a new Heartbeat that manages cluster membership. Then there was an unholy matrimony of pacemaker with openais, in which pacemaker gained support for using either Heartbeat or openais as its cluster membership library.

Then things got even more interesting when openais decided to split into two modules, a new cluster membership library called "corosync" and a cluster utility library called (the new) "openais". Pacemaker doesn't require any of the utilities from the new openais, but it still calls corosync "openais" when it is using corosync.

(While doing research for this blog post, I also discovered that Heartbeat development has picked up again and they're about to do a release of Heartbeat 3.0.)

So, it is in this confusing and incestuous world of clustering software that I have tried to set up a clvm cluster. clvm is a piece of software that lets you share a single physical disk (in our case, an iSCSI-connected RAID) between multiple computers that access it simultaneously. clvm delightfully supports four different clustering systems: cman, openais, corosync, and gulm. We had been running clvm with cman, but that combination is extremely prone to deadlocks, which is particularly bad because cman's distributed locking is implemented in the Linux kernel, and there is no way to reset the state of the locks without completely rebooting the machine.
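(For the curious, the clvm half of the setup is tiny. A minimal sketch of the relevant change, assuming a stock LVM2 /etc/lvm/lvm.conf, is just switching LVM from local to clustered locking:)

    # /etc/lvm/lvm.conf (excerpt) -- minimal sketch for a stock LVM2 install
    global {
        # locking_type 1 = local, file-based locking (the single-machine default)
        # locking_type 3 = clustered locking through the clvmd daemon
        locking_type = 3
    }

(clvmd then has to be running on every node, talking to whichever cluster stack you picked, before any LVM command will touch the shared volume group.)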

evanbro and I have been working on getting clvm, openais, corosync, and pacemaker running on Ubuntu Hardy for Invirt. After backporting the current versions of all of the software, we discovered that none of the documentation out there matched the current state of the software (it almost all assumes that "openais" is a single, unsplit program), and worse, most of the software isn't documented at all.

The corosync documentation says that you can write a simple configuration file and start "corosync" on all the nodes, and they will all find each other and form a cluster. Hah. While the log messages that get to syslog certainly indicate that the nodes find each other, all the client utilities (such as corosync-cfgtool, or more importantly, clvmd) fail with error code 6 ("SA_AIS_ERR_TRY_AGAIN"). There is, of course, no documentation anywhere that tells you what you should do when you encounter this error. After pounding on it for a while, enabling and disabling the various openais plugins in corosync, we eventually got clvmd to start and configure itself (by disabling openais proper but enabling its openais_lck plugin for distributed locking). But performing any operation simultaneously in two places immediately and repeatedly caused a deadlock.
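(For reference, the "simple configuration file" looks roughly like this. This is a sketch for the corosync 1.x of the time, with placeholder addresses you would swap for your own cluster network; the service stanza is how we loaded the split-out lock plugin mentioned above:)

    # /etc/corosync/corosync.conf -- minimal sketch, corosync 1.x era
    totem {
        version: 2
        secauth: off
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.1.0   # placeholder: your cluster subnet
            mcastaddr: 226.94.1.1      # placeholder: multicast group
            mcastport: 5405
        }
    }
    logging {
        to_syslog: yes
    }
    # Load the openais distributed-lock plugin that clvmd's locking uses
    service {
        name: openais_lck
        ver: 0
    }

(Once it's up, "corosync-cfgtool -s" is supposed to print the ring status on each node, at least when it isn't busy returning SA_AIS_ERR_TRY_AGAIN.)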

Did I mention that pacemaker, despite running on a clustering stack other than its original one, just worked the minute we turned it on on openais?
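("Turning it on" amounted to roughly one more service stanza in corosync.conf, sketched here per the convention of the time:)

    # Load pacemaker as a corosync plugin
    service {
        name: pacemaker
        ver: 0    # ver 0: corosync itself spawns pacemaker's daemons
    }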

Ugh. As keithw would say, "I bemoan the state of Computer Science".
Current Location: SIPB Office
Current Mood: accomplished
From: (Anonymous)
Date: January 4th, 2010 01:45 pm (UTC)

Corrections

Get your facts straight before mouthing off.

Pacemaker is not a fork of Heartbeat:
Heartbeat 3.0 = Heartbeat 2.x - Pacemaker 0.6 - ClusterGlue 1.0 - ResourceAgents 1.0

Corosync is not a fork of openais:
OpenAIS 1.0 = OpenAIS 0.80 - Corosync 1.0

In both cases there was no fork, only the modularization of monolithic codebases. The code continues to be maintained by its original authors. There is also no Heartbeat "development". Linbit is simply releasing what's left after the splits, with the intention of fixing any serious bugs that come up.
From: coolerq
Date: January 4th, 2010 03:23 pm (UTC)

Re: Corrections

Sorry, you're right that my post isn't clear; "fork" implies that there is a difference between the new version's developers and the old version's. I've updated the post to just use the word split instead of the word fork.

Heartbeat 3 development definitely seems to have picked up though; I remember 6-12 months ago when the Heartbeat website didn't even work for a period of several months. Now they have a new wiki and are doing regular releases.

--Quentin
From: (Anonymous)
Date: February 2nd, 2010 06:17 am (UTC)
Hi Quentin!

It's interesting to see "iSCSI-connected RAID" as one of the applications of clustering. Do you have any idea about using a SAS network instead of iSCSI? (How can we make clustering work on a SAS network?)

--Prashanth (prahanths2012@gmail.com)