Friday, December 30, 2011

Overview of Veritas Cluster Server (VCS) Configuration Files

In a previous article, Introduction to Veritas Cluster Server, I gave a general overview of the VCS product, how it's put together and what you can do with it. In this article I'll give a general overview of how you go about configuring it.

Before writing this I had never dealt with the product, so hopefully what I cover here will help some lazy folk out there quickly get an idea of what configuring it involves. I'm going to look at it from a Solaris perspective, since I believe most deployments of this product out in the field are on Solaris, and it's what I may have to support, and perhaps you do too. I think it's very similar to the way it's done on Red Hat, so don't fear.

Files

The configuration files are normally in: /etc/VRTSvcs/conf/config

There are two main configuration files:
  1. main.cf that defines the entire cluster.
  2. types.cf that defines the resource types. 
There are more files like types.cf if additional agents are used, for example OracleTypes.cf.
These configuration files are loaded and maintained in a specific way. The first node to start up in the cluster reads the configuration from disk and keeps it in memory, and when other systems come online the configuration is synchronized to them. They write it back to their own disks, and later updates are written back to disk the same way. The only time you can safely edit these files directly is when the cluster is stopped: you edit them on one server, start VCS on that server, and then start the others.
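
To make the loading behaviour concrete, here is a toy Python model of it. It is only a sketch under simplifying assumptions: the Node and Cluster classes are hypothetical, each node has one on-disk copy and one in-memory copy of the configuration, and "synchronizing" is just copying a string. It is not how VCS is implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A hypothetical cluster node with an on-disk and an in-memory config."""
    name: str
    disk_config: str
    memory_config: Optional[str] = None
    running: bool = False


@dataclass
class Cluster:
    nodes: List[Node] = field(default_factory=list)

    def running_nodes(self) -> List[Node]:
        return [n for n in self.nodes if n.running]

    def start(self, node: Node) -> None:
        peers = self.running_nodes()
        if not peers:
            # First node up: read the configuration from its own disk.
            node.memory_config = node.disk_config
        else:
            # Later nodes: take the in-memory copy from a running peer
            # and write it back to their own disk.
            node.memory_config = peers[0].memory_config
            node.disk_config = node.memory_config
        node.running = True

    def update(self, new_config: str) -> None:
        # A change made through the running cluster reaches every running
        # node's memory and is written back to each node's disk.
        for n in self.running_nodes():
            n.memory_config = new_config
            n.disk_config = new_config


# nodeB has a stale file on disk; it is overwritten when nodeB joins.
a = Node("nodeA", disk_config="config-v2")
b = Node("nodeB", disk_config="config-v1")
cluster = Cluster([a, b])
cluster.start(a)
cluster.start(b)
print(b.disk_config)  # config-v2
```

In this model a hand edit to one node's disk copy, made while the cluster is running, is simply overwritten by the next write-back, which fits the advice to edit the files only while the cluster is stopped.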

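The language of the configuration files will look familiar to most system administrators. Object definitions such as the cluster, systems, service groups and resources are enclosed in parentheses, lists of values go in curly braces, and there are include clauses to import configuration from other files.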
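
As a rough illustration of that structure, the Python sketch below embeds a trimmed, illustrative main.cf fragment (the names demo, nodeA, nodeB, websg and webip are made up, and a real file carries many more attributes) and pulls out the include clauses and the top-level object declarations. The summarize helper is hypothetical and does no real validation.

```python
import re

# Illustrative fragment only; not a complete or validated configuration.
SAMPLE_MAIN_CF = '''
include "types.cf"
include "OracleTypes.cf"

cluster demo (
    )

system nodeA (
    )

system nodeB (
    )

group websg (
    SystemList = { nodeA = 0, nodeB = 1 }
    AutoStartList = { nodeA }
    )

    IP webip (
        Device = hme0
        Address = "192.168.1.10"
        )
'''

INCLUDE_RE = re.compile(r'^\s*include\s+"([^"]+)"')
# Top-level objects start in column 0; resources inside a group are indented.
OBJECT_RE = re.compile(r'^(cluster|system|group)\s+(\S+)\s*\(')


def summarize(text):
    """Return the included files and the top-level objects, in file order."""
    includes, objects = [], []
    for line in text.splitlines():
        m = INCLUDE_RE.match(line)
        if m:
            includes.append(m.group(1))
            continue
        m = OBJECT_RE.match(line)
        if m:
            objects.append((m.group(1), m.group(2)))
    return includes, objects


print(summarize(SAMPLE_MAIN_CF))
```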

Blackberry's future

Regardless of the outages and platform problems Blackberry is experiencing, I believe the phone that tends to win in the long run is the one that wins on both user interface and durability. Early on this was Nokia: the user interface was intuitive, and the steps were thought through to happen the way people would actually use the phone. You make a call and get an SMS at the same time; the SMS won't stop you from using DTMF tones to navigate your voice mail, and when you decide to read the SMS, the default options let you reply or erase it with the easiest button to reach. This couldn't be said for Ericsson phones at the time, where the steps I described on the Nokia were a pain. Durability-wise Nokia also won. I've never had a broken Nokia, while I've only had one Ericsson phone and it broke a few times. I suspect Nokia must have done something similar to what Apple is rumoured to do: have lots of internal designs and prototypes compete until they get it right. Nokia did slip up a couple of times, probably in a rush to market, for example with their first WAP phone and their first megapixel camera phone. I suspect it's because they took shortcuts with the selection and refinement process. Unfortunately Nokia got worse at all of this, and Apple and Blackberry took over with smartphones.

Blackberry phones got some things right, and until the iPhone they did them very well on the usability front. They had decent battery life, the buttons were good for e-mail, and the scroll wheel, and later the trackball, was good for navigation. The menus weren't cumbersome and were quick and easy to navigate. It is still not bad at those things, probably better than most other phones. The user interface is simple but works well for most of the things you want to do with e-mail and services like Facebook. That said, durability-wise I found it horrible. I've had 5 Blackberry phones from work, and each was a replacement for a phone that had broken while I had it. Only one of those breakages was entirely my fault (I dropped it in the toilet), but this is offset by the fact that I rarely used it anyway (I used my personal phone more) and that it was inside my bag most of the time, so it wasn't even exposed to the environment. I'm also not counting the times it just had parts swapped out, like batteries or the ball. This was all over a period of only 3 years.

In the end the Blackberry seems to lag massively behind on usability, where the new Android and iOS phones now rule. Maps are nowhere near as good, web browsing is nowhere close, and working with these kinds of apps and others is nowhere near as good either. That said, Blackberry phones are still a lot cheaper and lighter on bandwidth, so for many people it still makes sense to buy them. I can't comment on the durability of Android-based phones, but the iPhone beats the Blackberry hands down.

Apple has always operated at the premium end of the market, so I expect Android offerings to do well in the market below that, both in kicking Blackberry's butt and in converting Nokia's former non-smartphone users into Android smartphone users. Google seems to have copied a lot of Microsoft's strategies with Android, while Microsoft was asleep at the time and didn't even notice its own strategies being used, so I'm not holding my breath for Microsoft to regain much market share unless it spends on the scale it did with the Xbox. I expect Android to do to the smartphone competition what Windows did to Novell, OS/2 and many others. This doesn't mean there won't be a market for Apple products; I believe Apple will still do well for a while. There is no Steve Jobs any more, but I think the company has one good phone left in which he still had a lot of input, and they'll work hard to make it good so people keep faith in the company and stick with them until the experience, or the cost, starts to suck.

After that it's anyone's guess, but I doubt Blackberry will be among the winners unless it has a very good shakeup, one that Nokia didn't manage to pull off. I'm not holding my breath.

Wednesday, December 28, 2011

Introduction to Veritas Cluster Server (VCS)

I somehow managed to escape having to deal with Veritas products over the years. Now I have been tasked to deal with some of it, so I'm taking this opportunity to familiarise myself with it. The product I'm looking at more specifically is Veritas Cluster Server, which is a different product from their popular file system and backup products. Veritas is now also a Symantec product, but I'm looking specifically at the shape of the product before the Symantec acquisition, because it's the form I have to support. Hopefully the details I cover here will help others who also end up having to support a VCS installation.

Description of VCS

Veritas Cluster Server is software used to facilitate and manage application clusters. Examples would be managing an Oracle database cluster or a web application stack, handling the clustering of Veritas file system products, or various combinations of all of these in a global operation.

I have been told it's a good product, but also quite expensive. It's more focused on managing a cluster for high availability than for performance. In other words its main purpose is to detect failures and perform failover operations, reliably.

How it works

It runs as a service (or a set of services) on top of the operating system on each server. It has its own heartbeat and synchronization system that communicates over the network at layer 2 and wants several redundant network links to do this. It provides service over a virtual IP on whichever node is currently active. In other words, clients only connect to the virtual IP, and VCS makes sure that something is available to provide service on that IP address, doing all the failover magic in the background to achieve that.

It is also configured to know how resources depend on each other, and will shut them down and start them up in the correct order. For example, to start up an application it will make sure that the file system resource is brought up first, then the database, then the application. It will also make sure the network is up before bringing up the IP address configuration. It does this efficiently too, starting resources in parallel where their dependencies allow.
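
The ordering described above is essentially a topological sort of the resource dependency graph, taken in "waves" so that everything in a wave can start in parallel. This Python sketch (3.9+, for graphlib) uses made-up resource names and is not VCS code; reversing the waves gives a safe, though not maximally parallel, shutdown order.

```python
from graphlib import TopologicalSorter

# Hypothetical resources, each mapped to the resources it requires.
REQUIRES = {
    "nic": set(),
    "ip": {"nic"},
    "disk": set(),
    "fs": {"disk"},
    "db": {"fs"},
    "app": {"db", "ip"},
}


def online_waves(requires):
    """Group resources into waves; each wave can be brought online in parallel."""
    ts = TopologicalSorter(requires)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)
    return waves


def offline_waves(requires):
    """Shut down in reverse: a resource goes offline before what it needs."""
    return list(reversed(online_waves(requires)))


print(online_waves(REQUIRES))
# [['disk', 'nic'], ['fs', 'ip'], ['db'], ['app']]
```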

Service Groups

Resources are grouped into Service Groups. A service group holds all the resources that together form a particular service, for example the IP address, file system, database and web server processes behind an application, and when one of those resources faults the entire group is failed over to another system. Failover can also be done for maintenance purposes. Service groups can be active-standby (called 'failover'), active-active (called 'parallel') or a mix of these (called 'hybrid').
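
The difference between failover and parallel groups can be sketched as a simple rule about how many systems a group may be online on. The ServiceGroup class below is a hypothetical Python model, loosely inspired by VCS's SystemList attribute (system names with priorities, lower number preferred); hybrid groups are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceGroup:
    name: str
    kind: str           # "failover" or "parallel"; hybrid is not modelled
    system_list: dict   # system -> priority, lower number preferred
    online_on: set = field(default_factory=set)

    def bring_online(self, system):
        if system not in self.system_list:
            raise ValueError(f"{system} is not in the system list")
        if self.kind == "failover" and self.online_on - {system}:
            # A failover group may only be online on one system at a time.
            raise RuntimeError(f"{self.name} is already online elsewhere")
        self.online_on.add(system)

    def fault(self, system, up_systems):
        """Take the group off a faulted system. A failover group moves to the
        highest-priority surviving system; parallel copies elsewhere keep running."""
        self.online_on.discard(system)
        if self.kind != "failover" or self.online_on:
            return None
        candidates = [s for s in self.system_list if s in up_systems and s != system]
        if not candidates:
            return None
        target = min(candidates, key=self.system_list.get)
        self.bring_online(target)
        return target


sg = ServiceGroup("websg", "failover", {"nodeA": 0, "nodeB": 1, "nodeC": 2})
sg.bring_online("nodeA")
print(sg.fault("nodeA", up_systems={"nodeB", "nodeC"}))  # nodeB
```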

Resource types and agents

To shut services down and bring them up, VCS interacts with the system through commands supplied to it. You have to define these in the configuration, with defined stop, start and monitoring procedures. These are coordinated by agents. Bundled Agents, Enterprise Agents and Custom Agents exist for the various resource types. A resource type could be, for example, a database, web server or file system service, e.g. Oracle DB or an NFS server. On initial startup, VCS will determine which agents are needed to manage the services, and only those agents will be started.  Each agent can manage multiple services of the same resource type on a system.  For example the Oracle agent can manage multiple Oracle databases on one server.

Agents have entry points, which are usually where Perl scripts are triggered to perform certain functions. It doesn't have to be Perl: extensions can be developed in C++ or bolted on using other scripting languages. There are various entry points: online to bring a service up, offline to shut a service down, monitor to check the status of a resource, and others such as clean, action and info.
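
To picture one agent handling several resources of the same type, each with its own start, stop and monitor commands, here is a hypothetical Python sketch. CommandAgent and flag_commands are made up for illustration, the "service" is faked with a flag file, and the ONLINE/OFFLINE strings are a simplification rather than the result conventions real VCS agents use.

```python
import os
import subprocess
import sys
import tempfile


class CommandAgent:
    """Hypothetical agent for one resource type; it manages every resource
    of that type on the system, each with its own commands."""

    def __init__(self, resource_type):
        self.resource_type = resource_type
        self.resources = {}  # resource name -> {"start": argv, "stop": argv, "monitor": argv}

    def add_resource(self, name, start, stop, monitor):
        self.resources[name] = {"start": start, "stop": stop, "monitor": monitor}

    def _run(self, name, which):
        return subprocess.run(self.resources[name][which]).returncode

    # Entry points
    def online(self, name):
        return self._run(name, "start") == 0

    def offline(self, name):
        return self._run(name, "stop") == 0

    def monitor(self, name):
        return "ONLINE" if self._run(name, "monitor") == 0 else "OFFLINE"


def flag_commands(path):
    """Hypothetical commands that fake a service with a flag file."""
    py = sys.executable
    return dict(
        start=[py, "-c", f"open({path!r}, 'w').close()"],
        stop=[py, "-c", f"import os; os.remove({path!r})"],
        monitor=[py, "-c", f"import os, sys; sys.exit(0 if os.path.exists({path!r}) else 1)"],
    )


workdir = tempfile.mkdtemp()
agent = CommandAgent("Oracle")
for db in ("db1", "db2"):
    agent.add_resource(db, **flag_commands(os.path.join(workdir, db + ".running")))

agent.online("db1")
print(agent.monitor("db1"), agent.monitor("db2"))  # ONLINE OFFLINE
```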

Daemons

Three main daemons and one module run on each system, and together they make up the VCS service.

High-Availability Daemon (HAD)

This is the main daemon controlling the whole show. It's typically referred to as the VCS engine. It maintains the cluster according to the configuration files, maintains state information, and performs all the monitoring and failover needed. It runs as a replicated state machine, so on each node it contains a synchronized view of what's going on in the whole cluster. The replicated state machine is maintained through the LLT and GAB daemons.

Low Latency Transport Daemon (LLT)

This daemon is a low latency, high performance replacement for the IP stack, for cluster maintenance. This is done over a private network and requires two independent networks between all the cluster nodes for redundancy, and to be able to tell the difference between a system failure and a network failure. It has two major functions:
  1. Traffic distribution - it spreads internode traffic between all the private links, for speed and reliability.
  2. Heartbeat - This is used by the GAB daemon to determine the state of cluster membership.
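
The two LLT functions can be sketched in Python as round-robin link selection plus per-link heartbeat tracking. ToyLLT is hypothetical and the timeout is an illustrative value, not an LLT default. Tracking heartbeats per link is what lets a node tell "one link broke" (the peer is still heard on another link) apart from "the peer is gone" (heard on no link).

```python
import itertools
import time


class ToyLLT:
    """Hypothetical model of LLT's two jobs: spreading traffic across the
    private links, and tracking heartbeats from peers on each link."""

    def __init__(self, links, peer_timeout=5.0):
        self.links = list(links)
        self.down = set()                 # links currently considered broken
        self.peer_timeout = peer_timeout  # seconds; an illustrative value
        self._next = itertools.cycle(self.links)
        self.last_seen = {}               # (peer, link) -> time of last heartbeat

    def pick_link(self):
        """Round-robin over the links that are still up."""
        for _ in range(len(self.links)):
            link = next(self._next)
            if link not in self.down:
                return link
        raise RuntimeError("no private links left")

    def heartbeat(self, peer, link, now=None):
        self.last_seen[(peer, link)] = time.monotonic() if now is None else now

    def links_alive(self, peer, now=None):
        """Links on which the peer has been heard recently."""
        now = time.monotonic() if now is None else now
        return {link for (p, link), t in self.last_seen.items()
                if p == peer and now - t <= self.peer_timeout}


llt = ToyLLT(["link1", "link2"])
print([llt.pick_link() for _ in range(3)])  # ['link1', 'link2', 'link1']
```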

Group Membership Services/Atomic Broadcast (GAB) Daemon

This does two things:
  1. Maintains cluster membership, setting nodes as up or down based on heartbeat status.
  2. Handles cluster communications, doing guaranteed delivery of point to point and broadcast messages to all the nodes.
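
A toy Python model of these two jobs: membership is derived from heartbeat status, and broadcasts carry one global sequence number, so every member applies the same messages in the same order. This ordering is what a replicated state machine like HAD relies on. ToyGAB is hypothetical, and "guaranteed delivery" is reduced here to appending to every member's log.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGAB:
    """Hypothetical model of GAB: membership driven by heartbeat status,
    plus an ordered broadcast to all current members."""
    members: set = field(default_factory=set)
    logs: dict = field(default_factory=dict)  # node -> list of (seq, message)
    seq: int = 0

    def update_membership(self, heartbeat_ok):
        """heartbeat_ok maps node -> True/False; unlisted nodes count as down."""
        up = {n for n, ok in heartbeat_ok.items() if ok}
        joined = up - self.members
        left = self.members - up
        survivors = self.members - left
        # A joining node receives a copy of the history a surviving member holds.
        history = self.logs[min(survivors)] if survivors else []
        for n in joined:
            self.logs[n] = list(history)
        self.members = up
        return joined, left

    def broadcast(self, message):
        """Deliver to every current member, tagged with one global sequence number."""
        self.seq += 1
        for n in self.members:
            self.logs[n].append((self.seq, message))
        return self.seq


gab = ToyGAB()
gab.update_membership({"a": True, "b": True})
gab.broadcast("x")
print(gab.update_membership({"a": True, "b": False, "c": True}))  # ({'c'}, {'b'})
```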
I/O Fencing Module

This makes sure that only one cluster survives a split of the private network. It determines who remains in the cluster and makes sure that systems that aren't members of the cluster any more can't write to storage.
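
One way to picture the split-brain decision is as a race for an odd number of coordinator disks (three is the usual choice), where the side that wins a majority survives and everyone else is cut off from storage. VCS does this with SCSI-3 persistent reservations. The Python sketch below skips all of that and simply takes the outcome of each disk race as input; fencing_race and can_write are hypothetical helpers.

```python
from collections import Counter


def fencing_race(subclusters, disk_winners):
    """Decide which side of a split cluster survives.

    subclusters: side name -> set of nodes on that side of the split.
    disk_winners: one entry per coordinator disk, naming the side that
    reached that disk first (an assumed input for this sketch).
    Returns (surviving side or None, set of nodes to fence off).
    """
    if len(disk_winners) % 2 == 0:
        raise ValueError("use an odd number of coordinator disks to avoid ties")
    majority = len(disk_winners) // 2 + 1
    tally = Counter(disk_winners)
    winner = next((side for side, n in tally.items() if n >= majority), None)
    fenced = set()
    for side, nodes in subclusters.items():
        if side != winner:
            fenced |= nodes
    return winner, fenced


def can_write(node, fenced):
    """Storage accepts writes only from nodes that were not fenced off."""
    return node not in fenced


sides = {"left": {"nodeA", "nodeB"}, "right": {"nodeC"}}
print(fencing_race(sides, ["left", "right", "left"]))  # ('left', {'nodeC'})
```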

Other Processes

Veritas Cluster Server comes with several other commands and processes:

  1. Command Line Interface - to manage and administer VCS.
  2. Cluster Manager - This comes in two forms.  One is a Java based graphical user interface, the other is a web interface.
  3. hacf - This is a utility that can verify the configuration file or make HAD load a configuration file while running.
  4. hashadow - This watches the health of HAD and restarts it when needed.

Cluster Topologies

VCS supports a lot of different cluster topologies, and this is where you can start to see the value and strength of the product. It supports everything from the most basic topologies up to fairly complex and useful configurations.

The most basic form is the asymmetric, or active-passive, setup. This is where one live server runs an application, and another server can be started up and failed over to when needed. It can also support symmetric, or active-active, setups. Here you can have one server with one application and another server with another application, and when one of the servers goes down, its application is launched on the other server alongside the application already running there.

The possibilities get better from here. For example, you can have multiple servers sharing a few spares, banking on the fact that not all of them will fail at the same time, so you can get away with only a few spares. Another option is a bunch of servers each running multiple applications, where if one server fails its applications can be shuffled around onto the remaining servers that have the capacity to run them. It can also handle failover between data centers, for example for disaster recovery. Neat.
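
The "shuffle onto servers with spare capacity" idea can be sketched as a small placement problem. VCS has its own failover policies for choosing a target. The greedy rule below (largest application first, onto whichever server has the most free capacity) is only an illustration and not VCS's algorithm. The load and capacity numbers are made-up units.

```python
def place_failed_apps(apps, servers):
    """apps: name -> load it needs. servers: name -> free capacity.
    Greedy: largest app first, onto the server with the most free capacity.
    Returns (placements, unplaced)."""
    free = dict(servers)
    placements, unplaced = {}, []
    for app, load in sorted(apps.items(), key=lambda kv: (-kv[1], kv[0])):
        # Most free capacity wins; ties are broken by server name.
        target = min(free, key=lambda s: (-free[s], s), default=None)
        if target is None or free[target] < load:
            unplaced.append(app)
            continue
        free[target] -= load
        placements[app] = target
    return placements, unplaced


print(place_failed_apps({"web": 30, "db": 50, "batch": 40},
                        {"spare1": 60, "nodeB": 40}))
# ({'db': 'spare1', 'batch': 'nodeB'}, ['web'])
```

A greedy rule like this is easy to reason about, but it can leave an application unplaced even when a cleverer arrangement would fit everything. Real policies trade that off against speed and predictability.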

Configuration

I think it's best that I cover configuration in another article.  So, I've done that in the Overview of Veritas Cluster Server Configuration article.

Saturday, January 29, 2011

The Media Causes Peanut Allergies

Anyone who has children probably knows a few other parents who claim their children are allergic to certain kinds of foods, including peanuts or tree nuts. It also seems this is becoming more common. Is it really? I don't believe so, and I blame the media in part for making it seem that way.

Here are some interesting bits of trivia:

25% of families believe their children have food allergies. 4% of them actually do.

Peanuts are legumes, while tree nuts are dried fruits. Allergies to them are different, but it's not uncommon for a hypersensitive person to be allergic to multiple things.

About 3-4% of people have reported food allergies. Most of them to certain fruits, then vegetables, then milk, then seafood, then latex, then tree nuts and only 1% of them to peanuts.

The most common reactions to a peanut allergy are eczema (40%), hoarseness (37%), asthma (14%), anaphylaxis (6%) and digestive problems (1.4%).

According to current statistics, about the same number of people in the US are struck by lightning each year as suffer anaphylaxis from a peanut allergy.

Yet the Food Allergy Initiative says: "Peanut allergy is one of the most common, serious and potentially fatal food allergies."

Only 13% of severe food allergy cases are in people over the age of 17.

Most of the studies indicating an increase in peanut allergy in the US didn't include an actual allergy test. The widely reported UK study showing that its occurrence had gone up from 0.5% to 1% was not considered statistically significant, especially given the small sample size. It was, however, evidently significant enough to make headlines in all the papers.

In other words, adults who claim they have a peanut allergy that will cause serious anaphylaxis, or digestive problems, are extremely rare.

What makes it hard to study properly is that people with peanut allergies are quite rare and the allergy is hard to test for reliably. Because of the remote risk of a fatal reaction, the only true test, an oral challenge (making them eat it, placebo controlled), is not always done. That matters, because about 40% of people who respond to blood or skin tests don't actually show symptoms when eating it.

People have the right to believe whatever they like about their own allergies, but the sad news is that according to another study, children who were told that they were allergic to peanuts had more anxiety and felt more physically restricted than children with diabetes.