users@glassfish.java.net

Re: GF v2.1 and HADB ...

From: Peter L. Gratzer <peter.gratzer_at_spanningtree-solutions.com>
Date: Wed, 5 Jan 2011 00:17:55 -0700

Maitrayi, Shreedhar,

Thank you both for this excellent insight into HADB! It is very helpful for my current project.

As I am also progressing toward a horizontally scaled solution, a few more related questions came to mind:

     - Examples using asadmin with the configure-ha-cluster subcommand always show a two-node cluster, i.e. test1 and test2. In some cases
       the hosts parameter has "test1,test2" specified, in others "test1,test2,test1,test2" (see the sketch after this list). As I understand it, a pair represents one connection
       for distributing the data: is that connection bi-directional, or do you have to specify the nodes twice, i.e. is it uni-directional? If it is
       uni-directional, is "test1,test2,test1,test2" the correct order?
     - In a configuration of more than two nodes I have not found any documentation regarding the setup of an HADB database.
       Can you provide a guideline for a 2+ node configuration, i.e. how the nodes are specified using asadmin with the configure-ha-cluster subcommand?
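
For concreteness, here is the shape of the command I mean (only a sketch - "mycluster" and the device size are placeholders, not from a working setup):

     # is this the right way to lay out four nodes over two hosts?
     asadmin configure-ha-cluster --user admin --devicesize 512 \
         --hosts test1,test2,test1,test2 mycluster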

Thanks again in advance,

Peter

PS: Sorry,

On Jan 4, 2011, at 6:09 AM, Maitrayi Sabaratnam wrote:

> Answers inlined.
>
>
> > -------- Original Message --------
> > Subject: GF v2.1 and HADB ...
> > Date: Mon, 3 Jan 2011 18:02:07 +0000 (GMT)
> > From: <peter.gratzer_at_spanningtree-solutions.com>
> > To: users_at_glassfish.java.net
> >
> > I am looking for some insight into HADB ...
> >
> > Configuration:
> > - Solaris 10 u8 on 2 SPARC systems
> > - GF v2.1 running on each system within a Local Zone
> > - no shared memory area specified
> > - Cluster configured with HADB 4.4.3.6
> >
> > It seems the HA database is working without a shared memory area
> > being specified. Is any help available regarding the following questions?
> >
> > - What is the purpose of the shared memory?
> Shared memory in hadb contains admin info (dictionary info for data table access, sessions), as well as the data and log buffers.
>
> To perform a data operation (like a read or an update), a copy of the data page from the disk data file must be brought into the buffer area. The modified pages are flushed back to disk at checkpointing. The operations are logged in the log buffer, which is flushed to disk before the data pages are flushed, in order to enable recovery.
>
> So the buffer area needs to be large enough to cope with the user load, within the limits of the available system resources.
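>
> (A sketch only - I'm citing the attribute name from memory of the HADB admin docs, and the hadbm syntax varies between versions, so please verify with hadbm --help first:)
>
>     # show the current configuration attributes of the default database
>     hadbm get --all
>     # grow the data buffer pool (value in MB) if it is too small for the load
>     hadbm set DataBufferPoolSize=512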
>
> Normally, for performance reasons, data pages are locked into the buffers, and hadb maintains the buffering and swapping itself, preventing the OS from doing so.
>
> (Note: The shared memory allocation and usage is exclusive to each hadb node.)
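>
> (On Solaris you can see the segments the nodes have allocated with ipcs, run inside the zone:)
>
>     # -m = shared memory segments, -a = all details (size, owner, attach count)
>     ipcs -ma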
>
> > - Why is there no error showing up?
>
> I'm not an expert in local zones, so my guess is that the shared memory allocated (with default or user-defined sizes) fits within the addressable virtual memory space, and the locking of the shared memory is simply ignored or not performed.
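>
> (One way to check my guess, assuming the Solaris privilege model - proc_lock_memory is the privilege that mlock()/SHM_LOCK requires:)
>
>     # inside the local zone: is proc_lock_memory present in the privilege sets?
>     ppriv -v $$ | grep proc_lock_memory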
>
> > - What are the ramifications if shared memory is not configured?
>
> - Tuning problem: As I mentioned in the first answer, the performance tuning may not be effective (or cannot be performed at all). The buffer sizes may not be large enough to cache the pages used by user or system transactions during normal or peak-time load.
> - Performance problem: Not being able to lock pages may cause pages to be swapped out before hadb has finished using them, so they must be swapped in again (and again... i.e. thrashing). The consequence: transactions time out, and the lack of resources makes the hadb resource-control mechanism reject new load.
> - Double buffering: If disabling the OS-level buffering is NOT possible in local zones (I do not know), that will also cause a performance hazard.
>
> > - What are the ramifications of not being able to set the real-time
> >   priority for NSUP, due to running in a local zone?
>
> Assume that the machine has a very high load (many local zones, etc.).
> If the node supervisor (nsup) process of an hadb node is not scheduled within the timeout, it will restart the whole node the next time it is scheduled, and the system may also suffer from network-partition-like scenarios. This will cause transactions to abort. Performance will suffer if node restarts occur frequently and continuously (as opposed to one node restarting very seldom, which will NOT be noticeable and is thus not a problem).
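>
> As said, I'm no expert in zones, but I believe (unverified - please check the Solaris zones documentation) the global zone administrator can grant the missing privileges roughly like this:
>
>     # in the global zone; "myzone" is a placeholder - the zone must be rebooted afterwards
>     # proc_priocntl allows real-time scheduling; add proc_lock_memory too if it is missing
>     zonecfg -z myzone 'set limitpriv=default,proc_priocntl,proc_lock_memory'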
>
>
> > - Are all of these performance-related? Would they just impact the
> >   availability rate?
>
> Performance and availability are tightly connected. Availability means not only that the database system processes are up and running, but also that transactions are served within a duration acceptable to users.
>
> So if too many transactions are aborted, due to either node restarts or timeouts (caused by a lack of system resources - CPU, memory, network, disk I/O), users will perceive it as unavailability.
>
> So the summary is: hadb needs system resources (available within its timeouts) to function smoothly.
>
> >
> > Thanks in advance,
> >
> > Peter
> >
>
> --
> *** Maitrayi ***
>