Re: cluster setup issues.

From: Harpreet Singh <Harpreet.Singh_at_Sun.COM>
Date: Thu, 14 Sep 2006 12:00:28 -0700

Hi Nandini,
Thanks for the detailed reply. To set the context: I started the
installation and at some point decided to put on a new developer/user hat.
When I try new software, I expect things to be very straightforward.
Since we say that clustering capabilities are available with V2, I expect
the initial setup to be painless :-)
Further comments inline...
> General comment to set context:
> The issues around changing commonly shared information between DAS and
> NA, and the way this was handled (or not handled at all), have become the
> focus of changes for 9.1ee. It started in 8.2ee itself and is an
> ongoing effort.
> Please see specific comments inline.
> Thanks
> Nandini
> Harpreet Singh wrote:
>> Hi
>> As promised in an earlier email, here are the issues that I faced
>> setting up a simple node agent and instance under it.
>> Forgive a rather long email to make 3 points:
>> 1. masterpassword: default value needs to be exposed in the setup
>> guide on glassfish. There is no mention of this.
> Yes, that was found necessary and I think it is already in progress...
> at least for now they are already in the following docs.
My point was/is: if we are using 2 passwords (no problem with that), they
should be treated the same way (from the installation/end-user perspective).
We expose the admin password through a passfile. We could do the same with
the masterpassword. A simple page (v2-setup-page.html) on glassfish with
cluster-specific setup assumptions should very easily address this
problem. That's all I am asking for :-).
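To make the passfile suggestion concrete, here is a minimal Python sketch of a setup check that reads a KEY=VALUE passfile and reports which passwords it does not define. The key names AS_ADMIN_PASSWORD and AS_ADMIN_MASTERPASSWORD are assumptions modeled on the asadmin AS_ADMIN_* password-file convention, not taken from the glassfish/passfile mentioned in step 1; check the asadmin documentation for the names your build expects.

```python
# Setup check: does the passfile define every password a cluster setup needs?
# ASSUMPTION: the key names below follow the asadmin AS_ADMIN_* password-file
# convention; they are not taken from the glassfish/passfile in this thread.
from typing import Dict, List

REQUIRED_KEYS = ("AS_ADMIN_PASSWORD", "AS_ADMIN_MASTERPASSWORD")


def parse_passfile(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    entries: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # values may themselves contain '='
        entries[key.strip()] = value.strip()
    return entries


def missing_keys(entries: Dict[str, str]) -> List[str]:
    """Return the required keys the passfile does not define, in order."""
    return [key for key in REQUIRED_KEYS if key not in entries]


if __name__ == "__main__":
    # A passfile that only exposes the admin password.
    admin_only = "AS_ADMIN_PASSWORD=adminadmin\n"
    print(missing_keys(parse_passfile(admin_only)))  # ['AS_ADMIN_MASTERPASSWORD']
```

A check like this could run at the end of the setup target and point the user at the missing entry, instead of leaving the first prompt for a master password as the surprise.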
>> 2. Steps/error messages are non-intuitive.
> Yes, some of them can be improved, though once you go into EE land I
> feel some introductory reading helps make them intuitive.
>> 3. We ask a developer to know too much of our internal systems to do
>> simple things.
> IMO, cluster setup typically would not qualify as a simple thing. But
> I agree: suggestions to make non-simple things simple are equally
> important and a true value-add.
>> I am listing steps/messages that are non-intuitive (to me) in blue.
>> Here are the steps:
>> 0. java -Xmx256m -jar glassfish-installer-v2-b16.jar
>> 1. glassfish/passfile has u/p: admin/adminadmin.
>> 2. ant -f setup-cluster.xml
>> 3. asadmin start-domain domain1. Asks for the admin password; I know from
>> step 1 that it is adminadmin.
>> So far so good, domain starts up.
>> 4. asadmin create-node-agent dhoom.
>> Asks for admin u/p. Asks for master password. Huh, what's that? I did
>> not see it in the installation. Ctrl-C.
> Kedar has worked on and resolved admin-password-related issues, and you
> may want to refer to his findings in these documents, which he has shared
> quite a few times on the mailing lists:
> http://appserver.sfbay/apollo/admininfra/structure/discussions/changing-admin-password.html
Thanks for the links; they clarify things a bit more. But the point I
was making remains: with a setup-v2 page, the end user does not need to be
exposed to any of the intricacies. Btw: these pages are internal, so they
will have to be moved out (I presume).
>> So I hunt around to understand where the
>> masterpassword is. Eventually I land up at
>> . It tells me to download WebServer and shows how to configure it, or to
>> download samples and run scripts from there. There has to be
>> something simpler.
>> Some more hunting... Still no idea where to get the masterpassword.
>> Maybe I will run the create-node-agent command and try adminadmin.
> I tried Google and got the information.
Aha: googling usually means we have to clean up the docs ;-).
> Byron has found an even better-filtered way of searching:
> "masterpassword SJSAS".
> Both give the default value of the masterpassword.
>> 5. asadmin create-node-agent dhoom. I enter u/p admin/adminadmin,
>> masterpassword adminadmin. Voila, it works!!
>> I find out later that although this creates the node agent, the password
>> I entered does not match the magic masterpassword (changeit).
> Yes, this is what I was referring to in my earlier mail (I have
> attached it). There is no way a newly created node agent can know
> which DAS it is eventually going to be bound to, so to start with it
> accepts anything.
> The change to remove such information altogether at the 0th step had
> already been considered, but I forget the exact reason it was left
> out of 8.2 at the last moment. Perhaps Prashanth / Kedar recollect it.
Then perhaps it should print a diagnostic message saying so.
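A minimal Python model of the behavior discussed here, assuming (as Nandini explains) that a new node agent has no DAS to check the master password against, so a mismatch only surfaces at start-up. The NodeAgent class and its methods are hypothetical; only the changeit default and the dhoom/adminadmin values come from this thread. create_diagnostic() shows the kind of message suggested above.

```python
# Hypothetical model: the master password is accepted blindly at creation and
# only checked once the node agent works against its DAS at start-up.
from dataclasses import dataclass
from typing import Optional

DEFAULT_MASTER_PASSWORD = "changeit"  # the default mentioned in this thread


@dataclass
class NodeAgent:
    name: str
    master_password: str                        # stored as typed, never checked
    das_master_password: Optional[str] = None   # known only once bound to a DAS

    def create_diagnostic(self) -> str:
        # The kind of warning suggested above for create-node-agent.
        return (f"Node agent {self.name} created. The master password was NOT "
                "verified; it must match the master password of the DAS this "
                "agent will be bound to.")

    def start(self) -> str:
        if self.das_master_password is None:
            raise RuntimeError("node agent is not bound to a DAS yet")
        if self.master_password != self.das_master_password:
            # In the real server this surfaces as the NSS/PKCS11 error in step 13.
            raise RuntimeError("master password does not match the DAS")
        return "started"


agent = NodeAgent("dhoom", master_password="adminadmin")  # step 5: accepted
print(agent.create_diagnostic())
agent.das_master_password = DEFAULT_MASTER_PASSWORD       # bound to domain1
try:
    agent.start()                                         # step 13: fails
except RuntimeError as err:
    print(err)  # master password does not match the DAS
```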
>> 6. asadmin create-instance --nodeagent dhoom inst1.
>> Unable to connect to admin-server. Please check if the server is up
>> and running and that the host and port provided are correct.
>> What? I just created it successfully! Oh, it's that thing on the email
>> alias that said there is an open issue about 4848 and 4849.
> OK, some basics... create-instance is a remote CLI call that lets a domain
> (administered by a running DAS) know which instance will become part of
> its underlying layout of NAs and instances. So this call is for the DAS
> (hence the --nodeagent option has to be specified, so that the DAS knows
> which NA will take care of this instance). Because asadmin is a
> cross-domain tool, the question then becomes: which domain? A domain in
> our system is identified by host and port (for local CLI, the domain
> can be identified automatically when there is just one domain, but that
> is more for convenience and the (post)installation experience). This is
> where --port comes in: asadmin needs to know whom to talk to. To spare
> users from specifying this in every CLI command, the present solution
> uses defaults of 4849/4848 (EE/PE). And yes, with the latest move to GF
> we need to resolve this 4848/4849 issue, which I think has already been
> taken up for discussion.
> Incidentally, since you are interested in trying out changes that
> simplify the user experience, and in case you are not aware of it, check
> out the new convenience CLI command ./asadmin login. It has been around
> for quite a few months now.
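A small Python sketch of the target resolution described above, assuming a remote command uses an explicit --port when given and otherwise the edition default (4849 for EE, 4848 for PE, as stated in the thread). resolve_das, reachable, and the localhost default host are hypothetical illustrations, not asadmin internals.

```python
# Hypothetical model of how a remote command (e.g. create-instance) picks the
# DAS it talks to. Only the 4849/4848 (EE/PE) defaults come from the thread;
# the function names and the "localhost" default host are assumptions.
from typing import Optional, Tuple

DEFAULT_ADMIN_PORT = {"EE": 4849, "PE": 4848}


def resolve_das(edition: str, host: Optional[str] = None,
                port: Optional[int] = None) -> Tuple[str, int]:
    """An explicit host/port wins; otherwise fall back to the edition default."""
    return (host or "localhost",
            port if port is not None else DEFAULT_ADMIN_PORT[edition])


def reachable(target: Tuple[str, int], listening: Tuple[str, int]) -> bool:
    """The command only connects if it targets the address the DAS listens on."""
    return target == listening


domain1 = ("localhost", 4848)  # the GlassFish domain started in step 3
print(reachable(resolve_das("EE"), domain1))             # False: tries 4849
print(reachable(resolve_das("EE", port=4848), domain1))  # True
```

In this model each invocation resolves its target on its own, which matches why passing --port 4848 to create-node-agent in step 8 did not help create-instance in step 9.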
>> Well, I will delete the node agent and create it again.
>> 7. asadmin delete-node-agent dhoom. Worked
>> 8. asadmin create-node-agent --port 4848 dhoom. Success!
>> 9. asadmin create-instance --nodeagent dhoom inst1.
>> Unable to connect to admin-server. Please check if the server is up
>> and running and that the host and port provided are correct.
>> What? Oh, same 4849 issue, but hey, I fixed that in step 8.
> No, you did not. Again, create-node-agent does not require the DAS to be
> present. create-instance does: it "talks" to a DAS and has to know
> where to find it.
> IOW, create-node-agent is a local CLI command and create-instance a remote one.
>> 10. I do 7 again, and vi domain.xml. Node agent dhoom is present, so
>> delete-node-agent just deleted the directory. On snooping around,
>> I find that I need to run delete-node-agent-config. But I did not need
>> to run create-node-agent-config. I would expect a do/undo pair of commands
>> to work symmetrically; do/undo/undo-something-more is not intuitive.
>> Nandini points me to an open issue for this case:
>> http://monaco.sfbay/detail.jsf?cr=6170688. It has been open for 2 years
>> and is marked as an RFE. Are there plans to fix this?
> This has been a point of confusion for the generic reasons I explained in
> my first mail, and it will be treated as a bug for 9.1 EE for the reasons
> in the bug details:
> Evaluation
> The problem is that the delete-node-agent command only deletes the physical
> node agent. It doesn't update the domain configuration, which is done
> using the command delete-node-agent-config. Using
> delete-node-agent-config will remove the node agent from the output of the
> list-node-agents command.
> Modifying the delete-node-agent command to remove the
> node-agent-config in domain.xml is a straightforward change, but it
> involves adding the information necessary to contact the DAS,
> which would make it look just like the create-node-agent command. I
> need dashost, dasport, adminuser, adminpassword and master password to
> contact the DAS so I can delete the node agent configuration.
> This would affect testing and docs and could potentially introduce new
> bugs. This is a good thing to add for 8.2, so I am going to make it an
> RFE.
> [10/20/04 23:14 GMT]
> [10/20/04 23:16 GMT]
> Not sure if this is an RFE. As Lidia points out, at least one attempt
> must be made so that delete-node-agent is (sort of) the "exact opposite"
> of create-node-agent.
> [2005-06-29 07:17:36 GMT]
> *Entry 1* shreedhar.ganapathy [2004-08-31 21:58] *Last updated:*
> kedar.mhaswade [2005-06-29 07:17]
> Will not fix for 8.2, since it was not reported by a customer and 8.2 is a
> restricted-bug-fix release. Changing the target release to 9.0ee.
> *Entry 2* kedar.mhaswade [2005-11-23 20:02]
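The evaluation above describes a node agent that exists in two places. A minimal Python model, assuming one set for local node-agent directories and one for the node-agent entries in the DAS's domain.xml; the Setup class and its method names are hypothetical stand-ins for the asadmin commands of the same names. Since the evaluation implies create-node-agent contacts the DAS, the model registers the config entry at creation.

```python
# Hypothetical model of the two places a node agent lives, per the bug
# evaluation: a local directory and an entry in the DAS's domain.xml.
from typing import Set


class Setup:
    def __init__(self) -> None:
        self.local_dirs: Set[str] = set()     # physical node agents
        self.domain_config: Set[str] = set()  # node-agent entries in domain.xml

    def create_node_agent(self, name: str) -> None:
        # Assumption: creation also registers the agent with a reachable DAS,
        # which is why no create-node-agent-config step was needed.
        self.local_dirs.add(name)
        self.domain_config.add(name)

    def delete_node_agent(self, name: str) -> None:
        self.local_dirs.discard(name)  # removes the physical agent only

    def delete_node_agent_config(self, name: str) -> None:
        self.domain_config.discard(name)  # removes the domain.xml entry

    def list_node_agents(self) -> Set[str]:
        return set(self.domain_config)  # reported from the DAS configuration


s = Setup()
s.create_node_agent("dhoom")
s.delete_node_agent("dhoom")
print(s.list_node_agents())  # {'dhoom'}: the config entry survives
s.delete_node_agent_config("dhoom")
print(s.list_node_agents())  # set()
```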
>> 11. asadmin delete-node-agent-config dhoom, and restart from step 4.
>> 12. Step 4 with the --port 4848 option and masterpassword adminadmin;
>> step 6 with --port 4848 works!
>> 13. asadmin start-node-agent dhoom. u/p admin/adminadmin,
>> masterpassword adminadmin:
>> Node Agent dhoom failed to startup. Please check the server log for
>> more details. The log says:
>> [#|2006-09-13T14:11:39.362-0700|SEVERE|sun-appserver-ee9.1||_ThreadID=10;_ThreadName=main;|NAGT0014:Unexpected
>> Node Agent exception.
>> com.sun.appserv.server.ServerLifecycleException:
>> NSS password is invalid. Failed to authenticate to PKCS11 slot: internal
>> Caused by: NSS password is invalid. Failed to authenticate to PKCS11 slot: internal
>> ... 1 more
>> Caused by: NSS password is invalid. Failed to authenticate to PKCS11 slot: internal
>> ... 4 more
>> Caused by: java.lang.IllegalStateException: NSS password is invalid.
>> Failed to authenticate to PKCS11 slot: internal
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(
>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
>> at java.lang.reflect.Constructor.newInstance(
>> at java.lang.Class.newInstance0(
>> at java.lang.Class.newInstance(
>> at com.sun.enterprise.pluggable.PluggableFeatureFactoryBaseImpl.invoke(
>> at $Proxy1.getSecuritySupport(Unknown Source)
>> ... 5 more
>> 14. Why? Nandini (thanks a lot!) tells me that masterpassword is set
>> to "changeit" by default. Oh... I remember something like this from
>> my previous lifetime when I had to do security tests.
> Now, from a process-simplification perspective, I am sure you are mighty
> happy that we have stuck with such "lifetime defaults" :)
>> 15. Anyway, it is not over yet.
>> 16. asadmin delete-node-agent dhoom. I presume it will delete the
>> instances created under it. It does not.
>> 17. asadmin delete-instance inst1. It does not want the node agent
>> name; fine, I am okay with that.
>> 18. asadmin delete-node-agent-config dhoom
>> 19. I need a coffee...
>> 20. Starting from step 4 again with masterpassword changeit. Aha,
>> everything worked and started.
>> <master-card-ad>
>> Cluster Created = 1
>> Headaches earned = 2
>> time spent = 1 day
>> understood a new developers pain = priceless :-)
>> </master-card-ad>
> By the way, didn't you say you needed a clustered setup? If so, you
> have to do a different setup. :)
I know :-) I did not know what to call the setup I was configuring :-)

Nandini, thanks for your time and patience.
A final point: the setup issues/assumptions remind me of the "GUI Focus
Groups", where total newcomers are given a GUI and told to navigate
it, and the pain points are identified. A similar focus group should help
:-). I hope some of this is covered by the user-experience group.

A final final point: I have evaluated some external portals that
just provide a script (for newcomers) which sets up a clustered instance
as specified in a readme. The advantage has always been that I was
set up very quickly with a cluster and a simple app that lets me play, and
that keeps me shielded from complexities as much as possible. As they
say in the Ruby world: "convention over configuration" :-)
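As an illustration of the newcomer-script idea, a Python sketch that replays the sequence that eventually worked in this thread: start-domain, create-node-agent, create-instance and start-node-agent, with --port 4848 as in steps 8 and 12. It assumes the installer and ant -f setup-cluster.xml have already been run, and it leaves the password prompts to asadmin (answer changeit for the master password, per the thread). The asadmin options are only those that appear in this thread; cluster_plan and run are hypothetical helper names, and the script only prints the commands unless dry_run=False.

```python
# Newcomer script sketch: replays the command sequence that finally worked in
# this thread. cluster_plan/run are hypothetical helpers; the asadmin options
# are only those used in the thread. Prints the commands unless dry_run=False.
import shlex
import subprocess
from typing import List

DAS_PORT = "4848"  # the GlassFish admin port used in steps 8 and 12


def cluster_plan(agent: str = "dhoom", instance: str = "inst1") -> List[List[str]]:
    return [
        ["asadmin", "start-domain", "domain1"],
        # Local command; answer changeit at the master password prompt.
        ["asadmin", "create-node-agent", "--port", DAS_PORT, agent],
        # Remote command; it needs its own --port to find the DAS.
        ["asadmin", "create-instance", "--port", DAS_PORT,
         "--nodeagent", agent, instance],
        ["asadmin", "start-node-agent", agent],
    ]


def run(plan: List[List[str]], dry_run: bool = True) -> None:
    for cmd in plan:
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failure.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(cluster_plan())
```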
> Harpreet,
> This error happens when the DAS and NA master passwords do not match.
> Also, here are some inherent issues with the kind of problem that had
> to be resolved.
> DAS and NA are two decoupled processes, i.e. one can run without the
> other once each has gone through a certain state diagram.
> Unfortunately this means that some changes (from a user perspective)
> have to be made on both entities, but since one of them can be down,
> they become non-atomic, best-effort actions. Delete node agent is one
> such example (see its repercussions in bug 6170688). So it would be good
> to have delete mirror create, but that comes with its own set of
> tradeoffs and assumptions. In 9.1 we are trying to resolve each such
> case as best we can. I would be glad to get more input here, so do send
> in that mail.
> thanks,
> Nandini
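The point about non-atomic, best-effort actions can be sketched in Python. This assumes a "delete mirrors create" command that removes the local node agent first and then tries to remove its domain.xml entry via the DAS; delete_node_agent_everywhere and DasUnavailable are hypothetical names. When the DAS is down, the local delete cannot be undone, so the sketch returns the leftover step instead of failing silently.

```python
# Hypothetical "delete mirrors create" for node agents across two decoupled
# processes: best effort and non-atomic, reporting what is left to do.
from typing import Callable, List


class DasUnavailable(Exception):
    """Raised when the DAS cannot be contacted, e.g. because it is down."""


def delete_node_agent_everywhere(name: str,
                                 remove_local: Callable[[str], None],
                                 remove_config: Callable[[str], None]) -> List[str]:
    """Remove both halves of a node agent; return any manual follow-up steps."""
    pending: List[str] = []
    remove_local(name)  # local filesystem work: possible without the DAS
    try:
        remove_config(name)  # needs a running DAS plus its host, port and credentials
    except DasUnavailable:
        # The local delete cannot be rolled back, so surface the leftover step.
        pending.append(f"delete-node-agent-config {name}")
    return pending


def remove_dir(name: str) -> None:
    print(f"removed local node agent {name}")


def das_is_down(name: str) -> None:
    raise DasUnavailable(name)


print(delete_node_agent_everywhere("dhoom", remove_dir, das_is_down))
# ['delete-node-agent-config dhoom']
```

Removing the local half first reflects that it never needs the DAS; choosing that order, and deciding what to report when the second half fails, is exactly the kind of tradeoff mentioned above.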
> Harpreet Singh wrote:
>> Hi
>> I see an NSS exception while starting the node agent. Here are my
>> commands, followed by the exception:
>> 1. asadmin create-domain domain1
>> 2. asadmin create-node-agent --savemasterpassword=true myagent
>> 3. asadmin create-instance --nodeagent myagent inst1
>> 4. asadmin start-node-agent myagent
>> I have done this multiple times with a newly created domain and node
>> agent.
>> (Another email will follow about the pain points we put developers
>> through to delete a node agent and create new ones)
>> Thanks
>> Harpreet
>>> Please enter the admin user name>admin
>>> Please enter the admin password>
>>> Node Agent myagent failed to startup. Please check the server log
>>> for more details.
>>> CLI137 Command start-node-agent failed.
>>> [#|2006-09-12T16:36:26.913-0700|SEVERE|sun-appserver-ee9.1||_ThreadID=10;_ThreadName=main;|SEC8001:
>>> Exception in initializing SunPKCS11.
>>> java.lang.Exception: NSS password is invalid. Failed to authenticate
>>> to PKCS11 slot: internal
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(
>>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
>>> at java.lang.reflect.Constructor.newInstance(
>>> at java.lang.Class.newInstance0(
>>> at java.lang.Class.newInstance(
>>> at com.sun.enterprise.pluggable.PluggableFeatureFactoryBaseImpl.invoke(
>>> at $Proxy1.getSecuritySupport(Unknown Source)
>>> |#]
>>> [#|2006-09-12T16:36:26.934-0700|WARNING|sun-appserver-ee9.1||_ThreadID=10;_ThreadName=main;|NAGT0003:An
>>> exception has occurred during the initialization of the NodeAgent.
>>> java.lang.IllegalStateException: NSS password is invalid. Failed to
>>> authenticate to PKCS11 slot: internal
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(
>>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
>>> at java.lang.reflect.Constructor.newInstance(
>>> at java.lang.Class.newInstance0(
>>> at java.lang.Class.newInstance(
>>> at com.sun.enterprise.pluggable.PluggableFeatureFactoryBaseImpl.invoke(
>>> at $Proxy1.getSecuritySupport(Unknown Source)
>>> |#]
>>> [#|2006-09-12T16:36:26.938-0700|WARNING|sun-appserver-ee9.1||_ThreadID=10;_ThreadName=main;|NAGT0002:An
>>> exception has occurred during the sychronization of this node with
>>> the DAS.
>>> NSS password is invalid. Failed to authenticate to PKCS11 slot: internal
>>> Caused by: java.lang.IllegalStateException: NSS password is invalid.
>>> Failed to authenticate to PKCS11 slot: internal
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(
>>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
>>> at java.lang.reflect.Constructor.newInstance(
>>> at java.lang.Class.newInstance0(
>>> at java.lang.Class.newInstance(
>>> at com.sun.enterprise.pluggable.PluggableFeatureFactoryBaseImpl.invoke(
>>> at $Proxy1.getSecuritySupport(Unknown Source)
>>> ... 5 more
>>> |#]
>>> [#|2006-09-12T16:36:26.943-0700|SEVERE|sun-appserver-ee9.1||_ThreadID=10;_ThreadName=main;|NAGT0014:Unexpected
>>> Node Agent exception.
>>> com.sun.appserv.server.ServerLifecycleException:
>>> NSS password is invalid. Failed to authenticate to PKCS11 slot: internal
>>> Caused by: NSS password is invalid. Failed to authenticate to PKCS11 slot: internal
>>> ... 1 more
>>> Caused by: NSS password is invalid. Failed to authenticate to PKCS11 slot: internal
>>> ... 4 more
>>> Caused by: java.lang.IllegalStateException: NSS password is invalid.
>>> Failed to authenticate to PKCS11 slot: internal
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(
>>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
>>> at java.lang.reflect.Constructor.newInstance(
>>> at java.lang.Class.newInstance0(
>>> at java.lang.Class.newInstance(
>>> at com.sun.enterprise.pluggable.PluggableFeatureFactoryBaseImpl.invoke(
>>> at $Proxy1.getSecuritySupport(Unknown Source)
>>> ... 5 more
>>> |#]
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail: