Re: [arch] Some follow-up proposals and questions after the app client container review

From: Tim Quinn <Timothy.Quinn_at_Sun.COM>
Date: Fri, 13 Feb 2009 13:41:28 -0600

Hi, Bill.

Thanks for the feedback. Responses below...I'm very glad for the
discussion...and apologies to all for the length.

Bill Shannon wrote:
> Tim Quinn wrote:
>> *1. Difference in the contents of the "deploy --retrieve" and
>> "get-client-stubs" download directory *
>>
>> In v2 when the user uses "deploy --retrieve" or "get-client-stubs,"
>> GlassFish downloads the entire cooked JAR which contains all app
>> clients in the EAR and any required library JARs. It places this one
>> umbrella cooked JAR into the directory which the user specified on
>> the command. The user can easily copy the downloaded file anywhere on
>> that system or any other system.
>>
>> In the v3 proposal - also mentioned briefly during the review - we
>> could add the optional "--client client-name" option to the deploy
>> and get-client-stubs commands. (The client-name could also be a
>> comma-separated list of client names.) GlassFish would download to
>> the specified directory the cooked JAR and library JARs for the
>> client or clients requested. In the absence of the option GlassFish
>> would download the cooked JARs for all clients (and relevant library
>> JARs) in the specified application.
>>
>> What is stored in the download directory would be different from v2.
>> Users who might be accustomed to copying the single downloaded file
>> to other places would need to change. We have never published or
>> documented the contents of that directory, but people might still
>> have made assumptions about what's there.
>
> Let me make sure I understand what you're proposing...
>
> If my ear file contains appclient1.jar and appclient2.jar, which both
> depend on common.jar, then the download directory would contain
> appclient1.jar, appclient2.jar, and common.jar?
Yes, exactly. Plus, the download directory on the client side would
retain any relative path relationships among the JARs in the EAR. If
the application.xml said the app client was at x/y/appclient1.jar then
the download directory would contain an x/y subtree which in turn
contained appclient1.jar. Same for the EAR's library directory and
other JARs on which the app client depends via the Class-Path entry in
its manifest.

>
> Would appclient1.jar and appclient2.jar be exactly the original versions
> from my ear file? I assume not. I assume you're processing them in
> some way to at least add the information the ACC needs to start the app.
Very interesting question.

We could "cook" the app client JAR, replacing its manifest to augment
the Class-Path with any JARs in the EAR's library directory and adding
to the JAR any generated stub .class files. (Generating stubs is not
the default, but at least in v2 the user can opt for this at deployment
time. I assume we'll keep that as an option in v3.)

I am a bit nervous about that approach, for two reasons.

One, the repackaging would happen during deployment (as the "cooked" JAR
generation does in v2) and I don't really know how much it would cost to
replace the manifest and/or add stub classes working with a large JAR.
I would expect that most app client JARs are relatively small anyway, so
this may not be a big concern but it's there.

Two, the developer might have signed the app client JAR. By cooking it
we would place unsigned entries in the originally-signed JAR, or perhaps
we could try signing the new entries with the domain's cert or one of
the administrator's choosing. (I'm not sure at this point if either of
those would work, by the way. I need to look into that further.) But
even if they do work, cooking the signed JAR would change the nature of
the assurance that the original signing by the developer was intended to
provide.

Plus, to me it seems a cleaner solution if we can completely avoid
changing the developer's files.

An alternative which avoids both potential problems: Download the
developer's appclient1.jar as appclient1.__gf__.jar (we'd make sure the
name is unique and predictable and unchanging - these characteristics
are important for Java Web Start caching). Generate a new, tiny
appclient1.jar. Its manifest would be the original app client JAR's
manifest with

1. the Class-Path adjusted to add references to the renamed original app
client JAR and any JARs in the EAR's library directory, and
2. its Main-Class entry assigned as we chose. (More about that below.)

and the new JAR would contain any generated stubs.

Generating this new, tiny JAR will be fast and will not depend on the
size of JARs from the developer. It also avoids the issue with a
possibly signed app client JAR.
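
As a rough illustration of how cheap that generation step could be, here
is a minimal sketch using only java.util.jar. The class and method names
(LauncherJarSketch, launcherManifest, writeLauncherJar) are hypothetical
and not existing GlassFish code, and I am assuming the library JAR paths
are given relative to the download directory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical sketch: none of these names come from GlassFish itself.
final class LauncherJarSketch {

    /**
     * Copies the developer's manifest, appends the renamed original JAR and
     * the library JARs to Class-Path, and sets Main-Class as we choose.
     * The original Manifest object is left untouched.
     */
    static Manifest launcherManifest(Manifest original, String renamedOriginalJar,
                                     List<String> libraryJars, String mainClass) {
        Manifest result = new Manifest(original);
        Attributes main = result.getMainAttributes();
        // Without Manifest-Version the main attributes are not written out.
        if (main.getValue(Attributes.Name.MANIFEST_VERSION) == null) {
            main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        }
        StringBuilder classPath = new StringBuilder();
        String existing = main.getValue(Attributes.Name.CLASS_PATH);
        if (existing != null && !existing.trim().isEmpty()) {
            classPath.append(existing.trim()).append(' ');
        }
        classPath.append(renamedOriginalJar);
        for (String lib : libraryJars) {
            classPath.append(' ').append(lib);
        }
        main.put(Attributes.Name.CLASS_PATH, classPath.toString());
        main.put(Attributes.Name.MAIN_CLASS, mainClass);
        return result;
    }

    /** Writes a JAR that holds only the manifest; the cost does not depend on the original JAR's size. */
    static void writeLauncherJar(Path target, Manifest manifest) throws IOException {
        try (OutputStream out = Files.newOutputStream(target);
             JarOutputStream jar = new JarOutputStream(out, manifest)) {
            // Generated stub classes, if any, would be added here as extra entries.
        }
    }
}
```

The developer's JAR is never opened for writing in this sketch, so any
signature on it stays intact.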

The end-user would launch appclient1.jar, not needing to know that it's
not the original developer's appclient1.jar. This is the same end-user
experience as in v2, although accomplished quite differently, in which
the generated app client JAR has the same name but quite different
format and content from the original one from the developer.
>
> I also assume that appclient1.jar and appclient2.jar would have
> Class-Path manifest entries that reference common.jar, correct? So if
> I run them in place (or copy the entire download directory), the
> dependencies will be found.
Exactly. See also the above Class-Path discussion.
>
> I think that's all fine, but there's another alternative to consider.
>
> There's some open source technology (whose name I can't remember) that
> allows you to add a bunch of dependent jar files into a single jar file,
> along with a "starter" program that sets up a special class loader that
> allows loading classes from a jar file within the jar file, and then
> runs your app. That would allow you to bundle the original, unchanged,
> appclient1.jar and common.jar into a new appclient1.jar that would run
> the app from that single jar file.
I was hoping to avoid wrapping all the JARs into an umbrella JAR, partly
to streamline deployment. Also, to take best advantage of Java Web
Start's caching we would need to serve the individual JARs separately.
In v2 we have the problem in which a developer changes a small JAR but
the app client depends on a big library JAR. The redeployment creates a
brand new large cooked JAR file that includes the app client and the
large library. Java Web Start will detect that the cooked JAR is now
more recent on the server so it will download the entire cooked JAR
again, even though only a small part of it changed. I want to fix that
in v3 by serving up the individual JARs instead of a single umbrella JAR
to Java Web Start. If we used this open source technology to create an
umbrella JAR then that means more differences in the ACC code path for
the JWS vs. the non-JWS case.

Also, I looked into this general idea a while ago. The comments in
forums from Java engineers at that time were that writing a class loader
to handle JARs within JARs was certainly possible but could be very slow
if the umbrella JAR were not expanded into a temp directory and the
class path set to the various expanded JARs. If I remember correctly,
this was because accessing a particular entry of a "top-level" JAR is
fast using the internal index in the JAR and RandomAccessFile (or maybe
I imagined that part) to go right to a particular byte location in the
file where a given entry starts. But using a RandomAccessFile is not
possible on a stream opened from the JAR (which is what the nested JARs
would be) so the access to a particular nested entry is serial within
the inner JAR, not indexed.
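
To make the serial-access point concrete, here is a small sketch that
counts how many entries have to be read from a stream before a wanted
entry is reached. It uses plain java.util.zip (JARs are ZIP archives);
the class and method names are made up for illustration, and the
in-memory archive merely stands in for a nested JAR read through a
stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch for illustration only.
final class NestedJarScanSketch {

    /** Builds a small in-memory archive with one entry per given name. */
    static byte[] archiveOf(String... entryNames) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : entryNames) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(name.getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /**
     * Reads the archive as a stream, the only option for a JAR nested inside
     * another JAR, and returns how many entries were read to reach the wanted
     * one (or -1 if it is absent). Every earlier entry must be passed over.
     */
    static int entriesReadToReach(InputStream archiveStream, String wanted) {
        int read = 0;
        try (ZipInputStream zip = new ZipInputStream(archiveStream)) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                read++;
                if (entry.getName().equals(wanted)) {
                    return read;
                }
            }
            return -1;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

By contrast, java.util.jar.JarFile opened on a real file looks an entry
up by name through the archive's central directory, without walking the
entries that precede it.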

The open source software might get around this by extracting the nested
JARs into a temporary directory and constructing a class path including
the extracted JARs. This is exactly what the v2 ACC does and we have
had users complain that it's a time-consuming step if the generated JAR
contains large JARs, typically large library JARs.

Or, perhaps the technology you're referring to employs a clever solution
that avoids both of these problems. Even so, I'd still like to keep the
code paths for the Java Web Start and non-JWS cases as close as possible
and this would work against that, and I'd also like to avoid the cost of
creating potentially large artifacts during deployment.
>
>> *2. Requiring the user indicate where the ACC is at launch time when
>> using "java -jar cookedAC.jar"*
>
> Are you assuming that the entire app server is installed separately on
> the client, and that the application client needs to specify the
> location of the app client container within that full app server
> installation?
>
> Is it not possible to bundle the parts of the app client container
> that are needed and include them in the download directory? (Wouldn't
> that be the same classes delivered to a web start client?) And given
> the technology referenced above, couldn't they all be bundled into a
> single jar file? (Depending on the size, that may not be a good idea.)
Several points.

I was thinking that v3 would follow the same general model as in v2, in
which the end-user system has only those parts of GF that the ACC
needs. (Of course, that footprint is much too large in v2.) I have
talked with Jerome some already about whether we should create an
additional GlassFish "distribution" which contains the runtime bits
required to support the ACC or whether we would create not a
full-fledged distribution but just a zip file to meet this need. So
although exactly how we'd bundle that up and make it available is still
open, I've been thinking we'd continue the same general idea.

It would certainly be possible to place the required GF bits into the
download directory. But because the end-user can download different
app clients into different download directories, we would end up with
multiple copies of those GF bits in various directories. Although the
ACC itself will not be large, at the moment its dependencies on other
parts of GF drag along a lot of additional megabytes. I am working with
component leads to address this, but we have a ways to go and creating
more modules to split client from server classes works against the goal
of minimizing modules to help accelerate server start-up.

Anyway, I'm trying to be realistic about this and I am assuming that the
footprint will be large enough that we won't want to dedicate the
download time or the end-user disk space to including the ACC and its
dependencies with every downloaded app client.

As for the Java Web Start case, the v2 implementation already downloads
the runtime bits only once from each host, thanks to the way GF
constructs the paths in the URLs for the artifacts. The generated JNLP
documents refer to all the runtime bits and the files for the specific
app client, but Java Web Start recognizes that for clients 2 through n
from the same host it already has the runtime bits and does not download
them again.

[Brief aside: One improvement I would like to make for v3 in our Java
Web Start support is to take advantage of the Java Web Start
"installation" feature to present the runtime bits as an installable
unit rather than as "normal" JARs that an app client depends on. This
will allow Java Web Start to reuse a single download of the runtime bits
across launches from different GlassFish servers. I will have much more
to say about these and other Java Web Start improvements later on in a
separate one-pager. ]
>
> Another alternative you don't list is the use of the "appclient" command
> to run the app client. If I could just replace the "java" command with
> the "appclient" command, and all other arguments would be the same, this
> would seem to be an easy way to run an app client. Of course, that
> depends on a separate installation of the appclient command and
> dependencies.
Yes and yes! I really want v3 to support using "appclient" followed by
any legal "java" command options followed by ACC and app client
arguments. I think this will be fairly simple, actually, and also
maintain backward compatibility. As you point out, to invoke the
"appclient" script a user must have the bits locally and in a known place.
>
> (Yes, I know some people don't like this, because they trust the "java"
> command and they don't want to use a different command to run their
> applications. Those people would need to use one of your other
> approaches, and maybe we don't need to worry so much that it's
> convenient.)
I have a simple way for the appclient script to map its arguments to a
"java" command that's very similar to what a user could have used
manually to accomplish the same launch. That has the advantage for us,
selfishly, of supporting not two ways to launch app clients but really
just one without making it cumbersome for the "appclient" script user.

And this brings me back to the question of what we would put in the
Main-Class manifest entry of the appclient1.jar we would generate.
Right now, I am thinking that we'd specify the main class of the app
client and not the main class of the ACC, and rely on an ACC Java agent to
do some very mild byte code transformation. Please read on before
panicking!

Of course if we specify the app client's main class as the Main-Class in
the generated manifest then the ACC needs some other way to gain control
before the developer's main class runs. The Java agent would register
a class transformer with the VM that would insert logic into the static
initializer of the developer's main class (and no other class). This
logic would simply invoke a static method on the ACC which would
initialize the ACC (including injection into the now-loaded main class).
The spec requires that any injection occur before the developer's main
method is invoked, and this design would meet that requirement. Then
the rest of the developer's static initializer (if any) would run,
followed by the VM's invocation of the main method of the developer's
now-injected main class.
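
Here is a minimal sketch of the class-selection half of that idea, using
the standard java.lang.instrument API. All names here
(MainClassTransformerSketch, register, prependAccInitCall) are
hypothetical, I am assuming the main class name reaches the agent as its
argument string, and the actual byte code rewrite is left as a
placeholder because it would need a byte code library.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Hypothetical sketch: not the actual ACC agent.
final class MainClassTransformerSketch implements ClassFileTransformer {

    private final String mainClassInternalName;
    int matches; // how many times the target class was seen (for illustration)

    MainClassTransformerSketch(String mainClassName) {
        // The VM passes class names in internal form, e.g. com/example/Main.
        this.mainClassInternalName = mainClassName.replace('.', '/');
    }

    /** An agent's premain(String, Instrumentation) would call this to register the transformer. */
    static void register(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new MainClassTransformerSketch(agentArgs));
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        if (!mainClassInternalName.equals(className)) {
            return null; // null tells the VM to leave this class unchanged
        }
        matches++;
        return prependAccInitCall(classfileBuffer);
    }

    /**
     * Placeholder. A real version would rewrite the class so that its static
     * initializer first invokes a static ACC initialization method. Here the
     * bytes are returned unmodified.
     */
    byte[] prependAccInitCall(byte[] original) {
        return original.clone();
    }
}
```

Every class other than the one named is passed through untouched, which
is what keeps the transformation "very mild."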

So how does the agent know which is the one class to transform? In the
"appclient" case, the script can provide all the user-provided
command-line arguments as agent arguments, so the agent can
unambiguously determine which is the main class from those arguments.
For the "java" case ... Well, despite discussing this with several VM
people a long time ago and very recently, there is no supported,
standard way for a Java app or even an agent to find out what main class
the launcher has chosen to run. So that's why we'd ask the user
entering the "java" command to tell us, via an agent argument, the main
class or the JAR which contains it. This is redundant with what he or
she would already be entering on the "java" command but there seems to
be no way to avoid this. I've filed an RFE against the
agent/instrumentation module of Java asking for a way to get the main
class. If this feature ever arrives then we could take advantage of it
and relax this requirement on our "java" command users at that point.
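
For the "appclient" case, the mapping from the script's arguments to a
"java" command might look roughly like the sketch below. The
-javaagent:<jar>=<options> form is the standard JVM syntax, but the
agent JAR path and the comma-joined encoding of the arguments are
assumptions of mine for illustration, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of how an "appclient" launcher could build a "java" command.
final class AppClientCommandSketch {

    /**
     * Passes the user's arguments through to "java" unchanged and also hands
     * them to the agent, so the agent can work out which class is the main class.
     */
    static List<String> toJavaCommand(String agentJarPath, List<String> userArgs) {
        List<String> command = new ArrayList<>();
        command.add("java");
        // Assumed encoding: arguments joined with commas. A real implementation
        // would need escaping for arguments that themselves contain commas.
        command.add("-javaagent:" + agentJarPath + "=" + String.join(",", userArgs));
        command.addAll(userArgs);
        return command;
    }
}
```

For example, the arguments "-jar appclient1.jar arg1" with an agent JAR
named acc-agent.jar (a made-up name) would become
"java -javaagent:acc-agent.jar=-jar,appclient1.jar,arg1 -jar appclient1.jar arg1".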

This approach, although it adds a little complexity in the form of the
byte code transformation, simplifies other parts of the implementation
by allowing our internal code paths to be the same for cooked vs. raw
app client JARs and for "appclient" vs. "java" invocations. (We do still
need to support the use of the "appclient" script with artifacts that
have not been deployed, that is, raw JARs.)

Further questions and comments?

Thanks for reading.

- Tim