[javaee-spec users] Re: DataSourceDefinition

From: arjan tijms <arjan.tijms_at_gmail.com>
Date: Wed, 27 Aug 2014 14:25:02 +0200

Hi,

On Wed, Aug 27, 2014 at 10:02 AM, Mark Struberg <struberg_at_yahoo.de> wrote:

> Arjan, so you have the passwords of all your stages checked in to SCM and
> in your EAR/WAR in plaintext?
>

The dev databases are available to everyone, so yes, the passwords for
those are always in the SCM in plaintext. The SCM itself is of course
protected and only the trusted team has access to it. When you separate the
config from the EAR/WAR the passwords are -somewhere- as well, oftentimes
in an SCM too and deployed to the servers via things like CFEngine or Chef.
It boils down to the same thing really.

The live database is only accessible from a very limited number of
whitelisted IPs in a secured zone, so even if the live DB password were to
leak there's little that can be done with it.

Yet, for situations where the code is accessible to more people than the
trusted team, then yes indeed the live username and password are not
checked into the code SCM but are provided separately. The approach that I
outlined in the JDevelopment article doesn't exclude settings being loaded
from alternative sources, and those sources could be the local file system
just as well.

> What is your ops team and the security guys saying about it?
>

There's no separate ops team. There's one (small) multi-disciplinary team
that is responsible for nearly everything, e.g. from initial design to
coding, deployment and monitoring. Some team members are naturally more
expert in certain fields than others, but ultimately the team is responsible
and involved at every stage.

There's thus no throwing things over the wall so to speak and no need to
convince people of things needing to be done that are not directly of
interest to those people. E.g. asking an ops member to create a JMS queue,
when the ops member has no idea what the queue should be for (since that is
solely an internal application concern) and is thus not particularly
excited about creating said queue (just a mundane task to do).

Instead, in our process the person creating the queue is also the person
having a need to create that queue and is thus motivated to do so. In a
small team full of experienced people (which I would like to believe we have at
zeef) this makes the entire development process much smoother.

> And how do you treat different databases?
>

When you build an in-house application that you deploy on your own servers
(like m4n.nl and zeef.com do) then there is no unknown number of
different databases to support. There's one type of database (e.g.
Postgres) and if that ever changes you change its driver in said
configuration file. Don't forget though that in the wrapper data source the
*actual* data source classname also comes from the configuration file, so
although unneeded for our particular use case you can easily swap this out
for anything (and since as I explained settings can come from anywhere,
this can be done externally as well).
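
A minimal sketch of that idea, using only the JDK: the concrete class name
and its settings both come from a Properties object, which could be loaded
from a per-stage file inside the archive or from the file system. This is
not the code from the JDevelopment article. The "className" key, the
ConfiguredDataSourceFactory name and the FakeDataSource stand-in are all
hypothetical and exist only to keep the example self-contained.

```java
import java.lang.reflect.Method;
import java.util.Properties;

class ConfiguredDataSourceFactory {

    /** Hypothetical stand-in for a vendor data source class (e.g. a Postgres one). */
    public static class FakeDataSource {
        private String serverName;
        private String user;
        public void setServerName(String serverName) { this.serverName = serverName; }
        public void setUser(String user) { this.user = user; }
        public String getServerName() { return serverName; }
        public String getUser() { return user; }
    }

    /**
     * Instantiates the class named by the (assumed) "className" key and passes
     * every other key to the matching String setter, e.g. "user" -> setUser(..).
     */
    public static Object create(Properties config) {
        try {
            Object dataSource = Class.forName(config.getProperty("className"))
                    .getDeclaredConstructor()
                    .newInstance();
            for (String key : config.stringPropertyNames()) {
                if (key.equals("className")) {
                    continue; // already used to pick the implementation
                }
                String setter = "set" + Character.toUpperCase(key.charAt(0)) + key.substring(1);
                Method method = dataSource.getClass().getMethod(setter, String.class);
                method.invoke(dataSource, config.getProperty(key));
            }
            return dataSource;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot configure data source from " + config, e);
        }
    }
}
```

Swapping the database then means changing the "className" value (and the
related keys) in the configuration source, without touching the code that
registers the wrapper.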

> Or an application which runs at a customer and you don't have any
> credentials at all?
>

The process I described works best for in-house development where you
deploy to your own servers. Applications that you develop for the general
public (e.g. products like JIRA) or for customers are less suited. But
that's why I mentioned before that both approaches have their use. One
approach is not inherently better than the other. It depends on the use
case.

Historically Java EE has focussed on the use case for highly separated
roles, but failed to acknowledge not all teams work in that way. It's good
that Java EE now increasingly acknowledges the lighter and more agile way
of working, in addition to the more traditional way. As said, neither way
of working is better. It's just a matter of Java EE being capable of
scaling both up and down.

> This solution just doesn't scale...
>

In fact it actually appears to scale. The key is that you don't look at
@DataSourceDefinition and related elements as the sole way to do things
(which would indeed be crazy), but as an essential piece of a spectrum of
possibilities.

In terms of number of servers the approach also seems to scale. At m4n.nl
we scaled from 1 single server that had everything (in 2002) to a hundred
servers or so in 2011. And embedded data sources nicely scaled along (at
first we used a proprietary solution for this, later the standardized Java
EE version).

> Old trick. I wrote something similar 4 years ago for CODI [1][2] (it's
> actually much older, a colleague and I first wrote this around 2006).
> But we decided to ditch it and not move it over to DeltaSpike as it
> doesn't work on all containers when it comes to JTA. Even if you do a
> ConfigurableXaDataSource. The problem is that some containers evaluate the
> settings even before your app is booted (for doing JPA instrumentation,
> etc). Creates funny NPEs...
>

Interesting. Any idea which container that might be? I tested the
ConfigurableXaDataSource mainly with JBoss and GlassFish and at least there
it worked.

But one way or the other, the Configurable(Xa)DataSource is of course a
workaround, a hack if you like, and the issue should be solved in a better
way.

> And forget about JNDI. It just stinks. A DataSource configured on the
> container pops up on a different location for almost every container. The
> JNDI location sometimes even changes between different versions of the same
> container.
>

I've seen that indeed, so this is one additional advantage of
@DataSourceDefinition and friends. When you define the thing to be in
"java:app/ds/myds" it will actually end up in "java:app/ds/myds", and not
in "vendorname:app/ds/myds" or "java:jdbc/app/ds/myds" or whatever.
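
For illustration, a declaration along these lines pins the name. It is a
configuration fragment rather than a runnable program, it needs the Java EE
API on the classpath, and every connection value as well as the
MyDataSourceUser class name is a placeholder made up for this example.

```java
import javax.annotation.Resource;
import javax.annotation.sql.DataSourceDefinition;
import javax.ejb.Stateless;
import javax.sql.DataSource;

// All connection values below are placeholders, not settings from a real system.
@DataSourceDefinition(
    name = "java:app/ds/myds",
    className = "org.postgresql.xa.PGXADataSource",
    serverName = "localhost",
    portNumber = 5432,
    databaseName = "mydb",
    user = "myuser",
    password = "mypassword"
)
@Stateless
public class MyDataSourceUser {

    // Looked up under exactly the name given in the definition above.
    @Resource(lookup = "java:app/ds/myds")
    private DataSource dataSource;
}
```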

> Fully agree, but I think this should not be fixed as the whole approach
> is imo broken.
> So let's review and then deprecate this annotation based config.
>

I personally don't see any need to deprecate the annotation based config,
but I do absolutely see the need for improvements. I actually had been
working on preparing a JIRA issue for this a while back, but have not yet
finished it. The gist of it is approximately the following:

In addition to @DataSourceDefinition, have an additional programmatic
way to provide a data source, roughly like how in the Servlet spec you can
use a programmatic API to register Servlets during startup in addition to
annotations and XML. We could use CDI events or perhaps a qualified
producer for this.

Using a producer would approximately look like this:

@Produces @DataSourceDefinition
XADataSource produceMyDataSource(DataSourceContainer container) {
    container.setMinPool(20);
    container.setProperty("vendorx.validation-statement", "select 1");
    // ...
    XADataSource myDataSource = new ....
    // ...
    return myDataSource;
}

The producer could load its settings from anywhere, e.g. using DeltaSpike
config, or (if/when it becomes available) use the Config JSR.
"DataSourceContainer" is a new type to distinguish between data source
settings and container settings. Something like this should be reflected in
the annotation and XML variant as well. It should have both well defined
standardised settings (like the existing min pool) and the ability to set
vendor specific ones (SQL validation in this example).

Furthermore @DataSourceDefinition should be capable of having placeholders
in its attributes, and there should be a facility to override it externally
(which are both things the Config JSR should be able to provide).
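
To make the placeholder idea concrete, here is a rough sketch using only
the JDK. The ${key} syntax, the PlaceholderResolver name and the rule that
external overrides win over packaged defaults are assumptions made for this
example. Nothing like this is defined by the spec today, and the Config JSR
may well end up looking different.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderResolver {

    // Matches ${some.key}; the syntax itself is an assumption of this sketch.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /**
     * Replaces each ${key} in the template. Values from "overrides" (e.g. an
     * external file) win over "defaults" (e.g. config packaged in the EAR).
     * Placeholders with no value in either map are left untouched.
     */
    public static String resolve(String template,
                                 Map<String, String> defaults,
                                 Map<String, String> overrides) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            String value = overrides.containsKey(key) ? overrides.get(key) : defaults.get(key);
            if (value == null) {
                value = matcher.group(0); // keep the unresolved placeholder as-is
            }
            matcher.appendReplacement(result, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

An attribute value such as "jdbc:postgresql://${db.host}/${db.name}" could
then be resolved against the packaged defaults, with an externally supplied
db.host replacing the default for the live stage only.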

Kind regards,
Arjan Tijms

>
>
> LieGrue,
> strub
>
>
>
>
> [1]
> https://github.com/apache/myfaces-extcdi/blob/trunk/jee-modules/jpa-module/api/src/main/java/org/apache/myfaces/extensions/cdi/jpa/api/datasource/DataSourceConfig.java
>
> [2]
> https://github.com/apache/myfaces-extcdi/blob/trunk/jee-modules/jpa-module/impl/src/main/java/org/apache/myfaces/extensions/cdi/jpa/impl/datasource/ConfigurableDataSource.java
>
>
>
>
> On Tuesday, 26 August 2014, 23:06, arjan tijms <arjan.tijms_at_gmail.com>
> wrote:
> >
> >
> >Hi,
> >
> >
> >On Tue, Aug 26, 2014 at 8:12 PM, Arun Gupta <arun.gupta_at_gmail.com> wrote:
> >
> >There is clear evidence that nobody is using @DataSourceDefinition in
> >>production code. See the conversation at:
> >>
> >>https://twitter.com/arungupta/status/504039335688404992
> >>
> >>Seems like it's good only for demos.
> >
> >
> >I hate to be at the disagreeing side lately ;) but I disagree.
> >
> >
> >At zeef.com we definitely are using @DataSourceDefinition in production
> (albeit the xml variant of this in application.xml). In our development
> process development and configuration is done within the same team. Inside
> each deployable application we have a directory with sub-directories
> holding the config for every stage. The advantage is that everybody is able
> to see which config applies to which stage and can keep the config in sync
> with the actual code.
> >
> >
> >At m4n.nl where I worked before we had a similar setup, although before
> we introduced that we had the separate config that was advocated at the
> time as a best practice. This separate config didn't really work well for
> us; it was frequently out of sync with the code, configuration kept growing
> and old keys that no code was using anymore kept piling up (because the
> developers didn't see the configuration and the sysop didn't necessarily
> see the code). Worse, when there were live issues it wasn't clear which
> values the live code was actually using. Did a thread pool have more threads
> than there were connections, or the other way around?
> >
> >
> >So the concept of defining a data source from within the app, which
> @DataSourceDefinition facilitates, is crucial for our process.
> >
> >
> >Another important thing is that @DataSourceDefinition/the data-source
> element remains stable by virtue of the spec. Some vendors unfortunately
> often change the way their proprietary data source is configured, or even
> worse, remove a way altogether. One version a data source is specified in
> XML format 1, half a version later it's in incompatible format 2, then it's
> a deployable artifact, then it's not a deployable artifact anymore, and
> then surprise it is again. One version the data source even though
> proprietary can be embedded in an EAR, then the next version it can't be
> embedded anymore. One version it has to be defined in separate XML file,
> then one version later it goes into one big configuration file (which of
> course has yet again a different XML format), etc etc.
> >
> >
> >The biggest issue however with the current
> @DataSourceDefinition/data-source element is that it's not directly
> configurable. This was my main motivation for creating
> https://java.net/jira/browse/JAVAEE_SPEC-19
> >
> >
> >In the meantime I solved the configuration problem a little by using a
> data source wrapper that reads configuration based on a parameter and uses
> that to configure the real data source. This data source wrapper is then
> registered using the data-source element. I outlined the approach here:
> http://jdevelopment.nl/switching-data-sources-datasourcedefinition
> >
> >
> >There is some more room for improvement in @DataSourceDefinition though.
> Specifically there are now vendor specific properties that are supposed to
> go to the data source (e.g. for the Postgres or MySQL driver), but there is
> no mechanism for setting vendor specific properties for the container (e.g.
> for JBoss or GlassFish). Things like transaction recovery or some advanced
> pooling settings are intended for the container, not the data source, but
> there now is no good way to configure that other than by some naming
> convention.
> >
> >
> >Long story short (TL;DR):
> >
> >
> >* @DataSourceDefinition is definitely used in production
> >
> >* Configuration is an issue, but can be solved today. (I hope that config
> JSR does this even better)
> >* Room for general improvements
> >
> >
> >Kind regards,
> >Arjan
> >
> >
> >
> >
> >
> >I'd urge platform EG and other EGs
> >>in Java EE 8 to strongly consider adding a similar annotation.
> >>
> >>Cheers
> >>Arun
> >>
> >>--
> >>http://blog.arungupta.me
> >>http://twitter.com/arungupta
> >>
> >
> >
> >
> >
>