[jpa-spec users] [jsr338-experts] Re: support for multitenancy

From: michael keith <michael.keith_at_oracle.com>
Date: Tue, 27 Mar 2012 14:47:18 -0400

Hi Linda,

Thanks for writing all this up.

Some comments inline.

-Mike

On 26/03/2012 7:54 PM, Linda DeMichiel wrote:
> One of the main items on the agenda for the JPA 2.1 release is support
> for multitenancy in Java EE 7 cloud environments.
>
> In Java EE 7, an application can be submitted into a cloud environment
> for use by multiple tenants in what can be viewed as a basic form of
> software as a service (SaaS). The application is customized and
> deployed on a per-tenant basis. At runtime, there is a separate
> application instance (or set of instances, e.g., in a clustered
> environment) per tenant. The instances used by different tenants are
> isolated from one another. The resources used by a tenant's
> application may also be isolated from one another, or may be shared.
> In general, however, it is assumed that a tenant's data is isolated
> from that of other tenants.

That seems like the right default assumption to make. A config option
made to/by the resource consumer (in our case the JPA provider) would
override that assumption, I suppose?

> There are three well-known approaches to support for multitenancy at
> the database level:
>
> (1) separate database approach
> (2) shared database / separate schema approach
> (3) shared schema / shared table approach
>
>
> To get the discussion started, this is a high-level strawman sketch of
> how the 3 approaches might be used with JPA in keeping with the Java
> EE 7 approach. At the same time, however, we also want to be sure
> that what we specify in JPA 2.1 can be extended to encompass a more
> general approach to SaaS in the future in which a single application
> instance serves multiple tenants and in which multitenancy is managed
> by the Java EE environment.
>
> For further information on how Java EE 7 is approaching PaaS/SaaS, you
> might find the documents on the javaee-spec.java.net project useful,
> particularly
> http://java.net/projects/javaee-spec/downloads/download/PaaS.pdf
> and the latest draft of the Java EE 7 Platform spec,
> http://java.net/projects/javaee-spec/downloads/download/JavaEE_Platform_Spec.pdf.
>
>
> Note that the identifier for the tenant will be made available to the
> application in JNDI as java:comp/tenantId. The tenantId will be a
> string, whose max length should allow it to be portably stored in a
> single database column.
>
>
> APPROACHES:
>
> (1) Separate database approach
>
> In this approach, each tenant's persistence unit is mapped to a
> separate database. This approach provides the greatest isolation
> between tenants and does not impose any additional constraints over
> the object/relational mapping of the persistence unit or over the
> operations that can be performed. In particular, the use of
> multiple database schemas or catalogs is supported, as are native
> queries.
>
> In some cloud environments, use of this approach might not be
> available, as a tenant might be allocated storage within a database
> rather than a separate database.

So this is basically what JPA assumes today.

> (2) Shared database / separate schema approach
>
> In this approach, each tenant's data is stored in database tables
> that are isolated from those of any other tenant. In databases that
> support schemas, this will typically be achieved by allocating a
> separate schema per tenant. The database's permissions facility is
> used to confine a tenant's access to the designated schema, thus
> providing isolation between tenants at the schema level.
>
> Support for this approach is straightforward if the persistence unit
> uses only the default schema or catalog (i.e., if it does not specify
> schema names or catalogs in the object/relational mapping metadata).
> A native query that attempts to access data in a schema other than
> that assigned to the tenant by the platform provider will be trapped
> by the database authorization mechanisms and will result in an
> exception.
>
> [While the case where the persistence unit metadata explicitly
> specifies one or more schemas could potentially be handled by the
> persistence provider by remapping schema names and native queries that
> embed schema names, I would not propose that we specify or require
> support for this case, although a more sophisticated persistence
> provider might choose to support it.]

So, in summary, portable apps may not specify a schema or catalog at any
level: mapping (annotation or XML), mapping file, persistence unit default,
or in a native query.

>
> (3) Shared table approach
>
> In this approach, database tables are shared ("striped") across tenants.
>
> It is the responsibility of the persistence provider to provide
> per-tenant isolation in accessing data. This will typically be done
> by mapping and maintaining a tenant ID column in the respective
> tables, and augmenting data retrieval and query operations, updates,
> and inserts with tenant IDs. The use of native queries would need to
> be trapped by the persistence provider and not allowed unless the
> persistence provider were able to modify them to provide isolation of
> tenant data.

So, portable applications could use neither schemas nor native queries in
this mode, and the application would have an opportunity to map the tenant
id column in each table.
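
As a rough illustration of the striping described above, here is a minimal
sketch in plain Java. It is not JPA and not a proposed API: StripedTable,
Row and the method names are made up, and an in-memory list stands in for
the shared table. It only shows the tenant id being stamped on inserts and
added as a predicate to reads and updates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not a JPA API: an in-memory stand-in for one shared
// ("striped") table, showing where a provider would apply the tenant id.
final class StripedTable {

    // One shared row: the tenant id column plus the application's data.
    static final class Row {
        final String tenantId;
        String payload;

        Row(String tenantId, String payload) {
            this.tenantId = tenantId;
            this.payload = payload;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    // Insert: the tenant id is stamped on the row alongside the data.
    void insert(String tenantId, String payload) {
        rows.add(new Row(tenantId, payload));
    }

    // Read: every retrieval is restricted to the caller's tenant id.
    List<String> findAll(String tenantId) {
        List<String> result = new ArrayList<>();
        for (Row row : rows) {
            if (row.tenantId.equals(tenantId)) {
                result.add(row.payload);
            }
        }
        return result;
    }

    // Update: the same tenant predicate limits which rows may change.
    int updateAll(String tenantId, String newPayload) {
        int updated = 0;
        for (Row row : rows) {
            if (row.tenantId.equals(tenantId)) {
                row.payload = newPayload;
                updated++;
            }
        }
        return updated;
    }
}
```

A native query would bypass all three methods, which is why it has to be
trapped or rewritten in this mode.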

>
> Ideally, the management of the tenant id should be transparent to the
> application, although we should revisit this in Java EE 8 as we move
> further into support for SaaS.

For the application to not have to manage tenant ids, I guess the tenant
identifier would need to be available to the provider on a per-invocation
basis (in a thread context set by the container)? As you mention, not
something that we necessarily have to worry about now, but just so we know
what we will need in the future if this is what we want.
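
A minimal sketch of such a per-invocation thread context, assuming the
container binds the tenant before dispatching a call and unbinds it
afterwards. CurrentTenant and its methods are hypothetical names; nothing
like this exists in JPA or Java EE today.

```java
// Hypothetical holder, not part of any spec: the container binds the tenant
// id to the calling thread and the persistence provider reads it back.
final class CurrentTenant {

    private static final ThreadLocal<String> TENANT = new ThreadLocal<>();

    private CurrentTenant() {
    }

    // Container side: bind the tenant before dispatching the invocation.
    static void set(String tenantId) {
        TENANT.set(tenantId);
    }

    // Provider side: fail fast if no tenant was bound to this thread.
    static String require() {
        String tenantId = TENANT.get();
        if (tenantId == null) {
            throw new IllegalStateException("no tenant bound to this thread");
        }
        return tenantId;
    }

    // Container side: unbind after the invocation so pooled threads do not
    // leak one tenant's id into the next request.
    static void clear() {
        TENANT.remove();
    }
}
```

The container would call set() and clear() around each invocation (clear()
in a finally block), and the provider would call require() whenever it
needs to stamp or filter by tenant.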

> I believe that the main use case for the shared table approach is in
> SaaS environments in which a single application instance is servicing
> multiple tenants. This is outside the scope of Java EE 7, so I don't
> think that we need to standardize on support for this approach now,
> although we should not lose sight of it as we standardize on other
> aspects.

Yes, there is some value in this being available today, though, given that
some people are doing multitenancy in their own environment, outside the
cloud. I guess it just depends how far we want to go to enable SaaS in JPA
in this round.

> DETERMINING THE MULTITENANCY STORAGE MAPPING STRATEGY:
>
> We see two general approaches to determining the multitenancy storage
> mapping strategy that should be used for a persistence unit. In some
> cases, these approaches might be combined.
>
> Again, note that a cloud platform provider might use a single strategy
> for all tenants in allocating database storage. For example, each
> tenant might be allocated a separate database, or each tenant might
> only be allocated a schema within a database.
>
>
> (A) The Application Specifies Its Requirements
>
> In this approach, the application specifies its functional
> requirements (in terms of need for named, multiple schemas and/or use
> of native queries) in the persistence.xml descriptor, and the deployer
> and/or cloud platform provider determine the storage strategy that is
> used for the tenant. This metadata serves as input to the deployer
> for the tenant or as input into the automated provisioning of the
> application by the cloud platform provider (if automated provisioning
> is supported by the platform instance).
>
> For example, an application might specify that it requires support for
> multiple schemas and native queries. In general, such requirements
> would mean that a separate database would need to be provisioned for
> the tenant. If this is not possible, then unless the platform
> provider supported a persistence provider that could perform schema
> remapping and/or modification of native queries, the application might
> fail to deploy or fail to initialize. On the other hand, if an
> application specifies that it uses only the default schema and native
> queries, then either the separate database or separate schema approach
> could be used.

I'm less enamored with this approach.
Although many cloud platforms are going to support both an internally
hosted DBaaS and access to an external DB, my guess is that they won't
have multiple different ways of implementing their internally hosted
database services (e.g. one as a separate DB and one with striped data). I
could be wrong, but realistically I don't think a cloud provider is ever
going to implement a db service using striping. As was mentioned above, a
SaaS application might decide to use its database that way.
Basically, the restriction that schemas not be used in portable cloud apps
is enough, I think, for cloud applications. Any additional requirements or
relaxations are cloud specific.
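
For reference, the reasoning in (A) boils down to something like the
sketch below. It assumes a provider that does no schema remapping and no
rewriting of native queries. The strategy names are the ones used later in
this thread; StorageStrategies and candidates() are made-up names.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of approach (A): map an application's declared
// requirements to the storage strategies that could satisfy them.
final class StorageStrategies {

    enum Strategy { SEPARATE_DATABASE, SHARED_DATABASE, SHARED_SCHEMA }

    private StorageStrategies() {
    }

    // Assumes the provider cannot remap schemas or rewrite native queries.
    static Set<Strategy> candidates(boolean usesNamedSchemas,
                                    boolean usesNativeQueries) {
        if (usesNamedSchemas) {
            // Explicit schema or catalog names need a database of their own.
            return EnumSet.of(Strategy.SEPARATE_DATABASE);
        }
        if (usesNativeQueries) {
            // Native queries rule out shared tables, since they cannot be
            // augmented with the tenant id.
            return EnumSet.of(Strategy.SEPARATE_DATABASE,
                              Strategy.SHARED_DATABASE);
        }
        // Default schema only and no native queries: any strategy works.
        return EnumSet.allOf(Strategy.class);
    }
}
```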

> (B) The Application Specifies the Multitenancy Storage Mapping Strategy
>
> An alternative approach is that the application specifies the required
> (or preferred) multitenancy storage mapping strategy in the
> persistence.xml.

This is a preferable approach, and even though it may not be *necessary*
for cloud deployment, it would be nice to have these options so the
provider can do some checking at deployment time rather than the app
failing at runtime. It would also provide a standard way of configuring for
striping in SaaS apps.
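
The kind of deployment-time checking meant here could look roughly like
the following sketch. DeploymentCheck and validate() are hypothetical, and
the two boolean inputs stand in for whatever the provider learns from the
mapping metadata. The rules are only the restrictions discussed in this
thread, not anything the spec defines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical deployment-time check for approach (B): compare the declared
// multitenancy strategy with what the persistence unit actually uses.
final class DeploymentCheck {

    enum Strategy { SEPARATE_DATABASE, SHARED_DATABASE, SHARED_SCHEMA }

    private DeploymentCheck() {
    }

    // Returns the problems found; an empty list means the unit can deploy.
    static List<String> validate(Strategy declared,
                                 boolean specifiesSchemaOrCatalog,
                                 boolean usesNativeQueries) {
        List<String> problems = new ArrayList<>();
        // Schema and catalog names are only safe with a separate database.
        if (declared != Strategy.SEPARATE_DATABASE && specifiesSchemaOrCatalog) {
            problems.add("schema or catalog names are not portable with " + declared);
        }
        // Shared tables cannot isolate tenants in unmodified native queries.
        if (declared == Strategy.SHARED_SCHEMA && usesNativeQueries) {
            problems.add("native queries cannot be isolated with " + declared);
        }
        return problems;
    }
}
```

A provider that can remap schemas or rewrite native queries could relax
either rule.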

> For example, a multitenant application that is designed with the
> intention that separate databases be used might indicate this in the
> persistence.xml as multitenancy = SEPARATE_DATABASE.

In general I don't think they would even need to specify this, since this
is what we already assume, isn't it?

> An application that is designed with the intention that databases may
> be shared by partitioning at the database schema level might indicate
> this in the persistence.xml as multitenancy = SHARED_DATABASE. [A
> portable application that specifies this strategy should not specify
> schema or catalog names, as it might otherwise fail to deploy or fail
> to initialize.]

This probably doesn't matter, but although I find the terminology easy to
understand, from a PaaS user perspective the line between 1 and 2 might
be a little fuzzy because most of the cloud providers have some kind of
"database service", but the capabilities of those services differ.
In some cases one can create db instances and schemas (SEPARATE DB), yet
in other cases the tenant "database" is just a place to store data, with a
default schema and no ability to create a new one (SHARED DB).

> An application that is designed with the intention that tables be
> shared might indicate this in the persistence.xml as multitenancy =
> SHARED_SCHEMA. An app that uses explicit multitenant mapping metadata
> would be expected to specify this.
>
> [Open Issue: Is it useful to specify requirements along the lines of
> those used in approach (A) with this approach? If so, is the platform
> provider allowed to choose a different mapping strategy as long as
> that approach is more isolated? If no functional requirements are
> specified as in approach (A) and if a mapping strategy is specified in
> the persistence.xml that is provided by the application submitter,
> then if this information is not observed, the risk is that the app
> will fail. For example, observation of the specified mapping strategy
> might be required for the case where explicit multitenant mapping
> metadata is supplied for the striped mapping approach.]
>
>
> With both the approaches (A) and (B), different storage mapping
> strategies may be used for different tenants of the same application
> if the cloud platform provider supports a range of storage mapping
> choices.
>
>
> REQUIREMENTS FOR PORTABLE APPLICATIONS
>
> Applications that are intended to be portable in cloud environments
> should not specify schema or catalog names.

This sounds very reasonable to me and solves 99% of the cloud JPA app
scenario.

> DEPLOYMENT
>
> When an application instance is deployed for a tenant, the container
> needs to make the tenant identifier and tenant-related configuration
> information available to the persistence provider. The container
> needs to pass to the persistence provider a data source that is
> configured with appropriate credentials for the tenant, and which will
> provide isolation between that tenant and other tenants of the
> application. We should probably also define an interface to capture
> the tenant identifier and tenant-related metadata and configuration
> information that the container needs to pass to the persistence
> provider, e.g., a TenantContext.

Again, this would definitely help to enable JPA in SaaS apps.
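
Purely as a strawman, such an interface might be as small as the sketch
below. Only the name TenantContext comes from the proposal above; the two
methods, the string-valued property map and the SimpleTenantContext class
are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Strawman only: the proposal names TenantContext but defines no methods.
interface TenantContext {

    // The tenant identifier (the value exposed as java:comp/tenantId).
    String getTenantId();

    // Tenant-related configuration passed by the container (assumed shape).
    Map<String, String> getProperties();
}

// Hypothetical immutable implementation a container might hand to a provider.
final class SimpleTenantContext implements TenantContext {

    private final String tenantId;
    private final Map<String, String> properties;

    SimpleTenantContext(String tenantId, Map<String, String> properties) {
        if (tenantId == null || tenantId.isEmpty()) {
            throw new IllegalArgumentException("tenantId must be non-empty");
        }
        this.tenantId = tenantId;
        // Defensive copy so later changes by the caller are not visible.
        this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
    }

    @Override
    public String getTenantId() {
        return tenantId;
    }

    @Override
    public Map<String, String> getProperties() {
        return properties;
    }
}
```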

> OTHER OPEN ISSUES
>
> 1. Additional metadata to support schema generation.

We might want to rename this to what it actually does -- table
generation :-)

> 2. Do we need metadata to indicate whether an application supports
> multitenant use -- i.e., whether it is "multitenant enabled"?
> Do we need this information specifically for JPA?

Again, it is not strictly required for PaaS, but it would be really nice to
have it so SaaS could be enabled, even though it is not formally supported.

> 3. Specification of resources that are shared across tenants--e.g.,
> a persistence unit for reference data that can be accessed by
> multiple tenants.

I'm not sure we need to solve this problem at this stage. Multiple tenants
accessing a shared read-only resource through identical JPA configurations
is one thing, but having a single shared persistence unit spanning multiple
applications seems out of scope.