On 04/04/2012 1:51 PM, Pinaki Poddar wrote:
>> (1) Separate database approach
>> (2) Shared database / separate schema approach
>> (3) Shared table approach
> The third approach seems to be fundamentally different from the former two
> w.r.t. JPA.
>
>> At runtime, there is a separate application instance (or set of
> instances, e.g., in a clustered environment) per tenant.
>
> This scenario could be supported currently if we consider a persistence
> unit (a.k.a. EntityManagerFactory) to have a 1:1 affinity to a tenant,
> except in the case of the SHARED_TABLE approach.
> Of course, to support the first two approaches, the configuration via
> persistence.xml could be extended with some sort of variable substitution
> mechanism that allows the deployment process to generate an individual
> persistence.xml for each tenant from a template by substituting the
> tenant-specific database, schema, credentials, etc. Essentially, the
> specification should strive to keep the EntityManagerFactory least aware of
> multi-tenancy at runtime and push the complexity towards the configuration
> phase as much as possible.
Right, I agree; in these cases the JPA provider does not need to be
tenant-aware at all.
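Just to make that concrete, here is a minimal sketch of what the per-tenant
provisioning side could look like today using only standard JPA properties;
the persistence unit name "myAppPU" and the provisioner class are invented
for illustration, and the tenant-specific values would come from whatever
template-substitution mechanism the deployment process uses:

    import javax.persistence.EntityManagerFactory;
    import javax.persistence.Persistence;

    import java.util.HashMap;
    import java.util.Map;

    public class TenantEmfProvisioner {

        // Hypothetical deployment-time hook: the values would be produced by
        // substituting tenant-specific variables into a persistence.xml template.
        public EntityManagerFactory createEmfForTenant(String jdbcUrl,
                                                       String user,
                                                       String password) {
            Map<String, String> props = new HashMap<>();
            // Standard JPA properties; only the values are tenant-specific.
            props.put("javax.persistence.jdbc.url", jdbcUrl);
            props.put("javax.persistence.jdbc.user", user);
            props.put("javax.persistence.jdbc.password", password);
            // One EMF per tenant; the EMF itself stays unaware of multi-tenancy.
            return Persistence.createEntityManagerFactory("myAppPU", props);
        }
    }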
> The SHARED_TABLE or "striped" use case is fundamentally different because
> it would be more intrusive to the current runtime behavior. Every database
> operation would now have to be scoped by the tenant identifier. Storing all
> tenants' data in the same table does not appear to be a recommended approach
> for a multi-tenant environment -- and it may be prudent to wait and watch how
> data storage strategies emerge in cloud environments before trying to
> accommodate this use case in the JPA 2.1 timeframe.
>
>
>> JPA 2.1 can be extended to encompass a more general approach to SaaS in
> the future in which a single application instance serves multiple tenants
>
> This requirement turns the affinity between persistence unit and tenant
> into 1:n. One possibility to address the issue could be to consider a wider
> scope that encloses EntityManagerFactory itself, something like a
> PersistenceUnitFactory that hands out a tenant-specific EntityManagerFactory.
> Such an abstraction would retain the current EntityManagerFactory scoped per
> tenant and hence be least intrusive in accommodating multi-tenancy aspects.
Exactly. Since an EMF is already a natural unit of isolation, it makes
perfect sense to provide tenant isolation at that level. We would just need
to provide some flexibility to enable it.
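To sketch the shape of such a wider-scoped abstraction (the interface name
follows the suggestion above, but nothing here is proposed API):

    import javax.persistence.EntityManagerFactory;

    // Hypothetical abstraction one level above EntityManagerFactory: it owns
    // the shared persistence-unit configuration and hands out one EMF per
    // tenant, so each EMF remains tenant-scoped exactly as it is today.
    public interface PersistenceUnitFactory {

        // Returns (creating lazily if needed) the EMF bound to the given tenant.
        EntityManagerFactory getEntityManagerFactory(String tenantId);

        // Closes all tenant-scoped EMFs managed by this factory.
        void close();
    }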
> Regards --
>
> Pinaki Poddar
> Chair, Apache OpenJPA Project http://openjpa.apache.org/
> JPA Expert Group Member
> Application & Integration Middleware
>
> From: Linda DeMichiel <linda.demichiel_at_oracle.com>
> To: jsr338-experts_at_jpa-spec.java.net
> Date: 03/26/2012 04:54 PM
> Subject: [jsr338-experts] support for multitenancy
>
>
>
> One of the main items on the agenda for the JPA 2.1 release is support
> for multitenancy in Java EE 7 cloud environments.
>
> In Java EE 7, an application can be submitted into a cloud environment
> for use by multiple tenants in what can be viewed as a basic form of
> software as a service (SaaS). The application is customized and
> deployed on a per-tenant basis. At runtime, there is a separate
> application instance (or set of instances, e.g., in a clustered
> environment) per tenant. The instances used by different tenants are
> isolated from one another. The resources used by a tenant's
> application may also be isolated from those of other tenants, or may be
> shared. In general, however, it is assumed that a tenant's data is
> isolated from that of other tenants.
>
> There are three well-known approaches to support for multitenancy at
> the database level:
>
> (1) separate database approach
> (2) shared database / separate schema approach
> (3) shared schema / shared table approach
>
>
> To get the discussion started, this is a high-level strawman sketch of
> how the 3 approaches might be used with JPA in keeping with the Java
> EE 7 approach. At the same time, however, we also want to be sure
> that what we specify in JPA 2.1 can be extended to encompass a more
> general approach to SaaS in the future in which a single application
> instance serves multiple tenants and in which multitenancy is managed
> by the Java EE environment.
>
> For further information on how Java EE 7 is approaching PaaS/SaaS, you
> might find the documents on the javaee-spec.java.net project useful,
> particularly
> http://java.net/projects/javaee-spec/downloads/download/PaaS.pdf
> and the latest draft of the Java EE 7 Platform spec,
> http://java.net/projects/javaee-spec/downloads/download/JavaEE_Platform_Spec.pdf
> .
>
> Note that the identifier for the tenant will be made available to the
> application in JNDI as java:comp/tenantId. The tenantId will be a
> string, whose max length should allow it to be portably stored in a
> single database column.
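As a reader's aid, a minimal sketch of how application code could read that
identifier; the helper class is invented for illustration and simply assumes
a Java EE naming context is available:

    import javax.naming.InitialContext;
    import javax.naming.NamingException;

    public final class TenantIdLookup {

        // Looks up the tenant identifier that the platform is proposed to
        // expose in JNDI under java:comp/tenantId.
        public static String currentTenantId() throws NamingException {
            InitialContext ctx = new InitialContext();
            return (String) ctx.lookup("java:comp/tenantId");
        }
    }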
>
>
> APPROACHES:
>
> (1) Separate database approach
>
> In this approach, each tenant's persistence unit is mapped to a
> separate database. This approach provides the greatest isolation
> between tenants and does not impose any additional constraints over
> the object/relational mapping of the persistence unit or over the
> operations that can be performed. In particular, the use of
> multiple database schemas or catalogs is supported, as are native
> queries.
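For example, a mapping like the following (the table and schema names are
invented for illustration) is unproblematic under the separate database
approach, since the whole database belongs to the tenant:

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Table;

    // Explicit schema names, and native queries against them, are fine when
    // each tenant gets its own database.
    @Entity
    @Table(name = "ORDERS", schema = "SALES")
    public class Order {
        @Id
        private Long id;
        private String status;
    }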
>
> In some cloud environments, use of this approach might not be
> available, as a tenant might be allocated storage within a database
> rather than a separate database.
>
>
> (2) Shared database / separate schema approach
>
> In this approach, each tenant's data is stored in database tables
> that are isolated from those of any other tenant. In databases that
> support schemas, this will typically be achieved by allocating a
> separate schema per tenant. The database's permissions facility is
> used to confine a tenant's access to the designated schema, thus
> providing isolation between tenants at the schema level.
>
> Support for this approach is straightforward if the persistence unit
> uses only the default schema or catalog (i.e., if it does not specify
> schema names or catalogs in the object/relational mapping metadata).
> A native query that attempts to access data in a schema other than
> that assigned to the tenant by the platform provider will be trapped
> by the database authorization mechanisms and will result in an
> exception.
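A concrete illustration (schema and table names invented): under this
approach a query such as the one below would be rejected by the database's
authorization layer rather than by the persistence provider:

    import javax.persistence.EntityManager;
    import javax.persistence.PersistenceException;

    public class CrossSchemaAccessExample {

        // Attempting to read another tenant's schema; the database permissions
        // confine this tenant to its own schema, so the query fails.
        public void attemptCrossSchemaRead(EntityManager em) {
            try {
                em.createNativeQuery("SELECT * FROM OTHER_TENANT_SCHEMA.ORDERS")
                  .getResultList();
            } catch (PersistenceException e) {
                // Expected: the underlying authorization error surfaces here.
            }
        }
    }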
>
> [While the case where the persistence unit metadata explicitly
> specifies one or more schemas could potentially be handled by the
> persistence provider by remapping schema names and native queries that
> embed schema names, I would not propose that we specify or require
> support for this case, although a more sophisticated persistence
> provider might choose to support it.]
>
>
> (3) Shared table approach
>
> In this approach, database tables are shared ("striped") across tenants.
>
> It is the responsibility of the persistence provider to provide
> per-tenant isolation in accessing data. This will typically be done
> by mapping and maintaining a tenant ID column in the respective
> tables, and augmenting data retrieval and query operations, updates,
> and inserts with tenant IDs. The use of native queries would need to
> be trapped by the persistence provider and not allowed unless the
> persistence provider were able to modify them to provide isolation of
> tenant data.
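A rough, grossly simplified sketch of the kind of augmentation a provider
would have to perform internally; the column name TENANT_ID and the helper
class are purely hypothetical, not the API or behavior of any existing
provider:

    // Illustrates provider-side query augmentation for the shared table
    // ("striped") approach; real providers would rewrite at the SQL AST level.
    public final class TenantQueryAugmenter {

        // Conceptually: SELECT ... FROM ORDERS WHERE STATUS = ?
        // becomes:      SELECT ... FROM ORDERS WHERE STATUS = ? AND TENANT_ID = ?
        public static String scopeToTenant(String sql, String tenantIdColumn) {
            String predicate = tenantIdColumn + " = ?";
            return sql.contains(" WHERE ")
                    ? sql + " AND " + predicate
                    : sql + " WHERE " + predicate;
        }
    }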
>
> Ideally, the management of the tenant ID should be transparent to the
> application, although we should revisit this in Java EE 8 as we move
> further into support for SaaS.
>
> I believe that the main use case for the shared table approach is in
> SaaS environments in which a single application instance is servicing
> multiple tenants. This is outside the scope of Java EE 7, so I don't
> think that we need to standardize on support for this approach now,
> although we should not lose sight of it as we standardize on other
> aspects.
>
>
>
> DETERMINING THE MULTITENANCY STORAGE MAPPING STRATEGY:
>
> We see two general approaches to determining the multitenancy storage
> mapping strategy that should be used for a persistence unit. In some
> cases, these approaches might be combined.
>
> Again, note that a cloud platform provider might use a single strategy
> for all tenants in allocating database storage. For example, each
> tenant might be allocated a separate database, or each tenant might
> only be allocated a schema within a database.
>
>
> (A) The Application Specifies Its Requirements
>
> In this approach, the application specifies its functional
> requirements (in terms of need for named, multiple schemas and/or use
> of native queries) in the persistence.xml descriptor, and the deployer
> and/or cloud platform provider determine the storage strategy that is
> used for the tenant. This metadata serves as input to the deployer
> for the tenant or as input into the automated provisioning of the
> application by the cloud platform provider (if automated provisioning
> is supported by the platform instance).
>
> For example, an application might specify that it requires support for
> multiple schemas and native queries. In general, such requirements
> would mean that a separate database would need to be provisioned for
> the tenant. If this is not possible, then unless the platform
> provider supported a persistence provider that could perform schema
> remapping and/or modification of native queries, the application might
> fail to deploy or fail to initialize. On the other hand, if an
> application specifies that it uses only the default schema and native
> queries, then either the separate database or separate schema approach
> could be used.
>
>
>
> (B) The Application Specifies the Multitenancy Storage Mapping Strategy
>
> An alternative approach is that the application specifies the required
> (or preferred) multitenancy storage mapping strategy in the
> persistence.xml.
>
> For example, a multitenant application that is designed with the
> intention that separate databases be used might indicate this in the
> persistence.xml as multitenancy = SEPARATE_DATABASE.
>
> An application that is designed with the intention that databases may
> be shared by partitioning at the database schema level might indicate
> this in the persistence.xml as multitenancy = SHARED_DATABASE. [A
> portable application that specifies this strategy should not specify
> schema or catalog names, as it might otherwise fail to deploy or fail
> to initialize.]
>
> An application that is designed with the intention that tables be
> shared might indicate this in the persistence.xml as multitenancy =
> SHARED_SCHEMA. An application that uses explicit multitenant mapping
> metadata would be expected to specify this.
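To make the shape of this concrete, a minimal sketch of how such a strategy
hint might be passed as a persistence-unit property; the property name and
the persistence unit name are invented, and only the enum-like values follow
the strawman above:

    import javax.persistence.EntityManagerFactory;
    import javax.persistence.Persistence;

    import java.util.HashMap;
    import java.util.Map;

    public class MultitenancyStrategyExample {

        public static EntityManagerFactory create() {
            Map<String, String> props = new HashMap<>();
            // Hypothetical property; in persistence.xml it would appear as a
            // <property> element. Values per the strawman:
            // SEPARATE_DATABASE, SHARED_DATABASE, SHARED_SCHEMA.
            props.put("example.persistence.multitenancy", "SEPARATE_DATABASE");
            return Persistence.createEntityManagerFactory("myAppPU", props);
        }
    }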
>
> [Open Issue: Is it useful to specify requirements along the lines of
> those used in approach (A) with this approach? If so, is the platform
> provider allowed to choose a different mapping strategy as long as
> that approach is more isolated? If no functional requirements are
> specified as in approach (A), and a mapping strategy is specified in the
> persistence.xml provided by the application submitter, then the risk of
> not observing that strategy is that the application will fail. For
> example, observation of the specified mapping strategy might be required
> for the case where explicit multitenant mapping metadata is supplied for
> the striped mapping approach.]
>
>
> With both approaches (A) and (B), different storage mapping
> strategies may be used for different tenants of the same application
> if the cloud platform provider supports a range of storage mapping
> choices.
>
>
> REQUIREMENTS FOR PORTABLE APPLICATIONS
>
> Applications that are intended to be portable in cloud environments
> should not specify schema or catalog names.
>
>
> DEPLOYMENT
>
> When an application instance is deployed for a tenant, the container
> needs to make the tenant identifier and tenant-related configuration
> information available to the persistence provider. The container
> needs to pass to the persistence provider a data source that is
> configured with appropriate credentials for the tenant, and which will
> provide isolation between that tenant and other tenants of the
> application. We should probably also define an interface to capture
> the tenant identifier and tenant-related metadata and configuration
> information that the container needs to pass to the persistence
> provider, e.g., a TenantContext.
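As a thought experiment, the kind of contract that note suggests; the name
TenantContext comes from the text above, but the methods are invented and
nothing here is proposed API:

    import javax.sql.DataSource;

    // Hypothetical container-to-provider contract for tenant configuration.
    public interface TenantContext {

        // The tenant identifier, e.g. as exposed at java:comp/tenantId.
        String getTenantId();

        // A data source pre-configured with the tenant's credentials,
        // providing isolation from other tenants of the application.
        DataSource getTenantDataSource();
    }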
>
>
> OTHER OPEN ISSUES
>
> 1. Additional metadata to support schema generation.
>
> 2. Do we need metadata to indicate whether an application supports
> multitenant use -- i.e., whether it is "multitenant enabled"?
> Do we need this information specifically for JPA?
>
> 3. Specification of resources that are shared across tenants--e.g.,
> a persistence unit for reference data that can be accessed by
> multiple tenants.