jsr338-experts@jpa-spec.java.net

[jsr338-experts] Re: JPA schema generation

From: Linda DeMichiel <linda.demichiel_at_oracle.com>
Date: Tue, 15 May 2012 11:17:19 -0700

On 5/15/2012 11:09 AM, Steve Ebersole wrote:
> Is the hope to have schema generation supported with multi-tenancy?

Yes -- all the pieces need to fit together.

> I see no way for the JPA provider to know whether
> the schema needs to be generated because of this segmentation where an EMF only serves a single tenant. So is the plan
> there to put the onus to keep track of that on the environment? You seem to prefer the environment to manage the schema
> generation (the JPA provider simply generates a generation script as I read your proposal), so maybe this is not so bad.
>

I've also proposed that when the JPA provider "owns" the database (i.e., the database
is a dedicated rather than a shared database) that the JPA provider creates the schemas
in the database. The process as a whole, however, is driven from the platform provider side.

I would really like to get feedback from you and the other experts in the group on the details
here. The hard part is always in the details :-)

thanks again,

-Linda

> As for knowing if dropping is necessary, heck, put all the onus on the environment ;)
>
>
> On Tue 15 May 2012 12:57:15 PM CDT, Linda DeMichiel wrote:
>> Folks,
>>
>> I'd like to get more feedback on this.
>>
>> thanks,
>>
>> -Linda
>>
>>
>> On 5/9/2012 11:36 AM, Linda DeMichiel wrote:
>>>
>>> In conjunction with support for use of JPA in PaaS environments, we
>>> will need to address the issue of schema generation. After we discuss
>>> this area, I'd like to return to a broader discussion of multitenancy
>>> support with this in view.
>>>
>>>
>>> Here's what I think we'll need to address:
>>>
>>> 1. How/when schema generation happens:
>>> a. at a deployment time phase (or predeployment time phase)
>>> b. at EMF creation time
>>>
>>> 2. Providing an option whereby the application can supply DDL for
>>> schema generation as an alternative to the persistence provider
>>> doing schema generation based on the ORM metadata
>>>
>>> 3. How, once schema generation has happened, tables can get loaded with
>>> data. If we enable the application to provide DDL scripts for
>>> schema generation, then supporting the use of SQL bulk insert scripts
>>> would go hand-in-hand with this, but perhaps there are also other
>>> approaches, e.g., bulk insert via JPQL(?).
>>>
>>> 4. The need for additional ORM metadata: e.g. metadata for index
>>> generation; for foreign key constraints; other(?).
>>>
>>> 5. More precise definition of existing metadata that can be used
>>> for schema generation: what it comprises, and how it can/must
>>> be used.
>>>
>>> 6. Additional persistence.xml level metadata and additional metadata
>>> the platform provider passes to the persistence provider to control
>>> or configure the operations outlined above.
>>>
>>>
>>> I am assuming that for Java EE 7 we need to support both the
>>> separate database and shared database (aka separate schema) approaches.
>>>
>>> I am also assuming that it is the platform provider that does the
>>> provisioning of either the database or the schema respectively (i.e.,
>>> in accordance with whether the separate database or shared database
>>> approach is taken). I.e., in the separate database approach, I'm
>>> assuming that it is the persistence provider that creates the schemas
>>> directly in the database or creates the DDL to generate the schemas
>>> needed for the persistence unit. [I'll use the term "schema
>>> generation" generically below to mean either table generation or
>>> schema+table generation with this assumption in view.]
>>>
>>>
>>> Here's a proposal to get the discussion started.
>>>
>>> SCHEMA GENERATION
>>>
>>> I think we should allow schema generation to be done either as a
>>> deployment (or pre-deployment) phase or at createEMF time.
>>>
>>> Deployment-time (or predeployment-time) schema generation could be
>>> done either by the platform provider calling into the
>>> PersistenceProvider implementation to generate the database schema or
>>> by the platform provider running SQL DDL scripts.
>>>
>>> SQL DDL scripts might be provided with the application (bundled as
>>> part of the persistence unit) or the platform provider might invoke
>>> the PersistenceProvider implementation to generate them from the
>>> persistence unit metadata. If scripts are provided or generated, it
>>> should be the platform provider's responsibility to run them against
>>> the database. If the application has provided DDL scripts, the
>>> application should also have the option to provide SQL scripts to do
>>> a bulk insert of data. It would be the responsibility of the
>>> platform provider to run these as well.
>>>
>>> An alternative to the use of scripts is for the platform provider to
>>> call into the PersistenceProvider to directly generate the tables in
>>> the database. In the case that a database has been provisioned rather
>>> than only a schema, the persistence provider would generate the
>>> schemas and tables.
>>>
>>> For the sake of argument, let's assume that we add a generateSchema()
>>> method to the PersistenceProvider interface, which the platform
>>> provider invokes during the deployment phase if SQL scripts have not
>>> been provided by the application.
>>>
>>> Information that needs to be passed in would include the following:
>>>
>>> (1) Information that specifies how schema generation is to proceed.
>>> This information could take the form of a SchemaGenerationInfo
>>> interface or it could be in the form of additional standard properties.
>>> (2) Connection to the database or schema. [Is this a Connection or a
>>> DataSource?]
>>> (3) PersistenceUnitInfo
>>>
>>>
>>> The SchemaGenerationInfo should include the following:
>>>
>>> ddl-generation-mode: GENERATE_AND_EXECUTE | GENERATE_ONLY | EXECUTE_ONLY
>>>
>>> GENERATE_AND_EXECUTE => generate ddl scripts and generate schema (if
>>> applicable) and tables directly in the database
>>>
>>> GENERATE_ONLY => generate ddl scripts
>>>
>>> EXECUTE_ONLY => generate schema (if applicable) and tables directly in
>>> the database
>>>
>>> create-ddl-target: a Writer configured for the persistence provider
>>> for outputting the DDL scripts for the creation of schema/tables.
>>> This should be null if the EXECUTE_ONLY ddl-generation-mode is specified.
>>>
>>> drop-ddl-target: a Writer configured for the persistence provider
>>> for outputting the DDL scripts for the dropping of schema/tables.
>>> This should be null if the EXECUTE_ONLY ddl-generation-mode is specified.
>>>
>>> ddl-execution-mode: Used when executing directly against the database,
>>> rather than generating scripts. Options should include:
>>> CREATE_TABLES | DROP_TABLES | DROP_AND_CREATE_TABLES
>>> Do we also need to distinguish CREATE_SCHEMAS_AND_TABLES ? DROP_SCHEMA ?
>>> Or can these be implicit?
>>> Do we also need an ALTER_TABLES capability here?
>>>
>>> properties: These could include any vendor-specific properties
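To make the proposal above concrete, here is one possible shape for the SchemaGenerationInfo object and the generateSchema() SPI addition. Every name below is strawman only (nothing here exists in JPA 2.0); PersistenceUnitInfo is typed as Object so the sketch stands alone without a javax.persistence dependency:

```java
import java.io.Writer;
import java.util.Map;
import javax.sql.DataSource;

// Strawman names only -- these mirror the options listed in the proposal.
enum DdlGenerationMode { GENERATE_AND_EXECUTE, GENERATE_ONLY, EXECUTE_ONLY }

enum DdlExecutionMode { CREATE_TABLES, DROP_TABLES, DROP_AND_CREATE_TABLES }

// The object the platform provider would pass to the persistence
// provider to control schema generation.
interface SchemaGenerationInfo {
    DdlGenerationMode ddlGenerationMode();
    Writer createDdlTarget();          // null when EXECUTE_ONLY
    Writer dropDdlTarget();            // null when EXECUTE_ONLY
    DdlExecutionMode ddlExecutionMode();
    Map<String, Object> properties();  // vendor-specific properties
}

// The strawman addition to the PersistenceProvider SPI; the first
// argument stands in for PersistenceUnitInfo.
interface SchemaGeneratingProvider {
    void generateSchema(Object persistenceUnitInfo,
                        SchemaGenerationInfo info,
                        DataSource dataSource);
}
```

Whether the connection argument should be a DataSource (as sketched here) or a plain Connection remains the open question noted above.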
>>>
>>>
>>> ADDITIONAL METADATA (persistence.xml)
>>>
>>> If scripts are provided with the application (i.e., bundled as part
>>> of the persistence unit), we should add entries in the persistence.xml
>>> to identify them:
>>> create-ddl-script
>>> drop-ddl-script
>>> data-load-script(s) [Do we need to support more than one load script?
>>> If so, the ordering may need to be specified as well.]
>>>
>>> Open issue: do we need metadata in the persistence.xml for
>>> ddl-execution-mode as well? For example, this might be used in
>>> standalone scenarios (?)
>>>
>>>
>>> ORM METADATA (annotations and orm.xml)
>>>
>>> (1) Indexes
>>>
>>> I think we need to support indexes.
>>>
>>> Here's a strawman annotation:
>>>
>>> @Target({}) @Retention(RUNTIME)
>>> public @interface Index {
>>>     String name() default "";
>>>     String columnList();
>>> }
>>>
>>> The columnList syntax could follow that of the OrderBy annotation:
>>>
>>> columnList::= indexColumn [, indexColumn]*
>>> indexColumn::= columnName [ASC | DESC]
>>>
>>> If ASC or DESC is not specified, ASC is assumed
>>>
>>> We'd also need to add
>>> Index[] indexes() default {};
>>> to Table, SecondaryTable, CollectionTable, JoinTable, and
>>> TableGenerator
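A self-contained sketch of how this strawman might look in use. The Table annotation below is a stand-in carrying the proposed indexes() element, not the real javax.persistence type:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.Target;
import static java.lang.annotation.RetentionPolicy.RUNTIME;

// The strawman Index annotation: @Target({}) means it can only appear
// as a member of another annotation, never directly on a declaration.
@Target({}) @Retention(RUNTIME)
@interface Index {
    String name() default "";
    String columnList();
}

// Stand-in for javax.persistence.Table with the proposed indexes() element.
@Target(ElementType.TYPE) @Retention(RUNTIME)
@interface Table {
    String name() default "";
    Index[] indexes() default {};
}

// A composite index using the proposed OrderBy-style columnList syntax;
// LAST_NAME defaults to ASC because no direction is given.
@Table(name = "EMPLOYEE",
       indexes = @Index(name = "IDX_EMP_NAME",
                        columnList = "LAST_NAME, FIRST_NAME DESC"))
class Employee { }
```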
>>>
>>>
>>> (2) Foreign Key Constraints
>>>
>>> I see two possible approaches:
>>>
>>> (a) Add a string-valued foreignKeyDefinition element to JoinColumn,
>>> JoinColumns, MapKeyJoinColumn, etc. to specify a SQL fragment
>>> (intended along the lines of columnDefinition) for defining a foreign
>>> key constraint and/or for overriding the persistence provider's
>>> default foreign key definition. It might also be helpful to allow a
>>> name for the foreign key constraint to be specified for the case where
>>> the provider is using its default foreign key generation strategy.
>>>
>>> or
>>>
>>> (b) Add a ForeignKey annotation to specify the foreign key constraint --
>>> for example,
>>>
>>> public @interface ForeignKey {
>>>     String name() default "";
>>>     ForeignKeyAction updateAction() default NO_ACTION;
>>>     ForeignKeyAction deleteAction() default NO_ACTION;
>>>     boolean deferred() default false;
>>> }
>>>
>>> public enum ForeignKeyAction {NO_ACTION, CASCADE, SET_NULL};
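Approach (b) as a compilable sketch, with an example override. All names are strawman only; in actual Java source the enum defaults need to be qualified, and in practice the annotation would accompany a JoinColumn rather than sit on a class as shown here:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Strawman foreign key action enum from approach (b).
enum ForeignKeyAction { NO_ACTION, CASCADE, SET_NULL }

// Strawman ForeignKey annotation; enum defaults qualified as Java requires.
@Retention(RetentionPolicy.RUNTIME)
@interface ForeignKey {
    String name() default "";
    ForeignKeyAction updateAction() default ForeignKeyAction.NO_ACTION;
    ForeignKeyAction deleteAction() default ForeignKeyAction.NO_ACTION;
    boolean deferred() default false;
}

// Example: overriding only the delete action; updateAction and
// deferred keep their defaults.
@ForeignKey(name = "FK_ORDER_CUSTOMER",
            deleteAction = ForeignKeyAction.CASCADE)
class OrderCustomerLink { }
```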
>>>
>>> I have some misgivings about approach (b), and, given the lack of
>>> database portability here, we might wind up needing a
>>> foreignKeyDefinition fragment anyway.
>>>
>>>
>>> (3) Other
>>>
>>> What about the ability to distinguish a CHAR rather than VARCHAR
>>> mapping for strings? Or should we just leave this as belonging in a
>>> SQL fragment (with the understanding that we would be intentionally
>>> discouraging the use of CHAR strings)?
>>>
>>> (4) Anything else I'm missing?
>>>
>>>
>>> APIs
>>>
>>> I assumed the addition of a PersistenceProvider.generateSchema() method
>>> above.
>>>
>>> If we also support schema generation at EMF creation time, do we want
>>> another createContainerEntityManagerFactory() method that takes a
>>> SchemaGenerationInfo as well as a PersistenceUnitInfo argument?
>>>
>>> If generation is done directly in the database, at some point the
>>> container (or application) may need to have the schema/tables dropped.
>>> I'm not sure of the best way to support this. A dropAndClose() method
>>> on the EMF??
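One possible shape for that open question: an EMF subtype that can drop what it generated. Strawman names only, with EntityManagerFactory stubbed so the sketch stands alone:

```java
// Minimal stand-in for javax.persistence.EntityManagerFactory.
interface EntityManagerFactoryStub {
    void close();
}

// Hypothetical EMF subtype for container-managed schema cleanup.
interface SchemaAwareEntityManagerFactory extends EntityManagerFactoryStub {
    // Drop the generated schema/tables, then close the factory.
    void dropAndClose();
}
```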
>>>
>>>
>>>