[jsr338-experts] Re: JPA schema generation

From: Linda DeMichiel <linda.demichiel_at_oracle.com>
Date: Mon, 21 May 2012 11:44:25 -0700

Do any of the rest of you have any feedback on this? If so, please
send it now.

thanks,

-Linda

On 5/15/2012 10:57 AM, Linda DeMichiel wrote:
> Folks,
>
> I'd like to get more feedback on this.
>
> thanks,
>
> -Linda
>
>
> On 5/9/2012 11:36 AM, Linda DeMichiel wrote:
>>
>> In conjunction with support for use of JPA in PaaS environments, we
>> will need to address the issue of schema generation. After we discuss
>> this area, I'd like to return to a broader discussion of multitenancy
>> support with this in view.
>>
>>
>> Here's what I think we'll need to address:
>>
>> 1. How/when schema generation happens:
>> a. at a deployment time phase (or predeployment time phase)
>> b. at EMF creation time
>>
>> 2. Providing an option whereby the application can supply DDL for
>> schema generation as an alternative to the persistence provider
>> doing schema generation based on the ORM metadata
>>
>> 3. How, once schema generation has happened, tables can get loaded with
>> data. If we enable the application to provide DDL scripts for
>> schema generation, then supporting the use of SQL bulk insert scripts
>> would go hand-in-hand with this, but perhaps there are also other
>> approaches, e.g., bulk insert via JPQL(?).
>>
>> 4. The need for additional ORM metadata: e.g. metadata for index
>> generation; for foreign key constraints; other(?).
>>
>> 5. More precise definition of existing metadata that can be used
>> for schema generation: what it comprises, and how it can/must
>> be used.
>>
>> 6. Additional persistence.xml level metadata and additional metadata
>> the platform provider passes to the persistence provider to control
>> or configure the operations outlined above.
>>
>>
>> I am assuming for that Java EE 7 we need to support both the
>> separate database and shared database (aka separate schema) approaches.
>>
>> I am also assuming that it is the platform provider that does the
>> provisioning of either the database or the schema respectively (i.e.,
>> in accordance with whether the separate database or shared database
>> approach is taken). I.e., in the separate database approach, I'm
>> assuming that it is the persistence provider that creates the schemas
>> directly in the database or creates the DDL to generate the schemas
>> needed for the persistence unit. [I'll use the term "schema
>> generation" generically below to mean either table generation or
>> schema+table generation with this assumption in view.]
>>
>>
>> Here's a proposal to get the discussion started.
>>
>> SCHEMA GENERATION
>>
>> I think we should allow schema generation to be done either as a
>> deployment (or pre-deployment) phase or at createEMF time.
>>
>> Deployment-time (or predeployment-time) schema generation could be
>> done either by the platform provider calling into the
>> PersistenceProvider implementation to generate the database schema or
>> by the platform provider running SQL DDL scripts.
>>
>> SQL DDL scripts might be provided with the application (bundled as
>> part of the persistence unit) or the platform provider might invoke
>> the PersistenceProvider implementation to generate them from the
>> persistence unit metadata. If scripts are provided or generated, it
>> should be the platform provider's responsibility to run them against
>> the database. If the application has provided DDL scripts, the
>> application should also have the option to provide SQL scripts to do
>> a bulk insert of data. It would be the responsibility of the
>> platform provider to run these as well.
>>
>> An alternative to the use of scripts is for the platform provider to
>> call into the PersistenceProvider to directly generate the tables in
>> the database. In the case that a database has been provisioned rather
>> than only a schema, the persistence provider would generate the
>> schemas and tables.
>>
>> For the sake of argument, let's assume that we add a generateSchema()
>> method to the PersistenceProvider interface, which the platform
>> provider invokes during the deployment phase if SQL scripts have not
>> been provided by the application.
>>
>> Information that needs to be passed in would include the following:
>>
>> (1) Information that specifies how schema generation is to proceed.
>> This information could take the form of a SchemaGenerationInfo
>> interface or it could be in the form of additional standard properties.
>> (2) Connection to the database or schema. [Is this a Connection or a
>> DataSource ?]
>> (3) PersistenceUnitInfo
>>
>>
>> The SchemaGenerationInfo should include the following:
>>
>> ddl-generation-mode: GENERATE_AND_EXECUTE | GENERATE_ONLY | EXECUTE_ONLY
>>
>> GENERATE_AND_EXECUTE => generate ddl scripts and generate schema (if
>> applicable) and tables directly in the database
>>
>> GENERATE_ONLY => generate ddl scripts
>>
>> EXECUTE_ONLY => generate schema (if applicable) and tables directly in
>> the database
>>
>> create-ddl-target: a Writer configured for the persistence provider
>> for outputting of the DDL scripts for the creation of schema/tables.
>> This should be null if EXECUTE_ONLY ddl-generation-mode is specified.
>>
>> drop-ddl-target: a Writer configured for the persistence provider
>> for outputting of the DDL scripts for the dropping of schema/tables.
>> The should be null if EXECUTE_ONLY ddl-generation-mode is specified.
>>
>> ddl-execution-mode: Used when executing directly against the database,
>> rather than generating scripts. Options should include:
>> CREATE_TABLES | DROP_TABLES | DROP_AND_CREATE_TABLES
>> Do we also need to distinguish CREATE_SCHEMAS_AND_TABLES ? DROP_SCHEMA ?
>> Or can these be implicit?
>> Do we also need an ALTER_TABLES capability here?
>>
>> properties: These could include any vendor-specific properties
>>
>>
>> ADDITIONAL METADATA (persistence.xml)
>>
>> If scripts are provided with the application (i.e., bundled as part
>> of the persistence unit), we should add entries in the persistence.xml
>> to identify them:
>> create-ddl-script
>> drop-ddl-script
>> data-load-script(s) [Do we need to support more than one load script?
>> If so, the ordering may need to be specified as well.]
>>
>> Open issue: do we need metadata in the persistence.xml for
>> ddl-execution-mode as well? For example, this might be used in
>> standalone scenarios (?)
>>
>>
>> ORM METADATA (annotations and orm.xml)
>>
>> (1) Indexes
>>
>> I think we need to support indexes.
>>
>> Here's a strawman annotation:
>>
>> @Target({}) @Retention(RUNTIME)
>> public @interface Index {
>> String name() default "";
>> String columnList();
>> }
>>
>> The columnList syntax could follow that of the OrderBy annotation:
>>
>> columnList::= indexColumn [, indexColumn]*
>> indexColumn::= columnName [ASC | DESC]
>>
>> If ASC or DESC is not specified, ASC is assumed
>>
>> We'd also need to add
>> Index[] indexes() default {};
>> to Table, Secondary Table, CollectionTable, JoinTable, and TableGenerator
>>
>>
>> (2) Foreign Key Constraints
>>
>> I see two possible approaches:
>>
>> (a) Add a string-valued foreignKeyDefinition element to JoinColumn,
>> JoinColumns, MapKeyJoinColumn, etc. to specify a SQL fragment
>> (intended along the lines of columnDefinition) for defining a foreign
>> key constraint and/or for overriding the persistence provider's
>> default foreign key definition. It might also be helpful to allow a
>> name for the foreign key constraint to be specified for the case where
>> the provider is using its default foreign key generation strategy.
>>
>> or
>>
>> (b) Add a ForeignKey annotation to specify the foreign key constraint --
>> for example,
>>
>> public @interface ForeignKey {
>> String name() default "";
>> ForeignKeyAction updateAction() default NO_ACTION;
>> ForeignKeyAction deleteAction() default NO_ACTION;
>> boolean deferred() default false;
>> }
>>
>> public enum ForeignKeyAction {NO_ACTION, CASCADE, SET_NULL};
>>
>> I have some misgivings about approach (b), and, given a lack of
>> database portability here, we might wind up needing a foreignKeyDefinition
>> fragment anyway.
>>
>>
>> (3) Other
>>
>> What about the ability to distinguish a CHAR rather than VARCHAR
>> mapping for strings? Or should we just leave this as belonging in a
>> SQL fragment (with the understanding that we would be intentionally
>> discouraging the use of CHAR strings)?
>>
>> (4) Anything else I'm missing?
>>
>>
>> APIs
>>
>> I assumed the addition of a PersistenceProvider.generateSchema() method
>> above.
>>
>> If we also support schema generation at EMF creation time, do we want
>> another createContainerEntityManagerFactory() method that takes a
>> SchemaGenerationInfo as well as PersistenceUnitInfo argument ?
>>
>> If generation is done directly in the database, at some point the
>> container (or application) may need to have the schema/tables dropped.
>> I'm not sure of the best way to support this. A dropAndClose() method
>> on the EMF??
>>
>>
>>