[jsr338-experts] Re: JPA schema generation

From: Linda DeMichiel <linda.demichiel_at_oracle.com>
Date: Thu, 17 May 2012 17:14:19 -0700

Thanks for the input! A few remarks below.....

On 5/17/2012 6:56 AM, Steve Ebersole wrote:
> My thoughts inline...
>
> On 05/09/2012 01:36 PM, Linda DeMichiel wrote:
>>
>> In conjunction with support for use of JPA in PaaS environments, we
>> will need to address the issue of schema generation. After we discuss
>> this area, I'd like to return to a broader discussion of multitenancy
>> support with this in view.
>>
>>
>> Here's what I think we'll need to address:
>>
>> 1. How/when schema generation happens:
>> a. at a deployment time phase (or predeployment time phase)
>> b. at EMF creation time
>
> Well if the creation is driven by the platform provider, its 6-in-one IMO. But in standalone cases we obviously do not
> have the benefit of a platform provider, so there the answer has to be "at EMF creation".
>
>
>> 2. Providing an option whereby the application can supply DDL for
>> schema generation as an alternative to the persistence provider
>> doing schema generation based on the ORM metadata
>
> Is this per deployment? Or per entity?
>

Per deployment

> Another related thing to consider is "auxiliary database objects"; basically allowing create/drop definitions for
> database objects other than those explicitly named in mappings. Functions, procedures, view, etc, etc.
>

Right. This would be doable via the scripts

>
>> 3. How, once schema generation has happened, tables can get loaded with
>> data. If we enable the application to provide DDL scripts for
>> schema generation, then supporting the use of SQL bulk insert scripts
>> would go hand-in-hand with this, but perhaps there are also other
>> approaches, e.g., bulk insert via JPQL(?).
>
> +1 If you are exporting schemas, you will generally need this behavior.
>
> I think SQL is much better than JPAQL here because of re-usability. I get the desire to insulate from object
> (table/column) naming, but still think the re-usability is more important.
>
>
>> 4. The need for additional ORM metadata: e.g. metadata for index
>> generation; for foreign key constraints; other(?).
>
> +1 I assume most vendors have this already in vendor-specific annotations.
>
>
>> 5. More precise definition of existing metadata that can be used
>> for schema generation: what it comprises, and how it can/must
>> be used.
>
> Not following this. Could you elaborate?
>

Right now, the spec contains a lot of "physical" annotations and XML elements
that would be used in schema generation, in addition to logical elements
that have corresponding "physical" defaults. However, their semantics aren't
explicitly defined in the spec with schema generation in view.

>
>> 6. Additional persistence.xml level metadata and additional metadata
>> the platform provider passes to the persistence provider to control
>> or configure the operations outlined above.
>
> Again, not following. What would some specifics be?
>

What I describe further below under "Additional Metadata"

>
>> I am assuming for that Java EE 7 we need to support both the
>> separate database and shared database (aka separate schema) approaches.
>
> Why not discriminator-based as well?
>

Since Java EE 7 isn't supporting full SaaS, spec'ing out JPA provider-managed
SaaS seemed premature. However, after having thought more about this, I do
agree with you that it would be good if we could at least standardize on
the annotations to support application-managed single-table SaaS.

>
>> I am also assuming that it is the platform provider that does the
>> provisioning of either the database or the schema respectively (i.e.,
>> in accordance with whether the separate database or shared database
>> approach is taken). I.e., in the separate database approach, I'm
>> assuming that it is the persistence provider that creates the schemas
>> directly in the database or creates the DDL to generate the schemas
>> needed for the persistence unit. [I'll use the term "schema
>> generation" generically below to mean either table generation or
>> schema+table generation with this assumption in view.]
>>
>>
>> Here's a proposal to get the discussion started.
>>
>> SCHEMA GENERATION
>>
>> I think we should allow schema generation to be done either as a
>> deployment (or pre-deployment) phase or at createEMF time.
>>
>> Deployment-time (or predeployment-time) schema generation could be
>> done either by the platform provider calling into the
>> PersistenceProvider implementation to generate the database schema or
>> by the platform provider running SQL DDL scripts.
>>
>> SQL DDL scripts might be provided with the application (bundled as
>> part of the persistence unit) or the platform provider might invoke
>> the PersistenceProvider implementation to generate them from the
>> persistence unit metadata. If scripts are provided or generated, it
>> should be the platform provider's responsibility to run them against
>> the database. If the application has provided DDL scripts, the
>> application should also have the option to provide SQL scripts to do
>> a bulk insert of data. It would be the responsibility of the
>> platform provider to run these as well.
>
> Personally, I think the ability to provide SQL init scripts for initial data insertion should be available regardless of
> whether the application provided the DDL or the persistence provider generated the DDL.
>

OK, but if the provider generates the DDL, we do need to be very precise as
to exactly what would be generated to guarantee that this would work.
Even then, I'd expect developers to have the provider generate the DDL and then
package that DDL together with the data insertion scripts.

>
>> An alternative to the use of scripts is for the platform provider to
>> call into the PersistenceProvider to directly generate the tables in
>> the database. In the case that a database has been provisioned rather
>> than only a schema, the persistence provider would generate the
>> schemas and tables.
>>
>> For the sake of argument, let's assume that we add a generateSchema()
>> method to the PersistenceProvider interface, which the platform
>> provider invokes during the deployment phase if SQL scripts have not
>> been provided by the application.
>>
>> Information that needs to be passed in would include the following:
>>
>> (1) Information that specifies how schema generation is to proceed.
>> This information could take the form of a SchemaGenerationInfo
>> interface or it could be in the form of additional standard properties.
>> (2) Connection to the database or schema. [Is this a Connection or a
>> DataSource ?]
>> (3) PersistenceUnitInfo
>>
>>
>> The SchemaGenerationInfo should include the following:
>>
>> ddl-generation-mode: GENERATE_AND_EXECUTE | GENERATE_ONLY | EXECUTE_ONLY
>>
>> GENERATE_AND_EXECUTE => generate ddl scripts and generate schema (if
>> applicable) and tables directly in the database
>>
>> GENERATE_ONLY => generate ddl scripts
>>
>> EXECUTE_ONLY => generate schema (if applicable) and tables directly in
>> the database
>>
>> create-ddl-target: a Writer configured for the persistence provider
>> for outputting of the DDL scripts for the creation of schema/tables.
>> This should be null if EXECUTE_ONLY ddl-generation-mode is specified.
>>
>> drop-ddl-target: a Writer configured for the persistence provider
>> for outputting of the DDL scripts for the dropping of schema/tables.
>> The should be null if EXECUTE_ONLY ddl-generation-mode is specified.
>>
>> ddl-execution-mode: Used when executing directly against the database,
>> rather than generating scripts. Options should include:
>> CREATE_TABLES | DROP_TABLES | DROP_AND_CREATE_TABLES
>> Do we also need to distinguish CREATE_SCHEMAS_AND_TABLES ? DROP_SCHEMA ?
>> Or can these be implicit?
>> Do we also need an ALTER_TABLES capability here?
>
> Schema evolution is infinitely more tricky than schema export and infinitely more tricky than it first appears. -1 to
> schema evolution capability IMO.
>
>
>> properties: These could include any vendor-specific properties
>>
>>
>> ADDITIONAL METADATA (persistence.xml)
>>
>> If scripts are provided with the application (i.e., bundled as part
>> of the persistence unit), we should add entries in the persistence.xml
>> to identify them:
>> create-ddl-script
>> drop-ddl-script
>> data-load-script(s) [Do we need to support more than one load script?
>> If so, the ordering may need to be specified as well.]
>>
>> Open issue: do we need metadata in the persistence.xml for
>> ddl-execution-mode as well? For example, this might be used in
>> standalone scenarios (?)
>>
>>
>> ORM METADATA (annotations and orm.xml)
>>
>> (1) Indexes
>>
>> I think we need to support indexes.
>>
>> Here's a strawman annotation:
>>
>> @Target({}) @Retention(RUNTIME)
>> public @interface Index {
>> String name() default "";
>> String columnList();
>> }
>>
>> The columnList syntax could follow that of the OrderBy annotation:
>>
>> columnList::= indexColumn [, indexColumn]*
>> indexColumn::= columnName [ASC | DESC]
>>
>> If ASC or DESC is not specified, ASC is assumed
>>
>> We'd also need to add
>> Index[] indexes() default {};
>> to Table, Secondary Table, CollectionTable, JoinTable, and TableGenerator
>>
>>
>> (2) Foreign Key Constraints
>>
>> I see two possible approaches:
>>
>> (a) Add a string-valued foreignKeyDefinition element to JoinColumn,
>> JoinColumns, MapKeyJoinColumn, etc. to specify a SQL fragment
>> (intended along the lines of columnDefinition) for defining a foreign
>> key constraint and/or for overriding the persistence provider's
>> default foreign key definition. It might also be helpful to allow a
>> name for the foreign key constraint to be specified for the case where
>> the provider is using its default foreign key generation strategy.
>>
>> or
>>
>> (b) Add a ForeignKey annotation to specify the foreign key constraint --
>> for example,
>>
>> public @interface ForeignKey {
>> String name() default "";
>> ForeignKeyAction updateAction() default NO_ACTION;
>> ForeignKeyAction deleteAction() default NO_ACTION;
>> boolean deferred() default false;
>> }
>>
>> public enum ForeignKeyAction {NO_ACTION, CASCADE, SET_NULL};
>>
>> I have some misgivings about approach (b), and, given a lack of
>> database portability here, we might wind up needing a foreignKeyDefinition
>> fragment anyway.
>
> Can you give an example of what you are thinking with 'foreignKeyDefinition'? Personally I was leaning towards (b) until
> I read this last bit. Do you just mean different capabilities (i.e., some dbs do not support deferred constraints, etc)?
>

Analogous to the columnDefinition, it would allow capturing database-specific syntax/semantics.
E.g., different variants of foreign key actions, what might be deferrable / deferred, etc.

> There is an implication in terms of how a persistence provider needs to handle the case of cascading foreign keys in
> terms of mapped cascading. Take the case of remove operation, with ForeignKey.deleteAction==CASCADE all of a sudden the
> database has now physically deleted corresponding rows "underneath" the provider which has ramifications on the
> persistence provider in terms of cleaning up cache entries and possibly in terms of additional cascading. Yes, we have
> this mismatch today as is, but today we also don't have the needed visibility into that situation because we do not know
> the foreign key has been defined that way. Here that changes. Just want to make sure we then account for that in the
> other "entity operation"-related sections.
>

If the application bundles the DDL, that problem remains, doesn't it? I didn't understand what you
meant by the last sentence above though.

> Either way, I think the ability to name the foreign key is definitely warranted.
>
>>
>>
>> (3) Other
>>
>> What about the ability to distinguish a CHAR rather than VARCHAR
>> mapping for strings? Or should we just leave this as belonging in a
>> SQL fragment (with the understanding that we would be intentionally
>> discouraging the use of CHAR strings)?
>>
>> (4) Anything else I'm missing?
>>
>>
>> APIs
>>
>> I assumed the addition of a PersistenceProvider.generateSchema() method
>> above.
>>
>> If we also support schema generation at EMF creation time, do we want
>> another createContainerEntityManagerFactory() method that takes a
>> SchemaGenerationInfo as well as PersistenceUnitInfo argument ?
>>
>> If generation is done directly in the database, at some point the
>> container (or application) may need to have the schema/tables dropped.
>> I'm not sure of the best way to support this. A dropAndClose() method
>> on the EMF??
>>
>>
>>