jsr338-experts@jpa-spec.java.net

[jsr338-experts] Re: JPA schema generation

From: Nicolas Seyvet <nicolas.seyvet_at_ericsson.com>
Date: Thu, 10 May 2012 08:30:14 +0200

See below.

-----Original Message-----
From: Linda DeMichiel [mailto:linda.demichiel_at_oracle.com]
Sent: Wednesday, May 09, 2012 8:37 PM
To: jsr338-experts_at_jpa-spec.java.net
Subject: [jsr338-experts] JPA schema generation


In conjunction with support for use of JPA in PaaS environments, we will need to address the issue of schema generation. After we discuss this area, I'd like to return to a broader discussion of multitenancy support with this in view.


Here's what I think we'll need to address:

1. How/when schema generation happens:
     a. at a deployment time phase (or predeployment time phase)
[lmcnise] Useful for testing and/or quick deployment, so it should be in.
     b. at EMF creation time
[lmcnise] That is an interesting idea, as you could then have entities that do not generate a schema/table for as long as the entity manager is inactive.

2. Providing an option whereby the application can supply DDL for
     schema generation as an alternative to the persistence provider
     doing schema generation based on the ORM metadata
[lmcnise] This is the only valid option for a production system. It is also how you upgrade a system already in production, by applying small changes to the DDL.

3. How, once schema generation has happened, tables can get loaded with
     data. If we enable the application to provide DDL scripts for
     schema generation, then supporting the use of SQL bulk insert scripts
     would go hand-in-hand with this, but perhaps there are also other
     approaches, e.g., bulk insert via JPQL(?).

[lmcnise] I don't understand this point.

4. The need for additional ORM metadata: e.g. metadata for index
     generation; for foreign key constraints; other(?).

[lmcnise] Yes. Hibernate has interesting annotations that cover this. Much more convenient than having a separate file containing the different indexes, etc.

Ex:
    @Basic // Std JPA
    @Index(name = "EDB_NODE_PARENT_ID_IDX") // hibernate
    private Long parentId;

package org.hibernate.annotations;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target({ElementType.FIELD, ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
public @interface Index {

    String name();

    String[] columnNames() default {};
}

5. More precise definition of existing metadata that can be used
     for schema generation: what it comprises, and how it can/must
     be used.

[lmcnise] Goes hand in hand with 4.

6. Additional persistence.xml level metadata and additional metadata
     the platform provider passes to the persistence provider to control
     or configure the operations outlined above.
[lmcnise] Any ideas what sort of control/configuration? Is it something like being able to control schema generation based on the DB engine? For example, MySQL InnoDB and NDB require different schemas, as NDB does not support foreign keys.

I am assuming that for Java EE 7 we need to support both the separate database and shared database (aka separate schema) approaches.

I am also assuming that it is the platform provider that does the provisioning of either the database or the schema, respectively (i.e., in accordance with whether the separate database or shared database approach is taken). In the separate database approach, I'm assuming that it is the persistence provider that creates the schemas directly in the database or creates the DDL to generate the schemas needed for the persistence unit. [I'll use the term "schema generation" generically below to mean either table generation or schema+table generation with this assumption in view.]


Here's a proposal to get the discussion started.

SCHEMA GENERATION

I think we should allow schema generation to be done either as a deployment (or pre-deployment) phase or at createEMF time.

Deployment-time (or predeployment-time) schema generation could be done either by the platform provider calling into the PersistenceProvider implementation to generate the database schema or by the platform provider running SQL DDL scripts.

SQL DDL scripts might be provided with the application (bundled as part of the persistence unit) or the platform provider might invoke the PersistenceProvider implementation to generate them from the persistence unit metadata. If scripts are provided or generated, it should be the platform provider's responsibility to run them against the database. If the application has provided DDL scripts, the application should also have the option to provide SQL scripts to do a bulk insert of data. It would be the responsibility of the platform provider to run these as well.

An alternative to the use of scripts is for the platform provider to call into the PersistenceProvider to directly generate the tables in the database. In the case that a database has been provisioned rather than only a schema, the persistence provider would generate the schemas and tables.

For the sake of argument, let's assume that we add a generateSchema() method to the PersistenceProvider interface, which the platform provider invokes during the deployment phase if SQL scripts have not been provided by the application.

Information that needs to be passed in would include the following:

(1) Information that specifies how schema generation is to proceed.
     This information could take the form of a SchemaGenerationInfo
     interface or it could be in the form of additional standard properties.
(2) Connection to the database or schema. [Is this a Connection or a
     DataSource?]
(3) PersistenceUnitInfo
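
As a sketch, such a method might look like the following. All names here are illustrative: SchemaGenerationInfo does not exist yet, the PersistenceUnitInfo below is a placeholder for the real javax.persistence.spi.PersistenceUnitInfo, and whether the connection argument is a Connection or a DataSource is an open question.

```java
// Hypothetical sketch of the proposed generateSchema() entry point;
// none of these names are part of the spec.
import javax.sql.DataSource;

interface SchemaGenerationInfo { }   // placeholder for the proposal below
interface PersistenceUnitInfo { }    // stands in for javax.persistence.spi.PersistenceUnitInfo

interface PersistenceProviderSketch {
    // Called by the platform provider at (pre)deployment time when the
    // application has not supplied its own DDL scripts.
    void generateSchema(SchemaGenerationInfo generationInfo,
                        DataSource dataSource,   // or a Connection -- open question (2)
                        PersistenceUnitInfo unitInfo);
}
```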


The SchemaGenerationInfo should include the following:

   ddl-generation-mode: GENERATE_AND_EXECUTE | GENERATE_ONLY | EXECUTE_ONLY

       GENERATE_AND_EXECUTE => generate ddl scripts and generate schema (if
         applicable) and tables directly in the database

       GENERATE_ONLY => generate ddl scripts

       EXECUTE_ONLY => generate schema (if applicable) and tables directly in
          the database

   create-ddl-target: a Writer configured for the persistence provider
       for outputting of the DDL scripts for the creation of schema/tables.
       This should be null if EXECUTE_ONLY ddl-generation-mode is specified.

   drop-ddl-target: a Writer configured for the persistence provider
       for outputting of the DDL scripts for the dropping of schema/tables.
       This should be null if EXECUTE_ONLY ddl-generation-mode is specified.

   ddl-execution-mode: Used when executing directly against the database,
       rather than generating scripts. Options should include:
        CREATE_TABLES | DROP_TABLES | DROP_AND_CREATE_TABLES
      Do we also need to distinguish CREATE_SCHEMAS_AND_TABLES ? DROP_SCHEMA ?
        Or can these be implicit?
      Do we also need an ALTER_TABLES capability here?

   properties: These could include any vendor-specific properties
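
Taken together, the fields above could be sketched as a Java interface. This is hypothetical; the type, method names, and enum renderings of ddl-generation-mode and ddl-execution-mode are illustrative only.

```java
import java.io.Writer;
import java.util.Map;

// Hypothetical sketch of the SchemaGenerationInfo proposed above.
interface SchemaGenerationInfo {

    enum DdlGenerationMode { GENERATE_AND_EXECUTE, GENERATE_ONLY, EXECUTE_ONLY }
    enum DdlExecutionMode { CREATE_TABLES, DROP_TABLES, DROP_AND_CREATE_TABLES }

    DdlGenerationMode getDdlGenerationMode();

    // Writers for the create/drop DDL scripts;
    // null when the EXECUTE_ONLY ddl-generation-mode is specified.
    Writer getCreateDdlTarget();
    Writer getDropDdlTarget();

    // Used only when executing directly against the database.
    DdlExecutionMode getDdlExecutionMode();

    // Vendor-specific properties.
    Map<String, Object> getProperties();
}
```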


ADDITIONAL METADATA (persistence.xml)

If scripts are provided with the application (i.e., bundled as part of the persistence unit), we should add entries in the persistence.xml to identify them:
       create-ddl-script
       drop-ddl-script
       data-load-script(s) [Do we need to support more than one load script?
          If so, the ordering may need to be specified as well.]

Open issue: do we need metadata in the persistence.xml for
   ddl-execution-mode as well? For example, this might be used in
   standalone scenarios (?)


ORM METADATA (annotations and orm.xml)

(1) Indexes

I think we need to support indexes.

Here's a strawman annotation:

   @Target({}) @Retention(RUNTIME)
   public @interface Index {
     String name() default "";
     String columnList();
   }

   The columnList syntax could follow that of the OrderBy annotation:

     columnList::= indexColumn [, indexColumn]*
     indexColumn::= columnName [ASC | DESC]

     If ASC or DESC is not specified, ASC is assumed.

We'd also need to add
    Index[] indexes() default {};
to Table, SecondaryTable, CollectionTable, JoinTable, and TableGenerator.
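
Putting the strawman together, usage might look like this. The Table annotation below is a local stand-in for the JPA one, extended with the proposed indexes() element; nothing here exists in the spec yet.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Strawman Index from the proposal; @Target({}) means it can only be
// used nested inside another annotation, such as Table below.
@Target({}) @Retention(RetentionPolicy.RUNTIME)
@interface Index {
    String name() default "";
    String columnList();
}

// Local stand-in for javax.persistence.Table plus the proposed indexes() element.
@Target(ElementType.TYPE) @Retention(RetentionPolicy.RUNTIME)
@interface Table {
    String name() default "";
    Index[] indexes() default {};
}

// Composite index using the OrderBy-style columnList syntax.
@Table(name = "EDB_NODE",
       indexes = @Index(name = "EDB_NODE_PARENT_ID_IDX",
                        columnList = "PARENT_ID ASC, CREATED DESC"))
class EdbNode { }
```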


(2) Foreign Key Constraints

I see two possible approaches:

(a) Add a string-valued foreignKeyDefinition element to JoinColumn, JoinColumns, MapKeyJoinColumn, etc. to specify a SQL fragment (intended along the lines of columnDefinition) for defining a foreign key constraint and/or for overriding the persistence provider's default foreign key definition. It might also be helpful to allow a name for the foreign key constraint to be specified for the case where the provider is using its default foreign key generation strategy.

or

(b) Add a ForeignKey annotation to specify the foreign key constraint -- for example,

public @interface ForeignKey {
   String name() default "";
   ForeignKeyAction updateAction() default ForeignKeyAction.NO_ACTION;
   ForeignKeyAction deleteAction() default ForeignKeyAction.NO_ACTION;
   boolean deferred() default false;
}

public enum ForeignKeyAction { NO_ACTION, CASCADE, SET_NULL }

I have some misgivings about approach (b), and, given a lack of database portability here, we might wind up needing a foreignKeyDefinition fragment anyway.
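
A self-contained sketch of approach (b) in use; hypothetical, since neither ForeignKey nor ForeignKeyAction exists in JPA, and the entity and constraint names are invented for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

enum ForeignKeyAction { NO_ACTION, CASCADE, SET_NULL }

// Strawman ForeignKey annotation from approach (b).
@Target({ElementType.FIELD, ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
@interface ForeignKey {
    String name() default "";
    ForeignKeyAction updateAction() default ForeignKeyAction.NO_ACTION;
    ForeignKeyAction deleteAction() default ForeignKeyAction.NO_ACTION;
    boolean deferred() default false;
}

class ChildEntity {
    // e.g. applied alongside @ManyToOne/@JoinColumn on a parent reference.
    @ForeignKey(name = "FK_CHILD_PARENT", deleteAction = ForeignKeyAction.CASCADE)
    Long parentId;
}
```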


(3) Other

What about the ability to distinguish a CHAR rather than VARCHAR mapping for strings? Or should we just leave this as belonging in a SQL fragment (with the understanding that we would be intentionally discouraging the use of CHAR strings)?
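
For comparison, the existing columnDefinition element already covers this as a SQL fragment. A self-contained sketch follows; the Column annotation below is a local stand-in for javax.persistence.Column, whose real columnDefinition element this mirrors.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Local stand-in for javax.persistence.Column; columnDefinition
// mirrors the real element of the same name.
@Target({ElementType.FIELD, ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
@interface Column {
    String columnDefinition() default "";
}

class Country {
    // Forces a fixed-width CHAR column instead of the provider's VARCHAR default.
    @Column(columnDefinition = "CHAR(2)")
    String isoCode;
}
```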

(4) Anything else I'm missing?


APIs

I assumed the addition of a PersistenceProvider.generateSchema() method above.

If we also support schema generation at EMF creation time, do we want another createContainerEntityManagerFactory() method that takes a SchemaGenerationInfo as well as a PersistenceUnitInfo argument?

If generation is done directly in the database, at some point the container (or application) may need to have the schema/tables dropped.
I'm not sure of the best way to support this. A dropAndClose() method
on the EMF??