[jpa-spec users] [jsr338-experts] Re: JPA schema generation

From: Steve Ebersole <steve.ebersole_at_redhat.com>
Date: Tue, 15 May 2012 13:35:32 -0500

Well part of the problem is that JPA decided to go a totally different
implementation direction for multi-tenancy support than what I did for
Hibernate. It takes a fair amount of extrapolation based on my work
and experiences there to bear relevant discussion on this JPA approach.

But I will read through your proposal some more this evening and
respond back.

On Tue 15 May 2012 01:17:19 PM CDT, Linda DeMichiel wrote:
>
>
> On 5/15/2012 11:09 AM, Steve Ebersole wrote:
>> Is the hope to have schema generation supported with multi-tenancy?
>
> Yes -- all the pieces need to fit together.
>
> I see no why for the JPA provider to know whether
>> the schema needs to be generated because of this segmentation where
>> an EMF only serves a single tenant. So is the plan
>> there to put the onus to keep track of that on the environment? You
>> seem to prefer the environment to manage the schema
>> generation (the JPA provider simply generates a generation script as
>> I read your proposal), so maybe this is not so bad.
>>
>
> I've also proposed that when the JPA provider "owns" the database
> (i.e., the database
> is a dedicated rather than a shared database) that the JPA provider
> creates the schemas
> in the database. The process as a whole, however, is driven from the
> platform provider side.
>
> I would really like to get feedback from you and the other experts in
> the group on the details
> here. The hard part is always in the details :-)
>
> thanks again,
>
> -Linda
>
>> As for know if dropping is necessary, heck put all the onus on the
>> environment ;)
>>
>>
>> On Tue 15 May 2012 12:57:15 PM CDT, Linda DeMichiel wrote:
>>> Folks,
>>>
>>> I'd like to get more feedback on this.
>>>
>>> thanks,
>>>
>>> -Linda
>>>
>>>
>>> On 5/9/2012 11:36 AM, Linda DeMichiel wrote:
>>>>
>>>> In conjunction with support for use of JPA in PaaS environments, we
>>>> will need to address the issue of schema generation. After we discuss
>>>> this area, I'd like to return to a broader discussion of multitenancy
>>>> support with this in view.
>>>>
>>>>
>>>> Here's what I think we'll need to address:
>>>>
>>>> 1. How/when schema generation happens:
>>>> a. at a deployment time phase (or predeployment time phase)
>>>> b. at EMF creation time
>>>>
>>>> 2. Providing an option whereby the application can supply DDL for
>>>> schema generation as an alternative to the persistence provider
>>>> doing schema generation based on the ORM metadata
>>>>
>>>> 3. How, once schema generation has happened, tables can get loaded
>>>> with
>>>> data. If we enable the application to provide DDL scripts for
>>>> schema generation, then supporting the use of SQL bulk insert scripts
>>>> would go hand-in-hand with this, but perhaps there are also other
>>>> approaches, e.g., bulk insert via JPQL(?).
>>>>
>>>> 4. The need for additional ORM metadata: e.g. metadata for index
>>>> generation; for foreign key constraints; other(?).
>>>>
>>>> 5. More precise definition of existing metadata that can be used
>>>> for schema generation: what it comprises, and how it can/must
>>>> be used.
>>>>
>>>> 6. Additional persistence.xml level metadata and additional metadata
>>>> the platform provider passes to the persistence provider to control
>>>> or configure the operations outlined above.
>>>>
>>>>
>>>> I am assuming for that Java EE 7 we need to support both the
>>>> separate database and shared database (aka separate schema)
>>>> approaches.
>>>>
>>>> I am also assuming that it is the platform provider that does the
>>>> provisioning of either the database or the schema respectively (i.e.,
>>>> in accordance with whether the separate database or shared database
>>>> approach is taken). I.e., in the separate database approach, I'm
>>>> assuming that it is the persistence provider that creates the schemas
>>>> directly in the database or creates the DDL to generate the schemas
>>>> needed for the persistence unit. [I'll use the term "schema
>>>> generation" generically below to mean either table generation or
>>>> schema+table generation with this assumption in view.]
>>>>
>>>>
>>>> Here's a proposal to get the discussion started.
>>>>
>>>> SCHEMA GENERATION
>>>>
>>>> I think we should allow schema generation to be done either as a
>>>> deployment (or pre-deployment) phase or at createEMF time.
>>>>
>>>> Deployment-time (or predeployment-time) schema generation could be
>>>> done either by the platform provider calling into the
>>>> PersistenceProvider implementation to generate the database schema or
>>>> by the platform provider running SQL DDL scripts.
>>>>
>>>> SQL DDL scripts might be provided with the application (bundled as
>>>> part of the persistence unit) or the platform provider might invoke
>>>> the PersistenceProvider implementation to generate them from the
>>>> persistence unit metadata. If scripts are provided or generated, it
>>>> should be the platform provider's responsibility to run them against
>>>> the database. If the application has provided DDL scripts, the
>>>> application should also have the option to provide SQL scripts to do
>>>> a bulk insert of data. It would be the responsibility of the
>>>> platform provider to run these as well.
>>>>
>>>> An alternative to the use of scripts is for the platform provider to
>>>> call into the PersistenceProvider to directly generate the tables in
>>>> the database. In the case that a database has been provisioned rather
>>>> than only a schema, the persistence provider would generate the
>>>> schemas and tables.
>>>>
>>>> For the sake of argument, let's assume that we add a generateSchema()
>>>> method to the PersistenceProvider interface, which the platform
>>>> provider invokes during the deployment phase if SQL scripts have not
>>>> been provided by the application.
>>>>
>>>> Information that needs to be passed in would include the following:
>>>>
>>>> (1) Information that specifies how schema generation is to proceed.
>>>> This information could take the form of a SchemaGenerationInfo
>>>> interface or it could be in the form of additional standard
>>>> properties.
>>>> (2) Connection to the database or schema. [Is this a Connection or a
>>>> DataSource ?]
>>>> (3) PersistenceUnitInfo
>>>>
>>>>
>>>> The SchemaGenerationInfo should include the following:
>>>>
>>>> ddl-generation-mode: GENERATE_AND_EXECUTE | GENERATE_ONLY |
>>>> EXECUTE_ONLY
>>>>
>>>> GENERATE_AND_EXECUTE => generate ddl scripts and generate schema (if
>>>> applicable) and tables directly in the database
>>>>
>>>> GENERATE_ONLY => generate ddl scripts
>>>>
>>>> EXECUTE_ONLY => generate schema (if applicable) and tables directly in
>>>> the database
>>>>
>>>> create-ddl-target: a Writer configured for the persistence provider
>>>> for outputting of the DDL scripts for the creation of schema/tables.
>>>> This should be null if EXECUTE_ONLY ddl-generation-mode is specified.
>>>>
>>>> drop-ddl-target: a Writer configured for the persistence provider
>>>> for outputting of the DDL scripts for the dropping of schema/tables.
>>>> The should be null if EXECUTE_ONLY ddl-generation-mode is specified.
>>>>
>>>> ddl-execution-mode: Used when executing directly against the database,
>>>> rather than generating scripts. Options should include:
>>>> CREATE_TABLES | DROP_TABLES | DROP_AND_CREATE_TABLES
>>>> Do we also need to distinguish CREATE_SCHEMAS_AND_TABLES ?
>>>> DROP_SCHEMA ?
>>>> Or can these be implicit?
>>>> Do we also need an ALTER_TABLES capability here?
>>>>
>>>> properties: These could include any vendor-specific properties
>>>>
>>>>
>>>> ADDITIONAL METADATA (persistence.xml)
>>>>
>>>> If scripts are provided with the application (i.e., bundled as part
>>>> of the persistence unit), we should add entries in the persistence.xml
>>>> to identify them:
>>>> create-ddl-script
>>>> drop-ddl-script
>>>> data-load-script(s) [Do we need to support more than one load script?
>>>> If so, the ordering may need to be specified as well.]
>>>>
>>>> Open issue: do we need metadata in the persistence.xml for
>>>> ddl-execution-mode as well? For example, this might be used in
>>>> standalone scenarios (?)
>>>>
>>>>
>>>> ORM METADATA (annotations and orm.xml)
>>>>
>>>> (1) Indexes
>>>>
>>>> I think we need to support indexes.
>>>>
>>>> Here's a strawman annotation:
>>>>
>>>> @Target({}) @Retention(RUNTIME)
>>>> public @interface Index {
>>>> String name() default "";
>>>> String columnList();
>>>> }
>>>>
>>>> The columnList syntax could follow that of the OrderBy annotation:
>>>>
>>>> columnList::= indexColumn [, indexColumn]*
>>>> indexColumn::= columnName [ASC | DESC]
>>>>
>>>> If ASC or DESC is not specified, ASC is assumed
>>>>
>>>> We'd also need to add
>>>> Index[] indexes() default {};
>>>> to Table, Secondary Table, CollectionTable, JoinTable, and
>>>> TableGenerator
>>>>
>>>>
>>>> (2) Foreign Key Constraints
>>>>
>>>> I see two possible approaches:
>>>>
>>>> (a) Add a string-valued foreignKeyDefinition element to JoinColumn,
>>>> JoinColumns, MapKeyJoinColumn, etc. to specify a SQL fragment
>>>> (intended along the lines of columnDefinition) for defining a foreign
>>>> key constraint and/or for overriding the persistence provider's
>>>> default foreign key definition. It might also be helpful to allow a
>>>> name for the foreign key constraint to be specified for the case where
>>>> the provider is using its default foreign key generation strategy.
>>>>
>>>> or
>>>>
>>>> (b) Add a ForeignKey annotation to specify the foreign key
>>>> constraint --
>>>> for example,
>>>>
>>>> public @interface ForeignKey {
>>>> String name() default "";
>>>> ForeignKeyAction updateAction() default NO_ACTION;
>>>> ForeignKeyAction deleteAction() default NO_ACTION;
>>>> boolean deferred() default false;
>>>> }
>>>>
>>>> public enum ForeignKeyAction {NO_ACTION, CASCADE, SET_NULL};
>>>>
>>>> I have some misgivings about approach (b), and, given a lack of
>>>> database portability here, we might wind up needing a
>>>> foreignKeyDefinition
>>>> fragment anyway.
>>>>
>>>>
>>>> (3) Other
>>>>
>>>> What about the ability to distinguish a CHAR rather than VARCHAR
>>>> mapping for strings? Or should we just leave this as belonging in a
>>>> SQL fragment (with the understanding that we would be intentionally
>>>> discouraging the use of CHAR strings)?
>>>>
>>>> (4) Anything else I'm missing?
>>>>
>>>>
>>>> APIs
>>>>
>>>> I assumed the addition of a PersistenceProvider.generateSchema()
>>>> method
>>>> above.
>>>>
>>>> If we also support schema generation at EMF creation time, do we want
>>>> another createContainerEntityManagerFactory() method that takes a
>>>> SchemaGenerationInfo as well as PersistenceUnitInfo argument ?
>>>>
>>>> If generation is done directly in the database, at some point the
>>>> container (or application) may need to have the schema/tables dropped.
>>>> I'm not sure of the best way to support this. A dropAndClose() method
>>>> on the EMF??
>>>>
>>>>
>>>>