users@glassfish.java.net

Re: Can JPA cope with a dynamic schema?

From: <glassfish_at_javadesktop.org>
Date: Fri, 01 Jun 2007 13:44:59 PDT

The way I see it I have two choices.
I can choose the static schema option and use JPA.
or I can choose a dynamic schema option and use JDBC.

If I choose the dynamic option I'll have schema something like this..

[b]IssueTable[/b]
IssueId
IssueTitle
IssueTypeId

[b]IssueTypeFeatureTable[/b]
IssueId
Description
Sponser
Accepted
....[ more columns representing custom fields ]

[b]IssueTypeBugTable[/b]
IssueId
StepsToReporoduce
Severity
....[ more columns representing custom fields ]

So to get at issue 100 I would do..
select * from IssueTable where issueId = 100
to get the issue type and so its table then
select * from IssueTypeBugTable where issueId = 100
then I would manually stitch the result set together into my issue object (yuk!!, thats why I want to use JPA)

For a static schema it would become
[b]IssueTable[/b]
IssueId
IssueTitle
IssueType

[b]IssueCustomFieldTable[/b]
IssueId
CustomFieldId

[b]CustomFieldTypeStringTable[/b]
IssueId
CustomFieldId
StringValue

[b]CustomFieldTypeDateTable[/b]
IssueId
CustomFieldId
DateValue

....[ more tables for each supported custom field type ]

each table would be have its own entity class in JPA, so I imagine em.find(Issue.class, 100) would cause (approx) the following sql queries

select * from IssueTable where IssueId = 100
select * from IssueCustomFieldTable where IssueId = 100 // to get the issues custom fields, and so the identities of the custom field data type tables
so, for each custom field thats a string type
  select * from CustomFieldTypeString where IssueId = 100 and CustomFieldId = ?
and or each custom field thats a date type
  select * from CustomFieldTypeDate where IssueId = 100 and CustomFieldId = ?

What gives cause for pause about the static schema option is how slow all those queries might be compared to the dynamic option. Plus the fact that if I have a million issues, and each issue has 10 custom fields in use, then thats 10 million rows. Thats a lot. Thats 10 times more than I need from the dynamic solution.

What gives cause for pause about the dynamic option is that the consensus of opnion is that a rdbms schema is static, dont mess with it, thats not what its designed for.
The two best aguements for not having dynamic columns is that
a) its wastes data when you add a new column to 1 million rows. and
b) if you alter a table then nobody can access that table for the duration of the change.
The thing is though, I always thought that adding a column full of nulls had little real resource overhead. I figured any database only used resources to store actual data. Am I being naive, please correct me if I'm wrong on this.
The other point about locking others from the table while its changing is something I think I'm comfortable about. I'm in control of the sole application accessing the database. The tables being altered will only take place very infrequently, at the start of the issue types life (when there will be no or few actual issues in its table) and very infrequently later on, and by an administrator user.

Thank you again for every bodies opinion. I'm still in two minds about the solution. because JPA makes it so easy to write the database layer, its probably not going to be two hard to try the static schema option out and see how it performs. I wonder if my hardware is up to one million issue database.

Cheers,
Phil Wilkinson.
[Message sent by forum member 'wilkinp' (wilkinp)]

http://forums.java.net/jive/thread.jspa?messageID=220134