users@jpa-spec.java.net

[jpa-spec users] Re: Full text search

From: Christian Beikov <christian.beikov_at_gmail.com>
Date: Tue, 29 May 2012 17:46:44 +0200

Index updates will always carry some performance cost. I don't really get
your point: you can always make something perform badly. Where is the
difference between defining DBMS-specific indices that are updated whenever
a table is updated, and declaring indices (maybe full-text ones) in the
entity model? Of course, those indices would probably only be used for
schema generation and query hints. My point is that you would be able to
generate the DB-specific statements for creating the indices if needed.
Configuring, for example, Lucene as the search engine provider might
disable index generation during schema generation, while the provider could
still use the declared metadata to build its own indices.
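
A minimal sketch of that idea, under stated assumptions: the `@FullTextIndex` annotation, the `SearchBackend` enum and the `generateDdl` helper are hypothetical names invented for illustration and are not part of JPA or any provider. The same declared metadata yields DB-specific DDL for a DBMS-managed index, and no DDL at all when an external engine such as Lucene is configured.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

final class IndexDdlSketch {

    /** Hypothetical annotation; nothing like it is defined by the JPA spec. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface FullTextIndex {
        String name();
    }

    /** Where the full-text index lives. EXTERNAL stands for an engine such as Lucene. */
    enum SearchBackend { POSTGRESQL, MYSQL, EXTERNAL }

    /** Example entity-like class carrying the index metadata. */
    static class Article {
        long id;

        @FullTextIndex(name = "idx_article_body")
        String body;
    }

    /** Turns the declared metadata into backend-specific DDL statements. */
    static List<String> generateDdl(Class<?> entity, String table, SearchBackend backend) {
        List<String> statements = new ArrayList<>();
        for (Field field : entity.getDeclaredFields()) {
            FullTextIndex index = field.getAnnotation(FullTextIndex.class);
            if (index == null) {
                continue; // field carries no full-text metadata
            }
            switch (backend) {
                case POSTGRESQL:
                    statements.add("CREATE INDEX " + index.name() + " ON " + table
                            + " USING GIN (to_tsvector('english', " + field.getName() + "))");
                    break;
                case MYSQL:
                    statements.add("CREATE FULLTEXT INDEX " + index.name() + " ON " + table
                            + " (" + field.getName() + ")");
                    break;
                case EXTERNAL:
                    // An external engine would build its own index from the
                    // same metadata, so schema generation emits nothing here.
                    break;
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        for (SearchBackend backend : SearchBackend.values()) {
            System.out.println(backend + ": " + generateDdl(Article.class, "article", backend));
        }
    }
}
```

The table name is passed in explicitly to keep the sketch short; a real provider would presumably derive it from the entity mapping.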

Maybe you are right that this would exceed the purpose of JPA. Even so, the
index annotations proposed by Mark Struberg should IMO provide at least a
way to use "extensions".

Regards,

Christian
On 29.05.2012 at 16:43, "Christian Romberg" <cromberg_at_versant.com> wrote:

> Hi Christian,
>
> Batching per transaction works nicely only if there is a low transaction
> rate and each transaction contains lots of changes.
>
> E.g. a batch-import scenario.
>
> It does not work for high-load, small change-set scenarios.
>
> The point is, IMO not everything should be standardized and any spec
> should have a clear scope.
>
> Any vendor is free to offer a JPQL syntax extension for full-text search.
> Any vendor can offer pluggability for search providers or things like
> that.
>
> I don't see how this could be included in the spec in a way that keeps
> it sound.
>
> For this it would need to work (almost) orthogonally with all other
> features, i.e. scenario independent.
>
> All of these are indicators to me that this exceeds the scope of what
> should be standardized.
>
> Regards,
>
> Christian
>
> On Tue, May 29, 2012 at 4:00 PM, Christian Beikov <
> christian.beikov_at_gmail.com> wrote:
>
>> Hibernate Search supports batching within transactions; in general, the
>> behavior described in its docs could be reused. What kind of limitations
>> are you thinking of? If the index for the full-text search is managed by
>> the DBMS, there shouldn't be any problems, should there?
>>
>> IMO JPA should provide the annotations for full-text indexing and
>> extend the EntityManager, Query, etc. to allow a nice way to search for
>> data. The implementation of such a search provider should be pluggable: you
>> could, for example, configure your JPA provider (in a standardized way) to
>> use the DBMS, Lucene, or any other full-text search engine as the provider.
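
To make the pluggability idea concrete, here is a rough sketch of what a provider contract could look like. Everything in it is an assumption made for illustration: the `SearchProvider` interface, the `InMemorySearchProvider` class and the `forName` lookup do not exist in JPA or in any provider, and the toy implementation uses plain substring matching instead of a real full-text engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SearchProviderSketch {

    /** Hypothetical SPI that a DBMS-backed or Lucene-backed provider could implement. */
    interface SearchProvider {
        void index(String id, String text);

        List<String> search(String term);
    }

    /** Toy provider: case-insensitive substring matching over stored text. */
    static final class InMemorySearchProvider implements SearchProvider {
        private final List<String[]> entries = new ArrayList<>();

        @Override
        public void index(String id, String text) {
            entries.add(new String[] {id, text.toLowerCase(Locale.ROOT)});
        }

        @Override
        public List<String> search(String term) {
            String needle = term.toLowerCase(Locale.ROOT);
            List<String> ids = new ArrayList<>();
            for (String[] entry : entries) {
                if (entry[1].contains(needle)) {
                    ids.add(entry[0]);
                }
            }
            return ids;
        }
    }

    /** Selects a provider by name, the way a configuration property might. */
    static SearchProvider forName(String name) {
        if ("memory".equals(name)) {
            return new InMemorySearchProvider();
        }
        throw new IllegalArgumentException("Unknown search provider: " + name);
    }

    public static void main(String[] args) {
        SearchProvider provider = forName("memory");
        provider.index("1", "Full text search in JPA");
        provider.index("2", "Schema generation");
        System.out.println(provider.search("search"));
    }
}
```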
>>
>>
>> Kind regards,
>> ------------------------------
>> *Christian Beikov*
>>
>> On 29.05.2012 at 14:22, Christian Romberg wrote:
>>
>> Hi Christian,
>>
>> No, it's not that simple.
>>
>> Updating full-text indexes takes an enormous amount
>> of time compared to the time needed for all other things that happen in a
>> database commit operation.
>>
>> Thus, for most users it's hardly usable to do transactional full text
>> index updates.
>>
>> Batching such updates would be the way to go; however, this imposes quite
>> a few limitations.
>>
>> It's totally fine for any JPA vendor to provide special extensions for
>> special use cases, or use cases with serious restrictions.
>>
>> However, and this is just my personal opinion, the scope of any
>> specification should not include such extensions; they make a
>> specification brittle rather than sound.
>>
>> Regards,
>>
>> Christian
>>
>> On Tue, May 29, 2012 at 2:10 PM, Christian Beikov <
>> christian.beikov_at_gmail.com> wrote:
>>
>>> Hello!
>>>
>>> I have just read part of the Hibernate Search documentation and
>>> found out that the index is written transactionally if a transaction exists.
>>> In other words, the implementation would have to participate in a JDBC or
>>> JTA transaction.
>>>
>>> Here is a short excerpt from the documentation (see also
>>> http://docs.jboss.org/hibernate/search/3.2/reference/en/html_single/#d0e488
>>> ):
>>>
>>> To be more efficient, Hibernate Search batches the write interactions
>>> with the Lucene index. There is currently two types of batching depending
>>> on the expected scope. Outside a transaction, the index update operation is
>>> executed right after the actual database operation. This scope is really a
>>> no scoping setup and no batching is performed. However, it is recommended -
>>> for both your database and Hibernate Search - to execute your operation in
>>> a transaction be it JDBC or JTA. When in a transaction, the index update
>>> operation is scheduled for the transaction commit phase and discarded in
>>> case of transaction rollback. The batching scope is the transaction. There
>>> are two immediate benefits:
>>>
>>> - Performance: Lucene indexing works better when operation are
>>>   executed in batch.
>>>
>>> - ACIDity: The work executed has the same scoping as the one executed
>>>   by the database transaction and is executed if and only if the transaction
>>>   is committed. This is not ACID in the strict sense of it, but ACID behavior
>>>   is rarely useful for full text search indexes since they can be rebuilt
>>>   from the source at any time.
>>>
>>> You can think of those two scopes (no scope vs transactional) as the
>>> equivalent of the (infamous) autocommit vs transactional behavior. From a
>>> performance perspective, the *in transaction* mode is recommended. The
>>> scoping choice is made transparently. Hibernate Search detects the presence
>>> of a transaction and adjust the scoping.
>>>
>>> Adapting this would fulfill your requirement, wouldn't it?
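
The transaction-scoped batching described in the excerpt can be sketched roughly as follows. This is not Hibernate Search code: `IndexWorkQueue` and `InMemoryIndex` are hypothetical names, and the in-memory list merely stands in for a Lucene index. Work added inside a transaction is queued, written as one batch on commit, and discarded on rollback; outside a transaction each change is written immediately.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class TransactionalIndexQueueSketch {

    /** Stand-in for a real full-text index (hypothetical). */
    static final class InMemoryIndex {
        final List<String> documents = new ArrayList<>();
        int batchWrites = 0;

        void writeBatch(List<String> docs) {
            documents.addAll(docs);
            batchWrites++; // one index write per batch, however many documents
        }
    }

    /** Queues index work per transaction and applies it at commit. */
    static final class IndexWorkQueue {
        private final InMemoryIndex index;
        private final List<String> pending = new ArrayList<>();
        private boolean inTransaction;

        IndexWorkQueue(InMemoryIndex index) {
            this.index = index;
        }

        void begin() {
            inTransaction = true;
        }

        void add(String document) {
            if (inTransaction) {
                pending.add(document); // deferred until commit
            } else {
                // "No scope" mode: write right away, no batching.
                index.writeBatch(Collections.singletonList(document));
            }
        }

        void commit() {
            if (!pending.isEmpty()) {
                index.writeBatch(new ArrayList<>(pending));
            }
            pending.clear();
            inTransaction = false;
        }

        void rollback() {
            pending.clear(); // queued index work is discarded
            inTransaction = false;
        }
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        IndexWorkQueue queue = new IndexWorkQueue(index);
        queue.begin();
        queue.add("first");
        queue.add("second");
        queue.commit();
        System.out.println(index.documents + " written in " + index.batchWrites + " batch(es)");
    }
}
```

In a JTA environment, the commit and rollback hooks would presumably be driven by a registered `javax.transaction.Synchronization` rather than called by hand; that wiring is left out here.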
>>>
>>>
>>> Kind regards,
>>> ------------------------------
>>> *Christian Beikov*
>>>
>>> On 29.05.2012 at 09:08, Christian Romberg wrote:
>>>
>>> Hi Christian,
>>>
>>> Normal indexes in standard ACID databases (regardless of whether it is
>>> an RDBMS or an ODBMS like ours) are transactionally consistent.
>>>
>>> With full-text indexes this becomes a problem, and I think this is the
>>> point that would need to be discussed before discussing any
>>> integration in JPA.
>>>
>>> Regards,
>>>
>>> Christian
>>>
>>> On Sun, May 27, 2012 at 9:31 PM, Christian Beikov <
>>> christian.beikov_at_gmail.com> wrote:
>>>
>>>> Before adding another issue to JIRA, I wanted to discuss the following.
>>>> Mark Struberg has added an issue for indices:
>>>> http://java.net/jira/browse/JPA_SPEC-22
>>>> Building on that issue, I would like to see something like full-text
>>>> search support/integration in JPA.
>>>>
>>>> Hibernate makes it possible, via Hibernate Search, to create full-text
>>>> queries. The EntityManager interface would probably have to be
>>>> extended if a similar approach were offered in JPA. I would really like
>>>> to see a standardized way of integrating full-text search in JPA via
>>>> search providers or the like.
>>>>
>>>> What do you think?
>>>> --
>>>> Kind regards,
>>>> ------------------------------
>>>> *Christian Beikov*
>>>>
>>>
>>>
>>>
>>> --
>>> Christian Romberg
>>> Chief Engineer | Versant GmbH
>>> (T) +49 40 60990-0
>>> (F) +49 40 60990-113
>>> (E) cromberg_at_versant.com
>>> www.versant.com | www.db4o.com
>>>
>>> --
>>> Versant GmbH is incorporated in Germany. Company registration number:
>>> HRB 54723, Amtsgericht Hamburg. Registered Office: Halenreie 42, 22359
>>> Hamburg, Germany. Geschäftsführer: Bernhard Wöbker, Volker John
>>>
>>> CONFIDENTIALITY NOTICE: This e-mail message, including any attachments,
>>> is for the sole use of the intended recipient(s) and may contain
>>> confidential or proprietary information. Any unauthorized review, use,
>>> disclosure or distribution is prohibited. If you are not the intended
>>> recipient, immediately contact the sender by reply e-mail and destroy all
>>> copies of the original message.
>>>
>>>
>>>
>>>