users@jpa-spec.java.net

[jpa-spec users] Re: Full text search

From: Christian Romberg <cromberg_at_versant.com>
Date: Tue, 29 May 2012 16:42:56 +0200

Hi Christian,

Batching per transaction only works well if the transaction rate is low
and each transaction contains lots of changes, e.g. in a batch-import
scenario.

It does not work for high-load, small change-set scenarios.
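
To illustrate the difference, here is a minimal sketch in Java. It is not taken from Hibernate Search or any JPA provider; FakeIndex, TxIndexQueue and TxBatchingDemo are hypothetical names, and the assumption is simply that every index flush carries a fixed cost. Index work is queued per transaction, written in one batch on commit, and discarded on rollback.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a full-text index; it only counts flushes. */
class FakeIndex {
    int flushes = 0;                                  // number of batch writes performed
    final List<String> documents = new ArrayList<>();

    void writeBatch(List<String> docs) {
        flushes++;                                    // assumed fixed cost, paid once per batch
        documents.addAll(docs);
    }
}

/** Queues index work for one transaction and applies it only on commit. */
class TxIndexQueue {
    private final FakeIndex index;
    private final List<String> pending = new ArrayList<>();

    TxIndexQueue(FakeIndex index) {
        this.index = index;
    }

    void enqueue(String doc) {
        pending.add(doc);
    }

    void commit() {
        if (!pending.isEmpty()) {
            index.writeBatch(new ArrayList<>(pending));
        }
        pending.clear();
    }

    void rollback() {
        pending.clear();                              // queued index work is simply dropped
    }
}

class TxBatchingDemo {
    /** Runs the given number of transactions and returns the number of index flushes. */
    static int run(int transactions, int changesPerTx) {
        FakeIndex index = new FakeIndex();
        for (int t = 0; t < transactions; t++) {
            TxIndexQueue queue = new TxIndexQueue(index);
            for (int c = 0; c < changesPerTx; c++) {
                queue.enqueue("doc-" + t + "-" + c);
            }
            queue.commit();
        }
        return index.flushes;
    }

    public static void main(String[] args) {
        System.out.println("batch import: " + run(1, 1000) + " flush(es)");
        System.out.println("high load:    " + run(1000, 1) + " flush(es)");
    }
}
```

Both runs index 1000 documents, but the batch import pays the flush cost once, while the high-load case pays it 1000 times, so the transaction-scoped batch brings no benefit there.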

The point is that, IMO, not everything should be standardized, and any spec
should have a clear scope.

Any vendor is free to offer a JPQL syntax extension for full-text search.
Any vendor can also offer pluggable search providers or things like that.
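
As a rough idea of what such a vendor-specific extension point might look like, here is a sketch in Java. The SearchProvider interface and the InMemorySearchProvider class are purely hypothetical; nothing like them is part of the JPA spec or of any product I know of, and a real provider would delegate to the DBMS, Lucene, or another engine instead of scanning a map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical vendor-specific SPI; not part of the JPA spec. */
interface SearchProvider {
    /** Registers the searchable text of an entity under its identifier. */
    void index(Object id, String text);

    /** Returns the identifiers of all entities whose text contains the term. */
    List<Object> search(String term);
}

/** Naive in-memory provider, used here only to show the plug-in point. */
class InMemorySearchProvider implements SearchProvider {
    private final Map<Object, String> texts = new LinkedHashMap<>();

    @Override
    public void index(Object id, String text) {
        texts.put(id, text.toLowerCase(Locale.ROOT));
    }

    @Override
    public List<Object> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<Object> hits = new ArrayList<>();
        for (Map.Entry<Object, String> entry : texts.entrySet()) {
            if (entry.getValue().contains(needle)) {
                hits.add(entry.getKey());             // case-insensitive substring match
            }
        }
        return hits;
    }
}
```

Note that this interface says nothing about when index updates become visible relative to the database commit, which is exactly the part I consider hard to specify.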

I don't see how this could be included in the spec in a way that keeps it
sound.

For this it would need to work (almost) orthogonally with all other
features, i.e. scenario independent.

All of these are indicators to me that this exceeds the scope of what should
be standardized.

Regards,

Christian

On Tue, May 29, 2012 at 4:00 PM, Christian Beikov <
christian.beikov_at_gmail.com> wrote:

> Hibernate Search supports batching within transactions; in general, the
> behavior described in the docs could be reused. What kind of limitations
> are you thinking of? If the index for the full-text search is managed by
> the DBMS, there shouldn't be any problems, should there?
>
> IMO JPA should provide the annotations for full-text indexing and
> extend the EntityManager, Query, etc. to allow a nice way to search for
> data. The implementation of such a search provider should be pluggable: you
> could, for example, configure your JPA provider (in a standardized way) to use
> the DBMS, Lucene, or any other full-text search engine as the provider.
>
>
> Kind regards,
> ------------------------------
> *Christian Beikov*
>
> On 29.05.2012 14:22, Christian Romberg wrote:
>
> Hi Christian,
>
> No, it's not that simple.
>
> Updating full-text indexes takes an enormous amount
> of time compared to the time needed for all other things that happen in a
> database commit operation.
>
> Thus, for most users, transactional full-text index updates are hardly
> usable.
>
> Batching such updates would be the way to go; however, this imposes
> considerable limitations.
>
> It's totally fine for any JPA vendor to provide special extensions for
> special use cases, or use cases with serious restrictions.
>
> However, and this is just my personal opinion, the scope of a
> specification should not include such extensions; they make
> a specification brittle rather than sound.
>
> Regards,
>
> Christian
>
> On Tue, May 29, 2012 at 2:10 PM, Christian Beikov <
> christian.beikov_at_gmail.com> wrote:
>
>> Hello!
>>
>> I have just read a few lines of the Hibernate Search documentation and
>> found out that the index is written transactionally if a transaction exists.
>> In other words, the implementation would have to participate in a JDBC or
>> JTA transaction.
>>
>> Here is a short excerpt from the documentation (also see
>> http://docs.jboss.org/hibernate/search/3.2/reference/en/html_single/#d0e488
>> ):
>>
>> To be more efficient, Hibernate Search batches the write interactions
>> with the Lucene index. There is currently two types of batching depending
>> on the expected scope. Outside a transaction, the index update operation is
>> executed right after the actual database operation. This scope is really a
>> no scoping setup and no batching is performed. However, it is recommended -
>> for both your database and Hibernate Search - to execute your operation in
>> a transaction be it JDBC or JTA. When in a transaction, the index update
>> operation is scheduled for the transaction commit phase and discarded in
>> case of transaction rollback. The batching scope is the transaction. There
>> are two immediate benefits:
>>
>> - Performance: Lucene indexing works better when operation are executed
>>   in batch.
>> - ACIDity: The work executed has the same scoping as the one executed
>>   by the database transaction and is executed if and only if the transaction
>>   is committed. This is not ACID in the strict sense of it, but ACID behavior
>>   is rarely useful for full text search indexes since they can be rebuilt
>>   from the source at any time.
>>
>> You can think of those two scopes (no scope vs transactional) as the
>> equivalent of the (infamous) autocommit vs transactional behavior. From a
>> performance perspective, the *in transaction* mode is recommended. The
>> scoping choice is made transparently. Hibernate Search detects the presence
>> of a transaction and adjust the scoping.
>>
>> Adapting this would fulfill your requirement, wouldn't it?
>>
>>
>> Kind regards,
>> ------------------------------
>> *Christian Beikov*
>>
>> On 29.05.2012 09:08, Christian Romberg wrote:
>>
>> Hi Christian,
>>
>> Normal indexes in standard ACID databases (regardless of whether it is an
>> RDBMS or an ODBMS like ours) are transactionally consistent.
>>
>> With full-text indexes this becomes a problem, and I think this is the
>> point that would need to be discussed before discussing any
>> integration in JPA.
>>
>> Regards,
>>
>> Christian
>>
>> On Sun, May 27, 2012 at 9:31 PM, Christian Beikov <
>> christian.beikov_at_gmail.com> wrote:
>>
>>> Before adding another issue to JIRA, I wanted to discuss the following.
>>> Mark Struberg has added the issue for indices,
>>> http://java.net/jira/browse/JPA_SPEC-22
>>> Building on that issue, I would like to see something like full-text
>>> search support/integration in JPA.
>>>
>>> Hibernate makes it possible, via Hibernate Search, to create full-text
>>> queries. The EntityManager interface would probably have to be extended if
>>> a similar approach were offered in JPA. I would really like to see a
>>> standardized way of integrating full-text search in JPA via search
>>> providers or similar.
>>>
>>> What do you think?
>>> --
>>> Kind regards,
>>> ------------------------------
>>> *Christian Beikov*
>>>
>>
>>
>>
>> --
>> Christian Romberg
>> Chief Engineer | Versant GmbH
>> (T) +49 40 60990-0
>> (F) +49 40 60990-113
>> (E) cromberg_at_versant.com
>> www.versant.com | www.db4o.com
>>
>> --
>> Versant GmbH is incorporated in Germany. Company registration number: HRB
>> 54723, Amtsgericht Hamburg. Registered Office: Halenreie 42, 22359
>> Hamburg, Germany. Geschäftsführer: Bernhard Wöbker, Volker John
>>
>> CONFIDENTIALITY NOTICE: This e-mail message, including any attachments,
>> is for the sole use of the intended recipient(s) and may contain
>> confidential or proprietary information. Any unauthorized review, use,
>> disclosure or distribution is prohibited. If you are not the intended
>> recipient, immediately contact the sender by reply e-mail and destroy all
>> copies of the original message.
>>
>>
>>
>>
>
>

