users@jpa-spec.java.net

[jpa-spec users] Re: Full text search

From: Mark Struberg <struberg_at_yahoo.de>
Date: Wed, 30 May 2012 09:31:54 +0100 (BST)

> b.) single- and multi-column unique ACID standard SQL search indexes
Whoops, the "unique" in b.) was a mistake. It should read:
b.) single- and multi-column ACID standard SQL search indexes

txs!



----- Original Message -----
> From: Mark Struberg <struberg_at_yahoo.de>
> To: "users_at_jpa-spec.java.net" <users_at_jpa-spec.java.net>
> Cc:
> Sent: Wednesday, May 30, 2012 10:28 AM
> Subject: [jpa-spec users] Re: Full text search
>
> Hi!
>
> My original @Index proposal was mainly targeted at
>
> a.) single- and multi-column unique indexes
>
> b.) single- and multi-column unique ACID standard SQL search indexes
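Categories a.) and b.) are the easy part, because the metadata maps directly onto the widely supported (though not ISO-standardized) CREATE INDEX DDL. Below is a minimal sketch of how schema generation could turn such declarations into statements. It assumes a hypothetical repeatable `@IndexDef` annotation and an `IndexDdl` helper. These are illustrative names only; they are neither the syntax proposed in JPA_SPEC-22 nor any released JPA API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Repeatable;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Hypothetical index annotation, for illustration only (not a JPA API).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@Repeatable(IndexDefs.class)
@interface IndexDef {
    String name();
    String[] columns();              // one column = single-column index, several = multi-column
    boolean unique() default false;
}

// Container annotation required by @Repeatable.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface IndexDefs {
    IndexDef[] value();
}

@IndexDef(name = "ux_customer_email", columns = "email", unique = true)
@IndexDef(name = "ix_customer_name", columns = {"last_name", "first_name"})
class Customer { }

class IndexDdl {
    // Builds one CREATE [UNIQUE] INDEX statement per @IndexDef found on the entity class.
    static List<String> createStatements(Class<?> entity, String table) {
        List<String> ddl = new ArrayList<>();
        for (IndexDef idx : entity.getAnnotationsByType(IndexDef.class)) {
            ddl.add("CREATE " + (idx.unique() ? "UNIQUE " : "") + "INDEX " + idx.name()
                    + " ON " + table + " (" + String.join(", ", idx.columns()) + ")");
        }
        return ddl;
    }
}
```

Single- versus multi-column is simply the length of `columns`, and uniqueness is a flag, so both categories fit one annotation.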
>
>
> We might also think about supporting asynchronous and other non-standard
> indices, but that is really far more work. Most of the time it's not only about
> a special CREATE INDEX notation; using such indices often even requires a native
> query (e.g. 'CONTAINS' for an Oracle Text index). Lucene also has other
> peculiarities: it cannot update a search key in place, but must delete it and
> subsequently re-import it. Also, many of those special search indices have tons
> of additional (implementation-specific) attributes, like the sorting behaviour
> for special characters, stop-word lists, Soundex nearby values, etc.
>
> Next anomaly: geo and triangular indices, which are often used for
> location-based services. It might become hard to define a one-size-fits-all
> solution.
>
>
> Anything that is not well defined in a specification (e.g. SQL:2008,
> ISO/IEC 9075:2008) or in a widely adopted industry standard is really hard to
> grasp.
>
> LieGrue,
> strub
>
>> ________________________________
>> From: Christian Beikov <christian.beikov_at_gmail.com>
>> To: users_at_jpa-spec.java.net
>> Sent: Tuesday, May 29, 2012 5:46 PM
>> Subject: [jpa-spec users] Re: Full text search
>>
>>
>> The problem that index updates perform badly will always exist. I don't
>> really get your point: you can always make something perform badly, so where
>> is the difference between making it slow by defining DBMS-specific indices
>> that are updated after every table update, and declaring indices (maybe
>> full-text ones) in the entity model? (Of course, those indices would probably
>> only be used for schema generation and for query hints.) My point is that you
>> would be able to generate the DB-specific statements for creating indices if
>> needed. Configuring, for example, Lucene as the search engine provider might
>> disable index generation during schema generation, but the provider could use
>> the declared metadata to build its own indices.
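Christian's idea of one set of declared metadata serving two consumers could look roughly like this sketch. It assumes a hypothetical field-level `@FullTextField` annotation and a provider mode string (`"dbms"` or `"external"`). In `"dbms"` mode it emits DDL, using MySQL's `CREATE FULLTEXT INDEX` syntax as one example dialect. In `"external"` mode it emits no DDL and leaves indexing to an engine such as Lucene.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical field-level marker; not a JPA annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface FullTextField { }

class Article {
    @FullTextField String title;
    @FullTextField String body;
    String author;                   // not full-text indexed
}

class FullTextMetadata {
    // Names of the fields marked @FullTextField, sorted because reflection order is unspecified.
    static List<String> indexedFields(Class<?> entity) {
        List<String> names = new ArrayList<>();
        for (Field field : entity.getDeclaredFields()) {
            if (field.isAnnotationPresent(FullTextField.class)) {
                names.add(field.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    // "dbms": emit DDL (MySQL's CREATE FULLTEXT INDEX as one example dialect).
    // "external": emit no DDL; an external engine builds its own index from the same metadata.
    static List<String> schemaDdl(Class<?> entity, String table, String providerMode) {
        List<String> fields = indexedFields(entity);
        switch (providerMode) {
            case "external":
                return Collections.emptyList();
            case "dbms":
                if (fields.isEmpty()) {
                    return Collections.emptyList();
                }
                return Collections.singletonList("CREATE FULLTEXT INDEX ft_" + table + " ON "
                        + table + " (" + String.join(", ", fields) + ")");
            default:
                throw new IllegalArgumentException("unknown provider mode: " + providerMode);
        }
    }
}
```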
>> Maybe you are right that this would exceed the scope of JPA; anyway, IMO the
>> index annotations proposed by Mark Struberg should at least provide a way to
>> use "extensions".
>> Regards,
>> Christian
>> On 29.05.2012 16:43, "Christian Romberg" <cromberg_at_versant.com> wrote:
>>
>> Hi Christian,
>>>
>>> Batching per transaction only works well if the transaction rate is low and
>>> each transaction contains lots of changes.
>>>
>>> E.g. a batch-import scenario.
>>>
>>> It does not work for high-load, small change-set scenarios.
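The trade-off can be put into numbers with a simple cost model. It assumes each commit-time index flush costs a fixed overhead F plus c per change, so a transaction with n changes pays (F + c*n) / n = F/n + c per change. The class name `BatchCostModel` is hypothetical, and the figures in the usage note are invented for illustration, not measurements.

```java
// Back-of-the-envelope cost model; all figures passed in are assumptions, not measurements.
class BatchCostModel {
    // Cost of one commit-time index flush that applies `changes` changes: F + c*n.
    static double flushCostMs(double fixedMs, double perChangeMs, int changes) {
        return fixedMs + perChangeMs * changes;
    }

    // Index cost per changed entity when every transaction flushes once: F/n + c.
    static double costPerChangeMs(double fixedMs, double perChangeMs, int changesPerTx) {
        if (changesPerTx <= 0) {
            throw new IllegalArgumentException("a transaction needs at least one change");
        }
        return flushCostMs(fixedMs, perChangeMs, changesPerTx) / changesPerTx;
    }
}
```

Take a hypothetical F = 50 ms and c = 0.1 ms. A 10,000-change batch import then pays about 0.105 ms per change, while a one-change transaction pays about 50.1 ms. The fixed overhead dominates in exactly the high-load, small change-set case.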
>>>
>>> The point is, IMO, that not everything should be standardized and that any
>>> spec should have a clear scope.
>>>
>>> Any vendor is free to offer a full-text syntax extension for JPQL.
>>> Any vendor can offer pluggability for arbitrary search providers, or things
>>> like that.
>>>
>>> I don't see how this could be included in the spec in a way that keeps the
>>> spec sound.
>>>
>>> For that, it would need to work (almost) orthogonally with all other
>>> features, i.e. independently of the usage scenario.
>>>
>>> To me, all of this indicates that it exceeds the scope of what should be
>>> standardized.
>>>
>>> Regards,
>>>
>>> Christian
>>>
>>>
>>> On Tue, May 29, 2012 at 4:00 PM, Christian Beikov <christian.beikov_at_gmail.com> wrote:
>>>
>>>> Hibernate Search supports batching within transactions; generally, the
>>>> behavior described in its docs could be reused. What kind of limitations are
>>>> you thinking of? If the index for the full-text search is managed by the
>>>> DBMS, there shouldn't be any problems, should there?
>>>>
>>>> IMO, JPA should provide the annotations for full-text indexing and extend
>>>> the EntityManager, Query, etc. to offer a nice way to search for data. The
>>>> implementation of such a search provider should be pluggable: you could,
>>>> for example, configure your JPA provider (in a standardized way) to use the
>>>> DBMS, Lucene or any other full-text search engine as the provider.
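A pluggable provider, as proposed here, boils down to a small SPI plus a lookup by configured name. This is a sketch under stated assumptions. `SearchProvider`, `SearchProviders` and the property key `example.search.provider` are hypothetical; no such SPI exists in JPA. The naive scanning implementation merely stands in for a DBMS- or Lucene-backed one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical SPI; nothing like it exists in the JPA specification.
interface SearchProvider {
    void index(Object id, String text);
    void remove(Object id);
    List<Object> search(String term);
}

// Naive provider that scans all stored texts; stands in for a DBMS- or Lucene-backed one.
class ScanningSearchProvider implements SearchProvider {
    private final Map<Object, String> docs = new LinkedHashMap<>(); // keeps insertion order

    @Override
    public void index(Object id, String text) {
        docs.put(id, text.toLowerCase(Locale.ROOT));
    }

    @Override
    public void remove(Object id) {
        docs.remove(id);
    }

    @Override
    public List<Object> search(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<Object> hits = new ArrayList<>();
        for (Map.Entry<Object, String> entry : docs.entrySet()) {
            if (entry.getValue().contains(needle)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}

// Looks up the provider named in the configuration, in the spirit of a persistence-unit property.
class SearchProviders {
    private static final Map<String, Supplier<SearchProvider>> REGISTRY = new HashMap<>();

    static {
        REGISTRY.put("scan", ScanningSearchProvider::new);
    }

    static void register(String name, Supplier<SearchProvider> factory) {
        REGISTRY.put(name, factory);
    }

    static SearchProvider fromConfig(Map<String, String> properties) {
        String name = properties.getOrDefault("example.search.provider", "scan");
        Supplier<SearchProvider> factory = REGISTRY.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("no search provider registered as " + name);
        }
        return factory.get();
    }
}
```

A Lucene- or DBMS-backed implementation would be registered via `SearchProviders.register(...)` under its own name. The calling code depends only on the interface.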
>>>>
>>>>
>>>>
>>>> Kind regards,
>>>> ________________________________
>>>> Christian Beikov
>>>>
>>>>
>>>> On 29.05.2012 14:22, Christian Romberg wrote:
>>>> Hi Christian,
>>>>>
>>>>> No, it's not that simple.
>>>>>
>>>>> Updating full-text indexes takes an enormous amount of time compared to
>>>>> the time needed for everything else that happens in a database commit
>>>>> operation.
>>>>>
>>>>> Thus, for most users, transactional full-text index updates are hardly
>>>>> usable.
>>>>>
>>>>> Batching such updates would be the way to go; however, this imposes quite
>>>>> a few limitations.
>>>>>
>>>>> It's totally fine for any JPA vendor to provide special extensions for
>>>>> special use cases, or for use cases with serious restrictions.
>>>>>
>>>>> However, and this is just my personal opinion, the scope of a
>>>>> specification should not include such things; they don't make a
>>>>> specification sound, but brittle instead.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Christian
>>>>>
>>>>>
>>>>> On Tue, May 29, 2012 at 2:10 PM, Christian Beikov <christian.beikov_at_gmail.com> wrote:
>>>>>
>>>>> Hello!
>>>>>>
>>>>>> I have just read parts of the Hibernate Search documentation and found out
>>>>>> that the index is written transactionally if a transaction exists. In
>>>>>> other words, the implementation would have to participate in a JDBC or JTA
>>>>>> transaction.
>>>>>>
>>>>>> Here is a short excerpt from the documentation (see also
>>>>>> http://docs.jboss.org/hibernate/search/3.2/reference/en/html_single/#d0e488):
>>>>>>
>>>>>> To be more efficient, Hibernate Search batches the write
> interactions with the Lucene index. There is currently two types of batching
> depending on the expected scope. Outside a transaction, the index update
> operation is executed right after the actual database operation. This scope is
> really a no scoping setup and no batching is performed. However, it is
> recommended - for both your database and Hibernate Search - to execute your
> operation in a transaction be it JDBC or JTA. When in a transaction, the index
> update operation is scheduled for the transaction commit phase and discarded in
> case of transaction rollback. The batching scope is the transaction. There are
> two immediate benefits:
>>>>>>>     * Performance: Lucene indexing works better when
> operation are executed in batch.
>>>>>>>     * ACIDity: The work executed has the same scoping as
> the one executed by the database transaction and is executed if and only if the
> transaction is committed. This is not ACID in the strict sense of it, but ACID
> behavior is rarely useful for full text search indexes since they can be rebuilt
> from the source at any time.
>>>>>>> You can think of those two scopes (no scope vs
> transactional) as the equivalent of the (infamous) autocommit vs transactional
> behavior. From a performance perspective, the in transaction mode is
> recommended. The scoping choice is made transparently. Hibernate Search detects
> the presence of a transaction and adjust the scoping.
>>>>>> Adapting this would fulfill your requirement, wouldn't it?
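The two scopes described in the excerpt can be modelled as a small work queue. Outside a transaction, index work is applied immediately. Inside one, it is queued, applied as a batch at commit and discarded on rollback. This is a simplified model written for illustration. `ScopedIndexWork` is a hypothetical class, not Hibernate Search's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified model of the two scopes in the excerpt; not Hibernate Search's actual classes.
class ScopedIndexWork {
    private final List<String> index = new ArrayList<>(); // stands in for the Lucene index
    private List<String> pending;                          // non-null while a transaction is active

    void begin() {
        if (pending != null) {
            throw new IllegalStateException("transaction already active");
        }
        pending = new ArrayList<>();
    }

    // No transaction: the work runs right after the database operation. Transaction: it is queued.
    void enqueue(String work) {
        if (pending == null) {
            index.add(work);
        } else {
            pending.add(work);
        }
    }

    // Commit phase: all queued work is applied as one batch.
    void commit() {
        requireActive();
        index.addAll(pending);
        pending = null;
    }

    // Rollback: queued index work is discarded and never reaches the index.
    void rollback() {
        requireActive();
        pending = null;
    }

    List<String> appliedWork() {
        return Collections.unmodifiableList(index);
    }

    private void requireActive() {
        if (pending == null) {
            throw new IllegalStateException("no active transaction");
        }
    }
}
```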
>>>>>>
>>>>>>
>>>>>>
>>>>>> Kind regards,
>>>>>> ________________________________
>>>>>> Christian Beikov
>>>>>>
>>>>>>
>>>>>> On 29.05.2012 09:08, Christian Romberg wrote:
>>>>>> Hi Christian,
>>>>>>>
>>>>>>> Normal indexes in standard ACID databases (regardless of whether an RDBMS
>>>>>>> or an ODBMS like ours) are transactionally consistent.
>>>>>>>
>>>>>>> With full-text indexes this becomes a problem, and I think this is the
>>>>>>> point that would need to be discussed before discussing any integration
>>>>>>> into JPA.
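The consistency problem appears as soon as index maintenance is decoupled from the commit. Here is a toy model. It assumes a background indexer that applies queued changes some time after commit; the class `AsyncIndexLag` is hypothetical. A query issued right after commit sees the committed row but cannot find it through the full-text index yet.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Toy model: rows become visible at commit, but the full-text index is maintained later
// by a separate (asynchronous) indexer, so the two can disagree for a while.
class AsyncIndexLag {
    private final Map<Long, String> table = new HashMap<>();
    private final Map<Long, String> fullTextIndex = new HashMap<>();
    private final Deque<Long> indexQueue = new ArrayDeque<>();

    // Commit: the row is durable and visible at once; index maintenance is only queued.
    void commitInsert(long id, String text) {
        table.put(id, text);
        indexQueue.addLast(id);
    }

    // What a background indexer would do at some later point in time.
    void runIndexer() {
        while (!indexQueue.isEmpty()) {
            long id = indexQueue.removeFirst();
            fullTextIndex.put(id, table.get(id).toLowerCase(Locale.ROOT));
        }
    }

    boolean rowVisible(long id) {
        return table.containsKey(id);
    }

    boolean fullTextMatches(long id, String word) {
        String indexed = fullTextIndex.get(id);
        return indexed != null && indexed.contains(word.toLowerCase(Locale.ROOT));
    }
}
```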
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Christian
>>>>>>>
>>>>>>>
>>>>>>> On Sun, May 27, 2012 at 9:31 PM, Christian Beikov <christian.beikov_at_gmail.com> wrote:
>>>>>>>
>>>>>>>> Before adding another issue to JIRA, I wanted to discuss the following.
>>>>>>>> Mark Struberg has added the issue for indices:
>>>>>>>> http://java.net/jira/browse/JPA_SPEC-22
>>>>>>>> Building on this issue, I would like to see something like full-text
>>>>>>>> search support/integration in JPA.
>>>>>>>>
>>>>>>>> Via Hibernate Search, Hibernate offers the possibility to create
>>>>>>>> full-text queries. The EntityManager interface would probably have to be
>>>>>>>> extended if a similar approach were offered in JPA. I would really like to
>>>>>>>> see a standardized way of integrating full-text search into JPA, via
>>>>>>>> search providers or similar.
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Kind regards,
>>>>>>>> ________________________________
>>>>>>>> Christian Beikov
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Christian Romberg
>>>>>>> Chief Engineer| Versant GmbH
>>>>>>> (T) +49 40 60990-0
>>>>>>> (F) +49 40 60990-113
>>>>>>> (E) cromberg_at_versant.com
>>>>>>> www.versant.com| www.db4o.com
>>>>>>>
>>>>>>> --
>>>>>>> Versant GmbH is incorporated in Germany. Company registration number: HRB
>>>>>>> 54723, Amtsgericht Hamburg. Registered Office: Halenreie 42, 22359
>>>>>>> Hamburg, Germany. Managing Directors: Bernhard Wöbker, Volker John
>>>>>>>
>>>>>>> CONFIDENTIALITY NOTICE: This e-mail message, including any attachments, is
>>>>>>> for the sole use of the intended recipient(s) and may contain confidential
>>>>>>> or proprietary information. Any unauthorized review, use, disclosure or
>>>>>>> distribution is prohibited. If you are not the intended recipient,
>>>>>>> immediately contact the sender by reply e-mail and destroy all copies of
>>>>>>> the original message.
>>>>>>>
>>
>>             
>