[jpa-spec users] Re: Full text search

From: Mark Struberg <struberg_at_yahoo.de>
Date: Wed, 30 May 2012 09:28:57 +0100 (BST)

Hi!

My original @Index proposal was mainly targeted at

a.) single- and multi-column unique indexes

b.) single- and multi-column unique ACID standard SQL search indexes
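
As a rough illustration of a.) and b.), an index declaration on an entity and the DDL derived from it could look like the sketch below. The annotation names SketchIndex and SketchIndexes, the Customer entity, the index names and the DDL helper are all made up for this example; they are assumptions, not the API from the proposal, and the generated statement is plain generic SQL.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Hypothetical annotations for illustration only; not the proposed JPA API.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SketchIndex {
    String name();
    String[] columns();
    boolean unique() default false;
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SketchIndexes {
    SketchIndex[] value();
}

// Hypothetical entity: one single-column unique index, one multi-column search index.
@SketchIndexes({
    @SketchIndex(name = "UX_CUSTOMER_EMAIL", columns = {"EMAIL"}, unique = true),
    @SketchIndex(name = "IX_CUSTOMER_NAME", columns = {"LAST_NAME", "FIRST_NAME"})
})
class Customer {
    long id;
    String email;
    String firstName;
    String lastName;
}

class IndexDdlSketch {
    // Builds one CREATE INDEX statement per declared index, in declaration order.
    static List<String> ddlFor(Class<?> entity, String table) {
        List<String> ddl = new ArrayList<>();
        SketchIndexes indexes = entity.getAnnotation(SketchIndexes.class);
        if (indexes == null) {
            return ddl;
        }
        for (SketchIndex ix : indexes.value()) {
            ddl.add("CREATE " + (ix.unique() ? "UNIQUE " : "") + "INDEX " + ix.name()
                    + " ON " + table + " (" + String.join(", ", ix.columns()) + ")");
        }
        return ddl;
    }

    public static void main(String[] args) {
        ddlFor(Customer.class, "CUSTOMER").forEach(System.out::println);
    }
}
```

Because these indexes map directly onto standard CREATE INDEX statements, the metadata needs nothing beyond a name, a column list and a uniqueness flag.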

We might also think about supporting asynchronous and other non-standard indices, but that is far more work. Most of the time it is not only about a special create-index notation; using such an index often also requires a nativeQuery (e.g. 'CONTAINS' for the Oracle Text search index). Lucene has its own specialties: it cannot update a search key in place, but must delete it and subsequently re-import it. Many of those special search indices also have tons of additional (implementation-specific) attributes, such as special character sorting behaviour, stop word lists, Soundex-style nearby values, etc.
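
To make the delete-and-re-import point concrete, here is a toy in-memory inverted index in which an "update" can only be expressed as a delete followed by an add. This is a sketch for illustration; the class ToyInvertedIndex and its methods are invented here and are not Lucene's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index, for illustration only; this is not Lucene's API.
class ToyInvertedIndex {
    private final Map<String, Set<Long>> postings = new HashMap<>(); // term -> document ids
    private final Map<Long, String> stored = new HashMap<>();        // document id -> indexed text

    void add(long id, String text) {
        stored.put(id, text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
    }

    void delete(long id) {
        String old = stored.remove(id);
        if (old == null) {
            return;
        }
        for (String term : tokenize(old)) {
            Set<Long> ids = postings.get(term);
            if (ids != null) {
                ids.remove(id);
                if (ids.isEmpty()) {
                    postings.remove(term);
                }
            }
        }
    }

    // No in-place update: the old entry is removed, then the new text is indexed.
    void update(long id, String newText) {
        delete(id);
        add(id, newText);
    }

    // Returns a copy of the ids of all documents containing the term.
    Set<Long> search(String term) {
        return new TreeSet<>(postings.getOrDefault(term.toLowerCase(), new TreeSet<>()));
    }

    // Lower-cases the text and splits it on non-word characters.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }
}
```

Every changed entity therefore costs the index two operations, which is one reason such indices do not fit a simple create-index notation.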

Next anomaly: geo and triangular indices, which are often used for location-based services. It might become hard to define a one-size-fits-all solution.

Anything that is not well defined in a specification (e.g. SQL:2008, ISO/IEC 9075:2008) or in a widely adopted industry standard is really hard to grasp.

LieGrue,
strub

>________________________________
> From: Christian Beikov <christian.beikov_at_gmail.com>
>To: users_at_jpa-spec.java.net
>Sent: Tuesday, May 29, 2012 5:46 PM
>Subject: [jpa-spec users] Re: Full text search
>
>
>The problem that index updates perform badly will always exist. I don't really get your point: you can always build something that performs badly, so where is the difference between doing so by defining DBMS-specific indices that are updated after a table update and doing so by declaring indices (maybe full-text ones) in the entity model? Of course, those indices would probably only be used for schema generation and for query hints. My point is that you would be able to generate the DB-specific statements for creating indices if needed. Configuring, for example, Lucene as the search engine provider might disable index generation during schema generation, but the provider could use the declared metadata to build its own indices.
>Maybe you are right that this would exceed the purpose of JPA; anyway, the index annotations proposed by Mark Struberg should IMO provide at least a way to use "extensions".
>Regards,
>Christian
>On 29.05.2012 16:43, "Christian Romberg" <cromberg_at_versant.com> wrote:
>
>>Hi Christian,
>>
>>Batching per transaction only works nicely if the transaction rate is low and each transaction contains lots of changes.
>>
>>E.g. a batch-import scenario.
>>
>>It does not work for high-load, small change-set scenarios.
>>
>>The point is, IMO not everything should be standardized and any spec should have a clear scope.
>>
>>Any vendor is free to offer a JPQL syntax extension for full-text search.
>>Any vendor can offer pluggability for any search providers or things like that.
>>
>>I don't see how this could be included in the spec in a way that keeps the spec sound.
>>
>>For this it would need to work (almost) orthogonally with all other features, i.e. independently of the scenario.
>>
>>All of these are indicators to me that this exceeds the scope of what should be standardized.
>>
>>Regards,
>>
>>Christian
>>
>>
>>On Tue, May 29, 2012 at 4:00 PM, Christian Beikov <christian.beikov_at_gmail.com> wrote:
>>
>>>Hibernate Search supports batching within transactions; generally, the behavior described in the docs could be reused. What kind of limitations are you thinking of? If the index for the full-text search is managed by the DBMS, there shouldn't be any problems, should there?
>>>
>>>IMO JPA should provide the annotations for full-text indexing and extend the EntityManager, Query, etc. to allow a nice way to search for data. The implementation of such a search provider should be pluggable: you could, for example, configure your JPA provider (in a standardized way) to use the DBMS, Lucene or any other full-text search engine as the provider.
>>>
>>>
>>>
>>>Kind regards,
>>>Christian Beikov
>>>
>>>
>>>On 29.05.2012 14:22, Christian Romberg wrote:
>>>>Hi Christian,
>>>>
>>>>No, it's not that simple.
>>>>
>>>>Updating full-text indexes takes an enormous amount of time compared to the time needed for everything else that happens in a database commit operation.
>>>>
>>>>Thus, for most users, transactional full-text index updates are hardly usable.
>>>>
>>>>Batching such updates would be the way to go; however, this imposes quite some limitations.
>>>>
>>>>It's totally fine for any JPA vendor to provide special extensions for special use cases, or for use cases with serious restrictions.
>>>>
>>>>However, and this is just my personal opinion, the scope of a specification should not include such things; they don't make a specification sound, but brittle instead.
>>>>
>>>>Regards,
>>>>
>>>>Christian
>>>>
>>>>
>>>>On Tue, May 29, 2012 at 2:10 PM, Christian Beikov <christian.beikov_at_gmail.com> wrote:
>>>>
>>>>>Hello!
>>>>>
>>>>>I have just read some lines of the Hibernate Search documentation and found out that the index is written transactionally if a transaction exists. In other words, the implementation would have to participate in a JDBC or JTA transaction.
>>>>>
>>>>>Here is a little excerpt of the documentation (see also http://docs.jboss.org/hibernate/search/3.2/reference/en/html_single/#d0e488):
>>>>>
>>>>>>To be more efficient, Hibernate Search batches the write interactions with the Lucene index. There is currently two types of batching depending on the expected scope. Outside a transaction, the index update operation is executed right after the actual database operation. This scope is really a no scoping setup and no batching is performed. However, it is recommended - for both your database and Hibernate Search - to execute your operation in a transaction be it JDBC or JTA. When in a transaction, the index update operation is scheduled for the transaction commit phase and discarded in case of transaction rollback. The batching scope is the transaction. There are two immediate benefits:
>>>>>>    * Performance: Lucene indexing works better when operation are executed in batch.
>>>>>>    * ACIDity: The work executed has the same scoping as the one executed by the database transaction and is executed if and only if the transaction is committed. This is not ACID in the strict sense of it, but ACID behavior is rarely useful for full text search indexes since they can be rebuilt from the source at any time.
>>>>>>You can think of those two scopes (no scope vs transactional) as the equivalent of the (infamous) autocommit vs transactional behavior. From a performance perspective, the in transaction mode is recommended. The scoping choice is made transparently. Hibernate Search detects the presence of a transaction and adjust the scoping.
>>>>>
>>>>>Adapting this would fulfill your requirement, wouldn't it?
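
The two scopes described in the excerpt can be modelled in a few lines. This is only a sketch of the queue-until-commit idea; the class TxIndexQueue and its methods are hypothetical and are not Hibernate Search API, and the "index" is just a list of applied work items.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only; names are hypothetical, not Hibernate Search API.
class TxIndexQueue {
    private final List<String> pending = new ArrayList<>(); // work queued in the current transaction
    private final List<String> applied = new ArrayList<>(); // stands in for the real index
    private boolean inTransaction;

    void begin() {
        inTransaction = true;
    }

    // Inside a transaction the work is only queued; outside it is applied immediately.
    void indexChange(String work) {
        if (inTransaction) {
            pending.add(work);
        } else {
            applied.add(work);
        }
    }

    // On commit, the whole queue is flushed to the index as one batch.
    void commit() {
        applied.addAll(pending);
        pending.clear();
        inTransaction = false;
    }

    // On rollback, the queued work is discarded and never reaches the index.
    void rollback() {
        pending.clear();
        inTransaction = false;
    }

    // Returns a copy of the work applied to the index so far.
    List<String> applied() {
        return new ArrayList<>(applied);
    }
}
```

Note that the batch size here is exactly one transaction, so many small transactions still produce many small index writes.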
>>>>>
>>>>>
>>>>>
>>>>>Kind regards,
>>>>>Christian Beikov
>>>>>
>>>>>
>>>>>On 29.05.2012 09:08, Christian Romberg wrote:
>>>>>>Hi Christian,
>>>>>>
>>>>>>Normal indexes in standard ACID databases (regardless of whether it is an RDBMS or an ODBMS like ours) are transactionally consistent.
>>>>>>
>>>>>>With full-text indexes this becomes a problem, and I think this is the point that would need to be discussed before discussing any integration in JPA.
>>>>>>
>>>>>>Regards,
>>>>>>
>>>>>>Christian
>>>>>>
>>>>>>
>>>>>>On Sun, May 27, 2012 at 9:31 PM, Christian Beikov <christian.beikov_at_gmail.com> wrote:
>>>>>>
>>>>>>>Before adding another issue to JIRA, I wanted to discuss the following.
>>>>>>>Mark Struberg has added the issue for indices, http://java.net/jira/browse/JPA_SPEC-22
>>>>>>>Building on this issue, I would like to see something like full-text search support/integration in JPA.
>>>>>>>
>>>>>>>Hibernate offers the possibility, via Hibernate Search, to create full-text queries. The EntityManager interface would probably have to be extended if a similar approach were offered in JPA. I would really like to see a standardized way of integrating full-text search in JPA via search providers or similar.
>>>>>>>
>>>>>>>What do you think?
>>>>>>>
>>>>>>>--
>>>>>>>Kind regards,
>>>>>>>Christian Beikov
>>>>>>>
>>>>>>
>>>>>>
>>>>>>--
>>>>>>Christian Romberg
>>>>>>Chief Engineer| Versant GmbH
>>>>>>(T) +49 40 60990-0
>>>>>>(F) +49 40 60990-113
>>>>>>(E) cromberg_at_versant.com
>>>>>>www.versant.com| www.db4o.com
>>>>>>
>>>>>>--
>>>>>>Versant GmbH is incorporated in Germany. Company registration number: HRB 54723, Amtsgericht Hamburg. Registered Office: Halenreie 42, 22359 Hamburg, Germany. Managing Directors: Bernhard Wöbker, Volker John
>>>>>>
>>>>>>CONFIDENTIALITY NOTICE: This e-mail message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential or proprietary information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, immediately contact the sender by reply e-mail and destroy all copies of the original message.
>>>>>>