Re: How to make caching work?

From: Paul Sandoz <Paul.Sandoz_at_Sun.COM>
Date: Thu, 27 Sep 2007 14:07:14 +0200

Peter Liu wrote:
> Marc Hadley wrote:
>> On Sep 26, 2007, at 4:15 PM, Peter Liu wrote:
>>>
>>> Thanks for the pointers. I am starting to understand how caching works.
>>> Here is my understanding of how we should generate code to support
>>> caching in the resource classes we generate from entity classes.
>>>
>>> First, we need to generate a memory store similar to the one in
>>> the Storage sample. This memory store will keep track of the
>>> digest and lastModified date of a resource.
>>>
>> PMFJI. If you want to avoid hitting the database and the resource
>> class is the only means of updating the data then an in-memory cache
>> makes sense. However you can run into problems if the database can be
>> updated by other means (e.g. some background process or other external
>> access) since the cache and database can then get out of sync and
>> you'll run into problems that preconditions were meant to solve.

Also, if you have multiple instances of the web application deployed for
scalability purposes, different instances may have different in-memory
versions of the information. For example, if a PUT request gets routed to
instance A, instance B is not informed of the update.
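To make the multi-instance problem concrete, here is a minimal sketch. It assumes each deployed instance keeps its own in-memory map from resource path to ETag, roughly the role the Storage sample's memory store plays. InstanceLocalStore and its methods are hypothetical names, not taken from that sample:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-instance store: each deployed instance of the web
// application has its own copy, so the maps are never shared.
public class InstanceLocalStore {
    private final Map<String, String> etags = new ConcurrentHashMap<>();

    // Called by this instance's PUT handler after it has updated the data.
    public void recordUpdate(String path, String newEtag) {
        etags.put(path, newEtag);
    }

    // The ETag this instance believes is current, or null if it has none.
    public String currentEtag(String path) {
        return etags.get(path);
    }

    public static void main(String[] args) {
        InstanceLocalStore a = new InstanceLocalStore();
        InstanceLocalStore b = new InstanceLocalStore();
        a.recordUpdate("/customers/1", "\"v1\"");
        b.recordUpdate("/customers/1", "\"v1\"");

        // The PUT is routed to instance A only; B is never told about it.
        a.recordUpdate("/customers/1", "\"v2\"");

        System.out.println(a.currentEtag("/customers/1")); // prints "v2"
        System.out.println(b.currentEtag("/customers/1")); // prints "v1" (stale)
    }
}
```

Nothing in this design tells instance B about the write. Once more than one instance is deployed, a per-instance cache needs either a shared store or some invalidation mechanism.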

> Yes, we just discussed this issue and I am not sure what the alternative
> would be if we want to avoid hitting the database in order
> to evaluate the precondition. Is it enough of a performance boost if we
> don't return the data even though we fetch it internally?
>

It could also depend on how much data you need to get back from the DB
for preconditions compared to that needed for the complete representation.

One alternative is to compute an MD5 checksum of the representation in
the runtime, but this can decrease performance and use up memory,
especially for large representations, because the data needs to be
serialized and buffered. So what we save when the application supports
preconditions using the DB is the serializing and buffering.
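For comparison, computing the ETag in the runtime amounts to something like the following sketch. DigestEtag is a hypothetical name, and the code uses only java.security.MessageDigest. Notice that the whole representation has to be serialized into a byte array before the digest can be taken:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestEtag {
    // Strong ETag = quoted hex MD5 of the fully serialized representation.
    // The caller must already have buffered the whole body in memory.
    public static String etagFor(byte[] serializedBody) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder("\"");
            for (byte b : md5.digest(serializedBody)) {
                hex.append(String.format("%02x", b));
            }
            return hex.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] body = "<customer><name>Duke</name></customer>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(etagFor(body));
    }
}
```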

I think you may need to discuss the DB aspects with a DB expert so that
we can make the best technical decisions based on good experience. For
example, in existing deployed web apps that support preconditions using
a DB, is the DB the limiting factor?

>> In general you should check the etag and/or last modified against the
>> database within the same transaction as the update if you want
>> complete consistency.
>>
> What about fetching data? Do you also need to check against the
> database in order to get complete consistency?
>
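On fetching: a conditional GET also has to read the current version from the database, but it only needs that one column rather than the whole row. A minimal sketch follows, in which a synchronized in-memory map stands in for the database table. VersionedTable and its methods are hypothetical. In a real application the If-Match check and the write would be one transaction, or a single UPDATE ... WHERE id = ? AND version = ? statement:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a database table with a version column.
// "synchronized" plays the role of the transaction: each check and the
// write that depends on it happen as one atomic step.
public class VersionedTable {
    private final Map<Long, Long> versions = new HashMap<>();
    private final Map<Long, String> names = new HashMap<>();

    public synchronized void insert(long id, String name) {
        names.put(id, name);
        versions.put(id, 1L);
    }

    public synchronized String name(long id) {
        return names.get(id);
    }

    // Conditional GET: only the version is read to evaluate If-None-Match.
    // Returns 304 if the client already has the current version, else 200.
    public synchronized int conditionalGet(long id, Long ifNoneMatchVersion) {
        Long current = versions.get(id);
        if (current == null) {
            return 404;
        }
        return current.equals(ifNoneMatchVersion) ? 304 : 200;
    }

    // Conditional PUT: compare and update in one step, like
    // UPDATE ... SET version = version + 1 WHERE id = ? AND version = ?
    public synchronized int conditionalPut(long id, long ifMatchVersion, String newName) {
        Long current = versions.get(id);
        if (current == null) {
            return 404;
        }
        if (current != ifMatchVersion) {
            return 412; // Precondition Failed: someone else updated it first
        }
        names.put(id, newName);
        versions.put(id, current + 1);
        return 200;
    }

    public static void main(String[] args) {
        VersionedTable table = new VersionedTable();
        table.insert(1L, "Duke");
        System.out.println(table.conditionalPut(1L, 1L, "Duke II")); // 200, now version 2
        System.out.println(table.conditionalPut(1L, 1L, "Stale"));   // 412
        System.out.println(table.conditionalGet(1L, 2L));            // 304
        System.out.println(table.conditionalGet(1L, 1L));            // 200
    }
}
```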
>>> Second, we need to change the return type for all the http methods
>>> to Response. We will then use the precondition evaluator to
>>> determine what response to send back. We will also use the CacheControl
>>> to specify the caching policy for the resource.
>>>
>> Right.
>>
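The decision the precondition evaluator makes can be sketched in plain Java. JAX-RS 1.0 provides this as Request.evaluatePreconditions(EntityTag), together with a CacheControl class. The Preconditions class below is hypothetical and simplified: it compares ETags as exact strings (no weak-tag rules), assumes the resource exists, and builds the Cache-Control value by hand:

```java
import java.util.Arrays;

// Hypothetical, simplified precondition evaluator. Assumes the resource
// exists and compares ETags as exact strings (no weak-tag rules).
public class Preconditions {
    // Returns 412 or 304 if a precondition stops the request, or 200 if
    // the resource method should go ahead and build the normal response.
    public static int evaluate(String method, String currentEtag,
                               String ifMatch, String ifNoneMatch) {
        if (ifMatch != null && !matches(ifMatch, currentEtag)) {
            return 412; // Precondition Failed
        }
        if (ifNoneMatch != null && matches(ifNoneMatch, currentEtag)) {
            boolean safe = method.equals("GET") || method.equals("HEAD");
            return safe ? 304 : 412; // Not Modified applies to reads only
        }
        return 200;
    }

    // True if the header value is "*" or lists the current ETag.
    private static boolean matches(String header, String currentEtag) {
        if (header.trim().equals("*")) {
            return true;
        }
        return Arrays.stream(header.split(","))
                .map(String::trim)
                .anyMatch(tag -> tag.equals(currentEtag));
    }

    // Cache-Control value for a successful response, i.e. the kind of
    // policy one would otherwise configure through a CacheControl object.
    public static String cacheControl(int maxAgeSeconds, boolean isPrivate) {
        return (isPrivate ? "private" : "public") + ", max-age=" + maxAgeSeconds;
    }

    public static void main(String[] args) {
        String current = "\"v2\"";
        System.out.println(evaluate("GET", current, null, "\"v2\"")); // 304
        System.out.println(evaluate("PUT", current, "\"v1\"", null)); // 412
        System.out.println(evaluate("GET", current, null, "\"v1\"")); // 200
        System.out.println(cacheControl(60, true)); // private, max-age=60
    }
}
```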
>>> Here are some questions:
>>>
>>> 1. Regarding etag (digest) and lastModified data, do we need both? We
>>> will lose all the lastModified dates after each restart but we can
>>> always recompute the etags from the resource itself. So etag seems to
>>> be a more reliable method.
>>
>> Right. Etag is preferable to last modified since the latter only has a
>> granularity of 1 second which might not be sufficient in a rapidly
>> changing dataset.
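The one-second limit is easy to see with java.time. HTTP dates in the RFC 1123 format have no sub-second field, so two updates within the same second produce the same Last-Modified value. A small sketch follows; the class name and the timestamps are made up for illustration:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class LastModifiedGranularity {
    // Formats an instant as an HTTP date (RFC 1123). The format has no
    // sub-second field, so everything below one second is dropped.
    public static String httpDate(Instant t) {
        return DateTimeFormatter.RFC_1123_DATE_TIME.format(t.atZone(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        // Two hypothetical updates 800 ms apart.
        Instant first = Instant.parse("2007-09-27T12:07:14.100Z");
        Instant second = Instant.parse("2007-09-27T12:07:14.900Z");
        System.out.println(httpDate(first));  // Thu, 27 Sep 2007 12:07:14 GMT
        System.out.println(httpDate(second)); // Thu, 27 Sep 2007 12:07:14 GMT
        // A client holding the first version would send If-Unmodified-Since
        // with that same value, and a server comparing at one-second
        // precision would let the request through despite the change.
    }
}
```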
>>
>> Also note that the etag value doesn't have to be a digest of the
>> representation, it can be anything that generates a unique value for a
>> particular representation of the resource. E.g. the database might
>> include a version or update timestamp field for each record and you
>> could concatenate either of those with the representation format to
>> make the etag and save a costly digest calculation.
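Here is a sketch of that idea. It assumes the record has a version column (or an update timestamp stored with millisecond precision), so the ETag can be built without touching the representation body at all. VersionEtag is a hypothetical helper, not an existing API:

```java
import java.time.Instant;

public class VersionEtag {
    // Strong ETag from the record's version column plus the media type,
    // so the XML and JSON forms of the same record get different tags.
    public static String etag(long version, String mediaType) {
        return "\"" + version + "-" + mediaType + "\"";
    }

    // Same idea using an update timestamp column instead of a version.
    public static String etag(Instant updated, String mediaType) {
        return "\"" + updated.toEpochMilli() + "-" + mediaType + "\"";
    }

    public static void main(String[] args) {
        System.out.println(etag(7, "application/xml"));  // "7-application/xml"
        System.out.println(etag(7, "application/json")); // "7-application/json"
    }
}
```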
>>
>> If the tooling allowed a developer to specify that a particular field
>> (or combination of fields) was suitable for etag generation that would
>> be quite powerful. The tooling could then default to the more
>> expensive digest calculation when the developer doesn't specify an
>> alternative.
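Here is one way tooling might express this, purely as a hypothetical sketch. A runtime annotation (called @EtagField here; it is not an existing Jersey or JAX-RS annotation) marks the fields to use. When no field is marked, the ETag falls back to an MD5 digest of the serialized representation:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class EtagFromFields {
    // Hypothetical marker for fields that change whenever the entity does.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface EtagField {}

    // Example entity where the developer picked a suitable field.
    static class Customer {
        @EtagField long version = 3;
        String name = "Duke";
    }

    // Example entity with no marked field: falls back to a digest.
    static class Note {
        String text = "hello";
    }

    // Uses the marked fields if there are any, otherwise an MD5 digest of
    // the serialized representation supplied by the caller.
    static String etag(Object entity, byte[] serialized) {
        List<String> parts = new ArrayList<>();
        try {
            for (Field f : entity.getClass().getDeclaredFields()) {
                if (f.isAnnotationPresent(EtagField.class)) {
                    f.setAccessible(true);
                    parts.add(f.getName() + "=" + f.get(entity));
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        if (!parts.isEmpty()) {
            // getDeclaredFields() has no guaranteed order; sort for a stable tag.
            Collections.sort(parts);
            return "\"" + String.join(";", parts) + "\"";
        }
        return "\"" + md5Hex(serialized) + "\"";
    }

    static String md5Hex(byte[] data) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("MD5").digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<customer><name>Duke</name></customer>".getBytes(StandardCharsets.UTF_8);
        System.out.println(etag(new Customer(), xml)); // "version=3"
        System.out.println(etag(new Note(), "hello".getBytes(StandardCharsets.UTF_8)));
    }
}
```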
>>
> Good suggestion. I'll keep it in mind.
>
> I have a couple more questions. At what granularity should we be
> caching the resources? Should we cache the container resources too, or
> just the item resources?

Both if you can, although I can see the difficulty: how do you know if
the feed is modified without checking all the entries? I suppose paging
can help here. Or maybe there is DB support for querying when the rows
of a table were last modified (a bit like checking the last modification
time of a directory)?
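If paging is used, one option is to compute the ETag of a page from just the ids and versions of the entries in it, which is much cheaper than serializing the page. The following is a minimal sketch, assuming each row has a version column and the query returns rows in a stable order. CollectionEtag and Entry are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

public class CollectionEtag {
    // Hypothetical summary of one row: only the id and version are needed.
    static final class Entry {
        final long id;
        final long version;
        Entry(long id, long version) { this.id = id; this.version = version; }
    }

    // ETag for one page of a container resource: any added, removed or
    // updated entry in the page changes the tag. Assumes rows arrive in a
    // stable order (e.g. ORDER BY id).
    static String pageEtag(List<Entry> page, int start, int max) {
        StringBuilder sb = new StringBuilder().append(start).append(',').append(max);
        for (Entry e : page) {
            sb.append(';').append(e.id).append(':').append(e.version);
        }
        return "\"" + md5Hex(sb.toString().getBytes(StandardCharsets.UTF_8)) + "\"";
    }

    static String md5Hex(byte[] data) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("MD5").digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<Entry> page = Arrays.asList(new Entry(1, 4), new Entry(2, 1));
        List<Entry> updated = Arrays.asList(new Entry(1, 5), new Entry(2, 1));
        System.out.println(pageEtag(page, 0, 10).equals(pageEtag(updated, 0, 10))); // false
    }
}
```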

> What about query parameter? How does it affect caching?

A URI with query parameters is a different URI from one without, and
the former can return different representations based on the parameters.
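For caching purposes, that means the cache key has to be the full URI, query string included. Below is a small sketch using java.net.URI as the key (UriCacheKey is a hypothetical name). Note that java.net.URI compares query strings literally, so reordered parameters count as a different URI even if the application treats them the same:

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class UriCacheKey {
    // Cached representations keyed by the full request URI, query included.
    private final Map<URI, String> cache = new HashMap<>();

    void put(URI uri, String representation) {
        cache.put(uri, representation);
    }

    String get(URI uri) {
        return cache.get(uri);
    }

    public static void main(String[] args) {
        UriCacheKey c = new UriCacheKey();
        c.put(URI.create("http://example.com/customers?start=0&max=10"), "first page");
        System.out.println(c.get(URI.create("http://example.com/customers?start=0&max=10")));  // first page
        System.out.println(c.get(URI.create("http://example.com/customers?start=10&max=10"))); // null
        // Same parameters in a different order: still a different URI.
        System.out.println(c.get(URI.create("http://example.com/customers?max=10&start=0")));  // null
    }
}
```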

>>> 2. If I call Response.Builder.representation(jaxbInstance) without
>>> the mime type, will it automatically serialize the jaxb instance into
>>> xml or json depending on the mime type in the request header? The
>>> reason I ask this is because we currently specify a list of mime types
>>> in the ConsumeMime and ProduceMime annotations.
>>
>> Yes, I believe that is how it should work.
>>

Correct.
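Since the same URI can then return either XML or JSON, the ETag should differ per media type, and the response should carry a Vary: Accept header so that caches do not hand a JSON client the XML representation. The selection itself can be sketched roughly as follows. AcceptNegotiation is hypothetical and simplified: it handles exact types and */* with q-values, but not type/* ranges or a more specific range overriding a wildcard:

```java
import java.util.Arrays;
import java.util.List;

public class AcceptNegotiation {
    // Returns the producible type with the highest q-value in the Accept
    // header (earlier entries win ties), or null if none is acceptable,
    // which would correspond to a 406 response.
    static String choose(String accept, List<String> produces) {
        String best = null;
        double bestQ = 0.0;
        for (String range : accept.split(",")) {
            String[] parts = range.trim().split(";");
            String type = parts[0].trim();
            double q = 1.0;
            for (int i = 1; i < parts.length; i++) {
                String p = parts[i].trim();
                if (p.startsWith("q=")) {
                    q = Double.parseDouble(p.substring(2));
                }
            }
            for (String candidate : produces) {
                boolean acceptable = type.equals("*/*") || type.equals(candidate);
                if (acceptable && q > bestQ) {
                    best = candidate;
                    bestQ = q;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> produces = Arrays.asList("application/xml", "application/json");
        System.out.println(choose("application/json, application/xml;q=0.9", produces)); // application/json
    }
}
```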

Paul.

-- 
| ? + ? = To question
----------------\
    Paul Sandoz
         x38109
+33-4-76188109