Paul Sandoz wrote:
> Peter Liu wrote:
>> Marc Hadley wrote:
>>> On Sep 26, 2007, at 4:15 PM, Peter Liu wrote:
>>>>
>>>> Thanks for the pointers. I am starting to understand how caching
>>>> works.
>>>> Here is my understanding of how we should generate code to support
>>>> caching in the resource classes we generate from entity classes.
>>>>
>>>> First, we need to generate a memory store similar to the one in
>>>> the Storage sample. This memory store will keep track of the
>>>> digest and lastModified date of a resource.
>>>>
>>> PMFJI. If you want to avoid hitting the database and the resource
>>> class is the only means of updating the data then an in-memory cache
>>> makes sense. However you can run into problems if the database can
>>> be updated by other means (e.g. some background process or other
>>> external access) since the cache and database can then get out of
>>> sync and you'll run into problems that preconditions were meant to
>>> solve.
>
> Also if you have multiple instances of the web application deployed
> for scalability purposes different instances may have different in
> memory versions of the information. For example, a PUT request gets
> routed to instance A, and instance B is not informed of the update.
>
I am going to scratch this idea since it has too many issues.
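For the record, what I had in mind was roughly the following, modeled loosely on the Storage sample (a sketch only; the class and field names are just mine):

    import java.util.Date;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public class MemoryStore {

        // One entry per resource: the digest used as the etag plus the
        // last-modified date.
        public static class CachedInfo {
            public final String digest;
            public final Date lastModified;

            public CachedInfo(String digest, Date lastModified) {
                this.digest = digest;
                this.lastModified = lastModified;
            }
        }

        // Keyed by resource URI.
        private final ConcurrentMap<String, CachedInfo> entries =
                new ConcurrentHashMap<String, CachedInfo>();

        public CachedInfo get(String uri) {
            return entries.get(uri);
        }

        public void put(String uri, CachedInfo info) {
            entries.put(uri, info);
        }
    }
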
>> Yes, we just discussed this issue and I am not sure what the
>> alternative would be if we want to avoid hitting the database in order
>> to evaluate the precondition. Is it enough of a performance boost if
>> we don't return the data even though we fetch it internally?
>>
>
> It could also depend on how much data you need to get back from the DB
> for pre-conditions compared to that for the complete representation.
>
I think this is the key. The data for pre-condition evaluation needs to
be smaller than the actual resource data. As Marc mentioned, it
would be very powerful to let users specify which columns to use for
creating etags.
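For example, if the developer points us at a version or update-timestamp column, the generated code could build the etag straight from it, something like this (just a sketch; I'm assuming the EntityTag class from javax.ws.rs.core, and the helper name is made up):

    import javax.ws.rs.core.EntityTag;

    public class EtagSupport {

        // Build the etag from a developer-specified version/timestamp column
        // plus the representation format, so the XML and JSON representations
        // of the same row get distinct etags and no digest is needed.
        public static EntityTag fromVersionColumn(long versionColumn,
                                                  String mediaType) {
            return new EntityTag(versionColumn + "-" + mediaType);
        }
    }
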
> One alternative is to MD5 checksum the representation in the runtime,
> but this can decrease performance and use up memory, especially for
> large representations, because the data needs to be serialized and
> buffered. So what we save when preconditions are supported in the
> application using the DB is the serializing and buffering.
>
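Right. Just to make that cost concrete, the digest fallback is essentially this (a sketch; it assumes the representation has already been serialized into a byte array, which is exactly the buffering you mention):

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public class DigestEtag {

        // Digest-based etag: the whole serialized representation has to be
        // buffered in memory before it can be hashed, which is the expensive
        // part for large representations.
        public static String md5Etag(byte[] serializedRepresentation)
                throws NoSuchAlgorithmException {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(serializedRepresentation);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }
    }
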
> I think you may need to discuss the DB aspects with a DB expert so
> that we can make the best technical decisions based on good
> experience. For example, in existing deployed web apps that support
> preconditions using a DB is the DB the limiting factor?
I will shoot David an email on this. He should have an opinion on it.
>
>
>>> In general you should check the etag and/or last modified against
>>> the database within the same transaction as the update if you want
>>> complete consistency.
>>>
>> What about fetching data? Do you also need to check against the
>> database in order to get complete consistency?
>>
>>>> Second, we need to change the return type for all the http methods
>>>> to Response. We will then use the precondition evaluator to
>>>> determine what response to send back. We will also use the CacheControl
>>>> to specify the caching policy for the resource.
>>>>
>>> Right.
>>>
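To make sure I'm picturing this correctly, I'm thinking the generated GET would look roughly like the sketch below. The names are my best reading of the API (EntityTag, Request.evaluatePreconditions, CacheControl) and may not match the current snapshot exactly; findCustomer and findVersionColumn are just placeholders for the generated persistence calls:

    import javax.ws.rs.GET;
    import javax.ws.rs.ProduceMime;
    import javax.ws.rs.core.CacheControl;
    import javax.ws.rs.core.Context;
    import javax.ws.rs.core.EntityTag;
    import javax.ws.rs.core.Request;
    import javax.ws.rs.core.Response;

    public class CustomerResource {

        @Context
        private Request request;

        @GET
        @ProduceMime({"application/xml", "application/json"})
        public Response get() {
            // Placeholders: fetch the JAXB entity, or ideally just the
            // column(s) needed to build the etag.
            Object customer = findCustomer();
            long version = findVersionColumn();

            EntityTag etag = new EntityTag(Long.toString(version));

            // Ask the precondition evaluator whether the client's copy is
            // still current; a non-null builder means we can respond without
            // serializing the entity.
            Response.ResponseBuilder builder = request.evaluatePreconditions(etag);
            if (builder == null) {
                builder = Response.ok(customer).tag(etag);
            }

            // Attach the caching policy for this resource.
            CacheControl cc = new CacheControl();
            cc.setMaxAge(60);
            return builder.cacheControl(cc).build();
        }

        private Object findCustomer() { return new Object(); }

        private long findVersionColumn() { return 1L; }
    }

Since the method just returns the entity and the ProduceMime annotation lists both formats, I'm assuming the runtime picks XML or JSON from the Accept header, which is my question 2 below.
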
>>>> Here are some questions:
>>>>
>>>> 1. Regarding etag (digest) and lastModified data, do we need both?
>>>> We will lose
>>>> all the lastModified dates after each restart but we can always
>>>> recompute
>>>> the etags from the resource itself. So etag seems to be a more
>>>> reliable method.
>>>
>>> Right. Etag is preferable to last modified since the latter only has
>>> a granularity of 1 second which might not be sufficient in a rapidly
>>> changing dataset.
>>>
>>> Also note that the etag value doesn't have to be a digest of the
>>> representation, it can be anything that generates a unique value for
>>> a particular representation of the resource. E.g. the database might
>>> include a version or update timestamp field for each record and you
>>> could concatenate either of those with the representation format to
>>> make the etag and save a costly digest calculation.
>>>
>>> If the tooling allowed a developer to specify that a particular
>>> field (or combination of fields) was suitable for etag generation
>>> that would be quite powerful. The tooling could then default to the
>>> more expensive digest calculation when the developer doesn't specify
>>> an alternative.
>>>
>> Good suggestion. I'll keep it in mind.
>>
>> I have a couple more questions. At what granularity should we be
>> caching the resources? Should we cache the container resources too,
>> or just the item resources?
>
> Both if you can, although I can see the difficulty: how do you know
> whether the feed is modified without checking all the entries? I
> suppose paging can help here. Or maybe there is DB support to query
> when the rows of a table were last modified? (A bit like checking the
> last modification time of a directory.)
>
>
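If the table did carry something like a last-updated column, the container resource could derive its last-modified date (or etag) from a single aggregate query. A sketch, assuming such a column exists and is maintained on every write:

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.sql.Timestamp;

    public class ContainerLastModified {

        // Assumes every row has a last_updated column; the newest value
        // stands in for the container's last-modified date.
        public static Timestamp lastModified(Connection connection, String table)
                throws SQLException {
            Statement stmt = connection.createStatement();
            try {
                ResultSet rs = stmt.executeQuery(
                        "SELECT MAX(last_updated) FROM " + table);
                return rs.next() ? rs.getTimestamp(1) : null;
            } finally {
                stmt.close();
            }
        }
    }
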
>> What about query parameters? How do they affect caching?
>
> A URI with query parameters is a different URI from one without, and
> the former can return different representations based on the parameters.
>
>
>>>> 2. If I call Response.Builder.representation(jaxbInstance) without
>>>> the mime type,
>>>> will it automatically serialize the jaxb instance into xml or json
>>>> depending on the
>>>> mime type in the request header? The reason I ask this is because
>>>> we currently
>>>> specify a list of mime types in the ConsumeMime and ProduceMime
>>>> annotations.
>>>
>>> Yes, I believe that is how it should work.
>>>
>
> Correct.
>
> Paul.
>