Re: Code Review Needed

From: Brian Goetz <Brian.Goetz_at_Sun.COM>
Date: Thu, 09 Nov 2006 21:44:04 -0500

>> My position is that there is nothing that either thread can do to
>> either corrupt the SB object or cause any problems to the other
>> thread. Thus it is perfectly safe code.
>>
>> If I'm wrong -- can you explain -- because this stuff comes up pretty
>> regularly and I would change my practices...
>
> You're wrong.

But don't feel bad -- you're in good company! Everyone makes this
mistake until they're hit over the head with it. A few times.

Background: modern shared-memory multiprocessors (which is all of them
these days) are permitted to do some downright weird things in the
absence of synchronization. So, for example, if Thread A executes

   a = 3;

and Thread B later executes

   System.out.println(a)

thread B is not guaranteed to print "3". This could be because the
value for a is still sitting in a processor-local cache somewhere and
hasn't been flushed out yet (resulting in a delay between the write to A
and its value being _visible_ to all other threads), or even because the
compiler has hoisted A into a register (resulting in an effectively
infinite delay between the write and the read.)

So, what does synchronization mean? Synchronization has two significant
aspects, one of which you already understand well: atomicity (or mutual
exclusion). The other, which is less well-understood, is visibility.
Unless threads A and B synchronize on a common lock (or do one of a few
other things, see below), there is NO GUARANTEE that thread B will EVER
see the value 3 when it reads A. Yow!

This might be harmless, but could be very nasty. Take the example of
StringBuilder. Imagine this implementation:

class StringBuilder {
   int curLen = 0;
   char[] curArray = new char[16];

   void append(String s) {
     if (curLen + s.length() > curArray.size) { /* expand array */ }

     /* copy chars from s to curArray starting at curLen */
     curLen += s.length();
   }
}

Now, thread A comes along and appends "foo", so it writes a "3" to
curLen and the appropriate characters to curArray. Then, thread B comes
along, and wants to append "bar". Since it is not guaranteed to see the
right value of curLen, it could start writing at position 0, instead of
3, resulting in "bar" instead of "foobar".

Worse, just because thread B can't see the value of curLen at time t
doesn't mean it won't see the right value at time t+1. So by the time
it gets around to adding 3 to curLen, it might now see the right value,
add 3, and set the length to 6, resulting in "barglq". Oops! And of
course, when thread C reads the result, who knows what it finds. (There
is a safety guarantee called "out of thin air safety" which does
guarantee that you will see a value that was put there by _some_ thread;
the most likely result if you get a stale value is the default value,
which is better than total garbage, but only slightly so.)

The rules are specified using a partial ordering called happens-before.
  If action A happens-before action B, then B is guaranteed to see the
result of A. Things within a thread happen-before each other according
to the (intuitive) program order, but across threads anything goes unless:

  - one thread releases a lock and another thread acquires the same lock
  - one thread writes a volatile and another reads the same volatile
  - one thread starts another
  - one thread joins with another

Putting it all together: if more than one thread accesses a variable,
and one thread might write to it, and all reads and writes are not
ordered by happens-before (which basically comes down to every access is
done with the same lock held), your program is broken and has no defined
semantics under the Java Memory Model. Not so desirable.

> My understanding is that there is no guarantee that any data written
> by one thread will be *visible* to the other thread unless you use
> synchronization. Since StringBuilder is unsynchronized, the data
> written by one thread may never be seen by the other thread.

Unfortunately, it _might_ still be visible anyway, which means your
program might _appear_ to work in development and testing, and still be
broken. And some processors (IA 32, and Sparc/TSO to name two) have
stronger memory models than required by the JMM, so certain bugs will
never show up on them (but can show up on weaker MM systems like PowerPC
or Niagara.) The only solution is to carefully design concurrent
classes to be thread-safe, and try to minimize the amount of your code
that is exposed to concurrency.

> You'll find more information on these issues in Effective Java

In particular, see EJ #48.

> and other fine Java programming books. :-)

Like Java Concurrency in Practice, which has a detailed treatment of the
JMM, among many other things. Buy a copy now, and get it signed by the
author next week! :)