Coda File System

Re: large servers: please help

From: Robert Watson <robert_at_cyrus.watson.org>
Date: Wed, 20 Jan 1999 16:07:15 -0500 (EST)
On Wed, 20 Jan 1999, Peter J. Braam wrote:

> > AFS deals with this by 'chunking' -- that is, it demand-loads portions of
> > files into the cache as they are needed; I believe it also uses an
> > aggressive read-ahead policy.  The net result is more efficient use of the
> > cache for partial file reads or writes, especially for mammoth files.
> 
> I just sent a message about this.
> 
> > However, that raises consistency issues: currently the resolution of
> > conflicts between file versions is that of entire file system objects
> > (files or directories).  Dealing with fine-grained inconsistency severely
> > complicates the repair process, I would guess; it is not even clear if the
> > client would have access to the whole file version it is attempting to
> > integrate.  For disconnected operation anyway, it seems like transferring
> > the whole file is more useful, as the chances are high that if you access
> > a bit of the file, you will access all of it (loading it into emacs,
> > writing it out, etc).
> 
> Whoops, this is a good point.  However, the conflict resolution mechanisms
> themselves would use the chunk fetching code, so it need not really be a
> problem.

The problem situation I was thinking of was this: Client1 is connected,
and retrieves the middle chunk of a file.  A write is made to the middle
chunk, but before it can be written back, Client1 goes disconnected.  We
now have a pending write on the middle chunk of the file, but only the
middle chunk is on Client1.  Client2 now bops up and proceeds to modify
the file in some manner, and succeeds.  Client1 now reconnects.  A
Client-Server conflict has arisen and must be resolved for the change to
be reintegrated.  However, because only a small part of the entire file is
available on Client1, the resolution process may now be more difficult.
Consider, for example, the case where it is an MS Word file.  An
application-specific resolver is required, but it doesn't have access to
the two complete versions of the file; it may not even have the old file
header :(.  The whole-file-in-cache model is a simplification for version
control that I think really does make life easier.
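
To make the cache-coverage point concrete, here is a minimal sketch
(hypothetical structures, not Coda's actual ones) of what a chunking
client's cache might record; a resolver handed only this cannot
reconstruct either complete version of the file:

#include <sys/types.h>
#include <stddef.h>

/*
 * Hypothetical sketch: with whole-file caching, a cache entry
 * implies the complete file is present locally; with chunking,
 * only some byte ranges are.
 */
struct cached_chunk {
    off_t  offset;              /* start of the cached byte range */
    size_t length;              /* bytes actually present locally */
    int    dirty;               /* write logged while disconnected */
};

struct cache_entry {
    const char          *name;      /* file system object */
    struct cached_chunk *chunks;    /* possibly sparse coverage */
    int                  nchunks;   /* may cover only part of the file */
};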

On the other hand, chunking would definitely improve performance
(especially perceived performance when running 'more' or the like--the
latency to the first available data is much lower).  Maybe this is an appropriate
application for the 'client class' behavior I suggest below, and that we
both seem to agree is a large project and should wait :(.

> > > I was really hoping to have home directories mounted over coda, with inbox
> > > being stored right in the accounts, (and also large procmail filtered
> > > mailing-list archived mail folders) but that won't be feasible until at
> > > least write-back caching is available in a connected state.
> > > 
> > > I just got coda running recently, but the initial excitement has faded
> > > somewhat after discovering the above.. :(
> > 
> > My suspicion is that the arrangement you describe will suffer from Coda's
> > weak consistency model: if multiple clients are using write-back caching,
> > then conflicts can occur.  
> 
> Write-back caching will have the same semantics as connected Coda.
> If another client comes along, then the one holding the write back token
> will have to reintegrate first.
> 
> Conflicts arise in connected Coda as easily as you would overwrite data
> in AFS (last close wins in AFS).  The problem with receiving email
> in Coda is locking to avoid conflicts.  I don't know how AFS does this,
> but with NFS it is certainly possible to ruin your mailbox easily.

Token-like behavior for file systems is clearly very nice, and would
improve consistency.  However, this is a departure from the traditional
Coda consistency model.  With replicated servers, how will tokens be
allocated, and by which server(s)?

In Coda, conflicts arise more easily than under AFS's last-close behavior
because of replication.  An 'AFS-class client' that used last-close and
timestamps to manage conflicts might result in unexpected but at least
non-interactive behavior, as sketched below.
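
A minimal sketch of such timestamp-based, last-writer-wins resolution
(hypothetical helper, not existing Coda or AFS code):

#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical last-writer-wins resolver: given two conflicting
 * replicas of the same object, keep whichever was modified last.
 * The older update is silently discarded -- exactly the unexpected
 * but non-interactive behavior described above.
 */
const char *
resolve_by_mtime(const char *path_a, const char *path_b)
{
    struct stat sa, sb;

    if (stat(path_a, &sa) < 0 || stat(path_b, &sb) < 0)
        return NULL;
    return (sa.st_mtime >= sb.st_mtime) ? path_a : path_b;
}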

> It's a good puzzle to see if Coda's connected semantics allow for the
> atomic creation of a lock file. Perhaps that is just possible.  On the
> other hand, I don't really have much more faith in AFS or NFS without lock
> daemons when it comes to my mail.

I would guess that Coda does not allow atomic creation on a replicated
volume, only on an unreplicated one.  Even then, only Venus will know
whether it was atomic and successful; if the client is disconnected, then
the userland mail process only sees the lock file creation succeed, and
doesn't know it has been logged.  Similarly, you might have problems with
lock files being left around: client is connected, creates lock file, and
then goes disconnected.  This is a lot like that nasty Netscape problem
of crashing and leaving lock files all over the place, only in this case
the disconnected mail client still thinks it has an atomic lock :O.
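
For reference, the usual atomic lock-file idiom on a local POSIX file
system looks like the sketch below (hypothetical helper); the trouble is
that on a disconnected client the open() appears to succeed while the
create has merely been logged for later reintegration:

#include <fcntl.h>
#include <unistd.h>

/*
 * Classic atomic lock-file creation: O_CREAT|O_EXCL fails with
 * EEXIST if the file already exists, so exactly one process can
 * win.  On a disconnected Coda client this can "succeed" locally
 * while the create is only logged -- the caller cannot tell.
 */
int
acquire_lock(const char *lockpath)
{
    int fd = open(lockpath, O_CREAT | O_EXCL | O_WRONLY, 0600);

    if (fd < 0)
        return -1;          /* lock held elsewhere, or a real error */
    close(fd);
    return 0;               /* we (apparently) hold the lock */
}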

As such, a disconnected system really needs to support lock preemption,
possibly notification, and certainly verification that a lock is still
valid.  Perhaps an optional distributed lock manager could be used with
Coda (presumably replicated with strong consistency in the style of Ubik,
or using a multi-party lock algorithm).  Disconnected operation still
introduces uncomfortable situations, but at least connected clients could
guarantee locks.  My suspicion, however, is that when an application
already has specific multi-user locking semantics, it should be served by
its own replication mechanism and not by a file system with weak
consistency.  So replicated IMAP servers might be a better solution, with
IMAP's disconnected operation and reintegration techniques.  Or a mail
reader that takes advantage of Coda as a message store with weak
semantics.
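
As a sketch of the verification side, a client would need to re-check a
lock before trusting it; here a lock older than an assumed staleness
threshold is treated as preempted (hypothetical helper and threshold):

#include <sys/stat.h>
#include <time.h>

#define LOCK_TTL 300        /* assumed staleness threshold, seconds */

/*
 * Re-validate a lock rather than trusting it: a lock file that has
 * vanished, or has not been refreshed within LOCK_TTL, is treated
 * as preempted.  Returns 1 if the lock still looks valid, else 0.
 */
int
lock_still_valid(const char *lockpath)
{
    struct stat st;

    if (stat(lockpath, &st) < 0)
        return 0;           /* lock file is gone */
    return (time(NULL) - st.st_mtime) < LOCK_TTL;
}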

> > This is not to suggest that Coda is not useful in such an environment; 
> > its real benefits come in the case of mobile computing.  It might be
> > interesting to introduce the concept of different 'classes' of client: 
> > that is, the semantics and consistency enforced for a particular client
> > might depend on the role it was expected to play.  
> 
> Yup, unfortunately, that's a rather major project probably.

It sounds like it.  Ideally I see something like this:

venus -consistency strong
venus -consistency afs
venus -consistency codamobile
venus -consistency slush

In each case, the strongest consistency available would be used, but the
fallback behavior when it wasn't available would differ.  That is, if you
started venus with codamobile, you'd get AFS or strong consistency while
connected, but logging and reintegration when disconnected.  With AFS
consistency at startup, you'd get strong or AFS consistency while
connected, and when disconnected everything would either hang or obey
last-write based on timestamps or something.  With strong, you'd get
strong consistency or hangs.
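
A toy sketch of the fallback rule this implies (hypothetical names, not
a proposed venus interface):

/* Consistency classes venus might be started with, strongest first. */
enum consistency { STRONG, AFS, CODAMOBILE, SLUSH };

/*
 * While connected, every class gets the strongest consistency
 * available (assumed here to be STRONG); once disconnected,
 * behavior degrades only as far as the startup class allows:
 * STRONG hangs, AFS falls back to timestamp-based last-write,
 * CODAMOBILE to log-and-reintegrate.
 */
enum consistency
effective_class(enum consistency started, int connected)
{
    return connected ? STRONG : started;
}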

  Robert N Watson 

[email protected]              http://www.watson.org/~robert/
PGP key fingerprint: 03 01 DD 8E 15 67 48 73  25 6D 10 FC EC 68 C1 1C

Carnegie Mellon University            http://www.cmu.edu/
TIS Labs at Network Associates, Inc.  http://www.tis.com/
SafePort Network Services             http://www.safeport.com/
Received on 1999-01-20 16:08:23