[darcs-users] Need help getting off darcs
David Roundy
daveroundy at gmail.com
Fri Jan 4 18:19:27 UTC 2008
On Jan 3, 2008 11:05 PM, Daniel Burrows <dburrows at algebraicthunk.net> wrote:
> On Thu, Jan 03, 2008 at 07:25:19PM -0500, David Roundy <daveroundy at gmail.com> was heard to say:
>
> > On Jan 3, 2008 6:44 PM, James D Sadler <james at jamesdsadler.com> wrote:
> > > I don't understand how patches having back references to their
> > > dependencies leads you to conclude that record would be O(N) and size
> > > of the repository O(N^2). This is not a challenge, I am just missing
> > > the 'glue' that links my statements to your conclusion!
> >
> > In general, the number of dependencies that need be stored for each
> > patch is O(N). Thus the size of each patch is O(N) and the size of
> > the repository is O(N^2), the number of patches times the size of each
> > patch. It would probably be common that patches would have many fewer
> > than N dependencies, but there are no guarantees of this, so in the
> > worst-case scenario, the repository size is O(N^2).
> >
> > darcs record would need to compute the dependencies of the new patch.
> > Computing this would in general be O(N), since we'd need to check
> > whether or not it depends on N patches in the worst-case scenario.
> > When we're lucky, it might not be this expensive, but it certainly
> > could be.
>
> I'm also kind of curious about darcs internals, so I was wondering if
> you could tell me where I'm wrong in the following reasoning.
>
> AIUI, saying that "patch A depends on patch B" is equivalent to saying
> that A touches (modifies or deletes) lines introduced by B. [0] If
> that's the case, then if M is the number of lines affected by A, surely
> N <= M, and so the asymptotic performance/size requirements are O(M)?
There are different patch types, so due to darcs replace, one hunk
patch that affects only one line of text could depend on N replace
patches. If we abandoned the concept of the patch algebra (and the
resulting freedom to introduce change types that closer match our
desired merge semantics), then yes, we could make this simplification.
You have a good point that for hunk patches depending on hunk patches,
the maximum number of dependencies is proportional to the patch size.
> Put more practically, it seems like you could avoid having to
> recalculate patches by caching which patch is responsible for any given
> line of the current pristine tree. That would inflate repository sizes
> by a constant factor but would mean that darcs didn't have to
> constantly recalculate patch relationships.
I think you greatly overestimate the frequency with which darcs has to
compute dependencies. This would certainly *not* lead to performance
improvements. Maintaining caches of which patches affect which files
(or vice versa) would be much more effective.
> I'm guessing that my problem is that I don't understand patch
> dependencies. How do you get dependencies between patches that don't
> touch the same line of a file? I looked over some material on the
> theory of patches that I found on the Web, but it didn't go into detail
> on precisely how patch dependencies are calculated.
We never calculate patch dependencies (because they aren't needed),
that's why there's no detail on the subject.
The key (as I mentioned above) is that there is more than one type of patch.
> [0] I'm only thinking about direct dependencies here, not transitive
> ones; that is, I'm ignoring the possibility of A depends on B
> depends on C. Obviously these can be inferred with a simple graph
> traversal once you have direct dependencies; I can't imagine that
> it's worse than recalculating the dependencies from scratch.
Agreed.
More information about the darcs-users
mailing list