[darcs-users] index format
Petr Rockai
me at mornfall.net
Tue Jun 9 10:01:30 UTC 2009
Eric Kow <kowey at darcs.net> writes:
> I realise this is really more for Petr to say, but since I'm posting an update
> email, I might as well mention that I noticed this patch just now:
>
> Tue Jun 9 08:02:02 BST 2009 Petr Rockai <me at mornfall.net>
> * Make peekItem safe even when dirlen is Nothing.
>
> And the updated version of peekItem
>
> peekItem :: ForeignPtr () -> Int -> Maybe Int -> IO Item
> peekItem fp off dirlen =
>     withForeignPtr fp $ \p -> do
>       nl' :: Int32 <- peekByteOff p off
>       let nl = fromIntegral nl'
>           path = fromForeignPtr (castForeignPtr fp) (off + 4) (nl - 1)
>           path_noslash = (BS.last path == '/') ? (BS.init path, path)
>           hash = fromForeignPtr (castForeignPtr fp) (off + 4 + nl) 64
>           name = snd $ case dirlen of
>                    Just split -> BS.splitAt split path_noslash
>                    Nothing -> BS.spanEnd (/= '/') path_noslash
>       return $! Item { iName = name
>                      , iPath = path
>                      , iHash = hash
>                      , iSize = plusPtr p (off + 4 + nl + 64)
>                      , iAux = plusPtr p (off + 4 + nl + 64 + 8)
>                      }
>
> No more undefined! Thanks!
Well, we instead have a code line that's never executed (and never tested, so
if it's buggy -- although I guess it's not -- people will get funny behaviour
instead of a clear bug in the unlikely case they somehow use this code
directly). I'm not sure it's very cool, but I guess the opposition to
error/undefined is too strong for whatever reason. (I'm just wondering if one
of these days when we get HPC reports of darcs, people will complain about
never-executed code lines ...) And I guess we should now bury this discussion?
> Yet another question
> --------------------
> Yes, it's about the index again, and no it's not that important to me. So
> we've established that the index uses a binary format so that we can load it in
> faster. Since the index is organised into "lines" (watch out, haddock thinks
> your quote marks are links), how about making the lines textual lines by using
> a newline to separate them?
> The trick here is that we're still using a binary format, reading and writing
> to it in a rigid non-parsy way. It's just that we systematically end each
> entry with a byte that coincidentally renders in people's terminals a nice way.
> I was thinking that something like this might reduce the opacity of the index
> somehow. Not that anybody should be mucking around in it, but I can imagine
> that on the off chance that something goes wrong, maybe just the slight
> improvement in ease of visual inspection (not to mention grepping) will pay off
> in the future?
Ahw about hyperlinks. As for tacking newlines between entries, I'm not sure
it's really useful, since the file contains binary garbage in-between
anyway. You can use "hd" or hexedit to conveniently look at the file
contents. A tool for dumping the index content would probably be more useful
though, and is trivial to write with hashed-storage. Also, you can look into
your index with ghci:
... > :m + Storage.Hashed.Tree Storage.Hashed.Index Storage.Hashed
... > a <- readIndex "_darcs/index" >>= unfold
... > printPath a ""
== Listing Tree (immediates only):
install-sh
hpc.README
darcs.cabal
[snip]
... > printPath a "bugs"
== Listing Tree bugs (immediates only):
time-stamps.sh
record-scaling.sh
nfs-failure.sh
newlines.sh
merging_newlines.sh
[snip]
I guess I'll extend the listing format to include hashes and entry types. Maybe
I'll also add specialised code to dump timestamps (those are internal to the
Index and there's no external interface to access them).
Yours,
Petr.
--
Peter Rockai | me()mornfall!net | prockai()redhat!com
http://blog.mornfall.net | http://web.mornfall.net
"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
-- Blair P. Houghton on the subject of C program indentation