dev@glassfish.java.net

Re: Quick analysis of v3 hg repo

From: Jerome Dochez <Jerome.Dochez_at_Sun.COM>
Date: Mon, 10 Dec 2007 22:07:55 -0800

This is very helpful. I think you should remove the www and
appserv-tests modules to start with and see what the size comes to.

On Dec 10, 2007, at 5:39 PM, Paul Sterk wrote:

>
> Bill,
>
> Your postings to the dev_at_glassfish alias prompted me to run some
> numbers. I ran a find for the binary file types in the
> /m/glassfish-svn-to-hg repo and then sent the output to a shell
> script that summed the bytes. If you are curious about how I got my
> results, see [1].
>
> File type   Number of files   Total Mbytes
> jar                     374            300
> dll                       1              1
> pdf                      46             30
> rar                      12             13
> swf                       3              8
> zip                      45             13
> Total                   481            365
>
> Assume that these files should not be in the v3 hg developer repos
> (pdfs could go into a separate www repo). The current size of the
> /m/glassfish-svn-to-hg repo is about 1GB. If we subtract the current
> size of the .hg repository, 600M, we are left with 400M of mostly
> text files. So, what is the breakdown?
>
> File type (estimates)   Mbytes
> Binary files               365
> Text files                 400
> hg history files           600
> Total                     1365
>
> Now, of the total non-history files in the repo (765M), 365M
> should not be there. That works out to 365M / 765M = 0.477, or 48%.
> If we removed the bloat, the revised size of the v3 .hg repository
> would be:
>
> 600M x .52 = 312M
>
> So, the pruned size of the entire hg v3 repo (text files plus text
> file history) is about 400M + 312M = 712M.
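
The estimate above can be re-run as a small script. This is only a rough sketch: it assumes the Mbyte figures from the breakdown table and, as the estimate itself does, that the history shrinks in the same proportion as the working files. The variable names are illustrative.

```shell
#!/bin/bash
# Figures in Mbytes, taken from the breakdown table in the message.
binary=365
text=400
history=600

# Share of the non-history files that is text, as a whole percentage
# rounded to the nearest integer.
non_history=$(( binary + text ))
keep_pct=$(( (text * 100 + non_history / 2) / non_history ))

# Assumption: the history shrinks in the same proportion as the files.
pruned_history=$(( history * keep_pct / 100 ))
pruned_total=$(( text + pruned_history ))

echo "keep ${keep_pct}% -> history ${pruned_history}M, total ${pruned_total}M"
```

Changing the three input figures is enough to redo the estimate for a different set of removed files.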
>
> I expect that there are a number of opportunities to further reduce
> the size of the existing svn repo. I will send a follow-up email
> that estimates the size of each of the modularized repos.
>
> Thanks,
> Paul
>
> -------------------------------------------------------------------------------------------------------------------------------------
> Notes from Ken:
>
> Note that one problem we have is people confusing the download size
> with the working copy size. The download is what hg pull sees and,
> for an initial pull, should be more or less the contents of .hg.
> hg pull may or may not compress on pull (it doesn't for ssh, but you
> can configure ssh to compress), which for text files should give
> good results. After the pull (or as part of an initial hg clone),
> the hg update roughly doubles the size of the local repository. I
> think everything we've seen says that the history is a rather small
> part of the repo size, compared to the large number (38000 or so)
> of text files.
>
> Just as an experiment, I cloned the hg repo and chopped out most of
> the big binaries. (This probably corrupted the repository, but I'm
> only interested in rough sizes here.) A tarball of the .hg directory
> takes up 304 Mbytes, which gzips to 230 MB. This might be closer to
> the size for getting all of the repository, not including further
> cleanups, and especially modularization of the workspace. For
> example, www is 191 MB, and that should probably be a separate
> repository, since most developers (other than doc writers
> contributing tutorials and such: we actually have outside
> contributors doing that for Grizzly) won't be working on the docs.
>
> If we delete the www directory, the tarball reduces to 205 MB, and
> gzips to 140 MB. Further reductions by splitting into more
> repositories should get us to a typical developer repository
> download size of 20-40 MB or so.
> -------------------------------------------------------------------------------------------------------------------------------------
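
The tarball measurement in Ken's notes could be repeated without writing any archive to disk. A minimal sketch, assuming a tar that accepts -C and writes to stdout (GNU tar and bsdtar both do). The function name tar_sizes is hypothetical and not part of the original experiment.

```shell
#!/bin/bash
# Print two numbers for directory $1: the size in bytes of an
# uncompressed tarball of it, and the size of that tarball gzipped.
tar_sizes() {
    local dir=$1
    local raw gz
    # Stream the archive straight into wc, so nothing is written to disk.
    raw=$(tar -cf - -C "$dir" . | wc -c)
    gz=$(tar -cf - -C "$dir" . | gzip -c | wc -c)
    # $(( ... )) strips any padding that wc adds on some platforms.
    printf '%s %s\n' "$(( raw ))" "$(( gz ))"
}
```

For example, `tar_sizes .hg` run at the top of a clone prints the raw and gzipped sizes in bytes, which gives a rough idea of the initial download.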
>
> [1]
> find . -name '*.jar' -exec ls -l {} \; | awk '{print $5}' > /home/psterk/jar.file.bytes
>
> addFiles.sh
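> #!/bin/bash
> # Sum the whitespace-separated byte counts in the file named by $1.
> set -- $(< "$1") ## for multiple files use: set -- $(cat "$@")
> q=$*
> # Turn "a b c" into "a + b + c" and evaluate it arithmetically.
> printf "%s\n" $(( ${q// / + } ))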
>
> addFiles.sh jar.file.bytes
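
The two steps in [1] can also be folded into one pass, with awk doing the summing instead of a separate script. This is a sketch only, assuming that the fifth column of `ls -l` is the size in bytes, as the original find command does. The function name sum_bytes is hypothetical.

```shell
#!/bin/bash
# Print the total size in bytes of all regular files under directory $1
# whose names end in .$2 (for example: sum_bytes . jar).
sum_bytes() {
    # "-exec ... {} +" batches many files into each ls call; awk then
    # adds up column 5 (the size) and prints 0 when nothing matched.
    find "$1" -type f -name "*.$2" -exec ls -l {} + |
        awk '{ sum += $5 } END { printf "%.0f\n", sum }'
}
```

Running `sum_bytes . jar`, `sum_bytes . pdf` and so on would give the byte totals behind each row of the file-type table. Divide by 1048576 for Mbytes.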