[cfarm-users] GCC repository clones (was: gcc110 /home is now 98% full)

Jacob Bachmeyer jcb62281 at gmail.com
Fri Mar 8 00:55:51 CET 2024


Segher Boessenkool via cfarm-users wrote:
> On Thu, Mar 07, 2024 at 03:25:45PM +0000, Jonathan Wakely via cfarm-users wrote:
>   
>> On Thu, 7 Mar 2024 at 15:20, Zach van Rijn via cfarm-users
>> <cfarm-users at lists.tetaneutral.net> wrote:
>>     
>>> On Thu, 2024-03-07 at 10:32 +0100, Martin Guy via cfarm-users
>>> wrote:
>>>       
>>>> Il 06/03/24 20:31, Zach van Rijn via cfarm-users ha scritto:
>>>>         
>>>>> On Wed, 2024-03-06 at 13:27 -0500, Sean McGovern via
>>>>> cfarm-users wrote:
>>>>> I did a quick check, but did not log in as root
>>>>>           
>>>> So he's the only big user who isn't hiding anything :)
>>>>         
>>> Top 25 account for 1.0TB, top 50 account for 1.3TB, and the rest
>>> account for 0.2TB. Data current as of this message.
>>>       
>> A GCC clone, build dir, and install dir is over 10GB, so to be useful
>> for GCC development many of us need at least that much.
>>     
>
> But many, many of those are weeks out of date so almost certainly
> completely useless.
>   

I seem to remember that Git can use another local repository as a 
"reference" when cloning a repository; objects present in the reference 
repository need not be copied into the new repository and the reference 
is read-only.  Combined with Git's feature of using hardlinks to save 
space between local clones when possible, perhaps cfarm machines should 
start carrying "public" reference repositories of most of GCC's history 
in system space?
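
As a rough sketch of what this would look like from a user's side: the 
function name and the reference path in the example are made up for 
illustration (no such directory exists on the cfarm machines today); 
only the --reference option itself is real Git.

```shell
# Clone an upstream repository, borrowing objects from a local
# reference repository instead of downloading and storing them again.
#   $1 = reference repository (local path)
#   $2 = upstream URL
#   $3 = destination directory
# The function name is illustrative, not an existing tool.
clone_with_reference() {
    # --reference records the reference's object directory in
    # DEST/.git/objects/info/alternates, so objects already present
    # there are neither fetched nor copied into the new clone.
    git clone --reference "$1" "$2" "$3"
}

# Example (hypothetical reference path):
#   clone_with_reference /opt/cfarm/git-reference/gcc.git \
#       https://gcc.gnu.org/git/gcc.git "$HOME/gcc"
```

A clone made this way keeps depending on the reference repository for 
the borrowed objects, so the reference must stay in place for as long as 
any clone uses it.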

Even just an annual "snapshot" would save most of the disk space needed 
for a GCC clone.  Multiple editions of the reference repository can be 
easily maintained by cloning the previous reference (on the same volume, 
which will use hardlinks), then pulling updates to form a new 
reference.  Each reference repository would be read-only once built and 
available for local use with the git clone --reference option.  If I 
understand correctly, this would also greatly reduce the network traffic 
(and therefore time) required to obtain a new GCC repository clone.
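
Building a new edition from the previous one might look roughly like 
the following.  This is only a sketch under my own assumptions: the 
function name and the paths in the example are hypothetical, and I have 
assumed bare repositories and that only branches and tags are wanted.

```shell
# Build a new reference edition from the previous one, then bring it
# up to date from upstream.
#   $1 = previous edition (local path)
#   $2 = new edition (local path, same filesystem as $1)
#   $3 = upstream URL
# The function name is illustrative, not an existing tool.
make_reference_edition() {
    # A clone from a local path hardlinks the files under objects/
    # when possible, so the new edition starts out costing almost no
    # additional disk space.
    git clone --bare "$1" "$2" &&
    # Fetch the branches and tags added since the previous edition.
    git -C "$2" fetch "$3" \
        '+refs/heads/*:refs/heads/*' '+refs/tags/*:refs/tags/*'
}

# Example (hypothetical paths):
#   make_reference_edition /opt/cfarm/git-reference/gcc-2023.git \
#       /opt/cfarm/git-reference/gcc-2024.git \
#       https://gcc.gnu.org/git/gcc.git
```

Once built, the new edition would be made read-only by whatever means 
the admins prefer.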

A script could be implemented to search for .git/objects/info/alternates 
anywhere under /home and track which reference repositories actually 
have clones using them.  Reference repositories that are old and no 
longer used could be removed automatically to save space, if necessary.  
(There may be 
some clever tricks to essentially make multiple reference "snapshots" 
free, which would eliminate the need to bother with this.)
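
A minimal sketch of such a scan, assuming ordinary (non-bare) clones 
with a .git directory; the function name is my own invention and the 
output format is just one possibility.

```shell
# Print one "REFERENCE-OBJECT-DIR<TAB>CLONE" line for every clone under
# the given root that borrows objects from another repository.
#   $1 = directory to search (e.g. /home)
# Only clones with a .git directory are found; unreadable directories
# are silently skipped.  The function name is illustrative.
list_reference_users() {
    find "$1" -type f -path '*/.git/objects/info/alternates' 2>/dev/null |
    while IFS= read -r alt; do
        # Strip the fixed suffix to get the clone's working directory.
        clone=${alt%/.git/objects/info/alternates}
        # Each line of an alternates file names one object directory,
        # e.g. /path/to/reference.git/objects
        while IFS= read -r objdir || [ -n "$objdir" ]; do
            if [ -n "$objdir" ]; then
                printf '%s\t%s\n' "$objdir" "$clone"
            fi
        done < "$alt"
    done
}

# Example: count the clones using each reference
#   list_reference_users /home | cut -f1 | sort | uniq -c
```

A reference repository that never appears in the first column has no 
clones under the searched directory and would be a candidate for 
removal.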


-- Jacob

