[cfarm-users] GCC repository clones (was: gcc110 /home is now 98% full)
Jacob Bachmeyer
jcb62281 at gmail.com
Fri Mar 8 00:55:51 CET 2024
Segher Boessenkool via cfarm-users wrote:
> On Thu, Mar 07, 2024 at 03:25:45PM +0000, Jonathan Wakely via cfarm-users wrote:
>
>> On Thu, 7 Mar 2024 at 15:20, Zach van Rijn via cfarm-users
>> <cfarm-users at lists.tetaneutral.net> wrote:
>>
>>> On Thu, 2024-03-07 at 10:32 +0100, Martin Guy via cfarm-users
>>> wrote:
>>>
>>>> Il 06/03/24 20:31, Zach van Rijn via cfarm-users ha scritto:
>>>>
>>>>> On Wed, 2024-03-06 at 13:27 -0500, Sean McGovern via cfarm-
>>>>> users
>>>>> wrote:
>>>>> I did a quick check, but did not log in as root
>>>>>
>>>> So he's the only big user who isn't hiding anything :)
>>>>
>>> Top 25 account for 1.0TB, top 50 account for 1.3TB, and the rest
>>> account for 0.2TB. Data current as of this message.
>>>
>> A GCC clone, build dir, and install dir is over 10GB, so to be useful
>> for GCC development many of us need at least that much.
>>
>
> But many, many of those are weeks out of date so almost certainly
> completely useless.
>
I seem to remember that Git can use another local repository as a
"reference" when cloning a repository; objects present in the reference
repository need not be copied into the new repository and the reference
is read-only. Combined with Git's feature of using hardlinks to save
space between local clones when possible, perhaps cfarm machines should
start carrying "public" reference repositories of most of GCC's history
in system space?

Even just an annual "snapshot" would save most of the disk space needed
for a GCC clone. Multiple editions of the reference repository can be
easily maintained by cloning the previous reference (on the same volume,
which will use hardlinks), then pulling updates to form a new
reference. Each reference repository would be read-only once built and
available for local use with the git clone --reference option. If I
understand correctly, this would also greatly reduce the network traffic
(and therefore time) required to obtain a new GCC repository clone.
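
A rough sketch of that workflow, written as Python helpers that only build the git command lines and run nothing. The /opt/git-reference location and the edition names are hypothetical, and the upstream URL is my assumption about where GCC is fetched from. The git options used (clone --bare, clone --reference, fetch with an explicit refspec) are standard; which of them a cfarm admin would actually choose is a guess on my part.

```python
import shlex

# Assumed upstream URL and hypothetical system-space location; adjust both.
GCC_UPSTREAM = "https://gcc.gnu.org/git/gcc.git"
REFERENCE_ROOT = "/opt/git-reference"


def new_edition_commands(previous, new, upstream=GCC_UPSTREAM):
    """Return the commands that build a new reference edition from the previous one."""
    return [
        # A clone from a local path on the same volume hardlinks existing object files.
        ["git", "clone", "--bare", previous, new],
        # Fetch the branches from upstream, so only objects newer than the
        # previous edition have to be transferred.
        ["git", "-C", new, "fetch", upstream, "+refs/heads/*:refs/heads/*"],
    ]


def user_clone_command(reference, dest, upstream=GCC_UPSTREAM):
    """Return the command for a user clone that borrows objects from a reference edition."""
    return ["git", "clone", "--reference", reference, upstream, dest]


if __name__ == "__main__":
    prev = f"{REFERENCE_ROOT}/gcc-2023.git"
    new = f"{REFERENCE_ROOT}/gcc-2024.git"
    for argv in new_edition_commands(prev, new):
        print(shlex.join(argv))
    print(shlex.join(user_clone_command(new, "gcc")))
```

A clone made this way keeps depending on the reference repository, so the reference must stay in place for as long as any clone uses it. Adding --dissociate to the user clone would keep the network savings but copy the borrowed objects, which gives up the disk savings.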

A script could search for .git/objects/info/alternates anywhere under
/home and track which reference repositories actually have clones using
them. Reference repositories that are old and no longer used by any
clone could then be removed automatically to save space, if necessary.
(There may be some clever tricks to essentially make multiple reference
"snapshots" free, which would eliminate the need to bother with this.)
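
A minimal sketch of such a scan in Python, assuming it runs with enough privileges to read every home directory. The function name is hypothetical. It matches any objects/info/alternates file, which covers both .git directories and bare repositories, and it handles only plain path entries (one per line), not quoted ones.

```python
import os
from collections import defaultdict


def find_reference_users(root):
    """Map each alternate object directory to the git directories borrowing from it."""
    users = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        # Only look at files named <something>/objects/info/alternates.
        if os.path.basename(dirpath) != "info" or "alternates" not in filenames:
            continue
        objects_dir = os.path.dirname(dirpath)
        if os.path.basename(objects_dir) != "objects":
            continue
        try:
            with open(os.path.join(dirpath, "alternates"),
                      encoding="utf-8", errors="replace") as fh:
                lines = fh.read().splitlines()
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        for entry in lines:
            # Skip blank lines and '#' comment lines.
            if not entry or entry.startswith("#"):
                continue
            # Relative entries are resolved against the borrowing objects directory.
            alt = os.path.normpath(os.path.join(objects_dir, entry))
            users[alt].append(os.path.dirname(objects_dir))
    return dict(users)


if __name__ == "__main__":
    for alt, repos in sorted(find_reference_users("/home").items()):
        print(f"{alt}: {len(repos)} clone(s)")
```

A reference edition whose objects directory never shows up as a key in the result would be a candidate for removal, subject to whatever age policy applies.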
-- Jacob