[Slogger] Sigh
Tom Hoover
lists at thimk.org
Sun Dec 4 08:53:57 EST 2005
On Tue, Nov 29, 2005 at 04:12:54PM +0000, Rory McCann wrote:
> On Mon, 28 Nov 2005, Jon B wrote:
> > 2. It saves a lot of redundant data. I wish there were a way to
> > compress the files that it saves. I might try using NTFS's file
> > compression, but I fear that would just make it even slower.
>
> I've noticed this too. I'm going to write a bash shell script that will
> compare new files to all existing files in the data/ directory and delete
> it if it's a duplicate. It'll then gzip all files. I will have to write a
> plugin for MacOSX's Spotlight so that it can search inside .gz files, I
> haven't found an existing one. This would mean I would have the advantages
> of searching through all the pages I've visited and minimizing the needed
> space.
Well, you prompted me to go ahead and do what I've been meaning to do for some
time, since my slogger directory has grown to 22G over the past year. The
following one-line bash script requires fdupes, which is packaged in most Linux
distributions (or you can google for it):
fdupes -r . | grep -v " " | gawk 'BEGIN {RS=""} { for(i=1;(i+1)<=NF;i++) print "ln -f "$i " " $(i+1) }' > runme
Explanation:
fdupes # finds all duplicate files, printing each set of duplicates as a group
grep # removes any filenames that include spaces (I was too lazy to do any fancy parsing)
gawk # builds another bash script that hard links the files in each group together
runme # chmod +x on this file, and then execute it (or, just "bash ./runme")
I don't delete the files; I hard link them. The disk space goes way down,
but all the links will continue to work. No guarantees, but it works for me.
It decreased the size of my slogger directory from 22G to 6.4G.
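The grep step in the one-liner above skips every filename that contains a
space, so those duplicates are never linked. A minimal sketch of one way
around that follows. It is not part of the original script: the function name
link_dupes is made up for illustration, and it assumes fdupes' default output
format (one path per line, with a blank line between groups of duplicates). It
still does not handle filenames containing newlines, and hard links only work
when all the files are on the same filesystem.

```shell
# Read fdupes-style output on stdin: one path per line, groups of duplicates
# separated by blank lines. Every later file in a group is replaced by a hard
# link to the first file of that group.
# NOTE: link_dupes is a hypothetical helper, not something fdupes provides.
link_dupes() {
    first=""
    while IFS= read -r path; do
        if [ -z "$path" ]; then
            first=""                    # blank line ends the current group
        elif [ -z "$first" ]; then
            first=$path                 # first file of a group is the link source
        else
            ln -f -- "$first" "$path"   # quoting keeps spaces in names intact
        fi
    done
}

# Usage (assumes fdupes is installed):
#   fdupes -r . | link_dupes
```

Unlike the original, this links files directly instead of generating a runme
script, so there is no chance to inspect the commands first. Try it on a copy
of the directory before trusting it.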
I set up a cron job to run this any time the partition containing my slogger
directory gets more than 90% full, so it should be virtually maintenance free.
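The post does not show the cron job itself, so the following is only a guess
at how such a check could look. The function names, the directory path, and
the hourly schedule in the comment are all assumptions; the dedupe command is
the one-liner from this message. It reads the usage figure from the POSIX
output format of df.

```shell
# Print the usage percentage (without the % sign) of the filesystem that
# holds the directory given as $1. Hypothetical helper name.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Run the fdupes one-liner in directory $1 if its partition is more than
# $2 percent full (default 90). Hypothetical helper name.
dedupe_if_full() {
    dir=$1
    limit=${2:-90}
    if [ "$(usage_percent "$dir")" -gt "$limit" ]; then
        (
            cd "$dir" || exit 1
            fdupes -r . | grep -v " " |
                gawk 'BEGIN {RS=""} { for (i = 1; (i+1) <= NF; i++) print "ln -f " $i " " $(i+1) }' > runme
            bash ./runme
        )
    fi
}

# Example call (the path is an assumption, adjust to your setup):
#   dedupe_if_full "$HOME/slogger" 90
# Example crontab entry for an hourly check, assuming the functions and the
# call above are saved in a script at a path of your choosing:
#   0 * * * * /path/to/slogger-dedupe.sh
```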