Previously. This post is like a prequel.
This is a real story. Each day, a server dumps a MySQL database into a file:
...
-rw-rw-r-- 1 i i 6697136 Dec 1 00:00 mysql_dump_1669849201.sql.xz
-rw-rw-r-- 1 i i 6730080 Dec 2 00:00 mysql_dump_1669935601.sql.xz
-rw-rw-r-- 1 i i 6762716 Dec 3 00:00 mysql_dump_1670022001.sql.xz
-rw-rw-r-- 1 i i 6603604 Dec 4 00:00 mysql_dump_1670108401.sql.xz
-rw-rw-r-- 1 i i 6590036 Dec 5 00:00 mysql_dump_1670194801.sql.xz
-rw-rw-r-- 1 i i 6639448 Dec 6 00:00 mysql_dump_1670281201.sql.xz
-rw-rw-r-- 1 i i 6673608 Dec 7 00:00 mysql_dump_1670367601.sql.xz
-rw-rw-r-- 1 i i 6701520 Dec 8 00:00 mysql_dump_1670454001.sql.xz
...
But again, I don't need all of them; they are kept only in case of disaster. A logarithmic scale can help here as well, as it did with ZFS snapshots: keep many recent dumps and progressively fewer old ones.
Here is a general-purpose utility, written in Python, for logarithmic trimming.
#!/usr/bin/env python3

import sys, os
import math, time, datetime

dry_run=True

# Group the files given on the command line by modification time.
def get_files_list():
    global dry_run
    rt={}
    for f in sys.argv[1:]:
        if f=="--commit":
            dry_run=False
        else:
            TS=os.path.getmtime(f)
            if TS not in rt:
                rt[TS]=[f]
            else:
                rt[TS].append(f)
    return rt

files=get_files_list()
if len(files)==0:
    print ("Usage: ./logtrim.py [--commit] filemask")
    print ("By default, it's executed in dry run mode. No files get deleted.")
    print ("Add --commit to actually delete files.")
    sys.exit(1)

# These parameters are to be tuned if you want different logarithmic 'curve'...
# points in hours
points=sorted(list(set([math.floor(1.09**x) for x in range(1,120+1)])))
#print (points)

now=math.floor(time.time())
SECONDS_IN_HOUR=60*60
# points in UNIX timestamps
points_TS=sorted(list(map(lambda x: now-x*SECONDS_IN_HOUR, points)), reverse=True)
points_TS.append(0) # remove the oldest file, if it's not in range

prev=now
# we are going to keep only one file in each range
# a file to be picked randomly, or just the first/last
# if there is only one file in the range, leave it
for p in points_TS:
    print ("range", prev, p, datetime.datetime.fromtimestamp(prev), datetime.datetime.fromtimestamp(p))
    range_hi=prev
    range_lo=p
    print ("files between:")
    files_between={}
    for s in files:
        # half-closed interval:
        if s>range_lo and s<=range_hi:
            print (s, files[s])
            files_between[s]=files[s]
    print ("files_between total:", len(files_between))
    if len(files_between)>1:
        files_between_vals=list(files_between.values())
        # going to kill all files except the first
        print ("keeping this file(s):", files_between_vals[0])
        for to_kill in files_between_vals[1:]:
            print ("removing this file(s):", to_kill)
            if dry_run==False:
                for f in to_kill:
                    os.unlink(f)
    prev=p

if dry_run==True:
    print ("No files deleted.")
    print ("Add --commit to actually delete files.")
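To get a feel for the 'curve', the range boundaries can be computed on their own. This is only a small sketch that reuses the same formula as the script (base 1.09, exponents 1 to 120); the function name `boundaries_in_hours` and its parameters `base` and `steps` are made up for this illustration and are not part of logtrim.py.

```python
import math

def boundaries_in_hours(base=1.09, steps=120):
    """Return sorted, de-duplicated range boundaries, in hours before 'now'.

    Small exponents floor to the same integer, hence the set.
    """
    return sorted({math.floor(base ** x) for x in range(1, steps + 1)})

points = boundaries_in_hours()
print("number of boundaries:", len(points))
print("first ten boundaries (hours):", points[:10])
print("oldest boundary: about", points[-1] // 24, "days ago")
```

A larger base spreads the boundaries further apart, so fewer files survive; a larger number of steps pushes the oldest boundary further into the past.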
Let's run it on my list of MySQL dump files:
% ./logtrim.py testdata/*
...
range 1668748740 1668478740 2022-11-18 07:19:00 2022-11-15 04:19:00
files between:
1668549600.0 ['testdata/mysql_dump_1668553201.sql.xz']
1668636000.0 ['testdata/mysql_dump_1668639601.sql.xz']
1668722400.0 ['testdata/mysql_dump_1668726001.sql.xz']
files_between total: 3
keeping this file(s): ['testdata/mysql_dump_1668553201.sql.xz']
removing this file(s): ['testdata/mysql_dump_1668639601.sql.xz']
removing this file(s): ['testdata/mysql_dump_1668726001.sql.xz']
range 1668478740 1668187140 2022-11-15 04:19:00 2022-11-11 19:19:00
files between:
1668204000.0 ['testdata/mysql_dump_1668207601.sql.xz']
1668290400.0 ['testdata/mysql_dump_1668294001.sql.xz']
1668376800.0 ['testdata/mysql_dump_1668380401.sql.xz']
1668463200.0 ['testdata/mysql_dump_1668466801.sql.xz']
files_between total: 4
keeping this file(s): ['testdata/mysql_dump_1668207601.sql.xz']
removing this file(s): ['testdata/mysql_dump_1668294001.sql.xz']
removing this file(s): ['testdata/mysql_dump_1668380401.sql.xz']
removing this file(s): ['testdata/mysql_dump_1668466801.sql.xz']
...
No files deleted.
Add --commit to actually delete files.
...
% ./logtrim.py --commit testdata/*
...
List of files after trimming. Isn't it neat?
-rw-rw-r-- 1 i i 584695 Jun 7 2022 mysql_dump_1654552801.sql
-rw-rw-r-- 1 i i 319376 Jun 13 2022 mysql_dump_1655071201.sql.xz
-rw-rw-r-- 1 i i 742012 Jun 29 00:00 mysql_dump_1656453601.sql.xz
-rw-rw-r-- 1 i i 1063884 Jul 13 00:00 mysql_dump_1657663201.sql.xz
-rw-rw-r-- 1 i i 1929164 Jul 27 00:00 mysql_dump_1658872802.sql.xz
-rw-rw-r-- 1 i i 2401192 Aug 8 00:00 mysql_dump_1659909601.sql.xz
-rw-rw-r-- 1 i i 2311372 Aug 19 00:00 mysql_dump_1660860001.sql.xz
-rw-rw-r-- 1 i i 2860008 Aug 30 00:00 mysql_dump_1661810402.sql.xz
-rw-rw-r-- 1 i i 3294004 Sep 8 00:00 mysql_dump_1662588001.sql.xz
-rw-rw-r-- 1 i i 3366360 Sep 17 00:00 mysql_dump_1663365601.sql.xz
-rw-rw-r-- 1 i i 3914516 Sep 25 00:00 mysql_dump_1664056801.sql.xz
-rw-rw-r-- 1 i i 3986248 Oct 3 00:00 mysql_dump_1664748001.sql.xz
-rw-rw-r-- 1 i i 4183152 Oct 9 00:00 mysql_dump_1665266401.sql.xz
-rw-rw-r-- 1 i i 4466500 Oct 16 00:00 mysql_dump_1665871201.sql.xz
-rw-rw-r-- 1 i i 4380092 Oct 21 00:00 mysql_dump_1666303201.sql.xz
-rw-rw-r-- 1 i i 4906184 Oct 26 00:00 mysql_dump_1666735201.sql.xz
-rw-rw-r-- 1 i i 4877932 Oct 31 00:00 mysql_dump_1667170801.sql.xz
-rw-rw-r-- 1 i i 5012264 Nov 5 00:00 mysql_dump_1667602801.sql.xz
-rw-rw-r-- 1 i i 5151808 Nov 9 00:00 mysql_dump_1667948401.sql.xz
-rw-rw-r-- 1 i i 5088692 Nov 12 00:00 mysql_dump_1668207601.sql.xz
-rw-rw-r-- 1 i i 5286184 Nov 16 00:00 mysql_dump_1668553201.sql.xz
-rw-rw-r-- 1 i i 5196168 Nov 19 00:00 mysql_dump_1668812401.sql.xz
-rw-rw-r-- 1 i i 5290272 Nov 22 00:00 mysql_dump_1669071601.sql.xz
-rw-rw-r-- 1 i i 5340424 Nov 24 00:00 mysql_dump_1669244401.sql.xz
-rw-rw-r-- 1 i i 5692236 Nov 27 00:00 mysql_dump_1669503601.sql.xz
-rw-rw-r-- 1 i i 6463064 Nov 29 00:00 mysql_dump_1669676401.sql.xz
-rw-rw-r-- 1 i i 6697136 Dec 1 00:00 mysql_dump_1669849201.sql.xz
-rw-rw-r-- 1 i i 6762716 Dec 3 00:00 mysql_dump_1670022001.sql.xz
-rw-rw-r-- 1 i i 6603604 Dec 4 00:00 mysql_dump_1670108401.sql.xz
-rw-rw-r-- 1 i i 6639448 Dec 6 00:00 mysql_dump_1670281201.sql.xz
-rw-rw-r-- 1 i i 6673608 Dec 7 00:00 mysql_dump_1670367601.sql.xz
-rw-rw-r-- 1 i i 6729428 Dec 9 00:00 mysql_dump_1670540401.sql.xz
-rw-rw-r-- 1 i i 6755784 Dec 10 00:00 mysql_dump_1670626801.sql.xz
-rw-rw-r-- 1 i i 6786088 Dec 11 00:00 mysql_dump_1670713201.sql.xz
-rw-rw-r-- 1 i i 6838348 Dec 12 00:00 mysql_dump_1670799601.sql.xz
-rw-rw-r-- 1 i i 6873036 Dec 13 00:00 mysql_dump_1670886001.sql.xz
-rw-rw-r-- 1 i i 6801340 Dec 14 00:00 mysql_dump_1670972401.sql.xz
-rw-rw-r-- 1 i i 6832060 Dec 15 00:00 mysql_dump_1671058802.sql.xz
-rw-rw-r-- 1 i i 6944328 Dec 16 00:00 mysql_dump_1671145201.sql.xz
-rw-rw-r-- 1 i i 7102432 Dec 17 00:00 mysql_dump_1671231601.sql.xz
-rw-rw-r-- 1 i i 6967316 Dec 18 00:00 mysql_dump_1671318001.sql.xz
-rw-rw-r-- 1 i i 6992008 Dec 19 00:00 mysql_dump_1671404401.sql.xz
-rw-rw-r-- 1 i i 7018544 Dec 20 00:00 mysql_dump_1671490801.sql.xz
-rw-rw-r-- 1 i i 7047548 Dec 21 00:00 mysql_dump_1671577201.sql.xz
-rw-rw-r-- 1 i i 7272416 Dec 22 00:00 mysql_dump_1671663601.sql.xz
You can run logtrim.py as a cron job.
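Before scheduling it, you may want to try the script on throwaway files rather than on real dumps. The sketch below creates placeholder files whose modification times are spaced one day apart, using os.utime(). The helper name `make_test_files`, its parameters, and the placeholder file contents are assumptions made for this example; only the file name pattern is borrowed from the listings above.

```python
import os
import tempfile
import time

def make_test_files(directory, count=60, step_seconds=24 * 60 * 60, now=None):
    """Create `count` placeholder files with mtimes stepping back from `now`."""
    now = int(time.time() if now is None else now)
    paths = []
    for i in range(count):
        ts = now - i * step_seconds
        path = os.path.join(directory, f"mysql_dump_{ts}.sql.xz")
        with open(path, "w") as fh:
            fh.write("placeholder, not a real dump\n")
        os.utime(path, (ts, ts))  # set (atime, mtime) to the fake timestamp
        paths.append(path)
    return paths

# Demo in a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory(prefix="logtrim_test_") as demo_dir:
    created = make_test_files(demo_dir, count=5)
    for p in created:
        print(int(os.path.getmtime(p)), os.path.basename(p))
```

Pointing `make_test_files` at a directory of your own and then running ./logtrim.py on it in dry run mode shows which files would be kept.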
BUG: Several files with the same modification timestamp are treated as a single file, so they are kept or removed together.
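One possible way around this, shown only as a sketch and not as the fix used in logtrim.py, is to treat every file as its own (mtime, path) entry instead of grouping paths under a timestamp key. The function name `pick_removals` is hypothetical, and keeping the oldest entry of each range is an assumption of this example.

```python
def pick_removals(entries, boundaries, now):
    """Return the paths to remove, keeping one file per range.

    entries:    list of (mtime, path) tuples, one per file
    boundaries: timestamps in descending order, ending with 0
    now:        upper bound of the newest range
    Each range is the half-closed interval (lo, hi].
    """
    removals = []
    hi = now
    for lo in boundaries:
        # Sorting makes the choice deterministic: oldest mtime first,
        # ties broken by path name.
        in_range = sorted(e for e in entries if lo < e[0] <= hi)
        removals.extend(path for _, path in in_range[1:])
        hi = lo
    return removals

entries = [(950, "a"), (950, "b"), (700, "c"), (600, "d"), (100, "e")]
print(pick_removals(entries, boundaries=[900, 500, 0], now=1000))
```

Here "a" and "b" share a timestamp, yet only one of them is kept, because each file is judged individually.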
UPD: I use this utility to trim the list of old versions of my books: 1, 2, 3.