How to resolve the algorithm Find duplicate files step by step in the REXX programming language

Published on 12 May 2024 09:40 PM

How to resolve the algorithm Find duplicate files step by step in the REXX programming language

Table of Contents

Problem Statement

In a large directory structure it is easy to inadvertently leave unnecessary copies of files around, which can use considerable disk space and create confusion.

Create a program which, given a minimum size and a folder/directory, will find all files of at least size bytes with duplicate contents under the directory and output or show the sets of duplicate files in order of decreasing size. The program may be command-line or graphical, and duplicate content may be determined by direct comparison or by calculating a hash of the data. Specify which filesystems or operating systems your program works with if it has any filesystem- or OS-specific requirements. Identify hard links (filenames referencing the same content) in the output if applicable for the filesystem. For extra points, detect when whole directory sub-trees are identical, or optionally remove or link identical files.

Let's start with the solution:

Step by Step solution about How to resolve the algorithm Find duplicate files step by step in the REXX programming language

Source code in the rexx programming language

/*REXX program to reads a (DOS) directory  and  finds and displays files that identical.*/
sep=center(' files are identical in size and content: ',79,"═")    /*define the header. */
tFID= 'c:\TEMP\FINDDUP.TMP'                      /*use this as a temporary  FileID.     */
arg maxSize aDir                                 /*obtain optional arguments from the CL*/
if maxSize='' | maxSize="," then maxSize=1000000 /*filesize limit (in bytes) [1 million]*/
aDir=strip(aDir)                                 /*remove any leading or trailing blanks*/
if right(aDir,1)\=='\'  then aDir=aDir"\"        /*possibly add a trailing backslash [\]*/
"DIR"  aDir  '/a-d-s-h /oS /s | FIND "/" >' tFID /*the (DOS) DIR output ───► temp file. */
pFN=                                             /*the previous  filename and filesize. */
pSZ=;  do j=0  while lines(tFID)\==0             /*process each of the files in the list*/
       aLine=linein(tFID)                        /*obtain (DOS) DIR's output about a FID*/
       parse var aLine . . sz fn                 /*obtain the filesize and its fileID.  */
       sz=space(translate(sz,,','),0)            /*elide any commas from the size number*/
       if sz>maxSize  then leave                 /*Is the file > maximum?  Ignore file. */
                                                 /* [↓]  files identical?  (1st million)*/
       if sz==pSZ  then  if charin(aDir||pFN,1,sz)==charin(aDir||FN,1,sz)  then do
                                                                                say sep
                                                                                say pLine
                                                                                say aLine
                                                                                say
                                                                                end
       pSZ=sz;      pFN=FN;      pLine=aLine     /*remember the previous stuff for later*/
       end   /*j*/

if lines(tFID)\==0  then 'ERASE' tFID            /*do housecleaning  (delete temp file).*/
                                                 /*stick a fork in it,  we're all done. */


/*REXX program to reads a (DOS) directory  and  finds and displays files that identical.*/
sep=center(' files are identical in size and content: ',79,"═")    /*define the header. */
parse arg !;     if !all(arg())  then exit                         /*boilerplate HELP(?)*/
signal on halt;  signal on novalue;  signal on syntax              /*handle exceptions, */

if \!dos  then call err 'this program requires the DOS [environment].'
call getTFID                                     /*defines a temporary  File ID for DOS.*/
arg maxSize aDir                                 /*obtain optional arguments from the CL*/
if maxSize='' | maxSize="," then maxSize=1000000 /*filesize limit (in bytes) [1 million]*/
if \isInt(maxSize)      then call err  "maxSize isn't an integer:"       maxSize
if maxSize<0            then call err  "maxSize can't be negative:"      maxSize
if maxSize=0            then call err  "maxSize can't be zero:"          maxSize
aDir=strip(aDir)                                 /*remove any leading or trailing blanks*/
if right(aDir,3)=='*.*' then aDir=substr(aDir,1,length(aDir)-3)   /*adjust the dir name.*/
if right(aDir,1)\=='\'  then aDir=aDir"\"        /*possibly add a trailing backslash [\]*/
@dir    = 'DIR'                                  /*literal for the (DOS)  DIR  command. */
@dirNots= '/a-d-s-h'                             /*ignore DIRs, SYSTEM, and HIDDEN files*/
@dirOpts= '/oS /s'                               /*sort DIR's (+ subdirs) files by size.*/
@filter = '| FIND "/"'                           /*the "lines" must have a slash [/].   */
@erase  = 'ERASE'                                /*literal for the (DOS)  ERASE command.*/
@dir aDir @dirNots @dirOpts @filter '>' tFID     /*(DOS) DIR  output ──► temporary file.*/
pFN=                                             /*the previous  filename and filesize. */
pSZ=;  do j=0  while lines(tFID)\==0             /*process each of the files in the list*/
       aLine=linein(tFID)                        /*obtain (DOS) DIR's output about a FID*/
       parse var aLine . . sz fn                 /*obtain the filesize and its fileID.  */
       sz=space(translate(sz,,','),0)            /*elide any commas from the size number*/
       if sz>maxSize  then leave                 /*Is the file > maximum?  Ignore file. */
                                                 /* [↓]  files identical?  (1st million)*/
       if sz==pSZ  then  if charin(aDir||pFN,1,sz)==charin(aDir||FN,1,sz)  then do
                                                                                say sep
                                                                                say pLine
                                                                                say aLine
                                                                                say
                                                                                end
       pSZ=sz;      pFN=FN;      pLine=aLine     /*remember the previous stuff for later*/
       end   /*j*/

say j  'file's(j)  "examined in"  aDir           /*show information to the screen.*/
if lines(tFID)\==0  then 'ERASE'  tFID           /*do housecleaning  (delete temp file).*/
exit                                             /*stick a fork in it,  we're all done. */
/*═════════════════════════════general 1─line subs══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════*/
!all:  !!=!;!=space(!);upper !;call !fid;!nt=right(!var('OS'),2)=="NT";!cls=word('CLS VMFCLEAR CLRSCREEN',1+!cms+!tso*2);if arg(1)\==1 then return 0;if wordpos(!,"? ?SAMPLES ?AUTHOR ?FLOW")==0 then return 0;!call=']$H';call "$H" !fn !;!call=;return 1
!cal:  if symbol('!CALL')\=="VAR" then !call=; return !call
!env:  !env='ENVIRONMENT'; if !sys=="MSDOS" | !brexx | !r4 | !roo  then !env='SYSTEM'; if !os2  then !env="OS2"!env; !ebcdic=1=='f1'x;     return
!fid:  parse upper source !sys !fun !fid . 1 . . !fn !ft !fm .; call !sys; if !dos  then do; _=lastpos('\',!fn); !fm=left(!fn,_); !fn=substr(!fn,_+1); parse var !fn !fn "." !ft; end;     return word(0 !fn !ft !fm, 1+('0'arg(1)))
!rex:  parse upper version !ver !vernum !verdate .; !brexx='BY'==!vernum; !kexx="KEXX"==!ver; !pcrexx='REXX/PERSONAL'==!ver | "REXX/PC"==!ver; !r4='REXX-R4'==!ver; !regina="REXX-REGINA"==left(!ver,11); !roo='REXX-ROO'==!ver; call !env;   return
!sys:  !cms=!sys=='CMS'; !os2=!sys=="OS2"; !tso=!sys=='TSO' | !sys=="MVS"; !vse=!sys=='VSE'; !dos=pos("DOS",!sys)\==0|pos('WIN',!sys)\==0|!sys=="CMD"; call !rex;                          return
!var:  call !fid; if !kexx  then return space(dosenv(arg(1)));             return space(value(arg(1),,!env))
err:       say;  say;  say  center(' error! ', 60, "*");  say;  do j=1  for arg();  say arg(j);  say;  end;  say;  exit 13
getdTFID:  tfid=p(!var("TMP") !var('TEMP') homedrive()"\"); if substr(tfid,2,1)==':'&substr(tfid,3,1)\=="\" then tfid=insert('\',t,2);        return strip(tfid,"T",'\')"\"arg(1)'.'arg(2)
getTFID:   if symbol('TFID')=="LIT" then tfid=; if tfid\=='' then return tfid; gfn=word(arg(1) !fn,1);gft=word(arg(2) "TMP",1); tfid='TEMP';if !tso  then tfid=gfn"."gft;if !cms  then tfid=gfn','gft",A4";if !dos then tfid=getdTFID(gfn,gft);return tfid
halt:      call err 'program has been halted.'
homedrive: if symbol('HOMEDRIVE')\=="VAR"  then homedrive=p(!var('HOMEDRIVE') "C:");   return homedrive
isint:     return datatype(arg(1),'W')
novalue:   syntax:   call err 'REXX program' condition("C") 'error',condition("D"),'REXX source statement (line' sigl"):",sourceline(sigl)
p:         return word(arg(1),1)
s:         if arg(1)==1  then return arg(3);   return word(arg(2) 's',1)


  

You may also check:How to resolve the algorithm Sum and product of an array step by step in the Joy programming language
You may also check:How to resolve the algorithm Poker hand analyser step by step in the Clojure programming language
You may also check:How to resolve the algorithm Variadic function step by step in the Golo programming language
You may also check:How to resolve the algorithm Rate counter step by step in the Yabasic programming language
You may also check:How to resolve the algorithm Hilbert curve step by step in the BQN programming language