How to solve the Find duplicate files task step by step in the PicoLisp programming language

Problem Statement

In a large directory structure it is easy to inadvertently leave unnecessary copies of files around, which can use considerable disk space and create confusion.

Create a program which, given a minimum size and a folder/directory, will find all files of at least that size (in bytes) with duplicate contents under the directory, and output or show the sets of duplicate files in order of decreasing size. The program may be command-line or graphical, and duplicate content may be determined by direct comparison or by calculating a hash of the data. Specify which filesystems or operating systems your program works with if it has any filesystem- or OS-specific requirements. Identify hard links (filenames referencing the same content) in the output if applicable for the filesystem. For extra points, detect when whole directory sub-trees are identical, or optionally remove or link identical files.

Let's start with the solution:

Step-by-step solution: Find duplicate files in the PicoLisp programming language

Source code in the PicoLisp programming language

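# Read-time check used to restrict this code to 64-bit PicoLisp ('native' is required)
`(== 64 64)
# mmap(NULL, L, PROT_READ, MAP_PRIVATE, F, 0); the constants 1 and 2 are the Linux values
(de mmap (L F)
   (native "@" "mmap" 'N 0 L 1 2 F 0) )
(de munmap (A L)
   (native "@" "munmap" 'N A L) )
# Hash S bytes at address M with XXH64 from libxxhash, using seed 0
(de xxh64 (M S)
   (let
      (R (native "libxxhash.so" "XXH64" 'N M S 0)
         P `(** 2 64) )
      # 'N yields a signed number: map negative results into the unsigned 64-bit range
      (if (lt0 R)
         (& (+ R P) (dec P))
         R ) ) )
# Walk Dir recursively and collect paths in the 'idx' tree D,
# one entry per file size: (Size (Path ..))
(de walk (Dir)
   (recur (Dir)
      (for F (dir Dir)
         (let (Path (pack Dir "/" F)  Info (info Path T))
            (when (car Info)
               (if (=T (car Info))  # T marks a directory
                  (recurse Path)
                  (if (lup D (car Info))
                     (push (cdr @) Path)
                     (idx 'D (list (car Info) (cons Path)) T) ) ) ) ) ) ) )
(off D)
(walk "/bin")
# Only sizes shared by two or more files can hold duplicates. For each such
# size, group the paths by hash and print the groups with more than one file.
# Sizes are visited in ascending order, and no minimum size is applied.
# Note that the descriptor returned by 'open' is not closed here.
(for Lst (filter cdadr (idx 'D))
   (let L
      (by
         '((F)
            (let (M (mmap (car Lst) (open F T))
               S (car Lst) )
               (prog1 (xxh64 M S) (munmap M S)) ) )
         group
         (cadr Lst) )
      (and (filter cdr L) (println (car Lst) @)) ) )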
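
The task asks for a minimum size and for output in order of decreasing size, which the listing above does not apply. One way to add both could look like the sketch below. The helper name candidates and its parameters Lst and Min are hypothetical and not part of the original solution. It assumes entries shaped like those that walk stores in D, namely (Size (Path ..)), and the sample entries are made up for the demonstration.

```picolisp
# Hypothetical helper: from a list of (Size (Path ..)) entries, keep the
# sizes shared by at least two files and not smaller than Min, largest first.
(de candidates (Lst Min)
   (flip   # 'idx' lists entries in ascending order, so reverse the result
      (filter
         '((X) (and (cdadr X) (>= (car X) Min)))
         Lst ) ) )

# Demonstration with made-up entries instead of a real directory walk
(off D)
(idx 'D (list 10 (list "a" "b")) T)
(idx 'D (list 500 (list "c")) T)
(idx 'D (list 2000 (list "d" "e")) T)
(idx 'D (list 9000 (list "f" "g")) T)
(println (candidates (idx 'D) 1000))
```

In the listing above, the final loop could then iterate over (candidates (idx 'D) Min) instead of (filter cdadr (idx 'D)), with Min set to the desired minimum size in bytes.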
