Duplicates, again, but of another sort ;)
Posted: Thu Nov 26, 2020 6:39 pm
I need to write a tool to find duplicate files in a folder. We are talking about some 25k files in there, so "by hand" is not an option.
The filenames are arbitrary (I just found a file named "____ ___ ___ (_____, ____. ___).docx"); the extensions are arbitrary, too.
The app that controls this folder "notices" if a given file is already present and changes the filename of the newly inserted file by adding a date plus an increment. E.g.:
If there's a "myTest.prg", the copy will be "myTest-Nov-26-1.prg"; if another occurs on the same date, it will be "myTest-Nov-26-2.prg"; if it appears tomorrow, it will be named "myTest-Nov-27-1.prg".
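To make the renaming scheme concrete, here is a minimal sketch in Python (for illustration only, not .NET) of how the suffix could be stripped to recover the presumed original name. The regex is an assumption derived purely from the three examples above (three-letter English month, day, increment), and `original_name` is a hypothetical helper name.

```python
import os
import re

# Assumed suffix shape, guessed from the examples: "-<Mon>-<day>-<n>"
# directly before the extension, e.g. "myTest-Nov-26-1.prg".
_SUFFIX = re.compile(
    r"-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-\d{1,2}-\d+$"
)

def original_name(filename: str) -> str:
    """Return the name the file presumably had before the app renamed it."""
    stem, ext = os.path.splitext(filename)
    # Names without the suffix are returned unchanged.
    return _SUFFIX.sub("", stem) + ext
```

A file whose real name happens to end in such a pattern would be misread as a copy, so the name alone can only nominate candidates, not prove anything.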
I have no control over this.
My first thought was to get a list of FileInfos and step through it, comparing file n with file n+1: if the front parts of the filenames are identical, check for same size and same change date, then delete the n+1 file (or better, move it to a backup ;->), and iterate until no dups are found.
Does that make sense?
I'd feel better if I could check whether the files are really identical, but I didn't find a tool for that in .Net (probably searched wrongly).
Maybe one could send pairs to something like WinMerge?
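As an illustration of what "really identical" could mean without an external diff tool, here is a small Python sketch (Python only for illustration; the same idea should carry over to .NET by reading both files as streams). `same_content` is a hypothetical helper name; `filecmp.cmp` is from the Python standard library.

```python
import filecmp

def same_content(path_a: str, path_b: str) -> bool:
    """True if both files hold exactly the same bytes."""
    # shallow=False forces a real byte-by-byte comparison instead of
    # trusting size and modification time from os.stat().
    return filecmp.cmp(path_a, path_b, shallow=False)
```

This compares one pair at a time, so it fits best as a final check on candidates that were already narrowed down by name or size.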
The dups usually appear rather rarely, but every now and again there are hiccups in the upstream process, and I get 1000 new ones ;-(
Any ideas welcome!
EDIT: maybe I should have consulted the web prior to writing - found some candidates, and found one which tells me how dumb I was - ignoring the first "marker" - two identical files have to be the same size...
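The "same size first" idea could look roughly like the following Python sketch (again only for illustration, not .NET): group files by size, and hash only those that share a size with at least one other file. Using SHA-256 as the content check, and the names `sha256_of` and `find_duplicates`, are my assumptions and not taken from any of the tools I found.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(folder):
    """Return lists of paths in `folder` whose contents hash identically."""
    by_size = defaultdict(list)
    for p in Path(folder).iterdir():
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for p in paths:
            by_hash[sha256_of(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

With 25k files, most should drop out at the size step, so only a small share ever gets read and hashed. The sketch only reports groups; which member to move to backup is left open.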