Say you have a big directory of files spread across tons of subfolders, and you want to find the unique file types present, along with the count for each one. Try this:
find . | sed -e 's/\.[^.]*\.\(.*\)/\1/g' | grep -v '^\.' | sort | uniq -c > ~/files3.txt
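- find lists every file and folder under the current directory, recursively.
- sed replaces each path with just the extension after the . in the filename.
- grep drops the lines sed left unchanged (folders and files with no extension), since those still start with a dot.
- sort sorts all extensions alphabetically. uniq needs this because it only merges duplicate lines that are adjacent.
- uniq -c collapses the duplicates and prefixes each extension with its count.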
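To see why sort has to come before uniq, here is a small sketch on made-up input (the function names and the three sample extensions are only for illustration):

```shell
# uniq only merges duplicate lines that sit next to each other.
# Without sort, the two "txt" lines are separated by "jpg" and stay separate.
unsorted_counts() { printf '%s\n' txt jpg txt | uniq -c; }

# With sort, the duplicates become adjacent and uniq -c can count them.
sorted_counts() { printf '%s\n' txt jpg txt | sort | uniq -c; }

unsorted_counts   # three lines, each with a count of 1
sorted_counts     # two lines: one for jpg, one for txt
```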
Alternatively,
find . | sed -e 's/\.[^.]*\.\(.*\)/\1/g;/\//d' | sort | uniq -c > ~/files3.txt
- sed performs grep's deletion of extension-less folders by detecting remaining slashes in the filename (assuming folder pathnames don't contain dots).
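If some folder names do contain dots, one possible workaround is to let find do the filtering instead. This is only a sketch, not a drop-in replacement: count_extensions is a name I made up, hidden files such as .bashrc get counted as an extension of their own, and filenames containing newlines will confuse it.

```shell
# Count extensions of regular files only, so folders never reach sed.
# -name '*.*' is matched against the filename alone, so a dot in a
# folder name does not matter.
count_extensions() {
    dir=${1:-.}
    find "$dir" -type f -name '*.*' |
        sed -e 's/.*\.//' |   # greedy match: strip everything up to the last dot
        sort |
        uniq -c
}

# Usage (same output file as above):
#   count_extensions . > ~/files3.txt
```

Because the sed match is greedy, a file like archive.tar.gz is counted under gz rather than tar.gz.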
I prefer to use a MS-Windows GUI tool: Directory Report
http://www.file-utilities.com
After scanning your files select menu:
Largest / Display Largest Type
Seems like a useful app. I try to stay away from paid custom software, though.
A little improvement that keeps only everything after the last slash, and sorts the counts:
find . | sed -e 's/.*\/\(.*\)$/\1/g' | sed -e 's/.*\.\(.*\)$/\1/g' | sort | uniq -c | sort -n
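To check what the two sed stages in that pipeline do, they can be tried on a few made-up paths. The function names and sample paths below are just for illustration:

```shell
# Stage 1: keep only what follows the last slash (the filename).
last_component() { sed -e 's/.*\/\(.*\)$/\1/'; }

# Stage 2: keep only what follows the last dot. A name with no dot
# passes through unchanged, so it is counted under its full name.
last_suffix() { sed -e 's/.*\.\(.*\)$/\1/'; }

printf '%s\n' ./a.b/c.txt ./a.b/README ./d.tar.gz |
    last_component | last_suffix | sort | uniq -c | sort -n
```

Here ./a.b/c.txt ends up as txt even though the folder name has a dot, README stays README, and d.tar.gz is counted as gz.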