Monday, February 20, 2012

File type count

Say you have a big directory of files spread across tons of subfolders, and you want to find the unique file types present, along with the count for each one. Try this:

find . | sed -e 's/\.[^.]*\.\(.*\)/\1/g' | grep -v '^\.' | sort | uniq -c > ~/files3.txt
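  • find lists every file and folder under the current directory, recursively.
  • sed replaces each path with whatever follows its second dot (the first dot is the leading ./), which is the extension for a name like ./docs/report.txt.
  • grep drops the paths that sed left unchanged, i.e. files and folders with no extension, which still start with a dot.
  • sort puts the extensions in alphabetical order; uniq only merges adjacent duplicate lines, so it needs sorted input.
  • uniq -c collapses each run of duplicates into one line prefixed with its count.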
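
To see the steps in action, here is a minimal sketch that runs the same pipeline on a throwaway tree. The directory layout and file names are made up for illustration, and the result is captured in a variable instead of being written to ~/files3.txt. The last part shows what uniq -c does when its input is not sorted.

```shell
# Build a small scratch tree (hypothetical file names) to try the pipeline on.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/docs" "$demo_dir/src"
touch "$demo_dir/docs/a.txt" "$demo_dir/docs/b.txt" \
      "$demo_dir/src/main.c" "$demo_dir/src/util.c" "$demo_dir/src/notes.txt" \
      "$demo_dir/README"

# Same pipeline as in the post, run from inside the scratch tree so that
# every path starts with ./ regardless of where mktemp put the directory.
counts=$(cd "$demo_dir" && find . | sed -e 's/\.[^.]*\.\(.*\)/\1/g' | grep -v '^\.' | sort | uniq -c)
printf '%s\n' "$counts"

# Without sort, uniq only merges neighbouring lines, so txt is reported twice.
unsorted=$(printf 'txt\nc\ntxt\n' | uniq -c)
printf '%s\n' "$unsorted"

rm -rf "$demo_dir"
```

On this tree the pipeline should report a count of 2 for c and 3 for txt, while README and the folders are filtered out. The padding in front of each count differs between uniq implementations.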
Alternatively,

find . | sed -e 's/\.[^.]*\.\(.*\)/\1/g;/\//d' | sort | uniq -c > ~/files3.txt
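  • sed takes over grep's job here: after the substitution, any line that still contains a slash had no extension, so the /\//d command deletes it (assuming folder names don't contain dots). The bare . entry for the starting directory has no slash, so it survives as one extra line in the output.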
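
If folder names might contain dots, one possible workaround is to look only at regular files and take the extension from the base name. The sketch below is an assumption-laden variant, not the command from this post: count_extensions is a hypothetical helper name, files without an extension are skipped, a hidden file such as .bashrc is counted under bashrc, and file names containing newlines are not handled.

```shell
# count_extensions [DIR]: hypothetical helper that counts regular files by
# extension, where the extension is the text after the last dot in the base name.
# Step 1: find -type f lists regular files only, so folders never appear.
# Step 2: the first sed expression strips everything up to the last slash.
# Step 3: the second sed expression keeps the text after the last dot and
#         prints only the lines where it matched (sed -n with the p flag).
# Step 4: sort | uniq -c counts each extension, sort -n orders by count.
count_extensions() {
    find "${1:-.}" -type f |
        sed -n -e 's/.*\///' -e 's/.*\.\([^.][^.]*\)$/\1/p' |
        sort | uniq -c | sort -n
}
```

Called as count_extensions ~/projects (or with no argument for the current directory), it prints one line per extension, least common first.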

    3 comments:

    1. I prefer to use a MS-Windows GUI tool: Directory Report
      http://www.file-utilities.com
      After scanning your files select menu:
      Largest / Display Largest Type

      Replies
      1. Seems like a useful app. I try to stay away from paid custom software, though.

    2. A little improvement that keeps only everything after the last slash, and sorts the counts:

      find . | sed -e 's/.*\/\(.*\)$/\1/g' | sed -e 's/.*\.\(.*\)$/\1/g' | sort | uniq -c | sort -n
