Wednesday, February 29, 2012

Show hidden files, Mac OS X

Mac OS X's Finder annoyingly does not seem to have a GUI option to show/hide hidden files/folders in the file system. But it can be done through the terminal, easily enough:

To show hidden files:
defaults write com.apple.finder AppleShowAllFiles TRUE

To hide hidden files:
defaults write com.apple.finder AppleShowAllFiles FALSE
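
Either way, Finder only picks up the change after it is relaunched; a quick way to do that from the same terminal (it just restarts Finder, nothing is lost):

killall Finder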

Not too bad, right?

Monday, February 20, 2012

File type count

Say you have a big directory of files spread across tons of subfolders, and you want to find the unique file types present, along with the count for each one. Try this:

find . | sed -e 's/\.[^.]*\.\(.*\)/\1/g' | grep -v '^\.' | sort | uniq -c > ~/files3.txt
  • find lists all the files across all subdirectories.
  • sed grabs only the extension after the . in each filename.
  • grep removes extension-less folders.
  • sort sorts all extensions alphabetically (uniq only collapses adjacent duplicate lines, so the input has to be sorted first).
  • uniq compiles the duplicate counts.
Alternatively,

find . | sed -e 's/\.[^.]*\.\(.*\)/\1/g;/\//d' | sort | uniq -c > ~/files3.txt
  • sed takes over grep's job of deleting extension-less entries by dropping any line that still contains a slash (this assumes folder pathnames don't contain dots).
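
If all you want is the extension histogram, a shorter sketch (untested here) that looks only at regular files, skips anything without a dot in its name, and sorts the counts in descending order:

find . -type f -name '*.*' | sed 's/.*\.//' | sort | uniq -c | sort -rn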

    Wednesday, February 8, 2012

    Quick web crawling

    wget -r -l 1 --http-user=$USER --http-passwd=$PASSWD $URL

    will get you a depth-1 crawl of all links starting at $URL. $USER and $PASSWD are optional, of course; they're only needed if the page requires HTTP authentication.
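
    If the crawl needs to be a bit more polite or selective, other wget flags can be bolted on; for instance, a sketch that stays below $URL, waits a second between requests, and keeps only PDFs:

    wget -r -l 1 --no-parent --wait=1 -A pdf $URL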

    Now that is nice!

    Empty line removal

    A simple sed task, handy to keep around in case one does not know sed well:

    sed '/^$/d' myFile
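
    If the "empty" lines may actually contain spaces or tabs, a slightly broader pattern (using a POSIX character class) catches those too:

    sed '/^[[:space:]]*$/d' myFile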

    That's it! So I learned how to do deletions today :)

    Thursday, February 2, 2012

    Column concatenation

    When processing text files on a shell, sometimes you want to concatenate columns from different files. Maybe you have the students' names in one file and their grades in another, or something like that. Anyway, I found several ways to concatenate columns, and since they can save a lot of time, I'm posting them up here:

    • Method 1: pr (print)
    • Method 2: paste
    • Method 3: awk
    The first two will allow you to simply concatenate columns horizontally, and nothing else. Most useful perhaps for numbering rows in a text file (aided by seq), or concatenating two perfectly aligned files. I've had both needs at times.
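
    For the row-numbering case, a small sketch (myFile is a made-up name, and bash process substitution is assumed) could be:

    paste <(seq $(wc -l < myFile)) myFile

    cat -n myFile gives much the same result with less typing, if a leading tab-separated line number is all you need.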

    The third method builds on the first two, and lets you shuffle the columns into a different order, or join columns based on their values. The first use case is neatly exemplified here:


    > cat file1
    one two three
    one two three
    one two three
    one two three
    
    > cat file2
    four five six
    four five six
    four five six
    four five six
    
    > pr -m -t -s\  file1 file2 | gawk '{print $4,$5,$6,$1}'
    four five six one
    four five six one
    four five six one
    four five six one

    See how it uses pr to concatenate the columns first, and THEN uses awk to shuffle them. I went through the manual pages, and it seems pr is geared towards formatting text for page printing, while paste is a simpler tool. I haven't used them much, but I'd suggest paste over pr in general. But both have their uses.
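
    For reference, the same shuffle with paste instead of pr should look something like this (an untested sketch on the same file1 and file2):

    paste -d' ' file1 file2 | awk '{print $4,$5,$6,$1}'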

    Then another webpage suggests the following command, which I have not tried:

    awk '
      # store the first file, indexed by col2
      NR==FNR {f1[$2] = $0; next}
      # output only if file1 contains file2's col2
      ($2 in f1) {print f1[$2], $0}
    ' file1 file2

    It seems complicated and I don't yet quite understand it, but if it does what it claims, it should come very much in handy.
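
    Judging from its own comments, the trick seems to be that NR==FNR is only true while awk reads the first file, so file1 is stored in the array f1 keyed by its 2nd column, and lines of file2 are printed (prefixed by the matching file1 line) only when their 2nd column was seen in file1. A tiny made-up example (reusing the names file1 and file2 for two new toy files):

    > cat file1
    a 10 x
    b 20 y

    > cat file2
    p 20 q
    r 30 s

    > awk 'NR==FNR {f1[$2] = $0; next} ($2 in f1) {print f1[$2], $0}' file1 file2
    b 20 y p 20 q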

    And a supporter of the join command suggests a slightly cleaner method:


    You can do that with a combination of the sort and join commands. The straightforward approach is
    join -j2 <(sort -k2 file1) <(sort -k2 file2)
    
    but that displays slightly differently than you're looking for. It just shows the common join field and then the remaining fields from each file
    "1431_at" "3973" 2.52832098784342 "653" 2.14595534191867
    "207201_s_at" "1826" 2.41685345240968 "1109" 2.13777517447307
    
    If you need the format exactly as you showed, then you would need to tell join to output in that manner
    join -o 1.1,1.2,1.3,2.1,2.2,2.3 -j2 <(sort -k2 file1) <(sort -k2 file2)
    
    where -o accepts a list of FILENUM.FIELDNUM specifiers.
    Note that the <() syntax I'm using isn't POSIX sh, so you should sort to a temporary file if you need POSIX sh syntax.

    I haven't sat down to understand this one yet either, but it looks promising.
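
    Since the answer notes that <() is not POSIX sh, the same idea with plain temporary files (names made up here) would be:

    sort -k2 file1 > file1.sorted
    sort -k2 file2 > file2.sorted
    join -j2 file1.sorted file2.sorted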
    I should be trying them out soon. Hope you do too.

    Wednesday, February 1, 2012

    Sorting text columns

    I was looking for ways to sort text files by column, and I sure found it!


    It's already really well explained at the source, so I'll just copy-paste the explanation (with due credit):

    Sorting a tab delimited file using the Unix sort command is easy once you know which parameters to use. An advanced file sort can get difficult if it has multiple columns, uses tab characters as the column separator, you want to reverse the sort order on some columns, and you want the columns sorted in non-sequential order.
    Assume that we have the following file where each column is separated by a [TAB] character:
    Group-ID   Category-ID   Text        Frequency
    ----------------------------------------------
    200        1000          oranges     10
    200        900           bananas     5
    200        1000          pears       8
    200        1000          lemons      10
    200        900           figs        4
    190        700           grapes      17
    I’d like to have this file sorted by these columns and in this specific order (note that column 4 is sorted before column 3 and that column 4 is sorted in reverse order):
    • Group ID (integer)
    • Category ID (integer)
    • Frequency “sorted in reverse order” (integer)
    • Text (alpha-numeric)
    This should sort the file into this format:
    Group-ID   Category-ID   Text        Frequency
    ----------------------------------------------
    190        700           grapes      17
    200        900           bananas     5
    200        900           figs        4
    200        1000          lemons      10
    200        1000          oranges     10
    200        1000          pears       8
    The quick answer is that these sort arguments would solve the problem:
    sort -t $'\t' -k 1n,1 -k 2n,2 -k 4rn,4 -k 3,3 <my-file>
    For a description of what it all means, please read on. The first thing we need to do is to tell sort to use TAB as the column separator (column separated or delimited), which we can do using:
    sort -t $'\t' <my-file>
    If our input file was comma separated we could have used:
    sort -t "," <my-file>
    The next step is to define that we want the file sorted by columns 1, 2, 4 and 3, in this particular order. The key argument “-k” allows us to do this. The tricky part is that you have to define the column index twice to limit the sort to any given column, e.g. like this “-k 1,1”. If you only specify it once, like this “-k 1”, you’re telling Unix “sort” to sort the file from column 1 until the end of the line, which is not what we want. If you want to sort columns 1 and 2 together you’d use “-k 1,2”. To tell sort to sort multiple columns we have to define the key argument “-k” multiple times. The sort arguments required to sort our file in column order 1, 2, 4 and 3 will therefore look like this:
    sort -t $'\t' -k 1,1 -k 2,2 -k 4,4 -k 3,3 <my-file>
    We however want the 4th column sorted in reverse order, which we can instruct sort to do by changing the argument from “-k 4,4” to “-k 4r,4”. The “r” option reverses the sort order for that column only. There’s only one problem left to solve, and that is that sort by default will interpret numbers as text and will sort e.g. the number 10 ahead of 2. We solve this by adding the “n” option to tell “sort” to sort a column using its numerical values, e.g. “-k 1n,1”. Note that the “n” option is only attached to the first number, to the left of the comma. Since the 4th column is sorted in both reversed order and using numerical values we can combine the options like this: “-k 4rn,4”.
    So by adding all of these options together we end up with:
    sort -t $'\t' -k 1n,1 -k 2n,2 -k 4rn,4 -k 3,3 <my-file>
    I hope someone will find this useful. I tested this solution on both Linux and OS X. The documentation for the Unix sort command can be found using your man command “man sort” and “info sort”.
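
    Putting the comma-separated hint together with the key options above, the same kind of sort on a hypothetical CSV would be:

    sort -t "," -k 1n,1 -k 2n,2 -k 4rn,4 -k 3,3 my-file.csv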

    NTFS-3G on Mac OS X

    So I have three NTFS hard drives, and I want to write to them from Mac OS X. There's no native write support, of course, so I found a handy little tool called NTFS-3G. It's free, though not fully trouble-free, and I decided to install it. I just managed to make it work, and the basic steps were:
    • Install MacPorts (just download and install).
    • sudo port install ntfs-3g (I'm guessing it'll take a WHILE to grab all the necessary dependencies, but it just needs to do it once. Fast network highly preferred).
    • mv /sbin/mount_ntfs /sbin/mount_ntfs.orig
    • ln -s /opt/local/bin/ntfs-3g /sbin/mount_ntfs

    The volumes will no longer appear on the Desktop, but they'll be accessible from the terminal under /Volumes/*. And to get a Finder window on one, just run open /Volumes/VolumeName. Done!
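
    If a volume ever refuses to auto-mount, it can also be mounted by hand (the device name and mount point below are hypothetical; check yours with diskutil list):

    mkdir -p /Volumes/NTFSDrive
    sudo /opt/local/bin/ntfs-3g /dev/disk1s1 /Volumes/NTFSDrive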