When processing text files on a shell, sometimes you want to concatenate columns from different files. Maybe you have the students' names on one file and their grades on the other, or something. Anyway, I found several ways to concatenate columns, and due to their potential time-saving capability, I'm posting them up here:
- Method 1: pr (print)
- Method 2: paste
- Method 3: awk
The first two will allow you to simply concatenate columns horizontally, and nothing else. Most useful perhaps for numbering rows in a text file (aided by seq), or concatenating two perfectly aligned files. I've had both needs at times.
The third method extends upon the first two, and allows you to shuffle the columns onto another order, or to join columns based on their values. The first use case is neatly exemplified here:
> cat file1 one two three one two three one two three one two three > cat file2 four five six four five six four five six four five six > pr -m -t -s\ file1 file2 | gawk '{print $4,$5,$6,$1}' four five six one four five six one four five six one four five six one
See how it uses pr to concatenate the columns first, and THEN uses awk to shuffle them. I went through the manual pages, and it seems pr is geared towards formatting text for page printing, while paste is a more simple operator. I haven't used them much, but I'd suggest paste over pr in general. Both both have their uses.
Then this other webpage suggests the following command, which I have not tried:
awk '
# store the first file, indexed by col2
NR==FNR {f1[$2] = $0; next}
# output only if file1 contains file2's col2
($2 in f1) {print f1[$2], $0}
' file1 file2
It seems complicated and I don't yet quite understand it, but if it does what it claims, it should come very much in handy.
And a supporter of the join command suggests a slightly cleaner method:
You can do that with a combination of the
sort
and join
commands. The straightforward approach isjoin -j2 <(sort -k2 file1) <(sort -k2 file2)
but that displays slightly differently than you're looking for. It just shows the common join field and then the remaining fields from each file
"1431_at" "3973" 2.52832098784342 "653" 2.14595534191867
"207201_s_at" "1826" 2.41685345240968 "1109" 2.13777517447307
If you need the format exactly as you showed, then you would need to tell
join
to output in that mannerjoin -o 1.1,1.2,1.3,2.1,2.2,2.3 -j2 <(sort -k2 file1) <(sort -k2 file2)
where
-o
accepts a list of FILENUM.FIELDNUM
specifiers.Note that the
<()
syntax I'm using isn't POSIX sh, so you should sort to a temporary file if you need POSIX sh syntax.I also haven't sat today to understand it, but it looks promising.
I should be trying them out soon. Hope you do too.
No comments:
Post a Comment