Thursday 23 January 2014

gdaltransform has a sed ending

What I wanted this post to be about was how to transform coordinates in a csv file from one coordinate system to another using gdaltransform. But in order to get a nicely functioning code I had to jump through some hoops.

gdaltransform is a command line tool for transforming coordinates

gdaltransform in action
Here I transform RT90 2,5g V to SWEREF99 TM. As you can see, I give the source coordinate system and the target coordinate system then hit return. I can then paste/type coordinates separated by spaces and a coordinate triplet is returned (Eastings, Northings, Elevation). Great, but what if I have a whole list of coordinates? No problem:

gdaltransform on a list of coordinates


So  gdaltransform -s_srs EPSG:3021 -t_srs EPSG:3006 < indata.csv > outdata.csv took the contents of indata.csv and returned outdata.csv. The problem is that it reads the first two space separated columns and return a file with three space separated columns. If you happen to have a data file that contains more columns than this and/or the coordinate columns aren't ordered so that x,y and perhaps z are the first three columns and/or you are a stickler and think that "csv" should mean "csv" and not "ssv" then you will need to do some work.

Lets just say that your coordinates are in columns 5 and 6 of a file called "Summer2004.csv", which mine are. The following will ask awk to skip the first line (the column headers) and then pass the contents of columns 5 and 6 to gdaltransform via "pipe" (the | thing).

awk -F "," 'NR>1 {print $5 " " $6}' Summer2004.csv | gdaltransform -s_srs EPSG:3021 -t_srs EPSG:3006 

This dumps the space separated triplets to the terminal. So far so good but not ideal. So then we replace the spaces with commas:

awk -F "," 'NR>1 {print $5 " " $6}' Summer2004.csv | gdaltransform -s_srs EPSG:3021 -t_srs EPSG:3006 | awk '{gsub(" ",",",$0); print $0}'

Again awk to the rescue. But now we want to put that data at the right hand side of our file i.e. add it as columns to the original table. We can use paste for this:

paste -d',' Summer2004.csv <(awk -F "," 'NR>1 {print $5 " " $6}' Summer2004.csv | gdaltransform -s_srs EPSG:3021 -t_srs EPSG:3006 | awk '{gsub(" ",",",$0); print $0}') > Summer2004_trans.csv

So, paste wants two files (or things to stick together) and we have told it to put a comma between the one and the other. The first thing/file is the source file "Summer2004.csv" then comes that <(awk .......) bit with all the awk gdal awk messing around and then out to "Summer2004_trans.csv".
Notice that the awk gdal stuff is in parentheses and is pointed at the paste command using <
But there is a problem:

Quicklook of csv file with missing column headers

As you can see from this Quicklook of the csv the three columns are in place but have no headers and have filled from the top. So how do we fix this? This is where sed should come riding in to the rescue. On Linux it does:

paste -d',' Summer2004.csv <(awk -F "," 'NR>1 {print $5 " " $6}' Summer2004.csv | gdaltransform -s_srs EPSG:3021 -t_srs EPSG:3006 | awk '{gsub(" ",",",$0); print $0}' | sed "1i\X,Y,Z")  > Summer2004_trans.csv

This uses sed to insert at line 1 (1i\) the column headers X,Y,Z before the now complete columns of transformed coordinates are then passed to paste to be combined into the result file.

On Mac OSX it isn't that simple. The standard installation of sed on OSX is an older version and the syntax is different. After a lot of searching for a solution to this I found nothing that works.
Except...
Install a newer version of sed. But what if OSX needs the old version? This is a problem. The OSX standard sed is installed under /usr/bin/ as it should be. Now, this should mean that OSX won't mind if we replace it with a version of sed that uses a different syntax, but let's be honest, something is bound to go wrong. Even if OSX won't throw a wobbly, some of your other installations might.
My solution is as follows:
Go to the sed download page and download sed
Then cd into the unzipped directory and run

./configure
make
sudo make install

This will put a brand new (-ish, 2009) copy of sed in your /usr/local/bin/ directory. Then open your .bash_profile and create a new alias

alias sad='/usr/local/bin/sed'

Then you can restart the terminal and try again using sad instead of sed.

paste -d',' Summer2004.csv <(awk -F "," 'NR>1 {print $5 " " $6}' Summer2004.csv | gdaltransform -s_srs EPSG:3021 -t_srs EPSG:3006 | awk '{gsub(" ",",",$0); print $0}' | sad "1i\X,Y,Z")  > Summer2004_trans.csv

All's well that ends well



A quick update on this post for a question posted on StackExchange. Here is a bash script for processing multiple files:


#!/bin/bash
#
for i in $( ls *.csv ); do
    paste -d',' ${i} <(awk -v OFS="," -F " " 'NR>1 {print $1 " " $2}' ${i} | gdaltransform -s_srs EPSG:4326 -t_srs EPSG:32633 | awk '{gsub(" ",",",$0); print $0}' | /usr/local/bin/sed "1i\X,Y,Z")  > utm${i}

done



No comments:

Post a Comment