If you are doing this conversion using Microsoft Excel on a Mac, you'll note that the resulting file does not have Unix-style newlines. A Unix new line is the 0x0a character, also written as \n. What Excel produces is the 0x0d character, also written as \r.
Most Linux commands do not recognize \r as a line ending. There are several ways to convert the \r characters to proper Linux style line endings. Using the vi editor is a common method. However, there is also the issue that sometimes if the Excel spreadsheet has blank columns, Excel insists on writing a possibly large number of \r characters at the end of the csv file. The vi method would write a newline per each of these \r characters and that is not ideal.
Instead, you could use this perl one-liner to accomplish both : turn all \r into \n except for the trailing \r characters :
perl -ne 's/([^\r])\r/$1\n/g; s/\r//g; print;' imported.csv
The regular expression replaces any non \r character followed by \r with the non \r character followed by a \n. Since the trailing \r characters do not match this pattern, they are thus ignored. The second regexp removes these \r characters.
No comments:
Post a Comment