Monday, February 18, 2013

Perl : Use Text::CSV instead of split for parsing CSV lines

When you have lines with quoted fields delimited by commas, it is tempting to parse them with Perl's built-in split function:

 perl -ne '@x=split(",");' file.csv  
But split cannot correctly parse a line like this one:


"2013-02-15","478944","http://cdn.springboard.gozammer.com/mediaplayer/springboard/mediaplayer.swf?config={""externalConfiguration"":""http://www.springeagle.com/superconfig/sgv014.js"",""playlist"":""http://cms.springeagle.com/xml_feeds_advanced/index/683/rss3/668219/0/0/5/""}","1","0",""

Everything starting from "http://" and including the closing brace is a single field.

But this field itself contains the "," separator. split knows nothing about quoting, so it cuts the field at that comma and also leaves the CSV quote characters in place. The third element of the result ($x[2]) comes back truncated:


"http://cdn.springboard.gozammer.com/mediaplayer/springboard/mediaplayer.swf?config={""externalConfiguration"":""http://www.springeagle.com/superconfig/sgv014.js""
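
The miscount is easy to reproduce in a small script. The sketch below uses a shortened, made-up line with the same shape as the one above (the example.com URL and the naive_split helper are assumptions for illustration). The line holds 6 CSV fields, but split returns 7 pieces because it also cuts at the comma inside the quoted URL:

```perl
use strict;
use warnings;

# Made-up line shaped like the one in the post: the third field
# contains a comma and doubled quotes inside its outer quotes.
my $line = '"2013-02-15","478944","http://example.com/p.swf?config={""a"":""x"",""b"":""y""}","1","0",""';

# Hypothetical helper: cut at every comma, whether it is quoted or not.
sub naive_split {
    my ($text) = @_;
    return split /,/, $text;
}

my @pieces = naive_split($line);
printf "split returned %d pieces\n", scalar @pieces;
print "piece $_: $pieces[$_]\n" for 0 .. $#pieces;
```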

The Text::CSV Perl module, however, understands CSV quoting. Inside a field enclosed in double quotes, a comma is ordinary data rather than a separator, and a doubled quote ("") stands for one literal quote character. The braces have no special meaning to the module; the URL field stays in one piece simply because it is quoted.
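
In CSV quoting, a comma inside a double-quoted field is data, and "" inside such a field is one literal quote. To make that rule concrete, here is a minimal hand-written sketch of it. It is not how Text::CSV is implemented, the parse_csv_line helper is hypothetical, and it ignores cases the module handles (embedded newlines, malformed quoting, other separators):

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; use Text::CSV for real data.
sub parse_csv_line {
    my ($line) = @_;
    my @fields;
    my $field     = '';
    my $in_quotes = 0;
    my @chars     = split //, $line;

    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($in_quotes) {
            if ($c eq '"') {
                if ($i + 1 < @chars && $chars[$i + 1] eq '"') {
                    $field .= '"';    # doubled quote: one literal quote
                    $i++;
                }
                else {
                    $in_quotes = 0;   # closing quote of the field
                }
            }
            else {
                $field .= $c;         # includes commas inside quotes
            }
        }
        elsif ($c eq '"') {
            $in_quotes = 1;           # opening quote of a field
        }
        elsif ($c eq ',') {
            push @fields, $field;     # unquoted comma ends the field
            $field = '';
        }
        else {
            $field .= $c;
        }
    }
    push @fields, $field;             # last field has no trailing comma
    return @fields;
}

# Made-up line shaped like the one in the post.
my $sample = '"2013-02-15","478944","http://example.com/p.swf?config={""a"":""x"",""b"":""y""}","1","0",""';
print "[$_]\n" for parse_csv_line($sample);
```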

Here is an example that runs Text::CSV on the line above, saved as /tmp/x:

perl -MText::CSV -ne 'BEGIN { $csv = Text::CSV->new({ binary => 1 }); } chomp; $csv->parse($_) or do { warn "line $.: cannot parse\n"; next }; @x = $csv->fields(); for $x (@x) { print "$x ## " } print "\n";' /tmp/x
2013-02-15 ## 478944 ## http://cdn.springboard.gozammer.com/mediaplayer/springboard/mediaplayer.swf?config={"externalConfiguration":"http://www.springeagle.com/superconfig/sgv014.js","playlist":"http://cms.springeagle.com/xml_feeds_advanced/index/683/rss3/668219/0/0/5/"} ## 1 ## 0 ##  ## 
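
For anything longer than a one-liner, the same parsing can live in a script. The sketch below assumes Text::CSV is installed from CPAN and uses its getline method, which reads records straight from a filehandle. The read_csv_rows helper and the sample data are made up for illustration, and the in-memory filehandle is only there so the sketch runs without a file on disk:

```perl
use strict;
use warnings;
use Text::CSV;    # from CPAN; not part of core Perl

# Hypothetical helper: return every record read from an open filehandle,
# each one as an array reference of fields.
sub read_csv_rows {
    my ($fh) = @_;
    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
    my @rows;
    while (my $row = $csv->getline($fh)) {
        push @rows, $row;
    }
    return @rows;
}

# Made-up sample data shaped like the line in the post.
my $data = q{"2013-02-15","478944","http://example.com/p.swf?config={""a"":""x"",""b"":""y""}","1","0",""} . "\n";

open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
for my $row (read_csv_rows($fh)) {
    print join(' ## ', @$row), "\n";
}
close $fh;
```

To read a real file instead, open it by name (for example open my $fh, '<', '/tmp/x') and pass that handle to the helper.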

