[SOLVED] Need help using sed to remove preceding and trailing spaces in CSV

netwaves · 09-16-2010, 07:35 AM

Hello,

I need help removing the leading and trailing spaces around the commas in my CSV without destroying my address field. I'm new to regex and sed, so this is probably easy, but I just can't do it without destroying the address field. Any help would be greatly appreciated. I'm using vanilla Linux and sed 4.1.3.
I'm willing to use any regex or even awk if needed.

Example:
I need this
randall , dean, 11111 , 1309 Hillside Ave., Warsaw, VA , 23591
tina , jane , 22222, 1309 Hillside Ave.,Warsaw , VA, 23591

to become this
randall,dean,11111,1309 Hillside Ave.,Warsaw,VA,23591
tina,jane,22222,1309 Hillside Ave.,Warsaw,VA,23591

Update: Say the file name is fixed_all_3.csv. I have it most of the way with the following:
sed "s/^ *//;s/ *$//;s/ \{1,\}//g" fixed_all_3.csv > fixed_all_4.csv
The problem is that it also removes every space inside the address field, so 1309 Hillside Ave. becomes 1309HillsideAve.

kurumi · 09-17-2010, 12:08 AM

Code:

ruby -ne 'print if gsub(/\s*,\s*/,",")' file

EricTRA · 09-17-2010, 12:13 AM

Hello,

I'm sure there are shorter ways than mine since I'm just learning sed, but this should do the trick:

Code:

sed -i 's/[ ]*,[ ]*/,/' yourfile

It deletes all spaces before and after the comma and doesn't touch the other spaces in the file. It performs the changes in the file you indicate without saving it to another file. If you want to keep the original and save the changed result to another file, then remove the '-i' and redirect output to another file.

Hope that helps.

Kind regards,

Eric

netwaves · 09-17-2010, 07:19 AM

I tried this on my personal machine and it almost works perfectly but unfortunately I do not have access to Ruby on the server. :-(

ruby -ne 'print if gsub(/\s*,\s*/,",")' file

The following comes very close but seems to leave some of the extraneous spaces inside the lines in the file even when run multiple times on the same file. Could someone explain each section of the following so I can understand and possibly change/correct it for my needs? TIA

sed -i 's/[ ]*,[ ]*/,/' yourfile.csv

Sample output from above:
This:
DOEDOE ,JONJON ,T, 55555, 1012 KOLONIAL AVENUE ,NORFOLK ,VA, 23513, 5555555555, INC AD, 12/12/2012, 9, TARASEN

Became:
DOEDOE,JONJON ,T, 55555, 1012 KOLONIAL AVENUE ,NORFOLK ,VA, 23513, 5555555555, INC AD, 12/12/2012, 9, TARASEN

crts · 09-17-2010, 07:37 AM

Hi,

Eric already gave you the solution. Just one minor change needs to be done:

Code:

sed -i 's/[ ]*,[ ]*/,/g' yourfile.csv

BTW,

Code:

sed -i 's/ *, */,/g' yourfile.csv

will also do in this case. However, in the latter solution the spaces before the '*' are easy to overlook.
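
To spell out the pieces that were asked about, here is a rough sketch run on one of the sample lines from the first post. Nothing in it is GNU-specific as far as I know, so it should behave the same in other sed versions; the variable names are just for illustration.

```shell
# Sample line taken from the first post.
line='randall , dean, 11111 , 1309 Hillside Ave., Warsaw, VA , 23591'

# s/PATTERN/REPLACEMENT/FLAGS
#   ' *'  zero or more spaces ('[ ]*' means the same, it is just easier to see)
#   ','   a literal comma
#   ' *'  zero or more spaces after the comma
# Each match (comma plus surrounding spaces) is replaced by a bare comma.

# Without the g flag only the first match on each line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/ *, */,/')
echo "$first_only"

# With the g flag every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/ *, */,/g')
echo "$all"
```

Spaces that are not next to a comma never match the pattern, which is why the address keeps its internal spaces.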

netwaves · 09-17-2010, 08:34 AM

This works! Great! Thank you Eric, crts, and everyone else that helped. I'm in the process of reading Mastering Regular Expressions, Third Edition, and Pro Bash Programming: Scripting the GNU/Linux Shell, so hopefully there won't be too many such newbie questions from me soon. :-)

EricTRA · 09-17-2010, 09:17 AM

Quote:

Originally Posted by crts

Hi,

Eric already gave you the solution. Just one minor change needs to be done:

Code:

sed -i 's/[ ]*,[ ]*/,/g' yourfile.csv

BTW,

Code:

sed -i 's/ *, */,/g' yourfile.csv

will also do in this case. However, in the latter solution the spaces before the '*' are easy to overlook.

Hi,

Thanks for pointing that out. Why is it always the obvious that gets forgotten?

Kind regards,

Eric

theNbomr · 09-17-2010, 01:00 PM

Quote:

Originally Posted by EricTRA

Why is it always the obvious that gets forgotten?

Like, what happens to commas that are embedded within the fields of the CSV-formatted file?

sed is a poor tool for solving this problem, unless you know for certain that the data does not contain embedded commas. CSV files cannot be parsed easily with regular expressions alone. That is why there are whole modules written in Perl to handle CSV formatted data.
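
For illustration, here is what the substitution does to a made-up line with a quoted field. The record below is hypothetical and not from the OP's file; it only assumes the common convention of double-quoting a field that contains a comma.

```shell
# Hypothetical record: the first field is quoted and contains a comma.
line='"Hillside Ave. , Suite 4" , Warsaw , VA'

# The regex cannot tell a separator from a comma inside the quotes,
# so the spaces inside the quoted field are removed as well.
result=$(printf '%s\n' "$line" | sed 's/ *, */,/g')
echo "$result"
# prints: "Hillside Ave.,Suite 4",Warsaw,VA
```

The content of the quoted field has been changed, which a CSV-aware parser would have left alone.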

--- rod.

EricTRA · 09-17-2010, 01:04 PM

Quote:

Originally Posted by theNbomr

Like, what happens to commas that are embedded within the fields of the CSV-formatted file?

sed is a poor tool for solving this problem, unless you know for certain that the data does not contain embedded commas. CSV files cannot be parsed easily with regular expressions alone. That is why there are whole modules written in Perl to handle CSV formatted data.

--- rod.

Hi,

You're correct about that! But since the OP gave a pretty decent example of the data structure of his CSV file, it was pretty safe in my opinion.

Kind regards,

Eric

netwaves · 09-18-2010, 02:58 PM

Understood. But, the source data will never change format, embedded commas are not allowed, and I did request sed, awk, or plain regex. Those are the only tools I have available on the server. If there's a better solution that's available using these tools I would very much be interested. :-) TIA

grail · 09-19-2010, 09:09 PM

I think the sed is simple enough, but as you mention awk:

Code:

awk '{ gsub(/ *, */, ",") } 1' file
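
If the spaces at the very start and end of each line (which the attempt in the first post also stripped) should go as well, one possible sketch in awk follows. It assumes, as the OP stated, that fields never contain embedded commas; the extra spaces padded onto the first sample line are added here just for the demonstration.

```shell
# Sample lines from the first post; the first one is padded with extra
# spaces at both ends for the demonstration.
cleaned=$(printf '%s\n' \
  '  randall , dean, 11111 , 1309 Hillside Ave., Warsaw, VA , 23591  ' \
  'tina , jane , 22222, 1309 Hillside Ave.,Warsaw , VA, 23591' |
  awk '{
    gsub(/ *, */, ",")   # collapse spaces around every comma
    sub(/^ +/, "")       # trim spaces at the start of the line
    sub(/ +$/, "")       # trim spaces at the end of the line
  } 1')                  # the bare 1 prints every line, changed or not
echo "$cleaned"
```

Printing with the trailing `1` rather than using the return value of gsub as the condition means lines without any comma are passed through instead of being dropped.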

netwaves · 09-20-2010, 07:14 AM

:-) Thanks. It's always good to be enlightened by multiple solutions to a problem.