LinuxQuestions.org
Help answer threads with 0 replies.
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 09-16-2010, 07:35 AM   #1
netwaves
LQ Newbie
 
Registered: Sep 2010
Location: Norfolk Virginia USA
Distribution: Linux, Mandriva, Fedora, Ubuntu
Posts: 5

Rep: Reputation: 1
Question Need help using sed to remove preceding and trailing spaces in CSV


Hello,

I need help removing the preceding and trailing spaces around the commas in my CSV without destroying my address field. I'm new to regex and sed so this is probably easy but I just can't do it without destroying the Address section. Any help would be greatly appreciated. I'm using vanilla Linux and sed 4.1.3
I'm willing to use any regex or even awk if needed.

Example:
I need this
randall , dean, 11111 , 1309 Hillside Ave., Warsaw, VA , 23591
tina , jane , 22222, 1309 Hillside Ave.,Warsaw , VA, 23591

to become this
randall,dean,11111,1309 Hillside Ave.,Warsaw,VA,23591
tina,jane,22222,1309 Hillside Ave.,Warsaw,VA,23591

Update: Say the file name is fixed_all_3.csv I have it most of the way with the following:
sed "s/^ *//;s/ *$//;s/ \{1,\}//g" fixed_all_3.csv > fixed_all_4.csv
The problem is that it also removes all the spaces from the address section so 1309 Hillside Ave. becomes 1309hillsideave.

Last edited by netwaves; 09-16-2010 at 02:05 PM. Reason: Update:
 
Old 09-17-2010, 12:08 AM   #2
kurumi
Member
 
Registered: Apr 2010
Posts: 228

Rep: Reputation: 53
Code:
ruby -ne 'print if gsub(/\s*,\s*/,",")' file
 
Old 09-17-2010, 12:13 AM   #3
EricTRA
LQ Guru
 
Registered: May 2009
Location: Gibraltar, Gibraltar
Distribution: Fedora 20 with Awesome WM
Posts: 6,805
Blog Entries: 1

Rep: Reputation: 1297Reputation: 1297Reputation: 1297Reputation: 1297Reputation: 1297Reputation: 1297Reputation: 1297Reputation: 1297Reputation: 1297
Hello,

I'm sure there will be shorter ways then mine since I'm just learning sed but this should do the trick:
Code:
sed -i 's/[ ]*,[ ]*/,/' yourfile
It deletes all spaces before and after the comma and doesn't touch the other spaces in the file. It performs the changes in the file you indicate without saving it to another file. If you want to keep the original and save the changed result to another file, then remove the '-i' and redirect output to another file.

Hope that helps.

Kind regards,

Eric

Last edited by EricTRA; 09-17-2010 at 12:41 AM. Reason: Forgot to mention -i
 
Old 09-17-2010, 07:19 AM   #4
netwaves
LQ Newbie
 
Registered: Sep 2010
Location: Norfolk Virginia USA
Distribution: Linux, Mandriva, Fedora, Ubuntu
Posts: 5

Original Poster
Rep: Reputation: 1
I tried this on my personal machine and it almost works perfectly but unfortunately I do not have access to Ruby on the server. :-(

ruby -ne 'print if gsub(/\s*,\s*/,",")' file

The following comes very close but seems to leave some of the extraneous spaces inside the lines in the file even when run multiple times on the same file. Could someone explain each section of the following so I can understand and possibly change/correct it for my needs? TIA

sed -i 's/[ ]*,[ ]*/,/' yourfile.csv

Sample output from above:
This:
DOEDOE ,JONJON ,T, 55555, 1012 KOLONIAL AVENUE ,NORFOLK ,VA, 23513, 5555555555, INC AD, 12/12/2012, 9, TARASEN

Became:
DOEDOE,JONJON ,T, 5555, 1012 KOLONIAL AVENUE ,NORFOLK ,VA, 23513, 5555555555, INC AD, 12/12/2012, 9, TARASEN
 
1 members found this post helpful.
Old 09-17-2010, 07:37 AM   #5
crts
Senior Member
 
Registered: Jan 2010
Posts: 2,020

Rep: Reputation: 757Reputation: 757Reputation: 757Reputation: 757Reputation: 757Reputation: 757Reputation: 757
Hi,

Eric already gave you the solution. Just one minor change needs to be done:
Code:
sed -i 's/[ ]*,[ ]*/,/g' yourfile.csv
BTW,
Code:
sed -i 's/ *, */,/g' yourfile.csv
will also do in this case. However, in the latter solution the spaces before the '*' might be overseen.

Last edited by crts; 09-17-2010 at 07:42 AM.
 
1 members found this post helpful.
Old 09-17-2010, 08:34 AM   #6
netwaves
LQ Newbie
 
Registered: Sep 2010
Location: Norfolk Virginia USA
Distribution: Linux, Mandriva, Fedora, Ubuntu
Posts: 5

Original Poster
Rep: Reputation: 1
This works! Great! Thank you Eric, crts, and everyone else that helped. I'm in the process of reading Mastering Regular Expressions, Third Edition, and Pro Bash Programming: Scripting the GNU/Linux Shell, so hopefully there won't be too many such newbie questions from me soon. :-)
 
Old 09-17-2010, 09:17 AM   #7
EricTRA
LQ Guru
 
Registered: May 2009
Location: Gibraltar, Gibraltar
Distribution: Fedora 20 with Awesome WM
Posts: 6,805
Blog Entries: 1

Rep: Reputation: 1297Reputation: 1297Reputation: 1297Reputation: 1297Reputation: 1297Reputation: 1297Reputation: 1297Reputation: 1297Reputation: 1297
Quote:
Originally Posted by crts View Post
Hi,

Eric already gave you the solution. Just one minor change needs to be done:
Code:
sed -i 's/[ ]*,[ ]*/,/g' yourfile.csv
BTW,
Code:
sed -i 's/ *, */,/g' yourfile.csv
will also do in this case. However, in the latter solution the spaces before the '*' might be overseen.
Hi,

Thanks for pointing that out. Why is it always the obvious that gets forgotten

Kind regards,

Eric
 
Old 09-17-2010, 01:00 PM   #8
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908Reputation: 908
Quote:
Originally Posted by EricTRA View Post
Why is it always the obvious that gets forgotten
Like, what happens to commas that are embedded within the fields of the CVS formatted file?

sed is a poor tool for solving this problem, unless you know for certain that the data does not contain embedded commas. CSV files cannot be parsed easily with regular expressions alone. That is why there are whole modules written in Perl to handle CSV formatted data.

--- rod.
 
Old 09-17-2010, 01:04 PM   #9
EricTRA
LQ Guru
 
Registered: May 2009
Location: Gibraltar, Gibraltar
Distribution: Fedora 20 with Awesome WM
Posts: 6,805
Blog Entries: 1

Rep: Reputation: 1297Reputation: 1297Reputation: 1297Reputation: 1297Reputation: 1297Reputation: 1297Reputation: 1297Reputation: 1297Reputation: 1297
Quote:
Originally Posted by theNbomr View Post
Like, what happens to commas that are embedded within the fields of the CVS formatted file?

sed is a poor tool for solving this problem, unless you know for certain that the data does not contain embedded commas. CSV files cannot be parsed easily with regular expressions alone. That is why there are whole modules written in Perl to handle CSV formatted data.

--- rod.
Hi,

You're correct about that! But since the OP gave a pretty decent example of the data structure of his CSV file, it was pretty safe in my opinion.

Kind regards,

Eric
 
Old 09-18-2010, 02:58 PM   #10
netwaves
LQ Newbie
 
Registered: Sep 2010
Location: Norfolk Virginia USA
Distribution: Linux, Mandriva, Fedora, Ubuntu
Posts: 5

Original Poster
Rep: Reputation: 1
Understood. But, the source data will never change format, embedded commas are not allowed, and I did request sed, awk, or plain regex. Those are the only tools I have available on the server. If there's a better solution that's available using these tools I would very much be interested. :-) TIA
 
Old 09-19-2010, 09:09 PM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192Reputation: 3192Reputation: 3192Reputation: 3192Reputation: 3192Reputation: 3192Reputation: 3192Reputation: 3192Reputation: 3192Reputation: 3192Reputation: 3192
I think the sed is simple enough, but as you mention awk:
Code:
awk 'gsub(/ *, */,",")' file
 
Old 09-20-2010, 07:14 AM   #12
netwaves
LQ Newbie
 
Registered: Sep 2010
Location: Norfolk Virginia USA
Distribution: Linux, Mandriva, Fedora, Ubuntu
Posts: 5

Original Poster
Rep: Reputation: 1
:-) Thanks. It's always good to be enlightened by multiple solutions to a problem.
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
Sed remove trailing newline nc3b Programming 20 06-26-2015 11:02 AM
[SOLVED] SED - how to remove blank spaces carolflb Linux - Newbie 2 01-30-2010 06:02 AM
Using awk/sed to convert linefeed to csv, with some formatting jaykup Programming 1 04-03-2009 05:18 PM
Manipulating comma delimited CSV - newbie help with sed etc jonnymorris Programming 16 09-19-2008 06:14 AM
Inserting spaces using SED/AWK ? aravindts Programming 4 09-29-2006 05:28 AM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 03:39 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration