[SOLVED] Need help using sed to remove preceding and trailing spaces in CSV
ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
Need help using sed to remove preceding and trailing spaces in CSV
Hello,
I need help removing the preceding and trailing spaces around the commas in my CSV without destroying my address field. I'm new to regex and sed so this is probably easy but I just can't do it without destroying the Address section. Any help would be greatly appreciated. I'm using vanilla Linux and sed 4.1.3
I'm willing to use any regex or even awk if needed.
Example:
I need this
randall , dean, 11111 , 1309 Hillside Ave., Warsaw, VA , 23591
tina , jane , 22222, 1309 Hillside Ave.,Warsaw , VA, 23591
to become this
randall,dean,11111,1309 Hillside Ave.,Warsaw,VA,23591
tina,jane,22222,1309 Hillside Ave.,Warsaw,VA,23591
Update: Say the file name is fixed_all_3.csv I have it most of the way with the following:
sed "s/^ *//;s/ *$//;s/ \{1,\}//g" fixed_all_3.csv > fixed_all_4.csv
The problem is that it also removes all the spaces from the address section so 1309 Hillside Ave. becomes 1309hillsideave.
Last edited by netwaves; 09-16-2010 at 02:05 PM.
Reason: Update:
I'm sure there will be shorter ways then mine since I'm just learning sed but this should do the trick:
Code:
sed -i 's/[ ]*,[ ]*/,/' yourfile
It deletes all spaces before and after the comma and doesn't touch the other spaces in the file. It performs the changes in the file you indicate without saving it to another file. If you want to keep the original and save the changed result to another file, then remove the '-i' and redirect output to another file.
Hope that helps.
Kind regards,
Eric
Last edited by EricTRA; 09-17-2010 at 12:41 AM.
Reason: Forgot to mention -i
I tried this on my personal machine and it almost works perfectly but unfortunately I do not have access to Ruby on the server. :-(
ruby -ne 'print if gsub(/\s*,\s*/,",")' file
The following comes very close but seems to leave some of the extraneous spaces inside the lines in the file even when run multiple times on the same file. Could someone explain each section of the following so I can understand and possibly change/correct it for my needs? TIA
This works! Great! Thank you Eric, crts, and everyone else that helped. I'm in the process of reading Mastering Regular Expressions, Third Edition, and Pro Bash Programming: Scripting the GNU/Linux Shell, so hopefully there won't be too many such newbie questions from me soon. :-)
Like, what happens to commas that are embedded within the fields of the CVS formatted file?
sed is a poor tool for solving this problem, unless you know for certain that the data does not contain embedded commas. CSV files cannot be parsed easily with regular expressions alone. That is why there are whole modules written in Perl to handle CSV formatted data.
Like, what happens to commas that are embedded within the fields of the CVS formatted file?
sed is a poor tool for solving this problem, unless you know for certain that the data does not contain embedded commas. CSV files cannot be parsed easily with regular expressions alone. That is why there are whole modules written in Perl to handle CSV formatted data.
--- rod.
Hi,
You're correct about that! But since the OP gave a pretty decent example of the data structure of his CSV file, it was pretty safe in my opinion.
Understood. But, the source data will never change format, embedded commas are not allowed, and I did request sed, awk, or plain regex. Those are the only tools I have available on the server. If there's a better solution that's available using these tools I would very much be interested. :-) TIA
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.