icon Top 9 categories map      RocketAware > Perl >

How can I split a [character] delimited string except when inside

Tips: Browse or Search all pages for efficient awareness of Perl functions, operators, and FAQs.



Home

Search Perl pages


Subjects

By activity
Professions, Sciences, Humanities, Business, ...

User Interface
Text-based, GUI, Audio, Video, Keyboards, Mouse, Images,...

Text Strings
Conversions, tests, processing, manipulation,...

Math
Integer, Floating point, Matrix, Statistics, Boolean, ...

Processing
Algorithms, Memory, Process control, Debugging, ...

Stored Data
Data storage, Integrity, Encryption, Compression, ...

Communications
Networks, protocols, Interprocess, Remote, Client Server, ...

Hard World
Timing, Calendar and Clock, Audio, Video, Printer, Controls...

File System
Management, Filtering, File & Directory access, Viewers, ...

    

How can I split a [character] delimited string except when inside [character]? (Comma-separated files)

Take the example case of trying to split a string that is comma-separated into its different fields. (We'll pretend you said comma-separated, not comma-delimited, which is different and almost never what you mean.) You can't use split(/,/) because you shouldn't split if the comma is inside quotes. For example, take a data line like this:

    SAR001,"","Cimetrix, Inc","Bob Smith","CAM",N,8,1,0,7,"Error, Core Dumped"

Due to the restriction of the quotes, this is a fairly complex problem. Thankfully, we have Jeffrey Friedl, author of a highly recommended book on regular expressions, to handle these for us. He suggests (assuming your string is contained in $text):

     @new = ();
     push(@new, $+) while $text =~ m{
         "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # groups the phrase inside the quotes
       | ([^,]+),?
       | ,
     }gx;
     push(@new, undef) if substr($text,-1,1) eq ',';

If you want to represent quotation marks inside a quotation-mark-delimited field, escape them with backslashes (eg, "like \"this\"". Unescaping them is a task addressed earlier in this section.

Alternatively, the Text::ParseWords module (part of the standard perl distribution) lets you say:

    use Text::ParseWords;
    @new = quotewords(",", 0, $text);


Source: Perl FAQ: Data Manipulation
Copyright: Copyright (c) 1997 Tom Christiansen and Nathan Torkington.
Next: How do I strip blank space from the beginning/end of a string?

Previous: How do I capitalize all the words on one line?



(Corrections, notes, and links courtesy of RocketAware.com)


[Overview Topics]

Up to: NUL Terminated String processing




Rapid-Links: Search | About | Comments | Submit Path: RocketAware > Perl > perlfaq4/How_can_I_split_a_character_de.htm
RocketAware.com is a service of Mib Software
Copyright 2000, Forrest J. Cavalier III. All Rights Reserved.
We welcome submissions and comments