How can I hope to use regular expressions without creating illegible and unmaintainable code?

Three techniques can make regular expressions maintainable and understandable.

Comments Outside the Regexp

Describe what you're doing and how you're doing it, using normal Perl comments.

    # turn the line into the lowercased first word, a colon, and
    # the number of characters on the rest of the line
    s/^(\w+)(.*)/ lc($1) . ":" . length($2) /ge;
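To see what that commented substitution does, here is a minimal sketch that runs it on a sample line. The sample text and the variable names `$line` and `$result` are invented for illustration and are not part of the FAQ.

```perl
use strict;
use warnings;

# Hypothetical sample input, made up for this sketch.
my $line = "Hello world, this is Perl";

# Copy the line, then turn the copy into the lowercased first word,
# a colon, and the number of characters on the rest of the line.
# /e evaluates the replacement as Perl code.
(my $result = $line) =~ s/^(\w+)(.*)/ lc($1) . ":" . length($2) /ge;

print "$result\n";    # prints "hello:20"
```

The rest of the line here is " world, this is Perl", which is 20 characters including the leading space.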

Comments Inside the Regexp

The /x modifier causes whitespace in a regexp pattern to be ignored (except in a character class) and lets you use normal comments there. As you can imagine, whitespace and comments help a lot.

/x lets you turn this:

    s{<(?:[^>'"]*|".*?"|'.*?')+>}{}gs;

into this:

    s{ <                    # opening angle bracket
        (?:                 # non-capturing grouping paren
             [^>'"] *       # 0 or more things that are neither > nor ' nor "
                |           #    or else
             ".*?"          # a section between double quotes (stingy match)
                |           #    or else
             '.*?'          # a section between single quotes (stingy match)
        ) +                 #   all occurring one or more times
       >                    # closing angle bracket
    }{}gsx;                 # replace with nothing, i.e. delete

It's still not quite as clear as prose, but it is very useful for describing the meaning of each part of the pattern.
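As a rough sketch of the technique in use, the block below applies a commented /x pattern of the same shape to a sample string. It also shows one consequence of /x worth remembering: because bare whitespace and # are no longer literal, a literal space has to go in a character class (or be backslashed) and a literal # has to be escaped. The sample strings and the names `$html`, `$text`, and `$num` are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical sample markup, made up for this sketch.
my $html = q{<p class="a>b">Hello, <b>world</b></p>};

(my $text = $html) =~ s{
    <                   # opening angle bracket
    (?:                 # non-capturing group
        [^>'"] *        # characters that are neither > nor ' nor "
      |                 #   or else
        ".*?"           # a double-quoted section (non-greedy)
      |                 #   or else
        '.*?'           # a single-quoted section (non-greedy)
    ) +                 # one or more times
    >                   # closing angle bracket
}{}gsx;                 # replace with nothing, i.e. delete

print "$text\n";        # prints "Hello, world"

# Under /x, a literal space is written as [ ] and a literal # as \#.
my $num = "item #42" =~ / item [ ] \# (\d+) /x ? $1 : undef;

print "$num\n";         # prints "42"
```

Note that the > inside the quoted attribute value does not end the tag early, which is what the two quoted alternatives are for.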

Different Delimiters

While we normally think of patterns as being delimited with / characters, they can be delimited by almost any character. The perlre manpage describes this. For example, the s/// above uses braces as delimiters. Selecting another delimiter can avoid having to escape the delimiter within the pattern:

    s/\/usr\/local/\/usr\/share/g;      # bad delimiter choice
    s#/usr/local#/usr/share#g;          # better
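The following sketch runs the same replacement with two different delimiter choices and checks that they agree. The sample path string and the variable names are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample value, made up for this sketch.
my $path = "/usr/local/bin:/usr/local/sbin";

# '#' as the delimiter: no slashes need escaping.
(my $with_hash = $path) =~ s#/usr/local#/usr/share#g;

# Paired braces as delimiters: same result.
(my $with_braces = $path) =~ s{/usr/local}{/usr/share}g;

# The match operator accepts other delimiters too when written as m.
my $starts_local = $path =~ m{^/usr/local/} ? 1 : 0;

print "$with_hash\n";      # prints "/usr/share/bin:/usr/share/sbin"
print "$with_braces\n";    # prints "/usr/share/bin:/usr/share/sbin"
print "$starts_local\n";   # prints "1"
```

One reason to prefer braces over # when combining this with /x: if # is the delimiter, it is no longer free to start a comment inside the pattern.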

Source: Perl FAQ: Regexps
Copyright: Copyright (c) 1997 Tom Christiansen and Nathan Torkington.
