I'm having trouble matching over more than one line. What's wrong?

Either you don't have newlines in your string, or you aren't using the correct modifier(s) on your pattern.

There are many ways to get multiline data into a string. If you want it to happen automatically while reading input, you'll want to set $/ (probably to '' for paragraphs or undef for the whole file) to allow you to read more than one line at a time.
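As a rough sketch of how the value of $/ changes what a single read returns, the snippet below reads the same text three ways. The sample text and the helper name read_records are made up for illustration, and an in-memory filehandle stands in for real input.

```perl
use strict;
use warnings;

# Hypothetical sample text: two paragraphs separated by a blank line.
my $text = "first line\nsecond line\n\nthird line\nfourth line\n";

# Illustrative helper (not from the FAQ): read every record from an
# in-memory filehandle using the given input record separator.
sub read_records {
    my ($separator) = @_;
    local $/ = $separator;      # restored automatically when the sub returns
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @records = <$fh>;        # list context: one element per record
    close $fh;
    return @records;
}

my @lines      = read_records("\n");    # default: one line per record
my @paragraphs = read_records('');      # paragraph mode
my @whole      = read_records(undef);   # slurp mode: the whole input at once

printf "%d line records, %d paragraph records, %d whole-file record\n",
    scalar @lines, scalar @paragraphs, scalar @whole;
```

Using local keeps the change to $/ from leaking into other code that reads input.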

Read the perlre manpage to help you decide which of /s and /m (or both) you might want to use: /s allows dot to include newline, and /m allows caret and dollar to match next to a newline, not just at the end of the string. You do need to make sure that you've actually got a multiline string in there.
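To make the difference between the two modifiers concrete, here is a small sketch that tries the same patterns with and without them. The two-line string and the variable names are invented for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical two-line string.
my $string = "alpha\nbeta\n";

# Without /s, dot refuses to match the newline between the two words.
my $dot_plain = ( $string =~ /alpha.beta/  ) ? 1 : 0;
my $dot_s     = ( $string =~ /alpha.beta/s ) ? 1 : 0;

# Without /m, caret matches only at the very start of the string.
my $caret_plain = ( $string =~ /^beta/  ) ? 1 : 0;
my $caret_m     = ( $string =~ /^beta/m ) ? 1 : 0;

# Without /m, dollar matches only at the end of the string
# (or just before a final newline).
my $dollar_plain = ( $string =~ /alpha$/  ) ? 1 : 0;
my $dollar_m     = ( $string =~ /alpha$/m ) ? 1 : 0;

print "dot:    plain=$dot_plain /s=$dot_s\n";
print "caret:  plain=$caret_plain /m=$caret_m\n";
print "dollar: plain=$dollar_plain /m=$dollar_m\n";
```

The two modifiers are independent, so a pattern that needs both behaviors can carry both.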

For example, this program detects duplicate words, even when they span line breaks (but not paragraph ones). For this example, we don't need /s because we aren't using dot in a regular expression that we want to cross line boundaries. Neither do we need /m because we aren't wanting caret or dollar to match at any point inside the record next to newlines. But it's imperative that $/ be set to something other than the default, or else we won't actually ever have a multiline record read in.

    $/ = '';            # read in whole paragraphs, not just one line
    while ( <> ) {
        while ( /\b([\w'-]+)(\s+\1)+\b/gi ) {   # a word followed by repeats of itself
            print "Duplicate $1 at paragraph $.\n";
        }
    }

Here's code that finds lines that begin with "From " (which would be mangled by many mailers):

    $/ = '';            # read in whole paragraphs, not just one line
    while ( <> ) {
        while ( /^From /gm ) { # /m makes ^ match next to \n
            print "leading from in paragraph $.\n";
        }
    }

Here's code that finds everything between START and END in each input file:

    undef $/;           # read in whole file, not just one line or paragraph
    while ( <> ) {
        while ( /START(.*?)END/gs ) { # /s makes . cross line boundaries; /g advances past each match
            print "$1\n";
        }
    }
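When the data is already in a string, the same pattern can be applied in list context, where a /g match returns every captured group at once. This is only a sketch; the sample string and variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical data containing two START ... END spans,
# the first of which crosses a line boundary.
my $data = "junk START one\ntwo END more START three END tail\n";

# In list context, /g returns every capture from every match;
# /s lets the dot in (.*?) match the embedded newline.
my @chunks = $data =~ /START(.*?)END/gs;

print "[$_]\n" for @chunks;
```

The non-greedy .*? stops at the nearest END, so each span is captured separately rather than as one long match.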

Source: Perl FAQ: Regexps
Copyright: Copyright (c) 1997 Tom Christiansen and Nathan Torkington.

(Corrections, notes, and links courtesy of RocketAware.com)