How can I print out a word-frequency or line-frequency summary?

To do this, you have to parse each word out of the input stream. We'll pretend that by "word" you mean a chunk of alphabetics, hyphens, or apostrophes, rather than the non-whitespace-chunk idea of a word used in the previous question:

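    my %seen;                                  # word => number of occurrences
    while (<>) {
        # A word starts with a letter and continues with word characters,
        # apostrophes, or hyphens.
        while ( /(\b[^\W_\d][\w'-]+\b)/g ) {   # misses "`sheep'" and one-letter words
            $seen{$1}++;
        }
    }
    while ( my ($word, $count) = each %seen ) {
        print "$count $word\n";
    }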
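As a rough illustration of what the pattern captures, here is a minimal sketch that runs the same loop over a fixed string instead of the input stream. The sample text and the `lc` case-folding are assumptions added for the example and are not part of the FAQ's code.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the <> input stream.
my $text = "The cat's whiskers twitch; the well-fed cat naps.";

my %seen;
# Same pattern as the FAQ loop; lc() folds case so "The" and "the"
# share one count (an assumption, the FAQ code is case-sensitive).
while ( $text =~ /(\b[^\W_\d][\w'-]+\b)/g ) {
    $seen{ lc $1 }++;
}

# Print in alphabetical order so the result is easy to read.
print "$seen{$_} $_\n" for sort keys %seen;
```

Apostrophes and hyphens stay inside a word ("cat's", "well-fed"), while trailing punctuation such as ";" and "." is left out.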

If you wanted to do the same thing for lines, you wouldn't need a regular expression:

    my %seen;                                  # line => number of occurrences
    while (<>) {
        $seen{$_}++;
    }
    while ( my ($line, $count) = each %seen ) {
        print "$count $line";                  # $line still ends in its newline
    }

Because each returns the entries in no particular order, the summaries above come out unsorted. If you want the output sorted, see the section on Hashes in perlfaq4.
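One possible way to order the summary is sketched below, assuming you want the most frequent entries first with ties broken alphabetically. The contents of `%seen` and the name `@by_count` are made up for illustration; in practice the hash would be filled by one of the loops above.

```perl
use strict;
use warnings;

# Hypothetical counts, standing in for a %seen hash built from real input.
my %seen = ( the => 3, cat => 2, naps => 2, whiskers => 1 );

# Highest count first; entries with equal counts sort alphabetically.
my @by_count = sort { $seen{$b} <=> $seen{$a} or $a cmp $b } keys %seen;

# Right-align the counts so the words line up in a column.
printf "%5d %s\n", $seen{$_}, $_ for @by_count;
```

Sorting the keys and then looking up each count keeps the hash itself untouched, so the same data can be printed in a different order later.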


Source: Perl FAQ: Regexps
Copyright: Copyright (c) 1997 Tom Christiansen and Nathan Torkington.
(Corrections, notes, and links courtesy of RocketAware.com)


RocketAware.com is a service of Mib Software
Copyright 2000, Forrest J. Cavalier III. All Rights Reserved.