On my hosting server (or should I say my sdf Unix-shell account) there is a nice bboard, one of whose groups is “HELPDESK”. People come and go there, looking for and receiving help. It is a helpful and amiable area, in which I have learnt a lot (and which has also helped me refresh forgotten ideas or solutions to old problems).
About a week ago, someone asked there for a Unix utility to word-wrap files. For whatever reason (probably lack of attention) I understood he was looking for a justification utility, that is, a script to reformat paragraphs so that all the lines have the same length. You certainly know what that means, but Wikipedia explains it, just in case. The point is: I decided to do it, as an interesting problem in simple functional programming. I also wanted to include several options (like not justifying lines ending in “dots”, or choosing the distribution of spaces in the lines: rightwards, leftwards or randomly, etc.). I came up with this (which includes a lorem ipsum text for demo purposes and a little “joiner” program as a plus).
In this entry, I comment on the main loop and the justification subroutine:
The loop is (I have taken away the code for special cases):
 1  while (<>) {
 2      chomp ;
 3      $line .= $_ ;
 4      while ($line) {
 5          if ($start != 1) {
 6              print $PREPEND ;
 7          }
 8          ($output, $line) = justify($start, $line) ;
 9          # ONLY PRINT THE OUTPUT if the line goes on, otherwise,
10          # we need to adjust the loose line, in case it has to be justified.
11          if ($line) {
12              print $output ;
13              $start = 0 ;
14          }
15      }
16      # chomp the last part of the line and process it again, otherwise,
17      # loose lines were always printed verbatim (which is
18      # not necessarily [...]
        [...] justify($start, $output) ;
21      print $output ;
It is quite simple, as you see. The only “idea” in it is to have the justification routine return not just a line, but both the line and the outstanding text. This makes it possible to loop on $line (line 4 in the code above), taking advantage of the call in line 8, which gets the true “output” for the present line and sets $line to what remains to be formatted. The loop repeats until there is no text remaining (this is the while($line) in line 4). Finally, the last remaining output *needs* to be processed: it is what is called a “loose line”, and the user may or may not want to justify it, according to the command-line parameters.
The “justify” routine takes a long line as input (and a flag telling it whether it is parsing the start of a paragraph or not) and returns two strings: a completely justified line and the outstanding text. (There is also some special code to deal with loose lines and with several different user preferences.)
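To show the “return the line and the rest” pattern in isolation, here is a minimal sketch. The take_chunk() helper is a hypothetical stand-in for justify(): it just cuts fixed-width chunks and does no justification at all, so only the shape of the loop matches the real script.

```perl
use strict;
use warnings;

# Hypothetical stand-in for justify(): returns the first $width characters
# and whatever text is left over (an empty string when nothing remains).
sub take_chunk {
    my ($width, $text) = @_;
    return ($text, "") if length($text) <= $width;
    return (substr($text, 0, $width), substr($text, $width));
}

my $line = "abcdefghij";
my ($output, @printed);

while ($line) {
    ($output, $line) = take_chunk(4, $line);
    # Only "print" full chunks inside the loop; the last one is held back.
    push @printed, $output if $line;
}
# Here $output holds the held-back last chunk (the "loose line").
push @printed, $output;

print "$_\n" for @printed;   # prints abcd, efgh, ij on separate lines
```

Because the routine hands back the remainder, the caller needs no index or position bookkeeping: the loop condition is simply “is there text left?”.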
Interesting chunks are:
    # inside the "justify()" subroutine
    #
    $local_line = join(" ", my @words = split(/\s+/, $local_line)) ;
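which, in a single line, collapses every run of whitespace in $local_line (a copy of the input line of text) and drops the trailing spaces [this is done with split], saves into @words a list of each “word” (anything not containing spaces) and joins all the words again, putting single spaces in between [join]. One caveat: with the pattern /\s+/, a line that begins with whitespace yields an empty first “word”, so a single leading space survives; split(' ', $local_line) would discard it.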
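A small experiment makes the behaviour of split on leading and trailing whitespace concrete. The sample string is made up for the demonstration; the two split forms are standard Perl.

```perl
use strict;
use warnings;

my $local_line = "  lorem   ipsum \t dolor  sit   ";

# split(/\s+/, ...) keeps an empty leading field when the string starts
# with whitespace; trailing empty fields are always dropped.
my @by_regex = split(/\s+/, $local_line);

# split(' ', ...) is the special "awk mode": leading whitespace is skipped.
my @by_space = split(' ', $local_line);

print scalar(@by_regex), "\n";               # 5 (first field is empty)
print scalar(@by_space), "\n";               # 4
print "[", join(" ", @by_regex), "]\n";      # [ lorem ipsum dolor sit]
print "[", join(" ", @by_space), "]\n";      # [lorem ipsum dolor sit]
```

Whether the surviving leading space matters depends on how the input has been cleaned beforehand, which the excerpt does not show.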
Then we pop words from the @words list and put them inside $overfull, which will contain the outstanding text:
    # (...)
    while ($#words and length("@words") > $col_width) {
        $overfull = (pop @words) . " " . $overfull ;
    }
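Here is the same loop run on a sample sentence, as a sketch. The column width of 20 and the text are assumptions chosen for the example, not values from the script.

```perl
use strict;
use warnings;

my $col_width = 20;   # assumed width, for illustration only
my @words     = split(' ', "the quick brown fox jumps over the lazy dog");
my $overfull  = "";

# "@words" interpolates the list joined by single spaces, so its length is
# the width of the candidate line. $#words is 0 (false) once a single word
# is left, which stops the loop even if that word is too long.
while ($#words and length("@words") > $col_width) {
    $overfull = (pop @words) . " " . $overfull;
}

print "line:     [@words]\n";      # line:     [the quick brown fox]
print "overfull: [$overfull]\n";   # overfull: [jumps over the lazy dog ]
```

Note that $overfull ends with a trailing space, since every popped word is followed by one; the whitespace normalisation at the top of the routine takes care of it on the next call.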
Then, if more than one word remains in @words (this will happen unless there is just ONE word in $local_line and it is longer than the line width), we distribute the spaces as evenly as possible between the words:
    # (...)
    if ($#words) {
        my $free_space = $col_width - length(join("", @words)) ;
        $space = " " x int($free_space / $#words) ;
We keep the remaining spaces in @last_space, to be distributed later on according to the user’s preferences. The keys of the hash %spaces are exactly the places where these spaces will appear.
        @last_space = (" ") x ($free_space % $#words) ;
        %spaces = () ;

        for (my $i = 0; $i <= $#last_space; $i++) {
            # distribute outstanding space according to user's prefs
            $spaces{$j} = " " ;
        }
        my $i = 0 ;
Here the output is “written” word by word, inserting after each word as much space as the algorithm has computed. Notice how we do this for all the @words but the last one, which gets special treatment, as it may be the only one in the line.
        # join words + spaces
        foreach my $word (0..$#words - 1) {
            $output .= $words[$word] . $space ;
            $output .= ($spaces{$i} ? pop @last_space : "") ;
            $i++ ;
        }
    }
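The arithmetic of the distribution can be tried out on its own. What follows is a simplified sketch, not the script's code: the spread() helper is hypothetical, it always hands the leftover spaces to the leftmost gaps (only one of the policies the script offers), and it assumes the words already fit in the column.

```perl
use strict;
use warnings;

# Hypothetical helper: pad @words to exactly $col_width characters.
# Assumes the words plus one space per gap already fit in $col_width.
sub spread {
    my ($col_width, @words) = @_;
    return $words[0] if @words < 2;        # a single word cannot be padded

    my $gaps       = $#words;                               # gaps between words
    my $free_space = $col_width - length(join("", @words)); # spaces to place
    my $space      = " " x int($free_space / $gaps);        # share of every gap
    my $extra      = $free_space % $gaps;                   # leftover spaces

    my $output = "";
    foreach my $i (0 .. $#words - 1) {
        $output .= $words[$i] . $space;
        $output .= " " if $i < $extra;     # leftmost gaps get one more space
    }
    return $output . $words[-1];
}

my $line = spread(26, qw(the quick brown fox));
print "[$line]\n";          # [the    quick   brown   fox]
print length($line), "\n";  # 26
```

With 16 letters in a 26-column line there are 10 spaces for 3 gaps: each gap gets int(10/3) = 3, and the remaining 10 % 3 = 1 space goes to the first gap.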
Finally, we append the last word to $output: preceded by whatever is left in @last_space if $output is already non-empty, or on its own if nothing has been written yet.
    $output .= ($output ? "@last_space" . $words[$#words] :
                          $words[$#words]) . "\n" ;
    return ($output, $overfull) ;
This is all. Comments are welcome and remember, you can download the code and do as you please with it. But don’t blame me.