Text justification (ASCII)

January 15, 2007 by pfortuny

On my hosting server (or should I say my sdf Unix-shell account) there is a nice bboard, one of whose groups is “HELPDESK”. People come and go there, looking for and receiving help. It is a helpful and amiable area, in which I have learnt a lot (and which has also helped me refresh forgotten ideas or solutions to old problems).

About a week ago, someone asked there for a Unix utility to word-wrap files. For whatever reason (probably lack of attention) I understood he was looking for a justification utility, that is, a script to reformat paragraphs so that all the lines have the same length. You certainly know what it means, but Wikipedia explains it, just in case. The point is: I decided to do it, as an interesting problem in simple functional programming. I also wanted to include several options (like not justifying lines ending in “dots”, or choosing the distribution of spaces in the lines: rightwards, leftwards or randomly, etc.). I came up with this (which includes a lorem ipsum text for demo purposes and a little “joiner” program as a plus).

In this entry, I am commenting on the main loop and the justification subroutine:

The loop is (I have taken away the code for special cases):

     1  while(<>) {
     2      chomp ;
     3      $line .= $_ ;
     4      while($line) {
     5          if ($start != 1) {
     6              print $PREPEND ;
     7          }
     8          ($output, $line) = justify($start, $line);
     9          # ONLY PRINT THE OUTPUT if the line goes on, otherwise,
    10          # we need to adjust the loose line, in case it has to be justified.
    11          if ($line) {
    12              print $output ;
    13              $start = 0;
    14          }
    15      }
    16      # chomp the last part of the line and process it again, otherwise,
    17      # loose lines were always printed verbatim (which is
    18      # not necessarily (…)
            (…) justify($start, $output ) ;
    21      print $output ;

It is quite simple, as you see. The only “idea” in it is to have the justification routine return not only a line, but both the line and the outstanding text. This makes it possible to loop on $line (line 4 in the code above), taking advantage of the call in line 8, which gets the true “output” for the present line and sets $line to what remains to be formatted. This loop (the while($line) in line 4) is repeated until there is no remaining text. Finally, the last remaining output *needs* to be processed (it is what is called a “loose line”, and the user may or may not want it justified, according to the command-line parameters).

The “justify” routine takes a long line as input (and a flag telling it whether it is parsing the start of a paragraph or not) and returns two strings: a completely justified line and the outstanding text (there is some special code to deal with loose lines and with several different user preferences).
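To see the “return the line and the rest” idea in isolation, here is a minimal sketch. It is not the code of the script: wrap_once and wrap_all are hypothetical names, wrap_once only wraps (it does not pad anything), the width of 15 is arbitrary, and the last line is collected inside the loop instead of being treated as a loose line.

```perl
use strict;
use warnings;

# Hypothetical stand-in for justify(): it only wraps, it does not pad.
# Returns the first line (at most $width characters when possible)
# and the text that did not fit.
sub wrap_once {
    my ($text, $width) = @_;
    my @words = split ' ', $text;
    my @rest;
    # Move words from the end of the line into @rest until the line fits.
    while (@words > 1 and length("@words") > $width) {
        unshift @rest, pop @words;
    }
    return ("@words", "@rest");
}

# Same shape as the loop above: call again on whatever is left.
# Like the original, this stops early on text that Perl treats as
# false (the empty string or "0").
sub wrap_all {
    my ($text, $width) = @_;
    my @lines;
    while ($text) {
        (my $out, $text) = wrap_once($text, $width);
        push @lines, $out;
    }
    return @lines;
}

print "$_\n" for wrap_all("the quick brown fox jumps over the lazy dog", 15);
```

With these inputs the sketch prints three lines: “the quick brown”, “fox jumps over” and “the lazy dog”.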

Interesting chunks are:

    # inside the "justify()" subroutine
    #
    $local_line = join(" ", my @words = split(/\s+/,$local_line)) ;

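which in a single line takes away all the repeated spaces and the trailing ones from $local_line (a copy of the input line of text) [this is done with split], saves into @words a list of each “word” (anything not containing spaces in it) and joins all the words again putting single spaces in between [join]. One caveat: with the pattern /\s+/, a line that starts with a space yields an empty first “word”, so leading spaces only disappear if they were trimmed beforehand or if split(' ', ...) is used instead.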
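A small sketch of the split/join trick, comparing the two split patterns. The sample string and the helper names squeeze_awk and squeeze_regex are made up for the illustration.

```perl
use strict;
use warnings;

# split ' ' is Perl's special "awk mode": leading whitespace is skipped.
sub squeeze_awk {
    my ($text) = @_;
    return join " ", split ' ', $text;
}

# split /\s+/ keeps an empty first field when the text starts with
# whitespace, so the joined result begins with a single space.
sub squeeze_regex {
    my ($text) = @_;
    return join " ", split /\s+/, $text;
}

my $raw = "  lorem   ipsum \t dolor  ";
print "[", squeeze_awk($raw),   "]\n";   # [lorem ipsum dolor]
print "[", squeeze_regex($raw), "]\n";   # [ lorem ipsum dolor]
```

In both cases the trailing whitespace is gone, because split drops trailing empty fields.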
Then we pop words from the @words list and put them inside $overfull, which will contain the outstanding text:

    # (…)
    while($#words and length("@words") > $col_width) {
        $overfull = (pop @words) . " " . $overfull ;
    }
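The same loop, wrapped in a hypothetical helper called fit_line so that it can be run on its own (the width of 12 and the sample words are arbitrary). Two details do the work: "@words" interpolates the list joined with single spaces, so its length is the width of the unpadded line, and $#words is the last index, which is 0 (false) once a single word is left.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the words that fit in $col_width,
# joined with single spaces, and the overflow text.
sub fit_line {
    my ($col_width, @words) = @_;
    my $overfull = "";
    while ($#words and length("@words") > $col_width) {
        # Prepend, so the overflow keeps the original word order.
        $overfull = (pop @words) . " " . $overfull;
    }
    return ("@words", $overfull);
}

my ($line, $rest) = fit_line(12, qw(alpha beta gamma delta));
print "line: [$line]\n";   # line: [alpha beta]
print "rest: [$rest]\n";   # rest: [gamma delta ]
```

Note the trailing space in the overflow: it is harmless, since the split/join step cleans it up on the next call.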

Then, if more than one word remains in @words (this happens unless the line holds a single word, for instance ONE word in $local_line of length greater than the line width), distribute spaces as evenly as possible between words:

    # (…)
    if ($#words) {
        my $free_space = $col_width - length(join ("", @words)) ;
        $space = " " x int($free_space / $#words) ;
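A worked example of the arithmetic, as a sketch. The helper name gap_sizes, the width of 20 and the sample words are assumptions for the illustration; the helper expects at least two words, which is what the if ($#words) test guarantees in the original.

```perl
use strict;
use warnings;

# Hypothetical helper: how much space there is to hand out, how many
# spaces every gap gets, and how many are left over.
sub gap_sizes {
    my ($col_width, @words) = @_;
    my $gaps       = $#words;                         # gaps between words
    my $free_space = $col_width - length(join "", @words);
    my $base       = int($free_space / $gaps);        # spaces in every gap
    my $extra      = $free_space % $gaps;             # spaces still to place
    return ($free_space, $base, $extra);
}

my ($free, $base, $extra) = gap_sizes(20, qw(to be or not));
print "free=$free base=$base extra=$extra\n";   # free=11 base=3 extra=2
```

The four words take 9 characters, so 11 spaces go into 3 gaps: 3 in each gap, with 2 left over.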

Keep the remaining spaces in @last_space, to be distributed later on according to the user’s preferences. The keys of the hash %spaces are exactly the places where these spaces will appear.

        @last_space = (" ") x ($free_space % $#words) ;
        %spaces = ();

        for(my $i = 0; $i <= $#last_space; $i++) {
            # distribute outstanding space according to user's prefs
            $spaces{$j} = " ";
        }
        my $i = 0 ;
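The code that computes the key $j from the user’s preferences is not shown above. As a rough sketch of what such a choice could look like (the helper extra_positions, its mode names and the meaning given to “left” and “right” are assumptions, not the script’s actual options):

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: pick which of the $gaps gaps (numbered from 0)
# receive one extra space. Returns a hash whose keys are those gaps.
sub extra_positions {
    my ($gaps, $extra, $mode) = @_;
    my @all = (0 .. $gaps - 1);
    my @chosen =
          $mode eq 'left'  ? @all[0 .. $extra - 1]              # first gaps
        : $mode eq 'right' ? @all[$gaps - $extra .. $gaps - 1]  # last gaps
        :                    (shuffle @all)[0 .. $extra - 1];   # random gaps
    my %spaces = map { ($_, " ") } @chosen;
    return %spaces;
}

my %left  = extra_positions(3, 2, 'left');
my %right = extra_positions(3, 2, 'right');
print join(",", sort keys %left),  "\n";   # 0,1
print join(",", sort keys %right), "\n";   # 1,2
```

When there is nothing left over ($extra is 0) every range is empty and the hash comes back empty, so no gap gets an extra space.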

Here the output is “written” word by word, inserting after each word as much space as the algorithm has computed. Notice how we do this for all the @words but the last one, which gets special treatment, as it may be the only one in the line.

        # join words + spaces
        foreach my $word (0..$#words - 1){
            $output .= $words[$word] . $space ;
            $output .= ($spaces{$i} ? pop @last_space : "" ) ;
            $i++ ;
        }
    }

Finally, append the last word to $output, preceded by whatever is left in @last_space if $output is already non-empty, or with nothing before it if there is no output yet.

    $output .= ($output ? "@last_space" . $words[$#words] :
                $words[$#words] ) . "\n" ;
    return ($output, $overfull) ;
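Putting the padding steps together, here is a compact sketch that pads one list of words which already fits. It is a simplification, not the subroutine above: pad_line is a hypothetical name, the leftover spaces always go to the leftmost gaps, and at least one word is assumed.

```perl
use strict;
use warnings;

# Minimal sketch: pad an already-fitting list of words to $col_width,
# giving the leftover spaces to the leftmost gaps (an arbitrary choice).
sub pad_line {
    my ($col_width, @words) = @_;
    return $words[0] . "\n" unless $#words;   # single word: nothing to pad
    my $free_space = $col_width - length(join "", @words);
    my $base       = int($free_space / $#words);
    my $extra      = $free_space % $#words;
    my $output     = "";
    for my $i (0 .. $#words - 1) {
        $output .= $words[$i] . (" " x $base);
        $output .= " " if $i < $extra;        # one leftover space per gap
    }
    return $output . $words[$#words] . "\n";
}

print pad_line(20, qw(to be or not));
```

For these four words the result is exactly 20 characters wide plus the newline: the first two gaps get 4 spaces and the third gets 3.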

That is all. Comments are welcome and remember: you can download the code and do as you please with it. But don’t blame me.

What the … iPod?

February 25, 2006 by pfortuny

Sometimes you find people ranting about things they ought to think through better.

A certain Thomas Hawk gives today’s column the following title:

Thomas Hawk’s Digital Connection: iTunes, One Billion Suckers Served

And he starts complaining (and mocking people) about the DRM Apple puts in its iTunes music store files.

I thought everybody knew you can burn them to a CD or to a DVD or whatever, and there are many, many tools to do the job for you; however, Thomas seems not to. Someone tries to explain this to him in the followups. The only answer he finds is the following magnificent paragraph:

Yes, you can burn them to a CD but once you’ve invested in 10,000 songs how much fun and work is that going to be? And what about all of your metadata that you’ve customized? Will you be able to burn this over and transfer it to your new mp3 file. Admittedly I haven’t tried this but I suspect you might lose customized meta data that you entered. Burning everything to CD and then reripping is time consuming and something that you shouldn’t have to do — better to start with DRM free mp3s in the first place.

And I say: DOES THIS GUY KNOW WHAT TECHNOLOGY IS ABOUT? At the beginning he speaks of the forthcoming “killer phone”:

What happens when the killer phone is finally here? You know the one, built in terabyte of storage, lightening fast file transfer speeds, full satellite radio, a breathalyzer, your car and house key, a tiny little thing the size of credit card with a 12 mega pixel camera on it (hey it’s the future right, we can dream). What happens when this phone is out and you really want it and unfortunately Apple didn’t make it? That’s right, you’re a sucker then aren’t you. I thought so. You paid all that good money for your iTunes and now you can’t put them on your new phone because your new phone threatens Apple’s dominance.

But someone who believes in this phone… does not believe you will be able to transfer your music to it, just because of a DRM issue which has already been solved?

Amazing.

BTW: I have never bought anything from Apple. ZERO. Although I do own an iPod… cool tool.


Hi there

February 4, 2006 by pfortuny

I just downloaded Flock and created my wordpress account. Do I need anything else to be happy?

I hope to get back to this blog soon. For now I am just testing its functionality. It seems easy to use with Flock.

By the way, my homepage is here.

It seems that the pop-up window is not quite nice (a kind of semi-transparent background which looked too ugly). Should I post a bug report?

Bye for now.

Hello world!

February 4, 2006 by pfortuny

Welcome to WordPress.com. This is your first post. Edit or delete it and start blogging!