The objective of this assignment is to give you a little experience using Perl to process text files.
Please keep in mind that it's very easy to write extremely obscure Perl; style will be even more important than usual in the grading of this assignment. Be kind to your grader, and he will be kind to you.
Turn-in info: As always, follow the turnin guidelines, and please remember to put the string "341HW" in your subject line.
Your company's decided that XML (a family of text markup languages, similar to HTML) is the way of the future, and that all the old .ini and .conf configuration files that your company's applications use will now be in an XML data format. You need to write a script that your customers can use to translate legacy configuration files.
Your script must accept a number of filenames at the command line and produce properly translated files of the same name with the suffix ".xml" appended.
The input files are in the following format:
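[Section header]
key1=18
key2=list,of,comma,separated,values
... etc.
The XML format your company wants would translate the above into:
<config>
  <section name="Section header">
    <param name="key1">
      <item type="number" value="18"/>
    </param>
    <param name="key2">
      <item type="string" value="list"/>
      <item type="string" value="of"/>
      <item type="string" value="comma"/>
      <item type="string" value="separated"/>
      <item type="string" value="values"/>
    </param>
    ... etc.
  </section>
</config>
You can probably grasp the configuration formats fairly intuitively, but here's a precise definition of the input format (it sounds complicated, but it's actually quite simple):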
- The configuration file is made of sections, where each section begins with a heading. A heading consists of the section name enclosed in brackets: [Section name].
- After the heading, there will be a list of parameters, which are key/value pairs. Each key/value pair resides on a single line; the key and value are separated by an "=" sign, with no spaces around it.
- The value (the part after the "=") is either a single value or a list of values separated by commas. You do not have to worry about commas embedded in the value strings; they will never occur.
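Because every input line is either a heading or a key/value pair, a couple of regular expressions plus split are enough to classify a line. The following is a minimal sketch, assuming each line has already been chomped; the name parse_line and its return convention are hypothetical and not part of the starter script.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: classify one chomped line of a legacy config file.
# Returns ('section', $name), ('param', $key, @values), or () otherwise.
sub parse_line {
    my ($line) = @_;
    if ($line =~ /^\[(.*)\]$/) {          # brackets escaped to match literally
        return ('section', $1);
    }
    if ($line =~ /^([^=]+)=(.*)$/) {      # key is everything before the first "="
        my ($key, $rest) = ($1, $2);
        my @values = split /,/, $rest;    # commas never appear inside values
        return ('param', $key, @values);
    }
    return ();                            # blank or unrecognized line
}
```

For example, parse_line('key2=list,of,values') returns the list ('param', 'key2', 'list', 'of', 'values').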
The output format is as follows:
- Translate each section into a balanced pair of <section name="section name"> ... </section> tags.
- Translate each key into a pair of <param name="key name"> ... </param> tags.
- Each individual value must be translated into a separate <item type="foo" value="bar"/> tag. Since <item> is an "empty" tag (that is, it does not take a closing </item> tag), the tag has a / before the > symbol.
- There are two possible types for parameter values: "string" and "number". We will provide a subroutine to allow you to determine whether a scalar variable contains a number.
- The entire body of the configuration file must be enclosed in a <config> ... </config> tag.
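To make the nesting concrete, here is a minimal sketch that produces the lines for one <param> block at a given depth. It assumes two spaces per indentation level (check the sample output file for the actual indentation), and it uses looks_like_number from the core Scalar::Util module as a stand-in for the is_number subroutine in the starter script, which may behave differently. The name param_xml is hypothetical.

```perl
#!/usr/bin/perl -w
use strict;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: return the XML lines for one parameter, indented
# $depth levels. Two spaces per level is an assumption, not the spec.
sub param_xml {
    my ($depth, $key, @values) = @_;
    my $pad = '  ' x $depth;
    my @lines = ("$pad<param name=\"$key\">");
    for my $value (@values) {
        # looks_like_number stands in for the starter script's is_number
        my $type = looks_like_number($value) ? 'number' : 'string';
        push @lines, "$pad  <item type=\"$type\" value=\"$value\"/>";
    }
    push @lines, "$pad</param>";
    return @lines;
}

# Prints the three-line <param> block for key1 at depth 2.
print "$_\n" for param_xml(2, 'key1', '18');
```

Returning a list of lines rather than printing directly keeps the helper easy to test and lets the caller decide which filehandle to write to.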
XML data formats are generally whitespace-insensitive, but people still occasionally read and edit these files by hand, so make sure your indentation matches the sample output file.
If this still isn't clear, see the sample files below for a more extended example.
Your script should take a list of filenames at the command prompt and write output files named after the input files with .xml appended. So, for example, running
./script.pl foo.ini bar baz.conf
should create three files named foo.ini.xml, bar.xml, and baz.conf.xml. To get an idea of how to handle command-line arguments, see the tip on @ARGV below.
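A minimal sketch of walking the command-line arguments and deriving each output name follows. The helper output_name is hypothetical, and the loop body only prints the mapping where a real script would do the translation.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: the output file is the input name with ".xml" appended.
sub output_name {
    my ($input) = @_;
    return "$input.xml";
}

# @ARGV holds the command-line arguments, in the order they were given.
foreach my $input (@ARGV) {
    my $output = output_name($input);
    print "$input -> $output\n";    # placeholder for the actual translation
}
```

Run as ./script.pl foo.ini bar baz.conf, this prints one line per argument, ending in foo.ini.xml, bar.xml, and baz.conf.xml respectively.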
Here is a starter script. It includes an is_number function with extensive comments. Also, here is a sample input file and the output file that should be generated.
Since most of you come from a C/C++ background, you may be tempted to attack this problem using low-level manipulation, treating strings as character arrays. Resist this impulse; Perl has many very powerful string manipulation tools. Use them.
Some Perl tips:
- Your "shebang" line (see the starter script) should read:
#!/usr/bin/perl -w
where /usr/bin/perl is the path to your local Perl executable (this is the correct path on the instructional Linux machines). The -w flag turns on warnings, which provide useful debugging information if your script goes wrong. For safety, many programmers leave -w on all the time, even when they are not debugging.
- On script startup, all command-line arguments are held in an array called @ARGV.
- Remember that square brackets ([ and ]) are special characters in regular expressions; to match them literally, escape them with backslashes as \[ and \].
- The open function is used to open filehandles for both input and output. open is a very flexible function, and as a result its Perl manual documentation is rather verbose; for this assignment, all you need to know is that you can open a file for reading using
open(INFILE, "<$filename") || die "Could not open $filename for input: $!";
and you can open a file for writing using
open(OUTFILE, ">$filename") || die "Could not open $filename for output: $!";
where $filename is a scalar variable containing (you guessed it) a filename. Notice the different < and > symbols that precede the filename, and the comma between the filehandle and the filename.
By the way, the || die "error string: $!" is a common idiom for catching errors: the statements above print an error message and stop the script if open does not succeed. It's actually a neat hack that relies on Perl's short-circuit boolean evaluation: if the call to open returns true (i.e., success), the second operand of the || (boolean OR) does not need to be evaluated, so the script does not execute die. Conversely, if open returns false, the second operand must be evaluated, so the script executes die. The parentheses around open's arguments matter: || binds more tightly than the comma, so without them the || would apply to the filename string instead of to open's return value, and die would never run. Perl's low-precedence or operator (open INFILE, "<$filename" or die ...) avoids this pitfall.
- The split function will come in very handy.
- When you read a line from the input file, you will probably want to chomp it to remove the newline.
- Finally, do consult the Perl resources we've compiled for you. They will help you as a Perl hacker, not just now but for the rest of your career.
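The tips above fit together into a read loop roughly like the following minimal sketch, which opens a file, chomps each line, skips headings, and splits key/value lines. The helper count_values is hypothetical and only counts parameters and values so the mechanics are easy to see; a real solution would emit XML instead.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: return (number of key/value lines, total number of
# values) in a config file, to illustrate open, chomp, and split.
sub count_values {
    my ($filename) = @_;
    open(INFILE, "<$filename") || die "Could not open $filename for input: $!";
    my ($params, $values) = (0, 0);
    while (my $line = <INFILE>) {
        chomp $line;                              # drop the trailing newline
        next if $line =~ /^\[.*\]$/;              # skip [Section name] headings
        next unless $line =~ /=/;                 # skip blank or stray lines
        my (undef, $rest) = split /=/, $line, 2;  # limit 2: split at first "="
        my @list = split /,/, $rest;              # one element per value
        $params++;
        $values += @list;                         # array in scalar context = count
    }
    close(INFILE);
    return ($params, $values);
}
```

For the sample input at the top of this handout (one heading, key1=18, and key2 with five values), count_values would return (2, 6).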
P.S.: In real XML, you would want to escape all <, >, and & signs in the input, translating them to the &lt;, &gt;, and &amp; "escape sequences". We're not going to require you to do this, though it's a pretty trivial extension using Perl's string matching facilities. (Obviously, if you want to implement this extension, we won't take off points.)
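If you do try the extension, a minimal sketch using Perl's s///g substitution might look like this; the name xml_escape is hypothetical. The order matters: & has to be replaced first, otherwise the & introduced by &lt; and &gt; would be escaped a second time.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: replace XML-special characters with entity references.
# & is handled first so the & in "&lt;" and "&gt;" is not escaped again.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}
```

For example, xml_escape('a<b & c>d') returns 'a&lt;b &amp; c&gt;d'. Since the values end up inside double-quoted attributes, escaping " as &quot; would be a natural addition.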