Speaking in tongues: Parsing structured data on the fly

This is not ProjectBloodhound material, at least not first-semester stuff. But if you find yourself running into highly structured data, such as the reports from a spreadsheet or a database application, you can manipulate that data easily in PHP.

The example below is simple, but you don’t have to limit yourself to doing simple things. Imagine a data structure like this:

Name[tab]Phone Number
Cathleen Collins[tab]602-369-9275
Greg Swann[tab]602-740-7531

In the file, the code shown here as “[tab]” would be an actual tab character, and this kind of data goes by the arcane name of a tab-delimited file.

Most programming languages were written by exacting people with abstract and elegant reasons for everything they did. PHP was written by overbooked programmers who needed to pound out new web pages as quickly as possible.

As a result, PHP is well suited to dealing with highly structured data. Here is a short program that will take a tab-delimited phone number file as input and output reformatted phone numbers into the HTML stream. In other words, this code could produce a dynamically updated phone list in what might otherwise be a static web page:

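<?php
// Let fgets recognize Unix, Windows and old Mac line endings.
// (This setting is deprecated as of PHP 8.1; it only matters for bare-CR files.)
ini_set ("auto_detect_line_endings", "1");

$fi = fopen ("PhoneNums.txt", "r");
if (!$fi)
    {
    die ("Could not open PhoneNums.txt");
    }
$line = fgets ($fi, 4096); // throw away fieldDef line

echo ("<b>Phone Numbers</b><br>");

// fgets returns false at end of file, which ends the loop
while (($line = fgets ($fi, 4096)) !== false)
    {
    $line = rtrim ($line, "\r\n"); // strip the line ending

    if (strpos ($line, "\t") === false)
        {
        continue; // skip blank or malformed lines
        }

    list ($Name, $Phone_Number) = explode ("\t", $line);

    if ($Name)
        {
        // escape the data so a stray < or & can't break the page
        echo (htmlspecialchars ($Phone_Number) . " <i>(" . htmlspecialchars ($Name) . ")</i><br>");
        }
    }

fclose ($fi);
?>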

There is one line that makes all the difference for this kind of work:

    list ($Name, $Phone_Number) = explode ("\t", $line);

The names between the parentheses are our known field names, and we’re using them as variable names for clarity’s sake. The explode function creates an array of separate fields from the text stored in the $line variable, splitting the fields on the tab character. The list construct then takes the array just created by explode and assigns each field, in order, to the corresponding variable. We only have two fields in this case, but I have a variation on these ideas that parses an MLS database that contains 213 fields per line of text.
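To see the two steps separately, here is a minimal sketch that runs them on a single hard-coded line. The sample string and the $fields, $header and $record names are just for illustration. The second half shows one possible way to cope with a lot of fields, using the header line to supply the names. That is an assumption about how you might do it, not the MLS routine mentioned above.

```php
<?php
// Hypothetical sample line; "\t" inside double quotes is a real tab character.
$line = "Greg Swann\t602-740-7531";

// explode splits the string on tabs and returns an array of fields:
// $fields[0] is "Greg Swann", $fields[1] is "602-740-7531"
$fields = explode ("\t", $line);

// list assigns the array elements, in order, to the named variables.
list ($Name, $Phone_Number) = $fields;

echo ("$Phone_Number ($Name)\n"); // prints: 602-740-7531 (Greg Swann)

// With many fields, the header line can supply the names instead.
// array_combine needs the same number of names and values.
$header = "Name\tPhone Number";
$record = array_combine (explode ("\t", $header), explode ("\t", $line));

echo ($record["Phone Number"] . "\n"); // prints: 602-740-7531
```

With the array_combine approach, each field is reached by its column name, so a 213-field line does not need 213 variables typed out in order.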

Once we have the fields assigned to the right variables, it’s duck soup to represent the data in whatever format we wish. Alternatively, we could write a new file out to disk. The routine that parses the MLS database writes XML files to disk using a few dozen of the available fields and throws the rest away.

In fact, from here it’s very easy to write XML files, such as those used by Realty.bots. Say so if you want to see a demonstration.
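For a rough idea in the meantime, here is a minimal sketch of the keep-some, discard-the-rest approach, applied to the phone list. Everything named here is hypothetical: the tabs_to_xml helper, the contacts, contact, name and phone element names, the extra Notes column and the PhoneNums.xml output file. It is not the format Realty.bots expects, and it assumes every wanted column appears in the header line.

```php
<?php
// Hypothetical helper: turn tab-delimited text into an XML string,
// keeping only the columns named in $wanted (column name => element name).
function tabs_to_xml ($text, $wanted)
    {
    $lines  = preg_split ('/\r\n|\r|\n/', rtrim ($text, "\r\n"));
    $header = explode ("\t", array_shift ($lines)); // fieldDef line

    $xml = "<contacts>\n";

    foreach ($lines as $line)
        {
        $fields = explode ("\t", $line);

        if (count ($fields) != count ($header))
            {
            continue; // skip blank or malformed lines
            }

        $record = array_combine ($header, $fields);

        $xml .= "  <contact>\n";
        foreach ($wanted as $column => $element)
            {
            // escape &, < and quotes so the data cannot break the markup
            $value = htmlspecialchars ($record[$column], ENT_QUOTES | ENT_XML1);
            $xml .= "    <$element>$value</$element>\n";
            }
        $xml .= "  </contact>\n";
        }

    return $xml . "</contacts>\n";
    }

// Sample data with a third column that we throw away
$text = "Name\tPhone Number\tNotes\n"
      . "Cathleen Collins\t602-369-9275\tignored\n"
      . "Greg Swann\t602-740-7531\tignored\n";

$xml = tabs_to_xml ($text, array ("Name" => "name", "Phone Number" => "phone"));

file_put_contents ("PhoneNums.xml", $xml); // write the result to disk
```

The $wanted array does double duty: it picks which columns survive and renames them on the way out, so changing the output is a matter of editing one list.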

But there is a lot more that you can do with software like this. It’s common, when you get data that is almost what you want, to try to fix it by hand in a word processor or text editor. A parsing tool like this gives you complete control over the data instead, echoing it back as perfectly formatted HTML or writing a formatted file out to disk.
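As one example of that kind of cleanup, here is a small sketch that tidies inconsistently typed phone numbers and writes a fresh tab-delimited file. The clean_phone function, its ten-digit rule, the sample rows and the PhoneNums-clean.txt file name are all made up for illustration. Your own data will need its own rules.

```php
<?php
// Hypothetical cleanup rule: reduce a phone number to its digits and
// reformat ten-digit numbers as 602-369-9275. Anything else is left alone.
function clean_phone ($phone)
    {
    $digits = preg_replace ('/\D/', '', $phone);

    if (strlen ($digits) != 10)
        {
        return trim ($phone);
        }

    return substr ($digits, 0, 3) . "-" . substr ($digits, 3, 3) . "-" . substr ($digits, 6);
    }

// Sample rows standing in for lines already split with explode
$rows = array (
    array ("Cathleen Collins", "(602) 369-9275"),
    array ("Greg Swann", "602.740.7531"),
);

$out = "Name\tPhone Number\n";

foreach ($rows as $row)
    {
    list ($Name, $Phone_Number) = $row;
    $out .= trim ($Name) . "\t" . clean_phone ($Phone_Number) . "\n";
    }

file_put_contents ("PhoneNums-clean.txt", $out); // write the cleaned file to disk
```

Because the rule lives in one function, the same fix gets applied to every line, which is exactly what hand editing can't promise.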