Working with structured data

In this column we look at how to write data to a file in a structured format and how to read it back again.

Writing To Files

So far, to read files we have used the fgets() function. fgets() relies on a file pointer, returned by fopen(), to know which file to read from and where in that file to continue reading. Likewise, we will use the appropriately named fputs() function (an alias of fwrite()) to write to a file. The script fputs.php, described below, shows how easy fputs() is to use.

View this script from your PHP-enabled Web server (see the cover CD for information on setting up a PHP-enabled Web server if you do not have one). The script writes three lines of data to a file: a separator, the current time, and the number of bytes that writing the second (time) line added to the file. Together these three lines form an 'item'. The script also outputs this final line to the Web user.
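A minimal sketch of a script that behaves as described might look like the following. It assumes, as parser1.php later does, that the separator line is "--\n". The date("r") time format and the error message are illustrative choices, not necessarily those of the original fputs.php.

```php
<?php
// Sketch of a script like fputs.php (details are assumptions): each request
// appends one item to fputs.data: a "--" separator line, the current time,
// and the number of bytes fputs() reported writing for the time line.
$fp = fopen("./fputs.data", "a");   // "a": write only, create if missing, append at end
if (!$fp) {
    die("Could not open fputs.data for writing\n");
}

fputs($fp, "--\n");                          // first line: item separator
$bytes = fputs($fp, date("r") . "\n");       // second line: time; fputs() returns bytes written
$line = "$bytes\n";                          // third line: bytes written by the second line
fputs($fp, $line);
fclose($fp);                                 // finished with the file

echo $line;                                  // also show the final line to the Web user
```

Because fputs() returns the number of bytes it wrote, the third line can record the size of the second without any extra calculation.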

Notice that the script opens fputs.data in mode 'a': that is, for writing only, creating the file if it does not exist and placing the file pointer at the end of the file. This means that each time fputs.php is viewed, a new item containing the latest time is appended to the end of fputs.data. A useful exercise would be to change the script so that the most recent item is placed at the beginning of the file instead.
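One possible approach to the exercise is sketched below. fopen() has no mode that inserts data at the start of a file, so this sketch reads the existing contents with file_get_contents(), reopens the file in mode 'w' (which truncates it to zero length), and writes the new item followed by the old data. The item layout ("--", time, bytes) is assumed to match the fputs.php described above.

```php
<?php
// One way to put the newest item first: read the old contents, truncate the
// file with mode "w", then write the new item followed by the old items.
// The item layout ("--", time, bytes) is an assumption matching fputs.php.
$file = "./fputs.data";
$old = file_exists($file) ? file_get_contents($file) : "";

$fp = fopen($file, "w");                     // "w": write only, truncate to zero length
if (!$fp) {
    die("Could not open $file for writing\n");
}

fputs($fp, "--\n");                          // new item's separator
$bytes = fputs($fp, date("r") . "\n");       // new item's time line
fputs($fp, "$bytes\n");                      // bytes written by the time line
fputs($fp, $old);                            // older items follow the new one
fclose($fp);

echo "$bytes\n";
```

The trade-off is that the whole file is rewritten on every request, which gets slower as fputs.data grows, and two simultaneous requests could lose an item. Locking the file with flock() is one way to guard against the latter.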

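You may also have noticed that the file pointer is closed with the fclose() function. Close a file opened with fopen() as soon as you have finished writing to it: fclose() writes out any buffered data and releases the file pointer. PHP does close any open files automatically when the script ends, but closing them explicitly makes it clear exactly when the data has been written and frees the file pointer for the rest of the script.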

Reading Structured Data

To read the data back from fputs.data one item at a time, we need to approach reading the file differently from last month. We need an algorithm that locates the first line of an item by its "--\n" separator and assumes that the next two lines are the time and bytes-written strings (more sophisticated algorithms in later months will not make such assumptions). An algorithm like this, which reads input according to its expected structure, is called a parser (pronounced 'par-zer').

<HTML>
<BODY>
<TABLE WIDTH=500 CELLPADDING=1>
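<?php
// Open the data file for reading; the file pointer starts at the beginning.
$fp = fopen("./fputs.data", "r");
if (!$fp) {
    echo "<TR><TD>Cannot open fputs.data!</TD></TR>\n";
} else {
    // Keep going while fgets() returns a line and we are not at end of file.
    while (($s = fgets($fp, 1024)) && !feof($fp)) {
        // An item starts with the "--" separator line.
        if ($s[0] == '-' && $s[1] == '-') {
            // The next two lines should hold the time and the bytes written.
            if (($time = fgets($fp, 1024)) && ($bytes = fgets($fp, 1024))) {
                echo "<TR>\n<TD>\n$time</TD>\n\n";
                echo "<TD>\n$bytes</TD>\n</TR>\n";
            } else {
                echo "<TR><TD>Data file corrupted!\n</TD></TR>";
            }
        } else {
            echo "<TR><TD>Data file corrupted!\n</TD></TR>";
        }
    }
    fclose($fp);
}
?>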
</TABLE>
</BODY>
</HTML>

Save this as parser1.php in the same directory as the fputs.data file and request it from your Web server. Every item you have created with fputs.php is displayed as a row of an HTML table, so the table mirrors the structure of the data.

parser1.php introduces a few new features of PHP as well as the concept of parsing. Let's look at it step by step. First, the script opens the fputs.data file for reading, which places the file pointer at the beginning of the file. It then enters a loop that reads one line at a time. The loop continues as long as a) the call to fgets() returns a true result (that is, it successfully reads a line from fputs.data) and b) the file pointer has not reached the end of the file.

These two conditions are joined by a 'logical AND', written &&: the loop body is executed only when both conditions are true. (A logical OR, written ||, would execute the loop when either condition is true.) See www.php.net/manual/en/language.operators.logical.php for more information.
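A small illustration of how && and || behave follows. PHP evaluates these operators left to right and stops as soon as the result is known (short-circuit evaluation), which is why, in parser1.php, feof() is only called after fgets() has succeeded. The helper check() is a hypothetical function written for this example; it logs which operands were actually evaluated.

```php
<?php
// check() is a hypothetical helper for this demo: it records its name in
// $log, then returns the given result, so we can see what was evaluated.
function check(array &$log, string $name, bool $result): bool {
    $log[] = $name;
    return $result;
}

$log = [];

// && is true only if both sides are true. The left side is false here,
// so PHP skips the right side entirely.
$and = check($log, "left of &&", false) && check($log, "right of &&", true);

// || is true if either side is true. The left side is true here,
// so again the right side is never evaluated.
$or = check($log, "left of ||", true) || check($log, "right of ||", false);

var_dump($and, $or);   // bool(false), bool(true)
print_r($log);         // only the two left-hand operands appear
```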

Next, the script checks whether the first and second characters of the line returned by fgets() are dashes ("-"), which would match the item separator. It does this by comparing the first byte of the string $s ($s[0]) and the second byte ($s[1]) with a dash. If both match, the script goes on to parse the item; otherwise, it assumes the data is corrupted and tells the Web reader so.
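The separator test can be pulled out into a function of its own. In the sketch below, is_separator() is a hypothetical name; it applies the same two-dash test as parser1.php, with one addition: it checks the line's length first, so a line shorter than two bytes is rejected cleanly.

```php
<?php
// is_separator() is a hypothetical helper that applies parser1.php's test:
// a line starts an item if its first two bytes are both dashes.
// The strlen() check comes first, so a line shorter than two bytes is
// rejected instead of triggering an "uninitialized string offset" notice/warning.
function is_separator(string $s): bool {
    return strlen($s) >= 2 && $s[0] == '-' && $s[1] == '-';
}

var_dump(is_separator("--\n"));     // bool(true)
var_dump(is_separator("32\n"));     // bool(false)
var_dump(is_separator("-"));        // bool(false): too short to be a separator
```

The built-in strncmp($s, "--", 2) === 0 performs the same check in a single call.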

If the script has found an item, it reads the time line and the bytes-written line. As before, it continues only if both reads succeed. The script then outputs the time and bytes-written lines wrapped in HTML table tags.
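The parsing logic can also be separated from the HTML output, which makes it easier to reuse and to test. In the sketch below, parse_items() is a hypothetical function that follows the same rules as parser1.php and returns the items as an array. One deliberate difference: where parser1.php prints a warning and carries on, this sketch returns null at the first sign of corruption. Which behaviour suits you depends on how the data will be used. The sample data is invented for illustration and is read from an in-memory stream (php://memory), so no real fputs.data file is needed.

```php
<?php
// parse_items() is a hypothetical name. It applies the same rules as
// parser1.php (a "--" separator followed by a time line and a bytes line)
// but returns the items as an array instead of printing HTML.
// It returns null as soon as the data does not match that structure.
function parse_items($fp): ?array {
    $items = [];
    while (($s = fgets($fp, 1024)) !== false) {
        if (strncmp($s, "--", 2) !== 0) {
            return null;                       // expected a separator line
        }
        $time = fgets($fp, 1024);
        $bytes = fgets($fp, 1024);
        if ($time === false || $bytes === false) {
            return null;                       // item cut short by end of file
        }
        $items[] = [
            "time"  => rtrim($time, "\n"),
            "bytes" => rtrim($bytes, "\n"),
        ];
    }
    return $items;
}

// Usage with invented sample data in an in-memory stream.
$fp = fopen("php://memory", "w+");
fputs($fp, "--\nMon, 01 Jan 2001 10:00:00 +1000\n32\n"
         . "--\nMon, 01 Jan 2001 11:00:00 +1000\n32\n");
rewind($fp);
$items = parse_items($fp);
fclose($fp);

foreach ($items as $item) {
    echo $item["time"], " | ", $item["bytes"], "\n";
}
```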

Notice how simple it is to embed HTML into your scripts in this way. If you cannot work out how parser1.php is constructing the HTML table, have a look at the HTML source in your Web browser - you will notice that the loop constructs all the table cells.

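The final point of interest in parser1.php is the length argument passed to fgets(). fgets() returns at most length - 1 bytes (1023 here), stopping earlier if it reaches a newline. Why 1024? Simply because we do not want to assume that every line will be short. If the limit were too small, a long line (for example, in a corrupted fputs.data file) would be split across several calls to fgets(). The leftover pieces would then be read as the following lines, throwing the parser out of step with the items and undermining its corruption detection.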
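The effect of a small limit is easy to see. The sketch below deliberately uses a length of 4, so each fgets() call returns at most 3 bytes, and reads from an in-memory stream (php://memory) containing a separator and one seven-byte line.

```php
<?php
// Demonstrates how the length argument of fgets() limits each read.
// fgets($fp, $length) returns at most $length - 1 bytes, stopping early
// after a newline. php://memory is used here so no real file is needed.
$fp = fopen("php://memory", "w+");
fputs($fp, "--\nabcdef\n");
rewind($fp);

$lines = [];
while (($s = fgets($fp, 4)) !== false) {
    $lines[] = $s;                // with a limit of 4, at most 3 bytes per call
}
fclose($fp);

print_r($lines);  // elements: "--\n", "abc", "def", "\n"
```

The single line "abcdef\n" comes back as three separate reads, so a parser expecting one line per field would treat "def" as the next field and lose its place.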
