Parse your Apache logs to Mysql using PHP

There are many web analysis tools available, for example, the excellent Webalizer. However, there are times when you just want to be able to search through your logs for specific conditions. At times like these, the power of a database comes in handy, but how do you get your log data in there in the first place?

You *could* use mod_log_mysql or one of it’s variants, but they involve recompiling apache from source. Having been through all that once, and got your site up and running it is probably not something you want to go through again in a hurry. This solution will take your historical data, already written to the filesystem in apache’s “combined” log format, and insert it into a mysql table for you.

Ths table should be in the following format*.

DROP TABLE IF EXISTS `tat`;
CREATE TABLE `tat` (
  `client` varchar(15) collate latin1_general_ci NOT NULL default '',
  `logname` varchar(255) collate latin1_general_ci NOT NULL default '',
  `user` varchar(255) collate latin1_general_ci NOT NULL default '',
  `date` varchar(35) collate latin1_general_ci NOT NULL default '',
  `method` varchar(10) collate latin1_general_ci NOT NULL default '',
  `uri` varchar(255) collate latin1_general_ci NOT NULL default '',
  `protocol` varchar(20) collate latin1_general_ci NOT NULL default '',
  `r_status` varchar(5) collate latin1_general_ci NOT NULL default '',
  `r_bytes` varchar(10) collate latin1_general_ci NOT NULL default '',
  `referrer` varchar(255) collate latin1_general_ci NOT NULL default '',
  `agent` varchar(255) collate latin1_general_ci NOT NULL default ''
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;

And now, the script…

#!/usr/bin/php -q

After configuring the first 6 lines, you can run this script. I’d probably recomend doing it daily, immediately after a logrotate, although this will not give you 100% up-to-the-minute stats. In most cases, though, this is good enough.

*NB. The table layout in this example needs tweaking. It is the first run of the draft, and I was not too choosy about column specifications. I’d recommend you take a look at the data and determine your own layout.