Posts Tagged ‘IANA’

What American Accent do I have?

December 2nd, 2006 by smp | Comments | Filed in Canada, Life
What American accent do you have?
Your Result: The Midland

“You have a Midland accent” is just another way of saying “you don’t have an accent.” You probably are from the Midland (Pennsylvania, southern Ohio, southern Indiana, southern Illinois, and Missouri) but then for all we know you could be from Florida or Charleston or one of those big southern cities like Atlanta or Dallas. You have a good voice for TV and radio.

The West
Boston
North Central
Philadelphia
The Northeast
The Inland North
The South
What American accent do you have?
Take More Quizzes

This is good to know. I have been spending years trying to make my speech sound as non-descript as possible. It was interesting this summer to go home and talk to my family. They have a uniquely Southeastern BC/Southwestern Alberta accent. Hey, even Canada has unique accents.

You don’t think so? Well, the stereotypical “Canadian” accent that is portrayed in the US popular media is actually a Southern Ontario / Toronto Area accent. It is easily recognizable and immediately pegs you in the rest of Canada as someone who should be distrusted as a likely carpetbagger. ;-)

The only spoken/written giveaways I have retained are:

– “a- bowt” and “owt”
– “prawcess”
– “sheduel”, although I do alternate this with “skeduel”
– I spell “theatre”, “centre”, “colour”, etc. correctly

However, if you call me American, I will be very quick to correct you.

Tags: , , , , , , , , , , , , , ,

Newest Industry Banner

February 10th, 2006 by smp | Comments | Filed in Life

If you are wondering about the picture at the top of this page (sorry for you folks in RSS readers, gotta head to the blog to see it), I can tell you one thing.

It is real.

[PS: Photo provided by Chris Magnusson.]

I grew up in the shadow if this mountain called, oddly enough, Mount Seven. The Seven is a natural event that only appears in the spring as the snow recedes from the summit.

The other claim to fame for this peak is among paragliders and hang-gliders. If you ask anyone who partakes in these sports if they have ever heard of Mount Seven, their eyes will likely glaze over and drool will run from their mouths,

Why?

Well, the banner doesn’t do the mountain justice from the perspective of a non-powered flight aficianado. This perspective for Google Earth give you a better idea.

When you depart Mount Seven, it is a free and clear 5,000 foot drop into the valley below. And it’s not just any valley; it’s the Rocky Mountain Trench. At a minimum of 5 miles wide, the flying is free and clear. One glider took off from the mountain and ended up in Montana.
Mount Seven is a glorious landmark, and one of the few fond rememberances I have of the town I grew up in.

Technorati Tags: , , , ,

Tags: , , , , , , , , , , , , , , , , , ,

Geographic IP database using PERL, PHP and MySQL — UPDATE: September 16 2008

November 8th, 2005 by smp | Comments | Filed in GrabIP, Technology, Web Performance

Updated September 16 2008 to reflect the numerous changes that have resulted since the original article was posted in 2005 - smp


Targeting Web site content to the specific visitors who view the site is a very important marketing advantage. Being able to track incoming visitors by the country that they originate from is an additional item that can assist companies in ensuring that visitors are presented with relevant content. This may seem like a daunting task, but it can be achieved with a high degree of accuracy using publicly accessible data, and Open Source software.

IP to Country Mapping

The idea for IP to Country mapping is one that has started to appear more frequently on the Internet in recent months. All GeoIP systems do warn users that they are not 100% accurate. The accuracy of GeoIP mapping can be affected by things such as large corporate or ISP networks where traffic is routed out a small number of public access points, regardless of the traffic’s point of origin.

Making IP Addresses Searchable

The first issue that needs to be addressed is how to determine if an IP Address is in one of the ranges that is defined as originating from a distinct country. The simplest way to range-match IP Addresses is to abandon the dotted-quad notation we are all familiar with, and convert the IP Address to an IP Number.

All IP Addresses can be converted into decimal numbers that fall into a known range between 0 and (2^32)-1 (4294967295). In reality, the range is even smaller than that, as public IP Addresses fall between 0.0.0.0 (IP Number: 0) and 223.255.255.255 (IP Number: 3758096383).

A quick search of the Web showed me that there is a way to create functions to convert IP Addresses to their numerical (IP Number) equivalent and reverse the process.

sub ip2long {
	return sprintf("%u",unpack("l*", pack("l*", unpack("N*", inet_aton(shift)))));
}

sub long2ip {
	return inet_ntoa(pack("N*", shift));
} 

<--snip-->

if ($start_ip =~ /\d+\.\d+\.\d+\.\d+/) {
         ## my $ip_address = shift($start_ip);
         chomp($start_ip);
         $long_start = ip2long($start_ip);
         ## print "$ip_address converts to $ip_number\n";
 }

If you are using PHP in your applications, this conversion process is made even easier by native function calls.

Convert IP Address to IP Number
$ip_number = sprintf("%u",ip2long($ip_address));

IP Address Location Data

Now that we have settled on a format for the IP data to be used in the database, we now have to find IP data that allows us to map IP Addresses to countries. This is easier than it sounds, as this data is centrally held by the 5 regional IP Registries — ARIN, RIPE, APNIC, LACNIC, and AFRINIC. After poking around in the depths of their Web sites, I found that they actually provide text formatted versions of the allocated and assigned IP ranges that they are responsible for. All of the registries use the same format, which makes parsing the these files a simple process.

As of March, 2005, the distribution of IPV4 networks in the database by registry is:

+----------+--------------+
| registry | num_networks |
+----------+--------------+
| arin     |        36073 |
| ripencc  |        14813 |
| apnic    |        10474 |
| lacnic   |         1460 |
| afrinic  |          443 |
+----------+--------------+

5 rows in set (0.52 sec)

I chose to use the PERL module WWW::CURL to retrieve the files. You could re-write the application to use LWP or some other method on systems where cURL is not supported, as it is a simple file download over FTP. I update the data once a day, which may at first appear excessive. However, I have seen upwards of 40-50 new rows added to the database in a single day.

Some may ask why I chose to write the downloaded files to a file rather than immediately inserting them into the database. Using this two-step process gives me the ability to manually rollback to an older database if there is a problem retrieving one of the registry files. I have set an arbitrary limit of 75,000 lines for the entire aggregated file; if the file is less than that, the remainder of the process is aborted.

The data retrieved from the registries is in the following format.

Registry Raw Data Format

<snip>
apnic|CN|ipv4|202.127.4.0|256|19950610|assigned
apnic|BN|ipv4|202.160.0.0|2048|19950610|allocated
apnic|NP|asn|4613|1|19950611|allocated
apnic|LK|ipv4|203.143.0.0|1024|19950612|allocated
apnic|MO|asn|4609|1|19950615|allocated
apnic|KR|asn|4670|1|19950616|allocated
apnic|SB|ipv4|202.63.254.0|512|19950618|assigned
apnic|JP|ipv4|202.232.0.0|262144|19950618|allocated
apnic|SG|ipv4|203.127.192.0|8192|19950618|allocated
apnic|PK|asn|4615|1|19950629|allocated
apnic|HK|asn|4614|1|19950704|allocated
</snip>

The fields are all “|” (pipe-character) separated, and are described below.

COLUMN		VALUES
---------------------------------------------------------------------
REGISTRY:	apnic,arin,ripencc,lacnic,iana
COUNTRY_CODE:	One of 240 unique 2-character country codes or "*"
ADDRESS_TYPE:	asn,ipv4,ipv6
ADDRESS:	Either the starting IP Address or AS Number or "*"
NUMBER:		Number of IPs in range or "1" if ADDRESS_TYPE is "asn"
DATE:		Date IP range or AS Number was added to database or "*"
RANGE_TYPE:	"allocated" -> borrowed; "assigned" -> owned

Storing the Data in MySQL

To store the data, I created a two-table MySQL database named “ip_registry”, using the script below.
Database creation statement for ‘ip_registry’

CREATE DATABASE ip_registry;

Table structure for table ‘country_code’

CREATE TABLE ip_registry.country_code (
code char(2) default NULL,
country varchar(50) default NULL,
UNIQUE KEY code (code)
);

Table structure for table ‘ip_map’

CREATE TABLE ip_registry.ip_map (
code char(2) default NULL,
registry char(10) default NULL,
ip_from double default NULL,
ip_to double default NULL,
UNIQUE KEY registry (registry,ip_from,ip_to)
);

As of March 2005 September 2008, the “ip_map” data table for my system runs to 63,263 88,000+ rows. This value will change daily, and may decrease suddenly at times. The registries make an effort to aggregate as many IP networks as possible into the largest possible contiguous block, and this aggregation process will reduce the number of individual entries by 2,000 - 3,000 rows in a single day.

The recognized standard for country codes is ISO 3166. In this standard, each nation is assigned a unique, two-character code. The ONLY exception I found to this rule is that, for historical reasons, the IP registries have entries for the United Kingdom listed with two country codes (GB and UK). I could have corrected this in the Perl script by standardizing on a single country code, but I preferred the solution of adding another row to the “country_code” table.

From the raw Registry data, I determined that only four of the fields useful for the project that I was working on: REGISTRY, COUNTRY_CODE, ADDRESS, and NUMBER. I then wrote PERL code to read the raw IP Registry data from the data file I created previously, convert the starting IP address to a number, use this starting IP Number that to generate the end IP Number, and then insert the rows into a database.

PERL: IP Number conversion and database insert

<--snip-->

if ($line_count >= 115000) {

## Establish Database Connection

print "\n\nOpening database connection";

my $dbh = DBI->connect("DBI:mysql:host=[host];database=logger","[username]","[password]",{PrintError=>0});

## Remove existing values
my $sth = $dbh->do("TRUNCATE TABLE ip_map");
print " --> Data from ip_map table dropped";

$sth = $dbh->prepare("INSERT into ip_map values (?,?,?,?,?,?)");

$count = 0;

print " --> Completed\n";

## Insert Data Into Database

print "Inserting data into the database";

open (PROCESS, "<$file");
  while ($line =
) {
        chomp ($line);
        if (($line =~ m/\|ipv4\|/) and ($line !~ m/\|\*\|/)) {

                ($registrar,$country_code,$item_type,$start_ip,$num_ip,$entry_date,$registry_type) = split(/\|/, $line);

                $long_start = 0;

                if ($start_ip =~ /\d+\.\d+\.\d+\.\d+/) {
                        ## my $ip_address = shift($start_ip);
                        chomp($start_ip);
                        $long_start = ip2long($start_ip);
                        ## print "$ip_address converts to $ip_number\n";
                        $long_end = $long_start + ($num_ip-1);
                        $count += $sth->execute($country_code,$registrar,$long_start,$long_end,$num_ip,$start_ip);
                }
        }
  }

}

<--snip-->

The “TRUNCATE” statement in the script has the affect of dropping the table and re-creating it using the column names and types defined in the initial create statement. It is easier to rebuild the data table each time new data is inserted to ensure that duplicates and overlaps do not enter into the database.

Why is the value of “$long_end” defined by “$long_start + ($num_ip-1)”? The IP address ranges delivered by the registries count the starting value as one of the items in the set — i.e. counting using ordinal numbers.

START_IP:   12.236.236.0
END_IP:     12.236.236.255
NUMBER_IP:  256

If cardinal numbering is used to calculate the address range, incorrect IP addresses will be generated.

IP Number Calculations
WRONG!	216853760 = 216853504 + 256 	-> END_IP = 12.236.237.0
RIGHT!	216853759 = 216853504 + (256-1)	-> END_IP = 12.236.236.255

I have also added a sanity-check that stops the insertion process if the number of lines in the data file is less than 75,0000 115,000. This would prevents the creation of a truncated database if one of the registries does not update their data files or the script is unable to retrieve the data files. The value of 75,000 115,000 appears high, but the data files that the data is extracted from also contain autonomous system and IPV6 data, as well as the IPV4 data that is inserted into the database. Currently, the data file runs to over 90,000 130,000 lines, so the 75,000 115,000 line barrier seems very reasonable to prevent inserting a broken dataset.

Querying the Database

Now that the database is constructed, we can start to run queries against it.

mysql> select ip.code,ip.registry, ip.ip_from, ip.ip_to, co.country
-> from ip_map ip, country_code co
-> where (ip.code = 'IS') and (ip.code = co.code);

+------+----------+------------+------------+---------+
| code | registry | ip_from    | ip_to      | country |
+------+----------+------------+------------+---------+
| IS   | ripencc  | 1049722880 | 1049731071 | ICELAND |
| IS   | ripencc  | 1359937536 | 1359970303 | ICELAND |
| IS   | ripencc  | 1383088128 | 1383096319 | ICELAND |
| IS   | ripencc  | 1385447424 | 1385455615 | ICELAND |
| IS   | ripencc  | 1390215168 | 1390280703 | ICELAND |
| IS   | ripencc  | 1403846656 | 1403863039 | ICELAND |
| IS   | ripencc  | 1433681920 | 1433690111 | ICELAND |
| IS   | ripencc  | 1439023104 | 1439039487 | ICELAND |
| IS   | ripencc  | 1440481280 | 1440514047 | ICELAND |
| IS   | ripencc  | 2644312064 | 2644377599 | ICELAND |
| IS   | ripencc  | 3238264832 | 3238330367 | ICELAND |
| IS   | ripencc  | 3245150208 | 3245154303 | ICELAND |
| IS   | ripencc  | 3261718528 | 3261726719 | ICELAND |
| IS   | ripencc  | 3264217088 | 3264282623 | ICELAND |
| IS   | ripencc  | 3556884480 | 3556892671 | ICELAND |
| IS   | ripencc  | 3558785024 | 3558793215 | ICELAND |
| IS   | ripencc  | 3565084672 | 3565092863 | ICELAND |
| IS   | ripencc  | 3584524288 | 3584532479 | ICELAND |
| IS   | ripencc  | 3585114112 | 3585122303 | ICELAND |
| IS   | ripencc  | 3585433600 | 3585441791 | ICELAND |
| IS   | ripencc  | 3586023424 | 3586031615 | ICELAND |
| IS   | ripencc  | 3587538944 | 3587547135 | ICELAND |
| IS   | ripencc  | 3587981312 | 3587997695 | ICELAND |
| IS   | ripencc  | 3641278464 | 3641282559 | ICELAND |
| IS   | ripencc  | 3642535936 | 3642540031 | ICELAND |
| IS   | ripencc  | 3650592768 | 3650596863 | ICELAND |
| IS   | ripencc  | 3650596864 | 3650600959 | ICELAND |
| IS   | ripencc  | 3651915776 | 3651919871 | ICELAND |
+------+----------+------------+------------+---------+

28 rows in set (0.09 sec)

So the database structure is sound. It is important to build the file using all four five registries; even though Iceland is now covered by RIPE, older IP allocations and assignments were been handled by both RIPE and ARIN.

Having the registry information helps build in the flexibility to add a WHOIS functionality using this database, something that I have done on my site (GrabIP). This allows for further drilldowns on the data, beyond the scope of this article.

The main item that will be of interest to most Web site administrators is that they can now build dynamic pages using a data source which tracks their visitors’ announced IP address to the country of origin with a high degree of accuracy. This is particulary useful if you are attempting to distribute users to geographically diverse mirror sites. You can also do fun things, such as displaying the flag of the country that the visitor is coming from, something I have implemented on my personal site.

A generic copy of the registry retrieval and database insertion script can be found here.

Tags: , , , , , , , , , , , , , , , ,

FEMA: Yes, we are really stupid

November 6th, 2005 by smp | Comments | Filed in Blogging

FEMA sends a bill for $3.7 billion to the state of Louisiana

Hmm…I would refuse to pay on the grounds that no usable services were provided by FEMA. In fact, I think that Louisiana should send FEMA a bill.

Via: CCUCEO

Tags: , , , ,

Rove and Bush: Staging Photo-Ops, Disposing With Lives

September 5th, 2005 by smp | Comments | Filed in smp

This is enough. Do something.

Read this. Then come back to this forum and tell me how you can support this man.

He and his cronies claim to support the little people. So far, the poor and desperate have suffered the indignity of having their humanity stripped from them by their own government.

Yesterday, we saw a car from Louisiana pass us on the 495. Samantha and I both thought poor buggers. Yet now, I realize that they were driving a very expensive minivan. They weren’t suffering. And they were likely staying in hotels.

The United States should be ashamed of their President, their government, and what they have down to their own people.

What will you do to bring humanity back to the United States.

Tags: , , , , , , , , , , , , , , ,

Oh Look! A Dinosaur! Call Bill Gates!

May 23rd, 2005 by smp | Comments | Filed in smp

Dr. Blaise Cronin (the name sounds like the nom de plume of an agent provocateur), author of a justly ridiculed and narrow-minded critique of blogging, has re-appeared just in time for the Summer Solstice. [here and here]

This man is an island. And he has lost the perspective that come from spending too much time in an ivory tower in Indiana. Pamphleteers and Gutenberg began this revolution. Apparently, in Dr. Cronin’s view, the only valid word is the word on paper.

Dr. Cronin: if that is the case, I will be happy to discuss the dismantling your library’s computer system so that you can move back to index cards in those sexy wooden shelving units which became all the rage a few years back when those systems were phased out.

Libraries are repositories of information in all its forms. For my generation, books are great, but they often slow down the process of learning as much as we can as quickly as we can, so that we can keep our jobs.

Or has the benefit of tenure made you complacent and snobbish at the same time.

PS: Thanks for visiting my blog — your personal computer at has a distinct hostname.

Tags: , , , , , , , , , , , , , , , , , , ,

Open Minds in Academia

May 4th, 2005 by smp | Comments | Filed in RANTING

The Dean of the School of Library and Information Sciences at Indiana University, Blaise Cronin, maintains an open mind about blogging. [here]

One wonders for whom these hapless souls blog. Why do they choose to expose their unremarkable opinions, sententious drivel and unedifying private lives to the potential gaze of total strangers? What prompts this particular kind of digital exhibitionism? The present generation of bloggers seems to imagine that such crassly egotistical behavior is socially acceptable and that time-honored editorial and filtering functions have no place in cyberspace. Undoubtedly, these are the same individuals who believe that the free-for-all, communitarian approach of Wikipedia is the way forward. Librarians, of course, know better.

Books are great. I still love to handle a book or magazine. But almost everything I have learned in the last 2 years has come from the Web. I have learned more about Sales and Marketing through blogs than any MBA program. I have learned how to quickly and clearly communicate complex ideas by putting my ideas on the Web for commentary.

Libraries will always exist; attitudes like Dr. Cronin’s will not.

Via Kevin Briody

Tags: , , , , , , , , , , , , , , , , , , , , , , ,