So today, to have some fun, I decided to rip data from a site that returns a street and city for any postcode. They offer this service on their website, but it's only intended for looking up one address at a time for private use. To use it commercially you have to get a paid subscription from them.
Now, to protect their data from simply being ripped from the site, they have obfuscated their JavaScript, i.e. encoded the source so it's unreadable until it has been processed by JavaScript. The JavaScript-generated source is then perfectly readable for the end user. The point being that we can't simply fetch the page with PHP and strip the required data from it. The only (easy) way to get at JavaScript-generated source is with JavaScript. But then the second problem: JavaScript can't read the generated source of a page from a different domain... hmmm… but I got it working of course, so here is what I did:
First, I made a page with some fields into which you can enter a postal code and a house number (note that this only works with Dutch addresses). After you enter a postal code (in a valid format: 4 digits and 2 letters) and a number, these 2 variables are passed to a PHP script in an iframe. The PHP file sticks the data into a URL query to the site in question, gets the source content and echoes it as its own output, thus in effect loading the page of data into the iframe (which is hidden) on the same domain as the JavaScript that will retrieve the data. After a 2 second delay (just to be sure it's loaded) we retrieve the generated code with JavaScript, process it with a load of splits (no regex here because it's a simple proof of concept) until we only have the data we need, and spit it back out into the remaining fields.
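The client-side part of those steps could look roughly like the sketch below. This is not the original code: the proxy file name `proxy.php`, its query parameter names, the function names and the `<span id="street">` / `<span id="city">` markup are all made up for illustration, and the real site's generated markup will differ. The format check uses a small regex for brevity; the extraction itself sticks to plain splits as described above.

```javascript
// Dutch postcodes are 4 digits followed by 2 letters, e.g. "1234AB" or "1234 ab".
function isValidPostcode(postcode) {
  return /^\d{4}\s?[A-Za-z]{2}$/.test(postcode.trim());
}

// Build the URL of the same-domain PHP proxy that the hidden iframe loads.
// "proxy.php" and the parameter names are hypothetical.
function buildProxyUrl(postcode, number) {
  const params = new URLSearchParams({
    postcode: postcode.replace(/\s/g, '').toUpperCase(),
    number: String(number),
  });
  return 'proxy.php?' + params.toString();
}

// Split-based extraction: return the text between the first occurrence of
// `before` and the next occurrence of `after`, or null if `before` is missing.
function between(text, before, after) {
  const parts = text.split(before);
  if (parts.length < 2) return null;
  return parts[1].split(after)[0];
}

// Pull street and city out of the generated markup.
// The element ids used here are assumptions, not the real site's markup.
function parseAddress(html) {
  return {
    street: between(html, '<span id="street">', '</span>'),
    city: between(html, '<span id="city">', '</span>'),
  };
}
```

In the page itself you would set the hidden iframe's `src` to `buildProxyUrl(postcode, number)`, wait the 2 seconds with `setTimeout`, and then pass `iframe.contentDocument.body.innerHTML` to `parseAddress`. Reading `contentDocument` is only allowed here because the proxy is served from the same domain as the script.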
Works like a charm:
http://www.thomassmart.com/Sandbox/nlpostcode/