iMacros for Retrieving Information from Multiple Web Pages

iMacros is a great tool for grabbing information from web sites. People often use it for grabbing the latest stock prices, or getting the latest prices on many products. Putting the data into a spreadsheet-compatible file is easy.

You can also use iMacros to fill out forms from a spreadsheet (.csv) text file.

iMacros is a free plugin for Firefox (or IE); a more powerful paid version is also available.

You specify the information to grab using a simple scripting language that can search a web page for HTML tags or attributes. So, you could search for the paragraph with a particular ID and grab its text. Or look for a link that starts with a particular domain, and grab either the link or the text.

iMacros automatically handles looping through all the occurrences on a page (for example, each row of a table), which is the most common need.

The Firefox version lets you control switching tabs, to grab the data from one tab and write it to a form on another tab. (The form could process the data and save it to a database, or anything else you could program a form processing routine to do.)

But it doesn’t have a full programming language: you can’t test variables and call subroutines based on the result. It doesn’t have FOR loops or WHILE loops. So, you can grab all the prices on one page, or keep grabbing one value from pages as long as there is a “Next” link. But you can’t easily do both. You can grab and save all the values, but you can’t save only the ones that meet your conditions.

Well, iMacros for Firefox has a Javascript interface. With that, you can have Javascript control the looping, test for special conditions, massage the data, and check for errors. You can pass values from Javascript to iMacros, and return extracted information to Javascript.
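Here is a minimal sketch of one such round trip: set a variable, play a macro, and read back what it extracted. The helper name extractNth and the fakeApi stand-in (with its made-up link list) are assumptions for illustration only, so the sketch can be tried outside the browser. Inside iMacros for Firefox you would pass the real iimSet, iimPlay and iimGetLastExtract functions instead. It also assumes that negative return codes from iimPlay signal an error, which matches the codes used later in this post.

```javascript
// Sketch: one round trip between Javascript and an iMacros script.
// `api` bundles the iMacros scripting functions. In iMacros for Firefox pass
// { iimSet: iimSet, iimPlay: iimPlay, iimGetLastExtract: iimGetLastExtract }.
function extractNth(api, macroName, varName, n) {
  api.iimSet(varName, n);               // available as {{varName}} in the macro
  var ret = api.iimPlay(macroName);     // assumption: negative codes mean an error
  if (ret < 0) {
    return { ok: false, code: ret, value: null };
  }
  var value = api.iimGetLastExtract();  // "#EANF#" means nothing was found
  return { ok: value !== "#EANF#", code: ret, value: value };
}

// Hypothetical stand-in for the real interface, with made-up data.
var fakeApi = (function () {
  var vars = {};
  var links = ["849/849010100", "849/849010200"];
  var last = null;
  return {
    iimSet: function (name, val) { vars[name] = val; },
    iimPlay: function () {
      last = links[vars.mapidnum - 1] || "#EANF#";
      return 1;
    },
    iimGetLastExtract: function () { return last; }
  };
})();

var first = extractNth(fakeApi, "macro-that-finds-each-map-id.iim", "mapidnum", 1);
var third = extractNth(fakeApi, "macro-that-finds-each-map-id.iim", "mapidnum", 3);
// first.value is "849/849010100"; third.ok is false because only two links exist
```

Passing the functions in as an object is only a convenience for trying the logic without a browser. In a real script you could call the iim functions directly.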

I’m going to show some code snippets for using loops to pass values to iMacros scripts and back to Javascript, and testing for errors.

To give you some idea of what the script does: in my task, one web page lists the MapID of each Map in a section of the County. Another web page lists the Parcels contained in a specific Map. Yet another web page shows what I want: details about a Parcel. I then have iMacros put the details in a custom form, and submit the form data to a PHP program that writes it to a MySQL database.

Each section of the county could have an unknown number of Map IDs, each Map could have an unknown number of Parcels, and each Parcel has information on several web pages. There is also so much information to collect that it couldn’t be done in one day (probably 4 months of non-stop gathering!), so the scripts needed to keep track of what had been completed. All of that was too complex for any single iMacros script. But with the Javascript interface controlling 6 scripts, it works!

/* t, l, s, sections, tr, county and totalapnsfound are set by outer code not shown here */
mapidnum = 1;
/* Javascript uses labeled loops so you can break out of multiple loops */
loopm:
do {
    iimSet("mapidnum", mapidnum);
    /* get the nth map id */
    iret = iimPlay("macro-that-finds-each-map-id.iim");
    if (iret == -101) {
        /* user pressed Stop button */
        break loopm;
    }
    mapid = iimGetLastExtract(); /* 849/849010100 */
    if (mapid == "#EANF#") {
        /* no more map IDs: continue jumps to the while test, which ends the loop */
        continue;
    }
    iimDisplay("TownshipRange "+t+"("+l+") SectionCount:"+s+" Section "+sections[s]+" Township "+tr[0]+" Range "+tr[1]+" MapIDnum: "+mapidnum+" MapID: "+mapid+" TotAPN="+totalapnsfound);
    mapid = mapid.slice(4,13); /* 849010100 */
    iimSet("mapid", mapid);
    /* read number of parcel IDs for this map ID */
    iret = iimPlay("macro-that-finds-how-many-detail-records-for-map-id.iim");
    apnsfoundstr = iimGetLastExtract();
    if (iret == -101) {
        /* user pressed Stop button */
        break loopm;
    }
    n = apnsfoundstr.indexOf(" APN(s) Found",0);
    if (n != -1) {
        apnsfound = parseInt(apnsfoundstr.slice(0,n), 10);
    } else {
        apnsfound = 0;
    }
    loopn:
    for (n=1; n<= apnsfound; n++) {
        iimSet("parcelloop", n);
        iimDisplay("TownshipRange "+t+"("+l+") SectionCount:"+s+" Section "+sections[s]+" Township "+tr[0]+" Range "+tr[1]+" MapIDnum: "+mapidnum+" MapID: "+mapid+" TotAPN="+totalapnsfound+" APN "+n+"("+apnsfound+")");
        iret = iimPlay("macro-that-reads-detail-record.iim");
        if (iret == -101) {
            /* user pressed Stop button, break out of labeled loop (both FOR loop and DO...WHILE) */
            break loopm;
        }
        if (iret == -933) {
            /* when disconnect WiFi get: -933 Error Loading Page */
            /* do something to retry, or run a script PROMPT "Error Loading Page (WiFi Error?)" */
            continue;
        }
        parceldash = iimGetLastExtract(1);
        /* have single parcel number (with dashes, maybe with split letter), next step needs parcel number without dashes */
        parcel = parceldash.replace(/-/g,'');
        /* take all values so far, pass to iMacros to put into Form for writing to Database */
        iimSet("county", county);
        iimSet("parcel", parcel);
        iimSet("parceldash", parceldash);
        iimSet("mapid", mapid);
        iimSet("township", tr[0]);
        iimSet("range", tr[1]);
        iimSet("section", sections[s]);
        iimDisplay("TownshipRange "+t+"("+l+") SectionCount:"+s+" Section "+sections[s]+" Township "+tr[0]+" Range "+tr[1]+" MapIDnum: "+mapidnum+" MapID: "+mapid+" TotAPN="+totalapnsfound+" APN "+n+"("+apnsfound+") Parcel:"+parcel);
        iret = iimPlay("macro-that-reads-parcel-data-puts-it-in-form-that-writes-to-MySQL.iim");
    } /* end for APNs found */
    mapidnum++;
} while (mapid != "#EANF#");
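The string handling in the loop above can be pulled out into small functions, which makes it easy to try on sample text before running a long scrape. This is only a sketch of that idea: the function names are my own, and the sample parcel number is made up for illustration.

```javascript
// Text before " APN(s) Found" is the parcel count; anything else counts as 0.
function parseApnCount(text) {
  var s = String(text);
  var pos = s.indexOf(" APN(s) Found");
  if (pos === -1) {
    return 0;
  }
  var count = parseInt(s.slice(0, pos), 10);
  return isNaN(count) ? 0 : count;
}

// Keep characters 4 through 12 of the extracted link text, as the loop does.
function mapIdFromExtract(extract) {
  return extract.slice(4, 13);
}

// Remove every dash from a parcel number.
function stripDashes(parcelDash) {
  return parcelDash.replace(/-/g, "");
}

var count = parseApnCount("12 APN(s) Found");    // 12
var none = parseApnCount("No records");          // 0
var mapId = mapIdFromExtract("849/849010100");   // "849010100"
var parcel = stripDashes("849-01-001A");         // "84901001A" (made-up parcel)
```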

The iMacros web site has full documentation on all the commands, including iimSet (sets a variable for use in an iMacros script), iimPlay (plays an iMacros script), iimGetLastExtract (brings extracted data into Javascript), and iimDisplay (displays text in a status area of the screen).
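The loop above leaves the -933 case as “do something to retry”. One possible way to do that is sketched below: replay the macro a limited number of times while it keeps returning -933, and report whether the user pressed Stop. The function name playWithRetry, the maxAttempts parameter and the flakyPlay stand-in are assumptions for illustration. In a real script you would pass iimPlay as the play argument.

```javascript
var STOPPED_BY_USER = -101;     // user pressed the Stop button
var ERROR_LOADING_PAGE = -933;  // for example, the WiFi connection dropped

// `play` runs a macro and returns its code (iimPlay in a real script).
// Only "Error Loading Page" triggers another attempt.
function playWithRetry(play, macroName, maxAttempts) {
  var code;
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    code = play(macroName);
    if (code !== ERROR_LOADING_PAGE) {
      return { code: code, attempts: attempt, stopped: code === STOPPED_BY_USER };
    }
  }
  // Every attempt failed with "Error Loading Page".
  return { code: code, attempts: maxAttempts, stopped: false };
}

// Hypothetical stand-in that fails twice, then succeeds.
var calls = 0;
function flakyPlay() {
  calls++;
  return calls < 3 ? ERROR_LOADING_PAGE : 1;
}

var result = playWithRetry(flakyPlay, "macro-that-reads-detail-record.iim", 5);
// result.code is 1 and result.attempts is 3
```

The caller still decides what to do when all attempts fail, such as skipping the parcel or stopping the whole run.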

In an iMacros script, you access a variable set with iimSet by wrapping its name in double braces, such as {{mapidnum}}.

' macro-that-finds-each-map-id.iim uses the parameter like this:
' Find the nth Map ID. The page displays a link to a PDF of the map; the Map ID is the text of the link.
TAG POS=R{{mapidnum}} TYPE=A ATTR=HREF:http://156.42.37.20* EXTRACT=TXT

' macro-that-reads-detail-record.iim uses the parameter like this:
' Find the nth Parcel Number, displayed as a link to the first page of the parcel's information (the County Assessor's page has links to the County Treasurer's info for this parcel).
TAG POS=R{{parcelloop}} TYPE=A ATTR=HREF:/Assessor/ParcelApplication/Detail.aspx* EXTRACT=TXT
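To see what iMacros ends up running, it can help to picture the line after the placeholder has been filled in. The sketch below builds that text in Javascript for a given loop value. The function name tagLineForNth is my own and the function is only an illustration of the substitution, not part of iMacros.

```javascript
// Build the TAG line as it would read once {{parcelloop}} is replaced by n.
function tagLineForNth(n, hrefPattern) {
  return "TAG POS=R" + n + " TYPE=A ATTR=HREF:" + hrefPattern + " EXTRACT=TXT";
}

var line = tagLineForNth(3, "/Assessor/ParcelApplication/Detail.aspx*");
// "TAG POS=R3 TYPE=A ATTR=HREF:/Assessor/ParcelApplication/Detail.aspx* EXTRACT=TXT"
```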

Want Me to Help?

If this is giving you ideas for grabbing information you could use, I can probably write iMacros scripts to get what you need, into a database or web page. Contact Me so we can talk.

