Dynamically fetch web page contents in Excel

Rob Hammond
Published in bloghammond · 3 min read · Sep 23, 2011

Excel’s built-in web features are pretty frustrating when you want to do more with the web than import a static HTML table into a predefined set of cells.

I’ve often wanted to update the contents of a cell based on dynamic parameters passed into a URL, but never found a decent, easy way of doing it. The official Office website shows you how to do this the Microsoft way, but lo and behold, that doesn’t actually translate to real-world uses very well.

Say, for example, you want to fill a column of cells with the ranking for a given list of keywords. A function similar to the one shown below (where the URL could potentially be defined in column E) would be very useful:
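As a sketch of the idea (assuming a keyword in A2 and the getURL user-defined function described later in this post, with a hypothetical domain and result count), the worksheet formula might look like:

```
=getURL(A2, "example.com", 100)
```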

Provided you had an API into ranking data (see the Perl code below), this should be an easy operation, but it doesn’t seem to be something Excel does out of the box.

I’m sure there are tons of XLLs and paid-for solutions out there that do the same thing, but I wanted something flexible and, ideally, free. Hence the user-defined function below, which can replace Excel’s awkward built-in parameter handling quite easily and for lots of different uses:

' Name the function anything you like - the variable parameters to pass in this instance are keyword, domain, number of results
Public Function getURL(kw As String, domain As String, num As Integer) As String
' Build the URL from the parameters passed in
Dim URL As String
URL = "http://www.example.com/api/script.pl?kw=" & kw & "&domain=" & domain & "&num=" & num
' Uncomment to debug the request URL
'Debug.Print "Request URL: " & URL
Dim objhttp As Object
Set objhttp = CreateObject("MSXML2.ServerXMLHTTP.6.0")
objhttp.Open "GET", URL, False
objhttp.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
' If your URL uses basic HTTP authentication (like mine did), uncomment the line below and replace the contents of [] with your base 64 encoded credentials
'objhttp.setRequestHeader "Authorization", "Basic " & "[base64_encode[user:pass]]"
objhttp.send ("")
Dim Response As String
Response = objhttp.responseText
' To debug the output in Excel's Immediate window, uncomment the line below
'Debug.Print "GA data feed response: " & Response
' Return the contents of the URL to the cell that called the function
getURL = Response
End Function
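The Basic-auth header mentioned in the comments needs base64-encoded user:pass credentials. A minimal sketch of generating that value (here in Python, with placeholder credentials, since VBA has no built-in base64 helper) might look like:

```python
import base64

# Placeholder credentials - substitute your own API user and password
user = "user"
password = "pass"

# Basic auth expects base64("user:pass"); decode back to str for the header value
token = base64.b64encode(f"{user}:{password}".encode("ascii")).decode("ascii")
header_value = "Basic " + token
print(header_value)  # Basic dXNlcjpwYXNz
```

Paste the resulting string into the Authorization header line in place of the [base64_encode[user:pass]] placeholder.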
I'm no VBA whiz or anything, so I'm sure the above code could be tweaked vastly to make it faster, more robust, and more flexible for variable numbers of query parameters, different query types and so on (open to suggestions :)), but hey, it's a quick and easy way to solve a problem.
The Perl code: as mentioned above, the VBA code was written specifically to grab ranking data from an API. Hence there are no fancy XPath expressions in the VBA, nor even any attempt to parse the output. This is another potential improvement, but I try to keep Excel's interactions with the web and text processing to an absolute minimum, because Excel + teh internets = sloooow. In this instance, the script output is literally just the numeric data, so any processing is done by the faster Perl code.

The code below relies on a custom scraping library (SearchMarketing::Crawl::GoogleNatural) - I'll leave that code up to you to re-create ;)

It'll only check the top 100 results, but for most purposes that's more than enough. If there is more than one result for the given domain, it'll print the positions separated by an ampersand, which can then be manipulated in Excel if necessary (I'll just about trust Excel for that...)

#!/usr/bin/perl -wT
use strict;
use WWW::Mechanize;
use SearchMarketing::Crawl::GoogleNatural;
use CGI qw(:standard);
my $kw = param("kw");
my $num_results = param("num");
my $domain = param("domain");
print header;

getPos($kw, $num_results, $domain);

sub getPos {
my $kw = $_[0];
my $num_results = $_[1];
my $domain = $_[2];
my %results = googleUK($kw, $num_results);
my $count = 0;
foreach my $position ( keys %results ) {
my $title = $results{$position}[0];
my $snippet = $results{$position}[1];
my $dest_url = $results{$position}[2];
my $disp_url = $results{$position}[3];
if ($dest_url =~ m{^https?://[^/]*$domain/?.*$}i) {
if ($count > 0) {
print "&";
}
print $position;
$count++;
}
}
if ($count == 0) {
print "N/A";
}
}
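For readers more comfortable outside Perl, the core of getPos - matching each result's destination URL against the domain and joining the matching positions with an ampersand - can be sketched in Python. The results dict shape mirrors the Perl hash (position mapped to [title, snippet, destination URL, display URL]); the sample data is made up for illustration, and unlike the Perl (which iterates hash keys in arbitrary order) this sketch sorts positions:

```python
import re

def get_positions(results, domain):
    """Return '&'-joined positions whose destination URL matches the domain,
    or 'N/A' if none match - mirroring the Perl getPos sub."""
    matches = []
    for position in sorted(results):
        dest_url = results[position][2]
        # Same idea as the Perl regex: scheme, optional subdomain, then the domain
        if re.match(r"^https?://[^/]*" + re.escape(domain) + r"/?.*$", dest_url, re.I):
            matches.append(str(position))
    return "&".join(matches) if matches else "N/A"

# Made-up sample data: position -> [title, snippet, destination URL, display URL]
sample = {
    1: ["Title A", "Snippet A", "http://www.example.com/page", "example.com"],
    2: ["Title B", "Snippet B", "http://other.org/", "other.org"],
    5: ["Title C", "Snippet C", "https://example.com/deep/path", "example.com"],
}
print(get_positions(sample, "example.com"))  # 1&5
print(get_positions(sample, "nomatch.net"))  # N/A
```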

PS Director, EMEA at BrightEdge. Technologist, SEO, renewable energy enthusiast