Postscript: Turns out this method using Weather.gov’s XML feed is not as effective as using their HTML feed. Check out the HTML sequel here.
One of my most popular posts is on retrieving real-time weather into Excel.
A common question here is how to get recorded weather updates; that is, keep a log of the weather updates for a specific place.
For that I will write an R script to parse weather.gov’s XML feed for a location and then use the task scheduler to automate going to the site, pulling the weather info & appending to our historical records.
Let’s get started. First time using R? Check out my free course, “5 Things Excel Users Should Know About R.”
1. Read our XML site
We will use the xml2 library to read in our .xml feed from the web and parse the various strings we are interested in. This is similar to what we did with WEBSERVICE and FILTERXML in our Excel example.
First, let’s load the xml2 library and read in our site. I will be using the feed from the Akron airport for this example.
I am also going to create a blank data frame called “history.” You’ll see why in a minute:
library(xml2) #first time set a blank history data frame history <- c() #read in the web file weather <- read_xml("https://w1.weather.gov/xml/current_obs/KAKR.xml")
2. Parse our nodes of interest
Next we will parse our nodes of interest using the xml_find_first function. I will use XPath to select these nodes – if you remember from our previous post, these will always start with “.//” and then the tag.
After that, we use xml_text and related functions to extract the information. Because our temperature variable is numeric, I use xml_double instead.
#get the temperature temp_f <- xml_find_first(weather, ".//temp_f") temp_f <- xml_double(temp_f) #get the weather description temp_desc <- xml_find_first(weather, ".//weather") temp_desc <- xml_text(temp_desc) #get the observation times obs_time <- xml_find_first(weather, ".//observation_time_rfc822") obs_time <- xml_text(obs_time)
3. Merge the real-time update with your historical records
Now you will see why we set up the blank data frame above. Each time we run this script we will want to append our new update to our historical records. Of course at this point we don’t have any history, so the data frame is blank. That will change after this, and we will modify our code accordingly.
For now, though, I will combine our three pieces of real-time information using the cbind function (which binds columns), then I’ll append this to our historical records using rbind (which binds rows).
At this point you can write the file to a .csv file if you’d like.
#put this info together realtime <- cbind(temp_f, temp_desc, obs_time) #merge the realtime and historical historyupdated <- rbind(history, realtime) #write to a csv if you want write.csv(historyupdated, "C:/RFiles/historyupdate.csv", row.names = FALSE)
4. Save the R data and set it up to load next time
We want to save this log as an R file to read up as our historical file next time. To do this I will save an RDS file (this is R’s internal data record file extension).
So next time instead of starting up with a blank historical data frame, we’ll read this log in instead.
#RESAVE THE FILE saveRDS(historyupdated, "C:/RFiles/weatherlog.rds") #AFTER THAT, load up the history history <- readRDS("C:/RFiles/weatherlog.rds")
See the full code below to get a better sense of how this works.
One last thing… Save your R Script somewhere convenient and simple. For example I have mine in C:\RScripts. This will make the next step easier.
5. Automate retrieval with the task scheduler
The above script is pretty cool but requires the user to run it at regular times, which would get tedious.
Instead we are going to use the Task Scheduler in Windows to automate running this code.
Find the Task Scheduler using your Windows search bar and head up to Action | Create Task.
Name the task what you’d like. I’d also suggest you set to “Run with highest privileges.” Remember that this script will only run when you are logged onto your computer (unless you say otherwise) and will certainly not run if the computer is off.
Next, go to Triggers. We are going to automate this script to run every hour, which is how often the weather is updated at this location. I will set the task to repeat every hour for an indefinite duration.
Almost there! This next part takes some practice. Go to Actions and create a new action.
Here you first select where your Rscript.exe file resides on your computer. For me it is C:\Program Files\R\R-3.4.3\bin\j386\Rscript.exe. Whew! Fortunately you can browse to this location.
Under Add arguments, type the name of the R script from above. Mine is named WeatherLog.R.
Now here is why I suggested you keep the file somewhere simple. Under Start in, you will put the folder where this file is located.
Now the weather comes to you.
So long as nothing in your system changes, you will get weather updates on the hour delivered to your CSV file. From here you could feed this into a workbook using Get & Transform. Bonus — from here you can use the Query Editor to transform the observation time column to your preferred format.
Leave a Reply