About a week ago I got an e-mail from a reader of the site asking me for help using “Power Query” to pull some Fangraphs data into Excel. Power Query is an add-in for Microsoft Excel that offers more advanced data importing options and ability to combine data from different resources.
I knew Power Query existed. But as I was reading the e-mail, my palms began to sweat and an overwhelming sense of guilt washed over me.
“I don’t know anything about Power Query!!!”
Coincidentally, my wife was out of town for the weekend and with the girls in bed early, I had a handful of hours on Saturday night to give myself a crash course in how to use Power Query (ah, the exciting and glamorous nightlife of a baseball nerd!).
And now I might be hooked.
Tables, Tables, Tables!!!
I recently wrote a post (about importing a CSV file into Excel) that included a list of benefit to using Excel tables.
But I missed a really important one… If you import your data into Excel as a table, you create a connection to the data that is linked and can be updated automatically.
Let that sink in for a minute.
I’ve shown you how to make a lot of the Excel file’s that are isolated, dead, and not directly linked to any outside information.
I might have you download some data. Then copy and paste it into Excel. And then convert it into a table. But this is not ideal. The only way to update that information is to manually download it, open the file you downloaded, copy the data, paste it into your file, and cross your fingers that none of your formulas break when you paste over the top of everything.
If we can start importing data directly into Excel as tables, rather than copying and pasting data manually, we can maintain the link to the original data and then very easily update it in the future. And Power Query is capable of helping us do that. It gives more options to create live links to data sources and better options to manage those connections.
Imagine not having to rebuild a new rankings and dollar value file from scratch EVERY season. If you set the file up intelligently, you can use the same file to quickly get in-season values or to update the file for the next season in only a couple of minutes.
Power Query and the Power of Tables
In my limited use of Power Query so far, the thing that has me most excited is that it gives you the ability to import more data sources as tables. We have previously looked at how to use web queries to get information into Excel, but if you use a basic web query, the information does not come in as a table.
Granted, a web query does still leave a live link to the original data. But I want the best of both worlds. I want a live link to the original data AND to import it as a table!
There is a Catch
When using a standard web query (outside of Power Query), you do have the option to import the entire web page. This is messy and loaded with complications, but it’s helpful to have the option.
There is no such option in Power Query. You can only use Power Query to web query actual HTML tables from a web site. My educated guess is that to set something up as a table in Excel requires a neat and structured block of data, which querying an entire web page is not.
This is unfortunate, because some really great sites like Baseball Press don’t use tables to present their data. Instead, they use the division (< DIV >) HTML tag.
Downloading Power Query
Despite Power Query not being the silver bullet we need to resolve all our data needs, it’s definitely a tool worth having in the arsenal. And it’s free!
You can download the add in from this page. There are some restrictions you should know about. The program requires at least Windows 7 (sorry again Mac users… you should really look into partitioning your Mac to run Windows).
You also need to be running Excel 2013 (any version) or Excel 2010 “Professional Plus”. After you’ve downloaded the installation, close out of Excel and proceed through the installation. The new toolbar on the ribbon should appear the next time you open Excel. If you’re not seeing a “Power Query” tab, you may need to activate the add-in. Check out the instructions here on how to turn on the add-in (look for the section labelled “My Power Query Tab Disappeared”).
Making Your First Web Query With Power Query
If you’ve read any of my previous pieces on web queries, this really isn’t that much different. The improvement is with the data being in a table and some additional capabilities to fine tune the data that is imported. But let’s take a closer look at how the basic functionality changes.
Step | Description |
---|---|
1. | The first task is to identify a web page you want to query AND to determine that it does contain HTML tables. My excitement over Power Query is tempered some by that fact that it is difficult to locate useful resources that put their data into tables. Many valuable sites and specific pages don’t!
Remember, to determine if data is in a table, right click somewhere on the web data you would like to capture and choose the menu option to “Inspect Element”. This will load the HTML “code” used to create the web page. If you see references to A few examples of potentially helpful tables that I’ve found:
After you have located a table to import, copy the web page address. For my example, I’ll use the Fantasy Pros rest of season projections (link is http://www.fantasypros.com/mlb/projections/ros-hitters.php). I realize this is not useful for DFS, but I just want to demonstrate the basics of Power Query now. |
2. | Open a blank Excel file. Click the newly added “Power Query” tab. Then click the “From Web” icon on the left of the ribbon.
Then paste the copied URL into the dialog box and click “OK”. |
3. | The “Navigator” dialog will appear. It may take a minute or two to load as Excel processes each of the tables on the page.
Once the loading process completes, you will see a list of all the tables available for import. Click your mouse to locate the data you want. If you wish to import more than one table, check the “Select multiple items” box. As you click on the various tables, watch the preview pane on the right in order to locate the exact table you want. |
4. | Stop! This step is informational only. Don’t do anything!
At this point, you could click the “Load” or “Load To…” button. The “Load” button will import the data exactly as you see it into a newly created worksheet tab. The “Load To…” option gives you a few more control over how the data loads. In the ensuing menu you have the option to import the table (recommended) or only add the connection to your file (not sure why you would want to do this unless you were unsure of where to place the table now). You can also choose to create a new worksheet or to place the table in an existing spot. Working with data models is something I may explore in the future. If you want to look ahead, you can start here. Loading from here bypasses some of the real value that Power Query offers. These features are available when you click the “Edit” button. |
5. | Start! You can start following along again.
Click the “Edit” button and the “Query Editor” will load. In this screen we not only see the preview of the data that will be imported, we can also clean things up. For example, the first column is labelled “VBR”. This looks like some kind of a ranking, but I don’t want to import this. Additionally, the second column has a lot of information in it. Instead of seeing “Mike Trout(LAA – CF)” all in one column, I want to try breaking that into separate columns. Rather than bring it in and have it clog up my screen, we can tell Power Query not to import this column. To do this, click on the “VBR” column to select it. Then click the “Remove Columns” button. |
6. | Now let’s move on to splitting apart the player name, team, and position.
Click once on the “Player” column. Then click the “Split Columns” button. And then click the “By Delimiter” option. A delimiter is a unique character that represents a change in the field or information. Looking at the data we have, the opening parenthesis is a delimiter between player name and team. There is no option to choose that from the drop down menu, so instead select the “–Custom–” option. Then type in the open parenthesis, “(“, and select the option for “At the left-most delimiter”. You should now see that the columns automatically get split! |
7. | Let’s keep going and try to separate out each player’s position. Click the new “Player.2” column and then click the “Split Columns>By Delimiter” menu button again. This time use the settings in the image below to split the column at the hyphen.
Power Query is really looking useful. |
8. | The last thing bothering me is the closing parenthesis after each player’s position. To get rid of this, click to select the “Player.2.2” column and then click the “Replace Values” icon.
Once the “Replace Values” dialog loads, enter the closing parenthesis in the “Value To Find” field AND LEAVE THE “REPLACE WITH” FIELD BLANK! Then click “OK”. Check this out… |
10. | Now that the data is cleaned up, click the “Close & Load To…” button on the “Home” tab of the ribbon.
This will load the same “Load To” box discussed earlier. Adjust the settings as you see fit and click the “Load” button when you’re done. The data loads exactly as you cleaned it up! |
This Is Not the Best Example
Because I’m in the middle of a series of DFS-related blog posts, I wish I had a more concrete example that specifically tied in DFS information. But I did want to demonstrate the power of cleaning and tweaking data with Power Query. Hopefully you can recognize there is a great deal of value in knowing these tools exist so you can use them to solve issues as you build your own DFS spreadsheet with the information you like using.
I’ve included a couple more links below that may help you down the Power Query path.
As always… stay smart.
Are You Using Power Query? Other Add-Ins?
Is anyone using Power Query already? What kinds of things are you using it for? What sites are you loading the data from?
I’d love to hear it if you are. Please e-mail me or leave a comment below.
Other Resources
I’ve only given a brief overview of the full capabilities. If you’re intrigued and looking for more examples, check out these additional resources below.
Name | Description |
---|---|
Download Page | Free download page. Power Query is free, but it does require you to have Excel 2013 or Excel 2010 “Professional Plus” (I don’t know exactly what that means). It also requires you be using at least Windows 7.
If you’ve been on the fence about upgrading to the newest version of Excel, I list out a few of the purchasing options here. |
Interactive Online Tutorial | If you choose only one of these items to click on. Choose this one. The demo is only a few minutes long, but it does a great job of demonstrating how you can really fine tune and clean up the data you import through Power Query. |
Interesting Examples that Might Apply to Baseball Data | This is a fairly lengthy post, but look specifically for the sections labelled “Append (Combine) Tables with Power Query”, (I could see this being a way to import a player’s last three seasons of data, or to import multiple projection systems) and “Merge Tables – A VLOOKUP Alternative” (a way to combine DFS salaries with info from other sites). |
Introduction to Power Query | This is simply a written explanation of Power Query and its features and benefits. More detailed than I’ve explained above. |
Advanced Example, Dynamic and Multi-Layered Queries | I started this post off by referring to a question I received from a reader of the site. He wanted to provide Excel with a list of player IDs and then have it systematically go out to the player pages for each of those IDs and pull back data.
I wasn’t sure this was possible in Excel, but turns out that it is! This resource demonstrates how to make advanced edits to your query to make it dynamic (to ask for a player ID) and to make it multi-layered. Said another way, you can have one query go and fetch a list of player IDs and you can have a subsequent query run off each of those IDs. “Hey Excel, go get this list of players. Then go through each player on that list and go get me the standard data from their Fangraphs page.” It’s slow… But it works. |
Hi, Thanks for the note on using PowerQuery. I actually use it for work a lot so I am quite familiar with it, but I haven’t quite gotten there on the fantasy baseball project I’m working on and was wondering if you could help. I’m primarily interested in tracking my league’s daily game activity / game logs. So 10 teams, 23 players, so give or take 230+ players throughout the 162 game season. I’m trying to figure a way to easily pull in all these players’ game logs – I know Fangraphs and B-R offer these, but power query seems to not be able to pull them through the table method you have explained here, so I’ve basically resorted to downloaded to excel and combining, which obviously is a big task. Basically, I’d like a way to 1) streamline 230+ players to be able to write a macro or code to download them all from their respective fangraphs or B-R game log page, and 2) constantly update throughout the season. Would love to hear your thoughts on this now that you’ve gotten to PQ demonstrations and may know something I don’t.
Hey! I’ve been looking to do exactly what you’re doing. Were you able to come up with any macros or get anywhere with this?
Hey Paul,
Sorry, I haven’t made much progress on this and automating a daily spreadsheet. Maybe this year. Who knows. I think I’m going to make a good faith effort to start playing some DFS baseball this summer.
Hi, Sean. I likely don’t have any tricks you’re not already using. But the one example I linked to above (and here), may be able to do what you’re looking for. Essentially, you could list the Fangraphs IDs for the 230 players in one table, and then instruct Power Query to go through all the players listed.
Out of curiosity, what are you going to use the game logs for?
I should mention that even if Power Query can handle the example from above, I don’t think it’s the best tool for the job. I’m not a programmer, but my guess is creating some kind of web scraping script and importing the results into SQL or some other database is probably the best bet.
Tanner – thanks for the response. I am attempting to use the game logs to monitor all teams in my league’s daily performance. Since it’s a head to head league, I want to understand some of the weekly matchup data and who “got lucky” vs. eked out victories in any given category. It’s a massive undertaking, I know, and I can easily take the data from weekly match up pages, but I’d like to go a step further and understand the players used by teams – basically, who was able to pick up a cast off from another team, who ended up having mostly free agents on their team – and figure out how successful those strategies could be. It’s a long shot, I know. I keep struggling when I get to this big data of being able to pull in 250+ players’ game logs and having them update so that I can update weekly performance relatively quickly and automated. Going through weekly sits/starts for each of the 10 teams is still a manual process, and I don’t know if there is a way there to be quicker.
I haven’t read through this web scraping technique, and I’m no programmer myself, but yours is the only blog I’ve seen that really gets into the nitty gritty of some of these excel / formula ways to chop up this type of data, so I figured I’d bounce it off you first.
At the end of the day, I don’t know if this project I am doing with the league is worth it, or if it will tell me anything meaningful. I mean, I know the team in first is good because he tanked last year and got some good keepers that paid off and drafted incredible pitching. There’s no secret there. But understanding anything else (such as injury history, # of moves, success of sitting / starting certain people) intrigues me. I performed this project (albeit, much more manually) two years ago and actually gave up halfway through due to the time it took, but it was interesting nonetheless.
Hi, Sean. Yes this sounds like quite an undertaking! It does sound to me like Excel is not the ideal tool for what you’re trying to do. But it may be possible.
I have a few thoughts popping up in my head, but none of them will be “easy”. If you’re up for the challenge, my first thought would be to see if you can do this after the season is over. You would need to gather some data during the season, but Retrosheet.org puts out game log files after the season is over. Maybe you could use those with the data you capture from your league and do a retrospective analysis after the season.
Next thought is to use the API for your league hosting provider. I have not seen much written on this and it would take some programming knowledge to do it. But there were two articles on Razzball about how to use Yahoo’s API earlier this season. The first is at razzball.com/using-yahoos-api-access-leagues-data-part-1-1000/.
Hope that helps some.
Tanner,
So, for instance, if I have created a conditionally-formatted spreadsheet using my own Fangraphs charts, using Power Query is the only way to update the spreadsheet automatically without exporting the data from Fangraphs and pasting it manually?
Hi Ralph,
No, that’s not necessarily true. Apologies if I said something misleading. There is a setting you can use to have data from a regular (non-Power Query) web query refresh when you open an Excel file. Or you can right-click within a set of connected data (from a regular web query) and choose to “Refresh” it manually.
To have it refresh upon opening the file, the very last step of creating a web query, on the screen where Excel is asking where you want to data to be placed, should be a “Properties” button. In that properties screen is a check box to refresh data upon opening the Excel file.
You definitely don’t need to copy and paste it.
The main advantage I see in Power Query is it brings data in as an Excel table, which a regular web query does not do. But in the grand scheme of things, that’s not a significant advantage. Power Query also seems to have some very high-end powerful features that I don’t fully understand yet. Maybe one day!
Tanner,
No fault of yours. I have never been able to get web queries to work in any Excel files I create. I’m hoping that after my machine is upgraded to Windows 10 the problems will disappear. Also, I believe a new version of Excel is going to be released in September, so maybe that will help.
Keep up the great work!
Hi Tanner,
Great article! This is Jon from ExcelCampus.com. Thanks for linking to a few of my articles in your post.
It’s really cool to see how others are using Power Query, and your example is just awesome! It’s a perfect example of the power of Power Query, and what it can accomplish without any programming knowledge needed.
In one of you comments above you mentioned that creating a SQL script and web scraping tool would be another alternative. Power Query is actually doing just that. All those steps you create in Power Query to modify the data actually create a SQL script in the background and run it on the database when you refresh the query. That SQL statement also depends on the source data. In this case the source is a web table, so most of the SQL work is done in Excel. If you are querying an actual database then the SQL work can be done on the database side before retrieving the data. This helps make Power Query fast and efficient.
If you really want a mind blowing experience with Power Query, checkout PowerBI.com. This is a new FREE service from Microsoft that uses Power Query and Power Pivot to create interactive dashboards on the web. I’m sure you might be able to find uses for it with fantasy baseball. I have an overview video on the following page where I connected it to three different web page sources to create charts and dashboards about one of my favorite ski resorts.
http://www.excelcampus.com/powerbi/powerbi-designer-overview/
Let me know if you ever have any questions about PQ or any of these tools. Thanks again and have a good one Tanner!
Hi Jon,
Thanks for taking the time to reach out. I might take you up on the offer to ask questions about PQ! I won’t bore everyone with the details here, but I’ve been struggling trying to get a web query to read a value from a cell in the workbook.
We’re a bunch of fantasy baseball fanatics trying to gain advantages by using data analysis, mostly in Excel. The challenge is to pull all different types of information and put it into an easy-to-digest format. We’re trying to build an efficient/easy way to pull player information, historic stats, opponent information, ball park information, lineup information, and more.
It’s a blast!
Tanner,
Well, I got Power Query to work….sort of. I am trying to create a spreadsheet of my own fangraphs data that will update either automatically or when I refresh it. When I link one of my tables to the fangraphs URL, it only imports whatever data is showing on the webpage(i.e. the first 30 players), not the entire table(all the players in the table). If I try linking the table to the fangraphs “export data” shortcut, I get this error:
Expression.Error: The scheme ‘javascript’ of the input URL isn’t supported
Any ideas?
Hi Ralph, unfortunately, I don’t see any way to automate this. The main reason being that the data does not actually load on the page. If you look at the underlying HTML for the page (right click somewhere on the page and choose “Inspect Element”), you will see that only thirty players load on a page. And you can’t just adjust the URL to get the next 30 (some sites you’ll see the web address changes for the next 30, meaning you could set up a bunch of different queries to get 30 at a time).
The best suggestion I can offer is that you follow these instructions and use the CSV approach to update the data. It’s still a bit manual because you have to download the CSV each time, but those instructions will help you streamline it as much as possible.
On fangraphs if you change the page to load 50 players it adds “&page=1_50” to the URL. You can then change that to &page=1_1000 to load 1000 players on one page and import that entire page into Excel :)
Thanks for sharing this, Sanga. I only wish this worked on ALL pages. It doesn’t work on projection pages. But this is definitely helpful.