Monday, June 13, 2022

Ballpark figures: Analyzing MLB baseball attendance


It’s springtime in the U.S., which means something as American as apple pie is back: baseball. And since there’s all kinds of great data around one of the nation’s great pastimes, we decided for this week’s post to look at Major League Baseball (MLB) attendance statistics from the last 20 years, which are published on many websites, including the one we used to get the data you’ll find in the charts below: ESPN.com.

To collect the attendance data from ESPN, we used Jupyter Workspaces (currently in beta in Domo) and the Python package Beautiful Soup to parse the HTML. And since Domo can now schedule code in Jupyter Workspaces to run on a regular schedule, you can be sure that this page will continue to update with the 2022 data.
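As a rough illustration of the parsing step, here is a minimal sketch that pulls team names and average attendance out of an HTML table. It is not the code behind this post: the sample markup, the `TableParser` class, and the `parse_attendance` helper are all hypothetical, and the sketch uses Python's built-in `html.parser` instead of Beautiful Soup so it runs without extra packages. The real ESPN page layout may differ.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the real ESPN page layout may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><th>Avg</th></tr>
  <tr><td>Team A</td><td>41,234</td></tr>
  <tr><td>Team B</td><td>18,765</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows hold only <th> cells, so they stay empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def parse_attendance(html):
    """Return {team: average attendance} from a two-column table."""
    parser = TableParser()
    parser.feed(html)
    return {team: int(avg.replace(",", "")) for team, avg in parser.rows}


attendance = parse_attendance(SAMPLE_HTML)
print(attendance)
```

With Beautiful Soup, the same rows could be reached with `BeautifulSoup(html, "html.parser").find_all("tr")`, which is usually shorter than a hand-written parser.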

The first thing you’ll probably notice when looking at the data is that 2020 is missing. That’s because, due to the pandemic, baseball was played without fans that year. There was a bit of a return to normalcy in 2021, but it wasn’t until this season that all spectating restrictions were lifted, so it will be interesting to watch how attendance rebounds. (In full transparency, we only have the data for full years right now, so we aren’t capturing anything related to seasonality, such as how weather or a team’s place in the playoff race affects ticket sales.)

One good way to compare this data is with an old favorite of many data scientists: a box and whisker plot. The chart shows the minimum and maximum average attendance for each team in the whiskers (the top and bottom lines). I’ve sorted this to show the team with the highest peak attendance year on the left, and the lowest on the right:

Where the visualization gets more interesting for me is with the box parts. Each box shows the space between the 25th and 75th percentiles, which is meant to reflect how much a team’s attendance has swung over time. The bigger boxes tell me those teams (such as Philadelphia and Detroit) have had some great years for attendance and some not so great years. Smaller boxes (such as Boston) say that a team has been very consistent in its attendance numbers. We have also filtered the chart for pre-pandemic years only, since 2021 (and to a lesser extent partial 2022 data) skews the data.
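The numbers behind a box and whisker plot can be sketched in a few lines of Python. The attendance figures and the `box_stats` helper below are made up for illustration; the percentile method is also an assumption, since the charting tool may interpolate differently.

```python
from statistics import quantiles

# Made-up yearly average attendance figures, for illustration only.
attendance_by_team = {
    "Team A": [30000, 32000, 34000, 36000, 38000],
    "Team B": [20000, 25000, 30000, 35000, 40000],
}


def box_stats(values):
    """Return the five numbers a box and whisker plot draws."""
    # method="inclusive" treats the values as the whole population;
    # a charting tool may use a different interpolation rule.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),   # bottom whisker
        "q1": q1,             # bottom of the box (25th percentile)
        "median": median,
        "q3": q3,             # top of the box (75th percentile)
        "max": max(values),   # top whisker
    }


stats = {team: box_stats(values) for team, values in attendance_by_team.items()}

# Sort teams by peak attendance year, highest first, as in the chart.
ordered = sorted(stats, key=lambda team: stats[team]["max"], reverse=True)

for team in ordered:
    s = stats[team]
    print(team, s, "box height:", s["q3"] - s["q1"])
```

In this toy data, Team B has the taller box (a wider gap between the 25th and 75th percentiles), which is the pattern described above for teams whose attendance has swung the most.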

Another approach to understanding how teams rank in attendance is to create indexes of where a team’s attendance stands relative to the overall MLB average, which is what we’ve done directly below. Dark blue boxes mean that a team is well above the average, while dark orange boxes mean that a team is well below the average. You can use the filters to look at whatever league, division, team(s), or year(s) you’re interested in:

Long-time Domo users may be looking at these indexes and thinking that I did some pre-calculation in a Magic ETL or a Dataset View. It’s true that doing calculations on such total levels usually requires pre-calculation. But if I did that, it would be hard to allow for the year filter. So, the secret is out: With Domo’s new FIXED beast modes (currently in beta), you can do FIXED level of detail functions right in a beast mode. For the above “Index to League Avg”, this is the calculation:

You can see there are two things happening here. First, when I have the SUM FIXED by League, it sums across all values with the same league as the row I’m on. That gives me the league total we need for the denominator of the index. Second, it’s using FILTER ALLOW to tell Domo that filters on Year can affect the FIXED functions. There are options for FILTER ALLOW, FILTER DENY, and FILTER NONE.
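To make the idea concrete outside of Domo, here is a minimal Python sketch of a league-level total that still respects a year filter. It is an analogy, not Domo's beast mode syntax: the rows, the `index_to_league_avg` function, and the exact form of the index (team attendance divided by the league's average) are assumptions made for illustration.

```python
from collections import defaultdict

# Made-up rows, for illustration only.
rows = [
    {"team": "Team A", "league": "AL", "year": 2018, "attendance": 30000},
    {"team": "Team B", "league": "AL", "year": 2018, "attendance": 20000},
    {"team": "Team C", "league": "NL", "year": 2018, "attendance": 40000},
    {"team": "Team A", "league": "AL", "year": 2019, "attendance": 36000},
    {"team": "Team B", "league": "AL", "year": 2019, "attendance": 12000},
    {"team": "Team C", "league": "NL", "year": 2019, "attendance": 40000},
]


def index_to_league_avg(rows, years=None):
    """Attach an index (row attendance / league average) to each visible row."""
    # The year filter is applied before the league totals are built, so it
    # reaches the aggregate. This mirrors the "allow" behavior described above.
    visible = [r for r in rows if years is None or r["year"] in years]

    # One total and one row count per league, shared by every row in that
    # league. This mirrors summing "fixed by league".
    total = defaultdict(int)
    count = defaultdict(int)
    for r in visible:
        total[r["league"]] += r["attendance"]
        count[r["league"]] += 1

    return [
        {**r, "index": r["attendance"] / (total[r["league"]] / count[r["league"]])}
        for r in visible
    ]


result = index_to_league_avg(rows, years={2019})
for r in result:
    print(r["team"], r["year"], r["index"])
```

Because the filter runs first, changing `years` changes the league totals too, which is the behavior that a pre-calculated total could not offer.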

Here’s one last example of how useful FIXED with FILTER DENY can be. The bar charts below are defaulted to the New York Yankees (my boss’ favorite team). The first chart is not using FIXED, so when I filter for the Yankees, the Min, Max, and Median fields become meaningless since they get filtered to be the same as the selected team. The second chart uses FIXED and DENY on team name so that the Min, Max, and Median remain as references for the first average, which is for the Yankees.
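The difference between the two charts can be sketched in Python as well. Again, this is an analogy rather than Domo syntax: the figures, the `reference_stats` function, and its `deny_team_filter` flag are hypothetical names chosen for this example.

```python
from statistics import median

# Made-up average attendance per team, for illustration only.
averages = {"Team A": 41000, "Team B": 28000, "Team C": 19000}


def reference_stats(averages, selected, deny_team_filter):
    """Return the selected teams' average plus min, max, and median references."""
    visible = {team: value for team, value in averages.items() if team in selected}

    # When the team filter is denied, the reference values are computed over
    # every team; otherwise they only see the filtered rows.
    pool = averages if deny_team_filter else visible
    values = list(pool.values())

    return {
        "avg": sum(visible.values()) / len(visible),
        "min": min(values),
        "max": max(values),
        "median": median(values),
    }


filtered = reference_stats(averages, {"Team A"}, deny_team_filter=False)
fixed = reference_stats(averages, {"Team A"}, deny_team_filter=True)
print("filter applies to everything:", filtered)
print("team filter denied:", fixed)
```

In the first result every value collapses to the selected team's own number, while in the second the min, max, and median still describe all teams.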

One of the things I love, and also at times find maddening, about exploring new data is that there’s always more to explore. As I worked on this post, I realized that it would be pretty interesting to bring in teams’ win/loss records as well as information on stadium capacity. But then I thought: Let’s maybe save that for a future post.



