As you probably know, the 2022 NCAA Men’s Basketball Tournament ended earlier this month with the Kansas Jayhawks winning their fourth national championship. But while the tournament is over, we haven’t put it in our rearview mirror yet. We thought it would be a good opportunity to write about the process of building a data app rather than just showing one off. Specifically, this post follows up on our earlier post on March Madness.
One of the reasons Domo is a great platform is the end-to-end functionality it offers for building data apps. Two of the first steps in building a data app are gathering all of the data and blending it together. This can be difficult, messy, and time-consuming. This post covers some of the data inconsistencies we ran into with our March Madness data app, and shows how we think about bringing data into Domo and automating some of these processes.
During the pandemic, the NCAA set up a page with the results of every men’s tournament from 1939 to 2019. The data itself is messy and has errors and inconsistencies throughout. On top of that, the tournament format has changed many times over the years. It has gone from a 32-team tournament to a 64-team tournament to the current 68-team tournament, and at one point there was also a third-place game.
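Because the field size changed over the years, one quick sanity check on scraped results is the number of games per year. In a single-elimination bracket every game eliminates exactly one team, so a field of N teams needs N − 1 games to crown a champion, and a third-place game adds one more. The sketch below is our own illustration, not part of the original pipeline; the function names are hypothetical, and you would supply the field size and third-place flag for each year yourself.

```python
def expected_games(field_size, third_place_game=False):
    """Games in a single-elimination bracket of `field_size` teams.

    Every game eliminates one team, so N teams need N - 1 games to leave
    one champion. An optional third-place game adds one more.
    """
    if field_size < 2:
        raise ValueError("a bracket needs at least two teams")
    games = field_size - 1
    if third_place_game:
        games += 1  # consolation game between the two semifinal losers
    return games


def flag_mismatch(year, games_found, field_size, third_place_game=False):
    """Return a message if the scraped game count looks wrong, else None."""
    expected = expected_games(field_size, third_place_game)
    if games_found != expected:
        return f"{year}: found {games_found} games, expected {expected}"
    return None
```

For example, `expected_games(68)` gives 67 (the four First Four games plus the 63 games of the 64-team bracket), and a year that comes back short of that count is a good place to start looking for scraping or source errors.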
We wanted this project to mirror what many users often have to go through to get data. So instead of buying data from one of the many sports-data providers, we decided to pull it from the NCAA using Python and Beautiful Soup, a Python package for parsing HTML and XML documents. The Domo platform is powerful and flexible here: it comes with a large library of built-in data connectors, while still letting people break out their high-code skills when they want to.
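Our scraping code isn’t reproduced in this post, so here is a minimal sketch of the idea: walk the rows of an HTML results table and turn each row into a record. It assumes the results live in plain `<table>` rows with one game per `<tr>`, and the column order (`round`, `winner`, `score`, `loser`) is a hypothetical layout, not the NCAA page’s actual structure. With Beautiful Soup the loop is shorter (`soup.find_all("tr")`, then `row.find_all("td")` and `cell.get_text(strip=True)`); this version swaps in the standard library’s `html.parser` so it runs without installing anything.

```python
from html.parser import HTMLParser


class ResultsTableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows: lists of cell strings
        self._row = None    # row currently being read, if any
        self._cell = None   # text fragments of the current <td>, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> still lands in the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows built from <th> end up empty; skip them
                self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed by bad markup


# Hypothetical column order; adjust it to match the real results table.
COLUMNS = ("round", "winner", "score", "loser")


def parse_games(html):
    """Turn every complete table row into a dict keyed by COLUMNS."""
    parser = ResultsTableParser()
    parser.feed(html)
    parser.close()
    return [dict(zip(COLUMNS, row)) for row in parser.rows if len(row) == len(COLUMNS)]
```

Rows that don’t have exactly one value per column are skipped rather than guessed at; on a messy source like this one, it can be worth logging those rows instead so they can be reviewed.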
We opened Jupyter Workspaces (a beta feature) in our Domo instance and created a Python notebook to scrape the data and load it into Domo. You can also set Jupyter notebooks to run on a schedule by clicking the dataflow button in the notebook:
After getting the data into Domo, we blended it together using the Magic ETL tool. Simple SQL-like statements let us create a common data definition across tournaments, for example for the Round data. Below is a look at the raw Round data, along with the number of times each Round value appeared in the imported game data:
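A frequency table like the one above is a useful first look before writing any transforms, because every distinct spelling of a round shows up as its own row. If you wanted the same profile in the notebook, a minimal sketch could look like this; it assumes each scraped game is a dict with a `round` key (a hypothetical record shape, not the post’s actual schema).

```python
from collections import Counter


def round_value_counts(games):
    """Return (raw round label, count) pairs, most frequent first."""
    # Rows with no round value are counted under a visible placeholder
    # so they show up in the profile instead of disappearing.
    counts = Counter(game.get("round") or "(missing)" for game in games)
    return counts.most_common()
```

`Counter.most_common()` sorts by count and keeps labels with equal counts in the order they were first seen, so rare spellings such as `round-1` tend to sink to the bottom of the list, which is exactly where the inconsistencies usually hide.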
Here you can see all sorts of interesting details. For instance, the first round can be labeled “First Round,” “First Round (Round of 64),” or even “Second Round (Round of 64),” because for a time the NCAA counted the play-in games as the first round, which made the Round of 64 the second round.
To normalize the data, we reviewed all of the different Round names and settled on one standard name per round so that our data app would work correctly. We built these transforms in Magic 2.0 with simple CASE statements like this:
CASE
  when `round` = 'CHAMPIONSHIP' then 'National Championship'
  when `round` = 'Championship' then 'National Championship'
  when `round` = 'round-1' then 'First Round (Round of 64)'
  when `round` = 'First Round' then 'First Round (Round of 64)'
  when `round` = 'round-2' then 'Second Round (Round of 32)'
  when `round` = 'Second Round' then 'Second Round (Round of 32)'
  when `round` = 'round-3' then 'Sweet 16'
  when `round` = 'round-4' then 'Elite 8'
  when `round` = 'Sweet Sixteen' then 'Sweet 16'
  when `round` = 'Elite Eight' then 'Elite 8'
  when `round` = 'Second Round (Round of 64)' then 'First Round (Round of 64)'
  when `round` = 'Third Round (Round of 32)' then 'Second Round (Round of 32)'
  when `round` = 'FINAL FOUR®' then 'Final Four®'
  when `round` = 'Final Four' then 'Final Four®'
  when `round` = 'Regional Finals' then 'Elite 8'
  when `round` = 'Regional Semifinals' then 'Sweet 16'
  when `round` = 'FIRST FOUR®' then 'First Four®'
  when `round` = 'First Four' then 'First Four®'
  when `round` = 'Opening Round' then 'Opening Round Game'
  else `round`
end
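The same mapping can also be kept as a lookup table, for instance if you preferred to normalize rounds in the notebook before the data reaches Domo. This is an illustrative Python equivalent of the CASE statement, not how Magic ETL executes it; the names `ROUND_ALIASES`, `normalize_round`, and `unmapped_labels` are hypothetical.

```python
# Raw round label -> standard round name (mirrors the CASE statement).
ROUND_ALIASES = {
    "CHAMPIONSHIP": "National Championship",
    "Championship": "National Championship",
    "round-1": "First Round (Round of 64)",
    "First Round": "First Round (Round of 64)",
    "Second Round (Round of 64)": "First Round (Round of 64)",
    "round-2": "Second Round (Round of 32)",
    "Second Round": "Second Round (Round of 32)",
    "Third Round (Round of 32)": "Second Round (Round of 32)",
    "round-3": "Sweet 16",
    "Sweet Sixteen": "Sweet 16",
    "Regional Semifinals": "Sweet 16",
    "round-4": "Elite 8",
    "Elite Eight": "Elite 8",
    "Regional Finals": "Elite 8",
    "FINAL FOUR®": "Final Four®",
    "Final Four": "Final Four®",
    "FIRST FOUR®": "First Four®",
    "First Four": "First Four®",
    "Opening Round": "Opening Round Game",
}

# Every standard name the mapping can produce.
CANONICAL_ROUNDS = set(ROUND_ALIASES.values())


def normalize_round(raw):
    """Map a raw label to its standard name; like the CASE's ELSE branch,
    labels with no alias pass through unchanged."""
    return ROUND_ALIASES.get(raw, raw)


def unmapped_labels(raw_values):
    """Sorted raw labels that still don't normalize to a standard name."""
    return sorted({v for v in raw_values if normalize_round(v) not in CANONICAL_ROUNDS})
```

Keeping the aliases in one table makes it easy to see which spellings are covered, and `unmapped_labels` surfaces anything the mapping silently passes through (a label such as “Third Place” would show up there), so new inconsistencies get noticed instead of leaking into the app.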
Outputting these transforms gave us a blended dataset covering eight decades of March Madness results that can be analyzed and shared with anyone. Pretty cool, huh?