March 30, 2017

Men’s March Madness Descriptive Data Model: Predicted Duke with 74.6% Accuracy

Every year during the March Madness men’s basketball tournament, thousands of people fill out brackets. Most of these brackets are filled out with gut-based decisions or a bias toward a favorite team or alma mater. But what if there were a way to put data science to use and win your pool? That is the question I set out to answer, and I can tell you that I not only predicted Duke to win it all but also picked 47 of 63 games correctly, an accuracy of 74.6%. My actual bracket is below.

[Image: my 2015 tournament bracket]

Building the Data Model
Modeling the NCAA tournament isn’t easy: there are many data sources, and you don’t know in advance which variables will help you build the right model. That is why I started with the 2014 tournament data, to understand which variables I could gather to build a descriptive model that predicts the 2015 tournament.

I downloaded the 2014 data with the following variables: Wins, Losses, Winning Percentage, RPI, Strength of Schedule, AP ranking, Coaches Poll Ranking, Tournament Seed, Points Per Game, and Points Allowed. These are all descriptive (independent) variables, not the dependent variable, or goal, of the data model.

The goal of the data model is to predict the winner of each game from the descriptive variables. To do this, I created a goal variable called Tournament Wins. In a 64-team single-elimination bracket, the champion wins six games; in 2014 that was the University of Connecticut. I manually counted the number of wins for all 64 teams in the 2014 tournament and added the goal variable to my spreadsheet.
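Counting wins by hand works for one tournament, but if the game results are available as data, the goal variable can be derived automatically. The sketch below is a minimal illustration, assuming the results are stored as a list with the winning team of each game; the function name `tournament_wins` and the four-team toy bracket are hypothetical.

```python
from collections import Counter


def tournament_wins(teams, game_winners):
    """Map every team in the field to its number of tournament wins.

    `game_winners` holds the winning team of each game played (an assumed
    input format). Teams that never won a game get 0.
    """
    counts = Counter(game_winners)
    wins = {team: counts[team] for team in teams}
    # In a single-elimination bracket every game has exactly one winner,
    # so total wins must equal the number of games: len(teams) - 1.
    if sum(wins.values()) != len(teams) - 1:
        raise ValueError("win counts do not match a complete single-elimination bracket")
    return wins


# Hypothetical four-team bracket: A beats B, C beats D, then A beats C.
field = ["A", "B", "C", "D"]
print(tournament_wins(field, ["A", "C", "A"]))
# {'A': 2, 'B': 0, 'C': 1, 'D': 0}
```

For a 64-team field, the same check expects 63 games, and the champion ends with 6 wins.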

Processing the Data Through a Data Mining Tool
To analyze how the descriptive variables relate to the dependent variable, Tournament Wins, I used Weka, a free machine learning tool. Weka let me run several data mining algorithms on the data to build a descriptive model. The first model I ran was a Conjunctive Rule model, which singled out Strength of Schedule, with a 73% correlation coefficient with Tournament Wins.

Second, a Decision Tree model also pointed to Strength of Schedule, with a 74% correlation coefficient with Tournament Wins. Third, a Linear Regression model showed that Winning Percentage and Strength of Schedule had the largest impact on Tournament Wins, with a 64% correlation coefficient; it also gave Points Per Game a negative relationship with Tournament Wins. Finally, an M5 rules model built on Winning Percentage and Strength of Schedule reached an 82% correlation coefficient with Tournament Wins.
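For a numeric target such as Tournament Wins, Weka’s “correlation coefficient” is the Pearson correlation between the model’s predicted values and the actual values. The sketch below shows that calculation for a one-variable least-squares line. It rests on stated assumptions: the helper names are hypothetical, the data are invented toy numbers, and the score is computed on the training data itself, whereas Weka’s Explorer defaults to 10-fold cross-validation, so its figures would differ.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope


# Toy, hypothetical values: one predictor (e.g. a strength-of-schedule score)
# and each team's tournament wins.
predictor = [2.0, 5.0, 7.5, 9.0, 11.0, 12.5]
wins = [0, 1, 0, 2, 3, 4]

intercept, slope = fit_line(predictor, wins)
predicted = [intercept + slope * x for x in predictor]
# Correlation between predictions and actual wins, as Weka reports it.
print(round(pearson_r(predicted, wins), 3))
```

With a single predictor, the correlation between predictions and actual wins equals the absolute value of the correlation between the predictor and wins. With several predictors, as in the Linear Regression and M5 models, it measures how closely the combined model tracks the actual results.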

Descriptive Model Conclusion
From these models, I concluded that Tournament Wins were strongly influenced by a team’s Strength of Schedule and Winning Percentage. When I started filling out my bracket with these two rules, I found that sometimes one team had the stronger Strength of Schedule while the other had the higher Winning Percentage, so I needed a tie breaker. Because Points Per Game correlated negatively with Tournament Wins, I chose the team that allowed the fewest points as the tie breaker. The final rules were Strength of Schedule and Winning Percentage, plus, if needed, the team with the fewest points allowed per game. Using these rules, I filled out the bracket above, picked 74.6% of the games correctly, predicted Duke as the champion, and won my pool.
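The bracket rules above can be written as a small function. This is a minimal sketch, not the author’s spreadsheet: it assumes a higher Strength of Schedule value means a tougher schedule (if you use an SOS rank, where lower is better, flip that comparison), and the `Team` fields, helper names, and toy team values are hypothetical. When the two main rules disagree, fewer points allowed per game decides the pick.

```python
from dataclasses import dataclass


@dataclass
class Team:
    name: str
    strength_of_schedule: float  # assumption: higher = tougher schedule
    win_pct: float               # winning percentage, 0.0 to 1.0
    points_allowed: float        # points allowed per game


def compare(a, b):
    """Return +1 if a > b, -1 if a < b, 0 if equal."""
    return (a > b) - (a < b)


def pick_winner(a: Team, b: Team) -> Team:
    """Pick a game winner from SOS and win %, with points allowed as tie breaker."""
    score = compare(a.strength_of_schedule, b.strength_of_schedule) + compare(a.win_pct, b.win_pct)
    if score > 0:  # a leads on both rules, or leads on one and ties the other
        return a
    if score < 0:
        return b
    # The rules conflict (or both are tied): fewer points allowed wins.
    # An exact tie here defaults to team a, an arbitrary choice.
    return a if a.points_allowed <= b.points_allowed else b


# Scoring the bracket: 47 correct picks out of 63 games.
print(f"{47 / 63:.1%}")  # 74.6%
```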

Getting Key Performance Indicators (KPIs) Right

I have written and talked about how important it is to capture all the diagnostic metrics around your particular Key Performance Indicator (KPI). As analysts, we tend to reveal too much information. We try to show the business how much we know and prove our value. In the midst of these discussions, we forget the audience to whom we are presenting information. We overload our audience with detail, and they get overwhelmed.

In the situation described above, nobody wins. The analyst doesn’t get the satisfaction of giving the audience useful data, and the audience feels like they got information, but not what they were looking for. This is a fundamental communication problem that comes up often in the industry, and to solve it we need to start from Strategy, move to Data, and then create KPIs.

Strategy
Your website, landing pages, microsites, and mobile pages were built for a reason. Marketing initiatives are driving users to view each of these digital properties. Strong analysts find out those reasons and understand the user experience of each page and its relationship to the data. Collecting this information is a vital step to finding your KPI.

Customizing Web Analytics Using WebTrends Analytics 10

In a previous role I used WebTrends 8; since then, I have moved on to customizing WebTrends Analytics 10, and I have seen some drastic improvements over the older version. I want to highlight four new features and how they can benefit you or your organization.

Dashboards
WebTrends Analytics 10 provides two dashboards. The overall site dashboard can be customized with important high-level reports. There are also page dashboards that provide visits, bounce rates, traffic sources, next page flow, social buzz, geography, search engines, and search phrases reports. Both dashboards are especially useful if your organization looks at its site by section or product category: your team can look at the high-level pages for your products and see where traffic comes from, where it bounces, and where visitors go next. The design is easy to use and makes it simple to get basic analytical insight.

Custom Business Dashboards are King

In a world where your business processes are as complex as the clients you are trying to serve, an out-of-the-box dashboard solution is not the answer. Executives are flooded with work, and analysts are drowning in data. There is not enough time to dig through data, and the need for instant insight has become a real challenge in business today. The clear answer to these challenges is a dashboard that provides important at-a-glance metrics to executives and allows analysts to dig deeper into results.

For the Executive
Executives do not have time to dig into what is happening, how it is happening, or how to fix it. The executive dashboard needs to show the few important metrics and trends and answer the question, “How is my business doing?” It cannot exceed the executive’s attention span, yet it has to be informative enough to be used and adopted.

Big Data: Obtaining The Value of Text Analytics

Text analytics is the latest tool for mining business intelligence. Sources such as blogs, websites, Twitter, Word documents, PDF documents, and comment fields in customer service databases can be leveraged for value. Text output can be categorized, interpreted with sentiment analysis, and visualized as a concept web (image below). There is an incredible amount of value in text mining, and with that value comes the cost of doing the work. Below are items you must maintain to get the full value out of text analytics.

[Image: Text Analytics Concept Web]
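One simple way to build the raw material for a concept web is to count how often pairs of terms appear in the same document; the most frequent pairs become the links in the web. This is a minimal sketch of that co-occurrence approach, not a description of any particular text analytics product. The tokenizer, the tiny stopword list, and the sample comments are simplifying assumptions.

```python
import re
from collections import Counter
from itertools import combinations

# Deliberately tiny stopword list (an assumption; real projects use a fuller one).
STOPWORDS = {"a", "an", "the", "and", "or", "was", "is", "it", "to", "of"}


def concept_links(documents, top_n=5):
    """Return the most frequent co-occurring term pairs across documents."""
    links = Counter()
    for doc in documents:
        # Lowercase, split into words, drop stopwords, count each term once per document.
        terms = sorted({w for w in re.findall(r"[a-z']+", doc.lower()) if w not in STOPWORDS})
        # Every pair of distinct terms in the same document is one co-occurrence.
        links.update(combinations(terms, 2))
    return links.most_common(top_n)


comments = ["Shipping was slow", "Slow shipping again", "Great price"]
print(concept_links(comments, top_n=1))
# [(('shipping', 'slow'), 2)]
```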

3 Creative Ways to Judge Your Blog Articles Using Google Tools

You worked hard to write blog posts that give your audience content they care about. Then, you took the necessary steps to promote your content via social channels (LinkedIn, Twitter, G+, Facebook, etc.). After a few days, you start looking in a web analytics platform such as Google Analytics to understand whether your content reached the right audience.

The pages report tells you how many people viewed your blog post. You also see visits, page exits, and bounce rates. But this doesn’t tell you whether your blog is a hit with your audience. A page view in the pages report is just a view by an anonymous person somewhere out there on the interwebs, and there is no way of telling if they will ever come back and read your blog or do business with your company. Keeping this in mind, here are a few creative ways to judge your blog content using Google tools.

Location, Location, Location
Our company is located in Chicago, and most of our business is U.S.-based at this time. For this reason, it is important to judge our blog content by how many users in the U.S. read each article. When you look at the pages report in Google Analytics, use a secondary dimension to pull up the Region for each blog post. Then, look at the Metro and City reports to see how many people viewed the post in your local metro area or city.
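If you export the pages report with a city (or metro) secondary dimension, the local share of each post’s views is easy to compute. A minimal sketch, assuming the export has already been loaded as `(page, city, pageviews)` rows; that row layout, the function name `local_share`, and the sample numbers are hypothetical, and a real Google Analytics export will need its own parsing.

```python
from collections import defaultdict


def local_share(rows, local_cities):
    """Fraction of each page's pageviews that came from the given cities."""
    total = defaultdict(int)
    local = defaultdict(int)
    for page, city, pageviews in rows:
        total[page] += pageviews
        if city in local_cities:
            local[page] += pageviews
    # Guard against pages with zero recorded views.
    return {page: (local[page] / views if views else 0.0) for page, views in total.items()}


rows = [
    ("/blog/kpis", "Chicago", 30),
    ("/blog/kpis", "London", 10),
    ("/blog/dashboards", "Berlin", 5),
]
print(local_share(rows, {"Chicago"}))
# {'/blog/kpis': 0.75, '/blog/dashboards': 0.0}
```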

Forecast Your Website Traffic With Historical Data

Managers, marketers, and advertisers are often asked to forecast future behavior. Without formal education in time series analysis, this can be a pretty daunting task. However, with the proper tools and education, you can create a model that fits the data and produces a good forecast of your future website traffic.

The Challenge
Your website data fluctuates with your campaigns. Campaigns can be timely, constant, or pulsating, so traffic to your website can change drastically over time. These variations in traffic data make it difficult for a model to forecast future traffic.
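The post does not say which model it uses, so the following is only one common starting point: Holt’s linear trend method (double exponential smoothing), which damps short-term spikes while tracking the underlying level and trend. It is a minimal sketch; the smoothing weights `alpha` and `beta` are assumptions you would tune on your own history, the visit counts are invented, and because the method has no seasonal term, a site with strong weekly cycles would need a seasonal extension such as Holt-Winters.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Forecast `horizon` future points with Holt's linear trend method.

    alpha weights new observations when updating the level; beta weights new
    evidence when updating the trend. Both must lie in (0, 1].
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    if not (0 < alpha <= 1 and 0 < beta <= 1):
        raise ValueError("alpha and beta must be in (0, 1]")
    level = series[0]
    trend = series[1] - series[0]  # initial trend from the first step
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Project the last level forward along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]


# Hypothetical weekly visits.
visits = [1200, 1350, 1280, 1420, 1500, 1470, 1600]
print([round(v) for v in holt_forecast(visits, alpha=0.5, beta=0.3, horizon=3)])
```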

Our Business Intelligence Dashboard is Live

In the last 6 months we have focused solely on becoming experts in dashboard design. The dashboard we created has three important features: Simple, Interactive, and Mobile. I believe these features will drive the future success of our dashboard product. The business intelligence behind our dashboards is the metrics that matter, presented as at-a-glance information. Our design can be customized to almost any vertical.

Simple
Simplicity in the dashboard is about quick at-a-glance metrics that provide insight. Our dashboard information is meant to be scanned, which is much faster than reading text. We explicitly avoid meaningless data by focusing on the few metrics that matter to your business.

10 Things Good Analysts Do

  1. Educate their teams – analytics is complicated; make sure everyone understands the data they are looking at and where it is coming from.
  2. Be approachable – it’s important to be knowledgeable, but it’s more important to be approachable. Your team needs to be able to ask you questions and get understandable answers.
  3. Don’t fear the system that is collecting data – with so many software choices and options, it can be overwhelming to learn a new system. You should never fear learning a new web analytics platform, CRM system, social media hub, or mobile platform. They all collect data that is key to your success as an analyst.
  4. Seek one version of the truth – data can be manipulated in different ways to achieve different results. Good analysts look for causal relationships in data that allow them to have high confidence in the analysis.
  5. Be curious – listen to your team for key information about what is going on in your organization; it can lead to interesting data findings.
  6. Seek 90/10 – 90% of an analyst’s time should be spent analyzing data and 10% pulling data. This is an ambitious goal in the industry.
  7. Suggest A/B Tests – sites always need improvement. A/B tests are a great way to move the needle in a positive direction. Good analysts should always be on the lookout for something to test and improve.
  8. Segment – groups of people may behave a certain way with your product, mobile device, or website. Finding these segments and providing them with what they want will create revenue for your organization.
  9. Document caveats – everything has a condition or something important to keep in mind while interpreting the data.
  10. Work with raw data – many systems simplify the analysis for you, but that isn’t always enough to establish statistical significance. Exporting raw data into SPSS, SAS, or R allows you to obtain insights with statistical significance.
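Items 7 and 10 meet when you judge an A/B test: with raw visitor and conversion counts, a two-proportion z-test is one standard way to check whether the difference between variants is statistically significant. A minimal sketch using only Python’s standard library; the function name and the example counts are hypothetical, and it assumes samples large enough for the normal approximation to hold.

```python
import math


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for a difference in conversion rates between A and B."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        raise ValueError("no variation in conversions; the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A p-value below your chosen threshold (0.05 is a common convention) suggests the difference is unlikely to be chance alone.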

Analyst’s Guide to Surviving a Website Redesign

Website redesigns happen quite frequently these days. Unfortunately, for a web analyst this isn’t a time to go on vacation. Many changes in organizational direction, copy, and site functionality happen during a website redesign. These changes greatly affect your analytics approach and the tagging of your current web properties. Here are 4 tips for a web analyst to survive a website redesign.

Tip #1: Get Wireframes and Functionality Early
Knowing how the new site templates will be laid out alerts you to possible changes in the way users browse the website. Changes in user path, user experience, and functionality will greatly affect your website analytics data. Depending on the new layout, engagement metrics such as time on site and pages per visit can change dramatically with a redesign. The sooner you get your hands on the wireframes, the better prepared you will be to create tags that capture all the interesting new aspects of your redesigned website.