February 19, 2017

Men’s March Madness Descriptive Data Model: Predicted Duke with 74.6% Accuracy

Every year during the Men’s March Madness basketball tournament, thousands of people fill out brackets. Most of these brackets are filled out on gut feeling or with a bias toward a favorite team or alma mater. But what if there were a way to put data science to use and win your pool? This is the question I set out to answer, and I can tell you that I not only predicted Duke to win it all but also picked 47 of 63 games correctly, an accuracy of 74.6%. My actual bracket is below.

Building the Data Model
Modeling the NCAA tournament isn’t easy: there are many data sources, and you don’t know in advance which variables will lead to the right model. This is why I started with 2014 tournament data, to understand which variables I could gather to build a descriptive model for predicting the 2015 tournament.

I downloaded the 2014 data with the following variables: Wins, Losses, Winning Percentage, RPI, Strength of Schedule, AP Ranking, Coaches Poll Ranking, Tournament Seed, Points Per Game, and Points Allowed. All of these are descriptive (independent) variables; none of them is the dependent variable, or goal, of the data model.

The goal of the data model is to predict the winner of each game based on the descriptive variables. To do this, I created a goal variable called Tournament Wins. The maximum number of wins for a team in 2014 was 5, by the University of Connecticut. I manually counted the number of wins for all 64 teams in the 2014 tournament and added the goal variable to my spreadsheet.
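
The manual count can also be scripted. The following is a minimal sketch, assuming you have a list of the winner of each game; the team names are placeholders, not the 2014 results, and the function name `tournament_wins` is hypothetical.

```python
from collections import Counter


def tournament_wins(teams, game_winners):
    """Return the goal variable: number of tournament wins per team.

    teams        -- every team in the field (so teams with 0 wins are included)
    game_winners -- one entry per game played, naming the winner
    """
    wins = Counter(game_winners)
    return {team: wins[team] for team in teams}


# Placeholder data for a 4-team mini bracket (not real results).
field = ["Team A", "Team B", "Team C", "Team D"]
winners = ["Team A", "Team C", "Team A"]  # two semifinals, then the final
print(tournament_wins(field, winners))
```

The resulting dictionary can be added as a Tournament Wins column next to the descriptive variables.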

Processing the Data Through a Data Mining Tool
To process the descriptive variables and see how they relate to the dependent variable, Tournament Wins, I used a free machine learning tool called Weka. This allowed me to run different mining algorithms on the data and create a descriptive model. The first model I ran was a Conjunctive Rule model, which told me that Strength of Schedule had a 73% correlation with Tournament Wins.

Second, a Decision Tree model told me that Strength of Schedule had a 74% correlation coefficient with Tournament Wins. Third, a Linear Regression model told me that Winning Percentage and Strength of Schedule had the highest impact on Tournament Wins, with a 64% correlation coefficient. It also showed that Points Per Game had a negative correlation with Tournament Wins. Last, the M5 Rules model told me that Winning Percentage and Strength of Schedule had an 82% correlation coefficient with Tournament Wins.
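
To make the term concrete, here is a minimal sketch of a Pearson correlation coefficient in Python. The numbers are made up for illustration and are not the 2014 tournament data. Weka was the tool actually used, and the coefficient it reports may be computed differently (for example, between predicted and actual values), so treat this only as an illustration of the idea.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(var_x * var_y)


# Hypothetical values: a strength-of-schedule score and tournament wins.
strength_of_schedule = [0.52, 0.55, 0.58, 0.60, 0.63]
wins = [0, 1, 1, 3, 4]
print(round(pearson(strength_of_schedule, wins), 2))
```

A value near 1 means the two variables rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.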

Descriptive Model Conclusion
From the data models, I concluded that Tournament Wins were strongly influenced by a team’s Strength of Schedule and Winning Percentage. I started to fill out my bracket using both of these rules and realized that sometimes one team had the stronger Strength of Schedule while the other had the better Winning Percentage. I needed a tie breaker. Since Points Per Game was negatively correlated with Tournament Wins, I decided to use the team that allows the fewest points as the tie breaker. Thus, the rules were Strength of Schedule, Winning Percentage and, if needed, the lowest points per game allowed. Based on these rules I filled out the tournament bracket above, got 74.6% of the games right, predicted Duke as the winner, and won my pool.
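
The rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the spreadsheet I actually used: the `Team` class, the function names, and the sample numbers are hypothetical, and the sketch assumes a higher Strength of Schedule value means a tougher schedule.

```python
from dataclasses import dataclass


@dataclass
class Team:
    name: str
    strength_of_schedule: float  # assumption: higher means a tougher schedule
    win_pct: float               # wins divided by games played
    points_allowed: float        # average points allowed per game


def pick_winner(a: Team, b: Team) -> Team:
    """Pick the team that leads on more of the two main variables.

    If each team leads on one variable (or they tie on both), fall back
    to the tie breaker: the team that allows the fewest points.
    """
    a_edges = (a.strength_of_schedule > b.strength_of_schedule) + (a.win_pct > b.win_pct)
    b_edges = (b.strength_of_schedule > a.strength_of_schedule) + (b.win_pct > a.win_pct)
    if a_edges > b_edges:
        return a
    if b_edges > a_edges:
        return b
    return a if a.points_allowed <= b.points_allowed else b


def accuracy(correct: int, total: int) -> float:
    """Share of games picked correctly."""
    return correct / total


# Hypothetical matchup: each team leads on one variable, so the tie breaker decides.
team_a = Team("Team A", strength_of_schedule=0.60, win_pct=0.80, points_allowed=65.0)
team_b = Team("Team B", strength_of_schedule=0.55, win_pct=0.85, points_allowed=60.0)
print(pick_winner(team_a, team_b).name)

# The bracket result from this post: 47 correct picks out of 63 games.
print(f"{accuracy(47, 63):.1%}")
```

Applying `pick_winner` to each matchup, round by round, fills out a full bracket.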

Getting Key Performance Indicators (KPIs) Right

I have written and discussed how important it is to capture all the diagnostic metrics around your particular Key Performance Indicator (KPI). As analysts, we have a tendency to reveal too much information. We try to explain how much we know to the business and show our value. In the midst of these discussions, we forget the audience to whom we are disseminating information. We overload our audience with knowledge, and they get overwhelmed.

In the situation described above, nobody wins. The analyst is left without the satisfaction of providing the audience with data. The audience feels like they got information, but it is not what they were looking for. This is a fundamental communication problem that arises often in the industry. To avoid it, we need to start from Strategy, move to Data, and then create KPIs.

Your website, landing page, microsite, and mobile page were built for a reason. There are marketing initiatives that are driving users to view the technical property. Strong analysts find out those reasons and understand the user experience of the page and its relationship to the data. Collecting this information is a vital step to finding your KPI.
[Read more…]

Customizing Web Analytics Using WebTrends Analytics 10

In my past life I used WebTrends 8. Some time has passed, and I have since moved on to customizing WebTrends Analytics 10. I have seen some drastic improvements over the older version. I want to highlight four new features and how they can be beneficial to you or your organization.

The new WebTrends A10 provides two dashboards. The system has an overall site dashboard that you can customize with important high-level reports. There are also page dashboards that provide the visits, bounce rates, traffic sources, next page flow, social buzz, geography, search engines and search phrases reports. Both of these dashboards are really nice if you have an organization that looks at its site in specific sections or product categories. Your team can look at the high-level pages of your products and determine where the traffic is coming from, where it is bouncing, and where it is going. The design is easy to use, and it is very simple to garner basic analytical insight. [Read more…]

3 Creative Ways to Judge Your Blog Articles Using Google Tools

You worked hard to write blog posts that provide your audience with content they care about. Then, you took the necessary steps to promote your content via social channels (LinkedIn, Twitter, G+, Facebook, etc.). After a few days, you start looking in your web analytics platform, such as Google Analytics, to understand whether your content reached the right audience.

The pages report lets you know how many people viewed your blog post. Furthermore, you see visits, page exits, and bounce rates. But this doesn’t tell you if your blog is a hit with your audience. A blog pages report shows just a page view by an anonymous person somewhere out there on the interwebs, and there is no way of telling if they will ever come back and read your blog or do business with your company. Keeping this in mind, here are a few creative ways to judge your blog content using Google Tools.

Location, Location, Location
Our company is located in Chicago, and most of our business is U.S. based at this time. For this reason, it is important to judge our blog content by how many users in the U.S. read the blog article. When you look at the pages report in Google Analytics, use a secondary dimension to pull up the Region for each blog post. Then, look at the Metro area and City reports to see how many people viewed the post in your local metro area or city. [Read more…]

10 Things Good Analysts Do

  1. Educate their teams – analytics is complicated, make sure everyone understands the data they are looking at and where it is coming from.
  2. Be approachable – it’s important to be knowledgeable, but it’s more important to be approachable. Your team needs to be able to ask you questions and get understandable answers.
  3. Don’t fear the system that is collecting data – with so many software choices and options, it can be overwhelming to learn a new system. You should never fear learning a new web analytics platform, CRM System, Social Media Hub, or Mobile Platform. They all collect various data which is key to your success as an analyst.
  4. Seek one version of the truth – data can be manipulated in different ways to achieve various results. Good analysts look for causal relationships in data that allow them to have high confidence in the analysis.
  5. Be curious – listen to your team for key information that is going on in your organization. These can lead to interesting data findings.
  6. Seek 90/10 – 90% of an analyst’s time should be spent analyzing data, 10% pulling data. This is a very big goal in the industry.
  7. Suggest A/B Tests – sites always need improvement. A/B tests are a great way to move the needle in a positive direction. Good analysts should always be on the lookout for something to test and improve.
  8. Segment – groups of people may behave a certain way with your product, mobile device, or website. Finding these segments and providing them with what they want will create revenue for your organization.
  9. Document caveats – everything has a condition or something important to keep in mind while interpreting the data.
  10. Work with raw data – many systems simplify the analysis for you, but that isn’t always enough to obtain statistical significance. Exporting raw data into SPSS, SAS or R allows you to obtain insights with statistical significance.

Related Posts:
3 ways to find something to A/B Test
How to become a great web analyst
When to switch your web analytics platform

Analyst’s Guide to Surviving a Website Redesign

Website redesigns happen quite frequently these days. Unfortunately, for a web analyst this isn’t a time to go on vacation. Many changes in organizational direction, copy, and site functionality happen during a website redesign. These changes greatly impact your analytics approach and the tagging of your current web properties. Here are 4 tips for a web analyst to survive a website redesign.

Tip #1: Get Wireframes and Functionality Early
When you know how the new site templates are going to be laid out, you will be alerted to possible changes in the way users browse the website. The changes in user path, user experience, and functionality will greatly affect your website analytics data. Depending on how the site is laid out, engagement metrics such as time on site and pages per visit can change vastly with a redesign. The sooner you get your hands on the wireframes, the more prepared you will be to create tags that capture all the new interesting aspects of your redesigned website. [Read more…]

Web Analytics is About Optimization

The web is not static, and neither are your competitors. Your web analytics team should always be looking to create hypotheses to optimize your website. Optimization is about finding areas of your site that can be improved on a systematic basis, and testing the validity of those hypotheses.

Web Analytics Hypothesis
Web analytics management needs to provide your team with the necessary software and tools to conduct proper analysis and create hypotheses. This can be mouse tracking software, deeper dive software such as Omniture Discover, survey software such as ForeSee, and social measuring tools. These tools need to be used in conjunction with your web analytics platform. Providing your team with many tools to analyze pages will help them validate their hypotheses and create use cases.

Web Analytics Validation
This one is very important, as your web analytics manager should validate whether the hypotheses are correct statistically and organizationally. Web analytics managers need to validate the quality of each hypothesis by statistical means, making sure that enough data was collected to be significant. They also need to validate that the hypothesis aligns with organizational goals.

For example, an analyst came up with the hypothesis that the Green Widget page needs to have its form on the right rather than on the left. This was done by analyzing data from the mouse tracking software. The web analytics manager needs to verify that the mouse tracking software recorded enough visits to support this hypothesis. More importantly, the manager needs to ask product managers if the organization is pushing to sell more Green Widgets. Any misalignment in the validation phase can lead to further testing, optimization, and measurement for a possibly discontinued product.

Website Testing, Optimization, Measurement
Validated hypotheses need to be tested. Proper A/B or multivariate testing techniques need to be used to make sure that the hypothesis will provide significant lift for your organization. Once tests are conclusive, most optimization software allows you to leave the winning version on your site until your website team has updated the code with the changes. During this process, your web analytics team should be measuring the results of the update and documenting the data for sharing.
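
One common way to check whether an A/B test shows significant lift is a two-proportion z-test. The sketch below is a minimal illustration, not the method built into any particular optimization tool; the function name and the visit and conversion counts are hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, visits_a, conv_b, visits_b):
    """Return (z, two-sided p-value) comparing conversion rates of A and B."""
    rate_a = conv_a / visits_a
    rate_b = conv_b / visits_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (visits_a + visits_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    if std_err == 0:
        raise ValueError("test is undefined when no one or everyone converts")
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value


# Hypothetical test: form on the left (A) versus form on the right (B).
z, p = two_proportion_z_test(conv_a=100, visits_a=1000, conv_b=150, visits_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (0.05 is a conventional cutoff) suggests the difference between versions is unlikely to be chance alone. The threshold is a choice your team should agree on before the test starts.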

Sharing Results With Executives and Product Managers
Once your team has been through the optimization cycle several times, you are ready to share results. This is a very important step in building a data-driven organizational culture. Product managers and executives who see successful improvement in website performance become advocates of the structure and process. New hypotheses for tests are developed and validated. Your web analytics is now driving significant value to your organization.

With everything in data there are caveats. The most important caveat to the above process is your web analytics tags have to be deployed correctly and data has to be validated. Incorrect sourcing of data leads to distrust and breaking of the web optimization process.

Measuring KPIs: Always Capture Diagnostic Metrics

Every executive wants to see the top Key Performance Indicators (KPIs) on a regular reporting cadence. However, when major changes happen in your KPIs, you need to be able to explain what happened, what it means, and what you are going to improve.

Factors Influencing Your Key Performance Indicators
The factors influencing your key performance indicator to the left are a few that I picked as an example. Increases or decreases in your purchase funnels will highly affect your KPIs. The factors I chose on the left are a guess, but there is a way to find exactly what is affecting your KPIs through diagnostic metrics. Your analyst needs to find direct relationships between variables (influencing diagnostic metrics) and your KPI (response variable). [Read more…]

Web Analytics Fundamentals for Quality Insights

All clients, internal or external, are looking for quality insights. Quality is defined by how actionable the insights are and the positive impact they will produce. An analyst has to be confident in the data that is being delivered to business stakeholders. Confidence in analytical insights comes from the Reliability and Validity of the insights.

Reliability of the data comes from understanding the sources of data. It is mostly about the consistency of your data sample. Data that has a very large and diverse distribution may not be reliable for what you are looking to accomplish. Large variances indicate inconsistent data and will throw off the reliability of your insights. [Read more…]
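
One simple way to put a number on consistency is the coefficient of variation (standard deviation divided by the mean). This is a minimal sketch with made-up daily visit counts; the function name is hypothetical, and what counts as "too variable" is a judgment call that depends on your data.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation relative to the mean (0 means no spread)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    average = mean(values)
    if average == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(values) / abs(average)


# Hypothetical daily visits for two pages with the same average traffic.
steady_page = [980, 1010, 1000, 990, 1020]
spiky_page = [200, 2500, 300, 1800, 200]
print(round(coefficient_of_variation(steady_page), 3))
print(round(coefficient_of_variation(spiky_page), 3))
```

The page with the higher value has less consistent data, so insights drawn from it deserve more caution.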

What is Your Web Analytics Contingency Plan?

Web Analytics Platform Limitations
Every system has its limitations. Sometimes you do not discover these limitations until you ask for a specific metric that you need and find it is not possible with your current system. It is possible that version 8 of a software package cannot capture the metrics that you need but version 9 can.

Cost Benefit Analysis
Next, your team analyzes the costs of the software upgrade. The complexity and IT time commitment are sometimes beyond budgets. After considering and estimating the costs and schedule, your team concludes that this project is not feasible.

Plan B
For the reasons stated above, it is vital to use multiple web analytics platforms to capture metrics. Many organizations use WebTrends, Adobe Omniture, or Coremetrics in conjunction with Google Analytics. This provides organizations with multiple benefits: [Read more…]