Advanced statistics have picked up steam and moved into hockey's mainstream over the past decade. Many NHL teams now employ full-time analytics staff dedicated to breaking down the numbers behind the game. So, what makes analytics such a powerful tool? Aside from helping you dominate your next fantasy hockey pool, advanced statistics provide potent insights into what is really causing teams to win or lose.
Hockey is a sport that has long been misunderstood. Its gameplay is fundamentally volatile, spontaneous and difficult to follow. There are countless different factors that contribute to a team’s chances of scoring a goal or winning a game on a nightly basis. While many in Canada would beg to differ, ice hockey still firmly occupies last place in terms of revenue and fan support amongst the big four major North American sports leagues (NFL, MLB, NBA, and NHL). As such, hockey is on the whole overlooked and is often the last to implement changes that come about in professional sports. The idea of a set of advanced statistics that would offer better insights into the game arose as other major sports leagues, starting with Major League Baseball, began looking beyond superficial characteristics and searching for the underlying numbers influencing outcomes. Coaches, players, and fans alike have long been subject to an epidemic failure to truly understand what is happening out there on the ice. This is the motivation behind the hockey analytics movement: to use data analysis to enhance and develop our knowledge of ice hockey and inform decision-making for the benefit of all who wish to understand the sport better.
Another barrier to progress in the field of hockey analytics is the hesitance of the sport to embrace modern statistics. Most casual fans are familiar with basic stats such as goals, assists, PIM, and plus/minus. But do these stats really tell the full story? In fact, most of these are actually detrimental to the uninformed fan’s understanding of the game. For starters, there is usually no distinction between first and second assists in traditional stat-keeping. A player could have touched the puck thirty seconds earlier in the play or made an unbelievable pass to set up a goal, and either way it still counts as a single assist on the scoresheet. Looking only at goals and assists can be deceiving; we need more reliable, repeatable metrics to determine which players are most valuable to their teams. Advanced stats are all about looking beyond the surface and identifying what’s actually driving the play.
So, what are these so-called “advanced stats”? Let’s start with the basics.
PDO: PDO (it doesn't stand for anything) is defined as a team’s save percentage plus its shooting percentage, usually at 5v5. Because every goal one team scores is a goal another team allows, the league average is 100 (often written as 1.000). If you only learn one concept, make it this one. PDO is usually regarded as a measure of a team or player’s luck, and it can indicate whether a team or player is over- or underperforming and whether regression to the mean (back towards 100) is likely. This will not happen in every situation, of course, but watch for teams with astronomical PDOs to hit a reality check sooner rather than later. Team PDO stats can be found on corsica.hockey’s team stats page.
Without trying to scare anyone, the Toronto Maple Leafs currently boast the 4th-highest PDO at 101.85. To help ease your mind a bit, the Tampa Bay Lightning, who are considered the team to beat in the East, have the highest PDO at 102.35, with a decent gap between them and second place. They could also be playing above their true level; time will tell.
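As a rough illustration of the definition above, here is a minimal Python sketch that computes PDO on the 100-point scale from raw totals. The function name and the sample totals are hypothetical, and "shots" here means shots on goal.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """Return PDO on the 100-point scale: shooting % plus save %."""
    if shots_for <= 0 or shots_against <= 0:
        raise ValueError("shot totals must be positive")
    # Shooting percentage: share of the team's own shots that went in.
    shooting_pct = 100.0 * goals_for / shots_for
    # Save percentage: share of the opponent's shots that were stopped.
    save_pct = 100.0 * (shots_against - goals_against) / shots_against
    return shooting_pct + save_pct


# Hypothetical 5v5 totals, for illustration only (not real team data).
team_pdo = pdo(goals_for=90, shots_for=1000, goals_against=70, shots_against=1000)
print(team_pdo)  # 102.0
```

A team shooting 9% while its goalies stop 93% lands at 102, which is the kind of total the article suggests watching for regression.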
Corsi: You may have heard terms like Corsi and/or Fenwick being thrown around before. These are core concepts that are fundamental to understanding what drives the play during a game. Basically, Corsi is an approximation of puck possession that counts the total shot attempts for and against your team. It can also be tracked for the time a specific player is on the ice.
A shot attempt is defined as any time the puck is directed at the goal, including shots on net, missed shots, and blocked shots. Anything above 50% possession is generally seen as being positive as you are generating more shot attempts than you are allowing.
Corsi stats are typically kept in the following ways: Corsi For (CF), Corsi Against (CA), Corsi differential (+/-, or CF minus CA), and CF% (CF as a share of all shot attempts, for and against). An example of how CF% can be useful is when evaluating offensive defensemen. Sometimes these players are overvalued because of their noticeable offensive production, while the shaky defensive game that offsets that offensive value goes unnoticed.
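The Corsi measures described above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample counts are made up, and the "blocked" figure for a team means its own attempts that the opponent blocked.

```python
def corsi(shots_on_goal, missed_shots, blocked_shots):
    """Total shot attempts: shots on goal + missed shots + blocked shots."""
    return shots_on_goal + missed_shots + blocked_shots


def cf_pct(corsi_for, corsi_against):
    """Corsi For as a percentage of all shot attempts in the sample."""
    total = corsi_for + corsi_against
    if total == 0:
        raise ValueError("no shot attempts recorded")
    return 100.0 * corsi_for / total


# Hypothetical on-ice counts for one player, for illustration only.
cf = corsi(shots_on_goal=30, missed_shots=12, blocked_shots=13)  # 55 attempts for
ca = corsi(shots_on_goal=25, missed_shots=10, blocked_shots=10)  # 45 attempts against

print(cf - ca)         # Corsi differential (+/-): 10
print(cf_pct(cf, ca))  # CF%: 55.0
```

A CF% above 50 means the player's team generated more attempts than it allowed while he was on the ice.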
Fenwick: Fenwick is similar to Corsi, but excludes shot attempts that are blocked. Of course, with both of these stats, one should also take into account that a player’s possession score is influenced by both their linemates and the quality of competition (QoC) they face. These stats can also be adjusted to reflect different game situations, such as whether the team was up or down by a goal at the time.
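To make the difference from Corsi concrete, here is a small Python sketch of Fenwick as unblocked shot attempts. The names and numbers are hypothetical; the blocked-shot argument is accepted only to show that it is deliberately left out of the total.

```python
def fenwick(shots_on_goal, missed_shots, blocked_shots=0):
    """Unblocked shot attempts: blocked shots are ignored on purpose."""
    return shots_on_goal + missed_shots


def ff_pct(fenwick_for, fenwick_against):
    """Fenwick For as a percentage of all unblocked attempts."""
    total = fenwick_for + fenwick_against
    if total == 0:
        raise ValueError("no unblocked shot attempts recorded")
    return 100.0 * fenwick_for / total


# Hypothetical counts, for illustration only.
ff = fenwick(shots_on_goal=33, missed_shots=12, blocked_shots=14)  # 45
fa = fenwick(shots_on_goal=22, missed_shots=8, blocked_shots=9)    # 30

print(ff_pct(ff, fa))  # 60.0
```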
Measuring puck possession in hockey makes sense, because the team that has the puck on their stick more often controls the play. Granted, Corsi/Fenwick are far from perfect, and the team with the better possession metrics doesn’t always come out ahead. But at the very least, including all shot attempts offers a much larger sample size of data than traditional stats, and provides a solid foundation for further analysis.
Zone Starts (ZS%): this measures the proportion of the time that a player starts a shift in each area of the ice (offensive zone vs. defensive zone). A ZS% of greater than 50% tells us that the player is deployed in offensive situations more frequently than defensive situations. This is important because it gives us insight into a player’s usage, or in what scenarios he is normally deployed by his team’s coach. It also provides context for interpreting a player’s Corsi/Fenwick. Players who are more skilled offensively will tend to have a higher ZS% because they give the team a better chance to take advantage of the offensive zone faceoff and generate scoring opportunities. At the very least, ZS% can be used to get a glimpse at how a coach favors a player’s skillset.
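A minimal Python sketch of the zone-start calculation follows. It assumes, as the description above suggests, that only offensive-zone and defensive-zone starts are counted and that neutral-zone and on-the-fly shifts are left out; the function name and sample counts are hypothetical.

```python
def zone_start_pct(offensive_zone_starts, defensive_zone_starts):
    """Share of a player's zone starts that came in the offensive zone."""
    total = offensive_zone_starts + defensive_zone_starts
    if total == 0:
        raise ValueError("no zone starts recorded")
    return 100.0 * offensive_zone_starts / total


# Hypothetical deployment for two players, for illustration only.
print(zone_start_pct(120, 80))  # 60.0 -> sheltered, offence-first usage
print(zone_start_pct(70, 130))  # 35.0 -> heavy defensive usage
```

Reading a player's Corsi or Fenwick next to this number helps separate results that come from skill from results that come from favourable deployment.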
Intro on 5v5 Isolated Stats and Repeatability
Oftentimes, you will see those who work with hockey analytics cite a player's stats solely at even strength, or 5v5. Why? There are a few reasons.
First, 5v5 takes up most of a hockey game. If a player is valuable to his team at 5v5, he is valuable for more of the game, and this should be seen as a large positive. A player's power play contributions are certainly valuable to a team, but they are often overvalued. Next, the game is played very differently in different game states. It would be wildly unfair to penalty killers to have their penalty kill stats included in their overall line, as more goals against are scored on the penalty kill, even against the best penalty killers. Separating these statistics helps provide a more complete picture of a player's skillset and the value he has contributed to his team. Finally, 5v5 stats are generally regarded as the most repeatable, partially due to the larger sample. While players' PP and PK stats can vary widely from year to year, 5v5 stats typically remain relatively stable (read more at PPP here if you like).
In addition, primary points (goals and first assists) are regarded as relatively repeatable, so be on the lookout for players who have many secondary assists, as their point totals may regress in the future (read more on this here).
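One simple way to act on this is to split a player's points into primary points and secondary assists. The Python sketch below does that; the function names and the two stat lines are hypothetical, and the secondary-assist share is just one possible flag, not an established metric.

```python
def primary_points(goals, first_assists):
    """Goals plus first (primary) assists."""
    return goals + first_assists


def secondary_assist_share(goals, first_assists, second_assists):
    """Fraction of a player's total points that are secondary assists."""
    total_points = goals + first_assists + second_assists
    if total_points == 0:
        return 0.0
    return second_assists / total_points


# Two hypothetical 50-point seasons, for illustration only.
player_a = {"goals": 20, "first_assists": 20, "second_assists": 10}
player_b = {"goals": 10, "first_assists": 10, "second_assists": 30}

for name, line in (("A", player_a), ("B", player_b)):
    print(
        name,
        primary_points(line["goals"], line["first_assists"]),
        secondary_assist_share(**line),
    )
```

Both players have 50 points, but player B gets most of them from secondary assists, so his total is the one more likely to regress.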
Intro to Comparison Tools
One of the areas that has most benefited from hockey analytics is the domain of player comparison. One of the best and most intuitive tools is the HERO chart, as pioneered by Domenic Galamini Jr (@MiminoHero). The HERO chart is a quick comparison of how players stack up across ice time, goal scoring, primary assists, shot generation and shot suppression. At a single glance, we can get a sense of the strengths and contributions of different players. Here we compare Sidney Crosby to Connor McDavid:
We can see that Crosby is better at goal-scoring and shot generation, while McDavid is better at primary assists and shot suppression.
To compare any two players of your choice, or to compare a player to a positional archetype like First-Line Centre or Second-Pair Defender, you can use Galamini’s website: http://ownthepuck.blogspot.ca/. These comparisons can be used to enhance understanding of a player’s skill set, inform debates, and evaluate moves made by NHL teams, among other uses.
All-Three-Zones Data Visualizations
While a HERO chart is an all-encompassing snapshot of a player’s contributions on the ice, the All-Three-Zones visuals are concerned with more specific aspects of the game. CJ Turtoro (@CJTDevil) created two sets of visuals using data from Corey Sznajder’s (@ShutdownLine) massive tracking project.
You can find both sets of visuals at the links below:
In the first set of visuals, you will find 4 leaderboards. Players are ranked in the 5v5 stats listed below.
5v5 Entries -- How often players enter the offensive zone by making a clean pass to a teammate (Entry passes/60) or by carrying the puck across the blue line themselves (Carry-ins/60).
Other notes: The best way to enter the zone is to enter with possession of the puck (Entry passes + Carry-ins, as discussed above). These types of entries are called Possession Entries. Although other types of attempts are included in the leaderboard as well, players are automatically sorted by Possession Entries/60 because these alternative attempts are less than ideal. If you decide to change this, use the “Sort By (Entries)” filter to rank the players in other ways.
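The "/60" in these stats is a rate per 60 minutes of ice time. As a sketch of how Possession Entries/60 could be computed from raw counts (the function names and sample numbers are hypothetical, and the tracking project's own calculation may differ in detail):

```python
def per_60(count, toi_minutes):
    """Scale a raw event count to a rate per 60 minutes of ice time."""
    if toi_minutes <= 0:
        raise ValueError("time on ice must be positive")
    return 60.0 * count / toi_minutes


def possession_entries_per_60(carry_ins, entry_passes, toi_minutes):
    """Possession Entries = carry-ins + entry passes, expressed per 60."""
    return per_60(carry_ins + entry_passes, toi_minutes)


# Hypothetical tracked sample: 300 minutes of 5v5 ice time.
print(possession_entries_per_60(carry_ins=45, entry_passes=15, toi_minutes=300))  # 12.0
```

Using a rate rather than a raw count lets players with very different amounts of tracked ice time share one leaderboard.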
5v5 Exits -- This is the same as 5v5 entries, except at the blue line separating the defensive zone from the neutral zone. Players are ranked based on how often they transition the puck from the defensive zone into the neutral zone either by carrying it (Carries/60) or by passing it to a teammate (Exit Passes/60).
Other notes: Like 5v5 entries, the best ways to exit the defensive zone are classified as Possession Exits. This is why players are automatically sorted by Possession Exits/60. Again, the “Sort By (Exits)” filter will let you change how the leaderboard is sorted.
5v5 Entries per Target (5v5 Entry Def %) -- This stat measures defence at the blue line. It answers the question: When a defender is in proximity to an attempted zone entry, how often does he stop the attempt?
Other Notes: It is important to note that a “defender” is any player on the team playing defence (i.e. the team without the puck). Forwards are included in this definition of defender, but the best way to use this leaderboard is to judge defensemen only. This is why forwards are automatically filtered out of the leaderboard, but you can always change this using the filter if you wish.
5v5 Shots and Passes -- Players are ranked based on how often they contribute to shots. Players contribute to a shot by being the shooter or by making one of the three passes immediately before the shot, in the same way they earn points by scoring a goal or by making one of the two passes immediately before the goal.
If you want a closer look at certain groups of players, the filters allow you to look at players who play certain positions (forwards/defencemen) and players who play on certain teams. In the screenshot below, for example, I filtered the 5v5 Entries leaderboard to see what it looks like for forwards on the Oilers:
You can use these leaderboards to judge offence (5v5 entries, 5v5 shot contributions), and defence (5v5 exits, 5v5 Entry Def %). Ultimately, these four leaderboards will help you identify the best and worst players in these areas.
In order to focus on one or two players, you should use the second set of visuals: The A3Z Player Comparison Tool. While HERO charts allow for player comparisons in stats collected by the NHL, this visualization was designed to help you judge players based on their performance in several stats from the tracking project. Instead of standard deviations, however, the measurement of choice in this comparison tool is percentiles. So keep in mind that “100” means the result is better than 100% of the other results. You can view a player’s results in two 1-year windows and one 2-year window, covering the 2016-17 season and the 2017-18 season. Here’s a two-year snapshot of how Erik Karlsson and Sidney Crosby rank in some of these key stats:
You probably noticed that the stats for forwards and defencemen are slightly different. The only difference is that defencemen have three extra categories, which measure their ability (or lack thereof) to defend their own blue line (i.e. their 5v5 Entries per Target, as discussed in the previous section). You may have also noticed some useful information beneath each player’s name, including the number of games and minutes that have been tracked for the player. Although the numbers in the screenshot above are from two seasons, you can also track a player’s development across the two seasons by looking at their stats in one-year windows. To see what I mean, take a look at Nikita Zaitsev’s numbers in two consecutive seasons:
Visualizing the dramatic fall of Nikita Zaitsev in this way is an excellent starting point for further analysis. Likewise, you can also compare two different players in the same season or over two seasons. This is, after all, a Player Comparison Tool. Other common uses for both sets of A3Z visualizations are to identify strengths and weaknesses of certain players, to evaluate potential acquisitions, to design the optimal lineup for your favourite team, and many more.
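The percentile scale used by the comparison tool can be illustrated with a short Python sketch. The tool's exact calculation is not described here, so this assumes the simplest reading of the text: a player's percentile is the share of other players' results that fall strictly below his. The function name and sample values are hypothetical.

```python
def percentile_rank(value, other_values):
    """Percentage of the other results that are strictly below `value`."""
    if not other_values:
        raise ValueError("need at least one other result to compare against")
    below = sum(1 for v in other_values if v < value)
    return 100.0 * below / len(other_values)


# Hypothetical Possession Entries/60 for four other players.
others = [4.0, 6.5, 8.0, 9.5]

print(percentile_rank(12.0, others))  # 100.0 -> better than all of the others
print(percentile_rank(7.0, others))   # 50.0  -> better than half of them
print(percentile_rank(3.0, others))   # 0.0   -> better than none of them
```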
Of course, there are countless other useful terms and concepts to consider in analytics, like relative stats, shot quality, and expected goals (xG), which we’ll cover in more depth in future articles. If you’re interested in advanced stats and would like to learn more, we’ll be putting out more content on exciting topics in hockey analytics over the coming months, so stay tuned.