NHL Player Comparison Tool Guide by Anthony Turgelis

By: Owen Kewell and Adam Sigesmund (@Ziggy_14)

Player comparison is a popular topic of debate among armchair general managers: which guy is better? Would you rather have Player A or Player B? In the wake of a big 1:1 trade, which team won? While in the past we were left to bias, favouritism, and the infamous eye test, today we have some visualization tools to help compare players across useful metrics.

HERO Chart:

One of the best and most intuitive of these tools is the HERO Chart, as pioneered by Domenic Galamini Jr. (@MimicoHero). These charts, which are within the realm of descriptive statistics, can be found at the following website: http://ownthepuck.blogspot.ca/

Below we can see Alex Ovechkin’s HERO Chart:

Ovy.JPG

What Stats Are Measured?

HERO charts show performance across five stats: ICETIME, GOALS, FIRSTA, SHOTGEN, and SHOTSUP. ICETIME refers to all-situation (even strength, power-play, or short-handed) minutes per game. GOALS measures 5-on-5 goals per 60 minutes, while FIRSTA measures 5-on-5 first assists per 60 minutes. SHOTGEN is 5-on-5 shots generated per 60 minutes and SHOTSUP is 5-on-5 shots suppressed per 60 minutes, both relative to average. These stats are measured across the most recent three seasons, with weightings of 44%-33%-22% respectively to ultimately reach a single measure.
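As a rough illustration of the three-season weighting, here is a minimal Python sketch. It assumes the blend is a plain weighted average of per-60 rates; the function name and the sample numbers are made up, and the real HERO chart calculation may differ (for example, by accounting for ice time in each season).

```python
# Season weights quoted in the guide, most recent season first.
# They sum to 0.99 (presumably rounding), so we divide by their total.
SEASON_WEIGHTS = (0.44, 0.33, 0.22)


def weighted_three_season_rate(rates_recent_first):
    """Blend three per-60 rates (most recent first) into a single number."""
    if len(rates_recent_first) != len(SEASON_WEIGHTS):
        raise ValueError("expected exactly three season values")
    total_weight = sum(SEASON_WEIGHTS)
    weighted_sum = sum(w * r for w, r in zip(SEASON_WEIGHTS, rates_recent_first))
    return weighted_sum / total_weight


# Hypothetical 5-on-5 goals per 60 for one player over three seasons.
example_rate = weighted_three_season_rate([1.2, 0.9, 0.6])
```

The most recent season counts twice as much as the oldest one, so a player trending upward is rated higher than a simple three-year average would suggest.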

It’s important to note some key features of these metrics. Aside from ICETIME, the other four stats are measured only at even-strength and per 60 minutes of playing time. This serves to level the playing field, and accounts for the situation and frequency with which different players are deployed. Making these adjustments gives us a better sense of a player’s true performance, though we must consider HERO chart results in an appropriate context. Logging massive minutes and special teams scoring remain hugely important parts of the game, so they should not be disregarded when evaluating a player’s usefulness even if they are not reflected in a player’s HERO chart.

What Do the Numbers Mean?

Each of the numbers you see represents a standardized rating from 0 to 10. A rating of 5 represents league-average performance at a skater’s position; a rating above 5 shows performance above league average, and vice versa. The scores are normally distributed with a standard deviation of 2, so a rating of 7 sits one standard deviation above average. For example, as we can see, Alex Ovechkin is league average at first assists compared to eligible wingers. We can also see that Ovechkin is considerably above league average at generating shots, and somewhat below league average at suppressing shots.
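To make the scale concrete, here is a small Python sketch of how a raw stat could be mapped onto a 0-to-10 rating with a mean of 5 and a standard deviation of 2. The function name is hypothetical, and clipping at 0 and 10 is an assumption; the actual charts may derive their ratings differently.

```python
def hero_style_rating(value, league_mean, league_sd):
    """Map a raw stat to a 0-10 scale: 5 is average, 2 points is one SD."""
    if league_sd <= 0:
        raise ValueError("league_sd must be positive")
    z_score = (value - league_mean) / league_sd
    # Assumption: ratings are clipped so they never leave the 0-10 range.
    return max(0.0, min(10.0, 5.0 + 2.0 * z_score))


# Hypothetical positional average of 1.0 with a standard deviation of 0.25.
average_player = hero_style_rating(1.0, 1.0, 0.25)
strong_player = hero_style_rating(1.25, 1.0, 0.25)
```

Under these assumptions, a player one standard deviation above the positional average lands at 7, and an exactly average player lands at 5.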

Can I See Someone’s Stats Over Time?

Yes you can! Just under the HERO chart you’ll find a chart showing how the player has performed over recent years. The dark blue line represents primary points per hour, and the light blue line represents shot impact per hour. Here is Ovechkin’s. We can see a slow decline, though Ovechkin remains a strong performer in both metrics.

Ovy2.JPG

How Do I Compare Players Directly?

HERO charts were largely built to perform direct comparison, so when you enter Domenic’s website you’ll see two charts beside each other. You can select players of your choice from the dropdown menu for either chart and see a direct comparison. Let’s compare two elite centres: Sidney Crosby and Connor McDavid.

Scanning the charts, we can see that Crosby ranks higher in goals and shot generation, while McDavid ranks higher in first assists and shot suppression. Both players are fantastic across the board.

cros mc.JPG

What Else Can I Do?

In addition to comparing players to other players, we can compare players to positional archetypes. For example, we could see how Max Pacioretty stacks up compared to the average first-line winger, or how Morgan Rielly performs relative to an average #1 defenceman. Below we can see Pacioretty’s chart:

Pac.JPG

If you’re interested in learning more about how the archetypes are calculated, there’s a section labelled ‘Chart Guide’ on the website containing an explanation of the methodology. Personally, I (Owen) enjoy using archetype comparisons to evaluate acquisitions that my favourite team makes, as it gives a high-level indication of where a player could fit into a lineup. It’s also useful for convincing your friends that the young guy you’re bullish on has legitimate upside, and that your team is going to go all the way because of it.

I Have Unanswered Questions - Where Do I Go?

That’s a quick and dirty explanation of what HERO charts are and how to use them. If you have any burning questions that are unaddressed, I encourage you to read through the HERO chart FAQs that Domenic published. The link can be found here: https://ownthepuck.wordpress.com/2017/01/21/hero-charts-frequently-asked-questions/.

All-3-Zone Player Comparison Tools:

Eric Tulsky once said “the magic of analytics is in recording all of the small things lost to memory that add up to something significant.” The easiest events to remember after you watch a hockey game are the big events: the goals, and sometimes even the shots. What you probably don’t remember, though, are the small plays that led up to those events, and the small plays that led to nothing at all. Tulsky worked with people like Corey Sznajder (@Shutdownline) to study the events in the neutral zone that drive offense. Although Tulsky now works for the Hurricanes, Sznajder runs a massive tracking project whose numbers are brought to life by CJ Turtoro's (@CJTDevil) All-3-Zones Player Comparison Tools. Before we learn about these tools, it is important to note that Sznajder literally watches every game to collect these stats, whereas the data behind HERO charts is released by the league and then displayed as you saw earlier. The sample sizes in these visuals are smaller as a result, but we will see in a moment how they capture some important ways that players create value for their teams.

There are two sets of visuals, which can be found at the links below:

  1. https://public.tableau.com/profile/christopher.turtoro#!/vizhome/ZoneTransitionsper60/5v5Entries

  2. https://public.tableau.com/profile/christopher.turtoro#!/vizhome/2-yearA3ZPlayerComps/ComparisonDashboard

First, we will discuss the set of visuals you can find by clicking that first link above. Below, you will see a screenshot of one of the four visuals available at that link:

Entries.JPG

The stats displayed on this page quantify what happens when a player tries to enter the offensive zone with the puck. He can either carry it in (carry-ins/60), dump it into the zone and then chase after it (dump-ins/60), pass it off to a teammate (Entry passes/60) or fail in his attempt (fails/60).

We care about these numbers because entering the offensive zone with control of the puck is a reliable way to create offense. It is one way to quantify a small thing lost to memory that gives rise to something significant. As you can probably see from the leaderboard above, players who succeed at entering with control are better at creating offence than those who struggle to bypass opposing defenders. This is why the players here are sorted by possession entries (carry-ins + entry passes per 60 minutes).
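The sorting stat described above can be sketched in a few lines of Python. The function names and the sample counts are hypothetical; the only thing taken from the leaderboard is the definition of possession entries as carry-ins plus entry passes, expressed per 60 minutes.

```python
def per_60(count, minutes_played):
    """Convert a raw event count into a rate per 60 minutes of ice time."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return count * 60.0 / minutes_played


def possession_entries_per_60(carry_ins, entry_passes, minutes_played):
    """Possession entries are carry-ins plus entry passes, per 60 minutes."""
    return per_60(carry_ins + entry_passes, minutes_played)


# Hypothetical tracked totals for one forward over 300 minutes of 5v5 play.
example_poss_entries = possession_entries_per_60(
    carry_ins=45, entry_passes=15, minutes_played=300
)
```

Dump-ins and failed entries are deliberately left out of the numerator, which is why a player who dumps the puck in often can rank low here despite attempting many entries.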

While tracking carry-ins is a way to quantify the creation of offence, we can also use these numbers to quantify defence. Whenever a player tries to carry the puck into the offensive zone, the opposing defenders want to stop them. The best defenders in these metrics allow the fewest possession entries. The worst ones allow attackers to create offence with ease. It should not surprise you, then, that attackers try to target the defenders who struggle to defend the blue line. Defenders who allow possession entries 90% of the time they are targeted by opposing teams are obviously quite poor at defending the blue line. Below, you will see which defenders allow the fewest possession entries as a percentage of the number of times they were targeted:

Entry D.JPG

Some of the best defenders in the league show up in this leaderboard, which is further validation that what we are studying is actually important. It is always a good sign when the numbers are validated by the eye test and by years of research.

The best defensive teams either prevent zone entries altogether, or they remove the puck from the defensive zone as soon as possible. Indeed, zone exits are another way to measure defensive contributions in hockey, for both forwards and defensemen. The screenshot below shows which players succeed at removing the puck from their zone:

Exit.JPG

Again, positive contributions are measured by Possession Exits/60. Exiting with possession of the puck occurs when a player carries the puck out of the defensive zone (carries/60), or when they make a successful pass to a teammate (Exit passes/60). If a player fails to exit the zone with the puck, it is a failed attempt (Fails/60). If he dumps it, clears it, or ices the puck, he is merely giving the other team another chance to create offence, which is why Possession Exits/60 ignores Dumps/60, Clears/60, and Icings/60. Exiting the defensive zone with possession of the puck is obviously better than exiting without it.

So far, we have learned how to quantify the ways players transition from the defensive zone to the neutral zone, and then into the offensive zone. All of these numbers have one underlying theme: Puck possession leads to shots. But how do we measure which players create the most shots? While the obvious answer is to count the number of shots a player takes, the tracking project takes this one step further, and counts up to three passes before each shot is taken. In the same way that points are counted as goals and assists at the player level, the tracking project keeps track of shots and the passes that precede them. The visual below illustrates how each player contributes to shots by shooting or passing:

Shot.JPG

This leaderboard ranks players by their Total Shot Contributions per 60 minutes. A player contributes to a shot if he is the shooter (Shots/60), or if he made at least one of three passes before the shot was taken. Assisting on a shot is the same as assisting on a goal, except Shot Contributions consider up to three passes before a shot while points only consider two passes. If a player made a pass immediately before the shot was taken it is called a Primary Shot Assist (sA1/60), if he made the second pass before the shot it is a Secondary Shot Assist (sA2/60), and if he made the third pass it is a Tertiary Shot Assist (sA3/60). Altogether, shot contributions are an excellent and reliable way to measure which players are creating offence.

Now that we have explored this first set of 4 visualizations, we can move on to the second part: The Player Comparison Tool. As you will see below, the Player Comparison Tool presents the numbers in a way that summarizes all of the stats we have learned about from the leaderboards. Take a look:

Subban.JPG

Most of the stats seen here should seem familiar, but this time they are aggregated to provide you with a more general snapshot of each player. For example, the Shot Contributions leaderboard we saw earlier broke down Shot Contributions into four stats: shots, primary shot assists, secondary shot assists, and tertiary shot assists. The Player Comparison Tool summarizes these numbers to measure shooting (Shots60), passing (ShotAssists60; sA1/60 + sA2/60 + sA3/60), and total contributions (ShotContr60; Shots60 + ShotAssists60).
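The aggregation described above is simple addition, which the following Python sketch spells out. The class and attribute names are hypothetical stand-ins for the columns named in the tool, and the sample rates are invented.

```python
from dataclasses import dataclass


@dataclass
class ShotRates:
    """Per-60 shot and shot-assist rates for one player (hypothetical layout)."""
    shots60: float
    sa1_60: float  # primary shot assists per 60
    sa2_60: float  # secondary shot assists per 60
    sa3_60: float  # tertiary shot assists per 60

    @property
    def shot_assists60(self):
        # ShotAssists60 = sA1/60 + sA2/60 + sA3/60
        return self.sa1_60 + self.sa2_60 + self.sa3_60

    @property
    def shot_contr60(self):
        # ShotContr60 = Shots60 + ShotAssists60
        return self.shots60 + self.shot_assists60


example_player = ShotRates(shots60=8.0, sa1_60=4.0, sa2_60=2.0, sa3_60=1.0)
```

Keeping the shooting and passing components separate makes it easy to tell a volume shooter from a playmaker even when their total contributions are identical.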

The zone entry leaderboard is summarized in the Entry section, using possession entries expressed as a rate stat (PossEntries60) and possession entries expressed as a percentage of total entry attempts (PossEntry%). Similarly, the zone exit leaderboard is summarized in the Exit section.

It is important to note that if you are viewing a forward using this tool, you will only see the first three sections. The fourth section, Entry Defence, is only available for defenders. This section summarizes the aforementioned Entry Defence per Target leaderboard. As discussed earlier, the best way to defend the blueline is to prevent attackers from entering the zone with control of the puck. A defender who breaks up a play at the blue line is credited with breaking up the play (Breakups60). Defenders who concede controlled zone entries less often are the ones who rank best in the second stat (PossEntriesAllowed60). This is also expressed as a percentage of the number of times the defender is the target of an attempted zone entry by the other team (PossEntry% Allowed).
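As a sketch of the percentage stat in the Entry Defence section, the Python below divides possession entries allowed by the number of times a defender was targeted, then ranks defenders from stingiest to most generous. The function name, defender labels, and counts are all hypothetical.

```python
def poss_entry_pct_allowed(possession_entries_allowed, times_targeted):
    """Share of targeted entry attempts that ended in a controlled entry."""
    if times_targeted <= 0:
        raise ValueError("times_targeted must be positive")
    return 100.0 * possession_entries_allowed / times_targeted


# Hypothetical tracking totals: (possession entries allowed, times targeted).
tracked = {
    "Defender A": (45, 100),
    "Defender B": (90, 100),
    "Defender C": (60, 120),
}

# Lower is better, so sort ascending to put the best blue-line defenders first.
ranking = sorted(tracked, key=lambda name: poss_entry_pct_allowed(*tracked[name]))
```

Using a percentage rather than a raw count matters here because defenders who are targeted more often will naturally allow more entries in total.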

You can view a player’s results in two 1-year windows and one 2-year window, covering the 2016-17 season and the 2017-18 season. This allows you to compare one player to himself (in consecutive seasons) or two players to each other (in the same single season or across both seasons simultaneously). As shown in the intro to analytics article, an example that motivates the study of the former is Nikita Zaitsev’s first two NHL seasons. If you are feeling extra fancy, you can also view two different players with the same name...

Aho.JPG

Although the most valid comparisons are those between players of the same position, which is obviously not true of the two Sebastian Ahos, this demonstrates one of the many ways you can be creative with these visuals once you start using them. With these tools at your disposal, you can answer silly questions like “Is Sebastian Aho better than Sebastian Aho?” along with more objective ones such as “Who contributes to offence the most often?” and “Which defenders are best at defending the blueline?” It would be impossible to answer any of these questions without the hard work of people like Sznajder, Turtoro, and Tulsky, and their mission to record mundane elements of the game that uncover hidden areas of player value.


Keep up to date with the Queen's Sports Analytics Organization. Like us on Facebook. Follow us on Twitter. For any questions or if you want to get in contact with us, email qsao@clubs.queensu.ca, or send us a message on Facebook.

An xGuide to Soccer Analytics by Anthony Turgelis

By: Anthony Turgelis (@AnthonyTurgelis), Erik Kiudorf, Jovan Novakovic

The State of Soccer Analytics

Relative to other major sports, soccer lags behind with regard to its acceptance of analytics within the game. Soccer is an extremely traditional sport that is usually reluctant to change, so this should not come as a huge surprise. While some clubs are ignoring analytics, others are using it as a competitive advantage - and it’s really working in some cases.

In a game as fluid as soccer, it is difficult to understand the game objectively amidst differing opinions from players, fans, coaching staff and the media alike. However, the recent growth of analytics in soccer provides an element of objectivity. It introduces new measures of predictability that encourage analysis, in an area where it is currently lacking.

Another reason that soccer analytics lags behind in the public eye is the rarity and inaccessibility of the data, not to mention the complexity and quantity of data required to fully capture value in an open-play sport with infinite game outcomes. The company that holds the monopoly on advanced soccer data is called Opta, and they track every game in every major soccer league around the world. Since there are a lot of games to cover worldwide, lots of things to track, and only a few groups doing it, it’s not hard to see why this data is easy to monopolize. The data is either difficult to scrape from the web or too expensive for personal use: a license for a single league’s worth of data is believed to cost in the four-digit range per year, though this varies by use and is not confirmed by Opta themselves. As a result, it is difficult, but not impossible, to practice public soccer data analysis.

There are still other ways though! Sites like WhoScored and Squawka offer simple game stats for teams and players, although they are not exportable with traditional methods. For MLS specifically, American Soccer Analysis offers many features to get your fix for advanced stats, which will be highlighted throughout the article. These concepts can be used as evaluation tools, to confirm the eye-test, or to just enhance the viewing experience of the game.

How Teams are Using Analytics

Although statistical analysis is not new to soccer - where pass counts, pass completions and shots taken, for example, are often recorded - such stats only provide information about certain events in the game, while lacking further insight. Soccer analytics helps teams gain insight into a player’s likely performance based on data collected from past performances. These advancements enable coaches and managers to utilize this data to plan more effective training programs, team selections, and game strategies.

Analytics can be broken down into technical and physical categories. The physical aspects account for distance covered, intensity, number of accelerations and decelerations, and jumps and landings. This data is most often utilized to monitor individual training loads, which helps minimize injuries. The Seattle Sounders of Major League Soccer mainly focus on sports science along with physical analytics to ensure players are at their physical peaks and to prevent injuries.

Technical analytics, by contrast, act as a tool to help players and coaches quantitatively assess individual and team-based performances. This information is used to improve both individual and team performances and design successful strategies for upcoming games. These mechanisms can also provide knowledge to predict outcomes of games, create new game strategies, determine the price value of a player and connect players to brands and sponsorship opportunities. Devin Pleuler, Senior Manager of Analytics at Toronto Football Club, explains the importance of analytics in Major League Soccer: “The players are on a salary cap but the analytics department is not so it’s a way you can set yourselves apart in a relatively cheap manner”. Analytics helps us quantify individual in-game events to provide an understanding of the probability of success, often evaluated by estimating goal scoring potential. It assigns values to the events - events being each stat category - to help better understand and coordinate tactics and systems. Coaches and managers can use this data to tailor tactical systems for upcoming games that are backed by objective information, translating to higher success rates on the field.

It's no surprise, then, that in a game where analytics is finally starting to carve out a place for itself, the two teams using it most heavily in MLS have ended up in back-to-back MLS Cup finals against each other. Fun tidbit: when these two teams first competed in the MLS Cup final, TFC's Senior Manager of Analytics challenged the Sounders' Director of Analytics, Ravi Ramineni, to a friendly wager:

No word on whether Devin actually gave up his calculator or not, as TFC did end up losing that round. If he did, perhaps he got it back the next year when TFC was victorious over the Sounders.

Expected Goals (xG)

The most popular and most cited advanced metric in soccer analytics is Expected Goals (xG). Generally, expected goals is the count of how many goals a player should have been expected to score, based on the quality of their chances. There are many models attempting to capture this, some better than others, but none are perfect. The two main inputs that can be found in most, if not all, xG models are where the shot took place and how the shot was taken.

The ‘where’ of the shot refers to both the distance and angle of the shot. Logically, it makes sense that the further away a player is from goal, the less likely their shot is to result in a goal. This is reflected in the statistic, as shots from distance generally have a lower xG than close ones. American Soccer Analysis’s model also considers how much of the goal mouth is available to shoot at. The closer a player is to the goal line, the less of the goal mouth is directly exposed to him, so a sharper angle results in a lower xG.
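To illustrate the geometry, here is a small Python sketch that measures how much of the goal mouth a shooter can see, as the angle between the two posts from the shot location. This is only a geometric illustration with hypothetical function names and coordinates; it is not American Soccer Analysis’s model, which combines location with other inputs.

```python
import math

GOAL_WIDTH_M = 7.32  # regulation distance between the posts, in metres


def goal_mouth_angle(dist_from_goal_line, lateral_offset):
    """Angle in degrees between the two posts as seen from the shot location.

    dist_from_goal_line: metres out from the goal line.
    lateral_offset: metres sideways from the centre of the goal.
    """
    half_width = GOAL_WIDTH_M / 2
    to_one_post = math.atan2(lateral_offset + half_width, dist_from_goal_line)
    to_other_post = math.atan2(lateral_offset - half_width, dist_from_goal_line)
    return math.degrees(to_one_post - to_other_post)


def shot_distance(dist_from_goal_line, lateral_offset):
    """Straight-line distance in metres from the shot to the centre of goal."""
    return math.hypot(dist_from_goal_line, lateral_offset)


central_angle = goal_mouth_angle(11.0, 0.0)      # roughly the penalty spot
wide_angle = goal_mouth_angle(11.0, 15.0)        # same depth, far out wide
byline_angle = goal_mouth_angle(0.5, 10.0)       # almost on the goal line
```

The visible angle shrinks both as the shooter moves further out and as he drifts wide, and it collapses towards zero near the goal line outside the posts, which matches the intuition in the paragraph above.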

Determining how the shot was taken is slightly more complicated, as it is composed of the manner in which the physical shot is taken, as well as the lead-up play to the shot. Higher probabilities are awarded to shots taken with the player’s foot rather than the head, simply because a shot taken with the foot is statistically more likely to score than a header. The build-up play before the shot also affects the xG rating. For example, a shot taken from 10 yards on a counter attack will be awarded a higher xG than the exact same shot resulting from a corner. The reason for this is the time and space that the player is allowed. Typically, on a fast break a player has more space and is able to get off his preferred shot, whereas on a corner the eighteen-yard box is very clogged, so players are rushed to shoot and the chance of the ball being deflected is much higher.

What Can xG Tell Us?

Reasonable conclusions that can be drawn from xG are how often a player is in a good spot to score and how often they make themselves available for good chances. Comparing their expected goals to their actual goals will give you an indicator of a player’s finishing ability, and whether they’ve benefited from good or bad luck. Think of it this way: if a player misses a sitter in front of the net by skying it over the bar, this type of shot from that location could be expected to go in (making this up) 95% of the time. This player’s goal count would be zero, but their xG count would be 0.95. The player got into a good position to score, but performed weakly in finishing. If they kept this up, there would be a large gap between the two counts and this player could be deemed a poor finisher.
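The goals-versus-xG comparison can be sketched in Python as follows. The function name and data layout are hypothetical, and the 0.95 figure is the same made-up value used in the example above.

```python
def finishing_summary(shots):
    """Summarize a list of (xg, scored) pairs.

    Returns (goals, total_xg, goals_minus_xg). A persistently negative
    difference over a large sample would point towards poor finishing.
    """
    total_xg = sum(xg for xg, _ in shots)
    goals = sum(1 for _, scored in shots if scored)
    return goals, total_xg, goals - total_xg


# The missed sitter: a 0.95 xG chance that did not go in.
missed_sitter = finishing_summary([(0.95, False)])

# A hypothetical longer run of chances for comparison.
mixed_run = finishing_summary([(0.5, True), (0.25, False), (0.25, False)])
```

A single chance says very little on its own; the gap only becomes meaningful once it is summed over many shots.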

On the other hand though, let’s say two players in two different games take the same shot (which is deemed to be a 50% shot, or a 0.5 xG) against two goalies that are standing in the same spot. One goalie dives across and makes an incredible save, while the other falls just short. The player who did not score is penalized in goals for unluckily going up against a better goalie, which is out of their control. Factors outside a player’s control can push their goal count away from their xG count in the short term, but over a larger sample, where luck matters less, the two should move closer together.

On AmericanSoccerAnalysis.com, you can find constantly updated MLS xG counts by game, player, and team. On Twitter, @11tegen11 tweets out game maps of the xG accumulated by each team in a game, and gives the odds of each team winning based on their xG count. This is a great way to identify which teams really got the better chances, but ran into some bad luck or good goaltending. His charts typically look like this:

11.PNG

Each scoring chance is denoted by the bar moving higher. The larger the rise of the bar, the higher the xG of the scoring chance, which means the more likely the team is to score. In this case, it can be seen that Jeisson Vargas scored on a ~0.1 xG chance, meaning he would be expected to score on that chance once every ten tries. The final xG counts were 1.27 for Montreal and 0.96 for Toronto, leading to the conclusion that it was a fairly even game that could have gone either way. This can also be seen in the match odds near the top left (which look like a French flag for this game). What these mean is that in games where one team put up ~1.27 xG and the other put up ~0.96, the team with the higher xG would be expected to win 43% of the time, draw 30% of the time, and lose 28% of the time (the figures are rounded, so they do not sum to exactly 100%). TFC can consider themselves slightly unlucky to come out of this game without a point.
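One simple way to turn a pair of xG totals into match odds is to treat each team’s goal count as an independent Poisson variable whose mean is its xG. The Python sketch below does that. It is an assumption made for illustration and not necessarily how @11tegen11’s odds are produced; a model that simulates each shot separately would give somewhat different numbers.

```python
import math


def poisson_pmf(goals, expected_goals):
    """Probability of scoring exactly `goals` under a Poisson model."""
    return math.exp(-expected_goals) * expected_goals ** goals / math.factorial(goals)


def match_odds(xg_a, xg_b, max_goals=10):
    """Return (P(A wins), P(draw), P(B wins)) from two xG totals.

    Assumes independent Poisson goal counts, truncated at `max_goals`
    and renormalized so the three probabilities sum to one.
    """
    win_a = draw = win_b = 0.0
    for goals_a in range(max_goals + 1):
        for goals_b in range(max_goals + 1):
            prob = poisson_pmf(goals_a, xg_a) * poisson_pmf(goals_b, xg_b)
            if goals_a > goals_b:
                win_a += prob
            elif goals_a == goals_b:
                draw += prob
            else:
                win_b += prob
    total = win_a + draw + win_b
    return win_a / total, draw / total, win_b / total


# The xG totals from the game discussed above.
montreal_win, draw_prob, toronto_win = match_odds(1.27, 0.96)
```

Even with a clear xG edge, the stronger side is well short of a certain winner under this model, which is the point the chart is making about close games.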

Expected Assists (xA) and Key Passes

xG is the most common tool to analyze how dangerous an attacker is. However, it doesn’t take into account how effective a passer is. That is why the stat ‘expected assists’, or xA, was created. Expected assists is designed to give credit to the player that creates a chance, not just the player who takes it. It does this by assigning the xG rating of the chance to the passer in the form of xA. Therefore, if a through ball leads to a chance with an xG rating of 0.4, the player who played the pass would be assigned an xA rating of 0.4.

Adding on to the playmaking measurement is key passes. Key passes are defined as “the final pass or pass-cum-shot leading to the recipient of the ball shooting”. The beauty of this stat comes from its simplicity. As long as the receiving player shoots the ball, the passer is awarded a key pass regardless of the result of the shot. Therefore, it is quite easy to track and look out for during a game and will give the viewer a decent sense of which players create chances. However, the simplicity of key passes is also their downfall. Because every key pass is awarded the same rating of 1, the stat does not account for the type of chance created. A three-yard pass leading to a shot that goes ten yards wide is worth the same amount as a through ball leading to a tap-in. Unlike xA, key passes do not differentiate between chances and are less effective at actually measuring the total effect of a passer’s creativity.
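The difference between the two playmaking stats shows up clearly in a short Python sketch. The function name, player labels, and xG values are hypothetical; the logic follows the definitions above, where each pass leading to a shot counts as one key pass and contributes the shot’s xG to the passer’s xA.

```python
from collections import defaultdict


def playmaking_totals(chances):
    """Tally key passes and xA from (passer, xg_of_resulting_shot) pairs."""
    key_passes = defaultdict(int)
    expected_assists = defaultdict(float)
    for passer, shot_xg in chances:
        key_passes[passer] += 1             # every shot-creating pass counts as 1
        expected_assists[passer] += shot_xg  # xA inherits the xG of the chance
    return dict(key_passes), dict(expected_assists)


# Hypothetical chances: a short pass before a speculative shot,
# and a through ball that sets up a tap-in.
key_passes, expected_assists = playmaking_totals([
    ("Player A", 0.02),
    ("Player B", 0.40),
])
```

Both players end up with one key pass, but their xA totals differ by a factor of twenty, which is exactly the distinction key passes cannot make.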

Player Comparison (Radars)

One of the most useful and easy-to-interpret tools (mostly) available to the public community is the player radar. Due to the data constraints outlined earlier, it’s not so easy for everyone to make them, but there are thankfully a few people on Twitter who post them on a consistent basis, which has essentially created a database of them there. Here’s an example of a player radar created by Ted Knutson (@mixedknuts), for Sebastian Giovinco in the 2016 season:

It might look like there’s a lot going on there, but it’s actually quite simple. Eleven stats are highlighted on the radar, chosen according to the player’s position (in this case, forward). Each is presented on a per-90-minute basis, so everyone is judged on the same scale. The closer a stat’s value is to the outer edge of the circle, the closer the player was to being the best in their league at it. The outer circle represents the top 5% of players in the same competition, while the middle of the circle represents the bottom 5%. If a player has a stat that touches the edge, they are likely to be considered elite in that category. If they have a stat near the middle, this might be an indicator of their play style or they may have work to do. The axes are independent of each other: 0.39 throughballs has no relation to 1.2 dispossessions at all, aside from representing the same percentile rank for each different stat.
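The two steps behind a radar axis, converting to a per-90 rate and then locating the player within the league, can be sketched in Python as below. All names and numbers are hypothetical, and the linear scaling between the 5th and 95th percentiles is an assumption about how such radars are drawn, not a description of Ted Knutson’s exact method.

```python
def per_90(total, minutes_played):
    """Convert a season total into a rate per 90 minutes played."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return total * 90.0 / minutes_played


def percentile_rank(value, league_values):
    """Percentage of league values that fall strictly below `value`."""
    below = sum(1 for other in league_values if other < value)
    return 100.0 * below / len(league_values)


def radar_position(percentile):
    """Map a percentile to 0.0 (centre) .. 1.0 (outer edge).

    Assumption: the centre is the 5th percentile, the edge is the 95th,
    and anything beyond either boundary is clamped.
    """
    return max(0.0, min(1.0, (percentile - 5.0) / 90.0))


# Hypothetical forward: 10 through balls in 900 minutes.
through_balls_per_90 = per_90(10, 900)
rank = percentile_rank(3, [1, 2, 3, 4, 5])
```

For stats where a lower number is better, such as dispossessions, the axis would need to be flipped before plotting.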

From this radar, we can see that Giovinco is an extremely high volume shooter, which is reflected in his high shots per 90 and low xG per shot. At first glance, his passing % looks weak, but considering that his passes into the box number is well above average, he could be thought of as a creator near the goal. You probably already knew this, but the radar makes a strong case that Sebastian Giovinco is a fantastic soccer player who has dominated MLS. This really highlights the beauty of soccer analytics - it’s a great way to confirm the eye-test.

Accessing these player radars is not an ideal process. First, go to the Twitter search page (does not require an account). The three people who have been identified as consistently posting these are @Mixedknuts, @Fussballradars, and @thefutebolist. Type any of their names (start with @Mixedknuts, as his database is probably the largest, then move on to the other two) and then the name of the player you are looking for. It’s sometimes best to then filter by photos, as all the radars will appear there. You should then find the radar you are looking for. If that didn’t produce any results, it’s not entirely hopeless. Ted Knutson occasionally opens a request line on Twitter, so if you want a radar for a player who does not have one yet, you can request one that way.

Score Effects

Score Effects are an important concept to consider, especially for casual viewing, as it might help explain certain phenomena that occur every single match. The idea here is that when teams are winning, they tend to sit back and defend more, and while they are losing, they push forward. Seems obvious, right? The thing that is not always obvious to most people is how this will affect the flow of the game, the final stat-line, and the quality of shots that can be expected. Statsbomb did a detailed statistical analysis on score effects which can be found here, which shows some of the math and stats they used to confirm this effect.

Essentially, what they found was that when teams are leading in a game, they tend to form a ‘defensive shell’: they tighten up defensively and drop deeper. This is done because, to them, preventing a goal is more valuable than scoring another. They tend to allow more shots from further out, and these shots are typically less likely to go in.

On the other hand, when teams are trailing by a goal, they will tend to take more shots in a more desperate attempt to score the tying goal. These shots will typically be of lesser quality due to this desperation and because the team is not afforded the freedom to wait for the perfect chance to become available. The conversion rates on these shots tend to be lower, which is another nod to the notion that these shots are of lesser quality.

Add all of this up, and you could see a very lopsided statline at the end of the game if one team happened to be trailing for most of it. It might paint a picture that one team dominated and the other got lucky. This could be true, but hopefully with knowledge of the concept of score effects, you will be able to see through this statline and consider that these shots could have been lower quality and part of the defending team’s plan all along.


What's a Corsi Anyway?: An Intro to Hockey Analytics by Scott Schiffner

By: Owen Kewell, Scott Schiffner, Adam Sigesmund (@Ziggy_14), Anthony Turgelis (@AnthonyTurgelis)

Advanced statistics is an area that has picked up steam and shifted into the mainstream focus in hockey over the past decade. Many NHL teams now employ full-time analytics staff dedicated to breaking down the numbers behind the game. So, what makes analytics such a powerful tool? Aside from helping you dominate your next fantasy hockey pool, advanced statistics provide potent insights into what is really causing teams to win or lose.

Hockey is a sport that has long been misunderstood. Its gameplay is fundamentally volatile, spontaneous and difficult to follow. There are countless different factors that contribute to a team’s chances of scoring a goal or winning a game on a nightly basis. While many in Canada would beg to differ, ice hockey still firmly occupies last place in terms of revenue and fan support amongst the big four major North American sport leagues (NFL, MLB, NBA, & NHL). As such, hockey is on the whole overlooked and is often the last to implement certain changes that come about in professional sports. The idea of a set of advanced statistics that would offer better insights into the game arose as other major sports leagues, starting with Major League Baseball, began looking beyond superficial characteristics and searching for the underlying numbers influencing outcomes. Coaches, players, and fans alike have all been subjected over the years to an epidemic failure to truly understand what is happening out there on the ice. This is the motivation behind the hockey analytics movement: to use data analysis to enhance and develop our knowledge of ice hockey and inform decision-making for the benefit of all who wish to understand the sport better.

Another barrier to progress in the field of hockey analytics is the hesitance of the sport to embrace modern statistics. Most casual fans are familiar with basic stats such as goals, assists, PIM, and plus/minus. But do these stats really tell the full story? In fact, most of these are actually detrimental to the uninformed fan’s understanding of the game. For starters, there is usually no distinction between first and second assists in traditional stat-keeping. A player could have touched the puck thirty seconds earlier in the play or made an unbelievable pass to set up a goal, and either way it still counts as a single assist on the scoresheet. Looking only at goals and assists can be deceiving; we need more reliable, repeatable metrics to determine which players are most valuable to their teams. Advanced stats are all about looking beyond the surface and identifying what’s actually driving the play.

So, what are these so-called “advanced stats”? Let’s start with the basics.

PDO: PDO (it doesn't stand for anything) is defined as a team’s save percentage (usually 5v5) + shooting percentage, with a league average of 1.000 (often written on a 100 scale, where the average is 100). If you only learn one concept, it's this one. It is usually regarded as a measure of a team's or player’s luck, and can be a useful indicator of whether a player is under- or over-performing and whether a regression to the mean (back towards 1.000) is likely. This will not happen in every situation, of course, but watch for teams with astronomical PDOs to hit a reality check sooner rather than later. Team PDO stats can be found on corsica.hockey’s team stats page.

Without trying to scare anyone, the Toronto Maple Leafs currently boast the 4th-highest PDO at 101.85. To help ease your mind a bit, the Tampa Bay Lightning, who are considered the team to beat in the East, have the highest PDO at 102.35, with a decent gap between them and second place. Their results could also be flattering their true level of play; time will tell.
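
To make the definition concrete, here is a minimal sketch of the PDO calculation. The function name and the shot and goal totals are made up for illustration; they are not real team data.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """Return PDO on the 1.000 scale: shooting % plus save %."""
    shooting_pct = goals_for / shots_for
    save_pct = 1 - goals_against / shots_against
    return shooting_pct + save_pct


# Hypothetical 5v5 totals (shots on goal), for illustration only.
team_pdo = pdo(goals_for=90, shots_for=1000, goals_against=70, shots_against=1000)
print(f"PDO: {team_pdo:.3f} (or {team_pdo * 100:.2f} on the 100 scale)")
```

A team shooting 9% while its goalies stop 93% lands above 1.000, which is the kind of number that tends to drift back toward average over time.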

Corsi: You may have heard terms like Corsi and/or Fenwick being thrown around before. These are core concepts that are fundamental to understanding what drives the play during a game. Basically, Corsi is an approximation of puck possession that counts the total shot attempts for and against your team. Corsi can also be tracked for the time a specific player is on the ice.

A shot attempt is defined as any time the puck is directed at the goal, including shots on net, missed shots, and blocked shots. Anything above 50% possession is generally seen as being positive as you are generating more shot attempts than you are allowing.

Corsi stats are typically kept in the following ways: Corsi For (CF), Corsi Against (CA), +/-, and CF%. An example of how CF% can be useful is when evaluating offensive defensemen. Sometimes these players are overvalued because of their noticeable offensive production, even though their shaky defensive game offsets the offensive value they provide.
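
The four Corsi numbers above can be sketched in a few lines. The attempt counts are hypothetical, and the helper name is ours rather than anything from a stats site.

```python
def corsi(shots_on_net, missed, blocked):
    """Total shot attempts: shots on net + missed shots + blocked shots."""
    return shots_on_net + missed + blocked


# Hypothetical attempts while one player was on the ice.
cf = corsi(shots_on_net=30, missed=12, blocked=13)  # attempts by his team
ca = corsi(shots_on_net=25, missed=10, blocked=10)  # attempts by the opponent

corsi_plus_minus = cf - ca
cf_pct = 100 * cf / (cf + ca)

print(f"CF={cf}, CA={ca}, +/-={corsi_plus_minus}, CF%={cf_pct:.1f}")
```

A CF% above 50 means the player's team generated more attempts than it allowed while he was on the ice.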

Fenwick: Fenwick is similar to Corsi, but excludes shot attempts that are blocked. Of course, with both of these stats, one should also take into account that a player’s possession score is influenced by both their linemates and the quality of competition (QoC). These stats can always be adjusted to reflect different game scenarios, like whether the team was up or down by a goal at the time.
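
As a rough sketch of the difference, here the same kind of shot attempts are stored as a list of events and the blocked ones are filtered out to get Fenwick. The event labels and counts are assumptions made for this example.

```python
def fenwick(attempts):
    """Count unblocked shot attempts (Corsi minus blocked shots)."""
    return sum(1 for kind in attempts if kind != "blocked")


# Hypothetical event logs: each entry is one shot attempt.
attempts_for = ["on_net"] * 30 + ["missed"] * 12 + ["blocked"] * 13
attempts_against = ["on_net"] * 25 + ["missed"] * 10 + ["blocked"] * 10

ff = fenwick(attempts_for)
fa = fenwick(attempts_against)
ff_pct = 100 * ff / (ff + fa)

print(f"Corsi For={len(attempts_for)}, Fenwick For={ff}, FF%={ff_pct:.1f}")
```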

Measuring puck possession in hockey makes sense, because the team that has the puck on their stick more often controls the play. Granted, Corsi/Fenwick are far from perfect, and the team with the better possession metrics doesn’t always come out ahead. But at the very least, including all shot attempts offers a much larger sample size of data than traditional stats, and provides a solid foundation for further analysis.

Zone Starts (ZS%): this measures the proportion of the time that a player starts a shift in each area of the ice (offensive zone vs. defensive zone). A ZS% of greater than 50% tells us that the player is deployed in offensive situations more frequently than defensive situations. This is important because it gives us insight into a player’s usage, or in what scenarios he is normally deployed by his team’s coach. It also provides context for interpreting a player’s Corsi/Fenwick. Players who are more skilled offensively will tend to have a higher ZS% because they give the team a better chance to take advantage of the offensive zone faceoff and generate scoring opportunities. At the very least, ZS% can be used to get a glimpse at how a coach favors a player’s skillset.
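
A minimal sketch of the ZS% calculation, assuming it is computed from offensive-zone and defensive-zone shift starts only (neutral-zone starts are left out). The counts are hypothetical.

```python
def zone_start_pct(offensive_starts, defensive_starts):
    """Share of non-neutral shift starts that came in the offensive zone."""
    return 100 * offensive_starts / (offensive_starts + defensive_starts)


# Hypothetical season totals for one player.
zs = zone_start_pct(offensive_starts=330, defensive_starts=270)
print(f"ZS%: {zs:.1f}")
```

A value above 50 suggests the coach leans on this player in offensive situations.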

Intro on 5v5 Isolated Stats and Repeatability

Oftentimes, you will see those who work with hockey analytics cite a player's stats solely while they are at even strength, or 5v5. Why? There are a few reasons.

First, 5v5 obviously takes up most of the hockey game. If a player is valuable to his team at 5v5, he will be valuable for more time throughout the game, and this should be seen as a large positive. A player's power play contributions are certainly valuable to a team, but they are often over-valued. Next, the game is played very differently in different game states. It would be wildly unfair to penalty killers to have their penalty kill stats included in their overall line, as more goals against are scored on the penalty kill, even for the best penalty killers. Separating these statistics helps provide a more complete picture of the player's skillset and the value they have contributed to their team. Finally, 5v5 stats are generally regarded as the most repeatable, partially due to the larger sample. While players' PP and PK stats can vary widely by year, 5v5 stats typically remain relatively stable (read more at PPP here if you like).

In addition, primary points (goals and first assists) have been regarded as relatively repeatable stats, so be on the lookout for players who have many secondary assists, as their point totals may regress in the future (read more on this here).

Intro to Comparison Tools

One of the areas that has most benefited from hockey analytics is the domain of player comparison. One of the best and most intuitive tools is the HERO chart, as pioneered by Domenic Galamini Jr. (@MimicoHero). The HERO chart is a quick comparison of how players stack up across ice time, goal scoring, primary assists, shot generation and shot suppression. At a single glance, we can get a sense of the strengths and contributions of different players. Here we compare Sidney Crosby to Connor McDavid:

hero.png

We can see that Crosby is better at goal-scoring and shot generation, while McDavid is better at primary assists and shot suppression.

To compare any two players of your choice, or to compare a player to a positional archetype like First-Line Centre or Second-Pair Defender, you can use Galamini’s website: http://ownthepuck.blogspot.ca/. These comparisons can be used to enhance understanding of a player’s skill set, inform debates, and evaluate moves made by NHL teams, among other uses.

All-3-Zone Data Visualizations:

While a HERO chart is an all-encompassing snapshot of a player's contributions on the ice, the All-Three-Zones visuals are concerned with more specific aspects of the game. CJ Turtoro (@CJTDevil) created two sets of visuals using data from Corey Sznajder’s (@ShutdownLine) massive tracking project.

You can find both sets of visuals at the links below:

  1. https://public.tableau.com/profile/christopher.turtoro#!/vizhome/ZoneTransitionsper60/5v5Entries

  2. https://public.tableau.com/profile/christopher.turtoro#!/vizhome/2-yearA3ZPlayerComps/ComparisonDashboard

In the first set of visuals, you will find 4 leaderboards. Players are ranked in the 5v5 stats listed below.

  • 5v5 Entries -- How often players enter the offensive zone by making a clean pass to a teammate (Entry passes/60) or by carrying the puck across the blue line themselves (Carry-ins/60).

Other notes: The best way to enter the zone is to enter with possession of the puck (Entry passes + Carry-ins, as discussed above). These types of entries are called Possession Entries. Although other types of attempts are included in the leaderboard as well, players are automatically sorted by Possession Entries/60 because these alternative attempts are less than ideal. If you decide to change this, use the “Sort By (Entries)” filter to rank the players in other ways.

  • 5v5 Exits -- This is the same as 5v5 entries, except at the blue line separating the defensive zone from the neutral zone. Players are ranked based on how often they transition the puck from the defensive zone into the neutral zone either by carrying it (Carries/60) or by passing it to a teammate (Exit Passes/60).

Other notes: Like 5v5 entries, the best ways to exit the defensive zone are classified as Possession Exits. This is why players are automatically sorted by Possession Exits/60. Again, the “Sort By (Exits)” filter will let you change how the leaderboard is sorted.

  • 5v5 Entries per Target (5v5 Entry Def %) -- This stat measures defence at the blue line. It answers the question: When a defender is in proximity to an attempted zone entry, how often does he stop the attempt?

Other Notes: It is important to note that a “defender” is any player on the team playing defence (i.e. the team without the puck). Forwards are included in this definition of defender, but the best way to use this leaderboard is to judge defensemen only. This is why forwards are automatically filtered out of the leaderboard, but you can always change this using the filter if you wish.

  • 5v5 Shots and Passes -- Players are ranked based on how often they contribute to shots. Players contribute to a shot by being the shooter or by making one of the three passes immediately before it, in the same way they earn points by scoring a goal or by making one of the two passes immediately before the goal.

If you want a closer look at certain groups of players, the filters allow you to look at players who play certain positions (forwards/defencemen) and players who play on certain teams. In the screenshot below, for example, I filtered the 5v5 Entries leaderboard to see what it looks like for forwards on the Oilers:

entries:60.png

You can use these leaderboards to judge offence (5v5 entries, 5v5 shot contributions), and defence (5v5 exits, 5v5 Entry Def %). Ultimately, these four leaderboards will help you identify the best and worst players in these areas.

In order to focus on one or two players, you should use the second set of visuals: The A3Z Player Comparison Tool. While HERO charts allow for player comparisons in stats collected by the NHL, this visualization was designed to help you judge players based on their performance in several stats from the tracking project. Instead of standard deviations, however, the measurement of choice in this comparison tool is percentiles. So keep in mind that “100” means the result is better than 100% of the other results. You can view a player's results in two 1-year windows and one 2-year window, covering the 2016-17 season and the 2017-18 season. Here’s a two-year snapshot of how Erik Karlsson and Sidney Crosby rank in some of these key stats:

a3z.png
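
To illustrate the percentile scale, here is a small sketch that ranks one player's result against everyone else's. It follows the definition given above (the share of other results that are worse); the function name and the numbers are hypothetical, and the tool itself may compute its percentiles somewhat differently.

```python
def percentile_vs_others(value, others):
    """Percentage of the other results that are strictly below `value`."""
    below = sum(1 for other in others if other < value)
    return 100 * below / len(others)


# Hypothetical Possession Entries/60 for five other players.
others = [4.0, 6.5, 8.0, 9.5, 11.0]

print(percentile_vs_others(10.0, others))  # beats 4 of the 5 other results
print(percentile_vs_others(12.0, others))  # beats every other result
```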

You probably noticed that the stats for forwards and defencemen are slightly different. The only difference is that defencemen have three extra categories, which measure their ability (or lack thereof) to defend their own blue line (i.e. their 5v5 Entries per Target, as discussed in the previous section). You may have also noticed some useful information hidden beneath each player's name, including the numbers of games and minutes that have been tracked for the player. Although the numbers in the screenshot above are from two seasons, another thing to keep in mind is that you can also compare a player's development over two seasons by looking at their stats in one-year windows. To see what I mean, take a look at Nikita Zaitsev’s numbers in two consecutive seasons:

zaitsev.png

Visualizing the dramatic fall of Nikita Zaitsev in this way is an excellent starting point for further analysis. Likewise, you can also compare two different players in the same season or over two seasons. This is, after all, a Player Comparison Tool. Other common uses for both sets of A3Z visualizations are to identify strengths and weaknesses of certain players, to evaluate potential acquisitions, to design the optimal lineup for your favourite team, and many more.

Of course, there are countless other useful terms and concepts to consider in analytics, like relative stats, shot quality, and expected goals (xG), which we’ll be touching upon more in-depth in future articles. If you’re interested in advanced stats and would like to learn more, we’ll be putting out more content on exciting topics in hockey analytics over the coming months, so stay tuned.



Big Baller Data: A Basketball Analytics Guide by Anthony Turgelis

By: James Acres, Josh Antonucci, Michael Blumel, Cameron Raymond, Cody Smith, Hunter Smeaton

All current stats used are from basketballreference.com at time of article's publication.

As NBA fans, we are constantly bombarded with different statistics. Every evening you look at your phone to see notifications from various apps: a triple-double for LeBron, 50 points and 10 rebounds from Anthony Davis, and so on. We are constantly exposed to these types of simple statistics; they form our opinions on players and are what we use to back up arguments when discussing NBA players with peers. Although these statistics are extremely valuable, it is important to acknowledge other analytical methods that can help build a more complete understanding of the NBA. Analytics certainly cannot paint the entire picture of a basketball game, but they are a part of it, so there’s no sense in ignoring them any longer.

This guide will introduce you to many concepts that are prevalent in the basketball analytics community. They can be used for your own analysis, or to enhance your viewership of the game. Hopefully, there will be concepts throughout that will challenge the way you fundamentally think about the game of basketball.

Moreyball (Not a typo)

If you are a fan of sports, baseball or analytics, then you most likely have seen or heard of the movie/book “Moneyball”. Just like our baseball guide states, if you haven’t seen it, you should watch it as soon as possible. Bill James was the true pioneer behind bringing advanced statistics to the mainstream in sports and Daryl Morey is taking it to the next level in the NBA, introducing “Moreyball”.

Daryl Morey is the Houston Rockets GM. Morey was not an athlete and had no basketball experience whatsoever. He earned a bachelor's degree in computer science from Northwestern University and an MBA from MIT. Daryl Morey is a stats junkie and has built the modern Houston Rockets on heavy analytics usage.

On the other end of this spectrum is Charles Barkley. Barkley, a Hall of Famer and 11-time All-Star, argues that “analytics is crap,” that the NBA is talent-based, and that Morey is “one of those idiots.” He went as far as saying analytics is "a bunch of guys who ain't never played the game [and] they never got the girls in high school." Watch the rant in the YouTube video below:

TNT commentator Charles Barkley rants about analytics in the NBA and Houston Rockets GM Daryl Morey.

That was two years ago, when Houston finished with a 55-27 record. Today, Houston boasts the NBA’s best record to date and Moreyball is in full effect, relying on two basic tenets.

  • 3 > 2
  • It’s much easier to dunk the ball than to shoot it

The idea is that the most efficient shots in basketball are layups/dunks and 3-pointers. The former makes perfect sense: you’re less likely to miss a shot if you are extremely close to the rim. However, it wasn’t until somewhat recently that teams started looking closer at the 3-point shot. Morey’s key observation was that someone who takes 100 3-point shots and makes a third of them produces the same number of points as someone who takes 100 2-point shots and makes half of them. 33% from 3-point range is below league average, but 50% on all 2-pointers is extremely impressive, unless the majority of your shots come at the rim.
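
The arithmetic behind that observation is easy to check. This is a minimal sketch; the function name is ours.

```python
def expected_points(attempts, make_rate, shot_value):
    """Expected points from a number of attempts at a given make rate."""
    return attempts * make_rate * shot_value


threes = expected_points(attempts=100, make_rate=1 / 3, shot_value=3)
twos = expected_points(attempts=100, make_rate=1 / 2, shot_value=2)

print(f"100 threes at 33.3%: {threes:.0f} points")
print(f"100 twos at 50.0%:   {twos:.0f} points")
```

Both shooters end up with the same total, which is why a mediocre 3-point percentage can match an excellent 2-point percentage.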

This analytical approach is largely based on advanced stats like True Shooting percentage. This adjusts for the fact that a 3-point shot is worth 50% more than a 2-point shot, and that free throws are a part of an efficient offensive performance as well. Morey’s conclusion was that instead of taking a mid-range shot, in most cases, you are better off taking a few steps back and shooting a 3-pointer.
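
For reference, here is a sketch of True Shooting percentage using the commonly published formula, TS% = PTS / (2 * (FGA + 0.44 * FTA)). The formula is not spelled out in this article, so treat it as our assumption, and note that the stat line below is hypothetical.

```python
def true_shooting_pct(points, fga, fta):
    """True Shooting %: points per scoring attempt, scaled so 2 points = 100%."""
    return points / (2 * (fga + 0.44 * fta))


# Hypothetical single-game line: 30 points on 20 field goal attempts
# and 10 free throw attempts.
ts = true_shooting_pct(points=30, fga=20, fta=10)
print(f"TS%: {100 * ts:.1f}")
```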

This is shown perfectly in game 1 of last year’s Western Conference Semi-Finals, where the Rockets bested the San Antonio Spurs 126-99. Below is a visualization of all the shots that the Houston Rockets took that night.

(Credit /u/BradGroux, Reddit.com)


In this win the Rockets were able to produce 27 more points, while only taking 3 more shots than the Spurs.

However, this brings us to the limitations of Moreyball. The Spurs were able to adjust throughout the series to better defend the James Harden-led squad, and moved on to the Western Conference Finals after 6 games.

The fate of Moreyball remains to be seen. Without a Houston championship, it will be hard to convince the old guard of basketball that analytics can win championships. However, with the Rockets currently holding the best record in the league, and the philosophy’s poster boy James Harden looking primed to win the MVP award, they seem confident. We encourage you to join us in the future as we follow the journey of Moreyball, especially come playoff time when defense strengthens and every move will be analyzed under a microscope.

Intro to Advanced Basketball Analytics Metrics

Effective Field Goal Percentage (eFG%): Effective Field Goal percentage is a metric that you may have occasionally encountered. eFG% is a pretty easy concept to understand, as it simply takes into account the fact that three-point shots are worth 50% more than two-point shots. Looking at this numerically, shooting 33.33% from three is equal to shooting 50% from two (remind you of Moreyball?). This is an important statistic to consider when looking at a given player's field goal percentage, as it will give you a better understanding of their true efficiency in scoring the basketball. An example of this is shown when looking at DeMar DeRozan and James Harden. This season, DeRozan’s field goal percentage (46.1%) is higher than Harden's (45.1%), but his effective field goal percentage is lower: DeRozan is at 49.4%, while Harden’s eFG% is 54.6%. This can be attributed to the fact that Harden takes (and makes) a lot more three-point shots than DeRozan does, resulting in a higher eFG%.
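
Here is a small sketch of eFG% using the standard formula, (FG + 0.5 * 3P) / FGA. The two shooting lines are hypothetical and only meant to show how a player with the lower FG% can have the higher eFG%.

```python
def fg_pct(fg, fga):
    """Plain field goal percentage as a fraction."""
    return fg / fga


def efg_pct(fg, three_p, fga):
    """Effective FG%: made threes count as 1.5 made field goals."""
    return (fg + 0.5 * three_p) / fga


# Hypothetical Player A: 9-for-20 with no made threes.
# Hypothetical Player B: 8-for-20 with four made threes.
a_fg, a_efg = fg_pct(9, 20), efg_pct(9, 0, 20)
b_fg, b_efg = fg_pct(8, 20), efg_pct(8, 4, 20)

print(f"A: FG% {100 * a_fg:.1f}, eFG% {100 * a_efg:.1f}")
print(f"B: FG% {100 * b_fg:.1f}, eFG% {100 * b_efg:.1f}")
```

Player A has the better raw percentage, but Player B's made threes push his eFG% ahead.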

Value Added (VA) = (Minutes * (PER - PRL)) / 67. This is the estimated number of points a player adds to a team’s season total above what a ‘replacement player’ (for instance, the 12th man on the roster) would produce. PER gets its own section later, so circle back here once you have read it. The PRL (Position Replacement Level) is 11.5 for power forwards, 11.0 for point guards, 10.6 for centers, and 10.5 for shooting guards and small forwards.

Estimated Wins Added = Value Added (VA)/30
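
The two formulas above translate directly into code. The replacement levels come from the definition of PRL given here; the player's minutes and PER are hypothetical.

```python
# Position Replacement Level (PRL) values as listed above.
PRL = {"PF": 11.5, "PG": 11.0, "C": 10.6, "SG": 10.5, "SF": 10.5}


def value_added(minutes, per, position):
    """VA = (Minutes * (PER - PRL)) / 67."""
    return minutes * (per - PRL[position]) / 67


def estimated_wins_added(va):
    """EWA = VA / 30."""
    return va / 30


# Hypothetical point guard: 2,680 minutes with a PER of 21.0.
va = value_added(minutes=2680, per=21.0, position="PG")
ewa = estimated_wins_added(va)
print(f"VA: {va:.1f} points, EWA: {ewa:.1f} wins")
```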

Usage Rate (USG) = [(FGA + (FTA * 0.44) + (Assists * 0.33) + TO) * 40 * League Pace] / (Minutes * Team Pace). Don't worry, someone else does all of the calculations. The result is the number of possessions a player uses per 40 minutes.
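
If you do want to run the calculation yourself, the formula above can be sketched as follows. The season totals and pace figures are hypothetical.

```python
def usage_rate(fga, fta, assists, turnovers, minutes, league_pace, team_pace):
    """Possessions used per 40 minutes, following the formula above."""
    possessions_used = fga + (fta * 0.44) + (assists * 0.33) + turnovers
    return (possessions_used * 40 * league_pace) / (minutes * team_pace)


# Hypothetical season totals for a high-usage guard.
usg = usage_rate(
    fga=1500, fta=600, assists=500, turnovers=300,
    minutes=2800, league_pace=96, team_pace=98,
)
print(f"Usage rate: {usg:.1f} possessions per 40 minutes")
```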

This statistic aims to identify the players a team relies on most often to create offence. In the 2016-17 season, Russell Westbrook broke the single-season record for triple-doubles. His workload is captured by his usage rate of 40.8%, the highest in the NBA. This means that on roughly two of every five possessions the team relied on him to create scoring. This translated to 31 points per game and 10 assists (worth roughly 25 points per game), for a total of around 56 points of production per game. The team's total was 106.6 PPG. To say he was heavily relied on would be an understatement.

Player Efficiency Rating (PER): 

The most popular advanced metric commonly used today in basketball is player efficiency rating or PER for short. If you are familiar with baseball statistics, then this is comparable to WAR to determine a player’s efficiency compared to others. This metric involves one of the most complex formulas known within the analytics of all major sports.

What PER tries to accomplish is evaluating how productive a player is on a per-minute basis. It adds up the positive contributions a player makes on the court while subtracting negative contributions in a statistical point value system. Things like points, rebounds, and assists are positive additions, while turnovers are negative. This stat is adjusted for pace and playing time, which makes it easily comparable from player to player.

The shortcoming of this stat is that there are not many stats in basketball that capture how effective a player is on defense. Sure, there are blocks and steals, but these only tell so much and can be mostly a result of good team defense rather than individual defense. This deficiency becomes truly evident in 2013, when Paul George, one of the NBA’s best two-way players, had a lower PER than Jamal Crawford and J.R. Smith. For those of you who don’t know much about J.R. Smith, he is one of the best bad-shot takers and makers in the NBA. Take a look at the video below and you’ll get a good idea of why his shot selection should rank him much lower.

Some analysts are obsessed with this stat, and others aren’t. As with all advanced statistics, you must view the whole picture before determining whether a player is performing well or not. This season, in Cleveland’s struggles with Isaiah Thomas, LeBron was close to averaging a triple-double yet constantly had a negative PER. A triple-double (10+ in any 3 categories) is one of the most impressive things a player can do, so even if you are not familiar with basketball you can quickly see that PER is not the be-all and end-all stat. Typically, though, it can give you a quick snapshot of who the most productive players on the court are, and it generally includes the NBA elite.

How it's Calculated (You don't have to follow the whole thing, but it's good to view the inputs):

The calculation is the overall rating of a player’s per-minute statistical production and is widely applied by the largest sports corporations to distinguish players between one another. The league average is 15.00 every season.

The formula begins with calculating the unadjusted PER (uPER):

uPer 1.PNG
uper 2.PNG

Where:

per3.PNG

With:

tm, the prefix, indicating of team rather than of player;

lg, the prefix, indicating of league rather than of player;

min for number of minutes played;

3P for number of three-point field goals made;

FG for number of field goals made;

FT for number of free throws made;

VOP for value of possession (but in reference to the league, in this instance);

RB for number of rebounds: ORB for offensive, DRB for defensive, TRB for (total) combined, RBP for percentage of offensive or defensive;

Got all that? Good.

Once uPER is calculated, it must be adjusted to team pace and normalized for the league to become PER.

This final step removes the advantage enjoyed by players on teams that play an uptempo style, since the adjustment puts production on a per-possession basis so that players can be compared fairly. By looking at the top 10 list in the NBA done by ESPN, you can tell that a trend through all players is that they seem to create shots and momentum on offense that appears to be effortless.

PER leaders.PNG
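
The pace adjustment and normalization step can be sketched as below. This uses the commonly published form, PER = uPER * (lgPace / tmPace) * (15 / lg_aPER), where lg_aPER is the league average of the pace-adjusted uPER. That form is our assumption rather than something taken from the formula images above, and all input values are hypothetical.

```python
LEAGUE_AVERAGE_PER = 15.0  # the league average is set to 15.00 every season


def per_from_uper(uper, team_pace, league_pace, league_avg_adjusted_uper):
    """Pace-adjust an unadjusted PER, then scale so league average is 15."""
    pace_adjusted = uper * (league_pace / team_pace)
    return pace_adjusted * (LEAGUE_AVERAGE_PER / league_avg_adjusted_uper)


# Hypothetical player on a fast-paced team.
per = per_from_uper(
    uper=0.30, team_pace=100, league_pace=96, league_avg_adjusted_uper=0.24
)
print(f"PER: {per:.1f}")
```

A player whose pace-adjusted uPER equals the league average comes out at exactly 15 under this scaling.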

Intro into Match-Up Based Statistical Analysis

In sports, everyone is trying to find a new way to predict performance based on statistical analysis. With basketball being a match-up based sport, a match-up based analysis style is the most effective tool for predicting performance. Match-Up Based analysis deals with assessing habits of players, how efficient they are in certain areas on the floor, both offensively and defensively, and comparing this to their likely opponent in a given, upcoming game.

Here is a basic hypothetical example of match-up based analysis during a Toronto Raptors vs. Houston Rockets game. To keep this short, I will focus exclusively on the point guard of the Toronto Raptors, Kyle Lowry. To help predict Kyle’s performance we must first look at the basic offensive statistics: FGA, AST, REB, etc. I will then break down each of these statistics into 14 distinct zones, viewable on the graphic below. This will enable us to assess where Kyle’s tendencies for shooting, passing, driving, etc., come from. We then assess how efficient he is in these areas by using more advanced statistics (eFG%, AST%, REB%, etc.). This information is critical as it allows us to predict where Kyle will be situated on offensive possessions, in addition to how efficient he is in those areas. We do the same analysis on the defensive side and move on to the player that will be battling Kyle for a majority of the game. Using Houston as the example, he will be matched up with Chris Paul. After running the same statistical analysis for Chris Paul, we will then compare both point guards' offensive and defensive results against one another. The point of this (the thing here though, Skip) is to find out which point guard is better on any given night. Once we’ve analyzed these players and their behaviours on either end of the floor, the result will be the foremost indicator of how they’ll perform in any given matchup.

mba.PNG

 

Given that this is a preliminary analysis, there are many external factors that could lead to bias of measurement. Some questions to further consider may include: What happens if teams double-team a player? What if a bench player is used more defensively to cover a starter? Once a more in-depth statistical analysis is conducted, I will be able to answer these questions and analyze, with a degree of certainty, why a player is chosen to guard an opposing player on any given night, and the resulting implications. By accumulating vast quantities of data, applying this analysis strategy, and breaking each player down into one number, we are able to produce a result that takes everything into account. We'll be looking at this further throughout the year.

Intro to Defensive Statistics

Most people interested in basketball are familiar with the common box score defensive stats such as steals, blocks, and defensive rebounds, to name a few. Basing a player’s defensive strength on these metrics is not ideal in today’s game, and that leads us to look at more advanced statistics.

As a brief intro to these statistics, we will discuss defensive rating (DRTG) as well as defensive real plus-minus. Defensive rating measures the number of points per possession (usually quoted per 100 possessions) that the opponent’s offense scores while a certain player is on the floor. As an example, if a player has a DRTG of 102, it means that the opponents tend to score 1.02 points on each possession, or 102 points per 100 possessions. Only points that are scored as a result of the individual player's defensive breakdowns are counted against him. Expressing the stat per possession also removes the effect of factors like pace of play and minutes played per game. In this case, the lower the number the better. The main downside of this statistic is that it is difficult to determine why the defense was so good when 5 players were on the floor. For example, if players A and B play all of their minutes together and player B is the superior defender, player A will also look like a great defender. So, based on this stat alone, it's very hard to isolate the defensive value of a single player on the court.
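
The per-possession and per-100 views of defensive rating are the same number on two scales, as this small sketch shows. It simply divides points allowed by possessions and does not attempt the step of attributing points to an individual defender; the totals are hypothetical.

```python
def defensive_rating(points_allowed, possessions):
    """Points allowed per 100 opponent possessions."""
    return 100 * points_allowed / possessions


# Hypothetical totals while one player was on the floor.
drtg = defensive_rating(points_allowed=1020, possessions=1000)
points_per_possession = drtg / 100

print(f"DRTG: {drtg:.0f} ({points_per_possession:.2f} points per possession)")
```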

The next defensive statistic is defensive real plus-minus (DRPM). It measures value in points per 100 possessions, much the same as DRTG, but it compares the player against an average player. A DRPM of +1.5 means you are worth 1.5 points per 100 possessions compared to an average player in the league. Additionally, it uses models to remove fluctuating variables like home-court advantage in order to level the possession scoring information. Something that DRPM does that DRTG does not is make good guesses at which of the 5 players deserve credit for good defense on each possession. Since there isn’t a lot to go on early in the season, DRPM takes time to accurately guess which players are good at defense and corrects itself as the season goes on. Ultimately, there isn’t an exact way to determine which player on the court is the best at defense, but DRPM uses some fancy math to make the best available guesses as to who it is.

Statistics in sports, especially basketball, have become increasingly popular, and newer, better models will be introduced in the foreseeable future. These are just a few different measures of defensive statistics that teams are using more and more in order to pick lineups that match up better defensively against certain opponents.



Advanced Baseball Stats for Casual Baseball Fans by Anthony Turgelis

By Anthony Turgelis

We’ve all seen Moneyball. If you haven’t seen Moneyball, go see Moneyball, it’s on Netflix. The ‘Moneyball Revolution’ within baseball has shaken up the game, and changed the way that executives in baseball are looking at the game.

This will be an intro to some of the stats, metrics, and concepts that these executives are looking at. The goal here isn’t just to define what these things are, but rather to show how they can be used as tools of evaluation, to confirm the eye-test, or to just enhance the experience of the game. You might even end up sounding smart in front of your friends. When writing this article, I tried to include everything I wish I knew when first diving into the world of baseball analytics.

To avoid boring you with the history of how this Moneyball Revolution came to be, I’ll only drop one name that you should be familiar with - Bill James. Bill can be credited for being the pioneer of statistical analysis within baseball, as in the 1970s he was one of the first to publish this type of work that would be seen by a wide audience. Many people found his work fascinating, and attempted to replicate it, and - to make a long story short - after 30 years of this, the MLB finally took notice and the Moneyball Revolution began.

Concepts/Terms to Know:

The majority of these terms and concepts have been taken from Fangraphs, which is a site to find many advanced baseball stats and analysis. Links on where to find these concepts/stats will be provided.

Fielding Independent Pitching (FIP) - FIP is an adjusted Earned Run Average (ERA, or earned runs allowed by a pitcher per nine innings, excluding runs that result from errors) metric that attempts to quantify what a pitcher’s value would be if the defense component of the game were stripped out. FIP assumes that all balls hit into play get league-average results on whether they fall for a hit or not. This way, a pitcher is not penalized for having a bad defense behind him, which certainly would affect his pitching results, and his ERA as a result. FIP is considered predictive as it has higher correlations across seasons than ERA, which makes sense considering it measures things that the pitcher can control and not things like defense, which can fluctuate by game and by season. It is adjusted so that the league-average FIP is the same as the league-average ERA. This is done so that it can be easily compared to a player’s ERA to see if they are over/under-performing their FIP, and whether the player may be due for regression. There are cases of players who can consistently outperform their FIP numbers, such as Marco Estrada, who in 2015-16 was elite at inducing weak contact (which can be considered a skill), so FIP assuming league-average results on balls in play would likely paint him as less effective than he actually is. On the other hand, his ERA did balloon to 4.98 in 2017 after significantly outperforming his FIP the previous two years, so the regression bug may have actually hit him as well.

FIP can be found on Fangraphs pitcher pages, such as Marco Estrada’s, next to ERA, where you will find his 2017 FIP to be 4.61.
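To make the FIP idea concrete, here is a minimal Python sketch of the commonly published formula, FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant. The function names are my own, and the default constant of 3.10 is only a placeholder assumption: the real constant is recalculated each season so that league-average FIP matches league-average ERA.

```python
def fip(hr, bb, hbp, k, ip, fip_constant=3.10):
    """Fielding Independent Pitching from the commonly published formula.

    ip must be true innings (186 and 1/3 innings is 186.333..., not the
    box-score shorthand 186.1). fip_constant is a placeholder default;
    the real value changes every season.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + fip_constant


def era_minus_fip(era, fip_value):
    """Positive: ERA is worse than FIP suggests. Negative: ERA is better."""
    return round(era - fip_value, 2)


# Marco Estrada's 2017 figures quoted above: ERA 4.98, FIP 4.61.
gap = era_minus_fip(4.98, 4.61)
print(gap)
```

A positive gap like Estrada’s 2017 figure means the ERA came in higher than the FIP, while a pitcher who is outperforming his FIP shows a negative gap.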

Batting Average on Balls in Play (BABIP) - BABIP is a player’s batting average on only balls that were put into play, and the league average is roughly .300 for both hitters and pitchers. This is a very important stat because it tends to stabilize only after about 800 balls in play, which is more than most hitters put into play in a full season. This means that if a player is having a stretch of months (or even a whole year) where he is posting a much higher/lower BABIP than both the league average and his career average, he is likely due for some regression, as he has likely been getting lucky/unlucky on the results of the balls he has put into play. It’s worth noting that better hitters will likely have higher BABIPs, and vice-versa, and some players are able to sustain high BABIPs throughout their careers without regression. The 2017 Toronto Blue Jays hitters ranked dead last in all of MLB in BABIP, which can be seen as a source of optimism that they may achieve better results on their balls in play in 2018.

BABIP can be found on Fangraphs pitcher/batter pages, such as fringe prospect Dwight Smith Jr’s. He rode a .588 BABIP in 2017 to a .370 batting average, which was less impressive than it looks and likely luck-driven given his ridiculous BABIP. He still earned a demotion and will likely not get an early look to crack the 2018 team.
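For readers who want to compute BABIP themselves, the standard formula is (H - HR) / (AB - K - HR + SF): home runs and strikeouts are removed because those balls never reach the fielders, and sacrifice flies are added back because they were put in play. The sketch below uses my own function names, and the .300 baseline is simply the rough league average mentioned above.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


def babip_gap(player_babip, baseline=0.300):
    """Distance from a baseline (league or career average).

    A large gap over a small number of balls in play hints at luck,
    not necessarily skill.
    """
    return player_babip - baseline


# Hypothetical stat line, for illustration only.
value = babip(h=150, hr=20, ab=500, k=100, sf=5)
print(round(value, 3), round(babip_gap(value), 3))
```

In practice the more useful baseline is often the player’s own career BABIP, since better hitters tend to sustain higher numbers.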

Hit Probability - To temporarily stray from Fangraphs, Hit Probability is a metric that was introduced by Statcast at the beginning of the 2017 season to estimate the likelihood that a ball in play will be a hit, based on its launch angle and exit velocity compared to similarly hit balls in the past. Similarly to FIP, it attempts to negate the effects of defense and the ballpark: a player may have high-probability hits robbed by star outfielders making unlikely plays, or may get credit for many weak hits that likely would not be repeated. I did an analysis on how the 2017 Blue Jays were being affected by luck based on their hit probabilities, and throughout the season I saw players regress to what their averages were expected to be based on their Hit Probability numbers. The most extreme case was Devon Travis, who had a cold start despite high aggregate Hit Probability numbers and who, as the season progressed, positively regressed to the expected level. The quarter season report can be found here, and the mid-season report can be found here.

Hit Probability statistics can be found on Baseball-Savant here, where you can select any game and see the hit probabilities for all balls in play for that game.
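The idea of comparing a batted ball to similarly hit balls in the past can be sketched with a simple nearest-neighbour estimate. To be clear, this is not Statcast’s actual model, which is not described here; the function name, the choice of k, the unscaled distance between mph and degrees, and the sample rows are all assumptions made purely for illustration.

```python
import math


def hit_probability(exit_velocity, launch_angle, history, k=5):
    """Share of hits among the k most similar past batted balls.

    history is a list of (exit_velocity_mph, launch_angle_deg, was_hit).
    Similarity is plain Euclidean distance, a deliberate simplification.
    """
    if not history:
        raise ValueError("history must not be empty")
    nearest = sorted(
        history,
        key=lambda row: math.hypot(row[0] - exit_velocity, row[1] - launch_angle),
    )[:k]
    return sum(1 for _, _, was_hit in nearest if was_hit) / len(nearest)


# Made-up batted balls, not real Statcast data.
sample_history = [
    (100.0, 15.0, True),
    (101.0, 14.0, True),
    (99.0, 16.0, False),
    (60.0, 70.0, False),
    (65.0, -20.0, False),
]
print(hit_probability(100.0, 15.0, sample_history, k=3))
```

Summing these probabilities over all of a player’s balls in play gives an expected hit total that can be compared with his actual hits, which is the kind of comparison the luck analysis above relies on.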

Weighted Runs Created + (wRC+) - wRC+ is an attempt to quantify a player’s total offensive output in one stat, based on the value of his contributions, after park adjustments. It uses the concept of Weighted On Base Average (wOBA), which assigns a run value to each plate appearance outcome. For example, it finds that a triple contributes roughly twice as much to run scoring as a single, so a triple is worth roughly double the value of a single in this calculation. After doing this, you can find the value of the runs created by each player’s offensive outputs. Because wRC+ is a rate statistic, it is easy to use even in smaller samples to see how a hitter has been performing. It is one of the best tools to use when evaluating a hitter’s offensive abilities. The league-average wRC+ is 100, and each point above 100 represents one percentage point above league average.

It can be found on the batter pages on Fangraphs, such as Mike Trout’s, who was the 2017 leader at 181 wRC+, beating Aaron Judge by 8 points even with 19 fewer home runs.
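As a rough sketch of the weighting idea behind wOBA, the code below applies a run-value weight to each outcome and divides by AB + BB - IBB + SF + HBP. The weights are illustrative placeholders of about the size Fangraphs publishes; the real values are recalculated every season, and the function names are my own. The second helper just restates how to read a wRC+ number.

```python
# Illustrative weights only: real wOBA weights change every season.
ILLUSTRATIVE_WEIGHTS = {
    "bb": 0.69, "hbp": 0.72, "1b": 0.88, "2b": 1.24, "3b": 1.56, "hr": 2.00,
}


def woba(bb, ibb, hbp, singles, doubles, triples, hr, ab, sf,
         weights=ILLUSTRATIVE_WEIGHTS):
    """Weighted On Base Average; intentional walks are excluded."""
    numerator = (
        weights["bb"] * (bb - ibb)
        + weights["hbp"] * hbp
        + weights["1b"] * singles
        + weights["2b"] * doubles
        + weights["3b"] * triples
        + weights["hr"] * hr
    )
    denominator = ab + bb - ibb + sf + hbp
    if denominator <= 0:
        raise ValueError("no qualifying plate appearances")
    return numerator / denominator


def percent_above_league(wrc_plus):
    """wRC+ is indexed to 100, so 181 means 81% better than league average."""
    return wrc_plus - 100


# Hypothetical stat line, for illustration only.
print(round(woba(bb=50, ibb=5, hbp=5, singles=100, doubles=30,
                 triples=5, hr=25, ab=500, sf=5), 3))
print(percent_above_league(181))  # Mike Trout's 2017 wRC+ from the text
```

Turning wOBA into wRC+ additionally requires league and park constants, which are left out of this sketch.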

Park Adjustments - No Two Parks are The Same:

To state the obvious, no two MLB ballparks are the same. The most noticeable difference is the dimensions, but there are many other factors at play, such as weather and other environmental conditions. As a result, there tend to be plenty of differences in player performance at different parks, and adjustments are calculated to reduce the effects of these parks as best as possible. They are typically calculated separately for left- and right-handed batters, since parks are not always symmetrical and may favour batters from one side over the other.

Colorado’s Coors Field is regarded as the extreme case of a ‘Hitter’s Ballpark’ - hitters generally perform well there because of the high altitude and the large outfield, where batters can expect more balls to fall for hits. Conversely, AT&T Park in San Francisco is regarded as the most extreme case of a ‘Pitcher’s Ballpark’ due to its high walls and damp air. Rogers Centre in Toronto is ranked as the 8th best ballpark for hitters. Four out of five ballparks in the AL East are considered to favour the hitter over the pitcher, which could be one of the reasons why a team based in Toronto fails to attract premium free agent pitchers.

The War on WAR:

If you only have time to learn about one advanced stat in baseball, Wins Above Replacement (WAR) is the one to go with. WAR is an attempt to quantify the overall value of a player’s contributions in one easy number. Put simply, it is the number of wins that you can expect your team to add by employing the player, compared to a different player who could be easily acquired from the minor leagues or a team’s bench.

WAR is a counting stat and is based on what happened, rather than what will happen in the future. If an MVP-calibre player only played 20 games, they may have a lower WAR than many inferior players, due simply to the fact that they didn’t play enough games to accumulate a high WAR total.

Fangraphs goes into more detail on what exactly goes into the WAR stat for hitters, but essentially it is the total value of runs that a batter contributes to the team through hitting, baserunning, and fielding, divided by the number of runs it takes to add one win (Runs/Win fluctuates by year but is ~10). It is then adjusted by position (for example, CF is much harder to play than 1B, so players are credited accordingly - more here), adjusted by ballpark, and adjusted to consider the ‘Replacement Level’ player and how much more/less valuable the player is than this imaginary player.
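The arithmetic described above can be sketched as follows: add up the run components and divide by runs per win. The parameter names are my own, the default of 10 runs per win is the rough figure quoted above rather than an exact yearly value, and the sample inputs are hypothetical.

```python
def position_player_war(batting_runs, baserunning_runs, fielding_runs,
                        positional_runs, replacement_runs,
                        league_runs=0.0, runs_per_win=10.0):
    """Sum the run components, then convert runs to wins.

    positional_runs is positive for hard positions (e.g. CF) and negative
    for easy ones (e.g. 1B). replacement_runs is the credit for playing
    time over a replacement-level player, so it grows with games played.
    """
    if runs_per_win <= 0:
        raise ValueError("runs_per_win must be positive")
    total_runs = (batting_runs + baserunning_runs + fielding_runs
                  + positional_runs + league_runs + replacement_runs)
    return total_runs / runs_per_win


# Hypothetical season: strong bat, average glove, full playing time.
print(position_player_war(batting_runs=30, baserunning_runs=2,
                          fielding_runs=5, positional_runs=-5,
                          replacement_runs=20))
```

Because the replacement credit depends on playing time, the same per-game performance over only 20 games produces a much smaller total, which is why WAR behaves like a counting stat.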

For pitchers, it is much more complicated, so it’s best to outline the two different WAR stats that are most commonly referenced. First, there’s Fangraphs WAR, commonly referred to as fWAR. fWAR uses Fielding Independent Pitching (FIP) in its calculation, instead of ERA. Recall that FIP is generally regarded as a more predictive stat than ERA, so fWAR could be better used as a tool to project future pitching performance. Conversely, Baseball Reference builds its bWAR stat from the runs a pitcher actually allowed, a results-based starting point in the same spirit as ERA. These results are based on what has actually happened, and could be influenced by team defense among other external effects. Those effects vary by game and are out of the pitcher’s control, so bWAR should be seen as more of a ‘what happened in the past?’ stat, rather than a ‘what should I expect in the future?’ stat.

Conclusion

I hope that this article has given you an introduction to some tools to enhance your viewing of baseball. These tools were selected as stats that may challenge how the game is traditionally viewed. Players are often over/undervalued by fans since traditional metrics such as batting average will never paint the full picture of their contributions. Hopefully the concepts learned today will allow you to form more complete opinions on players and teams while enjoying the games.

Keep up to date with the Queen's Sports Analytics Organization. Like us on Facebook. Follow us on Twitter. For any questions or if you want to get in contact with us, email qsao@clubs.queensu.ca, or send us a message on Facebook.