2018-19 Season in Review: Montreal Canadiens & Ottawa Senators by Adam Sigesmund

By: Constantine Maragos

This year, the Eastern Conference was stronger than ever, with powerhouses such as the Tampa Bay Lightning leading the league throughout the regular season, along with playoff mainstays in the Washington Capitals and Pittsburgh Penguins. The Montreal Canadiens and Ottawa Senators met the same fate at the end of this year, each missing the post-season for a second straight year. Although the two franchises followed very different storylines throughout the year, both have promising futures going forward.


Montreal Canadiens


The Montreal Canadiens catapulted themselves back into the playoff conversation this season and finished off a respectable 2018-19 campaign with 44 wins and 96 points, placing them 4th in the Atlantic and 14th in the league. Despite falling two points short of the playoffs, the Montreal Canadiens showed signs of true growth, while also mending some of their biggest issues over the past few seasons. The Canadiens’ improvement this season can be attributed to a combination of the return to form of veterans such as Carey Price & Shea Weber, an injection of youth into the lineup, and a number of shrewd acquisitions by GM Marc Bergevin.

The key acquisition that Montreal made this summer was swapping 2012 3rd overall pick Alex Galchenyuk for Arizona’s Max Domi. After the team had struggled to find an effective top-six centre, Domi fit into that role well this season. Despite a rough start to his Canadiens tenure, in which he received a preseason suspension for his sucker-punch on Florida’s Aaron Ekblad, Domi has settled in nicely with the Habs. His 28 goals and 72 points place him first in team scoring. In addition, his 0.88 all-situation xGoals/60 is good for fourth among Canadiens forwards. His hard-hitting, pesty play style has led to an average of 3.14 PIM/60, while he has also drawn 3.25 penalties/60, leading Canadiens skaters in both categories. Domi’s enigmatic playstyle has drawn mixed reviews; however, if he can make this season’s 28-goal, 72-point performance his standard going forward, Habs fans will not be complaining too much in the future.

The return of captain Shea Weber gave the Canadiens an injection of veteran leadership and much-needed skill on the back end after the defenseman was sorely missed last year. Weber made his return known across the league after scoring 2 goals and 3 points in his first two games and has stabilized the Habs’ back end since. His 14 goals and 33 points this season are good for second in goals & points among Habs defensemen behind only Jeff Petry. Carey Price experienced a return to form this year, going 35-24-6 in 64 starts while posting a 2.49 GAA along with a 0.918 SV%, good for 12th in each category (among goalies with >25 GP). Price also finished 3rd in Quality starts with 37, as well as placing 8th in Goals Saved Above Average with 14.94. Additionally, Tomas Tatar was a pleasant surprise for the Canadiens. Having been shipped to Montreal in the summer in a package for former captain Max Pacioretty, Tatar carried with him the baggage of an underperforming contract. Tatar took his fresh start opportunity and ran with it, scoring 25 goals and 58 points on the season. Further, he finished the season with 0.84 xGoals/60 at 5v5, good for 5th amongst forwards. As a staple in the top six, Tomas Tatar has found himself a home in Montreal.

The Achilles heel in the Canadiens lineup was the fourth line. The rotating carousel of Michael Chaput, Kenny Agostino, Nicolas Deslauriers, Charles Hudon, and Matthew Peca struggled immensely throughout the year, and the group’s possession metrics were abysmal. In 5v5 relative Corsi, Deslauriers, Hudon, Chaput, and Peca round out the bottom for forwards, all sitting below -3.8%. To put things into perspective, the Ottawa Senators had only two forwards (with >20 games played) with a sub -4% relative Corsi. Additionally, apart from Kenny Agostino (51.6%), each player finished sub-50% in On-Ice xGoals/60. Only two other players on the roster finished in this category. With these struggles in mind, Marc Bergevin made two moves in an attempt to fix the problem. The Canadiens acquired Nate Thompson & a 2019 5th round pick from LA for a 2019 4th round pick, as well as Dale Weise & Christian Folin from Philadelphia for AHL forward Byron Froese and depth defenseman David Schlemko (both trades occurring in February). Along with these acquisitions, the Canadiens waived Michael Chaput and Kenny Agostino, the latter of whom was claimed by the New Jersey Devils. Nate Thompson did not make much of an impact initially, posting advanced statistics comparable to those of the players listed above, but he did play well enough down the stretch to earn himself a one-year/$1 million extension. Dale Weise, meanwhile, has jumped between AHL Laval and the Canadiens’ press box since his acquisition. At the Trade Deadline, however, Bergevin flipped the recently waived Chaput to the Arizona Coyotes in exchange for winger Jordan Weal. Weal fit nicely into the Habs’ bottom six after shuffling around the league this year. Since joining the team, Weal has scored 4 goals and 10 points in a fourth line role. Additionally, he logged 1.4 even strength points/60, a higher total than each of the above-listed names. Weal also allowed the fewest shot attempts against/60 on the roster at 50.62. His performance down the stretch earned him a two-year/$2.8 million extension. Despite the small sample size, the Habs look to have found a few upgrades in the bottom six going into next season.

Looking forward, the Canadiens’ future looks bright. Centre Ryan Poehling looks like he is going to be a staple in the middle of the Habs lineup for years to come. Poehling made quite the first impression in his first career game, scoring a hat trick and the shootout winner against the Toronto Maple Leafs, no less. Jesperi Kotkaniemi experienced ups and downs this year as a first-year player but also looks to be a future mainstay in the Habs' lineup. If Marc Bergevin is able to find a solution to the fourth line issues while the young core continues to produce, the Canadiens look set to enter the playoff conversation next year.


Ottawa Senators


For the past two years, the Ottawa Senators organization has been mired in controversy, questionable moves, and subpar play. After trading away estranged captain Erik Karlsson to the San Jose Sharks, the Senators were left without an identity going into the 2018-19 season. The Senators finished off a dismal campaign with 29 wins and 64 points, leaving them in the basement of the NHL standings. Along with their last-place finish, the Senators will be without their first-round selection, now the fourth overall pick in this year's draft. Despite finishing as the worst team in the NHL, the Senators are still able to take some positives from this year.

If the Senators were to take anything away from this season, it would have to be the emergence of Thomas Chabot. Chabot’s breakout 2018-19 campaign finished with 14 goals and 55 points, placing him 10th in NHL scoring among defensemen despite missing 12 games due to two separate injuries. In addition, Chabot was selected as the Senators’ All-Star participant in San Jose. This year Chabot displayed the potential that he possesses and looks poised to lead the Senators into their franchise’s next chapter. Chabot was an all-situations player for the Senators, logging around 20:45 even strength minutes per game, ranking 4th in the NHL behind only Seth Jones, Ryan Suter, and Drew Doughty. For a 22-year-old defenseman, being mentioned among such prominent names is a great sign of things to come.

As the Senators began to fall out of playoff contention, the chatter of moving pending free agents Matt Duchene, Ryan Dzingel, and Mark Stone ramped up tremendously. Against the expectations of many, GM Pierre Dorion was able to get solid value for the three veteran forwards. In a flurry of trades with the Columbus Blue Jackets, the Senators traded (in total) Matt Duchene, Ryan Dzingel, and Julius Bergman for Anthony Duclair, Vitaly Abramov, Jonathan Davidsson, a 2019 1st round pick, a 2020 conditional 1st (contingent on Duchene re-signing), a 2020 2nd round pick, and a 2021 2nd round pick. Given the position Dorion was in, he appears to have secured a solid return. With Columbus’ pending free agent situation taken into consideration, the draft picks Dorion received may rise significantly in value. In addition, Anthony Duclair has been a pleasant surprise for the Senators. Considered a "throw-in" in the Dzingel trade, Duclair has scored 8 goals and 14 points in 21 games with the Sens, more than both Dzingel and Duchene in those categories. While Duclair’s xGoals/60 has stayed relatively constant at ~0.8 since leaving Columbus, his On-Ice xGoals/60 rose from 2.4 to 9.99, which is again a big jump from the pair sent the other way. After jumping around the league for the past few years, Duclair may have found some consistency and confidence on the Sens roster.

The Senators also traded star winger Mark Stone in a blockbuster to the Vegas Golden Knights for blue-chip prospect Erik Brannstrom, a 2020 2nd round pick, and forward Oscar Lindberg. The 15th overall selection in the 2017 Entry Draft is developing into a superb offensive defenseman. As an AHL rookie, Brannstrom scored 7 goals and 32 points in 50 games. Brannstrom was also named an all-star at the 2019 World Junior Championship, scoring 4 goals in 5 games at the tournament. Although Mark Stone is a tremendous talent, and one of the best defensive forwards in hockey, Ottawa Senators fans should be excited for what’s to come with the likes of Erik Brannstrom and Thomas Chabot leading the backline.

Looking forward, the Senators have a lot of talent up front in chippy forward Brady Tkachuk and young centre Colin White, among others. Additionally, with Thomas Chabot continuing to grow into a superstar, and Erik Brannstrom bringing his solid offensive game to the NHL level, the Senators are well prepared as they move forward in their rebuild. Although it remains to be seen whether their decision to pick forward Brady Tkachuk over whoever is available with Colorado's fourth-overall pick this year will pay off, the Senators have been able to recoup draft picks to help make up the difference. The Senators will still have a full seven draft picks this season, including four in the first three rounds. Furthermore, the Sens have 12 draft picks in 2020, including 7 in the first three rounds, and 8 draft picks in 2021. Despite the external uncertainties surrounding the Senators, the organization needs to move on. To put the circus of the past two years behind it, the front office needs to concentrate on the future and draft smart. In doing so, the Senators may be able to regain the trust of the fanbase.

Special thanks to Owen Kewell for this article.

Keep up to date with the Queen's Sports Analytics Organization. Like us on Facebook. Follow us on Twitter. For any questions or if you want to get in contact with us, email qsao@clubs.queensu.ca, or send us a message on Facebook.

Major League Baseball Capacity Rates: Analyzing Fan Attendance Using Panel Data by Anthony Turgelis

By: Cody Smith and Patrick Mills

A note from the article’s editor (Anthony Turgelis): This article will read somewhat differently from the rest of our content, as this was a detailed final project for a 4th year Applied Econometrics course at Queen’s. I hope you enjoy the format, as it is more typical for analytics research projects.

Introduction

Major League Baseball has been recognized as a staple of American culture since its inception in 1869. Over generations, the League has transformed from humble beginnings into an organization with 30 franchises across North America, each playing 162 games per season. This period has seen franchises experience different levels of success when attempting to attract fans to games, and this variation has drawn the attention of many economists looking for an answer as to why. Ahn and Lee (2014) found that from 1904-1957, fans were captivated by the teams with the best winning records. This period was followed by a change in preference which Ahn and Lee identified as a desire for competitive games (1958-2012). Now, in the modern era, sports fans have a variety of professional sports to watch, as well as the ability to follow and support teams from a distance thanks to increased media coverage and reduced costs of transportation. Similar to how companies in retail rely on branding to differentiate themselves and attract customers, it is likely that branding will become increasingly important for MLB franchises as they compete with a growing number of substitute franchises across sports to fill seats at games. This paper will examine the effect of Brand Equity Value (BEV) on capacity rates across MLB franchises. Our computation of BEV will be explained in our description of data. If the relation between BEV and capacity rates is found to be statistically significant, it could have major implications, as it would provide compelling evidence of a new shift in fan preferences. Franchises would have to choose how to market themselves to stimulate interest among local and distant baseball fan bases alike. Evidence for a shift in fan preferences would also provide considerations for the MLB when creating policy to promote and protect the league’s growth and popularity.

The remainder of this paper will be structured as follows: The second section will provide a commentary on existing literature discussing factors of MLB attendance. The third section will describe the sources of the data used in this paper and provide further explanation of BEV. The fourth section will explain the empirical model used in our regression analysis, as well as why it was chosen. The fifth section will present our regression tables as well as a commentary on the results. Finally, the sixth section will conclude with a summary of this paper’s findings as well as the importance and implications of these results.

Literature Review

Economists have published a collection of articles speculating over the key determinants of attendance for Major League Baseball franchises. Nesbit and King (2012) found that fans who play fantasy baseball are more likely to attend games, while Mittelhammer, Fort, et al. (2007) found that increasing proximity between teams had an adverse effect on attendance, in accordance with Hotelling’s model that consumers buy goods from the closest supplier. Ahn and Lee (2014) concluded that in the earlier years of baseball (1904-1957) fans had been drawn to teams with winning records, with better records producing higher turnout, before a new era of baseball (1958-2012) saw fans drawn to games based on uncertainty of outcome, size and quality of the stadium, and the playing styles of the teams. Throughout the literature, however, one overarching determinant kept recurring: the maintenance of a competitive balance in the league. Berri and Schmidt (2001) investigated this matter and concluded that as the league became more competitive, attendance could be expected to increase. Lemke, Leonard, et al. (2010), who attempted to establish the effect of promotions and giveaways on attendance in small and large markets, found competitive balance to be a significant factor of attendance. Ahn and Lee (2014) reached the same conclusion. Economists approaching the subject seem to agree that competitive balance is essential to the interest of fans as well as the financial health of the league. Lee (2016) showed that fans consider characteristics of home and away teams when making attendance decisions. In the same report, Lee also hypothesized that modern era baseball fans had fewer incentives to cheer for the home team, citing development of media and greater access to information, mobility in residence, and reduced transportation costs as factors that would allow fans to pick and choose a team they wanted to support, rather than teams in close proximity. While this is still consistent with competitive balance, if true, previous methods of incentivizing local fans to attend games would prove to be less efficient.

Description of Data

The primary data used in our regression model was obtained from ESPN, Sports Reference, Statistics Canada, Statista and the United States Census Bureau. Supplementary information from Boston Globe Media Partners examined MLB ticket prices, and this paper took data presented by Forbes Media to establish franchise valuations. Data from ESPN and Sports Reference provided information on stadium capacity rates, strength of schedule, win rate over .500, estimated payroll, home runs per game, and pace of games. The United States Census Bureau and Statistics Canada data pools were used to establish the demographic parameters of population and median household income. Data from Statista was referenced to establish ticket prices. We also took into consideration the addition of professional sports teams to cities with MLB franchises by including a dummy variable. If a city gained a franchise in a big four league (NHL, NFL, NBA, MLB), the existing MLB franchise was given a value of 1, while MLB teams in cities that did not gain a professional team were given a value of 0. Those in cities that lost a franchise were given a value of -1. We also assigned a dummy variable to account for MLB franchises that moved into a new stadium over the period of examination. Teams that moved were assigned a value of 1 while teams that did not were assigned a value of 0. It is important to note that this is a potential source of error in our paper, as teams that moved from a larger stadium to a smaller stadium will have experienced an increase in stadium capacity rate without a real increase in game attendance. Data from these sources was used to collect relevant information for all 30 MLB teams in the years 2008 and 2018, for a total of 60 observations.
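The stadium caveat above can be made concrete with a small sketch. It assumes capacity rate is computed as average attendance divided by stadium capacity; the function name and the attendance and capacity figures are hypothetical and are not taken from the paper's data.

```python
def capacity_rate(avg_attendance: float, stadium_capacity: float) -> float:
    """Share of seats filled: average attendance divided by stadium capacity."""
    if stadium_capacity <= 0:
        raise ValueError("stadium_capacity must be positive")
    return avg_attendance / stadium_capacity


# Hypothetical team that draws the same crowd before and after moving
# from a 50,000-seat park to a 40,000-seat park.
old_park = capacity_rate(30_000, 50_000)
new_park = capacity_rate(30_000, 40_000)

print(f"old park: {old_park:.2f}, new park: {new_park:.2f}")
```

Attendance is identical in both cases, yet the capacity rate rises from 0.60 to 0.75, which is the measurement issue the new-stadium dummy is meant to flag.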

Our primary measure deals with the evaluation of brand equity and is represented by the variable BEV. To quantify this qualitative concept, we standardized four separate variables and used the average of the standardized values to compute each franchise’s BEV statistic. As we only took data from two separate years, it was only possible to determine the effect of BEV in 2018. The four variables used for our BEV statistic are as follows:

 

1.      Market share – Presence in the market (2018 season)

a.       % of market share per team

b.      Individual team valuation/Sum of MLB team total valuation

2.      Transaction value – Price offered for service (2018 season)

a.       Average ticket prices/team

3.      Success generation – Team performance change (2008-2018)

a.       (Win % 2018 season/Win % 2008 season)-1

4.      Growth rate – Team valuation change (2008-2018)

a.       (Team valuation 2018 season/Team valuation 2008 season)-1
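The four components listed above can be combined as in the following minimal sketch. It assumes "standardized" means a z-score computed across teams with the population standard deviation, which the paper does not specify; the field names and the three sample teams are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw values to z-scores (mean 0, standard deviation 1)."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]


def brand_equity_values(teams):
    """Return one BEV per team: the average of four standardized components."""
    total_valuation = sum(t["valuation_2018"] for t in teams)
    components = [
        # 1. Market share: team valuation / sum of all team valuations
        [t["valuation_2018"] / total_valuation for t in teams],
        # 2. Transaction value: average ticket price
        [t["avg_ticket_price_2018"] for t in teams],
        # 3. Success generation: (win % 2018 / win % 2008) - 1
        [t["win_pct_2018"] / t["win_pct_2008"] - 1 for t in teams],
        # 4. Growth rate: (valuation 2018 / valuation 2008) - 1
        [t["valuation_2018"] / t["valuation_2008"] - 1 for t in teams],
    ]
    z_columns = [standardize(column) for column in components]
    return [mean(column[i] for column in z_columns) for i in range(len(teams))]


# Hypothetical inputs (valuations in $ millions), for illustration only.
sample_teams = [
    {"valuation_2018": 4000, "valuation_2008": 1300,
     "avg_ticket_price_2018": 50, "win_pct_2018": 0.600, "win_pct_2008": 0.500},
    {"valuation_2018": 1500, "valuation_2008": 700,
     "avg_ticket_price_2018": 30, "win_pct_2018": 0.500, "win_pct_2008": 0.500},
    {"valuation_2018": 1000, "valuation_2008": 600,
     "avg_ticket_price_2018": 20, "win_pct_2018": 0.400, "win_pct_2008": 0.500},
]

bev = brand_equity_values(sample_teams)
for team_index, value in enumerate(bev):
    print(f"team {team_index}: BEV = {value:+.3f}")
```

Because each component is standardized before averaging, a variable measured in dollars (ticket price) does not dominate one measured as a ratio (growth rate), and the resulting BEV values are centred on zero across the league.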

Table 1. Definition of Variables

cs1.png

Figure 2. Summary Statistics

cs2.jpg

Empirical Model

To estimate the determinants of capacity rate, we have chosen to use a panel data regression. To decide between a fixed effects and a random effects model, a Hausman test was conducted. The test returned Prob>Chi2 greater than 0.05, so we could not reject the null hypothesis that the random effects estimator is consistent, and we chose to use a random effects GLS regression. This method will allow us to compare common factors of short-run demand to determine capacity rates in our given seasons, 2008 and 2018.
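The decision rule can be illustrated with a simplified sketch. It assumes a single coefficient, for which the Hausman statistic reduces to the squared difference between the fixed and random effects estimates divided by the difference in their variances, compared against a chi-squared distribution with one degree of freedom. The paper's actual test covers all regressors jointly, and the coefficient and variance values below are hypothetical.

```python
import math


def hausman_one_coef(b_fe, var_fe, b_re, var_re):
    """Hausman statistic and p-value for a single coefficient (1 degree of freedom)."""
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("var_fe must exceed var_re for the statistic to be defined")
    h = (b_fe - b_re) ** 2 / diff_var
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(h / 2))
    return h, p_value


def choose_model(p_value, alpha=0.05):
    """Keep random effects unless the test rejects at level alpha."""
    return "random effects" if p_value > alpha else "fixed effects"


# Hypothetical estimates for one regressor, for illustration only.
h_stat, p_value = hausman_one_coef(b_fe=0.42, var_fe=0.010, b_re=0.40, var_re=0.006)
print(f"H = {h_stat:.3f}, p = {p_value:.3f}, choice: {choose_model(p_value)}")
```

A p-value above 0.05 means the two sets of estimates do not differ systematically, so the more efficient random effects estimator is retained.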

Equation (1) presents our basic empirical model of MLB game attendance:

cs3.PNG

Table 1, included in our description of data, provides an explanation for each variable included in our empirical model. Using a random effects model in this instance is useful as the variation across franchises in our model is assumed to be random and uncorrelated with the predictor or independent variables included. Random effects assume that the error term is not correlated with the predictors, and under this assumption a random effects model will produce unbiased estimates of the coefficients, use all the data available, and produce the smallest standard errors. After running our random effects GLS regression, we ran a simple OLS regression to determine how much of the variation in capacity rates could be explained by our Brand Equity Value variable, BEV, which will serve as our primary parameter of interest.

Results and Discussion

After running our random effects GLS regression, we found three variables to be statistically significant. They were: estimated payroll for the season, home runs per game and franchise value. Our R-squared was 0.6407, indicating that the model explains roughly 64% of the variation in capacity percentage for MLB teams.

cs4.jpg

After running this regression, we tested our BEV statistic against the 2018 results to determine how much of the effect could be attributed to Brand Equity Value.

cs5.jpg

We found BEV to be statistically significant, with an R-squared value of 0.5196, meaning that 52% of the variation in capacity percentage across MLB franchises can be accounted for by our constructed BEV variable. To the best of our knowledge, this paper is the first economic evaluation of the effects of Brand Equity on capacity percentage, which is a direct measure of fan demand for a franchise. Our findings from our random effects GLS regression were consistent with existing literature, as estimated payroll, home runs per game and franchise value have consistently been significant indicators of attendance.
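For readers less familiar with how an R-squared from a one-variable regression is read, the following minimal sketch fits a simple OLS line and reports the share of variation explained. The BEV and capacity values are hypothetical and do not reproduce the paper's data or its 0.5196 result.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares and return (intercept, slope, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R-squared: 1 minus residual variation over total variation in y.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared


# Hypothetical BEV scores and capacity rates for five teams.
bev_scores = [-1.0, -0.5, 0.0, 0.5, 1.0]
capacity_rates = [0.55, 0.70, 0.62, 0.80, 0.78]

intercept, slope, r_squared = simple_ols(bev_scores, capacity_rates)
print(f"capacity = {intercept:.3f} + {slope:.3f} * BEV, R-squared = {r_squared:.3f}")
```

An R-squared of 0.52 in this setup would mean that a straight line in BEV accounts for about half of the spread in capacity rates across teams, with the remainder left to other factors.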

Conclusion

The purpose of this paper was to establish an understanding of the impact of different variables on the capacity rate of MLB franchises. We took a collection of data from the 2008 and 2018 MLB seasons, as well as corresponding demographic data, and ran a random effects GLS regression, followed by a linear regression to determine how much of the variation in capacity rates could be explained by our variable of interest, BEV. We found three variables in our random effects model to be statistically significant: estimated payroll, home runs per game, and franchise value. Our findings in this regression are consistent with the literature. The results of our linear regression supported our hypothesis that Brand Equity Value plays a statistically significant role in determining capacity rates for franchises across the MLB. These findings are important as they indicate that fan preferences may be changing again, which is an observation that has been made in the literature over different periods. A change in fan preferences will have implications relating to economic and policy decisions as franchises attempt to stimulate fan interest by providing different amenities and incentives to differentiate themselves from the competition. The success of these efforts has implications for the municipalities and regions that generate tax revenue from the operations of these franchises, and could impact future decisions regarding expansion and relocation of MLB franchises.

NHL Western Canada 2018-19 Season Recap by Scott Schiffner

By: Constantine Maragos

With the conclusion of the 2018-19 NHL season, the Edmonton Oilers and Vancouver Canucks are facing the year-end media earlier than they’d like. Both teams have young superstars taking the league by storm; however, glaring issues on each team hold them back from taking the next step. In this article, we’ll look into Edmonton’s and Vancouver’s seasons to see where they excelled, and where they need to improve.


Edmonton Oilers (35-38-9), 7th in Pacific Division


The Edmonton Oilers have found themselves in another lacklustre season since their second round, seven-game series against the Anaheim Ducks two years ago. This year has been especially disappointing for the team. The Oilers finished with 35 wins and 79 points, good for 7th in a relatively weak Pacific Division, and 25th overall in the NHL. There is a multitude of factors to blame for this season’s finish, including asset mismanagement and an underperforming roster.

Former Oilers General Manager Peter Chiarelli made a number of questionable moves in his tenure with the team that we do not need to harp on more than others already have. However, within the scope of this season, Chiarelli made certain moves which left the Oilers in the same position, or worse. To start, roughly one month into the season, the Oilers swapped Ryan Strome for New York Rangers forward Ryan Spooner. After Strome scored only 2 points in his first 18 games, it made sense at the time to try and shake things up with the swap. However, in doing so the Oilers acquired a player whose struggles went beyond the offensive end. Spooner’s subpar defensive ability has been well documented and has led to his movement around the league. On the other hand, with decreased defensive usage with the Rangers, Strome has been able to score at a significantly higher level. Spooner did not manage to score any more than he did in New York. Shortly after Chiarelli’s release, the new Oilers regime sent Spooner to the Canucks for Sam Gagner, who had been itching for NHL time after being stuck with the AHL Marlies all season. Gagner has been a step up from what Spooner and Strome were for the Oilers, recording 10 points in 25 games for the team.

The other two moves Chiarelli made this season involved shipping off forward Drake Caggiula, Jason Garrison, & a 2019 3rd round pick in a pair of trades to acquire defensemen Brandon Manning from the Blackhawks and Alex Petrovic (0-1-1, -7, 9GP) from the Panthers (both trades happened on December 30th). Again, these trades seem to have been poor asset management. Brandon Manning played a total of 12 games for the Oilers without recording a point before he was assigned to AHL Bakersfield. Alex Petrovic did not fare well either, playing only nine games, although he did miss time due to injury. As an impending UFA, Petrovic did not make much of a case to be re-signed. Through his nine games with the team, Petrovic recorded only 1 assist and registered a -7 plus/minus. Petrovic also posted a -10.1% relative xGoals and an On-Ice Shot Attempts Against/60 of 62.07. He served as a healthy scratch from February 16th until the end of the season. In turn, Drake Caggiula (5-7-12, +3, 26GP) stepped up his play with the Blackhawks. Although Caggiula has not been anything near a revelation, the primary criticism of these moves is the poor asset management and desperation shown by management. Of course, such moves are not the primary factor in the Oilers’ demise, but they can be looked at as a sample of the countless missteps that have occurred over the years. The incoming management regime is left with a slim talent pool and many needs to address this offseason.

Apart from the number of media storylines that shadow the organization, Leon Draisaitl enjoyed a career year. There was no shortage of critics regarding the huge contract he signed in 2017. Draisaitl’s 8-year/$68 million contract accounted for 11.33% of the Oilers’ cap at the time (and caused plenty of headaches among Leafs fans earlier this year); however, he has certainly begun to live up to that number. Draisaitl finished the year with 50 goals and 105 points, placing him 1 goal behind Alex Ovechkin for the Rocket Richard Trophy and 4th overall in scoring. Draisaitl was also tasked with a heavy workload throughout the season, averaging 22:35 in ice time, second only to teammate Connor McDavid among forwards. This statistic alone shows the importance of these two players to the Oilers roster.

The most intriguing storyline this season was the persistent speculation surrounding 20-year-old forward Jesse Puljujarvi. Puljujarvi never solidified his place in the Oilers lineup. To summarize Puljujarvi’s struggles this season, agent Markus Lehto has stated that it “may be beneficial [for Puljujarvi] to go somewhere else.” With that being said, assessing Puljujarvi’s current value is tough, as he is only 20, and a fresh start is what he may need. With only 4 goals and 9 points through 46 games, Puljujarvi’s season ended on February 15th due to a hip injury, for which he underwent surgery on March 4th. With such low production and inconsistent play, Puljujarvi has not shown that he is capable of becoming a full-time NHL player. But as he is so young, the Oilers need to assess whether it is worth keeping someone who has shown flashes of skill, or whether to cut their losses and move forward with other assets. It is also worth mentioning that Edmonton has set a high price for Puljujarvi at a 1st, a prospect, and another asset (per Darren Dreger). It remains to be seen how the situation will play out, but the most sensible trade scenario, given Puljujarvi’s play and trending value, would be to try and swap him for another young player in a similar situation.

Looking forward, the Oilers have many needs to address, whether it be finding depth on the wing or stability on defense. The only consistency within the organization lies in their two stars, Connor McDavid and Leon Draisaitl, who seem to score no matter the circumstance. If the Oilers want to compete soon, they will need a number of smart acquisitions and astute player development to improve their fortunes next year.


Vancouver Canucks (35-36-11), 5th in Pacific Division


Despite playing at around .500 for the entire season, the Canucks found themselves playing meaningful games until near the end of the season. The Canucks finished with 35 wins and 81 points, placing them 5th in the Pacific Division and 23rd in the NHL. However, without the outstanding second half of Jacob Markstrom, the emergence of young superstar Elias Pettersson, and the reliable defensive play of centre Bo Horvat, the Canucks would be in a much lower place in the standings.

Through 60 starts this season, Jacob Markstrom posted a 28-23-9 record. The Canucks starter also contributed 34 Quality starts on the season, as well as an 11.1-point share, tied for 7th in the league. However, his performance in the second half of the season was a catalyst in keeping the Canucks near the playoff conversation this season. Since December 7th, Markstrom posted 19 wins at a 0.921 SV%, paired with his one shutout for the season. Night in and night out, Markstrom provided the Canucks with an opportunity to win. One of his most notable performances of the season was his 44-save outing in February against the fully-loaded Calgary Flames. The Flames outshot the Canucks 34-13 in the second and third period, but Markstrom held his ground throughout and led the team to a shootout victory. The progress Markstrom has made this year gives the Canucks needed stability in net throughout the rest of their rebuild, as rookie netminder Thatcher Demko still needs time to develop into a full-time pro, and prospect Mike DiPietro is still multiple years away from the Canucks crease. Going into next season, if Markstrom is able to build off his play this season, the Canucks could be playing meaningful games in spring quicker than we’d expect.

Another popular storyline is Elias Pettersson’s record-breaking season. The Swedish rookie hit the ground running at the beginning of the 2018-19 campaign, scoring in his debut, and subsequently scoring 10 points in 10 games through the month of October. Pettersson finished the season with 31 goals and 66 points, leading not only the Canucks but the entire rookie class in scoring by a considerable margin. Pettersson set a multitude of records through his impressive feats this season, most notably setting the record for most points by a Canucks rookie. In addition, the Pettersson effect was alive and well throughout the season. Pettersson ranked third on the team in relative Corsi at 4.3%, behind only linemates Brock Boeser and Josh Leivo. Despite complaints regarding his size (or lack thereof), Pettersson’s ability this year to protect and handle the puck in such a skilled manner has reinvigorated the Canucks offense. The insertion of Pettersson into the Canucks lineup has allowed for a seamless transition into a new identity for such a young core and gives great hope for the future. The next step in Pettersson’s career is his forthcoming Calder Trophy win. However, a slow finish to the season and an explosive entrance into the league by 25-year-old rookie Jordan Binnington have created a bit of a conversation. Nonetheless, expect Pettersson to be named rookie of the year, and to improve next year on his already spectacular game.

As injuries seem to plague the Canucks year after year, head coach Travis Green leaned on centre Bo Horvat tremendously throughout various stretches of the season. Injuries to bottom-six centres Brandon Sutter and Jay Beagle at the beginning of the year led to a tremendous defensive load for Horvat to manage. Tasked with shutting down the top opposition night in and night out, while also looked at to still produce, Bo Horvat showed how important he is to the Canucks roster. Horvat lined up for 2018 faceoffs on the year which is more than any other player in the NHL. Additionally, 38.6% (779) of those draws came in the defensive zone. Horvat finished the year with a 53.7% success rate on draws. Along with the heavy defensive burden, Horvat had a career year offensively, recording career highs in goals, assists, and points (27-34-61, 82GP). As Horvat continues to develop as a 200-foot player, he has still yet to reach his full potential.

The biggest weakness that the Canucks have is their struggle on defense. Throughout the season, it was apparent that veteran Alex Edler was the backbone of their defense corps and was relied on more than ever to perform. Edler logged an average TOI of 24:34, ranking him 10th in the NHL. Edler also missed extensive time due to injury, playing only 56 games on the year. The damage was tough to mitigate for the Canucks defense, as veteran Chris Tanev also missed a comparable number of games this year (27). While this is not an unknown phenomenon, especially for Tanev, the absence of the veteran duo was felt more than ever. Despite the number of games lost to injury, Edler was still able to produce impressive numbers, posting 10 goals and 34 points on the year. This was Edler’s 3rd time scoring 10 goals or more in a season, and the first time since 2011-12. As an impending free agent, and well-documented for his admiration of the city he’s called home his entire career, expect Edler to be back next season. As for the rest of the Canucks defense, there is still work to be done. Although defensemen Ben Hutton and Troy Stecher made good progress this year, there were countless occasions where they were anchored by defense partners such as Erik Gudbranson and Derrick Pouliot. Before being traded, defenseman Erik Gudbranson was one of the worst possession defenders in the NHL. At the time of his trade, Gudbranson led the Canucks in shot attempts against per 60 with 66.64, while also sitting at the bottom of the league in plus-minus at -27. In addition, defenseman Derrick Pouliot looked lost at times when out on the ice. The lack of physicality from both defensemen led to long, drawn-out shifts in the defensive zone and countless turnovers on the breakout. Luckily, with the arrival of Quinn Hughes and potentially new personnel in the offseason, ice time on the back end will not be taken for granted, and play will improve.

Looking forward, the Canucks have a strong young core already in the NHL and will add another top prospect with the #10 pick at the NHL Draft in Vancouver. Also look for Bo Horvat to take another step, where presumably he will be named team captain, a role that he’s been groomed for ever since he came into the league. In addition, the developing chemistry between Brock Boeser and Elias Pettersson will only become stronger, and they have the potential to become one of the top duos in the NHL. Quinn Hughes’ NHL audition impressed many and should certainly excite fans going into his rookie season. His confidence with the puck on his stick will only increase as he acquires more NHL experience. Also, if the likes of Jake Virtanen, Josh Leivo, and Troy Stecher are able to build on their 2018-19 campaigns, they will develop into solid role players for the foreseeable future. With that being said, there is still a ways to go for the Vancouver Canucks, but there is a bright future ahead of them.

How Important is Thanksgiving in Relation to Making the Playoffs? by Alex Craig

By: Ryan Reid

How early is too early when it comes to getting excited about a player’s or team’s success early on in the season? While looking at Mikko Rantanen’s pace through 20 games and assuming he will score 130 points seems a bit ridiculous now (he is currently on pace for just over 100), the fact is that a 20-game sample size for teams as a whole is often very predictive of whether or not they will ultimately make the playoffs. In fact, over the past 5 seasons, 77.5% of teams that found themselves in a playoff position at American Thanksgiving went on to make the playoffs.

Screen Shot 2019-03-05 at 4.27.36 PM.png

Given how predictive holding a playoff spot at Thanksgiving is, I believed that bringing in other statistics collected at American Thanksgiving each year would provide an even greater ability to predict which teams are playoff teams.

With the help of machine learning, I hoped to be able to create a model to out-predict the strategy of picking current playoff teams.

Process Used

In creating a machine learning model, I wanted to be able to classify whether a team could be best classified as a playoff team or not, given a variety of statistics collected on Thanksgiving. To do so, I used Logistic Regression within machine learning in order to classify and group variables as binary, 1 being a playoff team, and 0 being a non-playoff team. Through examining the past 11 years of team data from Thanksgiving (minus the lockout shortened season for obvious reasons) and classifying each team, I hoped to train my model to be able to accurately classify playoff teams.
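As a rough illustration of the binary classification idea described above, here is a minimal logistic regression fitted by gradient descent using only the Python standard library. It is not the sklearn model from this article: the single feature, the point percentages, and the playoff labels are all made up for the example, and the names `train_logistic` and `predict_playoffs` are hypothetical.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=1.0, epochs=5000):
    """Fit p(playoffs) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        preds = [sigmoid(w * x + b) for x in xs]
        grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(p - y for p, y in zip(preds, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical Thanksgiving point percentages and outcomes (1 = made the playoffs).
point_pct = [0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35]
made_playoffs = [1, 1, 1, 0, 1, 0, 0, 0]

# Centre and rescale the feature so gradient descent behaves well.
mean_pct = sum(point_pct) / len(point_pct)
xs = [(p - mean_pct) * 10 for p in point_pct]

w, b = train_logistic(xs, made_playoffs)

def predict_playoffs(pct):
    """Return 1 if the fitted probability is at least 0.5, else 0."""
    return int(sigmoid(w * (pct - mean_pct) * 10 + b) >= 0.5)

print(predict_playoffs(0.62), predict_playoffs(0.42))
```

A real model would use several features at once, which is where a library such as sklearn becomes far more convenient than hand-written gradient descent.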

Screen Shot 2019-03-05 at 4.33.56 PM.png

Within Python, I used numpy, pandas, pickle, and various features within sklearn, including the RFE (Recursive Feature Elimination) and Logistic Regression packages, to create the model. Pandas was used to import and read spreadsheets from Excel. Pickle was used to save my finalized model. Numpy was used in certain fit calculations. RFE was used to eliminate features and to assign coefficients reflecting the impact each criterion had on the decision of whether a team made the playoffs. Finally, Logistic Regression was used to assign a predicted shape to the model.

Criteria Valuation             

Starting off with all statistics I could collect for teams at Thanksgiving, I began to weed out less predictive variables until I landed on a group of 8. Using Recursive Feature Elimination (RFE), I was able to continually run the model and see which variables were deemed most predictive and should be included in the model. The factors as listed below were deemed most predictive, in order of importance to the model.

While point percentage is the most predictive, other statistics like shooting percentage, save percentage, or goals for percentage provide a bigger picture perspective that allows for a better predictive capability for the machine learning model.

The model determined that higher shots for, shooting percentage, and save percentage all have a negative effect on whether or not a team ends up making the playoffs. For shooting percentage and save percentage, this is likely due to the fact that the model has identified a PDO-like correlation in which teams with a lower save percentage and shooting percentage can be classified as “unlucky” and will eventually regress towards the norm. Additionally, the number of shots a team takes relative to the other team has a negative correlation with making the playoffs. This could be due to score effects that cause losing teams to typically generate more shots that are of lower quality. As the model shows, it is primarily high danger chances that are predictive of making the playoffs, not just any shot.

The Results

Screen Shot 2019-03-05 at 4.39.46 PM.png

Running the model, 81.25% or 13 out of 16 playoff teams in a playoff spot as of March 1st were correctly classified as playoff teams. Furthermore, an additional 2 teams (Columbus and Colorado) sat only 1 point back of a playoff spot. In contrast, picking the playoff teams at Thanksgiving would only result in a 68.75% success rate or 11 out of 16 teams. Furthermore, 3 teams that were in a playoff position at Thanksgiving are no longer in the playoff race in comparison to only 1 team (Buffalo) predicted by the model.

Outliers

Particularly interesting decisions made by the machine learning model include the decision to not pick the Rangers to make the playoffs, despite leading the Metro at Thanksgiving, and the choice to select Vegas to make the playoffs despite a slow start.

One reason behind this choice could have been New York’s low number of ROW (regulation and overtime wins). With a mere 8 ROW in 22 games, the New York Rangers sat atop the Metropolitan Division due mainly to their 4-0 record in shootouts. Seeing that the New York Rangers were playing so many close games, the model likely discounted the strength of the Rangers. Additionally, the New York Rangers had the 4th-lowest Corsi for %, 6th-lowest shots for %, and 9th-lowest scoring chance for %. As for points for %, the Rangers were ranked at an underwhelming 13th in the league, but led the Metro since the Metro was a weak division and the Rangers had more games played. Given the Rangers’ low valuation across all these supporting criteria, the machine predicted that they would not make the playoffs despite their stronger points for % at Thanksgiving.

As for the Golden Knights, despite holding the 29th-best point % in the league, Vegas was among the top 4 in the league in shots for %, Corsi for % and scoring chances for %. Additionally, Vegas had the league’s lowest PDO (SH% + SV%) at 95.66. Given all these things considered, the model likely believed it was only a matter of time before the Vegas Golden Knights began winning.
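For readers unfamiliar with PDO, the definition given above (SH% + SV%) can be written out as a short sketch. The function name `pdo` and the goal and shot totals below are hypothetical and are not any real team's numbers.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """PDO = team shooting % + team save %, both on a 0-100 scale."""
    shooting_pct = 100.0 * goals_for / shots_for
    save_pct = 100.0 * (shots_against - goals_against) / shots_against
    return shooting_pct + save_pct

# Hypothetical even-strength totals for a team running cold in both areas.
cold_team = pdo(goals_for=30, shots_for=500, goals_against=50, shots_against=480)

# A team whose shooting % and save % sum to exactly 100.
neutral_team = pdo(goals_for=8, shots_for=100, goals_against=8, shots_against=100)

print(round(cold_team, 2), round(neutral_team, 2))
```

Because every shot on goal is either a goal or a save, the league-wide values sum to 100, which is why a PDO well below 100 is often read as a sign of bad luck that should regress.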

Flaws in the Model

While my machine learning model appears to have the ability to out-predict the strategy of picking all playoff teams at Thanksgiving, two main limitations of the model as highlighted above are the inability of the machine to pick teams based on the given playoff format, and the lack of data at various game states.

Unaware of the NHL’s current playoff format, the model picked 9 Eastern Conference teams, and only 7 Western Conference teams. Without a grasp on the alignment of divisions within the league, the model is at a disadvantage when picking teams, particularly when specific divisions or conferences are more “stacked” than others. Therefore, there is the potential of the model picking an otherwise impossible selection of teams to make the playoffs.

Furthermore, data collected to be fed into the model was only even-strength data. While this provides a decent picture of a team’s capability, certain teams that rely on their power play, as the Penguins traditionally have, may be disadvantaged and discounted. Finding a way to incorporate this data into the model would likely provide a fuller picture and a more accurate prediction.

Final Thoughts

While the model I have created is by no means perfect, it provides a unique perspective into not only the importance of the first 20 or so games of the season, but also what statistics beyond wins are important in attempting to classify a playoff team. While the model appears to out-predict the strategy of selecting all playoff teams at Thanksgiving, it will be interesting to see in years to come if there is a continued ability to classify playoff teams given Thanksgiving stats.

***All statistics gathered from Natural Stat Trick



What Makes a Top 10 Pitcher? by Alex Craig

By: Josh Margles

In baseball statistics, an earned run average (ERA) is the mean of earned runs given up by a pitcher per nine innings pitched. I decided to take a deeper look to see what goes into the ERA of a pitcher. In this study, I divided all the qualified pitchers from the last five years into two groups; top 10 ERA and non-top 10, as a means to determine what makes a top 10 ERA pitcher.

Using four indicators (strikeout percentage, walk percentage, left on base percentage, and BABIP, or batting average on balls in play), we can figure out the probability that a pitcher will finish in the top 10 in ERA. I ranked all the pitchers in the last five seasons by these categories, and put them into a big matrix of numbers based on their rankings. To indicate if they finished in the top 10 ERA category, I put a 1 for top 10, and a 0 for finished outside the top 10. I used each pitcher’s yearly rank instead of their actual numbers because each year’s top 10 is different. Therefore, it is important to compare numbers on a year-to-year basis.
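The ranking-and-labelling step described above can be sketched in plain Python. This is an illustration only: the pitchers, seasons, and numbers are invented, the field names (`season`, `era`, `k_pct`) and function names are hypothetical, and the toy example labels the top 1 rather than the top 10 because it has so few rows.

```python
from collections import defaultdict

def add_season_ranks(rows, stat, higher_is_better=True):
    """Add '<stat>_rank' to each row, ranking pitchers within their own season."""
    by_season = defaultdict(list)
    for row in rows:
        by_season[row["season"]].append(row)
    for season_rows in by_season.values():
        for row in season_rows:
            if higher_is_better:
                better = sum(1 for other in season_rows if other[stat] > row[stat])
            else:
                better = sum(1 for other in season_rows if other[stat] < row[stat])
            row[stat + "_rank"] = better + 1  # tied pitchers share a rank
    return rows

def label_top_n_era(rows, n=10):
    """Add 'top_era' = 1 for pitchers ranked in the top n by ERA that season."""
    add_season_ranks(rows, "era", higher_is_better=False)
    for row in rows:
        row["top_era"] = 1 if row["era_rank"] <= n else 0
    return rows

# Hypothetical pitchers; none of these lines are real stat lines.
pitchers = [
    {"name": "A", "season": 2017, "era": 2.50, "k_pct": 30.0},
    {"name": "B", "season": 2017, "era": 3.40, "k_pct": 22.0},
    {"name": "C", "season": 2018, "era": 3.00, "k_pct": 25.0},
    {"name": "D", "season": 2018, "era": 2.20, "k_pct": 25.0},
]

add_season_ranks(pitchers, "k_pct", higher_is_better=True)
label_top_n_era(pitchers, n=1)  # n=1 only because the toy data is tiny
```

Ranking within each season keeps a pitcher from being compared against another year's run-scoring environment, which is the reason given above for using ranks instead of raw numbers.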

Some of the chart looks like this:

Screen Shot 2019-01-24 at 4.02.14 PM.png

To find a prediction, I used an R package called XGBoost. XGBoost takes the information based on the previous data and tests to see if there is a pattern between where the pitcher finished in rank, and if he finished in the top 10 of ERA in the season. After running the numbers with different parameters on XGBoost we can determine two things. The program tells us which of the four stats is most indicative of a high ERA rank, and which pitchers were outliers (cases where the model’s prediction did not match the actual outcome).

First, let’s look at which stat is the most predictive in determining the rank. Surprisingly, LOB rank has the most impact on a pitcher’s ERA rank. Note that these aren’t percentages; rather, they are used to show the relative importance of each stat in predicting ERA.

Screen Shot 2019-01-24 at 4.01.35 PM.png

This chart shows that where the pitcher finishes in LOB percentage is the best predictor. Interestingly enough, the pitcher that had the highest LOB percent (he left the highest percentage of runners on base) each of the last five years finished in the top 10 in ERA. Also, out of the pitchers that finished in the top five LOB percentage, 20 out of the 27 (there was one three-way tie) finished in the top 10. The chart also shows that LOB rank and K rank are much more significant than BB rank or BABIP rank.

Next, let’s look at the predictive aspect of the model. I ran the model using a number of different combinations of test and training data, and then had it predict on the pitchers. The model predicted around 85 percent of the pitchers correctly. Now, let’s look at a few pitchers that the model incorrectly predicted and why this data was wrong.

Screen Shot 2019-01-24 at 4.06.16 PM.png

Garrett Richards finished the 2014 season with a 2.61 ERA, which placed him 10th in the MLB. However, the model predicted that Richards would finish outside of the top 10 with those ranks. One explanation for why Richards finished with a good ERA is his HR rate. He had a 0.27 HR/9 rate in 2014, which was the lowest of any qualified pitcher in the last five years. So, while he allowed a lot of baserunners, not a lot came in because of the fact that he could keep the ball in the yard. Richards has been injured the last few years, but his success has been almost completely related to his home run rate.

Screen Shot 2019-01-24 at 4.09.23 PM.png

Stroman in 2017 had an ERA of 3.09, which placed him 9th. What Stroman lacks in strikeouts, he made up for in his ground ball to fly ball rate, as well as his groundball percentage. This allowed Stroman to get easy outs without needing to strike everyone out. Since he got so many groundballs, most of the hits he gave up were singles, which limited the amount of earned runs. He also induced the most double plays in 2017, which helped him get out of innings without allowing any earned runs.

Screen Shot 2019-01-24 at 4.11.24 PM.png

One problem with this model is that it treats everyone outside the top 10 as equals. In 2015, Scherzer had a 2.79 ERA, which was the 11th best in the MLB. Even though he finished with a great ERA, the reason he didn’t make it into the top 10 was the number of HR he allowed. He gave up 31 HR, which was the most in the NL. Even though he finished in the top 10 in these four stats, his home runs prevented him from being in the top 10 in ERA.

Screen Shot 2019-01-24 at 4.13.40 PM.png

One of the more interesting results was that the model projects Fiers in the top 10 even though he had a 3.56 ERA, finishing 24th in 2018. The reason why his LOB rank is so good, while still consistently giving up runs, is because he gave up the second most HR/9 of anyone in the MLB. While the rest of his numbers look good, like Scherzer, home runs prevented Fiers from having an elite ERA.

 

Stats from FanGraphs.com, Baseball-Reference.com and baseballsavant.com

How Important is Winning a Period in the NHL? by Alex Craig

By: Adam Sigesmund (@Ziggy_14)

Sometimes when I watch hockey on television, the broadcast will display a stat that makes me cringe. One of my (least) favourites is a stat like the one displayed just under the score in the screenshot below:

Picture1.png

Most of us have noticed these stats on broadcasts before. I imagine they are common because they match the game state (i.e. the Leafs are leading after the first period), so broadcasters probably believe we find them insightful. However, we are all smart enough to understand that teams should theoretically have a better record in games that saw them outscore their opponents in the first period. In this case, whatever amount of insight the broadcasters believe they are providing us with is merely an illusion. Perhaps they also saw value in the fact that the Leafs were undefeated in those 13 games, but that is not what I want to focus on today. 

More generally, my primary objective for this post is to shed light on the context behind this type of stat, mostly because broadcasts rarely provide it for us. Ultimately, I will examine 11 seasons’ worth of data to understand how the outcome of a specific period affects the number of standings points a team should expect to earn in that game. Yes, this means there will be binning*. And yes, I acknowledge that binning is almost always an inappropriate approach in any meaningful statistical analysis. The catch here is that broadcasters continue to display these binned stats without any context, and I believe it is important to understand the context of a stat we see on television many times each season.

* Binning is essentially dividing a continuous variable into subgroups of arbitrary size called “bins.” In this case, we are dividing a 60-minute hockey game into three 20-minute periods.

A particular team wins a period by scoring more goals than their opponent. I looked at which teams won, lost, or tied each period by running some Python code through a data set provided by moneypuck.com. The data includes 13057 regular season games between the 2007-2008 and 2017-2018 seasons, inclusive. (Full disclosure: I’m pretty sure four games are missing here. My attempts to figure out why were unsuccessful, but I went ahead with this article because the rest of my code is correct, and 4 games out of over 13K is virtually insignificant anyways).  The table below displays our sample sizes over those eleven seasons:

Picture2.png

Remember that when the home team loses, the away team wins, so the table with our results will be twice as large as the table above. I split the data into home and away teams because of home-ice advantage; home teams win more games than the visitors, which suggests that home teams win specific periods more often too. We can see this is true in the table shown above. In period 1, for example, the home team won 4585 times and lost only 3822 times. The remaining 4650 games saw first periods that ended in ties.

We want to know the average number of standings points the home team earned in games after winning, tying, or losing period 1. This will give us three values: one average for each outcome of the first period. We also want to find the same information for the away team, giving us a total of six different values for period 1. (This step is not redundant because of the “Pity Point” system, which awards one point to the losing team if they lost in overtime or the shootout. The implication is that some games result in two standings points but others end in three, so knowing which team won the game still does not tell us exactly how many points the losing team earned). Repeating this process for periods 2 and 3 brings our total to 18 different values. The results are shown below:
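The grouping described above can be sketched with the standard library. This is not the code used for the article: the three games are invented, and the field names (`home_period_goals`, `away_period_goals`, `home_points`, `away_points`) are hypothetical stand-ins for whatever the real data set calls them.

```python
from collections import defaultdict

def period_result(goals_for, goals_against):
    """Classify one period from one team's point of view."""
    if goals_for > goals_against:
        return "win"
    if goals_for < goals_against:
        return "loss"
    return "tie"

def average_points_by_period_outcome(games, period):
    """Return {(venue, outcome): mean standings points} for one period."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for game in games:
        home_goals = game["home_period_goals"][period - 1]
        away_goals = game["away_period_goals"][period - 1]
        for venue, gf, ga, pts in (
            ("home", home_goals, away_goals, game["home_points"]),
            ("away", away_goals, home_goals, game["away_points"]),
        ):
            key = (venue, period_result(gf, ga))
            totals[key] += pts
            counts[key] += 1
    return {key: totals[key] / counts[key] for key in counts}

# Three made-up games: a home regulation win, a home overtime loss
# (one point for the loser), and a home regulation loss.
games = [
    {"home_period_goals": (1, 0, 1), "away_period_goals": (0, 1, 0),
     "home_points": 2, "away_points": 0},
    {"home_period_goals": (2, 0, 0), "away_period_goals": (1, 1, 0),
     "home_points": 1, "away_points": 2},
    {"home_period_goals": (0, 1, 0), "away_period_goals": (0, 0, 2),
     "home_points": 0, "away_points": 2},
]

period_1 = average_points_by_period_outcome(games, period=1)
print(period_1)
```

Calling the function once for each of periods 1, 2, and 3 would yield the 18 values mentioned above, provided every venue and outcome combination occurs at least once in the data.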

Picture3.png

The first entry in the table (i.e. the top left cell) tells us that when home teams win period 1, they end up earning an average of 1.65 points in the standings. We saw earlier that the home team has won the first period 4585 times, and now we know that they typically earn 1.65 points in the standings from those specific games. But if we ignore the outcome of each period, and focus instead on the outcomes of all 13057 games in our sample, we find that the average team earns 1.21 points in the standings when playing at home. (This number is from the sentence below the table; the two values there suggest the average NHL team finishes an 82-game season with around 91.43 points, which makes sense). So, we know that home teams win an average of 1.21 points in general, but if they win the first period they typically earn 1.65 points. In other words, they jumped from an expected points percentage of 60.5% to 82.5%. That is a significant increase.
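The arithmetic behind those figures is short enough to spell out. The sketch below assumes an even split of 41 home and 41 away games in an 82-game season; the constant and function names are chosen for this example only.

```python
HOME_AVG_POINTS = 1.21    # average points per home game, from the article
AWAY_AVG_POINTS = 1.02    # average points per away game, from the article
HOME_AFTER_P1_WIN = 1.65  # average points at home after winning period 1

def points_pct(avg_points, max_points=2):
    """Convert average standings points per game into a points percentage."""
    return avg_points / max_points

# 41 home games plus 41 away games in an 82-game season.
season_points = 41 * HOME_AVG_POINTS + 41 * AWAY_AVG_POINTS

print(round(season_points, 2))
print(round(points_pct(HOME_AVG_POINTS), 3), round(points_pct(HOME_AFTER_P1_WIN), 3))
```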

However, in those 4585 games, the away team lost the first period because they were outscored by the home team. It is safe to say that the away team experienced a similar change, but in the opposite direction. Indeed, their expected gain decreased from 1.02 points (a general away game) to 0.54 points (the condition of losing period 1 on the road). Every time your favourite team is playing a road game and loses period 1, they are on track to earn 0.48 fewer standings points than when the game started; that is equivalent to dropping from a points percentage of 51% to 27%. Losing period 1 on the road is quite damaging, indeed.

Another point of interest in these results, albeit an unsurprising one, is the presence of home-ice advantage in all scenarios. Regardless of how a specific period unfolds, the home team is always better off than the away team would be in the same situation.

I also illustrated these results in Tableau for those of you who are visual learners. The data is exactly the same as in the results table, but now it’s illustrated relative to the appropriate benchmark (1.21 points for home teams and 1.02 points for away teams).  

Picture4.png

Now, let’s reconsider the original stat for a moment. We know that when the Leafs won the first period, they won all 13 of those games. Clearly, they earned 26 points in the standings from those games alone. How many points would the average team have earned under the same conditions? While the broadcast did not specify which games were home or away, let’s assume just for fun that 7 of them were at home, and 6 were on the road. So, if the average team won 7 home games and 6 away games, and also happened to win the first period every time, they would have: 7(1.65) + 6(1.53) = 20.73 standings points. Considering that the Leafs earned 26, we can see they are about 5 points ahead of the average team in this regard. Alternatively, we can be nice and allow our theoretical “average team” to have home-ice advantage in all 13 games. This would bump them up to 13(1.65) = 21.45 points, which is still a fair amount below the Leafs’ 26 points.

One issue with this approach is that weighted averages like the ones I found do not effectively illustrate the distribution of possible outcomes. All of us know it is impossible to earn precisely 1.65 points in the standings: the outcome is either 0, 1, or 2. An alternative approach involves measuring the likelihood of a team coming away with 2 points, 13 times in a row, given that all 13 games were played at home and that they won the first period every time. We know the average is 13(1.65) = 21.45 standings points, but how likely is that? It took a little extra work, but I calculated that the average team would have only a 3.86% chance to earn all 26 points available in those games. (I did this by finding the conditional probability of winning a specific game after winning the first period at home, and then raising that number to the 13th power). Although the probability for the Leafs is a touch lower than this, since there is a good chance a bunch of those 13 games were not played at home, you should not allow such a low probability to shock you; 13 games is a small sample, especially for measuring goals. There is definitely lots of luck mixed in there.
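The 3.86% figure can also be run backwards to see what single-game win probability it implies. This sketch assumes, as the calculation above does, that the 13 games are independent and share the same win probability. The per-game value is derived here from the reported 3.86% and is not quoted from the article's data; it works out to roughly 78%.

```python
STREAK_PROBABILITY = 0.0386  # 3.86%, as reported in the article
GAMES = 13

def streak_probability(p, games=GAMES):
    """Probability of winning every one of `games` independent games."""
    return p ** games

# If p ** 13 = 0.0386, then p is the 13th root of 0.0386.
implied_win_probability = STREAK_PROBABILITY ** (1 / GAMES)

print(round(implied_win_probability, 3))
print(round(streak_probability(implied_win_probability), 4))
```

Seen this way, even a team that wins more than three of every four such games would be expected to sweep all 13 only about once in 26 tries.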

This brings us back to my original anecdote about cringing whenever I encounter this type of stat. Even if we acknowledge its fundamental flaw (scoring goals leads to wins, no matter when those goals occur in a game), the stat is virtually meaningless in a small sample. Goals are simply too rare to provide us with much insight in a sample of 13 games. Nevertheless, broadcasters will continue displaying these numbers without context. This article will not change that. So, the next time it happens, you can now compare that team to league average over the past eleven seasons. Even if the stat is not shown on television, all you need to know is the outcome of a specific period to find out how the average team has historically performed under the same condition. At the very least, we have a piece of context that we did not have before.

RBIs - Clutch? Or Opportunity? (xRBI) by Anthony Turgelis

RBIs are often criticized because they are largely dependent on how many plate opportunities the hitter gets with runners on base. Most analytics experts have dismissed RBIs as a dated stat, but many baseball insiders still claim that they have some relevance. We aim to address these flaws and create a stat that everyone can agree on.

Do Tired Defensemen Surrender More Rebounds? by Owen Kewell

By: Owen Kewell

Two thoughts popped into my mind, one after the other.

First, I wondered whether an NHL player’s performance fluctuated depending on how long they had been on the ice. Does short-term fatigue play a significant role over a single shift?

Second, I wondered how to quantify (and hopefully answer) this question.

The Data

Enter the wonderfully detailed shot dataset recently published by moneypuck.com. In it, we have over 100 features that describe the location and context of every shot attempt since the 2010-11 NHL season. You can find the dataset here: http://moneypuck.com/about.htm#data.

Within this data I found two variables to test my idea. First, the average number of seconds that the defending team’s defensemen had been on the ice when the attacking team’s shot was taken. The average across all 471,898 shots was 34.2 seconds, if you’re curious. With this metric I had a way to quantify the lifespan of a shift, but what variable could be used as a proxy for performance?

Fortunately, the dataset also says whether each shot was a rebound shot. To assess defensive performance, I decided to use the rate at which shots against were rebounds. Recovering loose pucks in your own end is a fundamental part of the job description for NHL defensemen, especially in response to your goalie making a save. Should the defending team fail to recover the puck, the attacking team could generate a rebound shot, which would often result in a goal against. We can see evidence of this in the 5v5 data:

Rebound shooting % is 3.6x larger than non-rebound shooting %

The takeaway here is that 24.1% of rebound shots go into the net, compared to just 6.7% of non-rebound shots. Rebounds are much closer to the net on average, which can explain much of this difference.
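To make the comparison concrete, here is a small sketch of how the two shooting percentages could be computed from shot records, along with the ratio of the two rates quoted above. The `(is_rebound, is_goal)` record layout and the six sample shots are assumptions for illustration, not the actual column names or rows in the moneypuck.com data set.

```python
def shooting_pct(goal_flags):
    """Percentage of shots that were goals, given one boolean per shot."""
    goal_flags = list(goal_flags)
    return 100.0 * sum(goal_flags) / len(goal_flags)

def split_shooting_pct(shot_records):
    """Return (rebound shooting %, non-rebound shooting %).

    shot_records is an iterable of (is_rebound, is_goal) pairs.
    """
    shot_records = list(shot_records)
    rebound = [goal for is_rebound, goal in shot_records if is_rebound]
    other = [goal for is_rebound, goal in shot_records if not is_rebound]
    return shooting_pct(rebound), shooting_pct(other)

# Six made-up shots: two rebounds (one goal) and four non-rebounds (one goal).
sample_shots = [
    (True, True), (True, False),
    (False, False), (False, False), (False, True), (False, False),
]
sample_rates = split_shooting_pct(sample_shots)

# Ratio of the two rates reported in the article.
reported_ratio = 24.1 / 6.7

print(sample_rates, round(reported_ratio, 1))
```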

I believe that a player’s ability to recover loose pucks is a function of their ability to anticipate where the puck is going to be and their quickness to get there first. While anticipation is a mental talent, quickness is physical, meaning that a defender’s quickness could deteriorate over the course of their shift as short-term fatigue sets in. Could their ability to prevent rebound shots be consequently affected? Let’s plot that relationship:

No trendline graph.jpg

There’s a lot going on here, so let’s break it down.

The horizontal axis shows the average shift length of the defending defense pairing at the time of the shot against. I cut the range off at 90 seconds because data became scarce after that; pairings normally don’t get stuck on the ice for more than a minute and a half at 5v5. The vertical axis shows what percentage of all shots against were rebounds.

Each blue dot represents the rebound rate for all shots that share a shift length, meaning that there are 90 data points, or one for each second. The number of total shots ranges from 382 (90 seconds) to 8,124 (27 seconds). Here’s the full distribution:

Shot Rates.jpg

We can see that sample size is an inherent limitation for long shifts. The number of shots against drops under 1,000 for all shift lengths above 74 seconds, which means that the conclusions drawn from this portion of the data need to be taken with a grain of salt. This sample size issue also explains the plot’s seemingly erratic behaviour towards the upper end of the shift length range, as percentage rates of relatively rare events (rebounds) tend to fluctuate heavily in smaller sample sizes.
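The per-second rebound rates and shot counts discussed above could be computed along the following lines. This is a sketch under assumptions: each shot is represented as a hypothetical `(shift_seconds, is_rebound)` pair, and shift lengths are rounded to the nearest whole second, which may not match how the plotted data was actually binned.

```python
from collections import defaultdict

def rebound_rate_by_second(shots, max_seconds=90):
    """Return {second: (rebound rate, total shots)} for shifts up to max_seconds.

    shots is an iterable of (shift_seconds, is_rebound) pairs.
    """
    totals = defaultdict(int)
    rebounds = defaultdict(int)
    for seconds, is_rebound in shots:
        sec = int(seconds + 0.5)  # assumed binning: nearest whole second
        if 1 <= sec <= max_seconds:
            totals[sec] += 1
            rebounds[sec] += int(is_rebound)
    return {sec: (rebounds[sec] / totals[sec], totals[sec]) for sec in sorted(totals)}

# Made-up shots; the last one falls outside the 90-second cut-off.
sample_shots = [(10.2, True), (10.4, False), (10.1, False), (27.0, False), (95.0, True)]
rates = rebound_rate_by_second(sample_shots)

print(rates)
```

Returning the shot count next to each rate keeps the sample-size caveat visible, since a rate built on a few hundred shots deserves less trust than one built on several thousand.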

The Model

Next, I wanted to create a model to represent the trend of the observed data. The earlier scatter plot tells us that the relationship between shift length and rebound rate is probably non-linear, so I decided to use a polynomial function to model the data. But what should be this function’s degree? I capped the range of possibilities at degree = 5 to avoid over-fitting the data, and then set out to systematically identify the best model.

It’s common practice to split data into a training set and a testing set. I subjectively chose a split of 70-30% for training and testing, respectively. This means that the model was trained using 70% of all data points, and then its ability to predict previously unseen data was measured using the remaining 30%. Model accuracy can be measured by any number of metrics, but I decided to use the root mean squared error (RMSE) between the true data points and the model’s predictions. RMSE, which penalizes large model errors, is among the most popular and commonly-used error functions. I conducted the 70-30 splitting process 10,000 times, each time training and testing five different models (one each of degree 1, 2, 3, 4, and 5). Of the five model types, the 5th degree function produced the lowest root mean squared error (and therefore the highest accuracy) more often than the degree 1, 2, 3 or 4 functions. This tells us that the data is best modelled by a 5th degree polynomial. Fitting a normalized 5th degree function produced the following equation:

x  = shift length in seconds

This equation is less interesting than the curve that it represents, so let’s look at that:

Regression.jpg

What Does It Mean?

The regression appears to generally do a good job of fitting the data. Our r-squared value of 0.826 tells us that ~83% of the variance in ‘Rebound %’ is explained by defensemen shift length, which is encouraging. Let’s talk more about the function’s shape.

Rebound rate first differences decrease at first as the rate stabilizes, and then increase further

As defense pairings spend more time on the ice, they tend to surrender more rebound shots, meaning that they recover fewer defensive zone loose pucks. Pairings who are early in their shift (< 20 seconds) surrendered relatively few rebound shots, but there's likely a separate explanation for this. It's common for defensemen to change when the puck is in the other team’s end, meaning that their replacements often get to start shifts with the puck over 100 feet away from the net they're defending. For a rebound shot to be surrendered, the opposing team would need to recover possession, transition to offense, enter the zone and generate a shot. These events take time, which likely explains why rebound rates are so low in the first 15-20 seconds of a shift.

We can see that rebound rates begin to stabilize after this threshold. The rate is flattest at the 34 second mark (5.9%), after which the marginal rate increase begins to grow for each additional second of ice time. This pattern of increasing steepness can be seen in the ‘Rebound Rate Increase’ column of the above chart and likely reflects the compounding effects of short-term fatigue felt by defensemen late in their shifts, especially when these shifts are longer than average. The sample size concerns for long shifts should again be noted, as should the accompanying skepticism that our long-shift data accurately represent their underlying phenomenon.
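To make the idea of first differences concrete, here is a small sketch that takes a rate for each second of shift length and finds the second where the rate changes least. The numbers are invented for illustration (rates in hundredths of a percent) and are not the fitted values from the model.

```python
def first_differences(rates):
    """Change in rate from each second to the next."""
    return [b - a for a, b in zip(rates, rates[1:])]

def flattest_second(seconds, rates):
    """Second at which the rate changed least from the previous second."""
    diffs = first_differences(rates)
    i = min(range(len(diffs)), key=lambda k: abs(diffs[k]))
    return seconds[i + 1]

# Invented example values, in hundredths of a percent.
seconds = [32, 33, 34, 35, 36]
rates = [580, 586, 588, 595, 610]

print(first_differences(rates))         # [6, 2, 7, 15]
print(flattest_second(seconds, rates))  # 34
```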

The main strategic implications of these findings relate to optimal shift length. The results confirm the age-old coaching mantra of ‘keep the shifts short’, showing a positive correlation between shift length and rebound rates. Defensemen shift lengths should be kept to 34 seconds or less, ideally, since the data suggests that performance declines at an increasingly steep rate beyond this point. Further investigation is needed, however, before one can conclusively state that this is the optimal shift length.

Considering that allowing 4 rebound shots generally translates to a goal against, it’s strategically imperative to reduce rebound shot rates by recovering loose pucks in the defensive zone. Better-rested defensemen are better able to recover these pucks, as suggested by the strong, positive correlation between defensemen shift length and rebound rates. While further study is needed to establish causation, proactively managing defensive shift lengths appears to be a viable strategy to reduce rebound shot rates. 

Any hockey fan could tell you that shifts should be kept short, but with the depth of available data we're increasingly able to figure out exactly how short they should be.

In Search of Similarity: Finding Comparable NHL Players by Owen Kewell

By: Owen Kewell

The following is a detailed explanation of the work done to produce my public player comparison data visualization tool. If you wish to see the visualization in action it can be found at the following link, but I wholeheartedly encourage you to continue reading to understand exactly what you’re looking at:

https://public.tableau.com/profile/owen.kewell#!/vizhome/PlayerSimilarityTool/PlayerSimilarityTool

NHL players are in direct competition with hundreds of their peers. The game-after-game grind of professional hockey tests these individuals on their ability to both generate and suppress offense. As a player, it’s almost guaranteed that some of your competitors will be better than you on one or both sides of the puck. Similarly, you’re likely to be better than plenty of others. It’s also likely that there are a handful of players league-wide whose talent levels are right around your own.

The NHL is a big league. In the 2017-18 season, 759 different skaters suited up for at least 10 games, including 492 forwards and 267 defensemen. In such a deep league, each player should be statistically similar to at least a handful of their peers. But how to find these league-wide comparables?

Enter a bit of helpful data science. Thanks to something called Euclidean distance, we can systematically identify a player’s closest comparables around the league. Let’s start with a look at Anze Kopitar.

Anze Kopitar's closest offensive and defensive comparables around the league

The above graphic is a screenshot of my visualization tool.

With the single input of a player’s name, the tool displays the NHL players who represent the five closest offensive and defensive comparables. It also shows an estimate of the strength of this relationship in the form of a similarity percentage.

The visualization is intuitive to read. Kopitar’s closest offensive comparable is Voracek, followed by Backstrom, Kane, Granlund and Bailey. His closest defensive comparables are Couturier, Frolik, Backlund, Wheeler, and Jordan Staal. All relevant similarity percentages are included as well.

The skeptics among you might be asking where these results come from. Great question.

 

A Brief Word on Distance

The idea of distance, specifically Euclidean distance, is crucial to the analysis that I’ve done. Euclidean distance is a fancy name for the length of the straight line that connects two different points of data. You may not have known it, but it’s possible that you used Euclidean distance during high school math to find the distance between two points in (X,Y) Cartesian space.

Now think of any two points existing in three-dimensional space. If we know the details of these points then we’re able to calculate the length of the theoretical line that would connect them, or their Euclidean distance. Essentially, we can measure how close the data points are to each other.

Thanks to the power of mathematics, we’re not constrained to using data points with three or fewer dimensions. Despite being unable to picture the higher dimensions, we've developed techniques for measuring distance even as we increase the complexity of the input data.
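As a quick illustration, Euclidean distance is the square root of the summed squared differences across every dimension, and the same few lines of code work for two, three, or twenty-seven dimensions. The function below is a minimal sketch, not the code behind the tool.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of dimensions."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(euclidean_distance((0, 0), (3, 4)))        # 5.0 (two dimensions)
print(euclidean_distance((1, 2, 3), (4, 6, 3)))  # 5.0 (three dimensions)
print(euclidean_distance((1, 0, 2, 0, 1), (0, 0, 0, 0, 0)))  # five dimensions
```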

 

Applying Distance to Hockey

Hockey is excellent at producing complex data points. Each NHL game produces an abundance of data for all players involved. This data can, in turn, be used to construct a robust statistical profile for each player.

As you might have guessed, we can calculate the distance between any two of these players. A relatively short distance between a pair would tell us that the players are similar, while a relatively long distance would indicate that they are not similar at all. We can use these distance measures to identify meaningful player comparables, thereby answering our original question.

I set out to do this for the NHL in its current state.

 

Data

First, I had to determine which player statistics to include in my analysis. Fortunately, the excellent Rob Vollman publishes a data set on his website that features hundreds of statistics combed from multiple sources, including Corsica Hockey (http://corsica.hockey/), Natural Stat Trick (https://naturalstattrick.com) and NHL.com. The downloadable data set can be found here: http://www.hockeyabstract.com/testimonials. From this set, I identified the statistics that I considered to be most important in measuring a player’s offensive and defensive impacts. Let’s talk about offense first.

List of offensive similarity input statistics

I decided to base offensive similarity on the above 27 statistics. I’ve grouped them into five categories for illustrative purposes. The profile includes 15 even-strength stats, 7 power-play stats, and 3 short-handed stats, plus 2 qualifiers. This 15-7-3 distribution across game states reflects my view of the relative importance of each state in assessing offensive competence. Thanks to the scope of these statistical measures, we can construct a sophisticated profile for each player detailing exactly how they produce offense. I consider this offensive sophistication to be a strength of the model.

While most of the above statistics should be self-explanatory, some clarification is needed for others. ‘Pass’ is an estimate of a player’s passes that lead to a teammate’s shot attempt. ‘IPP%’ is short for ‘Individual Points Percentage’, which refers to the proportion of a team’s goals scored with a player on the ice where that player registers a point. Most stats are expressed as /60 rates to provide more meaningful comparisons.

You might have noticed that I double-counted production at even-strength by including both raw scoring counts and their /60 equivalent. This was done intentionally to give more weight to offensive production, as I believe these metrics to be more important than most, if not all, of the other statistics that I included. I wanted my model to reflect this belief. Double-counting provides a practical way to accomplish this without skewing the model’s results too heavily, as production statistics still represent less than 40% of the model’s input data.

Now, let's look at defense.

List of defensive similarity input statistics

Defensive statistical profiles were built using the above 19 statistics. This includes 15 even-strength stats, 2 short-handed stats, and the same 2 qualifiers. Once again, even-strength defensive results are given greater weight than their special teams equivalents.

Sadly, hockey remains limited in its ability to produce statistical measurements of individual defensive talent. It’s hard to quantify events that don’t happen, and even harder to properly identify the individuals responsible for the lack of these events. Despite this, we still have access to a number of useful statistics. We can measure the rates at which opposing players record offensive events, such as shot attempts and scoring chances. We can also examine expected goals against, which gives us a sense of a player’s ability to suppress quality scoring chances. Additionally, we can measure the rates at which a player records defense-focused micro-events like shot blocks and giveaways. The defensive profile built by combining these stats is less sophisticated than its offensive counterpart due to the limited scope of its components, but the profile remains at least somewhat useful for comparison purposes.

 

Methodology

For every NHLer who played 10 or more games in 2017-18, I took a weighted average of their statistics across the past two seasons. I decided to weight the 2017-18 season at 60% and the 2016-17 season at 40%. If the player did not play in 2016-17, then their 2017-18 statistics were given a weight of 100%. These weights represent a subjective choice made to increase the relative importance of the data set’s more recent season.
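The 60-40 weighting can be sketched in a few lines. This is only an illustration: it assumes each season is stored as a dictionary with the same stat names, and the stat name and values shown are placeholders.

```python
def blend_seasons(current, previous=None, weight_current=0.6):
    """Weighted average of two seasons; falls back to 100% current if no prior season."""
    if previous is None:
        return dict(current)
    weight_previous = 1.0 - weight_current
    return {
        stat: weight_current * current[stat] + weight_previous * previous[stat]
        for stat in current
    }

# Placeholder values, not real player data.
two_season_player = blend_seasons({"G/60": 1.0}, {"G/60": 0.5})
rookie = blend_seasons({"G/60": 1.0})
print(two_season_player)  # G/60 is roughly 0.8 (0.6 * 1.0 + 0.4 * 0.5)
print(rookie)             # unchanged: {'G/60': 1.0}
```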

Having taken this weighted average, I constructed two data sets: one for offense and the other for defense. I imported these spreadsheets into Pandas, which is a Python package designed to perform data science tasks. I then faced a dilemma. Distance is a raw quantitative measure and is therefore sensitive to its data’s magnitude. For example, the number of ‘Games Played’ ranges from 10-82, but Individual Points Percentage (IPP%) maxes out at 1. This magnitude issue would skew distance calculations unless properly accounted for.

To solve this problem, I proportionally scaled all data to range from 0 to 1. 0 would be given to the player who achieved the stat’s lowest rate league-wide, and 1 to the player who achieved the highest. A player whose stat was exactly halfway between the two extremes would be given 0.5, and so on. This exercise in standardization resulted in the model giving equal consideration to each of its input statistics, which was the desired outcome.
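This kind of 0-to-1 scaling is often called min-max scaling. A minimal sketch is below; the handling of a stat where every player has the same value is my own assumption, since the text does not cover that case.

```python
def min_max_scale(values):
    """Rescale values so the lowest becomes 0 and the highest becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a stat with no spread carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

games_played = [10, 46, 82]
print(min_max_scale(games_played))  # [0.0, 0.5, 1.0]
```

After scaling, ‘Games Played’ and IPP% both live on the same 0-to-1 range, so neither dominates the distance calculation simply because of its units.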

I then wrote and executed code that calculated the distance between a given player and all others around the league who share their position. This distance list was then sorted to identify the other players who were closest, and therefore most comparable, to the original input player. This was done for both offensive and defensive similarity, and then repeated for all NHL players.
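The distance-and-sort step might look something like the sketch below. It is not the original code: the player names, the two-stat profiles, and the "F"/"D" position labels are all hypothetical, and it relies on `math.dist` from the Python standard library (3.8+).

```python
import math

def closest_comparables(target, profiles, positions, n=5):
    """Return the n same-position players nearest to the target, closest first."""
    distances = []
    for name, vector in profiles.items():
        if name == target or positions[name] != positions[target]:
            continue  # skip the player themselves and anyone at another position
        distances.append((math.dist(profiles[target], vector), name))
    distances.sort()
    return [(name, d) for d, name in distances[:n]]

# Hypothetical scaled profiles (each stat already between 0 and 1).
profiles = {
    "Forward A": (0.9, 0.8),
    "Forward B": (0.8, 0.8),
    "Forward C": (0.1, 0.2),
    "Defenseman D": (0.9, 0.8),
}
positions = {"Forward A": "F", "Forward B": "F", "Forward C": "F", "Defenseman D": "D"}

result = closest_comparables("Forward A", profiles, positions)
print(result)  # Forward B first, then Forward C; Defenseman D is excluded
```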

This process generated a list of offensive and defensive comparables for every player in the league. I consider these lists to be the true value, and certainly the main attraction, of my visualization tool.

Not satisfied with simply displaying the list of comparable players, I wanted to contextualize the distance calculations by transforming them into a measure that was more intuitively meaningful and easier to communicate. To do this, I created a similarity percent measure with a simple formula.

Similarity Formula.jpg

In the above formula, A is the input player, B is their comparable that we’re examining, and C is the player least similar to A league-wide. For example, if A->B were to have a distance of 1 and A->C a distance of 5, then the A->B similarity would be 1 - (1/5), or 80%. Similarity percentages in the final visualization were calculated using this methodology and provide an estimate of the degree to which two players are comparable.
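Based on the description above, the similarity calculation can be sketched as one minus the ratio of the A-to-B distance to the largest distance from A. The function name and the input format (a dictionary of distances from player A) are my own choices for illustration.

```python
def similarity_percentages(distances):
    """Convert distances from player A into similarities between 0 and 1."""
    farthest = max(distances.values())  # distance to C, the least similar player
    if farthest == 0:
        raise ValueError("all players are identical to the input player")
    return {name: 1 - d / farthest for name, d in distances.items()}

# The worked example from the text: A->B = 1 and A->C = 5.
sims = similarity_percentages({"B": 1.0, "C": 5.0})
print(sims)  # B is roughly 0.8 (80%), C is 0.0
```

By construction, the least similar player always scores 0%, so the percentages are relative to the spread of the league rather than an absolute measure.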

 

Limitations

While I wholeheartedly believe that this tool is useful, it is far from perfect. Due to a lack of statistics that measure individual defensive events, the accuracy of defensive comparisons remains the largest limitation. I hope that the arrival of tracking data facilitates our ability to measure pass interceptions, gap control, lane coverage, forced errors, and other individual defensive micro-events. Until we have this data, however, we must rely on rates that track on-ice suppression of the opposing team’s offense. On-ice statistics tend to be similar for players who play together often, which causes the model to overstate defensive similarity between common linemates. For example, Josh Bailey rates as John Tavares’ closest defensive comparable, which doesn’t really pass the sniff test. For this reason, I believe that the offensive comparisons are more relevant and meaningful than their defensive counterparts.

 

Use Scenarios

This tool’s primary use is to provide a league-wide talent barometer. Personally, I enjoy using the visualization tool to assess relative value of players involved in trades and contract signings around the league. Lists of comparable players give us a common frame through which we can inform our understanding of an individual's hockey abilities. Plus, they’re fun. Everyone loves comparables.

The results are not meant to advise, but rather to entertain. The visualization represents little more than a point-in-time snapshot of a player’s standing around the league. As soon as the 2018-19 season begins, the tool will lose relevance until I re-run the model with data from the new season. Additionally, I should explicitly mention that the tool does not have any known predictive properties.

If you have any questions or comments about this or any of my other work, please feel free to reach out to me. Twitter (@owenkewell) will be my primary platform for releasing all future analytics and visualization work, and so I encourage you to stay up to date with me through this medium.

Analysis: How 5 Elite Scorers Get Their Goals by Owen Kewell

By: Owen Kewell

There’s something beautiful about scoring a goal.

Goals are the building blocks that make up hockey success, both on the individual and team level. They are a single moment in time, a culmination of a series of plays that ends with one team’s attack successfully defeating the other’s defense.

Teams are forever searching to add goals to their lineup, and for good reason. Goals win games, playoff series and, eventually, championships.

Goal-scoring is a repeatable talent, and while certain NHLers are far better at it than others, each player does it their own way. Each scorer exhibits unique tendencies of shot type selection and shot location.

Alex Ovechkin, Evgeni Malkin, Connor McDavid, Nikita Kucherov, and Patrik Laine are five of the best scorers in the game. Of the 10 goal leaders for the 2017-18 season, these five players possess the highest career goals per game rates. They are the elite of the elite when it comes to putting the puck into NHL nets.

I wanted to explore how they each do it differently.

Elite Scorers 1.jpg

The above visualization separates by shot type to show how each player scored their goals in the 2017-18 season. Overall, the most popular shot type was wrist shot, followed by snap shot, slap shot, and finally backhand.

It should be noted that the ‘AVG (10+ G Forwards)’ represents a weighted average of the relevant shot rate among all forwards who scored 10 or more goals, weighted by the number of goals that they scored. It’s a way to quantify ‘normal’ rates for the league’s goal scoring forwards.
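To show what weighting by goals does, here is a small sketch with two invented players. A high-volume scorer pulls the average toward their own rate, which a simple unweighted mean would not do.

```python
def goal_weighted_average(rates, goals):
    """Average of per-player rates, weighted by each player's goal total."""
    return sum(r * g for r, g in zip(rates, goals)) / sum(goals)

# Invented example: wrist shot share of goals for two forwards.
rates = [0.50, 0.25]   # 50% and 25% of their goals came by wrist shot
goals = [40, 10]       # the first player scored four times as many goals

weighted = goal_weighted_average(rates, goals)
unweighted = sum(rates) / len(rates)
print(weighted, unweighted)  # roughly 0.45 versus 0.375
```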

Let’s take a more detailed look at each of these five players.

 

Alex Ovechkin

Elite Scorers 3.jpg

It’s no secret that Alex Ovechkin is really good at scoring goals. Since breaking into the league, he’s won the goal-scoring title 7 times and no one else has won it more than twice. Sitting at 607 career goals, Ovi continues to propel himself further up the list of all-time greats. His 0.605 goals per game ranks first league-wide, beating out all other forwards by at least 0.08 G/GP.

Ovechkin loves slap shots, which should come as no surprise to anyone who’s watched Washington’s power play operate. His 17 slap shot goals were an uncontested 1st league-wide, with Steven Stamkos being the only other forward to score more than 7. Ovechkin’s slap shot is so powerful that it beats goalies clean even when they know it’s coming, meaning that it can be unleashed without needing to be disguised.

Equally noteworthy, Ovechkin scored just 31% of his goals by wrist shot, which represents the lowest rate among all 32 players who scored 30+ goals.

Heat Map courtesy of Micah Blake McCurdy's website HockeyViz (https://hockeyviz.com)

The red areas in the above heat map show where Ovechkin shoots more frequently than the rest of the league. Ovechkin makes an absolute killing at the top of the left faceoff circle, often referred to as the ‘Ovi Spot’. This area lines up with Ovechkin’s average shot distance of 32.3 feet, which ranked in the 80th percentile among the league’s forwards.

Although it’s not reflected in the heat map, much of Ovechkin’s damage is done with the man advantage playing the left point. Of his 49 goals, 17 were scored on the power play, which ranked 2nd only behind a player further down this list. His remaining 32 were scored at even-strength, which again ranked 2nd in the league. Elite scoring across both special teams and even-strength situations throughout his career has propelled Ovechkin to the status of the league’s premier goal scorer.

 

Evgeni Malkin

Elite Scorers 5.jpg
Elite Scorers 6.jpg

Despite being the second-best player on his team, Malkin has put together the resume of an elite goal scorer. He’s scored 75 goals in 140 games over the past two seasons, which converts to 44 goals over an 82-game season. His career goals per game of 0.472 ranks 6th among active forwards, placing him in elite company.

What makes Malkin dangerous is his offensive versatility; he can score from anywhere on the ice. Equal parts power and precision, Malkin possesses a variety of weapons. His snap shot goal rate clocks in at roughly double the league average (his 11 snap shot goals ranked 4th), but his middle-of-the-pack rates for wrist shots, slap shots and backhands speak to his balanced toolkit. Malkin does not rely on a single shot type to score goals, meaning that defenders must respect all shot types that Malkin credibly threatens. 

Heat Map courtesy of Micah Blake McCurdy's website HockeyViz (https://hockeyviz.com)

Did I mention that Malkin can score from anywhere? The sea of red is the beauty of Evgeni Malkin. He’s one of the most complete offensive players in the league. In addition to his heavy shot, his slick puck-handling ability and power forward frame allow him to generate shots and scoring chances at elite rates in the low slot area. His shot distance ranked just inside the upper third league-wide, influenced both by his crease-area chances and his shot activity in the high slot.

Malkin joins Ovechkin as the only two players in the league to finish top-10 in both even-strength goals and power play goals. He scored 28 times at evens, ranking 7th, and 14 times with the man advantage, ranking 6th. Malkin is one of the game’s most dangerous players in the offensive zone, and his goal scoring abilities rank among the NHL’s elite.

 

Connor McDavid

Elite Scorers 8.jpg
Elite Scorers 9.jpg

At this point, not much more needs to be said about Connor McDavid’s offensive game. His 108 points were enough for a second consecutive Art Ross (but not Hart) Trophy. He’s been the league’s best forward for the last two years, and he’s only 21 years old.

But is he a goal scorer? While it’s true that McDavid has been viewed more as a set-up man than a finisher thus far in his young career, in 2017-18 we saw a transformation in McDavid’s offensive role. Compared to the year prior, McDavid scored 11 more goals and took 23 more shots. He became more of a trigger man, electing to attempt shots more often instead of looking to pass. This development calls to mind a young Sidney Crosby, who recorded seasons of 70 and 84 assists before breaking out for 51 goals in 2009-10.

McDavid prefers to score goals with his wrist shot. His 25 wrist shot goals ranked 3rd league-wide behind only Nathan MacKinnon and Eric Staal, while his rate of 61% ranked 9th among the 32 players who scored 30+ goals. He hardly ever takes slap shots, registering just 7 of these shots during the entire season, of which just 1 beat the goalie. Rather than rely on strength to generate power, McDavid creates offense thanks to generational skating and elite-level hands. He’s able to create and navigate space better than anyone else on the planet and puts himself into positions where a quick and accurate wrist shot is more than enough to beat the goalie.

Heat Map courtesy of Micah Blake McCurdy's website HockeyViz (https://hockeyviz.com)

McDavid has figured out hockey’s (not-so) secret formula: if you get close to the net, you’re more likely to score. He's extremely effective at using his speed, hands, and vision to attack the most dangerous area of the ice. McDavid’s sub-20’ average shot distance is a testament to his elite ability to generate scoring chances from the crease and low slot area.

McDavid’s special teams split is intriguing. His 35 even-strength goals ranked first in the entire NHL, but his 5 power play goals tied him for 96th among forwards. The latter can be explained both by Edmonton’s league-worst power play and by McDavid’s primary role as a puck distributor on the top unit. If Edmonton’s power play improves, which is likely given regression to the mean, McDavid’s special teams goal-scoring could very well take a step forward and supplement his elite even-strength scoring totals. He is already the game’s best forward and he poses a legitimate threat to become the game’s best scorer sooner rather than later.

 

Nikita Kucherov

Elite Scorers 11.jpg
Elite Scorers 12.jpg

A late 2nd round pick, Nikita Kucherov has emerged from relative anonymity to become one of the league’s most dangerous forwards. His 79 goals over the past two seasons place 3rd league-wide, and he was one of just three players to break 100 points in 2017-18.

While Kucherov’s absurdly accurate wrist shot remains his primary weapon (4th in wrist shot goals with 24), he is equally dangerous on the backhand. He scored 8 times (21% of all goals) on his backhand, ranking 2nd among 30+ goal scorers to Brad Marchand in both raw total and rate. Kucherov’s ability to score using wrist shots and backhands is all the more impressive considering that he shoots from further away than 93% of other forwards. He can be successful from this range without relying on the power of slap and snap shots due to his innate ability to find and exploit tiny gaps that goaltenders leave open. His shots are precise and accurate, and he excels at finding any available daylight.

Heat Map courtesy of Micah Blake McCurdy's website HockeyViz (https://hockeyviz.com)

An incredibly versatile player, Nikita Kucherov generates shots at elite rates all over the mid and high-slot. Rather than favour a specific shooting location, he elects to test the goalie from all areas of the offensive zone. This makes Kucherov unpredictable, which helps explain why his quick-release wrist shot and backhand are so devastating. He doesn’t shoot much from the crease area, but driving the net really isn’t part of how he creates offense.

Kucherov was more of a goal-scorer at even-strength than on the power play in 2017-18. He recorded 31 ES goals, one of just four players to crack 30, compared with 8 on the man advantage. He played more of a set-up role on Tampa Bay’s 3rd-ranked power play, registering 28 assists as he regularly sent cross-ice passes to Steven Stamkos (15 PP goals). Kucherov’s outstanding season cemented his status as one of the most dangerous goal scorers in the NHL, and at the prime age of 25 he’s as good a bet as any to repeat his offensive dominance next season.

 

Patrik Laine

Elite Scorers 14.jpg

At just 20 years old, Patrik Laine is already among the game’s premier snipers. His 44 goals ranked 2nd league-wide in 2017-18, fueling the Jets to their franchise-best season. Laine’s biggest asset is his shot, which may very well be the best in the league. Among current NHLers with 50+ career goals, Patrik Laine’s career shooting percentage of 18.0% ranks 2nd behind only Paul Byron. Byron, meanwhile, had an average shot distance of 19.3 feet in 2017-18, the shortest of all eligible forwards, while Laine’s average shot came from 36.1 feet, ranking in the 97th percentile. The kid can shoot the puck.

Laine’s weapon of choice is his snap shot, which he routinely uses to one-time pucks into the back of the net. His quick release and accurate shot placement resulted in 14 snap shot goals in 2017-18, which tied for the league lead with Phil Kessel. He also is a fan of the slap shot, with his 6 slap shot goals placing him in a tie for 4th among all forwards.

Heat Map courtesy of Micah Blake McCurdy's website HockeyViz (https://hockeyviz.com)

Here we see Laine’s favourite shooting locations. A right-handed shot, Laine loves to one-time pucks from the high slot. The fact that he’s able to beat the goalie so consistently from so far away speaks to his talent as a shooter. Like Ovechkin, Laine’s shooting locations lack variety, but he’s so good from his spots that goalies have difficulty stopping the shot even if they can anticipate that it’s coming.

The triggerman for the Jets’ 5th-ranked power play, Laine led all NHLers with 20 power play goals in 2017-18. He would routinely patrol the space between the left half-wall and left point, making himself open to cross-seam passes and one-timing his quick snapshot on net. His 24 even-strength goals tied for 20th in the league, so he’s no slouch at 5-on-5 scoring either.

Since breaking into the league, Laine has used his generational shot to pick apart opposing goalies. He is the odds-on favourite to inherit Ovechkin’s throne as the best goal-scorer in the league, and the sky’s the limit for a kid who potted 44 goals in just his second season.

 

Conclusion

So there we have it: the modus operandi of five of the game’s elite. While Ovechkin, Malkin, McDavid, Kucherov, and Laine possess a shared gift for putting the puck in the net, they achieve it with vastly different sets of techniques, skills, and strategies. There is no uniform way to score a goal across the league, but all that matters is that it goes in.

With goals representing the currency of the NHL, goal-scorers are among the most valuable assets out there. Scoring goals wins you games, playoff series, and, as 32-year-old Alex Ovechkin and 31-year-old Evgeni Malkin know, Stanley Cup championships. Kucherov (25), McDavid (21), and Laine (20) have not yet won hockey’s ultimate prize, but given their relative youth and their otherworldly ability to put the puck in the net, they might not be far away.

 

Data courtesy of Hockey Abstract (http://hockeyabstract.com/testimonials), Natural Stat Trick (https://naturalstattrick.com), and NHL.com (https://nhl.com).

Shot heat maps courtesy of Micah Blake McCurdy’s wonderful visualization website HockeyViz (https://hockeyviz.com).

Does Goalie Rest Help Win a Cup? by Owen Kewell

By: Owen Kewell

On Thursday night, two third period goals scored in quick succession proved to be all that the Washington Capitals needed to defeat the Vegas Golden Knights. In doing so, they became champions, and the core built around Alex Ovechkin finally earned the right to lift the Cup after years of bitter playoff disappointment.

At some point in the Cup Final, I recall reading that both Braden Holtby and Marc-Andre Fleury played relatively few regular season games compared to most starting goalies. I looked it up, and it’s true. Holtby ranked 18th among goalies in TOI this past season, while Fleury came in at 25th.

The two goalies who made it furthest in the 2018 playoffs had a relatively light regular season workload. Could this be more than coincidence? Could a lighter workload directly translate into improved playoff performance? My first thought on the matter was that a goalie who played fewer regular season games would experience less fatigue, and so would be better suited for a long and grueling playoff run. Intuitively, this theory is pleasantly logical, but does it hold any merit?

The Data

To tackle this question systematically, I examined the number of regular season games played by starting goalies of all playoff teams dating back to the 2007-08 season. I defined a playoff run’s starting goalie as the goalie who played the most minutes for that team in that playoff run. I grouped the goalies by the number of series that their teams won, thus separating goalies by degree of playoff success. I then looked at the number of regular season games played by the goalies in each group.

Cup-winning goalies tend to play 7-9 fewer regular season games

The numbers in the coloured boxes show the median GP value for all starting goalies whose teams won the number of playoff series found on the horizontal axis. It’s worth noting that I prorated games played for the lockout-shortened 2013 season as if it were a standard 82 game season.
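Prorating is simple arithmetic: scale games played by the ratio of a full schedule to the shortened one. The 2013 season ran 48 games per team; the 36-game workload below is an invented example, and the text does not say whether prorated values were rounded.

```python
def prorate_games(games_played, season_length, full_season=82):
    """Scale a games-played total from a shortened season to a full schedule."""
    return games_played * full_season / season_length

# Invented example: a goalie who appeared in 36 of 48 games in 2013.
print(prorate_games(36, 48))  # 61.5
```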

Interestingly, when we group by degree of playoff success, we can see that the goalies who went on to win the Stanley Cup generally played fewer regular season games than did the goalies who went on to be eliminated at one point or another. This certainly supports the hypothesis that having your starter play fewer games would help your chances in the playoffs. Let’s take a closer look at these Cup-winning goalies.

Regular season workload of Cup-winning goalies

Of these 11 goalies, only 2 appeared in 60 or more regular season games: Jonathan Quick’s 69 games in 2011-12, and Marc-Andre Fleury’s 62 games in 2008-09. Comparatively, this rate of 2/11 is quite low:

Cup-winning goalies reach 60+ GP less frequently than any other group

Only 18.2% of Cup-winning goalies reached 60+ GP, while 47.2% of all playoff starters reached the same threshold. The difference between the two figures is stark, but let’s remember that sample size is a crucial piece of context. Due to the nature of awarding a title, we can only glean a single data point per season. As such, we have just 11 data points, and that’s including 2017-18 Braden Holtby.

We can’t ignore the possibility that Group 4’s low rate of 18.2% was caused by chance. If we were to simulate 11 random trials that each independently had a 47.2% chance of producing a certain outcome, which we established as the rate at which all playoff starters reached 60 GP, the binomial distribution tells us that there’s a 4.8% chance that 2 or fewer of the trials would produce that outcome. In other words, if workload had no bearing on winning the Cup, we would expect to see a result this extreme only about 4.8% of the time.

Shifting perspective, a result this unlikely under pure chance is evidence that at least some form of relationship exists between a goalie’s workload and their likelihood of winning a Stanley Cup. The results, though produced from a small sample, suggest that a goalie being well-rested contributes to their ability to lead their team to a championship.
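The binomial figure quoted above can be checked with a few lines of Python. This is only a sketch of the calculation as described (11 independent trials, a 47.2% success rate, 2 or fewer successes); the function name is my own. Because 47.2% is itself a rounded input, the result should land close to, but not necessarily exactly on, the figure in the text.

```python
from math import comb


def binom_cdf(k, n, p):
    """Probability of k or fewer successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# 11 Cup-winning goalies, 47.2% baseline rate of reaching 60+ GP,
# and at most 2 of them actually reaching that threshold.
p_tail = binom_cdf(2, 11, 0.472)
print(f"P(X <= 2) = {p_tail:.3f}")
```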

So I Should Rest My Goalie, but When?

This was my follow-up question. Accepting that a well-rested goalie is an ingredient in the Stanley Cup recipe, does it matter when that rest happens during the season?

To highlight patterns in the workload of the same 11 Cup-winning goalies, I split each of their regular seasons into thirds (Games 1-27, 28-55, and 56-82) using schedule data from https://www.hockey-reference.com. For each section of games, I examined the starter’s proportion of their team’s total goaltending minutes. For example, in Games 1-27 of Washington’s 2017-18 season, Holtby played 1162:34, which was 71.5% of all TOI for Washington goalies. The chart below shows data for all goalies, including a group median.
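As a rough sketch of the bookkeeping involved, the snippet below converts a TOI string to seconds, assigns a game number to its third of the season, and computes a starter’s share of team goaltending minutes. The function names are hypothetical. Holtby’s 1162:34 comes from the text, but the team total of 1626:00 is an assumed value chosen only to be consistent with the quoted 71.5%.

```python
def toi_to_seconds(toi):
    """Convert a 'MM:SS' time-on-ice string to seconds."""
    minutes, seconds = toi.split(":")
    return int(minutes) * 60 + int(seconds)


def season_third(game_number):
    """Map a game number (1-82) to the thirds used above."""
    if not 1 <= game_number <= 82:
        raise ValueError("game_number must be between 1 and 82")
    if game_number <= 27:
        return 1
    if game_number <= 55:
        return 2
    return 3


def starter_share(starter_toi, team_toi):
    """Starter's proportion of all team goaltending minutes."""
    return toi_to_seconds(starter_toi) / toi_to_seconds(team_toi)


# Holtby's TOI is from the article; the team total is an ASSUMED figure.
share = starter_share("1162:34", "1626:00")
print(f"{share:.1%}")
```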

Cup-winning goalies tend to have their lightest workload in the season's middle third

Cup-winning starters tend to play a larger proportion of their team’s minutes during the first third (Games 1-27) and the last third (Games 56-82) of the regular season. Comparatively, during Games 28-55, they tend to play about 7% less frequently. The emphasis on the beginning and end of the season is logical: a team must win games early to build a comfortable position in the standings, and a team must win games late to enter the playoffs firing on all cylinders.

This chart suggests that the best time to rest a starting goalie is during the middle third of the season. This is not an inflexible rule, however, as we can see that there are many ways to structure rest over the course of a season and experience playoff success. Holtby, for what it’s worth, was at his busiest during the middle third of this past season and was still able to remain sharp throughout the playoffs.

Conclusion and Takeaways

Over the last decade, we’ve seen well-rested goalies lift the Stanley Cup more often than not. The empirical data support the notion that resting starters more frequently, particularly in the middle third of the season, will increase the likelihood of playoff success. This means that NHL coaching staffs with championship aspirations could gain an advantage by proactively managing their starter’s workload throughout the season.

Over-reliance on a starting goalie induces fatigue and invites the risk of said goalie being unable to maintain their performance over a two-month playoff run. While teams with strong starting goalies tend to lean on them heavily throughout the regular season, this may be detrimental to championship aspirations. If a coach truly wants to maximize their team’s Stanley Cup chances, they must ensure that their starting goalie is rested enough to maintain physical and mental focus over an extended playoff run. If this can be done, the team will be one step closer to hockey’s ultimate prize.

All data taken from Natural Stat Trick (https://www.naturalstattrick.com/) unless otherwise specified.

Investigating the Disappearance of Vegas’ First Line by Owen Kewell

By: Owen Kewell

The Golden Knights kept finding ways to pull it off. Driven by all-world goaltending, an opportunistic counter-attack, and the desire to prove the rest of the hockey world wrong (especially their former teams), the group that James Neal affectionately dubbed the ‘Golden Misfits’ put together a Cinderella run through the Western Conference and into the Stanley Cup Final.

Only midnight appears to be approaching faster than anticipated.

After a 6-2 loss at the hands of the Washington Capitals yesterday, the Golden Knights find themselves searching for answers as their first elimination game in franchise history looms. The last three games, which Vegas has lost by a combined score of 12-5, featured a team that appeared much different from the group we saw roll their way through the Western Conference and into a 1-0 Stanley Cup Final lead.

So what’s different?

Goaltending is the obvious answer. After posting a save percentage above .930 in each of the first three rounds, Fleury’s mark is a paltry .845 through four games in the Final. Anyone could point out that Fleury needs to be better, and while that’s not wrong, it’s not particularly insightful.

Instead, I wanted to investigate the play of Vegas’ other big guns, who have been similarly subpar in their recent string of losses. I’m referring to the Knights’ three-headed monster of a top line, which features William Karlsson between Jonathan Marchessault and Reilly Smith. These three have been catalysts for their team’s offense all season and rank 1-2-3 in team scoring in these playoffs.

The table below compares all-situations production of Vegas’ top line during the first 16 playoff games, which includes Rounds 1-3 and Game 1 of the Cup Final, versus their production in the last 3 games.

Graphic 1.jpg

We can clearly see that the group’s production has dropped off. While the trio was averaging well over one goal and three points per game through the first 16 games, they’ve managed only one goal and four points total in the last three games. Goals are low-frequency events by nature, though, so to properly evaluate their play in a sample as small as three games we need to look at the higher-frequency plays that lead to goals. The table below reflects even-strength play where Vegas’ 1st line is on the ice together.

Graphic 2.jpg

A few numbers jump out from the above table. While the top line is generating significantly more shot attempts than previously, they are producing fewer shots on goal. This means that a higher proportion of the line’s shot attempts are being blocked, and those that aren’t being blocked are missing the net more often. Only 38.3% of the line’s shot attempts in the last three games are reaching the net, which is down more than 10% from the previous 16 games.

Graphic 3.jpg

Elsewhere, the line’s event rates are down across the board. Per 60 minutes, Marchessault, Smith, and Karlsson are generating 4.86 fewer scoring chances, 1.19 fewer high danger chances, and 1.64 fewer goals than they did in the previous 16 games. Much of the reduced scoring can be explained by a decrease in the unit’s on-ice shooting percentage, but the line’s decreased scoring chance generation remains a worrying red flag.
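For anyone unfamiliar with the rate stats used here, the sketch below shows how a per-60 rate and the share of shot attempts reaching the net are computed. The function names are mine, and the counts in the example are hypothetical values (18 shots on goal from 47 attempts happens to reproduce the 38.3% quoted above); they are not the line’s actual totals.

```python
def per_60(events, toi_minutes):
    """Rate of an event per 60 minutes of ice time."""
    if toi_minutes <= 0:
        raise ValueError("toi_minutes must be positive")
    return events * 60 / toi_minutes


def on_net_share(shots_on_goal, shot_attempts):
    """Proportion of shot attempts that reach the net."""
    return shots_on_goal / shot_attempts


# Hypothetical counts for illustration only.
chances_per_60 = per_60(12, 40)   # 12 scoring chances in 40 minutes
share = on_net_share(18, 47)      # 18 shots on goal from 47 attempts
print(chances_per_60, f"{share:.1%}")
```

Per-60 rates make a three-game sample comparable to a 16-game sample because they adjust for the difference in ice time, though they do nothing to reduce the noise in the smaller sample.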

Offensive production, or a lack thereof, does not exist in a vacuum. I would be remiss if I did not acknowledge the work that Matt Niskanen and Dmitry Orlov have done in neutralizing Vegas’ top line. This pairing has been heavily leaned upon to shut down Vegas’ stars, especially in Games 3 and 4 when Washington had last change as the home team. Using William Karlsson and Dmitry Orlov as proxies for Vegas’ 1st line (VGK L1) and Washington’s 1st pairing (WSH P1), we can see what proportion of VGK L1’s 5-on-5 minutes were played against WSH P1 in each game thus far.

Graphic 4.jpg

Vegas’ lone victory came in the only game where their top line was able to play most of their even-strength minutes away from Washington’s top shutdown pairing. Since then, VGK L1 has seen a healthy dose of Orlov and Niskanen, and their production has suffered.

Whether attributable to a lack of execution or stellar opposing defense, the play of Vegas’ first line has been insufficient in their last three games. Their goal-scoring is down by more than half, fewer shot attempts are reaching Braden Holtby, and the line isn’t producing scoring chances at their usual rate.

For Vegas to begin climbing out of the hole they find themselves in, their top line will need to reverse these trends and find a way to produce. If they don’t manage to do so, the strike of midnight might be right around the corner.

All statistics courtesy of Natural Stat Trick (https://www.naturalstattrick.com/)

The Stanley Cup Formula: An Investigation Through Machine Learning by Scott Schiffner

By: Owen Kewell

NHL seasons follow a formulaic plotline.

Entering training camp, teams share a common goal: win the Stanley Cup. The gruelling 82-game regular season separates those with legitimate title hopes from those whose rosters are insufficient, leaving only the sixteen most eligible teams. The attrition of playoff hockey gradually whittles down this number until a single champion emerges victorious, battle-tested from the path they took to win hockey’s top prize. Two months off, then we do it all again.

Teams that have won the Stanley Cup share certain traits. Anecdotally, it’s been helpful to have a dominant 1st line centre akin to Sidney Crosby, Jonathan Toews or Anze Kopitar. Elite puck-moving defensemen don’t hurt either, nor does a hot goalie. Delving deeper, though, what do championship teams have in common?

I decided to answer this question systematically with the help of some machine learning.

Some Background on Classification

Classification is a popular branch of supervised machine learning where one attempts to create a model capable of making predictions on new data points. We do this by building up, or ‘training’, the model using historical data, explicitly telling the model whether each past data point achieved the target class that we’re trying to predict. In the context of hockey, this data point could be some number of team statistics produced by the 2015 Chicago Blackhawks. The target here would be whether they won the Stanley Cup, which they did.

Sufficiently robust classification models can identify a number of statistical trends that underpin the phenomenon that they’re observing. The models can then learn from these trends to make reasonably intelligent predictions on the outcome of future data points by comparing them to the data that the classifier has already seen.

Building a Hockey Classifier

We can apply these techniques to hockey. We have the tools to train a model to learn which team statistics are most predictive of playoff success. To do this, we must first decide which stats to include in our dataset. To create the most intelligent classifier, we decided to include as many meaningful team statistics as possible. Here’s what we came up with:

team stats.jpg

It’s worth noting that we engineered the ‘Div Avg Point’ feature by calculating the average number of points earned by all teams in a given team’s division. The remaining statistics were sourced from Corsica and Natural Stat Trick. An explanation of each of these stats can be found in the glossaries of the two websites.
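A feature like ‘Div Avg Point’ could be engineered along the following lines. This is a sketch under my own assumptions: the function name and the team rows are hypothetical, and I assume a team’s own points are included in its division average, since the description says “all teams”.

```python
from collections import defaultdict


def division_average_points(teams):
    """teams: list of (team, division, points) tuples.

    Returns {team: average points of every team in its division}.
    ASSUMPTION: the team itself is included in the average.
    """
    totals = defaultdict(lambda: [0, 0])  # division -> [points sum, team count]
    for _, division, points in teams:
        totals[division][0] += points
        totals[division][1] += 1
    return {
        team: totals[division][0] / totals[division][1]
        for team, division, _ in teams
    }


# Hypothetical teams and point totals for illustration only.
standings = [
    ("Team A", "North", 100),
    ("Team B", "North", 90),
    ("Team C", "South", 80),
    ("Team D", "South", 70),
]
div_avg = division_average_points(standings)
print(div_avg)
```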

Our dataset included 210 data points: 30 teams per season over the 7 seasons between 2010-11 and 2016-17. Each data point included team name, the above 53 team stats, and a binary variable to indicate whether the team in question won the Cup. Using this data, we trained nine different models to recognize the statistical commonalities between the 7 teams whose seasons ended with a Stanley Cup championship. The best-performing model was a Logistic Regression model trained on even-strength data, and so all further analysis was conducted using this model.
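To give a feel for what a logistic regression classifier does, here is a from-scratch sketch trained with plain gradient descent. It is not the model described above: the library, settings, and features used for that model aren’t reproduced here, the function names are my own, and the toy rows are hypothetical standardized values rather than real team stats.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(row, weights, bias):
    """Estimated probability that a row belongs to the positive class."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on log-loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, target in zip(X, y):
            err = predict_proba(row, weights, bias) - target
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical standardized features (NOT real team data); 1 = won the Cup.
X = [[1.5, 0.2], [1.0, -0.5], [0.8, 0.9], [-0.3, 0.4],
     [-1.0, -0.2], [-1.2, 0.6], [-0.8, -1.4]]
y = [1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logistic(X, y)
p_strong = predict_proba([1.5, 0.0], weights, bias)
p_weak = predict_proba([-1.5, 0.0], weights, bias)
print(weights, bias, p_strong, p_weak)
```

One caveat worth keeping in mind: with only 7 positive cases out of 210, the real dataset is heavily imbalanced, which this toy example does not attempt to address.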

Results: Team Stats that Matter Most

To evaluate which team stats were most strongly linked to winning a Cup, we created a z-score standardized version of our team data. We then calculated the estimated coefficients that our logistic regression model assigned to each team stat. The size of these coefficients indicates the relative importance of different team stats in predicting Stanley Cup champions. The five highest-ranking team stats can be seen below:
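The standardization and ranking steps can be sketched as follows. Z-scoring puts every stat on the same scale, which is what makes coefficient sizes comparable in the first place. The function names are hypothetical, the coefficients in the example are made-up numbers, and ranking by absolute value is my assumption about what “size” means here.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(v - m) / s if s else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]


def rank_by_abs_coefficient(names, coefficients):
    """Order stats from largest to smallest coefficient magnitude."""
    return sorted(zip(names, coefficients), key=lambda nc: abs(nc[1]), reverse=True)


# Hypothetical raw values: two stats on very different scales.
raw = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
standardized = zscore_columns(raw)

# Hypothetical coefficients, NOT the model's actual estimates.
ranking = rank_by_abs_coefficient(["GF/60", "GA/60", "HDCF"], [0.9, -0.2, 0.7])
print(standardized)
print(ranking)
```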

top 5 team stats.jpg

Of all team statistics, ‘Goals For Per 60 Minutes’, or GF/60, is most predictive of winning a Stanley Cup. Of the 7 champions in the dataset, 4 ranked within the top 5 league-wide in GF/60 in their respective season, with 2016-17 Pittsburgh most notably leading the league in the statistic. Impressive results in ‘High Danger Chances For’ and ‘Team Wins’ both strongly correlate to playoff success, while ‘Scoring Chance For Percentage’ and ‘Shots on Goal For Percentage’ round out the top 5.    

What Does It Mean?

Generating a list of commonalities among past champions allows us to comment on what factors impact a team’s likelihood of going all the way. Most apparent is the importance of offense. It is more important to generate goals and high-danger chances than it is to prevent them, as GA/60 and HDCA rank 36th and 13th among all statistics, respectively (their offensive counterparts, GF/60 and HDCF, rank 1st and 2nd). In the playoffs, the best team offense tends to trump the best team defense, which we saw anecdotally in last year’s Pittsburgh v Nashville Final. If you want to win a Stanley Cup, the best defense is a good offense.

offense vs defense.jpg

We can see that a team’s ability to generate scoring chances, both high-danger and otherwise, is more predictive of playoff success than their ability to generate shots. Although hockey analytics pioneers championed the use of shot metrics as a proxy for puck possession, recent industry sentiment has shifted towards the belief that shot quality matters more than shot volume. The thinking here, which is supported by the above results, is that not all shots have an equal chance of beating a goalie, and so it is more important to generate a shot with a high chance of going in than it is to generate a shot of any kind. Between a team that can consistently out-chance opponents and a team that can consistently out-shoot opponents, the former is more likely to win a hockey game, and therefore a playoff series.

Application: The 2017-18 Season

A predictive model isn’t very helpful unless it can make predictions. So let’s make some predictions.

By feeding our model the team stats produced by the recently-completed 2017-18 regular season, we can output predictions of each team’s likelihood of winning the 2018 Stanley Cup. Since this is the fun part, let’s get right to the probability estimates for all 31 NHL teams:

probability estimates.jpg

The rankings above essentially indicate how similar each team’s season was to the regular seasons of teams that went on to win it all. In doing so, they aim to identify the teams most likely to replicate this success. The model favours the Boston Bruins to win the 2018 Stanley Cup, predicting a victory over the Nashville Predators in the Final.

The above data highlights a few curiosities. Notably, we can see that some non-playoff teams had 5-on-5 numbers that were relatively comparable to past Cup champions. Specifically, the Blues, Stars, and Flames played 5-on-5 hockey well enough this season to qualify for the playoffs. The Blues and Flames can attribute their disappointingly long off-seasons to the 30th and 29th-ranked power plays, respectively. The Stars’ implosion is more of a statistical anomaly, and while conducting an autopsy would be interesting, it is better left as a subject for another article.

The lowest-ranked teams to have made the playoffs in the real world are the New Jersey Devils and the Washington Capitals. While their offensive star power might have been enough to get these squads to the dance, the model predicts a quick exit for them both.

A Computer-Generated Bracket:

2018bracket.jpg

For fun, I’ve filled out the above bracket using the class probability rankings generated by our model. Of the 8 teams who have won or are winning their first-round playoff series, the model picked 7 as the winner, with Philadelphia being the exception. While it’s far too early to comment on the model’s accuracy, as only a single playoff series has been completed, it’s an encouraging start.
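Filling out a bracket from probability rankings amounts to advancing the higher-rated team in every matchup. The sketch below does exactly that; the function names, team labels, and probabilities are all hypothetical, and it assumes the first round is listed in bracket order with a power-of-two number of matchups.

```python
def pick_winner(team_a, team_b, probs):
    """Advance whichever team has the higher Cup probability."""
    return team_a if probs[team_a] >= probs[team_b] else team_b


def fill_bracket(first_round, probs):
    """first_round: list of (team_a, team_b) pairs in bracket order.

    Returns a list of rounds, each a list of that round's winners.
    """
    rounds = []
    matchups = list(first_round)
    while matchups:
        winners = [pick_winner(a, b, probs) for a, b in matchups]
        rounds.append(winners)
        if len(winners) == 1:
            break
        # Adjacent winners meet in the next round.
        matchups = [(winners[i], winners[i + 1]) for i in range(0, len(winners), 2)]
    return rounds


# Hypothetical teams and probabilities for illustration only.
probs = {"A": 0.20, "B": 0.05, "C": 0.10, "D": 0.15,
         "E": 0.02, "F": 0.08, "G": 0.30, "H": 0.10}
bracket = fill_bracket([("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")], probs)
print(bracket)
```

Note that this treats each probability as a fixed team rating and ignores the specific opponent, which is the same simplification made when filling out the bracket by ranking alone.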

Limitations of the Analysis

The above results must be considered in the appropriate context. The model was trained and tested using only 5-on-5 data, which would explain the lack of love for teams with strong special teams like Pittsburgh and Toronto. The model is also blind to the NHL’s playoff format. Due to the NHL’s decision to have teams play against their divisional foes during the first two playoff rounds, teams in strong divisions have a much harder road to winning a Cup. Consider that Minnesota’s path to the conference final would likely involve Winnipeg and Nashville in the first two rounds, who finished 2nd and 1st in NHL standings in the regular season. Divisional difficulty is not reflected in the probabilities listed above, though incorporating divisional difficulty either probabilistically or through a strength of schedule modifier could be areas of further analysis.

A final limitation of the model is that it is trained using only 7 champions. In an ideal world, we would have access to dozens or hundreds of Stanley Cup positive instances, but due to the nature of the game there can only be one champion per year. We considered extending the dataset backwards past 2011 but ultimately decided against doing so. The NHL is different today than it was in the past. Training a model on a champion from 2000 tells us little about what it takes to have success in 2018. Using 2010-11 onwards represented a happy medium in the trade-off between data relevance and quantity.

What next?

Winning a Stanley Cup remains an inexact science. While it’s valuable to identify trends among past winners, there is no guarantee that what’s worked before will work again. It’s a game of educated guesses.

I believe that the most legitimate way to build a Stanley Cup winner is a combination of the past and the future. Analyzing historical data to identify team traits that are predictive of a championship is half the battle. The rest is anticipating what the future of the NHL will look like. The champions of the next few years will be led by managers who are best able to identify what it’ll take to win in the modern NHL. While the above framework approaches the first half in a systematic way, the latter remains much harder to crystallize.

In the meantime, let’s turn to what’s in front of our eyes. The playoffs have been tremendously entertaining thus far, and that’ll only pick up as teams are threatened by elimination. Let’s enjoy some playoff hockey. Let’s see which playing styles, tactics, and matchups seem to work. Let’s learn.

Even if your team gets eliminated, just remember that this season’s playoffs are just a couple months away from being data points to train next season’s model.

Then we do it all again.

Playoff Preview: Toronto Maple Leafs vs. Boston Bruins by Anthony Turgelis

By: Kurt Schulthies

Monday May 13, 2013:

The city of Toronto was electric. Competing in the Stanley Cup Playoffs for the first time since 2004, the Toronto Maple Leafs inched their way to Game 7 against the heavily favoured Boston Bruins, continuing an improbable run led by Phil Kessel, Nazem Kadri, James van Riemsdyk, Cody Franson, Dion Phaneuf, and James Reimer.

I was with a dozen of my closest friends, sitting at the head of the table in a Shoeless Joe’s party room. Every detail of that night is vivid in my mind -- for what was about to come can only be described as demoralizing. The Leafs held a 3 goal lead with less than 11 minutes to go in regulation time.

The lead evaporated. The Bruins’ eventual overtime winner became an inevitability.

Without a word, I immediately got up from my seat and stormed out of the bar. I glanced over at the patrons -- and to this day, I have never seen so many people simultaneously unsure how to react.

Present Day

Toronto is a dramatically different team. Now led by their sophomore phenom Auston Matthews, the Leafs look for revenge against the team that crushed the hopes of an entire fanbase five years ago.  

Taking an analytics-focused view, let’s see how Toronto and Boston compare now.

Offensive Matchup

Screen Shot 2018-04-12 at 5.23.24 PM.png
Screen Shot 2018-04-12 at 4.40.22 PM.png
All data used is courtesy of Corsica and NaturalStatTrick

The Leafs are superior to the Bruins in every major offensive category. Toronto is one of the highest paced teams in the league, relying on their high-end offensive talent to best opponents. Boston had a similarly strong offensive season, but failed to generate a significant amount of high danger scoring chances per 60 minutes of play. This can likely be attributed to the Bruins' slower paced style of play.

                               Toronto                                                                       Boston

Screen Shot 2018-04-12 at 4.31.41 PM.png
Screen Shot 2018-04-12 at 4.31.56 PM.png
Screen Shot 2018-04-12 at 5.14.40 PM.png

 

The visuals above show the league rank of each forward in 5v5 primary points per 60 minutes. This metric is highly repeatable year over year, and gives a somewhat accurate depiction of a player’s offensive prowess. However, numbers are somewhat skewed by factors such as the quality of their linemates and the quality of competition faced.

The first thing that stands out about the Leafs’ chart is Auston Matthews. He ranks first league-wide in 5v5 P1/60. Fans can expect him to be a constant threat, and the biggest ‘X-factor’ player in the series. Boston is led by what is likely the league’s most dominant first line. It is one of the few lines capable of outplaying the overpowering combination of Auston Matthews and William Nylander.

Heat maps created and available on HockeyViz.com

Toronto is incredible at generating high danger scoring chances. This metric is much more predictive of goal scoring than stats such as ‘shots’. In contrast, Boston is far below league average at generating scoring chances right in front of the net, but remains a threat in the high slot. Toronto outperforms metrics such as Corsi for and scoring chances due to their admirable scoring talent and high number of odd-man rushes per game. Boston has slightly above average shot quality, meaning they likely score near their expected results according to Corsi and scoring chances.

Defensive Matchup

Screen Shot 2018-04-12 at 5.23.35 PM.png
Screen Shot 2018-04-12 at 4.40.33 PM.png

                 Boston

Zdeno Chara - Charlie McAvoy

Torey Krug - Kevan Miller

Matt Grzelcyk - Adam McQuaid

               Toronto

Morgan Rielly - Ron Hainsey

Jake Gardiner - Nikita Zaitsev

Travis Dermott - Roman Polak

Boston has been an excellent defensive team this season, beating Toronto in every major defensive category. The Bruins are one of the best shot suppression teams in the NHL, forcing teams to shoot from unfavourable scoring positions. In contrast, the Leafs allow a high concentration of dangerous scoring chances from the slot, leading to a much worse defensive performance. Shots against location heat maps for each team can be seen below:

Heat maps created and available on HockeyViz.com

Toronto gives up a lot of high danger chances, leading to a higher expected goals against per game. It also means the team underperforms metrics such as Corsi and scoring chances. Boston, in contrast, is excellent at shot suppression. This leads to outperforming metrics such as Corsi and scoring chances, and results in a very low expected goals against per game.

Goaltending Matchup

Both the Leafs and Bruins boast top tier goaltenders with Frederik Andersen and Tuukka Rask. Using a goalie comparison tool created by Tyler Kelley (@DocKelley41), we are able to compare each goalie by key metrics:

Compare other goalies at: https://public.tableau.com/profile/tyler7457#!/vizhome/GoalieTool/2017-18ComparisonTool

For more on what each metric means, read here. The values on the x-axis of the graph are the percentile ranks of each goalie’s stats. Frederik Andersen is near the top of the charts in Goals Saved Above Average. This is unsurprising considering the aforementioned shaky Leafs defense and the great play of Andersen so far this year. The stat highlights that if an average goalie were placed in the Leafs net in place of Andersen, they would be expected to concede a lot more goals. By this metric among others, it appears Andersen has a small edge over Tuukka Rask this season.
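For context, Goals Saved Above Average is commonly computed by comparing a goalie’s goals against with what a league-average goalie would have allowed on the same number of shots. The sketch below follows that common definition; the comparison tool’s exact calculation isn’t described here, the function name is my own, and the inputs are hypothetical round numbers rather than either goalie’s real totals.

```python
def goals_saved_above_average(shots_against, goals_against, league_save_pct):
    """Goals an average goalie would allow on these shots, minus actual goals."""
    expected_goals_against = shots_against * (1 - league_save_pct)
    return expected_goals_against - goals_against


# Hypothetical inputs: 2000 shots faced, 160 goals allowed, .915 league average.
gsaa = goals_saved_above_average(2000, 160, 0.915)
print(round(gsaa, 1))
```

A positive value means the goalie allowed fewer goals than an average goalie would have on the same workload, which is why heavy shot volume behind a leaky defence can inflate the number for a goalie who is playing well.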

Prediction

The team statistics would suggest the Boston Bruins are the favourites in this series. However, in head-to-head matchups the Toronto Maple Leafs have been the better team, with a 7-1-0 record in 8 games over the past 2 seasons. This series should be a war, and one of the most likely first round matchups to go to 7 games. With that being said, my final prediction is Leafs in 7 games.


Keep up to date with the Queen's Sports Analytics Organization. Like us on Facebook. Follow us on Twitter. For any questions or if you want to get in contact with us, email qsao@clubs.queensu.ca, or send us a message on Facebook.

Playoff Preview: Winnipeg Jets vs. Minnesota Wild by Scott Schiffner

By: Owen Kewell and Scott Schiffner

The calm before the storm.

The brackets have been set up, the matchup strategies developed, and the razors hidden away. For the first time since June, playoff hockey is here. We are mere hours from the puck drop that’ll kick off the 2017-18 Stanley Cup Playoffs, the starting pistol for a two-month-long marathon where only one team can cross the finish line. In anticipation of this, we at the Queen’s Sports Analytics Organization decided to tee up the matchups featuring Canadian teams. We start with the Winnipeg Jets, who will play host to the Minnesota Wild on Wednesday night. The first round playoff series between the Central division rival Winnipeg Jets (2nd, 52-20-10) and the Minnesota Wild (3rd, 45-26-11) is an exciting matchup that is sure to feature a high level of speed, talent, and physicality from both sides. Both squads have enjoyed productive seasons, with the Jets posting the best record of any Canadian team, finishing with 114 points.

Offensive Matchup

Winnipeg enters the series with the reputation of having one of the most lethal forward groups in the league. Led by a rejuvenated Blake Wheeler (91 points) and 44 goals from sophomore winger Patrik Laine, the Jets possess high-end offensive firepower that has torched the league for the better part of the season. Minnesota, meanwhile, enjoyed strong seasons from Eric Staal (76 points), Mikael Granlund (67 points) and Jason Zucker (64 points). Let’s take a quick look at some summary statistics from the regular season.

Stats from Corsica.hockey

The Jets scored 23 more goals than the Wild over the season, though much of this can be explained by their superior power play. Jets skaters had a higher shooting percentage, though the difference is too small to reasonably infer superior shooting ability. The Jets outperformed the Wild at generating shot attempts and scoring chances, though the Wild were able to create more high-danger scoring chances. While individual point totals suggest Winnipeg has more high-end forwards, we can examine depth charts to clarify the picture.

depth chart.jpg

The graphic above shows the current depth charts (courtesy of Daily Faceoff) and each player’s rank among NHL forwards in even-strength primary points per 60 minutes. Here we confirm our belief that Winnipeg’s forward group is much deeper than Minnesota’s, as we can see that six Jets produced at a top-line rate compared to just three Wild players. To understand how the above results were achieved, we turn to heat maps.

Heat maps created and available on HockeyViz.com

The red areas indicate locations where a team shoots more frequently than league average, while blue is the inverse. In these maps we can see two teams who have a very different approach to generating offence. The Jets set up a triangle of attack, which results in a high volume of shots coming from the points and the mid-high slot. Being able to attack the slot with such regularity doubtlessly contributed to the success that the Jets experienced this season. The Wild, meanwhile, seem to play more on the perimeter with the goal of funneling pucks towards the crease. This explains why Minnesota produced more high-danger chances than the Jets despite generating fewer total scoring chances.

The offence matchup clearly favours Winnipeg. The Jets have the top-end firepower and the depth to roll scoring threats on every line. Throw in a dangerous power play, and the Jets are dangerous enough to make life miserable for anyone attempting to contain them.

Defensive Matchup

Winnipeg Jets:

Josh Morrissey – Jacob Trouba

Joe Morrow – Dustin Byfuglien

Ben Chiarot – Tyler Myers

Minnesota Wild:

Jonas Brodin – Matthew Dumba

Carson Soucy – Jared Spurgeon

Nick Seeler – Nate Prosser

The Winnipeg Jets allowed 216 goals in 2017/18, with 144 coming at even strength, while Minnesota allowed 229 goals (144 at 5v5). Winnipeg gave up an average of 31.9 shots per game, while Minnesota surrendered 31.3 on average. In terms of possession metrics, Winnipeg controlled 51.42% of shot attempts over the course of the 2017/18 season, good for 10th in the league, while Minnesota sits 29th with only 47.17% of shot attempts.

Comparing the top pairing defencemen for both teams using HERO charts:

http://ownthepuck.blogspot.ca/2017/05/hero-charts-player-evaluation-tool.html

The Minnesota Wild’s defence corps has taken a significant blow going into the postseason with the loss of number 1 defenseman Ryan Suter, who logged an average of 26:46 of ice time per game before suffering a season-ending ankle injury on March 31. Veteran defender Jared Spurgeon remains a game-time decision due to an injured hamstring. The burden to cover these minutes will fall squarely on the shoulders of young defensemen Jonas Brodin and Matt Dumba, who will be counted on in key defensive situations. The Winnipeg Jets boast a tough lineup of physical defencemen, including Dustin Byfuglien and Tyler Myers, who will look to shut down the Wild’s top offensive lines. The Winnipeg Jets have the edge when it comes to top-tier defencemen, as well as much stronger depth on the blueline overall.

Finally, let’s compare the heat maps for both Winnipeg and Minnesota in their own defensive zones.

Heat maps created and available on HockeyViz.com

Taking a look at these maps, both teams are effectively limiting the number of scoring chances from high-danger scoring areas around the net (<25 feet) and in the slot. Minnesota’s heat map clearly indicates that the majority of chances are coming from the point (>40 feet out from the net) and down the right side, a potential weakness that Winnipeg’s quick wingers will look to exploit. Winnipeg’s defence is managing to limit almost all chances from high-scoring areas directly in front of their net, keeping the majority of shot attempts to the outside perimeter of the rink.

Goaltending Matchup:

We close our positional matchups by considering goaltending. Winnipeg will rely on Connor Hellebuyck, who broke out this year to post the winningest season ever by an American goalie. The young upstart will go toe to toe with Devan Dubnyk, the waiver-wire reclamation project that Minnesota has turned into a competent starter. Dubnyk has the qualitative advantage of playoff experience, but let’s see how the numbers stack up.

goalies.jpg

Unless otherwise specified, the above percentages reflect even-strength play. We see that Hellebuyck and Dubnyk performed similarly at even strength, as their save percentages for low, medium and high danger shots are all within a single percentage point. Where we see a difference, however, is on the special teams. While these stats are influenced by the quality of special team units, we see that Hellebuyck has significantly outperformed Dubnyk on both power plays and penalty kills. We also see that Hellebuyck saved about 2 goals more than expected given the quality of the shots being faced, whereas Dubnyk was over 7 goals in the hole on this metric.
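The "goals saved more than expected" figure can be read as expected goals against minus actual goals against. Below is a minimal sketch of that arithmetic, assuming each shot already carries a goal probability from some shot-quality model. The function name and the sample numbers are hypothetical and are not the data behind the table above.

```python
def goals_saved_above_expected(shot_xg, goals_allowed):
    """Expected goals against minus actual goals against.

    shot_xg: per-shot goal probabilities (hypothetical model output).
    goals_allowed: goals actually conceded on those shots.
    A positive result means the goalie allowed fewer goals than expected.
    """
    expected_goals = sum(shot_xg)
    return expected_goals - goals_allowed


# Hypothetical sample: five shots of varying quality, one goal allowed.
sample_xg = [0.05, 0.10, 0.30, 0.25, 0.80]
print(goals_saved_above_expected(sample_xg, 1))
```

A goalie who finishes a season with a positive total, like Hellebuyck, stopped more than the shot quality predicted. A negative total, like Dubnyk's, means the opposite.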

If there had to be a choice between the two to start a Game 7, Connor Hellebuyck would be the safe choice. Despite his inexperience, his exceptional season played a huge role in Winnipeg’s ascension to 2nd place in the NHL’s overall standings. He has shown himself to be better than Dubnyk at stopping the puck, and for that reason, he gives his team a better chance to win.

In summary, the numbers indicate that Winnipeg has the advantage in terms of offence, defence, and goaltending. The Jets enter the playoffs on an absolute tear, having won 11 of their last 12 games. They are 3-1-0 vs. the Wild in their season series. We are predicting that the Winnipeg Jets will be victorious in their first-round series against the Minnesota Wild, likely in 5 or 6 games.

How the Queen's Men's Hockey Team is Using Analytics - Interviewing Director of Analytics, Miles Hoaken by Anthony Turgelis

By: Anthony Turgelis (@AnthonyTurgelis)

If you've ever thought that sports analytics could only be implemented in national leagues, where there is plenty of data made publicly available, then it's time to think again. Miles Hoaken is a first-year Queen's University student in the Commerce program who is the creator and director of the analytics department for the Queen's Men's Hockey team. In Miles' first year alongside the coaching staff, the team broke the school's record for most wins in a season (19) and finished second in the OUA Eastern conference. I sat down with Miles to talk about how he uses analytics to help make the team even better, and for tips on how other students can start getting into hockey analytics.

Thanks for coming today and agreeing to do this interview. I’m sure many students who support the Queen’s Men’s Hockey Team aren’t aware that there is an analytics department for the team, let alone that it’s run by a Queen’s student. Tell us a bit about who you are and what you do.

My name’s Miles Hoaken and I’m from Toronto. I started getting into hockey analytics when I was 13 years old. Basically, the Leafs lost game 7, blowing a 4-1 lead (as I’m sure a lot of you are aware), which made me realize that there might be another layer that myself and Leafs management weren’t paying attention to, and since they’re my childhood team I tend to follow them more. I started a blog when I was 13 years old, writing down some ideas that I had that were based on some hockey analytics, but not a lot since I was only 13 and I didn’t have the math background at the time to understand what some of the stats were. In 2014, the summer of analytics, I saw tons of people getting hired and realized it was realistic for analysts to get hired based on the work they produced on their blogs or Twitter, so I decided to get further into analytics, started writing more on my blog, and then in Grade 12 I got an analytics position with my high school team. I did statistical consulting on their play, mainly analyzing zone entries but varied depending on what the coach wanted from me on that day. From there, I parlayed that into my role with Queen’s, which is essentially running all their analytics and statistical operations. I basically serve as a coach on their coaching staff, so I’m right there in the office helping make decisions, advising the coach on certain strategy items, giving presentations to the players on occasion, and running that whole operation. We take a variety of stats, mainly pertaining to offensive output since that was the area coach was most interested in.

So you’re 18 years old and working with the coaches for the Varsity Hockey Team here at Queen’s for players who are often 3-5 years older than you. Cool to think about. How do you get the data that you use?

I get all the data live at the games, and it’s all tracked by hand. I print out templates before the game that have everything that I’m going to fill out. For example, for an entry chart I’ll have categories to see who entered the zone, what type of entry it was (controlled or uncontrolled), what general location it was, and then some counting stats. To get the shot locations, I simply mark them down on a piece of paper and fill in the numbers in my spreadsheet after the game. This works well for us since we are trying to do them all live. I unfortunately don’t have the luxury of time to go through all the games for many different stats and many different viewings, because I would probably fail all my school courses if I did. So it has to all be live, and has to all be fast, so the best way to do that right now is by hand. Next year, we have five other people helping me track stats, which should allow us to have more data to work with, but the long-term goal is to automate these parts of the job so that when I graduate, the analytics department could be run by one person at the click of a button.

Are you looking for more students to help out?

Right now we’ve filled all of our data-tracker positions for the upcoming year. We’re always looking for coders who can help out on some of the stuff on the presentation side, since building a portal for the coaches is something that I’m trying to do. At my current level of coding I don’t think I could do it, but eventually with some help I think that we can get there. Keep watching our Facebook page; after next year we’ll be looking for more data-trackers.

How has the coaching staff responded to your work with them?

The whole staff has been very receptive to analytics. Sometimes I come in with crazy ideas, but they really bear with me and take into account what I’m saying. Credit goes to Brett Gibson, when I walked into his office in the first meeting, I was a bit of an unknown and we were going to use an iPad app to track stats. I was able to convince him that the iPad app wasn’t that good and would be a waste of his money and the program’s money and that they should instead trust me and my templates. Maybe it takes a little bit of logic and a bit of crazy to trust an 18 year-old that he had never met before, but he put his faith in me and gave me this role and I will forever be grateful to him for that. He’s done a great job of incorporating me into the decisions and making sure my voice is heard. It’s something he didn’t have to do but I’m really glad he did. It’s been a great situation with the coaches, and coach Gibson has brought the program from a point where we only had 4 former CHLers when he started, to 21 CHLers now so that speaks to his work ethic and commitment to the program for sure.

What’s your relationship like with the players? Do you think they’ve bought in to your recommendations?

I’ve presented to them once so far. It was interesting to read the room because it seemed like the people at the top of the list for the stats I was presenting had a quicker buy-in to what I was talking about. The players at the bottom of the list seemed to look a little bit more confused by it, but what I found was that the players near the bottom of the list actually had a larger increase in these stats than those near the top of the list, which made me think they were responding well to it. They also get access to my reports after every game.

Do you do any coding as part of what you do?

I would say that half of my job is in the rink doing the tracking and recommendations, and the other half is during the week, coding and making programs. The report I give to the coach after every game contains some offensive statistics, which are all presented as graphs generated in R. I set it up so that I can simply change the game number and it will generate the code for that game. It’s a big part of what I do. If anyone is looking to get into hockey analytics, I would say the first thing to learn is coding because it will just make everything a lot easier. I also use coding to generate statistics on the league. I have a web scraper that takes the raw data from the U-Sports website and then turns it into ‘fancy stats’ – Goals for %, Shots for %, some I’m even able to get for 5v5 play through the data that they give us. So coding is a big part of it. I use R personally, but there is a big debate in the hockey analytics community between R and Python – you really can’t go wrong with either. I’m learning Python as part of a coding course at Queen’s next year (CISC121), but R is what I started with and the one I feel the most comfortable with.
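For readers curious what turning raw totals into 'fancy stats' can look like, here is a minimal Python sketch of the two share stats Miles mentions. It is not his R code. The `TeamTotals` structure, its field names, and the sample numbers are hypothetical stand-ins for whatever a scraper would return.

```python
from dataclasses import dataclass


@dataclass
class TeamTotals:
    """Raw counting stats for one team (hypothetical field names)."""
    goals_for: int
    goals_against: int
    shots_for: int
    shots_against: int


def share(for_count, against_count):
    """Percentage of the combined total that belongs to the team."""
    total = for_count + against_count
    if total == 0:
        return float("nan")  # no events recorded, so the share is undefined
    return 100.0 * for_count / total


def fancy_stats(totals):
    """Goals for % and Shots for % from raw totals."""
    return {
        "GF%": share(totals.goals_for, totals.goals_against),
        "SF%": share(totals.shots_for, totals.shots_against),
    }


# Hypothetical totals, not real U-Sports data.
print(fancy_stats(TeamTotals(goals_for=3, goals_against=1,
                             shots_for=30, shots_against=30)))
```

The same `share` helper works for any for/against pair, so restricting the inputs to 5v5 events would give the 5v5 versions of these stats.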

This year you spoke about what you do for the Queen's Men's Hockey Team at VANHAC (Vancouver Hockey Analytics Conference). The presentation link is here. Tell us about your experience at VANHAC. Would you recommend it for those who are interested in working or learning about hockey analytics?

VANHAC was a really great experience for me. I went as a high school student, it was sort of like my grad trip. Some people go on S-Trip, I went to a hockey analytics conference which I think tells you all you need to know about my personality and my passion for this. *laughs* VANHAC is really awesome, it’s probably the best conference in North America, in terms of your value and hockey analytics specifically. Sloan (MIT) is the big one for sports analytics in general which I hope to go to someday. Really though, if you want to meet people from NHL teams, see some of the best research that’s come out recently, you have to go to VANHAC. It’s great because you don’t necessarily need to be an expert to go, some people were there with no experience whatsoever, didn’t know what Corsi was and ended up really enjoying it so it’s a really fun environment. The hockey analytics community is one of the most welcoming communities ever. When I was presenting there this year I didn’t feel nervous at all, so I definitely recommend it to any hockey analytics fan or even someone just trying to get into it.

Do you think we’ll ever see analytics at the forefront of U-Sports hockey? I feel like if more students knew that what you do is possible, there might be more focus towards it leading to each team having their own student-led analytics department.

At VANHAC, Brad Mills (@MillsBradley11), who’s the COO of Hockey Data (@HockeyDataInc), approached me after my presentation, and since he played in the NHL, we started talking about how the game has changed with the advances in analytics since he played. He said that given the number of teams in U-Sports, and given all the statistics I was using, it would cost ~$11,000 to do what I do for every single team in the regular season. I was surprised at how little it was, but at the same time, I mentioned “That $11,000 is only worth it if we have all the data and nobody else does” since that’s what gives us our competitive advantage. That was actually one of the questions I received after my presentation, which was “Do you do any analysis on players from other teams?” and the answer to that is no, because the public information I can get is points, and I have no idea where these points are coming from necessarily, or if they’re skilled in any other way that a micro-stat could capture but I don’t have access to. There are definitely people like me at other universities, maybe not to the same extent or scale since we’re becoming one of the more advanced ones, especially given the number of trackers we’ll have next year. I know Western and UOttawa have an analytics person as well, but some teams don’t even have that voice in the room, and when that happens you can sometimes get into groupthink.

You’re active on Twitter (@SmoakinHoaken). How has Twitter been a learning tool for you?

Twitter has been huge for me, I got Twitter when I was 13, which you can probably tell from my handle (@SmoakinHoaken) (Hannah Montana reference). It’s been really key, people post their research on Twitter first, and people have gotten hired not because of Twitter, but because of the work they’ve put out on Twitter. It’s great for questions too, if you’re new to hockey analytics, you can use the hashtag #HockeyHelper and Alex Novet or someone from @HockeyGraphs will reply to you really fast with some advice.

Aside from @QSAOqueens, what are 5 Twitter accounts that you recommend hockey analytics enthusiasts to follow?

@IneffectiveMath – Micah Blake McCurdy (www.hockeyviz.com) – I got to meet him at VANHAC and he posts a lot of cool visuals and has a patreon with premium content which allows him to make even more graphs. His theme is that numbers are tired, and pictures are wired, which I really like. We’re actually trying to incorporate more pictures and visuals with Queen’s next year.

@AlexNovet and @HockeyGraphs, who post Hockey Graphs’ new articles on Twitter.

@SteveBurtch – I think he said he tweets a thousand times a month or something like that, so you get a lot of content that’s interesting. As he’s joked about himself, he has a surprisingly low “Bad-takes/60 tweets”, so you should definitely follow him.

@nnstats – Super Bowl champion, someone I really look up to for advice on coding and life etc. She will be the first female GM for sure.

@MannyElk – If you’re looking for hockey twitter, but also salads and interesting takes on pop culture, you should definitely follow Manny.

Manny is certainly a fun follow, and the others are great as well. How would you recommend hockey enthusiasts to learn more about the analytics behind the game, aside from reading all of the great content on www.qsao-queens.com and attending QSAO events?

I’d say read a lot. That’s what I did throughout my high school years; I would just read and read and read until I finally felt comfortable presenting these ideas to a coach to volunteer. If you’re not comfortable learning coding just yet, learn everything you can about Excel or other data visualization software. Also learn to effectively communicate your ideas. I know that if I present my idea to the coaching staff as a bunch of numbers, they will not care. If I explain how the idea could be implemented and show them that it works in some setting, then it’s way more likely to be accepted. I think that’s a big problem with some analysts: they can be a little cantankerous or have a high and mighty attitude at times where it’s ‘them-vs-the-world,’ but that mentality won’t serve them well in life. So it’s really important to communicate these ideas effectively, in my opinion.

All the VANHAC talks are on YouTube (link to the playlist here), so watching all of those would be great. If you’re still in high school, or even university, find a team and work with them in any capacity. I started by running a Twitter account in Grade 11 for the Don Mills Flyers. From there, I met plenty of interesting people in the industry that I still keep in contact with. This gig helped me get work with my high school team, which ended up being in analytics.

Thanks for doing this Miles, looking forward to working with you in the future to help make the Queen's Men's Hockey Team even better.


Keep up to date with the Queen's Sports Analytics Organization. Like us on Facebook. Follow us on Twitter. For any questions or if you want to get in contact with us, email qsao@clubs.queensu.ca, or send us a message on Facebook.

Using Pitch Values to Preview the Blue Jays' 2018 Starting Pitchers by Anthony Turgelis

By: Anthony Turgelis (@AnthonyTurgelis) and Jordan Moore

Data Visualizations by: Adam Sigesmund (@Ziggy_14)

All data from Fangraphs, all projection values from ZiPS.

Baseball is back! Tomorrow afternoon, the Toronto Blue Jays will take on the New York Yankees to open their 2018 season. While there are certainly reasons to be both optimistic and pessimistic about this Toronto Blue Jays team, their starting rotation is still seen as a strength. This article will first introduce the Fangraphs Pitch Value system and how it evaluates pitch effectiveness, and then preview the Blue Jays' starting rotation to show what every pitcher has to offer.

Fangraphs Pitch Value System

The idea behind the Fangraphs Pitch Value System is to assign run values to the results a pitcher achieved with each pitch type. These are then compared to the average results to determine whether each pitch value is below or above average, and by how much. Pitch values can also be viewed for hitters, using similar calculations based on how effective they are against each type of pitch. For this article, we will use the standardized values, which are calculated on a ‘per 100 pitches’ basis, since each pitcher’s pitch frequency varies widely. To provide further context, we will also include each pitcher’s pitch mix from the current year.
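The 'per 100 pitches' standardization is simple arithmetic: divide a pitch's total run value by the number of times it was thrown, then multiply by 100. Here is a minimal sketch, assuming the total run value above average is already known. The function name and the sample figures are hypothetical and do not belong to any pitcher in this article.

```python
def pitch_value_per_100(total_run_value, pitches_thrown):
    """Scale a pitch's total runs above average to a per-100-pitches rate."""
    if pitches_thrown <= 0:
        raise ValueError("pitches_thrown must be positive")
    return 100.0 * total_run_value / pitches_thrown


# Hypothetical: a fastball worth 12 runs above average over 1,500 pitches.
print(pitch_value_per_100(12.0, 1500))
```

Because the result is a rate, a positive value on a pitch thrown 70% of the time matters far more to a pitcher's overall results than the same value on a rarely used pitch.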

The limitations of Pitch Values are that they are not always predictive and can vary from year to year. Also, there is likely to be some variance depending on which batters the pitcher had to face, since a batter’s ability to hit each pitch will affect the results. If a pitcher happened to face batters who are less skilled at hitting a curveball (for example), a curveball-heavy pitcher may post higher curveball values due to this lucky arrangement, which may not be indicative of their curveball results going forward. To offset this, career pitch values will also be included, so if a certain pitch performed wildly differently in one year (which, again, could be due to external factors), the career numbers can also be used to predict its value going forward. In addition, Pitch Values are only calculated for fastballs (wFB/C), curveballs (wCB/C), changeups (wCH/C), cutters (wCT/C), sliders (wSL/C), knuckleballs (wKN/C), and splitters (wSF/C). Sinkers are included in the fastball calculation.

The best and worst 2017 values among qualified starting pitchers (those with a large enough sample) will also be highlighted, for reference on how each pitch compares to the extremes. Values were omitted here if the pitch made up less than 15% of a pitcher's pitches thrown, since small-sample noise could overstate its value. Using this statistic, let’s see how the Blue Jays starters stack up against the rest of the league.
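The 15% usage cut-off can be expressed as a simple filter. This is a sketch only: the dictionary layout, the function name, and the sample arsenal are hypothetical, and only the 15% threshold comes from the text.

```python
MIN_USAGE = 0.15  # usage threshold stated in the text


def pitches_above_usage(arsenal, min_usage=MIN_USAGE):
    """Keep pitch values only for pitches thrown at least min_usage of the time.

    arsenal maps a pitch label to a (usage_share, value_per_100) pair.
    """
    return {
        pitch: value
        for pitch, (usage, value) in arsenal.items()
        if usage >= min_usage
    }


# Hypothetical arsenal: the rarely thrown cutter is dropped as noise.
example = {"FB": (0.60, 0.40), "CH": (0.25, -0.10), "CT": (0.03, -2.50)}
print(pitches_above_usage(example))
```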

J.A. Happ - Opening Day Starter

29748036_1811370635574668_1748325984_o.png
happ.png

We begin with the Opening Day starter, J.A. Happ. Happ is a curious case, as his career numbers probably don’t reflect his true ability at this point in his career - in a good way. After Happ was traded from the Blue Jays to the Seattle Mariners, the Mariners flipped him to the Pirates at the deadline. There, Happ linked up with pitching coach Ray Searage, who is renowned for reviving the careers of many pitchers. Things clicked for Happ, who turned a curveball that was arguably his weakest pitch into one that achieved very good results, possibly because of its reduced usage.

His career pitch value on his curveball sits at -0.97 - below average - but over the last two seasons he has posted curveball values of 1.53 and 1.41, above average and ranking him 18th among pitchers who pitched 100 innings or more last season. As seen in the graph, Happ had an above-average fastball last year and throughout his career, along with a not-so-great slider and a not-so-great changeup. The thing to remember here is that these values are rate stats, so the fastball grading out positive is the most important part since it’s a pitch he throws ~70% of the time. Happ had an fWAR of 2.9 last season (38th among pitchers with 100+ IP), and is projected to put up an fWAR of 2.7 this upcoming season. The often-underappreciated Happ should continue to be one of the Jays’ most consistent pitchers.

Aaron Sanchez - Friday's Probable Pitcher

29746404_1809282842450114_1019471883_o (1).png
29747315_1811198782258520_1085325222_o.png

Aaron Sanchez, the 6’4” 25-year-old California native, is looking to stay healthy and pick up where he left off in the 2016 season. 2017 was a very disappointing season for Sanchez, as a blister on his throwing hand kept him sidelined for all but 8 games. In 2016, Sanchez held a 15-2 record with an ERA of 3.00, which was good enough to earn him his first All-Star Game nod (replacing the injured Craig Kimbrel). Considering Sanchez's small 2017 sample, his 2016 and career numbers will be used for comparison purposes.

In 2016, Sanchez posted a fastball value of 0.94 while throwing the pitch 74.60% of the time. What makes Sanchez’s fastball so unique is that the ball tends to move like a breaking ball while still packing extreme heat, which produces great results. Sanchez’s fastball value improved in 2016 compared to his career average of 0.86, so there is lots of optimism the young stud can continue this positive trend if he stays healthy. Sanchez's high value and high usage mean that he sits down a lot of batters with his fastball, and it is a top pitch.

In 2016, Sanchez also saw an increase in his curveball value raising to 0.68 compared to his career average of 0.11, so we may see more of a curveball added to his arsenal in 2018 (threw a curveball 16.27% in 2016).  Sanchez saw a small increase in his change-up value in 2016 compared to his career average (0.27 to 0.7), so hopefully he can continue to develop his change-up as well. With one all-star game under his belt already, and his fastball, curveball, and change-up all improving in 2016 compared to his career average, Sanchez has a very bright future as he approaches his prime.

If Sanchez can continue to improve all 3 of his pitches, he has the potential to be a Cy Young candidate. In 2016, he produced an fWAR of 3.8, but ZiPS is projecting him at 2.3 fWAR for the coming season. If he can stay healthy, he has a chance to finish significantly higher than that; if not, the Jays might be in trouble. Injuries seem to be the only thing stopping Sanchez at this point in his career, so he and the Blue Jays will be hoping he stays on the field as much as possible.

Marco Estrada - Saturday's Probable Pitcher

29633656_1809282845783447_144122169_o (2).png
29693606_1811175578927507_439077539_o.png

Coming into his fourth season with the Toronto Blue Jays, seasoned veteran Marco Estrada is looking to bounce back from a year where he saw all 4 of his pitches drop below his career averages in value. The 34 year old has been known as a location pitcher throughout his career, putting the baseball in the corners of the strike zone forcing the batter to make difficult decisions on whether to swing or not.

Throughout his career, Estrada has evolved into a change-up specialist. However, his change-up value fell off a cliff in the 2017 season, dropping below the league average to a weak score of -0.7. This may indicate batters have solved the puzzle of his change-up, or it could indicate age is taking a toll on Estrada’s performance (he will be 35 in July). With a seasoned pitcher like Estrada, however, there is always room for optimism, as he could rebound in the 2018 season and bring his change-up value closer to his career average of 0.63, which is above the league average. Estrada’s fastball saw an insignificant value drop of 0.05 compared to his career fastball value of 0.28, indicating his arm strength is still healthy as he approaches age 35. Estrada saw the biggest drop in his curveball, which fell to a disappointing -1.57 in value and may explain how rarely he threw the pitch in 2017 (7.70%). Estrada’s cutter has always been well below average, and this pitch dropped in value as well, to -1.63, while he threw it only 6.70% of the time. The low percentages of curveballs and cutters thrown in the 2017 season indicate he’ll rely heavily on his fastball and change-up again in 2018, so hopefully Estrada can rebound this season and find his change-up groove again.

Even in a down year, Estrada managed an fWAR of 2.6 in 2017, and he is projected for 2.1 fWAR this coming season. Estrada had outperformed his FIP in each of the three seasons before last, and he will likely need to figure out how to do that again this coming season by suppressing contact like he used to.

Marcus Stroman - Sunday's Probable Starter

29633071_1809282852450113_1859610696_o.png
29746437_1811175595594172_2050757573_o.png

Marcus Stroman would have been the Opening Day starter had he not picked up a minor injury in Spring Training, and many look to him as the face of this ball club. Standing at 5' 8", Stroman is proof that Height Doesn't Measure Heart, and that if you can throw a baseball, you do not need to tower over the competition to be a starting pitcher.

Stroman leans heavily on his fastball and his slider, which is a good call considering those are his two best pitches. His slider pitch value is consistently above league average at 1.46 last year and 1.22 throughout his career. He had the 10th highest pitch value out of all qualified starters last year. The movement on this pitch can sometimes be just insane, which is seen in the video below:

Nasty, even though it missed the zone. His fastball grades out as above average as well, which is very valuable given the high usage. His tertiary pitches don't grade out as well, with his cutter and curveball getting near-average pitch values throughout his career. His 2017 cutter value was left in the graph to illustrate how a small sample size can affect this stat. His usage was only 2.4% last year, and a few unlucky results could really sway the pitch value stat. This shouldn't be a reason for concern.

Stroman had an fWAR of 3.4 last year, and ZiPS expects him to take another step forward, projecting an fWAR of 4.5 this season. Expect him to battle with Aaron Sanchez this year to be regarded as the team's ace.

Jaime Garcia - Monday's Probable Starter

29632656_1809282839116781_552419505_o (1).png
29681431_1811175582260840_1813309167_o.png

The only new addition to the list, Jaime Garcia called St. Louis home for 8 years, winning the World Series there in 2011, until he fell victim to MLB's trading carousel. He was traded to the Atlanta Braves on December 1, 2016, and posted a 4-7 record with them before being traded to the Minnesota Twins on July 24, 2017. Less than a week later, Garcia was traded to the New York Yankees. On February 15, 2018, Garcia signed a 1-year deal (with a team option for a 2nd) with the Toronto Blue Jays, where he is hoping to find his groove again and play his way into a multi-year contract. He has a standard 4-pitch mix with no pitch markedly more or less dominant than the rest.

In 2017, Garcia’s fastball had a value of 0.28. He threw a fastball 60.36% of the time in 2017, and considering its above-average results, it is a good weapon for him. Garcia’s curveball took a statistical dive in 2017: his career curveball value is 0.03, but in 2017 it dropped to a dismal -1.87, putting it in the lowest tier of value for qualified starters. In 2017 he only threw a curveball 6.74% of the time, which means that the negative results didn't hurt him overly often. We shouldn’t expect to see Garcia throw a lot of curveballs this year unless he can get better results with it. Garcia’s slider value in 2017 was right on par with his career slider value at -0.79. This pitch does seem to be one of his weaker ones, so we may see some reduced usage. Garcia also saw a small improvement in his change-up last year, bringing his value up to 0.24 compared with his career change-up value of 0.11. Perhaps Estrada and Garcia can work together to improve one another’s change-ups, as Estrada is considered a change-up specialist who had a terrible year last year.

Garcia put up an fWAR of 2.1 last season across his 3 teams. ZiPS projects him to put up an fWAR of 1.6 this season, which is perfectly acceptable for a 5th starter. If Garcia can get into a groove and continue to get good results on his fastball and change-up, he will fit well into the Blue Jays' pitching rotation and can be a valuable asset to the team for his pitching, leadership skills, and World Series experience. If not, it's a one-year deal that won't hurt in the long run, which makes it a good signing considering where the Jays are at this point in time.

Happy Opening Day everyone, we hope you'll follow along with QSAO as the season progresses for more Jays and MLB analysis. Catch the Blue Jays Opening game against the New York Yankees on March 29th at 3:37pm on Sportsnet Ontario.


NHL Player Comparison Tool Guide by Anthony Turgelis

By: Owen Kewell and Adam Sigesmund (@Ziggy_14)

Player comparison is a popular topic of debate among armchair general managers: which guy is better? Would you rather have Player A or Player B? In the wake of a big 1:1 trade, which team won? While in the past we were left to bias, favouritism, and the infamous eye test, today we have some visualization tools to help compare players across useful metrics.

HERO Chart:

One of the best and most intuitive of these tools is the HERO Chart, as pioneered by Domenic Galamini Jr. (@MimicoHero). These charts, which are within the realm of descriptive statistics, can be found at the following website: http://ownthepuck.blogspot.ca/

Below we can see Alex Ovechkin’s HERO Chart:

Ovy.JPG

What Stats Are Measured?

HERO charts show performance across five stats: ICETIME, GOALS, FIRSTA, SHOTGEN, and SHOTSUP. ICETIME refers to all-situation (even strength, power-play, or short-handed) minutes per game. GOALS measures 5-on-5 goals per 60 minutes, while FIRSTA measures 5-on-5 first assists per 60 minutes. SHOTGEN is 5-on-5 shots generated per 60 minutes and SHOTSUP is 5-on-5 shots suppressed per 60 minutes, both relative to average. These stats are measured across the most recent three seasons, with weightings of 44%-33%-22% respectively to ultimately reach a single measure.
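The three-season weighting can be sketched as a weighted average. The 44/33/22 weights come from the description above. Dividing by their sum (99) so the result stays on the same scale as the inputs is an assumption of this sketch, not a documented detail of the HERO chart methodology, and the function name is hypothetical.

```python
# Weights for the most recent season first, as stated in the text.
SEASON_WEIGHTS = (44, 33, 22)


def weighted_three_season_rate(season_values, weights=SEASON_WEIGHTS):
    """Blend per-season rates into one number, favouring recent seasons.

    season_values: rates ordered from most recent season to oldest.
    Normalizing by the sum of the weights is an assumption of this sketch.
    """
    if len(season_values) != len(weights):
        raise ValueError("expected one value per weighted season")
    weighted_sum = sum(v * w for v, w in zip(season_values, weights))
    return weighted_sum / sum(weights)


# Hypothetical goals-per-60 rates for the last three seasons.
print(weighted_three_season_rate([1.2, 0.9, 0.6]))
```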

It’s important to note some key features of these metrics. Aside from ICETIME, the other four stats are measured only at even-strength and per 60 minutes of playing time. This serves to level the playing field, and accounts for the situation and frequency with which different players are deployed. Making these adjustments gives us a better sense of a player’s true performance, though we must consider HERO chart results in an appropriate context. Logging massive minutes and special teams scoring remain hugely important parts of the game, so they should not be disregarded when evaluating a player’s usefulness even if they are not reflected in a player’s HERO chart.

What Do the Numbers Mean?

Each of the numbers you see represents a standardized rating from 0 to 10. A rating of 5 represents league average performance at a skater’s position, and the scores are normally distributed with a standard deviation of 2. A rating above 5 shows performance above league average, and vice versa. For example, as we can see, Alex Ovechkin is league average at first assists compared to eligible wingers. We can also see that Ovechkin is considerably above league average at generating shots, and somewhat below league average at suppressing shots.
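One way to picture a scale with a mean of 5 and a standard deviation of 2 is as a rescaled z-score. The sketch below is only an assumption about how such a rating could be built; the actual HERO chart calculation is described in the site's own Chart Guide and may differ. The function name and inputs are hypothetical.

```python
def hero_style_rating(value, league_mean, league_sd):
    """Map a raw stat onto a 0-10 scale centred on 5 with an SD of 2.

    Assumption for illustration: rating = 5 + 2 * z, clipped to [0, 10],
    where z is the number of standard deviations from the positional mean.
    """
    z = (value - league_mean) / league_sd
    rating = 5.0 + 2.0 * z
    return max(0.0, min(10.0, rating))


# A player exactly at the positional average sits at 5;
# one standard deviation above average sits at 7.
average_player = hero_style_rating(1.00, league_mean=1.00, league_sd=0.25)
strong_player = hero_style_rating(1.25, league_mean=1.00, league_sd=0.25)
```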

Can I See Someone’s Stats Over Time?

Yes you can! Just under the HERO chart you’ll find a chart showing how the player has performed over recent years. The dark blue line represents primary points per hour, and the light blue line represents shot impact per hour. Here is Ovechkin’s. We can see a slow decline, though Ovechkin remains a strong performer in both metrics.

Ovy2.JPG

How Do I Compare Players Directly?

HERO charts were largely built to perform direct comparison, so when you enter Domenic’s website you’ll see two charts beside each other. You can select players of your choice from the dropdown menu for either chart and see a direct comparison. Let’s compare two elite centres: Sidney Crosby and Connor McDavid.

Scanning the charts, we can see that Crosby ranks higher in goals and shot generation, while McDavid ranks higher in first assists and shot suppression. Both players are fantastic across the board.

cros mc.JPG

What Else Can I Do?

In addition to comparing players to other players, we can compare players to positional archetypes. For example, we could see how Max Pacioretty stacks up compared to the average first-line winger, or how Morgan Rielly performs relative to an average #1 defenceman. Below we can see Pacioretty’s chart:

Pac.JPG

If you’re interested in learning more about how the archetypes are calculated, there’s a section labelled ‘Chart Guide’ on the website containing an explanation of the methodology. Personally, I (Owen) enjoy using archetype comparisons to evaluate acquisitions that my favourite team makes, as it gives a high-level indication of where a player could fit into a lineup. It’s also useful for convincing your friends that the young guy you’re bullish on has legitimate upside, and that your team is going to go all the way because of it.

I Have Unanswered Questions - Where Do I Go?

That’s a quick and dirty explanation of what HERO charts are and how to use them. If you have any burning questions that are unaddressed, I encourage you to read through the HERO chart FAQs that Domenic published. The link can be found here: https://ownthepuck.wordpress.com/2017/01/21/hero-charts-frequently-asked-questions/.

All-3-Zone Player Comparison Tools:

Eric Tulsky once said “the magic of analytics is in recording all of the small things lost to memory that add up to something significant.” The easiest events to remember after you watch a hockey game are the big events: the goals, and sometimes even the shots. What you probably don’t remember, though, are the small plays that led up to those events, and the small plays that led to nothing at all. Tulsky worked with people like Corey Sznajder (@Shutdownline) to study the events in the neutral zone that drive offense. Although Tulsky now works for the Hurricanes, Sznajder runs a massive tracking project whose numbers are brought to life by CJ Turtoro’s (@CJTDevil) All-3-Zones Player Comparison Tools. Before we learn about these tools, it is important to note that Sznajder literally watches every game to collect these stats, whereas the data behind HERO charts is released by the league and then displayed as you saw earlier. The sample sizes in these visuals are smaller as a result, but we will see in a moment how they capture some important ways that players create value for their teams.

There are two sets of visuals, which can be found at the links below:

  1. https://public.tableau.com/profile/christopher.turtoro#!/vizhome/ZoneTransitionsper60/5v5Entries

  2. https://public.tableau.com/profile/christopher.turtoro#!/vizhome/2-yearA3ZPlayerComps/ComparisonDashboard

First, we will discuss the set of visuals you can find by clicking that first link above. Below, you will see a screenshot of one of the four visuals available at that link:

Entries.JPG

The stats displayed on this page quantify what happens when a player tries to enter the offensive zone with the puck. He can either carry it in (carry-ins/60), dump it into the zone and then chase after it (dump-ins/60), pass it off to a teammate (Entry passes/60) or fail in his attempt (fails/60).

We care about these numbers because entering the offensive zone with control of the puck is a reliable way to create offense. It is one way to quantify a small thing lost to memory that gives rise to something significant. As you can probably see from the leaderboard above, players who succeed at entering with control are better at creating offence than those who struggle to bypass opposing defenders. This is why the players here are sorted by possession entries (carry-ins + entry passes per 60 minutes).
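To make the rate stats concrete, here is a small Python sketch of how a possession-entry rate could be computed from raw counts. The function names and the sample counts are hypothetical and are not taken from the tracking project.

```python
def per_60(count, minutes):
    """Convert a raw event count into a per-60-minutes rate."""
    return count * 60.0 / minutes


def possession_entries_per_60(carry_ins, entry_passes, minutes):
    """Possession entries = carry-ins + entry passes, expressed per 60 minutes."""
    return per_60(carry_ins + entry_passes, minutes)


# Made-up sample: 30 carry-ins and 10 entry passes in 200 tracked minutes.
rate = possession_entries_per_60(carry_ins=30, entry_passes=10, minutes=200)
```

Dump-ins and failed entries are left out of the total on purpose, since neither keeps control of the puck.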

While tracking carry-ins is a way to quantify the creation of offence, we can also use these numbers to quantify defence. Whenever a player tries to carry the puck into the offensive zone, the opposing defenders want to stop them. The best defenders in these metrics allow the fewest possession entries. The worst ones allow attackers to create offence with ease. It should not surprise you, then, that attackers try to target the defenders who struggle to defend the blue line. Defenders who allow possession entries 90% of the time they are targeted by opposing teams are obviously quite poor at defending the blue line. Below, you will see which defenders allow the fewest possession entries as a percentage of the number of times they were targeted:

Entry D.JPG

Some of the best defenders in the league show up in this leaderboard, which is further validation that what we are studying is actually important. It is always a good sign when the numbers are validated by the eye test and by years of research.

The best defensive teams either prevent zone entries altogether, or they remove the puck from the defensive zone as soon as possible. Indeed, zone exits are another way to measure defensive contributions in hockey, for both forwards and defensemen. The screenshot below shows which players succeed at removing the puck from their zone:

Exit.JPG

Again, positive contributions are measured by Possession Exits/60. Exiting with possession of the puck occurs when a player carries the puck out of the defensive zone (carries/60), or when they make a successful pass to a teammate (Exit passes/60). If a player fails to exit the zone with the puck, it is obviously a failed attempt (Fails/60). If he dumps it, clears it, or ices the puck, he is merely giving the other team another chance to create offence, which is why Possession Exits/60 ignores Dumps/60, Clears/60, and Icings/60. Exiting the defensive zone with possession of the puck is obviously better than not.

So far, we have learned how to quantify the ways players transition from the defensive zone to the neutral zone, and then into the offensive zone. All of these numbers have one underlying theme: Puck possession leads to shots. But how do we measure which players create the most shots? While the obvious answer is to count the number of shots a player takes, the tracking project takes this one step further, and counts up to three passes before each shot is taken. In the same way that points are counted as goals and assists at the player level, the tracking project keeps track of shots and the passes that precede them. The visual below illustrates how each player contributes to shots by shooting or passing:

Shot.JPG

This leaderboard ranks players by their Total Shot Contributions per 60 minutes. A player contributes to a shot if he is the shooter (Shots/60), or if he made at least one of three passes before the shot was taken. Assisting on a shot is the same as assisting on a goal, except Shot Contributions consider up to three passes before a shot while points only consider two passes. If a player made a pass immediately before the shot was taken it is called a Primary Shot Assist (sA1/60), if he made the second pass before the shot it is a Secondary Shot Assist (sA2/60), and if he made the third pass it is a Tertiary Shot Assist (sA3/60). Altogether, shot contributions are an excellent and reliable way to measure which players are creating offence.
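The arithmetic behind Total Shot Contributions can be sketched in a few lines of Python. The class and field names below are hypothetical, and the sample rates are made up.

```python
from dataclasses import dataclass


@dataclass
class ShotRates:
    """Per-60 rates for one player (illustrative field names)."""
    shots: float  # Shots/60
    sa1: float    # primary shot assists per 60
    sa2: float    # secondary shot assists per 60
    sa3: float    # tertiary shot assists per 60

    @property
    def shot_assists(self):
        # All passes in the chain of up to three passes before a shot.
        return self.sa1 + self.sa2 + self.sa3

    @property
    def shot_contributions(self):
        # The player's own shots plus every shot he assisted on.
        return self.shots + self.shot_assists


player = ShotRates(shots=8.0, sa1=4.0, sa2=2.0, sa3=1.0)
total = player.shot_contributions
```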

Now that we have explored this first set of 4 visualizations, we can move on to the second part: The Player Comparison Tool. As you will see below, the Player Comparison Tool presents the numbers in a way that summarizes all of the stats we have learned about from the leaderboards. Take a look:

Subban.JPG

Most of the stats seen here should seem familiar, but this time they are aggregated to provide you with a more general snapshot of each player. For example, the Shot Contributions leaderboard we saw earlier broke down Shot Contributions into four stats: shots, primary shot assists, secondary shot assists, and tertiary shot assists. The Player Comparison Tool summarizes these numbers to measure shooting (Shots60), passing (ShotAssists60; sA1/60 + sA2/60 + sA3/60), and total contributions (ShotContr60; Shots60 + ShotAssists60).

The zone entry leaderboard is summarized in the Entry section, using possession entries expressed as a rate stat (PossEntries60) and possession entries expressed as a percentage of total entry attempts (PossEntry%). Similarly, the zone exit leaderboard is summarized in the Exit section.

It is important to note that if you are viewing a forward using this tool, you will only see the first three sections. The fourth section, Entry Defence, is only available for defenders. This section summarizes the aforementioned Entry Defence per Target leaderboard. As discussed earlier, the best way to defend the blueline is to prevent attackers from entering the zone with control of the puck. A defender who breaks up a play at the blue line is credited with breaking up the play (Breakups60). Defenders who concede controlled zone entries less often are the ones who rank best in the second stat (PossEntriesAllowed60). This is also expressed as a percentage of the number of times the defender is the target of an attempted zone entry by the other team (PossEntry% Allowed).
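The percentage version of entry defence is simply controlled entries allowed divided by the number of times the defender was targeted. A minimal Python sketch, with a hypothetical function name and made-up counts:

```python
def poss_entry_pct_allowed(possession_entries_allowed, times_targeted):
    """Share of targeted entry attempts that ended in a controlled entry.

    Returns None when the defender was never targeted, since the
    percentage is undefined in that case.
    """
    if times_targeted == 0:
        return None
    return 100.0 * possession_entries_allowed / times_targeted


# Made-up example: a defender targeted 50 times who conceded 45 controlled entries.
weak_defender = poss_entry_pct_allowed(45, 50)
```

Lower is better here, so a defender at 90% is conceding the blue line almost every time he is challenged.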

You can view a player’s results in two 1-year windows and one 2-year window, covering the 2016-17 season and the 2017-18 season. This allows you to compare one player to himself (in consecutive seasons) or two players to each other (in the same single season or across both seasons simultaneously). As shown in the intro to analytics article, an example that motivates the study of the former is Nikita Zaitsev’s first two NHL seasons. If you are feeling extra fancy, you can also view two different players with the same name...

Aho.JPG

Although the most valid comparisons are those between players of the same position, which is obviously not true of the two Sebastian Ahos, it demonstrates one of the many ways you can be creative with these visuals once you start using them. With these tools at your disposal, you can answer silly questions like “Is Sebastian Aho better than Sebastian Aho?” along with more objective ones such as “Who contributes to offence the most often?” and “Which defenders are best at defending the blueline?” It would be impossible to answer any of these questions without the hard work of people like Sznajder, Turtoro, and Tulsky, and their mission to record mundane elements of the game that uncover hidden areas of player value.


Keep up to date with the Queen's Sports Analytics Organization. Like us on Facebook. Follow us on Twitter. For any questions or if you want to get in contact with us, email qsao@clubs.queensu.ca, or send us a message on Facebook.

An xGuide to Soccer Analytics by Anthony Turgelis

By: Anthony Turgelis (@Anthony Turgelis), Erik Kiudorf, Jovan Novakovic

The State of Soccer Analytics

Relative to other major sports, soccer lags behind with regards to its acceptance of analytics within the game. Soccer is an extremely traditional sport that is usually reluctant to change, so this should not come as a huge surprise. While some clubs are ignoring analytics, others are using it as a competitive advantage - and it’s really working in some cases.

In a game as fluid as soccer, it is difficult to understand the game objectively amidst differing opinions from players, fans, coaching staff and the media alike. However, the recent growth of analytics in soccer provides an element of objectivity. It introduces new measures of predictability that encourage analysis, in an area where it is currently lacking.

Another reason that soccer analytics lags behind in the public eye is the rarity and inaccessibility of the data, not to mention the complexity and quantity of data required to fully capture value in an open-play sport with infinite game outcomes. The company that holds the monopoly on advanced soccer data is called Opta, and they track every game in every major soccer league around the world. Since there are a lot of games to cover worldwide, lots of things to track, and only a few groups doing it, it’s not hard to see why this data is easy to monopolize. This data is either difficult to scrape from the web or too expensive for personal use: a license for a single league’s worth of data is believed to be priced in the four-digit range per year, although this varies by use and is not confirmed by Opta themselves. As a result, it is difficult, but not impossible, to practice public soccer data analysis.

There are still other ways though! Sites like WhoScored and Squawka offer simple game stats for teams and players, although they are not exportable with traditional methods. For MLS specifically, American Soccer Analysis offers many features to get your fix for advanced stats, which will be highlighted throughout the article. These concepts can be used as evaluation tools, to confirm the eye-test, or to just enhance the viewing experience of the game.

How Teams are Using Analytics

Although statistical analysis is not new to soccer - where pass counts, pass completions and shots taken, for example, are often recorded - such stats only provide information about certain events in the game, while lacking further insight. Soccer analytics helps clubs gain insight into how current and prospective players are likely to perform, based on data collected from their past performances. These advancements enable coaches and managers to use this data to plan more effective training programs, team selections, and game strategies.

Analytics can be broken down into technical and physical categories. The physical aspects account for distance covered, intensity, number of accelerations and decelerations, and jumps and landings. This data is most often used to monitor individual training loads, which helps minimize injuries. The Seattle Sounders of Major League Soccer mainly focus on sports science along with physical analytics to ensure players are at their physical peaks and to prevent injuries.

However, technical analytics act as a tool to help players and coaches quantitatively assess individual and team-based performances. This information is used to improve both individual and team performances and design successful strategies for upcoming games. These mechanisms can also provide knowledge to predict outcomes of games, create new game strategies, determine the value of a player and connect players to brands and sponsorship opportunities. Devin Pleuler, Senior Manager of Analytics at Toronto Football Club, explains the importance of analytics in Major League Soccer: “The players are on a salary cap but the analytics department is not so it’s a way you can set yourselves apart in a relatively cheap manner.” Analytics helps us quantify individual in-game events to provide an understanding of the probability of success, often evaluated by estimating goal scoring potential. It assigns values to the events - events being each stat category - to help better understand and coordinate tactics and systems. Coaches and managers can use this data to tailor tactical systems for upcoming games that are backed by objective information, translating to higher success rates on the field.

It's no surprise, then, that in a game where analytics is finally starting to carve out a place for itself, the two teams using it the most heavily in MLS have ended up in back-to-back MLS Cup finals against each other. Fun tidbit: when these two teams first competed in the MLS Cup final, TFC's Senior Manager of Analytics challenged the Sounders' Director of Analytics, Ravi Ramineni, to a friendly wager:

No word on whether Devin actually gave up his calculator or not, as TFC did end up losing that round. If he did, perhaps he got it back the next year when TFC was victorious over the Sounders.

Expected Goals (xG)

The most popular and most cited advanced metric in soccer analytics is Expected Goals (xG). Generally, expected goals is the number of goals a player should have been expected to score, based on the quality of their chances. There are many models attempting to capture this, some better than others, but none are perfect. The two main inputs that can be found in most, if not all, xG models are where the shot took place and how the shot was taken.

The ‘where’ of the shot refers to both the distance and angle of the shot. Logically, the further away a player is from goal, the less likely their shot is to result in a goal. This is reflected in the statistic, as shots from distance generally have a lower xG than close ones. American Soccer Analysis’s model also considers how much of the goal mouth is available to shoot at. The closer a player is to the goal line, the less goal mouth will be directly exposed to him. To account for that, a sharper angle will result in a decrease in xG.
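The two location inputs can be computed with basic geometry. The Python sketch below is not American Soccer Analysis’s model; it only shows one common way to measure distance and the angle of goal mouth visible from a shot location, assuming a coordinate system (made up for this example) with the origin at the centre of the goal line. The 8-yard goal width is the standard size of a full-size goal.

```python
import math

GOAL_WIDTH_YARDS = 8.0  # standard full-size goal


def shot_geometry(x, y):
    """Return (distance, visible_angle_degrees) for a shot location.

    x: yards out from the goal line (must be > 0)
    y: yards left or right of the centre of the goal
    The visible angle is the angle between the lines from the shooter
    to each goalpost.
    """
    half = GOAL_WIDTH_YARDS / 2.0
    distance = math.hypot(x, y)
    angle = math.atan2(y + half, x) - math.atan2(y - half, x)
    return distance, math.degrees(angle)


# A central shot from 12 yards versus a shot from a tight angle near the goal line.
central = shot_geometry(12.0, 0.0)
tight = shot_geometry(2.0, 10.0)
```

The tight-angle shot is actually slightly closer to the goal, yet it sees far less of the goal mouth, which is why models look at both inputs rather than distance alone.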

Determining how the shot was taken is slightly more complicated, as it is composed of the manner in which the physical shot is taken, as well as the lead-up play to the shot. Higher probabilities are awarded to shots taken with the player’s foot rather than the head. This is simply because, statistically, a shot taken with the foot is more likely to score than a header. The build-up play before the shot will also affect the xG rating. For example, a shot taken from 10 yards on a counter attack will be awarded a higher xG than the exact same shot resulting from a corner. The reason for this is the time and space that the player is allowed. Typically, on a fast break a player has more space and is able to get off his preferred shot, whereas on a corner the eighteen-yard box is very clogged, so players are rushed to shoot and the chance of the ball being deflected is much higher.

What Can xG Tell Us?

Reasonable conclusions that can be drawn from xG are how often a player is in a good spot to score and how well they make themselves available for good chances. Comparing their expected goals to their actual goals will give you an indicator of a player’s finishing ability, and whether they’ve benefited from good or bad luck. Think of it this way: if a player misses a sitter in front of the net by skying it over the bar, this type of shot from that location could be expected to go in (making this up) 95% of the time. This player’s goal count would be zero, but their xG count would be 0.95. The player got into a good position to score, but performed weakly in finishing. If they kept this up, there would be a large gap between the two and this player could be deemed a poor finisher.
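The comparison between goals and xG is just a running sum. Here is a short Python sketch using the made-up 0.95 xG sitter from the example above; the function name and data layout are hypothetical.

```python
def finishing_summary(shots):
    """Summarise a list of (xg, scored) shots.

    Returns (goals, total_xg, goals_minus_xg). A persistently negative
    difference over a large sample hints at poor finishing or bad luck.
    """
    total_xg = sum(xg for xg, _ in shots)
    goals = sum(1 for _, scored in shots if scored)
    return goals, total_xg, goals - total_xg


# The skied sitter: a 0.95 xG chance that was not scored.
goals, total_xg, difference = finishing_summary([(0.95, False)])
```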

On the other hand though, let’s say two players in two different games take the same shot (which is deemed to be a 50% shot, or a 0.5 xG) against two goalies that are standing in the same spot. One goalie dives across and makes an incredible save, while the other falls just short. The player who did not score is penalized in goals for unluckily going up against a better goalie, which is out of their control. Factors that are out of a player’s control can cause their goal count to diverge from their xG count in the short term, while the two tend to move closer together over a larger sample where luck does not affect them as much.

On AmericanSoccerAnalysis.com, you can find constantly updated MLS xG counts by game, player, and team. On Twitter, @11tegen11 tweets out game maps of the xG accumulated by each team in a game, and gives the odds of each team winning based on their xG count. This is a great way to identify which teams really got the better chances, but ran into some bad luck or good goaltending. His charts typically look like this:

11.PNG

Each scoring chance is denoted by the bar moving higher. The larger the rise of the bar, the higher the xG of the scoring chance, which means the more likely the team is to score. In this game, it can be seen that Jeisson Vargas scored on a ~0.1 xG chance, meaning he would be expected to score on that chance once every ten tries. The final xG counts were 1.27 for Montreal and 0.96 for Toronto, leading to the conclusion that it was a fairly even game that could have gone either way. This can also be seen in the match odds near the top left (which look like a French flag for this game). What these mean is that in games where one team put up ~1.27 xG and the other put up ~0.96, the team with the higher xG would be expected to win 43% of the time, draw 30% of the time, and lose 28% of the time. TFC can consider themselves slightly unlucky to come out of this game without a point.
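For a feel of how match odds can be derived from xG totals, here is a simplified Python sketch. It treats each team's goals as an independent Poisson variable whose mean is that team's xG total. This is a swapped-in simplification chosen for illustration and is not necessarily the method behind @11tegen11's charts, so the numbers it produces need not match the odds quoted above.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when lam goals are expected."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def match_odds(xg_a, xg_b, max_goals=10):
    """Return (a_wins, draw, b_wins) assuming independent Poisson scoring.

    Scorelines above max_goals per team are ignored; for typical xG
    totals their combined probability is negligible.
    """
    a_wins = draw = b_wins = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, xg_a) * poisson_pmf(b, xg_b)
            if a > b:
                a_wins += p
            elif a == b:
                draw += p
            else:
                b_wins += p
    return a_wins, draw, b_wins


# xG totals from the game above: Montreal 1.27, Toronto 0.96.
montreal, draw, toronto = match_odds(1.27, 0.96)
```

Even in this simple version, a modest xG edge translates into a win probability well short of 50%, which is why a game like this one "could have gone either way".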

Expected Assists (xA) and Key Passes

xG is the most common tool to analyze how dangerous an attacker is. However, it doesn’t take into account how effective a passer is. That is why the stat ‘expected assists’, or xA, was created. Expected assists is designed to give credit to the player who creates a chance, not just the player who takes it. It does this by assigning the xG rating of the chance to the passer in the form of xA. Therefore, if a through ball leads to a chance with an xG rating of 0.4, the player who played the pass would be assigned an xA rating of 0.4.

Adding on to the playmaking measurement is key passes. Key passes are defined as “the final pass or pass-cum-shot leading to the recipient of the ball shooting”. The beauty of this stat comes from its simplicity. As long as the receiving player shoots the ball, the passer is awarded a key pass regardless of the result of the shot. Therefore, it is quite easy to track and look out for during a game and will give the viewer a decent sense of which players create chances. However, the simplicity of key passes is also their downfall. Because every key pass is awarded the same rating of 1, the stat does not account for the type of chance created. A three-yard pass leading to a shot that goes ten yards wide is worth the same amount as a through ball leading to a tap-in. Unlike xA, key passes do not differentiate between chances and are less effective at measuring the total creative effect of a passer.
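The difference between the two playmaking stats is easy to see side by side. In this Python sketch the data layout, player names, and xG values are all made up: each shot records who played the final pass (if anyone) and the xG of the resulting chance.

```python
from collections import defaultdict


def playmaking_totals(shots):
    """Return ({passer: key_passes}, {passer: xA}) from a list of shots.

    Each shot is a dict with 'passer' (a name, or None for an unassisted
    shot) and 'xg'. A key pass always counts as 1; xA credits the passer
    with the xG of the chance created.
    """
    key_passes = defaultdict(int)
    xa = defaultdict(float)
    for shot in shots:
        passer = shot["passer"]
        if passer is None:
            continue
        key_passes[passer] += 1
        xa[passer] += shot["xg"]
    return dict(key_passes), dict(xa)


shots = [
    {"passer": "Player A", "xg": 0.40},  # through ball to a big chance
    {"passer": "Player B", "xg": 0.02},  # short pass before a speculative shot
    {"passer": None, "xg": 0.10},        # unassisted shot
]
key_passes, xa = playmaking_totals(shots)
```

Both passers finish with one key pass, but the xA totals show that Player A created a far better chance.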

Player Comparison (Radars)

One of the most useful and easy to interpret tools (mostly) available to the public community is the player radar. Due to the data constraints outlined earlier, it’s not so easy for everyone to make them, but there are thankfully a few people on Twitter who post them on a consistent basis, which has essentially created a database of them there. Here’s an example of a player radar created by Ted Knutson (@mixedknuts), for Sebastian Giovinco in the 2016 season:

It might look like there’s a lot going on there, but it’s actually quite simple. Eleven stats are highlighted above, chosen based on the player’s position (in this case, forward). Each is presented on a per-90 basis, so everyone is judged on the same scale. The closer each stat’s value is to the outer edge of the circle, the closer this player was to being the best in their league at it. The outer circle represents the top 5% of players in the same competition, while the middle of the circle represents the bottom 5%. If a player has a stat that touches the edge, they are likely to be considered elite in that category. If they have a stat near the middle, this might be an indicator of their play style or they may have work to do. The raw values on different axes are not comparable to each other: 0.39 throughballs has no relation to 1.2 dispossessions, aside from representing the same percentile rank for each stat.
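The percentile idea behind each radar axis can be sketched as follows. This is a generic percentile-rank calculation written for illustration, not the actual code behind the published radars; the function name, the tie-handling rule, and the sample league values are assumptions.

```python
def percentile_rank(value, league_values, higher_is_better=True):
    """Percentile (0-100) of `value` within `league_values`.

    Ties count as half below and half above. For stats where a low
    number is good (such as dispossessions), the scale is flipped so
    that a better player still lands nearer the outside of the radar.
    """
    below = sum(1 for v in league_values if v < value)
    equal = sum(1 for v in league_values if v == value)
    pct = 100.0 * (below + 0.5 * equal) / len(league_values)
    return pct if higher_is_better else 100.0 - pct


# Made-up per-90 values for five forwards in one league.
shots_per_90 = [1.0, 2.0, 3.0, 4.0, 5.0]
dispossessed_per_90 = [1.0, 2.0, 3.0, 4.0, 5.0]

mid_shooter = percentile_rank(3.0, shots_per_90)
careful_dribbler = percentile_rank(1.0, dispossessed_per_90, higher_is_better=False)
```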

From this radar, we can see that Giovinco is an extremely high volume shooter, which is reflected in his high shots per 90 and low xG per shot. At first glance, his passing % looks weak, but considering that his passes into box number is well above average, he could be thought of as a creator near the goal. You probably already knew this, but the radar makes a strong case that Sebastian Giovinco is a fantastic soccer player who has dominated MLS. This really highlights the beauty of soccer analytics - it’s a great way to confirm the eye-test.

Accessing these player radars is not an ideal process. First, go to the Twitter search page (does not require an account). The three people who have been identified as consistently posting these are @Mixedknuts, @Fussballradars, and @thefutebolist. Type any of their names (start with @Mixedknuts, as his database is probably the largest, then move on to the other two) and then the name of the player you are looking for. It’s sometimes best to then filter by photos, as all the radars will appear there. With any luck, you will have found the radar you are looking for. If that didn’t produce any results, it’s not entirely hopeless. Ted Knutson occasionally opens a request line on Twitter, so if you want a radar for a player who does not have one yet, you can request one that way.

Score Effects

Score Effects are an important concept to consider, especially for casual viewing, as it might help explain certain phenomena that occur every single match. The idea here is that when teams are winning, they tend to sit back and defend more, and while they are losing, they push forward. Seems obvious, right? The thing that is not always obvious to most people is how this will affect the flow of the game, the final stat-line, and the quality of shots that can be expected. Statsbomb did a detailed statistical analysis on score effects which can be found here, which shows some of the math and stats they used to confirm this effect.

Essentially, what they found was that when teams are leading in a game, they tend to form a ‘defensive shell’, tightening up defensively and dropping deeper. This is done because, to them, preventing a goal is more valuable than scoring another. They tend to allow more shots from further out, and these shots are typically less likely to go in.

On the other hand, when teams are trailing by a goal, they tend to take more shots in a more desperate attempt to score the tying goal. These shots will typically be of lesser quality due to this desperation and because the team is not afforded the freedom to wait for the perfect chance to become available. The conversion rates on these shots tend to be lower, which is another nod to the notion that these shots are of lesser quality.

Add all of this up, and you could see a very lopsided statline at the end of the game if one team happened to be trailing for most of it. It might paint a picture that one team dominated and the other got lucky. This could be true, but hopefully with knowledge of the concept of score effects, you will be able to see through this statline and consider that these shots could have been lower quality and part of the defending team’s plan all along.
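If you have shot-level data, one simple way to look for score effects yourself is to split shots by the shooting team's game state and compare conversion rates. The Python sketch below uses a hypothetical data layout and made-up shots; it is not the Statsbomb analysis, only an illustration of the grouping step.

```python
from collections import defaultdict


def conversion_by_game_state(shots):
    """Return {state: goals / shots} for 'leading', 'level' and 'trailing'.

    Each shot is (goal_difference, scored), where goal_difference is the
    shooting team's score minus the opponent's at the time of the shot.
    States with no shots are left out of the result.
    """
    counts = defaultdict(lambda: [0, 0])  # state -> [shots, goals]
    for goal_difference, scored in shots:
        if goal_difference > 0:
            state = "leading"
        elif goal_difference < 0:
            state = "trailing"
        else:
            state = "level"
        counts[state][0] += 1
        counts[state][1] += int(scored)
    return {state: goals / n for state, (n, goals) in counts.items()}


# Made-up shots: the trailing team shoots more often but converts less.
sample_shots = [
    (-1, False), (-1, False), (-1, True), (-1, False),
    (0, True), (0, False),
    (1, True), (1, False),
]
rates = conversion_by_game_state(sample_shots)
```

A real analysis would need a far larger sample than this before reading anything into the gap between the trailing and leading rates.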


Keep up to date with the Queen's Sports Analytics Organization. Like us on Facebook. Follow us on Twitter. For any questions or if you want to get in contact with us, email qsao@clubs.queensu.ca, or send us a message on Facebook.