Nasty Nas’ Nasty Rubdown via `magick`


We have 2 legends, Biggie Smalls and Nas. At the 1:00 mark, Nasty Nas receives a Nasty Rubdown. Pretty sure this was the inspiration for Boosie’s Wipe Me Down.

I made a .gif version using a pen, a tablet, and command-line ‘ImageMagick’.

But the resulting frame rate was too slow, so I decided to try out Jeroen’s R package, magick, to tune the settings for the sped-up version below.



I could have totally ‘tuned’ these settings in standalone ‘ImageMagick’, but I like the comforting caress of R’s function syntax.

Some of magick’s R bindings can immediately accept a ‘.gif’, so you can do things like

nas_gif_original %>% image_chop('0x10') %>% image_animate(fps = 10)

There you have it. Biggie, Nas, an enthusiastic head caresser, pngs, and gifs. Brought to you by R and magick.

A gist to the R script is below.

Chop It: Look up the Generating Data Frame Columns of a Formula Term

We the moody Gucci, Louis and Pucci men
Escada, Prada
The chopper it got the Uzi lens
Bird’s-eye view
The birds I knew, flip birds
Bird gangs, it was birds I flew

Say you use the base #rstats lm() command

lm(y ~ x1 + x2 + x1:x2, data = dat_foo[, c('y', 'x1', 'x2')])

I want to be able to map the single formula term x1:x2 to its two
‘generating’ columns, dat_foo[,c('x1','x2')]

In words: for a term in a ?formula, look up the involved ‘root’ columns of the data frame inside the formula’s associated environment.

I feel like this mapping must exist under the lm() hood somewhere. Various stackoverflow Q+A’s about formulas never directly talk about this lookup. This Rviews blog post sums up the formula landscape pretty well, but there does not seem to be a conveniently exposed lookup/hash table for the data-frame-to-term mapping.

I had to hand-roll the few lines of code to implement the hash/lookup table myself. My solution is ‘loose’ since it chops up the terms in the formula, then creates a sub-formula for each chopped term.

Is there a better / preferred way?
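One base-R candidate that seems to sidestep the hand-rolled hash: the terms() object of a formula already stores a variables-by-terms incidence matrix in its "factors" attribute, which is exactly this term-to-generating-columns lookup. A minimal sketch (gen_cols is my own helper name, not a base function):

```r
# the terms() object carries a variables-by-terms incidence matrix
fm  = y ~ x1 + x2 + x1:x2
fac = attr(terms(fm), "factors")

# rows are variables, columns are terms; a nonzero entry means the
# row variable helps generate the column term
gen_cols = function(term) rownames(fac)[fac[, term] > 0]

gen_cols("x1:x2")  # "x1" "x2"
```

From there, dat_foo[, gen_cols("x1:x2")] pulls the generating columns directly, with no sub-formula chopping.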

Bill and Ted Make the Best out of a Shi… Stata Situation: Rstudio + RStata + Stata

After rewatching the Thanksgiving classic, Bill and Ted’s Excellent Adventure, I was reminded of the history of #Rstats and its current status as the de facto software for general data programming.

The most excellent thing about R is the literate programming options you have. As a data analyst, you are Bill S. Preston, Esquire (or Ted “Theodore” Logan; they are interchangeable). Rstudio is the time-traveling phone booth. Since its conception, Rstats has had Sweave’s phone number on speed dial. Now, Rstudio has Rmarkdown. Compare this situation with… Stata. Stata is Genghis Khan.

Seeing Mine Çetinkaya-Rundel post about the joys of Stata,

During these discussions a package called RStata also came up. This package is [a] simple R -> Stata interface allowing the user to execute Stata commands (both inline and from a .do file) from R. Looks promising as it should allow running Stata commands from an R Markdown chunk. But it’s really not realistic to think students learning Stata for the first time will learn well (and easily) using this R interface. I can’t imagine teaching Stata and saying to students “first download R”. Not that I teach Stata, but those who do confirmed that it would be an odd experience for students…

I decided to see for myself how (un)approachable writing narratives for literate programming in Stata really is.


If Plato pitched his ideal to So-crates, he would claim:

Integrating Rstudio + Rmarkdown + R + RStata should give you the best of three worlds:

1) Write narratives that are human-readable

2) Manipulate data with human-readable R code

3) Have ‘paid-for-assurance’ of Stata analysis commands

But!  Bill and Ted would probably get bogged down during the setup. The key overhead step is to make sure Bill’s RStata package plays nicely with his local copy of Stata.
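For the record, that overhead step is mostly configuration. A minimal sketch, assuming a local Stata 14 install (the path below is a placeholder for wherever your Stata binary actually lives; stata() and the two options() keys are RStata’s knobs for this):

```r
library(RStata)

# point RStata at your local Stata binary -- placeholder path,
# adjust to your machine and flavor (SE/MP/IC)
options("RStata.StataPath"    = "/usr/local/stata14/stata-se")
options("RStata.StataVersion" = 14)

# run an inline Stata command against an R data frame...
stata("summarize", data.in = mtcars)

# ...or source a whole .do file from an R Markdown chunk
stata("analysis.do")  # hypothetical file name
```

If the path or version is misaligned, this is exactly where Genghis starts swinging the mannequin head.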

This is like chaperoning Genghis Khan in a shopping mall by letting him run loose without an adult-sized child leash. He might be enjoying a delicious Cinnabon all by his lonesome, or he might be playing home run derby with a mannequin’s head.

It depends on Genghis’ mood, aka the disgruntled software versions in his computing environment.

The setup overhead is definitely an obstacle to adoption. You also need to pin the Rstudio version (undergoing rapid development) for its notebook feature, and you need to align the Stata version (with its yearly business-as-usual updates).

I can only see this being useful if Ted is a Stata user with a near-‘final’ Stata .do file that he wants to share in a reproducible manner. During his presentation to his high school history class, Ted would narrate his analysis center stage via markdown, and whenever a result needs to be piped in, he could chunk-source the .do file in Rstudio (like pulling Genghis Khan out of the phone booth). Most Excellent.


The gist below is my standalone Rnotebook demo that should work if you have Rstudio 1.0.44 and Stata 14. Your Mileage May Vary, with or without a time-traveling phone booth.



Use Rstats to Share Google Map Stars with Friends

On my trip to Japan, I took this photo of the stairs leading to the “Rucker Park of Tokyo.” I crossed up some Tokyo cats, they were garbage. That one girl behind the blue pillar was practicing her hip hop power moves. She thought no one could see, but I saw.


I’ve been traveling. I’ve been starring places on Google Maps. I want to share my recs with friends. I also want to receive recs from friends. See this Wired article that came out today!


“Google Maps” (what you use on the phone) exports ‘.json’ data


“Google My Maps” (what you share with friends) CANNOT import ‘.json’ data


For something like this, Gavin Belson would rip a new hole in some unsuspecting Hooli employee. I really hope the engineers of “Google (My) Maps” eventually roll out a backend feature that would make this post obsolete.


This is why you’re here: we’re going to fill the gap in the middle with a very easy #rstats script.

Step 1) Google Takeout > Google Maps (your places) > export json

Step 2) Use R to manipulate json then export a csv spreadsheet




Step 3) Google My Maps > Upload csv spreadsheet via Drag + Drop

Step 4) Share the url link of your new map with friends

Here’s my Google My Maps of Japan


Spread the word, use this method, play a game of around the world… around the world 😉 , and share your recs.

PS, Shoutouts to seeing Slow Magic at O-nest in Shibuya

PPS, Sending good vibes for the recovery from the recent earthquake.

@slowmagic the based god rocked the taiko for the home crowd




library(jsonlite)
library(dplyr)

# read in .json data
# Google -> Takeout -> Google Maps (My Places) -> Saved Places.json

txt = '~/projects/Saved Places.json'
dat = fromJSON(txt, flatten = TRUE)

# keep the useful parts
# (field names follow the Takeout export; the Address name may vary)
df_feat = flatten(dat$features)
df_dat = df_feat %>%
  select(`properties.Location.Business Name`,
         `properties.Location.Address`,
         `properties.Location.Geo Coordinates.Latitude`,
         `properties.Location.Geo Coordinates.Longitude`)

# subset to specific geographies
# method 1, grep for the state/country in the address (easier)

dat_jap = df_dat %>%
  filter(grepl('Japan', `properties.Location.Address`))

# export to a csv spreadsheet
write.csv(dat_jap, '~/projects/saved_places_japan.csv', row.names = FALSE)

# upload csv into Google My Maps to share

Lakers Lent: Chuck should have fasted sooner and Historical Win Trajectories

For the 2015 NBA season, the only exciting Lakers news is the return of the Kobe show and Charles Barkley’s Lakers Lent. The Lakers started the season with 0 wins and 5 losses, amazingly bad. The round mound of rebound started Lakers Lent, fasting until the Lakers won. This week, Chuck finally ate and the Lakers finally got a win, advancing to 1 and 5 against the new-look Charlotte Hornets. The following game, the Lakers lost to the Grizzlies.

What did Charles eat, you ask? Easy, I say: organic foie gras milkshakes. The interesting question is, what other times in history have teams started 1 and 5? Starting under those conditions, where did they end up, and what win-trajectory paths did they follow? For all historical 82-game seasons (thus excluding pre-1968 and the two lockout seasons), there have been 121 times where teams started 1 and 5, highlighted in cyan. Following these win paths, things look pretty grim. In general, teams end up in the tail of the pack, scum eaters, cockroaches.

However, we notice a difference between seasons. In the current era, the final location at game 82 is more spread out (more variation), so bottom-feeder teams have more hope for positive win mobility, whereas teams in the older eras were stagnant (less variation) and more likely to remain near the bottom.

So, Chuck should have had many Lents. Mavs Lent, Rockets Lent, and Knicks Lent. Before Chuck and Angelinos say “those aren’t the Lakers, they’re just wannabes that look like them,” there’s some hope. That is, if the Lakers do not purposely go all-out tank mode like the doormat 76ers. Up next, an interactive version that lets you choose the initial conditions.


As my favorite statistician, Nas, said, “no ideas original under the sun.” Substituting professions for ideas, Hadley Wickham is a modern-day blacksmith forging open-access [R] weapons. All of this analysis is possible because of open-access statistics. Specifically, a combination of rvest for web-scraping data from basketball-reference, dplyr for shaping the data, and ggplot2 for graphics.

SAS, PHO, LAL: “Untangibles” Not Captured by the Box Score

Our goal here is to structure and quantify the ‘Untangible’ attributes that do not show up in the end game boxscores. We overcome this boxscore ‘low-resolution’ measurement issue by using modern Statistical techniques, a Bayesian model. For each game, we structure the relationship between the teams’ points, their boxscores, and the teams’ additional untangible (latent) component of variation.

Deep playoff teams like OKC, MIA, and SAS appear at the top of the ‘untangibles’ chart. Surprisingly, the Toronto Raptors and the Phoenix Suns are right behind the Spurs. The Lakers and their historical rivals, the Celtics, appear near the bottom; both teams were in rebuilding mode in 2013. As a proxy for a team’s average ‘untangible’ effects, we incorporate latent variation due to important unmeasured micro-scale events.


The championship Spurs had extremely great chemistry. Coach Popovich knows a good thing when he sees it. For nearly a decade, he has steered the Spurs ship with Tim Duncan, Manu Ginobiflop, and Tony Parker as the core. ‘Eyeballing’ a Spurs game, the dazzling buffet of ball movement was easy on the eyes, like a fast-paced soccer match. Further, running more plays for the Riverside, CA native, Kawhi Leonard, was like Vin Diesel hitting the NOS button. Shoutouts to the 909.


Phoenix had pocket rockets. The dynamic duo point guard combo of Goran Dragic and Eric Bledsoe was a refreshing yet effective approach for the Suns. This system went against the traditional cookie cutter lineup you see across the league.


The rehabbing Lakers were vomiting in the boxscore on a nightly basis. It was obvious the Lakers were going to be painfully bad. With the lame-duck coaching situation of pringles D’Antoni and the spread of contagious injuries, fans knew to expect poor performance. Thus, the Lakers sit near the bottom of the untangibles chart. The small ‘untangible’ effect tells us the variation in the Lakers’ point rates was already well accounted for by their disgusting boxscore measures.


Boxscores are useful as descriptive end-game summaries. Although rife with information, you always hear the criticism, for good reason, that important ‘untangible’ attributes do not show up in the boxscore. As the name suggests, it all boils down to a purely ‘measurement’ issue: the boxscores lack the high resolution necessary for capturing dynamic, point-increasing game effects like hustle, hands in the face, box-outs, helping the help defender, spacing configurations, player interactions, etc.

Using box scores, we look at a season’s worth of (30 x 82 / 2) match-ups of pairwise (home team I versus away team J) combinations. To model the analysis (figure below), we make the following contextual assumptions:

1) We care about Wins, but we really care about Points (most points wins the game): For a game, each team’s Point Rate is the bivariate Poisson outcome.
2) Team I and team J’s point rates depend on boxscores: Via principal components of two way interactions of the raw boxscores (Field Goals, Rebounds, Turnovers, etc).
3) After accounting for boxscores, we structure what’s leftover: Use team level random effects.
4) Team I just ‘matches better’ against team J: The random effects are allowed to be pairwise correlated.
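Read together, assumptions 1–4 sketch out a model along these lines (my paraphrase of the stated structure, not the exact specification):

```latex
(Y_I, Y_J) \sim \mathrm{BivPoisson}(\lambda_I, \lambda_J), \\
\log \lambda_I = \mathbf{x}_I^\top \boldsymbol{\beta} + u_I, \qquad
\log \lambda_J = \mathbf{x}_J^\top \boldsymbol{\beta} + u_J, \\
(u_I, u_J)^\top \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})
```

where the x’s are the principal components of the boxscore interactions (assumption 2), the u’s are the team-level ‘untangible’ random effects (assumption 3), and Σ lets the effects be pairwise correlated (assumption 4).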


We demonstrated that although boxscores are limited, they are still helpful when used with appropriate methods. An obvious alternative for studying boxscore untangibles is to approach the problem with ‘high-resolution’ data, like SportVU, which directly lets you define and measure what the boxscore untangibles are. After talking to some NBA franchises, teams are barely scratching the surface of SportVU: setting up databases, defining the measures, and doing basic visualizations. To truly harness this extra information, NBA franchises need to start shifting towards model-based analysis like these guys.

Part 2: LAL, HOU, CHA, CHI – Offensive Rebounds Aren’t What They Used to Be

The Lakers have historically featured low-post offense: Mikan, Kareem, Shaq, Bynum. I’m glad to see Byron Scott continue the tradition by featuring Kobe on the low-mid post this season; best low-post footwork in the league!

The Houston Rockets were great in the 90s thanks to Akeem the Dream. Since then, the offensive rebound to win relationship hasn’t been the same.

For Charlotte, nothing much to see. Flat as pancakes.

The Bulls are interesting. There’s a very slight hill from the 90s till the present. No surprise, since The Worm and Joakim Noah have been holding down the Bulls’ offensive rebounds.

Takeaway: Accounting for team-specific characteristics gives you a clearer picture of reality. In the previous post, we looked at an ‘average team.’ That was fine if you wanted the league-wide trend. Clearly, each team is nuanced enough to differ from your ‘average team.’ Using modern Statistics, via ‘generalized additive models’ to smooth and ‘random effects’ to partially pool, you can estimate nonlinear relationships for specific teams.
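In code, that ‘smooth plus partial pooling’ combination might look like the following mgcv sketch. The data here are simulated stand-ins (win_pct, oreb, and team are made-up column names, not the post’s actual dataset):

```r
library(mgcv)  # ships with R as a recommended package

# simulated stand-in data: six teams, twenty seasons each
set.seed(1)
dat_team = data.frame(
  team = factor(rep(LETTERS[1:6], each = 20)),
  oreb = runif(120, 8, 16)   # offensive rebounds per game
)
dat_team$win_pct = 0.5 + 0.02 * sin(dat_team$oreb) + rnorm(120, 0, 0.05)

# s(oreb): league-wide nonlinear trend
# s(oreb, team, bs = "fs"): partially pooled, team-specific smooth
# deviations around that trend (the 'random effects' flavor)
fit = gam(win_pct ~ s(oreb) + s(oreb, team, bs = "fs"),
          data = dat_team, method = "REML")
```

The "fs" (factor-smooth) basis is what shrinks each team’s curve toward the league-wide one, so thin team histories borrow strength from the pool.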

To paraphrase Phil Jackson, teams should play to the strengths of their personnel. Just because you see a league-wide trend of jacking up corner threes or crashing offensive rebounds does not mean a specific team should follow suit.

Up Next: A ‘Game Level’ model to quantify a team’s ‘untangibles’ that don’t show up in the boxscores.

Offensive Rebounds Aren’t What They Used to Be

We see a historical shift in the offensive-rebounding-to-win-rate relationship. Also, notice the peaks: they show the diminishing returns once offensive rebounds go past the optimal point. Offensive rebounds aren’t what they used to be.

In the infant stages of the ABA/NBA, offensive rebounds were detrimental. Teams barely knew how to dribble a ball. Fourth place, baby!

Then, in the era of Big Men, teams realized how to capitalize on offensive rebounds.

Nowadays, Small Ball teams, with diverse scoring options, are more efficient when they go for fewer offensive rebounds.

Here, we considered all the teams in the league pooled together, for an ‘average’ team. In Part Two, we’ll take a look at each team through partial pooling.

Kobe’s FGP – Time Series of Spatial Processes and Pau Gasol’s Added Versatility

Spread across time, each colored line is a hexagonal area of the court. Based on intuition, you can guess that the areas where Kobe’s Field Goal Percentage (FGP) is higher are the ones closer to the rim and/or his favorite sweet spots. Most of them hover around 40 percent. Up through 2006, Kobe’s Court × Time FGP is pretty stable.

However, from 2007 onward, Kobe’s Court × Time FGP shows more fluctuation. Pau Gasol, anyone? He’ll be missed by LA.

Space-time datasets and analysis techniques are becoming more à la mode. First, you must choose a way to think about the joint ‘space-time’ structure (think electromagnetism).

One simple way is to consider space and then time, resulting in a time series of spatial processes. This is like a stack of papers: each piece of paper is a slice of time across space. Alternatively, you can consider time and then space, where the analogue would be a bundle of straws, each straw being a time series. The brave-hearted might pursue ‘non-separable’ techniques and look at space-time together.

In the near future, I’ll present what we can do with these ideas and this space-time dataset of Kobe’s FGP.

Bootleg CourtVision with non-proprietary NBA data and [R]

Mike likes Basketball. Mike likes Spatial Data. Mike likes Open Access.

I’m a big fan of what Kirk Goldsberry and friends are doing. I’ve been following his work since he hit the scene (Awesome).

Let’s make some bootleg CourtVision heatmaps.

Nowadays we only see Kobe in black suits riding the pine; let’s get in the hot tub and travel back to a more golden time. We have three years of regular season data beginning in 2006 and ending in 2009.

Check out 2006. Fans will remember this as the season of the 81-point game. Equally impressive, this was also the season Kobe had the crazy flurry of consecutive 50-point games (eat your heart out, Kevin Durant). Hence, we see many bright blue hexagons lit across the court (areas with higher field goal percentage). People would also geek out over his 30+ foot jumpers.

The next year, 2007, we see darker tiles near the mid-range front of the rim. This was the year we got Pau Gasol. Giving Pau low post touches was a necessity. Opposing Defenses reinforced their interior defensive schemes, resulting in difficult interior shots.

Finally, in 2008, we see the most dark spots (and the most intense). What’s interesting is the geographic distribution of these lower-field-goal-percentage areas; they seem much more “integrated” instead of being clustered in a single area (like we saw in 2007, right in front of the rim).

Our above heat map showed basic aggregates of within hexagon observations (the actual x’s and y’s). Each tile is composed of a varying number of observations. It would be nice to visually display the ‘uncertainty’ of each hexagon’s field goal percentage ‘estimate.’
So, I’ve mapped ‘shot attempts’ to the alpha (transparency) levels. Below is the result. We see the crazy gun-slinging 30+ foot jumpers as more transparent, because we observed fewer attempts in those tiles.
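The bookkeeping behind that alpha mapping is simple. Here is a base-R sketch on simulated shots (square cells stand in for the hexagons, and loc_x, loc_y, made are made-up column names):

```r
# simulated shot-chart data
set.seed(7)
shots = data.frame(
  loc_x = runif(500, -25, 25),   # feet, sideline to sideline
  loc_y = runif(500,   0, 35),   # feet from the baseline
  made  = rbinom(500, 1, 0.45)   # 1 = made field goal
)

# snap each shot to a 2x2-foot cell (hexbins in the real plot)
shots$cell = interaction(round(shots$loc_x / 2),
                         round(shots$loc_y / 2), drop = TRUE)

fgp      = tapply(shots$made, shots$cell, mean)    # maps to fill: FGP estimate
attempts = tapply(shots$made, shots$cell, length)  # maps to alpha: data behind it
```

In ggplot2, the per-cell fgp becomes the fill and attempts becomes the alpha, so thinly observed tiles fade out instead of shouting a noisy estimate.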

As an alternative, I believe Kirk maps this feature to the actual hexagon size. I wanted to dig up his original piece to get confirmation, but his archives stop on page 5.

I’d love to see and work with proprietary data, such as the trendy “SportVU” datasets. I believe the utility of these hi-res tracking datasets is the ability to define much more realistic and complex “events” (scenarios). For example, check out “Kobe Assists,” which defines Kobe’s misses as an “assist” when his teammates score off the offensive rebound. My old attempt at wrangling custom events, turnover conversions, was a pain to do.

However, this exercise demonstrates the availability of tools (ggplot2 and [R]) and resources (free data); all that’s left is applying your ability.