Get out of my way! Dunk thru #rstats errors like the Big Shaq-istician

Ahh, leaves falling, parents crying, collegicians biking uphill with a bag of in-n-out in between their teeth. Must be the new academic school year!

I figured it’s a good time to introduce my work-in-progress datzen package of miscellaneous #rstats functions. You can beeline straight to the GitHub readme for more examples.

Or stick around and I’ll highlight the Shaq example showcasing datzen::itersave().

In #rstats, if you want to iterate, you can go about it in many different ways: for loops, lapply, purrr::map. They all work pretty well for “homogeneous” iterations.

As good as they are, the standard approaches hit snags for “non-homogeneous” iterations, eg data from the web.

Go ahead, try them. I dare you.

You in 5 hours

“Aw shit, my brute force for loop crapped the bed during iteration 69. Now I have to manually restart it. I hope it doesn’t do it again. I’m running out of patience, and linen.”

Let’s take a look. The Big Aristotle, Dr. Shaq, was a notorious brute on the hardwood. Here he is, contemplating how he should score in the paint:

shaq = function(meatbag){
if(meatbag %in% 'scrub'){return('dunk on em')}
if(meatbag %in% 'sabonis'){return('elbow his face')}
if(!(meatbag %in% c('scrub','sabonis'))){
stop('shaq is confused')}
}

meatbags = c('scrub','sabonis','scrub','kobe')
names(meatbags) = paste0('arg_',seq_along(meatbags))

lapply(meatbags,FUN=shaq)
#> Error in FUN(X[[i]], ...): shaq is confused

Uh, some error confused Shaq.
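
Same story with a bare bones for loop. Here’s a minimal sketch (plain base R, nothing from datzen) of the crash. The try() wrapper is only there so the snippet runs top to bottom, and this shaq is a slightly condensed copy of the one above.

```r
shaq = function(meatbag){
  if(meatbag %in% 'scrub'){return('dunk on em')}
  if(meatbag %in% 'sabonis'){return('elbow his face')}
  stop('shaq is confused')
}

meatbags = c('scrub','sabonis','scrub','kobe')
names(meatbags) = paste0('arg_',seq_along(meatbags))

results = list()

# try() is only here so the script keeps going after the loop dies
attempt = try(
  for(nm in names(meatbags)){
    results[[nm]] = shaq(meatbags[[nm]])
  },
  silent = TRUE
)

# the loop halted at 'kobe', so only the first 3 results made it
names(results)
#> [1] "arg_1" "arg_2" "arg_3"
```

From here you’d have to eyeball where it stopped and restart the loop by hand from iteration 4.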

enter, stage trap door
“Meet itersave()”

front row faints
“It’s… hideously beautiful”

In a nutshell, itersave works like lapply, but when it meets an ugly, unskilled, unqualified, and ungraceful error it will keep trucking along like Shaquille The Diesel O’Neal hitchhiking a ride on Chris Dudley’s back.

mainDir=paste0(getwd(),'/tests/proto/')
subDir='/temp/'

itersave(func_user=shaq,
         vec_arg_func=meatbags,
         mainDir,subDir)
#> [1] "1 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_1"
#> [1] "2 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_2"
#> [1] "3 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_3"
#> [1] "4 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_4"

Here are the meatbags that Shaq successfully put into bodybags.

print('the successes')
#> [1] "the successes"
list.files(paste0(mainDir,subDir))
#> [1] "arg_1.rds" "arg_2.rds" "arg_3.rds" "failed"

It’ll also catch and bookkeep any errors (and timeouts) along the way via purrr::safely() and R.utils::withTimeout().

print('the failures')
#> [1] "the failures"
list.files(paste0(mainDir,subDir,'/failed/'))
#> [1] "arg_4.rds"
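
If you’re wondering what “save as you go” looks like under the hood, here’s a stripped-down sketch in base R. To be loud about it: this is NOT datzen’s implementation. The name itersave_sketch is made up, it uses tryCatch() instead of purrr::safely(), it has no timeout handling, and it assumes the input vector is named. It just shows the general idea of writing each result to disk the moment it exists and shunting failures into a failed folder.

```r
# hypothetical helper, not the real datzen::itersave()
itersave_sketch = function(func_user, vec_arg_func, dir_out){
  dir_fail = file.path(dir_out, 'failed')
  dir.create(dir_fail, recursive = TRUE, showWarnings = FALSE)

  for(i in seq_along(vec_arg_func)){
    nm = names(vec_arg_func)[i]

    # catch the error instead of letting it kill the loop
    res = tryCatch(func_user(vec_arg_func[[i]]), error = function(e) e)

    if(inherits(res, 'error')){
      # failures are saved with their index and the offending input
      saveRDS(list(ind_fail = i,
                   input_bad = vec_arg_func[[i]],
                   result_bad = res),
              file.path(dir_fail, paste0(nm, '.rds')))
    } else {
      # successes are saved right away, one file per iteration
      saveRDS(res, file.path(dir_out, paste0(nm, '.rds')))
    }
  }
  invisible(NULL)
}

shaq = function(meatbag){
  if(meatbag %in% 'scrub'){return('dunk on em')}
  if(meatbag %in% 'sabonis'){return('elbow his face')}
  stop('shaq is confused')
}

meatbags = c('scrub','sabonis','scrub','kobe')
names(meatbags) = paste0('arg_',seq_along(meatbags))

# a throwaway folder so nothing lands in your project
dir_out = file.path(tempdir(), 'itersave_sketch')
itersave_sketch(shaq, meatbags, dir_out)

list.files(dir_out)
list.files(file.path(dir_out, 'failed'))
```

Because every iteration hits the disk on its own, a crash (or a pulled plug) at iteration 69 still leaves iterations 1 thru 68 sitting safely in the folder.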

itersave handles the out. For the in, it has a companion.

enter, zipline from balcony
“Meet iterload()”

audience faints

iterload(paste0(mainDir,subDir,'/failed'))
#> $arg_4
#> $arg_4$ind_fail
#> [1] 4
#> 
#> $arg_4$input_bad
#> [1] "kobe"
#> 
#> $arg_4$result_bad
#> <simpleError in (function (meatbag) {    if (meatbag %in% "scrub") {        return("dunk on em")    }    if (meatbag %in% "sabonis") {        return("elbow his face")    }    if (!(meatbag %in% c("scrub", "sabonis"))) {        stop("shaq is confused")    }})("kobe"): shaq is confused>

Ah, it was the 4th argument, Kobe, that boggled Shaq’s mind.

“Jigga man [was] Diesel, when he [used to] lift the 8 Up” – Jay-Z

*Wiping away my sad Laker tear from my face while I type this*

“What could have been man, what could have been.”

R.I.P Frank Hamblen

Anyways, Shaq wised up in Miami. He also fattened up in Phoenix, Cleveland, Boston, Hawaii, Catalina, etc.

shaq_wiser = function(meatbag){
if(meatbag %in% 'scrub'){return('dunk on em')}
if(meatbag %in% 'sabonis'){return('elbow his face')}
if(meatbag %in% 'kobe'){return('breakup & makeup')}

if(!(meatbag %in% c('scrub','sabonis','kobe'))){
stop('shaq is confused')}
}

itersave(func_user=shaq_wiser,
         vec_arg_func=meatbags,
         mainDir,subDir)
#> [1] "1 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_1"
#> [1] "2 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_2"
#> [1] "3 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_3"
#> [1] "4 of 4"
#> [1] "2017-10-01 12:35:14 PDT"
#> [1] "arg_4"

So, give me the whole shebang. What was the whole story of Shaq’s road trip?

out_il = iterload(paste0(mainDir,subDir))
cbind(meatbags,out_il)
#>       meatbags  out_il            
#> arg_1 "scrub"   "dunk on em"      
#> arg_2 "sabonis" "elbow his face"  
#> arg_3 "scrub"   "dunk on em"      
#> arg_4 "kobe"    "breakup & makeup"

So, if you use bare bones for loops or lapply you’ll crap out immediately when you hit an error.

On the other hand, even purrr::map with purrr::safely will, by design, do everything in one shot (eg batch results): you only get your hands on the results once the whole run has finished. This is not ideal when working with stuff online. When you backtrack to resolve unforeseen edge-cases, it’ll feel like a cantor-set.
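
For the record, here’s roughly what the purrr route looks like. A minimal sketch, assuming purrr is installed, with a condensed copy of shaq from earlier. It survives the error just fine, but everything comes back as one in-memory list at the very end, with nothing written to disk along the way.

```r
library(purrr)

shaq = function(meatbag){
  if(meatbag %in% 'scrub'){return('dunk on em')}
  if(meatbag %in% 'sabonis'){return('elbow his face')}
  stop('shaq is confused')
}

meatbags = c('scrub','sabonis','scrub','kobe')
names(meatbags) = paste0('arg_',seq_along(meatbags))

# safely() wraps shaq so each call returns list(result = , error = )
out_batch = map(meatbags, safely(shaq))

# TRUE where the call succeeded (ie no error was captured)
ok = map_lgl(out_batch, function(x) is.null(x$error))

# the inputs that tripped Shaq up
names(meatbags)[!ok]
#> [1] "arg_4"
```

If the R session dies halfway thru the map() call, out_batch never gets assigned and you’re back to square one.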

For web data in the wild, expect the unexpected. That’s why I baked up itersave. You have non-homogeneous edge cases aplenty.

These Chris Dudley looking edge cases are just waiting in the bushes for you.

Dunk thru them.
