solving for better small-data systems

A Warning About warning()

Avoid R’s warning feature. This is particularly important if you use R in production, where you regularly run R scripts as part of your business process. It is also important if you author R packages. Don’t issue warnings in your own code, and treat warnings in 3rd party code as errors unless you explicitly suppress them. I’ll describe a strategy for doing this shortly, but first let's look at the warning feature and the justification for this advice.

The Warning

A warning cautions users without halting the execution of a function. It basically says “although I can and will give you an answer, there might be a problem with your inputs. Thus, the computation could be flawed.” For example, the correlation function issues a warning when an input vector has a standard deviation of 0. Rather than raising an error and halting execution, the function issues a warning and returns NA.

> cor( c( 1 , 1 ), c( 2 , 3 ) )
[1] NA
Warning message:
In cor(c(1, 1), c(2, 3)) : the standard deviation is zero

To issue a warning, simply call the warning() function:

Division = function( x , y ) {
    if ( y == 0 ) { warning( "Divide by 0" ) }
    return( x / y )
}
> Division( 1 , 0 )
[1] Inf
Warning message:
In Division(1, 0) : Divide by 0

Multiple warnings can be issued during a function call. They can originate from the function itself or from child functions. By default, R stores all of the warnings and prints them out after the top-level call returns.

ParentFunction = function( ) {
    warning( "warning from parent" )
    ChildFunction1()
    ChildFunction2()
    warning( "second warning from parent" )
    return( TRUE )
}

ChildFunction1 = function() {  warning( "warning from child" ) }
ChildFunction2 = function() {  warning( "second warning from child" ) }

> ParentFunction()
[1] TRUE
Warning messages:
1: In ParentFunction() : warning from parent
2: In ChildFunction1() : warning from child
3: In ChildFunction2() : second warning from child
4: In ParentFunction() : second warning from parent
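
Warnings are ordinary condition objects, so they can be intercepted instead of printed. The following is a minimal sketch of that idea, not something from the original post: CollectWarnings and Noisy are hypothetical names used for illustration, while withCallingHandlers(), conditionMessage() and the "muffleWarning" restart are base R.

```r
# Hypothetical helper: run an expression and return its value together
# with the messages of every warning raised while it ran.
CollectWarnings = function( expr ) {
    messages = character( 0 )
    value = withCallingHandlers(
        expr ,
        warning = function( w ) {
            # Record the message, then resume right after the warning() call
            messages <<- c( messages , conditionMessage( w ) )
            invokeRestart( "muffleWarning" )
        }
    )
    list( value = value , warnings = messages )
}

# Hypothetical function that warns twice and still returns a value
Noisy = function() {
    warning( "first" )
    warning( "second" )
    return( TRUE )
}

result = CollectWarnings( Noisy() )
```

Here result$value holds the function's return value and result$warnings holds the two messages in the order they were raised. The function ran to completion despite both warnings.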

Avoid Warning

You should avoid warnings (as a producer and a consumer) for three reasons:

  1. They add unnecessary complexity.
    Functions should either succeed or fail and it should be as simple as that. Warnings introduce a third state. A function author must consider when to issue a warning, how to message it, and how the function will proceed in the warning/caution state. It's difficult enough to write a good function without this burden. On the flip side, a function consumer must decide how to handle a warning. Do you continue or halt? Does it depend on the warning? If so, do you switch on specific warning messages or keywords within the message? What if the warning message changes with an updated package? It’s far less complex when a function just fails; there’s less for both parties to do.
  2. Most warnings are actually errors.
    This is anecdotal, but I’d bet it generalizes well: about 80% of the warnings I see turn out to be caused by a bug in the inputs to a function call.
  3. You will miss critical errors.
    In an automated/production environment, dealing with warnings is a pain. You might miss warnings outright because no human will see them. Per #2 above, this could mean bad things. Otherwise, you will have to pipe a script’s output to a file, parse it to detect the warning, and notify someone. Let's say you pull that off. You are still betting that someone will heed the warnings in a timely manner. You are also left with the time-consuming and ambiguous task of deciding which warnings to address. If, however, your script just fails, I’m guessing you will find out pretty quickly and you will have to pay attention to the issue.
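
To make the consumer's dilemma from reason #1 concrete, here is a rough sketch of what "switching on the warning message" looks like. SafeCor is a hypothetical wrapper written for this illustration. The matched text is the cor() message shown earlier, and nothing guarantees that wording stays the same across R versions.

```r
# Hypothetical consumer-side wrapper: decide what to do based on the
# text of the warning. The pattern is tied to the current wording.
SafeCor = function( x , y ) {
    tryCatch(
        cor( x , y ) ,
        warning = function( w ) {
            msg = conditionMessage( w )
            if ( grepl( "standard deviation is zero" , msg , fixed = TRUE ) ) {
                return( NA_real_ )   # the one case we anticipated
            }
            stop( "unexpected warning: " , msg )   # everything else fails
        }
    )
}

flat = SafeCor( c( 1 , 1 ) , c( 2 , 3 ) )
```

This works only while the message text stays the same. R can also translate messages depending on the session language, so the match is fragile in that respect too. A function that simply failed would need none of this.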

Step 1: Stop Warning

Instead of issuing a warning, do one of the following:

  1. Call stop("error message here"). This will generate an error and halt execution.
  2. Add a function parameter that specifies the action to take when the inputs are malformed. This is most likely a boolean (logical) parameter that indicates whether the function should fail or is permitted to proceed. Use a descriptive parameter name. Some good prefixes are ignore and fail. For example, I’ve added parameters failSd0 and ignoreSd0 below.
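# isTRUE( all.equal( ... , 0 ) ) tests for zero within a numeric tolerance

# Variant 1: a "fail" flag
Cor = function( x , y , failSd0 ) {
    if ( failSd0 && isTRUE( all.equal( sd( x ) , 0 ) ) ) { stop( "the standard deviation of x is 0" ) }
    if ( failSd0 && isTRUE( all.equal( sd( y ) , 0 ) ) ) { stop( "the standard deviation of y is 0" ) }
    # ... otherwise compute and return the correlation
}

# Variant 2: an "ignore" flag
Cor = function( x , y , ignoreSd0 ) {
    if ( ( !ignoreSd0 ) && isTRUE( all.equal( sd( x ) , 0 ) ) ) { stop( "the standard deviation of x is 0" ) }
    if ( ( !ignoreSd0 ) && isTRUE( all.equal( sd( y ) , 0 ) ) ) { stop( "the standard deviation of y is 0" ) }
    # ... otherwise compute and return the correlation
}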

Just deciding between approach #1 and approach #2 is a useful exercise. It challenges you to judge the utility of a function in the face of specific categories of input (e.g. correlation when an input vector has a standard deviation of 0). You might conclude that a function’s outputs have little to no value when the inputs have a particular quality. In that case, you would use approach #1 and just stop() the execution. This is the least complex thing to do.

Although solution #2 is a bit more work, it provides a wonderful benefit to your users. It gives them information about how not to call your function. In other words, it helps them minimize failure. This information is nicely embedded in the parameter list. When a user sees the parameter failSd0, it triggers the most obvious question: “Is it possible that my x or y has a standard deviation of 0? Maybe I should check the code that populates x and y just in case.”
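
As one possible end-to-end version of approach #2, here is a hedged sketch of a complete function. CorStrict is a hypothetical name. Defaulting failSd0 to TRUE and returning NA when the caller opts out are both assumptions of this sketch rather than anything prescribed above.

```r
# Hypothetical wrapper around cor() that refuses zero-variance inputs
# unless the caller explicitly opts out with failSd0 = FALSE.
CorStrict = function( x , y , failSd0 = TRUE ) {
    # Compare against 0 with a numeric tolerance rather than ==
    xFlat = isTRUE( all.equal( sd( x ) , 0 ) )
    yFlat = isTRUE( all.equal( sd( y ) , 0 ) )
    if ( failSd0 && xFlat ) { stop( "the standard deviation of x is 0" ) }
    if ( failSd0 && yFlat ) { stop( "the standard deviation of y is 0" ) }
    # The caller accepted this case, so return NA without ever warning
    if ( xFlat || yFlat ) { return( NA_real_ ) }
    return( cor( x , y ) )
}

strictResult = tryCatch(
    CorStrict( c( 1 , 1 ) , c( 2 , 3 ) ) ,
    error = function( e ) conditionMessage( e )
)
lenientResult = CorStrict( c( 1 , 1 ) , c( 2 , 3 ) , failSd0 = FALSE )
```

With the default, the call fails loudly. With failSd0 = FALSE, the caller has stated in code that a standard deviation of 0 is expected and acceptable, and the function returns quietly.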

Step 2: Strategically Maneuver 3rd Party Warnings

To consume 3rd party code (e.g. CRAN packages) that issues warnings, do the following:

  1. Treat all warnings as errors.
    Take the complexity out of the equation by first assuming that all warnings are in fact errors. Simply call options( warn = 2 ). R will convert the first warning it encounters into an error and halt execution. I recommend putting this into a startup file (for example .Rprofile) or at the top of your scripts so that this treatment is always applied.
  2. Selectively suppress warnings.
    There are times when a function issues a warning and you are certain that your inputs are correct and you are happy with the results. In this case, wrap the function call in suppressWarnings().
> options( warn = 2 )
> ParentFunction()
Error in ParentFunction() : (converted from warning) warning from parent
> suppressWarnings( ParentFunction() )
[1] TRUE

This is a case-by-case opt-in. Don’t wrap everything in suppressWarnings()! Be sure to add a clear justification in the comments:

# FictitiousFunction warns when myValue is 0.  myValue will always be 0.  
# This only affects result$ABC output which is fixed at NA.  We don't use this output.  
# All other elements of the result list are as expected.
suppressWarnings( FictitiousFunction( myValue ) )
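
The two steps can be combined in a single script. The sketch below is illustrative only: ThirdParty is a hypothetical stand-in for a package function that warns, and saving and restoring the previous option value is a habit assumed here, not something the strategy requires.

```r
# Treat warnings as errors, remembering the previous setting
oldOptions = options( warn = 2 )

# Hypothetical third-party function that warns but still returns a value
ThirdParty = function( v ) {
    if ( v == 0 ) { warning( "v is 0" ) }
    return( v + 1 )
}

# Unsuppressed: the warning is converted into an error, captured here
# only so that the message can be inspected
outcome = tryCatch( ThirdParty( 0 ) , error = function( e ) conditionMessage( e ) )

# Explicit opt-in: v is known to be 0 and the result is still what we want
value = suppressWarnings( ThirdParty( 0 ) )

options( oldOptions )   # restore the previous warning level
```

In a real production script you would leave out the tryCatch() and let the error halt the run. It is used here only to show that the first call fails while the suppressed call returns its value.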

Going forward, your code could find itself in a state that triggers a 3rd party warning that you’ve never seen before. Congrats! This is a good problem to have. Your code will raise an error. You can investigate the issue and, in turn, learn more about the package you depend on and about the behavior of your own code. And if the warning turns out to be benign, you can always wrap the call in suppressWarnings(). Your code will become more robust and more resilient.

Practice What You Preach

I have many thousands of lines of R code in production and I employ the strategy described above to avoid warnings. It’s a beautiful thing: I have never seen a single raw warning from R. My code simply passes or fails; there’s no third state to deal with. Also, I sleep well knowing that my nightly processes don’t mask errors that are expressed as warnings.


