In this post, I will give you a quick tip on how to make your life a little easier when working with data collected with SoSci Survey. In particular, I will show you how you can easily identify your variables by labelling them according to the details you provided when you created your questionnaire.

1 The Problem

When you create your SoSci Survey questionnaires, you feed in important information about the variables you want to measure: you choose the question type and thus the level of measurement, you define the answer options a participant can choose from, and you specify which items operationalize a given variable.

When you import the collected data into R using the R API, though, some of that information gets lost. In particular, you end up with a dataframe whose columns are named after short codes that SoSci Survey generates automatically (such as NV01). This can make it hard to tell which column stands for which item.

Figure 1: An Example of Hard-to-Understand Variable Names

2 The Solution

2.1 The Function comment

Luckily, SoSci Survey stores the description of each variable in an attribute called comment. You can read it with the base R function of the same name:

comment(data$NV01)

This returns the description you entered for the variable in SoSci Survey. In our example, it yields

"Social Media Regelmäßigkeit"

That is German for "social media regularity", so now we finally know what NV01 was used for!
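If you want to try comment() without a SoSci Survey export at hand, you can attach a comment to a column yourself. The following is a minimal sketch that imitates what the imported data looks like; the dataframe, its values and the second column NV02 are made up for illustration.

```r
# Toy stand-in for an imported SoSci Survey dataframe (hypothetical values)
data <- data.frame(NV01 = c(2L, 4L, 1L), NV02 = c(3L, 3L, 5L))

# Attach a description as the "comment" attribute of a column.
# "\u00e4" and "\u00df" are escapes for the umlaut and the sharp s,
# so the script does not depend on the file encoding.
comment(data$NV01) <- "Social Media Regelm\u00e4\u00dfigkeit"

# Read it back; a column without a comment returns NULL
comment(data$NV01)
comment(data$NV02)
```

Note that printing data$NV01 does not show the comment: unlike other attributes, R deliberately hides it when printing, which is one reason it is easy to overlook.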

2.2 Comments as Labels

As we have just seen, the comment function gives us the description of a variable in our dataframe. Calling it once for every variable is cumbersome, though, so we will write a small function to do the work for us.
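Before writing that function, it can help to get an overview of all descriptions at once. This is a base R sketch using vapply(); the toy dataframe and its descriptions are invented, and it only lists the comments rather than turning them into labels.

```r
# Toy stand-in for an imported SoSci Survey dataframe (hypothetical)
data <- data.frame(NV01 = c(2L, 4L, 1L), NV02 = c(3L, 3L, 5L), CASE = 1:3)
comment(data$NV01) <- "Social media regularity"
comment(data$NV02) <- "Hypothetical second item"

# Collect every column's comment into one named character vector.
# Columns without a comment get NA instead of NULL, so the result
# always has exactly one entry per column.
descriptions <- vapply(
  data,
  function(column) {
    text <- comment(column)
    if (is.null(text)) NA_character_ else text[1]
  },
  character(1)
)

descriptions
```

The result works as a small lookup table: descriptions[["NV01"]] gives the text for a single code.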

To write the function, we first need the package labelled:1

ensure_packages("labelled")  # self-made helper (see footnote 1); library(labelled) works as well if the package is installed

This package provides the function var_label(), which reads and writes the labels of variables and dataframes. The great thing about labels is that RStudio displays them below the variable names when you open a dataframe in its data viewer. So if we turn our SoSci Survey comments into labels, every variable gets a short description right under its name. What you get, then, looks something like this:

Figure 2: An Example of Better-to-Understand Variable Names
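Under the hood, a variable label is simply an attribute named label attached to the column; this sketch assumes that is where var_label() stores it and what the RStudio viewer reads. It sets and reads that attribute with base R only, so it runs without labelled; the toy dataframe is made up.

```r
# Toy dataframe (hypothetical values)
data <- data.frame(NV01 = c(2L, 4L, 1L))

# Write a label: this sets the "label" attribute on the column vector
attr(data$NV01, "label") <- "Social media regularity"

# Read it back
attr(data$NV01, "label")

# The comment attribute is a separate attribute and stays untouched (NULL here)
comment(data$NV01)
```

With labelled loaded, var_label(data$NV01) should return the same string, and var_label(data) returns the labels of all columns as a named list.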

2.3 The Actual Function

So let us write a function that does exactly this. It should take a dataframe and, for each of its variables, set the variable's comment attribute as its label. A simple for loop makes quick work of it:

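comment_to_label <- function(dataframe){
  # names of all columns in the dataframe
  variables <- colnames(dataframe)
  for (variable in variables){
    # copy the column's SoSci Survey comment into its label
    # (a column without a comment ends up without a label)
    var_label(dataframe[[variable]]) <- comment(dataframe[[variable]])
  }
  # return the modified copy; the caller has to reassign it
  return(dataframe)
}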

The function definition is easy to follow: we store all column names of our dataframe in the character vector variables. Then, for every element of that vector, which we call variable, we set the column's label to its comment. Finally, we return the changed dataframe.
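If you would rather avoid the explicit loop, or do without the labelled dependency, the same idea can be expressed with lapply(). The following is a hypothetical alternative, comment_to_label_base(), not the function above; it assumes that a label is simply an attribute called label on the column, which is where var_label() keeps it.

```r
# Hypothetical loop-free variant of comment_to_label() using base R only.
# Assumption: setting the "label" attribute is equivalent to var_label().
comment_to_label_base <- function(dataframe) {
  # `dataframe[] <- ...` replaces the columns but keeps the data.frame
  # class, its row names and its column names
  dataframe[] <- lapply(dataframe, function(column) {
    # comment() returns NULL for columns without a comment;
    # assigning NULL leaves (or makes) the column unlabelled
    attr(column, "label") <- comment(column)
    column
  })
  dataframe
}

# Usage with a toy dataframe (hypothetical values)
data <- data.frame(NV01 = c(2L, 4L, 1L), CASE = 1:3)
comment(data$NV01) <- "Social media regularity"
data <- comment_to_label_base(data)
attr(data$NV01, "label")
```

Like the loop version, it returns a new dataframe instead of modifying the original, so the result still has to be assigned.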

Note that comment_to_label does not alter the dataframe in your global environment: R functions work on a copy of an argument as soon as they modify it, so the function only returns the changed dataframe and leaves the original untouched. There is a way to modify the global object directly using the <<- operator, but I strongly advise against it. Instead, we need the line

data <- comment_to_label(data)

to make sure that the dataframe in the global environment actually changes. With that, we have written a function that turns SoSci Survey comments into labels and, thus, made our dataframe more readable!
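To see why the reassignment matters, here is a toy sketch with a made-up helper, add_label(), that modifies its argument inside the function. Without reassignment, the object in the global environment keeps its old state.

```r
# Hypothetical helper that labels the first column of a dataframe
add_label <- function(dataframe) {
  attr(dataframe[[1]], "label") <- "Some description"
  dataframe  # the modified local copy is only returned, never written back
}

data <- data.frame(NV01 = c(2L, 4L, 1L))

add_label(data)                    # result is printed, then discarded
is.null(attr(data$NV01, "label"))  # TRUE: `data` itself is unchanged

data <- add_label(data)            # reassigning keeps the result
attr(data$NV01, "label")           # "Some description"
```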


  1. Here I am using a self-made function to ensure that the package is both installed and loaded. If you are interested in how I do it, check out this post. If not, just load the package as usual with library(labelled).
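For readers who want the call ensure_packages("labelled") to work as written, here is one possible stand-in. It is an independent minimal sketch, not the implementation from the linked post: it assumes packages come from CRAN via install.packages() and attaches each one with library().

```r
# Hypothetical stand-in for the self-made helper mentioned above:
# install missing packages, then load (attach) all of them.
ensure_packages <- function(packages) {
  # TRUE for every package whose namespace can already be loaded
  installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
  missing <- packages[!installed]
  if (length(missing) > 0) {
    install.packages(missing)
  }
  for (package in packages) {
    library(package, character.only = TRUE)
  }
  invisible(packages)
}
```

After defining it, ensure_packages("labelled") installs labelled if it is missing and then loads it.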