Welcome to the R for Marketers II tutorial!

This is an indirect follow-up to the first tutorial.

Before we start, let’s assume that:

You understand the very basic elements of R
You’re looking for a way to give your team access to your scripts
You want to impress your boss with Docker (if you don’t know what it is, don’t worry: it is explained further down)
You deal with Google Ads and/or Analytics

Problem introduction

Goal

In the first R for Marketers we covered the very basics of creating a script that queries Google’s API. The natural next step is to let the non-coder peers in our marketing agency or department use our code.

We will be making a script that sums the Conversions, Clicks and Cost for a number of search term word combinations. Such combinations are called “Ngrams”: sequences of N consecutive words taken from a text. Ngrams can be very useful when planning both negative and non-negative keywords.

Better explanation of the usefulness of Ngrams

Let’s say we have the following search term:

I want a loan

A 1-gram splitting would be: “I”, “want”, “a”, “loan”

A 2-gram splitting would be: “I want”, “want a”, “a loan”

A 3-gram splitting would be: “I want a”, “want a loan”
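The splitting above can be sketched in a few lines of base R. This is only an illustration: make_ngrams is a hypothetical helper name, not part of any package, and it lowercases the text, which is also what tidytext does by default.

```r
# Hypothetical helper: split one search term into word n-grams (base R only).
make_ngrams <- function(term, n) {
  # Lowercase, trim, and split on whitespace to get the individual words
  words <- strsplit(tolower(trimws(term)), "\\s+")[[1]]
  if (length(words) < n) {
    return(character(0))  # the term has fewer than n words
  }
  # Slide a window of n words across the term
  starts <- seq_len(length(words) - n + 1)
  vapply(
    starts,
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

make_ngrams("I want a loan", 1)
make_ngrams("I want a loan", 2)
make_ngrams("I want a loan", 3)
```

Each call returns a character vector, one element per n-gram, in the order the words appear in the search term.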

How could this be effectively used?

Suppose you manage a big account in which many search terms have converted over the past month.

Adding them all manually would be too tedious, and your API engineer may simply be busy.

A neat way around this is to find the number of Conversions for every N-gram size you think is relevant. For example, 2-grams will most likely give you the most search term ideas, while 3-grams may be closer to what a person would actually type.

Once you have the number of conversions for every N-gram, just add the N-grams that the top converting search terms stem from as keywords.

That way you can cover a large number of search terms with a few broad match keywords at once.
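To make the idea concrete, here is a minimal base R sketch that sums conversions per 2-gram. The search terms and conversion counts are invented for illustration, and make_ngrams is a hypothetical helper. Note that each search term’s conversions are credited to every N-gram it contains.

```r
# Toy data, invented for illustration only.
search_terms <- data.frame(
  term = c("i want a loan", "cheap loan online", "want a loan fast"),
  conversions = c(3, 1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: word n-grams of a single search term.
make_ngrams <- function(term, n) {
  words <- strsplit(tolower(trimws(term)), "\\s+")[[1]]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# One row per (n-gram, conversions) pair for every search term
rows <- lapply(seq_len(nrow(search_terms)), function(i) {
  grams <- make_ngrams(search_terms$term[i], 2)
  data.frame(
    ngram = grams,
    conversions = rep(search_terms$conversions[i], length(grams)),
    stringsAsFactors = FALSE
  )
})

# Sum the conversions per n-gram and sort from best to worst
per_ngram <- aggregate(conversions ~ ngram, data = do.call(rbind, rows), FUN = sum)
per_ngram <- per_ngram[order(-per_ngram$conversions), ]
print(per_ngram)
```

The N-grams at the top of the table are the candidates worth adding as keywords.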

Preparing

Let’s pick easy-to-use packages that showcase how simple and intuitive R can be.

We will need:

install.packages("plumber") # in order to make our code a service
install.packages("dplyr") # to wrangle our data
install.packages("stringr") # convenient string functions
install.packages("googlesheets") # to automatically create sheets for our work peers
install.packages("RAdwords") # getting data about our google ads accounts, ads etc
install.packages("tidytext") # package with the ngram algorithm and some helper functions

Before writing the script, we need a little trick.

The RAdwords package does not let you authenticate without user interaction. It has only one way of authenticating, which is calling the doAuth() function.

On its first call, doAuth() annoyingly prompts you on the R console.

However, if we set this up as a service, authentication must not need any manual input, because we do not want to give anyone direct access to our code.

A way around this is to save the authentication files and use the same files in the service. Every time the service is called, it is automatically authenticated with your own credentials.

But then you might think, isn’t that unsafe?

Yes, it is.

However, there are some very obvious steps we can take to ensure at least some protection until someone updates the RAdwords package to support non-interactive authentication:

Use a private GitHub repo to keep the code. Only leave the Dockerfile there.
Use a private Docker Hub repo to keep the Docker image in.
Make sure the service links to a Google Sheets file that only people with email addresses from within your company can access.

I will assume you will do all of that. It’s really simple.

Now quickly type the following in your console, and follow the instructions.

doAuth()

Two files were created: .google.auth.RData and .httr-oauth. We will use both of these files in our Docker image.
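Since both files are hidden, it is easy to forget one of them when copying the project around. Here is a small base R sketch that checks they are present. missing_auth_files is a hypothetical helper, and the comparison ignores upper and lower case in the file names, so adjust the expected names if yours differ.

```r
# Hypothetical helper: report which expected credential files are missing
# from a folder. The comparison ignores upper/lower case in file names.
missing_auth_files <- function(dir = ".",
                               expected = c(".google.auth.RData", ".httr-oauth")) {
  # all.files = TRUE is needed because both files are hidden (start with a dot)
  present <- tolower(list.files(dir, all.files = TRUE))
  expected[!tolower(expected) %in% present]
}

missing <- missing_auth_files()
if (length(missing) > 0) {
  message("Missing credential files: ", paste(missing, collapse = ", "))
} else {
  message("Both credential files found.")
}
```

Run it from the project folder before building the image.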

Building the script

The script, shown in full further down, is quite simple.

It really is, but there are some points of interest, such as the #* parts.

When we talked about making it a “service”, we actually meant that the script would become an API.

What is an API? API stands for Application Programming Interface. Think of it as transforming your code into an access point, where people can freely use your code without asking you, or you having to share the code itself with them.

In order to make the users go to a specific address and request an action to be done (in our case, create a sheet from a specified client ID) we need to create “routes” so they reach the script file.

The #* @get /xxxx line defines the route that the user needs to reach in order to use our function.

Each #* @param xxx line describes one parameter of that function.

Let’s say that a client ID is 999-666-3333.

If an account manager would like to get a list of 2-grams for that account for the period between 2018-01-01 and 2018-03-03, they’d have to open a terminal and type this:

curl "http://ip_address:port/plumber/ngrams?client_id=999-666-3333&ngram_n=2&start_date=2018-01-01&end_date=2018-03-03"
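The request address is just the route followed by name=value pairs joined with &. As a rough sketch, the address can be assembled in base R like this. build_ngrams_url is a hypothetical helper, and ip_address:port is the same placeholder used above.

```r
# Hypothetical helper: assemble the request URL for the /plumber/ngrams route.
build_ngrams_url <- function(base_url, client_id, ngram_n, start_date, end_date) {
  # c() turns every value into text, which is how it travels in the URL
  params <- c(
    client_id = client_id,
    ngram_n = ngram_n,
    start_date = start_date,
    end_date = end_date
  )
  # Escape any characters that are not allowed inside a query string
  encoded <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  # Pairs look like name=value and are joined with "&"
  query <- paste0(names(params), "=", encoded, collapse = "&")
  paste0(base_url, "/plumber/ngrams?", query)
}

url <- build_ngrams_url(
  base_url = "http://ip_address:port",  # placeholder, replace with your own
  client_id = "999-666-3333",
  ngram_n = 2,
  start_date = "2018-01-01",
  end_date = "2018-03-03"
)
print(url)
```

Only the first parameter is preceded by ?, every following one by &.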


library(RAdwords)
library(dplyr)
library(stringr)
library(googlesheets)
library(plumber)
library(tidytext)

#* Returns a table of ngrams
#* @param ngram_n The number of words per Ngram
#* @param client_id The client's id
#* @param start_date The starting date (YYYY-MM-DD)
#* @param end_date The ending date (YYYY-MM-DD)
#* @get /plumber/ngrams
function(ngram_n, client_id, start_date, end_date) {
  # Query parameters arrive as text, so convert the Ngram size to a number
  ngram_n <- as.integer(ngram_n)

  # Reuses the saved authentication files
  google_auth <- doAuth()

  # Describe the search terms report we want
  searchterms_body <- statement(
    select = c(
      "CampaignName",
      "AdGroupName",
      "Conversions",
      "Query",
      "Clicks",
      "Cost"),
    report = "SEARCH_QUERY_PERFORMANCE_REPORT",
    where = "AdNetworkType1 = SEARCH",
    start = start_date,
    end = end_date)

  data_query_test <- getData(
    clientCustomerId = client_id,
    google_auth = google_auth,
    statement = searchterms_body)

  # Split every search term into Ngrams and sum the metrics per Ngram
  two_grams <- data_query_test %>%
    unnest_tokens(ngram, Searchterm, token = "ngrams", n = ngram_n) %>%
    group_by(ngram) %>%
    summarise(
      Conversions = sum(Conversions),
      Clicks = sum(Clicks),
      Cost = sum(Cost)) %>%
    dplyr::filter(!is.na(ngram)) %>%
    arrange(desc(Conversions)) %>%
    top_n(50, Conversions)  # rank by Conversions, not by the last column

  # Totals row for the Ngrams that were kept
  two_grams_total <- two_grams %>%
    summarise(
      Conversions = sum(Conversions),
      Clicks = sum(Clicks),
      Cost = sum(Cost))

  two_grams_final <- two_grams %>%
    bind_rows(two_grams_total) %>%
    mutate(ngram = replace(ngram, is.na(ngram), "Total"))

  # Create a new Google Sheet and return its link
  link_to_grams <- gs_new(title = client_id, input = two_grams_final)

  link_to_grams$browser_url
}
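Values sent in the query string typically reach a plumber function as text, so it can help to check and convert them before using them. Here is a minimal sketch, assuming dates are written as YYYY-MM-DD. parse_ngram_params is a hypothetical helper and is not part of the script above.

```r
# Hypothetical helper: check and convert raw query-string values.
parse_ngram_params <- function(ngram_n, start_date, end_date) {
  # "2" becomes 2L; anything that is not a number becomes NA
  n <- suppressWarnings(as.integer(ngram_n))
  if (is.na(n) || n < 1) {
    stop("ngram_n must be a positive number")
  }
  # With an explicit format, as.Date() returns NA for text it cannot parse
  start <- as.Date(start_date, format = "%Y-%m-%d")
  end <- as.Date(end_date, format = "%Y-%m-%d")
  if (is.na(start) || is.na(end)) {
    stop("dates must look like 2018-01-01")
  }
  if (start > end) {
    stop("start_date must not be after end_date")
  }
  list(ngram_n = n, start_date = format(start), end_date = format(end))
}

params <- parse_ngram_params("2", "2018-01-01", "2018-03-03")
str(params)
```

Calling a helper like this on the first line of the API function would turn a badly typed request into a readable error message.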

Now that we have our script (save it as main.R), we need to create the file that will “plumb” our script and launch it.

To “plumb” a script means to transform the script into a web service. Basically, it will “launch” our API onto the internet.

Create a file named plumber.R with this content:

library(plumber)

r <- plumb("main.R")
r$run(port = 80, host="0.0.0.0")

The scripts are done.

What is Docker and why do we need it?

Having the script on hand, you might think:

“How can I very easily, without anyone’s help, make this available on the internet, without requiring anyone to install R or even download my scripts?”

The answer lies in 6 letters. Docker.

Think of Docker as a mini virtual machine that only keeps the exact essentials to run your application.

If you put your script in Docker, anyone can use it at any time and on (almost) any machine, provided they have Docker installed, regardless of whether they have R or not.

With Docker you can build once, and ship anywhere!

Download Docker, install it and make sure it is running before the next step.

Creating and setting your Docker Container up

Since a lot of people use Docker, most certainly someone has already attempted to put R scripts in it.

Instead of making our own image from scratch, we can use a base image and simply add our scripts and packages to it.

We will use a base image from the Rocker project. It simply contains R 3.5.2.

In order to tell Docker how to copy our files and build upon Rocker’s image, we need to create a Dockerfile.

A Dockerfile is simply a build script. Dockerfiles have no extension.

What is a “build script”? A build script is a set of steps that run automatically every time you build the image. The first build runs every step. Later builds reuse the cached result of each step until they reach the first one that changed.

In your project folder, next to all the other scripts, create a file simply named Dockerfile with this content:


FROM rocker/r-ver:3.5.2

RUN apt-get update -qq && apt-get install -y \
libssl-dev \
libcurl4-gnutls-dev \
libxml2-dev

RUN R -e "install.packages('plumber')"
RUN R -e "install.packages('dplyr')"
RUN R -e "install.packages('stringr')"
RUN R -e "install.packages('googlesheets')"
RUN R -e "install.packages('RAdwords')"
RUN R -e "install.packages('tidytext')"

COPY . /

EXPOSE 80

ENTRYPOINT ["Rscript", "plumber.R"]

As you can see, first we get the base image with FROM.

Then we install the Linux libraries our packages need with RUN.

After installing the R packages, also with RUN, we copy our local files into the image with COPY.

Next, we declare the port the container listens on with EXPOSE.

Finally, we select what is run. ENTRYPOINT sets the command the container executes on start (here Rscript with plumber.R, which in turn loads main.R) and runs it directly, without a shell/command prompt.

Testing your Docker file

With all this done, let’s get Docker up and running to test our file.

In your terminal, go to your project folder and type:

docker build -t ngram_analyzer .

It will most likely take quite a long time.

Do not worry if it takes upwards of 10-20 minutes. Building an unoptimized Docker image takes really long.

After it’s done, type the following in order to test it:


docker run -p 80:80 ngram_analyzer

You can now test your script by typing this in a second terminal (when running locally as above, ip_address:port is localhost:80):


curl "http://ip_address:port/plumber/ngrams?client_id=999-666-3333&ngram_n=2&start_date=2018-01-01&end_date=2019-03-03"

If everything is correct, you will get a link to a Google Sheets document with the desired Ngram table.

Remember that every time you change the code, you need to rebuild the image.

Ok, what to do with this image now?

Good question! Now that your image works, you can build more functions in it.

Once you have all the functions you want, check this amazing tutorial by Mark Edmondson on how to make it available 24/7 to anyone in your team.

Where to go from here?

From here, if you are more interested in the topic, I’d recommend looking up what “Shiny” R applications are. Imagine how cool it would be if you could also transform your Ngrams app into a full dashboard. Make Google Data Studio go out of business with the power of R.

That’ll be the topic of our Introduction to R for marketers III 🙂

Bruno


Introduction to R for Marketers II - Ngrams & Docker