Many analysis projects involve data that isn't stored in a database or easily accessible directly through Tableau. In some cases, clients have disparate Excel sheets that require cleansing or some amount of manipulation before they can be imported into Tableau. The client may send updated data often, which means re-running scripts, manually refreshing the data in Tableau, and re-saving the updated dashboard. If you notice that you're spending lots of time in this tedious, repetitive cycle, this automation workflow will save you a great deal of it.
Credit for the idea goes to [Jiggy Parikh](https://www.linkedin.com/in/parikhjignesh), a Senior Data Scientist at Biogen, whose scripts this example is based on. He faced a similar scenario in which he was constantly running simulations in R using data that was regularly sent from the client. The simulation results would be displayed in a Tableau dashboard that he would send back to the client. This repetitive cycle prompted him to write a workflow in R to save time.
I’ve altered the example to scrape salary data from [Indeed.com](http://www.indeed.com) so that it is reproducible for everyone. This example can be extended to use various Tableau dashboard templates, allowing the user to choose which type of dashboard to create. You have to create the initial Tableau dashboard and save it to your working directory as a .twb file.
The process consists of several steps:
1. Create the initial Tableau dashboard and save the .twb file in your R working directory.
2. Add the create_tableau_dashboard() function somewhere in your R script. This function rewrites the data file and zips everything together into a packaged .twbx workbook that you can send straight to the client.
3. The end result is just two lines of R code to scrape, clean, and visualize the data in a Tableau dashboard.
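Once the two functions described below are defined, those two lines look like this (a sketch assuming an aggregated_tableau_workbook.twb template sits in your working directory; the job and city values are just examples):

```r
indeed <- scrape_indeed(job = "data science", city = "new york")
create_tableau_dashboard(indeed)
```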
We start by creating a function for scraping Indeed data:
```r
library(rvest)     # also provides the %>% pipe
library(stringr)

## Set your working directory
# setwd("path/to/your/project")

#' Scrape Indeed Salary Results for a specific job and city
#' @param job Job query to search for (e.g. data science)
#' @param city City to search
#' @examples
#' scrape_indeed(job = 'data science', city = 'new york')
#' @name scrape_indeed
scrape_indeed <- function(job, city) {
  ## Escape query parameters and read the search results page
  job_escape  <- URLencode(job)
  city_escape <- URLencode(city)
  url_content <- read_html(paste0("https://www.indeed.com/jobs?q=",
                                  job_escape, "&l=", city_escape))
  ## Scrape salary distribution counts
  salary_text <- url_content %>%
    html_node(xpath = "//*[@id='rb_Salary Estimate']") %>%
    html_text()
  ## Clean up salaries and counts (facet entries look like "$80,000+ (123)")
  salaries <- str_extract_all(salary_text, "\\$[0-9,]+\\+?")[[1]]
  counts   <- as.numeric(gsub("[^0-9]", "",
                str_extract_all(salary_text, "\\([0-9,]+\\)")[[1]]))
  ## Create resulting data.frame for Tableau
  data.frame(salary = salaries, count = counts, stringsAsFactors = FALSE)
}
```
Next, we create the create_tableau_dashboard() function:
```r
#' Export Indeed Salary Results as Packaged Tableau Workbook
#' @param indeed Resulting data frame from scrape_indeed()
#' @param save_location Directory path where workbook will be created (default = current working directory, i.e. getwd())
#' @param dashboard_name Name of workbook to be saved, without extension (default "IndeedDashboard")
#' @param dashboard_template Type of dashboard to save; currently only one option (default "aggregated_tableau_workbook")
#' @examples
#' create_tableau_dashboard(indeed, save_location = "~/Desktop",
#'                          dashboard_name = "Example Simulation Tableau Workbook")
#' @name create_tableau_dashboard
#' @export create_tableau_dashboard
create_tableau_dashboard <- function(indeed,
                                     save_location = getwd(),
                                     dashboard_name = "IndeedDashboard",
                                     dashboard_template = "aggregated_tableau_workbook") {
  # Force-set dashboard template for now; maps each template to its data file
  file_name_map <- list(aggregated_tableau_workbook = "indeed_tableau_data.txt")

  # Write the data as the tab-delimited text file the .twb template points to
  write.table(indeed,
              file = file.path(save_location, file_name_map[[dashboard_template]]),
              sep = "\t", row.names = FALSE)

  # Zip the data file and .twb template together with a .twbx extension
  zip(suppressWarnings(normalizePath(file.path(save_location,
                                               paste0(dashboard_name, ".twbx")))),
      files = c(normalizePath(file.path(save_location,
                                        file_name_map[[dashboard_template]])),
                normalizePath(file.path(save_location,
                                        paste0(dashboard_template, ".twb")))),
      extras = c("-j"))
}
```
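The optional arguments let you control where the workbook lands and what it's called. For example (the path and name here are purely illustrative):

```r
create_tableau_dashboard(indeed,
                         save_location  = "~/Desktop",
                         dashboard_name = "Example Simulation Tableau Workbook")
```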
The scrape_indeed() function can be supplied with a job query and a city, and returns salary ranges for that combination of query parameters. Then simply call the create_tableau_dashboard() function and a packaged .twbx file with the updated data is created automatically.
Here’s the resulting Tableau dashboard for Data Science salary ranges in New York based on Indeed.com job listings: