Creating a Business Intelligence (BI) & Analytics Strategy and Roadmap

This post provides some of my thoughts on how to go about creating a Business Intelligence (BI) & Analytics Strategy and Roadmap for your client or company.  Please comment with your suggestions from your experience for improving this information.


When creating or updating the BI & Analytics Strategy and Roadmap for a company, one of the first things to understand is:

Who are all the critical stakeholders that need to be involved?

Understanding who needs and uses the BI & Analytics systems is critical for starting the process of understanding and documenting the “who needs what, why, and when”.

These are some of the roles that are typically important stakeholders:

  • High-level business executives that are paying for the projects
  • Business directors involved in the usage of the systems
  • IT directors involved in the developing and support of the systems
  • Business Subject Matter Experts (SME’s) & Business Analysts
  • BI/Analytics/Data/System Architects
  • BI/Analytics/Data/System Developers and Administrators


Then, you need to ask all these stakeholders, especially those from the business:

What are the drivers for BI & Analytics? And what is the level of importance for each of these drivers?

This will help you to understand and document what business needs are creating the need for new or modified BI & Analytics solutions. You should then go deeper to understand … what are the business objectives and goals that are driving these business needs.  This will help you to understand and document the bigger picture so that a more comprehensive strategy and roadmap can be created.

The questions and discussions surrounding the above will require deep and broad business involvement. Getting the perspective of a wide range of users from all business areas that are using the BI & Analytics Systems is critical.  The business should be involved throughout the process of creating the strategy and roadmap, and all decisions should tie back to support for business objectives and goals. And the trail leading to all these decisions must be documented.

Some examples of business drivers include:

  • Gain more insight into who our best customers are and how best to acquire them.
  • Understand how weather affects our sales/revenue.
  • Determine how we can sell more to our existing customers.
  • Understand what causes employee turnover.
  • Gain insight into how we can improve staffing schedules.


And examples of business objectives and goals may include things like:

  • Increase corporate revenues by 10%
  • Grow our base of recurring customers
  • Stabilize corporate revenues over all seasons
  • Create an environment where employees love to work
  • Reduce payroll costs without a reduction in staff, for example, reduce turnover.


Then, turn to understanding and documenting the current scenario (if not already known). Identify what systems (including data sources) are in place, who are using them (and why and how), what capabilities do they offer, what are the must-haves, and what are the pain points and positive highlights.

Also, you will need to determine the current workload (and future workload if it can be determined) of the primary team members involved in developing, testing, and implementing BI & Analytics solutions.

This will help you understand a few things:

  • Some of the highest priority needs of the users
  • Gaps in capabilities and data between what is needed and what is currently in place (including an understanding of what is liked and disliked about the current systems)
  • Current user base knowledge and engagement
  • IT knowledge and skills
  • Resource availability – when are people available to work on new initiatives


What are the options and limitations?

  • Can existing systems be customized to meet the requirements?
  • Can they be upgraded to a new version that has the needed functionality?
  • Do we need to consider adding a new platform or replacing one or more of the existing systems with a new platform?
  • Can we migrate from/integrate one system to/with another system that we already have up and running?
  • Are any of our current systems losing vendor support or require an upgrade for other reasons? Has the pricing changed for any of our software applications?
  • What options does our budget permit us to explore?
  • What options do our knowledge and skills permit us to explore?


Once you have identified these items …

  • Identify and engage stakeholders, and document these roles and the people
  • Identify and document business drivers, objectives and goals
  • Understand and document the current landscape – needs (including must-haves), technology, gaps, users, IT staff, resource availability, and more
  • Identify and document options – based on current landscape, technology, budget, staff resources, etc.

… you can develop a “living” Strategy and Roadmap for BI & Analytics. And when I say “living”, I mean it will not be a static document, but will be fine-tuned over time as new information emerge and as changes arise in business needs, technology, and staff resources.


Your Strategy and Roadmap for BI & Analytics should include, but is not limited to:

  • BI & Analytics that will be used to satisfy business drivers, objectives and goals
  • Data acquisition and storage plan for meeting the analytics needs
  • Technology platforms that will be used to process and store data, and deliver the analytics
  • Information about any new technologies that needs to be acquired or implemented, and schedules
  • Roles and Responsibilities for all stakeholders involved in BI & Analytics projects
  • Planned staffing allocations and schedules
  • Planned staffing changes and schedules
  • User training (business users) and Delivery team training (technical implementers & developers for example)
  • List dependencies for each item or set of items

Working with built-in datasets in R

This is a quick cheat sheet of commands for working with built-in datasets in R.

R has a number of base datasets that come with the install, and there are many packages that also include additional datasets.  These are very useful and easy to work with.


> ?datasets                                  # to get help info on the datasets package

> library(help=”datasets”)     # provides detailed information on the datasets package, including listing and description of the datasets in the package

> data()                                       # lists all the datasets currently available by package

data(package = .packages(all.available = TRUE))  # lists all the available datasets by package even if not installed

> data(EuStockMarkets)            # loads  the dataset EuStockMarkets into the workspace

> summary(AirPassengers)      # provides a summary of the dataset AirPassengers

> ?airquality                                    # outputs in the Help window, information about the dataset “airquality”

> str(airmiles)                                 # shows the structure of the dataset “airmiles”

> View(airmiles)             # shows the data of the dataset “airmiles” in spreadsheet format



Statistics Basics – Descriptive vs Inferential Statistics

Descriptive Statistics
Statistics that quantitatively describes an observed data set. Analysis for descriptive statistics is performed on and conclusions drawn from the observed data only, and does not take into account any larger population of data.

Inferential Statistics
Statistics that make inferences about a larger population of data based on the observed data set. Analysis for inferential statistics takes into account that the observed data is taken from a larger population of data, and infers or predicts characteristics about the population.

Statistics Basics – Measures of Central Tendency & Measures of Variability

Measures of Central Tendency and Measures of Variability are frequently used in data analysis.  This post provides simple definitions of the common measures.


Measures of Central Tendency

Mean / Average – sum of all data points or observations in a dataset divided by the total number of data points or observations in the dataset.

The mean or average of this dataset with 5 numbers {2, 4, 6, 8, 10} is: 6

Sum of all data points:     (2+4+6+8+10)
Divided by:                       ———————–  = 6
Number of data points:              5

Median – with the values (data points) in the dataset listed in increasing (ascending) order, the median is the midpoint of the values, such that there are an equal number of data points above and below the median.  If there are an odd number of data points in the dataset, then the median value will be a single midpoint value. If there an even number of data points in the dataset, then the median value will be the mean/average of the two midpoint values.

The median of the same dataset {2, 4, 6, 8, 10} is:  6
This dataset has an odd number of data points (5), and the middle data point is the value 6, with 2 numbers below (2, 4) and 2 numbers above (8, 10).

Using an example of a dataset with an even number of data points:
The median of this dataset {2, 4, 6, 8, 10, 12} is: (6 + 8) / 2 = 7
Since there are 2 middle data points (6, 8), then we need to calculate the mean of those 2 numbers to determine the median.

Mode – the data point that appears the most times in the dataset.

Using our original dataset {2, 4, 6, 8, 10}, since each of the values only appear once, none appearing more times than the others, this dataset does not have a mode.

Using a new dataset {2, 2, 4, 4, 4, 4, 6, 8, 8, 8, 10}, the Mode in this case is: 4
4 is the value that appears the most times in the dataset.

Measures of Variability

Min – the minimum value of the all values in the dataset.
Min {2, 3, 3, 4, 5, 5, 5, 6, 7, 1, 3, 2, 7, 7, 8, 2, 3, 9} is 1.

Max – the maximum value of the all values in the dataset.
Max {2, 3, 3, 4, 5, 5, 5, 6, 7, 1, 3, 2, 7, 7, 8, 2, 3, 9} is 9.

Variance – a calculated value that quantifies how close or how dispersed the values in the dataset are to/from their average/mean value.  It is the average of the squared differences from the mean.

Variance of {2, 3, 4, 5, 6} is calculated as follows …

First find the Mean.  Mean = (2 + 3 + 4 + 5 + 6) / 5 = 4

Then, find the Squared Differences from the Mean … where ^2 means squared …
(2 – 4)^2 = (-2)^2 = 4
(3 – 4)^2 = (-1)^2 = 1
(4 – 4)^2 = (0)^2 = 0
(5 – 4)^2 = (1)^2 = 1
(6 – 4)^2 = (2)^2 = 4
Average of Squared Differences: (4 + 1 + 0 + 1 + 4) / 5 = 2

Standard Deviation – a calculated value that quantifies how close or how dispersed the values in the dataset are to/from each other.  It is the square root of the Variance (defined above).

For the above dataset, Standard Deviation {2, 3, 4, 5, 6} = Square Root (2) =~ 1.414

Kurtosis – a calculated value that represents how close the tail of the distribution of the dataset is to the tail of a normal distribution*.

Skewness – a calculated value that represents how close the symmetry of the distribution of the dataset is to the symmetry of a normal distribution*.

* A normal distribution, also known as the bell curve, is a probability distribution in which most values are toward the center (closer to the average) and less and less observations occur as you go further from the center.

Range – the difference between the largest number in the dataset and the smallest number in the dataset.
Range {2, 4, 6, 8, 10} = 10 – 2 = 8


Thanks for reading!


Exploring the RStudio Interface

In this post we will explore the RStudio interface.  This is where you will manage your R environment, issue commands for processing and analyzing data, create scripts, view results, and much more.  Below is an image of the default RStudio interface.


On the left:
Console – the window where you enter commands, and where output is displayed.

On the top-right:
Environment tab – shows the variables and values created through the console
History tab – shows the history of past executed commands

On the bottom-right:
Files tab – displays folders and files from the file system, from which you can select files, set working directory, create folders, copy and move folders and files, and more.
Plots tab – displays the plots that have been created and allows for you to export them.
Packages tab – displays all the packages currently installed and available.  Loaded packages will have the checkbox checked and packages must be loaded before they can be used.
Help tab – useful for getting help about R and R packages, and keyword search is available which can be very helpful when you don’t know exactly what you are looking for.
Viewer tab – can be used to view local web content, such as, static HTML files written to the session temporary directory or a locally run web application.

On the top-left (when a script is created or opened):
– Script pane and tabs – When you create or open a R script, it will create a new pane area in the top-left of the application window, and the Console pane will get shifted down to the bottom-left area.


A new Script tab will open in this pane for each new script opened or created. From this window, you will be able to run your script line by line or in its entirety, among many other functions.

Thanks for reading!

Installing, Loading, Unloading, and Removing R Packages in RStudio

R has thousands of packages available for statistics and data analytics, but before you can use them, they need to be installed.  In this post I cover installing, loading, unloading, and removing R packages in RStudio. In these examples, I use the ggplot2 package – a popular graphics and visualization package in R.  Wherever you see ggplot2 in the examples below, you can replace it with the package you want to perform these actions on.

To install a package via the User Interface

In RStudio, select Tools -> Install Packages from the main menu, or click Install in the Packages tab on the bottom-right.

The Install Packages dialog appears.

Start typing the name of the package you want to install, and a list of all packages that start with the letters you have type will show up in the selection list.

Select (or type the full name of) the package you want to install, ensure that “Install dependencies” is checked, and click “Install”.
The statement will be automatically entered and run as shown below.

And the output will show if the package is successfully installed.

At this point, you will be able to see the package in the list in the Packages tab on the right.

To install a package via script
Instead of using the user interface (menu), you can also install packages directly via script.


See script statement below.  And as before, the package shows in the Packages tab.

After a package has been installed, it needs to be loaded before you can use it.

Loading and Unloading a package (via user interface or script):

To load a package you can simply check the checkbox beside the package name in the Packages tab – as shown by the yellow box highlight below.  This will automatically enter and execute the command shown with the yellow arrow.

Or you can enter the script, by entering the command as shown with the green arrow:


As an alternative, the require(ggplot2) command will also load the package.

To unload the package, you can simply uncheck the checkbox beside the package name in the Packages tab, or enter the command shown by the red arrow:

detach("package:ggplot2", unload=TRUE)

R_Loading_Unloading_Packages You will notice that after running the detach command, the Package checkbox will not be checked (will be unchecked).

To remove (uninstall) a package (via user interface or script):

To remove a package, you can simply click the “x” icon shown to the right of the package in the Packages window. See the yellow box highlight beside ggplot2 below.

Or you can run the script command as shown below in the Console window.



The below shows the output after removing a package.  You will notice that the package is no longer in the Packages list on the right hand side.  In this example, ggplot2 is no longer in the list of packages.

An advantage of using the script option instead of the user interface methods to perform the above actions is that you will have a history of what you have done.

Thanks for reading!

Installing RStudio on Windows

RStudio is an open-source integrated development environment (IDE) for R. It also has commercial versions with expansive capabilities available (at a cost).  It runs on the desktop with multiple operating systems, or in a browser connected to a RStudio server.  RStudio includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management. This post covers installing RStudio.  Note that R needs to be installed first – see this post for installing R.

To get started, go to

Under RStudio, click Download.

Choose the desired version of RStudio.  You will likely want the “RStudio Desktop (Open Source License)” version. On the same page, you will be able to read about the various options available – free and pay versions.

This will bring you down in the page to the installers.  Choose the installer that is appropriate for you.  In this example, we are installing on Windows, and so we chose the “RStudio 1.0.153 – Windows Vista/7/8/10” version.  Note: RStudio requires that R is installed.  If you have not already installed R, do so first (see this post for Installing R).

After the download is complete, run the exe by double-clicking on it.

Click Next at the Welcome screen.

Choose the install directory, click Next

Chooses a Start Menu Folder, click Install

Installing …

Complete the installation.

Run RStudio

RStudio IDE

You will notice that the left window “Console” is the same as the “R Console” window in the stand-alone R installation.  This is because RStudio is built on top of R.

Good luck on your R journey!