Analyzing Data


Recently, Scott Weersing of GP Strategies asked the question “Should we add data science to competency/skills needed now?” in a post on LinkedIn. The post referenced a Harvard Business Review article, “The Democratization of Data Science”.

That post and its string of comments prompted this one, which describes the data analysis workflow introduced in FKA’s Training Needs Analysis (TNA) workshop. It is a basic workflow that learning and development professionals should be able to use.

From Wikipedia:

“Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making.”

Data Analysis Workflow

Analysis of data can be a sophisticated process that requires an advanced degree in statistics and the use of specialized analysis tools. The seven-step process introduced here is a simple approach that can be undertaken by applying some critical thinking and the effective use of a spreadsheet like Microsoft Excel™.




1.     Clean the data (remove blanks, inaccuracies, spelling errors, etc.).

In this step, you review the data and clean up the responses to improve their potential to be analyzed.


For example, the questionnaire asked respondents to enter their country and city. When you examine the responses for ‘City’ you observe New York, New York City, NYC, Brooklyn, etc. To clean the City field, you would change all of these responses to ‘New York City’ so that all of the data can be summarized for that one location.
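The same cleanup can be scripted instead of done by hand in a spreadsheet. The following is a minimal sketch, not the workshop's method: the `CITY_ALIASES` table and the `clean_city` helper are hypothetical names, and the alias list covers only the spellings mentioned above.

```python
# Hypothetical lookup table: each observed spelling (lowercased) maps to
# the single canonical name used for summarizing.
CITY_ALIASES = {
    "new york": "New York City",
    "new york city": "New York City",
    "nyc": "New York City",
    "brooklyn": "New York City",
}


def clean_city(raw):
    """Return the canonical city name, or the trimmed input if no alias is known."""
    trimmed = raw.strip()
    return CITY_ALIASES.get(trimmed.lower(), trimmed)


# Illustrative responses, including stray whitespace and mixed case.
responses = ["New York", "NYC ", "brooklyn", "New York City", "Toronto"]
cleaned = [clean_city(city) for city in responses]
print(cleaned)
```

Unknown values pass through unchanged, so a quick look at the distinct cleaned values shows which spellings still need an entry in the table.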





2.     Check the data quality (completeness of responses, range of answers).

Ideally, most data quality problems would have been identified and fixed during the pilot. Despite best efforts, some poor-quality data may still end up in the response pool.


In the questionnaire, respondents were asked to enter their current skill level and their ideal skill level for a wide range of items. The responses to these questions were numeric values (1, 2 or 3), which were used to flag a gap by calculating the difference between the current and ideal levels. The quality review of the data showed that some responses had only one of the two values entered, which means this data cannot be used to calculate the gap.
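A completeness and range check of this kind might look like the sketch below. It assumes each response is a record with `current` and `ideal` fields, where a blank answer is stored as `None`; the field names, the sample records, and the `is_usable` helper are illustrative, not taken from the actual questionnaire.

```python
# The questionnaire allowed only these skill levels.
VALID_LEVELS = {1, 2, 3}


def is_usable(response):
    """A response is usable only if both levels are present and in range."""
    return response["current"] in VALID_LEVELS and response["ideal"] in VALID_LEVELS


# Hypothetical responses; None marks a value the respondent left blank.
responses = [
    {"id": 1, "current": 1, "ideal": 3},
    {"id": 2, "current": 2, "ideal": None},  # ideal level missing
    {"id": 3, "current": None, "ideal": 2},  # current level missing
    {"id": 4, "current": 5, "ideal": 3},     # outside the 1-3 range
    {"id": 5, "current": 3, "ideal": 3},
]

usable = [r for r in responses if is_usable(r)]
excluded_ids = [r["id"] for r in responses if not is_usable(r)]
print(f"{len(usable)} usable, excluded ids: {excluded_ids}")
```

Keeping the excluded ids, rather than silently dropping the rows, makes it easy to report how much data was set aside and why.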


3.     Check the quality of measurements. (Did you get the answers you were expecting?)

A TNA should be undertaken with clear objectives, and the data collection is intended to provide information related to those objectives. Simply put, you are checking whether you got the answers you expected.


You can use the difference between the ideal and current responses (ideal minus current) for a given skill or knowledge item as an indication of a gap: zero means no gap, and two means a big gap. As shown below, respondents rated their current skill level as greater than the ideal level, so the gap has a negative value. This was not what was expected and required some thought to interpret.
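The gap calculation and the check for unexpected values can be sketched as follows. This assumes the gap is defined as ideal minus current, which is consistent with a negative value when the current level exceeds the ideal; the rating pairs are made up for illustration.

```python
def gap(current, ideal):
    """Gap between ideal and current level: 0 means no gap, 2 is the largest gap."""
    return ideal - current


# Hypothetical (current, ideal) pairs on the 1-3 scale.
ratings = [(1, 3), (2, 2), (3, 2), (1, 2)]

gaps = [gap(current, ideal) for current, ideal in ratings]

# A negative gap means the respondent rated current above ideal,
# which is the unexpected case that needs interpretation.
unexpected = [index for index, value in enumerate(gaps) if value < 0]
print(gaps, unexpected)
```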

4.     Check the sample profile.

A TNA is conducted for a target population. If you collect a data sample, you need to review the demographic data to confirm that it aligns with the demographic profile of the target population.

An initial review of the data indicated there were no responses from a department that has 20% of the population, which was a red flag. A quick check confirmed that the department’s data had not been included in the data pool.
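A comparison of this kind can be sketched in a few lines. The department names, the population shares, and the 15-point tolerance below are all assumptions chosen for illustration; only the idea of a department with 20% of the population and no responses comes from the example above.

```python
from collections import Counter

# Hypothetical share of the target population in each department.
population_share = {"Sales": 0.50, "Operations": 0.30, "Finance": 0.20}

# Hypothetical department recorded on each response in the sample.
sample_departments = ["Sales"] * 6 + ["Operations"] * 4


def profile_mismatches(population, sample, tolerance=0.15):
    """Return departments whose sample share differs from the population share
    by more than the tolerance (an arbitrary threshold for this sketch)."""
    counts = Counter(sample)
    total = len(sample)
    flagged = []
    for department, expected in population.items():
        observed = counts[department] / total
        if abs(observed - expected) > tolerance:
            flagged.append(department)
    return flagged


flagged = profile_mismatches(population_share, sample_departments)
print(flagged)
```

Here the department with no responses is flagged, while the smaller differences in the other two departments fall within the tolerance.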


5.     Summarize the responses.

The first and simplest form of data analysis is to summarize the raw data. A summary provides a clearer picture of what the TNA data is saying.

The responses to the survey provided over sixty thousand data points, far too many to interpret in raw form. Summarizing them makes the patterns visible. For example, in the Communications Skills segment, 92% of the respondents use Speaking Skills while only 74% use Listening Skills. When the number of people using each skill is combined with the number indicating a performance gap between current and ideal levels, Presentation Skills has the greatest requirement for training and Meeting Management has the least.
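One way to produce such a summary is sketched below. The source does not say exactly how usage and gap were combined, so this sketch assumes the simplest option: count the respondents who both use a skill and report an ideal level above their current level. The rows and the `summarize` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (skill, uses_skill, current_level, ideal_level).
# Levels are None when the respondent does not use the skill.
rows = [
    ("Presentation Skills", True, 1, 3),
    ("Presentation Skills", True, 2, 3),
    ("Presentation Skills", True, 3, 3),
    ("Presentation Skills", False, None, None),
    ("Listening Skills", True, 2, 2),
    ("Listening Skills", True, 1, 2),
    ("Listening Skills", False, None, None),
    ("Listening Skills", False, None, None),
]


def summarize(rows):
    """Per skill: percentage of respondents using it, and how many users report a gap."""
    totals = defaultdict(lambda: {"respondents": 0, "users": 0, "with_gap": 0})
    for skill, uses_skill, current, ideal in rows:
        entry = totals[skill]
        entry["respondents"] += 1
        if uses_skill:
            entry["users"] += 1
            if ideal > current:
                entry["with_gap"] += 1
    return {
        skill: {
            "usage_pct": round(100 * entry["users"] / entry["respondents"], 1),
            "with_gap": entry["with_gap"],
        }
        for skill, entry in totals.items()
    }


summary = summarize(rows)
for skill, stats in summary.items():
    print(skill, stats)
```

In a spreadsheet, the same result comes from a pivot table or a pair of conditional counts per skill.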


6.     Graph the summary.

Use a tool like Microsoft Excel™ to create a graph to visually represent your results.

The graph clearly shows that respondents reported the largest gap in their Presentation Skills.
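A spreadsheet chart is the natural tool here, but the idea can also be shown with a plain-text bar chart. This is only a stand-in sketch: the gap counts are invented and `text_bar_chart` is a hypothetical helper, not an Excel feature.

```python
def text_bar_chart(values, width=20):
    """Return one line per label, largest value first, with bars scaled to the largest value."""
    largest = max(values.values())
    lines = []
    for label, value in sorted(values.items(), key=lambda item: item[1], reverse=True):
        # Scale each bar relative to the largest value; guard against all-zero data.
        length = round(width * value / largest) if largest else 0
        lines.append(f"{label:<20} {'#' * length} {value}")
    return lines


# Hypothetical number of respondents reporting a gap for each skill.
gap_counts = {
    "Listening Skills": 20,
    "Presentation Skills": 40,
    "Meeting Management": 10,
}

chart = text_bar_chart(gap_counts)
print("\n".join(chart))
```

Sorting the bars from largest to smallest puts the biggest gap at the top, where the eye lands first.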


7.     Drill down on any extremes.

Take the areas with the largest gaps and explore the data further. You can take any field of demographic data and summarize the training gap for that segment.

Presentation Skills showed the biggest gap, so it was explored further. A reasonable question to ask was whether this training need was common across the entire organization or more pronounced in one area. As the graph below shows, there are significant demands in Government Services and Structured Finance.
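A drill-down like this amounts to grouping one skill's responses by a demographic field. The sketch below groups by department; Government Services and Structured Finance are named in the text, while Human Resources, the ratings, and the `gap_rate_by_department` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical Presentation Skills ratings: (department, current, ideal).
presentation_ratings = [
    ("Government Services", 1, 3),
    ("Government Services", 2, 3),
    ("Structured Finance", 1, 2),
    ("Structured Finance", 2, 2),
    ("Human Resources", 3, 3),
    ("Human Resources", 2, 2),
]


def gap_rate_by_department(ratings):
    """Percentage of respondents in each department whose ideal level exceeds their current level."""
    counts = defaultdict(lambda: [0, 0])  # department -> [with_gap, total]
    for department, current, ideal in ratings:
        counts[department][1] += 1
        if ideal > current:
            counts[department][0] += 1
    return {
        department: round(100 * with_gap / total, 1)
        for department, (with_gap, total) in counts.items()
    }


by_department = gap_rate_by_department(presentation_ratings)
print(by_department)
```

Swapping the department column for any other demographic field (location, role, tenure) gives the same breakdown for that segment.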

Relating this workflow back to Scott Weersing’s post and the subsequent discussion in the comments: there are some basic data analysis skills that learning and development professionals should have.


Jim Sweezie
VP Research and Product Development