VSTO & .NET & Excel

January 28, 2008

Measure Data Quality

Filed under: .NET & Excel, COM Add-ins, VSTO & Excel — Dennis M Wallentin @ 9:33 pm

Nowadays it’s a common task to retrieve data from databases and put the data into MS Excel. The acquired data is usually part of the information in decision processes. 

The data that is stored in databases are either collected automatically or manually. The process of collecting data manually can either be dictating or free. With ‘dictating’ I mean that no fields are allowed to be left empty upon registration while with ‘free’ I refer to the opposite, i e fields can be left empty.

One of the more critical data for all kind of corporate is the volume of quotations which is a key performance indicator (KPI). This type of data is usually manually entered into the business system.

Measuring the quality of the data is to try to answer the question to which degree the data is accurate and reliable. To do it we can use different approaches where we can create rather sophisticated models but here I prefer to keep it simple. The reason for it is that we need to make sure that it can be communicated in an easy way within the corporate in order to improve the quality. In my experience we should therefore keep it straight and simple.

A dictating system may seems to be attractive to use as it makes sure that users enter all required data by filling in all fields. But in view of the quality of the data we get a situation where it is very difficult to actually measure it. A free system can be a better approach as it allows to leave data fields empty which makes it possible to, at least,  measure which fields are filled in and not.

The following is an output from a free system where the data is retrieved in order to present the volume of quotations for a given time period:


As we can see in the above table some fields are left empty. To measure we need to decide which fields are important as indicators of the data quality. Here in the example the fields Expiring Month (EM), Sales Type (ST) and Sales Value (SV) have been selected.

Here we only measure the number of records that has one or more of these fields empty or not. The following table shows the output for the case:


In some corporate the presentation paradigm is based on charts instead of tables so here is the output presented in a chart as well:

(Based on the input from Dermot and Jon the chart is revised to the better)

The output of this simple measurement of quality indicates how accurate and reliable the information for the volume of quotations actually is.

In my experience, when some indicators for the quality of the data are available the decision makers gets a better understanding for the presented information. The indicators also provide input to improve the quality of data.

Kind regards,



  1. Let’s not perpetuate the presentation paradigm that favors 3D exploded pie charts. Either a bar chart (http://www.freefilehosting.net/show/3b5j6) or a dot plot (http://www.freefilehosting.net/show/3b5j4) would have been a better choice.

    Comment by Jon Peltier — January 28, 2008 @ 10:29 pm

  2. Dennis, I find it hard to read the pie slices without text labels (the eye has to keep jumping between the pie and legend), also it’s not clear which slices are “good” and which are “bad”. There’s no way to take the information in at a glance, as busy executives need to do.

    In this case, I think a Venn diagram would do the best job because it would clearly show the different combinations.

    Alternatively, I would suggest a (sideways) bar chart with the bars arranged in order from best(nothing missing) to worst (all missing).

    Comment by dermot — January 28, 2008 @ 11:17 pm

  3. Hi Jon and Dermot,

    Thanks for Your comments which are highly appreciated.

    I’m in favour of tables.
    Mainly due to my shortcoming to create relevant charts models as well as due to my semi-color blindness.

    I will update the chart accordingly.

    Kind regards,

    Comment by Dennis Wallentin — January 29, 2008 @ 11:48 am

  4. A table is also good. I started with a table to make my charts, three columns: {EM, ST, SV}, populated with Valid or Blank. This made it easier to get all 8 (i.e., 2^3) sets of conditions in the output, where you had only seven in the pie.

    Comment by Jon Peltier — January 29, 2008 @ 1:50 pm

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: