Check Excel Data for Blank and Invalid Values (Part 2 – Sort)

basic data analytics1As I explained in the last Excel post, you can check for blank and invalid data in Excel several ways.

In this post, I will focus on the insights and issues encountered by sorting each column from A to Z and then Z to A.

Sounds pretty simple, but I’m willing to bet you will be surprised to learn a thing or two…

For a list of the reasons why you must validate data before analyzing it, see Why You Must Validate Data.

This post is a continuation of the Excel: Basic Data Analytic series.

The Excel data referenced in this post, which relies on Excel 2013, is shown below. I used a short file in this example to keep things simple.

Excel sort data

Validate data via sorting

Sorting is one of the fastest ways to review your data for problems.

  1. Apply filter dropdowns to each column heading by selecting the first row in the table, clicking the Data tab, and then clicking Filter .
  2. One at a time, sort each column ‘alphabetically’ by clicking Sort A to Z (text fields), Sort Smallest to Largest (numeric fields), or Sort Oldest to Newest (date fields). Note the following in the graphic below, which shows the results of sorting each column separately.
    • Cells containing no data (true blanks) are sorted to the bottom (see green cells).
    • Cells containing only one or more spaces are sorted according to the format of the column (see pink cells). In text fields, they appear at the top; in the other formats, near the bottom. These false blanks are not as easy to find and Excel sometimes treats them differently than true blanks. Beware of them.
    • Cells containing non-printable characters and symbols are sorted according to how Excel interprets them (see blue cells), which makes them difficult to find. Since the snowflake symbol (cells A13, C13, & D12) is interpreted as a “T” by Excel,* it is sorted alphabetically as an “T”, near the bottom of the alphabetical list. Also note that the lightning bolt in cell c10 causes that value to be sorted to the bottom, after valid data.
    • In text fields, numbers, dates, and special characters (like punctuation) are sorted to the top (see yellow in columns A & B).
    • In number fields, negative numbers and zeros are sorted to the top (see yellow in column C). These values may be valid, depending on the process involved. Be sure to check this.
    • In number fields, dates, text, and special characters are sorted to the bottom (see grey in column C).
    • In date fields, invalid data is sorted to the bottom (see purple in column D). Note that future dates (see orange cell) are at the bottom; in this example, 12/12/2050 appears in the middle only because so many invalid values appear at the bottom. A future date may be valid, depending on the process involved. Again, be sure to check this.

Excel alpha sort

*To see how a symbol like the snowflake is interpreted by Excel, click in the cell and look in the formula bar for the value (see below).

Excel symbol interpretation

  1. One at a time, sort each column in reverse (Z to A).
  2. Note the following in the graphic below, which shows the results of sorting each column separately.
    • Cells containing true blanks are still sorted to the bottom (see green cells).
      As a result, you cannot easily identify true blanks by sorting, unless you want to scroll all the way to the bottom of your spreadsheet, so use another method, like filtering (a future topic).
    • Cells containing only one or more spaces are still sorted according to the format of the column (see pink cells).
    • Non-printable characters and symbols are still sorted according to how Excel interprets them (see blue cells).
      As a result, they may appear anywhere in your data. If they appear in the middle of a large file, you probably won’t find them. So when a formula or some other analysis is not working as expected (wrong count or percentage is wrong or a letter appears where it should not (like “T”, which could mean you have a snowflake), look for non-printable characters and symbols.
    • Invalid data is sorted the opposite as it was above (if invalid data was at the top before, it is now at the bottom).Excel reverse sort

Conclusions

– Use sorting for identifying:

  • False blanks.
  • Unexpected date conversions in a text field (user ID JUL15 becomes 15-Jul).
  • Invalid data, as some different types are sorted to the top when you do an alpha sort or reverse sort. That’s why you need to do both sorts.

– True blanks are hard to identifying during sorting as they always fall to the bottom during both types of sorting. Use filtering instead.

Questions for discussion:

  1. Do you typically validate your data? If not, why? Here’s why I validate.
  2. What methods do you use?
  3. How often do you find key data fields that are blank or contain invalid values?

See all the posts for these series at Excel: Basic Data Analytic series.

Advertisements

2 Comments

Filed under Data Analytics, How to..., Technology

2 responses to “Check Excel Data for Blank and Invalid Values (Part 2 – Sort)

  1. One more tip to sort the numbers row wise…Select the cells > Data Tab > Sort > Options button > Select “Sort left to right” in Orientation box.
    Hope this tip helps :)

    Like

Leave a Comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s