Depending on the size of the file and your preferences, you can either scroll through the dropdown list, sort each column from A to Z and then Z to A, or apply a filter.
Sometimes, you need to use a combination of these methods.
It’s important to know how these methods treat data differently and to be aware of their limitations.
For a list of the reasons why you must validate data before analyzing it, see Why You Must Validate Data.
This post is part of the Excel: Basic Data Analytic series.
The Excel data referenced in this post, which relies on Excel 2013, is shown below. I used a short file in this example to keep things simple (normally, your file is much larger and you won’t be able to view all the values without scrolling.)
This post describes how to validate data via the dropdown list. Future posts will describe the other 2 methods.
Validate data via dropdown list
This method is a fast way to review your data and look for problems.
- Apply filter dropdowns to each column heading by selecting the first row in the table, clicking the Data tab, and then clicking Filter .
- One at a time, click the filter dropdown arrow in each column and review the values. The graphic below shows the effect of each column sorted separately. Note that some values always appear at the TOP of the dropdown list, such as:
- Numbers in text field (see yellow in column A below).
- Dates in a text field (see yellow in column B), which can occur when user IDs are interpreted as dates (e.g., JUL15 becomes 15-Jul).
- Zeroes and negative numbers (see yellow in column C), which may be valid or invalid, depending on the context.
- Future dates (see yellow in column D), which may be valid or invalid, depending on the context.
- Unprintable characters, symbols, and other suspect characters appear in the dropdown list according to how Excel interprets them. For this reason, these values can be easy to miss in a big file where you have to scroll through a looooong dropdown list.
- The snowflake symbol in cell A5 (see first graphic in this post) is interpreted by Excel as a capital T (see blue T in column A graphic immediately above). Therefore, Excel lists it alphabetically as a T.
- The period symbol in cell C12 is also treated as a symbol and appears at the bottom (see blue . in column C).
- The lightning symbol in cell C11 is interpreted by Excel as a tilde (see blue 51~ in column C) and appears at the bottom.
- Blanks are always listed at the bottom of a dropdown list (see green Blanks in columns A, C, & D immediately above; column B had no blanks). Note that when using dropdowns, Excel treats blank cells and cells containing only spaces as Blanks (A7 & D9 are true blanks, while A9 & D4 contain a space). The sorting method treats these differently.
- In date and amount fields, invalid values are listed at the bottom.
- Text in an amount field (see green ok in column C immediately above).
- Invalid dates (see green 1/1/1000 in column D).
- After you identify suspect values in each column, clear the checkmarks in all other values, leave the suspect values checked, and click OK. Review all the values in each row, as an error in one column occasionally means other errors exist elsewhere in the row, as shown below.
Use dropdowns for identifying:
- Numbers in a text field.
- Unexpected date conversions in a text field.
- Negative numbers and zeroes in an amount/numeric field.
- Future dates in a date field.
- Blanks and some invalid values, but you have to scroll all the way down to the bottom.
Dropdowns can identify symbols and unprintable characters, but they are easy to miss in a long list, because you never know where in the dropdown they will appear.
See all the posts for these series at Excel: Basic Data Analytic series.