This is the first post in a series of basic data analytic procedures using Excel. If you work with data regularly, these procedures will help you understand your data better and analyze it faster.
I started this series because I am asked how to do these tasks, sometimes by experienced contractors and auditors.
These procedures are written for Excel 2013, but the same steps should also work in Excel 2010, but the screenshots might be a little different.
Why Duplicate Transactions Matter
Knowing whether a file contains duplicates is important for the following reasons:
- If you total a column of numbers for all records, duplicate records will cause the total to be incorrect. Likewise, if you’re trying to determine the percentages of IT auditors vs financial auditors in a list, duplicate records can affect the percentages.
- Having duplicates may indicate the system is processing or exporting records incorrectly.
- Duplicate transactions may be an indication that users are making errors or that fraud is occurring.
Removing Duplicates Across All Columns
To remove duplicates across all values (columns) in an Excel table:
- Click inside one of the cells in the table from which you want to remove duplicates.
In Eample 1 below, I clicked in cell A2 (to enlarge an example, click it).
- On the Data tab, click Remove Duplicates.
See Example 1 below.
- The Remove Duplicates dialog box appears. Since you want to remove rows/lines that are duplicates across all columns, leave all the columns checked, and press OK.
See Example 2 below.
- The duplicates are removed. Notice that Excel tells you how many duplicates were removed and how many unique values remain.
See Example 3 below.
Removing Duplicates Across Specific Columns
If you are interested in removing duplicates only in specific columns, leave only those columns checked.
For example, if you wanted to remove all duplicate positions from the list, you would leave only the Position column checked (see Example 4 below).
When you press OK, the list would appear as follows (see Example 5).
The first IT auditor, Finance Auditor, and Operations Auditor would remain, and all others would be removed (if only getting rid of bad auditors was that easy).
See all the posts for these series at Excel: Basic Data Analytic series.