How to Perform Data Validation

Do you perform appropriate population validation of the data you rely on in an audit?

Population or data validation is simply gaining confidence that the data you are using in your audit contains all the appropriate data for your audit objectives (e.g., your server list includes all the SOX servers).  So how do you do it? Let’s look at an example…

Just the SOX, Ma’am

For example, when you request and get a list of SOX servers, you can do the following:

  • Ask the subject matter expert (SME, a knowledgeable person who is providing the data) whether the list contains ALL of the SOX servers, and only the SOX servers. Often this person is from the IT department.

Make sure you ask him HOW he KNOWS the list is complete and accurate. His answer will indicate whether the data is appropriate or how well he knows his job, or both. His predecessor may have given him the list and pronounced it golden 2 years ago (I’ve heard that!). Or he might really know his stuff and be able to explain why the list is accurate.

  • Send the list to another supposedly knowledgeable person and ask her whether it appears complete and accurate (e.g., for SOX servers, a business analyst on the finance team might be able to verify them, or a security analyst). Make sure to ask her why she gave you the answer they did (always ask that Q). Also ask her if she knows of another source/document you can compare the list to.

It helps to play one SME off another. Tell the second SME, “Bob said that he thought this list was complete (or incomplete) because of X. Does that make sense to you?”

  • Compare the list the SME gives you to a list published on the intranet (if the source is the same as the one the SME used, that’s no help).
  • Scan the network yourself (usually auditors shouldn’t do this as it may appear as ‘hacking’). Usually an automated inventory tool (e.g., Tivoli) does this on a regular basis and creates reports, or someone on the security, network, or server team can do a quick scan. Find that person and have them run the scan.
  • Compare it to a list used previously in an audit (within the last 12 months).
  • Ping all the servers in the new list you received to ensure that they are all valid. A good, free, easy-t0-use tool for this is Fast Resolver.

Keep in mind that population validation does not always require exact and perfect confirmation (that would be nice). Usually, gaining reasonable comfort is enough. For example, a list of servers used in the last audit may have changed a bit, but if a high majority of the servers are the same in your new list, you can take some comfort from that, especially if that’s all the validation that is available. Also, you have to consider the risk level of what you’re auditing. A high-risk audit requires a much higher level of comfort.

Having said that, I’ve often found that exploring the differences between this year’s list and last year’s list is often where the issues are identified. I always explore the differences as much as I can given the risk level, what my spider sense is telling me, and the time available during the audit. If you sense some real issues, you might want to request more time for the audit.

Watch and Learn

In my experience, most audit departments request data and then receive it via email. Fast and helpful. But how accurate and dependable is it?

One good way to understand and validate data is to meet with the SME and watch her run a query and download the data. And if you have the SME capture screenshots of the process, you will obtain the tool/application used to obtain the data, the server/instance/database from which the data was obtained, and how difficult or easy a process it is (the more complicated the process, the more closely you need to observe each step).

You will also see the choices the SME makes on certain screens and any error messages or warnings that appear. And in the end, you will have a visual process for obtaining the same or similar data next year (make sure you document the process in your workpapers). So if the SME forgets or you get a new SME (or another auditor does the audit), it’s easy to obtain the data the next time. SMEs appreciate auditors who make their job easy by bringing what was done during the prior audit.

Yes, this takes extra time and effort on both the auditor and SME’s parts, but you will learn more about the process/system/data (and about the SME when you see her cube’s environment and the under-the-breath comments she makes about how stupid the system is she’s pulling data from).

If the SME is at a remote site and can’t be visited, you can have the SME share her desktop so you can watch what she does and that she captures the appropriate screenshots.

It Really Happened

I was doing a cyclical audit that another auditor had performed the previous year. Unlike the previous auditor, I watched the SME download the data to a spreadsheet. To my surprise, he downloaded 3 separate Excel worksheets and then combined them into one, which was not mentioned in the previous year’s workpapers. The SME explained that the system timed out when he tried to download the entire file, so instead he broke it up into 3 files and then combined it together.

Whenever a SME manipulates data, the chance of error increases dramatically. If I ever have to rely on data that I was not able to observe being generated, I always ask the SME whether the data was downloaded into one file or whether the data was formatted, tweaked, or adjusted in any way. If so, I ask the SME to document or describe the process, not only so I can validate the data, but also so I know what may need to be done the next time around to produce the same data.

Again, if data is adjusted in any way, I ask the SME how they KNOW they didn’t duplicate, lose, or accidentally change any records. In the case of the 3 worksheets, I’d want to hear the SME say something like, “I count the records in each worksheet and verify after I merge them that I have the same number of records. I also use the Excel Remove Duplicates command to ensure no duplicates were introduced.” Of course, if you’re observing the merge, you can see whether the SME counts the records and checks for duplicates.

I groan every time I find a problem that a SME could have identified by performing a simple totals check like the one described above. Even security and compliance professionals fail to do this on a regular basis when they provide data or perform their own controls.

Question Everything

Being skeptical is a auditor’s best friend. Asking WHY several times can help get to the bottom of a matter (see this for an introduction to 5 WHYs; here’s also an interesting rebuttal to this procedure). The point is, don’t take things at face value; ask additional questions if you think it’s warranted.

So no, I don’t question everything, just the things I don’t understand, are complicated, or don’t seem reasonable. The key is not to appear like you’re out to get someone or mistrust them–I like to say things like, “Help me understand this process by explaining X” or “Can you simplify that a little” or “Please provide more detail about X”.

And when you question things, use open-ended questions (questions that can’t be answered YES or NO).

I’d like to hear from you on these topics:

  • Whether population validation is performed on all your data
  • Do you regularly observe data being generated, or do you just receive it?
  • What your list of favorite questions or methods for getting more information?
  • Do you ever use desktop-sharing programs to watch your SMEs generate data?

Some other helpful methods are included in this post: What Everybody Ought to Know About Auditor Secrets

About these ads

3 Comments

Filed under Audit, How to...

3 responses to “How to Perform Data Validation

  1. So does anyone out there do population validation of the files used in an audit? Or do you just accept what you’re given as the truth?

  2. Is it OK if the internal auditor himself is the one who obtain the data required, from company’s server?

    • iac,
      Absolutely. In fact, that’s the best way of ensuring audit independence and that no one tampered with the data. Having said that, the auditor obtaining the data must understand what they are doing and that the method used to obtain the data is valid.

      For example, if you need data from the Northwest region and the Southwest region, but don’t realize that the data is stored in 2 separate tables and you only query 1 table, you won’t get the full population.

      Likewise, if you are obtaining all the admin accounts from Active Directory, but only list accounts in the Domain Admins group, you will be missing some of the admins, especially admins in company-specific groups that IT created above and beyond the Microsoft defaults.

      So get the data, but make sure you know it’s accurate. The first time you have the admin help you, and after you understand the process, you can obtain it yourself after that if you have the access. However, periodically, you need to talk to the admin to ensure that nothing has changed (e.g., a third table was added or another admin group was added).

Leave a Comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s