How to Perform Population Validation

Do you perform appropriate population validation of the data you rely on in an audit?

Population validation is simply gaining confidence that the data you are using in your audit contains all the appropriate data for your audit objectives (e.g., your server list includes all the SOX servers).

For the difference between population validation and data validation, see Why You Must Validate Data.

So how do you do population validation? Let’s look at an example…

Just the SOX, Ma’am

For example, when you request and get a list of SOX servers, you can do the following:

  • Ask the subject matter expert (SME, a knowledgeable person who is providing the data) whether the list contains ALL of the SOX servers, and only the SOX servers. Often this person is from the IT department.

Make sure you ask him HOW he KNOWS the list is complete and accurate. His answer will indicate whether the data is appropriate or how well he knows his job, or both. His predecessor may have given him the list and pronounced it golden 2 years ago (I’ve heard that!). Or he might really know his stuff and be able to explain why the list is accurate.

  • Send the list to another supposedly knowledgeable person and ask her whether it appears complete and accurate (e.g., for SOX servers, a business analyst on the finance team might be able to verify them, or a security analyst). Make sure to ask her why she gave you the answer they did (always ask that Q). Also ask her if she knows of another source/document you can compare the list to.

It helps to play one SME off another. Tell the second SME, “Bob said that he thought this list was complete (or incomplete) because of X. Does that make sense to you?”

  • Compare the list the SME gives you to a list published on the intranet (if the source is the same as the one the SME used, that’s no help).
  • Scan the network yourself (usually auditors shouldn’t do this as it may appear as ‘hacking’). Usually an automated inventory tool (e.g., Tivoli) does this on a regular basis and creates reports, or someone on the security, network, or server team can do a quick scan. Find that person and have them run the scan.
  • Compare it to a list used previously in an audit (within the last 12 months).
  • Ping all the servers in the new list you received to ensure that they are all valid. A good, free, easy-to-use tool for this is Fast Resolver.

Keep in mind that population validation does not always require exact and perfect confirmation (that would be nice). Usually, gaining reasonable comfort is enough. For example, a list of servers used in the last audit may have changed a bit, but if a high majority of the servers are the same in your new list, you can take some comfort from that, especially if that’s all the validation that is available.

Also, you have to consider the risk level of what you’re auditing. A high-risk audit requires a much higher level of comfort. The question is, how much time and effort is required to validate the list, and is that appropriate based on this particular audit and the risks involved (ok, that’s 2 questions).

Having said that, I’ve often found that exploring the differences between this year’s list and last year’s list is often where the issues are identified. I always explore the differences as much as I can given the risk level, what my spider sense is telling me, and the time available during the audit. If you sense some real issues, you might want to request more time for the audit.

Watch and Learn

In my experience, most audit departments request data and then receive it via email. Fast and helpful. But how accurate and dependable is it?

One good way to understand and validate data is to meet with the SME and watch her run a query and download the data. And if you have the SME capture screenshots of the process, you will obtain the tool/application used to obtain the data, the server/instance/database from which the data was obtained, and how difficult or easy a process it is (the more complicated the process, the more closely you need to observe each step).

You will also see the choices the SME makes on certain screens and any error messages or warnings that appear. And in the end, you will have a visual process for obtaining the same or similar data next year (make sure you document the process in your workpapers). So if the SME forgets or you get a new SME (or another auditor does the audit), it’s easy to obtain the data the next time. SMEs appreciate auditors who make their job easy by bringing what was done during the prior audit.

One time, when I showed the query used in the previous audit to a new SME, he said, “You used this last year? It’s missing most of the data you’re looking for. It’s obvious that the SME and the auditor in the previous audit didn’t know what they were doing.” He went on to explain why the previous query was wrong and provided a new one that provided the right data.

Yes, documenting the process takes extra time and effort on both the auditor and SME’s parts, but you will learn more about the process/system/data (and about the SME when you see her cube’s environment and the under-the-breath comments she makes about how stupid the system is from which she’s pulling data).

If the SME is at a remote site and can’t be visited, you can have the SME share her desktop so you can watch what she does and that she captures the appropriate screenshots.

It Really Happened

I was doing a cyclical audit that another auditor had performed the previous year. Unlike the previous auditor, I watched the SME download the data to a spreadsheet. To my surprise, he downloaded 3 separate Excel worksheets and then combined them into one, which was not mentioned in the previous year’s workpapers. The SME explained that the system timed out when he tried to download the entire file, so instead he broke it up into 3 files and then combined it together.

Whenever a SME manipulates data, the chance of error increases dramatically. If I ever have to rely on data that I was not able to observe being generated, I always ask the SME whether the data was downloaded into one file or whether the data was formatted, tweaked, or adjusted in any way. If so, I ask the SME to document or describe the process, not only so I can validate the data, but also so I know what may need to be done the next time around to produce the same data.

Again, if data is adjusted in any way, I ask the SME how they KNOW they didn’t duplicate, lose, or accidentally change any records. In the case of the 3 worksheets, I’d want to hear the SME say something like, “I count the records in each worksheet and verify after I merge them that I have the same number of records. I also use the Excel Remove Duplicates command to ensure no duplicates were introduced.” Of course, if you’re observing the merge, you can see whether the SME counts the records and checks for duplicates.

I groan every time I find a problem that a SME could have identified by performing a simple totals check like the one described above. Even security and compliance professionals fail to do this on a regular basis when they provide data or perform their own controls.

Question Everything

Being skeptical is a auditor’s best friend. Asking WHY several times can help get to the bottom of a matter (see this for an introduction to 5 WHYs; here’s also an interesting rebuttal to this procedure). The point is, don’t take things at face value; ask additional questions if you think it’s warranted.

So no, I don’t question everything, just the things I don’t understand, are complicated, or don’t seem reasonable. The key is not to appear like you’re out to get someone or mistrust them–I like to say things like, “Help me understand this process by explaining X” or “Can you simplify that a little” or “Please provide more detail about X”.

And when you question things, use open-ended questions (questions that can’t be answered YES or NO).

Review Queries Carefully

Another way of validating whether you received the right data and complete data is to look at the query and have the business contact walk you through it.

For example, assume you want a list of vendors and vendor addresses so you can compare them to employee addresses to see if any fraud is occurring; you want to know whether any vendor address is actually an employee’s address. You also want to see whether someone is changing vendor addresses to cover their tracks.

You request the data, get the data, and you are told the following the SQL query produced the data:

SELECT Vendor_Account Vendor_Name Street City State Zip
WHERE Status = A

The WHERE statement limits the data; only vendors that have a status of ‘A’ are included in your data. You ask your business contact, what does A mean? Usually, you’ll get a reply that it means the vendor is active and payments can be made to them.

That’s fine, as long as you wanted only active vendors. But since you need inactive vendors (all vendors), you need that WHERE statement removed. You should also request a list of all status codes and their definitions.

You also need a couple more fields: 1) field that identifies the previous address value before it was last changed; 2) field that includes the date of the change, and 3) ID of the person who changed it. When the query is updated, you get this:

SELECT Vendor_Account Vendor_Name Street City State Zip Changed_Value Change_Date UserID

This query meets your needs better and is more likely to provide the data you need for testing. If it’s not clear from the query (and it is missing from the one above), ask for the name of the database in which this data is stored (I also ask for the name/OS of the server on which the database resides).

Always include the query and the explanation of each field and any limiting statements (like WHERE statements) in your workpaper.

Your Turn

I’d like to hear from you on these topics:

  • Whether population validation is performed on all your data
  • Do you regularly observe data being generated, or do you just receive it?
  • What your list of favorite questions or methods for getting more information?
  • Do you ever use desktop-sharing programs to watch your SMEs generate data?

Some other helpful methods are included in these posts:



Filed under Audit, How to...

10 responses to “How to Perform Population Validation

  1. So does anyone out there do population validation of the files used in an audit? Or do you just accept what you’re given as the truth?


  2. Is it OK if the internal auditor himself is the one who obtain the data required, from company’s server?


    • iac,
      Absolutely. In fact, that’s the best way of ensuring audit independence and that no one tampered with the data. Having said that, the auditor obtaining the data must understand what they are doing and that the method used to obtain the data is valid.

      For example, if you need data from the Northwest region and the Southwest region, but don’t realize that the data is stored in 2 separate tables and you only query 1 table, you won’t get the full population.

      Likewise, if you are obtaining all the admin accounts from Active Directory, but only list accounts in the Domain Admins group, you will be missing some of the admins, especially admins in company-specific groups that IT created above and beyond the Microsoft defaults.

      So get the data, but make sure you know it’s accurate. The first time you have the admin help you, and after you understand the process, you can obtain it yourself after that if you have the access. However, periodically, you need to talk to the admin to ensure that nothing has changed (e.g., a third table was added or another admin group was added).


  3. Pingback: New IT Auditors Should Start Here | ITauditSecurity

  4. Pingback: Auditors, Do Data Analytics or Die | ITauditSecurity

  5. Pingback: How to Profile Data | ITauditSecurity

  6. Pingback: Excel: Basic Data Analytics | ITauditSecurity

  7. Pingback: Before You Analyze Data | ITauditSecurity

  8. Pingback: Before You Analyze Data2 | ITauditSecurity

  9. Wow, 10 years later, and only 1 person ever left a comment. That makes sense, given all the auditors I know from many different companies, who don’t know you should do population validation or even how to do it.

    We have a sad profession.


Leave a Comment

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.