Excel: Basic Data Analytics

basic data analytics1Here’s a list of my basic data analytic procedures for Excel.

As I add more posts to the series, I’ll update this list.

I created this series because:

1) I often get asked by new AND EXPERIENCED auditors how to do these tasks,

2) when I review workpapers, I realize too many auditors are not aware of these functions,

3) these functions can save you a lot of time, and

4) you don’t need fancy tools like Audit Command Language to do data analytics (move over, skyyler!).

Also, I wanted to provide a list of simple analytics all in one place that you can use to explore your data.

Before You Analyze Data – an overview of population validation, data validation, and data profiling

How to Perform Population Validation – making sure your data is accurate and complete

Why You Must Validate Data

How to Profile Data

Excel: Identify Duplicates

Excel: Remove Duplicates

Excel: Identify Unique Values

Check Excel Data for Blank and Invalid Values (Part 1 – Dropdown)

Check Excel Data for Blank and Invalid Values (Part 2– Sort)

Excel: Text to Columns

Transform Data Fast with Excel Flash Fill

More Advanced Topics

Real Auditors Use Excel PowerPivot

Related Topics

Auditor Struggles – a 4-part series describing how some auditors struggled with these principles in a real audit, plus some solutions.

Stay tuned. More to come…..

28 Comments

Filed under Audit, Data Analytics, Free, How to..., Security

28 responses to “Excel: Basic Data Analytics

  1. This is a great post. I’d like you to cover on Excel Power Pivot. I still haven’t got chance to attend the training. Power Pivot is a more powerful pivot embedded in your Excel. Hopefully this can serve better to compensate some of Excel weakness compared with ACL.

    other essential functions in my opinion, you can add:
    text edit: create a common-key for reference using:
    – concatenate: or simply “&”. enable you to compare/lookup multiple criteria. e.g. combine EmployeeNo. and YYMM columns.
    – ctrl+h: quick add and replace (leave blank to remove) e.g. to combine names and remove spaces.
    – left: create common key e.g. just take the first 10-letter from the data (for vlookup later).

    – fuzzy lookup (https://www.microsoft.com/en-us/download/details.aspx?id=15011) a little addon for similarity lookup. I use this to compare vendors in multiple system, from a Holding Company view.

    I also an ACL user but I find Excel better because most of my data (from SAP) exported to Excel, so… why waste time analysing in ACL, then export to Excel again (to beautify/conclude for MS Word Report presentation).

    And, recently I downloaded an excel file with 300mb size and Excel failed to Open it. Kingsoft Spreadsheet can open but too bad the lines limited to 65536 only. No choice but to edit the text data. ACL also failed.

    Like

    • Latif Jxi,
      Thanks for all the comments…..I think you left the most comments by a reader in one day. Whoa!

      I’ve played with Power Pivot a bit, but its not on my blogging docket right now. Also havent’ used it in an audit yet. Need to finish the Basic Data Analytics series first.

      If I recall, I installed Power Pivot on a 64-bit Windows OS using 64-bit office, which allows the OS and Excel to access a lot more memory. And I did it in a virtual machine as it was a lot easier, cheaper, and faster than ordering a new laptop.

      I’ll look into your suggestions–thanks. I’ve been using Excel’s Fuzzy Lookup add-in, and I think it’s one of those gems that few people know about (for everyone else, it allows you to match columns in Excel without cleaning the data first, and the match does NOT need to be exact).

      I wrote some instructions up for my buddies, but it is really long. Fuzzy Lookup is easy to demonstrate and do, but hard to explain in written form. I really don’t want to get into videos, but some topics would lend themselves to that.

      The one mistake I see people make in Fuzzy Lookup (FL) is that they leave the ‘Number of Matches’ selection at the default of “1”. That means you will miss a second, third, or fourth match, as Excel will only show you the closest match. The main benefit behind this add-in is the non-exact or “close, but not perfect” match, which you won’t see if you use 1. I recommend at least 5 or 10 for this setting. [Recently an auditor told me about the big audit he did using FL and he never changed the setting from 1 in any of his tests. Wonder what he missed. When I told him about it, he said, “Really?”

      Although I use Excel when I can, I like ACL because you can script and automate audits, so if I ever think I’ll need to do it again, I use ACL.

      I’ve audited files larger than that in Excel. And in ACL, I’ve ran files with 25 million rows and 120 columns. Was slow in some cases, but it worked through it. That’s where those VMs come in handy!

      Like

  2. Hanuman Chalisa in English

    I savour, result in I found exactly what I was having a
    look for. You’ve ended my 4 day lengthy hunt! God Bless you man. Have
    a nice day. Bye

    Like

  3. Pingback: Excel: Identify Duplicates | ITauditSecurity

  4. Pingback: Excel: Remove Duplicates | ITauditSecurity

  5. Pingback: Excel: Text to Columns | ITauditSecurity

  6. Pingback: New IT Auditors Should Start Here | ITauditSecurity

  7. Pingback: Auditors, Do Data Analytics or Die | ITauditSecurity

  8. Pingback: Why You Must Validate Data | ITauditSecurity

  9. Scott Baker

    Wonderful article! We are linking to this great post on our website. Keep up the good work .

    Like

  10. Pingback: Will Robotics (RPA) Replace ACL? | ITauditSecurity

  11. Pingback: New IT Auditor (and WannaBEs) Master List | ITauditSecurity

  12. Kyle

    Hi Mack.

    I understand these functions in excel, but how do we utilize it to perform data analytics?
    What are the common purposes of performing data analytics? This area is still a little vague for me.

    Thanks.

    Kyle

    Like

    • Kyle,
      That’s a great question because understanding AND PERFORMING these functions is foundational to analyzing data (call it analytics, whatever).

      First, analytics is basically analyzing data to understand it, catch errors, find efficiencies, and gather insights.

      For example, are we charging the right amount for car insurance?

      When people are young, they pay more for car insurance. When they are really old, they tend to have more accidents, so they are higher risk.

      So it’s critical to have the right birthdate for your customers. One auditor, when validating birthdates in a database, discovered that some children were older than their parents (birthdate entry error), so the children never paid more during their early driving years.

      So you need to validate that your data is correct. I got into this more in my post, Why You Must Validate Data.

      If your file has duplicate rows in your customer sales table, you sales will appear higher than they are due to the ‘extra’ sales.

      Also, profiling your data, such as summarizing the data by region or salesperson will show you who is selling the most or the least. It will provide you with an average to measure everyone against.

      And if you identify outliers (in this case, what one person sold the least or the most), you might want to look at that case more closely, especially in the case of the person who sells 50% more than everyone else.

      Another example: if you determine that customers who buy A also tend to buy B, you can have your website suggest B when they add A to the shopping cart.

      And the list goes on. Does that help?

      Like

      • Kyle

        As always, very informative. Thanks for clearing that up for me!
        In this case, I believe data analytics is just a tool, while what you do with the tool depends on what you want to achieve.

        Is that right?

        Like

        • Kyle,
          Analytics is a process, not a tool. It is how you clean, format, and prepare data before you analyze it or use it to generate insights, with the analysis piece the most critical piece. Of course your analysis quality depends on how well your clean/prepare your data and understand it (understanding comes from profiling the data).

          UPDATE: Kyle posted our discussion on his blog (see pingback below). Thanks, Kyle!

          Like

  13. Pingback: # 2 What is Data Analytics – Kyle Bits

  14. Pingback: The Analytic Staircase for Auditors | ITauditSecurity

  15. Pingback: Kyle and a Conversation about Analytics | ITauditSecurity

  16. Pingback: How to Profile Data | ITauditSecurity

  17. Pingback: Before You Analyze Data | ITauditSecurity

  18. Pingback: Before You Analyze Data2 | ITauditSecurity

  19. Pingback: How to Profile Data | ITauditSecurity

  20. Pingback: xLookup Coming to Excel Near You! | ITauditSecurity

  21. John

    Hi Mack,
    I found my way here browsing some of your older Excel related posts. You note you would update this as other features or functions caught your eye, and thought I would pass along a couple I am enjoying.
    TEXTJOIN is awesome! In your post on excel duplicates I noted that I use painful concatenations with the ampersand. TEXTJOIN kicks the crap out of those and it is an array function (without the need for CTRL+SHIFT+Enter), so you can include the UPPER and TRIM functions within your join. One of my favorite bonus uses is that it acts similar to Subtotal so if you have a column of email addresses you can use it with a semicolon delimiter and it generates a distribution list you can paste into the To: (BCC) field of an email. And like subtotal, when you filter the table this calculated text will truncate to just the filtered people.
    Next XLOOKUP,
    Begone INDEX(MATCH()) XLOOKUP rocks the Kasbah.
    Of course the POWER PIVOT tools kind of go without saying.

    Liked by 1 person

  22. Pingback: Most Popular Blog Posts of 2021 | ITauditSecurity

  23. Pingback: Dilbert Does Big Data | ITauditSecurity

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.