11 Ways to Really Reveal Data Maturity

If you want to measure (or audit) data maturity in your company, many different data maturity models exist. ISACA has one, the IIA has one, and the Big 4 all have one.

In fact, data maturity models are everywhere, and they can be helpful. But regardless of the score your company gets when rated by one of these models, the actual maturity is often much worse.

Here are 11 ways to really reveal data maturity. While some of the following items are measured to some degree in some of these models, I like to stick my thumb in the air and determine how strong these headwinds really are.

-1- Fixing everything is not always possible or desirable, but are they at least stopping the bleeding?

In other words, when new data sources (or applications) are captured or created, are the proper edits in place to avoid creating bad data (e.g., text notes in a bank account field or an account number in an address field)?

Has a standard method of capturing or storing data been defined? Has it been enforced on at least new applications and data sources? For example, are client names always entered in separate fields such as prefix (Mrs., Dr.), first name, middle initial, last name, and suffix (CPA, JD)? Do you have a plan for dealing with multiple first, middle, and last names?

If not, then your company really doesn’t have a data plan and everyone will continue to clean data into eternity.
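As a rough illustration of an edit at the point of capture, here is a minimal Python sketch. The 8-to-12-digit account number rule, the ClientName fields, and the function name validate_new_record are all assumptions made up for this example; your own standards would define the real rules.

```python
import re
from dataclasses import dataclass

# Hypothetical rule: an account number is 8 to 12 digits and nothing else.
ACCOUNT_NUMBER = re.compile(r"[0-9]{8,12}")


@dataclass
class ClientName:
    """One field per name part, rather than a single free-text name."""
    first: str
    last: str
    prefix: str = ""   # e.g., Mrs., Dr.
    middle: str = ""   # may hold several middle names or initials
    suffix: str = ""   # e.g., CPA, JD


def validate_new_record(account_number: str, name: ClientName) -> list[str]:
    """Return a list of problems; an empty list means the record can be saved."""
    problems = []
    # Rejects text notes or anything else that is not purely digits.
    if not ACCOUNT_NUMBER.fullmatch(account_number.strip()):
        problems.append("account number must be 8 to 12 digits")
    if not name.first.strip():
        problems.append("first name is required")
    if not name.last.strip():
        problems.append("last name is required")
    return problems
```

A capture screen or import job could refuse to save any record for which this returns a non-empty list, so text notes never reach the account field in the first place.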

-2- Do all or several departments have their own data teams to manage all the work it takes to find, obtain, clean, and transform data?

As you increase your data maturity, the amount of time people have to spend finding, obtaining, cleaning, and transforming data should decrease.

Have you ever seen a measure like this on a data maturity assessment? I haven’t. If you see one, let me know.

Skyyler knows a company (big, well known) that wanted to create a single source of truth for key data used across the company by applications and data analysts. They took most of the data people from each department and moved them to the new global data team. After 1 year, the departments started hiring their own data staff again, because it took too long for the global team to process all the data requests and the data wasn’t any better than before. Everyone still had to do a lot of cleaning and formatting.

That same company is on its 3rd attempt to accomplish this, and the third time is not the charm by any means.

-3- The same fieldname exists in different tables with different meanings.

It gets worse; I’ve seen the same fieldname in 2 different tables in the same database. In one table, termination date meant the date an employee was terminated from the company; in another table, termination date meant the date the employee’s benefits ceased.

Even if you have a complete data dictionary with definitions, this can still be confusing; besides, does everyone consult the DD? I mean, everyone knows what termination date means, so why look it up?
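A script cannot tell you that two fields mean different things, but it can at least list the field names that appear in more than one table so someone can review them. Below is a minimal sketch that assumes a SQLite database; the function name shared_column_names is made up for this example, and other databases expose their catalogs differently.

```python
import sqlite3
from collections import defaultdict


def shared_column_names(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Map each column name found in more than one table to those tables."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    seen = defaultdict(list)
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the name.
        for column in conn.execute(f'PRAGMA table_info("{table}")'):
            seen[column[1].lower()].append(table)
    # Keep only the names that occur in two or more tables.
    return {name: found for name, found in seen.items() if len(found) > 1}


# Usage with two made-up tables that both carry a termination_date field.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER, termination_date TEXT)")
conn.execute("CREATE TABLE benefits (benefit_id INTEGER, termination_date TEXT)")
candidates = shared_column_names(conn)
```

Each name in the result is only a candidate for review; a person still has to compare the definitions.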

-4- Speaking of a data dictionary, is it secret, is it safe?

Yes, Gandalf, it usually is secret in that it doesn’t exist, and no, it’s not safe for anyone to use that data, given its lack of maturity or lineage. If you don’t have lineage, how do you comply with the CCPA when requested to delete a customer’s data?

When data dictionaries do exist, they are about as up to date as disaster recovery plans.

When data is enhanced with other data, is that explained in the data dictionary? How do I know whether a field value contains raw data (as provided by the client) or enhanced data (the value was raised or lowered by the business due to factors X, Y, and Z)?

-5- Does the data life cycle include reviewing how the data is used across the company?

If not, how do Legal, Audit, Compliance, and, last and usually least, the data owner know the data is being used appropriately? Some datasets cannot be used for some purposes, by law.

-6- Is key data cleaned and key applications upgraded to avoid future problems, or is the data landed elsewhere and then transformed with all kinds of band aids?

A pig wearing lip gloss is still a pig, which will eventually become bacon.

-7- Is data regularly checked for accuracy?

When data is added or captured, is it checked? Is it also checked regularly to ensure it stays accurate during data edits, database upgrades, etc.?

For example, when a new employee is hired, does the record get checked to ensure all key fields are added? First name, last name, address, SSN, birthdate, date of hire, etc.?

Human Resources doesn’t always get all of an employee’s data right away (such as which health plan is active or the Active Directory user ID), so does a job run at least monthly to identify missing or incorrect data?

When one of my apps imports employee data to run a scheduled fraud analysis, it sends me an email when data is missing or wrong. Usually the SSN is missing or not 9 digits, or 2 records exist for the same employee.

When I find this, I have to open a help desk ticket to note the problem, and in a week or so, it is fixed. Each time this occurs (once a quarter at least), I ask the DBA why they don’t have these types of routines in place. I won’t bother posting the stupid answers I’ve received.
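The kind of routine I keep asking about does not need to be complicated. Here is a minimal Python sketch of the checks described above (missing key fields, an SSN that is not 9 digits, and 2 records for the same employee). The field names, the employee_id key, and the function name are assumptions for illustration, not the layout of any real HR system.

```python
from collections import Counter

# Hypothetical list of key fields every employee record should carry.
REQUIRED_FIELDS = ("first_name", "last_name", "ssn", "birthdate", "hire_date")


def find_employee_data_issues(records: list[dict]) -> list[str]:
    """Return one message per problem found; an empty list means all clear."""
    issues = []
    # More than one record with the same employee ID is a duplicate.
    id_counts = Counter(r.get("employee_id") for r in records)
    for employee_id, count in id_counts.items():
        if count > 1:
            issues.append(f"{employee_id}: {count} records for the same employee")
    for r in records:
        employee_id = r.get("employee_id")
        for field in REQUIRED_FIELDS:
            if not str(r.get(field) or "").strip():
                issues.append(f"{employee_id}: {field} is missing")
        # Ignore dashes, then require exactly 9 ASCII digits.
        ssn = str(r.get("ssn") or "").replace("-", "")
        if ssn and not (ssn.isascii() and ssn.isdigit() and len(ssn) == 9):
            issues.append(f"{employee_id}: SSN is not 9 digits")
    return issues
```

Run on a schedule, the returned list could go straight into an email or a help desk ticket.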

-8- As data is imported and used elsewhere in the company, how do you know you got all the data and that it was transferred accurately, without dropping data or corrupting it?

Although it’s 2024, I still find a lot of teams use only record counts in verifying data was transferred correctly. Some don’t even do that; last month I received several files 1 KB in size (usually they are over 1 GB each).

In this case, no record counts were used, as the team wasn’t aware of the issue until I called (more on that in #9 below).

Record counts can only imply completeness, but even accurate record counts can give the wrong impression. If you’re expecting 100K records with 10 columns and only get 100K records with 1 column, does that work for you? How about 100K of corrupted gobbledygook?

Totaling the dollar amount of all transactions in the original file and comparing it to the same dollar amount in the copied file is one example of a good data accuracy and completeness test. All key data in a table need to be tested.
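A control total check like this can be sketched in a few lines of Python. The example below assumes both files are CSV text with a header row and an amount column; the function names summarize and transfer_problems are made up for illustration. It compares record count, column names, and the dollar total, so a file with the right number of rows but the wrong content still gets caught.

```python
import csv
import io
from decimal import Decimal


def summarize(csv_text: str, amount_column: str) -> dict:
    """Row count, column names, and a control total for one delimited file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        "rows": len(rows),
        "columns": reader.fieldnames or [],
        # Decimal avoids floating-point drift in dollar totals. A missing
        # amount counts as 0; a non-numeric amount raises an error.
        "total": sum((Decimal(row.get(amount_column) or "0") for row in rows),
                     Decimal("0")),
    }


def transfer_problems(source: dict, copy: dict) -> list[str]:
    """Compare two summaries and list every mismatch."""
    return [f"{key}: source={source[key]} copy={copy[key]}"
            for key in ("rows", "columns", "total") if source[key] != copy[key]]


# Usage: the record counts match, but one amount was damaged in transit.
original = "id,amount\n1,10.50\n2,20.25\n"
received = "id,amount\n1,10.50\n2,2.25\n"
problems = transfer_problems(summarize(original, "amount"),
                             summarize(received, "amount"))
```

In this made-up case a record count check alone would pass, while the control total exposes the difference.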

-9- When a data process fails, does someone get notified and proactively fix it?

Regarding the 1 KB files I mentioned in #8, fortunately, MY PROCESS checks the size of the data I receive, so I got an email notifying me of the size issue.

The business team was unaware of the failure (thanks for telling us, as other departments were probably also affected). The team had no alert that the job failed (technically, they said, the job didn’t fail as it transferred all the files, just not enough records. WHAT?).
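A size check of this kind can be very small. The sketch below is not my actual process; it is a minimal Python illustration in which the function names and the minimum-size threshold are assumptions. The point is that an undersized file is treated as a job failure, so the job cannot report success just because every file arrived.

```python
import os


def file_sizes(paths: list[str]) -> dict[str, int]:
    """Look up the size in bytes of each delivered file."""
    return {path: os.path.getsize(path) for path in paths}


def undersized(sizes: dict[str, int], expected_min_bytes: int) -> list[str]:
    """Names of files smaller than the smallest size we would expect."""
    return sorted(name for name, size in sizes.items()
                  if size < expected_min_bytes)


def check_delivery(sizes: dict[str, int], expected_min_bytes: int) -> None:
    """Fail the job loudly when any file is suspiciously small."""
    small = undersized(sizes, expected_min_bytes)
    if small:
        # Raising makes the scheduler see a failure, which can trigger an alert.
        raise RuntimeError("Delivery failed size check: " + ", ".join(small))
```

For files that normally run over 1 GB, a threshold of a few hundred megabytes would have flagged the 1 KB files immediately.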

When asked how I could be comfortable that the problem won’t recur, they said ‘because we fixed it.’

Yes, I pass this info to the IT audit team for consideration in future audits, but I never hear about it again.

-10- When you get data from an ad hoc or a scheduled request, does the business team validate it first?

I regularly tell new auditors on the team that I usually have to request data 3x before I get correct data. The first time is the original request, and the next 2 are to fix problems I found while validating the data because the team didn’t get it right (you’d think after the 2nd request, they’d make sure they got it right? They don’t).

This often occurs because the business analysts and the DBAs working on the request are not familiar with the data or the products the company sells. If I ask for a list of all cleaning supplies sold for all regions, I am often missing a couple of new products and occasionally a new sales region that was just added (or split off an existing one).

Make sure you always validate your data.
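One simple way to validate an extract is to compare it against an independent reference list, such as the current product catalog or region list. Here is a minimal Python sketch; the function names, column names, and sample values are assumptions made up for this example.

```python
def missing_values(rows: list[dict], column: str, expected: set[str]) -> set[str]:
    """Expected values that never appear in the given column of the extract."""
    return expected - {row[column] for row in rows}


def unexpected_values(rows: list[dict], column: str, expected: set[str]) -> set[str]:
    """Values in the extract that are not on the reference list."""
    return {row[column] for row in rows} - expected


# Usage: the extract covers East and West, but Central was recently added.
extract = [
    {"product": "Bleach", "region": "East"},
    {"product": "Soap", "region": "West"},
]
regions_not_delivered = missing_values(extract, "region",
                                       {"East", "West", "Central"})
```

The same two functions work for products, account types, or any other field that has an authoritative list somewhere else in the company.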

-11- Are most data jobs automated, and do they run under a system ID?

If this isn’t true, how does an auditor trust the data? How does the business?

A manual process can easily alter or ‘lose’ data, especially when a personal user ID is used. Also, who can find or run the query/process when that person leaves?

Conclusion

Regardless of your official data maturity score, if you haven’t addressed the above issues, your data maturity teenager still needs some parental assistance.

 

 


Filed under Audit, Data Analytics, Data Science, Technology
