Data catalogs and their value

Let’s briefly outline the key points about an integral part of any data repository: the data catalog.

Good data governance starts with a business glossary. Before data can make sense to anyone, we first need to define what each data element means.
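
A glossary entry can be as simple as a term, its agreed definition, an accountable business owner and, where relevant, its allowed domain values. The Python sketch below assumes exactly that structure. `GlossaryTerm`, `BusinessGlossary`, the `steward` field and the sample term are hypothetical illustrations, not a prescribed schema or any particular catalog product's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class GlossaryTerm:
    """One business glossary entry (hypothetical structure)."""
    name: str
    definition: str
    steward: str  # business owner accountable for the definition
    allowed_values: frozenset[str] = field(default_factory=frozenset)


class BusinessGlossary:
    """Minimal in-memory glossary keyed by case-insensitive term name."""

    def __init__(self) -> None:
        self._terms: dict[str, GlossaryTerm] = {}

    def add(self, term: GlossaryTerm) -> None:
        key = term.name.casefold()
        if key in self._terms:
            # One agreed definition per term: conflicts are resolved, not overwritten.
            raise ValueError(f"term already defined: {term.name!r}")
        self._terms[key] = term

    def define(self, name: str) -> str:
        return self._terms[name.casefold()].definition

    def is_allowed(self, name: str, value: str) -> bool:
        term = self._terms[name.casefold()]
        # An empty value set means the term has no enumerated domain values.
        return not term.allowed_values or value in term.allowed_values


glossary = BusinessGlossary()
glossary.add(GlossaryTerm(
    name="Customer Status",
    definition="Current lifecycle state of a customer account.",
    steward="Customer Operations",
    allowed_values=frozenset({"ACTIVE", "SUSPENDED", "CLOSED"}),
))
print(glossary.define("customer status"))  # Current lifecycle state of a customer account.
print(glossary.is_allowed("Customer Status", "PENDING"))  # False
```

Rejecting duplicate terms instead of silently overwriting them forces disagreements about meaning to be settled explicitly.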

Whatever methodology you choose for cataloging your data and building trust in it, it should aim to include the following points:

  1. Collect existing documentation describing the data: data dictionaries, domain values, data categories, etc.
  2. Delegate some tasks to business users so that they contribute business definitions to the data catalog.
  3. Delegate technical metadata tasks to the operations and technology teams.
  4. Limit the number of stakeholders to reach consensus sooner.
  5. Automate metadata discovery wherever possible, and combine it with manual entry.
  6. Keep the metadata up to date and reliable.
  7. Technology teams must keep track of the transformations applied to the data and how those transformations change over time (data lineage).
  8. Stop relying on reverse-engineering metadata from existing source code. Cataloging the data needs to be part of application and platform development. This has a dual benefit: it helps development teams integrate with each other, saving cycles and costs, and it contributes to the data catalog.
  9. Prioritize data quality activities at the start of data asset onboarding. They are much more costly to perform once the data is in production.
  10. Plan to eliminate point-to-point relationships between data producers and consumers by encouraging publish-once, consume-many patterns. This saves cycles and costs for the producer system, which no longer has to build and maintain a separate feed for each consumer.
  11. Keep the data onboarding process simple and lightweight. If it takes 8 weeks to onboard a data element, data catalog adoption will suffer.
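
The point about automating metadata discovery and combining it with manual entry can be sketched in Python. This is a minimal sketch that assumes the source is a SQLite database, chosen only because Python's standard `sqlite3` module keeps the example self-contained. The functions `discover_columns`, `merge_manual` and `undocumented` and the `customer` table are hypothetical. Automation harvests the technical metadata, people supply the business descriptions, and anything still undescribed is reported back so it can be filled in.

```python
import sqlite3


def discover_columns(conn: sqlite3.Connection) -> dict:
    """Automatically harvest technical metadata (tables, columns, types)."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        # The table name comes from sqlite_master, not from user input.
        rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [
            {"column": name, "type": col_type, "nullable": not notnull,
             "description": None}
            for _cid, name, col_type, notnull, _default, _pk in rows
        ]
    return catalog


def merge_manual(catalog: dict, manual: dict) -> dict:
    """Overlay manually curated descriptions keyed by (table, column)."""
    for table, columns in catalog.items():
        for col in columns:
            col["description"] = manual.get((table, col["column"]))
    return catalog


def undocumented(catalog: dict) -> list:
    """Columns that automation found but nobody has described yet."""
    return [(t, c["column"]) for t, cols in catalog.items()
            for c in cols if c["description"] is None]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, status TEXT NOT NULL, email TEXT)"
)
catalog = merge_manual(
    discover_columns(conn),
    {("customer", "status"): "Lifecycle state of the customer account"},
)
print(undocumented(catalog))  # [('customer', 'id'), ('customer', 'email')]
```

Running discovery on a schedule and re-applying the manual layer is one way to keep the metadata current without losing curated descriptions.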
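
Tracking transformations and how they change over time amounts to keeping a versioned lineage record. The Python sketch below assumes a very simple model: each transformation names its output, its inputs, a description of its logic, a version and an effective date. `Transformation`, `LineageLog` and the dataset names are hypothetical and are not taken from any particular lineage tool. Older versions are kept rather than overwritten, and `upstream` walks the latest recorded versions to find every dataset a given output depends on.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transformation:
    """One recorded step producing `output` from `inputs` (hypothetical model)."""
    output: str
    inputs: tuple[str, ...]
    logic: str  # description of, or pointer to, the transformation code
    version: int
    effective_from: date


class LineageLog:
    """Keeps every version of every transformation so changes stay visible."""

    def __init__(self) -> None:
        self._history: defaultdict[str, list[Transformation]] = defaultdict(list)

    def record(self, step: Transformation) -> None:
        self._history[step.output].append(step)

    def current(self, dataset: str) -> Transformation | None:
        """Latest recorded version by effective date, or None for a raw source."""
        steps = self._history.get(dataset)
        return max(steps, key=lambda s: s.effective_from) if steps else None

    def upstream(self, dataset: str) -> set[str]:
        """All datasets the current version of `dataset` ultimately depends on."""
        seen: set[str] = set()
        stack = [dataset]
        while stack:
            step = self.current(stack.pop())
            if step is None:
                continue  # raw source: no recorded transformation
            for src in step.inputs:
                if src not in seen:
                    seen.add(src)
                    stack.append(src)
        return seen


log = LineageLog()
log.record(Transformation("orders_clean", ("orders_raw",), "drop test orders", 1, date(2023, 1, 1)))
log.record(Transformation("revenue_daily", ("orders_clean",), "sum amounts per day", 1, date(2023, 1, 1)))
# The logic changed: revenue is now converted using an exchange-rate feed.
log.record(Transformation("revenue_daily", ("orders_clean", "fx_rates"), "sum amounts per day in USD", 2, date(2023, 6, 1)))
print(sorted(log.upstream("revenue_daily")))  # ['fx_rates', 'orders_clean', 'orders_raw']
```

Because version 1 of `revenue_daily` is still in the log, a consumer can see both what the dataset depends on today and when that dependency changed.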
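
Prioritizing data quality at onboarding can take the form of a gate that a new data asset must pass before it goes to production. The Python sketch below assumes rule-based checks on a sample of records and a tolerated failure rate per rule. `Rule`, `count_failures`, `onboarding_gate`, the column names and the sample data are all hypothetical. Real quality frameworks offer far richer checks, so treat this only as an illustration of the idea.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    """A single data quality rule on one column (hypothetical structure)."""
    column: str
    description: str
    check: Callable[[Any], bool]


def count_failures(records: List[Dict[str, Any]], rules: List[Rule]) -> Dict[str, int]:
    """Number of records failing each rule."""
    failures = {rule.description: 0 for rule in rules}
    for record in records:
        for rule in rules:
            if not rule.check(record.get(rule.column)):
                failures[rule.description] += 1
    return failures


def onboarding_gate(records: List[Dict[str, Any]], rules: List[Rule],
                    max_failure_rate: float = 0.0) -> Tuple[bool, Dict[str, int]]:
    """Accept a new data asset only if every rule fails on at most
    `max_failure_rate` of the sampled records."""
    if not records:
        return False, {}  # nothing to assess: do not onboard blind
    failures = count_failures(records, rules)
    passed = all(n / len(records) <= max_failure_rate for n in failures.values())
    return passed, failures


rules = [
    Rule("customer_id", "customer_id is present", lambda v: v is not None),
    Rule("status", "status is a known domain value",
         lambda v: v in {"ACTIVE", "SUSPENDED", "CLOSED"}),
]
sample = [
    {"customer_id": 1, "status": "ACTIVE"},
    {"customer_id": None, "status": "ACTIVE"},
    {"customer_id": 3, "status": "ACTIV"},
]
passed, failures = onboarding_gate(sample, rules)
print(passed, failures)
# False {'customer_id is present': 1, 'status is a known domain value': 1}
```

The domain values checked here are the kind of information the catalog already holds, so running such checks at onboarding reuses catalog content instead of rediscovering it after the data is in production.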
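
The difference between point-to-point feeds and a publish-once, consume-many pattern can be illustrated with a toy in-memory broker. This is a sketch only: `Broker`, the topic name `customer.updated` and the three consumers are hypothetical, and a real deployment would rely on a message broker or shared data platform rather than Python lists. The producer sends each event once, however many consumers subscribe.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class Broker:
    """Toy in-memory broker: a producer publishes to a named topic once,
    and every subscriber of that topic receives the same event."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.publish_calls = 0  # how many times producers had to send data

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Dict[str, Any]) -> None:
        self.publish_calls += 1
        for handler in self._subscribers.get(topic, []):
            handler(event)


broker = Broker()
billing: List[Dict[str, Any]] = []
analytics: List[Dict[str, Any]] = []
crm: List[Dict[str, Any]] = []
for consumer in (billing, analytics, crm):
    broker.subscribe("customer.updated", consumer.append)

broker.publish("customer.updated", {"customer_id": 42, "status": "CLOSED"})
print(broker.publish_calls, len(billing), len(analytics), len(crm))  # 1 1 1 1
```

Adding a fourth consumer is one more `subscribe` call. The producer's code and its single `publish` call stay unchanged, which is where the savings for the producer system come from.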