Real-life example: recommendations issued to a client

This is a real-life example: recommendations issued to a client after review of their environments and development processes.

The client had a lot ETL coded in Stored Procs and it wasn’t feasible to change that – but still some things could be improved a lot without breaking the bank or causing a revolution – as can be seen from below the basics were severely lacking.

The below could apply to many organizations, so it’s worth listing out:

  • All jobs, stored procedures, ddl, schemas need to be stored in Git
  • Peer review needs to be done for all the code. This helps developers sharing better design patterns among themselves, spotting bad solutions sooner, training weaker developers to get up to speed.
  • A senior developer/architect needs to guide others by reviewing their code from time to time, by explaining solution design.
  • Code needs to be refactored, there’s a lot of redundancy and copy-paste wastage
  • Confluence needed to be up-to-date, this will make it easy to onboard new team members.
  • Confluence is good for organizing and searching for information, thus it needs to have pages that describe onboarding steps, location of the shared drives, databases, source code, data sources, data lineage, ETL jobs, ETL inputs, outputs, views, consumers of data.
  • Data catalogue needs to be created and be up to date all the times, this is where data consumers come to see if their requests can be satisfied
  • Data dictionary needs to be up to date and used for every extract, no more point-to-point feeds where domain knowledge of each system assumed but not verified.
  • Data consumers need to provide up-to-date BRDs, even if a simple 3-page doc. BRDs need to be stored in confluence.
  • Jira tickets should be used for all requests. Workload should be measured and priorities set based on how quickly jira tickets get resolved, use Jira reports and charts for this.
  • DB performance needs to be baselined and measured on a regular basis. Every DB has tools to view work load, heaviest queries, hot spot objects in the database, most frequently executed queries that hammer database and slow down other requests, queries that bottleneck on disk reads. This will give user a clue about what needs to be optimized.
  • Every new ETL needs to be checked for performance, by having a senior developer or DBA review query execution plan (every DB allows viewing the plan before the query is run)
  • All jobs need to produce good logging messages for investigation when needed. Logs should be easy to access and every production problem investigation should start with reviewing logs.
  • Alerts of the errors are good. Warnings before the problems happen are better (disk space is close to capacity, CPU is continuously above 80%, memory is swapping, number of DB sessions is close to the max etc.)
  • Too many meetings? Make sending clear agenda before each meeting mandatory. This will allow people who don’t benefit from attending skip the meeting.
  • Make sending meeting minutes mandatory. This helps clarifying any potential misunderstanding during the meeting, and assigns clear action items to the attendees