Intro
Over the past 7-8 years, I have worked with many students who have recently graduated. They come out with plenty of skills and are almost ready for the business world. Most start their careers doing basic report building, then gradually work their way into other parts of the data and analytics space. That is the exact same journey I took more than 10 years ago, albeit on more primitive technology.
The challenge I see is that new graduates have no experience with dynamic data: data that changes every time you look at it. That is the crux of what makes data engineering difficult and so different from what you learn in school.
In this series, we will develop an environment and framework for strategic analytics. This is different from operational analytics, and the primary difference is the type of decision the system supports. Almost 15 years ago, Daniel Kahneman wrote an excellent book called Thinking, Fast and Slow. The two systems of thinking it describes map neatly onto that difference:
- “System 1” is fast, instinctive and emotional.
- “System 2” is slower, more deliberative, and more logical.
System 1 is what operational analytics supports. For example, if you're driving and see a speed trap, you will immediately look down at your speedometer and make a decision: am I going to get a ticket? See my post HERE on whether operational analytics are right for your situation.

System 2 is what strategic analytics supports. How much should I put in my RRSP (401K for any Americans out there)? What should I sell my car for? These types of decisions need more thorough analysis, and they are what we're going to build a system for.
What do we need?
I have modified the Wide World Importers database to make it more current. There was also a procedure that randomly caused errors, so I disabled it. You can find the repository on my GitHub page. We're going to publish this database to Azure SQL using Visual Studio Community Edition, and later in the series we'll develop another database to control extraction from that (or any other) source.
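Once the database is published, it's worth doing a quick connectivity check before building anything on top of it. Here's a minimal sketch in Python using pyodbc; the server name, user, and password are placeholders you'd swap for your own, and the query assumes the Sales.Orders table from the stock WWI schema:

```python
import pyodbc

# Placeholder connection details: substitute your own Azure SQL
# server, database name, and credentials.
conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:your-server.database.windows.net,1433;"
    "Database=WideWorldImporters;"
    "Uid=your_user;Pwd=your_password;"
    "Encrypt=yes;"
)

conn = pyodbc.connect(conn_str)
cursor = conn.cursor()

# Sales.Orders ships with the Wide World Importers sample;
# the most recent orders tell us how "current" the data is.
cursor.execute(
    "SELECT TOP 5 OrderID, OrderDate "
    "FROM Sales.Orders ORDER BY OrderDate DESC;"
)
for row in cursor.fetchall():
    print(row.OrderID, row.OrderDate)

conn.close()
```

If the latest OrderDate comes back close to today, the modified database published correctly.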
I am building this system on Azure Synapse, but nearly everything should translate to Microsoft Fabric. That will probably be content for a future iteration of this series, once I feel better about that platform's maturity.
I recognize that these services have a cost, but there are numerous ways to get free Azure credits for these types of endeavors.
These are the components we’re going to use or create in the next post:
- Visual Studio Community Edition
  - "Data storage and processing" workload checked
- Azure SQL Server
  - Wide World Importers database
- Azure Synapse Analytics environment
  - Associated Data Lake Storage Gen2 account
  - Spark pool (we'll create this later in the series)
In the next post, I'll show exactly how to set up the environment and configure it to generate new data automatically every day.
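To give a flavour of what that daily job will drive, here's a rough sketch of invoking the sample's data simulation from Python. The stock WWI sample ships a DataLoadSimulation.PopulateDataToCurrentDate procedure that advances the data to today's date; the parameter names below come from the published sample, so verify them against your copy (mine is modified), and reuse the placeholder connection string from the earlier snippet:

```python
import pyodbc

# Same placeholder connection string as the earlier snippet.
conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:your-server.database.windows.net,1433;"
    "Database=WideWorldImporters;"
    "Uid=your_user;Pwd=your_password;"
    "Encrypt=yes;"
)

# autocommit so the long-running simulation isn't held in one transaction.
conn = pyodbc.connect(conn_str, autocommit=True)
cursor = conn.cursor()

# Parameter names are taken from the stock sample; check them
# against the copy in your repository before scheduling this.
cursor.execute("""
    EXEC DataLoadSimulation.PopulateDataToCurrentDate
        @AverageNumberOfCustomerOrdersPerDay = 60,
        @SaturdayPercentageOfNormalWorkDay = 50,
        @SundayPercentageOfNormalWorkDay = 0,
        @IsSilentMode = 1,
        @AreDatesPrinted = 0;
""")
conn.close()
```

Scheduling a call like this once a day is what gives us the dynamic, ever-changing data that makes this series realistic.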