The Beginning: Domains of Knowledge to Become Data Experts

Caesario Kisty
Nov 7, 2022

Since 2019, I have been fascinated by everything about data: Data Management, which designs effective Data Governance; building data pipelines, as Data Engineers do; and building models for predictive analytics, as Data Scientists do. I learned a bit of it all, even if only on the surface.

One day I realized that I had to decide which part to dive into first. Data Management? Data Engineering? Data Analysis? Or Data Science? Then I learned about “Garbage In, Garbage Out”. Because of the messy “Data Kitchen” at my company, I think jargon related to AI, Machine Learning, Deep Learning, and Data Science would be too fancy for them. Hence, I am embarking on this Data Engineering journey to tidy up our “Data Kitchen”.

Before writing too much about Data Engineering on this platform, I would like to share my first thoughts about Data Engineers, Data Analysts, and Data Scientists. I believe these thoughts are important for everyone, or at least for me, when deciding to begin a career in data. Data careers, like any career, require specific domain knowledge that helps us perform our duties. In data careers in particular, whether as a data engineer, analyst, or scientist, there are three domains of knowledge that everyone who wishes to advance should possess.

First, there is the Math and Statistics domain. This domain of knowledge helps you think logically and analytically. Suppose you run a coffee shop, and you notice that people buy Iced Coffee more often than Hot Coffee in summer and vice versa in winter. In addition, some buyers who are celebrating a birthday treat their friends to coffee while hanging out or having a meeting. One day your boss wants to give a Summer Discount to buyers celebrating their birthday, but only on specific dates. How do you convince your boss which kind of coffee should be discounted and when the discount should apply? You can’t make that decision based on a guess alone, can you? Your boss wants something quantifiable that shows one decision will bring your coffee shop more engagement than another. In that sense, Math and Statistics can be used to predict and quantify the best decision.
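To make the coffee shop example a little more concrete, here is a minimal Python sketch of how the seasonal pattern could be quantified. The daily cup counts, the field names, and the helper functions `iced_share_by_season` and `most_popular_product` are all made up for illustration; a real analysis would use the shop’s actual sales records and probably a proper statistical test.

```python
from statistics import mean

# Hypothetical daily cup counts, invented purely for illustration.
SALES = [
    {"season": "summer", "iced": 120, "hot": 40},
    {"season": "summer", "iced": 135, "hot": 35},
    {"season": "winter", "iced": 30, "hot": 110},
    {"season": "winter", "iced": 25, "hot": 125},
]


def iced_share_by_season(rows):
    """Return the average daily share of iced coffee among all cups sold, per season."""
    shares = {}
    for row in rows:
        total = row["iced"] + row["hot"]
        shares.setdefault(row["season"], []).append(row["iced"] / total)
    return {season: mean(values) for season, values in shares.items()}


def most_popular_product(rows, season):
    """Return "iced" if iced coffee makes up more than half of sales in the season, else "hot"."""
    share = iced_share_by_season(rows)[season]
    return "iced" if share > 0.5 else "hot"


if __name__ == "__main__":
    print(iced_share_by_season(SALES))
    print(most_popular_product(SALES, "summer"))
```

With numbers like these in hand, the conversation with your boss is about measured shares per season rather than a hunch. Whether the discount should go to the popular product or the unpopular one is still a business decision.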

Second, the Computer Science domain. Have you ever felt frustrated when your application responded sluggishly? Then you realized that it was loading a huge amount of data to display on the interface. How do you deal with that? Most data today is created by digital devices, and its volume is amplified by growing internet usage. With Computer Science you will learn how to optimize your application, scale storage, and process data more efficiently.
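One common way to deal with a sluggish screen is to stop loading everything at once and fetch only the page the user is looking at. The sketch below illustrates that idea in Python, assuming a hypothetical data source `read_records` that yields records lazily; the names and the record shape are my own, not from any specific application.

```python
from itertools import islice


def read_records(n):
    """Hypothetical data source: yields records one at a time instead of building a full list."""
    for i in range(n):
        yield {"id": i, "name": f"item-{i}"}


def page(records, page_number, page_size):
    """Return only the records of one page (page_number starts at 0)."""
    start = page_number * page_size
    return list(islice(records, start, start + page_size))


if __name__ == "__main__":
    # Only the third page of three records is materialized in memory.
    print(page(read_records(1_000_000), page_number=2, page_size=3))
```

The generator still has to skip the earlier records, so a real system would usually push the paging down to the database query. The point is only that the interface never holds the full dataset in memory.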

Third, the Specific Industry domain. In the end, we should not become developers who only build things that look fancy. No matter what industry we work in, we must become problem solvers in our company: find the weaknesses in the production process or the opportunities that should be exploited, then create innovations for them. So we have no excuse not to understand, at least, the context of how our entire industry works.

It is well known that each of the three Data Experts has a different role in the business. A Data Engineer builds and maintains the data pipeline, monitors the scalability of storage and processing, and ensures data quality. A Data Analyst describes and reveals insights from raw data, creates dashboard visualizations, and performs diagnostic analytics to discover why the data is the way it is. A Data Scientist builds and develops models that predict the future and generates insight through prescriptive analytics about what decisions the organization should take to achieve its objectives. Given these roles, I believe each needs a different proportion of the three Domains of Knowledge.

For simplicity, I will split the proportions into two groups: the Main Domain and the Complementary Domains. Starting with Data Scientists, their main domain is Math and Statistics. You must find a suitable algorithm for your data in order to build a robust model. That algorithm should be chosen by checking mathematical or statistical assumptions, not just by guessing which one is most appropriate. The same principle applies when testing the model. In addition, you need complementary skills to support your model: knowing how to use the relevant programming tools and understanding your company’s metrics or performance indicators. Hence, you need Computer Science and the Specific Industry domain as Complementary Domains.

How about a Data Engineer? The work resembles System Engineering tasks, but it is focused specifically on preparing data from upstream to downstream: designing the data schema and architecture, installing the relevant tools, building programs to transform and clean data, testing data quality, and optimizing data queries. In addition, you must monitor the scalability of storage and processing. The main domain here is definitely Computer Science. But you should become a problem solver, not only a developer. So it’s no bad thing to understand how the data will be used for your organization’s metrics (or maybe it’s a must, I guess). Furthermore, you need to be able to analyze data, or at least be familiar with statistical terms. Suppose, for example, that you are asked to create a table column containing statistical information.
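As a small illustration of that last request, here is a Python sketch that adds a z-score column to a table represented as a list of dictionaries. The z-score is just one possible “statistical” column, and the function name `add_z_score`, the `_z` suffix, and the sample rows are my own assumptions for the example.

```python
from statistics import mean, pstdev


def add_z_score(rows, column):
    """Return new rows with an extra "<column>_z" field holding the z-score of `column`.

    Assumes `rows` is non-empty and every row has a numeric value under `column`.
    """
    values = [row[column] for row in rows]
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    result = []
    for row in rows:
        # If every value is identical, sigma is 0; report 0.0 instead of dividing by zero.
        z = (row[column] - mu) / sigma if sigma > 0 else 0.0
        result.append({**row, f"{column}_z": z})
    return result


if __name__ == "__main__":
    # Hypothetical table of daily cups sold.
    table = [{"day": "Mon", "cups": 10}, {"day": "Tue", "cups": 20}, {"day": "Wed", "cups": 30}]
    for row in add_z_score(table, "cups"):
        print(row)
```

Even for a column this simple, you need to know what a mean and a standard deviation are, and whether the population or sample version is wanted. That is exactly the kind of statistical familiarity I mean.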

Lastly, I believe the Data Analyst is more versatile than the other two. That is because the main domain of a Data Analyst is the Specific Industry domain, whatever the industry. Assume that you are an expert in a specific industry such as Finance, Commerce, Sport, Healthcare, or Government. If you want to make decisions based on data, as Data-driven Organizations do, then you need analytical thinking and programming tools to process your large volumes of data. Consequently, you also need Computer Science and Math/Statistics as your complementary domains.

Before I conclude this, I would like to clarify my thoughts.

  1. Distinguishing the proportions of the Domains of Knowledge doesn’t mean drawing a border between one bachelor’s degree and another. A computer science graduate can become a Data Scientist, and a math/statistics graduate can become a Data Engineer. This article is about what knowledge you should have, not what bachelor’s degree you should get.
  2. The split between Main and Complementary Domains is not about which one is more important. Rather, when switching careers, we need guidance on which domain to focus on first.

So you have reached the last part of this article. Have you thought about beginning or switching to a data career? I hope you now have an idea of which domain to focus on first.
