A decade ago, it was rather easy to understand your relationship to data. In most cases, you were either a database administrator, a business analyst, or a consumer of data. As a database administrator, you ‘owned’ the system and how the data was structured and stored. As a business analyst, you were the go-to ‘data person’ who could access the data and manipulate it into reports and graphs. As a data consumer, you received those reports and used them in your decision-making.
Although some organizations still operate this way, most have migrated to more complex environments. These environments use cloud infrastructure, draw on multiple data sources (some of which may be unstructured), leverage tools like AI for automation and predictive capabilities, and employ business intelligence platforms. These advances in technology, and the growing complexity of how we collect, store, engineer, and use our data, have resulted in an explosion of data-related roles. In today’s organizations you will see more modern titles such as Data Analyst, Data Engineer, and Data Scientist, as well as related titles such as Machine Learning (ML) Engineer.
Now that we have named some of these job titles, let us look at each role to define the tasks it involves and the skills it requires.
- Data Engineers manage data throughout its lifecycle, which includes designing, building, and maintaining data infrastructure on premises, in the cloud, or both. Key skills for Data Engineers include administering databases and other data repositories, and transforming or moving data between these platforms using technologies such as Hadoop and Spark. Other skills include query languages like SQL; programming languages like Python, C, or Java; and other extract, transform, load (ETL) and database tools.
- Data Analysts obtain appropriate data, prepare it for analysis, and create reports and visualizations. Key skills for Data Analysts include visualization tools such as Microsoft Excel, Microsoft Power BI, and Tableau; query languages like SQL; and, for those with more advanced skills, tools such as Jupyter Notebooks or RStudio and programming in Python or R. Because Data Analysts frequently explain the data and its implications to others, they also need to be capable presenters.
- Data Scientists perform deeper analysis on the data, including developing predictive models to solve more complex data problems. Key skills for Data Scientists include a strong mathematical foundation in probability, statistics, linear algebra, and calculus; familiarity with tools such as Jupyter Notebooks and RStudio; programming languages such as Python or R; and data manipulation and modeling with libraries such as pandas and PyTorch. Data Scientists typically also possess some domain expertise.
- Machine Learning Engineers create algorithms and models that use data to learn and generate predictive outcomes. Key skills for Machine Learning Engineers include programming in Python or R; mathematical foundations in linear algebra, probability, statistics, and multivariate calculus; knowledge of ML algorithms; familiarity with tools such as Jupyter Notebooks and RStudio; and data modeling.
Those of us who have been in the industry for a while can see that the modern data roles are still built on specific skills and tools. The difference is that, regardless of role, we now need a broader set of skills spanning both vendor-agnostic and vendor-specific tools. In addition, we need to refresh our knowledge and skills more rapidly than ever before. But for many of us, that is exactly why we chose a profession in technology: to be challenged throughout our careers.
CertNexus is a vendor-neutral certification body, providing emerging technology certifications and micro-credentials for Business, Data, Development, IT, and Security professionals. CertNexus’ mission is to assist in closing the emerging tech global skills gap while providing individuals with a path towards establishing rewarding careers in Cybersecurity, Data Science, Internet of Things, and Artificial Intelligence (AI)/Machine Learning.