After completing a two-year master's programme in Data Science last year, I felt competent in the field and ready to enter the workforce, put theory into practice and hone my skills along the way. My programme track included courses in Statistics, Machine Learning, Big Data, Natural Language Processing, Data Preparation and Visualization, Information Theory and Information Systems, among others. In these courses I learned many of the core concepts in data science, although the ones that stuck most were centered around ML algorithms and modeling. My practical experience was organized around retrieving datasets, cleaning them up, extracting features and implementing a model that could answer a given question or predict a given scenario. It was not limited to this, but this was the main ‘juice’ of my study years.
Fast forward a few months after graduating, it came as somewhat of a bummer when I got into the field and found that my job wasn’t going to be centered around implementing models and evaluating them with some Python or R code running on my laptop. Of course, things are far more complex than that in the real world, and I knew this before taking the job. I also knew my new job would require me to stretch beyond what would have been my preferred specialization and learn things I had not covered at university. To be more specific, my job title was ‘Data science engineer’, but I didn’t understand much about what that entailed when I started. Even though there’s a lot I still have to learn, I would say my understanding of the different pillars/roles in data science has grown significantly since then.
A bit more context: the job I took wasn’t at a Big-4 or at any giant IT firm, where data science teams, roles and functions are more structured and well defined. It was a job at a scale-up company where I would be starting out as somewhat of a pioneer ‘data science guy’ and a ‘thought leader’ in that regard. It didn’t take long before I started expanding my expertise to cover other data science areas, because I realized that to foster a more data-driven approach within the company, I would need to be able to drive conversations with colleagues from different domains, such as core engineers, analysts and even designers.
The main data science roles you’ll find in the job market today fall into these three pillars/areas:
- Data (science) Engineer
- Data scientist
- Data analyst
These positions are not necessarily interchangeable, but there is sometimes considerable overlap between them. I won’t delve too deep into the skills and techniques needed for each area because that isn’t the intention of this post, but a brief summary of what each role entails is necessary.
A data engineer is mainly focused on building infrastructure for storing, moving and handling data. This role calls for knowledge of data pipelines, data mining, big data, distributed systems, software engineering and advanced programming. Depending on your particular job, you may not need to be skilled in all of these.
A data scientist focuses more on modeling data and using machine learning techniques to give a clearer understanding of what the data means and how it will evolve over time. This means the data scientist needs skills in areas such as math, statistics, advanced analytics and ML/AI. Broad knowledge of the tools used for implementing machine learning or AI algorithms is also very handy.
A data analyst focuses on making sense of the data, i.e. taking the data and turning it into information that can be used for making a decision. This is also a crucial role, because management in any company needs to make fact-based decisions that will foster the company’s growth.
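To make the split a bit more concrete, here is a tiny sketch of how the three roles might touch the same data. Everything in it is an assumption made up for illustration: the `RAW_ROWS` records, the function names and the choice of a straight-line fit are hypothetical, and real work in each role is of course far broader than a single function.

```python
from statistics import mean

# Hypothetical raw records, invented purely for illustration.
RAW_ROWS = [
    {"month": "1", "signups": "100"},
    {"month": "2", "signups": "120"},
    {"month": "3", "signups": ""},  # a missing value that has to be handled
    {"month": "4", "signups": "160"},
]


def engineer_clean(rows):
    """Data engineer: move raw records into a clean, typed form."""
    return [
        (int(row["month"]), int(row["signups"]))
        for row in rows
        if row["signups"].strip()  # drop rows with no signup count
    ]


def analyst_summary(clean):
    """Data analyst: turn the data into information for a decision."""
    values = [signups for _, signups in clean]
    best_month = max(clean, key=lambda pair: pair[1])[0]
    return {"average_signups": mean(values), "best_month": best_month}


def scientist_forecast(clean, month):
    """Data scientist: fit a least-squares line and predict a future month."""
    xs = [m for m, _ in clean]
    ys = [s for _, s in clean]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in clean) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * month


if __name__ == "__main__":
    clean = engineer_clean(RAW_ROWS)
    print(analyst_summary(clean))
    print(scientist_forecast(clean, 5))
```

The point is not the code itself but the hand-offs: the cleaning step feeds both the summary and the forecast, which is part of why the three roles overlap so often in practice.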
So who is a data science generalist? This is someone who has developed a mix of competences across the different data science areas (analysis, engineering, modeling). A generalist knows just enough about each area, with a good balance across them, to get things going or done within a company. So why should you consider being a data science generalist?
- As a fresh graduate, being a generalist will broaden your opportunities within the ‘data science’ job market.
- Many start-up and scale-up companies don’t necessarily need data science specialists. They need people who have a broad view of problems of varying scope and can offer reasonable solutions, since many foundations may still need to be laid within the company. So if you’re looking to start your career at a smaller company, it is definitely something to consider.
- In the workplace, being or learning to be a data science generalist also sets you up to become a technical leader or data executive at some point in your career. So if you have ambitions to lead data science or technical teams, it may be worthwhile to expand your knowledge and expertise across the different data science areas.
- If you’re a team player and collaboration is your thing, being a generalist will make it easier to work with various teams within your company. Because your knowledge is not limited to one specialization, you will likely move between different teams.
So far in my job, I have put on the data engineer cap one week and the data analyst cap another, and done some experimental modeling work as we try to expand our data science capabilities. It’s been quite a good experience. I think the most important thing I’ve learned is to remain flexible and eager to learn, or as I like to put it, to ‘get your hands dirty’. As a data science generalist, you must learn to embrace tasks, pick up the skills you need to do them adequately, and get on with it.