Getting into Data Science
Having spent almost a year transitioning into the world of data science (getting familiar with what it’s all about, the concepts, the need-to-knows, and trying out a bunch of techniques in real-life applications), I think there are a couple of things a total newbie should know about the field. Most of them I didn’t know when I was starting out.
There are a lot of articles like this out there, talking about data science and what a newcomer should expect. However, many of them are written by experts: people who have been doing data science for a long while, or who at least have substantial knowledge and practice in the field. So I thought writing about data science from a ‘newbie, not so newbie’ perspective wouldn’t be such a bad idea, if only to give tips to anyone who has heard about it and is wondering, “What in the world is data science?” or “Would data science be a good career choice for me?” So, here are my three cents:
What is Data Science all about?
Data science is defined differently even among the experts. Personally, I like to define it in the simplest of terms: it is the process of finding meaning in data and presenting information about the data in a way a layman would easily understand. Of course, this is not the whole story, but it’s a good summary to hold on to if you have ever wondered what the field is all about. The actual process of doing data science (finding meaning in data) involves more in-depth analysis, which I don’t want to gloss over because it is important to know about before deciding to venture into the field. This is where programming and statistical skills come in very handy, and it is why I feel data science is really a multi-disciplinary field. Here is an analogy: as the two assistant referees are to the main referee in a football match, so software engineering and statistics are to data science. Data science relies on both the ability to do statistics (or mathematics in general) and a good knowledge of software engineering (or programming in general). If the main referee didn’t have his two assistants to help him, I bet most of his calls (especially the offside calls) would be based on guesswork, or would end up outside the laws of the game.
Data science draws on both disciplines to analyze data, find meaning in it, and present that information in a way a layman would easily understand. That said, even within data science there are sub-fields one can specialize in, but I won’t go into them in this article.
But is Mathematics really important?
I kept asking this question even before I started doing data science, and I was never really convinced that a good knowledge of math and statistics was necessary, partly because I came across a good number of people who practice data science without a mathematical background. And that is correct: data science (especially machine learning) can be applied without knowing a lot of math, thanks to the numerous data science libraries out there, but that means treating it as a black box. If you want to understand the internal workings of the models that help extract meaning from data, then yes, math is very important. I have a degree in Software Engineering, but I decided to pursue a degree in Computer Science to better understand the ‘behind the scenes’ of data science, and I can tell you, it involves a lot of math and statistics concepts.
So, do I have to get a degree in Data Science?
This really comes down to personal preference, but you don’t have to. Of course, there are a lot of advantages to getting one (especially if you might be interested in contributing to research in the future). However, you can also learn to practice data science from the comfort of your living room, thanks to the numerous MOOCs out there that teach it. I started out this way and found DataCamp and Dataquest very useful. There are many others, including Udacity, Udemy, and Coursera, that offer data science specializations.
Final Thoughts
You might be asking after reading through, “So is data science all about data, math and programming?” In a way it is, but it’s even more. For example, you should be a good communicator, you should love analysis, and you should be able to look at data and quickly start thinking about the patterns and meaning that might be extracted from it. You should also be critical. You’ll understand this better once you get into the field, but for example, when creating models to solve machine learning problems, you must be critical of your model and keep evaluating it until you find the ‘perfect’ model.
Data science is a very exciting field to go into. We live in a time when data rules almost everything and is a major player in the economy of any nation. The power of data science to predict events and to extract knowledge from large, sometimes seemingly meaningless data sets is why the job has been termed “the sexiest job of the 21st century”.