What do you imagine when you hear the job title “Data Scientist”? Probably not a shabby-looking white-collar worker with a stern look on their face. Right?
Perhaps that is why Harvard Business Review called data scientist “the sexiest job of the 21st century”. They wrote, “If ‘sexy’ means having rare qualities that are much in demand, data scientists are already there. They are difficult and expensive to hire and, given the very competitive market for their services, difficult to retain.”
Data Scientists are technical professionals with the training and curiosity to make discoveries in the world of data. Although “Data Scientist” has only recently become a popular title on LinkedIn for anyone working with data, the field itself is not new. Thousands of Data Scientists were already working at startups and established companies when HBR published its article. Moreover, the aim of making computers as intelligent as human beings has been pursued for more than half a century. There are two main reasons why Data Scientists have become so popular recently. First, companies have been collecting ever-increasing amounts of data for years, largely inspired by the success large tech companies have had in profiting from the data they collect. Second, advances in technology have made this data collection economical.
An enormous amount of data is now available to most large companies across all industries, but many have not been using it in an efficient and productive way. However, companies are now waking up to the realization that they need to make use of the huge amount of data accessible to them through their corporate databases. How much data? A projected 44 trillion gigabytes by 2020, up from only 4.4 trillion in 2013.
The volume and variety of data have created an opportunity both for those with the skills to make use of it and for the businesses that have been collecting it. However, the industry is facing a shortage of the skills and expertise required to meet the increasing demand from companies seeking to make use of their rich data. The shortage is so severe that even people who studied computer science or other technical programs at university are being thrust into demanding data analytics roles in the workplace.
According to statistics from the University of California, Riverside, nearly a third of the universities in the U.S. News & World Report top 100 Global Universities offer degrees in Data Science. Of these 29 universities, only six offer data science programs at the undergraduate level; the rest offer postgraduate degrees. The average class size of these data science programs is just 23 students. The University of California, Riverside predicts that small class sizes at an already limited number of universities offering Data Science programs are unlikely to make a “meaningful dent in closing the global data science talent gap”. In simple economic terms, demand outstrips supply, and in this case by a wide margin. In 2017, IBM predicted that annual demand for new data scientists, data developers, and data engineers would reach nearly 700,000 openings by 2020. A class of just 23 students per university, or roughly 700 graduates a year across all of the universities offering data science programs, will not come close to meeting the fast-growing demand for people with data science skills.
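The supply-versus-demand arithmetic above can be checked in a few lines of Python. This is only a rough sketch: the variable names are made up for illustration, it assumes each of the 29 programs graduates one average-sized class per year, and it compares those graduates directly against IBM's projected openings even though the two figures may not cover exactly the same population.

```python
# Figures quoted in the paragraph above.
universities_with_programs = 29
average_class_size = 23
projected_openings_2020 = 700_000  # IBM's 2017 projection of annual openings

# Assumption: every program graduates one full, average-sized class per year.
graduates_per_year = universities_with_programs * average_class_size

# Fraction of the projected openings those graduates could fill.
share_of_demand = graduates_per_year / projected_openings_2020

print(f"Graduates per year: {graduates_per_year}")
print(f"Share of projected openings covered: {share_of_demand:.2%}")
```

Under these assumptions the programs produce 667 graduates a year, which is the “roughly 700” mentioned above and about a tenth of one percent of the projected openings.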
In 2018, the average salary for a junior-level data scientist is $115,000, and those managing a team of 10–15 members can command salaries as high as $350,000. Meanwhile, the median years of experience for a data scientist dropped from 9 years in 2014 to 6 years in 2015. Globally, the demand for data scientists is projected to exceed supply by more than 50% by 2019. With more than 40% of companies believing that their inability to recruit data scientists is hindering their ability to compete, it is no wonder that over 60% of businesses train their staff in-house.
There are two main approaches to help alleviate this skills shortage. The first, championed by AI superstar Andrew Ng, is to train more data scientists using non-traditional methods such as MOOCs (Massive Open Online Courses). While this is a brilliant way for current developers and other data-centric employees to “skill up,” it is not yet the solution to the bigger problem. I say “yet” because this fundamentally requires a change in behavior. Employers don’t yet place enough value on this type of education; many still look exclusively to brand-name universities when hiring. While this mentality is slowly changing, it is not changing quickly enough to solve the problem in the short to medium term.
The second approach is to enable more people without data science skills to apply these complex techniques to company data easily. In essence, let Artificial Intelligence and Machine Learning solve their own problems. Using techniques that have been developed over the last few years (including here at MindsDB), it is possible to mimic the work of a data scientist, so that even a non-technical individual can perform data analytics with just a few lines of code or a few clicks.
These two solutions are not mutually exclusive, and in tandem they will help companies use their data in a more meaningful way, driving cost savings, growth, and revenue. For this to happen effectively, there need to be cultural changes inside organizations, resulting in better hiring policies and better use of tools and software that can solve many of the data problems they face without the need to expand headcount and hire an expensive data scientist.