GLOBAL RESEARCH SYNDICATE

Data Prep Still Dominates Data Scientists’ Time, Survey Finds

by globalresearchsyndicate
July 7, 2020
in Data Collection

(BEST-BACKGROUNDS/Shutterstock)

Data scientists spend about 45% of their time on data preparation tasks, including loading and cleaning data, according to a survey of data scientists conducted by Anaconda. The company also analyzed the gap between what data scientists learn as students and what enterprises demand.

Data cleansing (fixing or discarding anomalous or wrong numbers and otherwise ensuring the data is an accurate representation of the phenomenon it is meant to measure) accounts for more than a quarter of the average day for data scientists, followed by 19% for data loading (the “L” in ETL), according to Anaconda’s annual survey.
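
To make "fixing or discarding anomalous or wrong numbers" concrete, here is a minimal Python sketch of one cleansing step. It is an illustration only, not something taken from Anaconda's report: the function name `clean_readings`, the sample values, and the plausible range are all hypothetical.

```python
import math

def clean_readings(values, low, high):
    """Keep only readings that parse as numbers and fall within [low, high].

    Hypothetical helper for illustration; real pipelines usually also log
    what was discarded and why.
    """
    kept = []
    for v in values:
        if v is None:
            continue  # missing value
        try:
            x = float(v)  # fix: numbers stored as text become floats
        except (TypeError, ValueError):
            continue  # discard: not a number at all (e.g. "n/a")
        if math.isnan(x) or not (low <= x <= high):
            continue  # discard: NaN or outside the plausible range
        kept.append(x)
    return kept

# Made-up sample data: temperatures with typical problems mixed in.
raw = [21.5, "22.1", None, -999, "n/a", 23.0, float("nan"), 480.0]
cleaned = clean_readings(raw, low=-50.0, high=60.0)
print(cleaned)  # [21.5, 22.1, 23.0]
```

Even this toy version has to handle missing values, text that should be numeric, sentinel codes such as -999, and out-of-range entries, which hints at why the task absorbs so much of the day.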

Data visualization tasks occupy about 21% of their time, while model selection, model training and scoring, and model deployment each consume 11% to 12% of the day, the survey found.

“We were disappointed, if not surprised, to see that data wrangling still takes the lion’s share of time in a typical data professional’s day,” Anaconda wrote in its report, “2020 State of Data Science: Moving From Hype Toward Maturity.” “Data preparation and cleansing takes valuable time away from real data science work and has a negative impact on overall job satisfaction.”

It could be worse. In some surveys in the past, data prep tasks have occupied upwards of 70% to 80% of a data scientist’s time. That is why so many people have questioned the wisdom of asking highly skilled and highly paid data scientists to do the equivalent of digital janitorial work.

How data scientists spend their time (Image courtesy Anaconda “2020 State of Data Science: Moving From Hype Toward Maturity.”)

So, the sticky situation around asking data scientists to spend the bulk of their time preparing data for analysis continues. “This efficiency gap presents an opportunity for the industry to work on solutions to this problem, as one has yet to emerge,” Anaconda laments.

The 2020 State of Data Science is based on online surveys of nearly 2,400 people from more than 100 countries (not all of whom are data scientists themselves, although they work in the field). In addition to asking about common data science tasks, Anaconda asked about the languages data scientists use and their favored toolkits, and sought to identify barriers to the deployment of machine learning models and to the adoption of open source technology by other members of the data science team: developers, administrators, and line-of-business managers.

To no one’s surprise, Python dominated the language question. According to Anaconda’s survey, 47% of data scientists say they “always” use Python, while another 28% say they use it “frequently.” By comparison, only 10% of respondents say they “always” use R, which was the second most-used language in the survey. JavaScript, Java, C, C++, and C# were all in the mix, but Python simply dwarfed (or suffocated?) them in usage.

When it came to tooling, it should come as no surprise that Anaconda’s own data science platform, which combines many of the most commonly used tools in the Python and R ecosystems into one easy-to-use bundle, was cited as the most-used toolkit (the sample of users Anaconda used for the survey may have had something to do with that). Interestingly, Anaconda says 44% of its users also use RStudio, which develops a suite of open source tools for R (and Python too).

Few people surveyed report not using Python (image courtesy Anaconda’s “2020 State of Data Science: Moving From Hype Toward Maturity”)

“But enterprises are using a number of tools and platforms to deliver on their data strategy, including a mix of proprietary, open-source, and hybrid solutions,” the Austin, Texas, company states in its report. “We hope to see expanded collaboration among industry players to ensure interoperability and harmonization among different tools.”

Why do people use open source technology? According to 27% of the data scientists surveyed by Anaconda, the number one reason is its utility. But developers and business managers see things differently, with 42% of developers citing “speed of innovation” compared to 26% for managers. Systems administrators, ever the misers, cited the economical (i.e. “free”) aspect of open source software as their number one draw.

A similar dynamic appears during deployment. Data scientists cited managing dependencies and environments as the biggest hurdle to moving models into production (39%), followed closely by a skills gap with Kubernetes and Docker (38%). However, developers and sys admins were most concerned about security (31% and 37%, respectively). Meanwhile, 27% of developers cited the recoding that is often necessary to push Python and R models into production as a major roadblock to deployment.

In terms of skills, Anaconda cited Python, machine learning, and data visualization as the top three skills that students are learning. That jibes somewhat with the list of the top three skills that universities are teaching: Python, probability and statistics, and machine learning.

A gap exists between what students are taught and what enterprises expect (image courtesy Anaconda’s “2020 State of Data Science: Moving From Hype Toward Maturity”)

However, there’s little resemblance between these two lists and the skills that enterprises say they lack: big data management, advanced mathematics, and deep learning.

“Our study indicates that there are gaps between what enterprises need and what institutions teach,” Anaconda states. “Two of the most frequently-cited skills gaps among respondents working in enterprise environments–big data management (38% of respondents) and engineering skills (26%)–do not rank in the top 10 skills offered in university programs.”

Anaconda also delved into the obstacles preventing younger data scientists from getting their ideal job (experience, technical skills, and soft skills were the top three), as well as some of the ethical concerns that data scientists might face when it comes to bias, privacy, diversity, automation, and advanced information warfare.

“Data science has the ability to be transformational for businesses, but our 2020 survey shows that both organizations and professionals in the space are still in the process of maturing,” Anaconda CEO and Co-Founder Peter Wang states in a press release. “From broadening the data science educational curriculum to being more intentional with open-source security, there are clear learnings here for the industry at large to implement in order to improve. We’ve seen positive progress in many of these areas, but there is still work to be done.”

You can download your copy of the 39-page report here.

Copyright © 2024 Globalresearchsyndicate.com