GLOBAL RESEARCH SYNDICATE

An Economist’s Guide To Becoming A Chief Data Scientist: Interview With Sandip Bhattacharjee

by globalresearchsyndicate
August 1, 2020
in Data Analysis

“Very early into my career, I had realised that the majority of decisions in our lives are very similar to an optimisation exercise.”

For this week’s ML practitioner’s series, Analytics India Magazine got in touch with Sandip Bhattacharjee, Chief Data Scientist at Tabsquare.ai. Sandip is also a 4x Kaggle expert. In this interview, he shares his experience from his data analytics journey that spans over a decade.

AIM: Can you talk about your education and your introduction to the world of data science? 

Sandip: As a student, I was always fascinated by the utility of statistics and econometrics to explain different aspects of consumer behaviour. And, as far as my academics are concerned, I am an Economist with a Masters in Economics from JNU with special papers in Statistics and Applied Econometrics. I did my graduation with a major in Economics along with minors in Mathematics and Statistics. My academics and fascination for statistics eventually ushered me into the world of data science. 

Very early into my career, I realised that the majority of decisions in our lives are very similar to an optimisation exercise. For instance, when I am choosing between multiple routes to my office, I am either trying to minimise the time taken or trying to maximise my car’s fuel economy. I found a direct parallel between such real-life examples and machine learning algorithms, which are also founded on the principle of ‘loss functions’. These algorithms go through an optimisation exercise too, in which the loss function is minimised to reduce the error in predictions. Realising this was my moment of epiphany. The fact that this line of thought can be applied to use cases such as shaping consumer behaviour with data only increased my fondness for data science.
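The link between a loss function and optimisation can be made concrete with a minimal sketch. It is an editorial illustration rather than anything from the interview: the data points, the learning rate and the function names `mse_loss` and `fit_line` are all assumptions. It fits a straight line by repeatedly nudging two parameters in the direction that lowers the mean squared error.

```python
# Illustrative sketch only: fit y ≈ w * x + b by minimising a loss function.
# The data, learning rate and function names are assumptions for this example.

def mse_loss(w, b, xs, ys):
    """Mean squared error of the line w * x + b on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_line(xs, ys, lr=0.05, steps=2000):
    """Plain gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the loss with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = fit_line(xs, ys)
final_loss = mse_loss(w, b, xs, ys)
```

On this toy data the fitted `w` and `b` approach 2 and 1, the values used to generate the points. Choosing a route to the office follows the same pattern, with travel time or fuel use playing the role of the loss.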



AIM: Can you talk about your data science journey?

Sandip: When I started my career in 2007, we didn’t have the term ‘Data Science’. Back in those days, the use of statistical models for descriptive, predictive and prescriptive analytics was all bundled into ‘Analytics’. My initial work in analytics required me to build predictive models, mainly using the linear regression family (GLM, HLM, ridge regression, etc.) and classical forecasting techniques (ARIMA, ARIMAX, ARCH, GARCH, VARMAX), with SAS as the main analytics tool.

Over time, I had to train myself on newer machine learning techniques. I started with Andrew Ng’s course on Deep Learning to build the basic foundations of ML before diving deep into a wide variety of topics to stay abreast of the state of the art in ML. Along with this, I also had to train myself on open-source programming languages like R and Python.




AIM: As a Chief Data Scientist, what does your typical day look like? 

Sandip: Currently, I am working at Tabsquare.ai as VP and Chief Data Scientist. We are working on some of the most challenging and fascinating technology problems in the restaurant industry right now. This involves building state-of-the-art (SOTA) solutions in menu engineering, a real-time recommendation engine and AI-generated 1:1 promotions for customers. In most cases, solution providers stop at telling their clients the next best action, and the implementation of these solutions leaves a lot to be desired.

At Tabsquare, we collect more than 4 million data points each day. The biggest challenge is to create robust and scalable AI solutions that can utilise this data to serve millions of customers in real time. In most cases, we are dealing with sub-millisecond latency requirements for results served on edge devices like mobiles, tablets and kiosks. We often have to strike the right balance between model performance and high-throughput, low-latency requirements.

AIM: Can you talk about the challenges you have faced?

Sandip: A successful data science project is 80% getting the ‘Data’ right and the remaining 20% is ‘Science’. In one of my projects, 16 hours before a client presentation, we realised that our feature engineering pipeline had a case of target leakage within the k-fold cross-validation (CV) regime. The model results were looking too good to be true, and upon deeper inspection of all the components we discovered this fatal flaw. The entire team stayed up through the night, corrected the feature engineering pipeline and updated the models.

The updated models turned out well and we were able to have a successful meeting. Even though this was a frustrating experience, it was one of the most valuable lessons for the entire team: accurate data is the fuel that powers even the most sophisticated ML algorithms.
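The interview does not say which feature caused the leak, so the following is only a sketch of one common form of target leakage within k-fold CV: a target (mean) encoding computed on all rows, including the rows later used for validation. The toy data and every function name here are hypothetical. The leak-free version computes each row's encoding only from the training rows of its fold.

```python
from collections import defaultdict

# Hypothetical illustration of target leakage in k-fold CV via target encoding.
# The interview does not describe the actual leak; names and data are assumptions.

def kfold_indices(n, k):
    """Split range(n) into k contiguous validation folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def mean_by_category(cats, targets, rows):
    """Mean target per category, using only the given row indices."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i in rows:
        sums[cats[i]] += targets[i]
        counts[cats[i]] += 1
    return {c: sums[c] / counts[c] for c in sums}

def leaky_target_encoding(cats, targets):
    """Leaky: each row's feature is computed from all rows, its own target included."""
    means = mean_by_category(cats, targets, range(len(cats)))
    return [means[c] for c in cats]

def out_of_fold_target_encoding(cats, targets, k=3):
    """Leak-free: each validation row is encoded from its fold's training rows only."""
    n = len(cats)
    encoded = [0.0] * n
    for val_rows in kfold_indices(n, k):
        val_set = set(val_rows)
        train_rows = [i for i in range(n) if i not in val_set]
        means = mean_by_category(cats, targets, train_rows)
        # Categories unseen in the training rows fall back to the training mean.
        fallback = sum(targets[i] for i in train_rows) / len(train_rows)
        for i in val_rows:
            encoded[i] = means.get(cats[i], fallback)
    return encoded

# A high-cardinality ID column: every category appears exactly once.
cats = ["a", "b", "c", "d", "e", "f"]
targets = [1, 0, 1, 0, 1, 0]
leaky = leaky_target_encoding(cats, targets)
safe = out_of_fold_target_encoding(cats, targets, k=3)
```

In this toy case the leaky feature reproduces the target exactly, which is the kind of result that looks too good to be true, while the out-of-fold feature carries no information about a row's own label.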



Contextualising a problem is very important, because you will not always find a straightforward application of what you read in a research paper, nor is every business problem about reaching the best possible value of an evaluation metric, as it is in data science competitions.

That’s why I suggest that aspirants get into the habit of doing Exploratory Data Analysis (often called EDA) for any problem. EDA forms the foundation of good feature engineering and, subsequently, of high-quality models. Once the foundation is in place, you can start making your way to advanced topics. For experienced professionals, keeping pace with new techniques and technologies is quite important. This involves reading up on the latest and greatest, knowing the foundations of the new techniques and trying to find time to code ML solutions end-to-end.
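As a rough idea of what a first EDA pass can look like, here is a minimal sketch using only the Python standard library. The `orders` records and the `summarise_columns` helper are made up for illustration; in practice this step is usually done with a data-frame library and plots.

```python
import statistics

# Illustrative first-pass EDA: per-column missing values, distinct values and
# basic numeric statistics. The records and helper name are assumptions.

def summarise_columns(rows):
    """Summarise a list of dict records column by column."""
    columns = sorted({key for row in rows for key in row})
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        info = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        # Only report statistics when every present value is numeric.
        if numeric and len(numeric) == len(present):
            info["mean"] = statistics.mean(numeric)
            info["min"] = min(numeric)
            info["max"] = max(numeric)
        summary[col] = info
    return summary

orders = [
    {"item": "noodles", "price": 8.0, "qty": 2},
    {"item": "tea", "price": 2.0, "qty": None},
    {"item": "noodles", "price": 8.0, "qty": 1},
    {"item": None, "price": 6.0, "qty": 3},
]
report = summarise_columns(orders)
```

Even a summary this small surfaces the missing `item` and `qty` values, which is the sort of finding that shapes later feature engineering.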

“While recruiting data scientists, the most important aspect I look for is ‘First Principles Thinking’.”

AIM: What does it take to make a good Data Scientist?

Sandip: If you want to stay competitive in this field, you should always find time to do some hands-on coding. It has helped me understand the last-mile challenge of deploying scalable ML solutions much better. This, in turn, has helped me manage my teams and clients much more efficiently.

While recruiting data scientists, the most important aspect I look for is ‘First Principles Thinking’. Once a candidate can break down a complicated problem into its basic building blocks, it becomes much easier to formulate a set of steps that leads to the complete solution. I am never looking for someone who knows everything under the sun. However, when someone lists certain types of ML models as their expertise, I expect 100% clarity in the concepts related to them. Clarity in data structures is also crucial. In terms of programming, I place more importance on semantics than on syntax. Syntax is something even the most experienced programmers look up on Google on a daily basis. Finally, experience with code versioning tools is a plus.

AIM: You are a Kaggle expert, what role do you think competitive platforms play in the DS ecosystem?

Sandip: This is often a highly polarising topic. One section believes that these competitions have no real value because, in the real world, you never get clean data like you do in competitions, whereas the other section believes that doing competitions regularly gives you an edge.

Being a 4x Kaggle expert, I hold a slightly different view in this regard. Data science competition platforms like Kaggle give you a view of a small yet very important section of a complete data science project: doing effective EDA, applying various feature engineering techniques and building highly accurate models in multiple ways. However, what these competitions don’t teach you is how to clean and massage the data to make it usable. Most importantly, they don’t teach the last-mile challenge of deploying scalable models or the art of stakeholder management. I personally use platforms like Kaggle to complement my learning from working on real-life business problems. In the grand scheme of things, the experience of solving real-life business problems and the knowledge gained from data science competitions have positive synergies with each other.

AIM: What does your ML toolkit look like?

Sandip: My ML toolkit is a mixture of Python and Spark and looks as follows:

  • Libraries: scikit-learn, SciPy, statsmodels, LightGBM, XGBoost, TensorFlow, cv2 and Transformers (by Hugging Face). On Spark, I use MLlib a lot.
  • Hardware: My personal deep learning setup consists of a custom-made desktop with 64GB RAM, an 8GB NVIDIA GTX 1070Ti and a 256GB SSD, and a Lenovo Legion laptop with 16GB RAM, a 6GB NVIDIA RTX 2060 and a 1TB SSD.
  • Cloud: I do utilise the free GPU and TPU quota provided by Kaggle & Google Colab. For office work, I have mostly worked on GCP and there we can customise the hardware as per the need of the specific project.

AIM: Any tips and recommendations for Data Science aspirants?

Sandip: If you are someone who is just getting started with data science, I would recommend the following: 

  • Prof. Andrew Ng’s course: to get the fundamentals sorted
  • ‘Deep Learning’ by Ian Goodfellow, Yoshua Bengio and Aaron Courville: for mathematical foundations of Deep Learning without any code.
  • Courses/books by Dr. Adrian Rosebrock: for Computer Vision.
  • Full Stack Deep Learning course by Pieter Abbeel, Sergey Karayev and Josh Tobin: to move from solving DL problems on a local system or in Kaggle notebooks to full-scale production.

That said, for anyone starting off in data science, my suggestion would be to avoid trying to learn everything at once. Follow a structured process that begins with the basics: start with basic regression and classification models and master the concepts end to end.

To prepare for this ever-evolving field of data science, one must have the mentality of a student and the zeal to keep learning. What is considered an ‘advanced’ ML concept today may become a ‘basic’ ML concept a few years down the line. I follow a three-pronged strategy to keep myself abreast of new developments in data science.

  • Keep reading new academic papers and articles in AI/ML
  • Keep programming skills handy by doing small personal projects and data science competitions whenever possible 
  • Finally, apply the new skills learnt in the previous two steps to the actual day-to-day business problems that we are trying to solve.

Irrespective of how experienced you are in this field, the hunger to learn something new every day is a key aspect of a data scientist’s journey.

Copyright © 2024 Globalresearchsyndicate.com
