Statistics.com


Conversations with Data Scientists about R and Python

  • April 21, 2020, 3:27 pm

Dyed-in-the-wool software developers can get quite passionate about the relative virtues of one programming language or another, and their debates can feel like middle-school arguments about the greatest ballplayers of all time. Data scientists have other technical passions as well, but they too talk about software and programming.

As you plan your own personal skill development program in statistics, analytics, and data science, will you focus on Python?  R?  Both?  Something else?  For those whose work is primarily traditional statistical research, R is clearly preferred.  Data scientists, on the other hand, use both R and Python.  

The recruiting firm Burtch Works periodically surveys professionals in the field, and the firm makes a distinction between predictive analytics professionals (who deal with structured data) and data scientists (who deal with text and other unstructured data). Burtch Works reports that R users are roughly split between the two groups, while Python is favored 2 to 1 by data scientists. The firm's definition of the roles is somewhat unusual, and it combines the groups for other analyses, but analysis of text and unstructured data is a useful sub-discipline to distinguish, and its practitioners do lean toward Python.

I spoke with a number of data scientists to get a sense of which language they use, and why.

  1. Dave Shirley is a data scientist at a digital marketing agency, was trained in statistics, and uses mainly R. It’s what he learned initially, and he finds it satisfies his needs: “We use R for ad-hoc analysis, regular production of Excel and HTML reports, dashboarding (HTML). If there’s something we need to do programmatically, the chances are we will do it using R.”
  2. Niral Upadhyaya is a data scientist at Elder Research and, likewise, prefers R: “Personally, I probably use R more than Python because it is more familiar or because it was already being used on the project, but I have used both. It really depends on the client, what they have approved, and the type of work we will be doing. For instance, in cybersecurity, I think Python is more prevalent since a lot of tools like Splunk are built upon it. Python also seems to be the choice when the problem needs deep learning. I probably use a mixture of SQL and R or Python for exploration and then I build models in either R or Python.”
  3. Peter Gedeck is a Senior Data Scientist at Collaborative Drug Discovery, where his work involves data collection and analysis, building and validating models, and finally making the models available to users either as web services or by embedding the whole process into applications. He also teaches the Predictive Analytics in Python series at Statistics.com. “While I used R in the past, most of the functionality I require is now available in the main data science packages in Python. This, together with the availability of excellent domain-specific solutions for chem- and bioinformatics, makes Python my preferred language for data science. I still use R for creating publication-quality graphs using ggplot. However, most of the time, my work requires embedding the analysis or model into a bigger system and in this case, a general programming language like Python is superior.”
  4. Leanna Kent is a Data Scientist at Elder Research who used to rely primarily on R but now also uses Python. She works mainly in analysis and building models; others help in deployment. “I prefer R, so if I have a choice I will use that. I use Python when my projects require me to. While Python is easier to read, I find it more difficult to code. With all of the different packages (base python, numpy, pandas) keeping track of data types is difficult, and I feel like I often have to hack into a solution. I also prefer the visualization capabilities in R.”
  5. Grant Fleming is also a Data Scientist at Elder Research whose work focuses on analysis, building models, and building datasets. He uses both R and Python: “I use R for most billable work and data science/modeling tasks, Python for working with neural networks or text data.”
  6. Andrew Bruce is a Principal Research Scientist at Amazon: “I use both, but now mostly R due to the type of work I’m doing – 1) exploratory analysis, 2) statistical modeling, and 3) experimental design. I use Python for problems with bigger data involving ML and projects that need to be deployed into production. A vast majority of production code at Amazon for data science is based on Python.”
  7. Ramon Perez is the Director of UK Operations for Elder Research: “I use the Anaconda distribution of Python with Pandas, SciKitLearn, and Plotly. Python is a general-purpose language that is easier to put into production environments for clients, and a lot of serious deep learning development is happening within Python. However, the core statistical modeling packages in R are still the gold standard, especially for time series work and Bayesian techniques.”
  8. Finally, I also spoke to John Elder, the Founder of Elder Research, not about R vs. Python, but about data scientists and programming more generally: “It’s useful to distinguish the software engineering perspective from the data science perspective. Professional software developers build software for wide distribution and must take the time along the way to ‘harden’ their code – making it robust, efficient and error-free. This is essential for quality deliverables and a very valuable skill. But not necessarily during the discovery phase of a project! For data scientists, most code is written during the trial and error research and discovery phase; it would be a waste of precious time to ‘harden’ each iteration and branch along the way.”
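
To make John Elder's distinction concrete, here is a minimal Python sketch contrasting a discovery-phase helper with a "hardened" one. The function names (quick_mean, hardened_mean) and the choice to skip NaN values are illustrative assumptions, not anything the interviewees described; the point is only how much extra code hardening adds to even a trivial calculation.

```python
import math
from typing import Iterable


def quick_mean(values):
    # Discovery-phase version (hypothetical): assumes clean, non-empty
    # numeric input and simply fails if that assumption is wrong.
    return sum(values) / len(values)


def hardened_mean(values: Iterable[float]) -> float:
    # Hardened version (hypothetical): validates types, skips missing
    # values (NaN), and reports a clear error when nothing is left.
    cleaned = []
    for v in values:
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            raise TypeError(f"non-numeric value: {v!r}")
        if math.isnan(v):
            continue  # assumption: NaN marks a missing value to be ignored
        cleaned.append(float(v))
    if not cleaned:
        raise ValueError("no valid values to average")
    # fsum reduces floating-point rounding error on long inputs
    return math.fsum(cleaned) / len(cleaned)


if __name__ == "__main__":
    data = [1, 2, float("nan"), 3]
    print(hardened_mean(data))    # NaN is skipped before averaging
    print(quick_mean([1, 2, 3]))  # fine on clean data
```

During exploration the one-line version is usually enough; the extra checks earn their keep once the code is handed to other people or run unattended.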


About Statistics.com

Statistics.com offers academic and professional education in statistics, analytics, and data science at beginner, intermediate, and advanced levels of instruction. Statistics.com is a part of Elder Research, a data science consultancy with 25 years of experience in data analytics.



© Copyright 2021 - Statistics.com, LLC | All Rights Reserved | Privacy Policy | Terms of Use
