
Conversations with Data Scientists about R and Python

Dyed-in-the-wool software developers can get quite passionate about the relative virtues of one programming language or another, and their debates sometimes threaten to transport you back to middle-school arguments about the greatest ballplayers of all time. Data scientists, too, talk about software and programming, though their computing passions find other outlets as well.

As you plan your own personal skill development program in statistics, analytics, and data science, will you focus on Python?  R?  Both?  Something else?  For those whose work is primarily traditional statistical research, R is clearly preferred.  Data scientists, on the other hand, use both R and Python.  

The recruiting firm Burtchworks periodically surveys professionals in the field and distinguishes between predictive analytics professionals (who work with structured data) and data scientists (who work with text and other unstructured data). Burtchworks reports that R users are roughly evenly split between the two groups, while data scientists favor Python 2 to 1. The firm's definition of the roles is somewhat idiosyncratic, and it combines the two groups for other analyses. Still, the analysis of text and unstructured data is a useful sub-discipline to distinguish, and its practitioners definitely lean toward Python.

I spoke with a number of data scientists to get a sense of which language they use, and why.

  1. Dave Shirley is a data scientist at a digital marketing agency. He was trained in statistics and uses mainly R. It's what he learned initially, and he finds that it satisfies his needs: “We use R for ad-hoc analysis, regular production of Excel and HTML reports, and dashboarding (HTML). If there’s something we need to do programmatically, chances are we will do it using R.”
  1. Niral Upadhyaya is a data scientist at Elder Research and, likewise, prefers R:  “Personally, I probably use R more than Python because it is more familiar or because it was already being used on the project, but I have used both. It really depends on the client, what they have approved, and the type of work we will be doing. For instance, in cybersecurity, I think Python is more prevalent since a lot of tools like Splunk are built upon it. Python also seems to be the choice when the problem needs deep learning. I probably use a mixture of SQL and R or Python for exploration and then I build models in either R or Python.” 
  1. Peter Gedeck is a Senior Data Scientist at Collaborative Drug Discovery where his work involves data collection and analysis, building and validating models, and finally making the models available to users either as web services or by embedding the whole process into applications. He also teaches the Predictive Analytics in Python series at Statistics.com.  “While I used R in the past, most of the functionality I require is now available in the main data science packages in Python. This together with the availability of excellent domain specific solutions for chem- and bioinformatics, makes Python my preferred language for data science.  I still use R for creating publication-quality graphs using ggplot. However, most of the time, my work requires embedding the analysis or model into a bigger system and in this case, a general programming language like Python is superior.” 
  1. Leanna Kent is a Data Scientist at Elder Research who used to rely primarily on R but now also uses Python. She works mainly on analysis and model building; others handle deployment. “I prefer R, so if I have a choice I will use that. I use Python when my projects require me to. While Python is easier to read, I find it more difficult to code. With all of the different packages (base python, numpy, pandas), keeping track of data types is difficult, and I feel like I often have to hack my way to a solution. I also prefer the visualization capabilities in R.”
  1. Grant Fleming is also a Data Scientist at Elder Research whose work focuses on analysis, building models, and building datasets.  He uses both R and Python: “I use R for most billable work and data science/modeling tasks, Python for working with neural networks or text data.”
  1. Andrew Bruce is a Principal Research Scientist at Amazon:  “I use both, but now mostly R due to the type of work I’m doing – 1) exploratory analysis, 2) statistical modeling, and 3) experimental design.  I use Python for problems with bigger data involving ML and projects that need to be deployed into production. A vast majority of production code at Amazon for data science is based on Python.”
  1. Ramon Perez is the Director of UK Operations for Elder Research: “I use the Anaconda distribution of Python with Pandas, scikit-learn, and Plotly. Python is a general-purpose language that is easier to put into production environments for clients, and a lot of serious deep learning development is happening within Python. However, the core statistical modeling packages in R are still the gold standard, especially for time series work and Bayesian techniques.”
  1. Finally, I also spoke to John Elder, the founder of Elder Research, not about R vs. Python but about data scientists and programming more generally: “It’s useful to distinguish the software engineering perspective from the data science perspective. Professional software developers build software for wide distribution and must take the time along the way to ‘harden’ their code, making it robust, efficient and error-free. This is essential for quality deliverables and a very valuable skill. But not necessarily during the discovery phase of a project! For data scientists, most code is written during the trial-and-error research and discovery phase; it would be a waste of precious time to ‘harden’ each iteration and branch along the way.”


About Statistics.com

Statistics.com offers academic and professional education in statistics, analytics, and data science at beginner, intermediate, and advanced levels of instruction. Statistics.com is a part of Elder Research, a data science consultancy with 25 years of experience in data analytics.

 The Institute for Statistics Education is certified to operate by the State Council of Higher Education for Virginia (SCHEV)


© Copyright 2023 - Statistics.com, LLC | All Rights Reserved | Privacy Policy | Terms of Use
