BIG DATA — Data Science
About Data Scientists
Rising alongside the relatively new technology of big data is the new job title data scientist. While not tied exclusively to big data projects, the data scientist role does complement them because of the increased breadth and depth of data being examined, as compared to traditional roles.
So what does a data scientist do?
A data scientist represents an evolution from the business or data analyst role. The formal training is similar, with a solid foundation typically in computer science and applications, modeling, statistics, analytics and math. What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge. Good data scientists will not just address business problems; they will pick the problems that have the most value to the organization.
The data scientist role has been described as “part analyst, part artist.” Anjul Bhambhri, vice president of big data products at IBM, says, “A data scientist is somebody who is inquisitive, who can stare at data and spot trends. It’s almost like a Renaissance individual who really wants to learn and bring change to an organization.”
Whereas a traditional data analyst may look only at data from a single source — a CRM system, for example — a data scientist will most likely explore and examine data from multiple disparate sources. The data scientist will sift through all incoming data with the goal of discovering a previously hidden insight, which in turn can provide a competitive advantage or address a pressing business problem. A data scientist does not simply collect and report on data, but also looks at it from many angles, determines what it means, then recommends ways to apply the data.
Data scientists are inquisitive: exploring, asking questions, doing “what if” analysis, questioning existing assumptions and processes. Armed with data and analytical results, a top-tier data scientist will then communicate informed conclusions and recommendations across an organization’s leadership structure.
Source: IBM.com, http://www-01.ibm.com/software/data/infosphere/data-scientist/
Analytics, Data Mining, Data Science Expert, KDnuggets President
Which Big Data, Data Mining, and Data Science Tools go together?
More Free Data Mining, Data Science Books and Resources
The list below is based on the list compiled by Pedro Martins, but we added the book authors and year, sorted alphabetically by title, fixed spelling, and removed the links that did not work.
- An Introduction to Data Science by Jeffrey Stanton, Robert De Graaf, 2013.
An introductory level resource developed by Syracuse University
- An Introduction to Statistical Learning: with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, 2013.
An overview of statistical learning on large datasets. Exploratory techniques are discussed using the R programming language.
- A Programmer’s Guide to Data Mining by Ron Zacharski, 2012.
A guide to data mining concepts from a programming point of view. It provides several hands-on problems to practice and test the subjects taught in this online book.
- Bayesian Reasoning and Machine Learning by David Barber, 2012.
A book on Bayesian reasoning, focusing on applying it to machine learning algorithms and processes. It is a hands-on resource, well suited to absorbing the material in the book.
- Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners by Jared Dean, 2014.
This resource explores the reality of big data and its benefits from a marketing point of view. It also explains how to store this kind of data and which algorithms to use to process it, based on data mining and machine learning.
- Data Mining and Analysis: Fundamental Concepts and Algorithms by Mohammed J. Zaki, Wagner Meira, Jr., Cambridge University Press, May 2014.
Thorough coverage of exploratory data mining algorithms and machine learning processes. The explanations are complemented by some statistical analysis.
- Data Mining and Business Analytics with R by Johannes Ledolter, 2013.
Another R-based book describing the processes and implementations used to explore, transform and store information. It also focuses on the concept of business analytics.
- Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management by Michael J.A. Berry, Gordon S. Linoff, 2004.
A data mining book oriented specifically to marketing and business management, with great case studies that show how to apply these techniques in the real world.
- Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery by Graham Williams, 2011.
The objective of this book is to provide you with lots of information on data manipulation. It focuses on the Rattle toolkit and the R language to demonstrate the implementation of these techniques.
- Gaussian Processes for Machine Learning by Carl Edward Rasmussen and Christopher K. I. Williams, 2006.
This is a theoretical book approaching learning algorithms based on probabilistic Gaussian processes. It’s about supervised learning problems, describing models and solutions related to machine learning.
Read the full post on KDnuggets: http://www.kdnuggets.com/2015/03/free-data-mining-data-science-books-resources.html
Principal Data Scientist at Booz Allen Hamilton
Data Science Declaration for 2015
Big Data Complexity Requires Fast Modeling Technology
With Prescriptive Analytics, the future ain’t what it used to be
Very interesting compilation published here, with a strong machine learning flavor (maybe machine learning book authors, usually academics, are more prone to making their books available for free). Many are freely available O’Reilly books. Here we display those most relevant to data science. I haven’t checked all the sources, but they seem legitimate. If you find an issue, let us know in the comment section below. Note that at DSC, we also have our own free books.
There are several sections in the listing in question:
- Data Science Overviews (4 books)
- Data Scientists Interviews (2 books)
- How To Build Data Science Teams (3 books)
- Data Analysis (1 book)
- Distributed Computing Tools (2 books)
- Data Mining and Machine Learning (29 books)
- Statistics and Statistical Learning (5 books)
- Data Visualization (2 books)
- Big Data (3 books)
Here we mention #1, #5 and #6:
Data Science Overviews
Distributed Computing Tools
Data Mining and Machine Learning
- Introduction to Machine Learning (Amnon Shashua, 2008)
- Machine Learning (Abdelhamid Mellouk & Abdennacer Chebira)
- Machine Learning — The Complete Guide (Wikipedia)
- Social Media Mining An Introduction (Reza Zafarani, Mohammad Ali Abbasi, & Huan Liu, 2014)
- Data Mining: Practical Machine Learning Tools and Techniques (Ian H. Witten & Eibe Frank, 2005)
- Mining of Massive Datasets (Jure Leskovec, Anand Rajaraman, & Jeff Ullman, 2014)
- A Programmer’s Guide to Data Mining (Ron Zacharski, 2015)
- Data Mining with Rattle and R (Graham Williams, 2011)
- Data Mining and Analysis: Fundamental Concepts and Algorithms (Mohammed J. Zaki & Wagner Meira Jr., 2014)
- Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Goo… (Matthew A. Russell, 2014)
- Probabilistic Programming & Bayesian Methods for Hackers (Cam Davidson-Pilon, 2015)
- Data Mining Techniques For Marketing, Sales, and Customer Relations… (Michael J.A. Berry & Gordon S. Linoff, 2004)
- Inductive Logic Programming: Techniques and Applications (Nada Lavrac & Saso Dzeroski, 1994)
- Pattern Recognition and Machine Learning (Christopher M. Bishop, 2006)
- Machine Learning, Neural and Statistical Classification (D. Michie, D.J. Spiegelhalter, & C.C. Taylor, 1999)
- Information Theory, Inference, and Learning Algorithms (David J.C. MacKay, 2005)
- Data Mining and Business Analytics with R (Johannes Ledolter, 2013)
- Bayesian Reasoning and Machine Learning (David Barber, 2014)
- Gaussian Processes for Machine Learning (C. E. Rasmussen & C. K. I. Williams, 2006)
- Reinforcement Learning: An Introduction (Richard S. Sutton & Andrew G. Barto, 2012)
- Algorithms for Reinforcement Learning (Csaba Szepesvari, 2009)
- Big Data, Data Mining, and Machine Learning (Jared Dean, 2014)
- Modeling With Data (Ben Klemens, 2008)
Bernard Marr is a LinkedIn Influencer
Best-Selling Author, Keynote Speaker and Leading Business and Data Expert
4 Things Big Data Can Do, and 3 Things It Can’t Do #bigdata http://ow.ly/3yjonn
4 Ways Big Data Will Change Every Business | SmartData Collective http://www.smartdatacollective.com/bernardmarr/349932/4-ways-big-data-will-change-every-business
Big Data Decision Against Facebook: Implications For Google, Apple and 5,000 Other Companies
Here Are the Schools With Degrees in Data Science [List]
According to a recent report released by RJMetrics called The State of Data Science, there are 11,400 self-identifying Data Scientists on LinkedIn. The organization made it clear that it would count only people who call themselves data scientists, rather than going through a painstaking process of determining which skill sets made someone this type of professional and which did not.
Although RJMetrics expects that its number is a wild underestimate, it still shows a steep and recent increase. The report’s findings indicate that 52% of data scientists have entered the field within the past four years, which means colleges with degrees in data science are becoming an increasing focus. Given that, the question for many becomes: what are the best U.S. schools for data science, and what are the ideal paths toward this sort of career?
As you can imagine, this niche but growing field is so new that most of its professionals don’t have educational backgrounds specific to Data Science. For the most part, data scientists have earned higher degrees (master’s and PhDs), but the subjects are all across the board. As expected, the most popular courses of study are STEM-related. RJMetrics revealed that the top disciplines for data scientists with master’s degrees are Computer Science, Business Administration and Statistics. Meanwhile, for professionals holding PhDs, Physics, Computer Science and Mathematics are the most prevalent concentrations.
Schools are stepping up for Data Science
That being said, universities across the country are picking up on Data Science’s growing importance. Nowadays, businesses, regardless of their industry, are looking to accumulate, analyze and apply data to drive success in their space. To do that, it’s becoming key for employers to get their hands on Data Science talent. Select schools are stepping up and designing programs geared toward preparing students to thrive in this field.
Programs in Data Science, or in something similarly named like Business Analytics or Data Mining, are popping up in every U.S. state. In Massachusetts, schools like UMass and Worcester Polytechnic Institute (WPI) are among the first Data Science program pioneers.
“The demand is clearly there, as data science students are finding many job offers when they graduate and in a diverse marketplace,” said Elke Rundensteiner, professor at WPI — which has rolled out both a master’s and PhD program in Data Science in the past two years. “We are hearing from employers from marketing to cybersecurity to the pharmaceutical industry who have various data science needs. WPI has responded by digging deeper, offering more specific courses, and finding new intersections between disciplines.”
Because the field is so new and it’s being applied in so many different ways, universities are creating curricula that encompass a variety of disciplines needed to excel in Data Science. Most tracks are a melange of mathematics and IT, as well as business and computer sciences.
Schools aren’t solely focused on letting students earn degrees in this field. Universities around the country also know they play a crucial role in the Data Science community itself — namely, in its development.
“The demand for new methods and tools for big data is also growing,” explained Andrew McCallum, director and professor at UMass’ Center for Data Science. “Data science centers, like ours at UMass Amherst, bring the data users (industry and government) together with the data science researchers to create new technologies resulting in better decision making and the discovery of new knowledge.”
Where to go
UMass and WPI are hardly alone in jumping on the Data Science educational bandwagon. Here’s a comprehensive list of U.S. universities offering degree programs specifically in this emerging subject:
University of California Berkeley — Berkeley, CA
Chapman University — Orange, CA
Stanford — Stanford, CA
University of California San Diego — San Diego, CA
University of the Pacific — San Francisco, CA
University of Southern California — Los Angeles, CA
University of San Francisco — San Francisco, CA
Central Connecticut State University (CCSU) — New Britain, CT
University of Connecticut — Storrs, CT
American Sentinel University — Aurora, CO
University of Denver — Denver, CO
University of Central Florida — Orlando, FL
Catholic University of America — Washington, DC
George Washington University — Washington, DC
Georgetown University — Washington, DC
University of Iowa Tippie College of Business — Iowa City, IA
DePaul University — Chicago, IL
Illinois Institute of Technology — Chicago, IL
Northwestern University — Evanston, IL
University of Illinois Chicago Liautaud — Chicago, IL
University of Illinois at Urbana-Champaign — Urbana-Champaign, IL
University of Chicago Graham School — Chicago, IL
Indiana University Kelley School of Business — Bloomington, IN
Notre Dame — Notre Dame, IN
Purdue — West Lafayette, IN
Saint Mary’s College — Notre Dame, IN
Northern Kentucky University — Highland Heights, KY
Louisiana State University — Baton Rouge, LA
Bentley — Waltham, MA
Brandeis — Waltham, MA
Harvard — Cambridge, MA
UMass Amherst Center for Data Science — Amherst, MA
WPI — Worcester, MA
University of Maryland — College Park, MD
Michigan State University — East Lansing, MI
University of Michigan Dearborn — Dearborn, MI
Winona State University — Winona, MN
University of Minnesota — Minneapolis, MN
North Carolina State University — Raleigh, NC
Saint Peter’s University — Jersey City, NJ
Rutgers — New Brunswick, NJ
Stevens Institute of Technology — Hoboken, NJ
Columbia — New York, NY
Cornell — Ithaca, NY
Fordham — New York, NY
NYU — New York, NY
NYU Center for Data Science — New York, NY
Pace University — New York, NY and Westchester, NY
RPI — Troy, NY
Syracuse — Syracuse, NY
University of Rochester Institute for Data Science — Rochester, NY
The Ohio State University — Columbus, OH
University of Cincinnati — Cincinnati, OH
Xavier University — Cincinnati, OH
University of Oklahoma — Norman, OK
Carnegie Mellon University — Pittsburgh, PA
Drexel University — Philadelphia, PA
Saint Joseph’s University — Philadelphia, PA
College of Charleston — Charleston, SC
University of Tennessee — Knoxville, TN
Texas A&M University — Houston, TX
University of Texas Austin — Austin, TX
George Mason University — Fairfax, VA
Virginia Commonwealth University — Richmond, VA
University of Virginia — Charlottesville, VA
Originally published at anagauna.wordpress.com on October 17, 2015.