What is Data Science?
Data Science is about drawing useful conclusions from large and diverse datasets through exploration, prediction, and inference. Exploration involves identifying patterns in information. Prediction involves using information we know to make informed guesses about values we wish we knew. Inference involves quantifying our degree of certainty: will those patterns we found also appear in new observations? How accurate are our predictions? Our primary tools for exploration are visualizations and descriptive statistics, for prediction are machine learning and optimization, and for inference are statistical tests and models.
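To make the three activities concrete, here is a minimal Python sketch that applies each of them to one tiny dataset. Everything in it is an assumption made for illustration: the data (hours studied and exam scores for eight students) are hypothetical, the helper name fit_line is invented here, and a least-squares line and a bootstrap interval are just one possible choice of prediction and inference tools, not a method prescribed by the text.

```python
import random
import statistics

# Hypothetical toy data: hours studied and exam scores for eight students.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 64, 70, 71, 78, 83]

# Exploration: summarize the data with descriptive statistics.
mean_score = statistics.mean(scores)
sd_score = statistics.stdev(scores)


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Prediction: use the values we know to guess a value we wish we knew.
slope, intercept = fit_line(hours, scores)
predicted = slope * 9 + intercept  # guess for a student who studies 9 hours

# Inference: resample the data (bootstrap) to see how much the slope varies.
rng = random.Random(0)  # fixed seed so the run is repeatable
pairs = list(zip(hours, scores))
boot_slopes = []
for _ in range(2000):
    sample = [rng.choice(pairs) for _ in pairs]
    xs, ys = zip(*sample)
    if len(set(xs)) > 1:  # skip degenerate resamples with a single x value
        boot_slopes.append(fit_line(xs, ys)[0])
boot_slopes.sort()
lower = boot_slopes[int(0.025 * len(boot_slopes))]
upper = boot_slopes[int(0.975 * len(boot_slopes))]

print(f"mean score: {mean_score:.1f}, SD: {sd_score:.1f}")
print(f"predicted score for 9 hours: {predicted:.1f}")
print(f"approximate 95% interval for the slope: [{lower:.2f}, {upper:.2f}]")
```

The interval printed at the end is the inference step: it gives a rough sense of how much the fitted slope could change if the data had come out differently, which is one way of asking whether the pattern would appear in new observations.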
Statistics is a central component of data science because statistics studies how to make robust conclusions with incomplete information. Computing is a central component because programming allows us to apply analysis techniques to the large and diverse data sets that arise in real-world applications: not just numbers, but text, images, videos, and sensor readings. Data science is all of these things, but it is more than the sum of its parts because of the applications. Through understanding a particular domain, data scientists learn to ask appropriate questions about their data and correctly interpret the answers provided by our inferential and computational tools.
This page was created by The Jupyter Book Community