DeltaAnalytics/machine_learning_for_good

Introduction to Data Science for Social Good

How can we use data for social impact?

Data is powerful, and we believe that anyone can harness it for change.

In this introductory course, students will learn foundational theory and the necessary coding skills to translate data into actionable insights. Students will learn the latest machine learning tools and algorithms.

Data science is a highly interdisciplinary practice that demands critical thinking, an understanding of statistics, and technical coding ability. Irresponsible application of powerful algorithms, or an inadequate exploration of their underlying assumptions, can lead to spurious results. In this course, we emphasize the fundamentals of proper data science and expose students to what is possible using sophisticated machine learning methods.

Each module is hands-on and project-based, using real-world data from Kiva, a non-profit that connects people through lending to alleviate poverty.

Delta Analytics is a 501(c)(3) Bay Area non-profit dedicated to bringing rigorous data science to problem-solving, effecting change in nonprofits and the public sector, and making data science an accessible and democratic resource for anyone with the same mission.

Curriculum

Topics covered in this course include: supervised learning, unsupervised learning, ensemble approaches, recommendation algorithms, and text analysis (also called Natural Language Processing or NLP).

Algorithms covered in this course include: linear regression, decision trees, random forest, and k-means clustering.
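To give a flavor of the first algorithm on this list, here is a minimal sketch of linear regression with a single feature, fit by ordinary least squares. It is not taken from the course notebooks: the function name `fit_line` and the toy numbers are assumptions made up for illustration, and they are not Kiva data.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x; zero means every x is identical.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    # Co-deviation of x and y around their means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical toy data: loan term in months vs. loan amount in dollars.
terms = [6, 8, 10, 12, 14]
amounts = [300, 400, 500, 600, 700]

slope, intercept = fit_line(terms, amounts)
predicted = slope * 9 + intercept  # predicted amount for a 9-month term
print(slope, intercept, predicted)
```

The toy data lies exactly on a line, so the fit recovers it exactly. Real data will not, and libraries such as scikit-learn are the more usual choice for fitting models with many features.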

The slides that cover the theory behind the code are available here. By presenting theory alongside a long-form, real-world data science project, our curriculum opens the door for novices and professionals alike to harness the power of data for good.

Modules:

  1. Introduction
  • Who is Delta Analytics?
  • What is data science? What is machine learning?
  • Setting up your environment
  • Accessing the data
  2. Descriptive Statistics
  • Data validation and cleaning
  3. Feature Engineering
  4. Linear Regression
  5. Decision Trees
  • Ensemble approaches
  • Why use an ensemble approach?
  • Decision tree, random forest and bagging
  • Parametric vs. non-parametric models
  • What are hyperparameters and how do you choose them?
  6. Unsupervised Learning
  • Clustering
  • K-means algorithm
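
As a taste of the last module, the k-means algorithm can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the course's own implementation: the function names `kmeans` and `nearest_index`, the default parameters, and the toy points are all assumptions made up for this example.

```python
import random


def nearest_index(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points (tuples of numbers) into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    labels = [nearest_index(p, centroids) for p in points]
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Choosing `k` and the starting centroids are the main judgment calls. This sketch fixes a random seed for repeatability, while library implementations typically use smarter initialization and several restarts.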

Outcomes of course

At the end of the course, students will:

  1. Have a solid understanding of the fundamental statistical and programming concepts that underlie common data science methods.
  2. Be able to communicate with other data scientists using technical terms.
  3. Be able to write code to clean, process, analyze, and visualize real-world data.

Who is our target student?

The course is intended for anyone interested in harnessing data to solve problems in their communities. Only minimal prior coding or mathematical/statistical experience is expected, but computer proficiency is necessary.

Our teachers

Delta Teaching Fellows are all data professionals working in the Bay Area. We donate all of our time to build out a curriculum that makes machine learning tools and knowledge more accessible to communities around the world. You can learn more about our team here.