Cloudera Data Science Workbench demos

Basic tour of Cloudera Data Science Workbench.

Workbench

Five scripts are provided that walk through the interactive capabilities of Cloudera Data Science Workbench (CDSW).

Basic Python visualizations (Python 2). Demonstrates:

Markdown via comments
Jupyter-compatible visualizations
Simple console sharing
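As a rough illustration of "Markdown via comments", the sketch below mixes Markdown-style comments with ordinary code. It is not taken from 1_python.py; the data, the `summarize` helper, and the `plot_totals` helper are hypothetical, and the plotting part assumes matplotlib is installed in the session.

```python
# # Monthly totals
#
# In the workbench, comment lines like these are expected to be rendered
# as **Markdown** alongside the output of the code that follows.

import statistics

# Hypothetical sample data, used only for illustration.
monthly_totals = {"Jan": 120, "Feb": 95, "Mar": 143}


def summarize(totals):
    """Return (mean value, key with the largest value) for a {label: value} dict."""
    values = list(totals.values())
    best = max(totals, key=totals.get)
    return statistics.mean(values), best


mean_total, best_month = summarize(monthly_totals)
print("mean = {:.1f}, best month = {}".format(mean_total, best_month))


# ## Plot
#
# matplotlib is imported lazily so the rest of the script runs without it.
def plot_totals(totals):
    """Draw a bar chart of the totals (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt

    plt.bar(list(totals.keys()), list(totals.values()))
    plt.title("Monthly totals")
    plt.show()
```

Calling `plot_totals(monthly_totals)` in a session should display the chart in the console, provided matplotlib is available.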

PySpark (Python 2). Demonstrates:

Easy connectivity to (kerberized) Spark in YARN client mode.
Access to Hadoop HDFS CLI (e.g. hdfs dfs -ls /).
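The two capabilities above can be sketched roughly as follows. This is not the content of 2_pyspark.py: the helper names are hypothetical, and the code assumes that the Spark master, YARN settings, and Kerberos credentials are already supplied by the session environment and that the `hdfs` command is on the PATH.

```python
import subprocess


def hdfs_ls_command(path="/"):
    """Build the argument list for `hdfs dfs -ls <path>`."""
    return ["hdfs", "dfs", "-ls", path]


def list_hdfs(path="/"):
    """Run the HDFS CLI and return its output as text (needs `hdfs` on PATH)."""
    return subprocess.check_output(hdfs_ls_command(path)).decode("utf-8")


def count_lines(hdfs_path, app_name="cdsw-demo"):
    """Count the lines of a text file on HDFS using Spark.

    pyspark is imported lazily so this module loads without it.
    The master and security settings are assumed to come from the
    environment; nothing is configured explicitly here.
    """
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName(app_name).getOrCreate()
    try:
        return spark.read.text(hdfs_path).count()
    finally:
        spark.stop()
```

In an interactive session the same listing is available directly as `!hdfs dfs -ls /`; the `subprocess` wrapper is only useful when the output is needed as a Python string.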

TensorFlow (Python 2). Demonstrates:

Ability to install and use custom packages (e.g. pip search tensorflow)

R. Demonstrates:

Running R code on CDSW, using the arules library

Advanced visualization with Shiny (R). Demonstrates:

Use of 'shiny' to provide interactive graphics inside CDSW

Jobs

We recommend setting up a "Nightly Analysis" job to illustrate how data scientists can easily automate their projects.
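A job needs a script that runs without user interaction. The sketch below shows one possible shape for such a script; it is not part of this repository. The function names and the input format (one transaction per line, items separated by commas) are assumptions made for illustration.

```python
import collections
import datetime


def top_items(lines, n=3):
    """Return the n most frequent items as (item, count) pairs.

    Each line is treated as one transaction of comma-separated items
    (an assumed format). Ties are broken alphabetically so the result
    is deterministic.
    """
    counts = collections.Counter()
    for line in lines:
        # Count each item at most once per transaction.
        items = {item.strip() for item in line.split(",") if item.strip()}
        counts.update(items)
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


def render_report(lines, today=None):
    """Build a small dated text report from the transactions."""
    today = today or datetime.date.today()
    rows = ["{}: {}".format(item, count) for item, count in top_items(lines)]
    return "Nightly analysis for {}\n{}".format(today.isoformat(), "\n".join(rows))
```

A scheduled job could call `print(render_report(open(path)))` with `path` pointing at the project's data file, so that each run leaves a dated report in the job log.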

Setup instructions

Note: You only need to do this once.

In a Python 3 Session:

!pip3 install --upgrade dask
!pip3 install --upgrade keras
!pip3 install --upgrade matplotlib==2.0.0
!pip3 install --upgrade pandas_highcharts
!pip3 install --upgrade protobuf
!pip3 install --upgrade tensorflow==1.3.0
!pip3 install --upgrade seaborn
!pip3 install --upgrade numpy

Note: You must then stop the session and start a new Python session so that the newly installed packages are picked up.
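After restarting, a quick check like the one below can confirm that the packages are importable in the new session. It is a minimal sketch, not part of the demo scripts: the helper names are hypothetical, and the module names in `PACKAGES` are the usual import names for the packages above (protobuf is imported as `google.protobuf`).

```python
import importlib

# Import names for the packages installed above.
PACKAGES = [
    "dask", "keras", "matplotlib", "pandas_highcharts",
    "google.protobuf", "tensorflow", "seaborn", "numpy",
]


def installed_versions(module_names):
    """Map each module name to its version string, or None if it cannot be imported.

    Modules that import but expose no __version__ are reported as "unknown".
    """
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


def format_report(versions):
    """Render the version mapping as aligned text, one module per line."""
    lines = []
    for name in sorted(versions):
        lines.append("{:20s} {}".format(name, versions[name] or "NOT INSTALLED"))
    return "\n".join(lines)
```

Running `print(format_report(installed_versions(PACKAGES)))` in the new session lists each module with its version; the pinned packages should report 2.0.0 (matplotlib) and 1.3.0 (tensorflow).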

In an R Session:

install.packages("sparklyr")
install.packages("plotly")
install.packages("nycflights13")
install.packages("Lahman")
install.packages("mgcv")
install.packages("shiny")
install.packages("arules")
install.packages("readr")

Stop all sessions, then proceed.

