A data science project template, based on cookiecutter-data-science.
Requirements:

- Python 3.5 or 2.7
- Cookiecutter Python package >= 1.4.0, which can be installed with pip or conda, depending on how you manage your Python packages:

```
$ pip install cookiecutter
```

or

```
$ conda config --add channels conda-forge
$ conda install cookiecutter
```
To start a new project, run:

```
$ cookiecutter https://github.com/mmakowski/cookiecutter-dr-ds
```
The directory structure of your new project looks like this:
```
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for data scientists using this project.
├── data
│   ├── external       <- Data from third-party sources.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details.
│
├── models             <- Trained and serialized models, model predictions, or model
│                         summaries.
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for
│                         ordering), the creator's initials, and a short `-`-delimited
│                         description, e.g. `01.0-mm-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting.
│
├── requirements.txt   <- The requirements file for reproducing the analysis
│                         environment, e.g. generated with `pip freeze > requirements.txt`.
│
└── src                <- Source code for use in this project.
    ├── __init__.py    <- Makes src a Python package.
    │
    ├── data           <- Scripts to download, generate, and transform data.
    │   └── process.py <- [template] Transforms the raw data into the canonical format
    │                     and splits it into training and holdout sets.
    │
    ├── models          <- Scripts to train, evaluate, and export the pipeline.
    │   ├── evaluate.py <- [template] Evaluates the trained pipeline on the holdout set.
    │   ├── export.py   <- [template] Exports the trained pipeline to PMML.
    │   └── train.py    <- [template] Trains the pipeline.
    │
    └── visualization   <- Scripts to create exploratory and results-oriented
                          visualizations.
```
To set up the development sandbox, run:

```
$ make create_environment
$ source activate <project name>
$ make requirements
```
Then `make evaluate` will train and evaluate a dummy model, and `make export` will export it to PMML.
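For orientation, here is a minimal sketch of what `evaluate.py` might contain. The file paths, the `label` column, and the accuracy metric are illustrative assumptions, not the template's actual contract:

```python
# evaluate.py -- illustrative sketch only; paths, the "label" column,
# and the accuracy metric are assumptions, not the template's contract.
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score

# Load the holdout split written by process.py (hypothetical path).
holdout = pd.read_csv("data/processed/holdout.tsv", sep="\t")
X, y = holdout.drop(columns=["label"]), holdout["label"]

# Load the pipeline persisted by train.py and score it on the holdout set.
pipeline = joblib.load("models/pipeline.pkl")
print("holdout accuracy:", accuracy_score(y, pipeline.predict(X)))
```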
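The README does not pin down the PMML tooling; the sketch below assumes the `sklearn2pmml` package (which drives a bundled Java converter, so a JRE must be installed). Because `sklearn2pmml` converts its own fitted `PMMLPipeline` wrapper, the model is refitted here rather than loaded from a pickle:

```python
# export.py -- illustrative sketch only; assumes sklearn2pmml (requires
# a Java runtime) and the hypothetical paths/columns used above.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

train = pd.read_csv("data/processed/train.tsv", sep="\t")
X, y = train.drop(columns=["label"]), train["label"]

# sklearn2pmml converts a fitted PMMLPipeline, so wrap and fit directly.
pipeline = PMMLPipeline([("classifier", LogisticRegression())])
pipeline.fit(X, y)

# Writes the PMML document that downstream scoring engines consume.
sklearn2pmml(pipeline, "models/pipeline.pmml")
```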
To adapt the template to your project:

- Put the raw data in `data/raw`, or create a script to download the data from its source repository.
- Edit `src/data/process.py` to transform the raw data into TSV files (a minimal sketch follows below).
- Edit `src/models/train.py` to specify how the model should be trained (also sketched below).
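A minimal sketch of the shape `process.py` could take; the raw-file name, the `data/processed` TSV paths, and the 80/20 split are assumptions for illustration:

```python
# process.py -- illustrative sketch only; paths, the split ratio, and
# the single-CSV raw input are assumptions, not the template's contract.
import pandas as pd
from sklearn.model_selection import train_test_split

# Read the immutable raw dump (assumed here to be a single CSV).
raw = pd.read_csv("data/raw/data.csv")

# ... project-specific cleaning and feature engineering goes here ...

# Split into training and holdout sets (80/20 is an arbitrary choice)
# and write the canonical TSV files that train.py/evaluate.py consume.
train, holdout = train_test_split(raw, test_size=0.2, random_state=42)
train.to_csv("data/processed/train.tsv", sep="\t", index=False)
holdout.to_csv("data/processed/holdout.tsv", sep="\t", index=False)
```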
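And a matching sketch for `train.py`, using the kind of dummy model the Makefile workflow mentions; swap the `DummyClassifier` for your real preprocessing and model steps:

```python
# train.py -- illustrative sketch only; the dummy model, paths, and
# "label" column are assumptions standing in for your real pipeline.
import joblib
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.pipeline import Pipeline

train = pd.read_csv("data/processed/train.tsv", sep="\t")
X, y = train.drop(columns=["label"]), train["label"]

# Replace DummyClassifier with the real preprocessing + model steps.
pipeline = Pipeline([("model", DummyClassifier(strategy="most_frequent"))])
pipeline.fit(X, y)

# Persist the fitted pipeline for evaluate.py/export.py to pick up.
joblib.dump(pipeline, "models/pipeline.pkl")
```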