Add more info on working with categorical data #16881

Merged
merged 13 commits into from
Feb 12, 2020
Add one hot encoding
natke committed Jan 29, 2020
commit a87478d34245db79edccd5898f400b01413edab2
13 changes: 12 additions & 1 deletion docs/machine-learning/concepts/features-labels.md
@@ -17,7 +17,7 @@ A machine learning algorithm uses training data containing example features and

Features and labels must be numerical values in order to be processed by a machine learning algorithm.

Often the available data is not numbers but rather text, images, and dates, and must be transformed into numbers before being processed by the training algorithm.
Often the available data is not numbers but rather text, images, and dates, and must be transformed into numbers before being processed by an ML.NET training algorithm.

## Categorical data

@@ -33,6 +33,17 @@ The transforms used to perform key value mapping are [MapValueToKey](xref:Micros

### One hot encoding

One hot encoding takes a finite set of values and maps each one to an integer whose binary representation contains a single `1`, at a position unique to that value. The following table shows an example with zip codes as raw values.

|Raw value|One hot encoded value|
|---------|---------------------|
|98052|00...01|
|98100|00...10|
|...|...|
|98109|10...00|
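
The mapping in the table can be illustrated with a minimal sketch in plain Python. This is not ML.NET code: the function name `one_hot_encode` and the three sample zip codes are assumptions made for illustration, and the sketch assumes that categories are ordered by sorting, so that the first category sets the rightmost bit.

```python
def one_hot_encode(values):
    """Map each distinct value to a bit string containing exactly one '1'.

    Hypothetical helper for illustration only; not an ML.NET API.
    """
    # Assumption: categories are ordered by sorting the distinct raw values.
    categories = sorted(set(values))
    width = len(categories)

    encoding = {}
    for index, category in enumerate(categories):
        bits = ["0"] * width
        # The first category sets the rightmost bit, the last the leftmost.
        bits[width - 1 - index] = "1"
        encoding[category] = "".join(bits)
    return encoding


# Sample raw values taken from the table; duplicates share one encoding.
zip_codes = ["98052", "98100", "98109", "98052"]
encoding = one_hot_encode(zip_codes)

for raw_value in sorted(encoding):
    print(raw_value, encoding[raw_value])
```

With only three distinct zip codes, each encoded value is three bits wide. A real data set with many distinct values produces correspondingly wider encodings, which is what the `...` in the table stands for.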



### Hashing