Add more info on working with categorical data #16881

natke · 2020-01-29T19:41:09Z

Fixes #13590

AB#1668720

luisquintanilla

Nicely done. A few comments and suggestions.

docs/machine-learning/how-to-guides/prepare-data-ml-net.md

luisquintanilla · 2020-02-06T20:27:09Z

+|---------|---------------------|
+|98052|00...01|
+|98100|00...10|
+|||


Suggested change

|||

docs/machine-learning/how-to-guides/prepare-data-ml-net.md

luisquintanilla · 2020-02-06T20:33:53Z

@@ -18,7 +18,7 @@ Data is often unclean and sparse. ML.NET machine learning algorithms expect inpu

 Sometimes, not all data in a dataset is relevant for analysis. An approach to remove irrelevant data is filtering. The [`DataOperationsCatalog`](xref:Microsoft.ML.DataOperationsCatalog) contains a set of filter operations that take in an [`IDataView`](xref:Microsoft.ML.IDataView) containing all of the data and return an [IDataView](xref:Microsoft.ML.IDataView) containing only the data points of interest. It's important to note that because filter operations are not an [`IEstimator`](xref:Microsoft.ML.IEstimator%601) or [`ITransformer`](xref:Microsoft.ML.ITransformer) like those in the [`TransformsCatalog`](xref:Microsoft.ML.TransformsCatalog), they cannot be included as part of an [`EstimatorChain`](xref:Microsoft.ML.Data.EstimatorChain%601) or [`TransformerChain`](xref:Microsoft.ML.Data.TransformerChain%601) data preparation pipeline.

-Using the following input data which is loaded into an [`IDataView`](xref:Microsoft.ML.IDataView):
+Using the following input data and load it into an [`IDataView`](xref:Microsoft.ML.IDataView):


I'd fix the tenses to match here (Using / load). We got feedback on this before, though: wording it as "Use the following input data and load it into an IDataView" confused users, because the code to load the data into an IDataView is not there. So it was worded as it currently is to make it sound like "This is what the data looks like, and we assume it's been loaded into an IDataView". The reason for not including the load code is that we don't want to focus on loading code one way or another (file / enumerable). We want to focus more on the input data, the transforms, and the resulting output.

Ok, sure, that makes sense. Let me have another look at the wording.

I've made them consistent and explicitly instructed the reader to load the data into a variable called `data`. Let me know what you think.

luisquintanilla

Thanks for making the changes. Looks good to me. The only thing I'd change is the unresolved suggestion in the One Hot Encoding table, where there seems to be an extra blank row. If the intent is to show that there are other values in between, I'd replace the blank column values with "...". Otherwise, I'd remove the extra row.

natke · 2020-02-12T15:52:05Z

Thanks! I missed that. Will update.

luisquintanilla · 2020-02-12T18:02:57Z

natke added 3 commits January 28, 2020 13:35

Add feature label article

106f98b

Add one hot encoding

a87478d

Added one hot encoding and hashing to data prep how-to

57b3e24

dotnet-bot added this to the January 2020 milestone Jan 29, 2020

dotnet-bot added the 📚 Area - ML.NET Guide label Jan 29, 2020

natke closed this Jan 29, 2020

natke reopened this Jan 29, 2020

natke added 2 commits January 29, 2020 14:26

Acrolinx and tidy up

b47c2ae

Fix xref

65105ad

natke closed this Jan 29, 2020

natke reopened this Jan 29, 2020

natke added 2 commits January 29, 2020 14:59

Fix typo

d1245e5

Remove one-hot encoding example, as the example is in the API docs

61e9eb4

natke marked this pull request as ready for review February 6, 2020 19:42

natke requested a review from luisquintanilla February 6, 2020 19:42

luisquintanilla reviewed Feb 6, 2020

natke and others added 5 commits February 11, 2020 12:44

Update docs/machine-learning/how-to-guides/prepare-data-ml-net.md

647091c

Co-Authored-By: Luis Quintanilla <[email protected]>

Update docs/machine-learning/how-to-guides/prepare-data-ml-net.md

600d8c3

Co-Authored-By: Luis Quintanilla <[email protected]>

Update docs/machine-learning/how-to-guides/prepare-data-ml-net.md

1b42d38

Co-Authored-By: Luis Quintanilla <[email protected]>

Update docs/machine-learning/how-to-guides/prepare-data-ml-net.md

e64cc4c

Co-Authored-By: Luis Quintanilla <[email protected]>

Made tenses consistent for loading enmerable data

781166b

luisquintanilla approved these changes Feb 12, 2020

Update after review

64c0352

natke merged commit e4c228a into dotnet:master Feb 12, 2020

BillWagner added dotnet-ml/svc and removed 📚 Area - ML.NET Guide labels Feb 9, 2021
