-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more info on working with categorical data #16881
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done. Few comments and suggestions.
|---------|---------------------| | ||
|98052|00...01| | ||
|98100|00...10| | ||
||| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
||| |
@@ -18,7 +18,7 @@ Data is often unclean and sparse. ML.NET machine learning algorithms expect inpu | |||
|
|||
Sometimes, not all data in a dataset is relevant for analysis. An approach to remove irrelevant data is filtering. The [`DataOperationsCatalog`](xref:Microsoft.ML.DataOperationsCatalog) contains a set of filter operations that take in an [`IDataView`](xref:Microsoft.ML.IDataView) containing all of the data and return an [IDataView](xref:Microsoft.ML.IDataView) containing only the data points of interest. It's important to note that because filter operations are not an [`IEstimator`](xref:Microsoft.ML.IEstimator%601) or [`ITransformer`](xref:Microsoft.ML.ITransformer) like those in the [`TransformsCatalog`](xref:Microsoft.ML.TransformsCatalog), they cannot be included as part of an [`EstimatorChain`](xref:Microsoft.ML.Data.EstimatorChain%601) or [`TransformerChain`](xref:Microsoft.ML.Data.TransformerChain%601) data preparation pipeline. | |||
|
|||
Using the following input data which is loaded into an [`IDataView`](xref:Microsoft.ML.IDataView): | |||
Using the following input data and load it into an [`IDataView`](xref:Microsoft.ML.IDataView): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd fix the tenses to match here. Using / load. We got feedback on this before though if we said "Use the following input data and load it into an IDataView" confused users because the code to load the data into an IDataView is not there. So it was worded as it currently is to make it sound like..."This is what the data looks like and we assume it's been loaded into an IDataView". The reason for not including the load code is we don't want to focus on loading code one way or another (file / enumerable). We want to focus more on the input data / transforms and resulting output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok sure that makes sense. Let me have another look at the wording
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made them consistent and explicitly instructed to load into a variable called data
. Let me know what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
Co-Authored-By: Luis Quintanilla <[email protected]>
Co-Authored-By: Luis Quintanilla <[email protected]>
Co-Authored-By: Luis Quintanilla <[email protected]>
Co-Authored-By: Luis Quintanilla <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the changes. Looks good to me. The only thing I'd change is the unresolved suggestion in the One Hot Encoding table. There seems to be an extra blank row. If the intent is to show that there are other values in between I'd replace the blank column values with "...". Otherwise, I'd remove the extra row.
Thanks! I missed that. Will update. |
Fixes #13590
AB#1668720