  • St. Petersburg

dx2-66/README.md

This is my page. There are many like it, but this one's mine.

  • 🥇 I'm proficient with:

    • torch, scikit-learn, scikit-optimize, gplearn, numpy/scipy/statsmodels, pandas, xgboost/catboost/lgbm, matplotlib/seaborn/plotly, albumentations, category_encoders, HF transformers/diffusers, llamacpp, qdrant, fastapi, celery, streamlit
  • 🥈 I know my way around:

    • spark, SQL, keras/tensorflow, OpenGL, OpenCV, CI/CD pipelines
  • 🎖 Since I started, I've learned how to:

    • never accept the null hypothesis;
    • stop misinterpreting p-values;
    • read impurity-based, permutation and SHAP feature importances properly;
    • spot the outliers in several dimensions at once with Mahalanobis distance;
    • combat skewness with QuantileTransformer/PowerTransformer;
    • calibrate predicted probabilities;
    • avoid the PCA trap in classification;
    • engineer features with symbolic regression;
    • effectively fine-tune Stable Diffusion with the DreamBooth technique;
    • use vector databases to supply context for language models.
  • 🎖 In the past year, I've learned:

    • the actual difference between Adam and AdamW;
    • proper separation of evaluation and decision making;
    • the power of generalized linear models;
    • the numerous applications of IoU;
    • the match between kernel regression and two-layer neural networks;
    • augmenting the embeddings rather than raw data;
    • that the LOOCV MSE for KNN equals the training MSE for (K+1)NN;
    • the importance of embedding distances for LLM reasoning;
    • the difficulties of separating epistemic and aleatoric uncertainty;
    • how to evaluate ensembling quality using information theory;
    • how to beat OpenAI using a 7B model running on CPU.
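
The multivariate-outlier bullet above can be sketched in a few lines. This is a minimal illustration, not the author's code: it assumes roughly Gaussian data and uses a chi-square cutoff (df = number of features) on the squared Mahalanobis distance.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
X = np.vstack([X, [8.0, 8.0, 8.0]])  # plant one obvious multivariate outlier

mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mu
# squared Mahalanobis distance of every row to the sample mean
d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

# under normality, d2 is approximately chi-square with df = n_features
outliers = np.flatnonzero(d2 > chi2.ppf(0.999, df=X.shape[1]))
```

Each marginal coordinate of the planted point is only a few sample standard deviations out, but the joint distance flags it immediately; that is the "several dimensions at once" part.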
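
The Adam-vs-AdamW bullet can also be made concrete. A minimal numpy sketch (single step from zero moment buffers, so the t = 1 bias corrections cancel; the function name and constants are illustrative, not any library's API): classic Adam folds the L2 penalty gradient wd·w into the adaptively rescaled gradient, while AdamW decouples the decay and shrinks the weights directly.

```python
import numpy as np

def adam_step(w, grad, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=1e-2,
              decoupled=False):
    """One optimizer step from a cold start (t = 1, zero moment buffers)."""
    # Coupled decay (Adam + L2): the penalty gradient wd*w passes through
    # the adaptive rescaling. Decoupled decay (AdamW): weights shrink by
    # lr*wd*w directly, untouched by the second-moment denominator.
    g = grad if decoupled else grad + wd * w
    m_hat = g        # (1 - b1) * g, bias-corrected by / (1 - b1)
    v_hat = g * g    # (1 - b2) * g**2, bias-corrected by / (1 - b2)
    w_new = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    if decoupled:
        w_new = w_new - lr * wd * w
    return w_new

w = np.array([1.0])
zero_grad = np.zeros(1)  # no loss gradient: only weight decay acts
adam_w = adam_step(w, zero_grad, decoupled=False)
adamw_w = adam_step(w, zero_grad, decoupled=True)
```

With a zero loss gradient, the coupled version moves by roughly lr (the rescaling normalizes the tiny decay gradient to near unit size), while AdamW shrinks by exactly lr·wd·w: the decay strength no longer depends on the gradient history, which is the actual difference.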

Stack Overflow | Data Science Stack Exchange | Cross Validated

Checkio

Pinned

  1. bomzhgan

    Jupyter Notebook

  2. art-challenge

    CV: Image classification with a few plot twists.

    Jupyter Notebook

  3. gravitational_wave_detection

    Kaggle G2Net Gravitational Wave Detection

    Jupyter Notebook

  4. kaggle_disaster_tweets

    NLP: is this tweet about a real disaster?

    Jupyter Notebook

  5. symbolic_regression

    Symbolic regression: a powerful yet underrated tool.

    Jupyter Notebook 1

  6. data-science

    Completed ML and data-analysis projects.

    Jupyter Notebook 1