- Aarhus University
- Aarhus, Denmark
- lassehansen.me
Stars
An extremely fast Python package and project manager, written in Rust.
A collection of ETLs from common data formats to Medical Event Data Standard
Open-source scientific and technical publishing system built on Pandoc.
MTEB: Massive Text Embedding Benchmark
Code for the paper: Reimagining Synthetic Data Generation through Data-Centric AI: A Comprehensive Benchmark (NeurIPS 2023)
fabric is an open-source framework for augmenting humans using AI. It provides a modular framework for solving specific problems using a crowdsourced set of AI prompts that can be used anywhere.
This repository contains resources pertaining to the Danish Foundation Models workshop on Danish LLMs at the D3A conference, which was held by the Alexandra Institute and the Center for Humanities …
🪓 A CLI so you can easily chop and stack changes. https://stacking.dev/
Dataset and modelling infrastructure for modelling "event streams": sequences of continuous time, multivariate events with complex internal dependencies.
Building blocks for foundation models.
🍬 Confection: the sweetest config system for Python
A forum for discussing alternative approaches to scientific publishing.
Tools for interactive visual exploration of semantic embeddings.
A project for training a foundational Danish language model
SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Converting irregularly spaced time series, such as electronic health records, into dataframes for tabular classification.
A Scandinavian Benchmark for sentence embeddings
State-of-the-Art Text Embeddings
FEMR (Framework for Electronic Medical Records) provides tooling for large-scale, self-supervised learning using electronic health records
Remove duplicates and near-duplicates from text corpora, no matter the scale.
Asent is a Python library for performing efficient and transparent sentiment analysis using spaCy.
🌲 Monorepo for the PSYCOP research project. Prediction of disease from Electronic Health Records at Aarhus University.
Evaluation of language models on mono- or multilingual tasks.
Machine learning tools for running repeated nested cross(-dataset)-validation.