DeduDLB - Deduplication of Names and Co-Authoring Networks in the DBLP
This repository stores a dataset collected from the DBLP computer science bibliography, an online reference for bibliographic information on major computer science publications. The dataset contains approximately 15 million records collected in September 2016.
From this dataset, two sub-datasets were created:
- The first contains the original data collected from DBLP after name deduplication.
- The second contains three co-authorship social networks built using the snowball sampling technique.
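For readers unfamiliar with snowball sampling, the idea can be sketched as follows: start from a set of seed authors and repeatedly add their co-authors, one wave at a time. This is only an illustrative sketch, not the procedure used to build the networks in this repository. The function names (`build_coauthor_graph`, `snowball_sample`), the `(author, publication)` pair format, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthor_graph(author_pub_pairs):
    """Link every pair of authors that share at least one publication."""
    # Group authors by publication (hypothetical input: (author, publication) pairs).
    authors_by_pub = defaultdict(set)
    for author, pub in author_pub_pairs:
        authors_by_pub[pub].add(author)

    # Add an undirected edge between each pair of co-authors.
    graph = defaultdict(set)
    for authors in authors_by_pub.values():
        for a, b in combinations(sorted(authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def snowball_sample(graph, seeds, waves):
    """Return the authors reached from `seeds` within `waves` expansion rounds."""
    sampled = set(seeds)
    frontier = set(seeds)
    for _ in range(waves):
        # Collect all co-authors of the current frontier.
        next_frontier = set()
        for author in frontier:
            next_frontier.update(graph.get(author, ()))
        # Keep only authors not sampled in an earlier wave.
        next_frontier -= sampled
        if not next_frontier:
            break
        sampled |= next_frontier
        frontier = next_frontier
    return sampled


# Toy data invented for illustration; not taken from the dataset.
pairs = [
    ("A", "p1"), ("B", "p1"),
    ("B", "p2"), ("C", "p2"),
    ("C", "p3"), ("D", "p3"),
    ("E", "p4"),
]
graph = build_coauthor_graph(pairs)
one_wave = snowball_sample(graph, seeds={"A"}, waves=1)
two_waves = snowball_sample(graph, seeds={"A"}, waves=2)
```

With this toy input, one wave from seed `A` reaches only its direct co-author `B`, and a second wave adds `C`. The number of waves and the choice of seeds control how large the sampled network becomes.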
Data | # Records |
---|---|
Publications in articles | 1,505,020 |
Authors | 1,779,971 |
Publications in proceedings | 31,549 |
Publications in inproceedings | 1,861,226 |
Relation between authors and publications | 9,707,161 |
Total | 14,884,927 |
- DBLP - Contains the original dataset collected from DBLP
- DBLP_name_desambiguation - Contains collected data with ambiguous names resolved
- DBLP_social_networks - Contains three co-authorship social networks
If you use the datasets, please cite our paper [1].
[1] Mariana O. Silva, Michele A. Brandão. “Deduplicação de Nomes e Redes de Co-autoria na DBLP”. In: SBBD Dataset Showcase Workshop, pp. 203-211. Uberlândia, MG.