Graph Database

DAF (SIGMOD 2019)

Han, M., Kim, H., Gu, G., Park, K., & Han, W. S. Efficient Subgraph Matching: Harmonizing Dynamic Programming, Adaptive Matching Order, and Failing Set Together. Proceedings of the 2019 International Conference on Management of Data, 1429–1446.

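| Database | Domain | V | E | label | avg-deg | Download |
| --- | --- | --- | --- | --- | --- | --- |
| Yeast | PPI | 3,112 | 12,519 | 71 | 8.04 | link |
| Human | PPI | 4,674 | 86,282 | 44 | 36.91 | link |
| HPRD | PPI | 9,460 | 37,081 | 307 | 7.83 | link |
| Email | Communication | 36,692 | 183,831 | 20 | 10.02 | link |
| DBLP | Collaboration | 317,080 | 1,049,866 | 20 | 6.62 | link |
| YAGO | Knowledge Graph | 4,295,825 | 11,413,472 | 49,676 | 5.31 | link |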
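
The avg-deg column can be sanity-checked from V and E. The sketch below assumes the graphs are undirected, so that the average degree is 2E/V; the page does not state this formula, but it is consistent with the listed numbers up to rounding or truncation of the last digit. The function name `average_degree` is illustrative only.

```python
def average_degree(num_vertices: int, num_edges: int) -> float:
    """Average degree of an undirected graph (assumption: avg-deg = 2E / V)."""
    if num_vertices <= 0:
        raise ValueError("num_vertices must be positive")
    # Each undirected edge contributes to the degree of two vertices.
    return 2 * num_edges / num_vertices


# (V, E) pairs copied from the DAF table.
datasets = {
    "Yeast": (3_112, 12_519),
    "DBLP": (317_080, 1_049_866),
}

for name, (v, e) in datasets.items():
    print(f"{name}: {average_degree(v, e):.4f}")
```

The printed values may differ from the table in the last listed digit, depending on whether the table rounds or truncates.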

IDAR (VLDB 2021)

Kim, H., Min, S., Park, K., Lin, X., Hong, S. H., & Han, W. S. IDAR: Fast supergraph search using DAG integration. Proceedings of the VLDB Endowment, Volume 13, Issue 9. 1456–1468.

IDAR targets the supergraph search task, in which each dataset consists of multiple graphs. For each dataset we provide the number of connected components (N) and the maximum number of vertices (VMax).

| Database | Domain | N | VMax | Download |
| --- | --- | --- | --- | --- |
| AIDS | Molecule | 42,687 | 438 | link |
| NCI | Molecule | 265,242 | 342 | link |
| PubChem | Molecule | 499,963 | 801 | link |

SymBi (VLDB 2021)

Min, S., Park, S. G., Park, K., Giammarresi, D., Italiano, G. F., & Han, W. S. Symmetric Continuous Subgraph Matching with Bidirectional Dynamic Programming. Proceedings of the VLDB Endowment, Volume 14, Issue 8. 1298–1310.

SymBi targets the continuous subgraph matching task, in which the data is given as a sequence of insertion/deletion operations on a graph. We provide the number of operation triples. 90% of the operations were used to build the initial graph, and the remaining 10% served as the update operations.

| Database | Domain | Num-Triples | Download |
| --- | --- | --- | --- |
| LSBench | RDF | 23,317,563 | link (migrated), link (archived) |
| Netflow | Traffic Traces | 18,520,759 | link |
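
The 90%/10% split described for SymBi can be sketched as follows. This is a minimal illustration under assumptions: it takes the first 90% of the operations in stream order as the initial graph, whereas the page does not say how the 90% were selected. The function name `split_stream` and the toy triples are hypothetical and are not taken from the datasets.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_stream(ops: Sequence[T], initial_percent: int = 90) -> Tuple[List[T], List[T]]:
    """Split an operation stream into (initial graph part, update part).

    Assumption: the split is positional, i.e. the leading `initial_percent`
    percent of the stream builds the initial graph.
    """
    if not 0 <= initial_percent <= 100:
        raise ValueError("initial_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point surprises at the cut point.
    cut = len(ops) * initial_percent // 100
    return list(ops[:cut]), list(ops[cut:])


# Toy stream of (source, label, target) triples; purely illustrative.
stream = [(f"v{i}", "edge", f"v{i + 1}") for i in range(10)]
initial, updates = split_stream(stream)
print(len(initial), len(updates))
```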

VEQ (SIGMOD 2021, VLDBJ 2022)

Kim, H., Choi, Y., Park, K., Lin, X., Hong, S. H., & Han, W. S. Versatile Equivalences: Speeding up Subgraph Query Processing and Subgraph Matching. Proceedings of the 2021 International Conference on Management of Data, 925–937.

Subgraph search datasets contain multiple graphs, so we provide average statistics.

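| Database | Domain | N | Vavg | Eavg | Download |
| --- | --- | --- | --- | --- | --- |
| PDBS | DNA, RNA, Protein | 600 | 2,939 | 3,064 | link |
| PCM | Protein | 200 | 477 | 4,340 | link |
| PPI | PPI | 20 | 4,942 | 26,667 | link |
| IMDB | Collaboration | 1,500 | 13 | 66 | link |
| REDDIT | Social | 4,999 | 509 | 595 | link |
| COLLAB | Collaboration | 5,000 | 74 | 2,457 | link |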
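
Per-dataset averages of this kind can be computed along the following lines. This sketch assumes that N is the number of graphs in the dataset and that Vavg and Eavg are plain arithmetic means over those graphs; the in-memory representation (a vertex count plus an edge list per graph) and the name `dataset_statistics` are hypothetical and do not reflect the format of the downloadable files.

```python
from typing import List, Tuple

# Hypothetical representation: (number of vertices, list of undirected edges).
Graph = Tuple[int, List[Tuple[int, int]]]


def dataset_statistics(graphs: List[Graph]) -> Tuple[int, float, float]:
    """Return (N, Vavg, Eavg) for a collection of graphs."""
    n = len(graphs)
    if n == 0:
        raise ValueError("dataset must contain at least one graph")
    v_avg = sum(num_vertices for num_vertices, _ in graphs) / n
    e_avg = sum(len(edges) for _, edges in graphs) / n
    return n, v_avg, e_avg


# Two toy graphs: a path on 3 vertices and a path on 5 vertices.
toy_dataset = [
    (3, [(0, 1), (1, 2)]),
    (5, [(0, 1), (1, 2), (2, 3), (3, 4)]),
]
print(dataset_statistics(toy_dataset))
```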

For subgraph matching, please refer to the datasets used in DAF.

CRaB (ICDE 2021), DCQ (ICDE 2022)

Gu, G., Nam, Y., Park, K., Galil, Z., Italiano, G. F., & Han, W. S. Scalable graph isomorphism: Combining pairwise color refinement and backtracking via compressed candidate space. 2021 IEEE 37th International Conference on Data Engineering, 1368–1379.

Gu, G., Nam, Y., Park, K., Galil, Z., Italiano, G. F., & Han, W. S. Efficient Graph Isomorphism Query Processing using Degree Sequences and Color-Label Distributions. 2022 IEEE 38th International Conference on Data Engineering, 872–884.

When a graph has more than one connected component, we used the largest one.

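| Database | Domain | V | E | avg-deg | Download |
| --- | --- | --- | --- | --- | --- |
| Hamster | Social | 1,788 | 12,476 | 13.96 | link |
| GrQc | Collaboration | 4,158 | 13,422 | 6.46 | link |
| HepTh | Collaboration | 8,638 | 24,806 | 5.74 | link |
| Facebook | Social | 4,039 | 88,234 | 43.69 | link |
| CondMat | Collaboration | 21,363 | 91,286 | 8.55 | link |
| HepPh | Collaboration | 11,204 | 117,619 | 21.0 | link |
| AstroPh | Collaboration | 17,903 | 196,972 | 22.0 | link |
| Brightkite | Social | 56,739 | 212,945 | 7.51 | link |
| Plus | Router | 283,872 | 428,384 | 3.02 | link |
| Amazon | Co-Purchase | 334,863 | 925,872 | 5.53 | link |
| Gowalla | Social | 196,591 | 950,327 | 9.67 | link |
| Adaptec | Circuits | 870,532 | 1,874,507 | 4.31 | link |
| RoadNet(CA) | Road | 1,957,027 | 2,760,388 | 2.82 | link |
| Youtube | Social | 1,134,890 | 2,987,624 | 5.27 | link |
| Bigblue | Circuits | 3,795,055 | 8,712,138 | 4.59 | link |
| Skitter | Internet | 1,694,616 | 11,094,209 | 13.09 | link |
| LiveJournal | Social | 3,997,962 | 34,681,189 | 17.35 | link |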
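
Keeping only the largest connected component, as described for these datasets, could look roughly like the sketch below. It is not the authors' preprocessing code: it assumes an undirected graph given as an edge list, uses a plain breadth-first search, and ignores isolated vertices that appear in no edge. The name `largest_connected_component` is illustrative.

```python
from collections import defaultdict, deque
from typing import Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def largest_connected_component(edges: List[Edge]) -> Tuple[Set[Hashable], List[Edge]]:
    """Return (vertices, edges) of the largest connected component."""
    # Build an undirected adjacency structure.
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen: Set[Hashable] = set()
    best: Set[Hashable] = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects one whole component.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        if len(component) > len(best):
            best = component

    # An edge lies in the component exactly when one endpoint does.
    kept_edges = [(u, v) for u, v in edges if u in best]
    return best, kept_edges


vertices, kept = largest_connected_component([(1, 2), (2, 3), (4, 5)])
print(sorted(vertices), kept)
```

If several components tie for the largest size, this sketch keeps the one encountered first.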