Merge pull request #233 from bpedrood/master
Adding HySGen to examples, and the hypergraph class (HGraph) to snap-core graphs.
roks committed Feb 28, 2023
2 parents 0b73cda + 467ca80 commit 64bfc4d
Showing 26 changed files with 6,344 additions and 3 deletions.
2 changes: 2 additions & 0 deletions examples/.gitignore
@@ -2,3 +2,5 @@
!*/Makefile
*/.DS_Store
*/*.dSYM

Release
2 changes: 2 additions & 0 deletions examples/Makefile
@@ -22,6 +22,7 @@ MakeAll:
	$(MAKE) -C forestfire
	$(MAKE) -C graphgen
	$(MAKE) -C graphhash
	$(MAKE) -C hysgen
	$(MAKE) -C infopath
	$(MAKE) -C kcores
	$(MAKE) -C knnjaccardsim
@@ -63,6 +64,7 @@ clean:
	$(MAKE) clean -C forestfire
	$(MAKE) clean -C graphgen
	$(MAKE) clean -C graphhash
	$(MAKE) clean -C hysgen
	$(MAKE) clean -C infopath
	$(MAKE) clean -C kcores
	$(MAKE) clean -C knnjaccardsim
40 changes: 40 additions & 0 deletions examples/hysgen/.gitignore
@@ -0,0 +1,40 @@
# Prerequisites
*.d

# Compiled Object files
*.slo
*.lo
*.o
*.obj

# Precompiled Headers
*.gch
*.pch

# Compiled Dynamic libraries
*.so
*.dylib
*.dll

# Fortran module files
*.mod
*.smod

# Compiled Static libraries
*.lai
*.la
*.a
*.lib

# Executables
*.exe
*.out
*.app
hysgen_main
*.sh

# Other
*_cmtyvv_*
*_cmty_*
backups/
results/
8 changes: 8 additions & 0 deletions examples/hysgen/.idea/.gitignore

29 changes: 29 additions & 0 deletions examples/hysgen/LICENSE
@@ -0,0 +1,29 @@
BSD 3-Clause License

Copyright (c) 2022, Bahman Pedrood
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
11 changes: 11 additions & 0 deletions examples/hysgen/Makefile
@@ -0,0 +1,11 @@
#
# Makefile for this SNAP example
# - modify Makefile.ex when creating a new SNAP example
#
# implements:
# all (default), clean
#

include ../../Makefile.config
include Makefile.ex
include ../Makefile.exmain
9 changes: 9 additions & 0 deletions examples/hysgen/Makefile.ex
@@ -0,0 +1,9 @@
## Main application file
MAIN = hysgen_main
DEPH = $(EXSNAPADV)/hysgen.h
DEPCPP = $(EXSNAPADV)/hysgen.cpp
#CXXFLAGS += $(CXXOPENMP)
#CXXFLAGS += -g -rdynamic
#CXXFLAGS += -ggdb
#CXXFLAGS += -ggdb3 -rdynamic

51 changes: 51 additions & 0 deletions examples/hysgen/README.txt
@@ -0,0 +1,51 @@
========================================================================
Hypergraph Simultaneous Generators (HySGen)
========================================================================

This program fits a probabilistic generative model to undirected, unweighted
hypergraphs to detect overlapping communities (node clusters) in hypergraphs.
It takes a hyperedge list, the number of communities to be discovered,
and several optional arguments as input, to produce a list of communities.
The details of the model and the community inference algorithm are described
in the following paper:

B. Pedrood, C. Domeniconi, and K. Laskey. "Hypergraph Simultaneous Generators." AISTATS 2022.

This code works under Windows with Cygwin and GCC, and under Mac OS X, Linux,
and other Unix variants with GCC. To use it with Visual Studio, you have to
create a new project for this program. Make sure that a C++ compiler is
installed on the system. Makefiles are provided, so you can compile the code
from the command line with the following command:
make all

/////////////////////////////////////////////////////////////////////////////

Parameters:
-i: Input [hyper]edgelist file url.
-o: Output file url + name prefix for the discovered communities.
-c: The number of communities to detect.
-op: Output file performance plot (Default: empty for no plot).
-ci: Community initialization file url (Default: empty).
-l: Url for node names file (Default: empty).
-mc: Minimum size of the communities (Default: 3).
-rs: Random Seed (Default: 0).
-xi: Maximum number of iterations (Default: 1000).
-ic: Initial membership value for the seed communities (Default: 0.1).
-in: The default membership value of each node to all the communities (Default: 0.0).
-rp: Ratio of initial memberships to be randomly perturbed (Default: 0.0).
-rw: Weight for l-1 regularization on learning the model parameters (Default: 0.005).
-sz: Initial step size for backtracking line search (Default: 1.0).
-sa: Control parameter for backtracking line search (Default: 1.0).
-sr: Step-size reduction ratio for backtracking line search (Default: 0.5).
-th: Cut-off threshold for the final community membership values (Default: the larger of 0.01 and the l-1 regularization weight).
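
The three line-search parameters (-sz, -sa, -sr) can be read against a generic backtracking line search. The block below is a minimal sketch of the textbook sufficient-increase (Armijo) variant for a one-dimensional gradient ascent step. It is not taken from hysgen.cpp; the function name BacktrackStep, the MaxTries limit, and the mapping of Step0/Alpha/Beta to -sz/-sa/-sr are assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>

// Generic backtracking line search for MAXIMIZING a 1-D objective F.
// Assumed (hypothetical) mapping to the HySGen options:
//   Step0 ~ -sz (initial step size)
//   Alpha ~ -sa (control parameter of the sufficient-increase test)
//   Beta  ~ -sr (step-size reduction ratio)
// Returns the accepted step size, or 0.0 if no step was accepted.
double BacktrackStep(const std::function<double(double)>& F, double X,
                     double Grad, double Step0, double Alpha, double Beta,
                     int MaxTries = 50) {
  assert(Step0 > 0.0 && Beta > 0.0 && Beta < 1.0);
  const double F0 = F(X);
  double Step = Step0;
  for (int Try = 0; Try < MaxTries; Try++) {
    // Accept the step if F(X + t*g) >= F(X) + Alpha * t * g^2.
    if (F(X + Step * Grad) >= F0 + Alpha * Step * Grad * Grad) {
      return Step;
    }
    Step *= Beta;  // otherwise shrink the step and try again
  }
  return 0.0;
}
```

For example, with F(x) = -(x - 2)^2, X = 0, Grad = 4, Step0 = 1.0, Alpha = 0.1 and Beta = 0.5, the full step overshoots the maximum and is rejected, and the halved step 0.5 is accepted.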

/////////////////////////////////////////////////////////////////////////////

Usage:

Discover 2 communities from the synthetic hypergraph (under synthetic_data/):

./hysgen_main -i:./synthetic_data/synthetic.hyperedges -o:./synthetic_res -c:2 -th:0.1 -rs:1


** For real-world hypergraph data please visit https://github.com/bpedrood/HySGen
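
As a rough illustration of the hyperedge-list input, the sketch below reads such a file using only the C++ standard library. It assumes one hyperedge per line with whitespace-separated node IDs. The function ReadHyperedges is hypothetical and independent of SNAP; the program itself loads its input through THysgenUtil::LoadEdgeList, whose exact parsing rules are not shown here.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads a hyperedge list: one hyperedge per line, node IDs separated by
// whitespace (an assumed format). Blank lines are skipped. Node IDs are
// kept as strings, since they may be arbitrary names.
std::vector<std::vector<std::string>> ReadHyperedges(std::istream& In) {
  std::vector<std::vector<std::string>> Hyperedges;
  std::string Line;
  while (std::getline(In, Line)) {
    std::istringstream Fields(Line);
    std::vector<std::string> Nodes;
    std::string NodeId;
    while (Fields >> NodeId) { Nodes.push_back(NodeId); }
    if (!Nodes.empty()) { Hyperedges.push_back(Nodes); }
  }
  return Hyperedges;
}
```

Passing a std::ifstream opened on ./synthetic_data/synthetic.hyperedges would return one vector of node IDs per gathering, provided the file follows the assumed format.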
86 changes: 86 additions & 0 deletions examples/hysgen/hysgen_main.cpp
@@ -0,0 +1,86 @@
#include "hysgen.h"
#include "agm.h"

int main(int argc, char* argv[]) {
  Env = TEnv(argc, argv, TNotify::StdNotify);
  Env.PrepArgs(TStr::Fmt("HySGen. build: %s, %s. Time: %s", __TIME__, __DATE__, TExeTm::GetCurTm()));
  TExeTm ExeTm;

  Try

  const TStr InFNm = Env.GetIfArgPrefixStr("-i:", "./synthetic_data/synthetic.hyperedges", "Input [hyper]edgelist file url.");
  const TStr OutFPrx = Env.GetIfArgPrefixStr("-o:", "./synthetic_res", "Output file url + name prefix for the discovered communities.");
  int OptComs = Env.GetIfArgPrefixInt("-c:", 2, "The number of communities to detect.");
  const TStr OutPlt = Env.GetIfArgPrefixStr("-op:", "", "Output file performance plot (empty for no plot).");
  const TStr InitComFNm = Env.GetIfArgPrefixStr("-ci:", "", "Community initialization file url.");
  const TStr LabelFNm = Env.GetIfArgPrefixStr("-l:", "", "Input file name for node names (Node ID, Node label).");
  const int MinComSize = Env.GetIfArgPrefixInt("-mc:", 3, "Minimum size of the communities.");
  const int RndSeed = Env.GetIfArgPrefixInt("-rs:", 0, "Random Seed.");
  int MaxIter = Env.GetIfArgPrefixInt("-xi:", 1000, "Maximum number of iterations.");
  const double InitComS = Env.GetIfArgPrefixFlt("-ic:", 0.1, "Initial membership value for the initially assigned communities.");
  const double InitNulS = Env.GetIfArgPrefixFlt("-in:", 0.0, "The default membership value of each node to all the communities.");
  double PerturbDensity = Env.GetIfArgPrefixFlt("-rp:", 0.0, "Ratio of initial memberships to be randomly perturbed.");
  const double RegCoef = Env.GetIfArgPrefixFlt("-rw:", 0.005, "Weight for l-1 regularization on learning the model parameters.");
  const double StepSize = Env.GetIfArgPrefixFlt("-sz:", 1.0, "Initial step size for backtracking line search.");
  const double StepCtrlParam = Env.GetIfArgPrefixFlt("-sa:", 1.0, "Control parameter for backtracking line search.");
  const double StepReductionRatio = Env.GetIfArgPrefixFlt("-sr:", 0.5, "Step-size reduction ratio for backtracking line search.");
  const double Threshold = Env.GetIfArgPrefixFlt("-th:", MAX(0.01, RegCoef), "Cut-off threshold for the final community membership values.");


  PHGraph G;
  TIntStrH NIDNameH, NIDEdgelistnameH;
  TStrIntH NameNIdH, EdgelistnameNIdH;
  TStrHash<TInt> NodeNameH;
  TVec<TFltV> WckVV;
  TVec<TIntFltH> EstCmtyVH;
  TVec<TIntV> EstCmtyVV;
  if (InFNm.IsSuffix(".hgraph")) {
    TFIn GFIn(InFNm);
    G = THGraph::Load(GFIn);
  } else {
    G = THysgenUtil::LoadEdgeList(InFNm, NodeNameH);
    NIDNameH.Gen(NodeNameH.Len()); NIDEdgelistnameH.Gen(NodeNameH.Len());
    NameNIdH.Gen(NodeNameH.Len()); EdgelistnameNIdH.Gen(NodeNameH.Len());
    for (int s = 0; s < NodeNameH.Len(); s++) {
      NIDNameH.AddDat(s, NodeNameH.GetKey(s));
      NIDEdgelistnameH.AddDat(s, NodeNameH.GetKey(s));
      NameNIdH.AddDat(NodeNameH.GetKey(s), s);
      EdgelistnameNIdH.AddDat(NodeNameH.GetKey(s), s);
    }
  }
  if (LabelFNm.Len() > 0) {
    TSsParser Ss(LabelFNm, ssfTabSep);
    while (Ss.Next()) {
      if (Ss.Len() > 1) { NIDNameH.AddDat(NameNIdH.GetDat(Ss[0]), Ss.GetFld(1)); }
    }
  }
  printf("HyperGraph: %d Nodes %d Edges\n", G->GetNodes(), G->GetEdges());

  TIntV NIDV;
  G->GetNIdV(NIDV);

  TExeTm RunTm;
  THysgen Optimizer(G, RndSeed, InitComS, InitNulS);
  Optimizer.ComInit(OptComs, MinComSize, PerturbDensity);
  if (InitComFNm.Len() > 0) {
    Optimizer.LoadComInit(InitComFNm);
  }
  Optimizer.SetRegCoef(RegCoef);

  Optimizer.GetCmtyVV(EstCmtyVH, EstCmtyVV, WckVV, InitNulS, MinComSize);
  THysgenUtil::DumpCmtyVH(OutFPrx + "_cmtyvv_init.txt", EstCmtyVH, NIDNameH, THysgenUtil::Alphabetical);

  Optimizer.MLEGradAscent(1.0, MaxIter * G->GetNodes(), OutPlt, StepSize, StepCtrlParam, StepReductionRatio);
  Optimizer.GetCmtyVV(EstCmtyVH, EstCmtyVV, WckVV, Threshold, MinComSize);

  THysgenUtil::DumpCmtyVH(OutFPrx + "_cmty_SrtById_IdValues.txt", EstCmtyVH, NIDEdgelistnameH, THysgenUtil::Alphabetical);
  THysgenUtil::DumpCmtyVH(OutFPrx + "_cmty_SrtByVals_IdValues.txt", EstCmtyVH, NIDEdgelistnameH, THysgenUtil::Value);
  THysgenUtil::DumpCmtyVH(OutFPrx + "_cmty_SrtByVals_NameValues.txt", EstCmtyVH, NIDNameH, THysgenUtil::Value);
  THysgenUtil::DumpCmtyVV(OutFPrx + "_cmty_SrtByName_Names.txt", EstCmtyVV, NIDNameH);

  Catch

  printf("\nrun time: %s (%s)\n", ExeTm.GetTmStr(), TSecTm::GetCurTm().GetTmStr().CStr());

  return 0;
}
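
The final call to GetCmtyVV receives a cut-off threshold and a minimum community size. The sketch below shows one plausible reading of how those two values could turn per-node membership strengths into discrete communities. It is an assumption-laden illustration, not the implementation in hysgen.cpp: the function ExtractCommunities, the dense Memberships[node][community] layout, and the use of >= for the cut-off are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical post-processing step: Memberships[n][c] is the non-negative
// membership strength of node n in community c. A node joins community c
// when its strength reaches Threshold, and communities with fewer than
// MinComSize members are discarded.
std::vector<std::vector<int>> ExtractCommunities(
    const std::vector<std::vector<double>>& Memberships,
    double Threshold, int MinComSize) {
  assert(MinComSize >= 0);
  const std::size_t NumComs = Memberships.empty() ? 0 : Memberships[0].size();
  std::vector<std::vector<int>> Coms;
  for (std::size_t c = 0; c < NumComs; c++) {
    std::vector<int> Members;
    for (std::size_t n = 0; n < Memberships.size(); n++) {
      if (Memberships[n][c] >= Threshold) {
        Members.push_back(static_cast<int>(n));
      }
    }
    if (static_cast<int>(Members.size()) >= MinComSize) {
      Coms.push_back(Members);
    }
  }
  return Coms;
}
```

Because a node is tested against every community independently, it can appear in several communities, which is what makes the result overlapping.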
8 changes: 8 additions & 0 deletions examples/hysgen/stdafx.cpp
@@ -0,0 +1,8 @@
// stdafx.cpp : source file that includes just the standard includes
// hysgen.pch will be the pre-compiled header
// stdafx.obj will contain the pre-compiled type information

#include "stdafx.h"

// TODO: reference any additional headers you need in STDAFX.H
// and not in this file
5 changes: 5 additions & 0 deletions examples/hysgen/stdafx.h
@@ -0,0 +1,5 @@
#pragma once

#include "targetver.h"

#include "Snap.h"
2 changes: 2 additions & 0 deletions examples/hysgen/synthetic_data/ground_truth_comms.txt
@@ -0,0 +1,2 @@
1 2 3 4 5 11 12 13 14 15 21 22 23 24 25
7 8 9 10 17 18 19 20 27 28 29 30
11 changes: 11 additions & 0 deletions examples/hysgen/synthetic_data/synthetic.description
@@ -0,0 +1,11 @@
Supposed Scenario:

It's the first day of an academic year. Student gatherings have been recorded since some point during the spring semester of last year. Assume there exist two communities, CS students and history students, which are the only social communities to which the students belong. The members of these two communities are specified in the file "ground_truth_comms.txt".
The hypergraph for this example is a network of the gatherings recorded at the university over the period mentioned above. Each gathering corresponds to a hyperedge that connects the attending students. The hypergraph is saved in "synthetic.hyperedges", where each line lists the IDs of the nodes in one hyperedge.
A regular-graph equivalent of the hypergraph is stored in "synthetic.edges" as a list of edges. This graph is created by replacing each hyperedge of size k with a k-clique.
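
The hyperedge-to-clique mapping described above (often called clique expansion) can be sketched as follows. This is an illustrative standalone function, not the script that produced "synthetic.edges"; the name CliqueExpand and the choice to deduplicate edges and ignore repeated nodes are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Clique expansion: every hyperedge of size k contributes the k*(k-1)/2
// undirected edges of a k-clique. Edges are stored as (smaller, larger)
// pairs in a set, so an edge produced by several hyperedges appears once.
std::set<std::pair<int, int>> CliqueExpand(
    const std::vector<std::vector<int>>& Hyperedges) {
  std::set<std::pair<int, int>> Edges;
  for (const std::vector<int>& He : Hyperedges) {
    for (std::size_t i = 0; i < He.size(); i++) {
      for (std::size_t j = i + 1; j < He.size(); j++) {
        if (He[i] == He[j]) { continue; }  // skip self-loops
        Edges.insert(std::make_pair(std::min(He[i], He[j]),
                                    std::max(He[i], He[j])));
      }
    }
  }
  return Edges;
}
```

Note that this expansion loses information: two hyperedges {1, 2} and {1, 2, 3} yield the same edge set as {1, 2, 3} alone, which is one reason to model the gatherings as a hypergraph directly.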


#################################
Community detection complication:

Two large hyperedges in the hypergraph, which correspond to two outdoor welcome parties for the students, complicate the problem of discovering the communities. Nodes 36 through 71 in these hyperedges represent passersby who are not students and only joined the parties to enjoy the music, games, and free food. Nodes 76 to 87 are new students, half (6) CS and half (6) history. The new students are not expected to be identified correctly, because the only gatherings they have attended so far are an orientation, held with 3 senior students of each major to tell them about the department, and the welcome party. At the party, they are divided into party groups (4-8) that are independent of their major.