This is just a quick comparison of the edge swapping algorithms used in this package.
The first step is to create a bipartite graph for the test.
v1 <- sample(seq(2, 1e3, 2), 1e4, replace = TRUE)
v2 <- sample(seq(1, 1e3+1, 2), 1e4, replace = TRUE)
bgr <- tibble(node1 = as.character(v1),
node2 = as.character(v2)) %>%
unique() %>%
as_tbl_graph(directed = FALSE) %N>%
mutate(type = name %in% unique(as.character(v1)))
bgr
#> # A tbl_graph: 1001 nodes and 9799 edges
#> #
#> # A bipartite simple graph with 1 component
#> #
#> # Node Data: 1,001 x 2 (active)
#> name type
#> <chr> <lgl>
#> 1 898 TRUE
#> 2 266 TRUE
#> 3 374 TRUE
#> 4 574 TRUE
#> 5 910 TRUE
#> 6 202 TRUE
#> # … with 995 more rows
#> #
#> # Edge Data: 9,799 x 2
#> from to
#> <int> <int>
#> 1 1 501
#> 2 2 502
#> 3 3 503
#> # … with 9,796 more rows
This graph has 1001 nodes and 9799 edges.
The first bipartite edge swapping method is bipartite_edge_swap()
which relies on the ‘igraph’ and ‘tidygraph’ APIs for handling all of the data and logically manipulates a tidygraph graph object. This was the easiest implementation, though I expected that the manipulation of high-level data types would be a time-suck. This first test runs the edge swapping algorithm to swap 100 edges 20 times.
m1 <- bench::mark(bipartite_edge_swap(bgr, N = 100),
iterations = 20, filter_gc = FALSE) %>%
dplyr::select(-expression)
m1
#> # A tibble: 1 x 5
#> min median `itr/sec` mem_alloc `gc/sec`
#> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 6.82s 7.38s 0.136 1.96GB 9.44
The second edge swapping algorithm implementation is in bipartite_edge_swap2()
. This method took more work on my end, though its use of vector manipulations rapidly increased the speed of the algorithm. (As it is much faster, I now run 50 iterations for the benchmark.)
m2 <- bench::mark(bipartite_edge_swap2(bgr, N = 100),
iterations = 50, filter_gc = FALSE) %>%
dplyr::select(-expression)
m2
#> # A tibble: 1 x 5
#> min median `itr/sec` mem_alloc `gc/sec`
#> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 506ms 625ms 1.54 227MB 11.7
This first test ran 100 edge swaps on the graph bgr
. The first method took 7.38 s on average (a median of 20 iterations) while the second took 625 ms on average. Thus, the second method is much faster. Another striking difference is the amount of memory each method was allocated - the first method took about 2 GB of RAM! It also called the garbage collector an order of magnitude more frequently.
The function bipartite_edge_swap3()
implements the same algorithm used in method 2 in C++.
m2_C <- bench::mark(bipartite_edge_swap3(bgr, N = 100),
iterations = 50, filter_gc = FALSE) %>%
dplyr::select(-expression)
m2_C
#> # A tibble: 1 x 5
#> min median `itr/sec` mem_alloc `gc/sec`
#> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 403ms 457ms 2.18 182MB 4.18
This is the fastest method, requiring 457 ms on average with even less RAM required.
To specifically test everything expect for the edge swapping, I re-ran the same test but only conducting 1 edge swap in each iteration.
# first method
bench::mark(bipartite_edge_swap(bgr, N = 1),
iterations = 50, filter_gc = FALSE) %>%
dplyr::select(-expression)
#> # A tibble: 1 x 5
#> min median `itr/sec` mem_alloc `gc/sec`
#> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 52.9ms 80.9ms 10.9 21.9MB 2.85
# second method
bench::mark(bipartite_edge_swap2(bgr, N = 1),
iterations = 50, filter_gc = FALSE) %>%
dplyr::select(-expression)
#> # A tibble: 1 x 5
#> min median `itr/sec` mem_alloc `gc/sec`
#> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 404ms 549ms 1.77 160MB 3.46
# second method in C++
bench::mark(bipartite_edge_swap3(bgr, N = 1),
iterations = 50, filter_gc = FALSE) %>%
dplyr::select(-expression)
#> # A tibble: 1 x 5
#> min median `itr/sec` mem_alloc `gc/sec`
#> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 411ms 569ms 1.76 160MB 3.59
Interestingly, the first method (based around the ‘igraph’ and ‘tidygraph’ APIs) is substantially faster to do all of the other steps in the edge swapping process. This is likely because the second method must first transform the bipartite graph bgr
into the list of vectors while the first method works directly on the tidygraph object (it only does this once per call to bipartite_edge_swap2
). Still, it may be worth poking around on the setup steps in the second method to improve its efficiency.
To further test this hypothesis, the difference should shrink if I use a smaller bipartite graph.
v1 <- sample(seq(2, 50, 2), 100, replace = TRUE)
v2 <- sample(seq(1, 51, 2), 100, replace = TRUE)
bgr <- tibble(node1 = as.character(v1),
node2 = as.character(v2)) %>%
unique() %>%
as_tbl_graph(directed = FALSE) %N>%
mutate(type = name %in% unique(as.character(v1)))
bgr
#> # A tbl_graph: 51 nodes and 95 edges
#> #
#> # A bipartite simple graph with 1 component
#> #
#> # Node Data: 51 x 2 (active)
#> name type
#> <chr> <lgl>
#> 1 10 TRUE
#> 2 30 TRUE
#> 3 40 TRUE
#> 4 38 TRUE
#> 5 32 TRUE
#> 6 2 TRUE
#> # … with 45 more rows
#> #
#> # Edge Data: 95 x 2
#> from to
#> <int> <int>
#> 1 1 26
#> 2 2 27
#> 3 3 28
#> # … with 92 more rows
This smaller graph has 51 nodes and 95 edges. I now re-run the test with only 1 edge swap.
# first method
bench::mark(bipartite_edge_swap(bgr, N = 1),
iterations = 50, filter_gc = FALSE) %>%
dplyr::select(-expression)
#> # A tibble: 1 x 5
#> min median `itr/sec` mem_alloc `gc/sec`
#> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 20.8ms 29.1ms 22.5 324KB 5.39
# second method
bench::mark(bipartite_edge_swap2(bgr, N = 1),
iterations = 50, filter_gc = FALSE) %>%
dplyr::select(-expression)
#> # A tibble: 1 x 5
#> min median `itr/sec` mem_alloc `gc/sec`
#> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 4.48ms 6.22ms 105. 202KB 4.20
# second method in C++
bench::mark(bipartite_edge_swap3(bgr, N = 1),
iterations = 50, filter_gc = FALSE) %>%
dplyr::select(-expression)
#> # A tibble: 1 x 5
#> min median `itr/sec` mem_alloc `gc/sec`
#> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 4.67ms 5.19ms 167. 202KB 9.99
Now, the second method is actually faster because it was easier to change to the vector manipulation than lug around the relatively “heavy” tidygraph object (see the mem_alloc
columns). You can see that the garbage collection intervened a few times (in the gc/sec
and n_gc
column).
For comparison, here is the same test using the heavily optimized edge swapping algorithm in ‘igraph’. While fast, this test does not preserve the bipartite nature of the graph, thus I cannot use it for ‘WExT’.