Installation and usage¶
Installation¶
1. Clone the repository:
git clone https://github.com/crawling-framework/crawling-framework.github.io.git
cd crawling-framework.github.io
2. Compile the C++ code:
make
3. Install the Python dependencies (a quick check of the installed packages is sketched after this list):
pip install -r requirements.txt
(By default, the DGL library is configured for CPU. To use it with a GPU, see https://www.dgl.ai/pages/start.html.)
4. Download and unpack the archive with graph data (see the unpacking sketch after this list):
Manually download the archive and unzip it.
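As a quick sanity check that the dependencies from step 3 installed correctly, you can try importing DGL; this one-liner is a suggestion, not part of the project's scripts:
python -c "import dgl; print(dgl.__version__)"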
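For step 4, assuming the downloaded archive is named data.zip (a hypothetical name; use the actual file you obtained), unpacking it in the project root might look like:
unzip data.zip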
Usage¶
The first step is to add the src directory to the Python path:
export PYTHONPATH=src
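The command above is for Unix-like shells; on Windows (PowerShell), the equivalent is:
$env:PYTHONPATH = "src"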
Run a crawler from the command line¶
Run from the project folder:
python src/experiments/cmd.py -g <GRAPH> -c <CRAWLER> -n <RUNS>
On the very first run, it takes some time to compile the Cython code.
To see the available options, type:
python src/experiments/cmd.py -h
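For example, launching one crawler on one graph for 10 runs might look like the following; the graph and crawler names here are hypothetical placeholders, so use -h to list the values your checkout actually supports:
python src/experiments/cmd.py -g example-graph -c RandomCrawler -n 10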
Reproduce experiments from the WSDM23 paper¶
To obtain all the results from Table 4, run all the configurations:
python src/experiments/paper_experiments.py
but this can take a very long time (up to several weeks). Edit the file paper_experiments.py to select only the configurations you need.
Once a crawler has finished its job, its results are saved to a corresponding file in the results/ folder. To collect statistics over all computed results, run:
python src/experiments/paper_plots.py
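Putting it together, a full reproduction (using only the commands above) is the following sequence; expect very long runtimes for the complete set of configurations:
export PYTHONPATH=src
python src/experiments/paper_experiments.py
python src/experiments/paper_plots.py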