Install Lookyloo
Requires Python 3.7 or higher, expects Ubuntu 20.04 or more recent. |
Install Lookyloo Dependencies
Install Poetry
Poetry is tool to handle dependency installation as well as building and packaging of Python packages.
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python3
More details in the the installation guide.
Install Splash
Splash is a javascript rendering service with an HTTP API. It’s a lightweight browser with an HTTP API, implemented in Python 3 using Twisted and QT5.
You will need a running instance, and the preferred way is via Docker.
sudo apt install docker.io
sudo docker pull scrapinghub/splash
Install Redis
Redis: An open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker.
Redis should be installed from the source, and the repository must be in the same directory as the one you will be cloning Lookyloo in to. |
In order to compile and test redis, you will need a few packages:
sudo apt install build-essential tcl
git clone https://github.com/antirez/redis.git
cd redis
git checkout 6.0
make
# Optionally, you can run the tests:
make test
cd ..
Install Lookyloo
Prerequisites
-
Poetry is installed.
-
Redis is installed from the source. The repository must be in the same directory as the one you will be cloning Lookyloo in to.
-
Splash is installed.
Procedure
-
Clone the Lookyloo repository.
git clone https://github.com/Lookyloo/lookyloo.git
-
Change directory
cd lookyloo
-
Run the commands:
poetry install echo LOOKYLOO_HOME="'`pwd`'" > .env
If you have error messages about python 2.7, it is an issue with poetry,
and the quick and dirty fix is to edit ~/.poetry/bin/poetry and add a 3 at the end of the line:
|
#!/usr/bin/env python3
instead of
#!/usr/bin/env python
Configuration
-
Initialize the user configuration
cp config/generic.json.sample config/generic.json cp config/modules.json.sample config/modules.json
-
Edit the config files accordingly: see the
_notes
key in the JSON file. -
Make sure the configuration files are valid, and pull the 3rd party dependencies for the website
poetry run update --yes
Run Splash
-
From another terminal, run the following command:
sudo docker run -p 8050:8050 -p 5023:5023 scrapinghub/splash --disable-browser-caches
# If you have a lot of ram at hand, you may want to run it this way:
# sudo docker run -p 8050:8050 -p 5023:5023 scrapinghub/splash -s 100 -m 50000 --disable-browser-caches
Open the website
Unless you changed the default in config/generic.json
, the web interface will be reachable at http://0.0.0.0:5100/