Developers Documentation
This page will give you the basics when you want to work on Lookyloo’s source code.
We assume here that you already installed the platform and gave it a try.
General guidelines
-
After editing a file, you can run the
stop
andstart
scripts to restart the platform and see the changes. It will restart everything, including the valkey databases, and can take a bot of time. See below if you’re working on a specific part of the tool and want to stop/start a specific module. -
You can disable individual services by commenting them out in
bin/start.py
-
All the services can be started/killed manually (
CTRL+c
to kill)
Scripts
-
start
: Starts all the services -
stop
: Triggersshutdown
, stop valkey databases. -
shutdown
: Requests the services to stop by putting ashutdown
key in valkey, wait for all of them to stop. -
run_backend
: Starts/stops valkey databases -
async_capture
: Triggers a capture from the queue. Multiple instances of this process can be run in parallel -
background_indexer
: Caches the captures in valkey, creates the tree pickles, adds the urls, cookies, and body hashes in indexes for fast lookup -
archiver
: Archives the old captures (default: 180+ days), maintain the indexes files in the capture directories -
processing
: Triggers actions happening once a day. Currently: creates the user-agent file of the users of the platform (if enabled) -
start_website
: Starts the website (launches gunicorn processes)
A few scripts are for maintenance only:
* rebuild_caches
: clears the cache database, they will need to be rebuild for lookyloo to work
* update
: Used in production to update lookyloo from git.
Website
All the files related exclusively to the web interface and the API are in the website
directory.
If you’re working on the web interface, you probably want to comment out start_website
in bin/start.py
,
and start/kill it manually.
If you change anything in website/web/static/ you must run tools/generate_sri.py
before restarting the website in order to update the SRI hashes
of each resources. If you don’t do that, your browser will refuse to load them and you will be frustrated.
Or set ignore_sri to true in config/generic.json in order to ignore the SRI hashes.
|
Modules
-
lookyloo.py
: Main class, does all the heavy lifting to trigger the capture and access the results. -
modules.py
: Connectors to 3rd party modules. -
indexing.py
: Indexer for URLs, hashes, and cookies (creation and access). -
context.py
: Manage contextualization of captures and specific entries. -
abstractmanager.py
: Main class for all the services. -
capturecache.py
: Gives access to a cached capture in a pythonic way. -
helpers.py
: A bunch of methods useful all over the project. -
exceptions.py
: The exceptions that can be emitted by different part of the project.