Developers Documentation

This page will give you the basics when you want to work on Lookyloo’s source code.

We assume here that you already installed the platform and gave it a try.

General guidelines

After editing a file, you can run the stop and start scripts to restart the platform and see the changes. It will restart everything, including the valkey databases, and can take a bot of time. See below if you’re working on a specific part of the tool and want to stop/start a specific module.
You can disable individual services by commenting them out in bin/start.py
All the services can be started/killed manually (CTRL+c to kill)

Scripts

start: Starts all the services
stop: Triggers shutdown, stop valkey databases.
shutdown: Requests the services to stop by putting a shutdown key in valkey, wait for all of them to stop.
run_backend: Starts/stops valkey databases
async_capture: Triggers a capture from the queue. Multiple instances of this process can be run in parallel
background_indexer: Caches the captures in valkey, creates the tree pickles, adds the urls, cookies, and body hashes in indexes for fast lookup
archiver: Archives the old captures (default: 180+ days), maintain the indexes files in the capture directories
processing: Triggers actions happening once a day. Currently: creates the user-agent file of the users of the platform (if enabled)
start_website: Starts the website (launches gunicorn processes)

A few scripts are for maintenance only: * rebuild_caches: clears the cache database, they will need to be rebuild for lookyloo to work * update: Used in production to update lookyloo from git.

Website

All the files related exclusively to the web interface and the API are in the website directory.

If you’re working on the web interface, you probably want to comment out start_website in bin/start.py, and start/kill it manually.

If you change anything in website/web/static/ you must run tools/generate_sri.py before restarting the website in order to update the SRI hashes of each resources. If you don’t do that, your browser will refuse to load them and you will be frustrated. Or set ignore_sri to true in config/generic.json in order to ignore the SRI hashes.

Modules

lookyloo.py: Main class, does all the heavy lifting to trigger the capture and access the results.
modules.py: Connectors to 3rd party modules.
indexing.py: Indexer for URLs, hashes, and cookies (creation and access).
context.py: Manage contextualization of captures and specific entries.
abstractmanager.py: Main class for all the services.
capturecache.py: Gives access to a cached capture in a pythonic way.
helpers.py: A bunch of methods useful all over the project.
exceptions.py: The exceptions that can be emitted by different part of the project.