Developers Documentation

This page will give you the basics when you want to work on Lookyloo’s source code.

We assume here that you have already installed the platform and given it a try.

General guidelines

  • After editing a file, you can run the stop and start scripts to restart the platform and see the changes. This restarts everything, including the redis databases, and can take a bit of time. See below if you’re working on a specific part of the tool and want to stop/start a specific module.

  • You can disable individual services by commenting them out in bin/start.py

  • All the services can be started and killed manually (Ctrl+C to kill).

Scripts

  • start: Starts all the services

  • stop: Triggers shutdown, then stops the redis databases.

  • shutdown: Requests the services to stop by putting a shutdown key in redis, then waits for all of them to stop.

  • run_backend: Starts/stops redis databases

  • async_capture: Triggers a capture from the queue. Multiple instances of this process can be run in parallel (when multiple instances of Splash are available, see the production install guide).

  • background_indexer: Caches the captures in redis, creates the tree pickles, and adds the URLs, cookies, and body hashes to the indexes for fast lookup.

  • archiver: Archives the old captures (default: 180+ days) and maintains the index files in the capture directories.

  • processing: Triggers actions happening once a day. Currently: creates the user-agent file of the users of the platform (if enabled)

  • start_website: Starts the website (launches gunicorn processes)
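The shutdown mechanism described for the shutdown script can be sketched as follows. This is a minimal illustration of the pattern only, not Lookyloo's actual code: FakeStore is a hypothetical in-memory stand-in for redis so the example runs without a server, and the key name, function names, and polling interval are all assumptions.

```python
import threading
import time

SHUTDOWN_KEY = "shutdown"  # assumed key name, for illustration only


class FakeStore:
    """Thread-safe in-memory stand-in for the redis database (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._keys: set[str] = set()
        self._running: set[str] = set()

    def set_key(self, key: str) -> None:
        with self._lock:
            self._keys.add(key)

    def has_key(self, key: str) -> bool:
        with self._lock:
            return key in self._keys

    def mark_running(self, name: str) -> None:
        with self._lock:
            self._running.add(name)

    def mark_stopped(self, name: str) -> None:
        with self._lock:
            self._running.discard(name)

    def running(self) -> set[str]:
        with self._lock:
            return set(self._running)


def _service_loop(store: FakeStore, name: str) -> None:
    """Work until the shutdown key appears, then unregister."""
    try:
        while not store.has_key(SHUTDOWN_KEY):
            time.sleep(0.01)  # stand-in for one unit of real work
    finally:
        store.mark_stopped(name)


def start_service(store: FakeStore, name: str) -> threading.Thread:
    """Register the service as running, then start it in a thread."""
    store.mark_running(name)
    thread = threading.Thread(target=_service_loop, args=(store, name), daemon=True)
    thread.start()
    return thread


def request_shutdown(store: FakeStore, timeout: float = 5.0) -> bool:
    """Set the shutdown key and wait until every service has stopped.

    Returns False if some services are still running after `timeout` seconds.
    """
    store.set_key(SHUTDOWN_KEY)
    deadline = time.monotonic() + timeout
    while store.running():
        if time.monotonic() >= deadline:
            return False
        time.sleep(0.01)
    return True
```

In this sketch each service polls the store between units of work, so a shutdown request takes effect only once the current unit is finished. That is one reason a full stop and start can take a while.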

A few scripts are for maintenance only:

  • rebuild_caches: Clears the cache database; the caches will need to be rebuilt for Lookyloo to work.

  • update: Used in production to update Lookyloo from git.

Website

All the files related exclusively to the web interface and the API are in the website directory.

If you’re working on the web interface, you probably want to comment out start_website in bin/start.py, and start/kill it manually.

If you change anything in website/web/static/ you must run tools/generate_sri.py before restarting the website in order to update the SRI hashes of each resource. If you don’t do that, your browser will refuse to load them and you will be frustrated.
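To show what an SRI hash is, here is a minimal sketch of computing one for every file in a directory. It is not the content of tools/generate_sri.py: the function names are hypothetical, and the choice of sha512 is an assumption (the Subresource Integrity format also allows sha256 and sha384).

```python
import base64
import hashlib
from pathlib import Path


def sri_hash(path: Path) -> str:
    """Return an SRI integrity value of the form 'sha512-<base64 digest>'."""
    digest = hashlib.sha512(path.read_bytes()).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def sri_for_directory(static_dir: Path) -> dict[str, str]:
    """Map each file under static_dir (relative POSIX path) to its SRI value."""
    return {
        p.relative_to(static_dir).as_posix(): sri_hash(p)
        for p in sorted(static_dir.rglob("*"))
        if p.is_file()
    }
```

The browser recomputes the digest of each downloaded resource and compares it with the integrity value in the page. Any edit to a static file changes its digest, which is why a stale hash makes the browser reject the file.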

Modules

  • lookyloo.py: Main class, does all the heavy lifting to trigger the capture and access the results.

  • modules.py: Connectors to third-party modules.

  • indexing.py: Indexer for URLs, hashes, and cookies (creation and access).

  • context.py: Manages the contextualization of captures and specific entries.

  • abstractmanager.py: Main class for all the services.

  • capturecache.py: Gives access to a cached capture in a pythonic way.

  • helpers.py: A bunch of methods useful all over the project.

  • exceptions.py: The exceptions that can be emitted by different parts of the project.