This project provides an API and a user interface to convert any website into a Zim file.
All API endpoints speak JSON over HTTP. Parameters must therefore be sent as a serialized JSON body, and the Content-Type header must be set to "application/json".
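To make the JSON-over-HTTP convention concrete, here is a minimal sketch using only the Python standard library. The helper name `build_json_request` is made up for illustration, and the base URL is the one used in the examples of this document; nothing is sent over the network.

```python
import json
from urllib.request import Request

# Base URL taken from the examples in this document; adjust to your deployment.
BASE_URL = "http://0.0.0.0:6543"


def build_json_request(path, params):
    """Return a POST request whose body is `params` serialized as JSON.

    Hypothetical helper, not part of the project itself.
    """
    body = json.dumps(params).encode("utf-8")
    return Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_json_request(
    "/website-url",
    {"url": "https://refugeeinfo.eu/", "title": "Refugee Info"},
)
# To actually send it, pass the object to urllib.request.urlopen(request).
```

Building the request separately from sending it keeps the serialization and header handling easy to inspect before anything hits the server.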
By posting to this endpoint, you ask the system to start downloading a website and converting it into the Zim format.
- url: URL of the website to be crawled
- title: Title that will be used in the created Zim file
- email: Email address that will get notified when the creation of the file is over
- language: An ISO 639-3 code representing the language
- welcome: The page that will be shown first in the Zim file
- description: The description that will be embedded in the Zim file
- author: The author of the content
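The parameters above can be assembled on the client side before posting. The sketch below is only an illustration: the helper `build_payload` is hypothetical, and treating url, title and email as the required parameters is an assumption based on the example request in this document, not something the API description states. The language check only verifies the three-lowercase-letter shape of an ISO 639-3 code, not that the code is assigned to a language.

```python
import re

# ISO 639-3 codes are three lowercase letters; this checks the shape only.
_ISO_639_3_SHAPE = re.compile(r"[a-z]{3}")

# Parameters assumed to be optional (an assumption, see the text above).
_OPTIONAL = {"language", "welcome", "description", "author"}


def build_payload(url, title, email, **optional):
    """Assemble the parameter dict for the website-url endpoint.

    Hypothetical helper, not part of the project itself.
    """
    unknown = set(optional) - _OPTIONAL
    if unknown:
        raise ValueError("unknown parameters: " + ", ".join(sorted(unknown)))

    language = optional.get("language")
    if language is not None and not _ISO_639_3_SHAPE.fullmatch(language):
        raise ValueError("language must be a three-letter ISO 639-3 code")

    payload = {"url": url, "title": title, "email": email}
    # Leave out optional parameters that were not provided.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

For instance, `build_payload("https://refugeeinfo.eu/", "Refugee Info", "someone@example.org", language="eng")` returns a dict ready to be serialized as JSON, while passing `language="en"` raises a `ValueError` (the email address here is a placeholder).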
- job_id: The job id, returned in JSON format. It can be used to query the status of the process.
- 400 Bad Request will be returned if the request does not match the expected inputs. In that case, look at the body of the response: it contains information about what is missing.
- 201 Created will be returned if the process started.
$ http POST http://0.0.0.0:6543/website-url url="https://refugeeinfo.eu/" title="Refugee Info" email="[email protected]"
HTTP/1.1 201 Created

{
    "job": "5012abe3-bee2-4dd7-be87-39a88d76035d"
}
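A client usually wants to turn the two documented status codes into either a job id or an error. The following is a rough sketch of that logic, with hypothetical names (`SubmissionError`, `read_submission_response`). It reads the id from the "job" key, as in the example response above; if your server names the key differently, adjust accordingly.

```python
import json


class SubmissionError(Exception):
    """Raised when the server did not accept the job (hypothetical name)."""


def read_submission_response(status_code, body):
    """Return the job id for a 201 response, raise SubmissionError otherwise.

    `body` is the raw response body as text.
    """
    if status_code == 201:
        # Key name taken from the example response in this document.
        return json.loads(body)["job"]
    if status_code == 400:
        # The body describes what is missing, so pass it on unchanged.
        raise SubmissionError(body)
    raise SubmissionError("unexpected status %d: %s" % (status_code, body))
```

Keeping this separate from the HTTP call means it works the same way whichever HTTP library you use to send the request.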
Retrieves the status of a job and displays the associated logs.
- status: The status of the job, one of 'queued', 'finished', 'failed', 'started' and 'deferred'.
- log: The logs of the job.
- 404 Not Found will be returned in case the requested job does not exist.
- 200 OK will be returned in any other case.
$ http GET http://0.0.0.0:6543/status/5012abe3-bee2-4dd7-be87-39a88d76035d
HTTP/1.1 200 OK

{
    "log": "<snip>",
    "status": "finished"
}
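Since a conversion can take a while, a client will typically poll the status endpoint until the job is done. Here is a minimal sketch of such a loop. The function `wait_for_job` and its parameters are hypothetical, and it assumes that 'finished' and 'failed' are the only final statuses, with 'queued', 'started' and 'deferred' meaning the job is still pending. The actual HTTP call is injected as `fetch_status`, a callable that takes a job id and returns the decoded JSON response as a dict.

```python
import time

# Assumption: these two statuses are final, the other three mean "not done yet".
TERMINAL_STATUSES = {"finished", "failed"}


def wait_for_job(fetch_status, job_id, interval=5.0, max_polls=120, sleep=time.sleep):
    """Poll `fetch_status(job_id)` until the job reaches a final status.

    Returns the last response dict (with its "status" and "log" keys).
    Raises TimeoutError if the job is still pending after `max_polls` polls.
    """
    for attempt in range(max_polls):
        result = fetch_status(job_id)
        if result["status"] in TERMINAL_STATUSES:
            return result
        # Do not sleep after the last attempt, we are about to give up.
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError("job %s still pending after %d polls" % (job_id, max_polls))
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any HTTP library and makes it easy to try out without a running server.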
Currently, the best way to install it is to retrieve the sources from GitHub:
$ git clone https://github.com/almet/zimit.git
$ cd zimit
Create a virtual environment and install the project in it:
$ virtualenv venv
$ venv/bin/pip install -e .
Then run it however you like, for instance with pserve:
$ venv/bin/pserve zimit.ini
In a separate process, you also need to run the worker:
$ venv/bin/rqworker
And you're ready to go. To test it:
$ http POST http://0.0.0.0:6543/website-url url="https://refugeeinfo.eu/" title="Refugee Info" email="[email protected]"
sudo apt-get install httrack libzim-dev libmagic-dev liblzma-dev libz-dev build-essential libtool libgumbo-dev redis-server automake pkg-config
git clone https://github.com/wikimedia/openzim.git
cd openzim/zimwriterfs
./autogen.sh
./configure
make
Then update the path to the zimwriterfs executable in zimit.ini.
$ rqworker & pserve zimit.ini
There are multiple ways to deploy such a service, so I'll describe how I do it, following my own best practices.
First of all, get all the dependencies and the code. I like to have everything available in /home/www, so let's assume that is the case here:
$ mkdir /home/www/zimit.notmyidea.org
$ cd /home/www/zimit.notmyidea.org
$ git clone https://github.com/almet/zimit.git
Then create your own configuration file by copying the default one:
$ cd zimit
$ cp zimit.ini local.ini
From there, you need to update the configuration to point to the correct binaries and locations.
# the upstream component nginx needs to connect to
upstream zimit_upstream {
    server unix:///tmp/zimit.sock;
}

# configuration of the server
server {
    listen 80;
    listen [::]:80;
    server_name zimit.ideascube.org;
    charset utf-8;
    client_max_body_size 200M;

    location /zims {
        alias /home/ideascube/zimit.ideascube.org/zims/;
        autoindex on;
    }

    # Finally, send all non-media requests to the Pyramid server.
    location / {
        uwsgi_pass zimit_upstream;
        include /var/ideascube/uwsgi_params;
    }
}
[uwsgi]
uid = ideascube
gid = ideascube
chdir = /home/ideascube/zimit.ideascube.org/zimit/
ini = /home/ideascube/zimit.ideascube.org/zimit/local.ini
# the virtualenv (full path)
home = /home/ideascube/zimit.ideascube.org/venv/

# process-related settings
# master
master = true
# maximum number of worker processes
processes = 4
# the socket (use the full path to be safe)
socket = /tmp/zimit.sock
# ... with appropriate permissions - may be needed
chmod-socket = 666
# stats = /tmp/ideascube.stats.sock
# clear environment on exit
vacuum = true
plugins = python
[program:zimit-worker]
command=/home/ideascube/zimit.ideascube.org/venv/bin/rqworker
directory=/home/ideascube/zimit.ideascube.org/zimit/
user=www-data
autostart=true
autorestart=true
redirect_stderr=true
That's it!