User:Dirk Hünniger/wb2pdf/install
General Remarks for all Operating Systems
Use the Latest Version
MediaWiki changes quickly, which requires frequent updates to mediawiki2latex. The version in your operating system's repositories is usually not the latest version from the git repository, so we generally recommend updating to the latest version as described in the installation instructions for Debian below.
Install Enough Memory
Mediawiki2latex makes heavy use of memory. We recommend a host system with at least 8 GByte, which should be enough for documents of up to about 100 pages of output. If LaTeX or PDF output is requested in combination with --bookmode, 8 GByte should suffice for documents of almost arbitrary size. In all other cases, the rule of thumb is 32 GByte of memory per 1000 pages of output. If a "killed" message appears in the terminal in which you are running mediawiki2latex, this usually means the available memory has been used up.
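As a quick pre-flight check, the rule of thumb above can be turned into two small shell helpers: one estimates the memory needed for a given number of output pages, the other reads the installed RAM. This is only a sketch: both function names are hypothetical, the linear scaling between the 8 GByte minimum and 32 GByte per 1000 pages is an assumption, and the estimate is not meant for LaTeX or PDF output with --bookmode.

```shell
# Hypothetical helpers, not part of mediawiki2latex.

# m2l_estimate_gib PAGES: memory in GiB for PAGES pages of output, assuming
# linear scaling of "32 GByte per 1000 pages", rounded up, never below 8.
m2l_estimate_gib() {
    local pages=$1 gib
    gib=$(( (pages * 32 + 999) / 1000 ))   # integer ceil(pages * 32 / 1000)
    if [ "$gib" -lt 8 ]; then
        gib=8
    fi
    echo "$gib"
}

# m2l_total_gib: installed RAM in whole GiB (rounded down), Linux only.
m2l_total_gib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 }' /proc/meminfo
}

# Example: compare the estimate for 600 pages (prints 20) with the machine.
#   echo "needed: $(m2l_estimate_gib 600) GiB, installed: $(m2l_total_gib) GiB"
```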
Installation on Ubuntu
Follow the instructions for Debian below.
Installation on Debian
mediawiki2latex 8.28 is included in the Debian Trixie distribution and usually works out of the box, but in some difficult cases it does not work properly. To work around this, upgrade mediawiki2latex to the latest version from the git repository. You will also need to make sure that ImageMagick is installed and configured (see below).
First, install version 8.28 from the Trixie repository by entering (as root):
apt-get install mediawiki2latex
Then install the build-time and run-time dependencies and compile from source (as root):
apt-get update
apt-get install -y mediawiki2latex
apt-get install -y tzdata keyboard-configuration make chromium
apt-get install -y ghc libghc-x509-dev libghc-pem-dev
apt-get install -y libghc-regex-compat-dev libghc-http-dev cabal-install libghc-hxt-dev
apt-get install -y libghc-split-dev libghc-blaze-html-dev libghc-file-embed-dev
apt-get install -y libghc-hxt-http-dev
apt-get install -y libghc-temporary-dev libghc-url-dev libghc-utf8-string-dev
apt-get install -y libghc-utility-ht-dev libghc-http-conduit-dev libghc-happstack-server-dev
apt-get install -y libghc-directory-tree-dev libghc-zip-archive-dev libghc-strict-dev
apt-get install -y libghc-network-uri-dev libghc-tagsoup-dev libghc-word8-dev
apt-get install -y ghostscript calibre latex2rtf libreoffice git
apt-get install -y librsvg2-bin imagemagick
apt-get install -y fonts-freefont-ttf texlive-xetex texlive-latex-recommended
apt-get install -y texlive-latex-extra texlive-fonts-recommended texlive-fonts-extra
apt-get install -y cm-super-minimal texlive-lang-all poppler-utils
apt-get install -y lmodern texlive-plain-generic latex-cjk-common
apt-get install -y fonts-cmu fonts-wqy-zenhei
apt-get install -y djvulibre-bin pdftk libimage-exiftool-perl
apt-get install -y texlive-science texlive-pstricks texlive-games libghc-hunit-dev
apt-get install -y tzdata fonts-unifont
apt-get install -y make curl texlive-extra-utils chromium-sandbox
git clone https://git.code.sf.net/p/wb2pdf/git wb2pdf-git
make -C wb2pdf-git
make -C wb2pdf-git install
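After the build, a quick way to see whether the main external programs ended up on the PATH is a loop over command -v. The helper name is hypothetical, and the program names in the usage line are assumptions based on the packages installed above (for example xelatex from texlive-xetex, rsvg-convert from librsvg2-bin, pdfinfo from poppler-utils, ebook-convert from calibre); they are not an authoritative list of what mediawiki2latex calls.

```shell
# m2l_missing_tools CMD...: print "missing: CMD" for every command that is
# not found on the PATH; return 1 if anything is missing, 0 otherwise.
m2l_missing_tools() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   m2l_missing_tools mediawiki2latex xelatex gs rsvg-convert pdfinfo pdftk \
#       exiftool latex2rtf ebook-convert libreoffice && echo "all found"
```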
Finally, to enable image conversions, make sure that ImageMagick is installed and that it has permission to transform PS and PDF files to PNG.
Check whether ImageMagick is installed, and if not then run:
apt-get install imagemagick
As root, edit /etc/ImageMagick-7/policy.xml so that the entries for these formats read:
<policy domain="coder" rights="read|write" pattern="PS" />
<policy domain="coder" rights="read|write" pattern="PS2" />
<policy domain="coder" rights="read|write" pattern="PS3" />
<policy domain="coder" rights="read|write" pattern="EPS" />
<policy domain="coder" rights="read|write" pattern="PDF" />
<policy domain="coder" rights="read|write" pattern="XPS" />
Note that this may entail some risks on a server machine, as explained in this piece on Solution to ImageMagick "not authorized" PDF Error by Bob Cromwell.
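The permission change can also be scripted with sed. The sketch below assumes that the stock policy.xml already lists each coder with rights="none" in the attribute order shown above; coders missing from the file are left untouched and must be added by hand. Both function names are hypothetical, and a backup copy (policy.xml.bak) is written first.

```shell
# im_allow_coder FILE PATTERN...: change rights="none" to rights="read|write"
# for each listed coder pattern (GNU sed -i), after saving FILE.bak.
im_allow_coder() {
    local file=$1 pat
    shift
    cp "$file" "$file.bak" || return 1
    for pat in "$@"; do
        # The closing quote after $pat keeps "PS" from also matching "PS2".
        sed -i "s/rights=\"none\" pattern=\"$pat\"/rights=\"read|write\" pattern=\"$pat\"/" "$file"
    done
}

# im_coder_allowed FILE PATTERN: succeed if FILE grants read|write to PATTERN.
im_coder_allowed() {
    grep -qF "rights=\"read|write\" pattern=\"$2\"" "$1"
}

# Usage (as root):
#   im_allow_coder /etc/ImageMagick-7/policy.xml PS PS2 PS3 EPS PDF XPS
```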
When processing large images, it may be necessary to raise the maximum processable image sizes in the same file (/etc/ImageMagick-7/policy.xml), like this:
<policy domain="resource" name="memory" value="8GiB"/>
<policy domain="resource" name="map" value="8GiB"/>
<policy domain="resource" name="width" value="100KP"/>
<policy domain="resource" name="height" value="100KP"/>
<policy domain="resource" name="area" value="10GP"/>
<policy domain="resource" name="disk" value="20GiB"/>
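The resource limits can likewise be set and read back with sed. This sketch assumes one policy entry per limit with name before value, as in the lines above; a limit that is not present in the file is not added, and a commented-out entry stays commented out. Both function names are hypothetical.

```shell
# im_set_resource FILE NAME VALUE: rewrite value="..." on the entry with
# name="NAME" (GNU sed -i).
im_set_resource() {
    sed -i "s/\(name=\"$2\" value=\)\"[^\"]*\"/\1\"$3\"/" "$1"
}

# im_get_resource FILE NAME: print the current value of the entry NAME.
im_get_resource() {
    sed -n "s/.*name=\"$2\" value=\"\([^\"]*\)\".*/\1/p" "$1"
}

# Usage (as root):
#   im_set_resource /etc/ImageMagick-7/policy.xml memory 8GiB
#   im_get_resource /etc/ImageMagick-7/policy.xml memory
```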
Create /tmp mount point
Since mediawiki2latex may use a lot of space in the /tmp directory, it might make sense to mount /tmp on a separate disk or partition, as shown below. Be aware that mkfs.ext4 erases all data on the device you give it (here /dev/sdb), so make sure you know what you are doing.
sudo -s
mkfs.ext4 /dev/sdb
nano /etc/fstab
add the line:
/dev/sdb /tmp ext4 rw
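One detail the steps above leave implicit: the root directory of a freshly created ext4 file system is owned by root with mode 755, so ordinary users cannot write to the new /tmp. It usually needs mode 1777 (world-writable with the sticky bit). A minimal check, with a hypothetical function name and GNU stat assumed:

```shell
# tmp_mode_ok DIR: succeed if DIR has octal mode 1777 (GNU stat).
tmp_mode_ok() {
    [ "$(stat -c %a "$1")" = 1777 ]
}

# Usage (as root, after adding the fstab line):
#   mount /tmp && chmod 1777 /tmp && tmp_mode_ok /tmp && echo "/tmp OK"
```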
Increase LaTeX buffer size
When processing complex input, it may be necessary to increase the buffer size of lualatex. This is itself a fairly complex operation. First, run in a terminal:
kpsewhich texmf.cnf
This will point you to a file which you will need to inspect but not change. In our case this file turned out to be:
/etc/texmf/web2c/texmf.cnf
In this file you will see a reference to a directory. In that directory you will find the file you should edit. In our case this turned out to be:
/etc/texmf/texmf.d/00debian.cnf
In this file, add the line:
buf_size=10000000
After that, run (as root):
update-texmf
which will propagate the change to:
/etc/texmf/web2c/texmf.cnf
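The edit to 00debian.cnf can be made idempotent, so that running it twice does not leave duplicate buf_size lines. The sketch below (hypothetical function name) replaces any existing KEY=... line or appends a new one; afterwards, kpsewhich -var-value=buf_size reports the value kpathsea actually sees.

```shell
# cnf_set FILE KEY VALUE: replace existing "KEY = ..." lines in FILE with
# "KEY=VALUE", or append "KEY=VALUE" if KEY is not set yet (GNU sed -i).
cnf_set() {
    if grep -q "^[[:space:]]*$2[[:space:]]*=" "$1"; then
        sed -i "s/^[[:space:]]*$2[[:space:]]*=.*/$2=$3/" "$1"
    else
        printf '%s=%s\n' "$2" "$3" >> "$1"
    fi
}

# Usage (as root):
#   cnf_set /etc/texmf/texmf.d/00debian.cnf buf_size 10000000
#   update-texmf
#   kpsewhich -var-value=buf_size   # expected to print 10000000
```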
Cron Job
If needed, create the following cron job with crontab -e to free disk space automatically. It runs daily at 04:02 and deletes temporary files that have not been modified for more than seven days.
02 4 * * * find /tmp/* -mtime +7 -exec rm {} \;
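Before relying on the cron job, it can be useful to preview which files it would hit. The helper below (hypothetical name) lists regular files under a directory that were last modified more than DAYS days ago. It only approximates the cron line: /tmp/* skips hidden entries directly in /tmp, and rm without -r cannot remove directories.

```shell
# list_stale DIR DAYS: print regular files below DIR whose last
# modification is more than DAYS days ago (same -mtime test as the cron job).
list_stale() {
    find "$1" -mindepth 1 -type f -mtime +"$2" -print
}

# Usage:
#   list_stale /tmp 7
```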
Server
To run mediawiki2latex as a server, follow these steps.
1) become root:
sudo -s
2) start crontab editor
crontab -e
Choose nano if prompted to choose an editor.
3) At the end of the file, add the following line (it requires the screen package, which you can install with apt-get install screen):
@reboot screen -d -m /usr/bin/mediawiki2latex -s 80
Save and exit by pressing Ctrl-X, then y, then Enter.
4) Reboot your system and visit localhost:80 in your browser. If you see the mediawiki2latex server page, everything is fine.
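Instead of checking in a browser, the server can also be polled from a shell, which helps because it may need a moment to start after a reboot. The function name, the default of 30 tries and the 5-second timeout are assumptions; the sketch only relies on curl, which the Debian instructions above install.

```shell
# m2l_server_up URL [TRIES]: succeed as soon as URL answers with a
# non-error HTTP status; try up to TRIES times (default 30), 1 s apart.
m2l_server_up() {
    local url=$1 tries=${2:-30} i=0
    while [ "$i" -lt "$tries" ]; do
        if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Usage:
#   m2l_server_up http://localhost:80/ && echo "mediawiki2latex server is up"
```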
Installation on CentOS 7
The instructions below apply to CentOS 7 (and likely CentOS 6). The primary concern with a CentOS 7 installation is to avoid the standard repository packages. Specifically, the commonly used "epel" (Extra Packages for Enterprise Linux) repository contains ghc, cabal-install, and texlive, but these are either incompatible versions (ghc and cabal) or are missing many components (texlive). Finally, some font dependencies must be installed for mediawiki2latex to generate PDFs.
Prepare and Compile mediawiki2latex
The following versions of GHC, Cabal and TeX Live are compatible with mediawiki2latex 7.33.
- Install a recent GHC compiler. As of 2019-01, a suitable version (GHC 8.0.2) is available and documented at: https://copr.fedorainfracloud.org/coprs/petersen/ghc-8.0.2
- Create /etc/yum.repos.d/petersen-ghc-8.0.2-epel-7.repo with the following content:
[petersen-ghc-8.0.2]
name=Copr repo for ghc-8.0.2 owned by petersen
baseurl=https://copr-be.cloud.fedoraproject.org/results/petersen/ghc-8.0.2/epel-7-$basearch/
type=rpm-md
skip_if_unavailable=True
gpgcheck=1
gpgkey=https://copr-be.cloud.fedoraproject.org/results/petersen/ghc-8.0.2/pubkey.gpg
repo_gpgcheck=0
enabled=1
enabled_metadata=1
yum --disablerepo=epel install ghc cabal-install
cabal update
- Download and install the latest TeX Live from: http://mirror.ctan.org/systems/texlive/tlnet/install-tl-unx.tar.gz. This is a network ("live") installer: it downloads the packages while the installation runs.
wget http://mirror.ctan.org/systems/texlive/tlnet/install-tl-unx.tar.gz
tar xvzf install-tl-unx.tar.gz
cd install-tl-[build-date]
./install-tl
- Note that install-tl performs a lengthy installation (5+ hours). Optionally, run the process in the background, detached from the current login session:
- nohup sh -c "echo I | ./install-tl" > texlive-install.log 2>&1 &
- the command pipes "I" (for Install) into install-tl, because the installer waits for this keyboard input
- nohup lets the installation continue after you log out.
- Download and install the latest mediawiki2latex source.
- git clone https://git.code.sf.net/p/wb2pdf/git wb2pdf-git
- cd wb2pdf-git
- cabal install
- All going well, this will result in a binary wb2pdf-git/dist/build/mediawiki2latex.
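Because an older ghc on the PATH (for example one installed from EPEL) would shadow the Copr package, it may be worth checking the version before running cabal install above. The sketch uses GNU sort -V for the comparison; the function name is hypothetical, and treating 8.0.2 as the minimum is an assumption taken from the Copr repository name, not a documented requirement.

```shell
# version_ge A B: succeed if version string A >= version string B (GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
#   version_ge "$(ghc --numeric-version)" 8.0.2 || echo "ghc on PATH is older than 8.0.2"
```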
Install Fonts
Some fonts needed by mediawiki2latex are not provided by the previous installation steps (e.g. GNU FreeFont).
GNU Freefont
- wget http://ftp.gnu.org/gnu/freefont/freefont-ttf-20120503.zip
- unzip freefont-ttf-20120503.zip
- cd freefont-20120503
- mkdir /usr/share/fonts/truetype/freefont
- cp *.ttf /usr/share/fonts/truetype/freefont
- fc-cache -f /usr/share/fonts
- Note: running fc-cache is not strictly required, but it is good practice, since it makes sure CentOS fully registers the new fonts
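To confirm that fontconfig picked up the new fonts, one can search the output of fc-list. The helper name is hypothetical; FreeSerif, FreeSans and FreeMono are the family names that GNU FreeFont provides.

```shell
# font_installed FAMILY: succeed if fontconfig lists a font family whose
# name contains FAMILY (case-insensitive).
font_installed() {
    fc-list : family 2>/dev/null | grep -qi -- "$1"
}

# Usage:
#   for f in FreeSerif FreeSans FreeMono; do
#       font_installed "$f" || echo "missing: $f"
#   done
```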
Test Installation
Installation on Windows
1) Activate the Windows Subsystem for Linux
- Go to Control Panel -> Programs -> Turn Windows Features On / Off
- The Windows Features Dialog will open
- Scroll to the bottom
- Enable Windows Subsystem for Linux
- Press OK
2) Use Docker
- Follow the instructions in the Using Docker section below
Installation on other OS
Use Docker
- Follow the instructions in the Using Docker section below
Using Docker
Some success has been achieved using Docker.
Dockerfile:
FROM debian:trixie
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update
RUN apt-get install -y mediawiki2latex
RUN apt-get install -y tzdata keyboard-configuration make chromium
RUN apt-get install -y ghc libghc-x509-dev libghc-pem-dev
RUN apt-get install -y libghc-regex-compat-dev libghc-http-dev cabal-install libghc-hxt-dev
RUN apt-get install -y libghc-split-dev libghc-blaze-html-dev libghc-file-embed-dev
RUN apt-get install -y libghc-hxt-http-dev
RUN apt-get install -y libghc-temporary-dev libghc-url-dev libghc-utf8-string-dev
RUN apt-get install -y libghc-utility-ht-dev libghc-http-conduit-dev libghc-happstack-server-dev
RUN apt-get install -y libghc-directory-tree-dev libghc-zip-archive-dev libghc-strict-dev
RUN apt-get install -y libghc-network-uri-dev libghc-tagsoup-dev libghc-word8-dev
RUN apt-get install -y ghostscript calibre latex2rtf libreoffice git
RUN apt-get install -y librsvg2-bin imagemagick
RUN apt-get install -y fonts-freefont-ttf texlive-xetex texlive-latex-recommended
RUN apt-get install -y texlive-latex-extra texlive-fonts-recommended texlive-fonts-extra
RUN apt-get install -y cm-super-minimal texlive-lang-all poppler-utils
RUN apt-get install -y lmodern texlive-plain-generic latex-cjk-common
RUN apt-get install -y fonts-cmu fonts-wqy-zenhei
RUN apt-get install -y djvulibre-bin pdftk libimage-exiftool-perl
RUN apt-get install -y texlive-science texlive-pstricks texlive-games libghc-hunit-dev
RUN apt-get install -y tzdata fonts-unifont
RUN apt-get install -y make curl texlive-extra-utils chromium-sandbox
RUN git clone https://git.code.sf.net/p/wb2pdf/git wb2pdf-git
RUN make -C wb2pdf-git
RUN make -C wb2pdf-git install
ENTRYPOINT ["mediawiki2latex"]
Which is used like this:
- make a new directory.
- inside it create a file called Dockerfile.
- copy and paste the above content into it.
- run the commands below in that directory.
sudo apt-get install -y docker.io
sudo docker image build . -t m2lubuntu
sudo docker container run --mount src=/home/dirk/2docker,target=/transfer,type=bind -i m2lubuntu -u https://de.wikibooks.org/wiki/Physikalische_Grundlagen_der_Nuklearmedizin/_Atom-und_Kernstruktur -o /transfer/dirk.pdf
Replace /home/dirk/2docker with the local directory where you want the output file dirk.pdf to be written.
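To avoid retyping the long docker command, it can be wrapped in a small shell function. This is a sketch with a hypothetical function name: it converts the host directory to an absolute path (bind mounts need one), adds --rm so finished containers are removed, and reads the docker command from a DOCKER variable, so that DOCKER="sudo docker" reproduces the sudo call above and DOCKER=echo gives a dry run.

```shell
# m2l_docker HOSTDIR URL OUTNAME: run the m2lubuntu image with HOSTDIR
# bind-mounted at /transfer and write OUTNAME into it.
m2l_docker() {
    local hostdir
    hostdir=$(cd "$1" && pwd) || return 1   # --mount needs an absolute path
    # ${DOCKER:-docker} is unquoted on purpose so "sudo docker" splits into two words.
    ${DOCKER:-docker} container run --rm \
        --mount "src=$hostdir,target=/transfer,type=bind" \
        -i m2lubuntu -u "$2" -o "/transfer/$3"
}

# Usage:
#   DOCKER="sudo docker" m2l_docker /home/dirk/2docker \
#       https://de.wikibooks.org/wiki/Physikalische_Grundlagen_der_Nuklearmedizin/_Atom-und_Kernstruktur dirk.pdf
```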
Installation Diagnostics and Validation
Diagnostic Steps
A recommended way to test a mediawiki2latex installation is to run the following:
mkdir rmtest
mediawiki2latex -u https://en.wikipedia.org/wiki/Book:River_martin -o rivermartin.pdf -k -c rmtest
If mediawiki2latex appears to finish and generates rivermartin.pdf, examine rivermartin.pdf; it should have about 84 pages. If no PDF is generated, then:
- cd rmtest/document/main
- xelatex main.tex
or alternatively run:
- xelatex -interaction=nonstopmode main.tex
Review the detailed output of xelatex
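The xelatex output can be long. LaTeX marks errors with lines starting with "!", and the same text is written to main.log next to main.tex. A small helper (hypothetical name, GNU grep assumed) shows the first error with two lines of context, which often includes the offending input line:

```shell
# latex_first_error LOGFILE: print the first line starting with "!" plus the
# two lines after it, prefixed with line numbers (GNU grep -m/-A).
latex_first_error() {
    grep -n -m 1 -A 2 '^!' "$1"
}

# Usage:
#   latex_first_error rmtest/document/main/main.log
```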
Validation
Given a version of rivermartin.pdf, compare it with the version of the River_martin test case generated by the public server at http://mediawiki2latex-large.wmflabs.org/
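For a rough first comparison, the page counts of the two PDFs can be read with pdfinfo from poppler-utils, which the Debian steps above install. The function names are hypothetical, and rivermartin-server.pdf in the usage line stands for wherever you saved the server's output.

```shell
# pages_from_pdfinfo: read pdfinfo output on stdin, print the "Pages:" value.
pages_from_pdfinfo() {
    awk '$1 == "Pages:" { print $2 }'
}

# pdf_pages FILE: print the page count of FILE as reported by pdfinfo.
pdf_pages() {
    pdfinfo "$1" | pages_from_pdfinfo
}

# Usage:
#   echo "local: $(pdf_pages rivermartin.pdf), server: $(pdf_pages rivermartin-server.pdf)"
```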