Benutzer:Dirk Huenniger/wb2pdf/manual

Processing more than one page

MediaWiki's template system

You can simply create a new page on the wiki and type something like this:

{{:MyPageOne}}
{{:MyPageTwo}}

The resulting page will display a concatenation of the pages MyPageOne and MyPageTwo.
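The wikitext for such a concatenation page can also be generated mechanically. The following is a minimal sketch; the function name concatenation_page is hypothetical and not part of mediawiki2latex.

```python
def concatenation_page(titles):
    """Build wikitext that includes each page via {{:Title}}, one per line.

    Hypothetical helper for illustration only.
    """
    return "\n".join("{{:" + title + "}}" for title in titles)


if __name__ == "__main__":
    # Paste the printed text into a new wiki page.
    print(concatenation_page(["MyPageOne", "MyPageTwo"]))
```

The printed text can be pasted into a new page on the wiki.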

MediaWiki's Collection system

You may create a collection using MediaWiki's Collection extension. The collection page can be processed by the web interface by selecting TemplateExpansion=BookMode, or by the command line version using the option --bookmode. Keep in mind that the web interface has a time limit of one hour, which is enough for roughly 200 pages, while the command line version has no such limit.

Web Interface

Time Limit

Please note that the web interface has a time limit of one hour, so please don't ask for more than 200 pages. Otherwise your request might fail with a timeout error. If you install mediawiki2latex locally there is no such time limit. mediawiki2latex is open-source software and thus free to use and even to modify.

URL to the Wiki to be converted

A URL to the Wikipedia article you wish to convert to a different format. You may just open the article you want in your web browser and copy the contents of the address bar at the top of your browser. That is already a URL that you can just paste here.

Output Format

You can choose between the following output formats:

  • Compiled PDF: A PDF file of the article you selected by supplying the URL to it. The PDF file will be created with the LaTeX typesetting software, which is often used to write books and articles in mathematics, physics and related fields.
  • Source ZIP: A ZIP file of the LaTeX source of the article. This is useful only if you want to change the layout using LaTeX yourself. In this case you will need to install the Ubuntu app on Windows or have a Debian- or Ubuntu-like operating system installed. In order to compile the source with LaTeX you will also have to install the mediawiki2latex package from the repository of your distribution.
  • EPUB File: A file in the EPUB format suitable for use with e-book readers.
  • ODT File: An Open Document Text file. Useful for importing into your favorite word processing software, if you want to modify the article offline.

Template Expansion

  • Print: The default and recommended mode. Just the HTML generated by MediaWiki is evaluated. Use this if you are unsure.
  • MediaWiki: The MediaWiki Templates are expanded by MediaWiki into wikisource. The wikisource is parsed and processed further. Use this mode if you don't get the result you intended with the default mode.
  • normal: In this mode the templates are not expanded but rather mapped to LaTeX commands using a default mapping file. This mode might be helpful if you intend to compile a wikibook on the English or German Wikibooks. Of course you can also provide your own mapping file using the -t command line option. But this is useful only if you know LaTeX and want to create a PDF file that looks exactly the way you want.
  • Book Mode: All links on the wiki page will be followed, but not recursively. All pages encountered will be processed.

Paper

The dimensions of the page you wish to use.

Vector Graphics

Some images might be available in a vector graphics format, allowing lossless arbitrary scaling. Since most PDF tools do not support that well, all those images are converted to raster graphics formats by default. But you can override this behavior using this setting.

Command line Version

We will just give a quick overview of all parameters and discuss them in detail below:

  -V, -?, -v    --version, --help     show version number
  -o FILE       --output=FILE         output FILE (REQUIRED)
  -f START:END  --featured=START:END  run selftest on featured article numbers from START to END
  -x CONFIG     --hex=CONFIG          hex encoded full configuration for run
  -s PORT       --server=PORT         run in server mode listen on the given port
  -t FILE       --templates=FILE      user template map FILE
  -r INTEGER    --resolution=INTEGER  maximum image resolution in dpi INTEGER
  -u URL        --url=URL             input URL (REQUIRED)
  -p PAPER      --paper=PAPER         paper size, on of A4,A5,B5,letter,legal,executive
  -m            --mediawiki           use MediaWiki to expand templates
  -h            --html                use MediaWiki generated html as input (default)
  -k            --bookmode            use book-namespace mode for expansion
  -z            --zip                 output zip archive of latex source
  -b            --epub                output epub file
  -d            --odt                 output odt file
  -g            --vector              keep vector graphics in vector form
  -i            --internal            use internal template definitions
  -l DIRECTORY  --headers=DIRECTORY   use user supplied latex headers
  -c DIRECTORY  --copy=DIRECTORY      copy LaTeX tree to DIRECTORY
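
The options above can be combined into a single call. The following sketch only assembles an argument list from the long options listed in the overview and does not run anything. The function name build_command and its keyword arguments are hypothetical, and it assumes the executable is available on the PATH under the name mediawiki2latex.

```python
def build_command(url, output, fmt="pdf", paper=None, resolution=None,
                  bookmode=False, vector=False):
    """Return an argument list for mediawiki2latex (nothing is executed).

    Hypothetical helper; only options from the overview above are used.
    """
    # PDF is the default output, so it needs no extra flag.
    format_flags = {"pdf": None, "zip": "--zip", "epub": "--epub", "odt": "--odt"}
    if fmt not in format_flags:
        raise ValueError("unknown output format: " + fmt)
    args = ["mediawiki2latex", "--url=" + url, "--output=" + output]
    if format_flags[fmt]:
        args.append(format_flags[fmt])
    if paper is not None:
        args.append("--paper=" + paper)
    if resolution is not None:
        args.append("--resolution=" + str(resolution))
    if bookmode:
        args.append("--bookmode")
    if vector:
        args.append("--vector")
    return args


if __name__ == "__main__":
    print(build_command("https://en.wikipedia.org/wiki/Foo", "foo.pdf", paper="A4"))
```

The resulting list can be handed to subprocess.run(args, check=True). Passing a list rather than a single string means no shell is involved in interpreting the URL.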

--version

Shows version and help information

--output=FILE

Set the output file to which the result should be written. On Windows you have to make sure that the file is not currently open in any other software, since it won't be writable in that case. The extension is not evaluated, so you will have to set other parameters to define the output format.

--featured

This feature is not implemented and the parameter might go away.

--hex

This parameter takes the whole configuration of mediawiki2latex as a single hex-encoded string. It is only used by the mediawiki2latex server when it calls its subprocesses. This is needed to make shell injection attacks impossible: the shell will just see a hex-encoded string and will not try to run any script from it.
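
The following sketch illustrates why hex encoding helps here. It does not reproduce the real configuration format, which is not described in this manual; the configuration string and the function names to_hex and from_hex are made up for illustration.

```python
def to_hex(config_text):
    """Encode arbitrary text as lowercase hex digits (two per UTF-8 byte)."""
    return config_text.encode("utf-8").hex()


def from_hex(hex_text):
    """Reverse to_hex."""
    return bytes.fromhex(hex_text).decode("utf-8")


if __name__ == "__main__":
    # Hypothetical configuration text containing shell metacharacters.
    risky = "url=http://example.org/; rm -rf /"
    encoded = to_hex(risky)
    print(encoded)            # only the characters 0-9 and a-f
    print(from_hex(encoded))  # the original text again
```

Whatever the input contains, the encoded form consists only of the characters 0-9 and a-f, so there is nothing in it for a shell to interpret.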

--server=PORT

Run the mediawiki2latex web interface as an HTTP server. Listen on PORT.

--templates=FILE

Define a custom mapping file of MediaWiki templates to LaTeX commands. An example is given in the file templates.user. The original wikitext will be parsed by mediawiki2latex. MediaWiki will not be used to expand any templates. An error message about "Unknown Template" will be added to the output PDF file if templates are encountered which are not given in the mapping file.

--resolution=INTEGER

By default all images with a resolution higher than 300 dpi will be scaled down to 300 dpi in order to reduce the size of the resulting PDF file. With this parameter you can override this with your intended resolution. This is helpful if you need to produce a PDF file that is small enough to be uploaded to a file hosting website.
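
The arithmetic behind such a resolution limit can be sketched as follows. This is not the code mediawiki2latex uses. The function name capped_size is hypothetical, and the sketch assumes that the resolution of an image is its pixel width divided by its printed width in inches.

```python
def capped_size(px_width, px_height, printed_width_in, max_dpi=300):
    """Return pixel dimensions scaled down so the image stays within max_dpi.

    Assumption: dpi = pixel width / printed width in inches.
    """
    dpi = px_width / printed_width_in
    if dpi <= max_dpi:
        # Already within the limit, leave the image untouched.
        return px_width, px_height
    factor = max_dpi / dpi
    return round(px_width * factor), round(px_height * factor)


if __name__ == "__main__":
    print(capped_size(3000, 2000, 5.0))
```

For example, an image 3000 pixels wide that is printed 5 inches wide has 600 dpi, so both dimensions are halved to meet the 300 dpi limit.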

--url=URL

The URL to the article you wish to convert

--paper=PAPER

The dimensions of the page you wish to use in the PDF. Supported values are the European DIN formats A4, A5 and B5 as well as the American formats letter, legal and executive. In LaTeX it is possible to define even more paper sizes if you need them.

--mediawiki

Use MediaWiki to expand the MediaWiki templates in the wikitext source, but parse and process the resulting expanded wikitext source with mediawiki2latex.

--html

Use MediaWiki to generate an HTML page from the wikitext source, and parse and process the resulting HTML with mediawiki2latex.

--bookmode

This mode is for processing collections made with MediaWiki's Collection extension. This includes the pages found in the Book namespace on Wikipedia as well as user-defined collections in the user namespaces. mediawiki2latex will follow all links in the wikitext, but not recursively. For each link it will load the HTML. It will stitch together all the HTML loaded and parse and process that.

--zip

Create a zip file of the LaTeX source generated for the article as output.

--epub

Create an EPUB file of the article as output. Essentially an intermediate HTML file will be created. The images will be processed as usual and the formulas will be rendered to images. This result will be converted to an EPUB file by calibre.

--odt

Create an ODT file of the article as output. ODT stands for Open Document Text and can be imported by common word processing software. The same trick with an intermediate HTML file as described above for EPUB is used, but the ODT file is created by LibreOffice.

--vector

Don't rasterize vector images but include them as vector graphics in the PDF document. PDF viewing and processing software usually does not cope well with vector graphics, so this is not recommended.

--internal

Same as --templates, but uses a default template definition file compiled into the mediawiki2latex executable. This might be useful only on the German and English Wikibooks, since the template definition file contains some reasonable definitions for many templates on these sites.

--headers=DIRECTORY

Copy a directory with custom header files into the temporary LaTeX document tree before running XeLaTeX. This way you can define custom layouts and your own LaTeX \newcommand definitions, which is useful in combination with the --templates option described above.

--copy=DIRECTORY

Copy the LaTeX (and possibly HTML) source to the given directory. This option is useful if you want to manually edit the LaTeX document and compile it yourself. mediawiki2latex will still do everything requested, including the creation of output files and the compilation of the sources. It will just copy the directory immediately before the compile step.

Tables

There are some commands for the typesetting of tables. Tables can include horizontal as well as vertical rules and a frame surrounding the table. These are drawn if and only if the template prettytable or the attribute class="wikitable" is present in the header of the table.

It might also be useful to reduce the font size for a whole table. This can be achieved by writing latexfontsize="scriptsize" into the header of the table. In contrast to the tolerant behavior of MediaWiki, wikipdf requires new tables to start on a new line. You can furthermore define the width of the columns of a table by using the width attribute with a value in percent (%) in the attributes of the cells of the table.

There is also support for table headings. In a large table spanning several pages, it is often required to repeat the header (that is, some rows at the beginning of the table) at the top of each page. This is done by marking some cells as header cells, using the exclamation mark (!) instead of the vertical bar (|) when writing down the table in the wiki syntax. The program considers the first few rows to be part of the header as long as they continuously contain header cells.
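
The header rule described above can be sketched as follows. This is an illustration and not the code the program uses. The function name count_header_rows is hypothetical. The sketch assumes one cell per line and treats a row as a header row only if every cell in it starts with an exclamation mark; the manual does not spell out how rows with mixed cells are handled.

```python
def count_header_rows(table_wikitext):
    """Count the leading rows of a wiki table that consist only of header cells.

    Assumptions: one cell per line, and a header row contains only '!' cells.
    """
    rows = []
    current = []
    for line in table_wikitext.splitlines():
        line = line.strip()
        # Skip table start, table end and caption lines.
        if line.startswith("{|") or line.startswith("|}") or line.startswith("|+"):
            continue
        if line.startswith("|-"):
            # Row separator: close the row collected so far.
            if current:
                rows.append(current)
            current = []
        elif line.startswith("!") or line.startswith("|"):
            # Remember only whether the cell is a header (!) or data (|) cell.
            current.append(line[0])
    if current:
        rows.append(current)

    count = 0
    for row in rows:
        if all(marker == "!" for marker in row):
            count += 1
        else:
            break
    return count
```

Rows counted this way are the ones that would be repeated at the top of each page of a long table.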

Images

List of Figures

A table of images, their authors and licenses is automatically created in the appendix. In order to determine the name of the author, the information template on the description page of the image is analyzed, thus it needs to be present and to have a valid author entry.

Size of Files and Image Resolution

Often the application in which you want to use the generated PDF limits the size of the file. This issue arises especially when submitting the PDF to a print-on-demand service. You can reduce the size of the file by scaling the images down to a lower resolution, losing some quality. Typical printing machines used for manufacturing books industrially today use a resolution of 300 dpi, so a higher resolution is usually not necessary. You can enter the maximum allowed resolution in the graphical user interface. All images with higher resolutions will be scaled down accordingly.

Width of Images

The width of an image will usually be as large as possible, determined by the width of the page as well as the margins. You may modify this behavior by using a px value when including the image in the wiki source text. 400 pixels correspond to the maximum available width. Thus writing 200px will reduce the width to one half of the maximum available width.
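
The mapping from a px value to a fraction of the available width can be sketched as below. The function name relative_width is hypothetical, and clamping values above 400 pixels to the full width is an assumption that the manual does not state.

```python
MAX_PX = 400  # pixel value that corresponds to the maximum available width


def relative_width(px):
    """Return the fraction of the available width for a given px value.

    Assumption: values above MAX_PX are clamped to the full width.
    """
    return min(px / MAX_PX, 1.0)


if __name__ == "__main__":
    print(relative_width(200))
```

A value of 200px therefore corresponds to half of the available width.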

Wrapping Images

The former template [[Vorlage:Latex Wrapfigure|Latex Wrapfigure]] can be used for that. It takes two parameters, image and width. Width is between 0.0 and 1.0, where 1.0 means the full width of the text, 0.5 means half the width of the text and so on. Image has to be a link to an image in wiki notation, with leading and trailing double square brackets. See also the section on user-defined templates in this document and the manual of the wrapfigure LaTeX package found on CTAN.

Templates

Automated Expansion

In the default case all Templates are expanded by MediaWiki. This is the meaning of the setting Template Inclusion = MediaWiki in the GUI.

Manual Expansion

It is hard for an algorithm to determine how a MediaWiki template should be converted to LaTeX code. This is because templates are implemented using HTML extensively in order to produce good-looking output in a web browser, which is very different from the "what you get is what you mean" style LaTeX is using. Still, all templates are algorithmically expanded by default as explained above. But we recommend another way of dealing with templates, which we will explain now. You have to set Template Inclusion=normal in the GUI. In this case only a limited number of templates is taken into account by wb2pdf. All other templates will cause the message UNKNOWN TEMPLATE to appear in the resulting file. It is advisable to search the output files for this string in order to make sure that all templates were processed correctly. To extend the template processor with custom templates you have to modify the file templates.user in the directory wb2pdf/trunk/latex.

[
["mywikitemplate1","MyLaTeXTemplate","paramx","3","paramy"],
["print version cover","LaTeXNullTemplate"],
["GCC_take_home","LaTeXGCCTakeTemplate","1"]
]

It contains a list of sublists. The first item in each sublist is the name of the template in the wiki. The second is the name of the template in LaTeX. The remaining elements of the sublist are the parameters in the wiki which shall be passed to the template in LaTeX. You also have to modify templates.tex in the directory wb2pdf/trunk/document/main to add a definition for the LaTeX version of the template. When modifying templates.user be aware that each entry ends with a comma, except for the last entry. Furthermore, umlauts and other non-ASCII characters have to be encoded in decimal UTF-8 notation, for example:

"\195\156berschriftensimulation 5"


This isn't such a big problem, since the Unknown Template error message in the main.tex file in the directory wb2pdf/trunk/document/main will have exactly this format (decimal UTF-8 notation), so you just need to copy and paste it.
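
If you want to produce the decimal UTF-8 notation yourself instead of copying it, the conversion can be sketched as follows. The function name decimal_utf8 is hypothetical. The sketch only covers the plain case: it does not escape quotes or backslashes, and it does not handle a non-ASCII character that is directly followed by a digit.

```python
def decimal_utf8(text):
    """Write each non-ASCII UTF-8 byte as a backslash plus its decimal value.

    Plain-case sketch only; ASCII characters are passed through unchanged.
    """
    parts = []
    for byte in text.encode("utf-8"):
        if byte < 128:
            parts.append(chr(byte))
        else:
            parts.append("\\" + str(byte))
    return "".join(parts)


if __name__ == "__main__":
    print(decimal_utf8("\u00dcberschriftensimulation 5"))
```

Applied to the template name Überschriftensimulation 5 this reproduces the notation shown above, because the letter Ü is stored in UTF-8 as the two bytes 195 and 156.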

If you need more freedom in defining how a template is processed, you can also edit the source code of the template processor. To extend the template processor of wb2pdf with your custom templates, modify the function templateProcessor in the file LatexRenderer.hs and recompile pa.exe. In order to do so you need to install the Glasgow Haskell Compiler as well as its package manager (cabal). Many examples of custom templates are given in LatexRenderer.hs. This file is written in the purely functional programming language Haskell, so knowing Haskell will help you to define the processing of your custom template. LatexRenderer.hs is essentially a code generator that writes code in the LaTeX typesetting language, which you will also need to learn in order to extend the custom template abilities of wb2pdf.

Input stage

The full source of the wiki pages to be compiled is downloaded to your local computer. This is done by load.py, which is written in Python using the syntax of version 3.1. Since the subpages in a print version of a wikibook are often included using a custom template, it is necessary to modify load.py in order to make it load all the data you included. Many examples of the processing of custom templates are given in load.py. If you don't want to modify this file you may stick to MediaWiki's standard include mechanism: a double opening curly bracket followed by a colon, the wiki page you want to include and a trailing double closing curly bracket.
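
Finding the pages included through the standard mechanism amounts to a simple pattern search. The following is a minimal sketch and not the code of load.py; the function name included_pages is hypothetical. It deliberately ignores ordinary templates and includes that carry parameters.

```python
import re

# Matches {{:PageName}} but not {{Template}} and not {{:Page|parameter}}.
INCLUDE_PATTERN = re.compile(r"\{\{:([^{}|]+)\}\}")


def included_pages(wikitext):
    """Return the titles of pages included via {{:Title}}, in order."""
    return [title.strip() for title in INCLUDE_PATTERN.findall(wikitext)]


if __name__ == "__main__":
    print(included_pages("{{:MyPageOne}}\n{{:MyPageTwo}}"))
```

Each title found this way names a wiki page whose source would have to be downloaded as well.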

Other Stuff

Other Fonts

Currently we support full 16-bit Unicode and choose a vector font if available. If not, we use GNU Unifont. We are using a combined font called megafont.ttf and you will have to modify or replace it in order to use custom fonts. For the Linux command line version we use a default font that does not support the whole Unicode range. You have to use the -f command line option and install the megafont ttf from the font.zip file you find on the SourceForge page in order to get 16-bit Unicode.

Indents at the beginning of paragraphs

You can cause the first line of each paragraph to be indented if you comment out the line \usepackage{parskip} in the file packages1.tex.