The manual contains detailed documentation on the netrik code. It should be of interest to you if you want to hack on netrik, but also if you are just curious how it works.
The main idea of the layout engine is to split HTML processing into a couple of independent passes. As the single passes are fairly simple, this makes understanding and altering the code quite easy. On the other hand, the way it is done now isn't terribly efficient. We think, however, that at the early stage netrik is in now, it's more important to keep the code as simple as possible to facilitate fast development.
Also, splitting up the processing is necessary to allow rendering of partially loaded pages. As this is to be one of the main advantages of netrik (it's not implemented yet), we pay a lot of attention to it.
However, the present splitting into passes is by no means optimal. It's more or less the first approach that came to mind. There is not much point in optimizing it for the current feature set, as any newly added feature may break it. Don't forget: Always do the simplest thing that works :-) We will try to find a more efficient solution when netrik gets fairly stable.
Most of the layouting steps work recursively due to the nature of SGML/XML. They aren't implemented recursively, though. This makes the code a bit harder to understand, but there are a couple of reasons for it. Efficiency is one of them, but not a very important one at present. The main reason is that a truly recursive implementation wouldn't allow displaying partially loaded pages without real multithreading. As already mentioned, this feature has high priority, so being able to implement it easily is very important. (Real multithreading is terribly inefficient, and quite complicated.)
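A minimal sketch of the general idea, not netrik's actual code: a pre-order tree walk whose whole state (the stack of open ancestors and the next node to visit) lives in a struct instead of on the C call stack, so the caller can stop after any node and continue later. The names struct node, struct walker, walker_init(), walker_next() and the fixed MAX_DEPTH are hypothetical.

```c
#include <stddef.h>

/* Hypothetical element tree node; not netrik's real structure. */
struct node {
	const char *name;
	struct node *first_child;
	struct node *next_sibling;
};

#define MAX_DEPTH 64   /* assumption of this sketch: fixed nesting limit */

/* All traversal state is kept here rather than in recursive calls,
 * so the walk can be interrupted and resumed at any point. */
struct walker {
	struct node *stack[MAX_DEPTH];   /* ancestors of the next node */
	int depth;
	struct node *next;               /* next node to visit, NULL when done */
	int overflow;                    /* set if MAX_DEPTH was exceeded */
};

void walker_init(struct walker *w, struct node *root)
{
	w->depth = 0;
	w->next = root;
	w->overflow = 0;
}

/* Return the next node in document (pre-)order, or NULL when finished. */
struct node *walker_next(struct walker *w)
{
	struct node *n = w->next;

	if (n == NULL)
		return NULL;
	if (n->first_child != NULL) {
		if (w->depth == MAX_DEPTH) {   /* fixed-size stack exhausted */
			w->overflow = 1;
			w->next = NULL;
			return NULL;
		}
		w->stack[w->depth++] = n;      /* descend, remembering the parent */
		w->next = n->first_child;
	} else {
		/* leaf: climb back up until some ancestor has a next sibling */
		struct node *cur = n;
		while (cur->next_sibling == NULL && w->depth > 0)
			cur = w->stack[--w->depth];
		w->next = cur->next_sibling;
	}
	return n;
}
```

Because nothing is held in C stack frames between calls, a caller could process nodes as long as data is available, return to the viewer, and call walker_next() again later; a recursive walk would need its own thread to be paused like that.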
Here is an overview of how the different modules work together. Detailed descriptions of the modules are located in the various hacking-... files.
The main program first processes command line options.
Afterwards, load_page() is called, which first opens the input resource (file or HTTP connection) via init_load(), then loads the page and applies the various layouting passes to it.
After layouting, either the layouted page is dumped using dump() (when --dump option given), or the interactive viewer loop is entered.
In this loop, curses fullscreen mode is first (re)activated, and display() is called to start the pager. The pager returns as soon as one of the following happens: 'q' is typed to quit the program; command mode is entered by typing ':'; a link is followed or a form control activated by typing <return> while some link is active; some URL is displayed via 'u', 'U' or 'c'; or some page from the history is reloaded using 'b', 'f', 'B' etc. The reason is indicated by the return value, which is an "enum Pager_ret".
After the pager returns, fullscreen mode is turned off, and action is taken depending on the return value.
If 'q' was typed, no action is taken; the loop is finished and the program quits.
If ':' was typed, a command is read using readline(). The command is added to history, and then interpreted. Currently the only known commands are ":e" and ":E". The URL is extracted from the command, and load_page() is called to load the desired new file. The URL of the current page is used as base for a relative URL with ":e". With ":E" (and also for ":e", if the current page is internal), no base is used; the URL is always interpreted absolutely.
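A small sketch of the command interpretation, assuming the string returned by readline() does not include the leading ':' (so ":e foo.html" arrives as "e foo.html"). The helper parse_load_command() and its parameters are hypothetical; it only decides where the URL starts and whether the current page URL serves as base.

```c
#include <stddef.h>

/* Hypothetical helper. On success, *url points into cmd and *use_base
 * tells whether the current page URL is to be used as base.
 * Returns 0 for unknown commands. */
int parse_load_command(const char *cmd, int current_is_internal,
                       const char **url, int *use_base)
{
	const char *p;

	if ((cmd[0] != 'e' && cmd[0] != 'E') ||
	    (cmd[1] != ' ' && cmd[1] != '\0'))
		return 0;                  /* neither ":e" nor ":E" */
	p = cmd + 1;
	while (*p == ' ')
		p++;                       /* skip blanks before the URL */
	*url = p;
	/* ":e" resolves relative to the current page unless that page is
	 * internal; ":E" always interprets the URL absolutely. */
	*use_base = cmd[0] == 'e' && !current_is_internal;
	return 1;
}
```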
If a link/form control was activated, the action depends on the link or form element type. For normal links load_page() is used with the current URL as base, just as with ":e". The link URL is extracted from the text item containing the link with get_link(), by help of the "link_list" structure. This process is described under Following Links in hacking-links.*.
Form submit buttons are handled quite similarly. First, get_form_item() (also in hacking-links.*) is used to retrieve the item in the item tree which represents the form in which the button resides. This is used to get the form's submit address ("action"). Having this, load_page() is used to do the submit; the form item is passed as the "form" argument. (It is passed on to init_load() and http_init_load(); see hacking-load.*.) This tells init_load() both that a form is to be submitted, and where to find the form data. init_load() (or actually http_init_load()) then takes care of extracting the form data (using start_form() and form_next() from forms.c, also described in hacking-links.*) and submitting it to the server.
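The text doesn't give the signatures of start_form() and form_next(), so this sketch uses a plain array of name/value pairs (struct form_field, hypothetical) in their place. It shows the standard application/x-www-form-urlencoded encoding HTML forms use by default: ASCII letters, digits and "*-._" stay as they are, a space becomes '+', everything else becomes %XX. Whether netrik's http_init_load() produces exactly this output is an assumption.

```c
#include <stddef.h>

struct form_field { const char *name; const char *value; };

/* Append one character, keeping room for the terminating NUL. */
static int put(char *out, size_t size, size_t *len, char c)
{
	if (*len + 1 >= size)
		return 0;
	out[(*len)++] = c;
	return 1;
}

/* Append s in urlencoded form. */
static int put_encoded(char *out, size_t size, size_t *len, const char *s)
{
	static const char hex[] = "0123456789ABCDEF";

	for (; *s != '\0'; s++) {
		unsigned char c = (unsigned char)*s;
		int ok;

		if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
		    (c >= '0' && c <= '9') ||
		    c == '*' || c == '-' || c == '.' || c == '_')
			ok = put(out, size, len, (char)c);
		else if (c == ' ')
			ok = put(out, size, len, '+');
		else
			ok = put(out, size, len, '%') &&
			     put(out, size, len, hex[c >> 4]) &&
			     put(out, size, len, hex[c & 15]);
		if (!ok)
			return 0;
	}
	return 1;
}

/* Encode as "name1=value1&name2=value2...".
 * Returns 1 on success, 0 if out is too small (out is then truncated). */
int form_urlencode(const struct form_field *f, int n, char *out, size_t size)
{
	size_t len = 0;

	if (size == 0)
		return 0;
	for (int i = 0; i < n; i++) {
		if ((i > 0 && !put(out, size, &len, '&')) ||
		    !put_encoded(out, size, &len, f[i].name) ||
		    !put(out, size, &len, '=') ||
		    !put_encoded(out, size, &len, f[i].value)) {
			out[len] = '\0';
			return 0;
		}
	}
	out[len] = '\0';
	return 1;
}
```

For example, the fields q="a b" and x="1&2" encode to "q=a+b&x=1%262". For a GET submit this string would be appended to the action URL after a '?'; for POST it would form the request body.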
For form controls, the form value ("link->value" or "link->enabled", depending on the form control type) is adjusted appropriately. (Text/password input fields, <select> fields, radio buttons and checkboxes are implemented now.)
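A sketch of how activating a control might adjust its value. The struct form_control type is hypothetical; only the two fields "value" and "enabled" mirror the ones named above. Radio buttons follow the usual HTML rule that enabling one disables the others with the same name; choosing a <select> option is left out.

```c
#include <string.h>

enum form_type { FORM_TEXT, FORM_PASS, FORM_CHECKBOX, FORM_RADIO, FORM_SELECT };

/* Hypothetical form control descriptor. */
struct form_control {
	enum form_type type;
	const char *name;
	int enabled;          /* checkboxes and radio buttons */
	const char *value;    /* text and password fields */
};

/* Activate control i among the n controls of one form.
 * new_text is only used for text/password fields. */
void activate_control(struct form_control *c, int n, int i, const char *new_text)
{
	switch (c[i].type) {
	case FORM_CHECKBOX:
		c[i].enabled = !c[i].enabled;   /* toggle */
		break;
	case FORM_RADIO:
		/* exactly one button of a group (same name) is enabled */
		for (int j = 0; j < n; j++)
			if (c[j].type == FORM_RADIO && strcmp(c[j].name, c[i].name) == 0)
				c[j].enabled = (j == i);
		break;
	case FORM_TEXT:
	case FORM_PASS:
		c[i].value = new_text;          /* text entered by the user */
		break;
	case FORM_SELECT:
		break;                          /* not covered by this sketch */
	}
}
```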
If 'u' was typed, the link URL is retrieved the same way, and printed to the screen; 'c' prints the "full_url" component of the current page URL.
'U' is similar to 'u'. Instead of printing the (relative) link URL directly, it merges it with the current page URL, thus getting the same absolute target URL which would be used if the link was actually followed.
If a history command was given, load_page() is called with the (split) URL taken from the desired "page_list" entry. We know which entry to take from "page_list.pos", which display() sets to the new value before returning. If the history entry refers to the same HTML page as the one displayed up to now, the current page descriptor is passed as "reference". To determine whether it is the same page, we need to check whether all entries between the old and the new one (regardless of whether the new one is before or after the old one in history) have "local" URLs, i.e. whether the newer of the two entries was created only by following links to local anchors from the older one.
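The same-page check can be sketched as follows. The struct page_entry type and its "local" flag (meaning "this entry was created by following a link to a local anchor from the previous one") are hypothetical stand-ins for the real page_list entries. Every entry after the older position, up to and including the newer one, has to be local; the older entry itself doesn't matter.

```c
/* Hypothetical history entry. */
struct page_entry {
	const char *url;
	int local;   /* created by following a local anchor from the previous entry */
};

/* Nonzero if history positions old_pos and new_pos show the same HTML page. */
int same_page(const struct page_entry *list, int old_pos, int new_pos)
{
	int lo = old_pos < new_pos ? old_pos : new_pos;
	int hi = old_pos < new_pos ? new_pos : old_pos;

	for (int i = lo + 1; i <= hi; i++)
		if (!list[i].local)
			return 0;   /* some entry in between loaded a new page */
	return 1;
}
```

With the history "a.html", "a.html#x", "a.html#y", "b.html", moving between the first three entries in either direction reuses the current page, while any move across "b.html" requires loading.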
Afterwards the loop is repeated, viewing the new file and waiting for another command.
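A minimal sketch of how the pager's return value might drive the loop described above. Only the type name "enum Pager_ret" comes from the text; the enumerator names and handle_pager_ret() are hypothetical, and the actual actions are only indicated by comments.

```c
/* Hypothetical enumerators; one per reason for display() to return. */
enum Pager_ret {
	RET_QUIT,          /* 'q' */
	RET_COMMAND,       /* ':' */
	RET_ACTIVATE,      /* <return> on the active link or form control */
	RET_LINK_URL,      /* 'u' */
	RET_ABSOLUTE_URL,  /* 'U' */
	RET_PAGE_URL,      /* 'c' */
	RET_HISTORY        /* 'b', 'f', 'B', ... */
};

/* Called after display() returned and fullscreen mode was turned off.
 * Returns nonzero if the viewer loop should run again. */
int handle_pager_ret(enum Pager_ret ret)
{
	switch (ret) {
	case RET_QUIT:
		return 0;   /* no action; leave the loop and quit */
	case RET_COMMAND:
		/* readline(), add to history, interpret ":e"/":E" with load_page() */
		break;
	case RET_ACTIVATE:
		/* follow the link, submit the form, or adjust the control's value */
		break;
	case RET_LINK_URL:
	case RET_ABSOLUTE_URL:
	case RET_PAGE_URL:
		/* print the link URL (merged with the page URL for 'U'),
		   or the current page's full_url for 'c' */
		break;
	case RET_HISTORY:
		/* load_page() for the entry at page_list.pos, maybe with "reference" */
		break;
	}
	return 1;       /* view the (possibly new) page again */
}
```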
The file loading module consists of the files load.[ch], http.[ch], http-parse-header.[ch], and the URL handling functions in url.[ch]. Some functions from forms.c are also needed when submitting HTML forms. The main loading functions (in load.c) are called from load_page(), as part of the layouting process.
Loading is initialized by a call to init_load() with the desired URL as argument. A base URL is also passed, which is merged with the target URL to create the effective URL if the target URL is a relative one (following links etc.). init_load() then decides whether it is a local file or an HTTP URL, and initializes the loading (opens the file or establishes the HTTP connection).
The loading itself is done by calling load(). This function fills a buffer, which then can be processed. After the buffer is processed, load() has to be called again to load the next chunk.
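A strongly simplified sketch of merging a target URL with a base URL; merge_url() and has_scheme() are hypothetical and not the functions in url.c. It handles absolute targets, fragment-only targets ("#name"), host-relative targets ("/path"), and plain relative names; ".." and "." segments, queries and other parts of full RFC 3986 reference resolution are deliberately left out.

```c
#include <stdio.h>
#include <string.h>

/* Nonzero if url starts with a scheme such as "http:" or "file:". */
static int has_scheme(const char *url)
{
	const char *p = url;

	if (!((*p >= 'a' && *p <= 'z') || (*p >= 'A' && *p <= 'Z')))
		return 0;
	while ((*p >= 'a' && *p <= 'z') || (*p >= 'A' && *p <= 'Z') ||
	       (*p >= '0' && *p <= '9') || *p == '+' || *p == '-' || *p == '.')
		p++;
	return *p == ':';
}

/* Writes the effective URL to out; returns 0 if out is too small.
 * base may be NULL (no base, e.g. for ":E"). */
int merge_url(const char *base, const char *target, char *out, size_t size)
{
	const char *prefix = "";   /* the part of base that is kept */
	size_t keep = 0;           /* how many characters of it */
	const char *glue = "";     /* "/" if base has no path at all */
	int n;

	if (base != NULL && !has_scheme(target)) {
		const char *sep = strstr(base, "://");
		/* start of the path; plain file names have no "://" */
		const char *path = sep != NULL ? strchr(sep + 3, '/') : base;

		prefix = base;
		if (target[0] == '#') {
			/* local anchor: whole base minus any old fragment */
			const char *hash = strchr(base, '#');
			keep = hash != NULL ? (size_t)(hash - base) : strlen(base);
		} else if (path == NULL) {
			keep = strlen(base);        /* e.g. "http://host" */
			if (target[0] != '/')
				glue = "/";
		} else if (target[0] == '/') {
			keep = (size_t)(path - base);   /* scheme and host only */
		} else {
			/* replace the last path segment */
			const char *slash = strrchr(path, '/');
			keep = slash != NULL ? (size_t)(slash - base) + 1 : 0;
		}
	}
	n = snprintf(out, size, "%.*s%s%s", (int)keep, prefix, glue, target);
	return n >= 0 && (size_t)n < size;
}
```

For example, "c.html" relative to "http://h/a/b.html" gives "http://h/a/c.html", and "#sec" gives "http://h/a/b.html#sec". The same kind of merge is what 'U' needs to turn a relative link URL into the absolute target.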
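A sketch of the chunked loading pattern for a local file. The struct resource type and load_chunk() are hypothetical (named differently from load() on purpose, since the real signature isn't given here), and the tiny CHUNK_SIZE only serves to force several rounds. The consumer processes each chunk completely before asking for the next one, so only one chunk is ever held in the buffer.

```c
#include <stdio.h>

#define CHUNK_SIZE 16   /* tiny on purpose, to force several chunks */

/* Hypothetical stand-in for the resource init_load() sets up. */
struct resource {
	FILE *handle;
	char buf[CHUNK_SIZE];
	size_t len;   /* valid bytes in buf */
	int eof;      /* set once nothing more could be read */
};

/* Refill the buffer with the next chunk; returns the number of bytes read.
 * 0 means end of input (this sketch doesn't distinguish read errors). */
size_t load_chunk(struct resource *res)
{
	res->len = fread(res->buf, 1, sizeof res->buf, res->handle);
	if (res->len == 0)
		res->eof = 1;
	return res->len;
}

/* Example consumer: counts lines, one chunk at a time. */
unsigned long count_lines(struct resource *res)
{
	unsigned long lines = 0;

	while (load_chunk(res) > 0)
		for (size_t i = 0; i < res->len; i++)
			if (res->buf[i] == '\n')
				lines++;
	return lines;
}
```

In netrik the consumer is parse_syntax(), which calls load() itself whenever it has used up the buffer.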
The loading module is described in detail in hacking-load.*.
The layout engine consists of the files parse-syntax.c, syntax.h, facilities.c, dump-tree.c, parse_elements.c, sgml.c, parse-struct.c, items.h, pre-render.c, render.c, render.h, layout.c, and layout.h.
As mentioned before, layouting is done in several passes.
The first passes are parsing the syntax (parse_syntax()), looking up the element and attribute names (parse_elements()), (optionally) fixing a broken tree created by SGML documents (sgml_rework()), interpreting the elements (parse_struct()), and assigning positions and sizes to all items of the output page (pre_render()).
All these passes are applied from load_page(), immediately after the file is opened; load() is called from within parse_syntax().
After these processing steps, the page is ready for rendering. The actual rendering is done just in time by render(), which is called from the viewer, every time some new region of the output page needs to be displayed.
If a "reference" page is passed to load_page(), none of the processing passes is applied; the necessary data structures are taken from the referenced page instead. (This is used when following a link to some local anchor, which doesn't require loading a new page.)
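A sketch of the "reference" shortcut. Everything here is hypothetical: struct layout_data stands for all structures the passes produce (item tree, link list, ...), run_layout_passes() stands for parse_syntax() through pre_render(), and the reference count is an assumption of this sketch, since the text doesn't say how netrik manages the shared data.

```c
#include <stdlib.h>

/* Hypothetical container for the results of the layouting passes. */
struct layout_data {
	int refcount;
};

struct page {
	const char *url;              /* may differ from the reference's in the anchor */
	struct layout_data *layout;
};

/* Stand-in for the real passes. */
static struct layout_data *run_layout_passes(void)
{
	struct layout_data *d = malloc(sizeof *d);

	if (d != NULL)
		d->refcount = 1;
	return d;
}

/* Returns 0 on failure. */
int open_page(struct page *p, const char *url, const struct page *reference)
{
	p->url = url;
	if (reference != NULL) {
		/* same HTML page: no loading, no passes, just share the data */
		p->layout = reference->layout;
		p->layout->refcount++;
		return 1;
	}
	p->layout = run_layout_passes();
	return p->layout != NULL;
}

void close_page(struct page *p)
{
	if (p->layout != NULL && --p->layout->refcount == 0)
		free(p->layout);
	p->layout = NULL;
}
```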
The layout engine is described in detail in hacking-layout.*.
The viewer module consists of the files pager.c and pager.h.
The pager uses curses to display the layouted page in an interactive manner. (Presently, only scrolling, simple link selection, and various page history commands are implemented.)
Every time the visible output page region changes, render() is called to display the new region.
The viewer module is described in detail in hacking-pager.*.
Hyperlink (and anchor) handling is not a module in the classical sense; there is no central place and no source files specific to it. There are the links.c and links.h files, but they contain only some helper functions; most of the code necessary for handling links is distributed among almost all of the other modules.
The layout engine needs to extract all links and anchors while parsing the page, and assign coordinates to them while pre-rendering. The pager needs to highlight the selected link; provide commands for selecting and following links; and inform the main program about the link following. The main program needs to initiate loading of the link. The file loader needs to construct the target URL from the current page URL and the link URL.
All necessary steps are described in hacking-links.*, or else pointers to the specific module documentation are given.
HTML forms are very similar to links, and mostly they are handled together. Some special handling is required at certain places, though. These are mentioned in hacking-links.*. Some additional functions necessary for form handling reside in forms.c; these are also covered in hacking-links.*.
render.c uses the curses library for output. screen.c contains a couple of helper functions for handling curses, both in raw and in full screen mode.
init_curses() is responsible for initializing curses in raw mode, and getting some control sequences for setting colors etc.
start_curses() initializes curses in full screen mode, and sets the color pairs. It also initializes several curses settings.
For color pairs having black background or white foreground, the respective parts aren't set; the terminal default is used instead. (Unless --force-colors is used.) This is to produce the expected results in an xterm (or other terminal with bright background and black text).
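A sketch of the pair setup rule. How netrik actually leaves a pair's part "unset" isn't stated; this assumes ncurses' use_default_colors() extension, after which -1 in init_pair() selects the terminal's default color. BLACK and WHITE carry the values of curses' COLOR_BLACK and COLOR_WHITE; pair_for() and struct pair are hypothetical.

```c
#define BLACK 0             /* value of COLOR_BLACK */
#define WHITE 7             /* value of COLOR_WHITE */
#define DEFAULT_COLOR (-1)  /* terminal default, after use_default_colors() */

struct pair { short fg, bg; };

/* Colors to hand to init_pair() for the given foreground/background. */
struct pair pair_for(short fg, short bg, int force_colors)
{
	struct pair p;

	/* keep the terminal's own colors unless --force-colors */
	p.fg = (fg == WHITE && !force_colors) ? DEFAULT_COLOR : fg;
	p.bg = (bg == BLACK && !force_colors) ? DEFAULT_COLOR : bg;
	return p;
}
```

The result would then be used as init_pair(n, p.fg, p.bg), so on an xterm with dark text on a light background, "white on black" text simply shows up in the terminal's normal colors.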
set_color() sets the text attributes in full screen mode. The foreground color is set, and also the "bold" attribute if it is a light color.
If the background color is bright, the "standout" attribute is set -- this seems to be the only way to get a bright background in xterm. The drawback is that the foreground is always bold in this mode in an xterm. As standout implies reverse video, foreground and background colors have to be swapped in this mode.
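The attribute decision of set_color() can be sketched as a pure function. The numbering (0..7 for the eight curses colors, 8..15 for their light variants) and the names struct attrs and resolve_color() are assumptions of this sketch. A light foreground turns on bold; a bright background turns on standout, and because standout implies reverse video, the two colors stored in the pair are swapped.

```c
/* Hypothetical result of the color decision. */
struct attrs {
	int pair_fg, pair_bg;   /* colors for the color pair, 0..7 */
	int bold;
	int standout;
};

/* fg, bg: 0..7 normal colors, 8..15 light variants (assumed numbering). */
struct attrs resolve_color(int fg, int bg)
{
	struct attrs a;

	a.pair_fg = fg & 7;
	a.pair_bg = bg & 7;
	a.bold = fg >= 8;        /* light foreground: use "bold" */
	a.standout = 0;
	if (bg >= 8) {
		/* bright background is only possible via standout, which
		   implies reverse video, so swap the pair's colors */
		int tmp = a.pair_fg;
		a.pair_fg = a.pair_bg;
		a.pair_bg = tmp;
		a.standout = 1;
	}
	return a;
}
```

The caller would then combine the matching COLOR_PAIR() with A_BOLD and A_STANDOUT as indicated by the flags.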
set_color_raw() does the same in raw mode. As raw mode has no color pairs to be set during initialization, handling of default colors for white foreground/black background has to be done here. The respective parts simply aren't set at all, thus keeping the terminal default colors.
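In raw mode the default-color rule has to be applied when emitting the control sequences. init_curses() obtains those sequences for the actual terminal; this sketch instead hardcodes the common ANSI (ECMA-48) SGR codes, which is an assumption, as are the name raw_color_sequence() and the color numbering (foreground 0..15 with 8..15 light, background 0..7 only).

```c
#include <stdio.h>
#include <string.h>

#define BLACK 0
#define WHITE 7

/* Build an SGR sequence for the given colors in out.
 * Returns 0 if out is too small. */
int raw_color_sequence(int fg, int bg, int force_colors, char *out, size_t size)
{
	char fg_part[8] = "", bg_part[8] = "";
	int n;

	/* white foreground / black background: emit no color at all,
	   so the terminal's own default colors stay in effect */
	if (fg != WHITE || force_colors)
		snprintf(fg_part, sizeof fg_part, ";3%d", fg & 7);
	if (bg != BLACK || force_colors)
		snprintf(bg_part, sizeof bg_part, ";4%d", bg);
	/* "0" first resets older attributes; ";1" is bold for light colors */
	n = snprintf(out, size, "\033[0%s%s%sm", fg >= 8 ? ";1" : "",
	             fg_part, bg_part);
	return n >= 0 && (size_t)n < size;
}
```

With this encoding, resetting all attributes as reset_colors_raw() does corresponds to the plain sequence "\033[0m".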
reset_colors_raw() resets all text attributes to their default values.