Thoughts about the sitegen program
D. Walden; 5/2/2005

I tried to generate some discussion about the future of the TPJ sitegen program by sending a long note hinting at options after the first issue was published, but there was no response. Although I am not anxious to modify the sitegen program, I again suggest that now (reasonably soon after issue 2) is the time to discuss the future of the sitegen program, if we are going to make any changes to it (and particularly if the changes are big).

1. The quality of the program
--------------------------

The program works relatively well and, I believe, without errors; but it is not a thing of beauty. I was learning a lot of new stuff about Perl at the same time the requirements for the program were changing from my original image. Thus, if the program is going to change very much or on a regular basis, it needs a significant rewrite.

2. The operational modularity of the current version of the program
----------------------------------------------------------------

How the HTML pages look is mostly defined in a file of HTML templates that can mostly be changed without changing the program itself. My image was that the editorial side of TPJ could change these templates without involving me, but so far that has not happened.

How the TOCs and indexes look is defined in a file for each issue with a name of the format yyyy-i.txt, where yyyy is the year and i is the issue number. The titles and authors from this file are also used at the top of the HTML pages for individual papers. My image was that the editorial side could change this file without involving me, and Lance made changes to this file for issue 2.

How the HTML pages for individual papers look comes from the title and author in the yyyy-i.txt file and from several little files in the paper's directory (e.g., _links.txt, _p.html, _abstract.html).
My image was that the editorial side could construct and change these files for individual papers without involving me, and Lance did lots of this for issue 2.

How the HTML pages linked to from the left column of the home page look comes from a couple of other driver files that the editorial side could change without involving me, but there were no changes between issues 1 and 2.

There is one entry point into the sitegen program. Every time it is run, it regenerates the entire web site.

3. Directions for going forward
---------------------------

a. Do nothing. Accept that the web site looks "good enough," and the program works pretty well. With more practice, the editorial side will learn to use more capabilities that are already built into the program (as listed in section 2). [This is easiest for me.]

b. Don't do much. Somehow force a couple of the most needed capabilities into the current program without bothering to do a major cleanup first. [For instance, fix the program so any number of TOC categories can be defined by the editorial side in the yyyy-i.txt file rather than having only the current three built-in ones (Notices, Articles, and Columns) -- this would be a pretty easy change. Or, fix the program so it handles subtitles nicely -- this would be a bit tricky given the current structure of the program, my home-built HTML template processing system, and my desire for visual details to be specified in the HTML templates rather than in the program flow.] Then live with the program going forward.

c. Do a or b now, and sometime in the future do a major rewrite if/when we feel the need. [This might be very easy for me because I may have palmed off responsibility for the program on someone else by then.]

d. Rewrite the program now. This should probably be done if we are going to keep changing the program going forward or want more modular operation of the program than we have now.
   [This would be considerable work that I am not anxious to do, but perhaps it has to be done.] Sections 4, 5, and 6 of this note relate to this option.

e. Stop using the sitegen program for everything but generating the indexes across all issues. This would gain the flexibility of being able to fine-tune the look of the web site without whatever limitations a sitegen program will inevitably impose. [It would be relatively easy for me to change the existing program to only generate the indexes in their current format.]

4. The logical (not actual) separate parts of the sitegen program
--------------------------------------------------------------

a. Generating the home page, pracjourn/index.html. This consists of generating the left column between the standard header and footer (and the non-index pages the left column links to), generating the current issue TOC, and applying the standard header and footer. The same subroutine that generates the TOCs for the archived issues also generates the home page TOC, but with a parameter that specifies "Current Issue". The TOC is generated based on the information in the issue's yyyy-i.txt file.

b. Generating the indexes. This consists of generating the archive page (pracjourn/archive.html) across all issues, the title and author indexes (pracjourn/titleindex/index.html and pracjourn/authorindex/index.html) across all issues and sorted appropriately, and the bibtex index (pracjourn/pracjourn.bib) across all issues. The standard header and footer are applied to each of these indexes.

c. TOCs for each issue in the archive of back issues (e.g., pracjourn/yyyy-i/index.html). The standard header and footer are applied to each TOC, and all the TOCs are generated with the same subroutine that also generates the home page TOC.

d. HTML page for each paper in an issue (e.g., pracjourn/2005-2/walden/index.html, pracjourn/2005-2/flom/index.html, etc.).
   Currently, the HTML pages for all papers in an issue are generated at the same time the issue's TOC is generated.

Again, at present all of the above parts of the sitegen program are run each time the sitegen program is run.

5. Some ways the execution of the sitegen program might be partitioned,
   i.e., different entry points/parameters
-------------------------------------------------------

a. Generate the HTML page for one paper in an issue.

b. Generate the TOC for an issue and HTML pages for all papers in the issue.

c. Do b over all issues.

d. Generate home page and indexes, or do this and c.

Of course, there are other options or more general ways to parameterize the call to the sitegen program.

6. Some possible changes to the driver files
-----------------------------------------

Section 2 (at least implicitly) lists the various driver files for the TOCs and HTML pages for individual papers.

A problem with the current setup is that the part of the program that generates the HTML pages for the individual papers needs to get the title and author out of the issue's driver file and the rest of the needed information from multiple files in the paper's directory. This means the editorial side cannot just work with one driver file to configure the HTML page for a paper.

The program could be changed so all the needed information (including title and author) is in one driver file in the paper's directory. Then the generation of the HTML page for one paper could be very independent of everything else.

The issue driver file would then be changed so it only has a list of the TOC categories and the paper directories. For example, the driver file for issue 2 (pracjourn/2005-2/2005-2.txt) might contain

    issue|2005|2
    category|Notices
    piece|editor
    piece|readers
    piece|thiele
    piece|practex05
    category|Articles
    piece|flom
    piece|schremmer
    ...
    category|Columns
    piece|null
    piece|peter
    ...
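
As a rough illustration of how little is needed to read the proposed issue driver format, here is a minimal sketch. It is written in Python only for illustration (sitegen itself is in Perl), and the names Issue and parse_issue_driver are hypothetical, not taken from the program. It assumes that every line is a keyword followed by "|"-separated values and that an "issue" line comes first.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Issue:
    year: int
    number: int
    # Category name -> paper directory names, both kept in file order.
    categories: Dict[str, List[str]] = field(default_factory=dict)


def parse_issue_driver(text: str) -> Issue:
    """Parse the proposed 'keyword|value' issue driver format (hypothetical helper)."""
    issue = None
    current = None  # name of the category currently being filled
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        keyword, _, rest = line.partition("|")
        if keyword == "issue":
            year, number = rest.split("|")
            issue = Issue(int(year), int(number))
        elif keyword == "category":
            if issue is None:
                raise ValueError(f"line {lineno}: category before issue line")
            current = rest
            issue.categories.setdefault(current, [])
        elif keyword == "piece":
            if issue is None or current is None:
                raise ValueError(f"line {lineno}: piece outside a category")
            issue.categories[current].append(rest)
        else:
            raise ValueError(f"line {lineno}: unknown keyword {keyword!r}")
    if issue is None:
        raise ValueError("no issue line found")
    return issue


if __name__ == "__main__":
    sample = "issue|2005|2\ncategory|Notices\npiece|editor\ncategory|Articles\npiece|flom\n"
    parsed = parse_issue_driver(sample)
    for name, pieces in parsed.categories.items():
        print(name, pieces)
```

Because categories are simply collected as they appear, a reader along these lines would also allow any number of TOC categories rather than the three built-in ones.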

The TOC generation subroutine would then have to look in a paper's directory for its title and author, but that would not be hard.

Finally, we could add a line near the top of issue driver files that specifies what versions of subroutines to call to generate the HTML page for an issue, to generate the TOC for an issue, and to unpack the data in the driver file for a paper (this last is most important). By doing this, the program could have the option of using different driver file formats and routines to process them to help us deal with backward compatibility problems.

[A complication: Today we have three versions of the title of a paper (for HTML display, for sorting, and for use in the .bib index), and our thought was to also have three versions of author names whenever that became necessary. If we put the authors and titles in the driver files for the individual papers, then it would make sense for the people on the editorial side to provide, as necessary, all three formats. An alternative would be to only have the HTML display version of the title and author in the individual paper driver file, and include the other two, as necessary, in the issue's driver file for use in the index generation. I prefer the former approach.]
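
The version-line idea amounts to a small dispatch table. The sketch below is in Python for illustration only (sitegen itself is in Perl), and every name in it is an assumption: the "format|N" line, the field names title, title_sort, and title_bib, and the two unpacking routines are hypothetical. It also assumes the version line, if present, is the first non-blank line of the driver file.

```python
from typing import Callable, Dict, List

Fields = Dict[str, str]


def unpack_paper_v1(lines: List[str]) -> Fields:
    """Hypothetical version 1: plain 'key|value' lines."""
    fields: Fields = {}
    for line in lines:
        key, _, value = line.strip().partition("|")
        if key:
            fields[key] = value
    return fields


def unpack_paper_v2(lines: List[str]) -> Fields:
    """Hypothetical version 2: as v1, plus optional sort and .bib titles."""
    fields = unpack_paper_v1(lines)
    # The sorting and .bib variants fall back to the display title, so the
    # editorial side supplies them only when they actually differ.
    display = fields.get("title", "")
    fields.setdefault("title_sort", display)
    fields.setdefault("title_bib", display)
    return fields


# Version string -> routine that unpacks a paper's driver file.
UNPACKERS: Dict[str, Callable[[List[str]], Fields]] = {
    "1": unpack_paper_v1,
    "2": unpack_paper_v2,
}
DEFAULT_VERSION = "1"  # used for older driver files with no version line


def select_unpacker(driver_text: str) -> Callable[[List[str]], Fields]:
    """Pick the unpacker named by a leading 'format|N' line, else the default."""
    for raw in driver_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        keyword, _, version = line.partition("|")
        if keyword == "format":
            if version not in UNPACKERS:
                raise ValueError(f"unknown driver format version {version!r}")
            return UNPACKERS[version]
        break  # first non-blank line is not a version line
    return UNPACKERS[DEFAULT_VERSION]
```

Older driver files with no version line keep working through the default entry, which is the backward compatibility the note is after. The fallback in the second routine corresponds to the former approach in the complication above, with all three title forms living in the paper's own driver file.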