
Scripting 2: Running the Scripts

Speaker: Volker RW Schaa (GSI Helmholtzzentrum für Schwerionenforschung GmbH)

This is the second part of the scripting tutorial on JPSP (JACoW Proceedings Script Package).
Part 1 explains which software packages are needed and where to install them from, and shows
the JPSP configuration file. A completely filled-out configuration file is the prerequisite for
all scripts, reports, and tools.

Once all software packages are installed, we have to make the required adaptations to
the conference configuration file conference.config; then we are ready to run the scripts.
To speed things up I have prepared a special zip file with everything set up for our demo. But first,
an introduction to which scripts exist, what they are for, and when they are called.

Outline of JPSP Scripts and Data Flow

JPSP Scripts and Data Flow

This outline lists the JPSP scripts and the batch files they generate; the graphic shows their
structure and data flow.

  • SRead (spmsread.pl in the graphic) fetches the XML from the
    SPMS instance and makes it available to the scripts
  • SBatch (spmsbatch.pl) is the main workhorse: it processes the XML and generates
    web pages, batch files, and control files
  • GToC (generate_toc.pl) generates a table of contents
    if the SPMS procedure Generate TOC Values cannot be used
  • SReA (spmsreadrearrange.pl) is used in the abstract booklet production of the
    big conferences (IPAC etc.) to move the poster sessions to the end, after all oral sessions.

Three Check and Report Scripts

  • ScanKey (scan-keywords.pl) searches the PDFs for keywords and
    generates a report about broken encoding
  • PageChk (pagecheck.pl) checks the PDFs for all kinds of errors (page size,
    font embedding, etc.)
  • BoxChk (boxcheck.pl) checks the PDFs for tearing boxes by counting the number
    of text boxes per page and raising an alarm when this number appears too high.
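
The counting idea behind BoxChk can be illustrated with a minimal Python sketch. It is not the code of boxcheck.pl: the function name pages_over_limit and the limit of 1000 boxes are assumptions chosen for the example, and the box counts per page are taken as given rather than extracted from a PDF.

```python
def pages_over_limit(boxes_per_page, limit):
    """Return the 1-based page numbers whose text-box count exceeds `limit`.

    boxes_per_page: list of text-box counts, one entry per page.
    limit: hypothetical threshold above which a page is flagged.
    """
    return [page
            for page, count in enumerate(boxes_per_page, start=1)
            if count > limit]


if __name__ == "__main__":
    # Three pages; only the second one has a suspiciously high count.
    print(pages_over_limit([120, 4500, 300], limit=1000))
```

A suitable limit would have to be found by experience; the sketch only shows the per-page comparison.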

Script Generated Batch Files

In the graphic, all blue boxes are script-generated batch files, which have to be run to execute their tasks:

  • page_cnt (spms_corr_pages.bat) corrects the number of pages in SPMS for the paper PDFs
  • get_pdf (<xxxx>wget.bat) downloads the respective PDFs from the file server
    (<xxxx> stands for paper, talk, poster, all, or pdf)
  • wrap_TeX (gen_texpdf.bat) generates the proceedings PDFs with all
    embedded or imprinted information
  • ATC_TeX (gen_texpdf.bat) generates check sheets to make the comparison
    between SPMS and paper information easier during Author Title Check
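
The naming pattern of the get_pdf batch files can be written down as a small Python sketch. The function name wget_batch_name is an assumption for illustration; the allowed prefixes are the ones listed above.

```python
# Prefixes listed for <xxxx>wget.bat in the get_pdf entry above.
WGET_KINDS = ("paper", "talk", "poster", "all", "pdf")


def wget_batch_name(kind):
    """Build the batch file name <kind>wget.bat for one of the listed kinds."""
    if kind not in WGET_KINDS:
        raise ValueError(f"unknown download kind: {kind!r}")
    return f"{kind}wget.bat"


if __name__ == "__main__":
    for kind in WGET_KINDS:
        print(wget_batch_name(kind))
```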

Diagram for JPSP Scripts and Data Flow

spmsread.pl — Download the Conference XML from SPMS

Our demo configuration file conference.config is set to the SAP2017 conference,
so the output in this tutorial will show sessions and data from that conference.

The following script downloads the conference XML:

 > spmsread.pl [clean]

spmsread.pl downloads all XML data from SPMS in chunks of one session per data transfer.
The session XMLs are then merged into one (conference) XML file. If an old conference XML file
already exists, it is saved as a backup under the original name with the date and time of its
creation appended.

To allow a selective update of changed sessions, the session XMLs are kept; a session is only
downloaded again after its XML file has been deleted. The option clean overrides this: all
session XML files are updated and a new conference XML file is created.
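
The update rule can be sketched in a few lines of Python. This is an illustration only, not the Perl code of spmsread.pl; the function name sessions_to_download and the storage of each session as <code>.xml in one directory are assumptions made for the example.

```python
from pathlib import Path


def sessions_to_download(session_codes, xml_dir, clean=False):
    """Return the session codes whose XML has to be fetched (again).

    Assumption: each session XML is stored as <xml_dir>/<code>.xml.
    """
    xml_dir = Path(xml_dir)
    if clean:
        # 'clean' overrides the kept files: every session is fetched again.
        return list(session_codes)
    # Without 'clean', only sessions without a kept XML file are fetched.
    return [code for code in session_codes
            if not (xml_dir / f"{code}.xml").exists()]
```

Deleting a single session file and running without clean would thus refresh only that session, while clean refreshes all of them.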

spmsread.pl — Download Log (Part 1)

0001 D:\SAP2017>spmsread-171105.pl clean
0002 I found 1 command-line argument(s).
0003 clean
0005  This is version  6.0 of 05 Nov 2017 - vrws
0006  config file 'conference.config' found!
0008 >>
0009 >> reading from URL: https://spms.kek.jp/pls/sap2017
0010 >>
0012  xml file from config is: 'spms.xml'
0013 File ->./XML/spms.xml<- already exists!
0014 File ->./XML/spms.xml<- will be saved as ->./XML/spms-20171120-014736.xml<-
0015    1 file(s) copied.

Lines 2+3 signal that the script runs in overwrite mode (clean).
Line 5 shows the version of the script.
Line 9 shows the SPMS instance from which the data are read.
Lines 13-15 show that an old spms.xml file was found and saved as a backup.
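
The backup name in log line 14 has the form <name>-<YYYYMMDD>-<HHMMSS>.xml. A minimal Python sketch of that pattern follows; the function name backup_name is an assumption, and the time stamp is passed in rather than read from the file system.

```python
from datetime import datetime
from pathlib import Path


def backup_name(xml_path, stamp):
    """Append a date-time stamp to the file name, keeping the extension."""
    path = Path(xml_path)
    suffix = stamp.strftime("%Y%m%d-%H%M%S")
    return path.with_name(f"{path.stem}-{suffix}{path.suffix}")


if __name__ == "__main__":
    # Reproduces the name pattern seen in log line 14.
    print(backup_name("./XML/spms.xml", datetime(2017, 11, 20, 1, 47, 36)))
```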

spmsread.pl — Download Log (Part 2)

0016   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
0017                                  Dload  Upload   Total   Spent    Left  Speed
0018 100  1190  100  1190    0     0   1190      0  0:00:01  0:00:01 --:--:--   939
0020 basic spms file './XML/spms_summary.xml' found!
0021 -----------> loading xml file for session MOAH
0022   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
0023                                  Dload  Upload   Total   Spent    Left  Speed
0024 100  9717  100  9717    0     0   9717      0  0:00:01  0:00:01 --:--:--  7675
  : more session downloads shown
0068 -----------> loading xml file for session WECH
0069   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
0070                                  Dload  Upload   Total   Spent    Left  Speed
0071 100 39832  100 39832    0     0  39832      0  0:00:01  0:00:01 --:--:-- 25747
0073 elapsed time: 24.11 [s]

Lines 16-20: with spms_summary.xml downloaded, we have a list of all sessions.
Lines 21-71 log the download of the XML files for the sessions found in spms_summary.xml.

In the background all session XMLs are added to spms.xml which will be used for all further script actions.
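
Merging the session XMLs into one file can be pictured with the following Python sketch. It only illustrates the idea and makes no claim about the real SPMS XML structure: the root tag conference and the session elements used in the example are hypothetical.

```python
import xml.etree.ElementTree as ET


def merge_sessions(session_xml_strings, root_tag="conference"):
    """Collect the root elements of all session XMLs under one new root.

    root_tag is a hypothetical name; the real conference XML may differ.
    """
    root = ET.Element(root_tag)
    for text in session_xml_strings:
        root.append(ET.fromstring(text))
    return root


if __name__ == "__main__":
    merged = merge_sessions(['<session code="MOAH"/>',
                             '<session code="WECH"/>'])
    print(ET.tostring(merged, encoding="unicode"))
```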

spmsbatch.pl — Processing the Conference XML

The next step is to execute:

 > spmsbatch.pl 

spmsbatch.pl first reads in the XML data in spms.xml, then generates all kinds
of web pages (web_html), batch files (wrap_TeX, ATC_TeX, and get_pdf), and a number of
control files for other tasks. The number and kind of generated files depend on the settings in
the config file and on the presence of files in conference directories such as papers, talks,
and posters, of the keywords file, and of others.

Settings & Processing Stage

Depending on the settings in config and on the processing stage,
spmsbatch.pl generates different amounts of reports, updates, and web pages:

  1. pre-conference: no papers, talks, or posters have been uploaded, so we only get web_html
    and some TeX files used to generate the Abstract Booklet (not shown in the figure)
  2. during conference: more and more paper PDF files become available and are downloaded (using
    get_pdf) to generate the (daily conference) status and error reports, and the list for
    Author-Title checks (ATC_TeX)
  3. Pre-Press Release: most of the paper PDFs have been uploaded and QAed, the talks are
    processed (and posters uploaded). Now, in addition to the above, the "wrapped" papers
    (carrying the Pre-Press Release message) have to be generated (wrap_TeX); they will be
    included in the web pages together with talks (and posters) after another run of spmsbatch.pl.
  4. post-conference: this stage is similar to the Pre-Press Release stage, but now all PDFs are
    available and have to be checked; the page count comes into play, and with page_cnt the
    number of pages from the checked PDFs can be corrected in SPMS. The wrapping has to
    be repeated (without the Pre-Press Release message).
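
The four stages can be summarized as a lookup of which outputs come into play when. The Python sketch below uses the stage numbers from the list; treating the stages as cumulative is an assumption made for the example, and the names NEW_IN_STAGE and outputs_up_to are hypothetical.

```python
# Outputs first mentioned in each of the four stages listed above.
NEW_IN_STAGE = {
    1: ["web_html"],
    2: ["get_pdf", "ATC_TeX"],
    3: ["wrap_TeX"],
    4: ["page_cnt"],
}


def outputs_up_to(stage):
    """Return all outputs named for stages 1..stage, in order of first use.

    Assumption: each stage keeps the outputs of the earlier stages.
    """
    if stage not in NEW_IN_STAGE:
        raise ValueError(f"unknown stage: {stage!r}")
    result = []
    for number in range(1, stage + 1):
        result.extend(NEW_IN_STAGE[number])
    return result


if __name__ == "__main__":
    for stage in sorted(NEW_IN_STAGE):
        print(stage, outputs_up_to(stage))
```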

spmsbatch.pl — Log (Part 1)

Now the log of the first run of spmsbatch.pl on the SAP2017 XML:

0001 D:\SAP2017> spmsbatch-171105-doi.pl
0005 you are using ---> v 27.1 - 27.0-05 Nov 2017 vrws
0006  config file 'conference.config' found!
0007            OS platform     = MSWin32
0008            OS platform id  = 1
0010  DOI site not defined in 'conference.config' - will assume it's a subdir of conference
0012 DOI Landing Directory exists.
0013 XML directory for the DOI metadata exists.

2..4 have been removed from the display as they show redefined functions.
5 shows the version and date of the script. This information is also written to the generated html files.
6 signals that »conference.config« was found and processing will continue.
7+8 show the detected operating system, in this case Windows. The OS is important for all generated batch files.
10 the DOI landing site is not explicitly defined, therefore the standard setting is used
12 the DOI sub-directory exists and does not have to be created (./doi)
13 an XML sub-directory for the DOI metadata exists (./DOIXML)

spmsbatch.pl — Log (Part 2)

0015 conference_pub_by JACoW
0016 config file points to './XML/spms.xml'
0017 logo : 428 x 125
0019 Code&Location file found!
0020 [' 0': MOAH ] Oral -- Oral Session 1 -- Main Hall -- posses
0029 [' 9': TUPH ] Poster -- Poster Session 2 -- Main Hall -- posses
0032 ['12': WECH ] Oral -- Oral Session 11 -- Main Hall -- posses
0033 # 13 code & locations

16 the standard XML file is being used (»spms.xml«)
17 gives the logo dimensions used in the web banner.
19 the "Code&Location" file is used to give sessions a specific color in accordance with the synoptic table
20..32 list the sessions found in the XML by name, oral or poster, title, location, and the name of
the color in the synoptic table (default is "posses" for 'Pos'ter 'Ses'sion, which is always grey).
33 the "Code&Location" file has 13 entries, one for each session

spmsbatch.pl — Log (Part 3)

0034 going to close session [' 0': MOAH   ] [ chairs :0] : Oral Session 1
0035 going to close session [' 1': MOBH   ] [ chairs :0] : Oral Session
0036 --> Papercode : moch2 ### Documentname : paper_moch2.doc
0037 --> Papercode : moch3 ### Documentname : paper_doc-moch3.doc
0038 going to close session [' 2': MOCH   ] [ chairs :0] : Oral Session 3
0049 ## lower ######### > chen => Chen
0058 going to close session ['12': WECH   ] [ chairs :0] : Oral Session 11
0060 Conference XML 'SAP2017' closed
0062 #### 0.62 [s] ### end of XML read

34+35+38+58 again a session list. The number of chairpersons is given because "InDiCo" and
"SPMS" distinguish between 'Unchaired sessions' (mostly poster sessions) and sessions with several
chairs (SPMS allows only one chair).
36+37 files have been uploaded whose names do not agree with the naming scheme of paper files for the given
paper id. The most likely cause is that authors uploaded their files using "Other Supporting Files" instead
of the correct "Source File" or "PDF".
49 lists an error in an author's name; in this case the last name is lowercase in SPMS
62 total time of processing the XML
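
The two checks behind log lines 36+37 and 49 can be sketched in Python as follows. This is not the logic of spmsbatch.pl: it assumes that a conforming file is named exactly after its paper code (compared case-insensitively, any extension), and the function names are hypothetical.

```python
from pathlib import PurePath


def matches_paper_code(paper_code, document_name):
    """True if the file name without extension equals the paper code.

    Assumption: the naming scheme is <papercode>.<ext>, ignoring case.
    """
    return PurePath(document_name).stem.lower() == paper_code.lower()


def capitalize_if_lowercase(last_name):
    """Capitalize the first letter of a last name that is all lowercase."""
    if last_name.islower():
        return last_name[:1].upper() + last_name[1:]
    return last_name


if __name__ == "__main__":
    # The two document names reported in log lines 36 and 37.
    print(matches_paper_code("moch2", "paper_moch2.doc"))
    print(matches_paper_code("moch3", "paper_doc-moch3.doc"))
    # The name correction reported in log line 49.
    print(capitalize_if_lowercase("chen"))
```

Names that already contain an uppercase letter are left untouched, so the second function only corrects entries that are entirely lowercase.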

spmsbatch.pl — Log (Part 4)

0065 Number of publishable papers : 56
0068 #### 0.66 [s] ### end checking PDF file presence
0070 KeyI PapI PaperCod Pub Keywords
0071 ---- ---- -------- --- --------
0074 Use of uninitialized value $isbn_str in concatenation (.) or string at D:\SAP2017\spmsbatch-171105-doi.pl line 5131.
0076 Use of uninitialized value $cpx_pos_off in concatenation (.) or string at D:\SAP2017\spmsbatch-171105-doi.pl line 5131.
0083 1#### 1.22 [s] ### end of session generation          (  13 session files )
0085 >>>>>>>>>>>> index.html .... NOT OVERWRITTEN ....

65 the number of publishable papers from SPMS (to be compared in the following processing steps)
70..72 once PDFs have been downloaded, this table shows the paper codes where problems exist (too few keywords found or not publishable)
74..76 these warnings appear only before the PDF files have been downloaded from the file server
83 the "session" html files have been generated (the line shows the number of session files)
85 shows that an "index.html" file was found and that it will not be replaced by the script's auto-generated version of an entry page

spmsbatch.pl — Log (Part 5)

0087 ####   2.93 [s] ### end of author generation         ( 285 author files)
0088 ####   3.06 [s] ### end of institute generation      (  26 institute files)
0089 ####   3.19 [s] ### end of classification generation (   8 class files)
0090 Registration on jacow-org => http ://jacow.org/sap2017/doi/
0091 ####   4.12 [s] ### end of DOI landing pg generation (   0 DOI files)
0094 elapsed time : 4.12 [s]

87-89 show the number of files generated for unique authors, institutes, and classifications.
90 a DOI registration file will be generated for the default DOI location at jacow.org/<conf>/doi/
91 number of files generated for DOIs. As no PDF files have been downloaded yet and the processing
mode is still "Pre-Press Release", no DOI, keyword, or DOI institute files are generated (the last
two are suppressed in this mode)

Typical Procedure for Proceedings Generation (Part 1)

  • SPMS: Generate TOC Values (build Table Of Contents in SPMS)
  • spmsread (Read XML)
  • spmsbatch (Process XML and generate download command files)
    • paperwget (Download paper PDFs)
    • posterwget (Download poster PDFs)
    • talkwget (Download talk PDFs)
  • spmsbatch (Process XML and generate check files)
    • gen_texpdf [ATC] (generate Author-Title-Check pages)
    • changes? (enter changes in SPMS)
      • spmsread (Read updated XML)
      • spmsbatch (Process XML, rebuild check file)
    • pagecheck (run page check on paper PDFs)
      • => pagecheck-result.txt (check results, changes needed?)
    • scan-keywords (run scan for keywords and broken papers)
      • => keyword-count.txt (check keyword count)
      • => broken-papers.txt (check broken papers report)

Typical Procedure for Proceedings Generation (Part 2)

  • changes? (if problems found, make changes in PDF)
    • delete defective PDF
    • upload corrected PDF
    • config: set update mode (switch to update mode in config)
    • spmsbatch (update download script for missing PDFs only)
    • paperwget (re-download corrected papers)
  • spms_corr_pages (update page numbers in SPMS)
  • SPMS: Generate TOC Values (rebuilding Table Of Contents in SPMS)
  • spmsread (Read updated XML)
  • spmsbatch (Process XML, build wrapper command files)
    • gen_texpdf [papers] (wrap papers with conference information, hidden fields, page numbers etc.)
  • spmsbatch (incorporate all updates and PDFs in web site)