
A Futuristic View of the Editorial Process

Disclaimer

  • These slides speak only for the speaker, who has been an author for roughly twenty years and an editor for three years.
  • This is not a plot to replace the human race with Cylons.

Overview

  • What do authors and editors want?
  • A new scheme & the implementation
  • Futuristic future work
  • Summary

As an Author

  • Convenient user interface.
  • Support for various formats.
  • Intuitive visualisation of the differences between revisions.
  • Perhaps some social features?
  • Carrying a poster all the way does not look sexy. Could the host take care of the printing?

As an Editor

  • Standardise inputs from authors.
  • A big screen showing more content, preferably with larger fonts.
  • Manage changes.
  • Avoid duplicated work.
  • Reference formatting is a curse.
  • It would be nice if the host could cover the cost.

Efforts for Now

  • Provide templates for authors and keep our fingers crossed that they follow them.
  • Luckily, with LaTeX we can use a decent editor with regular-expression support, e.g.,
    • ([0-9]+[0-9\.]*)\s+([a-zA-Z]+[^\s]*) ==> \SI{\1}{\2}: wrap a number and the unit separated from it by spaces in \SI{}{}
    • \s+\\cite ==> ~\cite (likewise for \ref): make the space before a citation or reference non-breaking
  • Use poppler to extract information from the PDFs (fonts, bounding boxes, page numbers, ...), e.g.,
    • I always use this script to check the PDF whose file name matches the directory name: Attach:checkfonts
  • Cat Scan Tools.
  • https://ref.ipac19.org/, or go directly to the reference's home page if it is a JACoW publication.
  • Sharp eyes, steady hands, and QA.
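The \cite/\ref rule can also be run as a script instead of by hand in an editor. Below is a minimal sketch in Haskell using only the base library (no regex package). The names `tieRefs`, `tiedCommands` and `startsWithTiedCommand` are hypothetical, and the list of commands is an assumption taken from the regex rule above. Unlike that regex, the sketch only matches whole command names, so `\reflectbox` is left alone; natbib variants such as `\citep` would have to be added to the list.

```haskell
import Data.Char (isAlpha, isSpace)
import Data.List (isPrefixOf)

-- | Commands whose preceding space should become a non-breaking tie (~).
--   Assumption: only \cite and \ref, as in the editor rule above.
tiedCommands :: [String]
tiedCommands = ["\\cite", "\\ref"]

-- | True if the text starts with one of the tied commands as a complete
--   command name: "\ref{f1}" matches, "\reflectbox{x}" does not.
startsWithTiedCommand :: String -> Bool
startsWithTiedCommand s = any matches tiedCommands
  where
    matches cmd = cmd `isPrefixOf` s && not (startsWithLetter (drop (length cmd) s))
    startsWithLetter (c:_) = isAlpha c
    startsWithLetter []    = False

-- | Replace each run of whitespace directly before \cite or \ref with "~",
--   the scripted equivalent of  \s+\\cite ==> ~\cite  (and the same for \ref).
--   All other whitespace is copied unchanged.
tieRefs :: String -> String
tieRefs [] = []
tieRefs s@(c:cs)
  | isSpace c =
      let (ws, rest) = span isSpace s
      in if startsWithTiedCommand rest
           then '~' : tieRefs rest
           else ws ++ tieRefs rest
  | otherwise = c : tieRefs cs
```

Loaded in GHCi, `tieRefs "see  \\cite{ref1} and Fig. \\ref{f1}"` returns `"see~\\cite{ref1} and Fig.~\\ref{f1}"`.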

The Current Workflow of a Conference

  • Install SPMS
  • Registration
  • Abstract submission
  • Booklet
  • Paper submission
  • Editing/QA/Author-Title-Check…
  • Proceedings

Pre-Conference

Editing

Some people are colour-blind. Maybe use a different colour scheme?

A New Design

With the migration of JPSP and the progress in software development over the past decades, I see an opportunity to design a different back-end:

  • Papers uploaded by authors/editors are converted to Abstract Syntax Trees (ASTs)
    • Try pandoc -f docx+styles -t native JACoW_W16_A4.dotx > JACoW_W16_A4_styles.pandoc on the template to see what an AST looks like, or
    • Use pandoc -f docx -t gfm --extract-media=. JACoW_W16_A4.dotx > JACoW_W16_A4_gfm.md to see the Markdown output in your favourite editor (I use Typora, by the way)
    • If you don't have pandoc installed, you can look at my output here: Attach:JACoW_W16_A4.zip
  • ASTs are converted to various formats: Word, OpenDocument, LaTeX, Markdown
  • Each upload (or perhaps only the original one) is checked automatically: Author-Title-Check, syntax errors, etc.
  • Each change is tracked so that it can be merged
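To make the "one AST, many outputs, automatic checks" idea concrete, here is a minimal sketch built on a deliberately tiny, hypothetical AST. The types `Block`/`Inline` and every function name are made up for illustration; pandoc's real AST (`Text.Pandoc.Definition`) is much richer. Two writers render the same tree to Markdown and LaTeX, and `titleMatches` stands in for an automatic Author-Title-Check that compares the uploaded title with the registered one.

```haskell
import Data.Char (isAlphaNum, toLower)
import Data.List (intercalate)

-- | A deliberately tiny, hypothetical document AST (not pandoc's types).
data Inline = Txt String | Emph [Inline]
  deriving (Show, Eq)

data Block
  = Title [Inline]
  | Heading Int [Inline]   -- level 1 = section, 2 = subsection, 3 = subsubsection
  | Para [Inline]
  deriving (Show, Eq)

-- Writer 1: Markdown (the title uses pandoc's "% title" block syntax).
mdInline :: Inline -> String
mdInline (Txt s)   = s
mdInline (Emph xs) = "*" ++ concatMap mdInline xs ++ "*"

mdBlock :: Block -> String
mdBlock (Title xs)     = "% " ++ concatMap mdInline xs
mdBlock (Heading n xs) = replicate n '#' ++ " " ++ concatMap mdInline xs
mdBlock (Para xs)      = concatMap mdInline xs

toMarkdown :: [Block] -> String
toMarkdown = intercalate "\n\n" . map mdBlock

-- Writer 2: LaTeX (special characters are not escaped in this sketch).
texInline :: Inline -> String
texInline (Txt s)   = s
texInline (Emph xs) = "\\emph{" ++ concatMap texInline xs ++ "}"

texBlock :: Block -> String
texBlock (Title xs)     = "\\title{" ++ concatMap texInline xs ++ "}"
texBlock (Heading n xs) = "\\" ++ concat (replicate (n - 1) "sub") ++ "section{"
                          ++ concatMap texInline xs ++ "}"
texBlock (Para xs)      = concatMap texInline xs

toLaTeX :: [Block] -> String
toLaTeX = intercalate "\n\n" . map texBlock

-- An automatic check on the same AST: does the uploaded title match the
-- registered title (ignoring case, punctuation and spacing)?
plainText :: Inline -> String
plainText (Txt s)   = s
plainText (Emph xs) = concatMap plainText xs

titleMatches :: String -> [Block] -> Bool
titleMatches registered doc =
  case [xs | Title xs <- doc] of
    (xs:_) -> normalise (concatMap plainText xs) == normalise registered
    []     -> False
  where
    normalise = words . map (\c -> if isAlphaNum c then toLower c else ' ')
```

The design point is that each output format and each check is just another function over the same tree, so adding a format or a check does not touch the others.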

Back-end

A Preliminary pandoc Filter Example

main :: IO ()
main = do
    (fileName:_) <- getArgs                      -- the LaTeX file to check
    src <- readFile fileName
    let secs = createSections $ latex $ pack src -- LaTeX -> pandoc AST -> headings
    putStrLn $ getResult $ getContents $ secs
    where getLevel (Section n _ str) = n
          getLevels = map getLevel
          -- coloured level number, then the plain-text title
          getTitle (Section n attr str) =
              attr <> (show n) <> resetColor <> " " <> str
          -- level change relative to the previous heading (the first one vs. 0)
          diff1 xs = zipWith (-) xs (0:xs)
          -- skipping a level (e.g. \section -> \subsubsection) is shown in red
          getColor n = if n>1 then boldRed else boldGreen
          boldRed    = "\x1b[1;31m"
          boldGreen  = "\x1b[1;32m"
          resetColor = "\x1b[0m"
          prettify (k, (Section n _ str)) = Section n (getColor k) str
          getContents sec = map prettify $ zip (diff1 $ getLevels $ sec) sec
          getResult = unlines . map getTitle
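The heart of the filter is `diff1`, which compares each heading level with the one before it. A minimal pandoc-free sketch of the same check, working on a plain list of heading levels (e.g. [1,2,2,3] for \section, \subsection, \subsection, \subsubsection), makes the rule easy to try in GHCi. `skippedLevels` is a hypothetical helper, not part of the filter above.

```haskell
-- | Change in level from the previous heading; the first heading is
--   compared with an imaginary level 0, so a document that opens with
--   \subsection (level 2) is flagged too.
diff1 :: [Int] -> [Int]
diff1 xs = zipWith (-) xs (0 : xs)

-- | (position, level) of every heading that jumps down more than one
--   level, e.g. \section followed directly by \subsubsection.
--   Positions count headings from 1.
skippedLevels :: [Int] -> [(Int, Int)]
skippedLevels levels =
  [ (i, lvl) | (i, lvl, d) <- zip3 [1 ..] levels (diff1 levels), d > 1 ]
```

For example, `skippedLevels [1, 2, 2, 3, 1, 3]` returns `[(6,3)]`: the last heading goes straight from \section to \subsubsection, which is exactly the case the filter prints in red.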

Results

Future Work (Phase II)

Patch Theory

Future Online Interface

Future Online Publishing

In addition to the good old PDF files, HTML output would also be nice.

  • The HTML is ready as soon as the .md file is available; no extra work is needed.
  • Hyperlinks, or fancy interactive graphics with JavaScript.
  • Meta goodies: citation counts, similar papers by keywords/categories, follow-up work, sharing on Twitter…
  • With pandoc, code fences can be post-processed to generate outputs/figures online. This is especially useful for wikis.
  • I would also like to host a bibliography database at JACoW:
    • References from FAMOUS publishers can be converted to our own format.
    • If an obscure source is referenced, the editor can interactively add the entry to our database.
    • If the DOI or any keyword (authors, titles, journals...) matches the database, a list of candidates is displayed to the editor.
    • Authors can also take advantage of this: typing something brings up a list of candidates ranked by how popular they are and how closely they match. A DOI match always goes to the top.
  • If we can move everything online, there is no need for a powerful PC any more. We can put everything on a Raspberry Pi's microSD card and use the budget to buy a big screen. Or to book our flights?
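The candidate list for the bibliography database boils down to a ranking rule: a DOI match first, then how closely the entry matches, then popularity. A minimal Haskell sketch of that rule follows. The `Entry` record, its fields, and the scoring (fraction of typed words found in the title, journal or author names, with `timesCited` standing in for popularity) are all assumptions for illustration; a real database would need proper fuzzy matching and indexing.

```haskell
import Data.Char (isAlphaNum, toLower)
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- | Hypothetical record for one entry of a JACoW bibliography database.
data Entry = Entry
  { doi        :: String
  , title      :: String
  , authors    :: [String]
  , journal    :: String
  , timesCited :: Int        -- stands in for "how popular" an entry is
  } deriving (Show, Eq)

-- | Lower-case words, with punctuation treated as a separator.
tokens :: String -> [String]
tokens = words . map (\c -> if isAlphaNum c then toLower c else ' ')

-- | Fraction (0 to 1) of the query's words found in the entry's
--   title, journal or author names: a crude "how precise" measure.
matchScore :: String -> Entry -> Double
matchScore query e
  | null qs   = 0
  | otherwise = fromIntegral (length (filter (`elem` fields) qs))
                / fromIntegral (length qs)
  where
    qs     = tokens query
    fields = concatMap tokens (title e : journal e : authors e)

-- | True if the query (trimmed) is exactly the entry's DOI, ignoring case.
isDoiMatch :: String -> Entry -> Bool
isDoiMatch query e = map toLower (unwords (words query)) == map toLower (doi e)

-- | Candidates for what the user typed: a DOI match always goes to the top,
--   the rest are ordered by match score, then by popularity.
candidates :: String -> [Entry] -> [Entry]
candidates query db = map snd (sortBy (comparing fst) scored)
  where
    scored = [ (rank e, e) | e <- db, isDoiMatch query e || matchScore query e > 0 ]
    rank e = (Down (isDoiMatch query e), Down (matchScore query e), Down (timesCited e))
```

Sorting on a tuple of `Down` keys keeps the three criteria in strict priority order, and because `sortBy` is stable, entries that tie on all three keep their database order.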

Summary

  • M$ (including Windows) & Adob€ products can gradually be removed from our standard software repository.
  • Not only can FOSS replace commercial software, it can also do more versatile jobs, for free, and be tailored by us to our needs.
  • pandoc is the core of my proposal. It supports more than I could ask for, and it is text-based.
  • Automation is good, but without human supervision it could descend into chaos. That is why we need Phase II: interactive editing.

Backup Slides

Useful Links

Source of these slides

  • I was writing the slides in Markdown before I realised that PmWiki does not use the same syntax, so I stopped at some point.
  • Although they are only partly finished, they still show the power of pandoc.
  • If you're interested, here are my slides: Attach:c340.zip
  • You can view the Markdown file with Typora, a very convenient tool.
  • If you don't have pandoc and Graphviz installed: I have already run the Makefile, so you can open the .html file to view the output. Preview: Attach:ProposalReport.html

Full Source of the Haskell program:

  • Do not view in "slide show" mode!
  • Why is the program so long (compared with the Python version)?
    • Everything before main :: IO () is a one-time set of definitions that will be moved to a module
    • The lines with double colons are type signatures, which let the compiler check the correctness of the program
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (pack,unpack,Text)

import Text.Pandoc.Builder
import Text.Pandoc
import Text.Pandoc.Walk

import System.Environment (getArgs)


-- | A section heading: its level (1 = \section, 2 = \subsection, ...),
--   an ANSI colour code used when printing, and its plain-text title.
data Section = Section
  { level :: Int
  , attr  :: String
  , title :: String
  }

-- | Run a pure pandoc action, aborting with pandoc's error message on failure.
purely :: (b -> PandocPure a) -> b -> a
purely f = either (error . show) id . runPure . f

-- | Parse LaTeX source into a pandoc AST, using pandoc's default LaTeX extensions.
latex :: Text -> Pandoc
latex = purely $ readLaTeX def{
                   readerExtensions = getDefaultExtensions "latex" }


-- | Collect every Header of the document, in order, as a Section.
createSections :: Pandoc -> [Section]
createSections = query mkSec
  where mkSec (Header n _ inline) = [ Section
               { level=n
               ,  attr=""
               , title=getPlain inline
               }]
        mkSec _                   = []

-- | Flatten inline elements to plain text, dropping all formatting.
getPlain :: [Inline] -> String
getPlain = concatMap stripFrom
  where stripFrom (Str         strs  ) = unpack   strs
        stripFrom (Code _      strs  ) = unpack   strs
        stripFrom (Math _      strs  ) = unpack   strs
        stripFrom (RawInline _ strs  ) = unpack   strs
        stripFrom (Emph        strs  ) = getPlain strs
        stripFrom (Underline   strs  ) = getPlain strs
        stripFrom (Strong      strs  ) = getPlain strs
        stripFrom (Strikeout   strs  ) = getPlain strs
        stripFrom (Superscript strs  ) = getPlain strs
        stripFrom (Subscript   strs  ) = getPlain strs
        stripFrom (SmallCaps   strs  ) = getPlain strs
        stripFrom (Quoted _    strs  ) = getPlain strs
        stripFrom (Cite _      strs  ) = getPlain strs
        stripFrom (Span _      strs  ) = getPlain strs
        stripFrom (Link _      strs _) = getPlain strs
        stripFrom (Image _     strs _) = getPlain strs
        stripFrom Space                = " "
        stripFrom SoftBreak            = " "   -- line break in the source
        stripFrom LineBreak            = " "   -- forced break (\\)
        stripFrom _                    = ""


main :: IO ()
main = do
    (fileName:_) <- getArgs                      -- the LaTeX file to check
    src <- readFile fileName
    let secs = createSections $ latex $ pack src -- LaTeX -> pandoc AST -> headings
    putStrLn $ getResult $ getContents $ secs
    where getLevel (Section n _ str) = n
          getLevels = map getLevel
          -- coloured level number, then the plain-text title
          getTitle (Section n attr str) = attr <> (show n) <> resetColor <> " " <> str
          -- level change relative to the previous heading (the first one vs. 0)
          diff1 xs = zipWith (-) xs (0:xs)
          -- skipping a level (e.g. \section -> \subsubsection) is shown in red
          getColor n = if n>1 then boldRed else boldGreen
          boldRed    = "\x1b[1;31m"
          boldGreen  = "\x1b[1;32m"
          resetColor = "\x1b[0m"
          prettify (k, (Section n _ str)) = Section n (getColor k) str
          getContents sec = map prettify $ zip (diff1 $ getLevels $ sec) sec
          getResult = unlines . map getTitle

Charts

  • I made the charts with Graphviz
  • My Makefile looks like this:
DOT := dot
#DOT := sfdp

IMG_SRCS    := $(wildcard *.gv)
SVG_OBJS    := $(patsubst %.gv,%.svg,$(IMG_SRCS))
PNG_OBJS    := $(patsubst %.gv,%.png,$(IMG_SRCS))

.PHONY: all clean

all: $(SVG_OBJS) $(PNG_OBJS)

clean: ; rm -f $(SVG_OBJS) $(PNG_OBJS)

# Render each Graphviz source (.gv) to SVG and PNG with $(DOT)
%.svg: %.gv ; $(DOT) -Tsvg $< -o $@

%.png: %.gv ; $(DOT) -Tpng $< -o $@

Charts (cont.)

  • In my gitit wiki I don't need the Makefile, because an existing plugin already handles this:
module Dot (plugin) where

import Network.Gitit.Interface
import System.Process (readProcessWithExitCode)
import System.Exit (ExitCode(ExitSuccess))
import Data.ByteString.Lazy.UTF8 (fromString)    -- from the utf8-string package
import Data.Digest.Pure.SHA (sha1, showDigest)   -- from the SHA package
import System.FilePath ((</>))

plugin :: Plugin
plugin = mkPageTransformM transformBlock

transformBlock :: Block -> PluginM Block
transformBlock (CodeBlock (_, classes, namevals) contents) | "dot" `elem` classes = do
  cfg <- askConfig
  let (name, outfile) =  case lookup "name" namevals of
                                Just fn   -> ([Str fn], fn ++ ".png")
                                Nothing   -> ([], uniqueName contents ++ ".png")
  liftIO $ do
    (ec, _out, err) <- readProcessWithExitCode "dot" ["-Tpng", "-o",
                         staticDir cfg </> "img" </> outfile] contents
    let attr = ("image", [], [])
    if ec == ExitSuccess
       then return $ Para [Image attr name ("/img" </> outfile, "")]
       else error $ "dot returned an error status: " ++ err
transformBlock x = return x

-- | Generate a unique filename given the file's contents.
uniqueName :: String -> String
uniqueName = showDigest . sha1 . fromString

Charts (cont.)

  • Just put Graphviz source code in a fenced block like this, and the plugin will render it and insert the output in place:
~~~ {.dot name="diagram1"}
digraph G {Hello->World}
~~~
  • The output image will be diagram1.png or, if no name is given, the SHA-1 hash of the code followed by .png.
  • In the output HTML, the code block is replaced with this image.