This repository contains the language specification for LayoutScript.
LayoutScript is a domain-specific language (DSL) designed to streamline the process of interacting with electronic resources by abstracting away technical complexities. With LayoutScript, users can define steps to read content from various sources such as spreadsheets, web pages, and emails in a human-readable format, resembling a step-by-step guide rather than traditional computer code. By providing a more intuitive and accessible approach, LayoutScript simplifies tasks involving electronic resources, reduces the need for technical expertise, and enhances productivity for users across various domains.
The following terminology is used in this document and in other places that refer to LayoutScript.
| Term | Meaning |
|---|---|
| address | The locator or identifier used to locate specific content within a given context. This could be an XPath expression, a CSS selector (commonly used for web content), or another type of selector depending on the type of resource being accessed. |
| content | The actual information or data that is being targeted or retrieved using the provided address. This could include text, images, audio/video, links, or any other type of data that is relevant. |
| context | The environment or setting in which both the content and the address exist. |
A LayoutScript file consists of a list of directives. Each directive is typically placed on a separate line within the file. The first line must always contain the `layout` directive, followed by any additional directives as needed. The delimiter (or separator) between arguments is `<tab>`.
  first line — | layout <tab> "engine name"
               | ^
               |  \
               |   must begin with this
               |
               |
            /¯ | "directive" <tab> "first argument" <tab> "second argument" [<tab> ...]
    next lines   [...]
            \_ | "directive" <tab> "first argument"
               | ^
               |  \
               |   directives are part of this
               |   language specification
               |
   commented — | # this is a comment
               | ^
               |  \
               |   lines that start with this character should be ignored
The quotes in the sketch above only mark the placeholders. In a LayoutScript file, any quote characters are treated literally, as part of the argument text itself.
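The file anatomy described above can be illustrated with a small parser. This is a minimal sketch in Python, not a reference implementation: the function name `parse_layoutscript` and the returned structure are hypothetical, and skipping blank lines is an assumption that the specification does not state.

```python
from typing import List, Tuple

# A directive is its name plus the list of tab-separated arguments.
Directive = Tuple[str, List[str]]


def parse_layoutscript(source: str) -> Tuple[List[str], List[Directive]]:
    """Split LayoutScript source into the layout arguments and the directives."""
    lines = source.splitlines()
    if not lines:
        raise ValueError("empty LayoutScript source")

    # The first line must begin with the layout directive.
    head = lines[0].split("\t")
    if head[0] != "layout":
        raise ValueError("first line must begin with the layout directive")
    layout_args = head[1:]

    directives: List[Directive] = []
    for line in lines[1:]:
        # Lines starting with '#' are comments and are ignored.
        # Assumption: blank lines are skipped as well.
        if not line.strip() or line.startswith("#"):
            continue
        # Arguments are separated by a tab; quotes are kept as literal text.
        name, *args = line.split("\t")
        directives.append((name, args))
    return layout_args, directives
```

Because only the tab acts as a delimiter, an argument such as `This "~" is the page title` stays intact as a single argument, spaces and quotes included.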
Version 1.0 of the language provides the following directives to cover working with HTML documents and resources that are adjacent to them. The abbreviation FA denotes the first argument; SA, the second argument.
Document:
- `layout` is a reserved keyword that denotes the beginning of a LayoutScript
- `prompt` attempts to delegate the content, as an argument, to the function given as FA
- `label` classifies the last evaluated content as FA
Text:
- `extract` outputs the first captured group of the RegExp given as FA
- `replace` outputs a new text by replacing the text of FA with the text of SA
- `remove` outputs a new text by removing the text of FA
- `insert` outputs a new text by inserting the content at the given placeholder FA in SA
- `glue` outputs a new text by inserting the given FA as a prefix
- `keep` filters (keeps) by the RegExp given as FA
- `drop` filters (drops) by the RegExp given as FA
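The text directives can be read as small transformations over the current content. The sketch below is one possible interpretation in Python, under several assumptions that the specification does not confirm: the content is modeled as a list of strings, FA is treated as a RegExp for `replace` and `remove` (as the example `replace [0-9]+ 20XX` suggests), and the function names simply mirror the directive names.

```python
import re
from typing import List


def extract(items: List[str], pattern: str) -> List[str]:
    """Output the first captured group of `pattern` for each item that matches."""
    out = []
    for item in items:
        m = re.search(pattern, item)
        if m is not None and m.re.groups >= 1 and m.group(1) is not None:
            out.append(m.group(1))
    return out


def replace(items: List[str], pattern: str, replacement: str) -> List[str]:
    """Replace every match of `pattern` with the literal text `replacement`."""
    # A function is passed to re.sub so that backslashes in the
    # replacement are not interpreted as escape sequences.
    return [re.sub(pattern, lambda _m: replacement, item) for item in items]


def remove(items: List[str], pattern: str) -> List[str]:
    """Remove every match of `pattern`."""
    return [re.sub(pattern, "", item) for item in items]


def insert(items: List[str], placeholder: str, template: str) -> List[str]:
    """Put each item where `placeholder` appears in `template`."""
    return [template.replace(placeholder, item) for item in items]


def glue(items: List[str], prefix: str) -> List[str]:
    """Prepend `prefix` to each item."""
    return [prefix + item for item in items]


def keep(items: List[str], pattern: str) -> List[str]:
    """Keep only the items that match `pattern`."""
    return [item for item in items if re.search(pattern, item)]


def drop(items: List[str], pattern: str) -> List[str]:
    """Drop the items that match `pattern`."""
    return [item for item in items if not re.search(pattern, item)]
```

Under these assumptions, `keep` and `drop` are exact complements of each other, while `extract` silently skips items that do not match.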
HTML:
- `bundle` changes the context on which `follow` and `select` operate
- `follow` outputs a list of elements that match the XPath address given as FA
- `select` outputs a list of elements that match the CSS selector address given as FA
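The relationship between `bundle` and `follow` can be sketched as a change of the elements against which later addresses are evaluated. The Python sketch below is only an illustration under stated assumptions: it uses the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset (for example, no `contains()`) and requires well-formed markup; the helpers `_split_attribute` and `_relative` are hypothetical workarounds for that subset, and treating `bundle` as "evaluate, then use the result as the new context" is an interpretation of the description above.

```python
import xml.etree.ElementTree as ET


def _split_attribute(xpath):
    """Split a trailing '/@name' step off, since ElementTree cannot select attributes."""
    path, sep, last = xpath.rpartition("/")
    if sep and last.startswith("@"):
        return path, last[1:]
    return xpath, None


def _relative(xpath):
    """ElementTree rejects absolute paths on elements, so anchor them at the current node."""
    return "." + xpath if xpath.startswith("/") else xpath


def follow(contexts, xpath):
    """Evaluate `xpath` relative to each context element and collect the matches."""
    path, attr = _split_attribute(xpath)
    out = []
    for ctx in contexts:
        for el in ctx.findall(_relative(path)):
            if attr is None:
                out.append(el)
            elif el.get(attr) is not None:
                out.append(el.get(attr))
    return out


def bundle(contexts, xpath):
    """Return the matched elements, to be used as the context of later directives."""
    return follow(contexts, xpath)


# Usage: narrow the context to the article, then follow a relative address.
doc = ET.fromstring(
    '<html><body>'
    '<div class="article"><h1><a href="a.png">A</a></h1></div>'
    '<div class="other"><h1><a href="b.png">B</a></h1></div>'
    '</body></html>'
)
article = bundle([doc], '//div[@class="article"]')
links = follow(article, "h1/a/@href")
```

In this sketch only the link inside the bundled `div` is reached by `follow`; the one in the other `div` lies outside the context.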
HTML/CSS:
- `style` extracts the value of the CSS property given as FA
Example:
layout vivid 1.0
# this is a comment... it can be useful
# or not, depending on what's written
select h1
label title
extract ([0-9]+)
label title year
replace [0-9]+ 20XX
# label can be used a few more times to
# also store the last evaluated output
insert ~ This "~" is the page title
label observation
glue New:<space>
# get css properties for the headings
select h1
style font-family
label link font
style color
label link color
# to focus on elements that are grouped
# together...
bundle //div[@class="article"]
follow h1[contains(@class, "title")]/a/@href
keep \.png$
label link
follow a/@target
drop (_blank|_top)
label link target
# call another procedure
prompt rpc-call
prompt 3rd-party-api
prompt subprocess.sh