Fix sanitizer config - multiple rules #11133

Merged
merged 8 commits into from
Apr 29, 2020
Changes from 3 commits
6 changes: 4 additions & 2 deletions custom/conf/app.ini.sample
@@ -963,8 +963,10 @@ SHOW_FOOTER_VERSION = true
; Show template execution time in the footer
SHOW_FOOTER_TEMPLATE_LOAD_TIME = true

[markup.sanitizer]
; The following keys can be used multiple times to define sanitation policy rules.
[markup.sanitizer.1]
; The following keys can appear once to define a sanitation policy rule.
; This section can appear with an incremenented number to define multiple rules.
Member

Suggested change
; This section can appear with an incremenented number to define multiple rules.
; This section can appear with an incrementing numbers or any distinct alphanumeric string to define multiple rules.

Contributor Author

I don't think "this section can appear with an incrementing numbers to define multiple rules" is correct English either.

I've made it just "This section can appear again with a unique alphanumeric string to define multiple rules".

; e.g., [markup.sanitizer.1] -> [markup.sanitizer.2]
cipherboy marked this conversation as resolved.
;ELEMENT = span
;ALLOW_ATTR = class
;REGEXP = ^(info|warning|error)$
38 changes: 36 additions & 2 deletions docs/content/doc/advanced/config-cheat-sheet.en-us.md
@@ -646,7 +646,7 @@ Two special environment variables are passed to the render command:
Gitea supports customizing the sanitization policy for rendered HTML. The example below will support KaTeX output from pandoc.

```ini
[markup.sanitizer]
[markup.sanitizer.TeX]
; Pandoc renders TeX segments as <span>s with the "math" class, optionally
; with "inline" or "display" classes depending on context.
ELEMENT = span
@@ -658,7 +658,41 @@ REGEXP = ^\s*((math(\s+|$)|inline(\s+|$)|display(\s+|$)))+
- `ALLOW_ATTR`: The attribute this policy allows. Must be non-empty.
- `REGEXP`: A regex to match the contents of the attribute against. Must be present but may be empty for unconditional whitelisting of this attribute.

You may redefine `ELEMENT`, `ALLOW_ATTR`, and `REGEXP` multiple times; each time all three are defined is a single policy entry.
**Note**: The above section naming policy is new; previously the section was `[markup.sanitizer]` and keys could be redefined.
Now, a unique identifier must appear in the section name (e.g., `[markup.sanitizer.TeX]`) in order to parse multiple rules.
This was changed because the implementation with the ini parser used was flawed; the following configs were indistinguishable after parsing:

```ini
[markup.sanitizer]
ELEMENT = a
ALLOW_ATTR = target
REGEXP = $1
ELEMENT = a
ALLOW_ATTR = rel
REGEXP = $2
ELEMENT = img
ALLOW_ATTR = src
REGEXP = $3
```

and

```ini
[markup.sanitizer]
ELEMENT = a
ALLOW_ATTR = target
REGEXP = $1
ELEMENT = img
ALLOW_ATTR = rel
REGEXP = $2
ELEMENT = img
ALLOW_ATTR = src
REGEXP = $3
```

Because of limitations in the ini library, we are unable to automatically migrate configurations.

We will still parse the first rule from a `[markup.sanitizer]` section if present, but multiple rules must be manually migrated.
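The note above says the two configs were indistinguishable after parsing but does not say why. The sketch below shows one way that could happen, under the loudly stated assumption that the parser collapses a repeated identical value for the same key. The `pair`, `config`, `collect`, and `sameParse` names are hypothetical and this is not the ini library's code.

```go
package main

import (
	"fmt"
	"reflect"
)

// pair is one KEY = value line from a section, in file order.
type pair struct{ key, value string }

// config builds the nine lines of the examples above for three elements.
func config(e1, e2, e3 string) []pair {
	return []pair{
		{"ELEMENT", e1}, {"ALLOW_ATTR", "target"}, {"REGEXP", "$1"},
		{"ELEMENT", e2}, {"ALLOW_ATTR", "rel"}, {"REGEXP", "$2"},
		{"ELEMENT", e3}, {"ALLOW_ATTR", "src"}, {"REGEXP", "$3"},
	}
}

// collect groups values by key and drops a value already seen for that key.
// ASSUMPTION: this dedup step is a hypothetical model of the parser.
func collect(lines []pair) map[string][]string {
	out := map[string][]string{}
	for _, l := range lines {
		seen := false
		for _, v := range out[l.key] {
			if v == l.value {
				seen = true
				break
			}
		}
		if !seen {
			out[l.key] = append(out[l.key], l.value)
		}
	}
	return out
}

// sameParse reports whether two configs collapse to the same per-key lists.
func sameParse(a, b []pair) bool {
	return reflect.DeepEqual(collect(a), collect(b))
}

func main() {
	first := config("a", "a", "img")
	second := config("a", "img", "img")
	fmt.Println(collect(first)["ELEMENT"], collect(second)["ELEMENT"])
	fmt.Println("indistinguishable:", sameParse(first, second))
}
```

Under this model both configs keep only two distinct `ELEMENT` values against three `ALLOW_ATTR` values, so the original pairing of keys into rules cannot be recovered.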
Member

Suggested change
This was changed because the implementation with the ini parser used was flawed; the following configs were indistinguishable after parsing:
```ini
[markup.sanitizer]
ELEMENT = a
ALLOW_ATTR = target
REGEXP = $1
ELEMENT = a
ALLOW_ATTR = rel
REGEXP = $2
ELEMENT = img
ALLOW_ATTR = src
REGEXP = $3
```
and
```ini
[markup.sanitizer]
ELEMENT = a
ALLOW_ATTR = target
REGEXP = $1
ELEMENT = img
ALLOW_ATTR = rel
REGEXP = $2
ELEMENT = img
ALLOW_ATTR = src
REGEXP = $3
```
Because of limitations in the ini library, we are unable to automatically migrate configurations.
We will still parse the first rule from a `[markup.sanitizer]` section if present, but multiple rules must be manually migrated.

I'd rather remove this section. If you feel like an explanation is needed, please reference the version where it was introduced and link the relevant issue or this PR (e.g. "this was changed from the original implementation in 1.11 due to severe limitations; see #1234"). But we tend to avoid making explicit version references in the docs, so maybe just remove it and let the Blog speak. 😁

Contributor Author

If a blog entry is fine with @zeripath I could write such, but I must confess I've never looked at the blog entries and mostly read documentation. :-) On the breaking section of release notes, there's a link to the PR, but not to the blog entry -- do none of those have blog entries?

Contributor

Hmm... So this bit could be dropped and instead placed in the PR description - during release we can then precis this and put it into the blog post.


## Time (`time`)

9 changes: 7 additions & 2 deletions docs/content/doc/advanced/external-renderers.en-us.md
@@ -73,7 +73,7 @@ IS_INPUT_FILE = false
If your external markup relies on additional classes and attributes on the generated HTML elements, you might need to enable custom sanitizer policies. Gitea uses the [`bluemonday`](https://godoc.org/github.com/microcosm-cc/bluemonday) package as our HTML sanitizier. The example below will support [KaTeX](https://katex.org/) output from [`pandoc`](https://pandoc.org/).

```ini
[markup.sanitizer]
[markup.sanitizer.TeX]
; Pandoc renders TeX segments as <span>s with the "math" class, optionally
; with "inline" or "display" classes depending on context.
ELEMENT = span
@@ -86,6 +86,11 @@ FILE_EXTENSIONS = .md,.markdown
RENDER_COMMAND = pandoc -f markdown -t html --katex
```
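The `REGEXP` line of this example is hidden by the hunk boundary, but it is visible in the cheat-sheet diff above. As a rough check of which `class` values that pattern accepts, the sketch below runs it through Go's standard `regexp` package. The `classAllowed` helper is a hypothetical name, and how the sanitizer itself applies the pattern is not shown here.

```go
package main

import (
	"fmt"
	"regexp"
)

// katexClassPattern is the REGEXP value from the cheat-sheet example.
var katexClassPattern = regexp.MustCompile(`^\s*((math(\s+|$)|inline(\s+|$)|display(\s+|$)))+`)

// classAllowed reports whether a class attribute value matches the pattern.
// Hypothetical helper for illustration only.
func classAllowed(value string) bool {
	return katexClassPattern.MatchString(value)
}

func main() {
	for _, v := range []string{"math inline", "math display", "warning", "mathematics"} {
		fmt.Printf("%q allowed: %v\n", v, classAllowed(v))
	}
}
```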

You may redefine `ELEMENT`, `ALLOW_ATTR`, and `REGEXP` multiple times; each time all three are defined is a single policy entry. All three must be defined, but `REGEXP` may be blank to allow unconditional whitelisting of that attribute.
You must define `ELEMENT`, `ALLOW_ATTR`, and `REGEXP` in each section.

To define multiple entries, define different section names (e.g., `[markup.sanitizer.1]` and `[markup.sanitizer.2]`).
These can be numbers, identifying names, or anything else.
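As a minimal sketch of how such section names can be told apart from renderer sections, the check below mirrors the condition in the `modules/setting/markup.go` hunk of this PR. The `isSanitizerSection` wrapper is a hypothetical name, and `name` is assumed to be the part of the section name after `markup.`.

```go
package main

import (
	"fmt"
	"strings"
)

// isSanitizerSection reports whether a [markup.<name>] section defines a
// sanitizer rule: either the legacy "sanitizer" name or "sanitizer.<id>".
// Hypothetical wrapper around the condition used in the PR.
func isSanitizerSection(name string) bool {
	return name == "sanitizer" || strings.HasPrefix(name, "sanitizer.")
}

func main() {
	for _, name := range []string{"sanitizer", "sanitizer.1", "sanitizer.TeX", "asciidoc"} {
		fmt.Printf("%-14s sanitizer rule: %v\n", name, isSanitizerSection(name))
	}
}
```

Because the check is a plain prefix match, the suffix after `sanitizer.` is not interpreted in this sketch; it only has to make the section name distinct.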

Once your configuration changes have been made, restart Gitea to have changes take effect.

**Note**: The above section numbering policy is new; previously the section was `[markup.sanitizer]` and keys could be redefined.
Member

Suggested change
**Note**: The above section numbering policy is new; previously the section was `[markup.sanitizer]` and keys could be redefined.

Same as above

Contributor

I had specifically requested some information as this is a breaking change requiring change to config (although the previous config simply does not work.)

Contributor Author

To avoid duplicating too much info, I've added a reference to the cheat sheet here but explicitly left this line. It's either that or add a reference to this pull request (in both places) if we don't want it in the documentation -- but to someone wanting to migrate, it isn't clear what the problem is just by looking at our extended discussion here. :-)

Contributor

How about:

Suggested change
**Note**: The above section numbering policy is new; previously the section was `[markup.sanitizer]` and keys could be redefined.
**Note**: Prior to Gitea 1.12, there was a single `markup.sanitizer` section and keys could be redefined; however, there were significant problems with this method of configuration, necessitating this change.

Member

I had specifically requested some information as this is a breaking change requiring change to config (although the previous config simply does not work.)

Sorry, @zeripath. I missed your comment, which was already resolved.

58 changes: 25 additions & 33 deletions modules/setting/markup.go
@@ -44,7 +44,7 @@ func newMarkup() {
continue
}

if name == "sanitizer" {
if name == "sanitizer" || strings.HasPrefix(name, "sanitizer.") {
zeripath marked this conversation as resolved.
newMarkupSanitizer(name, sec)
} else {
newMarkupRenderer(name, sec)
@@ -67,44 +67,36 @@ func newMarkupSanitizer(name string, sec *ini.Section) {
return
}

elements := sec.Key("ELEMENT").ValueWithShadows()
allowAttrs := sec.Key("ALLOW_ATTR").ValueWithShadows()
regexps := sec.Key("REGEXP").ValueWithShadows()
elements := sec.Key("ELEMENT").Value()
allowAttrs := sec.Key("ALLOW_ATTR").Value()
regexpStr := sec.Key("REGEXP").Value()

if len(elements) != len(allowAttrs) ||
len(elements) != len(regexps) {
log.Error("All three keys in markup.%s (ELEMENT, ALLOW_ATTR, REGEXP) must be defined the same number of times! Got %d, %d, and %d respectively.", name, len(elements), len(allowAttrs), len(regexps))
if regexpStr == "" {
rule := MarkupSanitizerRule{
Element: elements,
AllowAttr: allowAttrs,
Regexp: nil,
}

ExternalSanitizerRules = append(ExternalSanitizerRules, rule)
return
}

ExternalSanitizerRules = make([]MarkupSanitizerRule, 0, len(elements))

for index, pattern := range regexps {
if pattern == "" {
rule := MarkupSanitizerRule{
Element: elements[index],
AllowAttr: allowAttrs[index],
Regexp: nil,
}
ExternalSanitizerRules = append(ExternalSanitizerRules, rule)
continue
}

// Validate when parsing the config that this is a valid regular
// expression. Then we can use regexp.MustCompile(...) later.
compiled, err := regexp.Compile(pattern)
if err != nil {
log.Error("In module.%s: REGEXP at definition %d failed to compile: %v", name, index+1, err)
continue
}
// Validate when parsing the config that this is a valid regular
// expression. Then we can use regexp.MustCompile(...) later.
compiled, err := regexp.Compile(regexpStr)
if err != nil {
log.Error("In module.%s: REGEXP (%s) failed to compile: %v", name, regexpStr, err)
return
}

rule := MarkupSanitizerRule{
Element: elements[index],
AllowAttr: allowAttrs[index],
Regexp: compiled,
}
ExternalSanitizerRules = append(ExternalSanitizerRules, rule)
rule := MarkupSanitizerRule{
Element: elements,
AllowAttr: allowAttrs,
Regexp: compiled,
}

ExternalSanitizerRules = append(ExternalSanitizerRules, rule)
}
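
The new per-section logic above can be sketched without the ini and logging dependencies. This is a minimal illustration, assuming the three key values have already been read as strings; `sanitizerRule` and `buildRule` are hypothetical names that stand in for `MarkupSanitizerRule` and the body of `newMarkupSanitizer`, and returning an error replaces the `log.Error` call.

```go
package main

import (
	"fmt"
	"regexp"
)

// sanitizerRule mirrors the three keys of one [markup.sanitizer.*] section.
type sanitizerRule struct {
	Element   string
	AllowAttr string
	Regexp    *regexp.Regexp // nil means the attribute is allowed unconditionally
}

// buildRule turns one section's values into a rule. An empty pattern yields
// a rule with a nil Regexp; an invalid pattern is reported as an error so the
// caller can skip the section.
func buildRule(element, allowAttr, pattern string) (sanitizerRule, error) {
	rule := sanitizerRule{Element: element, AllowAttr: allowAttr}
	if pattern == "" {
		return rule, nil
	}
	// Validate at config time so later code can rely on a compiled pattern.
	compiled, err := regexp.Compile(pattern)
	if err != nil {
		return sanitizerRule{}, fmt.Errorf("REGEXP (%s) failed to compile: %w", pattern, err)
	}
	rule.Regexp = compiled
	return rule, nil
}

func main() {
	for _, p := range []string{"", `^(info|warning|error)$`, "("} {
		rule, err := buildRule("span", "class", p)
		if err != nil {
			fmt.Println("skipped:", err)
			continue
		}
		fmt.Printf("%s[%s] conditional=%v\n", rule.Element, rule.AllowAttr, rule.Regexp != nil)
	}
}
```

One rule per section keeps the three values paired by construction, which is the property the shadow-key approach could not guarantee.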

func newMarkupRenderer(name string, sec *ini.Section) {