Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Text transformation feature – dashes, "smart" quotes, etc. #1490

Closed
ile opened this issue Dec 27, 2018 · 16 comments · Fixed by ckeditor/ckeditor5-typing#190
Closed

Text transformation feature – dashes, "smart" quotes, etc. #1490

ile opened this issue Dec 27, 2018 · 16 comments · Fixed by ckeditor/ckeditor5-typing#190
Milestone

Comments

@ile
Copy link

ile commented Dec 27, 2018

Would this be the correct plugin to handle things like:

  • Turn double dash (--) into – (endash)
  • Turn triple dash (---) into — (emdash)
  • In practice this would mean to turn double/triple dash + space into a wider dash.
  • https://en.wikipedia.org/wiki/Dash

Or

  • Turn "..." into “...”
  • Turn '...' into ‘...’ (or more "beautiful" quotation marks)

Or

  • If a text is selected (as in "painted") and then a quotation mark is typed, the selected text is wrapped in quotes.

If this would be the correct plugin, let's treat this issue as a feature request? Thank you.

@ile ile changed the title Dashes, "smart" quotes Dashes, "smart" quotes, etc. Dec 27, 2018
@scofalik
Copy link
Contributor

This topic was briefly discussed here: #1366. The short answer is: it probably is possible to implement those transformations using what is already available in autoformat feature but those kinds of transformations weren't on our mind when we created InlineAutoformatEditing so I can't guarantee you that it will work flawlessly or that the integration will look good.

@ile
Copy link
Author

ile commented Jan 1, 2019

Ok.

It would be nice to have a ckeditor-typography package, which would do something like this:

https://github.com/pnevyk/tipograph
https://github.com/sapegin/richtypo.js

I know I could perhaps create it myself, and maybe I will, but if you did it, it would probably be correctly done from the beginning...

@Reinmar
Copy link
Member

Reinmar commented Jan 30, 2019

Agree. We actually planned this package and there might've even been a ticket for it (can't find it though). The autoformatting feature is about formatting, so it's a bit different topic. Text transformations would be a separate thing.

I'm moving this ticket to the main repo and confirming.

@Reinmar Reinmar transferred this issue from ckeditor/ckeditor5-autoformat Jan 30, 2019
@Reinmar Reinmar added type:feature This issue reports a feature request (an idea for a new functionality or a missing option). status:confirmed labels Jan 30, 2019
@Reinmar Reinmar added this to the next milestone Jan 30, 2019
@Reinmar Reinmar changed the title Dashes, "smart" quotes, etc. Text transformation feature – dashes, "smart" quotes, etc. Jan 30, 2019
@Reinmar Reinmar modified the milestones: next, iteration 22, iteration 23 Mar 1, 2019
@mlewand
Copy link
Contributor

mlewand commented Apr 2, 2019

We were unable to fit this feature in the current iteration, however we added mentions (#998) which included some APIs (most importantly text watcher) that will be helpful while implementing this feature. We're looking to implement it during next iteration.

@mlewand mlewand modified the milestones: iteration 23, iteration 24 Apr 2, 2019
@jodator
Copy link
Contributor

jodator commented Apr 2, 2019

@mlewand the text watcher will probably be moved to other place. Also it needs some work to make it more usable - right now it is focused on the Mention feature needs. This should be properly exposed in next iteration.

@jodator
Copy link
Contributor

jodator commented May 10, 2019

@mlewand Some insights about the feature:

  1. What to change transform as we can go crazy here:

    • dashes to endash/emdash
    • ... to …
    • quotes (language rules?)
      • "smart quotes of selected text"
      • example rules
      • should change according to (text's) language (follow up feature?)
    • math related?
      • 2 x 3 to 2 × 3 (and similar for number)
      • <= to ≤
      • etc...
      • numbers thousands separator (kinda meh)
    • spaces?
    • other ligatures like
      • ?? !?, etc.
      • (c) to ©, (tm) to ™
  2. Name of the feature (thus repo name) Text transformations? Typography?

  3. How/what to configure:

Probably we should go with a set of default, most requested options: dashes, quotes, maybe some ligatures for some common things.

  1. Other notes:

    • it probably should only work for typing not selection change (ie TextWatcher in mentions acts on selection changes also) - so maybe some sort of flag or scoped events will be needed
    • the feature should be easily extensible - in some simplest ie 'NY' -> 'New York' but it should also supports RegExps
  2. Others implementations (also feature names):

The OpenOffice Writer has an AutoCorrect feature which does this and additional things (out of the scope of this feature):
Selection_006

The Google docs have it under:
Selection_007

Edit: MS Word have those features split to AutoCorrect and Math AutoCorrect :
Screenshot 2019-05-10 13 52 16
Screenshot 2019-05-10 13 52 41

References:

  1. pnevyk/tipograph
  2. Butterick’s Practical Typography
  3. Wikipedia's entry on quotation marks

@jodator
Copy link
Contributor

jodator commented May 16, 2019

While implementing a feature some questions camoe out:

  1. Feature configuration namespace
    option A:

    editor.config.set( 'textTransformation.transformations', [
    	{ from: '\\.\\.\\.', to: '…' },
    ] );

    option B:

    editor.config.set( 'typing.textTransformations', [
    	{ from: '\\.\\.\\.', to: '…' },
    ] );

    I'm divided between those two: option B is when we're sure that the configuration will only consist one key while option A is more future-proof.

  2. How to extend/mix/remove default configuration.

    I'm pretty sure that whatever default configuration we figure out it not gonna suite everyone :) We can go with simple replace the config as other features. So defining new config will overwrite default. We could ease developers by exposing parts of default configuration as:

    2a. Stings of groups (no import needed, some docs reading required)

    ClassicEditor.create( foo, {
    	textTransformation: {
    		transformations: [
    			'symbols',
    			'dashes',
    			'quotes', // or `quotes_en_us`, 'quotes_pl`, etc
    			{ from: 'CKS', to: 'CKSource' }, // custom
    		]
    	}
    }

    2b. (may be additional to a) Name each default transformation and allow it as in 2a, ie, trademark, quotes_en_us_primary, emdash, etc.

    2c. Export constants (meh I wouldn't go there - requires import)

    2d. Getter / method on TextTransformation class (would require someone to write a plugin to extend a configuration so it also look like something weird).

  3. Add support for to: option to be a callback? The RegExp alone is doing fine but one might like to have some special rules. (also might be follow of a base feature).

  4. As we should be able to configure a feature by either String or RegExp then I think it is better to enforce appending end of string token ($) to strings from config.

    const transformations = [
    	{ from: '\\.\\.\\.', to: '…' },
    	{ from: ' -- ', to: ' – ' },
    	{ from: ' --- ', to: ' — ' },
    	// etc
    ];

    looks better then:

    const transformations = [
    	{ from: '\\.\\.\\.$', to: '…' },
    	{ from: ' -- $', to: ' – ' },
    	{ from: ' --- $', to: ' — ' },
    	// etc
    ];

    Rationale behind this is that we want to change typed text (thus at the end of string - before the caret) and not all text in a paragraph. Hope this makes sense. Can be overridden with RegExp.

  5. Finally - what set of default transformations we would like to define?

    It might be a follow-up as the feature should not rely on set of transformations. (ps.: to be tested with large set of text watchers - might be separate ticket also for TextWatcher alone).

@mlewand
Copy link
Contributor

mlewand commented May 16, 2019

  1. I like the idea of making it a part of the typing as it fits it very well. I think that even option B can be extensible while maintaining backward compatibility, so I wouldn't be worried about that.

    Also note that typing pretty much implies text, we don't have to call it textTransformations but we can simplify it down to text.transformations.

    Future format (if needed) might look like:

    editor.config.set( 'typing.transformation', {
    	transformations: [
    		{ from: '\\.\\.\\.', to: '…' },
    	],
    	getActive: () => myAmazingCms.user.wysiwyg.textTransformationActive
    } );

    The only thing I don't like about the above is two subsequent transformation in the config chain.

    Alternatively I'm thinking about API like that:

    editor.config.set( 'typing.transformation', {
    	// Override default transformations set.
    	include: [
    		'dashes',
    		'quotes'
    	],
    	getActive: () => myAmazingCms.user.wysiwyg.textTransformationActive
    } );
    editor.config.set( 'typing.transformation', {
    	// Use the default set +trademark but without symbols.
    	exclude: [
    		'symbols'
    	],
    	extra: [
    		'trademark'
    	],
    	getActive: () => myAmazingCms.user.wysiwyg.textTransformationActive
    } );

    WDYT?

  2. Names for both groups and individual transformation looks very convenient. So option 2b sounds like a way to go. We'll need to pay attention to well document these.

  3. I don't see a need to implement this at this point. If this feature will be requested we'll add it in future releases.

  4. Not sure about this one.

    Can be overridden with RegExp.

    Could you elaborate more on that?

    I understand this as if someone wants to match stuff beyond the caret he or she can skip the $ using raw regexp. E.g.

    const transformations = [
    	{ from: /\.\.\./, to: '…' }, // match beyond caret
    	{ from: ' -- $', to: ' – ' }
    ];
  5. I'll address this later.

@jodator
Copy link
Contributor

jodator commented May 16, 2019

@mlewand:

Ad 4: Instead of pattern you can provide RegExp instance. But matching text inside paragraph will not work as we assume that change is done at the caret so the $ is somewhat obligatory.

The stuff with RegExp would work but only if we allow custom callback for to option.

{ from /abc/g, ( wholeTextBeforeCaret ) => return doSomethingWith( wholeTextBeforeCaret ) }

but instead of /abc/g there would be a more complex regex.

Ad1. The include/exclude/extra is also an option to go. The only thing I don't get is the getActive() callback.

@mlewand
Copy link
Contributor

mlewand commented May 16, 2019

  1. I'm still not sure about that, I don't feel comfortable with doing magic stuff to provided regexp, in a sense that dev provides:

    { from /abc/g, ( wholeTextBeforeCaret ) => return doSomethingWith( wholeTextBeforeCaret ) }

    And we convert it into:

    { from /abc$/g, ( wholeTextBeforeCaret ) => return doSomethingWith( wholeTextBeforeCaret ) }

    As long as our documentation shows this in listing I think we're good with requiring tailing $ character in regexps.

    For sure we can't input regular expressions as a string, because it will prevent us from adding simple string (literal) transformations, like the one you provided:

    { from: 'CKS', to: 'CKSource' }

regarding 1. Yea, I added this getActive() option just as a dummy example how can it evolve later on 🙂

@jodator
Copy link
Contributor

jodator commented May 16, 2019

  1. I'm still not sure about that, I don't feel comfortable with doing magic stuff to provided regexp, in a sense that dev provides:

@mlewand no I don't wan't to convert RegExp objects - only the Strings which are regex patterns.

If someone would like to define:

{ from: 'CKS', to: 'CKSource' }

I'm convinced that we need to create a RegExp like this:

new RegExp( 'CKS$' ); 

so the text watcher (I mean the transformation) will trasnform:

Some CKS here and CKS there is CKS[]

to

Some CKS here and CKS there is CKSource[]

and not to:

Some CKSource here and CKSource there is CKSource[]

This makes sense because one can undo text transformation and thus do not need to worry of the erlier undone transformation to be transformed again.

@jodator
Copy link
Contributor

jodator commented May 16, 2019

So after F2F talk with @mlewand it indeed make no sense to mess with config as I described. The reason behind this was to simplify configuration but at the same time simple strings transformation such as cks -> CKSource & (tm) -> ™ should be done on callbakcks internally and should be simple string.endsWith() not full-blown RegExps.
We should only mention this in the docs.

@Reinmar
Copy link
Member

Reinmar commented May 30, 2019

Could you sum up, guys, what's the plan for this feature? How will the configuration look like? And how it will be triggered?

@jodator
Copy link
Contributor

jodator commented May 30, 2019

Sure. So if you'd like to fiddle with current state of this there's a PR which implements (use ckeditor5 branch for mgit co as this covers multiple repos):

  • the text transformation is triggered by typing changes - before a caret

  • the typed transformation can be undone with undo (and redone with redo)

  • the data or the text that is not just before caret is not transformed

  • there's a minimal set of transformation (it yet to be decided which to add)

  • the transformation are not triggered on selection change (that'd be strange)

  • we'll start with English quotes but we could go with quotes rules for other languages

  • transformations are configured as:

     typing: {
     	transformation: {
     		include: [
     			// configured by feature - set of defaults. 
     		],
     		exclude: [
     			// remove transformations from defaults (see below to decide/implement)
     		]
     		extra: [
     			// user defined additional transformations
     			{ from: 'CKE', to: 'CKEditor' }, // string
     			{ from: /([a-z]+)@(example.com)$/, to: '$1.at.$2' } // regexp
     		]
     	}
     }

Still to decide/implement:

  • sane set of default transformation
  • named configurations (to be used in config's exclude and include)

For named configurations as see it as group name (ie. 'math' or 'quotes') and individual (like 'quotes_primary_en' or 'trademark'.

For group stubs checkout currently implemented (very limited just to showcase groups):

const DEFAULT_TRANSFORMATIONS = [
	// Common symbols:
	{ from: '(c)', to: '©' },
	{ from: '(r)', to: '®' },
	{ from: '(tm)', to: '™' },

	// Mathematical:
	{ from: '1/2', to: '½' },
	{ from: '<=', to: '≤' },

	// Typography:
	{ from: '...', to: '…' },
	{ from: ' -- ', to: ' – ' },
	{ from: ' --- ', to: ' — ' },

	// Quotations:
	// English primary.
	{ from: buildQuotesRegExp( '"' ), to: '$1“$2”' },
	// English secondary.
	{ from: buildQuotesRegExp( '\'' ), to: '$1‘$2’' }
];

As you can see there are some rules:

  1. if it is stirng we're using ' ' in some transformations that should happen only after user type space (mostly dashes).
  2. RegExp must include $ character to denote caret placement (to be described in the docs).

@jodator
Copy link
Contributor

jodator commented May 30, 2019

@Reinmar and a showcase:

Peek 2019-05-30 12-58

Reinmar added a commit to ckeditor/ckeditor5-typing that referenced this issue Jun 27, 2019
Feature: Introduced the text transformation feature. Additionally, the `TextWatcher` util was moved to this package from `@ckeditor/ckeditor5-mention`. Closes ckeditor/ckeditor5#1490.
@mlewand
Copy link
Contributor

mlewand commented Jul 3, 2019

This feature recently was merged to the master branch 🎉

For the usage API see the docs (or in case they're not deployed yet: the discussion above 🙂).

Sample screencast:
Screencast showing parts of text being transformed as the end user types into the editor

@mlewand mlewand removed status:confirmed type:feature This issue reports a feature request (an idea for a new functionality or a missing option). labels Sep 23, 2019
@mlewand mlewand modified the milestones: iteration 25, backlog Sep 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants