
Web clipper not clipping code snippets properly #5626

Closed
Jos512 opened this issue Oct 25, 2021 · 15 comments · Fixed by #10126
Labels: bug, stale

Comments

Jos512 commented Oct 25, 2021

When I clip content with the webclipper, newlines are removed from code blocks. The result is one long line rather than a code block in Joplin.

Environment

Joplin version: 2.4.9
Webclipper version: 2.1.3
Browser: Brave version 1.31.87 Chromium: 95.0.4638.54
Platform: Windows 10

Steps to reproduce

  1. Go to https://devblogs.microsoft.com/dotnet/http-3-support-in-dotnet-6/
  2. Use the 'Clip simplified page' feature of the web clipper (the same issue happens with the other clipping modes).
  3. Go to Joplin and find the second code block in the Markdown. It contains:
public  static  async  Task  Main(string[] args)  {  var builder =  WebApplication.CreateBuilder(args); builder.WebHost.ConfigureKestrel((context, options)  =>  { options.Listen(IPAddress.Any,  5001, listenOptions =>  {  // Use HTTP/3 listenOptions.Protocols  =  HttpProtocols.Http1AndHttp2AndHttp3; listenOptions.UseHttps();  });  });  }

But the actual code example in the blog post is:

public static async Task Main(string[] args)
{
  var builder = WebApplication.CreateBuilder(args);
  builder.WebHost.ConfigureKestrel((context, options) =>
  {
    options.Listen(IPAddress.Any, 5001, listenOptions =>
    {
      // Use HTTP/3
      listenOptions.Protocols = HttpProtocols.Http1AndHttp2AndHttp3;
      listenOptions.UseHttps();
    });
  });
}

Second example:

  1. Go to https://gomakethings.com/boolean-shorthands-and-truthiness/
  2. Clip the simplified page with the web clipper.
  3. Go to Joplin. The first code example in the Markdown is:
// Split the text content into an array, using spaces to break words // Use Array.filter() to remove any words without a length // (this removes double spaces in the content) let words = text.value.split(' ').filter(function (word) {
	return word.length;
}); 

But based on the website, the formatting should be:

// Split the text content into an array, using spaces to break words
// Use Array.filter() to remove any words without a length
// (this removes double spaces in the content)
let words = text.value.split(' ').filter(function (word) {
	return word.length;
});

Describe what you expected to happen

I would expect the note to keep the newlines from the original page.
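For reference, the expected behaviour amounts to copying the code's text into a fenced Markdown block unchanged. The following is a minimal sketch, assuming the code's raw text has already been extracted from the page; `toFencedBlock` is a hypothetical helper for illustration, not part of Joplin's clipper:

```javascript
// Hypothetical helper, for illustration only (not Joplin's clipper code):
// wrap the raw text of a <pre><code> element in a fenced Markdown block,
// keeping every newline and indentation character exactly as on the page.
function toFencedBlock(codeText, lang = '') {
  // Pick a fence longer than any run of backticks inside the code,
  // so the code itself can never close the fence early.
  const runs = codeText.match(/`+/g) || [];
  const longest = Math.max(0, ...runs.map((run) => run.length));
  const fence = '`'.repeat(Math.max(3, longest + 1));

  // <pre> content often ends with one trailing newline; drop only that one.
  const body = codeText.replace(/\n$/, '');
  return fence + lang + '\n' + body + '\n' + fence;
}

const snippet =
  "let words = text.value.split(' ').filter(function (word) {\n" +
  '\treturn word.length;\n' +
  '});\n';

console.log(toFencedBlock(snippet, 'js'));
```

Choosing a fence longer than any backtick run follows the CommonMark rule that a closing fence must be at least as long as the opening one.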

Logfile

There are no errors or warnings in the Developer Tools window when clipping.


Thanks for your consideration and let me know if I can provide more information.

Jos512 added the bug label Oct 25, 2021
github-actions bot (Contributor) commented Jan 9, 2022

Hey there, it looks like there has been no activity on this issue recently. Has the issue been fixed, or does it still require the community's attention? This issue may be closed if no further activity occurs. You may comment on the issue and I will leave it open. Thank you for your contributions.

github-actions bot added the stale label Jan 9, 2022
Jos512 (Author) commented Jan 11, 2022

Hi bot, no, the issue still happens, so I'd like to keep this open. Thanks for keeping the repo clean.

github-actions bot removed the stale label Jan 11, 2022
github-actions bot (Contributor) commented:

Hey there, it looks like there has been no activity on this issue recently. Has the issue been fixed, or does it still require the community's attention? This issue may be closed if no further activity occurs. You may comment on the issue and I will leave it open. Thank you for your contributions.

github-actions bot added the stale label Feb 10, 2022
Jos512 (Author) commented Feb 13, 2022

Issue still happens.

github-actions bot removed the stale label Feb 14, 2022
github-actions bot (Contributor) commented:

Hey there, it looks like there has been no activity on this issue recently. Has the issue been fixed, or does it still require the community's attention? This issue may be closed if no further activity occurs. You may comment on the issue and I will leave it open. Thank you for your contributions.

github-actions bot added the stale label Mar 16, 2022
Jos512 (Author) commented Mar 18, 2022

Issue still happens, just checked now again.

github-actions bot removed the stale label Mar 18, 2022
github-actions bot (Contributor) commented:

Hey there, it looks like there has been no activity on this issue recently. Has the issue been fixed, or does it still require the community's attention? This issue may be closed if no further activity occurs. You may comment on the issue and I will leave it open. Thank you for your contributions.

github-actions bot added the stale label Apr 18, 2022
Jos512 (Author) commented Apr 19, 2022

Same issue

Daeraxa (Collaborator) commented Apr 19, 2022

I think this might be expected behaviour, given what the website's HTML actually contains. Look at one of those code blocks:

<div class="highlight">
	<pre class="chroma" style="font-family: Menlo, Monaco, &quot;Courier New&quot;, monospace;">
		<code class="language-js" data-lang="js">
			<span class="c1">// Split the text content into an array, using spaces to break words</span>
			<span class="c1">// Use Array.filter() to remove any words without a length</span>
			<span class="c1">// (this removes double spaces in the content)</span>
			<span class="c1"></span>
			<span class="kd">let</span>
			<span class="nx">words</span>
			<span class="o">=</span>
			<span class="nx">text</span>
			<span class="p">.</span>
			<span class="nx">value</span>
			<span class="p">.</span>
			<span class="nx">split</span>
			<span class="p">(</span>
			<span class="s1">' '</span>
			<span class="p">).</span>
			<span class="nx">filter</span>
			<span class="p">(</span>
			<span class="kd">function</span>
			<span class="p">(</span>
			<span class="nx">word</span>
			<span class="p">)</span>
			<span class="p">{</span>
			<span class="k">return</span>
			<span class="nx">word</span>
			<span class="p">.</span>
			<span class="nx">length</span>
			<span class="p">;</span>
			<span class="p">});</span>
		</code>
	</pre>
</div>

Each token is wrapped in its own <span>, with no actual newline characters between them. If the clipper tried to break on every span tag, the result would be even more of a mess (and it would break all kinds of other formatting).
All these little classes are then styled by the site's CSS, including word-wrap styles on the code class.
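Whether the code keeps its line breaks depends on two things: whether newline characters exist in the text between the spans, and whether the converter treats that text as preformatted. The sketch below models a code element as a flat list of strings (a deliberate simplification; a real clipper works on the DOM, and the variable names are invented for illustration). It shows that folding whitespace the way ordinary HTML text is folded reproduces the symptom from the report, where the statement lands on the same line as the preceding comment and is effectively commented out:

```javascript
// Simplified, hypothetical model of a syntax-highlighted <code> element:
// each entry is either the text inside one token <span> or a raw text node
// that sits between spans. Newlines can only live in those text nodes.
const pieces = [
  '// (this removes double spaces in the content)', '\n',
  'let', ' ', 'words', ' ', '=', ' ', 'text', '.', 'value', ';', '\n',
];

// The element's textContent is simply every piece concatenated, so the
// line breaks survive as long as the '\n' text nodes are present.
const codeText = pieces.join('');

// Normal HTML whitespace handling (used outside <pre>) folds every run of
// spaces, tabs and newlines into a single space.
const collapsed = codeText.replace(/\s+/g, ' ').trim();

console.log(codeText);  // the comment line, then the statement line
console.log(collapsed); // one line: the statement now sits inside the comment
```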

github-actions bot removed the stale label Apr 19, 2022
Jos512 (Author) commented Apr 20, 2022

Thanks for the reply! 🙂 Much appreciated.

I don't think the https://gomakethings.com/boolean-shorthands-and-truthiness/ page lacks newlines. If I open view-source, I see each line of code on its own line:

[Screenshot: view-source of the gomakethings.com page, 2022-04-20]

There are indeed a lot of <span> tags, but Joplin usually works fine with that. Take this page for example: https://flaviocopes.com/bubble-sort-javascript/.

The first code example on that page also uses <span> tags:

<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"><code class="language-js" data-lang="js"><span style="color:#66d9ef">const</span> <span style="color:#a6e22e">bubbleSort</span> <span style="color:#f92672">=</span> (<span style="color:#a6e22e">originalArray</span>) =&gt; {
  <span style="color:#66d9ef">let</span> <span style="color:#a6e22e">swapped</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>

  <span style="color:#66d9ef">const</span> <span style="color:#a6e22e">a</span> <span style="color:#f92672">=</span> [...<span style="color:#a6e22e">originalArray</span>]

  <span style="color:#66d9ef">for</span> (<span style="color:#66d9ef">let</span> <span style="color:#a6e22e">i</span> <span style="color:#f92672">=</span> <span style="color:#ae81ff">1</span>; <span style="color:#a6e22e">i</span> <span style="color:#f92672">&lt;</span> <span style="color:#a6e22e">a</span>.<span style="color:#a6e22e">length</span> <span style="color:#f92672">-</span> <span style="color:#ae81ff">1</span>; <span style="color:#a6e22e">i</span><span style="color:#f92672">++</span>) {
    <span style="color:#a6e22e">swapped</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>

    <span style="color:#66d9ef">for</span> (<span style="color:#66d9ef">let</span> <span style="color:#a6e22e">j</span> <span style="color:#f92672">=</span> <span style="color:#ae81ff">0</span>; <span style="color:#a6e22e">j</span> <span style="color:#f92672">&lt;</span> <span style="color:#a6e22e">a</span>.<span style="color:#a6e22e">length</span> <span style="color:#f92672">-</span> <span style="color:#a6e22e">i</span>; <span style="color:#a6e22e">j</span><span style="color:#f92672">++</span>) {
      <span style="color:#66d9ef">if</span> (<span style="color:#a6e22e">a</span>[<span style="color:#a6e22e">j</span> <span style="color:#f92672">+</span> <span style="color:#ae81ff">1</span>] <span style="color:#f92672">&lt;</span> <span style="color:#a6e22e">a</span>[<span style="color:#a6e22e">j</span>]) {
        ;[<span style="color:#a6e22e">a</span>[<span style="color:#a6e22e">j</span>], <span style="color:#a6e22e">a</span>[<span style="color:#a6e22e">j</span> <span style="color:#f92672">+</span> <span style="color:#ae81ff">1</span>]] <span style="color:#f92672">=</span> [<span style="color:#a6e22e">a</span>[<span style="color:#a6e22e">j</span> <span style="color:#f92672">+</span> <span style="color:#ae81ff">1</span>], <span style="color:#a6e22e">a</span>[<span style="color:#a6e22e">j</span>]]
        <span style="color:#a6e22e">swapped</span> <span style="color:#f92672">=</span> <span style="color:#66d9ef">true</span>
      }
    }

    <span style="color:#66d9ef">if</span> (<span style="color:#f92672">!</span><span style="color:#a6e22e">swapped</span>) {
      <span style="color:#66d9ef">return</span> <span style="color:#a6e22e">a</span>
    }
  }

  <span style="color:#66d9ef">return</span> <span style="color:#a6e22e">a</span>
}</code></pre></div>

But with the 'Clip simplified page' command, the code in Joplin's note becomes:

const bubbleSort = (originalArray) => {
  let swapped = false

  const a = [...originalArray]

  for (let i = 1; i < a.length - 1; i++) {
    swapped = false

    for (let j = 0; j < a.length - i; j++) {
      if (a[j + 1] < a[j]) {
        ;[a[j], a[j + 1]] = [a[j + 1], a[j]]
        swapped = true
      }
    }

    if (!swapped) {
      return a
    }
  }

  return a
}

[Screenshot: the clipped note in Joplin, 2022-04-20]

So for this page Joplin has no problem. (Although I don't know why Joplin fails in one case and succeeds in the other.)
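The difference can be checked mechanically: in the flaviocopes.com markup the line breaks are raw characters sitting between the <span> tags, so removing the tags leaves the code's layout intact. A minimal sketch, assuming the markup as quoted above; `htmlCodeToText` is a hypothetical helper, and a regex is used only to keep the example short (it is not a real HTML parser, and a clipper would read `textContent` from the DOM instead):

```javascript
// Illustration only: strip the highlighting tags from the flaviocopes.com
// markup and decode the few entities it uses. A regex is NOT a real HTML
// parser; a clipper would read textContent from the DOM instead.
function htmlCodeToText(html) {
  return html
    .replace(/<[^>]*>/g, '') // drop <span ...> and </span>
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, '&'); // decode &amp; last so "&amp;lt;" stays "&lt;"
}

// First two lines of the bubbleSort example, copied from the page source.
const html =
  '<span style="color:#66d9ef">const</span> <span style="color:#a6e22e">bubbleSort</span> ' +
  '<span style="color:#f92672">=</span> (<span style="color:#a6e22e">originalArray</span>) =&gt; {\n' +
  '  <span style="color:#66d9ef">let</span> <span style="color:#a6e22e">swapped</span> ' +
  '<span style="color:#f92672">=</span> <span style="color:#66d9ef">false</span>';

console.log(htmlCodeToText(html));
```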

Daeraxa (Collaborator) commented Apr 20, 2022

Yeah, I'm not sure what I was looking at the first time. I thought the first example didn't actually contain newlines within the spans, but it does. I'm not sure why the clipper seems to be removing them.

Jos512 (Author) commented Apr 20, 2022

Yes, I don't get it either. But I appreciate your input; it's always good to get more thoughts. And it's nice not to be talking only to myself and a bot anymore. 🙂

Daeraxa (Collaborator) commented Apr 20, 2022

So I just tried a different Markdown tool (Markdownload, on Chrome and Firefox) and noticed some interesting results with your first example.

If you select just the code block by itself, the extension formats it the same way as the Joplin clipper, but if you also select the line preceding it, it formats it correctly...
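One possible explanation, offered only as a hypothesis (nothing in this thread confirms it): if the clipped fragment starts inside the code block, it may no longer include the enclosing <pre>, and a converter that decides per text node whether whitespace is significant would then fold the newlines. The sketch models this with a toy tree; `render` and the `{ tag, children }` / `{ text }` node shape are invented for illustration and are not the DOM API or Joplin's converter:

```javascript
// Hypothetical, simplified node model (not the real DOM or Joplin's converter):
// an element is { tag, children }, a text node is { text }.
function render(node, ancestors = []) {
  if ('text' in node) {
    // Keep text verbatim only when some ancestor is <pre>; otherwise fold
    // whitespace the way browsers do for ordinary inline text.
    return ancestors.includes('pre') ? node.text : node.text.replace(/\s+/g, ' ');
  }
  return node.children.map((child) => render(child, [...ancestors, node.tag])).join('');
}

// <code><span>let a = 1;</span>\n<span>return a;</span></code>
const code = {
  tag: 'code',
  children: [
    { tag: 'span', children: [{ text: 'let a = 1;' }] },
    { text: '\n' },
    { tag: 'span', children: [{ text: 'return a;' }] },
  ],
};

// Fragment that still includes the <pre> wrapper: line breaks kept.
console.log(render({ tag: 'pre', children: [code] }));
// Fragment that starts inside the block (no <pre>): everything on one line.
console.log(render(code));
```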

github-actions bot (Contributor) commented:

Hey there, it looks like there has been no activity on this issue recently. Has the issue been fixed, or does it still require the community's attention? If you require support or are requesting an enhancement or feature then please create a topic on the Joplin forum. This issue may be closed if no further activity occurs. You may comment on the issue and I will leave it open. Thank you for your contributions.

github-actions bot added the stale label May 20, 2022
github-actions bot (Contributor) commented:

Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, feel free to create a new issue with up-to-date information.

wljince007 added a commit to wljince007/joplin that referenced this issue Mar 18, 2024
laurent22 added a commit that referenced this issue Apr 20, 2024
…e in multiline and delete code number lines (#10126)

Co-authored-by: Laurent Cozic
Co-authored-by: Henry Heino