
This script extracts a single HTML tag (returns an object) or multiple HTML tags (returns a list of objects) as JSON / Python dictionary. Click the link below to test it online (Google Colab):


tomasfn87/html-tags-to-json


html-tags-to-json

Test it (Google Colab)




This script extracts a single HTML tag (returned as an object) or multiple HTML tags (returned as a list of objects) as JSON:

{
    "name": "head",
    "content": "<head><title>Test</title></head>",
    "innerHtml": {
        "name": "title",
        "content": "<title>Test</title>",
        "innerHtml": "Test"
    }
}
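extractHtmlTagAsDict returns a Python dictionary, not a JSON string. If you want JSON text like the example above, the standard library's json module can serialize the dictionary. This is a minimal sketch: the dictionary is hard-coded from the example instead of being produced by Html, so the snippet runs on its own.

```python
import json

# Same shape as the example above. It is hard-coded here; in practice it
# would come from Html(...).extractHtmlTagAsDict().
head_tag = {
    "name": "head",
    "content": "<head><title>Test</title></head>",
    "innerHtml": {
        "name": "title",
        "content": "<title>Test</title>",
        "innerHtml": "Test",
    },
}

# indent=4 reproduces the layout of the JSON example.
print(json.dumps(head_tag, indent=4))
```

json.dumps quotes the keys with double quotes, which produces valid JSON. A printed Python dictionary would use single quotes instead.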


Usage:


myHtmlContent = Html('<head><title>Test</title></head>')
myHtmlContent.extractHtmlTagAsDict()

Output:

{
    'name': 'head',
    'content': '<head><title>Test</title></head>',
    'innerHtml': {
        'name': 'title',
        'content': '<title>Test</title>',
        'innerHtml': 'Test'
    }
}
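A tag's innerHtml can be another tag dictionary, a string or a list, so the result forms a tree. The helper tag_names below is hypothetical and not part of the script. It relies only on the shapes shown in this README and walks that tree depth-first to collect tag names.

```python
from typing import Any


def tag_names(node: Any) -> list[str]:
    """Collect tag names depth-first from an extractHtmlTagAsDict() result.

    Hypothetical helper. It assumes the following shapes:
    - tags are dicts with a "name" key and an optional "innerHtml" child,
    - text nodes are plain strings,
    - comments are dicts with a "comment" key,
    - sibling nodes are Python lists.
    """
    if isinstance(node, list):
        names: list[str] = []
        for child in node:
            names.extend(tag_names(child))
        return names
    if isinstance(node, dict) and "name" in node:
        # Record this tag, then descend into its inner HTML (if any).
        return [node["name"]] + tag_names(node.get("innerHtml", []))
    # Plain text and comment dicts contribute no tag names.
    return []


head_tag = {
    "name": "head",
    "content": "<head><title>Test</title></head>",
    "innerHtml": {
        "name": "title",
        "content": "<title>Test</title>",
        "innerHtml": "Test",
    },
}
print(tag_names(head_tag))  # ['head', 'title']
```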



If the input contains no HTML, the main extraction method, extractHtmlTagAsDict, simply returns the input string:

myOnlyTextContent = Html('This is not HTML.')
myOnlyTextContent.extractHtmlTagAsDict()

Output:

'This is not HTML.'
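extractHtmlTagAsDict can return a string, a dictionary or a list, depending on the input. A caller that always wants to iterate over nodes might normalize the result first. as_node_list is a hypothetical helper and not part of the script. This is a sketch of that idea.

```python
from typing import Any, Union

# A node is either plain text (str) or a tag/comment dictionary.
Node = Union[str, dict]


def as_node_list(result: Any) -> list[Node]:
    """Wrap a single str or dict result in a list; pass lists through.

    Hypothetical helper: extractHtmlTagAsDict() may return a str, a dict
    or a list, so normalizing lets callers loop over nodes uniformly.
    """
    if isinstance(result, list):
        return result
    return [result]


print(as_node_list('This is not HTML.'))  # ['This is not HTML.']
```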



If the input mixes HTML tags with text and comments, extractHtmlTagAsDict returns a list containing each tag, text node and comment:

myHtmlAndTextContent = Html('''
    <p>
        This is a paragraph.<br>
    </p>
    This is text with HTML (and a comment).
    <!-- A comment -->
''')
myHtmlAndTextContent.extractHtmlTagAsDict()

Output:

[
    {
        'name': 'p',
        'content': '<p>This is a paragraph.<br></p>',
        'innerHtml': [
            'This is a paragraph.',
            {
                'name': 'br',
                'content': '<br>'
            }
        ]
    },
    'This is text with HTML (and a comment).',
    {
        'comment': '<!-- A comment -->'
    }
]
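The list elements have three possible shapes. Text is a string, a comment is a dictionary with a comment key, and a tag is a dictionary with name and content keys. The tag dictionary also has innerHtml when the tag has children, but not for empty tags such as br. The hypothetical helper node_kind tells these shapes apart. This is a sketch based only on the output above, not a function provided by the script.

```python
def node_kind(node) -> str:
    """Classify one element of an extractHtmlTagAsDict() list.

    Hypothetical helper based on the output shapes shown above:
    text is a str, a comment is a dict with a "comment" key and a tag is
    a dict with a "name" key (plus "content" and, optionally, "innerHtml").
    """
    if isinstance(node, str):
        return "text"
    if isinstance(node, dict) and "comment" in node:
        return "comment"
    if isinstance(node, dict) and "name" in node:
        return "tag"
    raise ValueError(f"unrecognized node: {node!r}")


# Hard-coded copy of the mixed-content output above.
result = [
    {
        "name": "p",
        "content": "<p>This is a paragraph.<br></p>",
        "innerHtml": [
            "This is a paragraph.",
            {"name": "br", "content": "<br>"},
        ],
    },
    "This is text with HTML (and a comment).",
    {"comment": "<!-- A comment -->"},
]

print([node_kind(node) for node in result])  # ['tag', 'text', 'comment']
```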
