An example of http_method and body in splash script #49

Open
lopuhin opened this issue Apr 4, 2016 · 4 comments

@lopuhin
Contributor

lopuhin commented Apr 4, 2016

There are examples of using cookies in the docs, but no examples of setting the method and body. I think it would be useful to add one, or perhaps even to add the following class (with a better name). With it, it is possible to use the full capabilities of scrapyjs without digging into Splash scripts:

class DefaultExecuteSplashRequest(SplashRequest):
    '''
    This is a SplashRequest subclass that uses minimal default script
    for the execute endpoint with support for POST requests and cookies.
    '''
    SPLASH_SCRIPT = '''
    function last_response_headers(splash)
        local entries = splash:history()
        local last_entry = entries[#entries]
        return last_entry.response.headers
    end

    function main(splash)
        splash:init_cookies(splash.args.cookies)
        assert(splash:go{
            splash.args.url,
            headers=splash.args.headers,
            http_method=splash.args.http_method,
            body=splash.args.body,
            })
        assert(splash:wait(0.5))

        return {
            headers=last_response_headers(splash),
            cookies=splash:get_cookies(),
            html=splash:html(),
        }
    end
    '''

    def __init__(self, *args, **kwargs):
        kwargs['endpoint'] = 'execute'
        splash_args = kwargs.setdefault('args', {})
        splash_args['lua_source'] = self.SPLASH_SCRIPT
        super(DefaultExecuteSplashRequest, self).__init__(*args, **kwargs)
@lopuhin
Contributor Author

lopuhin commented Apr 4, 2016

Ah, this example is missing http_status support.
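
As a rough sketch of what status support could look like, the script above might read the status from the last `splash:history()` entry and return it under an `http_status` key. Several things here are assumptions rather than anything confirmed in this thread: the `http_status` key name, the `last_response` and `splash_execute_args` helper names, the HAR-like shape of history entries (`response.status`, `response.headers`), and the `"http404"`-style reason string that `splash:go` is expected to return for HTTP error responses.

```python
# Sketch only: a variant of SPLASH_SCRIPT above that also returns the status.
# Assumption: history entries are HAR-like, so response.status is the code.
SPLASH_SCRIPT_WITH_STATUS = '''
function last_response(splash)
    local entries = splash:history()
    return entries[#entries].response
end

function main(splash)
    splash:init_cookies(splash.args.cookies)
    local ok, reason = splash:go{
        splash.args.url,
        headers=splash.args.headers,
        http_method=splash.args.http_method,
        body=splash.args.body,
    }
    -- Assumption: reason looks like "http404" for HTTP error responses.
    -- Abort only on other failures so the status can still be returned.
    if not ok and reason:sub(1, 4) ~= 'http' then
        error(reason)
    end
    assert(splash:wait(0.5))

    local response = last_response(splash)
    return {
        http_status=response.status,
        headers=response.headers,
        cookies=splash:get_cookies(),
        html=splash:html(),
    }
end
'''


def splash_execute_args(url, http_method='GET', body=None):
    """Build an args dict for the execute endpoint (hypothetical helper)."""
    args = {
        'lua_source': SPLASH_SCRIPT_WITH_STATUS,
        'url': url,
        'http_method': http_method,
    }
    # Only include a body when there is one to send.
    if body is not None:
        args['body'] = body
    return args
```

The original script wraps `splash:go` in `assert`, which would abort the script on an error response before any status could be returned; that is why this sketch checks the reason string instead.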

@kmike
Member

kmike commented Apr 4, 2016

Yeah, this makes sense.

For all other endpoints, http_method and body work as-is, but for a Lua script you have to implement them yourself.
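
To illustrate the "works as-is" case, here is a minimal sketch of a JSON payload for the render.html endpoint, where `http_method` and `body` are ordinary arguments and no Lua is involved. The helper name `render_html_payload` and the example URL are made up for illustration, and the sketch only builds the payload; it does not contact a Splash instance.

```python
import json


def render_html_payload(url, http_method='GET', body=None):
    """Build a JSON payload for Splash's render.html endpoint.

    Hypothetical helper: http_method and body are passed as plain
    arguments, so no Lua script is needed for this endpoint.
    """
    payload = {'url': url, 'http_method': http_method}
    # Only include a body when there is one to send.
    if body is not None:
        payload['body'] = body
    return json.dumps(payload, sort_keys=True)


# Example: a POST with a form-encoded body.
post_payload = render_html_payload(
    'http://example.com/login', http_method='POST', body='user=me')
```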

The HTTP status code has been handled for /execute since fa4f287, but in a very limited way: no response body, no headers, no cookies. You're right that it must be handled explicitly in a script to provide a good experience.

As for DefaultExecuteSplashRequest, it looks related to scrapinghub/splash#283; I wonder if we should provide a way to use scripts stored in separate .lua files in SplashExecuteRequest or SplashLuaRequest (or in SplashRequest directly).
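
A minimal sketch of the separate-file idea, assuming the script is simply read from disk and passed along as `lua_source`. The `lua_script_args` helper and the file name in the usage note are hypothetical and not part of any existing API.

```python
from pathlib import Path


def lua_script_args(script_path, **extra_args):
    """Read a Lua script from a .lua file and return execute-endpoint args.

    Hypothetical helper: extra keyword arguments (e.g. http_method, body)
    are copied into the args dict next to lua_source.
    """
    args = dict(extra_args)
    args['lua_source'] = Path(script_path).read_text(encoding='utf-8')
    return args
```

With a file such as `post_with_cookies.lua` next to the spider, the result of `lua_script_args('post_with_cookies.lua', http_method='POST', body='a=b')` could then be passed as the `args` of a request to the execute endpoint.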

@lopuhin
Contributor Author

lopuhin commented Apr 5, 2016

Ah, I missed that error support, this is nice!

I like the SplashExecuteRequest idea. Making it all composable looks really hard though.

lopuhin added a commit to lopuhin/scrapy-splash that referenced this issue Apr 11, 2016
@Gallaecio
Contributor

As for covering this in the documentation, shouldn’t it be done in the Splash documentation instead?
