r.init() failed to download TagUI_Linux.zip file due to firewall - pack() or hack #104

Closed
xxdxxd opened this issue Feb 7, 2020 · 25 comments

@xxdxxd

xxdxxd commented Feb 7, 2020

Hi Sir,

Your RPA looks great. I am trying to test it in our work environment (Ubuntu Linux) but got an error, which I think has to do with our firewall. Do you have suggestions on how to work around this? I downloaded the zip file on Windows and copied it to my home directory on Linux, but the init() function still tried to download the zip file and timed out.

In [100]: r.init(visual_automation = True, chrome_browser = False)
[RPA][INFO] - setting up TagUI for use in your Python environment
[RPA][INFO] - downloading TagUI (~200MB) and unzipping to below folder...
[RPA][INFO] - /home/xxx/
[RPA][ERROR] - failed downloading from https://github.com/tebelorg/Tump/releases/download/v1.0.0/TagUI_Linux.zip...
<urlopen error [Errno 110] Connection timed out>

Thanks,
Dave

@kensoh kensoh changed the title r.init() function failed to download TagUI_Linux.zip file due to firewall r.init() function failed to download TagUI_Linux.zip file due to firewall - use pack() Feb 7, 2020
@kensoh kensoh changed the title r.init() function failed to download TagUI_Linux.zip file due to firewall - use pack() r.init() failed to download TagUI_Linux.zip file due to firewall - use pack() Feb 7, 2020
@kensoh kensoh added the query label Feb 7, 2020
@kensoh
Member

kensoh commented Feb 7, 2020

Hi Dave, yes this looks like your company firewall may be blocking automated downloads from GitHub. You can use the pack() function on a computer with internet access and no firewall. After that, copy the zip file to your work computer to use.

See more details at API reference and the 3-step guide here #36 (comment)

@xxdxxd
Author

xxdxxd commented Feb 7, 2020 via email

@kensoh
Member

kensoh commented Feb 8, 2020

Hi Dave, can you tell me how you downloaded TagUI_Linux.zip if your work computer has no internet access and you are not allowed to copy files from your home computer to your office computer?

If you are going through a proxy, can you try setting it with the following before you run Python and import the package? The standard urllib.request library used by the download() function should pick up the proxy settings defined in the environment.

Windows

set http_proxy=https://proxy.myproxy.com
set https_proxy=https://proxy.myproxy.com

macOS / Linux

export http_proxy=https://proxy.myproxy.com
export https_proxy=https://proxy.myproxy.com

There is no plan to add a function within the package to define a proxy, partly because proxy settings are not applied to the Chrome browser invoked through TagUI, and partly because a typical user of RPA use cases won't be changing the proxy programmatically. Users can still automate the steps through the frontend UI layer to change the proxy the way they would do it manually.
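
A quick way to confirm that Python actually sees the proxy variables set above is to ask urllib.request which proxies it detected. This is only a minimal sketch: the proxy URL is a placeholder, and show_detected_proxies is a hypothetical helper name, not part of the package.

```python
import os
import urllib.request


def show_detected_proxies():
    """Return the proxy mapping that urllib.request would use by default."""
    # getproxies() reads <scheme>_proxy environment variables (and, on
    # macOS and Windows, falls back to system settings if none are set).
    return urllib.request.getproxies()


if __name__ == "__main__":
    # Placeholder value for illustration only; normally this is exported
    # in the shell before starting Python.
    os.environ.setdefault("https_proxy", "http://proxy.example.com:8080")
    for scheme, proxy_url in sorted(show_detected_proxies().items()):
        print(scheme, "->", proxy_url)
```

If the mapping comes back empty in the same shell session where the variables were exported, the variables are not reaching the Python process.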

@kensoh
Member

kensoh commented Feb 8, 2020

Recognising a local TagUI_Linux.zip is not a good solution, because there are 7 other files which are also downloaded during init(). These 7 files are the stable cutting edge version source code files for TagUI. If recognising local files were implemented, users would also have to download these 7 files, which is too much user friction.

@kensoh
Member

kensoh commented Feb 8, 2020

Adding on, since you can download TagUI_Linux.zip on your work computer, there must be some way for your work computer to access the internet.

If the proxy suggestion above does not work and init() can't run, you can use pack() on your home Linux computer to generate the zip file, then upload it using something like Firefox Send - https://send.firefox.com or some other way that allows your work computer to download it.

Let me know more how you download TagUI_Linux.zip on your work computer and any other details so that we can find the best solution to use the package and update it in future.

@xxdxxd
Author

xxdxxd commented Feb 8, 2020 via email

@kensoh
Member

kensoh commented Feb 8, 2020

I see. Thanks for sharing these details! I look forward to hearing how it turns out when you set the environment variables for the proxy. I will work closely with you to figure out a way to have the tool run on your work computer with as little user friction as possible.

Letting the tool run on computers with restricted internet access or no internet access is a primary goal of this project, as I see decentralised distribution and running of software to be very important in the coming decade.

In the meantime, from the above data points, it looks like your firewall may be configured to allow http requests from allowed apps such as the Chrome browser but not from other apps like a Python process, because the URL you use to download TagUI_Linux.zip and the URL the tool automatically downloads from are the same - https://github.com/tebelorg/Tump/releases/download/v1.0.0/TagUI_Linux.zip

If that's the case, in the worst case scenario, the following may work for you. Try this only if the proxy method does not work, because the steps below are many and troublesome.

  1. run r.pack() on your personal Linux computer to generate the zip
  2. on your GitHub account, create a dummy repo and a dummy release
  3. attach the rpa_python.zip and rpa.py as attachments in the dummy release
  4. use Chrome browser on your work laptop to download the files
    (use the same way as how you download TagUI_Linux.zip)

@xxdxxd
Author

xxdxxd commented Feb 10, 2020

I tried setting the environment variables http_proxy and https_proxy in Linux before running Python.

In [1]: import rpa as r
In [2]: r.init()
[RPA][INFO] - setting up TagUI for use in your Python environment
[RPA][INFO] - downloading TagUI (~200MB) and unzipping to below folder...
[RPA][INFO] - /abc/def
[RPA][ERROR] - failed downloading from https://github.com/tebelorg/Tump/releases/download/v1.0.0/TagUI_Linux.zip...

Out[2]: False

So it did not work.
Also, we cannot use a personal GitHub account to transfer files to the company network, so running pack() will not work.

@xxdxxd
Author

xxdxxd commented Feb 10, 2020

I can use the Python requests library to download the TagUI_Linux.zip file without issues after I set up the proxy. When I use the rpa init() function, it gives an error. I wonder if the init() function could accept a parameter for the proxy info.

@kensoh
Member

kensoh commented Feb 11, 2020

From the Python docs, it looks like urllib.request, as used by download(), automatically gets the proxy from the environment - https://docs.python.org/3/library/urllib.request.html

I've also checked the Python requests package and it also seems to use urllib functions to get proxy settings - https://github.com/psf/requests/blob/master/requests/compat.py

Can you try the below to see if it works? Maybe the quotation marks are needed, and the previous solution without quotes, which I found on StackOverflow, is wrong.

export HTTP_PROXY="https://10.10.10.10:8000"
export HTTPS_PROXY="https://10.10.10.10:1212"

In the worst case scenario you can use the 4 steps above to upload the zip file from pack() and use your Python requests package proxy method to do the download.
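
As an aside to the environment variable approach, the standard library also lets a script pass proxies to urllib.request explicitly instead of relying on the environment. The following is a minimal sketch under assumptions: build_proxy_opener is a hypothetical helper, the addresses are the placeholders from this thread, and it is not confirmed that this would get through the firewall discussed here. In particular, a User-Agent set on the opener may not be sent to the proxy itself when an HTTPS tunnel is established.

```python
import urllib.request


def build_proxy_opener(http_proxy, https_proxy, user_agent="Mozilla/5.0"):
    """Build a urllib opener that routes requests through explicit proxies."""
    proxy_handler = urllib.request.ProxyHandler(
        {"http": http_proxy, "https": https_proxy}
    )
    opener = urllib.request.build_opener(proxy_handler)
    # Headers added to every request made through this opener.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


# Placeholder addresses from the thread; replace with the real proxies.
opener = build_proxy_opener("https://10.10.10.10:8000", "https://10.10.10.10:1212")
# opener.open(url) would then fetch url through the configured proxies.
```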

@xxdxxd
Author

xxdxxd commented Feb 11, 2020

I tried the export with quotation marks, it did not work either.

The 4 steps will not work because it will violate the company policy.

@xxdxxd
Author

xxdxxd commented Feb 11, 2020

The following is how I use the proxy, and it works.

import requests
from requests.adapters import HTTPAdapter

class ProxyUAAdapter(HTTPAdapter):
    # send a custom User-Agent header to the proxy itself
    def proxy_headers(self, proxy):
        headers = super(ProxyUAAdapter, self).proxy_headers(proxy)
        headers['User-Agent'] = 'Lynx'
        return headers

s = requests.Session()
# use the adapter for both http and https URLs
s.mount('http://', ProxyUAAdapter())
s.mount('https://', ProxyUAAdapter())

proxies = {'http': 'https://10.10.10.10:8000', 'https': 'https://10.10.10.10:1212'}

url = "https://www.abcd.com/a.zip"

j = s.get(url, proxies = proxies)

@kensoh
Member

kensoh commented Feb 11, 2020

I see. I'm afraid I have no other ideas to try for now, because I have exhausted what the online documentation suggests. I will have to look out for more data points from other users with a similar setup to see if there is some way to solve this.

For the 4th step, I mean you download the uploaded URL using the Python requests module, the same way that you managed to download TagUI_Linux.zip.

@kensoh
Member

kensoh commented Feb 11, 2020

Yes, you can do steps 1 to 3 to upload the zip file and rpa.py to a dummy release on GitHub. After that, use your requests script above to download the files uploaded in step 3 from their URLs.

@kensoh
Member

kensoh commented Feb 11, 2020

I'll avoid adding requests as a dependency for proxy support for now, to keep the package free of dependencies, but you can hack the download() function in tagui.py to include your code above, so that it uses the proxy to download every time.

I'm assuming that what you care about is installing the package to use it. In that case, steps 1 to 3 combined with your requests script should work. If your automation use case involves calling the download() function and needs to access URLs through a proxy, then you will have to hack download() to include your code above so that it always retrieves through your proxies.
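
An alternative to editing tagui.py on disk, not suggested in this thread, is to rebind the module's download attribute at runtime before calling init(). The sketch below demonstrates the general Python mechanism on a stand-in module built in place, so every name in it (demo_module, proxied_download) is hypothetical. It assumes init() calls download() by its module-level name; if that holds for the real package, the same rebinding would apply for the current Python session only.

```python
import types

# Stand-in for the real module, built here only so the sketch runs anywhere.
demo_module = types.ModuleType("demo_module")
exec(
    "def download(url, filename=None):\n"
    "    return 'original: ' + url\n"
    "\n"
    "def init():\n"
    "    # looks up download() in the module namespace at call time\n"
    "    return download('https://example.com/file.zip')\n",
    demo_module.__dict__,
)


def proxied_download(url, filename=None):
    """Replacement that would fetch through a proxy (body omitted here)."""
    return "proxied: " + url


# Rebinding the module attribute makes init() pick up the replacement.
demo_module.download = proxied_download
print(demo_module.init())
```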

@kensoh
Member

kensoh commented Feb 12, 2020

For your and other users' reference, below is the hacked version of download() which includes your requests proxy method to download files. For testing I used some free dummy proxies which are not reliable. They have to be changed to your stable proxies.

Also, the User-Agent header I use is the same Lynx user agent, since it works in your environment. It should be changed to something else valid where appropriate, otherwise I think some web servers will not want to serve a request from the Lynx browser.

def download(download_url = None, filename_to_save = None):
    """function for python 2/3 compatible file download from url"""

    if download_url is None or download_url == '':
        print('[RPA][ERROR] - download URL missing for download()')
        return False

    # if not given, use last part of url as filename to save
    if filename_to_save is None or filename_to_save == '':
        download_url_tokens = download_url.split('/')
        filename_to_save = download_url_tokens[-1]

    # delete existing file if exist to ensure freshness
    if os.path.isfile(filename_to_save):
        os.remove(filename_to_save)

    # handle case where url is invalid or has no content
    try:
        import requests
        from requests.adapters import HTTPAdapter

        class ProxyUAAdapter(HTTPAdapter):
            def proxy_headers(self, proxy):
                headers = super(ProxyUAAdapter, self).proxy_headers(proxy)
                headers['User-Agent'] = 'Lynx'
                return headers

        s = requests.Session()
        # use the adapter for both http and https URLs
        s.mount('http://', ProxyUAAdapter())
        s.mount('https://', ProxyUAAdapter())
        # dummy test proxies, change to your own stable proxies
        proxies = {'http': '142.93.80.189:80', 'https': '148.251.200.199:1080'}
        get_response = s.get(download_url, proxies = proxies)
        # write the response body to file, closing the file even on error
        with open(filename_to_save, 'wb') as downloaded_file:
            downloaded_file.write(get_response.content)

    except Exception as e:
        print('[RPA][ERROR] - failed downloading from ' + download_url + '...')
        print(str(e))
        return False

    # take the existence of downloaded file as success
    if os.path.isfile(filename_to_save):
        return True

    else:
        print('[RPA][ERROR] - failed downloading to ' + filename_to_save)
        return False
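
Since the TagUI zip is around 200MB, holding the whole response in memory before writing it may be undesirable. The following is a minimal sketch of writing a streamed response to disk in chunks. save_response_stream is a hypothetical helper, not part of the package, and it only assumes the response object offers raise_for_status() and iter_content() the way a requests response does.

```python
def save_response_stream(response, filename_to_save, chunk_size=1024 * 1024):
    """Write a streamed HTTP response to disk chunk by chunk.

    `response` is expected to behave like a requests.Response obtained with
    stream=True, i.e. to provide raise_for_status() and iter_content().
    """
    response.raise_for_status()  # fail early on 4xx / 5xx status codes
    bytes_written = 0
    with open(filename_to_save, "wb") as downloaded_file:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                downloaded_file.write(chunk)
                bytes_written += len(chunk)
    return bytes_written


# Possible use inside the hacked download(), in place of the plain s.get()
# call and the file write that follows it:
#   get_response = s.get(download_url, proxies=proxies, stream=True)
#   save_response_stream(get_response, filename_to_save)
```

Because raise_for_status() raises on an error status, the existing except block in download() would still report the failure and return False.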

@kensoh kensoh changed the title r.init() failed to download TagUI_Linux.zip file due to firewall - use pack() r.init() failed to download TagUI_Linux.zip file due to firewall - pack() or hack Feb 12, 2020
@xxdxxd
Author

xxdxxd commented Feb 12, 2020

Is the download() function part of your rpa python library? If so, where can I find this function and then modify it to your latest hacked version?

How do I use this download() function in rpa? before r.init()?
Thanks!

@kensoh
Member

kensoh commented Feb 12, 2020

Yes, download() is inside tagui.py. You can run the below to find the file location and modify it -

import tagui as t
print(t.__file__)

The init() function automatically calls download() to download the files, so there is no need to hack init().
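
To jump straight to the function rather than just the file, the standard inspect module can report the line where a function is defined. This is a minimal sketch; locate_function is a hypothetical helper, and json.dumps is used as a stand-in so the example runs without the package installed.

```python
import inspect
import json  # stand-in module so the sketch runs without the package


def locate_function(func):
    """Return (source file path, first line number) for a Python function."""
    source_file = inspect.getsourcefile(func)
    _, first_line = inspect.getsourcelines(func)
    return source_file, first_line


path, line = locate_function(json.dumps)
print(path, line)
# With the package installed, locate_function(t.download) would point at
# the download() definition inside tagui.py.
```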

@xxdxxd
Author

xxdxxd commented Feb 12, 2020

I made changes to the download() function and used the correct proxy. Now it can download the zip file, but there are new errors:

In [2]: r.init()
[RPA][INFO] - setting up TagUI for use in your Python environment
[RPA][INFO] - downloading TagUI (~200MB) and unzipping to below folder...
[RPA][INFO] - /abc/def/
[RPA][INFO] - done. syncing TagUI with stable cutting edge version
[RPA][INFO] - TagUI now ready for use in your Python environment
[RPA][INFO] - visual automation (optional) requires special setup on Linux,
[RPA][INFO] - see the link below to install OpenCV and Tesseract libraries
[RPA][INFO] - https://sikulix-2014.readthedocs.io/en/latest/newslinux.html
[RPA][ERROR] - following happens when starting TagUI...

/abc/def/.tagui/src/tagui: line 304: type: google-chrome: not found
ERROR - cannot find Chrome command "google-chrome"
update chrome_command setting in tagui/src/tagui and make sure symlink to command is created

============================
google-chrome is not allowed on our Linux machines, but the Firefox browser is installed. Is it possible to use Firefox instead? If so, how do I modify your code to do this?

Thanks!
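
The error above is raised because the google-chrome command is not found on the PATH. A minimal sketch for checking this from Python before calling init() follows. find_browser_command is a hypothetical helper, and the candidate names other than google-chrome are assumptions that are not confirmed by this thread to work with the package.

```python
import shutil


def find_browser_command(candidates=("google-chrome", "chromium-browser", "chromium")):
    """Return the first command from candidates found on PATH, else None."""
    for command in candidates:
        if shutil.which(command):
            return command
    return None


found = find_browser_command()
print(found or "no Chrome-like command found on PATH")
```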

@kensoh
Member

kensoh commented Feb 12, 2020

Oh shucks.. This package is designed to work only with Google Chrome.

One last thing that you can try is to modify tagui.py to change below -

browser_option = 'chrome'

to the following -

browser_option = 'firefox'

And download Firefox v59 from here - https://ftp.mozilla.org/pub/firefox/releases/59.0

However, this has not been tested to work for this package and will most likely fail.

@kensoh
Member

kensoh commented Feb 12, 2020

I'll leave the readme unchanged for now, since it mentions the Chrome browser. But let me know if the readme is confusing and suggests that Firefox or Internet Explorer is supported, and I will update it.

@xxdxxd
Author

xxdxxd commented Feb 12, 2020

I changed it to
browser_option = 'firefox'

===================================================
In [2]: r.init()
[RPA][ERROR] - following happens when starting TagUI...

Gecko error: it seems /usr/bin/firefox is not compatible with SlimerJS.
See Gecko version compatibility. If version is correct, launch slimerjs
with --debug=true to see Firefox error message

Out[2]: False

@kensoh
Member

kensoh commented Feb 12, 2020

Have you installed Firefox v59 from the link in my post above?

@xxdxxd
Author

xxdxxd commented Feb 14, 2020

Our Firefox version is v72.0.1 64-bit on Ubuntu Linux. Does the rpa library only work with v59?

Thanks!

@kensoh
Member

kensoh commented Feb 14, 2020

This package uses the TagUI project. TagUI uses SlimerJS to control Firefox, but only for version 59 and earlier, because from v60 onwards the Firefox architecture changed totally.

You can try installing Firefox v59 and making the modification above to see if it works, but I think it is 99.9% unlikely to work properly. The package is designed to work with Chrome only.
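
The version limit described above can be checked before attempting the Firefox route. This is a minimal sketch with hypothetical helper names; it assumes the version text has the form printed by firefox --version, such as "Mozilla Firefox 72.0.1".

```python
import re


def firefox_major_version(version_output):
    """Extract the major version from text like 'Mozilla Firefox 72.0.1'."""
    match = re.search(r"Firefox\s+(\d+)", version_output)
    return int(match.group(1)) if match else None


def slimerjs_compatible(version_output, max_major=59):
    """True if the major version is at or below the limit named in the thread."""
    major = firefox_major_version(version_output)
    return major is not None and major <= max_major


print(slimerjs_compatible("Mozilla Firefox 72.0.1"))  # False
print(slimerjs_compatible("Mozilla Firefox 59.0"))    # True
```

On a real machine the input could come from subprocess.check_output(["firefox", "--version"], text=True).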

@kensoh kensoh closed this as completed Feb 14, 2020