r.init() failed to download TagUI_Linux.zip file due to firewall - pack() or hack #104

xxdxxd · 2020-02-07T16:24:04Z

Hi Sir,

Your RPA looks great. I am trying to test it in our work environment (Linux Ubuntu) but got an error. I think it has to do with our firewall. Do you have suggestions on how to get around this issue? I downloaded this zip file on Windows and copied it to my home directory on Linux, but the init() function still tried to download the zip file and timed out.

In [100]: r.init(visual_automation = True, chrome_browser = False)
[RPA][INFO] - setting up TagUI for use in your Python environment
[RPA][INFO] - downloading TagUI (~200MB) and unzipping to below folder...
[RPA][INFO] - /home/xxx/
[RPA][ERROR] - failed downloading from https://github.com/tebelorg/Tump/releases/download/v1.0.0/TagUI_Linux.zip...
<urlopen error [Errno 110] Connection timed out>

Thanks,
Dave

kensoh · 2020-02-07T16:34:19Z

Hi Dave, yes this looks like your company firewall may be blocking automated downloads from GitHub. You can use the pack() function on a computer with internet access and no firewall. After that, copy the zip file to your work computer to use.

See more details in the API reference and the 3-step guide here: #36 (comment)

xxdxxd · 2020-02-07T23:10:37Z

Hi Ken, our organization has a firewall. Your method is to first use pack() to get the zip file, but pack() still fails on our organization's network because of the firewall, and we are not allowed to copy files from home computers to office computers. How can I deal with this? I can download the TagUI_Linux.zip file on my office computer. Could you modify your init() function so that it checks whether this zip file already exists and, if so, uses it directly instead of downloading it? Maybe there are better ways to handle this, such as specifying an HTTP proxy. Thanks, Dave


kensoh · 2020-02-08T08:55:01Z

Hi Dave, can you tell me how you download TagUI_Linux.zip when your work computer has no access to the internet and you are not allowed to copy files from your home computer to your office computer?

If you are going through a proxy, can you try setting it as follows before you run Python and import the package? The standard urllib.request module used by the download() function should pick up the proxy settings defined in the environment.

Windows

set http_proxy=https://proxy.myproxy.com
set https_proxy=https://proxy.myproxy.com

macOS / Linux

export http_proxy=https://proxy.myproxy.com
export https_proxy=https://proxy.myproxy.com
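As a quick check, Python's standard library exposes the same proxy lookup that urllib.request performs, through urllib.request.getproxies(). This is a minimal sketch, not part of the package, and the proxy address is a placeholder assumption. One detail worth knowing: urllib's default opener reads these variables when it is first built, so they need to be in place before Python starts (or at least before the first download).

```python
import os
import urllib.request

# Placeholder proxy address (assumption); substitute your organisation's proxy.
PROXY = "http://proxy.myproxy.com:8080"

# Same effect as `export http_proxy=...` / `export https_proxy=...`,
# but only for this Python process and any child processes it starts.
os.environ["http_proxy"] = PROXY
os.environ["https_proxy"] = PROXY

# getproxies() returns the proxies urllib.request will use, keyed by scheme.
proxies = urllib.request.getproxies()
print(proxies.get("http"))
print(proxies.get("https"))
```

If the printed values are empty or wrong, the variables are not reaching the Python process, which rules out problems with the download itself.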

There is no plan to add a function to the package for defining a proxy, partly because proxy settings are not applied to the Chrome browser invoked through TagUI, and partly because a typical user in RPA use cases won't be changing proxies programmatically. Users can still automate the steps to change the proxy through the front-end UI, the same way they would do it manually.

kensoh · 2020-02-08T08:57:14Z

Recognising a local TagUI_Linux.zip is not a good solution, because init() also downloads 7 other files. These 7 files are the stable cutting-edge versions of TagUI's source code files. If init() recognised local files, users would also have to download these 7 files manually, which is too much user friction.

kensoh · 2020-02-08T09:00:39Z

Adding on: since you can download TagUI_Linux.zip on your work computer, there must be some way for your work computer to access the internet.

If the proxy suggestion above does not work and init() still can't run, you can use pack() on your home Linux computer to generate the zip file, then upload it using something like Firefox Send (https://send.firefox.com) or any other service your work computer is allowed to download from.

Let me know more about how you download TagUI_Linux.zip on your work computer, and any other details, so that we can find the best way to use the package and improve it in future.

xxdxxd · 2020-02-08T13:16:30Z

In our company, all computers can access the internet, and we have a firewall. The Windows computers are set up so that they can access most websites and download files like TagUI_Linux.zip without problems. Personal email accounts and other file-sharing websites are all blocked (including send.firefox.com), and USB drives are not allowed either. There is no way to transfer files from a home computer to an office computer; the policy is very strict. Linux machines on our office network can access the internet through the firewall, using an HTTP proxy. I have used the Python requests library with a proxy setup before to download data from some websites. I am not sure whether just setting the environment variable for the HTTP proxy will work; I will test it next week. From your YouTube video demo, this looks like a great tool, and I really want to use it. Thanks!

kensoh · 2020-02-08T14:13:52Z

I see. Thanks for sharing these details! I look forward to hearing how it turns out when you set the environment variables for the proxy. I will work with you closely to find a way to run the tool on your work computer with as little user friction as possible.

Letting the tool run on computers with restricted or no internet access is a primary goal of this project, as I see decentralised distribution and running of software becoming very important in the coming decade.

In the meantime, from the data points above, it looks like your firewall may be configured to allow HTTP requests from approved apps such as the Chrome browser, but not from other apps such as a Python process. That is because the URL you use to download TagUI_Linux.zip manually is the same one the tool uses to download it automatically: https://github.com/tebelorg/Tump/releases/download/v1.0.0/TagUI_Linux.zip

If that's the case, the following may work for you as a last resort. Try it only if setting the proxy does not work, because there are many steps and they are troublesome.

1. Run r.pack() on your personal Linux computer to generate the zip file.
2. On your GitHub account, create a dummy repo and a dummy release.
3. Attach rpa_python.zip and rpa.py as attachments to the dummy release.
4. Use the Chrome browser on your work laptop to download the files (the same way you download TagUI_Linux.zip).

xxdxxd · 2020-02-10T19:04:27Z

I tried setting the environment variables http_proxy and https_proxy in Linux before running Python.

In [1]: import rpa as r
In [2]: r.init()
[RPA][INFO] - setting up TagUI for use in your Python environment
[RPA][INFO] - downloading TagUI (~200MB) and unzipping to below folder...
[RPA][INFO] - /abc/def
[RPA][ERROR] - failed downloading from https://github.com/tebelorg/Tump/releases/download/v1.0.0/TagUI_Linux.zip...

Out[2]: False

So it did not work.
Also, we cannot use a personal GitHub account to transfer files to the company network, so the pack() approach will not work.

xxdxxd · 2020-02-10T20:01:03Z

I can use the Python requests library to download the TagUI_Linux.zip file without issues after setting up the proxy, but the rpa init() function gives an error. I wonder if init() could accept a parameter for the proxy settings.

kensoh · 2020-02-11T09:42:21Z

From the Python docs, since download() uses urllib.request, it should automatically get the proxy settings from the environment: https://docs.python.org/3/library/urllib.request.html

I've also checked the Python requests package, and it seems to use the same urllib functions to get proxy settings: https://github.com/psf/requests/blob/master/requests/compat.py

Can you try the following to see if it works? Maybe the quotation marks are needed, and the solution I found earlier on StackOverflow without quotes is wrong.

export HTTP_PROXY="https://10.10.10.10:8000"
export HTTPS_PROXY="https://10.10.10.10:1212"
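To take the environment out of the picture entirely (the shell treats quoted and unquoted values identically when they contain no spaces or special characters), urllib can also be given proxies explicitly through urllib.request.ProxyHandler, which mirrors the proxies dict passed to requests. This is a sketch only: the addresses and the User-Agent string are placeholder assumptions, and the package's download() does not use an opener like this. Also note that, unlike a requests adapter that overrides proxy_headers(), headers added here go to the target server, not to the proxy's CONNECT request.

```python
import urllib.request

# Placeholder proxy addresses (assumption); use your organisation's values.
proxies = {
    "http": "http://10.10.10.10:8000",
    "https": "http://10.10.10.10:1212",
}

# An explicit dict means the *_proxy environment variables are ignored.
proxy_handler = urllib.request.ProxyHandler(proxies)
opener = urllib.request.build_opener(proxy_handler)

# Default headers for every request made through this opener.
opener.addheaders = [("User-Agent", "Mozilla/5.0")]

# Usage (needs network access through the proxy):
# url = "https://github.com/tebelorg/Tump/releases/download/v1.0.0/TagUI_Linux.zip"
# with opener.open(url, timeout=60) as resp, open("TagUI_Linux.zip", "wb") as f:
#     f.write(resp.read())
```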

In the worst case, you can use the 4 steps above to upload the zip file from pack(), then use your Python requests proxy method to do the download.

xxdxxd · 2020-02-11T13:30:50Z

I tried the export with quotation marks, it did not work either.

The 4 steps will not work because they would violate company policy.

xxdxxd · 2020-02-11T13:58:02Z

The following is how I use the proxy, and it works.

import requests
from requests.adapters import HTTPAdapter

# adapter that adds a User-Agent to the headers sent to the proxy
class ProxyUAAdapter(HTTPAdapter):
    def proxy_headers(self, proxy):
        headers = super(ProxyUAAdapter, self).proxy_headers(proxy)
        headers['User-Agent'] = 'Lynx'
        return headers

# use the adapter for both plain HTTP and HTTPS URLs
s = requests.Session()
s.mount('http://', ProxyUAAdapter())
s.mount('https://', ProxyUAAdapter())

proxies = {'http': 'https://10.10.10.10:8000', 'https': 'https://10.10.10.10:1212'}

url = "https://www.abcd.com/a.zip"

j = s.get(url, proxies = proxies)

kensoh · 2020-02-11T13:58:09Z

I see. I'm afraid I have no other ideas to try for now, because I have exhausted what the online documentation suggests. I will have to look out for more data points from other users with a similar setup to see if there is some way to solve this.

For the 4th step, downloading the files, I mean that you download them from the uploaded URLs using the Python requests module, the same way you managed to download TagUI_Linux.zip.

kensoh · 2020-02-11T14:00:43Z

Yes, you can do steps 1 to 3 to upload the zip file and rpa.py to a dummy release on GitHub. After that, use your requests script above to download the files uploaded in step 3 from their URLs.

kensoh · 2020-02-11T14:10:49Z

I'll avoid adding requests as a dependency for proxy support for now, to keep the package free of dependencies. But you can hack the download() function in tagui.py to include your code above, so that it downloads through the proxy every time.

I'm assuming that what you care about is installing the package so you can use it. In that case, steps 1 to 3 combined with your requests script for the download should work. If your automation use case also involves calling download() and accessing URLs through the proxy, then you will have to hack download() to include your code above so that it always retrieves files through your proxies.

kensoh · 2020-02-12T02:20:35Z

For your reference and that of other users, below is a hacked version of download() that uses your requests proxy method to download files. I tested it with a free public proxy, which is not reliable; it has to be changed to your own stable proxies.

Also, I kept the Lynx User-Agent header since it works in your environment. It should be changed to another valid value where appropriate, otherwise I think some web servers will refuse to serve a request from the Lynx browser.

def download(download_url = None, filename_to_save = None):
    """function for python 2/3 compatible file download from url"""
    # note - os is already imported at the top of tagui.py

    if download_url is None or download_url == '':
        print('[RPA][ERROR] - download URL missing for download()')
        return False

    # if not given, use last part of url as filename to save
    if filename_to_save is None or filename_to_save == '':
        download_url_tokens = download_url.split('/')
        filename_to_save = download_url_tokens[-1]

    # delete existing file if it exists to ensure freshness
    if os.path.isfile(filename_to_save):
        os.remove(filename_to_save)

    # handle case where url is invalid or has no content
    try:
        import requests
        from requests.adapters import HTTPAdapter

        # adapter that adds a User-Agent to the headers sent to the proxy
        class ProxyUAAdapter(HTTPAdapter):
            def proxy_headers(self, proxy):
                headers = super(ProxyUAAdapter, self).proxy_headers(proxy)
                headers['User-Agent'] = 'Lynx'
                return headers

        s = requests.Session()
        s.mount('http://', ProxyUAAdapter())
        s.mount('https://', ProxyUAAdapter())

        # dummy free proxies used for testing - replace with your stable proxies
        proxies = {'http': '142.93.80.189:80', 'https': '148.251.200.199:1080'}
        get_response = s.get(download_url, proxies = proxies)

        # treat HTTP error responses (4xx/5xx) as failed downloads
        get_response.raise_for_status()
        with open(filename_to_save, 'wb') as downloaded_file:
            downloaded_file.write(get_response.content)

    except Exception as e:
        print('[RPA][ERROR] - failed downloading from ' + download_url + '...')
        print(str(e))
        return False

    # take the existence of downloaded file as success
    if os.path.isfile(filename_to_save):
        return True

    else:
        print('[RPA][ERROR] - failed downloading to ' + filename_to_save)
        return False
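One thing to keep in mind with this approach is that reading get_response.content loads the whole ~200 MB zip file into memory before writing it. A possible refinement, sketched below under stated assumptions, streams the body to disk in chunks using requests' stream=True and iter_content(), and only moves the file into place once the download completes. stream_to_file is a hypothetical helper, not part of tagui.py; it assumes Python 3 and a requests.Session (such as the one built above with ProxyUAAdapter) passed in as session.

```python
import os
import tempfile


def stream_to_file(session, url, filename, proxies=None, chunk_size=1024 * 1024):
    """Hypothetical helper: stream url to filename in chunks; True on success.

    `session` is assumed to be a requests.Session (or any object with the
    same get(..., stream=True) / iter_content() interface).
    """
    folder = os.path.dirname(os.path.abspath(filename))
    # write to a temporary file in the same folder first, so a failed
    # download never leaves a truncated file under the final name
    fd, tmp_path = tempfile.mkstemp(dir=folder, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            response = session.get(url, proxies=proxies, stream=True, timeout=60)
            try:
                response.raise_for_status()  # HTTP 4xx/5xx counts as failure
                for chunk in response.iter_content(chunk_size=chunk_size):
                    if chunk:  # skip empty keep-alive chunks
                        tmp_file.write(chunk)
            finally:
                response.close()
        os.replace(tmp_path, filename)  # move into place only when complete
        return True
    except Exception as e:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        print('[RPA][ERROR] - failed downloading from ' + url + '...')
        print(str(e))
        return False
```

Inside the hacked download(), the three lines after s.get() could then be replaced by a call such as stream_to_file(s, download_url, filename_to_save, proxies = proxies).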

xxdxxd · 2020-02-12T13:57:57Z

Is the download() function part of your rpa Python library? If so, where can I find this function so I can replace it with your hacked version?

How do I use this download() function in rpa? Do I call it before r.init()?
Thanks!

kensoh · 2020-02-12T14:47:10Z

Yes, download() is inside tagui.py. You can run the following to find the file's location and modify it:

import tagui as t
print(t.__file__)

The init() function automatically calls download() to download the files, so there is no need to hack init().
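If you would rather locate tagui.py without importing it, the standard library's importlib.util.find_spec() can report where a module would be loaded from. A minimal sketch; module_path is a hypothetical helper name, not part of the package.

```python
import importlib.util


def module_path(name):
    """Return the source file of an importable top-level module, or None."""
    spec = importlib.util.find_spec(name)
    # find_spec returns None when the module is not installed;
    # spec.origin is the path of the file Python would load
    return spec.origin if spec is not None else None


# prints the path to tagui.py, or None if tagui is not installed
print(module_path("tagui"))
```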

xxdxxd · 2020-02-12T19:07:35Z

I made the changes to the download() function and used the correct proxy. Now it can download the zip file, but there are new errors:

In [2]: r.init()
[RPA][INFO] - setting up TagUI for use in your Python environment
[RPA][INFO] - downloading TagUI (~200MB) and unzipping to below folder...
[RPA][INFO] - /abc/def/
[RPA][INFO] - done. syncing TagUI with stable cutting edge version
[RPA][INFO] - TagUI now ready for use in your Python environment
[RPA][INFO] - visual automation (optional) requires special setup on Linux,
[RPA][INFO] - see the link below to install OpenCV and Tesseract libraries
[RPA][INFO] - https://sikulix-2014.readthedocs.io/en/latest/newslinux.html
[RPA][ERROR] - following happens when starting TagUI...

/abc/def/.tagui/src/tagui: line 304: type: google-chrome: not found
ERROR - cannot find Chrome command "google-chrome"
update chrome_command setting in tagui/src/tagui and make sure symlink to command is created

============================
google-chrome is not allowed on our Linux machines, but the Firefox browser is installed. Is it possible to use Firefox instead? If so, how should I modify your code to do this?

Thanks!

kensoh · 2020-02-12T20:35:43Z

Oh shucks. This package is designed to work only with Google Chrome.

One last thing you can try is to modify tagui.py, changing the line below:

browser_option = 'chrome'

to the following -

browser_option = 'firefox'

And download Firefox v59 from here: https://ftp.mozilla.org/pub/firefox/releases/59.0

However, this has not been tested to work for this package and will most likely fail.

kensoh · 2020-02-12T20:39:52Z

I'll leave the readme unchanged for now, since it already mentions the Chrome browser. But let me know if the readme is confusing and suggests that Firefox or Internet Explorer is supported, and I will update it.

xxdxxd · 2020-02-12T21:30:58Z

I changed it to
browser_option = 'firefox'

===================================================
In [2]: r.init()
[RPA][ERROR] - following happens when starting TagUI...

Gecko error: it seems /usr/bin/firefox is not compatible with SlimerJS.
See Gecko version compatibility. If version is correct, launch slimerjs
with --debug=true to see Firefox error message

Out[2]: False

kensoh · 2020-02-12T21:36:11Z

Have you installed Firefox v59 from the link in my post above?

xxdxxd · 2020-02-14T13:36:04Z

Our Firefox version is v72.0.1 (64-bit) on Linux Ubuntu. Does the rpa library only work with v59?

Thanks!

kensoh · 2020-02-14T16:59:59Z

This package uses the TagUI project. TagUI uses SlimerJS to control Firefox, but SlimerJS only works with version 59 and earlier, because the Firefox architecture changed completely from v60 onwards.

You can try installing Firefox v59 and making the modification above to see if it works, but I think there is a 99.9% chance it will not work properly. The package is designed to work with Chrome only.
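Since the version cut-off decides whether this route is even possible, a quick way to check a machine is `firefox --version`, which prints a line such as `Mozilla Firefox 72.0.1`. The sketch below is not part of the package: the helper names are hypothetical, the 59 cut-off is taken from the explanation above, and it assumes Python 3.7+ for subprocess.run(capture_output=...).

```python
import re
import shutil
import subprocess

# Cut-off from this thread: SlimerJS only drives Firefox 59 and earlier.
MAX_SLIMERJS_FIREFOX = 59


def firefox_major_version(version_text):
    """Extract the major version from text like 'Mozilla Firefox 72.0.1'."""
    match = re.search(r"Firefox\s+(\d+)", version_text)
    return int(match.group(1)) if match else None


def installed_firefox_version(command="firefox"):
    """Run `<command> --version` and return the major version, or None."""
    if shutil.which(command) is None:
        return None  # command not on PATH
    try:
        result = subprocess.run([command, "--version"],
                                capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None
    return firefox_major_version(result.stdout)


def slimerjs_compatible(major):
    """True only for a known Firefox major version at or below the cut-off."""
    return major is not None and major <= MAX_SLIMERJS_FIREFOX


# Usage:
# version = installed_firefox_version()
# print(version, slimerjs_compatible(version))
```

On the setup described above (v72.0.1), this check would report the browser as incompatible, matching the Gecko error shown earlier.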

kensoh changed the title ~~r.init() function failed to download TagUI_Linux.zip file due to firewall~~ r.init() function failed to download TagUI_Linux.zip file due to firewall - use pack() Feb 7, 2020

kensoh changed the title ~~r.init() function failed to download TagUI_Linux.zip file due to firewall - use pack()~~ r.init() failed to download TagUI_Linux.zip file due to firewall - use pack() Feb 7, 2020

kensoh added the query label Feb 7, 2020

kensoh mentioned this issue Feb 8, 2020

How to use proxy in accessing websites? - for feedback and discussion #97

Closed

kensoh changed the title ~~r.init() failed to download TagUI_Linux.zip file due to firewall - use pack()~~ r.init() failed to download TagUI_Linux.zip file due to firewall - pack() or hack Feb 12, 2020

kensoh closed this as completed Feb 14, 2020
