r.init() is very slow - network issue to github.com, pack() or download manually #376

Closed
zhaoyukoon opened this issue Apr 28, 2022 · 20 comments

@zhaoyukoon

When I try to run import rpa as r; r.init(), the second line never finishes and shows the following message:

[RPA][INFO] - setting up TagUI for use in your Python environment
[RPA][INFO] - downloading TagUI (~200MB) and unzipping to below folder...
[RPA][INFO] - /Users/ygzhao_1

I tried to download TagUI manually and unzip it to /Users/ygzhao_1, but the problem still exists.

I guess it is because of the slow network speed in China.

What can I do to fix this?
Thanks

@kensoh
Member

kensoh commented Apr 28, 2022

Hi @zhaoyukoon, yes, I suspect it is a network issue. Other users have reported similar issues. In China, it seems that the network is not very reliable for downloading files from GitHub, which this package does during the initial run.

Can you try on your personal laptop or use VPN software to see if it works? If it does, you can use the pack() function to create a zip file that you can move to your company intranet computer. More info on using pack() - #36 (comment)

In the future, @kangyiwen may be looking at hosting some of these resource files on a website which can be reliably accessed in China. When that happens, I can upgrade the package to support downloading from alternative hosts.

@kensoh
Member

kensoh commented Apr 28, 2022

Let me know how it goes. There is another way, which is to download manually, but it takes a few steps and is more complicated. You need to manually download the zip file and the delta update files, and then create a dummy marker file to tell the package that it is already updated.

@kensoh kensoh changed the title r.init() is very slow r.init() is very slow - network access issue to reach github.com, some ideas Apr 28, 2022
@kensoh kensoh added the query label Apr 28, 2022
@zhaoyukoon
Author

Thanks for the reply. It is an internet issue. Downloading files from GitHub is very unstable here. Usually I download files manually via Chrome with a proxy. My suggestion is that the setup script be updated to check whether the required files already exist.

Currently I am trying to change tagui.py to add this check.
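A minimal sketch of the kind of check suggested here, not the package's actual code. The function name missing_files and the example file list are hypothetical and only illustrate the idea of skipping the download when every required file is already on disk.

```python
import os

def missing_files(base_dir, required):
    """Return the entries of `required` that do not exist under base_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(base_dir, name))]

# Hypothetical usage: only attempt a download when something is missing.
# required = ['src/tagui', 'src/tagui_header.js']   # illustrative list only
# if missing_files(tagui_dir, required):
#     ...download...
```

An empty result means everything is present, so a setup routine could skip the network step entirely.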

@kensoh
Member

kensoh commented Apr 29, 2022

I see. There are a few steps. First, download the TagUI zip file for your OS, then update it with the delta files from this folder (including the file in tagui.sikuli) - https://github.com/tebelorg/Tump/tree/master/TagUI-Python

Finally, you need to create a dummy file rpa_python_1.47.0 in your tagui folder, as a marker so that the package does not try to download or update again on init(). Can you let me know if doing all the above steps is easier for you than using pack()?

If yes, I will update the package to support users who prefer to manually download the files instead of using pack(). Thanks in advance for your feedback! It will help improve the usability of the rpa package.

@zhaoyukoon
Author

What is the purpose of _tagui_delta, which tries to download files from https://raw.githubusercontent.com/tebelorg/Tump/master/TagUI-Python/? The program always fails here because many files are downloaded.

I suggest the program either download a single zip file or support manual download here.

@kensoh
Member

kensoh commented Apr 29, 2022

The TagUI zip file is big, 100-200MB. The delta file design updates only the small changes (<1MB), instead of downloading the big latest zip file every time. Thanks for your feedback, I will think over how to make it easier.

The real work is the manual downloading. Maybe a good way is to write out clearly the steps to download manually for Windows, Mac and Linux. I think that might be the best and easiest path for users. I will try to do that when I get some time.
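The delta update idea can be sketched roughly as follows. This is an illustration under assumptions, not the package's _tagui_delta code: the file list and base URL are taken from this thread, while the names apply_delta and fetch_url, the per-file retry loop, and the tagui.sikuli/tagui.py URL path are hypothetical additions.

```python
import os
import urllib.request

BASE_URL = 'https://raw.githubusercontent.com/tebelorg/Tump/master/TagUI-Python/'
DELTA_FILES = ['tagui', 'tagui.cmd', 'end_processes', 'end_processes.cmd',
               'tagui_header.js', 'tagui_parse.php', 'tagui.sikuli/tagui.py']

def fetch_url(url):
    """Download one URL and return its bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()

def apply_delta(src_dir, names=DELTA_FILES, fetch=fetch_url, retries=3):
    """Overwrite each delta file under src_dir; return the names that failed."""
    failed = []
    for name in names:
        target = os.path.join(src_dir, *name.split('/'))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        for _ in range(retries):
            try:
                data = fetch(BASE_URL + name)
            except OSError:  # URLError and socket timeouts are OSError subclasses
                continue
            with open(target, 'wb') as f:
                f.write(data)
            break
        else:
            # every attempt raised, so record the file as not updated
            failed.append(name)
    return failed
```

Returning the list of failed files, instead of stopping at the first error, lets a caller report exactly which small files still need to be fetched by hand.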

@zhaoyukoon
Author

Cool! Waiting for the update.

@kensoh
Member

kensoh commented Apr 30, 2022

These are the steps to manually download and set up the dependencies. If you are using init() with network access or using pack(), the steps below are done automatically by the rpa package. Follow them only if you cannot use those options.

1. Download TagUI engine from https://github.com/tebelorg/Tump/releases

  • for Windows, download TagUI_Windows.zip
  • for Mac, download TagUI_macOS.zip
  • for Linux, download TagUI_Linux.zip

2. Unzip the file to the respective folders for your operating system

  • for Windows, look for %APPDATA% (cd %APPDATA%), and put your tagui folder there
  • for Mac or Linux, look for ~ folder (cd ~), put your tagui folder there, rename it to .tagui

3. Download update files from https://github.com/tebelorg/Tump/tree/master/TagUI-Python

  • tagui, tagui.cmd, end_processes, end_processes.cmd, tagui_header.js, tagui_parse.php
  • put these files into the tagui\src (for Windows) or .tagui/src folder (for Mac or Linux)
  • download the tagui.sikuli\tagui.py and overwrite the one in your src\tagui.sikuli folder
  • choose overwrite when prompted, to overwrite your existing files with the above files

4. Create marker file so that rpa package will not try to download anything on init()

  • under the tagui folder (Windows) or the .tagui folder (Mac or Linux),
  • create an empty file rpa_python_1.47.0 (or the version number you pip install)

5. For Windows, check that PHP is working correctly with no broken dependency

6. For Mac and Linux, do the following to ensure execute permissions are there

  • run these commands from the terminal: cd ~ and then chmod -R 755 .tagui

7. For Mac only, do the following steps to fix the OpenSSL issue with Mac
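Parts of steps 2, 4 and 6 above can be sketched in Python. This is a hedged illustration rather than code from the rpa package: the helper names tagui_folder, create_marker and make_executable are hypothetical, and the version string must match whatever version you installed with pip.

```python
import os
import platform

def tagui_folder(system=None, environ=None, home=None):
    """Step 2: return the folder where the steps above place TagUI."""
    system = system or platform.system()
    environ = os.environ if environ is None else environ
    if system == 'Windows':
        return os.path.join(environ['APPDATA'], 'tagui')
    return os.path.join(home or os.path.expanduser('~'), '.tagui')

def create_marker(folder, version):
    """Step 4: create an empty rpa_python_<version> marker file."""
    marker = os.path.join(folder, 'rpa_python_' + version)
    open(marker, 'a').close()
    return marker

def make_executable(folder):
    """Step 6: rough equivalent of chmod -R 755 (Mac and Linux)."""
    os.chmod(folder, 0o755)
    for root, dirs, files in os.walk(folder):
        for name in dirs + files:
            path = os.path.join(root, name)
            if not os.path.islink(path):  # leave symlinks untouched
                os.chmod(path, 0o755)

# Hypothetical usage after unzipping and copying the update files:
# folder = tagui_folder()
# create_marker(folder, '1.47.0')
# make_executable(folder)   # Mac and Linux only
```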

@kensoh
Member

kensoh commented Apr 30, 2022

@zhaoyukoon above are the steps to download and set up manually, let me know if you run into any problems!

@kensoh kensoh changed the title r.init() is very slow - network access issue to reach github.com, some ideas r.init() is very slow - network access issue to github.com, download manually Apr 30, 2022
@kangyiwen

kangyiwen commented Apr 30, 2022

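@zhaoyukoon Downloading from GitHub is indeed very slow, so I have put the files on a host in China.

To download the various TagUI installers, plugins and resources from within China, visit:
Link: https://pan.baidu.com/s/1GrkOxmu9fTBpwzO1JzaNUw?pwd=tagu

If downloading from Baidu Netdisk is still slow, add me on WeChat: kyw2004 and I will send the files to you directly.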

@kensoh
Member

kensoh commented Apr 30, 2022

Thanks @kangyiwen! The rpa package uses a forked version of TagUI that I maintain, in order to adapt the TagUI engine to work well with the Python process.

Thus the rpa package will not work with the standard TagUI (human language version) zip files, which have iterated in another direction. The files at steps 1, 3, 5 and 7 above would need to be downloaded separately.

@kensoh
Member

kensoh commented Apr 30, 2022

Adding on to the above discussion, @zhaoyukoon and @kangyiwen, if you can, could you try the Python script below to see how automatic downloading works from my own server instead of GitHub?

That will tell me whether it makes sense to host a separate download location as an alternative to GitHub.

test_download.py

import rpa as r
r.download('https://tebel.org/TagUI_Windows.zip')

@kensoh
Member

kensoh commented May 4, 2022

I suspect downloading from my tebel.org will not be much better than GitHub. My server is hosted in Singapore.

Anyway, let me know how your testing goes, and whether it makes sense to set up a secondary download server as part of the package.
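If a secondary download server were added, the fallback logic might look something like the sketch below. It is purely illustrative: the rpa package does not necessarily work this way, and download_first_available and fetch_url are hypothetical names.

```python
import urllib.request

def fetch_url(url, timeout=30):
    """Download one URL and return its bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()

def download_first_available(urls, fetch=fetch_url):
    """Try each URL in order; return (url, data) for the first that works."""
    errors = {}
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as error:  # URLError and timeouts are OSError subclasses
            errors[url] = error
    raise OSError('all download locations failed: %r' % errors)

# Hypothetical usage; the first entry stands in for a primary download URL:
# download_first_available([primary_url, 'https://tebel.org/TagUI_Windows.zip'])
```

Trying hosts in order only helps when at least one of them is reliably reachable from the user's network, which is exactly what the test above is meant to find out.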

@kensoh kensoh changed the title r.init() is very slow - network access issue to github.com, download manually r.init() is very slow - network access issue to github.com, pending feedback on 2nd server May 4, 2022
@kensoh kensoh changed the title r.init() is very slow - network access issue to github.com, pending feedback on 2nd server r.init() is very slow - network issue to github.com, pending feedback on 2nd server May 4, 2022
@kensoh kensoh changed the title r.init() is very slow - network issue to github.com, pending feedback on 2nd server r.init() is very slow - network issue to github.com, pending tests on 2nd server May 4, 2022
@zhaoyukoon
Author

Downloading https://tebel.org/TagUI_Windows.zip is still very slow.

@kensoh
Member

kensoh commented May 9, 2022

I see. It looks like for now you need to try these steps here - #376 (comment)

Even if I iterate the codebase of the rpa package to support files pre-downloaded by users, users without access to github.com would still need to download most of the files in the steps above manually from GitHub, using a VPN or other means.

@kensoh
Member

kensoh commented May 9, 2022

An alternative is to try pack() on a PC with internet access. Also, in theory, Python supports setting a proxy.

But I have not validated that this approach works, and there has been no feedback from users so far on that method.

@kensoh kensoh changed the title r.init() is very slow - network issue to github.com, pending tests on 2nd server r.init() is very slow - network issue to github.com, pack() or download manually May 9, 2022
@zhaoyukoon
Author

zhaoyukoon commented May 10, 2022

7. For Mac only, do the following steps to fix the OpenSSL issue with Mac

  • go to this url https://github.com/tebelorg/Tump/releases
  • download and unzip phantomjs-2.1.1-macosx.zip
  • delete the phantomjs folder under .tagui/src folder and
  • replace with the above unzipped folder (rename to phantomjs)

I installed the related files based on your instructions, and it works.

>>> import rpa as r
>>> r.init()
True

Thanks!

@kensoh
Member

kensoh commented May 11, 2022

Thanks @zhaoyukoon for sharing back your findings! Ok I'll close the issue now. In the future, if there's a better way around this, I will improve the rpa package with the new way.

@DoubleK6

Using a proxy to download is faster and easier: open a cmd window, set the proxy for that cmd window, and execute python rpa.py in it. The rpa.py script only calls the pack() function. You get the zip file. Download speed depends on the proxy speed. Even with the manual download method above, having no proxy is still a problem. And if you do have a proxy, why not use the pack() function?

@kensoh
Member

kensoh commented Oct 25, 2023

Adding on, it is also possible to set the proxy within the Python code -
https://stackoverflow.com/questions/31639742/how-to-pass-all-pythons-traffics-through-a-http-proxy
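A minimal sketch of the in-code proxy approach, assuming the download code goes through Python's urllib, which reads the http_proxy and https_proxy environment variables. Whether the rpa package's downloads honour these settings has not been validated in this thread. The helper name use_proxy and the proxy address in the usage comment are placeholders.

```python
import os
import urllib.request

def use_proxy(proxy_url):
    """Point urllib-based code in this process at an HTTP(S) proxy."""
    os.environ['http_proxy'] = proxy_url
    os.environ['https_proxy'] = proxy_url
    # getproxies() reports what urllib will pick up from the environment
    return urllib.request.getproxies()

# Hypothetical usage (untested with rpa; replace the address with your proxy):
# use_proxy('http://127.0.0.1:8080')
# import rpa as r
# r.pack()
```

Setting the variables before importing and calling the package keeps the change local to the current Python process, similar to setting the proxy in a single cmd window.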

4 participants