
DATABASE_URL and DATA_PATH options do not take effect in the config file #100

Closed
LcodingL opened this issue Nov 4, 2019 · 21 comments
Labels: bug (Something isn't working)

LcodingL commented Nov 4, 2019

Describe the bug
I've set the DATABASE_URL option to a correctly formatted MySQL URI and restarted scrapydweb, but none of the databases in [DB_APSCHEDULER, DB_TIMERTASKS, DB_METADATA, DB_JOBS] was created, and the DATABASE settings displayed on the web UI are still "sqlite:////......"

To Reproduce
Steps to reproduce the behavior:

  1. Edit 'scrapydweb_settings_v10.py', setting DATABASE_URL = 'mysql://root:[email protected]:3306'.
  2. Run the command: pip install --upgrade pymysql
  3. Restart scrapydweb by running the command 'scrapydweb' in the directory where the config file is.

Expected behavior

  1. I previously used the default DATABASE_URL and data were stored in SQLite normally; now I want to use the MySQL backend. Will the related databases be created in MySQL automatically?
  2. Since I didn't migrate the database from SQLite to MySQL manually, I expected no job status to be displayed on the Dashboard after setting the MySQL DATABASE_URL in the config file. But it showed all the job statuses as before, and the DATABASE settings displayed on the web UI are still "sqlite:////......"
  3. Is the DATABASE value displayed on the web UI actually the database used by the running scrapydweb?
  4. Do I need to migrate data from SQLite to MySQL manually if I want to use the MySQL backend in the future?

Logs

[2019-11-04 15:48:56,143] INFO in apscheduler.scheduler: Scheduler started
[2019-11-04 15:48:56,162] INFO in scrapydweb.run: ScrapydWeb version: 1.4.0
[2019-11-04 15:48:56,163] INFO in scrapydweb.run: Use 'scrapydweb -h' to get help
[2019-11-04 15:48:56,163] INFO in scrapydweb.run: Main pid: 2630
[2019-11-04 15:48:56,163] DEBUG in scrapydweb.run: Loading default settings from /Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/default_settings.py


Overriding custom settings from /Users/laihuiying/Workspace/PythonEnv/scrapydweb/scrapydweb_settings_v10.py


[2019-11-04 15:48:56,321] DEBUG in scrapydweb.run: Reading settings from command line: Namespace(bind='0.0.0.0', debug=False, disable_auth=False, disable_logparser=False, disable_monitor=False, port=5000, scrapyd_server=None, switch_scheduler_state=False, verbose=True)
[2019-11-04 15:48:56,321] DEBUG in scrapydweb.utils.check_app_config: Checking app config
[2019-11-04 15:48:56,323] INFO in scrapydweb.utils.check_app_config: Setting up URL_SCRAPYDWEB: http://127.0.0.1:5000
[2019-11-04 15:48:56,324] DEBUG in scrapydweb.utils.check_app_config: Checking connectivity of SCRAPYD_SERVERS...

Index Group Scrapyd IP:Port Connectivity Auth
#######################################################################
1____ dataocean___________ 10.8.32.56:6800_______ True_______ None
2____ dataocean___________ 10.8.64.78:6800_______ True_______ None
#######################################################################

/Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/sqlalchemy/ext/declarative/clsregistry.py:129: SAWarning: This declarative base already contains a class with the same class name and module name as scrapydweb.models.Job, and will be replaced in the string-lookup table.
% (item.module, item.name)
[2019-11-04 15:48:56,436] DEBUG in scrapydweb.utils.check_app_config: Created 2 tables for JobsView
[2019-11-04 15:48:56,436] INFO in scrapydweb.utils.check_app_config: Locating scrapy logfiles with SCRAPYD_LOG_EXTENSIONS: ['.log', '.log.gz', '.txt']
[2019-11-04 15:48:56,440] INFO in scrapydweb.utils.check_app_config: Scheduler for timer tasks: STATE_RUNNING
[2019-11-04 15:48:56,481] INFO in scrapydweb.utils.check_app_config: create_jobs_snapshot (trigger: interval[0:05:00], next run at: 2019-11-04 15:53:56 CST)


Visit ScrapydWeb at http://127.0.0.1:5000 or http://IP-OF-THE-CURRENT-HOST:5000


[2019-11-04 15:48:56,486] INFO in scrapydweb.run: For running Flask in production, check out http://flask.pocoo.org/docs/1.0/deploying/

  • Serving Flask app "scrapydweb" (lazy loading)
  • Environment: production
    WARNING: This is a development server. Do not use it in a production deployment.
    Use a production WSGI server instead.
  • Debug mode: off
    [2019-11-04 15:48:56,487] DEBUG in apscheduler.scheduler: Next wakeup is due at 2019-11-04 15:53:56.480998+08:00 (in 299.999017 seconds)
    [2019-11-04 15:49:26,498] INFO in werkzeug: * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
    [2019-11-04 15:49:26,585] DEBUG in ApiView: view_args of http://127.0.0.1:5000/1/api/daemonstatus/
    {
    "node": 1,
    "opt": "daemonstatus",
    "project": null,
    "version_spider_job": null
    }

Environment (please complete the following information):

  • OS: macOS 10.14
  • Python: 3.6
  • ScrapydWeb: 1.4.0
  • Browser: Chrome 73

Thanks for your time!

@LcodingL LcodingL added the bug Something isn't working label Nov 4, 2019
@my8100 my8100 removed their assignment Nov 4, 2019
@my8100 my8100 added insufficient info No action would be taken until more info is provided and removed bug Something isn't working labels Nov 4, 2019
my8100 (Owner) commented Nov 4, 2019

Check and make sure DATABASE_URL has been configured as expected:

$ echo $DATABASE_URL
$ python -c "from scrapydweb_settings_v10 import DATABASE_URL; print(DATABASE_URL)"

LcodingL (Author) commented Nov 4, 2019

Thanks for your timely reply!
I set the configuration in this way:

DATABASE_URL = 'mysql://root:[email protected]:3306'

Ran the commands and got the results below:
$ echo $DATABASE_URL
-->''
$ python -c "from scrapydweb_settings_v10 import DATABASE_URL; print(DATABASE_URL)"
-->'mysql://root:[email protected]:3306'

Is there anything wrong?

my8100 (Owner) commented Nov 4, 2019

Is this the file you are editing?

Overriding custom settings from /Users/laihuiying/Workspace/PythonEnv/scrapydweb/scrapydweb_settings_v10.py
  1. Remove the DATABASE_URL option in the config file.
  2. Execute $ export SCRAPYDWEB_TESTMODE=True and $ export DATABASE_URL=mysql://root:[email protected]:3306
  3. Restart scrapydweb; log lines like those below should be found at the beginning.
  4. If not found, mv scrapydweb_settings_v10.py scrapydweb_settings_v10.py.bak, then pip uninstall scrapydweb, then pip install --upgrade scrapydweb, and finally restart scrapydweb.
APSCHEDULER_DATABASE_URI: mysql://root:[email protected]:3306/scrapydweb_apscheduler
SQLALCHEMY_DATABASE_URI: mysql://root:[email protected]:3306/scrapydweb_timertasks
SQLALCHEMY_BINDS: {'jobs': 'mysql://root:[email protected]:3306/scrapydweb_jobs', 'metadata': 'mysql://root:[email protected]:3306/scrapydweb_metadata'}
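
Judging from those three log lines, scrapydweb appears to derive the four per-component URIs by appending a database name to the single DATABASE_URL. A minimal sketch of that derivation (the helper name derive_uris is illustrative, not scrapydweb's actual API; only the suffixes are taken from the log above):

# A sketch only, inferred from the expected log lines above.
def derive_uris(database_url):
    names = ['scrapydweb_apscheduler', 'scrapydweb_timertasks',
             'scrapydweb_jobs', 'scrapydweb_metadata']
    return {name: '%s/%s' % (database_url.rstrip('/'), name) for name in names}

for uri in derive_uris('mysql://root:[email protected]:3306').values():
    print(uri)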

LcodingL (Author) commented Nov 5, 2019

Great!
I've tried the first three steps you listed above and it works!
The 4 related databases have been created automatically and data are stored normally!
Also, I've tried $ export SCRAPYDWEB_TESTMODE=False and restarted, and it works as well.
So it seems that we should set DATABASE_URL in the server environment variables instead of in the config file. Is that by design, or does something need correction?

Besides, could I migrate the scrapydweb_timertasks database, to avoid having to schedule the timer tasks manually all over again?

THANKS A LOT ^^

my8100 (Owner) commented Nov 5, 2019

  1. Make sure there's only one DATABASE_URL in the file:
$ cat /Users/laihuiying/Workspace/PythonEnv/scrapydweb/scrapydweb_settings_v10.py | grep DATABASE_URL
  2. Set DATABASE_URL = 'mysql://root:[email protected]:3306' in the config file above.
  3. Execute $ export DATABASE_URL=
  4. Restart scrapydweb.

You can try to migrate the database by yourself.
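
If you do try it, here is a minimal sketch of one possible approach, using SQLAlchemy reflection to copy the timer-tasks tables from the SQLite file into MySQL. The SQLite path and the scrapydweb_timertasks database name are taken from this thread; the credentials are placeholders, the target database must already exist, and SQLAlchemy 1.x semantics are assumed. This is not a verified recipe:

import sqlalchemy as sa

# Source path is the SQLite file shown later in this thread; target credentials/host
# are redacted placeholders as elsewhere in this thread -- replace with your own.
src = sa.create_engine('sqlite:////Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/data/database/timer_tasks.db')
dst = sa.create_engine('mysql://root:[email protected]:3306/scrapydweb_timertasks')

meta = sa.MetaData()
meta.reflect(bind=src)       # discover every table in the SQLite file
meta.create_all(bind=dst)    # create the same tables in MySQL

with src.connect() as s, dst.connect() as d:
    for table in meta.sorted_tables:
        rows = [dict(row) for row in s.execute(table.select())]
        if rows:
            d.execute(table.insert(), rows)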

LcodingL (Author) commented Nov 6, 2019

I've followed the steps and it failed. The config file didn't work. It seems we must set the environment variable export DATABASE_URL=mysql://username:password@IP:PORT manually to make the MySQL backend take effect.

my8100 (Owner) commented Nov 6, 2019

$ export DATABASE_URL=
$ echo $DATABASE_URL
$ mv scrapydweb_settings_v10.py scrapydweb_settings_v10.py.bak
$ pip uninstall scrapydweb
$ pip install --upgrade scrapydweb

Restart scrapydweb and re-configure the newly generated file.
If it's still not working, post the full log, as well as the results of the following cmds:

$ echo $DATABASE_URL
$ pwd
$ cat scrapydweb_settings_v10.py | grep DATABASE_URL

LcodingL (Author) commented Nov 6, 2019

Hi,
I've done the reinstallation and ran it with the new config file, yet it failed again. Below is the full log for your reference:

[2019-11-06 23:31:41,179] INFO in apscheduler.scheduler: Scheduler started
[2019-11-06 23:31:41,186] INFO in scrapydweb.run: ScrapydWeb version: 1.4.0
[2019-11-06 23:31:41,187] INFO in scrapydweb.run: Use 'scrapydweb -h' to get help
[2019-11-06 23:31:41,187] INFO in scrapydweb.run: Main pid: 10215
[2019-11-06 23:31:41,187] DEBUG in scrapydweb.run: Loading default settings from /Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/default_settings.py


Overriding custom settings from /Users/laihuiying/Workspace/PythonEnv/scrapydweb/scrapydweb_settings_v10.py


[2019-11-06 23:31:41,301] DEBUG in scrapydweb.run: Reading settings from command line: Namespace(bind='0.0.0.0', debug=False, disable_auth=False, disable_logparser=False, disable_monitor=False, port=5000, scrapyd_server=None, switch_scheduler_state=False, verbose=False)
[2019-11-06 23:31:41,301] DEBUG in scrapydweb.utils.check_app_config: Checking app config
[2019-11-06 23:31:41,303] INFO in scrapydweb.utils.check_app_config: Setting up URL_SCRAPYDWEB: http://127.0.0.1:5000
[2019-11-06 23:31:41,303] DEBUG in scrapydweb.utils.check_app_config: Checking connectivity of SCRAPYD_SERVERS...

Index Group Scrapyd IP:Port Connectivity Auth
####################################################################################################
1____ None________________ 127.0.0.1:6800________ True_______ None
2____ test________________ localhost:6800________ True_______ None
####################################################################################################

/Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/sqlalchemy/ext/declarative/clsregistry.py:129: SAWarning: This declarative base already contains a class with the same class name and module name as scrapydweb.models.Job, and will be replaced in the string-lookup table.
% (item.module, item.name)
[2019-11-06 23:31:41,434] DEBUG in scrapydweb.utils.check_app_config: Created 2 tables for JobsView
[2019-11-06 23:31:41,434] INFO in scrapydweb.utils.check_app_config: Locating scrapy logfiles with SCRAPYD_LOG_EXTENSIONS: ['.log', '.log.gz', '.txt']
[2019-11-06 23:31:41,439] INFO in scrapydweb.utils.check_app_config: Scheduler for timer tasks: STATE_RUNNING
[2019-11-06 23:31:41,479] INFO in scrapydweb.utils.check_app_config: create_jobs_snapshot (trigger: interval[0:05:00], next run at: 2019-11-06 23:36:41 CST)


Visit ScrapydWeb at http://127.0.0.1:5000 or http://IP-OF-THE-CURRENT-HOST:5000


[2019-11-06 23:31:41,484] INFO in scrapydweb.run: For running Flask in production, check out http://flask.pocoo.org/docs/1.0/deploying/

  • Serving Flask app "scrapydweb" (lazy loading)
  • Environment: production
    WARNING: This is a development server. Do not use it in a production deployment.
    Use a production WSGI server instead.
  • Debug mode: off
    [2019-11-06 23:31:41,485] DEBUG in apscheduler.scheduler: Next wakeup is due at 2019-11-06 23:36:41.479754+08:00 (in 299.998942 seconds)
    [2019-11-06 23:32:11,694] INFO in werkzeug: * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
    [2019-11-06 23:32:17,956] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:17] "GET /1/nodereports/ HTTP/1.1" 200 -
    [2019-11-06 23:32:18,008] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/css/style.css HTTP/1.1" 200 -
    [2019-11-06 23:32:18,009] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/css/icon_upload_icon_right.css HTTP/1.1" 200 -
    [2019-11-06 23:32:18,014] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/css/dropdown.css HTTP/1.1" 200 -
    [2019-11-06 23:32:18,016] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/js/icons_menu.js HTTP/1.1" 200 -
    [2019-11-06 23:32:18,017] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/js/github_buttons.js HTTP/1.1" 200 -
    [2019-11-06 23:32:18,025] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/js/common.js HTTP/1.1" 200 -
    [2019-11-06 23:32:18,026] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/js/jquery.min.js HTTP/1.1" 200 -
    [2019-11-06 23:32:18,031] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/js/vue.min.js HTTP/1.1" 200 -
    [2019-11-06 23:32:18,033] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/element-ui%402.4.6/lib/theme-chalk/index.css HTTP/1.1" 200 -
    [2019-11-06 23:32:18,049] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/element-ui%402.4.6/lib/index.js HTTP/1.1" 200 -
    [2019-11-06 23:32:18,366] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/element-ui%402.4.6/lib/theme-chalk/fonts/element-icons.woff HTTP/1.1" 200 -
    [2019-11-06 23:32:18,378] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "POST /1/api/daemonstatus/ HTTP/1.1" 200 -
    [2019-11-06 23:32:22,901] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:22] "GET /1/settings/ HTTP/1.1" 200 -

And the results of the cmds:
$ echo $DATABASE_URL

$ pwd

/Users/laihuiying/Workspace/PythonEnv/scrapydweb

$ cat scrapydweb_settings_v10.py | grep DATABASE_URL

DATABASE_URL = 'mysql://root:[email protected]:3306'

Thanks for your patience!

my8100 (Owner) commented Nov 6, 2019

What’s the result of this cmd now?

$ python -c "from scrapydweb_settings_v10 import DATABASE_URL; print(DATABASE_URL)"

LcodingL (Author) commented Nov 6, 2019

mysql://root:[email protected]:3306

my8100 (Owner) commented Nov 6, 2019

Can you post a screenshot of the related info on the Settings page?

LcodingL (Author) commented Nov 6, 2019

I've tried many times to upload the screenshot but failed every time T.T

my8100 (Owner) commented Nov 6, 2019

Then just post the text.

LcodingL (Author) commented Nov 6, 2019

For easy reading, I've removed all the comments:

DATA_PATH = os.environ.get('DATA_PATH', '')

DATABASE_URL = 'mysql://root:[email protected]:3306'

my8100 (Owner) commented Nov 6, 2019

Actually, I’m asking for the value of DATABASE displayed on the web UI.
How did you judge that the config in the file is not working?
For convenience, you can execute $ export SCRAPYDWEB_TESTMODE=True and restart scrapydweb to see which backend is being used behind the scenes.

LcodingL (Author) commented Nov 7, 2019

I judged from the DATABASE value displayed on the web UI:

{
  "APSCHEDULER_DATABASE_URI": "sqlite:////Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/data/database/apscheduler.db",
  "SQLALCHEMY_DATABASE_URI": "sqlite:////Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/data/database/timer_tasks.db",
  "SQLALCHEMY_BINDS_METADATA": "sqlite:////Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/data/database/metadata.db",
  "SQLALCHEMY_BINDS_JOBS": "sqlite:////Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/data/database/jobs.db"
}

And no related database was created.

my8100 (Owner) commented Nov 7, 2019

Adding sys.path.append(os.getcwd()) before the try clause would fix the issue.
It's /Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/vars.py in your case.

Thanks for your support!

SCRAPYDWEB_SETTINGS_PY = 'scrapydweb_settings_v10.py'
try:
    custom_settings_module = importlib.import_module(os.path.splitext(SCRAPYDWEB_SETTINGS_PY)[0])
except ImportError:
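
A sketch of what that section of vars.py would look like with the suggested line applied (the imports are shown for context; the except body is a placeholder, since it is elided in the quote above):

import importlib
import os
import sys

sys.path.append(os.getcwd())  # the suggested fix: make the config file in the current working directory importable

SCRAPYDWEB_SETTINGS_PY = 'scrapydweb_settings_v10.py'
try:
    custom_settings_module = importlib.import_module(os.path.splitext(SCRAPYDWEB_SETTINGS_PY)[0])
except ImportError:
    custom_settings_module = None  # placeholder; the real except body is not shown in this thread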

LcodingL (Author) commented Nov 7, 2019

Hi, sorry for the delay.
I've added that line before the try clause, restarted scrapydweb, and it worked!
Thank you so much for the helpful fix and your continued dedication to making it better!

@my8100 my8100 changed the title from DATABASE_URL option didn't work [BUG] to DATABASE_URL and DATA_PATH options do not take effect in the config file Nov 8, 2019
@my8100 my8100 added bug Something isn't working and removed insufficient info No action would be taken until more info is provided labels Nov 8, 2019
argoyal commented Jan 12, 2020

I was facing a similar issue. If DATABASE_URL is present as an environment variable, then it works. But if I try to build DATABASE_URL in the custom settings file from some other environment variables, it fails to work. I will look into this and try to raise a PR resolving this issue.
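
For context, a minimal sketch of the kind of composition being described, inside scrapydweb_settings_v10.py (the variable names DB_USER, DB_PASS, DB_HOST, and DB_PORT are illustrative, not necessarily the ones argoyal used):

import os

# Illustrative environment variable names; adjust to your own deployment.
DB_USER = os.environ.get('DB_USER', 'root')
DB_PASS = os.environ.get('DB_PASS', '')
DB_HOST = os.environ.get('DB_HOST', '127.0.0.1')
DB_PORT = os.environ.get('DB_PORT', '3306')

DATABASE_URL = 'mysql://%s:%s@%s:%s' % (DB_USER, DB_PASS, DB_HOST, DB_PORT)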

Irving-plus commented

Can't pull the repo down over the git connection.

IMYR666 commented Aug 29, 2022

Adding sys.path.append(os.getcwd()) before the try clause would fix the issue. It's /Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/vars.py for your case.

Thanks for your support!

SCRAPYDWEB_SETTINGS_PY = 'scrapydweb_settings_v10.py'
try:
    custom_settings_module = importlib.import_module(os.path.splitext(SCRAPYDWEB_SETTINGS_PY)[0])
except ImportError:

Hi, the last version was 1.4.0, released on August 16, 2019, but this bug was fixed on May 11, 2020. Can you release a new version? Thanks
