
DATABASE_URL and DATA_PATH options do not take effect in the config file #100

Closed
LcodingL opened this issue Nov 4, 2019 · 21 comments
Labels: bug (Something isn't working)

LcodingL commented Nov 4, 2019

Describe the bug
I've set the DATABASE_URL option to a correctly formatted MySQL URI and restarted scrapydweb, but none of the databases in [DB_APSCHEDULER, DB_TIMERTASKS, DB_METADATA, DB_JOBS] was created, and the DATABASE settings displayed on the web UI are still "sqlite:////......"

To Reproduce
Steps to reproduce the behavior:

  1. Edit 'scrapydweb_settings_v10.py', setting DATABASE_URL = 'mysql://root:[email protected]:3306'.
  2. Run the command: pip install --upgrade pymysql
  3. Restart scrapydweb by running the command 'scrapydweb' in the directory where the config file is.

Expected behavior

  1. I previously used the default DATABASE_URL and data were stored in SQLite normally; now I want to use the MySQL backend. Will the related databases be created in MySQL automatically?
  2. Since I didn't migrate the database from SQLite to MySQL manually, I expected no job status to be displayed on the Dashboard after setting the MySQL DATABASE_URL in the config file. But it showed all the job statuses as before, and the DATABASE settings displayed on the web UI are still "sqlite:////......"
  3. Is the DATABASE value displayed on the web UI actually the database used by the running scrapydweb?
  4. Do I need to migrate data from SQLite to MySQL manually if I want to use the MySQL backend in the future?

Logs

[2019-11-04 15:48:56,143] INFO in apscheduler.scheduler: Scheduler started
[2019-11-04 15:48:56,162] INFO in scrapydweb.run: ScrapydWeb version: 1.4.0
[2019-11-04 15:48:56,163] INFO in scrapydweb.run: Use 'scrapydweb -h' to get help
[2019-11-04 15:48:56,163] INFO in scrapydweb.run: Main pid: 2630
[2019-11-04 15:48:56,163] DEBUG in scrapydweb.run: Loading default settings from /Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/default_settings.py


Overriding custom settings from /Users/laihuiying/Workspace/PythonEnv/scrapydweb/scrapydweb_settings_v10.py


[2019-11-04 15:48:56,321] DEBUG in scrapydweb.run: Reading settings from command line: Namespace(bind='0.0.0.0', debug=False, disable_auth=False, disable_logparser=False, disable_monitor=False, port=5000, scrapyd_server=None, switch_scheduler_state=False, verbose=True)
[2019-11-04 15:48:56,321] DEBUG in scrapydweb.utils.check_app_config: Checking app config
[2019-11-04 15:48:56,323] INFO in scrapydweb.utils.check_app_config: Setting up URL_SCRAPYDWEB: http://127.0.0.1:5000
[2019-11-04 15:48:56,324] DEBUG in scrapydweb.utils.check_app_config: Checking connectivity of SCRAPYD_SERVERS...

Index Group Scrapyd IP:Port Connectivity Auth
#######################################################################
1____ dataocean___________ 10.8.32.56:6800_______ True_______ None
2____ dataocean___________ 10.8.64.78:6800_______ True_______ None
#######################################################################

/Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/sqlalchemy/ext/declarative/clsregistry.py:129: SAWarning: This declarative base already contains a class with the same class name and module name as scrapydweb.models.Job, and will be replaced in the string-lookup table.
% (item.module, item.name)
[2019-11-04 15:48:56,436] DEBUG in scrapydweb.utils.check_app_config: Created 2 tables for JobsView
[2019-11-04 15:48:56,436] INFO in scrapydweb.utils.check_app_config: Locating scrapy logfiles with SCRAPYD_LOG_EXTENSIONS: ['.log', '.log.gz', '.txt']
[2019-11-04 15:48:56,440] INFO in scrapydweb.utils.check_app_config: Scheduler for timer tasks: STATE_RUNNING
[2019-11-04 15:48:56,481] INFO in scrapydweb.utils.check_app_config: create_jobs_snapshot (trigger: interval[0:05:00], next run at: 2019-11-04 15:53:56 CST)


Visit ScrapydWeb at http://127.0.0.1:5000 or http://IP-OF-THE-CURRENT-HOST:5000


[2019-11-04 15:48:56,486] INFO in scrapydweb.run: For running Flask in production, check out http://flask.pocoo.org/docs/1.0/deploying/

  • Serving Flask app "scrapydweb" (lazy loading)
  • Environment: production
    WARNING: This is a development server. Do not use it in a production deployment.
    Use a production WSGI server instead.
  • Debug mode: off
    [2019-11-04 15:48:56,487] DEBUG in apscheduler.scheduler: Next wakeup is due at 2019-11-04 15:53:56.480998+08:00 (in 299.999017 seconds)
    [2019-11-04 15:49:26,498] INFO in werkzeug: * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
    [2019-11-04 15:49:26,585] DEBUG in ApiView: view_args of http://127.0.0.1:5000/1/api/daemonstatus/
    {
    "node": 1,
    "opt": "daemonstatus",
    "project": null,
    "version_spider_job": null
    }

Environment (please complete the following information):

  • OS: macOS 10.14
  • Python: 3.6
  • ScrapydWeb: 1.4.0
  • Browser: Chrome 73

Thanks for your time!

@LcodingL LcodingL added the bug Something isn't working label Nov 4, 2019
@my8100 my8100 removed their assignment Nov 4, 2019
@my8100 my8100 added insufficient info No action would be taken until more info is provided and removed bug Something isn't working labels Nov 4, 2019
my8100 (Owner) commented Nov 4, 2019

Check and make sure DATABASE_URL has been configured as expected:

$ echo $DATABASE_URL
$ python -c "from scrapydweb_settings_v10 import DATABASE_URL; print(DATABASE_URL)"

LcodingL (Author) commented Nov 4, 2019

Thanks for your timely reply!
I set the configuration in this way:

DATABASE_URL = 'mysql://root:[email protected]:3306'

Ran the commands and got the results below:
$ echo $DATABASE_URL
-->''
$ python -c "from scrapydweb_settings_v10 import DATABASE_URL; print(DATABASE_URL)"
-->'mysql://root:[email protected]:3306'

Is there anything wrong?

my8100 (Owner) commented Nov 4, 2019

Is this the file you are editing?

Overriding custom settings from /Users/laihuiying/Workspace/PythonEnv/scrapydweb/scrapydweb_settings_v10.py
  1. Remove the DATABASE_URL option in the config file.
  2. Execute $ export SCRAPYDWEB_TESTMODE=True and $ export DATABASE_URL=mysql://root:[email protected]:3306
  3. Restart scrapydweb; log lines like those below should be found at the beginning.
  4. If not found, mv scrapydweb_settings_v10.py scrapydweb_settings_v10.py.bak, then pip uninstall scrapydweb, then pip install --upgrade scrapydweb, and finally restart scrapydweb.
APSCHEDULER_DATABASE_URI: mysql://root:[email protected]:3306/scrapydweb_apscheduler
SQLALCHEMY_DATABASE_URI: mysql://root:[email protected]:3306/scrapydweb_timertasks
SQLALCHEMY_BINDS: {'jobs': 'mysql://root:[email protected]:3306/scrapydweb_jobs', 'metadata': 'mysql://root:[email protected]:3306/scrapydweb_metadata'}
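
Judging from those three log lines, scrapydweb appears to derive the four per-component URIs by appending a database name to the single DATABASE_URL. A minimal sketch of that derivation (the helper name derive_uris is illustrative, not scrapydweb's actual API; only the suffixes are taken from the log above):

# A sketch only, inferred from the expected log lines above.
def derive_uris(database_url):
    names = ['scrapydweb_apscheduler', 'scrapydweb_timertasks',
             'scrapydweb_jobs', 'scrapydweb_metadata']
    return {name: '%s/%s' % (database_url.rstrip('/'), name) for name in names}

for uri in derive_uris('mysql://root:[email protected]:3306').values():
    print(uri)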

LcodingL (Author) commented Nov 5, 2019

Great!
I've tried the first three steps you listed above and it works!
The 4 related databases have been created automatically and data are stored normally!
Also, I've tried $ export SCRAPYDWEB_TESTMODE=False and restarted, and it works as well.
So it seems that we should set DATABASE_URL in the server environment variables instead of in the config file. Is that by design, or does something need correction?

Besides, could I migrate the scrapydweb_timertasks database, to avoid having to schedule the timer tasks manually all over again?

THANKS A LOT ^^

my8100 (Owner) commented Nov 5, 2019

  1. Make sure there's only one DATABASE_URL in the file:
$ cat /Users/laihuiying/Workspace/PythonEnv/scrapydweb/scrapydweb_settings_v10.py | grep DATABASE_URL
  2. Set DATABASE_URL = 'mysql://root:[email protected]:3306' in the config file above.
  3. Execute $ export DATABASE_URL=
  4. Restart scrapydweb.

You can try to migrate the database by yourself.
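
If you do try it, here is a minimal sketch of one possible approach, using SQLAlchemy reflection to copy the timer-tasks tables from the SQLite file into MySQL. The SQLite path and the scrapydweb_timertasks database name are taken from this thread; the credentials are placeholders, the target database must already exist, and SQLAlchemy 1.x semantics are assumed. This is not a verified recipe:

import sqlalchemy as sa

# Source path is the SQLite file shown later in this thread; target credentials/host
# are redacted placeholders as elsewhere in this thread -- replace with your own.
src = sa.create_engine('sqlite:////Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/data/database/timer_tasks.db')
dst = sa.create_engine('mysql://root:[email protected]:3306/scrapydweb_timertasks')

meta = sa.MetaData()
meta.reflect(bind=src)       # discover every table in the SQLite file
meta.create_all(bind=dst)    # create the same tables in MySQL

with src.connect() as s, dst.connect() as d:
    for table in meta.sorted_tables:
        rows = [dict(row) for row in s.execute(table.select())]
        if rows:
            d.execute(table.insert(), rows)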

LcodingL (Author) commented Nov 6, 2019

I've followed the steps and it failed. The config file didn't work. It seems we must set the environment variable export DATABASE_URL=mysql://username:password@IP:PORT manually to make the MySQL backend take effect.

my8100 (Owner) commented Nov 6, 2019

$ export DATABASE_URL=
$ echo $DATABASE_URL
$ mv scrapydweb_settings_v10.py scrapydweb_settings_v10.py.bak
$ pip uninstall scrapydweb
$ pip install --upgrade scrapydweb

Restart scrapydweb and re-configure the newly generated file.
If it's still not working, post the full log, as well as the results of the following cmds:

$ echo $DATABASE_URL
$ pwd
$ cat scrapydweb_settings_v10.py | grep DATABASE_URL

LcodingL (Author) commented Nov 6, 2019

Hi,
I've done the reinstallation and ran it with the new config file, yet it failed again. Below is the full log for your reference:

[2019-11-06 23:31:41,179] INFO in apscheduler.scheduler: Scheduler started
[2019-11-06 23:31:41,186] INFO in scrapydweb.run: ScrapydWeb version: 1.4.0
[2019-11-06 23:31:41,187] INFO in scrapydweb.run: Use 'scrapydweb -h' to get help
[2019-11-06 23:31:41,187] INFO in scrapydweb.run: Main pid: 10215
[2019-11-06 23:31:41,187] DEBUG in scrapydweb.run: Loading default settings from /Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/default_settings.py


Overriding custom settings from /Users/laihuiying/Workspace/PythonEnv/scrapydweb/scrapydweb_settings_v10.py


[2019-11-06 23:31:41,301] DEBUG in scrapydweb.run: Reading settings from command line: Namespace(bind='0.0.0.0', debug=False, disable_auth=False, disable_logparser=False, disable_monitor=False, port=5000, scrapyd_server=None, switch_scheduler_state=False, verbose=False)
[2019-11-06 23:31:41,301] DEBUG in scrapydweb.utils.check_app_config: Checking app config
[2019-11-06 23:31:41,303] INFO in scrapydweb.utils.check_app_config: Setting up URL_SCRAPYDWEB: http://127.0.0.1:5000
[2019-11-06 23:31:41,303] DEBUG in scrapydweb.utils.check_app_config: Checking connectivity of SCRAPYD_SERVERS...

Index Group Scrapyd IP:Port Connectivity Auth
####################################################################################################
1____ None________________ 127.0.0.1:6800________ True_______ None
2____ test________________ localhost:6800________ True_______ None
####################################################################################################

/Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/sqlalchemy/ext/declarative/clsregistry.py:129: SAWarning: This declarative base already contains a class with the same class name and module name as scrapydweb.models.Job, and will be replaced in the string-lookup table.
% (item.module, item.name)
[2019-11-06 23:31:41,434] DEBUG in scrapydweb.utils.check_app_config: Created 2 tables for JobsView
[2019-11-06 23:31:41,434] INFO in scrapydweb.utils.check_app_config: Locating scrapy logfiles with SCRAPYD_LOG_EXTENSIONS: ['.log', '.log.gz', '.txt']
[2019-11-06 23:31:41,439] INFO in scrapydweb.utils.check_app_config: Scheduler for timer tasks: STATE_RUNNING
[2019-11-06 23:31:41,479] INFO in scrapydweb.utils.check_app_config: create_jobs_snapshot (trigger: interval[0:05:00], next run at: 2019-11-06 23:36:41 CST)


Visit ScrapydWeb at http://127.0.0.1:5000 or http://IP-OF-THE-CURRENT-HOST:5000


[2019-11-06 23:31:41,484] INFO in scrapydweb.run: For running Flask in production, check out http://flask.pocoo.org/docs/1.0/deploying/

  • Serving Flask app "scrapydweb" (lazy loading)
  • Environment: production
    WARNING: This is a development server. Do not use it in a production deployment.
    Use a production WSGI server instead.
  • Debug mode: off
    [2019-11-06 23:31:41,485] DEBUG in apscheduler.scheduler: Next wakeup is due at 2019-11-06 23:36:41.479754+08:00 (in 299.998942 seconds)
    [2019-11-06 23:32:11,694] INFO in werkzeug: * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
    [2019-11-06 23:32:17,956] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:17] "GET /1/nodereports/ HTTP/1.1" 200 -
    [2019-11-06 23:32:18,008] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/css/style.css HTTP/1.1" 200 -
    [2019-11-06 23:32:18,009] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/css/icon_upload_icon_right.css HTTP/1.1" 200 -
    [2019-11-06 23:32:18,014] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/css/dropdown.css HTTP/1.1" 200 -
    [2019-11-06 23:32:18,016] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/js/icons_menu.js HTTP/1.1" 200 -
    [2019-11-06 23:32:18,017] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/js/github_buttons.js HTTP/1.1" 200 -
    [2019-11-06 23:32:18,025] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/js/common.js HTTP/1.1" 200 -
    [2019-11-06 23:32:18,026] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/js/jquery.min.js HTTP/1.1" 200 -
    [2019-11-06 23:32:18,031] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/js/vue.min.js HTTP/1.1" 200 -
    [2019-11-06 23:32:18,033] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/element-ui%402.4.6/lib/theme-chalk/index.css HTTP/1.1" 200 -
    [2019-11-06 23:32:18,049] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/element-ui%402.4.6/lib/index.js HTTP/1.1" 200 -
    [2019-11-06 23:32:18,366] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "GET /static/v140/element-ui%402.4.6/lib/theme-chalk/fonts/element-icons.woff HTTP/1.1" 200 -
    [2019-11-06 23:32:18,378] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:18] "POST /1/api/daemonstatus/ HTTP/1.1" 200 -
    [2019-11-06 23:32:22,901] INFO in werkzeug: 127.0.0.1 - - [06/Nov/2019 23:32:22] "GET /1/settings/ HTTP/1.1" 200 -

And the results of the cmds:
$ echo $DATABASE_URL

$ pwd

/Users/laihuiying/Workspace/PythonEnv/scrapydweb

$ cat scrapydweb_settings_v10.py | grep DATABASE_URL

DATABASE_URL = 'mysql://root:[email protected]:3306'

Thanks for your patience!

my8100 (Owner) commented Nov 6, 2019

What’s the result of this cmd now?

$ python -c "from scrapydweb_settings_v10 import DATABASE_URL; print(DATABASE_URL)"

LcodingL (Author) commented Nov 6, 2019

mysql://root:[email protected]:3306

my8100 (Owner) commented Nov 6, 2019

Can you post a screenshot of the related info on the Settings page?

LcodingL (Author) commented Nov 6, 2019

I've tried many times to upload the screenshot but failed every time T.T

my8100 (Owner) commented Nov 6, 2019

Then just post the text.

LcodingL (Author) commented Nov 6, 2019

For easy reading, I've removed all the comments:

DATA_PATH = os.environ.get('DATA_PATH', '')

DATABASE_URL = 'mysql://root:[email protected]:3306'

my8100 (Owner) commented Nov 6, 2019

Actually, I’m asking for the value of DATABASE displayed on the web UI.
How did you judge that the config in the file is not working?
For convenience, you can execute $ export SCRAPYDWEB_TESTMODE=True and restart scrapydweb to see which backend is being used behind the scenes.

LcodingL (Author) commented Nov 7, 2019

I judged from the DATABASE value displayed on the web UI:

{
  "APSCHEDULER_DATABASE_URI": "sqlite:////Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/data/database/apscheduler.db",
  "SQLALCHEMY_DATABASE_URI": "sqlite:////Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/data/database/timer_tasks.db",
  "SQLALCHEMY_BINDS_METADATA": "sqlite:////Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/data/database/metadata.db",
  "SQLALCHEMY_BINDS_JOBS": "sqlite:////Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/data/database/jobs.db"
}

And no related database was created.

my8100 (Owner) commented Nov 7, 2019

Adding sys.path.append(os.getcwd()) before the try clause would fix the issue.
It's /Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/vars.py in your case.

Thanks for your support!

SCRAPYDWEB_SETTINGS_PY = 'scrapydweb_settings_v10.py'
try:
    custom_settings_module = importlib.import_module(os.path.splitext(SCRAPYDWEB_SETTINGS_PY)[0])
except ImportError:
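
A sketch of what that section of vars.py would look like with the suggested line applied (the imports are shown for context; the except body is a placeholder, since it is elided in the quote above):

import importlib
import os
import sys

sys.path.append(os.getcwd())  # the suggested fix: make the config file in the current working directory importable

SCRAPYDWEB_SETTINGS_PY = 'scrapydweb_settings_v10.py'
try:
    custom_settings_module = importlib.import_module(os.path.splitext(SCRAPYDWEB_SETTINGS_PY)[0])
except ImportError:
    custom_settings_module = None  # placeholder; the real except body is not shown in this thread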

LcodingL (Author) commented Nov 7, 2019

Hi, sorry for the delay.
I've added that line before the try clause, restarted scrapydweb, and it worked!
Thank you so much for the helpful fix and your continued dedication to making it better!

@my8100 my8100 changed the title from DATABASE_URL option didn't work [BUG] to DATABASE_URL and DATA_PATH options do not take effect in the config file Nov 8, 2019
@my8100 my8100 added bug Something isn't working and removed insufficient info No action would be taken until more info is provided labels Nov 8, 2019
argoyal commented Jan 12, 2020

I was facing a similar issue. If DATABASE_URL is present as an environment variable, then it works. But if I try to build DATABASE_URL in the custom settings file from some other environment variables, it fails to work. I will look into this and try to raise a PR resolving this issue.
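
For context, a minimal sketch of the kind of composition being described, inside scrapydweb_settings_v10.py (the variable names DB_USER, DB_PASS, DB_HOST, and DB_PORT are illustrative, not necessarily the ones argoyal used):

import os

# Illustrative environment variable names; adjust to your own deployment.
DB_USER = os.environ.get('DB_USER', 'root')
DB_PASS = os.environ.get('DB_PASS', '')
DB_HOST = os.environ.get('DB_HOST', '127.0.0.1')
DB_PORT = os.environ.get('DB_PORT', '3306')

DATABASE_URL = 'mysql://%s:%s@%s:%s' % (DB_USER, DB_PASS, DB_HOST, DB_PORT)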

Irving-plus commented

Can't pull the repo down over the git connection.

IMYR666 commented Aug 29, 2022

Adding sys.path.append(os.getcwd()) before the try clause would fix the issue. It's /Users/laihuiying/Workspace/PythonEnv/scrapydweb/lib/python3.6/site-packages/scrapydweb/vars.py for your case.

Thanks for your support!

SCRAPYDWEB_SETTINGS_PY = 'scrapydweb_settings_v10.py'
try:
    custom_settings_module = importlib.import_module(os.path.splitext(SCRAPYDWEB_SETTINGS_PY)[0])
except ImportError:

Hi, the last version was 1.4.0, released on August 16, 2019, but this bug was fixed on May 11, 2020. Can you release a new version? Thanks
