feat(agent/workspace): Add GCS and S3 FileWorkspace providers (Significant-Gravitas#6485)

* refactor: Rename FileWorkspace to LocalFileWorkspace and create FileWorkspace abstract class
  - Rename `FileWorkspace` to `LocalFileWorkspace`, a more descriptive name for the class that represents a workspace backed by local files.
  - Create a new abstract base class `FileWorkspace` as the parent of `LocalFileWorkspace`, so that file workspaces can be extended and customized more easily in the future (see the sketch below).
  - Update import statements and references to `FileWorkspace` throughout the codebase to match the new naming.
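
A minimal sketch of the resulting split, assuming simplified method names and signatures (the real classes in `autogpt.file_workspace` carry more configuration, and the real `write_file` is awaited by callers, as the server changes further down show):

```python
import abc
from pathlib import Path


class FileWorkspace(abc.ABC):
    """Interface that every workspace backend implements."""

    @property
    @abc.abstractmethod
    def root(self) -> Path:
        """The workspace root that relative paths are resolved against."""

    @abc.abstractmethod
    def read_file(self, path: str | Path, binary: bool = False) -> str | bytes: ...

    @abc.abstractmethod
    def write_file(self, path: str | Path, content: str | bytes) -> None: ...


class LocalFileWorkspace(FileWorkspace):
    """Backend that stores workspace files in a local directory."""

    def __init__(self, root: Path, restrict_to_root: bool = True):
        self._root = root.resolve()
        # The real class also enforces restrict_to_root when resolving paths.
        self._restrict_to_root = restrict_to_root

    @property
    def root(self) -> Path:
        return self._root

    def read_file(self, path: str | Path, binary: bool = False) -> str | bytes:
        full_path = self._root / path
        return full_path.read_bytes() if binary else full_path.read_text()

    def write_file(self, path: str | Path, content: str | bytes) -> None:
        full_path = self._root / path
        full_path.parent.mkdir(parents=True, exist_ok=True)
        if isinstance(content, bytes):
            full_path.write_bytes(content)
        else:
            full_path.write_text(content)
```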

* feat: Add S3FileWorkspace + tests + test setups for CI and Docker
  - Added S3FileWorkspace class to provide an interface for interacting with a file workspace and storing files in an S3 bucket.
  - Updated pyproject.toml to include dependencies for boto3 and boto3-stubs.
  - Implemented unit tests for S3FileWorkspace.
  - Added MinIO service to Docker CI to allow testing S3 features in CI.
  - Added autogpt-test service config to docker-compose.yml for local testing with MinIO.
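
A hedged sketch of how an S3-backed workspace can store files with boto3; the class and method names here are illustrative rather than copied from `S3FileWorkspace`:

```python
import os

import boto3


class S3WorkspaceSketch:
    """Stores workspace files as objects in an S3 (or S3-compatible) bucket."""

    def __init__(self, bucket_name: str):
        # boto3 resolves AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the
        # environment; S3_ENDPOINT_URL lets this point at MinIO, as the CI
        # setup below does.
        self._s3 = boto3.resource("s3", endpoint_url=os.getenv("S3_ENDPOINT_URL"))
        self._bucket = self._s3.Bucket(bucket_name)

    def write_file(self, path: str, content: bytes) -> None:
        self._bucket.Object(path).put(Body=content)

    def read_file(self, path: str, binary: bool = False) -> str | bytes:
        data = self._bucket.Object(path).get()["Body"].read()
        return data if binary else data.decode()

    def list_files(self, prefix: str = "") -> list[str]:
        return [obj.key for obj in self._bucket.objects.filter(Prefix=prefix)]
```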

* ci(docker): tee test output instead of capturing

* fix: Improve error handling in S3FileWorkspace.initialize()
  - Do not tolerate all `botocore.exceptions.ClientError`s
  - Re-raise the exception if the error is anything other than "NoSuchBucket" (see the sketch below)
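
A sketch of the narrowed error handling described above, written as a standalone helper under assumed names; the actual `initialize()` body may differ in detail:

```python
import boto3
import botocore.exceptions


def ensure_bucket(s3, bucket_name: str):
    """Return the bucket, creating it only if it genuinely does not exist."""
    try:
        s3.meta.client.head_bucket(Bucket=bucket_name)
        return s3.Bucket(bucket_name)
    except botocore.exceptions.ClientError as e:
        # Only "bucket not found" is tolerated; credential, permission and
        # networking errors are re-raised instead of being silently swallowed.
        if e.response["Error"]["Code"] not in ("404", "NoSuchBucket"):
            raise
        return s3.create_bucket(Bucket=bucket_name)


# e.g. ensure_bucket(boto3.resource("s3"), "autogpt")
```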

* feat: Add S3 workspace backend support and S3Credentials
  - Added support for S3 workspace backend in the Autogpt configuration
  - Added a new sub-config `S3Credentials` to store S3 credentials
  - Modified the `.env.template` file to include variables related to S3 credentials
  - Added a new `s3_credentials` attribute on the `Config` class to store S3 credentials
  - Moved the `unmasked` method from `ModelProviderCredentials` to the parent `ProviderCredentials` class to handle unmasking for S3 credentials

* fix(agent/tests): Fix S3FileWorkspace initialization in test_s3_file_workspace.py
  - Update the S3FileWorkspace initialization in test_s3_file_workspace.py to include the required S3 credentials.

* refactor: Remove S3Credentials and add get_workspace function
  - Remove `S3Credentials` as boto3 will fetch the config from the environment by itself
  - Add `get_workspace` function in `autogpt.file_workspace` module
  - Update `.env.template` and tests to reflect the changes

* feat(agent/workspace): Make agent workspace backend configurable
  - Modified the `autogpt.file_workspace.get_workspace` function to take either a workspace `id` or a `root_path` (see the sketch below).
  - Modified `FileWorkspaceMixin` to use the `get_workspace` function to set up the workspace.
  - Updated the type hints and imports accordingly.
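
A minimal, self-contained sketch of that dispatch using a stand-in workspace type; the real `get_workspace` constructs `LocalFileWorkspace`, `S3FileWorkspace` or `GCSFileWorkspace`, and the bucket-prefix layout shown for remote backends is an assumption:

```python
import enum
from dataclasses import dataclass
from pathlib import Path, PurePosixPath


class FileWorkspaceBackendName(str, enum.Enum):
    LOCAL = "local"
    GCS = "gcs"
    S3 = "s3"


@dataclass
class WorkspaceStub:  # stand-in for the real FileWorkspace subclasses
    backend: FileWorkspaceBackendName
    root: Path | PurePosixPath


def get_workspace(
    backend: FileWorkspaceBackendName,
    *,
    id: str = "",
    root_path: Path | None = None,
) -> WorkspaceStub:
    # Callers pass either a local root_path or an agent id; remote backends
    # map the id to a prefix inside the configured bucket.
    assert bool(id) ^ (root_path is not None), "need exactly one of id / root_path"
    if backend is FileWorkspaceBackendName.LOCAL:
        assert root_path is not None, "local backend needs a root_path"
        return WorkspaceStub(backend, root_path)
    return WorkspaceStub(backend, PurePosixPath("agents") / id / "workspace")


# e.g. get_workspace(FileWorkspaceBackendName.S3, id="AutoGPT-123")
```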

* feat(agent/workspace): Add GCSFileWorkspace for Google Cloud Storage
  - Added support for Google Cloud Storage as a storage backend option in the workspace.
  - Created the `GCSFileWorkspace` class to interface with a file workspace stored in a Google Cloud Storage bucket.
  - Implemented the `GCSFileWorkspaceConfiguration` class to handle the configuration for Google Cloud Storage workspaces.
  - Updated the `get_workspace` function to include the option to use Google Cloud Storage as a workspace backend.
  - Added unit tests for the new `GCSFileWorkspace` class.
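
A hedged sketch of the Google Cloud Storage counterpart using the `google-cloud-storage` client, which picks up credentials via Application Default Credentials; names are illustrative, not copied from `GCSFileWorkspace`:

```python
from google.cloud import storage


class GCSWorkspaceSketch:
    """Stores workspace files as blobs in a Google Cloud Storage bucket."""

    def __init__(self, bucket_name: str):
        self._client = storage.Client()  # uses Application Default Credentials
        self._bucket = self._client.bucket(bucket_name)

    def write_file(self, path: str, content: str | bytes) -> None:
        self._bucket.blob(path).upload_from_string(content)

    def read_file(self, path: str, binary: bool = False) -> str | bytes:
        blob = self._bucket.blob(path)
        return blob.download_as_bytes() if binary else blob.download_as_text()

    def list_files(self, prefix: str = "") -> list[str]:
        return [b.name for b in self._client.list_blobs(self._bucket, prefix=prefix)]
```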

* fix: Unbreak use of non-local workspaces in AgentProtocolServer
  - Modify the `_get_task_agent_file_workspace` method to handle both local and non-local workspaces correctly
Pwuts committed Dec 7, 2023
1 parent fdd7f8e commit 1f40d72
Showing 20 changed files with 1,425 additions and 152 deletions.
14 changes: 13 additions & 1 deletion .github/workflows/autogpt-ci.yml
@@ -83,6 +83,15 @@ jobs:
matrix:
python-version: ["3.10"]

services:
minio:
image: minio/minio:edge-cicd
ports:
- 9000:9000
options: >
--health-interval=10s --health-timeout=5s --health-retries=3
--health-cmd="curl -f https://localhost:9000/minio/health/live"
steps:
- name: Checkout repository
uses: actions/checkout@v3
@@ -154,8 +163,11 @@ jobs:
tests/unit tests/integration
env:
CI: true
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
PLAIN_OUTPUT: True
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
S3_ENDPOINT_URL: http://localhost:9000
AWS_ACCESS_KEY_ID: minioadmin
AWS_SECRET_ACCESS_KEY: minioadmin

- name: Upload coverage reports to Codecov
uses: codecov/codecov-action@v3
31 changes: 21 additions & 10 deletions .github/workflows/autogpt-docker-ci.yml
@@ -89,6 +89,15 @@ jobs:
test:
runs-on: ubuntu-latest
timeout-minutes: 10

services:
minio:
image: minio/minio:edge-cicd
options: >
--name=minio
--health-interval=10s --health-timeout=5s --health-retries=3
--health-cmd="curl -f https://localhost:9000/minio/health/live"
steps:
- name: Check out repository
uses: actions/checkout@v3
@@ -124,23 +133,25 @@ jobs:
CI: true
PLAIN_OUTPUT: True
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
S3_ENDPOINT_URL: http://minio:9000
AWS_ACCESS_KEY_ID: minioadmin
AWS_SECRET_ACCESS_KEY: minioadmin
run: |
set +e
test_output=$(
docker run --env CI --env OPENAI_API_KEY \
--entrypoint poetry ${{ env.IMAGE_NAME }} run \
pytest -v --cov=autogpt --cov-branch --cov-report term-missing \
--numprocesses=4 --durations=10 \
tests/unit tests/integration 2>&1
)
test_failure=$?
docker run --env CI --env OPENAI_API_KEY \
--network container:minio \
--env S3_ENDPOINT_URL --env AWS_ACCESS_KEY_ID --env AWS_SECRET_ACCESS_KEY \
--entrypoint poetry ${{ env.IMAGE_NAME }} run \
pytest -v --cov=autogpt --cov-branch --cov-report term-missing \
--numprocesses=4 --durations=10 \
tests/unit tests/integration 2>&1 | tee test_output.txt
echo "$test_output"
test_failure=${PIPESTATUS[0]}
cat << $EOF >> $GITHUB_STEP_SUMMARY
# Tests $([ $test_failure = 0 ] && echo '✅' || echo '❌')
\`\`\`
$test_output
$(cat test_output.txt)
\`\`\`
$EOF
31 changes: 24 additions & 7 deletions autogpts/autogpt/.env.template
@@ -8,9 +8,32 @@ OPENAI_API_KEY=your-openai-api-key
## EXECUTE_LOCAL_COMMANDS - Allow local command execution (Default: False)
# EXECUTE_LOCAL_COMMANDS=False

### Workspace ###

## RESTRICT_TO_WORKSPACE - Restrict file operations to workspace ./data/agents/<agent_id>/workspace (Default: True)
# RESTRICT_TO_WORKSPACE=True

## DISABLED_COMMAND_CATEGORIES - The list of categories of commands that are disabled (Default: None)
# DISABLED_COMMAND_CATEGORIES=

## WORKSPACE_BACKEND - Choose a storage backend for workspace contents
## Options: local, gcs, s3
# WORKSPACE_BACKEND=local

## WORKSPACE_STORAGE_BUCKET - GCS/S3 Bucket to store workspace contents in
# WORKSPACE_STORAGE_BUCKET=autogpt

## GCS Credentials
# see https://cloud.google.com/storage/docs/authentication#libauth

## AWS/S3 Credentials
# see https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html

## S3_ENDPOINT_URL - If you're using non-AWS S3, set your endpoint here.
# S3_ENDPOINT_URL=

### Miscellaneous ###

## USER_AGENT - Define the user-agent used by the requests library to browse website (string)
# USER_AGENT="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"

@@ -29,12 +52,6 @@ OPENAI_API_KEY=your-openai-api-key
## EXIT_KEY - Key to exit AutoGPT
# EXIT_KEY=n

## PLAIN_OUTPUT - Plain output, which disables the spinner (Default: False)
# PLAIN_OUTPUT=False

## DISABLED_COMMAND_CATEGORIES - The list of categories of commands that are disabled (Default: None)
# DISABLED_COMMAND_CATEGORIES=

################################################################################
### LLM PROVIDER
################################################################################
@@ -201,5 +218,5 @@ OPENAI_API_KEY=your-openai-api-key
## Note: Log file output is disabled if LOG_FORMAT=structured_google_cloud.
# LOG_FILE_FORMAT=simple

## PLAIN_OUTPUT - Disables animated typing in the console output.
## PLAIN_OUTPUT - Disables animated typing and the spinner in the console output. (Default: False)
# PLAIN_OUTPUT=False
42 changes: 25 additions & 17 deletions autogpts/autogpt/autogpt/agents/features/file_workspace.py
@@ -5,11 +5,15 @@
if TYPE_CHECKING:
from pathlib import Path

from ..base import BaseAgent
from ..base import BaseAgent, Config

from autogpt.file_workspace import FileWorkspace
from autogpt.file_workspace import (
FileWorkspace,
FileWorkspaceBackendName,
get_workspace,
)

from ..base import AgentFileManager, BaseAgentConfiguration
from ..base import AgentFileManager, BaseAgentSettings


class FileWorkspaceMixin:
@@ -22,32 +26,36 @@ def __init__(self, **kwargs):
# Initialize other bases first, because we need the config from BaseAgent
super(FileWorkspaceMixin, self).__init__(**kwargs)

config: BaseAgentConfiguration = getattr(self, "config")
if not isinstance(config, BaseAgentConfiguration):
raise ValueError(
"Cannot initialize Workspace for Agent without compatible .config"
)
file_manager: AgentFileManager = getattr(self, "file_manager")
if not file_manager:
return

self.workspace = _setup_workspace(file_manager, config)
self._setup_workspace()

def attach_fs(self, agent_dir: Path):
res = super(FileWorkspaceMixin, self).attach_fs(agent_dir)

self.workspace = _setup_workspace(self.file_manager, self.config)
self._setup_workspace()

return res

def _setup_workspace(self) -> None:
settings: BaseAgentSettings = getattr(self, "state")
assert settings.agent_id, "Cannot attach workspace to anonymous agent"
app_config: Config = getattr(self, "legacy_config")
file_manager: AgentFileManager = getattr(self, "file_manager")

def _setup_workspace(file_manager: AgentFileManager, config: BaseAgentConfiguration):
workspace = FileWorkspace(
file_manager.root / "workspace",
restrict_to_root=not config.allow_fs_access,
)
workspace.initialize()
return workspace
ws_backend = app_config.workspace_backend
local = ws_backend == FileWorkspaceBackendName.LOCAL
workspace = get_workspace(
backend=ws_backend,
id=settings.agent_id if not local else "",
root_path=file_manager.root / "workspace" if local else None,
)
if local and settings.config.allow_fs_access:
workspace._restrict_to_root = False # type: ignore
workspace.initialize()
self.workspace = workspace


def get_agent_workspace(agent: BaseAgent) -> FileWorkspace | None:
47 changes: 30 additions & 17 deletions autogpts/autogpt/autogpt/app/agent_protocol_server.py
@@ -33,7 +33,11 @@
from autogpt.commands.user_interaction import ask_user
from autogpt.config import Config
from autogpt.core.resource.model_providers import ChatModelProvider
from autogpt.file_workspace import FileWorkspace
from autogpt.file_workspace import (
FileWorkspace,
FileWorkspaceBackendName,
get_workspace,
)
from autogpt.models.action_history import ActionErrorResult, ActionSuccessResult

logger = logging.getLogger(__name__)
@@ -340,7 +344,7 @@ async def create_artifact(
else:
file_path = os.path.join(relative_path, file_name)

workspace = get_task_agent_file_workspace(task_id, self.agent_manager)
workspace = self._get_task_agent_file_workspace(task_id, self.agent_manager)
await workspace.write_file(file_path, data)

artifact = await self.db.create_artifact(
@@ -361,7 +365,7 @@ async def get_artifact(self, task_id: str, artifact_id: str) -> Artifact:
file_path = os.path.join(artifact.relative_path, artifact.file_name)
else:
file_path = artifact.relative_path
workspace = get_task_agent_file_workspace(task_id, self.agent_manager)
workspace = self._get_task_agent_file_workspace(task_id, self.agent_manager)
retrieved_artifact = workspace.read_file(file_path, binary=True)
except NotFoundError:
raise
@@ -376,24 +380,33 @@ async def get_artifact(self, task_id: str, artifact_id: str) -> Artifact:
},
)

def _get_task_agent_file_workspace(
self,
task_id: str | int,
agent_manager: AgentManager,
) -> FileWorkspace:
use_local_ws = (
self.app_config.workspace_backend == FileWorkspaceBackendName.LOCAL
)
agent_id = task_agent_id(task_id)
workspace = get_workspace(
backend=self.app_config.workspace_backend,
id=agent_id if not use_local_ws else "",
root_path=agent_manager.get_agent_dir(
agent_id=agent_id,
must_exist=True,
)
/ "workspace"
if use_local_ws
else None,
)
workspace.initialize()
return workspace


def task_agent_id(task_id: str | int) -> str:
return f"AutoGPT-{task_id}"


def get_task_agent_file_workspace(
task_id: str | int,
agent_manager: AgentManager,
) -> FileWorkspace:
return FileWorkspace(
root=agent_manager.get_agent_dir(
agent_id=task_agent_id(task_id),
must_exist=True,
)
/ "workspace",
restrict_to_root=True,
)


def fmt_kwargs(kwargs: dict) -> str:
return ", ".join(f"{n}={repr(v)}" for n, v in kwargs.items())
10 changes: 10 additions & 0 deletions autogpts/autogpt/autogpt/config/config.py
@@ -20,6 +20,7 @@
OPEN_AI_CHAT_MODELS,
OpenAICredentials,
)
from autogpt.file_workspace import FileWorkspaceBackendName
from autogpt.logs.config import LoggingConfig
from autogpt.plugins.plugins_config import PluginsConfig
from autogpt.speech import TTSConfig
@@ -51,10 +52,19 @@ class Config(SystemSettings, arbitrary_types_allowed=True):
chat_messages_enabled: bool = UserConfigurable(
default=True, from_env=lambda: os.getenv("CHAT_MESSAGES_ENABLED") == "True"
)

# TTS configuration
tts_config: TTSConfig = TTSConfig()
logging: LoggingConfig = LoggingConfig()

# Workspace
workspace_backend: FileWorkspaceBackendName = UserConfigurable(
default=FileWorkspaceBackendName.LOCAL,
from_env=lambda: FileWorkspaceBackendName(v)
if (v := os.getenv("WORKSPACE_BACKEND"))
else None,
)

##########################
# Agent Control Settings #
##########################
14 changes: 0 additions & 14 deletions autogpts/autogpt/autogpt/core/resource/model_providers/schema.py
@@ -172,24 +172,10 @@ class ModelProviderCredentials(ProviderCredentials):
api_version: SecretStr | None = UserConfigurable(default=None)
deployment_id: SecretStr | None = UserConfigurable(default=None)

def unmasked(self) -> dict:
return unmask(self)

class Config:
extra = "ignore"


def unmask(model: BaseModel):
unmasked_fields = {}
for field_name, field in model.__fields__.items():
value = getattr(model, field_name)
if isinstance(value, SecretStr):
unmasked_fields[field_name] = value.get_secret_value()
else:
unmasked_fields[field_name] = value
return unmasked_fields


class ModelProviderUsage(ProviderUsage):
"""Usage for a particular model from a model provider."""

16 changes: 15 additions & 1 deletion autogpts/autogpt/autogpt/core/resource/schema.py
@@ -1,7 +1,7 @@
import abc
import enum

from pydantic import SecretBytes, SecretField, SecretStr
from pydantic import BaseModel, SecretBytes, SecretField, SecretStr

from autogpt.core.configuration import (
SystemConfiguration,
@@ -39,6 +39,9 @@ def update_usage_and_cost(self, *args, **kwargs) -> None:
class ProviderCredentials(SystemConfiguration):
"""Struct for credentials."""

def unmasked(self) -> dict:
return unmask(self)

class Config:
json_encoders = {
SecretStr: lambda v: v.get_secret_value() if v else None,
@@ -47,6 +50,17 @@ class Config:
}


def unmask(model: BaseModel):
unmasked_fields = {}
for field_name, _ in model.__fields__.items():
value = getattr(model, field_name)
if isinstance(value, SecretStr):
unmasked_fields[field_name] = value.get_secret_value()
else:
unmasked_fields[field_name] = value
return unmasked_fields


class ProviderSettings(SystemSettings):
resource_type: ResourceType
credentials: ProviderCredentials | None = None