
Unify log and error report for easier trouble shooting #3190

Open
yecol opened this issue Sep 6, 2023 · 8 comments

@yecol
Collaborator

yecol commented Sep 6, 2023

Motivations and proposal

  • A unified error code, so that the concrete reason for a failure is easy to find.
  • Easy to identify which engine an error comes from; each engine may keep its own status codes.
  • Easy to find which machine (node) failed, and easy to get the full log.
  • No engine should revise or omit the original error information.

Current status

  • Vineyard and GIE use status codes.
  • The coordinator converts them into an internal error.
  • Enumerate the status codes: TBF

Proposal
Update:

  • Append the error code, error type, and engine to the error message.

Deprecated:

  • Each engine keeps its original status codes.
  • Add a unified EngineIdentifier to indicate which engine a status belongs to.
  • Revise the Status implementation so that it can check which engine an error came from and what its code is.

Example:

  • Status Code (assuming this is the analytical engine)
enum class StatusCode {
  Ok = 0,
  Invalid = 1,
  Unknown = 255,
};
  • Engine Identifier
enum class EngineIdentifier {
  null = 0,
  analytical = 1,
  interactive = 2,
  learning = 3,
  vineyard = 4,
  graphar = 5,
  gart = 6,
};
  • Status Implementation
    Change the type of code_ from StatusCode to uint32_t: the high 16 bits store the engine identifier and the low 16 bits store the status code.
#include <cstdint>
#include <string>
#include <utility>

// assume this is the Status of the analytical engine
class Status {
 public:
  Status(uint32_t code, std::string message)
      : code_(code), message_(std::move(message)) {}

  // create an Invalid error tagged with the analytical engine identifier.
  static Status InvalidError(std::string message) {
    return Status((static_cast<uint32_t>(EngineIdentifier::analytical) << 16) |
                      static_cast<uint32_t>(StatusCode::Invalid),
                  std::move(message));
  }

  // check whether an Invalid error occurred in the analytical engine.
  bool IsInvalid() const {
    return static_cast<EngineIdentifier>(code_ >> 16) == EngineIdentifier::analytical &&
           static_cast<StatusCode>(code_ & 0xFFFF) == StatusCode::Invalid;
  }

  // check whether the error occurred in the vineyard module.
  bool IsVineyardError() const {
    return static_cast<EngineIdentifier>(code_ >> 16) == EngineIdentifier::vineyard;
  }

 private:
  uint32_t code_;
  std::string message_;
};

Corner cases

@acezen
Collaborator

acezen commented Sep 7, 2023

Status codes of each engine

Vineyard

| Code | Number | Description |
|---|---|---|
| OK | 0 | Not an error; returned on success. |
| Invalid | 1 | Invalid data, for example a string that fails parsing. |
| KeyError | 2 | Failed key lookups (e.g. a column name in a table). |
| TypeError | 3 | Type errors (such as mismatching data types). |
| IOError | 4 | IO errors (e.g. failing to open or read from a file). |
| EndOfFile | 5 | The reader has reached the end of the file. |
| NotImplemented | 6 | An operation, or a combination of operation and data types, is unimplemented. |
| AssertionFailed | 7 | An assertion condition evaluated to false. |
| UserInputError | 8 | Invalid user input. |
| ObjectExists | 11 | The object already exists, but the user is still trying to create it. |
| ObjectNotExists | 12 | The object does not exist. |
| ObjectSealed | 13 | The object has already been sealed. |
| ObjectNotSealed | 14 | The object hasn't been sealed yet. |
| ObjectIsBlob | 15 | Unsupported operation on a blob object. |
| ObjectTypeError | 16 | The object was cast to a mismatched type. |
| ObjectSpilled | 17 | The object has already been spilled. |
| ObjectNotSpilled | 18 | The object has not been spilled yet. |
| MetaTreeInvalid | 21 | A metatree-related error occurred. |
| MetaTreeTypeInvalid | 22 | The "typename" field in the metatree is invalid. |
| MetaTreeTypeNotExists | 23 | The "typename" field does not exist in the metatree. |
| MetaTreeNameInvalid | 24 | The "id" field in the metatree is invalid. |
| MetaTreeNameNotExists | 25 | The "id" field does not exist in the metatree. |
| MetaTreeLinkInvalid | 26 | A field in the metatree is expected to be a link but isn't. |
| MetaTreeSubtreeNotExists | 27 | An expected subtree doesn't exist in the metatree. |
| VineyardServerNotReady | 31 | The requested vineyard server is not ready yet. |
| ArrowError | 32 | An Arrow-related error occurred. |
| ConnectionFailed | 33 | The client failed to connect to the vineyard server. |
| ConnectionError | 34 | The client lost its connection to the vineyard server. |
| EtcdError | 35 | The vineyard server encountered an etcd-related error. |
| AlreadyStopped | 36 | The object has already been stopped. |
| RedisError | 37 | The vineyard server encountered a Redis-related error. |
| NotEnoughMemory | 41 | The vineyard server cannot allocate more memory blocks. |
| StreamDrained | 42 | There are no more chunks in the stream. |
| StreamFailed | 43 | The stream has failed. |
| InvalidStreamState | 44 | An invalid internal stream state was found. |
| StreamOpened | 45 | The stream has been opened. |
| GlobalObjectInvalid | 51 | Invalid global object structure. |
| UnknownError | 255 | Unknown errors. |

@acezen
Collaborator

acezen commented Sep 7, 2023

GraphAr

| Code | Number | Description |
|---|---|---|
| OK | 0 | Not an error; returned on success. |
| KeyError | 1 | Error status for failed key lookups. |
| TypeError | 2 | Error status for type errors. |
| Invalid | 3 | Error status for invalid data. |
| IndexError | 4 | Error status when an index is out of bounds. |
| OutOfMemory | 5 | Error status for out-of-memory conditions. |
| IOError | 6 | Error status when an IO-related operation fails. |
| YamlError | 7 | Error status when a YAML parsing operation fails; used to catch errors raised by mini-yaml. |
| ArrowError | 8 | Error status when an Arrow-related operation fails. |
| UnknownError | 9 | Error status for unknown errors. |

@acezen
Collaborator

acezen commented Sep 7, 2023

Interactive engine

Currently, GIE defines its own errors in each layer:

Compiler:

| Code | Number | Description |
|---|---|---|
| OK | 0 | Not an error; returned on success. |
| AntlrParseException | - | ANTLR parse error. |
| GraphLabelNotFoundException | - | Cannot find the given label in the schema. |
| GraphPropertyNotFountException | - | Cannot find the given property in the schema. |
| GraphMisMatchTypeException | - | Cannot apply the operation to the given type, e.g. getting properties from a type that is not a node or edge. |
| GraphMisMatchOperandsException | - | The operands cannot be applied to the operator. |

IR:

| Code | Number | Description |
|---|---|---|
| OK | 0 | Not an error; returned on success. |
| ParseExprError | 1 | Error parsing an expression. |
| MissingDataError | 2 | Necessary data is missing. |
| CStringError | 3 | Error while converting from a C-style string (char*). |
| UnknownTypeError | 4 | The provided data type is unknown. |
| InvalidRangeError | 5 | The provided range is invalid. |
| NegativeIndexError | 6 | The given index is negative. |
| BuildJobError | 7 | Error building the physical plan. |
| ParsePbError | 8 | Error parsing protobuf. |
| ParentNotFoundError | 9 | The parent of an operator cannot be found. |
| ColumnNotExistError | 10 | A column (property) does not exist in the store. |
| TableNotExistError | 11 | A table (label) does not exist in the store. |
| TagNotExistError | 12 | A queried tag has not been specified. |
| UnSupported | 13 | An unsupported operation. |
| Others | 14 | Other errors. |

Runtime (no numbers defined yet):

| Code | Number | Description |
|---|---|---|
| OK | 0 | Not an error; returned on success. |
| FnGenDecodeOpError | - | Error decoding a pb structure. |
| FnGenParseError | - | Error parsing a pb structure. |
| FnGenUnsupportedError | - | An unsupported operation in UDF generation. |
| FnExecNullGraphError | - | The graph is not registered in the engine. |
| FnExecGetTagError | - | A tagged column does not exist in the data. |
| FnExecStoreError | - | An error occurred when querying the storage. |
| FnExecExprEvalError | - | An expression evaluation error. |
| FnExecUnExpectedData | - | The data type is unexpected. |
| FnExecAccumError | - | An error in accumulation. |
| FnExecUnSupported | - | An unsupported operation in execution. |
| StoreQueryError | - | An error when querying storage. |
| StoreWriteError | - | An error when writing a graph. |
| ExprEvalCastError | - | Error while casting from a specific data type to the general data type Object. |
| ExprEvalMissingContext | - | Missing context for a certain variable. |
| ExprEvalMissingOperands | - | Required operands are missing in an arithmetic or logical expression. |
| ExprEvalEmptyExpression | - | An empty expression is to be evaluated. |
| ExprEvalUnexpectedDataType | - | Meant to evaluate a variable, but the data type is unexpected (e.g., not a graph element). |
| ExprEvalGetNoneFromContext | - | Got None from the context. |
| ClusterInfoMissing | - | Cluster info is missing. |

@acezen
Collaborator

acezen commented Sep 7, 2023

GART

| Code | Number | Description |
|---|---|---|
| OK | 0 | Not an error; returned on success. |
| KeyError | 1 | Error status for failed key lookups. |
| TypeError | 2 | Error status for type errors. |
| Invalid | 3 | Error status for invalid data. |
| IndexError | 4 | Error status when an index is out of bounds. |
| OutOfMemory | 5 | Error status for out-of-memory conditions. |
| KafkaConnectError | 6 | Error status for a failed Kafka connection. |
| CapturerError | 7 | Error status caused by the log capturer. |
| MsgError | 8 | Error status for failed message handling. |
| ParseFileError | 9 | Error status for failures when parsing files. |
| ParseLogeError | 10 | Error status for failures when parsing logs. |
| OperationError | 11 | Error status for operation-related errors. |
| ConfigError | 12 | Error status for incorrect configurations. |
| PremisionError | 13 | Error status caused by incorrect permissions of database users. |
| VineyardError | 14 | Error status caused by Vineyard. |
| GrinError | 15 | Error status caused by GRIN. |
| UnknownError | 16 | Error status for unknown errors. |

@acezen
Collaborator

acezen commented Sep 7, 2023

Groot

Groot uses exceptions to handle errors.

@acezen
Collaborator

acezen commented Sep 7, 2023

Learning engine

| Code | Number | Description |
|---|---|---|
| OK | 0 | Not an error; returned on success. |

@acezen
Collaborator

acezen commented Sep 7, 2023

Analytical Engine

| Code | Number | Description |
|---|---|---|
| Ok | 0 | Not an error; returned on success. |
| IOError | 1 | An IO operation failed. |
| ArrowError | 2 | An Arrow-related error occurred. |
| VineyardError | 3 | A vineyard-related error occurred. |
| UnspecificError | 4 | Unknown error. |
| DistributedError | 5 | A distributed job failed; error messages are gathered. |
| NetworkError | 6 | Network-related error. |
| CommandError | 7 | Receiving the op command failed. |
| DataTypeError | 8 | Type errors (such as mismatching data types). |
| IllegalStateError | 9 | Illegal state (such as a query returning a null context wrapper without an error message). |
| InvalidValueError | 10 | Invalid value, for example a string that fails parsing. |
| InvalidOperationError | 11 | Invalid operation, for example an operation that is only available on certain fragments but is called on another fragment. |
| UnsupportedOperationError | 12 | Unsupported operation. |
| UnimplementedMethod | 13 | Method unimplemented. |
| GraphArError | 14 | GraphAr-related error. |

@yecol
Collaborator Author

yecol commented Sep 12, 2023

  • Append errors with all their details (especially from v6d, in the coordinator...).
  • Specify the log format in the configuration.
  • Review boost::leaf.
