-
Notifications
You must be signed in to change notification settings - Fork 280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
upgrading elasticsearch to 5.x. #187
Conversation
build.gradle
Outdated
@@ -138,6 +138,10 @@ configure(javaProjects) { | |||
dependency("org.glassfish.jersey.containers:jersey-container-servlet:2.19") | |||
dependency("org.glassfish.jersey.media:jersey-media-json-jackson:2.19") | |||
dependency("org.spockframework:spock-guice:1.0-groovy-2.4") | |||
/**es 5.4.1 dependencies*/ | |||
dependency("org.elasticsearch.client:transport:5.4.1") | |||
dependency("org.apache.logging.log4j:log4j-api:2.7") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need these explicitly here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To pin the version of elasticSearch client instead of relying on the default one. I removed the dependency of org.apache.logging.log4j:log4j-api:2.7.
metacat-main/build.gradle
Outdated
compile("com.amazonaws:aws-java-sdk-sns") | ||
compile("joda-time:joda-time") | ||
compile("org.apache.logging.log4j:log4j-api") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we're using log4j like this here instead of using slf4j/logback like everywhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is required by the elasticSearch transport client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then why do we have to explicitly depend on it. It does not get it transitively
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may change based on the spring boot version.
want to set it to a specific version, which the cluster is.
@Configuration | ||
@ConditionalOnProperty(value = "metacat.elasticsearch.enabled", havingValue = "true") | ||
public class ElasticSearchConfig { | ||
|
||
private static final String TCP_CONNECTION_TIMEOUT = "60s"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this being used only once maybe you should just have it as a local variable? Additionally this looks like something that would want to be a property with a default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Don't think we need to make it configurable. Roll back the code to previous.
@@ -51,11 +53,13 @@ | |||
|
|||
private String id; | |||
private Object dto; | |||
private final String timestamp = String.valueOf(Instant.now().toEpochMilli()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we set the timestamp in the constructor and have this as null by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to put null as default? This field is always required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we read the document from elasticsearch, how will you populate this Object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not aware of this use-case.Needs a bit explain how to use it after reading from the es. If we need to update a certain field, we can upload the fields explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Moved timestamp init to constructor.
final ObjectNode node = metacatJsonLocator.emptyObjectNode(); | ||
node.set("dataMetadata", dto.getDataMetadata()); | ||
es.updates(ElasticSearchDoc.Type.table.name(), ids, metacatRequestContext, node); | ||
final ElasticSearchDoc doc = new ElasticSearchDoc(dto.getName().toString(), dto, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what the previous code is doing to be honest. In the new world, the API does not take in the node, we can talk about it and update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the logic to previous implementation.
* @param client elastic search client | ||
* @param config config | ||
* @param metacatJson json utility | ||
* @param elasticSearchMetric elastic search metric | ||
*/ | ||
public ElasticSearchUtilImpl( | ||
@Nullable final Client client, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the new ES client have retry logic? If so, can we use that instead of us handling the retry logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it doesn't have the retry logic.
this.elasticSearchMetric.getElasticSearchSaveFailureCounter().increment(); | ||
log("ElasticSearchUtil.saveToIndex", type, id, null, e.getMessage(), e, true, index); | ||
} | ||
} | ||
|
||
private static List<String> getIds(final SearchResponse response) { | ||
return FluentIterable.from(response.getHits()).transform(SearchHit::getId).toList(); | ||
final List<String> ret = Lists.newArrayList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why cant we use the Iterable?
} | ||
|
||
private <T> List<T> parseResponse(final SearchResponse response, final Class<T> valueType) { | ||
return FluentIterable.from(response.getHits()).transform(hit -> { | ||
final List<T> ret = Lists.newArrayList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why cant we use the Iterable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we can. But the previous FluentIterable.from xx related implemenation doesn't allow mocking/stubbing the object in the unit test.
metacat-main/build.gradle
Outdated
@@ -74,14 +75,16 @@ dependencies { | |||
runtime("io.springfox:springfox-swagger-ui") | |||
runtime("org.webjars:hal-browser") | |||
runtime("org.springframework:spring-aspects") | |||
runtime("org.apache.logging.log4j:log4j-api") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be removed since log4j-core gets log4j-api
Bring back the previous elasticSearch using embedded es server. |
testCompile("io.airlift:testing-mysql-server") | ||
testCompile("org.apache.logging.log4j:log4j-core") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change this to testRuntime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, tried that. The build process will fail using testRuntime
:metacat-main:compileTestGroovy
Unable to load class org.elasticsearch.node.Node due to missing dependency org/apache/logging/log4j/Logger
if (dto.isDataExternal()) { | ||
final List<String> ids = es.getTableIdsByUri(metadataType, dto.getDataUri()); | ||
ids.remove(dto.getName().toString()); | ||
if (!ids.isEmpty()) { | ||
final ObjectNode node = metacatJsonLocator.emptyObjectNode(); | ||
node.set("dataMetadata", dto.getDataMetadata()); | ||
es.updates(ElasticSearchDoc.Type.table.name(), ids, metacatRequestContext, node); | ||
node.set(ElasticSearchDoc.Field.DATA_METADATA, dto.getDataMetadata()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we not setting the timestamp?
* @param client elastic search client | ||
* @param config config | ||
* @param metacatJson json utility | ||
* @param elasticSearchMetric elastic search metric | ||
*/ | ||
public ElasticSearchUtilImpl( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to set the timestamp on softDelete, update and create.
@@ -371,7 +355,7 @@ private ObjectNode toJsonObject(final ElasticSearchDoc elasticSearchDoc) { | |||
final QueryBuilder queryBuilder = QueryBuilders.boolQuery() | |||
.must(QueryBuilders.termsQuery("name.qualifiedName.tree", names)) | |||
.must(QueryBuilders.termQuery("deleted_", false)) | |||
.must(QueryBuilders.rangeQuery("_timestamp").lte(marker.toDate())) | |||
.must(QueryBuilders.rangeQuery(ElasticSearchDoc.Field.TIMESTAMP).lte(marker.toDate())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work? We are storing the timestamp in epoch millis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update to milli for comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test case
@@ -1,6 +1,7 @@ | |||
{ | |||
"settings": { | |||
"index":{"number_of_shards":10}, | |||
"index":{"number_of_shards":5, | |||
"max_result_window": 2147483647}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we setting this property high?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed yesterday, it's needed as setting the size in the query.
final SearchRequestBuilder request = client.prepareSearch(esIndex)
.setTypes(type)
.setSearchType(SearchType.QUERY_THEN_FETCH)
.setQuery(queryBuilder)
.setSize(Integer.MAX_VALUE);
} | ||
} | ||
"type": "keyword", | ||
"index": true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should set this to:
{
"inputFormat": {
"type" "text",
"index" = true,
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same applies to outputformat, owner, uri and serializationLib
"user_": { | ||
"type": "string", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need keyword and text like inputFormat
"fields": { | ||
"tree": { | ||
"type": "string", | ||
"type": "text", | ||
"analyzer": "paths" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to say index=true for the property 'tree'.
Updated the mapping based on comments. |
No description provided.