Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upgrading elasticsearch to 5.x. #187

Merged
merged 2 commits into from
Sep 13, 2017
Merged

upgrading elasticsearch to 5.x. #187

merged 2 commits into from
Sep 13, 2017

Conversation

zhljen
Copy link
Contributor

@zhljen zhljen commented Sep 5, 2017

No description provided.

build.gradle Outdated
@@ -138,6 +138,10 @@ configure(javaProjects) {
dependency("org.glassfish.jersey.containers:jersey-container-servlet:2.19")
dependency("org.glassfish.jersey.media:jersey-media-json-jackson:2.19")
dependency("org.spockframework:spock-guice:1.0-groovy-2.4")
/**es 5.4.1 dependencies*/
dependency("org.elasticsearch.client:transport:5.4.1")
dependency("org.apache.logging.log4j:log4j-api:2.7")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need these explicitly here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To pin the version of elasticSearch client instead of relying on the default one. I removed the dependency of org.apache.logging.log4j:log4j-api:2.7.

compile("com.amazonaws:aws-java-sdk-sns")
compile("joda-time:joda-time")
compile("org.apache.logging.log4j:log4j-api")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we're using log4j like this here instead of using slf4j/logback like everywhere else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required by the elasticSearch transport client.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then why do we have to explicitly depend on it. It does not get it transitively

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may change based on the spring boot version.
want to set it to a specific version, which the cluster is.

@Configuration
@ConditionalOnProperty(value = "metacat.elasticsearch.enabled", havingValue = "true")
public class ElasticSearchConfig {

private static final String TCP_CONNECTION_TIMEOUT = "60s";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this being used only once maybe you should just have it as a local variable? Additionally this looks like something that would want to be a property with a default value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Don't think we need to make it configurable. Roll back the code to previous.

@@ -51,11 +53,13 @@

private String id;
private Object dto;
private final String timestamp = String.valueOf(Instant.now().toEpochMilli());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we set the timestamp in the constructor and have this as null by default

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to put null as default? This field is always required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we read the document from elasticsearch, how will you populate this Object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not aware of this use-case.Needs a bit explain how to use it after reading from the es. If we need to update a certain field, we can upload the fields explicitly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Moved timestamp init to constructor.

final ObjectNode node = metacatJsonLocator.emptyObjectNode();
node.set("dataMetadata", dto.getDataMetadata());
es.updates(ElasticSearchDoc.Type.table.name(), ids, metacatRequestContext, node);
final ElasticSearchDoc doc = new ElasticSearchDoc(dto.getName().toString(), dto,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what the previous code is doing to be honest. In the new world, the API does not take in the node, we can talk about it and update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the logic to previous implementation.

* @param client elastic search client
* @param config config
* @param metacatJson json utility
* @param elasticSearchMetric elastic search metric
*/
public ElasticSearchUtilImpl(
@Nullable final Client client,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the new ES client have retry logic? If so, can we use that instead of us handling the retry logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it doesn't have the retry logic.

this.elasticSearchMetric.getElasticSearchSaveFailureCounter().increment();
log("ElasticSearchUtil.saveToIndex", type, id, null, e.getMessage(), e, true, index);
}
}

private static List<String> getIds(final SearchResponse response) {
return FluentIterable.from(response.getHits()).transform(SearchHit::getId).toList();
final List<String> ret = Lists.newArrayList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why cant we use the Iterable?

}

private <T> List<T> parseResponse(final SearchResponse response, final Class<T> valueType) {
return FluentIterable.from(response.getHits()).transform(hit -> {
final List<T> ret = Lists.newArrayList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why cant we use the Iterable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can. But the previous FluentIterable.from xx related implemenation doesn't allow mocking/stubbing the object in the unit test.

@@ -74,14 +75,16 @@ dependencies {
runtime("io.springfox:springfox-swagger-ui")
runtime("org.webjars:hal-browser")
runtime("org.springframework:spring-aspects")
runtime("org.apache.logging.log4j:log4j-api")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed since log4j-core gets log4j-api

@zhljen
Copy link
Contributor Author

zhljen commented Sep 8, 2017

Bring back the previous elasticSearch using embedded es server.

testCompile("io.airlift:testing-mysql-server")
testCompile("org.apache.logging.log4j:log4j-core")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change this to testRuntime?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, tried that. The build process will fail using testRuntime
:metacat-main:compileTestGroovy

Unable to load class org.elasticsearch.node.Node due to missing dependency org/apache/logging/log4j/Logger

if (dto.isDataExternal()) {
final List<String> ids = es.getTableIdsByUri(metadataType, dto.getDataUri());
ids.remove(dto.getName().toString());
if (!ids.isEmpty()) {
final ObjectNode node = metacatJsonLocator.emptyObjectNode();
node.set("dataMetadata", dto.getDataMetadata());
es.updates(ElasticSearchDoc.Type.table.name(), ids, metacatRequestContext, node);
node.set(ElasticSearchDoc.Field.DATA_METADATA, dto.getDataMetadata());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we not setting the timestamp?

* @param client elastic search client
* @param config config
* @param metacatJson json utility
* @param elasticSearchMetric elastic search metric
*/
public ElasticSearchUtilImpl(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to set the timestamp on softDelete, update and create.

@@ -371,7 +355,7 @@ private ObjectNode toJsonObject(final ElasticSearchDoc elasticSearchDoc) {
final QueryBuilder queryBuilder = QueryBuilders.boolQuery()
.must(QueryBuilders.termsQuery("name.qualifiedName.tree", names))
.must(QueryBuilders.termQuery("deleted_", false))
.must(QueryBuilders.rangeQuery("_timestamp").lte(marker.toDate()))
.must(QueryBuilders.rangeQuery(ElasticSearchDoc.Field.TIMESTAMP).lte(marker.toDate()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work? We are storing the timestamp in epoch millis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update to milli for comparison.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test case

@@ -1,6 +1,7 @@
{
"settings": {
"index":{"number_of_shards":10},
"index":{"number_of_shards":5,
"max_result_window": 2147483647},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we setting this property high?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed yesterday, it's needed as setting the size in the query.
final SearchRequestBuilder request = client.prepareSearch(esIndex)
.setTypes(type)
.setSearchType(SearchType.QUERY_THEN_FETCH)
.setQuery(queryBuilder)
.setSize(Integer.MAX_VALUE);

}
}
"type": "keyword",
"index": true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should set this to:
{
"inputFormat": {
"type" "text",
"index" = true,
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same applies to outputformat, owner, uri and serializationLib

"user_": {
"type": "string",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need keyword and text like inputFormat

"fields": {
"tree": {
"type": "string",
"type": "text",
"analyzer": "paths"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to say index=true for the property 'tree'.

@zhljen
Copy link
Contributor Author

zhljen commented Sep 13, 2017

Updated the mapping based on comments.

@zhljen zhljen merged commit a625d3b into Netflix:master Sep 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants