
[jaeger-v2] Add support for Elasticsearch #5152

Merged
merged 34 commits into jaegertracing:main
Feb 27, 2024

Conversation

akagami-harsh
Member

@akagami-harsh akagami-harsh commented Jan 29, 2024

Which problem is this PR solving?

Description of the changes

  • add support for elasticsearch in jaeger-v2

How was this change tested?

  • tested locally

Checklist

akagami-harsh and others added 4 commits January 29, 2024 21:40
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
@akagami-harsh akagami-harsh marked this pull request as ready for review January 29, 2024 20:23
@akagami-harsh akagami-harsh requested a review from a team as a code owner January 29, 2024 20:23

codecov bot commented Jan 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.49%. Comparing base (8a5f824) to head (278d49a).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5152      +/-   ##
==========================================
+ Coverage   94.48%   94.49%   +0.01%     
==========================================
  Files         334      334              
  Lines       19267    19303      +36     
==========================================
+ Hits        18204    18241      +37     
  Misses        875      875              
+ Partials      188      187       -1     
Flag Coverage Δ
cassandra-3.x 25.53% <ø> (ø)
cassandra-4.x 25.53% <ø> (ø)
elasticsearch-5.x 19.83% <ø> (ø)
elasticsearch-6.x 19.81% <ø> (ø)
elasticsearch-7.x 19.94% <ø> (-0.02%) ⬇️
elasticsearch-8.x 20.03% <ø> (-0.02%) ⬇️
grpc-badger 19.46% <ø> (ø)
kafka 14.06% <ø> (ø)
opensearch-1.x 19.96% <ø> (ø)
opensearch-2.x 19.94% <ø> (-0.02%) ⬇️
unittests 92.35% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown.


Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
f := NewFactory()
f.InitFromOptions(Options{
	Primary: namespaceConfig{Configuration: cfg},
	others:  make(map[string]*namespaceConfig),
Member

I wonder, if only Primary and not Archive is ever initialized, would it actually provide an archive storage to the UI? Did you try it out?

Member Author

Yes, you're right, it does not work with the UI. It throws this error.

Docker container used to test it:

docker run --name es01 -p 9200:9200 -e "discovery.type=single-node" -e "xpack.security.enabled=false" elasticsearch:8.12.0

[screenshot of the error]
Do you know what causes it? 🤔 I tried different configurations of Options but couldn't fix it.

Member

Better to look in the server logs.

Member

Can you put together a docker-compose that illustrates this issue?

This "all shards failed" error has always annoyed me, I wonder if there are more details in the error returned from ES that we're not logging that could be more descriptive. Like, it's ridiculous for a database to respond with "your query failed" instead of telling why (the fact that it failed on "all shards" is not an explanation of why).
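For what it's worth, the missing detail is usually present in the JSON error body that ES returns; the top-level message is just a summary. A stdlib-only sketch of surfacing the root_cause entries (rootCauses is a hypothetical helper for illustration, not the client library's API):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rootCauses pulls the root_cause entries out of an Elasticsearch error
// body; they are usually far more descriptive than the top-level
// "all shards failed" summary. Returns nil if the body isn't JSON.
func rootCauses(body []byte) []string {
	var resp struct {
		Error struct {
			RootCause []struct {
				Type   string `json:"type"`
				Reason string `json:"reason"`
			} `json:"root_cause"`
		} `json:"error"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil
	}
	var out []string
	for _, rc := range resp.Error.RootCause {
		out = append(out, fmt.Sprintf("%s: %s", rc.Type, rc.Reason))
	}
	return out
}

func main() {
	body := []byte(`{"error":{"root_cause":[{"type":"illegal_argument_exception",` +
		`"reason":"Fielddata is disabled on [serviceName]"}],` +
		`"type":"search_phase_execution_exception","reason":"all shards failed"},"status":400}`)
	for _, rc := range rootCauses(body) {
		fmt.Println(rc)
	}
}
```

Logging something like this next to the client error would at least show the per-shard reason instead of only "all shards failed".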

Member Author

Can you put together a docker-compose that illustrates this issue?

Sure, I'll make one.
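Something along these lines could serve as a starting point; the Jaeger image tag, config path, and file names here are assumptions for illustration, not the files eventually added in this PR:

```yaml
version: "3"
services:
  elasticsearch:
    image: elasticsearch:8.12.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
  jaeger:
    # hypothetical v2 image/config; adjust to the binary built from this branch
    image: jaegertracing/jaeger:latest
    volumes:
      - ./config-elasticsearch.yaml:/etc/jaeger/config.yaml
    command: ["--config", "/etc/jaeger/config.yaml"]
    ports:
      - "16686:16686"
    depends_on:
      - elasticsearch
```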

Member Author

I didn't check this one previously because there were no errors in the UI. However, I've just run it with Elasticsearch and found this issue.

harsh@MSI:~$ curl localhost:16686

  function getJaegerStorageCapabilities() {
        const DEFAULT_STORAGE_CAPABILITIES = { "archiveStorage": false };
        const JAEGER_STORAGE_CAPABILITIES = {"archiveStorage":false};
        return JAEGER_STORAGE_CAPABILITIES;
      }

Member Author

When I checked with badger and memstore, strangely it's {"archiveStorage":false} in both of them.

Member

At least for memstore you need to add archive to the config explicitly:

diff --git a/cmd/jaeger/internal/all-in-one.yaml b/cmd/jaeger/internal/all-in-one.yaml
index d17c52fc..a6189c8e 100644
--- a/cmd/jaeger/internal/all-in-one.yaml
+++ b/cmd/jaeger/internal/all-in-one.yaml
@@ -9,11 +9,14 @@ service:
 extensions:
   jaeger_query:
     trace_storage: memstore
+    trace_storage_archive: memstore2

   jaeger_storage:
     memory:
       memstore:
         max_traces: 100000
+      memstore2:
+        max_traces: 100000

But maybe this is ok - the idea in v2 was to decouple primary and archive storage altogether, so that one could use different types of backends for those.

Member

So in theory the config you have should've worked, because you did define archive storage, but something isn't working.

Member Author

Added a small fix for the above problem, which sets {"archiveEnabled":true}.

akagami-harsh and others added 3 commits February 6, 2024 12:51
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
Signed-off-by: Harshvir Potpose <122517264+akagami-harsh@users.noreply.github.com>
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
Signed-off-by: Harshvir Potpose <122517264+akagami-harsh@users.noreply.github.com>
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
@yurishkuro yurishkuro added changelog:new-feature Change that should be called out as new feature in CHANGELOG v2 changelog:exprimental Change to an experimental part of the code and removed changelog:exprimental Change to an experimental part of the code labels Feb 6, 2024
@yurishkuro
Member

please rebase on top of #5171

@yurishkuro yurishkuro added changelog:exprimental Change to an experimental part of the code and removed changelog:new-feature Change that should be called out as new feature in CHANGELOG labels Feb 7, 2024
akagami-harsh and others added 4 commits February 7, 2024 12:18
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
f := NewFactory()
f.InitFromOptions(Options{
	Primary: namespaceConfig{Configuration: cfg},
	others:  make(map[string]*namespaceConfig),
Member

Better to look in the server logs.

@akagami-harsh
Member Author

akagami-harsh commented Feb 16, 2024

Better to look in the server logs.

I looked at the server logs. I'm getting this error when trying to access the UI, not sure why.

2024-02-17T01:40:04.265+0530    info    service@v0.93.0/service.go:165  Everything is ready. Begin running and processing data.
2024-02-17T01:40:08.406+0530    error   app/http_handler.go:492 HTTP handler, Internal Server Error     {"kind": "extension", "name": "jaeger_query", "error": "search services failed: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]"}
github.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).handleError
        /home/harsh/OpenSource/JAEGER/es/jaeger/cmd/query/app/http_handler.go:492
github.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).getServices
        /home/harsh/OpenSource/JAEGER/es/jaeger/cmd/query/app/http_handler.go:163
net/http.HandlerFunc.ServeHTTP
        /usr/local/go/src/net/http/server.go:2136

I debugged it a little and found that this code is returning the error:

searchService := s.client().Search(indices...).
	Size(0).
	Query(serviceQuery).
	IgnoreUnavailable(true).
	Aggregation(operationsAggregation, serviceFilter)
searchResult, err := searchService.Do(context)
if err != nil {
	return nil, fmt.Errorf("search operations failed: %w", err)
}

@@ -45,7 +45,7 @@ import (

// Configuration describes the configuration properties needed to connect to an ElasticSearch cluster
type Configuration struct {
-	Servers []string `mapstructure:"server_urls"`
+	Servers []string `mapstructure:"server_urls" validate:"required,min=1,dive,url"`
Member

I wonder if there's a test we could write for ensuring that these annotations are correct (btw, what is "dive"?)

We could do it brute-force and just write individual tests for individual fields, but it seems redundant as that would be already done in the validator package (hopefully). But at the same time, having no tests to even ensure that the syntax of the annotations is correct is troublesome.

Member

Comment for L58 and below - can you try removing yaml:... annotations? I think they are left-overs from earlier attempts to build on OTEL Coll, I don't think we're using them anywhere.

Member Author

Comment for L58 and below - can you try removing yaml:... annotations? I think they are left-overs from earlier attempts to build on OTEL Coll, I don't think we're using them anywhere.

removed it

Member Author

(btw, what is "dive"?)

dive instructs the validation library to examine each element within the slice individually
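A stdlib-only sketch of what that means for the Servers field, without pulling in the validator library (validateServers is a hypothetical helper, not code from this PR):

```go
package main

import (
	"fmt"
	"net/url"
)

// validateServers mimics what `validate:"required,min=1,dive,url"` asks
// the validator library to do: the slice must be non-empty, and `dive`
// applies the remaining rules (here, `url`) to each element individually.
func validateServers(servers []string) error {
	if len(servers) < 1 {
		return fmt.Errorf("server_urls: at least one server is required")
	}
	for i, s := range servers {
		u, err := url.Parse(s)
		if err != nil || u.Scheme == "" || u.Host == "" {
			return fmt.Errorf("server_urls[%d]: %q is not a valid URL", i, s)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateServers([]string{"http://localhost:9200"})) // <nil>
	fmt.Println(validateServers([]string{"not-a-url"}))             // error on element 0
}
```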

Member Author

I wonder if there's a test we could write for ensuring that these annotations are correct

We could do it brute-force and just write individual tests for individual fields, but it seems redundant as that would be already done in the validator package (hopefully). But at the same time, having no tests to even ensure that the syntax of the annotations is correct is troublesome.

We can write unit tests that check whether the validation behaves as expected. Instead of testing individual fields, we can create a set of test cases covering various scenarios, including both valid and invalid configurations.
Should I create one?

Member

but that wouldn't be what I am after, such test would simply verify that the Validator is working as expected, on select fields. But if I declare a field with annotations that don't make sense, I want to catch that in unit tests. E.g. misspell dive as diev (btw weird option, "recursive" would've been much more self explanatory).

Member Author

We can use the "github.com/go-playground/validator" library in our unit tests to detect misspelled annotations and fields with invalid annotation combinations.

Like this:

package main

import (
	"github.com/go-playground/validator"
)

type Webhook struct {
	CallbackURL string `validate:"url,badAnnotation"` // badAnnotation is not a valid tag
}

func main() {
	webhook := Webhook{
		CallbackURL: "https://www.example.com/",
	}
	validate := validator.New()
	validate.Struct(webhook)

}
Output:
panic: Undefined validation function 'badAnnotation' on field 'CallbackURL'

goroutine 1 [running]:
github.com/go-playground/validator.(*Validate).parseFieldTagsRecursive(0xc00002e240, {0x526987?, 0x55812d?}, {0x526971, 0xb}, {0x0, 0x0}, 0x0)
        /home/harsh/go/pkg/mod/github.com/go-playground/validator@v9.31.0+incompatible/cache.go:289 +0x9d9
github.com/go-playground/validator.(*Validate).extractStructCache(0xc00002e240, {0x538520?, 0xc00020a950?, 0x7fda1aac3198?}, {0x52160c, 0x7})
        /home/harsh/go/pkg/mod/github.com/go-playground/validator@v9.31.0+incompatible/cache.go:150 +0x4ec
github.com/go-playground/validator.(*validate).validateStruct(0xc000118480, {0x58b0c0, 0x68faa0}, {0x538520?, 0xc00020a950?, 0xc000242000?}, {0x538520?, 0xc00020a950?, 0x40e3a5?}, {0x58cd60, ...}, ...)
        /home/harsh/go/pkg/mod/github.com/go-playground/validator@v9.31.0+incompatible/validator.go:37 +0x196
github.com/go-playground/validator.(*Validate).StructCtx(0xc00002e240, {0x58b0c0, 0x68faa0}, {0x538520?, 0xc00020a950?})
        /home/harsh/go/pkg/mod/github.com/go-playground/validator@v9.31.0+incompatible/validator_instance.go:304 +0x491
github.com/go-playground/validator.(*Validate).Struct(...)
        /home/harsh/go/pkg/mod/github.com/go-playground/validator@v9.31.0+incompatible/validator_instance.go:277
main.main()
        /home/harsh/Testing/main.go:16 +0x65
exit status 2

Member

what about "github.com/asaskevich/govalidator", will it do the same? We do not use "github.com/go-playground/validator"

Member Author

will it do the same?

Nope, it doesn't do the same. I actually tried using the existing validator at first.

Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
@@ -0,0 +1,11 @@
FROM alpine:latest
Member

not sure why you need this one - how is it different from the main Dockerfile?

Member Author

I was having issues with the main Dockerfile, so I created a simpler one as a workaround. Now we don't need the Dockerfile and docker-compose file anymore, so I'm removing them.

cmd/jaeger/es_config.yaml (outdated, resolved)
pkg/es/config/config.go (resolved)
f := NewFactory()
f.InitFromOptions(Options{
	Primary: namespaceConfig{Configuration: cfg},
	others:  make(map[string]*namespaceConfig),
Member

So when I run v2 with your new config file (and without -tags ui), the placeholder UI shows this:

JAEGER_CONFIG = {"archiveEnabled":true};
JAEGER_STORAGE_CAPABILITIES = {"archiveStorage":false};
JAEGER_VERSION = {"gitCommit":"","gitVersion":"","buildDate":""};

We expect {"archiveStorage":true} in order for archiving to work

@yurishkuro
Member

I ran with your docker-compose and config and (with the help of #5209) it returns this error:

RootCause[Fielddata is disabled on [serviceName] in [jaeger-service-]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [serviceName] in order to load field data by uninverting the inverted index. Note that this can use significant memory. [type=illegal_argument_exception]]

akagami-harsh and others added 7 commits February 19, 2024 13:47
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
@akagami-harsh
Member Author

I ran with your docker-compose and config and (with the help of #5209) it returns this error:

RootCause[Fielddata is disabled on [serviceName] in [jaeger-service-]. Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [serviceName] in order to load field data by uninverting the inverted index. Note that this can use significant memory. [type=illegal_argument_exception]]

Thanks a lot for the help. Now I think it is working fine; setting UseReadWriteAliases to true fixes it.
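For reference, that fix corresponds to a storage config along these lines; the key name use_aliases comes from the field's v1 mapstructure tag, and the overall layout is a sketch rather than the exact file from this PR:

```yaml
extensions:
  jaeger_storage:
    elasticsearch:
      es_main:
        server_urls:
          - http://localhost:9200
        use_aliases: true   # sets UseReadWriteAliases; avoids the fielddata error
```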

Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
archive[archiveNamespace] = &namespaceConfig{
	Configuration: cfg,
	namespace:     archiveNamespace,
}
Member

ok, this is logically wrong, but will work for now. The underlying problem is the existence of storage.ArchiveFactory interface, we need to get rid of it long term, because as far as interface goes the archive storage is just like a regular storage, only with different configuration.

Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
Signed-off-by: Harshvir Potpose <hpotpose62@gmail.com>
@yurishkuro yurishkuro left a comment

Thanks!

@yurishkuro yurishkuro enabled auto-merge (squash) February 27, 2024 00:40
@yurishkuro yurishkuro merged commit 7a04461 into jaegertracing:main Feb 27, 2024
35 checks passed
@akagami-harsh akagami-harsh deleted the elasticsearch branch March 17, 2024 07:49
Labels
changelog:exprimental Change to an experimental part of the code v2
2 participants