Salesforce and Google Drive mappings #3613

mattnowzari · 2025-08-15T18:27:42Z

Part of https://github.com/elastic/search-team/issues/10779 and https://github.com/elastic/search-team/issues/10780

This PR adds two explicit mappings for Salesforce and Google Drive connectors under the agent folder.
This is preliminary work to eventually have agentless connectors create fresh indices using these mappings.

Checklists

Pre-Review Checklist

this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check config.yml.example)
this PR has a meaningful title
this PR links to all relevant github issues that it fixes or partially addresses
this PR has a thorough description

mattnowzari · 2025-08-18T11:45:04Z

.python-version

@@ -0,0 +1 @@
+3.11.11


Ahh! I will delete this.

I just got this added to the .gitignore a few days ago

erikcurrin-elastic · 2025-08-18T12:44:49Z

connectors/agent/mappings/salesforce.json

@@ -0,0 +1,744 @@
+{
+  "mappings": {
+    "_meta": {


Do we want to expound on this at all? Something like: it can contain support cases, sales opportunities...

Short answer - yes.

Long answer:
I initially thought _meta fields were beholden to the 50 char limit that field-level metadata has, but turns out they're not! Abhi's experiments have shown that long-form mapping descriptions do have benefit, so expanding on top-level _meta fields is going to be easy.

The per-field char limit should probably be increased to match - Sean has a draft PR here that has been on my radar to possibly pick up and push through.

erikcurrin-elastic · 2025-08-18T12:45:41Z

connectors/agent/mappings/salesforce.json

+        "properties": {
+          "Id": {
+            "type": "keyword"
+          },


Weird that we use different casing for these. But, I imagine you inherited this schema

Yep, this is the data shape that comes from our Salesforce connector

This is an instance where we've previously wanted to maintain the output of a given source so that it feels familiar to the customer (Maybe they know Account.Id), but we can afford to be more opinionated now.

I'm not saying these shouldn't be capitalized - just don't feel beholden to what comes out of our Salesforce connector. We can rename fields with Ingest Pipelines.
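
A minimal sketch of what that renaming could look like, assuming we build the pipeline body in Python before sending it to Elasticsearch. The `rename` processor and its `field`, `target_field`, and `ignore_missing` options are standard ingest processor settings; the helper name, the target field names, and the description text are hypothetical and only for illustration.

```python
# Sketch only: the target names below are illustrative, not an agreed schema.

def build_rename_pipeline(renames: dict[str, str]) -> dict:
    """Return an ingest pipeline body with one rename processor per field."""
    processors = [
        {
            "rename": {
                "field": source,
                "target_field": target,
                # Do not fail on documents that lack the source field.
                "ignore_missing": True,
            }
        }
        for source, target in renames.items()
    ]
    return {
        "description": "Rename Salesforce fields (hypothetical naming)",
        "processors": processors,
    }


pipeline = build_rename_pipeline({"Id": "id", "BccAddress": "bcc_address"})
```

The resulting dict can be used as the request body when creating the pipeline, so the connector output itself stays untouched.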

erikcurrin-elastic · 2025-08-18T12:49:55Z

connectors/agent/mappings/salesforce.json

+          }
+        }
+      },
+      "body": {


Can you explain why we use sparse embedding here and semantic text in google drive?

AFAIK, both are semantic_text.

model_settings.task_type: sparse_embedding is a default setting for semantic_text that gets set when you create an index and have not set it to something else manually. So even if we didn't specify this, I think Kibana will populate it automatically unless we manually specify something else.

So, a mapping that is written as...

{
  "mappings": {
    "properties": {
      "field1": {
        "type": "semantic_text"
      }
    }
  }
}

...gets turned into the following when the index is created:

{
  "mappings": {
    "properties": {
      "field1": {
        "inference_id": ".elser-2-elasticsearch",
        "model_settings": {
          "service": "elasticsearch",
          "task_type": "sparse_embedding"
        },
        "type": "semantic_text"
      }
    }
  }
}

This is the behavior I've seen at any rate. It would be worth making sure the mappings are consistent in how semantic_text fields are defined, so it's a good callout regardless.

As always, it might also be worth researching to see if there is any fine-tuning we can do with the settings!

seanstory

Great progress! A few loosely held suggestions.

seanstory · 2025-08-18T13:51:29Z

connectors/agent/mappings/google_drive.json

+      "mime_type": {
+        "type": "keyword"
+      },
+      "name": {


I might suggest title, as I think that might be more common across data sources. We don't have to use a shared schema, but if we see opportunities to create overlap, I don't think it'll hurt.

seanstory · 2025-08-18T13:53:20Z

connectors/agent/mappings/google_drive.json

+        "type": "text",
+        "meta": {
+          "description": "This is the name of a document."


grey area, but I'd suggest semantic_text for file names/titles. Like a file might be "Earnings Report.pdf", but it would be great for it to show up in a search for "Quarterly profits"

Though - maybe we want both? I could see someone also expecting exact-match to work.
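
One way to get both, sketched under assumptions: keep the title as a `text` field with a `keyword` sub-field for exact match, and copy its value into a separate `semantic_text` field. This assumes the targeted Elasticsearch version accepts a `semantic_text` field as a `copy_to` target; the `title_semantic` name and the helper function are hypothetical.

```python
# Sketch only: field names are illustrative, not part of the PR.

def title_with_semantic_copy(field_name: str = "title") -> dict:
    """Return mapping properties for a lexical field plus a semantic copy."""
    semantic_name = f"{field_name}_semantic"  # hypothetical companion field
    return {
        field_name: {
            "type": "text",
            # Send the same value to the semantic field at index time.
            "copy_to": semantic_name,
            # Sub-field for exact-match queries and aggregations.
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        },
        semantic_name: {"type": "semantic_text"},
    }


properties = title_with_semantic_copy()
```

With this shape, "Quarterly profits" could be matched through the semantic field, while an exact file name could still be matched through `title.keyword`.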

seanstory · 2025-08-18T13:55:27Z

connectors/agent/mappings/google_drive.json

+        },
+        "type": "text",
+        "meta": {
+          "description": "A document with this field is from a shared drive"


This makes it sound like it should be a boolean. Is this the name of its shared drive?

seanstory · 2025-08-18T13:56:54Z

connectors/agent/mappings/google_drive.json

we're missing meta.description values for a lot of the properties.
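
A small audit script could help track this down. The sketch below walks a mapping's `properties`, lists leaf fields with no `meta.description`, and flags descriptions longer than the 50-character per-field limit mentioned earlier in this thread. The function name and the sample data are hypothetical (including the `shared_drive` field name); containers that carry their own `properties` are only recursed into, on the assumption that they are not expected to hold `meta` themselves.

```python
MAX_META_VALUE_LENGTH = 50  # per-field limit discussed in this thread


def audit_descriptions(properties: dict, prefix: str = "") -> dict[str, list[str]]:
    """Report fields with a missing or over-long meta.description."""
    report = {"missing": [], "too_long": []}
    for name, field in properties.items():
        path = f"{prefix}{name}"
        if "properties" in field:
            # Container field: check its children instead of the container.
            nested = audit_descriptions(field["properties"], prefix=f"{path}.")
            report["missing"].extend(nested["missing"])
            report["too_long"].extend(nested["too_long"])
            continue
        description = field.get("meta", {}).get("description")
        if description is None:
            report["missing"].append(path)
        elif len(description) > MAX_META_VALUE_LENGTH:
            report["too_long"].append(path)
    return report


# Hypothetical sample, loosely modeled on the snippets quoted in this review.
sample = {
    "name": {"type": "text", "meta": {"description": "This is the name of a document."}},
    "mime_type": {"type": "keyword"},
    "Account": {"properties": {"Id": {"type": "keyword"}}},
    "shared_drive": {"type": "text", "meta": {"description": "x" * 51}},
}
report = audit_descriptions(sample)
```

For a real mapping file, the input would be the `properties` object under `mappings` after loading the JSON.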

seanstory · 2025-08-18T13:58:53Z

connectors/agent/mappings/salesforce.json

+        "properties": {
+          "Id": {
+            "type": "keyword"
+          },


This is an instance where we've previously wanted to maintain the output of a given source so that it feels familiar to the customer (Maybe they know Account.Id), but we can afford to be more opinionated now.

I'm not saying these shouldn't be capitalized - just don't feel beholden to what comes out of our Salesforce connector. We can rename fields with Ingest Pipelines.

seanstory · 2025-08-18T14:02:45Z

connectors/agent/mappings/salesforce.json

+                    "inference_id": ".elser-2-elasticsearch",
+                    "model_settings": {
+                      "service": "elasticsearch",
+                      "task_type": "sparse_embedding"
+                    },


I'd scrap all this. If we leave just "type": "semantic_text", then when the relevance team changes the default model/settings, new indices will pick up the new defaults. Less maintenance for us.
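
If the explicit settings were copied from an existing index, they could be stripped mechanically. This is a sketch, not part of the PR: it assumes the only keys to drop are `inference_id` and `model_settings`, and the helper names are hypothetical.

```python
import copy

# Keys to drop so that index creation fills in the current defaults.
EXPLICIT_INFERENCE_KEYS = ("inference_id", "model_settings")


def strip_semantic_text_settings(mapping: dict) -> dict:
    """Return a copy of a mapping with semantic_text fields left at defaults."""
    result = copy.deepcopy(mapping)
    _strip_in_place(result)
    return result


def _strip_in_place(node: dict) -> None:
    for field in node.get("properties", {}).values():
        if field.get("type") == "semantic_text":
            for key in EXPLICIT_INFERENCE_KEYS:
                field.pop(key, None)
        # Object fields carry their own "properties", so recurse into them.
        _strip_in_place(field)


original = {
    "properties": {
        "body": {
            "type": "semantic_text",
            "inference_id": ".elser-2-elasticsearch",
            "model_settings": {
                "service": "elasticsearch",
                "task_type": "sparse_embedding",
            },
        },
        "Account": {"properties": {"Id": {"type": "keyword"}}},
    }
}
cleaned = strip_semantic_text_settings(original)
```

Other keys on the field, such as `meta`, are left alone, and the input mapping is not modified.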

seanstory · 2025-08-18T14:03:48Z

connectors/agent/mappings/salesforce.json

+      "ConvertedAccount": {
+        "type": "object"
+      },
+      "ConvertedContact": {
+        "type": "object"
+      },
+      "ConvertedOpportunity": {
+        "type": "object"
+      },


Looks like we don't really know what these are. If we don't know what they are and don't know why we'd want to search them, let's not even index them. We can drop this stuff in an ingest pipeline.
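
As a sketch of the ingest pipeline approach, the `remove` processor accepts a list of fields and an `ignore_missing` flag. The helper name and description text below are hypothetical; the field names come from the snippet above.

```python
# Sketch only: builds the pipeline body, it does not talk to a cluster.

def build_drop_fields_pipeline(fields: list[str]) -> dict:
    """Return an ingest pipeline body that removes the given fields."""
    return {
        "description": "Drop fields we do not plan to search",
        "processors": [
            {
                "remove": {
                    "field": fields,
                    # Documents without these fields pass through unchanged.
                    "ignore_missing": True,
                }
            }
        ],
    }


pipeline = build_drop_fields_pipeline(
    ["ConvertedAccount", "ConvertedContact", "ConvertedOpportunity"]
)
```

The matching entries would then also be deleted from the mapping file so the index never declares them.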

seanstory · 2025-08-18T14:04:42Z

connectors/agent/mappings/salesforce.json

+              "BccAddress": {
+                "type": "text"
+              },
+              "CcAddress": {
+                "type": "text"
+              },


I'd think email addresses should be keyword, since we probably only care about exact match when searching. WDYT?
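
For illustration, switching the type for a known set of address fields could be scripted as below. This is a rough sketch: the helper name is hypothetical, and it only touches the fields passed in, leaving any other settings on those fields as they were.

```python
def as_keyword(properties: dict, field_names: list[str]) -> dict:
    """Return a copy of the properties with the named fields typed as keyword."""
    updated = dict(properties)
    for name in field_names:
        if name in updated:
            # Build a new field dict so the input mapping is not mutated.
            updated[name] = {**updated[name], "type": "keyword"}
    return updated


email_properties = {
    "BccAddress": {"type": "text"},
    "CcAddress": {"type": "text"},
}
converted = as_keyword(email_properties, ["BccAddress", "CcAddress"])
```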

seanstory · 2025-08-18T14:07:27Z

connectors/agent/mappings/salesforce.json

This mapping is huge. I'd suggest that we try to pare down how much we keep for WorkChat. We want it to have plenty of context, and plenty of ways to filter its search results, but if we can't imagine a realistic way that we might leverage a field for search or for valuable result insights, let's remove it for now. Adding fields later will be much easier than deleting fields later.
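
One possible way to do the trimming is to keep an explicit allowlist of field paths and prune everything else. The sketch below is an assumption about workflow, not something the PR does; the function name, the allowlist, and the `Fax` field in the sample are hypothetical.

```python
def prune_properties(properties: dict, keep: set[str], prefix: str = "") -> dict:
    """Return only the fields whose dotted path is in the allowlist."""
    pruned = {}
    for name, field in properties.items():
        path = f"{prefix}{name}"
        if path in keep:
            # Allowlisted field: keep it, including any children.
            pruned[name] = field
        elif "properties" in field:
            children = prune_properties(field["properties"], keep, prefix=f"{path}.")
            if children:
                # Keep the container only if some child survived.
                pruned[name] = {**field, "properties": children}
    return pruned


sample = {
    "body": {"type": "semantic_text"},
    "Account": {
        "properties": {
            "Id": {"type": "keyword"},
            "Fax": {"type": "keyword"},
        }
    },
    "ConvertedAccount": {"type": "object"},
}
trimmed = prune_properties(sample, keep={"body", "Account.Id"})
```

Keeping the allowlist in version control would also make later additions easy to review one field at a time.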

mattnowzari added 3 commits August 15, 2025 13:15

Added initial baseline mappings for Salesforce and GDrive

9f4fe92

Cleaned up and added some metadata to Salesforce mapping

3ed9c41

Added another semantic_text field

0279ed2

mattnowzari added the v9.2.0 label Aug 15, 2025

github-actions bot added the auto-backport label Aug 15, 2025

Update NOTICE.txt

d46261a

mattnowzari commented Aug 18, 2025

Shortened meta field that exceeded 50 chars

b3f2652

erikcurrin-elastic reviewed Aug 18, 2025

seanstory reviewed Aug 18, 2025
