USE_CASE_ID: extract_data_from_crm NAME: Extract data from CRM (multi-source) CATEGORY: Data operations The user wants to pull records (contacts, companies / accounts) out of one or more CRMs into a Floqer sub-sheet for downstream work — analysis, normalisation, secondary enrichment, push to another destination. Supports Salesforce and HubSpot today; Pipedrive is planned but not yet exposed via the Floqer API. For routing, file structure, and cross-cutting principles, see https://floqer.com/docs/use-case-catalog.txt. This file does not repeat them. INDEX: 1. When to use / when not to use 2. Inputs and pre-flight clarifications 3. Outputs 4. Workflow design 5. Implementation 6. Best practices 7. Common variations 8. Failure modes and mitigations 9. Related use cases ================================================================================ 1. WHEN TO USE / WHEN NOT TO USE ================================================================================ USE WHEN: - The user has a CRM (Salesforce, HubSpot, or both) and wants a snapshot of records into Floqer for downstream processing. - Downstream is an LLM analysis (compare / score / cluster / deduplicate / brief-write) — see "Preferred pattern: slim before LLM" below. This is the canonical wrapper around any "feed CRM data into an LLM" workflow. - The selection criteria can be expressed as a property filter (e.g. lifecyclestage = lead, Website contains "acme") rather than requiring a pre-known record ID. - The user is fine with a point-in-time snapshot, not a live sync. DO NOT USE WHEN: - The user wants to monitor for CHANGES (additions, updates, role moves) rather than a snapshot. Use CRM contact change detection or schedule this flow recurrently with downstream dedupe. - The user wants ONE specific record by known CRM ID. Use the lookup action directly with that ID — skip the merge / fan-out chain. - The target CRM isn't yet supported via Floqer actions (Pipedrive, Zoho, Close, Attio). ================================================================================ 2. INPUTS AND PRE-FLIGHT CLARIFICATIONS ================================================================================ Before designing, confirm with the user: - Which CRM(s)? Salesforce, HubSpot, or both. - Which object? HubSpot: `contacts` or `companies`. Salesforce: any standard or custom object (`Contact`, `Account`, `Lead`, `Opportunity`, custom `__c` objects). - Selection criteria? List of property+value pairs. Note the per-CRM matching semantics (see below). - Row cap? HubSpot's `limit` field caps per-criterion results (default 10). Salesforce returns up to ~200 per query, capped server-side. - Destination shape? Normalised across CRMs into a unified column set (recommended) OR per-CRM raw passthrough (one sub-sheet per CRM). - Cost budget? Every action in this chain is 0 credits. Free path end-to-end. Connection prerequisites — must be set up before this flow will run: - HubSpot: app.floqer.com → Settings → Connections → HubSpot OAuth. - Salesforce: same, Salesforce OAuth. The connection must live on the API key's user account; if it lives under a different team member, Get Action Field Options returns 424. Per-CRM constraints worth surfacing to the user up front: - HubSpot lookup is EXACT-MATCH per criterion. There is no contains operator. For a broad pull, filter on a universal property like `lifecyclestage` (every contact has one) rather than guessing partial email / name values. - HubSpot lookup only exposes `contacts` and `companies` via the API. `deals` and `leads` are not exposed at the moment. - Salesforce lookup supports `exact_match: false` for contains-search. Useful for broad pulls (e.g. `Email contains "@"`). - Salesforce's `records_found` columns are NOT pre-registered as typed refs. Don't try to access them via the 3-segment `{{.records_found.}}` shape in a JS expression — those refs substitute as empty strings. Use the responseId pattern (§5.4 below). ================================================================================ 3. OUTPUTS ================================================================================ A Floqer sub-sheet on the same workflow with N rows — one per extracted CRM record. Default normalised schema (recommended when combining sources): - source string `hubspot` | `salesforce` | `pipedrive` - id string the source CRM's record ID - first_name string - last_name string - email string - phone string - title string - company string HubSpot: `associatedcompanyid` (linked company record ID; falls back to free-text `company`) | Salesforce: `AccountId` Per-CRM raw outputs (the full structured_array with every property the CRM returned) remain available in the lookup action's cell on the main sheet if a downstream action needs them. ================================================================================ 4. WORKFLOW DESIGN ================================================================================ Preferred pattern: SLIM BEFORE LLM ----------------------------------- Whenever an LLM action follows a CRM lookup, the JS formatter step is not optional — it is the load-bearing piece that makes the flow work. A raw HubSpot contact carries ~450 properties; a raw Salesforce Contact / Account carries 100+ standard fields plus every custom field. Feeding that into an LLM blows context budget, drops signal-to-noise, and produces flat or hallucinated comparisons. Default to slim early, slim hard. Project to ONLY the fields the LLM actually needs for the task at hand: - Compare / score / cluster ACCOUNTS → name, domain (Website), industry, employee count, annual revenue, location (City / State / Country). Drop everything else. - Compare / score / cluster CONTACTS → first_name, last_name, email, title. Optionally phone, seniority, linked company ID. - Write per-record briefs → the slim shape above plus 1–2 free-text fields (description, recent notes). Never raw. The recipes in §5.4 are written this way on purpose — they project to ~8 fields out of ~450. Even when no LLM follows, the slim shape is what downstream Floqer actions, push_data_to_sheet mappings, and sub-sheet columns are designed around. Going wider is the exception, not the default. Chain (linear): hubspot_lookup_object → salesforce_lookup_record → format_data_using_js_expression (slim + normalise + merge) → raw_to_structured_array (discover columns) → push_data_to_sheet (fan out to sub-sheet) → [optional] llm_models (per-row analysis on slim shape) Each lookup outputs a `structured_array` of records (HubSpot's `records`, Salesforce's `records_found`). The JS formatter normalises both into a single slim JSON array of objects. `raw_to_structured_array` discovers the column schema so downstream actions can reference each column. `push_data_to_sheet` fans the list out into a Floqer sub-sheet, one row per record, ready for per-row LLM analysis or any other follow-on action. Workflow inputs: one trigger field (any string) is enough — the filter values are baked into action configs, not per-row. If the user wants filter values parameterised per run, add them as inputs and reference them in the lookup configs via `{{input.}}`. For Pipedrive when added: insert a `pipedrive_lookup_*` node in the chain and extend the JS merge to consume its structured_array. The responseId helper in §5.4 already handles any structured_array input — no refactor needed. ================================================================================ 5. IMPLEMENTATION ================================================================================ 5.1 — Add the 5 actions sequentially with `after` so the chain wires up. POST .../actions/add Body: {"action_id":"hubspot_lookup_object"} POST .../actions/add Body: {"action_id":"salesforce_lookup_record","after":""} POST .../actions/add Body: {"action_id":"format_data_using_js_expression","after":""} POST .../actions/add Body: {"action_id":"raw_to_structured_array","after":""} POST .../actions/add Body: {"action_id":"push_data_to_sheet","after":""} 5.2 — Configure HubSpot lookup. PATCH .../actions/ Body: { "inputs": { "hubspot_object": "contacts", "operator": "or", "limit": "5", "add_properties_and_their_values": [ { "name": "lifecyclestage", "value": "lead" } ] } } Field notes: - hubspot_object: `contacts` or `companies` only. - operator: `or` (any criterion matches) or `and` (all match) — combines multiple `add_properties_and_their_values` entries. - Each `{name, value}` is EXACT match. No contains operator. - limit: default 10. Sent as a stringified number per the action contract. 5.3 — Configure Salesforce lookup. PATCH .../actions/ Body: { "inputs": { "salesforce_object": "Contact", "add_fields_and_their_values": [ { "name": "Email", "value": "@" } ], "search_operator": "OR", "exact_match": false } } Field notes: - exact_match: false → CONTAINS search. `Email contains "@"` is a near-universal contact pull on standard orgs. - search_operator: OR (any criterion matches) or AND. - One match per criterion when exact_match=true; multiple per criterion when exact_match=false. Capped at ~200 records server-side. 5.4 — Configure the JS formatter (merge + normalise). TRAP: 3-segment refs `{{.records.}}` and `{{.records_found.}}` DO NOT work inside JS expressions — they substitute as empty strings at runtime. Those refs are only valid in `push_data_to_sheet`'s `choose_columns_to_send_to` and other column-mapping contexts. In a JS expression, the 2-segment ref to the structured_array injects as a "responseId pattern" object — keyed by row index, each value an array of field triples. Canonical structured_array shape in JS: { "0": [{ responseId, responseStructureId, value }, …], // row 0 "1": [{ responseId, responseStructureId, value }, …], // row 1 … } Working recipe (drop in `data_formatter`): (() => { const rowsFromStructured = (obj) => { if (!obj || typeof obj !== "object") return []; return Object.keys(obj) .filter(k => /^\d+$/.test(k)) .sort((a, b) => +a - +b) .map(k => { const rec = {}; (obj[k] || []).forEach(f => { const key = (f.responseStructureId || f.responseId || "").toLowerCase(); if (key) rec[key] = f.value; }); return rec; }); }; const hs = rowsFromStructured({{.records}}); const sf = rowsFromStructured({{.records_found}}); const norm = [ ...hs.map(c => ({ source: "hubspot", id: c.hs_object_id || "", first_name: c.firstname || "", last_name: c.lastname || "", email: c.email || "", phone: c.phone || "", title: c.jobtitle || "", // Prefer the linked company record's ID (matches Salesforce's // accountid). The free-text `company` field on a contact is // often empty — it's only populated when the user manually // typed it, not from the contact↔company association. company: c.associatedcompanyid || c.company || "" })), ...sf.map(c => ({ source: "salesforce", id: c.id || "", first_name: c.firstname || "", last_name: c.lastname || "", email: c.email || "", phone: c.phone || c.mobilephone || "", title: c.title || "", company: c.accountid || "" })) ]; return norm; })() ACCOUNTS variant (use when extracting companies/Accounts instead of contacts — e.g. for an LLM "compare these accounts" task). Same helper, different object types upstream, different field projection: name, domain, industry, employee count, revenue, location. (() => { const rowsFromStructured = (obj) => { if (!obj || typeof obj !== "object") return []; return Object.keys(obj) .filter(k => /^\d+$/.test(k)) .sort((a, b) => +a - +b) .map(k => { const rec = {}; (obj[k] || []).forEach(f => { const key = (f.responseStructureId || f.responseId || "").toLowerCase(); if (key) rec[key] = f.value; }); return rec; }); }; const hs = rowsFromStructured({{.records}}); const sf = rowsFromStructured({{.records_found}}); const norm = [ ...hs.map(c => ({ source: "hubspot", id: c.hs_object_id || "", name: c.name || "", domain: c.domain || c.website || "", industry: c.industry || "", employees: c.numberofemployees || "", revenue: c.annualrevenue || "", city: c.city || "", state: c.state || "", country: c.country || "" })), ...sf.map(c => ({ source: "salesforce", id: c.id || "", name: c.name || "", domain: c.website || "", industry: c.industry || "", employees: c.numberofemployees || "", revenue: c.annualrevenue || "", city: c.billingcity || c.shippingcity || "", state: c.billingstate || c.shippingstate || "", country: c.billingcountry || c.shippingcountry || "" })) ]; return norm; })() Note the projection size: ~10 fields out of HubSpot's ~150 company properties and Salesforce's ~70 standard Account fields. That ratio is the point — see §4 "Slim before LLM". Helper notes: - `responseStructureId` is the canonical column key on HubSpot's `records` and Salesforce's `records_found` — both surface semantic names there (`firstname`, `email`). - Always `.toLowerCase()` the key. HubSpot returns lowercase keys (`firstname`); Salesforce returns PascalCase (`Id`, `IsDeleted`). Normalise so the downstream `.map` works on either source. - The `Array.isArray` / `typeof !== "object"` guards degrade gracefully when one CRM returns no rows. Apollo divergence (and the general rule): The same responseId-pattern SHAPE applies to most provider-backed structured_array outputs, but a few — notably Apollo's `list_of_employees` from `get_employees_by_company_using_apollo` — surface OPAQUE UUIDs as both `responseId` and `responseStructureId`, not semantic names. The helper above will silently return rows keyed by UUID, and `e.first_name` will always be undefined. For Apollo specifically, switch to POSITION-BASED PARSING — read the inner array by INDEX into the column order published in action-catalog.txt under the action's `Produces` block: // Apollo's live output has 21 columns. The 3 email-related // columns (positions 16, 17, 20) are intentionally NOT exposed // — pair with person_work_email_waterfall downstream for email // enrichment. The nulls reserve those slots so the rest of the // mapping stays aligned with Apollo's actual position order. const APOLLO_COLS = [ "first_name","last_name","person_linkedin_url", "person_title","person_headline","person_twitter_url", "person_facebook_url","person_github_url","employement_history", "departments","seniority","functions","phone_numbers","city", "state","country", null, null, // email_status, person_email "company_name","company_domain", null // email_confidence ]; const rowsByPosition = (obj, cols) => { if (!obj || typeof obj !== "object") return []; return Object.keys(obj) .filter(k => /^\d+$/.test(k)) .sort((a,b) => +a - +b) .map(k => { const rec = {}; (obj[k] || []).forEach((f, i) => { if (i < cols.length && cols[i]) rec[cols[i]] = f.value; }); return rec; }); }; const employees = rowsByPosition( {{.list_of_employees}}, APOLLO_COLS ); General rule: try `responseStructureId`-keyed access first. If the resulting rows have UUID keys instead of semantic ones (probe with `Object.keys(rows[0])` in a one-off formatter), fall back to position-based parsing per the catalog's Produces order. The action's `GET .../outputs` endpoint surfaces semantic column NAMES but not the UUIDs the live injection uses, so there's no clean runtime map. How to discover field names (the bit that's easy to get wrong): 1. SCHEMA — start from the action-detail file: - https://floqer.com/docs/action-detail/hubspot_lookup_object.txt §1 INPUTS lists ~450 standard HubSpot properties per object with their type. This is the authoritative naming source for standard HubSpot fields. - https://floqer.com/docs/action-detail/salesforce_lookup_record.txt §1 INPUTS lists Salesforce's per-object standard fields. Custom fields (`*_c`) are surfaced at runtime, not in the doc. 2. DYNAMIC OPTIONS — for custom HubSpot properties or custom Salesforce fields the agent's org has defined: POST .../actions/{aid}/options/add_properties_and_their_values Body: { "context": { "hubspot_object": "contacts" } } Returns `{value, label, extras: {name, type, ...}}` for every property on the chosen object — including the org's custom ones. Use the same endpoint pattern for Salesforce on `add_fields_and_their_values` with `salesforce_object` context. 3. GROUND TRUTH — run a row, then read the action's cell output directly: POST .../sheets/{sid}/rows/list Body: {"row_ids":[""]} Inspect `cells..outputs.records[0]` (HubSpot) or `cells..outputs.records_found[0]` (Salesforce). The keys present on a real record are the ones to reference. This is the definitive answer when the schema and the actual data disagree — for example: HubSpot contacts expose BOTH a free-text `company` field AND `associatedcompanyid` (the linked Company record's ID); both are in the schema, but `company` is almost always empty in practice while `associatedcompanyid` is populated whenever the contact↔company association exists. Repeatability: yes for standard objects (schema is stable in the action-detail file and the responseId shape is universal). The one place it bites: assuming a field exists because the schema lists it. Some fields are only populated under specific data-entry paths — verify with step 3 on a sample row before wiring downstream consumers. 5.5 — Configure raw_to_structured_array. PATCH .../actions/ Body: { "inputs": { "array": "{{.formatted_data}}" } } Schema discovery is a TWO-RUN dance — see https://floqer.com/docs/action-detail/raw_to_structured_array.txt §3: 1. Run the row once → R2S parses the input array and stores a sample for column inference. 2. GET .../actions//outputs → discover the column schema. Each column carries a 3-segment `structured_array_reference` like `{{.list.}}`. 3. Run the row a SECOND time → R2S populates its typed `list` output. The first run leaves the cell empty; without the second run, `push_data_to_sheet` errors with "The selected list does not have any data". 5.6 — Create the destination sub-sheet. POST .../workflows//sheets Body: {"name": "Merged Contacts"} Then add input columns matching the R2S schema: POST .../sheets//inputs Body: [ {"name":"source","type":"string"}, {"name":"id","type":"string"}, {"name":"first_name","type":"string"}, {"name":"last_name","type":"string"}, {"name":"email","type":"string"}, {"name":"phone","type":"string"}, {"name":"title","type":"string"}, {"name":"company","type":"string"} ] 5.7 — Configure push_data_to_sheet. PATCH .../actions/ (on the MAIN sheet, not the sub-sheet) Body: { "inputs": { "choose_or_create_a_new_sheet": "", "select_a_list": "{{.list}}", "choose_columns_to_send_to": [ {"name":"source", "value":"{{.list.source}}"}, {"name":"id", "value":"{{.list.id}}"}, {"name":"first_name", "value":"{{.list.first_name}}"}, {"name":"last_name", "value":"{{.list.last_name}}"}, {"name":"email", "value":"{{.list.email}}"}, {"name":"phone", "value":"{{.list.phone}}"}, {"name":"title", "value":"{{.list.title}}"}, {"name":"company", "value":"{{.list.company}}"} ] } } This is the context where 3-segment refs DO work — column mapping on push_data_to_sheet. Don't confuse it with the JS expression context in §5.4. 5.8 — Run. Add a single trigger row, then either: Option A — Run the chain head with `run_next_action: true`: POST .../actions//run Body: {"row_ids":[""], "run_next_action": true} The cascade hits HS → SF → JS → R2S → push. Push fails on this first cascade because R2S hasn't populated its list yet (see §5.5). After Get Action Outputs, re-run R2S with cascade to trigger push: POST .../actions//run Body: {"row_ids":[""], "run_next_action": true} Option B — Run each action individually with run_next_action: false: HS → SF → JS → R2S (first pass) → Get Action Outputs → R2S (second pass) → push. Cleaner sequencing, more roundtrips. ================================================================================ 6. BEST PRACTICES ================================================================================ - Slim before LLM. If any LLM action follows the lookup — comparison, scoring, clustering, brief-writing — the JS formatter is mandatory, not optional. Project to the 8–12 fields the task actually needs (see §4). Raw CRM records carry 100–450 properties; passing them straight into a prompt is the single biggest cause of flat or hallucinated output in this pattern. - Test on a SMALL pull first. Set HubSpot `limit` to 5 and pick a narrow Salesforce filter — verify the merged shape in the destination sub-sheet before widening. - Lowercase all keys in the JS normaliser. HubSpot=lowercase, Salesforce=PascalCase — normalising both prevents quiet misses on `c.firstname` when SF gave you `c.FirstName`. - Pick `responseStructureId` over `responseId` for column lookup. On standard CRM lookups they're identical, but `responseId` can be a UUID on AI / scraping outputs while `responseStructureId` stays semantic. - Guard with `Array.isArray` / `typeof !== "object"`. One CRM returning no rows shouldn't fail the merge. - Add `source` as the first column so downstream consumers (filtering, scoring, routing) can branch on CRM origin without re-deriving it. ================================================================================ 7. COMMON VARIATIONS ================================================================================ - Single-CRM extract: drop the unused lookup + simplify the JS merge to one spread. The slimming step still matters whenever an LLM follows. - LLM analysis as the final step: append an `llm_models` action after `push_data_to_sheet`, running per-row on the sub-sheet's slim columns. Common tasks: pairwise compare, score against a rubric, cluster by theme, draft outreach. Because the JS step already slimmed to ~8 fields, the prompt stays compact and the model gets clean signal. - Companies / Accounts instead of Contacts: swap object types (HubSpot `companies`, Salesforce `Account`) and use the ACCOUNTS variant of the JS normaliser in §5.4 (name, domain, industry, employee count, revenue, location). - Per-source raw passthrough: skip the JS merge; push two separate sub-sheets (one per CRM), each with that CRM's native column shape. Useful when downstream consumers need every property HubSpot/Salesforce returned, not the normalised subset. - Recurring snapshot: schedule the workflow via CronCreate to refresh the sub-sheet daily / weekly. Pair with `auto_dedupe_rows` on the destination sub-sheet (keyed on `source:id`) so the same record isn't appended twice across runs. - Filter-as-input: declare the CRM filter values as workflow inputs and reference via `{{input.}}` in the lookup configs. Lets the user re-run with different criteria without re-PATCHing the chain. ================================================================================ 8. FAILURE MODES AND MITIGATIONS ================================================================================ - HubSpot lookup returns "No data found" → the exact-match filter didn't hit. Broaden by switching to a universal property (`lifecyclestage`, `createdate`) or confirm the user knows a specific value. - JS "X.map is not a function" / "not iterable" → the formula treated a structured_array as a plain array. Either you used a 2-segment ref unquoted expecting a JS array, or a 3-segment ref expecting parallel arrays. Both wrong — use the responseId helper in §5.4. - "Reference does not match any known variable on this sheet" warning on Salesforce 3-segment refs → those columns aren't pre-registered. Use the 2-segment ref + responseId helper. - push_data_to_sheet errors with "The selected list does not have any data" → R2S didn't get its second pass. Re-run R2S after Get Action Outputs, then push. - Salesforce lookup returns 424 on field-options resolve → the API key user doesn't own the Salesforce connection. Re-provision the key from the connection-owning user's account. - Sub-sheet missing columns the JS produces → input columns on the destination sheet need to match `choose_columns_to_send_to` names. The server returns `unknown_field` warnings on PATCH if any are missing. ================================================================================ 9. RELATED USE CASES ================================================================================ - CRM contact change detection — when the goal is detecting moves / title changes, not a snapshot https://floqer.com/docs/use-case-detail/crm_contact_change_detection.txt - Cross-table lookup via HTTP API Call — when filtering an existing sub-sheet by an action-output column value https://floqer.com/docs/use-case-detail/cross_floqer_table_lookup.txt ================================================================================ This file is maintained manually. Last updated: 2026-05-25. Full interactive reference: https://floqer.com/docs/reference Use-case catalog: https://floqer.com/docs/use-case-catalog.txt