<aside> 👉 This is a grab-bag of SPARQL and Neo4j queries which I’ve found useful while working with collections data, Wikipedia, and Google Cloud Named Entity Recognition, and which may be useful to others.

</aside>

Basic CSV loading into Neo4j

LOAD CSV FROM 'file:///Saltaire-selection-130322.csv' AS row
RETURN row
LIMIT 5;

Create items from CSV

LOAD CSV FROM 'file:///Saltaire-selection-130322.csv' AS row
WITH
  row[0] as itemId,
  row[10] as description,
  row[14] as keywords,
  row[16] as notes,
  row[18] as title
MERGE (c:CollectionItem {id: itemId})
SET
  c.itemId = itemId,
  c.description = description,
  c.keywords = keywords,
  c.notes = notes,
  c.title = title,
  c.collection = 'Saltaire'
RETURN count(c);

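Delete all items in a collection (DETACH also removes their relationships, which a plain DELETE would refuse to do)

MATCH (c:CollectionItem {collection: 'Saltaire'})
DETACH DELETE c;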

Dry-run of Google Cloud NER, on a single record

MATCH (c:CollectionItem {itemId:"154580237"})
CALL apoc.nlp.gcp.entities.stream(c, {
  key: apoc.static.getAll("gcp").apiKey,
  nodeProperty: "description"
})
YIELD value
UNWIND value.entities AS entity
RETURN entity;
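
The raw entity maps can be hard to scan. Here is a minimal sketch that flattens them into columns, assuming each entity follows the Google Cloud Natural Language response shape (name, type, salience). Ordering by salience is only an illustrative choice.

```cypher
MATCH (c:CollectionItem {itemId: "154580237"})
CALL apoc.nlp.gcp.entities.stream(c, {
  key: apoc.static.getAll("gcp").apiKey,
  nodeProperty: "description"
})
YIELD value
UNWIND value.entities AS entity
// name, type and salience are assumed from the Google Cloud NL entity format
RETURN entity.name AS name, entity.type AS type, entity.salience AS salience
ORDER BY salience DESC;
```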

Run all collection items through Google Cloud NER and store the relationships

MATCH (c:CollectionItem {collection: "Lister"})
WHERE c.description IS NOT NULL
WITH collect(c) AS items
CALL apoc.nlp.gcp.entities.graph(items, {
  key: apoc.static.getAll("gcp").apiKey,
  nodeProperty: "description",
  writeRelationshipType: "ENTITY",
  writeRelationshipProperty: "gcpEntityScore",
  write: true,
  scoreCutoff: 0.01
})
YIELD graph AS g
RETURN g;

Return the graph connecting items and entities with an NER score above 0.5

MATCH (c:CollectionItem)-[e:ENTITY]->(en:Entity)
WHERE e.gcpEntityScore > 0.5
RETURN c, en;
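
To see which entities recur across items, something like the following may help. It assumes the Entity nodes written by APOC carry text and type properties, which is worth checking against your own graph first. The LIMIT of 20 is arbitrary.

```cypher
MATCH (c:CollectionItem)-[e:ENTITY]->(en:Entity)
WHERE e.gcpEntityScore > 0.5
// en.text and en.type are assumed property names on the Entity nodes
RETURN en.text AS entity, en.type AS type, count(DISTINCT c) AS items
ORDER BY items DESC
LIMIT 20;
```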

Create a full-text search index on nodes & properties

CREATE FULLTEXT INDEX itemDescriptions FOR (c:CollectionItem) ON EACH [c.description]

NB: you can also use FOR (c:CollectionItem|OtherThing) to index more than one node label
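
A sketch of the multi-label form. OtherThing is a placeholder label, the index name itemsAndThings is made up for illustration, and indexing title alongside description is an assumption about what you would want searchable.

```cypher
// OtherThing stands in for whatever second label you want indexed
CREATE FULLTEXT INDEX itemsAndThings
FOR (n:CollectionItem|OtherThing)
ON EACH [n.title, n.description];
```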

Use the search index

CALL db.index.fulltext.queryNodes("itemDescriptions", "salt") YIELD node, score
WHERE score > 0.2
RETURN node.title, node.description, score;