<aside> 👉 This is a grab-bag of SPARQL and Neo4j queries that I’ve found useful while working with collections data, Wikipedia, and Google Cloud Named Entity Recognition, and which may be useful to others.
</aside>
Basic CSV loading into Neo4j
LOAD CSV FROM 'file:///Saltaire-selection-130322.csv' AS row
RETURN row
LIMIT 5;
Create items from CSV
LOAD CSV FROM 'file:///Saltaire-selection-130322.csv' AS row
WITH
row[0] as itemId,
row[10] as description,
row[14] as keywords,
row[16] as notes,
row[18] as title
MERGE (c:CollectionItem {id: itemId})
SET
c.itemId = itemId,
c.description = description,
c.keywords = keywords,
c.notes = notes,
c.title = title,
c.collection = 'Saltaire'
RETURN count(c);
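If the CSV has a header row, the same import can be written with named columns instead of positional indexes. This is only a sketch: the column names (itemId, title, description) are hypothetical and would need to match the actual headers in the file.

```cypher
// Assumes the file has a header row; column names below are hypothetical.
LOAD CSV WITH HEADERS FROM 'file:///Saltaire-selection-130322.csv' AS row
// Skip rows without an id, since MERGE fails on a null property value.
WITH row WHERE row.itemId IS NOT NULL
MERGE (c:CollectionItem {id: row.itemId})
SET
c.itemId = row.itemId,
c.title = row.title,
c.description = row.description,
c.collection = 'Saltaire'
RETURN count(c);
```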
Delete all items in a collection (and their relationships)
MATCH (c:CollectionItem {collection: 'Saltaire'})
DETACH DELETE c;
Dry-run of Google Cloud NER, on a single record
MATCH (c:CollectionItem {itemId:"154580237"})
CALL apoc.nlp.gcp.entities.stream(c, {
key: apoc.static.getAll("gcp").apiKey,
nodeProperty: "description"
})
YIELD value
UNWIND value.entities AS entity
RETURN entity;
Run all collection items through Google Cloud NER and store the relationships
MATCH (c:CollectionItem {collection: "Lister"})
WHERE c.description IS NOT NULL
WITH collect(c) AS items
CALL apoc.nlp.gcp.entities.graph(items, {
key: apoc.static.getAll("gcp").apiKey,
nodeProperty: "description",
writeRelationshipType: "ENTITY",
writeRelationshipProperty: "gcpEntityScore",
write: true,
scoreCutoff: 0.01
})
YIELD graph AS g
RETURN g;
Return the graph connecting items and entities with an NER score above 0.5
MATCH (c:CollectionItem)-[e:ENTITY]->(en:Entity)
WHERE e.gcpEntityScore > 0.5
RETURN c, en;
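A variation on the query above that may be handy: rather than returning the whole graph, count how many items each entity is attached to, which gives a rough view of the most common entities. It returns the Entity node itself, so it makes no assumption about which properties the NER procedure wrote onto it.

```cypher
// Rank entities by the number of distinct items linked to them above the score threshold.
MATCH (c:CollectionItem)-[e:ENTITY]->(en:Entity)
WHERE e.gcpEntityScore > 0.5
RETURN en, count(DISTINCT c) AS itemCount
ORDER BY itemCount DESC
LIMIT 20;
```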
Create a full-text search index on nodes & properties
CREATE FULLTEXT INDEX itemDescriptions FOR (c:CollectionItem) ON EACH [c.description];
NB: can also use FOR (c:CollectionItem|OtherThing) to index more than one node label
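As a sketch of the multi-label form mentioned in the note, the index below covers two labels and two properties. The index name itemText is made up for illustration, and OtherThing is just the placeholder label from the note.

```cypher
// itemText is a hypothetical index name; OtherThing is a placeholder label.
CREATE FULLTEXT INDEX itemText
FOR (n:CollectionItem|OtherThing)
ON EACH [n.title, n.description];
```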
Use the search index
CALL db.index.fulltext.queryNodes("itemDescriptions", "salt") YIELD node, score
WHERE score > 0.2
RETURN node.title, node.description, score;
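The query string is passed to Lucene, so it can use Lucene query syntax such as boolean operators. A small sketch, where the second search term ("mill") is just an example and not taken from the data:

```cypher
// Require both terms to appear in the indexed description.
CALL db.index.fulltext.queryNodes("itemDescriptions", "salt AND mill") YIELD node, score
RETURN node.title, score
ORDER BY score DESC
LIMIT 10;
```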