Tagging

Tagging is a key feature of the Voluntarily Platform.

All the major data entities can be tagged: Organisations, People, Activities and Opportunities.

Tags are used for:

  • Grouping items into categories: physics, robots, year12

  • Listing skills required and provided: coding, spanish

  • Search terms and faceted searches: coding (12), Robots (2)

  • Extra Metadata fields for describing entities: easy, smallgroup

Tags generate a customer-driven taxonomy (a folksonomy). To steer people towards consistency we will prefill the list of available tags with common terms, e.g. science, technology, engineering, mathematics.

Tagging UI - Design Pattern

http://ui-patterns.com/patterns/Tag

Problem summary

Items need to be labelled, categorized, and organized using keywords that describe them.

Example

When adding tags to a video at Vimeo, tags are separated by commas, and upon submission each tag is added to a horizontal line below the input field. Each tag can easily be removed separately.

Usage

  • Use when the content on your website is possibly mapped into multiple categories and does not necessarily only fit into one hierarchical category.

  • Use when you want users to contribute data to your website and let them organize their contributed data themselves.

Solution

Let users associate multiple topics with a piece of content. Allow users to add appropriate keywords to categorize their own content in a non-hierarchical way. Let users use hashtags to integrate tagging into the content itself.

Allow keywords to be associated with items on a website/application such as blog articles, ecommerce products and media. Use terms that categorically describe these items. Permit these items to be found in a search using these keywords. Let contributors of information add keywords to the content they submit. Keywords can be displayed as links that aid in finding items with matching keywords.

Rationale

Tagging helps make it easier for users to find their own content and for their peers to discover content related to their interests.

Tags are relevant keywords associated with or assigned to a piece of information. Tags are often used on social websites, where users can upload their own content. Here, tags are used to let users organize and categorize their own data in the public sphere. In this way, tags can be seen as a bottom-up categorization of data rather than a top-down categorization of data, where the creators of the site define the hierarchy data is submitted to.

Requirements

  • Avoid duplication of tags - two uses of the same word should reference the same tag.

  • The tag list may contain thousands of elements, so avoid downloading the full list to the client where possible.

  • Tag entry should autocomplete, e.g. offering physics and physical exercise, as this drives people to use existing tags rather than enter new ones.

  • Tag searches are case insensitive. Tags can be short sentences and may include spaces.

  • For any collection we can quickly create a frequency counted set of the associated tags.

  • One or more tags can be used to filter search results (faceted search). Filters operate independently, and users can add or remove them in any order, with the results updating as they do so. See https://alistapart.com/article/design-patterns-faceted-navigation/

  • Tags are simple strings and do not carry entity specific metadata - e.g. scores, endorsements etc. Use badges for that.
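
The requirements above can be sketched with a few small functions. This is a minimal in-memory illustration only, not the platform's implementation: the function names are hypothetical, and in practice the autocomplete lookup would run behind an API so that the full tag list is not sent to the client.

```javascript
// Hypothetical helpers; names are illustrative, not the platform's API.

// Normalise a tag so that two uses of the same word map to the same key.
function normaliseTag (tag) {
  return tag.trim().replace(/\s+/g, ' ').toLowerCase()
}

// Add a tag to a list unless an equivalent tag is already present.
function addTag (tagList, tag) {
  const key = normaliseTag(tag)
  if (key === '') return tagList
  return tagList.some(t => normaliseTag(t) === key) ? tagList : [...tagList, key]
}

// Return up to `limit` known tags that start with the typed prefix.
function autocomplete (tagList, prefix, limit = 10) {
  const key = normaliseTag(prefix)
  return tagList.filter(t => normaliseTag(t).startsWith(key)).slice(0, limit)
}

// Count how often each tag occurs across a collection of items.
function tagCounts (items) {
  const counts = new Map()
  for (const item of items) {
    for (const tag of item.tags) {
      const key = normaliseTag(tag)
      counts.set(key, (counts.get(key) || 0) + 1)
    }
  }
  return counts
}

// Faceted filter: keep only the items that carry every selected tag.
function filterByTags (items, selected) {
  const wanted = selected.map(normaliseTag)
  return items.filter(item => {
    const have = new Set(item.tags.map(normaliseTag))
    return wanted.every(t => have.has(t))
  })
}
```

For example, autocomplete(['physics', 'physical exercise', 'chemistry'], 'Phys') returns the two tags beginning with "phys". Because filterByTags treats each selected tag as an independent condition, filters can be added or removed in any order.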

Implementation Options

https://docs.mongodb.com/manual/tutorial/model-data-for-keyword-search/

 

String Array

Each entity has a String Array tags field, e.g. tags: ['robot', 'physics', 'year12']

Schema: { tags: [String] }
Data: { tags: ['robot', 'physics', 'year12'] }
Example (from the MongoDB tutorial linked above, which names the array field topics): { title: "Moby-Dick", author: "Herman Melville", published: 1851, ISBN: "0451526996", topics: [ "whaling", "allegory", "revenge", "American", "novel", "nautical", "voyage", "Cape Cod" ] }

Pro:

  • Managing data is very simple - just CRUD the string array.

  • No reference required to another collection.

Con:

  • Strings duplicated many times across each entity.

  • Search uses text comparisons

  • Autocomplete or pick lists not available

  • Keyword lookup does not support stemming or synonyms.
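
As an illustration of "just CRUD the string array", the sketch below builds MongoDB filter and update documents for a tags: [String] field. The helper names are hypothetical, and it assumes tags are stored lower-cased so that plain string comparison behaves case insensitively. The operators used ($all, $in, $addToSet with $each, $pull) are standard MongoDB operators.

```javascript
// Hypothetical helpers that build MongoDB filter and update documents
// for an entity with a `tags: [String]` field.
// Assumption: tags are stored trimmed and lower-cased.
const lower = tags => tags.map(t => t.trim().toLowerCase())

// Filter: items carrying every one of the given tags.
const hasAllTags = tags => ({ tags: { $all: lower(tags) } })

// Filter: items carrying at least one of the given tags.
const hasAnyTag = tags => ({ tags: { $in: lower(tags) } })

// Update: add tags without creating duplicates inside the array.
const addTags = tags => ({ $addToSet: { tags: { $each: lower(tags) } } })

// Update: remove the given tags from the array.
const removeTags = tags => ({ $pull: { tags: { $in: lower(tags) } } })
```

With a driver collection these objects would be passed to calls such as find(hasAllTags(['physics', 'year12'])) or updateOne(filter, addTags(['robot'])). Each update touches a single document, so it stays atomic.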

 

 

Tag Object Array

There is a collection of tags each carrying a name string and id. Entities maintain an array of TagIDs.

 

Current Implementation as of Oct 2019

Schema
Tag { _id: ObjectID, name: String }
Item { tags: [Tag] }

Data
tagCollection: [ { _id: 1, name: 'physics' }, { _id: 2, name: 'physical education' }, { _id: 3, name: 'chemistry' }, { _id: 4, name: 'biology' } ... ]
item: { tags: [2, 3, 4] }

Pro:

  • Space saving, each string occurs only once, each tag array only holds ids

  • An index of tag strings can be created making searching for tag id faster.

  • Easy to generate an autocomplete list based on prefix or partial match.

  • Tag collection has own API so easy to add and initialise tags, create tag cloud etc.

  • Tag object can be expanded to include other category data, or a parent object creating a taxonomic hierarchy or full ontology.

Con:

  • Populating the tag array with the equivalent strings requires many database calls and is slow and expensive on memory - not MongoDB friendly.

  • Search on tags requires finding the string in the tags collection and then finding the ID in entities.

  • Removing a tag from entity will not remove it from tag collection as there may be other uses, but we don’t easily see when there are zero uses.

  • In MongoDB, a write operation on a single document is atomic. For fields that must be updated together, embedding the fields within the same document ensures that the fields can be updated atomically. The problem here is that we first have to update the tags collection and then place the results into the item document. https://docs.mongodb.com/manual/tutorial/model-data-for-atomic-operations/
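
The extra lookups this option needs can be seen in a small in-memory sketch. The arrays stand in for the tag and item collections, and the item titles and function names are hypothetical. It only illustrates the shape of the work, not the actual implementation.

```javascript
// In-memory stand-ins for the tag collection and an item collection.
const tagCollection = [
  { _id: 1, name: 'physics' },
  { _id: 2, name: 'physical education' },
  { _id: 3, name: 'chemistry' },
  { _id: 4, name: 'biology' }
]
const items = [
  { title: 'item A', tags: [2, 3, 4] }, // hypothetical items
  { title: 'item B', tags: [1, 3] }
]

// "Populate": replace each tag id with its name, which needs the tag collection.
function populateTags (item, tags) {
  const byId = new Map(tags.map(t => [t._id, t.name]))
  return { ...item, tags: item.tags.map(id => byId.get(id)) }
}

// Search: step 1 finds the tag id for the string,
// step 2 finds the items holding that id.
function findItemsByTagName (name, tags, allItems) {
  const tag = tags.find(t => t.name === name.trim().toLowerCase())
  if (!tag) return []
  return allItems.filter(item => item.tags.includes(tag._id))
}

// Counting uses means scanning the items; zero uses marks a tag
// that could be removed from the tag collection.
function countUses (tagId, allItems) {
  return allItems.filter(item => item.tags.includes(tagId)).length
}
```

Every read of an item needs populateTags before the tags can be displayed, and every search goes through two collections, which is the cost the list above describes.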

     

Tag Document Collection

Placing each tag as a separate document in a tag collection touches on the MongoDB guidance for a "Collection Contains Large Number of Small Documents" (https://docs.mongodb.com/manual/core/data-model-operations/). If the total number of tags is not large, it may make more sense to hold the whole tag list in a single document.

e.g

Pro:

  • If the tag list is not too large it can be retrieved in a single API call and held in memory on the client to drive select or autocomplete lists. However, this is not necessary: we can also provide an autocomplete API that returns partial matches to keywords, along with their categories.

  • Grouping tags into collections allows some categorisation - e.g. school topics, skills etc. This may help pick lists focus on key terms.

  • Chosen tags are copied as string arrays into the item documents, so items can be searched on multiple terms using keyword search. Document updates stay atomic.

  • Adding tags is just upserting the document. As the list is used for selection, an item with a tag not in the list is not a problem.

  • Providing the tag list via Redux also allows it to be added to the web page on the server side.

Con:

  • Each tag string is stored multiple times - once in the tag list, and again for each usage. However tags are short strings.

  • Finding all the docs matching a given tag means searching each collection separately. This is ok though.

  • Counting instances of each tag would need to be done client side after a result set has been provided.

 

Likely size of the tags document: 1,000 to 10,000 words at an average word length of 7 characters gives a largest file of about 70 KB (10,000 x 7 = 70,000 characters). This is smaller than most images in the system.
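
A minimal sketch of this option follows, assuming a hypothetical document shape of { _id, tags } for the tag list. The function names are illustrative, and the size estimate counts characters only, ignoring quotes and separators.

```javascript
// Hypothetical shape: one document holding the whole tag list.
const tagListDoc = { _id: 'tags', tags: ['chemistry', 'physics'] }

// Upsert-style merge: add new tags to the list document without duplicates.
function mergeTags (doc, newTags) {
  const merged = new Set(doc.tags)
  for (const t of newTags) merged.add(t.trim().toLowerCase())
  return { ...doc, tags: [...merged].sort() }
}

// Saving an item copies the chosen strings into the item itself,
// so the item can be searched without consulting the tag list.
function tagItem (item, chosenTags) {
  return { ...item, tags: chosenTags.map(t => t.trim().toLowerCase()) }
}

// Rough size estimate: characters only, ignoring quotes and separators.
const estimateSize = (tagCount, averageLength) => tagCount * averageLength
```

Here estimateSize(10000, 7) gives 70000, matching the 70k figure above. In MongoDB the merge could be a single updateOne with $addToSet and $each plus the upsert option, so adding tags stays one atomic write to one document.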

Tag Object + Tag Usage collections

Tags appear to be intrinsically ‘many to many’: each tag is used many times, and each item has many tags. This implies a join table, which is very much relational database style and may be an anti-pattern for MongoDB.

I found some designs where no separate collection is added; instead each of the two entities holds an [ObjectID] array referencing the other: http://blog.markstarkman.com/blog/2011/09/15/mongodb-many-to-many-relationship-data-modeling/ However, this seems easy to get out of sync and breaks the atomic write pattern.

 

Pro:

  • Space saving, each string occurs only once, each tag array only holds ids

  • An index of tag strings can be created making searching for tag id faster.

  • The HasTag collection can be searched with a list of tag ids, returning a list of matching items.

  • Storing the model (entity type) in each HasTag record allows a range of different entities to share the same tags and to be identified or sorted by type before or after discovery.

  • Easy to generate an autocomplete list based on prefix or partial match.

  • Tag collection has own API so easy to add and initialise tags, create tag cloud etc.

  • Tag object can be expanded to include other category data, or a parent object creating a taxonomic hierarchy or full ontology.

  • HasTag collection has own API so easy to add and remove tags for any entity without touching the main item.

  • Single place to count how many items use a given tag.

  • Tags for an item can be lazy fetched after the item has been fetched.

  • Adding or removing tags for an item can operate separately from editing the item content

  • MongoDB has specific handlers for this type of construction https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/

Con:

  • Search on tags requires finding the strings in the tags collection and then a single search of the HasTag collection.

  • Removing a tag from entity will not remove it from tag collection as there may be other uses, but we can test whether a tag is unused with a single query.

  • Adds another collection and API.
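
The text gives no schema for this option, so the sketch below assumes one: a HasTag record of { tagId, itemId, model }, with hypothetical ids and function names. It uses in-memory arrays in place of collections. In MongoDB the same join could be expressed with the $lookup aggregation stage linked above.

```javascript
// Assumed record shapes; the tag collection is as in the previous option.
const tags = [
  { _id: 1, name: 'physics' },
  { _id: 2, name: 'robots' },
  { _id: 3, name: 'year12' }
]
// One HasTag record per (tag, item) pair; `model` names the entity type.
const hasTag = [
  { tagId: 1, itemId: 'a1', model: 'Activity' },
  { tagId: 2, itemId: 'a1', model: 'Activity' },
  { tagId: 1, itemId: 'o7', model: 'Organisation' }
]

// Resolve tag strings to ids, then search HasTag once.
// An item appears once for each of the given tags it carries.
function findItemRefs (names) {
  const ids = new Set(tags.filter(t => names.includes(t.name)).map(t => t._id))
  return hasTag
    .filter(h => ids.has(h.tagId))
    .map(({ itemId, model }) => ({ itemId, model }))
}

// Single place to count how many items use a tag.
const usageCount = tagId => hasTag.filter(h => h.tagId === tagId).length

// A tag is unused when no HasTag record refers to it.
const isUnused = tagId => usageCount(tagId) === 0

// Lazy fetch: tag names for one item, looked up after the item is loaded.
function tagsForItem (itemId) {
  const ids = hasTag.filter(h => h.itemId === itemId).map(h => h.tagId)
  return tags.filter(t => ids.includes(t._id)).map(t => t.name)
}
```

Adding or removing a tag for an item is then an insert or delete on HasTag alone, without touching the item document.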