syndu | Sept. 10, 2023, 12:28 a.m.
Hello readers,
Today, we're going to explore a fascinating piece of code that lies at the heart of our system, a module we affectionately call Memex. This module is responsible for detecting entities and subjects in a chat message and retrieving relevant messages from the conversation history based on these entities. Let's dive in!
The Memex module is primarily composed of two functions: detect_entities_and_subjects and get_relevant_memex_messages.
The detect_entities_and_subjects function takes a chat message as input and returns a list of entities and subjects detected in the message.
from django.db.models import Q

def detect_entities_and_subjects(chat_message):
    # Keep this message's tokens that are named entities or (proper) nouns.
    tokens = Token.objects.filter(chat_message_id=chat_message.id).filter(
        Q(entity__isnull=False) | Q(pos__in=['NOUN', 'PROPN'])
    )
    entities_and_subjects = [token.token for token in tokens]
    return entities_and_subjects
It does this by querying the Token model, which stores tokenized words from each chat message along with their Part-Of-Speech (POS) tags and Named Entity Recognition (NER) annotations. The function keeps tokens that either have a non-null entity (meaning they were recognized as a named entity) or have a POS tag of 'NOUN' or 'PROPN' (common and proper nouns, which are the most likely candidates for the subject of the message). It returns the text of each matching token.
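To make the filter rule concrete without a database, here is a minimal in-memory sketch of the same condition. The `SimpleToken` dataclass, the `select_entities_and_subjects` helper, and the sample sentence are hypothetical stand-ins for illustration, not the actual Token schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleToken:
    """Hypothetical stand-in for one row of the Token model."""
    token: str
    pos: str
    entity: Optional[str] = None


def select_entities_and_subjects(tokens):
    """Keep tokens that carry an entity label or are tagged NOUN/PROPN."""
    return [
        t.token
        for t in tokens
        if t.entity is not None or t.pos in ("NOUN", "PROPN")
    ]


# "Alice booked a flight to Paris in 2023"
sample = [
    SimpleToken("Alice", "PROPN", "PERSON"),
    SimpleToken("booked", "VERB"),
    SimpleToken("a", "DET"),
    SimpleToken("flight", "NOUN"),
    SimpleToken("to", "ADP"),
    SimpleToken("Paris", "PROPN", "GPE"),
    SimpleToken("in", "ADP"),
    SimpleToken("2023", "NUM", "DATE"),
]

print(select_entities_and_subjects(sample))
# ['Alice', 'flight', 'Paris', '2023']
```

Note that "2023" is kept only because it has an entity label, while "flight" is kept only because of its POS tag. Either condition is enough.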
The get_relevant_memex_messages function uses the entities detected by the previous function to retrieve relevant messages from the conversation history.
def get_relevant_memex_messages(chat_message):
    entities = detect_entities_and_subjects(chat_message)
    message_ids = (
        Token.objects.filter(
            chat_message_id__author=chat_message.author, token__in=entities
        )
        .values_list("chat_message_id", flat=True)
        .distinct()
    )
    ...
    relevant_messages = ChatMessage.objects.filter(id__in=message_ids).order_by(
        "-timestamp"
    )[:depth]
    ...
    return conversation
First, it calls detect_entities_and_subjects to get the list of entities and subjects. Then, it queries the Token model again to collect the distinct IDs of all messages authored by the same user that contain at least one of these tokens.
The function also includes a mechanism to limit the number of messages retrieved based on a 'context depth'. This context depth can be set for each message and defaults to 3 if not specified.
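The excerpt elides how the depth is resolved, so the following is only a guess at the shape of that logic. The `context_depth` attribute name and the `resolve_depth` helper are hypothetical; only the default of 3 comes from the description above.

```python
from types import SimpleNamespace

DEFAULT_CONTEXT_DEPTH = 3  # default value stated in the post


def resolve_depth(chat_message):
    """Return the message's context depth, falling back to the default.

    `context_depth` is an assumed attribute name, not taken from the source.
    """
    depth = getattr(chat_message, "context_depth", None)
    # Compare against None so that an explicit depth of 0 is respected.
    return DEFAULT_CONTEXT_DEPTH if depth is None else depth


print(resolve_depth(SimpleNamespace(context_depth=5)))  # 5
print(resolve_depth(SimpleNamespace()))                 # 3
```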
Finally, it queries the ChatMessage model to retrieve the actual messages corresponding to the IDs, orders them by timestamp (most recent first), and limits the number of messages based on the context depth.
The result is a list of the most recent, relevant messages from the conversation history, which can then be used to provide more context-aware responses.
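The retrieval steps described above can be sketched end to end with plain Python objects. This is an illustration under stated assumptions, not the module's implementation: the `Message` dataclass, the `relevant_messages` helper, and the sample history are all hypothetical, and the real code works on Django querysets instead of lists.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Message:
    """Hypothetical stand-in for a ChatMessage row plus its token texts."""
    id: int
    author: str
    timestamp: datetime
    tokens: tuple


def relevant_messages(history, author, entities, depth=3):
    """Return up to `depth` of the author's messages that share at least
    one token with `entities`, most recent first."""
    wanted = set(entities)
    matches = [
        m for m in history
        if m.author == author and wanted.intersection(m.tokens)
    ]
    matches.sort(key=lambda m: m.timestamp, reverse=True)
    return matches[:depth]


history = [
    Message(1, "ana", datetime(2023, 9, 1), ("Paris", "flight")),
    Message(2, "ben", datetime(2023, 9, 2), ("Paris",)),
    Message(3, "ana", datetime(2023, 9, 3), ("lunch",)),
    Message(4, "ana", datetime(2023, 9, 4), ("flight", "delay")),
    Message(5, "ana", datetime(2023, 9, 5), ("Paris", "hotel")),
    Message(6, "ana", datetime(2023, 9, 6), ("flight",)),
]

result = relevant_messages(history, "ana", ["Paris", "flight"])
print([m.id for m in result])
# [6, 5, 4]
```

Message 2 is skipped because it has a different author, message 3 because it shares no token, and message 1 because the depth of 3 is already filled by more recent matches.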
In conclusion, the Memex module gives the system a simple way to recall earlier parts of a conversation. By identifying key entities and subjects in a chat message and retrieving the author's most recent messages that mention them, it supplies the context the system needs to give more relevant responses.
Stay tuned for more deep dives into our system's inner workings!