Date: Friday, April 05, 2024
Work Undertaken Summary
Risks
TMA02 deadline fast approaching, focus shifting to TMA work
Time Spent
- 1hr TMA02
- 1hr research: understanding transformers
- 1.5hr Pepper Diagram
- 1hr StoryTransformer Dev work (Tagging)
- 1.5hr StoryTransformer Diagram
- 1.25hr infini-attention paper
- 0.5hr update schedule
Questions for Tutor
Next work planned
- [If you are registered in a Computing and IT (Honours degree) specialist route]
Will the solution be within the specialism route of my degree?
I’m not on a degree specialisation, so I assume a statement to that effect in the report would be useful?
Raw Notes
TMA02
- TMA02 assignment: TMA 02 Review of Work in Progress: 2 Project activities | OU online (open.ac.uk)
- Resource map contains useful links for each submission: Resource Map | OU online (open.ac.uk)
TODO continue from here: TMA 02 Review of Work in Progress: 3.1 Preparation and planning | OU online (open.ac.uk)
Pepper Database Diagram First Attempt
---
title: Pepper Database Entity Relationship Diagram
---
erDiagram
USER ||--o{ MESSAGE : sends
USER ||--o{ WORKCARD : raises
WORKCARD |o--o{ MESSAGE : contains
TICKETBUG |o--o{ MESSAGE : contains
TICKETTASK |o--o{ MESSAGE : contains
WORKCARD ||--o{ TICKETTASK : contains
WORKCARD ||--o{ TICKETBUG : contains
ORGANISATION ||--o{ WORKCARD : "raises"
ORGANISATION ||--o{ ORGANISATIONUSER : contains
USER ||--o{ ORGANISATIONUSER : "works for"
ORGANISATION {
Guid Id PK
string name
}
ORGANISATIONUSER {
Guid OrganisationId PK,FK
Guid UserId PK,FK
}
USER {
Guid Id PK
bool IsDisabled
string FirstName_Calc
string LastName_Calc
string Initials_Calc
}
MESSAGE {
Guid Id PK
string RawText
string PlainText
int MessageType "1 = comment, 2 = internal comment"
bool Hidden
int TicketId FK "Links to WorkCard, Task or Bug"
DateTime CreatedUtc
Guid CreatedById FK
DateTime DeletedUtc
Guid DeletedById FK
bool IsDeleted
}
WORKCARD {
int Id PK
string Subject
string Description
int StatusId
WorkCardCategory Category "0 = Support, 1 = Chargeable work"
Guid OrganisationId FK
string Requirements
DateTime CreatedUtc
Guid CreatedById FK
DateTime DeletedUtc
Guid DeletedById FK
bool IsDeleted
}
TICKETTASK {
int Id PK
string Subject
string Description
int WorkCardId FK
DateTime CreatedUtc
Guid CreatedById FK
DateTime DeletedUtc
Guid DeletedById FK
bool IsDeleted
}
TICKETBUG {
int Id PK
string Subject
string Description
int WorkCardId FK
DateTime CreatedUtc
Guid CreatedById FK
DateTime DeletedUtc
Guid DeletedById FK
bool IsDeleted
}
https://mermaid.js.org/syntax/entityRelationshipDiagram.html
Pepper Database Diagram Second Attempt
---
title: Pepper Database Entity Relationship Diagram
---
erDiagram
USER ||--o{ MESSAGE : sends
USER ||--o{ TICKET : raises
TICKET |o--o{ MESSAGE : contains
TICKET ||--o| TICKETBUG : "is a"
TICKET ||--o| WORKCARD : "is a"
TICKET ||--o| TICKETTASK : "is a"
WORKCARD ||--o{ TICKETTASK : contains
WORKCARD ||--o{ TICKETBUG : contains
ORGANISATION ||--o{ TICKET : "raises"
ORGANISATION ||--o{ ORGANISATIONUSER : contains
USER ||--o{ ORGANISATIONUSER : "works for"
ORGANISATION {
Guid Id PK
string name
}
ORGANISATIONUSER {
Guid OrganisationId PK,FK
Guid UserId PK,FK
}
USER {
Guid Id PK
bool IsDisabled
string FirstName_Calc
string LastName_Calc
string Initials_Calc
}
MESSAGE {
Guid Id PK
string RawText
string PlainText
int MessageType "1 = comment, 2 = internal comment"
bool Hidden
int TicketId FK "Links to WorkCard, Task or Bug"
DateTime CreatedUtc
Guid CreatedById FK
DateTime DeletedUtc
Guid DeletedById FK
bool IsDeleted
}
TICKET {
int Id PK
string Subject
string Description
TicketEntityType EntityType "0 = WC, 1=Task, 2=Bug"
int StatusId
DateTime CreatedUtc
Guid CreatedById FK
DateTime DeletedUtc
Guid DeletedById FK
bool IsDeleted
}
WORKCARD {
int Id PK
WorkCardCategory Category "0 = Support, 1 = Chargeable work"
Guid OrganisationId FK
string Requirements
}
TICKETTASK {
int Id PK
int WorkCardId FK
}
TICKETBUG {
int Id PK
int WorkCardId FK
}
The ticketing system used by my employer is an in-house product called Pepper. Pepper handles both support requests and normal feature development. This diagram shows the slice of the system that matters for my project.
Pepper uses inheritance in its modelling. Ticket is an abstract class from which WorkCard, TicketBug and TicketTask derive. All three are stored in the same database table using the table-per-hierarchy pattern (Inheritance - EF Core | Microsoft Learn). This design was chosen so that bugs and tasks can easily be promoted into their own independent work cards.
classDiagram
Ticket <|-- WorkCard
Ticket <|-- TicketTask
Ticket <|-- TicketBug
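To make the table-per-hierarchy idea concrete, here is a minimal sketch using Python's built-in sqlite3 rather than EF Core. The cut-down table layout and the promote_to_work_card helper are simplified assumptions for illustration, not Pepper's real schema or code. One Ticket table holds all three types, with the EntityType column (0 = WC, 1 = Task, 2 = Bug) acting as the discriminator.

```python
import sqlite3

# Discriminator values taken from the TICKET entity in the second diagram.
WORK_CARD, TASK, BUG = 0, 1, 2


def create_schema(conn: sqlite3.Connection) -> None:
    # One table for the whole hierarchy; subtype-only columns are nullable.
    conn.execute(
        """
        CREATE TABLE Ticket (
            Id INTEGER PRIMARY KEY,
            Subject TEXT NOT NULL,
            EntityType INTEGER NOT NULL,
            WorkCardId INTEGER REFERENCES Ticket(Id),  -- tasks and bugs only
            Requirements TEXT                          -- work cards only
        )
        """
    )


def promote_to_work_card(conn: sqlite3.Connection, ticket_id: int) -> None:
    # Hypothetical helper: promotion only updates the discriminator and the
    # parent link, so the row keeps its Id.
    conn.execute(
        "UPDATE Ticket SET EntityType = ?, WorkCardId = NULL "
        "WHERE Id = ? AND EntityType IN (?, ?)",
        (WORK_CARD, ticket_id, TASK, BUG),
    )


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute(
    "INSERT INTO Ticket (Id, Subject, EntityType) VALUES (1, 'Parent card', ?)",
    (WORK_CARD,),
)
conn.execute(
    "INSERT INTO Ticket (Id, Subject, EntityType, WorkCardId) "
    "VALUES (2, 'Crash on save', ?, 1)",
    (BUG,),
)
promote_to_work_card(conn, 2)
promoted = conn.execute(
    "SELECT EntityType, WorkCardId FROM Ticket WHERE Id = 2"
).fetchone()
```

Because the row keeps its Id in this sketch, anything that references the ticket by Id (such as a message's TicketId) would not need to change when a bug or task is promoted.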
Story Transformer
classDiagram
direction RL
class StoryTransformer {
-StoryDbContext context
-StorySerializer serializer
+StoryTransformer(context, serializer)
+RunAsync(outputFolder) Task
-GetBatchAsync(int batchNumber) Task
-EnumerateWithIndex(WorkCard[] batch, int batchNumber)
}
StoryTransformer o-- StoryDbContext
StoryTransformer o-- StorySerializer
class StoryDbContext {
+DbSet~WorkCard~ WorkCards
}
class StorySerializer {
-IStoryFormatter formatter
-IStoryTagger[] taggers
-IStoryPseudoAnonymizer[] anonymizers
+StorySerializer(formatter, taggers, anonymizers)
+SerializeAsync(outputFolder, workCard, cardIndex)
-CreateFileContentAsync(workCard)
-GenerateTagsAsync(workCard, content) string[]
-GeneratePseudoAnonymizedContentAsync(content) string
}
StorySerializer o-- IStoryFormatter
StorySerializer o-- "*" IStoryTagger
StorySerializer o-- "*" IStoryPseudoAnonymizer
class IStoryFormatter {
+Format(WorkCard workCard) string
}
<<interface>> IStoryFormatter
class MarkdownStoryFormatter {
-StringBuilder stringBuilder
-ReverseMarkdown.Converter reverseMarkdown
+MarkdownStoryFormatter()
+Format(WorkCard workCard) string
-AppendField(string fieldName, string value)
-AppendHtmlField(string fieldName, string html)
-GetUserName(User user)
}
IStoryFormatter <|.. MarkdownStoryFormatter : implements
class IStoryTagger {
+AddTagsAsync(ITagCollection tags, WorkCard workCard, string content)
}
<<interface>> IStoryTagger
class KeywordTagger {
+KeywordTagger()
+AddTagsAsync(ITagCollection tags, WorkCard workCard, string content)
}
class FeatureTagger {
+FeatureTagger()
+AddTagsAsync(ITagCollection tags, WorkCard workCard, string content)
}
class TimescaleTagRemover {
+TimescaleTagRemover()
+AddTagsAsync(ITagCollection tags, WorkCard workCard, string content)
}
IStoryTagger <|.. KeywordTagger : implements
IStoryTagger <|.. FeatureTagger : implements
IStoryTagger <|.. TimescaleTagRemover : implements
class IStoryPseudoAnonymizer {
+PseudoAnonymizeAsync(string content)
}
<<interface>> IStoryPseudoAnonymizer
class RegexAnonymizer {
-RegexAnonymizer(AnonymizationRule[] rules)
-AnonymizationRule[] rules
+CreateAsync(string fileName)
+PseudoAnonymizeAsync(string content)
}
IStoryPseudoAnonymizer <|.. RegexAnonymizer : implements
class AnonymizationRule {
+Regex Regex
+string Replacement
}
RegexAnonymizer o-- "*" AnonymizationRule
class ITagCollection {
+Collection~string~ Tags
+AddTag(string tag)
+RemoveTag(string tag)
}
<<interface>> ITagCollection
class TagCollection {
-Set~string~ _tags
-Set~string~ _removedTags
+TagCollection()
+Collection~string~ Tags
+AddTag(string tag)
+RemoveTag(string tag)
}
ITagCollection <|.. TagCollection : implements
ITagCollection <-- IStoryTagger
I’ve designed the system to be modular so that I can experiment and iterate quickly.
Most components interact only through an interface, so that individual parts can be swapped out.
Most components receive their dependencies through dependency injection (inversion of control), so they can be reused more widely and configured at the top level.
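A rough sketch of interface-only interaction plus constructor injection, written in Python for brevity. The class names follow the diagram where possible; PlainTextFormatter, FixedNameAnonymizer, the dict-based work card and the synchronous methods are simplified assumptions, not the real implementation.

```python
from typing import Protocol, Sequence


class StoryFormatter(Protocol):
    def format(self, work_card: dict) -> str: ...


class StoryPseudoAnonymizer(Protocol):
    def pseudo_anonymize(self, content: str) -> str: ...


class PlainTextFormatter:
    """Hypothetical stand-in for MarkdownStoryFormatter."""

    def format(self, work_card: dict) -> str:
        return f"{work_card['Subject']}\n{work_card['Description']}"


class FixedNameAnonymizer:
    """Hypothetical anonymizer that swaps one known name for a placeholder."""

    def __init__(self, name: str, placeholder: str) -> None:
        self._name = name
        self._placeholder = placeholder

    def pseudo_anonymize(self, content: str) -> str:
        return content.replace(self._name, self._placeholder)


class StorySerializer:
    # Dependencies arrive through the constructor, so the serializer only
    # sees the interfaces and never constructs a concrete class itself.
    def __init__(
        self,
        formatter: StoryFormatter,
        anonymizers: Sequence[StoryPseudoAnonymizer],
    ) -> None:
        self._formatter = formatter
        self._anonymizers = anonymizers

    def create_file_content(self, work_card: dict) -> str:
        content = self._formatter.format(work_card)
        for anonymizer in self._anonymizers:
            content = anonymizer.pseudo_anonymize(content)
        return content


# Top-level configuration: the only place that names concrete classes.
serializer = StorySerializer(
    formatter=PlainTextFormatter(),
    anonymizers=[FixedNameAnonymizer("Alice", "[PERSON_1]")],
)
card = {"Subject": "Login fails", "Description": "Reported by Alice"}
content = serializer.create_file_content(card)
```

Swapping the output format or the anonymization strategy then only touches the configuration at the bottom, not StorySerializer.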
I added ITagCollection so that logic can be applied as tags are added, e.g. synonym handling or deduplication.
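As an illustration of the kind of logic a tag collection could hold, here is a small Python sketch. The synonym map, the lower-casing and the rule that a removed tag stays removed are all assumptions made for the example; only the two internal sets and the add/remove methods come from the diagram.

```python
from typing import Dict, List, Optional, Set


class TagCollection:
    def __init__(self, synonyms: Optional[Dict[str, str]] = None) -> None:
        # Hypothetical synonym map, e.g. {"defect": "bug"}; not in the diagram.
        self._synonyms = synonyms or {}
        self._tags: Set[str] = set()
        self._removed_tags: Set[str] = set()

    def _canonical(self, tag: str) -> str:
        # Normalise case and whitespace, then map synonyms to one spelling.
        normalised = tag.strip().lower()
        return self._synonyms.get(normalised, normalised)

    def add_tag(self, tag: str) -> None:
        canonical = self._canonical(tag)
        # Assumption: a tag that has been removed cannot be added back.
        if canonical and canonical not in self._removed_tags:
            self._tags.add(canonical)  # the set deduplicates

    def remove_tag(self, tag: str) -> None:
        canonical = self._canonical(tag)
        self._tags.discard(canonical)
        self._removed_tags.add(canonical)

    @property
    def tags(self) -> List[str]:
        return sorted(self._tags)


tags = TagCollection(synonyms={"defect": "bug"})
for raw in ["Bug", "defect", " bug ", "Invoice"]:
    tags.add_tag(raw)
tags.remove_tag("invoice")
tags.add_tag("Invoice")  # ignored, because "invoice" was removed earlier
result = tags.tags
```

Keeping this logic inside the collection means individual taggers do not need to know about synonyms or duplicates.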
Tag generation goes through the IStoryTagger interface, which allows the tag-generation logic to be split up. For example, KeywordTagger adds tags based on the presence of keywords, while FeatureTagger exploits the structured information from Pepper about which feature the card was assigned to. A future tagger could call out to an LLM to categorize the card.
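The tagger split could look roughly like the following Python sketch. The keyword-to-tag map, the "Feature" key on the work card and the "feature:" tag prefix are hypothetical; a plain set stands in for ITagCollection and the methods are synchronous to keep the example short.

```python
from typing import Dict, List, Protocol, Set


class StoryTagger(Protocol):
    def add_tags(
        self, tags: Set[str], work_card: Dict[str, str], content: str
    ) -> None: ...


class KeywordTagger:
    def __init__(self, keywords: Dict[str, str]) -> None:
        # Hypothetical keyword -> tag map.
        self._keywords = keywords

    def add_tags(self, tags, work_card, content) -> None:
        lowered = content.lower()
        for keyword, tag in self._keywords.items():
            if keyword in lowered:
                tags.add(tag)


class FeatureTagger:
    def add_tags(self, tags, work_card, content) -> None:
        # Assumes the card exposes its assigned feature under a "Feature" key.
        feature = work_card.get("Feature")
        if feature:
            tags.add(f"feature:{feature.lower()}")


def generate_tags(
    taggers: List[StoryTagger], work_card: Dict[str, str], content: str
) -> List[str]:
    # Each tagger contributes independently to one shared collection.
    tags: Set[str] = set()
    for tagger in taggers:
        tagger.add_tags(tags, work_card, content)
    return sorted(tags)


taggers = [
    KeywordTagger({"invoice": "billing", "crash": "stability"}),
    FeatureTagger(),
]
card = {"Subject": "Invoice export", "Feature": "Reports"}
result = generate_tags(taggers, card, "The invoice export crashes on large files")
```

An LLM-backed tagger would be one more class with the same add_tags method, added to the list without changing the others.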