Date: Thursday, May 23, 2024
Work Undertaken Summary
Risks
Time Spent
- 2.5hr dev container
- 1hr reading
- 3.25hr initial AI with llama3 and phi3
- 1hr tweaking StoryTransformer output to use different tags (for easier processing of the start and end of sections) and for filtering for only lead stories
- 4hr StoryWeaver:
- Started with phi3 and it got stuck in output loops
- Tweaked temperature and beam count with limited success
- Good results but only for first 2k ish characters
- Switched to llama3, good results to 3k characters
- Refined the prompt until it started outputting more of what is wanted, at about 8k characters
Questions for Tutor
Next work planned
Read the prompting guide from Meta: Prompting | How-to guides (meta.com)
- Change examples to use the actual story format
- Maybe change the delimiters to be more generic, like start-story and end-story
- Double-check if mistral-instruct-0.3 is trash
- Change StoryTransformer to search for and care about lead stories
Raw Notes
pytorch/.devcontainer/README.md at main · pytorch/pytorch (github.com)
pytorch/.devcontainer/Dockerfile at main · pytorch/pytorch (github.com)
pytorch/.devcontainer/cuda/environment.yml at main · pytorch/pytorch (github.com)
Had to mount using a volume, as otherwise it complained. Think it might have actually just been complaining about mounting anything at /.
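For reference, a named-volume mount in devcontainer.json looks something like this (the volume name and target path are illustrative, not the actual ones used):
{
  "mounts": [
    "source=storyweaver-cache,target=/workspaces/cache,type=volume"
  ]
}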
Reading
New models added to the Phi-3 family, available on Microsoft Azure | Microsoft Azure Blog
Tiny but mighty: The Phi-3 small language models with big potential - Source (microsoft.com)
[2404.14219] Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (arxiv.org)
They say that phi3 punches above its weight on language understanding but, due to its size, doesn't store much "inbuilt" knowledge. Given that we are providing the information, that doesn't seem like a detractor for our use case. It also only understands English, which is fine.
RULER
hsiehjackson/RULER: This repo contains the source code for RULER: What's the Real Context Size of Your Long-Context Language Models? (github.com)
Shows that llama3 doesn't really degrade between 0-32k tokens. phi3 mini degrades to worse than llama2 but is still pretty respectable, and it is the smallest model. Give them 2 weeks; if they haven't updated the benchmark by then, consider running it ourselves with llama3 8B and phi3 7B and 14B.
Running the model
Had to provide an example to get it to just return the expected result rather than a "Here is the description you asked for" preamble.
Turned down the temperature to make it less creative, then changed do_sample to False to force it not to be creative at all.
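A sketch of the relevant generation settings (standard Hugging Face transformers API; the model id and prompt are illustrative, the real setup lives in llm/basic_model.py and llm/llama3.py):
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Describe the Leads functional area."  # illustrative
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# do_sample=False forces greedy decoding: temperature is ignored and the
# model can't "get creative" with the output.
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))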
<?xml version="1.0" encoding="utf-8"?>
<root>
<system_prompt>You are a helpful assistant tasked with creating a description of a vertical slice of a software system.
You never hallucinate information, if you are unsure you omit the information.
The user will describe what the area of the system relates to, the current description of the area, and a list of stories that have been completed.
You will then provide a new description of the area that includes only the relevant information from the stories.
The user will provide the current version of the documentation between [existing documentation] and [first story].
Stories are delimited by the [next story] token.
You will output ONLY the complete new description of the area.
</system_prompt>
<example>
<prompt>Describe Applications</prompt>
<batch>Applicants can create a user to complete the application</batch>
<batch>Existing applicants can see a list of applications and can create or edit their applications</batch>
<batch>Internal users can see a list of applications, edit or approve.</batch>
<input>Applications is a form which prospective users fill out to request services.</input>
<output>Applications is a form which prospective users fill out to request services.
From the applications list applicants and internal users can:
- Create
- Edit
Internal users can:
- Approve
</output>
</example>
<test>
<prompt>Describe the Leads functional area.</prompt>
<batch>Internal users can email a lead to the supplier for quoting</batch>
<batch>A lead can be edited by a customer service admin</batch>
<batch>A lead can be deleted</batch>
<batch>A lead contains:
- Name
- Reference
- Created Date
- Status
- New
- In Progress
- Complete
- Completed Date</batch>
<batch>Add a new lead status quoted, which is set when the supplier replies to the quote</batch>
<batch>Internal users can complete a lead, which sets the completed date</batch>
<input>Internal users can create a lead (permission required)</input>
</test>
<testoutput>
Leads is a functional area where internal users can create, manage, and track leads. Internal users can:
- Create a lead
- Email a lead to the supplier for quoting
- Edit a lead (customer service admin permission required)
- Delete a lead
- Complete a lead, which sets the completed date
A lead contains the following information:
- Name
- Reference
- Created Date
- Status (New, In Progress, Complete, Quoted)
</testoutput>
</root>
Llama3 8B - 8bit
Leads is a functional area where internal users can create, manage, and track leads. Internal users can:
- Create a lead
- Email a lead to the supplier for quoting
- Edit a lead (customer service admin permission required)
- Delete a lead
- Complete a lead, which sets the completed date
A lead contains the following information:
- Name
- Reference
- Created Date
- Status (New, In Progress, Complete, Quoted)
phi3 small (7B) - 8bit
The Leads functional area allows internal users with the appropriate permissions to:
- Create a lead, which includes fields for Name, Reference, Created Date, Status (New, In Progress, Complete, Quoted), and Completed Date.
- Email a lead to the supplier for quoting.
- Edit a lead.
- Delete a lead.
- Complete a lead, which sets the Completed Date.
Additionally, a new "Quoted" status is available for leads, which is set when the supplier replies to the quote.
phi3 medium (14B) - 4bit
Leads is a functional area where internal users can create, edit, and delete leads. It requires permission to create a lead. A lead contains the following information:
- Name
- Reference
- Created Date
- Status (New, In Progress, Complete)
- Completed Date
A lead can be emailed to the supplier for quoting. Once the supplier replies, the lead status can be updated to "quoted". A lead can be edited by a customer service admin. Internal users can also complete a lead, which sets the completed date.
Llama3 70B - no quant
Only used about 8% GPU and basically no RAM. VRAM sat at 14GB; it seems to mostly be just reading from disk at 500MB/s.
Traceback (most recent call last):
File "/workspaces/StoryWeaver/weaver.py", line 71, in test_prompt
response = model.test()
File "/workspaces/StoryWeaver/llm/basic_model.py", line 92, in test
result = self._run_messages(next_messages)
File "/workspaces/StoryWeaver/llm/llama3.py", line 41, in _run_messages
outputs = self.model.generate(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1758, in generate
result = self._sample(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2397, in _sample
outputs = self(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1183, in forward
logits = self.lm_head(hidden_states)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 161, in new_forward
args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 328, in pre_forward
value = self.weights_map[name]
File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/offload.py", line 118, in __getitem__
return self.dataset[f"{self.prefix}{key}"]
File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/offload.py", line 171, in __getitem__
tensor = f.get_tensor(weight_info.get("weight_name", key))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.96 GiB. GPU
OutOfMemoryError('CUDA out of memory. Tried to allocate 1.96 GiB. GPU ')
An error occurred, please try again
Did not finish after 3 hours.
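The traceback shows accelerate was already offloading weights to disk. A possible mitigation (a sketch; limits and paths are illustrative) is to cap per-GPU memory so accelerate leaves headroom for activations instead of OOMing mid-forward:
from transformers import AutoModelForCausalLM

# Reserve headroom on GPU 0 for activations/logits; spill the remaining
# weights to CPU RAM and then disk. Limits and paths are illustrative.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B-Instruct",
    device_map="auto",
    max_memory={0: "18GiB", "cpu": "64GiB"},
    offload_folder="offload",
)
Failing that, a quantized load (like the 4-bit phi3 medium run) would sidestep the offload path entirely.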
StoryTransformer Changes
Leads was out of order because one card was made about 4 months before the others, so the order wasn't optimal. The initial assumption of just ordering by created date wasn't sufficient for Leads.
As this was the only such story, I've decided to update the story's created date to more accurately reflect its location in the timeline.
UPDATE Tickets SET CreatedUtc = '2024-06-27' WHERE Id = 131421
Tag Remover
The fuzzy text match caused results from before the major feature was developed to be surfaced. To prevent this I added a "tagger" that removes any tags added for stories prior to a certain date, e.g. if a matched story predates the initial development then it can safely be ignored.
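A minimal sketch of the idea (Story and its field names are illustrative, not the actual StoryTransformer types):
from dataclasses import dataclass
from datetime import date

@dataclass
class Story:
    id: int
    created: date
    tags: list[str]

def remove_stale_tags(stories: list[Story], tag: str, cutoff: date) -> list[Story]:
    # Fuzzy matching can tag stories that predate the feature entirely;
    # those matches are noise, so strip the tag from anything before cutoff.
    for story in stories:
        if story.created < cutoff and tag in story.tags:
            story.tags.remove(tag)
    return stories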
StoryWeaver - Testing phi3 14B
Problem - it can get stuck in a local minimum and so repeatedly output the same text in a loop. Only discovered after the fact:
- 3e, Any Lead/Notes, Lead/Documents, Lead/Communications should be copied to any new or existing Company records
Leads & Opportunities - Documents
Leads & Opportunities - Communications
Leads & Opportunities - Actions
Leads & Opportunities - Notes/Audit
Leads Lookups
New lookups, which will be used on Leads, as follows:
- New Reasons type lookup of "Lead Cancellation" (with other Reasons lookups)
- New Reasons type lookup of "Lead Loss" (with other Reasons lookups)
- Lead Probability
Leads & Opportunities - Documents
Leads & Opportunities - Communications
Leads & Opportunities - Actions
Leads & Opportunities - Notes/Audit
Leads Lookups
New lookups, which will be used on Leads, as follows:
- New Reasons type lookup of "Lead Cancellation" (with other Reasons lookups)
- New Reasons type lookup of "Lead Loss" (with other Reasons lookups)
- Lead Probability
Leads & Opportunities - Documents
Leads & Opportunities - Communications
Leads & Opportunities - Actions
Leads & Opportunities - Notes/Audit
Leads Lookups
Started testing various temperatures to see if any are better.
Increasing by 0.2 each time.
0.2 got stuck. 0.4 got stuck but on reports, which is fair enough as they are very complicated.
0.8: similar temp loop to 0.4 at first. After about 1733 characters it started just spitting out the stories verbatim, maybe something to do with the ~4k base context causing it to start dying around 1.7k x 2 ≈ 3.5k characters.
Temp 0.8 started to hallucinate work being either completed or in progress. Interestingly, it started generating in the style we use for new cards.
Tried to use beams (5) but ran out of memory. Tried beams (3) but it never completed; presumably it got stuck in a generation loop like before.
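For reference, the decoding variations tried, plus a couple of standard anti-repetition knobs not yet tried, as a sketch against the transformers generate API (values illustrative):
def try_decoding_variants(model, inputs):
    # Sampling sweep, temperature stepped by 0.2.
    for temp in (0.2, 0.4, 0.6, 0.8):
        model.generate(**inputs, max_new_tokens=2048,
                       do_sample=True, temperature=temp)

    # Beam search: num_beams=5 OOMed, num_beams=3 never finished.
    model.generate(**inputs, max_new_tokens=2048, num_beams=3)

    # Not tried yet: explicit anti-repetition constraints.
    return model.generate(
        **inputs,
        max_new_tokens=2048,
        no_repeat_ngram_size=6,   # forbid verbatim 6-gram repeats
        repetition_penalty=1.15,  # mildly penalize already-seen tokens
    )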
Trying with llama3. Might need to try models with specifically longer context sizes, or maybe try creating a skeleton with different headers that it can output content under, and running against each header rather than for the section overall (sketched below).
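A sketch of the skeleton idea (the headers and the generate() wrapper are hypothetical; the point is one generation per header instead of one for the whole section):
HEADERS = ["Overview", "Permissions", "Fields", "Statuses"]  # illustrative

def weave_section(stories: list[str], generate) -> str:
    # generate(prompt) is whatever wrapper runs the model. Splitting by
    # header keeps each generation comfortably inside the context limit.
    parts = []
    for header in HEADERS:
        prompt = (f"Describe only the '{header}' aspect of this area.\n"
                  + "\n".join(stories))
        parts.append(header + "\n" + generate(prompt))
    return "\n\n".join(parts)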
llama3 completed! Interestingly, the final result was around 1.7k characters… Is the system paring things down to stay within some limit, or did it just get confused?
At this commit it got confused and removed some important detail. https://gitea.4man.dev/lukethoma5/story-weaver-output/commit/f27796d0b6d36284d1477a1e306aec6305b27131
Going to try re-generating with a longer allowed generation. Failing that, going to add the important detail back and see if it tries to cut it again.
Increasing the limit came to the same outcome. Going to try to add the detail back.
It's definitely culling the information, trying to keep it to a certain size. Will retry without telling it to be concise and see what it does.
It wasn't being told to be concise.
I told it "You will NOT remove any information from the existing documentation, unless that information has been contradicted by a later story." and the output jumped from 1600 characters to 2700 characters, with seemingly all the important information still there.
It is still removing important functionality (144415, 144416) in Leads - Leads & Opportunities - Convert, Cancel, other Status changes · 453734e017 - story-weaver-output - Gitea: Aeternus (4man.dev), but less than before.
It takes about 10 minutes to run through all the stories; after making it print out more, it took much, much longer, about 40 minutes. Ideas:
- Could have it output a diff rather than the full thing to increase speed?
- Could create the skeleton as just the various headers.
- Could improve the prompt to be more precise and give examples of how to interpret the various fields.
- Could try again with phi3 now that we know llama3 can do it. (phi3 immediately got stuck in a loop again after the first batch, so never mind.)
It works surprisingly well!
It added content in the right places!

Latest version is 8k characters… yippee, worked so much better on llama3 than on phi3.
Trying to get it to output a diff instead only led to it outputting the whole thing as a diff against a blank document, i.e. a completely new document.