nicAI | my personal AI assistant

hobby project as a long-term goal to help guide my AI learning

23 Jul 2022

Goal:
- learn machine learning & AI with a specific and personal project
- leverage personal content structured as part of this site (eg movies, musiç )

First steps

Learning Computer Science

Learning AI

Github repos to explore

Fine-tuning a model

27 Aug 2024

OpenAI's GPT4o can now be fine-tuned: https://openai.com/index/gpt-4o-fine-tuning/

Dashboard: https://platform.openai.com/finetune

nicai/240827-gpt-fine-tune-model.jpg

Create a fine-tuned model
Base Model
Training data
Add a jsonl file to use for training.
Upload new
Select existing
Upload a file or drag and drop here

(.jsonl)

Validation data
Add a jsonl file to use for validation metrics.
Upload new
Select existing
None
Suffix
Add a custom suffix that will be appended to the output model name.
my-experiment
Seed
The seed controls the reproducibility of the job. Passing in the same seed and job parameters should produce the same results, but may differ in rare cases. If a seed is not specified, one will be generated for you.
Random
Configure hyperparameters

Batch size
auto

Learning rate multiplier
auto
In most cases, range of 0.1- 10 is recommended

Number of epochs
auto
In most cases, range of 1- 10 is recommended

Learn more: https://platform.openai.com/docs/guides/fine-tuning

Explore first prompt engineering tactics: https://platform.openai.com/docs/guides/prompt-engineering

OpenAI Assistant

02 Oct 2024

Started testing OpenAI's Assistant API - it's very good.

Migrated NicKalGPT to the OpenAI Playground:

https://platform.openai.com/playground/assistants

and then used it via API as follows:

from openai import OpenAI
client = OpenAI()

import json

ASSISTANT_ID = os.getenv("NIC_KAL_GPT")

def wait_on_run(run, thread):
    while run.status == "queued" or run.status == "in_progress":
        run = client.beta.threads.runs.retrieve(
            thread_id=thread.id,
            run_id=run.id,
        )
        time.sleep(0.5)
    return run


user_prompt = f"""
write the following:

- a cold email, using known frameworks, and in my style
- 3 key questions to ask on a cold call
- a pesonalised Linkedin connection request message

to:

{contact_data}

Write in:
    - German, if contact is in Germany or Austria
    - French, if contact is in France
    - English, otherwise, or if unsure

Never start emails with "I hope this email finds you well" or "I hope you are doing well", or "Quick question". 
Instead, start with a question or a statement that shows you know something about the person or their company.
Use the informal way to refer to the company's name, as if you were talking to a friend (eg "BMW" not "BMW Group").
When using client references from Kaltura, choose relevant ones that are likely to be known by the recipient, so either in the same industry or well-known Enterprises.

Only return the content of the email, including Subject line, but do not add any extra comments to your answer.

Return as JSON following this format:

{
    "email_subject": "Subject Line",
    "email_body": "Email Body",
    "questions": ["Question 1", "Question 2", "Question 3"],
    "linkedin": "Message",
}

Do not return anything else, as I will parse your response programmatically.
Remove any Markdown formatting from your response.
"""

thread = client.beta.threads.create()

message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content=user_prompt,
)

run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=ASSISTANT_ID,
)

run = wait_on_run(run, thread)

messages = client.beta.threads.messages.list(thread_id=thread.id)

message = messages.data[0].content[0].text.value

print("\n\nmessage:")
print()
print(message)

print("\n\n")

# Parse the JSON data
data = json.loads(message)

# Accessing individual fields
email_subject = data["email_subject"]
email_body = data["email_body"]
questions = data["questions"]
linkedin_message = data["linkedin"]

# Printing extracted data
# print(f"\n\nOutbound Suggestions for {x.first} {x.last} at {x.company} located in {x.country}:\n")
print("Email Subject:\n", email_subject)
print("\nEmail Body:\n", email_body)
print("\n\nQuestions for cold call:")
for question in questions:
    print("-", question)
print("\n\nLinkedIn Connect Message:\n", linkedin_message)

03 Oct 2024

Playground: https://platform.openai.com/playground/assistants

Structured output doesn't seem to work with "File Search" on (vector database)? 🤔
https://platform.openai.com/docs/guides/structured-outputs/introduction

Prompts

"You will find below some information.
Leverage the knowledge provided, but try to think beyond what has been shared and list all the use cases CompanyX could use Kaltura for.
Write these use cases and benefits in my style.
Provide the answer in Markdown format, in a code block, without using headers nore bold font."

Resources

links

social