Bonehead.AI : An Intelligent Orthopedic Diagnostic Agent

Purpose:
BoneHead, a Diagnostic AI

To build an AI diagnostic tool that intelligently and accurately provides diagnosis and prognosis suggestions for orthopedic cases, based on three factors: intake exam notes, patient history, and an AI digestion of the published corpus, critically filtered by experts to a select dataset of the top 2% of all published research findings.

Focus: Top 2%

The initial proof-of-concept for BoneHead will be the specific procedure of shoulder arthroplasty within the broader field of clinical orthopedic surgery. While there are more than 90,000 papers keyed by PubMed to the field of “Orthopedics” in 2023, there are approximately 17,000 papers identified by AAOS that relate to this particular procedure. Those 17,000 will be our baseline.

Furthermore, a team of human researchers (composed of both clinical experts and statisticians) will manually review those 17,000 papers and filter them down to 350 “best practice candidate” papers that are the most relevant and least biased. This is roughly the top 2% of the published repository. Those 350 will be heavily weighted (given 10-100x more relevance than the general body of knowledge) in the training matrix of our AI tool.
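
As a rough illustration of the weighting described above, the following Python sketch assigns each paper a normalized sampling weight. The `Paper` class, the `is_key` flag, the `sampling_weights` function, and the choice of per-paper sampling weights are all illustrative assumptions, not a committed design; the default multiplier of 10 is simply the low end of the 10-100x range.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    paper_id: str  # hypothetical identifier, e.g. a PubMed ID
    is_key: bool   # True for the expert-vetted "best practice candidate" papers


def sampling_weights(papers, key_multiplier=10.0):
    """Return normalized per-paper weights; key papers count key_multiplier times more."""
    raw = [key_multiplier if p.is_key else 1.0 for p in papers]
    total = sum(raw)
    return [w / total for w in raw]


# Toy corpus with the proposal's proportions: 350 key papers out of 17,000.
corpus = [Paper(f"p{i}", is_key=(i < 350)) for i in range(17_000)]
weights = sampling_weights(corpus)
key_share = sum(w for p, w in zip(corpus, weights) if p.is_key)
print(f"key papers: {350 / 17_000:.1%} of corpus, {key_share:.1%} of total weight")
```

Under these assumptions, a 10x multiplier gives the key papers roughly 17% of the total weight, and a 100x multiplier roughly 68%, even though they make up about 2% of the corpus.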

Use Case:
Augmenting Human Expertise

The doctor will interact with the system in natural language. Using a simple conversational voice interface over a secure connection, the clinician will speak into their phone, laptop, or workstation and receive, in parallel, both verbal and written (transcript) responses. An example conversation might go as follows:

Doc: “Bonehead, diagnosis.”

BH: “Yes, Doctor Roberts. How may I help you today?”

Doc: “What is the best-practice recommendation for arthroplasty with cement fixation in the treatment of a displaced femoral neck fracture?”

BH: (responds with:)

  1. detailed clinical recommendation
  2. hyperlinked citations to both abstracts and full text of relevant vetted papers
  3. follow-up questions for the doctor, so that the system can clarify and improve the diagnosis.
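
The three-part response above could be represented as a small data structure. The sketch below is one possible shape, assuming hypothetical class and field names (`Citation`, `DiagnosticResponse`, `to_transcript`); the placeholder strings and `example.org` URLs stand in for real content and do not refer to any actual paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Citation:
    title: str
    abstract_url: str   # hyperlink to the abstract
    full_text_url: str  # hyperlink to the full text


@dataclass
class DiagnosticResponse:
    recommendation: str                                            # 1. clinical recommendation
    citations: List[Citation] = field(default_factory=list)        # 2. vetted papers
    follow_up_questions: List[str] = field(default_factory=list)   # 3. clarifying questions

    def to_transcript(self) -> str:
        """Render the response as the written transcript shown to the clinician."""
        lines = ["Recommendation:", self.recommendation, "", "Citations:"]
        for i, c in enumerate(self.citations, start=1):
            lines.append(
                f"  [{i}] {c.title} (abstract: {c.abstract_url}, full text: {c.full_text_url})"
            )
        lines += ["", "Follow-up questions:"]
        lines += [f"  - {q}" for q in self.follow_up_questions]
        return "\n".join(lines)


# Placeholder content only; no real paper or clinical advice is implied.
response = DiagnosticResponse(
    recommendation="<recommendation text>",
    citations=[
        Citation("<paper title>", "https://example.org/abstract", "https://example.org/full-text")
    ],
    follow_up_questions=["<clarifying question>"],
)
transcript = response.to_transcript()
print(transcript)
```

Keeping the three parts as separate fields would let the same object drive both the spoken reply and the written transcript.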

Development Process:
Building BoneHead

The development of BoneHead is scoped to occur in multiple sequential phases. At the conclusion of each phase, milestone criteria will be measured in order to clear the path forward to the next phase.

  1. scope approval
    • agreement amongst all stakeholders that scope, budget and timeline are acceptable.
  2. data acquisition
    • Access will be obtained to pay-walled and subscription-only research databases. At a minimum, this will cover the body of work of the 350 vetted papers. Ideally it will also cover the entire corpus (17,000 papers from 2023): abstracts at a minimum, and at most the full text (including linked citations).
  3. data prep
    • The data corpus (and related metadata) will be thoroughly scrubbed and re-formatted for optimal ingestion by the AI engines, most probably in a machine-friendly format such as JSON.
    • We will validate this on a test corpus of a dozen papers until the scrub process is automated and reliable. At that point the entire corpus will be processed via the scrubbing engine.
  4. engine selection & licensing
    • A best-in-class GPAI (general-purpose AI) model will be selected, and licensing negotiated and secured. Leading candidates include GPT-4 (OpenAI), Gemini Ultra (Google), and Llama (Meta). These are all generally classed as “Foundation / Frontier” models, and represent the SoTA (state of the art) of current AI technology.
  5. fine tuning
    • The selected GPAI engine will be fed the scrubbed corpus, both narrow (350) and wide (17,000). Weights will be assigned to heavily favor the 350 key papers. Note that the 17,000 will still play a key role in training: the AI can gather important patterns from the broader corpus while applying the stricter standard of the select corpus to that knowledge base.
  6. pre-prompt design & testing
    • The pre-prompt sets the guardrails, the tone, and the perspective through which the GPAI becomes BoneHead, the clinical diagnostic expert. Designing a functional pre-prompt is an iterative process, and is key to the success of the project. Various approaches will be tested, and the results analyzed and graded by a panel of expert physicians.
  7. alpha launch: A/B testing
    • A select group of physicians will be given access to the alpha release of the system. All inputs and outputs will be centrally recorded, following strict HIPAA guidelines, for quality analysis and for future improvements and enhancements to the system.
    • During this time, multiple candidate AIs will be running in parallel. Each response will give the querying physician the option to grade it on a scale of 1 to 10.
    • The Alpha launch will be free to use.
  8. v1.0 launch
    1. A secure UX / login / registration system will be placed on the front end (web interface)
    2. BoneHead AI will be made available to the entire membership of AAOS.
    3. Firm usage pricing will be established **
  9. updates / maintaining currency
    1. The corpus of clinical practice knowledge and the leading edge of research is a moving target.
    2. On a quarterly basis, the engine will be updated with the latest published research papers (at present, roughly 5,000 new papers per quarter).
    3. On an annual basis, a new set of weighted key papers will be identified for the engine (approximately 500 papers per year will be selected as keys).
  10. roadmap
    1. Real time feedback will be collected via the interface.
    2. A roadmap will be constructed for future development.
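
To make the data prep phase (step 3) more concrete, here is a minimal sketch of turning one paper into a scrubbed, one-line JSON record. The field names (`id`, `title`, `abstract`, `is_key`), the helper names, and the specific cleanup rules are assumptions for illustration; the real scrubbing engine would be defined during that phase.

```python
import json
import re


def scrub_text(raw: str) -> str:
    """Rejoin words hyphenated across line breaks, then collapse whitespace."""
    joined = re.sub(r"-\s*\n\s*", "", raw)  # "arthro-\nplasty" -> "arthroplasty"
    return re.sub(r"\s+", " ", joined).strip()


def to_record(paper_id: str, title: str, abstract: str, is_key: bool) -> str:
    """Serialize one paper as a single JSON line (hypothetical schema)."""
    record = {
        "id": paper_id,
        "title": scrub_text(title),
        "abstract": scrub_text(abstract),
        "is_key": is_key,  # True for the expert-vetted key papers
    }
    return json.dumps(record, ensure_ascii=False)


# Placeholder input; not taken from a real paper.
line = to_record("example-001", "  Sample   title ", "Shoulder arthro-\nplasty   outcomes.", True)
print(line)
```

One known limitation of this simple rule is that it also removes genuine hyphens that happen to fall at a line break, which is the kind of issue the dozen-paper test corpus is meant to surface.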

**NOTE: This proposal does not include account management, usage limits / throttling, or an integrated billing system. These would be scoped, developed, and launched in parallel with the primary AI engine, or after proof of concept. Initially, total monthly usage would be billed directly to AAOS.


Proposed Schedule:

The project should take approximately 90 days from approval of scope (Phase 1) to alpha launch (Phase 7).

So, given a hypothetical contract approval date of Jan 15, we could have live testing up and running by April 15, 2024.


Proposed Budget:

A rough budget estimate for such an undertaking is approximately $125,000.

dSky has serious interest in initiating this project, so discounts are available.

Note that this cost does not include model licensing or operational costs. Initial model licensing should not exceed $5,000 USD. Operational costs are estimated at a maximum of $0.50 (fifty cents) per diagnosis. That is inexpensive per query, but could add up to a large total under heavy usage load.
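
To show how the per-diagnosis cost could accrue, the sketch below multiplies the stated $0.50 ceiling by a usage level. The physician counts, the 5 queries per day, and the 22 working days per month are hypothetical inputs chosen only for illustration, not usage forecasts.

```python
COST_PER_DIAGNOSIS_USD = 0.50  # upper bound stated in this proposal


def monthly_cost(physicians: int, queries_per_day: float, working_days: int = 22) -> float:
    """Estimated monthly operational cost in USD at the per-diagnosis ceiling."""
    return physicians * queries_per_day * working_days * COST_PER_DIAGNOSIS_USD


# Hypothetical usage levels: 5 queries per physician per working day.
for physicians in (100, 1_000, 10_000):
    print(f"{physicians:>6} physicians: ${monthly_cost(physicians, queries_per_day=5):,.0f} per month")
```

At these assumed levels the monthly total ranges from $5,500 for 100 physicians to $550,000 for 10,000.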



Gregory Roberts

(310) 487-9662





AAOS : American Academy Of Orthopaedic Surgeons

CPG: Clinical Practice Guidelines

  • Clinical practice guidelines are statements that include recommendations intended to optimize patient care. They are informed by a systematic review of evidence, and an assessment of the benefits and harms of alternative care options.
  • CPG Goal: “to synthesize published research with the aim of providing a transparent and robust summary of the research findings for a particular orthopaedic disease topic.” It is paramount that these recommendations be based on sound scientific evidence, and not on the bias, skillset or habits of any individual researcher / physician.
  • Specific document types within the CPG include:
    • Clinical Practice Guidelines,
    • Appropriate Use Criteria, and
    • Performance Measures
  • Example CPG: Hip Fractures in the Elderly (AAOS, 2023)

Training Data:

PubMed: The leading repository of peer-reviewed research papers (NIH: 36 million+ citations)

  • PubMed lists 544,037 results for published papers on the subject of “orthopedics” from 1846 to 2024.
  • Publication intensity has dramatically increased in recent years, more than doubling each decade since 1980. From 1980 to 1989, a total of 14,000 papers were published. For the comparable decade 2010 to 2019, more than 220,000 papers were published referencing the keyword “orthopedics”.
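
The “more than doubling each decade” figure can be checked from the two decade totals above. This short sketch assumes steady compound growth across the three decades between the 1980s and the 2010s, and uses 220,000 as a lower bound because the source says “more than”.

```python
papers_1980s = 14_000    # papers published 1980 to 1989
papers_2010s = 220_000   # lower bound for papers published 2010 to 2019
decades_elapsed = 3      # 1980s -> 1990s -> 2000s -> 2010s

# Compound growth factor per decade implied by the two totals.
growth_per_decade = (papers_2010s / papers_1980s) ** (1 / decades_elapsed)
print(f"about {growth_per_decade:.1f}x per decade")
```

The implied factor is about 2.5x per decade, which is consistent with the claim.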

medRxiv: the “pre-print server” for health sciences (Cold Spring Harbor / Yale)

  • Pre-print servers are a timely way to get data and insights from research laboratories into the field of practice and industry. As a rough statistic, only about 1 in 5 (20%) of medical / clinical pre-prints will ever be accepted for publication in a peer-reviewed journal.
  • The mean time from pre-print posting to actual journal publication is 178 days, or roughly 6 months.