I’m happy to share that you can now evaluate, compare, and select the best foundation models (FMs) for your use case in Amazon Bedrock. Model Evaluation on Amazon Bedrock is available today in preview.
Amazon Bedrock offers a choice of automatic evaluation and human evaluation. You can use automatic evaluation with predefined metrics such as accuracy, robustness, and toxicity. For subjective or custom metrics, such as friendliness, style, and alignment to brand voice, you can set up human evaluation workflows with just a few clicks.
Model evaluations are critical at all stages of development. As a developer, you now have evaluation tools available for building generative artificial intelligence (AI) applications. You can start by experimenting with different models in the playground environment. To iterate faster, add automatic evaluations of the models. Then, when you prepare for an initial launch or limited release, you can incorporate human evaluations to help ensure quality.
Let me give you a quick tour of Model Evaluation on Amazon Bedrock.
Automatic model evaluation
With automatic model evaluation, you can bring your own data or use built-in, curated datasets and predefined metrics for specific tasks such as content summarization, question and answering, text classification, and text generation. This takes away the heavy lifting of designing and running your own model evaluation benchmarks.
To get started, navigate to the Amazon Bedrock console, then select Model evaluation under Assessment & deployment in the left menu. Create a new model evaluation and choose Automatic.
Next, follow the setup dialog to choose the FM you want to evaluate and the type of task, for example, text summarization. Select the evaluation metrics and specify a dataset, either built-in or your own.
If you bring your own dataset, make sure it’s in JSON Lines format, with one record per line, and that each line contains all of the key-value pairs you want to evaluate your model with for the model dimension you want to evaluate. For example, if you want to evaluate the model on a question-answer task, you would format your data as follows (with category being optional):
{"referenceResponse":"Cantal","category":"Capitals","prompt":"Aurillac is the capital of"}
{"referenceResponse":"Bamiyan Province","category":"Capitals","prompt":"Bamiyan city is the capital of"}
{"referenceResponse":"Abkhazia","category":"Capitals","prompt":"Sokhumi is the capital of"}
...
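As a minimal sketch of how such a file could be produced, the following Python snippet serializes a list of records as JSON Lines using only the standard library. The key names prompt, referenceResponse, and category follow the dataset format described above; the helper names to_jsonl and write_jsonl and the output file name are hypothetical and chosen only for illustration.

```python
import json

# Example records mirroring the question-answer rows in this post.
# "category" is optional for automatic evaluation.
records = [
    {"prompt": "Aurillac is the capital of", "referenceResponse": "Cantal", "category": "Capitals"},
    {"prompt": "Bamiyan city is the capital of", "referenceResponse": "Bamiyan Province", "category": "Capitals"},
    {"prompt": "Sokhumi is the capital of", "referenceResponse": "Abkhazia", "category": "Capitals"},
]


def to_jsonl(rows):
    """Serialize each record as one JSON object per line (JSON Lines)."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


def write_jsonl(rows, path):
    """Write the records to a UTF-8 encoded JSON Lines file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_jsonl(rows))


if __name__ == "__main__":
    # Hypothetical file name; the file would then be supplied as your own dataset.
    write_jsonl(records, "capitals_dataset.jsonl")
```

Writing one json.dumps call per record keeps every object on a single line, which is what the JSON Lines format requires.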
Then, create and run the evaluation job to understand the model’s task-specific performance. Once the evaluation job is complete, you can review the results in the model evaluation report.
Human model evaluation
For human evaluation, you can have Amazon Bedrock set up human review workflows with a few clicks. You can bring your own datasets and define custom evaluation metrics, such as relevance, style, or alignment to brand voice. You also have the choice to either leverage your own internal teams as reviewers or engage an AWS managed team. This takes away the tedious effort of building and operating human evaluation workflows.
To get started, create a new model evaluation and select Human: Bring your own team or Human: AWS managed team.
If you choose an AWS managed team for human evaluation, describe your model evaluation needs, including task type, expertise of the work team, and the approximate number of prompts, along with your contact information. In the next step, an AWS expert will reach out to discuss your model evaluation project requirements in more detail. Upon review, the team will share a custom quote and project timeline.
If you choose to bring your own team, follow the setup dialog to choose the FMs you want to evaluate and the type of task, for example, text summarization. Then, select the evaluation metrics, upload your test dataset, and set up the work team.
For human evaluation, you would format the example data shown earlier, again in JSON Lines format, like this (with category and referenceResponse being optional):
{"prompt":"Aurillac is the capital of","referenceResponse":"Cantal","category":"Capitals"}
{"prompt":"Bamiyan city is the capital of","referenceResponse":"Bamiyan Province","category":"Capitals"}
{"prompt":"Senftenberg is the capital of","referenceResponse":"Oberspreewald-Lausitz","category":"Capitals"}
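Before uploading a human evaluation dataset, it can help to check each line locally. The sketch below is one possible check, not an official validator: it assumes prompt is the only required key and that referenceResponse and category are optional, as described above. Flagging any other key as a problem is an assumption made for this example, and the function name validate_jsonl is hypothetical.

```python
import json

# Assumption based on the format described in this post:
# "prompt" is required, the other two keys are optional.
REQUIRED_KEYS = {"prompt"}
OPTIONAL_KEYS = {"referenceResponse", "category"}


def validate_jsonl(text, required=REQUIRED_KEYS, optional=OPTIONAL_KEYS):
    """Return a list of (line_number, message) problems; an empty list means none were found."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "expected a JSON object"))
            continue
        keys = set(record)
        missing = required - keys
        if missing:
            problems.append((number, f"missing keys: {sorted(missing)}"))
        # Treating unknown keys as problems is this sketch's own choice.
        unknown = keys - required - optional
        if unknown:
            problems.append((number, f"unexpected keys: {sorted(unknown)}"))
    return problems


sample = (
    '{"prompt":"Aurillac is the capital of","referenceResponse":"Cantal","category":"Capitals"}\n'
    '{"prompt":"Senftenberg is the capital of"}\n'
)
problems = validate_jsonl(sample)
```

Returning a list of problems rather than raising on the first one lets you fix every malformed line in a single pass.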
Once the human evaluation is completed, Amazon Bedrock generates an evaluation report with the model’s performance against your selected metrics.
Things to know
Here are a couple of important things to know:
Model support – During preview, you can evaluate and compare text-based large language models (LLMs) available on Amazon Bedrock. You can select one model for each automatic evaluation job and up to two models for each human evaluation job using your own team. For human evaluation using an AWS managed team, you can specify custom project requirements.
Pricing – During preview, AWS only charges for the model inference needed to perform the evaluation (processed input and output tokens for on-demand pricing). There will be no separate charges for human evaluation or automatic evaluation. Amazon Bedrock Pricing has all the details.
Join the preview
Automatic evaluation and human evaluation using your own work team are available today in public preview in AWS Regions US East (N. Virginia) and US West (Oregon). Human evaluation using an AWS managed team is available in public preview in AWS Region US East (N. Virginia). To learn more, visit the Amazon Bedrock Developer Experience web page and check out the User Guide.
Get started
Log in to the AWS Management Console and start exploring model evaluation in Amazon Bedrock today!
— Antje