New Off-the-Shelf (OTS) Datasets from Appen Accelerate AI Deployment
26 February 2021 - 12:00AM
Business Wire
High-quality datasets include scripted speech,
images with text, body movement and human audio
Appen Limited (ASX:APX), the leading provider of high-quality
training data for organizations that build effective AI systems at
scale, today announced new off-the-shelf (OTS) datasets. These
datasets are designed to make it easier and faster for businesses
to acquire the high-quality training data needed to accelerate
their artificial intelligence (AI) and machine learning (ML)
projects. The new OTS datasets include human body movement and
innovative baby crying sounds, as well as scripted speech and
images with text suitable for optical character recognition (OCR)
for high-demand but hard-to-acquire languages, such as Arabic,
Croatian, Greek, Hungarian, Thai and more. With the expanded
datasets, Appen’s total OTS offering includes over 250 datasets,
comprising over 11,000 hours of audio, over 25,000 images and
over 8.7 million words across 80 languages and multiple
dialects.
Appen’s OTS datasets are a fast, cost-effective tool to
jumpstart an AI or ML project with consistent high-quality training
data. Teams expanding their AI capabilities can also leverage OTS
datasets to effectively improve accuracy, develop new model skills
and incorporate other improvements into their AI models. For example,
an OTS dataset is often delivered in one week, compared with the
eight to twelve weeks typical of a new dataset collection and
annotation project, or even longer depending on complexity. All
Appen datasets are developed using a fully transparent, opt-in
methodology, so AI specialists can be assured their data is clean
and compliant, eliminating the potential risk of backlash and
reputation damage.
“AI teams around the world working on projects with tight
deadlines and flexible data requirements can benefit from using
off-the-shelf datasets,” said Wilson Pang, CTO of Appen. “OTS
datasets shorten time to value and provide access to high-quality
data at a lower total cost than using traditional methods. We at
Appen take the necessary steps to ensure that all our datasets are
ethically sourced and demographically balanced, enabling companies
to maintain responsible AI practices by minimizing bias in their
models and ensuring fair treatment of data annotators. You always
know the precise quality of an OTS dataset, which helps build
better AI that works in the real world.”
MediaInterface has delivered language technology solutions to
healthcare-related institutions in Germany and other parts of
Europe for over 20 years. When the company was expanding into
France, it had fully localized software but lacked French lexicon
data, especially French names and places, which are often
referenced in patient health information. Using Appen OTS datasets,
MediaInterface acquired approximately 21,000 French names and
14,000 place names. “The critical data from Appen has been
incorporated into our background lexicon to successfully launch in
a new market, and this helps us build out new vocabularies for our
clients and strengthen our approach for future market launches as
well,” said Ines Wendler, product manager at MediaInterface.
The most experienced AI experts combine OTS datasets with
on-demand data collection and annotation projects to meet their
complex AI model training data needs. Appen is the leader in
offering continued support through a range of specific data
collection services, such as ongoing data annotation and smart
labeling, through AI-powered tools and automated workflows to
maximize efficiency.
“We interact with AI from the moment we wake up to the moment we
go to bed – through virtual assistants, chatbots, search engines,
social networks, medical devices, smart cars and other
applications,” said Judith Bishop, Appen’s senior director of AI
specialists, who leads a team of 100 AI linguists and language
experts. “Language is often the primary interface for many of these
compelling AI use cases, so to guarantee a great experience, the
model needs to be trained to work for everyone. Appen’s commitment
to high-quality data and responsible, ethical AI development allows
companies purchasing our off-the-shelf datasets to accelerate their
AI projects with complete confidence in their data.”
Joining the existing hundreds of datasets already live on
appen.com, the list of new Appen OTS datasets that are now
available includes:
- Scripted speech for Arabic (Egypt), Arabic (Saudi Arabia),
  Arabic (United Arab Emirates), Central Khmer (Cambodia), Croatian,
  Greek, Hungarian, Polish, Spanish (Spain), and Turkish
- Image OCR for Simplified Chinese printed text, Thai printed
  text, and Finnish printed text
  - Includes pre-recorded billboards, outer packaging, signs,
    magazines, and menus to train and update computer vision OCR
    models
- Human body movement (China)
  - Includes annotated videos of people moving, tracked at pixel
    level, suitable for game development, fitness apps and more
- Baby crying audio (China)
  - Includes pre-recorded and annotated baby sounds that can be
    used to train AI models to recognize different crying sounds and
    alert parents
Availability
For more information and to request an Appen OTS dataset sample,
visit https://appen.com/off-the-shelf-datasets/
About Appen Limited
Appen collects and labels images, text, speech, audio, video,
and other data used to build and continuously improve the world’s
most innovative artificial intelligence systems. Our expertise
includes having a global crowd of over 1 million skilled
contractors who speak over 235 languages, in over 70,000 locations
and 170 countries, and the industry’s most advanced AI-assisted
data annotation platform. Our reliable training data gives leaders
in technology, automotive, financial services, retail, healthcare,
and governments the confidence to deploy world-class AI products.
Founded in 1996, Appen has customers and offices globally.
View source version on businesswire.com: https://www.businesswire.com/news/home/20210225005274/en/
Titus Capilnean, Director, Corporate Marketing
tcapilnean@appen.com