Welcome to our fourth section.
We're going to show you how we are currently promoting Read-Across,
both through the generation of a database and through an automated tool for Read-Across.
Because, if you look at our graph again,
data, and later the tools that help us use them,
are the critical elements that allow us to actually execute good Read-Across.
We need to know that data are available.
We need to retrieve the data.
We need to be sure that they have sufficient quality in order
to funnel them into our read-across process.
So we actually took a step here ourselves, because the REACH legislation is not
only a reason for a lot of animal testing and for carrying out Read-Across.
It also delivers for us a source of
public information which can be the basis for good-quality Read-Across,
because, as the first legislation in the world to do so,
it foresees that summary reports from the registrations are made publicly available.
So everyone has access to the key pieces of information on a given chemical
which has been registered and this can be done
via the website of the European Chemicals Agency in Helsinki.
The problem is that this website allows you to
enter a chemical name or structure, and if there is something inside,
you get the dossier or a summary of the dossier,
but you need to know that the substance is inside.
You cannot search it systematically.
And most of the information is actually free text.
So we downloaded the database in December 2014.
And if I say we,
this work was carried out by a Ph.D. student, Tom Luechtefeld.
And at the time already 9,800 chemicals had been registered.
And these 9,800 chemicals were associated with more than 800,000 chemical studies.
This makes it simply the largest toxicological database in the world.
The problem is that these summaries are not in a standardized database format.
So let's say a substance was tested on rabbit eyes,
the infamous rabbit Draize eye test.
Then in this database,
if the substance had a very strong corrosive effect,
it could read "eye irritation, severe."
It could read "eye corrosion."
It could read "severe eye irritant."
It could read "it is Category 1."
It could read "Cat. 1."
And the "one" could be an Arabic or a Roman numeral.
As toxicologists, we understand that these are all the same.
A computer has to learn this.
What Tom did here was to use natural language processing,
or search-engine techniques, to read these data and
convert them into a machine-readable database.
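The kind of normalization involved can be sketched in a few lines. This is a minimal illustration, not Tom's actual pipeline: the label variants, the canonical label name, and the regular expressions are all hypothetical stand-ins for what a real system would extract from the free-text summaries.

```python
import re

# Hypothetical canonical label for a "severe eye irritant" outcome.
CANONICAL = "eye_irritation_category_1"

# Patterns covering wording variants, "category 1" vs "cat. 1",
# and Arabic vs Roman numerals (illustrative, not the real rule set).
PATTERNS = [
    r"eye irritation,?\s*severe",
    r"eye corrosion",
    r"severe eye irritant",
    r"\bcat(egory)?\.?\s*(1|i)\b",
]

def normalize(label):
    """Map a free-text study outcome onto a canonical machine-readable label."""
    text = label.lower().strip()
    for pattern in PATTERNS:
        if re.search(pattern, text):
            return CANONICAL
    return None  # unrecognized wording; would need manual review

for raw in ["Eye corrosion", "severe eye irritant", "it is Category 1", "Cat. I"]:
    print(raw, "->", normalize(raw))
```

All four spellings map to the same canonical label, which is exactly the step that turns free text into something a computer can aggregate.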
And this led to a series of papers in the number two issue of our journal ALTEX,
which have already been cited quite a few times.
These four articles describe quite a bit of the toxicological landscape.
I will show you now what consequences this has for our analysis and for Read Across.
We have already discussed the challenge of negative Read-Across.
At the same time,
for any given health effect,
the majority of substances are negative.
These are data from various different sources,
which, with a bit of uncertainty you have to admit,
suggest that, for example, 90 percent of industrial chemicals are not acutely toxic.
That is, they cannot kill an animal within 24 hours, even at doses up to two grams per kilogram.
97 percent are not skin corrosive,
93 percent are not skin irritants,
and so on and so on.
Which means most chemicals are not as bad as many people fear,
which is a good thing, but it also means we need to
identify the substances that do not need extensive testing.
Otherwise, we don't really save a lot.
So we need methods for negatives,
but the big question is,
can we extrapolate no hazard?
And the numbers I just gave you about
the low frequencies, or prevalences, of
certain toxic effects were very nicely confirmed in Tom's analysis.
As we can see here,
the dark bars show substances for which
sufficient data were available and which did not receive a label,
which means they were not toxic.
And the light bars show the fraction of substances which
have a positive label, which means the hazard was found.
And what was quite remarkable:
no classification was found for more than 20 percent of substances.
Skin allergy, at 20 percent, was the most frequent label for industrial chemicals,
and serious eye damage, at 17 percent, was the second.
Everything else was much rarer.
This shows you the predominance of non-toxic substances for any given health effect.
The database also has a very impressive overlap with other databases.
Of our 9,800 substances,
51 substances overlap with ToxRefDB of the Environmental Protection Agency.
There is a 1,700-substance overlap with Tox21,
the high-throughput screening program of the NIH and others.
And in PubChem, which you got to know from Howard Husberg,
about 5,000 of these substances are listed.
This means we have a tremendous opportunity; it is
a gold mine for computational toxicology using our new database.
You can also use the dataset now to understand
how well the animal experiments are actually performing.
This is the rabbit eye test which I used as an example just a second ago,
and what we found in the database is that
many chemicals have been tested a lot of times in rabbit eyes.
As you can see, we found two chemicals which were actually
tested more than 90 times in rabbit eyes.
We found 69 chemicals which were tested 45 times or more in rabbit eyes.
And this enormous waste of animals can only be explained by the fact
that in the past there was no such public repository of information.
A company would not know that somebody else had already tested and registered a substance.
And a lot of the information was siloed; before
the OECD's Mutual Acceptance of Data from 1981, for example,
tests would be repeated in each and
every country, because regulators would only accept data produced in their own country.
For this reason, many substances received such extensive testing.
This waste of animals, however, now gives us
a very objective way of assessing how reproducible such a test is.
How often do we get the same result if we put a chemical into rabbit eyes?
And this was actually quite shocking.
If the first test found that it is a corrosive chemical,
a severe eye irritant causing really strong damage to the animal,
and such an experiment were done a
second time, only in 70 percent of the cases would the same result be obtained.
In 20 percent, the eye irritation would be mild,
and in 10 percent there would even be no effect.
This shows an enormous lottery.
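The kind of reproducibility estimate described here can be sketched as a pairwise-agreement calculation over repeated runs of the same test. The data below are toy values for illustration, not the actual 670-chemical dataset, and the agreement measure is one simple choice among several.

```python
from itertools import combinations

# Toy repeat-test records: chemical -> outcomes of independent Draize runs
# (illustrative data only; the real database holds hundreds of such series).
repeats = {
    "chem_A": ["severe", "severe", "mild"],
    "chem_B": ["severe", "severe", "severe"],
    "chem_C": ["none", "mild", "none"],
}

def pairwise_agreement(outcomes):
    """Fraction of all pairs of runs on the same chemical that gave the same result."""
    pairs = list(combinations(outcomes, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

for chem, outcomes in repeats.items():
    print(chem, round(pairwise_agreement(outcomes), 2))
```

A chemical tested 90 times yields over 4,000 such pairs, which is what makes the "70 percent concordance" figure statistically meaningful rather than anecdotal.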
And this is based on data from 670 chemicals which have been tested extensively.
These are all studies which were carried out under Good Laboratory Practice,
so under the highest standards of quality control,
and this sheds a very,
very questionable light on this animal experiment.
And I think this is a very important step towards objectively
discussing how good an alternative method needs to be in order to replace an animal test.
To give you another example,
skin sensitization is something we typically
address with the local lymph node assay in mice, the LLNA.
This assay is 89 percent reproducible.
It is also one of the few validated animal tests, because it is considered a
refinement method, and it has been increasingly replacing the guinea pig tests.
But the guinea pig tests,
known as the Buehler test
or the guinea pig maximization test,
as you can see in the upper picture,
predict the local lymph node assay only 77 percent of the time.
All of this is based on several hundred chemicals
for each comparison, showing that we
happily accept two animal tests which are congruent
for only 77 percent of the chemicals, while at the same time,
alternative approaches which are more than 80
percent predictive, sometimes up to 90 percent, have not been
accepted by regulators, because they
thought the animal test was much more of a gold standard than it actually is.
And now we can use our dataset to do something really new.
What we have produced here is a chemical similarity map,
which means we took
the chemicals and, based on their structures, using Tanimoto similarity in this case,
we produced a map.
Two chemicals which are close to each other here share a lot of structural similarity.
Those which are far away from each other do not.
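Tanimoto similarity itself is a simple set comparison on molecular fingerprints. Here is a minimal sketch: the fingerprints are toy sets of "on" bit positions, whereas in practice they would come from a cheminformatics toolkit's hashed structural fragments.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two binary fingerprints,
    each represented as the set of 'on' bit positions."""
    union = fp_a | fp_b
    if not union:
        return 0.0
    return len(fp_a & fp_b) / len(union)

# Toy fingerprints; real ones would typically be 1024-bit or larger.
fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8, 9}
print(tanimoto(fp1, fp2))  # 3 shared bits / 5 total bits = 0.6
```

A score of 1.0 means identical fingerprints and 0.0 means no shared substructure bits, so the similarity thresholds mentioned later (75 to 95 percent) correspond to values of 0.75 to 0.95 on this scale.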
And on this map, we can now plot our hazards.
In this case, skin sensitization:
all the blue chemicals are chemicals which produce skin allergies.
And as you can see quite easily,
there are parts of the chemical universe where there are no skin sensitizers.
And in some other areas,
there are clusters of skin sensitizers, because that is
the chemistry which produces skin sensitization.
I think this is a very nice illustration of "similar structure means similar biological effect."
And we already tried a very simple type of prediction, by simply asking:
if I pretend I don't have data for one of
my chemicals and instead use the nearest neighbor,
the substance which is closest,
how well can I predict the property?
And as you can see here,
we reach accuracies between 80 and 92 percent.
So this is absolutely in the range of the reproducibility of the guideline tests.
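This nearest-neighbor check can be sketched as a leave-one-out evaluation: hide each chemical's label, predict it from its most similar neighbor, and count how often the prediction matches. The fingerprints and labels below are toy stand-ins, not the real dataset, and `predict_loo` is a hypothetical helper name.

```python
def tanimoto(a, b):
    """Tanimoto similarity on fingerprints given as sets of 'on' bits."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Toy dataset: (fingerprint, known sensitizer label) -- illustrative only.
data = [
    ({1, 2, 3, 4}, True),
    ({1, 2, 3, 5}, True),
    ({7, 8, 9}, False),
    ({7, 8, 10}, False),
]

def predict_loo(i, min_sim=0.0):
    """Leave chemical i out and predict its label from its nearest neighbor,
    but only if that neighbor meets the similarity threshold."""
    fp_i, _ = data[i]
    sim, label = max(
        (tanimoto(fp_i, fp_j), lab) for j, (fp_j, lab) in enumerate(data) if j != i
    )
    return label if sim >= min_sim else None

correct = sum(predict_loo(i) == data[i][1] for i in range(len(data)))
print(f"leave-one-out accuracy: {correct}/{len(data)}")
```

Raising `min_sim` illustrates the trade-off described next: predictions become more reliable, but fewer chemicals have a neighbor similar enough to qualify.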
And as we increase the similarity we require,
from 75 percent to 95 percent,
accuracy gets better and better, as you would expect.
But we also find fewer and fewer chemicals for which
we have a neighbor in this dataset that is so similar.
And this demonstrates that populating
our similarity map with more chemicals, that is, the availability of
big data, is a powerful tool to
map larger and larger parts of the chemical universe with confidence.
So the fact that some 30,000 to 40,000 chemicals are likely to be registered by
2018 in this database of
the European Chemicals Agency is very promising: in the future,
we will be able to make more and more use of
the chemical similarity map instead of actual testing.
This work produced quite a bit of excitement when we announced it earlier this year.
This is an article showing the similarity map in Science; Nature
and Scientific American followed the same day, so we got a lot of interest.
As you can imagine, those journals which are
more directed towards the chemical industry, but also
the Financial Times and others, reported
on the advances made possible by this type of technology.