UCSC Genome Browser

1.

Data Science School
Bioinformatics track, April 25-26

2. UCSC Genome Browser

3. UCSC Genome Browser

4. UCSC Genome Browser

5. UCSC Genome Browser

6. UCSC Genome Browser

7. Complete human genome

8. Transposable Elements

• About 45% of the human genome is occupied by transposons and transposon-like repetitive elements.
• Discovered by Barbara McClintock (1902-1992) in maize in the 1950s.
• Nobel Prize in Physiology or Medicine, 1983.
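One rough way to see the repeat fraction in real data is to count soft-masked bases. UCSC genome FASTA files (e.g. hg38.fa) write bases annotated by RepeatMasker and Tandem Repeats Finder in lowercase. The sketch below assumes such a soft-masked FASTA; `masked_fraction` is a hypothetical helper, and because lowercase also covers simple and low-complexity repeats, the result is only a proxy for transposon content.

```python
def masked_fraction(fasta_lines):
    """Fraction of non-N bases that are soft-masked (lowercase).

    Assumes a soft-masked FASTA, where repeat bases are written in lowercase.
    """
    masked = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip headers and blank lines
        for base in line:
            if base in "Nn":
                continue  # assembly gaps carry no sequence information
            total += 1
            if base.islower():
                masked += 1
    return masked / total if total else 0.0


# Toy record: lowercase bases are repeat-masked, NN is an assembly gap.
toy = [">chrToy", "ACGTacgtacGGNN", "ttagCCAA"]
print(masked_fraction(toy))  # 0.5 (10 of 20 non-N bases are lowercase)
```

On a real assembly the function can be given an open file handle, e.g. `masked_fraction(open("hg38.fa"))`, since it only iterates over lines.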

9.

Class II – DNA transposons (move without an RNA intermediate)
Class I – retrotransposons (via an RNA intermediate)
Similarity to retroviruses
Retrovirus reverse transcription (video): http://www.youtube.com/watch?v=eS1GODinO8w

10.

11.

12.

13.

[Figure panels: Active, Non-Active]

14. First Layer of Genome Annotation

15. Epigenetics

16. Second Layer of Genome Annotation

17. Second Layer of Genome Annotation

18. Second Layer of Genome Annotation

19. Second Layer of Genome Annotation

20. Second Layer of Genome Annotation

21. Third Layer of Genome Annotation

22. Data Accumulation

23. ENCODE: Encyclopedia of DNA Elements

24. Digital Universe

• Like the physical universe, the digital universe is expanding, but much faster: it doubles every two years and by 2020 will reach 44 zettabytes (1 ZB = 10^21 bytes).
• Every second, 205,000 new bytes come into being.
• By the end of this lecture, the digital universe will have grown by 2,214,000,000 bytes, or about 2.2 GB.
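The slide's figures can be cross-checked with a few lines of arithmetic. The lecture length is not stated on the slide; it is only implied by the two figures (2,214,000,000 / 205,000 = 10,800 s, i.e. 3 hours). `projected_size_zb` is a hypothetical helper that applies nothing but the "doubles every two years" rule of thumb, so it is a sketch of that assumption, not a forecast.

```python
# Figures taken from the slide.
BYTES_PER_SECOND = 205_000
LECTURE_GROWTH_BYTES = 2_214_000_000
ZETTABYTE = 10**21
GIGABYTE = 10**9

# Lecture length implied by the two figures.
lecture_seconds = LECTURE_GROWTH_BYTES // BYTES_PER_SECOND
print(lecture_seconds, lecture_seconds / 3600)  # 10800 3.0

# 44 ZB expressed in gigabytes: 44 * 10**12, i.e. "44 trillion GB".
size_2020_gb = 44 * ZETTABYTE // GIGABYTE
print(size_2020_gb == 44 * 10**12)  # True


def projected_size_zb(year, base_year=2020, base_size_zb=44.0):
    """Size in ZB under a simple 'doubles every two years' model (an assumption)."""
    return base_size_zb * 2 ** ((year - base_year) / 2)


print(projected_size_zb(2018))  # 22.0
# Doubling every two years means an annual growth factor of sqrt(2), about 41% per year.
print(round(2 ** 0.5, 3))  # 1.414
```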

25. Digital Universe

Data Universe Will Expand To 44 Trillion GBs By 2020

26. What is to be done?

27. What has worked? (Success Stories)

28. HIDDEN MARKOV MODELS

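[Figure: HMM trellis with hidden states 1, 2, ..., K (plus begin/end state 0) at each position, emitting symbols x1, x2, x3, ..., xL; below it, the graphical model H1 → H2 → ... → Hi → ... → HL-1 → HL, where each hidden state Hi emits the observation Xi.]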
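The most probable path of hidden states through a trellis like this one is found with the Viterbi algorithm. Below is a minimal log-space sketch; the two-state "AT-rich"/"GC-rich" model and all its probabilities are invented for illustration, and the begin state 0 of the figure is represented by the start distribution `start_p`.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for an observation sequence (log space)."""
    # v[t][s]: best log-probability of any path ending in state s at position t
    v = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for s in states:
            # best predecessor state for s at position t
            prev, score = max(
                ((p, v[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            v[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # trace back from the best final state
    path = [max(states, key=lambda s: v[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical two-state model: AT-rich vs GC-rich DNA.
states = ("AT", "GC")
start_p = {"AT": 0.5, "GC": 0.5}
trans_p = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
emit_p = {
    "AT": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}
path = viterbi("AATTAGGCGCCG", states, start_p, trans_p, emit_p)
print("".join(s[0] for s in path))  # AAAAAGGGGGGG
```

Working in log space avoids numerical underflow on long sequences, which matters for genome-scale inputs.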

29. Gene Prediction

30. GeneMark HMM

[Figure: GeneMark HMM state diagram for the forward and backward strands. States: E0, E1, E2 – exons; I0, I1, I2 – introns; Einit, Eterm – initial and terminal exons; Esngl – single-exon gene; 5’ UTR and 3’ UTR; P – promoter region; polyA – polyA site; N – intergenic region.]
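The numbered intron states (I0, I1, I2) keep a gene model in frame across introns. Assuming the common convention that the index is the intron phase, i.e. the number of bases of the interrupted codon that precede the intron, the phase of each intron follows from the cumulative coding length of the exons before it. The helpers below are hypothetical and illustrate only this bookkeeping, not GeneMark's actual implementation.

```python
def intron_phases(exon_cds_lengths):
    """Phase (0, 1 or 2) of each intron between consecutive coding exons.

    exon_cds_lengths: coding length (bp) of each exon, in transcript order.
    Phase = number of bases of the interrupted codon that precede the intron.
    """
    phases = []
    coding_so_far = 0
    for length in exon_cds_lengths[:-1]:  # no intron after the last exon
        coding_so_far += length
        phases.append(coding_so_far % 3)
    return phases


def is_complete_orf(exon_cds_lengths):
    """A complete coding sequence must contain a whole number of codons."""
    return sum(exon_cds_lengths) % 3 == 0


exons = [100, 50, 33]
print([f"I{p}" for p in intron_phases(exons)])  # ['I1', 'I0']
print(is_complete_orf(exons))  # True
```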

31. Promoter prediction

McPromoter
• Hidden Markov model with six interpolated Markov chain submodels:
– upstream 1 and upstream 2,
– TATA box and spacer,
– initiator,
– downstream.
• Gaussian densities of DNA physicochemical properties.
• Neural network classifier.
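An interpolated Markov chain predicts the next nucleotide by mixing estimates from contexts of different lengths, so longer contexts contribute where the training data support them. The sketch below is a simplification with fixed, hand-set mixing weights (interpolated models such as those in McPromoter or GLIMMER estimate the weights from the data), and the "TATA box" training sequences are toy examples.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_counts(seqs, max_order):
    """counts[k][context][base] for every context length k = 0..max_order."""
    counts = [defaultdict(lambda: defaultdict(int)) for _ in range(max_order + 1)]
    for seq in seqs:
        for i, base in enumerate(seq):
            for k in range(max_order + 1):
                if i >= k:
                    counts[k][seq[i - k:i]][base] += 1
    return counts


def interpolated_prob(counts, context, base, weights):
    """Linear mix of order-0..order-K estimates, each with add-one smoothing."""
    prob = 0.0
    for k, w in enumerate(weights):
        if k > len(context):
            table = {}  # context too short for this order: uniform estimate
        else:
            table = counts[k].get(context[len(context) - k:], {})
        total = sum(table.values())
        prob += w * (table.get(base, 0) + 1) / (total + len(ALPHABET))
    return prob


def log_score(counts, seq, weights):
    """Log-likelihood of seq under the interpolated chain."""
    max_order = len(weights) - 1
    return sum(
        math.log(interpolated_prob(counts, seq[max(0, i - max_order):i], seq[i], weights))
        for i in range(len(seq))
    )


# Toy "TATA box" training data (illustrative only, not real promoters).
tata_model = train_counts(["TATAAAAG", "TATAAATA", "CTATAAAA"], max_order=2)
weights = (0.2, 0.3, 0.5)  # orders 0, 1, 2; hand-set and summing to 1
print(log_score(tata_model, "TATAAA", weights) > log_score(tata_model, "GCGCGC", weights))  # True
```

Because each order's estimate is a proper distribution over A, C, G, T and the weights sum to 1, the mixture is also a proper distribution.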

32.

Nature 2010
• predict tissue-dependent changes in
alternative splicing for thousands of exons.
• 1,014 features: known motifs, new motifs,
short motifs and features describing transcript
structure
• trained on RNA-seq data
• single-layer logistic Bayesian network or
neural network, or a weighted combination of
single-layer decision trees.
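The simplest of the model families listed, a single-layer logistic model, maps a feature vector (for example, motif counts around an exon) to the probability of a splicing change. Below is a minimal sketch of logistic regression trained by batch gradient descent. The two features, the labels and the hyperparameters are synthetic and have nothing to do with the 1,014 features of the original study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit w, b of p = sigmoid(w . x + b) by batch gradient descent on log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic data: [hypothetical enhancer-motif count, hypothetical silencer-motif count]
# -> 1 = increased exon inclusion in the tissue, 0 = no increase.
X = [[3, 0], [2, 1], [4, 1], [0, 2], [1, 3], [0, 4]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(X, y)
print(predict(w, b, [3, 0]) > 0.5, predict(w, b, [0, 3]) > 0.5)  # True False
```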

33.

• The genome's intrinsic sequence organization can explain ~50% of in vivo nucleosome positions.
• Probabilistic nucleosome–DNA interaction model built on the dinucleotide distribution.
• Thermodynamic model for predicting nucleosome positions genome-wide.
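In a thermodynamic model, every arrangement of non-overlapping nucleosomes on the DNA is a configuration whose statistical weight is the product of its nucleosomes' weights. The probability of a nucleosome at a given position then follows from the partition function, computed with a forward-backward dynamic program. The sketch below assumes the per-start-position weights `w` are already given. In the model described above they would come from the dinucleotide-based sequence model, and real nucleosomes cover about 147 bp. The toy example uses a footprint of 2 so the result can be checked by hand.

```python
def nucleosome_start_probs(w, footprint, seq_len):
    """P(a nucleosome starts at i) over all non-overlapping placements.

    w[i] is the statistical weight of a nucleosome covering seq[i:i + footprint];
    len(w) must equal seq_len - footprint + 1.
    """
    assert len(w) == seq_len - footprint + 1
    # fwd[i]: partition function of the prefix seq[:i]
    fwd = [1.0] + [0.0] * seq_len
    for i in range(1, seq_len + 1):
        fwd[i] = fwd[i - 1]  # base i-1 left free
        if i >= footprint:
            fwd[i] += fwd[i - footprint] * w[i - footprint]  # nucleosome ends at base i-1
    # bwd[i]: partition function of the suffix seq[i:]
    bwd = [0.0] * seq_len + [1.0]
    for i in range(seq_len - 1, -1, -1):
        bwd[i] = bwd[i + 1]  # base i left free
        if i + footprint <= seq_len:
            bwd[i] += w[i] * bwd[i + footprint]  # nucleosome starts at base i
    z = fwd[seq_len]
    return [fwd[i] * w[i] * bwd[i + footprint] / z for i in range(len(w))]


def occupancy(start_probs, footprint, seq_len):
    """Probability that each base is covered by some nucleosome."""
    occ = [0.0] * seq_len
    for i, p in enumerate(start_probs):
        for j in range(i, i + footprint):
            occ[j] += p
    return occ


# Toy check: 3 bp, footprint 2, equal weights -> configurations {}, {start 0}, {start 1}.
probs = nucleosome_start_probs([1.0, 1.0], footprint=2, seq_len=3)
print(probs)  # each start has probability 1/3
print(occupancy(probs, 2, 3))  # the middle base is covered in 2 of the 3 configurations
```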

34.

35.

Schematic overview of epigenetic regulatory mechanisms.
Yonggang Zhou et al. Circ Res. 2011;109:1067-1081
Copyright © American Heart Association, Inc. All rights reserved.

36.

Random Forest model predicts cancer mutation densities from epigenomic marks
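The sketch below shows the general setup with scikit-learn's `RandomForestRegressor`. Each row is a genomic region (for example a fixed-size window), each column is the signal of one epigenomic mark, and the target is the mutation density. The data are synthetic and the mark names are placeholders; the sketch shows the mechanics only, not the published model or its results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_regions = 500
marks = ["mark_A", "mark_B", "mark_C"]  # placeholder feature names

# Synthetic mark signal per region.
X = rng.random((n_regions, len(marks)))
# Synthetic target: density rises with mark_A, falls with mark_B, plus noise.
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.standard_normal(n_regions)

# Train on the first 400 regions, evaluate on the remaining 100.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[:400], y[:400])
print("held-out R^2:", round(model.score(X[400:], y[400:]), 2))
for name, importance in zip(marks, model.feature_importances_):
    print(name, round(importance, 2))
```

Feature importances give a first, model-specific indication of which marks carry the most information about mutation density.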

37.

38. We have many experimental genome-wide annotations

Question 1: Are different annotations correlated? To what extent?
Question 2: Can we find patterns in the annotations? (Unsupervised learning)
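Question 1 is usually approached by binning the genome into fixed windows, computing each annotation's signal per window, and correlating the resulting vectors. A minimal sketch with Pearson correlation; the track names and values are hypothetical, and rank-based (Spearman) correlation is a common alternative for skewed signal.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length signal vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(tracks):
    """Pairwise correlations between named tracks binned over the same windows."""
    names = list(tracks)
    return {(p, q): pearson(tracks[p], tracks[q]) for p in names for q in names}


# Hypothetical signal in 6 genomic windows for three annotation tracks.
tracks = {
    "track_A": [0.0, 1.0, 4.0, 5.0, 1.0, 0.0],
    "track_B": [0.0, 2.0, 8.0, 10.0, 2.0, 0.0],  # track_A scaled by 2
    "track_C": [5.0, 4.0, 1.0, 0.0, 4.0, 5.0],  # 5 - track_A
}
corr = correlation_matrix(tracks)
print(round(corr[("track_A", "track_B")], 3))  # 1.0
print(round(corr[("track_A", "track_C")], 3))  # -1.0
```

For Question 2, the same window-by-track matrix is the typical input to clustering or to multivariate HMMs such as ChromHMM, which learn recurring combinations of marks ("chromatin states").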

39. Annotations under different conditions

40. How much data is there?

• Roadmap Epigenomics
~3,000 genome-wide datasets
• ENCODE: Encyclopedia of DNA Elements
~9,000 genome-wide datasets
• International Cancer Genome Consortium
~20,000 patients (~50 cancer types)
• The Cancer Genome Atlas
~11,000 patients (~33 cancer types)

41. Open Questions

• Which parts of the code work at the same time?
• How can a cell's operating modes be switched?
• How is the code reprogrammed for different tissue types?
• How many regulatory mechanisms exist in cells (in the hope of universality)?

42.

Thank you for your attention