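About the Book
In this book, you will:
- Take a close look at the systems you already use, and learn how to use and operate them more effectively
- Make better-informed decisions by identifying the strengths and weaknesses of different tools
- Understand the trade-offs among consistency, scalability, fault tolerance, and complexity
- Understand the distributed systems research on which modern databases are built
- Go behind the scenes of major online services and learn from their architectures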
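About the Author
Martin Kleppmann is a distributed systems researcher at the University of Cambridge, UK. Previously he was a software engineer and entrepreneur, working on large-scale data infrastructure at LinkedIn and Rapportive. Martin is a regular conference speaker, a blogger, and an open source contributor.
Reviews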
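"This book is wonderful. It bridges the huge gap between distributed systems theory and practical engineering. I wish I could have read it ten years ago; I would have avoided many of the mistakes I made over the years."
-- Jay Kreps (creator of Apache Kafka, CEO of Confluent)
"This is a must-read for software engineers. Designing Data-Intensive Applications is a rare resource that connects theory and practice, helping developers make informed decisions as they design and implement data infrastructure and systems."
-- Kevin Scott (CTO, Microsoft)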
Table of Contents
Part I. Foundations of Data Systems
1. Reliable, Scalable, and Maintainable Applications
Thinking About Data Systems
Reliability
Hardware Faults
Software Errors
Human Errors
How Important Is Reliability?
Scalability
Describing Load
Describing Performance
Approaches for Coping with Load
Maintainability
Operability: Making Life Easy for Operations
Simplicity: Managing Complexity
Evolvability: Making Change Easy
Summary
2. Data Models and Query Languages
Relational Model Versus Document Model
The Birth of NoSQL
The Object-Relational Mismatch
Many-to-One and Many-to-Many Relationships
Are Document Databases Repeating History?
Relational Versus Document Databases Today
Query Languages for Data
Declarative Queries on the Web
MapReduce Querying
Graph-Like Data Models
Property Graphs
The Cypher Query Language
Graph Queries in SQL
Triple-Stores and SPARQL
The Foundation: Datalog
Summary
3. Storage and Retrieval
Data Structures That Power Your Database
Hash Indexes
SSTables and LSM-Trees
B-Trees
Comparing B-Trees and LSM-Trees
Other Indexing Structures
Transaction Processing or Analytics?
Data Warehousing
Stars and Snowflakes: Schemas for Analytics
Column-Oriented Storage
Column Compression
Sort Order in Column Storage
Writing to Column-Oriented Storage
Aggregation: Data Cubes and Materialized Views
Summary
4. Encoding and Evolution
Formats for Encoding Data
Language-Specific Formats
JSON, XML, and Binary Variants
Thrift and Protocol Buffers
Avro
The Merits of Schemas
Modes of Dataflow
Dataflow Through Databases
Dataflow Through Services: REST and RPC
Message-Passing Dataflow
Summary
Part II. Distributed Data
5. Replication
Leaders and Followers
Synchronous Versus Asynchronous Replication
Setting Up New Followers
Handling Node Outages
Implementation of Replication Logs
Problems with Replication Lag
Reading Your Own Writes
Monotonic Reads
Consistent Prefix Reads
Solutions for Replication Lag
Multi-Leader Replication
Use Cases for Multi-Leader Replication
Handling Write Conflicts
Multi-Leader Replication Topologies
Leaderless Replication
Writing to the Database When a Node Is Down
Limitations of Quorum Consistency
Sloppy Quorums and Hinted Handoff
Detecting Concurrent Writes
Summary
6. Partitioning
Partitioning and Replication
Partitioning of Key-Value Data
Partitioning by Key Range
Partitioning by Hash of Key
Skewed Workloads and Relieving Hot Spots
Partitioning and Secondary Indexes
Partitioning Secondary Indexes by Document
Partitioning Secondary Indexes by Term
Rebalancing Partitions
Strategies for Rebalancing
Operations: Automatic or Manual Rebalancing
Request Routing
Parallel Query Execution
Summary
7. Transactions
The Slippery Concept of a Transaction
The Meaning of ACID
Single-Object and Multi-Object Operations
Weak Isolation Levels
Read Committed
Snapshot Isolation and Repeatable Read
Preventing Lost Updates
Write Skew and Phantoms
Serializability
Actual Serial Execution
Two-Phase Locking (2PL)
Serializable Snapshot Isolation (SSI)
Summary
8. The Trouble with Distributed Systems
Faults and Partial Failures
Cloud Computing and Supercomputing
Unreliable Networks
Network Faults in Practice
Detecting Faults
Timeouts and Unbounded Delays
Synchronous Versus Asynchronous Networks
Unreliable Clocks
Monotonic Versus Time-of-Day Clocks
Clock Synchronization and Accuracy
Relying on Synchronized Clocks
Process Pauses
Knowledge, Truth, and Lies
The Truth Is Defined by the Majority
Byzantine Faults
System Model and Reality
Summary
9. Consistency and Consensus
Consistency Guarantees
Linearizability
What Makes a System Linearizable?
Relying on Linearizability
Implementing Linearizable Systems
The Cost of Linearizability
Ordering Guarantees
Ordering and Causality
Sequence Number Ordering
Total Order Broadcast
Distributed Transactions and Consensus
Atomic Commit and Two-Phase Commit (2PC)
Distributed Transactions in Practice
Fault-Tolerant Consensus
Membership and Coordination Services
Summary
Part III. Derived Data
10. Batch Processing
Batch Processing with Unix Tools
Simple Log Analysis
The Unix Philosophy
MapReduce and Distributed Filesystems
MapReduce Job Execution
Reduce-Side Joins and Grouping
Map-Side Joins
The Output of Batch Workflows
Comparing Hadoop to Distributed Databases
Beyond MapReduce
Materialization of Intermediate State
Graphs and Iterative Processing
High-Level APIs and Languages
Summary
11. Stream Processing
Transmitting Event Streams
Messaging Systems
Partitioned Logs
Databases and Streams
Keeping Systems in Sync
Change Data Capture
Event Sourcing
State, Streams, and Immutability
Processing Streams
Uses of Stream Processing
Reasoning About Time
Stream Joins
Fault Tolerance
Summary
12. The Future of Data Systems
Data Integration
Combining Specialized Tools by Deriving Data
Batch and Stream Processing
Unbundling Databases
Composing Data Storage Technologies
Designing Applications Around Dataflow
Observing Derived State
Aiming for Correctness
The End-to-End Argument for Databases
Enforcing Constraints
Timeliness and Integrity
Trust, but Verify
Doing the Right Thing
Predictive Analytics
Privacy and Tracking
Summary
Glossary
Index