@@ -87,7 +87,7 @@ scientific literature}.

 Cloud storage services like S3 ensure availability of data, but they
 have a centralized hub-and-spoke networking model and are therefore
-limited by their bandwidth, meaning popular files can be come very
+limited by their bandwidth, meaning popular files can become very
 expensive to share. Services like Dropbox and Google Drive provide
 version control and synchronization on top of cloud storage services
 which fixes many issues with broken links but rely on proprietary code
@@ -203,7 +203,7 @@ able to discover or communicate with any member of the swarm for that
 Dat. Anyone with the public key can verify that messages (such as
 entries in a Dat Stream) were created by a holder of the private key.

-Every Dat repository has corresponding a private key that kept in your
+Every Dat repository has a corresponding private key that is kept in your
 home folder and never shared. Dat never exposes either the public or
 private key over the network. During the discovery phase the BLAKE2b
 hash of the public key is used as the discovery key. This means that the
@@ -327,7 +327,7 @@ UTP source it tries to connect using both protocols. If one connects
 first, Dat aborts the other one. If none connect, Dat will try again
 until it decides that source is offline or unavailable and then stops
 trying to connect to them. Sources Dat is able to connect to go into a
-list of known good sources, so that the Internet connection goes down
+list of known good sources, so that if the Internet connection goes down
 Dat can use that list to reconnect to known good sources again quickly.

 If Dat gets a lot of potential sources it picks a handful at random to
@@ -392,7 +392,7 @@ of a repository, and data is stored as normal files in the root folder.
 \subsubsection{Metadata Versioning}\label{metadata-versioning}

 Dat tries as much as possible to act as a one-to-one mirror of the state
-of a folder and all it's contents. When importing files, Dat uses a
+of a folder and all its contents. When importing files, Dat uses a
 sorted depth-first recursion to list all the files in the tree. For each
 file it finds, it grabs the filesystem metadata (filename, Stat object,
 etc) and checks if there is already an entry for this filename with this
@@ -421,7 +421,7 @@ for old versions in \texttt{.dat}. Git for example stores all previous
 content versions and all previous metadata versions in the \texttt{.git}
 folder. Because Dat is designed for larger datasets, if it stored all
 previous file versions in \texttt{.dat}, then the \texttt{.dat} folder
-could easily fill up the users hard drive inadverntently . Therefore Dat
+could easily fill up the user's hard drive inadvertently . Therefore Dat
 has multiple storage modes based on usage.

 Hypercore registers include an optional \texttt{data} file that stores
@@ -441,7 +441,7 @@ you know the server has the full history.
 Registers in Dat use a specific method of encoding a Merkle tree where
 hashes are positioned by a scheme called binary in-order interval
 numbering or just ``bin'' numbering. This is just a specific,
-deterministic way of laying out the nodes in a tree. For example a tree
+deterministic way of laying out the nodes in a tree. For example, a tree
 with 7 nodes will always be arranged like this:

 \begin{verbatim}
@@ -498,7 +498,7 @@ It is possible for the in-order Merkle tree to have multiple roots at
 once. A root is defined as a parent node with a full set of child node
 slots filled below it.

-For example, this tree hash 2 roots (1 and 4)
+For example, this tree has 2 roots (1 and 4)

 \begin{verbatim}
 0
@@ -508,7 +508,7 @@ For example, this tree hash 2 roots (1 and 4)
 4
 \end{verbatim}

-This tree hash one root (3):
+This tree has one root (3):

 \begin{verbatim}
 0
@@ -554,7 +554,7 @@ process. The seven chunks get sorted into a list like this:
 bat-1
 bat-2
 bat-3
-cat-1
+cat-1
 cat-2
 cat-3
 \end{verbatim}
@@ -583,7 +583,7 @@ for this Dat.

 This tree is for the hashes of the contents of the photos. There is also
 a second Merkle tree that Dat generates that represents the list of
-files and their metadata and looks something like this (the metadata
+files and their metadata, and looks something like this (the metadata
 register):

 \begin{verbatim}
@@ -984,7 +984,7 @@ Ed25519 sign(
 \end{verbatim}

 The reason we hash all the root nodes is that the BLAKE2b hash above is
-only calculateable if you have all of the pieces of data required to
+only calculable if you have all of the pieces of data required to
 generate all the intermediate hashes. This is the crux of Dat's data
 integrity guarantees.

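Reviewer note on the hunk above: a minimal Python sketch of why a hash over all the root nodes can only be computed by someone holding every chunk. The 32-byte digest size, the plain concatenation of child hashes, and the helper names (`h`, `subtree_root`, `root_hashes`, `combined_root_hash`) are assumptions for illustration and are not Dat's actual encoding. The Ed25519 signature over the result is left out because the Python standard library does not provide it.

```python
import hashlib


def h(data: bytes) -> bytes:
    """BLAKE2b digest; the 32-byte size is an assumption of this sketch."""
    return hashlib.blake2b(data, digest_size=32).digest()


def subtree_root(leaves):
    """Root hash of a perfect binary tree over 2**k leaf hashes."""
    level = list(leaves)
    while len(level) > 1:
        # Hash each adjacent pair of children to get the next level up.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def root_hashes(chunks):
    """Split the chunks into the largest perfect subtrees, left to right,
    and return the root hash of each one."""
    leaves = [h(c) for c in chunks]
    roots, start, remaining = [], 0, len(leaves)
    while remaining:
        size = 1 << (remaining.bit_length() - 1)  # largest power of two <= remaining
        roots.append(subtree_root(leaves[start:start + size]))
        start += size
        remaining -= size
    return roots


def combined_root_hash(chunks):
    """One hash over all roots; this is the value that would be signed."""
    return h(b"".join(root_hashes(chunks)))
```

With three chunks the sketch yields two roots, and with four chunks a single root, matching the root examples elsewhere in the paper. Changing any one chunk changes a leaf hash, then every parent above it, then the combined hash.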
@@ -1022,7 +1022,7 @@ Each entry contains three objects:
 \begin{itemize}
 \tightlist
 \item
-  Data Bitfield (1024 bytes) - 1 bit for for each data entry that you
+  Data Bitfield (1024 bytes) - 1 bit for each data entry that you
   have synced (1 for every entry in \texttt{data}).
 \item
   Tree Bitfield (2048 bytes) - 1 bit for every tree entry (all nodes in
@@ -1040,8 +1040,8 @@ filesystem. The Tree and Index sizes are based on the Data size (the
 Tree has twice the entries as the Data, odd and even nodes vs just even
 nodes in \texttt{tree}, and Index is always 1/4th the size).

-To generate the Index, you pairs of 2 bytes at a time from the Data
-Bitfield, check if all bites in the 2 bytes are the same, and generate 4
+To generate the Index, you pair 2 bytes at a time from the Data
+Bitfield, check if all bits in the 2 bytes are the same, and generate 4
 bits of Index metadata~for every 2 bytes of Data (hence how 1024 bytes
 of Data ends up as 256 bytes of Index).

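Reviewer note on the hunk above: a small Python sketch of the pairing step and the size arithmetic. This excerpt does not show how the result is packed into the 4 Index bits, so the sketch stops at classifying each 2-byte pair. The labels `all-ones`, `all-zeros` and `mixed` and the function names are hypothetical.

```python
def classify_pair(pair: bytes) -> str:
    """Report whether all 16 bits in a 2-byte pair are the same."""
    if pair == b"\xff\xff":
        return "all-ones"
    if pair == b"\x00\x00":
        return "all-zeros"
    return "mixed"


def summarize(data_bitfield: bytes):
    """Walk the Data Bitfield 2 bytes at a time and classify each pair."""
    if len(data_bitfield) % 2:
        raise ValueError("Data Bitfield length must be a multiple of 2 bytes")
    return [
        classify_pair(data_bitfield[i:i + 2])
        for i in range(0, len(data_bitfield), 2)
    ]


def index_size_bytes(data_size_bytes: int) -> int:
    """4 bits of Index per 2 bytes of Data, expressed in whole bytes."""
    pairs = data_size_bytes // 2
    return pairs * 4 // 8
```

For a 1024-byte Data Bitfield this gives 512 pairs and 2048 Index bits, which is the 256 bytes quoted in the text.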
@@ -1103,7 +1103,7 @@ the SLEEP files.

 The contents of this file is a series of versions of the Dat filesystem
 tree. As this is a hypercore data feed, it's just an append only log of
-binary data entries. The challenge is representing a tree in an one
+binary data entries. The challenge is representing a tree in a one
 dimensional way to make it representable as a Hypercore register. For
 example, imagine three files:

@@ -1368,7 +1368,7 @@ register message on the first channel only (metadata).
 \begin{itemize}
 \tightlist
 \item
-  \texttt{id} - 32 byte random data used as a identifier for this peer
+  \texttt{id} - 32 byte random data used as an identifier for this peer
   on the network, useful for checking if you are connected to yourself
   or another peer more than once
 \item
@@ -1548,7 +1548,7 @@ message Cancel {
 \subsubsection{Data}\label{data-1}

 Type 9. Sends a single chunk of data to the other peer. You can send it
-in response to a Request or unsolicited on it's own as a friendly gift.
+in response to a Request or unsolicited on its own as a friendly gift.
 The data includes all of the Merkle tree parent nodes needed to verify
 the hash chain all the way up to the Merkle roots for this chunk.
 Because you can produce the direct parents by hashing the chunk, only
@@ -1580,7 +1580,7 @@ message Data {
   optional bytes value = 2;
   repeated Node nodes = 3;
   optional bytes signature = 4;
-
+
   message Node {
     required uint64 index = 1;
     required bytes hash = 2;
@@ -1611,7 +1611,7 @@ like Git-LFS solve this by using HTTP to download large files, rather
 than the Git protocol. GitHub offers Git-LFS hosting but charges
 repository owners for bandwidth on popular files. Building a distributed
 distribution layer for files in a Git repository is difficult due to
-design of Git Packfiles which are delta compressed repository states
+design of Git Packfiles, which are delta compressed repository states
 that do not easily support random access to byte ranges in previous file
 versions.

@@ -1704,7 +1704,7 @@ very desirable for many other types of datasets.

 \subsection{WebTorrent}\label{webtorrent}

-With WebRTC browsers can now make peer to peer connections directly to
+With WebRTC, browsers can now make peer to peer connections directly to
 other browsers. BitTorrent uses UDP sockets which aren't available to
 browser JavaScript, so can't be used as-is on the Web.

@@ -1722,7 +1722,7 @@ System}\label{interplanetary-file-system}
 IPFS is a family of application and network protocols that have peer to
 peer file sharing and data permanence baked in. IPFS abstracts network
 protocols and naming systems to provide an alternative application
-delivery platform to todays Web. For example, instead of using HTTP and
+delivery platform to today's Web. For example, instead of using HTTP and
 DNS directly, in IPFS you would use LibP2P streams and IPNS in order to
 gain access to the features of the IPFS platform.

@@ -1731,7 +1731,7 @@ Registers}\label{certificate-transparencysecure-registers}

 The UK Government Digital Service have developed the concept of a
 register which they define as a digital public ledger you can trust. In
-the UK government registers are beginning to be piloted as a way to
+the UK, government registers are beginning to be piloted as a way to
 expose essential open data sets in a way where consumers can verify the
 data has not been tampered with, and allows the data publishers to
 update their data sets over time.
@@ -1740,7 +1740,7 @@ The design of registers was inspired by the infrastructure backing the
 Certificate Transparency (Laurie, Langley, and Kasper 2013) project,
 initated at Google, which provides a service on top of SSL certificates
 that enables service providers to write certificates to a distributed
-public ledger. Anyone client or service provider can verify if a
+public ledger. Any client or service provider can verify if a
 certificate they received is in the ledger, which protects against so
 called ``rogue certificates''.

@@ -1763,7 +1763,7 @@ they need to), as well as a
 \href{https://github.com/bittorrent/bootstrap-dht}{DHT bootstrap}
 server. These discovery servers are the only centralized infrastructure
 we need for Dat to work over the Internet, but they are redundant,
-interchangeable, never see the actual data being shared, anyone can run
+interchangeable, never see the actual data being shared, and anyone can run
 their own and Dat will still work even if they all are unavailable. If
 this happens discovery will just be manual (e.g.~manually sharing
 IP/ports).