Commit b50b746

arcalinea authored and Joe Hand committed
Typo fixes (#54)
* Typo fixes and commas
* Change ambiguous wording hash --> has
1 parent c639cee commit b50b746

1 file changed

Lines changed: 24 additions & 24 deletions

papers/dat-paper.txt

@@ -87,7 +87,7 @@ scientific literature}.
 
 Cloud storage services like S3 ensure availability of data, but they
 have a centralized hub-and-spoke networking model and are therefore
-limited by their bandwidth, meaning popular files can be come very
+limited by their bandwidth, meaning popular files can become very
 expensive to share. Services like Dropbox and Google Drive provide
 version control and synchronization on top of cloud storage services
 which fixes many issues with broken links but rely on proprietary code
@@ -203,7 +203,7 @@ able to discover or communicate with any member of the swarm for that
 Dat. Anyone with the public key can verify that messages (such as
 entries in a Dat Stream) were created by a holder of the private key.
 
-Every Dat repository has corresponding a private key that kept in your
+Every Dat repository has a corresponding private key that is kept in your
 home folder and never shared. Dat never exposes either the public or
 private key over the network. During the discovery phase the BLAKE2b
 hash of the public key is used as the discovery key. This means that the
@@ -327,7 +327,7 @@ UTP source it tries to connect using both protocols. If one connects
 first, Dat aborts the other one. If none connect, Dat will try again
 until it decides that source is offline or unavailable and then stops
 trying to connect to them. Sources Dat is able to connect to go into a
-list of known good sources, so that the Internet connection goes down
+list of known good sources, so that if the Internet connection goes down
 Dat can use that list to reconnect to known good sources again quickly.
 
 If Dat gets a lot of potential sources it picks a handful at random to
@@ -392,7 +392,7 @@ of a repository, and data is stored as normal files in the root folder.
 \subsubsection{Metadata Versioning}\label{metadata-versioning}
 
 Dat tries as much as possible to act as a one-to-one mirror of the state
-of a folder and all it's contents. When importing files, Dat uses a
+of a folder and all its contents. When importing files, Dat uses a
 sorted depth-first recursion to list all the files in the tree. For each
 file it finds, it grabs the filesystem metadata (filename, Stat object,
 etc) and checks if there is already an entry for this filename with this
@@ -421,7 +421,7 @@ for old versions in \texttt{.dat}. Git for example stores all previous
 content versions and all previous metadata versions in the \texttt{.git}
 folder. Because Dat is designed for larger datasets, if it stored all
 previous file versions in \texttt{.dat}, then the \texttt{.dat} folder
-could easily fill up the users hard drive inadverntently. Therefore Dat
+could easily fill up the user's hard drive inadvertently. Therefore Dat
 has multiple storage modes based on usage.
 
 Hypercore registers include an optional \texttt{data} file that stores
@@ -441,7 +441,7 @@ you know the server has the full history.
 Registers in Dat use a specific method of encoding a Merkle tree where
 hashes are positioned by a scheme called binary in-order interval
 numbering or just ``bin'' numbering. This is just a specific,
-deterministic way of laying out the nodes in a tree. For example a tree
+deterministic way of laying out the nodes in a tree. For example, a tree
 with 7 nodes will always be arranged like this:
 
 \begin{verbatim}
@@ -498,7 +498,7 @@ It is possible for the in-order Merkle tree to have multiple roots at
 once. A root is defined as a parent node with a full set of child node
 slots filled below it.
 
-For example, this tree hash 2 roots (1 and 4)
+For example, this tree has 2 roots (1 and 4)
 
 \begin{verbatim}
 0
@@ -508,7 +508,7 @@ For example, this tree hash 2 roots (1 and 4)
 4
 \end{verbatim}
 
-This tree hash one root (3):
+This tree has one root (3):
 
 \begin{verbatim}
 0
@@ -554,7 +554,7 @@ process. The seven chunks get sorted into a list like this:
 bat-1
 bat-2
 bat-3
-cat-1
+cat-1
 cat-2
 cat-3
 \end{verbatim}
@@ -583,7 +583,7 @@ for this Dat.
 
 This tree is for the hashes of the contents of the photos. There is also
 a second Merkle tree that Dat generates that represents the list of
-files and their metadata and looks something like this (the metadata
+files and their metadata, and looks something like this (the metadata
 register):
 
 \begin{verbatim}
@@ -984,7 +984,7 @@ Ed25519 sign(
 \end{verbatim}
 
 The reason we hash all the root nodes is that the BLAKE2b hash above is
-only calculateable if you have all of the pieces of data required to
+only calculable if you have all of the pieces of data required to
 generate all the intermediate hashes. This is the crux of Dat's data
 integrity guarantees.
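
As a rough illustration of the paragraph in this hunk, the sketch below hashes a list of chunks into Merkle roots and then hashes those roots together with BLAKE2b. It is a simplified assumption rather than Dat's actual format: the helper names (`h`, `merkle_roots`, `roots_checksum`), the 32-byte digest size, and the plain concatenation of child hashes are all hypothetical, and the real serialization is not shown in this excerpt.

```python
import hashlib


def h(data: bytes) -> bytes:
    # BLAKE2b with an assumed 32-byte digest size.
    return hashlib.blake2b(data, digest_size=32).digest()


def merkle_roots(chunks):
    """Return the root hashes of the full subtrees covering `chunks`, left to right."""
    roots = []  # stack of (height, hash) pairs
    for chunk in chunks:
        node = (0, h(chunk))
        # Merge with the previous subtree while both have the same height.
        while roots and roots[-1][0] == node[0]:
            height, left = roots.pop()
            node = (height + 1, h(left + node[1]))
        roots.append(node)
    return [digest for _, digest in roots]


def roots_checksum(chunks):
    # Hash all current roots together (hypothetical layout: simple concatenation).
    return h(b"".join(merkle_roots(chunks)))
```

Under these assumptions, three chunks yield two roots and four chunks yield one, and changing any single chunk changes the value returned by `roots_checksum`, because every intermediate hash above that chunk changes with it.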
@@ -1022,7 +1022,7 @@ Each entry contains three objects:
 \begin{itemize}
 \tightlist
 \item
-Data Bitfield (1024 bytes) - 1 bit for for each data entry that you
+Data Bitfield (1024 bytes) - 1 bit for each data entry that you
 have synced (1 for every entry in \texttt{data}).
 \item
 Tree Bitfield (2048 bytes) - 1 bit for every tree entry (all nodes in
@@ -1040,8 +1040,8 @@ filesystem. The Tree and Index sizes are based on the Data size (the
 Tree has twice the entries as the Data, odd and even nodes vs just even
 nodes in \texttt{tree}, and Index is always 1/4th the size).
 
-To generate the Index, you pairs of 2 bytes at a time from the Data
-Bitfield, check if all bites in the 2 bytes are the same, and generate 4
+To generate the Index, you pair 2 bytes at a time from the Data
+Bitfield, check if all bits in the 2 bytes are the same, and generate 4
 bits of Index metadata~for every 2 bytes of Data (hence how 1024 bytes
 of Data ends up as 256 bytes of Index).
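
The size arithmetic in this paragraph can be checked with a small sketch. The function name `build_index` and the three 4-bit codes are assumptions made for illustration only, since this excerpt does not give the actual bit layout of the Index; only the pairing of 2 bytes and the 4-bits-per-pair ratio come from the text.

```python
def build_index(data_bitfield: bytes) -> bytes:
    """Summarize every 2 bytes of the data bitfield as 4 bits of index."""
    if len(data_bitfield) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    nibbles = []
    for i in range(0, len(data_bitfield), 2):
        pair = data_bitfield[i:i + 2]
        if pair == b"\x00\x00":
            nibbles.append(0b0000)  # assumed code: all 16 bits unset
        elif pair == b"\xff\xff":
            nibbles.append(0b1111)  # assumed code: all 16 bits set
        else:
            nibbles.append(0b0101)  # assumed code: bits are mixed
    # Pack two 4-bit summaries into each output byte.
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
```

With this sketch, a 1024-byte Data Bitfield produces 512 pairs, 2048 bits of summary, and therefore a 256-byte Index, matching the ratio stated above.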
@@ -1103,7 +1103,7 @@ the SLEEP files.
 
 The contents of this file is a series of versions of the Dat filesystem
 tree. As this is a hypercore data feed, it's just an append only log of
-binary data entries. The challenge is representing a tree in an one
+binary data entries. The challenge is representing a tree in a one
 dimensional way to make it representable as a Hypercore register. For
 example, imagine three files:
 
@@ -1368,7 +1368,7 @@ register message on the first channel only (metadata).
 \begin{itemize}
 \tightlist
 \item
-\texttt{id} - 32 byte random data used as a identifier for this peer
+\texttt{id} - 32 byte random data used as an identifier for this peer
 on the network, useful for checking if you are connected to yourself
 or another peer more than once
 \item
@@ -1548,7 +1548,7 @@ message Cancel {
 \subsubsection{Data}\label{data-1}
 
 Type 9. Sends a single chunk of data to the other peer. You can send it
-in response to a Request or unsolicited on it's own as a friendly gift.
+in response to a Request or unsolicited on its own as a friendly gift.
 The data includes all of the Merkle tree parent nodes needed to verify
 the hash chain all the way up to the Merkle roots for this chunk.
 Because you can produce the direct parents by hashing the chunk, only
@@ -1580,7 +1580,7 @@ message Data {
 optional bytes value = 2;
 repeated Node nodes = 3;
 optional bytes signature = 4;
-
+
 message Node {
 required uint64 index = 1;
 required bytes hash = 2;
@@ -1611,7 +1611,7 @@ like Git-LFS solve this by using HTTP to download large files, rather
 than the Git protocol. GitHub offers Git-LFS hosting but charges
 repository owners for bandwidth on popular files. Building a distributed
 distribution layer for files in a Git repository is difficult due to
-design of Git Packfiles which are delta compressed repository states
+design of Git Packfiles, which are delta compressed repository states
 that do not easily support random access to byte ranges in previous file
 versions.
 
@@ -1704,7 +1704,7 @@ very desirable for many other types of datasets.
 
 \subsection{WebTorrent}\label{webtorrent}
 
-With WebRTC browsers can now make peer to peer connections directly to
+With WebRTC, browsers can now make peer to peer connections directly to
 other browsers. BitTorrent uses UDP sockets which aren't available to
 browser JavaScript, so can't be used as-is on the Web.
 
@@ -1722,7 +1722,7 @@ System}\label{interplanetary-file-system}
 IPFS is a family of application and network protocols that have peer to
 peer file sharing and data permanence baked in. IPFS abstracts network
 protocols and naming systems to provide an alternative application
-delivery platform to todays Web. For example, instead of using HTTP and
+delivery platform to today's Web. For example, instead of using HTTP and
 DNS directly, in IPFS you would use LibP2P streams and IPNS in order to
 gain access to the features of the IPFS platform.
 
@@ -1731,7 +1731,7 @@ Registers}\label{certificate-transparencysecure-registers}
 
 The UK Government Digital Service have developed the concept of a
 register which they define as a digital public ledger you can trust. In
-the UK government registers are beginning to be piloted as a way to
+the UK, government registers are beginning to be piloted as a way to
 expose essential open data sets in a way where consumers can verify the
 data has not been tampered with, and allows the data publishers to
 update their data sets over time.
@@ -1740,7 +1740,7 @@ The design of registers was inspired by the infrastructure backing the
 Certificate Transparency (Laurie, Langley, and Kasper 2013) project,
 initated at Google, which provides a service on top of SSL certificates
 that enables service providers to write certificates to a distributed
-public ledger. Anyone client or service provider can verify if a
+public ledger. Any client or service provider can verify if a
 certificate they received is in the ledger, which protects against so
 called ``rogue certificates''.
 
@@ -1763,7 +1763,7 @@ they need to), as well as a
 \href{https://github.com/bittorrent/bootstrap-dht}{DHT bootstrap}
 server. These discovery servers are the only centralized infrastructure
 we need for Dat to work over the Internet, but they are redundant,
-interchangeable, never see the actual data being shared, anyone can run
+interchangeable, never see the actual data being shared, and anyone can run
 their own and Dat will still work even if they all are unavailable. If
 this happens discovery will just be manual (e.g.~manually sharing
 IP/ports).
