Usage
Becoming an index provider involves three main parts:
- 1. A DagStore migration
- 2. Indexing announcement of available content
- 3. Content routing verification
The DagStore migration is required to roll out the new CARv2 indexing format, MultihashIndexSorted. It regenerates the indices with the latest CARv2 index format, which includes the full multihash of CIDs in each DagStore shard. For more information, see the DagStore Lotus documentation.
The indexing announcement publishes advertisements about the content available for retrieval onto a gossipsub topic, which is listened to by a set of indexer nodes.
The advertisements are then processed by the indexer in order to provide an endpoint that allows clients to look up where and how to retrieve data for a given multihash. For more information, see Indexer Node Design.
The content routing verification involves checking that the advertisements published by the Lotus instance are ingested by the indexer nodes and that, for a given stored multihash, the correct provider is returned by the indexer node.
Note that ingestion of advertisements by the indexer nodes is progressive; total ingestion time depends on the number of multihashes being advertised. Therefore, verification needs to be performed with some delay after the indexing announcement is made.
The new index format stores the multihash code as well as the digest. As a result, DagStore index files are slightly larger, but the size increase is negligible.
The index provider integration announces changes to the chain of advertisements onto a gossipsub topic named /indexer/ingest/mainnet, which is propagated through the Lotus daemon onto the indexer nodes.
The index provider integration exposes a GraphSync server which serves requests from indexer nodes to sync the list of advertised multihashes provided by the SP.
Note that the server is exposed on the address configured under the IndexProvider configuration section, with key ListenAddresses. The advertisements will include the address configured under the same section with key AnnounceAddresses. You must make sure both sets of addresses are publicly reachable. For more information, see Indexer Provider Configuration.
The index provider integration shares a datastore with the markets process, wrapped under the namespace index-provider. The datastore entries stored include:
  - 1. internal mappings used by the index provider engine, and
  - 2. cache of chained multihash chunks.
The storage used by the internal mappings is negligible.
The storage used by caching is bounded by an LRU cache, the maximum size of which is configured to 1024, i.e. the number of chunks cached. The maximum length of each chunk is configured to 16,384 multihashes.
The exact storage usage, then, depends on the number of multihashes stored in a single chunk and the size of each multihash, and can be calculated as:
    1024 * 16384 * <multihash-length>
For example, caching 128-bit (16-byte) multihashes results in chunk sizes of 0.25 MiB, with a maximum cache growth of 256 MiB.
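The arithmetic above can be checked with a few lines of shell. This is a minimal sketch: the function name max_cache_bytes is hypothetical, and the multihash length is assumed to be given in bytes (16 bytes for a 128-bit multihash).

```shell
#!/usr/bin/env sh
# max_cache_bytes LINK_CACHE_SIZE LINKED_CHUNK_SIZE MULTIHASH_BYTES
# Hypothetical helper: upper bound on cache storage, in bytes.
max_cache_bytes() {
    echo $(( $1 * $2 * $3 ))
}

# Defaults from this page: 1024 chunks of 16384 multihashes, 16 bytes each.
bytes="$(max_cache_bytes 1024 16384 16)"
echo "chunk size: $(( 16384 * 16 / 1024 )) KiB"
echo "max cache:  $(( bytes / 1024 / 1024 )) MiB"
```

Longer multihashes scale the result linearly, so a 32-byte digest would double both figures.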
Note that the LRU cache may grow beyond its maximum size if the chain of chunks generated for a single advertisement is longer than the configured LinkCacheSize. This is to avoid partial caching of chunks within a single advertisement. The cache expansion is logged at INFO level in the provider/engine logging subsystem and can be monitored for diagnostic purposes.

- 1. Stop the daemon, miner and markets processes
Stop all Lotus processes to suspend any changes to the state of the DagStore during the rollout. The daemon needs to be stopped in order to roll out a change to the API that protects connections between the markets process and the daemon libp2p node.
- 2. Back up the existing DagStore repository
The DagStore repository is located at $LOTUS_MARKETS_PATH/dagstore by default. Make a copy of that folder. This is necessary for:
  - 1. verifying that the expected shard indices are re-generated after migration, and
  - 2. ❗rolling back the changes in case of an error.
- 3. Delete the existing DagStore repository
Delete the DagStore repository, located at $LOTUS_MARKETS_PATH/dagstore by default. The absence of the repository signals to the Lotus instance that a DagStore migration is needed, and one is triggered automatically when the markets instance starts up.
- 4. Rotate any existing Lotus log files and adjust log level
For easier debugging, rotate any existing logs so that the new logs only include output generated by the target release.
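The backup and delete steps (2 and 3) can be combined so that the repository is only removed once the copy has succeeded. The following is a minimal sketch, not an official tool: the function name backup_and_remove and the dagstore.bak target are hypothetical, and it assumes all Lotus processes are already stopped.

```shell
#!/usr/bin/env sh
# backup_and_remove SRC DST
# Hypothetical helper: copy SRC to DST, then delete SRC only if the copy succeeded.
backup_and_remove() {
    src="$1"
    dst="$2"
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    # Refuse to overwrite an earlier backup.
    [ ! -e "$dst" ] || { echo "backup target already exists: $dst" >&2; return 1; }
    # -R copies recursively, -p preserves modes and timestamps.
    cp -Rp "$src" "$dst" && rm -rf "$src"
}

# Example invocation (left commented out because it is destructive):
# backup_and_remove "$LOTUS_MARKETS_PATH/dagstore" "$LOTUS_MARKETS_PATH/dagstore.bak"
```

Keeping the backup next to the original makes the later comparison of index files and any rollback a matter of moving one directory.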
- 5. Deploy daemon and miner processes
Deploy the index provider tag -master-spx.idxprov.rc-1 to the daemon and miner processes and wait until they are fully started and ready.
Note: This tag is based off release/v1.15.0, thus it also supports the upcoming OhSnap network v15 upgrade!
They are ready when the following commands succeed:
    lotus-miner storage-deals list
    lotus-miner sectors list
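Waiting for both commands to succeed can be scripted with a small retry loop. This is a sketch under stated assumptions: the helper name wait_until_ok, the 60 attempts and the 5 second delay are all illustrative choices, not values taken from Lotus.

```shell
#!/usr/bin/env sh
# wait_until_ok ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: retry COMMAND until it exits 0 or the attempts run out.
wait_until_ok() {
    attempts="$1"
    delay="$2"
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Probe readiness only when lotus-miner is installed on this machine.
if command -v lotus-miner >/dev/null 2>&1; then
    wait_until_ok 60 5 lotus-miner storage-deals list &&
        wait_until_ok 60 5 lotus-miner sectors list &&
        echo "daemon and miner are ready"
fi
```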
- 6. Deploy the target release on the markets process
- 7. Start the markets process
Start only the markets process and wait for the following log line in the markets process logs:
    dagstore migration completed successfully
This indicates that the list of shards that require initialisation has been queued for processing. See [DagStore First-time Migration](https://lotus.filecoin.io/docs/storage-providers/dagstore/#first-time-migration) for more information. See Indexer Provider Config to customize the configuration of the subsystem that announces indexes.
- 8. Configure logging subsystems
Make sure your Lotus installation persists the log files for future debugging. Set the log level for the following subsystems on the markets node to INFO:
    go-legs-gpubsub
    provider/engine
    dagstore
To do this, run the following command:
    lotus-miner --call-on-markets log set-level --system provider/engine --system go-legs-gpubsub --system dagstore info
- 9. Initialise the DagStore shards
To start the initialisation of DagStore shards, run:
    lotus-miner dagstore initialize-all --concurrency=N
if you run a monolith miner process, or
    lotus-miner --call-on-markets dagstore initialize-all --concurrency=N
if you have split your market subsystem.
⚠️ Initialization places IO workload on your storage system. N controls the number of deals that are concurrently initialised. See the DagStore Force Bulk Initialisation docs for more information.
Wait for the initialisation to complete. The initialisation time depends on the volume of data stored, since it involves re-indexing the data blocks.
- 10. Verify re-creation of DagStore repository
The successful completion of the previous step should recreate the DagStore repository, located at $LOTUS_MARKETS_PATH/dagstore. Navigate to that directory. Under the subfolder index, verify that matching *.full.idx files can be found for all files under the same sub-directory in the backup of the DagStore taken in step 2.
- 11. ✨ Announce all indices to the indexers
To announce all the indices in bulk to the indexers, run:
    lotus-miner index announce-all
if you run a monolith miner process, or
    lotus-miner --call-on-markets index announce-all
if you have split your market subsystem.
This command generates advertisements and publishes indices onto the indexer gossipsub channel. In the markets logs, look for a series of log lines that include "deal announcement sent to index provider". You should see one such log line per deal. The log line also includes the advertisement CID, the deal proposal CID to which it belongs and the shardKey from which its multihash entries are generated. The logs should also include lines that report the number of multihash entries each advertisement includes. For example:
    deal announcement sent to index provider {"advertisementCid": "baguqeeqqvr2irdrq45d7npj7elogzpaaam", "shard-key": "baga6ea4seaqegic2h4qoao4urcwhin7tgwlb4cguqymtriheqoyjjaabz6viegq", "proposalCid": "bafyreihhqszkcv3egsb7xkuhyswqjdy3oa2kboi2zjvrbkuj3jgq2g4d4m"}
    Generated linked chunks of multihashes {"totalMhCount": 32449, "chunkCount": 2}
Note that the bulk advertisement only announces deals that are not expired and have been handed over to the sealing subsystem. Expired deals will not be advertised. For any remaining deals, the advertisement will occur after they are handed over to the sealing subsystem.
Wait for the bulk indexing announcement to complete. The bulk announcement is complete when "finished announcing active deals to index provider" is logged.
- 12. Verify indices in DagStore repository are ingested by indexer nodes
To verify ingestion, download and install the latest provider CLI tool. The built binaries can be found under the assets attached for each target platform.
Once installed, download the following script.
Script below to be provided as a downloadable .sh file; for now pasted below for review purposes.
    #!/usr/bin/env sh
    # Positional arguments; the first two are mandatory.
    MINER_ID="${1:?miner peer ID must be specified as the first argument}"
    DAGSTORE_REPO="${2:?dagstore repo location must be specified as the second argument}"
    SAMPLING_PROB="${3-0.05}"
    echo "MINER_ID: ${MINER_ID}"
    echo "DAGSTORE_REPO: ${DAGSTORE_REPO}"
    echo "SAMPLING_PROB: ${SAMPLING_PROB}"
    echo ""
    # Check a random sample of multihashes from every full index file.
    for idx in "${DAGSTORE_REPO}"/index/*.full.idx
    do
        echo "Verifying ${idx}"
        provider verify-ingest \
            --to cid.contact:80 \
            --print-unindexed-mhs \
            --provider-id "${MINER_ID}" \
            --from-car-index "${idx}" \
            --sampling-prob "${SAMPLING_PROB}"
        echo ""
    done
The script takes two mandatory arguments:
  - Miner peer ID as the first argument, and
  - Path to dagstore repository as the second argument, e.g. $LOTUS_MARKETS_PATH/dagstore
A third argument may optionally be specified as a number between 0.0 and 1.0, which sets the selection probability of the multihash sample that is verified for ingestion; it is set to 5% by default.
You can find out what your peer ID is using:
    lotus-miner net id
This script will:
  - iterate over all index files that match the *.full.idx name,
  - select a random sample of the multihashes from each index (5% by default), and
  - verify that the indexer node has those multihashes associated with the given MINER_ID as the provider.
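To get a feel for how much work a given sampling probability implies, the expected sample size can be estimated as the index size times the probability. This is a rough sketch: the helper name expected_sample is hypothetical, and the totalMhCount of 32449 from the example log line is used here as a stand-in for the number of multihashes in one index.

```shell
#!/usr/bin/env sh
# expected_sample TOTAL PROB
# Hypothetical helper: approximate number of multihashes checked for one index,
# truncated to a whole number.
expected_sample() {
    awk -v n="$1" -v p="$2" 'BEGIN { printf "%d\n", n * p }'
}

# Roughly how many of 32449 multihashes the default 5% sample covers.
expected_sample 32449 0.05
```

Because selection is random, the actual number checked on any run will vary around this estimate.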
A verification result is printed for each file. Verify that verification is successful for each of the files. See provider verify-ingest -h for example output.
You can adjust the values under the IndexProvider section in the config.toml of your markets process to configure the announcement of indexes to the indexer. If the section doesn't exist, you can add it manually:
    [IndexProvider]
      # The maximum number of multihash chunk links that index provider cache can store before
      # LRU eviction. If chunks belonging to a single advertisement are larger than the cache can
      # hold, the cache is resized to be able to hold all links. The actual disk usage depends on
      # LinkedChunkSize and the length of multihashes. For example, for 128-bit long multihashes
      # with the default LinkedChunkSize and LinkCacheSize, the cache size can grow to 256MiB.
      #
      # type int
      # env var: LOTUS_INDEXPROVIDER_LINKCACHESIZE
      #LinkCacheSize = 1024

      # The number of multihashes in each chunk of the
      # advertised multihash entries linked list. If multihashes are 128-bit, then
      # setting LinkedChunkSize = 16384 will result in blocks of 0.25MiB when
      # full.
      #
      # type int
      # env var: LOTUS_INDEXPROVIDER_LINKEDCHUNKSIZE
      #LinkedChunkSize = 16384

      # The gossipsub topic name used to publish change to the advertised content.
      #
      # env var: LOTUS_INDEXPROVIDER_PUBSUBTOPIC
      #PubSubTopic = "/indexer/ingest/mainnet"

      # Whether to purge all cached entries on start-up.
      #
      # env var: LOTUS_INDEXPROVIDER_PURGELINKCACHE
      #PurgeLinkCache = false

      # Binding address for the libp2p host contacted by indexer nodes to sync the list of advertised
      # multihashes. Note that when port is set to 0 a random port is generated at runtime and may be
      # different on every restart. The format of the strings specified must conform to multiaddress;
      # see https://multiformats.io/multiaddr/
      #
      # type: []string
      # env var: LOTUS_INDEXPROVIDER_LISTENADDRESSES
      #ListenAddresses = ["/ip4/0.0.0.0/tcp/0", "/ip6/::/tcp/0"]

      # The addresses of the endpoints at which the data associated to the advertised
      # multihashes can be retrieved. If not specified, the ListenAddresses are used instead. The format
      # of the strings specified must conform to multiaddress; see https://multiformats.io/multiaddr/
      #
      # type: []string
      # env var: LOTUS_INDEXPROVIDER_ANNOUNCEADDRESSES
      #AnnounceAddresses = []

      # The maximum number of simultaneous requests syncing the list of advertised multihashes between
      # the indexers and the index provider.
      #
      # type: uint64
      # env var: LOTUS_INDEXPROVIDER_MAXSIMULTANEOUSTRANSFERS
      #MaxSimultaneousTransfers = 20
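The env var names listed in the configuration comments suggest that each key can also be set through the environment. A minimal sketch, assuming the markets process reads these variables at start-up and that they take precedence over config.toml (worth confirming against your Lotus release); the values shown are illustrative only:

```shell
# Assumption: these variables are read when the markets process starts.
# Double the number of cached chunk links (default is 1024).
export LOTUS_INDEXPROVIDER_LINKCACHESIZE=2048
# Drop previously cached entries on the next start-up.
export LOTUS_INDEXPROVIDER_PURGELINKCACHE=true
```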