Benchmarks

Day 8 - When we ran some benchmarks for the index creation and hosting process.

Benchmarking the loading scripts

From a file named arxiv-metadata-oai-snapshot.json, containing metadata and abstracts for about 2M papers across 153 scientific categories, we generated partial indexes and evaluated how the pipeline could run in production, considering:

  • machine provisioning needed,
  • how to update regularly from the arXiv snapshots,
  • volume of data.
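
The snapshot is a JSON-lines file with one paper per line, so the natural way to build a monthly partial index is to stream it and keep only the papers for a given cutoff. Below is a minimal sketch of what the get_papers helper used in the profiled script could look like; it assumes the cutoff is matched against the yymm prefix of new-style arXiv ids, which may differ from the exact filtering rule in our script.

import json

def get_papers(input_path, year_month):
    """Yield papers from the arXiv JSON-lines snapshot for a given YYYYMM cutoff.

    Streaming line by line keeps memory flat regardless of the snapshot size.
    """
    yymm = year_month[2:]  # e.g. "200811" -> "0811", the new-style arXiv id prefix
    with open(input_path) as f:
        for line in f:
            paper = json.loads(line)
            if paper.get("id", "").startswith(yymm):
                yield {
                    "id": paper["id"],
                    "title": paper["title"],
                    "abstract": paper["abstract"],
                    "categories": paper["categories"],
                }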

Generating Embeddings

First, we evaluated the Jupyter notebooks from the Redis demo repository RedisVentures/redis-arXiv-search.

Notebook                           Machine                                        Time
arxiv-embeddings.ipynb             Apple M1 Pro 8-core                            17 min
arxiv-embeddings.ipynb             Saturn Cloud T4-XLarge 4-cores                 4 min
single-gpu-arxiv-embeddings.ipynb  T4-XLarge 4-cores, saturn-python-rapids image  30 min

Loading the resulting pickle index into Redis is then easy from a normal desktop machine, as sketched after the table below.

File                        Machine              Time
arxiv_embeddings_10000.pkl  Apple M1 Pro 8-core  6 min
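
Below is a minimal sketch of such a load using plain redis-py rather than the redis-om models our script uses; it assumes the pickle holds a DataFrame with a vector column and stores each paper as a Redis hash under the THM:Paper prefix that appears later in the profiling section. The embedding is serialized as raw float32 bytes, which is the layout RediSearch vector fields index.

import numpy as np
import pandas as pd
import redis

def load_pickle_to_redis(pickle_path, redis_url="redis://localhost:6379"):
    """Write each row of a pickled DataFrame into a Redis hash."""
    df = pd.read_pickle(pickle_path)
    conn = redis.from_url(redis_url)

    # A non-transactional pipeline batches the HSET commands into few round trips.
    pipe = conn.pipeline(transaction=False)
    for record in df.to_dict("records"):
        pipe.hset(
            f"THM:Paper:{record['id']}",
            mapping={
                "title": record["title"],
                "abstract": record["abstract"],
                "categories": record["categories"],
                "vector": np.array(record["vector"], dtype=np.float32).tobytes(),
            },
        )
    pipe.execute()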

Profiling the code

After refactoring the notebooks into Python scripts, we used Delgan/loguru to add logs, tqdm/tqdm to track throughput, and pythonprofilers/memory_profiler to find the pain points.
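
For reference, here is a small self-contained example (not our actual script) of how the three tools plug together: loguru for logging, tqdm.pandas() to register the progress_apply call that shows up in the profile below, and memory_profiler's @profile decorator for the line-by-line reports.

import pandas as pd
from loguru import logger
from memory_profiler import profile
from tqdm import tqdm

# Register `DataFrame.progress_apply` so row-wise work reports its throughput (it/s).
tqdm.pandas()

@profile  # line-by-line memory report, e.g. `python3 -m memory_profiler demo.py`
def run():
    logger.info("Building a toy frame...")
    df = pd.DataFrame({"title": ["a", "b"], "abstract": ["x", "y"]})
    df["text"] = df.progress_apply(lambda r: r["title"] + " " + r["abstract"], axis=1)
    logger.info("Done: {} rows", len(df))

if __name__ == "__main__":
    run()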

In the generate_index.py script, the model.encode(sentence) call was the largest pain point, so finding an efficient way to run it mattered most; a batched alternative is sketched after the profile below.

$ python3 -m memory_profiler generate_index.py \
  --year_month=200811 \
  --input_path="arxiv-metadata-oai-snapshot.json" \
  --output_path="arxiv/cutoff=200811/output.pkl" \
  --model_name="sentence-transformers/all-MiniLM-L12-v2"

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    33    292.0 MiB    292.0 MiB           1   @profile
    34                                         def run(
    35                                             year_month,
    36                                             input_path="arxiv-metadata-oai-snapshot.json",
    37                                             output_path="arxiv_embeddings_10000.pkl",
    38                                             model_name="sentence-transformers/all-mpnet-base-v2",
    39                                         ):
    40                                             """Generate Embeddings and Create a File Index."""
    41
    42    292.0 MiB      0.0 MiB           1       logger.info(f"Reading papers for {year_month}...")
    43    315.4 MiB     23.4 MiB           1       df = pd.DataFrame(get_papers(input_path, year_month))
    44
    45    315.9 MiB      0.4 MiB           1       logger.info("Getting categories predictions")
    46                                             # df["predicted_categories"] = get_paper_classification_predictions(
    47                                             #     df["title"] + " " + df["abstract"], top_k=3
    48                                             # )
    49
    50                                             # https://www.sbert.net/docs/usage/semantic_textual_similarity.html
    51    447.0 MiB    131.1 MiB           1       model = SentenceTransformer(model_name)
    52
    53    447.0 MiB      0.0 MiB           1       logger.info("Creating embeddings from title and abstract...")
    54    447.0 MiB      0.0 MiB           1       logger.info(model_name)
    55
    56    724.2 MiB    103.8 MiB           2       df["vector"] = df.progress_apply(
    57    620.4 MiB -3613776.1 MiB       59149           lambda x: _featurize(model, x["title"], x["abstract"]), axis=1
    58                                             )
    59    729.7 MiB      5.5 MiB           1       df = df.reset_index().drop("index", axis=1)
    60
    61    730.0 MiB      0.3 MiB           1       df = df.reset_index().drop("index", axis=1)
    62
    63    731.5 MiB      1.5 MiB           1       logger.info("Exporting to pickle file...")
    64    731.5 MiB      0.0 MiB           1       with open(output_path, "wb") as f:
    65    917.0 MiB    185.5 MiB           1           data = pickle.dumps(df)
    66    919.8 MiB      2.7 MiB           1           f.write(data)

As we can see, using pandas to load and store the data wasn't optimal because of its memory overhead.
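
A leaner version could skip pandas entirely and batch the encoding: one model.encode call over the whole corpus lets sentence-transformers batch the forward passes itself instead of ~59k single-row calls, and pickling a plain list of dicts avoids the DataFrame plus pickle.dumps double copy visible at the end of the profile. The sketch below assumes papers is the list of dicts yielded by get_papers; it is an illustration, not our final script.

import pickle

from sentence_transformers import SentenceTransformer

def generate_index(papers, model_name, output_path, batch_size=64):
    """Encode all papers in batches and pickle plain dicts instead of a DataFrame."""
    model = SentenceTransformer(model_name)

    # Encode title + abstract for the whole partition in one batched call.
    sentences = [p["title"] + " " + p["abstract"] for p in papers]
    vectors = model.encode(sentences, batch_size=batch_size, show_progress_bar=True)

    for paper, vector in zip(papers, vectors):
        paper["vector"] = vector.tolist()

    # A plain list of dicts keeps only one copy of the records in memory.
    with open(output_path, "wb") as f:
        pickle.dump(papers, f)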

$ python3 -m memory_profiler load_data.py \
  --concurrency_level=2 \
  --separator="|"  \
  --vector_size=384 \
  --reset_db=False \
  --embeddings_path="arxiv/cutoff=200811/output.pkl"

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    98     67.0 MiB     67.0 MiB           1   @profile
    99                                         def run(
   ...
   107
   108     67.2 MiB      0.2 MiB           1       config = get_settings()
   109
   110     67.2 MiB      0.0 MiB           1       if reset_db:
   111                                                 logger.info(f"TODO {reset_db}")
   112
   113     67.2 MiB      0.0 MiB           2       Paper.Meta.database = get_redis_connection(
   114     67.2 MiB      0.0 MiB           1           url=config.get_redis_url(), decode_responses=True
   115                                             )
   116     67.2 MiB      0.0 MiB           1       Paper.Meta.global_key_prefix = "THM"
   117     67.2 MiB      0.0 MiB           1       Paper.Meta.model_key_prefix = "Paper"
   118
   119     67.2 MiB      0.0 MiB           1       redis_conn = redis.from_url(config.get_redis_url())
   120
   121    165.1 MiB     97.8 MiB           2       asyncio.run(
   122     67.2 MiB      0.0 MiB           2           load_all_data(
   ...


    51   67.969 MiB   67.969 MiB           1   @profile
    52                                         async def load_all_data(
   ...
    62   67.984 MiB    0.016 MiB           1       logger.info("Loading papers...")
    63  607.562 MiB  539.578 MiB           1       papers = read_paper_df(embeddings_path).head(1)
    64  152.219 MiB -455.344 MiB           1       papers = papers.to_dict("records")
    65
    66  152.219 MiB    0.000 MiB           1       logger.info("Writing to Redis...")
    67  152.578 MiB  304.844 MiB           4       await gather_with_concurrency(
    68  152.219 MiB    0.000 MiB           2           redis_conn, concurrency_level, separator, vector_size, *papers
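
Here most of the memory goes into reading the pickled DataFrame; the Redis writes stay cheap because they are capped by gather_with_concurrency. Our helper also carries the Redis connection, separator and vector size, but the core of such a function is just a semaphore around asyncio.gather. A minimal sketch of that pattern (not our exact signature):

import asyncio

async def gather_with_concurrency(limit, *coros):
    """Run the given coroutines with at most `limit` of them in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(coro):
        # Each task waits for a free slot before awaiting its coroutine.
        async with semaphore:
            return await coro

    return await asyncio.gather(*(bounded(c) for c in coros))

It would be invoked roughly as await gather_with_concurrency(concurrency_level, *(write_paper(redis_conn, p) for p in papers)), where write_paper is whatever coroutine stores one record (the name is hypothetical).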

Load testing the HTTP Server

Using wg/wrk, we ran a few tests to see how much load the HTTP server and Redis could handle when many clients connect at once.

$ wrk -t4 -c20 -d30s https://thm-cli.community.saturnenterprise.io/api/docs

Running 30s test @ https://thm-cli.community.saturnenterprise.io/api/docs
  4 threads and 20 connections

  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   105.53ms   21.50ms 461.06ms   88.24%
    Req/Sec    47.63      7.97    80.00     79.85%
  5629 requests in 30.11s, 5.74MB read
Requests/sec:    186.98
Transfer/sec:    195.19KB
$ wrk -t4 -c10 -d30s --script benchmark/post_similar.lua https://thm-cli.community.saturnenterprise.io/api/v1/paper/vectorsearch/text
Running 30s test @ https://thm-cli.community.saturnenterprise.io/api/v1/paper/vectorsearch/text
  4 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   352.05ms   52.12ms 697.77ms   89.43%
    Req/Sec     5.96      2.66    10.00     50.36%
  672 requests in 30.04s, 12.83MB read
Requests/sec:     22.37
Transfer/sec:    437.32KB
$ wrk -t4 -c10 -d30s --script benchmark/post_text.lua https://thm-cli.community.saturnenterprise.io/api/v1/paper/vectorsearch/text/user
Running 30s test @ https://thm-cli.community.saturnenterprise.io/api/v1/paper/vectorsearch/text/user
  4 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   575.41ms  171.71ms   1.48s    89.46%
    Req/Sec     4.16      3.14    10.00     58.23%
  408 requests in 30.06s, 8.60MB read
Requests/sec:     13.57
Transfer/sec:    292.92KB

Discussing the Cost of Hosting

Pricing of the stack

  • Redis Cloud Enterprise: $0.881/hr = USD 600/month
  • Saturn Cloud Notebooks and Deployment: $0.21/hour = USD 150/month

The total is about USD 750/month. You might be able to decrease costs a bit by using infrastructure as a service instead of platform as a service, but that would require more DevOps skills to configure cloud accounts on GCP or AWS, for example.
