How to optimize a sort based on “latest” related model

Multi tool use
How to optimize a sort based on “latest” related model
So say we have two models
class Product(models.Model):
""" A model representing a product in a website. Has new datapoints referencing this as a foreign key daily """
name = models.CharField(null=False, max_length=1024, default="To be Scraped")
url = models.URLField(null=False, blank=False, max_length=10000)
class DataPoint(models.Model):
""" A model representing a datapoint in a Product's timeline. A new one is created for every product daily """
product = models.ForeignKey(Product, null=False)
price = models.FloatField(null=False, default=0.0)
inventory_left = models.BigIntegerField(null=False, default=0)
inventory_sold = models.BigIntegerField(null=False, default=0)
date_created = models.DateField(auto_now_add=True)
def __unicode__(self):
return "%s - %s" % (self.product.name, self.inventory_sold)
The goal is to sort a QuerySet of products based on the inventory_sold value of the latest datapoint attached to the product. Here's what I have so far:
products = Product.objects.all()
datapoints = DataPoint.objects.filter(product__in=products)
datapoints = list(datapoints.values("product__id", "inventory_sold", "date_created"))
products_d = {}
# Loop over the datapoints values array
for i in datapoints:
# If a datapoint for the product doesn't exist in the products_d, add the datapoint
if str(i["product__id"]) not in products_d.keys():
products_d[str(i["product__id"])] = {"inventory_sold": i["inventory_sold"], "date_created": i["date_created"]}
# Otherwise, if the current datapoint was created after the existing datapoint, overwrite the datapoint in products_d
else:
if products_d[str(i["product__id"])]["date_created"] < i["date_created"]:
products_d[str(i["product__id"])] = {"inventory_sold": i["inventory_sold"], "date_created": i["date_created"]}
# Sort the products queryset based on the value of inventory_sold in the products_d dictionary
products = sorted(products, key=lambda x: products_d.get(str(x.id), {}).get("inventory_sold", 0), reverse=True)
This works alright, but it's extremely slow with a high (500,000~) number of products and datapoints. Is there any better way to do this?
And on a side note (unimportant), since I haven't able to find anything about this, it seems like the unicode method of the DataPoint model is also making needless SQL queries. Is this something that's a default characteristic of Django models once they get passed to templates?
1 Answer
1
I think you can use a subquery here to annotate the value of the most recent datapoint, then sort on that.
Based on the example in those docs, it would be something like:
from django.db.models import OuterRef, Subquery
newest = DataPoint.objects.filter(product=OuterRef('pk')).order_by('-date_created')
products = Product.objects.annotate(
newest_inventory_sold=Subquery(newest.values('inventory_sold')[:1])
).order_by('newest_inventory_sold')
For your side point, to avoid the extra queries when outputting DataPoints you will need to use select_related
in the original query:
select_related
datapoints = DatePoint.objects.filter(...).select_related('product')
This will do a JOIN so that getting the product name doesn't cause a new db lookup.
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
That's perfect! I had earlier tried using a combination of annotate with Max, but that was definitely the wrong choice.This worked brilliantly. Thanks for the help!
– Wiggy A.
Jun 30 at 10:40