"Where's My Stuff" camera

joejoejoe11

New member
How much time do you spend looking for your phone / remote control / car keys?

The idea: a stationary camera is positioned in your living room / kitchen, possibly on the ceiling where it has a good field of view. Whenever you look for an object, you can ask (vocally) "where's my …?" and get a vocal response, such as "on the kitchen table" or "on the floor beneath the chair".

This can be good for:
  1. The Absent-minded
  2. ADHD
  3. Early stage dementia
  4. Finding things another person (such as family member) moved without your knowledge
  5. Fun??
For those of you who wonder, I am aware of the technical challenges: small object size, difficulties of object detection, occlusions, etc. There is also the concern about privacy; however, with increasingly popular home assistants, robots (e.g., Roomba), and security systems, I don't think this part will be a major hurdle.

EDIT: I am aware of the technical difficulties.

EDIT2: To clarify, due to doubts expressed in some comments. I'm a machine learning and computer vision expert. I know extremely well what I'm talking about.

My question to you, reddit: would such a product be desirable?
 
@joejoejoe11 The tech is there. It would just be really expensive to rig your whole house for it.

Using a combination of MANY cameras all running constantly, making real-time determinations that require a lot of compute on each camera, plus weight sensors and NFC tags the same way Amazon's brick-and-mortar stores work, you could do this. It would just be like $30k to outfit your house with this tech.

The consumer would have to:
  1. Be rich
  2. Have no privacy concerns whatsoever
  3. Be very forgetful
  4. Think solving their forgetfulness is worth more than the cost and the privacy
That market exists, but it's minuscule.
 
@renegade4god The cost, I imagine, should be a few hundred dollars at most:

There's no need to cover the entire house. The bedroom, for example, would *really* raise privacy concerns. So, as I said, mainly the kitchen and living room; other rooms are up to the user's choice.

Since this is a purely vision-based solution, there's no need for weight sensors or NFC tags.

Basically a few ceiling-mounted 360-degree cameras, or narrower-FOV cameras that track motion.

About bandwidth: I don't need to capture video all the time, only at a relatively low frequency (say, once a second) or when motion is detected. Most modern Wi-Fi networks should be able to support this.

In addition, there is the cost of maintaining a main server (which doesn't have to be in the user's home; that's a design choice) to perform the detection and identification of objects.

So the users don't have to be so rich, really.
 
@joejoejoe11 I'm sorry, but you're just flat out wrong. Why would I buy this if it's only going to know where stuff in my kitchen and living room is? Half the time it won't be able to help me find something.

And what I'm telling you is, this can't be a purely vision-based solution. Unless you're putting individual cameras with compute (like a DeepLens, which costs $250 a pop) in every corner and cabinet to cover the ENTIRE space. You brushed off occlusion in your OP, but you can't do that. It's a serious hurdle, and one that HAS to be addressed.

A few ceiling-mounted 360° cameras won't accomplish what you want more than maybe 10% of the time. And no one is going to go through the hassle of using something like that if it only works 10% of the time.

I never said anything about bandwidth. To accomplish what you want, you need full video processing at any point where there is motion. Once a second won't work for what you want to do. The cameras could have their own private mesh network; bandwidth isn't the issue. It's the constant processing of (what would have to be a very high-quality) video stream that's the issue.

What? You don't need a main server. Unless users need to access it from outside their home (which I see no reason for), you don't need to host anything; that's just an unnecessary security vulnerability. If the identification models are built into the cameras' compute capabilities, then the system only needs to store a few megabytes of data like:

| Item | Location | Date Seen |
| --- | --- | --- |
| Keys | Countertop | 2020-06-24 |
| Remote | Coffee Table | 2020-06-26 |

for every item.
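That "last seen" table is tiny by any standard. A minimal sketch of one, using an in-memory SQLite database (the schema and function names are illustrative, not a proposed design):

```python
import sqlite3

# Local "last seen" store -- a sketch of the few-megabytes table above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sightings (item TEXT, location TEXT, seen_date TEXT)")

def record_sighting(item, location, seen_date):
    # Parameterized insert; seen_date as ISO 8601 so dates sort as text.
    conn.execute("INSERT INTO sightings VALUES (?, ?, ?)",
                 (item, location, seen_date))

def last_seen(item):
    # Most recent sighting wins.
    return conn.execute(
        "SELECT location, seen_date FROM sightings "
        "WHERE item = ? ORDER BY seen_date DESC LIMIT 1",
        (item,),
    ).fetchone()

record_sighting("keys", "countertop", "2020-06-24")
record_sighting("remote", "coffee table", "2020-06-26")
record_sighting("keys", "kitchen table", "2020-06-27")

print(last_seen("keys"))  # ('kitchen table', '2020-06-27')
```

Everything stays on-device; only the detection pipeline that feeds `record_sighting` needs serious compute.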

Honestly, it's very clear that you have absolutely no understanding of how this technology works, so it's no surprise you're drastically underestimating the difficulty of this problem. And you aren't willing to listen to people trying to explain it to you.

This is my field of expertise. I am a data scientist. I have built a conservation drone that flies around my company's facilities and uses DeepLens technology to identify dangerous or endangered species in the area. I understand this tech. I know how it works. You need to learn to listen.
 
@joejoejoe11 This problem has been thought through, and I'm pretty sure I've read a few usability studies around it. We have the tech and we know how to solve it. There are essentially three parts to this problem:
  • vision: assuming you have cameras covering every square inch of your home, you can process the images, perform segmentation, and effectively tag items (you might initially have to ask your users questions, but that's nothing new)
  • knowledge base: once an artifact is identified, it needs to be stored effectively and be readily available to search.
  • search: your search will only be as effective as the classification you can perform on it. For example, when you say "where are my keys", you need to effectively build a context linking the artifact to entities. Google Images does that, so nothing new.
The problems:
  • privacy concerns
  • better classification, redoing classifications?
  • hard to describe things.
Problems other than privacy can be mitigated up to a certain extent using heuristics, e.g.:
  • user: where is my box?
  • AI: which box?
  • user: the one that I got from Costco (if the system had access to your location, it could minimize the search space by correlating the dates you were at Costco)
Just an example, but these tiny snippets will help improve performance (again, this has been done before).
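The Costco heuristic above amounts to filtering candidate sightings by date. A toy sketch, with invented data and a `max_days` window that is purely an assumption:

```python
from datetime import date

# Hypothetical data: candidate "box" sightings and the user's Costco visits.
box_sightings = [
    {"location": "garage shelf", "first_seen": date(2020, 6, 1)},
    {"location": "hallway floor", "first_seen": date(2020, 6, 20)},
]
costco_visits = [date(2020, 6, 19)]

def narrow_by_visits(sightings, visits, max_days=2):
    # Keep only boxes first seen within max_days after a store visit.
    return [
        s for s in sightings
        if any(0 <= (s["first_seen"] - v).days <= max_days for v in visits)
    ]

print(narrow_by_visits(box_sightings, costco_visits))
# Only the box that appeared the day after the Costco trip survives.
```

Each answered clarifying question shrinks the candidate set the same way, which is why these tiny dialogue snippets pay off.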

Companies like Samsung are building something similar but restricting the area they need to cover, e.g. smart refrigerators:
  • you need to watch only so much area
  • you need to segment and classify only so many items
  • this could then maybe be scaled up to cover, say, the kitchen or one room at a time.
You can ask questions like:
  • suggest recipes based on what I have in the fridge
  • am I running out of milk?
  • auto-populate grocery lists based on habits
Hope this helps :)
 
@truthseeker1971 Another problem to note is cost. A single high-def camera with onboard memory and compute will run you like $250. And this startup expects people to buy several of them to cover their whole house, just for the convenience of maybe being able to find something when you've misplaced it?
 
@truthseeker1971 This definitely helps :) ; a few remarks:

> This problem has been thought through

I'd really love it if you could post links here to where this has been thought through, the usability studies, and Samsung's product.

Specifically, I think a refrigerator is too difficult, definitely more so than the problem I'm trying to solve, if I limit myself to specific objects.

The problems you specify can be addressed with a one-time step, such as showing the system the phone and saying "this is my phone"; the system will be able to detect any object it has been familiarized with in this way.
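In principle, that enrollment step would be a nearest-neighbor match between an embedding of the observed object and stored reference embeddings. A sketch, where the vectors are made-up placeholders (in a real system they'd come from a pretrained network, and, as the replies point out, a single reference view is rarely enough):

```python
import numpy as np

# Placeholder reference embeddings captured during the "this is my phone"
# enrollment step -- invented numbers, purely for illustration.
enrolled = {
    "phone": np.array([0.9, 0.1, 0.0]),
    "keys":  np.array([0.1, 0.9, 0.2]),
}

def identify(query_vec, threshold=0.8):
    # Return the enrolled item most similar (cosine) to the query
    # embedding, or None if nothing clears the threshold.
    best_name, best_sim = None, threshold
    for name, ref in enrolled.items():
        sim = ref @ query_vec / (np.linalg.norm(ref) * np.linalg.norm(query_vec))
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name

print(identify(np.array([0.85, 0.15, 0.05])))  # phone
print(identify(np.array([0.0, 0.1, 0.9])))     # None -- no confident match
```

The matching itself is cheap; the hard part (which this sketch hides inside the placeholder vectors) is producing embeddings that stay stable across viewpoint, lighting, and partial occlusion.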

The Costco-box problem is interesting; however, for now I suggest addressing it Pareto-style: solve the most common problems well, and think about the long tail later.
 
@joejoejoe11 How on earth could you possibly think that doing this in a fridge would somehow be MORE difficult than doing it across the whole house?

> The problems you specify can be addressed with a one-time step, such as showing the system the phone and saying "this is my phone"; the system will be able to detect any object it has been familiarized with in this way.

That's not how machine learning works. If it sees your phone from a different angle, with different lighting, while partially obstructed, it won't be able to identify it. ML models need to be trained on large volumes of data.

A big reason you're underestimating the difficulty of this problem is because you apparently have absolutely NO idea how the technology works.
 