Unstructured Street Data in New York

Tim Spann
Jun 7, 2024

Milvus, Vector Database, Deep Learning, AI, Gen AI, Vision, Street Cameras, Open Data, Vision Processing, Ultralytics, Slack, PyTorch

Since I am in New York preparing for some incredible meetups (and can’t wait to interact with everyone), I wanted to start building some New York City use cases for unstructured data processing and vector databases.

You may have seen my previous article looking at this data, where we used Ultralytics YOLOv8 to find cool stuff in the camera images and post it to Slack.

I have found I can do a lot with Python, Milvus and some killer libraries, so I rebuilt this pipeline, and it’s cool. I am using Python 3.11 and 3.12 on macOS (Apple M1) and Ubuntu 20 on AMD, with the latest Milvus and the PyMilvus SDK. I am running Attu in Docker to manage Milvus.

For this project, we will need Python 3.11+ and these libraries.

LIBRARIES

  • ultralytics
  • slack_sdk
  • pymilvus
  • scikit-learn
  • timm
  • torch
  • numpy
  • torchvision
  • pillow
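A quick way to confirm these libraries are installed before running the pipeline is sketched below. It assumes the usual import names, which differ from the pip names for a few packages (scikit-learn imports as sklearn, pillow as PIL); the missing_packages helper is hypothetical and not part of the original code.

```python
import importlib.util

# pip package name -> module name used in import statements (assumed usual names)
REQUIRED = {
    "ultralytics": "ultralytics",
    "slack_sdk": "slack_sdk",
    "pymilvus": "pymilvus",
    "scikit-learn": "sklearn",
    "timm": "timm",
    "torch": "torch",
    "numpy": "numpy",
    "torchvision": "torchvision",
    "pillow": "PIL",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All libraries found.")
```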

This demo reuses some great portions of this example, so if you tried that one before, a lot of it will look familiar.

I am going to build a simple unstructured data processing pipeline with Python and show you what I am doing step by step. Below is the full code if you want to jump ahead.

Step 1: Load the List of Cameras from the 511NY REST JSON Endpoint

import requests
response = requests.get(url).content  # url: the 511NY getcameras endpoint (NYURL in the run command below)
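requests.get(url).content returns the raw bytes of the camera list but has no timeout and does not check the HTTP status. A sketch of the same fetch with both, using the standard library's urllib.request as a stand-in for requests so it runs without extra packages; the fetch_bytes name is hypothetical.

```python
from urllib.request import urlopen


def fetch_bytes(url, timeout=30):
    """Download a URL and return the response body as bytes.

    urlopen raises urllib.error.HTTPError for HTTP error statuses and
    urllib.error.URLError for connection problems, so a caller only
    receives a payload when the request succeeded.
    """
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

With requests, the equivalent is resp = requests.get(url, timeout=30), then resp.raise_for_status(), then resp.content.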

Step 2: Load JSON

import json
json_object = json.loads(response)  # a list of camera dictionaries
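json.loads turns the payload into a list of camera dictionaries. The sketch below keeps only the fields the rest of the pipeline uses and skips cameras without a still-image URL. Apart from VideoUrl, the field names (Name, RoadwayName, DirectionOfTravel, Url, Latitude, Longitude, Disabled, Blocked) and the "lat,long" string format are assumptions about the 511NY response, so check them against a real payload; parse_cameras is a hypothetical helper.

```python
import json


def parse_cameras(raw):
    """Turn the raw 511NY JSON payload into a list of simplified camera dicts."""
    cameras = []
    for item in json.loads(raw):
        # Skip cameras flagged as unavailable or lacking a snapshot URL (assumed flags).
        if item.get("Disabled") or item.get("Blocked") or not item.get("Url"):
            continue
        cameras.append({
            "name": item.get("Name", ""),
            "roadwayname": item.get("RoadwayName", ""),
            "directionoftravel": item.get("DirectionOfTravel", ""),
            "url": item["Url"],
            "videourl": item.get("VideoUrl", ""),
            # Store latitude and longitude together as one "lat,long" string.
            "latlong": "{0},{1}".format(item.get("Latitude"), item.get("Longitude")),
        })
    return cameras
```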

Step 3: Load the YOLOv8 Model

from ultralytics import YOLO
yolomodel = YOLO('yolov8n.pt')  # nano detection model, downloaded on first use

Step 4: Connect to Slack

from slack_sdk import WebClient
client = WebClient(token=slack_token)  # bot token, e.g. read from SLACK_BOT_TOKEN

Step 5: Connect to Milvus

from pymilvus import MilvusClient
milvus_client = MilvusClient(uri=MILVUS_URL)
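The Slack and Milvus connections rely on slack_token and MILVUS_URL already being set. One way to supply them is sketched below under stated assumptions: the Slack token and camera-list URL come from the SLACK_BOT_TOKEN and NYURL variables used in the run command later in this article, while the MILVUS_URL environment variable and its http://localhost:19530 default (the standard port of a local Milvus standalone install) are hypothetical choices, as are the Settings and load_settings names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    slack_token: str
    camera_list_url: str
    milvus_url: str


def load_settings(env=None):
    """Read pipeline settings from environment variables, failing fast if required ones are missing."""
    env = os.environ if env is None else env
    missing = [key for key in ("SLACK_BOT_TOKEN", "NYURL") if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        slack_token=env["SLACK_BOT_TOKEN"],
        camera_list_url=env["NYURL"],
        # Hypothetical variable; defaults to a local Milvus standalone instance.
        milvus_url=env.get("MILVUS_URL", "http://localhost:19530"),
    )
```

Usage would then be settings = load_settings(), followed by WebClient(token=settings.slack_token) and MilvusClient(uri=settings.milvus_url).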

Step 6: Build a Schema

from pymilvus import CollectionSchema
schema = CollectionSchema(fields=fields)  # fields: list of FieldSchema definitions
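The insert at the end of this walkthrough writes one vector plus seven scalar fields (filepath, url, videourl, latlong, name, roadwayname, directionoftravel) per image, so the schema needs a field for each, with the primary key generated by auto_id. A sketch of a record type that produces exactly one row dict per image and checks the vector length against the collection dimension; CameraImageRecord and to_row are hypothetical names.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class CameraImageRecord:
    """One row for the camera collection; the auto_id primary key is filled in by Milvus."""
    vector: List[float]
    filepath: str
    url: str
    videourl: str
    latlong: str
    name: str
    roadwayname: str
    directionoftravel: str

    def to_row(self, dimension):
        """Return the dict to pass to milvus_client.insert, checking the vector size first."""
        if len(self.vector) != dimension:
            raise ValueError("vector has {0} values, collection expects {1}".format(
                len(self.vector), dimension))
        return asdict(self)
```

In PyMilvus, each of these fields would be declared as a FieldSchema, with DataType.FLOAT_VECTOR and dim=DIMENSION for vector and DataType.VARCHAR plus a max_length for the strings.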

Step 7: Create a Collection

milvus_client.create_collection(COLLECTION_NAME, DIMENSION, schema=schema, metric_type="COSINE", auto_id=True)

Step 8: Create an index

index_params = milvus_client.prepare_index_params()
index_params.add_index(field_name="vector", metric_type="COSINE")
milvus_client.create_index(COLLECTION_NAME, index_params)
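Both the collection and the index use metric_type="COSINE". For a stored image embedding a and a query embedding b, each with d = DIMENSION components, the cosine similarity is defined as below; higher scores mean more similar images, up to 1 when the two vectors point in the same direction.

```latex
\cos(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i=1}^{d} a_i b_i}{\sqrt{\sum_{i=1}^{d} a_i^2}\,\sqrt{\sum_{i=1}^{d} b_i^2}}
```

If the embeddings are L2-normalized before insertion, both norms equal 1 and the score reduces to the dot product.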

Step 9: Iterate through JSON and Write fields and images

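import uuid
import requests

for jsonitems in json_object:
    url = jsonitems['Url']  # camera snapshot image URL (field name assumed), not the camera-list URL
    videourl = jsonitems['VideoUrl']
    filepath = "camimages/{0}.jpg".format(uuid.uuid4())  # illustrative local path for the raw image
    img = requests.get(url)
    if img.status_code == 200:
        with open(filepath, 'wb') as f:
            f.write(img.content)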
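The write above assumes filepath points into a directory that already exists. A sketch of a helper that creates the folder on first use and gives each snapshot a unique name, in the same uuid4 style the article uses for result images; the save_image name, the camimages default folder, and the .jpg extension are assumptions.

```python
import os
import uuid


def save_image(content, directory="camimages", extension=".jpg"):
    """Write downloaded image bytes to a uniquely named file and return its path."""
    os.makedirs(directory, exist_ok=True)  # create the output folder if it is missing
    filepath = os.path.join(directory, "{0}{1}".format(uuid.uuid4(), extension))
    with open(filepath, "wb") as f:
        f.write(content)
    return filepath
```

Inside the loop this would read: if img.status_code == 200: filepath = save_image(img.content).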

Step 10: Run YOLOv8 Prediction on the Image

results = yolomodel.predict(filepath, stream=False, save=True, imgsz=640, conf=0.5)
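In this call, conf=0.5 tells Ultralytics to drop detections whose confidence score does not clear 0.5, and save=True writes annotated copies of the image to disk. A pure-Python sketch of what that confidence threshold does, applied to hypothetical (label, score) pairs; it mirrors the idea only and is not Ultralytics' internal code.

```python
def filter_detections(detections, conf=0.5):
    """Keep (label, score) pairs scoring above the threshold, highest score first."""
    kept = [(label, score) for label, score in detections if score > conf]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```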

Step 11: Iterate through YOLO results

for result in results:
    outputimage = result.path
    savedir = result.save_dir
    speed = result.speed
    names = result.names
    boxes = result.boxes  # Boxes object for bounding box outputs
    masks = result.masks  # Masks object for segmentation masks outputs
    keypoints = result.keypoints  # Keypoints object for pose outputs
    probs = result.probs  # Probs object for classification outputs
    obb = result.obb  # Oriented boxes object for OBB outputs
    resultfilename = "camimages/{0}.png".format(uuid.uuid4())
    result.save(filename=resultfilename)  # save the annotated image to disk
    strText = ":tada:" + str(strname) + ":" + str(roadwayname)
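result.names maps class indices to labels, and result.boxes.cls holds the class index of each detected box. A sketch that turns those into a short summary such as "car x2, truck x1", which could be appended to strText for the Slack message; summarize_detections is a hypothetical helper.

```python
from collections import Counter


def summarize_detections(names, class_ids):
    """Count detected objects per label, most frequent first."""
    counts = Counter(names[int(c)] for c in class_ids)  # class ids may arrive as floats
    return ", ".join("{0} x{1}".format(label, n) for label, n in counts.most_common())
```

With a real result, the call would be summarize_detections(result.names, result.boxes.cls.tolist()).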

Step 12: Post and Upload to Slack

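from slack_sdk.errors import SlackApiError

try:
    response = client.files_upload_v2(
        channel="C06NE1FU6SE",
        file=resultfilename,
        title=roadwayname,
        initial_comment="Transformed image " + str(strfilename),
    )
except SlackApiError as e:
    # Report the Slack error code instead of silently continuing
    print("Slack error:", e.response["error"])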

Finally, insert into Milvus.

try:
    imageembedding = extractor(resultfilename)  # image file -> embedding vector
    milvus_client.insert(COLLECTION_NAME, {
        "vector": imageembedding,
        "filepath": filepath,
        "url": url,
        "videourl": videourl,
        "latlong": latlong,
        "name": strname,
        "roadwayname": roadwayname,
        "directionoftravel": directionoftravel,
    })
    print("resultfilename:" + resultfilename)
    print("Milvus:sent collection:" + roadwayname)
except Exception as e:
    print("An error:", e)

Now we can run the application and start processing street camera images.

SLACK_BOT_TOKEN="tokenFindYourown" NYURL="https://511ny.org/api/getcameras?key=getakey&format=json" python streetcams.py

Let’s search against our new camera images.
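In Milvus, the search itself is a single milvus_client.search call that takes the collection name, a list of query vectors, a limit, and the output_fields to return. To show what that ranking means, here is a brute-force stand-in over rows held in memory. It computes the same cosine score as the COSINE metric but is only a sketch, not how Milvus indexes data; the cosine and top_k helpers and the sample rows are hypothetical.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, rows, k=3):
    """Return (score, roadwayname) for the k rows most similar to the query, best first."""
    scored = ((cosine(query, row["vector"]), row["roadwayname"]) for row in rows)
    return heapq.nlargest(k, scored)
```

Against the real collection, the equivalent would be roughly milvus_client.search(COLLECTION_NAME, data=[query_vector], limit=3, output_fields=["roadwayname", "filepath"]), where query_vector is the extractor output for an example image.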

Showing the collection in Attu

OUTPUT IN SLACK

DEMO

REAL WORLD

Not only is our data in the real world, so are we. Come to our beautiful new location with our friendly hosts. I swear we are not in the Matrix.

UPCOMING MEETUPS

June 18, 2024 — Princeton

June 20, 2024 — Times Square at Microsoft Reactor with AI Camp NYC

CODE

NEXT STEPS

If you have seen my previous meetups, talks, articles or demos, you know that I have been acquiring transit data. My next step is to grab transit, weather and other public data sources for New York City and enhance them with images and image processing. With our images vectorized, we can find the nearest matches for example images! We will also have correlated data for latitude, longitude, speeds, temperatures and lots of other attributes. Along the way, we will see how to add JSON and Array fields to Milvus, and we will use NVIDIA edge devices to capture, vectorize and run various models right on the device.

Next up, I will build a notebook with extra comments and visual output.

NEW YORK CITY DATA

The unstructured database for New York City and beyond

RESOURCES

Tim Spann

Principal Developer Advocate, Zilliz. Milvus, Attu, Towhee, GenAI, Big Data, IoT, Deep Learning, Streaming, Machine Learning. https://www.datainmotion.dev/